All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device
@ 2019-11-22 18:29 Eric Auger
  2019-11-22 18:29 ` [PATCH for-5.0 v11 01/20] migration: Support QLIST migration Eric Auger
                   ` (21 more replies)
  0 siblings, 22 replies; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This series implements the QEMU virtio-iommu device.

This matches the v0.12 spec and the corresponding virtio-iommu
driver upstreamed in 5.3.

The pci proxy for the virtio-iommu device is instantiated using
"-device virtio-iommu-pci". This series still relies on ACPI IORT/DT
integration. Note the ACPI IORT integration is not yet upstreamed
and testing needs to be based on Jean-Philippe's additional
kernel patches [1].

Work is ongoing to remove IORT adherence and allow the
bindings between the IOMMU and the root complex to be defined
and written into the PCI device configuration space. The outcome
of this work is uncertain at this stage though. See [2].

So only patches 1-11 fully rely on upstreamed kernel code. Others
should be considered as RFC.

This respin allows people to test on ARM and x86. It also
brings migration support (tested on ARM) and various cleanups.
Reserved regions are now passed through an array of properties.
A libqos test also is introduced to test the virtio-iommu API.

Note integration with vhost devices and vfio devices is not part
of this series. Please follow Bharat's respins [3].

The 1st Patch ("migration: Support QLIST migration") was sent
separately [4].

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/v4.2-rc2-virtio-iommu-v11

[1] kernel branch to be used for guest
    https://github.com/eauger/linux/tree/v5.4-rc8-virtio-iommu-iort
[2] [RFC 00/13] virtio-iommu on non-devicetree platforms
[3] VFIO/VHOST integration is not part of this series. Please follow
    [PATCH RFC v5 0/5] virtio-iommu: VFIO integration respins
[4] [PATCH v6] migration: Support QLIST migration

Testing:
- tested with guest using virtio-net-pci
  (,vhost=off,iommu_platform,disable-modern=off,disable-legacy=on)
  and virtio-blk-pci
- migration on ARM
- on x86 PC machine I get some AHCI non translated transactions,
  very early. This does not prevent the guest from boot and behaving
  properly. Warnings look like:
qemu-system-x86_64: virtio_iommu_translate sid=250 is not known!!
qemu-system-x86_64: no buffer available in event queue to report event
qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
receive buffer address

History:

v10 -> v11:
- introduce virtio_iommu_handle_req macro
- migration support
- introduce DEFINE_PROP_INTERVAL and pass reserved regions
  through an array of those
- domain gtree simplification

v9 -> v10:
- rebase on 4.1.0-rc2, compliance with 0.12 spec
- removed ACPI part
- cleanup (see individual change logs)
- moved to a PATCH series

v8 -> v9:
- virtio-iommu-pci device needs to be instantiated from the command
  line (RID is not imposed anymore).
- tail structure properly initialized

v7 -> v8:
- virtio-iommu-pci added
- virt instantiation modified
- DT and ACPI modified to exclude the iommu RID from the mapping
- VIRTIO_IOMMU_F_BYPASS, VIRTIO_F_VERSION_1 features exposed

v6 -> v7:
- rebase on qemu 3.0.0-rc3
- minor update against v0.7
- fix issue with EP not on pci.0 and ACPI probing
- change the instantiation method

v5 -> v6:
- minor update against v0.6 spec
- fix g_hash_table_lookup in virtio_iommu_find_add_as
- replace some error_reports by qemu_log_mask(LOG_GUEST_ERROR, ...)

v4 -> v5:
- event queue and fault reporting
- we now return the IOAPIC MSI region if the virtio-iommu is instantiated
  in a PC machine.
- we bypass transactions on MSI HW region and fault on reserved ones.
- We support ACPI boot with mach-virt (based on IORT proposal)
- We moved to the new driver naming conventions
- simplified mach-virt instantiation
- worked around the disappearing of pci_find_primary_bus
- in virtio_iommu_translate, check the dev->as is not NULL
- initialize as->device_list in virtio_iommu_get_as
- initialize bufstate.error to false in virtio_iommu_probe

v3 -> v4:
- probe request support although no reserved region is returned at
  the moment
- unmap semantics less strict, as specified in v0.4
- device registration, attach/detach revisited
- split into smaller patches to ease review
- propose a way to inform the IOMMU mr about the page_size_mask
  of underlying HW IOMMU, if any
- remove warning associated with the translation of the MSI doorbell

v2 -> v3:
- rebase on top of 2.10-rc0 and especially
  [PATCH qemu v9 0/2] memory/iommu: QOM'fy IOMMU MemoryRegion
- add mutex init
- fix as->mappings deletion using g_tree_ref/unref
- when a dev is attached whereas it is already attached to
  another address space, first detach it
- fix some error values
- page_sizes = TARGET_PAGE_MASK;
- I haven't changed the unmap() semantics yet, waiting for the
  next virtio-iommu spec revision.

v1 -> v2:
- fix redefinition of viommu_as typedef



Eric Auger (20):
  migration: Support QLIST migration
  virtio-iommu: Add skeleton
  virtio-iommu: Decode the command payload
  virtio-iommu: Add the iommu regions
  virtio-iommu: Endpoint and domains structs and helpers
  virtio-iommu: Implement attach/detach command
  virtio-iommu: Implement map/unmap
  virtio-iommu: Implement translate
  virtio-iommu: Implement fault reporting
  virtio-iommu-pci: Add virtio iommu pci support
  hw/arm/virt: Add the virtio-iommu device tree mappings
  qapi: Introduce DEFINE_PROP_INTERVAL
  virtio-iommu: Implement probe request
  virtio-iommu: Handle reserved regions in the translation process
  virtio-iommu-pci: Add array of Interval properties
  hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper
  hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table
  virtio-iommu: Support migration
  pc: Add support for virtio-iommu-pci
  tests: Add virtio-iommu test

 hw/arm/virt-acpi-build.c         |  91 ++-
 hw/arm/virt.c                    |  53 +-
 hw/core/qdev-properties.c        |  90 +++
 hw/i386/acpi-build.c             |  72 +++
 hw/i386/pc.c                     |  15 +-
 hw/virtio/Kconfig                |   5 +
 hw/virtio/Makefile.objs          |   2 +
 hw/virtio/trace-events           |  22 +
 hw/virtio/virtio-iommu-pci.c     |  91 +++
 hw/virtio/virtio-iommu.c         | 952 +++++++++++++++++++++++++++++++
 include/exec/memory.h            |   6 +
 include/hw/acpi/acpi-defs.h      |  21 +-
 include/hw/arm/virt.h            |   2 +
 include/hw/i386/pc.h             |   2 +
 include/hw/pci/pci.h             |   1 +
 include/hw/qdev-properties.h     |   3 +
 include/hw/virtio/virtio-iommu.h |  67 +++
 include/migration/vmstate.h      |  21 +
 include/qemu/queue.h             |  39 ++
 include/qemu/typedefs.h          |   1 +
 migration/trace-events           |   5 +
 migration/vmstate-types.c        |  70 +++
 qdev-monitor.c                   |   1 +
 tests/Makefile.include           |   2 +
 tests/libqos/virtio-iommu.c      | 177 ++++++
 tests/libqos/virtio-iommu.h      |  45 ++
 tests/test-vmstate.c             | 170 ++++++
 tests/virtio-iommu-test.c        | 261 +++++++++
 28 files changed, 2253 insertions(+), 34 deletions(-)
 create mode 100644 hw/virtio/virtio-iommu-pci.c
 create mode 100644 hw/virtio/virtio-iommu.c
 create mode 100644 include/hw/virtio/virtio-iommu.h
 create mode 100644 tests/libqos/virtio-iommu.c
 create mode 100644 tests/libqos/virtio-iommu.h
 create mode 100644 tests/virtio-iommu-test.c

-- 
2.20.1



^ permalink raw reply	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 01/20] migration: Support QLIST migration
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-11-27 11:46   ` Dr. David Alan Gilbert
  2019-11-22 18:29 ` [PATCH for-5.0 v11 02/20] virtio-iommu: Add skeleton Eric Auger
                   ` (20 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

Support QLIST migration using the same principle as QTAILQ:
94869d5c52 ("migration: migrate QTAILQ").

The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD
and QLIST_RAW_REVERSE.

Tests also are provided.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v5 - v6:
- by doing more advanced testing with virtio-iommu migration
  I noticed this was broken. "prev" field was not set properly.
  I improved the tests to manipulate both the next and prev
  fields.
- Removed Peter and Juan's R-b
---
 include/migration/vmstate.h |  21 +++++
 include/qemu/queue.h        |  39 +++++++++
 migration/trace-events      |   5 ++
 migration/vmstate-types.c   |  70 +++++++++++++++
 tests/test-vmstate.c        | 170 ++++++++++++++++++++++++++++++++++++
 5 files changed, 305 insertions(+)

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index ac4f46a67d..08683d93c6 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -227,6 +227,7 @@ extern const VMStateInfo vmstate_info_tmp;
 extern const VMStateInfo vmstate_info_bitmap;
 extern const VMStateInfo vmstate_info_qtailq;
 extern const VMStateInfo vmstate_info_gtree;
+extern const VMStateInfo vmstate_info_qlist;
 
 #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
 /*
@@ -796,6 +797,26 @@ extern const VMStateInfo vmstate_info_gtree;
     .offset       = offsetof(_state, _field),                                  \
 }
 
+/*
+ * For migrating a QLIST
+ * Target QLIST needs be properly initialized.
+ * _type: type of QLIST element
+ * _next: name of QLIST_ENTRY entry field in QLIST element
+ * _vmsd: VMSD for QLIST element
+ * size: size of QLIST element
+ * start: offset of QLIST_ENTRY in QTAILQ element
+ */
+#define VMSTATE_QLIST_V(_field, _state, _version, _vmsd, _type, _next)  \
+{                                                                        \
+    .name         = (stringify(_field)),                                 \
+    .version_id   = (_version),                                          \
+    .vmsd         = &(_vmsd),                                            \
+    .size         = sizeof(_type),                                       \
+    .info         = &vmstate_info_qlist,                                 \
+    .offset       = offsetof(_state, _field),                            \
+    .start        = offsetof(_type, _next),                              \
+}
+
 /* _f : field name
    _f_n : num of elements field_name
    _n : num of elements
diff --git a/include/qemu/queue.h b/include/qemu/queue.h
index 4764d93ea3..4d4554a7ce 100644
--- a/include/qemu/queue.h
+++ b/include/qemu/queue.h
@@ -501,4 +501,43 @@ union {                                                                 \
         QTAILQ_RAW_TQH_CIRC(head)->tql_prev = QTAILQ_RAW_TQE_CIRC(elm, entry);  \
 } while (/*CONSTCOND*/0)
 
+#define QLIST_RAW_FIRST(head)                                                  \
+        field_at_offset(head, 0, void *)
+
+#define QLIST_RAW_NEXT(elm, entry)                                             \
+        field_at_offset(elm, entry, void *)
+
+#define QLIST_RAW_PREVIOUS(elm, entry)                                         \
+        field_at_offset(elm, entry + sizeof(void *), void *)
+
+#define QLIST_RAW_FOREACH(elm, head, entry)                                    \
+        for ((elm) = *QLIST_RAW_FIRST(head);                                   \
+             (elm);                                                            \
+             (elm) = *QLIST_RAW_NEXT(elm, entry))
+
+#define QLIST_RAW_INSERT_HEAD(head, elm, entry) do {                           \
+        void *first = *QLIST_RAW_FIRST(head);                                  \
+        *QLIST_RAW_FIRST(head) = elm;                                          \
+        *QLIST_RAW_PREVIOUS(elm, entry) = QLIST_RAW_FIRST(head);               \
+        if (first) {                                                           \
+            *QLIST_RAW_NEXT(elm, entry) = first;                               \
+            *QLIST_RAW_PREVIOUS(first, entry) = QLIST_RAW_NEXT(elm, entry);    \
+        } else {                                                               \
+            *QLIST_RAW_NEXT(elm, entry) = NULL;                                \
+        }                                                                      \
+} while (0)
+
+#define QLIST_RAW_REVERSE(head, elm, entry) do {                               \
+        void *iter = *QLIST_RAW_FIRST(head), *prev = NULL, *next;              \
+        while (iter) {                                                         \
+            next = *QLIST_RAW_NEXT(iter, entry);                               \
+            *QLIST_RAW_PREVIOUS(iter, entry) = QLIST_RAW_NEXT(next, entry);    \
+            *QLIST_RAW_NEXT(iter, entry) = prev;                               \
+            prev = iter;                                                       \
+            iter = next;                                                       \
+        }                                                                      \
+        *QLIST_RAW_FIRST(head) = prev;                                         \
+        *QLIST_RAW_PREVIOUS(prev, entry) = QLIST_RAW_FIRST(head);              \
+} while (0)
+
 #endif /* QEMU_SYS_QUEUE_H */
diff --git a/migration/trace-events b/migration/trace-events
index 6dee7b5389..e0a33cffca 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -76,6 +76,11 @@ get_gtree_end(const char *field_name, const char *key_vmsd_name, const char *val
 put_gtree(const char *field_name, const char *key_vmsd_name, const char *val_vmsd_name, uint32_t nnodes) "%s(%s/%s) nnodes=%d"
 put_gtree_end(const char *field_name, const char *key_vmsd_name, const char *val_vmsd_name, int ret) "%s(%s/%s) %d"
 
+get_qlist(const char *field_name, const char *vmsd_name, int version_id) "%s(%s v%d)"
+get_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
+put_qlist(const char *field_name, const char *vmsd_name, int version_id) "%s(%s v%d)"
+put_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
+
 # qemu-file.c
 qemu_file_fclose(void) ""
 
diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
index 7236cf92bc..1eee36773a 100644
--- a/migration/vmstate-types.c
+++ b/migration/vmstate-types.c
@@ -843,3 +843,73 @@ const VMStateInfo vmstate_info_gtree = {
     .get  = get_gtree,
     .put  = put_gtree,
 };
+
+static int put_qlist(QEMUFile *f, void *pv, size_t unused_size,
+                     const VMStateField *field, QJSON *vmdesc)
+{
+    const VMStateDescription *vmsd = field->vmsd;
+    /* offset of the QTAILQ entry in a QTAILQ element*/
+    size_t entry_offset = field->start;
+    void *elm;
+    int ret;
+
+    trace_put_qlist(field->name, vmsd->name, vmsd->version_id);
+    QLIST_RAW_FOREACH(elm, pv, entry_offset) {
+        qemu_put_byte(f, true);
+        ret = vmstate_save_state(f, vmsd, elm, vmdesc);
+        if (ret) {
+            error_report("%s: failed to save %s (%d)", field->name,
+                         vmsd->name, ret);
+            return ret;
+        }
+    }
+    qemu_put_byte(f, false);
+    trace_put_qlist_end(field->name, vmsd->name);
+
+    return 0;
+}
+
+static int get_qlist(QEMUFile *f, void *pv, size_t unused_size,
+                     const VMStateField *field)
+{
+    int ret = 0;
+    const VMStateDescription *vmsd = field->vmsd;
+    /* size of a QLIST element */
+    size_t size = field->size;
+    /* offset of the QLIST entry in a QLIST element */
+    size_t entry_offset = field->start;
+    int version_id = field->version_id;
+    void *elm;
+
+    trace_get_qlist(field->name, vmsd->name, vmsd->version_id);
+    if (version_id > vmsd->version_id) {
+        error_report("%s %s",  vmsd->name, "too new");
+        return -EINVAL;
+    }
+    if (version_id < vmsd->minimum_version_id) {
+        error_report("%s %s",  vmsd->name, "too old");
+        return -EINVAL;
+    }
+
+    while (qemu_get_byte(f)) {
+        elm = g_malloc(size);
+        ret = vmstate_load_state(f, vmsd, elm, version_id);
+        if (ret) {
+            error_report("%s: failed to load %s (%d)", field->name,
+                         vmsd->name, ret);
+            g_free(elm);
+            return ret;
+        }
+        QLIST_RAW_INSERT_HEAD(pv, elm, entry_offset);
+    }
+    QLIST_RAW_REVERSE(pv, elm, entry_offset);
+    trace_get_qlist_end(field->name, vmsd->name);
+
+    return ret;
+}
+
+const VMStateInfo vmstate_info_qlist = {
+    .name = "qlist",
+    .get  = get_qlist,
+    .put  = put_qlist,
+};
diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
index 1e5be1d4ff..9660f932b9 100644
--- a/tests/test-vmstate.c
+++ b/tests/test-vmstate.c
@@ -926,6 +926,28 @@ static const VMStateDescription vmstate_domain = {
     }
 };
 
+/* test QLIST Migration */
+
+typedef struct TestQListElement {
+    uint32_t  id;
+    QLIST_ENTRY(TestQListElement) next;
+} TestQListElement;
+
+typedef struct TestQListContainer {
+    uint32_t  id;
+    QLIST_HEAD(, TestQListElement) list;
+} TestQListContainer;
+
+static const VMStateDescription vmstate_qlist_element = {
+    .name = "test/queue list",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(id, TestQListElement),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 static const VMStateDescription vmstate_iommu = {
     .name = "iommu",
     .version_id = 1,
@@ -939,6 +961,18 @@ static const VMStateDescription vmstate_iommu = {
     }
 };
 
+static const VMStateDescription vmstate_container = {
+    .name = "test/container/qlist",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(id, TestQListContainer),
+        VMSTATE_QLIST_V(list, TestQListContainer, 1, vmstate_qlist_element,
+                        TestQListElement, next),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 uint8_t first_domain_dump[] = {
     /* id */
     0x00, 0x0, 0x0, 0x6,
@@ -1229,6 +1263,140 @@ static void test_gtree_load_iommu(void)
     qemu_fclose(fload);
 }
 
+static uint8_t qlist_dump[] = {
+    0x00, 0x00, 0x00, 0x01, /* container id */
+    0x1, /* start of a */
+    0x00, 0x00, 0x00, 0x0a,
+    0x1, /* start of b */
+    0x00, 0x00, 0x0b, 0x00,
+    0x1, /* start of c */
+    0x00, 0x0c, 0x00, 0x00,
+    0x1, /* start of d */
+    0x0d, 0x00, 0x00, 0x00,
+    0x0, /* end of list */
+    QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
+};
+
+static TestQListContainer *alloc_container(void)
+{
+    TestQListElement *a = g_malloc(sizeof(TestQListElement));
+    TestQListElement *b = g_malloc(sizeof(TestQListElement));
+    TestQListElement *c = g_malloc(sizeof(TestQListElement));
+    TestQListElement *d = g_malloc(sizeof(TestQListElement));
+    TestQListContainer *container = g_malloc(sizeof(TestQListContainer));
+
+    a->id = 0x0a;
+    b->id = 0x0b00;
+    c->id = 0xc0000;
+    d->id = 0xd000000;
+    container->id = 1;
+
+    QLIST_INIT(&container->list);
+    QLIST_INSERT_HEAD(&container->list, d, next);
+    QLIST_INSERT_HEAD(&container->list, c, next);
+    QLIST_INSERT_HEAD(&container->list, b, next);
+    QLIST_INSERT_HEAD(&container->list, a, next);
+    return container;
+}
+
+static void free_container(TestQListContainer *container)
+{
+    TestQListElement *iter, *tmp;
+
+    QLIST_FOREACH_SAFE(iter, &container->list, next, tmp) {
+        QLIST_REMOVE(iter, next);
+        g_free(iter);
+    }
+    g_free(container);
+}
+
+static void compare_containers(TestQListContainer *c1, TestQListContainer *c2)
+{
+    TestQListElement *first_item_c1, *first_item_c2;
+
+    while (!QLIST_EMPTY(&c1->list)) {
+        first_item_c1 = QLIST_FIRST(&c1->list);
+        first_item_c2 = QLIST_FIRST(&c2->list);
+        assert(first_item_c2);
+        assert(first_item_c1->id == first_item_c2->id);
+        QLIST_REMOVE(first_item_c1, next);
+        QLIST_REMOVE(first_item_c2, next);
+        g_free(first_item_c1);
+        g_free(first_item_c2);
+    }
+    assert(QLIST_EMPTY(&c2->list));
+}
+
+/*
+ * Check the prev & next fields are correct by doing list
+ * manipulations on the container. We will do that for both
+ * the source and the destination containers
+ */
+static void manipulate_container(TestQListContainer *c)
+{
+     TestQListElement *prev, *iter = QLIST_FIRST(&c->list);
+     TestQListElement *elem;
+
+     elem = g_malloc(sizeof(TestQListElement));
+     elem->id = 0x12;
+     QLIST_INSERT_AFTER(iter, elem, next);
+
+     elem = g_malloc(sizeof(TestQListElement));
+     elem->id = 0x13;
+     QLIST_INSERT_HEAD(&c->list, elem, next);
+
+     while (iter) {
+        prev = iter;
+        iter = QLIST_NEXT(iter, next);
+     }
+
+     elem = g_malloc(sizeof(TestQListElement));
+     elem->id = 0x14;
+     QLIST_INSERT_BEFORE(prev, elem, next);
+
+     elem = g_malloc(sizeof(TestQListElement));
+     elem->id = 0x15;
+     QLIST_INSERT_AFTER(prev, elem, next);
+
+     QLIST_REMOVE(prev, next);
+     g_free(prev);
+}
+
+static void test_save_qlist(void)
+{
+    TestQListContainer *container = alloc_container();
+
+    save_vmstate(&vmstate_container, container);
+    compare_vmstate(qlist_dump, sizeof(qlist_dump));
+    free_container(container);
+}
+
+static void test_load_qlist(void)
+{
+    QEMUFile *fsave, *fload;
+    TestQListContainer *orig_container = alloc_container();
+    TestQListContainer *dest_container = g_malloc0(sizeof(TestQListContainer));
+    char eof;
+
+    QLIST_INIT(&dest_container->list);
+
+    fsave = open_test_file(true);
+    qemu_put_buffer(fsave, qlist_dump, sizeof(qlist_dump));
+    g_assert(!qemu_file_get_error(fsave));
+    qemu_fclose(fsave);
+
+    fload = open_test_file(false);
+    vmstate_load_state(fload, &vmstate_container, dest_container, 1);
+    eof = qemu_get_byte(fload);
+    g_assert(!qemu_file_get_error(fload));
+    g_assert_cmpint(eof, ==, QEMU_VM_EOF);
+    manipulate_container(orig_container);
+    manipulate_container(dest_container);
+    compare_containers(orig_container, dest_container);
+    free_container(orig_container);
+    free_container(dest_container);
+}
+
 typedef struct TmpTestStruct {
     TestStruct *parent;
     int64_t diff;
@@ -1353,6 +1521,8 @@ int main(int argc, char **argv)
     g_test_add_func("/vmstate/gtree/load/loaddomain", test_gtree_load_domain);
     g_test_add_func("/vmstate/gtree/save/saveiommu", test_gtree_save_iommu);
     g_test_add_func("/vmstate/gtree/load/loadiommu", test_gtree_load_iommu);
+    g_test_add_func("/vmstate/qlist/save/saveqlist", test_save_qlist);
+    g_test_add_func("/vmstate/qlist/load/loadqlist", test_load_qlist);
     g_test_add_func("/vmstate/tmp_struct", test_tmp_struct);
     g_test_run();
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 02/20] virtio-iommu: Add skeleton
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
  2019-11-22 18:29 ` [PATCH for-5.0 v11 01/20] migration: Support QLIST migration Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:31   ` Jean-Philippe Brucker
  2019-11-22 18:29 ` [PATCH for-5.0 v11 03/20] virtio-iommu: Decode the command payload Eric Auger
                   ` (19 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This patchs adds the skeleton for the virtio-iommu device.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v9 -> v10:
- mutex initialized here
- initialize tail
- included hw/qdev-properties.h
- removed g_memdup
- removed s->config.domain_range.start = 0;

v9 -> v10:
- expose VIRTIO_IOMMU_F_MMIO feature
- s/domain_bits/domain_range struct
- change error codes
- enforce unmigratable
- Kconfig

v7 -> v8:
- expose VIRTIO_IOMMU_F_BYPASS and VIRTIO_F_VERSION_1
  features
- set_config dummy implementation + tracing
- add trace in get_features
- set the features on realize() and store the acked ones
- remove inclusion of linux/virtio_iommu.h

v6 -> v7:
- removed qapi-event.h include
- add primary_bus and associated property

v4 -> v5:
- use the new v0.5 terminology (domain, endpoint)
- add the event virtqueue

v3 -> v4:
- use page_size_mask instead of page_sizes
- added set_features()
- added some traces (reset, set_status, set_features)
- empty virtio_iommu_set_config() as the driver MUST NOT
  write to device configuration fields
- add get_config trace

v2 -> v3:
- rebase on 2.10-rc0, ie. use IOMMUMemoryRegion and remove
  iommu_ops.
- advertise VIRTIO_IOMMU_F_MAP_UNMAP feature
- page_sizes set to TARGET_PAGE_SIZE

Conflicts:
	hw/virtio/trace-events
---
 hw/virtio/Kconfig                |   5 +
 hw/virtio/Makefile.objs          |   1 +
 hw/virtio/trace-events           |   8 +
 hw/virtio/virtio-iommu.c         | 274 +++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-iommu.h |  62 +++++++
 5 files changed, 350 insertions(+)
 create mode 100644 hw/virtio/virtio-iommu.c
 create mode 100644 include/hw/virtio/virtio-iommu.h

diff --git a/hw/virtio/Kconfig b/hw/virtio/Kconfig
index 3724ff8bac..a30107b439 100644
--- a/hw/virtio/Kconfig
+++ b/hw/virtio/Kconfig
@@ -6,6 +6,11 @@ config VIRTIO_RNG
     default y
     depends on VIRTIO
 
+config VIRTIO_IOMMU
+    bool
+    default y
+    depends on VIRTIO
+
 config VIRTIO_PCI
     bool
     default y if PCI_DEVICES
diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index e2f70fbb89..f68ac14a90 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -16,6 +16,7 @@ obj-$(call land,$(CONFIG_VIRTIO_CRYPTO),$(CONFIG_VIRTIO_PCI)) += virtio-crypto-p
 obj-$(CONFIG_VIRTIO_PMEM) += virtio-pmem.o
 common-obj-$(call land,$(CONFIG_VIRTIO_PMEM),$(CONFIG_VIRTIO_PCI)) += virtio-pmem-pci.o
 obj-$(call land,$(CONFIG_VHOST_USER_FS),$(CONFIG_VIRTIO_PCI)) += vhost-user-fs-pci.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
 obj-$(CONFIG_VHOST_VSOCK) += vhost-vsock.o
 
 ifeq ($(CONFIG_VIRTIO_PCI),y)
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index e28ba48da6..f7dac39213 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -53,3 +53,11 @@ virtio_mmio_write_offset(uint64_t offset, uint64_t value) "virtio_mmio_write off
 virtio_mmio_guest_page(uint64_t size, int shift) "guest page size 0x%" PRIx64 " shift %d"
 virtio_mmio_queue_write(uint64_t value, int max_size) "mmio_queue write 0x%" PRIx64 " max %d"
 virtio_mmio_setting_irq(int level) "virtio_mmio setting IRQ %d"
+
+# hw/virtio/virtio-iommu.c
+virtio_iommu_device_reset(void) "reset!"
+virtio_iommu_get_features(uint64_t features) "device supports features=0x%"PRIx64
+virtio_iommu_set_features(uint64_t features) "features accepted by the driver =0x%"PRIx64
+virtio_iommu_device_status(uint8_t status) "driver status = %d"
+virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" start=0x%"PRIx64" end=0x%"PRIx64" domain_range=%d probe_size=0x%x"
+virtio_iommu_set_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" start=0x%"PRIx64" end=0x%"PRIx64" domain_bits=%d probe_size=0x%x"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
new file mode 100644
index 0000000000..7b25db3713
--- /dev/null
+++ b/hw/virtio/virtio-iommu.c
@@ -0,0 +1,274 @@
+/*
+ * virtio-iommu device
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/iov.h"
+#include "qemu-common.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/virtio.h"
+#include "sysemu/kvm.h"
+#include "trace.h"
+
+#include "standard-headers/linux/virtio_ids.h"
+
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/virtio-access.h"
+#include "hw/virtio/virtio-iommu.h"
+
+/* Max size */
+#define VIOMMU_DEFAULT_QUEUE_SIZE 256
+
+static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
+                                      struct iovec *iov,
+                                      unsigned int iov_cnt)
+{
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
+                                      struct iovec *iov,
+                                      unsigned int iov_cnt)
+{
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+static int virtio_iommu_handle_map(VirtIOIOMMU *s,
+                                   struct iovec *iov,
+                                   unsigned int iov_cnt)
+{
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
+                                     struct iovec *iov,
+                                     unsigned int iov_cnt)
+{
+    return VIRTIO_IOMMU_S_UNSUPP;
+}
+
+static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
+{
+    VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
+    struct virtio_iommu_req_head head;
+    struct virtio_iommu_req_tail tail = {};
+    VirtQueueElement *elem;
+    unsigned int iov_cnt;
+    struct iovec *iov;
+    size_t sz;
+
+    for (;;) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+        if (!elem) {
+            return;
+        }
+
+        if (iov_size(elem->in_sg, elem->in_num) < sizeof(tail) ||
+            iov_size(elem->out_sg, elem->out_num) < sizeof(head)) {
+            virtio_error(vdev, "virtio-iommu bad head/tail size");
+            virtqueue_detach_element(vq, elem, 0);
+            g_free(elem);
+            break;
+        }
+
+        iov_cnt = elem->out_num;
+        iov = elem->out_sg;
+        sz = iov_to_buf(iov, iov_cnt, 0, &head, sizeof(head));
+        if (unlikely(sz != sizeof(head))) {
+            tail.status = VIRTIO_IOMMU_S_DEVERR;
+            goto out;
+        }
+        qemu_mutex_lock(&s->mutex);
+        switch (head.type) {
+        case VIRTIO_IOMMU_T_ATTACH:
+            tail.status = virtio_iommu_handle_attach(s, iov, iov_cnt);
+            break;
+        case VIRTIO_IOMMU_T_DETACH:
+            tail.status = virtio_iommu_handle_detach(s, iov, iov_cnt);
+            break;
+        case VIRTIO_IOMMU_T_MAP:
+            tail.status = virtio_iommu_handle_map(s, iov, iov_cnt);
+            break;
+        case VIRTIO_IOMMU_T_UNMAP:
+            tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
+            break;
+        default:
+            tail.status = VIRTIO_IOMMU_S_UNSUPP;
+        }
+        qemu_mutex_unlock(&s->mutex);
+
+out:
+        sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
+                          &tail, sizeof(tail));
+        assert(sz == sizeof(tail));
+
+        virtqueue_push(vq, elem, sizeof(tail));
+        virtio_notify(vdev, vq);
+        g_free(elem);
+    }
+}
+
+static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
+{
+    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+    struct virtio_iommu_config *config = &dev->config;
+
+    trace_virtio_iommu_get_config(config->page_size_mask,
+                                  config->input_range.start,
+                                  config->input_range.end,
+                                  config->domain_range.end,
+                                  config->probe_size);
+    memcpy(config_data, &dev->config, sizeof(struct virtio_iommu_config));
+}
+
+static void virtio_iommu_set_config(VirtIODevice *vdev,
+                                      const uint8_t *config_data)
+{
+    struct virtio_iommu_config config;
+
+    memcpy(&config, config_data, sizeof(struct virtio_iommu_config));
+    trace_virtio_iommu_set_config(config.page_size_mask,
+                                  config.input_range.start,
+                                  config.input_range.end,
+                                  config.domain_range.end,
+                                  config.probe_size);
+}
+
+static uint64_t virtio_iommu_get_features(VirtIODevice *vdev, uint64_t f,
+                                          Error **errp)
+{
+    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+
+    f |= dev->features;
+    trace_virtio_iommu_get_features(f);
+    return f;
+}
+
+static void virtio_iommu_set_features(VirtIODevice *vdev, uint64_t val)
+{
+    VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
+
+    dev->acked_features = val;
+    trace_virtio_iommu_set_features(dev->acked_features);
+}
+
+/*
+ * Migration is not yet supported: most of the state consists
+ * of balanced binary trees which are not yet ready for getting
+ * migrated
+ */
+static const VMStateDescription vmstate_virtio_iommu_device = {
+    .name = "virtio-iommu-device",
+    .unmigratable = 1,
+};
+
+static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
+
+    virtio_init(vdev, "virtio-iommu", VIRTIO_ID_IOMMU,
+                sizeof(struct virtio_iommu_config));
+
+    s->req_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE,
+                             virtio_iommu_handle_command);
+    s->event_vq = virtio_add_queue(vdev, VIOMMU_DEFAULT_QUEUE_SIZE, NULL);
+
+    s->config.page_size_mask = TARGET_PAGE_MASK;
+    s->config.input_range.end = -1UL;
+    s->config.domain_range.end = 32;
+
+    virtio_add_feature(&s->features, VIRTIO_RING_F_EVENT_IDX);
+    virtio_add_feature(&s->features, VIRTIO_RING_F_INDIRECT_DESC);
+    virtio_add_feature(&s->features, VIRTIO_F_VERSION_1);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_INPUT_RANGE);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_DOMAIN_RANGE);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MAP_UNMAP);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
+
+    qemu_mutex_init(&s->mutex);
+}
+
+static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+
+    virtio_cleanup(vdev);
+}
+
+static void virtio_iommu_device_reset(VirtIODevice *vdev)
+{
+    trace_virtio_iommu_device_reset();
+}
+
+static void virtio_iommu_set_status(VirtIODevice *vdev, uint8_t status)
+{
+    trace_virtio_iommu_device_status(status);
+}
+
+static void virtio_iommu_instance_init(Object *obj)
+{
+}
+
+static const VMStateDescription vmstate_virtio_iommu = {
+    .name = "virtio-iommu",
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_VIRTIO_DEVICE,
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+static Property virtio_iommu_properties[] = {
+    DEFINE_PROP_LINK("primary-bus", VirtIOIOMMU, primary_bus, "PCI", PCIBus *),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_iommu_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
+
+    dc->props = virtio_iommu_properties;
+    dc->vmsd = &vmstate_virtio_iommu;
+
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    vdc->realize = virtio_iommu_device_realize;
+    vdc->unrealize = virtio_iommu_device_unrealize;
+    vdc->reset = virtio_iommu_device_reset;
+    vdc->get_config = virtio_iommu_get_config;
+    vdc->set_config = virtio_iommu_set_config;
+    vdc->get_features = virtio_iommu_get_features;
+    vdc->set_features = virtio_iommu_set_features;
+    vdc->set_status = virtio_iommu_set_status;
+    vdc->vmsd = &vmstate_virtio_iommu_device;
+}
+
+static const TypeInfo virtio_iommu_info = {
+    .name = TYPE_VIRTIO_IOMMU,
+    .parent = TYPE_VIRTIO_DEVICE,
+    .instance_size = sizeof(VirtIOIOMMU),
+    .instance_init = virtio_iommu_instance_init,
+    .class_init = virtio_iommu_class_init,
+};
+
+static void virtio_register_types(void)
+{
+    type_register_static(&virtio_iommu_info);
+}
+
+type_init(virtio_register_types)
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
new file mode 100644
index 0000000000..4d47b6abeb
--- /dev/null
+++ b/include/hw/virtio/virtio-iommu.h
@@ -0,0 +1,62 @@
+/*
+ * virtio-iommu device
+ *
+ * Copyright (c) 2017 Red Hat, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef QEMU_VIRTIO_IOMMU_H
+#define QEMU_VIRTIO_IOMMU_H
+
+#include "standard-headers/linux/virtio_iommu.h"
+#include "hw/virtio/virtio.h"
+#include "hw/pci/pci.h"
+
+#define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
+#define VIRTIO_IOMMU(obj) \
+        OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
+
+#define IOMMU_PCI_BUS_MAX      256
+#define IOMMU_PCI_DEVFN_MAX    256
+
+typedef struct IOMMUDevice {
+    void         *viommu;
+    PCIBus       *bus;
+    int           devfn;
+    IOMMUMemoryRegion  iommu_mr;
+    AddressSpace  as;
+} IOMMUDevice;
+
+typedef struct IOMMUPciBus {
+    PCIBus       *bus;
+    IOMMUDevice  *pbdev[0]; /* Parent array is sparse, so dynamically alloc */
+} IOMMUPciBus;
+
+typedef struct VirtIOIOMMU {
+    VirtIODevice parent_obj;
+    VirtQueue *req_vq;
+    VirtQueue *event_vq;
+    struct virtio_iommu_config config;
+    uint64_t features;
+    uint64_t acked_features;
+    GHashTable *as_by_busptr;
+    IOMMUPciBus *as_by_bus_num[IOMMU_PCI_BUS_MAX];
+    PCIBus *primary_bus;
+    GTree *domains;
+    QemuMutex mutex;
+    GTree *endpoints;
+} VirtIOIOMMU;
+
+#endif
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 03/20] virtio-iommu: Decode the command payload
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
  2019-11-22 18:29 ` [PATCH for-5.0 v11 01/20] migration: Support QLIST migration Eric Auger
  2019-11-22 18:29 ` [PATCH for-5.0 v11 02/20] virtio-iommu: Add skeleton Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:32   ` Jean-Philippe Brucker
  2019-12-10 19:14   ` Peter Xu
  2019-11-22 18:29 ` [PATCH for-5.0 v11 04/20] virtio-iommu: Add the iommu regions Eric Auger
                   ` (18 subsequent siblings)
  21 siblings, 2 replies; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This patch adds the command payload decoding and
introduces the functions that will do the actual
command handling. Those functions are not yet implemented.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- use a macro for handle command functions

v9 -> v10:
- make virtio_iommu_handle_* more compact and
  remove get_payload_size

v7 -> v8:
- handle new domain parameter in detach
- remove reserved checks

v5 -> v6:
- change map/unmap semantics (remove size)

v4 -> v5:
- adopt new v0.5 terminology

v3 -> v4:
- no flags field anymore in struct virtio_iommu_req_unmap
- test reserved on attach/detach, change trace proto
- rebase on v2.10.0.
---
 hw/virtio/trace-events   |  4 +++
 hw/virtio/virtio-iommu.c | 76 +++++++++++++++++++++++++++++++++-------
 2 files changed, 68 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index f7dac39213..c7276116e7 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -61,3 +61,7 @@ virtio_iommu_set_features(uint64_t features) "features accepted by the driver =0
 virtio_iommu_device_status(uint8_t status) "driver status = %d"
 virtio_iommu_get_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" start=0x%"PRIx64" end=0x%"PRIx64" domain_range=%d probe_size=0x%x"
 virtio_iommu_set_config(uint64_t page_size_mask, uint64_t start, uint64_t end, uint32_t domain_range, uint32_t probe_size) "page_size_mask=0x%"PRIx64" start=0x%"PRIx64" end=0x%"PRIx64" domain_bits=%d probe_size=0x%x"
+virtio_iommu_attach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
+virtio_iommu_detach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
+virtio_iommu_map(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end, uint64_t phys_start, uint32_t flags) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64 " phys_start=0x%"PRIx64" flags=%d"
+virtio_iommu_unmap(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 7b25db3713..afd6397ac9 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -34,31 +34,83 @@
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
-static int virtio_iommu_handle_attach(VirtIOIOMMU *s,
-                                      struct iovec *iov,
-                                      unsigned int iov_cnt)
+static int virtio_iommu_attach(VirtIOIOMMU *s,
+                               struct virtio_iommu_req_attach *req)
 {
+    uint32_t domain_id = le32_to_cpu(req->domain);
+    uint32_t ep_id = le32_to_cpu(req->endpoint);
+
+    trace_virtio_iommu_attach(domain_id, ep_id);
+
     return VIRTIO_IOMMU_S_UNSUPP;
 }
-static int virtio_iommu_handle_detach(VirtIOIOMMU *s,
-                                      struct iovec *iov,
-                                      unsigned int iov_cnt)
+
+static int virtio_iommu_detach(VirtIOIOMMU *s,
+                               struct virtio_iommu_req_detach *req)
 {
+    uint32_t domain_id = le32_to_cpu(req->domain);
+    uint32_t ep_id = le32_to_cpu(req->endpoint);
+
+    trace_virtio_iommu_detach(domain_id, ep_id);
+
     return VIRTIO_IOMMU_S_UNSUPP;
 }
-static int virtio_iommu_handle_map(VirtIOIOMMU *s,
-                                   struct iovec *iov,
-                                   unsigned int iov_cnt)
+
+static int virtio_iommu_map(VirtIOIOMMU *s,
+                            struct virtio_iommu_req_map *req)
 {
+    uint32_t domain_id = le32_to_cpu(req->domain);
+    uint64_t phys_start = le64_to_cpu(req->phys_start);
+    uint64_t virt_start = le64_to_cpu(req->virt_start);
+    uint64_t virt_end = le64_to_cpu(req->virt_end);
+    uint32_t flags = le32_to_cpu(req->flags);
+
+    trace_virtio_iommu_map(domain_id, virt_start, virt_end, phys_start, flags);
+
     return VIRTIO_IOMMU_S_UNSUPP;
 }
-static int virtio_iommu_handle_unmap(VirtIOIOMMU *s,
-                                     struct iovec *iov,
-                                     unsigned int iov_cnt)
+
+static int virtio_iommu_unmap(VirtIOIOMMU *s,
+                              struct virtio_iommu_req_unmap *req)
 {
+    uint32_t domain_id = le32_to_cpu(req->domain);
+    uint64_t virt_start = le64_to_cpu(req->virt_start);
+    uint64_t virt_end = le64_to_cpu(req->virt_end);
+
+    trace_virtio_iommu_unmap(domain_id, virt_start, virt_end);
+
     return VIRTIO_IOMMU_S_UNSUPP;
 }
 
+static int virtio_iommu_iov_to_req(struct iovec *iov,
+                                   unsigned int iov_cnt,
+                                   void *req, size_t req_sz)
+{
+    size_t sz, payload_sz = req_sz - sizeof(struct virtio_iommu_req_tail);
+
+    sz = iov_to_buf(iov, iov_cnt, 0, req, payload_sz);
+    if (unlikely(sz != payload_sz)) {
+        return VIRTIO_IOMMU_S_INVAL;
+    }
+    return 0;
+}
+
+#define virtio_iommu_handle_req(__req)                                  \
+static int virtio_iommu_handle_ ## __req(VirtIOIOMMU *s,                \
+                                         struct iovec *iov,             \
+                                         unsigned int iov_cnt)          \
+{                                                                       \
+    struct virtio_iommu_req_ ## __req req;                              \
+    int ret = virtio_iommu_iov_to_req(iov, iov_cnt, &req, sizeof(req)); \
+                                                                        \
+    return ret ? ret : virtio_iommu_ ## __req(s, &req);                 \
+}
+
+virtio_iommu_handle_req(attach)
+virtio_iommu_handle_req(detach)
+virtio_iommu_handle_req(map)
+virtio_iommu_handle_req(unmap)
+
 static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 04/20] virtio-iommu: Add the iommu regions
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (2 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 03/20] virtio-iommu: Decode the command payload Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:34   ` Jean-Philippe Brucker
  2019-12-10 19:18   ` Peter Xu
  2019-11-22 18:29 ` [PATCH for-5.0 v11 05/20] virtio-iommu: Endpoint and domains structs and helpers Eric Auger
                   ` (17 subsequent siblings)
  21 siblings, 2 replies; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This patch initializes the iommu memory regions so that
PCIe end point transactions get translated. The translation
function is not yet implemented though.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- use g_hash_table_new_full for allocating as_by_busptr

v9 -> v10:
- remove pc/virt machine headers
- virtio_iommu_find_add_as: mr_index introduced in that patch
  and name properly freed

v6 -> v7:
- use primary_bus
- rebase on new translate proto featuring iommu_idx

v5 -> v6:
- include qapi/error.h
- fix g_hash_table_lookup key in virtio_iommu_find_add_as

v4 -> v5:
- use PCI bus handle as a key
- use get_primary_pci_bus() callback

v3 -> v4:
- add trace_virtio_iommu_init_iommu_mr

v2 -> v3:
- use IOMMUMemoryRegion
- iommu mr name built with BDF
- rename smmu_get_sid into virtio_iommu_get_sid and use PCI_BUILD_BDF
---
 hw/virtio/trace-events           |  2 +
 hw/virtio/virtio-iommu.c         | 92 ++++++++++++++++++++++++++++++++
 include/hw/virtio/virtio-iommu.h |  2 +
 3 files changed, 96 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index c7276116e7..b32169d56c 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -65,3 +65,5 @@ virtio_iommu_attach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
 virtio_iommu_detach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
 virtio_iommu_map(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end, uint64_t phys_start, uint32_t flags) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64 " phys_start=0x%"PRIx64" flags=%d"
 virtio_iommu_unmap(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
+virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
+virtio_iommu_init_iommu_mr(char *iommu_mr) "init %s"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index afd6397ac9..2d7b1752b7 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -23,6 +23,8 @@
 #include "hw/qdev-properties.h"
 #include "hw/virtio/virtio.h"
 #include "sysemu/kvm.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
 #include "trace.h"
 
 #include "standard-headers/linux/virtio_ids.h"
@@ -34,6 +36,50 @@
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
+{
+    return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
+}
+
+static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
+                                              int devfn)
+{
+    VirtIOIOMMU *s = opaque;
+    IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, bus);
+    static uint32_t mr_index;
+    IOMMUDevice *sdev;
+
+    if (!sbus) {
+        sbus = g_malloc0(sizeof(IOMMUPciBus) +
+                         sizeof(IOMMUDevice *) * IOMMU_PCI_DEVFN_MAX);
+        sbus->bus = bus;
+        g_hash_table_insert(s->as_by_busptr, bus, sbus);
+    }
+
+    sdev = sbus->pbdev[devfn];
+    if (!sdev) {
+        char *name = g_strdup_printf("%s-%d-%d",
+                                     TYPE_VIRTIO_IOMMU_MEMORY_REGION,
+                                     mr_index++, devfn);
+        sdev = sbus->pbdev[devfn] = g_malloc0(sizeof(IOMMUDevice));
+
+        sdev->viommu = s;
+        sdev->bus = bus;
+        sdev->devfn = devfn;
+
+        trace_virtio_iommu_init_iommu_mr(name);
+
+        memory_region_init_iommu(&sdev->iommu_mr, sizeof(sdev->iommu_mr),
+                                 TYPE_VIRTIO_IOMMU_MEMORY_REGION,
+                                 OBJECT(s), name,
+                                 UINT64_MAX);
+        address_space_init(&sdev->as,
+                           MEMORY_REGION(&sdev->iommu_mr), TYPE_VIRTIO_IOMMU);
+        g_free(name);
+    }
+    return &sdev->as;
+}
+
 static int virtio_iommu_attach(VirtIOIOMMU *s,
                                struct virtio_iommu_req_attach *req)
 {
@@ -172,6 +218,27 @@ out:
     }
 }
 
+static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
+                                            IOMMUAccessFlags flag,
+                                            int iommu_idx)
+{
+    IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+    uint32_t sid;
+
+    IOMMUTLBEntry entry = {
+        .target_as = &address_space_memory,
+        .iova = addr,
+        .translated_addr = addr,
+        .addr_mask = ~(hwaddr)0,
+        .perm = IOMMU_NONE,
+    };
+
+    sid = virtio_iommu_get_sid(sdev);
+
+    trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
+    return entry;
+}
+
 static void virtio_iommu_get_config(VirtIODevice *vdev, uint8_t *config_data)
 {
     VirtIOIOMMU *dev = VIRTIO_IOMMU(vdev);
@@ -252,6 +319,15 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
 
     qemu_mutex_init(&s->mutex);
+
+    memset(s->as_by_bus_num, 0, sizeof(s->as_by_bus_num));
+    s->as_by_busptr = g_hash_table_new_full(NULL, NULL, NULL, g_free);
+
+    if (s->primary_bus) {
+        pci_setup_iommu(s->primary_bus, virtio_iommu_find_add_as, s);
+    } else {
+        error_setg(errp, "VIRTIO-IOMMU is not attached to any PCI bus!");
+    }
 }
 
 static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
@@ -310,6 +386,14 @@ static void virtio_iommu_class_init(ObjectClass *klass, void *data)
     vdc->vmsd = &vmstate_virtio_iommu_device;
 }
 
+static void virtio_iommu_memory_region_class_init(ObjectClass *klass,
+                                                  void *data)
+{
+    IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_CLASS(klass);
+
+    imrc->translate = virtio_iommu_translate;
+}
+
 static const TypeInfo virtio_iommu_info = {
     .name = TYPE_VIRTIO_IOMMU,
     .parent = TYPE_VIRTIO_DEVICE,
@@ -318,9 +402,17 @@ static const TypeInfo virtio_iommu_info = {
     .class_init = virtio_iommu_class_init,
 };
 
+static const TypeInfo virtio_iommu_memory_region_info = {
+    .parent = TYPE_IOMMU_MEMORY_REGION,
+    .name = TYPE_VIRTIO_IOMMU_MEMORY_REGION,
+    .class_init = virtio_iommu_memory_region_class_init,
+};
+
+
 static void virtio_register_types(void)
 {
     type_register_static(&virtio_iommu_info);
+    type_register_static(&virtio_iommu_memory_region_info);
 }
 
 type_init(virtio_register_types)
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 4d47b6abeb..f55f48d304 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -28,6 +28,8 @@
 #define VIRTIO_IOMMU(obj) \
         OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
 
+#define TYPE_VIRTIO_IOMMU_MEMORY_REGION "virtio-iommu-memory-region"
+
 #define IOMMU_PCI_BUS_MAX      256
 #define IOMMU_PCI_DEVFN_MAX    256
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 05/20] virtio-iommu: Endpoint and domains structs and helpers
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (3 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 04/20] virtio-iommu: Add the iommu regions Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:37   ` Jean-Philippe Brucker
  2019-11-22 18:29 ` [PATCH for-5.0 v11 06/20] virtio-iommu: Implement attach/detach command Eric Auger
                   ` (16 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This patch introduce domain and endpoint internal
datatypes. Both are stored in RB trees. The domain
owns a list of endpoints attached to it.

Helpers to get/put end points and domains are introduced.
get() helpers will become static in subsequent patches.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- fixed interval_cmp (<= -> < and >= -> >)
- removed unused viommu field from endpoint
- removed Bharat's R-b

v9 -> v10:
- added Bharat's R-b

v6 -> v7:
- on virtio_iommu_find_add_as the bus number computation may
  not be finalized yet so we cannot register the EPs at that time.
  Hence, let's remove the get_endpoint and also do not use the
  bus number for building the memory region name string (only
  used for debug though).

v4 -> v5:
- initialize as->endpoint_list

v3 -> v4:
- new separate patch
---
 hw/virtio/trace-events   |   4 ++
 hw/virtio/virtio-iommu.c | 117 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 121 insertions(+)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index b32169d56c..a373bdebb3 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -67,3 +67,7 @@ virtio_iommu_map(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end, uin
 virtio_iommu_unmap(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
 virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
 virtio_iommu_init_iommu_mr(char *iommu_mr) "init %s"
+virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
+virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
+virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
+virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 2d7b1752b7..235bde2203 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -32,15 +32,116 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "hw/virtio/virtio-iommu.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
 
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
 
+typedef struct viommu_domain {
+    uint32_t id;
+    GTree *mappings;
+    QLIST_HEAD(, viommu_endpoint) endpoint_list;
+} viommu_domain;
+
+typedef struct viommu_endpoint {
+    uint32_t id;
+    viommu_domain *domain;
+    QLIST_ENTRY(viommu_endpoint) next;
+} viommu_endpoint;
+
+typedef struct viommu_interval {
+    uint64_t low;
+    uint64_t high;
+} viommu_interval;
+
 static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
 {
     return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
 }
 
+static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+    viommu_interval *inta = (viommu_interval *)a;
+    viommu_interval *intb = (viommu_interval *)b;
+
+    if (inta->high < intb->low) {
+        return -1;
+    } else if (intb->high < inta->low) {
+        return 1;
+    } else {
+        return 0;
+    }
+}
+
+static void virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
+{
+    QLIST_REMOVE(ep, next);
+    ep->domain = NULL;
+}
+
+viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id);
+viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id)
+{
+    viommu_endpoint *ep;
+
+    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(ep_id));
+    if (ep) {
+        return ep;
+    }
+    ep = g_malloc0(sizeof(*ep));
+    ep->id = ep_id;
+    trace_virtio_iommu_get_endpoint(ep_id);
+    g_tree_insert(s->endpoints, GUINT_TO_POINTER(ep_id), ep);
+    return ep;
+}
+
+static void virtio_iommu_put_endpoint(gpointer data)
+{
+    viommu_endpoint *ep = (viommu_endpoint *)data;
+
+    if (ep->domain) {
+        virtio_iommu_detach_endpoint_from_domain(ep);
+        g_tree_unref(ep->domain->mappings);
+    }
+
+    trace_virtio_iommu_put_endpoint(ep->id);
+    g_free(ep);
+}
+
+viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id);
+viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id)
+{
+    viommu_domain *domain;
+
+    domain = g_tree_lookup(s->domains, GUINT_TO_POINTER(domain_id));
+    if (domain) {
+        return domain;
+    }
+    domain = g_malloc0(sizeof(*domain));
+    domain->id = domain_id;
+    domain->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
+                                   NULL, (GDestroyNotify)g_free,
+                                   (GDestroyNotify)g_free);
+    g_tree_insert(s->domains, GUINT_TO_POINTER(domain_id), domain);
+    QLIST_INIT(&domain->endpoint_list);
+    trace_virtio_iommu_get_domain(domain_id);
+    return domain;
+}
+
+static void virtio_iommu_put_domain(gpointer data)
+{
+    viommu_domain *domain = (viommu_domain *)data;
+    viommu_endpoint *iter, *tmp;
+
+    QLIST_FOREACH_SAFE(iter, &domain->endpoint_list, next, tmp) {
+        virtio_iommu_detach_endpoint_from_domain(iter);
+    }
+    g_tree_destroy(domain->mappings);
+    trace_virtio_iommu_put_domain(domain->id);
+    g_free(domain);
+}
+
 static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
                                               int devfn)
 {
@@ -293,6 +394,13 @@ static const VMStateDescription vmstate_virtio_iommu_device = {
     .unmigratable = 1,
 };
 
+static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
+{
+    uint ua = GPOINTER_TO_UINT(a);
+    uint ub = GPOINTER_TO_UINT(b);
+    return (ua > ub) - (ua < ub);
+}
+
 static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
@@ -328,11 +436,20 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
     } else {
         error_setg(errp, "VIRTIO-IOMMU is not attached to any PCI bus!");
     }
+
+    s->domains = g_tree_new_full((GCompareDataFunc)int_cmp,
+                                 NULL, NULL, virtio_iommu_put_domain);
+    s->endpoints = g_tree_new_full((GCompareDataFunc)int_cmp,
+                                   NULL, NULL, virtio_iommu_put_endpoint);
 }
 
 static void virtio_iommu_device_unrealize(DeviceState *dev, Error **errp)
 {
     VirtIODevice *vdev = VIRTIO_DEVICE(dev);
+    VirtIOIOMMU *s = VIRTIO_IOMMU(dev);
+
+    g_tree_destroy(s->domains);
+    g_tree_destroy(s->endpoints);
 
     virtio_cleanup(vdev);
 }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 06/20] virtio-iommu: Implement attach/detach command
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (4 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 05/20] virtio-iommu: Endpoint and domains structs and helpers Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:41   ` Jean-Philippe Brucker
  2019-11-22 18:29 ` [PATCH for-5.0 v11 07/20] virtio-iommu: Implement map/unmap Eric Auger
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This patch implements the endpoint attach/detach to/from
a domain.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
---
 hw/virtio/virtio-iommu.c | 43 ++++++++++++++++++++++++++++++++--------
 1 file changed, 35 insertions(+), 8 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 235bde2203..138d5b2a9c 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -77,11 +77,12 @@ static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
 static void virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
 {
     QLIST_REMOVE(ep, next);
+    g_tree_unref(ep->domain->mappings);
     ep->domain = NULL;
 }
 
-viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id);
-viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id)
+static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
+                                                  uint32_t ep_id)
 {
     viommu_endpoint *ep;
 
@@ -102,15 +103,14 @@ static void virtio_iommu_put_endpoint(gpointer data)
 
     if (ep->domain) {
         virtio_iommu_detach_endpoint_from_domain(ep);
-        g_tree_unref(ep->domain->mappings);
     }
 
     trace_virtio_iommu_put_endpoint(ep->id);
     g_free(ep);
 }
 
-viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id);
-viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id)
+static viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s,
+                                              uint32_t domain_id)
 {
     viommu_domain *domain;
 
@@ -137,7 +137,6 @@ static void virtio_iommu_put_domain(gpointer data)
     QLIST_FOREACH_SAFE(iter, &domain->endpoint_list, next, tmp) {
         virtio_iommu_detach_endpoint_from_domain(iter);
     }
-    g_tree_destroy(domain->mappings);
     trace_virtio_iommu_put_domain(domain->id);
     g_free(domain);
 }
@@ -186,10 +185,27 @@ static int virtio_iommu_attach(VirtIOIOMMU *s,
 {
     uint32_t domain_id = le32_to_cpu(req->domain);
     uint32_t ep_id = le32_to_cpu(req->endpoint);
+    viommu_domain *domain;
+    viommu_endpoint *ep;
 
     trace_virtio_iommu_attach(domain_id, ep_id);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    ep = virtio_iommu_get_endpoint(s, ep_id);
+    if (ep->domain) {
+        /*
+         * the device is already attached to a domain,
+         * detach it first
+         */
+        virtio_iommu_detach_endpoint_from_domain(ep);
+    }
+
+    domain = virtio_iommu_get_domain(s, domain_id);
+    QLIST_INSERT_HEAD(&domain->endpoint_list, ep, next);
+
+    ep->domain = domain;
+    g_tree_ref(domain->mappings);
+
+    return VIRTIO_IOMMU_S_OK;
 }
 
 static int virtio_iommu_detach(VirtIOIOMMU *s,
@@ -197,10 +213,21 @@ static int virtio_iommu_detach(VirtIOIOMMU *s,
 {
     uint32_t domain_id = le32_to_cpu(req->domain);
     uint32_t ep_id = le32_to_cpu(req->endpoint);
+    viommu_endpoint *ep;
 
     trace_virtio_iommu_detach(domain_id, ep_id);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(ep_id));
+    if (!ep) {
+        return VIRTIO_IOMMU_S_NOENT;
+    }
+
+    if (!ep->domain) {
+        return VIRTIO_IOMMU_S_INVAL;
+    }
+
+    virtio_iommu_detach_endpoint_from_domain(ep);
+    return VIRTIO_IOMMU_S_OK;
 }
 
 static int virtio_iommu_map(VirtIOIOMMU *s,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 07/20] virtio-iommu: Implement map/unmap
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (5 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 06/20] virtio-iommu: Implement attach/detach command Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:43   ` Jean-Philippe Brucker
  2019-11-22 18:29 ` [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate Eric Auger
                   ` (14 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This patch implements virtio_iommu_map/unmap.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- revisit the implementation of unmap according to Peter's suggestion
- removed virt_addr and size from viommu_mapping struct
- use g_tree_lookup_extended()
- return VIRTIO_IOMMU_S_RANGE in case a mapping were
  to be split on unmap (instead of INVAL)

v5 -> v6:
- use new v0.6 fields
- replace error_report by qemu_log_mask

v3 -> v4:
- implement unmap semantics as specified in v0.4
---
 hw/virtio/trace-events   |  1 +
 hw/virtio/virtio-iommu.c | 65 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 64 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index a373bdebb3..f25359cee2 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -65,6 +65,7 @@ virtio_iommu_attach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
 virtio_iommu_detach(uint32_t domain_id, uint32_t ep_id) "domain=%d endpoint=%d"
 virtio_iommu_map(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end, uint64_t phys_start, uint32_t flags) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64 " phys_start=0x%"PRIx64" flags=%d"
 virtio_iommu_unmap(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
+virtio_iommu_unmap_done(uint32_t domain_id, uint64_t virt_start, uint64_t virt_end) "domain=%d virt_start=0x%"PRIx64" virt_end=0x%"PRIx64
 virtio_iommu_translate(const char *name, uint32_t rid, uint64_t iova, int flag) "mr=%s rid=%d addr=0x%"PRIx64" flag=%d"
 virtio_iommu_init_iommu_mr(char *iommu_mr) "init %s"
 virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 138d5b2a9c..f0a56833a2 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -18,6 +18,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/log.h"
 #include "qemu/iov.h"
 #include "qemu-common.h"
 #include "hw/qdev-properties.h"
@@ -55,6 +56,11 @@ typedef struct viommu_interval {
     uint64_t high;
 } viommu_interval;
 
+typedef struct viommu_mapping {
+    uint64_t phys_addr;
+    uint32_t flags;
+} viommu_mapping;
+
 static inline uint16_t virtio_iommu_get_sid(IOMMUDevice *dev)
 {
     return PCI_BUILD_BDF(pci_bus_num(dev->bus), dev->devfn);
@@ -238,10 +244,35 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
     uint64_t virt_start = le64_to_cpu(req->virt_start);
     uint64_t virt_end = le64_to_cpu(req->virt_end);
     uint32_t flags = le32_to_cpu(req->flags);
+    viommu_domain *domain;
+    viommu_interval *interval;
+    viommu_mapping *mapping;
+
+    interval = g_malloc0(sizeof(*interval));
+
+    interval->low = virt_start;
+    interval->high = virt_end;
+
+    domain = g_tree_lookup(s->domains, GUINT_TO_POINTER(domain_id));
+    if (!domain) {
+        return VIRTIO_IOMMU_S_NOENT;
+    }
+
+    mapping = g_tree_lookup(domain->mappings, (gpointer)interval);
+    if (mapping) {
+        g_free(interval);
+        return VIRTIO_IOMMU_S_INVAL;
+    }
 
     trace_virtio_iommu_map(domain_id, virt_start, virt_end, phys_start, flags);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    mapping = g_malloc0(sizeof(*mapping));
+    mapping->phys_addr = phys_start;
+    mapping->flags = flags;
+
+    g_tree_insert(domain->mappings, interval, mapping);
+
+    return VIRTIO_IOMMU_S_OK;
 }
 
 static int virtio_iommu_unmap(VirtIOIOMMU *s,
@@ -250,10 +281,40 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
     uint32_t domain_id = le32_to_cpu(req->domain);
     uint64_t virt_start = le64_to_cpu(req->virt_start);
     uint64_t virt_end = le64_to_cpu(req->virt_end);
+    viommu_mapping *iter_val;
+    viommu_interval interval, *iter_key;
+    viommu_domain *domain;
+    int ret = VIRTIO_IOMMU_S_OK;
 
     trace_virtio_iommu_unmap(domain_id, virt_start, virt_end);
 
-    return VIRTIO_IOMMU_S_UNSUPP;
+    domain = g_tree_lookup(s->domains, GUINT_TO_POINTER(domain_id));
+    if (!domain) {
+        qemu_log_mask(LOG_GUEST_ERROR, "%s: no domain\n", __func__);
+        return VIRTIO_IOMMU_S_NOENT;
+    }
+    interval.low = virt_start;
+    interval.high = virt_end;
+
+    while (g_tree_lookup_extended(domain->mappings, &interval,
+                                  (void **)&iter_key, (void**)&iter_val)) {
+        uint64_t current_low = iter_key->low;
+        uint64_t current_high = iter_key->high;
+
+        if (interval.low <= current_low && interval.high >= current_high) {
+            g_tree_remove(domain->mappings, iter_key);
+            trace_virtio_iommu_unmap_done(domain_id, current_low, current_high);
+        } else {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                "%s: domain= %d Unmap [0x%"PRIx64",0x%"PRIx64"] forbidden as "
+                "it would split existing mapping [0x%"PRIx64", 0x%"PRIx64"]\n",
+                __func__, domain_id, interval.low, interval.high,
+                current_low, current_high);
+            ret = VIRTIO_IOMMU_S_RANGE;
+            break;
+        }
+    }
+    return ret;
 }
 
 static int virtio_iommu_iov_to_req(struct iovec *iov,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (6 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 07/20] virtio-iommu: Implement map/unmap Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:43   ` Jean-Philippe Brucker
  2019-12-10 19:33   ` Peter Xu
  2019-11-22 18:29 ` [PATCH for-5.0 v11 09/20] virtio-iommu: Implement fault reporting Eric Auger
                   ` (13 subsequent siblings)
  21 siblings, 2 replies; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This patch implements the translate callback

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- take into account the new value struct and use
  g_tree_lookup_extended
- switched to error_report_once

v6 -> v7:
- implemented bypass-mode

v5 -> v6:
- replace error_report by qemu_log_mask

v4 -> v5:
- check the device domain is not NULL
- s/printf/error_report
- set flags to IOMMU_NONE in case of all translation faults
---
 hw/virtio/trace-events   |  1 +
 hw/virtio/virtio-iommu.c | 63 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 63 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index f25359cee2..de7cbb3c8f 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -72,3 +72,4 @@ virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
 virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
 virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
 virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
+virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index f0a56833a2..a83666557b 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -412,19 +412,80 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
                                             int iommu_idx)
 {
     IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
+    viommu_interval interval, *mapping_key;
+    viommu_mapping *mapping_value;
+    VirtIOIOMMU *s = sdev->viommu;
+    viommu_endpoint *ep;
+    bool bypass_allowed;
     uint32_t sid;
+    bool found;
+
+    interval.low = addr;
+    interval.high = addr + 1;
 
     IOMMUTLBEntry entry = {
         .target_as = &address_space_memory,
         .iova = addr,
         .translated_addr = addr,
-        .addr_mask = ~(hwaddr)0,
+        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
         .perm = IOMMU_NONE,
     };
 
+    bypass_allowed = virtio_has_feature(s->acked_features,
+                                        VIRTIO_IOMMU_F_BYPASS);
+
     sid = virtio_iommu_get_sid(sdev);
 
     trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
+    qemu_mutex_lock(&s->mutex);
+
+    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
+    if (!ep) {
+        if (!bypass_allowed) {
+            error_report_once("%s sid=%d is not known!!", __func__, sid);
+        } else {
+            entry.perm = flag;
+        }
+        goto unlock;
+    }
+
+    if (!ep->domain) {
+        if (!bypass_allowed) {
+            qemu_log_mask(LOG_GUEST_ERROR,
+                          "%s %02x:%02x.%01x not attached to any domain\n",
+                          __func__, PCI_BUS_NUM(sid),
+                          PCI_SLOT(sid), PCI_FUNC(sid));
+        } else {
+            entry.perm = flag;
+        }
+        goto unlock;
+    }
+
+    found = g_tree_lookup_extended(ep->domain->mappings, (gpointer)(&interval),
+                                   (void **)&mapping_key,
+                                   (void **)&mapping_value);
+    if (!found) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "%s no mapping for 0x%"PRIx64" for sid=%d\n",
+                      __func__, addr, sid);
+        goto unlock;
+    }
+
+    if (((flag & IOMMU_RO) &&
+            !(mapping_value->flags & VIRTIO_IOMMU_MAP_F_READ)) ||
+        ((flag & IOMMU_WO) &&
+            !(mapping_value->flags & VIRTIO_IOMMU_MAP_F_WRITE))) {
+        qemu_log_mask(LOG_GUEST_ERROR,
+                      "Permission error on 0x%"PRIx64"(%d): allowed=%d\n",
+                      addr, flag, mapping_value->flags);
+        goto unlock;
+    }
+    entry.translated_addr = addr - mapping_key->low + mapping_value->phys_addr;
+    entry.perm = flag;
+    trace_virtio_iommu_translate_out(addr, entry.translated_addr, sid);
+
+unlock:
+    qemu_mutex_unlock(&s->mutex);
     return entry;
 }
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 09/20] virtio-iommu: Implement fault reporting
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (7 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:44   ` Jean-Philippe Brucker
  2019-11-22 18:29 ` [PATCH for-5.0 v11 10/20] virtio-iommu-pci: Add virtio iommu pci support Eric Auger
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

The event queue allows to report asynchronous errors.
The translate function now injects faults when relevant.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- change a virtio_error into an error_report_once
  (no buffer available for output faults)
---
 hw/virtio/trace-events   |  1 +
 hw/virtio/virtio-iommu.c | 69 +++++++++++++++++++++++++++++++++++++---
 2 files changed, 65 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index de7cbb3c8f..a572eb71aa 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -73,3 +73,4 @@ virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
 virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
 virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
 virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
+virtio_iommu_report_fault(uint8_t reason, uint32_t flags, uint32_t endpoint, uint64_t addr) "FAULT reason=%d flags=%d endpoint=%d address =0x%"PRIx64
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index a83666557b..723616a5db 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -407,6 +407,51 @@ out:
     }
 }
 
+static void virtio_iommu_report_fault(VirtIOIOMMU *viommu, uint8_t reason,
+                                      uint32_t flags, uint32_t endpoint,
+                                      uint64_t address)
+{
+    VirtIODevice *vdev = &viommu->parent_obj;
+    VirtQueue *vq = viommu->event_vq;
+    struct virtio_iommu_fault fault;
+    VirtQueueElement *elem;
+    size_t sz;
+
+    memset(&fault, 0, sizeof(fault));
+    fault.reason = reason;
+    fault.flags = flags;
+    fault.endpoint = endpoint;
+    fault.address = address;
+
+    for (;;) {
+        elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
+
+        if (!elem) {
+            error_report_once(
+                "no buffer available in event queue to report event");
+            return;
+        }
+
+        if (iov_size(elem->in_sg, elem->in_num) < sizeof(fault)) {
+            virtio_error(vdev, "error buffer of wrong size");
+            virtqueue_detach_element(vq, elem, 0);
+            g_free(elem);
+            continue;
+        }
+        break;
+    }
+    /* we have a buffer to fill in */
+    sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
+                      &fault, sizeof(fault));
+    assert(sz == sizeof(fault));
+
+    trace_virtio_iommu_report_fault(reason, flags, endpoint, address);
+    virtqueue_push(vq, elem, sz);
+    virtio_notify(vdev, vq);
+    g_free(elem);
+
+}
+
 static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
                                             IOMMUAccessFlags flag,
                                             int iommu_idx)
@@ -415,9 +460,10 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     viommu_interval interval, *mapping_key;
     viommu_mapping *mapping_value;
     VirtIOIOMMU *s = sdev->viommu;
+    bool read_fault, write_fault;
     viommu_endpoint *ep;
+    uint32_t sid, flags;
     bool bypass_allowed;
-    uint32_t sid;
     bool found;
 
     interval.low = addr;
@@ -443,6 +489,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     if (!ep) {
         if (!bypass_allowed) {
             error_report_once("%s sid=%d is not known!!", __func__, sid);
+            virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_UNKNOWN,
+                                      0, sid, 0);
         } else {
             entry.perm = flag;
         }
@@ -455,6 +503,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
                           "%s %02x:%02x.%01x not attached to any domain\n",
                           __func__, PCI_BUS_NUM(sid),
                           PCI_SLOT(sid), PCI_FUNC(sid));
+            virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_DOMAIN,
+                                      0, sid, 0);
         } else {
             entry.perm = flag;
         }
@@ -468,16 +518,25 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         qemu_log_mask(LOG_GUEST_ERROR,
                       "%s no mapping for 0x%"PRIx64" for sid=%d\n",
                       __func__, addr, sid);
+        virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_MAPPING,
+                                  0, sid, addr);
         goto unlock;
     }
 
-    if (((flag & IOMMU_RO) &&
-            !(mapping_value->flags & VIRTIO_IOMMU_MAP_F_READ)) ||
-        ((flag & IOMMU_WO) &&
-            !(mapping_value->flags & VIRTIO_IOMMU_MAP_F_WRITE))) {
+    read_fault = (flag & IOMMU_RO) &&
+                    !(mapping_value->flags & VIRTIO_IOMMU_MAP_F_READ);
+    write_fault = (flag & IOMMU_WO) &&
+                    !(mapping_value->flags & VIRTIO_IOMMU_MAP_F_WRITE);
+
+    flags = read_fault ? VIRTIO_IOMMU_FAULT_F_READ : 0;
+    flags |= write_fault ? VIRTIO_IOMMU_FAULT_F_WRITE : 0;
+    if (flags) {
         qemu_log_mask(LOG_GUEST_ERROR,
                       "Permission error on 0x%"PRIx64"(%d): allowed=%d\n",
                       addr, flag, mapping_value->flags);
+        flags |= VIRTIO_IOMMU_FAULT_F_ADDRESS;
+        virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_MAPPING,
+                                  flags, sid, addr);
         goto unlock;
     }
     entry.translated_addr = addr - mapping_key->low + mapping_value->phys_addr;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 10/20] virtio-iommu-pci: Add virtio iommu pci support
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (8 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 09/20] virtio-iommu: Implement fault reporting Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:44   ` Jean-Philippe Brucker
  2019-11-22 18:29 ` [PATCH for-5.0 v11 11/20] hw/arm/virt: Add the virtio-iommu device tree mappings Eric Auger
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This patch adds virtio-iommu-pci, which is the pci proxy for
the virtio-iommu device.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- add the reserved_regions array property

v9 -> v10:
- include "hw/qdev-properties.h" header

v8 -> v9:
- add the msi-bypass property
- create virtio-iommu-pci.c
---
 hw/virtio/Makefile.objs          |  1 +
 hw/virtio/virtio-iommu-pci.c     | 91 ++++++++++++++++++++++++++++++++
 include/hw/pci/pci.h             |  1 +
 include/hw/virtio/virtio-iommu.h |  1 +
 qdev-monitor.c                   |  1 +
 5 files changed, 95 insertions(+)
 create mode 100644 hw/virtio/virtio-iommu-pci.c

diff --git a/hw/virtio/Makefile.objs b/hw/virtio/Makefile.objs
index f68ac14a90..33e6bc591a 100644
--- a/hw/virtio/Makefile.objs
+++ b/hw/virtio/Makefile.objs
@@ -29,6 +29,7 @@ obj-$(CONFIG_VIRTIO_INPUT_HOST) += virtio-input-host-pci.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio-input-pci.o
 obj-$(CONFIG_VIRTIO_RNG) += virtio-rng-pci.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio-balloon-pci.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu-pci.o
 obj-$(CONFIG_VIRTIO_9P) += virtio-9p-pci.o
 obj-$(CONFIG_VIRTIO_SCSI) += virtio-scsi-pci.o
 obj-$(CONFIG_VIRTIO_BLK) += virtio-blk-pci.o
diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
new file mode 100644
index 0000000000..280230b31e
--- /dev/null
+++ b/hw/virtio/virtio-iommu-pci.c
@@ -0,0 +1,91 @@
+/*
+ * Virtio IOMMU PCI Bindings
+ *
+ * Copyright (c) 2019 Red Hat, Inc.
+ * Written by Eric Auger
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License version 2 or
+ *  (at your option) any later version.
+ */
+
+#include "qemu/osdep.h"
+
+#include "virtio-pci.h"
+#include "hw/virtio/virtio-iommu.h"
+#include "hw/qdev-properties.h"
+
+typedef struct VirtIOIOMMUPCI VirtIOIOMMUPCI;
+
+/*
+ * virtio-iommu-pci: This extends VirtioPCIProxy.
+ *
+ */
+#define VIRTIO_IOMMU_PCI(obj) \
+        OBJECT_CHECK(VirtIOIOMMUPCI, (obj), TYPE_VIRTIO_IOMMU_PCI)
+
+struct VirtIOIOMMUPCI {
+    VirtIOPCIProxy parent_obj;
+    VirtIOIOMMU vdev;
+};
+
+static Property virtio_iommu_pci_properties[] = {
+    DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
+    DEFINE_PROP_ARRAY("reserved-regions", VirtIOIOMMUPCI,
+                      vdev.nb_reserved_regions, vdev.reserved_regions,
+                      qdev_prop_interval, Interval),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static void virtio_iommu_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(vpci_dev);
+    DeviceState *vdev = DEVICE(&dev->vdev);
+
+    qdev_set_parent_bus(vdev, BUS(&vpci_dev->bus));
+    object_property_set_link(OBJECT(dev),
+                             OBJECT(pci_get_bus(&vpci_dev->pci_dev)),
+                             "primary-bus", errp);
+    object_property_set_bool(OBJECT(vdev), true, "realized", errp);
+}
+
+static void virtio_iommu_pci_class_init(ObjectClass *klass, void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+    PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+    k->realize = virtio_iommu_pci_realize;
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    dc->props = virtio_iommu_pci_properties;
+    pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+    pcidev_k->device_id = PCI_DEVICE_ID_VIRTIO_IOMMU;
+    pcidev_k->revision = VIRTIO_PCI_ABI_VERSION;
+    pcidev_k->class_id = PCI_CLASS_OTHERS;
+}
+
+static void virtio_iommu_pci_instance_init(Object *obj)
+{
+    VirtIOIOMMUPCI *dev = VIRTIO_IOMMU_PCI(obj);
+
+    virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+                                TYPE_VIRTIO_IOMMU);
+}
+
+static const VirtioPCIDeviceTypeInfo virtio_iommu_pci_info = {
+    .base_name             = TYPE_VIRTIO_IOMMU_PCI,
+    .generic_name          = "virtio-iommu-pci",
+    .transitional_name     = "virtio-iommu-pci-transitional",
+    .non_transitional_name = "virtio-iommu-pci-non-transitional",
+    .instance_size = sizeof(VirtIOIOMMUPCI),
+    .instance_init = virtio_iommu_pci_instance_init,
+    .class_init    = virtio_iommu_pci_class_init,
+};
+
+static void virtio_iommu_pci_register(void)
+{
+    virtio_pci_types_register(&virtio_iommu_pci_info);
+}
+
+type_init(virtio_iommu_pci_register)
+
+
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index db75c6dfd0..d7715c826a 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -86,6 +86,7 @@ extern bool pci_available;
 #define PCI_DEVICE_ID_VIRTIO_9P          0x1009
 #define PCI_DEVICE_ID_VIRTIO_VSOCK       0x1012
 #define PCI_DEVICE_ID_VIRTIO_PMEM        0x1013
+#define PCI_DEVICE_ID_VIRTIO_IOMMU       0x1014
 
 #define PCI_VENDOR_ID_REDHAT             0x1b36
 #define PCI_DEVICE_ID_REDHAT_BRIDGE      0x0001
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index f55f48d304..1ab6993d29 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -25,6 +25,7 @@
 #include "hw/pci/pci.h"
 
 #define TYPE_VIRTIO_IOMMU "virtio-iommu-device"
+#define TYPE_VIRTIO_IOMMU_PCI "virtio-iommu-device-base"
 #define VIRTIO_IOMMU(obj) \
         OBJECT_CHECK(VirtIOIOMMU, (obj), TYPE_VIRTIO_IOMMU)
 
diff --git a/qdev-monitor.c b/qdev-monitor.c
index e6b112eb0a..e61ca62061 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -66,6 +66,7 @@ static const QDevAlias qdev_alias_table[] = {
     { "virtio-input-host-ccw", "virtio-input-host", QEMU_ARCH_S390X },
     { "virtio-input-host-pci", "virtio-input-host",
             QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
+    { "virtio-iommu-pci", "virtio-iommu", QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
     { "virtio-keyboard-ccw", "virtio-keyboard", QEMU_ARCH_S390X },
     { "virtio-keyboard-pci", "virtio-keyboard",
             QEMU_ARCH_ALL & ~QEMU_ARCH_S390X },
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 11/20] hw/arm/virt: Add the virtio-iommu device tree mappings
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (9 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 10/20] virtio-iommu-pci: Add virtio iommu pci support Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:45   ` Jean-Philippe Brucker
  2019-11-22 18:29 ` [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL Eric Auger
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

Adds the "virtio,pci-iommu" node in the host bridge node and
the RID mapping, excluding the IOMMU RID.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- remove msi_bypass

v8 -> v9:
- disable msi-bypass property
- addition of the subnode is handled is the hotplug handler
  and IOMMU RID is notimposed anymore

v6 -> v7:
- align to the smmu instantiation code

v4 -> v5:
- VirtMachineClass no_iommu added in this patch
- Use object_resolve_path_type
---
 hw/arm/virt.c                | 53 +++++++++++++++++++++++++++++++-----
 hw/virtio/virtio-iommu-pci.c |  3 --
 include/hw/arm/virt.h        |  2 ++
 3 files changed, 48 insertions(+), 10 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index d4bedc2607..cb6a95e7c8 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -32,6 +32,7 @@
 #include "qemu-common.h"
 #include "qemu/units.h"
 #include "qemu/option.h"
+#include "monitor/qdev.h"
 #include "qapi/error.h"
 #include "hw/sysbus.h"
 #include "hw/boards.h"
@@ -54,6 +55,7 @@
 #include "qemu/error-report.h"
 #include "qemu/module.h"
 #include "hw/pci-host/gpex.h"
+#include "hw/virtio/virtio-pci.h"
 #include "hw/arm/sysbus-fdt.h"
 #include "hw/platform-bus.h"
 #include "hw/qdev-properties.h"
@@ -71,6 +73,7 @@
 #include "hw/mem/pc-dimm.h"
 #include "hw/mem/nvdimm.h"
 #include "hw/acpi/generic_event_device.h"
+#include "hw/virtio/virtio-iommu.h"
 
 #define DEFINE_VIRT_MACHINE_LATEST(major, minor, latest) \
     static void virt_##major##_##minor##_class_init(ObjectClass *oc, \
@@ -1181,6 +1184,30 @@ static void create_smmu(const VirtMachineState *vms, qemu_irq *pic,
     g_free(node);
 }
 
+static void create_virtio_iommu(VirtMachineState *vms, Error **errp)
+{
+    const char compat[] = "virtio,pci-iommu";
+    uint16_t bdf = vms->virtio_iommu_bdf;
+    char *node;
+
+    vms->iommu_phandle = qemu_fdt_alloc_phandle(vms->fdt);
+
+    node = g_strdup_printf("%s/virtio_iommu@%d", vms->pciehb_nodename, bdf);
+    qemu_fdt_add_subnode(vms->fdt, node);
+    qemu_fdt_setprop(vms->fdt, node, "compatible", compat, sizeof(compat));
+    qemu_fdt_setprop_sized_cells(vms->fdt, node, "reg",
+                                 1, bdf << 8, 1, 0, 1, 0,
+                                 1, 0, 1, 0);
+
+    qemu_fdt_setprop_cell(vms->fdt, node, "#iommu-cells", 1);
+    qemu_fdt_setprop_cell(vms->fdt, node, "phandle", vms->iommu_phandle);
+    g_free(node);
+
+    qemu_fdt_setprop_cells(vms->fdt, vms->pciehb_nodename, "iommu-map",
+                           0x0, vms->iommu_phandle, 0x0, bdf,
+                           bdf + 1, vms->iommu_phandle, bdf + 1, 0xffff - bdf);
+}
+
 static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
 {
     hwaddr base_mmio = vms->memmap[VIRT_PCIE_MMIO].base;
@@ -1258,7 +1285,7 @@ static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
         }
     }
 
-    nodename = g_strdup_printf("/pcie@%" PRIx64, base);
+    nodename = vms->pciehb_nodename = g_strdup_printf("/pcie@%" PRIx64, base);
     qemu_fdt_add_subnode(vms->fdt, nodename);
     qemu_fdt_setprop_string(vms->fdt, nodename,
                             "compatible", "pci-host-ecam-generic");
@@ -1301,13 +1328,17 @@ static void create_pcie(VirtMachineState *vms, qemu_irq *pic)
     if (vms->iommu) {
         vms->iommu_phandle = qemu_fdt_alloc_phandle(vms->fdt);
 
-        create_smmu(vms, pic, pci->bus);
+        switch (vms->iommu) {
+        case VIRT_IOMMU_SMMUV3:
+            create_smmu(vms, pic, pci->bus);
+            qemu_fdt_setprop_cells(vms->fdt, nodename, "iommu-map",
+                                   0x0, vms->iommu_phandle, 0x0, 0x10000);
+            break;
+        default:
+            g_assert_not_reached();
+        }
 
-        qemu_fdt_setprop_cells(vms->fdt, nodename, "iommu-map",
-                               0x0, vms->iommu_phandle, 0x0, 0x10000);
     }
-
-    g_free(nodename);
 }
 
 static void create_platform_bus(VirtMachineState *vms, qemu_irq *pic)
@@ -1972,6 +2003,13 @@ static void virt_machine_device_plug_cb(HotplugHandler *hotplug_dev,
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) {
         virt_memory_plug(hotplug_dev, dev, errp);
     }
+    if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
+        PCIDevice *pdev = PCI_DEVICE(dev);
+
+        vms->iommu = VIRT_IOMMU_VIRTIO;
+        vms->virtio_iommu_bdf = pci_get_bdf(pdev);
+        create_virtio_iommu(vms, errp);
+    }
 }
 
 static void virt_machine_device_unplug_request_cb(HotplugHandler *hotplug_dev,
@@ -1985,7 +2023,8 @@ static HotplugHandler *virt_machine_get_hotplug_handler(MachineState *machine,
                                                         DeviceState *dev)
 {
     if (object_dynamic_cast(OBJECT(dev), TYPE_SYS_BUS_DEVICE) ||
-       (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM))) {
+       (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM)) ||
+       (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI))) {
         return HOTPLUG_HANDLER(machine);
     }
 
diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
index 280230b31e..4cfae1f9df 100644
--- a/hw/virtio/virtio-iommu-pci.c
+++ b/hw/virtio/virtio-iommu-pci.c
@@ -31,9 +31,6 @@ struct VirtIOIOMMUPCI {
 
 static Property virtio_iommu_pci_properties[] = {
     DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
-    DEFINE_PROP_ARRAY("reserved-regions", VirtIOIOMMUPCI,
-                      vdev.nb_reserved_regions, vdev.reserved_regions,
-                      qdev_prop_interval, Interval),
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 0b41083e9d..32fb1142ef 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -124,8 +124,10 @@ typedef struct {
     bool virt;
     int32_t gic_version;
     VirtIOMMUType iommu;
+    uint16_t virtio_iommu_bdf;
     struct arm_boot_info bootinfo;
     MemMapEntry *memmap;
+    char *pciehb_nodename;
     const int *irqmap;
     int smp_cpus;
     void *fdt;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (10 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 11/20] hw/arm/virt: Add the virtio-iommu device tree mappings Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-11-22 19:03   ` Dr. David Alan Gilbert
  2019-12-12 12:17   ` Markus Armbruster
  2019-11-22 18:29 ` [PATCH for-5.0 v11 13/20] virtio-iommu: Implement probe request Eric Auger
                   ` (9 subsequent siblings)
  21 siblings, 2 replies; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

Introduce a new property defining a labelled interval:
<low address>,<high address>,label.

This will be used to encode reserved IOVA regions. The label
is left undefined to ease reuse accross use cases.

For instance, in virtio-iommu use case, reserved IOVA regions
will be passed by the machine code to the virtio-iommu-pci
device (an array of those). The label will match the
virtio_iommu_probe_resv_mem subtype value:
- VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0)
- VIRTIO_IOMMU_RESV_MEM_T_MSI (1)

This is used to inform the virtio-iommu-pci device it should
bypass the MSI region: 0xfee00000, 0xfeefffff, 1.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/core/qdev-properties.c    | 90 ++++++++++++++++++++++++++++++++++++
 include/exec/memory.h        |  6 +++
 include/hw/qdev-properties.h |  3 ++
 include/qemu/typedefs.h      |  1 +
 4 files changed, 100 insertions(+)

diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
index ac28890e5a..8d70f34e37 100644
--- a/hw/core/qdev-properties.c
+++ b/hw/core/qdev-properties.c
@@ -13,6 +13,7 @@
 #include "qapi/visitor.h"
 #include "chardev/char.h"
 #include "qemu/uuid.h"
+#include "qemu/cutils.h"
 
 void qdev_prop_set_after_realize(DeviceState *dev, const char *name,
                                   Error **errp)
@@ -585,6 +586,95 @@ const PropertyInfo qdev_prop_macaddr = {
     .set   = set_mac,
 };
 
+/* --- Labelled Interval --- */
+
+/*
+ * accepted syntax versions:
+ *   <low address>,<high address>,<type>
+ *   where low/high addresses are uint64_t in hexa (feat. 0x prefix)
+ *   and type is an unsigned integer
+ */
+static void get_interval(Object *obj, Visitor *v, const char *name,
+                         void *opaque, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    Property *prop = opaque;
+    Interval *interval = qdev_get_prop_ptr(dev, prop);
+    char buffer[64];
+    char *p = buffer;
+
+    snprintf(buffer, sizeof(buffer), "0x%"PRIx64",0x%"PRIx64",%d",
+             interval->low, interval->high, interval->type);
+
+    visit_type_str(v, name, &p, errp);
+}
+
+static void set_interval(Object *obj, Visitor *v, const char *name,
+                         void *opaque, Error **errp)
+{
+    DeviceState *dev = DEVICE(obj);
+    Property *prop = opaque;
+    Interval *interval = qdev_get_prop_ptr(dev, prop);
+    Error *local_err = NULL;
+    unsigned int type;
+    gchar **fields;
+    uint64_t addr;
+    char *str;
+    int ret;
+
+    if (dev->realized) {
+        qdev_prop_set_after_realize(dev, name, errp);
+        return;
+    }
+
+    visit_type_str(v, name, &str, &local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
+    fields = g_strsplit(str, ",", 3);
+
+    ret = qemu_strtou64(fields[0], NULL, 16, &addr);
+    if (!ret) {
+        interval->low = addr;
+    } else {
+        error_setg(errp, "Failed to decode interval low addr");
+        error_append_hint(errp,
+                          "should be an address in hexa with 0x prefix\n");
+        goto out;
+    }
+
+    ret = qemu_strtou64(fields[1], NULL, 16, &addr);
+    if (!ret) {
+        interval->high = addr;
+    } else {
+        error_setg(errp, "Failed to decode interval high addr");
+        error_append_hint(errp,
+                          "should be an address in hexa with 0x prefix\n");
+        goto out;
+    }
+
+    ret = qemu_strtoui(fields[2], NULL, 10, &type);
+    if (!ret) {
+        interval->type = type;
+    } else {
+        error_setg(errp, "Failed to decode interval type");
+        error_append_hint(errp, "should be an unsigned int in decimal\n");
+    }
+out:
+    g_free(str);
+    g_strfreev(fields);
+    return;
+}
+
+const PropertyInfo qdev_prop_interval = {
+    .name  = "labelled_interval",
+    .description = "Labelled interval, example: 0xFEE00000,0xFEEFFFFF,0",
+    .get   = get_interval,
+    .set   = set_interval,
+};
+
 /* --- on/off/auto --- */
 
 const PropertyInfo qdev_prop_on_off_auto = {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index e499dc215b..e238d1c352 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -57,6 +57,12 @@ struct MemoryRegionMmio {
     CPUWriteMemoryFunc *write[3];
 };
 
+struct Interval {
+    hwaddr low;
+    hwaddr high;
+    unsigned int type;
+};
+
 typedef struct IOMMUTLBEntry IOMMUTLBEntry;
 
 /* See address_space_translate: bit 0 is read, bit 1 is write.  */
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index c6a8cb5516..2ba7c8711b 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -20,6 +20,7 @@ extern const PropertyInfo qdev_prop_chr;
 extern const PropertyInfo qdev_prop_tpm;
 extern const PropertyInfo qdev_prop_ptr;
 extern const PropertyInfo qdev_prop_macaddr;
+extern const PropertyInfo qdev_prop_interval;
 extern const PropertyInfo qdev_prop_on_off_auto;
 extern const PropertyInfo qdev_prop_losttickpolicy;
 extern const PropertyInfo qdev_prop_blockdev_on_error;
@@ -202,6 +203,8 @@ extern const PropertyInfo qdev_prop_pcie_link_width;
     DEFINE_PROP(_n, _s, _f, qdev_prop_drive_iothread, BlockBackend *)
 #define DEFINE_PROP_MACADDR(_n, _s, _f)         \
     DEFINE_PROP(_n, _s, _f, qdev_prop_macaddr, MACAddr)
+#define DEFINE_PROP_INTERVAL(_n, _s, _f)         \
+    DEFINE_PROP(_n, _s, _f, qdev_prop_interval, Interval)
 #define DEFINE_PROP_ON_OFF_AUTO(_n, _s, _f, _d) \
     DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_on_off_auto, OnOffAuto)
 #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
index 375770a80f..a827c9a3fe 100644
--- a/include/qemu/typedefs.h
+++ b/include/qemu/typedefs.h
@@ -58,6 +58,7 @@ typedef struct ISABus ISABus;
 typedef struct ISADevice ISADevice;
 typedef struct IsaDma IsaDma;
 typedef struct MACAddr MACAddr;
+typedef struct Interval Interval;
 typedef struct MachineClass MachineClass;
 typedef struct MachineState MachineState;
 typedef struct MemoryListener MemoryListener;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 13/20] virtio-iommu: Implement probe request
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (11 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:46   ` Jean-Philippe Brucker
  2019-12-10 19:36   ` Peter Xu
  2019-11-22 18:29 ` [PATCH for-5.0 v11 14/20] virtio-iommu: Handle reserved regions in the translation process Eric Auger
                   ` (8 subsequent siblings)
  21 siblings, 2 replies; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This patch implements the PROBE request. At the moment,
no reserved regions are returned as none are registered
per device. Only a NONE property is returned.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v9 -> v10
- fully rewrite the code in preparation of
  reserved_regions array property introduction

v8 -> v9:
- fix filling of properties (changes induced by v0.7 -> v0.8 spec
  evolution)
- return VIRTIO_IOMMU_S_INVAL in case of error

v7 -> v8:
- adapt to removal of value filed in virtio_iommu_probe_property

v6 -> v7:
- adapt to the change in virtio_iommu_probe_resv_mem fields
- use get_endpoint() instead of directly checking the EP
  was registered.

v4 -> v5:
- initialize bufstate.error to false
- add cpu_to_le64(size)
---
 hw/virtio/trace-events           |  1 +
 hw/virtio/virtio-iommu.c         | 89 +++++++++++++++++++++++++++++++-
 include/hw/virtio/virtio-iommu.h |  2 +
 3 files changed, 90 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index a572eb71aa..b7bc8ac6d1 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -74,3 +74,4 @@ virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
 virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
 virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
 virtio_iommu_report_fault(uint8_t reason, uint32_t flags, uint32_t endpoint, uint64_t addr) "FAULT reason=%d flags=%d endpoint=%d address =0x%"PRIx64
+virtio_iommu_fill_resv_property(uint32_t devid, uint8_t subtype, uint64_t start, uint64_t end) "dev= %d, type=%d start=0x%"PRIx64" end=0x%"PRIx64
diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 723616a5db..1ce2218935 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -38,6 +38,7 @@
 
 /* Max size */
 #define VIOMMU_DEFAULT_QUEUE_SIZE 256
+#define VIOMMU_PROBE_SIZE 512
 
 typedef struct viommu_domain {
     uint32_t id;
@@ -317,6 +318,61 @@ static int virtio_iommu_unmap(VirtIOIOMMU *s,
     return ret;
 }
 
+static ssize_t virtio_iommu_fill_resv_mem_prop(VirtIOIOMMU *s, uint32_t ep,
+                                               uint8_t *buf, size_t free)
+{
+    struct virtio_iommu_probe_resv_mem prop = {};
+    size_t size = sizeof(prop), length = size - sizeof(prop.head), total;
+    int i;
+
+    total = size * s->nb_reserved_regions;
+
+    if (total > free) {
+        return -ENOSPC;
+    }
+
+    for (i = 0; i < s->nb_reserved_regions; i++) {
+        prop.head.type = VIRTIO_IOMMU_PROBE_T_RESV_MEM;
+        prop.head.length = cpu_to_le64(length);
+        prop.subtype = cpu_to_le64(s->reserved_regions[i].type);
+        prop.start = cpu_to_le64(s->reserved_regions[i].low);
+        prop.end = cpu_to_le64(s->reserved_regions[i].high);
+
+        memcpy(buf, &prop, size);
+
+        trace_virtio_iommu_fill_resv_property(ep, prop.subtype,
+                                              prop.start, prop.end);
+        buf += size;
+    }
+    return total;
+}
+
+/**
+ * virtio_iommu_probe - Fill the probe request buffer with
+ * the properties the device is able to return and add a NONE
+ * property at the end.
+ */
+static int virtio_iommu_probe(VirtIOIOMMU *s,
+                              struct virtio_iommu_req_probe *req,
+                              uint8_t *buf)
+{
+    uint32_t ep_id = le32_to_cpu(req->endpoint);
+    struct virtio_iommu_probe_property last = {};
+    size_t free = VIOMMU_PROBE_SIZE - sizeof(last);
+    ssize_t count;
+
+    count = virtio_iommu_fill_resv_mem_prop(s, ep_id, buf, free);
+    if (count < 0) {
+            return VIRTIO_IOMMU_S_INVAL;
+    }
+    buf += count;
+    free -= count;
+
+    memcpy(buf, &last, sizeof(last));
+
+    return VIRTIO_IOMMU_S_OK;
+}
+
 static int virtio_iommu_iov_to_req(struct iovec *iov,
                                    unsigned int iov_cnt,
                                    void *req, size_t req_sz)
@@ -346,6 +402,17 @@ virtio_iommu_handle_req(detach)
 virtio_iommu_handle_req(map)
 virtio_iommu_handle_req(unmap)
 
+static int virtio_iommu_handle_probe(VirtIOIOMMU *s,
+                                     struct iovec *iov,
+                                     unsigned int iov_cnt,
+                                     uint8_t *buf)
+{
+    struct virtio_iommu_req_probe req;
+    int ret = virtio_iommu_iov_to_req(iov, iov_cnt, &req, sizeof(req));
+
+    return ret ? ret : virtio_iommu_probe(s, &req, buf);
+}
+
 static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIOIOMMU *s = VIRTIO_IOMMU(vdev);
@@ -391,17 +458,33 @@ static void virtio_iommu_handle_command(VirtIODevice *vdev, VirtQueue *vq)
         case VIRTIO_IOMMU_T_UNMAP:
             tail.status = virtio_iommu_handle_unmap(s, iov, iov_cnt);
             break;
+        case VIRTIO_IOMMU_T_PROBE:
+        {
+            struct virtio_iommu_req_tail *ptail;
+            uint8_t *buf = g_malloc0(s->config.probe_size + sizeof(tail));
+
+            ptail = (struct virtio_iommu_req_tail *)
+                        (buf + s->config.probe_size);
+            ptail->status = virtio_iommu_handle_probe(s, iov, iov_cnt, buf);
+
+            sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
+                              buf, s->config.probe_size + sizeof(tail));
+            g_free(buf);
+            assert(sz == s->config.probe_size + sizeof(tail));
+            goto push;
+        }
         default:
             tail.status = VIRTIO_IOMMU_S_UNSUPP;
         }
-        qemu_mutex_unlock(&s->mutex);
 
 out:
         sz = iov_from_buf(elem->in_sg, elem->in_num, 0,
                           &tail, sizeof(tail));
         assert(sz == sizeof(tail));
 
-        virtqueue_push(vq, elem, sizeof(tail));
+push:
+        qemu_mutex_unlock(&s->mutex);
+        virtqueue_push(vq, elem, sz);
         virtio_notify(vdev, vq);
         g_free(elem);
     }
@@ -624,6 +707,7 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
     s->config.page_size_mask = TARGET_PAGE_MASK;
     s->config.input_range.end = -1UL;
     s->config.domain_range.end = 32;
+    s->config.probe_size = VIOMMU_PROBE_SIZE;
 
     virtio_add_feature(&s->features, VIRTIO_RING_F_EVENT_IDX);
     virtio_add_feature(&s->features, VIRTIO_RING_F_INDIRECT_DESC);
@@ -633,6 +717,7 @@ static void virtio_iommu_device_realize(DeviceState *dev, Error **errp)
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MAP_UNMAP);
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_BYPASS);
     virtio_add_feature(&s->features, VIRTIO_IOMMU_F_MMIO);
+    virtio_add_feature(&s->features, VIRTIO_IOMMU_F_PROBE);
 
     qemu_mutex_init(&s->mutex);
 
diff --git a/include/hw/virtio/virtio-iommu.h b/include/hw/virtio/virtio-iommu.h
index 1ab6993d29..4176785368 100644
--- a/include/hw/virtio/virtio-iommu.h
+++ b/include/hw/virtio/virtio-iommu.h
@@ -57,6 +57,8 @@ typedef struct VirtIOIOMMU {
     GHashTable *as_by_busptr;
     IOMMUPciBus *as_by_bus_num[IOMMU_PCI_BUS_MAX];
     PCIBus *primary_bus;
+    Interval *reserved_regions;
+    uint32_t nb_reserved_regions;
     GTree *domains;
     QemuMutex mutex;
     GTree *endpoints;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 14/20] virtio-iommu: Handle reserved regions in the translation process
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (12 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 13/20] virtio-iommu: Implement probe request Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:46   ` Jean-Philippe Brucker
  2019-12-10 19:39   ` Peter Xu
  2019-11-22 18:29 ` [PATCH for-5.0 v11 15/20] virtio-iommu-pci: Add array of Interval properties Eric Auger
                   ` (7 subsequent siblings)
  21 siblings, 2 replies; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

When translating an address we need to check if it belongs to
a reserved virtual address range. If it does, there are 2 cases:

- it belongs to a RESERVED region: the guest should neither use
  this address in a MAP not instruct the end-point to DMA on
  them. We report an error

- It belongs to an MSI region: we bypass the translation.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v10 -> v11:
- directly use the reserved_regions properties array

v9 -> v10:
- in case of MSI region, we immediatly return
---
 hw/virtio/virtio-iommu.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 1ce2218935..c5b202fab7 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -548,6 +548,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
     uint32_t sid, flags;
     bool bypass_allowed;
     bool found;
+    int i;
 
     interval.low = addr;
     interval.high = addr + 1;
@@ -580,6 +581,22 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
         goto unlock;
     }
 
+    for (i = 0; i < s->nb_reserved_regions; i++) {
+        if (interval.low >= s->reserved_regions[i].low &&
+            interval.low <= s->reserved_regions[i].high) {
+            switch (s->reserved_regions[i].type) {
+            case VIRTIO_IOMMU_RESV_MEM_T_MSI:
+                entry.perm = flag;
+                goto unlock;
+            case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
+            default:
+                virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_MAPPING,
+                                          0, sid, addr);
+            goto unlock;
+           }
+        }
+    }
+
     if (!ep->domain) {
         if (!bypass_allowed) {
             qemu_log_mask(LOG_GUEST_ERROR,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 15/20] virtio-iommu-pci: Add array of Interval properties
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (13 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 14/20] virtio-iommu: Handle reserved regions in the translation process Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:47   ` Jean-Philippe Brucker
  2019-11-22 18:29 ` [PATCH for-5.0 v11 16/20] hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper Eric Auger
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

The machine may need to pass reserved regions to the
virtio-iommu-pci device (such as the MSI window on x86).
So let's add an array of Interval properties.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/virtio/virtio-iommu-pci.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
index 4cfae1f9df..280230b31e 100644
--- a/hw/virtio/virtio-iommu-pci.c
+++ b/hw/virtio/virtio-iommu-pci.c
@@ -31,6 +31,9 @@ struct VirtIOIOMMUPCI {
 
 static Property virtio_iommu_pci_properties[] = {
     DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
+    DEFINE_PROP_ARRAY("reserved-regions", VirtIOIOMMUPCI,
+                      vdev.nb_reserved_regions, vdev.reserved_regions,
+                      qdev_prop_interval, Interval),
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 16/20] hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (14 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 15/20] virtio-iommu-pci: Add array of Interval properties Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:47   ` Jean-Philippe Brucker
  2019-11-22 18:29 ` [PATCH for-5.0 v11 17/20] hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table Eric Auger
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

To avoid code duplication, let's introduce an helper that
fills one IORT ID mappings array index.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---

v8: new
---
 hw/arm/virt-acpi-build.c | 43 ++++++++++++++++++++++++----------------
 1 file changed, 26 insertions(+), 17 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 4cd50175e0..825f3a79c0 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -368,6 +368,17 @@ static void acpi_dsdt_add_power_button(Aml *scope)
     aml_append(scope, dev);
 }
 
+static inline void
+fill_iort_idmap(AcpiIortIdMapping *idmap, int i,
+                uint32_t input_base, uint32_t id_count,
+                uint32_t output_base, uint32_t output_reference)
+{
+    idmap[i].input_base = cpu_to_le32(input_base);
+    idmap[i].id_count = cpu_to_le32(id_count);
+    idmap[i].output_base = cpu_to_le32(output_base);
+    idmap[i].output_reference = cpu_to_le32(output_reference);
+}
+
 static void
 build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 {
@@ -426,13 +437,12 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
         smmu->gerr_gsiv = cpu_to_le32(irq + 2);
         smmu->sync_gsiv = cpu_to_le32(irq + 3);
 
-        /* Identity RID mapping covering the whole input RID range */
-        idmap = &smmu->id_mapping_array[0];
-        idmap->input_base = 0;
-        idmap->id_count = cpu_to_le32(0xFFFF);
-        idmap->output_base = 0;
-        /* output IORT node is the ITS group node (the first node) */
-        idmap->output_reference = cpu_to_le32(iort_node_offset);
+        /*
+         * Identity RID mapping covering the whole input RID range.
+         * The output IORT node is the ITS group node (the first node).
+         */
+        fill_iort_idmap(smmu->id_mapping_array, 0, 0, 0xffff, 0,
+                        iort_node_offset);
     }
 
     /* Root Complex Node */
@@ -450,18 +460,17 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     rc->memory_properties.memory_flags = 0x3; /* CCA = CPM = DCAS = 1 */
     rc->pci_segment_number = 0; /* MCFG pci_segment */
 
-    /* Identity RID mapping covering the whole input RID range */
-    idmap = &rc->id_mapping_array[0];
-    idmap->input_base = 0;
-    idmap->id_count = cpu_to_le32(0xFFFF);
-    idmap->output_base = 0;
-
     if (vms->iommu == VIRT_IOMMU_SMMUV3) {
-        /* output IORT node is the smmuv3 node */
-        idmap->output_reference = cpu_to_le32(smmu_offset);
+        /* Identity RID mapping and output IORT node is the iommu node */
+        fill_iort_idmap(rc->id_mapping_array, 0, 0, 0xFFFF, 0,
+                        smmu_offset);
     } else {
-        /* output IORT node is the ITS group node (the first node) */
-        idmap->output_reference = cpu_to_le32(iort_node_offset);
+        /*
+         * Identity RID mapping and the output IORT node is the ITS group
+         * node (the first node).
+         */
+        fill_iort_idmap(rc->id_mapping_array, 0, 0, 0xFFFF, 0,
+                        iort_node_offset);
     }
 
     /*
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 17/20] hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (15 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 16/20] hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:48   ` Jean-Philippe Brucker
  2019-11-22 18:29 ` [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration Eric Auger
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This patch builds the virtio-iommu node in the ACPI IORT table.

The RID space of the root complex, which spans 0x0-0x10000
maps to streamid space 0x0-0x10000 in the virtio-iommu which in
turn maps to deviceid space 0x0-0x10000 in the ITS group.

The iommu RID is excluded as described in virtio-iommu
specification.

Signed-off-by: Eric Auger <eric.auger@redhat.com>

---
v8 -> v9:
- iommu RID is not fixed anymore

v7 -> v8:
- exclude the iommu RID (0x8) in the root complex ID mapping
---
 hw/arm/virt-acpi-build.c    | 50 ++++++++++++++++++++++++++++++-------
 include/hw/acpi/acpi-defs.h | 21 +++++++++++++++-
 2 files changed, 61 insertions(+), 10 deletions(-)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 825f3a79c0..1e22cbbbfd 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -386,14 +386,14 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     AcpiIortIdMapping *idmap;
     AcpiIortItsGroup *its;
     AcpiIortTable *iort;
-    AcpiIortSmmu3 *smmu;
-    size_t node_size, iort_node_offset, iort_length, smmu_offset = 0;
+    size_t node_size, iort_node_offset, iort_length, iommu_offset = 0;
     AcpiIortRC *rc;
+    int nb_rc_idmappings = 1;
 
     iort = acpi_data_push(table_data, sizeof(*iort));
 
-    if (vms->iommu == VIRT_IOMMU_SMMUV3) {
-        nb_nodes = 3; /* RC, ITS, SMMUv3 */
+    if (vms->iommu) {
+        nb_nodes = 3; /* RC, ITS, IOMMU */
     } else {
         nb_nodes = 2; /* RC, ITS */
     }
@@ -419,9 +419,9 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
 
     if (vms->iommu == VIRT_IOMMU_SMMUV3) {
         int irq =  vms->irqmap[VIRT_SMMU] + ARM_SPI_BASE;
+        AcpiIortSmmu3 *smmu;
 
-        /* SMMUv3 node */
-        smmu_offset = iort_node_offset + node_size;
+        iommu_offset = iort_node_offset + node_size;
         node_size = sizeof(*smmu) + sizeof(*idmap);
         iort_length += node_size;
         smmu = acpi_data_push(table_data, node_size);
@@ -443,16 +443,38 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
          */
         fill_iort_idmap(smmu->id_mapping_array, 0, 0, 0xffff, 0,
                         iort_node_offset);
+    } else if (vms->iommu == VIRT_IOMMU_VIRTIO) {
+        AcpiIortPVIommuPCI *iommu;
+
+        nb_rc_idmappings = 2;
+        iommu_offset = iort_node_offset + node_size;
+        node_size = sizeof(*iommu) + 2 * sizeof(*idmap);
+        iort_length += node_size;
+        iommu = acpi_data_push(table_data, node_size);
+
+        iommu->type = ACPI_IORT_NODE_PARAVIRT;
+        iommu->length = cpu_to_le16(node_size);
+        iommu->mapping_count = cpu_to_le32(2);
+        iommu->mapping_offset = cpu_to_le32(sizeof(*iommu));
+        iommu->devid = cpu_to_le32(vms->virtio_iommu_bdf);
+        iommu->model = cpu_to_le32(ACPI_IORT_NODE_PV_VIRTIO_IOMMU_PCI);
+
+        /*
+         * Identity RID mapping covering the whole input RID range
+         * output IORT node is the ITS group node (the first node)
+         */
+        fill_iort_idmap(iommu->id_mapping_array, 0, 0, 0xffff, 0,
+                        iort_node_offset);
     }
 
     /* Root Complex Node */
-    node_size = sizeof(*rc) + sizeof(*idmap);
+    node_size = sizeof(*rc) + nb_rc_idmappings * sizeof(*idmap);
     iort_length += node_size;
     rc = acpi_data_push(table_data, node_size);
 
     rc->type = ACPI_IORT_NODE_PCI_ROOT_COMPLEX;
     rc->length = cpu_to_le16(node_size);
-    rc->mapping_count = cpu_to_le32(1);
+    rc->mapping_count = cpu_to_le32(nb_rc_idmappings);
     rc->mapping_offset = cpu_to_le32(sizeof(*rc));
 
     /* fully coherent device */
@@ -463,7 +485,17 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
     if (vms->iommu == VIRT_IOMMU_SMMUV3) {
         /* Identity RID mapping and output IORT node is the iommu node */
         fill_iort_idmap(rc->id_mapping_array, 0, 0, 0xFFFF, 0,
-                        smmu_offset);
+                        iommu_offset);
+    } else if (vms->iommu == VIRT_IOMMU_VIRTIO) {
+        /*
+         * Identity mapping with the IOMMU RID (0x8) excluded. The output
+         * IORT node is the iommu node.
+         */
+        fill_iort_idmap(rc->id_mapping_array, 0, 0, vms->virtio_iommu_bdf, 0,
+                        iommu_offset);
+        fill_iort_idmap(rc->id_mapping_array, 1, vms->virtio_iommu_bdf + 1,
+                        0xFFFF - vms->virtio_iommu_bdf,
+                        vms->virtio_iommu_bdf + 1, iommu_offset);
     } else {
         /*
          * Identity RID mapping and the output IORT node is the ITS group
diff --git a/include/hw/acpi/acpi-defs.h b/include/hw/acpi/acpi-defs.h
index 57a3f58b0c..ba06f41fc0 100644
--- a/include/hw/acpi/acpi-defs.h
+++ b/include/hw/acpi/acpi-defs.h
@@ -581,7 +581,8 @@ enum {
         ACPI_IORT_NODE_NAMED_COMPONENT = 0x01,
         ACPI_IORT_NODE_PCI_ROOT_COMPLEX = 0x02,
         ACPI_IORT_NODE_SMMU = 0x03,
-        ACPI_IORT_NODE_SMMU_V3 = 0x04
+        ACPI_IORT_NODE_SMMU_V3 = 0x04,
+        ACPI_IORT_NODE_PARAVIRT = 0x80
 };
 
 struct AcpiIortIdMapping {
@@ -610,6 +611,24 @@ typedef struct AcpiIortItsGroup AcpiIortItsGroup;
 
 #define ACPI_IORT_SMMU_V3_COHACC_OVERRIDE 1
 
+struct AcpiIortPVIommuPCI {
+    ACPI_IORT_NODE_HEADER_DEF
+    uint32_t devid;
+    uint8_t reserved2[12];
+    uint32_t model;
+    uint32_t flags;
+    uint8_t reserved3[16];
+    AcpiIortIdMapping id_mapping_array[0];
+} QEMU_PACKED;
+typedef struct AcpiIortPVIommuPCI AcpiIortPVIommuPCI;
+
+enum {
+    ACPI_IORT_NODE_PV_VIRTIO_IOMMU     = 0x0,
+    ACPI_IORT_NODE_PV_VIRTIO_IOMMU_PCI = 0x1,
+};
+
+#define ACPI_IORT_NODE_PV_CACHE_COHERENT    (1 << 0)
+
 struct AcpiIortSmmu3 {
     ACPI_IORT_NODE_HEADER_DEF
     uint64_t base_address;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (16 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 17/20] hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-11-27 12:06   ` Dr. David Alan Gilbert
                     ` (2 more replies)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci Eric Auger
                   ` (3 subsequent siblings)
  21 siblings, 3 replies; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

Add Migration support. We rely on recently added gtree and qlist
migration. Besides, we have to fixup end point <-> domain link.

Indeed each domain has a list of endpoints attached to it. And each
endpoint has a pointer to its domain.

Raw gtree and qlist migration cannot handle this as it re-allocates
all the nodes while reconstructing the trees/lists.

So in post_load we re-construct the relationship between endpoints
and domains.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/virtio/virtio-iommu.c | 127 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 117 insertions(+), 10 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index c5b202fab7..4e92fc0c95 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -692,16 +692,6 @@ static void virtio_iommu_set_features(VirtIODevice *vdev, uint64_t val)
     trace_virtio_iommu_set_features(dev->acked_features);
 }
 
-/*
- * Migration is not yet supported: most of the state consists
- * of balanced binary trees which are not yet ready for getting
- * migrated
- */
-static const VMStateDescription vmstate_virtio_iommu_device = {
-    .name = "virtio-iommu-device",
-    .unmigratable = 1,
-};
-
 static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
 {
     uint ua = GPOINTER_TO_UINT(a);
@@ -778,6 +768,123 @@ static void virtio_iommu_instance_init(Object *obj)
 {
 }
 
+#define VMSTATE_INTERVAL                               \
+{                                                      \
+    .name = "interval",                                \
+    .version_id = 1,                                   \
+    .minimum_version_id = 1,                           \
+    .fields = (VMStateField[]) {                       \
+        VMSTATE_UINT64(low, viommu_interval),          \
+        VMSTATE_UINT64(high, viommu_interval),         \
+        VMSTATE_END_OF_LIST()                          \
+    }                                                  \
+}
+
+#define VMSTATE_MAPPING                               \
+{                                                     \
+    .name = "mapping",                                \
+    .version_id = 1,                                  \
+    .minimum_version_id = 1,                          \
+    .fields = (VMStateField[]) {                      \
+        VMSTATE_UINT64(phys_addr, viommu_mapping),    \
+        VMSTATE_UINT32(flags, viommu_mapping),        \
+        VMSTATE_END_OF_LIST()                         \
+    },                                                \
+}
+
+static const VMStateDescription vmstate_interval_mapping[2] = {
+    VMSTATE_MAPPING,   /* value */
+    VMSTATE_INTERVAL   /* key   */
+};
+
+static int domain_preload(void *opaque)
+{
+    viommu_domain *domain = opaque;
+
+    domain->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
+                                       NULL, g_free, g_free);
+    return 0;
+}
+
+static const VMStateDescription vmstate_endpoint = {
+    .name = "endpoint",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(id, viommu_endpoint),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static const VMStateDescription vmstate_domain = {
+    .name = "domain",
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .pre_load = domain_preload,
+    .fields = (VMStateField[]) {
+        VMSTATE_UINT32(id, viommu_domain),
+        VMSTATE_GTREE_V(mappings, viommu_domain, 1,
+                        vmstate_interval_mapping,
+                        viommu_interval, viommu_mapping),
+        VMSTATE_QLIST_V(endpoint_list, viommu_domain, 1,
+                        vmstate_endpoint, viommu_endpoint, next),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
+static gboolean reconstruct_ep_domain_link(gpointer key, gpointer value,
+                                           gpointer data)
+{
+    viommu_domain *d = (viommu_domain *)value;
+    viommu_endpoint *iter, *tmp;
+    viommu_endpoint *ep = (viommu_endpoint *)data;
+
+    QLIST_FOREACH_SAFE(iter, &d->endpoint_list, next, tmp) {
+        if (iter->id == ep->id) {
+            /* remove the ep */
+            QLIST_REMOVE(iter, next);
+            g_free(iter);
+            /* replace it with the good one */
+            QLIST_INSERT_HEAD(&d->endpoint_list, ep, next);
+            /* update the domain */
+            ep->domain = d;
+            return true; /* stop the search */
+        }
+    }
+    return false; /* continue the traversal */
+}
+
+static gboolean fix_endpoint(gpointer key, gpointer value, gpointer data)
+{
+    VirtIOIOMMU *s = (VirtIOIOMMU *)data;
+
+    g_tree_foreach(s->domains, reconstruct_ep_domain_link, value);
+    return false;
+}
+
+static int iommu_post_load(void *opaque, int version_id)
+{
+    VirtIOIOMMU *s = opaque;
+
+    g_tree_foreach(s->endpoints, fix_endpoint, s);
+    return 0;
+}
+
+static const VMStateDescription vmstate_virtio_iommu_device = {
+    .name = "virtio-iommu-device",
+    .minimum_version_id = 1,
+    .version_id = 1,
+    .post_load = iommu_post_load,
+    .fields = (VMStateField[]) {
+        VMSTATE_GTREE_DIRECT_KEY_V(domains, VirtIOIOMMU, 1,
+                                   &vmstate_domain, viommu_domain),
+        VMSTATE_GTREE_DIRECT_KEY_V(endpoints, VirtIOIOMMU, 1,
+                                   &vmstate_endpoint, viommu_endpoint),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
+
 static const VMStateDescription vmstate_virtio_iommu = {
     .name = "virtio-iommu",
     .minimum_version_id = 1,
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (17 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-12-10 16:50   ` Jean-Philippe Brucker
  2020-01-09 12:02   ` Michael S. Tsirkin
  2019-11-22 18:29 ` [PATCH for-5.0 v11 20/20] tests: Add virtio-iommu test Eric Auger
                   ` (2 subsequent siblings)
  21 siblings, 2 replies; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

The virtio-iommu-pci is instantiated through the -device QEMU
option. However if instantiated it also requires an IORT ACPI table
to describe the ID mappings between the root complex and the iommu.

This patch adds the generation of the IORT table if the
virtio-iommu-pci device is instantiated.

We also declare the [0xfee00000 - 0xfeefffff] MSI reserved region
so that it gets bypassed by the IOMMU.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 hw/i386/acpi-build.c | 72 ++++++++++++++++++++++++++++++++++++++++++++
 hw/i386/pc.c         | 15 ++++++++-
 include/hw/i386/pc.h |  2 ++
 3 files changed, 88 insertions(+), 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 12ff55fcfb..f09cabdcae 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2744,6 +2744,72 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
     return true;
 }
 
+static inline void
+fill_iort_idmap(AcpiIortIdMapping *idmap, int i,
+                uint32_t input_base, uint32_t id_count,
+                uint32_t output_base, uint32_t output_reference)
+{
+    idmap[i].input_base = cpu_to_le32(input_base);
+    idmap[i].id_count = cpu_to_le32(id_count);
+    idmap[i].output_base = cpu_to_le32(output_base);
+    idmap[i].output_reference = cpu_to_le32(output_reference);
+}
+
+static void
+build_iort(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
+{
+    size_t iommu_node_size, rc_node_size, iommu_node_offset;
+    int iort_start = table_data->len;
+    AcpiIortPVIommuPCI *iommu;
+    AcpiIortIdMapping *idmap;
+    AcpiIortTable *iort;
+    size_t iort_length;
+    AcpiIortRC *rc;
+
+    iort = acpi_data_push(table_data, sizeof(*iort));
+    iort_length = sizeof(*iort);
+    iort->node_count = cpu_to_le32(2);
+
+    /* virtio-iommu node */
+
+    iommu_node_offset = sizeof(*iort);
+    iort->node_offset = cpu_to_le32(iommu_node_offset);
+    iommu_node_size = sizeof(*iommu);
+    iort_length += iommu_node_offset;
+    iommu = acpi_data_push(table_data, iommu_node_size);
+    iommu->type = ACPI_IORT_NODE_PARAVIRT;
+    iommu->length = cpu_to_le16(iommu_node_size);
+    iommu->mapping_count = 0;
+    iommu->devid = cpu_to_le32(pcms->virtio_iommu_bdf);
+    iommu->model = cpu_to_le32(ACPI_IORT_NODE_PV_VIRTIO_IOMMU_PCI);
+
+    /* Root Complex Node */
+    rc_node_size = sizeof(*rc) + 2 * sizeof(*idmap);
+    iort_length += rc_node_size;
+    rc = acpi_data_push(table_data, rc_node_size);
+
+    rc->type = ACPI_IORT_NODE_PCI_ROOT_COMPLEX;
+    rc->length = cpu_to_le16(rc_node_size);
+    rc->mapping_count = cpu_to_le32(2);
+    rc->mapping_offset = cpu_to_le32(sizeof(*rc));
+
+    /* fully coherent device */
+    rc->memory_properties.cache_coherency = cpu_to_le32(1);
+    rc->memory_properties.memory_flags = 0x3; /* CCA = CPM = DCAS = 1 */
+    rc->pci_segment_number = 0; /* MCFG pci_segment */
+    fill_iort_idmap(rc->id_mapping_array, 0, 0, pcms->virtio_iommu_bdf, 0,
+                    iommu_node_offset);
+    fill_iort_idmap(rc->id_mapping_array, 1, pcms->virtio_iommu_bdf + 1,
+                    0xFFFF - pcms->virtio_iommu_bdf,
+                    pcms->virtio_iommu_bdf + 1, iommu_node_offset);
+
+    iort = (AcpiIortTable *)(table_data->data + iort_start);
+    iort->length = cpu_to_le32(iort_length);
+
+    build_header(linker, table_data, (void *)(table_data->data + iort_start),
+                 "IORT", table_data->len - iort_start, 0, NULL, NULL);
+}
+
 static
 void acpi_build(AcpiBuildTables *tables, MachineState *machine)
 {
@@ -2835,6 +2901,12 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
             build_slit(tables_blob, tables->linker, machine);
         }
     }
+
+    if (pcms->virtio_iommu) {
+        acpi_add_table(table_offsets, tables_blob);
+        build_iort(tables_blob, tables->linker, pcms);
+    }
+
     if (acpi_get_mcfg(&mcfg)) {
         acpi_add_table(table_offsets, tables_blob);
         build_mcfg(tables_blob, tables->linker, &mcfg);
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index ac08e63604..af984ee041 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -84,6 +84,7 @@
 #include "hw/net/ne2000-isa.h"
 #include "standard-headers/asm-x86/bootparam.h"
 #include "hw/virtio/virtio-pmem-pci.h"
+#include "hw/virtio/virtio-iommu.h"
 #include "hw/mem/memory-device.h"
 #include "sysemu/replay.h"
 #include "qapi/qmp/qerror.h"
@@ -1940,6 +1941,11 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
         pc_cpu_pre_plug(hotplug_dev, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
         pc_virtio_pmem_pci_pre_plug(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
+        /* we declare a VIRTIO_IOMMU_RESV_MEM_T_MSI region */
+        qdev_prop_set_uint32(dev, "len-reserved-regions", 1);
+        qdev_prop_set_string(dev, "reserved-regions[0]",
+                             "0xfee00000, 0xfeefffff, 1");
     }
 }
 
@@ -1952,6 +1958,12 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
         pc_cpu_plug(hotplug_dev, dev, errp);
     } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
         pc_virtio_pmem_pci_plug(hotplug_dev, dev, errp);
+    } else if (object_dynamic_cast(OBJECT(dev), "virtio-iommu-pci")) {
+        PCMachineState *pcms = PC_MACHINE(hotplug_dev);
+        PCIDevice *pdev = PCI_DEVICE(dev);
+
+        pcms->virtio_iommu = true;
+        pcms->virtio_iommu_bdf = pci_get_bdf(pdev);
     }
 }
 
@@ -1990,7 +2002,8 @@ static HotplugHandler *pc_get_hotplug_handler(MachineState *machine,
 {
     if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
         object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
-        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
+        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
+        object_dynamic_cast(OBJECT(dev), "virtio-iommu-pci")) {
         return HOTPLUG_HANDLER(machine);
     }
 
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 1f86eba3f9..221b4c6ef9 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -49,6 +49,8 @@ struct PCMachineState {
     bool smbus_enabled;
     bool sata_enabled;
     bool pit_enabled;
+    bool virtio_iommu;
+    uint16_t virtio_iommu_bdf;
 
     /* NUMA information: */
     uint64_t numa_nodes;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* [PATCH for-5.0 v11 20/20] tests: Add virtio-iommu test
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (18 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci Eric Auger
@ 2019-11-22 18:29 ` Eric Auger
  2019-11-22 21:56 ` [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device no-reply
  2019-12-11 16:40 ` Michael S. Tsirkin
  21 siblings, 0 replies; 89+ messages in thread
From: Eric Auger @ 2019-11-22 18:29 UTC (permalink / raw)
  To: eric.auger.pro, eric.auger, qemu-devel, qemu-arm, peter.maydell,
	mst, armbru, jean-philippe.brucker, bharatb.linux, yang.zhong,
	dgilbert, quintela
  Cc: kevin.tian, peterx, tnowicki

This adds the framework to test the virtio-iommu-pci device
and tests exercising the attach/detach, map/unmap API.

To run the tests:
make tests/qos-test
QTEST_QEMU_BINARY=x86_64-softmmu/qemu-system-x86_64 tests/qos-test V=1

Signed-off-by: Eric Auger <eric.auger@redhat.com>
---
 tests/Makefile.include      |   2 +
 tests/libqos/virtio-iommu.c | 177 ++++++++++++++++++++++++
 tests/libqos/virtio-iommu.h |  45 +++++++
 tests/virtio-iommu-test.c   | 261 ++++++++++++++++++++++++++++++++++++
 4 files changed, 485 insertions(+)
 create mode 100644 tests/libqos/virtio-iommu.c
 create mode 100644 tests/libqos/virtio-iommu.h
 create mode 100644 tests/virtio-iommu-test.c

diff --git a/tests/Makefile.include b/tests/Makefile.include
index 8566f5f119..76a303c4fb 100644
--- a/tests/Makefile.include
+++ b/tests/Makefile.include
@@ -734,6 +734,7 @@ qos-test-obj-y += tests/libqos/virtio-net.o
 qos-test-obj-y += tests/libqos/virtio-pci.o
 qos-test-obj-y += tests/libqos/virtio-pci-modern.o
 qos-test-obj-y += tests/libqos/virtio-rng.o
+qos-test-obj-y += tests/libqos/virtio-iommu.o
 qos-test-obj-y += tests/libqos/virtio-scsi.o
 qos-test-obj-y += tests/libqos/virtio-serial.o
 
@@ -773,6 +774,7 @@ qos-test-obj-$(CONFIG_VIRTFS) += tests/virtio-9p-test.o
 qos-test-obj-y += tests/virtio-blk-test.o
 qos-test-obj-y += tests/virtio-net-test.o
 qos-test-obj-y += tests/virtio-rng-test.o
+qos-test-obj-y += tests/virtio-iommu-test.o
 qos-test-obj-y += tests/virtio-scsi-test.o
 qos-test-obj-y += tests/virtio-serial-test.o
 qos-test-obj-y += tests/vmxnet3-test.o
diff --git a/tests/libqos/virtio-iommu.c b/tests/libqos/virtio-iommu.c
new file mode 100644
index 0000000000..b4e9ea44fb
--- /dev/null
+++ b/tests/libqos/virtio-iommu.c
@@ -0,0 +1,177 @@
+/*
+ * libqos driver framework
+ *
+ * Copyright (c) 2018 Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2 as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest.h"
+#include "qemu/module.h"
+#include "libqos/qgraph.h"
+#include "libqos/virtio-iommu.h"
+#include "hw/virtio/virtio-iommu.h"
+
+static QGuestAllocator *alloc;
+
+/* virtio-iommu-device */
+static void *qvirtio_iommu_get_driver(QVirtioIOMMU *v_iommu,
+                                      const char *interface)
+{
+    if (!g_strcmp0(interface, "virtio-iommu")) {
+        return v_iommu;
+    }
+    if (!g_strcmp0(interface, "virtio")) {
+        return v_iommu->vdev;
+    }
+
+    fprintf(stderr, "%s not present in virtio-iommu-device\n", interface);
+    g_assert_not_reached();
+}
+
+static void *qvirtio_iommu_device_get_driver(void *object,
+                                             const char *interface)
+{
+    QVirtioIOMMUDevice *v_iommu = object;
+    return qvirtio_iommu_get_driver(&v_iommu->iommu, interface);
+}
+
+static void virtio_iommu_cleanup(QVirtioIOMMU *interface)
+{
+    qvirtqueue_cleanup(interface->vdev->bus, interface->vq, alloc);
+}
+
+static void virtio_iommu_setup(QVirtioIOMMU *interface)
+{
+    QVirtioDevice *vdev = interface->vdev;
+    uint64_t features;
+
+    features = qvirtio_get_features(vdev);
+    features &= ~(QVIRTIO_F_BAD_FEATURE |
+                  (1ull << VIRTIO_RING_F_INDIRECT_DESC) |
+                  (1ull << VIRTIO_RING_F_EVENT_IDX) |
+                  (1ull << VIRTIO_IOMMU_F_BYPASS));
+    qvirtio_set_features(vdev, features);
+    interface->vq = qvirtqueue_setup(interface->vdev, alloc, 0);
+    qvirtio_set_driver_ok(interface->vdev);
+}
+
+static void qvirtio_iommu_device_destructor(QOSGraphObject *obj)
+{
+    QVirtioIOMMUDevice *v_iommu = (QVirtioIOMMUDevice *) obj;
+    QVirtioIOMMU *iommu = &v_iommu->iommu;
+
+    virtio_iommu_cleanup(iommu);
+}
+
+static void qvirtio_iommu_device_start_hw(QOSGraphObject *obj)
+{
+    QVirtioIOMMUDevice *v_iommu = (QVirtioIOMMUDevice *) obj;
+    QVirtioIOMMU *iommu = &v_iommu->iommu;
+
+    virtio_iommu_setup(iommu);
+}
+
+static void *virtio_iommu_device_create(void *virtio_dev,
+                                        QGuestAllocator *t_alloc,
+                                        void *addr)
+{
+    QVirtioIOMMUDevice *virtio_rdevice = g_new0(QVirtioIOMMUDevice, 1);
+    QVirtioIOMMU *interface = &virtio_rdevice->iommu;
+
+    interface->vdev = virtio_dev;
+    alloc = t_alloc;
+
+    virtio_rdevice->obj.get_driver = qvirtio_iommu_device_get_driver;
+    virtio_rdevice->obj.destructor = qvirtio_iommu_device_destructor;
+    virtio_rdevice->obj.start_hw = qvirtio_iommu_device_start_hw;
+
+    return &virtio_rdevice->obj;
+}
+
+/* virtio-iommu-pci */
+static void *qvirtio_iommu_pci_get_driver(void *object, const char *interface)
+{
+    QVirtioIOMMUPCI *v_iommu = object;
+    if (!g_strcmp0(interface, "pci-device")) {
+        return v_iommu->pci_vdev.pdev;
+    }
+    return qvirtio_iommu_get_driver(&v_iommu->iommu, interface);
+}
+
+static void qvirtio_iommu_pci_destructor(QOSGraphObject *obj)
+{
+    QVirtioIOMMUPCI *iommu_pci = (QVirtioIOMMUPCI *) obj;
+    QVirtioIOMMU *interface = &iommu_pci->iommu;
+    QOSGraphObject *pci_vobj =  &iommu_pci->pci_vdev.obj;
+
+    virtio_iommu_cleanup(interface);
+    qvirtio_pci_destructor(pci_vobj);
+}
+
+static void qvirtio_iommu_pci_start_hw(QOSGraphObject *obj)
+{
+    QVirtioIOMMUPCI *iommu_pci = (QVirtioIOMMUPCI *) obj;
+    QVirtioIOMMU *interface = &iommu_pci->iommu;
+    QOSGraphObject *pci_vobj =  &iommu_pci->pci_vdev.obj;
+
+    qvirtio_pci_start_hw(pci_vobj);
+    virtio_iommu_setup(interface);
+}
+
+
+static void *virtio_iommu_pci_create(void *pci_bus, QGuestAllocator *t_alloc,
+                                   void *addr)
+{
+    QVirtioIOMMUPCI *virtio_rpci = g_new0(QVirtioIOMMUPCI, 1);
+    QVirtioIOMMU *interface = &virtio_rpci->iommu;
+    QOSGraphObject *obj = &virtio_rpci->pci_vdev.obj;
+
+    virtio_pci_init(&virtio_rpci->pci_vdev, pci_bus, addr);
+    interface->vdev = &virtio_rpci->pci_vdev.vdev;
+    alloc = t_alloc;
+
+    obj->get_driver = qvirtio_iommu_pci_get_driver;
+    obj->start_hw = qvirtio_iommu_pci_start_hw;
+    obj->destructor = qvirtio_iommu_pci_destructor;
+
+    return obj;
+}
+
+static void virtio_iommu_register_nodes(void)
+{
+    QPCIAddress addr = {
+        .devfn = QPCI_DEVFN(4, 0),
+    };
+
+    QOSGraphEdgeOptions opts = {
+        .extra_device_opts = "addr=04.0",
+    };
+
+    /* virtio-iommu-device */
+    qos_node_create_driver("virtio-iommu-device", virtio_iommu_device_create);
+    qos_node_consumes("virtio-iommu-device", "virtio-bus", NULL);
+    qos_node_produces("virtio-iommu-device", "virtio");
+    qos_node_produces("virtio-iommu-device", "virtio-iommu");
+
+    /* virtio-iommu-pci */
+    add_qpci_address(&opts, &addr);
+    qos_node_create_driver("virtio-iommu-pci", virtio_iommu_pci_create);
+    qos_node_consumes("virtio-iommu-pci", "pci-bus", &opts);
+    qos_node_produces("virtio-iommu-pci", "pci-device");
+    qos_node_produces("virtio-iommu-pci", "virtio");
+    qos_node_produces("virtio-iommu-pci", "virtio-iommu");
+}
+
+libqos_init(virtio_iommu_register_nodes);
diff --git a/tests/libqos/virtio-iommu.h b/tests/libqos/virtio-iommu.h
new file mode 100644
index 0000000000..6970b45a01
--- /dev/null
+++ b/tests/libqos/virtio-iommu.h
@@ -0,0 +1,45 @@
+/*
+ * libqos driver framework
+ *
+ * Copyright (c) 2018 Emanuele Giuseppe Esposito <e.emanuelegiuseppe@gmail.com>
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2 as published by the Free Software Foundation.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>
+ */
+
+#ifndef TESTS_LIBQOS_VIRTIO_IOMMU_H
+#define TESTS_LIBQOS_VIRTIO_IOMMU_H
+
+#include "libqos/qgraph.h"
+#include "libqos/virtio.h"
+#include "libqos/virtio-pci.h"
+
+typedef struct QVirtioIOMMU QVirtioIOMMU;
+typedef struct QVirtioIOMMUPCI QVirtioIOMMUPCI;
+typedef struct QVirtioIOMMUDevice QVirtioIOMMUDevice;
+
+struct QVirtioIOMMU {
+    QVirtioDevice *vdev;
+    QVirtQueue *vq;
+};
+
+struct QVirtioIOMMUPCI {
+    QVirtioPCIDevice pci_vdev;
+    QVirtioIOMMU iommu;
+};
+
+struct QVirtioIOMMUDevice {
+    QOSGraphObject obj;
+    QVirtioIOMMU iommu;
+};
+
+#endif
diff --git a/tests/virtio-iommu-test.c b/tests/virtio-iommu-test.c
new file mode 100644
index 0000000000..1d93d686bc
--- /dev/null
+++ b/tests/virtio-iommu-test.c
@@ -0,0 +1,261 @@
+/*
+ * QTest testcase for VirtIO IOMMU
+ *
+ * Copyright (c) 2014 SUSE LINUX Products GmbH
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+#include "qemu/module.h"
+#include "libqos/qgraph.h"
+#include "libqos/virtio-iommu.h"
+#include "hw/virtio/virtio-iommu.h"
+
+#define PCI_SLOT_HP             0x06
+#define QVIRTIO_IOMMU_TIMEOUT_US (30 * 1000 * 1000)
+
+static QGuestAllocator *alloc;
+
+static void iommu_hotplug(void *obj, void *data, QGuestAllocator *alloc)
+{
+    QVirtioPCIDevice *dev = obj;
+    QTestState *qts = dev->pdev->bus->qts;
+
+    qtest_qmp_device_add(qts, "virtio-iommu-pci", "iommu1",
+                         "{'addr': %s}", stringify(PCI_SLOT_HP));
+
+}
+
+static void pci_config(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+    QVirtioIOMMU *v_iommu = obj;
+    QVirtioDevice *dev = v_iommu->vdev;
+    uint64_t input_range_start = qvirtio_config_readq(dev, 8);
+    uint64_t input_range_end = qvirtio_config_readq(dev, 16);
+    uint32_t domain_range_start = qvirtio_config_readl(dev, 24);
+    uint32_t domain_range_end = qvirtio_config_readl(dev, 28);
+    uint32_t probe_size = qvirtio_config_readl(dev, 32);
+
+    g_assert_cmpint(input_range_start, ==, 0);
+    g_assert_cmphex(input_range_end, ==, 0xFFFFFFFFFFFFFFFF);
+    g_assert_cmpint(domain_range_start, ==, 0);
+    g_assert_cmpint(domain_range_end, ==, 32);
+    g_assert_cmphex(probe_size, ==, 0x200);
+}
+
+/**
+ * send_attach_detach - Send an attach/detach command to the device
+ * @type: VIRTIO_IOMMU_T_ATTACH/VIRTIO_IOMMU_T_DETACH
+ * @domain: domain the end point is attached to
+ * @ep: end-point
+ */
+static int send_attach_detach(QTestState *qts, QVirtioIOMMU *v_iommu,
+                              uint8_t type, uint32_t domain, uint32_t ep)
+{
+    QVirtioDevice *dev = v_iommu->vdev;
+    QVirtQueue *vq = v_iommu->vq;
+    uint64_t ro_addr, wr_addr;
+    uint32_t free_head;
+    struct virtio_iommu_req_attach req; /* same layout as detach */
+    size_t ro_size = sizeof(req) - sizeof(struct virtio_iommu_req_tail);
+    size_t wr_size = sizeof(struct virtio_iommu_req_tail);
+    char buffer[64];
+    int ret;
+
+    req.head.type = type;
+    req.domain = domain;
+    req.endpoint = ep;
+
+    ro_addr = guest_alloc(alloc, ro_size);
+    wr_addr = guest_alloc(alloc, wr_size);
+
+    qtest_memwrite(qts, ro_addr, &req, ro_size);
+    free_head = qvirtqueue_add(qts, vq, ro_addr, ro_size, false, true);
+    qvirtqueue_add(qts, vq, wr_addr, wr_size, true, false);
+    qvirtqueue_kick(qts, dev, vq, free_head);
+    qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+                           QVIRTIO_IOMMU_TIMEOUT_US);
+    qtest_memread(qts, wr_addr, buffer, wr_size);
+    ret = ((struct virtio_iommu_req_tail *)buffer)->status;
+    guest_free(alloc, ro_addr);
+    guest_free(alloc, wr_addr);
+    return ret;
+}
+
+/**
+ * send_map - Send a map command to the device
+ * @domain: domain the new binding is attached to
+ * @virt_start: iova start
+ * @virt_end: iova end
+ * @phys_start: base physical address
+ * @flags: mapping flags
+ */
+static int send_map(QTestState *qts, QVirtioIOMMU *v_iommu,
+                    uint32_t domain, uint64_t virt_start, uint64_t virt_end,
+                    uint64_t phys_start, uint32_t flags)
+{
+    QVirtioDevice *dev = v_iommu->vdev;
+    QVirtQueue *vq = v_iommu->vq;
+    uint64_t ro_addr, wr_addr;
+    uint32_t free_head;
+    struct virtio_iommu_req_map req;
+    size_t ro_size = sizeof(req) - sizeof(struct virtio_iommu_req_tail);
+    size_t wr_size = sizeof(struct virtio_iommu_req_tail);
+    char buffer[64];
+    int ret;
+
+    req.head.type = VIRTIO_IOMMU_T_MAP;
+    req.domain = domain;
+    req.virt_start = virt_start;
+    req.virt_end = virt_end;
+    req.phys_start = phys_start;
+    req.flags = flags;
+
+    ro_addr = guest_alloc(alloc, ro_size);
+    wr_addr = guest_alloc(alloc, wr_size);
+
+    qtest_memwrite(qts, ro_addr, &req, ro_size);
+    free_head = qvirtqueue_add(qts, vq, ro_addr, ro_size, false, true);
+    qvirtqueue_add(qts, vq, wr_addr, wr_size, true, false);
+    qvirtqueue_kick(qts, dev, vq, free_head);
+    qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+                           QVIRTIO_IOMMU_TIMEOUT_US);
+    memread(wr_addr, buffer, wr_size);
+    ret = ((struct virtio_iommu_req_tail *)buffer)->status;
+    guest_free(alloc, ro_addr);
+    guest_free(alloc, wr_addr);
+    return ret;
+}
+
+/**
+ * send_unmap - Send an unmap command to the device
+ * @domain: domain the new binding is attached to
+ * @virt_start: iova start
+ * @virt_end: iova end
+ */
+static int send_unmap(QTestState *qts, QVirtioIOMMU *v_iommu,
+                      uint32_t domain, uint64_t virt_start, uint64_t virt_end)
+{
+    QVirtioDevice *dev = v_iommu->vdev;
+    QVirtQueue *vq = v_iommu->vq;
+    uint64_t ro_addr, wr_addr;
+    uint32_t free_head;
+    struct virtio_iommu_req_unmap req;
+    size_t ro_size = sizeof(req) - sizeof(struct virtio_iommu_req_tail);
+    size_t wr_size = sizeof(struct virtio_iommu_req_tail);
+    char buffer[64];
+    int ret;
+
+    req.head.type = VIRTIO_IOMMU_T_UNMAP;
+    req.domain = domain;
+    req.virt_start = virt_start;
+    req.virt_end = virt_end;
+
+    ro_addr = guest_alloc(alloc, ro_size);
+    wr_addr = guest_alloc(alloc, wr_size);
+
+    qtest_memwrite(qts, ro_addr, &req, ro_size);
+    free_head = qvirtqueue_add(qts, vq, ro_addr, ro_size, false, true);
+    qvirtqueue_add(qts, vq, wr_addr, wr_size, true, false);
+    qvirtqueue_kick(qts, dev, vq, free_head);
+    qvirtio_wait_used_elem(qts, dev, vq, free_head, NULL,
+                           QVIRTIO_IOMMU_TIMEOUT_US);
+    memread(wr_addr, buffer, wr_size);
+    ret = ((struct virtio_iommu_req_tail *)buffer)->status;
+    guest_free(alloc, ro_addr);
+    guest_free(alloc, wr_addr);
+    return ret;
+}
+
+/* Test unmap scenari documented in the spec v0.12 */
+static void test_attach_detach(void *obj, void *data, QGuestAllocator *t_alloc)
+{
+    QVirtioIOMMU *v_iommu = obj;
+    QTestState *qts = global_qtest;
+    int ret;
+
+    alloc = t_alloc;
+
+    /* type, domain, ep */
+    ret = send_attach_detach(qts, v_iommu, VIRTIO_IOMMU_T_ATTACH, 0, 0);
+    g_assert_cmpint(ret, ==, 0);
+    ret = send_attach_detach(qts, v_iommu, VIRTIO_IOMMU_T_ATTACH, 1, 2);
+    g_assert_cmpint(ret, ==, 0);
+    ret = send_attach_detach(qts, v_iommu, VIRTIO_IOMMU_T_ATTACH, 1, 2);
+    g_assert_cmpint(ret, ==, 0);
+    ret = send_attach_detach(qts, v_iommu, VIRTIO_IOMMU_T_ATTACH, 0, 2);
+    g_assert_cmpint(ret, ==, 0);
+
+    /* domain, virt start, virt end, phys start, flags */
+    ret = send_map(qts, v_iommu, 0, 0, 0xFFF, 0xa1000, VIRTIO_IOMMU_MAP_F_READ);
+    g_assert_cmpint(ret, ==, 0);
+
+    ret = send_unmap(qts, v_iommu, 4, 0x10, 0xFFF);
+    g_assert_cmpint(ret, ==, VIRTIO_IOMMU_S_NOENT);
+
+    ret = send_unmap(qts, v_iommu, 0, 0x10, 0xFFF);
+    g_assert_cmpint(ret, ==, VIRTIO_IOMMU_S_RANGE);
+
+    /* Spec example sequence */
+
+    /* 1 */
+    ret = send_unmap(qts, v_iommu, 1, 0, 4);
+    g_assert_cmpint(ret, ==, 0); /* doesn't unmap anything */
+
+    /* 2 */
+    send_map(qts, v_iommu, 1, 0, 9, 0xa1000, VIRTIO_IOMMU_MAP_F_READ);
+    ret = send_unmap(qts, v_iommu, 1, 0, 9);
+    g_assert_cmpint(ret, ==, 0); /* unmaps [0,9] */
+
+    /* 3 */
+    send_map(qts, v_iommu, 1, 0, 4, 0xb1000, VIRTIO_IOMMU_MAP_F_READ);
+    send_map(qts, v_iommu, 1, 5, 9, 0xb2000, VIRTIO_IOMMU_MAP_F_READ);
+    ret = send_unmap(qts, v_iommu, 1, 0, 9);
+    g_assert_cmpint(ret, ==, 0); /* unmaps [0,4] and [5,9] */
+
+    /* 4 */
+    send_map(qts, v_iommu, 1, 0, 9, 0xc1000, VIRTIO_IOMMU_MAP_F_READ);
+    ret = send_unmap(qts, v_iommu, 1, 0, 4);
+    g_assert_cmpint(ret, ==, VIRTIO_IOMMU_S_RANGE); /* doesn't unmap anything */
+
+    ret = send_unmap(qts, v_iommu, 1, 0, 10);
+    g_assert_cmpint(ret, ==, 0);
+
+    /* 5 */
+    send_map(qts, v_iommu, 1, 0, 4, 0xd1000, VIRTIO_IOMMU_MAP_F_READ);
+    send_map(qts, v_iommu, 1, 5, 9, 0xd2000, VIRTIO_IOMMU_MAP_F_READ);
+    ret = send_unmap(qts, v_iommu, 1, 0, 4);
+    g_assert_cmpint(ret, ==, 0); /* unmaps [0,4] */
+
+    ret = send_unmap(qts, v_iommu, 1, 5, 9);
+    g_assert_cmpint(ret, ==, 0);
+
+    /* 6 */
+    send_map(qts, v_iommu, 1, 0, 4, 0xe2000, VIRTIO_IOMMU_MAP_F_READ);
+    ret = send_unmap(qts, v_iommu, 1, 0, 9);
+    g_assert_cmpint(ret, ==, 0); /* unmaps [0,4] */
+
+    /* 7 */
+    send_map(qts, v_iommu, 1, 0, 4, 0xf2000, VIRTIO_IOMMU_MAP_F_READ);
+    send_map(qts, v_iommu, 1, 10, 14, 0xf3000, VIRTIO_IOMMU_MAP_F_READ);
+    ret = send_unmap(qts, v_iommu, 1, 0, 14);
+    g_assert_cmpint(ret, ==, 0); /* unmaps [0,4] and [10,14] */
+
+    send_unmap(qts, v_iommu, 1, 0, 100);
+    send_map(qts, v_iommu, 1, 10, 14, 0xf3000, VIRTIO_IOMMU_MAP_F_READ);
+    send_map(qts, v_iommu, 1, 0, 4, 0xf2000, VIRTIO_IOMMU_MAP_F_READ);
+    ret = send_unmap(qts, v_iommu, 1, 0, 4);
+    g_assert_cmpint(ret, ==, 0); /* unmaps [0,4] and [10,14] */
+}
+
+static void register_virtio_iommu_test(void)
+{
+    qos_add_test("hotplug", "virtio-iommu-pci", iommu_hotplug, NULL);
+    qos_add_test("config", "virtio-iommu", pci_config, NULL);
+    qos_add_test("attach_detach", "virtio-iommu", test_attach_detach, NULL);
+}
+
+libqos_init(register_virtio_iommu_test);
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL
  2019-11-22 18:29 ` [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL Eric Auger
@ 2019-11-22 19:03   ` Dr. David Alan Gilbert
  2019-11-25 13:12     ` Auger Eric
  2019-12-12 12:17   ` Markus Armbruster
  1 sibling, 1 reply; 89+ messages in thread
From: Dr. David Alan Gilbert @ 2019-11-22 19:03 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, eric.auger.pro

* Eric Auger (eric.auger@redhat.com) wrote:
> Introduce a new property defining a labelled interval:
> <low address>,<high address>,label.
> 
> This will be used to encode reserved IOVA regions. The label
> is left undefined to ease reuse accross use cases.
> 
> For instance, in virtio-iommu use case, reserved IOVA regions
> will be passed by the machine code to the virtio-iommu-pci
> device (an array of those). The label will match the
> virtio_iommu_probe_resv_mem subtype value:
> - VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0)
> - VIRTIO_IOMMU_RESV_MEM_T_MSI (1)
> 
> This is used to inform the virtio-iommu-pci device it should
> bypass the MSI region: 0xfee00000, 0xfeefffff, 1.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> ---
>  hw/core/qdev-properties.c    | 90 ++++++++++++++++++++++++++++++++++++
>  include/exec/memory.h        |  6 +++
>  include/hw/qdev-properties.h |  3 ++
>  include/qemu/typedefs.h      |  1 +
>  4 files changed, 100 insertions(+)
> 
> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> index ac28890e5a..8d70f34e37 100644
> --- a/hw/core/qdev-properties.c
> +++ b/hw/core/qdev-properties.c
> @@ -13,6 +13,7 @@
>  #include "qapi/visitor.h"
>  #include "chardev/char.h"
>  #include "qemu/uuid.h"
> +#include "qemu/cutils.h"
>  
>  void qdev_prop_set_after_realize(DeviceState *dev, const char *name,
>                                    Error **errp)
> @@ -585,6 +586,95 @@ const PropertyInfo qdev_prop_macaddr = {
>      .set   = set_mac,
>  };
>  
> +/* --- Labelled Interval --- */
> +
> +/*
> + * accepted syntax versions:
> + *   <low address>,<high address>,<type>
> + *   where low/high addresses are uint64_t in hexa (feat. 0x prefix)
> + *   and type is an unsigned integer
> + */
> +static void get_interval(Object *obj, Visitor *v, const char *name,
> +                         void *opaque, Error **errp)
> +{
> +    DeviceState *dev = DEVICE(obj);
> +    Property *prop = opaque;
> +    Interval *interval = qdev_get_prop_ptr(dev, prop);
> +    char buffer[64];
> +    char *p = buffer;
> +
> +    snprintf(buffer, sizeof(buffer), "0x%"PRIx64",0x%"PRIx64",%d",
> +             interval->low, interval->high, interval->type);
> +
> +    visit_type_str(v, name, &p, errp);
> +}
> +
> +static void set_interval(Object *obj, Visitor *v, const char *name,
> +                         void *opaque, Error **errp)
> +{
> +    DeviceState *dev = DEVICE(obj);
> +    Property *prop = opaque;
> +    Interval *interval = qdev_get_prop_ptr(dev, prop);
> +    Error *local_err = NULL;
> +    unsigned int type;
> +    gchar **fields;
> +    uint64_t addr;
> +    char *str;
> +    int ret;
> +
> +    if (dev->realized) {
> +        qdev_prop_set_after_realize(dev, name, errp);
> +        return;
> +    }
> +
> +    visit_type_str(v, name, &str, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    fields = g_strsplit(str, ",", 3);
> +
> +    ret = qemu_strtou64(fields[0], NULL, 16, &addr);
> +    if (!ret) {
> +        interval->low = addr;
> +    } else {
> +        error_setg(errp, "Failed to decode interval low addr");
> +        error_append_hint(errp,
> +                          "should be an address in hexa with 0x prefix\n");
> +        goto out;
> +    }
> +
> +    ret = qemu_strtou64(fields[1], NULL, 16, &addr);
> +    if (!ret) {
> +        interval->high = addr;
> +    } else {
> +        error_setg(errp, "Failed to decode interval high addr");
> +        error_append_hint(errp,
> +                          "should be an address in hexa with 0x prefix\n");
> +        goto out;
> +    }
> +
> +    ret = qemu_strtoui(fields[2], NULL, 10, &type);
> +    if (!ret) {
> +        interval->type = type;
> +    } else {
> +        error_setg(errp, "Failed to decode interval type");
> +        error_append_hint(errp, "should be an unsigned int in decimal\n");
> +    }
> +out:
> +    g_free(str);
> +    g_strfreev(fields);
> +    return;
> +}
> +
> +const PropertyInfo qdev_prop_interval = {
> +    .name  = "labelled_interval",
> +    .description = "Labelled interval, example: 0xFEE00000,0xFEEFFFFF,0",
> +    .get   = get_interval,
> +    .set   = set_interval,
> +};
> +
>  /* --- on/off/auto --- */
>  
>  const PropertyInfo qdev_prop_on_off_auto = {
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index e499dc215b..e238d1c352 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -57,6 +57,12 @@ struct MemoryRegionMmio {
>      CPUWriteMemoryFunc *write[3];
>  };
>  
> +struct Interval {
> +    hwaddr low;
> +    hwaddr high;
> +    unsigned int type;
> +};
> +

Just an observation that 'Interval' is a very generic name.
We've got 'AddrRange' but that's Int128.

Dave

>  typedef struct IOMMUTLBEntry IOMMUTLBEntry;
>  
>  /* See address_space_translate: bit 0 is read, bit 1 is write.  */
> diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
> index c6a8cb5516..2ba7c8711b 100644
> --- a/include/hw/qdev-properties.h
> +++ b/include/hw/qdev-properties.h
> @@ -20,6 +20,7 @@ extern const PropertyInfo qdev_prop_chr;
>  extern const PropertyInfo qdev_prop_tpm;
>  extern const PropertyInfo qdev_prop_ptr;
>  extern const PropertyInfo qdev_prop_macaddr;
> +extern const PropertyInfo qdev_prop_interval;
>  extern const PropertyInfo qdev_prop_on_off_auto;
>  extern const PropertyInfo qdev_prop_losttickpolicy;
>  extern const PropertyInfo qdev_prop_blockdev_on_error;
> @@ -202,6 +203,8 @@ extern const PropertyInfo qdev_prop_pcie_link_width;
>      DEFINE_PROP(_n, _s, _f, qdev_prop_drive_iothread, BlockBackend *)
>  #define DEFINE_PROP_MACADDR(_n, _s, _f)         \
>      DEFINE_PROP(_n, _s, _f, qdev_prop_macaddr, MACAddr)
> +#define DEFINE_PROP_INTERVAL(_n, _s, _f)         \
> +    DEFINE_PROP(_n, _s, _f, qdev_prop_interval, Interval)
>  #define DEFINE_PROP_ON_OFF_AUTO(_n, _s, _f, _d) \
>      DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_on_off_auto, OnOffAuto)
>  #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index 375770a80f..a827c9a3fe 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -58,6 +58,7 @@ typedef struct ISABus ISABus;
>  typedef struct ISADevice ISADevice;
>  typedef struct IsaDma IsaDma;
>  typedef struct MACAddr MACAddr;
> +typedef struct Interval Interval;
>  typedef struct MachineClass MachineClass;
>  typedef struct MachineState MachineState;
>  typedef struct MemoryListener MemoryListener;
> -- 
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (19 preceding siblings ...)
  2019-11-22 18:29 ` [PATCH for-5.0 v11 20/20] tests: Add virtio-iommu test Eric Auger
@ 2019-11-22 21:56 ` no-reply
  2019-12-11 16:40 ` Michael S. Tsirkin
  21 siblings, 0 replies; 89+ messages in thread
From: no-reply @ 2019-11-22 21:56 UTC (permalink / raw)
  To: eric.auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	eric.auger, bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Patchew URL: https://patchew.org/QEMU/20191122182943.4656-1-eric.auger@redhat.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      x86_64-softmmu/hw/virtio/virtio-scsi-pci.o
  CC      x86_64-softmmu/hw/virtio/virtio-blk-pci.o
/tmp/qemu-test/src/hw/virtio/virtio-iommu.c: In function 'int_cmp':
/tmp/qemu-test/src/hw/virtio/virtio-iommu.c:697:5: error: unknown type name 'uint'; did you mean 'guint'?
     uint ua = GPOINTER_TO_UINT(a);
     ^~~~
     guint
/tmp/qemu-test/src/hw/virtio/virtio-iommu.c:698:5: error: unknown type name 'uint'; did you mean 'guint'?
     uint ub = GPOINTER_TO_UINT(b);
     ^~~~
     guint
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: hw/virtio/virtio-iommu.o] Error 1
make[1]: *** Waiting for unfinished jobs....
  CC      aarch64-softmmu/accel/tcg/tcg-runtime.o
  CC      aarch64-softmmu/accel/tcg/tcg-runtime-gvec.o
---
  CC      aarch64-softmmu/hw/block/dataplane/virtio-blk.o
  CC      aarch64-softmmu/hw/char/exynos4210_uart.o
  CC      aarch64-softmmu/hw/char/omap_uart.o
make: *** [Makefile:491: x86_64-softmmu/all] Error 2
make: *** Waiting for unfinished jobs....
  CC      aarch64-softmmu/hw/char/digic-uart.o
  CC      aarch64-softmmu/hw/char/stm32f2xx_usart.o
---
  CC      aarch64-softmmu/hw/arm/boot.o
  CC      aarch64-softmmu/hw/arm/sysbus-fdt.o
/tmp/qemu-test/src/hw/virtio/virtio-iommu.c: In function 'int_cmp':
/tmp/qemu-test/src/hw/virtio/virtio-iommu.c:697:5: error: unknown type name 'uint'; did you mean 'guint'?
     uint ua = GPOINTER_TO_UINT(a);
     ^~~~
     guint
/tmp/qemu-test/src/hw/virtio/virtio-iommu.c:698:5: error: unknown type name 'uint'; did you mean 'guint'?
     uint ub = GPOINTER_TO_UINT(b);
     ^~~~
     guint
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: hw/virtio/virtio-iommu.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:491: aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=028df3325d7c4927a1c334040016728f', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-cuthbude/src/docker-src.2019-11-22-16.49.52.5210:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=028df3325d7c4927a1c334040016728f
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-cuthbude/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    6m21.012s
user    0m8.319s


The full log is available at
http://patchew.org/logs/20191122182943.4656-1-eric.auger@redhat.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL
  2019-11-22 19:03   ` Dr. David Alan Gilbert
@ 2019-11-25 13:12     ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2019-11-25 13:12 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, eric.auger.pro

Hi Dave,

On 11/22/19 8:03 PM, Dr. David Alan Gilbert wrote:
> * Eric Auger (eric.auger@redhat.com) wrote:
>> Introduce a new property defining a labelled interval:
>> <low address>,<high address>,label.
>>
>> This will be used to encode reserved IOVA regions. The label
>> is left undefined to ease reuse accross use cases.
>>
>> For instance, in virtio-iommu use case, reserved IOVA regions
>> will be passed by the machine code to the virtio-iommu-pci
>> device (an array of those). The label will match the
>> virtio_iommu_probe_resv_mem subtype value:
>> - VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0)
>> - VIRTIO_IOMMU_RESV_MEM_T_MSI (1)
>>
>> This is used to inform the virtio-iommu-pci device it should
>> bypass the MSI region: 0xfee00000, 0xfeefffff, 1.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> ---
>>  hw/core/qdev-properties.c    | 90 ++++++++++++++++++++++++++++++++++++
>>  include/exec/memory.h        |  6 +++
>>  include/hw/qdev-properties.h |  3 ++
>>  include/qemu/typedefs.h      |  1 +
>>  4 files changed, 100 insertions(+)
>>
>> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
>> index ac28890e5a..8d70f34e37 100644
>> --- a/hw/core/qdev-properties.c
>> +++ b/hw/core/qdev-properties.c
>> @@ -13,6 +13,7 @@
>>  #include "qapi/visitor.h"
>>  #include "chardev/char.h"
>>  #include "qemu/uuid.h"
>> +#include "qemu/cutils.h"
>>  
>>  void qdev_prop_set_after_realize(DeviceState *dev, const char *name,
>>                                    Error **errp)
>> @@ -585,6 +586,95 @@ const PropertyInfo qdev_prop_macaddr = {
>>      .set   = set_mac,
>>  };
>>  
>> +/* --- Labelled Interval --- */
>> +
>> +/*
>> + * accepted syntax versions:
>> + *   <low address>,<high address>,<type>
>> + *   where low/high addresses are uint64_t in hexa (feat. 0x prefix)
>> + *   and type is an unsigned integer
>> + */
>> +static void get_interval(Object *obj, Visitor *v, const char *name,
>> +                         void *opaque, Error **errp)
>> +{
>> +    DeviceState *dev = DEVICE(obj);
>> +    Property *prop = opaque;
>> +    Interval *interval = qdev_get_prop_ptr(dev, prop);
>> +    char buffer[64];
>> +    char *p = buffer;
>> +
>> +    snprintf(buffer, sizeof(buffer), "0x%"PRIx64",0x%"PRIx64",%d",
>> +             interval->low, interval->high, interval->type);
>> +
>> +    visit_type_str(v, name, &p, errp);
>> +}
>> +
>> +static void set_interval(Object *obj, Visitor *v, const char *name,
>> +                         void *opaque, Error **errp)
>> +{
>> +    DeviceState *dev = DEVICE(obj);
>> +    Property *prop = opaque;
>> +    Interval *interval = qdev_get_prop_ptr(dev, prop);
>> +    Error *local_err = NULL;
>> +    unsigned int type;
>> +    gchar **fields;
>> +    uint64_t addr;
>> +    char *str;
>> +    int ret;
>> +
>> +    if (dev->realized) {
>> +        qdev_prop_set_after_realize(dev, name, errp);
>> +        return;
>> +    }
>> +
>> +    visit_type_str(v, name, &str, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    fields = g_strsplit(str, ",", 3);
>> +
>> +    ret = qemu_strtou64(fields[0], NULL, 16, &addr);
>> +    if (!ret) {
>> +        interval->low = addr;
>> +    } else {
>> +        error_setg(errp, "Failed to decode interval low addr");
>> +        error_append_hint(errp,
>> +                          "should be an address in hexa with 0x prefix\n");
>> +        goto out;
>> +    }
>> +
>> +    ret = qemu_strtou64(fields[1], NULL, 16, &addr);
>> +    if (!ret) {
>> +        interval->high = addr;
>> +    } else {
>> +        error_setg(errp, "Failed to decode interval high addr");
>> +        error_append_hint(errp,
>> +                          "should be an address in hexa with 0x prefix\n");
>> +        goto out;
>> +    }
>> +
>> +    ret = qemu_strtoui(fields[2], NULL, 10, &type);
>> +    if (!ret) {
>> +        interval->type = type;
>> +    } else {
>> +        error_setg(errp, "Failed to decode interval type");
>> +        error_append_hint(errp, "should be an unsigned int in decimal\n");
>> +    }
>> +out:
>> +    g_free(str);
>> +    g_strfreev(fields);
>> +    return;
>> +}
>> +
>> +const PropertyInfo qdev_prop_interval = {
>> +    .name  = "labelled_interval",
>> +    .description = "Labelled interval, example: 0xFEE00000,0xFEEFFFFF,0",
>> +    .get   = get_interval,
>> +    .set   = set_interval,
>> +};
>> +
>>  /* --- on/off/auto --- */
>>  
>>  const PropertyInfo qdev_prop_on_off_auto = {
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index e499dc215b..e238d1c352 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -57,6 +57,12 @@ struct MemoryRegionMmio {
>>      CPUWriteMemoryFunc *write[3];
>>  };
>>  
>> +struct Interval {
>> +    hwaddr low;
>> +    hwaddr high;
>> +    unsigned int type;
>> +};
>> +
> 
> Just an observation that 'Interval' is a very generic name.
> We've got 'AddrRange' but that's Int128.
As it is defined in memory.h it may make sense to rename it
ReservedRegion then?

Thanks

Eric

> 
> Dave
> 
>>  typedef struct IOMMUTLBEntry IOMMUTLBEntry;
>>  
>>  /* See address_space_translate: bit 0 is read, bit 1 is write.  */
>> diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
>> index c6a8cb5516..2ba7c8711b 100644
>> --- a/include/hw/qdev-properties.h
>> +++ b/include/hw/qdev-properties.h
>> @@ -20,6 +20,7 @@ extern const PropertyInfo qdev_prop_chr;
>>  extern const PropertyInfo qdev_prop_tpm;
>>  extern const PropertyInfo qdev_prop_ptr;
>>  extern const PropertyInfo qdev_prop_macaddr;
>> +extern const PropertyInfo qdev_prop_interval;
>>  extern const PropertyInfo qdev_prop_on_off_auto;
>>  extern const PropertyInfo qdev_prop_losttickpolicy;
>>  extern const PropertyInfo qdev_prop_blockdev_on_error;
>> @@ -202,6 +203,8 @@ extern const PropertyInfo qdev_prop_pcie_link_width;
>>      DEFINE_PROP(_n, _s, _f, qdev_prop_drive_iothread, BlockBackend *)
>>  #define DEFINE_PROP_MACADDR(_n, _s, _f)         \
>>      DEFINE_PROP(_n, _s, _f, qdev_prop_macaddr, MACAddr)
>> +#define DEFINE_PROP_INTERVAL(_n, _s, _f)         \
>> +    DEFINE_PROP(_n, _s, _f, qdev_prop_interval, Interval)
>>  #define DEFINE_PROP_ON_OFF_AUTO(_n, _s, _f, _d) \
>>      DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_on_off_auto, OnOffAuto)
>>  #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
>> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
>> index 375770a80f..a827c9a3fe 100644
>> --- a/include/qemu/typedefs.h
>> +++ b/include/qemu/typedefs.h
>> @@ -58,6 +58,7 @@ typedef struct ISABus ISABus;
>>  typedef struct ISADevice ISADevice;
>>  typedef struct IsaDma IsaDma;
>>  typedef struct MACAddr MACAddr;
>> +typedef struct Interval Interval;
>>  typedef struct MachineClass MachineClass;
>>  typedef struct MachineState MachineState;
>>  typedef struct MemoryListener MemoryListener;
>> -- 
>> 2.20.1
>>
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 01/20] migration: Support QLIST migration
  2019-11-22 18:29 ` [PATCH for-5.0 v11 01/20] migration: Support QLIST migration Eric Auger
@ 2019-11-27 11:46   ` Dr. David Alan Gilbert
  2020-01-08 13:19     ` Juan Quintela
  0 siblings, 1 reply; 89+ messages in thread
From: Dr. David Alan Gilbert @ 2019-11-27 11:46 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, eric.auger.pro

* Eric Auger (eric.auger@redhat.com) wrote:
> Support QLIST migration using the same principle as QTAILQ:
> 94869d5c52 ("migration: migrate QTAILQ").
> 
> The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
> The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD
> and QLIST_RAW_REVERSE.
> 
> Tests also are provided.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v5 - v6:
> - by doing more advanced testing with virtio-iommu migration
>   I noticed this was broken. "prev" field was not set properly.
>   I improved the tests to manipulate both the next and prev
>   fields.
> - Removed Peter and Juan's R-b
> ---
>  include/migration/vmstate.h |  21 +++++
>  include/qemu/queue.h        |  39 +++++++++
>  migration/trace-events      |   5 ++
>  migration/vmstate-types.c   |  70 +++++++++++++++
>  tests/test-vmstate.c        | 170 ++++++++++++++++++++++++++++++++++++
>  5 files changed, 305 insertions(+)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index ac4f46a67d..08683d93c6 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -227,6 +227,7 @@ extern const VMStateInfo vmstate_info_tmp;
>  extern const VMStateInfo vmstate_info_bitmap;
>  extern const VMStateInfo vmstate_info_qtailq;
>  extern const VMStateInfo vmstate_info_gtree;
> +extern const VMStateInfo vmstate_info_qlist;
>  
>  #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
>  /*
> @@ -796,6 +797,26 @@ extern const VMStateInfo vmstate_info_gtree;
>      .offset       = offsetof(_state, _field),                                  \
>  }
>  
> +/*
> + * For migrating a QLIST
> + * Target QLIST needs be properly initialized.
> + * _type: type of QLIST element
> + * _next: name of QLIST_ENTRY entry field in QLIST element
> + * _vmsd: VMSD for QLIST element
> + * size: size of QLIST element
> + * start: offset of QLIST_ENTRY in QTAILQ element
> + */
> +#define VMSTATE_QLIST_V(_field, _state, _version, _vmsd, _type, _next)  \
> +{                                                                        \
> +    .name         = (stringify(_field)),                                 \
> +    .version_id   = (_version),                                          \
> +    .vmsd         = &(_vmsd),                                            \
> +    .size         = sizeof(_type),                                       \
> +    .info         = &vmstate_info_qlist,                                 \
> +    .offset       = offsetof(_state, _field),                            \
> +    .start        = offsetof(_type, _next),                              \
> +}
> +
>  /* _f : field name
>     _f_n : num of elements field_name
>     _n : num of elements
> diff --git a/include/qemu/queue.h b/include/qemu/queue.h
> index 4764d93ea3..4d4554a7ce 100644
> --- a/include/qemu/queue.h
> +++ b/include/qemu/queue.h
> @@ -501,4 +501,43 @@ union {                                                                 \
>          QTAILQ_RAW_TQH_CIRC(head)->tql_prev = QTAILQ_RAW_TQE_CIRC(elm, entry);  \
>  } while (/*CONSTCOND*/0)
>  
> +#define QLIST_RAW_FIRST(head)                                                  \
> +        field_at_offset(head, 0, void *)
> +
> +#define QLIST_RAW_NEXT(elm, entry)                                             \
> +        field_at_offset(elm, entry, void *)
> +
> +#define QLIST_RAW_PREVIOUS(elm, entry)                                         \
> +        field_at_offset(elm, entry + sizeof(void *), void *)
> +
> +#define QLIST_RAW_FOREACH(elm, head, entry)                                    \
> +        for ((elm) = *QLIST_RAW_FIRST(head);                                   \
> +             (elm);                                                            \
> +             (elm) = *QLIST_RAW_NEXT(elm, entry))
> +
> +#define QLIST_RAW_INSERT_HEAD(head, elm, entry) do {                           \
> +        void *first = *QLIST_RAW_FIRST(head);                                  \
> +        *QLIST_RAW_FIRST(head) = elm;                                          \
> +        *QLIST_RAW_PREVIOUS(elm, entry) = QLIST_RAW_FIRST(head);               \
> +        if (first) {                                                           \
> +            *QLIST_RAW_NEXT(elm, entry) = first;                               \
> +            *QLIST_RAW_PREVIOUS(first, entry) = QLIST_RAW_NEXT(elm, entry);    \
> +        } else {                                                               \
> +            *QLIST_RAW_NEXT(elm, entry) = NULL;                                \
> +        }                                                                      \
> +} while (0)
> +
> +#define QLIST_RAW_REVERSE(head, elm, entry) do {                               \
> +        void *iter = *QLIST_RAW_FIRST(head), *prev = NULL, *next;              \
> +        while (iter) {                                                         \
> +            next = *QLIST_RAW_NEXT(iter, entry);                               \
> +            *QLIST_RAW_PREVIOUS(iter, entry) = QLIST_RAW_NEXT(next, entry);    \
> +            *QLIST_RAW_NEXT(iter, entry) = prev;                               \
> +            prev = iter;                                                       \
> +            iter = next;                                                       \
> +        }                                                                      \
> +        *QLIST_RAW_FIRST(head) = prev;                                         \
> +        *QLIST_RAW_PREVIOUS(prev, entry) = QLIST_RAW_FIRST(head);              \
> +} while (0)
> +
>  #endif /* QEMU_SYS_QUEUE_H */
> diff --git a/migration/trace-events b/migration/trace-events
> index 6dee7b5389..e0a33cffca 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -76,6 +76,11 @@ get_gtree_end(const char *field_name, const char *key_vmsd_name, const char *val
>  put_gtree(const char *field_name, const char *key_vmsd_name, const char *val_vmsd_name, uint32_t nnodes) "%s(%s/%s) nnodes=%d"
>  put_gtree_end(const char *field_name, const char *key_vmsd_name, const char *val_vmsd_name, int ret) "%s(%s/%s) %d"
>  
> +get_qlist(const char *field_name, const char *vmsd_name, int version_id) "%s(%s v%d)"
> +get_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
> +put_qlist(const char *field_name, const char *vmsd_name, int version_id) "%s(%s v%d)"
> +put_qlist_end(const char *field_name, const char *vmsd_name) "%s(%s)"
> +
>  # qemu-file.c
>  qemu_file_fclose(void) ""
>  
> diff --git a/migration/vmstate-types.c b/migration/vmstate-types.c
> index 7236cf92bc..1eee36773a 100644
> --- a/migration/vmstate-types.c
> +++ b/migration/vmstate-types.c
> @@ -843,3 +843,73 @@ const VMStateInfo vmstate_info_gtree = {
>      .get  = get_gtree,
>      .put  = put_gtree,
>  };
> +
> +static int put_qlist(QEMUFile *f, void *pv, size_t unused_size,
> +                     const VMStateField *field, QJSON *vmdesc)
> +{
> +    const VMStateDescription *vmsd = field->vmsd;
> +    /* offset of the QTAILQ entry in a QTAILQ element*/
> +    size_t entry_offset = field->start;
> +    void *elm;
> +    int ret;
> +
> +    trace_put_qlist(field->name, vmsd->name, vmsd->version_id);
> +    QLIST_RAW_FOREACH(elm, pv, entry_offset) {
> +        qemu_put_byte(f, true);
> +        ret = vmstate_save_state(f, vmsd, elm, vmdesc);
> +        if (ret) {
> +            error_report("%s: failed to save %s (%d)", field->name,
> +                         vmsd->name, ret);
> +            return ret;
> +        }
> +    }
> +    qemu_put_byte(f, false);
> +    trace_put_qlist_end(field->name, vmsd->name);
> +
> +    return 0;
> +}
> +
> +static int get_qlist(QEMUFile *f, void *pv, size_t unused_size,
> +                     const VMStateField *field)
> +{
> +    int ret = 0;
> +    const VMStateDescription *vmsd = field->vmsd;
> +    /* size of a QLIST element */
> +    size_t size = field->size;
> +    /* offset of the QLIST entry in a QLIST element */
> +    size_t entry_offset = field->start;
> +    int version_id = field->version_id;
> +    void *elm;
> +
> +    trace_get_qlist(field->name, vmsd->name, vmsd->version_id);
> +    if (version_id > vmsd->version_id) {
> +        error_report("%s %s",  vmsd->name, "too new");
> +        return -EINVAL;
> +    }
> +    if (version_id < vmsd->minimum_version_id) {
> +        error_report("%s %s",  vmsd->name, "too old");
> +        return -EINVAL;
> +    }
> +
> +    while (qemu_get_byte(f)) {
> +        elm = g_malloc(size);
> +        ret = vmstate_load_state(f, vmsd, elm, version_id);
> +        if (ret) {
> +            error_report("%s: failed to load %s (%d)", field->name,
> +                         vmsd->name, ret);
> +            g_free(elm);
> +            return ret;
> +        }
> +        QLIST_RAW_INSERT_HEAD(pv, elm, entry_offset);
> +    }
> +    QLIST_RAW_REVERSE(pv, elm, entry_offset);

Can you explain why you need to do a REVERSE on the loaded list,
rather than using doing a QLIST_INSERT_AFTER to always insert at
the end?

Other than that it looks good.

Dave

> +    trace_get_qlist_end(field->name, vmsd->name);
> +
> +    return ret;
> +}
> +
> +const VMStateInfo vmstate_info_qlist = {
> +    .name = "qlist",
> +    .get  = get_qlist,
> +    .put  = put_qlist,
> +};
> diff --git a/tests/test-vmstate.c b/tests/test-vmstate.c
> index 1e5be1d4ff..9660f932b9 100644
> --- a/tests/test-vmstate.c
> +++ b/tests/test-vmstate.c
> @@ -926,6 +926,28 @@ static const VMStateDescription vmstate_domain = {
>      }
>  };
>  
> +/* test QLIST Migration */
> +
> +typedef struct TestQListElement {
> +    uint32_t  id;
> +    QLIST_ENTRY(TestQListElement) next;
> +} TestQListElement;
> +
> +typedef struct TestQListContainer {
> +    uint32_t  id;
> +    QLIST_HEAD(, TestQListElement) list;
> +} TestQListContainer;
> +
> +static const VMStateDescription vmstate_qlist_element = {
> +    .name = "test/queue list",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(id, TestQListElement),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  static const VMStateDescription vmstate_iommu = {
>      .name = "iommu",
>      .version_id = 1,
> @@ -939,6 +961,18 @@ static const VMStateDescription vmstate_iommu = {
>      }
>  };
>  
> +static const VMStateDescription vmstate_container = {
> +    .name = "test/container/qlist",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(id, TestQListContainer),
> +        VMSTATE_QLIST_V(list, TestQListContainer, 1, vmstate_qlist_element,
> +                        TestQListElement, next),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
>  uint8_t first_domain_dump[] = {
>      /* id */
>      0x00, 0x0, 0x0, 0x6,
> @@ -1229,6 +1263,140 @@ static void test_gtree_load_iommu(void)
>      qemu_fclose(fload);
>  }
>  
> +static uint8_t qlist_dump[] = {
> +    0x00, 0x00, 0x00, 0x01, /* container id */
> +    0x1, /* start of a */
> +    0x00, 0x00, 0x00, 0x0a,
> +    0x1, /* start of b */
> +    0x00, 0x00, 0x0b, 0x00,
> +    0x1, /* start of c */
> +    0x00, 0x0c, 0x00, 0x00,
> +    0x1, /* start of d */
> +    0x0d, 0x00, 0x00, 0x00,
> +    0x0, /* end of list */
> +    QEMU_VM_EOF, /* just to ensure we won't get EOF reported prematurely */
> +};
> +
> +static TestQListContainer *alloc_container(void)
> +{
> +    TestQListElement *a = g_malloc(sizeof(TestQListElement));
> +    TestQListElement *b = g_malloc(sizeof(TestQListElement));
> +    TestQListElement *c = g_malloc(sizeof(TestQListElement));
> +    TestQListElement *d = g_malloc(sizeof(TestQListElement));
> +    TestQListContainer *container = g_malloc(sizeof(TestQListContainer));
> +
> +    a->id = 0x0a;
> +    b->id = 0x0b00;
> +    c->id = 0xc0000;
> +    d->id = 0xd000000;
> +    container->id = 1;
> +
> +    QLIST_INIT(&container->list);
> +    QLIST_INSERT_HEAD(&container->list, d, next);
> +    QLIST_INSERT_HEAD(&container->list, c, next);
> +    QLIST_INSERT_HEAD(&container->list, b, next);
> +    QLIST_INSERT_HEAD(&container->list, a, next);
> +    return container;
> +}
> +
> +static void free_container(TestQListContainer *container)
> +{
> +    TestQListElement *iter, *tmp;
> +
> +    QLIST_FOREACH_SAFE(iter, &container->list, next, tmp) {
> +        QLIST_REMOVE(iter, next);
> +        g_free(iter);
> +    }
> +    g_free(container);
> +}
> +
> +static void compare_containers(TestQListContainer *c1, TestQListContainer *c2)
> +{
> +    TestQListElement *first_item_c1, *first_item_c2;
> +
> +    while (!QLIST_EMPTY(&c1->list)) {
> +        first_item_c1 = QLIST_FIRST(&c1->list);
> +        first_item_c2 = QLIST_FIRST(&c2->list);
> +        assert(first_item_c2);
> +        assert(first_item_c1->id == first_item_c2->id);
> +        QLIST_REMOVE(first_item_c1, next);
> +        QLIST_REMOVE(first_item_c2, next);
> +        g_free(first_item_c1);
> +        g_free(first_item_c2);
> +    }
> +    assert(QLIST_EMPTY(&c2->list));
> +}
> +
> +/*
> + * Check the prev & next fields are correct by doing list
> + * manipulations on the container. We will do that for both
> + * the source and the destination containers
> + */
> +static void manipulate_container(TestQListContainer *c)
> +{
> +     TestQListElement *prev, *iter = QLIST_FIRST(&c->list);
> +     TestQListElement *elem;
> +
> +     elem = g_malloc(sizeof(TestQListElement));
> +     elem->id = 0x12;
> +     QLIST_INSERT_AFTER(iter, elem, next);
> +
> +     elem = g_malloc(sizeof(TestQListElement));
> +     elem->id = 0x13;
> +     QLIST_INSERT_HEAD(&c->list, elem, next);
> +
> +     while (iter) {
> +        prev = iter;
> +        iter = QLIST_NEXT(iter, next);
> +     }
> +
> +     elem = g_malloc(sizeof(TestQListElement));
> +     elem->id = 0x14;
> +     QLIST_INSERT_BEFORE(prev, elem, next);
> +
> +     elem = g_malloc(sizeof(TestQListElement));
> +     elem->id = 0x15;
> +     QLIST_INSERT_AFTER(prev, elem, next);
> +
> +     QLIST_REMOVE(prev, next);
> +     g_free(prev);
> +}
> +
> +static void test_save_qlist(void)
> +{
> +    TestQListContainer *container = alloc_container();
> +
> +    save_vmstate(&vmstate_container, container);
> +    compare_vmstate(qlist_dump, sizeof(qlist_dump));
> +    free_container(container);
> +}
> +
> +static void test_load_qlist(void)
> +{
> +    QEMUFile *fsave, *fload;
> +    TestQListContainer *orig_container = alloc_container();
> +    TestQListContainer *dest_container = g_malloc0(sizeof(TestQListContainer));
> +    char eof;
> +
> +    QLIST_INIT(&dest_container->list);
> +
> +    fsave = open_test_file(true);
> +    qemu_put_buffer(fsave, qlist_dump, sizeof(qlist_dump));
> +    g_assert(!qemu_file_get_error(fsave));
> +    qemu_fclose(fsave);
> +
> +    fload = open_test_file(false);
> +    vmstate_load_state(fload, &vmstate_container, dest_container, 1);
> +    eof = qemu_get_byte(fload);
> +    g_assert(!qemu_file_get_error(fload));
> +    g_assert_cmpint(eof, ==, QEMU_VM_EOF);
> +    manipulate_container(orig_container);
> +    manipulate_container(dest_container);
> +    compare_containers(orig_container, dest_container);
> +    free_container(orig_container);
> +    free_container(dest_container);
> +}
> +
>  typedef struct TmpTestStruct {
>      TestStruct *parent;
>      int64_t diff;
> @@ -1353,6 +1521,8 @@ int main(int argc, char **argv)
>      g_test_add_func("/vmstate/gtree/load/loaddomain", test_gtree_load_domain);
>      g_test_add_func("/vmstate/gtree/save/saveiommu", test_gtree_save_iommu);
>      g_test_add_func("/vmstate/gtree/load/loadiommu", test_gtree_load_iommu);
> +    g_test_add_func("/vmstate/qlist/save/saveqlist", test_save_qlist);
> +    g_test_add_func("/vmstate/qlist/load/loadqlist", test_load_qlist);
>      g_test_add_func("/vmstate/tmp_struct", test_tmp_struct);
>      g_test_run();
>  
> -- 
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration
  2019-11-22 18:29 ` [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration Eric Auger
@ 2019-11-27 12:06   ` Dr. David Alan Gilbert
  2019-12-10 16:50   ` Jean-Philippe Brucker
  2019-12-10 20:01   ` Peter Xu
  2 siblings, 0 replies; 89+ messages in thread
From: Dr. David Alan Gilbert @ 2019-11-27 12:06 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, eric.auger.pro

* Eric Auger (eric.auger@redhat.com) wrote:
> Add Migration support. We rely on recently added gtree and qlist
> migration. Besides, we have to fixup end point <-> domain link.
> 
> Indeed each domain has a list of endpoints attached to it. And each
> endpoint has a pointer to its domain.
> 
> Raw gtree and qlist migration cannot handle this as it re-allocates
> all the nodes while reconstructing the trees/lists.
> 
> So in post_load we re-construct the relationship between endpoints
> and domains.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

From the migration side of things,


Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

> ---
>  hw/virtio/virtio-iommu.c | 127 ++++++++++++++++++++++++++++++++++++---
>  1 file changed, 117 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index c5b202fab7..4e92fc0c95 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -692,16 +692,6 @@ static void virtio_iommu_set_features(VirtIODevice *vdev, uint64_t val)
>      trace_virtio_iommu_set_features(dev->acked_features);
>  }
>  
> -/*
> - * Migration is not yet supported: most of the state consists
> - * of balanced binary trees which are not yet ready for getting
> - * migrated
> - */
> -static const VMStateDescription vmstate_virtio_iommu_device = {
> -    .name = "virtio-iommu-device",
> -    .unmigratable = 1,
> -};
> -
>  static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
>  {
>      uint ua = GPOINTER_TO_UINT(a);
> @@ -778,6 +768,123 @@ static void virtio_iommu_instance_init(Object *obj)
>  {
>  }
>  
> +#define VMSTATE_INTERVAL                               \
> +{                                                      \
> +    .name = "interval",                                \
> +    .version_id = 1,                                   \
> +    .minimum_version_id = 1,                           \
> +    .fields = (VMStateField[]) {                       \
> +        VMSTATE_UINT64(low, viommu_interval),          \
> +        VMSTATE_UINT64(high, viommu_interval),         \
> +        VMSTATE_END_OF_LIST()                          \
> +    }                                                  \
> +}
> +
> +#define VMSTATE_MAPPING                               \
> +{                                                     \
> +    .name = "mapping",                                \
> +    .version_id = 1,                                  \
> +    .minimum_version_id = 1,                          \
> +    .fields = (VMStateField[]) {                      \
> +        VMSTATE_UINT64(phys_addr, viommu_mapping),    \
> +        VMSTATE_UINT32(flags, viommu_mapping),        \
> +        VMSTATE_END_OF_LIST()                         \
> +    },                                                \
> +}
> +
> +static const VMStateDescription vmstate_interval_mapping[2] = {
> +    VMSTATE_MAPPING,   /* value */
> +    VMSTATE_INTERVAL   /* key   */
> +};
> +
> +static int domain_preload(void *opaque)
> +{
> +    viommu_domain *domain = opaque;
> +
> +    domain->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
> +                                       NULL, g_free, g_free);
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_endpoint = {
> +    .name = "endpoint",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(id, viommu_endpoint),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static const VMStateDescription vmstate_domain = {
> +    .name = "domain",
> +    .version_id = 1,
> +    .minimum_version_id = 1,
> +    .pre_load = domain_preload,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_UINT32(id, viommu_domain),
> +        VMSTATE_GTREE_V(mappings, viommu_domain, 1,
> +                        vmstate_interval_mapping,
> +                        viommu_interval, viommu_mapping),
> +        VMSTATE_QLIST_V(endpoint_list, viommu_domain, 1,
> +                        vmstate_endpoint, viommu_endpoint, next),
> +        VMSTATE_END_OF_LIST()
> +    }
> +};
> +
> +static gboolean reconstruct_ep_domain_link(gpointer key, gpointer value,
> +                                           gpointer data)
> +{
> +    viommu_domain *d = (viommu_domain *)value;
> +    viommu_endpoint *iter, *tmp;
> +    viommu_endpoint *ep = (viommu_endpoint *)data;
> +
> +    QLIST_FOREACH_SAFE(iter, &d->endpoint_list, next, tmp) {
> +        if (iter->id == ep->id) {
> +            /* remove the ep */
> +            QLIST_REMOVE(iter, next);
> +            g_free(iter);
> +            /* replace it with the good one */
> +            QLIST_INSERT_HEAD(&d->endpoint_list, ep, next);
> +            /* update the domain */
> +            ep->domain = d;
> +            return true; /* stop the search */
> +        }
> +    }
> +    return false; /* continue the traversal */
> +}
> +
> +static gboolean fix_endpoint(gpointer key, gpointer value, gpointer data)
> +{
> +    VirtIOIOMMU *s = (VirtIOIOMMU *)data;
> +
> +    g_tree_foreach(s->domains, reconstruct_ep_domain_link, value);
> +    return false;
> +}
> +
> +static int iommu_post_load(void *opaque, int version_id)
> +{
> +    VirtIOIOMMU *s = opaque;
> +
> +    g_tree_foreach(s->endpoints, fix_endpoint, s);
> +    return 0;
> +}
> +
> +static const VMStateDescription vmstate_virtio_iommu_device = {
> +    .name = "virtio-iommu-device",
> +    .minimum_version_id = 1,
> +    .version_id = 1,
> +    .post_load = iommu_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_GTREE_DIRECT_KEY_V(domains, VirtIOIOMMU, 1,
> +                                   &vmstate_domain, viommu_domain),
> +        VMSTATE_GTREE_DIRECT_KEY_V(endpoints, VirtIOIOMMU, 1,
> +                                   &vmstate_endpoint, viommu_endpoint),
> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +
>  static const VMStateDescription vmstate_virtio_iommu = {
>      .name = "virtio-iommu",
>      .minimum_version_id = 1,
> -- 
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 02/20] virtio-iommu: Add skeleton
  2019-11-22 18:29 ` [PATCH for-5.0 v11 02/20] virtio-iommu: Add skeleton Eric Auger
@ 2019-12-10 16:31   ` Jean-Philippe Brucker
  2019-12-19 10:31     ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:31 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Eric,

On Fri, Nov 22, 2019 at 07:29:25PM +0100, Eric Auger wrote:
> +typedef struct VirtIOIOMMU {
> +    VirtIODevice parent_obj;
> +    VirtQueue *req_vq;
> +    VirtQueue *event_vq;
> +    struct virtio_iommu_config config;
> +    uint64_t features;
> +    uint64_t acked_features;

We already have guest_features in the parent object.

> +    GHashTable *as_by_busptr;
> +    IOMMUPciBus *as_by_bus_num[IOMMU_PCI_BUS_MAX];

Doesn't seem used anymore.

Thanks,
Jean

> +    PCIBus *primary_bus;
> +    GTree *domains;
> +    QemuMutex mutex;
> +    GTree *endpoints;
> +} VirtIOIOMMU;
> +
> +#endif
> -- 
> 2.20.1
> 
> 


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 03/20] virtio-iommu: Decode the command payload
  2019-11-22 18:29 ` [PATCH for-5.0 v11 03/20] virtio-iommu: Decode the command payload Eric Auger
@ 2019-12-10 16:32   ` Jean-Philippe Brucker
  2019-12-10 19:14   ` Peter Xu
  1 sibling, 0 replies; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:32 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:26PM +0100, Eric Auger wrote:
> This patch adds the command payload decoding and
> introduces the functions that will do the actual
> command handling. Those functions are not yet implemented.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

Which isn't worth much since I don't have prior QEMU experience but I did
stare at this code for a while and work on future extensions.

Thanks,
Jean



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 04/20] virtio-iommu: Add the iommu regions
  2019-11-22 18:29 ` [PATCH for-5.0 v11 04/20] virtio-iommu: Add the iommu regions Eric Auger
@ 2019-12-10 16:34   ` Jean-Philippe Brucker
  2019-12-19 18:11     ` Auger Eric
  2019-12-10 19:18   ` Peter Xu
  1 sibling, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:34 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Two small things below, but looks good overall

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

On Fri, Nov 22, 2019 at 07:29:27PM +0100, Eric Auger wrote:
> +static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
> +                                              int devfn)
> +{
> +    VirtIOIOMMU *s = opaque;
> +    IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, bus);
> +    static uint32_t mr_index;
> +    IOMMUDevice *sdev;
> +
> +    if (!sbus) {
> +        sbus = g_malloc0(sizeof(IOMMUPciBus) +
> +                         sizeof(IOMMUDevice *) * IOMMU_PCI_DEVFN_MAX);
> +        sbus->bus = bus;
> +        g_hash_table_insert(s->as_by_busptr, bus, sbus);
> +    }
> +
> +    sdev = sbus->pbdev[devfn];
> +    if (!sdev) {
> +        char *name = g_strdup_printf("%s-%d-%d",
> +                                     TYPE_VIRTIO_IOMMU_MEMORY_REGION,
> +                                     mr_index++, devfn);
> +        sdev = sbus->pbdev[devfn] = g_malloc0(sizeof(IOMMUDevice));
> +
> +        sdev->viommu = s;
> +        sdev->bus = bus;
> +        sdev->devfn = devfn;

It might be better to store the endpoint ID in IOMMUDevice, then you could
get rid of virtio_iommu_get_sid(), and remove a tiny bit of overhead in
virtio_iommu_translate(). But I doubt it's significant.

[...]
> +static const TypeInfo virtio_iommu_memory_region_info = {
> +    .parent = TYPE_IOMMU_MEMORY_REGION,
> +    .name = TYPE_VIRTIO_IOMMU_MEMORY_REGION,
> +    .class_init = virtio_iommu_memory_region_class_init,
> +};
> +
> +

nit: newline.

Thanks,
Jean 


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 05/20] virtio-iommu: Endpoint and domains structs and helpers
  2019-11-22 18:29 ` [PATCH for-5.0 v11 05/20] virtio-iommu: Endpoint and domains structs and helpers Eric Auger
@ 2019-12-10 16:37   ` Jean-Philippe Brucker
  2019-12-19 18:31     ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:37 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:28PM +0100, Eric Auger wrote:
> +typedef struct viommu_domain {
> +    uint32_t id;
> +    GTree *mappings;
> +    QLIST_HEAD(, viommu_endpoint) endpoint_list;
> +} viommu_domain;
> +
> +typedef struct viommu_endpoint {
> +    uint32_t id;
> +    viommu_domain *domain;
> +    QLIST_ENTRY(viommu_endpoint) next;
> +} viommu_endpoint;

There might be a way to merge viommu_endpoint and the IOMMUDevice
structure introduced in patch 4, since they both represent one endpoint.
Maybe virtio_iommu_find_add_pci_as() could add the IOMMUDevice to
s->endpoints, and IOMMUDevice could store the endpoint ID rather than bus
and devfn.

> +typedef struct viommu_interval {
> +    uint64_t low;
> +    uint64_t high;
> +} viommu_interval;

I guess these should be named in CamelCase? Although if we're allowed to
choose my vote goes to underscores :)

Thanks,
Jean


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 06/20] virtio-iommu: Implement attach/detach command
  2019-11-22 18:29 ` [PATCH for-5.0 v11 06/20] virtio-iommu: Implement attach/detach command Eric Auger
@ 2019-12-10 16:41   ` Jean-Philippe Brucker
  2019-12-23  9:14     ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:41 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:29PM +0100, Eric Auger wrote:
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 235bde2203..138d5b2a9c 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -77,11 +77,12 @@ static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
>  static void virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
>  {
>      QLIST_REMOVE(ep, next);
> +    g_tree_unref(ep->domain->mappings);
>      ep->domain = NULL;
>  }
>  
> -viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id);
> -viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id)
> +static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
> +                                                  uint32_t ep_id)
>  {
>      viommu_endpoint *ep;
>  
> @@ -102,15 +103,14 @@ static void virtio_iommu_put_endpoint(gpointer data)
>  
>      if (ep->domain) {
>          virtio_iommu_detach_endpoint_from_domain(ep);
> -        g_tree_unref(ep->domain->mappings);
>      }
>  
>      trace_virtio_iommu_put_endpoint(ep->id);
>      g_free(ep);
>  }
>  
> -viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id);
> -viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id)
> +static viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s,
> +                                              uint32_t domain_id)

Looks like the above change belong to patch 5?

>  {
>      viommu_domain *domain;
>  
> @@ -137,7 +137,6 @@ static void virtio_iommu_put_domain(gpointer data)
>      QLIST_FOREACH_SAFE(iter, &domain->endpoint_list, next, tmp) {
>          virtio_iommu_detach_endpoint_from_domain(iter);
>      }
> -    g_tree_destroy(domain->mappings);

When created by virtio_iommu_get_domain(), mappings has one reference.
Then for each attach (including the first one) an additional reference is
taken, and freed by virtio_iommu_detach_endpoint_from_domain(). So I think
there are two problems:

* virtio_iommu_put_domain() drops one ref for each endpoint, but we still
  have one reference to mappings, so they're not freed. We do need this
  g_tree_destroy()

* After detaching all the endpoints, the guest may reuse the domain ID for
  another domain, but the previous mappings haven't been erased. Not sure
  how to fix this using the g_tree refs, because dropping all the
  references will free the internal tree data and it won't be reusable.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 07/20] virtio-iommu: Implement map/unmap
  2019-11-22 18:29 ` [PATCH for-5.0 v11 07/20] virtio-iommu: Implement map/unmap Eric Auger
@ 2019-12-10 16:43   ` Jean-Philippe Brucker
  2019-12-23  9:42     ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:43 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:30PM +0100, Eric Auger wrote:
> @@ -238,10 +244,35 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
>      uint64_t virt_start = le64_to_cpu(req->virt_start);
>      uint64_t virt_end = le64_to_cpu(req->virt_end);
>      uint32_t flags = le32_to_cpu(req->flags);
> +    viommu_domain *domain;
> +    viommu_interval *interval;
> +    viommu_mapping *mapping;

Additional checks would be good. Most importantly we need to return
S_INVAL if we don't recognize a bit in flags (a MUST in the spec). It
might be good to check that addresses are aligned on the page granule as
well, and return S_RANGE if they aren't (a SHOULD in the spec), but I
don't care as much.

> +
> +    interval = g_malloc0(sizeof(*interval));
> +
> +    interval->low = virt_start;
> +    interval->high = virt_end;
> +
> +    domain = g_tree_lookup(s->domains, GUINT_TO_POINTER(domain_id));
> +    if (!domain) {
> +        return VIRTIO_IOMMU_S_NOENT;

Leaks interval, I guess you could allocate it after this block.

Thanks,
Jean

> +    }
> +
> +    mapping = g_tree_lookup(domain->mappings, (gpointer)interval);
> +    if (mapping) {
> +        g_free(interval);
> +        return VIRTIO_IOMMU_S_INVAL;
> +    }
>  
>      trace_virtio_iommu_map(domain_id, virt_start, virt_end, phys_start, flags);
>  
> -    return VIRTIO_IOMMU_S_UNSUPP;
> +    mapping = g_malloc0(sizeof(*mapping));
> +    mapping->phys_addr = phys_start;
> +    mapping->flags = flags;
> +
> +    g_tree_insert(domain->mappings, interval, mapping);
> +
> +    return VIRTIO_IOMMU_S_OK;


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-11-22 18:29 ` [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate Eric Auger
@ 2019-12-10 16:43   ` Jean-Philippe Brucker
  2019-12-10 19:33   ` Peter Xu
  1 sibling, 0 replies; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:43 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:31PM +0100, Eric Auger wrote:
> This patch implements the translate callback
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 09/20] virtio-iommu: Implement fault reporting
  2019-11-22 18:29 ` [PATCH for-5.0 v11 09/20] virtio-iommu: Implement fault reporting Eric Auger
@ 2019-12-10 16:44   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:44 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:32PM +0100, Eric Auger wrote:
> @@ -443,6 +489,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>      if (!ep) {
>          if (!bypass_allowed) {
>              error_report_once("%s sid=%d is not known!!", __func__, sid);
> +            virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_UNKNOWN,
> +                                      0, sid, 0);

I guess we could report the faulting address as well, it can be useful for
diagnostics.

>          } else {
>              entry.perm = flag;
>          }
> @@ -455,6 +503,8 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>                            "%s %02x:%02x.%01x not attached to any domain\n",
>                            __func__, PCI_BUS_NUM(sid),
>                            PCI_SLOT(sid), PCI_FUNC(sid));
> +            virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_DOMAIN,
> +                                      0, sid, 0);

Here as well, especially since that error would get propagated by a linux
guest to the device driver

>          } else {
>              entry.perm = flag;
>          }
> @@ -468,16 +518,25 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>          qemu_log_mask(LOG_GUEST_ERROR,
>                        "%s no mapping for 0x%"PRIx64" for sid=%d\n",
>                        __func__, addr, sid);
> +        virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_MAPPING,
> +                                  0, sid, addr);

Flag VIRTIO_IOMMU_FAULT_F_ADDRESS denotes a valid address field

Thanks,
Jean


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 10/20] virtio-iommu-pci: Add virtio iommu pci support
  2019-11-22 18:29 ` [PATCH for-5.0 v11 10/20] virtio-iommu-pci: Add virtio iommu pci support Eric Auger
@ 2019-12-10 16:44   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:44 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:33PM +0100, Eric Auger wrote:
> This patch adds virtio-iommu-pci, which is the pci proxy for
> the virtio-iommu device.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 11/20] hw/arm/virt: Add the virtio-iommu device tree mappings
  2019-11-22 18:29 ` [PATCH for-5.0 v11 11/20] hw/arm/virt: Add the virtio-iommu device tree mappings Eric Auger
@ 2019-12-10 16:45   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:45 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:34PM +0100, Eric Auger wrote:
> diff --git a/hw/virtio/virtio-iommu-pci.c b/hw/virtio/virtio-iommu-pci.c
> index 280230b31e..4cfae1f9df 100644
> --- a/hw/virtio/virtio-iommu-pci.c
> +++ b/hw/virtio/virtio-iommu-pci.c
> @@ -31,9 +31,6 @@ struct VirtIOIOMMUPCI {
>  
>  static Property virtio_iommu_pci_properties[] = {
>      DEFINE_PROP_UINT32("class", VirtIOPCIProxy, class_code, 0),
> -    DEFINE_PROP_ARRAY("reserved-regions", VirtIOIOMMUPCI,
> -                      vdev.nb_reserved_regions, vdev.reserved_regions,
> -                      qdev_prop_interval, Interval),

Belongs in patch 10?

Apart from that 
Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 13/20] virtio-iommu: Implement probe request
  2019-11-22 18:29 ` [PATCH for-5.0 v11 13/20] virtio-iommu: Implement probe request Eric Auger
@ 2019-12-10 16:46   ` Jean-Philippe Brucker
  2019-12-10 19:36   ` Peter Xu
  1 sibling, 0 replies; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:46 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:36PM +0100, Eric Auger wrote:
> This patch implements the PROBE request. At the moment,
> no reserved regions are returned as none are registered
> per device. Only a NONE property is returned.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 14/20] virtio-iommu: Handle reserved regions in the translation process
  2019-11-22 18:29 ` [PATCH for-5.0 v11 14/20] virtio-iommu: Handle reserved regions in the translation process Eric Auger
@ 2019-12-10 16:46   ` Jean-Philippe Brucker
  2019-12-10 19:39   ` Peter Xu
  1 sibling, 0 replies; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:46 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:37PM +0100, Eric Auger wrote:
> +    for (i = 0; i < s->nb_reserved_regions; i++) {
> +        if (interval.low >= s->reserved_regions[i].low &&
> +            interval.low <= s->reserved_regions[i].high) {
> +            switch (s->reserved_regions[i].type) {
> +            case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> +                entry.perm = flag;
> +                goto unlock;
> +            case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> +            default:
> +                virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_MAPPING,
> +                                          0, sid, addr);

Needs the VIRTIO_IOMMU_FAULT_F_ADDRESS flag.

Thanks,
Jean



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 15/20] virtio-iommu-pci: Add array of Interval properties
  2019-11-22 18:29 ` [PATCH for-5.0 v11 15/20] virtio-iommu-pci: Add array of Interval properties Eric Auger
@ 2019-12-10 16:47   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:47 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:38PM +0100, Eric Auger wrote:
> The machine may need to pass reserved regions to the
> virtio-iommu-pci device (such as the MSI window on x86).
> So let's add an array of Interval properties.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 16/20] hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper
  2019-11-22 18:29 ` [PATCH for-5.0 v11 16/20] hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper Eric Auger
@ 2019-12-10 16:47   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:47 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:39PM +0100, Eric Auger wrote:
>  build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>  {
> @@ -426,13 +437,12 @@ build_iort(GArray *table_data, BIOSLinker *linker, VirtMachineState *vms)
>          smmu->gerr_gsiv = cpu_to_le32(irq + 2);
>          smmu->sync_gsiv = cpu_to_le32(irq + 3);
>  
> -        /* Identity RID mapping covering the whole input RID range */
> -        idmap = &smmu->id_mapping_array[0];
> -        idmap->input_base = 0;
> -        idmap->id_count = cpu_to_le32(0xFFFF);
> -        idmap->output_base = 0;
> -        /* output IORT node is the ITS group node (the first node) */
> -        idmap->output_reference = cpu_to_le32(iort_node_offset);
> +        /*
> +         * Identity RID mapping covering the whole input RID range.
> +         * The output IORT node is the ITS group node (the first node).
> +         */
> +        fill_iort_idmap(smmu->id_mapping_array, 0, 0, 0xffff, 0,

nit: the other calls use uppercase hex digits

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 17/20] hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table
  2019-11-22 18:29 ` [PATCH for-5.0 v11 17/20] hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table Eric Auger
@ 2019-12-10 16:48   ` Jean-Philippe Brucker
  0 siblings, 0 replies; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:48 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:40PM +0100, Eric Auger wrote:
> This patch builds the virtio-iommu node in the ACPI IORT table.
> 
> The RID space of the root complex, which spans 0x0-0x10000
> maps to streamid space 0x0-0x10000 in the virtio-iommu which in
> turn maps to deviceid space 0x0-0x10000 in the ITS group.
> 
> The iommu RID is excluded as described in virtio-iommu
> specification.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

Although VIOT changes the layout of the IORT node slightly, the
implementation should stay pretty much the same.


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration
  2019-11-22 18:29 ` [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration Eric Auger
  2019-11-27 12:06   ` Dr. David Alan Gilbert
@ 2019-12-10 16:50   ` Jean-Philippe Brucker
  2019-12-19 11:03     ` Auger Eric
  2019-12-10 20:01   ` Peter Xu
  2 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:50 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:41PM +0100, Eric Auger wrote:
> +static const VMStateDescription vmstate_virtio_iommu_device = {
> +    .name = "virtio-iommu-device",
> +    .minimum_version_id = 1,
> +    .version_id = 1,
> +    .post_load = iommu_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_GTREE_DIRECT_KEY_V(domains, VirtIOIOMMU, 1,
> +                                   &vmstate_domain, viommu_domain),
> +        VMSTATE_GTREE_DIRECT_KEY_V(endpoints, VirtIOIOMMU, 1,
> +                                   &vmstate_endpoint, viommu_endpoint),

So if I understand correctly these fields are state that is modified by
the guest? We don't need to save/load fields that cannot be modified by
the guest, static information that is created from the QEMU command-line. 

I think the above covers everything we need to migrate in VirtIOIOMMU
then, except for acked_features, which (as I pointed out on another patch)
seems redundant anyway since there is vdev->guest_features.

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>

> +        VMSTATE_END_OF_LIST()
> +    },
> +};
> +
> +
>  static const VMStateDescription vmstate_virtio_iommu = {
>      .name = "virtio-iommu",
>      .minimum_version_id = 1,
> -- 
> 2.20.1
> 
> 


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci
  2019-11-22 18:29 ` [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci Eric Auger
@ 2019-12-10 16:50   ` Jean-Philippe Brucker
  2019-12-24  7:39     ` Auger Eric
  2020-01-09 12:02   ` Michael S. Tsirkin
  1 sibling, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-10 16:50 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:42PM +0100, Eric Auger wrote:
> The virtio-iommu-pci is instantiated through the -device QEMU
> option. However if instantiated it also requires an IORT ACPI table
> to describe the ID mappings between the root complex and the iommu.
> 
> This patch adds the generation of the IORT table if the
> virtio-iommu-pci device is instantiated.
> 
> We also declare the [0xfee00000 - 0xfeefffff] MSI reserved region
> so that it gets bypassed by the IOMMU.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

It would be nice to factor the IORT code with arm, but this looks OK.

Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 03/20] virtio-iommu: Decode the command payload
  2019-11-22 18:29 ` [PATCH for-5.0 v11 03/20] virtio-iommu: Decode the command payload Eric Auger
  2019-12-10 16:32   ` Jean-Philippe Brucker
@ 2019-12-10 19:14   ` Peter Xu
  1 sibling, 0 replies; 89+ messages in thread
From: Peter Xu @ 2019-12-10 19:14 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:26PM +0100, Eric Auger wrote:
> This patch adds the command payload decoding and
> introduces the functions that will do the actual
> command handling. Those functions are not yet implemented.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

I would simply squash this into previous patch to avoid removing lines
from newly introduced, but this is ok too so to keep Jean's r-b:

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 04/20] virtio-iommu: Add the iommu regions
  2019-11-22 18:29 ` [PATCH for-5.0 v11 04/20] virtio-iommu: Add the iommu regions Eric Auger
  2019-12-10 16:34   ` Jean-Philippe Brucker
@ 2019-12-10 19:18   ` Peter Xu
  1 sibling, 0 replies; 89+ messages in thread
From: Peter Xu @ 2019-12-10 19:18 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:27PM +0100, Eric Auger wrote:
> This patch initializes the iommu memory regions so that
> PCIe end point transactions get translated. The translation
> function is not yet implemented though.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Either with/without Jean's comment addressed:

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-11-22 18:29 ` [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate Eric Auger
  2019-12-10 16:43   ` Jean-Philippe Brucker
@ 2019-12-10 19:33   ` Peter Xu
  2019-12-19 10:30     ` Auger Eric
  1 sibling, 1 reply; 89+ messages in thread
From: Peter Xu @ 2019-12-10 19:33 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:31PM +0100, Eric Auger wrote:
> This patch implements the translate callback
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v10 -> v11:
> - take into account the new value struct and use
>   g_tree_lookup_extended
> - switched to error_report_once
> 
> v6 -> v7:
> - implemented bypass-mode
> 
> v5 -> v6:
> - replace error_report by qemu_log_mask
> 
> v4 -> v5:
> - check the device domain is not NULL
> - s/printf/error_report
> - set flags to IOMMU_NONE in case of all translation faults
> ---
>  hw/virtio/trace-events   |  1 +
>  hw/virtio/virtio-iommu.c | 63 +++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> index f25359cee2..de7cbb3c8f 100644
> --- a/hw/virtio/trace-events
> +++ b/hw/virtio/trace-events
> @@ -72,3 +72,4 @@ virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
>  virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
>  virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
>  virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
> +virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index f0a56833a2..a83666557b 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -412,19 +412,80 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>                                              int iommu_idx)
>  {
>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
> +    viommu_interval interval, *mapping_key;
> +    viommu_mapping *mapping_value;
> +    VirtIOIOMMU *s = sdev->viommu;
> +    viommu_endpoint *ep;
> +    bool bypass_allowed;
>      uint32_t sid;
> +    bool found;
> +
> +    interval.low = addr;
> +    interval.high = addr + 1;
>  
>      IOMMUTLBEntry entry = {
>          .target_as = &address_space_memory,
>          .iova = addr,
>          .translated_addr = addr,
> -        .addr_mask = ~(hwaddr)0,
> +        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
>          .perm = IOMMU_NONE,
>      };
>  
> +    bypass_allowed = virtio_has_feature(s->acked_features,
> +                                        VIRTIO_IOMMU_F_BYPASS);
> +

Would it be easier to check bypass_allowed here once and then drop the
latter [1] and [2] check?

>      sid = virtio_iommu_get_sid(sdev);
>  
>      trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
> +    qemu_mutex_lock(&s->mutex);
> +
> +    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
> +    if (!ep) {
> +        if (!bypass_allowed) {

[1]

> +            error_report_once("%s sid=%d is not known!!", __func__, sid);
> +        } else {
> +            entry.perm = flag;
> +        }
> +        goto unlock;
> +    }
> +
> +    if (!ep->domain) {
> +        if (!bypass_allowed) {

[2]

> +            qemu_log_mask(LOG_GUEST_ERROR,
> +                          "%s %02x:%02x.%01x not attached to any domain\n",
> +                          __func__, PCI_BUS_NUM(sid),
> +                          PCI_SLOT(sid), PCI_FUNC(sid));
> +        } else {
> +            entry.perm = flag;
> +        }
> +        goto unlock;
> +    }
> +
> +    found = g_tree_lookup_extended(ep->domain->mappings, (gpointer)(&interval),
> +                                   (void **)&mapping_key,
> +                                   (void **)&mapping_value);
> +    if (!found) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "%s no mapping for 0x%"PRIx64" for sid=%d\n",
> +                      __func__, addr, sid);

I would still suggest that we use the same logging interface (either
error_report_once() or qemu_log_mask(), not use them randomly).

> +        goto unlock;
> +    }
> +
> +    if (((flag & IOMMU_RO) &&
> +            !(mapping_value->flags & VIRTIO_IOMMU_MAP_F_READ)) ||
> +        ((flag & IOMMU_WO) &&
> +            !(mapping_value->flags & VIRTIO_IOMMU_MAP_F_WRITE))) {
> +        qemu_log_mask(LOG_GUEST_ERROR,
> +                      "Permission error on 0x%"PRIx64"(%d): allowed=%d\n",
> +                      addr, flag, mapping_value->flags);

(Btw, IIUC this may not be a guest error. Say, what if the device is
 simply broken?)

> +        goto unlock;
> +    }
> +    entry.translated_addr = addr - mapping_key->low + mapping_value->phys_addr;
> +    entry.perm = flag;
> +    trace_virtio_iommu_translate_out(addr, entry.translated_addr, sid);
> +
> +unlock:
> +    qemu_mutex_unlock(&s->mutex);
>      return entry;
>  }
>  
> -- 
> 2.20.1
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 13/20] virtio-iommu: Implement probe request
  2019-11-22 18:29 ` [PATCH for-5.0 v11 13/20] virtio-iommu: Implement probe request Eric Auger
  2019-12-10 16:46   ` Jean-Philippe Brucker
@ 2019-12-10 19:36   ` Peter Xu
  1 sibling, 0 replies; 89+ messages in thread
From: Peter Xu @ 2019-12-10 19:36 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:36PM +0100, Eric Auger wrote:
> This patch implements the PROBE request. At the moment,
> no reserved regions are returned as none are registered
> per device. Only a NONE property is returned.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 14/20] virtio-iommu: Handle reserved regions in the translation process
  2019-11-22 18:29 ` [PATCH for-5.0 v11 14/20] virtio-iommu: Handle reserved regions in the translation process Eric Auger
  2019-12-10 16:46   ` Jean-Philippe Brucker
@ 2019-12-10 19:39   ` Peter Xu
  1 sibling, 0 replies; 89+ messages in thread
From: Peter Xu @ 2019-12-10 19:39 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:37PM +0100, Eric Auger wrote:
> When translating an address we need to check if it belongs to
> a reserved virtual address range. If it does, there are 2 cases:
> 
> - it belongs to a RESERVED region: the guest should neither use
>   this address in a MAP not instruct the end-point to DMA on
>   them. We report an error
> 
> - It belongs to an MSI region: we bypass the translation.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> ---
> 
> v10 -> v11:
> - directly use the reserved_regions properties array
> 
> v9 -> v10:
> - in case of MSI region, we immediatly return
> ---
>  hw/virtio/virtio-iommu.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index 1ce2218935..c5b202fab7 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -548,6 +548,7 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>      uint32_t sid, flags;
>      bool bypass_allowed;
>      bool found;
> +    int i;
>  
>      interval.low = addr;
>      interval.high = addr + 1;
> @@ -580,6 +581,22 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>          goto unlock;
>      }
>  
> +    for (i = 0; i < s->nb_reserved_regions; i++) {
> +        if (interval.low >= s->reserved_regions[i].low &&
> +            interval.low <= s->reserved_regions[i].high) {
> +            switch (s->reserved_regions[i].type) {
> +            case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> +                entry.perm = flag;
> +                goto unlock;

Might be a bit clearer to break here instead of goto, then..

> +            case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:

               /* Passthrough */

> +            default:
> +                virtio_iommu_report_fault(s, VIRTIO_IOMMU_FAULT_R_MAPPING,
> +                                          0, sid, addr);
> +            goto unlock;

.. do the same thing here, and...

> +           }

.. goto unlock here..

> +        }
> +    }
> +
>      if (!ep->domain) {
>          if (!bypass_allowed) {
>              qemu_log_mask(LOG_GUEST_ERROR,
> -- 
> 2.20.1
> 

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration
  2019-11-22 18:29 ` [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration Eric Auger
  2019-11-27 12:06   ` Dr. David Alan Gilbert
  2019-12-10 16:50   ` Jean-Philippe Brucker
@ 2019-12-10 20:01   ` Peter Xu
  2019-12-24  7:39     ` Auger Eric
  2 siblings, 1 reply; 89+ messages in thread
From: Peter Xu @ 2019-12-10 20:01 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:41PM +0100, Eric Auger wrote:
> +static const VMStateDescription vmstate_virtio_iommu_device = {
> +    .name = "virtio-iommu-device",
> +    .minimum_version_id = 1,
> +    .version_id = 1,
> +    .post_load = iommu_post_load,
> +    .fields = (VMStateField[]) {
> +        VMSTATE_GTREE_DIRECT_KEY_V(domains, VirtIOIOMMU, 1,
> +                                   &vmstate_domain, viommu_domain),
> +        VMSTATE_GTREE_DIRECT_KEY_V(endpoints, VirtIOIOMMU, 1,
> +                                   &vmstate_endpoint, viommu_endpoint),

IIUC vmstate_domain already contains all the endpoint information (in
endpoint_list of vmstate_domain), but here we migrate it twice.  I
suppose that's why now we need reconstruct_ep_domain_link() to fixup
the duplicated migration?

Then I'll instead ask whether we can skip migrating here?  Then in
post_load we simply:

  foreach(domain)
    foreach(endpoint in domain)
      g_tree_insert(s->endpoints);

It might help to avoid the reconstruct_ep_domain_link ugliness?

And besides, I also agree with Jean that the endpoint data structure
could be reused with IOMMUDevice somehow.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device
  2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
                   ` (20 preceding siblings ...)
  2019-11-22 21:56 ` [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device no-reply
@ 2019-12-11 16:40 ` Michael S. Tsirkin
  2019-12-11 16:48   ` Auger Eric
  21 siblings, 1 reply; 89+ messages in thread
From: Michael S. Tsirkin @ 2019-12-11 16:40 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, quintela,
	jean-philippe.brucker, qemu-devel, peterx, armbru, bharatb.linux,
	qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:23PM +0100, Eric Auger wrote:
> This series implements the QEMU virtio-iommu device.
> 
> This matches the v0.12 spec and the corresponding virtio-iommu
> driver upstreamed in 5.3.
> 
> The pci proxy for the virtio-iommu device is instantiated using
> "-device virtio-iommu-pci". This series still relies on ACPI IORT/DT
> integration. Note the ACPI IORT integration is not yet upstreamed
> and testing needs to be based on Jean-Philippe's additional
> kernel patches [1].

Or the config space approach? I really liked that one.

> 
> Work is ongoing to remove IORT adherence and allow the
> bindings between the IOMMU and the root complex to be defined
> and written into the PCI device configuration space. The outcome
> of this work is uncertain at this stage though. See [2].
> 
> So only patches 1-11 fully rely on upstreamed kernel code. Others
> should be considered as RFC.
> 
> This respin allows people to test on ARM and x86. It also
> brings migration support (tested on ARM) and various cleanups.
> Reserved regions are now passed through an array of properties.
> A libqos test also is introduced to test the virtio-iommu API.
> 
> Note integration with vhost devices and vfio devices is not part
> of this series. Please follow Bharat's respins [3].
> 
> The 1st Patch ("migration: Support QLIST migration") was sent
> separately [4].
> 
> Best Regards
> 
> Eric
> 
> This series can be found at:
> https://github.com/eauger/qemu/tree/v4.2-rc2-virtio-iommu-v11
> 
> [1] kernel branch to be used for guest
>     https://github.com/eauger/linux/tree/v5.4-rc8-virtio-iommu-iort
> [2] [RFC 00/13] virtio-iommu on non-devicetree platforms
> [3] VFIO/VHOST integration is not part of this series. Please follow
>     [PATCH RFC v5 0/5] virtio-iommu: VFIO integration respins
> [4] [PATCH v6] migration: Support QLIST migration
> 
> Testing:
> - tested with guest using virtio-net-pci
>   (,vhost=off,iommu_platform,disable-modern=off,disable-legacy=on)
>   and virtio-blk-pci
> - migration on ARM
> - on x86 PC machine I get some AHCI non translated transactions,
>   very early. This does not prevent the guest from boot and behaving
>   properly. Warnings look like:
> qemu-system-x86_64: virtio_iommu_translate sid=250 is not known!!
> qemu-system-x86_64: no buffer available in event queue to report event
> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
> receive buffer address
> 
> History:
> 
> v10 -> v11:
> - introduce virtio_iommu_handle_req macro
> - migration support
> - introduce DEFINE_PROP_INTERVAL and pass reserved regions
>   through an array of those
> - domain gtree simplification
> 
> v9 -> v10:
> - rebase on 4.1.0-rc2, compliance with 0.12 spec
> - removed ACPI part
> - cleanup (see individual change logs)
> - moved to a PATCH series
> 
> v8 -> v9:
> - virtio-iommu-pci device needs to be instantiated from the command
>   line (RID is not imposed anymore).
> - tail structure properly initialized
> 
> v7 -> v8:
> - virtio-iommu-pci added
> - virt instantiation modified
> - DT and ACPI modified to exclude the iommu RID from the mapping
> - VIRTIO_IOMMU_F_BYPASS, VIRTIO_F_VERSION_1 features exposed
> 
> v6 -> v7:
> - rebase on qemu 3.0.0-rc3
> - minor update against v0.7
> - fix issue with EP not on pci.0 and ACPI probing
> - change the instantiation method
> 
> v5 -> v6:
> - minor update against v0.6 spec
> - fix g_hash_table_lookup in virtio_iommu_find_add_as
> - replace some error_reports by qemu_log_mask(LOG_GUEST_ERROR, ...)
> 
> v4 -> v5:
> - event queue and fault reporting
> - we now return the IOAPIC MSI region if the virtio-iommu is instantiated
>   in a PC machine.
> - we bypass transactions on MSI HW region and fault on reserved ones.
> - We support ACPI boot with mach-virt (based on IORT proposal)
> - We moved to the new driver naming conventions
> - simplified mach-virt instantiation
> - worked around the disappearing of pci_find_primary_bus
> - in virtio_iommu_translate, check the dev->as is not NULL
> - initialize as->device_list in virtio_iommu_get_as
> - initialize bufstate.error to false in virtio_iommu_probe
> 
> v3 -> v4:
> - probe request support although no reserved region is returned at
>   the moment
> - unmap semantics less strict, as specified in v0.4
> - device registration, attach/detach revisited
> - split into smaller patches to ease review
> - propose a way to inform the IOMMU mr about the page_size_mask
>   of underlying HW IOMMU, if any
> - remove warning associated with the translation of the MSI doorbell
> 
> v2 -> v3:
> - rebase on top of 2.10-rc0 and especially
>   [PATCH qemu v9 0/2] memory/iommu: QOM'fy IOMMU MemoryRegion
> - add mutex init
> - fix as->mappings deletion using g_tree_ref/unref
> - when a dev is attached whereas it is already attached to
>   another address space, first detach it
> - fix some error values
> - page_sizes = TARGET_PAGE_MASK;
> - I haven't changed the unmap() semantics yet, waiting for the
>   next virtio-iommu spec revision.
> 
> v1 -> v2:
> - fix redefinition of viommu_as typedef
> 
> 
> 
> Eric Auger (20):
>   migration: Support QLIST migration
>   virtio-iommu: Add skeleton
>   virtio-iommu: Decode the command payload
>   virtio-iommu: Add the iommu regions
>   virtio-iommu: Endpoint and domains structs and helpers
>   virtio-iommu: Implement attach/detach command
>   virtio-iommu: Implement map/unmap
>   virtio-iommu: Implement translate
>   virtio-iommu: Implement fault reporting
>   virtio-iommu-pci: Add virtio iommu pci support
>   hw/arm/virt: Add the virtio-iommu device tree mappings
>   qapi: Introduce DEFINE_PROP_INTERVAL
>   virtio-iommu: Implement probe request
>   virtio-iommu: Handle reserved regions in the translation process
>   virtio-iommu-pci: Add array of Interval properties
>   hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper
>   hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table
>   virtio-iommu: Support migration
>   pc: Add support for virtio-iommu-pci
>   tests: Add virtio-iommu test
> 
>  hw/arm/virt-acpi-build.c         |  91 ++-
>  hw/arm/virt.c                    |  53 +-
>  hw/core/qdev-properties.c        |  90 +++
>  hw/i386/acpi-build.c             |  72 +++
>  hw/i386/pc.c                     |  15 +-
>  hw/virtio/Kconfig                |   5 +
>  hw/virtio/Makefile.objs          |   2 +
>  hw/virtio/trace-events           |  22 +
>  hw/virtio/virtio-iommu-pci.c     |  91 +++
>  hw/virtio/virtio-iommu.c         | 952 +++++++++++++++++++++++++++++++
>  include/exec/memory.h            |   6 +
>  include/hw/acpi/acpi-defs.h      |  21 +-
>  include/hw/arm/virt.h            |   2 +
>  include/hw/i386/pc.h             |   2 +
>  include/hw/pci/pci.h             |   1 +
>  include/hw/qdev-properties.h     |   3 +
>  include/hw/virtio/virtio-iommu.h |  67 +++
>  include/migration/vmstate.h      |  21 +
>  include/qemu/queue.h             |  39 ++
>  include/qemu/typedefs.h          |   1 +
>  migration/trace-events           |   5 +
>  migration/vmstate-types.c        |  70 +++
>  qdev-monitor.c                   |   1 +
>  tests/Makefile.include           |   2 +
>  tests/libqos/virtio-iommu.c      | 177 ++++++
>  tests/libqos/virtio-iommu.h      |  45 ++
>  tests/test-vmstate.c             | 170 ++++++
>  tests/virtio-iommu-test.c        | 261 +++++++++
>  28 files changed, 2253 insertions(+), 34 deletions(-)
>  create mode 100644 hw/virtio/virtio-iommu-pci.c
>  create mode 100644 hw/virtio/virtio-iommu.c
>  create mode 100644 include/hw/virtio/virtio-iommu.h
>  create mode 100644 tests/libqos/virtio-iommu.c
>  create mode 100644 tests/libqos/virtio-iommu.h
>  create mode 100644 tests/virtio-iommu-test.c
> 
> -- 
> 2.20.1



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device
  2019-12-11 16:40 ` Michael S. Tsirkin
@ 2019-12-11 16:48   ` Auger Eric
  2019-12-11 20:40     ` Michael S. Tsirkin
  0 siblings, 1 reply; 89+ messages in thread
From: Auger Eric @ 2019-12-11 16:48 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, quintela,
	jean-philippe.brucker, qemu-devel, peterx, armbru, bharatb.linux,
	qemu-arm, dgilbert, eric.auger.pro

Hi Michael,

On 12/11/19 5:40 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 22, 2019 at 07:29:23PM +0100, Eric Auger wrote:
>> This series implements the QEMU virtio-iommu device.
>>
>> This matches the v0.12 spec and the corresponding virtio-iommu
>> driver upstreamed in 5.3.
>>
>> The pci proxy for the virtio-iommu device is instantiated using
>> "-device virtio-iommu-pci". This series still relies on ACPI IORT/DT
>> integration. Note the ACPI IORT integration is not yet upstreamed
>> and testing needs to be based on Jean-Philippe's additional
>> kernel patches [1].
> 
> Or the config space approach? I really liked that one.
Yes this corresponds to the paragraph below.
> 
>>
>> Work is ongoing to remove IORT adherence and allow the
>> bindings between the IOMMU and the root complex to be defined
>> and written into the PCI device configuration space. The outcome
>> of this work is uncertain at this stage though. See [2].

Thanks

Eric

>>
>> So only patches 1-11 fully rely on upstreamed kernel code. Others
>> should be considered as RFC.
>>
>> This respin allows people to test on ARM and x86. It also
>> brings migration support (tested on ARM) and various cleanups.
>> Reserved regions are now passed through an array of properties.
>> A libqos test also is introduced to test the virtio-iommu API.
>>
>> Note integration with vhost devices and vfio devices is not part
>> of this series. Please follow Bharat's respins [3].
>>
>> The 1st Patch ("migration: Support QLIST migration") was sent
>> separately [4].
>>
>> Best Regards
>>
>> Eric
>>
>> This series can be found at:
>> https://github.com/eauger/qemu/tree/v4.2-rc2-virtio-iommu-v11
>>
>> [1] kernel branch to be used for guest
>>     https://github.com/eauger/linux/tree/v5.4-rc8-virtio-iommu-iort
>> [2] [RFC 00/13] virtio-iommu on non-devicetree platforms
>> [3] VFIO/VHOST integration is not part of this series. Please follow
>>     [PATCH RFC v5 0/5] virtio-iommu: VFIO integration respins
>> [4] [PATCH v6] migration: Support QLIST migration
>>
>> Testing:
>> - tested with guest using virtio-net-pci
>>   (,vhost=off,iommu_platform,disable-modern=off,disable-legacy=on)
>>   and virtio-blk-pci
>> - migration on ARM
>> - on x86 PC machine I get some AHCI non translated transactions,
>>   very early. This does not prevent the guest from boot and behaving
>>   properly. Warnings look like:
>> qemu-system-x86_64: virtio_iommu_translate sid=250 is not known!!
>> qemu-system-x86_64: no buffer available in event queue to report event
>> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
>> receive buffer address
>>
>> History:
>>
>> v10 -> v11:
>> - introduce virtio_iommu_handle_req macro
>> - migration support
>> - introduce DEFINE_PROP_INTERVAL and pass reserved regions
>>   through an array of those
>> - domain gtree simplification
>>
>> v9 -> v10:
>> - rebase on 4.1.0-rc2, compliance with 0.12 spec
>> - removed ACPI part
>> - cleanup (see individual change logs)
>> - moved to a PATCH series
>>
>> v8 -> v9:
>> - virtio-iommu-pci device needs to be instantiated from the command
>>   line (RID is not imposed anymore).
>> - tail structure properly initialized
>>
>> v7 -> v8:
>> - virtio-iommu-pci added
>> - virt instantiation modified
>> - DT and ACPI modified to exclude the iommu RID from the mapping
>> - VIRTIO_IOMMU_F_BYPASS, VIRTIO_F_VERSION_1 features exposed
>>
>> v6 -> v7:
>> - rebase on qemu 3.0.0-rc3
>> - minor update against v0.7
>> - fix issue with EP not on pci.0 and ACPI probing
>> - change the instantiation method
>>
>> v5 -> v6:
>> - minor update against v0.6 spec
>> - fix g_hash_table_lookup in virtio_iommu_find_add_as
>> - replace some error_reports by qemu_log_mask(LOG_GUEST_ERROR, ...)
>>
>> v4 -> v5:
>> - event queue and fault reporting
>> - we now return the IOAPIC MSI region if the virtio-iommu is instantiated
>>   in a PC machine.
>> - we bypass transactions on MSI HW region and fault on reserved ones.
>> - We support ACPI boot with mach-virt (based on IORT proposal)
>> - We moved to the new driver naming conventions
>> - simplified mach-virt instantiation
>> - worked around the disappearing of pci_find_primary_bus
>> - in virtio_iommu_translate, check the dev->as is not NULL
>> - initialize as->device_list in virtio_iommu_get_as
>> - initialize bufstate.error to false in virtio_iommu_probe
>>
>> v3 -> v4:
>> - probe request support although no reserved region is returned at
>>   the moment
>> - unmap semantics less strict, as specified in v0.4
>> - device registration, attach/detach revisited
>> - split into smaller patches to ease review
>> - propose a way to inform the IOMMU mr about the page_size_mask
>>   of underlying HW IOMMU, if any
>> - remove warning associated with the translation of the MSI doorbell
>>
>> v2 -> v3:
>> - rebase on top of 2.10-rc0 and especially
>>   [PATCH qemu v9 0/2] memory/iommu: QOM'fy IOMMU MemoryRegion
>> - add mutex init
>> - fix as->mappings deletion using g_tree_ref/unref
>> - when a dev is attached whereas it is already attached to
>>   another address space, first detach it
>> - fix some error values
>> - page_sizes = TARGET_PAGE_MASK;
>> - I haven't changed the unmap() semantics yet, waiting for the
>>   next virtio-iommu spec revision.
>>
>> v1 -> v2:
>> - fix redefinition of viommu_as typedef
>>
>>
>>
>> Eric Auger (20):
>>   migration: Support QLIST migration
>>   virtio-iommu: Add skeleton
>>   virtio-iommu: Decode the command payload
>>   virtio-iommu: Add the iommu regions
>>   virtio-iommu: Endpoint and domains structs and helpers
>>   virtio-iommu: Implement attach/detach command
>>   virtio-iommu: Implement map/unmap
>>   virtio-iommu: Implement translate
>>   virtio-iommu: Implement fault reporting
>>   virtio-iommu-pci: Add virtio iommu pci support
>>   hw/arm/virt: Add the virtio-iommu device tree mappings
>>   qapi: Introduce DEFINE_PROP_INTERVAL
>>   virtio-iommu: Implement probe request
>>   virtio-iommu: Handle reserved regions in the translation process
>>   virtio-iommu-pci: Add array of Interval properties
>>   hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper
>>   hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table
>>   virtio-iommu: Support migration
>>   pc: Add support for virtio-iommu-pci
>>   tests: Add virtio-iommu test
>>
>>  hw/arm/virt-acpi-build.c         |  91 ++-
>>  hw/arm/virt.c                    |  53 +-
>>  hw/core/qdev-properties.c        |  90 +++
>>  hw/i386/acpi-build.c             |  72 +++
>>  hw/i386/pc.c                     |  15 +-
>>  hw/virtio/Kconfig                |   5 +
>>  hw/virtio/Makefile.objs          |   2 +
>>  hw/virtio/trace-events           |  22 +
>>  hw/virtio/virtio-iommu-pci.c     |  91 +++
>>  hw/virtio/virtio-iommu.c         | 952 +++++++++++++++++++++++++++++++
>>  include/exec/memory.h            |   6 +
>>  include/hw/acpi/acpi-defs.h      |  21 +-
>>  include/hw/arm/virt.h            |   2 +
>>  include/hw/i386/pc.h             |   2 +
>>  include/hw/pci/pci.h             |   1 +
>>  include/hw/qdev-properties.h     |   3 +
>>  include/hw/virtio/virtio-iommu.h |  67 +++
>>  include/migration/vmstate.h      |  21 +
>>  include/qemu/queue.h             |  39 ++
>>  include/qemu/typedefs.h          |   1 +
>>  migration/trace-events           |   5 +
>>  migration/vmstate-types.c        |  70 +++
>>  qdev-monitor.c                   |   1 +
>>  tests/Makefile.include           |   2 +
>>  tests/libqos/virtio-iommu.c      | 177 ++++++
>>  tests/libqos/virtio-iommu.h      |  45 ++
>>  tests/test-vmstate.c             | 170 ++++++
>>  tests/virtio-iommu-test.c        | 261 +++++++++
>>  28 files changed, 2253 insertions(+), 34 deletions(-)
>>  create mode 100644 hw/virtio/virtio-iommu-pci.c
>>  create mode 100644 hw/virtio/virtio-iommu.c
>>  create mode 100644 include/hw/virtio/virtio-iommu.h
>>  create mode 100644 tests/libqos/virtio-iommu.c
>>  create mode 100644 tests/libqos/virtio-iommu.h
>>  create mode 100644 tests/virtio-iommu-test.c
>>
>> -- 
>> 2.20.1
> 
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device
  2019-12-11 16:48   ` Auger Eric
@ 2019-12-11 20:40     ` Michael S. Tsirkin
  2019-12-12 15:05       ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Michael S. Tsirkin @ 2019-12-11 20:40 UTC (permalink / raw)
  To: Auger Eric
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, quintela,
	jean-philippe.brucker, qemu-devel, peterx, armbru, bharatb.linux,
	qemu-arm, dgilbert, eric.auger.pro

On Wed, Dec 11, 2019 at 05:48:05PM +0100, Auger Eric wrote:
> Hi Michael,
> 
> On 12/11/19 5:40 PM, Michael S. Tsirkin wrote:
> > On Fri, Nov 22, 2019 at 07:29:23PM +0100, Eric Auger wrote:
> >> This series implements the QEMU virtio-iommu device.
> >>
> >> This matches the v0.12 spec and the corresponding virtio-iommu
> >> driver upstreamed in 5.3.
> >>
> >> The pci proxy for the virtio-iommu device is instantiated using
> >> "-device virtio-iommu-pci". This series still relies on ACPI IORT/DT
> >> integration. Note the ACPI IORT integration is not yet upstreamed
> >> and testing needs to be based on Jean-Philippe's additional
> >> kernel patches [1].
> > 
> > Or the config space approach? I really liked that one.
> Yes this corresponds to the paragraph below.
> > 
> >>
> >> Work is ongoing to remove IORT adherence and allow the
> >> bindings between the IOMMU and the root complex to be defined
> >> and written into the PCI device configuration space. The outcome
> >> of this work is uncertain at this stage though. See [2].

Oh right. Why is it uncertain? Anything can be done to help?

> Thanks
> 
> Eric
> 
> >>
> >> So only patches 1-11 fully rely on upstreamed kernel code. Others
> >> should be considered as RFC.
> >>
> >> This respin allows people to test on ARM and x86. It also
> >> brings migration support (tested on ARM) and various cleanups.
> >> Reserved regions are now passed through an array of properties.
> >> A libqos test also is introduced to test the virtio-iommu API.
> >>
> >> Note integration with vhost devices and vfio devices is not part
> >> of this series. Please follow Bharat's respins [3].
> >>
> >> The 1st Patch ("migration: Support QLIST migration") was sent
> >> separately [4].
> >>
> >> Best Regards
> >>
> >> Eric
> >>
> >> This series can be found at:
> >> https://github.com/eauger/qemu/tree/v4.2-rc2-virtio-iommu-v11
> >>
> >> [1] kernel branch to be used for guest
> >>     https://github.com/eauger/linux/tree/v5.4-rc8-virtio-iommu-iort
> >> [2] [RFC 00/13] virtio-iommu on non-devicetree platforms
> >> [3] VFIO/VHOST integration is not part of this series. Please follow
> >>     [PATCH RFC v5 0/5] virtio-iommu: VFIO integration respins
> >> [4] [PATCH v6] migration: Support QLIST migration
> >>
> >> Testing:
> >> - tested with guest using virtio-net-pci
> >>   (,vhost=off,iommu_platform,disable-modern=off,disable-legacy=on)
> >>   and virtio-blk-pci
> >> - migration on ARM
> >> - on x86 PC machine I get some AHCI non translated transactions,
> >>   very early. This does not prevent the guest from boot and behaving
> >>   properly. Warnings look like:
> >> qemu-system-x86_64: virtio_iommu_translate sid=250 is not known!!
> >> qemu-system-x86_64: no buffer available in event queue to report event
> >> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
> >> receive buffer address
> >>
> >> History:
> >>
> >> v10 -> v11:
> >> - introduce virtio_iommu_handle_req macro
> >> - migration support
> >> - introduce DEFINE_PROP_INTERVAL and pass reserved regions
> >>   through an array of those
> >> - domain gtree simplification
> >>
> >> v9 -> v10:
> >> - rebase on 4.1.0-rc2, compliance with 0.12 spec
> >> - removed ACPI part
> >> - cleanup (see individual change logs)
> >> - moved to a PATCH series
> >>
> >> v8 -> v9:
> >> - virtio-iommu-pci device needs to be instantiated from the command
> >>   line (RID is not imposed anymore).
> >> - tail structure properly initialized
> >>
> >> v7 -> v8:
> >> - virtio-iommu-pci added
> >> - virt instantiation modified
> >> - DT and ACPI modified to exclude the iommu RID from the mapping
> >> - VIRTIO_IOMMU_F_BYPASS, VIRTIO_F_VERSION_1 features exposed
> >>
> >> v6 -> v7:
> >> - rebase on qemu 3.0.0-rc3
> >> - minor update against v0.7
> >> - fix issue with EP not on pci.0 and ACPI probing
> >> - change the instantiation method
> >>
> >> v5 -> v6:
> >> - minor update against v0.6 spec
> >> - fix g_hash_table_lookup in virtio_iommu_find_add_as
> >> - replace some error_reports by qemu_log_mask(LOG_GUEST_ERROR, ...)
> >>
> >> v4 -> v5:
> >> - event queue and fault reporting
> >> - we now return the IOAPIC MSI region if the virtio-iommu is instantiated
> >>   in a PC machine.
> >> - we bypass transactions on MSI HW region and fault on reserved ones.
> >> - We support ACPI boot with mach-virt (based on IORT proposal)
> >> - We moved to the new driver naming conventions
> >> - simplified mach-virt instantiation
> >> - worked around the disappearing of pci_find_primary_bus
> >> - in virtio_iommu_translate, check the dev->as is not NULL
> >> - initialize as->device_list in virtio_iommu_get_as
> >> - initialize bufstate.error to false in virtio_iommu_probe
> >>
> >> v3 -> v4:
> >> - probe request support although no reserved region is returned at
> >>   the moment
> >> - unmap semantics less strict, as specified in v0.4
> >> - device registration, attach/detach revisited
> >> - split into smaller patches to ease review
> >> - propose a way to inform the IOMMU mr about the page_size_mask
> >>   of underlying HW IOMMU, if any
> >> - remove warning associated with the translation of the MSI doorbell
> >>
> >> v2 -> v3:
> >> - rebase on top of 2.10-rc0 and especially
> >>   [PATCH qemu v9 0/2] memory/iommu: QOM'fy IOMMU MemoryRegion
> >> - add mutex init
> >> - fix as->mappings deletion using g_tree_ref/unref
> >> - when a dev is attached whereas it is already attached to
> >>   another address space, first detach it
> >> - fix some error values
> >> - page_sizes = TARGET_PAGE_MASK;
> >> - I haven't changed the unmap() semantics yet, waiting for the
> >>   next virtio-iommu spec revision.
> >>
> >> v1 -> v2:
> >> - fix redefinition of viommu_as typedef
> >>
> >>
> >>
> >> Eric Auger (20):
> >>   migration: Support QLIST migration
> >>   virtio-iommu: Add skeleton
> >>   virtio-iommu: Decode the command payload
> >>   virtio-iommu: Add the iommu regions
> >>   virtio-iommu: Endpoint and domains structs and helpers
> >>   virtio-iommu: Implement attach/detach command
> >>   virtio-iommu: Implement map/unmap
> >>   virtio-iommu: Implement translate
> >>   virtio-iommu: Implement fault reporting
> >>   virtio-iommu-pci: Add virtio iommu pci support
> >>   hw/arm/virt: Add the virtio-iommu device tree mappings
> >>   qapi: Introduce DEFINE_PROP_INTERVAL
> >>   virtio-iommu: Implement probe request
> >>   virtio-iommu: Handle reserved regions in the translation process
> >>   virtio-iommu-pci: Add array of Interval properties
> >>   hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper
> >>   hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table
> >>   virtio-iommu: Support migration
> >>   pc: Add support for virtio-iommu-pci
> >>   tests: Add virtio-iommu test
> >>
> >>  hw/arm/virt-acpi-build.c         |  91 ++-
> >>  hw/arm/virt.c                    |  53 +-
> >>  hw/core/qdev-properties.c        |  90 +++
> >>  hw/i386/acpi-build.c             |  72 +++
> >>  hw/i386/pc.c                     |  15 +-
> >>  hw/virtio/Kconfig                |   5 +
> >>  hw/virtio/Makefile.objs          |   2 +
> >>  hw/virtio/trace-events           |  22 +
> >>  hw/virtio/virtio-iommu-pci.c     |  91 +++
> >>  hw/virtio/virtio-iommu.c         | 952 +++++++++++++++++++++++++++++++
> >>  include/exec/memory.h            |   6 +
> >>  include/hw/acpi/acpi-defs.h      |  21 +-
> >>  include/hw/arm/virt.h            |   2 +
> >>  include/hw/i386/pc.h             |   2 +
> >>  include/hw/pci/pci.h             |   1 +
> >>  include/hw/qdev-properties.h     |   3 +
> >>  include/hw/virtio/virtio-iommu.h |  67 +++
> >>  include/migration/vmstate.h      |  21 +
> >>  include/qemu/queue.h             |  39 ++
> >>  include/qemu/typedefs.h          |   1 +
> >>  migration/trace-events           |   5 +
> >>  migration/vmstate-types.c        |  70 +++
> >>  qdev-monitor.c                   |   1 +
> >>  tests/Makefile.include           |   2 +
> >>  tests/libqos/virtio-iommu.c      | 177 ++++++
> >>  tests/libqos/virtio-iommu.h      |  45 ++
> >>  tests/test-vmstate.c             | 170 ++++++
> >>  tests/virtio-iommu-test.c        | 261 +++++++++
> >>  28 files changed, 2253 insertions(+), 34 deletions(-)
> >>  create mode 100644 hw/virtio/virtio-iommu-pci.c
> >>  create mode 100644 hw/virtio/virtio-iommu.c
> >>  create mode 100644 include/hw/virtio/virtio-iommu.h
> >>  create mode 100644 tests/libqos/virtio-iommu.c
> >>  create mode 100644 tests/libqos/virtio-iommu.h
> >>  create mode 100644 tests/virtio-iommu-test.c
> >>
> >> -- 
> >> 2.20.1
> > 
> > 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL
  2019-11-22 18:29 ` [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL Eric Auger
  2019-11-22 19:03   ` Dr. David Alan Gilbert
@ 2019-12-12 12:17   ` Markus Armbruster
  2019-12-12 15:13     ` Auger Eric
  1 sibling, 1 reply; 89+ messages in thread
From: Markus Armbruster @ 2019-12-12 12:17 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Eric Auger <eric.auger@redhat.com> writes:

> Introduce a new property defining a labelled interval:
> <low address>,<high address>,label.
>
> This will be used to encode reserved IOVA regions. The label
> is left undefined to ease reuse accross use cases.

What does the last sentence mean?

> For instance, in virtio-iommu use case, reserved IOVA regions
> will be passed by the machine code to the virtio-iommu-pci
> device (an array of those). The label will match the
> virtio_iommu_probe_resv_mem subtype value:
> - VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0)
> - VIRTIO_IOMMU_RESV_MEM_T_MSI (1)
>
> This is used to inform the virtio-iommu-pci device it should
> bypass the MSI region: 0xfee00000, 0xfeefffff, 1.

So the "label" part of "<low address>,<high address>,label" is a number?

Is a number appropriate for your use case, or would an enum be better?

>
> Signed-off-by: Eric Auger <eric.auger@redhat.com> ---
>hw/core/qdev-properties.c | 90 ++++++++++++++++++++++++++++++++++++
>include/exec/memory.h | 6 +++ include/hw/qdev-properties.h | 3 ++
>include/qemu/typedefs.h | 1 + 4 files changed, 100 insertions(+)

Subject has 'qapi:', but it's actually about qdev.  Please adjust the subject.

> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
> index ac28890e5a..8d70f34e37 100644
> --- a/hw/core/qdev-properties.c
> +++ b/hw/core/qdev-properties.c
> @@ -13,6 +13,7 @@
>  #include "qapi/visitor.h"
>  #include "chardev/char.h"
>  #include "qemu/uuid.h"
> +#include "qemu/cutils.h"
>  
>  void qdev_prop_set_after_realize(DeviceState *dev, const char *name,
>                                    Error **errp)
> @@ -585,6 +586,95 @@ const PropertyInfo qdev_prop_macaddr = {
>      .set   = set_mac,
>  };
>  
> +/* --- Labelled Interval --- */
> +
> +/*
> + * accepted syntax versions:

"versions"?

> + *   <low address>,<high address>,<type>
> + *   where low/high addresses are uint64_t in hexa (feat. 0x prefix)

"hexa" is not a word.

I'm afraid I don't get the parenthesis.

> + *   and type is an unsigned integer
> + */
> +static void get_interval(Object *obj, Visitor *v, const char *name,
> +                         void *opaque, Error **errp)
> +{
> +    DeviceState *dev = DEVICE(obj);
> +    Property *prop = opaque;
> +    Interval *interval = qdev_get_prop_ptr(dev, prop);
> +    char buffer[64];
> +    char *p = buffer;
> +
> +    Snprintf(buffer, sizeof(buffer), "0x%"PRIx64",0x%"PRIx64",%d",
> +             interval->low, interval->high, interval->type);

interval->type is unsigned.  Use %u, not %d.

> +
> +    visit_type_str(v, name, &p, errp);
> +}
> +
> +static void set_interval(Object *obj, Visitor *v, const char *name,
> +                         void *opaque, Error **errp)
> +{
> +    DeviceState *dev = DEVICE(obj);
> +    Property *prop = opaque;
> +    Interval *interval = qdev_get_prop_ptr(dev, prop);
> +    Error *local_err = NULL;
> +    unsigned int type;
> +    gchar **fields;
> +    uint64_t addr;
> +    char *str;
> +    int ret;
> +
> +    if (dev->realized) {
> +        qdev_prop_set_after_realize(dev, name, errp);
> +        return;
> +    }
> +
> +    visit_type_str(v, name, &str, &local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
> +    fields = g_strsplit(str, ",", 3);
> +
> +    ret = qemu_strtou64(fields[0], NULL, 16, &addr);

Aha, the 0x prefix is actually optional.

> +    if (!ret) {
> +        interval->low = addr;
> +    } else {
> +        error_setg(errp, "Failed to decode interval low addr");
> +        error_append_hint(errp,
> +                          "should be an address in hexa with 0x prefix\n");

"hexa" is not a word, and the 0x prefix is actually optional.

> +        goto out;
> +    }

I prefer

       if (error) {
           handle error
           bail out
       }
       handle success

over

       if (success) {
           handle success
       if (error) {
           handle error
           bail out
       }

In this case:

       if (ret) {
           error_setg(errp, "Failed to decode interval low addr");
           error_append_hint(errp,
                             "should be an address in hexa with 0x prefix\n");
           goto out;
       }
       interval->low = addr;


> +
> +    ret = qemu_strtou64(fields[1], NULL, 16, &addr);

Crash if @str doesn't contain ',', because the g_strsplit(str, ",", 3)
yields { [0] = str, NULL }.

> +    if (!ret) {
> +        interval->high = addr;
> +    } else {
> +        error_setg(errp, "Failed to decode interval high addr");
> +        error_append_hint(errp,
> +                          "should be an address in hexa with 0x prefix\n");
> +        goto out;
> +    }
> +
> +    ret = qemu_strtoui(fields[2], NULL, 10, &type);

Likewise, crash if @str contains only one ','.

I wouldn't use g_strsplit() here.  After

    ret = qemu_strtoui(str, &endptr, 16, &interval->low);

@endptr points behind the address.  So:

    if (ret || *endptr != ',') {
        handle error ...
        goto out
    }

    ret = qemu_strtoui(endptr + 1, &endptr, 16, &interval->high);

and so forth.

Note that the if (ret || *endptr != ',') checks for two distinct errors.
Distinct error messages might be more helpful.

> +    if (!ret) {
> +        interval->type = type;
> +    } else {
> +        error_setg(errp, "Failed to decode interval type");
> +        error_append_hint(errp, "should be an unsigned int in decimal\n");
> +    }
> +out:
> +    g_free(str);
> +    g_strfreev(fields);
> +    return;
> +}
> +
> +const PropertyInfo qdev_prop_interval = {
> +    .name  = "labelled_interval",
> +    .description = "Labelled interval, example: 0xFEE00000,0xFEEFFFFF,0",
> +    .get   = get_interval,
> +    .set   = set_interval,
> +};
> +
>  /* --- on/off/auto --- */
>  
>  const PropertyInfo qdev_prop_on_off_auto = {
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index e499dc215b..e238d1c352 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -57,6 +57,12 @@ struct MemoryRegionMmio {
>      CPUWriteMemoryFunc *write[3];
>  };
>  
> +struct Interval {
> +    hwaddr low;
> +    hwaddr high;
> +    unsigned int type;
> +};

This isn't an interval.  An interval consists of two values, not three.

The third one is called "type" here, and "label" elsewhere.  Pick one
and stick to it.

Then pick a name for the triple.  Elsewhere, you call it "labelled
interval".

> +
>  typedef struct IOMMUTLBEntry IOMMUTLBEntry;
>  
>  /* See address_space_translate: bit 0 is read, bit 1 is write.  */
> diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
> index c6a8cb5516..2ba7c8711b 100644
> --- a/include/hw/qdev-properties.h
> +++ b/include/hw/qdev-properties.h
> @@ -20,6 +20,7 @@ extern const PropertyInfo qdev_prop_chr;
>  extern const PropertyInfo qdev_prop_tpm;
>  extern const PropertyInfo qdev_prop_ptr;
>  extern const PropertyInfo qdev_prop_macaddr;
> +extern const PropertyInfo qdev_prop_interval;
>  extern const PropertyInfo qdev_prop_on_off_auto;
>  extern const PropertyInfo qdev_prop_losttickpolicy;
>  extern const PropertyInfo qdev_prop_blockdev_on_error;
> @@ -202,6 +203,8 @@ extern const PropertyInfo qdev_prop_pcie_link_width;
>      DEFINE_PROP(_n, _s, _f, qdev_prop_drive_iothread, BlockBackend *)
>  #define DEFINE_PROP_MACADDR(_n, _s, _f)         \
>      DEFINE_PROP(_n, _s, _f, qdev_prop_macaddr, MACAddr)
> +#define DEFINE_PROP_INTERVAL(_n, _s, _f)         \
> +    DEFINE_PROP(_n, _s, _f, qdev_prop_interval, Interval)
>  #define DEFINE_PROP_ON_OFF_AUTO(_n, _s, _f, _d) \
>      DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_on_off_auto, OnOffAuto)
>  #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
> index 375770a80f..a827c9a3fe 100644
> --- a/include/qemu/typedefs.h
> +++ b/include/qemu/typedefs.h
> @@ -58,6 +58,7 @@ typedef struct ISABus ISABus;
>  typedef struct ISADevice ISADevice;
>  typedef struct IsaDma IsaDma;
>  typedef struct MACAddr MACAddr;
> +typedef struct Interval Interval;
>  typedef struct MachineClass MachineClass;
>  typedef struct MachineState MachineState;
>  typedef struct MemoryListener MemoryListener;



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device
  2019-12-11 20:40     ` Michael S. Tsirkin
@ 2019-12-12 15:05       ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2019-12-12 15:05 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, quintela,
	jean-philippe.brucker, qemu-devel, peterx, armbru, bharatb.linux,
	qemu-arm, dgilbert, eric.auger.pro

Hi Michael,

On 12/11/19 9:40 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 11, 2019 at 05:48:05PM +0100, Auger Eric wrote:
>> Hi Michael,
>>
>> On 12/11/19 5:40 PM, Michael S. Tsirkin wrote:
>>> On Fri, Nov 22, 2019 at 07:29:23PM +0100, Eric Auger wrote:
>>>> This series implements the QEMU virtio-iommu device.
>>>>
>>>> This matches the v0.12 spec and the corresponding virtio-iommu
>>>> driver upstreamed in 5.3.
>>>>
>>>> The pci proxy for the virtio-iommu device is instantiated using
>>>> "-device virtio-iommu-pci". This series still relies on ACPI IORT/DT
>>>> integration. Note the ACPI IORT integration is not yet upstreamed
>>>> and testing needs to be based on Jean-Philippe's additional
>>>> kernel patches [1].
>>>
>>> Or the config space approach? I really liked that one.
>> Yes this corresponds to the paragraph below.
>>>
>>>>
>>>> Work is ongoing to remove IORT adherence and allow the
>>>> bindings between the IOMMU and the root complex to be defined
>>>> and written into the PCI device configuration space. The outcome
>>>> of this work is uncertain at this stage though. See [2].
> 
> Oh right. Why is it uncertain? Anything can be done to help?

Jean's series was sent on the same day as this QEMU respin. My
understanding is we still need a way to handle platform devices. Also
the binding info layout needs to be revised and integrated into the spec
+ voted. Those are the uncertainties I wanted to point out.

Thanks

Eric
> 
>> Thanks
>>
>> Eric
>>
>>>>
>>>> So only patches 1-11 fully rely on upstreamed kernel code. Others
>>>> should be considered as RFC.
>>>>
>>>> This respin allows people to test on ARM and x86. It also
>>>> brings migration support (tested on ARM) and various cleanups.
>>>> Reserved regions are now passed through an array of properties.
>>>> A libqos test also is introduced to test the virtio-iommu API.
>>>>
>>>> Note integration with vhost devices and vfio devices is not part
>>>> of this series. Please follow Bharat's respins [3].
>>>>
>>>> The 1st Patch ("migration: Support QLIST migration") was sent
>>>> separately [4].
>>>>
>>>> Best Regards
>>>>
>>>> Eric
>>>>
>>>> This series can be found at:
>>>> https://github.com/eauger/qemu/tree/v4.2-rc2-virtio-iommu-v11
>>>>
>>>> [1] kernel branch to be used for guest
>>>>     https://github.com/eauger/linux/tree/v5.4-rc8-virtio-iommu-iort
>>>> [2] [RFC 00/13] virtio-iommu on non-devicetree platforms
>>>> [3] VFIO/VHOST integration is not part of this series. Please follow
>>>>     [PATCH RFC v5 0/5] virtio-iommu: VFIO integration respins
>>>> [4] [PATCH v6] migration: Support QLIST migration
>>>>
>>>> Testing:
>>>> - tested with guest using virtio-net-pci
>>>>   (,vhost=off,iommu_platform,disable-modern=off,disable-legacy=on)
>>>>   and virtio-blk-pci
>>>> - migration on ARM
>>>> - on x86 PC machine I get some AHCI non translated transactions,
>>>>   very early. This does not prevent the guest from boot and behaving
>>>>   properly. Warnings look like:
>>>> qemu-system-x86_64: virtio_iommu_translate sid=250 is not known!!
>>>> qemu-system-x86_64: no buffer available in event queue to report event
>>>> qemu-system-x86_64: AHCI: Failed to start FIS receive engine: bad FIS
>>>> receive buffer address
>>>>
>>>> History:
>>>>
>>>> v10 -> v11:
>>>> - introduce virtio_iommu_handle_req macro
>>>> - migration support
>>>> - introduce DEFINE_PROP_INTERVAL and pass reserved regions
>>>>   through an array of those
>>>> - domain gtree simplification
>>>>
>>>> v9 -> v10:
>>>> - rebase on 4.1.0-rc2, compliance with 0.12 spec
>>>> - removed ACPI part
>>>> - cleanup (see individual change logs)
>>>> - moved to a PATCH series
>>>>
>>>> v8 -> v9:
>>>> - virtio-iommu-pci device needs to be instantiated from the command
>>>>   line (RID is not imposed anymore).
>>>> - tail structure properly initialized
>>>>
>>>> v7 -> v8:
>>>> - virtio-iommu-pci added
>>>> - virt instantiation modified
>>>> - DT and ACPI modified to exclude the iommu RID from the mapping
>>>> - VIRTIO_IOMMU_F_BYPASS, VIRTIO_F_VERSION_1 features exposed
>>>>
>>>> v6 -> v7:
>>>> - rebase on qemu 3.0.0-rc3
>>>> - minor update against v0.7
>>>> - fix issue with EP not on pci.0 and ACPI probing
>>>> - change the instantiation method
>>>>
>>>> v5 -> v6:
>>>> - minor update against v0.6 spec
>>>> - fix g_hash_table_lookup in virtio_iommu_find_add_as
>>>> - replace some error_reports by qemu_log_mask(LOG_GUEST_ERROR, ...)
>>>>
>>>> v4 -> v5:
>>>> - event queue and fault reporting
>>>> - we now return the IOAPIC MSI region if the virtio-iommu is instantiated
>>>>   in a PC machine.
>>>> - we bypass transactions on MSI HW region and fault on reserved ones.
>>>> - We support ACPI boot with mach-virt (based on IORT proposal)
>>>> - We moved to the new driver naming conventions
>>>> - simplified mach-virt instantiation
>>>> - worked around the disappearing of pci_find_primary_bus
>>>> - in virtio_iommu_translate, check the dev->as is not NULL
>>>> - initialize as->device_list in virtio_iommu_get_as
>>>> - initialize bufstate.error to false in virtio_iommu_probe
>>>>
>>>> v3 -> v4:
>>>> - probe request support although no reserved region is returned at
>>>>   the moment
>>>> - unmap semantics less strict, as specified in v0.4
>>>> - device registration, attach/detach revisited
>>>> - split into smaller patches to ease review
>>>> - propose a way to inform the IOMMU mr about the page_size_mask
>>>>   of underlying HW IOMMU, if any
>>>> - remove warning associated with the translation of the MSI doorbell
>>>>
>>>> v2 -> v3:
>>>> - rebase on top of 2.10-rc0 and especially
>>>>   [PATCH qemu v9 0/2] memory/iommu: QOM'fy IOMMU MemoryRegion
>>>> - add mutex init
>>>> - fix as->mappings deletion using g_tree_ref/unref
>>>> - when a dev is attached whereas it is already attached to
>>>>   another address space, first detach it
>>>> - fix some error values
>>>> - page_sizes = TARGET_PAGE_MASK;
>>>> - I haven't changed the unmap() semantics yet, waiting for the
>>>>   next virtio-iommu spec revision.
>>>>
>>>> v1 -> v2:
>>>> - fix redefinition of viommu_as typedef
>>>>
>>>>
>>>>
>>>> Eric Auger (20):
>>>>   migration: Support QLIST migration
>>>>   virtio-iommu: Add skeleton
>>>>   virtio-iommu: Decode the command payload
>>>>   virtio-iommu: Add the iommu regions
>>>>   virtio-iommu: Endpoint and domains structs and helpers
>>>>   virtio-iommu: Implement attach/detach command
>>>>   virtio-iommu: Implement map/unmap
>>>>   virtio-iommu: Implement translate
>>>>   virtio-iommu: Implement fault reporting
>>>>   virtio-iommu-pci: Add virtio iommu pci support
>>>>   hw/arm/virt: Add the virtio-iommu device tree mappings
>>>>   qapi: Introduce DEFINE_PROP_INTERVAL
>>>>   virtio-iommu: Implement probe request
>>>>   virtio-iommu: Handle reserved regions in the translation process
>>>>   virtio-iommu-pci: Add array of Interval properties
>>>>   hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper
>>>>   hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table
>>>>   virtio-iommu: Support migration
>>>>   pc: Add support for virtio-iommu-pci
>>>>   tests: Add virtio-iommu test
>>>>
>>>>  hw/arm/virt-acpi-build.c         |  91 ++-
>>>>  hw/arm/virt.c                    |  53 +-
>>>>  hw/core/qdev-properties.c        |  90 +++
>>>>  hw/i386/acpi-build.c             |  72 +++
>>>>  hw/i386/pc.c                     |  15 +-
>>>>  hw/virtio/Kconfig                |   5 +
>>>>  hw/virtio/Makefile.objs          |   2 +
>>>>  hw/virtio/trace-events           |  22 +
>>>>  hw/virtio/virtio-iommu-pci.c     |  91 +++
>>>>  hw/virtio/virtio-iommu.c         | 952 +++++++++++++++++++++++++++++++
>>>>  include/exec/memory.h            |   6 +
>>>>  include/hw/acpi/acpi-defs.h      |  21 +-
>>>>  include/hw/arm/virt.h            |   2 +
>>>>  include/hw/i386/pc.h             |   2 +
>>>>  include/hw/pci/pci.h             |   1 +
>>>>  include/hw/qdev-properties.h     |   3 +
>>>>  include/hw/virtio/virtio-iommu.h |  67 +++
>>>>  include/migration/vmstate.h      |  21 +
>>>>  include/qemu/queue.h             |  39 ++
>>>>  include/qemu/typedefs.h          |   1 +
>>>>  migration/trace-events           |   5 +
>>>>  migration/vmstate-types.c        |  70 +++
>>>>  qdev-monitor.c                   |   1 +
>>>>  tests/Makefile.include           |   2 +
>>>>  tests/libqos/virtio-iommu.c      | 177 ++++++
>>>>  tests/libqos/virtio-iommu.h      |  45 ++
>>>>  tests/test-vmstate.c             | 170 ++++++
>>>>  tests/virtio-iommu-test.c        | 261 +++++++++
>>>>  28 files changed, 2253 insertions(+), 34 deletions(-)
>>>>  create mode 100644 hw/virtio/virtio-iommu-pci.c
>>>>  create mode 100644 hw/virtio/virtio-iommu.c
>>>>  create mode 100644 include/hw/virtio/virtio-iommu.h
>>>>  create mode 100644 tests/libqos/virtio-iommu.c
>>>>  create mode 100644 tests/libqos/virtio-iommu.h
>>>>  create mode 100644 tests/virtio-iommu-test.c
>>>>
>>>> -- 
>>>> 2.20.1
>>>
>>>
> 
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL
  2019-12-12 12:17   ` Markus Armbruster
@ 2019-12-12 15:13     ` Auger Eric
  2019-12-13 10:03       ` Markus Armbruster
  0 siblings, 1 reply; 89+ messages in thread
From: Auger Eric @ 2019-12-12 15:13 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, dgilbert,
	bharatb.linux, qemu-arm, eric.auger.pro

Hi Markus,

On 12/12/19 1:17 PM, Markus Armbruster wrote:
> Eric Auger <eric.auger@redhat.com> writes:
> 
>> Introduce a new property defining a labelled interval:
>> <low address>,<high address>,label.
>>
>> This will be used to encode reserved IOVA regions. The label
>> is left undefined to ease reuse accross use cases.
> 
> What does the last sentence mean?
The dilemma was shall I specialize this property such as ReservedRegion
or shall I leave it generic enough to serve somebody else use case. I
first chose the latter but now I think I should rather call it something
like ReservedRegion as in any case it has addresses and an integer label.
> 
>> For instance, in virtio-iommu use case, reserved IOVA regions
>> will be passed by the machine code to the virtio-iommu-pci
>> device (an array of those). The label will match the
>> virtio_iommu_probe_resv_mem subtype value:
>> - VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0)
>> - VIRTIO_IOMMU_RESV_MEM_T_MSI (1)
>>
>> This is used to inform the virtio-iommu-pci device it should
>> bypass the MSI region: 0xfee00000, 0xfeefffff, 1.
> 
> So the "label" part of "<low address>,<high address>,label" is a number?
yes it is.
> 
> Is a number appropriate for your use case, or would an enum be better?
I think a number is OK. There might be other types of reserved regions
in the future. Also if we want to allow somebody else to reuse that
property in another context, I would rather leave it open?
> 
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com> ---
>> hw/core/qdev-properties.c | 90 ++++++++++++++++++++++++++++++++++++
>> include/exec/memory.h | 6 +++ include/hw/qdev-properties.h | 3 ++
>> include/qemu/typedefs.h | 1 + 4 files changed, 100 insertions(+)
> 
> Subject has 'qapi:', but it's actually about qdev.  Please adjust the subject.
OK
> 
>> diff --git a/hw/core/qdev-properties.c b/hw/core/qdev-properties.c
>> index ac28890e5a..8d70f34e37 100644
>> --- a/hw/core/qdev-properties.c
>> +++ b/hw/core/qdev-properties.c
>> @@ -13,6 +13,7 @@
>>  #include "qapi/visitor.h"
>>  #include "chardev/char.h"
>>  #include "qemu/uuid.h"
>> +#include "qemu/cutils.h"
>>  
>>  void qdev_prop_set_after_realize(DeviceState *dev, const char *name,
>>                                    Error **errp)
>> @@ -585,6 +586,95 @@ const PropertyInfo qdev_prop_macaddr = {
>>      .set   = set_mac,
>>  };
>>  
>> +/* --- Labelled Interval --- */
>> +
>> +/*
>> + * accepted syntax versions:
> 
> "versions"?
s/versions/version
> 
>> + *   <low address>,<high address>,<type>
>> + *   where low/high addresses are uint64_t in hexa (feat. 0x prefix)
> 
> "hexa" is not a word.
OK
> 
> I'm afraid I don't get the parenthesis.
I wanted to mention the 0x prefix was needed but as you mentionned below
it is not needed actually.
> 
>> + *   and type is an unsigned integer
>> + */
>> +static void get_interval(Object *obj, Visitor *v, const char *name,
>> +                         void *opaque, Error **errp)
>> +{
>> +    DeviceState *dev = DEVICE(obj);
>> +    Property *prop = opaque;
>> +    Interval *interval = qdev_get_prop_ptr(dev, prop);
>> +    char buffer[64];
>> +    char *p = buffer;
>> +
>> +    Snprintf(buffer, sizeof(buffer), "0x%"PRIx64",0x%"PRIx64",%d",
>> +             interval->low, interval->high, interval->type);
> 
> interval->type is unsigned.  Use %u, not %d.
OK
> 
>> +
>> +    visit_type_str(v, name, &p, errp);
>> +}
>> +
>> +static void set_interval(Object *obj, Visitor *v, const char *name,
>> +                         void *opaque, Error **errp)
>> +{
>> +    DeviceState *dev = DEVICE(obj);
>> +    Property *prop = opaque;
>> +    Interval *interval = qdev_get_prop_ptr(dev, prop);
>> +    Error *local_err = NULL;
>> +    unsigned int type;
>> +    gchar **fields;
>> +    uint64_t addr;
>> +    char *str;
>> +    int ret;
>> +
>> +    if (dev->realized) {
>> +        qdev_prop_set_after_realize(dev, name, errp);
>> +        return;
>> +    }
>> +
>> +    visit_type_str(v, name, &str, &local_err);
>> +    if (local_err) {
>> +        error_propagate(errp, local_err);
>> +        return;
>> +    }
>> +
>> +    fields = g_strsplit(str, ",", 3);
>> +
>> +    ret = qemu_strtou64(fields[0], NULL, 16, &addr);
> 
> Aha, the 0x prefix is actually optional.
> 
>> +    if (!ret) {
>> +        interval->low = addr;
>> +    } else {
>> +        error_setg(errp, "Failed to decode interval low addr");
>> +        error_append_hint(errp,
>> +                          "should be an address in hexa with 0x prefix\n");
> 
> "hexa" is not a word, and the 0x prefix is actually optional.
OK
> 
>> +        goto out;
>> +    }
> 
> I prefer
> 
>        if (error) {
>            handle error
>            bail out
>        }
>        handle success
> 
> over
> 
>        if (success) {
>            handle success
>        if (error) {
>            handle error
>            bail out
>        }
> 
> In this case:
> 
>        if (ret) {
>            error_setg(errp, "Failed to decode interval low addr");
>            error_append_hint(errp,
>                              "should be an address in hexa with 0x prefix\n");
>            goto out;
>        }
>        interval->low = addr;
OK
> 
> 
>> +
>> +    ret = qemu_strtou64(fields[1], NULL, 16, &addr);
> 
> Crash if @str doesn't contain ',', because the g_strsplit(str, ",", 3)
> yields { [0] = str, NULL }.
> 
>> +    if (!ret) {
>> +        interval->high = addr;
>> +    } else {
>> +        error_setg(errp, "Failed to decode interval high addr");
>> +        error_append_hint(errp,
>> +                          "should be an address in hexa with 0x prefix\n");
>> +        goto out;
>> +    }
>> +
>> +    ret = qemu_strtoui(fields[2], NULL, 10, &type);
> 
> Likewise, crash if @str contains only one ','.
> 
> I wouldn't use g_strsplit() here.  After
> 
>     ret = qemu_strtoui(str, &endptr, 16, &interval->low);
> 
> @endptr points behind the address.  So:
> 
>     if (ret || *endptr != ',') {
>         handle error ...
>         goto out
>     }
> 
>     ret = qemu_strtoui(endptr + 1, &endptr, 16, &interval->high);
> 
> and so forth.
> 
> Note that the if (ret || *endptr != ',') checks for two distinct errors.
> Distinct error messages might be more helpful.
OK I will revisit that.
> 
>> +    if (!ret) {
>> +        interval->type = type;
>> +    } else {
>> +        error_setg(errp, "Failed to decode interval type");
>> +        error_append_hint(errp, "should be an unsigned int in decimal\n");
>> +    }
>> +out:
>> +    g_free(str);
>> +    g_strfreev(fields);
>> +    return;
>> +}
>> +
>> +const PropertyInfo qdev_prop_interval = {
>> +    .name  = "labelled_interval",
>> +    .description = "Labelled interval, example: 0xFEE00000,0xFEEFFFFF,0",
>> +    .get   = get_interval,
>> +    .set   = set_interval,
>> +};
>> +
>>  /* --- on/off/auto --- */
>>  
>>  const PropertyInfo qdev_prop_on_off_auto = {
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index e499dc215b..e238d1c352 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -57,6 +57,12 @@ struct MemoryRegionMmio {
>>      CPUWriteMemoryFunc *write[3];
>>  };
>>  
>> +struct Interval {
>> +    hwaddr low;
>> +    hwaddr high;
>> +    unsigned int type;
>> +};
> 
> This isn't an interval.  An interval consists of two values, not three.
> 
> The third one is called "type" here, and "label" elsewhere.  Pick one
> and stick to it.
> 
> Then pick a name for the triple.  Elsewhere, you call it "labelled
> interval".
I would tend to use ReservedRegion now if nobody objects.

Thank you for the review!


Eric
> 
>> +
>>  typedef struct IOMMUTLBEntry IOMMUTLBEntry;
>>  
>>  /* See address_space_translate: bit 0 is read, bit 1 is write.  */
>> diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
>> index c6a8cb5516..2ba7c8711b 100644
>> --- a/include/hw/qdev-properties.h
>> +++ b/include/hw/qdev-properties.h
>> @@ -20,6 +20,7 @@ extern const PropertyInfo qdev_prop_chr;
>>  extern const PropertyInfo qdev_prop_tpm;
>>  extern const PropertyInfo qdev_prop_ptr;
>>  extern const PropertyInfo qdev_prop_macaddr;
>> +extern const PropertyInfo qdev_prop_interval;
>>  extern const PropertyInfo qdev_prop_on_off_auto;
>>  extern const PropertyInfo qdev_prop_losttickpolicy;
>>  extern const PropertyInfo qdev_prop_blockdev_on_error;
>> @@ -202,6 +203,8 @@ extern const PropertyInfo qdev_prop_pcie_link_width;
>>      DEFINE_PROP(_n, _s, _f, qdev_prop_drive_iothread, BlockBackend *)
>>  #define DEFINE_PROP_MACADDR(_n, _s, _f)         \
>>      DEFINE_PROP(_n, _s, _f, qdev_prop_macaddr, MACAddr)
>> +#define DEFINE_PROP_INTERVAL(_n, _s, _f)         \
>> +    DEFINE_PROP(_n, _s, _f, qdev_prop_interval, Interval)
>>  #define DEFINE_PROP_ON_OFF_AUTO(_n, _s, _f, _d) \
>>      DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_on_off_auto, OnOffAuto)
>>  #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
>> diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h
>> index 375770a80f..a827c9a3fe 100644
>> --- a/include/qemu/typedefs.h
>> +++ b/include/qemu/typedefs.h
>> @@ -58,6 +58,7 @@ typedef struct ISABus ISABus;
>>  typedef struct ISADevice ISADevice;
>>  typedef struct IsaDma IsaDma;
>>  typedef struct MACAddr MACAddr;
>> +typedef struct Interval Interval;
>>  typedef struct MachineClass MachineClass;
>>  typedef struct MachineState MachineState;
>>  typedef struct MemoryListener MemoryListener;
> 
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL
  2019-12-12 15:13     ` Auger Eric
@ 2019-12-13 10:03       ` Markus Armbruster
  0 siblings, 0 replies; 89+ messages in thread
From: Markus Armbruster @ 2019-12-13 10:03 UTC (permalink / raw)
  To: Auger Eric
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, dgilbert,
	bharatb.linux, qemu-arm, eric.auger.pro

Auger Eric <eric.auger@redhat.com> writes:

> Hi Markus,
>
> On 12/12/19 1:17 PM, Markus Armbruster wrote:
>> Eric Auger <eric.auger@redhat.com> writes:
>> 
>>> Introduce a new property defining a labelled interval:
>>> <low address>,<high address>,label.
>>>
>>> This will be used to encode reserved IOVA regions. The label
>>> is left undefined to ease reuse accross use cases.
>> 
>> What does the last sentence mean?
> The dilemma was shall I specialize this property such as ReservedRegion
> or shall I leave it generic enough to serve somebody else use case. I
> first chose the latter but now I think I should rather call it something
> like ReservedRegion as in any case it has addresses and an integer label.
>> 
>>> For instance, in virtio-iommu use case, reserved IOVA regions
>>> will be passed by the machine code to the virtio-iommu-pci
>>> device (an array of those). The label will match the
>>> virtio_iommu_probe_resv_mem subtype value:
>>> - VIRTIO_IOMMU_RESV_MEM_T_RESERVED (0)
>>> - VIRTIO_IOMMU_RESV_MEM_T_MSI (1)
>>>
>>> This is used to inform the virtio-iommu-pci device it should
>>> bypass the MSI region: 0xfee00000, 0xfeefffff, 1.
>> 
>> So the "label" part of "<low address>,<high address>,label" is a number?
> yes it is.
>> 
>> Is a number appropriate for your use case, or would an enum be better?
> I think a number is OK. There might be other types of reserved regions
> in the future. Also if we want to allow somebody else to reuse that
> property in another context, I would rather leave it open?

I'd prioritize the user interface over possible reuse (which might never
happen).  Mind, I'm not telling you using numbers is a bad user
interface.  In general, enums are nicer, but I don't know enough about
this particular case.

>> 
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com> ---
[...]
>>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>>> index e499dc215b..e238d1c352 100644
>>> --- a/include/exec/memory.h
>>> +++ b/include/exec/memory.h
>>> @@ -57,6 +57,12 @@ struct MemoryRegionMmio {
>>>      CPUWriteMemoryFunc *write[3];
>>>  };
>>>  
>>> +struct Interval {
>>> +    hwaddr low;
>>> +    hwaddr high;
>>> +    unsigned int type;
>>> +};
>> 
>> This isn't an interval.  An interval consists of two values, not three.
>> 
>> The third one is called "type" here, and "label" elsewhere.  Pick one
>> and stick to it.
>> 
>> Then pick a name for the triple.  Elsewhere, you call it "labelled
>> interval".
> I would tend to use ReservedRegion now if nobody objects.

Sounds good to me.

> Thank you for the review!

You're welcome!



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-12-10 19:33   ` Peter Xu
@ 2019-12-19 10:30     ` Auger Eric
  2019-12-19 13:33       ` Peter Xu
  0 siblings, 1 reply; 89+ messages in thread
From: Auger Eric @ 2019-12-19 10:30 UTC (permalink / raw)
  To: Peter Xu
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Peter,
On 12/10/19 8:33 PM, Peter Xu wrote:
> On Fri, Nov 22, 2019 at 07:29:31PM +0100, Eric Auger wrote:
>> This patch implements the translate callback
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>
>> ---
>>
>> v10 -> v11:
>> - take into account the new value struct and use
>>   g_tree_lookup_extended
>> - switched to error_report_once
>>
>> v6 -> v7:
>> - implemented bypass-mode
>>
>> v5 -> v6:
>> - replace error_report by qemu_log_mask
>>
>> v4 -> v5:
>> - check the device domain is not NULL
>> - s/printf/error_report
>> - set flags to IOMMU_NONE in case of all translation faults
>> ---
>>  hw/virtio/trace-events   |  1 +
>>  hw/virtio/virtio-iommu.c | 63 +++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 63 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>> index f25359cee2..de7cbb3c8f 100644
>> --- a/hw/virtio/trace-events
>> +++ b/hw/virtio/trace-events
>> @@ -72,3 +72,4 @@ virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
>>  virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
>>  virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
>>  virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
>> +virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index f0a56833a2..a83666557b 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -412,19 +412,80 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>>                                              int iommu_idx)
>>  {
>>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
>> +    viommu_interval interval, *mapping_key;
>> +    viommu_mapping *mapping_value;
>> +    VirtIOIOMMU *s = sdev->viommu;
>> +    viommu_endpoint *ep;
>> +    bool bypass_allowed;
>>      uint32_t sid;
>> +    bool found;
>> +
>> +    interval.low = addr;
>> +    interval.high = addr + 1;
>>  
>>      IOMMUTLBEntry entry = {
>>          .target_as = &address_space_memory,
>>          .iova = addr,
>>          .translated_addr = addr,
>> -        .addr_mask = ~(hwaddr)0,
>> +        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
>>          .perm = IOMMU_NONE,
>>      };
>>  
>> +    bypass_allowed = virtio_has_feature(s->acked_features,
>> +                                        VIRTIO_IOMMU_F_BYPASS);
>> +
> 
> Would it be easier to check bypass_allowed here once and then drop the
> latter [1] and [2] check?
bypass_allowed does not mean you systematically bypass. You bypass if
the SID is unknown or if the device is not attached to any domain.
Otherwise you translate. But maybe I miss your point.
> 
>>      sid = virtio_iommu_get_sid(sdev);
>>  
>>      trace_virtio_iommu_translate(mr->parent_obj.name, sid, addr, flag);
>> +    qemu_mutex_lock(&s->mutex);
>> +
>> +    ep = g_tree_lookup(s->endpoints, GUINT_TO_POINTER(sid));
>> +    if (!ep) {
>> +        if (!bypass_allowed) {
> 
> [1]
> 
>> +            error_report_once("%s sid=%d is not known!!", __func__, sid);
>> +        } else {
>> +            entry.perm = flag;
>> +        }
>> +        goto unlock;
>> +    }
>> +
>> +    if (!ep->domain) {
>> +        if (!bypass_allowed) {
> 
> [2]
> 
>> +            qemu_log_mask(LOG_GUEST_ERROR,
>> +                          "%s %02x:%02x.%01x not attached to any domain\n",
>> +                          __func__, PCI_BUS_NUM(sid),
>> +                          PCI_SLOT(sid), PCI_FUNC(sid));
>> +        } else {
>> +            entry.perm = flag;
>> +        }
>> +        goto unlock;
>> +    }
>> +
>> +    found = g_tree_lookup_extended(ep->domain->mappings, (gpointer)(&interval),
>> +                                   (void **)&mapping_key,
>> +                                   (void **)&mapping_value);
>> +    if (!found) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                      "%s no mapping for 0x%"PRIx64" for sid=%d\n",
>> +                      __func__, addr, sid);
> 
> I would still suggest that we use the same logging interface (either
> error_report_once() or qemu_log_mask(), not use them randomly).
OK I will switch to error_report_once() then
> 
>> +        goto unlock;
>> +    }
>> +
>> +    if (((flag & IOMMU_RO) &&
>> +            !(mapping_value->flags & VIRTIO_IOMMU_MAP_F_READ)) ||
>> +        ((flag & IOMMU_WO) &&
>> +            !(mapping_value->flags & VIRTIO_IOMMU_MAP_F_WRITE))) {
>> +        qemu_log_mask(LOG_GUEST_ERROR,
>> +                      "Permission error on 0x%"PRIx64"(%d): allowed=%d\n",
>> +                      addr, flag, mapping_value->flags);
> 
> (Btw, IIUC this may not be a guest error. Say, what if the device is
>  simply broken?)
> 
>> +        goto unlock;
>> +    }
>> +    entry.translated_addr = addr - mapping_key->low + mapping_value->phys_addr;
>> +    entry.perm = flag;
>> +    trace_virtio_iommu_translate_out(addr, entry.translated_addr, sid);
>> +
>> +unlock:
>> +    qemu_mutex_unlock(&s->mutex);
>>      return entry;
>>  }
>>  
>> -- 
>> 2.20.1
>>
> 
Thanks

Eric



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 02/20] virtio-iommu: Add skeleton
  2019-12-10 16:31   ` Jean-Philippe Brucker
@ 2019-12-19 10:31     ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2019-12-19 10:31 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Jean,

On 12/10/19 5:31 PM, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On Fri, Nov 22, 2019 at 07:29:25PM +0100, Eric Auger wrote:
>> +typedef struct VirtIOIOMMU {
>> +    VirtIODevice parent_obj;
>> +    VirtQueue *req_vq;
>> +    VirtQueue *event_vq;
>> +    struct virtio_iommu_config config;
>> +    uint64_t features;
>> +    uint64_t acked_features;
> 
> We already have guest_features in the parent object.
That's correct. I also removed the set_features() specific
implementation as I can rely on the default one.
> 
>> +    GHashTable *as_by_busptr;
>> +    IOMMUPciBus *as_by_bus_num[IOMMU_PCI_BUS_MAX];
> 
> Doesn't seem used anymore.
removed
> 
> Thanks,
> Jean
> 
>> +    PCIBus *primary_bus;
>> +    GTree *domains;
>> +    QemuMutex mutex;
>> +    GTree *endpoints;
>> +} VirtIOIOMMU;
>> +
>> +#endif
>> -- 
>> 2.20.1
>>
>>
> 
Thanks

Eric



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration
  2019-12-10 16:50   ` Jean-Philippe Brucker
@ 2019-12-19 11:03     ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2019-12-19 11:03 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Jean,

On 12/10/19 5:50 PM, Jean-Philippe Brucker wrote:
> On Fri, Nov 22, 2019 at 07:29:41PM +0100, Eric Auger wrote:
>> +static const VMStateDescription vmstate_virtio_iommu_device = {
>> +    .name = "virtio-iommu-device",
>> +    .minimum_version_id = 1,
>> +    .version_id = 1,
>> +    .post_load = iommu_post_load,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_GTREE_DIRECT_KEY_V(domains, VirtIOIOMMU, 1,
>> +                                   &vmstate_domain, viommu_domain),
>> +        VMSTATE_GTREE_DIRECT_KEY_V(endpoints, VirtIOIOMMU, 1,
>> +                                   &vmstate_endpoint, viommu_endpoint),
> 
> So if I understand correctly these fields are state that is modified by
> the guest? We don't need to save/load fields that cannot be modified by
> the guest, static information that is created from the QEMU command-line. 

Yes that's correct.
> 
> I think the above covers everything we need to migrate in VirtIOIOMMU
> then, except for acked_features, which (as I pointed out on another patch)
> seems redundant anyway since there is vdev->guest_features.

you're right, acked features were not properly migrated.
> 
> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
Thanks!

Eric
> 
>> +        VMSTATE_END_OF_LIST()
>> +    },
>> +};
>> +
>> +
>>  static const VMStateDescription vmstate_virtio_iommu = {
>>      .name = "virtio-iommu",
>>      .minimum_version_id = 1,
>> -- 
>> 2.20.1
>>
>>
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-12-19 10:30     ` Auger Eric
@ 2019-12-19 13:33       ` Peter Xu
  2019-12-19 14:38         ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Peter Xu @ 2019-12-19 13:33 UTC (permalink / raw)
  To: Auger Eric
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Thu, Dec 19, 2019 at 11:30:40AM +0100, Auger Eric wrote:
> Hi Peter,
> On 12/10/19 8:33 PM, Peter Xu wrote:
> > On Fri, Nov 22, 2019 at 07:29:31PM +0100, Eric Auger wrote:
> >> This patch implements the translate callback
> >>
> >> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>
> >> ---
> >>
> >> v10 -> v11:
> >> - take into account the new value struct and use
> >>   g_tree_lookup_extended
> >> - switched to error_report_once
> >>
> >> v6 -> v7:
> >> - implemented bypass-mode
> >>
> >> v5 -> v6:
> >> - replace error_report by qemu_log_mask
> >>
> >> v4 -> v5:
> >> - check the device domain is not NULL
> >> - s/printf/error_report
> >> - set flags to IOMMU_NONE in case of all translation faults
> >> ---
> >>  hw/virtio/trace-events   |  1 +
> >>  hw/virtio/virtio-iommu.c | 63 +++++++++++++++++++++++++++++++++++++++-
> >>  2 files changed, 63 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> >> index f25359cee2..de7cbb3c8f 100644
> >> --- a/hw/virtio/trace-events
> >> +++ b/hw/virtio/trace-events
> >> @@ -72,3 +72,4 @@ virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
> >>  virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
> >>  virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
> >>  virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
> >> +virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
> >> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> >> index f0a56833a2..a83666557b 100644
> >> --- a/hw/virtio/virtio-iommu.c
> >> +++ b/hw/virtio/virtio-iommu.c
> >> @@ -412,19 +412,80 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
> >>                                              int iommu_idx)
> >>  {
> >>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
> >> +    viommu_interval interval, *mapping_key;
> >> +    viommu_mapping *mapping_value;
> >> +    VirtIOIOMMU *s = sdev->viommu;
> >> +    viommu_endpoint *ep;
> >> +    bool bypass_allowed;
> >>      uint32_t sid;
> >> +    bool found;
> >> +
> >> +    interval.low = addr;
> >> +    interval.high = addr + 1;
> >>  
> >>      IOMMUTLBEntry entry = {
> >>          .target_as = &address_space_memory,
> >>          .iova = addr,
> >>          .translated_addr = addr,
> >> -        .addr_mask = ~(hwaddr)0,
> >> +        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
> >>          .perm = IOMMU_NONE,
> >>      };
> >>  
> >> +    bypass_allowed = virtio_has_feature(s->acked_features,
> >> +                                        VIRTIO_IOMMU_F_BYPASS);
> >> +
> > 
> > Would it be easier to check bypass_allowed here once and then drop the
> > latter [1] and [2] check?
> bypass_allowed does not mean you systematically bypass. You bypass if
> the SID is unknown or if the device is not attached to any domain.
> Otherwise you translate. But maybe I miss your point.

Ah ok, then could I ask how will this VIRTIO_IOMMU_F_BYPASS be used?
For example, I think VT-d defines passthrough in a totally different
way in that the PT mark will be stored in the per-device context
entries, then we can allow a specific device to be pass-through when
doing DMA.  That information is explicit (e.g., unknown SID will
always fail the DMA), and per-device.

Here do you mean that you just don't put a device into any domain to
show it wants to use PT?  Then I'm not sure how do you identify
whether this is a legal PT or a malicious device (e.g., an unknown
device that even does not have any driver bound to it, which will also
satisfy "unknown SID" and "not attached to any domain", iiuc).

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-12-19 13:33       ` Peter Xu
@ 2019-12-19 14:38         ` Auger Eric
  2019-12-19 14:49           ` Peter Xu
  0 siblings, 1 reply; 89+ messages in thread
From: Auger Eric @ 2019-12-19 14:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Peter,

On 12/19/19 2:33 PM, Peter Xu wrote:
> On Thu, Dec 19, 2019 at 11:30:40AM +0100, Auger Eric wrote:
>> Hi Peter,
>> On 12/10/19 8:33 PM, Peter Xu wrote:
>>> On Fri, Nov 22, 2019 at 07:29:31PM +0100, Eric Auger wrote:
>>>> This patch implements the translate callback
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> ---
>>>>
>>>> v10 -> v11:
>>>> - take into account the new value struct and use
>>>>   g_tree_lookup_extended
>>>> - switched to error_report_once
>>>>
>>>> v6 -> v7:
>>>> - implemented bypass-mode
>>>>
>>>> v5 -> v6:
>>>> - replace error_report by qemu_log_mask
>>>>
>>>> v4 -> v5:
>>>> - check the device domain is not NULL
>>>> - s/printf/error_report
>>>> - set flags to IOMMU_NONE in case of all translation faults
>>>> ---
>>>>  hw/virtio/trace-events   |  1 +
>>>>  hw/virtio/virtio-iommu.c | 63 +++++++++++++++++++++++++++++++++++++++-
>>>>  2 files changed, 63 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>>>> index f25359cee2..de7cbb3c8f 100644
>>>> --- a/hw/virtio/trace-events
>>>> +++ b/hw/virtio/trace-events
>>>> @@ -72,3 +72,4 @@ virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
>>>>  virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
>>>>  virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
>>>>  virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
>>>> +virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
>>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>>> index f0a56833a2..a83666557b 100644
>>>> --- a/hw/virtio/virtio-iommu.c
>>>> +++ b/hw/virtio/virtio-iommu.c
>>>> @@ -412,19 +412,80 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>>>>                                              int iommu_idx)
>>>>  {
>>>>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
>>>> +    viommu_interval interval, *mapping_key;
>>>> +    viommu_mapping *mapping_value;
>>>> +    VirtIOIOMMU *s = sdev->viommu;
>>>> +    viommu_endpoint *ep;
>>>> +    bool bypass_allowed;
>>>>      uint32_t sid;
>>>> +    bool found;
>>>> +
>>>> +    interval.low = addr;
>>>> +    interval.high = addr + 1;
>>>>  
>>>>      IOMMUTLBEntry entry = {
>>>>          .target_as = &address_space_memory,
>>>>          .iova = addr,
>>>>          .translated_addr = addr,
>>>> -        .addr_mask = ~(hwaddr)0,
>>>> +        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
>>>>          .perm = IOMMU_NONE,
>>>>      };
>>>>  
>>>> +    bypass_allowed = virtio_has_feature(s->acked_features,
>>>> +                                        VIRTIO_IOMMU_F_BYPASS);
>>>> +
>>>
>>> Would it be easier to check bypass_allowed here once and then drop the
>>> latter [1] and [2] check?
>> bypass_allowed does not mean you systematically bypass. You bypass if
>> the SID is unknown or if the device is not attached to any domain.
>> Otherwise you translate. But maybe I miss your point.
> 
> Ah ok, then could I ask how will this VIRTIO_IOMMU_F_BYPASS be used?
> For example, I think VT-d defines passthrough in a totally different
> way in that the PT mark will be stored in the per-device context
> entries, then we can allow a specific device to be pass-through when
> doing DMA.  That information is explicit (e.g., unknown SID will
> always fail the DMA), and per-device.
> 
> Here do you mean that you just don't put a device into any domain to
> show it wants to use PT?  Then I'm not sure how do you identify
> whether this is a legal PT or a malicious device (e.g., an unknown
> device that even does not have any driver bound to it, which will also
> satisfy "unknown SID" and "not attached to any domain", iiuc).

The virtio-iommu spec currently says:

"If the VIRTIO_IOMMU_F_BYPASS feature is negotiated, all accesses from
unattached endpoints are
allowed and translated by the IOMMU using the identity function. If the
feature is not negotiated, any
memory access from an unattached endpoint fails. Upon attaching an
endpoint in bypass mode to a new
domain, any memory access from the endpoint fails, since the domain does
not contain any mapping.
"

I guess this can serve the purpose of devices doing early accesses,
before the guest OS gets the hand and maps them?

Thanks

Eric
> 
> Thanks,
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-12-19 14:38         ` Auger Eric
@ 2019-12-19 14:49           ` Peter Xu
  2019-12-19 15:09             ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Peter Xu @ 2019-12-19 14:49 UTC (permalink / raw)
  To: Auger Eric
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Thu, Dec 19, 2019 at 03:38:34PM +0100, Auger Eric wrote:
> Hi Peter,
> 
> On 12/19/19 2:33 PM, Peter Xu wrote:
> > On Thu, Dec 19, 2019 at 11:30:40AM +0100, Auger Eric wrote:
> >> Hi Peter,
> >> On 12/10/19 8:33 PM, Peter Xu wrote:
> >>> On Fri, Nov 22, 2019 at 07:29:31PM +0100, Eric Auger wrote:
> >>>> This patch implements the translate callback
> >>>>
> >>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>
> >>>> ---
> >>>>
> >>>> v10 -> v11:
> >>>> - take into account the new value struct and use
> >>>>   g_tree_lookup_extended
> >>>> - switched to error_report_once
> >>>>
> >>>> v6 -> v7:
> >>>> - implemented bypass-mode
> >>>>
> >>>> v5 -> v6:
> >>>> - replace error_report by qemu_log_mask
> >>>>
> >>>> v4 -> v5:
> >>>> - check the device domain is not NULL
> >>>> - s/printf/error_report
> >>>> - set flags to IOMMU_NONE in case of all translation faults
> >>>> ---
> >>>>  hw/virtio/trace-events   |  1 +
> >>>>  hw/virtio/virtio-iommu.c | 63 +++++++++++++++++++++++++++++++++++++++-
> >>>>  2 files changed, 63 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
> >>>> index f25359cee2..de7cbb3c8f 100644
> >>>> --- a/hw/virtio/trace-events
> >>>> +++ b/hw/virtio/trace-events
> >>>> @@ -72,3 +72,4 @@ virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
> >>>>  virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
> >>>>  virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
> >>>>  virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
> >>>> +virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
> >>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> >>>> index f0a56833a2..a83666557b 100644
> >>>> --- a/hw/virtio/virtio-iommu.c
> >>>> +++ b/hw/virtio/virtio-iommu.c
> >>>> @@ -412,19 +412,80 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
> >>>>                                              int iommu_idx)
> >>>>  {
> >>>>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
> >>>> +    viommu_interval interval, *mapping_key;
> >>>> +    viommu_mapping *mapping_value;
> >>>> +    VirtIOIOMMU *s = sdev->viommu;
> >>>> +    viommu_endpoint *ep;
> >>>> +    bool bypass_allowed;
> >>>>      uint32_t sid;
> >>>> +    bool found;
> >>>> +
> >>>> +    interval.low = addr;
> >>>> +    interval.high = addr + 1;
> >>>>  
> >>>>      IOMMUTLBEntry entry = {
> >>>>          .target_as = &address_space_memory,
> >>>>          .iova = addr,
> >>>>          .translated_addr = addr,
> >>>> -        .addr_mask = ~(hwaddr)0,
> >>>> +        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
> >>>>          .perm = IOMMU_NONE,
> >>>>      };
> >>>>  
> >>>> +    bypass_allowed = virtio_has_feature(s->acked_features,
> >>>> +                                        VIRTIO_IOMMU_F_BYPASS);
> >>>> +
> >>>
> >>> Would it be easier to check bypass_allowed here once and then drop the
> >>> latter [1] and [2] check?
> >> bypass_allowed does not mean you systematically bypass. You bypass if
> >> the SID is unknown or if the device is not attached to any domain.
> >> Otherwise you translate. But maybe I miss your point.
> > 
> > Ah ok, then could I ask how will this VIRTIO_IOMMU_F_BYPASS be used?
> > For example, I think VT-d defines passthrough in a totally different
> > way in that the PT mark will be stored in the per-device context
> > entries, then we can allow a specific device to be pass-through when
> > doing DMA.  That information is explicit (e.g., unknown SID will
> > always fail the DMA), and per-device.
> > 
> > Here do you mean that you just don't put a device into any domain to
> > show it wants to use PT?  Then I'm not sure how do you identify
> > whether this is a legal PT or a malicious device (e.g., an unknown
> > device that even does not have any driver bound to it, which will also
> > satisfy "unknown SID" and "not attached to any domain", iiuc).
> 
> The virtio-iommu spec currently says:
> 
> "If the VIRTIO_IOMMU_F_BYPASS feature is negotiated, all accesses from
> unattached endpoints are
> allowed and translated by the IOMMU using the identity function. If the
> feature is not negotiated, any
> memory access from an unattached endpoint fails. Upon attaching an
> endpoint in bypass mode to a new
> domain, any memory access from the endpoint fails, since the domain does
> not contain any mapping.
> "
> 
> I guess this can serve the purpose of devices doing early accesses,
> before the guest OS gets the hand and maps them?

OK, so there's no global enablement knob for virtio-iommu? Hmm... Then:

  - This flag is a must for all virtio-iommu emulation, right?
    (otherwise I can't see how system bootstraps..)

  - Should this flag be gone right after OS starts (otherwise I think
    we still have the issue that any malicious device can be seen as
    in PT mode as default)?  How is that done?

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-12-19 14:49           ` Peter Xu
@ 2019-12-19 15:09             ` Auger Eric
  2019-12-20 16:26               ` Jean-Philippe Brucker
  0 siblings, 1 reply; 89+ messages in thread
From: Auger Eric @ 2019-12-19 15:09 UTC (permalink / raw)
  To: Peter Xu
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Peter, jean,

On 12/19/19 3:49 PM, Peter Xu wrote:
> On Thu, Dec 19, 2019 at 03:38:34PM +0100, Auger Eric wrote:
>> Hi Peter,
>>
>> On 12/19/19 2:33 PM, Peter Xu wrote:
>>> On Thu, Dec 19, 2019 at 11:30:40AM +0100, Auger Eric wrote:
>>>> Hi Peter,
>>>> On 12/10/19 8:33 PM, Peter Xu wrote:
>>>>> On Fri, Nov 22, 2019 at 07:29:31PM +0100, Eric Auger wrote:
>>>>>> This patch implements the translate callback
>>>>>>
>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> v10 -> v11:
>>>>>> - take into account the new value struct and use
>>>>>>   g_tree_lookup_extended
>>>>>> - switched to error_report_once
>>>>>>
>>>>>> v6 -> v7:
>>>>>> - implemented bypass-mode
>>>>>>
>>>>>> v5 -> v6:
>>>>>> - replace error_report by qemu_log_mask
>>>>>>
>>>>>> v4 -> v5:
>>>>>> - check the device domain is not NULL
>>>>>> - s/printf/error_report
>>>>>> - set flags to IOMMU_NONE in case of all translation faults
>>>>>> ---
>>>>>>  hw/virtio/trace-events   |  1 +
>>>>>>  hw/virtio/virtio-iommu.c | 63 +++++++++++++++++++++++++++++++++++++++-
>>>>>>  2 files changed, 63 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
>>>>>> index f25359cee2..de7cbb3c8f 100644
>>>>>> --- a/hw/virtio/trace-events
>>>>>> +++ b/hw/virtio/trace-events
>>>>>> @@ -72,3 +72,4 @@ virtio_iommu_get_endpoint(uint32_t ep_id) "Alloc endpoint=%d"
>>>>>>  virtio_iommu_put_endpoint(uint32_t ep_id) "Free endpoint=%d"
>>>>>>  virtio_iommu_get_domain(uint32_t domain_id) "Alloc domain=%d"
>>>>>>  virtio_iommu_put_domain(uint32_t domain_id) "Free domain=%d"
>>>>>> +virtio_iommu_translate_out(uint64_t virt_addr, uint64_t phys_addr, uint32_t sid) "0x%"PRIx64" -> 0x%"PRIx64 " for sid=%d"
>>>>>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>>>>>> index f0a56833a2..a83666557b 100644
>>>>>> --- a/hw/virtio/virtio-iommu.c
>>>>>> +++ b/hw/virtio/virtio-iommu.c
>>>>>> @@ -412,19 +412,80 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
>>>>>>                                              int iommu_idx)
>>>>>>  {
>>>>>>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
>>>>>> +    viommu_interval interval, *mapping_key;
>>>>>> +    viommu_mapping *mapping_value;
>>>>>> +    VirtIOIOMMU *s = sdev->viommu;
>>>>>> +    viommu_endpoint *ep;
>>>>>> +    bool bypass_allowed;
>>>>>>      uint32_t sid;
>>>>>> +    bool found;
>>>>>> +
>>>>>> +    interval.low = addr;
>>>>>> +    interval.high = addr + 1;
>>>>>>  
>>>>>>      IOMMUTLBEntry entry = {
>>>>>>          .target_as = &address_space_memory,
>>>>>>          .iova = addr,
>>>>>>          .translated_addr = addr,
>>>>>> -        .addr_mask = ~(hwaddr)0,
>>>>>> +        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
>>>>>>          .perm = IOMMU_NONE,
>>>>>>      };
>>>>>>  
>>>>>> +    bypass_allowed = virtio_has_feature(s->acked_features,
>>>>>> +                                        VIRTIO_IOMMU_F_BYPASS);
>>>>>> +
>>>>>
>>>>> Would it be easier to check bypass_allowed here once and then drop the
>>>>> latter [1] and [2] check?
>>>> bypass_allowed does not mean you systematically bypass. You bypass if
>>>> the SID is unknown or if the device is not attached to any domain.
>>>> Otherwise you translate. But maybe I miss your point.
>>>
>>> Ah ok, then could I ask how will this VIRTIO_IOMMU_F_BYPASS be used?
>>> For example, I think VT-d defines passthrough in a totally different
>>> way in that the PT mark will be stored in the per-device context
>>> entries, then we can allow a specific device to be pass-through when
>>> doing DMA.  That information is explicit (e.g., unknown SID will
>>> always fail the DMA), and per-device.
>>>
>>> Here do you mean that you just don't put a device into any domain to
>>> show it wants to use PT?  Then I'm not sure how do you identify
>>> whether this is a legal PT or a malicious device (e.g., an unknown
>>> device that even does not have any driver bound to it, which will also
>>> satisfy "unknown SID" and "not attached to any domain", iiuc).
>>
>> The virtio-iommu spec currently says:
>>
>> "If the VIRTIO_IOMMU_F_BYPASS feature is negotiated, all accesses from
>> unattached endpoints are
>> allowed and translated by the IOMMU using the identity function. If the
>> feature is not negotiated, any
>> memory access from an unattached endpoint fails. Upon attaching an
>> endpoint in bypass mode to a new
>> domain, any memory access from the endpoint fails, since the domain does
>> not contain any mapping.
>> "
>>
>> I guess this can serve the purpose of devices doing early accesses,
>> before the guest OS gets the hand and maps them?
> 
> OK, so there's no global enablement knob for virtio-iommu? Hmm... Then:
well this is a global knob. If this is bot negotiated any unmapped
device can PT.

My assumption above must be wrong as this is a negotiated feature so
anyway the virtio-iommu driver should be involved.

I don't really remember the rationale of the feature bit tbh.

In "[virtio-dev] RE: [RFC] virtio-iommu version 0.4 " Jean discussed
that with Kevein. Sorry I cannot find the link.

" If the endpoint is not attached to any address space,
then the device MAY abort the transaction."

Kevin> From definition of BYPASS, it's orthogonal to whether there is an
address space attached, then should we still allow "May abort" behavior?

Jean> The behavior is left as an implementation choice, and I'm not sure
it's worth enforcing in the architecture. If the endpoint isn't attached
to any domain then (unless VIRTIO_IOMMU_F_BYPASS is negotiated), it
isn't necessarily able to do DMA at all. The virtio-iommu device may
setup DMA mastering lazily, in which case any DMA transaction would
abort, or have setup DMA already, in which case the endpoint can access
MEM_T_BYPASS regions.

Hopefully Jean will remember and comment on this.

Thanks

Eric

> 
>   - This flag is a must for all virtio-iommu emulation, right?
>     (otherwise I can't see how system bootstraps..)
> 
>   - Should this flag be gone right after OS starts (otherwise I think
>     we still have the issue that any malicious device can be seen as
>     in PT mode as default)?  How is that done?
> 
> Thanks,
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 04/20] virtio-iommu: Add the iommu regions
  2019-12-10 16:34   ` Jean-Philippe Brucker
@ 2019-12-19 18:11     ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2019-12-19 18:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Jean,

On 12/10/19 5:34 PM, Jean-Philippe Brucker wrote:
> Two small things below, but looks good overall
> 
> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> 
> On Fri, Nov 22, 2019 at 07:29:27PM +0100, Eric Auger wrote:
>> +static AddressSpace *virtio_iommu_find_add_as(PCIBus *bus, void *opaque,
>> +                                              int devfn)
>> +{
>> +    VirtIOIOMMU *s = opaque;
>> +    IOMMUPciBus *sbus = g_hash_table_lookup(s->as_by_busptr, bus);
>> +    static uint32_t mr_index;
>> +    IOMMUDevice *sdev;
>> +
>> +    if (!sbus) {
>> +        sbus = g_malloc0(sizeof(IOMMUPciBus) +
>> +                         sizeof(IOMMUDevice *) * IOMMU_PCI_DEVFN_MAX);
>> +        sbus->bus = bus;
>> +        g_hash_table_insert(s->as_by_busptr, bus, sbus);
>> +    }
>> +
>> +    sdev = sbus->pbdev[devfn];
>> +    if (!sdev) {
>> +        char *name = g_strdup_printf("%s-%d-%d",
>> +                                     TYPE_VIRTIO_IOMMU_MEMORY_REGION,
>> +                                     mr_index++, devfn);
>> +        sdev = sbus->pbdev[devfn] = g_malloc0(sizeof(IOMMUDevice));
>> +
>> +        sdev->viommu = s;
>> +        sdev->bus = bus;
>> +        sdev->devfn = devfn;
> 
> It might be better to store the endpoint ID in IOMMUDevice, then you could
> get rid of virtio_iommu_get_sid(), and remove a tiny bit of overhead in
> virtio_iommu_translate(). But I doubt it's significant.
virtio_iommu_find_add_as() gets called on PCI bus enumeration. At that
point, the bus number may not be resolved. So I cannot retrieve and set
the bus_number in this function.

When virtio_iommu_get_sid() is called we are sure pci_bus_num(dev->bus)
returns a correct value.
> 
> [...]
>> +static const TypeInfo virtio_iommu_memory_region_info = {
>> +    .parent = TYPE_IOMMU_MEMORY_REGION,
>> +    .name = TYPE_VIRTIO_IOMMU_MEMORY_REGION,
>> +    .class_init = virtio_iommu_memory_region_class_init,
>> +};
>> +
>> +
> 
> nit: newline.
Thanks

Eric
> 
> Thanks,
> Jean 
> 




^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 05/20] virtio-iommu: Endpoint and domains structs and helpers
  2019-12-10 16:37   ` Jean-Philippe Brucker
@ 2019-12-19 18:31     ` Auger Eric
  2019-12-20 17:00       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 89+ messages in thread
From: Auger Eric @ 2019-12-19 18:31 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Jean,

On 12/10/19 5:37 PM, Jean-Philippe Brucker wrote:
> On Fri, Nov 22, 2019 at 07:29:28PM +0100, Eric Auger wrote:
>> +typedef struct viommu_domain {
>> +    uint32_t id;
>> +    GTree *mappings;
>> +    QLIST_HEAD(, viommu_endpoint) endpoint_list;
>> +} viommu_domain;
>> +
>> +typedef struct viommu_endpoint {
>> +    uint32_t id;
>> +    viommu_domain *domain;
>> +    QLIST_ENTRY(viommu_endpoint) next;
>> +} viommu_endpoint;
> 
> There might be a way to merge viommu_endpoint and the IOMMUDevice
> structure introduced in patch 4, since they both represent one endpoint.
> Maybe virtio_iommu_find_add_pci_as() could add the IOMMUDevice to
> s->endpoints, and IOMMUDevice could store the endpoint ID rather than bus
> and devfn.

On PCI bus enumeration we locally store the PCI bus hierarchy under the
form of GHashTable of IOMMUDevice indexed by iommu_pci_bus pointer.
Those are all the devices attached to the downstream buses. We also use
an array of iommu pci bus pointers indexed by bus number that is lazily
populated due to the fact, at enumeration time we do know the bus number
yet. As you pointed, I haven't used the array of iommu pci bus pointers
indexed by bus number in this series and I should actually. Currently I
am not checking on attach that the sid effectively corresponds to a sid
protected by this iommu. I will add this in my next version. The above
structures are used in intel_iommu and smmu code as well and I think
eventually this may be factorized a common base class..

on the other hand the gtree of viommu_endpoint - soon renamed in
CamelCase form ;-) - corresponds to the EPs that are actually attached
to any domain. It is indexed by sid and not by bus pointer. This is more
adapted to the virtio-iommu case.

So, despite your suggestion, I am tempted to keep the different
structures as the first ones are common to all iommu emulation code and
the last is adapted to the virtio-iommu operations.

Thoughts?

Eric

> 
>> +typedef struct viommu_interval {
>> +    uint64_t low;
>> +    uint64_t high;
>> +} viommu_interval;
> 
> I guess these should be named in CamelCase? Although if we're allowed to
> choose my vote goes to underscores :)
> 
> Thanks,
> Jean
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-12-19 15:09             ` Auger Eric
@ 2019-12-20 16:26               ` Jean-Philippe Brucker
  2019-12-20 16:51                 ` Peter Xu
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-20 16:26 UTC (permalink / raw)
  To: Auger Eric
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, Peter Xu, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Thu, Dec 19, 2019 at 04:09:47PM +0100, Auger Eric wrote:
> >>>>>> @@ -412,19 +412,80 @@ static IOMMUTLBEntry virtio_iommu_translate(IOMMUMemoryRegion *mr, hwaddr addr,
> >>>>>>                                              int iommu_idx)
> >>>>>>  {
> >>>>>>      IOMMUDevice *sdev = container_of(mr, IOMMUDevice, iommu_mr);
> >>>>>> +    viommu_interval interval, *mapping_key;
> >>>>>> +    viommu_mapping *mapping_value;
> >>>>>> +    VirtIOIOMMU *s = sdev->viommu;
> >>>>>> +    viommu_endpoint *ep;
> >>>>>> +    bool bypass_allowed;
> >>>>>>      uint32_t sid;
> >>>>>> +    bool found;
> >>>>>> +
> >>>>>> +    interval.low = addr;
> >>>>>> +    interval.high = addr + 1;
> >>>>>>  
> >>>>>>      IOMMUTLBEntry entry = {
> >>>>>>          .target_as = &address_space_memory,
> >>>>>>          .iova = addr,
> >>>>>>          .translated_addr = addr,
> >>>>>> -        .addr_mask = ~(hwaddr)0,
> >>>>>> +        .addr_mask = (1 << ctz32(s->config.page_size_mask)) - 1,
> >>>>>>          .perm = IOMMU_NONE,
> >>>>>>      };
> >>>>>>  
> >>>>>> +    bypass_allowed = virtio_has_feature(s->acked_features,
> >>>>>> +                                        VIRTIO_IOMMU_F_BYPASS);
> >>>>>> +
> >>>>>
> >>>>> Would it be easier to check bypass_allowed here once and then drop the
> >>>>> latter [1] and [2] check?
> >>>> bypass_allowed does not mean you systematically bypass. You bypass if
> >>>> the SID is unknown or if the device is not attached to any domain.
> >>>> Otherwise you translate. But maybe I miss your point.
> >>>
> >>> Ah ok, then could I ask how will this VIRTIO_IOMMU_F_BYPASS be used?
> >>> For example, I think VT-d defines passthrough in a totally different
> >>> way in that the PT mark will be stored in the per-device context
> >>> entries, then we can allow a specific device to be pass-through when
> >>> doing DMA.  That information is explicit (e.g., unknown SID will
> >>> always fail the DMA), and per-device.
> >>>
> >>> Here do you mean that you just don't put a device into any domain to
> >>> show it wants to use PT?  Then I'm not sure how do you identify
> >>> whether this is a legal PT or a malicious device (e.g., an unknown
> >>> device that even does not have any driver bound to it, which will also
> >>> satisfy "unknown SID" and "not attached to any domain", iiuc).
> >>
> >> The virtio-iommu spec currently says:
> >>
> >> "If the VIRTIO_IOMMU_F_BYPASS feature is negotiated, all accesses from
> >> unattached endpoints are
> >> allowed and translated by the IOMMU using the identity function. If the
> >> feature is not negotiated, any
> >> memory access from an unattached endpoint fails. Upon attaching an
> >> endpoint in bypass mode to a new
> >> domain, any memory access from the endpoint fails, since the domain does
> >> not contain any mapping.
> >> "
> >>
> >> I guess this can serve the purpose of devices doing early accesses,
> >> before the guest OS gets the hand and maps them?
> > 
> > OK, so there's no global enablement knob for virtio-iommu? Hmm... Then:

There is at the virtio transport level: the driver sets status to
FEATURES_OK once it accepted the feature bits, and to DRIVER_OK once its
fully operational. The virtio-iommu spec says:

  If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
  device SHOULD NOT let endpoints access the guest-physical address space.

So before features negotiation, there is no access. Afterwards it depends
if the VIRTIO_IOMMU_F_BYPASS has been accepted by the driver.

> well this is a global knob. If this is bot negotiated any unmapped
> device can PT.
> 
> My assumption above must be wrong as this is a negotiated feature so
> anyway the virtio-iommu driver should be involved.
> 
> I don't really remember the rationale of the feature bit tbh.

I don't remember writing down a rationale for this bit, it was in the very
first version (I think someone suggested it during the initial internal
discussion) and I didn't remove it afterwards because it seems useful:

Say a guest only wants to use the vIOMMU for userspace assignment and
wants all other endpoints to bypass translation, which is our primary
use-case. In other words booting Linux with iommu.passthrough=1. It can
either create an identity domain for each endpoint (one MAP request with
VA==PA) or it can set the VIRTIO_IOMMU_F_BYPASS bit. The device-side
implementation should be more efficient with the latter, since you don't
need to lookup the domain + address space for each access.

> In "[virtio-dev] RE: [RFC] virtio-iommu version 0.4 " Jean discussed
> that with Kevein. Sorry I cannot find the link.
> 
> " If the endpoint is not attached to any address space,
> then the device MAY abort the transaction."

Hmm, that was regarding a "bypass" reserved memory region, which isn't in
the current spec.

> Kevin> From definition of BYPASS, it's orthogonal to whether there is an
> address space attached, then should we still allow "May abort" behavior?
> 
> Jean> The behavior is left as an implementation choice, and I'm not sure
> it's worth enforcing in the architecture. If the endpoint isn't attached
> to any domain then (unless VIRTIO_IOMMU_F_BYPASS is negotiated), it
> isn't necessarily able to do DMA at all. The virtio-iommu device may
> setup DMA mastering lazily, in which case any DMA transaction would
> abort, or have setup DMA already, in which case the endpoint can access
> MEM_T_BYPASS regions.
> 
> Hopefully Jean will remember and comment on this.
> 
> Thanks
> 
> Eric
> 
> > 
> >   - This flag is a must for all virtio-iommu emulation, right?
> >     (otherwise I can't see how system bootstraps..)

What do you mean by system bootstrap?

One thing I've been wondering, and may be related, is how to handle a
bootloader that wants to read for example an initrd from a virtio-block
device that's behind the IOMMU. Either we allow the device to let any DMA
bypass the device until FEATURES_OK, which is a source of vulnerabilities
[1], or we have to implement some support for the virtio-iommu in the
BIOS. Again the F_BYPASS bit would help for this, since all the BIOS has
to do is set it on boot. However, F_BYPASS is optional, and more complex
support is needed for setting up identity mappings.

[1] See "IOMMU protection against I/O attacks: a vulnerability and a proof
of concept" by Morgan et al, where a malicious device bypassing the IOMMU
overwrites the IOMMU configuration as it is being created by the OS.
Arguably we're not too concerned about malicious devices at the moment,
but I'm not comfortable relaxing this.

> >   - Should this flag be gone right after OS starts (otherwise I think
> >     we still have the issue that any malicious device can be seen as
> >     in PT mode as default)?  How is that done?

Yes bypass mode assumes that devices and drivers aren't malicious, and the
IOMMU is only used for things like assigning devices to guest userspace,
or having large contiguous DMA buffers.

Thanks,
Jean



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-12-20 16:26               ` Jean-Philippe Brucker
@ 2019-12-20 16:51                 ` Peter Xu
  2020-01-06 17:06                   ` Jean-Philippe Brucker
  0 siblings, 1 reply; 89+ messages in thread
From: Peter Xu @ 2019-12-20 16:51 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru, Auger Eric,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Dec 20, 2019 at 05:26:42PM +0100, Jean-Philippe Brucker wrote:
> There is at the virtio transport level: the driver sets status to
> FEATURES_OK once it accepted the feature bits, and to DRIVER_OK once its
> fully operational. The virtio-iommu spec says:
> 
>   If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
>   device SHOULD NOT let endpoints access the guest-physical address space.
> 
> So before features negotiation, there is no access. Afterwards it depends
> if the VIRTIO_IOMMU_F_BYPASS has been accepted by the driver.

Before enabling virtio-iommu device, should we still let the devices
to access the whole system address space?  I believe that's at least
what Intel IOMMUs are doing.  From code-wise, its:

    if (likely(s->dmar_enabled)) {
        success = vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn,
                                         addr, flag & IOMMU_WO, &iotlb);
    } else {
        /* DMAR disabled, passthrough, use 4k-page*/
        iotlb.iova = addr & VTD_PAGE_MASK_4K;
        iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
        iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
        iotlb.perm = IOMMU_RW;
        success = true;
    }

From hardware-wise, an IOMMU should be close to transparent if you
never enable it, imho.

Otherwise I'm confused on how a guest (with virtio-iommu) could boot
with a normal BIOS that does not contain a virtio-iommu driver.  For
example, what if the BIOS needs to read some block sectors (as you
mentioned)?

> > >   - This flag is a must for all virtio-iommu emulation, right?
> > >     (otherwise I can't see how system bootstraps..)
> 
> What do you mean by system bootstrap?

Sorry, I meant when the system boots before the OS.

> 
> One thing I've been wondering, and may be related, is how to handle a
> bootloader that wants to read for example an initrd from a virtio-block
> device that's behind the IOMMU.

My understanding is that virtio devices are special in that they can
use the VIRTIO_F_IOMMU_PLATFORM flag to bypass any vIOMMU (though, I
don't think that'll work when virtio hardwares comes to the
world.. because they can't really bypass the IOMMU hardware).

> Either we allow the device to let any DMA
> bypass the device until FEATURES_OK, which is a source of vulnerabilities
> [1], or we have to implement some support for the virtio-iommu in the
> BIOS. Again the F_BYPASS bit would help for this, since all the BIOS has
> to do is set it on boot. However, F_BYPASS is optional, and more complex
> support is needed for setting up identity mappings.
> 
> [1] See "IOMMU protection against I/O attacks: a vulnerability and a proof
> of concept" by Morgan et al, where a malicious device bypassing the IOMMU
> overwrites the IOMMU configuration as it is being created by the OS.
> Arguably we're not too concerned about malicious devices at the moment,
> but I'm not comfortable relaxing this.
> 
> > >   - Should this flag be gone right after OS starts (otherwise I think
> > >     we still have the issue that any malicious device can be seen as
> > >     in PT mode as default)?  How is that done?
> 
> Yes bypass mode assumes that devices and drivers aren't malicious, and the
> IOMMU is only used for things like assigning devices to guest userspace,
> or having large contiguous DMA buffers.

Yes I agree.  However again when the BYPASS flag was introduced, have
you thought of introducing that flag per-device?  IMHO that could be
better because you have a finer granularity on controlling all these,
so you'll be able to reject malicious devices but at the meantime
grant permission to trusted devices.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 05/20] virtio-iommu: Endpoint and domains structs and helpers
  2019-12-19 18:31     ` Auger Eric
@ 2019-12-20 17:00       ` Jean-Philippe Brucker
  2019-12-23  9:11         ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2019-12-20 17:00 UTC (permalink / raw)
  To: Auger Eric
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Thu, Dec 19, 2019 at 07:31:08PM +0100, Auger Eric wrote:
> Hi Jean,
> 
> On 12/10/19 5:37 PM, Jean-Philippe Brucker wrote:
> > On Fri, Nov 22, 2019 at 07:29:28PM +0100, Eric Auger wrote:
> >> +typedef struct viommu_domain {
> >> +    uint32_t id;
> >> +    GTree *mappings;
> >> +    QLIST_HEAD(, viommu_endpoint) endpoint_list;
> >> +} viommu_domain;
> >> +
> >> +typedef struct viommu_endpoint {
> >> +    uint32_t id;
> >> +    viommu_domain *domain;
> >> +    QLIST_ENTRY(viommu_endpoint) next;
> >> +} viommu_endpoint;
> > 
> > There might be a way to merge viommu_endpoint and the IOMMUDevice
> > structure introduced in patch 4, since they both represent one endpoint.
> > Maybe virtio_iommu_find_add_pci_as() could add the IOMMUDevice to
> > s->endpoints, and IOMMUDevice could store the endpoint ID rather than bus
> > and devfn.
> 
> On PCI bus enumeration we locally store the PCI bus hierarchy under the
> form of GHashTable of IOMMUDevice indexed by iommu_pci_bus pointer.
> Those are all the devices attached to the downstream buses. We also use
> an array of iommu pci bus pointers indexed by bus number that is lazily
> populated due to the fact, at enumeration time we do know the bus number
> yet. As you pointed, I haven't used the array of iommu pci bus pointers
> indexed by bus number in this series and I should actually. Currently I
> am not checking on attach that the sid effectively corresponds to a sid
> protected by this iommu. I will add this in my next version. The above
> structures are used in intel_iommu and smmu code as well and I think
> eventually this may be factorized a common base class..
> 
> on the other hand the gtree of viommu_endpoint - soon renamed in
> CamelCase form ;-) - corresponds to the EPs that are actually attached
> to any domain. It is indexed by sid and not by bus pointer. This is more
> adapted to the virtio-iommu case.
> 
> So, despite your suggestion, I am tempted to keep the different
> structures as the first ones are common to all iommu emulation code and
> the last is adapted to the virtio-iommu operations.
> 
> Thoughts?

Makes sense, it seems better to keep them separate. I had missed that the
PCI bus number is resolved later, and started to move the endpoint ID into
IOMMUDevice when adding MMIO support, but I'll need to revisit this.

I'll be off for two weeks, have a nice holiday!

Thanks,
Jean



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 05/20] virtio-iommu: Endpoint and domains structs and helpers
  2019-12-20 17:00       ` Jean-Philippe Brucker
@ 2019-12-23  9:11         ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2019-12-23  9:11 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Jean,

On 12/20/19 6:00 PM, Jean-Philippe Brucker wrote:
> On Thu, Dec 19, 2019 at 07:31:08PM +0100, Auger Eric wrote:
>> Hi Jean,
>>
>> On 12/10/19 5:37 PM, Jean-Philippe Brucker wrote:
>>> On Fri, Nov 22, 2019 at 07:29:28PM +0100, Eric Auger wrote:
>>>> +typedef struct viommu_domain {
>>>> +    uint32_t id;
>>>> +    GTree *mappings;
>>>> +    QLIST_HEAD(, viommu_endpoint) endpoint_list;
>>>> +} viommu_domain;
>>>> +
>>>> +typedef struct viommu_endpoint {
>>>> +    uint32_t id;
>>>> +    viommu_domain *domain;
>>>> +    QLIST_ENTRY(viommu_endpoint) next;
>>>> +} viommu_endpoint;
>>>
>>> There might be a way to merge viommu_endpoint and the IOMMUDevice
>>> structure introduced in patch 4, since they both represent one endpoint.
>>> Maybe virtio_iommu_find_add_pci_as() could add the IOMMUDevice to
>>> s->endpoints, and IOMMUDevice could store the endpoint ID rather than bus
>>> and devfn.
>>
>> On PCI bus enumeration we locally store the PCI bus hierarchy under the
>> form of GHashTable of IOMMUDevice indexed by iommu_pci_bus pointer.
>> Those are all the devices attached to the downstream buses. We also use
>> an array of iommu pci bus pointers indexed by bus number that is lazily
>> populated due to the fact, at enumeration time we do know the bus number
>> yet. As you pointed, I haven't used the array of iommu pci bus pointers
>> indexed by bus number in this series and I should actually. Currently I
>> am not checking on attach that the sid effectively corresponds to a sid
>> protected by this iommu. I will add this in my next version. The above
>> structures are used in intel_iommu and smmu code as well and I think
>> eventually this may be factorized a common base class..
>>
>> on the other hand the gtree of viommu_endpoint - soon renamed in
>> CamelCase form ;-) - corresponds to the EPs that are actually attached
>> to any domain. It is indexed by sid and not by bus pointer. This is more
>> adapted to the virtio-iommu case.
>>
>> So, despite your suggestion, I am tempted to keep the different
>> structures as the first ones are common to all iommu emulation code and
>> the last is adapted to the virtio-iommu operations.
>>
>> Thoughts?
> 
> Makes sense, it seems better to keep them separate. I had missed that the
> PCI bus number is resolved later, and started to move the endpoint ID into
> IOMMUDevice when adding MMIO support, but I'll need to revisit this.
> 
> I'll be off for two weeks, have a nice holiday!

Thanks, you too.

Merry Christmas! :-)

Eric
> 
> Thanks,
> Jean
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 06/20] virtio-iommu: Implement attach/detach command
  2019-12-10 16:41   ` Jean-Philippe Brucker
@ 2019-12-23  9:14     ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2019-12-23  9:14 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Jean,

On 12/10/19 5:41 PM, Jean-Philippe Brucker wrote:
> On Fri, Nov 22, 2019 at 07:29:29PM +0100, Eric Auger wrote:
>> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
>> index 235bde2203..138d5b2a9c 100644
>> --- a/hw/virtio/virtio-iommu.c
>> +++ b/hw/virtio/virtio-iommu.c
>> @@ -77,11 +77,12 @@ static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
>>  static void virtio_iommu_detach_endpoint_from_domain(viommu_endpoint *ep)
>>  {
>>      QLIST_REMOVE(ep, next);
>> +    g_tree_unref(ep->domain->mappings);
>>      ep->domain = NULL;
>>  }
>>  
>> -viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id);
>> -viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s, uint32_t ep_id)
>> +static viommu_endpoint *virtio_iommu_get_endpoint(VirtIOIOMMU *s,
>> +                                                  uint32_t ep_id)
>>  {
>>      viommu_endpoint *ep;
>>  
>> @@ -102,15 +103,14 @@ static void virtio_iommu_put_endpoint(gpointer data)
>>  
>>      if (ep->domain) {
>>          virtio_iommu_detach_endpoint_from_domain(ep);
>> -        g_tree_unref(ep->domain->mappings);
>>      }
>>  
>>      trace_virtio_iommu_put_endpoint(ep->id);
>>      g_free(ep);
>>  }
>>  
>> -viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id);
>> -viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s, uint32_t domain_id)
>> +static viommu_domain *virtio_iommu_get_domain(VirtIOIOMMU *s,
>> +                                              uint32_t domain_id)
> 
> Looks like the above change belong to patch 5?
virtio_iommu_get_domain was not used yet in last patch. I turn it into
static now it gets used.
> 
>>  {
>>      viommu_domain *domain;
>>  
>> @@ -137,7 +137,6 @@ static void virtio_iommu_put_domain(gpointer data)
>>      QLIST_FOREACH_SAFE(iter, &domain->endpoint_list, next, tmp) {
>>          virtio_iommu_detach_endpoint_from_domain(iter);
>>      }
>> -    g_tree_destroy(domain->mappings);
> 
> When created by virtio_iommu_get_domain(), mappings has one reference.
> Then for each attach (including the first one) an additional reference is
> taken, and freed by virtio_iommu_detach_endpoint_from_domain(). So I think
> there are two problems:
> 
> * virtio_iommu_put_domain() drops one ref for each endpoint, but we still
>   have one reference to mappings, so they're not freed. We do need this
>   g_tree_destroy()
> 
> * After detaching all the endpoints, the guest may reuse the domain ID for
>   another domain, but the previous mappings haven't been erased. Not sure
>   how to fix this using the g_tree refs, because dropping all the
>   references will free the internal tree data and it won't be reusable.

You're perfectly right, mappings were not destroyed and I missed that.
So I made 2 modifications:
- do not increment the ref count on the first EP addition
- destroy the domain when its EP list get empty.

Thanks

Eric
> 
> Thanks,
> Jean
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 07/20] virtio-iommu: Implement map/unmap
  2019-12-10 16:43   ` Jean-Philippe Brucker
@ 2019-12-23  9:42     ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2019-12-23  9:42 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi jean,

On 12/10/19 5:43 PM, Jean-Philippe Brucker wrote:
> On Fri, Nov 22, 2019 at 07:29:30PM +0100, Eric Auger wrote:
>> @@ -238,10 +244,35 @@ static int virtio_iommu_map(VirtIOIOMMU *s,
>>      uint64_t virt_start = le64_to_cpu(req->virt_start);
>>      uint64_t virt_end = le64_to_cpu(req->virt_end);
>>      uint32_t flags = le32_to_cpu(req->flags);
>> +    viommu_domain *domain;
>> +    viommu_interval *interval;
>> +    viommu_mapping *mapping;
> 
> Additional checks would be good. Most importantly we need to return
> S_INVAL if we don't recognize a bit in flags (a MUST in the spec). 
Sure

It
> might be good to check that addresses are aligned on the page granule as
> well, and return S_RANGE if they aren't (a SHOULD in the spec), but I
> don't care as much.
with KVM accelerated guest I don't have access to the guest page size,
hence the choice of not checking it.
> 
>> +
>> +    interval = g_malloc0(sizeof(*interval));
>> +
>> +    interval->low = virt_start;
>> +    interval->high = virt_end;
>> +
>> +    domain = g_tree_lookup(s->domains, GUINT_TO_POINTER(domain_id));
>> +    if (!domain) {
>> +        return VIRTIO_IOMMU_S_NOENT;
> 
> Leaks interval, I guess you could allocate it after this block.
Sure

Thanks!

Eric
> 
> Thanks,
> Jean
> 
>> +    }
>> +
>> +    mapping = g_tree_lookup(domain->mappings, (gpointer)interval);
>> +    if (mapping) {
>> +        g_free(interval);
>> +        return VIRTIO_IOMMU_S_INVAL;
>> +    }
>>  
>>      trace_virtio_iommu_map(domain_id, virt_start, virt_end, phys_start, flags);
>>  
>> -    return VIRTIO_IOMMU_S_UNSUPP;
>> +    mapping = g_malloc0(sizeof(*mapping));
>> +    mapping->phys_addr = phys_start;
>> +    mapping->flags = flags;
>> +
>> +    g_tree_insert(domain->mappings, interval, mapping);
>> +
>> +    return VIRTIO_IOMMU_S_OK;
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration
  2019-12-10 20:01   ` Peter Xu
@ 2019-12-24  7:39     ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2019-12-24  7:39 UTC (permalink / raw)
  To: Peter Xu
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Peter,

On 12/10/19 9:01 PM, Peter Xu wrote:
> On Fri, Nov 22, 2019 at 07:29:41PM +0100, Eric Auger wrote:
>> +static const VMStateDescription vmstate_virtio_iommu_device = {
>> +    .name = "virtio-iommu-device",
>> +    .minimum_version_id = 1,
>> +    .version_id = 1,
>> +    .post_load = iommu_post_load,
>> +    .fields = (VMStateField[]) {
>> +        VMSTATE_GTREE_DIRECT_KEY_V(domains, VirtIOIOMMU, 1,
>> +                                   &vmstate_domain, viommu_domain),
>> +        VMSTATE_GTREE_DIRECT_KEY_V(endpoints, VirtIOIOMMU, 1,
>> +                                   &vmstate_endpoint, viommu_endpoint),
> 
> IIUC vmstate_domain already contains all the endpoint information (in
> endpoint_list of vmstate_domain), but here we migrate it twice. 

I migrated both because at that time I considered we could have
endpoints not attached to any domains but I think I can now simplify
based on the fact any EP is attached.


 I
> suppose that's why now we need reconstruct_ep_domain_link() to fixup
> the duplicated migration?

Even if I only migrate the domain gtree, I need to reconstruct the
ep->domain which was not migrated, on purpose, as it pointed to the old
domain in the origin.
> 
> Then I'll instead ask whether we can skip migrating here?  Then in
> post_load we simply:
> 
>   foreach(domain)
>     foreach(endpoint in domain)
>       g_tree_insert(s->endpoints);
> 
> It might help to avoid the reconstruct_ep_domain_link ugliness?
I agree that it is simpler. Also need to update the ep->domain as
mentionned above. Thank you for the suggestion.


> 
> And besides, I also agree with Jean that the endpoint data structure
> could be reused with IOMMUDevice somehow.

As I replied to Jean, I think it makes sense to keep both structures as
endpoints are not indexed by the same key and the bus number is resolved
later.

Thanks

Eric
> 
> Thanks,
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci
  2019-12-10 16:50   ` Jean-Philippe Brucker
@ 2019-12-24  7:39     ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2019-12-24  7:39 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, peterx, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Jean,

On 12/10/19 5:50 PM, Jean-Philippe Brucker wrote:
> On Fri, Nov 22, 2019 at 07:29:42PM +0100, Eric Auger wrote:
>> The virtio-iommu-pci is instantiated through the -device QEMU
>> option. However if instantiated it also requires an IORT ACPI table
>> to describe the ID mappings between the root complex and the iommu.
>>
>> This patch adds the generation of the IORT table if the
>> virtio-iommu-pci device is instantiated.
>>
>> We also declare the [0xfee00000 - 0xfeefffff] MSI reserved region
>> so that it gets bypassed by the IOMMU.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> It would be nice to factor the IORT code with arm, but this looks OK.
I factorized the iort table code generation. Not sure this will be used
eventually but well.

Thanks

Eric
> 
> Reviewed-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2019-12-20 16:51                 ` Peter Xu
@ 2020-01-06 17:06                   ` Jean-Philippe Brucker
  2020-01-06 17:58                     ` Peter Xu
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2020-01-06 17:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru, Auger Eric,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Fri, Dec 20, 2019 at 11:51:00AM -0500, Peter Xu wrote:
> On Fri, Dec 20, 2019 at 05:26:42PM +0100, Jean-Philippe Brucker wrote:
> > There is at the virtio transport level: the driver sets status to
> > FEATURES_OK once it accepted the feature bits, and to DRIVER_OK once its
> > fully operational. The virtio-iommu spec says:
> > 
> >   If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
> >   device SHOULD NOT let endpoints access the guest-physical address space.
> > 
> > So before features negotiation, there is no access. Afterwards it depends
> > if the VIRTIO_IOMMU_F_BYPASS has been accepted by the driver.
> 
> Before enabling virtio-iommu device, should we still let the devices
> to access the whole system address space?  I believe that's at least
> what Intel IOMMUs are doing.  From code-wise, its:
> 
>     if (likely(s->dmar_enabled)) {
>         success = vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn,
>                                          addr, flag & IOMMU_WO, &iotlb);
>     } else {
>         /* DMAR disabled, passthrough, use 4k-page*/
>         iotlb.iova = addr & VTD_PAGE_MASK_4K;
>         iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
>         iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
>         iotlb.perm = IOMMU_RW;
>         success = true;
>     }
> 
> From hardware-wise, an IOMMU should be close to transparent if you
> never enable it, imho.

For hardware that's not necessarily the best choice. As cited in my
previous reply it has been shown to introduce vulnerabilities since
malicious devices can DMA during boot, before the OS takes control of the
IOMMU. The Arm SMMU allows an implementation to adopt a deny policy by
default.

> Otherwise I'm confused on how a guest (with virtio-iommu) could boot
> with a normal BIOS that does not contain a virtio-iommu driver.  For
> example, what if the BIOS needs to read some block sectors (as you
> mentioned)?

Ideally we should aim at supporting the device in both the BIOS and the
OS. Failing that, there should at least be a way to instantiate a
virtio-iommu device that is blocking by default, that cannot be bypassed
unless the controlling software decides to allow it. Could the bypass
policy be a command-line option to the virtio-iommu device?

[...]
> > Yes bypass mode assumes that devices and drivers aren't malicious, and the
> > IOMMU is only used for things like assigning devices to guest userspace,
> > or having large contiguous DMA buffers.
> 
> Yes I agree.  However again when the BYPASS flag was introduced, have
> you thought of introducing that flag per-device?  IMHO that could be
> better because you have a finer granularity on controlling all these,
> so you'll be able to reject malicious devices but at the meantime
> grant permission to trusted devices.

At the moment that per-device behavior can be emulated by sending an
ATTACH request followed by identity MAP. It could be a little more
efficient to add a "bypass" flag to the ATTACH request and avoid setting
up the identity mapping manually, since the device then wouldn't need to
look up the mapping on translation, but I don't know how much it would
improve performance. The device could also cache the fact that the address
space is identity-mapped, for the same result. The domain lookup has to
happen in any case, so you can never get the full iommu-free performance
with these mechanisms.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2020-01-06 17:06                   ` Jean-Philippe Brucker
@ 2020-01-06 17:58                     ` Peter Xu
  2020-01-07 10:10                       ` Jean-Philippe Brucker
  0 siblings, 1 reply; 89+ messages in thread
From: Peter Xu @ 2020-01-06 17:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru, Auger Eric,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Mon, Jan 06, 2020 at 06:06:34PM +0100, Jean-Philippe Brucker wrote:
> On Fri, Dec 20, 2019 at 11:51:00AM -0500, Peter Xu wrote:
> > On Fri, Dec 20, 2019 at 05:26:42PM +0100, Jean-Philippe Brucker wrote:
> > > There is at the virtio transport level: the driver sets status to
> > > FEATURES_OK once it accepted the feature bits, and to DRIVER_OK once its
> > > fully operational. The virtio-iommu spec says:
> > > 
> > >   If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
> > >   device SHOULD NOT let endpoints access the guest-physical address space.
> > > 
> > > So before features negotiation, there is no access. Afterwards it depends
> > > if the VIRTIO_IOMMU_F_BYPASS has been accepted by the driver.
> > 
> > Before enabling virtio-iommu device, should we still let the devices
> > to access the whole system address space?  I believe that's at least
> > what Intel IOMMUs are doing.  From code-wise, its:
> > 
> >     if (likely(s->dmar_enabled)) {
> >         success = vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn,
> >                                          addr, flag & IOMMU_WO, &iotlb);
> >     } else {
> >         /* DMAR disabled, passthrough, use 4k-page*/
> >         iotlb.iova = addr & VTD_PAGE_MASK_4K;
> >         iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
> >         iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
> >         iotlb.perm = IOMMU_RW;
> >         success = true;
> >     }
> > 
> > From hardware-wise, an IOMMU should be close to transparent if you
> > never enable it, imho.
> 
> For hardware that's not necessarily the best choice. As cited in my
> previous reply it has been shown to introduce vulnerabilities since
> malicious devices can DMA during boot, before the OS takes control of the
> IOMMU. The Arm SMMU allows an implementation to adopt a deny policy by
> default.

I see.  But then how to read a sector from the block to at least boot
an OS if we use a default-deny policy?  Does it still need a mapping
that is established somehow by someone before hand?

> 
> > Otherwise I'm confused on how a guest (with virtio-iommu) could boot
> > with a normal BIOS that does not contain a virtio-iommu driver.  For
> > example, what if the BIOS needs to read some block sectors (as you
> > mentioned)?
> 
> Ideally we should aim at supporting the device in both the BIOS and the
> OS. Failing that, there should at least be a way to instantiate a
> virtio-iommu device that is blocking by default, that cannot be bypassed
> unless the controlling software decides to allow it. Could the bypass
> policy be a command-line option to the virtio-iommu device?
> 
> [...]
> > > Yes bypass mode assumes that devices and drivers aren't malicious, and the
> > > IOMMU is only used for things like assigning devices to guest userspace,
> > > or having large contiguous DMA buffers.
> > 
> > Yes I agree.  However again when the BYPASS flag was introduced, have
> > you thought of introducing that flag per-device?  IMHO that could be
> > better because you have a finer granularity on controlling all these,
> > so you'll be able to reject malicious devices but at the meantime
> > grant permission to trusted devices.
> 
> At the moment that per-device behavior can be emulated by sending an
> ATTACH request followed by identity MAP. It could be a little more
> efficient to add a "bypass" flag to the ATTACH request and avoid setting
> up the identity mapping manually, since the device then wouldn't need to
> look up the mapping on translation, but I don't know how much it would
> improve performance. The device could also cache the fact that the address
> space is identity-mapped, for the same result. The domain lookup has to
> happen in any case, so you can never get the full iommu-free performance
> with these mechanisms.

IMHO it's really a matter of whether virtio-iommu wants to have a
device layer besides the domain layer for the initial versions (just
like VT-d has a device context, then it points to a domain, so it has
these two layers).  But I agree for the bypass feature it should work
(though trying to detect "a device is put into an identital domain is
bypassed" is still a bit tricky to me).  And after all virtio-iommu is
always extensible when needs come.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2020-01-06 17:58                     ` Peter Xu
@ 2020-01-07 10:10                       ` Jean-Philippe Brucker
  2020-01-08 16:55                         ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2020-01-07 10:10 UTC (permalink / raw)
  To: Peter Xu
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru, Auger Eric,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Mon, Jan 06, 2020 at 12:58:50PM -0500, Peter Xu wrote:
> On Mon, Jan 06, 2020 at 06:06:34PM +0100, Jean-Philippe Brucker wrote:
> > On Fri, Dec 20, 2019 at 11:51:00AM -0500, Peter Xu wrote:
> > > On Fri, Dec 20, 2019 at 05:26:42PM +0100, Jean-Philippe Brucker wrote:
> > > > There is at the virtio transport level: the driver sets status to
> > > > FEATURES_OK once it accepted the feature bits, and to DRIVER_OK once its
> > > > fully operational. The virtio-iommu spec says:
> > > > 
> > > >   If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
> > > >   device SHOULD NOT let endpoints access the guest-physical address space.
> > > > 
> > > > So before features negotiation, there is no access. Afterwards it depends
> > > > if the VIRTIO_IOMMU_F_BYPASS has been accepted by the driver.
> > > 
> > > Before enabling virtio-iommu device, should we still let the devices
> > > to access the whole system address space?  I believe that's at least
> > > what Intel IOMMUs are doing.  From code-wise, its:
> > > 
> > >     if (likely(s->dmar_enabled)) {
> > >         success = vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn,
> > >                                          addr, flag & IOMMU_WO, &iotlb);
> > >     } else {
> > >         /* DMAR disabled, passthrough, use 4k-page*/
> > >         iotlb.iova = addr & VTD_PAGE_MASK_4K;
> > >         iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
> > >         iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
> > >         iotlb.perm = IOMMU_RW;
> > >         success = true;
> > >     }
> > > 
> > > From hardware-wise, an IOMMU should be close to transparent if you
> > > never enable it, imho.
> > 
> > For hardware that's not necessarily the best choice. As cited in my
> > previous reply it has been shown to introduce vulnerabilities since
> > malicious devices can DMA during boot, before the OS takes control of the
> > IOMMU. The Arm SMMU allows an implementation to adopt a deny policy by
> > default.
> 
> I see.  But then how to read a sector from the block to at least boot
> an OS if we use a default-deny policy?  Does it still need a mapping
> that is established somehow by someone before hand?

Yes, it looks like EDK II uses IOMMU operations in order to access those
devices on platforms where the IOMMU isn't default-bypass (AMD SEV support
is provided by edk2, and a VT-d driver seems provided by edk2-platforms).
However for OVMF we could just set the bypass feature bit in virtio-iommu
device, which doesn't even requires setting up the virtqueue.

I'm missing a piece of the puzzle for Arm platforms though, because it
looks like Trusted Firmware-A sets up the default-deny policy on reset
even when it wasn't hardwired, but doesn't provide a service to create
SMMUv3 mappings for the bootloader.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 01/20] migration: Support QLIST migration
  2019-11-27 11:46   ` Dr. David Alan Gilbert
@ 2020-01-08 13:19     ` Juan Quintela
  2020-01-08 13:40       ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Juan Quintela @ 2020-01-08 13:19 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, qemu-devel, peterx, armbru, Eric Auger,
	bharatb.linux, qemu-arm, eric.auger.pro

"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> * Eric Auger (eric.auger@redhat.com) wrote:
>> Support QLIST migration using the same principle as QTAILQ:
>> 94869d5c52 ("migration: migrate QTAILQ").
>> 
>> The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
>> The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD
>> and QLIST_RAW_REVERSE.
>> 
>> Tests also are provided.
>> 
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> 
>> +    while (qemu_get_byte(f)) {
>> +        elm = g_malloc(size);
>> +        ret = vmstate_load_state(f, vmsd, elm, version_id);
>> +        if (ret) {
>> +            error_report("%s: failed to load %s (%d)", field->name,
>> +                         vmsd->name, ret);
>> +            g_free(elm);
>> +            return ret;
>> +        }
>> +        QLIST_RAW_INSERT_HEAD(pv, elm, entry_offset);
>> +    }
>> +    QLIST_RAW_REVERSE(pv, elm, entry_offset);
>
> Can you explain why you need to do a REVERSE on the loaded list,
> rather than using doing a QLIST_INSERT_AFTER to always insert at
> the end?
>
> Other than that it looks good.

This was my fault (integrated as this is).

Old code had a "walk to the end of the list" and then insert.
I told it was way faster just to insert and the beggining and then
reverse.  I didn't noticed that we had the previous element to know
where to insert.

Eric, feel free to send a patch on top of this, or I will do it.

Later, Juan.



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 01/20] migration: Support QLIST migration
  2020-01-08 13:19     ` Juan Quintela
@ 2020-01-08 13:40       ` Auger Eric
  2020-01-08 13:51         ` Juan Quintela
  0 siblings, 1 reply; 89+ messages in thread
From: Auger Eric @ 2020-01-08 13:40 UTC (permalink / raw)
  To: quintela, Dr. David Alan Gilbert
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, qemu-devel, peterx, armbru, bharatb.linux,
	qemu-arm, eric.auger.pro

Hi Juan,

On 1/8/20 2:19 PM, Juan Quintela wrote:
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>> * Eric Auger (eric.auger@redhat.com) wrote:
>>> Support QLIST migration using the same principle as QTAILQ:
>>> 94869d5c52 ("migration: migrate QTAILQ").
>>>
>>> The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
>>> The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD
>>> and QLIST_RAW_REVERSE.
>>>
>>> Tests also are provided.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>
>>> +    while (qemu_get_byte(f)) {
>>> +        elm = g_malloc(size);
>>> +        ret = vmstate_load_state(f, vmsd, elm, version_id);
>>> +        if (ret) {
>>> +            error_report("%s: failed to load %s (%d)", field->name,
>>> +                         vmsd->name, ret);
>>> +            g_free(elm);
>>> +            return ret;
>>> +        }
>>> +        QLIST_RAW_INSERT_HEAD(pv, elm, entry_offset);
>>> +    }
>>> +    QLIST_RAW_REVERSE(pv, elm, entry_offset);
>>
>> Can you explain why you need to do a REVERSE on the loaded list,
>> rather than using doing a QLIST_INSERT_AFTER to always insert at
>> the end?
>>
>> Other than that it looks good.
> 
> This was my fault (integrated as this is).
> 
> Old code had a "walk to the end of the list" and then insert.
> I told it was way faster just to insert and the beggining and then
> reverse.  I didn't noticed that we had the previous element to know
> where to insert.

Not sure I get your comment. To insert at the end one needs to walk
though the list. The head has no prev pointer pointing to the tail as
opposed to the queue. So I understood Dave's comment as "just explain
why you prefered this solution against the QLIST_INSERT_AFTER alternative.

Thanks

Eric
> 
> Eric, feel free to send a patch on top of this, or I will do it.

> 
> Later, Juan.
> 
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 01/20] migration: Support QLIST migration
  2020-01-08 13:40       ` Auger Eric
@ 2020-01-08 13:51         ` Juan Quintela
  2020-01-08 14:02           ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Juan Quintela @ 2020-01-08 13:51 UTC (permalink / raw)
  To: Auger Eric
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, armbru, Dr. David Alan Gilbert, peterx,
	qemu-devel, bharatb.linux, qemu-arm, eric.auger.pro

Auger Eric <eric.auger@redhat.com> wrote:
> Hi Juan,
>
> On 1/8/20 2:19 PM, Juan Quintela wrote:
>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>> * Eric Auger (eric.auger@redhat.com) wrote:
>>>> Support QLIST migration using the same principle as QTAILQ:
>>>> 94869d5c52 ("migration: migrate QTAILQ").
>>>>
>>>> The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
>>>> The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD
>>>> and QLIST_RAW_REVERSE.
>>>>
>>>> Tests also are provided.
>>>>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> +    while (qemu_get_byte(f)) {
>>>> +        elm = g_malloc(size);
>>>> +        ret = vmstate_load_state(f, vmsd, elm, version_id);
>>>> +        if (ret) {
>>>> +            error_report("%s: failed to load %s (%d)", field->name,
>>>> +                         vmsd->name, ret);
>>>> +            g_free(elm);
>>>> +            return ret;
>>>> +        }
>>>> +        QLIST_RAW_INSERT_HEAD(pv, elm, entry_offset);
>>>> +    }
>>>> +    QLIST_RAW_REVERSE(pv, elm, entry_offset);
>>>
>>> Can you explain why you need to do a REVERSE on the loaded list,
>>> rather than using doing a QLIST_INSERT_AFTER to always insert at
>>> the end?
>>>
>>> Other than that it looks good.
>> 
>> This was my fault (integrated as this is).
>> 
>> Old code had a "walk to the end of the list" and then insert.
>> I told it was way faster just to insert and the beggining and then
>> reverse.  I didn't noticed that we had the previous element to know
>> where to insert.
>
> Not sure I get your comment. To insert at the end one needs to walk
> though the list. The head has no prev pointer pointing to the tail as
> opposed to the queue. So I understood Dave's comment as "just explain
> why you prefered this solution against the QLIST_INSERT_AFTER alternative.

You have the previous inserted element, so it is kind of easy O:-)

    prev = NULL;
    while (qemu_get_byte(f)) {
        elm = g_malloc(size);
        ret = vmstate_load_state(f, vmsd, elm, version_id);
        if (ret) {
            error_report("%s: failed to load %s (%d)", field->name,
                         vmsd->name, ret);
            g_free(elm);
            return ret;
        }
        if (!prev) {
            QLIST_RAW_INSERT_HEAD(pv, elm, entry_offset);
        } else {
            QLIST_RAW_INSERT_AFTER(prev, elm, entry_offset);
        }
        prev = elm;
    }

And yes, I realize that there is no QLIST_RAW_INSTERT_AFTER() (it is
QLIST_INSERT_AFTER).  And no, I haven't took the time to understand the
different between QLIST and QLIST_RAW.  From a quick look, it seems that
QLIST_RAW is embededed inside other structure.

But as said, we can move that to another patch.

Later, Juan.



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 01/20] migration: Support QLIST migration
  2020-01-08 13:51         ` Juan Quintela
@ 2020-01-08 14:02           ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2020-01-08 14:02 UTC (permalink / raw)
  To: quintela
  Cc: yang.zhong, peter.maydell, kevin.tian, qemu-devel, tnowicki, mst,
	jean-philippe.brucker, armbru, peterx, Dr. David Alan Gilbert,
	bharatb.linux, qemu-arm, eric.auger.pro

Hi Juan,

On 1/8/20 2:51 PM, Juan Quintela wrote:
> Auger Eric <eric.auger@redhat.com> wrote:
>> Hi Juan,
>>
>> On 1/8/20 2:19 PM, Juan Quintela wrote:
>>> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>>>> * Eric Auger (eric.auger@redhat.com) wrote:
>>>>> Support QLIST migration using the same principle as QTAILQ:
>>>>> 94869d5c52 ("migration: migrate QTAILQ").
>>>>>
>>>>> The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
>>>>> The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD
>>>>> and QLIST_RAW_REVERSE.
>>>>>
>>>>> Tests also are provided.
>>>>>
>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>
>>>>> +    while (qemu_get_byte(f)) {
>>>>> +        elm = g_malloc(size);
>>>>> +        ret = vmstate_load_state(f, vmsd, elm, version_id);
>>>>> +        if (ret) {
>>>>> +            error_report("%s: failed to load %s (%d)", field->name,
>>>>> +                         vmsd->name, ret);
>>>>> +            g_free(elm);
>>>>> +            return ret;
>>>>> +        }
>>>>> +        QLIST_RAW_INSERT_HEAD(pv, elm, entry_offset);
>>>>> +    }
>>>>> +    QLIST_RAW_REVERSE(pv, elm, entry_offset);
>>>>
>>>> Can you explain why you need to do a REVERSE on the loaded list,
>>>> rather than using doing a QLIST_INSERT_AFTER to always insert at
>>>> the end?
>>>>
>>>> Other than that it looks good.
>>>
>>> This was my fault (integrated as this is).
>>>
>>> Old code had a "walk to the end of the list" and then insert.
>>> I told it was way faster just to insert and the beggining and then
>>> reverse.  I didn't noticed that we had the previous element to know
>>> where to insert.
>>
>> Not sure I get your comment. To insert at the end one needs to walk
>> though the list. The head has no prev pointer pointing to the tail as
>> opposed to the queue. So I understood Dave's comment as "just explain
>> why you prefered this solution against the QLIST_INSERT_AFTER alternative.
> 
> You have the previous inserted element, so it is kind of easy O:-)
> 
>     prev = NULL;
>     while (qemu_get_byte(f)) {
>         elm = g_malloc(size);
>         ret = vmstate_load_state(f, vmsd, elm, version_id);
>         if (ret) {
>             error_report("%s: failed to load %s (%d)", field->name,
>                          vmsd->name, ret);
>             g_free(elm);
>             return ret;
>         }
>         if (!prev) {
>             QLIST_RAW_INSERT_HEAD(pv, elm, entry_offset);
>         } else {
>             QLIST_RAW_INSERT_AFTER(prev, elm, entry_offset);
>         }
>         prev = elm;
>     }
> 
> And yes, I realize that there is no QLIST_RAW_INSTERT_AFTER() (it is
> QLIST_INSERT_AFTER).  And no, I haven't took the time to understand the
> different between QLIST and QLIST_RAW.  From a quick look, it seems that
> QLIST_RAW is embededed inside other structure.

Ah OK I get it now. Yes indeed that looks simpler.

> 
> But as said, we can move that to another patch.

OK.

Thanks

Eric
> 
> Later, Juan.
> 
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2020-01-07 10:10                       ` Jean-Philippe Brucker
@ 2020-01-08 16:55                         ` Auger Eric
  2020-01-09  8:47                           ` Jean-Philippe Brucker
  0 siblings, 1 reply; 89+ messages in thread
From: Auger Eric @ 2020-01-08 16:55 UTC (permalink / raw)
  To: Jean-Philippe Brucker, Peter Xu
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Jean-Philippe, Peter,

On 1/7/20 11:10 AM, Jean-Philippe Brucker wrote:
> On Mon, Jan 06, 2020 at 12:58:50PM -0500, Peter Xu wrote:
>> On Mon, Jan 06, 2020 at 06:06:34PM +0100, Jean-Philippe Brucker wrote:
>>> On Fri, Dec 20, 2019 at 11:51:00AM -0500, Peter Xu wrote:
>>>> On Fri, Dec 20, 2019 at 05:26:42PM +0100, Jean-Philippe Brucker wrote:
>>>>> There is at the virtio transport level: the driver sets status to
>>>>> FEATURES_OK once it accepted the feature bits, and to DRIVER_OK once its
>>>>> fully operational. The virtio-iommu spec says:
>>>>>
>>>>>   If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
>>>>>   device SHOULD NOT let endpoints access the guest-physical address space.
>>>>>
>>>>> So before features negotiation, there is no access. Afterwards it depends
>>>>> if the VIRTIO_IOMMU_F_BYPASS has been accepted by the driver.
>>>>
>>>> Before enabling virtio-iommu device, should we still let the devices
>>>> to access the whole system address space?  I believe that's at least
>>>> what Intel IOMMUs are doing.  From code-wise, its:
>>>>
>>>>     if (likely(s->dmar_enabled)) {
>>>>         success = vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn,
>>>>                                          addr, flag & IOMMU_WO, &iotlb);
>>>>     } else {
>>>>         /* DMAR disabled, passthrough, use 4k-page*/
>>>>         iotlb.iova = addr & VTD_PAGE_MASK_4K;
>>>>         iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
>>>>         iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
>>>>         iotlb.perm = IOMMU_RW;
>>>>         success = true;
>>>>     }
>>>>
>>>> From hardware-wise, an IOMMU should be close to transparent if you
>>>> never enable it, imho.
>>>
>>> For hardware that's not necessarily the best choice. As cited in my
>>> previous reply it has been shown to introduce vulnerabilities since
>>> malicious devices can DMA during boot, before the OS takes control of the
>>> IOMMU. The Arm SMMU allows an implementation to adopt a deny policy by
>>> default.
>>
>> I see.  But then how to read a sector from the block to at least boot
>> an OS if we use a default-deny policy?  Does it still need a mapping
>> that is established somehow by someone before hand?
> 
> Yes, it looks like EDK II uses IOMMU operations in order to access those
> devices on platforms where the IOMMU isn't default-bypass (AMD SEV support
> is provided by edk2, and a VT-d driver seems provided by edk2-platforms).
> However for OVMF we could just set the bypass feature bit in virtio-iommu
> device, which doesn't even requires setting up the virtqueue.
> 
> I'm missing a piece of the puzzle for Arm platforms though, because it
> looks like Trusted Firmware-A sets up the default-deny policy on reset
> even when it wasn't hardwired, but doesn't provide a service to create
> SMMUv3 mappings for the bootloader.
> 
> Thanks,
> Jean
> 

I think we have a concrete example for the above discussion. The AHCI.
When running the virtio-iommu on x86 I get messages like:

virtio_iommu_translate sid=250 is not known!!
no buffer available in event queue to report event

and a bunch of "AHCI: Failed to start FIS receive engine: bad FIS
receive buffer address" messages (For each port)

This was reported in my cover letter (*). This happens very early in the
boot process, before the OS get the hand and before the virtio-iommu
driver creates any mapping. It does not prevent the guest from booting
though.

Currently the virtio-iommu device checks the VIRTIO_IOMMU_F_BYPASS. If I
overwrite it to true in the device, then, the guest boots without those
messages.

I share Peter's concern about having a different default policy than x86.

Thanks

Eric

Note the migration issue reported in the cover letter is fixed now and
was due to the migration priority unset.





^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2020-01-08 16:55                         ` Auger Eric
@ 2020-01-09  8:47                           ` Jean-Philippe Brucker
  2020-01-09  8:58                             ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2020-01-09  8:47 UTC (permalink / raw)
  To: Auger Eric
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, Peter Xu, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Wed, Jan 08, 2020 at 05:55:52PM +0100, Auger Eric wrote:
> Hi Jean-Philippe, Peter,
> 
> On 1/7/20 11:10 AM, Jean-Philippe Brucker wrote:
> > On Mon, Jan 06, 2020 at 12:58:50PM -0500, Peter Xu wrote:
> >> On Mon, Jan 06, 2020 at 06:06:34PM +0100, Jean-Philippe Brucker wrote:
> >>> On Fri, Dec 20, 2019 at 11:51:00AM -0500, Peter Xu wrote:
> >>>> On Fri, Dec 20, 2019 at 05:26:42PM +0100, Jean-Philippe Brucker wrote:
> >>>>> There is at the virtio transport level: the driver sets status to
> >>>>> FEATURES_OK once it accepted the feature bits, and to DRIVER_OK once its
> >>>>> fully operational. The virtio-iommu spec says:
> >>>>>
> >>>>>   If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
> >>>>>   device SHOULD NOT let endpoints access the guest-physical address space.
> >>>>>
> >>>>> So before features negotiation, there is no access. Afterwards it depends
> >>>>> if the VIRTIO_IOMMU_F_BYPASS has been accepted by the driver.
> >>>>
> >>>> Before enabling virtio-iommu device, should we still let the devices
> >>>> to access the whole system address space?  I believe that's at least
> >>>> what Intel IOMMUs are doing.  From code-wise, its:
> >>>>
> >>>>     if (likely(s->dmar_enabled)) {
> >>>>         success = vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn,
> >>>>                                          addr, flag & IOMMU_WO, &iotlb);
> >>>>     } else {
> >>>>         /* DMAR disabled, passthrough, use 4k-page*/
> >>>>         iotlb.iova = addr & VTD_PAGE_MASK_4K;
> >>>>         iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
> >>>>         iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
> >>>>         iotlb.perm = IOMMU_RW;
> >>>>         success = true;
> >>>>     }
> >>>>
> >>>> From hardware-wise, an IOMMU should be close to transparent if you
> >>>> never enable it, imho.
> >>>
> >>> For hardware that's not necessarily the best choice. As cited in my
> >>> previous reply it has been shown to introduce vulnerabilities since
> >>> malicious devices can DMA during boot, before the OS takes control of the
> >>> IOMMU. The Arm SMMU allows an implementation to adopt a deny policy by
> >>> default.
> >>
> >> I see.  But then how to read a sector from the block to at least boot
> >> an OS if we use a default-deny policy?  Does it still need a mapping
> >> that is established somehow by someone before hand?
> > 
> > Yes, it looks like EDK II uses IOMMU operations in order to access those
> > devices on platforms where the IOMMU isn't default-bypass (AMD SEV support
> > is provided by edk2, and a VT-d driver seems provided by edk2-platforms).
> > However for OVMF we could just set the bypass feature bit in virtio-iommu
> > device, which doesn't even requires setting up the virtqueue.
> > 
> > I'm missing a piece of the puzzle for Arm platforms though, because it
> > looks like Trusted Firmware-A sets up the default-deny policy on reset
> > even when it wasn't hardwired, but doesn't provide a service to create
> > SMMUv3 mappings for the bootloader.
> > 
> > Thanks,
> > Jean
> > 
> 
> I think we have a concrete example for the above discussion. The AHCI.
> When running the virtio-iommu on x86 I get messages like:
> 
> virtio_iommu_translate sid=250 is not known!!
> no buffer available in event queue to report event
> 
> and a bunch of "AHCI: Failed to start FIS receive engine: bad FIS
> receive buffer address" messages (For each port)
> 
> This was reported in my cover letter (*). This happens very early in the
> boot process, before the OS get the hand and before the virtio-iommu
> driver creates any mapping. It does not prevent the guest from booting
> though.
> 
> Currently the virtio-iommu device checks the VIRTIO_IOMMU_F_BYPASS. If I
> overwrite it to true in the device, then, the guest boots without those
> messages.

Oh that's good, I was afraid it was an issue in Linux.

> I share Peter's concern about having a different default policy than x86.

Yes I'd say just align with whatever policy is already in place. Do you
think we could add a command-line option to let people disable
default-bypass, though?  That would be a convenient way to make the IOMMU
protection airtight for those who need it.

Thanks,
Jean


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2020-01-09  8:47                           ` Jean-Philippe Brucker
@ 2020-01-09  8:58                             ` Auger Eric
  2020-01-09 10:40                               ` Jean-Philippe Brucker
  0 siblings, 1 reply; 89+ messages in thread
From: Auger Eric @ 2020-01-09  8:58 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, Peter Xu, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Jean,

On 1/9/20 9:47 AM, Jean-Philippe Brucker wrote:
> On Wed, Jan 08, 2020 at 05:55:52PM +0100, Auger Eric wrote:
>> Hi Jean-Philippe, Peter,
>>
>> On 1/7/20 11:10 AM, Jean-Philippe Brucker wrote:
>>> On Mon, Jan 06, 2020 at 12:58:50PM -0500, Peter Xu wrote:
>>>> On Mon, Jan 06, 2020 at 06:06:34PM +0100, Jean-Philippe Brucker wrote:
>>>>> On Fri, Dec 20, 2019 at 11:51:00AM -0500, Peter Xu wrote:
>>>>>> On Fri, Dec 20, 2019 at 05:26:42PM +0100, Jean-Philippe Brucker wrote:
>>>>>>> There is at the virtio transport level: the driver sets status to
>>>>>>> FEATURES_OK once it accepted the feature bits, and to DRIVER_OK once its
>>>>>>> fully operational. The virtio-iommu spec says:
>>>>>>>
>>>>>>>   If the driver does not accept the VIRTIO_IOMMU_F_BYPASS feature, the
>>>>>>>   device SHOULD NOT let endpoints access the guest-physical address space.
>>>>>>>
>>>>>>> So before features negotiation, there is no access. Afterwards it depends
>>>>>>> if the VIRTIO_IOMMU_F_BYPASS has been accepted by the driver.
>>>>>>
>>>>>> Before enabling virtio-iommu device, should we still let the devices
>>>>>> to access the whole system address space?  I believe that's at least
>>>>>> what Intel IOMMUs are doing.  From code-wise, its:
>>>>>>
>>>>>>     if (likely(s->dmar_enabled)) {
>>>>>>         success = vtd_do_iommu_translate(vtd_as, vtd_as->bus, vtd_as->devfn,
>>>>>>                                          addr, flag & IOMMU_WO, &iotlb);
>>>>>>     } else {
>>>>>>         /* DMAR disabled, passthrough, use 4k-page*/
>>>>>>         iotlb.iova = addr & VTD_PAGE_MASK_4K;
>>>>>>         iotlb.translated_addr = addr & VTD_PAGE_MASK_4K;
>>>>>>         iotlb.addr_mask = ~VTD_PAGE_MASK_4K;
>>>>>>         iotlb.perm = IOMMU_RW;
>>>>>>         success = true;
>>>>>>     }
>>>>>>
>>>>>> From hardware-wise, an IOMMU should be close to transparent if you
>>>>>> never enable it, imho.
>>>>>
>>>>> For hardware that's not necessarily the best choice. As cited in my
>>>>> previous reply it has been shown to introduce vulnerabilities since
>>>>> malicious devices can DMA during boot, before the OS takes control of the
>>>>> IOMMU. The Arm SMMU allows an implementation to adopt a deny policy by
>>>>> default.
>>>>
>>>> I see.  But then how to read a sector from the block to at least boot
>>>> an OS if we use a default-deny policy?  Does it still need a mapping
>>>> that is established somehow by someone before hand?
>>>
>>> Yes, it looks like EDK II uses IOMMU operations in order to access those
>>> devices on platforms where the IOMMU isn't default-bypass (AMD SEV support
>>> is provided by edk2, and a VT-d driver seems provided by edk2-platforms).
>>> However for OVMF we could just set the bypass feature bit in virtio-iommu
>>> device, which doesn't even requires setting up the virtqueue.
>>>
>>> I'm missing a piece of the puzzle for Arm platforms though, because it
>>> looks like Trusted Firmware-A sets up the default-deny policy on reset
>>> even when it wasn't hardwired, but doesn't provide a service to create
>>> SMMUv3 mappings for the bootloader.
>>>
>>> Thanks,
>>> Jean
>>>
>>
>> I think we have a concrete example for the above discussion. The AHCI.
>> When running the virtio-iommu on x86 I get messages like:
>>
>> virtio_iommu_translate sid=250 is not known!!
>> no buffer available in event queue to report event
>>
>> and a bunch of "AHCI: Failed to start FIS receive engine: bad FIS
>> receive buffer address" messages (For each port)
>>
>> This was reported in my cover letter (*). This happens very early in the
>> boot process, before the OS get the hand and before the virtio-iommu
>> driver creates any mapping. It does not prevent the guest from booting
>> though.
>>
>> Currently the virtio-iommu device checks the VIRTIO_IOMMU_F_BYPASS. If I
>> overwrite it to true in the device, then, the guest boots without those
>> messages.
> 
> Oh that's good, I was afraid it was an issue in Linux.
> 
>> I share Peter's concern about having a different default policy than x86.
> 
> Yes I'd say just align with whatever policy is already in place. Do you
> think we could add a command-line option to let people disable
> default-bypass, though?  That would be a convenient way to make the IOMMU
> protection airtight for those who need it.
Yes I could easily add a device option to disable the default bypass.

Shall we change the meaning of the F_BYPASS feature then? If exposed by
the device, the device does bypass by default, otherwise it doesn't.
This would be controlled by the device option.

The driver then could have means to overwrite this behavior once loaded?

Thanks

Eric
> 
> Thanks,
> Jean
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2020-01-09  8:58                             ` Auger Eric
@ 2020-01-09 10:40                               ` Jean-Philippe Brucker
  2020-01-09 11:01                                 ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2020-01-09 10:40 UTC (permalink / raw)
  To: Auger Eric
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, Peter Xu, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Thu, Jan 09, 2020 at 09:58:49AM +0100, Auger Eric wrote:
> >> I share Peter's concern about having a different default policy than x86.
> > 
> > Yes I'd say just align with whatever policy is already in place. Do you
> > think we could add a command-line option to let people disable
> > default-bypass, though?  That would be a convenient way to make the IOMMU
> > protection airtight for those who need it.
> Yes I could easily add a device option to disable the default bypass.
> 
> Shall we change the meaning of the F_BYPASS feature then? If exposed by
> the device, the device does bypass by default, otherwise it doesn't.
> This would be controlled by the device option.

For a device that doesn't do bypass by default, the driver wouldn't have
the ability to enable bypass (feature bit not offered, not negotiable).

> The driver then could have means to overwrite this behavior once loaded?

Let's keep the bypass feature bit for this. If the bit is offered, the
driver chooses to enable or disable it. If the bit is not offered or not
negotiated, then the behavior is deny. If the bit is offered and
negotiated then the behavior is allow.

We can say that before features negotiation (latched at features register
write, I think, in practice?) the behavior is platform dependent. The
current wording about bypass intends to discourage unsafe choices but
makes a strong statement only about the device behavior after features
negotiation. 

We could add a second feature bit specifically for the boot bypass
behavior. It wouldn't be useful to the OS (the driver doesn't have a
choice) but could present a bit in config space that allows a firmware to
disable boot-bypass in a way that is sticky across reset. So when the OS
resets the device after taking it over, it doesn't accidentally enable
bypass. I wouldn't bother though. If a FW/bootloader is able to support
virtio-iommu, the user might as well instantiate the device with the
default-deny option.

Thanks,
Jean



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2020-01-09 10:40                               ` Jean-Philippe Brucker
@ 2020-01-09 11:01                                 ` Auger Eric
  2020-01-09 11:15                                   ` Jean-Philippe Brucker
  0 siblings, 1 reply; 89+ messages in thread
From: Auger Eric @ 2020-01-09 11:01 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, Peter Xu, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi,

On 1/9/20 11:40 AM, Jean-Philippe Brucker wrote:
> On Thu, Jan 09, 2020 at 09:58:49AM +0100, Auger Eric wrote:
>>>> I share Peter's concern about having a different default policy than x86.
>>>
>>> Yes I'd say just align with whatever policy is already in place. Do you
>>> think we could add a command-line option to let people disable
>>> default-bypass, though?  That would be a convenient way to make the IOMMU
>>> protection airtight for those who need it.
>> Yes I could easily add a device option to disable the default bypass.
>>
>> Shall we change the meaning of the F_BYPASS feature then? If exposed by
>> the device, the device does bypass by default, otherwise it doesn't.
>> This would be controlled by the device option.
> 
> For a device that doesn't do bypass by default, the driver wouldn't have
> the ability to enable bypass (feature bit not offered, not negotiable).
> 
>> The driver then could have means to overwrite this behavior once loaded?
> 
> Let's keep the bypass feature bit for this. If the bit is offered, the
> driver chooses to enable or disable it. If the bit is not offered or not
> negotiated, then the behavior is deny. If the bit is offered and
> negotiated then the behavior is allow.
> 
> We can say that before features negotiation (latched at features register
> write, I think, in practice?) the behavior is platform dependent. The
> current wording about bypass intends to discourage unsafe choices but
> makes a strong statement only about the device behavior after features
> negotiation. 
OK. May be worth adding in the spec later.

By the way what is the plan for the vote?

Thanks

Eric
> 
> We could add a second feature bit specifically for the boot bypass
> behavior. It wouldn't be useful to the OS (the driver doesn't have a
> choice) but could present a bit in config space that allows a firmware to
> disable boot-bypass in a way that is sticky across reset. So when the OS
> resets the device after taking it over, it doesn't accidentally enable
> bypass. I wouldn't bother though. If a FW/bootloader is able to support
> virtio-iommu, the user might as well instantiate the device with the
> default-deny option.
> 
> Thanks,
> Jean
> 
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2020-01-09 11:01                                 ` Auger Eric
@ 2020-01-09 11:15                                   ` Jean-Philippe Brucker
  2020-01-09 11:32                                     ` Auger Eric
  0 siblings, 1 reply; 89+ messages in thread
From: Jean-Philippe Brucker @ 2020-01-09 11:15 UTC (permalink / raw)
  To: Auger Eric
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, Peter Xu, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

On Thu, Jan 09, 2020 at 12:01:26PM +0100, Auger Eric wrote:
> Hi,
> 
> On 1/9/20 11:40 AM, Jean-Philippe Brucker wrote:
> > On Thu, Jan 09, 2020 at 09:58:49AM +0100, Auger Eric wrote:
> >>>> I share Peter's concern about having a different default policy than x86.
> >>>
> >>> Yes I'd say just align with whatever policy is already in place. Do you
> >>> think we could add a command-line option to let people disable
> >>> default-bypass, though?  That would be a convenient way to make the IOMMU
> >>> protection airtight for those who need it.
> >> Yes I could easily add a device option to disable the default bypass.
> >>
> >> Shall we change the meaning of the F_BYPASS feature then? If exposed by
> >> the device, the device does bypass by default, otherwise it doesn't.
> >> This would be controlled by the device option.
> > 
> > For a device that doesn't do bypass by default, the driver wouldn't have
> > the ability to enable bypass (feature bit not offered, not negotiable).
> > 
> >> The driver then could have means to overwrite this behavior once loaded?
> > 
> > Let's keep the bypass feature bit for this. If the bit is offered, the
> > driver chooses to enable or disable it. If the bit is not offered or not
> > negotiated, then the behavior is deny. If the bit is offered and
> > negotiated then the behavior is allow.
> > 
> > We can say that before features negotiation (latched at features register
> > write, I think, in practice?) the behavior is platform dependent. The
> > current wording about bypass intends to discourage unsafe choices but
> > makes a strong statement only about the device behavior after features
> > negotiation. 
> OK. May be worth adding in the spec later.
> 
> By the way what is the plan for the vote?

The ballot closed and the spec is accepted for virtio-v1.2-cs01, with the
condition that the stale statement about padding is removed
(https://lists.oasis-open.org/archives/virtio-dev/201911/msg00083.html)

Thanks,
Jean


^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate
  2020-01-09 11:15                                   ` Jean-Philippe Brucker
@ 2020-01-09 11:32                                     ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2020-01-09 11:32 UTC (permalink / raw)
  To: Jean-Philippe Brucker
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, mst,
	jean-philippe.brucker, quintela, qemu-devel, Peter Xu, armbru,
	bharatb.linux, qemu-arm, dgilbert, eric.auger.pro

Hi Jean,

On 1/9/20 12:15 PM, Jean-Philippe Brucker wrote:
> On Thu, Jan 09, 2020 at 12:01:26PM +0100, Auger Eric wrote:
>> Hi,
>>
>> On 1/9/20 11:40 AM, Jean-Philippe Brucker wrote:
>>> On Thu, Jan 09, 2020 at 09:58:49AM +0100, Auger Eric wrote:
>>>>>> I share Peter's concern about having a different default policy than x86.
>>>>>
>>>>> Yes I'd say just align with whatever policy is already in place. Do you
>>>>> think we could add a command-line option to let people disable
>>>>> default-bypass, though?  That would be a convenient way to make the IOMMU
>>>>> protection airtight for those who need it.
>>>> Yes I could easily add a device option to disable the default bypass.
>>>>
>>>> Shall we change the meaning of the F_BYPASS feature then? If exposed by
>>>> the device, the device does bypass by default, otherwise it doesn't.
>>>> This would be controlled by the device option.
>>>
>>> For a device that doesn't do bypass by default, the driver wouldn't have
>>> the ability to enable bypass (feature bit not offered, not negotiable).
>>>
>>>> The driver then could have means to overwrite this behavior once loaded?
>>>
>>> Let's keep the bypass feature bit for this. If the bit is offered, the
>>> driver chooses to enable or disable it. If the bit is not offered or not
>>> negotiated, then the behavior is deny. If the bit is offered and
>>> negotiated then the behavior is allow.
>>>
>>> We can say that before features negotiation (latched at features register
>>> write, I think, in practice?) the behavior is platform dependent. The
>>> current wording about bypass intends to discourage unsafe choices but
>>> makes a strong statement only about the device behavior after features
>>> negotiation. 
>> OK. May be worth adding in the spec later.
>>
>> By the way what is the plan for the vote?
> 
> The ballot closed and the spec is accepted for virtio-v1.2-cs01, with the
> condition that the stale statement about padding is removed
> (https://lists.oasis-open.org/archives/virtio-dev/201911/msg00083.html)

Ah OK. Sorry I missed the outcome. Congratulations!

Eric
> 
> Thanks,
> Jean
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci
  2019-11-22 18:29 ` [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci Eric Auger
  2019-12-10 16:50   ` Jean-Philippe Brucker
@ 2020-01-09 12:02   ` Michael S. Tsirkin
  2020-01-09 13:34     ` Auger Eric
  1 sibling, 1 reply; 89+ messages in thread
From: Michael S. Tsirkin @ 2020-01-09 12:02 UTC (permalink / raw)
  To: Eric Auger
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, quintela,
	jean-philippe.brucker, qemu-devel, peterx, armbru, bharatb.linux,
	qemu-arm, dgilbert, eric.auger.pro

On Fri, Nov 22, 2019 at 07:29:42PM +0100, Eric Auger wrote:
> The virtio-iommu-pci is instantiated through the -device QEMU
> option. However if instantiated it also requires an IORT ACPI table
> to describe the ID mappings between the root complex and the iommu.
> 
> This patch adds the generation of the IORT table if the
> virtio-iommu-pci device is instantiated.
> 
> We also declare the [0xfee00000 - 0xfeefffff] MSI reserved region
> so that it gets bypassed by the IOMMU.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

So I'd like us to use virtio config space in preference to ACPI.

Guest bits for that are not ready yet, but I think it's
better to wait than maintaining both ACPI and config space forever
later.

Could you send a smaller patchset without pci/acpi bits for now?

> ---
>  hw/i386/acpi-build.c | 72 ++++++++++++++++++++++++++++++++++++++++++++
>  hw/i386/pc.c         | 15 ++++++++-
>  include/hw/i386/pc.h |  2 ++
>  3 files changed, 88 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> index 12ff55fcfb..f09cabdcae 100644
> --- a/hw/i386/acpi-build.c
> +++ b/hw/i386/acpi-build.c
> @@ -2744,6 +2744,72 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
>      return true;
>  }
>  
> +static inline void
> +fill_iort_idmap(AcpiIortIdMapping *idmap, int i,
> +                uint32_t input_base, uint32_t id_count,
> +                uint32_t output_base, uint32_t output_reference)
> +{
> +    idmap[i].input_base = cpu_to_le32(input_base);
> +    idmap[i].id_count = cpu_to_le32(id_count);
> +    idmap[i].output_base = cpu_to_le32(output_base);
> +    idmap[i].output_reference = cpu_to_le32(output_reference);
> +}
> +
> +static void
> +build_iort(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
> +{
> +    size_t iommu_node_size, rc_node_size, iommu_node_offset;
> +    int iort_start = table_data->len;
> +    AcpiIortPVIommuPCI *iommu;
> +    AcpiIortIdMapping *idmap;
> +    AcpiIortTable *iort;
> +    size_t iort_length;
> +    AcpiIortRC *rc;
> +
> +    iort = acpi_data_push(table_data, sizeof(*iort));
> +    iort_length = sizeof(*iort);
> +    iort->node_count = cpu_to_le32(2);
> +
> +    /* virtio-iommu node */
> +
> +    iommu_node_offset = sizeof(*iort);
> +    iort->node_offset = cpu_to_le32(iommu_node_offset);
> +    iommu_node_size = sizeof(*iommu);
> +    iort_length += iommu_node_offset;
> +    iommu = acpi_data_push(table_data, iommu_node_size);
> +    iommu->type = ACPI_IORT_NODE_PARAVIRT;
> +    iommu->length = cpu_to_le16(iommu_node_size);
> +    iommu->mapping_count = 0;
> +    iommu->devid = cpu_to_le32(pcms->virtio_iommu_bdf);
> +    iommu->model = cpu_to_le32(ACPI_IORT_NODE_PV_VIRTIO_IOMMU_PCI);
> +
> +    /* Root Complex Node */
> +    rc_node_size = sizeof(*rc) + 2 * sizeof(*idmap);
> +    iort_length += rc_node_size;
> +    rc = acpi_data_push(table_data, rc_node_size);
> +
> +    rc->type = ACPI_IORT_NODE_PCI_ROOT_COMPLEX;
> +    rc->length = cpu_to_le16(rc_node_size);
> +    rc->mapping_count = cpu_to_le32(2);
> +    rc->mapping_offset = cpu_to_le32(sizeof(*rc));
> +
> +    /* fully coherent device */
> +    rc->memory_properties.cache_coherency = cpu_to_le32(1);
> +    rc->memory_properties.memory_flags = 0x3; /* CCA = CPM = DCAS = 1 */
> +    rc->pci_segment_number = 0; /* MCFG pci_segment */
> +    fill_iort_idmap(rc->id_mapping_array, 0, 0, pcms->virtio_iommu_bdf, 0,
> +                    iommu_node_offset);
> +    fill_iort_idmap(rc->id_mapping_array, 1, pcms->virtio_iommu_bdf + 1,
> +                    0xFFFF - pcms->virtio_iommu_bdf,
> +                    pcms->virtio_iommu_bdf + 1, iommu_node_offset);
> +
> +    iort = (AcpiIortTable *)(table_data->data + iort_start);
> +    iort->length = cpu_to_le32(iort_length);
> +
> +    build_header(linker, table_data, (void *)(table_data->data + iort_start),
> +                 "IORT", table_data->len - iort_start, 0, NULL, NULL);
> +}
> +
>  static
>  void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>  {
> @@ -2835,6 +2901,12 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>              build_slit(tables_blob, tables->linker, machine);
>          }
>      }
> +
> +    if (pcms->virtio_iommu) {
> +        acpi_add_table(table_offsets, tables_blob);
> +        build_iort(tables_blob, tables->linker, pcms);
> +    }
> +
>      if (acpi_get_mcfg(&mcfg)) {
>          acpi_add_table(table_offsets, tables_blob);
>          build_mcfg(tables_blob, tables->linker, &mcfg);
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index ac08e63604..af984ee041 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -84,6 +84,7 @@
>  #include "hw/net/ne2000-isa.h"
>  #include "standard-headers/asm-x86/bootparam.h"
>  #include "hw/virtio/virtio-pmem-pci.h"
> +#include "hw/virtio/virtio-iommu.h"
>  #include "hw/mem/memory-device.h"
>  #include "sysemu/replay.h"
>  #include "qapi/qmp/qerror.h"
> @@ -1940,6 +1941,11 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>          pc_cpu_pre_plug(hotplug_dev, dev, errp);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
>          pc_virtio_pmem_pci_pre_plug(hotplug_dev, dev, errp);
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
> +        /* we declare a VIRTIO_IOMMU_RESV_MEM_T_MSI region */
> +        qdev_prop_set_uint32(dev, "len-reserved-regions", 1);
> +        qdev_prop_set_string(dev, "reserved-regions[0]",
> +                             "0xfee00000, 0xfeefffff, 1");
>      }
>  }
>  
> @@ -1952,6 +1958,12 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>          pc_cpu_plug(hotplug_dev, dev, errp);
>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
>          pc_virtio_pmem_pci_plug(hotplug_dev, dev, errp);
> +    } else if (object_dynamic_cast(OBJECT(dev), "virtio-iommu-pci")) {
> +        PCMachineState *pcms = PC_MACHINE(hotplug_dev);
> +        PCIDevice *pdev = PCI_DEVICE(dev);
> +
> +        pcms->virtio_iommu = true;
> +        pcms->virtio_iommu_bdf = pci_get_bdf(pdev);
>      }
>  }
>  
> @@ -1990,7 +2002,8 @@ static HotplugHandler *pc_get_hotplug_handler(MachineState *machine,
>  {
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
>          object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
> -        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
> +        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
> +        object_dynamic_cast(OBJECT(dev), "virtio-iommu-pci")) {
>          return HOTPLUG_HANDLER(machine);
>      }
>  
> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
> index 1f86eba3f9..221b4c6ef9 100644
> --- a/include/hw/i386/pc.h
> +++ b/include/hw/i386/pc.h
> @@ -49,6 +49,8 @@ struct PCMachineState {
>      bool smbus_enabled;
>      bool sata_enabled;
>      bool pit_enabled;
> +    bool virtio_iommu;
> +    uint16_t virtio_iommu_bdf;
>  
>      /* NUMA information: */
>      uint64_t numa_nodes;
> -- 
> 2.20.1



^ permalink raw reply	[flat|nested] 89+ messages in thread

* Re: [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci
  2020-01-09 12:02   ` Michael S. Tsirkin
@ 2020-01-09 13:34     ` Auger Eric
  0 siblings, 0 replies; 89+ messages in thread
From: Auger Eric @ 2020-01-09 13:34 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: yang.zhong, peter.maydell, kevin.tian, tnowicki, quintela,
	jean-philippe.brucker, qemu-devel, peterx, armbru, bharatb.linux,
	qemu-arm, dgilbert, eric.auger.pro

Hi Michael,

On 1/9/20 1:02 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 22, 2019 at 07:29:42PM +0100, Eric Auger wrote:
>> The virtio-iommu-pci is instantiated through the -device QEMU
>> option. However if instantiated it also requires an IORT ACPI table
>> to describe the ID mappings between the root complex and the iommu.
>>
>> This patch adds the generation of the IORT table if the
>> virtio-iommu-pci device is instantiated.
>>
>> We also declare the [0xfee00000 - 0xfeefffff] MSI reserved region
>> so that it gets bypassed by the IOMMU.
>>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> 
> So I'd like us to use virtio config space in preference to ACPI.
> 
> Guest bits for that are not ready yet, but I think it's
> better to wait than maintaining both ACPI and config space forever
> later.
> 
> Could you send a smaller patchset without pci/acpi bits for now?

Yes I am about to send v12.

Indeed I hope the DT integration in ARM virt can land in 5.0. All the
kernel dependencies are resolved and it complies with the voted spec.

Then I will push the non DT integration when Jean's series get
stabilized and spec updated.

Thanks

Eric
> 
>> ---
>>  hw/i386/acpi-build.c | 72 ++++++++++++++++++++++++++++++++++++++++++++
>>  hw/i386/pc.c         | 15 ++++++++-
>>  include/hw/i386/pc.h |  2 ++
>>  3 files changed, 88 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
>> index 12ff55fcfb..f09cabdcae 100644
>> --- a/hw/i386/acpi-build.c
>> +++ b/hw/i386/acpi-build.c
>> @@ -2744,6 +2744,72 @@ static bool acpi_get_mcfg(AcpiMcfgInfo *mcfg)
>>      return true;
>>  }
>>  
>> +static inline void
>> +fill_iort_idmap(AcpiIortIdMapping *idmap, int i,
>> +                uint32_t input_base, uint32_t id_count,
>> +                uint32_t output_base, uint32_t output_reference)
>> +{
>> +    idmap[i].input_base = cpu_to_le32(input_base);
>> +    idmap[i].id_count = cpu_to_le32(id_count);
>> +    idmap[i].output_base = cpu_to_le32(output_base);
>> +    idmap[i].output_reference = cpu_to_le32(output_reference);
>> +}
>> +
>> +static void
>> +build_iort(GArray *table_data, BIOSLinker *linker, PCMachineState *pcms)
>> +{
>> +    size_t iommu_node_size, rc_node_size, iommu_node_offset;
>> +    int iort_start = table_data->len;
>> +    AcpiIortPVIommuPCI *iommu;
>> +    AcpiIortIdMapping *idmap;
>> +    AcpiIortTable *iort;
>> +    size_t iort_length;
>> +    AcpiIortRC *rc;
>> +
>> +    iort = acpi_data_push(table_data, sizeof(*iort));
>> +    iort_length = sizeof(*iort);
>> +    iort->node_count = cpu_to_le32(2);
>> +
>> +    /* virtio-iommu node */
>> +
>> +    iommu_node_offset = sizeof(*iort);
>> +    iort->node_offset = cpu_to_le32(iommu_node_offset);
>> +    iommu_node_size = sizeof(*iommu);
>> +    iort_length += iommu_node_offset;
>> +    iommu = acpi_data_push(table_data, iommu_node_size);
>> +    iommu->type = ACPI_IORT_NODE_PARAVIRT;
>> +    iommu->length = cpu_to_le16(iommu_node_size);
>> +    iommu->mapping_count = 0;
>> +    iommu->devid = cpu_to_le32(pcms->virtio_iommu_bdf);
>> +    iommu->model = cpu_to_le32(ACPI_IORT_NODE_PV_VIRTIO_IOMMU_PCI);
>> +
>> +    /* Root Complex Node */
>> +    rc_node_size = sizeof(*rc) + 2 * sizeof(*idmap);
>> +    iort_length += rc_node_size;
>> +    rc = acpi_data_push(table_data, rc_node_size);
>> +
>> +    rc->type = ACPI_IORT_NODE_PCI_ROOT_COMPLEX;
>> +    rc->length = cpu_to_le16(rc_node_size);
>> +    rc->mapping_count = cpu_to_le32(2);
>> +    rc->mapping_offset = cpu_to_le32(sizeof(*rc));
>> +
>> +    /* fully coherent device */
>> +    rc->memory_properties.cache_coherency = cpu_to_le32(1);
>> +    rc->memory_properties.memory_flags = 0x3; /* CCA = CPM = DCAS = 1 */
>> +    rc->pci_segment_number = 0; /* MCFG pci_segment */
>> +    fill_iort_idmap(rc->id_mapping_array, 0, 0, pcms->virtio_iommu_bdf, 0,
>> +                    iommu_node_offset);
>> +    fill_iort_idmap(rc->id_mapping_array, 1, pcms->virtio_iommu_bdf + 1,
>> +                    0xFFFF - pcms->virtio_iommu_bdf,
>> +                    pcms->virtio_iommu_bdf + 1, iommu_node_offset);
>> +
>> +    iort = (AcpiIortTable *)(table_data->data + iort_start);
>> +    iort->length = cpu_to_le32(iort_length);
>> +
>> +    build_header(linker, table_data, (void *)(table_data->data + iort_start),
>> +                 "IORT", table_data->len - iort_start, 0, NULL, NULL);
>> +}
>> +
>>  static
>>  void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>  {
>> @@ -2835,6 +2901,12 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
>>              build_slit(tables_blob, tables->linker, machine);
>>          }
>>      }
>> +
>> +    if (pcms->virtio_iommu) {
>> +        acpi_add_table(table_offsets, tables_blob);
>> +        build_iort(tables_blob, tables->linker, pcms);
>> +    }
>> +
>>      if (acpi_get_mcfg(&mcfg)) {
>>          acpi_add_table(table_offsets, tables_blob);
>>          build_mcfg(tables_blob, tables->linker, &mcfg);
>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>> index ac08e63604..af984ee041 100644
>> --- a/hw/i386/pc.c
>> +++ b/hw/i386/pc.c
>> @@ -84,6 +84,7 @@
>>  #include "hw/net/ne2000-isa.h"
>>  #include "standard-headers/asm-x86/bootparam.h"
>>  #include "hw/virtio/virtio-pmem-pci.h"
>> +#include "hw/virtio/virtio-iommu.h"
>>  #include "hw/mem/memory-device.h"
>>  #include "sysemu/replay.h"
>>  #include "qapi/qmp/qerror.h"
>> @@ -1940,6 +1941,11 @@ static void pc_machine_device_pre_plug_cb(HotplugHandler *hotplug_dev,
>>          pc_cpu_pre_plug(hotplug_dev, dev, errp);
>>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
>>          pc_virtio_pmem_pci_pre_plug(hotplug_dev, dev, errp);
>> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_IOMMU_PCI)) {
>> +        /* we declare a VIRTIO_IOMMU_RESV_MEM_T_MSI region */
>> +        qdev_prop_set_uint32(dev, "len-reserved-regions", 1);
>> +        qdev_prop_set_string(dev, "reserved-regions[0]",
>> +                             "0xfee00000, 0xfeefffff, 1");
>>      }
>>  }
>>  
>> @@ -1952,6 +1958,12 @@ static void pc_machine_device_plug_cb(HotplugHandler *hotplug_dev,
>>          pc_cpu_plug(hotplug_dev, dev, errp);
>>      } else if (object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
>>          pc_virtio_pmem_pci_plug(hotplug_dev, dev, errp);
>> +    } else if (object_dynamic_cast(OBJECT(dev), "virtio-iommu-pci")) {
>> +        PCMachineState *pcms = PC_MACHINE(hotplug_dev);
>> +        PCIDevice *pdev = PCI_DEVICE(dev);
>> +
>> +        pcms->virtio_iommu = true;
>> +        pcms->virtio_iommu_bdf = pci_get_bdf(pdev);
>>      }
>>  }
>>  
>> @@ -1990,7 +2002,8 @@ static HotplugHandler *pc_get_hotplug_handler(MachineState *machine,
>>  {
>>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) ||
>>          object_dynamic_cast(OBJECT(dev), TYPE_CPU) ||
>> -        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI)) {
>> +        object_dynamic_cast(OBJECT(dev), TYPE_VIRTIO_PMEM_PCI) ||
>> +        object_dynamic_cast(OBJECT(dev), "virtio-iommu-pci")) {
>>          return HOTPLUG_HANDLER(machine);
>>      }
>>  
>> diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
>> index 1f86eba3f9..221b4c6ef9 100644
>> --- a/include/hw/i386/pc.h
>> +++ b/include/hw/i386/pc.h
>> @@ -49,6 +49,8 @@ struct PCMachineState {
>>      bool smbus_enabled;
>>      bool sata_enabled;
>>      bool pit_enabled;
>> +    bool virtio_iommu;
>> +    uint16_t virtio_iommu_bdf;
>>  
>>      /* NUMA information: */
>>      uint64_t numa_nodes;
>> -- 
>> 2.20.1
> 



^ permalink raw reply	[flat|nested] 89+ messages in thread

end of thread, other threads:[~2020-01-09 13:38 UTC | newest]

Thread overview: 89+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-22 18:29 [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device Eric Auger
2019-11-22 18:29 ` [PATCH for-5.0 v11 01/20] migration: Support QLIST migration Eric Auger
2019-11-27 11:46   ` Dr. David Alan Gilbert
2020-01-08 13:19     ` Juan Quintela
2020-01-08 13:40       ` Auger Eric
2020-01-08 13:51         ` Juan Quintela
2020-01-08 14:02           ` Auger Eric
2019-11-22 18:29 ` [PATCH for-5.0 v11 02/20] virtio-iommu: Add skeleton Eric Auger
2019-12-10 16:31   ` Jean-Philippe Brucker
2019-12-19 10:31     ` Auger Eric
2019-11-22 18:29 ` [PATCH for-5.0 v11 03/20] virtio-iommu: Decode the command payload Eric Auger
2019-12-10 16:32   ` Jean-Philippe Brucker
2019-12-10 19:14   ` Peter Xu
2019-11-22 18:29 ` [PATCH for-5.0 v11 04/20] virtio-iommu: Add the iommu regions Eric Auger
2019-12-10 16:34   ` Jean-Philippe Brucker
2019-12-19 18:11     ` Auger Eric
2019-12-10 19:18   ` Peter Xu
2019-11-22 18:29 ` [PATCH for-5.0 v11 05/20] virtio-iommu: Endpoint and domains structs and helpers Eric Auger
2019-12-10 16:37   ` Jean-Philippe Brucker
2019-12-19 18:31     ` Auger Eric
2019-12-20 17:00       ` Jean-Philippe Brucker
2019-12-23  9:11         ` Auger Eric
2019-11-22 18:29 ` [PATCH for-5.0 v11 06/20] virtio-iommu: Implement attach/detach command Eric Auger
2019-12-10 16:41   ` Jean-Philippe Brucker
2019-12-23  9:14     ` Auger Eric
2019-11-22 18:29 ` [PATCH for-5.0 v11 07/20] virtio-iommu: Implement map/unmap Eric Auger
2019-12-10 16:43   ` Jean-Philippe Brucker
2019-12-23  9:42     ` Auger Eric
2019-11-22 18:29 ` [PATCH for-5.0 v11 08/20] virtio-iommu: Implement translate Eric Auger
2019-12-10 16:43   ` Jean-Philippe Brucker
2019-12-10 19:33   ` Peter Xu
2019-12-19 10:30     ` Auger Eric
2019-12-19 13:33       ` Peter Xu
2019-12-19 14:38         ` Auger Eric
2019-12-19 14:49           ` Peter Xu
2019-12-19 15:09             ` Auger Eric
2019-12-20 16:26               ` Jean-Philippe Brucker
2019-12-20 16:51                 ` Peter Xu
2020-01-06 17:06                   ` Jean-Philippe Brucker
2020-01-06 17:58                     ` Peter Xu
2020-01-07 10:10                       ` Jean-Philippe Brucker
2020-01-08 16:55                         ` Auger Eric
2020-01-09  8:47                           ` Jean-Philippe Brucker
2020-01-09  8:58                             ` Auger Eric
2020-01-09 10:40                               ` Jean-Philippe Brucker
2020-01-09 11:01                                 ` Auger Eric
2020-01-09 11:15                                   ` Jean-Philippe Brucker
2020-01-09 11:32                                     ` Auger Eric
2019-11-22 18:29 ` [PATCH for-5.0 v11 09/20] virtio-iommu: Implement fault reporting Eric Auger
2019-12-10 16:44   ` Jean-Philippe Brucker
2019-11-22 18:29 ` [PATCH for-5.0 v11 10/20] virtio-iommu-pci: Add virtio iommu pci support Eric Auger
2019-12-10 16:44   ` Jean-Philippe Brucker
2019-11-22 18:29 ` [PATCH for-5.0 v11 11/20] hw/arm/virt: Add the virtio-iommu device tree mappings Eric Auger
2019-12-10 16:45   ` Jean-Philippe Brucker
2019-11-22 18:29 ` [PATCH for-5.0 v11 12/20] qapi: Introduce DEFINE_PROP_INTERVAL Eric Auger
2019-11-22 19:03   ` Dr. David Alan Gilbert
2019-11-25 13:12     ` Auger Eric
2019-12-12 12:17   ` Markus Armbruster
2019-12-12 15:13     ` Auger Eric
2019-12-13 10:03       ` Markus Armbruster
2019-11-22 18:29 ` [PATCH for-5.0 v11 13/20] virtio-iommu: Implement probe request Eric Auger
2019-12-10 16:46   ` Jean-Philippe Brucker
2019-12-10 19:36   ` Peter Xu
2019-11-22 18:29 ` [PATCH for-5.0 v11 14/20] virtio-iommu: Handle reserved regions in the translation process Eric Auger
2019-12-10 16:46   ` Jean-Philippe Brucker
2019-12-10 19:39   ` Peter Xu
2019-11-22 18:29 ` [PATCH for-5.0 v11 15/20] virtio-iommu-pci: Add array of Interval properties Eric Auger
2019-12-10 16:47   ` Jean-Philippe Brucker
2019-11-22 18:29 ` [PATCH for-5.0 v11 16/20] hw/arm/virt-acpi-build: Introduce fill_iort_idmap helper Eric Auger
2019-12-10 16:47   ` Jean-Philippe Brucker
2019-11-22 18:29 ` [PATCH for-5.0 v11 17/20] hw/arm/virt-acpi-build: Add virtio-iommu node in IORT table Eric Auger
2019-12-10 16:48   ` Jean-Philippe Brucker
2019-11-22 18:29 ` [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration Eric Auger
2019-11-27 12:06   ` Dr. David Alan Gilbert
2019-12-10 16:50   ` Jean-Philippe Brucker
2019-12-19 11:03     ` Auger Eric
2019-12-10 20:01   ` Peter Xu
2019-12-24  7:39     ` Auger Eric
2019-11-22 18:29 ` [PATCH for-5.0 v11 19/20] pc: Add support for virtio-iommu-pci Eric Auger
2019-12-10 16:50   ` Jean-Philippe Brucker
2019-12-24  7:39     ` Auger Eric
2020-01-09 12:02   ` Michael S. Tsirkin
2020-01-09 13:34     ` Auger Eric
2019-11-22 18:29 ` [PATCH for-5.0 v11 20/20] tests: Add virtio-iommu test Eric Auger
2019-11-22 21:56 ` [PATCH for-5.0 v11 00/20] VIRTIO-IOMMU device no-reply
2019-12-11 16:40 ` Michael S. Tsirkin
2019-12-11 16:48   ` Auger Eric
2019-12-11 20:40     ` Michael S. Tsirkin
2019-12-12 15:05       ` Auger Eric

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.