All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v7 00/17] vfio-user server in QEMU
@ 2022-03-25 19:19 Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 01/17] tests/avocado: Specify target VM argument to helper routines Jagannathan Raman
                   ` (16 more replies)
  0 siblings, 17 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Hi,

This is v7 of the server side changes to enable vfio-user in QEMU.

Thank you very much for reviewing the last revision of this series!

In this revision, we've dropped the patches concerning migration
due to a major changes in the VFIO PCI protocol. We are changing the
client, library and server to conform with the updated spec, and
will send the migration patches for review separately.

The review of client side changes are happening parallelly [1]. The
avocado test in Patch 17 of this series depends on the client, as
such it has to wait until client changes are available. However,
we run the avocado test, among others, to confirm that the server
is working as expected - we do it before submitting patches for
review each time. The following repo applies this series on the
latest client, in case anyone would like to run vfio-user.
repo: https://github.com/oracle/qemu
branch: vfio-user-client-v7server
launcher: scripts/vfiouser-launcher.py
[1]: https://lore.kernel.org/all/6d8ae21cbade8f4bb7eaca4da29e57f0cb1a03f3.1641584317.git.john.g.johnson@oracle.com/T/

We've made the following changes in this revision:

[PATCH v7 02/17] qdev: unplug blocker for devices
  - corrects comments to prevent creation of new section for
    unplug blocker
  - adds an assert to device_finalize() to confirm that device
    does not have unplug blockers
  - moves the unplug blocker functions to hw/core/qdev.c
  - moves test for unplug blocker to qdev_unplug() from qmp_device_del()

[PATCH v7 05/17] configure: require cmake 3.19 or newer
  - new in this series

[PATCH v7 06/17] vfio-user: build library
  - configure script sets cmake_required flag for vfio user

[PATCH v7 07/17] vfio-user: define vfio-user-server object
  - adds auto-shutdown sub-option to the remote machine
  - adds boolean auto_shutdown to TYPE_REMOTE_MACHINE's class
  - adds vfu_object_auto_shutdown() helper function to
    vfio-user-obj.c to query the auto-shutdown property
    from the machine
  - reworks VFU_OBJECT_ERROR() & vfu_object_finalize() to use
    the helper function above.
  - updates QEMU version to 7.1 in commentary for VfioUserServerProperties

[PATCH v7 08/17] vfio-user: instantiate vfio-user context
  - moves phase_check() after the check for machine type
    in vfu_object_init()
  - sets o->vfu_ctx to NULL in vfu_object_finalize()

[PATCH v7 09/17] vfio-user: find and init PCI device
  - holds a reference to attached device in the server,
    and unrefs it during cleanup

[PATCH v7 10/17] vfio-user: run vfio-user context
  - updates QEMU version to 7.1 in commentary for VFU_CLIENT_HANGUP

[PATCH v7 11/17] vfio-user: handle PCI config space accesses
  - adds a comment explaining that writes to BAR register in
    config space doesn't create conflicting memory regions

[PATCH v7 12/17] vfio-user: IOMMU support for remote device
  - passes IOMMU table as an opaque data to pci_setup_iommu()
  - adds locking to access the table to enable concurrent access
  - retains the global data structure as remote_iommu_del_device()
    needs it because it doesn't have access to the opaque data
  - removes redundant check to confirm if hash table is present
    in remote_iommu_find_add_as()
  - adds remote_iommu_del_elem() to free IOMMU entry when it
    is removed from the table

[PATCH v7 14/17] vfio-user: handle PCI BAR accesses
  - adjusts the 'offset' in vfu_object_bar_rw() - MemoryRegion
    returned by memory_region_find() could be a subregion of the
    root memory region referenced by pci_dev->io_regions[pci_bar].memory.
    'offset' input argument is relative to the root region, whereas it
    must be relative to the subregion before access.
  - adds warning for out-of-range access

[PATCH v7 15/17] vfio-user: handle device interrupts
  - moves msi_nonbroken initialization to hw/remote/machine.c
  - adds irq_opaque to PCIDevice which the interrupt notification
    could use; drops global hash table which map device to
    vfio-user context
  - removes NULL function pointer test in msi_notify and msix_notify
  - vfu_object_msi_notify() asserts that IRQ vector is allocated

[PATCH v7 16/17] vfio-user: handle reset of remote device
  - adds comment to explain lost connection handling

[PATCH v7 17/17] vfio-user: avocado tests for vfio-user
  - drops the migration test

Dropped the following patches:
configure, meson: override C compiler for cmake
softmmu/vl: defer backend init
vfio-user: register handlers to facilitate migration

We are looking forward to your comments.

Thank you very much!

Jagannathan Raman (17):
  tests/avocado: Specify target VM argument to helper routines
  qdev: unplug blocker for devices
  remote/machine: add HotplugHandler for remote machine
  remote/machine: add vfio-user property
  configure: require cmake 3.19 or newer
  vfio-user: build library
  vfio-user: define vfio-user-server object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: IOMMU support for remote device
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  vfio-user: handle reset of remote device
  vfio-user: avocado tests for vfio-user

 configure                                  |  36 +-
 meson.build                                |  44 +-
 qapi/misc.json                             |  23 +
 qapi/qom.json                              |  20 +-
 include/exec/memory.h                      |   3 +
 include/hw/pci/pci.h                       |  10 +
 include/hw/qdev-core.h                     |  29 +
 include/hw/remote/iommu.h                  |  18 +
 include/hw/remote/machine.h                |  10 +-
 include/hw/remote/vfio-user-obj.h          |   6 +
 hw/core/qdev.c                             |  24 +
 hw/pci/msi.c                               |  11 +-
 hw/pci/msix.c                              |  10 +-
 hw/remote/iommu.c                          |  95 +++
 hw/remote/machine.c                        |  73 +-
 hw/remote/vfio-user-obj.c                  | 847 +++++++++++++++++++++
 softmmu/physmem.c                          |   4 +-
 softmmu/qdev-monitor.c                     |   4 +
 stubs/vfio-user-obj.c                      |   6 +
 tests/qtest/fuzz/generic_fuzz.c            |   9 +-
 .gitlab-ci.d/buildtest.yml                 |   2 +
 .gitmodules                                |   3 +
 Kconfig.host                               |   4 +
 MAINTAINERS                                |   6 +
 hw/remote/Kconfig                          |   4 +
 hw/remote/meson.build                      |   4 +
 hw/remote/trace-events                     |  11 +
 meson_options.txt                          |   3 +
 stubs/meson.build                          |   1 +
 subprojects/libvfio-user                   |   1 +
 tests/avocado/avocado_qemu/__init__.py     |  14 +-
 tests/avocado/vfio-user.py                 | 164 ++++
 tests/docker/dockerfiles/centos8.docker    |   2 +
 tests/docker/dockerfiles/ubuntu2004.docker |   2 +
 34 files changed, 1483 insertions(+), 20 deletions(-)
 create mode 100644 include/hw/remote/iommu.h
 create mode 100644 include/hw/remote/vfio-user-obj.h
 create mode 100644 hw/remote/iommu.c
 create mode 100644 hw/remote/vfio-user-obj.c
 create mode 100644 stubs/vfio-user-obj.c
 create mode 160000 subprojects/libvfio-user
 create mode 100644 tests/avocado/vfio-user.py

-- 
2.20.1



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v7 01/17] tests/avocado: Specify target VM argument to helper routines
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 02/17] qdev: unplug blocker for devices Jagannathan Raman
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Specify target VM for exec_command and
exec_command_and_wait_for_pattern routines

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Beraldo Leal <bleal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/avocado/avocado_qemu/__init__.py | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
index ac85e36a4d..18a34a798c 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -198,7 +198,7 @@ def wait_for_console_pattern(test, success_message, failure_message=None,
     """
     _console_interaction(test, success_message, failure_message, None, vm=vm)
 
-def exec_command(test, command):
+def exec_command(test, command, vm=None):
     """
     Send a command to a console (appending CRLF characters), while logging
     the content.
@@ -207,11 +207,14 @@ def exec_command(test, command):
     :type test: :class:`avocado_qemu.QemuSystemTest`
     :param command: the command to send
     :type command: str
+    :param vm: target vm
+    :type vm: :class:`qemu.machine.QEMUMachine`
     """
-    _console_interaction(test, None, None, command + '\r')
+    _console_interaction(test, None, None, command + '\r', vm=vm)
 
 def exec_command_and_wait_for_pattern(test, command,
-                                      success_message, failure_message=None):
+                                      success_message, failure_message=None,
+                                      vm=None):
     """
     Send a command to a console (appending CRLF characters), then wait
     for success_message to appear on the console, while logging the.
@@ -223,8 +226,11 @@ def exec_command_and_wait_for_pattern(test, command,
     :param command: the command to send
     :param success_message: if this message appears, test succeeds
     :param failure_message: if this message appears, test fails
+    :param vm: target vm
+    :type vm: :class:`qemu.machine.QEMUMachine`
     """
-    _console_interaction(test, success_message, failure_message, command + '\r')
+    _console_interaction(test, success_message, failure_message, command + '\r',
+                         vm=vm)
 
 class QemuBaseTest(avocado.Test):
     def _get_unique_tag_val(self, tag_name):
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 02/17] qdev: unplug blocker for devices
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 01/17] tests/avocado: Specify target VM argument to helper routines Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-29 10:08   ` Stefan Hajnoczi
  2022-03-25 19:19 ` [PATCH v7 03/17] remote/machine: add HotplugHandler for remote machine Jagannathan Raman
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Add blocker to prevent hot-unplug of devices

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 include/hw/qdev-core.h | 29 +++++++++++++++++++++++++++++
 hw/core/qdev.c         | 24 ++++++++++++++++++++++++
 softmmu/qdev-monitor.c |  4 ++++
 3 files changed, 57 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 92c3d65208..1b9fa25e5c 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -193,6 +193,7 @@ struct DeviceState {
     int instance_id_alias;
     int alias_required_for_version;
     ResettableState reset;
+    GSList *unplug_blockers;
 };
 
 struct DeviceListener {
@@ -419,6 +420,34 @@ void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
 void qdev_machine_creation_done(void);
 bool qdev_machine_modified(void);
 
+/*
+ * qdev_add_unplug_blocker: Adds an unplug blocker to a device
+ *
+ * @dev: Device to be blocked from unplug
+ * @reason: Reason for blocking
+ */
+void qdev_add_unplug_blocker(DeviceState *dev, Error *reason);
+
+/*
+ * qdev_del_unplug_blocker: Removes an unplug blocker from a device
+ *
+ * @dev: Device to be unblocked
+ * @reason: Pointer to the Error used with qdev_add_unplug_blocker.
+ *          Used as a handle to lookup the blocker for deletion.
+ */
+void qdev_del_unplug_blocker(DeviceState *dev, Error *reason);
+
+/*
+ * qdev_unplug_blocked: Confirms if a device is blocked from unplug
+ *
+ * @dev: Device to be tested
+ * @reason: Returns one of the reasons why the device is blocked,
+ *          if any
+ *
+ * Returns: true if device is blocked from unplug, false otherwise
+ */
+bool qdev_unplug_blocked(DeviceState *dev, Error **errp);
+
 /**
  * GpioPolarity: Polarity of a GPIO line
  *
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 84f3019440..0806d8fcaa 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -468,6 +468,28 @@ char *qdev_get_dev_path(DeviceState *dev)
     return NULL;
 }
 
+void qdev_add_unplug_blocker(DeviceState *dev, Error *reason)
+{
+    dev->unplug_blockers = g_slist_prepend(dev->unplug_blockers, reason);
+}
+
+void qdev_del_unplug_blocker(DeviceState *dev, Error *reason)
+{
+    dev->unplug_blockers = g_slist_remove(dev->unplug_blockers, reason);
+}
+
+bool qdev_unplug_blocked(DeviceState *dev, Error **errp)
+{
+    ERRP_GUARD();
+
+    if (dev->unplug_blockers) {
+        error_propagate(errp, error_copy(dev->unplug_blockers->data));
+        return true;
+    }
+
+    return false;
+}
+
 static bool device_get_realized(Object *obj, Error **errp)
 {
     DeviceState *dev = DEVICE(obj);
@@ -704,6 +726,8 @@ static void device_finalize(Object *obj)
 
     DeviceState *dev = DEVICE(obj);
 
+    g_assert(!dev->unplug_blockers);
+
     QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
         QLIST_REMOVE(ngl, node);
         qemu_free_irqs(ngl->in, ngl->num_in);
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 12fe60c467..9cfd59d17c 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -898,6 +898,10 @@ void qdev_unplug(DeviceState *dev, Error **errp)
     HotplugHandlerClass *hdc;
     Error *local_err = NULL;
 
+    if (qdev_unplug_blocked(dev, errp)) {
+        return;
+    }
+
     if (dev->parent_bus && !qbus_is_hotpluggable(dev->parent_bus)) {
         error_setg(errp, QERR_BUS_NO_HOTPLUG, dev->parent_bus->name);
         return;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 03/17] remote/machine: add HotplugHandler for remote machine
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 01/17] tests/avocado: Specify target VM argument to helper routines Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 02/17] qdev: unplug blocker for devices Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 04/17] remote/machine: add vfio-user property Jagannathan Raman
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Allow hotplugging of PCI(e) devices to remote machine

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/machine.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 952105eab5..0c5bd4f923 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -21,6 +21,7 @@
 #include "qapi/error.h"
 #include "hw/pci/pci_host.h"
 #include "hw/remote/iohub.h"
+#include "hw/qdev-core.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -54,14 +55,19 @@ static void remote_machine_init(MachineState *machine)
 
     pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
                  &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+
+    qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
 
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
 
     mc->init = remote_machine_init;
     mc->desc = "Experimental remote machine";
+
+    hc->unplug = qdev_simple_device_unplug_cb;
 }
 
 static const TypeInfo remote_machine = {
@@ -69,6 +75,10 @@ static const TypeInfo remote_machine = {
     .parent = TYPE_MACHINE,
     .instance_size = sizeof(RemoteMachineState),
     .class_init = remote_machine_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    }
 };
 
 static void remote_machine_register_types(void)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 04/17] remote/machine: add vfio-user property
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (2 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 03/17] remote/machine: add HotplugHandler for remote machine Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 05/17] configure: require cmake 3.19 or newer Jagannathan Raman
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Add vfio-user to x-remote machine. It is a boolean, which indicates if
the machine supports vfio-user protocol. The machine configures the bus
differently vfio-user and multiprocess protocols, so this property
informs it on how to configure the bus.

This property should be short lived. Once vfio-user fully replaces
multiprocess, this property could be removed.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/remote/machine.h |  2 ++
 hw/remote/machine.c         | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index 2a2a33c4b2..8d0fa98d33 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -22,6 +22,8 @@ struct RemoteMachineState {
 
     RemotePCIHost *host;
     RemoteIOHubState iohub;
+
+    bool vfio_user;
 };
 
 /* Used to pass to co-routine device and ioc. */
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 0c5bd4f923..a9a75e170f 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -59,6 +59,25 @@ static void remote_machine_init(MachineState *machine)
     qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
 
+static bool remote_machine_get_vfio_user(Object *obj, Error **errp)
+{
+    RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+    return s->vfio_user;
+}
+
+static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
+{
+    RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+    if (phase_check(PHASE_MACHINE_CREATED)) {
+        error_setg(errp, "Error enabling vfio-user - machine already created");
+        return;
+    }
+
+    s->vfio_user = value;
+}
+
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -68,6 +87,10 @@ static void remote_machine_class_init(ObjectClass *oc, void *data)
     mc->desc = "Experimental remote machine";
 
     hc->unplug = qdev_simple_device_unplug_cb;
+
+    object_class_property_add_bool(oc, "vfio-user",
+                                   remote_machine_get_vfio_user,
+                                   remote_machine_set_vfio_user);
 }
 
 static const TypeInfo remote_machine = {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 05/17] configure: require cmake 3.19 or newer
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (3 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 04/17] remote/machine: add vfio-user property Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 06/17] vfio-user: build library Jagannathan Raman
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

cmake needs to accept the compiler flags specified with
CMAKE_<LANG>_COMPILER variable. It does so starting with
version 3.19

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 configure | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/configure b/configure
index 7c08c18358..7a1a98bddf 100755
--- a/configure
+++ b/configure
@@ -250,6 +250,7 @@ stack_protector=""
 safe_stack=""
 use_containers="yes"
 gdb_bin=$(command -v "gdb-multiarch" || command -v "gdb")
+cmake_required="no"
 
 if test -e "$source_path/.git"
 then
@@ -2777,6 +2778,21 @@ if !(GIT="$git" "$source_path/scripts/git-submodule.sh" "$git_submodules_action"
     exit 1
 fi
 
+# Per cmake spec, CMAKE_<LANG>_COMPILER variable may include "mandatory" compiler
+# flags. QEMU needs to specify these flags to correctly configure the build
+# environment. cmake 3.19 allows specifying these mandatory compiler flags,
+# and as such 3.19 or newer is required to build QEMU.
+if test "$cmake_required" = "yes" ; then
+    cmake_bin=$(command -v "cmake")
+    if [ -z "$cmake_bin" ]; then
+        error_exit "cmake not found"
+    fi
+    cmake_version=$($cmake_bin --version | head -n 1)
+    if ! version_ge ${cmake_version##* } 3.19; then
+        error_exit "QEMU needs cmake 3.19 or newer"
+    fi
+fi
+
 config_host_mak="config-host.mak"
 
 echo "# Automatically generated by configure - do not modify" > $config_host_mak
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 06/17] vfio-user: build library
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (4 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 05/17] configure: require cmake 3.19 or newer Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 07/17] vfio-user: define vfio-user-server object Jagannathan Raman
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

add the libvfio-user library as a submodule. build it as a cmake
subproject.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 configure                                  | 20 +++++++++-
 meson.build                                | 44 +++++++++++++++++++++-
 .gitlab-ci.d/buildtest.yml                 |  2 +
 .gitmodules                                |  3 ++
 Kconfig.host                               |  4 ++
 MAINTAINERS                                |  1 +
 hw/remote/Kconfig                          |  4 ++
 hw/remote/meson.build                      |  2 +
 meson_options.txt                          |  3 ++
 subprojects/libvfio-user                   |  1 +
 tests/docker/dockerfiles/centos8.docker    |  2 +
 tests/docker/dockerfiles/ubuntu2004.docker |  2 +
 12 files changed, 86 insertions(+), 2 deletions(-)
 create mode 160000 subprojects/libvfio-user

diff --git a/configure b/configure
index 7a1a98bddf..c4fd7a42d4 100755
--- a/configure
+++ b/configure
@@ -333,6 +333,7 @@ meson_args=""
 ninja=""
 gio="$default_feature"
 skip_meson=no
+vfio_user_server="disabled"
 
 # The following Meson options are handled manually (still they
 # are included in the automatically generated help message)
@@ -1044,6 +1045,11 @@ for opt do
   ;;
   --disable-blobs) meson_option_parse --disable-install-blobs ""
   ;;
+  --enable-vfio-user-server) vfio_user_server="enabled"
+                             cmake_required="yes"
+  ;;
+  --disable-vfio-user-server) vfio_user_server="disabled"
+  ;;
   --enable-tcmalloc) meson_option_parse --enable-malloc=tcmalloc tcmalloc
   ;;
   --enable-jemalloc) meson_option_parse --enable-malloc=jemalloc jemalloc
@@ -1267,6 +1273,7 @@ cat << EOF
   vhost-vdpa      vhost-vdpa kernel backend support
   opengl          opengl support
   gio             libgio support
+  vfio-user-server    vfio-user server support
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -2622,6 +2629,17 @@ but not implemented on your system"
     fi
 fi
 
+##########################################
+# check for vfio_user_server
+
+case "$vfio_user_server" in
+  auto | enabled )
+    if test "$git_submodules_action" != "ignore"; then
+      git_submodules="${git_submodules} subprojects/libvfio-user"
+    fi
+    ;;
+esac
+
 ##########################################
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -3185,7 +3203,7 @@ if test "$skip_meson" = no; then
         -Db_pie=$(if test "$pie" = yes; then echo true; else echo false; fi) \
         -Db_coverage=$(if test "$gcov" = yes; then echo true; else echo false; fi) \
         -Db_lto=$lto -Dcfi=$cfi -Dtcg=$tcg -Dxen=$xen \
-        -Dcapstone=$capstone -Dfdt=$fdt -Dslirp=$slirp \
+        -Dcapstone=$capstone -Dfdt=$fdt -Dslirp=$slirp -Dvfio_user_server=$vfio_user_server \
         $(test -n "${LIB_FUZZING_ENGINE+xxx}" && echo "-Dfuzzing_engine=$LIB_FUZZING_ENGINE") \
         $(if test "$default_feature" = no; then echo "-Dauto_features=disabled"; fi) \
         "$@" $cross_arg "$PWD" "$source_path"
diff --git a/meson.build b/meson.build
index aef724ad3c..5ae87b8f86 100644
--- a/meson.build
+++ b/meson.build
@@ -298,6 +298,11 @@ have_tpm = get_option('tpm') \
   .require(targetos != 'windows', error_message: 'TPM emulation only available on POSIX systems') \
   .allowed()
 
+if targetos != 'linux' and get_option('vfio_user_server').enabled()
+  error('vfio-user server is supported only on Linux')
+endif
+vfio_user_server_allowed = targetos == 'linux' and not get_option('vfio_user_server').disabled()
+
 # Target-specific libraries and flags
 libm = cc.find_library('m', required: false)
 threads = dependency('threads')
@@ -2111,7 +2116,8 @@ host_kconfig = \
   (have_virtfs ? ['CONFIG_VIRTFS=y'] : []) + \
   ('CONFIG_LINUX' in config_host ? ['CONFIG_LINUX=y'] : []) + \
   ('CONFIG_PVRDMA' in config_host ? ['CONFIG_PVRDMA=y'] : []) + \
-  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : [])
+  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : []) + \
+  (vfio_user_server_allowed ? ['CONFIG_VFIO_USER_SERVER_ALLOWED=y'] : [])
 
 ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
 
@@ -2500,6 +2506,41 @@ if get_option('cfi') and slirp_opt == 'system'
          + ' Please configure with --enable-slirp=git')
 endif
 
+vfiouser = not_found
+if have_system and vfio_user_server_allowed
+  have_internal = fs.exists(meson.current_source_dir() / 'subprojects/libvfio-user/Makefile')
+
+  if not have_internal
+    error('libvfio-user source not found - please pull git submodule')
+  endif
+
+  json_c = dependency('json-c', required: false)
+  if not json_c.found()
+    json_c = dependency('libjson-c', required: false)
+  endif
+  if not json_c.found()
+    json_c = dependency('libjson-c-dev', required: false)
+  endif
+
+  if not json_c.found()
+    error('Unable to find json-c package')
+  endif
+
+  cmake = import('cmake')
+
+  vfiouser_subproj = cmake.subproject('libvfio-user')
+
+  vfiouser_sl = vfiouser_subproj.dependency('vfio-user-static')
+
+  # Although cmake links the json-c library with vfio-user-static
+  # target, that info is not available to meson via cmake.subproject.
+  # As such, we have to separately declare the json-c dependency here.
+  # This appears to be a current limitation of using cmake inside meson.
+  # libvfio-user is planning a switch to meson in the future, which
+  # would address this item automatically.
+  vfiouser = declare_dependency(dependencies: [vfiouser_sl, json_c])
+endif
+
 fdt = not_found
 if have_system
   fdt_opt = get_option('fdt')
@@ -3612,6 +3653,7 @@ summary_info += {'target list':       ' '.join(target_dirs)}
 if have_system
   summary_info += {'default devices':   get_option('default_devices')}
   summary_info += {'out of process emulation': multiprocess_allowed}
+  summary_info += {'vfio-user server': vfio_user_server_allowed}
 endif
 summary(summary_info, bool_yn: true, section: 'Targets and accelerators')
 
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 0aea7ab84c..671a5d8fa4 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -42,6 +42,7 @@ build-system-ubuntu:
   variables:
     IMAGE: ubuntu2004
     CONFIGURE_ARGS: --enable-docs --enable-fdt=system --enable-slirp=system
+                    --enable-vfio-user-server
     TARGETS: aarch64-softmmu alpha-softmmu cris-softmmu hppa-softmmu
       microblazeel-softmmu mips64el-softmmu
     MAKE_CHECK_ARGS: check-build
@@ -165,6 +166,7 @@ build-system-centos:
     IMAGE: centos8
     CONFIGURE_ARGS: --disable-nettle --enable-gcrypt --enable-fdt=system
       --enable-modules --enable-trace-backends=dtrace --enable-docs
+      --enable-vfio-user-server
     TARGETS: ppc64-softmmu or1k-softmmu s390x-softmmu
       x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
     MAKE_CHECK_ARGS: check-build
diff --git a/.gitmodules b/.gitmodules
index f4b6a9b401..d66af96dc9 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -67,3 +67,6 @@
 [submodule "tests/lcitool/libvirt-ci"]
 	path = tests/lcitool/libvirt-ci
 	url = https://gitlab.com/libvirt/libvirt-ci.git
+[submodule "subprojects/libvfio-user"]
+	path = subprojects/libvfio-user
+	url = https://github.com/nutanix/libvfio-user.git
diff --git a/Kconfig.host b/Kconfig.host
index 60b9c07b5e..f2da8bcf8a 100644
--- a/Kconfig.host
+++ b/Kconfig.host
@@ -45,3 +45,7 @@ config MULTIPROCESS_ALLOWED
 config FUZZ
     bool
     select SPARSE_MEM
+
+config VFIO_USER_SERVER_ALLOWED
+    bool
+    imply VFIO_USER_SERVER
diff --git a/MAINTAINERS b/MAINTAINERS
index cc364afef7..0488bae9d0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3597,6 +3597,7 @@ F: hw/remote/proxy-memory-listener.c
 F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
+F: subprojects/libvfio-user
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
index 08c16e235f..2d6b4f4cf4 100644
--- a/hw/remote/Kconfig
+++ b/hw/remote/Kconfig
@@ -2,3 +2,7 @@ config MULTIPROCESS
     bool
     depends on PCI && PCI_EXPRESS && KVM
     select REMOTE_PCIHOST
+
+config VFIO_USER_SERVER
+    bool
+    depends on MULTIPROCESS
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index e6a5574242..dfea6b533b 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -7,6 +7,8 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
 
+remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)
+
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
 
diff --git a/meson_options.txt b/meson_options.txt
index 52b11cead4..bd531fd545 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -91,6 +91,9 @@ option('avx2', type: 'feature', value: 'auto',
 option('avx512f', type: 'feature', value: 'disabled',
        description: 'AVX512F optimizations')
 
+option('vfio_user_server', type: 'feature', value: 'auto',
+       description: 'vfio-user server support')
+
 option('attr', type : 'feature', value : 'auto',
        description: 'attr/xattr support')
 option('auth_pam', type : 'feature', value : 'auto',
diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
new file mode 160000
index 0000000000..7056525da5
--- /dev/null
+++ b/subprojects/libvfio-user
@@ -0,0 +1 @@
+Subproject commit 7056525da5399d00831e90bed4aedb4b8442c9b2
diff --git a/tests/docker/dockerfiles/centos8.docker b/tests/docker/dockerfiles/centos8.docker
index 3ede55d09b..b6b4aa9626 100644
--- a/tests/docker/dockerfiles/centos8.docker
+++ b/tests/docker/dockerfiles/centos8.docker
@@ -23,6 +23,7 @@ RUN dnf update -y && \
         capstone-devel \
         ccache \
         clang \
+        cmake \
         ctags \
         cyrus-sasl-devel \
         daxctl-devel \
@@ -45,6 +46,7 @@ RUN dnf update -y && \
         gtk3-devel \
         hostname \
         jemalloc-devel \
+        json-c-devel \
         libaio-devel \
         libasan \
         libattr-devel \
diff --git a/tests/docker/dockerfiles/ubuntu2004.docker b/tests/docker/dockerfiles/ubuntu2004.docker
index b9d06cb040..2422cc5574 100644
--- a/tests/docker/dockerfiles/ubuntu2004.docker
+++ b/tests/docker/dockerfiles/ubuntu2004.docker
@@ -18,6 +18,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
             ca-certificates \
             ccache \
             clang \
+            cmake \
             dbus \
             debianutils \
             diffutils \
@@ -58,6 +59,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
             libiscsi-dev \
             libjemalloc-dev \
             libjpeg-turbo8-dev \
+            libjson-c-dev \
             liblttng-ust-dev \
             liblzo2-dev \
             libncursesw5-dev \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 07/17] vfio-user: define vfio-user-server object
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (5 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 06/17] vfio-user: build library Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-29 10:21   ` Stefan Hajnoczi
  2022-03-25 19:19 ` [PATCH v7 08/17] vfio-user: instantiate vfio-user context Jagannathan Raman
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Define vfio-user object which is remote process server for QEMU. Setup
object initialization functions and properties necessary to instantiate
the object

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 qapi/qom.json               |  20 +++-
 include/hw/remote/machine.h |   8 +-
 hw/remote/machine.c         |  23 ++++
 hw/remote/vfio-user-obj.c   | 211 ++++++++++++++++++++++++++++++++++++
 MAINTAINERS                 |   1 +
 hw/remote/meson.build       |   1 +
 hw/remote/trace-events      |   3 +
 7 files changed, 264 insertions(+), 3 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c

diff --git a/qapi/qom.json b/qapi/qom.json
index eeb5395ff3..e7b1758a11 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -703,6 +703,20 @@
 { 'struct': 'RemoteObjectProperties',
   'data': { 'fd': 'str', 'devid': 'str' } }
 
+##
+# @VfioUserServerProperties:
+#
+# Properties for x-vfio-user-server objects.
+#
+# @socket: socket to be used by the libvfiouser library
+#
+# @device: the id of the device to be emulated at the server
+#
+# Since: 7.1
+##
+{ 'struct': 'VfioUserServerProperties',
+  'data': { 'socket': 'SocketAddress', 'device': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -842,7 +856,8 @@
     'tls-creds-psk',
     'tls-creds-x509',
     'tls-cipher-suites',
-    { 'name': 'x-remote-object', 'features': [ 'unstable' ] }
+    { 'name': 'x-remote-object', 'features': [ 'unstable' ] },
+    { 'name': 'x-vfio-user-server', 'features': [ 'unstable' ] }
   ] }
 
 ##
@@ -905,7 +920,8 @@
       'tls-creds-psk':              'TlsCredsPskProperties',
       'tls-creds-x509':             'TlsCredsX509Properties',
       'tls-cipher-suites':          'TlsCredsProperties',
-      'x-remote-object':            'RemoteObjectProperties'
+      'x-remote-object':            'RemoteObjectProperties',
+      'x-vfio-user-server':         'VfioUserServerProperties'
   } }
 
 ##
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index 8d0fa98d33..2fcb9dada5 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -26,6 +26,12 @@ struct RemoteMachineState {
     bool vfio_user;
 };
 
+struct RemoteMachineClass {
+    MachineClass parent_class;
+
+    bool auto_shutdown;
+};
+
 /* Used to pass to co-routine device and ioc. */
 typedef struct RemoteCommDev {
     PCIDevice *dev;
@@ -33,7 +39,7 @@ typedef struct RemoteCommDev {
 } RemoteCommDev;
 
 #define TYPE_REMOTE_MACHINE "x-remote-machine"
-OBJECT_DECLARE_SIMPLE_TYPE(RemoteMachineState, REMOTE_MACHINE)
+OBJECT_DECLARE_TYPE(RemoteMachineState, RemoteMachineClass, REMOTE_MACHINE)
 
 void coroutine_fn mpqemu_remote_msg_loop_co(void *data);
 
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index a9a75e170f..70178b222c 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -78,25 +78,48 @@ static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
     s->vfio_user = value;
 }
 
+static bool remote_machine_get_auto_shutdown(Object *obj, Error **errp)
+{
+    RemoteMachineClass *rmc = REMOTE_MACHINE_GET_CLASS(obj);
+
+    return rmc->auto_shutdown;
+}
+
+static void remote_machine_set_auto_shutdown(Object *obj, bool value,
+                                             Error **errp)
+{
+    RemoteMachineClass *rmc = REMOTE_MACHINE_GET_CLASS(obj);
+
+    rmc->auto_shutdown = value;
+}
+
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
+    RemoteMachineClass *rmc = REMOTE_MACHINE_CLASS(oc);
     HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
 
     mc->init = remote_machine_init;
     mc->desc = "Experimental remote machine";
 
+    rmc->auto_shutdown = true;
+
     hc->unplug = qdev_simple_device_unplug_cb;
 
     object_class_property_add_bool(oc, "vfio-user",
                                    remote_machine_get_vfio_user,
                                    remote_machine_set_vfio_user);
+
+    object_class_property_add_bool(oc, "auto-shutdown",
+                                   remote_machine_get_auto_shutdown,
+                                   remote_machine_set_auto_shutdown);
 }
 
 static const TypeInfo remote_machine = {
     .name = TYPE_REMOTE_MACHINE,
     .parent = TYPE_MACHINE,
     .instance_size = sizeof(RemoteMachineState),
+    .class_size = sizeof(RemoteMachineClass),
     .class_init = remote_machine_class_init,
     .interfaces = (InterfaceInfo[]) {
         { TYPE_HOTPLUG_HANDLER },
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
new file mode 100644
index 0000000000..c4d59b4d9d
--- /dev/null
+++ b/hw/remote/vfio-user-obj.c
@@ -0,0 +1,211 @@
+/**
+ * QEMU vfio-user-server server object
+ *
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/**
+ * Usage: add options:
+ *     -machine x-remote,vfio-user=on,auto-shutdown=on
+ *     -device <PCI-device>,id=<pci-dev-id>
+ *     -object x-vfio-user-server,id=<id>,type=unix,path=<socket-path>,
+ *             device=<pci-dev-id>
+ *
+ * Note that x-vfio-user-server object must be used with x-remote machine only.
+ * This server could only support PCI devices for now.
+ *
+ * type - SocketAddress type - presently "unix" alone is supported. Required
+ *        option
+ *
+ * path - named unix socket, it will be created by the server. It is
+ *        a required option
+ *
+ * device - id of a device on the server, a required option. PCI devices
+ *          alone are supported presently.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "sysemu/runstate.h"
+#include "hw/boards.h"
+#include "hw/remote/machine.h"
+#include "qapi/error.h"
+#include "qapi/qapi-visit-sockets.h"
+
+#define TYPE_VFU_OBJECT "x-vfio-user-server"
+OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
+
+/**
+ * VFU_OBJECT_ERROR - reports an error message. If auto_shutdown
+ * is set, it aborts the machine on error. Otherwise, it logs an
+ * error message without aborting.
+ */
+#define VFU_OBJECT_ERROR(o, fmt, ...)                                     \
+    {                                                                     \
+        if (vfu_object_auto_shutdown()) {                                 \
+            error_setg(&error_abort, (fmt), ## __VA_ARGS__);              \
+        } else {                                                          \
+            error_report((fmt), ## __VA_ARGS__);                          \
+        }                                                                 \
+    }                                                                     \
+
+struct VfuObjectClass {
+    ObjectClass parent_class;
+
+    unsigned int nr_devs;
+};
+
+struct VfuObject {
+    /* private */
+    Object parent;
+
+    SocketAddress *socket;
+
+    char *device;
+
+    Error *err;
+};
+
+static bool vfu_object_auto_shutdown(void)
+{
+    bool auto_shutdown = true;
+    Error *local_err = NULL;
+
+    if (!current_machine) {
+        return auto_shutdown;
+    }
+
+    auto_shutdown = object_property_get_bool(OBJECT(current_machine),
+                                             "auto-shutdown",
+                                             &local_err);
+
+    /*
+     * local_err would be set if no such property exists - safe to ignore.
+     * Unlikely scenario as auto-shutdown is always defined for
+     * TYPE_REMOTE_MACHINE, and  TYPE_VFU_OBJECT only works with
+     * TYPE_REMOTE_MACHINE
+     */
+    if (local_err) {
+        auto_shutdown = true;
+        error_free(local_err);
+    }
+
+    return auto_shutdown;
+}
+
+static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
+                                  void *opaque, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    qapi_free_SocketAddress(o->socket);
+
+    o->socket = NULL;
+
+    visit_type_SocketAddress(v, name, &o->socket, errp);
+
+    if (o->socket->type != SOCKET_ADDRESS_TYPE_UNIX) {
+        error_setg(errp, "vfu: Unsupported socket type - %s",
+                   SocketAddressType_str(o->socket->type));
+        qapi_free_SocketAddress(o->socket);
+        o->socket = NULL;
+        return;
+    }
+
+    trace_vfu_prop("socket", o->socket->u.q_unix.path);
+}
+
+static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    g_free(o->device);
+
+    o->device = g_strdup(str);
+
+    trace_vfu_prop("device", str);
+}
+
+static void vfu_object_init(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
+
+    k->nr_devs++;
+
+    if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
+        error_setg(&o->err, "vfu: %s only compatible with %s machine",
+                   TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
+        return;
+    }
+}
+
+static void vfu_object_finalize(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
+
+    k->nr_devs--;
+
+    qapi_free_SocketAddress(o->socket);
+
+    o->socket = NULL;
+
+    g_free(o->device);
+
+    o->device = NULL;
+
+    if (!k->nr_devs && vfu_object_auto_shutdown()) {
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    }
+}
+
+static void vfu_object_class_init(ObjectClass *klass, void *data)
+{
+    VfuObjectClass *k = VFU_OBJECT_CLASS(klass);
+
+    k->nr_devs = 0;
+
+    object_class_property_add(klass, "socket", "SocketAddress", NULL,
+                              vfu_object_set_socket, NULL, NULL);
+    object_class_property_set_description(klass, "socket",
+                                          "SocketAddress "
+                                          "(ex: type=unix,path=/tmp/sock). "
+                                          "Only UNIX is presently supported");
+    object_class_property_add_str(klass, "device", NULL,
+                                  vfu_object_set_device);
+    object_class_property_set_description(klass, "device",
+                                          "device ID - only PCI devices "
+                                          "are presently supported");
+}
+
+static const TypeInfo vfu_object_info = {
+    .name = TYPE_VFU_OBJECT,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(VfuObject),
+    .instance_init = vfu_object_init,
+    .instance_finalize = vfu_object_finalize,
+    .class_size = sizeof(VfuObjectClass),
+    .class_init = vfu_object_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void vfu_register_types(void)
+{
+    type_register_static(&vfu_object_info);
+}
+
+type_init(vfu_register_types);
diff --git a/MAINTAINERS b/MAINTAINERS
index 0488bae9d0..e7b0297a63 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3598,6 +3598,7 @@ F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
+F: hw/remote/vfio-user-obj.c
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index dfea6b533b..534ac5df79 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: files('vfio-user-obj.c'))
 
 remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)
 
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 0b23974f90..7da12f0d96 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -2,3 +2,6 @@
 
 mpqemu_send_io_error(int cmd, int size, int nfds) "send command %d size %d, %d file descriptors to remote process"
 mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d, %d file descriptors to remote process"
+
+# vfio-user-obj.c
+vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 08/17] vfio-user: instantiate vfio-user context
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (6 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 07/17] vfio-user: define vfio-user-server object Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-29 10:26   ` Stefan Hajnoczi
  2022-03-25 19:19 ` [PATCH v7 09/17] vfio-user: find and init PCI device Jagannathan Raman
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

create a context with the vfio-user library to run a PCI device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 82 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index c4d59b4d9d..d46acd5b63 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -41,6 +41,9 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/notify.h"
+#include "sysemu/sysemu.h"
+#include "libvfio-user.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -74,8 +77,14 @@ struct VfuObject {
     char *device;
 
     Error *err;
+
+    Notifier machine_done;
+
+    vfu_ctx_t *vfu_ctx;
 };
 
+static void vfu_object_init_ctx(VfuObject *o, Error **errp);
+
 static bool vfu_object_auto_shutdown(void)
 {
     bool auto_shutdown = true;
@@ -108,6 +117,11 @@ static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
 {
     VfuObject *o = VFU_OBJECT(obj);
 
+    if (o->vfu_ctx) {
+        error_setg(errp, "vfu: Unable to set socket property - server busy");
+        return;
+    }
+
     qapi_free_SocketAddress(o->socket);
 
     o->socket = NULL;
@@ -123,17 +137,69 @@ static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
     }
 
     trace_vfu_prop("socket", o->socket->u.q_unix.path);
+
+    vfu_object_init_ctx(o, errp);
 }
 
 static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
 {
     VfuObject *o = VFU_OBJECT(obj);
 
+    if (o->vfu_ctx) {
+        error_setg(errp, "vfu: Unable to set device property - server busy");
+        return;
+    }
+
     g_free(o->device);
 
     o->device = g_strdup(str);
 
     trace_vfu_prop("device", str);
+
+    vfu_object_init_ctx(o, errp);
+}
+
+/*
+ * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
+ * properties. It also depends on devices instantiated in QEMU. These
+ * dependencies are not available during the instance_init phase of this
+ * object's life-cycle. As such, the server is initialized after the
+ * machine is setup. machine_init_done_notifier notifies TYPE_VFU_OBJECT
+ * when the machine is setup, and the dependencies are available.
+ */
+static void vfu_object_machine_done(Notifier *notifier, void *data)
+{
+    VfuObject *o = container_of(notifier, VfuObject, machine_done);
+    Error *err = NULL;
+
+    vfu_object_init_ctx(o, &err);
+
+    if (err) {
+        error_propagate(&error_abort, err);
+    }
+}
+
+static void vfu_object_init_ctx(VfuObject *o, Error **errp)
+{
+    ERRP_GUARD();
+
+    if (o->vfu_ctx || !o->socket || !o->device ||
+            !phase_check(PHASE_MACHINE_READY)) {
+        return;
+    }
+
+    if (o->err) {
+        error_propagate(errp, o->err);
+        o->err = NULL;
+        return;
+    }
+
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+                                o, VFU_DEV_TYPE_PCI);
+    if (o->vfu_ctx == NULL) {
+        error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
+        return;
+    }
 }
 
 static void vfu_object_init(Object *obj)
@@ -148,6 +214,12 @@ static void vfu_object_init(Object *obj)
                    TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
         return;
     }
+
+    if (!phase_check(PHASE_MACHINE_READY)) {
+        o->machine_done.notify = vfu_object_machine_done;
+        qemu_add_machine_init_done_notifier(&o->machine_done);
+    }
+
 }
 
 static void vfu_object_finalize(Object *obj)
@@ -161,6 +233,11 @@ static void vfu_object_finalize(Object *obj)
 
     o->socket = NULL;
 
+    if (o->vfu_ctx) {
+        vfu_destroy_ctx(o->vfu_ctx);
+        o->vfu_ctx = NULL;
+    }
+
     g_free(o->device);
 
     o->device = NULL;
@@ -168,6 +245,11 @@ static void vfu_object_finalize(Object *obj)
     if (!k->nr_devs && vfu_object_auto_shutdown()) {
         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     }
+
+    if (o->machine_done.notify) {
+        qemu_remove_machine_init_done_notifier(&o->machine_done);
+        o->machine_done.notify = NULL;
+    }
 }
 
 static void vfu_object_class_init(ObjectClass *klass, void *data)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 09/17] vfio-user: find and init PCI device
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (7 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 08/17] vfio-user: instantiate vfio-user context Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 10/17] vfio-user: run vfio-user context Jagannathan Raman
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Find the PCI device with specified id. Initialize the device context
with the QEMU PCI device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/vfio-user-obj.c | 67 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index d46acd5b63..15f6fe3a1a 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -44,6 +44,8 @@
 #include "qemu/notify.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
+#include "hw/qdev-core.h"
+#include "hw/pci/pci.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -81,6 +83,10 @@ struct VfuObject {
     Notifier machine_done;
 
     vfu_ctx_t *vfu_ctx;
+
+    PCIDevice *pci_dev;
+
+    Error *unplug_blocker;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -182,6 +188,9 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
 static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 {
     ERRP_GUARD();
+    DeviceState *dev = NULL;
+    vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
+    int ret;
 
     if (o->vfu_ctx || !o->socket || !o->device ||
             !phase_check(PHASE_MACHINE_READY)) {
@@ -200,6 +209,53 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
         return;
     }
+
+    dev = qdev_find_recursive(sysbus_get_default(), o->device);
+    if (dev == NULL) {
+        error_setg(errp, "vfu: Device %s not found", o->device);
+        goto fail;
+    }
+
+    if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        error_setg(errp, "vfu: %s not a PCI device", o->device);
+        goto fail;
+    }
+
+    o->pci_dev = PCI_DEVICE(dev);
+
+    object_ref(OBJECT(o->pci_dev));
+
+    if (pci_is_express(o->pci_dev)) {
+        pci_type = VFU_PCI_TYPE_EXPRESS;
+    }
+
+    ret = vfu_pci_init(o->vfu_ctx, pci_type, PCI_HEADER_TYPE_NORMAL, 0);
+    if (ret < 0) {
+        error_setg(errp,
+                   "vfu: Failed to attach PCI device %s to context - %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
+    error_setg(&o->unplug_blocker,
+               "vfu: %s for %s must be deleted before unplugging",
+               TYPE_VFU_OBJECT, o->device);
+    qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+
+    return;
+
+fail:
+    vfu_destroy_ctx(o->vfu_ctx);
+    if (o->unplug_blocker && o->pci_dev) {
+        qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+        error_free(o->unplug_blocker);
+        o->unplug_blocker = NULL;
+    }
+    if (o->pci_dev) {
+        object_unref(OBJECT(o->pci_dev));
+        o->pci_dev = NULL;
+    }
+    o->vfu_ctx = NULL;
 }
 
 static void vfu_object_init(Object *obj)
@@ -242,6 +298,17 @@ static void vfu_object_finalize(Object *obj)
 
     o->device = NULL;
 
+    if (o->unplug_blocker && o->pci_dev) {
+        qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+        error_free(o->unplug_blocker);
+        o->unplug_blocker = NULL;
+    }
+
+    if (o->pci_dev) {
+        object_unref(OBJECT(o->pci_dev));
+        o->pci_dev = NULL;
+    }
+
     if (!k->nr_devs && vfu_object_auto_shutdown()) {
         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 10/17] vfio-user: run vfio-user context
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (8 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 09/17] vfio-user: find and init PCI device Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 11/17] vfio-user: handle PCI config space accesses Jagannathan Raman
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Setup a handler to run vfio-user context. The context is driven by
messages to the file descriptor associated with it - get the fd for
the context and hook up the handler with it

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/misc.json            | 23 ++++++++++
 hw/remote/vfio-user-obj.c | 95 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 117 insertions(+), 1 deletion(-)

diff --git a/qapi/misc.json b/qapi/misc.json
index b83cc39029..f3cc4a4854 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -553,3 +553,26 @@
 ##
 { 'event': 'RTC_CHANGE',
   'data': { 'offset': 'int', 'qom-path': 'str' } }
+
+##
+# @VFU_CLIENT_HANGUP:
+#
+# Emitted when the client of a TYPE_VFIO_USER_SERVER closes the
+# communication channel
+#
+# @id: ID of the TYPE_VFIO_USER_SERVER object
+#
+# @device: ID of attached PCI device
+#
+# Since: 7.1
+#
+# Example:
+#
+# <- { "event": "VFU_CLIENT_HANGUP",
+#      "data": { "id": "vfu1",
+#                "device": "lsi1" },
+#      "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+#
+##
+{ 'event': 'VFU_CLIENT_HANGUP',
+  'data': { 'id': 'str', 'device': 'str' } }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 15f6fe3a1a..06d99a8698 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -27,6 +27,9 @@
  *
  * device - id of a device on the server, a required option. PCI devices
  *          alone are supported presently.
+ *
+ * notes - x-vfio-user-server could block IO and monitor during the
+ *         initialization phase.
  */
 
 #include "qemu/osdep.h"
@@ -41,11 +44,14 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qapi/qapi-events-misc.h"
 #include "qemu/notify.h"
+#include "qemu/thread.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
+#include "qemu/timer.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -87,6 +93,8 @@ struct VfuObject {
     PCIDevice *pci_dev;
 
     Error *unplug_blocker;
+
+    int vfu_poll_fd;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -165,6 +173,69 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
     vfu_object_init_ctx(o, errp);
 }
 
+static void vfu_object_ctx_run(void *opaque)
+{
+    VfuObject *o = opaque;
+    const char *id = NULL;
+    int ret = -1;
+
+    while (ret != 0) {
+        ret = vfu_run_ctx(o->vfu_ctx);
+        if (ret < 0) {
+            if (errno == EINTR) {
+                continue;
+            } else if (errno == ENOTCONN) {
+                id = object_get_canonical_path_component(OBJECT(o));
+                qapi_event_send_vfu_client_hangup(id, o->device);
+                qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+                o->vfu_poll_fd = -1;
+                object_unparent(OBJECT(o));
+                break;
+            } else {
+                VFU_OBJECT_ERROR(o, "vfu: Failed to run device %s - %s",
+                                 o->device, strerror(errno));
+                break;
+            }
+        }
+    }
+}
+
+static void vfu_object_attach_ctx(void *opaque)
+{
+    VfuObject *o = opaque;
+    GPollFD pfds[1];
+    int ret;
+
+    qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+
+    pfds[0].fd = o->vfu_poll_fd;
+    pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+
+retry_attach:
+    ret = vfu_attach_ctx(o->vfu_ctx);
+    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
+        /**
+         * vfu_object_attach_ctx can block QEMU's main loop
+         * during attach - the monitor and other IO
+         * could be unresponsive during this time.
+         */
+        (void)qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS);
+        goto retry_attach;
+    } else if (ret < 0) {
+        VFU_OBJECT_ERROR(o, "vfu: Failed to attach device %s to context - %s",
+                         o->device, strerror(errno));
+        return;
+    }
+
+    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+    if (o->vfu_poll_fd < 0) {
+        VFU_OBJECT_ERROR(o, "vfu: Failed to get poll fd %s", o->device);
+        return;
+    }
+
+    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -203,7 +274,8 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         return;
     }
 
-    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path,
+                                LIBVFIO_USER_FLAG_ATTACH_NB,
                                 o, VFU_DEV_TYPE_PCI);
     if (o->vfu_ctx == NULL) {
         error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
@@ -242,6 +314,21 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
                TYPE_VFU_OBJECT, o->device);
     qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
 
+    ret = vfu_realize_ctx(o->vfu_ctx);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to realize device %s- %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
+    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+    if (o->vfu_poll_fd < 0) {
+        error_setg(errp, "vfu: Failed to get poll fd %s", o->device);
+        goto fail;
+    }
+
+    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_attach_ctx, NULL, o);
+
     return;
 
 fail:
@@ -276,6 +363,7 @@ static void vfu_object_init(Object *obj)
         qemu_add_machine_init_done_notifier(&o->machine_done);
     }
 
+    o->vfu_poll_fd = -1;
 }
 
 static void vfu_object_finalize(Object *obj)
@@ -289,6 +377,11 @@ static void vfu_object_finalize(Object *obj)
 
     o->socket = NULL;
 
+    if (o->vfu_poll_fd != -1) {
+        qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+        o->vfu_poll_fd = -1;
+    }
+
     if (o->vfu_ctx) {
         vfu_destroy_ctx(o->vfu_ctx);
         o->vfu_ctx = NULL;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 11/17] vfio-user: handle PCI config space accesses
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (9 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 10/17] vfio-user: run vfio-user context Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 12/17] vfio-user: IOMMU support for remote device Jagannathan Raman
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Define and register handlers for PCI config space accesses

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/vfio-user-obj.c | 51 +++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 2 files changed, 53 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 06d99a8698..7b863dec4f 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -47,6 +47,7 @@
 #include "qapi/qapi-events-misc.h"
 #include "qemu/notify.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
@@ -236,6 +237,45 @@ retry_attach:
     qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
 }
 
+static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
+                                     size_t count, loff_t offset,
+                                     const bool is_write)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint32_t pci_access_width = sizeof(uint32_t);
+    size_t bytes = count;
+    uint32_t val = 0;
+    char *ptr = buf;
+    int len;
+
+    /*
+     * Writes to the BAR registers would trigger an update to the
+     * global Memory and IO AddressSpaces. But the remote device
+     * never uses the global AddressSpaces, therefore overlapping
+     * memory regions are not a problem
+     */
+    while (bytes > 0) {
+        len = (bytes > pci_access_width) ? pci_access_width : bytes;
+        if (is_write) {
+            memcpy(&val, ptr, len);
+            pci_host_config_write_common(o->pci_dev, offset,
+                                         pci_config_size(o->pci_dev),
+                                         val, len);
+            trace_vfu_cfg_write(offset, val);
+        } else {
+            val = pci_host_config_read_common(o->pci_dev, offset,
+                                              pci_config_size(o->pci_dev), len);
+            memcpy(ptr, &val, len);
+            trace_vfu_cfg_read(offset, val);
+        }
+        offset += len;
+        ptr += len;
+        bytes -= len;
+    }
+
+    return count;
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -314,6 +354,17 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
                TYPE_VFU_OBJECT, o->device);
     qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
 
+    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_CFG_REGION_IDX,
+                           pci_config_size(o->pci_dev), &vfu_object_cfg_access,
+                           VFU_REGION_FLAG_RW | VFU_REGION_FLAG_ALWAYS_CB,
+                           NULL, 0, -1, 0);
+    if (ret < 0) {
+        error_setg(errp,
+                   "vfu: Failed to setup config space handlers for %s- %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 7da12f0d96..2ef7884346 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,3 +5,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (10 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 11/17] vfio-user: handle PCI config space accesses Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-29 12:35   ` Stefan Hajnoczi
  2022-04-13 14:25   ` Igor Mammedov
  2022-03-25 19:19 ` [PATCH v7 13/17] vfio-user: handle DMA mappings Jagannathan Raman
                   ` (4 subsequent siblings)
  16 siblings, 2 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Assign separate address space for each device in the remote processes.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 include/hw/remote/iommu.h | 18 ++++++++
 hw/remote/iommu.c         | 95 +++++++++++++++++++++++++++++++++++++++
 MAINTAINERS               |  2 +
 hw/remote/meson.build     |  1 +
 4 files changed, 116 insertions(+)
 create mode 100644 include/hw/remote/iommu.h
 create mode 100644 hw/remote/iommu.c

diff --git a/include/hw/remote/iommu.h b/include/hw/remote/iommu.h
new file mode 100644
index 0000000000..8f850400f1
--- /dev/null
+++ b/include/hw/remote/iommu.h
@@ -0,0 +1,18 @@
+/**
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef REMOTE_IOMMU_H
+#define REMOTE_IOMMU_H
+
+#include "hw/pci/pci_bus.h"
+
+void remote_configure_iommu(PCIBus *pci_bus);
+
+void remote_iommu_del_device(PCIDevice *pci_dev);
+
+#endif
diff --git a/hw/remote/iommu.c b/hw/remote/iommu.c
new file mode 100644
index 0000000000..13f329b45d
--- /dev/null
+++ b/hw/remote/iommu.c
@@ -0,0 +1,95 @@
+/**
+ * IOMMU for remote device
+ *
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "hw/remote/iommu.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
+#include "trace.h"
+
+struct RemoteIommuElem {
+    AddressSpace  as;
+    MemoryRegion  mr;
+};
+
+struct RemoteIommuTable {
+    QemuMutex lock;
+    GHashTable *elem_by_bdf;
+} remote_iommu_table;
+
+#define INT2VOIDP(i) (void *)(uintptr_t)(i)
+
+static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
+                                              void *opaque, int devfn)
+{
+    struct RemoteIommuTable *iommu_table = opaque;
+    struct RemoteIommuElem *elem = NULL;
+    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_bus), devfn);
+
+    elem = g_hash_table_lookup(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf));
+
+    if (!elem) {
+        g_autofree char *mr_name = g_strdup_printf("vfu-ram-%d", pci_bdf);
+        g_autofree char *as_name = g_strdup_printf("vfu-as-%d", pci_bdf);
+
+        elem = g_malloc0(sizeof(struct RemoteIommuElem));
+
+        memory_region_init(&elem->mr, NULL, mr_name, UINT64_MAX);
+        address_space_init(&elem->as, &elem->mr, as_name);
+
+        qemu_mutex_lock(&iommu_table->lock);
+        g_hash_table_insert(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf), elem);
+        qemu_mutex_unlock(&iommu_table->lock);
+    }
+
+    return &elem->as;
+}
+
+static void remote_iommu_del_elem(gpointer data)
+{
+    struct RemoteIommuElem *elem = data;
+
+    g_assert(elem);
+
+    memory_region_unref(&elem->mr);
+    address_space_destroy(&elem->as);
+
+    g_free(elem);
+}
+
+void remote_iommu_del_device(PCIDevice *pci_dev)
+{
+    int pci_bdf;
+
+    if (!remote_iommu_table.elem_by_bdf || !pci_dev) {
+        return;
+    }
+
+    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
+
+    qemu_mutex_lock(&remote_iommu_table.lock);
+    g_hash_table_remove(remote_iommu_table.elem_by_bdf, INT2VOIDP(pci_bdf));
+    qemu_mutex_unlock(&remote_iommu_table.lock);
+}
+
+void remote_configure_iommu(PCIBus *pci_bus)
+{
+    if (!remote_iommu_table.elem_by_bdf) {
+        remote_iommu_table.elem_by_bdf =
+            g_hash_table_new_full(NULL, NULL, NULL, remote_iommu_del_elem);
+        qemu_mutex_init(&remote_iommu_table.lock);
+    }
+
+    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, &remote_iommu_table);
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index e7b0297a63..21694a9698 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3599,6 +3599,8 @@ F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
 F: hw/remote/vfio-user-obj.c
+F: hw/remote/iommu.c
+F: include/hw/remote/iommu.h
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index 534ac5df79..bcef83c8cc 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iommu.c'))
 remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: files('vfio-user-obj.c'))
 
 remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 13/17] vfio-user: handle DMA mappings
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (11 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 12/17] vfio-user: IOMMU support for remote device Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 14/17] vfio-user: handle PCI BAR accesses Jagannathan Raman
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Define and register callbacks to manage the RAM regions used for
device DMA

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/machine.c       |  5 ++++
 hw/remote/vfio-user-obj.c | 55 +++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 3 files changed, 62 insertions(+)

diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 70178b222c..67801c3a4d 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -22,6 +22,7 @@
 #include "hw/pci/pci_host.h"
 #include "hw/remote/iohub.h"
 #include "hw/qdev-core.h"
+#include "hw/remote/iommu.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -51,6 +52,10 @@ static void remote_machine_init(MachineState *machine)
 
     pci_host = PCI_HOST_BRIDGE(rem_host);
 
+    if (s->vfio_user) {
+        remote_configure_iommu(pci_host->bus);
+    }
+
     remote_iohub_init(&s->iohub);
 
     pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 7b863dec4f..425e45e8b2 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -276,6 +276,54 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
     return count;
 }
 
+static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    AddressSpace *dma_as = NULL;
+    MemoryRegion *subregion = NULL;
+    g_autofree char *name = NULL;
+    struct iovec *iov = &info->iova;
+
+    if (!info->vaddr) {
+        return;
+    }
+
+    name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
+                           (uint64_t)info->vaddr);
+
+    subregion = g_new0(MemoryRegion, 1);
+
+    memory_region_init_ram_ptr(subregion, NULL, name,
+                               iov->iov_len, info->vaddr);
+
+    dma_as = pci_device_iommu_address_space(o->pci_dev);
+
+    memory_region_add_subregion(dma_as->root, (hwaddr)iov->iov_base, subregion);
+
+    trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
+}
+
+static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    AddressSpace *dma_as = NULL;
+    MemoryRegion *mr = NULL;
+    ram_addr_t offset;
+
+    mr = memory_region_from_host(info->vaddr, &offset);
+    if (!mr) {
+        return;
+    }
+
+    dma_as = pci_device_iommu_address_space(o->pci_dev);
+
+    memory_region_del_subregion(dma_as->root, mr);
+
+    object_unparent((OBJECT(mr)));
+
+    trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -365,6 +413,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         goto fail;
     }
 
+    ret = vfu_setup_device_dma(o->vfu_ctx, &dma_register, &dma_unregister);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to setup DMA handlers for %s",
+                   o->device);
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 2ef7884346..f945c7e33b 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
 vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
+vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 14/17] vfio-user: handle PCI BAR accesses
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (12 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 13/17] vfio-user: handle DMA mappings Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-29 12:50   ` Stefan Hajnoczi
  2022-03-25 19:19 ` [PATCH v7 15/17] vfio-user: handle device interrupts Jagannathan Raman
                   ` (2 subsequent siblings)
  16 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Determine the BARs used by the PCI device and register handlers to
manage the access to the same.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 include/exec/memory.h           |   3 +
 hw/remote/vfio-user-obj.c       | 167 ++++++++++++++++++++++++++++++++
 softmmu/physmem.c               |   4 +-
 tests/qtest/fuzz/generic_fuzz.c |   9 +-
 hw/remote/trace-events          |   3 +
 5 files changed, 180 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 4d5997e6bb..4b061e62d5 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2810,6 +2810,9 @@ MemTxResult address_space_write_cached_slow(MemoryRegionCache *cache,
                                             hwaddr addr, const void *buf,
                                             hwaddr len);
 
+int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
+bool prepare_mmio_access(MemoryRegion *mr);
+
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
     if (is_write) {
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 425e45e8b2..6910c16243 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -53,6 +53,7 @@
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
 #include "qemu/timer.h"
+#include "exec/memory.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -324,6 +325,170 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
     trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
 }
 
+static size_t vfu_object_bar_rw(PCIDevice *pci_dev, int pci_bar,
+                                hwaddr offset, char * const buf,
+                                hwaddr len, const bool is_write)
+{
+    uint8_t *ptr = (uint8_t *)buf;
+    uint8_t *ram_ptr = NULL;
+    bool release_lock = false;
+    MemoryRegionSection section = { 0 };
+    MemoryRegion *mr = NULL;
+    int access_size;
+    hwaddr size = 0;
+    MemTxResult result;
+    uint64_t val;
+
+    section = memory_region_find(pci_dev->io_regions[pci_bar].memory,
+                                 offset, len);
+
+    if (!section.mr) {
+        return 0;
+    }
+
+    mr = section.mr;
+    offset = section.offset_within_region;
+
+    if (is_write && mr->readonly) {
+        warn_report("vfu: attempting to write to readonly region in "
+                    "bar %d - [0x%"PRIx64" - 0x%"PRIx64"]",
+                    pci_bar, offset, (offset + len));
+        return 0;
+    }
+
+    if (memory_access_is_direct(mr, is_write)) {
+        /**
+         * Some devices expose a PCI expansion ROM, which could be buffer
+         * based as compared to other regions which are primarily based on
+         * MemoryRegionOps. memory_region_find() would already check
+         * for buffer overflow, we don't need to repeat it here.
+         */
+        ram_ptr = memory_region_get_ram_ptr(mr);
+
+        size = len;
+
+        if (is_write) {
+            memcpy((ram_ptr + offset), buf, size);
+        } else {
+            memcpy(buf, (ram_ptr + offset), size);
+        }
+
+        goto exit;
+    }
+
+    while (len > 0) {
+        /**
+         * The read/write logic used below is similar to the ones in
+         * flatview_read/write_continue()
+         */
+        release_lock = prepare_mmio_access(mr);
+
+        access_size = memory_access_size(mr, len, offset);
+
+        if (is_write) {
+            val = ldn_he_p(ptr, access_size);
+
+            result = memory_region_dispatch_write(mr, offset, val,
+                                                  size_memop(access_size),
+                                                  MEMTXATTRS_UNSPECIFIED);
+        } else {
+            result = memory_region_dispatch_read(mr, offset, &val,
+                                                 size_memop(access_size),
+                                                 MEMTXATTRS_UNSPECIFIED);
+
+            stn_he_p(ptr, access_size, val);
+        }
+
+        if (release_lock) {
+            qemu_mutex_unlock_iothread();
+            release_lock = false;
+        }
+
+        if (result != MEMTX_OK) {
+            warn_report("vfu: failed to %s 0x%"PRIx64"",
+                        is_write ? "write to" : "read from",
+                        (offset - size));
+
+            goto exit;
+        }
+
+        len -= access_size;
+        size += access_size;
+        ptr += access_size;
+        offset += access_size;
+    }
+
+exit:
+    memory_region_unref(mr);
+
+    return size;
+}
+
+/**
+ * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
+ *
+ * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
+ * define vfu_object_bar2_handler
+ */
+#define VFU_OBJECT_BAR_HANDLER(BAR_NO)                                         \
+    static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,        \
+                                        char * const buf, size_t count,        \
+                                        loff_t offset, const bool is_write)    \
+    {                                                                          \
+        VfuObject *o = vfu_get_private(vfu_ctx);                               \
+        PCIDevice *pci_dev = o->pci_dev;                                       \
+                                                                               \
+        return vfu_object_bar_rw(pci_dev, BAR_NO, offset,                      \
+                                 buf, count, is_write);                        \
+    }                                                                          \
+
+VFU_OBJECT_BAR_HANDLER(0)
+VFU_OBJECT_BAR_HANDLER(1)
+VFU_OBJECT_BAR_HANDLER(2)
+VFU_OBJECT_BAR_HANDLER(3)
+VFU_OBJECT_BAR_HANDLER(4)
+VFU_OBJECT_BAR_HANDLER(5)
+VFU_OBJECT_BAR_HANDLER(6)
+
+static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
+    &vfu_object_bar0_handler,
+    &vfu_object_bar1_handler,
+    &vfu_object_bar2_handler,
+    &vfu_object_bar3_handler,
+    &vfu_object_bar4_handler,
+    &vfu_object_bar5_handler,
+    &vfu_object_bar6_handler,
+};
+
+/**
+ * vfu_object_register_bars - Identify active BAR regions of pdev and setup
+ *                            callbacks to handle read/write accesses
+ */
+static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
+{
+    int flags = VFU_REGION_FLAG_RW;
+    int i;
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        if (!pdev->io_regions[i].size) {
+            continue;
+        }
+
+        if ((i == VFU_PCI_DEV_ROM_REGION_IDX) ||
+            pdev->io_regions[i].memory->readonly) {
+            flags &= ~VFU_REGION_FLAG_WRITE;
+        }
+
+        vfu_setup_region(vfu_ctx, VFU_PCI_DEV_BAR0_REGION_IDX + i,
+                         (size_t)pdev->io_regions[i].size,
+                         vfu_object_bar_handlers[i],
+                         flags, NULL, 0, -1, 0);
+
+        trace_vfu_bar_register(i, pdev->io_regions[i].addr,
+                               pdev->io_regions[i].size);
+    }
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -420,6 +585,8 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         goto fail;
     }
 
+    vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 4e1b27a20e..f9a68d1fe5 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2719,7 +2719,7 @@ void memory_region_flush_rom_device(MemoryRegion *mr, hwaddr addr, hwaddr size)
     invalidate_and_set_dirty(mr, addr, size);
 }
 
-static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
+int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
 {
     unsigned access_size_max = mr->ops->valid.max_access_size;
 
@@ -2746,7 +2746,7 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
     return l;
 }
 
-static bool prepare_mmio_access(MemoryRegion *mr)
+bool prepare_mmio_access(MemoryRegion *mr)
 {
     bool release_lock = false;
 
diff --git a/tests/qtest/fuzz/generic_fuzz.c b/tests/qtest/fuzz/generic_fuzz.c
index dd7e25851c..77547fc1d8 100644
--- a/tests/qtest/fuzz/generic_fuzz.c
+++ b/tests/qtest/fuzz/generic_fuzz.c
@@ -144,7 +144,7 @@ static void *pattern_alloc(pattern p, size_t len)
     return buf;
 }
 
-static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
+static int fuzz_memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
 {
     unsigned access_size_max = mr->ops->valid.max_access_size;
 
@@ -242,11 +242,12 @@ void fuzz_dma_read_cb(size_t addr, size_t len, MemoryRegion *mr)
 
         /*
          *  If mr1 isn't RAM, address_space_translate doesn't update l. Use
-         *  memory_access_size to identify the number of bytes that it is safe
-         *  to write without accidentally writing to another MemoryRegion.
+         *  fuzz_memory_access_size to identify the number of bytes that it
+         *  is safe to write without accidentally writing to another
+         *  MemoryRegion.
          */
         if (!memory_region_is_ram(mr1)) {
-            l = memory_access_size(mr1, l, addr1);
+            l = fuzz_memory_access_size(mr1, l, addr1);
         }
         if (memory_region_is_ram(mr1) ||
             memory_region_is_romd(mr1) ||
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f945c7e33b..847d50d88f 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -9,3 +9,6 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
 vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
+vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
+vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
+vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 15/17] vfio-user: handle device interrupts
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (13 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 14/17] vfio-user: handle PCI BAR accesses Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 16/17] vfio-user: handle reset of remote device Jagannathan Raman
  2022-03-25 19:19 ` [PATCH v7 17/17] vfio-user: avocado tests for vfio-user Jagannathan Raman
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Forward remote device's interrupts to the guest

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 include/hw/pci/pci.h              |  10 +++
 include/hw/remote/vfio-user-obj.h |   6 ++
 hw/pci/msi.c                      |  11 +++-
 hw/pci/msix.c                     |  10 ++-
 hw/remote/machine.c               |  14 +++--
 hw/remote/vfio-user-obj.c         | 101 ++++++++++++++++++++++++++++++
 stubs/vfio-user-obj.c             |   6 ++
 MAINTAINERS                       |   1 +
 hw/remote/trace-events            |   1 +
 stubs/meson.build                 |   1 +
 10 files changed, 155 insertions(+), 6 deletions(-)
 create mode 100644 include/hw/remote/vfio-user-obj.h
 create mode 100644 stubs/vfio-user-obj.c

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 3a32b8dd40..fb8a05ae25 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -16,6 +16,7 @@ extern bool pci_available;
 #define PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
 #define PCI_FUNC(devfn)         ((devfn) & 0x07)
 #define PCI_BUILD_BDF(bus, devfn)     ((bus << 8) | (devfn))
+#define PCI_BDF_TO_DEVFN(x)     ((x) & 0xff)
 #define PCI_BUS_MAX             256
 #define PCI_DEVFN_MAX           256
 #define PCI_SLOT_MAX            32
@@ -126,6 +127,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
 typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
                                 pcibus_t addr, pcibus_t size, int type);
 typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
+typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
+typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
 
 typedef struct PCIIORegion {
     pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
@@ -321,6 +324,13 @@ struct PCIDevice {
     /* Space to store MSIX table & pending bit array */
     uint8_t *msix_table;
     uint8_t *msix_pba;
+
+    /* May be used by INTx or MSI during interrupt notification */
+    void *irq_opaque;
+
+    PCIMSINotify *msi_notify;
+    PCIMSIxNotify *msix_notify;
+
     /* MemoryRegion container for msix exclusive BAR setup */
     MemoryRegion msix_exclusive_bar;
     /* Memory Regions for MSIX table and pending bit entries. */
diff --git a/include/hw/remote/vfio-user-obj.h b/include/hw/remote/vfio-user-obj.h
new file mode 100644
index 0000000000..87ab78b875
--- /dev/null
+++ b/include/hw/remote/vfio-user-obj.h
@@ -0,0 +1,6 @@
+#ifndef VFIO_USER_OBJ_H
+#define VFIO_USER_OBJ_H
+
+void vfu_object_set_bus_irq(PCIBus *pci_bus);
+
+#endif
diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index 47d2b0f33c..a161a5380b 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -51,6 +51,8 @@
  */
 bool msi_nonbroken;
 
+static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
     dev->msi_cap = config_offset;
     dev->cap_present |= QEMU_PCI_CAP_MSI;
 
+    dev->msi_notify = pci_msi_notify;
+
     pci_set_word(dev->config + msi_flags_off(dev), flags);
     pci_set_word(dev->wmask + msi_flags_off(dev),
                  PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
@@ -307,7 +311,7 @@ bool msi_is_masked(const PCIDevice *dev, unsigned int vector)
     return mask & (1U << vector);
 }
 
-void msi_notify(PCIDevice *dev, unsigned int vector)
+static void pci_msi_notify(PCIDevice *dev, unsigned int vector)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
@@ -332,6 +336,11 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
     msi_send_message(dev, msg);
 }
 
+void msi_notify(PCIDevice *dev, unsigned int vector)
+{
+    dev->msi_notify(dev, vector);
+}
+
 void msi_send_message(PCIDevice *dev, MSIMessage msg)
 {
     MemTxAttrs attrs = {};
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index ae9331cd0b..fbf88654b3 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -31,6 +31,8 @@
 #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
 #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
 
+static void pci_msix_notify(PCIDevice *dev, unsigned vector);
+
 MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
 {
     uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
@@ -334,6 +336,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
     dev->msix_table = g_malloc0(table_size);
     dev->msix_pba = g_malloc0(pba_size);
     dev->msix_entry_used = g_malloc0(nentries * sizeof *dev->msix_entry_used);
+    dev->msix_notify = pci_msix_notify;
 
     msix_mask_all(dev, nentries);
 
@@ -485,7 +488,7 @@ int msix_enabled(PCIDevice *dev)
 }
 
 /* Send an MSI-X message */
-void msix_notify(PCIDevice *dev, unsigned vector)
+static void pci_msix_notify(PCIDevice *dev, unsigned vector)
 {
     MSIMessage msg;
 
@@ -503,6 +506,11 @@ void msix_notify(PCIDevice *dev, unsigned vector)
     msi_send_message(dev, msg);
 }
 
+void msix_notify(PCIDevice *dev, unsigned vector)
+{
+    dev->msix_notify(dev, vector);
+}
+
 void msix_reset(PCIDevice *dev)
 {
     if (!msix_present(dev)) {
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 67801c3a4d..95f97ab268 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -23,6 +23,8 @@
 #include "hw/remote/iohub.h"
 #include "hw/qdev-core.h"
 #include "hw/remote/iommu.h"
+#include "hw/remote/vfio-user-obj.h"
+#include "hw/pci/msi.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -53,13 +55,17 @@ static void remote_machine_init(MachineState *machine)
     pci_host = PCI_HOST_BRIDGE(rem_host);
 
     if (s->vfio_user) {
+        msi_nonbroken = true;
+
         remote_configure_iommu(pci_host->bus);
-    }
 
-    remote_iohub_init(&s->iohub);
+        vfu_object_set_bus_irq(pci_host->bus);
+    } else {
+        remote_iohub_init(&s->iohub);
 
-    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
-                 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+        pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
+                     &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+    }
 
     qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 6910c16243..19d075a9dd 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -54,6 +54,9 @@
 #include "hw/pci/pci.h"
 #include "qemu/timer.h"
 #include "exec/memory.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+#include "hw/remote/vfio-user-obj.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -489,6 +492,95 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
     }
 }
 
+static int vfu_object_map_irq(PCIDevice *pci_dev, int intx)
+{
+    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
+                                pci_dev->devfn);
+
+    return pci_bdf;
+}
+
+static void vfu_object_set_irq(void *opaque, int pirq, int level)
+{
+    PCIBus *pci_bus = opaque;
+    PCIDevice *pci_dev = NULL;
+    vfu_ctx_t *vfu_ctx = NULL;
+    int pci_bus_num, devfn;
+
+    if (level) {
+        pci_bus_num = PCI_BUS_NUM(pirq);
+        devfn = PCI_BDF_TO_DEVFN(pirq);
+
+        /*
+         * pci_find_device() performs at O(1) if the device is attached
+         * to the root PCI bus. Whereas, if the device is attached to a
+         * secondary PCI bus (such as when a root port is involved),
+         * finding the parent PCI bus could take O(n)
+         */
+        pci_dev = pci_find_device(pci_bus, pci_bus_num, devfn);
+
+        vfu_ctx = pci_dev->irq_opaque;
+
+        g_assert(vfu_ctx);
+
+        vfu_irq_trigger(vfu_ctx, 0);
+    }
+}
+
+static void vfu_object_msi_notify(PCIDevice *pci_dev, unsigned vector)
+{
+    vfu_ctx_t *vfu_ctx = pci_dev->irq_opaque;
+    unsigned nr_vectors = 0;
+
+    g_assert(vfu_ctx);
+
+    nr_vectors = msix_nr_vectors_allocated(pci_dev);
+    if (!nr_vectors) {
+        nr_vectors = msi_nr_vectors_allocated(pci_dev);
+    }
+
+    g_assert(vector < nr_vectors);
+
+    vfu_irq_trigger(vfu_ctx, vector);
+}
+
+static int vfu_object_setup_irqs(VfuObject *o, PCIDevice *pci_dev)
+{
+    vfu_ctx_t *vfu_ctx = o->vfu_ctx;
+    int ret;
+
+    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = 0;
+    if (msix_nr_vectors_allocated(pci_dev)) {
+        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSIX_IRQ,
+                                       msix_nr_vectors_allocated(pci_dev));
+
+        pci_dev->msix_notify = vfu_object_msi_notify;
+    } else if (msi_nr_vectors_allocated(pci_dev)) {
+        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSI_IRQ,
+                                       msi_nr_vectors_allocated(pci_dev));
+
+        pci_dev->msi_notify = vfu_object_msi_notify;
+    }
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    pci_dev->irq_opaque = vfu_ctx;
+
+    return 0;
+}
+
+void vfu_object_set_bus_irq(PCIBus *pci_bus)
+{
+    pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, pci_bus, 1);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -587,6 +679,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 
     vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
 
+    ret = vfu_object_setup_irqs(o, o->pci_dev);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to setup interrupts for %s",
+                   o->device);
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
@@ -612,6 +711,7 @@ fail:
         o->unplug_blocker = NULL;
     }
     if (o->pci_dev) {
+        o->pci_dev->irq_opaque = NULL;
         object_unref(OBJECT(o->pci_dev));
         o->pci_dev = NULL;
     }
@@ -671,6 +771,7 @@ static void vfu_object_finalize(Object *obj)
     }
 
     if (o->pci_dev) {
+        o->pci_dev->irq_opaque = NULL;
         object_unref(OBJECT(o->pci_dev));
         o->pci_dev = NULL;
     }
diff --git a/stubs/vfio-user-obj.c b/stubs/vfio-user-obj.c
new file mode 100644
index 0000000000..79100d768e
--- /dev/null
+++ b/stubs/vfio-user-obj.c
@@ -0,0 +1,6 @@
+#include "qemu/osdep.h"
+#include "hw/remote/vfio-user-obj.h"
+
+void vfu_object_set_bus_irq(PCIBus *pci_bus)
+{
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index 21694a9698..d07f2a0985 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3599,6 +3599,7 @@ F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
 F: hw/remote/vfio-user-obj.c
+F: include/hw/remote/vfio-user-obj.h
 F: hw/remote/iommu.c
 F: include/hw/remote/iommu.h
 
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 847d50d88f..c167b3c7a5 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -12,3 +12,4 @@ vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
 vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
 vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
 vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
+vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
diff --git a/stubs/meson.build b/stubs/meson.build
index 6f80fec761..d8f3fd5c44 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -60,3 +60,4 @@ if have_system
 else
   stub_ss.add(files('qdev.c'))
 endif
+stub_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_false: files('vfio-user-obj.c'))
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 16/17] vfio-user: handle reset of remote device
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (14 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 15/17] vfio-user: handle device interrupts Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  2022-03-29 14:46   ` Stefan Hajnoczi
  2022-03-25 19:19 ` [PATCH v7 17/17] vfio-user: avocado tests for vfio-user Jagannathan Raman
  16 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Adds handler to reset a remote device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 19d075a9dd..65e56d39fe 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -581,6 +581,20 @@ void vfu_object_set_bus_irq(PCIBus *pci_bus)
     pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, pci_bus, 1);
 }
 
+static int vfu_object_device_reset(vfu_ctx_t *vfu_ctx, vfu_reset_type_t type)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    /* vfu_object_ctx_run() handles lost connection */
+    if (type == VFU_RESET_LOST_CONN) {
+        return 0;
+    }
+
+    qdev_reset_all(DEVICE(o->pci_dev));
+
+    return 0;
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -686,6 +700,12 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         goto fail;
     }
 
+    ret = vfu_setup_device_reset_cb(o->vfu_ctx, &vfu_object_device_reset);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to setup reset callback");
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v7 17/17] vfio-user: avocado tests for vfio-user
  2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
                   ` (15 preceding siblings ...)
  2022-03-25 19:19 ` [PATCH v7 16/17] vfio-user: handle reset of remote device Jagannathan Raman
@ 2022-03-25 19:19 ` Jagannathan Raman
  16 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2022-03-25 19:19 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Avocado tests for libvfio-user in QEMU - tests startup,
hotplug and migration of the server object

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 MAINTAINERS                |   1 +
 tests/avocado/vfio-user.py | 164 +++++++++++++++++++++++++++++++++++++
 2 files changed, 165 insertions(+)
 create mode 100644 tests/avocado/vfio-user.py

diff --git a/MAINTAINERS b/MAINTAINERS
index d07f2a0985..f165281796 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3602,6 +3602,7 @@ F: hw/remote/vfio-user-obj.c
 F: include/hw/remote/vfio-user-obj.h
 F: hw/remote/iommu.c
 F: include/hw/remote/iommu.h
+F: tests/avocado/vfio-user.py
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/tests/avocado/vfio-user.py b/tests/avocado/vfio-user.py
new file mode 100644
index 0000000000..ced304d770
--- /dev/null
+++ b/tests/avocado/vfio-user.py
@@ -0,0 +1,164 @@
+# vfio-user protocol sanity test
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+
+import os
+import socket
+import uuid
+
+from avocado_qemu import QemuSystemTest
+from avocado_qemu import wait_for_console_pattern
+from avocado_qemu import exec_command
+from avocado_qemu import exec_command_and_wait_for_pattern
+
+from avocado.utils import network
+from avocado.utils import wait
+
+class VfioUser(QemuSystemTest):
+    """
+    :avocado: tags=vfiouser
+    """
+    KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+    timeout = 20
+
+    def _get_free_port(self):
+        port = network.find_free_port()
+        if port is None:
+            self.cancel('Failed to find a free port')
+        return port
+
+    def validate_vm_launch(self, vm):
+        wait_for_console_pattern(self, 'as init process',
+                                 'Kernel panic - not syncing', vm=vm)
+        exec_command(self, 'mount -t sysfs sysfs /sys', vm=vm)
+        exec_command_and_wait_for_pattern(self,
+                                          'cat /sys/bus/pci/devices/*/uevent',
+                                          'PCI_ID=1000:0060', vm=vm)
+
+    def launch_server_startup(self, socket, *opts):
+        server_vm = self.get_vm()
+        server_vm.add_args('-machine', 'x-remote,vfio-user=on')
+        server_vm.add_args('-nodefaults')
+        server_vm.add_args('-device', 'megasas,id=sas1')
+        server_vm.add_args('-object', 'x-vfio-user-server,id=vfioobj1,'
+                           'type=unix,path='+socket+',device=sas1')
+        for opt in opts:
+            server_vm.add_args(opt)
+        server_vm.launch()
+        return server_vm
+
+    def launch_server_hotplug(self, socket):
+        server_vm = self.get_vm()
+        server_vm.add_args('-machine', 'x-remote,vfio-user=on')
+        server_vm.add_args('-nodefaults')
+        server_vm.launch()
+        server_vm.qmp('device_add', args_dict=None, conv_keys=None,
+                      driver='megasas', id='sas1')
+        obj_add_opts = {'qom-type': 'x-vfio-user-server',
+                        'id': 'vfioobj', 'device': 'sas1',
+                        'socket': {'type': 'unix', 'path': socket}}
+        server_vm.qmp('object-add', args_dict=obj_add_opts)
+        return server_vm
+
+    def launch_client(self, kernel_path, initrd_path, kernel_command_line,
+                      machine_type, socket, *opts):
+        client_vm = self.get_vm()
+        client_vm.set_console()
+        client_vm.add_args('-machine', machine_type)
+        client_vm.add_args('-accel', 'kvm')
+        client_vm.add_args('-cpu', 'host')
+        client_vm.add_args('-object',
+                           'memory-backend-memfd,id=sysmem-file,size=2G')
+        client_vm.add_args('--numa', 'node,memdev=sysmem-file')
+        client_vm.add_args('-m', '2048')
+        client_vm.add_args('-kernel', kernel_path,
+                           '-initrd', initrd_path,
+                           '-append', kernel_command_line)
+        client_vm.add_args('-device',
+                           'vfio-user-pci,socket='+socket)
+        for opt in opts:
+            client_vm.add_args(opt)
+        client_vm.launch()
+        return client_vm
+
+    def do_test_startup(self, kernel_url, initrd_url, kernel_command_line,
+                machine_type):
+        self.require_accelerator('kvm')
+
+        kernel_path = self.fetch_asset(kernel_url)
+        initrd_path = self.fetch_asset(initrd_url)
+        socket = os.path.join('/tmp', str(uuid.uuid4()))
+        if os.path.exists(socket):
+            os.remove(socket)
+        self.launch_server_startup(socket)
+        client = self.launch_client(kernel_path, initrd_path,
+                                    kernel_command_line, machine_type, socket)
+        self.validate_vm_launch(client)
+
+    def do_test_hotplug(self, kernel_url, initrd_url, kernel_command_line,
+                machine_type):
+        self.require_accelerator('kvm')
+
+        kernel_path = self.fetch_asset(kernel_url)
+        initrd_path = self.fetch_asset(initrd_url)
+        socket = os.path.join('/tmp', str(uuid.uuid4()))
+        if os.path.exists(socket):
+            os.remove(socket)
+        self.launch_server_hotplug(socket)
+        client = self.launch_client(kernel_path, initrd_path,
+                                    kernel_command_line, machine_type, socket)
+        self.validate_vm_launch(client)
+
+    def test_vfio_user_x86_64(self):
+        """
+        :avocado: tags=arch:x86_64
+        :avocado: tags=distro:centos
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyS0 rdinit=/bin/bash')
+        machine_type = 'pc'
+        self.do_test_startup(kernel_url, initrd_url, kernel_command_line,
+                             machine_type)
+
+    def test_vfio_user_aarch64(self):
+        """
+        :avocado: tags=arch:aarch64
+        :avocado: tags=distro:ubuntu
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/aarch64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/aarch64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'rdinit=/bin/bash console=ttyAMA0')
+        machine_type = 'virt,gic-version=3'
+        self.do_test_startup(kernel_url, initrd_url, kernel_command_line,
+                             machine_type)
+
+    def test_vfio_user_hotplug_x86_64(self):
+        """
+        :avocado: tags=arch:x86_64
+        :avocado: tags=distro:centos
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyS0 rdinit=/bin/bash')
+        machine_type = 'pc'
+        self.do_test_hotplug(kernel_url, initrd_url, kernel_command_line,
+                             machine_type)
+
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 02/17] qdev: unplug blocker for devices
  2022-03-25 19:19 ` [PATCH v7 02/17] qdev: unplug blocker for devices Jagannathan Raman
@ 2022-03-29 10:08   ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-29 10:08 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 558 bytes --]

On Fri, Mar 25, 2022 at 03:19:31PM -0400, Jagannathan Raman wrote:
> Add blocker to prevent hot-unplug of devices
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/hw/qdev-core.h | 29 +++++++++++++++++++++++++++++
>  hw/core/qdev.c         | 24 ++++++++++++++++++++++++
>  softmmu/qdev-monitor.c |  4 ++++
>  3 files changed, 57 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 07/17] vfio-user: define vfio-user-server object
  2022-03-25 19:19 ` [PATCH v7 07/17] vfio-user: define vfio-user-server object Jagannathan Raman
@ 2022-03-29 10:21   ` Stefan Hajnoczi
  2022-03-29 14:03     ` Jag Raman
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-29 10:21 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 3211 bytes --]

On Fri, Mar 25, 2022 at 03:19:36PM -0400, Jagannathan Raman wrote:
  ##
> diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
> index 8d0fa98d33..2fcb9dada5 100644
> --- a/include/hw/remote/machine.h
> +++ b/include/hw/remote/machine.h
> @@ -26,6 +26,12 @@ struct RemoteMachineState {
>      bool vfio_user;
>  };
>  
> +struct RemoteMachineClass {
> +    MachineClass parent_class;
> +
> +    bool auto_shutdown;
> +};
> +
>  /* Used to pass to co-routine device and ioc. */
>  typedef struct RemoteCommDev {
>      PCIDevice *dev;
> @@ -33,7 +39,7 @@ typedef struct RemoteCommDev {
>  } RemoteCommDev;
>  
>  #define TYPE_REMOTE_MACHINE "x-remote-machine"
> -OBJECT_DECLARE_SIMPLE_TYPE(RemoteMachineState, REMOTE_MACHINE)
> +OBJECT_DECLARE_TYPE(RemoteMachineState, RemoteMachineClass, REMOTE_MACHINE)
>  
>  void coroutine_fn mpqemu_remote_msg_loop_co(void *data);
>  
> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
> index a9a75e170f..70178b222c 100644
> --- a/hw/remote/machine.c
> +++ b/hw/remote/machine.c
> @@ -78,25 +78,48 @@ static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
>      s->vfio_user = value;
>  }
>  
> +static bool remote_machine_get_auto_shutdown(Object *obj, Error **errp)
> +{
> +    RemoteMachineClass *rmc = REMOTE_MACHINE_GET_CLASS(obj);
> +
> +    return rmc->auto_shutdown;
> +}
> +
> +static void remote_machine_set_auto_shutdown(Object *obj, bool value,
> +                                             Error **errp)
> +{
> +    RemoteMachineClass *rmc = REMOTE_MACHINE_GET_CLASS(obj);
> +
> +    rmc->auto_shutdown = value;
> +}
> +
>  static void remote_machine_class_init(ObjectClass *oc, void *data)
>  {
>      MachineClass *mc = MACHINE_CLASS(oc);
> +    RemoteMachineClass *rmc = REMOTE_MACHINE_CLASS(oc);
>      HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
>  
>      mc->init = remote_machine_init;
>      mc->desc = "Experimental remote machine";
>  
> +    rmc->auto_shutdown = true;
> +
>      hc->unplug = qdev_simple_device_unplug_cb;
>  
>      object_class_property_add_bool(oc, "vfio-user",
>                                     remote_machine_get_vfio_user,
>                                     remote_machine_set_vfio_user);
> +
> +    object_class_property_add_bool(oc, "auto-shutdown",
> +                                   remote_machine_get_auto_shutdown,
> +                                   remote_machine_set_auto_shutdown);
>  }
>  
>  static const TypeInfo remote_machine = {
>      .name = TYPE_REMOTE_MACHINE,
>      .parent = TYPE_MACHINE,
>      .instance_size = sizeof(RemoteMachineState),
> +    .class_size = sizeof(RemoteMachineClass),

Why is ->auto_shutdown a global RemoteMachineClass field instead of a
RemoteMachineState instance field?

The getter/setter functions receive an object instance so they could
access the value in RemoteMachineState instead of RemoteMachineClass.
Moving the field to RemoteMachineState would allow multiple
RemoteMachineState instances with different ->auto_shutdown values in
case QEMU ever supports running multiple machines within a single
process.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 08/17] vfio-user: instantiate vfio-user context
  2022-03-25 19:19 ` [PATCH v7 08/17] vfio-user: instantiate vfio-user context Jagannathan Raman
@ 2022-03-29 10:26   ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-29 10:26 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 495 bytes --]

On Fri, Mar 25, 2022 at 03:19:37PM -0400, Jagannathan Raman wrote:
> create a context with the vfio-user library to run a PCI device
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 82 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 82 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-25 19:19 ` [PATCH v7 12/17] vfio-user: IOMMU support for remote device Jagannathan Raman
@ 2022-03-29 12:35   ` Stefan Hajnoczi
  2022-03-29 14:12     ` Jag Raman
  2022-04-13 14:25   ` Igor Mammedov
  1 sibling, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-29 12:35 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 4733 bytes --]

On Fri, Mar 25, 2022 at 03:19:41PM -0400, Jagannathan Raman wrote:
> Assign separate address space for each device in the remote processes.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/hw/remote/iommu.h | 18 ++++++++
>  hw/remote/iommu.c         | 95 +++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS               |  2 +
>  hw/remote/meson.build     |  1 +
>  4 files changed, 116 insertions(+)
>  create mode 100644 include/hw/remote/iommu.h
>  create mode 100644 hw/remote/iommu.c
> 
> diff --git a/include/hw/remote/iommu.h b/include/hw/remote/iommu.h
> new file mode 100644
> index 0000000000..8f850400f1
> --- /dev/null
> +++ b/include/hw/remote/iommu.h
> @@ -0,0 +1,18 @@
> +/**
> + * Copyright © 2022 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef REMOTE_IOMMU_H
> +#define REMOTE_IOMMU_H
> +
> +#include "hw/pci/pci_bus.h"
> +
> +void remote_configure_iommu(PCIBus *pci_bus);
> +
> +void remote_iommu_del_device(PCIDevice *pci_dev);
> +
> +#endif
> diff --git a/hw/remote/iommu.c b/hw/remote/iommu.c
> new file mode 100644
> index 0000000000..13f329b45d
> --- /dev/null
> +++ b/hw/remote/iommu.c
> @@ -0,0 +1,95 @@
> +/**
> + * IOMMU for remote device
> + *
> + * Copyright © 2022 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/remote/iommu.h"
> +#include "hw/pci/pci_bus.h"
> +#include "hw/pci/pci.h"
> +#include "exec/memory.h"
> +#include "exec/address-spaces.h"
> +#include "trace.h"
> +
> +struct RemoteIommuElem {
> +    AddressSpace  as;
> +    MemoryRegion  mr;
> +};
> +
> +struct RemoteIommuTable {
> +    QemuMutex lock;
> +    GHashTable *elem_by_bdf;
> +} remote_iommu_table;
> +
> +#define INT2VOIDP(i) (void *)(uintptr_t)(i)
> +
> +static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
> +                                              void *opaque, int devfn)
> +{
> +    struct RemoteIommuTable *iommu_table = opaque;
> +    struct RemoteIommuElem *elem = NULL;
> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_bus), devfn);
> +
> +    elem = g_hash_table_lookup(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf));

Why is a lock needed around g_hash_table_insert() below but no lock is
held around g_hash_table_lookup()?

Insertion isn't atomic because lookup and insert are separate operations
and they are not done under a single lock.

> +
> +    if (!elem) {
> +        g_autofree char *mr_name = g_strdup_printf("vfu-ram-%d", pci_bdf);
> +        g_autofree char *as_name = g_strdup_printf("vfu-as-%d", pci_bdf);
> +
> +        elem = g_malloc0(sizeof(struct RemoteIommuElem));
> +
> +        memory_region_init(&elem->mr, NULL, mr_name, UINT64_MAX);
> +        address_space_init(&elem->as, &elem->mr, as_name);
> +
> +        qemu_mutex_lock(&iommu_table->lock);
> +        g_hash_table_insert(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf), elem);
> +        qemu_mutex_unlock(&iommu_table->lock);
> +    }
> +
> +    return &elem->as;
> +}
> +
> +static void remote_iommu_del_elem(gpointer data)
> +{
> +    struct RemoteIommuElem *elem = data;
> +
> +    g_assert(elem);
> +
> +    memory_region_unref(&elem->mr);
> +    address_space_destroy(&elem->as);
> +
> +    g_free(elem);
> +}
> +
> +void remote_iommu_del_device(PCIDevice *pci_dev)
> +{
> +    int pci_bdf;
> +
> +    if (!remote_iommu_table.elem_by_bdf || !pci_dev) {
> +        return;
> +    }
> +
> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
> +
> +    qemu_mutex_lock(&remote_iommu_table.lock);
> +    g_hash_table_remove(remote_iommu_table.elem_by_bdf, INT2VOIDP(pci_bdf));
> +    qemu_mutex_unlock(&remote_iommu_table.lock);
> +}
> +
> +void remote_configure_iommu(PCIBus *pci_bus)
> +{
> +    if (!remote_iommu_table.elem_by_bdf) {
> +        remote_iommu_table.elem_by_bdf =
> +            g_hash_table_new_full(NULL, NULL, NULL, remote_iommu_del_elem);
> +        qemu_mutex_init(&remote_iommu_table.lock);
> +    }
> +
> +    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, &remote_iommu_table);

Why is remote_iommu_table global? It could be per-PCIBus and indexed by
just devfn instead of the full BDF.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 14/17] vfio-user: handle PCI BAR accesses
  2022-03-25 19:19 ` [PATCH v7 14/17] vfio-user: handle PCI BAR accesses Jagannathan Raman
@ 2022-03-29 12:50   ` Stefan Hajnoczi
  2022-03-29 15:51     ` Jag Raman
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-29 12:50 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 2418 bytes --]

On Fri, Mar 25, 2022 at 03:19:43PM -0400, Jagannathan Raman wrote:
> @@ -324,6 +325,170 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
>      trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
>  }
>  
> +static size_t vfu_object_bar_rw(PCIDevice *pci_dev, int pci_bar,
> +                                hwaddr offset, char * const buf,
> +                                hwaddr len, const bool is_write)
> +{
> +    uint8_t *ptr = (uint8_t *)buf;
> +    uint8_t *ram_ptr = NULL;
> +    bool release_lock = false;
> +    MemoryRegionSection section = { 0 };
> +    MemoryRegion *mr = NULL;
> +    int access_size;
> +    hwaddr size = 0;
> +    MemTxResult result;
> +    uint64_t val;
> +
> +    section = memory_region_find(pci_dev->io_regions[pci_bar].memory,
> +                                 offset, len);
> +
> +    if (!section.mr) {
> +        return 0;
> +    }
> +
> +    mr = section.mr;
> +    offset = section.offset_within_region;
> +
> +    if (is_write && mr->readonly) {
> +        warn_report("vfu: attempting to write to readonly region in "
> +                    "bar %d - [0x%"PRIx64" - 0x%"PRIx64"]",
> +                    pci_bar, offset, (offset + len));
> +        return 0;

A mr reference is leaked. The return statement can be replaced with goto
exit.

> +    }
> +
> +    if (memory_access_is_direct(mr, is_write)) {
> +        /**
> +         * Some devices expose a PCI expansion ROM, which could be buffer
> +         * based as compared to other regions which are primarily based on
> +         * MemoryRegionOps. memory_region_find() would already check
> +         * for buffer overflow, we don't need to repeat it here.
> +         */
> +        ram_ptr = memory_region_get_ram_ptr(mr);
> +
> +        size = len;

This looks like it will access beyond the end of ram_ptr when
section.size < len after memory_region_find() returns.

> +
> +        if (is_write) {
> +            memcpy((ram_ptr + offset), buf, size);
> +        } else {
> +            memcpy(buf, (ram_ptr + offset), size);
> +        }
> +
> +        goto exit;

What happens when the access spans two adjacent MemoryRegions? I think
the while (len > 0) loop is needed even in the memory_access_is_direct()
case so we perform the full access instead of truncating it.

> +    }
> +
> +    while (len > 0) {

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 07/17] vfio-user: define vfio-user-server object
  2022-03-29 10:21   ` Stefan Hajnoczi
@ 2022-03-29 14:03     ` Jag Raman
  0 siblings, 0 replies; 44+ messages in thread
From: Jag Raman @ 2022-03-29 14:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	Paolo Bonzini, eblake, dgilbert



> On Mar 29, 2022, at 6:21 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Fri, Mar 25, 2022 at 03:19:36PM -0400, Jagannathan Raman wrote:
>  ##
>> diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
>> index 8d0fa98d33..2fcb9dada5 100644
>> --- a/include/hw/remote/machine.h
>> +++ b/include/hw/remote/machine.h
>> @@ -26,6 +26,12 @@ struct RemoteMachineState {
>>     bool vfio_user;
>> };
>> 
>> +struct RemoteMachineClass {
>> +    MachineClass parent_class;
>> +
>> +    bool auto_shutdown;
>> +};
>> +
>> /* Used to pass to co-routine device and ioc. */
>> typedef struct RemoteCommDev {
>>     PCIDevice *dev;
>> @@ -33,7 +39,7 @@ typedef struct RemoteCommDev {
>> } RemoteCommDev;
>> 
>> #define TYPE_REMOTE_MACHINE "x-remote-machine"
>> -OBJECT_DECLARE_SIMPLE_TYPE(RemoteMachineState, REMOTE_MACHINE)
>> +OBJECT_DECLARE_TYPE(RemoteMachineState, RemoteMachineClass, REMOTE_MACHINE)
>> 
>> void coroutine_fn mpqemu_remote_msg_loop_co(void *data);
>> 
>> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
>> index a9a75e170f..70178b222c 100644
>> --- a/hw/remote/machine.c
>> +++ b/hw/remote/machine.c
>> @@ -78,25 +78,48 @@ static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
>>     s->vfio_user = value;
>> }
>> 
>> +static bool remote_machine_get_auto_shutdown(Object *obj, Error **errp)
>> +{
>> +    RemoteMachineClass *rmc = REMOTE_MACHINE_GET_CLASS(obj);
>> +
>> +    return rmc->auto_shutdown;
>> +}
>> +
>> +static void remote_machine_set_auto_shutdown(Object *obj, bool value,
>> +                                             Error **errp)
>> +{
>> +    RemoteMachineClass *rmc = REMOTE_MACHINE_GET_CLASS(obj);
>> +
>> +    rmc->auto_shutdown = value;
>> +}
>> +
>> static void remote_machine_class_init(ObjectClass *oc, void *data)
>> {
>>     MachineClass *mc = MACHINE_CLASS(oc);
>> +    RemoteMachineClass *rmc = REMOTE_MACHINE_CLASS(oc);
>>     HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
>> 
>>     mc->init = remote_machine_init;
>>     mc->desc = "Experimental remote machine";
>> 
>> +    rmc->auto_shutdown = true;
>> +
>>     hc->unplug = qdev_simple_device_unplug_cb;
>> 
>>     object_class_property_add_bool(oc, "vfio-user",
>>                                    remote_machine_get_vfio_user,
>>                                    remote_machine_set_vfio_user);
>> +
>> +    object_class_property_add_bool(oc, "auto-shutdown",
>> +                                   remote_machine_get_auto_shutdown,
>> +                                   remote_machine_set_auto_shutdown);
>> }
>> 
>> static const TypeInfo remote_machine = {
>>     .name = TYPE_REMOTE_MACHINE,
>>     .parent = TYPE_MACHINE,
>>     .instance_size = sizeof(RemoteMachineState),
>> +    .class_size = sizeof(RemoteMachineClass),
> 
> Why is ->auto_shutdown a global RemoteMachineClass field instead of a
> RemoteMachineState instance field?
> 
> The getter/setter functions receive an object instance so they could
> access the value in RemoteMachineState instead of RemoteMachineClass.
> Moving the field to RemoteMachineState would allow multiple
> RemoteMachineState instances with different ->auto_shutdown values in
> case QEMU ever supports running multiple machines within a single
> process.

Hi Stefan,

Presently, qemu_system_shutdown_request() shuts down the entire
system. Therefore, even if a single machine requests shutdown,
the whole system would be shutdown.

You’re suggesting a future enhancement to QEMU where multiple machines
could run in the same process, and the shutdown API could presumably
target a selected machine.

Sounds good to me, will make it a RemoteMachineState instance field.

Thank you!
--
Jag


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-29 12:35   ` Stefan Hajnoczi
@ 2022-03-29 14:12     ` Jag Raman
  2022-03-29 14:48       ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Jag Raman @ 2022-03-29 14:12 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Mar 29, 2022, at 8:35 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Fri, Mar 25, 2022 at 03:19:41PM -0400, Jagannathan Raman wrote:
>> Assign separate address space for each device in the remote processes.
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> include/hw/remote/iommu.h | 18 ++++++++
>> hw/remote/iommu.c         | 95 +++++++++++++++++++++++++++++++++++++++
>> MAINTAINERS               |  2 +
>> hw/remote/meson.build     |  1 +
>> 4 files changed, 116 insertions(+)
>> create mode 100644 include/hw/remote/iommu.h
>> create mode 100644 hw/remote/iommu.c
>> 
>> diff --git a/include/hw/remote/iommu.h b/include/hw/remote/iommu.h
>> new file mode 100644
>> index 0000000000..8f850400f1
>> --- /dev/null
>> +++ b/include/hw/remote/iommu.h
>> @@ -0,0 +1,18 @@
>> +/**
>> + * Copyright © 2022 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef REMOTE_IOMMU_H
>> +#define REMOTE_IOMMU_H
>> +
>> +#include "hw/pci/pci_bus.h"
>> +
>> +void remote_configure_iommu(PCIBus *pci_bus);
>> +
>> +void remote_iommu_del_device(PCIDevice *pci_dev);
>> +
>> +#endif
>> diff --git a/hw/remote/iommu.c b/hw/remote/iommu.c
>> new file mode 100644
>> index 0000000000..13f329b45d
>> --- /dev/null
>> +++ b/hw/remote/iommu.c
>> @@ -0,0 +1,95 @@
>> +/**
>> + * IOMMU for remote device
>> + *
>> + * Copyright © 2022 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +
>> +#include "hw/remote/iommu.h"
>> +#include "hw/pci/pci_bus.h"
>> +#include "hw/pci/pci.h"
>> +#include "exec/memory.h"
>> +#include "exec/address-spaces.h"
>> +#include "trace.h"
>> +
>> +struct RemoteIommuElem {
>> +    AddressSpace  as;
>> +    MemoryRegion  mr;
>> +};
>> +
>> +struct RemoteIommuTable {
>> +    QemuMutex lock;
>> +    GHashTable *elem_by_bdf;
>> +} remote_iommu_table;
>> +
>> +#define INT2VOIDP(i) (void *)(uintptr_t)(i)
>> +
>> +static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
>> +                                              void *opaque, int devfn)
>> +{
>> +    struct RemoteIommuTable *iommu_table = opaque;
>> +    struct RemoteIommuElem *elem = NULL;
>> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_bus), devfn);
>> +
>> +    elem = g_hash_table_lookup(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf));
> 
> Why is a lock needed around g_hash_table_insert() below but no lock is
> held around g_hash_table_lookup()?
> 
> Insertion isn't atomic because lookup and insert are separate operations
> and they are not done under a single lock.

Thanks for the catch! The lock should cover lookup also.

> 
>> +
>> +    if (!elem) {
>> +        g_autofree char *mr_name = g_strdup_printf("vfu-ram-%d", pci_bdf);
>> +        g_autofree char *as_name = g_strdup_printf("vfu-as-%d", pci_bdf);
>> +
>> +        elem = g_malloc0(sizeof(struct RemoteIommuElem));
>> +
>> +        memory_region_init(&elem->mr, NULL, mr_name, UINT64_MAX);
>> +        address_space_init(&elem->as, &elem->mr, as_name);
>> +
>> +        qemu_mutex_lock(&iommu_table->lock);
>> +        g_hash_table_insert(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf), elem);
>> +        qemu_mutex_unlock(&iommu_table->lock);
>> +    }
>> +
>> +    return &elem->as;
>> +}
>> +
>> +static void remote_iommu_del_elem(gpointer data)
>> +{
>> +    struct RemoteIommuElem *elem = data;
>> +
>> +    g_assert(elem);
>> +
>> +    memory_region_unref(&elem->mr);
>> +    address_space_destroy(&elem->as);
>> +
>> +    g_free(elem);
>> +}
>> +
>> +void remote_iommu_del_device(PCIDevice *pci_dev)
>> +{
>> +    int pci_bdf;
>> +
>> +    if (!remote_iommu_table.elem_by_bdf || !pci_dev) {
>> +        return;
>> +    }
>> +
>> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
>> +
>> +    qemu_mutex_lock(&remote_iommu_table.lock);
>> +    g_hash_table_remove(remote_iommu_table.elem_by_bdf, INT2VOIDP(pci_bdf));
>> +    qemu_mutex_unlock(&remote_iommu_table.lock);
>> +}
>> +
>> +void remote_configure_iommu(PCIBus *pci_bus)
>> +{
>> +    if (!remote_iommu_table.elem_by_bdf) {
>> +        remote_iommu_table.elem_by_bdf =
>> +            g_hash_table_new_full(NULL, NULL, NULL, remote_iommu_del_elem);
>> +        qemu_mutex_init(&remote_iommu_table.lock);
>> +    }
>> +
>> +    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, &remote_iommu_table);
> 
> Why is remote_iommu_table global? It could be per-PCIBus and indexed by
> just devfn instead of the full BDF.

It’s global because remote_iommu_del_device() needs it for cleanup.

OK, will make it a per bus property.

Thank you!
--
Jag


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 16/17] vfio-user: handle reset of remote device
  2022-03-25 19:19 ` [PATCH v7 16/17] vfio-user: handle reset of remote device Jagannathan Raman
@ 2022-03-29 14:46   ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-29 14:46 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 450 bytes --]

On Fri, Mar 25, 2022 at 03:19:45PM -0400, Jagannathan Raman wrote:
> Adds handler to reset a remote device
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-29 14:12     ` Jag Raman
@ 2022-03-29 14:48       ` Stefan Hajnoczi
  2022-03-29 19:58         ` Jag Raman
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-29 14:48 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 1433 bytes --]

On Tue, Mar 29, 2022 at 02:12:40PM +0000, Jag Raman wrote:
> > On Mar 29, 2022, at 8:35 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Fri, Mar 25, 2022 at 03:19:41PM -0400, Jagannathan Raman wrote:
> >> +void remote_iommu_del_device(PCIDevice *pci_dev)
> >> +{
> >> +    int pci_bdf;
> >> +
> >> +    if (!remote_iommu_table.elem_by_bdf || !pci_dev) {
> >> +        return;
> >> +    }
> >> +
> >> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
> >> +
> >> +    qemu_mutex_lock(&remote_iommu_table.lock);
> >> +    g_hash_table_remove(remote_iommu_table.elem_by_bdf, INT2VOIDP(pci_bdf));
> >> +    qemu_mutex_unlock(&remote_iommu_table.lock);
> >> +}
> >> +
> >> +void remote_configure_iommu(PCIBus *pci_bus)
> >> +{
> >> +    if (!remote_iommu_table.elem_by_bdf) {
> >> +        remote_iommu_table.elem_by_bdf =
> >> +            g_hash_table_new_full(NULL, NULL, NULL, remote_iommu_del_elem);
> >> +        qemu_mutex_init(&remote_iommu_table.lock);
> >> +    }
> >> +
> >> +    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, &remote_iommu_table);
> > 
> > Why is remote_iommu_table global? It could be per-PCIBus and indexed by
> > just devfn instead of the full BDF.
> 
> It’s global because remote_iommu_del_device() needs it for cleanup.

Can remote_iommu_del_device() use pci_get_bis(pci_dev)->irq_opaque to
get the per-bus table?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 14/17] vfio-user: handle PCI BAR accesses
  2022-03-29 12:50   ` Stefan Hajnoczi
@ 2022-03-29 15:51     ` Jag Raman
  2022-03-30 10:05       ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Jag Raman @ 2022-03-29 15:51 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Mar 29, 2022, at 8:50 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Fri, Mar 25, 2022 at 03:19:43PM -0400, Jagannathan Raman wrote:
>> @@ -324,6 +325,170 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
>>     trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
>> }
>> 
>> +static size_t vfu_object_bar_rw(PCIDevice *pci_dev, int pci_bar,
>> +                                hwaddr offset, char * const buf,
>> +                                hwaddr len, const bool is_write)
>> +{
>> +    uint8_t *ptr = (uint8_t *)buf;
>> +    uint8_t *ram_ptr = NULL;
>> +    bool release_lock = false;
>> +    MemoryRegionSection section = { 0 };
>> +    MemoryRegion *mr = NULL;
>> +    int access_size;
>> +    hwaddr size = 0;
>> +    MemTxResult result;
>> +    uint64_t val;
>> +
>> +    section = memory_region_find(pci_dev->io_regions[pci_bar].memory,
>> +                                 offset, len);
>> +
>> +    if (!section.mr) {
>> +        return 0;
>> +    }
>> +
>> +    mr = section.mr;
>> +    offset = section.offset_within_region;
>> +
>> +    if (is_write && mr->readonly) {
>> +        warn_report("vfu: attempting to write to readonly region in "
>> +                    "bar %d - [0x%"PRIx64" - 0x%"PRIx64"]",
>> +                    pci_bar, offset, (offset + len));
>> +        return 0;
> 
> A mr reference is leaked. The return statement can be replaced with goto
> exit.

OK.

> 
>> +    }
>> +
>> +    if (memory_access_is_direct(mr, is_write)) {
>> +        /**
>> +         * Some devices expose a PCI expansion ROM, which could be buffer
>> +         * based as compared to other regions which are primarily based on
>> +         * MemoryRegionOps. memory_region_find() would already check
>> +         * for buffer overflow, we don't need to repeat it here.
>> +         */
>> +        ram_ptr = memory_region_get_ram_ptr(mr);
>> +
>> +        size = len;
> 
> This looks like it will access beyond the end of ram_ptr when
> section.size < len after memory_region_find() returns.

OK - will will handle shorter sections returned by memory_region_find().

> 
>> +
>> +        if (is_write) {
>> +            memcpy((ram_ptr + offset), buf, size);
>> +        } else {
>> +            memcpy(buf, (ram_ptr + offset), size);
>> +        }
>> +
>> +        goto exit;
> 
> What happens when the access spans two adjacent MemoryRegions? I think
> the while (len > 0) loop is needed even in the memory_access_is_direct()
> case so we perform the full access instead of truncating it.

OK - this should be covered by the shorter section handling above.

Thank you!
--
Jag

> 
>> +    }
>> +
>> +    while (len > 0) {



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-29 14:48       ` Stefan Hajnoczi
@ 2022-03-29 19:58         ` Jag Raman
  2022-03-30 10:04           ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Jag Raman @ 2022-03-29 19:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Mar 29, 2022, at 10:48 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Tue, Mar 29, 2022 at 02:12:40PM +0000, Jag Raman wrote:
>>> On Mar 29, 2022, at 8:35 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> On Fri, Mar 25, 2022 at 03:19:41PM -0400, Jagannathan Raman wrote:
>>>> +void remote_iommu_del_device(PCIDevice *pci_dev)
>>>> +{
>>>> +    int pci_bdf;
>>>> +
>>>> +    if (!remote_iommu_table.elem_by_bdf || !pci_dev) {
>>>> +        return;
>>>> +    }
>>>> +
>>>> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
>>>> +
>>>> +    qemu_mutex_lock(&remote_iommu_table.lock);
>>>> +    g_hash_table_remove(remote_iommu_table.elem_by_bdf, INT2VOIDP(pci_bdf));
>>>> +    qemu_mutex_unlock(&remote_iommu_table.lock);
>>>> +}
>>>> +
>>>> +void remote_configure_iommu(PCIBus *pci_bus)
>>>> +{
>>>> +    if (!remote_iommu_table.elem_by_bdf) {
>>>> +        remote_iommu_table.elem_by_bdf =
>>>> +            g_hash_table_new_full(NULL, NULL, NULL, remote_iommu_del_elem);
>>>> +        qemu_mutex_init(&remote_iommu_table.lock);
>>>> +    }
>>>> +
>>>> +    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, &remote_iommu_table);
>>> 
>>> Why is remote_iommu_table global? It could be per-PCIBus and indexed by
>>> just devfn instead of the full BDF.
>> 
>> It’s global because remote_iommu_del_device() needs it for cleanup.
> 
> Can remote_iommu_del_device() use pci_get_bis(pci_dev)->irq_opaque to
> get the per-bus table?

pci_get_bus(pci_dev)->irq_opaque is used for interrupts.

PCIBus already has an iommu_opaque, which is a private
member of the bus structure. It’s passed as an argument
to the iommu_fn().

We could add a getter function to retrieve PCIBus->iommu_opaque
in remote_iommu_del_device(). That way we could avoid the global variable.

Thank you!
--
Jag

> 
> Stefan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-29 19:58         ` Jag Raman
@ 2022-03-30 10:04           ` Stefan Hajnoczi
  2022-03-30 12:53             ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-30 10:04 UTC (permalink / raw)
  To: Jag Raman, Michael S. Tsirkin, Peter Xu, Jason Wang
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 3073 bytes --]

On Tue, Mar 29, 2022 at 07:58:51PM +0000, Jag Raman wrote:
> 
> 
> > On Mar 29, 2022, at 10:48 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Tue, Mar 29, 2022 at 02:12:40PM +0000, Jag Raman wrote:
> >>> On Mar 29, 2022, at 8:35 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>> On Fri, Mar 25, 2022 at 03:19:41PM -0400, Jagannathan Raman wrote:
> >>>> +void remote_iommu_del_device(PCIDevice *pci_dev)
> >>>> +{
> >>>> +    int pci_bdf;
> >>>> +
> >>>> +    if (!remote_iommu_table.elem_by_bdf || !pci_dev) {
> >>>> +        return;
> >>>> +    }
> >>>> +
> >>>> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
> >>>> +
> >>>> +    qemu_mutex_lock(&remote_iommu_table.lock);
> >>>> +    g_hash_table_remove(remote_iommu_table.elem_by_bdf, INT2VOIDP(pci_bdf));
> >>>> +    qemu_mutex_unlock(&remote_iommu_table.lock);
> >>>> +}
> >>>> +
> >>>> +void remote_configure_iommu(PCIBus *pci_bus)
> >>>> +{
> >>>> +    if (!remote_iommu_table.elem_by_bdf) {
> >>>> +        remote_iommu_table.elem_by_bdf =
> >>>> +            g_hash_table_new_full(NULL, NULL, NULL, remote_iommu_del_elem);
> >>>> +        qemu_mutex_init(&remote_iommu_table.lock);
> >>>> +    }
> >>>> +
> >>>> +    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, &remote_iommu_table);
> >>> 
> >>> Why is remote_iommu_table global? It could be per-PCIBus and indexed by
> >>> just devfn instead of the full BDF.
> >> 
> >> It’s global because remote_iommu_del_device() needs it for cleanup.
> > 
> > Can remote_iommu_del_device() use pci_get_bis(pci_dev)->irq_opaque to
> > get the per-bus table?
> 
> pci_get_bus(pci_dev)->irq_opaque is used for interrupts.
> 
> PCIBus already has an iommu_opaque, which is a private
> member of the bus structure. It’s passed as an argument
> to the iommu_fn().
> 
> We could add a getter function to retrieve PCIBus->iommu_opaque
> in remote_iommu_del_device(). That way we could avoid the global variable.

I've CCed Michael, Peter, and Jason regarding IOMMUs.

This makes me wonder whether there is a deeper issue with the
pci_setup_iommu() API: the lack of per-device cleanup callbacks.
Per-device IOMMU resources should be freed when a device is hot
unplugged.

From what I can tell this is not the case today:

- hw/i386/intel_iommu.c:vtd_find_add_as() allocates and adds device
  address spaces but I can't find where they are removed and freed.
  VTDAddressSpace instances pointed to from vtd_bus->dev_as[] are leaked.

- hw/i386/amd_iommu.c has similar leaks.

Your patch introduces a custom remote_iommu_del_device() function, but I
think the pci_setup_iommu() API should take a device_del() callback so
IOMMUs have a standard interface for handling per-device cleanup.

BTW in your case remote_iommu_del_device() is sufficient because hot
unplug is blocked by the new unplug blocker mechanism you introduced.
For other IOMMUs unplug will not be blocked and therefore IOMMUs really
need a callback for per-device cleanup.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 14/17] vfio-user: handle PCI BAR accesses
  2022-03-29 15:51     ` Jag Raman
@ 2022-03-30 10:05       ` Stefan Hajnoczi
  2022-03-30 14:46         ` Jag Raman
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-30 10:05 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 3062 bytes --]

On Tue, Mar 29, 2022 at 03:51:17PM +0000, Jag Raman wrote:
> 
> 
> > On Mar 29, 2022, at 8:50 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Fri, Mar 25, 2022 at 03:19:43PM -0400, Jagannathan Raman wrote:
> >> @@ -324,6 +325,170 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> >>     trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
> >> }
> >> 
> >> +static size_t vfu_object_bar_rw(PCIDevice *pci_dev, int pci_bar,
> >> +                                hwaddr offset, char * const buf,
> >> +                                hwaddr len, const bool is_write)
> >> +{
> >> +    uint8_t *ptr = (uint8_t *)buf;
> >> +    uint8_t *ram_ptr = NULL;
> >> +    bool release_lock = false;
> >> +    MemoryRegionSection section = { 0 };
> >> +    MemoryRegion *mr = NULL;
> >> +    int access_size;
> >> +    hwaddr size = 0;
> >> +    MemTxResult result;
> >> +    uint64_t val;
> >> +
> >> +    section = memory_region_find(pci_dev->io_regions[pci_bar].memory,
> >> +                                 offset, len);
> >> +
> >> +    if (!section.mr) {
> >> +        return 0;
> >> +    }
> >> +
> >> +    mr = section.mr;
> >> +    offset = section.offset_within_region;
> >> +
> >> +    if (is_write && mr->readonly) {
> >> +        warn_report("vfu: attempting to write to readonly region in "
> >> +                    "bar %d - [0x%"PRIx64" - 0x%"PRIx64"]",
> >> +                    pci_bar, offset, (offset + len));
> >> +        return 0;
> > 
> > A mr reference is leaked. The return statement can be replaced with goto
> > exit.
> 
> OK.
> 
> > 
> >> +    }
> >> +
> >> +    if (memory_access_is_direct(mr, is_write)) {
> >> +        /**
> >> +         * Some devices expose a PCI expansion ROM, which could be buffer
> >> +         * based as compared to other regions which are primarily based on
> >> +         * MemoryRegionOps. memory_region_find() would already check
> >> +         * for buffer overflow, we don't need to repeat it here.
> >> +         */
> >> +        ram_ptr = memory_region_get_ram_ptr(mr);
> >> +
> >> +        size = len;
> > 
> > This looks like it will access beyond the end of ram_ptr when
> > section.size < len after memory_region_find() returns.
> 
> OK - will will handle shorter sections returned by memory_region_find().
> 
> > 
> >> +
> >> +        if (is_write) {
> >> +            memcpy((ram_ptr + offset), buf, size);
> >> +        } else {
> >> +            memcpy(buf, (ram_ptr + offset), size);
> >> +        }
> >> +
> >> +        goto exit;
> > 
> > What happens when the access spans two adjacent MemoryRegions? I think
> > the while (len > 0) loop is needed even in the memory_access_is_direct()
> > case so we perform the full access instead of truncating it.
> 
> OK - this should be covered by the shorter section handling above.

No, shorter section handling truncates that access. I think the caller
expects all len bytes to be accessed, not just the first few bytes?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-30 10:04           ` Stefan Hajnoczi
@ 2022-03-30 12:53             ` Peter Xu
  2022-03-30 16:08               ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-03-30 12:53 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, Jag Raman, bleal,
	john.levon, Michael S. Tsirkin, armbru, Jason Wang, quintela,
	f4bug, qemu-devel, Alex Williamson, Kanth Ghatraju, berrange,
	thanos.makatos, pbonzini, eblake, dgilbert

On Wed, Mar 30, 2022 at 11:04:24AM +0100, Stefan Hajnoczi wrote:
> This makes me wonder whether there is a deeper issue with the
> pci_setup_iommu() API: the lack of per-device cleanup callbacks.
> Per-device IOMMU resources should be freed when a device is hot
> unplugged.
> 
> From what I can tell this is not the case today:
> 
> - hw/i386/intel_iommu.c:vtd_find_add_as() allocates and adds device
>   address spaces but I can't find where they are removed and freed.
>   VTDAddressSpace instances pointed to from vtd_bus->dev_as[] are leaked.
> 
> - hw/i386/amd_iommu.c has similar leaks.

AFAICT it's because there's no device-specific data cached in the
per-device IOMMU address space, at least so far.  IOW, all the data
structures allocated here can be re-used when a new device is plugged in
after the old device unplugged.

It's definitely not ideal since after unplug (and before a new device
plugged in) the resource is not needed at all so it's kind of wasted, but
it should work functionally.  If to achieve that, some iommu_unplug() or
iommu_cleanup() hook sounds reasonable.

One thing I'm not sure is these iommu ops are per-bus not per-device.  So
I'm not sure whether that's what we wanted here because remote device
cleanup seems to be per-device only.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 14/17] vfio-user: handle PCI BAR accesses
  2022-03-30 10:05       ` Stefan Hajnoczi
@ 2022-03-30 14:46         ` Jag Raman
  2022-03-30 16:06           ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Jag Raman @ 2022-03-30 14:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Mar 30, 2022, at 6:05 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Tue, Mar 29, 2022 at 03:51:17PM +0000, Jag Raman wrote:
>> 
>> 
>>> On Mar 29, 2022, at 8:50 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> 
>>> On Fri, Mar 25, 2022 at 03:19:43PM -0400, Jagannathan Raman wrote:
>>>> @@ -324,6 +325,170 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
>>>>    trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
>>>> }
>>>> 
>>>> +static size_t vfu_object_bar_rw(PCIDevice *pci_dev, int pci_bar,
>>>> +                                hwaddr offset, char * const buf,
>>>> +                                hwaddr len, const bool is_write)
>>>> +{
>>>> +    uint8_t *ptr = (uint8_t *)buf;
>>>> +    uint8_t *ram_ptr = NULL;
>>>> +    bool release_lock = false;
>>>> +    MemoryRegionSection section = { 0 };
>>>> +    MemoryRegion *mr = NULL;
>>>> +    int access_size;
>>>> +    hwaddr size = 0;
>>>> +    MemTxResult result;
>>>> +    uint64_t val;
>>>> +
>>>> +    section = memory_region_find(pci_dev->io_regions[pci_bar].memory,
>>>> +                                 offset, len);
>>>> +
>>>> +    if (!section.mr) {
>>>> +        return 0;
>>>> +    }
>>>> +
>>>> +    mr = section.mr;
>>>> +    offset = section.offset_within_region;
>>>> +
>>>> +    if (is_write && mr->readonly) {
>>>> +        warn_report("vfu: attempting to write to readonly region in "
>>>> +                    "bar %d - [0x%"PRIx64" - 0x%"PRIx64"]",
>>>> +                    pci_bar, offset, (offset + len));
>>>> +        return 0;
>>> 
>>> A mr reference is leaked. The return statement can be replaced with goto
>>> exit.
>> 
>> OK.
>> 
>>> 
>>>> +    }
>>>> +
>>>> +    if (memory_access_is_direct(mr, is_write)) {
>>>> +        /**
>>>> +         * Some devices expose a PCI expansion ROM, which could be buffer
>>>> +         * based as compared to other regions which are primarily based on
>>>> +         * MemoryRegionOps. memory_region_find() would already check
>>>> +         * for buffer overflow, we don't need to repeat it here.
>>>> +         */
>>>> +        ram_ptr = memory_region_get_ram_ptr(mr);
>>>> +
>>>> +        size = len;
>>> 
>>> This looks like it will access beyond the end of ram_ptr when
>>> section.size < len after memory_region_find() returns.
>> 
>> OK - will will handle shorter sections returned by memory_region_find().
>> 
>>> 
>>>> +
>>>> +        if (is_write) {
>>>> +            memcpy((ram_ptr + offset), buf, size);
>>>> +        } else {
>>>> +            memcpy(buf, (ram_ptr + offset), size);
>>>> +        }
>>>> +
>>>> +        goto exit;
>>> 
>>> What happens when the access spans two adjacent MemoryRegions? I think
>>> the while (len > 0) loop is needed even in the memory_access_is_direct()
>>> case so we perform the full access instead of truncating it.
>> 
>> OK - this should be covered by the shorter section handling above.
> 
> No, shorter section handling truncates that access. I think the caller
> expects all len bytes to be accessed, not just the first few bytes?

Yes, I also think that the caller expects to access all len bytes.

I should’ve provided more details - but by shorter section handling I
meant iterating over all the sections that make up len bytes.

Thank you!
--
Jag

> 
> Stefan


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 14/17] vfio-user: handle PCI BAR accesses
  2022-03-30 14:46         ` Jag Raman
@ 2022-03-30 16:06           ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-30 16:06 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 3748 bytes --]

On Wed, Mar 30, 2022 at 02:46:20PM +0000, Jag Raman wrote:
> 
> 
> > On Mar 30, 2022, at 6:05 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Tue, Mar 29, 2022 at 03:51:17PM +0000, Jag Raman wrote:
> >> 
> >> 
> >>> On Mar 29, 2022, at 8:50 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>> 
> >>> On Fri, Mar 25, 2022 at 03:19:43PM -0400, Jagannathan Raman wrote:
> >>>> @@ -324,6 +325,170 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
> >>>>    trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
> >>>> }
> >>>> 
> >>>> +static size_t vfu_object_bar_rw(PCIDevice *pci_dev, int pci_bar,
> >>>> +                                hwaddr offset, char * const buf,
> >>>> +                                hwaddr len, const bool is_write)
> >>>> +{
> >>>> +    uint8_t *ptr = (uint8_t *)buf;
> >>>> +    uint8_t *ram_ptr = NULL;
> >>>> +    bool release_lock = false;
> >>>> +    MemoryRegionSection section = { 0 };
> >>>> +    MemoryRegion *mr = NULL;
> >>>> +    int access_size;
> >>>> +    hwaddr size = 0;
> >>>> +    MemTxResult result;
> >>>> +    uint64_t val;
> >>>> +
> >>>> +    section = memory_region_find(pci_dev->io_regions[pci_bar].memory,
> >>>> +                                 offset, len);
> >>>> +
> >>>> +    if (!section.mr) {
> >>>> +        return 0;
> >>>> +    }
> >>>> +
> >>>> +    mr = section.mr;
> >>>> +    offset = section.offset_within_region;
> >>>> +
> >>>> +    if (is_write && mr->readonly) {
> >>>> +        warn_report("vfu: attempting to write to readonly region in "
> >>>> +                    "bar %d - [0x%"PRIx64" - 0x%"PRIx64"]",
> >>>> +                    pci_bar, offset, (offset + len));
> >>>> +        return 0;
> >>> 
> >>> A mr reference is leaked. The return statement can be replaced with goto
> >>> exit.
> >> 
> >> OK.
> >> 
> >>> 
> >>>> +    }
> >>>> +
> >>>> +    if (memory_access_is_direct(mr, is_write)) {
> >>>> +        /**
> >>>> +         * Some devices expose a PCI expansion ROM, which could be buffer
> >>>> +         * based as compared to other regions which are primarily based on
> >>>> +         * MemoryRegionOps. memory_region_find() would already check
> >>>> +         * for buffer overflow, we don't need to repeat it here.
> >>>> +         */
> >>>> +        ram_ptr = memory_region_get_ram_ptr(mr);
> >>>> +
> >>>> +        size = len;
> >>> 
> >>> This looks like it will access beyond the end of ram_ptr when
> >>> section.size < len after memory_region_find() returns.
> >> 
> >> OK - will will handle shorter sections returned by memory_region_find().
> >> 
> >>> 
> >>>> +
> >>>> +        if (is_write) {
> >>>> +            memcpy((ram_ptr + offset), buf, size);
> >>>> +        } else {
> >>>> +            memcpy(buf, (ram_ptr + offset), size);
> >>>> +        }
> >>>> +
> >>>> +        goto exit;
> >>> 
> >>> What happens when the access spans two adjacent MemoryRegions? I think
> >>> the while (len > 0) loop is needed even in the memory_access_is_direct()
> >>> case so we perform the full access instead of truncating it.
> >> 
> >> OK - this should be covered by the shorter section handling above.
> > 
> > No, shorter section handling truncates that access. I think the caller
> > expects all len bytes to be accessed, not just the first few bytes?
> 
> Yes, I also think that the caller expects to access all len bytes.
> 
> I should’ve provided more details - but by shorter section handling I
> meant iterating over all the sections that make up len bytes.

Okay, I think we're on the same page. I wanted to be make sure that if
(memory_access_is_direct()) will be moved inside the loop.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-30 12:53             ` Peter Xu
@ 2022-03-30 16:08               ` Stefan Hajnoczi
  2022-03-30 17:13                 ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-30 16:08 UTC (permalink / raw)
  To: Peter Xu
  Cc: eduardo, Elena Ufimtseva, John Johnson, Jag Raman, bleal,
	john.levon, Michael S. Tsirkin, armbru, Jason Wang, quintela,
	f4bug, qemu-devel, Alex Williamson, Kanth Ghatraju, berrange,
	thanos.makatos, pbonzini, eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 1393 bytes --]

On Wed, Mar 30, 2022 at 08:53:16AM -0400, Peter Xu wrote:
> On Wed, Mar 30, 2022 at 11:04:24AM +0100, Stefan Hajnoczi wrote:
> > This makes me wonder whether there is a deeper issue with the
> > pci_setup_iommu() API: the lack of per-device cleanup callbacks.
> > Per-device IOMMU resources should be freed when a device is hot
> > unplugged.
> > 
> > From what I can tell this is not the case today:
> > 
> > - hw/i386/intel_iommu.c:vtd_find_add_as() allocates and adds device
> >   address spaces but I can't find where they are removed and freed.
> >   VTDAddressSpace instances pointed to from vtd_bus->dev_as[] are leaked.
> > 
> > - hw/i386/amd_iommu.c has similar leaks.
> 
> AFAICT it's because there's no device-specific data cached in the
> per-device IOMMU address space, at least so far.  IOW, all the data
> structures allocated here can be re-used when a new device is plugged in
> after the old device unplugged.
> 
> It's definitely not ideal since after unplug (and before a new device
> plugged in) the resource is not needed at all so it's kind of wasted, but
> it should work functionally.  If to achieve that, some iommu_unplug() or
> iommu_cleanup() hook sounds reasonable.

I guess the question is whether PCI busses can be hotplugged with
IOMMUs. If yes, then there is a memory leak that matters for
intel_iommu.c and amd_iommu.c.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-30 16:08               ` Stefan Hajnoczi
@ 2022-03-30 17:13                 ` Peter Xu
  2022-03-31  9:47                   ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-03-30 17:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, Jag Raman, bleal,
	john.levon, Michael S. Tsirkin, armbru, Jason Wang, quintela,
	f4bug, qemu-devel, Alex Williamson, Kanth Ghatraju, berrange,
	thanos.makatos, pbonzini, eblake, dgilbert

On Wed, Mar 30, 2022 at 05:08:24PM +0100, Stefan Hajnoczi wrote:
> On Wed, Mar 30, 2022 at 08:53:16AM -0400, Peter Xu wrote:
> > On Wed, Mar 30, 2022 at 11:04:24AM +0100, Stefan Hajnoczi wrote:
> > > This makes me wonder whether there is a deeper issue with the
> > > pci_setup_iommu() API: the lack of per-device cleanup callbacks.
> > > Per-device IOMMU resources should be freed when a device is hot
> > > unplugged.
> > > 
> > > From what I can tell this is not the case today:
> > > 
> > > - hw/i386/intel_iommu.c:vtd_find_add_as() allocates and adds device
> > >   address spaces but I can't find where they are removed and freed.
> > >   VTDAddressSpace instances pointed to from vtd_bus->dev_as[] are leaked.
> > > 
> > > - hw/i386/amd_iommu.c has similar leaks.
> > 
> > AFAICT it's because there's no device-specific data cached in the
> > per-device IOMMU address space, at least so far.  IOW, all the data
> > structures allocated here can be re-used when a new device is plugged in
> > after the old device unplugged.
> > 
> > It's definitely not ideal since after unplug (and before a new device
> > plugged in) the resource is not needed at all so it's kind of wasted, but
> > it should work functionally.  If to achieve that, some iommu_unplug() or
> > iommu_cleanup() hook sounds reasonable.
> 
> I guess the question is whether PCI busses can be hotplugged with
> IOMMUs. If yes, then there is a memory leak that matters for
> intel_iommu.c and amd_iommu.c.

It can't, and we only support one vIOMMU so far for both (commit
1b3bf13890fd849b26).  Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-30 17:13                 ` Peter Xu
@ 2022-03-31  9:47                   ` Stefan Hajnoczi
  2022-03-31 12:41                     ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2022-03-31  9:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: eduardo, Elena Ufimtseva, John Johnson, Jag Raman, bleal,
	john.levon, Michael S. Tsirkin, armbru, Jason Wang, quintela,
	f4bug, qemu-devel, Alex Williamson, Kanth Ghatraju, berrange,
	thanos.makatos, pbonzini, eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 2440 bytes --]

On Wed, Mar 30, 2022 at 01:13:03PM -0400, Peter Xu wrote:
> On Wed, Mar 30, 2022 at 05:08:24PM +0100, Stefan Hajnoczi wrote:
> > On Wed, Mar 30, 2022 at 08:53:16AM -0400, Peter Xu wrote:
> > > On Wed, Mar 30, 2022 at 11:04:24AM +0100, Stefan Hajnoczi wrote:
> > > > This makes me wonder whether there is a deeper issue with the
> > > > pci_setup_iommu() API: the lack of per-device cleanup callbacks.
> > > > Per-device IOMMU resources should be freed when a device is hot
> > > > unplugged.
> > > > 
> > > > From what I can tell this is not the case today:
> > > > 
> > > > - hw/i386/intel_iommu.c:vtd_find_add_as() allocates and adds device
> > > >   address spaces but I can't find where they are removed and freed.
> > > >   VTDAddressSpace instances pointed to from vtd_bus->dev_as[] are leaked.
> > > > 
> > > > - hw/i386/amd_iommu.c has similar leaks.
> > > 
> > > AFAICT it's because there's no device-specific data cached in the
> > > per-device IOMMU address space, at least so far.  IOW, all the data
> > > structures allocated here can be re-used when a new device is plugged in
> > > after the old device unplugged.
> > > 
> > > It's definitely not ideal since after unplug (and before a new device
> > > plugged in) the resource is not needed at all so it's kind of wasted, but
> > > it should work functionally.  If to achieve that, some iommu_unplug() or
> > > iommu_cleanup() hook sounds reasonable.
> > 
> > I guess the question is whether PCI busses can be hotplugged with
> > IOMMUs. If yes, then there is a memory leak that matters for
> > intel_iommu.c and amd_iommu.c.
> 
> It can't, and we only support one vIOMMU so far for both (commit
> 1b3bf13890fd849b26).  Thanks,

I see, thanks!

Okay, summarizing options for the vfio-user IOMMU:

1. Use the same singleton approach as existing IOMMUs where the
   MemoryRegion/AddressSpace are never freed. Don't bother deleting.

2. Keep the approach in this patch where vfio-user code manually calls a
   custom delete function (not part of the pci_setup_iommu() API). This
   is slightly awkward to do without global state and that's what
   started this discussion.

3. Introduce an optional pci_setup_iommu() callback:

   typdef void (*PCIIOMMUDeviceUnplug)(PCIBus *bus, void *opaque, int devfn);

   Solves the awkwardness of option #2. Not needed by existing IOMMU
   devices.

Any preferences anyone?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-31  9:47                   ` Stefan Hajnoczi
@ 2022-03-31 12:41                     ` Peter Xu
  2022-04-13 14:37                       ` Igor Mammedov
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-03-31 12:41 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, Jag Raman, bleal,
	john.levon, Michael S. Tsirkin, armbru, Jason Wang, quintela,
	f4bug, qemu-devel, Alex Williamson, Kanth Ghatraju, berrange,
	thanos.makatos, pbonzini, eblake, Igor Mammedov, dgilbert

On Thu, Mar 31, 2022 at 10:47:33AM +0100, Stefan Hajnoczi wrote:
> On Wed, Mar 30, 2022 at 01:13:03PM -0400, Peter Xu wrote:
> > On Wed, Mar 30, 2022 at 05:08:24PM +0100, Stefan Hajnoczi wrote:
> > > On Wed, Mar 30, 2022 at 08:53:16AM -0400, Peter Xu wrote:
> > > > On Wed, Mar 30, 2022 at 11:04:24AM +0100, Stefan Hajnoczi wrote:
> > > > > This makes me wonder whether there is a deeper issue with the
> > > > > pci_setup_iommu() API: the lack of per-device cleanup callbacks.
> > > > > Per-device IOMMU resources should be freed when a device is hot
> > > > > unplugged.
> > > > > 
> > > > > From what I can tell this is not the case today:
> > > > > 
> > > > > - hw/i386/intel_iommu.c:vtd_find_add_as() allocates and adds device
> > > > >   address spaces but I can't find where they are removed and freed.
> > > > >   VTDAddressSpace instances pointed to from vtd_bus->dev_as[] are leaked.
> > > > > 
> > > > > - hw/i386/amd_iommu.c has similar leaks.
> > > > 
> > > > AFAICT it's because there's no device-specific data cached in the
> > > > per-device IOMMU address space, at least so far.  IOW, all the data
> > > > structures allocated here can be re-used when a new device is plugged in
> > > > after the old device unplugged.
> > > > 
> > > > It's definitely not ideal since after unplug (and before a new device
> > > > plugged in) the resource is not needed at all so it's kind of wasted, but
> > > > it should work functionally.  If to achieve that, some iommu_unplug() or
> > > > iommu_cleanup() hook sounds reasonable.
> > > 
> > > I guess the question is whether PCI busses can be hotplugged with
> > > IOMMUs. If yes, then there is a memory leak that matters for
> > > intel_iommu.c and amd_iommu.c.
> > 
> > It can't, and we only support one vIOMMU so far for both (commit
> > 1b3bf13890fd849b26).  Thanks,
> 
> I see, thanks!
> 
> Okay, summarizing options for the vfio-user IOMMU:
> 
> 1. Use the same singleton approach as existing IOMMUs where the
>    MemoryRegion/AddressSpace are never freed. Don't bother deleting.
> 
> 2. Keep the approach in this patch where vfio-user code manually calls a
>    custom delete function (not part of the pci_setup_iommu() API). This
>    is slightly awkward to do without global state and that's what
>    started this discussion.
> 
> 3. Introduce an optional pci_setup_iommu() callback:
> 
>    typdef void (*PCIIOMMUDeviceUnplug)(PCIBus *bus, void *opaque, int devfn);
> 
>    Solves the awkwardness of option #2. Not needed by existing IOMMU
>    devices.

Looks all workable to me.  One tiny thing is if we'd like 3) we may want to
pass over the PCIDevice* too because in this case IIUC we'd need to double
check the device class before doing anything - we may not want to call the
vfio-user callbacks for general emulated devices under the same pci bus.

I think we could also fetch that from PCIBus.devices[devfn] but that's just
not as obvious.

Option 4) is as mentioned previously, that we add another device unplug
hook that can be registered per-device.  I just didn't think thoroughly on
how it would interact with the current HotplugHandler design yet.. it looks
quite similar but so far it's either for the machine type or pci bus, not
capable of registering on one single device (and it's always a mistery to
me why we'd rather ignore the per-bus hook if the machine hook
existed.. that's in qdev_get_hotplug_handler).

Copying Igor too.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-25 19:19 ` [PATCH v7 12/17] vfio-user: IOMMU support for remote device Jagannathan Raman
  2022-03-29 12:35   ` Stefan Hajnoczi
@ 2022-04-13 14:25   ` Igor Mammedov
  2022-04-13 18:25     ` Jag Raman
  1 sibling, 1 reply; 44+ messages in thread
From: Igor Mammedov @ 2022-04-13 14:25 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, berrange, bleal, john.g.johnson,
	john.levon, qemu-devel, armbru, quintela, alex.williamson,
	pbonzini, mst, stefanha, thanos.makatos, kanth.ghatraju, eblake,
	dgilbert, f4bug

On Fri, 25 Mar 2022 15:19:41 -0400
Jagannathan Raman <jag.raman@oracle.com> wrote:

> Assign separate address space for each device in the remote processes.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/hw/remote/iommu.h | 18 ++++++++
>  hw/remote/iommu.c         | 95 +++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS               |  2 +
>  hw/remote/meson.build     |  1 +
>  4 files changed, 116 insertions(+)
>  create mode 100644 include/hw/remote/iommu.h
>  create mode 100644 hw/remote/iommu.c
> 
> diff --git a/include/hw/remote/iommu.h b/include/hw/remote/iommu.h
> new file mode 100644
> index 0000000000..8f850400f1
> --- /dev/null
> +++ b/include/hw/remote/iommu.h
> @@ -0,0 +1,18 @@
> +/**
> + * Copyright © 2022 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#ifndef REMOTE_IOMMU_H
> +#define REMOTE_IOMMU_H
> +
> +#include "hw/pci/pci_bus.h"
> +
> +void remote_configure_iommu(PCIBus *pci_bus);
> +
> +void remote_iommu_del_device(PCIDevice *pci_dev);
> +
> +#endif
> diff --git a/hw/remote/iommu.c b/hw/remote/iommu.c
> new file mode 100644
> index 0000000000..13f329b45d
> --- /dev/null
> +++ b/hw/remote/iommu.c
> @@ -0,0 +1,95 @@
> +/**
> + * IOMMU for remote device
> + *
> + * Copyright © 2022 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "hw/remote/iommu.h"
> +#include "hw/pci/pci_bus.h"
> +#include "hw/pci/pci.h"
> +#include "exec/memory.h"
> +#include "exec/address-spaces.h"
> +#include "trace.h"
> +
> +struct RemoteIommuElem {
> +    AddressSpace  as;
> +    MemoryRegion  mr;
> +};
> +
> +struct RemoteIommuTable {
> +    QemuMutex lock;
> +    GHashTable *elem_by_bdf;
> +} remote_iommu_table;
> +
> +#define INT2VOIDP(i) (void *)(uintptr_t)(i)
> +
> +static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
> +                                              void *opaque, int devfn)
> +{
> +    struct RemoteIommuTable *iommu_table = opaque;
> +    struct RemoteIommuElem *elem = NULL;
> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_bus), devfn);
> +
> +    elem = g_hash_table_lookup(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf));
> +
> +    if (!elem) {
> +        g_autofree char *mr_name = g_strdup_printf("vfu-ram-%d", pci_bdf);
> +        g_autofree char *as_name = g_strdup_printf("vfu-as-%d", pci_bdf);
> +
> +        elem = g_malloc0(sizeof(struct RemoteIommuElem));
> +
> +        memory_region_init(&elem->mr, NULL, mr_name, UINT64_MAX);
goes here:
   memory_region_do_init()
        if (!owner) {                                                            
            owner = container_get(qdev_get_machine(), "/unattached");            
        }  

then

> +        address_space_init(&elem->as, &elem->mr, as_name);
> +
> +        qemu_mutex_lock(&iommu_table->lock);
> +        g_hash_table_insert(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf), elem);
> +        qemu_mutex_unlock(&iommu_table->lock);
> +    }
> +
> +    return &elem->as;
> +}
> +
> +static void remote_iommu_del_elem(gpointer data)
> +{
> +    struct RemoteIommuElem *elem = data;
> +
> +    g_assert(elem);
> +
> +    memory_region_unref(&elem->mr);

here we call
      object_unref(mr->owner); 
leaving dangling pointer in owner '(qdev_get_machine(), "/unattached")'
it doesn't look correct

I thought that memory_region_unref() should be always paired with memory_region_ref()

and looking at memory_region_init(...owner...) history it looks like
owner-less (NULL) regions are not meant to be deleted ever.

> +    address_space_destroy(&elem->as);
> +
> +    g_free(elem);
> +}
> +
> +void remote_iommu_del_device(PCIDevice *pci_dev)
> +{
> +    int pci_bdf;
> +
> +    if (!remote_iommu_table.elem_by_bdf || !pci_dev) {
> +        return;
> +    }
> +
> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
> +
> +    qemu_mutex_lock(&remote_iommu_table.lock);
> +    g_hash_table_remove(remote_iommu_table.elem_by_bdf, INT2VOIDP(pci_bdf));
> +    qemu_mutex_unlock(&remote_iommu_table.lock);
> +}
> +
> +void remote_configure_iommu(PCIBus *pci_bus)
> +{
> +    if (!remote_iommu_table.elem_by_bdf) {
> +        remote_iommu_table.elem_by_bdf =
> +            g_hash_table_new_full(NULL, NULL, NULL, remote_iommu_del_elem);
> +        qemu_mutex_init(&remote_iommu_table.lock);
> +    }
> +
> +    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, &remote_iommu_table);
> +}
> diff --git a/MAINTAINERS b/MAINTAINERS
> index e7b0297a63..21694a9698 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3599,6 +3599,8 @@ F: hw/remote/iohub.c
>  F: include/hw/remote/iohub.h
>  F: subprojects/libvfio-user
>  F: hw/remote/vfio-user-obj.c
> +F: hw/remote/iommu.c
> +F: include/hw/remote/iommu.h
>  
>  EBPF:
>  M: Jason Wang <jasowang@redhat.com>
> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
> index 534ac5df79..bcef83c8cc 100644
> --- a/hw/remote/meson.build
> +++ b/hw/remote/meson.build
> @@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
>  remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iommu.c'))
>  remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: files('vfio-user-obj.c'))
>  
>  remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-03-31 12:41                     ` Peter Xu
@ 2022-04-13 14:37                       ` Igor Mammedov
  2022-04-13 19:08                         ` Peter Xu
  0 siblings, 1 reply; 44+ messages in thread
From: Igor Mammedov @ 2022-04-13 14:37 UTC (permalink / raw)
  To: Peter Xu
  Cc: eduardo, Elena Ufimtseva, dgilbert, John Johnson, Jag Raman,
	bleal, john.levon, Michael S. Tsirkin, armbru, Jason Wang,
	quintela, qemu-devel, f4bug, Alex Williamson, Kanth Ghatraju,
	berrange, Stefan Hajnoczi, thanos.makatos, pbonzini, eblake

On Thu, 31 Mar 2022 08:41:01 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Thu, Mar 31, 2022 at 10:47:33AM +0100, Stefan Hajnoczi wrote:
> > On Wed, Mar 30, 2022 at 01:13:03PM -0400, Peter Xu wrote:  
> > > On Wed, Mar 30, 2022 at 05:08:24PM +0100, Stefan Hajnoczi wrote:  
> > > > On Wed, Mar 30, 2022 at 08:53:16AM -0400, Peter Xu wrote:  
> > > > > On Wed, Mar 30, 2022 at 11:04:24AM +0100, Stefan Hajnoczi wrote:  
> > > > > > This makes me wonder whether there is a deeper issue with the
> > > > > > pci_setup_iommu() API: the lack of per-device cleanup callbacks.
> > > > > > Per-device IOMMU resources should be freed when a device is hot
> > > > > > unplugged.
> > > > > > 
> > > > > > From what I can tell this is not the case today:
> > > > > > 
> > > > > > - hw/i386/intel_iommu.c:vtd_find_add_as() allocates and adds device
> > > > > >   address spaces but I can't find where they are removed and freed.
> > > > > >   VTDAddressSpace instances pointed to from vtd_bus->dev_as[] are leaked.
> > > > > > 
> > > > > > - hw/i386/amd_iommu.c has similar leaks.  
> > > > > 
> > > > > AFAICT it's because there's no device-specific data cached in the
> > > > > per-device IOMMU address space, at least so far.  IOW, all the data
> > > > > structures allocated here can be re-used when a new device is plugged in
> > > > > after the old device unplugged.
> > > > > 
> > > > > It's definitely not ideal since after unplug (and before a new device
> > > > > plugged in) the resource is not needed at all so it's kind of wasted, but
> > > > > it should work functionally.  If to achieve that, some iommu_unplug() or
> > > > > iommu_cleanup() hook sounds reasonable.  
> > > > 
> > > > I guess the question is whether PCI busses can be hotplugged with
> > > > IOMMUs. If yes, then there is a memory leak that matters for
> > > > intel_iommu.c and amd_iommu.c.  
> > > 
> > > It can't, and we only support one vIOMMU so far for both (commit
> > > 1b3bf13890fd849b26).  Thanks,  
> > 
> > I see, thanks!
> > 
> > Okay, summarizing options for the vfio-user IOMMU:
> > 
> > 1. Use the same singleton approach as existing IOMMUs where the
> >    MemoryRegion/AddressSpace are never freed. Don't bother deleting.
> > 
> > 2. Keep the approach in this patch where vfio-user code manually calls a
> >    custom delete function (not part of the pci_setup_iommu() API). This
> >    is slightly awkward to do without global state and that's what
> >    started this discussion.
> > 
> > 3. Introduce an optional pci_setup_iommu() callback:
> > 
> >    typdef void (*PCIIOMMUDeviceUnplug)(PCIBus *bus, void *opaque, int devfn);
> > 
> >    Solves the awkwardness of option #2. Not needed by existing IOMMU
> >    devices.  
> 
> Looks all workable to me.  One tiny thing is if we'd like 3) we may want to
> pass over the PCIDevice* too because in this case IIUC we'd need to double
> check the device class before doing anything - we may not want to call the
> vfio-user callbacks for general emulated devices under the same pci bus.
> 
> I think we could also fetch that from PCIBus.devices[devfn] but that's just
> not as obvious.
> 
> Option 4) is as mentioned previously, that we add another device unplug
> hook that can be registered per-device.  I just didn't think thoroughly on
can you expand on why per device hook is needed?

> how it would interact with the current HotplugHandler design yet.. it looks
> quite similar but so far it's either for the machine type or pci bus, not
> capable of registering on one single device (and it's always a mistery to
> me why we'd rather ignore the per-bus hook if the machine hook
> existed.. that's in qdev_get_hotplug_handler).

machine hook is there for bus-less devices mainly, if it's not defined
code will fallback to bus handler if any exists.

However machine hook can also be used to override default hotplug chain
to do to implement non trivial plug/unplug flow.
for example see pc_get_hotplug_handler(), corresponding
pc_machine_device_[pre_plug|plug|unplug...]_cb() where
plug/unplug chain is altered for some PCI devices types.
Perhaps the same can be done for vfio.

> 
> Copying Igor too.
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-04-13 14:25   ` Igor Mammedov
@ 2022-04-13 18:25     ` Jag Raman
  2022-04-13 21:59       ` Jag Raman
  0 siblings, 1 reply; 44+ messages in thread
From: Jag Raman @ 2022-04-13 18:25 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: eduardo, Elena Ufimtseva, berrange, bleal, John Johnson,
	john.levon, qemu-devel, armbru, quintela, alex.williamson,
	pbonzini, mst, stefanha, thanos.makatos, Kanth Ghatraju, eblake,
	dgilbert, f4bug



> On Apr 13, 2022, at 10:25 AM, Igor Mammedov <imammedo@redhat.com> wrote:
> 
> On Fri, 25 Mar 2022 15:19:41 -0400
> Jagannathan Raman <jag.raman@oracle.com> wrote:
> 
>> Assign separate address space for each device in the remote processes.
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> include/hw/remote/iommu.h | 18 ++++++++
>> hw/remote/iommu.c         | 95 +++++++++++++++++++++++++++++++++++++++
>> MAINTAINERS               |  2 +
>> hw/remote/meson.build     |  1 +
>> 4 files changed, 116 insertions(+)
>> create mode 100644 include/hw/remote/iommu.h
>> create mode 100644 hw/remote/iommu.c
>> 
>> diff --git a/include/hw/remote/iommu.h b/include/hw/remote/iommu.h
>> new file mode 100644
>> index 0000000000..8f850400f1
>> --- /dev/null
>> +++ b/include/hw/remote/iommu.h
>> @@ -0,0 +1,18 @@
>> +/**
>> + * Copyright © 2022 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef REMOTE_IOMMU_H
>> +#define REMOTE_IOMMU_H
>> +
>> +#include "hw/pci/pci_bus.h"
>> +
>> +void remote_configure_iommu(PCIBus *pci_bus);
>> +
>> +void remote_iommu_del_device(PCIDevice *pci_dev);
>> +
>> +#endif
>> diff --git a/hw/remote/iommu.c b/hw/remote/iommu.c
>> new file mode 100644
>> index 0000000000..13f329b45d
>> --- /dev/null
>> +++ b/hw/remote/iommu.c
>> @@ -0,0 +1,95 @@
>> +/**
>> + * IOMMU for remote device
>> + *
>> + * Copyright © 2022 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +
>> +#include "hw/remote/iommu.h"
>> +#include "hw/pci/pci_bus.h"
>> +#include "hw/pci/pci.h"
>> +#include "exec/memory.h"
>> +#include "exec/address-spaces.h"
>> +#include "trace.h"
>> +
>> +struct RemoteIommuElem {
>> +    AddressSpace  as;
>> +    MemoryRegion  mr;
>> +};
>> +
>> +struct RemoteIommuTable {
>> +    QemuMutex lock;
>> +    GHashTable *elem_by_bdf;
>> +} remote_iommu_table;
>> +
>> +#define INT2VOIDP(i) (void *)(uintptr_t)(i)
>> +
>> +static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
>> +                                              void *opaque, int devfn)
>> +{
>> +    struct RemoteIommuTable *iommu_table = opaque;
>> +    struct RemoteIommuElem *elem = NULL;
>> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_bus), devfn);
>> +
>> +    elem = g_hash_table_lookup(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf));
>> +
>> +    if (!elem) {
>> +        g_autofree char *mr_name = g_strdup_printf("vfu-ram-%d", pci_bdf);
>> +        g_autofree char *as_name = g_strdup_printf("vfu-as-%d", pci_bdf);
>> +
>> +        elem = g_malloc0(sizeof(struct RemoteIommuElem));
>> +
>> +        memory_region_init(&elem->mr, NULL, mr_name, UINT64_MAX);
> goes here:
>   memory_region_do_init()
>        if (!owner) {                                                            
>            owner = container_get(qdev_get_machine(), "/unattached");            
>        }  
> 
> then
> 
>> +        address_space_init(&elem->as, &elem->mr, as_name);
>> +
>> +        qemu_mutex_lock(&iommu_table->lock);
>> +        g_hash_table_insert(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf), elem);
>> +        qemu_mutex_unlock(&iommu_table->lock);
>> +    }
>> +
>> +    return &elem->as;
>> +}
>> +
>> +static void remote_iommu_del_elem(gpointer data)
>> +{
>> +    struct RemoteIommuElem *elem = data;
>> +
>> +    g_assert(elem);
>> +
>> +    memory_region_unref(&elem->mr);
> 
> here we call
>      object_unref(mr->owner); 
> leaving dangling pointer in owner '(qdev_get_machine(), "/unattached")'
> it doesn't look correct
> 
> I thought that memory_region_unref() should be always paired with memory_region_ref()
> 
> and looking at memory_region_init(...owner...) history it looks like
> owner-less (NULL) regions are not meant to be deleted ever.

Hi Igor,

Thanks for the pointers about ref counters for MemoryRegions.

It makes sense - MemoryRegions are not QEMU Objects. So their
owner’s ref counters are used instead. So the expectation is that
when the owner is destroyed, the MemoryRegions initialized by them
also get destroyed simultaneously.

In this case, RemoteIommuElem->mr does not have an owner. Therefore,
we don’t have to manage its ref counter. When RemoteIommuElem is
destroyed, the MemoryRegion should be cleaned up automatically.

--
Jag

> 
>> +    address_space_destroy(&elem->as);
>> +
>> +    g_free(elem);
>> +}
>> +
>> +void remote_iommu_del_device(PCIDevice *pci_dev)
>> +{
>> +    int pci_bdf;
>> +
>> +    if (!remote_iommu_table.elem_by_bdf || !pci_dev) {
>> +        return;
>> +    }
>> +
>> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
>> +
>> +    qemu_mutex_lock(&remote_iommu_table.lock);
>> +    g_hash_table_remove(remote_iommu_table.elem_by_bdf, INT2VOIDP(pci_bdf));
>> +    qemu_mutex_unlock(&remote_iommu_table.lock);
>> +}
>> +
>> +void remote_configure_iommu(PCIBus *pci_bus)
>> +{
>> +    if (!remote_iommu_table.elem_by_bdf) {
>> +        remote_iommu_table.elem_by_bdf =
>> +            g_hash_table_new_full(NULL, NULL, NULL, remote_iommu_del_elem);
>> +        qemu_mutex_init(&remote_iommu_table.lock);
>> +    }
>> +
>> +    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, &remote_iommu_table);
>> +}
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index e7b0297a63..21694a9698 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3599,6 +3599,8 @@ F: hw/remote/iohub.c
>> F: include/hw/remote/iohub.h
>> F: subprojects/libvfio-user
>> F: hw/remote/vfio-user-obj.c
>> +F: hw/remote/iommu.c
>> +F: include/hw/remote/iommu.h
>> 
>> EBPF:
>> M: Jason Wang <jasowang@redhat.com>
>> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
>> index 534ac5df79..bcef83c8cc 100644
>> --- a/hw/remote/meson.build
>> +++ b/hw/remote/meson.build
>> @@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
>> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
>> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
>> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
>> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iommu.c'))
>> remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: files('vfio-user-obj.c'))
>> 
>> remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-04-13 14:37                       ` Igor Mammedov
@ 2022-04-13 19:08                         ` Peter Xu
  2022-04-19  8:48                           ` Igor Mammedov
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Xu @ 2022-04-13 19:08 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: eduardo, Elena Ufimtseva, dgilbert, John Johnson, Jag Raman,
	bleal, john.levon, Michael S. Tsirkin, armbru, Jason Wang,
	quintela, qemu-devel, f4bug, Alex Williamson, Kanth Ghatraju,
	berrange, Stefan Hajnoczi, thanos.makatos, pbonzini, eblake

On Wed, Apr 13, 2022 at 04:37:35PM +0200, Igor Mammedov wrote:
> On Thu, 31 Mar 2022 08:41:01 -0400
> Peter Xu <peterx@redhat.com> wrote:
> 
> > On Thu, Mar 31, 2022 at 10:47:33AM +0100, Stefan Hajnoczi wrote:
> > > On Wed, Mar 30, 2022 at 01:13:03PM -0400, Peter Xu wrote:  
> > > > On Wed, Mar 30, 2022 at 05:08:24PM +0100, Stefan Hajnoczi wrote:  
> > > > > On Wed, Mar 30, 2022 at 08:53:16AM -0400, Peter Xu wrote:  
> > > > > > On Wed, Mar 30, 2022 at 11:04:24AM +0100, Stefan Hajnoczi wrote:  
> > > > > > > This makes me wonder whether there is a deeper issue with the
> > > > > > > pci_setup_iommu() API: the lack of per-device cleanup callbacks.
> > > > > > > Per-device IOMMU resources should be freed when a device is hot
> > > > > > > unplugged.
> > > > > > > 
> > > > > > > From what I can tell this is not the case today:
> > > > > > > 
> > > > > > > - hw/i386/intel_iommu.c:vtd_find_add_as() allocates and adds device
> > > > > > >   address spaces but I can't find where they are removed and freed.
> > > > > > >   VTDAddressSpace instances pointed to from vtd_bus->dev_as[] are leaked.
> > > > > > > 
> > > > > > > - hw/i386/amd_iommu.c has similar leaks.  
> > > > > > 
> > > > > > AFAICT it's because there's no device-specific data cached in the
> > > > > > per-device IOMMU address space, at least so far.  IOW, all the data
> > > > > > structures allocated here can be re-used when a new device is plugged in
> > > > > > after the old device unplugged.
> > > > > > 
> > > > > > It's definitely not ideal since after unplug (and before a new device
> > > > > > plugged in) the resource is not needed at all so it's kind of wasted, but
> > > > > > it should work functionally.  If to achieve that, some iommu_unplug() or
> > > > > > iommu_cleanup() hook sounds reasonable.  
> > > > > 
> > > > > I guess the question is whether PCI busses can be hotplugged with
> > > > > IOMMUs. If yes, then there is a memory leak that matters for
> > > > > intel_iommu.c and amd_iommu.c.  
> > > > 
> > > > It can't, and we only support one vIOMMU so far for both (commit
> > > > 1b3bf13890fd849b26).  Thanks,  
> > > 
> > > I see, thanks!
> > > 
> > > Okay, summarizing options for the vfio-user IOMMU:
> > > 
> > > 1. Use the same singleton approach as existing IOMMUs where the
> > >    MemoryRegion/AddressSpace are never freed. Don't bother deleting.
> > > 
> > > 2. Keep the approach in this patch where vfio-user code manually calls a
> > >    custom delete function (not part of the pci_setup_iommu() API). This
> > >    is slightly awkward to do without global state and that's what
> > >    started this discussion.
> > > 
> > > 3. Introduce an optional pci_setup_iommu() callback:
> > > 
> > >    typdef void (*PCIIOMMUDeviceUnplug)(PCIBus *bus, void *opaque, int devfn);
> > > 
> > >    Solves the awkwardness of option #2. Not needed by existing IOMMU
> > >    devices.  
> > 
> > Looks all workable to me.  One tiny thing is if we'd like 3) we may want to
> > pass over the PCIDevice* too because in this case IIUC we'd need to double
> > check the device class before doing anything - we may not want to call the
> > vfio-user callbacks for general emulated devices under the same pci bus.
> > 
> > I think we could also fetch that from PCIBus.devices[devfn] but that's just
> > not as obvious.
> > 
> > Option 4) is as mentioned previously, that we add another device unplug
> > hook that can be registered per-device.  I just didn't think thoroughly on
> can you expand on why per device hook is needed?

E.g. when the pci bus that contains the vfio-user device also contains
another emulated device?  Then IIUC we only want to call the vfio-user hook
for the vfio-user device, not the rest ones on the same bus?

Per-bus will work too, but again then the per-bus hook will need to first
identify the PCIDevice* object so it'll work similarly as a per-device hook.

> 
> > how it would interact with the current HotplugHandler design yet.. it looks
> > quite similar but so far it's either for the machine type or pci bus, not
> > capable of registering on one single device (and it's always a mistery to
> > me why we'd rather ignore the per-bus hook if the machine hook
> > existed.. that's in qdev_get_hotplug_handler).
> 
> machine hook is there for bus-less devices mainly, if it's not defined
> code will fallback to bus handler if any exists.
> 
> However machine hook can also be used to override default hotplug chain
> to do to implement non trivial plug/unplug flow.
> for example see pc_get_hotplug_handler(), corresponding
> pc_machine_device_[pre_plug|plug|unplug...]_cb() where
> plug/unplug chain is altered for some PCI devices types.
> Perhaps the same can be done for vfio.

It just seems non-obvious, no?  For example, if someone implementes a pci
bus with hotplug_handler() being provided, it will surprise me a bit if
it's triggered conditionally, depending on which machine type the bus is
attached to, and whether this machine type has get_hotplug_handler().

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-04-13 18:25     ` Jag Raman
@ 2022-04-13 21:59       ` Jag Raman
  0 siblings, 0 replies; 44+ messages in thread
From: Jag Raman @ 2022-04-13 21:59 UTC (permalink / raw)
  To: Igor Mammedov
  Cc: eduardo, Elena Ufimtseva, berrange, bleal, John Johnson,
	john.levon, qemu-devel, armbru, quintela, alex.williamson,
	pbonzini, mst, stefanha, thanos.makatos, Kanth Ghatraju, eblake,
	dgilbert, f4bug



> On Apr 13, 2022, at 2:24 PM, Jag Raman <jag.raman@oracle.com> wrote:
> 
> 
> 
>> On Apr 13, 2022, at 10:25 AM, Igor Mammedov <imammedo@redhat.com> wrote:
>> 
>> On Fri, 25 Mar 2022 15:19:41 -0400
>> Jagannathan Raman <jag.raman@oracle.com> wrote:
>> 
>>> Assign separate address space for each device in the remote processes.
>>> 
>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>>> ---
>>> include/hw/remote/iommu.h | 18 ++++++++
>>> hw/remote/iommu.c         | 95 +++++++++++++++++++++++++++++++++++++++
>>> MAINTAINERS               |  2 +
>>> hw/remote/meson.build     |  1 +
>>> 4 files changed, 116 insertions(+)
>>> create mode 100644 include/hw/remote/iommu.h
>>> create mode 100644 hw/remote/iommu.c
>>> 
>>> diff --git a/include/hw/remote/iommu.h b/include/hw/remote/iommu.h
>>> new file mode 100644
>>> index 0000000000..8f850400f1
>>> --- /dev/null
>>> +++ b/include/hw/remote/iommu.h
>>> @@ -0,0 +1,18 @@
>>> +/**
>>> + * Copyright © 2022 Oracle and/or its affiliates.
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>> + * See the COPYING file in the top-level directory.
>>> + *
>>> + */
>>> +
>>> +#ifndef REMOTE_IOMMU_H
>>> +#define REMOTE_IOMMU_H
>>> +
>>> +#include "hw/pci/pci_bus.h"
>>> +
>>> +void remote_configure_iommu(PCIBus *pci_bus);
>>> +
>>> +void remote_iommu_del_device(PCIDevice *pci_dev);
>>> +
>>> +#endif
>>> diff --git a/hw/remote/iommu.c b/hw/remote/iommu.c
>>> new file mode 100644
>>> index 0000000000..13f329b45d
>>> --- /dev/null
>>> +++ b/hw/remote/iommu.c
>>> @@ -0,0 +1,95 @@
>>> +/**
>>> + * IOMMU for remote device
>>> + *
>>> + * Copyright © 2022 Oracle and/or its affiliates.
>>> + *
>>> + * This work is licensed under the terms of the GNU GPL, version 2 or later.
>>> + * See the COPYING file in the top-level directory.
>>> + *
>>> + */
>>> +
>>> +#include "qemu/osdep.h"
>>> +#include "qemu-common.h"
>>> +
>>> +#include "hw/remote/iommu.h"
>>> +#include "hw/pci/pci_bus.h"
>>> +#include "hw/pci/pci.h"
>>> +#include "exec/memory.h"
>>> +#include "exec/address-spaces.h"
>>> +#include "trace.h"
>>> +
>>> +struct RemoteIommuElem {
>>> +    AddressSpace  as;
>>> +    MemoryRegion  mr;
>>> +};
>>> +
>>> +struct RemoteIommuTable {
>>> +    QemuMutex lock;
>>> +    GHashTable *elem_by_bdf;
>>> +} remote_iommu_table;
>>> +
>>> +#define INT2VOIDP(i) (void *)(uintptr_t)(i)
>>> +
>>> +static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
>>> +                                              void *opaque, int devfn)
>>> +{
>>> +    struct RemoteIommuTable *iommu_table = opaque;
>>> +    struct RemoteIommuElem *elem = NULL;
>>> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_bus), devfn);
>>> +
>>> +    elem = g_hash_table_lookup(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf));
>>> +
>>> +    if (!elem) {
>>> +        g_autofree char *mr_name = g_strdup_printf("vfu-ram-%d", pci_bdf);
>>> +        g_autofree char *as_name = g_strdup_printf("vfu-as-%d", pci_bdf);
>>> +
>>> +        elem = g_malloc0(sizeof(struct RemoteIommuElem));
>>> +
>>> +        memory_region_init(&elem->mr, NULL, mr_name, UINT64_MAX);
>> goes here:
>>  memory_region_do_init()
>>       if (!owner) {                                                            
>>           owner = container_get(qdev_get_machine(), "/unattached");            
>>       }  
>> 
>> then
>> 
>>> +        address_space_init(&elem->as, &elem->mr, as_name);
>>> +
>>> +        qemu_mutex_lock(&iommu_table->lock);
>>> +        g_hash_table_insert(iommu_table->elem_by_bdf, INT2VOIDP(pci_bdf), elem);
>>> +        qemu_mutex_unlock(&iommu_table->lock);
>>> +    }
>>> +
>>> +    return &elem->as;
>>> +}
>>> +
>>> +static void remote_iommu_del_elem(gpointer data)
>>> +{
>>> +    struct RemoteIommuElem *elem = data;
>>> +
>>> +    g_assert(elem);
>>> +
>>> +    memory_region_unref(&elem->mr);
>> 
>> here we call
>>     object_unref(mr->owner); 
>> leaving dangling pointer in owner '(qdev_get_machine(), "/unattached")'
>> it doesn't look correct
>> 
>> I thought that memory_region_unref() should be always paired with memory_region_ref()
>> 
>> and looking at memory_region_init(...owner...) history it looks like
>> owner-less (NULL) regions are not meant to be deleted ever.
> 
> Hi Igor,
> 
> Thanks for the pointers about ref counters for MemoryRegions.
> 
> It makes sense - MemoryRegions are not QEMU Objects. So their
> owner’s ref counters are used instead. So the expectation is that
> when the owner is destroyed, the MemoryRegions initialized by them
> also get destroyed simultaneously.

Well, MemoryRegions are indeed QEMU objects -
"memory_region_init() -> object_initialize()" initializes the object.
So we should be able to unref the MemoryRegion object directly.

We could make the PCIDevice as the owner of its IOMMU region -
when the device is finalized, its region would be finalized as well.

Given the above, I don’t think we would need a separate delete
function (such as remote_iommu_del_device()). When the device is
unplugged, its MemoryRegion would be finalized automatically.

Thank you!
--
Jag

> 
> In this case, RemoteIommuElem->mr does not have an owner. Therefore,
> we don’t have to manage its ref counter. When RemoteIommuElem is
> destroyed, the MemoryRegion should be cleaned up automatically.
> 
> --
> Jag
> 
>> 
>>> +    address_space_destroy(&elem->as);
>>> +
>>> +    g_free(elem);
>>> +}
>>> +
>>> +void remote_iommu_del_device(PCIDevice *pci_dev)
>>> +{
>>> +    int pci_bdf;
>>> +
>>> +    if (!remote_iommu_table.elem_by_bdf || !pci_dev) {
>>> +        return;
>>> +    }
>>> +
>>> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
>>> +
>>> +    qemu_mutex_lock(&remote_iommu_table.lock);
>>> +    g_hash_table_remove(remote_iommu_table.elem_by_bdf, INT2VOIDP(pci_bdf));
>>> +    qemu_mutex_unlock(&remote_iommu_table.lock);
>>> +}
>>> +
>>> +void remote_configure_iommu(PCIBus *pci_bus)
>>> +{
>>> +    if (!remote_iommu_table.elem_by_bdf) {
>>> +        remote_iommu_table.elem_by_bdf =
>>> +            g_hash_table_new_full(NULL, NULL, NULL, remote_iommu_del_elem);
>>> +        qemu_mutex_init(&remote_iommu_table.lock);
>>> +    }
>>> +
>>> +    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, &remote_iommu_table);
>>> +}
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index e7b0297a63..21694a9698 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -3599,6 +3599,8 @@ F: hw/remote/iohub.c
>>> F: include/hw/remote/iohub.h
>>> F: subprojects/libvfio-user
>>> F: hw/remote/vfio-user-obj.c
>>> +F: hw/remote/iommu.c
>>> +F: include/hw/remote/iommu.h
>>> 
>>> EBPF:
>>> M: Jason Wang <jasowang@redhat.com>
>>> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
>>> index 534ac5df79..bcef83c8cc 100644
>>> --- a/hw/remote/meson.build
>>> +++ b/hw/remote/meson.build
>>> @@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
>>> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
>>> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
>>> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
>>> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iommu.c'))
>>> remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: files('vfio-user-obj.c'))
>>> 
>>> remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v7 12/17] vfio-user: IOMMU support for remote device
  2022-04-13 19:08                         ` Peter Xu
@ 2022-04-19  8:48                           ` Igor Mammedov
  0 siblings, 0 replies; 44+ messages in thread
From: Igor Mammedov @ 2022-04-19  8:48 UTC (permalink / raw)
  To: Peter Xu
  Cc: eduardo, Elena Ufimtseva, dgilbert, John Johnson, Jag Raman,
	bleal, john.levon, Michael S. Tsirkin, armbru, Jason Wang,
	quintela, qemu-devel, f4bug, Alex Williamson, Kanth Ghatraju,
	berrange, Stefan Hajnoczi, thanos.makatos, pbonzini, eblake

On Wed, 13 Apr 2022 15:08:33 -0400
Peter Xu <peterx@redhat.com> wrote:

> On Wed, Apr 13, 2022 at 04:37:35PM +0200, Igor Mammedov wrote:
> > On Thu, 31 Mar 2022 08:41:01 -0400
> > Peter Xu <peterx@redhat.com> wrote:
> >   
> > > On Thu, Mar 31, 2022 at 10:47:33AM +0100, Stefan Hajnoczi wrote:  
> > > > On Wed, Mar 30, 2022 at 01:13:03PM -0400, Peter Xu wrote:    
> > > > > On Wed, Mar 30, 2022 at 05:08:24PM +0100, Stefan Hajnoczi wrote:    
> > > > > > On Wed, Mar 30, 2022 at 08:53:16AM -0400, Peter Xu wrote:    
> > > > > > > On Wed, Mar 30, 2022 at 11:04:24AM +0100, Stefan Hajnoczi wrote:    
> > > > > > > > This makes me wonder whether there is a deeper issue with the
> > > > > > > > pci_setup_iommu() API: the lack of per-device cleanup callbacks.
> > > > > > > > Per-device IOMMU resources should be freed when a device is hot
> > > > > > > > unplugged.
> > > > > > > > 
> > > > > > > > From what I can tell this is not the case today:
> > > > > > > > 
> > > > > > > > - hw/i386/intel_iommu.c:vtd_find_add_as() allocates and adds device
> > > > > > > >   address spaces but I can't find where they are removed and freed.
> > > > > > > >   VTDAddressSpace instances pointed to from vtd_bus->dev_as[] are leaked.
> > > > > > > > 
> > > > > > > > - hw/i386/amd_iommu.c has similar leaks.    
> > > > > > > 
> > > > > > > AFAICT it's because there's no device-specific data cached in the
> > > > > > > per-device IOMMU address space, at least so far.  IOW, all the data
> > > > > > > structures allocated here can be re-used when a new device is plugged in
> > > > > > > after the old device unplugged.
> > > > > > > 
> > > > > > > It's definitely not ideal since after unplug (and before a new device
> > > > > > > plugged in) the resource is not needed at all so it's kind of wasted, but
> > > > > > > it should work functionally.  If to achieve that, some iommu_unplug() or
> > > > > > > iommu_cleanup() hook sounds reasonable.    
> > > > > > 
> > > > > > I guess the question is whether PCI busses can be hotplugged with
> > > > > > IOMMUs. If yes, then there is a memory leak that matters for
> > > > > > intel_iommu.c and amd_iommu.c.    
> > > > > 
> > > > > It can't, and we only support one vIOMMU so far for both (commit
> > > > > 1b3bf13890fd849b26).  Thanks,    
> > > > 
> > > > I see, thanks!
> > > > 
> > > > Okay, summarizing options for the vfio-user IOMMU:
> > > > 
> > > > 1. Use the same singleton approach as existing IOMMUs where the
> > > >    MemoryRegion/AddressSpace are never freed. Don't bother deleting.
> > > > 
> > > > 2. Keep the approach in this patch where vfio-user code manually calls a
> > > >    custom delete function (not part of the pci_setup_iommu() API). This
> > > >    is slightly awkward to do without global state and that's what
> > > >    started this discussion.
> > > > 
> > > > 3. Introduce an optional pci_setup_iommu() callback:
> > > > 
> > > >    typdef void (*PCIIOMMUDeviceUnplug)(PCIBus *bus, void *opaque, int devfn);
> > > > 
> > > >    Solves the awkwardness of option #2. Not needed by existing IOMMU
> > > >    devices.    
> > > 
> > > Looks all workable to me.  One tiny thing is if we'd like 3) we may want to
> > > pass over the PCIDevice* too because in this case IIUC we'd need to double
> > > check the device class before doing anything - we may not want to call the
> > > vfio-user callbacks for general emulated devices under the same pci bus.
> > > 
> > > I think we could also fetch that from PCIBus.devices[devfn] but that's just
> > > not as obvious.
> > > 
> > > Option 4) is as mentioned previously, that we add another device unplug
> > > hook that can be registered per-device.  I just didn't think thoroughly on  
> > can you expand on why per device hook is needed?  
> 
> E.g. when the pci bus that contains the vfio-user device also contains
> another emulated device?  Then IIUC we only want to call the vfio-user hook
> for the vfio-user device, not the rest ones on the same bus?
> 
> Per-bus will work too, but again then the per-bus hook will need to first
> identify the PCIDevice* object so it'll work similarly as a per-device hook.

Question is if hook is really needed,
and why doing cleanup in vfio-usr.unrealize() is not sufficient.

also there are realize/unrealize listener hooks, that could
help to hook random code to plug/unplug workflow as the last resort
(i.e. avoid properly dividing responsibility between emulated
device models) on top of that it doesn't have any error handling
so hooks must not fail ever.

(also it's generic vfio issue as it's written as a mix of backend+frontend
code which is confusing at times and makes it hard to distinguish what
belongs where (unless one has an intimate knowledge of how it should
be working))

> > > how it would interact with the current HotplugHandler design yet.. it looks
> > > quite similar but so far it's either for the machine type or pci bus, not
> > > capable of registering on one single device (and it's always a mistery to
> > > me why we'd rather ignore the per-bus hook if the machine hook
> > > existed.. that's in qdev_get_hotplug_handler).  
> > 
> > machine hook is there for bus-less devices mainly, if it's not defined
> > code will fallback to bus handler if any exists.
> > 
> > However machine hook can also be used to override default hotplug chain
> > to do to implement non trivial plug/unplug flow.
> > for example see pc_get_hotplug_handler(), corresponding
> > pc_machine_device_[pre_plug|plug|unplug...]_cb() where
> > plug/unplug chain is altered for some PCI devices types.
> > Perhaps the same can be done for vfio.  
> 
> It just seems non-obvious, no?  For example, if someone implementes a pci
> bus with hotplug_handler() being provided, it will surprise me a bit if
> it's triggered conditionally, depending on which machine type the bus is
> attached to, and whether this machine type has get_hotplug_handler().

bus level handler is called anyways, with the difference that plug/unplug
flow could be additionally redirected to other relevant components.

It might be confusing at the beginning, the idea behind, having single
entry point to override flow at machine level, is that it's the top most
object that governs all other devices, wires them together and is allowed
to reshape default wiring between attached devices without violating
relationship between components. (drawback is that approach adds quite
a bit of boilerplate, but no one has suggested/implemented any other
idea for generic device wiring).

Adding specialized hooks to generic bus code for a random device quirks
might work too, but it's not obvious either, scattered through codebase
often polluting  generic code with device specific workarounds, dn it's
harder to review/judge if proposed hook is not just another layer
violation.
Well I don't know enough about vfio/iommu to make useful suggestion
but adding to generic PCI bus/device vfio induced hooks does look
suspicious to me.


> Thanks,
> 



^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2022-04-19  8:51 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-25 19:19 [PATCH v7 00/17] vfio-user server in QEMU Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 01/17] tests/avocado: Specify target VM argument to helper routines Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 02/17] qdev: unplug blocker for devices Jagannathan Raman
2022-03-29 10:08   ` Stefan Hajnoczi
2022-03-25 19:19 ` [PATCH v7 03/17] remote/machine: add HotplugHandler for remote machine Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 04/17] remote/machine: add vfio-user property Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 05/17] configure: require cmake 3.19 or newer Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 06/17] vfio-user: build library Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 07/17] vfio-user: define vfio-user-server object Jagannathan Raman
2022-03-29 10:21   ` Stefan Hajnoczi
2022-03-29 14:03     ` Jag Raman
2022-03-25 19:19 ` [PATCH v7 08/17] vfio-user: instantiate vfio-user context Jagannathan Raman
2022-03-29 10:26   ` Stefan Hajnoczi
2022-03-25 19:19 ` [PATCH v7 09/17] vfio-user: find and init PCI device Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 10/17] vfio-user: run vfio-user context Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 11/17] vfio-user: handle PCI config space accesses Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 12/17] vfio-user: IOMMU support for remote device Jagannathan Raman
2022-03-29 12:35   ` Stefan Hajnoczi
2022-03-29 14:12     ` Jag Raman
2022-03-29 14:48       ` Stefan Hajnoczi
2022-03-29 19:58         ` Jag Raman
2022-03-30 10:04           ` Stefan Hajnoczi
2022-03-30 12:53             ` Peter Xu
2022-03-30 16:08               ` Stefan Hajnoczi
2022-03-30 17:13                 ` Peter Xu
2022-03-31  9:47                   ` Stefan Hajnoczi
2022-03-31 12:41                     ` Peter Xu
2022-04-13 14:37                       ` Igor Mammedov
2022-04-13 19:08                         ` Peter Xu
2022-04-19  8:48                           ` Igor Mammedov
2022-04-13 14:25   ` Igor Mammedov
2022-04-13 18:25     ` Jag Raman
2022-04-13 21:59       ` Jag Raman
2022-03-25 19:19 ` [PATCH v7 13/17] vfio-user: handle DMA mappings Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 14/17] vfio-user: handle PCI BAR accesses Jagannathan Raman
2022-03-29 12:50   ` Stefan Hajnoczi
2022-03-29 15:51     ` Jag Raman
2022-03-30 10:05       ` Stefan Hajnoczi
2022-03-30 14:46         ` Jag Raman
2022-03-30 16:06           ` Stefan Hajnoczi
2022-03-25 19:19 ` [PATCH v7 15/17] vfio-user: handle device interrupts Jagannathan Raman
2022-03-25 19:19 ` [PATCH v7 16/17] vfio-user: handle reset of remote device Jagannathan Raman
2022-03-29 14:46   ` Stefan Hajnoczi
2022-03-25 19:19 ` [PATCH v7 17/17] vfio-user: avocado tests for vfio-user Jagannathan Raman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.