All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v10 00/14] vfio-user server in QEMU
@ 2022-05-24 15:30 Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 01/14] qdev: unplug blocker for devices Jagannathan Raman
                   ` (14 more replies)
  0 siblings, 15 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Hi,

This is v10 of the server side changes to enable vfio-user in QEMU.

Thank you for reviewing and sharing your feedback for the previous
revision. We have addressed your comments in this revision.

We have dropped the following patches in this series:
  - tests/avocado: Specify target VM argument to helper routines
  - configure: require cmake 3.19 or newer
  - vfio-user: avocado tests for vfio-user

We have also made the following changes:
  [PATCH v10 1/14] qdev: unplug blocker for devices
    - updated functions comments for unplug blockers in hw/qdev-core.h

  [PATCH v10 4/14] vfio-user: build library
    - uses meson build system to build libvfio-user library
    - dropped ubuntu CI build

  [PATCH v10 5/14] vfio-user: define vfio-user-server object
    - updated comments for VfioUserServerProperties in qapi/qom.json

  [PATCH v10 6/14] vfio-user: instantiate vfio-user context
    - added comments to vfu_object_init_ctx() explaining function contract

  [PATCH v10 8/14] vfio-user: run vfio-user context
    - vfu_object_ctx_run() asserts that VfuObject->device is not NULL
    - added a comment to vfu_object_ctx_run() explaining why
      VfuObject->device wouldn't be NULL

Thank you very much!

Jagannathan Raman (14):
  qdev: unplug blocker for devices
  remote/machine: add HotplugHandler for remote machine
  remote/machine: add vfio-user property
  vfio-user: build library
  vfio-user: define vfio-user-server object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: IOMMU support for remote device
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  vfio-user: handle reset of remote device

 configure                               |  17 +
 meson.build                             |  23 +-
 qapi/misc.json                          |  31 +
 qapi/qom.json                           |  20 +-
 include/exec/memory.h                   |   3 +
 include/hw/pci/pci.h                    |  13 +
 include/hw/qdev-core.h                  |  29 +
 include/hw/remote/iommu.h               |  40 ++
 include/hw/remote/machine.h             |   4 +
 include/hw/remote/vfio-user-obj.h       |   6 +
 hw/core/qdev.c                          |  24 +
 hw/pci/msi.c                            |  16 +-
 hw/pci/msix.c                           |  10 +-
 hw/pci/pci.c                            |  13 +
 hw/remote/iommu.c                       | 131 ++++
 hw/remote/machine.c                     |  88 ++-
 hw/remote/vfio-user-obj.c               | 914 ++++++++++++++++++++++++
 softmmu/physmem.c                       |   4 +-
 softmmu/qdev-monitor.c                  |   4 +
 stubs/vfio-user-obj.c                   |   6 +
 tests/qtest/fuzz/generic_fuzz.c         |   9 +-
 .gitlab-ci.d/buildtest.yml              |   1 +
 .gitmodules                             |   3 +
 Kconfig.host                            |   4 +
 MAINTAINERS                             |   5 +
 hw/remote/Kconfig                       |   4 +
 hw/remote/meson.build                   |   4 +
 hw/remote/trace-events                  |  11 +
 meson_options.txt                       |   2 +
 stubs/meson.build                       |   1 +
 subprojects/libvfio-user                |   1 +
 tests/docker/dockerfiles/centos8.docker |   2 +
 32 files changed, 1424 insertions(+), 19 deletions(-)
 create mode 100644 include/hw/remote/iommu.h
 create mode 100644 include/hw/remote/vfio-user-obj.h
 create mode 100644 hw/remote/iommu.c
 create mode 100644 hw/remote/vfio-user-obj.c
 create mode 100644 stubs/vfio-user-obj.c
 create mode 160000 subprojects/libvfio-user

-- 
2.20.1



^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v10 01/14] qdev: unplug blocker for devices
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 02/14] remote/machine: add HotplugHandler for remote machine Jagannathan Raman
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Add blocker to prevent hot-unplug of devices

TYPE_VFIO_USER_SERVER, which is introduced shortly, attaches itself to a
PCIDevice on which it depends. If the attached PCIDevice gets removed
while the server in use, it could cause it crash. To prevent this,
TYPE_VFIO_USER_SERVER adds an unplug blocker for the PCIDevice.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/qdev-core.h | 29 +++++++++++++++++++++++++++++
 hw/core/qdev.c         | 24 ++++++++++++++++++++++++
 softmmu/qdev-monitor.c |  4 ++++
 3 files changed, 57 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 92c3d65208..98774e2835 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -193,6 +193,7 @@ struct DeviceState {
     int instance_id_alias;
     int alias_required_for_version;
     ResettableState reset;
+    GSList *unplug_blockers;
 };
 
 struct DeviceListener {
@@ -419,6 +420,34 @@ void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
 void qdev_machine_creation_done(void);
 bool qdev_machine_modified(void);
 
+/**
+ * qdev_add_unplug_blocker: Add an unplug blocker to a device
+ *
+ * @dev: Device to be blocked from unplug
+ * @reason: Reason for blocking
+ */
+void qdev_add_unplug_blocker(DeviceState *dev, Error *reason);
+
+/**
+ * qdev_del_unplug_blocker: Remove an unplug blocker from a device
+ *
+ * @dev: Device to be unblocked
+ * @reason: Pointer to the Error used with qdev_add_unplug_blocker.
+ *          Used as a handle to lookup the blocker for deletion.
+ */
+void qdev_del_unplug_blocker(DeviceState *dev, Error *reason);
+
+/**
+ * qdev_unplug_blocked: Confirm if a device is blocked from unplug
+ *
+ * @dev: Device to be tested
+ * @reason: Returns one of the reasons why the device is blocked,
+ *          if any
+ *
+ * Returns: true if device is blocked from unplug, false otherwise
+ */
+bool qdev_unplug_blocked(DeviceState *dev, Error **errp);
+
 /**
  * GpioPolarity: Polarity of a GPIO line
  *
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 84f3019440..0806d8fcaa 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -468,6 +468,28 @@ char *qdev_get_dev_path(DeviceState *dev)
     return NULL;
 }
 
+void qdev_add_unplug_blocker(DeviceState *dev, Error *reason)
+{
+    dev->unplug_blockers = g_slist_prepend(dev->unplug_blockers, reason);
+}
+
+void qdev_del_unplug_blocker(DeviceState *dev, Error *reason)
+{
+    dev->unplug_blockers = g_slist_remove(dev->unplug_blockers, reason);
+}
+
+bool qdev_unplug_blocked(DeviceState *dev, Error **errp)
+{
+    ERRP_GUARD();
+
+    if (dev->unplug_blockers) {
+        error_propagate(errp, error_copy(dev->unplug_blockers->data));
+        return true;
+    }
+
+    return false;
+}
+
 static bool device_get_realized(Object *obj, Error **errp)
 {
     DeviceState *dev = DEVICE(obj);
@@ -704,6 +726,8 @@ static void device_finalize(Object *obj)
 
     DeviceState *dev = DEVICE(obj);
 
+    g_assert(!dev->unplug_blockers);
+
     QLIST_FOREACH_SAFE(ngl, &dev->gpios, node, next) {
         QLIST_REMOVE(ngl, node);
         qemu_free_irqs(ngl->in, ngl->num_in);
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 12fe60c467..9cfd59d17c 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -898,6 +898,10 @@ void qdev_unplug(DeviceState *dev, Error **errp)
     HotplugHandlerClass *hdc;
     Error *local_err = NULL;
 
+    if (qdev_unplug_blocked(dev, errp)) {
+        return;
+    }
+
     if (dev->parent_bus && !qbus_is_hotpluggable(dev->parent_bus)) {
         error_setg(errp, QERR_BUS_NO_HOTPLUG, dev->parent_bus->name);
         return;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 02/14] remote/machine: add HotplugHandler for remote machine
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 01/14] qdev: unplug blocker for devices Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 03/14] remote/machine: add vfio-user property Jagannathan Raman
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Allow hotplugging of PCI(e) devices to remote machine

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/machine.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 92d71d47bb..a97e53e250 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -20,6 +20,7 @@
 #include "qapi/error.h"
 #include "hw/pci/pci_host.h"
 #include "hw/remote/iohub.h"
+#include "hw/qdev-core.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -53,14 +54,19 @@ static void remote_machine_init(MachineState *machine)
 
     pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
                  &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+
+    qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
 
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
 
     mc->init = remote_machine_init;
     mc->desc = "Experimental remote machine";
+
+    hc->unplug = qdev_simple_device_unplug_cb;
 }
 
 static const TypeInfo remote_machine = {
@@ -68,6 +74,10 @@ static const TypeInfo remote_machine = {
     .parent = TYPE_MACHINE,
     .instance_size = sizeof(RemoteMachineState),
     .class_init = remote_machine_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    }
 };
 
 static void remote_machine_register_types(void)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 03/14] remote/machine: add vfio-user property
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 01/14] qdev: unplug blocker for devices Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 02/14] remote/machine: add HotplugHandler for remote machine Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 04/14] vfio-user: build library Jagannathan Raman
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Add vfio-user to x-remote machine. It is a boolean, which indicates if
the machine supports vfio-user protocol. The machine configures the bus
differently vfio-user and multiprocess protocols, so this property
informs it on how to configure the bus.

This property should be short lived. Once vfio-user fully replaces
multiprocess, this property could be removed.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/remote/machine.h |  2 ++
 hw/remote/machine.c         | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index 2a2a33c4b2..8d0fa98d33 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -22,6 +22,8 @@ struct RemoteMachineState {
 
     RemotePCIHost *host;
     RemoteIOHubState iohub;
+
+    bool vfio_user;
 };
 
 /* Used to pass to co-routine device and ioc. */
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index a97e53e250..9f3cdc55c3 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -58,6 +58,25 @@ static void remote_machine_init(MachineState *machine)
     qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
 
+static bool remote_machine_get_vfio_user(Object *obj, Error **errp)
+{
+    RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+    return s->vfio_user;
+}
+
+static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
+{
+    RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+    if (phase_check(PHASE_MACHINE_CREATED)) {
+        error_setg(errp, "Error enabling vfio-user - machine already created");
+        return;
+    }
+
+    s->vfio_user = value;
+}
+
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -67,6 +86,10 @@ static void remote_machine_class_init(ObjectClass *oc, void *data)
     mc->desc = "Experimental remote machine";
 
     hc->unplug = qdev_simple_device_unplug_cb;
+
+    object_class_property_add_bool(oc, "vfio-user",
+                                   remote_machine_get_vfio_user,
+                                   remote_machine_set_vfio_user);
 }
 
 static const TypeInfo remote_machine = {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 04/14] vfio-user: build library
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (2 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 03/14] remote/machine: add vfio-user property Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-25 14:46   ` Stefan Hajnoczi
  2022-05-24 15:30 ` [PATCH v10 05/14] vfio-user: define vfio-user-server object Jagannathan Raman
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

add the libvfio-user library as a submodule. build it as a meson
subproject.

libvfio-user is distributed with BSD 3-Clause license and
json-c with MIT (Expat) license

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 configure                               | 17 +++++++++++++++++
 meson.build                             | 23 ++++++++++++++++++++++-
 .gitlab-ci.d/buildtest.yml              |  1 +
 .gitmodules                             |  3 +++
 Kconfig.host                            |  4 ++++
 MAINTAINERS                             |  1 +
 hw/remote/Kconfig                       |  4 ++++
 hw/remote/meson.build                   |  2 ++
 meson_options.txt                       |  2 ++
 subprojects/libvfio-user                |  1 +
 tests/docker/dockerfiles/centos8.docker |  2 ++
 11 files changed, 59 insertions(+), 1 deletion(-)
 create mode 160000 subprojects/libvfio-user

diff --git a/configure b/configure
index 180ee688dc..d6a36ba8e6 100755
--- a/configure
+++ b/configure
@@ -301,6 +301,7 @@ meson_args=""
 ninja=""
 bindir="bin"
 skip_meson=no
+vfio_user_server="disabled"
 
 # The following Meson options are handled manually (still they
 # are included in the automatically generated help message)
@@ -891,6 +892,10 @@ for opt do
   ;;
   --disable-blobs) meson_option_parse --disable-install-blobs ""
   ;;
+  --enable-vfio-user-server) vfio_user_server="enabled"
+  ;;
+  --disable-vfio-user-server) vfio_user_server="disabled"
+  ;;
   --enable-tcmalloc) meson_option_parse --enable-malloc=tcmalloc tcmalloc
   ;;
   --enable-jemalloc) meson_option_parse --enable-malloc=jemalloc jemalloc
@@ -1796,6 +1801,17 @@ case "$slirp" in
     ;;
 esac
 
+##########################################
+# check for vfio_user_server
+
+case "$vfio_user_server" in
+  enabled )
+    if test "$git_submodules_action" != "ignore"; then
+      git_submodules="${git_submodules} subprojects/libvfio-user"
+    fi
+    ;;
+esac
+
 ##########################################
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -2207,6 +2223,7 @@ if test "$skip_meson" = no; then
   test "$slirp" != auto && meson_option_add "-Dslirp=$slirp"
   test "$smbd" != '' && meson_option_add "-Dsmbd=$smbd"
   test "$tcg" != enabled && meson_option_add "-Dtcg=$tcg"
+  test "$vfio_user_server" != auto && meson_option_add "-Dvfio_user_server=$vfio_user_server"
   run_meson() {
     NINJA=$ninja $meson setup --prefix "$prefix" "$@" $cross_arg "$PWD" "$source_path"
   }
diff --git a/meson.build b/meson.build
index 9ebc00f032..6e66bb5a8c 100644
--- a/meson.build
+++ b/meson.build
@@ -308,6 +308,10 @@ multiprocess_allowed = get_option('multiprocess') \
   .require(targetos == 'linux', error_message: 'Multiprocess QEMU is supported only on Linux') \
   .allowed()
 
+vfio_user_server_allowed = get_option('vfio_user_server') \
+  .require(targetos == 'linux', error_message: 'vfio-user server is supported only on Linux') \
+  .allowed()
+
 have_tpm = get_option('tpm') \
   .require(targetos != 'windows', error_message: 'TPM emulation only available on POSIX systems') \
   .allowed()
@@ -2358,7 +2362,8 @@ host_kconfig = \
   (have_virtfs ? ['CONFIG_VIRTFS=y'] : []) + \
   ('CONFIG_LINUX' in config_host ? ['CONFIG_LINUX=y'] : []) + \
   (have_pvrdma ? ['CONFIG_PVRDMA=y'] : []) + \
-  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : [])
+  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : []) + \
+  (vfio_user_server_allowed ? ['CONFIG_VFIO_USER_SERVER_ALLOWED=y'] : [])
 
 ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
 
@@ -2650,6 +2655,21 @@ if have_system
   endif
 endif
 
+libvfio_user_dep = not_found
+if have_system and vfio_user_server_allowed
+  have_internal = fs.exists(meson.current_source_dir() / 'subprojects/libvfio-user/meson.build')
+
+  if not have_internal
+    error('libvfio-user source not found - please pull git submodule')
+  endif
+
+  libvfio_user_proj = subproject('libvfio-user')
+
+  libvfio_user_lib = libvfio_user_proj.get_variable('libvfio_user_dep')
+
+  libvfio_user_dep = declare_dependency(dependencies: [libvfio_user_lib])
+endif
+
 fdt = not_found
 if have_system
   fdt_opt = get_option('fdt')
@@ -3760,6 +3780,7 @@ summary_info += {'target list':       ' '.join(target_dirs)}
 if have_system
   summary_info += {'default devices':   get_option('default_devices')}
   summary_info += {'out of process emulation': multiprocess_allowed}
+  summary_info += {'vfio-user server': vfio_user_server_allowed}
 endif
 summary(summary_info, bool_yn: true, section: 'Targets and accelerators')
 
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index e9620c3074..2263a412e8 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -166,6 +166,7 @@ build-system-centos:
     IMAGE: centos8
     CONFIGURE_ARGS: --disable-nettle --enable-gcrypt --enable-fdt=system
       --enable-modules --enable-trace-backends=dtrace --enable-docs
+      --enable-vfio-user-server
     TARGETS: ppc64-softmmu or1k-softmmu s390x-softmmu
       x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
     MAKE_CHECK_ARGS: check-build
diff --git a/.gitmodules b/.gitmodules
index b8bff47df8..c4e66ddb6f 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -64,3 +64,6 @@
 [submodule "tests/lcitool/libvirt-ci"]
 	path = tests/lcitool/libvirt-ci
 	url = https://gitlab.com/libvirt/libvirt-ci.git
+[submodule "subprojects/libvfio-user"]
+	path = subprojects/libvfio-user
+	url = https://github.com/nutanix/libvfio-user.git
diff --git a/Kconfig.host b/Kconfig.host
index 1165c4eacd..d763d89269 100644
--- a/Kconfig.host
+++ b/Kconfig.host
@@ -42,3 +42,7 @@ config MULTIPROCESS_ALLOWED
 config FUZZ
     bool
     select SPARSE_MEM
+
+config VFIO_USER_SERVER_ALLOWED
+    bool
+    imply VFIO_USER_SERVER
diff --git a/MAINTAINERS b/MAINTAINERS
index dff0200f70..c92d87daac 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3602,6 +3602,7 @@ F: hw/remote/proxy-memory-listener.c
 F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
+F: subprojects/libvfio-user
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
index 08c16e235f..2d6b4f4cf4 100644
--- a/hw/remote/Kconfig
+++ b/hw/remote/Kconfig
@@ -2,3 +2,7 @@ config MULTIPROCESS
     bool
     depends on PCI && PCI_EXPRESS && KVM
     select REMOTE_PCIHOST
+
+config VFIO_USER_SERVER
+    bool
+    depends on MULTIPROCESS
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index e6a5574242..7da83350c8 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -7,6 +7,8 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
 
+remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: libvfio_user_dep)
+
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
 
diff --git a/meson_options.txt b/meson_options.txt
index 2de94af037..2bf2d20b42 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -88,6 +88,8 @@ option('cfi_debug', type: 'boolean', value: 'false',
        description: 'Verbose errors in case of CFI violation')
 option('multiprocess', type: 'feature', value: 'auto',
        description: 'Out of process device emulation support')
+option('vfio_user_server', type: 'feature', value: 'disabled',
+       description: 'vfio-user server support')
 option('dbus_display', type: 'feature', value: 'auto',
        description: '-display dbus support')
 option('tpm', type : 'feature', value : 'auto',
diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
new file mode 160000
index 0000000000..b52bff72d4
--- /dev/null
+++ b/subprojects/libvfio-user
@@ -0,0 +1 @@
+Subproject commit b52bff72d4eb646a453d19e19ddbd13ed6111a09
diff --git a/tests/docker/dockerfiles/centos8.docker b/tests/docker/dockerfiles/centos8.docker
index 4b20925bbf..10618bfa83 100644
--- a/tests/docker/dockerfiles/centos8.docker
+++ b/tests/docker/dockerfiles/centos8.docker
@@ -51,6 +51,7 @@ RUN dnf update -y && \
         libbpf-devel \
         libcacard-devel \
         libcap-ng-devel \
+        libcmocka-devel \
         libcurl-devel \
         libdrm-devel \
         libepoxy-devel \
@@ -59,6 +60,7 @@ RUN dnf update -y && \
         libgcrypt-devel \
         libiscsi-devel \
         libjpeg-devel \
+        json-c-devel \
         libnfs-devel \
         libpmem-devel \
         libpng-devel \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 05/14] vfio-user: define vfio-user-server object
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (3 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 04/14] vfio-user: build library Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 06/14] vfio-user: instantiate vfio-user context Jagannathan Raman
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Define vfio-user object which is remote process server for QEMU. Setup
object initialization functions and properties necessary to instantiate
the object

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/qom.json               |  20 +++-
 include/hw/remote/machine.h |   2 +
 hw/remote/machine.c         |  27 +++++
 hw/remote/vfio-user-obj.c   | 210 ++++++++++++++++++++++++++++++++++++
 MAINTAINERS                 |   1 +
 hw/remote/meson.build       |   1 +
 hw/remote/trace-events      |   3 +
 7 files changed, 262 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c

diff --git a/qapi/qom.json b/qapi/qom.json
index 6a653c6636..80dd419b39 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -734,6 +734,20 @@
 { 'struct': 'RemoteObjectProperties',
   'data': { 'fd': 'str', 'devid': 'str' } }
 
+##
+# @VfioUserServerProperties:
+#
+# Properties for x-vfio-user-server objects.
+#
+# @socket: socket to be used by the libvfio-user library
+#
+# @device: the ID of the device to be emulated at the server
+#
+# Since: 7.1
+##
+{ 'struct': 'VfioUserServerProperties',
+  'data': { 'socket': 'SocketAddress', 'device': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -874,7 +888,8 @@
     'tls-creds-psk',
     'tls-creds-x509',
     'tls-cipher-suites',
-    { 'name': 'x-remote-object', 'features': [ 'unstable' ] }
+    { 'name': 'x-remote-object', 'features': [ 'unstable' ] },
+    { 'name': 'x-vfio-user-server', 'features': [ 'unstable' ] }
   ] }
 
 ##
@@ -938,7 +953,8 @@
       'tls-creds-psk':              'TlsCredsPskProperties',
       'tls-creds-x509':             'TlsCredsX509Properties',
       'tls-cipher-suites':          'TlsCredsProperties',
-      'x-remote-object':            'RemoteObjectProperties'
+      'x-remote-object':            'RemoteObjectProperties',
+      'x-vfio-user-server':         'VfioUserServerProperties'
   } }
 
 ##
diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index 8d0fa98d33..ac32fda387 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -24,6 +24,8 @@ struct RemoteMachineState {
     RemoteIOHubState iohub;
 
     bool vfio_user;
+
+    bool auto_shutdown;
 };
 
 /* Used to pass to co-routine device and ioc. */
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 9f3cdc55c3..4d008ed721 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -77,6 +77,28 @@ static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
     s->vfio_user = value;
 }
 
+static bool remote_machine_get_auto_shutdown(Object *obj, Error **errp)
+{
+    RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+    return s->auto_shutdown;
+}
+
+static void remote_machine_set_auto_shutdown(Object *obj, bool value,
+                                             Error **errp)
+{
+    RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+    s->auto_shutdown = value;
+}
+
+static void remote_machine_instance_init(Object *obj)
+{
+    RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+    s->auto_shutdown = true;
+}
+
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -90,12 +112,17 @@ static void remote_machine_class_init(ObjectClass *oc, void *data)
     object_class_property_add_bool(oc, "vfio-user",
                                    remote_machine_get_vfio_user,
                                    remote_machine_set_vfio_user);
+
+    object_class_property_add_bool(oc, "auto-shutdown",
+                                   remote_machine_get_auto_shutdown,
+                                   remote_machine_set_auto_shutdown);
 }
 
 static const TypeInfo remote_machine = {
     .name = TYPE_REMOTE_MACHINE,
     .parent = TYPE_MACHINE,
     .instance_size = sizeof(RemoteMachineState),
+    .instance_init = remote_machine_instance_init,
     .class_init = remote_machine_class_init,
     .interfaces = (InterfaceInfo[]) {
         { TYPE_HOTPLUG_HANDLER },
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
new file mode 100644
index 0000000000..bc49adcc27
--- /dev/null
+++ b/hw/remote/vfio-user-obj.c
@@ -0,0 +1,210 @@
+/**
+ * QEMU vfio-user-server server object
+ *
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/**
+ * Usage: add options:
+ *     -machine x-remote,vfio-user=on,auto-shutdown=on
+ *     -device <PCI-device>,id=<pci-dev-id>
+ *     -object x-vfio-user-server,id=<id>,type=unix,path=<socket-path>,
+ *             device=<pci-dev-id>
+ *
+ * Note that x-vfio-user-server object must be used with x-remote machine only.
+ * This server could only support PCI devices for now.
+ *
+ * type - SocketAddress type - presently "unix" alone is supported. Required
+ *        option
+ *
+ * path - named unix socket, it will be created by the server. It is
+ *        a required option
+ *
+ * device - id of a device on the server, a required option. PCI devices
+ *          alone are supported presently.
+ */
+
+#include "qemu/osdep.h"
+
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "sysemu/runstate.h"
+#include "hw/boards.h"
+#include "hw/remote/machine.h"
+#include "qapi/error.h"
+#include "qapi/qapi-visit-sockets.h"
+
+#define TYPE_VFU_OBJECT "x-vfio-user-server"
+OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
+
+/**
+ * VFU_OBJECT_ERROR - reports an error message. If auto_shutdown
+ * is set, it aborts the machine on error. Otherwise, it logs an
+ * error message without aborting.
+ */
+#define VFU_OBJECT_ERROR(o, fmt, ...)                                     \
+    {                                                                     \
+        if (vfu_object_auto_shutdown()) {                                 \
+            error_setg(&error_abort, (fmt), ## __VA_ARGS__);              \
+        } else {                                                          \
+            error_report((fmt), ## __VA_ARGS__);                          \
+        }                                                                 \
+    }                                                                     \
+
+struct VfuObjectClass {
+    ObjectClass parent_class;
+
+    unsigned int nr_devs;
+};
+
+struct VfuObject {
+    /* private */
+    Object parent;
+
+    SocketAddress *socket;
+
+    char *device;
+
+    Error *err;
+};
+
+static bool vfu_object_auto_shutdown(void)
+{
+    bool auto_shutdown = true;
+    Error *local_err = NULL;
+
+    if (!current_machine) {
+        return auto_shutdown;
+    }
+
+    auto_shutdown = object_property_get_bool(OBJECT(current_machine),
+                                             "auto-shutdown",
+                                             &local_err);
+
+    /*
+     * local_err would be set if no such property exists - safe to ignore.
+     * Unlikely scenario as auto-shutdown is always defined for
+     * TYPE_REMOTE_MACHINE, and  TYPE_VFU_OBJECT only works with
+     * TYPE_REMOTE_MACHINE
+     */
+    if (local_err) {
+        auto_shutdown = true;
+        error_free(local_err);
+    }
+
+    return auto_shutdown;
+}
+
+static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
+                                  void *opaque, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    qapi_free_SocketAddress(o->socket);
+
+    o->socket = NULL;
+
+    visit_type_SocketAddress(v, name, &o->socket, errp);
+
+    if (o->socket->type != SOCKET_ADDRESS_TYPE_UNIX) {
+        error_setg(errp, "vfu: Unsupported socket type - %s",
+                   SocketAddressType_str(o->socket->type));
+        qapi_free_SocketAddress(o->socket);
+        o->socket = NULL;
+        return;
+    }
+
+    trace_vfu_prop("socket", o->socket->u.q_unix.path);
+}
+
+static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    g_free(o->device);
+
+    o->device = g_strdup(str);
+
+    trace_vfu_prop("device", str);
+}
+
+static void vfu_object_init(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
+
+    k->nr_devs++;
+
+    if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
+        error_setg(&o->err, "vfu: %s only compatible with %s machine",
+                   TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
+        return;
+    }
+}
+
+static void vfu_object_finalize(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
+
+    k->nr_devs--;
+
+    qapi_free_SocketAddress(o->socket);
+
+    o->socket = NULL;
+
+    g_free(o->device);
+
+    o->device = NULL;
+
+    if (!k->nr_devs && vfu_object_auto_shutdown()) {
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    }
+}
+
+static void vfu_object_class_init(ObjectClass *klass, void *data)
+{
+    VfuObjectClass *k = VFU_OBJECT_CLASS(klass);
+
+    k->nr_devs = 0;
+
+    object_class_property_add(klass, "socket", "SocketAddress", NULL,
+                              vfu_object_set_socket, NULL, NULL);
+    object_class_property_set_description(klass, "socket",
+                                          "SocketAddress "
+                                          "(ex: type=unix,path=/tmp/sock). "
+                                          "Only UNIX is presently supported");
+    object_class_property_add_str(klass, "device", NULL,
+                                  vfu_object_set_device);
+    object_class_property_set_description(klass, "device",
+                                          "device ID - only PCI devices "
+                                          "are presently supported");
+}
+
+static const TypeInfo vfu_object_info = {
+    .name = TYPE_VFU_OBJECT,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(VfuObject),
+    .instance_init = vfu_object_init,
+    .instance_finalize = vfu_object_finalize,
+    .class_size = sizeof(VfuObjectClass),
+    .class_init = vfu_object_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void vfu_register_types(void)
+{
+    type_register_static(&vfu_object_info);
+}
+
+type_init(vfu_register_types);
diff --git a/MAINTAINERS b/MAINTAINERS
index c92d87daac..8121671228 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3603,6 +3603,7 @@ F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
+F: hw/remote/vfio-user-obj.c
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index 7da83350c8..0eb5a0f375 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: files('vfio-user-obj.c'))
 
 remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: libvfio_user_dep)
 
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 0b23974f90..7da12f0d96 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -2,3 +2,6 @@
 
 mpqemu_send_io_error(int cmd, int size, int nfds) "send command %d size %d, %d file descriptors to remote process"
 mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d, %d file descriptors to remote process"
+
+# vfio-user-obj.c
+vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 06/14] vfio-user: instantiate vfio-user context
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (4 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 05/14] vfio-user: define vfio-user-server object Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 07/14] vfio-user: find and init PCI device Jagannathan Raman
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

create a context with the vfio-user library to run a PCI device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/vfio-user-obj.c | 96 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 96 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index bc49adcc27..68aac0c2b9 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -40,6 +40,9 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/notify.h"
+#include "sysemu/sysemu.h"
+#include "libvfio-user.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -73,8 +76,14 @@ struct VfuObject {
     char *device;
 
     Error *err;
+
+    Notifier machine_done;
+
+    vfu_ctx_t *vfu_ctx;
 };
 
+static void vfu_object_init_ctx(VfuObject *o, Error **errp);
+
 static bool vfu_object_auto_shutdown(void)
 {
     bool auto_shutdown = true;
@@ -107,6 +116,11 @@ static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
 {
     VfuObject *o = VFU_OBJECT(obj);
 
+    if (o->vfu_ctx) {
+        error_setg(errp, "vfu: Unable to set socket property - server busy");
+        return;
+    }
+
     qapi_free_SocketAddress(o->socket);
 
     o->socket = NULL;
@@ -122,17 +136,83 @@ static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
     }
 
     trace_vfu_prop("socket", o->socket->u.q_unix.path);
+
+    vfu_object_init_ctx(o, errp);
 }
 
 static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
 {
     VfuObject *o = VFU_OBJECT(obj);
 
+    if (o->vfu_ctx) {
+        error_setg(errp, "vfu: Unable to set device property - server busy");
+        return;
+    }
+
     g_free(o->device);
 
     o->device = g_strdup(str);
 
     trace_vfu_prop("device", str);
+
+    vfu_object_init_ctx(o, errp);
+}
+
+/*
+ * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
+ * properties. It also depends on devices instantiated in QEMU. These
+ * dependencies are not available during the instance_init phase of this
+ * object's life-cycle. As such, the server is initialized after the
+ * machine is setup. machine_init_done_notifier notifies TYPE_VFU_OBJECT
+ * when the machine is setup, and the dependencies are available.
+ */
+static void vfu_object_machine_done(Notifier *notifier, void *data)
+{
+    VfuObject *o = container_of(notifier, VfuObject, machine_done);
+    Error *err = NULL;
+
+    vfu_object_init_ctx(o, &err);
+
+    if (err) {
+        error_propagate(&error_abort, err);
+    }
+}
+
+/**
+ * vfu_object_init_ctx: Create and initialize libvfio-user context. Add
+ *   an unplug blocker for the associated PCI device. Setup a FD handler
+ *   to process incoming messages in the context's socket.
+ *
+ *   The socket and device properties are mandatory, and this function
+ *   will not create the context without them - the setters for these
+ *   properties should call this function when the property is set. The
+ *   machine should also be ready when this function is invoked - it is
+ *   because QEMU objects are initialized before devices, and the
+ *   associated PCI device wouldn't be available at the object
+ *   initialization time. Until these conditions are satisfied, this
+ *   function would return early without performing any task.
+ */
+static void vfu_object_init_ctx(VfuObject *o, Error **errp)
+{
+    ERRP_GUARD();
+
+    if (o->vfu_ctx || !o->socket || !o->device ||
+            !phase_check(PHASE_MACHINE_READY)) {
+        return;
+    }
+
+    if (o->err) {
+        error_propagate(errp, o->err);
+        o->err = NULL;
+        return;
+    }
+
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+                                o, VFU_DEV_TYPE_PCI);
+    if (o->vfu_ctx == NULL) {
+        error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
+        return;
+    }
 }
 
 static void vfu_object_init(Object *obj)
@@ -147,6 +227,12 @@ static void vfu_object_init(Object *obj)
                    TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
         return;
     }
+
+    if (!phase_check(PHASE_MACHINE_READY)) {
+        o->machine_done.notify = vfu_object_machine_done;
+        qemu_add_machine_init_done_notifier(&o->machine_done);
+    }
+
 }
 
 static void vfu_object_finalize(Object *obj)
@@ -160,6 +246,11 @@ static void vfu_object_finalize(Object *obj)
 
     o->socket = NULL;
 
+    if (o->vfu_ctx) {
+        vfu_destroy_ctx(o->vfu_ctx);
+        o->vfu_ctx = NULL;
+    }
+
     g_free(o->device);
 
     o->device = NULL;
@@ -167,6 +258,11 @@ static void vfu_object_finalize(Object *obj)
     if (!k->nr_devs && vfu_object_auto_shutdown()) {
         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     }
+
+    if (o->machine_done.notify) {
+        qemu_remove_machine_init_done_notifier(&o->machine_done);
+        o->machine_done.notify = NULL;
+    }
 }
 
 static void vfu_object_class_init(ObjectClass *klass, void *data)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 07/14] vfio-user: find and init PCI device
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (5 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 06/14] vfio-user: instantiate vfio-user context Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 08/14] vfio-user: run vfio-user context Jagannathan Raman
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Find the PCI device with specified id. Initialize the device context
with the QEMU PCI device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/vfio-user-obj.c | 67 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 68aac0c2b9..fdee274933 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -43,6 +43,8 @@
 #include "qemu/notify.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
+#include "hw/qdev-core.h"
+#include "hw/pci/pci.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -80,6 +82,10 @@ struct VfuObject {
     Notifier machine_done;
 
     vfu_ctx_t *vfu_ctx;
+
+    PCIDevice *pci_dev;
+
+    Error *unplug_blocker;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -195,6 +201,9 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
 static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 {
     ERRP_GUARD();
+    DeviceState *dev = NULL;
+    vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
+    int ret;
 
     if (o->vfu_ctx || !o->socket || !o->device ||
             !phase_check(PHASE_MACHINE_READY)) {
@@ -213,6 +222,53 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
         return;
     }
+
+    dev = qdev_find_recursive(sysbus_get_default(), o->device);
+    if (dev == NULL) {
+        error_setg(errp, "vfu: Device %s not found", o->device);
+        goto fail;
+    }
+
+    if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        error_setg(errp, "vfu: %s not a PCI device", o->device);
+        goto fail;
+    }
+
+    o->pci_dev = PCI_DEVICE(dev);
+
+    object_ref(OBJECT(o->pci_dev));
+
+    if (pci_is_express(o->pci_dev)) {
+        pci_type = VFU_PCI_TYPE_EXPRESS;
+    }
+
+    ret = vfu_pci_init(o->vfu_ctx, pci_type, PCI_HEADER_TYPE_NORMAL, 0);
+    if (ret < 0) {
+        error_setg(errp,
+                   "vfu: Failed to attach PCI device %s to context - %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
+    error_setg(&o->unplug_blocker,
+               "vfu: %s for %s must be deleted before unplugging",
+               TYPE_VFU_OBJECT, o->device);
+    qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+
+    return;
+
+fail:
+    vfu_destroy_ctx(o->vfu_ctx);
+    if (o->unplug_blocker && o->pci_dev) {
+        qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+        error_free(o->unplug_blocker);
+        o->unplug_blocker = NULL;
+    }
+    if (o->pci_dev) {
+        object_unref(OBJECT(o->pci_dev));
+        o->pci_dev = NULL;
+    }
+    o->vfu_ctx = NULL;
 }
 
 static void vfu_object_init(Object *obj)
@@ -255,6 +311,17 @@ static void vfu_object_finalize(Object *obj)
 
     o->device = NULL;
 
+    if (o->unplug_blocker && o->pci_dev) {
+        qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+        error_free(o->unplug_blocker);
+        o->unplug_blocker = NULL;
+    }
+
+    if (o->pci_dev) {
+        object_unref(OBJECT(o->pci_dev));
+        o->pci_dev = NULL;
+    }
+
     if (!k->nr_devs && vfu_object_auto_shutdown()) {
         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     }
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 08/14] vfio-user: run vfio-user context
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (6 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 07/14] vfio-user: find and init PCI device Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 09/14] vfio-user: handle PCI config space accesses Jagannathan Raman
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Setup a handler to run vfio-user context. The context is driven by
messages to the file descriptor associated with it - get the fd for
the context and hook up the handler with it

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/misc.json            |  31 ++++++++++++
 hw/remote/vfio-user-obj.c | 104 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 134 insertions(+), 1 deletion(-)

diff --git a/qapi/misc.json b/qapi/misc.json
index 45344483cd..27ef5a2b20 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -553,3 +553,34 @@
 ##
 { 'event': 'RTC_CHANGE',
   'data': { 'offset': 'int', 'qom-path': 'str' } }
+
+##
+# @VFU_CLIENT_HANGUP:
+#
+# Emitted when the client of a TYPE_VFIO_USER_SERVER closes the
+# communication channel
+#
+# @vfu-id: ID of the TYPE_VFIO_USER_SERVER object. It is the last component
+#          of @vfu-qom-path referenced below
+#
+# @vfu-qom-path: path to the TYPE_VFIO_USER_SERVER object in the QOM tree
+#
+# @dev-id: ID of attached PCI device
+#
+# @dev-qom-path: path to attached PCI device in the QOM tree
+#
+# Since: 7.1
+#
+# Example:
+#
+# <- { "event": "VFU_CLIENT_HANGUP",
+#      "data": { "vfu-id": "vfu1",
+#                "vfu-qom-path": "/objects/vfu1",
+#                "dev-id": "sas1",
+#                "dev-qom-path": "/machine/peripheral/sas1" },
+#      "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+#
+##
+{ 'event': 'VFU_CLIENT_HANGUP',
+  'data': { 'vfu-id': 'str', 'vfu-qom-path': 'str',
+            'dev-id': 'str', 'dev-qom-path': 'str' } }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index fdee274933..fb5c46331c 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -27,6 +27,9 @@
  *
  * device - id of a device on the server, a required option. PCI devices
  *          alone are supported presently.
+ *
+ * notes - x-vfio-user-server could block IO and monitor during the
+ *         initialization phase.
  */
 
 #include "qemu/osdep.h"
@@ -40,11 +43,14 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qapi/qapi-events-misc.h"
 #include "qemu/notify.h"
+#include "qemu/thread.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
+#include "qemu/timer.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -86,6 +92,8 @@ struct VfuObject {
     PCIDevice *pci_dev;
 
     Error *unplug_blocker;
+
+    int vfu_poll_fd;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -164,6 +172,78 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
     vfu_object_init_ctx(o, errp);
 }
 
+static void vfu_object_ctx_run(void *opaque)
+{
+    VfuObject *o = opaque;
+    const char *vfu_id;
+    char *vfu_path, *pci_dev_path;
+    int ret = -1;
+
+    while (ret != 0) {
+        ret = vfu_run_ctx(o->vfu_ctx);
+        if (ret < 0) {
+            if (errno == EINTR) {
+                continue;
+            } else if (errno == ENOTCONN) {
+                vfu_id = object_get_canonical_path_component(OBJECT(o));
+                vfu_path = object_get_canonical_path(OBJECT(o));
+                g_assert(o->pci_dev);
+                pci_dev_path = object_get_canonical_path(OBJECT(o->pci_dev));
+                 /* o->device is a required property and is non-NULL here */
+                g_assert(o->device);
+                qapi_event_send_vfu_client_hangup(vfu_id, vfu_path,
+                                                  o->device, pci_dev_path);
+                qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+                o->vfu_poll_fd = -1;
+                object_unparent(OBJECT(o));
+                g_free(vfu_path);
+                g_free(pci_dev_path);
+                break;
+            } else {
+                VFU_OBJECT_ERROR(o, "vfu: Failed to run device %s - %s",
+                                 o->device, strerror(errno));
+                break;
+            }
+        }
+    }
+}
+
+static void vfu_object_attach_ctx(void *opaque)
+{
+    VfuObject *o = opaque;
+    GPollFD pfds[1];
+    int ret;
+
+    qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+
+    pfds[0].fd = o->vfu_poll_fd;
+    pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+
+retry_attach:
+    ret = vfu_attach_ctx(o->vfu_ctx);
+    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
+        /**
+         * vfu_object_attach_ctx can block QEMU's main loop
+         * during attach - the monitor and other IO
+         * could be unresponsive during this time.
+         */
+        (void)qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS);
+        goto retry_attach;
+    } else if (ret < 0) {
+        VFU_OBJECT_ERROR(o, "vfu: Failed to attach device %s to context - %s",
+                         o->device, strerror(errno));
+        return;
+    }
+
+    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+    if (o->vfu_poll_fd < 0) {
+        VFU_OBJECT_ERROR(o, "vfu: Failed to get poll fd %s", o->device);
+        return;
+    }
+
+    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -216,7 +296,8 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         return;
     }
 
-    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path,
+                                LIBVFIO_USER_FLAG_ATTACH_NB,
                                 o, VFU_DEV_TYPE_PCI);
     if (o->vfu_ctx == NULL) {
         error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
@@ -255,6 +336,21 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
                TYPE_VFU_OBJECT, o->device);
     qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
 
+    ret = vfu_realize_ctx(o->vfu_ctx);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to realize device %s- %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
+    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+    if (o->vfu_poll_fd < 0) {
+        error_setg(errp, "vfu: Failed to get poll fd %s", o->device);
+        goto fail;
+    }
+
+    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_attach_ctx, NULL, o);
+
     return;
 
 fail:
@@ -289,6 +385,7 @@ static void vfu_object_init(Object *obj)
         qemu_add_machine_init_done_notifier(&o->machine_done);
     }
 
+    o->vfu_poll_fd = -1;
 }
 
 static void vfu_object_finalize(Object *obj)
@@ -302,6 +399,11 @@ static void vfu_object_finalize(Object *obj)
 
     o->socket = NULL;
 
+    if (o->vfu_poll_fd != -1) {
+        qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+        o->vfu_poll_fd = -1;
+    }
+
     if (o->vfu_ctx) {
         vfu_destroy_ctx(o->vfu_ctx);
         o->vfu_ctx = NULL;
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 09/14] vfio-user: handle PCI config space accesses
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (7 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 08/14] vfio-user: run vfio-user context Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 10/14] vfio-user: IOMMU support for remote device Jagannathan Raman
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Define and register handlers for PCI config space accesses

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/vfio-user-obj.c | 51 +++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 2 files changed, 53 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index fb5c46331c..575bd47397 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -46,6 +46,7 @@
 #include "qapi/qapi-events-misc.h"
 #include "qemu/notify.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
@@ -244,6 +245,45 @@ retry_attach:
     qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
 }
 
+static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
+                                     size_t count, loff_t offset,
+                                     const bool is_write)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint32_t pci_access_width = sizeof(uint32_t);
+    size_t bytes = count;
+    uint32_t val = 0;
+    char *ptr = buf;
+    int len;
+
+    /*
+     * Writes to the BAR registers would trigger an update to the
+     * global Memory and IO AddressSpaces. But the remote device
+     * never uses the global AddressSpaces, therefore overlapping
+     * memory regions are not a problem
+     */
+    while (bytes > 0) {
+        len = (bytes > pci_access_width) ? pci_access_width : bytes;
+        if (is_write) {
+            memcpy(&val, ptr, len);
+            pci_host_config_write_common(o->pci_dev, offset,
+                                         pci_config_size(o->pci_dev),
+                                         val, len);
+            trace_vfu_cfg_write(offset, val);
+        } else {
+            val = pci_host_config_read_common(o->pci_dev, offset,
+                                              pci_config_size(o->pci_dev), len);
+            memcpy(ptr, &val, len);
+            trace_vfu_cfg_read(offset, val);
+        }
+        offset += len;
+        ptr += len;
+        bytes -= len;
+    }
+
+    return count;
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -336,6 +376,17 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
                TYPE_VFU_OBJECT, o->device);
     qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
 
+    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_CFG_REGION_IDX,
+                           pci_config_size(o->pci_dev), &vfu_object_cfg_access,
+                           VFU_REGION_FLAG_RW | VFU_REGION_FLAG_ALWAYS_CB,
+                           NULL, 0, -1, 0);
+    if (ret < 0) {
+        error_setg(errp,
+                   "vfu: Failed to setup config space handlers for %s- %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 7da12f0d96..2ef7884346 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,3 +5,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 10/14] vfio-user: IOMMU support for remote device
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (8 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 09/14] vfio-user: handle PCI config space accesses Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 11/14] vfio-user: handle DMA mappings Jagannathan Raman
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Assign separate address space for each device in the remote processes.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/remote/iommu.h |  40 ++++++++++++
 hw/remote/iommu.c         | 131 ++++++++++++++++++++++++++++++++++++++
 hw/remote/machine.c       |  13 +++-
 MAINTAINERS               |   2 +
 hw/remote/meson.build     |   1 +
 5 files changed, 186 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/remote/iommu.h
 create mode 100644 hw/remote/iommu.c

diff --git a/include/hw/remote/iommu.h b/include/hw/remote/iommu.h
new file mode 100644
index 0000000000..33b68a8f4b
--- /dev/null
+++ b/include/hw/remote/iommu.h
@@ -0,0 +1,40 @@
+/**
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef REMOTE_IOMMU_H
+#define REMOTE_IOMMU_H
+
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
+
+#ifndef INT2VOIDP
+#define INT2VOIDP(i) (void *)(uintptr_t)(i)
+#endif
+
+typedef struct RemoteIommuElem {
+    MemoryRegion *mr;
+
+    AddressSpace as;
+} RemoteIommuElem;
+
+#define TYPE_REMOTE_IOMMU "x-remote-iommu"
+OBJECT_DECLARE_SIMPLE_TYPE(RemoteIommu, REMOTE_IOMMU)
+
+struct RemoteIommu {
+    Object parent;
+
+    GHashTable *elem_by_devfn;
+
+    QemuMutex lock;
+};
+
+void remote_iommu_setup(PCIBus *pci_bus);
+
+void remote_iommu_unplug_dev(PCIDevice *pci_dev);
+
+#endif
diff --git a/hw/remote/iommu.c b/hw/remote/iommu.c
new file mode 100644
index 0000000000..fd723d91f3
--- /dev/null
+++ b/hw/remote/iommu.c
@@ -0,0 +1,131 @@
+/**
+ * IOMMU for remote device
+ *
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/remote/iommu.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
+#include "trace.h"
+
+/**
+ * IOMMU for TYPE_REMOTE_MACHINE - manages DMA address space isolation
+ *     for remote machine. It is used by TYPE_VFIO_USER_SERVER.
+ *
+ * - Each TYPE_VFIO_USER_SERVER instance handles one PCIDevice on a PCIBus.
+ *   There is one RemoteIommu per PCIBus, so the RemoteIommu tracks multiple
+ *   PCIDevices by maintaining a ->elem_by_devfn mapping.
+ *
+ * - memory_region_init_iommu() is not used because vfio-user MemoryRegions
+ *   will be added to the elem->mr container instead. This is more natural
+ *   than implementing the IOMMUMemoryRegionClass APIs since vfio-user
+ *   provides something that is close to a full-fledged MemoryRegion and
+ *   not like an IOMMU mapping.
+ *
+ * - When a device is hot unplugged, the elem->mr reference is dropped so
+ *   all vfio-user MemoryRegions associated with this vfio-user server are
+ *   destroyed.
+ */
+
+static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
+                                              void *opaque, int devfn)
+{
+    RemoteIommu *iommu = opaque;
+    RemoteIommuElem *elem = NULL;
+
+    qemu_mutex_lock(&iommu->lock);
+
+    elem = g_hash_table_lookup(iommu->elem_by_devfn, INT2VOIDP(devfn));
+
+    if (!elem) {
+        elem = g_malloc0(sizeof(RemoteIommuElem));
+        g_hash_table_insert(iommu->elem_by_devfn, INT2VOIDP(devfn), elem);
+    }
+
+    if (!elem->mr) {
+        elem->mr = MEMORY_REGION(object_new(TYPE_MEMORY_REGION));
+        memory_region_set_size(elem->mr, UINT64_MAX);
+        address_space_init(&elem->as, elem->mr, NULL);
+    }
+
+    qemu_mutex_unlock(&iommu->lock);
+
+    return &elem->as;
+}
+
+void remote_iommu_unplug_dev(PCIDevice *pci_dev)
+{
+    AddressSpace *as = pci_device_iommu_address_space(pci_dev);
+    RemoteIommuElem *elem = NULL;
+
+    if (as == &address_space_memory) {
+        return;
+    }
+
+    elem = container_of(as, RemoteIommuElem, as);
+
+    address_space_destroy(&elem->as);
+
+    object_unref(elem->mr);
+
+    elem->mr = NULL;
+}
+
+static void remote_iommu_init(Object *obj)
+{
+    RemoteIommu *iommu = REMOTE_IOMMU(obj);
+
+    iommu->elem_by_devfn = g_hash_table_new_full(NULL, NULL, NULL, g_free);
+
+    qemu_mutex_init(&iommu->lock);
+}
+
+static void remote_iommu_finalize(Object *obj)
+{
+    RemoteIommu *iommu = REMOTE_IOMMU(obj);
+
+    qemu_mutex_destroy(&iommu->lock);
+
+    g_hash_table_destroy(iommu->elem_by_devfn);
+
+    iommu->elem_by_devfn = NULL;
+}
+
+void remote_iommu_setup(PCIBus *pci_bus)
+{
+    RemoteIommu *iommu = NULL;
+
+    g_assert(pci_bus);
+
+    iommu = REMOTE_IOMMU(object_new(TYPE_REMOTE_IOMMU));
+
+    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, iommu);
+
+    object_property_add_child(OBJECT(pci_bus), "remote-iommu", OBJECT(iommu));
+
+    object_unref(OBJECT(iommu));
+}
+
+static const TypeInfo remote_iommu_info = {
+    .name = TYPE_REMOTE_IOMMU,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(RemoteIommu),
+    .instance_init = remote_iommu_init,
+    .instance_finalize = remote_iommu_finalize,
+};
+
+static void remote_iommu_register_types(void)
+{
+    type_register_static(&remote_iommu_info);
+}
+
+type_init(remote_iommu_register_types)
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 4d008ed721..cbb2add291 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -20,6 +20,7 @@
 #include "qapi/error.h"
 #include "hw/pci/pci_host.h"
 #include "hw/remote/iohub.h"
+#include "hw/remote/iommu.h"
 #include "hw/qdev-core.h"
 
 static void remote_machine_init(MachineState *machine)
@@ -99,6 +100,16 @@ static void remote_machine_instance_init(Object *obj)
     s->auto_shutdown = true;
 }
 
+static void remote_machine_dev_unplug_cb(HotplugHandler *hotplug_dev,
+                                         DeviceState *dev, Error **errp)
+{
+    qdev_unrealize(dev);
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        remote_iommu_unplug_dev(PCI_DEVICE(dev));
+    }
+}
+
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -107,7 +118,7 @@ static void remote_machine_class_init(ObjectClass *oc, void *data)
     mc->init = remote_machine_init;
     mc->desc = "Experimental remote machine";
 
-    hc->unplug = qdev_simple_device_unplug_cb;
+    hc->unplug = remote_machine_dev_unplug_cb;
 
     object_class_property_add_bool(oc, "vfio-user",
                                    remote_machine_get_vfio_user,
diff --git a/MAINTAINERS b/MAINTAINERS
index 8121671228..9d8695b68d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3604,6 +3604,8 @@ F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
 F: hw/remote/vfio-user-obj.c
+F: hw/remote/iommu.c
+F: include/hw/remote/iommu.h
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index 0eb5a0f375..ab25c04906 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iommu.c'))
 remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: files('vfio-user-obj.c'))
 
 remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: libvfio_user_dep)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 11/14] vfio-user: handle DMA mappings
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (9 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 10/14] vfio-user: IOMMU support for remote device Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 12/14] vfio-user: handle PCI BAR accesses Jagannathan Raman
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Define and register callbacks to manage the RAM regions used for
device DMA

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/machine.c       |  5 ++++
 hw/remote/vfio-user-obj.c | 55 +++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 3 files changed, 62 insertions(+)

diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index cbb2add291..645b54343d 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -22,6 +22,7 @@
 #include "hw/remote/iohub.h"
 #include "hw/remote/iommu.h"
 #include "hw/qdev-core.h"
+#include "hw/remote/iommu.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -51,6 +52,10 @@ static void remote_machine_init(MachineState *machine)
 
     pci_host = PCI_HOST_BRIDGE(rem_host);
 
+    if (s->vfio_user) {
+        remote_iommu_setup(pci_host->bus);
+    }
+
     remote_iohub_init(&s->iohub);
 
     pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 575bd47397..8d208f1294 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -284,6 +284,54 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
     return count;
 }
 
+static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    AddressSpace *dma_as = NULL;
+    MemoryRegion *subregion = NULL;
+    g_autofree char *name = NULL;
+    struct iovec *iov = &info->iova;
+
+    if (!info->vaddr) {
+        return;
+    }
+
+    name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
+                           (uint64_t)info->vaddr);
+
+    subregion = g_new0(MemoryRegion, 1);
+
+    memory_region_init_ram_ptr(subregion, NULL, name,
+                               iov->iov_len, info->vaddr);
+
+    dma_as = pci_device_iommu_address_space(o->pci_dev);
+
+    memory_region_add_subregion(dma_as->root, (hwaddr)iov->iov_base, subregion);
+
+    trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
+}
+
+static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    AddressSpace *dma_as = NULL;
+    MemoryRegion *mr = NULL;
+    ram_addr_t offset;
+
+    mr = memory_region_from_host(info->vaddr, &offset);
+    if (!mr) {
+        return;
+    }
+
+    dma_as = pci_device_iommu_address_space(o->pci_dev);
+
+    memory_region_del_subregion(dma_as->root, mr);
+
+    object_unparent((OBJECT(mr)));
+
+    trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -387,6 +435,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         goto fail;
     }
 
+    ret = vfu_setup_device_dma(o->vfu_ctx, &dma_register, &dma_unregister);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to setup DMA handlers for %s",
+                   o->device);
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 2ef7884346..f945c7e33b 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
 vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
+vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 12/14] vfio-user: handle PCI BAR accesses
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (10 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 11/14] vfio-user: handle DMA mappings Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-24 15:30 ` [PATCH v10 13/14] vfio-user: handle device interrupts Jagannathan Raman
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Determine the BARs used by the PCI device and register handlers to
manage the access to the same.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/exec/memory.h           |   3 +
 hw/remote/vfio-user-obj.c       | 190 ++++++++++++++++++++++++++++++++
 softmmu/physmem.c               |   4 +-
 tests/qtest/fuzz/generic_fuzz.c |   9 +-
 hw/remote/trace-events          |   3 +
 5 files changed, 203 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index f1c19451bc..a6a0f4d8ad 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2810,6 +2810,9 @@ MemTxResult address_space_write_cached_slow(MemoryRegionCache *cache,
                                             hwaddr addr, const void *buf,
                                             hwaddr len);
 
+int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
+bool prepare_mmio_access(MemoryRegion *mr);
+
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
     if (is_write) {
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 8d208f1294..ee28a93782 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -52,6 +52,7 @@
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
 #include "qemu/timer.h"
+#include "exec/memory.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -332,6 +333,193 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
     trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
 }
 
+static int vfu_object_mr_rw(MemoryRegion *mr, uint8_t *buf, hwaddr offset,
+                            hwaddr size, const bool is_write)
+{
+    uint8_t *ptr = buf;
+    bool release_lock = false;
+    uint8_t *ram_ptr = NULL;
+    MemTxResult result;
+    int access_size;
+    uint64_t val;
+
+    if (memory_access_is_direct(mr, is_write)) {
+        /**
+         * Some devices expose a PCI expansion ROM, which could be buffer
+         * based as compared to other regions which are primarily based on
+         * MemoryRegionOps. memory_region_find() would already check
+         * for buffer overflow, we don't need to repeat it here.
+         */
+        ram_ptr = memory_region_get_ram_ptr(mr);
+
+        if (is_write) {
+            memcpy((ram_ptr + offset), buf, size);
+        } else {
+            memcpy(buf, (ram_ptr + offset), size);
+        }
+
+        return 0;
+    }
+
+    while (size) {
+        /**
+         * The read/write logic used below is similar to the ones in
+         * flatview_read/write_continue()
+         */
+        release_lock = prepare_mmio_access(mr);
+
+        access_size = memory_access_size(mr, size, offset);
+
+        if (is_write) {
+            val = ldn_he_p(ptr, access_size);
+
+            result = memory_region_dispatch_write(mr, offset, val,
+                                                  size_memop(access_size),
+                                                  MEMTXATTRS_UNSPECIFIED);
+        } else {
+            result = memory_region_dispatch_read(mr, offset, &val,
+                                                 size_memop(access_size),
+                                                 MEMTXATTRS_UNSPECIFIED);
+
+            stn_he_p(ptr, access_size, val);
+        }
+
+        if (release_lock) {
+            qemu_mutex_unlock_iothread();
+            release_lock = false;
+        }
+
+        if (result != MEMTX_OK) {
+            return -1;
+        }
+
+        size -= access_size;
+        ptr += access_size;
+        offset += access_size;
+    }
+
+    return 0;
+}
+
+static size_t vfu_object_bar_rw(PCIDevice *pci_dev, int pci_bar,
+                                hwaddr bar_offset, char * const buf,
+                                hwaddr len, const bool is_write)
+{
+    MemoryRegionSection section = { 0 };
+    uint8_t *ptr = (uint8_t *)buf;
+    MemoryRegion *section_mr = NULL;
+    uint64_t section_size;
+    hwaddr section_offset;
+    hwaddr size = 0;
+
+    while (len) {
+        section = memory_region_find(pci_dev->io_regions[pci_bar].memory,
+                                     bar_offset, len);
+
+        if (!section.mr) {
+            warn_report("vfu: invalid address 0x%"PRIx64"", bar_offset);
+            return size;
+        }
+
+        section_mr = section.mr;
+        section_offset = section.offset_within_region;
+        section_size = int128_get64(section.size);
+
+        if (is_write && section_mr->readonly) {
+            warn_report("vfu: attempting to write to readonly region in "
+                        "bar %d - [0x%"PRIx64" - 0x%"PRIx64"]",
+                        pci_bar, bar_offset,
+                        (bar_offset + section_size));
+            memory_region_unref(section_mr);
+            return size;
+        }
+
+        if (vfu_object_mr_rw(section_mr, ptr, section_offset,
+                             section_size, is_write)) {
+            warn_report("vfu: failed to %s "
+                        "[0x%"PRIx64" - 0x%"PRIx64"] in bar %d",
+                        is_write ? "write to" : "read from", bar_offset,
+                        (bar_offset + section_size), pci_bar);
+            memory_region_unref(section_mr);
+            return size;
+        }
+
+        size += section_size;
+        bar_offset += section_size;
+        ptr += section_size;
+        len -= section_size;
+
+        memory_region_unref(section_mr);
+    }
+
+    return size;
+}
+
+/**
+ * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
+ *
+ * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
+ * define vfu_object_bar2_handler
+ */
+#define VFU_OBJECT_BAR_HANDLER(BAR_NO)                                         \
+    static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,        \
+                                        char * const buf, size_t count,        \
+                                        loff_t offset, const bool is_write)    \
+    {                                                                          \
+        VfuObject *o = vfu_get_private(vfu_ctx);                               \
+        PCIDevice *pci_dev = o->pci_dev;                                       \
+                                                                               \
+        return vfu_object_bar_rw(pci_dev, BAR_NO, offset,                      \
+                                 buf, count, is_write);                        \
+    }                                                                          \
+
+VFU_OBJECT_BAR_HANDLER(0)
+VFU_OBJECT_BAR_HANDLER(1)
+VFU_OBJECT_BAR_HANDLER(2)
+VFU_OBJECT_BAR_HANDLER(3)
+VFU_OBJECT_BAR_HANDLER(4)
+VFU_OBJECT_BAR_HANDLER(5)
+VFU_OBJECT_BAR_HANDLER(6)
+
+static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
+    &vfu_object_bar0_handler,
+    &vfu_object_bar1_handler,
+    &vfu_object_bar2_handler,
+    &vfu_object_bar3_handler,
+    &vfu_object_bar4_handler,
+    &vfu_object_bar5_handler,
+    &vfu_object_bar6_handler,
+};
+
+/**
+ * vfu_object_register_bars - Identify active BAR regions of pdev and setup
+ *                            callbacks to handle read/write accesses
+ */
+static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
+{
+    int flags = VFU_REGION_FLAG_RW;
+    int i;
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        if (!pdev->io_regions[i].size) {
+            continue;
+        }
+
+        if ((i == VFU_PCI_DEV_ROM_REGION_IDX) ||
+            pdev->io_regions[i].memory->readonly) {
+            flags &= ~VFU_REGION_FLAG_WRITE;
+        }
+
+        vfu_setup_region(vfu_ctx, VFU_PCI_DEV_BAR0_REGION_IDX + i,
+                         (size_t)pdev->io_regions[i].size,
+                         vfu_object_bar_handlers[i],
+                         flags, NULL, 0, -1, 0);
+
+        trace_vfu_bar_register(i, pdev->io_regions[i].addr,
+                               pdev->io_regions[i].size);
+    }
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -442,6 +630,8 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         goto fail;
     }
 
+    vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 657841eed0..fb16be57a6 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2719,7 +2719,7 @@ void memory_region_flush_rom_device(MemoryRegion *mr, hwaddr addr, hwaddr size)
     invalidate_and_set_dirty(mr, addr, size);
 }
 
-static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
+int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
 {
     unsigned access_size_max = mr->ops->valid.max_access_size;
 
@@ -2746,7 +2746,7 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
     return l;
 }
 
-static bool prepare_mmio_access(MemoryRegion *mr)
+bool prepare_mmio_access(MemoryRegion *mr)
 {
     bool release_lock = false;
 
diff --git a/tests/qtest/fuzz/generic_fuzz.c b/tests/qtest/fuzz/generic_fuzz.c
index 25df19fd5a..447ffe8178 100644
--- a/tests/qtest/fuzz/generic_fuzz.c
+++ b/tests/qtest/fuzz/generic_fuzz.c
@@ -144,7 +144,7 @@ static void *pattern_alloc(pattern p, size_t len)
     return buf;
 }
 
-static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
+static int fuzz_memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
 {
     unsigned access_size_max = mr->ops->valid.max_access_size;
 
@@ -242,11 +242,12 @@ void fuzz_dma_read_cb(size_t addr, size_t len, MemoryRegion *mr)
 
         /*
          *  If mr1 isn't RAM, address_space_translate doesn't update l. Use
-         *  memory_access_size to identify the number of bytes that it is safe
-         *  to write without accidentally writing to another MemoryRegion.
+         *  fuzz_memory_access_size to identify the number of bytes that it
+         *  is safe to write without accidentally writing to another
+         *  MemoryRegion.
          */
         if (!memory_region_is_ram(mr1)) {
-            l = memory_access_size(mr1, l, addr1);
+            l = fuzz_memory_access_size(mr1, l, addr1);
         }
         if (memory_region_is_ram(mr1) ||
             memory_region_is_romd(mr1) ||
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f945c7e33b..847d50d88f 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -9,3 +9,6 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
 vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
+vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
+vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
+vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (11 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 12/14] vfio-user: handle PCI BAR accesses Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-25 14:53   ` Stefan Hajnoczi
  2022-06-06 18:32   ` Alexander Duyck
  2022-05-24 15:30 ` [PATCH v10 14/14] vfio-user: handle reset of remote device Jagannathan Raman
  2022-05-25 14:55 ` [PATCH v10 00/14] vfio-user server in QEMU Stefan Hajnoczi
  14 siblings, 2 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Forward remote device's interrupts to the guest

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 include/hw/pci/pci.h              |  13 ++++
 include/hw/remote/vfio-user-obj.h |   6 ++
 hw/pci/msi.c                      |  16 ++--
 hw/pci/msix.c                     |  10 ++-
 hw/pci/pci.c                      |  13 ++++
 hw/remote/machine.c               |  14 +++-
 hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
 stubs/vfio-user-obj.c             |   6 ++
 MAINTAINERS                       |   1 +
 hw/remote/trace-events            |   1 +
 stubs/meson.build                 |   1 +
 11 files changed, 193 insertions(+), 11 deletions(-)
 create mode 100644 include/hw/remote/vfio-user-obj.h
 create mode 100644 stubs/vfio-user-obj.c

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 44dacfa224..b54b6ef88f 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -16,6 +16,7 @@ extern bool pci_available;
 #define PCI_SLOT(devfn)         (((devfn) >> 3) & 0x1f)
 #define PCI_FUNC(devfn)         ((devfn) & 0x07)
 #define PCI_BUILD_BDF(bus, devfn)     ((bus << 8) | (devfn))
+#define PCI_BDF_TO_DEVFN(x)     ((x) & 0xff)
 #define PCI_BUS_MAX             256
 #define PCI_DEVFN_MAX           256
 #define PCI_SLOT_MAX            32
@@ -127,6 +128,10 @@ typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
                                 pcibus_t addr, pcibus_t size, int type);
 typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
 
+typedef void MSITriggerFunc(PCIDevice *dev, MSIMessage msg);
+typedef MSIMessage MSIPrepareMessageFunc(PCIDevice *dev, unsigned vector);
+typedef MSIMessage MSIxPrepareMessageFunc(PCIDevice *dev, unsigned vector);
+
 typedef struct PCIIORegion {
     pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
 #define PCI_BAR_UNMAPPED (~(pcibus_t)0)
@@ -329,6 +334,14 @@ struct PCIDevice {
     /* Space to store MSIX table & pending bit array */
     uint8_t *msix_table;
     uint8_t *msix_pba;
+
+    /* May be used by INTx or MSI during interrupt notification */
+    void *irq_opaque;
+
+    MSITriggerFunc *msi_trigger;
+    MSIPrepareMessageFunc *msi_prepare_message;
+    MSIxPrepareMessageFunc *msix_prepare_message;
+
     /* MemoryRegion container for msix exclusive BAR setup */
     MemoryRegion msix_exclusive_bar;
     /* Memory Regions for MSIX table and pending bit entries. */
diff --git a/include/hw/remote/vfio-user-obj.h b/include/hw/remote/vfio-user-obj.h
new file mode 100644
index 0000000000..87ab78b875
--- /dev/null
+++ b/include/hw/remote/vfio-user-obj.h
@@ -0,0 +1,6 @@
+#ifndef VFIO_USER_OBJ_H
+#define VFIO_USER_OBJ_H
+
+void vfu_object_set_bus_irq(PCIBus *pci_bus);
+
+#endif
diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index 47d2b0f33c..d556e17a09 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -134,7 +134,7 @@ void msi_set_message(PCIDevice *dev, MSIMessage msg)
     pci_set_word(dev->config + msi_data_off(dev, msi64bit), msg.data);
 }
 
-MSIMessage msi_get_message(PCIDevice *dev, unsigned int vector)
+static MSIMessage msi_prepare_message(PCIDevice *dev, unsigned int vector)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
@@ -159,6 +159,11 @@ MSIMessage msi_get_message(PCIDevice *dev, unsigned int vector)
     return msg;
 }
 
+MSIMessage msi_get_message(PCIDevice *dev, unsigned int vector)
+{
+    return dev->msi_prepare_message(dev, vector);
+}
+
 bool msi_enabled(const PCIDevice *dev)
 {
     return msi_present(dev) &&
@@ -241,6 +246,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
                      0xffffffff >> (PCI_MSI_VECTORS_MAX - nr_vectors));
     }
 
+    dev->msi_prepare_message = msi_prepare_message;
+
     return 0;
 }
 
@@ -256,6 +263,7 @@ void msi_uninit(struct PCIDevice *dev)
     cap_size = msi_cap_sizeof(flags);
     pci_del_capability(dev, PCI_CAP_ID_MSI, cap_size);
     dev->cap_present &= ~QEMU_PCI_CAP_MSI;
+    dev->msi_prepare_message = NULL;
 
     MSI_DEV_PRINTF(dev, "uninit\n");
 }
@@ -334,11 +342,7 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
 
 void msi_send_message(PCIDevice *dev, MSIMessage msg)
 {
-    MemTxAttrs attrs = {};
-
-    attrs.requester_id = pci_requester_id(dev);
-    address_space_stl_le(&dev->bus_master_as, msg.address, msg.data,
-                         attrs, NULL);
+    dev->msi_trigger(dev, msg);
 }
 
 /* Normally called by pci_default_write_config(). */
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index ae9331cd0b..6f85192d6f 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -31,7 +31,7 @@
 #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
 #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
 
-MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
+static MSIMessage msix_prepare_message(PCIDevice *dev, unsigned vector)
 {
     uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
     MSIMessage msg;
@@ -41,6 +41,11 @@ MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
     return msg;
 }
 
+MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
+{
+    return dev->msix_prepare_message(dev, vector);
+}
+
 /*
  * Special API for POWER to configure the vectors through
  * a side channel. Should never be used by devices.
@@ -344,6 +349,8 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
                           "msix-pba", pba_size);
     memory_region_add_subregion(pba_bar, pba_offset, &dev->msix_pba_mmio);
 
+    dev->msix_prepare_message = msix_prepare_message;
+
     return 0;
 }
 
@@ -429,6 +436,7 @@ void msix_uninit(PCIDevice *dev, MemoryRegion *table_bar, MemoryRegion *pba_bar)
     g_free(dev->msix_entry_used);
     dev->msix_entry_used = NULL;
     dev->cap_present &= ~QEMU_PCI_CAP_MSIX;
+    dev->msix_prepare_message = NULL;
 }
 
 void msix_uninit_exclusive_bar(PCIDevice *dev)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index a9b37f8000..435f84393c 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -317,6 +317,15 @@ void pci_device_deassert_intx(PCIDevice *dev)
     }
 }
 
+static void pci_msi_trigger(PCIDevice *dev, MSIMessage msg)
+{
+    MemTxAttrs attrs = {};
+
+    attrs.requester_id = pci_requester_id(dev);
+    address_space_stl_le(&dev->bus_master_as, msg.address, msg.data,
+                         attrs, NULL);
+}
+
 static void pci_reset_regions(PCIDevice *dev)
 {
     int r;
@@ -1212,6 +1221,8 @@ static void pci_qdev_unrealize(DeviceState *dev)
 
     pci_device_deassert_intx(pci_dev);
     do_pci_unregister_device(pci_dev);
+
+    pci_dev->msi_trigger = NULL;
 }
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
@@ -2251,6 +2262,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error **errp)
     }
 
     pci_set_power(pci_dev, true);
+
+    pci_dev->msi_trigger = pci_msi_trigger;
 }
 
 PCIDevice *pci_new_multifunction(int devfn, bool multifunction,
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 645b54343d..75d550daae 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -23,6 +23,8 @@
 #include "hw/remote/iommu.h"
 #include "hw/qdev-core.h"
 #include "hw/remote/iommu.h"
+#include "hw/remote/vfio-user-obj.h"
+#include "hw/pci/msi.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -54,12 +56,16 @@ static void remote_machine_init(MachineState *machine)
 
     if (s->vfio_user) {
         remote_iommu_setup(pci_host->bus);
-    }
 
-    remote_iohub_init(&s->iohub);
+        msi_nonbroken = true;
+
+        vfu_object_set_bus_irq(pci_host->bus);
+    } else {
+        remote_iohub_init(&s->iohub);
 
-    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
-                 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+        pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
+                     &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+    }
 
     qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index ee28a93782..eeb165a805 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -53,6 +53,9 @@
 #include "hw/pci/pci.h"
 #include "qemu/timer.h"
 #include "exec/memory.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+#include "hw/remote/vfio-user-obj.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -96,6 +99,10 @@ struct VfuObject {
     Error *unplug_blocker;
 
     int vfu_poll_fd;
+
+    MSITriggerFunc *default_msi_trigger;
+    MSIPrepareMessageFunc *default_msi_prepare_message;
+    MSIxPrepareMessageFunc *default_msix_prepare_message;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -520,6 +527,111 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
     }
 }
 
+static int vfu_object_map_irq(PCIDevice *pci_dev, int intx)
+{
+    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
+                                pci_dev->devfn);
+
+    return pci_bdf;
+}
+
+static void vfu_object_set_irq(void *opaque, int pirq, int level)
+{
+    PCIBus *pci_bus = opaque;
+    PCIDevice *pci_dev = NULL;
+    vfu_ctx_t *vfu_ctx = NULL;
+    int pci_bus_num, devfn;
+
+    if (level) {
+        pci_bus_num = PCI_BUS_NUM(pirq);
+        devfn = PCI_BDF_TO_DEVFN(pirq);
+
+        /*
+         * pci_find_device() performs at O(1) if the device is attached
+         * to the root PCI bus. Whereas, if the device is attached to a
+         * secondary PCI bus (such as when a root port is involved),
+         * finding the parent PCI bus could take O(n)
+         */
+        pci_dev = pci_find_device(pci_bus, pci_bus_num, devfn);
+
+        vfu_ctx = pci_dev->irq_opaque;
+
+        g_assert(vfu_ctx);
+
+        vfu_irq_trigger(vfu_ctx, 0);
+    }
+}
+
+static MSIMessage vfu_object_msi_prepare_msg(PCIDevice *pci_dev,
+                                             unsigned int vector)
+{
+    MSIMessage msg;
+
+    msg.address = 0;
+    msg.data = vector;
+
+    return msg;
+}
+
+static void vfu_object_msi_trigger(PCIDevice *pci_dev, MSIMessage msg)
+{
+    vfu_ctx_t *vfu_ctx = pci_dev->irq_opaque;
+
+    vfu_irq_trigger(vfu_ctx, msg.data);
+}
+
+static void vfu_object_setup_msi_cbs(VfuObject *o)
+{
+    o->default_msi_trigger = o->pci_dev->msi_trigger;
+    o->default_msi_prepare_message = o->pci_dev->msi_prepare_message;
+    o->default_msix_prepare_message = o->pci_dev->msix_prepare_message;
+
+    o->pci_dev->msi_trigger = vfu_object_msi_trigger;
+    o->pci_dev->msi_prepare_message = vfu_object_msi_prepare_msg;
+    o->pci_dev->msix_prepare_message = vfu_object_msi_prepare_msg;
+}
+
+static void vfu_object_restore_msi_cbs(VfuObject *o)
+{
+    o->pci_dev->msi_trigger = o->default_msi_trigger;
+    o->pci_dev->msi_prepare_message = o->default_msi_prepare_message;
+    o->pci_dev->msix_prepare_message = o->default_msix_prepare_message;
+}
+
+static int vfu_object_setup_irqs(VfuObject *o, PCIDevice *pci_dev)
+{
+    vfu_ctx_t *vfu_ctx = o->vfu_ctx;
+    int ret;
+
+    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (msix_nr_vectors_allocated(pci_dev)) {
+        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSIX_IRQ,
+                                       msix_nr_vectors_allocated(pci_dev));
+    } else if (msi_nr_vectors_allocated(pci_dev)) {
+        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSI_IRQ,
+                                       msi_nr_vectors_allocated(pci_dev));
+    }
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    vfu_object_setup_msi_cbs(o);
+
+    pci_dev->irq_opaque = vfu_ctx;
+
+    return 0;
+}
+
+void vfu_object_set_bus_irq(PCIBus *pci_bus)
+{
+    pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, pci_bus, 1);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -632,6 +744,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 
     vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
 
+    ret = vfu_object_setup_irqs(o, o->pci_dev);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to setup interrupts for %s",
+                   o->device);
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
@@ -657,6 +776,8 @@ fail:
         o->unplug_blocker = NULL;
     }
     if (o->pci_dev) {
+        vfu_object_restore_msi_cbs(o);
+        o->pci_dev->irq_opaque = NULL;
         object_unref(OBJECT(o->pci_dev));
         o->pci_dev = NULL;
     }
@@ -716,6 +837,8 @@ static void vfu_object_finalize(Object *obj)
     }
 
     if (o->pci_dev) {
+        vfu_object_restore_msi_cbs(o);
+        o->pci_dev->irq_opaque = NULL;
         object_unref(OBJECT(o->pci_dev));
         o->pci_dev = NULL;
     }
diff --git a/stubs/vfio-user-obj.c b/stubs/vfio-user-obj.c
new file mode 100644
index 0000000000..79100d768e
--- /dev/null
+++ b/stubs/vfio-user-obj.c
@@ -0,0 +1,6 @@
+#include "qemu/osdep.h"
+#include "hw/remote/vfio-user-obj.h"
+
+void vfu_object_set_bus_irq(PCIBus *pci_bus)
+{
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index 9d8695b68d..844ed75834 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3604,6 +3604,7 @@ F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
 F: hw/remote/vfio-user-obj.c
+F: include/hw/remote/vfio-user-obj.h
 F: hw/remote/iommu.c
 F: include/hw/remote/iommu.h
 
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 847d50d88f..c167b3c7a5 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -12,3 +12,4 @@ vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
 vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
 vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
 vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
+vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
diff --git a/stubs/meson.build b/stubs/meson.build
index 6f80fec761..d8f3fd5c44 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -60,3 +60,4 @@ if have_system
 else
   stub_ss.add(files('qdev.c'))
 endif
+stub_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_false: files('vfio-user-obj.c'))
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v10 14/14] vfio-user: handle reset of remote device
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (12 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 13/14] vfio-user: handle device interrupts Jagannathan Raman
@ 2022-05-24 15:30 ` Jagannathan Raman
  2022-05-25 14:55 ` [PATCH v10 00/14] vfio-user server in QEMU Stefan Hajnoczi
  14 siblings, 0 replies; 33+ messages in thread
From: Jagannathan Raman @ 2022-05-24 15:30 UTC (permalink / raw)
  To: qemu-devel
  Cc: stefanha, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju,
	jag.raman

Adds handler to reset a remote device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/vfio-user-obj.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index eeb165a805..c0c2277bfc 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -632,6 +632,20 @@ void vfu_object_set_bus_irq(PCIBus *pci_bus)
     pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, pci_bus, 1);
 }
 
+static int vfu_object_device_reset(vfu_ctx_t *vfu_ctx, vfu_reset_type_t type)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    /* vfu_object_ctx_run() handles lost connection */
+    if (type == VFU_RESET_LOST_CONN) {
+        return 0;
+    }
+
+    qdev_reset_all(DEVICE(o->pci_dev));
+
+    return 0;
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -751,6 +765,12 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         goto fail;
     }
 
+    ret = vfu_setup_device_reset_cb(o->vfu_ctx, &vfu_object_device_reset);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to setup reset callback");
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 04/14] vfio-user: build library
  2022-05-24 15:30 ` [PATCH v10 04/14] vfio-user: build library Jagannathan Raman
@ 2022-05-25 14:46   ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2022-05-25 14:46 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: qemu-devel, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju

[-- Attachment #1: Type: text/plain, Size: 1211 bytes --]

On Tue, May 24, 2022 at 11:30:23AM -0400, Jagannathan Raman wrote:
> add the libvfio-user library as a submodule. build it as a meson
> subproject.
> 
> libvfio-user is distributed with BSD 3-Clause license and
> json-c with MIT (Expat) license
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  configure                               | 17 +++++++++++++++++
>  meson.build                             | 23 ++++++++++++++++++++++-
>  .gitlab-ci.d/buildtest.yml              |  1 +
>  .gitmodules                             |  3 +++
>  Kconfig.host                            |  4 ++++
>  MAINTAINERS                             |  1 +
>  hw/remote/Kconfig                       |  4 ++++
>  hw/remote/meson.build                   |  2 ++
>  meson_options.txt                       |  2 ++
>  subprojects/libvfio-user                |  1 +
>  tests/docker/dockerfiles/centos8.docker |  2 ++
>  11 files changed, 59 insertions(+), 1 deletion(-)
>  create mode 160000 subprojects/libvfio-user

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-05-24 15:30 ` [PATCH v10 13/14] vfio-user: handle device interrupts Jagannathan Raman
@ 2022-05-25 14:53   ` Stefan Hajnoczi
  2022-05-31 15:01     ` Jag Raman
  2022-06-06 18:32   ` Alexander Duyck
  1 sibling, 1 reply; 33+ messages in thread
From: Stefan Hajnoczi @ 2022-05-25 14:53 UTC (permalink / raw)
  To: Jagannathan Raman, mst, alex.williamson
  Cc: qemu-devel, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju

[-- Attachment #1: Type: text/plain, Size: 1157 bytes --]

On Tue, May 24, 2022 at 11:30:32AM -0400, Jagannathan Raman wrote:
> Forward remote device's interrupts to the guest
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/hw/pci/pci.h              |  13 ++++
>  include/hw/remote/vfio-user-obj.h |   6 ++
>  hw/pci/msi.c                      |  16 ++--
>  hw/pci/msix.c                     |  10 ++-
>  hw/pci/pci.c                      |  13 ++++
>  hw/remote/machine.c               |  14 +++-
>  hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
>  stubs/vfio-user-obj.c             |   6 ++
>  MAINTAINERS                       |   1 +
>  hw/remote/trace-events            |   1 +
>  stubs/meson.build                 |   1 +
>  11 files changed, 193 insertions(+), 11 deletions(-)
>  create mode 100644 include/hw/remote/vfio-user-obj.h
>  create mode 100644 stubs/vfio-user-obj.c

It would be great if Michael Tsirkin and Alex Williamson would review
this.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 00/14] vfio-user server in QEMU
  2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
                   ` (13 preceding siblings ...)
  2022-05-24 15:30 ` [PATCH v10 14/14] vfio-user: handle reset of remote device Jagannathan Raman
@ 2022-05-25 14:55 ` Stefan Hajnoczi
  14 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2022-05-25 14:55 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: qemu-devel, mst, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	berrange, peter.maydell, eduardo, marcel.apfelbaum, eblake,
	armbru, quintela, dgilbert, imammedo, peterx, john.levon,
	thanos.makatos, elena.ufimtseva, john.g.johnson, kanth.ghatraju

[-- Attachment #1: Type: text/plain, Size: 1635 bytes --]

On Tue, May 24, 2022 at 11:30:19AM -0400, Jagannathan Raman wrote:
> Hi,
> 
> This is v10 of the server side changes to enable vfio-user in QEMU.
> 
> Thank you for reviewing and sharing your feedback for the previous
> revision. We have addressed your comments in this revision.
> 
> We have dropped the following patches in this series:
>   - tests/avocado: Specify target VM argument to helper routines
>   - configure: require cmake 3.19 or newer
>   - vfio-user: avocado tests for vfio-user
> 
> We have also made the following changes:
>   [PATCH v10 1/14] qdev: unplug blocker for devices
>     - updated functions comments for unplug blockers in hw/qdev-core.h
> 
>   [PATCH v10 4/14] vfio-user: build library
>     - uses meson build system to build libvfio-user library
>     - dropped ubuntu CI build
> 
>   [PATCH v10 5/14] vfio-user: define vfio-user-server object
>     - updated comments for VfioUserServerProperties in qapi/qom.json
> 
>   [PATCH v10 6/14] vfio-user: instantiate vfio-user context
>     - added comments to vfu_object_init_ctx() explaining function contract
> 
>   [PATCH v10 8/14] vfio-user: run vfio-user context
>     - vfu_object_ctx_run() asserts that VfuObject->device is not NULL
>     - added a comment to vfu_object_ctx_run() explaining why
>       VfuObject->device wouldn't be NULL
> 
> Thank you very much!

I'm happy with this series. I've asked Michael Tsirkin and Alex
Williamson to review the interrupt patch. Aside from that I think it's
time to merge and further work can be done in qemu.git.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-05-25 14:53   ` Stefan Hajnoczi
@ 2022-05-31 15:01     ` Jag Raman
  2022-05-31 20:10       ` Alex Williamson
  0 siblings, 1 reply; 33+ messages in thread
From: Jag Raman @ 2022-05-31 15:01 UTC (permalink / raw)
  To: Stefan Hajnoczi, alex.williamson, Michael S. Tsirkin
  Cc: qemu-devel, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	Daniel P. Berrangé,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake,
	Markus Armbruster, quintela, dgilbert, imammedo, peterx,
	john.levon, thanos.makatos, Elena Ufimtseva, John Johnson,
	Kanth Ghatraju



> On May 25, 2022, at 10:53 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Tue, May 24, 2022 at 11:30:32AM -0400, Jagannathan Raman wrote:
>> Forward remote device's interrupts to the guest
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> include/hw/pci/pci.h              |  13 ++++
>> include/hw/remote/vfio-user-obj.h |   6 ++
>> hw/pci/msi.c                      |  16 ++--
>> hw/pci/msix.c                     |  10 ++-
>> hw/pci/pci.c                      |  13 ++++
>> hw/remote/machine.c               |  14 +++-
>> hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
>> stubs/vfio-user-obj.c             |   6 ++
>> MAINTAINERS                       |   1 +
>> hw/remote/trace-events            |   1 +
>> stubs/meson.build                 |   1 +
>> 11 files changed, 193 insertions(+), 11 deletions(-)
>> create mode 100644 include/hw/remote/vfio-user-obj.h
>> create mode 100644 stubs/vfio-user-obj.c
> 
> It would be great if Michael Tsirkin and Alex Williamson would review
> this.

Hi Michael and Alex,

Do you have any thoughts on this patch?

Thank you!
--
Jag

> 
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-05-31 15:01     ` Jag Raman
@ 2022-05-31 20:10       ` Alex Williamson
  2022-05-31 21:03         ` Stefan Hajnoczi
  0 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2022-05-31 20:10 UTC (permalink / raw)
  To: Jag Raman
  Cc: Stefan Hajnoczi, Michael S. Tsirkin, qemu-devel, f4bug, pbonzini,
	marcandre.lureau, thuth, bleal, Daniel P. Berrangé,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake,
	Markus Armbruster, quintela, dgilbert, imammedo, peterx,
	john.levon, thanos.makatos, Elena Ufimtseva, John Johnson,
	Kanth Ghatraju

On Tue, 31 May 2022 15:01:57 +0000
Jag Raman <jag.raman@oracle.com> wrote:

> > On May 25, 2022, at 10:53 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Tue, May 24, 2022 at 11:30:32AM -0400, Jagannathan Raman wrote:  
> >> Forward remote device's interrupts to the guest
> >> 
> >> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> >> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> >> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> >> ---
> >> include/hw/pci/pci.h              |  13 ++++
> >> include/hw/remote/vfio-user-obj.h |   6 ++
> >> hw/pci/msi.c                      |  16 ++--
> >> hw/pci/msix.c                     |  10 ++-
> >> hw/pci/pci.c                      |  13 ++++
> >> hw/remote/machine.c               |  14 +++-
> >> hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
> >> stubs/vfio-user-obj.c             |   6 ++
> >> MAINTAINERS                       |   1 +
> >> hw/remote/trace-events            |   1 +
> >> stubs/meson.build                 |   1 +
> >> 11 files changed, 193 insertions(+), 11 deletions(-)
> >> create mode 100644 include/hw/remote/vfio-user-obj.h
> >> create mode 100644 stubs/vfio-user-obj.c  
> > 
> > It would be great if Michael Tsirkin and Alex Williamson would review
> > this.  
> 
> Hi Michael and Alex,
> 
> Do you have any thoughts on this patch?

Ultimately this is just how to insert callbacks to replace the default
MSI/X triggers so you can send a vector# over the wire for a remote
machine, right?  I'll let the code owners, Michael and Marcel, comment
if they have grand vision how to architect this differently.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-05-31 20:10       ` Alex Williamson
@ 2022-05-31 21:03         ` Stefan Hajnoczi
  2022-05-31 21:45           ` Alex Williamson
  0 siblings, 1 reply; 33+ messages in thread
From: Stefan Hajnoczi @ 2022-05-31 21:03 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jag Raman, Stefan Hajnoczi, Michael S. Tsirkin, qemu-devel,
	f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	Daniel P. Berrangé,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake,
	Markus Armbruster, quintela, dgilbert, imammedo, peterx,
	john.levon, thanos.makatos, Elena Ufimtseva, John Johnson,
	Kanth Ghatraju

On Tue, 31 May 2022 at 21:11, Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> On Tue, 31 May 2022 15:01:57 +0000
> Jag Raman <jag.raman@oracle.com> wrote:
>
> > > On May 25, 2022, at 10:53 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Tue, May 24, 2022 at 11:30:32AM -0400, Jagannathan Raman wrote:
> > >> Forward remote device's interrupts to the guest
> > >>
> > >> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > >> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > >> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > >> ---
> > >> include/hw/pci/pci.h              |  13 ++++
> > >> include/hw/remote/vfio-user-obj.h |   6 ++
> > >> hw/pci/msi.c                      |  16 ++--
> > >> hw/pci/msix.c                     |  10 ++-
> > >> hw/pci/pci.c                      |  13 ++++
> > >> hw/remote/machine.c               |  14 +++-
> > >> hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
> > >> stubs/vfio-user-obj.c             |   6 ++
> > >> MAINTAINERS                       |   1 +
> > >> hw/remote/trace-events            |   1 +
> > >> stubs/meson.build                 |   1 +
> > >> 11 files changed, 193 insertions(+), 11 deletions(-)
> > >> create mode 100644 include/hw/remote/vfio-user-obj.h
> > >> create mode 100644 stubs/vfio-user-obj.c
> > >
> > > It would be great if Michael Tsirkin and Alex Williamson would review
> > > this.
> >
> > Hi Michael and Alex,
> >
> > Do you have any thoughts on this patch?
>
> Ultimately this is just how to insert callbacks to replace the default
> MSI/X triggers so you can send a vector# over the wire for a remote
> machine, right?  I'll let the code owners, Michael and Marcel, comment
> if they have grand vision how to architect this differently.  Thanks,

An earlier version of the patch intercepted MSI-X at the msix_notify()
level, replacing the entire function. This patch replaces
msix_get_message() and msi_send_message(), leaving the masking logic
in place.

I haven't seen the latest vfio-user client implementation for QEMU,
but if the idea is to allow the guest to directly control the
vfio-user device's MSI-X table's mask bits, then I think this is a
different design from VFIO kernel where masking is emulated by QEMU
and not passed through to the PCI device.

It's been a while since I looked at how this works in QEMU's hw/vfio/
code, so I may not be explaining it correctly, but I think there is a
design difference here between VFIO kernel and vfio-user that's worth
evaluating.

Stefan


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-05-31 21:03         ` Stefan Hajnoczi
@ 2022-05-31 21:45           ` Alex Williamson
  2022-06-01  6:37             ` John Johnson
  2022-06-01 17:00             ` Jag Raman
  0 siblings, 2 replies; 33+ messages in thread
From: Alex Williamson @ 2022-05-31 21:45 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Jag Raman, Stefan Hajnoczi, Michael S. Tsirkin, qemu-devel,
	f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	Daniel P. Berrangé,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake,
	Markus Armbruster, quintela, dgilbert, imammedo, peterx,
	john.levon, thanos.makatos, Elena Ufimtseva, John Johnson,
	Kanth Ghatraju

On Tue, 31 May 2022 22:03:14 +0100
Stefan Hajnoczi <stefanha@gmail.com> wrote:

> On Tue, 31 May 2022 at 21:11, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > On Tue, 31 May 2022 15:01:57 +0000
> > Jag Raman <jag.raman@oracle.com> wrote:
> >  
> > > > On May 25, 2022, at 10:53 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > > >
> > > > On Tue, May 24, 2022 at 11:30:32AM -0400, Jagannathan Raman wrote:  
> > > >> Forward remote device's interrupts to the guest
> > > >>
> > > >> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > > >> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > > >> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > > >> ---
> > > >> include/hw/pci/pci.h              |  13 ++++
> > > >> include/hw/remote/vfio-user-obj.h |   6 ++
> > > >> hw/pci/msi.c                      |  16 ++--
> > > >> hw/pci/msix.c                     |  10 ++-
> > > >> hw/pci/pci.c                      |  13 ++++
> > > >> hw/remote/machine.c               |  14 +++-
> > > >> hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
> > > >> stubs/vfio-user-obj.c             |   6 ++
> > > >> MAINTAINERS                       |   1 +
> > > >> hw/remote/trace-events            |   1 +
> > > >> stubs/meson.build                 |   1 +
> > > >> 11 files changed, 193 insertions(+), 11 deletions(-)
> > > >> create mode 100644 include/hw/remote/vfio-user-obj.h
> > > >> create mode 100644 stubs/vfio-user-obj.c  
> > > >
> > > > It would be great if Michael Tsirkin and Alex Williamson would review
> > > > this.  
> > >
> > > Hi Michael and Alex,
> > >
> > > Do you have any thoughts on this patch?  
> >
> > Ultimately this is just how to insert callbacks to replace the default
> > MSI/X triggers so you can send a vector# over the wire for a remote
> > machine, right?  I'll let the code owners, Michael and Marcel, comment
> > if they have grand vision how to architect this differently.  Thanks,  
> 
> An earlier version of the patch intercepted MSI-X at the msix_notify()
> level, replacing the entire function. This patch replaces
> msix_get_message() and msi_send_message(), leaving the masking logic
> in place.
> 
> I haven't seen the latest vfio-user client implementation for QEMU,
> but if the idea is to allow the guest to directly control the
> vfio-user device's MSI-X table's mask bits, then I think this is a
> different design from VFIO kernel where masking is emulated by QEMU
> and not passed through to the PCI device.

Essentially what's happening here is an implementation of an interrupt
handler callback in the remote QEMU instance.  The default handler is
to simply write the MSI message data at the MSI message address of the
vCPU, vfio-user replaces that with hijacking the MSI message itself to
simply report the vector# so that the "handler", ie. trigger, can
forward it to the client.  That's very analogous to the kernel
implementation.

The equivalent masking we have today with vfio kernel would happen on
the client side, where the MSI/X code might instead set a pending bit
if the vector is masked on the client.  Likewise the possibility
remains, just as it does on the kernel side, that the guest masking a
vector could be relayed over ioctl/socket to set the equivalent mask on
the host/remote.

I don't see a fundamental design difference here, but please point out
if I'm missing something.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-05-31 21:45           ` Alex Williamson
@ 2022-06-01  6:37             ` John Johnson
  2022-06-01  9:38               ` Stefan Hajnoczi
  2022-06-01 17:00             ` Jag Raman
  1 sibling, 1 reply; 33+ messages in thread
From: John Johnson @ 2022-06-01  6:37 UTC (permalink / raw)
  To: Alex Williamson, QEMU Devel Mailing List



> On May 31, 2022, at 2:45 PM, Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> On Tue, 31 May 2022 22:03:14 +0100
> Stefan Hajnoczi <stefanha@gmail.com> wrote:
> 
>> On Tue, 31 May 2022 at 21:11, Alex Williamson
>> <alex.williamson@redhat.com> wrote:
>>> 
>>> On Tue, 31 May 2022 15:01:57 +0000
>>> Jag Raman <jag.raman@oracle.com> wrote:
>>> 
>>>>> On May 25, 2022, at 10:53 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>>> 
>>>>> On Tue, May 24, 2022 at 11:30:32AM -0400, Jagannathan Raman wrote:  
>>>>>> Forward remote device's interrupts to the guest
>>>>>> 
>>>>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>>>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>>>>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>>>>>> ---
>>>>>> include/hw/pci/pci.h              |  13 ++++
>>>>>> include/hw/remote/vfio-user-obj.h |   6 ++
>>>>>> hw/pci/msi.c                      |  16 ++--
>>>>>> hw/pci/msix.c                     |  10 ++-
>>>>>> hw/pci/pci.c                      |  13 ++++
>>>>>> hw/remote/machine.c               |  14 +++-
>>>>>> hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
>>>>>> stubs/vfio-user-obj.c             |   6 ++
>>>>>> MAINTAINERS                       |   1 +
>>>>>> hw/remote/trace-events            |   1 +
>>>>>> stubs/meson.build                 |   1 +
>>>>>> 11 files changed, 193 insertions(+), 11 deletions(-)
>>>>>> create mode 100644 include/hw/remote/vfio-user-obj.h
>>>>>> create mode 100644 stubs/vfio-user-obj.c  
>>>>> 
>>>>> It would be great if Michael Tsirkin and Alex Williamson would review
>>>>> this.  
>>>> 
>>>> Hi Michael and Alex,
>>>> 
>>>> Do you have any thoughts on this patch?  
>>> 
>>> Ultimately this is just how to insert callbacks to replace the default
>>> MSI/X triggers so you can send a vector# over the wire for a remote
>>> machine, right?  I'll let the code owners, Michael and Marcel, comment
>>> if they have grand vision how to architect this differently.  Thanks,  
>> 
>> An earlier version of the patch intercepted MSI-X at the msix_notify()
>> level, replacing the entire function. This patch replaces
>> msix_get_message() and msi_send_message(), leaving the masking logic
>> in place.
>> 
>> I haven't seen the latest vfio-user client implementation for QEMU,
>> but if the idea is to allow the guest to directly control the
>> vfio-user device's MSI-X table's mask bits, then I think this is a
>> different design from VFIO kernel where masking is emulated by QEMU
>> and not passed through to the PCI device.
> 
> Essentially what's happening here is an implementation of an interrupt
> handler callback in the remote QEMU instance.  The default handler is
> to simply write the MSI message data at the MSI message address of the
> vCPU, vfio-user replaces that with hijacking the MSI message itself to
> simply report the vector# so that the "handler", ie. trigger, can
> forward it to the client.  That's very analogous to the kernel
> implementation.
> 
> The equivalent masking we have today with vfio kernel would happen on
> the client side, where the MSI/X code might instead set a pending bit
> if the vector is masked on the client.  Likewise the possibility
> remains, just as it does on the kernel side, that the guest masking a
> vector could be relayed over ioctl/socket to set the equivalent mask on
> the host/remote.
> 
> I don't see a fundamental design difference here, but please point out
> if I'm missing something.  Thanks,
> 



	I don’t think you’ve missed anything.  We were trying to stay
close to the kernel implementation.

	Do you have any comments on the client side implementation I
sent on 5/5?

							JJ





^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-06-01  6:37             ` John Johnson
@ 2022-06-01  9:38               ` Stefan Hajnoczi
  0 siblings, 0 replies; 33+ messages in thread
From: Stefan Hajnoczi @ 2022-06-01  9:38 UTC (permalink / raw)
  To: John Johnson; +Cc: Alex Williamson, QEMU Devel Mailing List

On Wed, 1 Jun 2022 at 07:40, John Johnson <john.g.johnson@oracle.com> wrote:
>
> > On May 31, 2022, at 2:45 PM, Alex Williamson <alex.williamson@redhat.com> wrote:
> >
> > On Tue, 31 May 2022 22:03:14 +0100
> > Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >
> >> On Tue, 31 May 2022 at 21:11, Alex Williamson
> >> <alex.williamson@redhat.com> wrote:
> >>>
> >>> On Tue, 31 May 2022 15:01:57 +0000
> >>> Jag Raman <jag.raman@oracle.com> wrote:
> >>>
> >>>>> On May 25, 2022, at 10:53 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>>>>
> >>>>> On Tue, May 24, 2022 at 11:30:32AM -0400, Jagannathan Raman wrote:
> >>>>>> Forward remote device's interrupts to the guest
> >>>>>>
> >>>>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> >>>>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> >>>>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> >>>>>> ---
> >>>>>> include/hw/pci/pci.h              |  13 ++++
> >>>>>> include/hw/remote/vfio-user-obj.h |   6 ++
> >>>>>> hw/pci/msi.c                      |  16 ++--
> >>>>>> hw/pci/msix.c                     |  10 ++-
> >>>>>> hw/pci/pci.c                      |  13 ++++
> >>>>>> hw/remote/machine.c               |  14 +++-
> >>>>>> hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
> >>>>>> stubs/vfio-user-obj.c             |   6 ++
> >>>>>> MAINTAINERS                       |   1 +
> >>>>>> hw/remote/trace-events            |   1 +
> >>>>>> stubs/meson.build                 |   1 +
> >>>>>> 11 files changed, 193 insertions(+), 11 deletions(-)
> >>>>>> create mode 100644 include/hw/remote/vfio-user-obj.h
> >>>>>> create mode 100644 stubs/vfio-user-obj.c
> >>>>>
> >>>>> It would be great if Michael Tsirkin and Alex Williamson would review
> >>>>> this.
> >>>>
> >>>> Hi Michael and Alex,
> >>>>
> >>>> Do you have any thoughts on this patch?
> >>>
> >>> Ultimately this is just how to insert callbacks to replace the default
> >>> MSI/X triggers so you can send a vector# over the wire for a remote
> >>> machine, right?  I'll let the code owners, Michael and Marcel, comment
> >>> if they have grand vision how to architect this differently.  Thanks,
> >>
> >> An earlier version of the patch intercepted MSI-X at the msix_notify()
> >> level, replacing the entire function. This patch replaces
> >> msix_get_message() and msi_send_message(), leaving the masking logic
> >> in place.
> >>
> >> I haven't seen the latest vfio-user client implementation for QEMU,
> >> but if the idea is to allow the guest to directly control the
> >> vfio-user device's MSI-X table's mask bits, then I think this is a
> >> different design from VFIO kernel where masking is emulated by QEMU
> >> and not passed through to the PCI device.
> >
> > Essentially what's happening here is an implementation of an interrupt
> > handler callback in the remote QEMU instance.  The default handler is
> > to simply write the MSI message data at the MSI message address of the
> > vCPU, vfio-user replaces that with hijacking the MSI message itself to
> > simply report the vector# so that the "handler", ie. trigger, can
> > forward it to the client.  That's very analogous to the kernel
> > implementation.
> >
> > The equivalent masking we have today with vfio kernel would happen on
> > the client side, where the MSI/X code might instead set a pending bit
> > if the vector is masked on the client.  Likewise the possibility
> > remains, just as it does on the kernel side, that the guest masking a
> > vector could be relayed over ioctl/socket to set the equivalent mask on
> > the host/remote.
> >
> > I don't see a fundamental design difference here, but please point out
> > if I'm missing something.  Thanks,
> >
>
>
>
>         I don’t think you’ve missed anything.  We were trying to stay
> close to the kernel implementation.

Okay.

Stefan


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-05-31 21:45           ` Alex Williamson
  2022-06-01  6:37             ` John Johnson
@ 2022-06-01 17:00             ` Jag Raman
  2022-06-01 17:26               ` Alex Williamson
  1 sibling, 1 reply; 33+ messages in thread
From: Jag Raman @ 2022-06-01 17:00 UTC (permalink / raw)
  To: Alex Williamson, Stefan Hajnoczi, John Johnson
  Cc: Stefan Hajnoczi, Stefan Hajnoczi, Michael S. Tsirkin, qemu-devel,
	f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	Daniel P. Berrangé,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake,
	Markus Armbruster, quintela, dgilbert, imammedo, peterx,
	john.levon, thanos.makatos, Elena Ufimtseva, Kanth Ghatraju

[-- Attachment #1: Type: text/plain, Size: 4714 bytes --]



On May 31, 2022, at 5:45 PM, Alex Williamson <alex.williamson@redhat.com<mailto:alex.williamson@redhat.com>> wrote:

On Tue, 31 May 2022 22:03:14 +0100
Stefan Hajnoczi <stefanha@gmail.com<mailto:stefanha@gmail.com>> wrote:

On Tue, 31 May 2022 at 21:11, Alex Williamson
<alex.williamson@redhat.com<mailto:alex.williamson@redhat.com>> wrote:

On Tue, 31 May 2022 15:01:57 +0000
Jag Raman <jag.raman@oracle.com<mailto:jag.raman@oracle.com>> wrote:

On May 25, 2022, at 10:53 AM, Stefan Hajnoczi <stefanha@redhat.com<mailto:stefanha@redhat.com>> wrote:

On Tue, May 24, 2022 at 11:30:32AM -0400, Jagannathan Raman wrote:
Forward remote device's interrupts to the guest

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com<mailto:elena.ufimtseva@oracle.com>>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com<mailto:john.g.johnson@oracle.com>>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com<mailto:jag.raman@oracle.com>>
---
include/hw/pci/pci.h              |  13 ++++
include/hw/remote/vfio-user-obj.h |   6 ++
hw/pci/msi.c                      |  16 ++--
hw/pci/msix.c                     |  10 ++-
hw/pci/pci.c                      |  13 ++++
hw/remote/machine.c               |  14 +++-
hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
stubs/vfio-user-obj.c             |   6 ++
MAINTAINERS                       |   1 +
hw/remote/trace-events            |   1 +
stubs/meson.build                 |   1 +
11 files changed, 193 insertions(+), 11 deletions(-)
create mode 100644 include/hw/remote/vfio-user-obj.h
create mode 100644 stubs/vfio-user-obj.c

It would be great if Michael Tsirkin and Alex Williamson would review
this.

Hi Michael and Alex,

Do you have any thoughts on this patch?

Ultimately this is just how to insert callbacks to replace the default
MSI/X triggers so you can send a vector# over the wire for a remote
machine, right?  I'll let the code owners, Michael and Marcel, comment
if they have grand vision how to architect this differently.  Thanks,

An earlier version of the patch intercepted MSI-X at the msix_notify()
level, replacing the entire function. This patch replaces
msix_get_message() and msi_send_message(), leaving the masking logic
in place.

I haven't seen the latest vfio-user client implementation for QEMU,
but if the idea is to allow the guest to directly control the
vfio-user device's MSI-X table's mask bits, then I think this is a
different design from VFIO kernel where masking is emulated by QEMU
and not passed through to the PCI device.

Essentially what's happening here is an implementation of an interrupt
handler callback in the remote QEMU instance.  The default handler is
to simply write the MSI message data at the MSI message address of the
vCPU, vfio-user replaces that with hijacking the MSI message itself to
simply report the vector# so that the "handler", ie. trigger, can
forward it to the client.  That's very analogous to the kernel
implementation.

The equivalent masking we have today with vfio kernel would happen on
the client side, where the MSI/X code might instead set a pending bit
if the vector is masked on the client.  Likewise the possibility
remains, just as it does on the kernel side, that the guest masking a
vector could be relayed over ioctl/socket to set the equivalent mask on
the host/remote.

Hi Alex,

Just to add some more detail, the emulated PCI device in QEMU presently
maintains a MSIx table (PCIDevice->msix_table) and Pending Bit Array. In the
present VFIO PCI device implementation, QEMU leverages the same
MSIx table for interrupt masking/unmasking. The backend PCI device (such as
the passthru device) always thinks that the interrupt is unmasked and lets
QEMU manage masking.

Whereas in the vfio-user case, the client additionally pushes a copy of
emulated PCI device’s table downstream to the remote device. We did this
to allow a small set of devices (such as e1000e) to clear the
PBA (msix_clr_pending()). Secondly, the remote device uses its copy of the
MSIx table to determine if interrupt should be triggered - this would prevent
an interrupt from being sent to the client unnecessarily if it's masked.

We are wondering if pushing the MSIx table to the remote device and
reading PBA from it would diverge from the VFIO protocol specification?

From your comment, I understand it’s similar to VFIO protocol because VFIO
clients could mask an interrupt using VFIO_DEVICE_SET_IRQS ioctl +
VFIO_IRQ_SET_ACTION_MASK / _UNMASK flags. I observed that QEMU presently
does not use this approach and the kernel does not support it for MSI.

Thank you!
--
Jag

[-- Attachment #2: Type: text/html, Size: 9206 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-06-01 17:00             ` Jag Raman
@ 2022-06-01 17:26               ` Alex Williamson
  2022-06-01 18:01                 ` Jag Raman
  2022-06-03 12:20                 ` John Johnson
  0 siblings, 2 replies; 33+ messages in thread
From: Alex Williamson @ 2022-06-01 17:26 UTC (permalink / raw)
  To: Jag Raman
  Cc: Stefan Hajnoczi, John Johnson, Stefan Hajnoczi,
	Michael S. Tsirkin, qemu-devel, f4bug, pbonzini,
	marcandre.lureau, thuth, bleal, Daniel P. Berrangé,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake,
	Markus Armbruster, quintela, dgilbert, imammedo, peterx,
	john.levon, thanos.makatos, Elena Ufimtseva, Kanth Ghatraju

On Wed, 1 Jun 2022 17:00:54 +0000
Jag Raman <jag.raman@oracle.com> wrote:
> 
> Hi Alex,
> 
> Just to add some more detail, the emulated PCI device in QEMU presently
> maintains a MSIx table (PCIDevice->msix_table) and Pending Bit Array. In the
> present VFIO PCI device implementation, QEMU leverages the same
> MSIx table for interrupt masking/unmasking. The backend PCI device (such as
> the passthru device) always thinks that the interrupt is unmasked and lets
> QEMU manage masking.
> 
> Whereas in the vfio-user case, the client additionally pushes a copy of
> emulated PCI device’s table downstream to the remote device. We did this
> to allow a small set of devices (such as e1000e) to clear the
> PBA (msix_clr_pending()). Secondly, the remote device uses its copy of the
> MSIx table to determine if interrupt should be triggered - this would prevent
> an interrupt from being sent to the client unnecessarily if it's masked.
> 
> We are wondering if pushing the MSIx table to the remote device and
> reading PBA from it would diverge from the VFIO protocol specification?
> 
> From your comment, I understand it’s similar to VFIO protocol because VFIO
> clients could mask an interrupt using VFIO_DEVICE_SET_IRQS ioctl +
> VFIO_IRQ_SET_ACTION_MASK / _UNMASK flags. I observed that QEMU presently
> does not use this approach and the kernel does not support it for MSI.

I believe the SET_IRQS ioctl definition is pre-enabled to support
masking and unmasking, we've just lacked kernel support to mask at the
device which leads to the hybrid approach we have today.  Our intention
would be to use the current uAPI, to provide that masking support, at
which point we'd leave the PBA mapped to the device.

So whether your proposal diverges from the VFIO uAPI depends on what
you mean by "pushing the MSIx table to the remote device".  If that's
done by implementing the existing SET_IRQS masking support, then you're
spot on.  OTOH, if you're actually pushing a copy of the MSIx table
from the client, that's certainly not how I had envisioned the kernel
interface.  Thanks,

Alex



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-06-01 17:26               ` Alex Williamson
@ 2022-06-01 18:01                 ` Jag Raman
  2022-06-01 18:30                   ` Alex Williamson
  2022-06-03 12:20                 ` John Johnson
  1 sibling, 1 reply; 33+ messages in thread
From: Jag Raman @ 2022-06-01 18:01 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Stefan Hajnoczi, John Johnson, Stefan Hajnoczi,
	Michael S. Tsirkin, qemu-devel, f4bug, pbonzini,
	marcandre.lureau, thuth, bleal, Daniel P. Berrangé,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake,
	Markus Armbruster, quintela, dgilbert, imammedo, peterx,
	john.levon, thanos.makatos, Elena Ufimtseva, Kanth Ghatraju



> On Jun 1, 2022, at 1:26 PM, Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> On Wed, 1 Jun 2022 17:00:54 +0000
> Jag Raman <jag.raman@oracle.com> wrote:
>> 
>> Hi Alex,
>> 
>> Just to add some more detail, the emulated PCI device in QEMU presently
>> maintains a MSIx table (PCIDevice->msix_table) and Pending Bit Array. In the
>> present VFIO PCI device implementation, QEMU leverages the same
>> MSIx table for interrupt masking/unmasking. The backend PCI device (such as
>> the passthru device) always thinks that the interrupt is unmasked and lets
>> QEMU manage masking.
>> 
>> Whereas in the vfio-user case, the client additionally pushes a copy of
>> emulated PCI device’s table downstream to the remote device. We did this
>> to allow a small set of devices (such as e1000e) to clear the
>> PBA (msix_clr_pending()). Secondly, the remote device uses its copy of the
>> MSIx table to determine if interrupt should be triggered - this would prevent
>> an interrupt from being sent to the client unnecessarily if it's masked.
>> 
>> We are wondering if pushing the MSIx table to the remote device and
>> reading PBA from it would diverge from the VFIO protocol specification?
>> 
>> From your comment, I understand it’s similar to VFIO protocol because VFIO
>> clients could mask an interrupt using VFIO_DEVICE_SET_IRQS ioctl +
>> VFIO_IRQ_SET_ACTION_MASK / _UNMASK flags. I observed that QEMU presently
>> does not use this approach and the kernel does not support it for MSI.
> 
> I believe the SET_IRQS ioctl definition is pre-enabled to support
> masking and unmasking, we've just lacked kernel support to mask at the
> device which leads to the hybrid approach we have today.  Our intention
> would be to use the current uAPI, to provide that masking support, at
> which point we'd leave the PBA mapped to the device.

Thank you for clarifying!

> 
> So whether your proposal diverges from the VFIO uAPI depends on what
> you mean by "pushing the MSIx table to the remote device".  If that's
> done by implementing the existing SET_IRQS masking support, then you're
> spot on.  OTOH, if you're actually pushing a copy of the MSIx table
> from the client, that's certainly not how I had envisioned the kernel

In the current implementation - when the guest accesses the MSIx table and
PBA, the client passes these accesses through to the remote device.

If we switch to using SET_IRQS approach, we could use SET_IRQS
message for masking/unmasking, but still pass through the the PBA
access to the backend PCI device.

So I think the question is, if we should switch vfio-user to SET_IRQS
now for masking or unmasking, or whenever QEMU does it in the future?
The PBA access would remain the same as it’s now - via device BAR.

Thank you!
--
Jag

> interface.  Thanks,
> 
> Alex
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-06-01 18:01                 ` Jag Raman
@ 2022-06-01 18:30                   ` Alex Williamson
  2022-06-01 19:38                     ` Jag Raman
  0 siblings, 1 reply; 33+ messages in thread
From: Alex Williamson @ 2022-06-01 18:30 UTC (permalink / raw)
  To: Jag Raman
  Cc: Stefan Hajnoczi, John Johnson, Stefan Hajnoczi,
	Michael S. Tsirkin, qemu-devel, f4bug, pbonzini,
	marcandre.lureau, thuth, bleal, Daniel P. Berrangé,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake,
	Markus Armbruster, quintela, dgilbert, imammedo, peterx,
	john.levon, thanos.makatos, Elena Ufimtseva, Kanth Ghatraju

On Wed, 1 Jun 2022 18:01:39 +0000
Jag Raman <jag.raman@oracle.com> wrote:

> > On Jun 1, 2022, at 1:26 PM, Alex Williamson <alex.williamson@redhat.com> wrote:
> > 
> > On Wed, 1 Jun 2022 17:00:54 +0000
> > Jag Raman <jag.raman@oracle.com> wrote:  
> >> 
> >> Hi Alex,
> >> 
> >> Just to add some more detail, the emulated PCI device in QEMU presently
> >> maintains a MSIx table (PCIDevice->msix_table) and Pending Bit Array. In the
> >> present VFIO PCI device implementation, QEMU leverages the same
> >> MSIx table for interrupt masking/unmasking. The backend PCI device (such as
> >> the passthru device) always thinks that the interrupt is unmasked and lets
> >> QEMU manage masking.
> >> 
> >> Whereas in the vfio-user case, the client additionally pushes a copy of
> >> emulated PCI device’s table downstream to the remote device. We did this
> >> to allow a small set of devices (such as e1000e) to clear the
> >> PBA (msix_clr_pending()). Secondly, the remote device uses its copy of the
> >> MSIx table to determine if interrupt should be triggered - this would prevent
> >> an interrupt from being sent to the client unnecessarily if it's masked.
> >> 
> >> We are wondering if pushing the MSIx table to the remote device and
> >> reading PBA from it would diverge from the VFIO protocol specification?
> >> 
> >> From your comment, I understand it’s similar to VFIO protocol because VFIO
> >> clients could mask an interrupt using VFIO_DEVICE_SET_IRQS ioctl +
> >> VFIO_IRQ_SET_ACTION_MASK / _UNMASK flags. I observed that QEMU presently
> >> does not use this approach and the kernel does not support it for MSI.  
> > 
> > I believe the SET_IRQS ioctl definition is pre-enabled to support
> > masking and unmasking, we've just lacked kernel support to mask at the
> > device which leads to the hybrid approach we have today.  Our intention
> > would be to use the current uAPI, to provide that masking support, at
> > which point we'd leave the PBA mapped to the device.  
> 
> Thank you for clarifying!
> 
> > 
> > So whether your proposal diverges from the VFIO uAPI depends on what
> > you mean by "pushing the MSIx table to the remote device".  If that's
> > done by implementing the existing SET_IRQS masking support, then you're
> > spot on.  OTOH, if you're actually pushing a copy of the MSIx table
> > from the client, that's certainly not how I had envisioned the kernel  
> 
> In the current implementation - when the guest accesses the MSIx table and
> PBA, the client passes these accesses through to the remote device.

I suppose you can do this because you don't need some means for the
client to register a notification mechanism for the interrupt, you can
already send notifications via the socket.  But this is now a
divergence from the kernel vfio uapi and eliminates what is intended to
be a device agnostic interrupt interface.

> If we switch to using SET_IRQS approach, we could use SET_IRQS
> message for masking/unmasking, but still pass through the the PBA
> access to the backend PCI device.

Yes.

> So I think the question is, if we should switch vfio-user to SET_IRQS
> now for masking or unmasking, or whenever QEMU does it in the future?
> The PBA access would remain the same as it’s now - via device BAR.

I apologize that I'm constantly overwhelmed with requests that I
haven't reviewed it, but it seems like the client side would be far
more compliant to the vfio kernel interface if vfio-user did implement
an interpretation of the SET_IRQS ioctl.  Potentially couldn't you also
make use of eventfds or define other data types to pass that would give
the client more flexibility in receiving interrupts?  Thanks,

Alex



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-06-01 18:30                   ` Alex Williamson
@ 2022-06-01 19:38                     ` Jag Raman
  0 siblings, 0 replies; 33+ messages in thread
From: Jag Raman @ 2022-06-01 19:38 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Stefan Hajnoczi, John Johnson, Stefan Hajnoczi,
	Michael S. Tsirkin, qemu-devel, f4bug, pbonzini,
	marcandre.lureau, thuth, bleal, Daniel P. Berrangé,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake,
	Markus Armbruster, quintela, dgilbert, imammedo, peterx,
	john.levon, thanos.makatos, Elena Ufimtseva, Kanth Ghatraju

[-- Attachment #1: Type: text/plain, Size: 3938 bytes --]



On Jun 1, 2022, at 2:30 PM, Alex Williamson <alex.williamson@redhat.com<mailto:alex.williamson@redhat.com>> wrote:

On Wed, 1 Jun 2022 18:01:39 +0000
Jag Raman <jag.raman@oracle.com<mailto:jag.raman@oracle.com>> wrote:

On Jun 1, 2022, at 1:26 PM, Alex Williamson <alex.williamson@redhat.com<mailto:alex.williamson@redhat.com>> wrote:

On Wed, 1 Jun 2022 17:00:54 +0000
Jag Raman <jag.raman@oracle.com<mailto:jag.raman@oracle.com>> wrote:

Hi Alex,

Just to add some more detail, the emulated PCI device in QEMU presently
maintains a MSIx table (PCIDevice->msix_table) and Pending Bit Array. In the
present VFIO PCI device implementation, QEMU leverages the same
MSIx table for interrupt masking/unmasking. The backend PCI device (such as
the passthru device) always thinks that the interrupt is unmasked and lets
QEMU manage masking.

Whereas in the vfio-user case, the client additionally pushes a copy of
emulated PCI device’s table downstream to the remote device. We did this
to allow a small set of devices (such as e1000e) to clear the
PBA (msix_clr_pending()). Secondly, the remote device uses its copy of the
MSIx table to determine if interrupt should be triggered - this would prevent
an interrupt from being sent to the client unnecessarily if it's masked.

We are wondering if pushing the MSIx table to the remote device and
reading PBA from it would diverge from the VFIO protocol specification?

From your comment, I understand it’s similar to VFIO protocol because VFIO
clients could mask an interrupt using VFIO_DEVICE_SET_IRQS ioctl +
VFIO_IRQ_SET_ACTION_MASK / _UNMASK flags. I observed that QEMU presently
does not use this approach and the kernel does not support it for MSI.

I believe the SET_IRQS ioctl definition is pre-enabled to support
masking and unmasking, we've just lacked kernel support to mask at the
device which leads to the hybrid approach we have today. Our intention
would be to use the current uAPI, to provide that masking support, at
which point we'd leave the PBA mapped to the device.

Thank you for clarifying!


So whether your proposal diverges from the VFIO uAPI depends on what
you mean by "pushing the MSIx table to the remote device". If that's
done by implementing the existing SET_IRQS masking support, then you're
spot on. OTOH, if you're actually pushing a copy of the MSIx table
from the client, that's certainly not how I had envisioned the kernel

In the current implementation - when the guest accesses the MSIx table and
PBA, the client passes these accesses through to the remote device.

I suppose you can do this because you don't need some means for the
client to register a notification mechanism for the interrupt, you can
already send notifications via the socket. But this is now a
divergence from the kernel vfio uapi and eliminates what is intended to
be a device agnostic interrupt interface.

If we switch to using SET_IRQS approach, we could use SET_IRQS
message for masking/unmasking, but still pass through the the PBA
access to the backend PCI device.

Yes.

So I think the question is, if we should switch vfio-user to SET_IRQS
now for masking or unmasking, or whenever QEMU does it in the future?
The PBA access would remain the same as it’s now - via device BAR.

I apologize that I'm constantly overwhelmed with requests that I
haven't reviewed it, but it seems like the client side would be far
more compliant to the vfio kernel interface if vfio-user did implement
an interpretation of the SET_IRQS ioctl. Potentially couldn't you also

Thanks for confirming! We’ll explore using SET_IRQS for masking
and unmasking.

make use of eventfds or define other data types to pass that would give
the client more flexibility in receiving interrupts? Thanks,

I think the libvfio-user library already uses eventfds to pass interrupts
to the client.

--
Jag


Alex


[-- Attachment #2: Type: text/html, Size: 20441 bytes --]

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-06-01 17:26               ` Alex Williamson
  2022-06-01 18:01                 ` Jag Raman
@ 2022-06-03 12:20                 ` John Johnson
  1 sibling, 0 replies; 33+ messages in thread
From: John Johnson @ 2022-06-03 12:20 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Jag Raman, Stefan Hajnoczi, Stefan Hajnoczi, Michael S. Tsirkin,
	qemu-devel, f4bug, pbonzini, marcandre.lureau, thuth, bleal,
	Daniel P. Berrangé,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake,
	Markus Armbruster, quintela, dgilbert, imammedo, peterx,
	john.levon, thanos.makatos, Elena Ufimtseva, Kanth Ghatraju



> On Jun 1, 2022, at 1:26 PM, Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> On Wed, 1 Jun 2022 17:00:54 +0000
> Jag Raman <jag.raman@oracle.com> wrote:
>> 
>> Hi Alex,
>> 
>> Just to add some more detail, the emulated PCI device in QEMU presently
>> maintains a MSIx table (PCIDevice->msix_table) and Pending Bit Array. In the
>> present VFIO PCI device implementation, QEMU leverages the same
>> MSIx table for interrupt masking/unmasking. The backend PCI device (such as
>> the passthru device) always thinks that the interrupt is unmasked and lets
>> QEMU manage masking.
>> 
>> Whereas in the vfio-user case, the client additionally pushes a copy of
>> emulated PCI device’s table downstream to the remote device. We did this
>> to allow a small set of devices (such as e1000e) to clear the
>> PBA (msix_clr_pending()). Secondly, the remote device uses its copy of the
>> MSIx table to determine if interrupt should be triggered - this would prevent
>> an interrupt from being sent to the client unnecessarily if it's masked.
>> 
>> We are wondering if pushing the MSIx table to the remote device and
>> reading PBA from it would diverge from the VFIO protocol specification?
>> 
>> From your comment, I understand it’s similar to VFIO protocol because VFIO
>> clients could mask an interrupt using VFIO_DEVICE_SET_IRQS ioctl +
>> VFIO_IRQ_SET_ACTION_MASK / _UNMASK flags. I observed that QEMU presently
>> does not use this approach and the kernel does not support it for MSI.
> 
> I believe the SET_IRQS ioctl definition is pre-enabled to support
> masking and unmasking, we've just lacked kernel support to mask at the
> device which leads to the hybrid approach we have today.  Our intention
> would be to use the current uAPI, to provide that masking support, at
> which point we'd leave the PBA mapped to the device.
> 

	The reason I didn’t use SET_IRQS was the kernel implementation
didn’t, and I wanted to avoid having the two behave differently.  If we
want to go down the modal path, then if we use SET_IRQS to mask interrupts
at the source, we don’t need to emulate masking by changing the interrupt
eventfd from KVM to QEMU when the interrupt is masked.  Do you want that
change as well?

								JJ


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-05-24 15:30 ` [PATCH v10 13/14] vfio-user: handle device interrupts Jagannathan Raman
  2022-05-25 14:53   ` Stefan Hajnoczi
@ 2022-06-06 18:32   ` Alexander Duyck
  2022-06-06 19:29     ` Jag Raman
  1 sibling, 1 reply; 33+ messages in thread
From: Alexander Duyck @ 2022-06-06 18:32 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: qemu-devel, stefanha, Michael S. Tsirkin, f4bug, Paolo Bonzini,
	marcandre.lureau, thuth, bleal, berrange, Peter Maydell, eduardo,
	marcel.apfelbaum, eblake, armbru, Juan Quintela,
	Dr. David Alan Gilbert, imammedo, Peter Xu, john.levon,
	thanos.makatos, Elena Ufimtseva, John Johnson, kanth.ghatraju

On Tue, May 24, 2022 at 9:11 AM Jagannathan Raman <jag.raman@oracle.com> wrote:
>
> Forward remote device's interrupts to the guest
>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/hw/pci/pci.h              |  13 ++++
>  include/hw/remote/vfio-user-obj.h |   6 ++
>  hw/pci/msi.c                      |  16 ++--
>  hw/pci/msix.c                     |  10 ++-
>  hw/pci/pci.c                      |  13 ++++
>  hw/remote/machine.c               |  14 +++-
>  hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
>  stubs/vfio-user-obj.c             |   6 ++
>  MAINTAINERS                       |   1 +
>  hw/remote/trace-events            |   1 +
>  stubs/meson.build                 |   1 +
>  11 files changed, 193 insertions(+), 11 deletions(-)
>  create mode 100644 include/hw/remote/vfio-user-obj.h
>  create mode 100644 stubs/vfio-user-obj.c
>

So I had a question about a few bits below. Specifically I ran into
issues when I had setup two devices to be assigned to the same VM via
two vfio-user-pci/x-vfio-user-server interfaces.

What I am hitting is an assert(irq_num < bus->nirq) in
pci_bus_change_irq_level in the server.

> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
> index 645b54343d..75d550daae 100644
> --- a/hw/remote/machine.c
> +++ b/hw/remote/machine.c
> @@ -23,6 +23,8 @@
>  #include "hw/remote/iommu.h"
>  #include "hw/qdev-core.h"
>  #include "hw/remote/iommu.h"
> +#include "hw/remote/vfio-user-obj.h"
> +#include "hw/pci/msi.h"
>
>  static void remote_machine_init(MachineState *machine)
>  {
> @@ -54,12 +56,16 @@ static void remote_machine_init(MachineState *machine)
>
>      if (s->vfio_user) {
>          remote_iommu_setup(pci_host->bus);
> -    }
>
> -    remote_iohub_init(&s->iohub);
> +        msi_nonbroken = true;
> +
> +        vfu_object_set_bus_irq(pci_host->bus);
> +    } else {
> +        remote_iohub_init(&s->iohub);
>
> -    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
> -                 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
> +        pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
> +                     &s->iohub, REMOTE_IOHUB_NB_PIRQS);
> +    }
>
>      qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
>  }

If I am reading the code right this limits us to one legacy interrupt
in the vfio_user case, irq 0, correct? Is this intentional? Just
wanted to verify as this seems to limit us to supporting only one
device based on the mapping below.

> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index ee28a93782..eeb165a805 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -53,6 +53,9 @@
>  #include "hw/pci/pci.h"
>  #include "qemu/timer.h"
>  #include "exec/memory.h"
> +#include "hw/pci/msi.h"
> +#include "hw/pci/msix.h"
> +#include "hw/remote/vfio-user-obj.h"
>
>  #define TYPE_VFU_OBJECT "x-vfio-user-server"
>  OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> @@ -96,6 +99,10 @@ struct VfuObject {
>      Error *unplug_blocker;
>
>      int vfu_poll_fd;
> +
> +    MSITriggerFunc *default_msi_trigger;
> +    MSIPrepareMessageFunc *default_msi_prepare_message;
> +    MSIxPrepareMessageFunc *default_msix_prepare_message;
>  };
>
>  static void vfu_object_init_ctx(VfuObject *o, Error **errp);
> @@ -520,6 +527,111 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
>      }
>  }
>
> +static int vfu_object_map_irq(PCIDevice *pci_dev, int intx)
> +{
> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
> +                                pci_dev->devfn);
> +
> +    return pci_bdf;
> +}
> +

This bit ends up mapping it so that the BDF ends up setting the IRQ
number. So for example device 0, function 0 will be IRQ 0, and device
1, function 0 will be IRQ 8. Just wondering why it is implemented this
way if we only intend to support one device. Also I am wondering if we
should support some sort of IRQ sharing?

> +static int vfu_object_setup_irqs(VfuObject *o, PCIDevice *pci_dev)
> +{
> +    vfu_ctx_t *vfu_ctx = o->vfu_ctx;
> +    int ret;
> +
> +    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    if (msix_nr_vectors_allocated(pci_dev)) {
> +        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSIX_IRQ,
> +                                       msix_nr_vectors_allocated(pci_dev));
> +    } else if (msi_nr_vectors_allocated(pci_dev)) {
> +        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSI_IRQ,
> +                                       msi_nr_vectors_allocated(pci_dev));
> +    }
> +
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    vfu_object_setup_msi_cbs(o);
> +
> +    pci_dev->irq_opaque = vfu_ctx;
> +
> +    return 0;
> +}
> +
> +void vfu_object_set_bus_irq(PCIBus *pci_bus)
> +{
> +    pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, pci_bus, 1);
> +}
> +
>  /*
>   * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>   * properties. It also depends on devices instantiated in QEMU. These

So this is the code that was called earlier that is being used to
assign 1 interrupt to the bus. I am just wondering if that is
intentional and if the expected behavior is to only support one device
per server for now?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-06-06 18:32   ` Alexander Duyck
@ 2022-06-06 19:29     ` Jag Raman
  2022-06-06 20:38       ` Alexander Duyck
  0 siblings, 1 reply; 33+ messages in thread
From: Jag Raman @ 2022-06-06 19:29 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: qemu-devel, Stefan Hajnoczi, Michael S. Tsirkin, f4bug,
	Paolo Bonzini, marcandre.lureau, thuth, bleal, berrange,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake, armbru,
	Juan Quintela, Dr. David Alan Gilbert, imammedo, Peter Xu,
	john.levon, thanos.makatos, Elena Ufimtseva, John Johnson,
	Kanth Ghatraju



> On Jun 6, 2022, at 2:32 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> 
> On Tue, May 24, 2022 at 9:11 AM Jagannathan Raman <jag.raman@oracle.com> wrote:
>> 
>> Forward remote device's interrupts to the guest
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> include/hw/pci/pci.h              |  13 ++++
>> include/hw/remote/vfio-user-obj.h |   6 ++
>> hw/pci/msi.c                      |  16 ++--
>> hw/pci/msix.c                     |  10 ++-
>> hw/pci/pci.c                      |  13 ++++
>> hw/remote/machine.c               |  14 +++-
>> hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
>> stubs/vfio-user-obj.c             |   6 ++
>> MAINTAINERS                       |   1 +
>> hw/remote/trace-events            |   1 +
>> stubs/meson.build                 |   1 +
>> 11 files changed, 193 insertions(+), 11 deletions(-)
>> create mode 100644 include/hw/remote/vfio-user-obj.h
>> create mode 100644 stubs/vfio-user-obj.c
>> 
> 
> So I had a question about a few bits below. Specifically I ran into
> issues when I had setup two devices to be assigned to the same VM via
> two vfio-user-pci/x-vfio-user-server interfaces.

Thanks for the heads up, Alexander!

> 
> What I am hitting is an assert(irq_num < bus->nirq) in
> pci_bus_change_irq_level in the server.

That is issue is because we are only initializing only one
IRQ for the PCI bus - but it should be more. We will update
it and when the bus initializes interrupts and get back to you.

We only tested MSI/x devices for the multi-device config. We
should also test INTx - could you please confirm which device
you’re using so we could verify that it works before posting
the next revision.

Thank you!
--
Jag

> 
>> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
>> index 645b54343d..75d550daae 100644
>> --- a/hw/remote/machine.c
>> +++ b/hw/remote/machine.c
>> @@ -23,6 +23,8 @@
>> #include "hw/remote/iommu.h"
>> #include "hw/qdev-core.h"
>> #include "hw/remote/iommu.h"
>> +#include "hw/remote/vfio-user-obj.h"
>> +#include "hw/pci/msi.h"
>> 
>> static void remote_machine_init(MachineState *machine)
>> {
>> @@ -54,12 +56,16 @@ static void remote_machine_init(MachineState *machine)
>> 
>>     if (s->vfio_user) {
>>         remote_iommu_setup(pci_host->bus);
>> -    }
>> 
>> -    remote_iohub_init(&s->iohub);
>> +        msi_nonbroken = true;
>> +
>> +        vfu_object_set_bus_irq(pci_host->bus);
>> +    } else {
>> +        remote_iohub_init(&s->iohub);
>> 
>> -    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
>> -                 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
>> +        pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
>> +                     &s->iohub, REMOTE_IOHUB_NB_PIRQS);
>> +    }
>> 
>>     qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
>> }
> 
> If I am reading the code right this limits us to one legacy interrupt
> in the vfio_user case, irq 0, correct? Is this intentional? Just
> wanted to verify as this seems to limit us to supporting only one
> device based on the mapping below.
> 
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> index ee28a93782..eeb165a805 100644
>> --- a/hw/remote/vfio-user-obj.c
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -53,6 +53,9 @@
>> #include "hw/pci/pci.h"
>> #include "qemu/timer.h"
>> #include "exec/memory.h"
>> +#include "hw/pci/msi.h"
>> +#include "hw/pci/msix.h"
>> +#include "hw/remote/vfio-user-obj.h"
>> 
>> #define TYPE_VFU_OBJECT "x-vfio-user-server"
>> OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
>> @@ -96,6 +99,10 @@ struct VfuObject {
>>     Error *unplug_blocker;
>> 
>>     int vfu_poll_fd;
>> +
>> +    MSITriggerFunc *default_msi_trigger;
>> +    MSIPrepareMessageFunc *default_msi_prepare_message;
>> +    MSIxPrepareMessageFunc *default_msix_prepare_message;
>> };
>> 
>> static void vfu_object_init_ctx(VfuObject *o, Error **errp);
>> @@ -520,6 +527,111 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
>>     }
>> }
>> 
>> +static int vfu_object_map_irq(PCIDevice *pci_dev, int intx)
>> +{
>> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
>> +                                pci_dev->devfn);
>> +
>> +    return pci_bdf;
>> +}
>> +
> 
> This bit ends up mapping it so that the BDF ends up setting the IRQ
> number. So for example device 0, function 0 will be IRQ 0, and device
> 1, function 0 will be IRQ 8. Just wondering why it is implemented this
> way if we only intend to support one device. Also I am wondering if we
> should support some sort of IRQ sharing?
> 
>> +static int vfu_object_setup_irqs(VfuObject *o, PCIDevice *pci_dev)
>> +{
>> +    vfu_ctx_t *vfu_ctx = o->vfu_ctx;
>> +    int ret;
>> +
>> +    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    if (msix_nr_vectors_allocated(pci_dev)) {
>> +        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSIX_IRQ,
>> +                                       msix_nr_vectors_allocated(pci_dev));
>> +    } else if (msi_nr_vectors_allocated(pci_dev)) {
>> +        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSI_IRQ,
>> +                                       msi_nr_vectors_allocated(pci_dev));
>> +    }
>> +
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    vfu_object_setup_msi_cbs(o);
>> +
>> +    pci_dev->irq_opaque = vfu_ctx;
>> +
>> +    return 0;
>> +}
>> +
>> +void vfu_object_set_bus_irq(PCIBus *pci_bus)
>> +{
>> +    pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, pci_bus, 1);
>> +}
>> +
>> /*
>>  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>>  * properties. It also depends on devices instantiated in QEMU. These
> 
> So this is the code that was called earlier that is being used to
> assign 1 interrupt to the bus. I am just wondering if that is
> intentional and if the expected behavior is to only support one device
> per server for now?


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v10 13/14] vfio-user: handle device interrupts
  2022-06-06 19:29     ` Jag Raman
@ 2022-06-06 20:38       ` Alexander Duyck
  0 siblings, 0 replies; 33+ messages in thread
From: Alexander Duyck @ 2022-06-06 20:38 UTC (permalink / raw)
  To: Jag Raman
  Cc: qemu-devel, Stefan Hajnoczi, Michael S. Tsirkin, f4bug,
	Paolo Bonzini, marcandre.lureau, thuth, bleal, berrange,
	Peter Maydell, eduardo, marcel.apfelbaum, eblake, armbru,
	Juan Quintela, Dr. David Alan Gilbert, imammedo, Peter Xu,
	john.levon, thanos.makatos, Elena Ufimtseva, John Johnson,
	Kanth Ghatraju

On Mon, Jun 6, 2022 at 12:29 PM Jag Raman <jag.raman@oracle.com> wrote:
>
>
>
> > On Jun 6, 2022, at 2:32 PM, Alexander Duyck <alexander.duyck@gmail.com> wrote:
> >
> > On Tue, May 24, 2022 at 9:11 AM Jagannathan Raman <jag.raman@oracle.com> wrote:
> >>
> >> Forward remote device's interrupts to the guest
> >>
> >> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> >> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> >> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> >> ---
> >> include/hw/pci/pci.h              |  13 ++++
> >> include/hw/remote/vfio-user-obj.h |   6 ++
> >> hw/pci/msi.c                      |  16 ++--
> >> hw/pci/msix.c                     |  10 ++-
> >> hw/pci/pci.c                      |  13 ++++
> >> hw/remote/machine.c               |  14 +++-
> >> hw/remote/vfio-user-obj.c         | 123 ++++++++++++++++++++++++++++++
> >> stubs/vfio-user-obj.c             |   6 ++
> >> MAINTAINERS                       |   1 +
> >> hw/remote/trace-events            |   1 +
> >> stubs/meson.build                 |   1 +
> >> 11 files changed, 193 insertions(+), 11 deletions(-)
> >> create mode 100644 include/hw/remote/vfio-user-obj.h
> >> create mode 100644 stubs/vfio-user-obj.c
> >>
> >
> > So I had a question about a few bits below. Specifically I ran into
> > issues when I had setup two devices to be assigned to the same VM via
> > two vfio-user-pci/x-vfio-user-server interfaces.
>
> Thanks for the heads up, Alexander!
>
> >
> > What I am hitting is an assert(irq_num < bus->nirq) in
> > pci_bus_change_irq_level in the server.
>
> That is issue is because we are only initializing only one
> IRQ for the PCI bus - but it should be more. We will update
> it and when the bus initializes interrupts and get back to you.
>
> We only tested MSI/x devices for the multi-device config. We
> should also test INTx - could you please confirm which device
> you’re using so we could verify that it works before posting
> the next revision.
>
> Thank you!
> --
> Jag

I was testing an MSI-X capable NIC, so you can reproduce with e1000e.
During the device enumeration before the driver was even loaded it
threw the error:
qemu-system-x86_64: ../hw/pci/pci.c:276: pci_bus_change_irq_level:
Assertion `irq_num < bus->nirq' failed.

All it really takes is attaching to the second device, that is enough
to trigger the error since the irq_num will be non-zero.

If I recall, all the kernel is doing is unmasking the interrupt via
the config space and it is triggering the error which then shuts down
the server.


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2022-06-06 20:51 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-24 15:30 [PATCH v10 00/14] vfio-user server in QEMU Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 01/14] qdev: unplug blocker for devices Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 02/14] remote/machine: add HotplugHandler for remote machine Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 03/14] remote/machine: add vfio-user property Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 04/14] vfio-user: build library Jagannathan Raman
2022-05-25 14:46   ` Stefan Hajnoczi
2022-05-24 15:30 ` [PATCH v10 05/14] vfio-user: define vfio-user-server object Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 06/14] vfio-user: instantiate vfio-user context Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 07/14] vfio-user: find and init PCI device Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 08/14] vfio-user: run vfio-user context Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 09/14] vfio-user: handle PCI config space accesses Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 10/14] vfio-user: IOMMU support for remote device Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 11/14] vfio-user: handle DMA mappings Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 12/14] vfio-user: handle PCI BAR accesses Jagannathan Raman
2022-05-24 15:30 ` [PATCH v10 13/14] vfio-user: handle device interrupts Jagannathan Raman
2022-05-25 14:53   ` Stefan Hajnoczi
2022-05-31 15:01     ` Jag Raman
2022-05-31 20:10       ` Alex Williamson
2022-05-31 21:03         ` Stefan Hajnoczi
2022-05-31 21:45           ` Alex Williamson
2022-06-01  6:37             ` John Johnson
2022-06-01  9:38               ` Stefan Hajnoczi
2022-06-01 17:00             ` Jag Raman
2022-06-01 17:26               ` Alex Williamson
2022-06-01 18:01                 ` Jag Raman
2022-06-01 18:30                   ` Alex Williamson
2022-06-01 19:38                     ` Jag Raman
2022-06-03 12:20                 ` John Johnson
2022-06-06 18:32   ` Alexander Duyck
2022-06-06 19:29     ` Jag Raman
2022-06-06 20:38       ` Alexander Duyck
2022-05-24 15:30 ` [PATCH v10 14/14] vfio-user: handle reset of remote device Jagannathan Raman
2022-05-25 14:55 ` [PATCH v10 00/14] vfio-user server in QEMU Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.