* [PATCH v6 00/19] vfio-user server in QEMU
@ 2022-02-17  7:48 Jagannathan Raman
  2022-02-17  7:48 ` [PATCH v6 01/19] configure, meson: override C compiler for cmake Jagannathan Raman
                   ` (18 more replies)
  0 siblings, 19 replies; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Hi,

This is v6 of the server side changes to enable vfio-user in QEMU.

Thank you very much for your feedback on the last revision, which
helped to streamline the overall design. We've made the following
changes in this revision:

[PATCH v6 03/19] qdev: unplug blocker for devices
  - removed test which prevented an unplug blocker
    from getting added if migration was in progress
  - added comments to function

[PATCH v6 04/19] remote/machine: add HotplugHandler for remote machine
  - changed commit message prefix from vfio-user to "remote/machine"

[PATCH v6 05/19] remote/machine: add vfio-user property
  - new in this series

[PATCH v6 07/19] vfio-user: define vfio-user-server object
  - fixed typo noted in the review
  - moved error message before setting "o->socket = NULL" in
    vfu_object_set_socket()
  - added "vfio-user=on" to the usage comment at the top of file

[PATCH v6 08/19] vfio-user: instantiate vfio-user context
  - added error message to the object set property message when
    server is already running

[PATCH v6 09/19] vfio-user: find and init PCI device
  - added more detailed error message for device unplug blocker

[PATCH v6 10/19] vfio-user: run vfio-user context
  - send ID of device in VFU_CLIENT_HANGUP instead of path
  - disable FD handler in object finalize

[PATCH v6 12/19] vfio-user: IOMMU support for remote device
  - new in this series

[PATCH v6 13/19] vfio-user: handle DMA mappings
  - Set up IOMMU for remote machine if vfio-user is enabled
  - Map/Unmap the DMA regions in the IOMMU address space in
    dma_register()/dma_unregister() using
    pci_device_iommu_address_space() function

[PATCH v6 14/19] vfio-user: handle PCI BAR accesses
  - vfu_object_bar_rw() - directly access the bar region
    instead of accessing via address_space_rw()
  - register handler for PCI ROM region
  - set read only flags for read only MemoryRegions with
    vfu_setup_region()

[PATCH v6 15/19] vfio-user: handle device interrupts
  - set up separate PCI bus map_irq and set_irq for
    vfio-user during remote machine init
  - index hash table using PCI bus device function numbers

[PATCH v6 16/19] softmmu/vl: defer backend init
  - new in this series

[PATCH v6 17/19] vfio-user: register handlers to facilitate migration
  - enable streaming for migration data instead of pre-determining
    the migration data size at boot
  - dropped migrated_devs static variable to track the number of
    devices migrated
  - added helper functions to independently start/stop block and
    network devices
  - updated qemu_remote_savevm() to migrate data of all the
    devices under the target device

[PATCH v6 18/19] vfio-user: handle reset of remote device
  - new in this series

[PATCH v6 19/19] vfio-user: avocado tests for vfio-user
  - use QMP command for hotplug instead of HMP command
  - confirm the state of source and destination VMs after migration
  - testing megasas device instead of lsi53c895a as lsi53c895a
    doesn't seem to support IOMMU, which is enabled by default
    on the server

We dropped the following patches from the previous revision:
  - pci: isolated address space for PCI bus
  - pci: create and free isolated PCI buses
  - vfio-user: set qdev bus callbacks for remote machine

We are looking forward to your comments.

Thank you very much!

Jagannathan Raman (19):
  configure, meson: override C compiler for cmake
  tests/avocado: Specify target VM argument to helper routines
  qdev: unplug blocker for devices
  remote/machine: add HotplugHandler for remote machine
  remote/machine: add vfio-user property
  vfio-user: build library
  vfio-user: define vfio-user-server object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: IOMMU support for remote device
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  softmmu/vl: defer backend init
  vfio-user: register handlers to facilitate migration
  vfio-user: handle reset of remote device
  vfio-user: avocado tests for vfio-user

 configure                                  |   21 +-
 meson.build                                |   44 +-
 qapi/misc.json                             |   23 +
 qapi/qom.json                              |   20 +-
 include/block/block.h                      |    1 +
 include/exec/memory.h                      |    3 +
 include/hw/pci/pci.h                       |    6 +
 include/hw/qdev-core.h                     |   35 +
 include/hw/remote/iommu.h                  |   18 +
 include/hw/remote/machine.h                |    2 +
 include/hw/remote/vfio-user-obj.h          |    6 +
 include/migration/vmstate.h                |    2 +
 include/sysemu/sysemu.h                    |    4 +
 migration/savevm.h                         |    2 +
 block.c                                    |    5 +
 block/block-backend.c                      |    3 +-
 blockdev.c                                 |    2 +-
 hw/pci/msi.c                               |   13 +-
 hw/pci/msix.c                              |   12 +-
 hw/remote/iommu.c                          |   78 ++
 hw/remote/machine.c                        |   54 +-
 hw/remote/vfio-user-obj.c                  | 1286 ++++++++++++++++++++
 migration/savevm.c                         |   89 ++
 migration/vmstate.c                        |   19 +
 softmmu/physmem.c                          |    4 +-
 softmmu/qdev-monitor.c                     |   26 +
 softmmu/vl.c                               |   17 +
 stubs/defer-backend-init.c                 |    7 +
 stubs/vfio-user-obj.c                      |    6 +
 tests/qtest/fuzz/generic_fuzz.c            |    9 +-
 .gitlab-ci.d/buildtest.yml                 |    2 +
 .gitmodules                                |    3 +
 Kconfig.host                               |    4 +
 MAINTAINERS                                |    7 +
 hw/remote/Kconfig                          |    4 +
 hw/remote/meson.build                      |    4 +
 hw/remote/trace-events                     |   11 +
 meson_options.txt                          |    2 +
 stubs/meson.build                          |    2 +
 subprojects/libvfio-user                   |    1 +
 tests/avocado/avocado_qemu/__init__.py     |   14 +-
 tests/avocado/vfio-user.py                 |  234 ++++
 tests/docker/dockerfiles/centos8.docker    |    2 +
 tests/docker/dockerfiles/ubuntu2004.docker |    2 +
 44 files changed, 2088 insertions(+), 21 deletions(-)
 create mode 100644 include/hw/remote/iommu.h
 create mode 100644 include/hw/remote/vfio-user-obj.h
 create mode 100644 hw/remote/iommu.c
 create mode 100644 hw/remote/vfio-user-obj.c
 create mode 100644 stubs/defer-backend-init.c
 create mode 100644 stubs/vfio-user-obj.c
 create mode 160000 subprojects/libvfio-user
 create mode 100644 tests/avocado/vfio-user.py

-- 
2.20.1
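
The patch 15 note above mentions indexing a hash table by PCI bus,
device and function numbers. As a minimal sketch of how such a key is
commonly packed (this is an illustration, not code from the series; the
helper names example_devfn and example_bdf_key are hypothetical), the
slot and function go into an 8-bit devfn, and the bus number is placed
in the byte above it:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers; the names are illustrative, not from the series. */

/* Pack a 5-bit slot and a 3-bit function into the 8-bit devfn value. */
static uint8_t example_devfn(uint8_t slot, uint8_t func)
{
    return (uint8_t)(((slot & 0x1f) << 3) | (func & 0x07));
}

/* Combine an 8-bit bus number with devfn into a 16-bit lookup key. */
static uint16_t example_bdf_key(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)(((uint16_t)bus << 8) | devfn);
}

static void example_bdf_usage(void)
{
    /* Bus 0, slot 3, function 1: devfn is (3 << 3) | 1 = 0x19. */
    uint16_t key = example_bdf_key(0, example_devfn(3, 1));

    assert(key == 0x0019);
}
```

A key built this way is unique per function on a given PCI segment,
which is what makes it usable as a hash table index.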




* [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-17 12:09   ` Peter Maydell
  2022-02-17  7:48 ` [PATCH v6 02/19] tests/avocado: Specify target VM argument to helper routines Jagannathan Raman
                   ` (17 subsequent siblings)
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

The compiler path that cmake gets from meson is corrupted. It results in
the following error:
| -- The C compiler identification is unknown
| CMake Error at CMakeLists.txt:35 (project):
| The CMAKE_C_COMPILER:
| /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
| is not a full path to an existing compiler tool.

Explicitly specify the C compiler for cmake to avoid this error.

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
---
 configure | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configure b/configure
index 3a29eff5cc..9a326eda1e 100755
--- a/configure
+++ b/configure
@@ -3726,6 +3726,8 @@ if test "$skip_meson" = no; then
   echo "cpp_args = [$(meson_quote $CXXFLAGS $EXTRA_CXXFLAGS)]" >> $cross
   echo "c_link_args = [$(meson_quote $CFLAGS $LDFLAGS $EXTRA_CFLAGS $EXTRA_LDFLAGS)]" >> $cross
   echo "cpp_link_args = [$(meson_quote $CXXFLAGS $LDFLAGS $EXTRA_CXXFLAGS $EXTRA_LDFLAGS)]" >> $cross
+  echo "[cmake]" >> $cross
+  echo "CMAKE_C_COMPILER = [$(meson_quote $cc $CPU_CFLAGS)]" >> $cross
   echo "[binaries]" >> $cross
   echo "c = [$(meson_quote $cc $CPU_CFLAGS)]" >> $cross
   test -n "$cxx" && echo "cpp = [$(meson_quote $cxx $CPU_CFLAGS)]" >> $cross
-- 
2.20.1
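
One plausible reading of the error quoted in the commit message (an
assumption; the message itself does not state the cause) is that the
compiler command and its flags reached cmake as a single
semicolon-joined string, which cmake then tried to use as a path. The
sketch below only reproduces that string shape; example_join_semicolon
is a hypothetical helper, not part of configure, meson or cmake:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Join n strings with ';' into out. Returns false if out is too small. */
static bool example_join_semicolon(const char *const *parts, size_t n,
                                   char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0) {
        return false;
    }
    out[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(parts[i]);
        size_t sep = (i > 0) ? 1 : 0;

        /* Room for the separator, the part, and the trailing NUL. */
        if (used + sep + len + 1 > outsz) {
            return false;
        }
        if (sep) {
            out[used++] = ';';
        }
        memcpy(out + used, parts[i], len);
        used += len;
        out[used] = '\0';
    }
    return true;
}

static void example_join_usage(void)
{
    const char *const parts[] = {
        "/opt/rh/devtoolset-9/root/bin/cc", "-m64", "-mcx16"
    };
    char joined[128];
    bool ok = example_join_semicolon(parts, 3, joined, sizeof(joined));

    assert(ok);
    /* The joined string is no longer the path of the compiler binary. */
    assert(strcmp(joined, parts[0]) != 0);
}
```

The joined result matches the value shown in the error output, and it
differs from the first element, which is the only part that names an
existing file.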




* [PATCH v6 02/19] tests/avocado: Specify target VM argument to helper routines
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
  2022-02-17  7:48 ` [PATCH v6 01/19] configure, meson: override C compiler for cmake Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-17  7:48 ` [PATCH v6 03/19] qdev: unplug blocker for devices Jagannathan Raman
                   ` (16 subsequent siblings)
  18 siblings, 0 replies; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Add an optional vm argument to the exec_command and
exec_command_and_wait_for_pattern routines so that callers can specify
the target VM.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Beraldo Leal <bleal@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/avocado/avocado_qemu/__init__.py | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
index 75063c0c30..b3fbf77577 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -198,7 +198,7 @@ def wait_for_console_pattern(test, success_message, failure_message=None,
     """
     _console_interaction(test, success_message, failure_message, None, vm=vm)
 
-def exec_command(test, command):
+def exec_command(test, command, vm=None):
     """
     Send a command to a console (appending CRLF characters), while logging
     the content.
@@ -207,11 +207,14 @@ def exec_command(test, command):
     :type test: :class:`avocado_qemu.QemuSystemTest`
     :param command: the command to send
     :type command: str
+    :param vm: target vm
+    :type vm: :class:`qemu.machine.QEMUMachine`
     """
-    _console_interaction(test, None, None, command + '\r')
+    _console_interaction(test, None, None, command + '\r', vm=vm)
 
 def exec_command_and_wait_for_pattern(test, command,
-                                      success_message, failure_message=None):
+                                      success_message, failure_message=None,
+                                      vm=None):
     """
     Send a command to a console (appending CRLF characters), then wait
     for success_message to appear on the console, while logging the.
@@ -223,8 +226,11 @@ def exec_command_and_wait_for_pattern(test, command,
     :param command: the command to send
     :param success_message: if this message appears, test succeeds
     :param failure_message: if this message appears, test fails
+    :param vm: target vm
+    :type vm: :class:`qemu.machine.QEMUMachine`
     """
-    _console_interaction(test, success_message, failure_message, command + '\r')
+    _console_interaction(test, success_message, failure_message, command + '\r',
+                         vm=vm)
 
 class QemuBaseTest(avocado.Test):
     def _get_unique_tag_val(self, tag_name):
-- 
2.20.1




* [PATCH v6 03/19] qdev: unplug blocker for devices
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
  2022-02-17  7:48 ` [PATCH v6 01/19] configure, meson: override C compiler for cmake Jagannathan Raman
  2022-02-17  7:48 ` [PATCH v6 02/19] tests/avocado: Specify target VM argument to helper routines Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-21 15:27   ` Stefan Hajnoczi
  2022-02-21 15:30   ` Stefan Hajnoczi
  2022-02-17  7:48 ` [PATCH v6 04/19] remote/machine: add HotplugHandler for remote machine Jagannathan Raman
                   ` (15 subsequent siblings)
  18 siblings, 2 replies; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Add a blocker to prevent hot-unplug of devices.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 include/hw/qdev-core.h | 35 +++++++++++++++++++++++++++++++++++
 softmmu/qdev-monitor.c | 26 ++++++++++++++++++++++++++
 2 files changed, 61 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 92c3d65208..4b1d77f44a 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -193,6 +193,7 @@ struct DeviceState {
     int instance_id_alias;
     int alias_required_for_version;
     ResettableState reset;
+    GSList *unplug_blockers;
 };
 
 struct DeviceListener {
@@ -419,6 +420,40 @@ void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
 void qdev_machine_creation_done(void);
 bool qdev_machine_modified(void);
 
+/**
+ * Device Unplug blocker: prevents a device from being unplugged. It could
+ * be used to indicate that another object depends on the device.
+ *
+ * qdev_add_unplug_blocker: Adds an unplug blocker to a device
+ *
+ * @dev: Device to be blocked from unplug
+ * @reason: Reason for blocking
+ *
+ */
+void qdev_add_unplug_blocker(DeviceState *dev, Error *reason);
+
+/**
+ * qdev_del_unplug_blocker: Removes an unplug blocker from a device
+ *
+ * @dev: Device to be unblocked
+ * @reason: Pointer to the Error used with qdev_add_unplug_blocker.
+ *          Used as a handle to lookup the blocker for deletion.
+ *
+ */
+void qdev_del_unplug_blocker(DeviceState *dev, Error *reason);
+
+/**
+ * qdev_unplug_blocked: Confirms if a device is blocked from unplug
+ *
+ * @dev: Device to be tested
+ * @errp: Returns one of the reasons why the device is blocked,
+ *        if any
+ *
+ * Returns: true if device is blocked from unplug, false otherwise
+ *
+ */
+bool qdev_unplug_blocked(DeviceState *dev, Error **errp);
+
 /**
  * GpioPolarity: Polarity of a GPIO line
  *
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 01f3834db5..69d9cf3f25 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -945,10 +945,36 @@ void qmp_device_del(const char *id, Error **errp)
             return;
         }
 
+        if (qdev_unplug_blocked(dev, errp)) {
+            return;
+        }
+
         qdev_unplug(dev, errp);
     }
 }
 
+void qdev_add_unplug_blocker(DeviceState *dev, Error *reason)
+{
+    dev->unplug_blockers = g_slist_prepend(dev->unplug_blockers, reason);
+}
+
+void qdev_del_unplug_blocker(DeviceState *dev, Error *reason)
+{
+    dev->unplug_blockers = g_slist_remove(dev->unplug_blockers, reason);
+}
+
+bool qdev_unplug_blocked(DeviceState *dev, Error **errp)
+{
+    ERRP_GUARD();
+
+    if (dev->unplug_blockers) {
+        error_propagate(errp, error_copy(dev->unplug_blockers->data));
+        return true;
+    }
+
+    return false;
+}
+
 void hmp_device_add(Monitor *mon, const QDict *qdict)
 {
     Error *err = NULL;
-- 
2.20.1
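
The blocker semantics in the patch above can be sketched without QEMU
or GLib. The following is a simplified stand-in, not the patch's code:
it uses a hand-rolled singly linked list instead of GSList, a string
pointer instead of Error *, and hypothetical Example* names. As in the
patch, the reason pointer doubles as the handle for removal, and the
most recently added reason is the one reported:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for DeviceState and its blocker list. */
typedef struct ExampleBlocker {
    const char *reason;            /* also serves as the removal handle */
    struct ExampleBlocker *next;
} ExampleBlocker;

typedef struct ExampleDevice {
    ExampleBlocker *unplug_blockers;
} ExampleDevice;

/* Prepend a blocker. Returns false only if allocation fails. */
static bool example_add_unplug_blocker(ExampleDevice *dev, const char *reason)
{
    ExampleBlocker *b = malloc(sizeof(*b));

    if (!b) {
        return false;
    }
    b->reason = reason;
    b->next = dev->unplug_blockers;
    dev->unplug_blockers = b;
    return true;
}

/* Remove the first blocker whose handle matches by pointer identity. */
static void example_del_unplug_blocker(ExampleDevice *dev, const char *reason)
{
    ExampleBlocker **link = &dev->unplug_blockers;

    while (*link) {
        if ((*link)->reason == reason) {
            ExampleBlocker *dead = *link;

            *link = dead->next;
            free(dead);
            return;
        }
        link = &(*link)->next;
    }
}

/* True if any blocker remains; *reason receives the newest one. */
static bool example_unplug_blocked(const ExampleDevice *dev,
                                   const char **reason)
{
    if (dev->unplug_blockers) {
        if (reason) {
            *reason = dev->unplug_blockers->reason;
        }
        return true;
    }
    return false;
}

static void example_unplug_usage(void)
{
    static const char in_use[] = "device is in use by a server object";
    ExampleDevice dev = { NULL };
    const char *why = NULL;
    bool added;

    assert(!example_unplug_blocked(&dev, &why));
    added = example_add_unplug_blocker(&dev, in_use);
    assert(added);
    assert(example_unplug_blocked(&dev, &why) && why == in_use);
    example_del_unplug_blocker(&dev, in_use);
    assert(!example_unplug_blocked(&dev, NULL));
}
```

Because removal matches on pointer identity, the caller has to keep the
same reason object alive for as long as the blocker is installed.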




* [PATCH v6 04/19] remote/machine: add HotplugHandler for remote machine
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (2 preceding siblings ...)
  2022-02-17  7:48 ` [PATCH v6 03/19] qdev: unplug blocker for devices Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-21 15:30   ` Stefan Hajnoczi
  2022-02-17  7:48 ` [PATCH v6 05/19] remote/machine: add vfio-user property Jagannathan Raman
                   ` (14 subsequent siblings)
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Allow hotplugging of PCI(e) devices to the remote machine.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/machine.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 952105eab5..0c5bd4f923 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -21,6 +21,7 @@
 #include "qapi/error.h"
 #include "hw/pci/pci_host.h"
 #include "hw/remote/iohub.h"
+#include "hw/qdev-core.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -54,14 +55,19 @@ static void remote_machine_init(MachineState *machine)
 
     pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
                  &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+
+    qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
 
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
+    HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(oc);
 
     mc->init = remote_machine_init;
     mc->desc = "Experimental remote machine";
+
+    hc->unplug = qdev_simple_device_unplug_cb;
 }
 
 static const TypeInfo remote_machine = {
@@ -69,6 +75,10 @@ static const TypeInfo remote_machine = {
     .parent = TYPE_MACHINE,
     .instance_size = sizeof(RemoteMachineState),
     .class_init = remote_machine_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_HOTPLUG_HANDLER },
+        { }
+    }
 };
 
 static void remote_machine_register_types(void)
-- 
2.20.1




* [PATCH v6 05/19] remote/machine: add vfio-user property
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (3 preceding siblings ...)
  2022-02-17  7:48 ` [PATCH v6 04/19] remote/machine: add HotplugHandler for remote machine Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-21 15:32   ` Stefan Hajnoczi
  2022-02-17  7:48 ` [PATCH v6 06/19] vfio-user: build library Jagannathan Raman
                   ` (13 subsequent siblings)
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Add a vfio-user property to the x-remote machine. It is a boolean that
indicates whether the machine supports the vfio-user protocol. The
machine configures the bus differently for the vfio-user and
multiprocess protocols, so this property tells it how to configure the
bus.

This property should be short lived. Once vfio-user fully replaces
multiprocess, this property could be removed.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 include/hw/remote/machine.h |  2 ++
 hw/remote/machine.c         | 23 +++++++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/include/hw/remote/machine.h b/include/hw/remote/machine.h
index 2a2a33c4b2..8d0fa98d33 100644
--- a/include/hw/remote/machine.h
+++ b/include/hw/remote/machine.h
@@ -22,6 +22,8 @@ struct RemoteMachineState {
 
     RemotePCIHost *host;
     RemoteIOHubState iohub;
+
+    bool vfio_user;
 };
 
 /* Used to pass to co-routine device and ioc. */
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index 0c5bd4f923..a9a75e170f 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -59,6 +59,25 @@ static void remote_machine_init(MachineState *machine)
     qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
 
+static bool remote_machine_get_vfio_user(Object *obj, Error **errp)
+{
+    RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+    return s->vfio_user;
+}
+
+static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
+{
+    RemoteMachineState *s = REMOTE_MACHINE(obj);
+
+    if (phase_check(PHASE_MACHINE_CREATED)) {
+        error_setg(errp, "Error enabling vfio-user - machine already created");
+        return;
+    }
+
+    s->vfio_user = value;
+}
+
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -68,6 +87,10 @@ static void remote_machine_class_init(ObjectClass *oc, void *data)
     mc->desc = "Experimental remote machine";
 
     hc->unplug = qdev_simple_device_unplug_cb;
+
+    object_class_property_add_bool(oc, "vfio-user",
+                                   remote_machine_get_vfio_user,
+                                   remote_machine_set_vfio_user);
 }
 
 static const TypeInfo remote_machine = {
-- 
2.20.1
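
The setter in the patch above rejects changes once the machine has been
created. A minimal standalone sketch of that guard follows; it is not
the patch's code. The created flag is a stand-in for
phase_check(PHASE_MACHINE_CREATED), a returned string replaces
Error **errp, and the Example* names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ExampleMachine {
    bool created;      /* stand-in for phase_check(PHASE_MACHINE_CREATED) */
    bool vfio_user;
} ExampleMachine;

static bool example_get_vfio_user(const ExampleMachine *m)
{
    return m->vfio_user;
}

/* Returns NULL on success, or a static error message on failure. */
static const char *example_set_vfio_user(ExampleMachine *m, bool value)
{
    if (m->created) {
        /* Too late: the bus has already been configured. */
        return "Error enabling vfio-user - machine already created";
    }
    m->vfio_user = value;
    return NULL;
}

static void example_property_usage(void)
{
    ExampleMachine m = { false, false };
    const char *err;

    /* Before creation the property can be set. */
    err = example_set_vfio_user(&m, true);
    assert(err == NULL && example_get_vfio_user(&m));

    /* After creation the setter fails and the value is unchanged. */
    m.created = true;
    err = example_set_vfio_user(&m, false);
    assert(err != NULL && example_get_vfio_user(&m));
}
```

Refusing late changes keeps the property consistent with the bus setup
that was chosen at machine init time.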




* [PATCH v6 06/19] vfio-user: build library
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (4 preceding siblings ...)
  2022-02-17  7:48 ` [PATCH v6 05/19] remote/machine: add vfio-user property Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-17  7:48 ` [PATCH v6 07/19] vfio-user: define vfio-user-server object Jagannathan Raman
                   ` (12 subsequent siblings)
  18 siblings, 0 replies; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Add the libvfio-user library as a submodule. Build it as a cmake
subproject.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 configure                                  | 19 +++++++++-
 meson.build                                | 44 +++++++++++++++++++++-
 .gitlab-ci.d/buildtest.yml                 |  2 +
 .gitmodules                                |  3 ++
 Kconfig.host                               |  4 ++
 MAINTAINERS                                |  1 +
 hw/remote/Kconfig                          |  4 ++
 hw/remote/meson.build                      |  2 +
 meson_options.txt                          |  2 +
 subprojects/libvfio-user                   |  1 +
 tests/docker/dockerfiles/centos8.docker    |  2 +
 tests/docker/dockerfiles/ubuntu2004.docker |  2 +
 12 files changed, 84 insertions(+), 2 deletions(-)
 create mode 160000 subprojects/libvfio-user

diff --git a/configure b/configure
index 9a326eda1e..2acb2604c2 100755
--- a/configure
+++ b/configure
@@ -356,6 +356,7 @@ ninja=""
 gio="$default_feature"
 skip_meson=no
 slirp_smbd="$default_feature"
+vfio_user_server="disabled"
 
 # The following Meson options are handled manually (still they
 # are included in the automatically generated help message)
@@ -1172,6 +1173,10 @@ for opt do
   ;;
   --disable-blobs) meson_option_parse --disable-install-blobs ""
   ;;
+  --enable-vfio-user-server) vfio_user_server="enabled"
+  ;;
+  --disable-vfio-user-server) vfio_user_server="disabled"
+  ;;
   --enable-tcmalloc) meson_option_parse --enable-malloc=tcmalloc tcmalloc
   ;;
   --enable-jemalloc) meson_option_parse --enable-malloc=jemalloc jemalloc
@@ -1425,6 +1430,7 @@ cat << EOF
   rng-none        dummy RNG, avoid using /dev/(u)random and getrandom()
   gio             libgio support
   slirp-smbd      use smbd (at path --smbd=*) in slirp networking
+  vfio-user-server    vfio-user server support
 
 NOTE: The object files are built at the place where configure is launched
 EOF
@@ -3100,6 +3106,17 @@ but not implemented on your system"
     fi
 fi
 
+##########################################
+# check for vfio_user_server
+
+case "$vfio_user_server" in
+  auto | enabled )
+    if test "$git_submodules_action" != "ignore"; then
+      git_submodules="${git_submodules} subprojects/libvfio-user"
+    fi
+    ;;
+esac
+
 ##########################################
 # End of CC checks
 # After here, no more $cc or $ld runs
@@ -3790,7 +3807,7 @@ if test "$skip_meson" = no; then
         -Db_pie=$(if test "$pie" = yes; then echo true; else echo false; fi) \
         -Db_coverage=$(if test "$gcov" = yes; then echo true; else echo false; fi) \
         -Db_lto=$lto -Dcfi=$cfi -Dtcg=$tcg -Dxen=$xen \
-        -Dcapstone=$capstone -Dfdt=$fdt -Dslirp=$slirp \
+        -Dcapstone=$capstone -Dfdt=$fdt -Dslirp=$slirp -Dvfio_user_server=$vfio_user_server \
         $(test -n "${LIB_FUZZING_ENGINE+xxx}" && echo "-Dfuzzing_engine=$LIB_FUZZING_ENGINE") \
         $(if test "$default_feature" = no; then echo "-Dauto_features=disabled"; fi) \
         "$@" $cross_arg "$PWD" "$source_path"
diff --git a/meson.build b/meson.build
index ae5f7eec6e..5111b6fed8 100644
--- a/meson.build
+++ b/meson.build
@@ -278,6 +278,11 @@ if targetos != 'linux' and get_option('multiprocess').enabled()
 endif
 multiprocess_allowed = targetos == 'linux' and not get_option('multiprocess').disabled()
 
+if targetos != 'linux' and get_option('vfio_user_server').enabled()
+  error('vfio-user server is supported only on Linux')
+endif
+vfio_user_server_allowed = targetos == 'linux' and not get_option('vfio_user_server').disabled()
+
 # Target-specific libraries and flags
 libm = cc.find_library('m', required: false)
 threads = dependency('threads')
@@ -1876,7 +1881,8 @@ host_kconfig = \
   (have_virtfs ? ['CONFIG_VIRTFS=y'] : []) + \
   ('CONFIG_LINUX' in config_host ? ['CONFIG_LINUX=y'] : []) + \
   ('CONFIG_PVRDMA' in config_host ? ['CONFIG_PVRDMA=y'] : []) + \
-  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : [])
+  (multiprocess_allowed ? ['CONFIG_MULTIPROCESS_ALLOWED=y'] : []) + \
+  (vfio_user_server_allowed ? ['CONFIG_VFIO_USER_SERVER_ALLOWED=y'] : [])
 
 ignored = [ 'TARGET_XML_FILES', 'TARGET_ABI_DIR', 'TARGET_ARCH' ]
 
@@ -2265,6 +2271,41 @@ if get_option('cfi') and slirp_opt == 'system'
          + ' Please configure with --enable-slirp=git')
 endif
 
+vfiouser = not_found
+if have_system and vfio_user_server_allowed
+  have_internal = fs.exists(meson.current_source_dir() / 'subprojects/libvfio-user/Makefile')
+
+  if not have_internal
+    error('libvfio-user source not found - please pull git submodule')
+  endif
+
+  json_c = dependency('json-c', required: false)
+  if not json_c.found()
+    json_c = dependency('libjson-c', required: false)
+  endif
+  if not json_c.found()
+    json_c = dependency('libjson-c-dev', required: false)
+  endif
+
+  if not json_c.found()
+    error('Unable to find json-c package')
+  endif
+
+  cmake = import('cmake')
+
+  vfiouser_subproj = cmake.subproject('libvfio-user')
+
+  vfiouser_sl = vfiouser_subproj.dependency('vfio-user-static')
+
+  # Although cmake links the json-c library with vfio-user-static
+  # target, that info is not available to meson via cmake.subproject.
+  # As such, we have to separately declare the json-c dependency here.
+  # This appears to be a current limitation of using cmake inside meson.
+  # libvfio-user is planning a switch to meson in the future, which
+  # would address this item automatically.
+  vfiouser = declare_dependency(dependencies: [vfiouser_sl, json_c])
+endif
+
 fdt = not_found
 fdt_opt = get_option('fdt')
 if have_system
@@ -3366,6 +3407,7 @@ summary_info += {'target list':       ' '.join(target_dirs)}
 if have_system
   summary_info += {'default devices':   get_option('default_devices')}
   summary_info += {'out of process emulation': multiprocess_allowed}
+  summary_info += {'vfio-user server': vfio_user_server_allowed}
 endif
 summary(summary_info, bool_yn: true, section: 'Targets and accelerators')
 
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 0aa70213fb..e52391ec5c 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -42,6 +42,7 @@ build-system-ubuntu:
   variables:
     IMAGE: ubuntu2004
     CONFIGURE_ARGS: --enable-docs --enable-fdt=system --enable-slirp=system
+                    --enable-vfio-user-server
     TARGETS: aarch64-softmmu alpha-softmmu cris-softmmu hppa-softmmu
       microblazeel-softmmu mips64el-softmmu
     MAKE_CHECK_ARGS: check-build
@@ -165,6 +166,7 @@ build-system-centos:
     IMAGE: centos8
     CONFIGURE_ARGS: --disable-nettle --enable-gcrypt --enable-fdt=system
       --enable-modules --enable-trace-backends=dtrace --enable-docs
+      --enable-vfio-user-server
     TARGETS: ppc64-softmmu or1k-softmmu s390x-softmmu
       x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
     MAKE_CHECK_ARGS: check-build
diff --git a/.gitmodules b/.gitmodules
index f4b6a9b401..d66af96dc9 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -67,3 +67,6 @@
 [submodule "tests/lcitool/libvirt-ci"]
 	path = tests/lcitool/libvirt-ci
 	url = https://gitlab.com/libvirt/libvirt-ci.git
+[submodule "subprojects/libvfio-user"]
+	path = subprojects/libvfio-user
+	url = https://github.com/nutanix/libvfio-user.git
diff --git a/Kconfig.host b/Kconfig.host
index 60b9c07b5e..f2da8bcf8a 100644
--- a/Kconfig.host
+++ b/Kconfig.host
@@ -45,3 +45,7 @@ config MULTIPROCESS_ALLOWED
 config FUZZ
     bool
     select SPARSE_MEM
+
+config VFIO_USER_SERVER_ALLOWED
+    bool
+    imply VFIO_USER_SERVER
diff --git a/MAINTAINERS b/MAINTAINERS
index 81aa31b5e1..9af3e96d63 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3567,6 +3567,7 @@ F: hw/remote/proxy-memory-listener.c
 F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
+F: subprojects/libvfio-user
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
index 08c16e235f..2d6b4f4cf4 100644
--- a/hw/remote/Kconfig
+++ b/hw/remote/Kconfig
@@ -2,3 +2,7 @@ config MULTIPROCESS
     bool
     depends on PCI && PCI_EXPRESS && KVM
     select REMOTE_PCIHOST
+
+config VFIO_USER_SERVER
+    bool
+    depends on MULTIPROCESS
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index e6a5574242..dfea6b533b 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -7,6 +7,8 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
 
+remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)
+
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
 
diff --git a/meson_options.txt b/meson_options.txt
index 95d527f773..0713ef508c 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -68,6 +68,8 @@ option('multiprocess', type: 'feature', value: 'auto',
        description: 'Out of process device emulation support')
 option('dbus_display', type: 'feature', value: 'auto',
        description: '-display dbus support')
+option('vfio_user_server', type: 'feature', value: 'auto',
+       description: 'vfio-user server support')
 
 option('attr', type : 'feature', value : 'auto',
        description: 'attr/xattr support')
diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
new file mode 160000
index 0000000000..7056525da5
--- /dev/null
+++ b/subprojects/libvfio-user
@@ -0,0 +1 @@
+Subproject commit 7056525da5399d00831e90bed4aedb4b8442c9b2
diff --git a/tests/docker/dockerfiles/centos8.docker b/tests/docker/dockerfiles/centos8.docker
index 3ede55d09b..b6b4aa9626 100644
--- a/tests/docker/dockerfiles/centos8.docker
+++ b/tests/docker/dockerfiles/centos8.docker
@@ -23,6 +23,7 @@ RUN dnf update -y && \
         capstone-devel \
         ccache \
         clang \
+        cmake \
         ctags \
         cyrus-sasl-devel \
         daxctl-devel \
@@ -45,6 +46,7 @@ RUN dnf update -y && \
         gtk3-devel \
         hostname \
         jemalloc-devel \
+        json-c-devel \
         libaio-devel \
         libasan \
         libattr-devel \
diff --git a/tests/docker/dockerfiles/ubuntu2004.docker b/tests/docker/dockerfiles/ubuntu2004.docker
index 87513125b8..22468d01e7 100644
--- a/tests/docker/dockerfiles/ubuntu2004.docker
+++ b/tests/docker/dockerfiles/ubuntu2004.docker
@@ -18,6 +18,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
             ca-certificates \
             ccache \
             clang \
+            cmake \
             dbus \
             debianutils \
             diffutils \
@@ -58,6 +59,7 @@ RUN export DEBIAN_FRONTEND=noninteractive && \
             libiscsi-dev \
             libjemalloc-dev \
             libjpeg-turbo8-dev \
+            libjson-c-dev \
             liblttng-ust-dev \
             liblzo2-dev \
             libncursesw5-dev \
-- 
2.20.1




* [PATCH v6 07/19] vfio-user: define vfio-user-server object
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (5 preceding siblings ...)
  2022-02-17  7:48 ` [PATCH v6 06/19] vfio-user: build library Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-21 15:37   ` Stefan Hajnoczi
  2022-02-25 15:42   ` Eric Blake
  2022-02-17  7:48 ` [PATCH v6 08/19] vfio-user: instantiate vfio-user context Jagannathan Raman
                   ` (11 subsequent siblings)
  18 siblings, 2 replies; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Define the vfio-user object, which is the remote process server for QEMU.
Set up the object initialization functions and properties necessary to
instantiate the object.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/qom.json             |  20 +++-
 hw/remote/vfio-user-obj.c | 194 ++++++++++++++++++++++++++++++++++++++
 MAINTAINERS               |   1 +
 hw/remote/meson.build     |   1 +
 hw/remote/trace-events    |   3 +
 5 files changed, 217 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c

diff --git a/qapi/qom.json b/qapi/qom.json
index eeb5395ff3..ff266e4732 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -703,6 +703,20 @@
 { 'struct': 'RemoteObjectProperties',
   'data': { 'fd': 'str', 'devid': 'str' } }
 
+##
+# @VfioUserServerProperties:
+#
+# Properties for x-vfio-user-server objects.
+#
+# @socket: socket to be used by the libvfiouser library
+#
+# @device: the id of the device to be emulated at the server
+#
+# Since: 6.3
+##
+{ 'struct': 'VfioUserServerProperties',
+  'data': { 'socket': 'SocketAddress', 'device': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -842,7 +856,8 @@
     'tls-creds-psk',
     'tls-creds-x509',
     'tls-cipher-suites',
-    { 'name': 'x-remote-object', 'features': [ 'unstable' ] }
+    { 'name': 'x-remote-object', 'features': [ 'unstable' ] },
+    { 'name': 'x-vfio-user-server', 'features': [ 'unstable' ] }
   ] }
 
 ##
@@ -905,7 +920,8 @@
       'tls-creds-psk':              'TlsCredsPskProperties',
       'tls-creds-x509':             'TlsCredsX509Properties',
       'tls-cipher-suites':          'TlsCredsProperties',
-      'x-remote-object':            'RemoteObjectProperties'
+      'x-remote-object':            'RemoteObjectProperties',
+      'x-vfio-user-server':         'VfioUserServerProperties'
   } }
 
 ##
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
new file mode 100644
index 0000000000..84cd16c4ad
--- /dev/null
+++ b/hw/remote/vfio-user-obj.c
@@ -0,0 +1,194 @@
+/**
+ * QEMU vfio-user-server server object
+ *
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/**
+ * Usage: add options:
+ *     -machine x-remote,vfio-user=on
+ *     -device <PCI-device>,id=<pci-dev-id>
+ *     -object x-vfio-user-server,id=<id>,type=unix,path=<socket-path>,
+ *             device=<pci-dev-id>
+ *
+ * Note that x-vfio-user-server object must be used with x-remote machine only.
+ * This server could only support PCI devices for now.
+ *
+ * type - SocketAddress type - presently "unix" alone is supported. Required
+ *        option
+ *
+ * path - named unix socket, it will be created by the server. It is
+ *        a required option
+ *
+ * device - id of a device on the server, a required option. PCI devices
+ *          alone are supported presently.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "sysemu/runstate.h"
+#include "hw/boards.h"
+#include "hw/remote/machine.h"
+#include "qapi/error.h"
+#include "qapi/qapi-visit-sockets.h"
+
+#define TYPE_VFU_OBJECT "x-vfio-user-server"
+OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
+
+/**
+ * VFU_OBJECT_ERROR - reports an error message. If auto_shutdown
+ * is set, it aborts the machine on error. Otherwise, it logs an
+ * error message without aborting.
+ */
+#define VFU_OBJECT_ERROR(o, fmt, ...)                         \
+    {                                                         \
+        VfuObjectClass *oc = VFU_OBJECT_GET_CLASS(OBJECT(o)); \
+                                                              \
+        if (oc->auto_shutdown) {                              \
+            error_setg(&error_abort, (fmt), ## __VA_ARGS__);  \
+        } else {                                              \
+            error_report((fmt), ## __VA_ARGS__);              \
+        }                                                     \
+    }                                                         \
+
+struct VfuObjectClass {
+    ObjectClass parent_class;
+
+    unsigned int nr_devs;
+
+    /*
+     * Can be set to shutdown automatically when all server object
+     * instances are destroyed
+     */
+    bool auto_shutdown;
+};
+
+struct VfuObject {
+    /* private */
+    Object parent;
+
+    SocketAddress *socket;
+
+    char *device;
+
+    Error *err;
+};
+
+static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
+                                  void *opaque, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    qapi_free_SocketAddress(o->socket);
+
+    o->socket = NULL;
+
+    visit_type_SocketAddress(v, name, &o->socket, errp);
+
+    if (o->socket->type != SOCKET_ADDRESS_TYPE_UNIX) {
+        error_setg(errp, "vfu: Unsupported socket type - %s",
+                   SocketAddressType_str(o->socket->type));
+        qapi_free_SocketAddress(o->socket);
+        o->socket = NULL;
+        return;
+    }
+
+    trace_vfu_prop("socket", o->socket->u.q_unix.path);
+}
+
+static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    g_free(o->device);
+
+    o->device = g_strdup(str);
+
+    trace_vfu_prop("device", str);
+}
+
+static void vfu_object_init(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
+
+    k->nr_devs++;
+
+    if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
+        error_setg(&o->err, "vfu: %s only compatible with %s machine",
+                   TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
+        return;
+    }
+}
+
+static void vfu_object_finalize(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
+
+    k->nr_devs--;
+
+    qapi_free_SocketAddress(o->socket);
+
+    o->socket = NULL;
+
+    g_free(o->device);
+
+    o->device = NULL;
+
+    if (!k->nr_devs && k->auto_shutdown) {
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    }
+}
+
+static void vfu_object_class_init(ObjectClass *klass, void *data)
+{
+    VfuObjectClass *k = VFU_OBJECT_CLASS(klass);
+
+    k->nr_devs = 0;
+
+    k->auto_shutdown = true;
+
+    object_class_property_add(klass, "socket", "SocketAddress", NULL,
+                              vfu_object_set_socket, NULL, NULL);
+    object_class_property_set_description(klass, "socket",
+                                          "SocketAddress "
+                                          "(ex: type=unix,path=/tmp/sock). "
+                                          "Only UNIX is presently supported");
+    object_class_property_add_str(klass, "device", NULL,
+                                  vfu_object_set_device);
+    object_class_property_set_description(klass, "device",
+                                          "device ID - only PCI devices "
+                                          "are presently supported");
+}
+
+static const TypeInfo vfu_object_info = {
+    .name = TYPE_VFU_OBJECT,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(VfuObject),
+    .instance_init = vfu_object_init,
+    .instance_finalize = vfu_object_finalize,
+    .class_size = sizeof(VfuObjectClass),
+    .class_init = vfu_object_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void vfu_register_types(void)
+{
+    type_register_static(&vfu_object_info);
+}
+
+type_init(vfu_register_types);
diff --git a/MAINTAINERS b/MAINTAINERS
index 9af3e96d63..751d97852d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3568,6 +3568,7 @@ F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
+F: hw/remote/vfio-user-obj.c
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index dfea6b533b..534ac5df79 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: files('vfio-user-obj.c'))
 
 remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)
 
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 0b23974f90..7da12f0d96 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -2,3 +2,6 @@
 
 mpqemu_send_io_error(int cmd, int size, int nfds) "send command %d size %d, %d file descriptors to remote process"
 mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d, %d file descriptors to remote process"
+
+# vfio-user-obj.c
+vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
-- 
2.20.1
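
Reader's note on 07/19: the class above keeps a shared instance count
(nr_devs) and, when auto_shutdown is set, requests a shutdown once the last
instance is finalized. The following is a minimal standalone sketch of that
counting pattern, not the QEMU code itself. The names server_class,
server_init, server_finalize and the shutdown_requested flag are
hypothetical stand-ins (the flag replaces the call to
qemu_system_shutdown_request()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for VfuObjectClass: state shared by all instances. */
struct server_class {
    unsigned int nr_devs;     /* number of live server instances */
    bool auto_shutdown;       /* shut down when the last instance goes away */
    bool shutdown_requested;  /* stand-in for qemu_system_shutdown_request() */
};

/* Called once per instance at creation time. */
static void server_init(struct server_class *k)
{
    k->nr_devs++;
}

/* Called once per instance at destruction time. */
static void server_finalize(struct server_class *k)
{
    assert(k->nr_devs > 0);
    k->nr_devs--;

    /* Only the last instance triggers the shutdown request. */
    if (k->nr_devs == 0 && k->auto_shutdown) {
        k->shutdown_requested = true;
    }
}
```

With two instances created, finalizing the first leaves the flag clear and
finalizing the second sets it.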




* [PATCH v6 08/19] vfio-user: instantiate vfio-user context
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (6 preceding siblings ...)
  2022-02-17  7:48 ` [PATCH v6 07/19] vfio-user: define vfio-user-server object Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-21 15:42   ` Stefan Hajnoczi
  2022-02-17  7:48 ` [PATCH v6 09/19] vfio-user: find and init PCI device Jagannathan Raman
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Create a context with the vfio-user library to run a PCI device.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 80 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 80 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 84cd16c4ad..496e6c8038 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -41,6 +41,9 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/notify.h"
+#include "sysemu/sysemu.h"
+#include "libvfio-user.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -82,13 +85,24 @@ struct VfuObject {
     char *device;
 
     Error *err;
+
+    Notifier machine_done;
+
+    vfu_ctx_t *vfu_ctx;
 };
 
+static void vfu_object_init_ctx(VfuObject *o, Error **errp);
+
 static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
                                   void *opaque, Error **errp)
 {
     VfuObject *o = VFU_OBJECT(obj);
 
+    if (o->vfu_ctx) {
+        error_setg(errp, "vfu: Unable to set socket property - server busy");
+        return;
+    }
+
     qapi_free_SocketAddress(o->socket);
 
     o->socket = NULL;
@@ -104,17 +118,69 @@ static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
     }
 
     trace_vfu_prop("socket", o->socket->u.q_unix.path);
+
+    vfu_object_init_ctx(o, errp);
 }
 
 static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
 {
     VfuObject *o = VFU_OBJECT(obj);
 
+    if (o->vfu_ctx) {
+        error_setg(errp, "vfu: Unable to set device property - server busy");
+        return;
+    }
+
     g_free(o->device);
 
     o->device = g_strdup(str);
 
     trace_vfu_prop("device", str);
+
+    vfu_object_init_ctx(o, errp);
+}
+
+/*
+ * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
+ * properties. It also depends on devices instantiated in QEMU. These
+ * dependencies are not available during the instance_init phase of this
+ * object's life-cycle. As such, the server is initialized after the
+ * machine is setup. machine_init_done_notifier notifies TYPE_VFU_OBJECT
+ * when the machine is setup, and the dependencies are available.
+ */
+static void vfu_object_machine_done(Notifier *notifier, void *data)
+{
+    VfuObject *o = container_of(notifier, VfuObject, machine_done);
+    Error *err = NULL;
+
+    vfu_object_init_ctx(o, &err);
+
+    if (err) {
+        error_propagate(&error_abort, err);
+    }
+}
+
+static void vfu_object_init_ctx(VfuObject *o, Error **errp)
+{
+    ERRP_GUARD();
+
+    if (o->vfu_ctx || !o->socket || !o->device ||
+            !phase_check(PHASE_MACHINE_READY)) {
+        return;
+    }
+
+    if (o->err) {
+        error_propagate(errp, o->err);
+        o->err = NULL;
+        return;
+    }
+
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+                                o, VFU_DEV_TYPE_PCI);
+    if (o->vfu_ctx == NULL) {
+        error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
+        return;
+    }
 }
 
 static void vfu_object_init(Object *obj)
@@ -124,6 +190,11 @@ static void vfu_object_init(Object *obj)
 
     k->nr_devs++;
 
+    if (!phase_check(PHASE_MACHINE_READY)) {
+        o->machine_done.notify = vfu_object_machine_done;
+        qemu_add_machine_init_done_notifier(&o->machine_done);
+    }
+
     if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
         error_setg(&o->err, "vfu: %s only compatible with %s machine",
                    TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
@@ -142,6 +213,10 @@ static void vfu_object_finalize(Object *obj)
 
     o->socket = NULL;
 
+    if (o->vfu_ctx) {
+        vfu_destroy_ctx(o->vfu_ctx);
+    }
+
     g_free(o->device);
 
     o->device = NULL;
@@ -149,6 +224,11 @@ static void vfu_object_finalize(Object *obj)
     if (!k->nr_devs && k->auto_shutdown) {
         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     }
+
+    if (o->machine_done.notify) {
+        qemu_remove_machine_init_done_notifier(&o->machine_done);
+        o->machine_done.notify = NULL;
+    }
 }
 
 static void vfu_object_class_init(ObjectClass *klass, void *data)
-- 
2.20.1
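
Reader's note on 08/19: the patch defers context creation until the
'socket' and 'device' properties are set and the machine is ready, by
calling one init function from both property setters and from the
machine-done notifier. The following is a minimal standalone sketch of that
"try init from every entry point" pattern, assuming plain flags in place of
the QEMU and libvfio-user calls. All names in it (struct server,
server_try_init, server_set_socket, server_set_device, server_machine_done)
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VfuObject; field names are illustrative. */
struct server {
    const char *socket_path;
    const char *device_id;
    bool machine_ready;  /* stand-in for phase_check(PHASE_MACHINE_READY) */
    bool ctx_created;    /* stand-in for o->vfu_ctx != NULL */
};

/* Create the context only once every dependency is available. */
static void server_try_init(struct server *s)
{
    if (s->ctx_created || !s->socket_path || !s->device_id ||
        !s->machine_ready) {
        return;
    }
    s->ctx_created = true;
}

/* Setters refuse changes once the context exists ("server busy"). */
static int server_set_socket(struct server *s, const char *path)
{
    if (s->ctx_created) {
        return -1;
    }
    s->socket_path = path;
    server_try_init(s);
    return 0;
}

static int server_set_device(struct server *s, const char *id)
{
    if (s->ctx_created) {
        return -1;
    }
    s->device_id = id;
    server_try_init(s);
    return 0;
}

/* Stand-in for the machine-init-done notifier callback. */
static void server_machine_done(struct server *s)
{
    s->machine_ready = true;
    server_try_init(s);
}
```

Whichever of the three events arrives last is the one that creates the
context, so the order of property setting and machine setup does not
matter.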




* [PATCH v6 09/19] vfio-user: find and init PCI device
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (7 preceding siblings ...)
  2022-02-17  7:48 ` [PATCH v6 08/19] vfio-user: instantiate vfio-user context Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-21 15:57   ` Stefan Hajnoczi
  2022-02-17  7:48 ` [PATCH v6 10/19] vfio-user: run vfio-user context Jagannathan Raman
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Find the PCI device with the specified id. Initialize the device context
with the QEMU PCI device.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/vfio-user-obj.c | 59 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 59 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 496e6c8038..9c76913545 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -44,6 +44,8 @@
 #include "qemu/notify.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
+#include "hw/qdev-core.h"
+#include "hw/pci/pci.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -89,6 +91,10 @@ struct VfuObject {
     Notifier machine_done;
 
     vfu_ctx_t *vfu_ctx;
+
+    PCIDevice *pci_dev;
+
+    Error *unplug_blocker;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -163,6 +169,9 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
 static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 {
     ERRP_GUARD();
+    DeviceState *dev = NULL;
+    vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
+    int ret;
 
     if (o->vfu_ctx || !o->socket || !o->device ||
             !phase_check(PHASE_MACHINE_READY)) {
@@ -181,6 +190,48 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
         return;
     }
+
+    dev = qdev_find_recursive(sysbus_get_default(), o->device);
+    if (dev == NULL) {
+        error_setg(errp, "vfu: Device %s not found", o->device);
+        goto fail;
+    }
+
+    if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        error_setg(errp, "vfu: %s not a PCI device", o->device);
+        goto fail;
+    }
+
+    o->pci_dev = PCI_DEVICE(dev);
+
+    if (pci_is_express(o->pci_dev)) {
+        pci_type = VFU_PCI_TYPE_EXPRESS;
+    }
+
+    ret = vfu_pci_init(o->vfu_ctx, pci_type, PCI_HEADER_TYPE_NORMAL, 0);
+    if (ret < 0) {
+        error_setg(errp,
+                   "vfu: Failed to attach PCI device %s to context - %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
+    error_setg(&o->unplug_blocker,
+               "vfu: %s for %s must be deleted before unplugging",
+               TYPE_VFU_OBJECT, o->device);
+    qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+
+    return;
+
+fail:
+    vfu_destroy_ctx(o->vfu_ctx);
+    if (o->unplug_blocker && o->pci_dev) {
+        qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+        error_free(o->unplug_blocker);
+        o->unplug_blocker = NULL;
+    }
+    o->vfu_ctx = NULL;
+    o->pci_dev = NULL;
 }
 
 static void vfu_object_init(Object *obj)
@@ -221,6 +272,14 @@ static void vfu_object_finalize(Object *obj)
 
     o->device = NULL;
 
+    if (o->unplug_blocker && o->pci_dev) {
+        qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
+        error_free(o->unplug_blocker);
+        o->unplug_blocker = NULL;
+    }
+
+    o->pci_dev = NULL;
+
     if (!k->nr_devs && k->auto_shutdown) {
         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     }
-- 
2.20.1
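
Reader's note on 09/19: vfu_object_init_ctx() uses a single "fail" label to
undo partial initialization when the device lookup or PCI setup fails. The
following is a minimal standalone sketch of that cleanup pattern, assuming
a malloc'ed placeholder in place of the vfio-user context and a
hypothetical find_device() lookup that only knows the id "lsi1". None of
these names come from QEMU or libvfio-user.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct server {
    void *ctx;        /* stand-in for vfu_ctx_t * */
    const char *dev;  /* stand-in for PCIDevice * */
};

/* Hypothetical device lookup: only "lsi1" exists in this sketch. */
static const char *find_device(const char *id)
{
    return strcmp(id, "lsi1") == 0 ? "lsi1" : NULL;
}

/* Returns 0 on success, -1 on failure with all state rolled back. */
static int server_init(struct server *s, const char *id)
{
    s->ctx = malloc(1);
    if (s->ctx == NULL) {
        return -1;
    }

    s->dev = find_device(id);
    if (s->dev == NULL) {
        goto fail;
    }

    return 0;

fail:
    /* Release what was acquired and leave the object in its initial state. */
    free(s->ctx);
    s->ctx = NULL;
    s->dev = NULL;
    return -1;
}
```

After a failed call both fields are NULL again, so a later retry starts
from a clean state.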




* [PATCH v6 10/19] vfio-user: run vfio-user context
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (8 preceding siblings ...)
  2022-02-17  7:48 ` [PATCH v6 09/19] vfio-user: find and init PCI device Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-22 10:13   ` Stefan Hajnoczi
  2022-02-25 16:06   ` Eric Blake
  2022-02-17  7:48 ` [PATCH v6 11/19] vfio-user: handle PCI config space accesses Jagannathan Raman
                   ` (8 subsequent siblings)
  18 siblings, 2 replies; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Set up a handler to run the vfio-user context. The context is driven by
messages to the file descriptor associated with it. Get the fd for
the context and hook up the handler with it.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 qapi/misc.json            | 23 ++++++++++
 hw/remote/vfio-user-obj.c | 96 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 118 insertions(+), 1 deletion(-)

diff --git a/qapi/misc.json b/qapi/misc.json
index e8054f415b..9d7f12ab04 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -527,3 +527,26 @@
  'data': { '*option': 'str' },
  'returns': ['CommandLineOptionInfo'],
  'allow-preconfig': true }
+
+##
+# @VFU_CLIENT_HANGUP:
+#
+# Emitted when the client of a TYPE_VFIO_USER_SERVER closes the
+# communication channel
+#
+# @id: ID of the TYPE_VFIO_USER_SERVER object
+#
+# @device: ID of attached PCI device
+#
+# Since: 6.3
+#
+# Example:
+#
+# <- { "event": "VFU_CLIENT_HANGUP",
+#      "data": { "id": "vfu1",
+#                "device": "lsi1" },
+#      "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
+#
+##
+{ 'event': 'VFU_CLIENT_HANGUP',
+  'data': { 'id': 'str', 'device': 'str' } }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 9c76913545..384ec4612d 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -27,6 +27,9 @@
  *
  * device - id of a device on the server, a required option. PCI devices
  *          alone are supported presently.
+ *
+ * notes - x-vfio-user-server could block IO and monitor during the
+ *         initialization phase.
  */
 
 #include "qemu/osdep.h"
@@ -41,11 +44,14 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qapi/qapi-events-misc.h"
 #include "qemu/notify.h"
+#include "qemu/thread.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
+#include "qemu/timer.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -95,6 +101,8 @@ struct VfuObject {
     PCIDevice *pci_dev;
 
     Error *unplug_blocker;
+
+    int vfu_poll_fd;
 };
 
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
@@ -146,6 +154,69 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
     vfu_object_init_ctx(o, errp);
 }
 
+static void vfu_object_ctx_run(void *opaque)
+{
+    VfuObject *o = opaque;
+    const char *id = NULL;
+    int ret = -1;
+
+    while (ret != 0) {
+        ret = vfu_run_ctx(o->vfu_ctx);
+        if (ret < 0) {
+            if (errno == EINTR) {
+                continue;
+            } else if (errno == ENOTCONN) {
+                id = object_get_canonical_path_component(OBJECT(o));
+                qapi_event_send_vfu_client_hangup(id, o->device);
+                qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+                o->vfu_poll_fd = -1;
+                object_unparent(OBJECT(o));
+                break;
+            } else {
+                VFU_OBJECT_ERROR(o, "vfu: Failed to run device %s - %s",
+                                 o->device, strerror(errno));
+                break;
+            }
+        }
+    }
+}
+
+static void vfu_object_attach_ctx(void *opaque)
+{
+    VfuObject *o = opaque;
+    GPollFD pfds[1];
+    int ret;
+
+    qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+
+    pfds[0].fd = o->vfu_poll_fd;
+    pfds[0].events = G_IO_IN | G_IO_HUP | G_IO_ERR;
+
+retry_attach:
+    ret = vfu_attach_ctx(o->vfu_ctx);
+    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
+        /**
+         * vfu_object_attach_ctx can block QEMU's main loop
+         * during attach - the monitor and other IO
+         * could be unresponsive during this time.
+         */
+        (void)qemu_poll_ns(pfds, 1, 500 * (int64_t)SCALE_MS);
+        goto retry_attach;
+    } else if (ret < 0) {
+        VFU_OBJECT_ERROR(o, "vfu: Failed to attach device %s to context - %s",
+                         o->device, strerror(errno));
+        return;
+    }
+
+    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+    if (o->vfu_poll_fd < 0) {
+        VFU_OBJECT_ERROR(o, "vfu: Failed to get poll fd %s", o->device);
+        return;
+    }
+
+    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -184,7 +255,8 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         return;
     }
 
-    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path,
+                                LIBVFIO_USER_FLAG_ATTACH_NB,
                                 o, VFU_DEV_TYPE_PCI);
     if (o->vfu_ctx == NULL) {
         error_setg(errp, "vfu: Failed to create context - %s", strerror(errno));
@@ -221,6 +293,21 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
                TYPE_VFU_OBJECT, o->device);
     qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
 
+    ret = vfu_realize_ctx(o->vfu_ctx);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to realize device %s- %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
+    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+    if (o->vfu_poll_fd < 0) {
+        error_setg(errp, "vfu: Failed to get poll fd %s", o->device);
+        goto fail;
+    }
+
+    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_attach_ctx, NULL, o);
+
     return;
 
 fail:
@@ -251,6 +338,8 @@ static void vfu_object_init(Object *obj)
                    TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
         return;
     }
+
+    o->vfu_poll_fd = -1;
 }
 
 static void vfu_object_finalize(Object *obj)
@@ -264,6 +353,11 @@ static void vfu_object_finalize(Object *obj)
 
     o->socket = NULL;
 
+    if (o->vfu_poll_fd != -1) {
+        qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+        o->vfu_poll_fd = -1;
+    }
+
     if (o->vfu_ctx) {
         vfu_destroy_ctx(o->vfu_ctx);
     }
-- 
2.20.1
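
Reader's note on 10/19: vfu_object_ctx_run() keeps calling the library
until it returns 0, retries on EINTR, treats ENOTCONN as a client hangup,
and reports any other error. The following is a minimal standalone sketch
of that errno dispatch, assuming a caller-supplied function pointer in
place of vfu_run_ctx(). The names run_result, run_until_idle, struct script
and scripted_run are hypothetical, and scripted_run exists only to replay
canned return values for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum run_result { RUN_DONE, RUN_HANGUP, RUN_ERROR };

/* Stand-in for vfu_run_ctx(): returns < 0 with errno set on failure. */
typedef int (*run_fn)(void *opaque);

static enum run_result run_until_idle(run_fn run, void *opaque)
{
    for (;;) {
        int ret = run(opaque);

        if (ret == 0) {
            return RUN_DONE;        /* nothing left to process */
        }
        if (ret < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted, try again */
            }
            if (errno == ENOTCONN) {
                return RUN_HANGUP;  /* client closed the channel */
            }
            return RUN_ERROR;       /* any other failure */
        }
        /* ret > 0: some work was done, run again */
    }
}

/* Hypothetical scripted backend that replays fixed results. */
struct script {
    const int *rets;
    const int *errs;
    size_t pos;
};

static int scripted_run(void *opaque)
{
    struct script *s = opaque;
    int ret = s->rets[s->pos];

    errno = s->errs[s->pos];
    s->pos++;
    return ret;
}
```

In the patch, the hangup branch additionally emits VFU_CLIENT_HANGUP,
removes the fd handler and unparents the object. The sketch only shows how
the three outcomes are told apart.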




* [PATCH v6 11/19] vfio-user: handle PCI config space accesses
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (9 preceding siblings ...)
  2022-02-17  7:48 ` [PATCH v6 10/19] vfio-user: run vfio-user context Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-22 11:09   ` Stefan Hajnoczi
  2022-02-17  7:48 ` [PATCH v6 12/19] vfio-user: IOMMU support for remote device Jagannathan Raman
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Define and register handlers for PCI config space accesses.
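
Reader's note: the handler in this patch splits an access of arbitrary
length into pieces of at most four bytes and forwards each piece to the PCI
config accessors. The following is a minimal standalone sketch of that
chunking loop, assuming a plain byte array as the config space. The names
cfg_space, cfg_write_chunk, cfg_read_chunk and cfg_access are hypothetical,
and the bounds check at the top is an addition for the sketch, not
something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CFG_SIZE     256
#define ACCESS_WIDTH sizeof(uint32_t)

/* Hypothetical backing store standing in for the device's config space. */
static uint8_t cfg_space[CFG_SIZE];

/* Stand-ins for the per-access PCI config write and read helpers. */
static void cfg_write_chunk(size_t offset, uint32_t val, size_t len)
{
    memcpy(&cfg_space[offset], &val, len);
}

static uint32_t cfg_read_chunk(size_t offset, size_t len)
{
    uint32_t val = 0;

    memcpy(&val, &cfg_space[offset], len);
    return val;
}

/* Returns the number of bytes transferred, or -1 if out of range. */
static long cfg_access(char *buf, size_t count, size_t offset, int is_write)
{
    size_t bytes = count;
    char *ptr = buf;

    if (offset > CFG_SIZE || count > CFG_SIZE - offset) {
        return -1;
    }

    while (bytes > 0) {
        /* Each piece is at most ACCESS_WIDTH bytes long. */
        size_t len = bytes > ACCESS_WIDTH ? ACCESS_WIDTH : bytes;
        uint32_t val = 0;

        if (is_write) {
            memcpy(&val, ptr, len);
            cfg_write_chunk(offset, val, len);
        } else {
            val = cfg_read_chunk(offset, len);
            memcpy(ptr, &val, len);
        }
        offset += len;
        ptr += len;
        bytes -= len;
    }

    return (long)count;
}
```

A 7-byte write is carried out as one 4-byte piece followed by one 3-byte
piece, and reading the same range back returns the same bytes.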

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/vfio-user-obj.c | 45 +++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 2 files changed, 47 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 384ec4612d..4c4280d603 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -47,6 +47,7 @@
 #include "qapi/qapi-events-misc.h"
 #include "qemu/notify.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
@@ -217,6 +218,39 @@ retry_attach:
     qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
 }
 
+static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
+                                     size_t count, loff_t offset,
+                                     const bool is_write)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint32_t pci_access_width = sizeof(uint32_t);
+    size_t bytes = count;
+    uint32_t val = 0;
+    char *ptr = buf;
+    int len;
+
+    while (bytes > 0) {
+        len = (bytes > pci_access_width) ? pci_access_width : bytes;
+        if (is_write) {
+            memcpy(&val, ptr, len);
+            pci_host_config_write_common(o->pci_dev, offset,
+                                         pci_config_size(o->pci_dev),
+                                         val, len);
+            trace_vfu_cfg_write(offset, val);
+        } else {
+            val = pci_host_config_read_common(o->pci_dev, offset,
+                                              pci_config_size(o->pci_dev), len);
+            memcpy(ptr, &val, len);
+            trace_vfu_cfg_read(offset, val);
+        }
+        offset += len;
+        ptr += len;
+        bytes -= len;
+    }
+
+    return count;
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -293,6 +327,17 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
                TYPE_VFU_OBJECT, o->device);
     qdev_add_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
 
+    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_CFG_REGION_IDX,
+                           pci_config_size(o->pci_dev), &vfu_object_cfg_access,
+                           VFU_REGION_FLAG_RW | VFU_REGION_FLAG_ALWAYS_CB,
+                           NULL, 0, -1, 0);
+    if (ret < 0) {
+        error_setg(errp,
+                   "vfu: Failed to setup config space handlers for %s- %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 7da12f0d96..2ef7884346 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,3 +5,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
-- 
2.20.1




* [PATCH v6 12/19] vfio-user: IOMMU support for remote device
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (10 preceding siblings ...)
  2022-02-17  7:48 ` [PATCH v6 11/19] vfio-user: handle PCI config space accesses Jagannathan Raman
@ 2022-02-17  7:48 ` Jagannathan Raman
  2022-02-22 10:40   ` Stefan Hajnoczi
  2022-02-17  7:49 ` [PATCH v6 13/19] vfio-user: handle DMA mappings Jagannathan Raman
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Assign a separate address space to each device in a remote process.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
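Note for readers: the lookup in this patch is a find-or-create keyed by
the device's bus/devfn pair. The stand-alone sketch below shows that
pattern with a small fixed table instead of a GHashTable. It is not
part of the patch: as_elem, as_table, build_bdf(), find_add_as() and
del_device() are hypothetical names, and the fixed capacity is an
assumption made to keep the sketch free of dependencies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_ELEMS 16   /* hypothetical fixed capacity for the sketch */

/* Stand-in for RemoteIommuElem: one private address space per device. */
struct as_elem {
    int      used;
    uint16_t bdf;
    char     name[32];
};

static struct as_elem as_table[MAX_ELEMS];

/* Bus number in the high byte, devfn in the low byte. */
static uint16_t build_bdf(uint8_t bus, uint8_t devfn)
{
    return (uint16_t)((bus << 8) | devfn);
}

/* Return the entry for this device, creating it on first use. */
static struct as_elem *find_add_as(uint8_t bus, uint8_t devfn)
{
    uint16_t bdf = build_bdf(bus, devfn);
    struct as_elem *free_slot = NULL;
    size_t i;

    for (i = 0; i < MAX_ELEMS; i++) {
        if (as_table[i].used && as_table[i].bdf == bdf) {
            return &as_table[i];
        }
        if (!as_table[i].used && !free_slot) {
            free_slot = &as_table[i];
        }
    }

    if (!free_slot) {
        return NULL;   /* table full; a limit specific to this sketch */
    }

    free_slot->used = 1;
    free_slot->bdf = bdf;
    snprintf(free_slot->name, sizeof(free_slot->name), "vfu-as-%d", bdf);
    return free_slot;
}

/* Forget the entry for a device once it goes away. */
static void del_device(uint8_t bus, uint8_t devfn)
{
    uint16_t bdf = build_bdf(bus, devfn);
    size_t i;

    for (i = 0; i < MAX_ELEMS; i++) {
        if (as_table[i].used && as_table[i].bdf == bdf) {
            as_table[i].used = 0;
        }
    }
}
```

Two lookups for the same bus/devfn return the same entry, while
devices that differ in bus or devfn get distinct entries.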
 include/hw/remote/iommu.h | 18 +++++++++
 hw/remote/iommu.c         | 78 +++++++++++++++++++++++++++++++++++++++
 MAINTAINERS               |  2 +
 hw/remote/meson.build     |  1 +
 4 files changed, 99 insertions(+)
 create mode 100644 include/hw/remote/iommu.h
 create mode 100644 hw/remote/iommu.c

diff --git a/include/hw/remote/iommu.h b/include/hw/remote/iommu.h
new file mode 100644
index 0000000000..8f850400f1
--- /dev/null
+++ b/include/hw/remote/iommu.h
@@ -0,0 +1,18 @@
+/**
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef REMOTE_IOMMU_H
+#define REMOTE_IOMMU_H
+
+#include "hw/pci/pci_bus.h"
+
+void remote_configure_iommu(PCIBus *pci_bus);
+
+void remote_iommu_del_device(PCIDevice *pci_dev);
+
+#endif
diff --git a/hw/remote/iommu.c b/hw/remote/iommu.c
new file mode 100644
index 0000000000..50d75cc22d
--- /dev/null
+++ b/hw/remote/iommu.c
@@ -0,0 +1,78 @@
+/**
+ * IOMMU for remote device
+ *
+ * Copyright © 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "hw/remote/iommu.h"
+#include "hw/pci/pci_bus.h"
+#include "hw/pci/pci.h"
+#include "exec/memory.h"
+#include "exec/address-spaces.h"
+#include "trace.h"
+
+struct RemoteIommuElem {
+    AddressSpace  as;
+    MemoryRegion  mr;
+};
+
+GHashTable *remote_iommu_elem_by_bdf;
+
+#define INT2VOIDP(i) (void *)(uintptr_t)(i)
+
+static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
+                                              void *opaque, int devfn)
+{
+    struct RemoteIommuElem *elem = NULL;
+    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_bus), devfn);
+
+    if (!remote_iommu_elem_by_bdf) {
+        return &address_space_memory;
+    }
+
+    elem = g_hash_table_lookup(remote_iommu_elem_by_bdf, INT2VOIDP(pci_bdf));
+
+    if (!elem) {
+        g_autofree char *mr_name = g_strdup_printf("vfu-ram-%d", pci_bdf);
+        g_autofree char *as_name = g_strdup_printf("vfu-as-%d", pci_bdf);
+
+        elem = g_malloc0(sizeof(struct RemoteIommuElem));
+
+        memory_region_init(&elem->mr, NULL, mr_name, UINT64_MAX);
+        address_space_init(&elem->as, &elem->mr, as_name);
+
+        g_hash_table_insert(remote_iommu_elem_by_bdf, INT2VOIDP(pci_bdf), elem);
+    }
+
+    return &elem->as;
+}
+
+void remote_iommu_del_device(PCIDevice *pci_dev)
+{
+    int pci_bdf;
+
+    if (!remote_iommu_elem_by_bdf || !pci_dev) {
+        return;
+    }
+
+    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
+
+    g_hash_table_remove(remote_iommu_elem_by_bdf, INT2VOIDP(pci_bdf));
+}
+
+void remote_configure_iommu(PCIBus *pci_bus)
+{
+    if (!remote_iommu_elem_by_bdf) {
+        remote_iommu_elem_by_bdf = g_hash_table_new_full(NULL, NULL,
+                                                         NULL, NULL);
+    }
+
+    pci_setup_iommu(pci_bus, remote_iommu_find_add_as, NULL);
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index 751d97852d..f47232c78c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3569,6 +3569,8 @@ F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
 F: hw/remote/vfio-user-obj.c
+F: hw/remote/iommu.c
+F: include/hw/remote/iommu.h
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index 534ac5df79..bcef83c8cc 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iommu.c'))
 remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: files('vfio-user-obj.c'))
 
 remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)
-- 
2.20.1




* [PATCH v6 13/19] vfio-user: handle DMA mappings
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (11 preceding siblings ...)
  2022-02-17  7:48 ` [PATCH v6 12/19] vfio-user: IOMMU support for remote device Jagannathan Raman
@ 2022-02-17  7:49 ` Jagannathan Raman
  2022-02-17  7:49 ` [PATCH v6 14/19] vfio-user: handle PCI BAR accesses Jagannathan Raman
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Define and register callbacks that map and unmap the RAM regions used
for device DMA.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
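Note for readers: the register callback in this patch maps a client
memory region at its DMA address, and the unregister callback finds the
region again through its host address. The stand-alone sketch below
models that bookkeeping with a plain array. It is not part of the
patch: dma_region, dma_region_add(), dma_region_del() and
dma_translate() are hypothetical names, and the fixed capacity is an
assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8   /* hypothetical capacity */

struct dma_region {
    uint64_t iova;    /* address the device uses for DMA */
    size_t   len;
    uint8_t *vaddr;   /* local mapping of the memory; NULL marks a free slot */
};

static struct dma_region dma_regions[MAX_REGIONS];

/* Record a mapping; regions without a local mapping are skipped. */
static int dma_region_add(uint64_t iova, size_t len, void *vaddr)
{
    size_t i;

    if (!vaddr) {
        return -1;
    }
    for (i = 0; i < MAX_REGIONS; i++) {
        if (!dma_regions[i].vaddr) {
            dma_regions[i].iova = iova;
            dma_regions[i].len = len;
            dma_regions[i].vaddr = vaddr;
            return 0;
        }
    }
    return -1;
}

/* Drop the mapping that is backed by this host address. */
static int dma_region_del(void *vaddr)
{
    size_t i;

    for (i = 0; i < MAX_REGIONS; i++) {
        if (vaddr && dma_regions[i].vaddr == vaddr) {
            dma_regions[i].vaddr = NULL;
            return 0;
        }
    }
    return -1;
}

/* Translate a device DMA address to a host pointer, or NULL. */
static void *dma_translate(uint64_t iova)
{
    size_t i;

    for (i = 0; i < MAX_REGIONS; i++) {
        struct dma_region *r = &dma_regions[i];

        if (r->vaddr && iova >= r->iova && iova - r->iova < r->len) {
            return r->vaddr + (iova - r->iova);
        }
    }
    return NULL;
}
```

As in the patch, a region that arrives without a local mapping is
ignored, and removal is keyed by the host address, not the DMA address.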
 hw/remote/machine.c       |  5 ++++
 hw/remote/vfio-user-obj.c | 55 +++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 3 files changed, 62 insertions(+)

diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index a9a75e170f..db4ae30710 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -22,6 +22,7 @@
 #include "hw/pci/pci_host.h"
 #include "hw/remote/iohub.h"
 #include "hw/qdev-core.h"
+#include "hw/remote/iommu.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -51,6 +52,10 @@ static void remote_machine_init(MachineState *machine)
 
     pci_host = PCI_HOST_BRIDGE(rem_host);
 
+    if (s->vfio_user) {
+        remote_configure_iommu(pci_host->bus);
+    }
+
     remote_iohub_init(&s->iohub);
 
     pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 4c4280d603..971f6ca28e 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -251,6 +251,54 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
     return count;
 }
 
+static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    AddressSpace *dma_as = NULL;
+    MemoryRegion *subregion = NULL;
+    g_autofree char *name = NULL;
+    struct iovec *iov = &info->iova;
+
+    if (!info->vaddr) {
+        return;
+    }
+
+    name = g_strdup_printf("mem-%s-%"PRIx64"", o->device,
+                           (uint64_t)info->vaddr);
+
+    subregion = g_new0(MemoryRegion, 1);
+
+    memory_region_init_ram_ptr(subregion, NULL, name,
+                               iov->iov_len, info->vaddr);
+
+    dma_as = pci_device_iommu_address_space(o->pci_dev);
+
+    memory_region_add_subregion(dma_as->root, (hwaddr)iov->iov_base, subregion);
+
+    trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
+}
+
+static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    AddressSpace *dma_as = NULL;
+    MemoryRegion *mr = NULL;
+    ram_addr_t offset;
+
+    mr = memory_region_from_host(info->vaddr, &offset);
+    if (!mr) {
+        return;
+    }
+
+    dma_as = pci_device_iommu_address_space(o->pci_dev);
+
+    memory_region_del_subregion(dma_as->root, mr);
+
+    object_unparent(OBJECT(mr));
+
+    trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -338,6 +386,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         goto fail;
     }
 
+    ret = vfu_setup_device_dma(o->vfu_ctx, &dma_register, &dma_unregister);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to setup DMA handlers for %s",
+                   o->device);
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 2ef7884346..f945c7e33b 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
 vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
+vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
-- 
2.20.1




* [PATCH v6 14/19] vfio-user: handle PCI BAR accesses
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (12 preceding siblings ...)
  2022-02-17  7:49 ` [PATCH v6 13/19] vfio-user: handle DMA mappings Jagannathan Raman
@ 2022-02-17  7:49 ` Jagannathan Raman
  2022-02-22 11:04   ` Stefan Hajnoczi
  2022-02-17  7:49 ` [PATCH v6 15/19] vfio-user: handle device interrupts Jagannathan Raman
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Determine which BARs the PCI device uses and register handlers for
accesses to them.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
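Note for readers: for BARs backed by MemoryRegionOps, the handler in
this patch splits a request into several accesses whose sizes come
from memory_access_size(). The stand-alone sketch below is modelled on
my reading of that helper and is not part of the patch. pow2_floor(),
next_access_size() and count_accesses() are hypothetical names, and
the sketch assumes the region does not implement unaligned accesses;
the real helper consults the region's ops for that.

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two that is <= v, or 0 for v == 0. */
static unsigned pow2_floor(unsigned v)
{
    while (v & (v - 1)) {
        v &= v - 1;   /* clear the lowest set bit */
    }
    return v;
}

/*
 * Size of the next access: at most max_size (4 if unset), no larger
 * than the natural alignment of addr, and a power of two.
 */
static unsigned next_access_size(unsigned max_size, unsigned len,
                                 uint64_t addr)
{
    uint64_t align = addr & (~addr + 1);   /* lowest set bit of addr */

    if (max_size == 0) {
        max_size = 4;
    }
    if (align != 0 && align < max_size) {
        max_size = (unsigned)align;
    }
    if (len > max_size) {
        len = max_size;
    }
    return pow2_floor(len);
}

/* Count the accesses needed to cover len bytes starting at addr. */
static unsigned count_accesses(unsigned max_size, unsigned len,
                               uint64_t addr)
{
    unsigned n = 0;

    while (len > 0) {
        unsigned sz = next_access_size(max_size, len, addr);

        len -= sz;
        addr += sz;
        n++;
    }
    return n;
}
```

Under these assumptions, a seven-byte request at offset 0x10 with a
four-byte maximum becomes accesses of four, two and one bytes.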
 include/exec/memory.h           |   3 +
 hw/remote/vfio-user-obj.c       | 166 ++++++++++++++++++++++++++++++++
 softmmu/physmem.c               |   4 +-
 tests/qtest/fuzz/generic_fuzz.c |   9 +-
 hw/remote/trace-events          |   3 +
 5 files changed, 179 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 4d5997e6bb..4b061e62d5 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -2810,6 +2810,9 @@ MemTxResult address_space_write_cached_slow(MemoryRegionCache *cache,
                                             hwaddr addr, const void *buf,
                                             hwaddr len);
 
+int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
+bool prepare_mmio_access(MemoryRegion *mr);
+
 static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
 {
     if (is_write) {
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 971f6ca28e..2feabd06a4 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -53,6 +53,7 @@
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
 #include "qemu/timer.h"
+#include "exec/memory.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -299,6 +300,169 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
     trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
 }
 
+static size_t vfu_object_bar_rw(PCIDevice *pci_dev, int pci_bar,
+                                hwaddr offset, char * const buf,
+                                hwaddr len, const bool is_write)
+{
+    uint8_t *ptr = (uint8_t *)buf;
+    uint8_t *ram_ptr = NULL;
+    bool release_lock = false;
+    MemoryRegionSection section = { 0 };
+    MemoryRegion *mr = NULL;
+    int access_size;
+    hwaddr size = 0;
+    MemTxResult result;
+    uint64_t val;
+
+    section = memory_region_find(pci_dev->io_regions[pci_bar].memory,
+                                 offset, len);
+
+    if (!section.mr) {
+        return 0;
+    }
+
+    mr = section.mr;
+
+    if (is_write && mr->readonly) {
+        warn_report("vfu: attempting to write to readonly region in "
+                    "bar %d - [0x%"PRIx64" - 0x%"PRIx64"]",
+                    pci_bar, offset, (offset + len));
+        return 0;
+    }
+
+    if (memory_access_is_direct(mr, is_write)) {
+        /**
+         * Some devices expose a PCI expansion ROM, which could be buffer
+         * based as compared to other regions which are primarily based on
+         * MemoryRegionOps. memory_region_find() would already check
+         * for buffer overflow, we don't need to repeat it here.
+         */
+        ram_ptr = memory_region_get_ram_ptr(mr);
+
+        size = len;
+
+        if (is_write) {
+            memcpy(ram_ptr, buf, size);
+        } else {
+            memcpy(buf, ram_ptr, size);
+        }
+
+        goto exit;
+    }
+
+    while (len > 0) {
+        /**
+         * The read/write logic used below is similar to the ones in
+         * flatview_read/write_continue()
+         */
+        release_lock = prepare_mmio_access(mr);
+
+        access_size = memory_access_size(mr, len, offset);
+
+        if (is_write) {
+            val = ldn_he_p(ptr, access_size);
+
+            result = memory_region_dispatch_write(mr, offset, val,
+                                                  size_memop(access_size),
+                                                  MEMTXATTRS_UNSPECIFIED);
+        } else {
+            result = memory_region_dispatch_read(mr, offset, &val,
+                                                 size_memop(access_size),
+                                                 MEMTXATTRS_UNSPECIFIED);
+
+            stn_he_p(ptr, access_size, val);
+        }
+
+        if (release_lock) {
+            qemu_mutex_unlock_iothread();
+            release_lock = false;
+        }
+
+        if (result != MEMTX_OK) {
+            warn_report("vfu: failed to %s 0x%"PRIx64"",
+                        is_write ? "write to" : "read from",
+                        (offset - size));
+
+            goto exit;
+        }
+
+        len -= access_size;
+        size += access_size;
+        ptr += access_size;
+        offset += access_size;
+    }
+
+exit:
+    memory_region_unref(mr);
+
+    return size;
+}
+
+/**
+ * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
+ *
+ * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
+ * define vfu_object_bar2_handler
+ */
+#define VFU_OBJECT_BAR_HANDLER(BAR_NO)                                         \
+    static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,        \
+                                        char * const buf, size_t count,        \
+                                        loff_t offset, const bool is_write)    \
+    {                                                                          \
+        VfuObject *o = vfu_get_private(vfu_ctx);                               \
+        PCIDevice *pci_dev = o->pci_dev;                                       \
+                                                                               \
+        return vfu_object_bar_rw(pci_dev, BAR_NO, offset,                      \
+                                 buf, count, is_write);                        \
+    }                                                                          \
+
+VFU_OBJECT_BAR_HANDLER(0)
+VFU_OBJECT_BAR_HANDLER(1)
+VFU_OBJECT_BAR_HANDLER(2)
+VFU_OBJECT_BAR_HANDLER(3)
+VFU_OBJECT_BAR_HANDLER(4)
+VFU_OBJECT_BAR_HANDLER(5)
+VFU_OBJECT_BAR_HANDLER(6)
+
+static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
+    &vfu_object_bar0_handler,
+    &vfu_object_bar1_handler,
+    &vfu_object_bar2_handler,
+    &vfu_object_bar3_handler,
+    &vfu_object_bar4_handler,
+    &vfu_object_bar5_handler,
+    &vfu_object_bar6_handler,
+};
+
+/**
+ * vfu_object_register_bars - Identify active BAR regions of pdev and setup
+ *                            callbacks to handle read/write accesses
+ */
+static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
+{
+    int i;
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        int flags = VFU_REGION_FLAG_RW;
+        if (!pdev->io_regions[i].size) {
+            continue;
+        }
+
+        if ((i == VFU_PCI_DEV_ROM_REGION_IDX) ||
+            pdev->io_regions[i].memory->readonly) {
+            flags &= ~VFU_REGION_FLAG_WRITE;
+        }
+
+        vfu_setup_region(vfu_ctx, VFU_PCI_DEV_BAR0_REGION_IDX + i,
+                         (size_t)pdev->io_regions[i].size,
+                         vfu_object_bar_handlers[i],
+                         flags, NULL, 0, -1, 0);
+
+        trace_vfu_bar_register(i, pdev->io_regions[i].addr,
+                               pdev->io_regions[i].size);
+    }
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -393,6 +557,8 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         goto fail;
     }
 
+    vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index dddf70edf5..3188d4e143 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2717,7 +2717,7 @@ void memory_region_flush_rom_device(MemoryRegion *mr, hwaddr addr, hwaddr size)
     invalidate_and_set_dirty(mr, addr, size);
 }
 
-static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
+int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
 {
     unsigned access_size_max = mr->ops->valid.max_access_size;
 
@@ -2744,7 +2744,7 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
     return l;
 }
 
-static bool prepare_mmio_access(MemoryRegion *mr)
+bool prepare_mmio_access(MemoryRegion *mr)
 {
     bool release_lock = false;
 
diff --git a/tests/qtest/fuzz/generic_fuzz.c b/tests/qtest/fuzz/generic_fuzz.c
index dd7e25851c..77547fc1d8 100644
--- a/tests/qtest/fuzz/generic_fuzz.c
+++ b/tests/qtest/fuzz/generic_fuzz.c
@@ -144,7 +144,7 @@ static void *pattern_alloc(pattern p, size_t len)
     return buf;
 }
 
-static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
+static int fuzz_memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
 {
     unsigned access_size_max = mr->ops->valid.max_access_size;
 
@@ -242,11 +242,12 @@ void fuzz_dma_read_cb(size_t addr, size_t len, MemoryRegion *mr)
 
         /*
          *  If mr1 isn't RAM, address_space_translate doesn't update l. Use
-         *  memory_access_size to identify the number of bytes that it is safe
-         *  to write without accidentally writing to another MemoryRegion.
+         *  fuzz_memory_access_size to identify the number of bytes that it
+         *  is safe to write without accidentally writing to another
+         *  MemoryRegion.
          */
         if (!memory_region_is_ram(mr1)) {
-            l = memory_access_size(mr1, l, addr1);
+            l = fuzz_memory_access_size(mr1, l, addr1);
         }
         if (memory_region_is_ram(mr1) ||
             memory_region_is_romd(mr1) ||
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f945c7e33b..847d50d88f 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -9,3 +9,6 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
 vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
+vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
+vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
+vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
-- 
2.20.1




* [PATCH v6 15/19] vfio-user: handle device interrupts
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (13 preceding siblings ...)
  2022-02-17  7:49 ` [PATCH v6 14/19] vfio-user: handle PCI BAR accesses Jagannathan Raman
@ 2022-02-17  7:49 ` Jagannathan Raman
  2022-03-07 10:24   ` Stefan Hajnoczi
  2022-02-17  7:49 ` [PATCH v6 16/19] softmmu/vl: defer backend init Jagannathan Raman
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Forward the remote device's interrupts to the guest.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
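Note for readers: this patch turns msi_notify() and msix_notify() into
thin wrappers that dispatch through a per-device function pointer, so
that the vfio-user server can replace the default delivery path. The
stand-alone sketch below shows that hook pattern in isolation and is
not part of the patch. fake_dev, default_msi_notify(),
forwarding_msi_notify(), fake_msi_init() and fake_msi_notify() are
hypothetical names; the counters exist only to make the effect
observable.

```c
#include <assert.h>
#include <stddef.h>

struct fake_dev;
typedef void notify_fn(struct fake_dev *dev, unsigned vector);

/* Stand-in for PCIDevice with a single notify hook. */
struct fake_dev {
    notify_fn *msi_notify;
    unsigned   local_count;       /* delivered by the default path */
    unsigned   forwarded_count;   /* handed to the override */
    unsigned   last_vector;
};

/* Default path, installed when MSI is initialised. */
static void default_msi_notify(struct fake_dev *dev, unsigned vector)
{
    dev->local_count++;
    dev->last_vector = vector;
}

/* Override, standing in for a server that forwards to its client. */
static void forwarding_msi_notify(struct fake_dev *dev, unsigned vector)
{
    dev->forwarded_count++;
    dev->last_vector = vector;
}

static void fake_msi_init(struct fake_dev *dev)
{
    dev->msi_notify = default_msi_notify;
}

/* Public entry point: dispatch through the hook if one is set. */
static void fake_msi_notify(struct fake_dev *dev, unsigned vector)
{
    if (dev->msi_notify) {
        dev->msi_notify(dev, vector);
    }
}
```

Callers keep using the public entry point; only the pointer stored in
the device decides whether the interrupt is delivered locally or
forwarded.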
 include/hw/pci/pci.h              |   6 ++
 include/hw/remote/vfio-user-obj.h |   6 ++
 hw/pci/msi.c                      |  13 +++-
 hw/pci/msix.c                     |  12 +++-
 hw/remote/machine.c               |  11 +--
 hw/remote/vfio-user-obj.c         | 107 ++++++++++++++++++++++++++++++
 stubs/vfio-user-obj.c             |   6 ++
 MAINTAINERS                       |   1 +
 hw/remote/trace-events            |   1 +
 stubs/meson.build                 |   1 +
 10 files changed, 158 insertions(+), 6 deletions(-)
 create mode 100644 include/hw/remote/vfio-user-obj.h
 create mode 100644 stubs/vfio-user-obj.c

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index c3f3c90473..d42d526a48 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
 typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
                                 pcibus_t addr, pcibus_t size, int type);
 typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
+typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
+typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
 
 typedef struct PCIIORegion {
     pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
@@ -323,6 +325,10 @@ struct PCIDevice {
     /* Space to store MSIX table & pending bit array */
     uint8_t *msix_table;
     uint8_t *msix_pba;
+
+    PCIMSINotify *msi_notify;
+    PCIMSIxNotify *msix_notify;
+
     /* MemoryRegion container for msix exclusive BAR setup */
     MemoryRegion msix_exclusive_bar;
     /* Memory Regions for MSIX table and pending bit entries. */
diff --git a/include/hw/remote/vfio-user-obj.h b/include/hw/remote/vfio-user-obj.h
new file mode 100644
index 0000000000..87ab78b875
--- /dev/null
+++ b/include/hw/remote/vfio-user-obj.h
@@ -0,0 +1,6 @@
+#ifndef VFIO_USER_OBJ_H
+#define VFIO_USER_OBJ_H
+
+void vfu_object_set_bus_irq(PCIBus *pci_bus);
+
+#endif
diff --git a/hw/pci/msi.c b/hw/pci/msi.c
index 47d2b0f33c..93f5e400cc 100644
--- a/hw/pci/msi.c
+++ b/hw/pci/msi.c
@@ -51,6 +51,8 @@
  */
 bool msi_nonbroken;
 
+static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
+
 /* If we get rid of cap allocator, we won't need this. */
 static inline uint8_t msi_cap_sizeof(uint16_t flags)
 {
@@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
     dev->msi_cap = config_offset;
     dev->cap_present |= QEMU_PCI_CAP_MSI;
 
+    dev->msi_notify = pci_msi_notify;
+
     pci_set_word(dev->config + msi_flags_off(dev), flags);
     pci_set_word(dev->wmask + msi_flags_off(dev),
                  PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
@@ -307,7 +311,7 @@ bool msi_is_masked(const PCIDevice *dev, unsigned int vector)
     return mask & (1U << vector);
 }
 
-void msi_notify(PCIDevice *dev, unsigned int vector)
+static void pci_msi_notify(PCIDevice *dev, unsigned int vector)
 {
     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
     bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
@@ -332,6 +336,13 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
     msi_send_message(dev, msg);
 }
 
+void msi_notify(PCIDevice *dev, unsigned int vector)
+{
+    if (dev->msi_notify) {
+        dev->msi_notify(dev, vector);
+    }
+}
+
 void msi_send_message(PCIDevice *dev, MSIMessage msg)
 {
     MemTxAttrs attrs = {};
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index ae9331cd0b..1c71e67f53 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -31,6 +31,8 @@
 #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
 #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
 
+static void pci_msix_notify(PCIDevice *dev, unsigned vector);
+
 MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
 {
     uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
@@ -334,6 +336,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
     dev->msix_table = g_malloc0(table_size);
     dev->msix_pba = g_malloc0(pba_size);
     dev->msix_entry_used = g_malloc0(nentries * sizeof *dev->msix_entry_used);
+    dev->msix_notify = pci_msix_notify;
 
     msix_mask_all(dev, nentries);
 
@@ -485,7 +488,7 @@ int msix_enabled(PCIDevice *dev)
 }
 
 /* Send an MSI-X message */
-void msix_notify(PCIDevice *dev, unsigned vector)
+static void pci_msix_notify(PCIDevice *dev, unsigned vector)
 {
     MSIMessage msg;
 
@@ -503,6 +506,13 @@ void msix_notify(PCIDevice *dev, unsigned vector)
     msi_send_message(dev, msg);
 }
 
+void msix_notify(PCIDevice *dev, unsigned vector)
+{
+    if (dev->msix_notify) {
+        dev->msix_notify(dev, vector);
+    }
+}
+
 void msix_reset(PCIDevice *dev)
 {
     if (!msix_present(dev)) {
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index db4ae30710..a8b4a3aef3 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -23,6 +23,7 @@
 #include "hw/remote/iohub.h"
 #include "hw/qdev-core.h"
 #include "hw/remote/iommu.h"
+#include "hw/remote/vfio-user-obj.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -54,12 +55,14 @@ static void remote_machine_init(MachineState *machine)
 
     if (s->vfio_user) {
         remote_configure_iommu(pci_host->bus);
-    }
 
-    remote_iohub_init(&s->iohub);
+        vfu_object_set_bus_irq(pci_host->bus);
+    } else {
+        remote_iohub_init(&s->iohub);
 
-    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
-                 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+        pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
+                     &s->iohub, REMOTE_IOHUB_NB_PIRQS);
+    }
 
     qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
 }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 2feabd06a4..d79bab87f1 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -54,6 +54,9 @@
 #include "hw/pci/pci.h"
 #include "qemu/timer.h"
 #include "exec/memory.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+#include "hw/remote/vfio-user-obj.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -107,6 +110,10 @@ struct VfuObject {
     int vfu_poll_fd;
 };
 
+static GHashTable *vfu_object_bdf_to_ctx_table;
+
+#define INT2VOIDP(i) (void *)(uintptr_t)(i)
+
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
 
 static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
@@ -463,6 +470,86 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
     }
 }
 
+static void vfu_object_irq_trigger(int pci_bdf, unsigned vector)
+{
+    vfu_ctx_t *vfu_ctx = NULL;
+
+    if (!vfu_object_bdf_to_ctx_table) {
+        return;
+    }
+
+    vfu_ctx = g_hash_table_lookup(vfu_object_bdf_to_ctx_table,
+                                  INT2VOIDP(pci_bdf));
+
+    if (vfu_ctx) {
+        vfu_irq_trigger(vfu_ctx, vector);
+    }
+}
+
+static int vfu_object_map_irq(PCIDevice *pci_dev, int intx)
+{
+    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
+                                pci_dev->devfn);
+
+    return pci_bdf;
+}
+
+static void vfu_object_set_irq(void *opaque, int pirq, int level)
+{
+    if (level) {
+        vfu_object_irq_trigger(pirq, 0);
+    }
+}
+
+static void vfu_object_msi_notify(PCIDevice *pci_dev, unsigned vector)
+{
+    int pci_bdf;
+
+    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
+
+    vfu_object_irq_trigger(pci_bdf, vector);
+}
+
+static int vfu_object_setup_irqs(VfuObject *o, PCIDevice *pci_dev)
+{
+    vfu_ctx_t *vfu_ctx = o->vfu_ctx;
+    int ret, pci_bdf;
+
+    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = 0;
+    if (msix_nr_vectors_allocated(pci_dev)) {
+        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSIX_IRQ,
+                                       msix_nr_vectors_allocated(pci_dev));
+
+        pci_dev->msix_notify = vfu_object_msi_notify;
+    } else if (msi_nr_vectors_allocated(pci_dev)) {
+        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSI_IRQ,
+                                       msi_nr_vectors_allocated(pci_dev));
+
+        pci_dev->msi_notify = vfu_object_msi_notify;
+    }
+
+    if (ret < 0) {
+        return ret;
+    }
+
+    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
+
+    g_hash_table_insert(vfu_object_bdf_to_ctx_table, INT2VOIDP(pci_bdf),
+                        o->vfu_ctx);
+
+    return 0;
+}
+
+void vfu_object_set_bus_irq(PCIBus *pci_bus)
+{
+    pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, NULL, 1);
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -559,6 +646,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
 
     vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
 
+    ret = vfu_object_setup_irqs(o, o->pci_dev);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to setup interrupts for %s",
+                   o->device);
+        goto fail;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
@@ -612,6 +706,7 @@ static void vfu_object_finalize(Object *obj)
 {
     VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
     VfuObject *o = VFU_OBJECT(obj);
+    int pci_bdf;
 
     k->nr_devs--;
 
@@ -638,9 +733,17 @@ static void vfu_object_finalize(Object *obj)
         o->unplug_blocker = NULL;
     }
 
+    if (o->pci_dev) {
+        pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(o->pci_dev)),
+                                o->pci_dev->devfn);
+        g_hash_table_remove(vfu_object_bdf_to_ctx_table, INT2VOIDP(pci_bdf));
+    }
+
     o->pci_dev = NULL;
 
     if (!k->nr_devs && k->auto_shutdown) {
+        g_hash_table_destroy(vfu_object_bdf_to_ctx_table);
+        vfu_object_bdf_to_ctx_table = NULL;
         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
     }
 
@@ -658,6 +761,10 @@ static void vfu_object_class_init(ObjectClass *klass, void *data)
 
     k->auto_shutdown = true;
 
+    msi_nonbroken = true;
+
+    vfu_object_bdf_to_ctx_table = g_hash_table_new_full(NULL, NULL, NULL, NULL);
+
     object_class_property_add(klass, "socket", "SocketAddress", NULL,
                               vfu_object_set_socket, NULL, NULL);
     object_class_property_set_description(klass, "socket",
diff --git a/stubs/vfio-user-obj.c b/stubs/vfio-user-obj.c
new file mode 100644
index 0000000000..79100d768e
--- /dev/null
+++ b/stubs/vfio-user-obj.c
@@ -0,0 +1,6 @@
+#include "qemu/osdep.h"
+#include "hw/remote/vfio-user-obj.h"
+
+void vfu_object_set_bus_irq(PCIBus *pci_bus)
+{
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index f47232c78c..e274cb46af 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3569,6 +3569,7 @@ F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
 F: hw/remote/vfio-user-obj.c
+F: include/hw/remote/vfio-user-obj.h
 F: hw/remote/iommu.c
 F: include/hw/remote/iommu.h
 
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 847d50d88f..c167b3c7a5 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -12,3 +12,4 @@ vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
 vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
 vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
 vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
+vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
diff --git a/stubs/meson.build b/stubs/meson.build
index d359cbe1ad..c5ce979dc3 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -57,3 +57,4 @@ if have_system
 else
   stub_ss.add(files('qdev.c'))
 endif
+stub_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_false: files('vfio-user-obj.c'))
-- 
2.20.1




* [PATCH v6 16/19] softmmu/vl: defer backend init
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (14 preceding siblings ...)
  2022-02-17  7:49 ` [PATCH v6 15/19] vfio-user: handle device interrupts Jagannathan Raman
@ 2022-02-17  7:49 ` Jagannathan Raman
  2022-03-07 10:48   ` Stefan Hajnoczi
  2022-02-17  7:49 ` [PATCH v6 17/19] vfio-user: register handlers to facilitate migration Jagannathan Raman
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Allow deferred initialization of backends. TYPE_REMOTE_MACHINE is
agnostic to QEMU's RUN_STATE. Its state is driven by the QEMU client
via the vfio-user protocol. The backends, however, presently defer
initialization if QEMU is in RUN_STATE_INMIGRATE. Since the remote
machine can't use RUN_STATE*, this commit allows it to ask for deferred
initialization of backend devices. It is primarily targeted towards
block devices in this commit, but it need not be limited to that.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
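Note: the mechanism described above can be sketched in isolation as
follows. This is only an illustrative model, not the patch itself:
RunState, current_run_state and runstate_check() are simplified
stand-ins for QEMU's run-state API, and backend_should_start_inactive()
is a hypothetical helper that names the condition the block layer
tests once this patch is applied.

```c
#include <stdbool.h>

/* Simplified stand-in for QEMU's run states (assumption for this sketch). */
typedef enum { RUN_STATE_RUNNING, RUN_STATE_INMIGRATE } RunState;

static RunState current_run_state = RUN_STATE_RUNNING;

/* Process-wide flag, set by the remote machine when it is instantiated. */
static bool defer_backend_init;

/* Stand-in for QEMU's runstate_check(). */
static bool runstate_check(RunState state)
{
    return current_run_state == state;
}

bool deferred_backend_init(void)
{
    return defer_backend_init;
}

void set_deferred_backend_init(void)
{
    defer_backend_init = true;
}

void clear_deferred_backend_init(void)
{
    defer_backend_init = false;
}

/*
 * Hypothetical helper: a backend starts inactive either during an
 * incoming migration or when deferral was requested explicitly.
 */
bool backend_should_start_inactive(void)
{
    return runstate_check(RUN_STATE_INMIGRATE) || deferred_backend_init();
}
```

With the flag clear and the run state at its default, the helper
returns false; after set_deferred_backend_init() it returns true
regardless of the run state, which is the behaviour the remote machine
relies on.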
 include/sysemu/sysemu.h    |  4 ++++
 block/block-backend.c      |  3 ++-
 blockdev.c                 |  2 +-
 softmmu/vl.c               | 17 +++++++++++++++++
 stubs/defer-backend-init.c |  7 +++++++
 MAINTAINERS                |  1 +
 stubs/meson.build          |  1 +
 7 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100644 stubs/defer-backend-init.c

diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b9421e03ff..3179eb1857 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -119,4 +119,8 @@ extern QemuOptsList qemu_net_opts;
 extern QemuOptsList qemu_global_opts;
 extern QemuOptsList qemu_semihosting_config_opts;
 
+bool deferred_backend_init(void);
+void set_deferred_backend_init(void);
+void clear_deferred_backend_init(void);
+
 #endif
diff --git a/block/block-backend.c b/block/block-backend.c
index 4ff6b4d785..e04f9b6469 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -20,6 +20,7 @@
 #include "sysemu/blockdev.h"
 #include "sysemu/runstate.h"
 #include "sysemu/replay.h"
+#include "sysemu/sysemu.h"
 #include "qapi/error.h"
 #include "qapi/qapi-events-block.h"
 #include "qemu/id.h"
@@ -935,7 +936,7 @@ int blk_attach_dev(BlockBackend *blk, DeviceState *dev)
     /* While migration is still incoming, we don't need to apply the
      * permissions of guest device BlockBackends. We might still have a block
      * job or NBD server writing to the image for storage migration. */
-    if (runstate_check(RUN_STATE_INMIGRATE)) {
+    if (runstate_check(RUN_STATE_INMIGRATE) || deferred_backend_init()) {
         blk->disable_perm = true;
     }
 
diff --git a/blockdev.c b/blockdev.c
index 42e098b458..d495070679 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -569,7 +569,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
         qdict_set_default_str(bs_opts, BDRV_OPT_AUTO_READ_ONLY, "on");
         assert((bdrv_flags & BDRV_O_CACHE_MASK) == 0);
 
-        if (runstate_check(RUN_STATE_INMIGRATE)) {
+        if (runstate_check(RUN_STATE_INMIGRATE) || deferred_backend_init()) {
             bdrv_flags |= BDRV_O_INACTIVE;
         }
 
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 5e1b35ba48..9584ab82e3 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -496,6 +496,23 @@ static QemuOptsList qemu_action_opts = {
     },
 };
 
+bool defer_backend_init;
+
+bool deferred_backend_init(void)
+{
+    return defer_backend_init;
+}
+
+void set_deferred_backend_init(void)
+{
+    defer_backend_init = true;
+}
+
+void clear_deferred_backend_init(void)
+{
+    defer_backend_init = false;
+}
+
 const char *qemu_get_vm_name(void)
 {
     return qemu_name;
diff --git a/stubs/defer-backend-init.c b/stubs/defer-backend-init.c
new file mode 100644
index 0000000000..3a74c669a1
--- /dev/null
+++ b/stubs/defer-backend-init.c
@@ -0,0 +1,7 @@
+#include "qemu/osdep.h"
+#include "sysemu/sysemu.h"
+
+bool deferred_backend_init(void)
+{
+    return false;
+}
diff --git a/MAINTAINERS b/MAINTAINERS
index e274cb46af..1f55d04ce6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3572,6 +3572,7 @@ F: hw/remote/vfio-user-obj.c
 F: include/hw/remote/vfio-user-obj.h
 F: hw/remote/iommu.c
 F: include/hw/remote/iommu.h
+F: stubs/defer-backend-init.c
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/stubs/meson.build b/stubs/meson.build
index c5ce979dc3..98770966f6 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -58,3 +58,4 @@ else
   stub_ss.add(files('qdev.c'))
 endif
 stub_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_false: files('vfio-user-obj.c'))
+stub_ss.add(files('defer-backend-init.c'))
-- 
2.20.1




* [PATCH v6 17/19] vfio-user: register handlers to facilitate migration
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (15 preceding siblings ...)
  2022-02-17  7:49 ` [PATCH v6 16/19] softmmu/vl: defer backend init Jagannathan Raman
@ 2022-02-17  7:49 ` Jagannathan Raman
  2022-02-18 12:20   ` Paolo Bonzini
  2022-03-07 11:26   ` Stefan Hajnoczi
  2022-02-17  7:49 ` [PATCH v6 18/19] vfio-user: handle reset of remote device Jagannathan Raman
  2022-02-17  7:49 ` [PATCH v6 19/19] vfio-user: avocado tests for vfio-user Jagannathan Raman
  18 siblings, 2 replies; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Store and load the device's state during migration. Use libvfio-user's
handlers for this purpose.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
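Note: on the source side, the saved state is handed to the client in
windows of at most 64 KiB. The sketch below models only that
bookkeeping, under stated assumptions: MigBuf is a hypothetical
stand-in for the migration fields of VfuObject, and mig_prepare(),
mig_read() and mig_drain() are illustrative names loosely mirroring the
prepare_data and read_data callbacks, not libvfio-user API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors VFU_OBJECT_MIG_WINDOW (64 KiB) from the patch. */
#define MIG_WINDOW ((uint64_t)64 << 10)

/* Hypothetical stand-in for the migration fields of VfuObject. */
typedef struct {
    const uint8_t *buf;      /* serialized device state */
    uint64_t size;           /* total bytes in buf */
    uint64_t pending;        /* bytes not yet handed to the client */
    uint64_t section_offset; /* start of the current window in buf */
} MigBuf;

/* Select the next window and return its length (at most MIG_WINDOW). */
static uint64_t mig_prepare(MigBuf *m)
{
    uint64_t len = m->pending > MIG_WINDOW ? MIG_WINDOW : m->pending;

    m->section_offset = m->size - m->pending;
    m->pending -= len;

    return len;
}

/* Copy from the current window, clamped to the end of the buffer. */
static uint64_t mig_read(const MigBuf *m, void *dst, uint64_t len,
                         uint64_t offset)
{
    uint64_t start = m->section_offset + offset;

    if (start > m->size) {
        return 0;
    }
    if (len > m->size - start) {
        len = m->size - start;
    }
    memcpy(dst, m->buf + start, len);

    return len;
}

/* Drain the whole buffer into dst; return the number of windows used. */
static unsigned mig_drain(MigBuf *m, uint8_t *dst)
{
    unsigned windows = 0;
    uint64_t copied = 0;

    m->pending = m->size;
    while (m->pending) {
        uint64_t len = mig_prepare(m);

        copied += mig_read(m, dst + copied, len, 0);
        windows++;
    }

    return windows;
}
```

The pending count only ever decreases by the window length chosen in
mig_prepare(), so the loop terminates once every byte has been offered
to the reader exactly once.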
 include/block/block.h       |   1 +
 include/migration/vmstate.h |   2 +
 migration/savevm.h          |   2 +
 block.c                     |   5 +
 hw/remote/machine.c         |   7 +
 hw/remote/vfio-user-obj.c   | 467 ++++++++++++++++++++++++++++++++++++
 migration/savevm.c          |  89 +++++++
 migration/vmstate.c         |  19 ++
 8 files changed, 592 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index e1713ee306..02b89e0668 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -495,6 +495,7 @@ int generated_co_wrapper bdrv_invalidate_cache(BlockDriverState *bs,
                                                Error **errp);
 void bdrv_invalidate_cache_all(Error **errp);
 int bdrv_inactivate_all(void);
+int bdrv_inactivate(BlockDriverState *bs);
 
 /* Ensure contents are flushed to disk.  */
 int generated_co_wrapper bdrv_flush(BlockDriverState *bs);
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 017c03675c..68bea576ea 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -1165,6 +1165,8 @@ extern const VMStateInfo vmstate_info_qlist;
 #define VMSTATE_END_OF_LIST()                                         \
     {}
 
+uint64_t vmstate_vmsd_size(PCIDevice *pci_dev);
+
 int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
                        void *opaque, int version_id);
 int vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342cb4..8007064ff2 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         bool in_postcopy, bool inactivate_disks);
+int qemu_remote_savevm(QEMUFile *f, DeviceState *dev);
+int qemu_remote_loadvm(QEMUFile *f);
 
 #endif
diff --git a/block.c b/block.c
index b54d59d1fa..e90aaee30c 100644
--- a/block.c
+++ b/block.c
@@ -6565,6 +6565,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
     return 0;
 }
 
+int bdrv_inactivate(BlockDriverState *bs)
+{
+    return bdrv_inactivate_recurse(bs);
+}
+
 int bdrv_inactivate_all(void)
 {
     BlockDriverState *bs = NULL;
diff --git a/hw/remote/machine.c b/hw/remote/machine.c
index a8b4a3aef3..31ef401e43 100644
--- a/hw/remote/machine.c
+++ b/hw/remote/machine.c
@@ -24,6 +24,7 @@
 #include "hw/qdev-core.h"
 #include "hw/remote/iommu.h"
 #include "hw/remote/vfio-user-obj.h"
+#include "sysemu/sysemu.h"
 
 static void remote_machine_init(MachineState *machine)
 {
@@ -86,6 +87,11 @@ static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
     s->vfio_user = value;
 }
 
+static void remote_machine_instance_init(Object *obj)
+{
+    set_deferred_backend_init();
+}
+
 static void remote_machine_class_init(ObjectClass *oc, void *data)
 {
     MachineClass *mc = MACHINE_CLASS(oc);
@@ -105,6 +111,7 @@ static const TypeInfo remote_machine = {
     .name = TYPE_REMOTE_MACHINE,
     .parent = TYPE_MACHINE,
     .instance_size = sizeof(RemoteMachineState),
+    .instance_init = remote_machine_instance_init,
     .class_init = remote_machine_class_init,
     .interfaces = (InterfaceInfo[]) {
         { TYPE_HOTPLUG_HANDLER },
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index d79bab87f1..2304643003 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -57,6 +57,13 @@
 #include "hw/pci/msi.h"
 #include "hw/pci/msix.h"
 #include "hw/remote/vfio-user-obj.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/vmstate.h"
+#include "migration/global_state.h"
+#include "block/block.h"
+#include "sysemu/block-backend.h"
+#include "net/net.h"
 
 #define TYPE_VFU_OBJECT "x-vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -108,12 +115,49 @@ struct VfuObject {
     Error *unplug_blocker;
 
     int vfu_poll_fd;
+
+    /*
+     * vfu_mig_buf holds the migration data. In the remote server, this
+     * buffer replaces the role of an IO channel which links the source
+     * and the destination.
+     *
+     * Whenever the client QEMU process initiates migration, the remote
+     * server gets notified via libvfio-user callbacks. The remote server
+     * sets up a QEMUFile object using this buffer as backend. The remote
+     * server passes this object to its migration subsystem, which slurps
+     * the VMSD of the device ('devid' above) referenced by this object
+     * and stores the VMSD in this buffer.
+     *
+     * The client subsequently asks the remote server for any data that
+     * needs to be moved over to the destination via libvfio-user
+     * library's vfu_migration_callbacks_t callbacks. The remote hands
+     * over this buffer as data at this time.
+     *
+     * A reverse of this process happens at the destination.
+     */
+    uint8_t *vfu_mig_buf;
+
+    uint64_t vfu_mig_buf_size;
+
+    uint64_t vfu_mig_buf_pending;
+
+    uint64_t vfu_mig_data_written;
+
+    uint64_t vfu_mig_section_offset;
+
+    QEMUFile *vfu_mig_file;
+
+    vfu_migr_state_t vfu_state;
 };
 
 static GHashTable *vfu_object_bdf_to_ctx_table;
 
 #define INT2VOIDP(i) (void *)(uintptr_t)(i)
 
+#define KB(x)    ((size_t) (x) << 10)
+
+#define VFU_OBJECT_MIG_WINDOW KB(64)
+
 static void vfu_object_init_ctx(VfuObject *o, Error **errp);
 
 static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
@@ -163,6 +207,394 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
     vfu_object_init_ctx(o, errp);
 }
 
+/**
+ * Migration helper functions
+ *
+ * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
+ * subsystem - qemu_remote_loadvm & qemu_remote_savevm. loadvm/savevm
+ * call these functions via QEMUFileOps to load/save the VMSD of a
+ * device into vfu_mig_buf
+ *
+ */
+static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
+                                size_t size, Error **errp)
+{
+    VfuObject *o = opaque;
+
+    if (pos > o->vfu_mig_buf_size) {
+        size = 0;
+    } else if ((pos + size) > o->vfu_mig_buf_size) {
+        size = o->vfu_mig_buf_size - pos;
+    }
+
+    memcpy(buf, (o->vfu_mig_buf + pos), size);
+
+    return size;
+}
+
+static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
+                                 int64_t pos, Error **errp)
+{
+    ERRP_GUARD();
+    VfuObject *o = opaque;
+    uint64_t end = pos + iov_size(iov, iovcnt);
+    int i;
+
+    if (o->vfu_mig_buf_pending) {
+        error_setg(errp, "Migration is ongoing");
+        return 0;
+    }
+
+    if (end > o->vfu_mig_buf_size) {
+        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+    }
+
+    for (i = 0; i < iovcnt; i++) {
+        memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
+               iov[i].iov_len);
+        o->vfu_mig_buf_size += iov[i].iov_len;
+    }
+
+    return iov_size(iov, iovcnt);
+}
+
+static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
+{
+    VfuObject *o = opaque;
+
+    o->vfu_mig_buf_size = 0;
+
+    g_free(o->vfu_mig_buf);
+
+    o->vfu_mig_buf = NULL;
+
+    o->vfu_mig_buf_pending = 0;
+
+    o->vfu_mig_data_written = 0;
+
+    o->vfu_mig_section_offset = 0;
+
+    return 0;
+}
+
+static const QEMUFileOps vfu_mig_fops_save = {
+    .writev_buffer  = vfu_mig_buf_write,
+    .shut_down      = vfu_mig_buf_shutdown,
+};
+
+static const QEMUFileOps vfu_mig_fops_load = {
+    .get_buffer     = vfu_mig_buf_read,
+    .shut_down      = vfu_mig_buf_shutdown,
+};
+
+static BlockDriverState *vfu_object_find_bs_by_dev(DeviceState *dev)
+{
+    BlockBackend *blk = blk_by_dev(dev);
+
+    if (!blk) {
+        return NULL;
+    }
+
+    return blk_bs(blk);
+}
+
+static int vfu_object_bdrv_invalidate_cache_by_dev(DeviceState *dev)
+{
+    BlockDriverState *bs = NULL;
+    Error *local_err = NULL;
+
+    bs = vfu_object_find_bs_by_dev(dev);
+    if (!bs) {
+        return 0;
+    }
+
+    bdrv_invalidate_cache(bs, &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -1;
+    }
+
+    return 0;
+}
+
+static int vfu_object_bdrv_inactivate_by_dev(DeviceState *dev)
+{
+    BlockDriverState *bs = NULL;
+
+    bs = vfu_object_find_bs_by_dev(dev);
+    if (!bs) {
+        return 0;
+    }
+
+    return bdrv_inactivate(bs);
+}
+
+static void vfu_object_start_stop_netdev(DeviceState *dev, bool start)
+{
+    NetClientState *nc = NULL;
+    Error *local_err = NULL;
+    char *netdev = NULL;
+
+    netdev = object_property_get_str(OBJECT(dev), "netdev", &local_err);
+    if (local_err) {
+        /**
+         * object_property_get_str() sets Error if netdev property is
+         * not found, not necessarily an error in the context of
+         * this function
+         */
+        error_free(local_err);
+        return;
+    }
+
+    if (!netdev) {
+        return;
+    }
+
+    nc = qemu_find_netdev(netdev);
+
+    if (!nc) {
+        return;
+    }
+
+    if (!start) {
+        qemu_flush_or_purge_queued_packets(nc, true);
+
+        if (nc->info && nc->info->cleanup) {
+            nc->info->cleanup(nc);
+        }
+    } else if (nc->peer) {
+        qemu_flush_or_purge_queued_packets(nc->peer, false);
+    }
+}
+
+static int vfu_object_start_devs(DeviceState *dev, void *opaque)
+{
+    int ret = vfu_object_bdrv_invalidate_cache_by_dev(dev);
+
+    if (ret) {
+        return ret;
+    }
+
+    vfu_object_start_stop_netdev(dev, true);
+
+    return ret;
+}
+
+static int vfu_object_stop_devs(DeviceState *dev, void *opaque)
+{
+    int ret = vfu_object_bdrv_inactivate_by_dev(dev);
+
+    if (ret) {
+        return ret;
+    }
+
+    vfu_object_start_stop_netdev(dev, false);
+
+    return ret;
+}
+
+/**
+ * handlers for vfu_migration_callbacks_t
+ *
+ * The libvfio-user library accesses these handlers to drive the migration
+ * at the remote end, and also to transport the data stored in vfu_mig_buf
+ *
+ */
+static void vfu_mig_state_stop_and_copy(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    int ret;
+
+    if (!o->vfu_mig_file) {
+        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save, false);
+    }
+
+    ret = qemu_remote_savevm(o->vfu_mig_file, DEVICE(o->pci_dev));
+    if (ret) {
+        qemu_file_shutdown(o->vfu_mig_file);
+        o->vfu_mig_file = NULL;
+        return;
+    }
+
+    qemu_fflush(o->vfu_mig_file);
+}
+
+static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    int ret;
+
+    if (o->vfu_state != VFU_MIGR_STATE_RESUME) {
+        goto run_ctx;
+    }
+
+    if (!o->vfu_mig_file) {
+        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
+    }
+
+    ret = qemu_remote_loadvm(o->vfu_mig_file);
+    if (ret) {
+        VFU_OBJECT_ERROR(o, "vfu: failed to restore device state");
+        return;
+    }
+
+    qemu_file_shutdown(o->vfu_mig_file);
+    o->vfu_mig_file = NULL;
+
+run_ctx:
+    ret = qdev_walk_children(DEVICE(o->pci_dev), NULL, NULL,
+                             vfu_object_start_devs,
+                             NULL, NULL);
+    if (ret) {
+        VFU_OBJECT_ERROR(o, "vfu: failed to setup backends for %s",
+                         o->device);
+        return;
+    }
+}
+
+static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    int ret;
+
+    ret = qdev_walk_children(DEVICE(o->pci_dev), NULL, NULL,
+                             vfu_object_stop_devs,
+                             NULL, NULL);
+    if (ret) {
+        VFU_OBJECT_ERROR(o, "vfu: failed to inactivate backends for %s",
+                         o->device);
+    }
+}
+
+static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    if (o->vfu_state == state) {
+        return 0;
+    }
+
+    switch (state) {
+    case VFU_MIGR_STATE_RESUME:
+        break;
+    case VFU_MIGR_STATE_STOP_AND_COPY:
+        vfu_mig_state_stop_and_copy(vfu_ctx);
+        break;
+    case VFU_MIGR_STATE_STOP:
+        vfu_mig_state_stop(vfu_ctx);
+        break;
+    case VFU_MIGR_STATE_PRE_COPY:
+        break;
+    case VFU_MIGR_STATE_RUNNING:
+        vfu_mig_state_running(vfu_ctx);
+        break;
+    default:
+        warn_report("vfu: Unknown migration state %d", state);
+    }
+
+    o->vfu_state = state;
+
+    return 0;
+}
+
+static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    static bool mig_ongoing;
+
+    if (!mig_ongoing && !o->vfu_mig_buf_pending) {
+        o->vfu_mig_buf_pending = o->vfu_mig_buf_size;
+        mig_ongoing = true;
+    }
+
+    if (mig_ongoing && !o->vfu_mig_buf_pending) {
+        mig_ongoing = false;
+    }
+
+    return o->vfu_mig_buf_pending;
+}
+
+static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
+                                uint64_t *size)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint64_t data_size = o->vfu_mig_buf_pending;
+
+    if (data_size > VFU_OBJECT_MIG_WINDOW) {
+        data_size = VFU_OBJECT_MIG_WINDOW;
+    }
+
+    o->vfu_mig_section_offset = o->vfu_mig_buf_size - o->vfu_mig_buf_pending;
+
+    o->vfu_mig_buf_pending -= data_size;
+
+    if (offset) {
+        *offset = 0;
+    }
+
+    if (size) {
+        *size = data_size;
+    }
+
+    return 0;
+}
+
+static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
+                                 uint64_t size, uint64_t offset)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint64_t read_offset = o->vfu_mig_section_offset + offset;
+
+    if (read_offset > o->vfu_mig_buf_size) {
+        warn_report("vfu: buffer overflow - offset outside range");
+        return -1;
+    }
+
+    if ((read_offset + size) > o->vfu_mig_buf_size) {
+        warn_report("vfu: buffer overflow - size outside range");
+        size = o->vfu_mig_buf_size - read_offset;
+    }
+
+    memcpy(buf, (o->vfu_mig_buf + read_offset), size);
+
+    return size;
+}
+
+static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
+                                  uint64_t size, uint64_t offset)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint64_t end = o->vfu_mig_data_written + offset + size;
+
+    if (end > o->vfu_mig_buf_size) {
+        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+        o->vfu_mig_buf_size = end;
+    }
+
+    memcpy((o->vfu_mig_buf + o->vfu_mig_data_written + offset), data, size);
+
+    return size;
+}
+
+static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    o->vfu_mig_data_written += count;
+
+    return 0;
+}
+
+static const vfu_migration_callbacks_t vfu_mig_cbs = {
+    .version = VFU_MIGR_CALLBACKS_VERS,
+    .transition = &vfu_mig_transition,
+    .get_pending_bytes = &vfu_mig_get_pending_bytes,
+    .prepare_data = &vfu_mig_prepare_data,
+    .read_data = &vfu_mig_read_data,
+    .data_written = &vfu_mig_data_written,
+    .write_data = &vfu_mig_write_data,
+};
+
 static void vfu_object_ctx_run(void *opaque)
 {
     VfuObject *o = opaque;
@@ -550,6 +982,13 @@ void vfu_object_set_bus_irq(PCIBus *pci_bus)
     pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, NULL, 1);
 }
 
+static bool vfu_object_migratable(VfuObject *o)
+{
+    DeviceClass *dc = DEVICE_GET_CLASS(o->pci_dev);
+
+    return dc->vmsd && !dc->vmsd->unmigratable;
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -575,6 +1014,7 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
     ERRP_GUARD();
     DeviceState *dev = NULL;
     vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
+    uint64_t migr_regs_size, migr_size;
     int ret;
 
     if (o->vfu_ctx || !o->socket || !o->device ||
@@ -653,6 +1093,31 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         goto fail;
     }
 
+    migr_regs_size = vfu_get_migr_register_area_size();
+    migr_size = migr_regs_size + VFU_OBJECT_MIG_WINDOW;
+
+    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_MIGR_REGION_IDX,
+                           migr_size, NULL,
+                           VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to register migration BAR %s- %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
+    if (!vfu_object_migratable(o)) {
+        goto realize_ctx;
+    }
+
+    ret = vfu_setup_device_migration_callbacks(o->vfu_ctx, &vfu_mig_cbs,
+                                               migr_regs_size);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to setup migration %s- %s",
+                   o->device, strerror(errno));
+        goto fail;
+    }
+
+realize_ctx:
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(errp, "vfu: Failed to realize device %s- %s",
@@ -700,6 +1165,8 @@ static void vfu_object_init(Object *obj)
     }
 
     o->vfu_poll_fd = -1;
+
+    o->vfu_state = VFU_MIGR_STATE_STOP;
 }
 
 static void vfu_object_finalize(Object *obj)
diff --git a/migration/savevm.c b/migration/savevm.c
index 1599b02fbc..2cc3b74287 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -66,6 +66,7 @@
 #include "net/announce.h"
 #include "qemu/yank.h"
 #include "yank_functions.h"
+#include "hw/qdev-core.h"
 
 const unsigned int postcopy_ram_discard_version;
 
@@ -1606,6 +1607,64 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     return ret;
 }
 
+static SaveStateEntry *find_se_from_dev(DeviceState *dev)
+{
+    SaveStateEntry *se;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->opaque == dev) {
+            return se;
+        }
+    }
+
+    return NULL;
+}
+
+static int qemu_remote_savevm_section_full(DeviceState *dev, void *opaque)
+{
+    QEMUFile *f = opaque;
+    SaveStateEntry *se;
+    int ret;
+
+    se = find_se_from_dev(dev);
+    if (!se) {
+        return 0;
+    }
+
+    if (!se->vmsd || !vmstate_save_needed(se->vmsd, se->opaque) ||
+        se->vmsd->unmigratable) {
+        return 0;
+    }
+
+    save_section_header(f, se, QEMU_VM_SECTION_FULL);
+
+    ret = vmstate_save(f, se, NULL);
+    if (ret) {
+        qemu_file_set_error(f, ret);
+        return ret;
+    }
+
+    save_section_footer(f, se);
+
+    return 0;
+}
+
+int qemu_remote_savevm(QEMUFile *f, DeviceState *dev)
+{
+    int ret = qdev_walk_children(dev, NULL, NULL,
+                                 qemu_remote_savevm_section_full,
+                                 NULL, f);
+
+    if (ret) {
+        return ret;
+    }
+
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
+
+    return 0;
+}
+
 void qemu_savevm_live_state(QEMUFile *f)
 {
     /* save QEMU_VM_SECTION_END section */
@@ -2447,6 +2506,36 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
     return 0;
 }
 
+int qemu_remote_loadvm(QEMUFile *f)
+{
+    uint8_t section_type;
+    int ret = 0;
+
+    while (true) {
+        section_type = qemu_get_byte(f);
+
+        ret = qemu_file_get_error(f);
+        if (ret) {
+            break;
+        }
+
+        switch (section_type) {
+        case QEMU_VM_SECTION_FULL:
+            ret = qemu_loadvm_section_start_full(f, NULL);
+            if (ret < 0) {
+                return ret;
+            }
+            break;
+        case QEMU_VM_EOF:
+            return ret;
+        default:
+            return -EINVAL;
+        }
+    }
+
+    return ret;
+}
+
 static int
 qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
 {
diff --git a/migration/vmstate.c b/migration/vmstate.c
index 05f87cdddc..83f8562792 100644
--- a/migration/vmstate.c
+++ b/migration/vmstate.c
@@ -63,6 +63,25 @@ static int vmstate_size(void *opaque, const VMStateField *field)
     return size;
 }
 
+uint64_t vmstate_vmsd_size(PCIDevice *pci_dev)
+{
+    DeviceClass *dc = DEVICE_GET_CLASS(DEVICE(pci_dev));
+    const VMStateField *field = NULL;
+    uint64_t size = 0;
+
+    if (!dc->vmsd) {
+        return 0;
+    }
+
+    field = dc->vmsd->fields;
+    while (field && field->name) {
+        size += vmstate_size(pci_dev, field);
+        field++;
+    }
+
+    return size;
+}
+
 static void vmstate_handle_alloc(void *ptr, const VMStateField *field,
                                  void *opaque)
 {
-- 
2.20.1
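
The control flow of qemu_remote_loadvm() above can be sketched in Python
as a reader that consumes section-type bytes until it sees the EOF marker.
This is only an illustration under stated assumptions: the two constants
are the values QEMU is understood to use for these markers, and the
length-prefixed payload format and the read_length_prefixed() helper are
hypothetical stand-ins for qemu_loadvm_section_start_full(), not the real
stream layout. In this sketch a loader failure ends the loop immediately.

```python
import errno
import io

# Assumed to match QEMU's section-type markers; treat as illustrative.
QEMU_VM_EOF = 0x00
QEMU_VM_SECTION_FULL = 0x04


def read_length_prefixed(stream, sink):
    """Hypothetical section loader: one length byte, then the payload."""
    header = stream.read(1)
    if not header:
        return -errno.EIO
    payload = stream.read(header[0])
    if len(payload) != header[0]:
        return -errno.EIO  # stream was truncated mid-section
    sink.append(payload)
    return 0


def load_sections(stream, sink):
    """Read sections until the EOF marker; return 0 or a negative errno."""
    while True:
        raw = stream.read(1)
        if not raw:
            return -errno.EIO  # stream ended without an EOF marker
        section_type = raw[0]
        if section_type == QEMU_VM_SECTION_FULL:
            ret = read_length_prefixed(stream, sink)
            if ret < 0:
                return ret  # leave the loop on the first failed section
        elif section_type == QEMU_VM_EOF:
            return 0
        else:
            return -errno.EINVAL  # unknown section type
```

Returning from inside the section branch, rather than using a bare break,
keeps a failed section from being followed by further reads.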




* [PATCH v6 18/19] vfio-user: handle reset of remote device
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (16 preceding siblings ...)
  2022-02-17  7:49 ` [PATCH v6 17/19] vfio-user: register handlers to facilitate migration Jagannathan Raman
@ 2022-02-17  7:49 ` Jagannathan Raman
  2022-03-07 11:36   ` Stefan Hajnoczi
  2022-02-17  7:49 ` [PATCH v6 19/19] vfio-user: avocado tests for vfio-user Jagannathan Raman
  18 siblings, 1 reply; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Adds handler to reset a remote device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 2304643003..55f1bf5e0f 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -989,6 +989,19 @@ static bool vfu_object_migratable(VfuObject *o)
     return dc->vmsd && !dc->vmsd->unmigratable;
 }
 
+static int vfu_object_device_reset(vfu_ctx_t *vfu_ctx, vfu_reset_type_t type)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    if (type == VFU_RESET_LOST_CONN) {
+        return 0;
+    }
+
+    qdev_reset_all(DEVICE(o->pci_dev));
+
+    return 0;
+}
+
 /*
  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -1105,6 +1118,12 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
         goto fail;
     }
 
+    ret = vfu_setup_device_reset_cb(o->vfu_ctx, &vfu_object_device_reset);
+    if (ret < 0) {
+        error_setg(errp, "vfu: Failed to setup reset callback");
+        goto fail;
+    }
+
     if (!vfu_object_migratable(o)) {
         goto realize_ctx;
     }
-- 
2.20.1




* [PATCH v6 19/19] vfio-user: avocado tests for vfio-user
  2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
                   ` (17 preceding siblings ...)
  2022-02-17  7:49 ` [PATCH v6 18/19] vfio-user: handle reset of remote device Jagannathan Raman
@ 2022-02-17  7:49 ` Jagannathan Raman
  18 siblings, 0 replies; 76+ messages in thread
From: Jagannathan Raman @ 2022-02-17  7:49 UTC (permalink / raw)
  To: qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, pbonzini, jag.raman,
	eblake, dgilbert

Avocado tests for libvfio-user in QEMU - tests startup,
hotplug and migration of the server object

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 MAINTAINERS                |   1 +
 tests/avocado/vfio-user.py | 234 +++++++++++++++++++++++++++++++++++++
 2 files changed, 235 insertions(+)
 create mode 100644 tests/avocado/vfio-user.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 1f55d04ce6..02728eb8b8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3573,6 +3573,7 @@ F: include/hw/remote/vfio-user-obj.h
 F: hw/remote/iommu.c
 F: include/hw/remote/iommu.h
 F: stubs/defer-backend-init.c
+F: tests/avocado/vfio-user.py
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/tests/avocado/vfio-user.py b/tests/avocado/vfio-user.py
new file mode 100644
index 0000000000..3f13d9895d
--- /dev/null
+++ b/tests/avocado/vfio-user.py
@@ -0,0 +1,234 @@
+# vfio-user protocol sanity test
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+
+import os
+import socket
+import uuid
+
+from avocado_qemu import QemuSystemTest
+from avocado_qemu import wait_for_console_pattern
+from avocado_qemu import exec_command
+from avocado_qemu import exec_command_and_wait_for_pattern
+
+from avocado.utils import network
+from avocado.utils import wait
+
+class VfioUser(QemuSystemTest):
+    """
+    :avocado: tags=vfiouser
+    """
+    KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+    timeout = 20
+
+    @staticmethod
+    def migration_finished(vm):
+        res = vm.command('query-migrate')
+        if 'status' in res:
+            return res['status'] in ('completed', 'failed')
+        else:
+            return False
+
+    def _get_free_port(self):
+        port = network.find_free_port()
+        if port is None:
+            self.cancel('Failed to find a free port')
+        return port
+
+    def validate_vm_launch(self, vm):
+        wait_for_console_pattern(self, 'as init process',
+                                 'Kernel panic - not syncing', vm=vm)
+        exec_command(self, 'mount -t sysfs sysfs /sys', vm=vm)
+        exec_command_and_wait_for_pattern(self,
+                                          'cat /sys/bus/pci/devices/*/uevent',
+                                          'PCI_ID=1000:0060', vm=vm)
+
+    def launch_server_startup(self, socket, *opts):
+        server_vm = self.get_vm()
+        server_vm.add_args('-machine', 'x-remote,vfio-user=on')
+        server_vm.add_args('-nodefaults')
+        server_vm.add_args('-device', 'megasas,id=sas1')
+        server_vm.add_args('-object', 'x-vfio-user-server,id=vfioobj1,'
+                           'type=unix,path='+socket+',device=sas1')
+        for opt in opts:
+            server_vm.add_args(opt)
+        server_vm.launch()
+        return server_vm
+
+    def launch_server_hotplug(self, socket):
+        server_vm = self.get_vm()
+        server_vm.add_args('-machine', 'x-remote,vfio-user=on')
+        server_vm.add_args('-nodefaults')
+        server_vm.launch()
+        server_vm.qmp('device_add', args_dict=None, conv_keys=None,
+                      driver='megasas', id='sas1')
+        obj_add_opts = {'qom-type': 'x-vfio-user-server',
+                        'id': 'vfioobj', 'device': 'sas1',
+                        'socket': {'type': 'unix', 'path': socket}}
+        server_vm.qmp('object-add', args_dict=obj_add_opts)
+        return server_vm
+
+    def launch_client(self, kernel_path, initrd_path, kernel_command_line,
+                      machine_type, socket, *opts):
+        client_vm = self.get_vm()
+        client_vm.set_console()
+        client_vm.add_args('-machine', machine_type)
+        client_vm.add_args('-accel', 'kvm')
+        client_vm.add_args('-cpu', 'host')
+        client_vm.add_args('-object',
+                           'memory-backend-memfd,id=sysmem-file,size=2G')
+        client_vm.add_args('--numa', 'node,memdev=sysmem-file')
+        client_vm.add_args('-m', '2048')
+        client_vm.add_args('-kernel', kernel_path,
+                           '-initrd', initrd_path,
+                           '-append', kernel_command_line)
+        client_vm.add_args('-device',
+                           'vfio-user-pci,x-enable-migration=true,'
+                           'socket='+socket)
+        for opt in opts:
+            client_vm.add_args(opt)
+        client_vm.launch()
+        return client_vm
+
+    def do_test_startup(self, kernel_url, initrd_url, kernel_command_line,
+                machine_type):
+        self.require_accelerator('kvm')
+
+        kernel_path = self.fetch_asset(kernel_url)
+        initrd_path = self.fetch_asset(initrd_url)
+        socket = os.path.join('/tmp', str(uuid.uuid4()))
+        if os.path.exists(socket):
+            os.remove(socket)
+        self.launch_server_startup(socket)
+        client = self.launch_client(kernel_path, initrd_path,
+                                    kernel_command_line, machine_type, socket)
+        self.validate_vm_launch(client)
+
+    def do_test_hotplug(self, kernel_url, initrd_url, kernel_command_line,
+                machine_type):
+        self.require_accelerator('kvm')
+
+        kernel_path = self.fetch_asset(kernel_url)
+        initrd_path = self.fetch_asset(initrd_url)
+        socket = os.path.join('/tmp', str(uuid.uuid4()))
+        if os.path.exists(socket):
+            os.remove(socket)
+        self.launch_server_hotplug(socket)
+        client = self.launch_client(kernel_path, initrd_path,
+                                    kernel_command_line, machine_type, socket)
+        self.validate_vm_launch(client)
+
+    def do_test_migrate(self, kernel_url, initrd_url, kernel_command_line,
+                machine_type):
+        self.require_accelerator('kvm')
+
+        kernel_path = self.fetch_asset(kernel_url)
+        initrd_path = self.fetch_asset(initrd_url)
+        srv_socket = os.path.join('/tmp', str(uuid.uuid4()))
+        if os.path.exists(srv_socket):
+            os.remove(srv_socket)
+        dst_socket = os.path.join('/tmp', str(uuid.uuid4()))
+        if os.path.exists(dst_socket):
+            os.remove(dst_socket)
+        client_uri = 'tcp:localhost:%u' % self._get_free_port()
+
+        # Launch destination VM
+        self.launch_server_startup(dst_socket)
+        dst_client = self.launch_client(kernel_path, initrd_path,
+                                        kernel_command_line, machine_type,
+                                        dst_socket, '-incoming', client_uri)
+
+        # Launch source VM
+        self.launch_server_startup(srv_socket)
+        src_client = self.launch_client(kernel_path, initrd_path,
+                                        kernel_command_line, machine_type,
+                                        srv_socket)
+        self.validate_vm_launch(src_client)
+
+        # Kick off migration
+        src_client.qmp('migrate', uri=client_uri)
+
+        wait.wait_for(self.migration_finished,
+                      timeout=self.timeout,
+                      step=0.1,
+                      args=(dst_client,))
+
+        self.assertEqual(src_client.command('query-migrate')['status'],
+                         'completed')
+        self.assertEqual(dst_client.command('query-migrate')['status'],
+                         'completed')
+        self.assertEqual(src_client.command('query-status')['status'],
+                         'postmigrate')
+        self.assertEqual(dst_client.command('query-status')['status'],
+                         'running')
+
+    def test_vfio_user_x86_64(self):
+        """
+        :avocado: tags=arch:x86_64
+        :avocado: tags=distro:centos
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyS0 rdinit=/bin/bash')
+        machine_type = 'pc'
+        self.do_test_startup(kernel_url, initrd_url, kernel_command_line,
+                             machine_type)
+
+    def test_vfio_user_aarch64(self):
+        """
+        :avocado: tags=arch:aarch64
+        :avocado: tags=distro:ubuntu
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/aarch64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/aarch64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'rdinit=/bin/bash console=ttyAMA0')
+        machine_type = 'virt,gic-version=3'
+        self.do_test_startup(kernel_url, initrd_url, kernel_command_line,
+                             machine_type)
+
+    def test_vfio_user_hotplug_x86_64(self):
+        """
+        :avocado: tags=arch:x86_64
+        :avocado: tags=distro:centos
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyS0 rdinit=/bin/bash')
+        machine_type = 'pc'
+        self.do_test_hotplug(kernel_url, initrd_url, kernel_command_line,
+                             machine_type)
+
+    def test_vfio_user_migrate_x86_64(self):
+        """
+        :avocado: tags=arch:x86_64
+        :avocado: tags=distro:centos
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyS0 rdinit=/bin/bash')
+        machine_type = 'pc'
+        self.do_test_migrate(kernel_url, initrd_url, kernel_command_line,
+                             machine_type)
+
-- 
2.20.1
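
The migration test above polls 'query-migrate' until the status is either
'completed' or 'failed', then asserts on the final status. A minimal
standalone sketch of that polling pattern follows. FakeVM and poll_until()
are hypothetical stand-ins introduced for illustration (for the avocado VM
object and for avocado's wait helper respectively); they are not part of
the patch.

```python
import time


class FakeVM:
    """Hypothetical stand-in for a VM that answers 'query-migrate'."""

    def __init__(self, replies):
        self._replies = list(replies)

    def command(self, name):
        assert name == 'query-migrate'
        # Hand out replies in order, then keep repeating the last one.
        if len(self._replies) > 1:
            return self._replies.pop(0)
        return self._replies[0]


def migration_finished(vm):
    # A missing 'status' key means migration has not started yet.
    res = vm.command('query-migrate')
    return res.get('status') in ('completed', 'failed')


def poll_until(func, timeout, step, args=()):
    """Call func(*args) every `step` seconds until true or timed out."""
    deadline = time.monotonic() + timeout
    while True:
        if func(*args):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(step)
```

Because 'failed' also ends the wait, a broken migration is reported by the
status assertions that follow rather than by a timeout.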




* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-17  7:48 ` [PATCH v6 01/19] configure, meson: override C compiler for cmake Jagannathan Raman
@ 2022-02-17 12:09   ` Peter Maydell
  2022-02-17 15:49     ` Jag Raman
  2022-02-18  3:40     ` Jag Raman
  0 siblings, 2 replies; 76+ messages in thread
From: Peter Maydell @ 2022-02-17 12:09 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, berrange, bleal, john.g.johnson,
	john.levon, qemu-devel, armbru, quintela, alex.williamson,
	pbonzini, mst, stefanha, thanos.makatos, kanth.ghatraju, eblake,
	dgilbert, f4bug

On Thu, 17 Feb 2022 at 07:56, Jagannathan Raman <jag.raman@oracle.com> wrote:
>
> The compiler path that cmake gets from meson is corrupted. It results in
> the following error:
> | -- The C compiler identification is unknown
> | CMake Error at CMakeLists.txt:35 (project):
> | The CMAKE_C_COMPILER:
> | /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
> | is not a full path to an existing compiler tool.
>
> Explicitly specify the C compiler for cmake to avoid this error

This sounds like a bug in Meson. Is there a Meson bug report
we can reference in the commit message here ?

thanks
-- PMM



* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-17 12:09   ` Peter Maydell
@ 2022-02-17 15:49     ` Jag Raman
  2022-02-18  3:40     ` Jag Raman
  1 sibling, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-17 15:49 UTC (permalink / raw)
  To: Peter Maydell
  Cc: eduardo, Elena Ufimtseva, berrange, bleal, John Johnson,
	john.levon, qemu-devel, armbru, quintela, alex.williamson,
	pbonzini, mst, stefanha, thanos.makatos, Kanth Ghatraju, eblake,
	dgilbert, f4bug



> On Feb 17, 2022, at 7:09 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> 
> On Thu, 17 Feb 2022 at 07:56, Jagannathan Raman <jag.raman@oracle.com> wrote:
>> 
>> The compiler path that cmake gets from meson is corrupted. It results in
>> the following error:
>> | -- The C compiler identification is unknown
>> | CMake Error at CMakeLists.txt:35 (project):
>> | The CMAKE_C_COMPILER:
>> | /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
>> | is not a full path to an existing compiler tool.
>> 
>> Explicitly specify the C compiler for cmake to avoid this error
> 
> This sounds like a bug in Meson. Is there a Meson bug report
> we can reference in the commit message here ?

Hi Peter,

I’ll try to locate the bug report and the meson version which has the fix,
and get back to you.

Thank you!
--
Jag

> 
> thanks
> -- PMM



* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-17 12:09   ` Peter Maydell
  2022-02-17 15:49     ` Jag Raman
@ 2022-02-18  3:40     ` Jag Raman
  2022-02-18 12:13       ` Paolo Bonzini
  1 sibling, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-02-18  3:40 UTC (permalink / raw)
  To: Peter Maydell, Paolo Bonzini
  Cc: eduardo, Elena Ufimtseva, Daniel P. Berrangé,
	Beraldo Leal, John Johnson, John Levon, qemu-devel,
	Markus Armbruster, Juan Quintela, Alex Williamson,
	Michael S. Tsirkin, Stefan Hajnoczi, Thanos Makatos,
	Kanth Ghatraju, Eric Blake, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé



> On Feb 17, 2022, at 7:09 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
> 
> On Thu, 17 Feb 2022 at 07:56, Jagannathan Raman <jag.raman@oracle.com> wrote:
>> 
>> The compiler path that cmake gets from meson is corrupted. It results in
>> the following error:
>> | -- The C compiler identification is unknown
>> | CMake Error at CMakeLists.txt:35 (project):
>> | The CMAKE_C_COMPILER:
>> | /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
>> | is not a full path to an existing compiler tool.
>> 
>> Explicitly specify the C compiler for cmake to avoid this error
> 
> This sounds like a bug in Meson. Is there a Meson bug report
> we can reference in the commit message here ?

Hi Peter,

This issue reproduces with the latest meson [1] also.

I noticed the following about the “binaries” section [2]. The manual
says meson could pass the values in this section to find_program [3].
As such I’m wondering if it’s OK to set compiler flags in this section
because find_program doesn’t seem to accept any compiler flags.

The compiler flags could be set in the “built-in options” section using
options such as “c_args”, “cpp_args” and “objc_args” [4]. When I
moved CPU_CFLAGS from the binaries section to the built-in options
section in “configure”, the issue no longer occurred.

[1]: https://github.com/mesonbuild/meson.git
[2]: https://mesonbuild.com/Machine-files.html#binaries
[3]: https://cmake.org/cmake/help/latest/command/find_program.html
[4]: https://github.com/mesonbuild/meson/blob/master/docs/markdown/Reference-tables.md (section “Language arguments parameter names”)

Thank you!
--
Jag

> 
> thanks
> -- PMM



* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-18  3:40     ` Jag Raman
@ 2022-02-18 12:13       ` Paolo Bonzini
  2022-02-18 14:49         ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2022-02-18 12:13 UTC (permalink / raw)
  To: Jag Raman, Peter Maydell
  Cc: eduardo, Elena Ufimtseva, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, John Levon, Juan Quintela, qemu-devel,
	Markus Armbruster, Alex Williamson, Michael S. Tsirkin,
	Stefan Hajnoczi, Thanos Makatos, Eric Blake, Kanth Ghatraju,
	Dr. David Alan Gilbert, Philippe Mathieu-Daudé

On 2/18/22 04:40, Jag Raman wrote:
> 
> 
>> On Feb 17, 2022, at 7:09 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>
>> On Thu, 17 Feb 2022 at 07:56, Jagannathan Raman <jag.raman@oracle.com> wrote:
>>>
>>> The compiler path that cmake gets from meson is corrupted. It results in
>>> the following error:
>>> | -- The C compiler identification is unknown
>>> | CMake Error at CMakeLists.txt:35 (project):
>>> | The CMAKE_C_COMPILER:
>>> | /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
>>> | is not a full path to an existing compiler tool.
>>>
>>> Explicitly specify the C compiler for cmake to avoid this error
>>
>> This sounds like a bug in Meson. Is there a Meson bug report
>> we can reference in the commit message here ?
> 
> Hi Peter,
> 
> This issue reproduces with the latest meson [1] also.

0.60.0 or more recent versions should have a fix, which would do exactly 
what this patch does: do not define CMAKE_C_COMPILER_LAUNCHER, and place 
the whole binaries.c variable in CMAKE_C_COMPILER.  What are the 
contents of the generated CMakeMesonToolchainFile.cmake and 
CMakeCache.txt files, without and with your patch?

> I noticed the following about the “binaries” section [2]. The manual
> says meson could pass the values in this section to find_program [3].
> As such I’m wondering if it’s OK to set compiler flags in this section
> because find_program doesn’t seem to accept any compiler flags.

The full quote of the manual is "These can be used internally by Meson, 
or by the find_program function", and the C compiler variable "c" is in 
the former category.

There is an important difference between the flags in "binaries" and 
those in "built-in options". What is in "binaries" is used when 
requesting e.g. the compiler search path, while what is in "built-in 
options" is not.  So options like "-m32" are definitely part of 
"binaries", not "built-in options":

     $ gcc --print-multi-os-directory
     ../lib64
     $ gcc -m32 --print-multi-os-directory
     ../lib
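
To make the two placements concrete, here is a small Python sketch that
renders a Meson machine file with the flags that select the target kept
next to the compiler under "binaries", and other flags under "built-in
options". The function names, the split into abi_flags and other_flags,
and the '-g' example value are assumptions made for illustration; this is
not how QEMU's configure script is written.

```python
def meson_list(items):
    """Render a list of strings as a Meson array literal."""
    quoted = []
    for item in items:
        escaped = item.replace("\\", "\\\\").replace("'", "\\'")
        quoted.append("'" + escaped + "'")
    return "[" + ", ".join(quoted) + "]"


def render_machine_file(compiler, abi_flags, other_flags):
    """Return machine-file text (hypothetical helper for illustration).

    abi_flags stay with the compiler entry so that they also apply when
    the compiler is queried, e.g. for its library search path.
    """
    lines = [
        "[binaries]",
        "c = " + meson_list([compiler] + list(abi_flags)),
        "",
        "[built-in options]",
        "c_args = " + meson_list(other_flags),
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_machine_file("cc", ["-m64", "-mcx16"], ["-g"]))
```

With this layout an option such as -m64 travels with every invocation of
the compiler, while c_args only affects compilation of the project.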

Paolo



* Re: [PATCH v6 17/19] vfio-user: register handlers to facilitate migration
  2022-02-17  7:49 ` [PATCH v6 17/19] vfio-user: register handlers to facilitate migration Jagannathan Raman
@ 2022-02-18 12:20   ` Paolo Bonzini
  2022-02-18 14:55     ` Jag Raman
  2022-03-07 11:26   ` Stefan Hajnoczi
  1 sibling, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2022-02-18 12:20 UTC (permalink / raw)
  To: Jagannathan Raman, qemu-devel
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, f4bug, quintela, alex.williamson,
	kanth.ghatraju, stefanha, thanos.makatos, eblake, dgilbert

On 2/17/22 08:49, Jagannathan Raman wrote:
> Store and load the device's state during migration. Use libvfio-user's
> handlers for this purpose
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>

Why does no one call clear_deferred_backend_init?

Paolo

> ---
>   include/block/block.h       |   1 +
>   include/migration/vmstate.h |   2 +
>   migration/savevm.h          |   2 +
>   block.c                     |   5 +
>   hw/remote/machine.c         |   7 +
>   hw/remote/vfio-user-obj.c   | 467 ++++++++++++++++++++++++++++++++++++
>   migration/savevm.c          |  89 +++++++
>   migration/vmstate.c         |  19 ++
>   8 files changed, 592 insertions(+)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index e1713ee306..02b89e0668 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -495,6 +495,7 @@ int generated_co_wrapper bdrv_invalidate_cache(BlockDriverState *bs,
>                                                  Error **errp);
>   void bdrv_invalidate_cache_all(Error **errp);
>   int bdrv_inactivate_all(void);
> +int bdrv_inactivate(BlockDriverState *bs);
>   
>   /* Ensure contents are flushed to disk.  */
>   int generated_co_wrapper bdrv_flush(BlockDriverState *bs);
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 017c03675c..68bea576ea 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -1165,6 +1165,8 @@ extern const VMStateInfo vmstate_info_qlist;
>   #define VMSTATE_END_OF_LIST()                                         \
>       {}
>   
> +uint64_t vmstate_vmsd_size(PCIDevice *pci_dev);
> +
>   int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
>                          void *opaque, int version_id);
>   int vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 6461342cb4..8007064ff2 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
>   int qemu_load_device_state(QEMUFile *f);
>   int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>           bool in_postcopy, bool inactivate_disks);
> +int qemu_remote_savevm(QEMUFile *f, DeviceState *dev);
> +int qemu_remote_loadvm(QEMUFile *f);
>   
>   #endif
> diff --git a/block.c b/block.c
> index b54d59d1fa..e90aaee30c 100644
> --- a/block.c
> +++ b/block.c
> @@ -6565,6 +6565,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
>       return 0;
>   }
>   
> +int bdrv_inactivate(BlockDriverState *bs)
> +{
> +    return bdrv_inactivate_recurse(bs);
> +}
> +
>   int bdrv_inactivate_all(void)
>   {
>       BlockDriverState *bs = NULL;
> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
> index a8b4a3aef3..31ef401e43 100644
> --- a/hw/remote/machine.c
> +++ b/hw/remote/machine.c
> @@ -24,6 +24,7 @@
>   #include "hw/qdev-core.h"
>   #include "hw/remote/iommu.h"
>   #include "hw/remote/vfio-user-obj.h"
> +#include "sysemu/sysemu.h"
>   
>   static void remote_machine_init(MachineState *machine)
>   {
> @@ -86,6 +87,11 @@ static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
>       s->vfio_user = value;
>   }
>   
> +static void remote_machine_instance_init(Object *obj)
> +{
> +    set_deferred_backend_init();
> +}
> +
>   static void remote_machine_class_init(ObjectClass *oc, void *data)
>   {
>       MachineClass *mc = MACHINE_CLASS(oc);
> @@ -105,6 +111,7 @@ static const TypeInfo remote_machine = {
>       .name = TYPE_REMOTE_MACHINE,
>       .parent = TYPE_MACHINE,
>       .instance_size = sizeof(RemoteMachineState),
> +    .instance_init = remote_machine_instance_init,
>       .class_init = remote_machine_class_init,
>       .interfaces = (InterfaceInfo[]) {
>           { TYPE_HOTPLUG_HANDLER },
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index d79bab87f1..2304643003 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -57,6 +57,13 @@
>   #include "hw/pci/msi.h"
>   #include "hw/pci/msix.h"
>   #include "hw/remote/vfio-user-obj.h"
> +#include "migration/qemu-file.h"
> +#include "migration/savevm.h"
> +#include "migration/vmstate.h"
> +#include "migration/global_state.h"
> +#include "block/block.h"
> +#include "sysemu/block-backend.h"
> +#include "net/net.h"
>   
>   #define TYPE_VFU_OBJECT "x-vfio-user-server"
>   OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> @@ -108,12 +115,49 @@ struct VfuObject {
>       Error *unplug_blocker;
>   
>       int vfu_poll_fd;
> +
> +    /*
> +     * vfu_mig_buf holds the migration data. In the remote server, this
> +     * buffer replaces the role of an IO channel which links the source
> +     * and the destination.
> +     *
> +     * Whenever the client QEMU process initiates migration, the remote
> +     * server gets notified via libvfio-user callbacks. The remote server
> +     * sets up a QEMUFile object using this buffer as backend. The remote
> +     * server passes this object to its migration subsystem, which slurps
> +     * the VMSD of the device ('devid' above) referenced by this object
> +     * and stores the VMSD in this buffer.
> +     *
> +     * The client subsequently asks the remote server for any data that
> +     * needs to be moved over to the destination via libvfio-user
> +     * library's vfu_migration_callbacks_t callbacks. The remote hands
> +     * over this buffer as data at this time.
> +     *
> +     * A reverse of this process happens at the destination.
> +     */
> +    uint8_t *vfu_mig_buf;
> +
> +    uint64_t vfu_mig_buf_size;
> +
> +    uint64_t vfu_mig_buf_pending;
> +
> +    uint64_t vfu_mig_data_written;
> +
> +    uint64_t vfu_mig_section_offset;
> +
> +    QEMUFile *vfu_mig_file;
> +
> +    vfu_migr_state_t vfu_state;
>   };
>   
>   static GHashTable *vfu_object_bdf_to_ctx_table;
>   
>   #define INT2VOIDP(i) (void *)(uintptr_t)(i)
>   
> +#define KB(x)    ((size_t) (x) << 10)
> +
> +#define VFU_OBJECT_MIG_WINDOW KB(64)
> +
>   static void vfu_object_init_ctx(VfuObject *o, Error **errp);
>   
>   static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
> @@ -163,6 +207,394 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
>       vfu_object_init_ctx(o, errp);
>   }
>   
> +/**
> + * Migration helper functions
> + *
> + * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
> + * subsystem - qemu_remote_loadvm & qemu_remote_savevm. loadvm/savevm
> + * call these functions via QEMUFileOps to load/save the VMSD of a
> + * device into vfu_mig_buf
> + *
> + */
> +static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
> +                                size_t size, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +
> +    if (pos > o->vfu_mig_buf_size) {
> +        size = 0;
> +    } else if ((pos + size) > o->vfu_mig_buf_size) {
> +        size = o->vfu_mig_buf_size - pos;
> +    }
> +
> +    memcpy(buf, (o->vfu_mig_buf + pos), size);
> +
> +    return size;
> +}
> +
> +static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
> +                                 int64_t pos, Error **errp)
> +{
> +    ERRP_GUARD();
> +    VfuObject *o = opaque;
> +    uint64_t end = pos + iov_size(iov, iovcnt);
> +    int i;
> +
> +    if (o->vfu_mig_buf_pending) {
> +        error_setg(errp, "Migration is ongoing");
> +        return 0;
> +    }
> +
> +    if (end > o->vfu_mig_buf_size) {
> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
> +    }
> +
> +    for (i = 0; i < iovcnt; i++) {
> +        memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
> +               iov[i].iov_len);
> +        o->vfu_mig_buf_size += iov[i].iov_len;
> +    }
> +
> +    return iov_size(iov, iovcnt);
> +}
> +
> +static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +
> +    o->vfu_mig_buf_size = 0;
> +
> +    g_free(o->vfu_mig_buf);
> +
> +    o->vfu_mig_buf = NULL;
> +
> +    o->vfu_mig_buf_pending = 0;
> +
> +    o->vfu_mig_data_written = 0;
> +
> +    o->vfu_mig_section_offset = 0;
> +
> +    return 0;
> +}
> +
> +static const QEMUFileOps vfu_mig_fops_save = {
> +    .writev_buffer  = vfu_mig_buf_write,
> +    .shut_down      = vfu_mig_buf_shutdown,
> +};
> +
> +static const QEMUFileOps vfu_mig_fops_load = {
> +    .get_buffer     = vfu_mig_buf_read,
> +    .shut_down      = vfu_mig_buf_shutdown,
> +};
> +
> +static BlockDriverState *vfu_object_find_bs_by_dev(DeviceState *dev)
> +{
> +    BlockBackend *blk = blk_by_dev(dev);
> +
> +    if (!blk) {
> +        return NULL;
> +    }
> +
> +    return blk_bs(blk);
> +}
> +
> +static int vfu_object_bdrv_invalidate_cache_by_dev(DeviceState *dev)
> +{
> +    BlockDriverState *bs = NULL;
> +    Error *local_err = NULL;
> +
> +    bs = vfu_object_find_bs_by_dev(dev);
> +    if (!bs) {
> +        return 0;
> +    }
> +
> +    bdrv_invalidate_cache(bs, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int vfu_object_bdrv_inactivate_by_dev(DeviceState *dev)
> +{
> +    BlockDriverState *bs = NULL;
> +
> +    bs = vfu_object_find_bs_by_dev(dev);
> +    if (!bs) {
> +        return 0;
> +    }
> +
> +    return bdrv_inactivate(bs);
> +}
> +
> +static void vfu_object_start_stop_netdev(DeviceState *dev, bool start)
> +{
> +    NetClientState *nc = NULL;
> +    Error *local_err = NULL;
> +    char *netdev = NULL;
> +
> +    netdev = object_property_get_str(OBJECT(dev), "netdev", &local_err);
> +    if (local_err) {
> +        /**
> +         * object_property_get_str() sets Error if netdev property is
> +         * not found, not necessarily an error in the context of
> +         * this function
> +         */
> +        error_free(local_err);
> +        return;
> +    }
> +
> +    if (!netdev) {
> +        return;
> +    }
> +
> +    nc = qemu_find_netdev(netdev);
> +
> +    if (!nc) {
> +        return;
> +    }
> +
> +    if (!start) {
> +        qemu_flush_or_purge_queued_packets(nc, true);
> +
> +        if (nc->info && nc->info->cleanup) {
> +            nc->info->cleanup(nc);
> +        }
> +    } else if (nc->peer) {
> +        qemu_flush_or_purge_queued_packets(nc->peer, false);
> +    }
> +}
> +
> +static int vfu_object_start_devs(DeviceState *dev, void *opaque)
> +{
> +    int ret = vfu_object_bdrv_invalidate_cache_by_dev(dev);
> +
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    vfu_object_start_stop_netdev(dev, true);
> +
> +    return ret;
> +}
> +
> +static int vfu_object_stop_devs(DeviceState *dev, void *opaque)
> +{
> +    int ret = vfu_object_bdrv_inactivate_by_dev(dev);
> +
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    vfu_object_start_stop_netdev(dev, false);
> +
> +    return ret;
> +}
> +
> +/**
> + * handlers for vfu_migration_callbacks_t
> + *
> + * The libvfio-user library accesses these handlers to drive the migration
> + * at the remote end, and also to transport the data stored in vfu_mig_buf
> + *
> + */
> +static void vfu_mig_state_stop_and_copy(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    int ret;
> +
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save, false);
> +    }
> +
> +    ret = qemu_remote_savevm(o->vfu_mig_file, DEVICE(o->pci_dev));
> +    if (ret) {
> +        qemu_file_shutdown(o->vfu_mig_file);
> +        o->vfu_mig_file = NULL;
> +        return;
> +    }
> +
> +    qemu_fflush(o->vfu_mig_file);
> +}
> +
> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    int ret;
> +
> +    if (o->vfu_state != VFU_MIGR_STATE_RESUME) {
> +        goto run_ctx;
> +    }
> +
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
> +    }
> +
> +    ret = qemu_remote_loadvm(o->vfu_mig_file);
> +    if (ret) {
> +        VFU_OBJECT_ERROR(o, "vfu: failed to restore device state");
> +        return;
> +    }
> +
> +    qemu_file_shutdown(o->vfu_mig_file);
> +    o->vfu_mig_file = NULL;
> +
> +run_ctx:
> +    ret = qdev_walk_children(DEVICE(o->pci_dev), NULL, NULL,
> +                             vfu_object_start_devs,
> +                             NULL, NULL);
> +    if (ret) {
> +        VFU_OBJECT_ERROR(o, "vfu: failed to setup backends for %s",
> +                         o->device);
> +        return;
> +    }
> +}
> +
> +static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    int ret;
> +
> +    ret = qdev_walk_children(DEVICE(o->pci_dev), NULL, NULL,
> +                             vfu_object_stop_devs,
> +                             NULL, NULL);
> +    if (ret) {
> +        VFU_OBJECT_ERROR(o, "vfu: failed to inactivate backends for %s",
> +                         o->device);
> +    }
> +}
> +
> +static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    if (o->vfu_state == state) {
> +        return 0;
> +    }
> +
> +    switch (state) {
> +    case VFU_MIGR_STATE_RESUME:
> +        break;
> +    case VFU_MIGR_STATE_STOP_AND_COPY:
> +        vfu_mig_state_stop_and_copy(vfu_ctx);
> +        break;
> +    case VFU_MIGR_STATE_STOP:
> +        vfu_mig_state_stop(vfu_ctx);
> +        break;
> +    case VFU_MIGR_STATE_PRE_COPY:
> +        break;
> +    case VFU_MIGR_STATE_RUNNING:
> +        vfu_mig_state_running(vfu_ctx);
> +        break;
> +    default:
> +        warn_report("vfu: Unknown migration state %d", state);
> +    }
> +
> +    o->vfu_state = state;
> +
> +    return 0;
> +}
> +
> +static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    static bool mig_ongoing;
> +
> +    if (!mig_ongoing && !o->vfu_mig_buf_pending) {
> +        o->vfu_mig_buf_pending = o->vfu_mig_buf_size;
> +        mig_ongoing = true;
> +    }
> +
> +    if (mig_ongoing && !o->vfu_mig_buf_pending) {
> +        mig_ongoing = false;
> +    }
> +
> +    return o->vfu_mig_buf_pending;
> +}
> +
> +static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
> +                                uint64_t *size)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint64_t data_size = o->vfu_mig_buf_pending;
> +
> +    if (data_size > VFU_OBJECT_MIG_WINDOW) {
> +        data_size = VFU_OBJECT_MIG_WINDOW;
> +    }
> +
> +    o->vfu_mig_section_offset = o->vfu_mig_buf_size - o->vfu_mig_buf_pending;
> +
> +    o->vfu_mig_buf_pending -= data_size;
> +
> +    if (offset) {
> +        *offset = 0;
> +    }
> +
> +    if (size) {
> +        *size = data_size;
> +    }
> +
> +    return 0;
> +}
> +
> +static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
> +                                 uint64_t size, uint64_t offset)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint64_t read_offset = o->vfu_mig_section_offset + offset;
> +
> +    if (read_offset > o->vfu_mig_buf_size) {
> +        warn_report("vfu: buffer overflow - offset outside range");
> +        return -1;
> +    }
> +
> +    if ((read_offset + size) > o->vfu_mig_buf_size) {
> +        warn_report("vfu: buffer overflow - size outside range");
> +        size = o->vfu_mig_buf_size - read_offset;
> +    }
> +
> +    memcpy(buf, (o->vfu_mig_buf + read_offset), size);
> +
> +    return size;
> +}
> +
> +static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
> +                                  uint64_t size, uint64_t offset)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint64_t end = o->vfu_mig_data_written + offset + size;
> +
> +    if (end > o->vfu_mig_buf_size) {
> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
> +        o->vfu_mig_buf_size = end;
> +    }
> +
> +    memcpy((o->vfu_mig_buf + o->vfu_mig_data_written + offset), data, size);
> +
> +    return size;
> +}
> +
> +static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    o->vfu_mig_data_written += count;
> +
> +    return 0;
> +}
> +
> +static const vfu_migration_callbacks_t vfu_mig_cbs = {
> +    .version = VFU_MIGR_CALLBACKS_VERS,
> +    .transition = &vfu_mig_transition,
> +    .get_pending_bytes = &vfu_mig_get_pending_bytes,
> +    .prepare_data = &vfu_mig_prepare_data,
> +    .read_data = &vfu_mig_read_data,
> +    .data_written = &vfu_mig_data_written,
> +    .write_data = &vfu_mig_write_data,
> +};
> +
>   static void vfu_object_ctx_run(void *opaque)
>   {
>       VfuObject *o = opaque;
> @@ -550,6 +982,13 @@ void vfu_object_set_bus_irq(PCIBus *pci_bus)
>       pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, NULL, 1);
>   }
>   
> +static bool vfu_object_migratable(VfuObject *o)
> +{
> +    DeviceClass *dc = DEVICE_GET_CLASS(o->pci_dev);
> +
> +    return dc->vmsd && !dc->vmsd->unmigratable;
> +}
> +
>   /*
>    * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>    * properties. It also depends on devices instantiated in QEMU. These
> @@ -575,6 +1014,7 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>       ERRP_GUARD();
>       DeviceState *dev = NULL;
>       vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
> +    uint64_t migr_regs_size, migr_size;
>       int ret;
>   
>       if (o->vfu_ctx || !o->socket || !o->device ||
> @@ -653,6 +1093,31 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>           goto fail;
>       }
>   
> +    migr_regs_size = vfu_get_migr_register_area_size();
> +    migr_size = migr_regs_size + VFU_OBJECT_MIG_WINDOW;
> +
> +    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_MIGR_REGION_IDX,
> +                           migr_size, NULL,
> +                           VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
> +    if (ret < 0) {
> +        error_setg(errp, "vfu: Failed to register migration BAR %s- %s",
> +                   o->device, strerror(errno));
> +        goto fail;
> +    }
> +
> +    if (!vfu_object_migratable(o)) {
> +        goto realize_ctx;
> +    }
> +
> +    ret = vfu_setup_device_migration_callbacks(o->vfu_ctx, &vfu_mig_cbs,
> +                                               migr_regs_size);
> +    if (ret < 0) {
> +        error_setg(errp, "vfu: Failed to setup migration %s- %s",
> +                   o->device, strerror(errno));
> +        goto fail;
> +    }
> +
> +realize_ctx:
>       ret = vfu_realize_ctx(o->vfu_ctx);
>       if (ret < 0) {
>           error_setg(errp, "vfu: Failed to realize device %s- %s",
> @@ -700,6 +1165,8 @@ static void vfu_object_init(Object *obj)
>       }
>   
>       o->vfu_poll_fd = -1;
> +
> +    o->vfu_state = VFU_MIGR_STATE_STOP;
>   }
>   
>   static void vfu_object_finalize(Object *obj)
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 1599b02fbc..2cc3b74287 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -66,6 +66,7 @@
>   #include "net/announce.h"
>   #include "qemu/yank.h"
>   #include "yank_functions.h"
> +#include "hw/qdev-core.h"
>   
>   const unsigned int postcopy_ram_discard_version;
>   
> @@ -1606,6 +1607,64 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>       return ret;
>   }
>   
> +static SaveStateEntry *find_se_from_dev(DeviceState *dev)
> +{
> +    SaveStateEntry *se;
> +
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (se->opaque == dev) {
> +            return se;
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +static int qemu_remote_savevm_section_full(DeviceState *dev, void *opaque)
> +{
> +    QEMUFile *f = opaque;
> +    SaveStateEntry *se;
> +    int ret;
> +
> +    se = find_se_from_dev(dev);
> +    if (!se) {
> +        return 0;
> +    }
> +
> +    if (!se->vmsd || !vmstate_save_needed(se->vmsd, se->opaque) ||
> +        se->vmsd->unmigratable) {
> +        return 0;
> +    }
> +
> +    save_section_header(f, se, QEMU_VM_SECTION_FULL);
> +
> +    ret = vmstate_save(f, se, NULL);
> +    if (ret) {
> +        qemu_file_set_error(f, ret);
> +        return ret;
> +    }
> +
> +    save_section_footer(f, se);
> +
> +    return 0;
> +}
> +
> +int qemu_remote_savevm(QEMUFile *f, DeviceState *dev)
> +{
> +    int ret = qdev_walk_children(dev, NULL, NULL,
> +                                 qemu_remote_savevm_section_full,
> +                                 NULL, f);
> +
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    qemu_put_byte(f, QEMU_VM_EOF);
> +    qemu_fflush(f);
> +
> +    return 0;
> +}
> +
>   void qemu_savevm_live_state(QEMUFile *f)
>   {
>       /* save QEMU_VM_SECTION_END section */
> @@ -2447,6 +2506,36 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
>       return 0;
>   }
>   
> +int qemu_remote_loadvm(QEMUFile *f)
> +{
> +    uint8_t section_type;
> +    int ret = 0;
> +
> +    while (true) {
> +        section_type = qemu_get_byte(f);
> +
> +        ret = qemu_file_get_error(f);
> +        if (ret) {
> +            break;
> +        }
> +
> +        switch (section_type) {
> +        case QEMU_VM_SECTION_FULL:
> +            ret = qemu_loadvm_section_start_full(f, NULL);
> +            if (ret < 0) {
> +                break;
> +            }
> +            break;
> +        case QEMU_VM_EOF:
> +            return ret;
> +        default:
> +            return -EINVAL;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
>   static int
>   qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
>   {
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index 05f87cdddc..83f8562792 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -63,6 +63,25 @@ static int vmstate_size(void *opaque, const VMStateField *field)
>       return size;
>   }
>   
> +uint64_t vmstate_vmsd_size(PCIDevice *pci_dev)
> +{
> +    DeviceClass *dc = DEVICE_GET_CLASS(DEVICE(pci_dev));
> +    const VMStateField *field = NULL;
> +    uint64_t size = 0;
> +
> +    if (!dc->vmsd) {
> +        return 0;
> +    }
> +
> +    field = dc->vmsd->fields;
> +    while (field && field->name) {
> +        size += vmstate_size(pci_dev, field);
> +        field++;
> +    }
> +
> +    return size;
> +}
> +
>   static void vmstate_handle_alloc(void *ptr, const VMStateField *field,
>                                    void *opaque)
>   {


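The hand-off between vfu_mig_get_pending_bytes(), vfu_mig_prepare_data()
and vfu_mig_read_data() in the patch above can be read as a sliding window
over the saved buffer. Below is a minimal standalone sketch of that idea,
not the QEMU code itself: struct mig_buf, the mig_buf_* helpers and the
4-byte MIG_WINDOW are hypothetical names and values chosen for illustration
(the patch uses VfuObject fields and a 64 KiB VFU_OBJECT_MIG_WINDOW), and
the step that loads the pending count is simplified into an explicit
mig_buf_start() call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tiny window so the behaviour is easy to trace. */
#define MIG_WINDOW 4u

/* Hypothetical stand-in for the migration fields of VfuObject. */
struct mig_buf {
    const uint8_t *data;     /* saved device state */
    uint64_t size;           /* total bytes saved */
    uint64_t pending;        /* bytes not yet handed to the client */
    uint64_t section_offset; /* start of the current window */
};

/* Mark the whole buffer as pending (simplified start of a transfer). */
static void mig_buf_start(struct mig_buf *m, const uint8_t *data,
                          uint64_t size)
{
    m->data = data;
    m->size = size;
    m->pending = size;
    m->section_offset = 0;
}

/* Select the next window and return its length. */
static uint64_t mig_buf_prepare(struct mig_buf *m)
{
    uint64_t n = m->pending < MIG_WINDOW ? m->pending : MIG_WINDOW;

    m->section_offset = m->size - m->pending;
    m->pending -= n;

    return n;
}

/* Copy from the current window; clamp reads that run past the end. */
static int64_t mig_buf_read(const struct mig_buf *m, void *dst,
                            uint64_t size, uint64_t offset)
{
    uint64_t start = m->section_offset + offset;

    assert(dst != NULL);

    if (start > m->size) {
        return -1;
    }
    if (size > m->size - start) {
        size = m->size - start;
    }

    memcpy(dst, m->data + start, size);

    return (int64_t)size;
}

/* Drain every pending byte into 'out', one window at a time. */
static uint64_t mig_buf_drain(struct mig_buf *m, uint8_t *out)
{
    uint64_t total = 0;

    while (m->pending > 0) {
        uint64_t n = mig_buf_prepare(m);
        int64_t got = mig_buf_read(m, out + total, n, 0);

        if (got < 0) {
            break;
        }
        total += (uint64_t)got;
    }

    return total;
}
```

With a 10-byte buffer and the 4-byte window assumed here, the drain loop
hands out windows of 4, 4 and 2 bytes, and the pending count reaches zero
once the last window has been prepared.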


* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-18 12:13       ` Paolo Bonzini
@ 2022-02-18 14:49         ` Jag Raman
  2022-02-18 15:16           ` Jag Raman
  2022-02-20  8:27           ` Paolo Bonzini
  0 siblings, 2 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-18 14:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: eduardo, Peter Maydell, Daniel P. Berrangé,
	Beraldo Leal, John Johnson, John Levon, qemu-devel,
	Elena Ufimtseva, Markus Armbruster, Juan Quintela,
	Alex Williamson, Michael S. Tsirkin, Stefan Hajnoczi,
	Thanos Makatos, Eric Blake, Kanth Ghatraju,
	Dr. David Alan Gilbert, Philippe Mathieu-Daudé



> On Feb 18, 2022, at 7:13 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 2/18/22 04:40, Jag Raman wrote:
>>> On Feb 17, 2022, at 7:09 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>> 
>>> On Thu, 17 Feb 2022 at 07:56, Jagannathan Raman <jag.raman@oracle.com> wrote:
>>>> 
>>>> The compiler path that cmake gets from meson is corrupted. It results in
>>>> the following error:
>>>> | -- The C compiler identification is unknown
>>>> | CMake Error at CMakeLists.txt:35 (project):
>>>> | The CMAKE_C_COMPILER:
>>>> | /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
>>>> | is not a full path to an existing compiler tool.
>>>> 
>>>> Explicitly specify the C compiler for cmake to avoid this error
>>> 
>>> This sounds like a bug in Meson. Is there a Meson bug report
>>> we can reference in the commit message here ?
>> Hi Peter,
>> This issue reproduces with the latest meson [1] also.
> 
> 0.60.0 or more recent versions should have a fix, which would do exactly what this patch does: do not define CMAKE_C_COMPILER_LAUNCHER, and place the whole binaries.c variable in CMAKE_C_COMPILER.  What are the contents of the generated CMakeMesonToolchainFile.cmake and CMakeCache.txt files, without and with your patch?

I’ll check out what’s going on at my end. But the issue reproduces with
meson 0.61 from what I can tell:
# ../configure --target-list=x86_64-softmmu --enable-debug --enable-vfio-user-server;
The Meson build system
Version: 0.61.2
…
…
| /opt/rh/devtoolset-9/root/usr/bin/cc;-m64;-mcx16

| is not a full path to an existing compiler tool.


Concerning the generated files, I see the following in CMakeMesonToolchainFile.cmake:
Without patch: set(CMAKE_C_COMPILER "/opt/rh/devtoolset-9/root/usr/bin/cc" "-m64" "-mcx16")
With patch: set(CMAKE_C_COMPILER "cc" "-m64" "-mcx16")

> 
>> I noticed the following about the “binaries” section [2]. The manual
>> says meson could pass the values in this section to find_program [3].
>> As such I’m wondering if it’s OK to set compiler flags in this section
>> because find_program doesn’t seem to accept any compiler flags.
> 
> The full quote of the manual is "These can be used internally by Meson, or by the find_program function", and the C compiler variable "c" is in the former category.
> 
> There is an important difference between the flags in "binaries" and those in "built-in options". What is in "binaries" is used when requesting e.g. the compiler search path, while what is in "built-in options" is not.  So options like "-m32" are definitely part of "binaries", not "built-in options":
> 
>    $ gcc --print-multi-os-directory
>    ../lib64
>    $ gcc -m32 --print-multi-os-directory
>    ../lib

Do you know if the “host_machine” section in the cross build
definition file [1] would be of any help here?

[1]: https://mesonbuild.com/Cross-compilation.html#machine-entries

--
Jag

> 
> Paolo



* Re: [PATCH v6 17/19] vfio-user: register handlers to facilitate migration
  2022-02-18 12:20   ` Paolo Bonzini
@ 2022-02-18 14:55     ` Jag Raman
  0 siblings, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-18 14:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: eduardo, Elena Ufimtseva, Daniel P. Berrangé,
	bleal, John Johnson, john.levon, qemu-devel, armbru, quintela,
	alex.williamson, mst, stefanha, thanos.makatos, Kanth Ghatraju,
	eblake, dgilbert, f4bug



> On Feb 18, 2022, at 7:20 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 2/17/22 08:49, Jagannathan Raman wrote:
>> Store and load the device's state during migration. Use libvfio-user's
>> handlers for this purpose.
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> 
> Why does no one call clear_deferred_backend_init?

We’ll clear it at the machine finalization. FWIW, the ‘x-remote’ machine
operates in a deferred backend initialization mode for the entire
lifecycle of the VM.

Thank you Paolo!
--
Jag

> 
> Paolo
> 
>> ---
>>  include/block/block.h       |   1 +
>>  include/migration/vmstate.h |   2 +
>>  migration/savevm.h          |   2 +
>>  block.c                     |   5 +
>>  hw/remote/machine.c         |   7 +
>>  hw/remote/vfio-user-obj.c   | 467 ++++++++++++++++++++++++++++++++++++
>>  migration/savevm.c          |  89 +++++++
>>  migration/vmstate.c         |  19 ++
>>  8 files changed, 592 insertions(+)
>> diff --git a/include/block/block.h b/include/block/block.h
>> index e1713ee306..02b89e0668 100644
>> --- a/include/block/block.h
>> +++ b/include/block/block.h
>> @@ -495,6 +495,7 @@ int generated_co_wrapper bdrv_invalidate_cache(BlockDriverState *bs,
>>                                                 Error **errp);
>>  void bdrv_invalidate_cache_all(Error **errp);
>>  int bdrv_inactivate_all(void);
>> +int bdrv_inactivate(BlockDriverState *bs);
>>    /* Ensure contents are flushed to disk.  */
>>  int generated_co_wrapper bdrv_flush(BlockDriverState *bs);
>> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
>> index 017c03675c..68bea576ea 100644
>> --- a/include/migration/vmstate.h
>> +++ b/include/migration/vmstate.h
>> @@ -1165,6 +1165,8 @@ extern const VMStateInfo vmstate_info_qlist;
>>  #define VMSTATE_END_OF_LIST()                                         \
>>      {}
>>  +uint64_t vmstate_vmsd_size(PCIDevice *pci_dev);
>> +
>>  int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
>>                         void *opaque, int version_id);
>>  int vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
>> diff --git a/migration/savevm.h b/migration/savevm.h
>> index 6461342cb4..8007064ff2 100644
>> --- a/migration/savevm.h
>> +++ b/migration/savevm.h
>> @@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
>>  int qemu_load_device_state(QEMUFile *f);
>>  int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>>          bool in_postcopy, bool inactivate_disks);
>> +int qemu_remote_savevm(QEMUFile *f, DeviceState *dev);
>> +int qemu_remote_loadvm(QEMUFile *f);
>>    #endif
>> diff --git a/block.c b/block.c
>> index b54d59d1fa..e90aaee30c 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -6565,6 +6565,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
>>      return 0;
>>  }
>>  +int bdrv_inactivate(BlockDriverState *bs)
>> +{
>> +    return bdrv_inactivate_recurse(bs);
>> +}
>> +
>>  int bdrv_inactivate_all(void)
>>  {
>>      BlockDriverState *bs = NULL;
>> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
>> index a8b4a3aef3..31ef401e43 100644
>> --- a/hw/remote/machine.c
>> +++ b/hw/remote/machine.c
>> @@ -24,6 +24,7 @@
>>  #include "hw/qdev-core.h"
>>  #include "hw/remote/iommu.h"
>>  #include "hw/remote/vfio-user-obj.h"
>> +#include "sysemu/sysemu.h"
>>    static void remote_machine_init(MachineState *machine)
>>  {
>> @@ -86,6 +87,11 @@ static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
>>      s->vfio_user = value;
>>  }
>>  +static void remote_machine_instance_init(Object *obj)
>> +{
>> +    set_deferred_backend_init();
>> +}
>> +
>>  static void remote_machine_class_init(ObjectClass *oc, void *data)
>>  {
>>      MachineClass *mc = MACHINE_CLASS(oc);
>> @@ -105,6 +111,7 @@ static const TypeInfo remote_machine = {
>>      .name = TYPE_REMOTE_MACHINE,
>>      .parent = TYPE_MACHINE,
>>      .instance_size = sizeof(RemoteMachineState),
>> +    .instance_init = remote_machine_instance_init,
>>      .class_init = remote_machine_class_init,
>>      .interfaces = (InterfaceInfo[]) {
>>          { TYPE_HOTPLUG_HANDLER },
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> index d79bab87f1..2304643003 100644
>> --- a/hw/remote/vfio-user-obj.c
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -57,6 +57,13 @@
>>  #include "hw/pci/msi.h"
>>  #include "hw/pci/msix.h"
>>  #include "hw/remote/vfio-user-obj.h"
>> +#include "migration/qemu-file.h"
>> +#include "migration/savevm.h"
>> +#include "migration/vmstate.h"
>> +#include "migration/global_state.h"
>> +#include "block/block.h"
>> +#include "sysemu/block-backend.h"
>> +#include "net/net.h"
>>    #define TYPE_VFU_OBJECT "x-vfio-user-server"
>>  OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
>> @@ -108,12 +115,49 @@ struct VfuObject {
>>      Error *unplug_blocker;
>>        int vfu_poll_fd;
>> +
>> +    /*
>> +     * vfu_mig_buf holds the migration data. In the remote server, this
>> +     * buffer replaces the role of an IO channel which links the source
>> +     * and the destination.
>> +     *
>> +     * Whenever the client QEMU process initiates migration, the remote
>> +     * server gets notified via libvfio-user callbacks. The remote server
>> +     * sets up a QEMUFile object using this buffer as backend. The remote
>> +     * server passes this object to its migration subsystem, which slurps
>> +     * the VMSD of the device ('devid' above) referenced by this object
>> +     * and stores the VMSD in this buffer.
>> +     *
>> +     * The client subsequently asks the remote server for any data that
>> +     * needs to be moved over to the destination via libvfio-user
>> +     * library's vfu_migration_callbacks_t callbacks. The remote hands
>> +     * over this buffer as data at this time.
>> +     *
>> +     * A reverse of this process happens at the destination.
>> +     */
>> +    uint8_t *vfu_mig_buf;
>> +
>> +    uint64_t vfu_mig_buf_size;
>> +
>> +    uint64_t vfu_mig_buf_pending;
>> +
>> +    uint64_t vfu_mig_data_written;
>> +
>> +    uint64_t vfu_mig_section_offset;
>> +
>> +    QEMUFile *vfu_mig_file;
>> +
>> +    vfu_migr_state_t vfu_state;
>>  };
>>    static GHashTable *vfu_object_bdf_to_ctx_table;
>>    #define INT2VOIDP(i) (void *)(uintptr_t)(i)
>>  +#define KB(x)    ((size_t) (x) << 10)
>> +
>> +#define VFU_OBJECT_MIG_WINDOW KB(64)
>> +
>>  static void vfu_object_init_ctx(VfuObject *o, Error **errp);
>>    static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
>> @@ -163,6 +207,394 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
>>      vfu_object_init_ctx(o, errp);
>>  }
>>  +/**
>> + * Migration helper functions
>> + *
>> + * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
>> + * subsystem - qemu_remote_loadvm & qemu_remote_savevm. loadvm/savevm
>> + * call these functions via QEMUFileOps to load/save the VMSD of a
>> + * device into vfu_mig_buf
>> + *
>> + */
>> +static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
>> +                                size_t size, Error **errp)
>> +{
>> +    VfuObject *o = opaque;
>> +
>> +    if (pos > o->vfu_mig_buf_size) {
>> +        size = 0;
>> +    } else if ((pos + size) > o->vfu_mig_buf_size) {
>> +        size = o->vfu_mig_buf_size - pos;
>> +    }
>> +
>> +    memcpy(buf, (o->vfu_mig_buf + pos), size);
>> +
>> +    return size;
>> +}
>> +
>> +static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
>> +                                 int64_t pos, Error **errp)
>> +{
>> +    ERRP_GUARD();
>> +    VfuObject *o = opaque;
>> +    uint64_t end = pos + iov_size(iov, iovcnt);
>> +    int i;
>> +
>> +    if (o->vfu_mig_buf_pending) {
>> +        error_setg(errp, "Migration is ongoing");
>> +        return 0;
>> +    }
>> +
>> +    if (end > o->vfu_mig_buf_size) {
>> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
>> +    }
>> +
>> +    for (i = 0; i < iovcnt; i++) {
>> +        memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
>> +               iov[i].iov_len);
>> +        o->vfu_mig_buf_size += iov[i].iov_len;
>> +    }
>> +
>> +    return iov_size(iov, iovcnt);
>> +}
>> +
>> +static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
>> +{
>> +    VfuObject *o = opaque;
>> +
>> +    o->vfu_mig_buf_size = 0;
>> +
>> +    g_free(o->vfu_mig_buf);
>> +
>> +    o->vfu_mig_buf = NULL;
>> +
>> +    o->vfu_mig_buf_pending = 0;
>> +
>> +    o->vfu_mig_data_written = 0;
>> +
>> +    o->vfu_mig_section_offset = 0;
>> +
>> +    return 0;
>> +}
>> +
>> +static const QEMUFileOps vfu_mig_fops_save = {
>> +    .writev_buffer  = vfu_mig_buf_write,
>> +    .shut_down      = vfu_mig_buf_shutdown,
>> +};
>> +
>> +static const QEMUFileOps vfu_mig_fops_load = {
>> +    .get_buffer     = vfu_mig_buf_read,
>> +    .shut_down      = vfu_mig_buf_shutdown,
>> +};
>> +
>> +static BlockDriverState *vfu_object_find_bs_by_dev(DeviceState *dev)
>> +{
>> +    BlockBackend *blk = blk_by_dev(dev);
>> +
>> +    if (!blk) {
>> +        return NULL;
>> +    }
>> +
>> +    return blk_bs(blk);
>> +}
>> +
>> +static int vfu_object_bdrv_invalidate_cache_by_dev(DeviceState *dev)
>> +{
>> +    BlockDriverState *bs = NULL;
>> +    Error *local_err = NULL;
>> +
>> +    bs = vfu_object_find_bs_by_dev(dev);
>> +    if (!bs) {
>> +        return 0;
>> +    }
>> +
>> +    bdrv_invalidate_cache(bs, &local_err);
>> +    if (local_err) {
>> +        error_report_err(local_err);
>> +        return -1;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static int vfu_object_bdrv_inactivate_by_dev(DeviceState *dev)
>> +{
>> +    BlockDriverState *bs = NULL;
>> +
>> +    bs = vfu_object_find_bs_by_dev(dev);
>> +    if (!bs) {
>> +        return 0;
>> +    }
>> +
>> +    return bdrv_inactivate(bs);
>> +}
>> +
>> +static void vfu_object_start_stop_netdev(DeviceState *dev, bool start)
>> +{
>> +    NetClientState *nc = NULL;
>> +    Error *local_err = NULL;
>> +    char *netdev = NULL;
>> +
>> +    netdev = object_property_get_str(OBJECT(dev), "netdev", &local_err);
>> +    if (local_err) {
>> +        /**
>> +         * object_property_get_str() sets Error if netdev property is
>> +         * not found, not necessarily an error in the context of
>> +         * this function
>> +         */
>> +        error_free(local_err);
>> +        return;
>> +    }
>> +
>> +    if (!netdev) {
>> +        return;
>> +    }
>> +
>> +    nc = qemu_find_netdev(netdev);
>> +
>> +    if (!nc) {
>> +        return;
>> +    }
>> +
>> +    if (!start) {
>> +        qemu_flush_or_purge_queued_packets(nc, true);
>> +
>> +        if (nc->info && nc->info->cleanup) {
>> +            nc->info->cleanup(nc);
>> +        }
>> +    } else if (nc->peer) {
>> +        qemu_flush_or_purge_queued_packets(nc->peer, false);
>> +    }
>> +}
>> +
>> +static int vfu_object_start_devs(DeviceState *dev, void *opaque)
>> +{
>> +    int ret = vfu_object_bdrv_invalidate_cache_by_dev(dev);
>> +
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    vfu_object_start_stop_netdev(dev, true);
>> +
>> +    return ret;
>> +}
>> +
>> +static int vfu_object_stop_devs(DeviceState *dev, void *opaque)
>> +{
>> +    int ret = vfu_object_bdrv_inactivate_by_dev(dev);
>> +
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    vfu_object_start_stop_netdev(dev, false);
>> +
>> +    return ret;
>> +}
>> +
>> +/**
>> + * handlers for vfu_migration_callbacks_t
>> + *
>> + * The libvfio-user library accesses these handlers to drive the migration
>> + * at the remote end, and also to transport the data stored in vfu_mig_buf
>> + *
>> + */
>> +static void vfu_mig_state_stop_and_copy(vfu_ctx_t *vfu_ctx)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +    int ret;
>> +
>> +    if (!o->vfu_mig_file) {
>> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save, false);
>> +    }
>> +
>> +    ret = qemu_remote_savevm(o->vfu_mig_file, DEVICE(o->pci_dev));
>> +    if (ret) {
>> +        qemu_file_shutdown(o->vfu_mig_file);
>> +        o->vfu_mig_file = NULL;
>> +        return;
>> +    }
>> +
>> +    qemu_fflush(o->vfu_mig_file);
>> +}
>> +
>> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +    int ret;
>> +
>> +    if (o->vfu_state != VFU_MIGR_STATE_RESUME) {
>> +        goto run_ctx;
>> +    }
>> +
>> +    if (!o->vfu_mig_file) {
>> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
>> +    }
>> +
>> +    ret = qemu_remote_loadvm(o->vfu_mig_file);
>> +    if (ret) {
>> +        VFU_OBJECT_ERROR(o, "vfu: failed to restore device state");
>> +        return;
>> +    }
>> +
>> +    qemu_file_shutdown(o->vfu_mig_file);
>> +    o->vfu_mig_file = NULL;
>> +
>> +run_ctx:
>> +    ret = qdev_walk_children(DEVICE(o->pci_dev), NULL, NULL,
>> +                             vfu_object_start_devs,
>> +                             NULL, NULL);
>> +    if (ret) {
>> +        VFU_OBJECT_ERROR(o, "vfu: failed to setup backends for %s",
>> +                         o->device);
>> +        return;
>> +    }
>> +}
>> +
>> +static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +    int ret;
>> +
>> +    ret = qdev_walk_children(DEVICE(o->pci_dev), NULL, NULL,
>> +                             vfu_object_stop_devs,
>> +                             NULL, NULL);
>> +    if (ret) {
>> +        VFU_OBJECT_ERROR(o, "vfu: failed to inactivate backends for %s",
>> +                         o->device);
>> +    }
>> +}
>> +
>> +static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +
>> +    if (o->vfu_state == state) {
>> +        return 0;
>> +    }
>> +
>> +    switch (state) {
>> +    case VFU_MIGR_STATE_RESUME:
>> +        break;
>> +    case VFU_MIGR_STATE_STOP_AND_COPY:
>> +        vfu_mig_state_stop_and_copy(vfu_ctx);
>> +        break;
>> +    case VFU_MIGR_STATE_STOP:
>> +        vfu_mig_state_stop(vfu_ctx);
>> +        break;
>> +    case VFU_MIGR_STATE_PRE_COPY:
>> +        break;
>> +    case VFU_MIGR_STATE_RUNNING:
>> +        vfu_mig_state_running(vfu_ctx);
>> +        break;
>> +    default:
>> +        warn_report("vfu: Unknown migration state %d", state);
>> +    }
>> +
>> +    o->vfu_state = state;
>> +
>> +    return 0;
>> +}
>> +
>> +static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +    static bool mig_ongoing;
>> +
>> +    if (!mig_ongoing && !o->vfu_mig_buf_pending) {
>> +        o->vfu_mig_buf_pending = o->vfu_mig_buf_size;
>> +        mig_ongoing = true;
>> +    }
>> +
>> +    if (mig_ongoing && !o->vfu_mig_buf_pending) {
>> +        mig_ongoing = false;
>> +    }
>> +
>> +    return o->vfu_mig_buf_pending;
>> +}
>> +
>> +static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
>> +                                uint64_t *size)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +    uint64_t data_size = o->vfu_mig_buf_pending;
>> +
>> +    if (data_size > VFU_OBJECT_MIG_WINDOW) {
>> +        data_size = VFU_OBJECT_MIG_WINDOW;
>> +    }
>> +
>> +    o->vfu_mig_section_offset = o->vfu_mig_buf_size - o->vfu_mig_buf_pending;
>> +
>> +    o->vfu_mig_buf_pending -= data_size;
>> +
>> +    if (offset) {
>> +        *offset = 0;
>> +    }
>> +
>> +    if (size) {
>> +        *size = data_size;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
>> +                                 uint64_t size, uint64_t offset)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +    uint64_t read_offset = o->vfu_mig_section_offset + offset;
>> +
>> +    if (read_offset > o->vfu_mig_buf_size) {
>> +        warn_report("vfu: buffer overflow - offset outside range");
>> +        return -1;
>> +    }
>> +
>> +    if ((read_offset + size) > o->vfu_mig_buf_size) {
>> +        warn_report("vfu: buffer overflow - size outside range");
>> +        size = o->vfu_mig_buf_size - read_offset;
>> +    }
>> +
>> +    memcpy(buf, (o->vfu_mig_buf + read_offset), size);
>> +
>> +    return size;
>> +}
>> +
>> +static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
>> +                                  uint64_t size, uint64_t offset)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +    uint64_t end = o->vfu_mig_data_written + offset + size;
>> +
>> +    if (end > o->vfu_mig_buf_size) {
>> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
>> +        o->vfu_mig_buf_size = end;
>> +    }
>> +
>> +    memcpy((o->vfu_mig_buf + o->vfu_mig_data_written + offset), data, size);
>> +
>> +    return size;
>> +}
>> +
>> +static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +
>> +    o->vfu_mig_data_written += count;
>> +
>> +    return 0;
>> +}
>> +
>> +static const vfu_migration_callbacks_t vfu_mig_cbs = {
>> +    .version = VFU_MIGR_CALLBACKS_VERS,
>> +    .transition = &vfu_mig_transition,
>> +    .get_pending_bytes = &vfu_mig_get_pending_bytes,
>> +    .prepare_data = &vfu_mig_prepare_data,
>> +    .read_data = &vfu_mig_read_data,
>> +    .data_written = &vfu_mig_data_written,
>> +    .write_data = &vfu_mig_write_data,
>> +};
>> +
>>  static void vfu_object_ctx_run(void *opaque)
>>  {
>>      VfuObject *o = opaque;
>> @@ -550,6 +982,13 @@ void vfu_object_set_bus_irq(PCIBus *pci_bus)
>>      pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, NULL, 1);
>>  }
>>  
>> +static bool vfu_object_migratable(VfuObject *o)
>> +{
>> +    DeviceClass *dc = DEVICE_GET_CLASS(o->pci_dev);
>> +
>> +    return dc->vmsd && !dc->vmsd->unmigratable;
>> +}
>> +
>>  /*
>>   * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>>   * properties. It also depends on devices instantiated in QEMU. These
>> @@ -575,6 +1014,7 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>>      ERRP_GUARD();
>>      DeviceState *dev = NULL;
>>      vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
>> +    uint64_t migr_regs_size, migr_size;
>>      int ret;
>>  
>>      if (o->vfu_ctx || !o->socket || !o->device ||
>> @@ -653,6 +1093,31 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>>          goto fail;
>>      }
>>  
>> +    migr_regs_size = vfu_get_migr_register_area_size();
>> +    migr_size = migr_regs_size + VFU_OBJECT_MIG_WINDOW;
>> +
>> +    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_MIGR_REGION_IDX,
>> +                           migr_size, NULL,
>> +                           VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
>> +    if (ret < 0) {
>> +        error_setg(errp, "vfu: Failed to register migration BAR %s- %s",
>> +                   o->device, strerror(errno));
>> +        goto fail;
>> +    }
>> +
>> +    if (!vfu_object_migratable(o)) {
>> +        goto realize_ctx;
>> +    }
>> +
>> +    ret = vfu_setup_device_migration_callbacks(o->vfu_ctx, &vfu_mig_cbs,
>> +                                               migr_regs_size);
>> +    if (ret < 0) {
>> +        error_setg(errp, "vfu: Failed to setup migration %s- %s",
>> +                   o->device, strerror(errno));
>> +        goto fail;
>> +    }
>> +
>> +realize_ctx:
>>      ret = vfu_realize_ctx(o->vfu_ctx);
>>      if (ret < 0) {
>>          error_setg(errp, "vfu: Failed to realize device %s- %s",
>> @@ -700,6 +1165,8 @@ static void vfu_object_init(Object *obj)
>>      }
>>  
>>      o->vfu_poll_fd = -1;
>> +
>> +    o->vfu_state = VFU_MIGR_STATE_STOP;
>>  }
>>  
>>  static void vfu_object_finalize(Object *obj)
>> diff --git a/migration/savevm.c b/migration/savevm.c
>> index 1599b02fbc..2cc3b74287 100644
>> --- a/migration/savevm.c
>> +++ b/migration/savevm.c
>> @@ -66,6 +66,7 @@
>>  #include "net/announce.h"
>>  #include "qemu/yank.h"
>>  #include "yank_functions.h"
>> +#include "hw/qdev-core.h"
>>  
>>  const unsigned int postcopy_ram_discard_version;
>>  
>> @@ -1606,6 +1607,64 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>>      return ret;
>>  }
>>  
>> +static SaveStateEntry *find_se_from_dev(DeviceState *dev)
>> +{
>> +    SaveStateEntry *se;
>> +
>> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
>> +        if (se->opaque == dev) {
>> +            return se;
>> +        }
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static int qemu_remote_savevm_section_full(DeviceState *dev, void *opaque)
>> +{
>> +    QEMUFile *f = opaque;
>> +    SaveStateEntry *se;
>> +    int ret;
>> +
>> +    se = find_se_from_dev(dev);
>> +    if (!se) {
>> +        return 0;
>> +    }
>> +
>> +    if (!se->vmsd || !vmstate_save_needed(se->vmsd, se->opaque) ||
>> +        se->vmsd->unmigratable) {
>> +        return 0;
>> +    }
>> +
>> +    save_section_header(f, se, QEMU_VM_SECTION_FULL);
>> +
>> +    ret = vmstate_save(f, se, NULL);
>> +    if (ret) {
>> +        qemu_file_set_error(f, ret);
>> +        return ret;
>> +    }
>> +
>> +    save_section_footer(f, se);
>> +
>> +    return 0;
>> +}
>> +
>> +int qemu_remote_savevm(QEMUFile *f, DeviceState *dev)
>> +{
>> +    int ret = qdev_walk_children(dev, NULL, NULL,
>> +                                 qemu_remote_savevm_section_full,
>> +                                 NULL, f);
>> +
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    qemu_put_byte(f, QEMU_VM_EOF);
>> +    qemu_fflush(f);
>> +
>> +    return 0;
>> +}
>> +
>>  void qemu_savevm_live_state(QEMUFile *f)
>>  {
>>      /* save QEMU_VM_SECTION_END section */
>> @@ -2447,6 +2506,36 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
>>      return 0;
>>  }
>>  
>> +int qemu_remote_loadvm(QEMUFile *f)
>> +{
>> +    uint8_t section_type;
>> +    int ret = 0;
>> +
>> +    while (true) {
>> +        section_type = qemu_get_byte(f);
>> +
>> +        ret = qemu_file_get_error(f);
>> +        if (ret) {
>> +            break;
>> +        }
>> +
>> +        switch (section_type) {
>> +        case QEMU_VM_SECTION_FULL:
>> +            ret = qemu_loadvm_section_start_full(f, NULL);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +            break;
>> +        case QEMU_VM_EOF:
>> +            return ret;
>> +        default:
>> +            return -EINVAL;
>> +        }
>> +    }
>> +
>> +    return ret;
>> +}
>> +
>>  static int
>>  qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
>>  {
>> diff --git a/migration/vmstate.c b/migration/vmstate.c
>> index 05f87cdddc..83f8562792 100644
>> --- a/migration/vmstate.c
>> +++ b/migration/vmstate.c
>> @@ -63,6 +63,25 @@ static int vmstate_size(void *opaque, const VMStateField *field)
>>      return size;
>>  }
>>  
>> +uint64_t vmstate_vmsd_size(PCIDevice *pci_dev)
>> +{
>> +    DeviceClass *dc = DEVICE_GET_CLASS(DEVICE(pci_dev));
>> +    const VMStateField *field = NULL;
>> +    uint64_t size = 0;
>> +
>> +    if (!dc->vmsd) {
>> +        return 0;
>> +    }
>> +
>> +    field = dc->vmsd->fields;
>> +    while (field && field->name) {
>> +        size += vmstate_size(pci_dev, field);
>> +        field++;
>> +    }
>> +
>> +    return size;
>> +}
>> +
>>  static void vmstate_handle_alloc(void *ptr, const VMStateField *field,
>>                                   void *opaque)
>>  {
> 


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-18 14:49         ` Jag Raman
@ 2022-02-18 15:16           ` Jag Raman
  2022-02-20  8:27           ` Paolo Bonzini
  1 sibling, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-18 15:16 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: eduardo, Peter Maydell, Daniel P. Berrangé,
	Beraldo Leal, John Johnson, John Levon, qemu-devel,
	Elena Ufimtseva, Markus Armbruster, Juan Quintela,
	Alex Williamson, Michael S. Tsirkin, Stefan Hajnoczi,
	Thanos Makatos, Eric Blake, Kanth Ghatraju,
	Dr. David Alan Gilbert, Philippe Mathieu-Daudé



> On Feb 18, 2022, at 9:49 AM, Jag Raman <jag.raman@oracle.com> wrote:
> 
> 
> 
>> On Feb 18, 2022, at 7:13 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>> 
>> On 2/18/22 04:40, Jag Raman wrote:
>>>> On Feb 17, 2022, at 7:09 AM, Peter Maydell <peter.maydell@linaro.org> wrote:
>>>> 
>>>> On Thu, 17 Feb 2022 at 07:56, Jagannathan Raman <jag.raman@oracle.com> wrote:
>>>>> 
>>>>> The compiler path that cmake gets from meson is corrupted. It results in
>>>>> the following error:
>>>>> | -- The C compiler identification is unknown
>>>>> | CMake Error at CMakeLists.txt:35 (project):
>>>>> | The CMAKE_C_COMPILER:
>>>>> | /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
>>>>> | is not a full path to an existing compiler tool.
>>>>> 
>>>>> Explicitly specify the C compiler for cmake to avoid this error
>>>> 
>>>> This sounds like a bug in Meson. Is there a Meson bug report
>>>> we can reference in the commit message here ?
>>> Hi Peter,
>>> This issue reproduces with the latest meson [1] also.
>> 
>> 0.60.0 or more recent versions should have a fix, which would do exactly what this patch does: do not define CMAKE_C_COMPILER_LAUNCHER, and place the whole binaries.c variable in CMAKE_C_COMPILER. What are the contents of the generated CMakeMesonToolchainFile.cmake and CMakeCache.txt files, without and with your patch?
> 
> I’ll check out what’s going on at my end. But the issue reproduces with
> meson 0.61 from what I can tell:
> # ../configure --target-list=x86_64-softmmu --enable-debug --enable-vfio-user-server;
> The Meson build system
> Version: 0.61.2
> …
> …
> | /opt/rh/devtoolset-9/root/usr/bin/cc;-m64;-mcx16
> 
> | is not a full path to an existing compiler tool.
> 
> 
> Concerning the generated files, I see the following in CMakeMesonToolchainFile.cmake:
> Without patch: set(CMAKE_C_COMPILER "/opt/rh/devtoolset-9/root/usr/bin/cc" "-m64" "-mcx16")
> With patch: set(CMAKE_C_COMPILER "cc" "-m64" "-mcx16")

I’m not sure if you’re interested in the contents of the whole file. But they’re here:

Without patch: https://pastebin.com/sbwtvHy0 (also has error log at the end)
With patch: https://pastebin.com/buRYSp2R

Thank you!
--
Jag

> 
>> 
>>> I noticed the following about the “binaries” section [2]. The manual
>>> says meson could pass the values in this section to find_program [3].
>>> As such I’m wondering if it’s OK to set compiler flags in this section
>>> because find_program doesn’t seem to accept any compiler flags.
>> 
>> The full quote of the manual is "These can be used internally by Meson, or by the find_program function", and the C compiler variable "c" is in the former category.
>> 
>> There is an important difference between the flags in "binaries" and those in "built-in options". What is in "binaries" is used when requesting e.g. the compiler search path, while what is in "built-in options" is not.  So options like "-m32" are definitely part of "binaries", not "built-in options":
>> 
>>   $ gcc --print-multi-os-directory
>>   ../lib64
>>   $ gcc -m32 --print-multi-os-directory
>>   ../lib
> 
> Do you know if the “host_machine” section in cross build
> definition file [1] would be any help here?
> 
> [1]: https://mesonbuild.com/Cross-compilation.html#machine-entries
> 
> --
> Jag
> 
>> 
>> Paolo



* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-18 14:49         ` Jag Raman
  2022-02-18 15:16           ` Jag Raman
@ 2022-02-20  8:27           ` Paolo Bonzini
  2022-02-20 13:27             ` Paolo Bonzini
  2022-02-22 19:05             ` Jag Raman
  1 sibling, 2 replies; 76+ messages in thread
From: Paolo Bonzini @ 2022-02-20  8:27 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Peter Maydell, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, John Levon, Juan Quintela, qemu-devel,
	Elena Ufimtseva, Markus Armbruster, Alex Williamson,
	Michael S. Tsirkin, Stefan Hajnoczi, Thanos Makatos, Eric Blake,
	Kanth Ghatraju, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé

On 2/18/22 15:49, Jag Raman wrote:
> 
> Concerning the generated files, I see the following in CMakeMesonToolchainFile.cmake:
> Without patch: set(CMAKE_C_COMPILER "/opt/rh/devtoolset-9/root/usr/bin/cc" "-m64" "-mcx16")
> With patch: set(CMAKE_C_COMPILER "cc" "-m64" "-mcx16")

I don't understand why it works at all with the latter, but the right solution
could be

set(CMAKE_C_COMPILER "/opt/rh/devtoolset-9/root/usr/bin/cc")
set(CMAKE_C_COMPILER_ARG1 "-m64")
set(CMAKE_C_COMPILER_ARG2 "-mcx16")

Perhaps you can try the following patch to meson (patch it in qemu's build
directory and make sure to use --meson=internal):

diff --git a/mesonbuild/cmake/toolchain.py b/mesonbuild/cmake/toolchain.py
index 316f57cb5..9756864ee 100644
--- a/mesonbuild/cmake/toolchain.py
+++ b/mesonbuild/cmake/toolchain.py
@@ -191,11 +191,14 @@ class CMakeToolchain:
                  continue
  
              if len(exe_list) >= 2 and not self.is_cmdline_option(comp_obj, exe_list[1]):
-                defaults[prefix + 'COMPILER_LAUNCHER'] = [make_abs(exe_list[0])]
+                defaults[f'{prefix}COMPILER_LAUNCHER'] = [make_abs(exe_list[0])]
                  exe_list = exe_list[1:]
  
              exe_list[0] = make_abs(exe_list[0])
-            defaults[prefix + 'COMPILER'] = exe_list
+            defaults[f'{prefix}COMPILER'] = [exe_list[0]]
+            for i in range(1, len(exe_list)):
+                defaults[f'{prefix}COMPILER_ARG{i}'] = [exe_list[i]]
+
              if comp_obj.get_id() == 'clang-cl':
                  defaults['CMAKE_LINKER'] = comp_obj.get_linker_exelist()
  


Thanks,

Paolo



* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-20  8:27           ` Paolo Bonzini
@ 2022-02-20 13:27             ` Paolo Bonzini
  2022-02-22 19:05             ` Jag Raman
  1 sibling, 0 replies; 76+ messages in thread
From: Paolo Bonzini @ 2022-02-20 13:27 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Peter Maydell, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, John Levon, Juan Quintela, qemu-devel,
	Elena Ufimtseva, Markus Armbruster, Alex Williamson,
	Michael S. Tsirkin, Stefan Hajnoczi, Thanos Makatos, Eric Blake,
	Kanth Ghatraju, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé

On 2/20/22 09:27, Paolo Bonzini wrote:
> On 2/18/22 15:49, Jag Raman wrote:
>>
>> Concerning the generated files, I see the following in 
>> CMakeMesonToolchainFile.cmake:
>> Without patch: set(CMAKE_C_COMPILER 
>> "/opt/rh/devtoolset-9/root/usr/bin/cc" "-m64" "-mcx16")
>> With patch: set(CMAKE_C_COMPILER "cc" "-m64" "-mcx16")
> 
> I don't understand why it works at all with the latter, but the right 
> solution
> could be
> 
> set(CMAKE_C_COMPILER "/opt/rh/devtoolset-9/root/usr/bin/cc")
> set(CMAKE_C_COMPILER_ARG1 "-m64")
> set(CMAKE_C_COMPILER_ARG2 "-mcx16")

Anyhow it seems to be a cmake bug, because what QEMU/Meson are doing is 
exactly what is in the manual at 
https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_COMPILER.html#variable:CMAKE_%3CLANG%3E_COMPILER:

   Note: Options that are required to make the compiler work correctly
   can be included as items in a list; they can not be changed.

   #set within user supplied toolchain file
   set(CMAKE_C_COMPILER /full/path/to/qcc --arg1 --arg2)

Paolo




* Re: [PATCH v6 03/19] qdev: unplug blocker for devices
  2022-02-17  7:48 ` [PATCH v6 03/19] qdev: unplug blocker for devices Jagannathan Raman
@ 2022-02-21 15:27   ` Stefan Hajnoczi
  2022-02-28 16:23     ` Jag Raman
  2022-02-21 15:30   ` Stefan Hajnoczi
  1 sibling, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-21 15:27 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 4180 bytes --]

On Thu, Feb 17, 2022 at 02:48:50AM -0500, Jagannathan Raman wrote:
> Add blocker to prevent hot-unplug of devices
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/hw/qdev-core.h | 35 +++++++++++++++++++++++++++++++++++
>  softmmu/qdev-monitor.c | 26 ++++++++++++++++++++++++++
>  2 files changed, 61 insertions(+)
> 
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 92c3d65208..4b1d77f44a 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -193,6 +193,7 @@ struct DeviceState {
>      int instance_id_alias;
>      int alias_required_for_version;
>      ResettableState reset;
> +    GSList *unplug_blockers;
>  };
>  
>  struct DeviceListener {
> @@ -419,6 +420,40 @@ void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
>  void qdev_machine_creation_done(void);
>  bool qdev_machine_modified(void);
>  
> +/**
> + * Device Unplug blocker: prevents a device from being unplugged. It could
      ^^^^^^^^^^^^^^^^^^^^^

This looks strange. gtkdoc will probably treat it as the doc comment for
qdev_add_unplug_blocker(), which is actually defined below. I suggest
not trying to define a new section in the documentation and instead just
focussing on doc comments for qdev_add_unplug_blocker() and other
functions.

The gtkdoc way of defining sections is covered here but it's almost
never used in QEMU:
https://developer-old.gnome.org/gtk-doc-manual/stable/documenting_sections.html.en

> + * be used to indicate that another object depends on the device.
> + *
> + * qdev_add_unplug_blocker: Adds an unplug blocker to a device
> + *
> + * @dev: Device to be blocked from unplug
> + * @reason: Reason for blocking
> + *
> + */
> +void qdev_add_unplug_blocker(DeviceState *dev, Error *reason);

Does the caller have to call qdev_del_unplug_blocker() later?

An assert(!dev->unplug_blockers) would be nice when DeviceState is
destroyed. That way leaks will be caught.
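
A minimal sketch of the leak check suggested above, in plain C. Dev,
Blocker, and the dev_*() helpers are hypothetical stand-ins (they are not
QEMU's DeviceState, Error, or GSList). The only point is that the
finalizer asserts the blocker list is empty, so a caller that forgets the
matching delete trips the assert instead of leaking silently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not QEMU's DeviceState, Error or GSList. */
typedef struct Blocker {
    const char *reason;
    struct Blocker *next;
} Blocker;

typedef struct Dev {
    Blocker *unplug_blockers;
} Dev;

/* Prepend a blocker; the caller owns 'b' and must delete it later. */
static void dev_add_unplug_blocker(Dev *dev, Blocker *b)
{
    b->next = dev->unplug_blockers;
    dev->unplug_blockers = b;
}

/* Unlink a blocker, using its address as the lookup handle. */
static void dev_del_unplug_blocker(Dev *dev, Blocker *b)
{
    for (Blocker **p = &dev->unplug_blockers; *p; p = &(*p)->next) {
        if (*p == b) {
            *p = b->next;
            b->next = NULL;
            return;
        }
    }
}

/* Report whether unplug is blocked and, optionally, one reason why. */
static bool dev_unplug_blocked(const Dev *dev, const char **reason)
{
    if (dev->unplug_blockers) {
        if (reason) {
            *reason = dev->unplug_blockers->reason;
        }
        return true;
    }
    return false;
}

/* Destruction path: any blocker still registered here is a leak. */
static void dev_finalize(Dev *dev)
{
    assert(!dev->unplug_blockers);
}
```

With this shape, every add has to be paired with a delete before the
device goes away, which answers the question above by construction.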

> +
> +/**
> + * qdev_del_unplug_blocker: Removes an unplug blocker from a device
> + *
> + * @dev: Device to be unblocked
> + * @reason: Pointer to the Error used with qdev_add_unplug_blocker.
> + *          Used as a handle to lookup the blocker for deletion.
> + *
> + */
> +void qdev_del_unplug_blocker(DeviceState *dev, Error *reason);
> +
> +/**
> + * qdev_unplug_blocked: Confirms if a device is blocked from unplug
> + *
> + * @dev: Device to be tested
> + * @errp: Returns one of the reasons why the device is blocked,
> + *        if any
> + *
> + * Returns: true if device is blocked from unplug, false otherwise
> + *
> + */
> +bool qdev_unplug_blocked(DeviceState *dev, Error **errp);
> +
>  /**
>   * GpioPolarity: Polarity of a GPIO line
>   *
> diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
> index 01f3834db5..69d9cf3f25 100644
> --- a/softmmu/qdev-monitor.c
> +++ b/softmmu/qdev-monitor.c
> @@ -945,10 +945,36 @@ void qmp_device_del(const char *id, Error **errp)
>              return;
>          }
>  
> +        if (qdev_unplug_blocked(dev, errp)) {
> +            return;
> +        }
> +
>          qdev_unplug(dev, errp);
>      }
>  }
>  
> +void qdev_add_unplug_blocker(DeviceState *dev, Error *reason)

These functions belong in hw/core/qdev.c because they are part of the
DeviceState API, not qdev monitor commands?

> +{
> +    dev->unplug_blockers = g_slist_prepend(dev->unplug_blockers, reason);
> +}
> +
> +void qdev_del_unplug_blocker(DeviceState *dev, Error *reason)
> +{
> +    dev->unplug_blockers = g_slist_remove(dev->unplug_blockers, reason);
> +}
> +
> +bool qdev_unplug_blocked(DeviceState *dev, Error **errp)
> +{
> +    ERRP_GUARD();
> +
> +    if (dev->unplug_blockers) {
> +        error_propagate(errp, error_copy(dev->unplug_blockers->data));
> +        return true;
> +    }
> +
> +    return false;
> +}
> +
>  void hmp_device_add(Monitor *mon, const QDict *qdict)
>  {
>      Error *err = NULL;
> -- 
> 2.20.1
> 


* Re: [PATCH v6 03/19] qdev: unplug blocker for devices
  2022-02-17  7:48 ` [PATCH v6 03/19] qdev: unplug blocker for devices Jagannathan Raman
  2022-02-21 15:27   ` Stefan Hajnoczi
@ 2022-02-21 15:30   ` Stefan Hajnoczi
  2022-02-28 19:11     ` Jag Raman
  1 sibling, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-21 15:30 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 526 bytes --]

On Thu, Feb 17, 2022 at 02:48:50AM -0500, Jagannathan Raman wrote:
> diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
> index 01f3834db5..69d9cf3f25 100644
> --- a/softmmu/qdev-monitor.c
> +++ b/softmmu/qdev-monitor.c
> @@ -945,10 +945,36 @@ void qmp_device_del(const char *id, Error **errp)
>              return;
>          }
>  
> +        if (qdev_unplug_blocked(dev, errp)) {
> +            return;
> +        }
> +
>          qdev_unplug(dev, errp);

Can qdev_unplug() check this internally?
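
One way to read this suggestion, sketched with hypothetical types (Dev,
dev_unplug() and monitor_device_del() are illustrative names, not QEMU
functions): if the unplug routine itself refuses blocked devices, every
caller gets the check, not only the monitor command.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device model; not QEMU's DeviceState. */
typedef struct Dev {
    const char *blocker;   /* non-NULL while something depends on the device */
    bool plugged;
} Dev;

/* The check lives inside unplug, so no caller can bypass it. */
static bool dev_unplug(Dev *dev, const char **errp)
{
    assert(dev != NULL);

    if (dev->blocker) {
        if (errp) {
            *errp = dev->blocker;
        }
        return false;
    }

    dev->plugged = false;
    return true;
}

/* Monitor-style entry point: no separate "is it blocked?" call is needed. */
static bool monitor_device_del(Dev *dev, const char **errp)
{
    return dev_unplug(dev, errp);
}
```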


* Re: [PATCH v6 04/19] remote/machine: add HotplugHandler for remote machine
  2022-02-17  7:48 ` [PATCH v6 04/19] remote/machine: add HotplugHandler for remote machine Jagannathan Raman
@ 2022-02-21 15:30   ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-21 15:30 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 450 bytes --]

On Thu, Feb 17, 2022 at 02:48:51AM -0500, Jagannathan Raman wrote:
> Allow hotplugging of PCI(e) devices to remote machine
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/machine.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH v6 05/19] remote/machine: add vfio-user property
  2022-02-17  7:48 ` [PATCH v6 05/19] remote/machine: add vfio-user property Jagannathan Raman
@ 2022-02-21 15:32   ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-21 15:32 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 840 bytes --]

On Thu, Feb 17, 2022 at 02:48:52AM -0500, Jagannathan Raman wrote:
> Add vfio-user to x-remote machine. It is a boolean, which indicates if
> the machine supports vfio-user protocol. The machine configures the bus
> differently for vfio-user and multiprocess protocols, so this property
> informs it on how to configure the bus.
> 
> This property should be short lived. Once vfio-user fully replaces
> multiprocess, this property could be removed.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/hw/remote/machine.h |  2 ++
>  hw/remote/machine.c         | 23 +++++++++++++++++++++++
>  2 files changed, 25 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH v6 07/19] vfio-user: define vfio-user-server object
  2022-02-17  7:48 ` [PATCH v6 07/19] vfio-user: define vfio-user-server object Jagannathan Raman
@ 2022-02-21 15:37   ` Stefan Hajnoczi
  2022-02-28 19:14     ` Jag Raman
  2022-02-25 15:42   ` Eric Blake
  1 sibling, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-21 15:37 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 406 bytes --]

On Thu, Feb 17, 2022 at 02:48:54AM -0500, Jagannathan Raman wrote:
> +struct VfuObjectClass {
> +    ObjectClass parent_class;
> +
> +    unsigned int nr_devs;
> +
> +    /*
> +     * Can be set to shutdown automatically when all server object
> +     * instances are destroyed
> +     */
> +    bool auto_shutdown;

This field is introduced but it is hardcoded to true. Is there a way to
set it to false?


* Re: [PATCH v6 08/19] vfio-user: instantiate vfio-user context
  2022-02-17  7:48 ` [PATCH v6 08/19] vfio-user: instantiate vfio-user context Jagannathan Raman
@ 2022-02-21 15:42   ` Stefan Hajnoczi
  2022-02-28 19:16     ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-21 15:42 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 537 bytes --]

On Thu, Feb 17, 2022 at 02:48:55AM -0500, Jagannathan Raman wrote:
> @@ -124,6 +190,11 @@ static void vfu_object_init(Object *obj)
>  
>      k->nr_devs++;
>  
> +    if (!phase_check(PHASE_MACHINE_READY)) {
> +        o->machine_done.notify = vfu_object_machine_done;
> +        qemu_add_machine_init_done_notifier(&o->machine_done);

This probably has to happen after the next if statement since
qemu_add_machine_init_done_notifier() can immediately call ->notify()
and we'd try to initialize on a non-remote machine type.


* Re: [PATCH v6 09/19] vfio-user: find and init PCI device
  2022-02-17  7:48 ` [PATCH v6 09/19] vfio-user: find and init PCI device Jagannathan Raman
@ 2022-02-21 15:57   ` Stefan Hajnoczi
  2022-02-28 19:17     ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-21 15:57 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 792 bytes --]

On Thu, Feb 17, 2022 at 02:48:56AM -0500, Jagannathan Raman wrote:
> @@ -221,6 +272,14 @@ static void vfu_object_finalize(Object *obj)
>  
>      o->device = NULL;
>  
> +    if (o->unplug_blocker && o->pci_dev) {
> +        qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
> +        error_free(o->unplug_blocker);
> +        o->unplug_blocker = NULL;
> +    }
> +
> +    o->pci_dev = NULL;

Since we don't hold a reference to o->pci_dev there is an assumption
about the order of --object vs --device ->finalize() here. I think it
will work because softmmu/runstate.c:qemu_cleanup() doesn't unref
main_system_bus and only --object ->finalize() is called, but this seems
fragile. We should probably hold a reference to pci_dev and call
object_unref() on it.
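
A rough sketch of the reference-holding idea, using a hypothetical
refcounted Obj in place of QEMU's Object (obj_ref()/obj_unref(), Server,
server_attach() and server_finalize() are illustrative names only). The
server takes its own reference when it latches onto the device, so the
device stays alive until the server's finalizer has run, whatever order
the two are torn down in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object; a stand-in for a QOM object. */
typedef struct Obj {
    unsigned refcnt;
    bool finalized;
} Obj;

static void obj_ref(Obj *o)
{
    o->refcnt++;
}

static void obj_unref(Obj *o)
{
    assert(o->refcnt > 0);
    if (--o->refcnt == 0) {
        o->finalized = true;   /* stand-in for freeing the object */
    }
}

typedef struct Server {
    Obj *pci_dev;
} Server;

/* Take a reference for as long as the server points at the device. */
static void server_attach(Server *s, Obj *dev)
{
    obj_ref(dev);
    s->pci_dev = dev;
}

/* Safe regardless of teardown order: the device is still alive here. */
static void server_finalize(Server *s)
{
    if (s->pci_dev) {
        /* the unplug blocker would be removed here, before the unref */
        obj_unref(s->pci_dev);
        s->pci_dev = NULL;
    }
}
```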


* Re: [PATCH v6 10/19] vfio-user: run vfio-user context
  2022-02-17  7:48 ` [PATCH v6 10/19] vfio-user: run vfio-user context Jagannathan Raman
@ 2022-02-22 10:13   ` Stefan Hajnoczi
  2022-02-25 16:06   ` Eric Blake
  1 sibling, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-22 10:13 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

[-- Attachment #1: Type: text/plain, Size: 679 bytes --]

On Thu, Feb 17, 2022 at 02:48:57AM -0500, Jagannathan Raman wrote:
> Setup a handler to run vfio-user context. The context is driven by
> messages to the file descriptor associated with it - get the fd for
> the context and hook up the handler with it
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  qapi/misc.json            | 23 ++++++++++
>  hw/remote/vfio-user-obj.c | 96 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 118 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [PATCH v6 12/19] vfio-user: IOMMU support for remote device
  2022-02-17  7:48 ` [PATCH v6 12/19] vfio-user: IOMMU support for remote device Jagannathan Raman
@ 2022-02-22 10:40   ` Stefan Hajnoczi
  2022-02-28 19:54     ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-22 10:40 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

On Thu, Feb 17, 2022 at 02:48:59AM -0500, Jagannathan Raman wrote:
> +struct RemoteIommuElem {
> +    AddressSpace  as;
> +    MemoryRegion  mr;
> +};
> +
> +GHashTable *remote_iommu_elem_by_bdf;

A mutable global hash table requires synchronization when device
emulation runs in multiple threads.

I suggest using pci_setup_iommu()'s iommu_opaque argument to avoid the
global. If there is only 1 device per remote PCI bus, then there are no
further synchronization concerns.
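
A minimal sketch of that idea, using stand-in types rather than QEMU's real
PCIBus and AddressSpace. Every name below is hypothetical; the setup helper is
only loosely modeled on the callback-plus-opaque shape of pci_setup_iommu():

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for QEMU's AddressSpace; hypothetical, not the real type. */
typedef struct AddressSpace {
    const char *name;
} AddressSpace;

/* Per-bus IOMMU state that takes the place of the global hash table. */
typedef struct RemoteIommu {
    AddressSpace as;
} RemoteIommu;

typedef AddressSpace *(*IommuFn)(void *opaque, int devfn);

/* Stand-in for PCIBus: it only remembers the callback and its opaque. */
typedef struct Bus {
    IommuFn iommu_fn;
    void *iommu_opaque;
} Bus;

/* Register the callback together with the state it should operate on. */
static void bus_setup_iommu(Bus *bus, IommuFn fn, void *opaque)
{
    bus->iommu_fn = fn;
    bus->iommu_opaque = opaque;
}

/* The callback reads only the state handed to it; no global is touched. */
static AddressSpace *remote_iommu_find_as(void *opaque, int devfn)
{
    RemoteIommu *iommu = opaque;

    (void)devfn; /* one device per bus, so devfn selects nothing here */
    return &iommu->as;
}

/* What a DMA path would call to resolve the address space for a device. */
static AddressSpace *bus_get_address_space(Bus *bus, int devfn)
{
    return bus->iommu_fn(bus->iommu_opaque, devfn);
}
```

With one RemoteIommu per bus, each lookup touches only that bus's own state,
so no shared table has to be synchronized between threads.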

> +
> +#define INT2VOIDP(i) (void *)(uintptr_t)(i)
> +
> +static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
> +                                              void *opaque, int devfn)
> +{
> +    struct RemoteIommuElem *elem = NULL;
> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_bus), devfn);
> +
> +    if (!remote_iommu_elem_by_bdf) {
> +        return &address_space_memory;
> +    }

When can this happen? remote_configure_iommu() allocates
remote_iommu_elem_by_bdf so it should always be non-NULL.

* Re: [PATCH v6 14/19] vfio-user: handle PCI BAR accesses
  2022-02-17  7:49 ` [PATCH v6 14/19] vfio-user: handle PCI BAR accesses Jagannathan Raman
@ 2022-02-22 11:04   ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-22 11:04 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	Jagannathan Raman, eblake, dgilbert

On Thu, Feb 17, 2022 at 02:49:01AM -0500, Jagannathan Raman wrote:
> Determine the BARs used by the PCI device and register handlers to
> manage the access to the same.

Hi Paolo,
Please review this from the memory API perspective. vfu_object_bar_rw()
reimplements MemoryRegion read/write because we're dispatching to a
MemoryRegion without going through an AddressSpace/FlatView.
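
For reference while reviewing, here is a simplified, self-contained model of
how the dispatch loop splits one request into individual accesses. It is a
sketch under assumptions: chunk_size() approximates the sizing rule of
memory_access_size() for a region that does not implement unaligned accesses,
and all names below are hypothetical stand-ins, not QEMU code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest power of two that is <= x (x must be non-zero). */
static unsigned pow2_floor(unsigned x)
{
    unsigned p = 1;

    while (x >>= 1) {
        p <<= 1;
    }
    return p;
}

/*
 * Size of the next access: capped by max_size (0 is treated as 4) and by
 * the natural alignment of addr, then rounded down to a power of two.
 */
static unsigned chunk_size(unsigned len, uint64_t addr, unsigned max_size)
{
    uint64_t align = addr & (~addr + 1); /* lowest set bit of addr */

    if (max_size == 0) {
        max_size = 4;
    }
    if (align != 0 && align < max_size) {
        max_size = (unsigned)align;
    }
    if (len > max_size) {
        len = max_size;
    }
    return pow2_floor(len);
}

/* Walk a request the way the loop does and count the accesses issued. */
static size_t count_accesses(uint64_t addr, unsigned len, unsigned max_size)
{
    size_t n = 0;

    while (len > 0) {
        unsigned sz = chunk_size(len, addr, max_size);

        len -= sz;
        addr += sz;
        n++;
    }
    return n;
}
```

In this model a 7-byte request at offset 0x1002 with a 4-byte maximum is
issued as accesses of 2, 4 and 1 bytes.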

> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/exec/memory.h           |   3 +
>  hw/remote/vfio-user-obj.c       | 166 ++++++++++++++++++++++++++++++++
>  softmmu/physmem.c               |   4 +-
>  tests/qtest/fuzz/generic_fuzz.c |   9 +-
>  hw/remote/trace-events          |   3 +
>  5 files changed, 179 insertions(+), 6 deletions(-)
> 
> diff --git a/include/exec/memory.h b/include/exec/memory.h
> index 4d5997e6bb..4b061e62d5 100644
> --- a/include/exec/memory.h
> +++ b/include/exec/memory.h
> @@ -2810,6 +2810,9 @@ MemTxResult address_space_write_cached_slow(MemoryRegionCache *cache,
>                                              hwaddr addr, const void *buf,
>                                              hwaddr len);
>  
> +int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr);
> +bool prepare_mmio_access(MemoryRegion *mr);
> +
>  static inline bool memory_access_is_direct(MemoryRegion *mr, bool is_write)
>  {
>      if (is_write) {
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 971f6ca28e..2feabd06a4 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -53,6 +53,7 @@
>  #include "hw/qdev-core.h"
>  #include "hw/pci/pci.h"
>  #include "qemu/timer.h"
> +#include "exec/memory.h"
>  
>  #define TYPE_VFU_OBJECT "x-vfio-user-server"
>  OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> @@ -299,6 +300,169 @@ static void dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
>      trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
>  }
>  
> +static size_t vfu_object_bar_rw(PCIDevice *pci_dev, int pci_bar,
> +                                hwaddr offset, char * const buf,
> +                                hwaddr len, const bool is_write)
> +{
> +    uint8_t *ptr = (uint8_t *)buf;
> +    uint8_t *ram_ptr = NULL;
> +    bool release_lock = false;
> +    MemoryRegionSection section = { 0 };
> +    MemoryRegion *mr = NULL;
> +    int access_size;
> +    hwaddr size = 0;
> +    MemTxResult result;
> +    uint64_t val;
> +
> +    section = memory_region_find(pci_dev->io_regions[pci_bar].memory,
> +                                 offset, len);
> +
> +    if (!section.mr) {
> +        return 0;
> +    }
> +
> +    mr = section.mr;
> +
> +    if (is_write && mr->readonly) {
> +        warn_report("vfu: attempting to write to readonly region in "
> +                    "bar %d - [0x%"PRIx64" - 0x%"PRIx64"]",
> +                    pci_bar, offset, (offset + len));
> +        return 0;
> +    }
> +
> +    if (memory_access_is_direct(mr, is_write)) {
> +        /**
> +         * Some devices expose a PCI expansion ROM, which could be buffer
> +         * based as compared to other regions which are primarily based on
> +         * MemoryRegionOps. memory_region_find() would already check
> +         * for buffer overflow, we don't need to repeat it here.
> +         */
> +        ram_ptr = memory_region_get_ram_ptr(mr);
> +
> +        size = len;
> +
> +        if (is_write) {
> +            memcpy(ram_ptr, buf, size);
> +        } else {
> +            memcpy(buf, ram_ptr, size);
> +        }
> +
> +        goto exit;
> +    }
> +
> +    while (len > 0) {
> +        /**
> +         * The read/write logic used below is similar to the ones in
> +         * flatview_read/write_continue()
> +         */
> +        release_lock = prepare_mmio_access(mr);
> +
> +        access_size = memory_access_size(mr, len, offset);
> +
> +        if (is_write) {
> +            val = ldn_he_p(ptr, access_size);
> +
> +            result = memory_region_dispatch_write(mr, offset, val,
> +                                                  size_memop(access_size),
> +                                                  MEMTXATTRS_UNSPECIFIED);
> +        } else {
> +            result = memory_region_dispatch_read(mr, offset, &val,
> +                                                 size_memop(access_size),
> +                                                 MEMTXATTRS_UNSPECIFIED);
> +
> +            stn_he_p(ptr, access_size, val);
> +        }
> +
> +        if (release_lock) {
> +            qemu_mutex_unlock_iothread();
> +            release_lock = false;
> +        }
> +
> +        if (result != MEMTX_OK) {
> +            warn_report("vfu: failed to %s 0x%"PRIx64"",
> +                        is_write ? "write to" : "read from",
> +                        (offset - size));
> +
> +            goto exit;
> +        }
> +
> +        len -= access_size;
> +        size += access_size;
> +        ptr += access_size;
> +        offset += access_size;
> +    }
> +
> +exit:
> +    memory_region_unref(mr);
> +
> +    return size;
> +}
> +
> +/**
> + * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
> + *
> + * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
> + * define vfu_object_bar2_handler
> + */
> +#define VFU_OBJECT_BAR_HANDLER(BAR_NO)                                         \
> +    static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,        \
> +                                        char * const buf, size_t count,        \
> +                                        loff_t offset, const bool is_write)    \
> +    {                                                                          \
> +        VfuObject *o = vfu_get_private(vfu_ctx);                               \
> +        PCIDevice *pci_dev = o->pci_dev;                                       \
> +                                                                               \
> +        return vfu_object_bar_rw(pci_dev, BAR_NO, offset,                      \
> +                                 buf, count, is_write);                        \
> +    }                                                                          \
> +
> +VFU_OBJECT_BAR_HANDLER(0)
> +VFU_OBJECT_BAR_HANDLER(1)
> +VFU_OBJECT_BAR_HANDLER(2)
> +VFU_OBJECT_BAR_HANDLER(3)
> +VFU_OBJECT_BAR_HANDLER(4)
> +VFU_OBJECT_BAR_HANDLER(5)
> +VFU_OBJECT_BAR_HANDLER(6)
> +
> +static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
> +    &vfu_object_bar0_handler,
> +    &vfu_object_bar1_handler,
> +    &vfu_object_bar2_handler,
> +    &vfu_object_bar3_handler,
> +    &vfu_object_bar4_handler,
> +    &vfu_object_bar5_handler,
> +    &vfu_object_bar6_handler,
> +};
> +
> +/**
> + * vfu_object_register_bars - Identify active BAR regions of pdev and setup
> + *                            callbacks to handle read/write accesses
> + */
> +static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
> +{
> +    int flags = VFU_REGION_FLAG_RW;
> +    int i;
> +
> +    for (i = 0; i < PCI_NUM_REGIONS; i++) {
> +        if (!pdev->io_regions[i].size) {
> +            continue;
> +        }
> +
> +        if ((i == VFU_PCI_DEV_ROM_REGION_IDX) ||
> +            pdev->io_regions[i].memory->readonly) {
> +            flags &= ~VFU_REGION_FLAG_WRITE;
> +        }
> +
> +        vfu_setup_region(vfu_ctx, VFU_PCI_DEV_BAR0_REGION_IDX + i,
> +                         (size_t)pdev->io_regions[i].size,
> +                         vfu_object_bar_handlers[i],
> +                         flags, NULL, 0, -1, 0);
> +
> +        trace_vfu_bar_register(i, pdev->io_regions[i].addr,
> +                               pdev->io_regions[i].size);
> +    }
> +}
> +
>  /*
>   * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>   * properties. It also depends on devices instantiated in QEMU. These
> @@ -393,6 +557,8 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>          goto fail;
>      }
>  
> +    vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
> +
>      ret = vfu_realize_ctx(o->vfu_ctx);
>      if (ret < 0) {
>          error_setg(errp, "vfu: Failed to realize device %s- %s",
> diff --git a/softmmu/physmem.c b/softmmu/physmem.c
> index dddf70edf5..3188d4e143 100644
> --- a/softmmu/physmem.c
> +++ b/softmmu/physmem.c
> @@ -2717,7 +2717,7 @@ void memory_region_flush_rom_device(MemoryRegion *mr, hwaddr addr, hwaddr size)
>      invalidate_and_set_dirty(mr, addr, size);
>  }
>  
> -static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
> +int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
>  {
>      unsigned access_size_max = mr->ops->valid.max_access_size;
>  
> @@ -2744,7 +2744,7 @@ static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
>      return l;
>  }
>  
> -static bool prepare_mmio_access(MemoryRegion *mr)
> +bool prepare_mmio_access(MemoryRegion *mr)
>  {
>      bool release_lock = false;
>  
> diff --git a/tests/qtest/fuzz/generic_fuzz.c b/tests/qtest/fuzz/generic_fuzz.c
> index dd7e25851c..77547fc1d8 100644
> --- a/tests/qtest/fuzz/generic_fuzz.c
> +++ b/tests/qtest/fuzz/generic_fuzz.c
> @@ -144,7 +144,7 @@ static void *pattern_alloc(pattern p, size_t len)
>      return buf;
>  }
>  
> -static int memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
> +static int fuzz_memory_access_size(MemoryRegion *mr, unsigned l, hwaddr addr)
>  {
>      unsigned access_size_max = mr->ops->valid.max_access_size;
>  
> @@ -242,11 +242,12 @@ void fuzz_dma_read_cb(size_t addr, size_t len, MemoryRegion *mr)
>  
>          /*
>           *  If mr1 isn't RAM, address_space_translate doesn't update l. Use
> -         *  memory_access_size to identify the number of bytes that it is safe
> -         *  to write without accidentally writing to another MemoryRegion.
> +         *  fuzz_memory_access_size to identify the number of bytes that it
> +         *  is safe to write without accidentally writing to another
> +         *  MemoryRegion.
>           */
>          if (!memory_region_is_ram(mr1)) {
> -            l = memory_access_size(mr1, l, addr1);
> +            l = fuzz_memory_access_size(mr1, l, addr1);
>          }
>          if (memory_region_is_ram(mr1) ||
>              memory_region_is_romd(mr1) ||
> diff --git a/hw/remote/trace-events b/hw/remote/trace-events
> index f945c7e33b..847d50d88f 100644
> --- a/hw/remote/trace-events
> +++ b/hw/remote/trace-events
> @@ -9,3 +9,6 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
>  vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
>  vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
>  vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
> +vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
> +vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
> +vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
> -- 
> 2.20.1
> 

* Re: [PATCH v6 11/19] vfio-user: handle PCI config space accesses
  2022-02-17  7:48 ` [PATCH v6 11/19] vfio-user: handle PCI config space accesses Jagannathan Raman
@ 2022-02-22 11:09   ` Stefan Hajnoczi
  2022-02-28 19:23     ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-02-22 11:09 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

On Thu, Feb 17, 2022 at 02:48:58AM -0500, Jagannathan Raman wrote:
> Define and register handlers for PCI config space accesses
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  hw/remote/vfio-user-obj.c | 45 +++++++++++++++++++++++++++++++++++++++
>  hw/remote/trace-events    |  2 ++
>  2 files changed, 47 insertions(+)

hw/pci/pci.c:pci_update_mappings() will unmap/map BARs when the
vfio-user client touches BARs. Please add a comment that the remote
machine type never dispatches memory accesses in the global memory address
space and therefore we don't care that multiple remote devices may set
up conflicting Memory and I/O Space BARs.

* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-20  8:27           ` Paolo Bonzini
  2022-02-20 13:27             ` Paolo Bonzini
@ 2022-02-22 19:05             ` Jag Raman
  2022-02-24 17:52               ` Paolo Bonzini
  1 sibling, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-02-22 19:05 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: eduardo, Peter Maydell, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, John Levon, Juan Quintela, qemu-devel,
	Elena Ufimtseva, Markus Armbruster, Alex Williamson,
	Michael S. Tsirkin, Stefan Hajnoczi, Thanos Makatos, Eric Blake,
	Kanth Ghatraju, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé



> On Feb 20, 2022, at 3:27 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 2/18/22 15:49, Jag Raman wrote:
>> Concerning the generated files, I see the following in CMakeMesonToolchainFile.cmake:
>> Without patch: set(CMAKE_C_COMPILER "/opt/rh/devtoolset-9/root/usr/bin/cc" "-m64" "-mcx16")
>> With patch: set(CMAKE_C_COMPILER "cc" "-m64" "-mcx16")
> 
> I don't understand why it works at all with the latter, but the right solution
> could be
> 
> set(CMAKE_C_COMPILER "/opt/rh/devtoolset-9/root/usr/bin/cc")
> set(CMAKE_C_COMPILER_ARG1 "-m64")
> set(CMAKE_C_COMPILER_ARG2 "-mcx16")
> 
> Perhaps you can try the following patch to meson (patch it in qemu's build
> directory and make sure to use --meson=internal):
> 
> diff --git a/mesonbuild/cmake/toolchain.py b/mesonbuild/cmake/toolchain.py
> index 316f57cb5..9756864ee 100644
> --- a/mesonbuild/cmake/toolchain.py
> +++ b/mesonbuild/cmake/toolchain.py
> @@ -191,11 +191,14 @@ class CMakeToolchain:
>                 continue
>              if len(exe_list) >= 2 and not self.is_cmdline_option(comp_obj, exe_list[1]):
> -                defaults[prefix + 'COMPILER_LAUNCHER'] = [make_abs(exe_list[0])]
> +                defaults[f'{prefix}COMPILER_LAUNCHER'] = [make_abs(exe_list[0])]
>                 exe_list = exe_list[1:]
>              exe_list[0] = make_abs(exe_list[0])
> -            defaults[prefix + 'COMPILER'] = exe_list
> +            defaults[f'{prefix}COMPILER'] = [exe_list[0]]
> +            for i in range(1, len(exe_list)):
> +                defaults[f'{prefix}COMPILER_ARG{i}'] = [exe_list[i]]
> +
>             if comp_obj.get_id() == 'clang-cl':
>                 defaults['CMAKE_LINKER'] = comp_obj.get_linker_exelist()

This fix works at my end.

Thank you!
--
Jag

> 
> 
> Thanks,
> 
> Paolo


* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-22 19:05             ` Jag Raman
@ 2022-02-24 17:52               ` Paolo Bonzini
  2022-02-25  4:03                 ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2022-02-24 17:52 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Peter Maydell, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, John Levon, Juan Quintela, qemu-devel,
	Elena Ufimtseva, Markus Armbruster, Alex Williamson,
	Michael S. Tsirkin, Stefan Hajnoczi, Thanos Makatos, Eric Blake,
	Kanth Ghatraju, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé

On 2/22/22 20:05, Jag Raman wrote:
>> -            defaults[prefix + 'COMPILER'] = exe_list
>> +            defaults[f'{prefix}COMPILER'] = [exe_list[0]]
>> +            for i in range(1, len(exe_list)):
>> +                defaults[f'{prefix}COMPILER_ARG{i}'] = [exe_list[i]]
>> +
>>              if comp_obj.get_id() == 'clang-cl':
>>                  defaults['CMAKE_LINKER'] = comp_obj.get_linker_exelist()
> This fix works at my end.

Would you please check that -m64 and -mcx16 are passed indeed to the 
compiler?

Paolo


* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-24 17:52               ` Paolo Bonzini
@ 2022-02-25  4:03                 ` Jag Raman
  2022-02-28 18:12                   ` Paolo Bonzini
  0 siblings, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-02-25  4:03 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: eduardo, Peter Maydell, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, John Levon, Juan Quintela, qemu-devel,
	Elena Ufimtseva, Markus Armbruster, Alex Williamson,
	Michael S. Tsirkin, Stefan Hajnoczi, Thanos Makatos, Eric Blake,
	Kanth Ghatraju, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé



> On Feb 24, 2022, at 12:52 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 2/22/22 20:05, Jag Raman wrote:
>>> -            defaults[prefix + 'COMPILER'] = exe_list
>>> +            defaults[f'{prefix}COMPILER'] = [exe_list[0]]
>>> +            for i in range(1, len(exe_list)):
>>> +                defaults[f'{prefix}COMPILER_ARG{i}'] = [exe_list[i]]
>>> +
>>>             if comp_obj.get_id() == 'clang-cl':
>>>                 defaults['CMAKE_LINKER'] = comp_obj.get_linker_exelist()
>> This fix works at my end.
> 
> Would you please check that -m64 and -mcx16 are passed indeed to the compiler?

Hi Paolo,

Yes, I’m able to see that -m64 and -mcx16 are passed to the compiler.

# cat ./subprojects/libvfio-user/__CMake_build/CMakeMesonToolchainFile.cmake
…
set(CMAKE_C_COMPILER "/opt/rh/devtoolset-9/root/usr/bin/cc")
set(CMAKE_C_COMPILER_ARG1 "-m64")
set(CMAKE_C_COMPILER_ARG2 "-mcx16")
…

Full log here: https://pastebin.com/PEwNSWMn

Thank you!
--
Jag

> 
> Paolo


* Re: [PATCH v6 07/19] vfio-user: define vfio-user-server object
  2022-02-17  7:48 ` [PATCH v6 07/19] vfio-user: define vfio-user-server object Jagannathan Raman
  2022-02-21 15:37   ` Stefan Hajnoczi
@ 2022-02-25 15:42   ` Eric Blake
  1 sibling, 0 replies; 76+ messages in thread
From: Eric Blake @ 2022-02-25 15:42 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, dgilbert, stefanha,
	thanos.makatos, pbonzini

On Thu, Feb 17, 2022 at 02:48:54AM -0500, Jagannathan Raman wrote:
> Define vfio-user object which is remote process server for QEMU. Setup
> object initialization functions and properties necessary to instantiate
> the object
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  qapi/qom.json             |  20 +++-
>  hw/remote/vfio-user-obj.c | 194 ++++++++++++++++++++++++++++++++++++++
>  MAINTAINERS               |   1 +
>  hw/remote/meson.build     |   1 +
>  hw/remote/trace-events    |   3 +
>  5 files changed, 217 insertions(+), 2 deletions(-)
>  create mode 100644 hw/remote/vfio-user-obj.c
> 
> diff --git a/qapi/qom.json b/qapi/qom.json
> index eeb5395ff3..ff266e4732 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -703,6 +703,20 @@
>  { 'struct': 'RemoteObjectProperties',
>    'data': { 'fd': 'str', 'devid': 'str' } }
>  
> +##
> +# @VfioUserServerProperties:
> +#
> +# Properties for x-vfio-user-server objects.
> +#
> +# @socket: socket to be used by the libvfiouser library
> +#
> +# @device: the id of the device to be emulated at the server
> +#
> +# Since: 6.3

The next release is 7.0, not 6.3.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: [PATCH v6 10/19] vfio-user: run vfio-user context
  2022-02-17  7:48 ` [PATCH v6 10/19] vfio-user: run vfio-user context Jagannathan Raman
  2022-02-22 10:13   ` Stefan Hajnoczi
@ 2022-02-25 16:06   ` Eric Blake
  2022-02-28 19:22     ` Jag Raman
  1 sibling, 1 reply; 76+ messages in thread
From: Eric Blake @ 2022-02-25 16:06 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, dgilbert, stefanha,
	thanos.makatos, pbonzini

On Thu, Feb 17, 2022 at 02:48:57AM -0500, Jagannathan Raman wrote:
> Setup a handler to run vfio-user context. The context is driven by
> messages to the file descriptor associated with it - get the fd for
> the context and hook up the handler with it
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  qapi/misc.json            | 23 ++++++++++
>  hw/remote/vfio-user-obj.c | 96 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 118 insertions(+), 1 deletion(-)
> 
> diff --git a/qapi/misc.json b/qapi/misc.json
> index e8054f415b..9d7f12ab04 100644
> --- a/qapi/misc.json
> +++ b/qapi/misc.json
> @@ -527,3 +527,26 @@
>   'data': { '*option': 'str' },
>   'returns': ['CommandLineOptionInfo'],
>   'allow-preconfig': true }
> +
> +##
> +# @VFU_CLIENT_HANGUP:
> +#
> +# Emitted when the client of a TYPE_VFIO_USER_SERVER closes the
> +# communication channel
> +#
> +# @id: ID of the TYPE_VFIO_USER_SERVER object
> +#
> +# @device: ID of attached PCI device
> +#
> +# Since: 6.3

7.0

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org



* Re: [PATCH v6 03/19] qdev: unplug blocker for devices
  2022-02-21 15:27   ` Stefan Hajnoczi
@ 2022-02-28 16:23     ` Jag Raman
  0 siblings, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-28 16:23 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Feb 21, 2022, at 10:27 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:48:50AM -0500, Jagannathan Raman wrote:
>> Add blocker to prevent hot-unplug of devices
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> include/hw/qdev-core.h | 35 +++++++++++++++++++++++++++++++++++
>> softmmu/qdev-monitor.c | 26 ++++++++++++++++++++++++++
>> 2 files changed, 61 insertions(+)
>> 
>> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
>> index 92c3d65208..4b1d77f44a 100644
>> --- a/include/hw/qdev-core.h
>> +++ b/include/hw/qdev-core.h
>> @@ -193,6 +193,7 @@ struct DeviceState {
>>     int instance_id_alias;
>>     int alias_required_for_version;
>>     ResettableState reset;
>> +    GSList *unplug_blockers;
>> };
>> 
>> struct DeviceListener {
>> @@ -419,6 +420,40 @@ void qdev_simple_device_unplug_cb(HotplugHandler *hotplug_dev,
>> void qdev_machine_creation_done(void);
>> bool qdev_machine_modified(void);
>> 
>> +/**
>> + * Device Unplug blocker: prevents a device from being unplugged. It could
>      ^^^^^^^^^^^^^^^^^^^^^
> 
> This looks strange. gtkdoc will probably treat it as the doc comment for
> qdev_add_unplug_blocker(), which is actually defined below. I suggest
> not trying to define a new section in the documentation and instead just
> focussing on doc comments for qdev_add_unplug_blocker() and other
> functions.

Sorry, I assumed that we needed an extra ‘*’ at the beginning of the
comment. I got this idea when checking out block.c and blockdev.c
while working on the migration patch.

I’ll follow the “Comment style” section in “docs/devel/style.rst”.

> 
> The gtkdoc way of defining sections is covered here but it's almost
> never used in QEMU:
> https://developer-old.gnome.org/gtk-doc-manual/stable/documenting_sections.html.en
> 
>> + * be used to indicate that another object depends on the device.
>> + *
>> + * qdev_add_unplug_blocker: Adds an unplug blocker to a device
>> + *
>> + * @dev: Device to be blocked from unplug
>> + * @reason: Reason for blocking
>> + *
>> + */
>> +void qdev_add_unplug_blocker(DeviceState *dev, Error *reason);
> 
> Does the caller have to call qdev_del_unplug_blocker() later?
> 
> An assert(!dev->unplug_blockers) would be nice when DeviceState is
> destroyed. That way leaks will be caught.

Makes sense, will add.
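
A rough sketch of what that check could look like, modeled with a plain linked
list instead of GSList and a string instead of Error. All names below are
hypothetical stand-ins, not the final QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for a GSList node holding an Error pointer; hypothetical. */
typedef struct Blocker {
    const char *reason;
    struct Blocker *next;
} Blocker;

/* Stand-in for DeviceState; only the field this patch adds. */
typedef struct Device {
    Blocker *unplug_blockers;
} Device;

static void dev_add_unplug_blocker(Device *dev, const char *reason)
{
    Blocker *b = malloc(sizeof(*b));

    assert(b != NULL);
    b->reason = reason;
    b->next = dev->unplug_blockers;
    dev->unplug_blockers = b;
}

/* The reason pointer doubles as the lookup handle, as in the patch. */
static void dev_del_unplug_blocker(Device *dev, const char *reason)
{
    Blocker **p = &dev->unplug_blockers;

    while (*p) {
        if ((*p)->reason == reason) {
            Blocker *dead = *p;

            *p = dead->next;
            free(dead);
            return;
        }
        p = &(*p)->next;
    }
}

static bool dev_unplug_blocked(const Device *dev)
{
    return dev->unplug_blockers != NULL;
}

/* Teardown: a blocker still registered here means a caller leaked it. */
static void dev_finalize(Device *dev)
{
    assert(!dev->unplug_blockers);
}
```

Every dev_add_unplug_blocker() then needs a matching
dev_del_unplug_blocker() before teardown, otherwise the assert fires.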

> 
>> +
>> +/**
>> + * qdev_del_unplug_blocker: Removes an unplug blocker from a device
>> + *
>> + * @dev: Device to be unblocked
>> + * @reason: Pointer to the Error used with qdev_add_unplug_blocker.
>> + *          Used as a handle to lookup the blocker for deletion.
>> + *
>> + */
>> +void qdev_del_unplug_blocker(DeviceState *dev, Error *reason);
>> +
>> +/**
>> + * qdev_unplug_blocked: Confirms if a device is blocked from unplug
>> + *
>> + * @dev: Device to be tested
>> + * @reason: Returns one of the reasons why the device is blocked,
>> + *          if any
>> + *
>> + * Returns: true if device is blocked from unplug, false otherwise
>> + *
>> + */
>> +bool qdev_unplug_blocked(DeviceState *dev, Error **errp);
>> +
>> /**
>>  * GpioPolarity: Polarity of a GPIO line
>>  *
>> diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
>> index 01f3834db5..69d9cf3f25 100644
>> --- a/softmmu/qdev-monitor.c
>> +++ b/softmmu/qdev-monitor.c
>> @@ -945,10 +945,36 @@ void qmp_device_del(const char *id, Error **errp)
>>             return;
>>         }
>> 
>> +        if (qdev_unplug_blocked(dev, errp)) {
>> +            return;
>> +        }
>> +
>>         qdev_unplug(dev, errp);
>>     }
>> }
>> 
>> +void qdev_add_unplug_blocker(DeviceState *dev, Error *reason)
> 
> These functions belong in hw/core/qdev.c because they are part of the
> DeviceState API, not qdev monitor commands?

Both hw/core/qdev.c and softmmu/qdev-monitor.c seem to manage the
DeviceState.

softmmu/qdev-monitor.c seems to manage device addition and
removal using qdev_device_add() and qdev_unplug(). I noticed
that some functions in this file change DeviceState. For example,
qdev_device_add() sets DeviceState->opts, qdev_set_id() sets
DeviceState->id. Given the above two reasons, I thought the
unplug blockers could be better placed here.

Since hw/core/qdev.c makes the majority of changes to the
DeviceState, moving the unplug blockers over there makes
sense to me.

Thank you!
--
Jag

> 
>> +{
>> +    dev->unplug_blockers = g_slist_prepend(dev->unplug_blockers, reason);
>> +}
>> +
>> +void qdev_del_unplug_blocker(DeviceState *dev, Error *reason)
>> +{
>> +    dev->unplug_blockers = g_slist_remove(dev->unplug_blockers, reason);
>> +}
>> +
>> +bool qdev_unplug_blocked(DeviceState *dev, Error **errp)
>> +{
>> +    ERRP_GUARD();
>> +
>> +    if (dev->unplug_blockers) {
>> +        error_propagate(errp, error_copy(dev->unplug_blockers->data));
>> +        return true;
>> +    }
>> +
>> +    return false;
>> +}
>> +
>> void hmp_device_add(Monitor *mon, const QDict *qdict)
>> {
>>     Error *err = NULL;
>> -- 
>> 2.20.1
>> 


* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-25  4:03                 ` Jag Raman
@ 2022-02-28 18:12                   ` Paolo Bonzini
  2022-02-28 19:55                     ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Paolo Bonzini @ 2022-02-28 18:12 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Peter Maydell, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, John Levon, Juan Quintela, qemu-devel,
	Elena Ufimtseva, Markus Armbruster, Alex Williamson,
	Michael S. Tsirkin, Stefan Hajnoczi, Thanos Makatos, Eric Blake,
	Kanth Ghatraju, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé

On 2/25/22 05:03, Jag Raman wrote:
> 
> 
>> On Feb 24, 2022, at 12:52 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>
>> On 2/22/22 20:05, Jag Raman wrote:
>>>> -            defaults[prefix + 'COMPILER'] = exe_list
>>>> +            defaults[f'{prefix}COMPILER'] = [exe_list[0]]
>>>> +            for i in range(1, len(exe_list)):
>>>> +                defaults[f'{prefix}COMPILER_ARG{i}'] = [exe_list[i]]
>>>> +
>>>>              if comp_obj.get_id() == 'clang-cl':
>>>>                  defaults['CMAKE_LINKER'] = comp_obj.get_linker_exelist()
>>> This fix works at my end.
>>
>> Would you please check that -m64 and -mcx16 are passed indeed to the compiler?
> 
> Hi Paolo,
> 
> Yes, I’m able to see that -m64 and -mcx16 are passed to the compiler.
> 
> # cat ./subprojects/libvfio-user/__CMake_build/CMakeMesonToolchainFile.cmake
> …
> set(CMAKE_C_COMPILER "/opt/rh/devtoolset-9/root/usr/bin/cc")
> set(CMAKE_C_COMPILER_ARG1 "-m64")
> set(CMAKE_C_COMPILER_ARG2 "-mcx16")
> …

I reproduced this with CMake 3.17.x and it's fixed by 3.19.x.  I'll send 
the fix to Meson, but for now I recommend requiring a newer CMake and 
just dropping this patch.

Paolo


* Re: [PATCH v6 03/19] qdev: unplug blocker for devices
  2022-02-21 15:30   ` Stefan Hajnoczi
@ 2022-02-28 19:11     ` Jag Raman
  0 siblings, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-28 19:11 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Feb 21, 2022, at 10:30 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:48:50AM -0500, Jagannathan Raman wrote:
>> diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
>> index 01f3834db5..69d9cf3f25 100644
>> --- a/softmmu/qdev-monitor.c
>> +++ b/softmmu/qdev-monitor.c
>> @@ -945,10 +945,36 @@ void qmp_device_del(const char *id, Error **errp)
>>             return;
>>         }
>> 
>> +        if (qdev_unplug_blocked(dev, errp)) {
>> +            return;
>> +        }
>> +
>>         qdev_unplug(dev, errp);
> 
> Can qdev_unplug() check this internally?

Yes, I think qdev_unplug() could check this internally. Will move it there.
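
To illustrate the direction, here is a minimal, self-contained sketch of
the idea only. ToyDevice, toy_unplug_blocked() and toy_unplug() are
hypothetical stand-ins, not QEMU's DeviceState or qdev API; the point is
that the blocker check sits inside the unplug routine, so no caller can
skip it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a device; not QEMU's DeviceState. */
typedef struct ToyDevice {
    const char *unplug_blocker; /* reason string, or NULL if unplug is allowed */
    bool plugged;
} ToyDevice;

/* Returns true, and reports the reason, if an unplug blocker is set. */
static bool toy_unplug_blocked(const ToyDevice *dev, const char **reason)
{
    assert(dev != NULL);
    if (dev->unplug_blocker) {
        if (reason) {
            *reason = dev->unplug_blocker;
        }
        return true;
    }
    return false;
}

/*
 * The blocker check lives inside the unplug routine itself, so every
 * caller gets it without extra code at the call site.
 * Returns 0 on success, -1 if the unplug was refused.
 */
static int toy_unplug(ToyDevice *dev, const char **reason)
{
    if (toy_unplug_blocked(dev, reason)) {
        return -1;
    }
    dev->plugged = false;
    return 0;
}
```

With this shape, the device_del path would not need its own check before
calling the unplug routine.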

Thank you!
--
Jag



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 07/19] vfio-user: define vfio-user-server object
  2022-02-21 15:37   ` Stefan Hajnoczi
@ 2022-02-28 19:14     ` Jag Raman
  2022-03-02 16:45       ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-02-28 19:14 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Feb 21, 2022, at 10:37 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:48:54AM -0500, Jagannathan Raman wrote:
>> +struct VfuObjectClass {
>> +    ObjectClass parent_class;
>> +
>> +    unsigned int nr_devs;
>> +
>> +    /*
>> +     * Can be set to shutdown automatically when all server object
>> +     * instances are destroyed
>> +     */
>> +    bool auto_shutdown;
> 
> This field is introduced but it is hardcoded to true. Is there a way to
> set it to false?

We could add a property to 'TYPE_REMOTE_MACHINE' that indicates whether
it runs as a server/daemon.

Thank you!
--
Jag



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 08/19] vfio-user: instantiate vfio-user context
  2022-02-21 15:42   ` Stefan Hajnoczi
@ 2022-02-28 19:16     ` Jag Raman
  0 siblings, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-28 19:16 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Feb 21, 2022, at 10:42 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:48:55AM -0500, Jagannathan Raman wrote:
>> @@ -124,6 +190,11 @@ static void vfu_object_init(Object *obj)
>> 
>>     k->nr_devs++;
>> 
>> +    if (!phase_check(PHASE_MACHINE_READY)) {
>> +        o->machine_done.notify = vfu_object_machine_done;
>> +        qemu_add_machine_init_done_notifier(&o->machine_done);
> 
> This probably has to happen after the next if statement since
> qemu_add_machine_init_done_notifier() can immediately call ->notify()
> and we'd try to initialize on a non-remote machine type.

OK, got it.

Thank you!
--
Jag



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 09/19] vfio-user: find and init PCI device
  2022-02-21 15:57   ` Stefan Hajnoczi
@ 2022-02-28 19:17     ` Jag Raman
  0 siblings, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-28 19:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Feb 21, 2022, at 10:57 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:48:56AM -0500, Jagannathan Raman wrote:
>> @@ -221,6 +272,14 @@ static void vfu_object_finalize(Object *obj)
>> 
>>     o->device = NULL;
>> 
>> +    if (o->unplug_blocker && o->pci_dev) {
>> +        qdev_del_unplug_blocker(DEVICE(o->pci_dev), o->unplug_blocker);
>> +        error_free(o->unplug_blocker);
>> +        o->unplug_blocker = NULL;
>> +    }
>> +
>> +    o->pci_dev = NULL;
> 
> Since we don't hold a reference to o->pci_dev there is an assumption
> about the order of --object vs --device ->finalize() here. I think it
> will work because softmmu/runstate.c:qemu_cleanup() doesn't unref
> main_system_bus and only --object ->finalize() is called, but this seems
> fragile. We should probably hold a reference to pci_dev and call
> object_unref() on it.

OK, will do.
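
As a rough model of what holding the reference buys us, here is a
self-contained sketch. ToyObj, ToyServer and the toy_* helpers are
hypothetical and only imitate reference counting; they are not QOM's
object_ref()/object_unref(). The server takes its own reference when it
attaches, so the device stays valid until the server's finalize runs,
whatever order the other owners drop theirs in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object; a stand-in for a QOM Object. */
typedef struct ToyObj {
    unsigned refcount;
    bool finalized;
} ToyObj;

static void toy_ref(ToyObj *obj)
{
    assert(obj != NULL && !obj->finalized);
    obj->refcount++;
}

static void toy_unref(ToyObj *obj)
{
    assert(obj != NULL && obj->refcount > 0);
    if (--obj->refcount == 0) {
        obj->finalized = true; /* the real finalize hook would run here */
    }
}

/* Hypothetical server object that points at a PCI device. */
typedef struct ToyServer {
    ToyObj *pci_dev;
} ToyServer;

static void toy_server_attach(ToyServer *s, ToyObj *dev)
{
    toy_ref(dev);      /* the server now owns a reference */
    s->pci_dev = dev;
}

static void toy_server_finalize(ToyServer *s)
{
    if (s->pci_dev) {
        /* pci_dev is guaranteed alive here, so cleanup on it is safe. */
        toy_unref(s->pci_dev);
        s->pci_dev = NULL;
    }
}
```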

Thank you!
--
Jag



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 10/19] vfio-user: run vfio-user context
  2022-02-25 16:06   ` Eric Blake
@ 2022-02-28 19:22     ` Jag Raman
  0 siblings, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-28 19:22 UTC (permalink / raw)
  To: Eric Blake
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, dgilbert,
	Stefan Hajnoczi, thanos.makatos, pbonzini



> On Feb 25, 2022, at 11:06 AM, Eric Blake <eblake@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:48:57AM -0500, Jagannathan Raman wrote:
>> Setup a handler to run vfio-user context. The context is driven by
>> messages to the file descriptor associated with it - get the fd for
>> the context and hook up the handler with it
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> qapi/misc.json            | 23 ++++++++++
>> hw/remote/vfio-user-obj.c | 96 ++++++++++++++++++++++++++++++++++++++-
>> 2 files changed, 118 insertions(+), 1 deletion(-)
>> 
>> diff --git a/qapi/misc.json b/qapi/misc.json
>> index e8054f415b..9d7f12ab04 100644
>> --- a/qapi/misc.json
>> +++ b/qapi/misc.json
>> @@ -527,3 +527,26 @@
>>  'data': { '*option': 'str' },
>>  'returns': ['CommandLineOptionInfo'],
>>  'allow-preconfig': true }
>> +
>> +##
>> +# @VFU_CLIENT_HANGUP:
>> +#
>> +# Emitted when the client of a TYPE_VFIO_USER_SERVER closes the
>> +# communication channel
>> +#
>> +# @id: ID of the TYPE_VFIO_USER_SERVER object
>> +#
>> +# @device: ID of attached PCI device
>> +#
>> +# Since: 6.3
> 
> 7.0

OK, got it. Looks like the next release version is 7.0:
https://gitlab.com/qemu-project/qemu/-/milestones

Thank you!
--
Jag

> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
> 



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 11/19] vfio-user: handle PCI config space accesses
  2022-02-22 11:09   ` Stefan Hajnoczi
@ 2022-02-28 19:23     ` Jag Raman
  0 siblings, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-28 19:23 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Feb 22, 2022, at 6:09 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:48:58AM -0500, Jagannathan Raman wrote:
>> Define and register handlers for PCI config space accesses
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
>> ---
>> hw/remote/vfio-user-obj.c | 45 +++++++++++++++++++++++++++++++++++++++
>> hw/remote/trace-events    |  2 ++
>> 2 files changed, 47 insertions(+)
> 
> hw/pci/pci.c:pci_update_mappings() will unmap/map BARs when the
> vfio-user client touches BARs. Please add a comment that the remote
> machine type never dispatches memory accesses in the global memory address
> space and therefore we don't care that multiple remote devices may set
> up conflicting Memory and I/O Space BARs.

OK, will do.

Thank you!
--
Jag



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/19] vfio-user: IOMMU support for remote device
  2022-02-22 10:40   ` Stefan Hajnoczi
@ 2022-02-28 19:54     ` Jag Raman
  2022-03-02 16:49       ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-02-28 19:54 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Feb 22, 2022, at 5:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:48:59AM -0500, Jagannathan Raman wrote:
>> +struct RemoteIommuElem {
>> +    AddressSpace  as;
>> +    MemoryRegion  mr;
>> +};
>> +
>> +GHashTable *remote_iommu_elem_by_bdf;
> 
> A mutable global hash table requires synchronization when device
> emulation runs in multiple threads.
> 
> I suggest using pci_setup_iommu()'s iommu_opaque argument to avoid the
> global. If there is only 1 device per remote PCI bus, then there are no
> further synchronization concerns.

OK, will avoid the global. The hash table could still be accessed
concurrently, since there could be more than one device on the
same bus, so a mutex would be needed here.

> 
>> +
>> +#define INT2VOIDP(i) (void *)(uintptr_t)(i)
>> +
>> +static AddressSpace *remote_iommu_find_add_as(PCIBus *pci_bus,
>> +                                              void *opaque, int devfn)
>> +{
>> +    struct RemoteIommuElem *elem = NULL;
>> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_bus), devfn);
>> +
>> +    if (!remote_iommu_elem_by_bdf) {
>> +        return &address_space_memory;
>> +    }
> 
> When can this happen? remote_configure_iommu() allocates
> remote_iommu_elem_by_bdf so it should always be non-NULL.

I think we won’t hit this case. g_hash_table_new_full() would always succeed.

Thank you!
--
Jag


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 01/19] configure, meson: override C compiler for cmake
  2022-02-28 18:12                   ` Paolo Bonzini
@ 2022-02-28 19:55                     ` Jag Raman
  0 siblings, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-02-28 19:55 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: eduardo, Peter Maydell, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, John Levon, Juan Quintela, qemu-devel,
	Elena Ufimtseva, Markus Armbruster, Alex Williamson,
	Michael S. Tsirkin, Stefan Hajnoczi, Thanos Makatos, Eric Blake,
	Kanth Ghatraju, Dr. David Alan Gilbert,
	Philippe Mathieu-Daudé



> On Feb 28, 2022, at 1:12 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
> On 2/25/22 05:03, Jag Raman wrote:
>>> On Feb 24, 2022, at 12:52 PM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>>> 
>>> On 2/22/22 20:05, Jag Raman wrote:
>>>>> -            defaults[prefix + 'COMPILER'] = exe_list
>>>>> +            defaults[f'{prefix}COMPILER'] = [exe_list[0]]
>>>>> +            for i in range(1, len(exe_list)):
>>>>> +                defaults[f'{prefix}COMPILER_ARG{i}'] = [exe_list[i]]
>>>>> +
>>>>>             if comp_obj.get_id() == 'clang-cl':
>>>>>                 defaults['CMAKE_LINKER'] = comp_obj.get_linker_exelist()
>>>> This fix works at my end.
>>> 
>>> Would you please check that -m64 and -mcx16 are passed indeed to the compiler?
>> Hi Paolo,
>> Yes, I’m able to see that -m64 and -mcx16 are passed to the compiler.
>> # cat ./subprojects/libvfio-user/__CMake_build/CMakeMesonToolchainFile.cmake
>> …
>> set(CMAKE_C_COMPILER "/opt/rh/devtoolset-9/root/usr/bin/cc")
>> set(CMAKE_C_COMPILER_ARG1 "-m64")
>> set(CMAKE_C_COMPILER_ARG2 "-mcx16")
>> …
> 
> I reproduced this with CMake 3.17.x and it's fixed by 3.19.x.  I'll send the fix to Meson, but for now I recommend requiring a newer CMake and just dropping this patch.

OK, got it.

Thank you!
--
Jag

> 
> Paolo


^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 07/19] vfio-user: define vfio-user-server object
  2022-02-28 19:14     ` Jag Raman
@ 2022-03-02 16:45       ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-02 16:45 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert

On Mon, Feb 28, 2022 at 07:14:21PM +0000, Jag Raman wrote:
> > On Feb 21, 2022, at 10:37 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Thu, Feb 17, 2022 at 02:48:54AM -0500, Jagannathan Raman wrote:
> >> +struct VfuObjectClass {
> >> +    ObjectClass parent_class;
> >> +
> >> +    unsigned int nr_devs;
> >> +
> >> +    /*
> >> +     * Can be set to shutdown automatically when all server object
> >> +     * instances are destroyed
> >> +     */
> >> +    bool auto_shutdown;
> > 
> > This field is introduced but it is hardcoded to true. Is there a way to
> > set it to false?
> 
> We could add a property to ’TYPE_REMOTE_MACHINE’ which indicates if
> it would run as a server/daemon.

Yes.

An alternative is to add a per-instance property to --object
x-vfio-user-server. In practice there is not much benefit since users
are unlikely to mix auto-shutdown instances with non-auto-shutdown
instances, but the code might be a little simpler and cleaner.

Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/19] vfio-user: IOMMU support for remote device
  2022-02-28 19:54     ` Jag Raman
@ 2022-03-02 16:49       ` Stefan Hajnoczi
  2022-03-03 14:49         ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-02 16:49 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert

On Mon, Feb 28, 2022 at 07:54:38PM +0000, Jag Raman wrote:
> 
> 
> > On Feb 22, 2022, at 5:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Thu, Feb 17, 2022 at 02:48:59AM -0500, Jagannathan Raman wrote:
> >> +struct RemoteIommuElem {
> >> +    AddressSpace  as;
> >> +    MemoryRegion  mr;
> >> +};
> >> +
> >> +GHashTable *remote_iommu_elem_by_bdf;
> > 
> > A mutable global hash table requires synchronization when device
> > emulation runs in multiple threads.
> > 
> > I suggest using pci_setup_iommu()'s iommu_opaque argument to avoid the
> > global. If there is only 1 device per remote PCI bus, then there are no
> > further synchronization concerns.
> 
> OK, will avoid the global. We would need to access the hash table
> concurrently since there could be more than one device in the
> same bus - so a mutex would be needed here.

I thought the PCIe topology can be set up with a separate bus for each
x-vfio-user-server? I remember something like that in the previous
revision where a root port was instantiated for each x-vfio-user-server.

Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/19] vfio-user: IOMMU support for remote device
  2022-03-02 16:49       ` Stefan Hajnoczi
@ 2022-03-03 14:49         ` Jag Raman
  2022-03-07  9:45           ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-03-03 14:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Mar 2, 2022, at 11:49 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Mon, Feb 28, 2022 at 07:54:38PM +0000, Jag Raman wrote:
>> 
>> 
>>> On Feb 22, 2022, at 5:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> 
>>> On Thu, Feb 17, 2022 at 02:48:59AM -0500, Jagannathan Raman wrote:
>>>> +struct RemoteIommuElem {
>>>> +    AddressSpace  as;
>>>> +    MemoryRegion  mr;
>>>> +};
>>>> +
>>>> +GHashTable *remote_iommu_elem_by_bdf;
>>> 
>>> A mutable global hash table requires synchronization when device
>>> emulation runs in multiple threads.
>>> 
>>> I suggest using pci_setup_iommu()'s iommu_opaque argument to avoid the
>>> global. If there is only 1 device per remote PCI bus, then there are no
>>> further synchronization concerns.
>> 
>> OK, will avoid the global. We would need to access the hash table
>> concurrently since there could be more than one device in the
>> same bus - so a mutex would be needed here.
> 
> I thought the PCIe topology can be set up with a separate bus for each
> x-vfio-user-server? I remember something like that in the previous
> revision where a root port was instantiated for each x-vfio-user-server.

Yes, we could set up the PCIe topology that way. But the user could
add more than one device to the same bus, unless the bus type explicitly
limits the number of devices to one (BusClass->max_dev).

Thank you!
--
Jag

> 
> Stefan



^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 12/19] vfio-user: IOMMU support for remote device
  2022-03-03 14:49         ` Jag Raman
@ 2022-03-07  9:45           ` Stefan Hajnoczi
  2022-03-07 14:42             ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-07  9:45 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert

On Thu, Mar 03, 2022 at 02:49:53PM +0000, Jag Raman wrote:
> 
> 
> > On Mar 2, 2022, at 11:49 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Mon, Feb 28, 2022 at 07:54:38PM +0000, Jag Raman wrote:
> >> 
> >> 
> >>> On Feb 22, 2022, at 5:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>> 
> >>> On Thu, Feb 17, 2022 at 02:48:59AM -0500, Jagannathan Raman wrote:
> >>>> +struct RemoteIommuElem {
> >>>> +    AddressSpace  as;
> >>>> +    MemoryRegion  mr;
> >>>> +};
> >>>> +
> >>>> +GHashTable *remote_iommu_elem_by_bdf;
> >>> 
> >>> A mutable global hash table requires synchronization when device
> >>> emulation runs in multiple threads.
> >>> 
> >>> I suggest using pci_setup_iommu()'s iommu_opaque argument to avoid the
> >>> global. If there is only 1 device per remote PCI bus, then there are no
> >>> further synchronization concerns.
> >> 
> >> OK, will avoid the global. We would need to access the hash table
> >> concurrently since there could be more than one device in the
> >> same bus - so a mutex would be needed here.
> > 
> > I thought the PCIe topology can be set up with a separate bus for each
> > x-vfio-user-server? I remember something like that in the previous
> > revision where a root port was instantiated for each x-vfio-user-server.
> 
> Yes, we could set up the PCIe topology that way. But the user could
> add more than one device to the same bus, unless the bus type explicitly
> limits the number of devices to one (BusClass->max_dev).

Due to how the IOMMU is used to restrict the bus to the vfio-user
client's DMA mappings, it seems like it's necessary to limit the number
of devices to 1 per bus anyway?

Stefan

^ permalink raw reply	[flat|nested] 76+ messages in thread

* Re: [PATCH v6 15/19] vfio-user: handle device interrupts
  2022-02-17  7:49 ` [PATCH v6 15/19] vfio-user: handle device interrupts Jagannathan Raman
@ 2022-03-07 10:24   ` Stefan Hajnoczi
  2022-03-07 15:10     ` Jag Raman
  2022-03-26 23:47     ` Jag Raman
  0 siblings, 2 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-07 10:24 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
> Forward remote device's interrupts to the guest
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/hw/pci/pci.h              |   6 ++
>  include/hw/remote/vfio-user-obj.h |   6 ++
>  hw/pci/msi.c                      |  13 +++-
>  hw/pci/msix.c                     |  12 +++-
>  hw/remote/machine.c               |  11 +--
>  hw/remote/vfio-user-obj.c         | 107 ++++++++++++++++++++++++++++++
>  stubs/vfio-user-obj.c             |   6 ++
>  MAINTAINERS                       |   1 +
>  hw/remote/trace-events            |   1 +
>  stubs/meson.build                 |   1 +
>  10 files changed, 158 insertions(+), 6 deletions(-)
>  create mode 100644 include/hw/remote/vfio-user-obj.h
>  create mode 100644 stubs/vfio-user-obj.c
> 
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index c3f3c90473..d42d526a48 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
>  typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
>                                  pcibus_t addr, pcibus_t size, int type);
>  typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
> +typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
> +typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
>  
>  typedef struct PCIIORegion {
>      pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
> @@ -323,6 +325,10 @@ struct PCIDevice {
>      /* Space to store MSIX table & pending bit array */
>      uint8_t *msix_table;
>      uint8_t *msix_pba;
> +
> +    PCIMSINotify *msi_notify;
> +    PCIMSIxNotify *msix_notify;
> +
>      /* MemoryRegion container for msix exclusive BAR setup */
>      MemoryRegion msix_exclusive_bar;
>      /* Memory Regions for MSIX table and pending bit entries. */
> diff --git a/include/hw/remote/vfio-user-obj.h b/include/hw/remote/vfio-user-obj.h
> new file mode 100644
> index 0000000000..87ab78b875
> --- /dev/null
> +++ b/include/hw/remote/vfio-user-obj.h
> @@ -0,0 +1,6 @@
> +#ifndef VFIO_USER_OBJ_H
> +#define VFIO_USER_OBJ_H
> +
> +void vfu_object_set_bus_irq(PCIBus *pci_bus);
> +
> +#endif
> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
> index 47d2b0f33c..93f5e400cc 100644
> --- a/hw/pci/msi.c
> +++ b/hw/pci/msi.c
> @@ -51,6 +51,8 @@
>   */
>  bool msi_nonbroken;
>  
> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
> +
>  /* If we get rid of cap allocator, we won't need this. */
>  static inline uint8_t msi_cap_sizeof(uint16_t flags)
>  {
> @@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>      dev->msi_cap = config_offset;
>      dev->cap_present |= QEMU_PCI_CAP_MSI;
>  
> +    dev->msi_notify = pci_msi_notify;

Are you sure it's correct to skip the msi_is_masked() logic? I think the
callback function should only override the behavior of
msi_send_message(), not the entire msi_notify() function.

The same applies to MSI-X.
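
Roughly what I have in mind, as a toy model rather than a patch: the
mask/pending handling stays in the common notify path and only the final
delivery step goes through a hook. ToyMsiDev, its fields and the toy_*
functions are made up for illustration and do not match the real
PCIDevice layout or the msi.c helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ToyMsiDev ToyMsiDev;
typedef void ToyMsiSend(ToyMsiDev *dev, unsigned vector);

/* Hypothetical device model; not the real PCIDevice. */
struct ToyMsiDev {
    uint32_t mask;      /* bit n set: vector n is masked */
    uint32_t pending;   /* bit n set: vector n fired while masked */
    ToyMsiSend *send;   /* delivery hook, the only overridable step */
    unsigned delivered; /* messages sent by the default path (demo only) */
};

/* Default delivery: stands in for writing the message to memory. */
static void toy_default_send(ToyMsiDev *dev, unsigned vector)
{
    (void)vector;
    dev->delivered++;
}

/*
 * Common notify path. The mask check and the pending-bit update always
 * run; a custom hook only replaces how an unmasked vector is delivered.
 */
static void toy_msi_notify(ToyMsiDev *dev, unsigned vector)
{
    assert(dev != NULL && vector < 32);
    if (dev->mask & (1U << vector)) {
        dev->pending |= 1U << vector;
        return;
    }
    (dev->send ? dev->send : toy_default_send)(dev, vector);
}
```

A vfio-user style backend would then install its own send hook and still
get correct masking for free.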

> +
>      pci_set_word(dev->config + msi_flags_off(dev), flags);
>      pci_set_word(dev->wmask + msi_flags_off(dev),
>                   PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
> @@ -307,7 +311,7 @@ bool msi_is_masked(const PCIDevice *dev, unsigned int vector)
>      return mask & (1U << vector);
>  }
>  
> -void msi_notify(PCIDevice *dev, unsigned int vector)
> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector)
>  {
>      uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
>      bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
> @@ -332,6 +336,13 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
>      msi_send_message(dev, msg);
>  }
>  
> +void msi_notify(PCIDevice *dev, unsigned int vector)
> +{
> +    if (dev->msi_notify) {

Can this ever be NULL?

> +        dev->msi_notify(dev, vector);
> +    }
> +}
> +
>  void msi_send_message(PCIDevice *dev, MSIMessage msg)
>  {
>      MemTxAttrs attrs = {};
> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
> index ae9331cd0b..1c71e67f53 100644
> --- a/hw/pci/msix.c
> +++ b/hw/pci/msix.c
> @@ -31,6 +31,8 @@
>  #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
>  #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
>  
> +static void pci_msix_notify(PCIDevice *dev, unsigned vector);
> +
>  MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
>  {
>      uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
> @@ -334,6 +336,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>      dev->msix_table = g_malloc0(table_size);
>      dev->msix_pba = g_malloc0(pba_size);
>      dev->msix_entry_used = g_malloc0(nentries * sizeof *dev->msix_entry_used);
> +    dev->msix_notify = pci_msix_notify;
>  
>      msix_mask_all(dev, nentries);
>  
> @@ -485,7 +488,7 @@ int msix_enabled(PCIDevice *dev)
>  }
>  
>  /* Send an MSI-X message */
> -void msix_notify(PCIDevice *dev, unsigned vector)
> +static void pci_msix_notify(PCIDevice *dev, unsigned vector)
>  {
>      MSIMessage msg;
>  
> @@ -503,6 +506,13 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>      msi_send_message(dev, msg);
>  }
>  
> +void msix_notify(PCIDevice *dev, unsigned vector)
> +{
> +    if (dev->msix_notify) {

Can this ever be NULL?

> +        dev->msix_notify(dev, vector);
> +    }
> +}
> +
>  void msix_reset(PCIDevice *dev)
>  {
>      if (!msix_present(dev)) {
> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
> index db4ae30710..a8b4a3aef3 100644
> --- a/hw/remote/machine.c
> +++ b/hw/remote/machine.c
> @@ -23,6 +23,7 @@
>  #include "hw/remote/iohub.h"
>  #include "hw/qdev-core.h"
>  #include "hw/remote/iommu.h"
> +#include "hw/remote/vfio-user-obj.h"
>  
>  static void remote_machine_init(MachineState *machine)
>  {
> @@ -54,12 +55,14 @@ static void remote_machine_init(MachineState *machine)
>  
>      if (s->vfio_user) {
>          remote_configure_iommu(pci_host->bus);
> -    }
>  
> -    remote_iohub_init(&s->iohub);
> +        vfu_object_set_bus_irq(pci_host->bus);
> +    } else {
> +        remote_iohub_init(&s->iohub);
>  
> -    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
> -                 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
> +        pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
> +                     &s->iohub, REMOTE_IOHUB_NB_PIRQS);
> +    }
>  
>      qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
>  }
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 2feabd06a4..d79bab87f1 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -54,6 +54,9 @@
>  #include "hw/pci/pci.h"
>  #include "qemu/timer.h"
>  #include "exec/memory.h"
> +#include "hw/pci/msi.h"
> +#include "hw/pci/msix.h"
> +#include "hw/remote/vfio-user-obj.h"
>  
>  #define TYPE_VFU_OBJECT "x-vfio-user-server"
>  OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> @@ -107,6 +110,10 @@ struct VfuObject {
>      int vfu_poll_fd;
>  };
>  
> +static GHashTable *vfu_object_bdf_to_ctx_table;

I suggest adding a void *msi_notify_opaque field to PCIDevice and
passing the value as an argument to ->msi_notify(). vfio-user-obj.c can
set the value to vfu_ctx and eliminate the vfu_object_bdf_to_ctx_table
hash table.

This simplifies the code, makes it faster, and solves race conditions
during hot plug/unplug if other instances are running in IOThreads.
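
A self-contained sketch of the shape, under stated assumptions: ToyPciDev,
ToyCtx and the toy_* functions are hypothetical, and the opaque field
name simply mirrors the suggestion above. Each device carries its own
context pointer, so the notify path needs no lookup and touches no
shared table.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyPciDev ToyPciDev;
typedef void ToyNotify(ToyPciDev *dev, unsigned vector, void *opaque);

/* Hypothetical device with a per-device callback and opaque pointer. */
struct ToyPciDev {
    ToyNotify *msi_notify;
    void *msi_notify_opaque;
};

/* Hypothetical per-server context; stands in for vfu_ctx. */
typedef struct ToyCtx {
    unsigned last_vector;
    unsigned triggers;
} ToyCtx;

/* Callback: the context arrives as an argument, no hash table needed. */
static void toy_ctx_notify(ToyPciDev *dev, unsigned vector, void *opaque)
{
    ToyCtx *ctx = opaque;

    (void)dev;
    assert(ctx != NULL);
    ctx->last_vector = vector;
    ctx->triggers++;
}

/* Bind a context to a device at setup time. */
static void toy_attach(ToyPciDev *dev, ToyCtx *ctx)
{
    dev->msi_notify = toy_ctx_notify;
    dev->msi_notify_opaque = ctx;
}

/* Raise an interrupt: only this device's own state is read. */
static void toy_raise(ToyPciDev *dev, unsigned vector)
{
    if (dev->msi_notify) {
        dev->msi_notify(dev, vector, dev->msi_notify_opaque);
    }
}
```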

> +
> +#define INT2VOIDP(i) (void *)(uintptr_t)(i)
> +
>  static void vfu_object_init_ctx(VfuObject *o, Error **errp);
>  
>  static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
> @@ -463,6 +470,86 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
>      }
>  }
>  
> +static void vfu_object_irq_trigger(int pci_bdf, unsigned vector)
> +{
> +    vfu_ctx_t *vfu_ctx = NULL;
> +
> +    if (!vfu_object_bdf_to_ctx_table) {

Can this ever be NULL?

> +        return;
> +    }
> +
> +    vfu_ctx = g_hash_table_lookup(vfu_object_bdf_to_ctx_table,
> +                                  INT2VOIDP(pci_bdf));
> +
> +    if (vfu_ctx) {
> +        vfu_irq_trigger(vfu_ctx, vector);
> +    }
> +}
> +
> +static int vfu_object_map_irq(PCIDevice *pci_dev, int intx)
> +{
> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
> +                                pci_dev->devfn);
> +
> +    return pci_bdf;
> +}
> +
> +static void vfu_object_set_irq(void *opaque, int pirq, int level)
> +{
> +    if (level) {
> +        vfu_object_irq_trigger(pirq, 0);
> +    }
> +}
> +
> +static void vfu_object_msi_notify(PCIDevice *pci_dev, unsigned vector)
> +{
> +    int pci_bdf;
> +
> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
> +
> +    vfu_object_irq_trigger(pci_bdf, vector);
> +}
> +
> +static int vfu_object_setup_irqs(VfuObject *o, PCIDevice *pci_dev)
> +{
> +    vfu_ctx_t *vfu_ctx = o->vfu_ctx;
> +    int ret, pci_bdf;
> +
> +    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = 0;
> +    if (msix_nr_vectors_allocated(pci_dev)) {
> +        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSIX_IRQ,
> +                                       msix_nr_vectors_allocated(pci_dev));
> +
> +        pci_dev->msix_notify = vfu_object_msi_notify;
> +    } else if (msi_nr_vectors_allocated(pci_dev)) {
> +        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSI_IRQ,
> +                                       msi_nr_vectors_allocated(pci_dev));
> +
> +        pci_dev->msi_notify = vfu_object_msi_notify;
> +    }
> +
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
> +
> +    g_hash_table_insert(vfu_object_bdf_to_ctx_table, INT2VOIDP(pci_bdf),
> +                        o->vfu_ctx);
> +
> +    return 0;
> +}
> +
> +void vfu_object_set_bus_irq(PCIBus *pci_bus)
> +{
> +    pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, NULL, 1);
> +}
> +
>  /*
>   * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>   * properties. It also depends on devices instantiated in QEMU. These
> @@ -559,6 +646,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>  
>      vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
>  
> +    ret = vfu_object_setup_irqs(o, o->pci_dev);
> +    if (ret < 0) {
> +        error_setg(errp, "vfu: Failed to setup interrupts for %s",
> +                   o->device);
> +        goto fail;
> +    }
> +
>      ret = vfu_realize_ctx(o->vfu_ctx);
>      if (ret < 0) {
>          error_setg(errp, "vfu: Failed to realize device %s- %s",
> @@ -612,6 +706,7 @@ static void vfu_object_finalize(Object *obj)
>  {
>      VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
>      VfuObject *o = VFU_OBJECT(obj);
> +    int pci_bdf;
>  
>      k->nr_devs--;
>  
> @@ -638,9 +733,17 @@ static void vfu_object_finalize(Object *obj)
>          o->unplug_blocker = NULL;
>      }
>  
> +    if (o->pci_dev) {
> +        pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(o->pci_dev)),
> +                                o->pci_dev->devfn);
> +        g_hash_table_remove(vfu_object_bdf_to_ctx_table, INT2VOIDP(pci_bdf));
> +    }
> +
>      o->pci_dev = NULL;
>  
>      if (!k->nr_devs && k->auto_shutdown) {
> +        g_hash_table_destroy(vfu_object_bdf_to_ctx_table);
> +        vfu_object_bdf_to_ctx_table = NULL;
>          qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
>      }
>  
> @@ -658,6 +761,10 @@ static void vfu_object_class_init(ObjectClass *klass, void *data)
>  
>      k->auto_shutdown = true;
>  
> +    msi_nonbroken = true;

This should go in hw/remote/machine.c. It's a global variable related to
the machine's interrupt controller capabilities. The value is not
related to vfu_object_class_init(), which will be called by any QEMU
binary that links hw/remote/vfio-user-obj.o regardless of which machine
type is instantiated.
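
A minimal sketch of that move, using stand-in definitions so it compiles
on its own. In QEMU, msi_nonbroken is the real global declared in
hw/pci/msi.h and remote_machine_init() is the existing init hook in
hw/remote/machine.c; the reduced MachineState and the argument-less
vfu_object_class_init() below are assumptions made for illustration only.

```c
#include <stdbool.h>

/* Stand-in for the global that QEMU declares in hw/pci/msi.h. */
static bool msi_nonbroken;

/* Stand-in for QEMU's MachineState; the real type has many fields. */
typedef struct MachineState {
    int unused;
} MachineState;

/*
 * Machine init hook: runs only when the remote machine type is
 * actually instantiated, so the flag describes this machine alone.
 */
static void remote_machine_init(MachineState *machine)
{
    (void)machine;
    msi_nonbroken = true;
}

/*
 * Class init for the vfio-user object: runs in every binary that links
 * the object, so it no longer touches machine-level globals.
 */
static void vfu_object_class_init(void)
{
    /* Property registration only; no interrupt controller state here. */
}
```

With this split, a binary that links vfio-user-obj.o but instantiates a
different machine type leaves msi_nonbroken at that machine's own value.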

> +
> +    vfu_object_bdf_to_ctx_table = g_hash_table_new_full(NULL, NULL, NULL, NULL);
> +
>      object_class_property_add(klass, "socket", "SocketAddress", NULL,
>                                vfu_object_set_socket, NULL, NULL);
>      object_class_property_set_description(klass, "socket",
> diff --git a/stubs/vfio-user-obj.c b/stubs/vfio-user-obj.c
> new file mode 100644
> index 0000000000..79100d768e
> --- /dev/null
> +++ b/stubs/vfio-user-obj.c
> @@ -0,0 +1,6 @@
> +#include "qemu/osdep.h"
> +#include "hw/remote/vfio-user-obj.h"
> +
> +void vfu_object_set_bus_irq(PCIBus *pci_bus)
> +{
> +}
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f47232c78c..e274cb46af 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3569,6 +3569,7 @@ F: hw/remote/iohub.c
>  F: include/hw/remote/iohub.h
>  F: subprojects/libvfio-user
>  F: hw/remote/vfio-user-obj.c
> +F: include/hw/remote/vfio-user-obj.h
>  F: hw/remote/iommu.c
>  F: include/hw/remote/iommu.h
>  
> diff --git a/hw/remote/trace-events b/hw/remote/trace-events
> index 847d50d88f..c167b3c7a5 100644
> --- a/hw/remote/trace-events
> +++ b/hw/remote/trace-events
> @@ -12,3 +12,4 @@ vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
>  vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
>  vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
>  vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
> +vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
> diff --git a/stubs/meson.build b/stubs/meson.build
> index d359cbe1ad..c5ce979dc3 100644
> --- a/stubs/meson.build
> +++ b/stubs/meson.build
> @@ -57,3 +57,4 @@ if have_system
>  else
>    stub_ss.add(files('qdev.c'))
>  endif
> +stub_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_false: files('vfio-user-obj.c'))
> -- 
> 2.20.1
> 


* Re: [PATCH v6 16/19] softmmu/vl: defer backend init
  2022-02-17  7:49 ` [PATCH v6 16/19] softmmu/vl: defer backend init Jagannathan Raman
@ 2022-03-07 10:48   ` Stefan Hajnoczi
  2022-03-07 15:31     ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-07 10:48 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

On Thu, Feb 17, 2022 at 02:49:03AM -0500, Jagannathan Raman wrote:
> Allow deferred initialization of backends. TYPE_REMOTE_MACHINE is
> agnostic to QEMU's RUN_STATE. It's state is driven by the QEMU client

s/It's/Its/

> via the vfio-user protocol. Whereas, the backends presently defer
> initialization if QEMU is in RUN_STATE_INMIGRATE. Since the remote
> machine can't use RUN_STATE*, this commit allows it to ask for deferred
> initialization of backend device. It is primarily targeted towards block
> devices in this commit, but it need not be limited to that.

What is the purpose of this commit? I don't understand the description.

> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/sysemu/sysemu.h    |  4 ++++
>  block/block-backend.c      |  3 ++-
>  blockdev.c                 |  2 +-
>  softmmu/vl.c               | 17 +++++++++++++++++
>  stubs/defer-backend-init.c |  7 +++++++
>  MAINTAINERS                |  1 +
>  stubs/meson.build          |  1 +
>  7 files changed, 33 insertions(+), 2 deletions(-)
>  create mode 100644 stubs/defer-backend-init.c
> 
> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
> index b9421e03ff..3179eb1857 100644
> --- a/include/sysemu/sysemu.h
> +++ b/include/sysemu/sysemu.h
> @@ -119,4 +119,8 @@ extern QemuOptsList qemu_net_opts;
>  extern QemuOptsList qemu_global_opts;
>  extern QemuOptsList qemu_semihosting_config_opts;
>  
> +bool deferred_backend_init(void);
> +void set_deferred_backend_init(void);
> +void clear_deferred_backend_init(void);
> +
>  #endif
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 4ff6b4d785..e04f9b6469 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -20,6 +20,7 @@
>  #include "sysemu/blockdev.h"
>  #include "sysemu/runstate.h"
>  #include "sysemu/replay.h"
> +#include "sysemu/sysemu.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-events-block.h"
>  #include "qemu/id.h"
> @@ -935,7 +936,7 @@ int blk_attach_dev(BlockBackend *blk, DeviceState *dev)
>      /* While migration is still incoming, we don't need to apply the
>       * permissions of guest device BlockBackends. We might still have a block
>       * job or NBD server writing to the image for storage migration. */
> -    if (runstate_check(RUN_STATE_INMIGRATE)) {
> +    if (runstate_check(RUN_STATE_INMIGRATE) || deferred_backend_init()) {
>          blk->disable_perm = true;
>      }

Why is this necessary for vfio-user? Disk images shouldn't be in use by
another process so we don't need to bypass permissions temporarily.

>  
> diff --git a/blockdev.c b/blockdev.c
> index 42e098b458..d495070679 100644
> --- a/blockdev.c
> +++ b/blockdev.c
> @@ -569,7 +569,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
>          qdict_set_default_str(bs_opts, BDRV_OPT_AUTO_READ_ONLY, "on");
>          assert((bdrv_flags & BDRV_O_CACHE_MASK) == 0);
>  
> -        if (runstate_check(RUN_STATE_INMIGRATE)) {
> +        if (runstate_check(RUN_STATE_INMIGRATE) || deferred_backend_init()) {
>              bdrv_flags |= BDRV_O_INACTIVE;

Same here.


* Re: [PATCH v6 17/19] vfio-user: register handlers to facilitate migration
  2022-02-17  7:49 ` [PATCH v6 17/19] vfio-user: register handlers to facilitate migration Jagannathan Raman
  2022-02-18 12:20   ` Paolo Bonzini
@ 2022-03-07 11:26   ` Stefan Hajnoczi
  1 sibling, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-07 11:26 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

On Thu, Feb 17, 2022 at 02:49:04AM -0500, Jagannathan Raman wrote:
> Store and load the device's state during migration. Use libvfio-user's
> handlers for this purpose.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  include/block/block.h       |   1 +
>  include/migration/vmstate.h |   2 +
>  migration/savevm.h          |   2 +
>  block.c                     |   5 +
>  hw/remote/machine.c         |   7 +
>  hw/remote/vfio-user-obj.c   | 467 ++++++++++++++++++++++++++++++++++++
>  migration/savevm.c          |  89 +++++++
>  migration/vmstate.c         |  19 ++
>  8 files changed, 592 insertions(+)
> 
> diff --git a/include/block/block.h b/include/block/block.h
> index e1713ee306..02b89e0668 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -495,6 +495,7 @@ int generated_co_wrapper bdrv_invalidate_cache(BlockDriverState *bs,
>                                                 Error **errp);
>  void bdrv_invalidate_cache_all(Error **errp);
>  int bdrv_inactivate_all(void);
> +int bdrv_inactivate(BlockDriverState *bs);
>  
>  /* Ensure contents are flushed to disk.  */
>  int generated_co_wrapper bdrv_flush(BlockDriverState *bs);
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index 017c03675c..68bea576ea 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -1165,6 +1165,8 @@ extern const VMStateInfo vmstate_info_qlist;
>  #define VMSTATE_END_OF_LIST()                                         \
>      {}
>  
> +uint64_t vmstate_vmsd_size(PCIDevice *pci_dev);
> +
>  int vmstate_load_state(QEMUFile *f, const VMStateDescription *vmsd,
>                         void *opaque, int version_id);
>  int vmstate_save_state(QEMUFile *f, const VMStateDescription *vmsd,
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 6461342cb4..8007064ff2 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
>  int qemu_load_device_state(QEMUFile *f);
>  int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
>          bool in_postcopy, bool inactivate_disks);
> +int qemu_remote_savevm(QEMUFile *f, DeviceState *dev);
> +int qemu_remote_loadvm(QEMUFile *f);
>  
>  #endif
> diff --git a/block.c b/block.c
> index b54d59d1fa..e90aaee30c 100644
> --- a/block.c
> +++ b/block.c
> @@ -6565,6 +6565,11 @@ static int bdrv_inactivate_recurse(BlockDriverState *bs)
>      return 0;
>  }
>  
> +int bdrv_inactivate(BlockDriverState *bs)
> +{
> +    return bdrv_inactivate_recurse(bs);
> +}
> +
>  int bdrv_inactivate_all(void)
>  {
>      BlockDriverState *bs = NULL;
> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
> index a8b4a3aef3..31ef401e43 100644
> --- a/hw/remote/machine.c
> +++ b/hw/remote/machine.c
> @@ -24,6 +24,7 @@
>  #include "hw/qdev-core.h"
>  #include "hw/remote/iommu.h"
>  #include "hw/remote/vfio-user-obj.h"
> +#include "sysemu/sysemu.h"
>  
>  static void remote_machine_init(MachineState *machine)
>  {
> @@ -86,6 +87,11 @@ static void remote_machine_set_vfio_user(Object *obj, bool value, Error **errp)
>      s->vfio_user = value;
>  }
>  
> +static void remote_machine_instance_init(Object *obj)
> +{
> +    set_deferred_backend_init();
> +}
> +
>  static void remote_machine_class_init(ObjectClass *oc, void *data)
>  {
>      MachineClass *mc = MACHINE_CLASS(oc);
> @@ -105,6 +111,7 @@ static const TypeInfo remote_machine = {
>      .name = TYPE_REMOTE_MACHINE,
>      .parent = TYPE_MACHINE,
>      .instance_size = sizeof(RemoteMachineState),
> +    .instance_init = remote_machine_instance_init,
>      .class_init = remote_machine_class_init,
>      .interfaces = (InterfaceInfo[]) {
>          { TYPE_HOTPLUG_HANDLER },
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index d79bab87f1..2304643003 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -57,6 +57,13 @@
>  #include "hw/pci/msi.h"
>  #include "hw/pci/msix.h"
>  #include "hw/remote/vfio-user-obj.h"
> +#include "migration/qemu-file.h"
> +#include "migration/savevm.h"
> +#include "migration/vmstate.h"
> +#include "migration/global_state.h"
> +#include "block/block.h"
> +#include "sysemu/block-backend.h"
> +#include "net/net.h"
>  
>  #define TYPE_VFU_OBJECT "x-vfio-user-server"
>  OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> @@ -108,12 +115,49 @@ struct VfuObject {
>      Error *unplug_blocker;
>  
>      int vfu_poll_fd;
> +
> +    /*
> +     * vfu_mig_buf holds the migration data. In the remote server, this
> +     * buffer replaces the role of an IO channel which links the source
> +     * and the destination.
> +     *
> +     * Whenever the client QEMU process initiates migration, the remote
> +     * server gets notified via libvfio-user callbacks. The remote server
> +     * sets up a QEMUFile object using this buffer as backend. The remote
> +     * server passes this object to its migration subsystem, which slurps
> +     * the VMSD of the device ('devid' above) referenced by this object
> +     * and stores the VMSD in this buffer.
> +     *
> +     * The client subsequently asks the remote server for any data that
> +     * needs to be moved over to the destination via libvfio-user
> +     * library's vfu_migration_callbacks_t callbacks. The remote hands
> +     * over this buffer as data at this time.
> +     *
> +     * A reverse of this process happens at the destination.
> +     */
> +    uint8_t *vfu_mig_buf;
> +
> +    uint64_t vfu_mig_buf_size;
> +
> +    uint64_t vfu_mig_buf_pending;
> +
> +    uint64_t vfu_mig_data_written;
> +
> +    uint64_t vfu_mig_section_offset;
> +
> +    QEMUFile *vfu_mig_file;
> +
> +    vfu_migr_state_t vfu_state;
>  };
>  
>  static GHashTable *vfu_object_bdf_to_ctx_table;
>  
>  #define INT2VOIDP(i) (void *)(uintptr_t)(i)
>  
> +#define KB(x)    ((size_t) (x) << 10)
> +
> +#define VFU_OBJECT_MIG_WINDOW KB(64)

Please use "qemu/units.h":

  #include "qemu/units.h"
  ...
  #define VFU_OBJECT_MIG_WINDOW_SIZE (64 * KiB)

(Adding "_SIZE" to the name makes the purpose of the constant clearer.)
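
As a rough, self-contained illustration of the renamed constant in use:
the KiB definition below mirrors the one in include/qemu/units.h so the
snippet builds outside the QEMU tree, and vfu_mig_window_clamp() is a
hypothetical helper name, not something from the patch. It only shows
the clamping that vfu_mig_prepare_data() performs against the window.

```c
#include <stdint.h>

/* Mirrors the KiB definition in QEMU's include/qemu/units.h. */
#define KiB (INT64_C(1) << 10)

/* Size of the migration data window exposed to the client. */
#define VFU_OBJECT_MIG_WINDOW_SIZE (64 * KiB)

/* Hypothetical helper: limit one transfer to a single window. */
static uint64_t vfu_mig_window_clamp(uint64_t pending)
{
    const uint64_t window = VFU_OBJECT_MIG_WINDOW_SIZE;

    return pending > window ? window : pending;
}
```

Inside QEMU the local KiB definition would be dropped in favour of the
"qemu/units.h" include shown above.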

> +
>  static void vfu_object_init_ctx(VfuObject *o, Error **errp);
>  
>  static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
> @@ -163,6 +207,394 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
>      vfu_object_init_ctx(o, errp);
>  }
>  
> +/**
> + * Migration helper functions
> + *
> + * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
> + * subsystem - qemu_remote_loadvm & qemu_remote_savevm. loadvm/savevm
> + * call these functions via QEMUFileOps to load/save the VMSD of a
> + * device into vfu_mig_buf
> + *
> + */
> +static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
> +                                size_t size, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +
> +    if (pos > o->vfu_mig_buf_size) {
> +        size = 0;
> +    } else if ((pos + size) > o->vfu_mig_buf_size) {
> +        size = o->vfu_mig_buf_size - pos;
> +    }
> +
> +    memcpy(buf, (o->vfu_mig_buf + pos), size);
> +
> +    return size;
> +}
> +
> +static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
> +                                 int64_t pos, Error **errp)
> +{
> +    ERRP_GUARD();
> +    VfuObject *o = opaque;
> +    uint64_t end = pos + iov_size(iov, iovcnt);
> +    int i;
> +
> +    if (o->vfu_mig_buf_pending) {
> +        error_setg(errp, "Migration is ongoing");
> +        return 0;
> +    }
> +
> +    if (end > o->vfu_mig_buf_size) {
> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
> +    }
> +
> +    for (i = 0; i < iovcnt; i++) {
> +        memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
> +               iov[i].iov_len);
> +        o->vfu_mig_buf_size += iov[i].iov_len;
> +    }
> +
> +    return iov_size(iov, iovcnt);
> +}
> +
> +static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
> +{
> +    VfuObject *o = opaque;
> +
> +    o->vfu_mig_buf_size = 0;
> +
> +    g_free(o->vfu_mig_buf);
> +
> +    o->vfu_mig_buf = NULL;
> +
> +    o->vfu_mig_buf_pending = 0;
> +
> +    o->vfu_mig_data_written = 0;
> +
> +    o->vfu_mig_section_offset = 0;
> +
> +    return 0;
> +}
> +
> +static const QEMUFileOps vfu_mig_fops_save = {
> +    .writev_buffer  = vfu_mig_buf_write,
> +    .shut_down      = vfu_mig_buf_shutdown,
> +};
> +
> +static const QEMUFileOps vfu_mig_fops_load = {
> +    .get_buffer     = vfu_mig_buf_read,
> +    .shut_down      = vfu_mig_buf_shutdown,
> +};
> +
> +static BlockDriverState *vfu_object_find_bs_by_dev(DeviceState *dev)
> +{
> +    BlockBackend *blk = blk_by_dev(dev);
> +
> +    if (!blk) {
> +        return NULL;
> +    }
> +
> +    return blk_bs(blk);
> +}
> +
> +static int vfu_object_bdrv_invalidate_cache_by_dev(DeviceState *dev)
> +{
> +    BlockDriverState *bs = NULL;
> +    Error *local_err = NULL;
> +
> +    bs = vfu_object_find_bs_by_dev(dev);
> +    if (!bs) {
> +        return 0;
> +    }
> +
> +    bdrv_invalidate_cache(bs, &local_err);
> +    if (local_err) {
> +        error_report_err(local_err);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int vfu_object_bdrv_inactivate_by_dev(DeviceState *dev)
> +{
> +    BlockDriverState *bs = NULL;
> +
> +    bs = vfu_object_find_bs_by_dev(dev);
> +    if (!bs) {
> +        return 0;
> +    }
> +
> +    return bdrv_inactivate(bs);
> +}
> +
> +static void vfu_object_start_stop_netdev(DeviceState *dev, bool start)
> +{
> +    NetClientState *nc = NULL;
> +    Error *local_err = NULL;
> +    char *netdev = NULL;
> +
> +    netdev = object_property_get_str(OBJECT(dev), "netdev", &local_err);
> +    if (local_err) {
> +        /**
> +         * object_property_get_str() sets Error if netdev property is
> +         * not found, not necessarily an error in the context of
> +         * this function
> +         */
> +        error_free(local_err);
> +        return;
> +    }
> +
> +    if (!netdev) {
> +        return;
> +    }
> +
> +    nc = qemu_find_netdev(netdev);
> +
> +    if (!nc) {
> +        return;
> +    }
> +
> +    if (!start) {
> +        qemu_flush_or_purge_queued_packets(nc, true);
> +
> +        if (nc->info && nc->info->cleanup) {
> +            nc->info->cleanup(nc);
> +        }

I'm not sure if this is correct. Do we actually want to clean up the
NetClient (e.g. close the tap file descriptor)? If yes, why isn't the
NetClient removed from the net_clients tailq?

> +    } else if (nc->peer) {
> +        qemu_flush_or_purge_queued_packets(nc->peer, false);
> +    }
> +}
> +
> +static int vfu_object_start_devs(DeviceState *dev, void *opaque)
> +{
> +    int ret = vfu_object_bdrv_invalidate_cache_by_dev(dev);
> +
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    vfu_object_start_stop_netdev(dev, true);
> +
> +    return ret;
> +}
> +
> +static int vfu_object_stop_devs(DeviceState *dev, void *opaque)
> +{
> +    int ret = vfu_object_bdrv_inactivate_by_dev(dev);
> +
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    vfu_object_start_stop_netdev(dev, false);
> +
> +    return ret;
> +}
> +
> +/**
> + * handlers for vfu_migration_callbacks_t
> + *
> + * The libvfio-user library accesses these handlers to drive the migration
> + * at the remote end, and also to transport the data stored in vfu_mig_buf
> + *
> + */
> +static void vfu_mig_state_stop_and_copy(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    int ret;
> +
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save, false);
> +    }
> +
> +    ret = qemu_remote_savevm(o->vfu_mig_file, DEVICE(o->pci_dev));
> +    if (ret) {
> +        qemu_file_shutdown(o->vfu_mig_file);
> +        o->vfu_mig_file = NULL;
> +        return;
> +    }
> +
> +    qemu_fflush(o->vfu_mig_file);
> +}
> +
> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    int ret;
> +
> +    if (o->vfu_state != VFU_MIGR_STATE_RESUME) {
> +        goto run_ctx;
> +    }
> +
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
> +    }
> +
> +    ret = qemu_remote_loadvm(o->vfu_mig_file);
> +    if (ret) {
> +        VFU_OBJECT_ERROR(o, "vfu: failed to restore device state");
> +        return;
> +    }
> +
> +    qemu_file_shutdown(o->vfu_mig_file);
> +    o->vfu_mig_file = NULL;
> +
> +run_ctx:
> +    ret = qdev_walk_children(DEVICE(o->pci_dev), NULL, NULL,
> +                             vfu_object_start_devs,
> +                             NULL, NULL);
> +    if (ret) {
> +        VFU_OBJECT_ERROR(o, "vfu: failed to setup backends for %s",
> +                         o->device);
> +        return;
> +    }
> +}
> +
> +static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    int ret;
> +
> +    ret = qdev_walk_children(DEVICE(o->pci_dev), NULL, NULL,
> +                             vfu_object_stop_devs,
> +                             NULL, NULL);
> +    if (ret) {
> +        VFU_OBJECT_ERROR(o, "vfu: failed to inactivate backends for %s",
> +                         o->device);
> +    }
> +}
> +
> +static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    if (o->vfu_state == state) {
> +        return 0;
> +    }
> +
> +    switch (state) {
> +    case VFU_MIGR_STATE_RESUME:
> +        break;
> +    case VFU_MIGR_STATE_STOP_AND_COPY:
> +        vfu_mig_state_stop_and_copy(vfu_ctx);
> +        break;
> +    case VFU_MIGR_STATE_STOP:
> +        vfu_mig_state_stop(vfu_ctx);
> +        break;
> +    case VFU_MIGR_STATE_PRE_COPY:
> +        break;
> +    case VFU_MIGR_STATE_RUNNING:
> +        vfu_mig_state_running(vfu_ctx);
> +        break;
> +    default:
> +        warn_report("vfu: Unknown migration state %d", state);
> +    }
> +
> +    o->vfu_state = state;
> +
> +    return 0;
> +}
> +
> +static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    static bool mig_ongoing;
> +
> +    if (!mig_ongoing && !o->vfu_mig_buf_pending) {
> +        o->vfu_mig_buf_pending = o->vfu_mig_buf_size;
> +        mig_ongoing = true;
> +    }
> +
> +    if (mig_ongoing && !o->vfu_mig_buf_pending) {
> +        mig_ongoing = false;
> +    }
> +
> +    return o->vfu_mig_buf_pending;
> +}
> +
> +static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
> +                                uint64_t *size)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint64_t data_size = o->vfu_mig_buf_pending;
> +
> +    if (data_size > VFU_OBJECT_MIG_WINDOW) {
> +        data_size = VFU_OBJECT_MIG_WINDOW;
> +    }
> +
> +    o->vfu_mig_section_offset = o->vfu_mig_buf_size - o->vfu_mig_buf_pending;
> +
> +    o->vfu_mig_buf_pending -= data_size;
> +
> +    if (offset) {
> +        *offset = 0;
> +    }
> +
> +    if (size) {
> +        *size = data_size;
> +    }
> +
> +    return 0;
> +}
> +
> +static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
> +                                 uint64_t size, uint64_t offset)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint64_t read_offset = o->vfu_mig_section_offset + offset;
> +
> +    if (read_offset > o->vfu_mig_buf_size) {
> +        warn_report("vfu: buffer overflow - offset outside range");
> +        return -1;
> +    }
> +
> +    if ((read_offset + size) > o->vfu_mig_buf_size) {
> +        warn_report("vfu: buffer overflow - size outside range");
> +        size = o->vfu_mig_buf_size - read_offset;
> +    }
> +
> +    memcpy(buf, (o->vfu_mig_buf + read_offset), size);
> +
> +    return size;
> +}
> +
> +static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
> +                                  uint64_t size, uint64_t offset)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint64_t end = o->vfu_mig_data_written + offset + size;
> +
> +    if (end > o->vfu_mig_buf_size) {
> +        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
> +        o->vfu_mig_buf_size = end;
> +    }
> +
> +    memcpy((o->vfu_mig_buf + o->vfu_mig_data_written + offset), data, size);
> +
> +    return size;
> +}
> +
> +static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    o->vfu_mig_data_written += count;
> +
> +    return 0;
> +}
> +
> +static const vfu_migration_callbacks_t vfu_mig_cbs = {
> +    .version = VFU_MIGR_CALLBACKS_VERS,
> +    .transition = &vfu_mig_transition,
> +    .get_pending_bytes = &vfu_mig_get_pending_bytes,
> +    .prepare_data = &vfu_mig_prepare_data,
> +    .read_data = &vfu_mig_read_data,
> +    .data_written = &vfu_mig_data_written,
> +    .write_data = &vfu_mig_write_data,
> +};
> +
>  static void vfu_object_ctx_run(void *opaque)
>  {
>      VfuObject *o = opaque;
> @@ -550,6 +982,13 @@ void vfu_object_set_bus_irq(PCIBus *pci_bus)
>      pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, NULL, 1);
>  }
>  
> +static bool vfu_object_migratable(VfuObject *o)
> +{
> +    DeviceClass *dc = DEVICE_GET_CLASS(o->pci_dev);
> +
> +    return dc->vmsd && !dc->vmsd->unmigratable;
> +}
> +
>  /*
>   * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>   * properties. It also depends on devices instantiated in QEMU. These
> @@ -575,6 +1014,7 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>      ERRP_GUARD();
>      DeviceState *dev = NULL;
>      vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
> +    uint64_t migr_regs_size, migr_size;
>      int ret;
>  
>      if (o->vfu_ctx || !o->socket || !o->device ||
> @@ -653,6 +1093,31 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>          goto fail;
>      }
>  
> +    migr_regs_size = vfu_get_migr_register_area_size();
> +    migr_size = migr_regs_size + VFU_OBJECT_MIG_WINDOW;
> +
> +    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_MIGR_REGION_IDX,
> +                           migr_size, NULL,
> +                           VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
> +    if (ret < 0) {
> +        error_setg(errp, "vfu: Failed to register migration BAR %s- %s",
> +                   o->device, strerror(errno));
> +        goto fail;
> +    }
> +
> +    if (!vfu_object_migratable(o)) {
> +        goto realize_ctx;
> +    }
> +
> +    ret = vfu_setup_device_migration_callbacks(o->vfu_ctx, &vfu_mig_cbs,
> +                                               migr_regs_size);
> +    if (ret < 0) {
> +        error_setg(errp, "vfu: Failed to setup migration %s- %s",
> +                   o->device, strerror(errno));
> +        goto fail;
> +    }
> +
> +realize_ctx:
>      ret = vfu_realize_ctx(o->vfu_ctx);
>      if (ret < 0) {
>          error_setg(errp, "vfu: Failed to realize device %s- %s",
> @@ -700,6 +1165,8 @@ static void vfu_object_init(Object *obj)
>      }
>  
>      o->vfu_poll_fd = -1;
> +
> +    o->vfu_state = VFU_MIGR_STATE_STOP;

I was expecting RUNNING instead of STOP. Can you explain the state
machine? Perhaps --object x-vfio-user-server needs an incoming=on|off
parameter that defaults to off?
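
To make the suggestion concrete, here is a sketch under stated
assumptions: the enum is a stand-in that only reuses the state names
from libvfio-user's vfu_migr_state_t, and vfu_object_initial_state() and
its "incoming" flag are hypothetical, since no such property exists in
this series. The idea is that only a destination would start stopped.

```c
#include <stdbool.h>

/* Stand-in for libvfio-user's vfu_migr_state_t; only the names are reused. */
typedef enum {
    VFU_MIGR_STATE_STOP,
    VFU_MIGR_STATE_RUNNING,
    VFU_MIGR_STATE_STOP_AND_COPY,
    VFU_MIGR_STATE_PRE_COPY,
    VFU_MIGR_STATE_RESUME,
} vfu_migr_state_t;

/* Hypothetical helper: choose the state a server object starts in. */
static vfu_migr_state_t vfu_object_initial_state(bool incoming)
{
    /* A destination waits for state to arrive; a source is already live. */
    return incoming ? VFU_MIGR_STATE_STOP : VFU_MIGR_STATE_RUNNING;
}
```

vfu_object_init() could then assign the result to o->vfu_state instead
of hard-coding STOP, with incoming defaulting to off.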

>  }
>  
>  static void vfu_object_finalize(Object *obj)
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 1599b02fbc..2cc3b74287 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -66,6 +66,7 @@
>  #include "net/announce.h"
>  #include "qemu/yank.h"
>  #include "yank_functions.h"
> +#include "hw/qdev-core.h"
>  
>  const unsigned int postcopy_ram_discard_version;
>  
> @@ -1606,6 +1607,64 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
>      return ret;
>  }
>  
> +static SaveStateEntry *find_se_from_dev(DeviceState *dev)
> +{
> +    SaveStateEntry *se;
> +
> +    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
> +        if (se->opaque == dev) {
> +            return se;
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +static int qemu_remote_savevm_section_full(DeviceState *dev, void *opaque)
> +{
> +    QEMUFile *f = opaque;
> +    SaveStateEntry *se;
> +    int ret;
> +
> +    se = find_se_from_dev(dev);
> +    if (!se) {
> +        return 0;
> +    }
> +
> +    if (!se->vmsd || !vmstate_save_needed(se->vmsd, se->opaque) ||
> +        se->vmsd->unmigratable) {
> +        return 0;
> +    }
> +
> +    save_section_header(f, se, QEMU_VM_SECTION_FULL);
> +
> +    ret = vmstate_save(f, se, NULL);
> +    if (ret) {
> +        qemu_file_set_error(f, ret);
> +        return ret;
> +    }
> +
> +    save_section_footer(f, se);
> +
> +    return 0;
> +}
> +
> +int qemu_remote_savevm(QEMUFile *f, DeviceState *dev)
> +{
> +    int ret = qdev_walk_children(dev, NULL, NULL,
> +                                 qemu_remote_savevm_section_full,
> +                                 NULL, f);
> +
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    qemu_put_byte(f, QEMU_VM_EOF);
> +    qemu_fflush(f);
> +
> +    return 0;
> +}
> +
>  void qemu_savevm_live_state(QEMUFile *f)
>  {
>      /* save QEMU_VM_SECTION_END section */
> @@ -2447,6 +2506,36 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
>      return 0;
>  }
>  
> +int qemu_remote_loadvm(QEMUFile *f)
> +{
> +    uint8_t section_type;
> +    int ret = 0;
> +
> +    while (true) {
> +        section_type = qemu_get_byte(f);
> +
> +        ret = qemu_file_get_error(f);
> +        if (ret) {
> +            break;
> +        }
> +
> +        switch (section_type) {
> +        case QEMU_VM_SECTION_FULL:
> +            ret = qemu_loadvm_section_start_full(f, NULL);
> +            if (ret < 0) {
> +                break;
> +            }
> +            break;
> +        case QEMU_VM_EOF:
> +            return ret;
> +        default:
> +            return -EINVAL;
> +        }
> +    }
> +
> +    return ret;
> +}
> +
>  static int
>  qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
>  {
> diff --git a/migration/vmstate.c b/migration/vmstate.c
> index 05f87cdddc..83f8562792 100644
> --- a/migration/vmstate.c
> +++ b/migration/vmstate.c
> @@ -63,6 +63,25 @@ static int vmstate_size(void *opaque, const VMStateField *field)
>      return size;
>  }
>  
> +uint64_t vmstate_vmsd_size(PCIDevice *pci_dev)

This function is no longer used and can be dropped.


* Re: [PATCH v6 18/19] vfio-user: handle reset of remote device
  2022-02-17  7:49 ` [PATCH v6 18/19] vfio-user: handle reset of remote device Jagannathan Raman
@ 2022-03-07 11:36   ` Stefan Hajnoczi
  2022-03-07 15:37     ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-07 11:36 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: eduardo, elena.ufimtseva, john.g.johnson, berrange, bleal,
	john.levon, mst, armbru, quintela, f4bug, qemu-devel,
	alex.williamson, kanth.ghatraju, thanos.makatos, pbonzini,
	eblake, dgilbert

On Thu, Feb 17, 2022 at 02:49:05AM -0500, Jagannathan Raman wrote:
> Adds handler to reset a remote device
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 2304643003..55f1bf5e0f 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -989,6 +989,19 @@ static bool vfu_object_migratable(VfuObject *o)
>      return dc->vmsd && !dc->vmsd->unmigratable;
>  }
>  
> +static int vfu_object_device_reset(vfu_ctx_t *vfu_ctx, vfu_reset_type_t type)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +
> +    if (type == VFU_RESET_LOST_CONN) {
> +        return 0;
> +    }

Why is a lost connection ignored? Should there be a QMP monitor event?

> +
> +    qdev_reset_all(DEVICE(o->pci_dev));
> +
> +    return 0;
> +}
> +
>  /*
>   * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>   * properties. It also depends on devices instantiated in QEMU. These
> @@ -1105,6 +1118,12 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>          goto fail;
>      }
>  
> +    ret = vfu_setup_device_reset_cb(o->vfu_ctx, &vfu_object_device_reset);
> +    if (ret < 0) {
> +        error_setg(errp, "vfu: Failed to setup reset callback");
> +        goto fail;
> +    }
> +
>      if (!vfu_object_migratable(o)) {
>          goto realize_ctx;
>      }
> -- 
> 2.20.1
> 


* Re: [PATCH v6 12/19] vfio-user: IOMMU support for remote device
  2022-03-07  9:45           ` Stefan Hajnoczi
@ 2022-03-07 14:42             ` Jag Raman
  2022-03-08 10:04               ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-03-07 14:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Mar 7, 2022, at 4:45 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Mar 03, 2022 at 02:49:53PM +0000, Jag Raman wrote:
>> 
>> 
>>> On Mar 2, 2022, at 11:49 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> 
>>> On Mon, Feb 28, 2022 at 07:54:38PM +0000, Jag Raman wrote:
>>>> 
>>>> 
>>>>> On Feb 22, 2022, at 5:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>>>> 
>>>>> On Thu, Feb 17, 2022 at 02:48:59AM -0500, Jagannathan Raman wrote:
>>>>>> +struct RemoteIommuElem {
>>>>>> +    AddressSpace  as;
>>>>>> +    MemoryRegion  mr;
>>>>>> +};
>>>>>> +
>>>>>> +GHashTable *remote_iommu_elem_by_bdf;
>>>>> 
>>>>> A mutable global hash table requires synchronization when device
>>>>> emulation runs in multiple threads.
>>>>> 
>>>>> I suggest using pci_setup_iommu()'s iommu_opaque argument to avoid the
>>>>> global. If there is only 1 device per remote PCI bus, then there are no
>>>>> further synchronization concerns.
>>>> 
>>>> OK, will avoid the global. We would need to access the hash table
>>>> concurrently since there could be more than one device in the
>>>> same bus - so a mutex would be needed here.
>>> 
>>> I thought the PCIe topology can be set up with a separate bus for each
>>> x-vfio-user-server? I remember something like that in the previous
>>> revision where a root port was instantiated for each x-vfio-user-server.
>> 
>> Yes, we could set up the PCIe topology to be that way. But the user could
>> add more than one device to the same bus, unless the bus type explicitly
>> limits the number of devices to one (BusClass->max_dev).
> 
> Due to how the IOMMU is used to restrict the bus to the vfio-user
> client's DMA mappings, it seems like it's necessary to limit the number
> of devices to 1 per bus anyway?

Hi Stefan,

“remote_iommu_elem_by_bdf” has a separate entry for each of the BDF
combinations - it provides a separate DMA address space per device. As
such, we don’t have to limit the number of devices to 1 per bus.
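
To illustrate the per-BDF lookup described above, here is a minimal,
self-contained sketch. It is not the code from the patch: DmaSpace,
BUILD_BDF, space_by_bdf and dma_space_for() are hypothetical names, and a
flat array stands in for the GHashTable. It only shows that two devices
on the same bus resolve to distinct address spaces because the key
includes the devfn.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-device DMA address space. */
typedef struct DmaSpace {
    uint16_t bdf;
} DmaSpace;

/* 8-bit bus number in the high byte, 8-bit devfn in the low byte. */
#define BUILD_BDF(bus, devfn) ((uint16_t)(((bus) << 8) | (devfn)))

/* One slot per possible BDF; an assumption made to keep the sketch short. */
static DmaSpace *space_by_bdf[1 << 16];

/* Return the address space for this BDF, creating it on first use. */
static DmaSpace *dma_space_for(uint8_t bus, uint8_t devfn)
{
    uint16_t bdf = BUILD_BDF(bus, devfn);

    if (!space_by_bdf[bdf]) {
        space_by_bdf[bdf] = calloc(1, sizeof(DmaSpace));
        assert(space_by_bdf[bdf]);
        space_by_bdf[bdf]->bdf = bdf;
    }
    return space_by_bdf[bdf];
}
```

Note that the sketch does no locking; if lookups can run from several
threads, the table would still need the synchronization discussed
earlier in the thread.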

Thank you!
--
Jag

> 
> Stefan



* Re: [PATCH v6 15/19] vfio-user: handle device interrupts
  2022-03-07 10:24   ` Stefan Hajnoczi
@ 2022-03-07 15:10     ` Jag Raman
  2022-03-08 10:15       ` Stefan Hajnoczi
  2022-03-26 23:47     ` Jag Raman
  1 sibling, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-03-07 15:10 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
>> Forward remote device's interrupts to the guest
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> include/hw/pci/pci.h              |   6 ++
>> include/hw/remote/vfio-user-obj.h |   6 ++
>> hw/pci/msi.c                      |  13 +++-
>> hw/pci/msix.c                     |  12 +++-
>> hw/remote/machine.c               |  11 +--
>> hw/remote/vfio-user-obj.c         | 107 ++++++++++++++++++++++++++++++
>> stubs/vfio-user-obj.c             |   6 ++
>> MAINTAINERS                       |   1 +
>> hw/remote/trace-events            |   1 +
>> stubs/meson.build                 |   1 +
>> 10 files changed, 158 insertions(+), 6 deletions(-)
>> create mode 100644 include/hw/remote/vfio-user-obj.h
>> create mode 100644 stubs/vfio-user-obj.c
>> 
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index c3f3c90473..d42d526a48 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
>> typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
>>                                 pcibus_t addr, pcibus_t size, int type);
>> typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
>> +typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
>> +typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
>> 
>> typedef struct PCIIORegion {
>>     pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
>> @@ -323,6 +325,10 @@ struct PCIDevice {
>>     /* Space to store MSIX table & pending bit array */
>>     uint8_t *msix_table;
>>     uint8_t *msix_pba;
>> +
>> +    PCIMSINotify *msi_notify;
>> +    PCIMSIxNotify *msix_notify;
>> +
>>     /* MemoryRegion container for msix exclusive BAR setup */
>>     MemoryRegion msix_exclusive_bar;
>>     /* Memory Regions for MSIX table and pending bit entries. */
>> diff --git a/include/hw/remote/vfio-user-obj.h b/include/hw/remote/vfio-user-obj.h
>> new file mode 100644
>> index 0000000000..87ab78b875
>> --- /dev/null
>> +++ b/include/hw/remote/vfio-user-obj.h
>> @@ -0,0 +1,6 @@
>> +#ifndef VFIO_USER_OBJ_H
>> +#define VFIO_USER_OBJ_H
>> +
>> +void vfu_object_set_bus_irq(PCIBus *pci_bus);
>> +
>> +#endif
>> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
>> index 47d2b0f33c..93f5e400cc 100644
>> --- a/hw/pci/msi.c
>> +++ b/hw/pci/msi.c
>> @@ -51,6 +51,8 @@
>>  */
>> bool msi_nonbroken;
>> 
>> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
>> +
>> /* If we get rid of cap allocator, we won't need this. */
>> static inline uint8_t msi_cap_sizeof(uint16_t flags)
>> {
>> @@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>>     dev->msi_cap = config_offset;
>>     dev->cap_present |= QEMU_PCI_CAP_MSI;
>> 
>> +    dev->msi_notify = pci_msi_notify;
> 
> Are you sure it's correct to skip the msi_is_masked() logic? I think the

pci_msi_notify() callback includes the test for msi_is_masked() - that
covers the non vfio-user case.

The vfio-user callback should add this test - will do it if we continue
this approach. Thanks for the catch!

> callback function should only override the behavior of
> msi_send_message(), not the entire msi_notify() function.

OK, this sounds like a better approach.
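
A minimal sketch of that split, under stated assumptions: Dev, SendFn,
default_send() and notify() are hypothetical stand-ins, not QEMU's
PCIDevice or msi_notify(). The point is only that the mask/pending
handling stays in the common entry point and the hook replaces just the
delivery step.

```c
#include <assert.h>
#include <stdint.h>

typedef struct Dev Dev;
typedef void SendFn(Dev *dev, unsigned vector);

struct Dev {
    uint32_t mask;      /* bit n set: vector n is masked */
    uint32_t pending;   /* bit n set: vector n is pending */
    SendFn *send;       /* delivery hook, the only overridable part */
    unsigned sent;      /* number of delivered messages, for illustration */
};

/* Default delivery; a vfio-user style transport would install its own. */
static void default_send(Dev *dev, unsigned vector)
{
    (void)vector;
    dev->sent++;
}

/* Common entry point: every transport gets the same masking behaviour. */
static void notify(Dev *dev, unsigned vector)
{
    assert(vector < 32);
    if (dev->mask & (1U << vector)) {
        dev->pending |= 1U << vector;   /* masked: record as pending only */
        return;
    }
    dev->send(dev, vector);
}
```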

> 
> The same applies to MSI-X.
> 
>> +
>>     pci_set_word(dev->config + msi_flags_off(dev), flags);
>>     pci_set_word(dev->wmask + msi_flags_off(dev),
>>                  PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
>> @@ -307,7 +311,7 @@ bool msi_is_masked(const PCIDevice *dev, unsigned int vector)
>>     return mask & (1U << vector);
>> }
>> 
>> -void msi_notify(PCIDevice *dev, unsigned int vector)
>> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector)
>> {
>>     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
>>     bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
>> @@ -332,6 +336,13 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
>>     msi_send_message(dev, msg);
>> }
>> 
>> +void msi_notify(PCIDevice *dev, unsigned int vector)
>> +{
>> +    if (dev->msi_notify) {
> 
> Can this ever be NULL?

Unlikely in the current code flow, but it could change in the future.

As a matter of principle, I thought that we should check if a function
pointer is non-NULL before invoking it in QEMU. Is that not the case?

> 
>> +        dev->msi_notify(dev, vector);
>> +    }
>> +}
>> +
>> void msi_send_message(PCIDevice *dev, MSIMessage msg)
>> {
>>     MemTxAttrs attrs = {};
>> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
>> index ae9331cd0b..1c71e67f53 100644
>> --- a/hw/pci/msix.c
>> +++ b/hw/pci/msix.c
>> @@ -31,6 +31,8 @@
>> #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
>> #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
>> 
>> +static void pci_msix_notify(PCIDevice *dev, unsigned vector);
>> +
>> MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
>> {
>>     uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
>> @@ -334,6 +336,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>>     dev->msix_table = g_malloc0(table_size);
>>     dev->msix_pba = g_malloc0(pba_size);
>>     dev->msix_entry_used = g_malloc0(nentries * sizeof *dev->msix_entry_used);
>> +    dev->msix_notify = pci_msix_notify;
>> 
>>     msix_mask_all(dev, nentries);
>> 
>> @@ -485,7 +488,7 @@ int msix_enabled(PCIDevice *dev)
>> }
>> 
>> /* Send an MSI-X message */
>> -void msix_notify(PCIDevice *dev, unsigned vector)
>> +static void pci_msix_notify(PCIDevice *dev, unsigned vector)
>> {
>>     MSIMessage msg;
>> 
>> @@ -503,6 +506,13 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>>     msi_send_message(dev, msg);
>> }
>> 
>> +void msix_notify(PCIDevice *dev, unsigned vector)
>> +{
>> +    if (dev->msix_notify) {
> 
> Can this ever be NULL?
> 
>> +        dev->msix_notify(dev, vector);
>> +    }
>> +}
>> +
>> void msix_reset(PCIDevice *dev)
>> {
>>     if (!msix_present(dev)) {
>> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
>> index db4ae30710..a8b4a3aef3 100644
>> --- a/hw/remote/machine.c
>> +++ b/hw/remote/machine.c
>> @@ -23,6 +23,7 @@
>> #include "hw/remote/iohub.h"
>> #include "hw/qdev-core.h"
>> #include "hw/remote/iommu.h"
>> +#include "hw/remote/vfio-user-obj.h"
>> 
>> static void remote_machine_init(MachineState *machine)
>> {
>> @@ -54,12 +55,14 @@ static void remote_machine_init(MachineState *machine)
>> 
>>     if (s->vfio_user) {
>>         remote_configure_iommu(pci_host->bus);
>> -    }
>> 
>> -    remote_iohub_init(&s->iohub);
>> +        vfu_object_set_bus_irq(pci_host->bus);
>> +    } else {
>> +        remote_iohub_init(&s->iohub);
>> 
>> -    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
>> -                 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
>> +        pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
>> +                     &s->iohub, REMOTE_IOHUB_NB_PIRQS);
>> +    }
>> 
>>     qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
>> }
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> index 2feabd06a4..d79bab87f1 100644
>> --- a/hw/remote/vfio-user-obj.c
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -54,6 +54,9 @@
>> #include "hw/pci/pci.h"
>> #include "qemu/timer.h"
>> #include "exec/memory.h"
>> +#include "hw/pci/msi.h"
>> +#include "hw/pci/msix.h"
>> +#include "hw/remote/vfio-user-obj.h"
>> 
>> #define TYPE_VFU_OBJECT "x-vfio-user-server"
>> OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
>> @@ -107,6 +110,10 @@ struct VfuObject {
>>     int vfu_poll_fd;
>> };
>> 
>> +static GHashTable *vfu_object_bdf_to_ctx_table;
> 
> I suggest adding a void *msi_notify_opaque field to PCIDevice and
> passing the value as an argument to ->msi_notify(). vfio-user-obj.c can
> set the value to vfu_ctx and eliminate the vfu_object_bdf_to_ctx_table
> hash table.
> 
> This simplifies the code, makes it faster, and solves race conditions
> during hot plug/unplug if other instances are running in IOThreads.

OK, will do.
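
For reference, the opaque-pointer pattern suggested above could look
roughly like the sketch below. All names (Dev, NotifyFn, Ctx,
dev_set_notify(), dev_notify()) are hypothetical and the types are
simplified stand-ins for PCIDevice and vfu_ctx_t. Each device carries
its own context pointer, so no global BDF-to-context table is needed.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Dev Dev;
typedef void NotifyFn(Dev *dev, unsigned vector, void *opaque);

struct Dev {
    NotifyFn *notify;
    void *notify_opaque;    /* owned by whoever installed the callback */
};

/* Hypothetical per-server context, standing in for vfu_ctx_t. */
typedef struct Ctx {
    unsigned last_vector;
    unsigned count;
} Ctx;

/* Callback: the context arrives as an argument, no lookup required. */
static void ctx_notify(Dev *dev, unsigned vector, void *opaque)
{
    Ctx *ctx = opaque;

    (void)dev;
    ctx->last_vector = vector;
    ctx->count++;
}

/* Install the callback together with its context. */
static void dev_set_notify(Dev *dev, NotifyFn *fn, void *opaque)
{
    assert(fn != NULL);
    dev->notify = fn;
    dev->notify_opaque = opaque;
}

static void dev_notify(Dev *dev, unsigned vector)
{
    dev->notify(dev, vector, dev->notify_opaque);
}
```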

> 
>> +
>> +#define INT2VOIDP(i) (void *)(uintptr_t)(i)
>> +
>> static void vfu_object_init_ctx(VfuObject *o, Error **errp);
>> 
>> static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
>> @@ -463,6 +470,86 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
>>     }
>> }
>> 
>> +static void vfu_object_irq_trigger(int pci_bdf, unsigned vector)
>> +{
>> +    vfu_ctx_t *vfu_ctx = NULL;
>> +
>> +    if (!vfu_object_bdf_to_ctx_table) {
> 
> Can this ever be NULL?
> 
>> +        return;
>> +    }
>> +
>> +    vfu_ctx = g_hash_table_lookup(vfu_object_bdf_to_ctx_table,
>> +                                  INT2VOIDP(pci_bdf));
>> +
>> +    if (vfu_ctx) {
>> +        vfu_irq_trigger(vfu_ctx, vector);
>> +    }
>> +}
>> +
>> +static int vfu_object_map_irq(PCIDevice *pci_dev, int intx)
>> +{
>> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
>> +                                pci_dev->devfn);
>> +
>> +    return pci_bdf;
>> +}
>> +
>> +static void vfu_object_set_irq(void *opaque, int pirq, int level)
>> +{
>> +    if (level) {
>> +        vfu_object_irq_trigger(pirq, 0);
>> +    }
>> +}
>> +
>> +static void vfu_object_msi_notify(PCIDevice *pci_dev, unsigned vector)
>> +{
>> +    int pci_bdf;
>> +
>> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
>> +
>> +    vfu_object_irq_trigger(pci_bdf, vector);
>> +}
>> +
>> +static int vfu_object_setup_irqs(VfuObject *o, PCIDevice *pci_dev)
>> +{
>> +    vfu_ctx_t *vfu_ctx = o->vfu_ctx;
>> +    int ret, pci_bdf;
>> +
>> +    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    ret = 0;
>> +    if (msix_nr_vectors_allocated(pci_dev)) {
>> +        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSIX_IRQ,
>> +                                       msix_nr_vectors_allocated(pci_dev));
>> +
>> +        pci_dev->msix_notify = vfu_object_msi_notify;
>> +    } else if (msi_nr_vectors_allocated(pci_dev)) {
>> +        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSI_IRQ,
>> +                                       msi_nr_vectors_allocated(pci_dev));
>> +
>> +        pci_dev->msi_notify = vfu_object_msi_notify;
>> +    }
>> +
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
>> +
>> +    g_hash_table_insert(vfu_object_bdf_to_ctx_table, INT2VOIDP(pci_bdf),
>> +                        o->vfu_ctx);
>> +
>> +    return 0;
>> +}
>> +
>> +void vfu_object_set_bus_irq(PCIBus *pci_bus)
>> +{
>> +    pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, NULL, 1);
>> +}
>> +
>> /*
>>  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>>  * properties. It also depends on devices instantiated in QEMU. These
>> @@ -559,6 +646,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>> 
>>     vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
>> 
>> +    ret = vfu_object_setup_irqs(o, o->pci_dev);
>> +    if (ret < 0) {
>> +        error_setg(errp, "vfu: Failed to setup interrupts for %s",
>> +                   o->device);
>> +        goto fail;
>> +    }
>> +
>>     ret = vfu_realize_ctx(o->vfu_ctx);
>>     if (ret < 0) {
>>         error_setg(errp, "vfu: Failed to realize device %s- %s",
>> @@ -612,6 +706,7 @@ static void vfu_object_finalize(Object *obj)
>> {
>>     VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
>>     VfuObject *o = VFU_OBJECT(obj);
>> +    int pci_bdf;
>> 
>>     k->nr_devs--;
>> 
>> @@ -638,9 +733,17 @@ static void vfu_object_finalize(Object *obj)
>>         o->unplug_blocker = NULL;
>>     }
>> 
>> +    if (o->pci_dev) {
>> +        pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(o->pci_dev)),
>> +                                o->pci_dev->devfn);
>> +        g_hash_table_remove(vfu_object_bdf_to_ctx_table, INT2VOIDP(pci_bdf));
>> +    }
>> +
>>     o->pci_dev = NULL;
>> 
>>     if (!k->nr_devs && k->auto_shutdown) {
>> +        g_hash_table_destroy(vfu_object_bdf_to_ctx_table);
>> +        vfu_object_bdf_to_ctx_table = NULL;
>>         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
>>     }
>> 
>> @@ -658,6 +761,10 @@ static void vfu_object_class_init(ObjectClass *klass, void *data)
>> 
>>     k->auto_shutdown = true;
>> 
>> +    msi_nonbroken = true;
> 
> This should go in hw/remote/machine.c. It's a global variable related to
> the machine's interrupt controller capabilities. The value is not
> related to vfu_object_class_init(), which will be called by any QEMU
> binary that links hw/remote/vfio-user-obj.o regardless of which machine
> type is instantiated.

Multiprocess QEMU, which also uses the remote machine, doesn’t support MSI,
which is why we placed it here originally. Since then, we have added the
‘vfio-user’ machine sub-option to distinguish vfio-user from multiprocess, so
this could be moved to the machine initialization code as you pointed out.

Thank you!
--
Jag

> 
>> +
>> +    vfu_object_bdf_to_ctx_table = g_hash_table_new_full(NULL, NULL, NULL, NULL);
>> +
>>     object_class_property_add(klass, "socket", "SocketAddress", NULL,
>>                               vfu_object_set_socket, NULL, NULL);
>>     object_class_property_set_description(klass, "socket",
>> diff --git a/stubs/vfio-user-obj.c b/stubs/vfio-user-obj.c
>> new file mode 100644
>> index 0000000000..79100d768e
>> --- /dev/null
>> +++ b/stubs/vfio-user-obj.c
>> @@ -0,0 +1,6 @@
>> +#include "qemu/osdep.h"
>> +#include "hw/remote/vfio-user-obj.h"
>> +
>> +void vfu_object_set_bus_irq(PCIBus *pci_bus)
>> +{
>> +}
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index f47232c78c..e274cb46af 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3569,6 +3569,7 @@ F: hw/remote/iohub.c
>> F: include/hw/remote/iohub.h
>> F: subprojects/libvfio-user
>> F: hw/remote/vfio-user-obj.c
>> +F: include/hw/remote/vfio-user-obj.h
>> F: hw/remote/iommu.c
>> F: include/hw/remote/iommu.h
>> 
>> diff --git a/hw/remote/trace-events b/hw/remote/trace-events
>> index 847d50d88f..c167b3c7a5 100644
>> --- a/hw/remote/trace-events
>> +++ b/hw/remote/trace-events
>> @@ -12,3 +12,4 @@ vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
>> vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
>> vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
>> vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
>> +vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
>> diff --git a/stubs/meson.build b/stubs/meson.build
>> index d359cbe1ad..c5ce979dc3 100644
>> --- a/stubs/meson.build
>> +++ b/stubs/meson.build
>> @@ -57,3 +57,4 @@ if have_system
>> else
>>   stub_ss.add(files('qdev.c'))
>> endif
>> +stub_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_false: files('vfio-user-obj.c'))
>> -- 
>> 2.20.1
>> 



* Re: [PATCH v6 16/19] softmmu/vl: defer backend init
  2022-03-07 10:48   ` Stefan Hajnoczi
@ 2022-03-07 15:31     ` Jag Raman
  0 siblings, 0 replies; 76+ messages in thread
From: Jag Raman @ 2022-03-07 15:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	Paolo Bonzini, eblake, dgilbert



> On Mar 7, 2022, at 5:48 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:49:03AM -0500, Jagannathan Raman wrote:
>> Allow deferred initialization of backends. TYPE_REMOTE_MACHINE is
>> agnostic to QEMU's RUN_STATE. It's state is driven by the QEMU client
> 
> s/It's/Its/
> 
>> via the vfio-user protocol. Whereas, the backends presently defer
>> initialization if QEMU is in RUN_STATE_INMIGRATE. Since the remote
>> machine can't use RUN_STATE*, this commit allows it to ask for deferred
>> initialization of backend device. It is primarily targeted towards block
>> devices in this commit, but it need not be limited to that.
> 
> What is the purpose of this commit? I don't understand the description.

Sorry that it’s not clear. This patch is needed to support vfio-user
migration.

For background, this patch and the next one together make it possible to
migrate individual devices from the source to the destination. For example,
in a storage server daemon with 5 PCI controllers, we could migrate just 2
of the 5 controllers to the destination while the remaining 3 continue to
run on the source. The destination could also be a server that is already
running; it doesn’t have to be frozen for migration.

This patch specifically affects how block drives are initialized on the
destination. In all the presently defined use cases, QEMU launches the
destination in RUN_STATE_INMIGRATE. This is essentially a frozen state,
which implicitly defers the initialization of backends such as block
drives until after the migration is complete. In vfio-user, by contrast,
the destination cannot be in RUN_STATE_INMIGRATE as it could already be
running. Therefore, we need a way to tell backend devices to defer their
initialization. This patch addresses that need for QEMU instances that
are already running.
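
As a rough illustration of the mechanism, the sketch below uses the
three function names declared in the sysemu.h hunk quoted further down.
Their bodies here are an assumption (a plain process-wide flag), and
in_migrate and backend_starts_inactive() are hypothetical stand-ins for
runstate_check(RUN_STATE_INMIGRATE) and for the condition tested in the
block layer hunks.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: the deferral request is a single process-wide flag. */
static bool defer_backend_init;

bool deferred_backend_init(void)
{
    return defer_backend_init;
}

void set_deferred_backend_init(void)
{
    defer_backend_init = true;
}

void clear_deferred_backend_init(void)
{
    defer_backend_init = false;
}

/* Hypothetical stand-in for runstate_check(RUN_STATE_INMIGRATE). */
static bool in_migrate;

/*
 * Same shape as the condition in the quoted hunks: a backend starts
 * inactive either during incoming migration or on explicit request.
 */
static bool backend_starts_inactive(void)
{
    return in_migrate || deferred_backend_init();
}
```

An already running destination would call set_deferred_backend_init()
before creating the backends for an incoming device and
clear_deferred_backend_init() once that device has been migrated.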

> 
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> include/sysemu/sysemu.h    |  4 ++++
>> block/block-backend.c      |  3 ++-
>> blockdev.c                 |  2 +-
>> softmmu/vl.c               | 17 +++++++++++++++++
>> stubs/defer-backend-init.c |  7 +++++++
>> MAINTAINERS                |  1 +
>> stubs/meson.build          |  1 +
>> 7 files changed, 33 insertions(+), 2 deletions(-)
>> create mode 100644 stubs/defer-backend-init.c
>> 
>> diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
>> index b9421e03ff..3179eb1857 100644
>> --- a/include/sysemu/sysemu.h
>> +++ b/include/sysemu/sysemu.h
>> @@ -119,4 +119,8 @@ extern QemuOptsList qemu_net_opts;
>> extern QemuOptsList qemu_global_opts;
>> extern QemuOptsList qemu_semihosting_config_opts;
>> 
>> +bool deferred_backend_init(void);
>> +void set_deferred_backend_init(void);
>> +void clear_deferred_backend_init(void);
>> +
>> #endif
>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index 4ff6b4d785..e04f9b6469 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -20,6 +20,7 @@
>> #include "sysemu/blockdev.h"
>> #include "sysemu/runstate.h"
>> #include "sysemu/replay.h"
>> +#include "sysemu/sysemu.h"
>> #include "qapi/error.h"
>> #include "qapi/qapi-events-block.h"
>> #include "qemu/id.h"
>> @@ -935,7 +936,7 @@ int blk_attach_dev(BlockBackend *blk, DeviceState *dev)
>>     /* While migration is still incoming, we don't need to apply the
>>      * permissions of guest device BlockBackends. We might still have a block
>>      * job or NBD server writing to the image for storage migration. */
>> -    if (runstate_check(RUN_STATE_INMIGRATE)) {
>> +    if (runstate_check(RUN_STATE_INMIGRATE) || deferred_backend_init()) {
>>         blk->disable_perm = true;
>>     }
> 
> Why is this necessary for vfio-user? Disk images shouldn't be in use by
> another process so we don't need to bypass permissions temporarily.

The destination in vfio-user migration needs this - the source would
already be using the disk images.

Thank you!
--
Jag

> 
>> 
>> diff --git a/blockdev.c b/blockdev.c
>> index 42e098b458..d495070679 100644
>> --- a/blockdev.c
>> +++ b/blockdev.c
>> @@ -569,7 +569,7 @@ static BlockBackend *blockdev_init(const char *file, QDict *bs_opts,
>>         qdict_set_default_str(bs_opts, BDRV_OPT_AUTO_READ_ONLY, "on");
>>         assert((bdrv_flags & BDRV_O_CACHE_MASK) == 0);
>> 
>> -        if (runstate_check(RUN_STATE_INMIGRATE)) {
>> +        if (runstate_check(RUN_STATE_INMIGRATE) || deferred_backend_init()) {
>>             bdrv_flags |= BDRV_O_INACTIVE;
> 
> Same here.



* Re: [PATCH v6 18/19] vfio-user: handle reset of remote device
  2022-03-07 11:36   ` Stefan Hajnoczi
@ 2022-03-07 15:37     ` Jag Raman
  2022-03-08 10:21       ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-03-07 15:37 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert



> On Mar 7, 2022, at 6:36 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:49:05AM -0500, Jagannathan Raman wrote:
>> Adds handler to reset a remote device
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> hw/remote/vfio-user-obj.c | 19 +++++++++++++++++++
>> 1 file changed, 19 insertions(+)
>> 
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> index 2304643003..55f1bf5e0f 100644
>> --- a/hw/remote/vfio-user-obj.c
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -989,6 +989,19 @@ static bool vfu_object_migratable(VfuObject *o)
>>     return dc->vmsd && !dc->vmsd->unmigratable;
>> }
>> 
>> +static int vfu_object_device_reset(vfu_ctx_t *vfu_ctx, vfu_reset_type_t type)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +
>> +    if (type == VFU_RESET_LOST_CONN) {
>> +        return 0;
>> +    }
> 
> Why is a lost connection ignored? Should there be a QMP monitor event?

We handle the lost connection case in vfu_object_ctx_run(), which is in
PATCH 10 of this series. We are sending a QMP monitor event in this case.

Thank you!
--
Jag

> 
>> +
>> +    qdev_reset_all(DEVICE(o->pci_dev));
>> +
>> +    return 0;
>> +}
>> +
>> /*
>>  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>>  * properties. It also depends on devices instantiated in QEMU. These
>> @@ -1105,6 +1118,12 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>>         goto fail;
>>     }
>> 
>> +    ret = vfu_setup_device_reset_cb(o->vfu_ctx, &vfu_object_device_reset);
>> +    if (ret < 0) {
>> +        error_setg(errp, "vfu: Failed to setup reset callback");
>> +        goto fail;
>> +    }
>> +
>>     if (!vfu_object_migratable(o)) {
>>         goto realize_ctx;
>>     }
>> -- 
>> 2.20.1
>> 




* Re: [PATCH v6 12/19] vfio-user: IOMMU support for remote device
  2022-03-07 14:42             ` Jag Raman
@ 2022-03-08 10:04               ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-08 10:04 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert

On Mon, Mar 07, 2022 at 02:42:49PM +0000, Jag Raman wrote:
> 
> 
> > On Mar 7, 2022, at 4:45 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Thu, Mar 03, 2022 at 02:49:53PM +0000, Jag Raman wrote:
> >> 
> >> 
> >>> On Mar 2, 2022, at 11:49 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>> 
> >>> On Mon, Feb 28, 2022 at 07:54:38PM +0000, Jag Raman wrote:
> >>>> 
> >>>> 
> >>>>> On Feb 22, 2022, at 5:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>>>> 
> >>>>> On Thu, Feb 17, 2022 at 02:48:59AM -0500, Jagannathan Raman wrote:
> >>>>>> +struct RemoteIommuElem {
> >>>>>> +    AddressSpace  as;
> >>>>>> +    MemoryRegion  mr;
> >>>>>> +};
> >>>>>> +
> >>>>>> +GHashTable *remote_iommu_elem_by_bdf;
> >>>>> 
> >>>>> A mutable global hash table requires synchronization when device
> >>>>> emulation runs in multiple threads.
> >>>>> 
> >>>>> I suggest using pci_setup_iommu()'s iommu_opaque argument to avoid the
> >>>>> global. If there is only 1 device per remote PCI bus, then there are no
> >>>>> further synchronization concerns.
> >>>> 
> >>>> OK, will avoid the global. We would need to access the hash table
> >>>> concurrently since there could be more than one device in the
> >>>> same bus - so a mutex would be needed here.
> >>> 
> >>> I thought the PCIe topology can be set up with a separate bus for each
> >>> x-vfio-user-server? I remember something like that in the previous
> >>> revision where a root port was instantiated for each x-vfio-user-server.
> >> 
> >> Yes, we could set up the PCIe topology to be that way. But the user could
> >> add more than one device to the same bus, unless the bus type explicitly
> >> limits the number of devices to one (BusClass->max_dev).
> > 
> > Due to how the IOMMU is used to restrict the bus to the vfio-user
> > client's DMA mappings, it seems like it's necessary to limit the number
> > of devices to 1 per bus anyway?
> 
> Hi Stefan,
> 
> “remote_iommu_elem_by_bdf” has a separate entry for each of the BDF
> combinations - it provides a separate DMA address space per device. As
> such, we don’t have to limit the number of devices to 1 per bus.

I see, thanks!

Stefan


* Re: [PATCH v6 15/19] vfio-user: handle device interrupts
  2022-03-07 15:10     ` Jag Raman
@ 2022-03-08 10:15       ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-08 10:15 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert

On Mon, Mar 07, 2022 at 03:10:41PM +0000, Jag Raman wrote:
> > On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
> >> @@ -332,6 +336,13 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
> >>     msi_send_message(dev, msg);
> >> }
> >> 
> >> +void msi_notify(PCIDevice *dev, unsigned int vector)
> >> +{
> >> +    if (dev->msi_notify) {
> > 
> > Can this ever be NULL?
> 
> Unlikely in the current code flow, but it could change in the future.
> 
> As a matter of principle, I thought that we should check if a function
> pointer is non-NULL before invoking it in QEMU. Is that not the case?

No, it's better to dump core with a backtrace when a program invariant
is violated than to silently suppress it. If msi_notify() is called but
the function pointer is NULL then there is a bug in the program that
needs to be fixed.
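
A small sketch of that principle, with hypothetical names (Dev,
default_notify(), dev_init(), dev_notify()) rather than QEMU's actual
types. The callback is installed unconditionally at init time, so
"non-NULL" is an invariant. The wrapper states the invariant with an
assert, which is one option; calling the pointer directly would also
crash, while an if-check would hide the bug.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Dev Dev;

struct Dev {
    void (*notify)(Dev *dev, unsigned vector);
    unsigned delivered;     /* number of notifications, for illustration */
    unsigned last_vector;
};

static void default_notify(Dev *dev, unsigned vector)
{
    dev->delivered++;
    dev->last_vector = vector;
}

/* The callback is always set here, so it can never legitimately be NULL. */
static void dev_init(Dev *dev)
{
    dev->notify = default_notify;
}

static void dev_notify(Dev *dev, unsigned vector)
{
    /* A NULL pointer here is a program bug: fail loudly, do not skip. */
    assert(dev->notify != NULL);
    dev->notify(dev, vector);
}
```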

Stefan


* Re: [PATCH v6 18/19] vfio-user: handle reset of remote device
  2022-03-07 15:37     ` Jag Raman
@ 2022-03-08 10:21       ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-08 10:21 UTC (permalink / raw)
  To: Jag Raman
  Cc: eduardo, Elena Ufimtseva, John Johnson, berrange, bleal,
	john.levon, Michael S. Tsirkin, armbru, quintela, f4bug,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	pbonzini, eblake, dgilbert

On Mon, Mar 07, 2022 at 03:37:51PM +0000, Jag Raman wrote:
> 
> 
> > On Mar 7, 2022, at 6:36 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Thu, Feb 17, 2022 at 02:49:05AM -0500, Jagannathan Raman wrote:
> >> Adds handler to reset a remote device
> >> 
> >> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> >> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> >> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> >> ---
> >> hw/remote/vfio-user-obj.c | 19 +++++++++++++++++++
> >> 1 file changed, 19 insertions(+)
> >> 
> >> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> >> index 2304643003..55f1bf5e0f 100644
> >> --- a/hw/remote/vfio-user-obj.c
> >> +++ b/hw/remote/vfio-user-obj.c
> >> @@ -989,6 +989,19 @@ static bool vfu_object_migratable(VfuObject *o)
> >>     return dc->vmsd && !dc->vmsd->unmigratable;
> >> }
> >> 
> >> +static int vfu_object_device_reset(vfu_ctx_t *vfu_ctx, vfu_reset_type_t type)
> >> +{
> >> +    VfuObject *o = vfu_get_private(vfu_ctx);
> >> +
> >> +    if (type == VFU_RESET_LOST_CONN) {
> >> +        return 0;
> >> +    }
> > 
> > Why is a lost connection ignored? Should there be a QMP monitor event?
> 
> We handle the lost connection case in vfu_object_ctx_run(), which is in
> PATCH 5 of this series. We are sending a QMP monitor event in this case.

Great, please add a comment here.
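
Something along these lines would do. This is only a sketch: reset_type_t,
FakeDevice and device_reset() are simplified stand-ins made up for the
example, and the non-LOST_CONN branch just counts resets instead of
resetting a real device.

```c
#include <assert.h>

/* Stand-ins for the libvfio-user reset types; values are illustrative. */
typedef enum {
    RESET_DEVICE,
    RESET_LOST_CONN,
} reset_type_t;

typedef struct {
    int resets;   /* number of device resets performed */
} FakeDevice;

static int device_reset(FakeDevice *dev, reset_type_t type)
{
    /*
     * Nothing to do for a lost connection: that case is handled by the
     * context run loop (vfu_object_ctx_run() in this series), which also
     * sends the QMP monitor event.
     */
    if (type == RESET_LOST_CONN) {
        return 0;
    }

    dev->resets++;
    return 0;
}
```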

Stefan


* Re: [PATCH v6 15/19] vfio-user: handle device interrupts
  2022-03-07 10:24   ` Stefan Hajnoczi
  2022-03-07 15:10     ` Jag Raman
@ 2022-03-26 23:47     ` Jag Raman
  2022-03-29 14:24       ` Stefan Hajnoczi
  1 sibling, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-03-26 23:47 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, john.levon, Michael S. Tsirkin, Markus Armbruster,
	Juan Quintela, Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	Paolo Bonzini, Eric Blake, dgilbert



> On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
>> Forward remote device's interrupts to the guest
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> include/hw/pci/pci.h              |   6 ++
>> include/hw/remote/vfio-user-obj.h |   6 ++
>> hw/pci/msi.c                      |  13 +++-
>> hw/pci/msix.c                     |  12 +++-
>> hw/remote/machine.c               |  11 +--
>> hw/remote/vfio-user-obj.c         | 107 ++++++++++++++++++++++++++++++
>> stubs/vfio-user-obj.c             |   6 ++
>> MAINTAINERS                       |   1 +
>> hw/remote/trace-events            |   1 +
>> stubs/meson.build                 |   1 +
>> 10 files changed, 158 insertions(+), 6 deletions(-)
>> create mode 100644 include/hw/remote/vfio-user-obj.h
>> create mode 100644 stubs/vfio-user-obj.c
>> 
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index c3f3c90473..d42d526a48 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
>> typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
>>                                 pcibus_t addr, pcibus_t size, int type);
>> typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
>> +typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
>> +typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
>> 
>> typedef struct PCIIORegion {
>>     pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
>> @@ -323,6 +325,10 @@ struct PCIDevice {
>>     /* Space to store MSIX table & pending bit array */
>>     uint8_t *msix_table;
>>     uint8_t *msix_pba;
>> +
>> +    PCIMSINotify *msi_notify;
>> +    PCIMSIxNotify *msix_notify;
>> +
>>     /* MemoryRegion container for msix exclusive BAR setup */
>>     MemoryRegion msix_exclusive_bar;
>>     /* Memory Regions for MSIX table and pending bit entries. */
>> diff --git a/include/hw/remote/vfio-user-obj.h b/include/hw/remote/vfio-user-obj.h
>> new file mode 100644
>> index 0000000000..87ab78b875
>> --- /dev/null
>> +++ b/include/hw/remote/vfio-user-obj.h
>> @@ -0,0 +1,6 @@
>> +#ifndef VFIO_USER_OBJ_H
>> +#define VFIO_USER_OBJ_H
>> +
>> +void vfu_object_set_bus_irq(PCIBus *pci_bus);
>> +
>> +#endif
>> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
>> index 47d2b0f33c..93f5e400cc 100644
>> --- a/hw/pci/msi.c
>> +++ b/hw/pci/msi.c
>> @@ -51,6 +51,8 @@
>>  */
>> bool msi_nonbroken;
>> 
>> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
>> +
>> /* If we get rid of cap allocator, we won't need this. */
>> static inline uint8_t msi_cap_sizeof(uint16_t flags)
>> {
>> @@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
>>     dev->msi_cap = config_offset;
>>     dev->cap_present |= QEMU_PCI_CAP_MSI;
>> 
>> +    dev->msi_notify = pci_msi_notify;
> 
> Are you sure it's correct to skip the msi_is_masked() logic? I think the
> callback function should only override the behavior of
> msi_send_message(), not the entire msi_notify() function.
> 
> The same applies to MSI-X.

Hi Stefan,

We noticed that the client is handling the masking and unmasking of MSIx
interrupts.

Concerning MSIx, vfio_msix_vector_use() handles unmasking and
vfio_msix_vector_release() handles masking operations. The server triggers
an MSIx interrupt by signaling the eventfd associated with the vector. If the vector
is unmasked, the interrupt bypasses the client/QEMU and takes this
path: “server -> KVM -> guest”. Whereas, if the vector is masked, it lands on the
client via: “server -> vfio_msi_interrupt()”. vfio_msi_interrupt() suppresses the
interrupt if the vector is masked. The use and release functions switch the server’s
eventfd between VFIOPCIDevice->VFIOMSIVector[i]->kvm_interrupt and
VFIOPCIDevice->VFIOMSIVector[i]->interrupt using the
VFIO_DEVICE_SET_IRQS message.
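
The routing described above can be modelled roughly as follows. This is a
simplified sketch under my own assumptions, not the client code: VectorModel,
vector_use(), vector_release() and server_trigger() are hypothetical names,
and the two eventfds are reduced to an enum that records which path is
currently active.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the server's eventfd for one vector currently points. */
typedef enum {
    ROUTE_KVM,      /* "server -> KVM -> guest": bypasses the client */
    ROUTE_CLIENT,   /* "server -> client handler": the client decides */
} Route;

typedef struct {
    Route route;
    bool masked;
    unsigned delivered_to_guest;
    unsigned suppressed;
} VectorModel;

/* Models the effect of vfio_msix_vector_use(): unmask, route via KVM. */
static void vector_use(VectorModel *v)
{
    v->masked = false;
    v->route = ROUTE_KVM;
}

/* Models the effect of vfio_msix_vector_release(): mask, route via client. */
static void vector_release(VectorModel *v)
{
    v->masked = true;
    v->route = ROUTE_CLIENT;
}

/* Server side: it signals the eventfd without knowing the mask state. */
static void server_trigger(VectorModel *v)
{
    if (v->route == ROUTE_KVM) {
        v->delivered_to_guest++;
        return;
    }

    /* Client-side handler: drop the interrupt while the vector is masked. */
    if (v->masked) {
        v->suppressed++;
    } else {
        v->delivered_to_guest++;
    }
}
```

The point of the model is that server_trigger() never looks at the mask
state itself; the decision is made entirely by where the eventfd is routed.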

Concerning MSI, the server should check whether the vector is unmasked before
triggering. The server does not do this at present; I will update it. I had
wrongly assumed that MSI masking is handled the same way as MSIx masking, sorry
about that. The masking and unmasking information for MSI is in the config space
registers, so the server should have this information.

You had previously suggested using callbacks for msi_get_message &
msi_send_message, considering the masking issue. Given MSIx masking
(including the MSIx table BAR) is handled at the client, the masking information
doesn’t reach the server - so msix_notify will never invoke the
msi_send_message callback - all the vectors remain masked at the server
end (msix_init() -> msix_mask_all()).
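
For the MSI case, the server-side check I have in mind would look roughly
like this. It is a sketch only: MsiModel and server_msi_notify() are
hypothetical, the two registers are plain fields rather than config space
accesses, and recording the Pending bit for a masked vector is my assumption
about the desired behaviour. The mask test has the same form as the one in
msi_is_masked().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t mask_bits;     /* stand-in for the MSI Mask Bits register */
    uint32_t pending_bits;  /* stand-in for the MSI Pending Bits register */
    unsigned triggered;     /* interrupts actually handed to the transport */
} MsiModel;

static bool vector_is_masked(const MsiModel *m, unsigned vector)
{
    return m->mask_bits & (1U << vector);
}

static void server_msi_notify(MsiModel *m, unsigned vector)
{
    if (vector_is_masked(m, vector)) {
        /* Masked: remember the event instead of triggering it. */
        m->pending_bits |= 1U << vector;
        return;
    }

    /* Unmasked: the real server would trigger the interrupt here. */
    m->triggered++;
}
```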

Thank you!
--
Jag

> 
>> +
>>     pci_set_word(dev->config + msi_flags_off(dev), flags);
>>     pci_set_word(dev->wmask + msi_flags_off(dev),
>>                  PCI_MSI_FLAGS_QSIZE | PCI_MSI_FLAGS_ENABLE);
>> @@ -307,7 +311,7 @@ bool msi_is_masked(const PCIDevice *dev, unsigned int vector)
>>     return mask & (1U << vector);
>> }
>> 
>> -void msi_notify(PCIDevice *dev, unsigned int vector)
>> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector)
>> {
>>     uint16_t flags = pci_get_word(dev->config + msi_flags_off(dev));
>>     bool msi64bit = flags & PCI_MSI_FLAGS_64BIT;
>> @@ -332,6 +336,13 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
>>     msi_send_message(dev, msg);
>> }
>> 
>> +void msi_notify(PCIDevice *dev, unsigned int vector)
>> +{
>> +    if (dev->msi_notify) {
> 
> Can this ever be NULL?
> 
>> +        dev->msi_notify(dev, vector);
>> +    }
>> +}
>> +
>> void msi_send_message(PCIDevice *dev, MSIMessage msg)
>> {
>>     MemTxAttrs attrs = {};
>> diff --git a/hw/pci/msix.c b/hw/pci/msix.c
>> index ae9331cd0b..1c71e67f53 100644
>> --- a/hw/pci/msix.c
>> +++ b/hw/pci/msix.c
>> @@ -31,6 +31,8 @@
>> #define MSIX_ENABLE_MASK (PCI_MSIX_FLAGS_ENABLE >> 8)
>> #define MSIX_MASKALL_MASK (PCI_MSIX_FLAGS_MASKALL >> 8)
>> 
>> +static void pci_msix_notify(PCIDevice *dev, unsigned vector);
>> +
>> MSIMessage msix_get_message(PCIDevice *dev, unsigned vector)
>> {
>>     uint8_t *table_entry = dev->msix_table + vector * PCI_MSIX_ENTRY_SIZE;
>> @@ -334,6 +336,7 @@ int msix_init(struct PCIDevice *dev, unsigned short nentries,
>>     dev->msix_table = g_malloc0(table_size);
>>     dev->msix_pba = g_malloc0(pba_size);
>>     dev->msix_entry_used = g_malloc0(nentries * sizeof *dev->msix_entry_used);
>> +    dev->msix_notify = pci_msix_notify;
>> 
>>     msix_mask_all(dev, nentries);
>> 
>> @@ -485,7 +488,7 @@ int msix_enabled(PCIDevice *dev)
>> }
>> 
>> /* Send an MSI-X message */
>> -void msix_notify(PCIDevice *dev, unsigned vector)
>> +static void pci_msix_notify(PCIDevice *dev, unsigned vector)
>> {
>>     MSIMessage msg;
>> 
>> @@ -503,6 +506,13 @@ void msix_notify(PCIDevice *dev, unsigned vector)
>>     msi_send_message(dev, msg);
>> }
>> 
>> +void msix_notify(PCIDevice *dev, unsigned vector)
>> +{
>> +    if (dev->msix_notify) {
> 
> Can this ever be NULL?
> 
>> +        dev->msix_notify(dev, vector);
>> +    }
>> +}
>> +
>> void msix_reset(PCIDevice *dev)
>> {
>>     if (!msix_present(dev)) {
>> diff --git a/hw/remote/machine.c b/hw/remote/machine.c
>> index db4ae30710..a8b4a3aef3 100644
>> --- a/hw/remote/machine.c
>> +++ b/hw/remote/machine.c
>> @@ -23,6 +23,7 @@
>> #include "hw/remote/iohub.h"
>> #include "hw/qdev-core.h"
>> #include "hw/remote/iommu.h"
>> +#include "hw/remote/vfio-user-obj.h"
>> 
>> static void remote_machine_init(MachineState *machine)
>> {
>> @@ -54,12 +55,14 @@ static void remote_machine_init(MachineState *machine)
>> 
>>     if (s->vfio_user) {
>>         remote_configure_iommu(pci_host->bus);
>> -    }
>> 
>> -    remote_iohub_init(&s->iohub);
>> +        vfu_object_set_bus_irq(pci_host->bus);
>> +    } else {
>> +        remote_iohub_init(&s->iohub);
>> 
>> -    pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
>> -                 &s->iohub, REMOTE_IOHUB_NB_PIRQS);
>> +        pci_bus_irqs(pci_host->bus, remote_iohub_set_irq, remote_iohub_map_irq,
>> +                     &s->iohub, REMOTE_IOHUB_NB_PIRQS);
>> +    }
>> 
>>     qbus_set_hotplug_handler(BUS(pci_host->bus), OBJECT(s));
>> }
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> index 2feabd06a4..d79bab87f1 100644
>> --- a/hw/remote/vfio-user-obj.c
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -54,6 +54,9 @@
>> #include "hw/pci/pci.h"
>> #include "qemu/timer.h"
>> #include "exec/memory.h"
>> +#include "hw/pci/msi.h"
>> +#include "hw/pci/msix.h"
>> +#include "hw/remote/vfio-user-obj.h"
>> 
>> #define TYPE_VFU_OBJECT "x-vfio-user-server"
>> OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
>> @@ -107,6 +110,10 @@ struct VfuObject {
>>     int vfu_poll_fd;
>> };
>> 
>> +static GHashTable *vfu_object_bdf_to_ctx_table;
> 
> I suggest adding a void *msi_notify_opaque field to PCIDevice and
> passing the value as an argument to ->msi_notify(). vfio-user-obj.c can
> set the value to vfu_ctx and eliminate the vfu_object_bdf_to_ctx_table
> hash table.
> 
> This simplifies the code, makes it faster, and solves race conditions
> during hot plug/unplug if other instances are running in IOThreads.
> 
>> +
>> +#define INT2VOIDP(i) (void *)(uintptr_t)(i)
>> +
>> static void vfu_object_init_ctx(VfuObject *o, Error **errp);
>> 
>> static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
>> @@ -463,6 +470,86 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
>>     }
>> }
>> 
>> +static void vfu_object_irq_trigger(int pci_bdf, unsigned vector)
>> +{
>> +    vfu_ctx_t *vfu_ctx = NULL;
>> +
>> +    if (!vfu_object_bdf_to_ctx_table) {
> 
> Can this ever be NULL?
> 
>> +        return;
>> +    }
>> +
>> +    vfu_ctx = g_hash_table_lookup(vfu_object_bdf_to_ctx_table,
>> +                                  INT2VOIDP(pci_bdf));
>> +
>> +    if (vfu_ctx) {
>> +        vfu_irq_trigger(vfu_ctx, vector);
>> +    }
>> +}
>> +
>> +static int vfu_object_map_irq(PCIDevice *pci_dev, int intx)
>> +{
>> +    int pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)),
>> +                                pci_dev->devfn);
>> +
>> +    return pci_bdf;
>> +}
>> +
>> +static void vfu_object_set_irq(void *opaque, int pirq, int level)
>> +{
>> +    if (level) {
>> +        vfu_object_irq_trigger(pirq, 0);
>> +    }
>> +}
>> +
>> +static void vfu_object_msi_notify(PCIDevice *pci_dev, unsigned vector)
>> +{
>> +    int pci_bdf;
>> +
>> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
>> +
>> +    vfu_object_irq_trigger(pci_bdf, vector);
>> +}
>> +
>> +static int vfu_object_setup_irqs(VfuObject *o, PCIDevice *pci_dev)
>> +{
>> +    vfu_ctx_t *vfu_ctx = o->vfu_ctx;
>> +    int ret, pci_bdf;
>> +
>> +    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    ret = 0;
>> +    if (msix_nr_vectors_allocated(pci_dev)) {
>> +        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSIX_IRQ,
>> +                                       msix_nr_vectors_allocated(pci_dev));
>> +
>> +        pci_dev->msix_notify = vfu_object_msi_notify;
>> +    } else if (msi_nr_vectors_allocated(pci_dev)) {
>> +        ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_MSI_IRQ,
>> +                                       msi_nr_vectors_allocated(pci_dev));
>> +
>> +        pci_dev->msi_notify = vfu_object_msi_notify;
>> +    }
>> +
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(pci_dev)), pci_dev->devfn);
>> +
>> +    g_hash_table_insert(vfu_object_bdf_to_ctx_table, INT2VOIDP(pci_bdf),
>> +                        o->vfu_ctx);
>> +
>> +    return 0;
>> +}
>> +
>> +void vfu_object_set_bus_irq(PCIBus *pci_bus)
>> +{
>> +    pci_bus_irqs(pci_bus, vfu_object_set_irq, vfu_object_map_irq, NULL, 1);
>> +}
>> +
>> /*
>>  * TYPE_VFU_OBJECT depends on the availability of the 'socket' and 'device'
>>  * properties. It also depends on devices instantiated in QEMU. These
>> @@ -559,6 +646,13 @@ static void vfu_object_init_ctx(VfuObject *o, Error **errp)
>> 
>>     vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
>> 
>> +    ret = vfu_object_setup_irqs(o, o->pci_dev);
>> +    if (ret < 0) {
>> +        error_setg(errp, "vfu: Failed to setup interrupts for %s",
>> +                   o->device);
>> +        goto fail;
>> +    }
>> +
>>     ret = vfu_realize_ctx(o->vfu_ctx);
>>     if (ret < 0) {
>>         error_setg(errp, "vfu: Failed to realize device %s- %s",
>> @@ -612,6 +706,7 @@ static void vfu_object_finalize(Object *obj)
>> {
>>     VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
>>     VfuObject *o = VFU_OBJECT(obj);
>> +    int pci_bdf;
>> 
>>     k->nr_devs--;
>> 
>> @@ -638,9 +733,17 @@ static void vfu_object_finalize(Object *obj)
>>         o->unplug_blocker = NULL;
>>     }
>> 
>> +    if (o->pci_dev) {
>> +        pci_bdf = PCI_BUILD_BDF(pci_bus_num(pci_get_bus(o->pci_dev)),
>> +                                o->pci_dev->devfn);
>> +        g_hash_table_remove(vfu_object_bdf_to_ctx_table, INT2VOIDP(pci_bdf));
>> +    }
>> +
>>     o->pci_dev = NULL;
>> 
>>     if (!k->nr_devs && k->auto_shutdown) {
>> +        g_hash_table_destroy(vfu_object_bdf_to_ctx_table);
>> +        vfu_object_bdf_to_ctx_table = NULL;
>>         qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
>>     }
>> 
>> @@ -658,6 +761,10 @@ static void vfu_object_class_init(ObjectClass *klass, void *data)
>> 
>>     k->auto_shutdown = true;
>> 
>> +    msi_nonbroken = true;
> 
> This should go in hw/remote/machine.c. It's a global variable related to
> the machine's interrupt controller capabilities. The value is not
> related to vfu_object_class_init(), which will be called by any QEMU
> binary that links hw/remote/vfio-user-obj.o regardless of which machine
> type is instantiated.
> 
>> +
>> +    vfu_object_bdf_to_ctx_table = g_hash_table_new_full(NULL, NULL, NULL, NULL);
>> +
>>     object_class_property_add(klass, "socket", "SocketAddress", NULL,
>>                               vfu_object_set_socket, NULL, NULL);
>>     object_class_property_set_description(klass, "socket",
>> diff --git a/stubs/vfio-user-obj.c b/stubs/vfio-user-obj.c
>> new file mode 100644
>> index 0000000000..79100d768e
>> --- /dev/null
>> +++ b/stubs/vfio-user-obj.c
>> @@ -0,0 +1,6 @@
>> +#include "qemu/osdep.h"
>> +#include "hw/remote/vfio-user-obj.h"
>> +
>> +void vfu_object_set_bus_irq(PCIBus *pci_bus)
>> +{
>> +}
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index f47232c78c..e274cb46af 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3569,6 +3569,7 @@ F: hw/remote/iohub.c
>> F: include/hw/remote/iohub.h
>> F: subprojects/libvfio-user
>> F: hw/remote/vfio-user-obj.c
>> +F: include/hw/remote/vfio-user-obj.h
>> F: hw/remote/iommu.c
>> F: include/hw/remote/iommu.h
>> 
>> diff --git a/hw/remote/trace-events b/hw/remote/trace-events
>> index 847d50d88f..c167b3c7a5 100644
>> --- a/hw/remote/trace-events
>> +++ b/hw/remote/trace-events
>> @@ -12,3 +12,4 @@ vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
>> vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
>> vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
>> vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
>> +vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
>> diff --git a/stubs/meson.build b/stubs/meson.build
>> index d359cbe1ad..c5ce979dc3 100644
>> --- a/stubs/meson.build
>> +++ b/stubs/meson.build
>> @@ -57,3 +57,4 @@ if have_system
>> else
>>   stub_ss.add(files('qdev.c'))
>> endif
>> +stub_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_false: files('vfio-user-obj.c'))
>> -- 
>> 2.20.1
>> 



* Re: [PATCH v6 15/19] vfio-user: handle device interrupts
  2022-03-26 23:47     ` Jag Raman
@ 2022-03-29 14:24       ` Stefan Hajnoczi
  2022-03-29 19:06         ` Jag Raman
  0 siblings, 1 reply; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-03-29 14:24 UTC (permalink / raw)
  To: Jag Raman, Alex Williamson
  Cc: eduardo, Elena Ufimtseva, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, john.levon, Michael S. Tsirkin, Markus Armbruster,
	Juan Quintela, Philippe Mathieu-Daudé,
	qemu-devel, Kanth Ghatraju, thanos.makatos, Paolo Bonzini,
	Eric Blake, dgilbert

On Sat, Mar 26, 2022 at 11:47:36PM +0000, Jag Raman wrote:
> 
> 
> > On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
> >> +    dev->msi_notify = pci_msi_notify;
> > 
> > Are you sure it's correct to skip the msi_is_masked() logic? I think the
> > callback function should only override the behavior of
> > msi_send_message(), not the entire msi_notify() function.
> > 
> > The same applies to MSI-X.
> 
> Hi Stefan,
> 
> We noticed that the client is handling the masking and unmasking of MSIx
> interrupts.
> 
> Concerning MSIx, vfio_msix_vector_use() handles unmasking and
> vfio_msix_vector_release() handles masking operations. The server triggers
> an MSIx interrupt by signaling the eventfd associated with the vector. If the vector
> is unmasked, the interrupt bypasses the client/QEMU and takes this
> path: “server -> KVM -> guest”. Whereas, if the vector is masked, it lands on the
> client via: “server -> vfio_msi_interrupt()”. vfio_msi_interrupt() suppresses the
> interrupt if the vector is masked. The use and release functions switch the server’s
> eventfd between VFIOPCIDevice->VFIOMSIVector[i]->kvm_interrupt and
> VFIOPCIDevice->VFIOMSIVector[i]->interrupt using the
> VFIO_DEVICE_SET_IRQS message.
> 
> Concerning MSI, the server should check if the vector is unmasked before
> triggering. The server is not doing it presently, will update it. For some reason,
> I had assumed that MSI handling is similar to MSIx in terms of masking - sorry
> about that. The masking and unmasking information for MSI is in the config space
> registers, so the server should have this information.
> 
> You had previously suggested using callbacks for msi_get_message &
> msi_send_message, considering the masking issue. Given MSIx masking
> (including the MSIx table BAR) is handled at the client, the masking information
> doesn’t reach the server - so msix_notify will never invoke the
> msi_send_message callback - all the vectors remain masked at the server
> end (msix_init() -> msix_mask_all()).

I was expecting vfio-user devices to be involved in MSI-X masking so
they can implement the Pending bit semantics described in the spec:

  If a masked vector has its Pending bit Set, and the associated
  underlying interrupt events are somehow satisfied (usually by software
  though the exact manner is Function-specific), the Function must Clear
  the Pending bit, to avoid sending a spurious interrupt message later
  when software unmasks the vector.

Does QEMU VFIO support this?

What happens when a hw/net/e1000e_core.c vfio-user device uses
msix_clr_pending() and related APIs?

Also, having the vfio-user daemon write to the eventfd while the vector
is masked is a waste of CPU cycles. The PCIe spec describes using MSI-X
masking for poll mode operation and going via eventfd is suboptimal:

  Software is permitted to mask one or more vectors indefinitely, and
  service their associated interrupt events strictly based on polling
  their Pending bits. A Function must Set and Clear its Pending bits as
  necessary to support this “pure polling” mode of operation.

Maybe the answer is these issues don't matter in practice because MSI-X
masking is not used much?
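
To make the Pending bit semantics concrete, here is a small model of what
the spec text above asks of a Function. It is a sketch, not QEMU or
libvfio-user code: MsixModel and its helpers are hypothetical names, and
"sending a message" is reduced to incrementing a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_VECTORS 8

typedef struct {
    uint32_t masked;            /* bit i set: vector i is masked */
    uint32_t pending;           /* bit i set: vector i has Pending set */
    unsigned sent[NR_VECTORS];  /* messages sent per vector */
} MsixModel;

/* The device has an interrupt event for vector v. */
static void raise_event(MsixModel *m, unsigned v)
{
    if (m->masked & (1U << v)) {
        m->pending |= 1U << v;  /* masked: record it, send nothing */
        return;
    }
    m->sent[v]++;
}

/* Software serviced the event while the vector was still masked. */
static void event_satisfied(MsixModel *m, unsigned v)
{
    m->pending &= ~(1U << v);   /* avoids a spurious message on unmask */
}

static void set_mask(MsixModel *m, unsigned v, bool masked)
{
    if (masked) {
        m->masked |= 1U << v;
        return;
    }

    m->masked &= ~(1U << v);
    if (m->pending & (1U << v)) {
        m->pending &= ~(1U << v);
        m->sent[v]++;           /* deliver the deferred message */
    }
}

/* "Pure polling": software reads the Pending bit instead of unmasking. */
static bool poll_pending(const MsixModel *m, unsigned v)
{
    return m->pending & (1U << v);
}
```

In this model a masked vector costs nothing beyond setting a bit, which is
the contrast with writing to an eventfd for every event while masked.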

Stefan


* Re: [PATCH v6 15/19] vfio-user: handle device interrupts
  2022-03-29 14:24       ` Stefan Hajnoczi
@ 2022-03-29 19:06         ` Jag Raman
  2022-03-30  9:40           ` Thanos Makatos
  0 siblings, 1 reply; 76+ messages in thread
From: Jag Raman @ 2022-03-29 19:06 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, john.levon, Michael S. Tsirkin, Markus Armbruster,
	Juan Quintela, Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, thanos.makatos,
	Paolo Bonzini, Eric Blake, dgilbert



> On Mar 29, 2022, at 10:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Sat, Mar 26, 2022 at 11:47:36PM +0000, Jag Raman wrote:
>> 
>> 
>>> On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>>> 
>>> On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
>>>> +    dev->msi_notify = pci_msi_notify;
>>> 
>>> Are you sure it's correct to skip the msi_is_masked() logic? I think the
>>> callback function should only override the behavior of
>>> msi_send_message(), not the entire msi_notify() function.
>>> 
>>> The same applies to MSI-X.
>> 
>> Hi Stefan,
>> 
>> We noticed that the client is handling the masking and unmasking of MSIx
>> interrupts.
>> 
>> Concerning MSIx, vfio_msix_vector_use() handles unmasking and
>> vfio_msix_vector_release() handles masking operations. The server triggers
>> an MSIx interrupt by signaling the eventfd associated with the vector. If the vector
>> is unmasked, the interrupt bypasses the client/QEMU and takes this
>> path: “server -> KVM -> guest”. Whereas, if the vector is masked, it lands on the
>> client via: “server -> vfio_msi_interrupt()”. vfio_msi_interrupt() suppresses the
>> interrupt if the vector is masked. The use and release functions switch the server’s
>> eventfd between VFIOPCIDevice->VFIOMSIVector[i]->kvm_interrupt and
>> VFIOPCIDevice->VFIOMSIVector[i]->interrupt using the
>> VFIO_DEVICE_SET_IRQS message.
>> 
>> Concerning MSI, the server should check if the vector is unmasked before
>> triggering. The server is not doing it presently, will update it. For some reason,
>> I had assumed that MSI handling is similar to MSIx in terms of masking - sorry
>> about that. The masking and unmasking information for MSI is in the config space
>> registers, so the server should have this information.
>> 
>> You had previously suggested using callbacks for msi_get_message &
>> msi_send_message, considering the masking issue. Given MSIx masking
>> (including the MSIx table BAR) is handled at the client, the masking information
>> doesn’t reach the server - so msix_notify will never invoke the
>> msi_send_message callback - all the vectors remain masked at the server
>> end (msix_init() -> msix_mask_all()).
> 
> I was expecting vfio-user devices to be involved in MSI-X masking so
> they can implement the Pending bit semantics described in the spec:
> 
>  If a masked vector has its Pending bit Set, and the associated
>  underlying interrupt events are somehow satisfied (usually by software
>  though the exact manner is Function-specific), the Function must Clear
>  the Pending bit, to avoid sending a spurious interrupt message later
>  when software unmasks the vector.
> 
> Does QEMU VFIO support this?

QEMU VFIO doesn’t seem to support it - I couldn’t find a place where
an assigned/passthru PCI device clears the pending bits in QEMU.
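
To make the Pending bit semantics from the spec excerpt concrete, here
is a small model, assuming a function with up to 32 vectors. It is a
sketch of the behaviour the spec asks for, not of what QEMU VFIO or
libvfio-user does today; struct sketch_msix and all sketch_* names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NVEC 32

/* Hypothetical model of a function's MSI-X state; not a QEMU structure. */
struct sketch_msix {
    uint32_t masked;            /* bit i set: vector i is masked */
    uint32_t pending;           /* bit i set: Pending bit of vector i */
    unsigned sent[SKETCH_NVEC]; /* messages delivered per vector */
};

/* The device raises an interrupt event on 'vector'. */
static void sketch_notify(struct sketch_msix *s, unsigned vector)
{
    assert(vector < SKETCH_NVEC);
    if (s->masked & (1u << vector)) {
        s->pending |= 1u << vector;   /* masked: latch, do not send */
    } else {
        s->sent[vector]++;            /* unmasked: deliver immediately */
    }
}

/* The underlying event was serviced (e.g. by polling) while masked. */
static void sketch_event_satisfied(struct sketch_msix *s, unsigned vector)
{
    assert(vector < SKETCH_NVEC);
    s->pending &= ~(1u << vector);    /* avoid a spurious message later */
}

/* Guest software sets or clears the mask bit of 'vector'. */
static void sketch_set_mask(struct sketch_msix *s, unsigned vector, bool mask)
{
    assert(vector < SKETCH_NVEC);
    if (mask) {
        s->masked |= 1u << vector;
        return;
    }
    s->masked &= ~(1u << vector);
    if (s->pending & (1u << vector)) {
        s->pending &= ~(1u << vector);
        s->sent[vector]++;            /* deliver the latched event once */
    }
}
```

In this model, an event that is raised and then satisfied while the
vector is masked produces no message on unmask, whereas an event that
is still pending at unmask time produces exactly one.
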

> 
> What happens when a hw/net/e1000e_core.c vfio-user device uses
> msix_clr_pending() and related APIs?
> 
> Also, having the vfio-user daemon write to the eventfd while the vector
> is masked is a waste of CPU cycles. The PCIe spec describes using MSI-X
> masking for poll mode operation and going via eventfd is suboptimal:
> 
>  Software is permitted to mask one or more vectors indefinitely, and
>  service their associated interrupt events strictly based on polling
>  their Pending bits. A Function must Set and Clear its Pending bits as
>  necessary to support this “pure polling” mode of operation.
> 
> Maybe the answer is these issues don't matter in practice because MSI-X
> masking is not used much?

From what I can tell, “pure polling” is used by ivshmem and virtio-pci devices in QEMU.

e1000e doesn’t use “pure polling”, but it does clear pending interrupts.
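
For reference on the eventfd path discussed above: each trigger is one
8-byte write(2) to the vector's eventfd, and the kernel adds the value
to a counter until a reader consumes it. A minimal Linux-only sketch,
with hypothetical sketch_* helper names, assuming the fd was created
with eventfd(0, EFD_NONBLOCK):

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/* Signal one interrupt event on the vector's eventfd; 0 on success. */
static int sketch_trigger(int fd)
{
    uint64_t one = 1;

    assert(fd >= 0);
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Drain the eventfd the way a userspace handler would. Returns the
 * number of events accumulated since the last read, or 0 if the
 * counter was empty (the non-blocking read fails in that case).
 */
static uint64_t sketch_drain(int fd)
{
    uint64_t count = 0;

    assert(fd >= 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

Two triggers followed by one drain yield a count of 2, so signalling a
masked vector still costs one system call per event on the server side
even if the client then suppresses it.
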

John Johnson, John Levon & Thanos,

    Any thoughts?

Thank you!
--
Jag

> 
> Stefan



* RE: [PATCH v6 15/19] vfio-user: handle device interrupts
  2022-03-29 19:06         ` Jag Raman
@ 2022-03-30  9:40           ` Thanos Makatos
  2022-04-04  9:44             ` Stefan Hajnoczi
  0 siblings, 1 reply; 76+ messages in thread
From: Thanos Makatos @ 2022-03-30  9:40 UTC (permalink / raw)
  To: Jag Raman, Stefan Hajnoczi
  Cc: eduardo, Elena Ufimtseva, John Johnson, Daniel P. Berrangé,
	Beraldo Leal, John Levon, Michael S. Tsirkin, Markus Armbruster,
	Juan Quintela, Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju, Paolo Bonzini,
	Eric Blake, dgilbert

> -----Original Message-----
> From: Jag Raman <jag.raman@oracle.com>
> Sent: 29 March 2022 20:07
> To: Stefan Hajnoczi <stefanha@redhat.com>
> Cc: Alex Williamson <alex.williamson@redhat.com>;
> qemu-devel <qemu-devel@nongnu.org>; Michael S. Tsirkin <mst@redhat.com>;
> Philippe Mathieu-Daudé <f4bug@amsat.org>; Paolo Bonzini <pbonzini@redhat.com>;
> Beraldo Leal <bleal@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>;
> eduardo@habkost.net; Marcel Apfelbaum <marcel.apfelbaum@gmail.com>;
> Eric Blake <eblake@redhat.com>; Markus Armbruster <armbru@redhat.com>;
> Juan Quintela <quintela@redhat.com>; dgilbert@redhat.com;
> John Levon <john.levon@nutanix.com>;
> Thanos Makatos <thanos.makatos@nutanix.com>;
> Elena Ufimtseva <elena.ufimtseva@oracle.com>;
> John Johnson <john.g.johnson@oracle.com>;
> Kanth Ghatraju <kanth.ghatraju@oracle.com>
> Subject: Re: [PATCH v6 15/19] vfio-user: handle device interrupts
> 
> 
> 
> > On Mar 29, 2022, at 10:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >
> > On Sat, Mar 26, 2022 at 11:47:36PM +0000, Jag Raman wrote:
> >>
> >>
> >>> On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >>>
> >>> On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
> >>>> Forward remote device's interrupts to the guest
> >>>>
> >>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> >>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> >>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> >>>> ---
> >>>> include/hw/pci/pci.h              |   6 ++
> >>>> include/hw/remote/vfio-user-obj.h |   6 ++
> >>>> hw/pci/msi.c                      |  13 +++-
> >>>> hw/pci/msix.c                     |  12 +++-
> >>>> hw/remote/machine.c               |  11 +--
> >>>> hw/remote/vfio-user-obj.c         | 107 ++++++++++++++++++++++++++++++
> >>>> stubs/vfio-user-obj.c             |   6 ++
> >>>> MAINTAINERS                       |   1 +
> >>>> hw/remote/trace-events            |   1 +
> >>>> stubs/meson.build                 |   1 +
> >>>> 10 files changed, 158 insertions(+), 6 deletions(-)
> >>>> create mode 100644 include/hw/remote/vfio-user-obj.h
> >>>> create mode 100644 stubs/vfio-user-obj.c
> >>>>
> >>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> >>>> index c3f3c90473..d42d526a48 100644
> >>>> --- a/include/hw/pci/pci.h
> >>>> +++ b/include/hw/pci/pci.h
> >>>> @@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
> >>>> typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
> >>>>                                pcibus_t addr, pcibus_t size, int type);
> >>>> typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
> >>>> +typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
> >>>> +typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
> >>>>
> >>>> typedef struct PCIIORegion {
> >>>>    pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
> >>>> @@ -323,6 +325,10 @@ struct PCIDevice {
> >>>>    /* Space to store MSIX table & pending bit array */
> >>>>    uint8_t *msix_table;
> >>>>    uint8_t *msix_pba;
> >>>> +
> >>>> +    PCIMSINotify *msi_notify;
> >>>> +    PCIMSIxNotify *msix_notify;
> >>>> +
> >>>>    /* MemoryRegion container for msix exclusive BAR setup */
> >>>>    MemoryRegion msix_exclusive_bar;
> >>>>    /* Memory Regions for MSIX table and pending bit entries. */
> >>>> diff --git a/include/hw/remote/vfio-user-obj.h b/include/hw/remote/vfio-user-obj.h
> >>>> new file mode 100644
> >>>> index 0000000000..87ab78b875
> >>>> --- /dev/null
> >>>> +++ b/include/hw/remote/vfio-user-obj.h
> >>>> @@ -0,0 +1,6 @@
> >>>> +#ifndef VFIO_USER_OBJ_H
> >>>> +#define VFIO_USER_OBJ_H
> >>>> +
> >>>> +void vfu_object_set_bus_irq(PCIBus *pci_bus);
> >>>> +
> >>>> +#endif
> >>>> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
> >>>> index 47d2b0f33c..93f5e400cc 100644
> >>>> --- a/hw/pci/msi.c
> >>>> +++ b/hw/pci/msi.c
> >>>> @@ -51,6 +51,8 @@
> >>>> */
> >>>> bool msi_nonbroken;
> >>>>
> >>>> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
> >>>> +
> >>>> /* If we get rid of cap allocator, we won't need this. */
> >>>> static inline uint8_t msi_cap_sizeof(uint16_t flags)
> >>>> {
> >>>> @@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
> >>>>    dev->msi_cap = config_offset;
> >>>>    dev->cap_present |= QEMU_PCI_CAP_MSI;
> >>>>
> >>>> +    dev->msi_notify = pci_msi_notify;
> >>>
> >>> Are you sure it's correct to skip the msi_is_masked() logic? I think the
> >>> callback function should only override the behavior of
> >>> msi_send_message(), not the entire msi_notify() function.
> >>>
> >>> The same applies to MSI-X.
> >>
> >> Hi Stefan,
> >>
> >> We noticed that the client is handling the masking and unmasking of MSIx
> >> interrupts.
> >>
> >> Concerning MSIx, vfio_msix_vector_use() handles unmasking and
> >> vfio_msix_vector_release() handles masking operations. The server triggers
> >> an MSIx interrupt by signaling the eventfd associated with the vector. If the
> >> vector is unmasked, the interrupt bypasses the client/QEMU and takes this
> >> path: “server -> KVM -> guest”. Whereas, if the vector is masked, it lands on
> >> the client via: “server -> vfio_msi_interrupt()”. vfio_msi_interrupt()
> >> suppresses the interrupt if the vector is masked. The use and release
> >> functions switch the server’s eventfd between
> >> VFIOPCIDevice->VFIOMSIVector[i]->kvm_interrupt and
> >> VFIOPCIDevice->VFIOMSIVector[i]->interrupt using the
> >> VFIO_DEVICE_SET_IRQS message.
> >>
> >> Concerning MSI, the server should check if the vector is unmasked before
> >> triggering. The server is not doing it presently, will update it. For some
> >> reason, I had assumed that MSI handling is similar to MSIx in terms of
> >> masking - sorry about that. The masking and unmasking information for MSI is
> >> in the config space registers, so the server should have this information.
> >>
> >> You had previously suggested using callbacks for msi_get_message &
> >> msi_send_message, considering the masking issue. Given MSIx masking
> >> (including the MSIx table BAR) is handled at the client, the masking
> >> information doesn’t reach the server - so msix_notify will never invoke the
> >> msi_send_message callback - all the vectors remain masked at the server
> >> end (msix_init() -> msix_mask_all()).
> >
> > I was expecting vfio-user devices to be involved in MSI-X masking so

libvfio-user can't be involved in the first place since QEMU emulates MSI/X:
https://lore.kernel.org/all/20200121101911.64701afd@w520.home/T/

> > they can implement the Pending bit semantics described in the spec:
> >
> >  If a masked vector has its Pending bit Set, and the associated
> >  underlying interrupt events are somehow satisfied (usually by software
> >  though the exact manner is Function-specific), the Function must Clear
> >  the Pending bit, to avoid sending a spurious interrupt message later
> >  when software unmasks the vector.
> >
> > Does QEMU VFIO support this?
> 
> QEMU VFIO doesn’t seem to support it - I couldn’t find a place where
> an assigned/passthru PCI device clears the pending bits in QEMU.
> 
> >
> > What happens when a hw/net/e1000e_core.c vfio-user device uses
> > msix_clr_pending() and related APIs?
> >
> > Also, having the vfio-user daemon write to the eventfd while the vector
> > is masked is a waste of CPU cycles. The PCIe spec describes using MSI-X
> > masking for poll mode operation and going via eventfd is suboptimal:
> >
> >  Software is permitted to mask one or more vectors indefinitely, and
> >  service their associated interrupt events strictly based on polling
> >  their Pending bits. A Function must Set and Clear its Pending bits as
> >  necessary to support this “pure polling” mode of operation.
> >
> > Maybe the answer is these issues don't matter in practice because MSI-X
> > masking is not used much?
> 
> From what I can tell, “pure polling” is used by ivshmem and virtio-pci devices in
> QEMU.
> 
> e1000e doesn’t use “pure polling”, but it does clear pending interrupts.
> 
> John Johnson, John Levon & Thanos,
> 
>     Any thoughts?

If QEMU stops emulating MSI/X, then libvfio-user would have to do it.


* Re: [PATCH v6 15/19] vfio-user: handle device interrupts
  2022-03-30  9:40           ` Thanos Makatos
@ 2022-04-04  9:44             ` Stefan Hajnoczi
  0 siblings, 0 replies; 76+ messages in thread
From: Stefan Hajnoczi @ 2022-04-04  9:44 UTC (permalink / raw)
  To: Thanos Makatos
  Cc: eduardo, Elena Ufimtseva, John Johnson, Jag Raman, Beraldo Leal,
	John Levon, Michael S. Tsirkin, Markus Armbruster, Juan Quintela,
	Philippe Mathieu-Daudé,
	qemu-devel, Alex Williamson, Kanth Ghatraju,
	Daniel P. Berrangé,
	Paolo Bonzini, Eric Blake, dgilbert


On Wed, Mar 30, 2022 at 09:40:42AM +0000, Thanos Makatos wrote:
> > -----Original Message-----
> > From: Jag Raman <jag.raman@oracle.com>
> > Sent: 29 March 2022 20:07
> > To: Stefan Hajnoczi <stefanha@redhat.com>
> > Cc: Alex Williamson <alex.williamson@redhat.com>;
> > qemu-devel <qemu-devel@nongnu.org>; Michael S. Tsirkin <mst@redhat.com>;
> > Philippe Mathieu-Daudé <f4bug@amsat.org>; Paolo Bonzini <pbonzini@redhat.com>;
> > Beraldo Leal <bleal@redhat.com>; Daniel P. Berrangé <berrange@redhat.com>;
> > eduardo@habkost.net; Marcel Apfelbaum <marcel.apfelbaum@gmail.com>;
> > Eric Blake <eblake@redhat.com>; Markus Armbruster <armbru@redhat.com>;
> > Juan Quintela <quintela@redhat.com>; dgilbert@redhat.com;
> > John Levon <john.levon@nutanix.com>;
> > Thanos Makatos <thanos.makatos@nutanix.com>;
> > Elena Ufimtseva <elena.ufimtseva@oracle.com>;
> > John Johnson <john.g.johnson@oracle.com>;
> > Kanth Ghatraju <kanth.ghatraju@oracle.com>
> > Subject: Re: [PATCH v6 15/19] vfio-user: handle device interrupts
> > 
> > 
> > 
> > > On Mar 29, 2022, at 10:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >
> > > On Sat, Mar 26, 2022 at 11:47:36PM +0000, Jag Raman wrote:
> > >>
> > >>
> > >>> On Mar 7, 2022, at 5:24 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > >>>
> > >>> On Thu, Feb 17, 2022 at 02:49:02AM -0500, Jagannathan Raman wrote:
> > >>>> Forward remote device's interrupts to the guest
> > >>>>
> > >>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > >>>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> > >>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > >>>> ---
> > >>>> include/hw/pci/pci.h              |   6 ++
> > >>>> include/hw/remote/vfio-user-obj.h |   6 ++
> > >>>> hw/pci/msi.c                      |  13 +++-
> > >>>> hw/pci/msix.c                     |  12 +++-
> > >>>> hw/remote/machine.c               |  11 +--
> > >>>> hw/remote/vfio-user-obj.c         | 107 ++++++++++++++++++++++++++++++
> > >>>> stubs/vfio-user-obj.c             |   6 ++
> > >>>> MAINTAINERS                       |   1 +
> > >>>> hw/remote/trace-events            |   1 +
> > >>>> stubs/meson.build                 |   1 +
> > >>>> 10 files changed, 158 insertions(+), 6 deletions(-)
> > >>>> create mode 100644 include/hw/remote/vfio-user-obj.h
> > >>>> create mode 100644 stubs/vfio-user-obj.c
> > >>>>
> > >>>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > >>>> index c3f3c90473..d42d526a48 100644
> > >>>> --- a/include/hw/pci/pci.h
> > >>>> +++ b/include/hw/pci/pci.h
> > >>>> @@ -129,6 +129,8 @@ typedef uint32_t PCIConfigReadFunc(PCIDevice *pci_dev,
> > >>>> typedef void PCIMapIORegionFunc(PCIDevice *pci_dev, int region_num,
> > >>>>                                pcibus_t addr, pcibus_t size, int type);
> > >>>> typedef void PCIUnregisterFunc(PCIDevice *pci_dev);
> > >>>> +typedef void PCIMSINotify(PCIDevice *pci_dev, unsigned vector);
> > >>>> +typedef void PCIMSIxNotify(PCIDevice *pci_dev, unsigned vector);
> > >>>>
> > >>>> typedef struct PCIIORegion {
> > >>>>    pcibus_t addr; /* current PCI mapping address. -1 means not mapped */
> > >>>> @@ -323,6 +325,10 @@ struct PCIDevice {
> > >>>>    /* Space to store MSIX table & pending bit array */
> > >>>>    uint8_t *msix_table;
> > >>>>    uint8_t *msix_pba;
> > >>>> +
> > >>>> +    PCIMSINotify *msi_notify;
> > >>>> +    PCIMSIxNotify *msix_notify;
> > >>>> +
> > >>>>    /* MemoryRegion container for msix exclusive BAR setup */
> > >>>>    MemoryRegion msix_exclusive_bar;
> > >>>>    /* Memory Regions for MSIX table and pending bit entries. */
> > >>>> diff --git a/include/hw/remote/vfio-user-obj.h b/include/hw/remote/vfio-user-obj.h
> > >>>> new file mode 100644
> > >>>> index 0000000000..87ab78b875
> > >>>> --- /dev/null
> > >>>> +++ b/include/hw/remote/vfio-user-obj.h
> > >>>> @@ -0,0 +1,6 @@
> > >>>> +#ifndef VFIO_USER_OBJ_H
> > >>>> +#define VFIO_USER_OBJ_H
> > >>>> +
> > >>>> +void vfu_object_set_bus_irq(PCIBus *pci_bus);
> > >>>> +
> > >>>> +#endif
> > >>>> diff --git a/hw/pci/msi.c b/hw/pci/msi.c
> > >>>> index 47d2b0f33c..93f5e400cc 100644
> > >>>> --- a/hw/pci/msi.c
> > >>>> +++ b/hw/pci/msi.c
> > >>>> @@ -51,6 +51,8 @@
> > >>>> */
> > >>>> bool msi_nonbroken;
> > >>>>
> > >>>> +static void pci_msi_notify(PCIDevice *dev, unsigned int vector);
> > >>>> +
> > >>>> /* If we get rid of cap allocator, we won't need this. */
> > >>>> static inline uint8_t msi_cap_sizeof(uint16_t flags)
> > >>>> {
> > >>>> @@ -225,6 +227,8 @@ int msi_init(struct PCIDevice *dev, uint8_t offset,
> > >>>>    dev->msi_cap = config_offset;
> > >>>>    dev->cap_present |= QEMU_PCI_CAP_MSI;
> > >>>>
> > >>>> +    dev->msi_notify = pci_msi_notify;
> > >>>
> > >>> Are you sure it's correct to skip the msi_is_masked() logic? I think the
> > >>> callback function should only override the behavior of
> > >>> msi_send_message(), not the entire msi_notify() function.
> > >>>
> > >>> The same applies to MSI-X.
> > >>
> > >> Hi Stefan,
> > >>
> > >> We noticed that the client is handling the masking and unmasking of MSIx
> > >> interrupts.
> > >>
> > >> Concerning MSIx, vfio_msix_vector_use() handles unmasking and
> > >> vfio_msix_vector_release() handles masking operations. The server triggers
> > >> an MSIx interrupt by signaling the eventfd associated with the vector. If the
> > >> vector is unmasked, the interrupt bypasses the client/QEMU and takes this
> > >> path: “server -> KVM -> guest”. Whereas, if the vector is masked, it lands on
> > >> the client via: “server -> vfio_msi_interrupt()”. vfio_msi_interrupt()
> > >> suppresses the interrupt if the vector is masked. The use and release
> > >> functions switch the server’s eventfd between
> > >> VFIOPCIDevice->VFIOMSIVector[i]->kvm_interrupt and
> > >> VFIOPCIDevice->VFIOMSIVector[i]->interrupt using the
> > >> VFIO_DEVICE_SET_IRQS message.
> > >>
> > >> Concerning MSI, the server should check if the vector is unmasked before
> > >> triggering. The server is not doing it presently, will update it. For some
> > >> reason, I had assumed that MSI handling is similar to MSIx in terms of
> > >> masking - sorry about that. The masking and unmasking information for MSI is
> > >> in the config space registers, so the server should have this information.
> > >>
> > >> You had previously suggested using callbacks for msi_get_message &
> > >> msi_send_message, considering the masking issue. Given MSIx masking
> > >> (including the MSIx table BAR) is handled at the client, the masking
> > >> information doesn’t reach the server - so msix_notify will never invoke the
> > >> msi_send_message callback - all the vectors remain masked at the server
> > >> end (msix_init() -> msix_mask_all()).
> > >
> > > I was expecting vfio-user devices to be involved in MSI-X masking so
> 
> libvfio-user can't be involved in the first place since QEMU emulates MSI/X:
> https://lore.kernel.org/all/20200121101911.64701afd@w520.home/T/
> 
> > > they can implement the Pending bit semantics described in the spec:
> > >
> > >  If a masked vector has its Pending bit Set, and the associated
> > >  underlying interrupt events are somehow satisfied (usually by software
> > >  though the exact manner is Function-specific), the Function must Clear
> > >  the Pending bit, to avoid sending a spurious interrupt message later
> > >  when software unmasks the vector.
> > >
> > > Does QEMU VFIO support this?
> > 
> > QEMU VFIO doesn’t seem to support it - I couldn’t find a place where
> > an assigned/passthru PCI device clears the pending bits in QEMU.
> > 
> > >
> > > What happens when a hw/net/e1000e_core.c vfio-user device uses
> > > msix_clr_pending() and related APIs?
> > >
> > > Also, having the vfio-user daemon write to the eventfd while the vector
> > > is masked is a waste of CPU cycles. The PCIe spec describes using MSI-X
> > > masking for poll mode operation and going via eventfd is suboptimal:
> > >
> > >  Software is permitted to mask one or more vectors indefinitely, and
> > >  service their associated interrupt events strictly based on polling
> > >  their Pending bits. A Function must Set and Clear its Pending bits as
> > >  necessary to support this “pure polling” mode of operation.
> > >
> > > Maybe the answer is these issues don't matter in practice because MSI-X
> > > masking is not used much?
> > 
> > From what I can tell, “pure polling” is used by ivshmem and virtio-pci devices in
> > QEMU.
> > 
> > e1000e doesn’t use “pure polling”, but it does clear pending interrupts.
> > 
> > John Johnson, John Levon & Thanos,
> > 
> >     Any thoughts?
> 
> If QEMU stops emulating MSI/X, then libvfio-user would have to do it.

If Alex is happy with VFIO not supporting the exact Pending bits
semantics in the PCIe spec, then vfio-user probably doesn't need to
either.

I just wanted to raise this design issue and maybe we can document
somewhere that this is intentional.

Stefan



end of thread, other threads:[~2022-04-04  9:48 UTC | newest]

Thread overview: 76+ messages
2022-02-17  7:48 [PATCH v6 00/19] vfio-user server in QEMU Jagannathan Raman
2022-02-17  7:48 ` [PATCH v6 01/19] configure, meson: override C compiler for cmake Jagannathan Raman
2022-02-17 12:09   ` Peter Maydell
2022-02-17 15:49     ` Jag Raman
2022-02-18  3:40     ` Jag Raman
2022-02-18 12:13       ` Paolo Bonzini
2022-02-18 14:49         ` Jag Raman
2022-02-18 15:16           ` Jag Raman
2022-02-20  8:27           ` Paolo Bonzini
2022-02-20 13:27             ` Paolo Bonzini
2022-02-22 19:05             ` Jag Raman
2022-02-24 17:52               ` Paolo Bonzini
2022-02-25  4:03                 ` Jag Raman
2022-02-28 18:12                   ` Paolo Bonzini
2022-02-28 19:55                     ` Jag Raman
2022-02-17  7:48 ` [PATCH v6 02/19] tests/avocado: Specify target VM argument to helper routines Jagannathan Raman
2022-02-17  7:48 ` [PATCH v6 03/19] qdev: unplug blocker for devices Jagannathan Raman
2022-02-21 15:27   ` Stefan Hajnoczi
2022-02-28 16:23     ` Jag Raman
2022-02-21 15:30   ` Stefan Hajnoczi
2022-02-28 19:11     ` Jag Raman
2022-02-17  7:48 ` [PATCH v6 04/19] remote/machine: add HotplugHandler for remote machine Jagannathan Raman
2022-02-21 15:30   ` Stefan Hajnoczi
2022-02-17  7:48 ` [PATCH v6 05/19] remote/machine: add vfio-user property Jagannathan Raman
2022-02-21 15:32   ` Stefan Hajnoczi
2022-02-17  7:48 ` [PATCH v6 06/19] vfio-user: build library Jagannathan Raman
2022-02-17  7:48 ` [PATCH v6 07/19] vfio-user: define vfio-user-server object Jagannathan Raman
2022-02-21 15:37   ` Stefan Hajnoczi
2022-02-28 19:14     ` Jag Raman
2022-03-02 16:45       ` Stefan Hajnoczi
2022-02-25 15:42   ` Eric Blake
2022-02-17  7:48 ` [PATCH v6 08/19] vfio-user: instantiate vfio-user context Jagannathan Raman
2022-02-21 15:42   ` Stefan Hajnoczi
2022-02-28 19:16     ` Jag Raman
2022-02-17  7:48 ` [PATCH v6 09/19] vfio-user: find and init PCI device Jagannathan Raman
2022-02-21 15:57   ` Stefan Hajnoczi
2022-02-28 19:17     ` Jag Raman
2022-02-17  7:48 ` [PATCH v6 10/19] vfio-user: run vfio-user context Jagannathan Raman
2022-02-22 10:13   ` Stefan Hajnoczi
2022-02-25 16:06   ` Eric Blake
2022-02-28 19:22     ` Jag Raman
2022-02-17  7:48 ` [PATCH v6 11/19] vfio-user: handle PCI config space accesses Jagannathan Raman
2022-02-22 11:09   ` Stefan Hajnoczi
2022-02-28 19:23     ` Jag Raman
2022-02-17  7:48 ` [PATCH v6 12/19] vfio-user: IOMMU support for remote device Jagannathan Raman
2022-02-22 10:40   ` Stefan Hajnoczi
2022-02-28 19:54     ` Jag Raman
2022-03-02 16:49       ` Stefan Hajnoczi
2022-03-03 14:49         ` Jag Raman
2022-03-07  9:45           ` Stefan Hajnoczi
2022-03-07 14:42             ` Jag Raman
2022-03-08 10:04               ` Stefan Hajnoczi
2022-02-17  7:49 ` [PATCH v6 13/19] vfio-user: handle DMA mappings Jagannathan Raman
2022-02-17  7:49 ` [PATCH v6 14/19] vfio-user: handle PCI BAR accesses Jagannathan Raman
2022-02-22 11:04   ` Stefan Hajnoczi
2022-02-17  7:49 ` [PATCH v6 15/19] vfio-user: handle device interrupts Jagannathan Raman
2022-03-07 10:24   ` Stefan Hajnoczi
2022-03-07 15:10     ` Jag Raman
2022-03-08 10:15       ` Stefan Hajnoczi
2022-03-26 23:47     ` Jag Raman
2022-03-29 14:24       ` Stefan Hajnoczi
2022-03-29 19:06         ` Jag Raman
2022-03-30  9:40           ` Thanos Makatos
2022-04-04  9:44             ` Stefan Hajnoczi
2022-02-17  7:49 ` [PATCH v6 16/19] softmmu/vl: defer backend init Jagannathan Raman
2022-03-07 10:48   ` Stefan Hajnoczi
2022-03-07 15:31     ` Jag Raman
2022-02-17  7:49 ` [PATCH v6 17/19] vfio-user: register handlers to facilitate migration Jagannathan Raman
2022-02-18 12:20   ` Paolo Bonzini
2022-02-18 14:55     ` Jag Raman
2022-03-07 11:26   ` Stefan Hajnoczi
2022-02-17  7:49 ` [PATCH v6 18/19] vfio-user: handle reset of remote device Jagannathan Raman
2022-03-07 11:36   ` Stefan Hajnoczi
2022-03-07 15:37     ` Jag Raman
2022-03-08 10:21       ` Stefan Hajnoczi
2022-02-17  7:49 ` [PATCH v6 19/19] vfio-user: avocado tests for vfio-user Jagannathan Raman
