qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/12] vfio-user server in QEMU
@ 2021-10-11  5:31 Jagannathan Raman
  2021-10-11  5:31 ` [PATCH v3 01/12] configure, meson: override C compiler for cmake Jagannathan Raman
                   ` (12 more replies)
  0 siblings, 13 replies; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

Based-on: <cover.1629131628.git.elena.ufimtseva@oracle.com>

Hi,

Thank you very much for reviewing the patches in the previous revision!

We have addressed most of comments from v2.

We are working on MSI-x support (used by PCIe devices such as “nvme”)
and a couple of comments in the migration patches. We hope these two
items will be in the next revision of the patches.

Please the list below for changes since last revision:

[PATCH RFC v3 01/12] configure, meson: override C compiler for cmake
  - New patch in this rev

[PATCH RFC v3 02/12] vfio-user: build library
  - fixed indentation issue
  - dropped separate MAINTAINERS section for submodule. using
    existing section to avoid checkpatch warning

[PATCH RFC v3 03/12] vfio-user: define vfio-user-server object
  - Changed socket parameter type to SocketAddress from str
  - renamed object as vfio-user-server
  - renamed devid option as device
  - Added CONFIG_VFIO_USER_SERVER option

[PATCH RFC v3 04/12] vfio-user: instantiate vfio-user context
  - removed errno.h include
  - documented reason for using a machine init done notifier
  - don’t call vfu_destroy_ctx() if context is not initialized

[PATCH RFC v3 05/12] vfio-user: find and init PCI device
  - registering for PCI Express devices as well. Tested with
    “nvme” device in QEMU.
  - MSI-x is not supported yet, PCI-e devices using INTx
    presently. Will add a patch shortly to address MSI-x

[PATCH RFC v3 06/12] vfio-user: run vfio-user context
  - Removed the separate QemuThread for attach as it was blocking
  - Using qemu_set_fd_handler() for attach’ing context

[PATCH RFC v3 07/12] vfio-user: handle PCI config space accesses
  - Using pci_host_config_read_common/write/common() instead of pci_default_read/write_config()
  - The read_common/write_common() should take care of endianness

[PATCH RFC v3 09/12] vfio-user: handle PCI BAR accesses
  - Added support for 64-bit BARs

[PATCH RFC v3 11/12] vfio-user: register handlers to facilitate migration
  - Fixed size handling in vfu_mig_buf_read()
  - Moved qemu_remote_savevm() from “precopy” phase to “stop_and_copy”
  - Opening “vfu_mig_file” just before use
  - TODO: Working with Nutanix on issues concerning the “resume”
    phase and size of Migration BAR (the size of migration BAR shouldn’t
    matter when using vfu_migration_callbacks_t).

Please share your comments.

Thank you!

Jagannathan Raman (12):
  configure, meson: override C compiler for cmake
  vfio-user: build library
  vfio-user: define vfio-user-server object
  vfio-user: instantiate vfio-user context
  vfio-user: find and init PCI device
  vfio-user: run vfio-user context
  vfio-user: handle PCI config space accesses
  vfio-user: handle DMA mappings
  vfio-user: handle PCI BAR accesses
  vfio-user: handle device interrupts
  vfio-user: register handlers to facilitate migration
  vfio-user: acceptance test

 configure                                  |  15 +-
 meson.build                                |  39 +
 qapi/qom.json                              |  20 +-
 include/hw/remote/iohub.h                  |   2 +
 migration/savevm.h                         |   2 +
 hw/remote/iohub.c                          |   5 +
 hw/remote/vfio-user-obj.c                  | 868 +++++++++++++++++++++
 migration/savevm.c                         |  73 ++
 .gitlab-ci.d/buildtest.yml                 |   2 +
 .gitmodules                                |   3 +
 MAINTAINERS                                |   3 +
 hw/remote/Kconfig                          |   5 +
 hw/remote/meson.build                      |   3 +
 hw/remote/trace-events                     |  11 +
 subprojects/libvfio-user                   |   1 +
 tests/acceptance/multiprocess.py           |   2 +
 tests/acceptance/vfio-user.py              |  96 +++
 tests/docker/dockerfiles/centos8.docker    |   2 +
 tests/docker/dockerfiles/ubuntu2004.docker |   2 +
 19 files changed, 1151 insertions(+), 3 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c
 create mode 160000 subprojects/libvfio-user
 create mode 100644 tests/acceptance/vfio-user.py

-- 
2.20.1



^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH v3 01/12] configure, meson: override C compiler for cmake
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-12 10:44   ` Paolo Bonzini
  2021-10-11  5:31 ` [PATCH v3 02/12] vfio-user: build library Jagannathan Raman
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

The compiler path that cmake gets from meson is corrupted. It results in
the following error:
| -- The C compiler identification is unknown
| CMake Error at CMakeLists.txt:35 (project):
| The CMAKE_C_COMPILER:
| /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
| is not a full path to an existing compiler tool.

Explicitly specify the C compiler for cmake to avoid this error

Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 configure | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/configure b/configure
index 877bf3d76a..e804dfba2f 100755
--- a/configure
+++ b/configure
@@ -5112,6 +5112,8 @@ if test "$skip_meson" = no; then
   echo "cpp_args = [${CXXFLAGS:+$(meson_quote $CXXFLAGS)}]" >> $cross
   echo "c_link_args = [${LDFLAGS:+$(meson_quote $LDFLAGS)}]" >> $cross
   echo "cpp_link_args = [${LDFLAGS:+$(meson_quote $LDFLAGS)}]" >> $cross
+  echo "[cmake]" >> $cross
+  echo "CMAKE_C_COMPILER = [$(meson_quote $cc $CPU_CFLAGS)]" >> $cross
   echo "[binaries]" >> $cross
   echo "c = [$(meson_quote $cc $CPU_CFLAGS)]" >> $cross
   test -n "$cxx" && echo "cpp = [$(meson_quote $cxx $CPU_CFLAGS)]" >> $cross
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 02/12] vfio-user: build library
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
  2021-10-11  5:31 ` [PATCH v3 01/12] configure, meson: override C compiler for cmake Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-27 15:17   ` Stefan Hajnoczi
  2021-10-11  5:31 ` [PATCH v3 03/12] vfio-user: define vfio-user-server object Jagannathan Raman
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

add the libvfio-user library as a submodule. build it as a cmake
subproject.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 configure                                  | 13 +++++++-
 meson.build                                | 39 ++++++++++++++++++++++
 .gitlab-ci.d/buildtest.yml                 |  2 ++
 .gitmodules                                |  3 ++
 MAINTAINERS                                |  1 +
 hw/remote/Kconfig                          |  5 +++
 hw/remote/meson.build                      |  2 ++
 subprojects/libvfio-user                   |  1 +
 tests/acceptance/multiprocess.py           |  2 ++
 tests/docker/dockerfiles/centos8.docker    |  2 ++
 tests/docker/dockerfiles/ubuntu2004.docker |  2 ++
 11 files changed, 71 insertions(+), 1 deletion(-)
 create mode 160000 subprojects/libvfio-user

diff --git a/configure b/configure
index e804dfba2f..88fb44f15a 100755
--- a/configure
+++ b/configure
@@ -443,7 +443,7 @@ skip_meson=no
 gettext="auto"
 fuse="auto"
 fuse_lseek="auto"
-multiprocess="auto"
+multiprocess="disabled"
 slirp_smbd="$default_feature"
 
 malloc_trim="auto"
@@ -4284,6 +4284,17 @@ but not implemented on your system"
     fi
 fi
 
+##########################################
+# check for multiprocess
+
+case "$multiprocess" in
+  auto | enabled )
+    if test "$git_submodules_action" != "ignore"; then
+      git_submodules="${git_submodules} subprojects/libvfio-user"
+    fi
+    ;;
+esac
+
 ##########################################
 # End of CC checks
 # After here, no more $cc or $ld runs
diff --git a/meson.build b/meson.build
index 99a0a3e689..5c91305f2d 100644
--- a/meson.build
+++ b/meson.build
@@ -172,6 +172,10 @@ if targetos != 'linux' and get_option('multiprocess').enabled()
 endif
 multiprocess_allowed = targetos == 'linux' and not get_option('multiprocess').disabled()
 
+# libvfiouser is enabled with multiprocess. Presently, libvfiouser depends on
+# multiprocess code, as such it can't be enabled independently
+libvfiouser_allowed = multiprocess_allowed
+
 libm = cc.find_library('m', required: false)
 threads = dependency('threads')
 util = cc.find_library('util', required: false)
@@ -1903,6 +1907,41 @@ if get_option('cfi') and slirp_opt == 'system'
          + ' Please configure with --enable-slirp=git')
 endif
 
+vfiouser = not_found
+if have_system and libvfiouser_allowed
+  have_internal = fs.exists(meson.current_source_dir() / 'subprojects/libvfio-user/Makefile')
+
+  if not have_internal
+    error('libvfio-user source not found - please pull git submodule')
+  endif
+
+  json_c = dependency('json-c', required: false)
+  if not json_c.found()
+    json_c = dependency('libjson-c', required: false)
+  endif
+  if not json_c.found()
+    json_c = dependency('libjson-c-dev', required: false)
+  endif
+
+  if not json_c.found()
+    error('Unable to find json-c package')
+  endif
+
+  cmake = import('cmake')
+
+  vfiouser_subproj = cmake.subproject('libvfio-user')
+
+  vfiouser_sl = vfiouser_subproj.dependency('vfio-user-static')
+
+  # Although cmake links the json-c library with vfio-user-static
+  # target, that info is not available to meson via cmake.subproject.
+  # As such, we have to separately declare the json-c dependency here.
+  # This appears to be a current limitation of using cmake inside meson.
+  # libvfio-user is planning a switch to meson in the future, which
+  # would address this item automatically.
+  vfiouser = declare_dependency(dependencies: [vfiouser_sl, json_c])
+endif
+
 fdt = not_found
 fdt_opt = get_option('fdt')
 if have_system
diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 5c378e35f9..515ae40d1f 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -42,6 +42,7 @@ build-system-ubuntu:
   variables:
     IMAGE: ubuntu2004
     CONFIGURE_ARGS: --enable-docs --enable-fdt=system --enable-slirp=system
+                    --enable-multiprocess
     TARGETS: aarch64-softmmu alpha-softmmu cris-softmmu hppa-softmmu
       microblazeel-softmmu mips64el-softmmu
     MAKE_CHECK_ARGS: check-build
@@ -142,6 +143,7 @@ build-system-centos:
     IMAGE: centos8
     CONFIGURE_ARGS: --disable-nettle --enable-gcrypt --enable-fdt=system
                     --enable-modules --enable-trace-backends=dtrace
+                    --enable-multiprocess
     TARGETS: ppc64-softmmu or1k-softmmu s390x-softmmu
       x86_64-softmmu rx-softmmu sh4-softmmu nios2-softmmu
     MAKE_CHECK_ARGS: check-build
diff --git a/.gitmodules b/.gitmodules
index 08b1b48a09..cfeea7cf20 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -64,3 +64,6 @@
 [submodule "roms/vbootrom"]
 	path = roms/vbootrom
 	url = https://gitlab.com/qemu-project/vbootrom.git
+[submodule "subprojects/libvfio-user"]
+	path = subprojects/libvfio-user
+	url = https://github.com/nutanix/libvfio-user.git
diff --git a/MAINTAINERS b/MAINTAINERS
index 50435b8d2f..661f91a160 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3420,6 +3420,7 @@ F: hw/remote/proxy-memory-listener.c
 F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
+F: subprojects/libvfio-user
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
index 08c16e235f..f9e512d44a 100644
--- a/hw/remote/Kconfig
+++ b/hw/remote/Kconfig
@@ -1,4 +1,9 @@
+config VFIO_USER_SERVER
+    bool
+    default n
+
 config MULTIPROCESS
     bool
     depends on PCI && PCI_EXPRESS && KVM
     select REMOTE_PCIHOST
+    select VFIO_USER_SERVER
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index e6a5574242..dfea6b533b 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -7,6 +7,8 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
 
+remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)
+
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('memory.c'))
 specific_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy-memory-listener.c'))
 
diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
new file mode 160000
index 0000000000..647c9341d2
--- /dev/null
+++ b/subprojects/libvfio-user
@@ -0,0 +1 @@
+Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
diff --git a/tests/acceptance/multiprocess.py b/tests/acceptance/multiprocess.py
index 96627f022a..7383c6eb58 100644
--- a/tests/acceptance/multiprocess.py
+++ b/tests/acceptance/multiprocess.py
@@ -67,6 +67,7 @@ def do_test(self, kernel_url, initrd_url, kernel_command_line,
     def test_multiprocess_x86_64(self):
         """
         :avocado: tags=arch:x86_64
+        :avocado: tags=distro:centos
         """
         kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
                       '/linux/releases/31/Everything/x86_64/os/images'
@@ -82,6 +83,7 @@ def test_multiprocess_x86_64(self):
     def test_multiprocess_aarch64(self):
         """
         :avocado: tags=arch:aarch64
+        :avocado: tags=distro:ubuntu
         """
         kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
                       '/linux/releases/31/Everything/aarch64/os/images'
diff --git a/tests/docker/dockerfiles/centos8.docker b/tests/docker/dockerfiles/centos8.docker
index 46398c61ee..646abcda1f 100644
--- a/tests/docker/dockerfiles/centos8.docker
+++ b/tests/docker/dockerfiles/centos8.docker
@@ -12,6 +12,7 @@ ENV PACKAGES \
     capstone-devel \
     ccache \
     clang \
+    cmake \
     ctags \
     cyrus-sasl-devel \
     daxctl-devel \
@@ -32,6 +33,7 @@ ENV PACKAGES \
     gtk3-devel \
     hostname \
     jemalloc-devel \
+    json-c-devel \
     libaio-devel \
     libasan \
     libattr-devel \
diff --git a/tests/docker/dockerfiles/ubuntu2004.docker b/tests/docker/dockerfiles/ubuntu2004.docker
index 39de63d012..ca4dff0e6b 100644
--- a/tests/docker/dockerfiles/ubuntu2004.docker
+++ b/tests/docker/dockerfiles/ubuntu2004.docker
@@ -6,6 +6,7 @@ ENV PACKAGES \
     ca-certificates \
     ccache \
     clang \
+    cmake \
     dbus \
     debianutils \
     diffutils \
@@ -44,6 +45,7 @@ ENV PACKAGES \
     libiscsi-dev \
     libjemalloc-dev \
     libjpeg-turbo8-dev \
+    libjson-c-dev \
     liblttng-ust-dev \
     liblzo2-dev \
     libncursesw5-dev \
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
  2021-10-11  5:31 ` [PATCH v3 01/12] configure, meson: override C compiler for cmake Jagannathan Raman
  2021-10-11  5:31 ` [PATCH v3 02/12] vfio-user: build library Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-27 15:40   ` Stefan Hajnoczi
  2021-10-11  5:31 ` [PATCH v3 04/12] vfio-user: instantiate vfio-user context Jagannathan Raman
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

Define vfio-user object which is remote process server for QEMU. Setup
object initialization functions and properties necessary to instantiate
the object

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 qapi/qom.json             |  20 ++++-
 hw/remote/vfio-user-obj.c | 173 ++++++++++++++++++++++++++++++++++++++
 MAINTAINERS               |   1 +
 hw/remote/meson.build     |   1 +
 hw/remote/trace-events    |   3 +
 5 files changed, 196 insertions(+), 2 deletions(-)
 create mode 100644 hw/remote/vfio-user-obj.c

diff --git a/qapi/qom.json b/qapi/qom.json
index 0222bb4506..97de79cc36 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -705,6 +705,20 @@
 { 'struct': 'RemoteObjectProperties',
   'data': { 'fd': 'str', 'devid': 'str' } }
 
+##
+# @VfioUserServerProperties:
+#
+# Properties for vfio-user-server objects.
+#
+# @socket: socket to be used by the libvfiouser library
+#
+# @device: the id of the device to be emulated at the server
+#
+# Since: 6.0
+##
+{ 'struct': 'VfioUserServerProperties',
+  'data': { 'socket': 'SocketAddress', 'device': 'str' } }
+
 ##
 # @RngProperties:
 #
@@ -830,7 +844,8 @@
     'tls-creds-psk',
     'tls-creds-x509',
     'tls-cipher-suites',
-    'x-remote-object'
+    'x-remote-object',
+    'vfio-user-server'
   ] }
 
 ##
@@ -887,7 +902,8 @@
       'tls-creds-psk':              'TlsCredsPskProperties',
       'tls-creds-x509':             'TlsCredsX509Properties',
       'tls-cipher-suites':          'TlsCredsProperties',
-      'x-remote-object':            'RemoteObjectProperties'
+      'x-remote-object':            'RemoteObjectProperties',
+      'vfio-user-server':           'VfioUserServerProperties'
   } }
 
 ##
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
new file mode 100644
index 0000000000..c2a300f0ff
--- /dev/null
+++ b/hw/remote/vfio-user-obj.c
@@ -0,0 +1,173 @@
+/**
+ * QEMU vfio-user-server server object
+ *
+ * Copyright © 2021 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
+ *
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+/**
+ * Usage: add options:
+ *     -machine x-remote
+ *     -device <PCI-device>,id=<pci-dev-id>
+ *     -object vfio-user-server,id=<id>,type=unix,path=<socket-path>,
+ *             device=<pci-dev-id>
+ *
+ * Note that vfio-user-server object must be used with x-remote machine only.
+ * This server could only support PCI devices for now.
+ *
+ * type - SocketAddress type - presently "unix" alone is supported. Required
+ *        option
+ *
+ * path - named unix socket, it will be created by the server. It is
+ *        a required option
+ *
+ * device - id of a device on the server, a required option. PCI devices
+ *          alone are supported presently.
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qom/object.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "sysemu/runstate.h"
+#include "hw/boards.h"
+#include "hw/remote/machine.h"
+#include "qapi/error.h"
+#include "qapi/qapi-visit-sockets.h"
+
+#define TYPE_VFU_OBJECT "vfio-user-server"
+OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
+
+struct VfuObjectClass {
+    ObjectClass parent_class;
+
+    unsigned int nr_devs;
+
+    /* Maximum number of devices the server could support */
+    unsigned int max_devs;
+};
+
+struct VfuObject {
+    /* private */
+    Object parent;
+
+    SocketAddress *socket;
+
+    char *device;
+};
+
+static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
+                                  void *opaque, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    g_free(o->socket);
+
+    o->socket = NULL;
+
+    visit_type_SocketAddress(v, name, &o->socket, errp);
+
+    if (o->socket->type != SOCKET_ADDRESS_TYPE_UNIX) {
+        error_setg(&error_abort, "vfu: Unsupported socket type - %s",
+                   o->socket->u.q_unix.path);
+        return;
+    }
+
+    trace_vfu_prop("socket", o->socket->u.q_unix.path);
+}
+
+static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
+{
+    VfuObject *o = VFU_OBJECT(obj);
+
+    g_free(o->device);
+
+    o->device = g_strdup(str);
+
+    trace_vfu_prop("device", str);
+}
+
+static void vfu_object_init(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+
+    if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
+        error_setg(&error_abort, "vfu: %s only compatible with %s machine",
+                   TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
+        return;
+    }
+
+    if (k->nr_devs >= k->max_devs) {
+        error_setg(&error_abort,
+                   "Reached maximum number of vfio-user-server devices: %u",
+                   k->max_devs);
+        return;
+    }
+
+    k->nr_devs++;
+}
+
+static void vfu_object_finalize(Object *obj)
+{
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
+
+    k->nr_devs--;
+
+    g_free(o->socket);
+
+    g_free(o->device);
+
+    if (k->nr_devs == 0) {
+        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
+    }
+}
+
+static void vfu_object_class_init(ObjectClass *klass, void *data)
+{
+    VfuObjectClass *k = VFU_OBJECT_CLASS(klass);
+
+    /* Limiting maximum number of devices to 1 until IOMMU support is added */
+    k->max_devs = 1;
+    k->nr_devs = 0;
+
+    object_class_property_add(klass, "socket", "SocketAddress", NULL,
+                              vfu_object_set_socket, NULL, NULL);
+    object_class_property_set_description(klass, "socket",
+                                          "SocketAddress "
+                                          "(ex: type=unix,path=/tmp/sock). "
+                                          "Only UNIX is presently supported");
+    object_class_property_add_str(klass, "device", NULL,
+                                  vfu_object_set_device);
+    object_class_property_set_description(klass, "device",
+                                          "device ID - only PCI devices "
+                                          "are presently supported");
+}
+
+static const TypeInfo vfu_object_info = {
+    .name = TYPE_VFU_OBJECT,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(VfuObject),
+    .instance_init = vfu_object_init,
+    .instance_finalize = vfu_object_finalize,
+    .class_size = sizeof(VfuObjectClass),
+    .class_init = vfu_object_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void vfu_register_types(void)
+{
+    type_register_static(&vfu_object_info);
+}
+
+type_init(vfu_register_types);
diff --git a/MAINTAINERS b/MAINTAINERS
index 661f91a160..79ff8331dc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3421,6 +3421,7 @@ F: include/hw/remote/proxy-memory-listener.h
 F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
+F: hw/remote/vfio-user-obj.c
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/hw/remote/meson.build b/hw/remote/meson.build
index dfea6b533b..534ac5df79 100644
--- a/hw/remote/meson.build
+++ b/hw/remote/meson.build
@@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('message.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
 remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
+remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: files('vfio-user-obj.c'))
 
 remote_ss.add(when: 'CONFIG_VFIO_USER_SERVER', if_true: vfiouser)
 
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 0b23974f90..7da12f0d96 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -2,3 +2,6 @@
 
 mpqemu_send_io_error(int cmd, int size, int nfds) "send command %d size %d, %d file descriptors to remote process"
 mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d, %d file descriptors to remote process"
+
+# vfio-user-obj.c
+vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 04/12] vfio-user: instantiate vfio-user context
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
                   ` (2 preceding siblings ...)
  2021-10-11  5:31 ` [PATCH v3 03/12] vfio-user: define vfio-user-server object Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-27 15:59   ` Stefan Hajnoczi
  2021-10-11  5:31 ` [PATCH v3 05/12] vfio-user: find and init PCI device Jagannathan Raman
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

create a context with the vfio-user library to run a PCI device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 38 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 38 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index c2a300f0ff..d26e5ec9e9 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -41,6 +41,9 @@
 #include "hw/remote/machine.h"
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
+#include "qemu/notify.h"
+#include "sysemu/sysemu.h"
+#include "libvfio-user.h"
 
 #define TYPE_VFU_OBJECT "vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -61,6 +64,10 @@ struct VfuObject {
     SocketAddress *socket;
 
     char *device;
+
+    Notifier machine_done;
+
+    vfu_ctx_t *vfu_ctx;
 };
 
 static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
@@ -94,9 +101,31 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
     trace_vfu_prop("device", str);
 }
 
+/*
+ * vfio-user-server depends on the availability of the 'socket' and 'device'
+ * properties. It also depends on devices instantiated in QEMU. These
+ * dependencies are not available during the instance_init phase of this
+ * object's life-cycle. As such, the server is initialized after the
+ * machine is setup. machine_init_done_notifier notifies vfio-user-server
+ * when the machine is setup, and the dependencies are available.
+ */
+static void vfu_object_machine_done(Notifier *notifier, void *data)
+{
+    VfuObject *o = container_of(notifier, VfuObject, machine_done);
+
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+                                o, VFU_DEV_TYPE_PCI);
+    if (o->vfu_ctx == NULL) {
+        error_setg(&error_abort, "vfu: Failed to create context - %s",
+                   strerror(errno));
+        return;
+    }
+}
+
 static void vfu_object_init(Object *obj)
 {
     VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
+    VfuObject *o = VFU_OBJECT(obj);
 
     if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
         error_setg(&error_abort, "vfu: %s only compatible with %s machine",
@@ -111,7 +140,12 @@ static void vfu_object_init(Object *obj)
         return;
     }
 
+    o->vfu_ctx = NULL;
+
     k->nr_devs++;
+
+    o->machine_done.notify = vfu_object_machine_done;
+    qemu_add_machine_init_done_notifier(&o->machine_done);
 }
 
 static void vfu_object_finalize(Object *obj)
@@ -123,6 +157,10 @@ static void vfu_object_finalize(Object *obj)
 
     g_free(o->socket);
 
+    if (o->vfu_ctx) {
+        vfu_destroy_ctx(o->vfu_ctx);
+    }
+
     g_free(o->device);
 
     if (k->nr_devs == 0) {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 05/12] vfio-user: find and init PCI device
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
                   ` (3 preceding siblings ...)
  2021-10-11  5:31 ` [PATCH v3 04/12] vfio-user: instantiate vfio-user context Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-27 16:05   ` Stefan Hajnoczi
  2021-10-11  5:31 ` [PATCH v3 06/12] vfio-user: run vfio-user context Jagannathan Raman
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

Find the PCI device with specified id. Initialize the device context
with the QEMU PCI device

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index d26e5ec9e9..7ce4e5b256 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -44,6 +44,8 @@
 #include "qemu/notify.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
+#include "hw/qdev-core.h"
+#include "hw/pci/pci.h"
 
 #define TYPE_VFU_OBJECT "vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -68,6 +70,8 @@ struct VfuObject {
     Notifier machine_done;
 
     vfu_ctx_t *vfu_ctx;
+
+    PCIDevice *pci_dev;
 };
 
 static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
@@ -112,6 +116,9 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
 static void vfu_object_machine_done(Notifier *notifier, void *data)
 {
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
+    DeviceState *dev = NULL;
+    vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
+    int ret;
 
     o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
                                 o, VFU_DEV_TYPE_PCI);
@@ -120,6 +127,31 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
                    strerror(errno));
         return;
     }
+
+    dev = qdev_find_recursive(sysbus_get_default(), o->device);
+    if (dev == NULL) {
+        error_setg(&error_abort, "vfu: Device %s not found", o->device);
+        return;
+    }
+
+    if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
+        error_setg(&error_abort, "vfu: %s not a PCI device", o->device);
+        return;
+    }
+
+    o->pci_dev = PCI_DEVICE(dev);
+
+    if (pci_is_express(o->pci_dev)) {
+        pci_type = VFU_PCI_TYPE_EXPRESS;
+    }
+
+    ret = vfu_pci_init(o->vfu_ctx, pci_type, PCI_HEADER_TYPE_NORMAL, 0);
+    if (ret < 0) {
+        error_setg(&error_abort,
+                   "vfu: Failed to attach PCI device %s to context - %s",
+                   o->device, strerror(errno));
+        return;
+    }
 }
 
 static void vfu_object_init(Object *obj)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 06/12] vfio-user: run vfio-user context
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
                   ` (4 preceding siblings ...)
  2021-10-11  5:31 ` [PATCH v3 05/12] vfio-user: find and init PCI device Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-27 16:21   ` Stefan Hajnoczi
  2021-10-11  5:31 ` [PATCH v3 07/12] vfio-user: handle PCI config space accesses Jagannathan Raman
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

Setup a handler to run vfio-user context. The context is driven by
messages to the file descriptor associated with it - get the fd for
the context and hook up the handler with it

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 75 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 7ce4e5b256..05f7fff19c 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -42,6 +42,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-visit-sockets.h"
 #include "qemu/notify.h"
+#include "qemu/thread.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
@@ -72,6 +73,8 @@ struct VfuObject {
     vfu_ctx_t *vfu_ctx;
 
     PCIDevice *pci_dev;
+
+    int vfu_poll_fd;
 };
 
 static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
@@ -105,6 +108,58 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
     trace_vfu_prop("device", str);
 }
 
+static void vfu_object_ctx_run(void *opaque)
+{
+    VfuObject *o = opaque;
+    int ret = -1;
+
+    while (ret != 0) {
+        ret = vfu_run_ctx(o->vfu_ctx);
+        if (ret < 0) {
+            if (errno == EINTR) {
+                continue;
+            } else if (errno == ENOTCONN) {
+                qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+                o->vfu_poll_fd = -1;
+                object_unparent(OBJECT(o));
+                break;
+            } else {
+                error_setg(&error_abort, "vfu: Failed to run device %s - %s",
+                           o->device, strerror(errno));
+                 break;
+            }
+        }
+    }
+}
+
+static void vfu_object_attach_ctx(void *opaque)
+{
+    VfuObject *o = opaque;
+    int ret;
+
+    qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
+    o->vfu_poll_fd = -1;
+
+retry_attach:
+    ret = vfu_attach_ctx(o->vfu_ctx);
+    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
+        goto retry_attach;
+    } else if (ret < 0) {
+        error_setg(&error_abort,
+                   "vfu: Failed to attach device %s to context - %s",
+                   o->device, strerror(errno));
+        return;
+    }
+
+    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+    if (o->vfu_poll_fd < 0) {
+        error_setg(&error_abort, "vfu: Failed to get poll fd %s", o->device);
+        return;
+    }
+
+    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
+}
+
 /*
  * vfio-user-server depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -120,7 +175,8 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
     vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
     int ret;
 
-    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
+    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path,
+                                LIBVFIO_USER_FLAG_ATTACH_NB,
                                 o, VFU_DEV_TYPE_PCI);
     if (o->vfu_ctx == NULL) {
         error_setg(&error_abort, "vfu: Failed to create context - %s",
@@ -152,6 +208,21 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
                    o->device, strerror(errno));
         return;
     }
+
+    ret = vfu_realize_ctx(o->vfu_ctx);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
+                   o->device, strerror(errno));
+        return;
+    }
+
+    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
+    if (o->vfu_poll_fd < 0) {
+        error_setg(&error_abort, "vfu: Failed to get poll fd %s", o->device);
+        return;
+    }
+
+    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_attach_ctx, NULL, o);
 }
 
 static void vfu_object_init(Object *obj)
@@ -178,6 +249,8 @@ static void vfu_object_init(Object *obj)
 
     o->machine_done.notify = vfu_object_machine_done;
     qemu_add_machine_init_done_notifier(&o->machine_done);
+
+    o->vfu_poll_fd = -1;
 }
 
 static void vfu_object_finalize(Object *obj)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 07/12] vfio-user: handle PCI config space accesses
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
                   ` (5 preceding siblings ...)
  2021-10-11  5:31 ` [PATCH v3 06/12] vfio-user: run vfio-user context Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-27 16:35   ` Stefan Hajnoczi
  2021-10-11  5:31 ` [PATCH v3 08/12] vfio-user: handle DMA mappings Jagannathan Raman
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

Define and register handlers for PCI config space accesses

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 45 +++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 2 files changed, 47 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 05f7fff19c..df5843c388 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -43,6 +43,7 @@
 #include "qapi/qapi-visit-sockets.h"
 #include "qemu/notify.h"
 #include "qemu/thread.h"
+#include "qemu/main-loop.h"
 #include "sysemu/sysemu.h"
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
@@ -160,6 +161,39 @@ retry_attach:
     qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
 }
 
+static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
+                                     size_t count, loff_t offset,
+                                     const bool is_write)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint32_t pci_access_width = sizeof(uint32_t);
+    size_t bytes = count;
+    uint32_t val = 0;
+    char *ptr = buf;
+    int len;
+
+    while (bytes > 0) {
+        len = (bytes > pci_access_width) ? pci_access_width : bytes;
+        if (is_write) {
+            memcpy(&val, ptr, len);
+            pci_host_config_write_common(o->pci_dev, offset,
+                                         pci_config_size(o->pci_dev),
+                                         val, len);
+            trace_vfu_cfg_write(offset, val);
+        } else {
+            val = pci_host_config_read_common(o->pci_dev, offset,
+                                              pci_config_size(o->pci_dev), len);
+            memcpy(ptr, &val, len);
+            trace_vfu_cfg_read(offset, val);
+        }
+        offset += len;
+        ptr += len;
+        bytes -= len;
+    }
+
+    return count;
+}
+
 /*
  * vfio-user-server depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -209,6 +243,17 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_CFG_REGION_IDX,
+                           pci_config_size(o->pci_dev), &vfu_object_cfg_access,
+                           VFU_REGION_FLAG_RW | VFU_REGION_FLAG_ALWAYS_CB,
+                           NULL, 0, -1, 0);
+    if (ret < 0) {
+        error_setg(&error_abort,
+                   "vfu: Failed to setup config space handlers for %s- %s",
+                   o->device, strerror(errno));
+        return;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 7da12f0d96..2ef7884346 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -5,3 +5,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 
 # vfio-user-obj.c
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
+vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
+vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 08/12] vfio-user: handle DMA mappings
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
                   ` (6 preceding siblings ...)
  2021-10-11  5:31 ` [PATCH v3 07/12] vfio-user: handle PCI config space accesses Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-11  5:31 ` [PATCH v3 09/12] vfio-user: handle PCI BAR accesses Jagannathan Raman
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

Define and register callbacks to manage the RAM regions used for
device DMA

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 hw/remote/vfio-user-obj.c | 50 +++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  2 ++
 2 files changed, 52 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index df5843c388..7233562540 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -194,6 +194,49 @@ static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
     return count;
 }
 
+static void dma_register(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    MemoryRegion *subregion = NULL;
+    g_autofree char *name = NULL;
+    static unsigned int suffix;
+    struct iovec *iov = &info->iova;
+
+    if (!info->vaddr) {
+        return;
+    }
+
+    name = g_strdup_printf("remote-mem-%u", suffix++);
+
+    subregion = g_new0(MemoryRegion, 1);
+
+    memory_region_init_ram_ptr(subregion, NULL, name,
+                               iov->iov_len, info->vaddr);
+
+    memory_region_add_subregion(get_system_memory(), (hwaddr)iov->iov_base,
+                                subregion);
+
+    trace_vfu_dma_register((uint64_t)iov->iov_base, iov->iov_len);
+}
+
+static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
+{
+    MemoryRegion *mr = NULL;
+    ram_addr_t offset;
+
+    mr = memory_region_from_host(info->vaddr, &offset);
+    if (!mr) {
+        return 0;
+    }
+
+    memory_region_del_subregion(get_system_memory(), mr);
+
+    object_unparent((OBJECT(mr)));
+
+    trace_vfu_dma_unregister((uint64_t)info->iova.iov_base);
+
+    return 0;
+}
+
 /*
  * vfio-user-server depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -254,6 +297,13 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    ret = vfu_setup_device_dma(o->vfu_ctx, &dma_register, &dma_unregister);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to setup DMA handlers for %s",
+                   o->device);
+        return;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 2ef7884346..f945c7e33b 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -7,3 +7,5 @@ mpqemu_recv_io_error(int cmd, int size, int nfds) "failed to receive %d size %d,
 vfu_prop(const char *prop, const char *val) "vfu: setting %s as %s"
 vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
+vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
+vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 09/12] vfio-user: handle PCI BAR accesses
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
                   ` (7 preceding siblings ...)
  2021-10-11  5:31 ` [PATCH v3 08/12] vfio-user: handle DMA mappings Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-27 16:38   ` Stefan Hajnoczi
  2021-10-11  5:31 ` [PATCH v3 10/12] vfio-user: handle device interrupts Jagannathan Raman
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

Determine the BARs used by the PCI device and register handlers to
manage the access to the same.

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 hw/remote/vfio-user-obj.c | 90 +++++++++++++++++++++++++++++++++++++++
 hw/remote/trace-events    |  3 ++
 2 files changed, 93 insertions(+)

diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 7233562540..4c9ed1543c 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -237,6 +237,94 @@ static int dma_unregister(vfu_ctx_t *vfu_ctx, vfu_dma_info_t *info)
     return 0;
 }
 
+static ssize_t vfu_object_bar_rw(PCIDevice *pci_dev, hwaddr addr, size_t count,
+                                 char * const buf, const bool is_write,
+                                 bool is_io)
+{
+    AddressSpace *as = NULL;
+    MemTxResult res;
+
+    if (is_io) {
+        as = &address_space_io;
+    } else {
+        as = pci_device_iommu_address_space(pci_dev);
+    }
+
+    trace_vfu_bar_rw_enter(is_write ? "Write" : "Read", (uint64_t)addr);
+
+    res = address_space_rw(as, addr, MEMTXATTRS_UNSPECIFIED, (void *)buf,
+                           (hwaddr)count, is_write);
+    if (res != MEMTX_OK) {
+        warn_report("vfu: failed to %s 0x%"PRIx64"",
+                    is_write ? "write to" : "read from",
+                    addr);
+        return -1;
+    }
+
+    trace_vfu_bar_rw_exit(is_write ? "Write" : "Read", (uint64_t)addr);
+
+    return count;
+}
+
+/**
+ * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
+ *
+ * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
+ * define vfu_object_bar2_handler
+ */
+#define VFU_OBJECT_BAR_HANDLER(BAR_NO)                                         \
+    static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx,        \
+                                        char * const buf, size_t count,        \
+                                        loff_t offset, const bool is_write)    \
+    {                                                                          \
+        VfuObject *o = vfu_get_private(vfu_ctx);                               \
+        PCIDevice *pci_dev = o->pci_dev;                                       \
+        hwaddr addr = (hwaddr)(pci_get_bar_addr(pci_dev, BAR_NO) + offset);    \
+        bool is_io = !!(pci_dev->io_regions[BAR_NO].type &                     \
+                        PCI_BASE_ADDRESS_SPACE);                               \
+                                                                               \
+        return vfu_object_bar_rw(pci_dev, addr, count, buf, is_write, is_io);  \
+    }                                                                          \
+
+VFU_OBJECT_BAR_HANDLER(0)
+VFU_OBJECT_BAR_HANDLER(1)
+VFU_OBJECT_BAR_HANDLER(2)
+VFU_OBJECT_BAR_HANDLER(3)
+VFU_OBJECT_BAR_HANDLER(4)
+VFU_OBJECT_BAR_HANDLER(5)
+
+static vfu_region_access_cb_t *vfu_object_bar_handlers[PCI_NUM_REGIONS] = {
+    &vfu_object_bar0_handler,
+    &vfu_object_bar1_handler,
+    &vfu_object_bar2_handler,
+    &vfu_object_bar3_handler,
+    &vfu_object_bar4_handler,
+    &vfu_object_bar5_handler,
+};
+
+/**
+ * vfu_object_register_bars - Identify active BAR regions of pdev and setup
+ *                            callbacks to handle read/write accesses
+ */
+static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
+{
+    int i;
+
+    for (i = 0; i < PCI_NUM_REGIONS; i++) {
+        if (!pdev->io_regions[i].size) {
+            continue;
+        }
+
+        vfu_setup_region(vfu_ctx, VFU_PCI_DEV_BAR0_REGION_IDX + i,
+                         (size_t)pdev->io_regions[i].size,
+                         vfu_object_bar_handlers[i],
+                         VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
+
+        trace_vfu_bar_register(i, pdev->io_regions[i].addr,
+                               pdev->io_regions[i].size);
+    }
+}
+
 /*
  * vfio-user-server depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -304,6 +392,8 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index f945c7e33b..847d50d88f 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -9,3 +9,6 @@ vfu_cfg_read(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u -> 0x%x"
 vfu_cfg_write(uint32_t offset, uint32_t val) "vfu: cfg: 0x%u <- 0x%x"
 vfu_dma_register(uint64_t gpa, size_t len) "vfu: registering GPA 0x%"PRIx64", %zu bytes"
 vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
+vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
+vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
+vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 10/12] vfio-user: handle device interrupts
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
                   ` (8 preceding siblings ...)
  2021-10-11  5:31 ` [PATCH v3 09/12] vfio-user: handle PCI BAR accesses Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-11  5:31 ` [PATCH v3 11/12] vfio-user: register handlers to facilitate migration Jagannathan Raman
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

Forward remote device's interrupts to the guest

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/hw/remote/iohub.h |  2 ++
 hw/remote/iohub.c         |  5 +++++
 hw/remote/vfio-user-obj.c | 28 ++++++++++++++++++++++++++++
 hw/remote/trace-events    |  1 +
 4 files changed, 36 insertions(+)

diff --git a/include/hw/remote/iohub.h b/include/hw/remote/iohub.h
index 0bf98e0d78..d5bd0b08b0 100644
--- a/include/hw/remote/iohub.h
+++ b/include/hw/remote/iohub.h
@@ -15,6 +15,7 @@
 #include "qemu/event_notifier.h"
 #include "qemu/thread-posix.h"
 #include "hw/remote/mpqemu-link.h"
+#include "libvfio-user.h"
 
 #define REMOTE_IOHUB_NB_PIRQS    PCI_DEVFN_MAX
 
@@ -30,6 +31,7 @@ typedef struct RemoteIOHubState {
     unsigned int irq_level[REMOTE_IOHUB_NB_PIRQS];
     ResampleToken token[REMOTE_IOHUB_NB_PIRQS];
     QemuMutex irq_level_lock[REMOTE_IOHUB_NB_PIRQS];
+    vfu_ctx_t *vfu_ctx[REMOTE_IOHUB_NB_PIRQS];
 } RemoteIOHubState;
 
 int remote_iohub_map_irq(PCIDevice *pci_dev, int intx);
diff --git a/hw/remote/iohub.c b/hw/remote/iohub.c
index 547d597f0f..94102338a8 100644
--- a/hw/remote/iohub.c
+++ b/hw/remote/iohub.c
@@ -18,6 +18,7 @@
 #include "hw/remote/machine.h"
 #include "hw/remote/iohub.h"
 #include "qemu/main-loop.h"
+#include "trace.h"
 
 void remote_iohub_init(RemoteIOHubState *iohub)
 {
@@ -62,6 +63,10 @@ void remote_iohub_set_irq(void *opaque, int pirq, int level)
     QEMU_LOCK_GUARD(&iohub->irq_level_lock[pirq]);
 
     if (level) {
+        if (iohub->vfu_ctx[pirq]) {
+            trace_vfu_interrupt(pirq);
+            vfu_irq_trigger(iohub->vfu_ctx[pirq], 0);
+        }
         if (++iohub->irq_level[pirq] == 1) {
             event_notifier_set(&iohub->irqfds[pirq]);
         }
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 4c9ed1543c..63c468d6f3 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -48,6 +48,7 @@
 #include "libvfio-user.h"
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
+#include "hw/remote/iohub.h"
 
 #define TYPE_VFU_OBJECT "vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -325,6 +326,26 @@ static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
     }
 }
 
+static int vfu_object_setup_irqs(vfu_ctx_t *vfu_ctx, PCIDevice *pci_dev)
+{
+    RemoteMachineState *machine = REMOTE_MACHINE(current_machine);
+    RemoteIOHubState *iohub = &machine->iohub;
+    int pirq, intx, ret;
+
+    ret = vfu_setup_device_nr_irqs(vfu_ctx, VFU_DEV_INTX_IRQ, 1);
+    if (ret < 0) {
+        return ret;
+    }
+
+    intx = pci_get_byte(pci_dev->config + PCI_INTERRUPT_PIN) - 1;
+
+    pirq = remote_iohub_map_irq(pci_dev, intx);
+
+    iohub->vfu_ctx[pirq] = vfu_ctx;
+
+    return 0;
+}
+
 /*
  * vfio-user-server depends on the availability of the 'socket' and 'device'
  * properties. It also depends on devices instantiated in QEMU. These
@@ -394,6 +415,13 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
 
     vfu_object_register_bars(o->vfu_ctx, o->pci_dev);
 
+    ret = vfu_object_setup_irqs(o->vfu_ctx, o->pci_dev);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to setup interrupts for %s",
+                   o->device);
+        return;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
diff --git a/hw/remote/trace-events b/hw/remote/trace-events
index 847d50d88f..c167b3c7a5 100644
--- a/hw/remote/trace-events
+++ b/hw/remote/trace-events
@@ -12,3 +12,4 @@ vfu_dma_unregister(uint64_t gpa) "vfu: unregistering GPA 0x%"PRIx64""
 vfu_bar_register(int i, uint64_t addr, uint64_t size) "vfu: BAR %d: addr 0x%"PRIx64" size 0x%"PRIx64""
 vfu_bar_rw_enter(const char *op, uint64_t addr) "vfu: %s request for BAR address 0x%"PRIx64""
 vfu_bar_rw_exit(const char *op, uint64_t addr) "vfu: Finished %s of BAR address 0x%"PRIx64""
+vfu_interrupt(int pirq) "vfu: sending interrupt to device - PIRQ %d"
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 11/12] vfio-user: register handlers to facilitate migration
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
                   ` (9 preceding siblings ...)
  2021-10-11  5:31 ` [PATCH v3 10/12] vfio-user: handle device interrupts Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-27 18:30   ` Stefan Hajnoczi
  2021-10-11  5:31 ` [PATCH v3 12/12] vfio-user: acceptance test Jagannathan Raman
  2021-10-27 18:33 ` [PATCH v3 00/12] vfio-user server in QEMU Stefan Hajnoczi
  12 siblings, 1 reply; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

Store and load the device's state during migration. use libvfio-user's
handlers for this purpose

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 migration/savevm.h        |   2 +
 hw/remote/vfio-user-obj.c | 339 ++++++++++++++++++++++++++++++++++++++
 migration/savevm.c        |  73 ++++++++
 3 files changed, 414 insertions(+)

diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342cb4..8007064ff2 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -67,5 +67,7 @@ int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis);
 int qemu_load_device_state(QEMUFile *f);
 int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
         bool in_postcopy, bool inactivate_disks);
+int qemu_remote_savevm(QEMUFile *f, DeviceState *dev);
+int qemu_remote_loadvm(QEMUFile *f);
 
 #endif
diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
index 63c468d6f3..757355ecaf 100644
--- a/hw/remote/vfio-user-obj.c
+++ b/hw/remote/vfio-user-obj.c
@@ -49,6 +49,10 @@
 #include "hw/qdev-core.h"
 #include "hw/pci/pci.h"
 #include "hw/remote/iohub.h"
+#include "migration/qemu-file.h"
+#include "migration/savevm.h"
+#include "migration/global_state.h"
+#include "block/block.h"
 
 #define TYPE_VFU_OBJECT "vfio-user-server"
 OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
@@ -77,6 +81,35 @@ struct VfuObject {
     PCIDevice *pci_dev;
 
     int vfu_poll_fd;
+
+    /*
+     * vfu_mig_buf holds the migration data. In the remote server, this
+     * buffer replaces the role of an IO channel which links the source
+     * and the destination.
+     *
+     * Whenever the client QEMU process initiates migration, the remote
+     * server gets notified via libvfio-user callbacks. The remote server
+     * sets up a QEMUFile object using this buffer as backend. The remote
+     * server passes this object to its migration subsystem, which slurps
+     * the VMSD of the device ('devid' above) referenced by this object
+     * and stores the VMSD in this buffer.
+     *
+     * The client subsequetly asks the remote server for any data that
+     * needs to be moved over to the destination via libvfio-user
+     * library's vfu_migration_callbacks_t callbacks. The remote hands
+     * over this buffer as data at this time.
+     *
+     * A reverse of this process happens at the destination.
+     */
+    uint8_t *vfu_mig_buf;
+
+    uint64_t vfu_mig_buf_size;
+
+    uint64_t vfu_mig_buf_pending;
+
+    QEMUFile *vfu_mig_file;
+
+    vfu_migr_state_t vfu_state;
 };
 
 static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
@@ -110,6 +143,272 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
     trace_vfu_prop("device", str);
 }
 
+/**
+ * Migration helper functions
+ *
+ * vfu_mig_buf_read & vfu_mig_buf_write are used by QEMU's migration
+ * subsystem - qemu_remote_loadvm & qemu_remote_savevm. loadvm/savevm
+ * call these functions via QEMUFileOps to load/save the VMSD of a
+ * device into vfu_mig_buf
+ *
+ */
+static ssize_t vfu_mig_buf_read(void *opaque, uint8_t *buf, int64_t pos,
+                                size_t size, Error **errp)
+{
+    VfuObject *o = opaque;
+
+    if (pos > o->vfu_mig_buf_size) {
+        size = 0;
+    } else if ((pos + size) > o->vfu_mig_buf_size) {
+        size = o->vfu_mig_buf_size - pos;
+    }
+
+    memcpy(buf, (o->vfu_mig_buf + pos), size);
+
+    return size;
+}
+
+static ssize_t vfu_mig_buf_write(void *opaque, struct iovec *iov, int iovcnt,
+                                 int64_t pos, Error **errp)
+{
+    VfuObject *o = opaque;
+    uint64_t end = pos + iov_size(iov, iovcnt);
+    int i;
+
+    if (end > o->vfu_mig_buf_size) {
+        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+    }
+
+    for (i = 0; i < iovcnt; i++) {
+        memcpy((o->vfu_mig_buf + o->vfu_mig_buf_size), iov[i].iov_base,
+               iov[i].iov_len);
+        o->vfu_mig_buf_size += iov[i].iov_len;
+        o->vfu_mig_buf_pending += iov[i].iov_len;
+    }
+
+    return iov_size(iov, iovcnt);
+}
+
+static int vfu_mig_buf_shutdown(void *opaque, bool rd, bool wr, Error **errp)
+{
+    VfuObject *o = opaque;
+
+    o->vfu_mig_buf_size = 0;
+
+    g_free(o->vfu_mig_buf);
+
+    o->vfu_mig_buf = NULL;
+
+    o->vfu_mig_buf_pending = 0;
+
+    return 0;
+}
+
+static const QEMUFileOps vfu_mig_fops_save = {
+    .writev_buffer  = vfu_mig_buf_write,
+    .shut_down      = vfu_mig_buf_shutdown,
+};
+
+static const QEMUFileOps vfu_mig_fops_load = {
+    .get_buffer     = vfu_mig_buf_read,
+    .shut_down      = vfu_mig_buf_shutdown,
+};
+
+/**
+ * handlers for vfu_migration_callbacks_t
+ *
+ * The libvfio-user library accesses these handlers to drive the migration
+ * at the remote end, and also to transport the data stored in vfu_mig_buf
+ *
+ */
+static void vfu_mig_state_stop_and_copy(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    int ret;
+
+    if (!o->vfu_mig_file) {
+        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_save, false);
+    }
+
+    ret = qemu_remote_savevm(o->vfu_mig_file, DEVICE(o->pci_dev));
+    if (ret) {
+        qemu_file_shutdown(o->vfu_mig_file);
+        o->vfu_mig_file = NULL;
+        return;
+    }
+
+    qemu_fflush(o->vfu_mig_file);
+}
+
+static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
+    static int migrated_devs;
+    Error *local_err = NULL;
+    int ret;
+
+    /**
+     * TODO: move to VFU_MIGR_STATE_RESUME handler. Presently, the
+     * VMSD data from source is not available at RESUME state.
+     * Working on a fix for this.
+     */
+    if (!o->vfu_mig_file) {
+        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
+    }
+
+    ret = qemu_remote_loadvm(o->vfu_mig_file);
+    if (ret) {
+        error_setg(&error_abort, "vfu: failed to restore device state");
+        return;
+    }
+
+    qemu_file_shutdown(o->vfu_mig_file);
+    o->vfu_mig_file = NULL;
+
+    /* VFU_MIGR_STATE_RUNNING begins here */
+    if (++migrated_devs == k->nr_devs) {
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+            return;
+        }
+
+        vm_start();
+    }
+}
+
+static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
+    static int migrated_devs;
+
+    /**
+     * note: calling bdrv_inactivate_all() is not the best approach.
+     *
+     *  Ideally, we would identify the block devices (if any) indirectly
+     *  linked (such as via a scs-hd device) to each of the migrated devices,
+     *  and inactivate them individually. This is essential while operating
+     *  the server in a storage daemon mode, with devices from different VMs.
+     *
+     *  However, we currently don't have this capability. As such, we need to
+     *  inactivate all devices at the same time when migration is completed.
+     */
+    if (++migrated_devs == k->nr_devs) {
+        bdrv_inactivate_all();
+        vm_stop(RUN_STATE_PAUSED);
+    }
+}
+
+static int vfu_mig_transition(vfu_ctx_t *vfu_ctx, vfu_migr_state_t state)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    if (o->vfu_state == state) {
+        return 0;
+    }
+
+    switch (state) {
+    case VFU_MIGR_STATE_RESUME:
+        break;
+    case VFU_MIGR_STATE_STOP_AND_COPY:
+        vfu_mig_state_stop_and_copy(vfu_ctx);
+        break;
+    case VFU_MIGR_STATE_STOP:
+        vfu_mig_state_stop(vfu_ctx);
+        break;
+    case VFU_MIGR_STATE_PRE_COPY:
+        break;
+    case VFU_MIGR_STATE_RUNNING:
+        if (!runstate_is_running()) {
+            vfu_mig_state_running(vfu_ctx);
+        }
+        break;
+    default:
+        warn_report("vfu: Unknown migration state %d", state);
+    }
+
+    o->vfu_state = state;
+
+    return 0;
+}
+
+static uint64_t vfu_mig_get_pending_bytes(vfu_ctx_t *vfu_ctx)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    return o->vfu_mig_buf_pending;
+}
+
+static int vfu_mig_prepare_data(vfu_ctx_t *vfu_ctx, uint64_t *offset,
+                                uint64_t *size)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    if (offset) {
+        *offset = 0;
+    }
+
+    if (size) {
+        *size = o->vfu_mig_buf_size;
+    }
+
+    return 0;
+}
+
+static ssize_t vfu_mig_read_data(vfu_ctx_t *vfu_ctx, void *buf,
+                                 uint64_t size, uint64_t offset)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+
+    if (offset > o->vfu_mig_buf_size) {
+        return -1;
+    }
+
+    if ((offset + size) > o->vfu_mig_buf_size) {
+        warn_report("vfu: buffer overflow - check pending_bytes");
+        size = o->vfu_mig_buf_size - offset;
+    }
+
+    memcpy(buf, (o->vfu_mig_buf + offset), size);
+
+    o->vfu_mig_buf_pending -= size;
+
+    return size;
+}
+
+static ssize_t vfu_mig_write_data(vfu_ctx_t *vfu_ctx, void *data,
+                                  uint64_t size, uint64_t offset)
+{
+    VfuObject *o = vfu_get_private(vfu_ctx);
+    uint64_t end = offset + size;
+
+    if (end > o->vfu_mig_buf_size) {
+        o->vfu_mig_buf = g_realloc(o->vfu_mig_buf, end);
+        o->vfu_mig_buf_size = end;
+    }
+
+    memcpy((o->vfu_mig_buf + offset), data, size);
+
+    return size;
+}
+
+static int vfu_mig_data_written(vfu_ctx_t *vfu_ctx, uint64_t count)
+{
+    return 0;
+}
+
+static const vfu_migration_callbacks_t vfu_mig_cbs = {
+    .version = VFU_MIGR_CALLBACKS_VERS,
+    .transition = &vfu_mig_transition,
+    .get_pending_bytes = &vfu_mig_get_pending_bytes,
+    .prepare_data = &vfu_mig_prepare_data,
+    .read_data = &vfu_mig_read_data,
+    .data_written = &vfu_mig_data_written,
+    .write_data = &vfu_mig_write_data,
+};
+
 static void vfu_object_ctx_run(void *opaque)
 {
     VfuObject *o = opaque;
@@ -359,6 +658,7 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
     VfuObject *o = container_of(notifier, VfuObject, machine_done);
     DeviceState *dev = NULL;
     vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
+    size_t migr_area_size;
     int ret;
 
     o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path,
@@ -422,6 +722,35 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
         return;
     }
 
+    /*
+     * TODO: The 0x20000 number used below is a temporary. We are working on
+     *     a cleaner fix for this.
+     *
+     *     The libvfio-user library assumes that the remote knows the size of
+     *     the data to be migrated at boot time, but that is not the case with
+     *     VMSDs, as it can contain a variable-size buffer. 0x20000 is used
+     *     as a sufficiently large buffer to demonstrate migration, but that
+     *     cannot be used as a solution.
+     *
+     */
+    ret = vfu_setup_region(o->vfu_ctx, VFU_PCI_DEV_MIGR_REGION_IDX,
+                           0x20000, NULL,
+                           VFU_REGION_FLAG_RW, NULL, 0, -1, 0);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to register migration BAR %s- %s",
+                   o->device, strerror(errno));
+        return;
+    }
+
+    migr_area_size = vfu_get_migr_register_area_size();
+    ret = vfu_setup_device_migration_callbacks(o->vfu_ctx, &vfu_mig_cbs,
+                                               migr_area_size);
+    if (ret < 0) {
+        error_setg(&error_abort, "vfu: Failed to setup migration %s- %s",
+                   o->device, strerror(errno));
+        return;
+    }
+
     ret = vfu_realize_ctx(o->vfu_ctx);
     if (ret < 0) {
         error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
@@ -464,6 +793,16 @@ static void vfu_object_init(Object *obj)
     qemu_add_machine_init_done_notifier(&o->machine_done);
 
     o->vfu_poll_fd = -1;
+
+    o->vfu_mig_file = NULL;
+
+    o->vfu_mig_buf = NULL;
+
+    o->vfu_mig_buf_size = 0;
+
+    o->vfu_mig_buf_pending = 0;
+
+    o->vfu_state = VFU_MIGR_STATE_STOP;
 }
 
 static void vfu_object_finalize(Object *obj)
diff --git a/migration/savevm.c b/migration/savevm.c
index 7b7b64bd13..341fde73f8 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1604,6 +1604,49 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
     return ret;
 }
 
+static SaveStateEntry *find_se_from_dev(DeviceState *dev)
+{
+    SaveStateEntry *se;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->opaque == dev) {
+            return se;
+        }
+    }
+
+    return NULL;
+}
+
+int qemu_remote_savevm(QEMUFile *f, DeviceState *dev)
+{
+    SaveStateEntry *se;
+    int ret = 0;
+
+    se = find_se_from_dev(dev);
+    if (!se) {
+        return -ENODEV;
+    }
+
+    if (!se->vmsd || !vmstate_save_needed(se->vmsd, se->opaque)) {
+        return ret;
+    }
+
+    save_section_header(f, se, QEMU_VM_SECTION_FULL);
+
+    ret = vmstate_save(f, se, NULL);
+    if (ret) {
+        qemu_file_set_error(f, ret);
+        return ret;
+    }
+
+    save_section_footer(f, se);
+
+    qemu_put_byte(f, QEMU_VM_EOF);
+    qemu_fflush(f);
+
+    return 0;
+}
+
 void qemu_savevm_live_state(QEMUFile *f)
 {
     /* save QEMU_VM_SECTION_END section */
@@ -2444,6 +2487,36 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
     return 0;
 }
 
+int qemu_remote_loadvm(QEMUFile *f)
+{
+    uint8_t section_type;
+    int ret = 0;
+
+    while (true) {
+        section_type = qemu_get_byte(f);
+
+        ret = qemu_file_get_error(f);
+        if (ret) {
+            break;
+        }
+
+        switch (section_type) {
+        case QEMU_VM_SECTION_FULL:
+            ret = qemu_loadvm_section_start_full(f, NULL);
+            if (ret < 0) {
+                break;
+            }
+            break;
+        case QEMU_VM_EOF:
+            return ret;
+        default:
+            return -EINVAL;
+        }
+    }
+
+    return ret;
+}
+
 static int
 qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)
 {
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH v3 12/12] vfio-user: acceptance test
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
                   ` (10 preceding siblings ...)
  2021-10-11  5:31 ` [PATCH v3 11/12] vfio-user: register handlers to facilitate migration Jagannathan Raman
@ 2021-10-11  5:31 ` Jagannathan Raman
  2021-10-11 22:26   ` Philippe Mathieu-Daudé
  2021-10-27 16:42   ` Stefan Hajnoczi
  2021-10-27 18:33 ` [PATCH v3 00/12] vfio-user server in QEMU Stefan Hajnoczi
  12 siblings, 2 replies; 44+ messages in thread
From: Jagannathan Raman @ 2021-10-11  5:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, jag.raman, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

Acceptance test for libvfio-user in QEMU

Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
---
 MAINTAINERS                   |  1 +
 tests/acceptance/vfio-user.py | 96 +++++++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+)
 create mode 100644 tests/acceptance/vfio-user.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 79ff8331dc..a98d37423b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3422,6 +3422,7 @@ F: hw/remote/iohub.c
 F: include/hw/remote/iohub.h
 F: subprojects/libvfio-user
 F: hw/remote/vfio-user-obj.c
+F: tests/acceptance/vfio-user.py
 
 EBPF:
 M: Jason Wang <jasowang@redhat.com>
diff --git a/tests/acceptance/vfio-user.py b/tests/acceptance/vfio-user.py
new file mode 100644
index 0000000000..5eb5cabc68
--- /dev/null
+++ b/tests/acceptance/vfio-user.py
@@ -0,0 +1,96 @@
+# vfio-user protocol sanity test
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+
+
+import os
+import socket
+import uuid
+
+from avocado_qemu import Test
+from avocado_qemu import wait_for_console_pattern
+from avocado_qemu import exec_command
+from avocado_qemu import exec_command_and_wait_for_pattern
+
+class VfioUser(Test):
+    """
+    :avocado: tags=vfiouser
+    """
+    KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
+
+    def do_test(self, kernel_url, initrd_url, kernel_command_line,
+                machine_type):
+        """Main test method"""
+        self.require_accelerator('kvm')
+
+        kernel_path = self.fetch_asset(kernel_url)
+        initrd_path = self.fetch_asset(initrd_url)
+
+        socket = os.path.join('/tmp', str(uuid.uuid4()))
+        if os.path.exists(socket):
+            os.remove(socket)
+
+        # Create remote process
+        remote_vm = self.get_vm()
+        remote_vm.add_args('-machine', 'x-remote')
+        remote_vm.add_args('-nodefaults')
+        remote_vm.add_args('-device', 'lsi53c895a,id=lsi1')
+        remote_vm.add_args('-object', 'vfio-user,id=vfioobj1,'
+                           'devid=lsi1,socket='+socket)
+        remote_vm.launch()
+
+        # Create proxy process
+        self.vm.set_console()
+        self.vm.add_args('-machine', machine_type)
+        self.vm.add_args('-accel', 'kvm')
+        self.vm.add_args('-cpu', 'host')
+        self.vm.add_args('-object',
+                         'memory-backend-memfd,id=sysmem-file,size=2G')
+        self.vm.add_args('--numa', 'node,memdev=sysmem-file')
+        self.vm.add_args('-m', '2048')
+        self.vm.add_args('-kernel', kernel_path,
+                         '-initrd', initrd_path,
+                         '-append', kernel_command_line)
+        self.vm.add_args('-device',
+                         'vfio-user-pci,'
+                         'socket='+socket)
+        self.vm.launch()
+        wait_for_console_pattern(self, 'as init process',
+                                 'Kernel panic - not syncing')
+        exec_command(self, 'mount -t sysfs sysfs /sys')
+        exec_command_and_wait_for_pattern(self,
+                                          'cat /sys/bus/pci/devices/*/uevent',
+                                          'PCI_ID=1000:0012')
+
+    def test_multiprocess_x86_64(self):
+        """
+        :avocado: tags=arch:x86_64
+        :avocado: tags=distro:centos
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/x86_64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'console=ttyS0 rdinit=/bin/bash')
+        machine_type = 'pc'
+        self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
+
+    def test_multiprocess_aarch64(self):
+        """
+        :avocado: tags=arch:aarch64
+        :avocado: tags=distro:ubuntu
+        """
+        kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/aarch64/os/images'
+                      '/pxeboot/vmlinuz')
+        initrd_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
+                      '/linux/releases/31/Everything/aarch64/os/images'
+                      '/pxeboot/initrd.img')
+        kernel_command_line = (self.KERNEL_COMMON_COMMAND_LINE +
+                               'rdinit=/bin/bash console=ttyAMA0')
+        machine_type = 'virt,gic-version=3'
+        self.do_test(kernel_url, initrd_url, kernel_command_line, machine_type)
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 12/12] vfio-user: acceptance test
  2021-10-11  5:31 ` [PATCH v3 12/12] vfio-user: acceptance test Jagannathan Raman
@ 2021-10-11 22:26   ` Philippe Mathieu-Daudé
  2021-10-27 16:42   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: Philippe Mathieu-Daudé @ 2021-10-11 22:26 UTC (permalink / raw)
  To: Jagannathan Raman, qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, pbonzini, alex.bennee

On 10/11/21 07:31, Jagannathan Raman wrote:
> Acceptance test for libvfio-user in QEMU
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  MAINTAINERS                   |  1 +
>  tests/acceptance/vfio-user.py | 96 +++++++++++++++++++++++++++++++++++
>  2 files changed, 97 insertions(+)
>  create mode 100644 tests/acceptance/vfio-user.py

> +class VfioUser(Test):
> +    """
> +    :avocado: tags=vfiouser
> +    """
> +    KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
> +
> +    def do_test(self, kernel_url, initrd_url, kernel_command_line,
> +                machine_type):
> +        """Main test method"""
> +        self.require_accelerator('kvm')
> +
> +        kernel_path = self.fetch_asset(kernel_url)
> +        initrd_path = self.fetch_asset(initrd_url)
> +
> +        socket = os.path.join('/tmp', str(uuid.uuid4()))
> +        if os.path.exists(socket):
> +            os.remove(socket)
> +
> +        # Create remote process
> +        remote_vm = self.get_vm()
> +        remote_vm.add_args('-machine', 'x-remote')
> +        remote_vm.add_args('-nodefaults')
> +        remote_vm.add_args('-device', 'lsi53c895a,id=lsi1')
> +        remote_vm.add_args('-object', 'vfio-user,id=vfioobj1,'
> +                           'devid=lsi1,socket='+socket)

Nitpicking for style: spaces around '+' here,

> +        remote_vm.launch()
> +
> +        # Create proxy process
> +        self.vm.set_console()
> +        self.vm.add_args('-machine', machine_type)
> +        self.vm.add_args('-accel', 'kvm')
> +        self.vm.add_args('-cpu', 'host')
> +        self.vm.add_args('-object',
> +                         'memory-backend-memfd,id=sysmem-file,size=2G')
> +        self.vm.add_args('--numa', 'node,memdev=sysmem-file')
> +        self.vm.add_args('-m', '2048')
> +        self.vm.add_args('-kernel', kernel_path,
> +                         '-initrd', initrd_path,
> +                         '-append', kernel_command_line)
> +        self.vm.add_args('-device',
> +                         'vfio-user-pci,'
> +                         'socket='+socket)

and here. Otherwise:
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>

Thanks for adding this test :)



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 01/12] configure, meson: override C compiler for cmake
  2021-10-11  5:31 ` [PATCH v3 01/12] configure, meson: override C compiler for cmake Jagannathan Raman
@ 2021-10-12 10:44   ` Paolo Bonzini
  0 siblings, 0 replies; 44+ messages in thread
From: Paolo Bonzini @ 2021-10-12 10:44 UTC (permalink / raw)
  To: Jagannathan Raman, qemu-devel
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, alex.williamson, marcandre.lureau, stefanha,
	thanos.makatos, alex.bennee

On 11/10/21 07:31, Jagannathan Raman wrote:
> The compiler path that cmake gets from meson is corrupted. It results in
> the following error:
> | -- The C compiler identification is unknown
> | CMake Error at CMakeLists.txt:35 (project):
> | The CMAKE_C_COMPILER:
> | /opt/rh/devtoolset-9/root/bin/cc;-m64;-mcx16
> | is not a full path to an existing compiler tool.
> 
> Explicitly specify the C compiler for cmake to avoid this error
> 
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>   configure | 2 ++
>   1 file changed, 2 insertions(+)
> 
> diff --git a/configure b/configure
> index 877bf3d76a..e804dfba2f 100755
> --- a/configure
> +++ b/configure
> @@ -5112,6 +5112,8 @@ if test "$skip_meson" = no; then
>     echo "cpp_args = [${CXXFLAGS:+$(meson_quote $CXXFLAGS)}]" >> $cross
>     echo "c_link_args = [${LDFLAGS:+$(meson_quote $LDFLAGS)}]" >> $cross
>     echo "cpp_link_args = [${LDFLAGS:+$(meson_quote $LDFLAGS)}]" >> $cross
> +  echo "[cmake]" >> $cross
> +  echo "CMAKE_C_COMPILER = [$(meson_quote $cc $CPU_CFLAGS)]" >> $cross
>     echo "[binaries]" >> $cross
>     echo "c = [$(meson_quote $cc $CPU_CFLAGS)]" >> $cross
>     test -n "$cxx" && echo "cpp = [$(meson_quote $cxx $CPU_CFLAGS)]" >> $cross
> 

(This is fixed in Meson 0.60).

Acked-by: Paolo Bonzini <pbonzini@redhat.com>



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/12] vfio-user: build library
  2021-10-11  5:31 ` [PATCH v3 02/12] vfio-user: build library Jagannathan Raman
@ 2021-10-27 15:17   ` Stefan Hajnoczi
  2021-10-29 14:17     ` Jag Raman
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27 15:17 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, qemu-devel, alex.williamson,
	marcandre.lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 1562 bytes --]

On Mon, Oct 11, 2021 at 01:31:07AM -0400, Jagannathan Raman wrote:
> diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
> index 08c16e235f..f9e512d44a 100644
> --- a/hw/remote/Kconfig
> +++ b/hw/remote/Kconfig
> @@ -1,4 +1,9 @@
> +config VFIO_USER_SERVER
> +    bool
> +    default n

Does VFIO_USER_SERVER depend on MULTIPROCESS?

> diff --git a/tests/acceptance/multiprocess.py b/tests/acceptance/multiprocess.py
> index 96627f022a..7383c6eb58 100644
> --- a/tests/acceptance/multiprocess.py
> +++ b/tests/acceptance/multiprocess.py
> @@ -67,6 +67,7 @@ def do_test(self, kernel_url, initrd_url, kernel_command_line,
>      def test_multiprocess_x86_64(self):
>          """
>          :avocado: tags=arch:x86_64
> +        :avocado: tags=distro:centos
>          """
>          kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
>                        '/linux/releases/31/Everything/x86_64/os/images'
> @@ -82,6 +83,7 @@ def test_multiprocess_x86_64(self):
>      def test_multiprocess_aarch64(self):
>          """
>          :avocado: tags=arch:aarch64
> +        :avocado: tags=distro:ubuntu
>          """
>          kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
>                        '/linux/releases/31/Everything/aarch64/os/images'

Did you tag them with different distros in order to get coverage on both
CentOS and Ubuntu (even though that's orthogonal to x86_64 vs aarch64)?
Maybe a comment is necessary so it's clear why these tags are in place
because the test isn't actually limited to that distro.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-10-11  5:31 ` [PATCH v3 03/12] vfio-user: define vfio-user-server object Jagannathan Raman
@ 2021-10-27 15:40   ` Stefan Hajnoczi
  2021-10-29 14:42     ` Jag Raman
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27 15:40 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, qemu-devel, alex.williamson,
	marcandre.lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 6257 bytes --]

On Mon, Oct 11, 2021 at 01:31:08AM -0400, Jagannathan Raman wrote:
> diff --git a/qapi/qom.json b/qapi/qom.json
> index 0222bb4506..97de79cc36 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -705,6 +705,20 @@
>  { 'struct': 'RemoteObjectProperties',
>    'data': { 'fd': 'str', 'devid': 'str' } }
>  
> +##
> +# @VfioUserServerProperties:
> +#
> +# Properties for vfio-user-server objects.
> +#
> +# @socket: socket to be used by the libvfiouser library
> +#
> +# @device: the id of the device to be emulated at the server
> +#
> +# Since: 6.0

6.2

> +##
> +{ 'struct': 'VfioUserServerProperties',
> +  'data': { 'socket': 'SocketAddress', 'device': 'str' } }
> +
>  ##
>  # @RngProperties:
>  #
> @@ -830,7 +844,8 @@
>      'tls-creds-psk',
>      'tls-creds-x509',
>      'tls-cipher-suites',
> -    'x-remote-object'
> +    'x-remote-object',
> +    'vfio-user-server'

Should it have the experimental prefix ('x-') or is the QAPI interface
stable? I think some things to think about are whether a single process
can host multiple device servers, whether hotplug is possible, etc. If
the interface is stable then it must be able to accomodate future
features (at least ones we can anticipate right now :)).

>    ] }
>  
>  ##
> @@ -887,7 +902,8 @@
>        'tls-creds-psk':              'TlsCredsPskProperties',
>        'tls-creds-x509':             'TlsCredsX509Properties',
>        'tls-cipher-suites':          'TlsCredsProperties',
> -      'x-remote-object':            'RemoteObjectProperties'
> +      'x-remote-object':            'RemoteObjectProperties',
> +      'vfio-user-server':           'VfioUserServerProperties'
>    } }
>  
>  ##
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> new file mode 100644
> index 0000000000..c2a300f0ff
> --- /dev/null
> +++ b/hw/remote/vfio-user-obj.c
> @@ -0,0 +1,173 @@
> +/**
> + * QEMU vfio-user-server server object
> + *
> + * Copyright © 2021 Oracle and/or its affiliates.
> + *
> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
> + *
> + * See the COPYING file in the top-level directory.
> + *
> + */
> +
> +/**
> + * Usage: add options:
> + *     -machine x-remote
> + *     -device <PCI-device>,id=<pci-dev-id>
> + *     -object vfio-user-server,id=<id>,type=unix,path=<socket-path>,

I expected socket.type= and socket.path= based on the QAPI schema. Is
this command-line example correct?

> + *             device=<pci-dev-id>
> + *
> + * Note that vfio-user-server object must be used with x-remote machine only.
> + * This server could only support PCI devices for now.
> + *
> + * type - SocketAddress type - presently "unix" alone is supported. Required
> + *        option
> + *
> + * path - named unix socket, it will be created by the server. It is
> + *        a required option
> + *
> + * device - id of a device on the server, a required option. PCI devices
> + *          alone are supported presently.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include "qom/object.h"
> +#include "qom/object_interfaces.h"
> +#include "qemu/error-report.h"
> +#include "trace.h"
> +#include "sysemu/runstate.h"
> +#include "hw/boards.h"
> +#include "hw/remote/machine.h"
> +#include "qapi/error.h"
> +#include "qapi/qapi-visit-sockets.h"
> +
> +#define TYPE_VFU_OBJECT "vfio-user-server"
> +OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> +
> +struct VfuObjectClass {
> +    ObjectClass parent_class;
> +
> +    unsigned int nr_devs;
> +
> +    /* Maximum number of devices the server could support */
> +    unsigned int max_devs;
> +};
> +
> +struct VfuObject {
> +    /* private */
> +    Object parent;
> +
> +    SocketAddress *socket;
> +
> +    char *device;
> +};
> +
> +static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
> +                                  void *opaque, Error **errp)
> +{
> +    VfuObject *o = VFU_OBJECT(obj);
> +
> +    g_free(o->socket);

qapi_free_SocketAddress(o->socket)?

> +
> +    o->socket = NULL;
> +
> +    visit_type_SocketAddress(v, name, &o->socket, errp);
> +
> +    if (o->socket->type != SOCKET_ADDRESS_TYPE_UNIX) {
> +        error_setg(&error_abort, "vfu: Unsupported socket type - %s",
> +                   o->socket->u.q_unix.path);

o->socket must be freed and set it to NULL again, otherwise setting the
property appears to fail but the SocketAddress actually retains the
invalid value.

> +        return;
> +    }
> +
> +    trace_vfu_prop("socket", o->socket->u.q_unix.path);
> +}
> +
> +static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
> +{
> +    VfuObject *o = VFU_OBJECT(obj);
> +
> +    g_free(o->device);
> +
> +    o->device = g_strdup(str);
> +
> +    trace_vfu_prop("device", str);
> +}
> +
> +static void vfu_object_init(Object *obj)
> +{
> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
> +
> +    if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
> +        error_setg(&error_abort, "vfu: %s only compatible with %s machine",
> +                   TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
> +        return;
> +    }
> +
> +    if (k->nr_devs >= k->max_devs) {
> +        error_setg(&error_abort,
> +                   "Reached maximum number of vfio-user-server devices: %u",
> +                   k->max_devs);
> +        return;
> +    }
> +
> +    k->nr_devs++;
> +}
> +
> +static void vfu_object_finalize(Object *obj)
> +{
> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
> +    VfuObject *o = VFU_OBJECT(obj);
> +
> +    k->nr_devs--;
> +
> +    g_free(o->socket);

qapi_free_SocketAddress(o->socket)?

> +
> +    g_free(o->device);
> +
> +    if (k->nr_devs == 0) {
> +        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
> +    }

This won't work for all use cases. The user may wish to create/delete
vhost-user servers at runtime without terminating the process when there
are none left. An boolean option can be added in the future to control
this behavior, so it's okay to leave this as is.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 04/12] vfio-user: instantiate vfio-user context
  2021-10-11  5:31 ` [PATCH v3 04/12] vfio-user: instantiate vfio-user context Jagannathan Raman
@ 2021-10-27 15:59   ` Stefan Hajnoczi
  2021-10-29 14:59     ` Jag Raman
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27 15:59 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, qemu-devel, alex.williamson,
	marcandre.lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 2858 bytes --]

On Mon, Oct 11, 2021 at 01:31:09AM -0400, Jagannathan Raman wrote:
> @@ -94,9 +101,31 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
>      trace_vfu_prop("device", str);
>  }
>  
> +/*
> + * vfio-user-server depends on the availability of the 'socket' and 'device'
> + * properties. It also depends on devices instantiated in QEMU. These
> + * dependencies are not available during the instance_init phase of this
> + * object's life-cycle. As such, the server is initialized after the
> + * machine is setup. machine_init_done_notifier notifies vfio-user-server
> + * when the machine is setup, and the dependencies are available.
> + */
> +static void vfu_object_machine_done(Notifier *notifier, void *data)
> +{
> +    VfuObject *o = container_of(notifier, VfuObject, machine_done);

Was there a check for non-NULL o->socket before this? Maybe it's not
needed because QAPI treats 'socket' as a required field and refuses to
create the SocketAddress if it's missing?

> +
> +    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
> +                                o, VFU_DEV_TYPE_PCI);
> +    if (o->vfu_ctx == NULL) {
> +        error_setg(&error_abort, "vfu: Failed to create context - %s",
> +                   strerror(errno));

The error reporting needs to be synchronous so that hotplugging with
object-add fails instead of succeeding and leaving a failed object.

In the startup case (not hotplug) it's okay to abort.

> +        return;
> +    }
> +}
> +
>  static void vfu_object_init(Object *obj)
>  {
>      VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
> +    VfuObject *o = VFU_OBJECT(obj);
>  
>      if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
>          error_setg(&error_abort, "vfu: %s only compatible with %s machine",
> @@ -111,7 +140,12 @@ static void vfu_object_init(Object *obj)
>          return;
>      }
>  
> +    o->vfu_ctx = NULL;

The object's fields are initialized to 0 so this isn't necessary.

> +
>      k->nr_devs++;
> +
> +    o->machine_done.notify = vfu_object_machine_done;
> +    qemu_add_machine_init_done_notifier(&o->machine_done);

The notifier is invoked immediately if the machine has already been
initialized. That means vfu_object_machine_done() is called before the
properties ('socket' and 'device') have been set when object-add hotplug
is used. I think this needs to be moved elsewhere.
>  }
>  
>  static void vfu_object_finalize(Object *obj)
> @@ -123,6 +157,10 @@ static void vfu_object_finalize(Object *obj)
>  
>      g_free(o->socket);
>  
> +    if (o->vfu_ctx) {
> +        vfu_destroy_ctx(o->vfu_ctx);
> +    }
> +
>      g_free(o->device);
>  
>      if (k->nr_devs == 0) {

Missing qemu_remove_machine_init_done_notifier().

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/12] vfio-user: find and init PCI device
  2021-10-11  5:31 ` [PATCH v3 05/12] vfio-user: find and init PCI device Jagannathan Raman
@ 2021-10-27 16:05   ` Stefan Hajnoczi
  2021-10-29 15:58     ` Jag Raman
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27 16:05 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, qemu-devel, alex.williamson,
	marcandre.lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 2975 bytes --]

On Mon, Oct 11, 2021 at 01:31:10AM -0400, Jagannathan Raman wrote:
> Find the PCI device with specified id. Initialize the device context
> with the QEMU PCI device
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 32 ++++++++++++++++++++++++++++++++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index d26e5ec9e9..7ce4e5b256 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -44,6 +44,8 @@
>  #include "qemu/notify.h"
>  #include "sysemu/sysemu.h"
>  #include "libvfio-user.h"
> +#include "hw/qdev-core.h"
> +#include "hw/pci/pci.h"
>  
>  #define TYPE_VFU_OBJECT "vfio-user-server"
>  OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> @@ -68,6 +70,8 @@ struct VfuObject {
>      Notifier machine_done;
>  
>      vfu_ctx_t *vfu_ctx;
> +
> +    PCIDevice *pci_dev;
>  };
>  
>  static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
> @@ -112,6 +116,9 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
>  static void vfu_object_machine_done(Notifier *notifier, void *data)
>  {
>      VfuObject *o = container_of(notifier, VfuObject, machine_done);
> +    DeviceState *dev = NULL;
> +    vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
> +    int ret;
>  
>      o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
>                                  o, VFU_DEV_TYPE_PCI);
> @@ -120,6 +127,31 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>                     strerror(errno));
>          return;
>      }
> +
> +    dev = qdev_find_recursive(sysbus_get_default(), o->device);
> +    if (dev == NULL) {
> +        error_setg(&error_abort, "vfu: Device %s not found", o->device);
> +        return;
> +    }
> +
> +    if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> +        error_setg(&error_abort, "vfu: %s not a PCI device", o->device);
> +        return;
> +    }
> +
> +    o->pci_dev = PCI_DEVICE(dev);
> +
> +    if (pci_is_express(o->pci_dev)) {
> +        pci_type = VFU_PCI_TYPE_EXPRESS;
> +    }
> +
> +    ret = vfu_pci_init(o->vfu_ctx, pci_type, PCI_HEADER_TYPE_NORMAL, 0);
> +    if (ret < 0) {
> +        error_setg(&error_abort,
> +                   "vfu: Failed to attach PCI device %s to context - %s",
> +                   o->device, strerror(errno));
> +        return;
> +    }

It's unclear what happens when one of these error code paths is taken.
o->vfu_ctx and o->pci_dev might both be initialized, so how does the
code know not to service the vfio-user connection? It would be easy to
tell that this is correct if o->pci_dev and o->vfu_ctx were destroyed
when an error occurs.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 06/12] vfio-user: run vfio-user context
  2021-10-11  5:31 ` [PATCH v3 06/12] vfio-user: run vfio-user context Jagannathan Raman
@ 2021-10-27 16:21   ` Stefan Hajnoczi
  2021-10-28 21:55     ` John Levon
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27 16:21 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, qemu-devel, alex.williamson,
	marcandre.lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 5630 bytes --]

On Mon, Oct 11, 2021 at 01:31:11AM -0400, Jagannathan Raman wrote:
> Setup a handler to run vfio-user context. The context is driven by
> messages to the file descriptor associated with it - get the fd for
> the context and hook up the handler with it
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 75 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 74 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> index 7ce4e5b256..05f7fff19c 100644
> --- a/hw/remote/vfio-user-obj.c
> +++ b/hw/remote/vfio-user-obj.c
> @@ -42,6 +42,7 @@
>  #include "qapi/error.h"
>  #include "qapi/qapi-visit-sockets.h"
>  #include "qemu/notify.h"
> +#include "qemu/thread.h"
>  #include "sysemu/sysemu.h"
>  #include "libvfio-user.h"
>  #include "hw/qdev-core.h"
> @@ -72,6 +73,8 @@ struct VfuObject {
>      vfu_ctx_t *vfu_ctx;
>  
>      PCIDevice *pci_dev;
> +
> +    int vfu_poll_fd;
>  };
>  
>  static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
> @@ -105,6 +108,58 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
>      trace_vfu_prop("device", str);
>  }
>  
> +static void vfu_object_ctx_run(void *opaque)
> +{
> +    VfuObject *o = opaque;
> +    int ret = -1;
> +
> +    while (ret != 0) {
> +        ret = vfu_run_ctx(o->vfu_ctx);
> +        if (ret < 0) {
> +            if (errno == EINTR) {
> +                continue;
> +            } else if (errno == ENOTCONN) {
> +                qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> +                o->vfu_poll_fd = -1;
> +                object_unparent(OBJECT(o));
> +                break;
> +            } else {
> +                error_setg(&error_abort, "vfu: Failed to run device %s - %s",
> +                           o->device, strerror(errno));
> +                 break;
> +            }

The libvfio-user.h doc comments say this function can return -1 (EAGAIN)
when LIBVFIO_USER_FLAG_ATTACH_NB was used. Is the doc comment wrong
since this patch seems to rely on vfu_run_ctx() returning 0 when there
are no more commands to process?

> +        }
> +    }
> +}
> +
> +static void vfu_object_attach_ctx(void *opaque)
> +{
> +    VfuObject *o = opaque;
> +    int ret;
> +
> +    qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> +    o->vfu_poll_fd = -1;
> +
> +retry_attach:
> +    ret = vfu_attach_ctx(o->vfu_ctx);
> +    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
> +        goto retry_attach;

Can we wait for the poll fd to become readable instead of spinning? I
don't know the libvfio-user APIs so I'm not sure, but it would be nice
to avoid a busy loop.

> +    } else if (ret < 0) {
> +        error_setg(&error_abort,

error_abort is not appropriate for hotplugged objects, where it's less
likely that the user wants to terminate the process when a failure
occurs. If asynchronous errors occur then QMP Events should be raised so
the QMP client gets notified and can deal with them.

> +                   "vfu: Failed to attach device %s to context - %s",
> +                   o->device, strerror(errno));
> +        return;
> +    }
> +
> +    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
> +    if (o->vfu_poll_fd < 0) {
> +        error_setg(&error_abort, "vfu: Failed to get poll fd %s", o->device);
> +        return;
> +    }
> +
> +    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_ctx_run, NULL, o);
> +}
> +
>  /*
>   * vfio-user-server depends on the availability of the 'socket' and 'device'
>   * properties. It also depends on devices instantiated in QEMU. These
> @@ -120,7 +175,8 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>      vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
>      int ret;
>  
> -    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
> +    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path,
> +                                LIBVFIO_USER_FLAG_ATTACH_NB,
>                                  o, VFU_DEV_TYPE_PCI);
>      if (o->vfu_ctx == NULL) {
>          error_setg(&error_abort, "vfu: Failed to create context - %s",
> @@ -152,6 +208,21 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>                     o->device, strerror(errno));
>          return;
>      }
> +
> +    ret = vfu_realize_ctx(o->vfu_ctx);
> +    if (ret < 0) {
> +        error_setg(&error_abort, "vfu: Failed to realize device %s- %s",
> +                   o->device, strerror(errno));
> +        return;
> +    }
> +
> +    o->vfu_poll_fd = vfu_get_poll_fd(o->vfu_ctx);
> +    if (o->vfu_poll_fd < 0) {
> +        error_setg(&error_abort, "vfu: Failed to get poll fd %s", o->device);
> +        return;
> +    }
> +
> +    qemu_set_fd_handler(o->vfu_poll_fd, vfu_object_attach_ctx, NULL, o);
>  }
>  
>  static void vfu_object_init(Object *obj)
> @@ -178,6 +249,8 @@ static void vfu_object_init(Object *obj)
>  
>      o->machine_done.notify = vfu_object_machine_done;
>      qemu_add_machine_init_done_notifier(&o->machine_done);
> +
> +    o->vfu_poll_fd = -1;

This must be done before the qemu_add_machine_init_done_notifier() call
in the hotplug case. qemu_add_machine_init_done_notifier() invokes the
notifier callback immediately if machine init was already done.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 07/12] vfio-user: handle PCI config space accesses
  2021-10-11  5:31 ` [PATCH v3 07/12] vfio-user: handle PCI config space accesses Jagannathan Raman
@ 2021-10-27 16:35   ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27 16:35 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, qemu-devel, alex.williamson,
	marcandre.lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 1256 bytes --]

On Mon, Oct 11, 2021 at 01:31:12AM -0400, Jagannathan Raman wrote:
> +static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
> +                                     size_t count, loff_t offset,
> +                                     const bool is_write)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    uint32_t pci_access_width = sizeof(uint32_t);
> +    size_t bytes = count;
> +    uint32_t val = 0;
> +    char *ptr = buf;
> +    int len;
> +
> +    while (bytes > 0) {
> +        len = (bytes > pci_access_width) ? pci_access_width : bytes;
> +        if (is_write) {
> +            memcpy(&val, ptr, len);
> +            pci_host_config_write_common(o->pci_dev, offset,
> +                                         pci_config_size(o->pci_dev),
> +                                         val, len);
> +            trace_vfu_cfg_write(offset, val);
> +        } else {
> +            val = pci_host_config_read_common(o->pci_dev, offset,
> +                                              pci_config_size(o->pci_dev), len);
> +            memcpy(ptr, &val, len);
> +            trace_vfu_cfg_read(offset, val);
> +        }

pci_host_config_read/write_common() work with host-endian values. What
is the endianness of buf?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 09/12] vfio-user: handle PCI BAR accesses
  2021-10-11  5:31 ` [PATCH v3 09/12] vfio-user: handle PCI BAR accesses Jagannathan Raman
@ 2021-10-27 16:38   ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27 16:38 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, qemu-devel, alex.williamson,
	marcandre.lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 571 bytes --]

On Mon, Oct 11, 2021 at 01:31:14AM -0400, Jagannathan Raman wrote:
> Determine the BARs used by the PCI device and register handlers to
> manage the access to the same.
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  hw/remote/vfio-user-obj.c | 90 +++++++++++++++++++++++++++++++++++++++
>  hw/remote/trace-events    |  3 ++
>  2 files changed, 93 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 12/12] vfio-user: acceptance test
  2021-10-11  5:31 ` [PATCH v3 12/12] vfio-user: acceptance test Jagannathan Raman
  2021-10-11 22:26   ` Philippe Mathieu-Daudé
@ 2021-10-27 16:42   ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27 16:42 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, qemu-devel, alex.williamson,
	marcandre.lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 2589 bytes --]

On Mon, Oct 11, 2021 at 01:31:17AM -0400, Jagannathan Raman wrote:
> Acceptance test for libvfio-user in QEMU
> 
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> ---
>  MAINTAINERS                   |  1 +
>  tests/acceptance/vfio-user.py | 96 +++++++++++++++++++++++++++++++++++
>  2 files changed, 97 insertions(+)
>  create mode 100644 tests/acceptance/vfio-user.py
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 79ff8331dc..a98d37423b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3422,6 +3422,7 @@ F: hw/remote/iohub.c
>  F: include/hw/remote/iohub.h
>  F: subprojects/libvfio-user
>  F: hw/remote/vfio-user-obj.c
> +F: tests/acceptance/vfio-user.py
>  
>  EBPF:
>  M: Jason Wang <jasowang@redhat.com>
> diff --git a/tests/acceptance/vfio-user.py b/tests/acceptance/vfio-user.py
> new file mode 100644
> index 0000000000..5eb5cabc68
> --- /dev/null
> +++ b/tests/acceptance/vfio-user.py
> @@ -0,0 +1,96 @@
> +# vfio-user protocol sanity test
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or
> +# later.  See the COPYING file in the top-level directory.
> +
> +
> +import os
> +import socket
> +import uuid
> +
> +from avocado_qemu import Test
> +from avocado_qemu import wait_for_console_pattern
> +from avocado_qemu import exec_command
> +from avocado_qemu import exec_command_and_wait_for_pattern
> +
> +class VfioUser(Test):
> +    """
> +    :avocado: tags=vfiouser
> +    """
> +    KERNEL_COMMON_COMMAND_LINE = 'printk.time=0 '
> +
> +    def do_test(self, kernel_url, initrd_url, kernel_command_line,
> +                machine_type):
> +        """Main test method"""
> +        self.require_accelerator('kvm')
> +
> +        kernel_path = self.fetch_asset(kernel_url)
> +        initrd_path = self.fetch_asset(initrd_url)
> +
> +        socket = os.path.join('/tmp', str(uuid.uuid4()))
> +        if os.path.exists(socket):
> +            os.remove(socket)
> +
> +        # Create remote process
> +        remote_vm = self.get_vm()
> +        remote_vm.add_args('-machine', 'x-remote')
> +        remote_vm.add_args('-nodefaults')
> +        remote_vm.add_args('-device', 'lsi53c895a,id=lsi1')
> +        remote_vm.add_args('-object', 'vfio-user,id=vfioobj1,'
> +                           'devid=lsi1,socket='+socket)

The object is called "vfio-user-server" and the parameter syntax seems
to be outdated. Does this test pass?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 11/12] vfio-user: register handlers to facilitate migration
  2021-10-11  5:31 ` [PATCH v3 11/12] vfio-user: register handlers to facilitate migration Jagannathan Raman
@ 2021-10-27 18:30   ` Stefan Hajnoczi
  2021-12-15 15:49     ` Jag Raman
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27 18:30 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, qemu-devel, alex.williamson,
	marcandre.lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 3667 bytes --]

On Mon, Oct 11, 2021 at 01:31:16AM -0400, Jagannathan Raman wrote:
> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
> +    static int migrated_devs;
> +    Error *local_err = NULL;
> +    int ret;
> +
> +    /**
> +     * TODO: move to VFU_MIGR_STATE_RESUME handler. Presently, the
> +     * VMSD data from source is not available at RESUME state.
> +     * Working on a fix for this.
> +     */
> +    if (!o->vfu_mig_file) {
> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
> +    }
> +
> +    ret = qemu_remote_loadvm(o->vfu_mig_file);
> +    if (ret) {
> +        error_setg(&error_abort, "vfu: failed to restore device state");
> +        return;
> +    }
> +
> +    qemu_file_shutdown(o->vfu_mig_file);
> +    o->vfu_mig_file = NULL;
> +
> +    /* VFU_MIGR_STATE_RUNNING begins here */
> +    if (++migrated_devs == k->nr_devs) {

See below about migrated_devs.

> +        bdrv_invalidate_cache_all(&local_err);
> +        if (local_err) {
> +            error_report_err(local_err);
> +            return;
> +        }
> +
> +        vm_start();
> +    }
> +}
> +
> +static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
> +{
> +    VfuObject *o = vfu_get_private(vfu_ctx);
> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
> +    static int migrated_devs;
> +
> +    /**
> +     * note: calling bdrv_inactivate_all() is not the best approach.
> +     *
> +     *  Ideally, we would identify the block devices (if any) indirectly
> +     *  linked (such as via a scs-hd device) to each of the migrated devices,

s/scs/scsi/

> +     *  and inactivate them individually. This is essential while operating
> +     *  the server in a storage daemon mode, with devices from different VMs.
> +     *
> +     *  However, we currently don't have this capability. As such, we need to
> +     *  inactivate all devices at the same time when migration is completed.
> +     */
> +    if (++migrated_devs == k->nr_devs) {
> +        bdrv_inactivate_all();
> +        vm_stop(RUN_STATE_PAUSED);

The order of these two functions is reversed in migration/migration.c.
First we pause the VM, then we inactivate disks.

I think we need to zero migrated_devs in case migration fails and we try
to migrate again later:

  migrated_devs = 0;

This is still not quite right because maybe only a few VfuObjects are
stopped before migration fails. A different approach for counting
devices is necessary, like zeroing migrated_devs in
vfu_mig_state_stop_and_copy().

> @@ -422,6 +722,35 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>          return;
>      }
>  
> +    /*
> +     * TODO: The 0x20000 number used below is a temporary. We are working on
> +     *     a cleaner fix for this.
> +     *
> +     *     The libvfio-user library assumes that the remote knows the size of
> +     *     the data to be migrated at boot time, but that is not the case with
> +     *     VMSDs, as it can contain a variable-size buffer. 0x20000 is used
> +     *     as a sufficiently large buffer to demonstrate migration, but that
> +     *     cannot be used as a solution.
> +     *
> +     */

My question from the previous revision was not answered:

  libvfio-user has the vfu_migration_callbacks_t interface that allows the
  device to save/load more data regardless of the size of the migration
  region. I don't see the issue here since the region doesn't need to be
  sized to fit the savevm data?

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 00/12] vfio-user server in QEMU
  2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
                   ` (11 preceding siblings ...)
  2021-10-11  5:31 ` [PATCH v3 12/12] vfio-user: acceptance test Jagannathan Raman
@ 2021-10-27 18:33 ` Stefan Hajnoczi
  12 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-10-27 18:33 UTC (permalink / raw)
  To: Jagannathan Raman
  Cc: elena.ufimtseva, john.g.johnson, thuth, swapnil.ingle,
	john.levon, philmd, qemu-devel, alex.williamson,
	marcandre.lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 763 bytes --]

On Mon, Oct 11, 2021 at 01:31:05AM -0400, Jagannathan Raman wrote:
> We have addressed most of comments from v2.
> 
> We are working on MSI-x support (used by PCIe devices such as “nvme”)
> and a couple of comments in the migration patches. We hope these two
> items will be in the next revision of the patches.

Please add an object-add vfio-user-server and object-del
vfio-user-server test case. The code is currently written in a way that
is incompatible with hotplug. Even if you don't need hotplug right away
it's worth implementing it immediately to avoid baking in assumptions
that will be hard to fix later.

Please try running the tests on a big-endian host, if possible. That may
help shake out remaining endianness issues.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 06/12] vfio-user: run vfio-user context
  2021-10-27 16:21   ` Stefan Hajnoczi
@ 2021-10-28 21:55     ` John Levon
  0 siblings, 0 replies; 44+ messages in thread
From: John Levon @ 2021-10-28 21:55 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: elena.ufimtseva, john.g.johnson, thuth, Jagannathan Raman,
	Swapnil Ingle, alex.bennee, qemu-devel, alex.williamson,
	marcandre.lureau, Thanos Makatos, pbonzini, philmd

On Wed, Oct 27, 2021 at 05:21:30PM +0100, Stefan Hajnoczi wrote:

> > +static void vfu_object_ctx_run(void *opaque)
> > +{
> > +    VfuObject *o = opaque;
> > +    int ret = -1;
> > +
> > +    while (ret != 0) {
> > +        ret = vfu_run_ctx(o->vfu_ctx);
> > +        if (ret < 0) {
> > +            if (errno == EINTR) {
> > +                continue;
> > +            } else if (errno == ENOTCONN) {
> > +                qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> > +                o->vfu_poll_fd = -1;
> > +                object_unparent(OBJECT(o));
> > +                break;
> > +            } else {
> > +                error_setg(&error_abort, "vfu: Failed to run device %s - %s",
> > +                           o->device, strerror(errno));
> > +                 break;
> > +            }
> 
> The libvfio-user.h doc comments say this function can return -1 (EAGAIN)
> when LIBVFIO_USER_FLAG_ATTACH_NB was used. Is the doc comment wrong
> since this patch seems to rely on vfu_run_ctx() returning 0 when there
> are no more commands to process?

 * @returns the number of requests processed (0 or more); or -1 on error,        
 *          with errno set as follows:

So in fact it does return 0. The EAGAIN line needs removing; I'll fix it
shortly.

> > +static void vfu_object_attach_ctx(void *opaque)
> > +{
> > +    VfuObject *o = opaque;
> > +    int ret;
> > +
> > +    qemu_set_fd_handler(o->vfu_poll_fd, NULL, NULL, NULL);
> > +    o->vfu_poll_fd = -1;
> > +
> > +retry_attach:
> > +    ret = vfu_attach_ctx(o->vfu_ctx);
> > +    if (ret < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
> > +        goto retry_attach;
> 
> Can we wait for the poll fd to become readable instead of spinning? I
> don't know the libvfio-user APIs so I'm not sure, but it would be nice
> to avoid a busy loop.

At this point, ->vfu_poll_fd is a listening socket, no reason why qemu couldn't
epoll on it.

regards
john

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/12] vfio-user: build library
  2021-10-27 15:17   ` Stefan Hajnoczi
@ 2021-10-29 14:17     ` Jag Raman
  2021-11-01  9:56       ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Jag Raman @ 2021-10-29 14:17 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, John Johnson, thuth, swapnil.ingle, john.levon,
	philmd, qemu-devel, Alex Williamson, Marc-André Lureau,
	thanos.makatos, pbonzini, alex.bennee

Hi Stefan,

> On Oct 27, 2021, at 11:17 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Mon, Oct 11, 2021 at 01:31:07AM -0400, Jagannathan Raman wrote:
>> diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
>> index 08c16e235f..f9e512d44a 100644
>> --- a/hw/remote/Kconfig
>> +++ b/hw/remote/Kconfig
>> @@ -1,4 +1,9 @@
>> +config VFIO_USER_SERVER
>> +    bool
>> +    default n
> 
> Does VFIO_USER_SERVER depend on MULTIPROCESS?

Yes, VFIO_USER_SERVER presently depends on MULTIPROCESS.

This is because it needs some object and functions implemented by multiprocess
such as TYPE_REMOTE_MACHINE and TYPE_REMOTE_PCIHOST.

> 
>> diff --git a/tests/acceptance/multiprocess.py b/tests/acceptance/multiprocess.py
>> index 96627f022a..7383c6eb58 100644
>> --- a/tests/acceptance/multiprocess.py
>> +++ b/tests/acceptance/multiprocess.py
>> @@ -67,6 +67,7 @@ def do_test(self, kernel_url, initrd_url, kernel_command_line,
>>     def test_multiprocess_x86_64(self):
>>         """
>>         :avocado: tags=arch:x86_64
>> +        :avocado: tags=distro:centos
>>         """
>>         kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
>>                       '/linux/releases/31/Everything/x86_64/os/images'
>> @@ -82,6 +83,7 @@ def test_multiprocess_x86_64(self):
>>     def test_multiprocess_aarch64(self):
>>         """
>>         :avocado: tags=arch:aarch64
>> +        :avocado: tags=distro:ubuntu
>>         """
>>         kernel_url = ('https://archives.fedoraproject.org/pub/archive/fedora'
>>                       '/linux/releases/31/Everything/aarch64/os/images'
> 
> Did you tag them with different distros in order to get coverage on both
> CentOS and Ubuntu (even though that's orthogonal to x86_64 vs aarch64)?
> Maybe a comment is necessary so it's clear why these tags are in place
> because the test isn't actually limited to that distro.

OK, I’ll add a comment to explain this.

For background, we disabled multiprocess by default in this series. This is
because, not all docker images have the json-c package available to build
libvfio-user library. So we had to enable it on selected docker images where
that package would be available. As such, we also had to limit the multiprocess
acceptance tests to the distros which had multiprocess enabled.

Going by “.gitlab-ci.d/buildtest.yml” file, the distros that support x86_64 target
are centos (build-system-centos) and opensuse (build-system-opensuse). So
we picked centos for x86_64, as running this test on other builds could
cause a failure. Likewise, ubuntu supported aarch64.

--
Jag


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-10-27 15:40   ` Stefan Hajnoczi
@ 2021-10-29 14:42     ` Jag Raman
  2021-11-01 10:34       ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Jag Raman @ 2021-10-29 14:42 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, John Johnson, thuth, swapnil.ingle, john.levon,
	philmd, qemu-devel, Alex Williamson, Marc-André Lureau,
	thanos.makatos, pbonzini, alex.bennee



> On Oct 27, 2021, at 11:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Mon, Oct 11, 2021 at 01:31:08AM -0400, Jagannathan Raman wrote:
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index 0222bb4506..97de79cc36 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -705,6 +705,20 @@
>> { 'struct': 'RemoteObjectProperties',
>>   'data': { 'fd': 'str', 'devid': 'str' } }
>> 
>> +##
>> +# @VfioUserServerProperties:
>> +#
>> +# Properties for vfio-user-server objects.
>> +#
>> +# @socket: socket to be used by the libvfiouser library
>> +#
>> +# @device: the id of the device to be emulated at the server
>> +#
>> +# Since: 6.0
> 
> 6.2
> 
>> +##
>> +{ 'struct': 'VfioUserServerProperties',
>> +  'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>> +
>> ##
>> # @RngProperties:
>> #
>> @@ -830,7 +844,8 @@
>>     'tls-creds-psk',
>>     'tls-creds-x509',
>>     'tls-cipher-suites',
>> -    'x-remote-object'
>> +    'x-remote-object',
>> +    'vfio-user-server'
> 
> Should it have the experimental prefix ('x-') or is the QAPI interface
> stable? I think some things to think about are whether a single process
> can host multiple device servers, whether hotplug is possible, etc. If
> the interface is stable then it must be able to accomodate future
> features (at least ones we can anticipate right now :)).

We did test out hotplug support.

We’ll get back to you on the multiple device servers scenario.

> 
>>   ] }
>> 
>> ##
>> @@ -887,7 +902,8 @@
>>       'tls-creds-psk':              'TlsCredsPskProperties',
>>       'tls-creds-x509':             'TlsCredsX509Properties',
>>       'tls-cipher-suites':          'TlsCredsProperties',
>> -      'x-remote-object':            'RemoteObjectProperties'
>> +      'x-remote-object':            'RemoteObjectProperties',
>> +      'vfio-user-server':           'VfioUserServerProperties'
>>   } }
>> 
>> ##
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> new file mode 100644
>> index 0000000000..c2a300f0ff
>> --- /dev/null
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -0,0 +1,173 @@
>> +/**
>> + * QEMU vfio-user-server server object
>> + *
>> + * Copyright © 2021 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
>> + *
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +/**
>> + * Usage: add options:
>> + *     -machine x-remote
>> + *     -device <PCI-device>,id=<pci-dev-id>
>> + *     -object vfio-user-server,id=<id>,type=unix,path=<socket-path>,
> 
> I expected socket.type= and socket.path= based on the QAPI schema. Is
> this command-line example correct?

When I tried the “socket.path” approach, QEMU was not able to parse the
arguments. So I had to break it down to a series of individual members.

If “socket.path” is the expected way, I’ll see why the parser is not working
as expected. 

> 
>> + *             device=<pci-dev-id>
>> + *
>> + * Note that vfio-user-server object must be used with x-remote machine only.
>> + * This server could only support PCI devices for now.
>> + *
>> + * type - SocketAddress type - presently "unix" alone is supported. Required
>> + *        option
>> + *
>> + * path - named unix socket, it will be created by the server. It is
>> + *        a required option
>> + *
>> + * device - id of a device on the server, a required option. PCI devices
>> + *          alone are supported presently.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "qemu-common.h"
>> +
>> +#include "qom/object.h"
>> +#include "qom/object_interfaces.h"
>> +#include "qemu/error-report.h"
>> +#include "trace.h"
>> +#include "sysemu/runstate.h"
>> +#include "hw/boards.h"
>> +#include "hw/remote/machine.h"
>> +#include "qapi/error.h"
>> +#include "qapi/qapi-visit-sockets.h"
>> +
>> +#define TYPE_VFU_OBJECT "vfio-user-server"
>> +OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
>> +
>> +struct VfuObjectClass {
>> +    ObjectClass parent_class;
>> +
>> +    unsigned int nr_devs;
>> +
>> +    /* Maximum number of devices the server could support */
>> +    unsigned int max_devs;
>> +};
>> +
>> +struct VfuObject {
>> +    /* private */
>> +    Object parent;
>> +
>> +    SocketAddress *socket;
>> +
>> +    char *device;
>> +};
>> +
>> +static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
>> +                                  void *opaque, Error **errp)
>> +{
>> +    VfuObject *o = VFU_OBJECT(obj);
>> +
>> +    g_free(o->socket);
> 
> qapi_free_SocketAddress(o->socket)?

OK, will use that.

Didn’t know the visitors also define a function for free’ing. Thank you!

> 
>> +
>> +    o->socket = NULL;
>> +
>> +    visit_type_SocketAddress(v, name, &o->socket, errp);
>> +
>> +    if (o->socket->type != SOCKET_ADDRESS_TYPE_UNIX) {
>> +        error_setg(&error_abort, "vfu: Unsupported socket type - %s",
>> +                   o->socket->u.q_unix.path);
> 
> o->socket must be freed and set it to NULL again, otherwise setting the
> property appears to fail but the SocketAddress actually retains the
> invalid value.

OK, got it.

> 
>> +        return;
>> +    }
>> +
>> +    trace_vfu_prop("socket", o->socket->u.q_unix.path);
>> +}
>> +
>> +static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
>> +{
>> +    VfuObject *o = VFU_OBJECT(obj);
>> +
>> +    g_free(o->device);
>> +
>> +    o->device = g_strdup(str);
>> +
>> +    trace_vfu_prop("device", str);
>> +}
>> +
>> +static void vfu_object_init(Object *obj)
>> +{
>> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
>> +
>> +    if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
>> +        error_setg(&error_abort, "vfu: %s only compatible with %s machine",
>> +                   TYPE_VFU_OBJECT, TYPE_REMOTE_MACHINE);
>> +        return;
>> +    }
>> +
>> +    if (k->nr_devs >= k->max_devs) {
>> +        error_setg(&error_abort,
>> +                   "Reached maximum number of vfio-user-server devices: %u",
>> +                   k->max_devs);
>> +        return;
>> +    }
>> +
>> +    k->nr_devs++;
>> +}
>> +
>> +static void vfu_object_finalize(Object *obj)
>> +{
>> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
>> +    VfuObject *o = VFU_OBJECT(obj);
>> +
>> +    k->nr_devs--;
>> +
>> +    g_free(o->socket);
> 
> qapi_free_SocketAddress(o->socket)?
> 
>> +
>> +    g_free(o->device);
>> +
>> +    if (k->nr_devs == 0) {
>> +        qemu_system_shutdown_request(SHUTDOWN_CAUSE_GUEST_SHUTDOWN);
>> +    }
> 
> This won't work for all use cases. The user may wish to create/delete
> vhost-user servers at runtime without terminating the process when there
> are none left. An boolean option can be added in the future to control
> this behavior, so it's okay to leave this as is.

Thank you, we’ll make a note of this. I’ll add a TODO comment here to ensure we
don’t lose this thought.

--
Jag

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 04/12] vfio-user: instantiate vfio-user context
  2021-10-27 15:59   ` Stefan Hajnoczi
@ 2021-10-29 14:59     ` Jag Raman
  2021-11-01 10:35       ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Jag Raman @ 2021-10-29 14:59 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, John Johnson, thuth, swapnil.ingle, john.levon,
	philmd, qemu-devel, Alex Williamson, Marc-André Lureau,
	thanos.makatos, pbonzini, alex.bennee



> On Oct 27, 2021, at 11:59 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Mon, Oct 11, 2021 at 01:31:09AM -0400, Jagannathan Raman wrote:
>> @@ -94,9 +101,31 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
>>     trace_vfu_prop("device", str);
>> }
>> 
>> +/*
>> + * vfio-user-server depends on the availability of the 'socket' and 'device'
>> + * properties. It also depends on devices instantiated in QEMU. These
>> + * dependencies are not available during the instance_init phase of this
>> + * object's life-cycle. As such, the server is initialized after the
>> + * machine is setup. machine_init_done_notifier notifies vfio-user-server
>> + * when the machine is setup, and the dependencies are available.
>> + */
>> +static void vfu_object_machine_done(Notifier *notifier, void *data)
>> +{
>> +    VfuObject *o = container_of(notifier, VfuObject, machine_done);
> 
> Was there a check for non-NULL o->socket before this? Maybe it's not
> needed because QAPI treats 'socket' as a required field and refuses to
> create the SocketAddress if it's missing?

Yes,  “socket” is a required option. The server will not launch without that option.

I believe optional parameters are defined within ‘[‘ and ‘]’ braces in "./qapi/qom.json"

> 
>> +
>> +    o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
>> +                                o, VFU_DEV_TYPE_PCI);
>> +    if (o->vfu_ctx == NULL) {
>> +        error_setg(&error_abort, "vfu: Failed to create context - %s",
>> +                   strerror(errno));
> 
> The error reporting needs to be synchronous so that hotplugging with
> object-add fails instead of succeeding and leaving a failed object.

OK, will make the change for the error reporting to be synchronous.

> 
> In the startup case (not hotplug) it's okay to abort.
> 
>> +        return;
>> +    }
>> +}
>> +
>> static void vfu_object_init(Object *obj)
>> {
>>     VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
>> +    VfuObject *o = VFU_OBJECT(obj);
>> 
>>     if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
>>         error_setg(&error_abort, "vfu: %s only compatible with %s machine",
>> @@ -111,7 +140,12 @@ static void vfu_object_init(Object *obj)
>>         return;
>>     }
>> 
>> +    o->vfu_ctx = NULL;
> 
> The object's fields are initialized to 0 so this isn't necessary.

Got it.

> 
>> +
>>     k->nr_devs++;
>> +
>> +    o->machine_done.notify = vfu_object_machine_done;
>> +    qemu_add_machine_init_done_notifier(&o->machine_done);
> 
> The notifier is invoked immediately if the machine has already been
> initialized. That means vfu_object_machine_done() is called before the
> properties ('socket' and 'device') have been set when object-add hotplug
> is used. I think this needs to be moved elsewhere.

We’ll check this scenario out. Thank you!

>> }
>> 
>> static void vfu_object_finalize(Object *obj)
>> @@ -123,6 +157,10 @@ static void vfu_object_finalize(Object *obj)
>> 
>>     g_free(o->socket);
>> 
>> +    if (o->vfu_ctx) {
>> +        vfu_destroy_ctx(o->vfu_ctx);
>> +    }
>> +
>>     g_free(o->device);
>> 
>>     if (k->nr_devs == 0) {
> 
> Missing qemu_remove_machine_init_done_notifier().

Got it.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/12] vfio-user: find and init PCI device
  2021-10-27 16:05   ` Stefan Hajnoczi
@ 2021-10-29 15:58     ` Jag Raman
  2021-11-01 10:38       ` Stefan Hajnoczi
  0 siblings, 1 reply; 44+ messages in thread
From: Jag Raman @ 2021-10-29 15:58 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, John Johnson, thuth, swapnil.ingle, john.levon,
	philmd, qemu-devel, Alex Williamson, Marc-André Lureau,
	thanos.makatos, pbonzini, alex.bennee



> On Oct 27, 2021, at 12:05 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Mon, Oct 11, 2021 at 01:31:10AM -0400, Jagannathan Raman wrote:
>> Find the PCI device with specified id. Initialize the device context
>> with the QEMU PCI device
>> 
>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>> ---
>> hw/remote/vfio-user-obj.c | 32 ++++++++++++++++++++++++++++++++
>> 1 file changed, 32 insertions(+)
>> 
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> index d26e5ec9e9..7ce4e5b256 100644
>> --- a/hw/remote/vfio-user-obj.c
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -44,6 +44,8 @@
>> #include "qemu/notify.h"
>> #include "sysemu/sysemu.h"
>> #include "libvfio-user.h"
>> +#include "hw/qdev-core.h"
>> +#include "hw/pci/pci.h"
>> 
>> #define TYPE_VFU_OBJECT "vfio-user-server"
>> OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
>> @@ -68,6 +70,8 @@ struct VfuObject {
>>     Notifier machine_done;
>> 
>>     vfu_ctx_t *vfu_ctx;
>> +
>> +    PCIDevice *pci_dev;
>> };
>> 
>> static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
>> @@ -112,6 +116,9 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
>> static void vfu_object_machine_done(Notifier *notifier, void *data)
>> {
>>     VfuObject *o = container_of(notifier, VfuObject, machine_done);
>> +    DeviceState *dev = NULL;
>> +    vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
>> +    int ret;
>> 
>>     o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
>>                                 o, VFU_DEV_TYPE_PCI);
>> @@ -120,6 +127,31 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>>                    strerror(errno));
>>         return;
>>     }
>> +
>> +    dev = qdev_find_recursive(sysbus_get_default(), o->device);
>> +    if (dev == NULL) {
>> +        error_setg(&error_abort, "vfu: Device %s not found", o->device);
>> +        return;
>> +    }
>> +
>> +    if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>> +        error_setg(&error_abort, "vfu: %s not a PCI device", o->device);
>> +        return;
>> +    }
>> +
>> +    o->pci_dev = PCI_DEVICE(dev);
>> +
>> +    if (pci_is_express(o->pci_dev)) {
>> +        pci_type = VFU_PCI_TYPE_EXPRESS;
>> +    }
>> +
>> +    ret = vfu_pci_init(o->vfu_ctx, pci_type, PCI_HEADER_TYPE_NORMAL, 0);
>> +    if (ret < 0) {
>> +        error_setg(&error_abort,
>> +                   "vfu: Failed to attach PCI device %s to context - %s",
>> +                   o->device, strerror(errno));
>> +        return;
>> +    }
> 
> It's unclear what happens when one of these error code paths is taken.
> o->vfu_ctx and o->pci_dev might both be initialized, so how does the
> code know not to service the vfio-user connection? It would be easy to
> tell that this is correct if o->pci_dev and o->vfu_ctx were destroyed
> when an error occurs.

Hey Stefan, if I understand your question correctly, you’re wondering
if the server would still service the connection if it takes the above
error path.

I don’t believe that would happen. When the above error path is taken,
“error_abort” is set. Setting error_abort immediately terminates the
QEMU process. Additionally, vfu_object_ctx_run() won’t be
attached to the unix socket as the function exits early - so the
connection wouldn’t be serviced.

—
Jag

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 02/12] vfio-user: build library
  2021-10-29 14:17     ` Jag Raman
@ 2021-11-01  9:56       ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-11-01  9:56 UTC (permalink / raw)
  To: Jag Raman
  Cc: Elena Ufimtseva, John Johnson, thuth, swapnil.ingle, john.levon,
	philmd, qemu-devel, Alex Williamson, Marc-André Lureau,
	thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 995 bytes --]

On Fri, Oct 29, 2021 at 02:17:43PM +0000, Jag Raman wrote:
> Hi Stefan,
> 
> > On Oct 27, 2021, at 11:17 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Mon, Oct 11, 2021 at 01:31:07AM -0400, Jagannathan Raman wrote:
> >> diff --git a/hw/remote/Kconfig b/hw/remote/Kconfig
> >> index 08c16e235f..f9e512d44a 100644
> >> --- a/hw/remote/Kconfig
> >> +++ b/hw/remote/Kconfig
> >> @@ -1,4 +1,9 @@
> >> +config VFIO_USER_SERVER
> >> +    bool
> >> +    default n
> > 
> > Does VFIO_USER_SERVER depend on MULTIPROCESS?
> 
> Yes, VFIO_USER_SERVER presently depends on MULTIPROCESS.
> 
> This is because it needs some object and functions implemented by multiprocess
> such as TYPE_REMOTE_MACHINE and TYPE_REMOTE_PCIHOST.

I wonder if it's necessary to specify that dependency in the Kconfig
file? If someone disables MULTIPROCESS but enables VFIO_USER_SERVER then
there should probably be an error (or MULTIPROCESS becomes enabled again
automatically).

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-10-29 14:42     ` Jag Raman
@ 2021-11-01 10:34       ` Stefan Hajnoczi
  2021-11-04 12:13         ` Markus Armbruster
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-11-01 10:34 UTC (permalink / raw)
  To: Jag Raman
  Cc: Elena Ufimtseva, John Johnson, thuth, swapnil.ingle, john.levon,
	philmd, qemu-devel, Markus Armbruster, Alex Williamson,
	Marc-André Lureau, thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 2755 bytes --]

On Fri, Oct 29, 2021 at 02:42:49PM +0000, Jag Raman wrote:
> > On Oct 27, 2021, at 11:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > On Mon, Oct 11, 2021 at 01:31:08AM -0400, Jagannathan Raman wrote:
> >> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> >> new file mode 100644
> >> index 0000000000..c2a300f0ff
> >> --- /dev/null
> >> +++ b/hw/remote/vfio-user-obj.c
> >> @@ -0,0 +1,173 @@
> >> +/**
> >> + * QEMU vfio-user-server server object
> >> + *
> >> + * Copyright © 2021 Oracle and/or its affiliates.
> >> + *
> >> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
> >> + *
> >> + * See the COPYING file in the top-level directory.
> >> + *
> >> + */
> >> +
> >> +/**
> >> + * Usage: add options:
> >> + *     -machine x-remote
> >> + *     -device <PCI-device>,id=<pci-dev-id>
> >> + *     -object vfio-user-server,id=<id>,type=unix,path=<socket-path>,
> > 
> > I expected socket.type= and socket.path= based on the QAPI schema. Is
> > this command-line example correct?
> 
> When I tried the “socket.path” approach, QEMU was not able to parse the
> arguments. So I had to break it down to a series of individual members.
> 
> If “socket.path” is the expected way, I’ll see why the parser is not working
> as expected. 

CCing Markus regarding QAPI.

I'm surprised because the QAPI schema for vfio-user-server objects is:

  { 'struct': 'VfioUserServerProperties',
    'data': { 'socket': 'SocketAddress', 'device': 'str' } }

It's not clear to me why the command-line parser flattens the 'socket'
field into its 'type' and 'path' sub-fields in your example:

  -object vfio-user-server,id=<id>,type=unix,path=<socket-path>,device=<pci-dev-id>

Maybe because SocketAddress is an enum instead of a struct?

Imagine a second SocketAddress field is added to vfio-user-server. How
can the parser know which field 'type' and 'path' belong to? I tried it:

  { 'struct': 'VfioUserServerProperties',
    'data': { 'socket': 'SocketAddress', 'socket2': 'SocketAddress', 'device': 'str' } }

Now the parser refuses any input I've tried. For example:

  $ build/qemu-system-x86_64 -object vfio-user-server,id=s,device=asdf,type=unix,path=asdf,type=unix
  qemu-system-x86_64: -object vfio-user-server,id=s,device=asdf,type=unix,path=asdf,type=unix: Parameter 'type' is missing

A similar case happens if the parent struct has 'type' or 'path' fields.
They collide with the SocketAddress union fields. I didn't test this
though.

Questions for Markus:
1. Do you know why the parser behaves like this?
2. Is it good practice to embed SocketAddress into parent structs or
   should we force it into a struct?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 04/12] vfio-user: instantiate vfio-user context
  2021-10-29 14:59     ` Jag Raman
@ 2021-11-01 10:35       ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-11-01 10:35 UTC (permalink / raw)
  To: Jag Raman
  Cc: Elena Ufimtseva, John Johnson, thuth, swapnil.ingle, john.levon,
	philmd, qemu-devel, Alex Williamson, Marc-André Lureau,
	thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 1550 bytes --]

On Fri, Oct 29, 2021 at 02:59:02PM +0000, Jag Raman wrote:
> 
> 
> > On Oct 27, 2021, at 11:59 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Mon, Oct 11, 2021 at 01:31:09AM -0400, Jagannathan Raman wrote:
> >> @@ -94,9 +101,31 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
> >>     trace_vfu_prop("device", str);
> >> }
> >> 
> >> +/*
> >> + * vfio-user-server depends on the availability of the 'socket' and 'device'
> >> + * properties. It also depends on devices instantiated in QEMU. These
> >> + * dependencies are not available during the instance_init phase of this
> >> + * object's life-cycle. As such, the server is initialized after the
> >> + * machine is setup. machine_init_done_notifier notifies vfio-user-server
> >> + * when the machine is setup, and the dependencies are available.
> >> + */
> >> +static void vfu_object_machine_done(Notifier *notifier, void *data)
> >> +{
> >> +    VfuObject *o = container_of(notifier, VfuObject, machine_done);
> > 
> > Was there a check for non-NULL o->socket before this? Maybe it's not
> > needed because QAPI treats 'socket' as a required field and refuses to
> > create the SocketAddress if it's missing?
> 
> Yes,  “socket” is a required option. The server will not launch without that option.

Thanks for confirming!

> I believe optional parameters are defined within ‘[‘ and ‘]’ braces in "./qapi/qom.json"

Optional parameters have the asterisk ('*'). Block quotes are for lists.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 05/12] vfio-user: find and init PCI device
  2021-10-29 15:58     ` Jag Raman
@ 2021-11-01 10:38       ` Stefan Hajnoczi
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-11-01 10:38 UTC (permalink / raw)
  To: Jag Raman
  Cc: Elena Ufimtseva, John Johnson, thuth, swapnil.ingle, john.levon,
	philmd, qemu-devel, Alex Williamson, Marc-André Lureau,
	thanos.makatos, pbonzini, alex.bennee

[-- Attachment #1: Type: text/plain, Size: 4073 bytes --]

On Fri, Oct 29, 2021 at 03:58:28PM +0000, Jag Raman wrote:
> 
> 
> > On Oct 27, 2021, at 12:05 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Mon, Oct 11, 2021 at 01:31:10AM -0400, Jagannathan Raman wrote:
> >> Find the PCI device with specified id. Initialize the device context
> >> with the QEMU PCI device
> >> 
> >> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> >> Signed-off-by: John G Johnson <john.g.johnson@oracle.com>
> >> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> >> ---
> >> hw/remote/vfio-user-obj.c | 32 ++++++++++++++++++++++++++++++++
> >> 1 file changed, 32 insertions(+)
> >> 
> >> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> >> index d26e5ec9e9..7ce4e5b256 100644
> >> --- a/hw/remote/vfio-user-obj.c
> >> +++ b/hw/remote/vfio-user-obj.c
> >> @@ -44,6 +44,8 @@
> >> #include "qemu/notify.h"
> >> #include "sysemu/sysemu.h"
> >> #include "libvfio-user.h"
> >> +#include "hw/qdev-core.h"
> >> +#include "hw/pci/pci.h"
> >> 
> >> #define TYPE_VFU_OBJECT "vfio-user-server"
> >> OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
> >> @@ -68,6 +70,8 @@ struct VfuObject {
> >>     Notifier machine_done;
> >> 
> >>     vfu_ctx_t *vfu_ctx;
> >> +
> >> +    PCIDevice *pci_dev;
> >> };
> >> 
> >> static void vfu_object_set_socket(Object *obj, Visitor *v, const char *name,
> >> @@ -112,6 +116,9 @@ static void vfu_object_set_device(Object *obj, const char *str, Error **errp)
> >> static void vfu_object_machine_done(Notifier *notifier, void *data)
> >> {
> >>     VfuObject *o = container_of(notifier, VfuObject, machine_done);
> >> +    DeviceState *dev = NULL;
> >> +    vfu_pci_type_t pci_type = VFU_PCI_TYPE_CONVENTIONAL;
> >> +    int ret;
> >> 
> >>     o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket->u.q_unix.path, 0,
> >>                                 o, VFU_DEV_TYPE_PCI);
> >> @@ -120,6 +127,31 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
> >>                    strerror(errno));
> >>         return;
> >>     }
> >> +
> >> +    dev = qdev_find_recursive(sysbus_get_default(), o->device);
> >> +    if (dev == NULL) {
> >> +        error_setg(&error_abort, "vfu: Device %s not found", o->device);
> >> +        return;
> >> +    }
> >> +
> >> +    if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
> >> +        error_setg(&error_abort, "vfu: %s not a PCI device", o->device);
> >> +        return;
> >> +    }
> >> +
> >> +    o->pci_dev = PCI_DEVICE(dev);
> >> +
> >> +    if (pci_is_express(o->pci_dev)) {
> >> +        pci_type = VFU_PCI_TYPE_EXPRESS;
> >> +    }
> >> +
> >> +    ret = vfu_pci_init(o->vfu_ctx, pci_type, PCI_HEADER_TYPE_NORMAL, 0);
> >> +    if (ret < 0) {
> >> +        error_setg(&error_abort,
> >> +                   "vfu: Failed to attach PCI device %s to context - %s",
> >> +                   o->device, strerror(errno));
> >> +        return;
> >> +    }
> > 
> > It's unclear what happens when one of these error code paths is taken.
> > o->vfu_ctx and o->pci_dev might both be initialized, so how does the
> > code know not to service the vfio-user connection? It would be easy to
> > tell that this is correct if o->pci_dev and o->vfu_ctx were destroyed
> > when an error occurs.
> 
> Hey Stefan, if I understand your question correctly, you’re wondering
> if the server would still service the connection if it takes the above
> error path.
> 
> I don’t believe that would happen. When the above error path is taken,
> “error_abort” is set. Setting error_abort immediately terminates the
> QEMU process. Additionally, vfu_object_ctx_run() won’t be
> attached to the unix socket as the function exits early - so the
> connection wouldn’t be serviced.

Thanks. I'll revisit this next revision because error_abort cannot be
used for hotplug. It's okay to terminate the process on startup, but for
hotplug the expected behavior is to report an error and continue
running.

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-11-01 10:34       ` Stefan Hajnoczi
@ 2021-11-04 12:13         ` Markus Armbruster
  2021-11-04 14:39           ` Kevin Wolf
  2021-11-04 16:48           ` Stefan Hajnoczi
  0 siblings, 2 replies; 44+ messages in thread
From: Markus Armbruster @ 2021-11-04 12:13 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, John Johnson, thuth, Jag Raman, swapnil.ingle,
	john.levon, alex.bennee, qemu-devel, Markus Armbruster,
	Alex Williamson, Marc-André Lureau, Kevin Wolf,
	thanos.makatos, pbonzini, philmd

Stefan Hajnoczi <stefanha@redhat.com> writes:

> On Fri, Oct 29, 2021 at 02:42:49PM +0000, Jag Raman wrote:
>> > On Oct 27, 2021, at 11:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
>> > On Mon, Oct 11, 2021 at 01:31:08AM -0400, Jagannathan Raman wrote:
>> >> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> >> new file mode 100644
>> >> index 0000000000..c2a300f0ff
>> >> --- /dev/null
>> >> +++ b/hw/remote/vfio-user-obj.c
>> >> @@ -0,0 +1,173 @@
>> >> +/**
>> >> + * QEMU vfio-user-server server object
>> >> + *
>> >> + * Copyright © 2021 Oracle and/or its affiliates.
>> >> + *
>> >> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
>> >> + *
>> >> + * See the COPYING file in the top-level directory.
>> >> + *
>> >> + */
>> >> +
>> >> +/**
>> >> + * Usage: add options:
>> >> + *     -machine x-remote
>> >> + *     -device <PCI-device>,id=<pci-dev-id>
>> >> + *     -object vfio-user-server,id=<id>,type=unix,path=<socket-path>,
>> > 
>> > I expected socket.type= and socket.path= based on the QAPI schema. Is
>> > this command-line example correct?
>> 
>> When I tried the “socket.path” approach, QEMU was not able to parse the
>> arguments. So I had to break it down to a series of individual members.
>> 
>> If “socket.path” is the expected way, I’ll see why the parser is not working
>> as expected. 
>
> CCing Markus regarding QAPI.
>
> I'm surprised because the QAPI schema for vfio-user-server objects is:
>
>   { 'struct': 'VfioUserServerProperties',
>     'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>
> It's not clear to me why the command-line parser flattens the 'socket'
> field into its 'type' and 'path' sub-fields in your example:
>
>   -object vfio-user-server,id=<id>,type=unix,path=<socket-path>,device=<pci-dev-id>
>
> Maybe because SocketAddress is an enum instead of a struct?
>
> Imagine a second SocketAddress field is added to vfio-user-server. How
> can the parser know which field 'type' and 'path' belong to? I tried it:
>
>   { 'struct': 'VfioUserServerProperties',
>     'data': { 'socket': 'SocketAddress', 'socket2': 'SocketAddress', 'device': 'str' } }
>
> Now the parser refuses any input I've tried. For example:
>
>   $ build/qemu-system-x86_64 -object vfio-user-server,id=s,device=asdf,type=unix,path=asdf,type=unix
>   qemu-system-x86_64: -object vfio-user-server,id=s,device=asdf,type=unix,path=asdf,type=unix: Parameter 'type' is missing
>
> A similar case happens if the parent struct has 'type' or 'path' fields.
> They collide with the SocketAddress union fields. I didn't test this
> though.
>
> Questions for Markus:
> 1. Do you know why the parser behaves like this?

Yes: backward compatibility.

The straightforward way to do a QAPI-based command line option uses
qobject_input_visitor_new_str(), which parses either JSON or dotted
keys, and returns the result wrapped in the appropriate QObject visitor.

The JSON syntax is derived from the QAPI schema just like for QMP.  For
the VfioUserServerProperties shown above, it's something like

    {"socket": {"type": "unix", "path": "dir/socket"}, "device" "mumble"}

I did not check my derivation by feeding it to QEMU.  Beware of
screwups.

The dotted keys syntax is derived from the JSON syntax as described in
keyval.c.  For the JSON above, it should be

    socket.type=unix,socket.path=dir/socket,device=mumble

When we QAPIfy an existing option instead of adding a new QAPI-based
one, we have an additional problem: the dotted keys syntax has to match
the old syntax (the JSON syntax is all new, so no problem there).

The old syntax almost always has its quirks.  Ideally, we'd somehow get
from quirky old to boring new in an orderly manner.  Sadly, we still
don't have good solutions for that.  To make progress, we commonly
combine JSON new with quirky old.

qemu-system-FOO -object works that way.  object_option_parse() parses
either JSON or QemuOpts.  It wraps the former in a QObject visitor, and
the latter in an opts visitor.

QemuOpts is flat by design[*], so the opts visitor parses flat QemuOpts
from a (possibly non-flat) QAPI type.  How exactly it flattens, and how
it handles clashes I don't remember.

Sadly, this means that we get quirky old even for new object types.

Questions?

> 2. Is it good practice to embed SocketAddress into parent structs or
>    should we force it into a struct?

I'm not sure I got your question.  An example might help.


[*] You can play games with dotted keys to simulate nesting, but the
opts visitor predates all that.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-11-04 12:13         ` Markus Armbruster
@ 2021-11-04 14:39           ` Kevin Wolf
  2021-11-05 10:08             ` Markus Armbruster
  2021-11-04 16:48           ` Stefan Hajnoczi
  1 sibling, 1 reply; 44+ messages in thread
From: Kevin Wolf @ 2021-11-04 14:39 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Elena Ufimtseva, John Johnson, thuth, Jag Raman, swapnil.ingle,
	john.levon, alex.bennee, qemu-devel, Alex Williamson,
	Marc-André Lureau, Stefan Hajnoczi, thanos.makatos,
	pbonzini, philmd

Am 04.11.2021 um 13:13 hat Markus Armbruster geschrieben:
> The old syntax almost always has its quirks.  Ideally, we'd somehow get
> from quirky old to boring new in an orderly manner.  Sadly, we still
> don't have good solutions for that.  To make progress, we commonly
> combine JSON new with quirky old.
> 
> qemu-system-FOO -object works that way.  object_option_parse() parses
> either JSON or QemuOpts.  It wraps the former in a QObject visitor, and
> the latter in an opts visitor.
> 
> QemuOpts is flat by design[*], so the opts visitor parses flat QemuOpts
> from a (possibly non-flat) QAPI type.  How exactly it flattens, and how
> it handles clashes I don't remember.
> 
> Sadly, this means that we get quirky old even for new object types.

For -object in the system emulator (the tools all use the keyval
visitor, so there it would work as expected), the only reason that we
need to keep the quirky old code path around is the list handling in
memory-backend.host-nodes.

The main difficulty there is that the old QemuOpts based code path
allows specifying the option twice and both of them would effectively be
combined. Do we have any idea how to replicate this in a keyval parser
based world?

If not, do we want to use the remaining time until 6.2 to deprecate
this? The nasty part is that the only syntax that works both now and in
the future is JSON. We can't easily accept the new keyval syntax while
still using the QemuOpts based code.

Kevin



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-11-04 12:13         ` Markus Armbruster
  2021-11-04 14:39           ` Kevin Wolf
@ 2021-11-04 16:48           ` Stefan Hajnoczi
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2021-11-04 16:48 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Elena Ufimtseva, John Johnson, thuth, Jag Raman, swapnil.ingle,
	john.levon, alex.bennee, qemu-devel, Alex Williamson,
	Marc-André Lureau, Kevin Wolf, thanos.makatos, pbonzini,
	philmd

[-- Attachment #1: Type: text/plain, Size: 5060 bytes --]

On Thu, Nov 04, 2021 at 01:13:02PM +0100, Markus Armbruster wrote:
> Stefan Hajnoczi <stefanha@redhat.com> writes:
> 
> > On Fri, Oct 29, 2021 at 02:42:49PM +0000, Jag Raman wrote:
> >> > On Oct 27, 2021, at 11:40 AM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> >> > On Mon, Oct 11, 2021 at 01:31:08AM -0400, Jagannathan Raman wrote:
> >> >> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
> >> >> new file mode 100644
> >> >> index 0000000000..c2a300f0ff
> >> >> --- /dev/null
> >> >> +++ b/hw/remote/vfio-user-obj.c
> >> >> @@ -0,0 +1,173 @@
> >> >> +/**
> >> >> + * QEMU vfio-user-server server object
> >> >> + *
> >> >> + * Copyright © 2021 Oracle and/or its affiliates.
> >> >> + *
> >> >> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or later.
> >> >> + *
> >> >> + * See the COPYING file in the top-level directory.
> >> >> + *
> >> >> + */
> >> >> +
> >> >> +/**
> >> >> + * Usage: add options:
> >> >> + *     -machine x-remote
> >> >> + *     -device <PCI-device>,id=<pci-dev-id>
> >> >> + *     -object vfio-user-server,id=<id>,type=unix,path=<socket-path>,
> >> > 
> >> > I expected socket.type= and socket.path= based on the QAPI schema. Is
> >> > this command-line example correct?
> >> 
> >> When I tried the “socket.path” approach, QEMU was not able to parse the
> >> arguments. So I had to break it down to a series of individual members.
> >> 
> >> If “socket.path” is the expected way, I’ll see why the parser is not working
> >> as expected. 
> >
> > CCing Markus regarding QAPI.
> >
> > I'm surprised because the QAPI schema for vfio-user-server objects is:
> >
> >   { 'struct': 'VfioUserServerProperties',
> >     'data': { 'socket': 'SocketAddress', 'device': 'str' } }
> >
> > It's not clear to me why the command-line parser flattens the 'socket'
> > field into its 'type' and 'path' sub-fields in your example:
> >
> >   -object vfio-user-server,id=<id>,type=unix,path=<socket-path>,device=<pci-dev-id>
> >
> > Maybe because SocketAddress is an enum instead of a struct?
> >
> > Imagine a second SocketAddress field is added to vfio-user-server. How
> > can the parser know which field 'type' and 'path' belong to? I tried it:
> >
> >   { 'struct': 'VfioUserServerProperties',
> >     'data': { 'socket': 'SocketAddress', 'socket2': 'SocketAddress', 'device': 'str' } }
> >
> > Now the parser refuses any input I've tried. For example:
> >
> >   $ build/qemu-system-x86_64 -object vfio-user-server,id=s,device=asdf,type=unix,path=asdf,type=unix
> >   qemu-system-x86_64: -object vfio-user-server,id=s,device=asdf,type=unix,path=asdf,type=unix: Parameter 'type' is missing
> >
> > A similar case happens if the parent struct has 'type' or 'path' fields.
> > They collide with the SocketAddress union fields. I didn't test this
> > though.
> >
> > Questions for Markus:
> > 1. Do you know why the parser behaves like this?
> 
> Yes: backward compatibility.
> 
> The straightforward way to do a QAPI-based command line option uses
> qobject_input_visitor_new_str(), which parses either JSON or dotted
> keys, and returns the result wrapped in the appropriate QObject visitor.
> 
> The JSON syntax is derived from the QAPI schema just like for QMP.  For
> the VfioUserServerProperties shown above, it's something like
> 
>     {"socket": {"type": "unix", "path": "dir/socket"}, "device" "mumble"}
> 
> I did not check my derivation by feeding it to QEMU.  Beware of
> screwups.
> 
> The dotted keys syntax is derived from the JSON syntax as described in
> keyval.c.  For the JSON above, it should be
> 
>     socket.type=unix,socket.path=dir/socket,device=mumble
> 
> When we QAPIfy an existing option instead of adding a new QAPI-based
> one, we have an additional problem: the dotted keys syntax has to match
> the old syntax (the JSON syntax is all new, so no problem there).
> 
> The old syntax almost always has its quirks.  Ideally, we'd somehow get
> from quirky old to boring new in an orderly manner.  Sadly, we still
> don't have good solutions for that.  To make progress, we commonly
> combine JSON new with quirky old.
> 
> qemu-system-FOO -object works that way.  object_option_parse() parses
> either JSON or QemuOpts.  It wraps the former in a QObject visitor, and
> the latter in an opts visitor.
> 
> QemuOpts is flat by design[*], so the opts visitor parses flat QemuOpts
> from a (possibly non-flat) QAPI type.  How exactly it flattens, and how
> it handles clashes I don't remember.
> 
> Sadly, this means that we get quirky old even for new object types.
> 
> Questions?
> 
> > 2. Is it good practice to embed SocketAddress into parent structs or
> >    should we force it into a struct?
> 
> I'm not sure I got your question.  An example might help.

I think Question 2 isn't relevant anymore.

Thanks for the explanation!

> [*] You can play games with dotted keys to simulate nesting, but the
> opts visitor predates all that.
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-11-04 14:39           ` Kevin Wolf
@ 2021-11-05 10:08             ` Markus Armbruster
  2021-11-05 13:19               ` Kevin Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Markus Armbruster @ 2021-11-05 10:08 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Elena Ufimtseva, John Johnson, thuth, Jag Raman, swapnil.ingle,
	john.levon, alex.bennee, qemu-devel, Alex Williamson,
	Marc-André Lureau, Stefan Hajnoczi, thanos.makatos,
	pbonzini, philmd

Kevin Wolf <kwolf@redhat.com> writes:

> Am 04.11.2021 um 13:13 hat Markus Armbruster geschrieben:
>> The old syntax almost always has its quirks.  Ideally, we'd somehow get
>> from quirky old to boring new in an orderly manner.  Sadly, we still
>> don't have good solutions for that.  To make progress, we commonly
>> combine JSON new with quirky old.
>> 
>> qemu-system-FOO -object works that way.  object_option_parse() parses
>> either JSON or QemuOpts.  It wraps the former in a QObject visitor, and
>> the latter in an opts visitor.
>> 
>> QemuOpts is flat by design[*], so the opts visitor parses flat QemuOpts
>> from a (possibly non-flat) QAPI type.  How exactly it flattens, and how
>> it handles clashes I don't remember.
>> 
>> Sadly, this means that we get quirky old even for new object types.
>
> For -object in the system emulator (the tools all use the keyval
> visitor, so there it would work as expected), the only reason that we
> need to keep the quirky old code path around is the list handling in
> memory-backend.host-nodes.
>
> The main difficulty there is that the old QemuOpts based code path
> allows specifying the option twice and both of them would effectively be
> combined. Do we have any idea how to replicate this in a keyval parser
> based world?

I can see just two clean solutions, but both involve upending a lot of
code.

We can fuse keyval parser and visitor to get a schema-directed parser.

We can change the abstract keyval syntax to permit repeated keys.  This
means replacing QDict in in the abstract syntax tree, with fallout in
the visitor.

Even if we find a practical solution, I don't like the combination of
"you may give the same parameter multiple times, and the last one wins"
and "for a list-valued parameter, the values of repeated parameters are
collected into a list".  Each makes sense on its own.  The combination
not so much.  Inheriting "last one wins" from QemuOpts may have been a
mistake.

The keyval way of doing lists (inherited from the block layer's usage of
dotted keys?  I don't remember) requires the user to count, which isn't
exactly nice, either.

> If not, do we want to use the remaining time until 6.2 to deprecate
> this? The nasty part is that the only syntax that works both now and in
> the future is JSON. We can't easily accept the new keyval syntax while
> still using the QemuOpts based code.

What exactly do you propose to deprecate?



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-11-05 10:08             ` Markus Armbruster
@ 2021-11-05 13:19               ` Kevin Wolf
  2021-11-05 13:54                 ` Peter Krempa
  2021-11-06  6:34                 ` Markus Armbruster
  0 siblings, 2 replies; 44+ messages in thread
From: Kevin Wolf @ 2021-11-05 13:19 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Elena Ufimtseva, John Johnson, thuth, pkrempa, Jag Raman,
	swapnil.ingle, john.levon, alex.bennee, qemu-devel,
	Alex Williamson, Marc-André Lureau, Stefan Hajnoczi,
	thanos.makatos, pbonzini, philmd

Am 05.11.2021 um 11:08 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 04.11.2021 um 13:13 hat Markus Armbruster geschrieben:
> >> The old syntax almost always has its quirks.  Ideally, we'd somehow get
> >> from quirky old to boring new in an orderly manner.  Sadly, we still
> >> don't have good solutions for that.  To make progress, we commonly
> >> combine JSON new with quirky old.
> >> 
> >> qemu-system-FOO -object works that way.  object_option_parse() parses
> >> either JSON or QemuOpts.  It wraps the former in a QObject visitor, and
> >> the latter in an opts visitor.
> >> 
> >> QemuOpts is flat by design[*], so the opts visitor parses flat QemuOpts
> >> from a (possibly non-flat) QAPI type.  How exactly it flattens, and how
> >> it handles clashes I don't remember.
> >> 
> >> Sadly, this means that we get quirky old even for new object types.
> >
> > For -object in the system emulator (the tools all use the keyval
> > visitor, so there it would work as expected), the only reason that we
> > need to keep the quirky old code path around is the list handling in
> > memory-backend.host-nodes.
> >
> > The main difficulty there is that the old QemuOpts based code path
> > allows specifying the option twice and both of them would effectively be
> > combined. Do we have any idea how to replicate this in a keyval parser
> > based world?
> 
> I can see just two clean solutions, but both involve upending a lot of
> code.
> 
> We can fuse keyval parser and visitor to get a schema-directed parser.
> 
> We can change the abstract keyval syntax to permit repeated keys.  This
> means replacing QDict in in the abstract syntax tree, with fallout in
> the visitor.
> 
> Even if we find a practical solution, I don't like the combination of
> "you may give the same parameter multiple times, and the last one wins"
> and "for a list-valued parameter, the values of repeated parameters are
> collected into a list".  Each makes sense on its own.  The combination
> not so much.  Inheriting "last one wins" from QemuOpts may have been a
> mistake.
> 
> The keyval way of doing lists (inherited from the block layer's usage of
> dotted keys?  I don't remember) requires the user to count, which isn't
> exactly nice, either.

Yes. If we didn't have to maintain compatibility (or actually as soon as
we degrade non-JSON option lists to HMP-level support), I would
introduce [] and {} syntax for lists and dicts, even if that means that
use of these characters in strings doesn't work any more or only in a
limited way. I think this would be the best compromise for usability.

Anyway, this doesn't help us with the compatibility problem we're
discussing here.

> > If not, do we want to use the remaining time until 6.2 to deprecate
> > this? The nasty part is that the only syntax that works both now and in
> > the future is JSON. We can't easily accept the new keyval syntax while
> > still using the QemuOpts based code.
> 
> What exactly do you propose to deprecate?

We can deprecate on two different levels. I think it's useful to do
both:

1. Broad deprecation: Stable non-JSON interfaces are degraded to
   a HMP-like compatibility promise. Obviously, this can only be done
   for options that support JSON. Peter Maydell also wants to do this
   only after a big user (read: libvirt) has implemented and is
   using JSON, basically as a proof that the alternative is working.

   So this can certainly be done for -object. I believe libvirt also
   uses JSON for -device now, so this should be fine now, too. Possibly
   -drive (in favour of -blockdev), though I'm not completely sure if we
   have gotten rid of the final users of -drive. (CCing Peter Krempa for
   details.)

   This degradation of the compatibility promise doesn't tell users what
   exactly is going to change, which is why doing the second one, too,
   might be nice.

2. Narrow deprecation: We can just deprecate the non-JSON form, or
   certain aspects of it, of memory-backend.host-nodes. This is the
   specific things that stops us from switching -object to keyval.

   a. Deprecate the whole option. If you want to use it and need a
      stable interface, you have to use JSON. We'll just switch the
      non-JSON form on a flag day. Before it, you need to use QemuOpts +
      OptsVisitor syntax for the list; after it, you need to use keyval
      syntax.

   b. Deprecate only repeating the option. memory-backend is changed to
      first try visiting a list, and if that fails, it visits a string
      and goes through a string visitor locally to keep supporting the
      integer range syntax.

   c. Deprecate all list values, but keep supporting a single integer
      value by using an alternate between list and int.

Picking one of these four options is enough to convert -object to
keyval. I would suggest doing both 1. and one of the options in 2.

Kevin



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-11-05 13:19               ` Kevin Wolf
@ 2021-11-05 13:54                 ` Peter Krempa
  2021-11-06  6:34                 ` Markus Armbruster
  1 sibling, 0 replies; 44+ messages in thread
From: Peter Krempa @ 2021-11-05 13:54 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Elena Ufimtseva, John Johnson, thuth, Jag Raman, swapnil.ingle,
	john.levon, alex.bennee, Markus Armbruster, qemu-devel,
	Alex Williamson, Marc-André Lureau, Stefan Hajnoczi,
	thanos.makatos, pbonzini, philmd

On Fri, Nov 05, 2021 at 14:19:34 +0100, Kevin Wolf wrote:
> Am 05.11.2021 um 11:08 hat Markus Armbruster geschrieben:
> > Kevin Wolf <kwolf@redhat.com> writes:
> > 
> > > Am 04.11.2021 um 13:13 hat Markus Armbruster geschrieben:

[...]

> We can deprecate on two different levels. I think it's useful to do
> both:
> 
> 1. Broad deprecation: Stable non-JSON interfaces are degraded to
>    a HMP-like compatibility promise. Obviously, this can only be done
>    for options that support JSON. Peter Maydell also wants to do this
>    only after a big user (read: libvirt) has implemented and is
>    using JSON, basically as a proof that the alternative is working.
> 
>    So this can certainly be done for -object. I believe libvirt also
>    uses JSON for -device now, so this should be fine now, too. Possibly
>    -drive (in favour of -blockdev), though I'm not completely sure if we
>    have gotten rid of the final users of -drive. (CCing Peter Krempa for
>    details.)

Libvirt uses JSON exclusively with -object (starting from qemu-6.0) and
with -device (starting from qemu-6.2).

We could also easily do -netdev if JSON support will be added for the
option as internally we already create the JSON object (so that the code
is identical for QMP hotplug) and just convert it back to the
commandline syntax.

Libvirt also uses JSON with -blockdev since -blockdev support was
introduced (IIRC qemu-4.2 added all the necessary bits for us).
If -blockdev is used libvirt does not use -drive at all (except for SD
cards for ARM boards), but that is not tested very well.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-11-05 13:19               ` Kevin Wolf
  2021-11-05 13:54                 ` Peter Krempa
@ 2021-11-06  6:34                 ` Markus Armbruster
  2021-11-08 12:05                   ` Kevin Wolf
  1 sibling, 1 reply; 44+ messages in thread
From: Markus Armbruster @ 2021-11-06  6:34 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Elena Ufimtseva, John Johnson, thuth, pkrempa, Jag Raman,
	swapnil.ingle, john.levon, alex.bennee, qemu-devel,
	Alex Williamson, Marc-André Lureau, Stefan Hajnoczi,
	thanos.makatos, pbonzini, philmd

Kevin Wolf <kwolf@redhat.com> writes:

> Am 05.11.2021 um 11:08 hat Markus Armbruster geschrieben:
>> Kevin Wolf <kwolf@redhat.com> writes:
>> 
>> > Am 04.11.2021 um 13:13 hat Markus Armbruster geschrieben:
>> >> The old syntax almost always has its quirks.  Ideally, we'd somehow get
>> >> from quirky old to boring new in an orderly manner.  Sadly, we still
>> >> don't have good solutions for that.  To make progress, we commonly
>> >> combine JSON new with quirky old.
>> >> 
>> >> qemu-system-FOO -object works that way.  object_option_parse() parses
>> >> either JSON or QemuOpts.  It wraps the former in a QObject visitor, and
>> >> the latter in an opts visitor.
>> >> 
>> >> QemuOpts is flat by design[*], so the opts visitor parses flat QemuOpts
>> >> from a (possibly non-flat) QAPI type.  How exactly it flattens, and how
>> >> it handles clashes I don't remember.
>> >> 
>> >> Sadly, this means that we get quirky old even for new object types.
>> >
>> > For -object in the system emulator (the tools all use the keyval
>> > visitor, so there it would work as expected), the only reason that we
>> > need to keep the quirky old code path around is the list handling in
>> > memory-backend.host-nodes.
>> >
>> > The main difficulty there is that the old QemuOpts based code path
>> > allows specifying the option twice and both of them would effectively be
>> > combined. Do we have any idea how to replicate this in a keyval parser
>> > based world?
>> 
>> I can see just two clean solutions, but both involve upending a lot of
>> code.
>> 
>> We can fuse keyval parser and visitor to get a schema-directed parser.
>> 
>> We can change the abstract keyval syntax to permit repeated keys.  This
>> means replacing QDict in in the abstract syntax tree, with fallout in
>> the visitor.
>> 
>> Even if we find a practical solution, I don't like the combination of
>> "you may give the same parameter multiple times, and the last one wins"
>> and "for a list-valued parameter, the values of repeated parameters are
>> collected into a list".  Each makes sense on its own.  The combination
>> not so much.  Inheriting "last one wins" from QemuOpts may have been a
>> mistake.
>> 
>> The keyval way of doing lists (inherited from the block layer's usage of
>> dotted keys?  I don't remember) requires the user to count, which isn't
>> exactly nice, either.
>
> Yes. If we didn't have to maintain compatibility (or actually as soon as
> we degrade non-JSON option lists to HMP-level support), I would
> introduce [] and {} syntax for lists and dicts, even if that means that
> use of these characters in strings doesn't work any more or only in a
> limited way. I think this would be the best compromise for usability.
>
> Anyway, this doesn't help us with the compatibility problem we're
> discussing here.
>
>> > If not, do we want to use the remaining time until 6.2 to deprecate
>> > this? The nasty part is that the only syntax that works both now and in
>> > the future is JSON. We can't easily accept the new keyval syntax while
>> > still using the QemuOpts based code.
>> 
>> What exactly do you propose to deprecate?
>
> We can deprecate on two different levels. I think it's useful to do
> both:
>
> 1. Broad deprecation: Stable non-JSON interfaces are degraded to
>    a HMP-like compatibility promise.

Calling it "deprecation" might be confusing.  HMP isn't deprecated, it's
merely not a stable interface.  That's kind of like "deprecated when you
need stable", but saying "not a stable interface" is clearer.

When I write "deprecate" below, I mean something like "go use something
else (no conditions)".  When I mean "use something else when you need
stable", I write "degrade" (short for "degrade to an HMP-like
compatibility promise").

>                                      Obviously, this can only be done
>    for options that support JSON.

We can also degrade or even deprecate sugar options in favor of the real
ones.  Case by case, I guess.

>                                   Peter Maydell also wants to do this
>    only after a big user (read: libvirt) has implemented and is
>    using JSON, basically as a proof that the alternative is working.
>
>    So this can certainly be done for -object. I believe libvirt also
>    uses JSON for -device now, so this should be fine now, too.

The non-sugar options supporting JSON are -audiodev, -blockdev, -compat,
-display (partially), -machine (I think), -object.

-netdev is QAPIfied, but still uses QemuOpts.  Too late for 6.2, I'm
afraid.

>                                                                Possibly
>    -drive (in favour of -blockdev), though I'm not completely sure if we
>    have gotten rid of the final users of -drive. (CCing Peter Krempa for
>    details.)

The problem with deprecating -drive is configuring onboard block
devices.  We need a stable interface for that, and it must be usable
together with -blockdev.

We provided such an interface (machine properties) for some onboard
block devices starting with commit ebc29e1bea "pc: Support firmware
configuration with -blockdev".  Many more remain, I believe.

>    This degradation of the compatibility promise doesn't tell users what
>    exactly is going to change, which is why doing the second one, too,
>    might be nice.
>
> 2. Narrow deprecation: We can just deprecate the non-JSON form, or
>    certain aspects of it, of memory-backend.host-nodes. This is the
>    specific things that stops us from switching -object to keyval.
>
>    a. Deprecate the whole option. If you want to use it and need a
>       stable interface, you have to use JSON. We'll just switch the
>       non-JSON form on a flag day. Before it, you need to use QemuOpts +
>       OptsVisitor syntax for the list; after it, you need to use keyval
>       syntax.

I parse "the whole option" as "-object with dotted keys argument".
Correct?

>    b. Deprecate only repeating the option. memory-backend is changed to
>       first try visiting a list, and if that fails, it visits a string
>       and goes through a string visitor locally to keep supporting the
>       integer range syntax.

Possible problem: integer range syntax must not leak into the JSON form.

>    c. Deprecate all list values, but keep supporting a single integer
>       value by using an alternate between list and int.

Single int should also not leak into JSON.

> Picking one of these four options is enough to convert -object to
> keyval. I would suggest doing both 1. and one of the options in 2.

I'm grateful for your analysis.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-11-06  6:34                 ` Markus Armbruster
@ 2021-11-08 12:05                   ` Kevin Wolf
  2021-11-08 12:54                     ` Peter Krempa
  0 siblings, 1 reply; 44+ messages in thread
From: Kevin Wolf @ 2021-11-08 12:05 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Elena Ufimtseva, John Johnson, thuth, pkrempa, Jag Raman,
	swapnil.ingle, john.levon, alex.bennee, qemu-devel,
	Alex Williamson, Marc-André Lureau, Stefan Hajnoczi,
	thanos.makatos, pbonzini, philmd

Am 06.11.2021 um 07:34 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 05.11.2021 um 11:08 hat Markus Armbruster geschrieben:
> >> Kevin Wolf <kwolf@redhat.com> writes:
> >> 
> >> > Am 04.11.2021 um 13:13 hat Markus Armbruster geschrieben:
> >> >> The old syntax almost always has its quirks.  Ideally, we'd somehow get
> >> >> from quirky old to boring new in an orderly manner.  Sadly, we still
> >> >> don't have good solutions for that.  To make progress, we commonly
> >> >> combine JSON new with quirky old.
> >> >> 
> >> >> qemu-system-FOO -object works that way.  object_option_parse() parses
> >> >> either JSON or QemuOpts.  It wraps the former in a QObject visitor, and
> >> >> the latter in an opts visitor.
> >> >> 
> >> >> QemuOpts is flat by design[*], so the opts visitor parses flat QemuOpts
> >> >> from a (possibly non-flat) QAPI type.  How exactly it flattens, and how
> >> >> it handles clashes I don't remember.
> >> >> 
> >> >> Sadly, this means that we get quirky old even for new object types.
> >> >
> >> > For -object in the system emulator (the tools all use the keyval
> >> > visitor, so there it would work as expected), the only reason that we
> >> > need to keep the quirky old code path around is the list handling in
> >> > memory-backend.host-nodes.
> >> >
> >> > The main difficulty there is that the old QemuOpts based code path
> >> > allows specifying the option twice and both of them would effectively be
> >> > combined. Do we have any idea how to replicate this in a keyval parser
> >> > based world?
> >> 
> >> I can see just two clean solutions, but both involve upending a lot of
> >> code.
> >> 
> >> We can fuse keyval parser and visitor to get a schema-directed parser.
> >> 
> >> We can change the abstract keyval syntax to permit repeated keys.  This
> >> means replacing QDict in in the abstract syntax tree, with fallout in
> >> the visitor.
> >> 
> >> Even if we find a practical solution, I don't like the combination of
> >> "you may give the same parameter multiple times, and the last one wins"
> >> and "for a list-valued parameter, the values of repeated parameters are
> >> collected into a list".  Each makes sense on its own.  The combination
> >> not so much.  Inheriting "last one wins" from QemuOpts may have been a
> >> mistake.
> >> 
> >> The keyval way of doing lists (inherited from the block layer's usage of
> >> dotted keys?  I don't remember) requires the user to count, which isn't
> >> exactly nice, either.
> >
> > Yes. If we didn't have to maintain compatibility (or actually as soon as
> > we degrade non-JSON option lists to HMP-level support), I would
> > introduce [] and {} syntax for lists and dicts, even if that means that
> > use of these characters in strings doesn't work any more or only in a
> > limited way. I think this would be the best compromise for usability.
> >
> > Anyway, this doesn't help us with the compatibility problem we're
> > discussing here.
> >
> >> > If not, do we want to use the remaining time until 6.2 to deprecate
> >> > this? The nasty part is that the only syntax that works both now and in
> >> > the future is JSON. We can't easily accept the new keyval syntax while
> >> > still using the QemuOpts based code.
> >> 
> >> What exactly do you propose to deprecate?
> >
> > We can deprecate on two different levels. I think it's useful to do
> > both:
> >
> > 1. Broad deprecation: Stable non-JSON interfaces are degraded to
> >    a HMP-like compatibility promise.
> 
> Calling it "deprecation" might be confusing.  HMP isn't deprecated, it's
> merely not a stable interface.  That's kind of like "deprecated when you
> need stable", but saying "not a stable interface" is clearer.
> 
> When I write "deprecate" below, I mean something like "go use something
> else (no conditions)".  When I mean "use something else when you need
> stable", I write "degrade" (short for "degrade to an HMP-like
> compatibility promise").
> 
> >                                      Obviously, this can only be done
> >    for options that support JSON.
> 
> We can also degrade or even deprecate sugar options in favor of the real
> ones.  Case by case, I guess.

Right. And essentially, the non-JSON form would be considered a sugar
option, even if the option string is the same.

> >                                   Peter Maydell also wants to do this
> >    only after a big user (read: libvirt) has implemented and is
> >    using JSON, basically as a proof that the alternative is working.
> >
> >    So this can certainly be done for -object. I believe libvirt also
> >    uses JSON for -device now, so this should be fine now, too.
> 
> The non-sugar options supporting JSON are -audiodev, -blockdev, -compat,
> -display (partially), -machine (I think), -object.
> 
> -netdev is QAPIfied, but still uses QemuOpts.  Too late for 6.2, I'm
> afraid.

Ok. Not sure about the libvirt status for some of these, but -object and
-device are the ones that I know are going to be in the way in the
future, so degrading their non-JSON form would already be helpful.

> >                                                                Possibly
> >    -drive (in favour of -blockdev), though I'm not completely sure if we
> >    have gotten rid of the final users of -drive. (CCing Peter Krempa for
> >    details.)
> 
> The problem with deprecating -drive is configuring onboard block
> devices.  We need a stable interface for that, and it must be usable
> together with -blockdev.
> 
> We provided such an interface (machine properties) for some onboard
> block devices starting with commit ebc29e1bea "pc: Support firmware
> configuration with -blockdev".  Many more remain, I believe.

So maybe we need to define a form of -drive (or a new option) that would
stay stable and can just take a block node and create a DriveInfo for
it.

Anyway, not for 6.2, so let's ignore this for now.

> >    This degradation of the compatibility promise doesn't tell users what
> >    exactly is going to change, which is why doing the second one, too,
> >    might be nice.
> >
> > 2. Narrow deprecation: We can just deprecate the non-JSON form, or
> >    certain aspects of it, of memory-backend.host-nodes. This is the
> >    specific things that stops us from switching -object to keyval.
> >
> >    a. Deprecate the whole option. If you want to use it and need a
> >       stable interface, you have to use JSON. We'll just switch the
> >       non-JSON form on a flag day. Before it, you need to use QemuOpts +
> >       OptsVisitor syntax for the list; after it, you need to use keyval
> >       syntax.
> 
> I parse "the whole option" as "-object with dotted keys argument".
> Correct?

No, degrading non-JSON -object (it's still QemuOpts, so "dotted keys"
aren't even supported) is already option 1.

This one is specifically "memory-backend.host-nodes on the CLI".

> >    b. Deprecate only repeating the option. memory-backend is changed to
> >       first try visiting a list, and if that fails, it visits a string
> >       and goes through a string visitor locally to keep supporting the
> >       integer range syntax.
> 
> Possible problem: integer range syntax must not leak into the JSON form.
> 
> >    c. Deprecate all list values, but keep supporting a single integer
> >       value by using an alternate between list and int.
> 
> Single int should also not leak into JSON.

Honestly, I don't care about them leaking into JSON and QMP, and I don't
think you should either.

If we insist on a perfectly idiomatic QAPI schema as if we were writing
the objects today, we won't have made any progress even in 10 years.
Many QOM objects have properties that are a mess and it will show in the
schema. Strings that are parsed, alternates to provide different syntax
for the same thing, etc. I don't like any of this, but we're not
designing new interfaces here, but describing existing ones.

I do understand and agree that you want to keep the core infrastructure
reasonably clean from hacks, because it affects everything and we're
touching it a lot. But if an individual property in some QOM object is
in the way, we should just hack around it. Spending a lot of thought on
how to get it cleaned up would have a much higher cost than maintaining
a small hack.

Kevin



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 03/12] vfio-user: define vfio-user-server object
  2021-11-08 12:05                   ` Kevin Wolf
@ 2021-11-08 12:54                     ` Peter Krempa
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Krempa @ 2021-11-08 12:54 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Elena Ufimtseva, John Johnson, thuth, Jag Raman, swapnil.ingle,
	john.levon, alex.bennee, Markus Armbruster, qemu-devel,
	Alex Williamson, Marc-André Lureau, Stefan Hajnoczi,
	thanos.makatos, pbonzini, philmd

On Mon, Nov 08, 2021 at 13:05:10 +0100, Kevin Wolf wrote:
> Am 06.11.2021 um 07:34 hat Markus Armbruster geschrieben:
> > Kevin Wolf <kwolf@redhat.com> writes:
> > > Am 05.11.2021 um 11:08 hat Markus Armbruster geschrieben:
> > >> Kevin Wolf <kwolf@redhat.com> writes:
> > >> > Am 04.11.2021 um 13:13 hat Markus Armbruster geschrieben:

[...]

> > >                                   Peter Maydell also wants to do this
> > >    only after a big user (read: libvirt) has implemented and is
> > >    using JSON, basically as a proof that the alternative is working.
> > >
> > >    So this can certainly be done for -object. I believe libvirt also
> > >    uses JSON for -device now, so this should be fine now, too.
> > 
> > The non-sugar options supporting JSON are -audiodev, -blockdev, -compat,
> > -display (partially), -machine (I think), -object.
> > 
> > -netdev is QAPIfied, but still uses QemuOpts.  Too late for 6.2, I'm
> > afraid.
> 
> Ok. Not sure about the libvirt status for some of these, but -object and

Interresting.

We don't do JSON for -audiodev, -compat, -display or -machine.

-audiodev and -compat are recent enough so I suppose those accepted JSON
always. Converting them will be trivial.

For -display and -machine we'll need a witness to switch to the new
syntax but I think I can convert them in libvirt if it helps qemu to
have a more consistent message.

> -device are the ones that I know are going to be in the way in the
> future, so degrading their non-JSON form would already be helpful.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH v3 11/12] vfio-user: register handlers to facilitate migration
  2021-10-27 18:30   ` Stefan Hajnoczi
@ 2021-12-15 15:49     ` Jag Raman
  0 siblings, 0 replies; 44+ messages in thread
From: Jag Raman @ 2021-12-15 15:49 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Elena Ufimtseva, John Johnson, thuth, swapnil.ingle, john.levon,
	philmd, qemu-devel, Alex Williamson, Marc-André Lureau,
	thanos.makatos, pbonzini, alex.bennee



> On Oct 27, 2021, at 2:30 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Mon, Oct 11, 2021 at 01:31:16AM -0400, Jagannathan Raman wrote:
>> +static void vfu_mig_state_running(vfu_ctx_t *vfu_ctx)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
>> +    static int migrated_devs;
>> +    Error *local_err = NULL;
>> +    int ret;
>> +
>> +    /**
>> +     * TODO: move to VFU_MIGR_STATE_RESUME handler. Presently, the
>> +     * VMSD data from source is not available at RESUME state.
>> +     * Working on a fix for this.
>> +     */
>> +    if (!o->vfu_mig_file) {
>> +        o->vfu_mig_file = qemu_fopen_ops(o, &vfu_mig_fops_load, false);
>> +    }
>> +
>> +    ret = qemu_remote_loadvm(o->vfu_mig_file);
>> +    if (ret) {
>> +        error_setg(&error_abort, "vfu: failed to restore device state");
>> +        return;
>> +    }
>> +
>> +    qemu_file_shutdown(o->vfu_mig_file);
>> +    o->vfu_mig_file = NULL;
>> +
>> +    /* VFU_MIGR_STATE_RUNNING begins here */
>> +    if (++migrated_devs == k->nr_devs) {
> 
> See below about migrated_devs.
> 
>> +        bdrv_invalidate_cache_all(&local_err);
>> +        if (local_err) {
>> +            error_report_err(local_err);
>> +            return;
>> +        }
>> +
>> +        vm_start();
>> +    }
>> +}
>> +
>> +static void vfu_mig_state_stop(vfu_ctx_t *vfu_ctx)
>> +{
>> +    VfuObject *o = vfu_get_private(vfu_ctx);
>> +    VfuObjectClass *k = VFU_OBJECT_GET_CLASS(OBJECT(o));
>> +    static int migrated_devs;
>> +
>> +    /**
>> +     * note: calling bdrv_inactivate_all() is not the best approach.
>> +     *
>> +     *  Ideally, we would identify the block devices (if any) indirectly
>> +     *  linked (such as via a scs-hd device) to each of the migrated devices,
> 
> s/scs/scsi/
> 
>> +     *  and inactivate them individually. This is essential while operating
>> +     *  the server in a storage daemon mode, with devices from different VMs.
>> +     *
>> +     *  However, we currently don't have this capability. As such, we need to
>> +     *  inactivate all devices at the same time when migration is completed.
>> +     */
>> +    if (++migrated_devs == k->nr_devs) {
>> +        bdrv_inactivate_all();
>> +        vm_stop(RUN_STATE_PAUSED);
> 
> The order of these two functions is reversed in migration/migration.c.
> First we pause the VM, then we inactivate disks.
> 
> I think we need to zero migrated_devs in case migration fails and we try
> to migrate again later:
> 
>  migrated_devs = 0;
> 
> This is still not quite right because maybe only a few VfuObjects are
> stopped before migration fails. A different approach for counting
> devices is necessary, like zeroing migrated_devs in
> vfu_mig_state_stop_and_copy().

Hi Stefan,

I understand that it’s important to detect cancellation. However, I’m not sure how
to go about doing it.

The difficulty is that the migration callbacks are per device/VFIO context. There is
no global callback to indicate when all the devices have been migrated successfully.

We could probably have a bitmap of devices’ migration status, and set it when
migration data was sent to client? When migration is cancelled and the state
switches to running, we could check this bit and detect cancellation?

> 
>> @@ -422,6 +722,35 @@ static void vfu_object_machine_done(Notifier *notifier, void *data)
>>         return;
>>     }
>> 
>> +    /*
>> +     * TODO: The 0x20000 number used below is a temporary. We are working on
>> +     *     a cleaner fix for this.
>> +     *
>> +     *     The libvfio-user library assumes that the remote knows the size of
>> +     *     the data to be migrated at boot time, but that is not the case with
>> +     *     VMSDs, as it can contain a variable-size buffer. 0x20000 is used
>> +     *     as a sufficiently large buffer to demonstrate migration, but that
>> +     *     cannot be used as a solution.
>> +     *
>> +     */
> 
> My question from the previous revision was not answered:
> 
>  libvfio-user has the vfu_migration_callbacks_t interface that allows the
>  device to save/load more data regardless of the size of the migration
>  region. I don't see the issue here since the region doesn't need to be
>  sized to fit the savevm data?

In both scenarios at the server end - whether using the migration BAR or
using callbacks, the migration data is transported to the other end using
the BAR. As such we need to specify the BAR’s size during initialization.

In the case of the callbacks, the library translates the BAR access to callbacks.

Thank you very much!
--
Jag


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2021-12-15 16:18 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-11  5:31 [PATCH v3 00/12] vfio-user server in QEMU Jagannathan Raman
2021-10-11  5:31 ` [PATCH v3 01/12] configure, meson: override C compiler for cmake Jagannathan Raman
2021-10-12 10:44   ` Paolo Bonzini
2021-10-11  5:31 ` [PATCH v3 02/12] vfio-user: build library Jagannathan Raman
2021-10-27 15:17   ` Stefan Hajnoczi
2021-10-29 14:17     ` Jag Raman
2021-11-01  9:56       ` Stefan Hajnoczi
2021-10-11  5:31 ` [PATCH v3 03/12] vfio-user: define vfio-user-server object Jagannathan Raman
2021-10-27 15:40   ` Stefan Hajnoczi
2021-10-29 14:42     ` Jag Raman
2021-11-01 10:34       ` Stefan Hajnoczi
2021-11-04 12:13         ` Markus Armbruster
2021-11-04 14:39           ` Kevin Wolf
2021-11-05 10:08             ` Markus Armbruster
2021-11-05 13:19               ` Kevin Wolf
2021-11-05 13:54                 ` Peter Krempa
2021-11-06  6:34                 ` Markus Armbruster
2021-11-08 12:05                   ` Kevin Wolf
2021-11-08 12:54                     ` Peter Krempa
2021-11-04 16:48           ` Stefan Hajnoczi
2021-10-11  5:31 ` [PATCH v3 04/12] vfio-user: instantiate vfio-user context Jagannathan Raman
2021-10-27 15:59   ` Stefan Hajnoczi
2021-10-29 14:59     ` Jag Raman
2021-11-01 10:35       ` Stefan Hajnoczi
2021-10-11  5:31 ` [PATCH v3 05/12] vfio-user: find and init PCI device Jagannathan Raman
2021-10-27 16:05   ` Stefan Hajnoczi
2021-10-29 15:58     ` Jag Raman
2021-11-01 10:38       ` Stefan Hajnoczi
2021-10-11  5:31 ` [PATCH v3 06/12] vfio-user: run vfio-user context Jagannathan Raman
2021-10-27 16:21   ` Stefan Hajnoczi
2021-10-28 21:55     ` John Levon
2021-10-11  5:31 ` [PATCH v3 07/12] vfio-user: handle PCI config space accesses Jagannathan Raman
2021-10-27 16:35   ` Stefan Hajnoczi
2021-10-11  5:31 ` [PATCH v3 08/12] vfio-user: handle DMA mappings Jagannathan Raman
2021-10-11  5:31 ` [PATCH v3 09/12] vfio-user: handle PCI BAR accesses Jagannathan Raman
2021-10-27 16:38   ` Stefan Hajnoczi
2021-10-11  5:31 ` [PATCH v3 10/12] vfio-user: handle device interrupts Jagannathan Raman
2021-10-11  5:31 ` [PATCH v3 11/12] vfio-user: register handlers to facilitate migration Jagannathan Raman
2021-10-27 18:30   ` Stefan Hajnoczi
2021-12-15 15:49     ` Jag Raman
2021-10-11  5:31 ` [PATCH v3 12/12] vfio-user: acceptance test Jagannathan Raman
2021-10-11 22:26   ` Philippe Mathieu-Daudé
2021-10-27 16:42   ` Stefan Hajnoczi
2021-10-27 18:33 ` [PATCH v3 00/12] vfio-user server in QEMU Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).