* [PATCH v4 00/41] vfio: Adopt iommufd
@ 2023-11-02  7:12 Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 01/41] vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c Zhenzhong Duan
                   ` (42 more replies)
  0 siblings, 43 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

Hi,

Thanks to everyone for the guidance and comments on the previous series.
Here is v4 of the pure iommufd support part.

Based on Cédric's suggestion, this series includes an effort to remove
spapr code from container.c. All spapr functions are now moved to spapr.c
or spapr_pci_vfio.c, but a few trivial checks on VFIO_SPAPR_TCE_*_IOMMU
remain; I am not sure removing them is worth introducing many callbacks
and duplicating code. Some functions are moved to spapr.c instead of
spapr_pci_vfio.c to avoid compile issues, because spapr_pci_vfio.c is
arch specific; otherwise we would need to introduce stubs for the spapr
functions that were moved.


PATCH 1-5: Move spapr functions to spapr*.c
PATCH 6-20: Abstract out base container
PATCH 21-24: Introduce spapr container and its specific interface
PATCH 25: Add --enable/--disable-iommufd config support
PATCH 26: Introduce iommufd object
PATCH 27-33: add IOMMUFD container and cdev support
PATCH 34-39: fd passing for IOMMUFD object and cdev
PATCH 40: make VFIOContainerBase parameter const
PATCH 41: Compile out for PPC


We have tested a wide range of combinations, e.g.:
- PCI devices
- FD passing and hot reset (with some tricks)
- device hotplug with legacy and iommufd backends
- with or without vIOMMU for legacy and iommufd backends
- devices linked to different iommufds
- VFIO migration with an E800 net card (no dirty sync support) passthrough
- platform, ccw and ap were only compile-tested due to environment limits


Given some iommufd kernel limitations, the iommufd backend is
not yet fully on par with the legacy backend w.r.t. features like:
- p2p mappings (you will see related error traces)
- dirty page sync
- etc.


qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v4
Based on vfio-next, commit id: f686924775

--------------------------------------------------------------------------

Below is some background and a diagram of the design:

With the introduction of iommufd, the Linux kernel provides a generic
interface for userspace drivers to propagate their DMA mappings to kernel
for assigned devices. This series ports the VFIO devices onto the
/dev/iommu uapi and lets the new backend coexist with the legacy
implementation.

At QEMU level, interactions with the /dev/iommu are abstracted by a new
iommufd object (compiled in with the CONFIG_IOMMUFD option).

Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
linked with an iommufd object. In this series, the vfio-pci device is
given this capability (other VFIO devices are not yet ready):

It gets a new optional parameter named iommufd, which takes the id of
an iommufd object:

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

Note the /dev/iommu and vfio cdev can be externally opened by a
management layer. In such a case the fd is passed:

    -object iommufd,id=iommufd0,fd=22
    -device vfio-pci,iommufd=iommufd0,fd=23

If the fd parameter is not passed, the fd is opened by QEMU.
See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
for a detailed discussion of this requirement.

If no iommufd option is passed to the vfio-pci device, iommufd is not
used and the end-user gets the behavior based on the legacy vfio iommu
interfaces:

    -device vfio-pci,host=0000:02:00.0
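
As a rough illustration of the three invocations above, here is a minimal
sketch of how a management layer might format the options. The helper
format_vfio_pci_args() is hypothetical and not part of QEMU or of this
series; only the option strings are taken from the examples above. Passing
just one of the two fds is not covered by this cover letter, so the sketch
simply refuses that combination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of QEMU: format the options shown above.
 *   iommufd_id == NULL           legacy backend, "-device vfio-pci,host=..."
 *   iommufd_fd and dev_fd  < 0   iommufd backend, QEMU opens the fds itself
 *   iommufd_fd and dev_fd >= 0   iommufd backend, fds pre-opened by a manager
 * Returns 0 on success, -1 on truncation or an unsupported combination.
 */
static int format_vfio_pci_args(char *buf, size_t len, const char *host,
                                const char *iommufd_id,
                                int iommufd_fd, int dev_fd)
{
    int n;

    assert(buf != NULL && len > 0);

    if (iommufd_id == NULL) {
        /* Legacy vfio iommu interface: no iommufd object at all. */
        if (host == NULL) {
            return -1;
        }
        n = snprintf(buf, len, "-device vfio-pci,host=%s", host);
    } else if (iommufd_fd < 0 && dev_fd < 0) {
        /* iommufd backend, QEMU opens /dev/iommu and the vfio cdev. */
        if (host == NULL) {
            return -1;
        }
        n = snprintf(buf, len,
                     "-object iommufd,id=%s "
                     "-device vfio-pci,host=%s,iommufd=%s",
                     iommufd_id, host, iommufd_id);
    } else if (iommufd_fd >= 0 && dev_fd >= 0) {
        /* iommufd backend, both fds were opened externally. */
        n = snprintf(buf, len,
                     "-object iommufd,id=%s,fd=%d "
                     "-device vfio-pci,iommufd=%s,fd=%d",
                     iommufd_id, iommufd_fd, iommufd_id, dev_fd);
    } else {
        /* Only one fd passed: not described above, so not handled here. */
        return -1;
    }

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it with host "0000:02:00.0", id "iommufd0" and both fds set to -1
yields the first iommufd example; passing fds 22 and 23 yields the fd
passing example.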

While the legacy kernel interface is group-centric, the new iommufd
interface is device-centric, relying on device fd and iommufd.

To support both interfaces in the QEMU VFIO device we reworked the vfio
container abstraction so that the generic VFIO code can use either
backend.

The VFIOContainer object becomes a base object derived into
a) the legacy VFIO container and
b) the new iommufd based container.

The base object implements generic code, such as code related to
memory_listener and address space management, whereas the derived
objects implement callbacks specific to each backend (BE), legacy or
iommufd. Each backend has its own way to set up the secure context
and its own DMA management interface. The diagram below shows how this
looks with both BEs.

                    VFIO                           AddressSpace/Memory
    +-------+  +----------+  +-----+  +-----+
    |  pci  |  | platform |  |  ap |  | ccw |
    +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
        |           |           |        |        |   AddressSpace       |
        |           |           |        |        +------------+---------+
    +---V-----------V-----------V--------V----+               /
    |           VFIOAddressSpace              | <------------+
    |                  |                      |  MemoryListener
    |          VFIOContainer list             |
    +-------+----------------------------+----+
            |                            |
            |                            |
    +-------V------+            +--------V----------+
    |   iommufd    |            |    vfio legacy    |
    |  container   |            |     container     |
    +-------+------+            +--------+----------+
            |                            |
            | /dev/iommu                 | /dev/vfio/vfio
            | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
Userspace   |                            |
============+============================+===========================
Kernel      |  device fd                 |
            +---------------+            | group/container fd
            | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
            |  ATTACH_IOAS) |            | device fd
            |               |            |
            |       +-------V------------V-----------------+
    iommufd |       |                vfio                  |
(map/unmap  |       +---------+--------------------+-------+
ioas_copy)  |                 |                    | map/unmap
            |                 |                    |
     +------V------+    +-----V------+      +------V--------+
     | iommufd core|    |  device    |      |  vfio iommu   |
     +-------------+    +------------+      +---------------+

[Secure Context setup]
- iommufd BE: uses device fd and iommufd to set up the secure context
              (bind_iommufd, attach_ioas)
- vfio legacy BE: uses group fd and container fd to set up the secure
                  context (set_container, set_iommu)
[Device access]
- iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
- vfio legacy BE: device fd is retrieved from group fd ioctl
[DMA Mapping flow]
1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
2. VFIO populates DMA map/unmap via the container BEs
   *) iommufd BE: uses iommufd
   *) vfio legacy BE: uses container fd
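
The split between generic code and backend callbacks can be sketched as
follows. This is only an illustration under stated assumptions: every Demo*
name is hypothetical and does not reflect the actual VFIOContainerBase or
VFIOIOMMUOps definitions in the patches, and the backends here only record
which fd type would serve the request instead of issuing ioctls.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* All Demo* names are hypothetical; this is not QEMU's actual layout. */
typedef struct DemoContainerBase DemoContainerBase;

/* Backend callbacks, filled in once per backend. */
typedef struct DemoIOMMUOps {
    int (*dma_map)(DemoContainerBase *bc, uint64_t iova, uint64_t size);
    int (*dma_unmap)(DemoContainerBase *bc, uint64_t iova, uint64_t size);
} DemoIOMMUOps;

/* Generic state shared by every backend. */
struct DemoContainerBase {
    const DemoIOMMUOps *ops;
    const char *last_path;  /* which fd type served the last request */
    unsigned int nr_maps;   /* number of live mappings */
};

typedef struct DemoLegacyContainer {
    DemoContainerBase base; /* first member, so a base pointer casts back */
    int container_fd;
} DemoLegacyContainer;

typedef struct DemoIommufdContainer {
    DemoContainerBase base; /* first member, so a base pointer casts back */
    int iommufd;
    uint32_t ioas_id;
} DemoIommufdContainer;

/* Legacy backend: a real one would ioctl() on the container fd. */
static int demo_legacy_op(DemoContainerBase *bc, uint64_t iova, uint64_t size)
{
    DemoLegacyContainer *c = (DemoLegacyContainer *)bc;

    (void)iova;
    (void)size;
    assert(c->container_fd >= 0);
    bc->last_path = "container fd";
    return 0;
}

/* iommufd backend: a real one would ioctl() on the iommufd. */
static int demo_iommufd_op(DemoContainerBase *bc, uint64_t iova, uint64_t size)
{
    DemoIommufdContainer *c = (DemoIommufdContainer *)bc;

    (void)iova;
    (void)size;
    assert(c->iommufd >= 0);
    bc->last_path = "iommufd";
    return 0;
}

static const DemoIOMMUOps demo_legacy_ops = {
    .dma_map = demo_legacy_op,
    .dma_unmap = demo_legacy_op,
};

static const DemoIOMMUOps demo_iommufd_ops = {
    .dma_map = demo_iommufd_op,
    .dma_unmap = demo_iommufd_op,
};

/* Generic entry points: bookkeeping here, the actual work in the backend. */
static int demo_container_dma_map(DemoContainerBase *bc,
                                  uint64_t iova, uint64_t size)
{
    int ret;

    assert(bc->ops && bc->ops->dma_map);
    ret = bc->ops->dma_map(bc, iova, size);
    if (ret == 0) {
        bc->nr_maps++;
    }
    return ret;
}

static int demo_container_dma_unmap(DemoContainerBase *bc,
                                    uint64_t iova, uint64_t size)
{
    int ret;

    assert(bc->ops && bc->ops->dma_unmap);
    if (bc->nr_maps == 0) {
        return -1; /* nothing mapped */
    }
    ret = bc->ops->dma_unmap(bc, iova, size);
    if (ret == 0) {
        bc->nr_maps--;
    }
    return ret;
}
```

The point of the design is that callers such as the memory listener only
ever hold a base pointer, so the same map/unmap path works regardless of
which backend the device was attached through.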


Changelog:
v4:
- add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus)
- add doc for default case without fd (Markus)
- Fix build issue reported by Markus and Cédric
- Simply use SPDX identifier in new file (Cédric)
- make vfio_container_init/destroy helper a separate patch (Cédric)
- make vrdl_list movement a separate patch (Cédric)
- add const for some callback parameters (Cédric)
- add g_assert in VFIOIOMMUOps callback (Cédric)
- introduce pci_hot_reset callback (Cédric)
- remove VFIOIOMMUSpaprOps (Cédric)
- initialize g_autofree to NULL (Cédric)
- adjust func name prefix and trace event in iommufd.c (Cédric)
- add RB

v3:
- Rename base container as VFIOContainerBase and legacy container as container (Cédric)
- Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric)
- Cleanup container.c by introducing spapr backend and move spapr code out (Cédric)
- Introduce vfio_iommu_spapr_ops (Cédric)
- Add doc of iommufd in qom.json and have iommufd member sorted (Markus)
- patch19 and patch21 are each split into two parts to facilitate review

v2:
- patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric
- add fd passing to platform/ap/ccw vfio device
- add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
- rename char_dev.h to chardev_open.h for same naming scheme per Daniel
- add full copyright per Daniel and Jason


Note: the changelog entries below are from the full IOMMUFD series:

v1:
- Alloc hwpt instead of using auto hwpt
- elaborate iommufd code per Nicolin
- consolidate two patches and drop as.c
- typo fixes and function renames

rfcv4:
- rebase on top of v8.0.3
- Add one patch from Yi which is about vfio device add in kvm
- Remove IOAS_COPY optimization and focus on functions in this patchset
- Fix wrong name issue reported and fix suggested by Matthew
- Fix compilation issue reported and fix suggested by Nicolin
- Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
granularity
- Add dev_iter_next() callback to avoid adding so many callback
  at container scope, add VFIODevice.hwpt to support that
- Restore all functions back to common from container whenever possible,
  mainly migration and reset related functions
- Add --enable/disable-iommufd config option, enabled by default in linux
- Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
- Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
- vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
redundant code
- Add FD passing support for vfio device backed by IOMMUFD
- Fix hot unplug resource leak issue in vfio_legacy_detach_device()
- Fix FD leak in vfio_get_devicefd()

rfcv3:
- rebase on top of v7.2.0
- Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
  VFIO backends
- Fix use after free in error path, reported by Alister
- Split common.c in several steps to ease the review

rfcv2:
- remove the first three patches of rfcv1
- add open cdev helper suggested by Jason
- remove the QOMification of the VFIOContainer and simply use standard ops
(David)
- add "-object iommufd" suggested by Alex

Thanks
Zhenzhong

Eric Auger (11):
  vfio/container: Switch to dma_map|unmap API
  vfio/common: Move giommu_list in base container
  vfio/container: Move space field to base container
  vfio/container: Switch to IOMMU BE
    set_dirty_page_tracking/query_dirty_bitmap API
  vfio/container: Convert functions to base container
  vfio/container: Move pgsizes and dma_max_mappings to base container
  vfio/container: Move listener to base container
  vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size to base
    container
  vfio/container: Implement attach/detach_device
  backends/iommufd: Introduce the iommufd object
  vfio/pci: Allow the selection of a given iommu backend

Yi Liu (2):
  util/char_dev: Add open_cdev()
  vfio/iommufd: Implement the iommufd backend

Zhenzhong Duan (28):
  vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c
  vfio/container: Move vfio_container_add/del_section_window into
    spapr.c
  vfio/container: Move spapr specific init/deinit into spapr.c
  vfio/spapr: Make vfio_spapr_create/remove_window static
  vfio/common: Move vfio_host_win_add/del into spapr.c
  vfio: Introduce base object for VFIOContainer and targeted interface
  vfio/container: Introduce a empty VFIOIOMMUOps
  vfio/common: Introduce vfio_container_init/destroy helper
  vfio/container: Move per container device list in base container
  vfio/container: Move vrdl_list to base container
  vfio/container: Move iova_ranges to base container
  vfio/spapr: Introduce spapr backend and target interface
  vfio/spapr: switch to spapr IOMMU BE add/del_section_window
  vfio/spapr: Move prereg_listener into spapr container
  vfio/spapr: Move hostwin_list into spapr container
  Add iommufd configure option
  vfio/iommufd: Relax assert check for iommufd backend
  vfio/iommufd: Add support for iova_ranges
  vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
  vfio/pci: Introduce a vfio pci hot reset interface
  vfio/iommufd: Enable pci hot reset through iommufd cdev interface
  vfio/pci: Make vfio cdev pre-openable by passing a file handle
  vfio: Allow the selection of a given iommu backend for platform ap and
    ccw
  vfio/platform: Make vfio cdev pre-openable by passing a file handle
  vfio/ap: Make vfio cdev pre-openable by passing a file handle
  vfio/ccw: Make vfio cdev pre-openable by passing a file handle
  vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps
    callbacks
  vfio: Compile out iommufd for PPC target

 MAINTAINERS                           |  13 +
 meson.build                           |   6 +
 qapi/qom.json                         |  22 +
 hw/vfio/pci.h                         |   6 +
 include/hw/vfio/vfio-common.h         | 118 ++---
 include/hw/vfio/vfio-container-base.h | 121 +++++
 include/hw/vfio/vfio-platform.h       |   1 +
 include/hw/vfio/vfio.h                |   7 -
 include/qemu/chardev_open.h           |  16 +
 include/sysemu/iommufd.h              |  46 ++
 backends/iommufd-stub.c               |  59 +++
 backends/iommufd.c                    | 257 ++++++++++
 hw/ppc/spapr_pci_vfio.c               | 100 +++-
 hw/vfio/ap.c                          |  38 +-
 hw/vfio/ccw.c                         |  40 +-
 hw/vfio/common.c                      | 330 ++++++------
 hw/vfio/container-base.c              | 101 ++++
 hw/vfio/container.c                   | 443 ++++------------
 hw/vfio/helpers.c                     |  34 +-
 hw/vfio/iommufd.c                     | 697 ++++++++++++++++++++++++++
 hw/vfio/pci.c                         | 112 +++--
 hw/vfio/platform.c                    |  45 +-
 hw/vfio/spapr.c                       | 338 ++++++++++++-
 util/chardev_open.c                   |  81 +++
 backends/Kconfig                      |   4 +
 backends/meson.build                  |   5 +
 backends/trace-events                 |  12 +
 hw/vfio/meson.build                   |   4 +
 hw/vfio/trace-events                  |  18 +-
 meson_options.txt                     |   2 +
 qemu-options.hx                       |  13 +
 scripts/meson-buildoptions.sh         |   3 +
 util/meson.build                      |   1 +
 33 files changed, 2403 insertions(+), 690 deletions(-)
 create mode 100644 include/hw/vfio/vfio-container-base.h
 delete mode 100644 include/hw/vfio/vfio.h
 create mode 100644 include/qemu/chardev_open.h
 create mode 100644 include/sysemu/iommufd.h
 create mode 100644 backends/iommufd-stub.c
 create mode 100644 backends/iommufd.c
 create mode 100644 hw/vfio/container-base.c
 create mode 100644 hw/vfio/iommufd.c
 create mode 100644 util/chardev_open.c

-- 
2.34.1




* [PATCH v4 01/41] vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 02/41] vfio/container: Move vfio_container_add/del_section_window into spapr.c Zhenzhong Duan
                   ` (41 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Eric Farman, Nicholas Piggin,
	Daniel Henrique Barboza, Cédric Le Goater, David Gibson,
	Harsh Prateek Bora, Tony Krowiak, Halil Pasic, Jason Herne,
	Thomas Huth, Matthew Rosato, open list:sPAPR (pseries),
	open list:vfio-ap

With vfio_eeh_as_ok/vfio_eeh_as_op moved and made static,
vfio.h becomes empty and is deleted.

No functional changes intended.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Acked-by: Eric Farman <farman@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 include/hw/vfio/vfio.h  |   7 ---
 hw/ppc/spapr_pci_vfio.c | 100 +++++++++++++++++++++++++++++++++++++++-
 hw/vfio/ap.c            |   1 -
 hw/vfio/ccw.c           |   1 -
 hw/vfio/common.c        |   1 -
 hw/vfio/container.c     |  98 ---------------------------------------
 hw/vfio/helpers.c       |   1 -
 7 files changed, 99 insertions(+), 110 deletions(-)
 delete mode 100644 include/hw/vfio/vfio.h

diff --git a/include/hw/vfio/vfio.h b/include/hw/vfio/vfio.h
deleted file mode 100644
index 86248f5436..0000000000
--- a/include/hw/vfio/vfio.h
+++ /dev/null
@@ -1,7 +0,0 @@
-#ifndef HW_VFIO_H
-#define HW_VFIO_H
-
-bool vfio_eeh_as_ok(AddressSpace *as);
-int vfio_eeh_as_op(AddressSpace *as, uint32_t op);
-
-#endif
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index 9016720547..f283f7e38d 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -18,14 +18,112 @@
  */
 
 #include "qemu/osdep.h"
+#include <sys/ioctl.h>
 #include <linux/vfio.h>
 #include "hw/ppc/spapr.h"
 #include "hw/pci-host/spapr.h"
 #include "hw/pci/msix.h"
 #include "hw/pci/pci_device.h"
-#include "hw/vfio/vfio.h"
+#include "hw/vfio/vfio-common.h"
 #include "qemu/error-report.h"
 
+/*
+ * Interfaces for IBM EEH (Enhanced Error Handling)
+ */
+static bool vfio_eeh_container_ok(VFIOContainer *container)
+{
+    /*
+     * As of 2016-03-04 (linux-4.5) the host kernel EEH/VFIO
+     * implementation is broken if there are multiple groups in a
+     * container.  The hardware works in units of Partitionable
+     * Endpoints (== IOMMU groups) and the EEH operations naively
+     * iterate across all groups in the container, without any logic
+     * to make sure the groups have their state synchronized.  For
+     * certain operations (ENABLE) that might be ok, until an error
+     * occurs, but for others (GET_STATE) it's clearly broken.
+     */
+
+    /*
+     * XXX Once fixed kernels exist, test for them here
+     */
+
+    if (QLIST_EMPTY(&container->group_list)) {
+        return false;
+    }
+
+    if (QLIST_NEXT(QLIST_FIRST(&container->group_list), container_next)) {
+        return false;
+    }
+
+    return true;
+}
+
+static int vfio_eeh_container_op(VFIOContainer *container, uint32_t op)
+{
+    struct vfio_eeh_pe_op pe_op = {
+        .argsz = sizeof(pe_op),
+        .op = op,
+    };
+    int ret;
+
+    if (!vfio_eeh_container_ok(container)) {
+        error_report("vfio/eeh: EEH_PE_OP 0x%x: "
+                     "kernel requires a container with exactly one group", op);
+        return -EPERM;
+    }
+
+    ret = ioctl(container->fd, VFIO_EEH_PE_OP, &pe_op);
+    if (ret < 0) {
+        error_report("vfio/eeh: EEH_PE_OP 0x%x failed: %m", op);
+        return -errno;
+    }
+
+    return ret;
+}
+
+static VFIOContainer *vfio_eeh_as_container(AddressSpace *as)
+{
+    VFIOAddressSpace *space = vfio_get_address_space(as);
+    VFIOContainer *container = NULL;
+
+    if (QLIST_EMPTY(&space->containers)) {
+        /* No containers to act on */
+        goto out;
+    }
+
+    container = QLIST_FIRST(&space->containers);
+
+    if (QLIST_NEXT(container, next)) {
+        /*
+         * We don't yet have logic to synchronize EEH state across
+         * multiple containers
+         */
+        container = NULL;
+        goto out;
+    }
+
+out:
+    vfio_put_address_space(space);
+    return container;
+}
+
+static bool vfio_eeh_as_ok(AddressSpace *as)
+{
+    VFIOContainer *container = vfio_eeh_as_container(as);
+
+    return (container != NULL) && vfio_eeh_container_ok(container);
+}
+
+static int vfio_eeh_as_op(AddressSpace *as, uint32_t op)
+{
+    VFIOContainer *container = vfio_eeh_as_container(as);
+
+    if (!container) {
+        return -ENODEV;
+    }
+    return vfio_eeh_container_op(container, op);
+}
+
 bool spapr_phb_eeh_available(SpaprPhbState *sphb)
 {
     return vfio_eeh_as_ok(&sphb->iommu_as);
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 5f257bffb9..bbf69ff55a 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -14,7 +14,6 @@
 #include <linux/vfio.h>
 #include <sys/ioctl.h>
 #include "qapi/error.h"
-#include "hw/vfio/vfio.h"
 #include "hw/vfio/vfio-common.h"
 #include "hw/s390x/ap-device.h"
 #include "qemu/error-report.h"
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 6623ae237b..d857bb8d0f 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -20,7 +20,6 @@
 #include <sys/ioctl.h>
 
 #include "qapi/error.h"
-#include "hw/vfio/vfio.h"
 #include "hw/vfio/vfio-common.h"
 #include "hw/s390x/s390-ccw.h"
 #include "hw/s390x/vfio-ccw.h"
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9c5c6433f2..e72055e752 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -26,7 +26,6 @@
 #include <linux/vfio.h>
 
 #include "hw/vfio/vfio-common.h"
-#include "hw/vfio/vfio.h"
 #include "hw/vfio/pci.h"
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index fc88222377..83c0f05bba 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -26,7 +26,6 @@
 #include <linux/vfio.h>
 
 #include "hw/vfio/vfio-common.h"
-#include "hw/vfio/vfio.h"
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
 #include "exec/ram_addr.h"
@@ -1011,103 +1010,6 @@ static void vfio_put_base_device(VFIODevice *vbasedev)
     close(vbasedev->fd);
 }
 
-/*
- * Interfaces for IBM EEH (Enhanced Error Handling)
- */
-static bool vfio_eeh_container_ok(VFIOContainer *container)
-{
-    /*
-     * As of 2016-03-04 (linux-4.5) the host kernel EEH/VFIO
-     * implementation is broken if there are multiple groups in a
-     * container.  The hardware works in units of Partitionable
-     * Endpoints (== IOMMU groups) and the EEH operations naively
-     * iterate across all groups in the container, without any logic
-     * to make sure the groups have their state synchronized.  For
-     * certain operations (ENABLE) that might be ok, until an error
-     * occurs, but for others (GET_STATE) it's clearly broken.
-     */
-
-    /*
-     * XXX Once fixed kernels exist, test for them here
-     */
-
-    if (QLIST_EMPTY(&container->group_list)) {
-        return false;
-    }
-
-    if (QLIST_NEXT(QLIST_FIRST(&container->group_list), container_next)) {
-        return false;
-    }
-
-    return true;
-}
-
-static int vfio_eeh_container_op(VFIOContainer *container, uint32_t op)
-{
-    struct vfio_eeh_pe_op pe_op = {
-        .argsz = sizeof(pe_op),
-        .op = op,
-    };
-    int ret;
-
-    if (!vfio_eeh_container_ok(container)) {
-        error_report("vfio/eeh: EEH_PE_OP 0x%x: "
-                     "kernel requires a container with exactly one group", op);
-        return -EPERM;
-    }
-
-    ret = ioctl(container->fd, VFIO_EEH_PE_OP, &pe_op);
-    if (ret < 0) {
-        error_report("vfio/eeh: EEH_PE_OP 0x%x failed: %m", op);
-        return -errno;
-    }
-
-    return ret;
-}
-
-static VFIOContainer *vfio_eeh_as_container(AddressSpace *as)
-{
-    VFIOAddressSpace *space = vfio_get_address_space(as);
-    VFIOContainer *container = NULL;
-
-    if (QLIST_EMPTY(&space->containers)) {
-        /* No containers to act on */
-        goto out;
-    }
-
-    container = QLIST_FIRST(&space->containers);
-
-    if (QLIST_NEXT(container, next)) {
-        /*
-         * We don't yet have logic to synchronize EEH state across
-         * multiple containers
-         */
-        container = NULL;
-        goto out;
-    }
-
-out:
-    vfio_put_address_space(space);
-    return container;
-}
-
-bool vfio_eeh_as_ok(AddressSpace *as)
-{
-    VFIOContainer *container = vfio_eeh_as_container(as);
-
-    return (container != NULL) && vfio_eeh_container_ok(container);
-}
-
-int vfio_eeh_as_op(AddressSpace *as, uint32_t op)
-{
-    VFIOContainer *container = vfio_eeh_as_container(as);
-
-    if (!container) {
-        return -ENODEV;
-    }
-    return vfio_eeh_container_op(container, op);
-}
-
 static int vfio_device_groupid(VFIODevice *vbasedev, Error **errp)
 {
     char *tmp, group_path[PATH_MAX], *group_name;
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index 7e5da21b31..168847e7c5 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -23,7 +23,6 @@
 #include <sys/ioctl.h>
 
 #include "hw/vfio/vfio-common.h"
-#include "hw/vfio/vfio.h"
 #include "hw/hw.h"
 #include "trace.h"
 #include "qapi/error.h"
-- 
2.34.1




* [PATCH v4 02/41] vfio/container: Move vfio_container_add/del_section_window into spapr.c
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 01/41] vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 03/41] vfio/container: Move spapr specific init/deinit " Zhenzhong Duan
                   ` (40 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	David Gibson, Harsh Prateek Bora, open list:sPAPR (pseries)

vfio_container_add/del_section_window are spapr specific functions,
so move them into spapr.c to make container.c cleaner.

No functional changes intended.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 hw/vfio/container.c | 90 ---------------------------------------------
 hw/vfio/spapr.c     | 90 +++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+), 90 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 83c0f05bba..7a3f005d1b 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -20,9 +20,6 @@
 
 #include "qemu/osdep.h"
 #include <sys/ioctl.h>
-#ifdef CONFIG_KVM
-#include <linux/kvm.h>
-#endif
 #include <linux/vfio.h>
 
 #include "hw/vfio/vfio-common.h"
@@ -32,7 +29,6 @@
 #include "hw/hw.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
-#include "sysemu/kvm.h"
 #include "sysemu/reset.h"
 #include "trace.h"
 #include "qapi/error.h"
@@ -204,92 +200,6 @@ int vfio_dma_map(VFIOContainer *container, hwaddr iova,
     return -errno;
 }
 
-int vfio_container_add_section_window(VFIOContainer *container,
-                                      MemoryRegionSection *section,
-                                      Error **errp)
-{
-    VFIOHostDMAWindow *hostwin;
-    hwaddr pgsize = 0;
-    int ret;
-
-    if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
-        return 0;
-    }
-
-    /* For now intersections are not allowed, we may relax this later */
-    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-        if (ranges_overlap(hostwin->min_iova,
-                           hostwin->max_iova - hostwin->min_iova + 1,
-                           section->offset_within_address_space,
-                           int128_get64(section->size))) {
-            error_setg(errp,
-                "region [0x%"PRIx64",0x%"PRIx64"] overlaps with existing"
-                "host DMA window [0x%"PRIx64",0x%"PRIx64"]",
-                section->offset_within_address_space,
-                section->offset_within_address_space +
-                    int128_get64(section->size) - 1,
-                hostwin->min_iova, hostwin->max_iova);
-            return -EINVAL;
-        }
-    }
-
-    ret = vfio_spapr_create_window(container, section, &pgsize);
-    if (ret) {
-        error_setg_errno(errp, -ret, "Failed to create SPAPR window");
-        return ret;
-    }
-
-    vfio_host_win_add(container, section->offset_within_address_space,
-                      section->offset_within_address_space +
-                      int128_get64(section->size) - 1, pgsize);
-#ifdef CONFIG_KVM
-    if (kvm_enabled()) {
-        VFIOGroup *group;
-        IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
-        struct kvm_vfio_spapr_tce param;
-        struct kvm_device_attr attr = {
-            .group = KVM_DEV_VFIO_GROUP,
-            .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
-            .addr = (uint64_t)(unsigned long)&param,
-        };
-
-        if (!memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_SPAPR_TCE_FD,
-                                          &param.tablefd)) {
-            QLIST_FOREACH(group, &container->group_list, container_next) {
-                param.groupfd = group->fd;
-                if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
-                    error_setg_errno(errp, errno,
-                                     "vfio: failed GROUP_SET_SPAPR_TCE for "
-                                     "KVM VFIO device %d and group fd %d",
-                                     param.tablefd, param.groupfd);
-                    return -errno;
-                }
-                trace_vfio_spapr_group_attach(param.groupfd, param.tablefd);
-            }
-        }
-    }
-#endif
-    return 0;
-}
-
-void vfio_container_del_section_window(VFIOContainer *container,
-                                       MemoryRegionSection *section)
-{
-    if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
-        return;
-    }
-
-    vfio_spapr_remove_window(container,
-                             section->offset_within_address_space);
-    if (vfio_host_win_del(container,
-                          section->offset_within_address_space,
-                          section->offset_within_address_space +
-                          int128_get64(section->size) - 1) < 0) {
-        hw_error("%s: Cannot delete missing window at %"HWADDR_PRIx,
-                 __func__, section->offset_within_address_space);
-    }
-}
-
 int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
 {
     int ret;
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 9ec1e95f6d..9a7517c042 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -11,6 +11,10 @@
 #include "qemu/osdep.h"
 #include <sys/ioctl.h>
 #include <linux/vfio.h>
+#ifdef CONFIG_KVM
+#include <linux/kvm.h>
+#endif
+#include "sysemu/kvm.h"
 
 #include "hw/vfio/vfio-common.h"
 #include "hw/hw.h"
@@ -253,3 +257,89 @@ int vfio_spapr_remove_window(VFIOContainer *container,
 
     return 0;
 }
+
+int vfio_container_add_section_window(VFIOContainer *container,
+                                      MemoryRegionSection *section,
+                                      Error **errp)
+{
+    VFIOHostDMAWindow *hostwin;
+    hwaddr pgsize = 0;
+    int ret;
+
+    if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
+        return 0;
+    }
+
+    /* For now intersections are not allowed, we may relax this later */
+    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+        if (ranges_overlap(hostwin->min_iova,
+                           hostwin->max_iova - hostwin->min_iova + 1,
+                           section->offset_within_address_space,
+                           int128_get64(section->size))) {
+            error_setg(errp,
+                "region [0x%"PRIx64",0x%"PRIx64"] overlaps with existing"
+                "host DMA window [0x%"PRIx64",0x%"PRIx64"]",
+                section->offset_within_address_space,
+                section->offset_within_address_space +
+                    int128_get64(section->size) - 1,
+                hostwin->min_iova, hostwin->max_iova);
+            return -EINVAL;
+        }
+    }
+
+    ret = vfio_spapr_create_window(container, section, &pgsize);
+    if (ret) {
+        error_setg_errno(errp, -ret, "Failed to create SPAPR window");
+        return ret;
+    }
+
+    vfio_host_win_add(container, section->offset_within_address_space,
+                      section->offset_within_address_space +
+                      int128_get64(section->size) - 1, pgsize);
+#ifdef CONFIG_KVM
+    if (kvm_enabled()) {
+        VFIOGroup *group;
+        IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
+        struct kvm_vfio_spapr_tce param;
+        struct kvm_device_attr attr = {
+            .group = KVM_DEV_VFIO_GROUP,
+            .attr = KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE,
+            .addr = (uint64_t)(unsigned long)&param,
+        };
+
+        if (!memory_region_iommu_get_attr(iommu_mr, IOMMU_ATTR_SPAPR_TCE_FD,
+                                          &param.tablefd)) {
+            QLIST_FOREACH(group, &container->group_list, container_next) {
+                param.groupfd = group->fd;
+                if (ioctl(vfio_kvm_device_fd, KVM_SET_DEVICE_ATTR, &attr)) {
+                    error_setg_errno(errp, errno,
+                                     "vfio: failed GROUP_SET_SPAPR_TCE for "
+                                     "KVM VFIO device %d and group fd %d",
+                                     param.tablefd, param.groupfd);
+                    return -errno;
+                }
+                trace_vfio_spapr_group_attach(param.groupfd, param.tablefd);
+            }
+        }
+    }
+#endif
+    return 0;
+}
+
+void vfio_container_del_section_window(VFIOContainer *container,
+                                       MemoryRegionSection *section)
+{
+    if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
+        return;
+    }
+
+    vfio_spapr_remove_window(container,
+                             section->offset_within_address_space);
+    if (vfio_host_win_del(container,
+                          section->offset_within_address_space,
+                          section->offset_within_address_space +
+                          int128_get64(section->size) - 1) < 0) {
+        hw_error("%s: Cannot delete missing window at %"HWADDR_PRIx,
+                 __func__, section->offset_within_address_space);
+    }
+}
-- 
2.34.1




* [PATCH v4 03/41] vfio/container: Move spapr specific init/deinit into spapr.c
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 01/41] vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 02/41] vfio/container: Move vfio_container_add/del_section_window into spapr.c Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 04/41] vfio/spapr: Make vfio_spapr_create/remove_window static Zhenzhong Duan
                   ` (39 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	David Gibson, Harsh Prateek Bora, open list:sPAPR (pseries)

Move the spapr-specific init/deinit code into spapr.c and wrap it in
vfio_spapr_container_init/deinit. This further reduces the spapr
footprint in container.c and lets vfio_prereg_listener be made static.

vfio_listener_release is no longer needed once prereg_listener is
moved out, so remove it.

No functional changes intended.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
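For review purposes, the error-handling shape of the new init helper
can be sketched in isolation. This is a simplified illustration under
stated assumptions, not the patch code: DemoContainer,
demo_container_init, demo_container_deinit and the fail_get_info knob
are hypothetical stand-ins, and the listener registration is reduced to
a boolean flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the container state touched by init/deinit. */
typedef struct DemoContainer {
    bool v2;                   /* models iommu_type == SPAPR TCE v2 */
    bool listener_registered;  /* models the prereg listener being live */
    bool fail_get_info;        /* test knob: make the "get info" step fail */
} DemoContainer;

/*
 * Register the listener (v2 only), run a later step that may fail, and
 * undo the registration through a single exit label on any failure.
 */
int demo_container_init(DemoContainer *c)
{
    int ret;

    assert(c);

    if (c->v2) {
        c->listener_registered = true;
    }

    if (c->fail_get_info) {
        ret = -1;
        goto listener_unregister_exit;
    }

    return 0;

listener_unregister_exit:
    if (c->v2) {
        c->listener_registered = false;
    }
    return ret;
}

/* Mirror of init for the normal teardown path. */
void demo_container_deinit(DemoContainer *c)
{
    assert(c);
    if (c->v2) {
        c->listener_registered = false;
    }
}
```

The point of the single label is that every failure after registration
leaves the container without a live listener, so the caller only has to
handle the returned error.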
 include/hw/vfio/vfio-common.h |  4 +-
 hw/vfio/container.c           | 82 +++++------------------------------
 hw/vfio/spapr.c               | 81 +++++++++++++++++++++++++++++++++-
 3 files changed, 95 insertions(+), 72 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 0c3d390e8b..ed5a8e4754 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -225,11 +225,14 @@ int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start);
 int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
                             hwaddr iova, hwaddr size);
 
+/* SPAPR specific */
 int vfio_container_add_section_window(VFIOContainer *container,
                                       MemoryRegionSection *section,
                                       Error **errp);
 void vfio_container_del_section_window(VFIOContainer *container,
                                        MemoryRegionSection *section);
+int vfio_spapr_container_init(VFIOContainer *container, Error **errp);
+void vfio_spapr_container_deinit(VFIOContainer *container);
 
 void vfio_disable_irqindex(VFIODevice *vbasedev, int index);
 void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index);
@@ -289,7 +292,6 @@ vfio_get_device_info_cap(struct vfio_device_info *info, uint16_t id);
 struct vfio_info_cap_header *
 vfio_get_cap(void *ptr, uint32_t cap_offset, uint16_t id);
 #endif
-extern const MemoryListener vfio_prereg_listener;
 
 int vfio_spapr_create_window(VFIOContainer *container,
                              MemoryRegionSection *section,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 7a3f005d1b..204b244b11 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -264,14 +264,6 @@ int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
     return ret;
 }
 
-static void vfio_listener_release(VFIOContainer *container)
-{
-    memory_listener_unregister(&container->listener);
-    if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
-        memory_listener_unregister(&container->prereg_listener);
-    }
-}
-
 static struct vfio_info_cap_header *
 vfio_get_iommu_type1_info_cap(struct vfio_iommu_type1_info *info, uint16_t id)
 {
@@ -612,69 +604,11 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     case VFIO_SPAPR_TCE_v2_IOMMU:
     case VFIO_SPAPR_TCE_IOMMU:
     {
-        struct vfio_iommu_spapr_tce_info info;
-        bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
-
-        /*
-         * The host kernel code implementing VFIO_IOMMU_DISABLE is called
-         * when container fd is closed so we do not call it explicitly
-         * in this file.
-         */
-        if (!v2) {
-            ret = ioctl(fd, VFIO_IOMMU_ENABLE);
-            if (ret) {
-                error_setg_errno(errp, errno, "failed to enable container");
-                ret = -errno;
-                goto enable_discards_exit;
-            }
-        } else {
-            container->prereg_listener = vfio_prereg_listener;
-
-            memory_listener_register(&container->prereg_listener,
-                                     &address_space_memory);
-            if (container->error) {
-                memory_listener_unregister(&container->prereg_listener);
-                ret = -1;
-                error_propagate_prepend(errp, container->error,
-                    "RAM memory listener initialization failed: ");
-                goto enable_discards_exit;
-            }
-        }
-
-        info.argsz = sizeof(info);
-        ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
+        ret = vfio_spapr_container_init(container, errp);
         if (ret) {
-            error_setg_errno(errp, errno,
-                             "VFIO_IOMMU_SPAPR_TCE_GET_INFO failed");
-            ret = -errno;
-            if (v2) {
-                memory_listener_unregister(&container->prereg_listener);
-            }
             goto enable_discards_exit;
         }
-
-        if (v2) {
-            container->pgsizes = info.ddw.pgsizes;
-            /*
-             * There is a default window in just created container.
-             * To make region_add/del simpler, we better remove this
-             * window now and let those iommu_listener callbacks
-             * create/remove them when needed.
-             */
-            ret = vfio_spapr_remove_window(container, info.dma32_window_start);
-            if (ret) {
-                error_setg_errno(errp, -ret,
-                                 "failed to remove existing window");
-                goto enable_discards_exit;
-            }
-        } else {
-            /* The default table uses 4K pages */
-            container->pgsizes = 0x1000;
-            vfio_host_win_add(container, info.dma32_window_start,
-                              info.dma32_window_start +
-                              info.dma32_window_size - 1,
-                              0x1000);
-        }
+        break;
     }
     }
 
@@ -704,7 +638,11 @@ listener_release_exit:
     QLIST_REMOVE(group, container_next);
     QLIST_REMOVE(container, next);
     vfio_kvm_device_del_group(group);
-    vfio_listener_release(container);
+    memory_listener_unregister(&container->listener);
+    if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU ||
+        container->iommu_type == VFIO_SPAPR_TCE_IOMMU) {
+        vfio_spapr_container_deinit(container);
+    }
 
 enable_discards_exit:
     vfio_ram_block_discard_disable(container, false);
@@ -734,7 +672,11 @@ static void vfio_disconnect_container(VFIOGroup *group)
      * group.
      */
     if (QLIST_EMPTY(&container->group_list)) {
-        vfio_listener_release(container);
+        memory_listener_unregister(&container->listener);
+        if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU ||
+            container->iommu_type == VFIO_SPAPR_TCE_IOMMU) {
+            vfio_spapr_container_deinit(container);
+        }
     }
 
     if (ioctl(group->fd, VFIO_GROUP_UNSET_CONTAINER, &container->fd)) {
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 9a7517c042..00dbd7af11 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -15,6 +15,7 @@
 #include <linux/kvm.h>
 #endif
 #include "sysemu/kvm.h"
+#include "exec/address-spaces.h"
 
 #include "hw/vfio/vfio-common.h"
 #include "hw/hw.h"
@@ -139,7 +140,7 @@ static void vfio_prereg_listener_region_del(MemoryListener *listener,
     trace_vfio_prereg_unregister(reg.vaddr, reg.size, ret ? -errno : 0);
 }
 
-const MemoryListener vfio_prereg_listener = {
+static const MemoryListener vfio_prereg_listener = {
     .name = "vfio-pre-reg",
     .region_add = vfio_prereg_listener_region_add,
     .region_del = vfio_prereg_listener_region_del,
@@ -343,3 +344,81 @@ void vfio_container_del_section_window(VFIOContainer *container,
                  __func__, section->offset_within_address_space);
     }
 }
+
+int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
+{
+    struct vfio_iommu_spapr_tce_info info;
+    bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
+    int ret, fd = container->fd;
+
+    /*
+     * The host kernel code implementing VFIO_IOMMU_DISABLE is called
+     * when container fd is closed so we do not call it explicitly
+     * in this file.
+     */
+    if (!v2) {
+        ret = ioctl(fd, VFIO_IOMMU_ENABLE);
+        if (ret) {
+            error_setg_errno(errp, errno, "failed to enable container");
+            return -errno;
+        }
+    } else {
+        container->prereg_listener = vfio_prereg_listener;
+
+        memory_listener_register(&container->prereg_listener,
+                                 &address_space_memory);
+        if (container->error) {
+            ret = -1;
+            error_propagate_prepend(errp, container->error,
+                    "RAM memory listener initialization failed: ");
+            goto listener_unregister_exit;
+        }
+    }
+
+    info.argsz = sizeof(info);
+    ret = ioctl(fd, VFIO_IOMMU_SPAPR_TCE_GET_INFO, &info);
+    if (ret) {
+        error_setg_errno(errp, errno,
+                         "VFIO_IOMMU_SPAPR_TCE_GET_INFO failed");
+        ret = -errno;
+        goto listener_unregister_exit;
+    }
+
+    if (v2) {
+        container->pgsizes = info.ddw.pgsizes;
+        /*
+         * There is a default window in just created container.
+         * To make region_add/del simpler, we better remove this
+         * window now and let those iommu_listener callbacks
+         * create/remove them when needed.
+         */
+        ret = vfio_spapr_remove_window(container, info.dma32_window_start);
+        if (ret) {
+            error_setg_errno(errp, -ret,
+                             "failed to remove existing window");
+            goto listener_unregister_exit;
+        }
+    } else {
+        /* The default table uses 4K pages */
+        container->pgsizes = 0x1000;
+        vfio_host_win_add(container, info.dma32_window_start,
+                          info.dma32_window_start +
+                          info.dma32_window_size - 1,
+                          0x1000);
+    }
+
+    return 0;
+
+listener_unregister_exit:
+    if (v2) {
+        memory_listener_unregister(&container->prereg_listener);
+    }
+    return ret;
+}
+
+void vfio_spapr_container_deinit(VFIOContainer *container)
+{
+    if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
+        memory_listener_unregister(&container->prereg_listener);
+    }
+}
-- 
2.34.1




* [PATCH v4 04/41] vfio/spapr: Make vfio_spapr_create/remove_window static
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (2 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 03/41] vfio/container: Move spapr specific init/deinit " Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 05/41] vfio/common: Move vfio_host_win_add/del into spapr.c Zhenzhong Duan
                   ` (38 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Harsh Prateek Bora,
	open list:sPAPR (pseries)

vfio_spapr_create_window calls vfio_spapr_remove_window. By defining
vfio_spapr_remove_window first, both functions can be made static
without a forward declaration.

No functional changes intended.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
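The ordering argument can be illustrated with a small stand-alone
sketch. All names here (DemoWindow, demo_remove_window,
demo_create_window, demo_add_window) are hypothetical, and the rollback
condition is an assumption made for the example rather than a
description of the real window-creation logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one DMA window. */
typedef struct DemoWindow {
    uint64_t start;
    bool present;
} DemoWindow;

/* Defined first, so the caller below needs no forward declaration. */
static int demo_remove_window(DemoWindow *w)
{
    if (!w->present) {
        return -1;
    }
    w->present = false;
    return 0;
}

/* Calls the helper above; both can stay file-local (static). */
static int demo_create_window(DemoWindow *w, uint64_t wanted,
                              uint64_t granted)
{
    w->start = granted;
    w->present = true;

    /* Assumed rollback: drop the window if it landed elsewhere. */
    if (granted != wanted) {
        demo_remove_window(w);
        return -1;
    }
    return 0;
}

/* The only externally visible entry point in this sketch. */
int demo_add_window(DemoWindow *w, uint64_t wanted, uint64_t granted)
{
    assert(w);
    return demo_create_window(w, wanted, granted);
}
```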
 include/hw/vfio/vfio-common.h |  6 -----
 hw/vfio/spapr.c               | 48 +++++++++++++++++------------------
 2 files changed, 24 insertions(+), 30 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index ed5a8e4754..87848982bd 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -293,12 +293,6 @@ struct vfio_info_cap_header *
 vfio_get_cap(void *ptr, uint32_t cap_offset, uint16_t id);
 #endif
 
-int vfio_spapr_create_window(VFIOContainer *container,
-                             MemoryRegionSection *section,
-                             hwaddr *pgsize);
-int vfio_spapr_remove_window(VFIOContainer *container,
-                             hwaddr offset_within_address_space);
-
 bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
 void vfio_migration_exit(VFIODevice *vbasedev);
 
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 00dbd7af11..4428990c28 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -146,9 +146,30 @@ static const MemoryListener vfio_prereg_listener = {
     .region_del = vfio_prereg_listener_region_del,
 };
 
-int vfio_spapr_create_window(VFIOContainer *container,
-                             MemoryRegionSection *section,
-                             hwaddr *pgsize)
+static int vfio_spapr_remove_window(VFIOContainer *container,
+                                    hwaddr offset_within_address_space)
+{
+    struct vfio_iommu_spapr_tce_remove remove = {
+        .argsz = sizeof(remove),
+        .start_addr = offset_within_address_space,
+    };
+    int ret;
+
+    ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
+    if (ret) {
+        error_report("Failed to remove window at %"PRIx64,
+                     (uint64_t)remove.start_addr);
+        return -errno;
+    }
+
+    trace_vfio_spapr_remove_window(offset_within_address_space);
+
+    return 0;
+}
+
+static int vfio_spapr_create_window(VFIOContainer *container,
+                                    MemoryRegionSection *section,
+                                    hwaddr *pgsize)
 {
     int ret = 0;
     IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
@@ -238,27 +259,6 @@ int vfio_spapr_create_window(VFIOContainer *container,
     return 0;
 }
 
-int vfio_spapr_remove_window(VFIOContainer *container,
-                             hwaddr offset_within_address_space)
-{
-    struct vfio_iommu_spapr_tce_remove remove = {
-        .argsz = sizeof(remove),
-        .start_addr = offset_within_address_space,
-    };
-    int ret;
-
-    ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_REMOVE, &remove);
-    if (ret) {
-        error_report("Failed to remove window at %"PRIx64,
-                     (uint64_t)remove.start_addr);
-        return -errno;
-    }
-
-    trace_vfio_spapr_remove_window(offset_within_address_space);
-
-    return 0;
-}
-
 int vfio_container_add_section_window(VFIOContainer *container,
                                       MemoryRegionSection *section,
                                       Error **errp)
-- 
2.34.1




* [PATCH v4 05/41] vfio/common: Move vfio_host_win_add/del into spapr.c
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (3 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 04/41] vfio/spapr: Make vfio_spapr_create/remove_window static Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06  9:33   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 06/41] vfio: Introduce base object for VFIOContainer and targeted interface Zhenzhong Duan
                   ` (37 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	David Gibson, Harsh Prateek Bora, open list:sPAPR (pseries)

Only spapr supports a custom host window list; other vfio drivers
assume a 64-bit host window. So remove the check in the listener
callback, move vfio_host_win_add/del into spapr.c and make them static.

With the check removed, we still need to do the same check for
VFIO_SPAPR_TCE_IOMMU, which allows a single host window range
[dma32_window_start, dma32_window_start + dma32_window_size). Move
vfio_find_hostwin into spapr.c and do the same check in
vfio_container_add_section_window instead.

When mapping a ram device section, the mapping is bypassed if the
section is not aligned to hostwin->iova_pgsizes. With hostwin moved
into spapr, check container->pgsizes instead.

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v4: add vfio_find_hostwin back for VFIO_SPAPR_TCE_IOMMU
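
The two checks discussed in the commit message can be sketched in
isolation as follows. This is a minimal illustration, not the patch
code: DemoHostWin, demo_hostwin_contains, demo_min_pgmask and
demo_section_aligned are hypothetical names, and hwaddr is modelled as a
plain uint64_t. The mask computation uses the lowest set bit of the page
size bitmap, which gives the same value as the
(1ULL << ctz64(pgsizes)) - 1 form in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t demo_hwaddr;  /* stand-in for QEMU's hwaddr */

/* Hypothetical host window with inclusive bounds. */
typedef struct DemoHostWin {
    demo_hwaddr min_iova;
    demo_hwaddr max_iova;
} DemoHostWin;

/* True if [iova, end] (inclusive) lies entirely inside the window. */
bool demo_hostwin_contains(const DemoHostWin *w, demo_hwaddr iova,
                           demo_hwaddr end)
{
    assert(w);
    return w->min_iova <= iova && end <= w->max_iova;
}

/* Mask for the smallest page size advertised in a non-empty bitmap. */
demo_hwaddr demo_min_pgmask(uint64_t pgsizes)
{
    assert(pgsizes != 0);
    /* (x & (~x + 1)) isolates the lowest set bit of x. */
    return (pgsizes & (~pgsizes + 1)) - 1;
}

/* A section is DMA-mapped only if both start and size are aligned. */
bool demo_section_aligned(demo_hwaddr iova, uint64_t size,
                          uint64_t pgsizes)
{
    demo_hwaddr pgmask = demo_min_pgmask(pgsizes);

    return !((iova & pgmask) || (size & pgmask));
}
```

With a 1 GiB window starting at 0, a section ending past 0x3fffffff is
rejected by the containment check, and a section starting at 0x1800 is
skipped by the alignment check when the smallest page size is 4K.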

 include/hw/vfio/vfio-common.h |  5 ---
 hw/vfio/common.c              | 70 +----------------------------
 hw/vfio/container.c           | 16 -------
 hw/vfio/spapr.c               | 83 +++++++++++++++++++++++++++++++++++
 4 files changed, 85 insertions(+), 89 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 87848982bd..a4a22accb9 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -207,11 +207,6 @@ typedef struct {
     hwaddr pages;
 } VFIOBitmap;
 
-void vfio_host_win_add(VFIOContainer *container,
-                       hwaddr min_iova, hwaddr max_iova,
-                       uint64_t iova_pgsizes);
-int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
-                      hwaddr max_iova);
 VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
 void vfio_put_address_space(VFIOAddressSpace *space);
 bool vfio_devices_all_running_and_saving(VFIOContainer *container);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e72055e752..e70fdf5e0c 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -245,44 +245,6 @@ bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
     return true;
 }
 
-void vfio_host_win_add(VFIOContainer *container, hwaddr min_iova,
-                       hwaddr max_iova, uint64_t iova_pgsizes)
-{
-    VFIOHostDMAWindow *hostwin;
-
-    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-        if (ranges_overlap(hostwin->min_iova,
-                           hostwin->max_iova - hostwin->min_iova + 1,
-                           min_iova,
-                           max_iova - min_iova + 1)) {
-            hw_error("%s: Overlapped IOMMU are not enabled", __func__);
-        }
-    }
-
-    hostwin = g_malloc0(sizeof(*hostwin));
-
-    hostwin->min_iova = min_iova;
-    hostwin->max_iova = max_iova;
-    hostwin->iova_pgsizes = iova_pgsizes;
-    QLIST_INSERT_HEAD(&container->hostwin_list, hostwin, hostwin_next);
-}
-
-int vfio_host_win_del(VFIOContainer *container,
-                      hwaddr min_iova, hwaddr max_iova)
-{
-    VFIOHostDMAWindow *hostwin;
-
-    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-        if (hostwin->min_iova == min_iova && hostwin->max_iova == max_iova) {
-            QLIST_REMOVE(hostwin, hostwin_next);
-            g_free(hostwin);
-            return 0;
-        }
-    }
-
-    return -1;
-}
-
 static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 {
     return (!memory_region_is_ram(section->mr) &&
@@ -531,22 +493,6 @@ static void vfio_unregister_ram_discard_listener(VFIOContainer *container,
     g_free(vrdl);
 }
 
-static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
-                                            hwaddr iova, hwaddr end)
-{
-    VFIOHostDMAWindow *hostwin;
-    bool hostwin_found = false;
-
-    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
-        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
-            hostwin_found = true;
-            break;
-        }
-    }
-
-    return hostwin_found ? hostwin : NULL;
-}
-
 static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
 {
     MemoryRegion *mr = section->mr;
@@ -625,7 +571,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
     Int128 llend, llsize;
     void *vaddr;
     int ret;
-    VFIOHostDMAWindow *hostwin;
     Error *err = NULL;
 
     if (!vfio_listener_valid_section(section, "region_add")) {
@@ -647,13 +592,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
         goto fail;
     }
 
-    hostwin = vfio_find_hostwin(container, iova, end);
-    if (!hostwin) {
-        error_setg(&err, "Container %p can't map guest IOVA region"
-                   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
-        goto fail;
-    }
-
     memory_region_ref(section->mr);
 
     if (memory_region_is_iommu(section->mr)) {
@@ -734,7 +672,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
     llsize = int128_sub(llend, int128_make64(iova));
 
     if (memory_region_is_ram_device(section->mr)) {
-        hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+        hwaddr pgmask = (1ULL << ctz64(container->pgsizes)) - 1;
 
         if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
             trace_vfio_listener_region_add_no_dma_map(
@@ -833,12 +771,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
 
     if (memory_region_is_ram_device(section->mr)) {
         hwaddr pgmask;
-        VFIOHostDMAWindow *hostwin;
-
-        hostwin = vfio_find_hostwin(container, iova, end);
-        assert(hostwin); /* or region_add() would have failed */
 
-        pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
+        pgmask = (1ULL << ctz64(container->pgsizes)) - 1;
         try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
     } else if (memory_region_has_ram_discard_manager(section->mr)) {
         vfio_unregister_ram_discard_listener(container, section);
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 204b244b11..242010036a 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -551,7 +551,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->dma_max_mappings = 0;
     container->iova_ranges = NULL;
     QLIST_INIT(&container->giommu_list);
-    QLIST_INIT(&container->hostwin_list);
     QLIST_INIT(&container->vrdl_list);
 
     ret = vfio_init_container(container, group->fd, errp);
@@ -591,14 +590,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
         vfio_get_iommu_info_migration(container, info);
         g_free(info);
-
-        /*
-         * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
-         * information to get the actual window extent rather than assume
-         * a 64-bit IOVA address space.
-         */
-        vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
-
         break;
     }
     case VFIO_SPAPR_TCE_v2_IOMMU:
@@ -687,7 +678,6 @@ static void vfio_disconnect_container(VFIOGroup *group)
     if (QLIST_EMPTY(&container->group_list)) {
         VFIOAddressSpace *space = container->space;
         VFIOGuestIOMMU *giommu, *tmp;
-        VFIOHostDMAWindow *hostwin, *next;
 
         QLIST_REMOVE(container, next);
 
@@ -698,12 +688,6 @@ static void vfio_disconnect_container(VFIOGroup *group)
             g_free(giommu);
         }
 
-        QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
-                           next) {
-            QLIST_REMOVE(hostwin, hostwin_next);
-            g_free(hostwin);
-        }
-
         trace_vfio_disconnect_container(container->fd);
         close(container->fd);
         vfio_free_container(container);
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 4428990c28..83da2f7ec2 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -146,6 +146,60 @@ static const MemoryListener vfio_prereg_listener = {
     .region_del = vfio_prereg_listener_region_del,
 };
 
+static void vfio_host_win_add(VFIOContainer *container, hwaddr min_iova,
+                              hwaddr max_iova, uint64_t iova_pgsizes)
+{
+    VFIOHostDMAWindow *hostwin;
+
+    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+        if (ranges_overlap(hostwin->min_iova,
+                           hostwin->max_iova - hostwin->min_iova + 1,
+                           min_iova,
+                           max_iova - min_iova + 1)) {
+            hw_error("%s: Overlapped IOMMU are not enabled", __func__);
+        }
+    }
+
+    hostwin = g_malloc0(sizeof(*hostwin));
+
+    hostwin->min_iova = min_iova;
+    hostwin->max_iova = max_iova;
+    hostwin->iova_pgsizes = iova_pgsizes;
+    QLIST_INSERT_HEAD(&container->hostwin_list, hostwin, hostwin_next);
+}
+
+static int vfio_host_win_del(VFIOContainer *container,
+                             hwaddr min_iova, hwaddr max_iova)
+{
+    VFIOHostDMAWindow *hostwin;
+
+    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+        if (hostwin->min_iova == min_iova && hostwin->max_iova == max_iova) {
+            QLIST_REMOVE(hostwin, hostwin_next);
+            g_free(hostwin);
+            return 0;
+        }
+    }
+
+    return -1;
+}
+
+static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
+                                            hwaddr iova, hwaddr end)
+{
+    VFIOHostDMAWindow *hostwin;
+    bool hostwin_found = false;
+
+    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
+            hostwin_found = true;
+            break;
+        }
+    }
+
+    return hostwin_found ? hostwin : NULL;
+}
+
 static int vfio_spapr_remove_window(VFIOContainer *container,
                                     hwaddr offset_within_address_space)
 {
@@ -267,6 +321,26 @@ int vfio_container_add_section_window(VFIOContainer *container,
     hwaddr pgsize = 0;
     int ret;
 
+    /*
+     * VFIO_SPAPR_TCE_IOMMU supports a single host window covering
+     * [dma32_window_start, dma32_window_start + dma32_window_size),
+     * so we need to ensure the section falls within this range.
+     */
+    if (container->iommu_type == VFIO_SPAPR_TCE_IOMMU) {
+        hwaddr iova, end;
+
+        iova = section->offset_within_address_space;
+        end = iova + int128_get64(section->size) - 1;
+
+        if (!vfio_find_hostwin(container, iova, end)) {
+            error_setg(errp, "Container %p can't map guest IOVA region"
+                       " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container,
+                       iova, end);
+            return -EINVAL;
+        }
+        return 0;
+    }
+
     if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
         return 0;
     }
@@ -351,6 +425,8 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
     bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
     int ret, fd = container->fd;
 
+    QLIST_INIT(&container->hostwin_list);
+
     /*
      * The host kernel code implementing VFIO_IOMMU_DISABLE is called
      * when container fd is closed so we do not call it explicitly
@@ -418,7 +494,14 @@ listener_unregister_exit:
 
 void vfio_spapr_container_deinit(VFIOContainer *container)
 {
+    VFIOHostDMAWindow *hostwin, *next;
+
     if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
         memory_listener_unregister(&container->prereg_listener);
     }
+    QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
+                       next) {
+        QLIST_REMOVE(hostwin, hostwin_next);
+        g_free(hostwin);
+    }
 }
-- 
2.34.1




* [PATCH v4 06/41] vfio: Introduce base object for VFIOContainer and targeted interface
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (4 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 05/41] vfio/common: Move vfio_host_win_add/del into spapr.c Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:36   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 07/41] vfio/container: Introduce a empty VFIOIOMMUOps Zhenzhong Duan
                   ` (36 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Yi Sun

Introduce a dumb VFIOContainerBase object and its targeted interface.
This is deliberately not a QOM object because we don't want it to be
visible from the user interface. The VFIOContainerBase and its
interface will be populated gradually in subsequent patches.

No functional change intended.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v4: use SPDX identifier, use const char *name parameter, HW_VFIO_VFIO_CONTAINER_BASE_H
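
The embedding pattern this patch sets up (a base struct carrying an ops
pointer, placed inside the backend-specific container) can be sketched
as below. This is only an illustration under stated assumptions:
DemoContainerBase, DemoIOMMUOps, DemoLegacyContainer,
demo_legacy_dma_map and demo_container_dma_map are hypothetical names,
the callback signature is simplified, and the dispatch wrapper is an
assumption about how such an interface is typically called, not
something defined by this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoContainerBase DemoContainerBase;

/* Hypothetical ops table; one callback is enough for the sketch. */
typedef struct DemoIOMMUOps {
    int (*dma_map)(DemoContainerBase *bcontainer,
                   uint64_t iova, uint64_t size);
} DemoIOMMUOps;

/* Base object: backend-independent code only sees this. */
struct DemoContainerBase {
    const DemoIOMMUOps *ops;
};

/* Backend-specific container embedding the base object. */
typedef struct DemoLegacyContainer {
    DemoContainerBase bcontainer;
    int fd;
    uint64_t mapped_bytes;
} DemoLegacyContainer;

/* Recover the enclosing struct from a pointer to its member. */
#define DEMO_CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

int demo_legacy_dma_map(DemoContainerBase *bcontainer,
                        uint64_t iova, uint64_t size)
{
    DemoLegacyContainer *c =
        DEMO_CONTAINER_OF(bcontainer, DemoLegacyContainer, bcontainer);

    (void)iova;               /* a real backend would issue an ioctl */
    c->mapped_bytes += size;  /* just record the request here */
    return 0;
}

const DemoIOMMUOps demo_legacy_ops = {
    .dma_map = demo_legacy_dma_map,
};

/* Generic entry point: dispatch through the ops table. */
int demo_container_dma_map(DemoContainerBase *bcontainer,
                           uint64_t iova, uint64_t size)
{
    assert(bcontainer && bcontainer->ops);
    if (!bcontainer->ops->dma_map) {
        return -1;
    }
    return bcontainer->ops->dma_map(bcontainer, iova, size);
}
```

Callers hold only a DemoContainerBase pointer, so a second backend can
be added by providing another ops table without touching the generic
code.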

 include/hw/vfio/vfio-common.h         |  8 ++---
 include/hw/vfio/vfio-container-base.h | 50 +++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 6 deletions(-)
 create mode 100644 include/hw/vfio/vfio-container-base.h

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index a4a22accb9..586d153c12 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -30,6 +30,7 @@
 #include <linux/vfio.h>
 #endif
 #include "sysemu/sysemu.h"
+#include "hw/vfio/vfio-container-base.h"
 
 #define VFIO_MSG_PREFIX "vfio %s: "
 
@@ -81,6 +82,7 @@ typedef struct VFIOAddressSpace {
 struct VFIOGroup;
 
 typedef struct VFIOContainer {
+    VFIOContainerBase bcontainer;
     VFIOAddressSpace *space;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
     MemoryListener listener;
@@ -201,12 +203,6 @@ typedef struct VFIODisplay {
     } dmabuf;
 } VFIODisplay;
 
-typedef struct {
-    unsigned long *bitmap;
-    hwaddr size;
-    hwaddr pages;
-} VFIOBitmap;
-
 VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
 void vfio_put_address_space(VFIOAddressSpace *space);
 bool vfio_devices_all_running_and_saving(VFIOContainer *container);
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
new file mode 100644
index 0000000000..1d6daaea5d
--- /dev/null
+++ b/include/hw/vfio/vfio-container-base.h
@@ -0,0 +1,50 @@
+/*
+ * VFIO BASE CONTAINER
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *          Eric Auger <eric.auger@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_VFIO_VFIO_CONTAINER_BASE_H
+#define HW_VFIO_VFIO_CONTAINER_BASE_H
+
+#include "exec/memory.h"
+
+typedef struct VFIODevice VFIODevice;
+typedef struct VFIOIOMMUOps VFIOIOMMUOps;
+
+typedef struct {
+    unsigned long *bitmap;
+    hwaddr size;
+    hwaddr pages;
+} VFIOBitmap;
+
+/*
+ * This is the base object for vfio container backends
+ */
+typedef struct VFIOContainerBase {
+    const VFIOIOMMUOps *ops;
+} VFIOContainerBase;
+
+struct VFIOIOMMUOps {
+    /* basic feature */
+    int (*dma_map)(VFIOContainerBase *bcontainer,
+                   hwaddr iova, ram_addr_t size,
+                   void *vaddr, bool readonly);
+    int (*dma_unmap)(VFIOContainerBase *bcontainer,
+                     hwaddr iova, ram_addr_t size,
+                     IOMMUTLBEntry *iotlb);
+    int (*attach_device)(const char *name, VFIODevice *vbasedev,
+                         AddressSpace *as, Error **errp);
+    void (*detach_device)(VFIODevice *vbasedev);
+    /* migration feature */
+    int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
+    int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
+                              hwaddr iova, hwaddr size);
+};
+#endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
-- 
2.34.1
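
Note on the layout introduced above: the patch embeds a VFIOContainerBase
as a member of VFIOContainer, so code holding only the base pointer can
still reach the backend-specific structure. The following is a minimal
sketch of that embedding, not QEMU code. All names in it (DemoBase,
DemoLegacy, demo_container_of, demo_legacy_from_base) are hypothetical
stand-ins, and the ops pointer is reduced to an opaque placeholder.

```c
#include <stddef.h>

/* Hypothetical stand-in for the base object: only shared state lives here. */
typedef struct DemoBase {
    const void *ops;          /* placeholder for a backend ops table */
} DemoBase;

/* Hypothetical backend-specific container that embeds the base object. */
typedef struct DemoLegacy {
    DemoBase base;            /* embedded base object */
    int fd;                   /* backend-private state */
} DemoLegacy;

/* Recover the enclosing structure from a pointer to one of its members. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Map a base pointer back to the backend-specific container. */
DemoLegacy *demo_legacy_from_base(DemoBase *base)
{
    return demo_container_of(base, DemoLegacy, base);
}
```

Because the recovery goes through offsetof, the sketch does not depend on
the base object being the first member of the outer structure.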




* [PATCH v4 07/41] vfio/container: Introduce a empty VFIOIOMMUOps
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (5 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 06/41] vfio: Introduce base object for VFIOContainer and targeted interface Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:36   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 08/41] vfio/container: Switch to dma_map|unmap API Zhenzhong Duan
                   ` (35 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

This empty VFIOIOMMUOps instance, named vfio_legacy_ops, will hold all
the general IOMMU ops of the legacy container.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-common.h | 2 +-
 hw/vfio/container.c           | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 586d153c12..678161f207 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -255,7 +255,7 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
 typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
 extern VFIOGroupList vfio_group_list;
 extern VFIODeviceList vfio_device_list;
-
+extern const VFIOIOMMUOps vfio_legacy_ops;
 extern const MemoryListener vfio_memory_listener;
 extern int vfio_kvm_device_fd;
 
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 242010036a..4bc43ddfa4 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -472,6 +472,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
                                   Error **errp)
 {
     VFIOContainer *container;
+    VFIOContainerBase *bcontainer;
     int ret, fd;
     VFIOAddressSpace *space;
 
@@ -552,6 +553,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->iova_ranges = NULL;
     QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->vrdl_list);
+    bcontainer = &container->bcontainer;
+    bcontainer->ops = &vfio_legacy_ops;
 
     ret = vfio_init_container(container, group->fd, errp);
     if (ret) {
@@ -933,3 +936,5 @@ void vfio_detach_device(VFIODevice *vbasedev)
     vfio_put_base_device(vbasedev);
     vfio_put_group(group);
 }
+
+const VFIOIOMMUOps vfio_legacy_ops;
-- 
2.34.1




* [PATCH v4 08/41] vfio/container: Switch to dma_map|unmap API
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (6 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 07/41] vfio/container: Introduce a empty VFIOIOMMUOps Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:37   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 09/41] vfio/common: Introduce vfio_container_init/destroy helper Zhenzhong Duan
                   ` (34 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Zhenzhong Duan

From: Eric Auger <eric.auger@redhat.com>

No functional change intended.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v4: use SPDX identifier, use assert

 include/hw/vfio/vfio-common.h         |  4 ---
 include/hw/vfio/vfio-container-base.h |  7 +++++
 hw/vfio/common.c                      | 45 +++++++++++++++------------
 hw/vfio/container-base.c              | 32 +++++++++++++++++++
 hw/vfio/container.c                   | 22 ++++++++-----
 hw/vfio/meson.build                   |  1 +
 hw/vfio/trace-events                  |  2 +-
 7 files changed, 81 insertions(+), 32 deletions(-)
 create mode 100644 hw/vfio/container-base.c

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 678161f207..24a26345e5 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -208,10 +208,6 @@ void vfio_put_address_space(VFIOAddressSpace *space);
 bool vfio_devices_all_running_and_saving(VFIOContainer *container);
 
 /* container->fd */
-int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
-                   ram_addr_t size, IOMMUTLBEntry *iotlb);
-int vfio_dma_map(VFIOContainer *container, hwaddr iova,
-                 ram_addr_t size, void *vaddr, bool readonly);
 int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start);
 int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
                             hwaddr iova, hwaddr size);
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 1d6daaea5d..56b033f59f 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -31,6 +31,13 @@ typedef struct VFIOContainerBase {
     const VFIOIOMMUOps *ops;
 } VFIOContainerBase;
 
+int vfio_container_dma_map(VFIOContainerBase *bcontainer,
+                           hwaddr iova, ram_addr_t size,
+                           void *vaddr, bool readonly);
+int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
+                             hwaddr iova, ram_addr_t size,
+                             IOMMUTLBEntry *iotlb);
+
 struct VFIOIOMMUOps {
     /* basic feature */
     int (*dma_map)(VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e70fdf5e0c..e610771888 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -292,7 +292,7 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
     VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
-    VFIOContainer *container = giommu->container;
+    VFIOContainerBase *bcontainer = &giommu->container->bcontainer;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     void *vaddr;
     int ret;
@@ -322,21 +322,22 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
          * of vaddr will always be there, even if the memory object is
          * destroyed and its backing memory munmap-ed.
          */
-        ret = vfio_dma_map(container, iova,
-                           iotlb->addr_mask + 1, vaddr,
-                           read_only);
+        ret = vfio_container_dma_map(bcontainer, iova,
+                                     iotlb->addr_mask + 1, vaddr,
+                                     read_only);
         if (ret) {
-            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+            error_report("vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx", %p) = %d (%s)",
-                         container, iova,
+                         bcontainer, iova,
                          iotlb->addr_mask + 1, vaddr, ret, strerror(-ret));
         }
     } else {
-        ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
+        ret = vfio_container_dma_unmap(bcontainer, iova,
+                                       iotlb->addr_mask + 1, iotlb);
         if (ret) {
-            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+            error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx") = %d (%s)",
-                         container, iova,
+                         bcontainer, iova,
                          iotlb->addr_mask + 1, ret, strerror(-ret));
             vfio_set_migration_error(ret);
         }
@@ -355,9 +356,10 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
     int ret;
 
     /* Unmap with a single call. */
-    ret = vfio_dma_unmap(vrdl->container, iova, size , NULL);
+    ret = vfio_container_dma_unmap(&vrdl->container->bcontainer,
+                                   iova, size, NULL);
     if (ret) {
-        error_report("%s: vfio_dma_unmap() failed: %s", __func__,
+        error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
                      strerror(-ret));
     }
 }
@@ -385,8 +387,8 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
                section->offset_within_address_space;
         vaddr = memory_region_get_ram_ptr(section->mr) + start;
 
-        ret = vfio_dma_map(vrdl->container, iova, next - start,
-                           vaddr, section->readonly);
+        ret = vfio_container_dma_map(&vrdl->container->bcontainer, iova,
+                                     next - start, vaddr, section->readonly);
         if (ret) {
             /* Rollback */
             vfio_ram_discard_notify_discard(rdl, section);
@@ -684,10 +686,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
         }
     }
 
-    ret = vfio_dma_map(container, iova, int128_get64(llsize),
-                       vaddr, section->readonly);
+    ret = vfio_container_dma_map(&container->bcontainer,
+                                 iova, int128_get64(llsize), vaddr,
+                                 section->readonly);
     if (ret) {
-        error_setg(&err, "vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
+        error_setg(&err, "vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
                    "0x%"HWADDR_PRIx", %p) = %d (%s)",
                    container, iova, int128_get64(llsize), vaddr, ret,
                    strerror(-ret));
@@ -784,18 +787,20 @@ static void vfio_listener_region_del(MemoryListener *listener,
         if (int128_eq(llsize, int128_2_64())) {
             /* The unmap ioctl doesn't accept a full 64-bit span. */
             llsize = int128_rshift(llsize, 1);
-            ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
+            ret = vfio_container_dma_unmap(&container->bcontainer, iova,
+                                           int128_get64(llsize), NULL);
             if (ret) {
-                error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+                error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                              "0x%"HWADDR_PRIx") = %d (%s)",
                              container, iova, int128_get64(llsize), ret,
                              strerror(-ret));
             }
             iova += int128_get64(llsize);
         }
-        ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
+        ret = vfio_container_dma_unmap(&container->bcontainer, iova,
+                                       int128_get64(llsize), NULL);
         if (ret) {
-            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
+            error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx") = %d (%s)",
                          container, iova, int128_get64(llsize), ret,
                          strerror(-ret));
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
new file mode 100644
index 0000000000..55d3a35fa4
--- /dev/null
+++ b/hw/vfio/container-base.c
@@ -0,0 +1,32 @@
+/*
+ * VFIO BASE CONTAINER
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *          Eric Auger <eric.auger@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/error-report.h"
+#include "hw/vfio/vfio-container-base.h"
+
+int vfio_container_dma_map(VFIOContainerBase *bcontainer,
+                           hwaddr iova, ram_addr_t size,
+                           void *vaddr, bool readonly)
+{
+    g_assert(bcontainer->ops->dma_map);
+    return bcontainer->ops->dma_map(bcontainer, iova, size, vaddr, readonly);
+}
+
+int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
+                             hwaddr iova, ram_addr_t size,
+                             IOMMUTLBEntry *iotlb)
+{
+    g_assert(bcontainer->ops->dma_unmap);
+    return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
+}
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 4bc43ddfa4..c04df26323 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -115,9 +115,11 @@ unmap_exit:
 /*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
-int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
-                   ram_addr_t size, IOMMUTLBEntry *iotlb)
+static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
+                                 ram_addr_t size, IOMMUTLBEntry *iotlb)
 {
+    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                            bcontainer);
     struct vfio_iommu_type1_dma_unmap unmap = {
         .argsz = sizeof(unmap),
         .flags = 0,
@@ -151,7 +153,7 @@ int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
          */
         if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
             container->iommu_type == VFIO_TYPE1v2_IOMMU) {
-            trace_vfio_dma_unmap_overflow_workaround();
+            trace_vfio_legacy_dma_unmap_overflow_workaround();
             unmap.size -= 1ULL << ctz64(container->pgsizes);
             continue;
         }
@@ -170,9 +172,11 @@ int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
     return 0;
 }
 
-int vfio_dma_map(VFIOContainer *container, hwaddr iova,
-                 ram_addr_t size, void *vaddr, bool readonly)
+static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
+                               ram_addr_t size, void *vaddr, bool readonly)
 {
+    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                            bcontainer);
     struct vfio_iommu_type1_dma_map map = {
         .argsz = sizeof(map),
         .flags = VFIO_DMA_MAP_FLAG_READ,
@@ -191,7 +195,8 @@ int vfio_dma_map(VFIOContainer *container, hwaddr iova,
      * the VGA ROM space.
      */
     if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-        (errno == EBUSY && vfio_dma_unmap(container, iova, size, NULL) == 0 &&
+        (errno == EBUSY &&
+         vfio_legacy_dma_unmap(bcontainer, iova, size, NULL) == 0 &&
          ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
         return 0;
     }
@@ -937,4 +942,7 @@ void vfio_detach_device(VFIODevice *vbasedev)
     vfio_put_group(group);
 }
 
-const VFIOIOMMUOps vfio_legacy_ops;
+const VFIOIOMMUOps vfio_legacy_ops = {
+    .dma_map = vfio_legacy_dma_map,
+    .dma_unmap = vfio_legacy_dma_unmap,
+};
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 2a6912c940..eb6ce6229d 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -2,6 +2,7 @@ vfio_ss = ss.source_set()
 vfio_ss.add(files(
   'helpers.c',
   'common.c',
+  'container-base.c',
   'container.c',
   'spapr.c',
   'migration.c',
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 0eb2387cf2..9f7fedee98 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -116,7 +116,7 @@ vfio_region_unmap(const char *name, unsigned long offset, unsigned long end) "Re
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%08x"
-vfio_dma_unmap_overflow_workaround(void) ""
+vfio_legacy_dma_unmap_overflow_workaround(void) ""
 vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start, uint64_t dirty_pages) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64" dirty_pages=%"PRIu64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
 
-- 
2.34.1
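
Note on the dispatch pattern used above: common code calls a thin wrapper,
the wrapper asserts that the backend supplied the callback, and the
callback recovers its own container from the base pointer. The following
is a minimal, self-contained sketch of that pattern and not QEMU code.
Every name in it (DemoBase, DemoOps, DemoBackend, demo_dma_map,
demo_dma_unmap, demo_backend_init) is hypothetical, the "mapping" is
only a byte counter, and the -1 error return is an arbitrary choice for
the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoBase DemoBase;

/* Hypothetical ops table: one function pointer per backend operation. */
typedef struct DemoOps {
    int (*dma_map)(DemoBase *base, uint64_t iova, uint64_t size);
    int (*dma_unmap)(DemoBase *base, uint64_t iova, uint64_t size);
} DemoOps;

struct DemoBase {
    const DemoOps *ops;
};

/* Hypothetical backend: embeds the base and tracks mapped bytes. */
typedef struct DemoBackend {
    DemoBase base;
    uint64_t mapped_bytes;
} DemoBackend;

static DemoBackend *demo_backend_from_base(DemoBase *base)
{
    return (DemoBackend *)((char *)base - offsetof(DemoBackend, base));
}

static int demo_backend_map(DemoBase *base, uint64_t iova, uint64_t size)
{
    (void)iova; /* the sketch does not track individual ranges */
    demo_backend_from_base(base)->mapped_bytes += size;
    return 0;
}

static int demo_backend_unmap(DemoBase *base, uint64_t iova, uint64_t size)
{
    DemoBackend *be = demo_backend_from_base(base);

    (void)iova;
    if (size > be->mapped_bytes) {
        return -1; /* cannot unmap more than is currently mapped */
    }
    be->mapped_bytes -= size;
    return 0;
}

static const DemoOps demo_backend_ops = {
    .dma_map = demo_backend_map,
    .dma_unmap = demo_backend_unmap,
};

void demo_backend_init(DemoBackend *be)
{
    be->base.ops = &demo_backend_ops;
    be->mapped_bytes = 0;
}

/* Wrappers used by common code: they only know about the base object. */
int demo_dma_map(DemoBase *base, uint64_t iova, uint64_t size)
{
    assert(base->ops->dma_map);
    return base->ops->dma_map(base, iova, size);
}

int demo_dma_unmap(DemoBase *base, uint64_t iova, uint64_t size)
{
    assert(base->ops->dma_unmap);
    return base->ops->dma_unmap(base, iova, size);
}
```

A caller would initialize a DemoBackend with demo_backend_init() and then
pass &be.base to demo_dma_map() and demo_dma_unmap(). A second backend
would only need its own ops table; the wrappers and their callers would
stay unchanged.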




* [PATCH v4 09/41] vfio/common: Introduce vfio_container_init/destroy helper
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (7 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 08/41] vfio/container: Switch to dma_map|unmap API Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:37   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 10/41] vfio/common: Move giommu_list in base container Zhenzhong Duan
                   ` (33 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

Add two helper functions, vfio_container_init() and
vfio_container_destroy(), which both the legacy and iommufd containers
will use for base-container-specific initialization and release.

No functional change intended.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-container-base.h | 4 ++++
 hw/vfio/container-base.c              | 9 +++++++++
 hw/vfio/container.c                   | 4 +++-
 3 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 56b033f59f..577f52ccbc 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -38,6 +38,10 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
                              hwaddr iova, ram_addr_t size,
                              IOMMUTLBEntry *iotlb);
 
+void vfio_container_init(VFIOContainerBase *bcontainer,
+                         const VFIOIOMMUOps *ops);
+void vfio_container_destroy(VFIOContainerBase *bcontainer);
+
 struct VFIOIOMMUOps {
     /* basic feature */
     int (*dma_map)(VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 55d3a35fa4..e929435751 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -30,3 +30,12 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
     g_assert(bcontainer->ops->dma_unmap);
     return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
 }
+
+void vfio_container_init(VFIOContainerBase *bcontainer, const VFIOIOMMUOps *ops)
+{
+    bcontainer->ops = ops;
+}
+
+void vfio_container_destroy(VFIOContainerBase *bcontainer)
+{
+}
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index c04df26323..32a0251dd1 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -559,7 +559,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->vrdl_list);
     bcontainer = &container->bcontainer;
-    bcontainer->ops = &vfio_legacy_ops;
+    vfio_container_init(bcontainer, &vfio_legacy_ops);
 
     ret = vfio_init_container(container, group->fd, errp);
     if (ret) {
@@ -661,6 +661,7 @@ put_space_exit:
 static void vfio_disconnect_container(VFIOGroup *group)
 {
     VFIOContainer *container = group->container;
+    VFIOContainerBase *bcontainer = &container->bcontainer;
 
     QLIST_REMOVE(group, container_next);
     group->container = NULL;
@@ -695,6 +696,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
             QLIST_REMOVE(giommu, giommu_next);
             g_free(giommu);
         }
+        vfio_container_destroy(bcontainer);
 
         trace_vfio_disconnect_container(container->fd);
         close(container->fd);
-- 
2.34.1




* [PATCH v4 10/41] vfio/common: Move giommu_list in base container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (8 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 09/41] vfio/common: Introduce vfio_container_init/destroy helper Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:50   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 11/41] vfio/container: Move space field to " Zhenzhong Duan
                   ` (32 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Zhenzhong Duan

From: Eric Auger <eric.auger@redhat.com>

Move the giommu_list field into the base container and store
the base container in the VFIOGuestIOMMU.

No functional change intended.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-common.h         |  9 ---------
 include/hw/vfio/vfio-container-base.h |  9 +++++++++
 hw/vfio/common.c                      | 17 +++++++++++------
 hw/vfio/container-base.c              |  9 +++++++++
 hw/vfio/container.c                   |  8 --------
 5 files changed, 29 insertions(+), 23 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 24a26345e5..6be082b8f2 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -95,7 +95,6 @@ typedef struct VFIOContainer {
     uint64_t max_dirty_bitmap_size;
     unsigned long pgsizes;
     unsigned int dma_max_mappings;
-    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
@@ -104,14 +103,6 @@ typedef struct VFIOContainer {
     GList *iova_ranges;
 } VFIOContainer;
 
-typedef struct VFIOGuestIOMMU {
-    VFIOContainer *container;
-    IOMMUMemoryRegion *iommu_mr;
-    hwaddr iommu_offset;
-    IOMMUNotifier n;
-    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
-} VFIOGuestIOMMU;
-
 typedef struct VFIORamDiscardListener {
     VFIOContainer *container;
     MemoryRegion *mr;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 577f52ccbc..a11aec5755 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -29,8 +29,17 @@ typedef struct {
  */
 typedef struct VFIOContainerBase {
     const VFIOIOMMUOps *ops;
+    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
 } VFIOContainerBase;
 
+typedef struct VFIOGuestIOMMU {
+    VFIOContainerBase *bcontainer;
+    IOMMUMemoryRegion *iommu_mr;
+    hwaddr iommu_offset;
+    IOMMUNotifier n;
+    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
+} VFIOGuestIOMMU;
+
 int vfio_container_dma_map(VFIOContainerBase *bcontainer,
                            hwaddr iova, ram_addr_t size,
                            void *vaddr, bool readonly);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index e610771888..43580bcc43 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -292,7 +292,7 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
 static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 {
     VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
-    VFIOContainerBase *bcontainer = &giommu->container->bcontainer;
+    VFIOContainerBase *bcontainer = giommu->bcontainer;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     void *vaddr;
     int ret;
@@ -569,6 +569,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     hwaddr iova, end;
     Int128 llend, llsize;
     void *vaddr;
@@ -612,7 +613,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
         giommu->iommu_mr = iommu_mr;
         giommu->iommu_offset = section->offset_within_address_space -
                                section->offset_within_region;
-        giommu->container = container;
+        giommu->bcontainer = bcontainer;
         llend = int128_add(int128_make64(section->offset_within_region),
                            section->size);
         llend = int128_sub(llend, int128_one());
@@ -647,7 +648,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
             g_free(giommu);
             goto fail;
         }
-        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
+        QLIST_INSERT_HEAD(&bcontainer->giommu_list, giommu, giommu_next);
         memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
 
         return;
@@ -732,6 +733,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     hwaddr iova, end;
     Int128 llend, llsize;
     int ret;
@@ -744,7 +746,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;
 
-        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+        QLIST_FOREACH(giommu, &bcontainer->giommu_list, giommu_next) {
             if (MEMORY_REGION(giommu->iommu_mr) == section->mr &&
                 giommu->n.start == section->offset_within_region) {
                 memory_region_unregister_iommu_notifier(section->mr,
@@ -1206,7 +1208,9 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     vfio_giommu_dirty_notifier *gdn = container_of(n,
                                                 vfio_giommu_dirty_notifier, n);
     VFIOGuestIOMMU *giommu = gdn->giommu;
-    VFIOContainer *container = giommu->container;
+    VFIOContainerBase *bcontainer = giommu->bcontainer;
+    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                            bcontainer);
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     ram_addr_t translated_addr;
     int ret = -EINVAL;
@@ -1284,12 +1288,13 @@ static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
 static int vfio_sync_dirty_bitmap(VFIOContainer *container,
                                   MemoryRegionSection *section)
 {
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     ram_addr_t ram_addr;
 
     if (memory_region_is_iommu(section->mr)) {
         VFIOGuestIOMMU *giommu;
 
-        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+        QLIST_FOREACH(giommu, &bcontainer->giommu_list, giommu_next) {
             if (MEMORY_REGION(giommu->iommu_mr) == section->mr &&
                 giommu->n.start == section->offset_within_region) {
                 Int128 llend;
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index e929435751..20bcb9669a 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -34,8 +34,17 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
 void vfio_container_init(VFIOContainerBase *bcontainer, const VFIOIOMMUOps *ops)
 {
     bcontainer->ops = ops;
+    QLIST_INIT(&bcontainer->giommu_list);
 }
 
 void vfio_container_destroy(VFIOContainerBase *bcontainer)
 {
+    VFIOGuestIOMMU *giommu, *tmp;
+
+    QLIST_FOREACH_SAFE(giommu, &bcontainer->giommu_list, giommu_next, tmp) {
+        memory_region_unregister_iommu_notifier(
+                MEMORY_REGION(giommu->iommu_mr), &giommu->n);
+        QLIST_REMOVE(giommu, giommu_next);
+        g_free(giommu);
+    }
 }
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 32a0251dd1..133d3c8f5c 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -556,7 +556,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->dirty_pages_supported = false;
     container->dma_max_mappings = 0;
     container->iova_ranges = NULL;
-    QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->vrdl_list);
     bcontainer = &container->bcontainer;
     vfio_container_init(bcontainer, &vfio_legacy_ops);
@@ -686,16 +685,9 @@ static void vfio_disconnect_container(VFIOGroup *group)
 
     if (QLIST_EMPTY(&container->group_list)) {
         VFIOAddressSpace *space = container->space;
-        VFIOGuestIOMMU *giommu, *tmp;
 
         QLIST_REMOVE(container, next);
 
-        QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
-            memory_region_unregister_iommu_notifier(
-                    MEMORY_REGION(giommu->iommu_mr), &giommu->n);
-            QLIST_REMOVE(giommu, giommu_next);
-            g_free(giommu);
-        }
         vfio_container_destroy(bcontainer);
 
         trace_vfio_disconnect_container(container->fd);
-- 
2.34.1




* [PATCH v4 11/41] vfio/container: Move space field to base container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (9 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 10/41] vfio/common: Move giommu_list in base container Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:50   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 12/41] vfio/container: Switch to IOMMU BE set_dirty_page_tracking/query_dirty_bitmap API Zhenzhong Duan
                   ` (31 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Harsh Prateek Bora,
	open list:sPAPR (pseries)

From: Eric Auger <eric.auger@redhat.com>

Move the space field to the base object. Also the VFIOAddressSpace
now contains a list of base containers.
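
As an illustration only (not part of the patch), the layout described above can be sketched with simplified stand-in types. All Demo* names and the singly linked list are assumptions made for this sketch; the real code uses VFIOContainerBase, VFIOAddressSpace and the QLIST macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the QEMU types; not the real definitions. */
typedef struct DemoAddressSpace { int id; } DemoAddressSpace;

typedef struct DemoContainerBase DemoContainerBase;

typedef struct DemoVFIOAddressSpace {
    DemoAddressSpace *as;
    DemoContainerBase *containers;   /* list head: base containers only */
} DemoVFIOAddressSpace;

struct DemoContainerBase {
    DemoVFIOAddressSpace *space;     /* moved here from the legacy container */
    DemoContainerBase *next;         /* list linkage lives in the base too */
};

typedef struct DemoLegacyContainer {
    DemoContainerBase bcontainer;    /* embedded base object */
    int fd;
} DemoLegacyContainer;

/* Recover the enclosing structure from a pointer to its embedded member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void demo_container_init(DemoContainerBase *bc,
                                DemoVFIOAddressSpace *space)
{
    assert(bc && space);
    bc->space = space;
    bc->next = NULL;
}

static void demo_space_insert_head(DemoVFIOAddressSpace *space,
                                   DemoContainerBase *bc)
{
    bc->next = space->containers;
    space->containers = bc;
}

/*
 * Walk the list of base containers and return the legacy container that
 * owns the given fd, or NULL if there is none.
 */
static DemoLegacyContainer *demo_space_find_fd(DemoVFIOAddressSpace *space,
                                               int fd)
{
    for (DemoContainerBase *bc = space->containers; bc; bc = bc->next) {
        DemoLegacyContainer *c =
            demo_container_of(bc, DemoLegacyContainer, bcontainer);
        if (c->fd == fd) {
            return c;
        }
    }
    return NULL;
}
```

Callers that only hold the base pointer reach the address space via bc->space->as; only backend-specific code needs the demo_container_of() step.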

No functional change intended.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v4: use bcontainer->space->as instead of container->bcontainer.space->as

 include/hw/vfio/vfio-common.h         |  8 --------
 include/hw/vfio/vfio-container-base.h |  9 +++++++++
 hw/ppc/spapr_pci_vfio.c               | 10 +++++-----
 hw/vfio/common.c                      |  4 ++--
 hw/vfio/container-base.c              |  6 +++++-
 hw/vfio/container.c                   | 18 ++++++++----------
 6 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 6be082b8f2..bd4de6cb3a 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -73,17 +73,10 @@ typedef struct VFIOMigration {
     bool initial_data_sent;
 } VFIOMigration;
 
-typedef struct VFIOAddressSpace {
-    AddressSpace *as;
-    QLIST_HEAD(, VFIOContainer) containers;
-    QLIST_ENTRY(VFIOAddressSpace) list;
-} VFIOAddressSpace;
-
 struct VFIOGroup;
 
 typedef struct VFIOContainer {
     VFIOContainerBase bcontainer;
-    VFIOAddressSpace *space;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
     MemoryListener listener;
     MemoryListener prereg_listener;
@@ -98,7 +91,6 @@ typedef struct VFIOContainer {
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
-    QLIST_ENTRY(VFIOContainer) next;
     QLIST_HEAD(, VFIODevice) device_list;
     GList *iova_ranges;
 } VFIOContainer;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index a11aec5755..c7cc6ec9c5 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -24,12 +24,20 @@ typedef struct {
     hwaddr pages;
 } VFIOBitmap;
 
+typedef struct VFIOAddressSpace {
+    AddressSpace *as;
+    QLIST_HEAD(, VFIOContainerBase) containers;
+    QLIST_ENTRY(VFIOAddressSpace) list;
+} VFIOAddressSpace;
+
 /*
  * This is the base object for vfio container backends
  */
 typedef struct VFIOContainerBase {
     const VFIOIOMMUOps *ops;
+    VFIOAddressSpace *space;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+    QLIST_ENTRY(VFIOContainerBase) next;
 } VFIOContainerBase;
 
 typedef struct VFIOGuestIOMMU {
@@ -48,6 +56,7 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
                              IOMMUTLBEntry *iotlb);
 
 void vfio_container_init(VFIOContainerBase *bcontainer,
+                         VFIOAddressSpace *space,
                          const VFIOIOMMUOps *ops);
 void vfio_container_destroy(VFIOContainerBase *bcontainer);
 
diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
index f283f7e38d..d1d07bec46 100644
--- a/hw/ppc/spapr_pci_vfio.c
+++ b/hw/ppc/spapr_pci_vfio.c
@@ -84,27 +84,27 @@ static int vfio_eeh_container_op(VFIOContainer *container, uint32_t op)
 static VFIOContainer *vfio_eeh_as_container(AddressSpace *as)
 {
     VFIOAddressSpace *space = vfio_get_address_space(as);
-    VFIOContainer *container = NULL;
+    VFIOContainerBase *bcontainer = NULL;
 
     if (QLIST_EMPTY(&space->containers)) {
         /* No containers to act on */
         goto out;
     }
 
-    container = QLIST_FIRST(&space->containers);
+    bcontainer = QLIST_FIRST(&space->containers);
 
-    if (QLIST_NEXT(container, next)) {
+    if (QLIST_NEXT(bcontainer, next)) {
         /*
          * We don't yet have logic to synchronize EEH state across
          * multiple containers
          */
-        container = NULL;
+        bcontainer = NULL;
         goto out;
     }
 
 out:
     vfio_put_address_space(space);
-    return container;
+    return container_of(bcontainer, VFIOContainer, bcontainer);
 }
 
 static bool vfio_eeh_as_ok(AddressSpace *as)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 43580bcc43..1d8202537e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -145,7 +145,7 @@ void vfio_unblock_multiple_devices_migration(void)
 
 bool vfio_viommu_preset(VFIODevice *vbasedev)
 {
-    return vbasedev->container->space->as != &address_space_memory;
+    return vbasedev->container->bcontainer.space->as != &address_space_memory;
 }
 
 static void vfio_set_migration_error(int err)
@@ -922,7 +922,7 @@ static void vfio_dirty_tracking_init(VFIOContainer *container,
     dirty.container = container;
 
     memory_listener_register(&dirty.listener,
-                             container->space->as);
+                             container->bcontainer.space->as);
 
     *ranges = dirty.ranges;
 
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 20bcb9669a..3933391e0d 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -31,9 +31,11 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
     return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
 }
 
-void vfio_container_init(VFIOContainerBase *bcontainer, const VFIOIOMMUOps *ops)
+void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
+                         const VFIOIOMMUOps *ops)
 {
     bcontainer->ops = ops;
+    bcontainer->space = space;
     QLIST_INIT(&bcontainer->giommu_list);
 }
 
@@ -41,6 +43,8 @@ void vfio_container_destroy(VFIOContainerBase *bcontainer)
 {
     VFIOGuestIOMMU *giommu, *tmp;
 
+    QLIST_REMOVE(bcontainer, next);
+
     QLIST_FOREACH_SAFE(giommu, &bcontainer->giommu_list, giommu_next, tmp) {
         memory_region_unregister_iommu_notifier(
                 MEMORY_REGION(giommu->iommu_mr), &giommu->n);
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 133d3c8f5c..f12fcb6fe1 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -514,7 +514,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
      * details once we know which type of IOMMU we are using.
      */
 
-    QLIST_FOREACH(container, &space->containers, next) {
+    QLIST_FOREACH(bcontainer, &space->containers, next) {
+        container = container_of(bcontainer, VFIOContainer, bcontainer);
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
             ret = vfio_ram_block_discard_disable(container, true);
             if (ret) {
@@ -550,7 +551,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     }
 
     container = g_malloc0(sizeof(*container));
-    container->space = space;
     container->fd = fd;
     container->error = NULL;
     container->dirty_pages_supported = false;
@@ -558,7 +558,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->iova_ranges = NULL;
     QLIST_INIT(&container->vrdl_list);
     bcontainer = &container->bcontainer;
-    vfio_container_init(bcontainer, &vfio_legacy_ops);
+    vfio_container_init(bcontainer, space, &vfio_legacy_ops);
 
     ret = vfio_init_container(container, group->fd, errp);
     if (ret) {
@@ -613,14 +613,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     vfio_kvm_device_add_group(group);
 
     QLIST_INIT(&container->group_list);
-    QLIST_INSERT_HEAD(&space->containers, container, next);
+    QLIST_INSERT_HEAD(&space->containers, bcontainer, next);
 
     group->container = container;
     QLIST_INSERT_HEAD(&container->group_list, group, container_next);
 
     container->listener = vfio_memory_listener;
 
-    memory_listener_register(&container->listener, container->space->as);
+    memory_listener_register(&container->listener, bcontainer->space->as);
 
     if (container->error) {
         ret = -1;
@@ -634,7 +634,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     return 0;
 listener_release_exit:
     QLIST_REMOVE(group, container_next);
-    QLIST_REMOVE(container, next);
+    QLIST_REMOVE(bcontainer, next);
     vfio_kvm_device_del_group(group);
     memory_listener_unregister(&container->listener);
     if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU ||
@@ -684,9 +684,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
     }
 
     if (QLIST_EMPTY(&container->group_list)) {
-        VFIOAddressSpace *space = container->space;
-
-        QLIST_REMOVE(container, next);
+        VFIOAddressSpace *space = bcontainer->space;
 
         vfio_container_destroy(bcontainer);
 
@@ -707,7 +705,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
     QLIST_FOREACH(group, &vfio_group_list, next) {
         if (group->groupid == groupid) {
             /* Found it.  Now is it already in the right context? */
-            if (group->container->space->as == as) {
+            if (group->container->bcontainer.space->as == as) {
                 return group;
             } else {
                 error_setg(errp, "group %d used in multiple address spaces",
-- 
2.34.1




* [PATCH v4 12/41] vfio/container: Switch to IOMMU BE set_dirty_page_tracking/query_dirty_bitmap API
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (10 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 11/41] vfio/container: Move space field to " Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:50   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 13/41] vfio/container: Move per container device list in base container Zhenzhong Duan
                   ` (30 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Zhenzhong Duan

From: Eric Auger <eric.auger@redhat.com>

The dirty_pages_supported field is also moved to the base container.
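
For readers skimming the diff, here is a rough sketch of the dispatch pattern this patch introduces, using made-up Demo* types rather than the real VFIOIOMMUOps and VFIOContainerBase. Only the set_dirty_page_tracking path is shown, and the call counter is a hypothetical addition for demonstration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoContainerBase DemoContainerBase;

/* Hypothetical ops table; the real one has more callbacks. */
typedef struct DemoIOMMUOps {
    int (*set_dirty_page_tracking)(DemoContainerBase *bc, bool start);
} DemoIOMMUOps;

struct DemoContainerBase {
    const DemoIOMMUOps *ops;
    bool dirty_pages_supported;      /* now owned by the base object */
};

typedef struct DemoLegacyContainer {
    DemoContainerBase bcontainer;
    int tracking_calls;              /* demo only: counts real backend work */
} DemoLegacyContainer;

/* Backend callback: takes the base pointer, recovers the legacy container. */
static int demo_legacy_set_dirty_page_tracking(DemoContainerBase *bc,
                                               bool start)
{
    DemoLegacyContainer *c = (DemoLegacyContainer *)
        ((char *)bc - offsetof(DemoLegacyContainer, bcontainer));

    (void)start;
    if (!bc->dirty_pages_supported) {
        return 0;                    /* nothing to do for this backend */
    }
    c->tracking_calls++;             /* stands in for the ioctl in the patch */
    return 0;
}

static const DemoIOMMUOps demo_legacy_ops = {
    .set_dirty_page_tracking = demo_legacy_set_dirty_page_tracking,
};

static void demo_container_init(DemoContainerBase *bc, const DemoIOMMUOps *ops)
{
    bc->ops = ops;
    bc->dirty_pages_supported = false;
}

/* Generic wrapper: a missing callback is a programming error, so assert. */
static int demo_container_set_dirty_page_tracking(DemoContainerBase *bc,
                                                  bool start)
{
    assert(bc->ops->set_dirty_page_tracking);
    return bc->ops->set_dirty_page_tracking(bc, start);
}
```

The point of the indirection is that common code calls only the wrapper, so another backend can supply its own ops table without the callers changing.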

No functional change intended.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v4: use assert

 include/hw/vfio/vfio-common.h         |  6 ------
 include/hw/vfio/vfio-container-base.h |  6 ++++++
 hw/vfio/common.c                      | 12 ++++++++----
 hw/vfio/container-base.c              | 16 ++++++++++++++++
 hw/vfio/container.c                   | 21 ++++++++++++++-------
 5 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index bd4de6cb3a..60f2785fe0 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -83,7 +83,6 @@ typedef struct VFIOContainer {
     unsigned iommu_type;
     Error *error;
     bool initialized;
-    bool dirty_pages_supported;
     uint64_t dirty_pgsizes;
     uint64_t max_dirty_bitmap_size;
     unsigned long pgsizes;
@@ -190,11 +189,6 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
 void vfio_put_address_space(VFIOAddressSpace *space);
 bool vfio_devices_all_running_and_saving(VFIOContainer *container);
 
-/* container->fd */
-int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start);
-int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
-                            hwaddr iova, hwaddr size);
-
 /* SPAPR specific */
 int vfio_container_add_section_window(VFIOContainer *container,
                                       MemoryRegionSection *section,
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index c7cc6ec9c5..f244f003d0 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -36,6 +36,7 @@ typedef struct VFIOAddressSpace {
 typedef struct VFIOContainerBase {
     const VFIOIOMMUOps *ops;
     VFIOAddressSpace *space;
+    bool dirty_pages_supported;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_ENTRY(VFIOContainerBase) next;
 } VFIOContainerBase;
@@ -54,6 +55,11 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
 int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
                              hwaddr iova, ram_addr_t size,
                              IOMMUTLBEntry *iotlb);
+int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
+                                           bool start);
+int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+                                      VFIOBitmap *vbmap,
+                                      hwaddr iova, hwaddr size);
 
 void vfio_container_init(VFIOContainerBase *bcontainer,
                          VFIOAddressSpace *space,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1d8202537e..b1a875ca93 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1079,7 +1079,8 @@ static void vfio_listener_log_global_start(MemoryListener *listener)
     if (vfio_devices_all_device_dirty_tracking(container)) {
         ret = vfio_devices_dma_logging_start(container);
     } else {
-        ret = vfio_set_dirty_page_tracking(container, true);
+        ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
+                                                     true);
     }
 
     if (ret) {
@@ -1097,7 +1098,8 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     if (vfio_devices_all_device_dirty_tracking(container)) {
         vfio_devices_dma_logging_stop(container);
     } else {
-        ret = vfio_set_dirty_page_tracking(container, false);
+        ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
+                                                     false);
     }
 
     if (ret) {
@@ -1165,7 +1167,8 @@ int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
     VFIOBitmap vbmap;
     int ret;
 
-    if (!container->dirty_pages_supported && !all_device_dirty_tracking) {
+    if (!container->bcontainer.dirty_pages_supported &&
+        !all_device_dirty_tracking) {
         cpu_physical_memory_set_dirty_range(ram_addr, size,
                                             tcg_enabled() ? DIRTY_CLIENTS_ALL :
                                             DIRTY_CLIENTS_NOCODE);
@@ -1180,7 +1183,8 @@ int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
     if (all_device_dirty_tracking) {
         ret = vfio_devices_query_dirty_bitmap(container, &vbmap, iova, size);
     } else {
-        ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
+        ret = vfio_container_query_dirty_bitmap(&container->bcontainer, &vbmap,
+                                                iova, size);
     }
 
     if (ret) {
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 3933391e0d..5d654ae172 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -31,11 +31,27 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
     return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
 }
 
+int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
+                                           bool start)
+{
+    g_assert(bcontainer->ops->set_dirty_page_tracking);
+    return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
+}
+
+int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+                                      VFIOBitmap *vbmap,
+                                      hwaddr iova, hwaddr size)
+{
+    g_assert(bcontainer->ops->query_dirty_bitmap);
+    return bcontainer->ops->query_dirty_bitmap(bcontainer, vbmap, iova, size);
+}
+
 void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
                          const VFIOIOMMUOps *ops)
 {
     bcontainer->ops = ops;
     bcontainer->space = space;
+    bcontainer->dirty_pages_supported = false;
     QLIST_INIT(&bcontainer->giommu_list);
 }
 
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index f12fcb6fe1..3ab74e2615 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -131,7 +131,7 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
 
     if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
         if (!vfio_devices_all_device_dirty_tracking(container) &&
-            container->dirty_pages_supported) {
+            container->bcontainer.dirty_pages_supported) {
             return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
         }
 
@@ -205,14 +205,17 @@ static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
     return -errno;
 }
 
-int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
+static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
+                                               bool start)
 {
+    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                            bcontainer);
     int ret;
     struct vfio_iommu_type1_dirty_bitmap dirty = {
         .argsz = sizeof(dirty),
     };
 
-    if (!container->dirty_pages_supported) {
+    if (!bcontainer->dirty_pages_supported) {
         return 0;
     }
 
@@ -232,9 +235,12 @@ int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
     return ret;
 }
 
-int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
-                            hwaddr iova, hwaddr size)
+static int vfio_legacy_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+                                          VFIOBitmap *vbmap,
+                                          hwaddr iova, hwaddr size)
 {
+    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                            bcontainer);
     struct vfio_iommu_type1_dirty_bitmap *dbitmap;
     struct vfio_iommu_type1_dirty_bitmap_get *range;
     int ret;
@@ -461,7 +467,7 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
      * qemu_real_host_page_size to mark those dirty.
      */
     if (cap_mig->pgsize_bitmap & qemu_real_host_page_size()) {
-        container->dirty_pages_supported = true;
+        container->bcontainer.dirty_pages_supported = true;
         container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
         container->dirty_pgsizes = cap_mig->pgsize_bitmap;
     }
@@ -553,7 +559,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container = g_malloc0(sizeof(*container));
     container->fd = fd;
     container->error = NULL;
-    container->dirty_pages_supported = false;
     container->dma_max_mappings = 0;
     container->iova_ranges = NULL;
     QLIST_INIT(&container->vrdl_list);
@@ -937,4 +942,6 @@ void vfio_detach_device(VFIODevice *vbasedev)
 const VFIOIOMMUOps vfio_legacy_ops = {
     .dma_map = vfio_legacy_dma_map,
     .dma_unmap = vfio_legacy_dma_unmap,
+    .set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
+    .query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
 };
-- 
2.34.1




* [PATCH v4 13/41] vfio/container: Move per container device list in base container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (11 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 12/41] vfio/container: Switch to IOMMU BE set_dirty_page_tracking/query_dirty_bitmap API Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 14/41] vfio/container: Convert functions to " Zhenzhong Duan
                   ` (29 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

VFIODevice is also changed to point to the base container instead of
the legacy container.
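
A small sketch of the resulting relationship, for illustration only: the Demo* types and the hand-rolled singly linked list are assumptions of this example, whereas the patch itself keeps using QLIST and VFIODevice.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoDevice DemoDevice;

typedef struct DemoContainerBase {
    DemoDevice *device_list;         /* per-container device list, in the base */
} DemoContainerBase;

struct DemoDevice {
    DemoContainerBase *bcontainer;   /* points at the base, NULL when detached */
    DemoDevice *container_next;
    bool dirty_pages_supported;
};

static void demo_attach_device(DemoContainerBase *bc, DemoDevice *dev)
{
    assert(dev->bcontainer == NULL);
    dev->bcontainer = bc;
    dev->container_next = bc->device_list;
    bc->device_list = dev;
}

static void demo_detach_device(DemoDevice *dev)
{
    if (!dev->bcontainer) {
        return;                      /* never attached, nothing to undo */
    }
    /* Unlink dev from its container's list. */
    DemoDevice **pp = &dev->bcontainer->device_list;
    while (*pp && *pp != dev) {
        pp = &(*pp)->container_next;
    }
    if (*pp) {
        *pp = dev->container_next;
    }
    dev->container_next = NULL;
    dev->bcontainer = NULL;
}

/* True only if every attached device can track dirty pages itself. */
static bool demo_all_device_dirty_tracking(const DemoContainerBase *bc)
{
    for (const DemoDevice *d = bc->device_list; d; d = d->container_next) {
        if (!d->dirty_pages_supported) {
            return false;
        }
    }
    return true;
}
```

Because both the list head and the device back-pointer refer to the base object, helpers like the last one need no knowledge of the legacy container.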

No functional change intended.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 include/hw/vfio/vfio-common.h         |  3 +--
 include/hw/vfio/vfio-container-base.h |  1 +
 hw/vfio/common.c                      | 23 +++++++++++++++--------
 hw/vfio/container.c                   | 12 ++++++------
 4 files changed, 23 insertions(+), 16 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 60f2785fe0..9740cf9fbc 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -90,7 +90,6 @@ typedef struct VFIOContainer {
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
-    QLIST_HEAD(, VFIODevice) device_list;
     GList *iova_ranges;
 } VFIOContainer;
 
@@ -118,7 +117,7 @@ typedef struct VFIODevice {
     QLIST_ENTRY(VFIODevice) container_next;
     QLIST_ENTRY(VFIODevice) global_next;
     struct VFIOGroup *group;
-    VFIOContainer *container;
+    VFIOContainerBase *bcontainer;
     char *sysfsdev;
     char *name;
     DeviceState *dev;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index f244f003d0..7090962496 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -39,6 +39,7 @@ typedef struct VFIOContainerBase {
     bool dirty_pages_supported;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_ENTRY(VFIOContainerBase) next;
+    QLIST_HEAD(, VFIODevice) device_list;
 } VFIOContainerBase;
 
 typedef struct VFIOGuestIOMMU {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index b1a875ca93..9415395ed9 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -145,7 +145,7 @@ void vfio_unblock_multiple_devices_migration(void)
 
 bool vfio_viommu_preset(VFIODevice *vbasedev)
 {
-    return vbasedev->container->bcontainer.space->as != &address_space_memory;
+    return vbasedev->bcontainer->space->as != &address_space_memory;
 }
 
 static void vfio_set_migration_error(int err)
@@ -179,6 +179,7 @@ bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
 
 static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 {
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
     MigrationState *ms = migrate_get_current();
 
@@ -187,7 +188,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
         return false;
     }
 
-    QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+    QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
         VFIOMigration *migration = vbasedev->migration;
 
         if (!migration) {
@@ -205,9 +206,10 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 
 bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
 {
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
 
-    QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+    QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
         if (!vbasedev->dirty_pages_supported) {
             return false;
         }
@@ -222,13 +224,14 @@ bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
  */
 bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 {
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
 
     if (!migration_is_active(migrate_get_current())) {
         return false;
     }
 
-    QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+    QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
         VFIOMigration *migration = vbasedev->migration;
 
         if (!migration) {
@@ -833,12 +836,13 @@ static bool vfio_section_is_vfio_pci(MemoryRegionSection *section,
                                      VFIOContainer *container)
 {
     VFIOPCIDevice *pcidev;
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
     Object *owner;
 
     owner = memory_region_owner(section->mr);
 
-    QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+    QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
         if (vbasedev->type != VFIO_DEVICE_TYPE_PCI) {
             continue;
         }
@@ -939,13 +943,14 @@ static void vfio_devices_dma_logging_stop(VFIOContainer *container)
     uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
                               sizeof(uint64_t))] = {};
     struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
 
     feature->argsz = sizeof(buf);
     feature->flags = VFIO_DEVICE_FEATURE_SET |
                      VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
 
-    QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+    QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
         if (!vbasedev->dirty_tracking) {
             continue;
         }
@@ -1036,6 +1041,7 @@ static int vfio_devices_dma_logging_start(VFIOContainer *container)
 {
     struct vfio_device_feature *feature;
     VFIODirtyRanges ranges;
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
     int ret = 0;
 
@@ -1046,7 +1052,7 @@ static int vfio_devices_dma_logging_start(VFIOContainer *container)
         return -errno;
     }
 
-    QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+    QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
         if (vbasedev->dirty_tracking) {
             continue;
         }
@@ -1139,10 +1145,11 @@ int vfio_devices_query_dirty_bitmap(VFIOContainer *container,
                                     VFIOBitmap *vbmap, hwaddr iova,
                                     hwaddr size)
 {
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
     int ret;
 
-    QLIST_FOREACH(vbasedev, &container->device_list, container_next) {
+    QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
         ret = vfio_device_dma_logging_report(vbasedev, iova, size,
                                              vbmap->bitmap);
         if (ret) {
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 3ab74e2615..63a906de93 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -888,7 +888,7 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
     int groupid = vfio_device_groupid(vbasedev, errp);
     VFIODevice *vbasedev_iter;
     VFIOGroup *group;
-    VFIOContainer *container;
+    VFIOContainerBase *bcontainer;
     int ret;
 
     if (groupid < 0) {
@@ -915,9 +915,9 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
         return ret;
     }
 
-    container = group->container;
-    vbasedev->container = container;
-    QLIST_INSERT_HEAD(&container->device_list, vbasedev, container_next);
+    bcontainer = &group->container->bcontainer;
+    vbasedev->bcontainer = bcontainer;
+    QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
     QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
 
     return ret;
@@ -927,13 +927,13 @@ void vfio_detach_device(VFIODevice *vbasedev)
 {
     VFIOGroup *group = vbasedev->group;
 
-    if (!vbasedev->container) {
+    if (!vbasedev->bcontainer) {
         return;
     }
 
     QLIST_REMOVE(vbasedev, global_next);
     QLIST_REMOVE(vbasedev, container_next);
-    vbasedev->container = NULL;
+    vbasedev->bcontainer = NULL;
     trace_vfio_detach_device(vbasedev->name, group->groupid);
     vfio_put_base_device(vbasedev);
     vfio_put_group(group);
-- 
2.34.1




* [PATCH v4 14/41] vfio/container: Convert functions to base container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (12 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 13/41] vfio/container: Move per container device list in base container Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 15/41] vfio/container: Move pgsizes and dma_max_mappings " Zhenzhong Duan
                   ` (28 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

From: Eric Auger <eric.auger@redhat.com>

With a view to getting rid of VFIOContainer references
in common.c, let's convert the following functions to use the base
container object instead:

vfio_devices_all_dirty_tracking
vfio_devices_all_device_dirty_tracking
vfio_devices_all_running_and_mig_active
vfio_devices_query_dirty_bitmap
vfio_get_dirty_bitmap

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 include/hw/vfio/vfio-common.h |  9 ++++----
 hw/vfio/common.c              | 42 +++++++++++++++--------------------
 hw/vfio/container.c           |  6 ++---
 hw/vfio/trace-events          |  2 +-
 4 files changed, 26 insertions(+), 33 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9740cf9fbc..bc67e1316c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -186,7 +186,6 @@ typedef struct VFIODisplay {
 
 VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
 void vfio_put_address_space(VFIOAddressSpace *space);
-bool vfio_devices_all_running_and_saving(VFIOContainer *container);
 
 /* SPAPR specific */
 int vfio_container_add_section_window(VFIOContainer *container,
@@ -260,11 +259,11 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
 void vfio_migration_exit(VFIODevice *vbasedev);
 
 int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size);
-bool vfio_devices_all_running_and_mig_active(VFIOContainer *container);
-bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container);
-int vfio_devices_query_dirty_bitmap(VFIOContainer *container,
+bool vfio_devices_all_running_and_mig_active(VFIOContainerBase *bcontainer);
+bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer);
+int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
                                     VFIOBitmap *vbmap, hwaddr iova,
                                     hwaddr size);
-int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
+int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
                                  uint64_t size, ram_addr_t ram_addr);
 #endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9415395ed9..cf6618f6ed 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -177,9 +177,8 @@ bool vfio_device_state_is_precopy(VFIODevice *vbasedev)
            migration->device_state == VFIO_DEVICE_STATE_PRE_COPY_P2P;
 }
 
-static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
+static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
 {
-    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
     MigrationState *ms = migrate_get_current();
 
@@ -204,9 +203,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
     return true;
 }
 
-bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
+bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer)
 {
-    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
 
     QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
@@ -222,9 +220,8 @@ bool vfio_devices_all_device_dirty_tracking(VFIOContainer *container)
  * Check if all VFIO devices are running and migration is active, which is
  * essentially equivalent to the migration being in pre-copy phase.
  */
-bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
+bool vfio_devices_all_running_and_mig_active(VFIOContainerBase *bcontainer)
 {
-    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
 
     if (!migration_is_active(migrate_get_current())) {
@@ -1082,7 +1079,7 @@ static void vfio_listener_log_global_start(MemoryListener *listener)
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
     int ret;
 
-    if (vfio_devices_all_device_dirty_tracking(container)) {
+    if (vfio_devices_all_device_dirty_tracking(&container->bcontainer)) {
         ret = vfio_devices_dma_logging_start(container);
     } else {
         ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
@@ -1101,7 +1098,7 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
     VFIOContainer *container = container_of(listener, VFIOContainer, listener);
     int ret = 0;
 
-    if (vfio_devices_all_device_dirty_tracking(container)) {
+    if (vfio_devices_all_device_dirty_tracking(&container->bcontainer)) {
         vfio_devices_dma_logging_stop(container);
     } else {
         ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
@@ -1141,11 +1138,10 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
     return 0;
 }
 
-int vfio_devices_query_dirty_bitmap(VFIOContainer *container,
+int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
                                     VFIOBitmap *vbmap, hwaddr iova,
                                     hwaddr size)
 {
-    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
     int ret;
 
@@ -1165,17 +1161,16 @@ int vfio_devices_query_dirty_bitmap(VFIOContainer *container,
     return 0;
 }
 
-int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
+int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
                           uint64_t size, ram_addr_t ram_addr)
 {
     bool all_device_dirty_tracking =
-        vfio_devices_all_device_dirty_tracking(container);
+        vfio_devices_all_device_dirty_tracking(bcontainer);
     uint64_t dirty_pages;
     VFIOBitmap vbmap;
     int ret;
 
-    if (!container->bcontainer.dirty_pages_supported &&
-        !all_device_dirty_tracking) {
+    if (!bcontainer->dirty_pages_supported && !all_device_dirty_tracking) {
         cpu_physical_memory_set_dirty_range(ram_addr, size,
                                             tcg_enabled() ? DIRTY_CLIENTS_ALL :
                                             DIRTY_CLIENTS_NOCODE);
@@ -1188,10 +1183,9 @@ int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
     }
 
     if (all_device_dirty_tracking) {
-        ret = vfio_devices_query_dirty_bitmap(container, &vbmap, iova, size);
+        ret = vfio_devices_query_dirty_bitmap(bcontainer, &vbmap, iova, size);
     } else {
-        ret = vfio_container_query_dirty_bitmap(&container->bcontainer, &vbmap,
-                                                iova, size);
+        ret = vfio_container_query_dirty_bitmap(bcontainer, &vbmap, iova, size);
     }
 
     if (ret) {
@@ -1201,8 +1195,7 @@ int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
     dirty_pages = cpu_physical_memory_set_dirty_lebitmap(vbmap.bitmap, ram_addr,
                                                          vbmap.pages);
 
-    trace_vfio_get_dirty_bitmap(container->fd, iova, size, vbmap.size,
-                                ram_addr, dirty_pages);
+    trace_vfio_get_dirty_bitmap(iova, size, vbmap.size, ram_addr, dirty_pages);
 out:
     g_free(vbmap.bitmap);
 
@@ -1236,8 +1229,8 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 
     rcu_read_lock();
     if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
-        ret = vfio_get_dirty_bitmap(container, iova, iotlb->addr_mask + 1,
-                                    translated_addr);
+        ret = vfio_get_dirty_bitmap(&container->bcontainer, iova,
+                                    iotlb->addr_mask + 1, translated_addr);
         if (ret) {
             error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx") = %d (%s)",
@@ -1266,7 +1259,8 @@ static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section,
      * Sync the whole mapped region (spanning multiple individual mappings)
      * in one go.
      */
-    return vfio_get_dirty_bitmap(vrdl->container, iova, size, ram_addr);
+    return vfio_get_dirty_bitmap(&vrdl->container->bcontainer, iova, size,
+                                 ram_addr);
 }
 
 static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
@@ -1335,7 +1329,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
     ram_addr = memory_region_get_ram_addr(section->mr) +
                section->offset_within_region;
 
-    return vfio_get_dirty_bitmap(container,
+    return vfio_get_dirty_bitmap(&container->bcontainer,
                    REAL_HOST_PAGE_ALIGN(section->offset_within_address_space),
                    int128_get64(section->size), ram_addr);
 }
@@ -1350,7 +1344,7 @@ static void vfio_listener_log_sync(MemoryListener *listener,
         return;
     }
 
-    if (vfio_devices_all_dirty_tracking(container)) {
+    if (vfio_devices_all_dirty_tracking(&container->bcontainer)) {
         ret = vfio_sync_dirty_bitmap(container, section);
         if (ret) {
             error_report("vfio: Failed to sync dirty bitmap, err: %d (%s)", ret,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 63a906de93..7bd81eab09 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -129,8 +129,8 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
     bool need_dirty_sync = false;
     int ret;
 
-    if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
-        if (!vfio_devices_all_device_dirty_tracking(container) &&
+    if (iotlb && vfio_devices_all_running_and_mig_active(bcontainer)) {
+        if (!vfio_devices_all_device_dirty_tracking(bcontainer) &&
             container->bcontainer.dirty_pages_supported) {
             return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
         }
@@ -162,7 +162,7 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
     }
 
     if (need_dirty_sync) {
-        ret = vfio_get_dirty_bitmap(container, iova, size,
+        ret = vfio_get_dirty_bitmap(bcontainer, iova, size,
                                     iotlb->translated_addr);
         if (ret) {
             return ret;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 9f7fedee98..08a1f9dfa4 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -117,7 +117,7 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%08x"
 vfio_legacy_dma_unmap_overflow_workaround(void) ""
-vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start, uint64_t dirty_pages) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64" dirty_pages=%"PRIu64
+vfio_get_dirty_bitmap(uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start, uint64_t dirty_pages) "iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64" dirty_pages=%"PRIu64
 vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
 
 # platform.c
-- 
2.34.1




* [PATCH v4 15/41] vfio/container: Move pgsizes and dma_max_mappings to base container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (13 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 14/41] vfio/container: Convert functions to " Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:53   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 16/41] vfio/container: Move vrdl_list " Zhenzhong Duan
                   ` (27 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	David Gibson, Harsh Prateek Bora, open list:sPAPR (pseries)

From: Eric Auger <eric.auger@redhat.com>

No functional change intended.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v4: Split vrdl_list change out into a separate patch
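
Note on pgsizes: the field is a bitmap of supported IOMMU page sizes,
and several call sites touched here derive the mask of the smallest
supported page with (1ULL << ctz64(pgsizes)) - 1. A minimal sketch of
that computation follows. The demo_* names are hypothetical, and
__builtin_ctzll (a GCC/Clang builtin) is assumed as a stand-in for
QEMU's ctz64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the offset bits of the smallest supported page size. */
static uint64_t demo_min_pgmask(uint64_t pgsizes)
{
    assert(pgsizes != 0); /* count-trailing-zeros of 0 is undefined */
    return (1ULL << __builtin_ctzll(pgsizes)) - 1;
}

/*
 * Same alignment test as the ram_device paths in the region add/del
 * listeners: both the start address and the length must be multiples
 * of the smallest page size.
 */
static bool demo_is_page_aligned(uint64_t pgsizes, uint64_t iova,
                                 uint64_t size)
{
    uint64_t pgmask = demo_min_pgmask(pgsizes);

    return !((iova & pgmask) || (size & pgmask));
}
```

With pgsizes = 0x1000 | 0x200000 (4K and 2M pages) the mask is 0xfff,
so a region starting at 0x2800 is rejected while one at 0x2000 passes.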

 include/hw/vfio/vfio-common.h         |  2 --
 include/hw/vfio/vfio-container-base.h |  2 ++
 hw/vfio/common.c                      | 17 +++++++++--------
 hw/vfio/container-base.c              |  1 +
 hw/vfio/container.c                   | 11 +++++------
 hw/vfio/spapr.c                       | 10 ++++++----
 6 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index bc67e1316c..d3dc2f9dcb 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -85,8 +85,6 @@ typedef struct VFIOContainer {
     bool initialized;
     uint64_t dirty_pgsizes;
     uint64_t max_dirty_bitmap_size;
-    unsigned long pgsizes;
-    unsigned int dma_max_mappings;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 7090962496..85ec7e1a56 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -36,6 +36,8 @@ typedef struct VFIOAddressSpace {
 typedef struct VFIOContainerBase {
     const VFIOIOMMUOps *ops;
     VFIOAddressSpace *space;
+    unsigned long pgsizes;
+    unsigned int dma_max_mappings;
     bool dirty_pages_supported;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
     QLIST_ENTRY(VFIOContainerBase) next;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index cf6618f6ed..1cb53d369e 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -401,6 +401,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
 static void vfio_register_ram_discard_listener(VFIOContainer *container,
                                                MemoryRegionSection *section)
 {
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
     VFIORamDiscardListener *vrdl;
 
@@ -419,8 +420,8 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
                                                                 section->mr);
 
     g_assert(vrdl->granularity && is_power_of_2(vrdl->granularity));
-    g_assert(container->pgsizes &&
-             vrdl->granularity >= 1ULL << ctz64(container->pgsizes));
+    g_assert(bcontainer->pgsizes &&
+             vrdl->granularity >= 1ULL << ctz64(bcontainer->pgsizes));
 
     ram_discard_listener_init(&vrdl->listener,
                               vfio_ram_discard_notify_populate,
@@ -441,7 +442,7 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
      * number of sections in the address space we could have over time,
      * also consuming DMA mappings.
      */
-    if (container->dma_max_mappings) {
+    if (bcontainer->dma_max_mappings) {
         unsigned int vrdl_count = 0, vrdl_mappings = 0, max_memslots = 512;
 
 #ifdef CONFIG_KVM
@@ -462,11 +463,11 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
         }
 
         if (vrdl_mappings + max_memslots - vrdl_count >
-            container->dma_max_mappings) {
+            bcontainer->dma_max_mappings) {
             warn_report("%s: possibly running out of DMA mappings. E.g., try"
                         " increasing the 'block-size' of virtio-mem devies."
                         " Maximum possible DMA mappings: %d, Maximum possible"
-                        " memslots: %d", __func__, container->dma_max_mappings,
+                        " memslots: %d", __func__, bcontainer->dma_max_mappings,
                         max_memslots);
         }
     }
@@ -626,7 +627,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
                             iommu_idx);
 
         ret = memory_region_iommu_set_page_size_mask(giommu->iommu_mr,
-                                                     container->pgsizes,
+                                                     bcontainer->pgsizes,
                                                      &err);
         if (ret) {
             g_free(giommu);
@@ -675,7 +676,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
     llsize = int128_sub(llend, int128_make64(iova));
 
     if (memory_region_is_ram_device(section->mr)) {
-        hwaddr pgmask = (1ULL << ctz64(container->pgsizes)) - 1;
+        hwaddr pgmask = (1ULL << ctz64(bcontainer->pgsizes)) - 1;
 
         if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
             trace_vfio_listener_region_add_no_dma_map(
@@ -777,7 +778,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
     if (memory_region_is_ram_device(section->mr)) {
         hwaddr pgmask;
 
-        pgmask = (1ULL << ctz64(container->pgsizes)) - 1;
+        pgmask = (1ULL << ctz64(bcontainer->pgsizes)) - 1;
         try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
     } else if (memory_region_has_ram_discard_manager(section->mr)) {
         vfio_unregister_ram_discard_listener(container, section);
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 5d654ae172..dcce111349 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -52,6 +52,7 @@ void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
     bcontainer->ops = ops;
     bcontainer->space = space;
     bcontainer->dirty_pages_supported = false;
+    bcontainer->dma_max_mappings = 0;
     QLIST_INIT(&bcontainer->giommu_list);
 }
 
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 7bd81eab09..c5a6262882 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -154,7 +154,7 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
         if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
             container->iommu_type == VFIO_TYPE1v2_IOMMU) {
             trace_vfio_legacy_dma_unmap_overflow_workaround();
-            unmap.size -= 1ULL << ctz64(container->pgsizes);
+            unmap.size -= 1ULL << ctz64(bcontainer->pgsizes);
             continue;
         }
         error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
@@ -559,7 +559,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container = g_malloc0(sizeof(*container));
     container->fd = fd;
     container->error = NULL;
-    container->dma_max_mappings = 0;
     container->iova_ranges = NULL;
     QLIST_INIT(&container->vrdl_list);
     bcontainer = &container->bcontainer;
@@ -589,13 +588,13 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         }
 
         if (info->flags & VFIO_IOMMU_INFO_PGSIZES) {
-            container->pgsizes = info->iova_pgsizes;
+            bcontainer->pgsizes = info->iova_pgsizes;
         } else {
-            container->pgsizes = qemu_real_host_page_size();
+            bcontainer->pgsizes = qemu_real_host_page_size();
         }
 
-        if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
-            container->dma_max_mappings = 65535;
+        if (!vfio_get_info_dma_avail(info, &bcontainer->dma_max_mappings)) {
+            bcontainer->dma_max_mappings = 65535;
         }
 
         vfio_get_info_iova_range(info, container);
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 83da2f7ec2..4f76bdd3ca 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -226,6 +226,7 @@ static int vfio_spapr_create_window(VFIOContainer *container,
                                     hwaddr *pgsize)
 {
     int ret = 0;
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
     uint64_t pagesize = memory_region_iommu_get_min_page_size(iommu_mr), pgmask;
     unsigned entries, bits_total, bits_per_level, max_levels;
@@ -239,13 +240,13 @@ static int vfio_spapr_create_window(VFIOContainer *container,
     if (pagesize > rampagesize) {
         pagesize = rampagesize;
     }
-    pgmask = container->pgsizes & (pagesize | (pagesize - 1));
+    pgmask = bcontainer->pgsizes & (pagesize | (pagesize - 1));
     pagesize = pgmask ? (1ULL << (63 - clz64(pgmask))) : 0;
     if (!pagesize) {
         error_report("Host doesn't support page size 0x%"PRIx64
                      ", the supported mask is 0x%lx",
                      memory_region_iommu_get_min_page_size(iommu_mr),
-                     container->pgsizes);
+                     bcontainer->pgsizes);
         return -EINVAL;
     }
 
@@ -421,6 +422,7 @@ void vfio_container_del_section_window(VFIOContainer *container,
 
 int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
 {
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     struct vfio_iommu_spapr_tce_info info;
     bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
     int ret, fd = container->fd;
@@ -461,7 +463,7 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
     }
 
     if (v2) {
-        container->pgsizes = info.ddw.pgsizes;
+        bcontainer->pgsizes = info.ddw.pgsizes;
         /*
          * There is a default window in just created container.
          * To make region_add/del simpler, we better remove this
@@ -476,7 +478,7 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
         }
     } else {
         /* The default table uses 4K pages */
-        container->pgsizes = 0x1000;
+        bcontainer->pgsizes = 0x1000;
         vfio_host_win_add(container, info.dma32_window_start,
                           info.dma32_window_start +
                           info.dma32_window_size - 1,
-- 
2.34.1




* [PATCH v4 16/41] vfio/container: Move vrdl_list to base container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (14 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 15/41] vfio/container: Move pgsizes and dma_max_mappings " Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:53   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 17/41] vfio/container: Move listener " Zhenzhong Duan
                   ` (26 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

No functional change intended.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
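Note: after this patch each VFIORamDiscardListener points back at the
base container, and the base container owns the list that the
register, unregister and dirty-sync paths walk, matching on the memory
region and the offset within the address space. A minimal sketch of
that arrangement follows. The Demo* types are hypothetical, and a
plain singly linked list is assumed in place of QEMU's QLIST macros.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct DemoBase DemoBase;

/* Hypothetical stand-in for VFIORamDiscardListener. */
typedef struct DemoListener {
    DemoBase *bcontainer;              /* back-pointer to the base object */
    const void *mr;                    /* identifies the memory region */
    uint64_t offset_within_address_space;
    struct DemoListener *next;
} DemoListener;

/* Hypothetical stand-in for VFIOContainerBase: it owns the list head. */
struct DemoBase {
    DemoListener *vrdl_list;
};

/* Record the owner in the listener and insert it at the list head. */
static void demo_register(DemoBase *bcontainer, DemoListener *vrdl,
                          const void *mr, uint64_t offset)
{
    assert(bcontainer != NULL && vrdl != NULL);
    vrdl->bcontainer = bcontainer;
    vrdl->mr = mr;
    vrdl->offset_within_address_space = offset;
    vrdl->next = bcontainer->vrdl_list;
    bcontainer->vrdl_list = vrdl;
}

/* Look a listener up by memory region and offset, as the diff does. */
static DemoListener *demo_find(DemoBase *bcontainer, const void *mr,
                               uint64_t offset)
{
    DemoListener *vrdl;

    for (vrdl = bcontainer->vrdl_list; vrdl; vrdl = vrdl->next) {
        if (vrdl->mr == mr && vrdl->offset_within_address_space == offset) {
            return vrdl;
        }
    }
    return NULL;
}
```

Because the lookup needs only the base object, the helpers can take
VFIOContainerBase directly instead of the legacy container.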
 include/hw/vfio/vfio-common.h         | 11 --------
 include/hw/vfio/vfio-container-base.h | 11 ++++++++
 hw/vfio/common.c                      | 38 +++++++++++++--------------
 hw/vfio/container-base.c              |  1 +
 hw/vfio/container.c                   |  1 -
 5 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index d3dc2f9dcb..8a607a4c17 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -87,20 +87,9 @@ typedef struct VFIOContainer {
     uint64_t max_dirty_bitmap_size;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
-    QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
     GList *iova_ranges;
 } VFIOContainer;
 
-typedef struct VFIORamDiscardListener {
-    VFIOContainer *container;
-    MemoryRegion *mr;
-    hwaddr offset_within_address_space;
-    hwaddr size;
-    uint64_t granularity;
-    RamDiscardListener listener;
-    QLIST_ENTRY(VFIORamDiscardListener) next;
-} VFIORamDiscardListener;
-
 typedef struct VFIOHostDMAWindow {
     hwaddr min_iova;
     hwaddr max_iova;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 85ec7e1a56..8e05b5ac5a 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -40,6 +40,7 @@ typedef struct VFIOContainerBase {
     unsigned int dma_max_mappings;
     bool dirty_pages_supported;
     QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
+    QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
     QLIST_ENTRY(VFIOContainerBase) next;
     QLIST_HEAD(, VFIODevice) device_list;
 } VFIOContainerBase;
@@ -52,6 +53,16 @@ typedef struct VFIOGuestIOMMU {
     QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
 } VFIOGuestIOMMU;
 
+typedef struct VFIORamDiscardListener {
+    VFIOContainerBase *bcontainer;
+    MemoryRegion *mr;
+    hwaddr offset_within_address_space;
+    hwaddr size;
+    uint64_t granularity;
+    RamDiscardListener listener;
+    QLIST_ENTRY(VFIORamDiscardListener) next;
+} VFIORamDiscardListener;
+
 int vfio_container_dma_map(VFIOContainerBase *bcontainer,
                            hwaddr iova, ram_addr_t size,
                            void *vaddr, bool readonly);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1cb53d369e..f15665789f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -351,13 +351,13 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
 {
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
+    VFIOContainerBase *bcontainer = vrdl->bcontainer;
     const hwaddr size = int128_get64(section->size);
     const hwaddr iova = section->offset_within_address_space;
     int ret;
 
     /* Unmap with a single call. */
-    ret = vfio_container_dma_unmap(&vrdl->container->bcontainer,
-                                   iova, size , NULL);
+    ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL);
     if (ret) {
         error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
                      strerror(-ret));
@@ -369,6 +369,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
 {
     VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
                                                 listener);
+    VFIOContainerBase *bcontainer = vrdl->bcontainer;
     const hwaddr end = section->offset_within_region +
                        int128_get64(section->size);
     hwaddr start, next, iova;
@@ -387,8 +388,8 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
                section->offset_within_address_space;
         vaddr = memory_region_get_ram_ptr(section->mr) + start;
 
-        ret = vfio_container_dma_map(&vrdl->container->bcontainer, iova,
-                                     next - start, vaddr, section->readonly);
+        ret = vfio_container_dma_map(bcontainer, iova, next - start,
+                                     vaddr, section->readonly);
         if (ret) {
             /* Rollback */
             vfio_ram_discard_notify_discard(rdl, section);
@@ -398,10 +399,9 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
     return 0;
 }
 
-static void vfio_register_ram_discard_listener(VFIOContainer *container,
+static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                MemoryRegionSection *section)
 {
-    VFIOContainerBase *bcontainer = &container->bcontainer;
     RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
     VFIORamDiscardListener *vrdl;
 
@@ -412,7 +412,7 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
     g_assert(QEMU_IS_ALIGNED(int128_get64(section->size), TARGET_PAGE_SIZE));
 
     vrdl = g_new0(VFIORamDiscardListener, 1);
-    vrdl->container = container;
+    vrdl->bcontainer = bcontainer;
     vrdl->mr = section->mr;
     vrdl->offset_within_address_space = section->offset_within_address_space;
     vrdl->size = int128_get64(section->size);
@@ -427,7 +427,7 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
                               vfio_ram_discard_notify_populate,
                               vfio_ram_discard_notify_discard, true);
     ram_discard_manager_register_listener(rdm, &vrdl->listener, section);
-    QLIST_INSERT_HEAD(&container->vrdl_list, vrdl, next);
+    QLIST_INSERT_HEAD(&bcontainer->vrdl_list, vrdl, next);
 
     /*
      * Sanity-check if we have a theoretically problematic setup where we could
@@ -451,7 +451,7 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
         }
 #endif
 
-        QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
+        QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
             hwaddr start, end;
 
             start = QEMU_ALIGN_DOWN(vrdl->offset_within_address_space,
@@ -473,13 +473,13 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
     }
 }
 
-static void vfio_unregister_ram_discard_listener(VFIOContainer *container,
+static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
                                                  MemoryRegionSection *section)
 {
     RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
     VFIORamDiscardListener *vrdl = NULL;
 
-    QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
+    QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
         if (vrdl->mr == section->mr &&
             vrdl->offset_within_address_space ==
             section->offset_within_address_space) {
@@ -663,7 +663,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
      * about changes.
      */
     if (memory_region_has_ram_discard_manager(section->mr)) {
-        vfio_register_ram_discard_listener(container, section);
+        vfio_register_ram_discard_listener(bcontainer, section);
         return;
     }
 
@@ -781,7 +781,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
         pgmask = (1ULL << ctz64(bcontainer->pgsizes)) - 1;
         try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
     } else if (memory_region_has_ram_discard_manager(section->mr)) {
-        vfio_unregister_ram_discard_listener(container, section);
+        vfio_unregister_ram_discard_listener(bcontainer, section);
         /* Unregistering will trigger an unmap. */
         try_unmap = false;
     }
@@ -1260,17 +1260,17 @@ static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section,
      * Sync the whole mapped region (spanning multiple individual mappings)
      * in one go.
      */
-    return vfio_get_dirty_bitmap(&vrdl->container->bcontainer, iova, size,
-                                 ram_addr);
+    return vfio_get_dirty_bitmap(vrdl->bcontainer, iova, size, ram_addr);
 }
 
-static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
-                                                   MemoryRegionSection *section)
+static int
+vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
+                                            MemoryRegionSection *section)
 {
     RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
     VFIORamDiscardListener *vrdl = NULL;
 
-    QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
+    QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
         if (vrdl->mr == section->mr &&
             vrdl->offset_within_address_space ==
             section->offset_within_address_space) {
@@ -1324,7 +1324,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
         }
         return 0;
     } else if (memory_region_has_ram_discard_manager(section->mr)) {
-        return vfio_sync_ram_discard_listener_dirty_bitmap(container, section);
+        return vfio_sync_ram_discard_listener_dirty_bitmap(bcontainer, section);
     }
 
     ram_addr = memory_region_get_ram_addr(section->mr) +
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index dcce111349..584eee4ba1 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -54,6 +54,7 @@ void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
     bcontainer->dirty_pages_supported = false;
     bcontainer->dma_max_mappings = 0;
     QLIST_INIT(&bcontainer->giommu_list);
+    QLIST_INIT(&bcontainer->vrdl_list);
 }
 
 void vfio_container_destroy(VFIOContainerBase *bcontainer)
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index c5a6262882..6ba2e2f8c4 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -560,7 +560,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->fd = fd;
     container->error = NULL;
     container->iova_ranges = NULL;
-    QLIST_INIT(&container->vrdl_list);
     bcontainer = &container->bcontainer;
     vfio_container_init(bcontainer, space, &vfio_legacy_ops);
 
-- 
2.34.1




* [PATCH v4 17/41] vfio/container: Move listener to base container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (15 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 16/41] vfio/container: Move vrdl_list " Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:57   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 18/41] vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size " Zhenzhong Duan
                   ` (25 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	David Gibson, Harsh Prateek Bora, open list:sPAPR (pseries)

From: Eric Auger <eric.auger@redhat.com>

Move the memory listener to the base container, together with the
error and initialized fields that the listener callbacks rely on.

No functional change intended.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
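Note for readers: the listener callbacks in this patch first recover the
base container from the MemoryListener pointer, and only derive the
legacy container from the base when legacy-only fields are still needed.
A minimal sketch of that two-step lookup follows. The types Listener,
Base and Legacy, the two helper functions, and the simplified
container_of() macro are hypothetical stand-ins for illustration, not
QEMU's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for QEMU's container_of(); no type checking here. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, trimmed-down stand-ins for the structures in this patch. */
typedef struct Listener {
    int priority;
} Listener;

typedef struct Base {
    Listener listener;      /* moved here from the legacy container */
    int initialized;
} Base;

typedef struct Legacy {
    Base bcontainer;        /* embedded by value, not a pointer */
    int fd;
} Legacy;

/* Step 1: a listener callback only receives the listener pointer. */
static Base *base_from_listener(Listener *l)
{
    assert(l != NULL);
    return container_of(l, Base, listener);
}

/* Step 2: code that still needs legacy-only fields derives them here. */
static Legacy *legacy_from_base(Base *b)
{
    assert(b != NULL);
    return container_of(b, Legacy, bcontainer);
}
```

Callbacks that touch only base fields can stop after step 1, which is
what should allow them to be shared with a non-legacy backend.
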
 include/hw/vfio/vfio-common.h         |   3 -
 include/hw/vfio/vfio-container-base.h |   3 +
 hw/vfio/common.c                      | 110 +++++++++++++-------------
 hw/vfio/container-base.c              |   1 +
 hw/vfio/container.c                   |  19 +++--
 hw/vfio/spapr.c                       |  11 +--
 6 files changed, 74 insertions(+), 73 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8a607a4c17..922022cbc6 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -78,11 +78,8 @@ struct VFIOGroup;
 typedef struct VFIOContainer {
     VFIOContainerBase bcontainer;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
-    MemoryListener listener;
     MemoryListener prereg_listener;
     unsigned iommu_type;
-    Error *error;
-    bool initialized;
     uint64_t dirty_pgsizes;
     uint64_t max_dirty_bitmap_size;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 8e05b5ac5a..95f8d319e0 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -36,6 +36,9 @@ typedef struct VFIOAddressSpace {
 typedef struct VFIOContainerBase {
     const VFIOIOMMUOps *ops;
     VFIOAddressSpace *space;
+    MemoryListener listener;
+    Error *error;
+    bool initialized;
     unsigned long pgsizes;
     unsigned int dma_max_mappings;
     bool dirty_pages_supported;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f15665789f..be623e544b 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -541,7 +541,7 @@ static bool vfio_listener_valid_section(MemoryRegionSection *section,
     return true;
 }
 
-static bool vfio_get_section_iova_range(VFIOContainer *container,
+static bool vfio_get_section_iova_range(VFIOContainerBase *bcontainer,
                                         MemoryRegionSection *section,
                                         hwaddr *out_iova, hwaddr *out_end,
                                         Int128 *out_llend)
@@ -569,8 +569,10 @@ static bool vfio_get_section_iova_range(VFIOContainer *container,
 static void vfio_listener_region_add(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
-    VFIOContainerBase *bcontainer = &container->bcontainer;
+    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
+                                                 listener);
+    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                            bcontainer);
     hwaddr iova, end;
     Int128 llend, llsize;
     void *vaddr;
@@ -581,7 +583,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
-    if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
+    if (!vfio_get_section_iova_range(bcontainer, section, &iova, &end,
+                                     &llend)) {
         if (memory_region_is_ram_device(section->mr)) {
             trace_vfio_listener_region_add_no_dma_map(
                 memory_region_name(section->mr),
@@ -688,13 +691,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
         }
     }
 
-    ret = vfio_container_dma_map(&container->bcontainer,
-                                 iova, int128_get64(llsize), vaddr,
-                                 section->readonly);
+    ret = vfio_container_dma_map(bcontainer, iova, int128_get64(llsize),
+                                 vaddr, section->readonly);
     if (ret) {
         error_setg(&err, "vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
                    "0x%"HWADDR_PRIx", %p) = %d (%s)",
-                   container, iova, int128_get64(llsize), vaddr, ret,
+                   bcontainer, iova, int128_get64(llsize), vaddr, ret,
                    strerror(-ret));
         if (memory_region_is_ram_device(section->mr)) {
             /* Allow unexpected mappings not to be fatal for RAM devices */
@@ -716,9 +718,9 @@ fail:
      * can gracefully fail.  Runtime, there's not much we can do other
      * than throw a hardware error.
      */
-    if (!container->initialized) {
-        if (!container->error) {
-            error_propagate_prepend(&container->error, err,
+    if (!bcontainer->initialized) {
+        if (!bcontainer->error) {
+            error_propagate_prepend(&bcontainer->error, err,
                                     "Region %s: ",
                                     memory_region_name(section->mr));
         } else {
@@ -733,8 +735,10 @@ fail:
 static void vfio_listener_region_del(MemoryListener *listener,
                                      MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
-    VFIOContainerBase *bcontainer = &container->bcontainer;
+    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
+                                                 listener);
+    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                            bcontainer);
     hwaddr iova, end;
     Int128 llend, llsize;
     int ret;
@@ -767,7 +771,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
          */
     }
 
-    if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
+    if (!vfio_get_section_iova_range(bcontainer, section, &iova, &end,
+                                     &llend)) {
         return;
     }
 
@@ -790,22 +795,22 @@ static void vfio_listener_region_del(MemoryListener *listener,
         if (int128_eq(llsize, int128_2_64())) {
             /* The unmap ioctl doesn't accept a full 64-bit span. */
             llsize = int128_rshift(llsize, 1);
-            ret = vfio_container_dma_unmap(&container->bcontainer, iova,
+            ret = vfio_container_dma_unmap(bcontainer, iova,
                                            int128_get64(llsize), NULL);
             if (ret) {
                 error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                              "0x%"HWADDR_PRIx") = %d (%s)",
-                             container, iova, int128_get64(llsize), ret,
+                             bcontainer, iova, int128_get64(llsize), ret,
                              strerror(-ret));
             }
             iova += int128_get64(llsize);
         }
-        ret = vfio_container_dma_unmap(&container->bcontainer, iova,
+        ret = vfio_container_dma_unmap(bcontainer, iova,
                                        int128_get64(llsize), NULL);
         if (ret) {
             error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx") = %d (%s)",
-                         container, iova, int128_get64(llsize), ret,
+                         bcontainer, iova, int128_get64(llsize), ret,
                          strerror(-ret));
         }
     }
@@ -825,16 +830,15 @@ typedef struct VFIODirtyRanges {
 } VFIODirtyRanges;
 
 typedef struct VFIODirtyRangesListener {
-    VFIOContainer *container;
+    VFIOContainerBase *bcontainer;
     VFIODirtyRanges ranges;
     MemoryListener listener;
 } VFIODirtyRangesListener;
 
 static bool vfio_section_is_vfio_pci(MemoryRegionSection *section,
-                                     VFIOContainer *container)
+                                     VFIOContainerBase *bcontainer)
 {
     VFIOPCIDevice *pcidev;
-    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
     Object *owner;
 
@@ -863,7 +867,7 @@ static void vfio_dirty_tracking_update(MemoryListener *listener,
     hwaddr iova, end, *min, *max;
 
     if (!vfio_listener_valid_section(section, "tracking_update") ||
-        !vfio_get_section_iova_range(dirty->container, section,
+        !vfio_get_section_iova_range(dirty->bcontainer, section,
                                      &iova, &end, NULL)) {
         return;
     }
@@ -887,7 +891,7 @@ static void vfio_dirty_tracking_update(MemoryListener *listener,
      * The alternative would be an IOVATree but that has a much bigger runtime
      * overhead and unnecessary complexity.
      */
-    if (vfio_section_is_vfio_pci(section, dirty->container) &&
+    if (vfio_section_is_vfio_pci(section, dirty->bcontainer) &&
         iova >= UINT32_MAX) {
         min = &range->minpci64;
         max = &range->maxpci64;
@@ -911,7 +915,7 @@ static const MemoryListener vfio_dirty_tracking_listener = {
     .region_add = vfio_dirty_tracking_update,
 };
 
-static void vfio_dirty_tracking_init(VFIOContainer *container,
+static void vfio_dirty_tracking_init(VFIOContainerBase *bcontainer,
                                      VFIODirtyRanges *ranges)
 {
     VFIODirtyRangesListener dirty;
@@ -921,10 +925,10 @@ static void vfio_dirty_tracking_init(VFIOContainer *container,
     dirty.ranges.min64 = UINT64_MAX;
     dirty.ranges.minpci64 = UINT64_MAX;
     dirty.listener = vfio_dirty_tracking_listener;
-    dirty.container = container;
+    dirty.bcontainer = bcontainer;
 
     memory_listener_register(&dirty.listener,
-                             container->bcontainer.space->as);
+                             bcontainer->space->as);
 
     *ranges = dirty.ranges;
 
@@ -936,12 +940,11 @@ static void vfio_dirty_tracking_init(VFIOContainer *container,
     memory_listener_unregister(&dirty.listener);
 }
 
-static void vfio_devices_dma_logging_stop(VFIOContainer *container)
+static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
 {
     uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
                               sizeof(uint64_t))] = {};
     struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
-    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
 
     feature->argsz = sizeof(buf);
@@ -962,7 +965,7 @@ static void vfio_devices_dma_logging_stop(VFIOContainer *container)
 }
 
 static struct vfio_device_feature *
-vfio_device_feature_dma_logging_start_create(VFIOContainer *container,
+vfio_device_feature_dma_logging_start_create(VFIOContainerBase *bcontainer,
                                              VFIODirtyRanges *tracking)
 {
     struct vfio_device_feature *feature;
@@ -1035,16 +1038,15 @@ static void vfio_device_feature_dma_logging_start_destroy(
     g_free(feature);
 }
 
-static int vfio_devices_dma_logging_start(VFIOContainer *container)
+static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
 {
     struct vfio_device_feature *feature;
     VFIODirtyRanges ranges;
-    VFIOContainerBase *bcontainer = &container->bcontainer;
     VFIODevice *vbasedev;
     int ret = 0;
 
-    vfio_dirty_tracking_init(container, &ranges);
-    feature = vfio_device_feature_dma_logging_start_create(container,
+    vfio_dirty_tracking_init(bcontainer, &ranges);
+    feature = vfio_device_feature_dma_logging_start_create(bcontainer,
                                                            &ranges);
     if (!feature) {
         return -errno;
@@ -1067,7 +1069,7 @@ static int vfio_devices_dma_logging_start(VFIOContainer *container)
 
 out:
     if (ret) {
-        vfio_devices_dma_logging_stop(container);
+        vfio_devices_dma_logging_stop(bcontainer);
     }
 
     vfio_device_feature_dma_logging_start_destroy(feature);
@@ -1077,14 +1079,14 @@ out:
 
 static void vfio_listener_log_global_start(MemoryListener *listener)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
+                                                 listener);
     int ret;
 
-    if (vfio_devices_all_device_dirty_tracking(&container->bcontainer)) {
-        ret = vfio_devices_dma_logging_start(container);
+    if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
+        ret = vfio_devices_dma_logging_start(bcontainer);
     } else {
-        ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
-                                                     true);
+        ret = vfio_container_set_dirty_page_tracking(bcontainer, true);
     }
 
     if (ret) {
@@ -1096,14 +1098,14 @@ static void vfio_listener_log_global_start(MemoryListener *listener)
 
 static void vfio_listener_log_global_stop(MemoryListener *listener)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
+                                                 listener);
     int ret = 0;
 
-    if (vfio_devices_all_device_dirty_tracking(&container->bcontainer)) {
-        vfio_devices_dma_logging_stop(container);
+    if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
+        vfio_devices_dma_logging_stop(bcontainer);
     } else {
-        ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
-                                                     false);
+        ret = vfio_container_set_dirty_page_tracking(bcontainer, false);
     }
 
     if (ret) {
@@ -1214,8 +1216,6 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
                                                 vfio_giommu_dirty_notifier, n);
     VFIOGuestIOMMU *giommu = gdn->giommu;
     VFIOContainerBase *bcontainer = giommu->bcontainer;
-    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
-                                            bcontainer);
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
     ram_addr_t translated_addr;
     int ret = -EINVAL;
@@ -1230,12 +1230,12 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
 
     rcu_read_lock();
     if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
-        ret = vfio_get_dirty_bitmap(&container->bcontainer, iova,
-                                    iotlb->addr_mask + 1, translated_addr);
+        ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
+                                    translated_addr);
         if (ret) {
             error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx") = %d (%s)",
-                         container, iova, iotlb->addr_mask + 1, ret,
+                         bcontainer, iova, iotlb->addr_mask + 1, ret,
                          strerror(-ret));
         }
     }
@@ -1291,10 +1291,9 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
                                                 &vrdl);
 }
 
-static int vfio_sync_dirty_bitmap(VFIOContainer *container,
+static int vfio_sync_dirty_bitmap(VFIOContainerBase *bcontainer,
                                   MemoryRegionSection *section)
 {
-    VFIOContainerBase *bcontainer = &container->bcontainer;
     ram_addr_t ram_addr;
 
     if (memory_region_is_iommu(section->mr)) {
@@ -1330,7 +1329,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
     ram_addr = memory_region_get_ram_addr(section->mr) +
                section->offset_within_region;
 
-    return vfio_get_dirty_bitmap(&container->bcontainer,
+    return vfio_get_dirty_bitmap(bcontainer,
                    REAL_HOST_PAGE_ALIGN(section->offset_within_address_space),
                    int128_get64(section->size), ram_addr);
 }
@@ -1338,15 +1337,16 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
 static void vfio_listener_log_sync(MemoryListener *listener,
         MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
+                                                 listener);
     int ret;
 
     if (vfio_listener_skipped_section(section)) {
         return;
     }
 
-    if (vfio_devices_all_dirty_tracking(&container->bcontainer)) {
-        ret = vfio_sync_dirty_bitmap(container, section);
+    if (vfio_devices_all_dirty_tracking(bcontainer)) {
+        ret = vfio_sync_dirty_bitmap(bcontainer, section);
         if (ret) {
             error_report("vfio: Failed to sync dirty bitmap, err: %d (%s)", ret,
                          strerror(-ret));
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 584eee4ba1..7f508669f5 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -51,6 +51,7 @@ void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
 {
     bcontainer->ops = ops;
     bcontainer->space = space;
+    bcontainer->error = NULL;
     bcontainer->dirty_pages_supported = false;
     bcontainer->dma_max_mappings = 0;
     QLIST_INIT(&bcontainer->giommu_list);
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 6ba2e2f8c4..5c1dee8c9f 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -453,6 +453,7 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
 {
     struct vfio_info_cap_header *hdr;
     struct vfio_iommu_type1_info_cap_migration *cap_mig;
+    VFIOContainerBase *bcontainer = &container->bcontainer;
 
     hdr = vfio_get_iommu_info_cap(info, VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION);
     if (!hdr) {
@@ -467,7 +468,7 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
      * qemu_real_host_page_size to mark those dirty.
      */
     if (cap_mig->pgsize_bitmap & qemu_real_host_page_size()) {
-        container->bcontainer.dirty_pages_supported = true;
+        bcontainer->dirty_pages_supported = true;
         container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
         container->dirty_pgsizes = cap_mig->pgsize_bitmap;
     }
@@ -558,7 +559,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
     container = g_malloc0(sizeof(*container));
     container->fd = fd;
-    container->error = NULL;
     container->iova_ranges = NULL;
     bcontainer = &container->bcontainer;
     vfio_container_init(bcontainer, space, &vfio_legacy_ops);
@@ -621,25 +621,24 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     group->container = container;
     QLIST_INSERT_HEAD(&container->group_list, group, container_next);
 
-    container->listener = vfio_memory_listener;
-
-    memory_listener_register(&container->listener, bcontainer->space->as);
+    bcontainer->listener = vfio_memory_listener;
+    memory_listener_register(&bcontainer->listener, bcontainer->space->as);
 
-    if (container->error) {
+    if (bcontainer->error) {
         ret = -1;
-        error_propagate_prepend(errp, container->error,
+        error_propagate_prepend(errp, bcontainer->error,
             "memory listener initialization failed: ");
         goto listener_release_exit;
     }
 
-    container->initialized = true;
+    bcontainer->initialized = true;
 
     return 0;
 listener_release_exit:
     QLIST_REMOVE(group, container_next);
     QLIST_REMOVE(bcontainer, next);
     vfio_kvm_device_del_group(group);
-    memory_listener_unregister(&container->listener);
+    memory_listener_unregister(&bcontainer->listener);
     if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU ||
         container->iommu_type == VFIO_SPAPR_TCE_IOMMU) {
         vfio_spapr_container_deinit(container);
@@ -674,7 +673,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
      * group.
      */
     if (QLIST_EMPTY(&container->group_list)) {
-        memory_listener_unregister(&container->listener);
+        memory_listener_unregister(&bcontainer->listener);
         if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU ||
             container->iommu_type == VFIO_SPAPR_TCE_IOMMU) {
             vfio_spapr_container_deinit(container);
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 4f76bdd3ca..7a50975f25 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -46,6 +46,7 @@ static void vfio_prereg_listener_region_add(MemoryListener *listener,
 {
     VFIOContainer *container = container_of(listener, VFIOContainer,
                                             prereg_listener);
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     const hwaddr gpa = section->offset_within_address_space;
     hwaddr end;
     int ret;
@@ -88,9 +89,9 @@ static void vfio_prereg_listener_region_add(MemoryListener *listener,
          * can gracefully fail.  Runtime, there's not much we can do other
          * than throw a hardware error.
          */
-        if (!container->initialized) {
-            if (!container->error) {
-                error_setg_errno(&container->error, -ret,
+        if (!bcontainer->initialized) {
+            if (!bcontainer->error) {
+                error_setg_errno(&bcontainer->error, -ret,
                                  "Memory registering failed");
             }
         } else {
@@ -445,9 +446,9 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
 
         memory_listener_register(&container->prereg_listener,
                                  &address_space_memory);
-        if (container->error) {
+        if (bcontainer->error) {
             ret = -1;
-            error_propagate_prepend(errp, container->error,
+            error_propagate_prepend(errp, bcontainer->error,
                     "RAM memory listener initialization failed: ");
             goto listener_unregister_exit;
         }
-- 
2.34.1




* [PATCH v4 18/41] vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size to base container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (16 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 17/41] vfio/container: Move listener " Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 19/41] vfio/container: Move iova_ranges " Zhenzhong Duan
                   ` (24 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Zhenzhong Duan

From: Eric Auger <eric.auger@redhat.com>

No functional change intended.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
 include/hw/vfio/vfio-common.h         | 2 --
 include/hw/vfio/vfio-container-base.h | 2 ++
 hw/vfio/container.c                   | 9 +++++----
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 922022cbc6..b1c9fe711b 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -80,8 +80,6 @@ typedef struct VFIOContainer {
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
     MemoryListener prereg_listener;
     unsigned iommu_type;
-    uint64_t dirty_pgsizes;
-    uint64_t max_dirty_bitmap_size;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
     GList *iova_ranges;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 95f8d319e0..80e4a993c5 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -39,6 +39,8 @@ typedef struct VFIOContainerBase {
     MemoryListener listener;
     Error *error;
     bool initialized;
+    uint64_t dirty_pgsizes;
+    uint64_t max_dirty_bitmap_size;
     unsigned long pgsizes;
     unsigned int dma_max_mappings;
     bool dirty_pages_supported;
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 5c1dee8c9f..c8088a8174 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -64,6 +64,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
                                  hwaddr iova, ram_addr_t size,
                                  IOMMUTLBEntry *iotlb)
 {
+    VFIOContainerBase *bcontainer = &container->bcontainer;
     struct vfio_iommu_type1_dma_unmap *unmap;
     struct vfio_bitmap *bitmap;
     VFIOBitmap vbmap;
@@ -91,7 +92,7 @@ static int vfio_dma_unmap_bitmap(VFIOContainer *container,
     bitmap->size = vbmap.size;
     bitmap->data = (__u64 *)vbmap.bitmap;
 
-    if (vbmap.size > container->max_dirty_bitmap_size) {
+    if (vbmap.size > bcontainer->max_dirty_bitmap_size) {
         error_report("UNMAP: Size of bitmap too big 0x%"PRIx64, vbmap.size);
         ret = -E2BIG;
         goto unmap_exit;
@@ -131,7 +132,7 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
 
     if (iotlb && vfio_devices_all_running_and_mig_active(bcontainer)) {
         if (!vfio_devices_all_device_dirty_tracking(bcontainer) &&
-            container->bcontainer.dirty_pages_supported) {
+            bcontainer->dirty_pages_supported) {
             return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
         }
 
@@ -469,8 +470,8 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
      */
     if (cap_mig->pgsize_bitmap & qemu_real_host_page_size()) {
         bcontainer->dirty_pages_supported = true;
-        container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
-        container->dirty_pgsizes = cap_mig->pgsize_bitmap;
+        bcontainer->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
+        bcontainer->dirty_pgsizes = cap_mig->pgsize_bitmap;
     }
 }
 
-- 
2.34.1




* [PATCH v4 19/41] vfio/container: Move iova_ranges to base container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (17 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 18/41] vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size " Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:58   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 20/41] vfio/container: Implement attach/detach_device Zhenzhong Duan
                   ` (23 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

Move iova_ranges to the base container and free the list in
vfio_container_destroy(). Also remove the helper function
vfio_free_container(), which would now only call g_free().

No functional change intended.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
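Note for readers: once the range list belongs to the base container, the
base teardown routine is the one place that frees it, and the backend is
left with a plain free of its own structure. A minimal sketch of that
ownership split follows, assuming the base destroy step runs before the
backend structure is released. RangeNode is a hypothetical stand-in for
the GList of ranges, and base_init(), base_add_range() and
base_destroy() are illustrative names, not QEMU functions.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a GList holding IOVA ranges. */
typedef struct RangeNode {
    unsigned long start;
    unsigned long end;
    struct RangeNode *next;
} RangeNode;

typedef struct Base {
    RangeNode *iova_ranges;   /* owned by the base container */
} Base;

typedef struct Legacy {
    Base bcontainer;
    int fd;
} Legacy;

static void base_init(Base *b)
{
    assert(b != NULL);
    b->iova_ranges = NULL;
}

/* Prepend one range; returns 0 on success, -1 on allocation failure. */
static int base_add_range(Base *b, unsigned long start, unsigned long end)
{
    RangeNode *n = malloc(sizeof(*n));

    if (!n) {
        return -1;
    }
    n->start = start;
    n->end = end;
    n->next = b->iova_ranges;
    b->iova_ranges = n;
    return 0;
}

/* Free every range the base owns; returns how many nodes were freed. */
static int base_destroy(Base *b)
{
    int freed = 0;

    while (b->iova_ranges) {
        RangeNode *n = b->iova_ranges;

        b->iova_ranges = n->next;
        free(n);
        freed++;
    }
    return freed;
}
```

With this split, a backend that embeds Base calls base_destroy() and
then frees its own allocation; it no longer needs a helper that knows
about the list.
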
 include/hw/vfio/vfio-common.h         |  1 -
 include/hw/vfio/vfio-container-base.h |  1 +
 hw/vfio/common.c                      |  5 +++--
 hw/vfio/container-base.c              |  3 +++
 hw/vfio/container.c                   | 19 ++++++-------------
 5 files changed, 13 insertions(+), 16 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b1c9fe711b..b9e5a0e64b 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -82,7 +82,6 @@ typedef struct VFIOContainer {
     unsigned iommu_type;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
-    GList *iova_ranges;
 } VFIOContainer;
 
 typedef struct VFIOHostDMAWindow {
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 80e4a993c5..9658ffb526 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -48,6 +48,7 @@ typedef struct VFIOContainerBase {
     QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
     QLIST_ENTRY(VFIOContainerBase) next;
     QLIST_HEAD(, VFIODevice) device_list;
+    GList *iova_ranges;
 } VFIOContainerBase;
 
 typedef struct VFIOGuestIOMMU {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index be623e544b..8ef2e7967d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -637,9 +637,10 @@ static void vfio_listener_region_add(MemoryListener *listener,
             goto fail;
         }
 
-        if (container->iova_ranges) {
+        if (bcontainer->iova_ranges) {
             ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
-                    container->iova_ranges, &err);
+                                                      bcontainer->iova_ranges,
+                                                      &err);
             if (ret) {
                 g_free(giommu);
                 goto fail;
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 7f508669f5..0177f43741 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -54,6 +54,7 @@ void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
     bcontainer->error = NULL;
     bcontainer->dirty_pages_supported = false;
     bcontainer->dma_max_mappings = 0;
+    bcontainer->iova_ranges = NULL;
     QLIST_INIT(&bcontainer->giommu_list);
     QLIST_INIT(&bcontainer->vrdl_list);
 }
@@ -70,4 +71,6 @@ void vfio_container_destroy(VFIOContainerBase *bcontainer)
         QLIST_REMOVE(giommu, giommu_next);
         g_free(giommu);
     }
+
+    g_list_free_full(bcontainer->iova_ranges, g_free);
 }
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index c8088a8174..721c0d7375 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -308,7 +308,7 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
 }
 
 static bool vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
-                                     VFIOContainer *container)
+                                     VFIOContainerBase *bcontainer)
 {
     struct vfio_info_cap_header *hdr;
     struct vfio_iommu_type1_info_cap_iova_range *cap;
@@ -326,8 +326,8 @@ static bool vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
 
         range_set_bounds(range, cap->iova_ranges[i].start,
                          cap->iova_ranges[i].end);
-        container->iova_ranges =
-            range_list_insert(container->iova_ranges, range);
+        bcontainer->iova_ranges =
+            range_list_insert(bcontainer->iova_ranges, range);
     }
 
     return true;
@@ -475,12 +475,6 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
     }
 }
 
-static void vfio_free_container(VFIOContainer *container)
-{
-    g_list_free_full(container->iova_ranges, g_free);
-    g_free(container);
-}
-
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
                                   Error **errp)
 {
@@ -560,7 +554,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
     container = g_malloc0(sizeof(*container));
     container->fd = fd;
-    container->iova_ranges = NULL;
     bcontainer = &container->bcontainer;
     vfio_container_init(bcontainer, space, &vfio_legacy_ops);
 
@@ -597,7 +590,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
             bcontainer->dma_max_mappings = 65535;
         }
 
-        vfio_get_info_iova_range(info, container);
+        vfio_get_info_iova_range(info, bcontainer);
 
         vfio_get_iommu_info_migration(container, info);
         g_free(info);
@@ -649,7 +642,7 @@ enable_discards_exit:
     vfio_ram_block_discard_disable(container, false);
 
 free_container_exit:
-    vfio_free_container(container);
+    g_free(container);
 
 close_fd_exit:
     close(fd);
@@ -693,7 +686,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
 
         trace_vfio_disconnect_container(container->fd);
         close(container->fd);
-        vfio_free_container(container);
+        g_free(container);
 
         vfio_put_address_space(space);
     }
-- 
2.34.1




* [PATCH v4 20/41] vfio/container: Implement attach/detach_device
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (18 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 19/41] vfio/container: Move iova_ranges " Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 16:59   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 21/41] vfio/spapr: Introduce spapr backend and target interface Zhenzhong Duan
                   ` (22 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Zhenzhong Duan

From: Eric Auger <eric.auger@redhat.com>

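Wire up the attach_device/detach_device callbacks: vfio_attach_device()
and vfio_detach_device() become thin wrappers in common.c, and the legacy
implementations become static callbacks registered in vfio_legacy_ops.

No functional change intended.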

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/common.c    | 16 ++++++++++++++++
 hw/vfio/container.c | 12 +++++-------
 2 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8ef2e7967d..483ba82089 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1498,3 +1498,19 @@ retry:
 
     return info;
 }
+
+int vfio_attach_device(char *name, VFIODevice *vbasedev,
+                       AddressSpace *as, Error **errp)
+{
+    const VFIOIOMMUOps *ops = &vfio_legacy_ops;
+
+    return ops->attach_device(name, vbasedev, as, errp);
+}
+
+void vfio_detach_device(VFIODevice *vbasedev)
+{
+    if (!vbasedev->bcontainer) {
+        return;
+    }
+    vbasedev->bcontainer->ops->detach_device(vbasedev);
+}
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 721c0d7375..6bacf38222 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -873,8 +873,8 @@ static int vfio_device_groupid(VFIODevice *vbasedev, Error **errp)
  * @name and @vbasedev->name are likely to be different depending
  * on the type of the device, hence the need for passing @name
  */
-int vfio_attach_device(char *name, VFIODevice *vbasedev,
-                       AddressSpace *as, Error **errp)
+static int vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
+                                     AddressSpace *as, Error **errp)
 {
     int groupid = vfio_device_groupid(vbasedev, errp);
     VFIODevice *vbasedev_iter;
@@ -914,14 +914,10 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
     return ret;
 }
 
-void vfio_detach_device(VFIODevice *vbasedev)
+static void vfio_legacy_detach_device(VFIODevice *vbasedev)
 {
     VFIOGroup *group = vbasedev->group;
 
-    if (!vbasedev->bcontainer) {
-        return;
-    }
-
     QLIST_REMOVE(vbasedev, global_next);
     QLIST_REMOVE(vbasedev, container_next);
     vbasedev->bcontainer = NULL;
@@ -933,6 +929,8 @@ void vfio_detach_device(VFIODevice *vbasedev)
 const VFIOIOMMUOps vfio_legacy_ops = {
     .dma_map = vfio_legacy_dma_map,
     .dma_unmap = vfio_legacy_dma_unmap,
+    .attach_device = vfio_legacy_attach_device,
+    .detach_device = vfio_legacy_detach_device,
     .set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
     .query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
 };
-- 
2.34.1




* [PATCH v4 21/41] vfio/spapr: Introduce spapr backend and target interface
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (19 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 20/41] vfio/container: Implement attach/detach_device Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 17:30   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE add/del_section_window Zhenzhong Duan
                   ` (21 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Harsh Prateek Bora,
	open list:sPAPR (pseries)

Introduce an empty spapr backend which will hold spapr specific
content, currently only prereg_listener and hostwin_list.

Also introduce two spapr specific callbacks add/del_window into
VFIOIOMMUOps. Instantiate a spapr ops with a helper setup_spapr_ops
and assign it to bcontainer->ops.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v4: remove VFIOIOMMUSpaprOps
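
For readers following the series, here is a minimal standalone sketch of
the copy-and-override idea behind setup_spapr_ops. It is not the QEMU
code: SketchOps, SketchContainer and the sketch_* functions are
hypothetical stand-ins, and the add_window assignment only illustrates
the kind of override that a later patch in the series installs.

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for VFIOIOMMUOps. */
typedef struct SketchOps {
    int (*dma_map)(int iova);
    int (*add_window)(int section);   /* optional, NULL for generic backends */
} SketchOps;

/* Hypothetical stand-in for VFIOContainerBase. */
typedef struct SketchContainer {
    const SketchOps *ops;
} SketchContainer;

static int sketch_legacy_dma_map(int iova)
{
    return iova + 1;
}

static int sketch_spapr_add_window(int section)
{
    return section * 2;
}

/* The shared, read-only table used by the generic backend. */
static const SketchOps sketch_legacy_ops = {
    .dma_map = sketch_legacy_dma_map,
};

/* Writable copy that receives the backend-specific callbacks. */
static SketchOps sketch_spapr_ops;

static void sketch_setup_spapr_ops(SketchContainer *c)
{
    /* Start from whatever table the container already uses ... */
    sketch_spapr_ops = *c->ops;
    /* ... add the backend-specific entry ... */
    sketch_spapr_ops.add_window = sketch_spapr_add_window;
    /* ... and repoint the container at the private copy. */
    c->ops = &sketch_spapr_ops;
}
```

The shared table stays const and untouched, so other containers that
still point at it keep seeing add_window == NULL.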

 include/hw/vfio/vfio-container-base.h |  6 ++++++
 hw/vfio/spapr.c                       | 14 ++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 9658ffb526..f62a14ac73 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -101,5 +101,11 @@ struct VFIOIOMMUOps {
     int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
     int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
                               hwaddr iova, hwaddr size);
+    /* SPAPR specific */
+    int (*add_window)(VFIOContainerBase *bcontainer,
+                      MemoryRegionSection *section,
+                      Error **errp);
+    void (*del_window)(VFIOContainerBase *bcontainer,
+                       MemoryRegionSection *section);
 };
 #endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 7a50975f25..e1a6b35563 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -24,6 +24,10 @@
 #include "qapi/error.h"
 #include "trace.h"
 
+typedef struct VFIOSpaprContainer {
+    VFIOContainer container;
+} VFIOSpaprContainer;
+
 static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
 {
     if (memory_region_is_iommu(section->mr)) {
@@ -421,6 +425,14 @@ void vfio_container_del_section_window(VFIOContainer *container,
     }
 }
 
+static VFIOIOMMUOps vfio_iommu_spapr_ops;
+
+static void setup_spapr_ops(VFIOContainerBase *bcontainer)
+{
+    vfio_iommu_spapr_ops = *bcontainer->ops;
+    bcontainer->ops = &vfio_iommu_spapr_ops;
+}
+
 int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
 {
     VFIOContainerBase *bcontainer = &container->bcontainer;
@@ -486,6 +498,8 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
                           0x1000);
     }
 
+    setup_spapr_ops(bcontainer);
+
     return 0;
 
 listener_unregister_exit:
-- 
2.34.1




* [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE add/del_section_window
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (20 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 21/41] vfio/spapr: Introduce spapr backend and target interface Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 17:33   ` Cédric Le Goater
  2023-11-07 17:34   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 23/41] vfio/spapr: Move prereg_listener into spapr container Zhenzhong Duan
                   ` (20 subsequent siblings)
  42 siblings, 2 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	David Gibson, Harsh Prateek Bora, open list:sPAPR (pseries)

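Make vfio_container_add_section_window() and
vfio_container_del_section_window() generic wrappers in container-base.c
that call the optional add_window/del_window callbacks, and turn the
spapr implementations into static callbacks installed by setup_spapr_ops().

No functional change intended.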

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-common.h         |  5 -----
 include/hw/vfio/vfio-container-base.h |  5 +++++
 hw/vfio/common.c                      |  8 ++------
 hw/vfio/container-base.c              | 21 +++++++++++++++++++++
 hw/vfio/spapr.c                       | 19 ++++++++++++++-----
 5 files changed, 42 insertions(+), 16 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index b9e5a0e64b..055f679363 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -169,11 +169,6 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
 void vfio_put_address_space(VFIOAddressSpace *space);
 
 /* SPAPR specific */
-int vfio_container_add_section_window(VFIOContainer *container,
-                                      MemoryRegionSection *section,
-                                      Error **errp);
-void vfio_container_del_section_window(VFIOContainer *container,
-                                       MemoryRegionSection *section);
 int vfio_spapr_container_init(VFIOContainer *container, Error **errp);
 void vfio_spapr_container_deinit(VFIOContainer *container);
 
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index f62a14ac73..4b6f017c6f 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -75,6 +75,11 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
 int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
                              hwaddr iova, ram_addr_t size,
                              IOMMUTLBEntry *iotlb);
+int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
+                                      MemoryRegionSection *section,
+                                      Error **errp);
+void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
+                                       MemoryRegionSection *section);
 int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
                                            bool start);
 int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 483ba82089..572ae7c934 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -571,8 +571,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
 {
     VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
                                                  listener);
-    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
-                                            bcontainer);
     hwaddr iova, end;
     Int128 llend, llsize;
     void *vaddr;
@@ -595,7 +593,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
         return;
     }
 
-    if (vfio_container_add_section_window(container, section, &err)) {
+    if (vfio_container_add_section_window(bcontainer, section, &err)) {
         goto fail;
     }
 
@@ -738,8 +736,6 @@ static void vfio_listener_region_del(MemoryListener *listener,
 {
     VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
                                                  listener);
-    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
-                                            bcontainer);
     hwaddr iova, end;
     Int128 llend, llsize;
     int ret;
@@ -818,7 +814,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
 
     memory_region_unref(section->mr);
 
-    vfio_container_del_section_window(container, section);
+    vfio_container_del_section_window(bcontainer, section);
 }
 
 typedef struct VFIODirtyRanges {
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 0177f43741..71f7274973 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -31,6 +31,27 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
     return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
 }
 
+int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
+                                      MemoryRegionSection *section,
+                                      Error **errp)
+{
+    if (!bcontainer->ops->add_window) {
+        return 0;
+    }
+
+    return bcontainer->ops->add_window(bcontainer, section, errp);
+}
+
+void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
+                                       MemoryRegionSection *section)
+{
+    if (!bcontainer->ops->del_window) {
+        return;
+    }
+
+    bcontainer->ops->del_window(bcontainer, section);
+}
+
 int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
                                            bool start)
 {
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index e1a6b35563..5be1911aad 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -319,10 +319,13 @@ static int vfio_spapr_create_window(VFIOContainer *container,
     return 0;
 }
 
-int vfio_container_add_section_window(VFIOContainer *container,
-                                      MemoryRegionSection *section,
-                                      Error **errp)
+static int
+vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
+                                        MemoryRegionSection *section,
+                                        Error **errp)
 {
+    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                            bcontainer);
     VFIOHostDMAWindow *hostwin;
     hwaddr pgsize = 0;
     int ret;
@@ -407,9 +410,13 @@ int vfio_container_add_section_window(VFIOContainer *container,
     return 0;
 }
 
-void vfio_container_del_section_window(VFIOContainer *container,
-                                       MemoryRegionSection *section)
+static void
+vfio_spapr_container_del_section_window(VFIOContainerBase *bcontainer,
+                                        MemoryRegionSection *section)
 {
+    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                            bcontainer);
+
     if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
         return;
     }
@@ -430,6 +437,8 @@ static VFIOIOMMUOps vfio_iommu_spapr_ops;
 static void setup_spapr_ops(VFIOContainerBase *bcontainer)
 {
     vfio_iommu_spapr_ops = *bcontainer->ops;
+    vfio_iommu_spapr_ops.add_window = vfio_spapr_container_add_section_window;
+    vfio_iommu_spapr_ops.del_window = vfio_spapr_container_del_section_window;
     bcontainer->ops = &vfio_iommu_spapr_ops;
 }
 
-- 
2.34.1




* [PATCH v4 23/41] vfio/spapr: Move prereg_listener into spapr container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (21 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE add/del_section_window Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 17:34   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 24/41] vfio/spapr: Move hostwin_list " Zhenzhong Duan
                   ` (19 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	David Gibson, Harsh Prateek Bora, open list:sPAPR (pseries)

No functional changes intended.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-common.h |  1 -
 hw/vfio/spapr.c               | 24 ++++++++++++++++--------
 2 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 055f679363..ed6148c058 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -78,7 +78,6 @@ struct VFIOGroup;
 typedef struct VFIOContainer {
     VFIOContainerBase bcontainer;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
-    MemoryListener prereg_listener;
     unsigned iommu_type;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 5be1911aad..68c3dd6c75 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -26,6 +26,7 @@
 
 typedef struct VFIOSpaprContainer {
     VFIOContainer container;
+    MemoryListener prereg_listener;
 } VFIOSpaprContainer;
 
 static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
@@ -48,8 +49,9 @@ static void *vfio_prereg_gpa_to_vaddr(MemoryRegionSection *section, hwaddr gpa)
 static void vfio_prereg_listener_region_add(MemoryListener *listener,
                                             MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            prereg_listener);
+    VFIOSpaprContainer *scontainer = container_of(listener, VFIOSpaprContainer,
+                                                  prereg_listener);
+    VFIOContainer *container = &scontainer->container;
     VFIOContainerBase *bcontainer = &container->bcontainer;
     const hwaddr gpa = section->offset_within_address_space;
     hwaddr end;
@@ -107,8 +109,9 @@ static void vfio_prereg_listener_region_add(MemoryListener *listener,
 static void vfio_prereg_listener_region_del(MemoryListener *listener,
                                             MemoryRegionSection *section)
 {
-    VFIOContainer *container = container_of(listener, VFIOContainer,
-                                            prereg_listener);
+    VFIOSpaprContainer *scontainer = container_of(listener, VFIOSpaprContainer,
+                                                  prereg_listener);
+    VFIOContainer *container = &scontainer->container;
     const hwaddr gpa = section->offset_within_address_space;
     hwaddr end;
     int ret;
@@ -445,6 +448,8 @@ static void setup_spapr_ops(VFIOContainerBase *bcontainer)
 int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
 {
     VFIOContainerBase *bcontainer = &container->bcontainer;
+    VFIOSpaprContainer *scontainer = container_of(container, VFIOSpaprContainer,
+                                                  container);
     struct vfio_iommu_spapr_tce_info info;
     bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
     int ret, fd = container->fd;
@@ -463,9 +468,9 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
             return -errno;
         }
     } else {
-        container->prereg_listener = vfio_prereg_listener;
+        scontainer->prereg_listener = vfio_prereg_listener;
 
-        memory_listener_register(&container->prereg_listener,
+        memory_listener_register(&scontainer->prereg_listener,
                                  &address_space_memory);
         if (bcontainer->error) {
             ret = -1;
@@ -513,7 +518,7 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
 
 listener_unregister_exit:
     if (v2) {
-        memory_listener_unregister(&container->prereg_listener);
+        memory_listener_unregister(&scontainer->prereg_listener);
     }
     return ret;
 }
@@ -523,7 +528,10 @@ void vfio_spapr_container_deinit(VFIOContainer *container)
     VFIOHostDMAWindow *hostwin, *next;
 
     if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
-        memory_listener_unregister(&container->prereg_listener);
+        VFIOSpaprContainer *scontainer = container_of(container,
+                                                      VFIOSpaprContainer,
+                                                      container);
+        memory_listener_unregister(&scontainer->prereg_listener);
     }
     QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
                        next) {
-- 
2.34.1
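
The container_of() conversions in this patch follow the usual embedded
struct pattern. The sketch below models it with hypothetical types
(DemoListener, DemoInner, DemoOuter) and a local demo_container_of macro
rather than QEMU's own definition. It assumes, as such downcasts always
do, that the object was allocated as the outer type.

```c
#include <stddef.h>
#include <stdlib.h>

/* Local stand-in for container_of(); not QEMU's definition. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct DemoListener {
    int id;
} DemoListener;

/* Plays the role of VFIOContainer. */
typedef struct DemoInner {
    int fd;
} DemoInner;

/* Plays the role of VFIOSpaprContainer. */
typedef struct DemoOuter {
    DemoInner container;
    DemoListener prereg_listener;
} DemoOuter;

/* Allocate the full outer object so the extra members really exist. */
static DemoOuter *demo_outer_new(int fd)
{
    DemoOuter *o = calloc(1, sizeof(*o));

    if (o) {
        o->container.fd = fd;
    }
    return o;
}

/* Listener callback direction: listener -> outer -> inner. */
static DemoInner *demo_inner_from_listener(DemoListener *l)
{
    DemoOuter *o = demo_container_of(l, DemoOuter, prereg_listener);

    return &o->container;
}

/* Init/deinit direction: inner -> outer. */
static DemoOuter *demo_outer_from_inner(DemoInner *c)
{
    return demo_container_of(c, DemoOuter, container);
}
```

If only sizeof(DemoInner) bytes had been allocated, the inner-to-outer
conversion would still compile, but touching prereg_listener through the
result would access memory past the allocation.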




* [PATCH v4 24/41] vfio/spapr: Move hostwin_list into spapr container
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (22 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 23/41] vfio/spapr: Move prereg_listener into spapr container Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 17:35   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 25/41] Add iommufd configure option Zhenzhong Duan
                   ` (18 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Nicholas Piggin, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Harsh Prateek Bora,
	open list:sPAPR (pseries)

No functional changes intended.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-common.h |  1 -
 hw/vfio/spapr.c               | 36 +++++++++++++++++++----------------
 2 files changed, 20 insertions(+), 17 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index ed6148c058..24ecc0e7ee 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -79,7 +79,6 @@ typedef struct VFIOContainer {
     VFIOContainerBase bcontainer;
     int fd; /* /dev/vfio/vfio, empowered by the attached groups */
     unsigned iommu_type;
-    QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
     QLIST_HEAD(, VFIOGroup) group_list;
 } VFIOContainer;
 
diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 68c3dd6c75..5c6426e697 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -27,6 +27,7 @@
 typedef struct VFIOSpaprContainer {
     VFIOContainer container;
     MemoryListener prereg_listener;
+    QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
 } VFIOSpaprContainer;
 
 static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
@@ -154,12 +155,12 @@ static const MemoryListener vfio_prereg_listener = {
     .region_del = vfio_prereg_listener_region_del,
 };
 
-static void vfio_host_win_add(VFIOContainer *container, hwaddr min_iova,
+static void vfio_host_win_add(VFIOSpaprContainer *scontainer, hwaddr min_iova,
                               hwaddr max_iova, uint64_t iova_pgsizes)
 {
     VFIOHostDMAWindow *hostwin;
 
-    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+    QLIST_FOREACH(hostwin, &scontainer->hostwin_list, hostwin_next) {
         if (ranges_overlap(hostwin->min_iova,
                            hostwin->max_iova - hostwin->min_iova + 1,
                            min_iova,
@@ -173,15 +174,15 @@ static void vfio_host_win_add(VFIOContainer *container, hwaddr min_iova,
     hostwin->min_iova = min_iova;
     hostwin->max_iova = max_iova;
     hostwin->iova_pgsizes = iova_pgsizes;
-    QLIST_INSERT_HEAD(&container->hostwin_list, hostwin, hostwin_next);
+    QLIST_INSERT_HEAD(&scontainer->hostwin_list, hostwin, hostwin_next);
 }
 
-static int vfio_host_win_del(VFIOContainer *container,
+static int vfio_host_win_del(VFIOSpaprContainer *scontainer,
                              hwaddr min_iova, hwaddr max_iova)
 {
     VFIOHostDMAWindow *hostwin;
 
-    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+    QLIST_FOREACH(hostwin, &scontainer->hostwin_list, hostwin_next) {
         if (hostwin->min_iova == min_iova && hostwin->max_iova == max_iova) {
             QLIST_REMOVE(hostwin, hostwin_next);
             g_free(hostwin);
@@ -192,7 +193,7 @@ static int vfio_host_win_del(VFIOContainer *container,
     return -1;
 }
 
-static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
+static VFIOHostDMAWindow *vfio_find_hostwin(VFIOSpaprContainer *container,
                                             hwaddr iova, hwaddr end)
 {
     VFIOHostDMAWindow *hostwin;
@@ -329,6 +330,8 @@ vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
 {
     VFIOContainer *container = container_of(bcontainer, VFIOContainer,
                                             bcontainer);
+    VFIOSpaprContainer *scontainer = container_of(container, VFIOSpaprContainer,
+                                                  container);
     VFIOHostDMAWindow *hostwin;
     hwaddr pgsize = 0;
     int ret;
@@ -344,7 +347,7 @@ vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
         iova = section->offset_within_address_space;
         end = iova + int128_get64(section->size) - 1;
 
-        if (!vfio_find_hostwin(container, iova, end)) {
+        if (!vfio_find_hostwin(scontainer, iova, end)) {
             error_setg(errp, "Container %p can't map guest IOVA region"
                        " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container,
                        iova, end);
@@ -358,7 +361,7 @@ vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
     }
 
     /* For now intersections are not allowed, we may relax this later */
-    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
+    QLIST_FOREACH(hostwin, &scontainer->hostwin_list, hostwin_next) {
         if (ranges_overlap(hostwin->min_iova,
                            hostwin->max_iova - hostwin->min_iova + 1,
                            section->offset_within_address_space,
@@ -380,7 +383,7 @@ vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
         return ret;
     }
 
-    vfio_host_win_add(container, section->offset_within_address_space,
+    vfio_host_win_add(scontainer, section->offset_within_address_space,
                       section->offset_within_address_space +
                       int128_get64(section->size) - 1, pgsize);
 #ifdef CONFIG_KVM
@@ -419,6 +422,8 @@ vfio_spapr_container_del_section_window(VFIOContainerBase *bcontainer,
 {
     VFIOContainer *container = container_of(bcontainer, VFIOContainer,
                                             bcontainer);
+    VFIOSpaprContainer *scontainer = container_of(container, VFIOSpaprContainer,
+                                                  container);
 
     if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
         return;
@@ -426,7 +431,7 @@ vfio_spapr_container_del_section_window(VFIOContainerBase *bcontainer,
 
     vfio_spapr_remove_window(container,
                              section->offset_within_address_space);
-    if (vfio_host_win_del(container,
+    if (vfio_host_win_del(scontainer,
                           section->offset_within_address_space,
                           section->offset_within_address_space +
                           int128_get64(section->size) - 1) < 0) {
@@ -454,7 +459,7 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
     bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
     int ret, fd = container->fd;
 
-    QLIST_INIT(&container->hostwin_list);
+    QLIST_INIT(&scontainer->hostwin_list);
 
     /*
      * The host kernel code implementing VFIO_IOMMU_DISABLE is called
@@ -506,7 +511,7 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
     } else {
         /* The default table uses 4K pages */
         bcontainer->pgsizes = 0x1000;
-        vfio_host_win_add(container, info.dma32_window_start,
+        vfio_host_win_add(scontainer, info.dma32_window_start,
                           info.dma32_window_start +
                           info.dma32_window_size - 1,
                           0x1000);
@@ -525,15 +530,14 @@ listener_unregister_exit:
 
 void vfio_spapr_container_deinit(VFIOContainer *container)
 {
+    VFIOSpaprContainer *scontainer = container_of(container, VFIOSpaprContainer,
+                                                  container);
     VFIOHostDMAWindow *hostwin, *next;
 
     if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
-        VFIOSpaprContainer *scontainer = container_of(container,
-                                                      VFIOSpaprContainer,
-                                                      container);
         memory_listener_unregister(&scontainer->prereg_listener);
     }
-    QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
+    QLIST_FOREACH_SAFE(hostwin, &scontainer->hostwin_list, hostwin_next,
                        next) {
         QLIST_REMOVE(hostwin, hostwin_next);
         g_free(hostwin);
-- 
2.34.1




* [PATCH v4 25/41] Add iommufd configure option
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (23 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 24/41] vfio/spapr: Move hostwin_list " Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-07 13:14   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
                   ` (17 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Paolo Bonzini, Marc-André Lureau,
	Daniel P. Berrangé, Thomas Huth, Philippe Mathieu-Daudé

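Add the configure options --enable-iommufd and --disable-iommufd to
enable or disable iommufd support. The meson option defaults to 'auto',
so iommufd support is built by default on Linux hosts; explicitly
enabling it on any other host is a configuration error.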

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 meson.build                   | 6 ++++++
 meson_options.txt             | 2 ++
 scripts/meson-buildoptions.sh | 3 +++
 3 files changed, 11 insertions(+)

diff --git a/meson.build b/meson.build
index dcef8b1e79..72a57288a0 100644
--- a/meson.build
+++ b/meson.build
@@ -560,6 +560,10 @@ have_tpm = get_option('tpm') \
   .require(targetos != 'windows', error_message: 'TPM emulation only available on POSIX systems') \
   .allowed()
 
+have_iommufd = get_option('iommufd') \
+  .require(targetos == 'linux', error_message: 'iommufd is supported only on Linux') \
+  .allowed()
+
 # vhost
 have_vhost_user = get_option('vhost_user') \
   .disable_auto_if(targetos != 'linux') \
@@ -2133,6 +2137,7 @@ if get_option('tcg').allowed()
 endif
 config_host_data.set('CONFIG_TPM', have_tpm)
 config_host_data.set('CONFIG_TSAN', get_option('tsan'))
+config_host_data.set('CONFIG_IOMMUFD', have_iommufd)
 config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
 config_host_data.set('CONFIG_VDE', vde.found())
 config_host_data.set('CONFIG_VHOST', have_vhost)
@@ -4075,6 +4080,7 @@ summary_info += {'vhost-user-crypto support': have_vhost_user_crypto}
 summary_info += {'vhost-user-blk server support': have_vhost_user_blk_server}
 summary_info += {'vhost-vdpa support': have_vhost_vdpa}
 summary_info += {'build guest agent': have_ga}
+summary_info += {'iommufd support': have_iommufd}
 summary(summary_info, bool_yn: true, section: 'Configurable features')
 
 # Compilation information
diff --git a/meson_options.txt b/meson_options.txt
index 3c7398f3c6..91bb958cae 100644
--- a/meson_options.txt
+++ b/meson_options.txt
@@ -109,6 +109,8 @@ option('dbus_display', type: 'feature', value: 'auto',
        description: '-display dbus support')
 option('tpm', type : 'feature', value : 'auto',
        description: 'TPM support')
+option('iommufd', type : 'feature', value : 'auto',
+       description: 'iommufd support')
 
 # Do not enable it by default even for Mingw32, because it doesn't
 # work on Wine.
diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
index 7ca4b77eae..1effc46f7d 100644
--- a/scripts/meson-buildoptions.sh
+++ b/scripts/meson-buildoptions.sh
@@ -125,6 +125,7 @@ meson_options_help() {
   printf "%s\n" '  guest-agent-msi Build MSI package for the QEMU Guest Agent'
   printf "%s\n" '  hvf             HVF acceleration support'
   printf "%s\n" '  iconv           Font glyph conversion support'
+  printf "%s\n" '  iommufd         iommufd support'
   printf "%s\n" '  jack            JACK sound support'
   printf "%s\n" '  keyring         Linux keyring support'
   printf "%s\n" '  kvm             KVM acceleration support'
@@ -342,6 +343,8 @@ _meson_option_parse() {
     --enable-install-blobs) printf "%s" -Dinstall_blobs=true ;;
     --disable-install-blobs) printf "%s" -Dinstall_blobs=false ;;
     --interp-prefix=*) quote_sh "-Dinterp_prefix=$2" ;;
+    --enable-iommufd) printf "%s" -Diommufd=enabled ;;
+    --disable-iommufd) printf "%s" -Diommufd=disabled ;;
     --enable-jack) printf "%s" -Djack=enabled ;;
     --disable-jack) printf "%s" -Djack=disabled ;;
     --enable-keyring) printf "%s" -Dkeyring=enabled ;;
-- 
2.34.1




* [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (24 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 25/41] Add iommufd configure option Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-07 13:33   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 27/41] util/char_dev: Add open_cdev() Zhenzhong Duan
                   ` (16 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Paolo Bonzini, Eric Blake, Markus Armbruster,
	Daniel P. Berrangé,
	Eduardo Habkost

From: Eric Auger <eric.auger@redhat.com>

Introduce an iommufd object which allows the interaction
with the host /dev/iommu device.

/dev/iommu may already have been opened outside of QEMU, in which
case the fd can be passed directly along with the iommufd object,
using the optional fd property (-object iommufd,id=id[,fd=fd]).

This allows the iommufd object to be shared across several
subsystems (VFIO, VDPA, ...). For example, libvirt would open
/dev/iommu once.

If no fd is passed along with the iommufd object, QEMU opens
/dev/iommu itself.

The CONFIG_IOMMUFD option must be set to compile this new object.
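
The ownership rule described above (open on first connect and close on
last disconnect when QEMU owns the fd, never close a passed-in fd) can be
illustrated with a small standalone sketch. It is not the code from this
patch: struct demo_backend and the demo_* helpers are hypothetical
stand-ins, /dev/null replaces /dev/iommu so the sketch runs on any POSIX
host, and the mutex used by the real object is left out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* /dev/null stands in for /dev/iommu (assumption for illustration). */
#define DEMO_DEV_PATH "/dev/null"

/* Hypothetical stand-in for the backend object; not the QEMU type. */
struct demo_backend {
    int fd;          /* device fd, -1 while closed */
    bool owned;      /* true: opened here; false: passed in by the caller */
    uint32_t users;  /* number of connected frontends */
};

static void demo_init(struct demo_backend *be)
{
    be->fd = -1;
    be->owned = true;
    be->users = 0;
}

/* Models the "fd" property: adopt a descriptor opened elsewhere. */
static void demo_set_fd(struct demo_backend *be, int fd)
{
    assert(fd >= 0);
    be->fd = fd;
    be->owned = false;
}

/* The first connection opens the device, but only if the fd is owned. */
static int demo_connect(struct demo_backend *be)
{
    if (be->users == UINT32_MAX) {
        return -1;
    }
    if (be->owned && be->users == 0) {
        int fd = open(DEMO_DEV_PATH, O_RDWR);
        if (fd < 0) {
            return -1;
        }
        be->fd = fd;
    }
    be->users++;
    return 0;
}

/* The last disconnection closes the device, but only if the fd is owned. */
static void demo_disconnect(struct demo_backend *be)
{
    if (be->users == 0) {
        return;
    }
    be->users--;
    if (be->users == 0 && be->owned) {
        close(be->fd);
        be->fd = -1;
    }
}
```

With an owned backend, two connects followed by two disconnects leave
fd at -1 again. With a passed-in fd, the descriptor stays open after the
last disconnect, and closing it remains the job of whoever opened it.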

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v4: add CONFIG_IOMMUFD check, document default case

 MAINTAINERS              |   7 ++
 qapi/qom.json            |  22 ++++
 include/sysemu/iommufd.h |  46 +++++++
 backends/iommufd-stub.c  |  59 +++++++++
 backends/iommufd.c       | 257 +++++++++++++++++++++++++++++++++++++++
 backends/Kconfig         |   4 +
 backends/meson.build     |   5 +
 backends/trace-events    |  12 ++
 qemu-options.hx          |  13 ++
 9 files changed, 425 insertions(+)
 create mode 100644 include/sysemu/iommufd.h
 create mode 100644 backends/iommufd-stub.c
 create mode 100644 backends/iommufd.c

diff --git a/MAINTAINERS b/MAINTAINERS
index cd8d6b140f..6f35159255 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2135,6 +2135,13 @@ F: hw/vfio/ap.c
 F: docs/system/s390x/vfio-ap.rst
 L: qemu-s390x@nongnu.org
 
+iommufd
+M: Yi Liu <yi.l.liu@intel.com>
+M: Eric Auger <eric.auger@redhat.com>
+S: Supported
+F: backends/iommufd.c
+F: include/sysemu/iommufd.h
+
 vhost
 M: Michael S. Tsirkin <mst@redhat.com>
 S: Supported
diff --git a/qapi/qom.json b/qapi/qom.json
index c53ef978ff..27300add48 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -794,6 +794,24 @@
 { 'struct': 'VfioUserServerProperties',
   'data': { 'socket': 'SocketAddress', 'device': 'str' } }
 
+##
+# @IOMMUFDProperties:
+#
+# Properties for iommufd objects.
+#
+# @fd: file descriptor name previously passed via 'getfd' command,
+#     which represents a pre-opened /dev/iommu.  This allows the
+#     iommufd object to be shared across several subsystems
+#     (VFIO, VDPA, ...), and the file descriptor to be shared
+#     with other processes, e.g. DPDK.  (default: QEMU opens
+#     /dev/iommu by itself)
+#
+# Since: 8.2
+##
+{ 'struct': 'IOMMUFDProperties',
+  'data': { '*fd': 'str' },
+  'if': 'CONFIG_IOMMUFD' }
+
 ##
 # @RngProperties:
 #
@@ -934,6 +952,8 @@
     'input-barrier',
     { 'name': 'input-linux',
       'if': 'CONFIG_LINUX' },
+    { 'name': 'iommufd',
+      'if': 'CONFIG_IOMMUFD' },
     'iothread',
     'main-loop',
     { 'name': 'memory-backend-epc',
@@ -1003,6 +1023,8 @@
       'input-barrier':              'InputBarrierProperties',
       'input-linux':                { 'type': 'InputLinuxProperties',
                                       'if': 'CONFIG_LINUX' },
+      'iommufd':                    { 'type': 'IOMMUFDProperties',
+                                      'if': 'CONFIG_IOMMUFD' },
       'iothread':                   'IothreadProperties',
       'main-loop':                  'MainLoopProperties',
       'memory-backend-epc':         { 'type': 'MemoryBackendEpcProperties',
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
new file mode 100644
index 0000000000..f0e5c7eeb8
--- /dev/null
+++ b/include/sysemu/iommufd.h
@@ -0,0 +1,46 @@
+#ifndef SYSEMU_IOMMUFD_H
+#define SYSEMU_IOMMUFD_H
+
+#include "qom/object.h"
+#include "qemu/thread.h"
+#include "exec/hwaddr.h"
+#include "exec/cpu-common.h"
+
+#define TYPE_IOMMUFD_BACKEND "iommufd"
+OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass,
+                    IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND(obj) \
+    OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND_GET_CLASS(obj) \
+    OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND_CLASS(klass) \
+    OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND)
+struct IOMMUFDBackendClass {
+    ObjectClass parent_class;
+};
+
+struct IOMMUFDBackend {
+    Object parent;
+
+    /*< protected >*/
+    int fd;            /* /dev/iommu file descriptor */
+    bool owned;        /* is the /dev/iommu opened internally */
+    QemuMutex lock;
+    uint32_t users;
+
+    /*< public >*/
+};
+
+int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
+void iommufd_backend_disconnect(IOMMUFDBackend *be);
+
+int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id);
+void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id);
+void iommufd_backend_free_id(int fd, uint32_t id);
+int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
+                            ram_addr_t size, void *vaddr, bool readonly);
+int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+                              hwaddr iova, ram_addr_t size);
+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+                               uint32_t pt_id, uint32_t *out_hwpt);
+#endif
diff --git a/backends/iommufd-stub.c b/backends/iommufd-stub.c
new file mode 100644
index 0000000000..02ac844c17
--- /dev/null
+++ b/backends/iommufd-stub.c
@@ -0,0 +1,59 @@
+/*
+ * iommufd container backend stub
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *          Eric Auger <eric.auger@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/iommufd.h"
+#include "qemu/error-report.h"
+
+int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
+{
+    return 0;
+}
+void iommufd_backend_disconnect(IOMMUFDBackend *be)
+{
+}
+void iommufd_backend_free_id(int fd, uint32_t id)
+{
+}
+int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id)
+{
+    return 0;
+}
+void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id)
+{
+}
+int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
+                            ram_addr_t size, void *vaddr, bool readonly)
+{
+    return 0;
+}
+int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+                              hwaddr iova, ram_addr_t size)
+{
+    return 0;
+}
+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+                               uint32_t pt_id, uint32_t *out_hwpt)
+{
+    return 0;
+}
diff --git a/backends/iommufd.c b/backends/iommufd.c
new file mode 100644
index 0000000000..a526d58824
--- /dev/null
+++ b/backends/iommufd.c
@@ -0,0 +1,257 @@
+/*
+ * iommufd container backend
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *          Eric Auger <eric.auger@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/iommufd.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/module.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "monitor/monitor.h"
+#include "trace.h"
+#include <sys/ioctl.h>
+#include <linux/iommufd.h>
+
+static void iommufd_backend_init(Object *obj)
+{
+    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+
+    be->fd = -1;
+    be->users = 0;
+    be->owned = true;
+    qemu_mutex_init(&be->lock);
+}
+
+static void iommufd_backend_finalize(Object *obj)
+{
+    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+
+    if (be->owned) {
+        close(be->fd);
+        be->fd = -1;
+    }
+}
+
+static void iommufd_backend_set_fd(Object *obj, const char *str, Error **errp)
+{
+    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+    int fd = -1;
+
+    fd = monitor_fd_param(monitor_cur(), str, errp);
+    if (fd == -1) {
+        error_prepend(errp, "Could not parse remote object fd %s:", str);
+        return;
+    }
+    qemu_mutex_lock(&be->lock);
+    be->fd = fd;
+    be->owned = false;
+    qemu_mutex_unlock(&be->lock);
+    trace_iommu_backend_set_fd(be->fd);
+}
+
+static void iommufd_backend_class_init(ObjectClass *oc, void *data)
+{
+    object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
+}
+
+int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
+{
+    int fd, ret = 0;
+
+    qemu_mutex_lock(&be->lock);
+    if (be->users == UINT32_MAX) {
+        error_setg(errp, "too many connections");
+        ret = -E2BIG;
+        goto out;
+    }
+    if (be->owned && !be->users) {
+        fd = qemu_open_old("/dev/iommu", O_RDWR);
+        if (fd < 0) {
+            error_setg_errno(errp, errno, "/dev/iommu opening failed");
+            ret = fd;
+            goto out;
+        }
+        be->fd = fd;
+    }
+    be->users++;
+out:
+    trace_iommufd_backend_connect(be->fd, be->owned,
+                                  be->users, ret);
+    qemu_mutex_unlock(&be->lock);
+    return ret;
+}
+
+void iommufd_backend_disconnect(IOMMUFDBackend *be)
+{
+    qemu_mutex_lock(&be->lock);
+    if (!be->users) {
+        goto out;
+    }
+    be->users--;
+    if (!be->users && be->owned) {
+        close(be->fd);
+        be->fd = -1;
+    }
+out:
+    trace_iommufd_backend_disconnect(be->fd, be->users);
+    qemu_mutex_unlock(&be->lock);
+}
+
+static int iommufd_backend_alloc_ioas(int fd, uint32_t *ioas_id)
+{
+    int ret;
+    struct iommu_ioas_alloc alloc_data  = {
+        .size = sizeof(alloc_data),
+        .flags = 0,
+    };
+
+    ret = ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data);
+    if (ret) {
+        error_report("Failed to allocate ioas %m");
+    }
+
+    *ioas_id = alloc_data.out_ioas_id;
+    trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret);
+
+    return ret;
+}
+
+void iommufd_backend_free_id(int fd, uint32_t id)
+{
+    int ret;
+    struct iommu_destroy des = {
+        .size = sizeof(des),
+        .id = id,
+    };
+
+    ret = ioctl(fd, IOMMU_DESTROY, &des);
+    trace_iommufd_backend_free_id(fd, id, ret);
+    if (ret) {
+        error_report("Failed to free id: %u %m", id);
+    }
+}
+
+int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id)
+{
+    int ret;
+
+    ret = iommufd_backend_alloc_ioas(be->fd, ioas_id);
+    trace_iommufd_backend_get_ioas(be->fd, *ioas_id, ret);
+    return ret;
+}
+
+void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id)
+{
+    iommufd_backend_free_id(be->fd, ioas_id);
+    trace_iommufd_backend_put_ioas(be->fd, ioas_id);
+}
+
+int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
+                            ram_addr_t size, void *vaddr, bool readonly)
+{
+    int ret;
+    struct iommu_ioas_map map = {
+        .size = sizeof(map),
+        .flags = IOMMU_IOAS_MAP_READABLE |
+                 IOMMU_IOAS_MAP_FIXED_IOVA,
+        .ioas_id = ioas_id,
+        .__reserved = 0,
+        .user_va = (uintptr_t)vaddr,
+        .iova = iova,
+        .length = size,
+    };
+
+    if (!readonly) {
+        map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
+    }
+
+    ret = ioctl(be->fd, IOMMU_IOAS_MAP, &map);
+    trace_iommufd_backend_map_dma(be->fd, ioas_id, iova, size,
+                                  vaddr, readonly, ret);
+    if (ret) {
+        error_report("IOMMU_IOAS_MAP failed: %m");
+    }
+    return !ret ? 0 : -errno;
+}
+
+int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+                              hwaddr iova, ram_addr_t size)
+{
+    int ret;
+    struct iommu_ioas_unmap unmap = {
+        .size = sizeof(unmap),
+        .ioas_id = ioas_id,
+        .iova = iova,
+        .length = size,
+    };
+
+    ret = ioctl(be->fd, IOMMU_IOAS_UNMAP, &unmap);
+    trace_iommufd_backend_unmap_dma(be->fd, ioas_id, iova, size, ret);
+    /*
+     * TODO: IOMMUFD doesn't support mapping PCI BARs for now.
+     * This is not a problem as long as there is no p2p DMA, so relax
+     * the check here to avoid noisy error reports from the vIOMMU side.
+     */
+    if (ret && errno == ENOENT) {
+        ret = 0;
+    }
+    if (ret) {
+        error_report("IOMMU_IOAS_UNMAP failed: %m");
+    }
+    return !ret ? 0 : -errno;
+}
+
+int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
+                               uint32_t pt_id, uint32_t *out_hwpt)
+{
+    int ret;
+    struct iommu_hwpt_alloc alloc_hwpt = {
+        .size = sizeof(struct iommu_hwpt_alloc),
+        .flags = 0,
+        .dev_id = dev_id,
+        .pt_id = pt_id,
+        .__reserved = 0,
+    };
+
+    ret = ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
+    trace_iommufd_backend_alloc_hwpt(iommufd, dev_id, pt_id,
+                                     alloc_hwpt.out_hwpt_id, ret);
+
+    if (ret) {
+        error_report("IOMMU_HWPT_ALLOC failed: %m");
+    } else {
+        *out_hwpt = alloc_hwpt.out_hwpt_id;
+    }
+    return !ret ? 0 : -errno;
+}
+
+static const TypeInfo iommufd_backend_info = {
+    .name = TYPE_IOMMUFD_BACKEND,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(IOMMUFDBackend),
+    .instance_init = iommufd_backend_init,
+    .instance_finalize = iommufd_backend_finalize,
+    .class_size = sizeof(IOMMUFDBackendClass),
+    .class_init = iommufd_backend_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void register_types(void)
+{
+    type_register_static(&iommufd_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/Kconfig b/backends/Kconfig
index f35abc1609..2cb23f62fa 100644
--- a/backends/Kconfig
+++ b/backends/Kconfig
@@ -1 +1,5 @@
 source tpm/Kconfig
+
+config IOMMUFD
+    bool
+    depends on VFIO
diff --git a/backends/meson.build b/backends/meson.build
index 914c7c4afb..05ac57ff15 100644
--- a/backends/meson.build
+++ b/backends/meson.build
@@ -20,6 +20,11 @@ if have_vhost_user
   system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('vhost-user.c'))
 endif
 system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost.c'))
+if have_iommufd
+  system_ss.add(files('iommufd.c'))
+else
+  system_ss.add(files('iommufd-stub.c'))
+endif
 if have_vhost_user_crypto
   system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost-user.c'))
 endif
diff --git a/backends/trace-events b/backends/trace-events
index 652eb76a57..e5f828bca2 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -5,3 +5,15 @@ dbus_vmstate_pre_save(void)
 dbus_vmstate_post_load(int version_id) "version_id: %d"
 dbus_vmstate_loading(const char *id) "id: %s"
 dbus_vmstate_saving(const char *id) "id: %s"
+
+# iommufd.c
+iommufd_backend_connect(int fd, bool owned, uint32_t users, int ret) "fd=%d owned=%d users=%d (%d)"
+iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
+iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
+iommufd_backend_get_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
+iommufd_backend_put_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
+iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
+iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
+iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
+iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u out_hwpt=%u (%d)"
diff --git a/qemu-options.hx b/qemu-options.hx
index e26230bac5..ddfaddf8ce 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -5210,6 +5210,19 @@ SRST
 
         The ``share`` boolean option is on by default with memfd.
 
+#ifdef CONFIG_IOMMUFD
+    ``-object iommufd,id=id[,fd=fd]``
+        Creates an iommufd backend which allows control of DMA mapping
+        through the /dev/iommu device.
+
+        The ``id`` parameter is a unique ID which frontends (such as
+        vfio-pci or vdpa) will use to connect with the iommufd backend.
+
+        The ``fd`` parameter is an optional pre-opened file descriptor
+        resulting from /dev/iommu opening. Usually the iommufd is shared
+        across all subsystems, bringing the benefit of centralized
+        reference counting.
+#endif
     ``-object rng-builtin,id=id``
         Creates a random number generator backend which obtains entropy
         from QEMU builtin functions. The ``id`` parameter is a unique ID
-- 
2.34.1




* [PATCH v4 27/41] util/char_dev: Add open_cdev()
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (25 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-07 13:37   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend Zhenzhong Duan
                   ` (15 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

From: Yi Liu <yi.l.liu@intel.com>

/dev/vfio/devices/vfioX may not exist. In that case it is still possible
to open /dev/char/$major:$minor instead. Add a helper function,
open_cdev(), to abstract the cdev open.
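
As a rough illustration of the fallback described above, here is a
standalone sketch that uses plain open() instead of the QEMU wrappers.
The demo_* names are hypothetical and only mirror the idea: try the
given path, accept it only if it is a character device with the expected
device number, and otherwise retry through /dev/char/$major:$minor
(which assumes udev creates those symlinks).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open path and keep the fd only if it is a character device whose
 * device number matches cdev. A cdev of 0 skips the number check.
 */
static int demo_open_checked(const char *path, dev_t cdev)
{
    struct stat st;
    int fd = open(path, O_RDWR);

    if (fd == -1) {
        return -1;
    }
    if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
        (cdev != 0 && st.st_rdev != cdev)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Try devpath first, then fall back to the /dev/char/ symlink. */
static int demo_open_cdev(const char *devpath, dev_t cdev)
{
    char fallback[64];
    int fd;

    assert(devpath != NULL);
    fd = demo_open_checked(devpath, cdev);
    if (fd == -1 && cdev != 0) {
        snprintf(fallback, sizeof(fallback), "/dev/char/%u:%u",
                 (unsigned)major(cdev), (unsigned)minor(cdev));
        fd = demo_open_checked(fallback, cdev);
    }
    return fd;
}
```

A caller would typically read the expected dev_t from sysfs and pass both
the preferred path and that number. The device number check is what
guards against a path that exists but points at a different device.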

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 MAINTAINERS                 |  6 +++
 include/qemu/chardev_open.h | 16 ++++++++
 util/chardev_open.c         | 81 +++++++++++++++++++++++++++++++++++++
 util/meson.build            |  1 +
 4 files changed, 104 insertions(+)
 create mode 100644 include/qemu/chardev_open.h
 create mode 100644 util/chardev_open.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 6f35159255..eada773975 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3473,6 +3473,12 @@ S: Maintained
 F: include/qemu/iova-tree.h
 F: util/iova-tree.c
 
+cdev Open
+M: Yi Liu <yi.l.liu@intel.com>
+S: Maintained
+F: include/qemu/chardev_open.h
+F: util/chardev_open.c
+
 elf2dmp
 M: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
 S: Maintained
diff --git a/include/qemu/chardev_open.h b/include/qemu/chardev_open.h
new file mode 100644
index 0000000000..64e8fcfdcb
--- /dev/null
+++ b/include/qemu/chardev_open.h
@@ -0,0 +1,16 @@
+/*
+ * QEMU Chardev Helper
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_CHARDEV_OPEN_H
+#define QEMU_CHARDEV_OPEN_H
+
+int open_cdev(const char *devpath, dev_t cdev);
+#endif
diff --git a/util/chardev_open.c b/util/chardev_open.c
new file mode 100644
index 0000000000..f776429788
--- /dev/null
+++ b/util/chardev_open.c
@@ -0,0 +1,81 @@
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ * Copyright (C) 2023 Intel Corporation.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *      Redistribution and use in source and binary forms, with or
+ *      without modification, are permitted provided that the following
+ *      conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *
+ * Copied from
+ * https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/chardev_open.h"
+
+static int open_cdev_internal(const char *path, dev_t cdev)
+{
+    struct stat st;
+    int fd;
+
+    fd = qemu_open_old(path, O_RDWR);
+    if (fd == -1) {
+        return -1;
+    }
+    if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
+        (cdev != 0 && st.st_rdev != cdev)) {
+        close(fd);
+        return -1;
+    }
+    return fd;
+}
+
+static int open_cdev_robust(dev_t cdev)
+{
+    g_autofree char *devpath = NULL;
+
+    /*
+     * This assumes that udev is being used and is creating the /dev/char/
+     * symlinks.
+     */
+    devpath = g_strdup_printf("/dev/char/%u:%u", major(cdev), minor(cdev));
+    return open_cdev_internal(devpath, cdev);
+}
+
+int open_cdev(const char *devpath, dev_t cdev)
+{
+    int fd;
+
+    fd = open_cdev_internal(devpath, cdev);
+    if (fd == -1 && cdev != 0) {
+        return open_cdev_robust(cdev);
+    }
+    return fd;
+}
diff --git a/util/meson.build b/util/meson.build
index eb677b40c2..eda0b06062 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -107,6 +107,7 @@ if have_block
     util_ss.add(files('filemonitor-stub.c'))
   endif
   util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c'))
+  util_ss.add(when: 'CONFIG_LINUX', if_true: files('chardev_open.c'))
 endif
 
 if cpu == 'aarch64'
-- 
2.34.1




* [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (26 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 27/41] util/char_dev: Add open_cdev() Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-07 13:41   ` Cédric Le Goater
  2023-11-08  2:59   ` Matthew Rosato
  2023-11-02  7:12 ` [PATCH v4 29/41] vfio/iommufd: Relax assert check for " Zhenzhong Duan
                   ` (14 subsequent siblings)
  42 siblings, 2 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

From: Yi Liu <yi.l.liu@intel.com>

Add the iommufd backend. The IOMMUFD container class is implemented
based on the new /dev/iommu user API. This backend obviously depends
on CONFIG_IOMMUFD.

The iommufd backend doesn't support dirty page sync yet, due to
missing support in the host kernel.

Co-authored-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v4: use SPDX identifier, use iommufd_cdev_* prefix, merge with manual alloc patch

 include/hw/vfio/vfio-common.h |  23 ++
 hw/vfio/common.c              |  19 +-
 hw/vfio/iommufd.c             | 504 ++++++++++++++++++++++++++++++++++
 hw/vfio/meson.build           |   3 +
 hw/vfio/trace-events          |  13 +
 5 files changed, 558 insertions(+), 4 deletions(-)
 create mode 100644 hw/vfio/iommufd.c

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 24ecc0e7ee..3f1a39a991 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -89,6 +89,23 @@ typedef struct VFIOHostDMAWindow {
     QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
 } VFIOHostDMAWindow;
 
+#ifdef CONFIG_IOMMUFD
+typedef struct VFIOIOASHwpt {
+    uint32_t hwpt_id;
+    QLIST_HEAD(, VFIODevice) device_list;
+    QLIST_ENTRY(VFIOIOASHwpt) next;
+} VFIOIOASHwpt;
+
+typedef struct IOMMUFDBackend IOMMUFDBackend;
+
+typedef struct VFIOIOMMUFDContainer {
+    VFIOContainerBase bcontainer;
+    IOMMUFDBackend *be;
+    uint32_t ioas_id;
+    QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
+} VFIOIOMMUFDContainer;
+#endif
+
 typedef struct VFIODeviceOps VFIODeviceOps;
 
 typedef struct VFIODevice {
@@ -116,6 +133,11 @@ typedef struct VFIODevice {
     OnOffAuto pre_copy_dirty_page_tracking;
     bool dirty_pages_supported;
     bool dirty_tracking;
+#ifdef CONFIG_IOMMUFD
+    int devid;
+    VFIOIOASHwpt *hwpt;
+    IOMMUFDBackend *iommufd;
+#endif
 } VFIODevice;
 
 struct VFIODeviceOps {
@@ -201,6 +223,7 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
 extern VFIOGroupList vfio_group_list;
 extern VFIODeviceList vfio_device_list;
 extern const VFIOIOMMUOps vfio_legacy_ops;
+extern const VFIOIOMMUOps vfio_iommufd_ops;
 extern const MemoryListener vfio_memory_listener;
 extern int vfio_kvm_device_fd;
 
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 572ae7c934..a61dce2845 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1462,10 +1462,13 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
 
 void vfio_put_address_space(VFIOAddressSpace *space)
 {
-    if (QLIST_EMPTY(&space->containers)) {
-        QLIST_REMOVE(space, list);
-        g_free(space);
+    if (!QLIST_EMPTY(&space->containers)) {
+        return;
     }
+
+    QLIST_REMOVE(space, list);
+    g_free(space);
+
     if (QLIST_EMPTY(&vfio_address_spaces)) {
         qemu_unregister_reset(vfio_reset_handler, NULL);
     }
@@ -1498,8 +1501,16 @@ retry:
 int vfio_attach_device(char *name, VFIODevice *vbasedev,
                        AddressSpace *as, Error **errp)
 {
-    const VFIOIOMMUOps *ops = &vfio_legacy_ops;
+    const VFIOIOMMUOps *ops;
 
+#ifdef CONFIG_IOMMUFD
+    if (vbasedev->iommufd) {
+        ops = &vfio_iommufd_ops;
+    } else
+#endif
+    {
+        ops = &vfio_legacy_ops;
+    }
     return ops->attach_device(name, vbasedev, as, errp);
 }
 
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
new file mode 100644
index 0000000000..1bb55ca2c4
--- /dev/null
+++ b/hw/vfio/iommufd.c
@@ -0,0 +1,504 @@
+/*
+ * iommufd container backend
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *          Eric Auger <eric.auger@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include <sys/ioctl.h>
+#include <linux/vfio.h>
+#include <linux/iommufd.h>
+
+#include "hw/vfio/vfio-common.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "qapi/error.h"
+#include "sysemu/iommufd.h"
+#include "hw/qdev-core.h"
+#include "sysemu/reset.h"
+#include "qemu/cutils.h"
+#include "qemu/chardev_open.h"
+
+static int iommufd_map(VFIOContainerBase *bcontainer, hwaddr iova,
+                       ram_addr_t size, void *vaddr, bool readonly)
+{
+    VFIOIOMMUFDContainer *container =
+        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+
+    return iommufd_backend_map_dma(container->be,
+                                   container->ioas_id,
+                                   iova, size, vaddr, readonly);
+}
+
+static int iommufd_unmap(VFIOContainerBase *bcontainer,
+                         hwaddr iova, ram_addr_t size,
+                         IOMMUTLBEntry *iotlb)
+{
+    VFIOIOMMUFDContainer *container =
+        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+
+    /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
+    return iommufd_backend_unmap_dma(container->be,
+                                     container->ioas_id, iova, size);
+}
+
+static void iommufd_cdev_kvm_device_add(VFIODevice *vbasedev)
+{
+    Error *err = NULL;
+
+    if (vfio_kvm_device_add_fd(vbasedev->fd, &err)) {
+        error_report_err(err);
+    }
+}
+
+static void iommufd_cdev_kvm_device_del(VFIODevice *vbasedev)
+{
+    Error *err = NULL;
+
+    if (vfio_kvm_device_del_fd(vbasedev->fd, &err)) {
+        error_report_err(err);
+    }
+}
+
+static int iommufd_connect_and_bind(VFIODevice *vbasedev, Error **errp)
+{
+    IOMMUFDBackend *iommufd = vbasedev->iommufd;
+    struct vfio_device_bind_iommufd bind = {
+        .argsz = sizeof(bind),
+        .flags = 0,
+    };
+    int ret;
+
+    ret = iommufd_backend_connect(iommufd, errp);
+    if (ret) {
+        return ret;
+    }
+
+    /*
+     * Add device to kvm-vfio to be prepared for the tracking
+     * in KVM. Some emulated devices in particular require KVM
+     * information to be available when the device is opened.
+     */
+    iommufd_cdev_kvm_device_add(vbasedev);
+
+    /* Bind device to iommufd */
+    bind.iommufd = iommufd->fd;
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+    if (ret) {
+        error_setg_errno(errp, errno, "error bind device fd=%d to iommufd=%d",
+                         vbasedev->fd, bind.iommufd);
+        goto err_bind;
+    }
+
+    vbasedev->devid = bind.out_devid;
+    trace_iommufd_connect_and_bind(bind.iommufd, vbasedev->name, vbasedev->fd,
+                                   vbasedev->devid);
+    return ret;
+err_bind:
+    iommufd_cdev_kvm_device_del(vbasedev);
+    iommufd_backend_disconnect(iommufd);
+    return ret;
+}
+
+static void iommufd_unbind_and_disconnect(VFIODevice *vbasedev)
+{
+    /* Unbind is automatically conducted when device fd is closed */
+    iommufd_cdev_kvm_device_del(vbasedev);
+    iommufd_backend_disconnect(vbasedev->iommufd);
+}
+
+static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
+{
+    long int ret = -ENOTTY;
+    char *path, *vfio_dev_path = NULL, *vfio_path = NULL;
+    DIR *dir = NULL;
+    struct dirent *dent;
+    gchar *contents;
+    struct stat st;
+    gsize length;
+    int major, minor;
+    dev_t vfio_devt;
+
+    path = g_strdup_printf("%s/vfio-dev", sysfs_path);
+    if (stat(path, &st) < 0) {
+        error_setg_errno(errp, errno, "no such host device");
+        goto out_free_path;
+    }
+
+    dir = opendir(path);
+    if (!dir) {
+        error_setg_errno(errp, errno, "couldn't open directory %s", path);
+        goto out_free_path;
+    }
+
+    while ((dent = readdir(dir))) {
+        if (!strncmp(dent->d_name, "vfio", 4)) {
+            vfio_dev_path = g_strdup_printf("%s/%s/dev", path, dent->d_name);
+            break;
+        }
+    }
+
+    if (!vfio_dev_path) {
+        error_setg(errp, "failed to find vfio-dev/vfioX/dev");
+        goto out_close_dir;
+    }
+
+    if (!g_file_get_contents(vfio_dev_path, &contents, &length, NULL)) {
+        error_setg(errp, "failed to load \"%s\"", vfio_dev_path);
+        goto out_free_dev_path;
+    }
+
+    if (sscanf(contents, "%d:%d", &major, &minor) != 2) {
+        error_setg(errp, "failed to get major:minor for \"%s\"", vfio_dev_path);
+        goto out_free_dev_path;
+    }
+    g_free(contents);
+    vfio_devt = makedev(major, minor);
+
+    vfio_path = g_strdup_printf("/dev/vfio/devices/%s", dent->d_name);
+    ret = open_cdev(vfio_path, vfio_devt);
+    if (ret < 0) {
+        error_setg(errp, "Failed to open %s", vfio_path);
+    }
+
+    trace_iommufd_cdev_getfd(vfio_path, ret);
+    g_free(vfio_path);
+
+out_free_dev_path:
+    g_free(vfio_dev_path);
+out_close_dir:
+    closedir(dir);
+out_free_path:
+    if (*errp) {
+        error_prepend(errp, VFIO_MSG_PREFIX, path);
+    }
+    g_free(path);
+
+    return ret;
+}
+
+static VFIOIOASHwpt *iommufd_container_get_hwpt(VFIOIOMMUFDContainer *container,
+                                                uint32_t hwpt_id)
+{
+    VFIOIOASHwpt *hwpt;
+
+    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+        if (hwpt->hwpt_id == hwpt_id) {
+            return hwpt;
+        }
+    }
+
+    hwpt = g_malloc0(sizeof(*hwpt));
+
+    hwpt->hwpt_id = hwpt_id;
+    QLIST_INIT(&hwpt->device_list);
+    QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+
+    return hwpt;
+}
+
+static void iommufd_container_put_hwpt(IOMMUFDBackend *be, VFIOIOASHwpt *hwpt)
+{
+    QLIST_REMOVE(hwpt, next);
+    iommufd_backend_free_id(be->fd, hwpt->hwpt_id);
+    g_free(hwpt);
+}
+
+static int iommufd_cdev_attach_hwpt(VFIODevice *vbasedev, uint32_t hwpt_id,
+                                    Error **errp)
+{
+    int ret, iommufd = vbasedev->iommufd->fd;
+    struct vfio_device_attach_iommufd_pt attach_data = {
+        .argsz = sizeof(attach_data),
+        .flags = 0,
+        .pt_id = hwpt_id,
+    };
+
+    /* Attach device to an hwpt within iommufd */
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+    if (ret) {
+        error_setg_errno(errp, errno,
+                         "[iommufd=%d] error attach %s (%d) to hwpt_id=%d",
+                         iommufd, vbasedev->name, vbasedev->fd, hwpt_id);
+    }
+    trace_iommufd_cdev_attach_hwpt(iommufd, vbasedev->name, vbasedev->fd,
+                                   hwpt_id);
+    return ret;
+}
+
+static int iommufd_cdev_detach_hwpt(VFIODevice *vbasedev, Error **errp)
+{
+    int ret, iommufd = vbasedev->iommufd->fd;
+    struct vfio_device_detach_iommufd_pt detach_data = {
+        .argsz = sizeof(detach_data),
+        .flags = 0,
+    };
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_data);
+    if (ret) {
+        error_setg_errno(errp, errno, "detach %s from ioas failed",
+                         vbasedev->name);
+    }
+    trace_iommufd_cdev_detach_hwpt(iommufd, vbasedev->name,
+                                   vbasedev->hwpt->hwpt_id);
+    return ret;
+}
+
+static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
+                                         VFIOIOMMUFDContainer *container,
+                                         Error **errp)
+{
+    int ret, iommufd = vbasedev->iommufd->fd;
+    VFIOIOASHwpt *hwpt;
+    uint32_t hwpt_id;
+    Error *err = NULL;
+
+    /* try to attach to an existing hwpt in this container */
+    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+        ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt->hwpt_id, &err);
+        if (ret) {
+            const char *msg = error_get_pretty(err);
+
+            trace_iommufd_cdev_fail_attach_existing_hwpt(msg);
+            error_free(err);
+            err = NULL;
+        } else {
+            goto found_hwpt;
+        }
+    }
+
+    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
+                                     container->ioas_id, &hwpt_id);
+
+    if (ret) {
+        error_setg_errno(errp, errno, "error alloc shadow hwpt");
+        return ret;
+    }
+
+    /* Attach cdev to a new allocated hwpt within iommufd */
+    ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt_id, errp);
+    if (ret) {
+        iommufd_backend_free_id(iommufd, hwpt_id);
+        return ret;
+    }
+
+    hwpt = iommufd_container_get_hwpt(container, hwpt_id);
+found_hwpt:
+    QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, next);
+    vbasedev->hwpt = hwpt;
+
+    trace_iommufd_cdev_attach_container(iommufd, vbasedev->name, vbasedev->fd,
+                                        container->ioas_id, hwpt->hwpt_id);
+    return ret;
+}
+
+static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
+                                          VFIOIOMMUFDContainer *container)
+{
+    VFIOIOASHwpt *hwpt = vbasedev->hwpt;
+    Error *err = NULL;
+    int ret;
+
+    ret = iommufd_cdev_detach_hwpt(vbasedev, &err);
+    if (ret) {
+        error_report_err(err);
+    }
+
+    QLIST_REMOVE(vbasedev, next);
+    vbasedev->hwpt = NULL;
+    if (QLIST_EMPTY(&hwpt->device_list)) {
+        iommufd_container_put_hwpt(vbasedev->iommufd, hwpt);
+    }
+
+    trace_iommufd_cdev_detach_container(container->be->fd, vbasedev->name,
+                                        container->ioas_id);
+}
+
+static void iommufd_container_destroy(VFIOIOMMUFDContainer *container)
+{
+    VFIOContainerBase *bcontainer = &container->bcontainer;
+
+    if (!QLIST_EMPTY(&container->hwpt_list)) {
+        return;
+    }
+    memory_listener_unregister(&bcontainer->listener);
+    vfio_container_destroy(bcontainer);
+    iommufd_backend_put_ioas(container->be, container->ioas_id);
+    g_free(container);
+}
+
+static int iommufd_ram_block_discard_disable(bool state)
+{
+    /*
+     * We support coordinated discarding of RAM via the RamDiscardManager.
+     */
+    return ram_block_uncoordinated_discard_disable(state);
+}
+
+static int iommufd_attach_device(const char *name, VFIODevice *vbasedev,
+                                 AddressSpace *as, Error **errp)
+{
+    VFIOContainerBase *bcontainer;
+    VFIOIOMMUFDContainer *container;
+    VFIOAddressSpace *space;
+    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+    int ret, devfd;
+    uint32_t ioas_id;
+    Error *err = NULL;
+
+    devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
+    if (devfd < 0) {
+        return devfd;
+    }
+    vbasedev->fd = devfd;
+
+    ret = iommufd_connect_and_bind(vbasedev, errp);
+    if (ret) {
+        goto err_connect_bind;
+    }
+
+    space = vfio_get_address_space(as);
+
+    /* try to attach to an existing container in this space */
+    QLIST_FOREACH(bcontainer, &space->containers, next) {
+        container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+        if (bcontainer->ops != &vfio_iommufd_ops ||
+            vbasedev->iommufd != container->be) {
+            continue;
+        }
+        if (iommufd_cdev_attach_container(vbasedev, container, &err)) {
+            const char *msg = error_get_pretty(err);
+
+            trace_iommufd_cdev_fail_attach_existing_container(msg);
+            error_free(err);
+            err = NULL;
+        } else {
+            ret = iommufd_ram_block_discard_disable(true);
+            if (ret) {
+                error_setg(errp,
+                           "Cannot set discarding of RAM broken (%d)", ret);
+                goto err_discard_disable;
+            }
+            goto found_container;
+        }
+    }
+
+    /* Need to allocate a new dedicated container */
+    ret = iommufd_backend_get_ioas(vbasedev->iommufd, &ioas_id);
+    if (ret < 0) {
+        error_setg_errno(errp, errno, "Failed to alloc ioas");
+        goto err_get_ioas;
+    }
+
+    trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
+
+    container = g_malloc0(sizeof(*container));
+    container->be = vbasedev->iommufd;
+    container->ioas_id = ioas_id;
+    QLIST_INIT(&container->hwpt_list);
+
+    bcontainer = &container->bcontainer;
+    vfio_container_init(bcontainer, space, &vfio_iommufd_ops);
+    QLIST_INSERT_HEAD(&space->containers, bcontainer, next);
+
+    ret = iommufd_cdev_attach_container(vbasedev, container, errp);
+    if (ret) {
+        goto err_attach_container;
+    }
+
+    ret = iommufd_ram_block_discard_disable(true);
+    if (ret) {
+        goto err_discard_disable;
+    }
+
+    bcontainer->pgsizes = qemu_real_host_page_size();
+
+    bcontainer->listener = vfio_memory_listener;
+    memory_listener_register(&bcontainer->listener, bcontainer->space->as);
+
+    if (bcontainer->error) {
+        ret = -1;
+        error_propagate_prepend(errp, bcontainer->error,
+                                "memory listener initialization failed: ");
+        goto err_listener_register;
+    }
+
+    bcontainer->initialized = true;
+
+found_container:
+    ret = ioctl(devfd, VFIO_DEVICE_GET_INFO, &dev_info);
+    if (ret) {
+        error_setg_errno(errp, errno, "error getting device info");
+        goto err_listener_register;
+    }
+
+    /*
+     * TODO: examine RAM_BLOCK_DISCARD stuff, should we do group level
+     * for discarding incompatibility check as well?
+     */
+    if (vbasedev->ram_block_discard_allowed) {
+        iommufd_ram_block_discard_disable(false);
+    }
+
+    vbasedev->group = 0;
+    vbasedev->num_irqs = dev_info.num_irqs;
+    vbasedev->num_regions = dev_info.num_regions;
+    vbasedev->flags = dev_info.flags;
+    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+    vbasedev->bcontainer = bcontainer;
+    QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
+    QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
+
+    trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
+                                   vbasedev->num_regions, vbasedev->flags);
+    return 0;
+
+err_listener_register:
+    iommufd_ram_block_discard_disable(false);
+err_discard_disable:
+    iommufd_cdev_detach_container(vbasedev, container);
+err_attach_container:
+    iommufd_container_destroy(container);
+err_get_ioas:
+    vfio_put_address_space(space);
+    iommufd_unbind_and_disconnect(vbasedev);
+err_connect_bind:
+    close(vbasedev->fd);
+    return ret;
+}
+
+static void iommufd_detach_device(VFIODevice *vbasedev)
+{
+    VFIOContainerBase *bcontainer = vbasedev->bcontainer;
+    VFIOIOMMUFDContainer *container;
+    VFIOAddressSpace *space = bcontainer->space;
+
+    QLIST_REMOVE(vbasedev, global_next);
+    QLIST_REMOVE(vbasedev, container_next);
+    vbasedev->bcontainer = NULL;
+
+    if (!vbasedev->ram_block_discard_allowed) {
+        iommufd_ram_block_discard_disable(false);
+    }
+
+    container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+    iommufd_cdev_detach_container(vbasedev, container);
+    iommufd_container_destroy(container);
+    vfio_put_address_space(space);
+
+    iommufd_unbind_and_disconnect(vbasedev);
+    close(vbasedev->fd);
+}
+
+const VFIOIOMMUOps vfio_iommufd_ops = {
+    .dma_map = iommufd_map,
+    .dma_unmap = iommufd_unmap,
+    .attach_device = iommufd_attach_device,
+    .detach_device = iommufd_detach_device,
+};
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index eb6ce6229d..9cae2c9e21 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -7,6 +7,9 @@ vfio_ss.add(files(
   'spapr.c',
   'migration.c',
 ))
+if have_iommufd
+  vfio_ss.add(files('iommufd.c'))
+endif
 vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
   'display.c',
   'pci-quirks.c',
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 08a1f9dfa4..d85342b65f 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -164,3 +164,16 @@ vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcop
 vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
 vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
 vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
+
+#iommufd.c
+
+iommufd_connect_and_bind(int iommufd, const char *name, int devfd, int devid) " [iommufd=%d] Successfully bound device %s (fd=%d): output devid=%d"
+iommufd_cdev_getfd(const char *dev, int devfd) " %s (fd=%d)"
+iommufd_cdev_attach_hwpt(int iommufd, const char *name, int devfd, int hwptid) " [iommufd=%d] Successfully attached device %s (%d) to hwpt=%d"
+iommufd_cdev_detach_hwpt(int iommufd, const char *name, int hwptid) " [iommufd=%d] Detached %s from hwpt=%d"
+iommufd_cdev_fail_attach_existing_hwpt(const char *msg) " %s"
+iommufd_cdev_attach_container(int iommufd, const char *name, int devfd, int ioasid, int hwptid) " [iommufd=%d] Successfully attached device %s (%d) to ioasid=%d: output hwpt=%d"
+iommufd_cdev_detach_container(int iommufd, const char *name, int ioasid) " [iommufd=%d] Detached %s from ioasid=%d"
+iommufd_cdev_fail_attach_existing_container(const char *msg) " %s"
+iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD container with ioasid=%d"
+iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 29/41] vfio/iommufd: Relax assert check for iommufd backend
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (27 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 30/41] vfio/iommufd: Add support for iova_ranges Zhenzhong Duan
                   ` (13 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

iommufd doesn't support dirty page sync yet, but that does not
block live migration when VFIO migration is force enabled.

So in this case allow set_dirty_page_tracking to be NULL by checking
dirty_pages_supported in the base container before the assert.
Note that query_dirty_bitmap doesn't need the same change because it
is never called when dirty page sync isn't supported.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
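For readers who want the dispatch order in isolation, here is a minimal
standalone sketch. It uses simplified stand-in types (Container,
ContainerOps and no_tracking_ops are hypothetical names, not the QEMU
structures) and only illustrates that the capability check runs before the
assert, so a backend without dirty tracking may leave the callback NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for VFIOContainerBase / VFIOIOMMUOps (hypothetical). */
typedef struct Container Container;

typedef struct {
    int (*set_dirty_page_tracking)(Container *c, bool start);
} ContainerOps;

struct Container {
    const ContainerOps *ops;
    bool dirty_pages_supported;
};

/* Base-level wrapper: the capability check comes before the assert. */
static int container_set_dirty_page_tracking(Container *c, bool start)
{
    if (!c->dirty_pages_supported) {
        return 0; /* nothing to do, the callback may legitimately be NULL */
    }
    assert(c->ops->set_dirty_page_tracking);
    return c->ops->set_dirty_page_tracking(c, start);
}

/* A backend that provides no dirty tracking callback at all. */
static const ContainerOps no_tracking_ops = {
    .set_dirty_page_tracking = NULL,
};
```

With the check left in the legacy callback instead, the assert in the
wrapper would fire for any backend that leaves the callback unset.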
 hw/vfio/container-base.c | 4 ++++
 hw/vfio/container.c      | 4 ----
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 71f7274973..eee2dcfe76 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -55,6 +55,10 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
 int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
                                            bool start)
 {
+    if (!bcontainer->dirty_pages_supported) {
+        return 0;
+    }
+
     g_assert(bcontainer->ops->set_dirty_page_tracking);
     return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
 }
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 6bacf38222..ed2d721b2b 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -216,10 +216,6 @@ static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
         .argsz = sizeof(dirty),
     };
 
-    if (!bcontainer->dirty_pages_supported) {
-        return 0;
-    }
-
     if (start) {
         dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
     } else {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 30/41] vfio/iommufd: Add support for iova_ranges
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (28 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 29/41] vfio/iommufd: Relax assert check for " Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-06 17:19   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 31/41] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info Zhenzhong Duan
                   ` (12 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

Some vIOMMUs, such as virtio-iommu, use the host IOVA ranges to set
up reserved ranges for a passthrough device, so that the guest will
not use an IOVA range beyond what the host supports.

Use the IOMMUFD uAPI to get the host IOVA ranges and pass them to the
vIOMMU, just like the legacy backend does.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v4: fix build error in 32bit fedora
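
The helper below queries the ranges in two steps: a first call to learn
how many entries exist, then a second call with a buffer of the right size.
A standalone sketch of that pattern follows. It does not touch the real
ioctl: mock_query, struct ranges_query, struct range and the range values are
all made up for illustration, and the mock fails with EMSGSIZE when the
array is too small, as the code in this patch expects from the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the iommufd uAPI structures. */
struct range { uint64_t start, last; };
struct ranges_query {
    uint32_t num;       /* in: array capacity, out: required count */
    struct range *out;  /* caller-provided array, may be NULL if num == 0 */
};

/* Mock kernel side: two made-up ranges, EMSGSIZE if the array is too small. */
static int mock_query(struct ranges_query *q)
{
    static const struct range host[] = {
        { 0x0, 0xfedfffff }, { 0xfef00000, 0xffffffffffULL },
    };
    uint32_t need = 2, have = q->num;

    q->num = need;
    if (have < need) {
        errno = EMSGSIZE;
        return -1;
    }
    memcpy(q->out, host, sizeof(host));
    return 0;
}

/* Two-call pattern: learn the count first, then fetch into a sized buffer. */
static int get_ranges(struct range **ranges, uint32_t *count)
{
    struct ranges_query q = { .num = 0, .out = NULL };

    assert(ranges && count);
    if (mock_query(&q) && errno != EMSGSIZE) {
        return -errno;
    }
    q.out = calloc(q.num ? q.num : 1, sizeof(*q.out));
    if (!q.out) {
        return -ENOMEM;
    }
    if (mock_query(&q)) {
        int ret = -errno; /* save errno before free() can change it */

        free(q.out);
        return ret;
    }
    *ranges = q.out;
    *count = q.num;
    return 0;
}
```

The caller owns the returned array and releases it with free().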

 hw/vfio/iommufd.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 1bb55ca2c4..22f02f92a9 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -341,6 +341,52 @@ static int iommufd_ram_block_discard_disable(bool state)
     return ram_block_uncoordinated_discard_disable(state);
 }
 
+static int vfio_get_info_iova_range(VFIOIOMMUFDContainer *container,
+                                    uint32_t ioas_id)
+{
+    VFIOContainerBase *bcontainer = &container->bcontainer;
+    struct iommu_ioas_iova_ranges *info;
+    struct iommu_iova_range *iova_ranges;
+    int ret, sz, fd = container->be->fd;
+
+    info = g_malloc0(sizeof(*info));
+    info->size = sizeof(*info);
+    info->ioas_id = ioas_id;
+
+    ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
+    if (ret && errno != EMSGSIZE) {
+        goto error;
+    }
+
+    sz = info->num_iovas * sizeof(struct iommu_iova_range);
+    info = g_realloc(info, sizeof(*info) + sz);
+    info->allowed_iovas = (uintptr_t)(info + 1);
+
+    ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
+    if (ret) {
+        goto error;
+    }
+
+    iova_ranges = (struct iommu_iova_range *)(uintptr_t)info->allowed_iovas;
+
+    for (int i = 0; i < info->num_iovas; i++) {
+        Range *range = g_new(Range, 1);
+
+        range_set_bounds(range, iova_ranges[i].start, iova_ranges[i].last);
+        bcontainer->iova_ranges =
+            range_list_insert(bcontainer->iova_ranges, range);
+    }
+
+    g_free(info);
+    return 0;
+
+error:
+    ret = -errno;
+    g_free(info);
+    error_report("vfio/iommufd: Cannot get iova ranges: %m");
+    return ret;
+}
+
 static int iommufd_attach_device(const char *name, VFIODevice *vbasedev,
                                  AddressSpace *as, Error **errp)
 {
@@ -418,6 +464,7 @@ static int iommufd_attach_device(const char *name, VFIODevice *vbasedev,
     }
 
     bcontainer->pgsizes = qemu_real_host_page_size();
+    vfio_get_info_iova_range(container, ioas_id);
 
     bcontainer->listener = vfio_memory_listener;
     memory_listener_register(&bcontainer->listener, bcontainer->space->as);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 31/41] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (29 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 30/41] vfio/iommufd: Add support for iova_ranges Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-07 13:48   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 32/41] vfio/pci: Introduce a vfio pci hot reset interface Zhenzhong Duan
                   ` (11 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

This helper will be used by both legacy and iommufd backends.

No functional changes intended.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/pci.h |  3 +++
 hw/vfio/pci.c | 54 +++++++++++++++++++++++++++++++++++----------------
 2 files changed, 40 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index fba8737ab2..1006061afb 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -218,6 +218,9 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr);
 
 extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
 
+int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
+                                    struct vfio_pci_hot_reset_info **info_p);
+
 int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
 
 int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c62c02f7b6..eb55e8ae88 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2445,22 +2445,13 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
     return (strcmp(tmp, name) == 0);
 }
 
-static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
+int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
+                                    struct vfio_pci_hot_reset_info **info_p)
 {
-    VFIOGroup *group;
     struct vfio_pci_hot_reset_info *info;
-    struct vfio_pci_dependent_device *devices;
-    struct vfio_pci_hot_reset *reset;
-    int32_t *fds;
-    int ret, i, count;
-    bool multi = false;
+    int ret, count;
 
-    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
-
-    if (!single) {
-        vfio_pci_pre_reset(vdev);
-    }
-    vdev->vbasedev.needs_reset = false;
+    assert(info_p && !*info_p);
 
     info = g_malloc0(sizeof(*info));
     info->argsz = sizeof(*info);
@@ -2468,24 +2459,53 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
     ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret && errno != ENOSPC) {
         ret = -errno;
+        g_free(info);
         if (!vdev->has_pm_reset) {
             error_report("vfio: Cannot reset device %s, "
                          "no available reset mechanism.", vdev->vbasedev.name);
         }
-        goto out_single;
+        return ret;
     }
 
     count = info->count;
-    info = g_realloc(info, sizeof(*info) + (count * sizeof(*devices)));
-    info->argsz = sizeof(*info) + (count * sizeof(*devices));
-    devices = &info->devices[0];
+    info = g_realloc(info, sizeof(*info) + (count * sizeof(info->devices[0])));
+    info->argsz = sizeof(*info) + (count * sizeof(info->devices[0]));
 
     ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
     if (ret) {
         ret = -errno;
+        g_free(info);
         error_report("vfio: hot reset info failed: %m");
+        return ret;
+    }
+
+    *info_p = info;
+    return 0;
+}
+
+static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
+{
+    VFIOGroup *group;
+    struct vfio_pci_hot_reset_info *info = NULL;
+    struct vfio_pci_dependent_device *devices;
+    struct vfio_pci_hot_reset *reset;
+    int32_t *fds;
+    int ret, i, count;
+    bool multi = false;
+
+    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+
+    if (!single) {
+        vfio_pci_pre_reset(vdev);
+    }
+    vdev->vbasedev.needs_reset = false;
+
+    ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
+
+    if (ret) {
         goto out_single;
     }
+    devices = &info->devices[0];
 
     trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 32/41] vfio/pci: Introduce a vfio pci hot reset interface
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (30 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 31/41] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-07 13:52   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 33/41] vfio/iommufd: Enable pci hot reset through iommufd cdev interface Zhenzhong Duan
                   ` (10 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

Legacy vfio pci and iommufd cdev use different procedures to hot reset
a vfio device. Abstract the current code out into a pci_hot_reset
callback for legacy vfio; the same interface will also be used by
iommufd cdev vfio devices.

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
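The resulting call flow can be sketched standalone as below. The types and
names (Device, BackendOps, legacy_ops, cdev_ops, last_reset) are
hypothetical stand-ins, and the second backend only anticipates what a
later patch in the series is expected to register; the point is that the
generic entry point no longer needs to know which backend is in use.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct Device Device;

/* Hypothetical, trimmed-down ops table; only the hot reset hook is shown. */
typedef struct {
    int (*pci_hot_reset)(Device *dev, bool single);
} BackendOps;

struct Device {
    const BackendOps *ops;
    int last_reset; /* records which backend ran, for demonstration only */
};

static int legacy_hot_reset(Device *dev, bool single)
{
    (void)single;
    dev->last_reset = 1;
    return 0;
}

static int cdev_hot_reset(Device *dev, bool single)
{
    (void)single;
    dev->last_reset = 2;
    return 0;
}

static const BackendOps legacy_ops = { .pci_hot_reset = legacy_hot_reset };
static const BackendOps cdev_ops = { .pci_hot_reset = cdev_hot_reset };

/* Generic entry point: dispatch through whichever backend owns the device. */
static int device_hot_reset(Device *dev, bool single)
{
    assert(dev->ops && dev->ops->pci_hot_reset);
    return dev->ops->pci_hot_reset(dev, single);
}
```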
 hw/vfio/pci.h                         |  1 +
 include/hw/vfio/vfio-container-base.h |  3 +++
 hw/vfio/container.c                   |  2 ++
 hw/vfio/pci.c                         | 11 ++++++++++-
 4 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 1006061afb..12cc765821 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -220,6 +220,7 @@ extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
 
 int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
                                     struct vfio_pci_hot_reset_info **info_p);
+int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single);
 
 int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
 
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 4b6f017c6f..45bb19c767 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -106,6 +106,9 @@ struct VFIOIOMMUOps {
     int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
     int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
                               hwaddr iova, hwaddr size);
+    /* PCI specific */
+    int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
+
     /* SPAPR specific */
     int (*add_window)(VFIOContainerBase *bcontainer,
                       MemoryRegionSection *section,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index ed2d721b2b..f27cc15d09 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -33,6 +33,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
+#include "pci.h"
 
 VFIOGroupList vfio_group_list =
     QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -929,4 +930,5 @@ const VFIOIOMMUOps vfio_legacy_ops = {
     .detach_device = vfio_legacy_detach_device,
     .set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
     .query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
+    .pci_hot_reset = vfio_legacy_pci_hot_reset,
 };
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index eb55e8ae88..a6194b7bfe 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2483,8 +2483,9 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
     return 0;
 }
 
-static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
+int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single)
 {
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
     VFIOGroup *group;
     struct vfio_pci_hot_reset_info *info = NULL;
     struct vfio_pci_dependent_device *devices;
@@ -2647,6 +2648,14 @@ out_single:
     return ret;
 }
 
+static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
+{
+    VFIODevice *vbasedev = &vdev->vbasedev;
+    const VFIOIOMMUOps *ops = vbasedev->bcontainer->ops;
+
+    return ops->pci_hot_reset(vbasedev, single);
+}
+
 /*
  * We want to differentiate hot reset of multiple in-use devices vs hot reset
  * of a single in-use device.  VFIO_DEVICE_RESET will already handle the case
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 114+ messages in thread

* [PATCH v4 33/41] vfio/iommufd: Enable pci hot reset through iommufd cdev interface
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (31 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 32/41] vfio/pci: Introduce a vfio pci hot reset interface Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 34/41] vfio/pci: Allow the selection of a given iommu backend Zhenzhong Duan
                   ` (9 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

Add a new callback iommufd_pci_hot_reset to do the iommufd specific
checks and reset operation.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/pci.h        |   2 +
 hw/vfio/iommufd.c    | 142 +++++++++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.c        |   4 +-
 hw/vfio/trace-events |   1 +
 4 files changed, 147 insertions(+), 2 deletions(-)

diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 12cc765821..ec4a03aecd 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -218,6 +218,8 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr);
 
 extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
 
+void vfio_pci_pre_reset(VFIOPCIDevice *vdev);
+void vfio_pci_post_reset(VFIOPCIDevice *vdev);
 int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
                                     struct vfio_pci_hot_reset_info **info_p);
 int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single);
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 22f02f92a9..aedfe31c3c 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -24,6 +24,7 @@
 #include "sysemu/reset.h"
 #include "qemu/cutils.h"
 #include "qemu/chardev_open.h"
+#include "pci.h"
 
 static int iommufd_map(VFIOContainerBase *bcontainer, hwaddr iova,
                        ram_addr_t size, void *vaddr, bool readonly)
@@ -543,9 +544,150 @@ static void iommufd_detach_device(VFIODevice *vbasedev)
     close(vbasedev->fd);
 }
 
+static VFIODevice *vfio_pci_find_by_iommufd_devid(__u32 devid)
+{
+    VFIODevice *vbasedev_iter;
+
+    QLIST_FOREACH(vbasedev_iter, &vfio_device_list, global_next) {
+        if (vbasedev_iter->bcontainer->ops != &vfio_iommufd_ops) {
+            continue;
+        }
+        if (devid == vbasedev_iter->devid) {
+            return vbasedev_iter;
+        }
+    }
+    return NULL;
+}
+
+static int iommufd_pci_hot_reset(VFIODevice *vbasedev, bool single)
+{
+    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+    struct vfio_pci_hot_reset_info *info = NULL;
+    struct vfio_pci_dependent_device *devices;
+    struct vfio_pci_hot_reset *reset;
+    int ret, i;
+    bool multi = false;
+
+    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+
+    if (!single) {
+        vfio_pci_pre_reset(vdev);
+    }
+    vdev->vbasedev.needs_reset = false;
+
+    ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
+
+    if (ret) {
+        goto out_single;
+    }
+
+    assert(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID);
+
+    devices = &info->devices[0];
+
+    if (!(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED)) {
+        if (!vdev->has_pm_reset) {
+            for (i = 0; i < info->count; i++) {
+                if (devices[i].devid == VFIO_PCI_DEVID_NOT_OWNED) {
+                    error_report("vfio: Cannot reset device %s, "
+                                 "depends on device %04x:%02x:%02x.%x "
+                                 "which is not owned.",
+                                 vdev->vbasedev.name, devices[i].segment,
+                                 devices[i].bus, PCI_SLOT(devices[i].devfn),
+                                 PCI_FUNC(devices[i].devfn));
+                }
+            }
+        }
+        ret = -EPERM;
+        goto out_single;
+    }
+
+    trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
+
+    for (i = 0; i < info->count; i++) {
+        VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
+
+        trace_vfio_pci_hot_reset_dep_devices_iommufd(devices[i].segment,
+                                             devices[i].bus,
+                                             PCI_SLOT(devices[i].devfn),
+                                             PCI_FUNC(devices[i].devfn),
+                                             devices[i].devid);
+
+        /*
+         * If a VFIO cdev device is resettable, all the dependent devices
+         * are either bound to same iommufd or within same iommu_groups as
+         * one of the iommufd bound devices.
+         */
+        assert(devices[i].devid != VFIO_PCI_DEVID_NOT_OWNED);
+
+        if (devices[i].devid == vdev->vbasedev.devid ||
+            devices[i].devid == VFIO_PCI_DEVID_OWNED) {
+            continue;
+        }
+
+        vbasedev_iter = vfio_pci_find_by_iommufd_devid(devices[i].devid);
+        if (!vbasedev_iter || !vbasedev_iter->dev->realized ||
+            vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+            continue;
+        }
+        tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+        if (single) {
+            ret = -EINVAL;
+            goto out_single;
+        }
+        vfio_pci_pre_reset(tmp);
+        tmp->vbasedev.needs_reset = false;
+        multi = true;
+    }
+
+    if (!single && !multi) {
+        ret = -EINVAL;
+        goto out_single;
+    }
+
+    /* Use zero length array for hot reset with iommufd backend */
+    reset = g_malloc0(sizeof(*reset));
+    reset->argsz = sizeof(*reset);
+
+    /* Bus reset! */
+    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+    g_free(reset);
+
+    trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
+                                    ret ? strerror(errno) : "Success");
+
+    /* Re-enable INTx on affected devices */
+    for (i = 0; i < info->count; i++) {
+        VFIOPCIDevice *tmp;
+        VFIODevice *vbasedev_iter;
+
+        if (devices[i].devid == vdev->vbasedev.devid ||
+            devices[i].devid == VFIO_PCI_DEVID_OWNED) {
+            continue;
+        }
+
+        vbasedev_iter = vfio_pci_find_by_iommufd_devid(devices[i].devid);
+        if (!vbasedev_iter || !vbasedev_iter->dev->realized ||
+            vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+            continue;
+        }
+        tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+        vfio_pci_post_reset(tmp);
+    }
+out_single:
+    if (!single) {
+        vfio_pci_post_reset(vdev);
+    }
+    g_free(info);
+
+    return ret;
+}
+
 const VFIOIOMMUOps vfio_iommufd_ops = {
     .dma_map = iommufd_map,
     .dma_unmap = iommufd_unmap,
     .attach_device = iommufd_attach_device,
     .detach_device = iommufd_detach_device,
+    .pci_hot_reset = iommufd_pci_hot_reset,
 };
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index a6194b7bfe..eb662fd086 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2374,7 +2374,7 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
     return 0;
 }
 
-static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
+void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 {
     PCIDevice *pdev = &vdev->pdev;
     uint16_t cmd;
@@ -2411,7 +2411,7 @@ static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
     vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
 }
 
-static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
+void vfio_pci_post_reset(VFIOPCIDevice *vdev)
 {
     Error *err = NULL;
     int nr;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index d85342b65f..e88a7d5ccc 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -34,6 +34,7 @@ vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"
 vfio_pci_hot_reset(const char *name, const char *type) " (%s) %s"
 vfio_pci_hot_reset_has_dep_devices(const char *name) "%s: hot reset dependent devices:"
 vfio_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int group_id) "\t%04x:%02x:%02x.%x group %d"
+vfio_pci_hot_reset_dep_devices_iommufd(int domain, int bus, int slot, int function, int dev_id) "\t%04x:%02x:%02x.%x devid %d"
 vfio_pci_hot_reset_result(const char *name, const char *result) "%s hot reset: %s"
 vfio_populate_device_config(const char *name, unsigned long size, unsigned long offset, unsigned long flags) "Device %s config:\n  size: 0x%lx, offset: 0x%lx, flags: 0x%lx"
 vfio_populate_device_get_irq_info_failure(const char *errstr) "VFIO_DEVICE_GET_IRQ_INFO failure: %s"
-- 
2.34.1




* [PATCH v4 34/41] vfio/pci: Allow the selection of a given iommu backend
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (32 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 33/41] vfio/iommufd: Enable pci hot reset through iommufd cdev interface Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 35/41] vfio/pci: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
                   ` (8 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

From: Eric Auger <eric.auger@redhat.com>

Now that we support two types of iommu backends, let's add the
capability to select one of them. The selection depends on whether an
iommufd object has been linked with the vfio-pci device:

If the user wants to use the legacy backend, it shall not
link the vfio-pci device with any iommufd object:

-device vfio-pci,host=0000:02:00.0

This is called the legacy mode/backend.

If the user wants to use the iommufd backend (/dev/iommu), it
shall pass an iommufd object id in the vfio-pci device options:

 -object iommufd,id=iommufd0
 -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
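
The selection rule can be sketched in a few lines of C. This is only
an illustration under assumed names: struct sketch_dev_opts and
sketch_backend_for() are hypothetical, and the real code keys off the
"iommufd" link property rather than a string.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of the relevant -device vfio-pci options. */
struct sketch_dev_opts {
    const char *host;        /* e.g. "0000:02:00.0" */
    const char *iommufd_id;  /* e.g. "iommufd0", NULL if not given */
};

/* The backend is implied by the presence of an iommufd link. */
static const char *sketch_backend_for(const struct sketch_dev_opts *o)
{
    assert(o != NULL);
    return o->iommufd_id ? "iommufd" : "legacy";
}
```

The point of the design is that no separate "backend=" option is
needed: linking an iommufd object is the selection.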

Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/pci.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index eb662fd086..7a6696ca55 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -42,6 +42,7 @@
 #include "qapi/error.h"
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
+#include "sysemu/iommufd.h"
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
@@ -3551,6 +3552,10 @@ static Property vfio_pci_dev_properties[] = {
      * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
      * DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
      */
+#ifdef CONFIG_IOMMUFD
+    DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
+                     TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.34.1




* [PATCH v4 35/41] vfio/pci: Make vfio cdev pre-openable by passing a file handle
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (33 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 34/41] vfio/pci: Allow the selection of a given iommu backend Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 36/41] vfio: Allow the selection of a given iommu backend for platform ap and ccw Zhenzhong Duan
                   ` (7 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

This gives management tools like libvirt a chance to open the vfio
cdev with privilege and pass the FD to qemu. This way qemu never needs
to have privilege to open a VFIO or iommu cdev node.

Together with the earlier support for pre-opening the /dev/iommu
device, a management tool can now pass a vfio device to an
unprivileged qemu. This mode is no longer considered for the
legacy backend, so let's remove the "TODO" comment.

Add a helper function vfio_device_get_name() to check the fd and get
the device name; it will also be used by other vfio devices.

There is no easy way to check if a device is mdev with FD passing,
so fail the x-balloon-allowed check unconditionally in this case.

There is also no easy way to get the BDF as a name with FD passing,
so we fake a name of the form VFIO_FD[fd].
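
The naming rule can be sketched in plain C as below. This is a
simplified stand-in, not the helper added by this patch:
sketch_device_name() is a hypothetical name, it skips the stat()
check, and it assumes the sysfs path has no trailing slash.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Fill buf with a device name:
 *  - no fd passed (fd < 0): last component of the sysfs path
 *  - fd passed: a fake name built from the fd number
 * Returns 0 on success, -1 if neither an fd nor a path is available.
 */
static int sketch_device_name(int fd, const char *sysfsdev,
                              char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (fd < 0) {
        const char *base;

        if (!sysfsdev) {
            return -1;
        }
        base = strrchr(sysfsdev, '/');
        base = base ? base + 1 : sysfsdev;
        snprintf(buf, len, "%s", base);
    } else {
        snprintf(buf, len, "VFIO_FD%d", fd);
    }
    return 0;
}
```

For example, fd = -1 with /sys/bus/pci/devices/0000:02:00.0 yields
"0000:02:00.0", while fd = 7 yields "VFIO_FD7".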

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-common.h |  1 +
 hw/vfio/helpers.c             | 33 +++++++++++++++++++++++++++++
 hw/vfio/iommufd.c             | 12 +++++++----
 hw/vfio/pci.c                 | 40 ++++++++++++++++++++++++-----------
 4 files changed, 70 insertions(+), 16 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 3f1a39a991..854c32e4ce 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -250,6 +250,7 @@ struct vfio_info_cap_header *
 vfio_get_device_info_cap(struct vfio_device_info *info, uint16_t id);
 struct vfio_info_cap_header *
 vfio_get_cap(void *ptr, uint32_t cap_offset, uint16_t id);
+int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
 #endif
 
 bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index 168847e7c5..044dbbc501 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -609,3 +609,36 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
 
     return ret;
 }
+
+int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
+{
+    struct stat st;
+
+    if (vbasedev->fd < 0) {
+        if (stat(vbasedev->sysfsdev, &st) < 0) {
+            error_setg_errno(errp, errno, "no such host device");
+            error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
+            return -errno;
+        }
+        /* User may specify a name, e.g: VFIO platform device */
+        if (!vbasedev->name) {
+            vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
+        }
+    }
+#ifdef CONFIG_IOMMUFD
+    else {
+        if (!vbasedev->iommufd) {
+            error_setg(errp, "Use FD passing only with iommufd backend");
+            return -EINVAL;
+        }
+        /*
+         * Give a name with fd so any function printing out vbasedev->name
+         * will not break.
+         */
+        if (!vbasedev->name) {
+            vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
+        }
+    }
+#endif
+    return 0;
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index aedfe31c3c..1fb1c7e853 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -399,11 +399,15 @@ static int iommufd_attach_device(const char *name, VFIODevice *vbasedev,
     uint32_t ioas_id;
     Error *err = NULL;
 
-    devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
-    if (devfd < 0) {
-        return devfd;
+    if (vbasedev->fd < 0) {
+        devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
+        if (devfd < 0) {
+            return devfd;
+        }
+        vbasedev->fd = devfd;
+    } else {
+        devfd = vbasedev->fd;
     }
-    vbasedev->fd = devfd;
 
     ret = iommufd_connect_and_bind(vbasedev, errp);
     if (ret) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 7a6696ca55..d8f658ea47 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -43,6 +43,7 @@
 #include "migration/blocker.h"
 #include "migration/qemu-file.h"
 #include "sysemu/iommufd.h"
+#include "monitor/monitor.h"
 
 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
 
@@ -3108,18 +3109,23 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     VFIODevice *vbasedev = &vdev->vbasedev;
     char *tmp, *subsys;
     Error *err = NULL;
-    struct stat st;
     int i, ret;
     bool is_mdev;
     char uuid[UUID_STR_LEN];
     char *name;
 
-    if (!vbasedev->sysfsdev) {
+    if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
         if (!(~vdev->host.domain || ~vdev->host.bus ||
               ~vdev->host.slot || ~vdev->host.function)) {
             error_setg(errp, "No provided host device");
+#ifdef CONFIG_IOMMUFD
+            error_append_hint(errp, "Use -device vfio-pci,host=DDDD:BB:DD.F, "
+                              "-device vfio-pci,sysfsdev=PATH_TO_DEVICE "
+                              "or -device vfio-pci,fd=DEVICE_FD\n");
+#else
             error_append_hint(errp, "Use -device vfio-pci,host=DDDD:BB:DD.F "
                               "or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
+#endif
             return;
         }
         vbasedev->sysfsdev =
@@ -3128,13 +3134,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
                             vdev->host.slot, vdev->host.function);
     }
 
-    if (stat(vbasedev->sysfsdev, &st) < 0) {
-        error_setg_errno(errp, errno, "no such host device");
-        error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
+    if (vfio_device_get_name(vbasedev, errp)) {
         return;
     }
-
-    vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
     vbasedev->ops = &vfio_pci_ops;
     vbasedev->type = VFIO_DEVICE_TYPE_PCI;
     vbasedev->dev = DEVICE(vdev);
@@ -3494,6 +3496,7 @@ static void vfio_instance_init(Object *obj)
     vdev->host.bus = ~0U;
     vdev->host.slot = ~0U;
     vdev->host.function = ~0U;
+    vdev->vbasedev.fd = -1;
 
     vdev->nv_gpudirect_clique = 0xFF;
 
@@ -3547,11 +3550,6 @@ static Property vfio_pci_dev_properties[] = {
                                    qdev_prop_nv_gpudirect_clique, uint8_t),
     DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
                                 OFF_AUTOPCIBAR_OFF),
-    /*
-     * TODO - support passed fds... is this necessary?
-     * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
-     * DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
-     */
 #ifdef CONFIG_IOMMUFD
     DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
                      TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
@@ -3559,6 +3557,21 @@ static Property vfio_pci_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+#ifdef CONFIG_IOMMUFD
+static void vfio_pci_set_fd(Object *obj, const char *str, Error **errp)
+{
+    VFIOPCIDevice *vdev = VFIO_PCI(obj);
+    int fd = -1;
+
+    fd = monitor_fd_param(monitor_cur(), str, errp);
+    if (fd == -1) {
+        error_prepend(errp, "Could not parse remote object fd %s: ", str);
+        return;
+    }
+    vdev->vbasedev.fd = fd;
+}
+#endif
+
 static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3566,6 +3579,9 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
 
     dc->reset = vfio_pci_reset;
     device_class_set_props(dc, vfio_pci_dev_properties);
+#ifdef CONFIG_IOMMUFD
+    object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
+#endif
     dc->desc = "VFIO-based PCI device assignment";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     pdc->realize = vfio_realize;
-- 
2.34.1




* [PATCH v4 36/41] vfio: Allow the selection of a given iommu backend for platform ap and ccw
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (34 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 35/41] vfio/pci: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-07 18:18   ` Cédric Le Goater
  2023-11-02  7:12 ` [PATCH v4 37/41] vfio/platform: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
                   ` (6 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Thomas Huth, Tony Krowiak, Halil Pasic,
	Jason Herne, Eric Farman, Matthew Rosato,
	open list:S390 general arch...

Previously we added support for selecting the iommu backend for the
vfio pci device. Now add the same support for the other devices:
platform, ap and ccw.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-platform.h | 1 +
 hw/vfio/ap.c                    | 5 +++++
 hw/vfio/ccw.c                   | 5 +++++
 hw/vfio/platform.c              | 4 ++++
 4 files changed, 15 insertions(+)

diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
index c414c3dffc..f57f4276f2 100644
--- a/include/hw/vfio/vfio-platform.h
+++ b/include/hw/vfio/vfio-platform.h
@@ -18,6 +18,7 @@
 
 #include "hw/sysbus.h"
 #include "hw/vfio/vfio-common.h"
+#include "sysemu/iommufd.h"
 #include "qemu/event_notifier.h"
 #include "qemu/queue.h"
 #include "qom/object.h"
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index bbf69ff55a..6a4186ccd3 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -15,6 +15,7 @@
 #include <sys/ioctl.h>
 #include "qapi/error.h"
 #include "hw/vfio/vfio-common.h"
+#include "sysemu/iommufd.h"
 #include "hw/s390x/ap-device.h"
 #include "qemu/error-report.h"
 #include "qemu/event_notifier.h"
@@ -204,6 +205,10 @@ static void vfio_ap_unrealize(DeviceState *dev)
 
 static Property vfio_ap_properties[] = {
     DEFINE_PROP_STRING("sysfsdev", VFIOAPDevice, vdev.sysfsdev),
+#ifdef CONFIG_IOMMUFD
+    DEFINE_PROP_LINK("iommufd", VFIOAPDevice, vdev.iommufd,
+                     TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index d857bb8d0f..7695ede0fc 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -21,6 +21,7 @@
 
 #include "qapi/error.h"
 #include "hw/vfio/vfio-common.h"
+#include "sysemu/iommufd.h"
 #include "hw/s390x/s390-ccw.h"
 #include "hw/s390x/vfio-ccw.h"
 #include "hw/qdev-properties.h"
@@ -677,6 +678,10 @@ static void vfio_ccw_unrealize(DeviceState *dev)
 static Property vfio_ccw_properties[] = {
     DEFINE_PROP_STRING("sysfsdev", VFIOCCWDevice, vdev.sysfsdev),
     DEFINE_PROP_BOOL("force-orb-pfch", VFIOCCWDevice, force_orb_pfch, false),
+#ifdef CONFIG_IOMMUFD
+    DEFINE_PROP_LINK("iommufd", VFIOCCWDevice, vdev.iommufd,
+                     TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
     DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 8e3d4ac458..a1c25e0337 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -649,6 +649,10 @@ static Property vfio_platform_dev_properties[] = {
     DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
                        mmap_timeout, 1100),
     DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
+#ifdef CONFIG_IOMMUFD
+    DEFINE_PROP_LINK("iommufd", VFIOPlatformDevice, vbasedev.iommufd,
+                     TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
     DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.34.1




* [PATCH v4 37/41] vfio/platform: Make vfio cdev pre-openable by passing a file handle
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (35 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 36/41] vfio: Allow the selection of a given iommu backend for platform ap and ccw Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-02  7:12 ` [PATCH v4 38/41] vfio/ap: " Zhenzhong Duan
                   ` (5 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

This gives management tools like libvirt a chance to open the vfio
cdev with privilege and pass the FD to qemu. This way qemu never needs
to have privilege to open a VFIO or iommu cdev node.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/platform.c | 41 +++++++++++++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 8 deletions(-)

diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index a1c25e0337..aa0b2b9583 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -35,6 +35,7 @@
 #include "hw/platform-bus.h"
 #include "hw/qdev-properties.h"
 #include "sysemu/kvm.h"
+#include "monitor/monitor.h"
 
 /*
  * Functions used whatever the injection method
@@ -529,14 +530,13 @@ static VFIODeviceOps vfio_platform_ops = {
  */
 static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
 {
-    struct stat st;
     int ret;
 
-    /* @sysfsdev takes precedence over @host */
-    if (vbasedev->sysfsdev) {
+    /* @fd takes precedence over @sysfsdev which takes precedence over @host */
+    if (vbasedev->fd < 0 && vbasedev->sysfsdev) {
         g_free(vbasedev->name);
         vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
-    } else {
+    } else if (vbasedev->fd < 0) {
         if (!vbasedev->name || strchr(vbasedev->name, '/')) {
             error_setg(errp, "wrong host device name");
             return -EINVAL;
@@ -546,10 +546,9 @@ static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
                                              vbasedev->name);
     }
 
-    if (stat(vbasedev->sysfsdev, &st) < 0) {
-        error_setg_errno(errp, errno,
-                         "failed to get the sysfs host device file status");
-        return -errno;
+    ret = vfio_device_get_name(vbasedev, errp);
+    if (ret) {
+        return ret;
     }
 
     ret = vfio_attach_device(vbasedev->name, vbasedev,
@@ -656,6 +655,28 @@ static Property vfio_platform_dev_properties[] = {
     DEFINE_PROP_END_OF_LIST(),
 };
 
+static void vfio_platform_instance_init(Object *obj)
+{
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(obj);
+
+    vdev->vbasedev.fd = -1;
+}
+
+#ifdef CONFIG_IOMMUFD
+static void vfio_platform_set_fd(Object *obj, const char *str, Error **errp)
+{
+    VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(obj);
+    int fd = -1;
+
+    fd = monitor_fd_param(monitor_cur(), str, errp);
+    if (fd == -1) {
+        error_prepend(errp, "Could not parse remote object fd %s: ", str);
+        return;
+    }
+    vdev->vbasedev.fd = fd;
+}
+#endif
+
 static void vfio_platform_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
@@ -663,6 +684,9 @@ static void vfio_platform_class_init(ObjectClass *klass, void *data)
 
     dc->realize = vfio_platform_realize;
     device_class_set_props(dc, vfio_platform_dev_properties);
+#ifdef CONFIG_IOMMUFD
+    object_class_property_add_str(klass, "fd", NULL, vfio_platform_set_fd);
+#endif
     dc->vmsd = &vfio_platform_vmstate;
     dc->desc = "VFIO-based platform device assignment";
     sbc->connect_irq_notifier = vfio_start_irqfd_injection;
@@ -675,6 +699,7 @@ static const TypeInfo vfio_platform_dev_info = {
     .name = TYPE_VFIO_PLATFORM,
     .parent = TYPE_SYS_BUS_DEVICE,
     .instance_size = sizeof(VFIOPlatformDevice),
+    .instance_init = vfio_platform_instance_init,
     .class_init = vfio_platform_class_init,
     .class_size = sizeof(VFIOPlatformDeviceClass),
 };
-- 
2.34.1




* [PATCH v4 38/41] vfio/ap: Make vfio cdev pre-openable by passing a file handle
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (36 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 37/41] vfio/platform: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
@ 2023-11-02  7:12 ` Zhenzhong Duan
  2023-11-07 18:19   ` Cédric Le Goater
  2023-11-02  7:13 ` [PATCH v4 39/41] vfio/ccw: " Zhenzhong Duan
                   ` (4 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:12 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Thomas Huth, Tony Krowiak, Halil Pasic,
	Jason Herne, open list:S390 general arch...

This gives management tools like libvirt a chance to open the vfio
cdev with privilege and pass the FD to qemu. This way qemu never needs
to have privilege to open a VFIO or iommu cdev node.

Opportunistically, remove some unnecessary double casts.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/ap.c | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 6a4186ccd3..0a810f8b88 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -29,6 +29,7 @@
 #include "hw/s390x/ap-bridge.h"
 #include "exec/address-spaces.h"
 #include "qom/object.h"
+#include "monitor/monitor.h"
 
 #define TYPE_VFIO_AP_DEVICE      "vfio-ap"
 
@@ -159,7 +160,10 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
     VFIOAPDevice *vapdev = VFIO_AP_DEVICE(dev);
     VFIODevice *vbasedev = &vapdev->vdev;
 
-    vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
+    if (vfio_device_get_name(vbasedev, errp)) {
+        return;
+    }
+
     vbasedev->ops = &vfio_ap_ops;
     vbasedev->type = VFIO_DEVICE_TYPE_AP;
     vbasedev->dev = dev;
@@ -229,11 +233,36 @@ static const VMStateDescription vfio_ap_vmstate = {
     .unmigratable = 1,
 };
 
+static void vfio_ap_instance_init(Object *obj)
+{
+    VFIOAPDevice *vapdev = VFIO_AP_DEVICE(obj);
+
+    vapdev->vdev.fd = -1;
+}
+
+#ifdef CONFIG_IOMMUFD
+static void vfio_ap_set_fd(Object *obj, const char *str, Error **errp)
+{
+    VFIOAPDevice *vapdev = VFIO_AP_DEVICE(obj);
+    int fd = -1;
+
+    fd = monitor_fd_param(monitor_cur(), str, errp);
+    if (fd == -1) {
+        error_prepend(errp, "Could not parse remote object fd %s: ", str);
+        return;
+    }
+    vapdev->vdev.fd = fd;
+}
+#endif
+
 static void vfio_ap_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
 
     device_class_set_props(dc, vfio_ap_properties);
+#ifdef CONFIG_IOMMUFD
+    object_class_property_add_str(klass, "fd", NULL, vfio_ap_set_fd);
+#endif
     dc->vmsd = &vfio_ap_vmstate;
     dc->desc = "VFIO-based AP device assignment";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
@@ -248,6 +277,7 @@ static const TypeInfo vfio_ap_info = {
     .name = TYPE_VFIO_AP_DEVICE,
     .parent = TYPE_AP_DEVICE,
     .instance_size = sizeof(VFIOAPDevice),
+    .instance_init = vfio_ap_instance_init,
     .class_init = vfio_ap_class_init,
 };
 
-- 
2.34.1




* [PATCH v4 39/41] vfio/ccw: Make vfio cdev pre-openable by passing a file handle
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (37 preceding siblings ...)
  2023-11-02  7:12 ` [PATCH v4 38/41] vfio/ap: " Zhenzhong Duan
@ 2023-11-02  7:13 ` Zhenzhong Duan
  2023-11-07 18:20   ` Cédric Le Goater
  2023-11-02  7:13 ` [PATCH v4 40/41] vfio: Make VFIOContainerBase pointer parameter const in VFIOIOMMUOps callbacks Zhenzhong Duan
                   ` (3 subsequent siblings)
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan, Eric Farman, Matthew Rosato, Thomas Huth,
	open list:vfio-ccw

This gives management tools like libvirt a chance to open the vfio
cdev with privilege and pass the FD to qemu. This way qemu never needs
to have privilege to open a VFIO or iommu cdev node.

Opportunistically, remove a redundant definition of TYPE_VFIO_CCW.

Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/ccw.c | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 7695ede0fc..a674bd8d6d 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -30,6 +30,7 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
+#include "monitor/monitor.h"
 
 struct VFIOCCWDevice {
     S390CCWDevice cdev;
@@ -589,11 +590,12 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
         }
     }
 
+    if (vfio_device_get_name(vbasedev, errp)) {
+        return;
+    }
+
     vbasedev->ops = &vfio_ccw_ops;
     vbasedev->type = VFIO_DEVICE_TYPE_CCW;
-    vbasedev->name = g_strdup_printf("%x.%x.%04x", vcdev->cdev.hostid.cssid,
-                           vcdev->cdev.hostid.ssid,
-                           vcdev->cdev.hostid.devid);
     vbasedev->dev = dev;
 
     /*
@@ -690,12 +692,37 @@ static const VMStateDescription vfio_ccw_vmstate = {
     .unmigratable = 1,
 };
 
+static void vfio_ccw_instance_init(Object *obj)
+{
+    VFIOCCWDevice *vcdev = VFIO_CCW(obj);
+
+    vcdev->vdev.fd = -1;
+}
+
+#ifdef CONFIG_IOMMUFD
+static void vfio_ccw_set_fd(Object *obj, const char *str, Error **errp)
+{
+    VFIOCCWDevice *vcdev = VFIO_CCW(obj);
+    int fd = -1;
+
+    fd = monitor_fd_param(monitor_cur(), str, errp);
+    if (fd == -1) {
+        error_prepend(errp, "Could not parse remote object fd %s:", str);
+        return;
+    }
+    vcdev->vdev.fd = fd;
+}
+#endif
+
 static void vfio_ccw_class_init(ObjectClass *klass, void *data)
 {
     DeviceClass *dc = DEVICE_CLASS(klass);
     S390CCWDeviceClass *cdc = S390_CCW_DEVICE_CLASS(klass);
 
     device_class_set_props(dc, vfio_ccw_properties);
+#ifdef CONFIG_IOMMUFD
+    object_class_property_add_str(klass, "fd", NULL, vfio_ccw_set_fd);
+#endif
     dc->vmsd = &vfio_ccw_vmstate;
     dc->desc = "VFIO-based subchannel assignment";
     set_bit(DEVICE_CATEGORY_MISC, dc->categories);
@@ -713,6 +740,7 @@ static const TypeInfo vfio_ccw_info = {
     .name = TYPE_VFIO_CCW,
     .parent = TYPE_S390_CCW,
     .instance_size = sizeof(VFIOCCWDevice),
+    .instance_init = vfio_ccw_instance_init,
     .class_init = vfio_ccw_class_init,
 };
 
-- 
2.34.1




* [PATCH v4 40/41] vfio: Make VFIOContainerBase pointer parameter const in VFIOIOMMUOps callbacks
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (38 preceding siblings ...)
  2023-11-02  7:13 ` [PATCH v4 39/41] vfio/ccw: " Zhenzhong Duan
@ 2023-11-02  7:13 ` Zhenzhong Duan
  2023-11-02  7:13 ` [PATCH v4 41/41] vfio: Compile out iommufd for PPC target Zhenzhong Duan
                   ` (2 subsequent siblings)
  42 siblings, 0 replies; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

Some of the callbacks in VFIOIOMMUOps take a VFIOContainerBase pointer
but only need read access to the sub-objects of VFIOContainerBase.
So make the VFIOContainerBase, VFIOContainer and VFIOIOMMUFDContainer
pointers const in these callbacks.

Local functions called by those callbacks need the same change to avoid
build errors.
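
As a rough sketch of the pattern (not the actual QEMU definitions), a
callback can take a const pointer to the embedded base object and still
recover a const pointer to the enclosing container. DemoBase,
DemoLegacy and demo_legacy_fd() are hypothetical names, and the
container_of() below is a simplified stand-in for the usual macro.

```c
#include <stddef.h>

/* Simplified stand-in for the usual container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical base object embedded in a backend-specific container. */
typedef struct DemoBase {
    unsigned long pgsizes;
} DemoBase;

typedef struct DemoLegacy {
    DemoBase bcontainer;    /* embedded base object */
    int fd;
} DemoLegacy;

/* Ops table whose callback promises not to modify the base object. */
typedef struct DemoOps {
    int (*get_fd)(const DemoBase *bcontainer);
} DemoOps;

static int demo_legacy_fd(const DemoBase *bcontainer)
{
    /* Keep the derived pointer const as well, so writes fail to build. */
    const DemoLegacy *container = container_of(bcontainer, DemoLegacy,
                                               bcontainer);

    return container->fd;
}

static const DemoOps demo_legacy_ops = {
    .get_fd = demo_legacy_fd,
};
```

Any helper that such a callback passes the pointer on to must also
accept a const pointer, which is why the local functions change too.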

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 include/hw/vfio/vfio-common.h         | 12 ++++++----
 include/hw/vfio/vfio-container-base.h | 12 ++++++----
 hw/vfio/common.c                      |  9 +++----
 hw/vfio/container-base.c              |  2 +-
 hw/vfio/container.c                   | 34 ++++++++++++++-------------
 hw/vfio/iommufd.c                     |  8 +++----
 6 files changed, 42 insertions(+), 35 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 854c32e4ce..2fc976c7f0 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -257,11 +257,13 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
 void vfio_migration_exit(VFIODevice *vbasedev);
 
 int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size);
-bool vfio_devices_all_running_and_mig_active(VFIOContainerBase *bcontainer);
-bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer);
-int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+bool
+vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer);
+bool
+vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
+int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                     VFIOBitmap *vbmap, hwaddr iova,
                                     hwaddr size);
-int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
-                                 uint64_t size, ram_addr_t ram_addr);
+int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
+                          uint64_t size, ram_addr_t ram_addr);
 #endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 45bb19c767..2ae297ccda 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -82,7 +82,7 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
                                        MemoryRegionSection *section);
 int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
                                            bool start);
-int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                       VFIOBitmap *vbmap,
                                       hwaddr iova, hwaddr size);
 
@@ -93,18 +93,20 @@ void vfio_container_destroy(VFIOContainerBase *bcontainer);
 
 struct VFIOIOMMUOps {
     /* basic feature */
-    int (*dma_map)(VFIOContainerBase *bcontainer,
+    int (*dma_map)(const VFIOContainerBase *bcontainer,
                    hwaddr iova, ram_addr_t size,
                    void *vaddr, bool readonly);
-    int (*dma_unmap)(VFIOContainerBase *bcontainer,
+    int (*dma_unmap)(const VFIOContainerBase *bcontainer,
                      hwaddr iova, ram_addr_t size,
                      IOMMUTLBEntry *iotlb);
     int (*attach_device)(const char *name, VFIODevice *vbasedev,
                          AddressSpace *as, Error **errp);
     void (*detach_device)(VFIODevice *vbasedev);
     /* migration feature */
-    int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
-    int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
+    int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
+                                   bool start);
+    int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
+                              VFIOBitmap *vbmap,
                               hwaddr iova, hwaddr size);
     /* PCI specific */
     int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a61dce2845..1c9203183d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -203,7 +203,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
     return true;
 }
 
-bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer)
+bool vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer)
 {
     VFIODevice *vbasedev;
 
@@ -220,7 +220,8 @@ bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer)
  * Check if all VFIO devices are running and migration is active, which is
  * essentially equivalent to the migration being in pre-copy phase.
  */
-bool vfio_devices_all_running_and_mig_active(VFIOContainerBase *bcontainer)
+bool
+vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer)
 {
     VFIODevice *vbasedev;
 
@@ -1138,7 +1139,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
     return 0;
 }
 
-int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                     VFIOBitmap *vbmap, hwaddr iova,
                                     hwaddr size)
 {
@@ -1161,7 +1162,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
     return 0;
 }
 
-int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
+int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
                           uint64_t size, ram_addr_t ram_addr)
 {
     bool all_device_dirty_tracking =
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index eee2dcfe76..1ffd25bbfa 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -63,7 +63,7 @@ int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
     return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
 }
 
-int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                       VFIOBitmap *vbmap,
                                       hwaddr iova, hwaddr size)
 {
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index f27cc15d09..31681db52b 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -61,11 +61,11 @@ static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
     }
 }
 
-static int vfio_dma_unmap_bitmap(VFIOContainer *container,
+static int vfio_dma_unmap_bitmap(const VFIOContainer *container,
                                  hwaddr iova, ram_addr_t size,
                                  IOMMUTLBEntry *iotlb)
 {
-    VFIOContainerBase *bcontainer = &container->bcontainer;
+    const VFIOContainerBase *bcontainer = &container->bcontainer;
     struct vfio_iommu_type1_dma_unmap *unmap;
     struct vfio_bitmap *bitmap;
     VFIOBitmap vbmap;
@@ -117,11 +117,12 @@ unmap_exit:
 /*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
-static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
-                                 ram_addr_t size, IOMMUTLBEntry *iotlb)
+static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
+                                 hwaddr iova, ram_addr_t size,
+                                 IOMMUTLBEntry *iotlb)
 {
-    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
-                                            bcontainer);
+    const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                                  bcontainer);
     struct vfio_iommu_type1_dma_unmap unmap = {
         .argsz = sizeof(unmap),
         .flags = 0,
@@ -174,11 +175,11 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
     return 0;
 }
 
-static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
+static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
                                ram_addr_t size, void *vaddr, bool readonly)
 {
-    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
-                                            bcontainer);
+    const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                                  bcontainer);
     struct vfio_iommu_type1_dma_map map = {
         .argsz = sizeof(map),
         .flags = VFIO_DMA_MAP_FLAG_READ,
@@ -207,11 +208,12 @@ static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
     return -errno;
 }
 
-static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
-                                               bool start)
+static int
+vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
+                                    bool start)
 {
-    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
-                                            bcontainer);
+    const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                                  bcontainer);
     int ret;
     struct vfio_iommu_type1_dirty_bitmap dirty = {
         .argsz = sizeof(dirty),
@@ -233,12 +235,12 @@ static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
     return ret;
 }
 
-static int vfio_legacy_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+static int vfio_legacy_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
                                           VFIOBitmap *vbmap,
                                           hwaddr iova, hwaddr size)
 {
-    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
-                                            bcontainer);
+    const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+                                                  bcontainer);
     struct vfio_iommu_type1_dirty_bitmap *dbitmap;
     struct vfio_iommu_type1_dirty_bitmap_get *range;
     int ret;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 1fb1c7e853..ea33e64933 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -26,10 +26,10 @@
 #include "qemu/chardev_open.h"
 #include "pci.h"
 
-static int iommufd_map(VFIOContainerBase *bcontainer, hwaddr iova,
+static int iommufd_map(const VFIOContainerBase *bcontainer, hwaddr iova,
                        ram_addr_t size, void *vaddr, bool readonly)
 {
-    VFIOIOMMUFDContainer *container =
+    const VFIOIOMMUFDContainer *container =
         container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
 
     return iommufd_backend_map_dma(container->be,
@@ -37,11 +37,11 @@ static int iommufd_map(VFIOContainerBase *bcontainer, hwaddr iova,
                                    iova, size, vaddr, readonly);
 }
 
-static int iommufd_unmap(VFIOContainerBase *bcontainer,
+static int iommufd_unmap(const VFIOContainerBase *bcontainer,
                          hwaddr iova, ram_addr_t size,
                          IOMMUTLBEntry *iotlb)
 {
-    VFIOIOMMUFDContainer *container =
+    const VFIOIOMMUFDContainer *container =
         container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
 
     /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
-- 
2.34.1




* [PATCH v4 41/41] vfio: Compile out iommufd for PPC target
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (39 preceding siblings ...)
  2023-11-02  7:13 ` [PATCH v4 40/41] vfio: Make VFIOContainerBase pointer parameter const in VFIOIOMMUOps callbacks Zhenzhong Duan
@ 2023-11-02  7:13 ` Zhenzhong Duan
  2023-11-07 13:44   ` Cédric Le Goater
  2023-11-06 14:23 ` [PATCH v4 00/41] vfio: Adopt iommufd Cédric Le Goater
  2023-11-07 18:28 ` Cédric Le Goater
  42 siblings, 1 reply; 114+ messages in thread
From: Zhenzhong Duan @ 2023-11-02  7:13 UTC (permalink / raw)
  To: qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Zhenzhong Duan

Since PPC doesn't support IOMMUFD, compile out the iommufd-related
code for that target.
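
A minimal sketch of how such a combined guard behaves, assuming
CONFIG_IOMMUFD and TARGET_PPC are plain preprocessor macros supplied by
the build. demo_pick_backend() is a hypothetical function used only to
show the effect; it is not part of the series.

```c
/*
 * Hypothetical helper: returns the name of the backend that would be
 * selected. The iommufd branch only exists when the feature is enabled
 * and the target is not PPC.
 */
static const char *demo_pick_backend(int has_iommufd_link)
{
#if defined(CONFIG_IOMMUFD) && !defined(TARGET_PPC)
    if (has_iommufd_link) {
        return "iommufd";
    }
#else
    (void)has_iommufd_link; /* branch compiled out; always use legacy */
#endif
    return "legacy";
}
```

When the guard is false, the iommufd branch is not compiled at all, so
nothing in the resulting object refers to iommufd symbols.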

Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 hw/vfio/common.c     | 2 +-
 hw/vfio/pci.c        | 2 +-
 hw/vfio/platform.c   | 2 +-
 backends/meson.build | 4 ++--
 hw/vfio/meson.build  | 2 +-
 5 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 1c9203183d..000717cef3 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1504,7 +1504,7 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
 {
     const VFIOIOMMUOps *ops;
 
-#ifdef CONFIG_IOMMUFD
+#if defined(CONFIG_IOMMUFD) && !defined(TARGET_PPC)
     if (vbasedev->iommufd) {
         ops = &vfio_iommufd_ops;
     } else
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d8f658ea47..2287e45119 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3550,7 +3550,7 @@ static Property vfio_pci_dev_properties[] = {
                                    qdev_prop_nv_gpudirect_clique, uint8_t),
     DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
                                 OFF_AUTOPCIBAR_OFF),
-#ifdef CONFIG_IOMMUFD
+#if defined(CONFIG_IOMMUFD) && !defined(TARGET_PPC)
     DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
                      TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
 #endif
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index aa0b2b9583..c8f4ae5a06 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -648,7 +648,7 @@ static Property vfio_platform_dev_properties[] = {
     DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
                        mmap_timeout, 1100),
     DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
-#ifdef CONFIG_IOMMUFD
+#if defined(CONFIG_IOMMUFD) && !defined(TARGET_PPC)
     DEFINE_PROP_LINK("iommufd", VFIOPlatformDevice, vbasedev.iommufd,
                      TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
 #endif
diff --git a/backends/meson.build b/backends/meson.build
index 05ac57ff15..9dbdfa87f7 100644
--- a/backends/meson.build
+++ b/backends/meson.build
@@ -21,9 +21,9 @@ if have_vhost_user
 endif
 system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost.c'))
 if have_iommufd
-  system_ss.add(files('iommufd.c'))
+  system_ss.add(when: 'TARGET_PPC', if_false: files('iommufd.c'))
 else
-  system_ss.add(files('iommufd-stub.c'))
+  system_ss.add(when: 'TARGET_PPC', if_false: files('iommufd-stub.c'))
 endif
 if have_vhost_user_crypto
   system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost-user.c'))
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index 9cae2c9e21..4423bb3cd4 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -8,7 +8,7 @@ vfio_ss.add(files(
   'migration.c',
 ))
 if have_iommufd
-  vfio_ss.add(files('iommufd.c'))
+  vfio_ss.add(when: 'TARGET_PPC', if_false: files('iommufd.c'))
 endif
 vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
   'display.c',
-- 
2.34.1




* Re: [PATCH v4 05/41] vfio/common: Move vfio_host_win_add/del into spapr.c
  2023-11-02  7:12 ` [PATCH v4 05/41] vfio/common: Move vfio_host_win_add/del into spapr.c Zhenzhong Duan
@ 2023-11-06  9:33   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06  9:33 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Nicholas Piggin, Daniel Henrique Barboza, David Gibson,
	Harsh Prateek Bora, open list:sPAPR (pseries)

On 11/2/23 08:12, Zhenzhong Duan wrote:
> Only spapr supports a custom host window list; other vfio drivers
> assume a 64-bit host window. So remove the check in the listener callback,
> move vfio_host_win_add/del into spapr.c and make them static.
> 
> With the check removed, we still need to do the same check for
> VFIO_SPAPR_TCE_IOMMU which allows a single host window range
> [dma32_window_start, dma32_window_size). Move vfio_find_hostwin
> into spapr.c and do same check in vfio_container_add_section_window
> instead.
> 
> When mapping a ram device section, the mapping is bypassed if the
> section is unaligned with hostwin->iova_pgsizes. With hostwin moved
> into spapr, the check now uses container->pgsizes instead.
> 
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
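
As a worked illustration of the alignment check described in the
quoted commit message (a sketch, not the patch itself): the mask is
derived from the smallest page size set in the pgsizes bitmap, and a
section is skipped if either its start or its size has bits under that
mask. demo_pgmask() and demo_is_aligned() are hypothetical names, and
__builtin_ctzll() (a GCC/Clang builtin) stands in for QEMU's ctz64().

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask covering the offset bits of the smallest supported page size. */
static uint64_t demo_pgmask(uint64_t pgsizes)
{
    /* The builtin is undefined for 0, so pgsizes must be non-zero. */
    return (1ULL << __builtin_ctzll(pgsizes)) - 1;
}

/* True if both the start address and the size are page aligned. */
static bool demo_is_aligned(uint64_t pgsizes, uint64_t iova, uint64_t size)
{
    uint64_t pgmask = demo_pgmask(pgsizes);

    return !((iova & pgmask) || (size & pgmask));
}
```

With a bitmap advertising 4 KiB and 2 MiB pages (0x201000), the mask
is 0xfff, so an IOVA of 0x10800 would be treated as unaligned.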


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
> v4: add vfio_find_hostwin back for VFIO_SPAPR_TCE_IOMMU
> 
>   include/hw/vfio/vfio-common.h |  5 ---
>   hw/vfio/common.c              | 70 +----------------------------
>   hw/vfio/container.c           | 16 -------
>   hw/vfio/spapr.c               | 83 +++++++++++++++++++++++++++++++++++
>   4 files changed, 85 insertions(+), 89 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 87848982bd..a4a22accb9 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -207,11 +207,6 @@ typedef struct {
>       hwaddr pages;
>   } VFIOBitmap;
>   
> -void vfio_host_win_add(VFIOContainer *container,
> -                       hwaddr min_iova, hwaddr max_iova,
> -                       uint64_t iova_pgsizes);
> -int vfio_host_win_del(VFIOContainer *container, hwaddr min_iova,
> -                      hwaddr max_iova);
>   VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
>   void vfio_put_address_space(VFIOAddressSpace *space);
>   bool vfio_devices_all_running_and_saving(VFIOContainer *container);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index e72055e752..e70fdf5e0c 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -245,44 +245,6 @@ bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
>       return true;
>   }
>   
> -void vfio_host_win_add(VFIOContainer *container, hwaddr min_iova,
> -                       hwaddr max_iova, uint64_t iova_pgsizes)
> -{
> -    VFIOHostDMAWindow *hostwin;
> -
> -    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> -        if (ranges_overlap(hostwin->min_iova,
> -                           hostwin->max_iova - hostwin->min_iova + 1,
> -                           min_iova,
> -                           max_iova - min_iova + 1)) {
> -            hw_error("%s: Overlapped IOMMU are not enabled", __func__);
> -        }
> -    }
> -
> -    hostwin = g_malloc0(sizeof(*hostwin));
> -
> -    hostwin->min_iova = min_iova;
> -    hostwin->max_iova = max_iova;
> -    hostwin->iova_pgsizes = iova_pgsizes;
> -    QLIST_INSERT_HEAD(&container->hostwin_list, hostwin, hostwin_next);
> -}
> -
> -int vfio_host_win_del(VFIOContainer *container,
> -                      hwaddr min_iova, hwaddr max_iova)
> -{
> -    VFIOHostDMAWindow *hostwin;
> -
> -    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> -        if (hostwin->min_iova == min_iova && hostwin->max_iova == max_iova) {
> -            QLIST_REMOVE(hostwin, hostwin_next);
> -            g_free(hostwin);
> -            return 0;
> -        }
> -    }
> -
> -    return -1;
> -}
> -
>   static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>   {
>       return (!memory_region_is_ram(section->mr) &&
> @@ -531,22 +493,6 @@ static void vfio_unregister_ram_discard_listener(VFIOContainer *container,
>       g_free(vrdl);
>   }
>   
> -static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
> -                                            hwaddr iova, hwaddr end)
> -{
> -    VFIOHostDMAWindow *hostwin;
> -    bool hostwin_found = false;
> -
> -    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> -        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
> -            hostwin_found = true;
> -            break;
> -        }
> -    }
> -
> -    return hostwin_found ? hostwin : NULL;
> -}
> -
>   static bool vfio_known_safe_misalignment(MemoryRegionSection *section)
>   {
>       MemoryRegion *mr = section->mr;
> @@ -625,7 +571,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>       Int128 llend, llsize;
>       void *vaddr;
>       int ret;
> -    VFIOHostDMAWindow *hostwin;
>       Error *err = NULL;
>   
>       if (!vfio_listener_valid_section(section, "region_add")) {
> @@ -647,13 +592,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           goto fail;
>       }
>   
> -    hostwin = vfio_find_hostwin(container, iova, end);
> -    if (!hostwin) {
> -        error_setg(&err, "Container %p can't map guest IOVA region"
> -                   " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container, iova, end);
> -        goto fail;
> -    }
> -
>       memory_region_ref(section->mr);
>   
>       if (memory_region_is_iommu(section->mr)) {
> @@ -734,7 +672,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>       llsize = int128_sub(llend, int128_make64(iova));
>   
>       if (memory_region_is_ram_device(section->mr)) {
> -        hwaddr pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
> +        hwaddr pgmask = (1ULL << ctz64(container->pgsizes)) - 1;
>   
>           if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
>               trace_vfio_listener_region_add_no_dma_map(
> @@ -833,12 +771,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
>   
>       if (memory_region_is_ram_device(section->mr)) {
>           hwaddr pgmask;
> -        VFIOHostDMAWindow *hostwin;
> -
> -        hostwin = vfio_find_hostwin(container, iova, end);
> -        assert(hostwin); /* or region_add() would have failed */
>   
> -        pgmask = (1ULL << ctz64(hostwin->iova_pgsizes)) - 1;
> +        pgmask = (1ULL << ctz64(container->pgsizes)) - 1;
>           try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
>       } else if (memory_region_has_ram_discard_manager(section->mr)) {
>           vfio_unregister_ram_discard_listener(container, section);
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 204b244b11..242010036a 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -551,7 +551,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       container->dma_max_mappings = 0;
>       container->iova_ranges = NULL;
>       QLIST_INIT(&container->giommu_list);
> -    QLIST_INIT(&container->hostwin_list);
>       QLIST_INIT(&container->vrdl_list);
>   
>       ret = vfio_init_container(container, group->fd, errp);
> @@ -591,14 +590,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>   
>           vfio_get_iommu_info_migration(container, info);
>           g_free(info);
> -
> -        /*
> -         * FIXME: We should parse VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE
> -         * information to get the actual window extent rather than assume
> -         * a 64-bit IOVA address space.
> -         */
> -        vfio_host_win_add(container, 0, (hwaddr)-1, container->pgsizes);
> -
>           break;
>       }
>       case VFIO_SPAPR_TCE_v2_IOMMU:
> @@ -687,7 +678,6 @@ static void vfio_disconnect_container(VFIOGroup *group)
>       if (QLIST_EMPTY(&container->group_list)) {
>           VFIOAddressSpace *space = container->space;
>           VFIOGuestIOMMU *giommu, *tmp;
> -        VFIOHostDMAWindow *hostwin, *next;
>   
>           QLIST_REMOVE(container, next);
>   
> @@ -698,12 +688,6 @@ static void vfio_disconnect_container(VFIOGroup *group)
>               g_free(giommu);
>           }
>   
> -        QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
> -                           next) {
> -            QLIST_REMOVE(hostwin, hostwin_next);
> -            g_free(hostwin);
> -        }
> -
>           trace_vfio_disconnect_container(container->fd);
>           close(container->fd);
>           vfio_free_container(container);
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 4428990c28..83da2f7ec2 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -146,6 +146,60 @@ static const MemoryListener vfio_prereg_listener = {
>       .region_del = vfio_prereg_listener_region_del,
>   };
>   
> +static void vfio_host_win_add(VFIOContainer *container, hwaddr min_iova,
> +                              hwaddr max_iova, uint64_t iova_pgsizes)
> +{
> +    VFIOHostDMAWindow *hostwin;
> +
> +    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> +        if (ranges_overlap(hostwin->min_iova,
> +                           hostwin->max_iova - hostwin->min_iova + 1,
> +                           min_iova,
> +                           max_iova - min_iova + 1)) {
> +            hw_error("%s: Overlapped IOMMU are not enabled", __func__);
> +        }
> +    }
> +
> +    hostwin = g_malloc0(sizeof(*hostwin));
> +
> +    hostwin->min_iova = min_iova;
> +    hostwin->max_iova = max_iova;
> +    hostwin->iova_pgsizes = iova_pgsizes;
> +    QLIST_INSERT_HEAD(&container->hostwin_list, hostwin, hostwin_next);
> +}
> +
> +static int vfio_host_win_del(VFIOContainer *container,
> +                             hwaddr min_iova, hwaddr max_iova)
> +{
> +    VFIOHostDMAWindow *hostwin;
> +
> +    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> +        if (hostwin->min_iova == min_iova && hostwin->max_iova == max_iova) {
> +            QLIST_REMOVE(hostwin, hostwin_next);
> +            g_free(hostwin);
> +            return 0;
> +        }
> +    }
> +
> +    return -1;
> +}
> +
> +static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
> +                                            hwaddr iova, hwaddr end)
> +{
> +    VFIOHostDMAWindow *hostwin;
> +    bool hostwin_found = false;
> +
> +    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> +        if (hostwin->min_iova <= iova && end <= hostwin->max_iova) {
> +            hostwin_found = true;
> +            break;
> +        }
> +    }
> +
> +    return hostwin_found ? hostwin : NULL;
> +}
> +
>   static int vfio_spapr_remove_window(VFIOContainer *container,
>                                       hwaddr offset_within_address_space)
>   {
> @@ -267,6 +321,26 @@ int vfio_container_add_section_window(VFIOContainer *container,
>       hwaddr pgsize = 0;
>       int ret;
>   
> +    /*
> +     * VFIO_SPAPR_TCE_IOMMU supports a single host window between
> +     * [dma32_window_start, dma32_window_size), we need to ensure
> +     * the section fall in this range.
> +     */
> +    if (container->iommu_type == VFIO_SPAPR_TCE_IOMMU) {
> +        hwaddr iova, end;
> +
> +        iova = section->offset_within_address_space;
> +        end = iova + int128_get64(section->size) - 1;
> +
> +        if (!vfio_find_hostwin(container, iova, end)) {
> +            error_setg(errp, "Container %p can't map guest IOVA region"
> +                       " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container,
> +                       iova, end);
> +            return -EINVAL;
> +        }
> +        return 0;
> +    }
> +
>       if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
>           return 0;
>       }
> @@ -351,6 +425,8 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>       bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
>       int ret, fd = container->fd;
>   
> +    QLIST_INIT(&container->hostwin_list);
> +
>       /*
>        * The host kernel code implementing VFIO_IOMMU_DISABLE is called
>        * when container fd is closed so we do not call it explicitly
> @@ -418,7 +494,14 @@ listener_unregister_exit:
>   
>   void vfio_spapr_container_deinit(VFIOContainer *container)
>   {
> +    VFIOHostDMAWindow *hostwin, *next;
> +
>       if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
>           memory_listener_unregister(&container->prereg_listener);
>       }
> +    QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
> +                       next) {
> +        QLIST_REMOVE(hostwin, hostwin_next);
> +        g_free(hostwin);
> +    }
>   }




* Re: [PATCH v4 00/41] vfio: Adopt iommufd
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (40 preceding siblings ...)
  2023-11-02  7:13 ` [PATCH v4 41/41] vfio: Compile out iommufd for PPC target Zhenzhong Duan
@ 2023-11-06 14:23 ` Cédric Le Goater
  2023-11-07 18:28 ` Cédric Le Goater
  42 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 14:23 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> Hi,
> 
> Thanks all for the guidance and comments on the previous series; here is
> v4 of the pure iommufd support part.
> 
> Based on Cédric's suggestion, this series includes an effort to remove
> spapr code from container.c. All spapr functions are now moved to spapr.c
> or spapr_pci_vfio.c, but a few trivial checks on VFIO_SPAPR_TCE_*_IOMMU
> remain; I am not sure they justify introducing many callbacks and
> duplicated code just to remove them. Some functions are moved to spapr.c
> instead of spapr_pci_vfio.c to avoid build issues, because
> spapr_pci_vfio.c is arch specific; otherwise we would need to introduce
> stubs for the moved spapr functions.
> 
> 
> PATCH 1-5: Move spapr functions to spapr*.c

PATCH 1-5 applied to vfio-next.

Thanks,

C.




* Re: [PATCH v4 06/41] vfio: Introduce base object for VFIOContainer and targeted interface
  2023-11-02  7:12 ` [PATCH v4 06/41] vfio: Introduce base object for VFIOContainer and targeted interface Zhenzhong Duan
@ 2023-11-06 16:36   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:36 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun

On 11/2/23 08:12, Zhenzhong Duan wrote:
> Introduce a minimal VFIOContainerBase object and its targeted interface.
> This is deliberately not a QOM object because we don't want it to be
> visible from the user interface. The VFIOContainerBase and its
> interface will be populated gradually in subsequent patches.
> 
> No functional change intended.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v4: use SPDX identifier, use const char *name parameter, HW_VFIO_VFIO_CONTAINER_BASE_H



Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


>   include/hw/vfio/vfio-common.h         |  8 ++---
>   include/hw/vfio/vfio-container-base.h | 50 +++++++++++++++++++++++++++
>   2 files changed, 52 insertions(+), 6 deletions(-)
>   create mode 100644 include/hw/vfio/vfio-container-base.h
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index a4a22accb9..586d153c12 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -30,6 +30,7 @@
>   #include <linux/vfio.h>
>   #endif
>   #include "sysemu/sysemu.h"
> +#include "hw/vfio/vfio-container-base.h"
>   
>   #define VFIO_MSG_PREFIX "vfio %s: "
>   
> @@ -81,6 +82,7 @@ typedef struct VFIOAddressSpace {
>   struct VFIOGroup;
>   
>   typedef struct VFIOContainer {
> +    VFIOContainerBase bcontainer;
>       VFIOAddressSpace *space;
>       int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>       MemoryListener listener;
> @@ -201,12 +203,6 @@ typedef struct VFIODisplay {
>       } dmabuf;
>   } VFIODisplay;
>   
> -typedef struct {
> -    unsigned long *bitmap;
> -    hwaddr size;
> -    hwaddr pages;
> -} VFIOBitmap;
> -
>   VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
>   void vfio_put_address_space(VFIOAddressSpace *space);
>   bool vfio_devices_all_running_and_saving(VFIOContainer *container);
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> new file mode 100644
> index 0000000000..1d6daaea5d
> --- /dev/null
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -0,0 +1,50 @@
> +/*
> + * VFIO BASE CONTAINER
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + * Copyright Red Hat, Inc. 2023
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *          Eric Auger <eric.auger@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef HW_VFIO_VFIO_CONTAINER_BASE_H
> +#define HW_VFIO_VFIO_CONTAINER_BASE_H
> +
> +#include "exec/memory.h"
> +
> +typedef struct VFIODevice VFIODevice;
> +typedef struct VFIOIOMMUOps VFIOIOMMUOps;
> +
> +typedef struct {
> +    unsigned long *bitmap;
> +    hwaddr size;
> +    hwaddr pages;
> +} VFIOBitmap;
> +
> +/*
> + * This is the base object for vfio container backends
> + */
> +typedef struct VFIOContainerBase {
> +    const VFIOIOMMUOps *ops;
> +} VFIOContainerBase;
> +
> +struct VFIOIOMMUOps {
> +    /* basic feature */
> +    int (*dma_map)(VFIOContainerBase *bcontainer,
> +                   hwaddr iova, ram_addr_t size,
> +                   void *vaddr, bool readonly);
> +    int (*dma_unmap)(VFIOContainerBase *bcontainer,
> +                     hwaddr iova, ram_addr_t size,
> +                     IOMMUTLBEntry *iotlb);
> +    int (*attach_device)(const char *name, VFIODevice *vbasedev,
> +                         AddressSpace *as, Error **errp);
> +    void (*detach_device)(VFIODevice *vbasedev);
> +    /* migration feature */
> +    int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
> +    int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
> +                              hwaddr iova, hwaddr size);
> +};
> +#endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
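
For readers following along, here is a minimal standalone sketch of the
embedding pattern the patch above relies on: a base object stored as a
member of a backend structure, with a container_of()-style macro to get
back from the base pointer to the backend. All Demo* names and
demo_container_of() are hypothetical stand-ins, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified local stand-in for a container_of() macro. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical base object shared by all backends. */
typedef struct DemoBase {
    int backend_id;
} DemoBase;

/* Hypothetical backend embedding the base as a regular member. */
typedef struct DemoLegacy {
    DemoBase base;
    int fd;
} DemoLegacy;

/* Generic code only ever sees a DemoBase pointer. */
static DemoBase *demo_to_base(DemoLegacy *legacy)
{
    return &legacy->base;
}

/* Backend code recovers its own structure from the base pointer. */
static DemoLegacy *demo_to_legacy(DemoBase *base)
{
    assert(base != NULL);
    return demo_container_of(base, DemoLegacy, base);
}
```

Because the conversion goes through offsetof(), it keeps working if the
embedded member is not the first field of the backend structure.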




* Re: [PATCH v4 07/41] vfio/container: Introduce a empty VFIOIOMMUOps
  2023-11-02  7:12 ` [PATCH v4 07/41] vfio/container: Introduce a empty VFIOIOMMUOps Zhenzhong Duan
@ 2023-11-06 16:36   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:36 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> This empty VFIOIOMMUOps, named vfio_legacy_ops, will hold all general
> IOMMU ops of the legacy container.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.



> ---
>   include/hw/vfio/vfio-common.h | 2 +-
>   hw/vfio/container.c           | 5 +++++
>   2 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 586d153c12..678161f207 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -255,7 +255,7 @@ typedef QLIST_HEAD(VFIOGroupList, VFIOGroup) VFIOGroupList;
>   typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
>   extern VFIOGroupList vfio_group_list;
>   extern VFIODeviceList vfio_device_list;
> -
> +extern const VFIOIOMMUOps vfio_legacy_ops;
>   extern const MemoryListener vfio_memory_listener;
>   extern int vfio_kvm_device_fd;
>   
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 242010036a..4bc43ddfa4 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -472,6 +472,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>                                     Error **errp)
>   {
>       VFIOContainer *container;
> +    VFIOContainerBase *bcontainer;
>       int ret, fd;
>       VFIOAddressSpace *space;
>   
> @@ -552,6 +553,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       container->iova_ranges = NULL;
>       QLIST_INIT(&container->giommu_list);
>       QLIST_INIT(&container->vrdl_list);
> +    bcontainer = &container->bcontainer;
> +    bcontainer->ops = &vfio_legacy_ops;
>   
>       ret = vfio_init_container(container, group->fd, errp);
>       if (ret) {
> @@ -933,3 +936,5 @@ void vfio_detach_device(VFIODevice *vbasedev)
>       vfio_put_base_device(vbasedev);
>       vfio_put_group(group);
>   }
> +
> +const VFIOIOMMUOps vfio_legacy_ops;
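
As a side note on why an "empty" ops table is safe to introduce first:
in C, a file-scope object defined without an initializer is
zero-initialized, so every callback in it starts out as NULL. The sketch
below illustrates that with hypothetical names (DemoOps, demo_empty_ops,
demo_ops_can_map); it is an illustration only, not the QEMU definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ops table, loosely shaped like a map/unmap interface. */
typedef struct DemoOps {
    int (*dma_map)(void *opaque, unsigned long iova, unsigned long size);
    int (*dma_unmap)(void *opaque, unsigned long iova, unsigned long size);
} DemoOps;

/*
 * No initializer: objects with static storage duration are
 * zero-initialized, so both callbacks are NULL here.
 */
const DemoOps demo_empty_ops;

/* Callers can test for a callback before relying on it. */
static bool demo_ops_can_map(const DemoOps *ops)
{
    assert(ops != NULL);
    return ops->dma_map != NULL;
}
```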




* Re: [PATCH v4 08/41] vfio/container: Switch to dma_map|unmap API
  2023-11-02  7:12 ` [PATCH v4 08/41] vfio/container: Switch to dma_map|unmap API Zhenzhong Duan
@ 2023-11-06 16:37   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:37 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun

On 11/2/23 08:12, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> No functional change intended.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v4: use SPDX identifier, use assert


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> 
>   include/hw/vfio/vfio-common.h         |  4 ---
>   include/hw/vfio/vfio-container-base.h |  7 +++++
>   hw/vfio/common.c                      | 45 +++++++++++++++------------
>   hw/vfio/container-base.c              | 32 +++++++++++++++++++
>   hw/vfio/container.c                   | 22 ++++++++-----
>   hw/vfio/meson.build                   |  1 +
>   hw/vfio/trace-events                  |  2 +-
>   7 files changed, 81 insertions(+), 32 deletions(-)
>   create mode 100644 hw/vfio/container-base.c
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 678161f207..24a26345e5 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -208,10 +208,6 @@ void vfio_put_address_space(VFIOAddressSpace *space);
>   bool vfio_devices_all_running_and_saving(VFIOContainer *container);
>   
>   /* container->fd */
> -int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
> -                   ram_addr_t size, IOMMUTLBEntry *iotlb);
> -int vfio_dma_map(VFIOContainer *container, hwaddr iova,
> -                 ram_addr_t size, void *vaddr, bool readonly);
>   int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start);
>   int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
>                               hwaddr iova, hwaddr size);
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 1d6daaea5d..56b033f59f 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -31,6 +31,13 @@ typedef struct VFIOContainerBase {
>       const VFIOIOMMUOps *ops;
>   } VFIOContainerBase;
>   
> +int vfio_container_dma_map(VFIOContainerBase *bcontainer,
> +                           hwaddr iova, ram_addr_t size,
> +                           void *vaddr, bool readonly);
> +int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
> +                             hwaddr iova, ram_addr_t size,
> +                             IOMMUTLBEntry *iotlb);
> +
>   struct VFIOIOMMUOps {
>       /* basic feature */
>       int (*dma_map)(VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index e70fdf5e0c..e610771888 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -292,7 +292,7 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
>   static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>   {
>       VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> -    VFIOContainer *container = giommu->container;
> +    VFIOContainerBase *bcontainer = &giommu->container->bcontainer;
>       hwaddr iova = iotlb->iova + giommu->iommu_offset;
>       void *vaddr;
>       int ret;
> @@ -322,21 +322,22 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>            * of vaddr will always be there, even if the memory object is
>            * destroyed and its backing memory munmap-ed.
>            */
> -        ret = vfio_dma_map(container, iova,
> -                           iotlb->addr_mask + 1, vaddr,
> -                           read_only);
> +        ret = vfio_container_dma_map(bcontainer, iova,
> +                                     iotlb->addr_mask + 1, vaddr,
> +                                     read_only);
>           if (ret) {
> -            error_report("vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> +            error_report("vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
>                            "0x%"HWADDR_PRIx", %p) = %d (%s)",
> -                         container, iova,
> +                         bcontainer, iova,
>                            iotlb->addr_mask + 1, vaddr, ret, strerror(-ret));
>           }
>       } else {
> -        ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
> +        ret = vfio_container_dma_unmap(bcontainer, iova,
> +                                       iotlb->addr_mask + 1, iotlb);
>           if (ret) {
> -            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> +            error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>                            "0x%"HWADDR_PRIx") = %d (%s)",
> -                         container, iova,
> +                         bcontainer, iova,
>                            iotlb->addr_mask + 1, ret, strerror(-ret));
>               vfio_set_migration_error(ret);
>           }
> @@ -355,9 +356,10 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>       int ret;
>   
>       /* Unmap with a single call. */
> -    ret = vfio_dma_unmap(vrdl->container, iova, size , NULL);
> +    ret = vfio_container_dma_unmap(&vrdl->container->bcontainer,
> +                                   iova, size , NULL);
>       if (ret) {
> -        error_report("%s: vfio_dma_unmap() failed: %s", __func__,
> +        error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
>                        strerror(-ret));
>       }
>   }
> @@ -385,8 +387,8 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>                  section->offset_within_address_space;
>           vaddr = memory_region_get_ram_ptr(section->mr) + start;
>   
> -        ret = vfio_dma_map(vrdl->container, iova, next - start,
> -                           vaddr, section->readonly);
> +        ret = vfio_container_dma_map(&vrdl->container->bcontainer, iova,
> +                                     next - start, vaddr, section->readonly);
>           if (ret) {
>               /* Rollback */
>               vfio_ram_discard_notify_discard(rdl, section);
> @@ -684,10 +686,11 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           }
>       }
>   
> -    ret = vfio_dma_map(container, iova, int128_get64(llsize),
> -                       vaddr, section->readonly);
> +    ret = vfio_container_dma_map(&container->bcontainer,
> +                                 iova, int128_get64(llsize), vaddr,
> +                                 section->readonly);
>       if (ret) {
> -        error_setg(&err, "vfio_dma_map(%p, 0x%"HWADDR_PRIx", "
> +        error_setg(&err, "vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
>                      "0x%"HWADDR_PRIx", %p) = %d (%s)",
>                      container, iova, int128_get64(llsize), vaddr, ret,
>                      strerror(-ret));
> @@ -784,18 +787,20 @@ static void vfio_listener_region_del(MemoryListener *listener,
>           if (int128_eq(llsize, int128_2_64())) {
>               /* The unmap ioctl doesn't accept a full 64-bit span. */
>               llsize = int128_rshift(llsize, 1);
> -            ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
> +            ret = vfio_container_dma_unmap(&container->bcontainer, iova,
> +                                           int128_get64(llsize), NULL);
>               if (ret) {
> -                error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> +                error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>                                "0x%"HWADDR_PRIx") = %d (%s)",
>                                container, iova, int128_get64(llsize), ret,
>                                strerror(-ret));
>               }
>               iova += int128_get64(llsize);
>           }
> -        ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
> +        ret = vfio_container_dma_unmap(&container->bcontainer, iova,
> +                                       int128_get64(llsize), NULL);
>           if (ret) {
> -            error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> +            error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>                            "0x%"HWADDR_PRIx") = %d (%s)",
>                            container, iova, int128_get64(llsize), ret,
>                            strerror(-ret));
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> new file mode 100644
> index 0000000000..55d3a35fa4
> --- /dev/null
> +++ b/hw/vfio/container-base.c
> @@ -0,0 +1,32 @@
> +/*
> + * VFIO BASE CONTAINER
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + * Copyright Red Hat, Inc. 2023
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *          Eric Auger <eric.auger@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qapi/error.h"
> +#include "qemu/error-report.h"
> +#include "hw/vfio/vfio-container-base.h"
> +
> +int vfio_container_dma_map(VFIOContainerBase *bcontainer,
> +                           hwaddr iova, ram_addr_t size,
> +                           void *vaddr, bool readonly)
> +{
> +    g_assert(bcontainer->ops->dma_map);
> +    return bcontainer->ops->dma_map(bcontainer, iova, size, vaddr, readonly);
> +}
> +
> +int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
> +                             hwaddr iova, ram_addr_t size,
> +                             IOMMUTLBEntry *iotlb)
> +{
> +    g_assert(bcontainer->ops->dma_unmap);
> +    return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
> +}
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 4bc43ddfa4..c04df26323 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -115,9 +115,11 @@ unmap_exit:
>   /*
>    * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
>    */
> -int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
> -                   ram_addr_t size, IOMMUTLBEntry *iotlb)
> +static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
> +                                 ram_addr_t size, IOMMUTLBEntry *iotlb)
>   {
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
>       struct vfio_iommu_type1_dma_unmap unmap = {
>           .argsz = sizeof(unmap),
>           .flags = 0,
> @@ -151,7 +153,7 @@ int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
>            */
>           if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
>               container->iommu_type == VFIO_TYPE1v2_IOMMU) {
> -            trace_vfio_dma_unmap_overflow_workaround();
> +            trace_vfio_legacy_dma_unmap_overflow_workaround();
>               unmap.size -= 1ULL << ctz64(container->pgsizes);
>               continue;
>           }
> @@ -170,9 +172,11 @@ int vfio_dma_unmap(VFIOContainer *container, hwaddr iova,
>       return 0;
>   }
>   
> -int vfio_dma_map(VFIOContainer *container, hwaddr iova,
> -                 ram_addr_t size, void *vaddr, bool readonly)
> +static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
> +                               ram_addr_t size, void *vaddr, bool readonly)
>   {
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
>       struct vfio_iommu_type1_dma_map map = {
>           .argsz = sizeof(map),
>           .flags = VFIO_DMA_MAP_FLAG_READ,
> @@ -191,7 +195,8 @@ int vfio_dma_map(VFIOContainer *container, hwaddr iova,
>        * the VGA ROM space.
>        */
>       if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
> -        (errno == EBUSY && vfio_dma_unmap(container, iova, size, NULL) == 0 &&
> +        (errno == EBUSY &&
> +         vfio_legacy_dma_unmap(bcontainer, iova, size, NULL) == 0 &&
>            ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
>           return 0;
>       }
> @@ -937,4 +942,7 @@ void vfio_detach_device(VFIODevice *vbasedev)
>       vfio_put_group(group);
>   }
>   
> -const VFIOIOMMUOps vfio_legacy_ops;
> +const VFIOIOMMUOps vfio_legacy_ops = {
> +    .dma_map = vfio_legacy_dma_map,
> +    .dma_unmap = vfio_legacy_dma_unmap,
> +};
> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
> index 2a6912c940..eb6ce6229d 100644
> --- a/hw/vfio/meson.build
> +++ b/hw/vfio/meson.build
> @@ -2,6 +2,7 @@ vfio_ss = ss.source_set()
>   vfio_ss.add(files(
>     'helpers.c',
>     'common.c',
> +  'container-base.c',
>     'container.c',
>     'spapr.c',
>     'migration.c',
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 0eb2387cf2..9f7fedee98 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -116,7 +116,7 @@ vfio_region_unmap(const char *name, unsigned long offset, unsigned long end) "Re
>   vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
>   vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
>   vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%08x"
> -vfio_dma_unmap_overflow_workaround(void) ""
> +vfio_legacy_dma_unmap_overflow_workaround(void) ""
>   vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start, uint64_t dirty_pages) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64" dirty_pages=%"PRIu64
>   vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
>   
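
The dispatch scheme in this patch can be sketched in isolation as
follows: a generic wrapper asserts that the backend provided the
callback and forwards to it, and the backend callback recovers its own
structure from the base pointer. The Demo* types, demo_container_dma_map()
and the mapped_bytes counter are hypothetical and exist only for this
illustration; no real ioctl is issued.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoBase DemoBase;

/* Hypothetical ops table with a single callback. */
typedef struct DemoOps {
    int (*dma_map)(DemoBase *base, unsigned long iova, unsigned long size);
} DemoOps;

struct DemoBase {
    const DemoOps *ops;
};

/* Generic entry point: check the callback exists, then forward. */
static int demo_container_dma_map(DemoBase *base, unsigned long iova,
                                  unsigned long size)
{
    assert(base->ops->dma_map);
    return base->ops->dma_map(base, iova, size);
}

/* Hypothetical backend embedding the base object. */
typedef struct DemoLegacy {
    DemoBase base;
    unsigned long mapped_bytes;
} DemoLegacy;

/* Backend callback: recover the backend, then do backend-specific work. */
static int demo_legacy_dma_map(DemoBase *base, unsigned long iova,
                               unsigned long size)
{
    DemoLegacy *legacy =
        (DemoLegacy *)((char *)base - offsetof(DemoLegacy, base));

    (void)iova; /* unused in this sketch */
    legacy->mapped_bytes += size;
    return 0;
}

static const DemoOps demo_legacy_ops = {
    .dma_map = demo_legacy_dma_map,
};
```

Callers only need the base pointer, so common code does not have to
know which backend is behind it.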




* Re: [PATCH v4 09/41] vfio/common: Introduce vfio_container_init/destroy helper
  2023-11-02  7:12 ` [PATCH v4 09/41] vfio/common: Introduce vfio_container_init/destroy helper Zhenzhong Duan
@ 2023-11-06 16:37   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:37 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> This adds two helper functions, vfio_container_init() and
> vfio_container_destroy(), which will be used by both the legacy and
> iommufd containers for base-container-specific initialization and release.
> 
> No functional change intended.
> 
> Suggested-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   include/hw/vfio/vfio-container-base.h | 4 ++++
>   hw/vfio/container-base.c              | 9 +++++++++
>   hw/vfio/container.c                   | 4 +++-
>   3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 56b033f59f..577f52ccbc 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -38,6 +38,10 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>                                hwaddr iova, ram_addr_t size,
>                                IOMMUTLBEntry *iotlb);
>   
> +void vfio_container_init(VFIOContainerBase *bcontainer,
> +                         const VFIOIOMMUOps *ops);
> +void vfio_container_destroy(VFIOContainerBase *bcontainer);
> +
>   struct VFIOIOMMUOps {
>       /* basic feature */
>       int (*dma_map)(VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 55d3a35fa4..e929435751 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -30,3 +30,12 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>       g_assert(bcontainer->ops->dma_unmap);
>       return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
>   }
> +
> +void vfio_container_init(VFIOContainerBase *bcontainer, const VFIOIOMMUOps *ops)
> +{
> +    bcontainer->ops = ops;
> +}
> +
> +void vfio_container_destroy(VFIOContainerBase *bcontainer)
> +{
> +}
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index c04df26323..32a0251dd1 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -559,7 +559,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       QLIST_INIT(&container->giommu_list);
>       QLIST_INIT(&container->vrdl_list);
>       bcontainer = &container->bcontainer;
> -    bcontainer->ops = &vfio_legacy_ops;
> +    vfio_container_init(bcontainer, &vfio_legacy_ops);
>   
>       ret = vfio_init_container(container, group->fd, errp);
>       if (ret) {
> @@ -661,6 +661,7 @@ put_space_exit:
>   static void vfio_disconnect_container(VFIOGroup *group)
>   {
>       VFIOContainer *container = group->container;
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>   
>       QLIST_REMOVE(group, container_next);
>       group->container = NULL;
> @@ -695,6 +696,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>               QLIST_REMOVE(giommu, giommu_next);
>               g_free(giommu);
>           }
> +        vfio_container_destroy(bcontainer);
>   
>           trace_vfio_disconnect_container(container->fd);
>           close(container->fd);




* Re: [PATCH v4 10/41] vfio/common: Move giommu_list in base container
  2023-11-02  7:12 ` [PATCH v4 10/41] vfio/common: Move giommu_list in base container Zhenzhong Duan
@ 2023-11-06 16:50   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:50 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun

On 11/2/23 08:12, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> Move the giommu_list field into the base container and store
> the base container in the VFIOGuestIOMMU.
> 
> No functional change intended.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   include/hw/vfio/vfio-common.h         |  9 ---------
>   include/hw/vfio/vfio-container-base.h |  9 +++++++++
>   hw/vfio/common.c                      | 17 +++++++++++------
>   hw/vfio/container-base.c              |  9 +++++++++
>   hw/vfio/container.c                   |  8 --------
>   5 files changed, 29 insertions(+), 23 deletions(-)


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.
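
For context on the teardown loop that this patch moves into
vfio_container_destroy(): freeing elements while walking a list requires
saving the successor before the current element goes away, which is what
the _SAFE iteration variant is for. Below is a minimal sketch of that
idea using a plain singly linked list rather than QEMU's QLIST macros;
DemoNode, DemoList and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct DemoNode {
    int id;
    struct DemoNode *next;
} DemoNode;

typedef struct DemoList {
    DemoNode *head;
} DemoList;

/* Allocate a node and link it at the head of the list. */
static void demo_list_insert_head(DemoList *list, int id)
{
    DemoNode *node = calloc(1, sizeof(*node));

    assert(node);
    node->id = id;
    node->next = list->head;
    list->head = node;
}

/* Free every node and return how many were released. */
static int demo_list_destroy(DemoList *list)
{
    DemoNode *node, *tmp;
    int freed = 0;

    for (node = list->head; node; node = tmp) {
        tmp = node->next; /* save the successor before freeing node */
        free(node);
        freed++;
    }
    list->head = NULL;
    return freed;
}
```

Reading node->next after free(node) would be a use-after-free, which is
why the successor is captured first.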


> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 24a26345e5..6be082b8f2 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -95,7 +95,6 @@ typedef struct VFIOContainer {
>       uint64_t max_dirty_bitmap_size;
>       unsigned long pgsizes;
>       unsigned int dma_max_mappings;
> -    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>       QLIST_HEAD(, VFIOGroup) group_list;
>       QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
> @@ -104,14 +103,6 @@ typedef struct VFIOContainer {
>       GList *iova_ranges;
>   } VFIOContainer;
>   
> -typedef struct VFIOGuestIOMMU {
> -    VFIOContainer *container;
> -    IOMMUMemoryRegion *iommu_mr;
> -    hwaddr iommu_offset;
> -    IOMMUNotifier n;
> -    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
> -} VFIOGuestIOMMU;
> -
>   typedef struct VFIORamDiscardListener {
>       VFIOContainer *container;
>       MemoryRegion *mr;
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 577f52ccbc..a11aec5755 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -29,8 +29,17 @@ typedef struct {
>    */
>   typedef struct VFIOContainerBase {
>       const VFIOIOMMUOps *ops;
> +    QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>   } VFIOContainerBase;
>   
> +typedef struct VFIOGuestIOMMU {
> +    VFIOContainerBase *bcontainer;
> +    IOMMUMemoryRegion *iommu_mr;
> +    hwaddr iommu_offset;
> +    IOMMUNotifier n;
> +    QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
> +} VFIOGuestIOMMU;
> +
>   int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>                              hwaddr iova, ram_addr_t size,
>                              void *vaddr, bool readonly);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index e610771888..43580bcc43 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -292,7 +292,7 @@ static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
>   static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>   {
>       VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> -    VFIOContainerBase *bcontainer = &giommu->container->bcontainer;
> +    VFIOContainerBase *bcontainer = giommu->bcontainer;
>       hwaddr iova = iotlb->iova + giommu->iommu_offset;
>       void *vaddr;
>       int ret;
> @@ -569,6 +569,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>                                        MemoryRegionSection *section)
>   {
>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>       hwaddr iova, end;
>       Int128 llend, llsize;
>       void *vaddr;
> @@ -612,7 +613,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           giommu->iommu_mr = iommu_mr;
>           giommu->iommu_offset = section->offset_within_address_space -
>                                  section->offset_within_region;
> -        giommu->container = container;
> +        giommu->bcontainer = bcontainer;
>           llend = int128_add(int128_make64(section->offset_within_region),
>                              section->size);
>           llend = int128_sub(llend, int128_one());
> @@ -647,7 +648,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>               g_free(giommu);
>               goto fail;
>           }
> -        QLIST_INSERT_HEAD(&container->giommu_list, giommu, giommu_next);
> +        QLIST_INSERT_HEAD(&bcontainer->giommu_list, giommu, giommu_next);
>           memory_region_iommu_replay(giommu->iommu_mr, &giommu->n);
>   
>           return;
> @@ -732,6 +733,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>                                        MemoryRegionSection *section)
>   {
>       VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>       hwaddr iova, end;
>       Int128 llend, llsize;
>       int ret;
> @@ -744,7 +746,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>       if (memory_region_is_iommu(section->mr)) {
>           VFIOGuestIOMMU *giommu;
>   
> -        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> +        QLIST_FOREACH(giommu, &bcontainer->giommu_list, giommu_next) {
>               if (MEMORY_REGION(giommu->iommu_mr) == section->mr &&
>                   giommu->n.start == section->offset_within_region) {
>                   memory_region_unregister_iommu_notifier(section->mr,
> @@ -1206,7 +1208,9 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>       vfio_giommu_dirty_notifier *gdn = container_of(n,
>                                                   vfio_giommu_dirty_notifier, n);
>       VFIOGuestIOMMU *giommu = gdn->giommu;
> -    VFIOContainer *container = giommu->container;
> +    VFIOContainerBase *bcontainer = giommu->bcontainer;
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
>       hwaddr iova = iotlb->iova + giommu->iommu_offset;
>       ram_addr_t translated_addr;
>       int ret = -EINVAL;
> @@ -1284,12 +1288,13 @@ static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
>   static int vfio_sync_dirty_bitmap(VFIOContainer *container,
>                                     MemoryRegionSection *section)
>   {
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>       ram_addr_t ram_addr;
>   
>       if (memory_region_is_iommu(section->mr)) {
>           VFIOGuestIOMMU *giommu;
>   
> -        QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> +        QLIST_FOREACH(giommu, &bcontainer->giommu_list, giommu_next) {
>               if (MEMORY_REGION(giommu->iommu_mr) == section->mr &&
>                   giommu->n.start == section->offset_within_region) {
>                   Int128 llend;
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index e929435751..20bcb9669a 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -34,8 +34,17 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>   void vfio_container_init(VFIOContainerBase *bcontainer, const VFIOIOMMUOps *ops)
>   {
>       bcontainer->ops = ops;
> +    QLIST_INIT(&bcontainer->giommu_list);
>   }
>   
>   void vfio_container_destroy(VFIOContainerBase *bcontainer)
>   {
> +    VFIOGuestIOMMU *giommu, *tmp;
> +
> +    QLIST_FOREACH_SAFE(giommu, &bcontainer->giommu_list, giommu_next, tmp) {
> +        memory_region_unregister_iommu_notifier(
> +                MEMORY_REGION(giommu->iommu_mr), &giommu->n);
> +        QLIST_REMOVE(giommu, giommu_next);
> +        g_free(giommu);
> +    }
>   }
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 32a0251dd1..133d3c8f5c 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -556,7 +556,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       container->dirty_pages_supported = false;
>       container->dma_max_mappings = 0;
>       container->iova_ranges = NULL;
> -    QLIST_INIT(&container->giommu_list);
>       QLIST_INIT(&container->vrdl_list);
>       bcontainer = &container->bcontainer;
>       vfio_container_init(bcontainer, &vfio_legacy_ops);
> @@ -686,16 +685,9 @@ static void vfio_disconnect_container(VFIOGroup *group)
>   
>       if (QLIST_EMPTY(&container->group_list)) {
>           VFIOAddressSpace *space = container->space;
> -        VFIOGuestIOMMU *giommu, *tmp;
>   
>           QLIST_REMOVE(container, next);
>   
> -        QLIST_FOREACH_SAFE(giommu, &container->giommu_list, giommu_next, tmp) {
> -            memory_region_unregister_iommu_notifier(
> -                    MEMORY_REGION(giommu->iommu_mr), &giommu->n);
> -            QLIST_REMOVE(giommu, giommu_next);
> -            g_free(giommu);
> -        }
>           vfio_container_destroy(bcontainer);
>   
>           trace_vfio_disconnect_container(container->fd);

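For readers following the hunks above: giommu_list is now initialized in vfio_container_init() and torn down in vfio_container_destroy(), using a traversal that stays valid while entries are freed. Below is a minimal standalone sketch of that pattern. It does not use QEMU's QLIST macros; Notifier, Base, base_init(), base_add() and base_destroy() are hypothetical stand-ins invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for VFIOGuestIOMMU: one node of an intrusive list. */
typedef struct Notifier {
    struct Notifier *next;
    int id;
} Notifier;

/* Hypothetical stand-in for the base container that owns the list. */
typedef struct Base {
    Notifier *head;
} Base;

/* The owner of the list initializes it (compare QLIST_INIT in the init hook). */
static void base_init(Base *b)
{
    b->head = NULL;
}

/* Insert at the head; returns 0 on success, -1 on allocation failure. */
static int base_add(Base *b, int id)
{
    Notifier *n = malloc(sizeof(*n));

    if (!n) {
        return -1;
    }
    n->id = id;
    n->next = b->head;
    b->head = n;
    return 0;
}

/* Free every node and return how many were freed. */
static size_t base_destroy(Base *b)
{
    size_t freed = 0;
    Notifier *n = b->head;

    while (n) {
        /* Save the successor first: n must not be read after free(). */
        Notifier *tmp = n->next;

        free(n);
        freed++;
        n = tmp;
    }
    b->head = NULL;
    return freed;
}
```

The point of the sketch is only the ownership rule: whichever object embeds the list head also initializes and drains it, so a second backend sharing the base object gets the same teardown for free.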



* Re: [PATCH v4 11/41] vfio/container: Move space field to base container
  2023-11-02  7:12 ` [PATCH v4 11/41] vfio/container: Move space field to " Zhenzhong Duan
@ 2023-11-06 16:50   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:50 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Nicholas Piggin, Daniel Henrique Barboza,
	Cédric Le Goater, David Gibson, Harsh Prateek Bora,
	open list:sPAPR (pseries)

On 11/2/23 08:12, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> Move the space field to the base object. Also the VFIOAddressSpace
> now contains a list of base containers.
> 
> No functional change intended.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v4: use bcontainer->space->as instead of container->bcontainer.space->as
> 
>   include/hw/vfio/vfio-common.h         |  8 --------
>   include/hw/vfio/vfio-container-base.h |  9 +++++++++
>   hw/ppc/spapr_pci_vfio.c               | 10 +++++-----
>   hw/vfio/common.c                      |  4 ++--
>   hw/vfio/container-base.c              |  6 +++++-
>   hw/vfio/container.c                   | 18 ++++++++----------
>   6 files changed, 29 insertions(+), 26 deletions(-)


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.



> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 6be082b8f2..bd4de6cb3a 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -73,17 +73,10 @@ typedef struct VFIOMigration {
>       bool initial_data_sent;
>   } VFIOMigration;
>   
> -typedef struct VFIOAddressSpace {
> -    AddressSpace *as;
> -    QLIST_HEAD(, VFIOContainer) containers;
> -    QLIST_ENTRY(VFIOAddressSpace) list;
> -} VFIOAddressSpace;
> -
>   struct VFIOGroup;
>   
>   typedef struct VFIOContainer {
>       VFIOContainerBase bcontainer;
> -    VFIOAddressSpace *space;
>       int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>       MemoryListener listener;
>       MemoryListener prereg_listener;
> @@ -98,7 +91,6 @@ typedef struct VFIOContainer {
>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>       QLIST_HEAD(, VFIOGroup) group_list;
>       QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
> -    QLIST_ENTRY(VFIOContainer) next;
>       QLIST_HEAD(, VFIODevice) device_list;
>       GList *iova_ranges;
>   } VFIOContainer;
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index a11aec5755..c7cc6ec9c5 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -24,12 +24,20 @@ typedef struct {
>       hwaddr pages;
>   } VFIOBitmap;
>   
> +typedef struct VFIOAddressSpace {
> +    AddressSpace *as;
> +    QLIST_HEAD(, VFIOContainerBase) containers;
> +    QLIST_ENTRY(VFIOAddressSpace) list;
> +} VFIOAddressSpace;
> +
>   /*
>    * This is the base object for vfio container backends
>    */
>   typedef struct VFIOContainerBase {
>       const VFIOIOMMUOps *ops;
> +    VFIOAddressSpace *space;
>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
> +    QLIST_ENTRY(VFIOContainerBase) next;
>   } VFIOContainerBase;
>   
>   typedef struct VFIOGuestIOMMU {
> @@ -48,6 +56,7 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>                                IOMMUTLBEntry *iotlb);
>   
>   void vfio_container_init(VFIOContainerBase *bcontainer,
> +                         VFIOAddressSpace *space,
>                            const VFIOIOMMUOps *ops);
>   void vfio_container_destroy(VFIOContainerBase *bcontainer);
>   
> diff --git a/hw/ppc/spapr_pci_vfio.c b/hw/ppc/spapr_pci_vfio.c
> index f283f7e38d..d1d07bec46 100644
> --- a/hw/ppc/spapr_pci_vfio.c
> +++ b/hw/ppc/spapr_pci_vfio.c
> @@ -84,27 +84,27 @@ static int vfio_eeh_container_op(VFIOContainer *container, uint32_t op)
>   static VFIOContainer *vfio_eeh_as_container(AddressSpace *as)
>   {
>       VFIOAddressSpace *space = vfio_get_address_space(as);
> -    VFIOContainer *container = NULL;
> +    VFIOContainerBase *bcontainer = NULL;
>   
>       if (QLIST_EMPTY(&space->containers)) {
>           /* No containers to act on */
>           goto out;
>       }
>   
> -    container = QLIST_FIRST(&space->containers);
> +    bcontainer = QLIST_FIRST(&space->containers);
>   
> -    if (QLIST_NEXT(container, next)) {
> +    if (QLIST_NEXT(bcontainer, next)) {
>           /*
>            * We don't yet have logic to synchronize EEH state across
>            * multiple containers
>            */
> -        container = NULL;
> +        bcontainer = NULL;
>           goto out;
>       }
>   
>   out:
>       vfio_put_address_space(space);
> -    return container;
> +    return container_of(bcontainer, VFIOContainer, bcontainer);
>   }
>   
>   static bool vfio_eeh_as_ok(AddressSpace *as)
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 43580bcc43..1d8202537e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -145,7 +145,7 @@ void vfio_unblock_multiple_devices_migration(void)
>   
>   bool vfio_viommu_preset(VFIODevice *vbasedev)
>   {
> -    return vbasedev->container->space->as != &address_space_memory;
> +    return vbasedev->container->bcontainer.space->as != &address_space_memory;
>   }
>   
>   static void vfio_set_migration_error(int err)
> @@ -922,7 +922,7 @@ static void vfio_dirty_tracking_init(VFIOContainer *container,
>       dirty.container = container;
>   
>       memory_listener_register(&dirty.listener,
> -                             container->space->as);
> +                             container->bcontainer.space->as);
>   
>       *ranges = dirty.ranges;
>   
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 20bcb9669a..3933391e0d 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -31,9 +31,11 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>       return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
>   }
>   
> -void vfio_container_init(VFIOContainerBase *bcontainer, const VFIOIOMMUOps *ops)
> +void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
> +                         const VFIOIOMMUOps *ops)
>   {
>       bcontainer->ops = ops;
> +    bcontainer->space = space;
>       QLIST_INIT(&bcontainer->giommu_list);
>   }
>   
> @@ -41,6 +43,8 @@ void vfio_container_destroy(VFIOContainerBase *bcontainer)
>   {
>       VFIOGuestIOMMU *giommu, *tmp;
>   
> +    QLIST_REMOVE(bcontainer, next);
> +
>       QLIST_FOREACH_SAFE(giommu, &bcontainer->giommu_list, giommu_next, tmp) {
>           memory_region_unregister_iommu_notifier(
>                   MEMORY_REGION(giommu->iommu_mr), &giommu->n);
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 133d3c8f5c..f12fcb6fe1 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -514,7 +514,8 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>        * details once we know which type of IOMMU we are using.
>        */
>   
> -    QLIST_FOREACH(container, &space->containers, next) {
> +    QLIST_FOREACH(bcontainer, &space->containers, next) {
> +        container = container_of(bcontainer, VFIOContainer, bcontainer);
>           if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
>               ret = vfio_ram_block_discard_disable(container, true);
>               if (ret) {
> @@ -550,7 +551,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       }
>   
>       container = g_malloc0(sizeof(*container));
> -    container->space = space;
>       container->fd = fd;
>       container->error = NULL;
>       container->dirty_pages_supported = false;
> @@ -558,7 +558,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       container->iova_ranges = NULL;
>       QLIST_INIT(&container->vrdl_list);
>       bcontainer = &container->bcontainer;
> -    vfio_container_init(bcontainer, &vfio_legacy_ops);
> +    vfio_container_init(bcontainer, space, &vfio_legacy_ops);
>   
>       ret = vfio_init_container(container, group->fd, errp);
>       if (ret) {
> @@ -613,14 +613,14 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       vfio_kvm_device_add_group(group);
>   
>       QLIST_INIT(&container->group_list);
> -    QLIST_INSERT_HEAD(&space->containers, container, next);
> +    QLIST_INSERT_HEAD(&space->containers, bcontainer, next);
>   
>       group->container = container;
>       QLIST_INSERT_HEAD(&container->group_list, group, container_next);
>   
>       container->listener = vfio_memory_listener;
>   
> -    memory_listener_register(&container->listener, container->space->as);
> +    memory_listener_register(&container->listener, bcontainer->space->as);
>   
>       if (container->error) {
>           ret = -1;
> @@ -634,7 +634,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       return 0;
>   listener_release_exit:
>       QLIST_REMOVE(group, container_next);
> -    QLIST_REMOVE(container, next);
> +    QLIST_REMOVE(bcontainer, next);
>       vfio_kvm_device_del_group(group);
>       memory_listener_unregister(&container->listener);
>       if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU ||
> @@ -684,9 +684,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>       }
>   
>       if (QLIST_EMPTY(&container->group_list)) {
> -        VFIOAddressSpace *space = container->space;
> -
> -        QLIST_REMOVE(container, next);
> +        VFIOAddressSpace *space = bcontainer->space;
>   
>           vfio_container_destroy(bcontainer);
>   
> @@ -707,7 +705,7 @@ static VFIOGroup *vfio_get_group(int groupid, AddressSpace *as, Error **errp)
>       QLIST_FOREACH(group, &vfio_group_list, next) {
>           if (group->groupid == groupid) {
>               /* Found it.  Now is it already in the right context? */
> -            if (group->container->space->as == as) {
> +            if (group->container->bcontainer.space->as == as) {
>                   return group;
>               } else {
>                   error_setg(errp, "group %d used in multiple address spaces",

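With the list in VFIOAddressSpace now holding base containers, callers that need the legacy container walk the base list and convert back with container_of(), as the spapr_pci_vfio.c and container.c hunks above do. The following standalone sketch shows that conversion. The types Base and Legacy and the helpers legacy_from_base() and single_legacy() are hypothetical names for illustration, and the embedded member is deliberately not placed first in the struct. The sketch checks for NULL explicitly, because container_of() applied to a NULL pointer only yields NULL when the member sits at offset zero.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as QEMU's container_of(): recover the outer struct from a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical base object; the list linkage lives here. */
typedef struct Base {
    struct Base *next;
} Base;

/* Hypothetical backend object embedding the base (not at offset 0). */
typedef struct Legacy {
    int fd;
    Base base;
} Legacy;

/* Convert a base pointer to its enclosing Legacy, preserving NULL. */
static Legacy *legacy_from_base(Base *b)
{
    return b ? container_of(b, Legacy, base) : NULL;
}

/*
 * Return the backend object only when the list holds exactly one entry,
 * loosely mirroring the "no containers" / "multiple containers" checks.
 */
static Legacy *single_legacy(Base *head)
{
    if (!head || head->next) {
        return NULL;
    }
    return legacy_from_base(head);
}
```

A caller would pass the list head, for example single_legacy(&a.base), and treat a NULL result as "nothing to act on".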



* Re: [PATCH v4 12/41] vfio/container: Switch to IOMMU BE set_dirty_page_tracking/query_dirty_bitmap API
  2023-11-02  7:12 ` [PATCH v4 12/41] vfio/container: Switch to IOMMU BE set_dirty_page_tracking/query_dirty_bitmap API Zhenzhong Duan
@ 2023-11-06 16:50   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:50 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun

On 11/2/23 08:12, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> The dirty_pages_supported field is also moved to the base container.
> 
> No functional change intended.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v4: use assert


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> 
>   include/hw/vfio/vfio-common.h         |  6 ------
>   include/hw/vfio/vfio-container-base.h |  6 ++++++
>   hw/vfio/common.c                      | 12 ++++++++----
>   hw/vfio/container-base.c              | 16 ++++++++++++++++
>   hw/vfio/container.c                   | 21 ++++++++++++++-------
>   5 files changed, 44 insertions(+), 17 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index bd4de6cb3a..60f2785fe0 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -83,7 +83,6 @@ typedef struct VFIOContainer {
>       unsigned iommu_type;
>       Error *error;
>       bool initialized;
> -    bool dirty_pages_supported;
>       uint64_t dirty_pgsizes;
>       uint64_t max_dirty_bitmap_size;
>       unsigned long pgsizes;
> @@ -190,11 +189,6 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
>   void vfio_put_address_space(VFIOAddressSpace *space);
>   bool vfio_devices_all_running_and_saving(VFIOContainer *container);
>   
> -/* container->fd */
> -int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start);
> -int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
> -                            hwaddr iova, hwaddr size);
> -
>   /* SPAPR specific */
>   int vfio_container_add_section_window(VFIOContainer *container,
>                                         MemoryRegionSection *section,
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index c7cc6ec9c5..f244f003d0 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -36,6 +36,7 @@ typedef struct VFIOAddressSpace {
>   typedef struct VFIOContainerBase {
>       const VFIOIOMMUOps *ops;
>       VFIOAddressSpace *space;
> +    bool dirty_pages_supported;
>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>       QLIST_ENTRY(VFIOContainerBase) next;
>   } VFIOContainerBase;
> @@ -54,6 +55,11 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>   int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>                                hwaddr iova, ram_addr_t size,
>                                IOMMUTLBEntry *iotlb);
> +int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> +                                           bool start);
> +int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +                                      VFIOBitmap *vbmap,
> +                                      hwaddr iova, hwaddr size);
>   
>   void vfio_container_init(VFIOContainerBase *bcontainer,
>                            VFIOAddressSpace *space,
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 1d8202537e..b1a875ca93 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1079,7 +1079,8 @@ static void vfio_listener_log_global_start(MemoryListener *listener)
>       if (vfio_devices_all_device_dirty_tracking(container)) {
>           ret = vfio_devices_dma_logging_start(container);
>       } else {
> -        ret = vfio_set_dirty_page_tracking(container, true);
> +        ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
> +                                                     true);
>       }
>   
>       if (ret) {
> @@ -1097,7 +1098,8 @@ static void vfio_listener_log_global_stop(MemoryListener *listener)
>       if (vfio_devices_all_device_dirty_tracking(container)) {
>           vfio_devices_dma_logging_stop(container);
>       } else {
> -        ret = vfio_set_dirty_page_tracking(container, false);
> +        ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
> +                                                     false);
>       }
>   
>       if (ret) {
> @@ -1165,7 +1167,8 @@ int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>       VFIOBitmap vbmap;
>       int ret;
>   
> -    if (!container->dirty_pages_supported && !all_device_dirty_tracking) {
> +    if (!container->bcontainer.dirty_pages_supported &&
> +        !all_device_dirty_tracking) {
>           cpu_physical_memory_set_dirty_range(ram_addr, size,
>                                               tcg_enabled() ? DIRTY_CLIENTS_ALL :
>                                               DIRTY_CLIENTS_NOCODE);
> @@ -1180,7 +1183,8 @@ int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
>       if (all_device_dirty_tracking) {
>           ret = vfio_devices_query_dirty_bitmap(container, &vbmap, iova, size);
>       } else {
> -        ret = vfio_query_dirty_bitmap(container, &vbmap, iova, size);
> +        ret = vfio_container_query_dirty_bitmap(&container->bcontainer, &vbmap,
> +                                                iova, size);
>       }
>   
>       if (ret) {
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 3933391e0d..5d654ae172 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -31,11 +31,27 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>       return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
>   }
>   
> +int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> +                                           bool start)
> +{
> +    g_assert(bcontainer->ops->set_dirty_page_tracking);
> +    return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
> +}
> +
> +int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +                                      VFIOBitmap *vbmap,
> +                                      hwaddr iova, hwaddr size)
> +{
> +    g_assert(bcontainer->ops->query_dirty_bitmap);
> +    return bcontainer->ops->query_dirty_bitmap(bcontainer, vbmap, iova, size);
> +}
> +
>   void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
>                            const VFIOIOMMUOps *ops)
>   {
>       bcontainer->ops = ops;
>       bcontainer->space = space;
> +    bcontainer->dirty_pages_supported = false;
>       QLIST_INIT(&bcontainer->giommu_list);
>   }
>   
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index f12fcb6fe1..3ab74e2615 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -131,7 +131,7 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
>   
>       if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
>           if (!vfio_devices_all_device_dirty_tracking(container) &&
> -            container->dirty_pages_supported) {
> +            container->bcontainer.dirty_pages_supported) {
>               return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
>           }
>   
> @@ -205,14 +205,17 @@ static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
>       return -errno;
>   }
>   
> -int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
> +static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> +                                               bool start)
>   {
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
>       int ret;
>       struct vfio_iommu_type1_dirty_bitmap dirty = {
>           .argsz = sizeof(dirty),
>       };
>   
> -    if (!container->dirty_pages_supported) {
> +    if (!bcontainer->dirty_pages_supported) {
>           return 0;
>       }
>   
> @@ -232,9 +235,12 @@ int vfio_set_dirty_page_tracking(VFIOContainer *container, bool start)
>       return ret;
>   }
>   
> -int vfio_query_dirty_bitmap(VFIOContainer *container, VFIOBitmap *vbmap,
> -                            hwaddr iova, hwaddr size)
> +static int vfio_legacy_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +                                          VFIOBitmap *vbmap,
> +                                          hwaddr iova, hwaddr size)
>   {
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
>       struct vfio_iommu_type1_dirty_bitmap *dbitmap;
>       struct vfio_iommu_type1_dirty_bitmap_get *range;
>       int ret;
> @@ -461,7 +467,7 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
>        * qemu_real_host_page_size to mark those dirty.
>        */
>       if (cap_mig->pgsize_bitmap & qemu_real_host_page_size()) {
> -        container->dirty_pages_supported = true;
> +        container->bcontainer.dirty_pages_supported = true;
>           container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
>           container->dirty_pgsizes = cap_mig->pgsize_bitmap;
>       }
> @@ -553,7 +559,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       container = g_malloc0(sizeof(*container));
>       container->fd = fd;
>       container->error = NULL;
> -    container->dirty_pages_supported = false;
>       container->dma_max_mappings = 0;
>       container->iova_ranges = NULL;
>       QLIST_INIT(&container->vrdl_list);
> @@ -937,4 +942,6 @@ void vfio_detach_device(VFIODevice *vbasedev)
>   const VFIOIOMMUOps vfio_legacy_ops = {
>       .dma_map = vfio_legacy_dma_map,
>       .dma_unmap = vfio_legacy_dma_unmap,
> +    .set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
> +    .query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
>   };

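The patch above routes dirty tracking through the ops table: a thin wrapper asserts that the backend provides the callback and then dispatches to it, and the legacy callback returns 0 early when dirty_pages_supported is false. Here is a standalone sketch of that dispatch shape. Ops, Base, Legacy and the functions below are hypothetical simplifications, and the ioctl is replaced by a plain flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct Base Base;

/* Hypothetical ops table with a single mandatory callback. */
typedef struct Ops {
    int (*set_dirty_tracking)(Base *b, bool start);
} Ops;

struct Base {
    const Ops *ops;
    bool dirty_pages_supported;
};

/* Hypothetical backend; 'tracking' stands in for kernel-side state. */
typedef struct Legacy {
    Base base;
    int tracking;
} Legacy;

static int legacy_set_dirty_tracking(Base *b, bool start)
{
    Legacy *l = container_of(b, Legacy, base);

    if (!b->dirty_pages_supported) {
        return 0; /* nothing to do, reported as success */
    }
    l->tracking = start ? 1 : 0;
    return 0;
}

static const Ops legacy_ops = {
    .set_dirty_tracking = legacy_set_dirty_tracking,
};

static void base_init(Base *b, const Ops *ops)
{
    b->ops = ops;
    b->dirty_pages_supported = false;
}

/* Generic entry point: the callback is mandatory, so assert and dispatch. */
static int base_set_dirty_tracking(Base *b, bool start)
{
    assert(b->ops->set_dirty_tracking);
    return b->ops->set_dirty_tracking(b, start);
}
```

Asserting rather than returning an error treats a missing callback as a programming mistake in the backend, which seems to be the intent of the "v4: use assert" note.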



* Re: [PATCH v4 15/41] vfio/container: Move pgsizes and dma_max_mappings to base container
  2023-11-02  7:12 ` [PATCH v4 15/41] vfio/container: Move pgsizes and dma_max_mappings " Zhenzhong Duan
@ 2023-11-06 16:53   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:53 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Nicholas Piggin, Daniel Henrique Barboza, David Gibson,
	Harsh Prateek Bora, open list:sPAPR (pseries)

On 11/2/23 08:12, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> No functional change intended.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.

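As background for the pgsizes users touched by the patch quoted below: pgsizes is a bitmap of supported IOMMU page sizes, and expressions such as (1ULL << ctz64(pgsizes)) - 1 derive an alignment mask from the smallest supported size. A small standalone sketch of that arithmetic follows. min_page_size() and is_aligned() are hypothetical helpers, and the lowest set bit is isolated with portable bit arithmetic instead of QEMU's ctz64().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Smallest supported page size in a page-size bitmap.
 * For a non-zero bitmap this equals 1ULL << ctz64(pgsizes).
 * The caller is expected to pass a non-zero bitmap.
 */
static uint64_t min_page_size(uint64_t pgsizes)
{
    return pgsizes & (~pgsizes + 1); /* isolate the lowest set bit */
}

/* True when both iova and size are multiples of the smallest page size. */
static bool is_aligned(uint64_t pgsizes, uint64_t iova, uint64_t size)
{
    uint64_t pgmask = min_page_size(pgsizes) - 1;

    return !((iova & pgmask) || (size & pgmask));
}
```

For a bitmap advertising 4 KiB and 2 MiB pages, the mask is 0xfff, so a region starting at 0x2800 is rejected while one starting at 0x2000 with size 0x1000 passes.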


> ---
> v4: Split vrdl_list change out into a separate patch
> 
>   include/hw/vfio/vfio-common.h         |  2 --
>   include/hw/vfio/vfio-container-base.h |  2 ++
>   hw/vfio/common.c                      | 17 +++++++++--------
>   hw/vfio/container-base.c              |  1 +
>   hw/vfio/container.c                   | 11 +++++------
>   hw/vfio/spapr.c                       | 10 ++++++----
>   6 files changed, 23 insertions(+), 20 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index bc67e1316c..d3dc2f9dcb 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -85,8 +85,6 @@ typedef struct VFIOContainer {
>       bool initialized;
>       uint64_t dirty_pgsizes;
>       uint64_t max_dirty_bitmap_size;
> -    unsigned long pgsizes;
> -    unsigned int dma_max_mappings;
>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>       QLIST_HEAD(, VFIOGroup) group_list;
>       QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 7090962496..85ec7e1a56 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -36,6 +36,8 @@ typedef struct VFIOAddressSpace {
>   typedef struct VFIOContainerBase {
>       const VFIOIOMMUOps *ops;
>       VFIOAddressSpace *space;
> +    unsigned long pgsizes;
> +    unsigned int dma_max_mappings;
>       bool dirty_pages_supported;
>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
>       QLIST_ENTRY(VFIOContainerBase) next;
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index cf6618f6ed..1cb53d369e 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -401,6 +401,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>   static void vfio_register_ram_discard_listener(VFIOContainer *container,
>                                                  MemoryRegionSection *section)
>   {
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>       RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
>       VFIORamDiscardListener *vrdl;
>   
> @@ -419,8 +420,8 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
>                                                                   section->mr);
>   
>       g_assert(vrdl->granularity && is_power_of_2(vrdl->granularity));
> -    g_assert(container->pgsizes &&
> -             vrdl->granularity >= 1ULL << ctz64(container->pgsizes));
> +    g_assert(bcontainer->pgsizes &&
> +             vrdl->granularity >= 1ULL << ctz64(bcontainer->pgsizes));
>   
>       ram_discard_listener_init(&vrdl->listener,
>                                 vfio_ram_discard_notify_populate,
> @@ -441,7 +442,7 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
>        * number of sections in the address space we could have over time,
>        * also consuming DMA mappings.
>        */
> -    if (container->dma_max_mappings) {
> +    if (bcontainer->dma_max_mappings) {
>           unsigned int vrdl_count = 0, vrdl_mappings = 0, max_memslots = 512;
>   
>   #ifdef CONFIG_KVM
> @@ -462,11 +463,11 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
>           }
>   
>           if (vrdl_mappings + max_memslots - vrdl_count >
> -            container->dma_max_mappings) {
> +            bcontainer->dma_max_mappings) {
>               warn_report("%s: possibly running out of DMA mappings. E.g., try"
>                           " increasing the 'block-size' of virtio-mem devies."
>                           " Maximum possible DMA mappings: %d, Maximum possible"
> -                        " memslots: %d", __func__, container->dma_max_mappings,
> +                        " memslots: %d", __func__, bcontainer->dma_max_mappings,
>                           max_memslots);
>           }
>       }
> @@ -626,7 +627,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>                               iommu_idx);
>   
>           ret = memory_region_iommu_set_page_size_mask(giommu->iommu_mr,
> -                                                     container->pgsizes,
> +                                                     bcontainer->pgsizes,
>                                                        &err);
>           if (ret) {
>               g_free(giommu);
> @@ -675,7 +676,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>       llsize = int128_sub(llend, int128_make64(iova));
>   
>       if (memory_region_is_ram_device(section->mr)) {
> -        hwaddr pgmask = (1ULL << ctz64(container->pgsizes)) - 1;
> +        hwaddr pgmask = (1ULL << ctz64(bcontainer->pgsizes)) - 1;
>   
>           if ((iova & pgmask) || (int128_get64(llsize) & pgmask)) {
>               trace_vfio_listener_region_add_no_dma_map(
> @@ -777,7 +778,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>       if (memory_region_is_ram_device(section->mr)) {
>           hwaddr pgmask;
>   
> -        pgmask = (1ULL << ctz64(container->pgsizes)) - 1;
> +        pgmask = (1ULL << ctz64(bcontainer->pgsizes)) - 1;
>           try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
>       } else if (memory_region_has_ram_discard_manager(section->mr)) {
>           vfio_unregister_ram_discard_listener(container, section);
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 5d654ae172..dcce111349 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -52,6 +52,7 @@ void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
>       bcontainer->ops = ops;
>       bcontainer->space = space;
>       bcontainer->dirty_pages_supported = false;
> +    bcontainer->dma_max_mappings = 0;
>       QLIST_INIT(&bcontainer->giommu_list);
>   }
>   
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 7bd81eab09..c5a6262882 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -154,7 +154,7 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
>           if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
>               container->iommu_type == VFIO_TYPE1v2_IOMMU) {
>               trace_vfio_legacy_dma_unmap_overflow_workaround();
> -            unmap.size -= 1ULL << ctz64(container->pgsizes);
> +            unmap.size -= 1ULL << ctz64(bcontainer->pgsizes);
>               continue;
>           }
>           error_report("VFIO_UNMAP_DMA failed: %s", strerror(errno));
> @@ -559,7 +559,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       container = g_malloc0(sizeof(*container));
>       container->fd = fd;
>       container->error = NULL;
> -    container->dma_max_mappings = 0;
>       container->iova_ranges = NULL;
>       QLIST_INIT(&container->vrdl_list);
>       bcontainer = &container->bcontainer;
> @@ -589,13 +588,13 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>           }
>   
>           if (info->flags & VFIO_IOMMU_INFO_PGSIZES) {
> -            container->pgsizes = info->iova_pgsizes;
> +            bcontainer->pgsizes = info->iova_pgsizes;
>           } else {
> -            container->pgsizes = qemu_real_host_page_size();
> +            bcontainer->pgsizes = qemu_real_host_page_size();
>           }
>   
> -        if (!vfio_get_info_dma_avail(info, &container->dma_max_mappings)) {
> -            container->dma_max_mappings = 65535;
> +        if (!vfio_get_info_dma_avail(info, &bcontainer->dma_max_mappings)) {
> +            bcontainer->dma_max_mappings = 65535;
>           }
>   
>           vfio_get_info_iova_range(info, container);
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 83da2f7ec2..4f76bdd3ca 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -226,6 +226,7 @@ static int vfio_spapr_create_window(VFIOContainer *container,
>                                       hwaddr *pgsize)
>   {
>       int ret = 0;
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>       IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
>       uint64_t pagesize = memory_region_iommu_get_min_page_size(iommu_mr), pgmask;
>       unsigned entries, bits_total, bits_per_level, max_levels;
> @@ -239,13 +240,13 @@ static int vfio_spapr_create_window(VFIOContainer *container,
>       if (pagesize > rampagesize) {
>           pagesize = rampagesize;
>       }
> -    pgmask = container->pgsizes & (pagesize | (pagesize - 1));
> +    pgmask = bcontainer->pgsizes & (pagesize | (pagesize - 1));
>       pagesize = pgmask ? (1ULL << (63 - clz64(pgmask))) : 0;
>       if (!pagesize) {
>           error_report("Host doesn't support page size 0x%"PRIx64
>                        ", the supported mask is 0x%lx",
>                        memory_region_iommu_get_min_page_size(iommu_mr),
> -                     container->pgsizes);
> +                     bcontainer->pgsizes);
>           return -EINVAL;
>       }
>   
> @@ -421,6 +422,7 @@ void vfio_container_del_section_window(VFIOContainer *container,
>   
>   int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>   {
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>       struct vfio_iommu_spapr_tce_info info;
>       bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
>       int ret, fd = container->fd;
> @@ -461,7 +463,7 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>       }
>   
>       if (v2) {
> -        container->pgsizes = info.ddw.pgsizes;
> +        bcontainer->pgsizes = info.ddw.pgsizes;
>           /*
>            * There is a default window in just created container.
>            * To make region_add/del simpler, we better remove this
> @@ -476,7 +478,7 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>           }
>       } else {
>           /* The default table uses 4K pages */
> -        container->pgsizes = 0x1000;
> +        bcontainer->pgsizes = 0x1000;
>           vfio_host_win_add(container, info.dma32_window_start,
>                             info.dma32_window_start +
>                             info.dma32_window_size - 1,
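
For readers following the pgsizes move: the field is a bitmap of supported IOMMU page sizes, and the hunks above derive either the smallest supported size (1ULL << ctz64(pgsizes)) or the largest supported size not exceeding a limit (1ULL << (63 - clz64(pgmask))). The sketch below reproduces that arithmetic in portable C. The helper names are hypothetical and are not part of QEMU, which uses ctz64()/clz64() directly.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not QEMU API. They mirror the bit arithmetic
 * applied to the pgsizes bitmap in the quoted hunks.
 */

/* Smallest supported page size: same value as 1ULL << ctz64(pgsizes). */
static inline uint64_t min_page_size(uint64_t pgsizes)
{
    assert(pgsizes != 0);
    /* Two's-complement trick that isolates the lowest set bit. */
    return pgsizes & (~pgsizes + 1);
}

/*
 * Largest supported page size that does not exceed 'limit' (a power of
 * two), or 0 if the host supports none. Same result as masking with
 * (limit | (limit - 1)) and taking 1ULL << (63 - clz64(mask)).
 */
static inline uint64_t max_page_size_upto(uint64_t pgsizes, uint64_t limit)
{
    uint64_t mask = pgsizes & (limit | (limit - 1));
    uint64_t best = 0;

    while (mask) {
        best = mask & (~mask + 1);   /* current lowest set bit */
        mask &= mask - 1;            /* clear it and continue upwards */
    }
    return best;                     /* last bit seen is the highest one */
}
```

With a bitmap advertising 4K and 2M pages, min_page_size() yields 0x1000, which is the amount the unmap overflow workaround subtracts from unmap.size.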




* Re: [PATCH v4 16/41] vfio/container: Move vrdl_list to base container
  2023-11-02  7:12 ` [PATCH v4 16/41] vfio/container: Move vrdl_list " Zhenzhong Duan
@ 2023-11-06 16:53   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:53 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> No functional change intended.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   include/hw/vfio/vfio-common.h         | 11 --------
>   include/hw/vfio/vfio-container-base.h | 11 ++++++++
>   hw/vfio/common.c                      | 38 +++++++++++++--------------
>   hw/vfio/container-base.c              |  1 +
>   hw/vfio/container.c                   |  1 -
>   5 files changed, 31 insertions(+), 31 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index d3dc2f9dcb..8a607a4c17 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -87,20 +87,9 @@ typedef struct VFIOContainer {
>       uint64_t max_dirty_bitmap_size;
>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>       QLIST_HEAD(, VFIOGroup) group_list;
> -    QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
>       GList *iova_ranges;
>   } VFIOContainer;
>   
> -typedef struct VFIORamDiscardListener {
> -    VFIOContainer *container;
> -    MemoryRegion *mr;
> -    hwaddr offset_within_address_space;
> -    hwaddr size;
> -    uint64_t granularity;
> -    RamDiscardListener listener;
> -    QLIST_ENTRY(VFIORamDiscardListener) next;
> -} VFIORamDiscardListener;
> -
>   typedef struct VFIOHostDMAWindow {
>       hwaddr min_iova;
>       hwaddr max_iova;
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 85ec7e1a56..8e05b5ac5a 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -40,6 +40,7 @@ typedef struct VFIOContainerBase {
>       unsigned int dma_max_mappings;
>       bool dirty_pages_supported;
>       QLIST_HEAD(, VFIOGuestIOMMU) giommu_list;
> +    QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
>       QLIST_ENTRY(VFIOContainerBase) next;
>       QLIST_HEAD(, VFIODevice) device_list;
>   } VFIOContainerBase;
> @@ -52,6 +53,16 @@ typedef struct VFIOGuestIOMMU {
>       QLIST_ENTRY(VFIOGuestIOMMU) giommu_next;
>   } VFIOGuestIOMMU;
>   
> +typedef struct VFIORamDiscardListener {
> +    VFIOContainerBase *bcontainer;
> +    MemoryRegion *mr;
> +    hwaddr offset_within_address_space;
> +    hwaddr size;
> +    uint64_t granularity;
> +    RamDiscardListener listener;
> +    QLIST_ENTRY(VFIORamDiscardListener) next;
> +} VFIORamDiscardListener;
> +
>   int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>                              hwaddr iova, ram_addr_t size,
>                              void *vaddr, bool readonly);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 1cb53d369e..f15665789f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -351,13 +351,13 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>   {
>       VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
>                                                   listener);
> +    VFIOContainerBase *bcontainer = vrdl->bcontainer;
>       const hwaddr size = int128_get64(section->size);
>       const hwaddr iova = section->offset_within_address_space;
>       int ret;
>   
>       /* Unmap with a single call. */
> -    ret = vfio_container_dma_unmap(&vrdl->container->bcontainer,
> -                                   iova, size , NULL);
> +    ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL);
>       if (ret) {
>           error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
>                        strerror(-ret));
> @@ -369,6 +369,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>   {
>       VFIORamDiscardListener *vrdl = container_of(rdl, VFIORamDiscardListener,
>                                                   listener);
> +    VFIOContainerBase *bcontainer = vrdl->bcontainer;
>       const hwaddr end = section->offset_within_region +
>                          int128_get64(section->size);
>       hwaddr start, next, iova;
> @@ -387,8 +388,8 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>                  section->offset_within_address_space;
>           vaddr = memory_region_get_ram_ptr(section->mr) + start;
>   
> -        ret = vfio_container_dma_map(&vrdl->container->bcontainer, iova,
> -                                     next - start, vaddr, section->readonly);
> +        ret = vfio_container_dma_map(bcontainer, iova, next - start,
> +                                     vaddr, section->readonly);
>           if (ret) {
>               /* Rollback */
>               vfio_ram_discard_notify_discard(rdl, section);
> @@ -398,10 +399,9 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>       return 0;
>   }
>   
> -static void vfio_register_ram_discard_listener(VFIOContainer *container,
> +static void vfio_register_ram_discard_listener(VFIOContainerBase *bcontainer,
>                                                  MemoryRegionSection *section)
>   {
> -    VFIOContainerBase *bcontainer = &container->bcontainer;
>       RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
>       VFIORamDiscardListener *vrdl;
>   
> @@ -412,7 +412,7 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
>       g_assert(QEMU_IS_ALIGNED(int128_get64(section->size), TARGET_PAGE_SIZE));
>   
>       vrdl = g_new0(VFIORamDiscardListener, 1);
> -    vrdl->container = container;
> +    vrdl->bcontainer = bcontainer;
>       vrdl->mr = section->mr;
>       vrdl->offset_within_address_space = section->offset_within_address_space;
>       vrdl->size = int128_get64(section->size);
> @@ -427,7 +427,7 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
>                                 vfio_ram_discard_notify_populate,
>                                 vfio_ram_discard_notify_discard, true);
>       ram_discard_manager_register_listener(rdm, &vrdl->listener, section);
> -    QLIST_INSERT_HEAD(&container->vrdl_list, vrdl, next);
> +    QLIST_INSERT_HEAD(&bcontainer->vrdl_list, vrdl, next);
>   
>       /*
>        * Sanity-check if we have a theoretically problematic setup where we could
> @@ -451,7 +451,7 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
>           }
>   #endif
>   
> -        QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
> +        QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
>               hwaddr start, end;
>   
>               start = QEMU_ALIGN_DOWN(vrdl->offset_within_address_space,
> @@ -473,13 +473,13 @@ static void vfio_register_ram_discard_listener(VFIOContainer *container,
>       }
>   }
>   
> -static void vfio_unregister_ram_discard_listener(VFIOContainer *container,
> +static void vfio_unregister_ram_discard_listener(VFIOContainerBase *bcontainer,
>                                                    MemoryRegionSection *section)
>   {
>       RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
>       VFIORamDiscardListener *vrdl = NULL;
>   
> -    QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
> +    QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
>           if (vrdl->mr == section->mr &&
>               vrdl->offset_within_address_space ==
>               section->offset_within_address_space) {
> @@ -663,7 +663,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>        * about changes.
>        */
>       if (memory_region_has_ram_discard_manager(section->mr)) {
> -        vfio_register_ram_discard_listener(container, section);
> +        vfio_register_ram_discard_listener(bcontainer, section);
>           return;
>       }
>   
> @@ -781,7 +781,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>           pgmask = (1ULL << ctz64(bcontainer->pgsizes)) - 1;
>           try_unmap = !((iova & pgmask) || (int128_get64(llsize) & pgmask));
>       } else if (memory_region_has_ram_discard_manager(section->mr)) {
> -        vfio_unregister_ram_discard_listener(container, section);
> +        vfio_unregister_ram_discard_listener(bcontainer, section);
>           /* Unregistering will trigger an unmap. */
>           try_unmap = false;
>       }
> @@ -1260,17 +1260,17 @@ static int vfio_ram_discard_get_dirty_bitmap(MemoryRegionSection *section,
>        * Sync the whole mapped region (spanning multiple individual mappings)
>        * in one go.
>        */
> -    return vfio_get_dirty_bitmap(&vrdl->container->bcontainer, iova, size,
> -                                 ram_addr);
> +    return vfio_get_dirty_bitmap(vrdl->bcontainer, iova, size, ram_addr);
>   }
>   
> -static int vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainer *container,
> -                                                   MemoryRegionSection *section)
> +static int
> +vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
> +                                            MemoryRegionSection *section)
>   {
>       RamDiscardManager *rdm = memory_region_get_ram_discard_manager(section->mr);
>       VFIORamDiscardListener *vrdl = NULL;
>   
> -    QLIST_FOREACH(vrdl, &container->vrdl_list, next) {
> +    QLIST_FOREACH(vrdl, &bcontainer->vrdl_list, next) {
>           if (vrdl->mr == section->mr &&
>               vrdl->offset_within_address_space ==
>               section->offset_within_address_space) {
> @@ -1324,7 +1324,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
>           }
>           return 0;
>       } else if (memory_region_has_ram_discard_manager(section->mr)) {
> -        return vfio_sync_ram_discard_listener_dirty_bitmap(container, section);
> +        return vfio_sync_ram_discard_listener_dirty_bitmap(bcontainer, section);
>       }
>   
>       ram_addr = memory_region_get_ram_addr(section->mr) +
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index dcce111349..584eee4ba1 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -54,6 +54,7 @@ void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
>       bcontainer->dirty_pages_supported = false;
>       bcontainer->dma_max_mappings = 0;
>       QLIST_INIT(&bcontainer->giommu_list);
> +    QLIST_INIT(&bcontainer->vrdl_list);
>   }
>   
>   void vfio_container_destroy(VFIOContainerBase *bcontainer)
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index c5a6262882..6ba2e2f8c4 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -560,7 +560,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       container->fd = fd;
>       container->error = NULL;
>       container->iova_ranges = NULL;
> -    QLIST_INIT(&container->vrdl_list);
>       bcontainer = &container->bcontainer;
>       vfio_container_init(bcontainer, space, &vfio_legacy_ops);
>   




* Re: [PATCH v4 17/41] vfio/container: Move listener to base container
  2023-11-02  7:12 ` [PATCH v4 17/41] vfio/container: Move listener " Zhenzhong Duan
@ 2023-11-06 16:57   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:57 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun, Nicholas Piggin, Daniel Henrique Barboza, David Gibson,
	Harsh Prateek Bora, open list:sPAPR (pseries)

On 11/2/23 08:12, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> Move the listener to the base container. The error and initialized
> fields are moved at the same time.
> 
> No functional change intended.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.
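
As a side note on the pattern used throughout this patch: with the MemoryListener embedded in the base container, a callback first recovers the base from the listener pointer and, only where legacy state is still needed, recovers the legacy container from the base. A minimal sketch of that two-step container_of() chain follows. The struct and function names are hypothetical, the macro is a simplified version without QEMU's type checking, and the embedded base is deliberately not the first member here so that the offset arithmetic is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of; QEMU's version adds a type check. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for MemoryListener and the two container types. */
typedef struct Listener {
    int priority;
} Listener;

typedef struct ContainerBase {
    int initialized;
    Listener listener;          /* embedded in the base after this patch */
} ContainerBase;

typedef struct LegacyContainer {
    int fd;
    ContainerBase bcontainer;   /* base embedded in the legacy container */
} LegacyContainer;

/* Step 1: listener -> base container. */
static ContainerBase *base_from_listener(Listener *l)
{
    assert(l != NULL);
    return container_of(l, ContainerBase, listener);
}

/* Step 2: base container -> legacy container, only where still required. */
static LegacyContainer *legacy_from_base(ContainerBase *b)
{
    assert(b != NULL);
    return container_of(b, LegacyContainer, bcontainer);
}
```

The second step is only valid when the base really is embedded in a legacy container, which is why callbacks that need only base state, such as the log_global_start/stop handlers in the diff, stop after the first step.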



> ---
>   include/hw/vfio/vfio-common.h         |   3 -
>   include/hw/vfio/vfio-container-base.h |   3 +
>   hw/vfio/common.c                      | 110 +++++++++++++-------------
>   hw/vfio/container-base.c              |   1 +
>   hw/vfio/container.c                   |  19 +++--
>   hw/vfio/spapr.c                       |  11 +--
>   6 files changed, 74 insertions(+), 73 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 8a607a4c17..922022cbc6 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -78,11 +78,8 @@ struct VFIOGroup;
>   typedef struct VFIOContainer {
>       VFIOContainerBase bcontainer;
>       int fd; /* /dev/vfio/vfio, empowered by the attached groups */
> -    MemoryListener listener;
>       MemoryListener prereg_listener;
>       unsigned iommu_type;
> -    Error *error;
> -    bool initialized;
>       uint64_t dirty_pgsizes;
>       uint64_t max_dirty_bitmap_size;
>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 8e05b5ac5a..95f8d319e0 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -36,6 +36,9 @@ typedef struct VFIOAddressSpace {
>   typedef struct VFIOContainerBase {
>       const VFIOIOMMUOps *ops;
>       VFIOAddressSpace *space;
> +    MemoryListener listener;
> +    Error *error;
> +    bool initialized;
>       unsigned long pgsizes;
>       unsigned int dma_max_mappings;
>       bool dirty_pages_supported;
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index f15665789f..be623e544b 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -541,7 +541,7 @@ static bool vfio_listener_valid_section(MemoryRegionSection *section,
>       return true;
>   }
>   
> -static bool vfio_get_section_iova_range(VFIOContainer *container,
> +static bool vfio_get_section_iova_range(VFIOContainerBase *bcontainer,
>                                           MemoryRegionSection *section,
>                                           hwaddr *out_iova, hwaddr *out_end,
>                                           Int128 *out_llend)
> @@ -569,8 +569,10 @@ static bool vfio_get_section_iova_range(VFIOContainer *container,
>   static void vfio_listener_region_add(MemoryListener *listener,
>                                        MemoryRegionSection *section)
>   {
> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> -    VFIOContainerBase *bcontainer = &container->bcontainer;
> +    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
> +                                                 listener);
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
>       hwaddr iova, end;
>       Int128 llend, llsize;
>       void *vaddr;
> @@ -581,7 +583,8 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           return;
>       }
>   
> -    if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
> +    if (!vfio_get_section_iova_range(bcontainer, section, &iova, &end,
> +                                     &llend)) {
>           if (memory_region_is_ram_device(section->mr)) {
>               trace_vfio_listener_region_add_no_dma_map(
>                   memory_region_name(section->mr),
> @@ -688,13 +691,12 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           }
>       }
>   
> -    ret = vfio_container_dma_map(&container->bcontainer,
> -                                 iova, int128_get64(llsize), vaddr,
> -                                 section->readonly);
> +    ret = vfio_container_dma_map(bcontainer, iova, int128_get64(llsize),
> +                                 vaddr, section->readonly);
>       if (ret) {
>           error_setg(&err, "vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
>                      "0x%"HWADDR_PRIx", %p) = %d (%s)",
> -                   container, iova, int128_get64(llsize), vaddr, ret,
> +                   bcontainer, iova, int128_get64(llsize), vaddr, ret,
>                      strerror(-ret));
>           if (memory_region_is_ram_device(section->mr)) {
>               /* Allow unexpected mappings not to be fatal for RAM devices */
> @@ -716,9 +718,9 @@ fail:
>        * can gracefully fail.  Runtime, there's not much we can do other
>        * than throw a hardware error.
>        */
> -    if (!container->initialized) {
> -        if (!container->error) {
> -            error_propagate_prepend(&container->error, err,
> +    if (!bcontainer->initialized) {
> +        if (!bcontainer->error) {
> +            error_propagate_prepend(&bcontainer->error, err,
>                                       "Region %s: ",
>                                       memory_region_name(section->mr));
>           } else {
> @@ -733,8 +735,10 @@ fail:
>   static void vfio_listener_region_del(MemoryListener *listener,
>                                        MemoryRegionSection *section)
>   {
> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> -    VFIOContainerBase *bcontainer = &container->bcontainer;
> +    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
> +                                                 listener);
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
>       hwaddr iova, end;
>       Int128 llend, llsize;
>       int ret;
> @@ -767,7 +771,8 @@ static void vfio_listener_region_del(MemoryListener *listener,
>            */
>       }
>   
> -    if (!vfio_get_section_iova_range(container, section, &iova, &end, &llend)) {
> +    if (!vfio_get_section_iova_range(bcontainer, section, &iova, &end,
> +                                     &llend)) {
>           return;
>       }
>   
> @@ -790,22 +795,22 @@ static void vfio_listener_region_del(MemoryListener *listener,
>           if (int128_eq(llsize, int128_2_64())) {
>               /* The unmap ioctl doesn't accept a full 64-bit span. */
>               llsize = int128_rshift(llsize, 1);
> -            ret = vfio_container_dma_unmap(&container->bcontainer, iova,
> +            ret = vfio_container_dma_unmap(bcontainer, iova,
>                                              int128_get64(llsize), NULL);
>               if (ret) {
>                   error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>                                "0x%"HWADDR_PRIx") = %d (%s)",
> -                             container, iova, int128_get64(llsize), ret,
> +                             bcontainer, iova, int128_get64(llsize), ret,
>                                strerror(-ret));
>               }
>               iova += int128_get64(llsize);
>           }
> -        ret = vfio_container_dma_unmap(&container->bcontainer, iova,
> +        ret = vfio_container_dma_unmap(bcontainer, iova,
>                                          int128_get64(llsize), NULL);
>           if (ret) {
>               error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>                            "0x%"HWADDR_PRIx") = %d (%s)",
> -                         container, iova, int128_get64(llsize), ret,
> +                         bcontainer, iova, int128_get64(llsize), ret,
>                            strerror(-ret));
>           }
>       }
> @@ -825,16 +830,15 @@ typedef struct VFIODirtyRanges {
>   } VFIODirtyRanges;
>   
>   typedef struct VFIODirtyRangesListener {
> -    VFIOContainer *container;
> +    VFIOContainerBase *bcontainer;
>       VFIODirtyRanges ranges;
>       MemoryListener listener;
>   } VFIODirtyRangesListener;
>   
>   static bool vfio_section_is_vfio_pci(MemoryRegionSection *section,
> -                                     VFIOContainer *container)
> +                                     VFIOContainerBase *bcontainer)
>   {
>       VFIOPCIDevice *pcidev;
> -    VFIOContainerBase *bcontainer = &container->bcontainer;
>       VFIODevice *vbasedev;
>       Object *owner;
>   
> @@ -863,7 +867,7 @@ static void vfio_dirty_tracking_update(MemoryListener *listener,
>       hwaddr iova, end, *min, *max;
>   
>       if (!vfio_listener_valid_section(section, "tracking_update") ||
> -        !vfio_get_section_iova_range(dirty->container, section,
> +        !vfio_get_section_iova_range(dirty->bcontainer, section,
>                                        &iova, &end, NULL)) {
>           return;
>       }
> @@ -887,7 +891,7 @@ static void vfio_dirty_tracking_update(MemoryListener *listener,
>        * The alternative would be an IOVATree but that has a much bigger runtime
>        * overhead and unnecessary complexity.
>        */
> -    if (vfio_section_is_vfio_pci(section, dirty->container) &&
> +    if (vfio_section_is_vfio_pci(section, dirty->bcontainer) &&
>           iova >= UINT32_MAX) {
>           min = &range->minpci64;
>           max = &range->maxpci64;
> @@ -911,7 +915,7 @@ static const MemoryListener vfio_dirty_tracking_listener = {
>       .region_add = vfio_dirty_tracking_update,
>   };
>   
> -static void vfio_dirty_tracking_init(VFIOContainer *container,
> +static void vfio_dirty_tracking_init(VFIOContainerBase *bcontainer,
>                                        VFIODirtyRanges *ranges)
>   {
>       VFIODirtyRangesListener dirty;
> @@ -921,10 +925,10 @@ static void vfio_dirty_tracking_init(VFIOContainer *container,
>       dirty.ranges.min64 = UINT64_MAX;
>       dirty.ranges.minpci64 = UINT64_MAX;
>       dirty.listener = vfio_dirty_tracking_listener;
> -    dirty.container = container;
> +    dirty.bcontainer = bcontainer;
>   
>       memory_listener_register(&dirty.listener,
> -                             container->bcontainer.space->as);
> +                             bcontainer->space->as);
>   
>       *ranges = dirty.ranges;
>   
> @@ -936,12 +940,11 @@ static void vfio_dirty_tracking_init(VFIOContainer *container,
>       memory_listener_unregister(&dirty.listener);
>   }
>   
> -static void vfio_devices_dma_logging_stop(VFIOContainer *container)
> +static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
>   {
>       uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature),
>                                 sizeof(uint64_t))] = {};
>       struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
> -    VFIOContainerBase *bcontainer = &container->bcontainer;
>       VFIODevice *vbasedev;
>   
>       feature->argsz = sizeof(buf);
> @@ -962,7 +965,7 @@ static void vfio_devices_dma_logging_stop(VFIOContainer *container)
>   }
>   
>   static struct vfio_device_feature *
> -vfio_device_feature_dma_logging_start_create(VFIOContainer *container,
> +vfio_device_feature_dma_logging_start_create(VFIOContainerBase *bcontainer,
>                                                VFIODirtyRanges *tracking)
>   {
>       struct vfio_device_feature *feature;
> @@ -1035,16 +1038,15 @@ static void vfio_device_feature_dma_logging_start_destroy(
>       g_free(feature);
>   }
>   
> -static int vfio_devices_dma_logging_start(VFIOContainer *container)
> +static int vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer)
>   {
>       struct vfio_device_feature *feature;
>       VFIODirtyRanges ranges;
> -    VFIOContainerBase *bcontainer = &container->bcontainer;
>       VFIODevice *vbasedev;
>       int ret = 0;
>   
> -    vfio_dirty_tracking_init(container, &ranges);
> -    feature = vfio_device_feature_dma_logging_start_create(container,
> +    vfio_dirty_tracking_init(bcontainer, &ranges);
> +    feature = vfio_device_feature_dma_logging_start_create(bcontainer,
>                                                              &ranges);
>       if (!feature) {
>           return -errno;
> @@ -1067,7 +1069,7 @@ static int vfio_devices_dma_logging_start(VFIOContainer *container)
>   
>   out:
>       if (ret) {
> -        vfio_devices_dma_logging_stop(container);
> +        vfio_devices_dma_logging_stop(bcontainer);
>       }
>   
>       vfio_device_feature_dma_logging_start_destroy(feature);
> @@ -1077,14 +1079,14 @@ out:
>   
>   static void vfio_listener_log_global_start(MemoryListener *listener)
>   {
> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> +    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
> +                                                 listener);
>       int ret;
>   
> -    if (vfio_devices_all_device_dirty_tracking(&container->bcontainer)) {
> -        ret = vfio_devices_dma_logging_start(container);
> +    if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
> +        ret = vfio_devices_dma_logging_start(bcontainer);
>       } else {
> -        ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
> -                                                     true);
> +        ret = vfio_container_set_dirty_page_tracking(bcontainer, true);
>       }
>   
>       if (ret) {
> @@ -1096,14 +1098,14 @@ static void vfio_listener_log_global_start(MemoryListener *listener)
>   
>   static void vfio_listener_log_global_stop(MemoryListener *listener)
>   {
> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> +    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
> +                                                 listener);
>       int ret = 0;
>   
> -    if (vfio_devices_all_device_dirty_tracking(&container->bcontainer)) {
> -        vfio_devices_dma_logging_stop(container);
> +    if (vfio_devices_all_device_dirty_tracking(bcontainer)) {
> +        vfio_devices_dma_logging_stop(bcontainer);
>       } else {
> -        ret = vfio_container_set_dirty_page_tracking(&container->bcontainer,
> -                                                     false);
> +        ret = vfio_container_set_dirty_page_tracking(bcontainer, false);
>       }
>   
>       if (ret) {
> @@ -1214,8 +1216,6 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>                                                   vfio_giommu_dirty_notifier, n);
>       VFIOGuestIOMMU *giommu = gdn->giommu;
>       VFIOContainerBase *bcontainer = giommu->bcontainer;
> -    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> -                                            bcontainer);
>       hwaddr iova = iotlb->iova + giommu->iommu_offset;
>       ram_addr_t translated_addr;
>       int ret = -EINVAL;
> @@ -1230,12 +1230,12 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>   
>       rcu_read_lock();
>       if (vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL)) {
> -        ret = vfio_get_dirty_bitmap(&container->bcontainer, iova,
> -                                    iotlb->addr_mask + 1, translated_addr);
> +        ret = vfio_get_dirty_bitmap(bcontainer, iova, iotlb->addr_mask + 1,
> +                                    translated_addr);
>           if (ret) {
>               error_report("vfio_iommu_map_dirty_notify(%p, 0x%"HWADDR_PRIx", "
>                            "0x%"HWADDR_PRIx") = %d (%s)",
> -                         container, iova, iotlb->addr_mask + 1, ret,
> +                         bcontainer, iova, iotlb->addr_mask + 1, ret,
>                            strerror(-ret));
>           }
>       }
> @@ -1291,10 +1291,9 @@ vfio_sync_ram_discard_listener_dirty_bitmap(VFIOContainerBase *bcontainer,
>                                                   &vrdl);
>   }
>   
> -static int vfio_sync_dirty_bitmap(VFIOContainer *container,
> +static int vfio_sync_dirty_bitmap(VFIOContainerBase *bcontainer,
>                                     MemoryRegionSection *section)
>   {
> -    VFIOContainerBase *bcontainer = &container->bcontainer;
>       ram_addr_t ram_addr;
>   
>       if (memory_region_is_iommu(section->mr)) {
> @@ -1330,7 +1329,7 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
>       ram_addr = memory_region_get_ram_addr(section->mr) +
>                  section->offset_within_region;
>   
> -    return vfio_get_dirty_bitmap(&container->bcontainer,
> +    return vfio_get_dirty_bitmap(bcontainer,
>                      REAL_HOST_PAGE_ALIGN(section->offset_within_address_space),
>                      int128_get64(section->size), ram_addr);
>   }
> @@ -1338,15 +1337,16 @@ static int vfio_sync_dirty_bitmap(VFIOContainer *container,
>   static void vfio_listener_log_sync(MemoryListener *listener,
>           MemoryRegionSection *section)
>   {
> -    VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> +    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
> +                                                 listener);
>       int ret;
>   
>       if (vfio_listener_skipped_section(section)) {
>           return;
>       }
>   
> -    if (vfio_devices_all_dirty_tracking(&container->bcontainer)) {
> -        ret = vfio_sync_dirty_bitmap(container, section);
> +    if (vfio_devices_all_dirty_tracking(bcontainer)) {
> +        ret = vfio_sync_dirty_bitmap(bcontainer, section);
>           if (ret) {
>               error_report("vfio: Failed to sync dirty bitmap, err: %d (%s)", ret,
>                            strerror(-ret));
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 584eee4ba1..7f508669f5 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -51,6 +51,7 @@ void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
>   {
>       bcontainer->ops = ops;
>       bcontainer->space = space;
> +    bcontainer->error = NULL;
>       bcontainer->dirty_pages_supported = false;
>       bcontainer->dma_max_mappings = 0;
>       QLIST_INIT(&bcontainer->giommu_list);
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 6ba2e2f8c4..5c1dee8c9f 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -453,6 +453,7 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
>   {
>       struct vfio_info_cap_header *hdr;
>       struct vfio_iommu_type1_info_cap_migration *cap_mig;
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>   
>       hdr = vfio_get_iommu_info_cap(info, VFIO_IOMMU_TYPE1_INFO_CAP_MIGRATION);
>       if (!hdr) {
> @@ -467,7 +468,7 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
>        * qemu_real_host_page_size to mark those dirty.
>        */
>       if (cap_mig->pgsize_bitmap & qemu_real_host_page_size()) {
> -        container->bcontainer.dirty_pages_supported = true;
> +        bcontainer->dirty_pages_supported = true;
>           container->max_dirty_bitmap_size = cap_mig->max_dirty_bitmap_size;
>           container->dirty_pgsizes = cap_mig->pgsize_bitmap;
>       }
> @@ -558,7 +559,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>   
>       container = g_malloc0(sizeof(*container));
>       container->fd = fd;
> -    container->error = NULL;
>       container->iova_ranges = NULL;
>       bcontainer = &container->bcontainer;
>       vfio_container_init(bcontainer, space, &vfio_legacy_ops);
> @@ -621,25 +621,24 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>       group->container = container;
>       QLIST_INSERT_HEAD(&container->group_list, group, container_next);
>   
> -    container->listener = vfio_memory_listener;
> -
> -    memory_listener_register(&container->listener, bcontainer->space->as);
> +    bcontainer->listener = vfio_memory_listener;
> +    memory_listener_register(&bcontainer->listener, bcontainer->space->as);
>   
> -    if (container->error) {
> +    if (bcontainer->error) {
>           ret = -1;
> -        error_propagate_prepend(errp, container->error,
> +        error_propagate_prepend(errp, bcontainer->error,
>               "memory listener initialization failed: ");
>           goto listener_release_exit;
>       }
>   
> -    container->initialized = true;
> +    bcontainer->initialized = true;
>   
>       return 0;
>   listener_release_exit:
>       QLIST_REMOVE(group, container_next);
>       QLIST_REMOVE(bcontainer, next);
>       vfio_kvm_device_del_group(group);
> -    memory_listener_unregister(&container->listener);
> +    memory_listener_unregister(&bcontainer->listener);
>       if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU ||
>           container->iommu_type == VFIO_SPAPR_TCE_IOMMU) {
>           vfio_spapr_container_deinit(container);
> @@ -674,7 +673,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>        * group.
>        */
>       if (QLIST_EMPTY(&container->group_list)) {
> -        memory_listener_unregister(&container->listener);
> +        memory_listener_unregister(&bcontainer->listener);
>           if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU ||
>               container->iommu_type == VFIO_SPAPR_TCE_IOMMU) {
>               vfio_spapr_container_deinit(container);
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 4f76bdd3ca..7a50975f25 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -46,6 +46,7 @@ static void vfio_prereg_listener_region_add(MemoryListener *listener,
>   {
>       VFIOContainer *container = container_of(listener, VFIOContainer,
>                                               prereg_listener);
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>       const hwaddr gpa = section->offset_within_address_space;
>       hwaddr end;
>       int ret;
> @@ -88,9 +89,9 @@ static void vfio_prereg_listener_region_add(MemoryListener *listener,
>            * can gracefully fail.  Runtime, there's not much we can do other
>            * than throw a hardware error.
>            */
> -        if (!container->initialized) {
> -            if (!container->error) {
> -                error_setg_errno(&container->error, -ret,
> +        if (!bcontainer->initialized) {
> +            if (!bcontainer->error) {
> +                error_setg_errno(&bcontainer->error, -ret,
>                                    "Memory registering failed");
>               }
>           } else {
> @@ -445,9 +446,9 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>   
>           memory_listener_register(&container->prereg_listener,
>                                    &address_space_memory);
> -        if (container->error) {
> +        if (bcontainer->error) {
>               ret = -1;
> -            error_propagate_prepend(errp, container->error,
> +            error_propagate_prepend(errp, bcontainer->error,
>                       "RAM memory listener initialization failed: ");
>               goto listener_unregister_exit;
>           }




* Re: [PATCH v4 19/41] vfio/container: Move iova_ranges to base container
  2023-11-02  7:12 ` [PATCH v4 19/41] vfio/container: Move iova_ranges " Zhenzhong Duan
@ 2023-11-06 16:58   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:58 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> Also remove the helper function vfio_free_container(), as it now
> only calls g_free().
> 
> No functional change intended.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   include/hw/vfio/vfio-common.h         |  1 -
>   include/hw/vfio/vfio-container-base.h |  1 +
>   hw/vfio/common.c                      |  5 +++--
>   hw/vfio/container-base.c              |  3 +++
>   hw/vfio/container.c                   | 19 ++++++-------------
>   5 files changed, 13 insertions(+), 16 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index b1c9fe711b..b9e5a0e64b 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -82,7 +82,6 @@ typedef struct VFIOContainer {
>       unsigned iommu_type;
>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>       QLIST_HEAD(, VFIOGroup) group_list;
> -    GList *iova_ranges;
>   } VFIOContainer;
>   
>   typedef struct VFIOHostDMAWindow {
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 80e4a993c5..9658ffb526 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -48,6 +48,7 @@ typedef struct VFIOContainerBase {
>       QLIST_HEAD(, VFIORamDiscardListener) vrdl_list;
>       QLIST_ENTRY(VFIOContainerBase) next;
>       QLIST_HEAD(, VFIODevice) device_list;
> +    GList *iova_ranges;
>   } VFIOContainerBase;
>   
>   typedef struct VFIOGuestIOMMU {
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index be623e544b..8ef2e7967d 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -637,9 +637,10 @@ static void vfio_listener_region_add(MemoryListener *listener,
>               goto fail;
>           }
>   
> -        if (container->iova_ranges) {
> +        if (bcontainer->iova_ranges) {
>               ret = memory_region_iommu_set_iova_ranges(giommu->iommu_mr,
> -                    container->iova_ranges, &err);
> +                                                      bcontainer->iova_ranges,
> +                                                      &err);
>               if (ret) {
>                   g_free(giommu);
>                   goto fail;
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 7f508669f5..0177f43741 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -54,6 +54,7 @@ void vfio_container_init(VFIOContainerBase *bcontainer, VFIOAddressSpace *space,
>       bcontainer->error = NULL;
>       bcontainer->dirty_pages_supported = false;
>       bcontainer->dma_max_mappings = 0;
> +    bcontainer->iova_ranges = NULL;
>       QLIST_INIT(&bcontainer->giommu_list);
>       QLIST_INIT(&bcontainer->vrdl_list);
>   }
> @@ -70,4 +71,6 @@ void vfio_container_destroy(VFIOContainerBase *bcontainer)
>           QLIST_REMOVE(giommu, giommu_next);
>           g_free(giommu);
>       }
> +
> +    g_list_free_full(bcontainer->iova_ranges, g_free);
>   }
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index c8088a8174..721c0d7375 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -308,7 +308,7 @@ bool vfio_get_info_dma_avail(struct vfio_iommu_type1_info *info,
>   }
>   
>   static bool vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
> -                                     VFIOContainer *container)
> +                                     VFIOContainerBase *bcontainer)
>   {
>       struct vfio_info_cap_header *hdr;
>       struct vfio_iommu_type1_info_cap_iova_range *cap;
> @@ -326,8 +326,8 @@ static bool vfio_get_info_iova_range(struct vfio_iommu_type1_info *info,
>   
>           range_set_bounds(range, cap->iova_ranges[i].start,
>                            cap->iova_ranges[i].end);
> -        container->iova_ranges =
> -            range_list_insert(container->iova_ranges, range);
> +        bcontainer->iova_ranges =
> +            range_list_insert(bcontainer->iova_ranges, range);
>       }
>   
>       return true;
> @@ -475,12 +475,6 @@ static void vfio_get_iommu_info_migration(VFIOContainer *container,
>       }
>   }
>   
> -static void vfio_free_container(VFIOContainer *container)
> -{
> -    g_list_free_full(container->iova_ranges, g_free);
> -    g_free(container);
> -}
> -
>   static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>                                     Error **errp)
>   {
> @@ -560,7 +554,6 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>   
>       container = g_malloc0(sizeof(*container));
>       container->fd = fd;
> -    container->iova_ranges = NULL;
>       bcontainer = &container->bcontainer;
>       vfio_container_init(bcontainer, space, &vfio_legacy_ops);
>   
> @@ -597,7 +590,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
>               bcontainer->dma_max_mappings = 65535;
>           }
>   
> -        vfio_get_info_iova_range(info, container);
> +        vfio_get_info_iova_range(info, bcontainer);
>   
>           vfio_get_iommu_info_migration(container, info);
>           g_free(info);
> @@ -649,7 +642,7 @@ enable_discards_exit:
>       vfio_ram_block_discard_disable(container, false);
>   
>   free_container_exit:
> -    vfio_free_container(container);
> +    g_free(container);
>   
>   close_fd_exit:
>       close(fd);
> @@ -693,7 +686,7 @@ static void vfio_disconnect_container(VFIOGroup *group)
>   
>           trace_vfio_disconnect_container(container->fd);
>           close(container->fd);
> -        vfio_free_container(container);
> +        g_free(container);
>   
>           vfio_put_address_space(space);
>       }




* Re: [PATCH v4 20/41] vfio/container: Implement attach/detach_device
  2023-11-02  7:12 ` [PATCH v4 20/41] vfio/container: Implement attach/detach_device Zhenzhong Duan
@ 2023-11-06 16:59   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 16:59 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Yi Sun

On 11/2/23 08:12, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> No functional change intended.
> 
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.



> ---
>   hw/vfio/common.c    | 16 ++++++++++++++++
>   hw/vfio/container.c | 12 +++++-------
>   2 files changed, 21 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 8ef2e7967d..483ba82089 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1498,3 +1498,19 @@ retry:
>   
>       return info;
>   }
> +
> +int vfio_attach_device(char *name, VFIODevice *vbasedev,
> +                       AddressSpace *as, Error **errp)
> +{
> +    const VFIOIOMMUOps *ops = &vfio_legacy_ops;
> +
> +    return ops->attach_device(name, vbasedev, as, errp);
> +}
> +
> +void vfio_detach_device(VFIODevice *vbasedev)
> +{
> +    if (!vbasedev->bcontainer) {
> +        return;
> +    }
> +    vbasedev->bcontainer->ops->detach_device(vbasedev);
> +}
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 721c0d7375..6bacf38222 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -873,8 +873,8 @@ static int vfio_device_groupid(VFIODevice *vbasedev, Error **errp)
>    * @name and @vbasedev->name are likely to be different depending
>    * on the type of the device, hence the need for passing @name
>    */
> -int vfio_attach_device(char *name, VFIODevice *vbasedev,
> -                       AddressSpace *as, Error **errp)
> +static int vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
> +                                     AddressSpace *as, Error **errp)
>   {
>       int groupid = vfio_device_groupid(vbasedev, errp);
>       VFIODevice *vbasedev_iter;
> @@ -914,14 +914,10 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
>       return ret;
>   }
>   
> -void vfio_detach_device(VFIODevice *vbasedev)
> +static void vfio_legacy_detach_device(VFIODevice *vbasedev)
>   {
>       VFIOGroup *group = vbasedev->group;
>   
> -    if (!vbasedev->bcontainer) {
> -        return;
> -    }
> -
>       QLIST_REMOVE(vbasedev, global_next);
>       QLIST_REMOVE(vbasedev, container_next);
>       vbasedev->bcontainer = NULL;
> @@ -933,6 +929,8 @@ void vfio_detach_device(VFIODevice *vbasedev)
>   const VFIOIOMMUOps vfio_legacy_ops = {
>       .dma_map = vfio_legacy_dma_map,
>       .dma_unmap = vfio_legacy_dma_unmap,
> +    .attach_device = vfio_legacy_attach_device,
> +    .detach_device = vfio_legacy_detach_device,
>       .set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
>       .query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
>   };




* Re: [PATCH v4 30/41] vfio/iommufd: Add support for iova_ranges
  2023-11-02  7:12 ` [PATCH v4 30/41] vfio/iommufd: Add support for iova_ranges Zhenzhong Duan
@ 2023-11-06 17:19   ` Cédric Le Goater
  2023-11-07  3:07     ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 17:19 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> Some vIOMMUs, such as virtio-iommu, use the host's IOVA ranges to set
> up reserved ranges for passthrough devices, so that the guest does not
> use an IOVA range beyond what the host supports.
> 
> Use an IOMMUFD uAPI to get the host's IOVA ranges and pass them to the
> vIOMMU, just like the legacy backend does.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
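
For readers following along, the query described above is a two-pass pattern: probe for the number of ranges, allocate, then fetch. A minimal sketch of that pattern follows. It assumes a hypothetical demo_query() standing in for the real ioctl, with made-up range values; the demo_* names are illustrative only and not QEMU or kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct demo_range {
    uint64_t start;
    uint64_t last;
};

/* Hypothetical host data; the values are made up for illustration. */
static const struct demo_range demo_host_ranges[] = {
    { 0x0, 0xfedfffff },
    { 0xfef00000, 0xffffffffffULL },
};
#define DEMO_NUM (sizeof(demo_host_ranges) / sizeof(demo_host_ranges[0]))

/*
 * Hypothetical stand-in for the ioctl: if the caller's array is too
 * small, report the required count in *num and fail with EMSGSIZE.
 */
static int demo_query(struct demo_range *out, uint32_t *num)
{
    if (*num < DEMO_NUM) {
        *num = DEMO_NUM;
        errno = EMSGSIZE;
        return -1;
    }
    for (uint32_t i = 0; i < DEMO_NUM; i++) {
        out[i] = demo_host_ranges[i];
    }
    *num = DEMO_NUM;
    return 0;
}

/* Two-pass fetch. Returns 0 or -errno; the caller frees *out. */
static int demo_get_ranges(struct demo_range **out, uint32_t *num)
{
    struct demo_range *buf;

    *out = NULL;
    *num = 0;

    /* Pass 1: size probe. EMSGSIZE is the expected outcome here. */
    if (demo_query(NULL, num) && errno != EMSGSIZE) {
        return -errno;
    }
    if (*num == 0) {
        return 0;
    }

    buf = calloc(*num, sizeof(*buf));
    if (!buf) {
        return -ENOMEM;
    }

    /* Pass 2: fetch into the correctly sized array. */
    if (demo_query(buf, num)) {
        int ret = -errno;   /* save errno before free() can change it */
        free(buf);
        return ret;
    }

    *out = buf;
    return 0;
}
```

A caller would do something like `demo_get_ranges(&ranges, &n)`, walk `ranges[0..n-1]`, and then `free(ranges)`.
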
> ---
> v4: fix build error on 32-bit Fedora
> 
>   hw/vfio/iommufd.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 47 insertions(+)
> 
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 1bb55ca2c4..22f02f92a9 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -341,6 +341,52 @@ static int iommufd_ram_block_discard_disable(bool state)
>       return ram_block_uncoordinated_discard_disable(state);
>   }
>   
> +static int vfio_get_info_iova_range(VFIOIOMMUFDContainer *container,
> +                                    uint32_t ioas_id)
> +{
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
> +    struct iommu_ioas_iova_ranges *info;
> +    struct iommu_iova_range *iova_ranges;
> +    int ret, sz, fd = container->be->fd;
> +
> +    info = g_malloc0(sizeof(*info));
> +    info->size = sizeof(*info);
> +    info->ioas_id = ioas_id;
> +
> +    ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
> +    if (ret && errno != EMSGSIZE) {
> +        goto error;
> +    }
> +
> +    sz = info->num_iovas * sizeof(struct iommu_iova_range);
> +    info = g_realloc(info, sizeof(*info) + sz);
> +    info->allowed_iovas = (uintptr_t)(info + 1);
> +
> +    ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
> +    if (ret) {
> +        goto error;
> +    }
> +
> +    iova_ranges = (struct iommu_iova_range *)info->allowed_iovas;

iova_ranges = (struct iommu_iova_range *)(uintptr_t)info->allowed_iovas;
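
The intermediate cast matters because allowed_iovas is a 64-bit integer field, and on a 32-bit host a direct cast to a pointer converts between different widths, which compilers warn about. A minimal sketch of the round trip, using hypothetical demo_* types rather than the real uAPI structs:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a uAPI header that carries a user pointer. */
struct demo_ranges_hdr {
    uint32_t num;
    uint64_t user_ptr;      /* plays the role of allowed_iovas */
};

struct demo_range {
    uint64_t start;
    uint64_t last;
};

/* Store: pointer -> uintptr_t, then implicitly widened to 64 bits. */
static void demo_set_ptr(struct demo_ranges_hdr *hdr, struct demo_range *r)
{
    hdr->user_ptr = (uintptr_t)r;
}

/* Load: narrow to uintptr_t first, then cast to the pointer type. */
static struct demo_range *demo_get_ptr(const struct demo_ranges_hdr *hdr)
{
    return (struct demo_range *)(uintptr_t)hdr->user_ptr;
}
```

On a 64-bit host both forms compile to the same thing; the uintptr_t step only makes the narrowing explicit where pointers are 32 bits wide.
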

Thanks,

C.


> +
> +    for (int i = 0; i < info->num_iovas; i++) {
> +        Range *range = g_new(Range, 1);
> +
> +        range_set_bounds(range, iova_ranges[i].start, iova_ranges[i].last);
> +        bcontainer->iova_ranges =
> +            range_list_insert(bcontainer->iova_ranges, range);
> +    }
> +
> +    g_free(info);
> +    return 0;
> +
> +error:
> +    ret = -errno;
> +    g_free(info);
> +    error_report("vfio/iommufd: Cannot get iova ranges: %m");
> +    return ret;
> +}
> +
>   static int iommufd_attach_device(const char *name, VFIODevice *vbasedev,
>                                    AddressSpace *as, Error **errp)
>   {
> @@ -418,6 +464,7 @@ static int iommufd_attach_device(const char *name, VFIODevice *vbasedev,
>       }
>   
>       bcontainer->pgsizes = qemu_real_host_page_size();
> +    vfio_get_info_iova_range(container, ioas_id);
>   
>       bcontainer->listener = vfio_memory_listener;
>       memory_listener_register(&bcontainer->listener, bcontainer->space->as);




* Re: [PATCH v4 21/41] vfio/spapr: Introduce spapr backend and target interface
  2023-11-02  7:12 ` [PATCH v4 21/41] vfio/spapr: Introduce spapr backend and target interface Zhenzhong Duan
@ 2023-11-06 17:30   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 17:30 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Nicholas Piggin, Daniel Henrique Barboza, Cédric Le Goater,
	David Gibson, Harsh Prateek Bora, open list:sPAPR (pseries)

On 11/2/23 08:12, Zhenzhong Duan wrote:
> Introduce an empty spapr backend which will hold spapr-specific
> content, currently only prereg_listener and hostwin_list.
> 
> Also introduce two spapr-specific callbacks, add_window and del_window,
> in VFIOIOMMUOps. Instantiate the spapr ops with a helper,
> setup_spapr_ops(), and assign them to bcontainer->ops.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
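
As an aside for readers, the helper follows a copy-then-override pattern: copy the ops table the container already uses into a writable static table, fill in the backend-specific hooks, and repoint the container. A minimal sketch, assuming hypothetical Demo* types and demo_* functions that are not the real QEMU definitions:

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoOps {
    int (*dma_map)(void);
    int (*add_window)(void);    /* backend-specific, may be NULL */
} DemoOps;

typedef struct DemoContainer {
    const DemoOps *ops;
} DemoContainer;

static int demo_legacy_dma_map(void)
{
    return 1;
}

static int demo_spapr_add_window(void)
{
    return 2;
}

/* Shared, read-only base table; add_window is left NULL. */
static const DemoOps demo_legacy_ops = {
    .dma_map = demo_legacy_dma_map,
};

/* Writable table that receives the copy plus the overrides. */
static DemoOps demo_spapr_ops;

static void demo_setup_spapr_ops(DemoContainer *c)
{
    demo_spapr_ops = *c->ops;                           /* inherit hooks */
    demo_spapr_ops.add_window = demo_spapr_add_window;  /* override one */
    c->ops = &demo_spapr_ops;                           /* repoint */
}
```

The base table stays untouched, so other backends keep seeing a NULL add_window. One consequence of this design is that the static table is shared by every container that goes through the helper.
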


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
> v4: remove VFIOIOMMUSpaprOps
> 
>   include/hw/vfio/vfio-container-base.h |  6 ++++++
>   hw/vfio/spapr.c                       | 14 ++++++++++++++
>   2 files changed, 20 insertions(+)
> 
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 9658ffb526..f62a14ac73 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -101,5 +101,11 @@ struct VFIOIOMMUOps {
>       int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
>       int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
>                                 hwaddr iova, hwaddr size);
> +    /* SPAPR specific */
> +    int (*add_window)(VFIOContainerBase *bcontainer,
> +                      MemoryRegionSection *section,
> +                      Error **errp);
> +    void (*del_window)(VFIOContainerBase *bcontainer,
> +                       MemoryRegionSection *section);
>   };
>   #endif /* HW_VFIO_VFIO_CONTAINER_BASE_H */
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 7a50975f25..e1a6b35563 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -24,6 +24,10 @@
>   #include "qapi/error.h"
>   #include "trace.h"
>   
> +typedef struct VFIOSpaprContainer {
> +    VFIOContainer container;
> +} VFIOSpaprContainer;
> +
>   static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
>   {
>       if (memory_region_is_iommu(section->mr)) {
> @@ -421,6 +425,14 @@ void vfio_container_del_section_window(VFIOContainer *container,
>       }
>   }
>   
> +static VFIOIOMMUOps vfio_iommu_spapr_ops;
> +
> +static void setup_spapr_ops(VFIOContainerBase *bcontainer)
> +{
> +    vfio_iommu_spapr_ops = *bcontainer->ops;
> +    bcontainer->ops = &vfio_iommu_spapr_ops;
> +}
> +
>   int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>   {
>       VFIOContainerBase *bcontainer = &container->bcontainer;
> @@ -486,6 +498,8 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>                             0x1000);
>       }
>   
> +    setup_spapr_ops(bcontainer);
> +
>       return 0;
>   
>   listener_unregister_exit:




* Re: [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE add/del_section_window
  2023-11-02  7:12 ` [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE add/del_section_window Zhenzhong Duan
@ 2023-11-06 17:33   ` Cédric Le Goater
  2023-11-07  3:06     ` Duan, Zhenzhong
  2023-11-07 17:34   ` Cédric Le Goater
  1 sibling, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 17:33 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Nicholas Piggin, Daniel Henrique Barboza, David Gibson,
	Harsh Prateek Bora, open list:sPAPR (pseries)

On 11/2/23 08:12, Zhenzhong Duan wrote:
> No functional change intended.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   include/hw/vfio/vfio-common.h         |  5 -----
>   include/hw/vfio/vfio-container-base.h |  5 +++++
>   hw/vfio/common.c                      |  8 ++------
>   hw/vfio/container-base.c              | 21 +++++++++++++++++++++
>   hw/vfio/spapr.c                       | 19 ++++++++++++++-----
>   5 files changed, 42 insertions(+), 16 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index b9e5a0e64b..055f679363 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -169,11 +169,6 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
>   void vfio_put_address_space(VFIOAddressSpace *space);
>   
>   /* SPAPR specific */
> -int vfio_container_add_section_window(VFIOContainer *container,
> -                                      MemoryRegionSection *section,
> -                                      Error **errp);
> -void vfio_container_del_section_window(VFIOContainer *container,
> -                                       MemoryRegionSection *section);
>   int vfio_spapr_container_init(VFIOContainer *container, Error **errp);
>   void vfio_spapr_container_deinit(VFIOContainer *container);
>   
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index f62a14ac73..4b6f017c6f 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -75,6 +75,11 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>   int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>                                hwaddr iova, ram_addr_t size,
>                                IOMMUTLBEntry *iotlb);
> +int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
> +                                      MemoryRegionSection *section,
> +                                      Error **errp);
> +void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
> +                                       MemoryRegionSection *section);
>   int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>                                              bool start);
>   int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 483ba82089..572ae7c934 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -571,8 +571,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>   {
>       VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>                                                    listener);
> -    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> -                                            bcontainer);
>       hwaddr iova, end;
>       Int128 llend, llsize;
>       void *vaddr;
> @@ -595,7 +593,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           return;
>       }
>   
> -    if (vfio_container_add_section_window(container, section, &err)) {
> +    if (vfio_container_add_section_window(bcontainer, section, &err)) {
>           goto fail;
>       }
>   
> @@ -738,8 +736,6 @@ static void vfio_listener_region_del(MemoryListener *listener,
>   {
>       VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>                                                    listener);
> -    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> -                                            bcontainer);
>       hwaddr iova, end;
>       Int128 llend, llsize;
>       int ret;
> @@ -818,7 +814,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>   
>       memory_region_unref(section->mr);
>   
> -    vfio_container_del_section_window(container, section);
> +    vfio_container_del_section_window(bcontainer, section);
>   }
>   
>   typedef struct VFIODirtyRanges {
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 0177f43741..71f7274973 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -31,6 +31,27 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>       return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
>   }
>   
> +int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
> +                                      MemoryRegionSection *section,
> +                                      Error **errp)
> +{
> +    if (!bcontainer->ops->add_window) {
> +        return 0;
> +    }

These should be asserts, right? They are only called on the pseries
platform, which defines the handlers.
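
To make the two options concrete, here is a minimal sketch with hypothetical Demo* names (not the real QEMU types). The first wrapper treats a missing hook as success, as the patch does; the second treats it as a programming error.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoContainer DemoContainer;

typedef struct DemoOps {
    int (*add_window)(DemoContainer *c);    /* NULL if not implemented */
} DemoOps;

struct DemoContainer {
    const DemoOps *ops;
    int windows;
};

/* Variant in the patch: a missing hook is silently a no-op. */
static int demo_add_window_optional(DemoContainer *c)
{
    if (!c->ops->add_window) {
        return 0;
    }
    return c->ops->add_window(c);
}

/* Suggested variant: a missing hook aborts. */
static int demo_add_window_strict(DemoContainer *c)
{
    assert(c->ops->add_window);
    return c->ops->add_window(c);
}

/* Hypothetical backend hook that just counts windows. */
static int demo_spapr_add_window(DemoContainer *c)
{
    c->windows++;
    return 0;
}
```

The strict variant is only safe if every caller is guaranteed to reach it with the hook installed. If the wrapper can also be reached from a common listener path on backends without the hook, the optional form has to stay.
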


Thanks,

C.

  


> +    return bcontainer->ops->add_window(bcontainer, section, errp);
> +}
> +
> +void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
> +                                       MemoryRegionSection *section)
> +{
> +    if (!bcontainer->ops->del_window) {
> +        return;
> +    }
> +
> +    return bcontainer->ops->del_window(bcontainer, section);
> +}
> +
>   int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>                                              bool start)
>   {
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index e1a6b35563..5be1911aad 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -319,10 +319,13 @@ static int vfio_spapr_create_window(VFIOContainer *container,
>       return 0;
>   }
>   
> -int vfio_container_add_section_window(VFIOContainer *container,
> -                                      MemoryRegionSection *section,
> -                                      Error **errp)
> +static int
> +vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
> +                                        MemoryRegionSection *section,
> +                                        Error **errp)
>   {
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
>       VFIOHostDMAWindow *hostwin;
>       hwaddr pgsize = 0;
>       int ret;
> @@ -407,9 +410,13 @@ int vfio_container_add_section_window(VFIOContainer *container,
>       return 0;
>   }
>   
> -void vfio_container_del_section_window(VFIOContainer *container,
> -                                       MemoryRegionSection *section)
> +static void
> +vfio_spapr_container_del_section_window(VFIOContainerBase *bcontainer,
> +                                        MemoryRegionSection *section)
>   {
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
> +
>       if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
>           return;
>       }
> @@ -430,6 +437,8 @@ static VFIOIOMMUOps vfio_iommu_spapr_ops;
>   static void setup_spapr_ops(VFIOContainerBase *bcontainer)
>   {
>       vfio_iommu_spapr_ops = *bcontainer->ops;
> +    vfio_iommu_spapr_ops.add_window = vfio_spapr_container_add_section_window;
> +    vfio_iommu_spapr_ops.del_window = vfio_spapr_container_del_section_window;
>       bcontainer->ops = &vfio_iommu_spapr_ops;
>   }
>   




* Re: [PATCH v4 23/41] vfio/spapr: Move prereg_listener into spapr container
  2023-11-02  7:12 ` [PATCH v4 23/41] vfio/spapr: Move prereg_listener into spapr container Zhenzhong Duan
@ 2023-11-06 17:34   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 17:34 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Nicholas Piggin, Daniel Henrique Barboza, David Gibson,
	Harsh Prateek Bora, open list:sPAPR (pseries)

On 11/2/23 08:12, Zhenzhong Duan wrote:
> No functional changes intended.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   include/hw/vfio/vfio-common.h |  1 -
>   hw/vfio/spapr.c               | 24 ++++++++++++++++--------
>   2 files changed, 16 insertions(+), 9 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 055f679363..ed6148c058 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -78,7 +78,6 @@ struct VFIOGroup;
>   typedef struct VFIOContainer {
>       VFIOContainerBase bcontainer;
>       int fd; /* /dev/vfio/vfio, empowered by the attached groups */
> -    MemoryListener prereg_listener;
>       unsigned iommu_type;
>       QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>       QLIST_HEAD(, VFIOGroup) group_list;
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 5be1911aad..68c3dd6c75 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -26,6 +26,7 @@
>   
>   typedef struct VFIOSpaprContainer {
>       VFIOContainer container;
> +    MemoryListener prereg_listener;
>   } VFIOSpaprContainer;
>   
>   static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
> @@ -48,8 +49,9 @@ static void *vfio_prereg_gpa_to_vaddr(MemoryRegionSection *section, hwaddr gpa)
>   static void vfio_prereg_listener_region_add(MemoryListener *listener,
>                                               MemoryRegionSection *section)
>   {
> -    VFIOContainer *container = container_of(listener, VFIOContainer,
> -                                            prereg_listener);
> +    VFIOSpaprContainer *scontainer = container_of(listener, VFIOSpaprContainer,
> +                                                  prereg_listener);
> +    VFIOContainer *container = &scontainer->container;
>       VFIOContainerBase *bcontainer = &container->bcontainer;
>       const hwaddr gpa = section->offset_within_address_space;
>       hwaddr end;
> @@ -107,8 +109,9 @@ static void vfio_prereg_listener_region_add(MemoryListener *listener,
>   static void vfio_prereg_listener_region_del(MemoryListener *listener,
>                                               MemoryRegionSection *section)
>   {
> -    VFIOContainer *container = container_of(listener, VFIOContainer,
> -                                            prereg_listener);
> +    VFIOSpaprContainer *scontainer = container_of(listener, VFIOSpaprContainer,
> +                                                  prereg_listener);
> +    VFIOContainer *container = &scontainer->container;
>       const hwaddr gpa = section->offset_within_address_space;
>       hwaddr end;
>       int ret;
> @@ -445,6 +448,8 @@ static void setup_spapr_ops(VFIOContainerBase *bcontainer)
>   int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>   {
>       VFIOContainerBase *bcontainer = &container->bcontainer;
> +    VFIOSpaprContainer *scontainer = container_of(container, VFIOSpaprContainer,
> +                                                  container);
>       struct vfio_iommu_spapr_tce_info info;
>       bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
>       int ret, fd = container->fd;
> @@ -463,9 +468,9 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>               return -errno;
>           }
>       } else {
> -        container->prereg_listener = vfio_prereg_listener;
> +        scontainer->prereg_listener = vfio_prereg_listener;
>   
> -        memory_listener_register(&container->prereg_listener,
> +        memory_listener_register(&scontainer->prereg_listener,
>                                    &address_space_memory);
>           if (bcontainer->error) {
>               ret = -1;
> @@ -513,7 +518,7 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>   
>   listener_unregister_exit:
>       if (v2) {
> -        memory_listener_unregister(&container->prereg_listener);
> +        memory_listener_unregister(&scontainer->prereg_listener);
>       }
>       return ret;
>   }
> @@ -523,7 +528,10 @@ void vfio_spapr_container_deinit(VFIOContainer *container)
>       VFIOHostDMAWindow *hostwin, *next;
>   
>       if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
> -        memory_listener_unregister(&container->prereg_listener);
> +        VFIOSpaprContainer *scontainer = container_of(container,
> +                                                      VFIOSpaprContainer,
> +                                                      container);
> +        memory_listener_unregister(&scontainer->prereg_listener);
>       }
>       QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
>                          next) {
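
For readers following the container_of() chains in this patch, here is a
minimal standalone sketch of the pattern. It is not the QEMU code: the
Listener, Base, Legacy and Spapr types are hypothetical stand-ins, and
container_of is defined locally with offsetof. Note that recovering the
outer struct from the inner one is only valid when the inner object
really is embedded in an outer one.

```c
#include <assert.h>
#include <stddef.h>

/* Local definition for the sketch; QEMU ships its own container_of. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified stand-ins for the nested container types. */
typedef struct Listener { int id; } Listener;
typedef struct Base { int pgsizes; } Base;
typedef struct Legacy { Base base; int fd; } Legacy;

/* Backend-specific state lives in the outermost struct only. */
typedef struct Spapr {
    Legacy legacy;
    Listener prereg;
} Spapr;

/* A listener callback only receives the embedded Listener pointer. */
static Spapr *spapr_from_listener(Listener *l)
{
    assert(l);
    return container_of(l, Spapr, prereg);
}

/* Valid only if 'c' is the 'legacy' member of a Spapr object. */
static Spapr *spapr_from_legacy(Legacy *c)
{
    assert(c);
    return container_of(c, Spapr, legacy);
}
```

Keeping the listener in the outermost struct means the generic types no
longer need to know about it, at the cost of one extra container_of()
step in each callback.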




* Re: [PATCH v4 24/41] vfio/spapr: Move hostwin_list into spapr container
  2023-11-02  7:12 ` [PATCH v4 24/41] vfio/spapr: Move hostwin_list " Zhenzhong Duan
@ 2023-11-06 17:35   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-06 17:35 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Nicholas Piggin, Daniel Henrique Barboza, Cédric Le Goater,
	David Gibson, Harsh Prateek Bora, open list:sPAPR (pseries)

On 11/2/23 08:12, Zhenzhong Duan wrote:
> No functional changes intended.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   include/hw/vfio/vfio-common.h |  1 -
>   hw/vfio/spapr.c               | 36 +++++++++++++++++++----------------
>   2 files changed, 20 insertions(+), 17 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index ed6148c058..24ecc0e7ee 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -79,7 +79,6 @@ typedef struct VFIOContainer {
>       VFIOContainerBase bcontainer;
>       int fd; /* /dev/vfio/vfio, empowered by the attached groups */
>       unsigned iommu_type;
> -    QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>       QLIST_HEAD(, VFIOGroup) group_list;
>   } VFIOContainer;
>   
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index 68c3dd6c75..5c6426e697 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -27,6 +27,7 @@
>   typedef struct VFIOSpaprContainer {
>       VFIOContainer container;
>       MemoryListener prereg_listener;
> +    QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
>   } VFIOSpaprContainer;
>   
>   static bool vfio_prereg_listener_skipped_section(MemoryRegionSection *section)
> @@ -154,12 +155,12 @@ static const MemoryListener vfio_prereg_listener = {
>       .region_del = vfio_prereg_listener_region_del,
>   };
>   
> -static void vfio_host_win_add(VFIOContainer *container, hwaddr min_iova,
> +static void vfio_host_win_add(VFIOSpaprContainer *scontainer, hwaddr min_iova,
>                                 hwaddr max_iova, uint64_t iova_pgsizes)
>   {
>       VFIOHostDMAWindow *hostwin;
>   
> -    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> +    QLIST_FOREACH(hostwin, &scontainer->hostwin_list, hostwin_next) {
>           if (ranges_overlap(hostwin->min_iova,
>                              hostwin->max_iova - hostwin->min_iova + 1,
>                              min_iova,
> @@ -173,15 +174,15 @@ static void vfio_host_win_add(VFIOContainer *container, hwaddr min_iova,
>       hostwin->min_iova = min_iova;
>       hostwin->max_iova = max_iova;
>       hostwin->iova_pgsizes = iova_pgsizes;
> -    QLIST_INSERT_HEAD(&container->hostwin_list, hostwin, hostwin_next);
> +    QLIST_INSERT_HEAD(&scontainer->hostwin_list, hostwin, hostwin_next);
>   }
>   
> -static int vfio_host_win_del(VFIOContainer *container,
> +static int vfio_host_win_del(VFIOSpaprContainer *scontainer,
>                                hwaddr min_iova, hwaddr max_iova)
>   {
>       VFIOHostDMAWindow *hostwin;
>   
> -    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> +    QLIST_FOREACH(hostwin, &scontainer->hostwin_list, hostwin_next) {
>           if (hostwin->min_iova == min_iova && hostwin->max_iova == max_iova) {
>               QLIST_REMOVE(hostwin, hostwin_next);
>               g_free(hostwin);
> @@ -192,7 +193,7 @@ static int vfio_host_win_del(VFIOContainer *container,
>       return -1;
>   }
>   
> -static VFIOHostDMAWindow *vfio_find_hostwin(VFIOContainer *container,
> +static VFIOHostDMAWindow *vfio_find_hostwin(VFIOSpaprContainer *container,
>                                               hwaddr iova, hwaddr end)
>   {
>       VFIOHostDMAWindow *hostwin;
> @@ -329,6 +330,8 @@ vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
>   {
>       VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>                                               bcontainer);
> +    VFIOSpaprContainer *scontainer = container_of(container, VFIOSpaprContainer,
> +                                                  container);
>       VFIOHostDMAWindow *hostwin;
>       hwaddr pgsize = 0;
>       int ret;
> @@ -344,7 +347,7 @@ vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
>           iova = section->offset_within_address_space;
>           end = iova + int128_get64(section->size) - 1;
>   
> -        if (!vfio_find_hostwin(container, iova, end)) {
> +        if (!vfio_find_hostwin(scontainer, iova, end)) {
>               error_setg(errp, "Container %p can't map guest IOVA region"
>                          " 0x%"HWADDR_PRIx"..0x%"HWADDR_PRIx, container,
>                          iova, end);
> @@ -358,7 +361,7 @@ vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
>       }
>   
>       /* For now intersections are not allowed, we may relax this later */
> -    QLIST_FOREACH(hostwin, &container->hostwin_list, hostwin_next) {
> +    QLIST_FOREACH(hostwin, &scontainer->hostwin_list, hostwin_next) {
>           if (ranges_overlap(hostwin->min_iova,
>                              hostwin->max_iova - hostwin->min_iova + 1,
>                              section->offset_within_address_space,
> @@ -380,7 +383,7 @@ vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
>           return ret;
>       }
>   
> -    vfio_host_win_add(container, section->offset_within_address_space,
> +    vfio_host_win_add(scontainer, section->offset_within_address_space,
>                         section->offset_within_address_space +
>                         int128_get64(section->size) - 1, pgsize);
>   #ifdef CONFIG_KVM
> @@ -419,6 +422,8 @@ vfio_spapr_container_del_section_window(VFIOContainerBase *bcontainer,
>   {
>       VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>                                               bcontainer);
> +    VFIOSpaprContainer *scontainer = container_of(container, VFIOSpaprContainer,
> +                                                  container);
>   
>       if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
>           return;
> @@ -426,7 +431,7 @@ vfio_spapr_container_del_section_window(VFIOContainerBase *bcontainer,
>   
>       vfio_spapr_remove_window(container,
>                                section->offset_within_address_space);
> -    if (vfio_host_win_del(container,
> +    if (vfio_host_win_del(scontainer,
>                             section->offset_within_address_space,
>                             section->offset_within_address_space +
>                             int128_get64(section->size) - 1) < 0) {
> @@ -454,7 +459,7 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>       bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
>       int ret, fd = container->fd;
>   
> -    QLIST_INIT(&container->hostwin_list);
> +    QLIST_INIT(&scontainer->hostwin_list);
>   
>       /*
>        * The host kernel code implementing VFIO_IOMMU_DISABLE is called
> @@ -506,7 +511,7 @@ int vfio_spapr_container_init(VFIOContainer *container, Error **errp)
>       } else {
>           /* The default table uses 4K pages */
>           bcontainer->pgsizes = 0x1000;
> -        vfio_host_win_add(container, info.dma32_window_start,
> +        vfio_host_win_add(scontainer, info.dma32_window_start,
>                             info.dma32_window_start +
>                             info.dma32_window_size - 1,
>                             0x1000);
> @@ -525,15 +530,14 @@ listener_unregister_exit:
>   
>   void vfio_spapr_container_deinit(VFIOContainer *container)
>   {
> +    VFIOSpaprContainer *scontainer = container_of(container, VFIOSpaprContainer,
> +                                                  container);
>       VFIOHostDMAWindow *hostwin, *next;
>   
>       if (container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
> -        VFIOSpaprContainer *scontainer = container_of(container,
> -                                                      VFIOSpaprContainer,
> -                                                      container);
>           memory_listener_unregister(&scontainer->prereg_listener);
>       }
> -    QLIST_FOREACH_SAFE(hostwin, &container->hostwin_list, hostwin_next,
> +    QLIST_FOREACH_SAFE(hostwin, &scontainer->hostwin_list, hostwin_next,
>                          next) {
>           QLIST_REMOVE(hostwin, hostwin_next);
>           g_free(hostwin);




* RE: [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE add/del_section_window
  2023-11-06 17:33   ` Cédric Le Goater
@ 2023-11-07  3:06     ` Duan, Zhenzhong
  2023-11-07 13:07       ` Cédric Le Goater
  0 siblings, 1 reply; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-07  3:06 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Nicholas Piggin, Daniel Henrique Barboza, David Gibson,
	Harsh Prateek Bora, open list:sPAPR (pseries)



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 7, 2023 1:33 AM
>Subject: Re: [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE
>add/del_section_window
>
>On 11/2/23 08:12, Zhenzhong Duan wrote:
>> No functional change intended.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   include/hw/vfio/vfio-common.h         |  5 -----
>>   include/hw/vfio/vfio-container-base.h |  5 +++++
>>   hw/vfio/common.c                      |  8 ++------
>>   hw/vfio/container-base.c              | 21 +++++++++++++++++++++
>>   hw/vfio/spapr.c                       | 19 ++++++++++++++-----
>>   5 files changed, 42 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index b9e5a0e64b..055f679363 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -169,11 +169,6 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
>>   void vfio_put_address_space(VFIOAddressSpace *space);
>>
>>   /* SPAPR specific */
>> -int vfio_container_add_section_window(VFIOContainer *container,
>> -                                      MemoryRegionSection *section,
>> -                                      Error **errp);
>> -void vfio_container_del_section_window(VFIOContainer *container,
>> -                                       MemoryRegionSection *section);
>>   int vfio_spapr_container_init(VFIOContainer *container, Error **errp);
>>   void vfio_spapr_container_deinit(VFIOContainer *container);
>>
>> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
>> index f62a14ac73..4b6f017c6f 100644
>> --- a/include/hw/vfio/vfio-container-base.h
>> +++ b/include/hw/vfio/vfio-container-base.h
>> @@ -75,6 +75,11 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>>   int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>>                                hwaddr iova, ram_addr_t size,
>>                                IOMMUTLBEntry *iotlb);
>> +int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
>> +                                      MemoryRegionSection *section,
>> +                                      Error **errp);
>> +void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
>> +                                       MemoryRegionSection *section);
>>   int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>>                                              bool start);
>>   int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 483ba82089..572ae7c934 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -571,8 +571,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>   {
>>       VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>>                                                    listener);
>> -    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>> -                                            bcontainer);
>>       hwaddr iova, end;
>>       Int128 llend, llsize;
>>       void *vaddr;
>> @@ -595,7 +593,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>           return;
>>       }
>>
>> -    if (vfio_container_add_section_window(container, section, &err)) {
>> +    if (vfio_container_add_section_window(bcontainer, section, &err)) {
>>           goto fail;
>>       }
>>
>> @@ -738,8 +736,6 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>   {
>>       VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>>                                                    listener);
>> -    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>> -                                            bcontainer);
>>       hwaddr iova, end;
>>       Int128 llend, llsize;
>>       int ret;
>> @@ -818,7 +814,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>
>>       memory_region_unref(section->mr);
>>
>> -    vfio_container_del_section_window(container, section);
>> +    vfio_container_del_section_window(bcontainer, section);
>>   }
>>
>>   typedef struct VFIODirtyRanges {
>> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
>> index 0177f43741..71f7274973 100644
>> --- a/hw/vfio/container-base.c
>> +++ b/hw/vfio/container-base.c
>> @@ -31,6 +31,27 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>>       return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
>>   }
>>
>> +int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
>> +                                      MemoryRegionSection *section,
>> +                                      Error **errp)
>> +{
>> +    if (!bcontainer->ops->add_window) {
>> +        return 0;
>> +    }
>
>These should be an assert, right? They are only called on the pseries
>platform, which defines the handlers.

We use a unified vfio_memory_listener for the legacy, spapr and iommufd
backends, so the check is needed for the legacy and iommufd backends.

Another option is to introduce separate region_add/del callbacks for spapr;
then we could add an assert, but that would introduce redundant code.
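
To make the trade-off concrete, here is a minimal standalone sketch of
the two dispatch styles. It is not the QEMU code: Container, ContainerOps
and the *_like_* names are hypothetical stand-ins, and the section
argument is reduced to an int.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the QEMU types. */
typedef struct Container Container;

typedef struct ContainerOps {
    /* Optional: only a spapr-like backend fills this in. */
    int (*add_window)(Container *c, int section);
} ContainerOps;

struct Container {
    const ContainerOps *ops;
    int windows;    /* number of windows created so far */
};

/* Shared-listener variant: a missing handler is a successful no-op. */
static int container_add_window(Container *c, int section)
{
    if (!c->ops->add_window) {
        return 0;
    }
    return c->ops->add_window(c, section);
}

/* Backend-specific variant: the caller guarantees the handler exists. */
static int container_add_window_strict(Container *c, int section)
{
    assert(c->ops->add_window);
    return c->ops->add_window(c, section);
}

/* Example handler that just counts the windows it is asked to add. */
static int spapr_like_add_window(Container *c, int section)
{
    (void)section;
    c->windows++;
    return 0;
}

static const ContainerOps legacy_like_ops = { .add_window = NULL };
static const ContainerOps spapr_like_ops = {
    .add_window = spapr_like_add_window,
};
```

The strict variant is only safe if every caller is known to be the
spapr-like backend, which is why it would need its own listener
callbacks.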

Thanks
Zhenzhong


* RE: [PATCH v4 30/41] vfio/iommufd: Add support for iova_ranges
  2023-11-06 17:19   ` Cédric Le Goater
@ 2023-11-07  3:07     ` Duan, Zhenzhong
  0 siblings, 0 replies; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-07  3:07 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 7, 2023 1:19 AM
>Subject: Re: [PATCH v4 30/41] vfio/iommufd: Add support for iova_ranges
>
>On 11/2/23 08:12, Zhenzhong Duan wrote:
>> Some vIOMMUs, such as virtio-iommu, use IOVA ranges from the host side to
>> set up reserved ranges for passthrough devices, so that the guest will not
>> use an IOVA range beyond what the host supports.
>>
>> Use an IOMMUFD uAPI to get the host-side IOVA ranges and pass them to
>> the vIOMMU, just like the legacy backend.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> v4: fix build error in 32bit fedora
>>
>>   hw/vfio/iommufd.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 47 insertions(+)
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 1bb55ca2c4..22f02f92a9 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -341,6 +341,52 @@ static int iommufd_ram_block_discard_disable(bool state)
>>       return ram_block_uncoordinated_discard_disable(state);
>>   }
>>
>> +static int vfio_get_info_iova_range(VFIOIOMMUFDContainer *container,
>> +                                    uint32_t ioas_id)
>> +{
>> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>> +    struct iommu_ioas_iova_ranges *info;
>> +    struct iommu_iova_range *iova_ranges;
>> +    int ret, sz, fd = container->be->fd;
>> +
>> +    info = g_malloc0(sizeof(*info));
>> +    info->size = sizeof(*info);
>> +    info->ioas_id = ioas_id;
>> +
>> +    ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
>> +    if (ret && errno != EMSGSIZE) {
>> +        goto error;
>> +    }
>> +
>> +    sz = info->num_iovas * sizeof(struct iommu_iova_range);
>> +    info = g_realloc(info, sizeof(*info) + sz);
>> +    info->allowed_iovas = (uintptr_t)(info + 1);
>> +
>> +    ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
>> +    if (ret) {
>> +        goto error;
>> +    }
>> +
>> +    iova_ranges = (struct iommu_iova_range *)info->allowed_iovas;
>
>iova_ranges = (struct iommu_iova_range *)(uintptr_t)info->allowed_iovas;

Will fix, thanks for pointing it out.
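
For reference, a minimal sketch of why the intermediate cast matters: on
a 32-bit host, casting a 64-bit integer directly to a pointer changes its
size and draws a compiler warning (an error with -Werror), while going
through uintptr_t makes the narrowing explicit. The struct and function
names below are hypothetical stand-ins, not the iommufd uAPI definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel uAPI structures. */
struct range {
    uint64_t start;
    uint64_t last;
};

struct ranges_info {
    uint32_t num;
    uint64_t allowed;   /* user pointer carried as a 64-bit integer */
};

/* Store a pointer in the 64-bit field: convert via uintptr_t. */
static void info_set_ranges(struct ranges_info *info, struct range *r,
                            uint32_t num)
{
    info->allowed = (uintptr_t)r;
    info->num = num;
}

/* Read it back: narrow to uintptr_t first, then convert to a pointer. */
static struct range *info_get_ranges(const struct ranges_info *info)
{
    /* The stored value must fit in a host pointer. */
    assert(info->allowed <= UINTPTR_MAX);
    return (struct range *)(uintptr_t)info->allowed;
}
```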

BRs.
Zhenzhong


* Re: [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE add/del_section_window
  2023-11-07  3:06     ` Duan, Zhenzhong
@ 2023-11-07 13:07       ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 13:07 UTC (permalink / raw)
  To: Duan, Zhenzhong, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Nicholas Piggin, Daniel Henrique Barboza, David Gibson,
	Harsh Prateek Bora, open list:sPAPR (pseries)

On 11/7/23 04:06, Duan, Zhenzhong wrote:
> 
> 
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Sent: Tuesday, November 7, 2023 1:33 AM
>> Subject: Re: [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE
>> add/del_section_window
>>
>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>> No functional change intended.
>>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>>    include/hw/vfio/vfio-common.h         |  5 -----
>>>    include/hw/vfio/vfio-container-base.h |  5 +++++
>>>    hw/vfio/common.c                      |  8 ++------
>>>    hw/vfio/container-base.c              | 21 +++++++++++++++++++++
>>>    hw/vfio/spapr.c                       | 19 ++++++++++++++-----
>>>    5 files changed, 42 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index b9e5a0e64b..055f679363 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -169,11 +169,6 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
>>>    void vfio_put_address_space(VFIOAddressSpace *space);
>>>
>>>    /* SPAPR specific */
>>> -int vfio_container_add_section_window(VFIOContainer *container,
>>> -                                      MemoryRegionSection *section,
>>> -                                      Error **errp);
>>> -void vfio_container_del_section_window(VFIOContainer *container,
>>> -                                       MemoryRegionSection *section);
>>>    int vfio_spapr_container_init(VFIOContainer *container, Error **errp);
>>>    void vfio_spapr_container_deinit(VFIOContainer *container);
>>>
>>> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
>>> index f62a14ac73..4b6f017c6f 100644
>>> --- a/include/hw/vfio/vfio-container-base.h
>>> +++ b/include/hw/vfio/vfio-container-base.h
>>> @@ -75,6 +75,11 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>>>    int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>>>                                 hwaddr iova, ram_addr_t size,
>>>                                 IOMMUTLBEntry *iotlb);
>>> +int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
>>> +                                      MemoryRegionSection *section,
>>> +                                      Error **errp);
>>> +void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
>>> +                                       MemoryRegionSection *section);
>>>    int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>>>                                               bool start);
>>>    int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
>>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>>> index 483ba82089..572ae7c934 100644
>>> --- a/hw/vfio/common.c
>>> +++ b/hw/vfio/common.c
>>> @@ -571,8 +571,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>>    {
>>>        VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>>>                                                     listener);
>>> -    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>>> -                                            bcontainer);
>>>        hwaddr iova, end;
>>>        Int128 llend, llsize;
>>>        void *vaddr;
>>> @@ -595,7 +593,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>>>            return;
>>>        }
>>>
>>> -    if (vfio_container_add_section_window(container, section, &err)) {
>>> +    if (vfio_container_add_section_window(bcontainer, section, &err)) {
>>>            goto fail;
>>>        }
>>>
>>> @@ -738,8 +736,6 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>>    {
>>>        VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>>>                                                     listener);
>>> -    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>>> -                                            bcontainer);
>>>        hwaddr iova, end;
>>>        Int128 llend, llsize;
>>>        int ret;
>>> @@ -818,7 +814,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>>>
>>>        memory_region_unref(section->mr);
>>>
>>> -    vfio_container_del_section_window(container, section);
>>> +    vfio_container_del_section_window(bcontainer, section);
>>>    }
>>>
>>>    typedef struct VFIODirtyRanges {
>>> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
>>> index 0177f43741..71f7274973 100644
>>> --- a/hw/vfio/container-base.c
>>> +++ b/hw/vfio/container-base.c
>>> @@ -31,6 +31,27 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>>>        return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
>>>    }
>>>
>>> +int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
>>> +                                      MemoryRegionSection *section,
>>> +                                      Error **errp)
>>> +{
>>> +    if (!bcontainer->ops->add_window) {
>>> +        return 0;
>>> +    }
>>
>> These should be an assert, right? They are only called on the pseries
>> platform, which defines the handlers.
> 
> We use a unified vfio_memory_listener for the legacy, spapr and iommufd
> backends, so the check is needed for the legacy and iommufd backends.
> 
> Another option is to introduce separate region_add/del callbacks for spapr;
> then we could add an assert, but that would introduce redundant code.

Yeah. I thought so. I will take a look. I have more comments to come.

Thanks,

C.





* Re: [PATCH v4 25/41] Add iommufd configure option
  2023-11-02  7:12 ` [PATCH v4 25/41] Add iommufd configure option Zhenzhong Duan
@ 2023-11-07 13:14   ` Cédric Le Goater
  2023-11-07 14:37     ` Cédric Le Goater
  0 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 13:14 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Paolo Bonzini, Marc-André Lureau, Daniel P. Berrangé,
	Thomas Huth, Philippe Mathieu-Daudé

On 11/2/23 08:12, Zhenzhong Duan wrote:
> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
> iommufd support, enabled by default.

I don't think a configure option is the right approach. I will
comment on other patches to propose another solution that relies on
Kconfig and enables IOMMUFD only for aarch64, s390x and x86_64.

Please drop this patch.

Thanks,

C.



> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   meson.build                   | 6 ++++++
>   meson_options.txt             | 2 ++
>   scripts/meson-buildoptions.sh | 3 +++
>   3 files changed, 11 insertions(+)
> 
> diff --git a/meson.build b/meson.build
> index dcef8b1e79..72a57288a0 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -560,6 +560,10 @@ have_tpm = get_option('tpm') \
>     .require(targetos != 'windows', error_message: 'TPM emulation only available on POSIX systems') \
>     .allowed()
>   
> +have_iommufd = get_option('iommufd') \
> +  .require(targetos == 'linux', error_message: 'iommufd is supported only on Linux') \
> +  .allowed()
> +
>   # vhost
>   have_vhost_user = get_option('vhost_user') \
>     .disable_auto_if(targetos != 'linux') \
> @@ -2133,6 +2137,7 @@ if get_option('tcg').allowed()
>   endif
>   config_host_data.set('CONFIG_TPM', have_tpm)
>   config_host_data.set('CONFIG_TSAN', get_option('tsan'))
> +config_host_data.set('CONFIG_IOMMUFD', have_iommufd)
>   config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
>   config_host_data.set('CONFIG_VDE', vde.found())
>   config_host_data.set('CONFIG_VHOST', have_vhost)
> @@ -4075,6 +4080,7 @@ summary_info += {'vhost-user-crypto support': have_vhost_user_crypto}
>   summary_info += {'vhost-user-blk server support': have_vhost_user_blk_server}
>   summary_info += {'vhost-vdpa support': have_vhost_vdpa}
>   summary_info += {'build guest agent': have_ga}
> +summary_info += {'iommufd support': have_iommufd}
>   summary(summary_info, bool_yn: true, section: 'Configurable features')
>   
>   # Compilation information
> diff --git a/meson_options.txt b/meson_options.txt
> index 3c7398f3c6..91bb958cae 100644
> --- a/meson_options.txt
> +++ b/meson_options.txt
> @@ -109,6 +109,8 @@ option('dbus_display', type: 'feature', value: 'auto',
>          description: '-display dbus support')
>   option('tpm', type : 'feature', value : 'auto',
>          description: 'TPM support')
> +option('iommufd', type : 'feature', value : 'auto',
> +       description: 'iommufd support')
>   
>   # Do not enable it by default even for Mingw32, because it doesn't
>   # work on Wine.
> diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
> index 7ca4b77eae..1effc46f7d 100644
> --- a/scripts/meson-buildoptions.sh
> +++ b/scripts/meson-buildoptions.sh
> @@ -125,6 +125,7 @@ meson_options_help() {
>     printf "%s\n" '  guest-agent-msi Build MSI package for the QEMU Guest Agent'
>     printf "%s\n" '  hvf             HVF acceleration support'
>     printf "%s\n" '  iconv           Font glyph conversion support'
> +  printf "%s\n" '  iommufd         iommufd support'
>     printf "%s\n" '  jack            JACK sound support'
>     printf "%s\n" '  keyring         Linux keyring support'
>     printf "%s\n" '  kvm             KVM acceleration support'
> @@ -342,6 +343,8 @@ _meson_option_parse() {
>       --enable-install-blobs) printf "%s" -Dinstall_blobs=true ;;
>       --disable-install-blobs) printf "%s" -Dinstall_blobs=false ;;
>       --interp-prefix=*) quote_sh "-Dinterp_prefix=$2" ;;
> +    --enable-iommufd) printf "%s" -Diommufd=enabled ;;
> +    --disable-iommufd) printf "%s" -Diommufd=disabled ;;
>       --enable-jack) printf "%s" -Djack=enabled ;;
>       --disable-jack) printf "%s" -Djack=disabled ;;
>       --enable-keyring) printf "%s" -Dkeyring=enabled ;;




* Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-02  7:12 ` [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
@ 2023-11-07 13:33   ` Cédric Le Goater
  2023-11-08  3:35     ` Duan, Zhenzhong
  2023-11-08  5:50     ` Markus Armbruster
  0 siblings, 2 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 13:33 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Paolo Bonzini, Eric Blake, Markus Armbruster,
	Daniel P. Berrangé,
	Eduardo Habkost, Thomas Huth

On 11/2/23 08:12, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
> 
> Introduce an iommufd object which allows the interaction
> with the host /dev/iommu device.
> 
> /dev/iommu may already have been opened outside of QEMU,
> in which case the fd can be passed directly along with the
> iommufd object:
> 
> This allows the iommufd object to be shared across several
> subsystems (VFIO, VDPA, ...). For example, libvirt would open
> the /dev/iommu once.
> 
> If no fd is passed along with the iommufd object, the /dev/iommu
> is opened by the qemu code.
> 
> The CONFIG_IOMMUFD option must be set to compile this new object.
> 
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v4: add CONFIG_IOMMUFD check, document default case
> 
>   MAINTAINERS              |   7 ++
>   qapi/qom.json            |  22 ++++
>   include/sysemu/iommufd.h |  46 +++++++
>   backends/iommufd-stub.c  |  59 +++++++++
>   backends/iommufd.c       | 257 +++++++++++++++++++++++++++++++++++++++
>   backends/Kconfig         |   4 +
>   backends/meson.build     |   5 +
>   backends/trace-events    |  12 ++
>   qemu-options.hx          |  13 ++
>   9 files changed, 425 insertions(+)
>   create mode 100644 include/sysemu/iommufd.h
>   create mode 100644 backends/iommufd-stub.c
>   create mode 100644 backends/iommufd.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index cd8d6b140f..6f35159255 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2135,6 +2135,13 @@ F: hw/vfio/ap.c
>   F: docs/system/s390x/vfio-ap.rst
>   L: qemu-s390x@nongnu.org
>   
> +iommufd
> +M: Yi Liu <yi.l.liu@intel.com>
> +M: Eric Auger <eric.auger@redhat.com>
> +S: Supported
> +F: backends/iommufd.c
> +F: include/sysemu/iommufd.h
> +
>   vhost
>   M: Michael S. Tsirkin <mst@redhat.com>
>   S: Supported
> diff --git a/qapi/qom.json b/qapi/qom.json
> index c53ef978ff..27300add48 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -794,6 +794,24 @@
>   { 'struct': 'VfioUserServerProperties',
>     'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>   
> +##
> +# @IOMMUFDProperties:
> +#
> +# Properties for iommufd objects.
> +#
> +# @fd: file descriptor name previously passed via 'getfd' command,
> +#     which represents a pre-opened /dev/iommu.  This allows the
> +#     iommufd object to be shared across several subsystems
> +#     (VFIO, VDPA, ...), and the file descriptor to be shared
> +#     with other process, e.g. DPDK.  (default: QEMU opens
> +#     /dev/iommu by itself)
> +#
> +# Since: 8.2
> +##
> +{ 'struct': 'IOMMUFDProperties',
> +  'data': { '*fd': 'str' },
> +  'if': 'CONFIG_IOMMUFD' }


Whether IOMMUFD is enabled on a platform is a configuration choice,
not a dependency on an external resource. I would keep things simple
and drop all the #ifdef in the documentation files.

There might be a way to remove the documentation as well. Not a big
issue for now.


> +
>   ##
>   # @RngProperties:
>   #
> @@ -934,6 +952,8 @@
>       'input-barrier',
>       { 'name': 'input-linux',
>         'if': 'CONFIG_LINUX' },
> +    { 'name': 'iommufd',
> +      'if': 'CONFIG_IOMMUFD' },
>       'iothread',
>       'main-loop',
>       { 'name': 'memory-backend-epc',
> @@ -1003,6 +1023,8 @@
>         'input-barrier':              'InputBarrierProperties',
>         'input-linux':                { 'type': 'InputLinuxProperties',
>                                         'if': 'CONFIG_LINUX' },
> +      'iommufd':                    { 'type': 'IOMMUFDProperties',
> +                                      'if': 'CONFIG_IOMMUFD' },
>         'iothread':                   'IothreadProperties',
>         'main-loop':                  'MainLoopProperties',
>         'memory-backend-epc':         { 'type': 'MemoryBackendEpcProperties',
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> new file mode 100644
> index 0000000000..f0e5c7eeb8
> --- /dev/null
> +++ b/include/sysemu/iommufd.h
> @@ -0,0 +1,46 @@
> +#ifndef SYSEMU_IOMMUFD_H
> +#define SYSEMU_IOMMUFD_H
> +
> +#include "qom/object.h"
> +#include "qemu/thread.h"
> +#include "exec/hwaddr.h"
> +#include "exec/cpu-common.h"
> +
> +#define TYPE_IOMMUFD_BACKEND "iommufd"
> +OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass,
> +                    IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND(obj) \
> +    OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND_GET_CLASS(obj) \
> +    OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND_CLASS(klass) \
> +    OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND)
> +struct IOMMUFDBackendClass {
> +    ObjectClass parent_class;
> +};
> +
> +struct IOMMUFDBackend {
> +    Object parent;
> +
> +    /*< protected >*/
> +    int fd;            /* /dev/iommu file descriptor */
> +    bool owned;        /* is the /dev/iommu opened internally */
> +    QemuMutex lock;
> +    uint32_t users;
> +
> +    /*< public >*/
> +};
> +
> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
> +void iommufd_backend_disconnect(IOMMUFDBackend *be);
> +
> +int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id);
> +void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id);
> +void iommufd_backend_free_id(int fd, uint32_t id);
> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
> +                            ram_addr_t size, void *vaddr, bool readonly);
> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> +                              hwaddr iova, ram_addr_t size);
> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
> +                               uint32_t pt_id, uint32_t *out_hwpt);
> +#endif
> diff --git a/backends/iommufd-stub.c b/backends/iommufd-stub.c

I don't think this stub file is needed. Please drop.

> new file mode 100644
> index 0000000000..02ac844c17
> --- /dev/null
> +++ b/backends/iommufd-stub.c
> @@ -0,0 +1,59 @@
> +/*
> + * iommufd container backend stub
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + * Copyright Red Hat, Inc. 2023
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *          Eric Auger <eric.auger@redhat.com>
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> +
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> +
> + * You should have received a copy of the GNU General Public License along
> + * with this program; if not, see <http://www.gnu.org/licenses/>.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "sysemu/iommufd.h"
> +#include "qemu/error-report.h"
> +
> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
> +{
> +    return 0;
> +}
> +void iommufd_backend_disconnect(IOMMUFDBackend *be)
> +{
> +}
> +void iommufd_backend_free_id(int fd, uint32_t id)
> +{
> +}
> +int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id)
> +{
> +    return 0;
> +}
> +void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id)
> +{
> +}
> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
> +                            ram_addr_t size, void *vaddr, bool readonly)
> +{
> +    return 0;
> +}
> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> +                              hwaddr iova, ram_addr_t size)
> +{
> +    return 0;
> +}
> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
> +                               uint32_t pt_id, uint32_t *out_hwpt)
> +{
> +    return 0;
> +}
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> new file mode 100644
> index 0000000000..a526d58824
> --- /dev/null
> +++ b/backends/iommufd.c
> @@ -0,0 +1,257 @@
> +/*
> + * iommufd container backend
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + * Copyright Red Hat, Inc. 2023
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *          Eric Auger <eric.auger@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "sysemu/iommufd.h"
> +#include "qapi/error.h"
> +#include "qapi/qmp/qerror.h"
> +#include "qemu/module.h"
> +#include "qom/object_interfaces.h"
> +#include "qemu/error-report.h"
> +#include "monitor/monitor.h"
> +#include "trace.h"
> +#include <sys/ioctl.h>
> +#include <linux/iommufd.h>
> +
> +static void iommufd_backend_init(Object *obj)
> +{
> +    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
> +
> +    be->fd = -1;
> +    be->users = 0;
> +    be->owned = true;
> +    qemu_mutex_init(&be->lock);
> +}
> +
> +static void iommufd_backend_finalize(Object *obj)
> +{
> +    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
> +
> +    if (be->owned) {
> +        close(be->fd);
> +        be->fd = -1;
> +    }
> +}
> +
> +static void iommufd_backend_set_fd(Object *obj, const char *str, Error **errp)
> +{
> +    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
> +    int fd = -1;
> +
> +    fd = monitor_fd_param(monitor_cur(), str, errp);
> +    if (fd == -1) {
> +        error_prepend(errp, "Could not parse remote object fd %s:", str);
> +        return;
> +    }
> +    qemu_mutex_lock(&be->lock);
> +    be->fd = fd;
> +    be->owned = false;
> +    qemu_mutex_unlock(&be->lock);
> +    trace_iommu_backend_set_fd(be->fd);
> +}
> +
> +static void iommufd_backend_class_init(ObjectClass *oc, void *data)
> +{
> +    object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
> +}
> +
> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
> +{
> +    int fd, ret = 0;
> +
> +    qemu_mutex_lock(&be->lock);
> +    if (be->users == UINT32_MAX) {
> +        error_setg(errp, "too many connections");
> +        ret = -E2BIG;
> +        goto out;
> +    }
> +    if (be->owned && !be->users) {
> +        fd = qemu_open_old("/dev/iommu", O_RDWR);
> +        if (fd < 0) {
> +            error_setg_errno(errp, errno, "/dev/iommu opening failed");
> +            ret = fd;
> +            goto out;
> +        }
> +        be->fd = fd;
> +    }
> +    be->users++;
> +out:
> +    trace_iommufd_backend_connect(be->fd, be->owned,
> +                                  be->users, ret);
> +    qemu_mutex_unlock(&be->lock);
> +    return ret;
> +}
> +
> +void iommufd_backend_disconnect(IOMMUFDBackend *be)
> +{
> +    qemu_mutex_lock(&be->lock);
> +    if (!be->users) {
> +        goto out;
> +    }
> +    be->users--;
> +    if (!be->users && be->owned) {
> +        close(be->fd);
> +        be->fd = -1;
> +    }
> +out:
> +    trace_iommufd_backend_disconnect(be->fd, be->users);
> +    qemu_mutex_unlock(&be->lock);
> +}
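
As an aside, the connect/disconnect pairing above is a plain reference
count with a lazy open: the first user opens /dev/iommu when QEMU owns
the fd, and the last user closes it. A minimal standalone sketch of that
pattern follows. It is not QEMU code: DemoBackend, demo_connect(),
demo_disconnect() and the demo_fake_open()/demo_fake_close() helpers are
hypothetical names, the opener is injected so the sketch runs without
/dev/iommu, and locking is left out for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for IOMMUFDBackend; not the QEMU structure. */
typedef struct {
    int fd;                    /* -1 while closed */
    bool owned;                /* true when we open/close the fd ourselves */
    uint32_t users;            /* number of connected users */
    int (*open_fn)(void);      /* stands in for opening /dev/iommu */
    void (*close_fn)(int fd);  /* stands in for close() */
} DemoBackend;

static int demo_close_calls;   /* lets a caller observe the final close */

/* Fake opener/closer so the sketch does not need a real device. */
static int demo_fake_open(void) { return 42; }
static void demo_fake_close(int fd) { (void)fd; demo_close_calls++; }

static int demo_connect(DemoBackend *be)
{
    assert(be && be->open_fn);
    if (be->users == UINT32_MAX) {
        return -1;                 /* too many connections */
    }
    if (be->owned && be->users == 0) {
        int fd = be->open_fn();    /* first user opens the device */
        if (fd < 0) {
            return fd;
        }
        be->fd = fd;
    }
    be->users++;
    return 0;
}

static void demo_disconnect(DemoBackend *be)
{
    assert(be && be->close_fn);
    if (be->users == 0) {
        return;                    /* unbalanced disconnect is ignored */
    }
    be->users--;
    if (be->users == 0 && be->owned) {
        be->close_fn(be->fd);      /* last user closes the device */
        be->fd = -1;
    }
}
```

With owned set to false (the fd= case) neither helper touches the fd,
which matches the intent that whoever passed the fd in keeps ownership.
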
> +
> +static int iommufd_backend_alloc_ioas(int fd, uint32_t *ioas_id)
> +{
> +    int ret;
> +    struct iommu_ioas_alloc alloc_data  = {
> +        .size = sizeof(alloc_data),
> +        .flags = 0,
> +    };
> +
> +    ret = ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data);
> +    if (ret) {
> +        error_report("Failed to allocate ioas %m");
> +    }
> +
> +    *ioas_id = alloc_data.out_ioas_id;
> +    trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret);
> +
> +    return ret;
> +}
> +
> +void iommufd_backend_free_id(int fd, uint32_t id)
> +{
> +    int ret;
> +    struct iommu_destroy des = {
> +        .size = sizeof(des),
> +        .id = id,
> +    };
> +
> +    ret = ioctl(fd, IOMMU_DESTROY, &des);
> +    trace_iommufd_backend_free_id(fd, id, ret);
> +    if (ret) {
> +        error_report("Failed to free id: %u %m", id);
> +    }
> +}
> +
> +int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id)
> +{
> +    int ret;
> +
> +    ret = iommufd_backend_alloc_ioas(be->fd, ioas_id);
> +    trace_iommufd_backend_get_ioas(be->fd, *ioas_id, ret);
> +    return ret;
> +}
> +
> +void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id)
> +{
> +    iommufd_backend_free_id(be->fd, ioas_id);
> +    trace_iommufd_backend_put_ioas(be->fd, ioas_id);
> +}
> +
> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
> +                            ram_addr_t size, void *vaddr, bool readonly)
> +{
> +    int ret;
> +    struct iommu_ioas_map map = {
> +        .size = sizeof(map),
> +        .flags = IOMMU_IOAS_MAP_READABLE |
> +                 IOMMU_IOAS_MAP_FIXED_IOVA,
> +        .ioas_id = ioas_id,
> +        .__reserved = 0,
> +        .user_va = (uintptr_t)vaddr,
> +        .iova = iova,
> +        .length = size,
> +    };
> +
> +    if (!readonly) {
> +        map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
> +    }
> +
> +    ret = ioctl(be->fd, IOMMU_IOAS_MAP, &map);
> +    trace_iommufd_backend_map_dma(be->fd, ioas_id, iova, size,
> +                                  vaddr, readonly, ret);
> +    if (ret) {
> +        error_report("IOMMU_IOAS_MAP failed: %m");
> +    }
> +    return !ret ? 0 : -errno;
> +}
> +
> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> +                              hwaddr iova, ram_addr_t size)
> +{
> +    int ret;
> +    struct iommu_ioas_unmap unmap = {
> +        .size = sizeof(unmap),
> +        .ioas_id = ioas_id,
> +        .iova = iova,
> +        .length = size,
> +    };
> +
> +    ret = ioctl(be->fd, IOMMU_IOAS_UNMAP, &unmap);
> +    trace_iommufd_backend_unmap_dma(be->fd, ioas_id, iova, size, ret);
> +    /*
> +     * TODO: IOMMUFD doesn't support mapping PCI BARs for now.
> +     * It's not a problem if there is no p2p dma, relax it here
> +     * and avoid many noisy trigger from vIOMMU side.

Should we add a warn_report()?

> +     */
> +    if (ret && errno == ENOENT) {
> +        ret = 0;
> +    }
> +    if (ret) {
> +        error_report("IOMMU_IOAS_UNMAP failed: %m");
> +    }
> +    return !ret ? 0 : -errno;
> +}
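
To make the warn_report() question above concrete, here is one possible
shape, written as a standalone sketch rather than as a change to the
patch. demo_unmap_result() is a hypothetical helper; 'ret' and 'err'
stand in for the ioctl() return value and the errno it left behind, and
fprintf() stands in for QEMU's reporting helpers (warn_report_once()
would probably be the natural fit in the real code). Passing errno in
explicitly also avoids reading it after other calls may have changed it.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: map the raw unmap result
 * to the value returned to the caller. ENOENT is tolerated, but the
 * first occurrence is reported so it does not go completely unnoticed.
 */
static int demo_unmap_result(int ret, int err, unsigned *enoent_seen)
{
    assert(enoent_seen);
    if (ret == 0) {
        return 0;
    }
    if (err == ENOENT) {
        /* Nothing was mapped at this IOVA: warn once, then stay quiet. */
        if (*enoent_seen == 0) {
            fprintf(stderr, "warning: unmap of an unmapped range ignored\n");
        }
        (*enoent_seen)++;
        return 0;
    }
    fprintf(stderr, "IOMMU_IOAS_UNMAP failed: errno=%d\n", err);
    return -err;
}
```
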
> +
> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
> +                               uint32_t pt_id, uint32_t *out_hwpt)
> +{
> +    int ret;
> +    struct iommu_hwpt_alloc alloc_hwpt = {
> +        .size = sizeof(struct iommu_hwpt_alloc),
> +        .flags = 0,
> +        .dev_id = dev_id,
> +        .pt_id = pt_id,
> +        .__reserved = 0,
> +    };
> +
> +    ret = ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
> +    trace_iommufd_backend_alloc_hwpt(iommufd, dev_id, pt_id,
> +                                     alloc_hwpt.out_hwpt_id, ret);
> +
> +    if (ret) {
> +        error_report("IOMMU_HWPT_ALLOC failed: %m");
> +    } else {
> +        *out_hwpt = alloc_hwpt.out_hwpt_id;
> +    }
> +    return !ret ? 0 : -errno;
> +}
> +
> +static const TypeInfo iommufd_backend_info = {
> +    .name = TYPE_IOMMUFD_BACKEND,
> +    .parent = TYPE_OBJECT,
> +    .instance_size = sizeof(IOMMUFDBackend),
> +    .instance_init = iommufd_backend_init,
> +    .instance_finalize = iommufd_backend_finalize,
> +    .class_size = sizeof(IOMMUFDBackendClass),
> +    .class_init = iommufd_backend_class_init,
> +    .interfaces = (InterfaceInfo[]) {
> +        { TYPE_USER_CREATABLE },
> +        { }
> +    }
> +};
> +
> +static void register_types(void)
> +{
> +    type_register_static(&iommufd_backend_info);
> +}
> +
> +type_init(register_types);
> diff --git a/backends/Kconfig b/backends/Kconfig
> index f35abc1609..2cb23f62fa 100644
> --- a/backends/Kconfig
> +++ b/backends/Kconfig
> @@ -1 +1,5 @@
>   source tpm/Kconfig
> +
> +config IOMMUFD
> +    bool
> +    depends on VFIO
> diff --git a/backends/meson.build b/backends/meson.build
> index 914c7c4afb..05ac57ff15 100644
> --- a/backends/meson.build
> +++ b/backends/meson.build
> @@ -20,6 +20,11 @@ if have_vhost_user
>     system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('vhost-user.c'))
>   endif
>   system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost.c'))
> +if have_iommufd
> +  system_ss.add(files('iommufd.c'))
> +else
> +  system_ss.add(files('iommufd-stub.c'))
> +endif

Replace with:

  system_ss.add(when: 'CONFIG_IOMMUFD', if_true: files('iommufd.c'))

and drop iommufd-stub.c, which then becomes unnecessary.



>   if have_vhost_user_crypto
>     system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost-user.c'))
>   endif
> diff --git a/backends/trace-events b/backends/trace-events
> index 652eb76a57..e5f828bca2 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -5,3 +5,15 @@ dbus_vmstate_pre_save(void)
>   dbus_vmstate_post_load(int version_id) "version_id: %d"
>   dbus_vmstate_loading(const char *id) "id: %s"
>   dbus_vmstate_saving(const char *id) "id: %s"
> +
> +# iommufd.c
> +iommufd_backend_connect(int fd, bool owned, uint32_t users, int ret) "fd=%d owned=%d users=%d (%d)"
> +iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
> +iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
> +iommufd_backend_get_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
> +iommufd_backend_put_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
> +iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
> +iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> +iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
> +iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u out_hwpt=%u (%d)"
> diff --git a/qemu-options.hx b/qemu-options.hx
> index e26230bac5..ddfaddf8ce 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -5210,6 +5210,19 @@ SRST
>   
>           The ``share`` boolean option is on by default with memfd.
>   
> +#ifdef CONFIG_IOMMUFD

Please remove.


Thanks,

C.




> +    ``-object iommufd,id=id[,fd=fd]``
> +        Creates an iommufd backend which allows control of DMA mapping
> +        through the /dev/iommu device.
> +
> +        The ``id`` parameter is a unique ID which frontends (such as
> +        vfio-pci or vdpa) will use to connect with the iommufd backend.
> +
> +        The ``fd`` parameter is an optional pre-opened file descriptor
> +        resulting from /dev/iommu opening. Usually the iommufd is shared
> +        across all subsystems, bringing the benefit of centralized
> +        reference counting.
> +#endif
>       ``-object rng-builtin,id=id``
>           Creates a random number generator backend which obtains entropy
>           from QEMU builtin functions. The ``id`` parameter is a unique ID




* Re: [PATCH v4 27/41] util/char_dev: Add open_cdev()
  2023-11-02  7:12 ` [PATCH v4 27/41] util/char_dev: Add open_cdev() Zhenzhong Duan
@ 2023-11-07 13:37   ` Cédric Le Goater
  2023-11-08  4:29     ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 13:37 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
> 
> /dev/vfio/devices/vfioX may not exist. In that case it is still possible
> to open /dev/char/$major:$minor instead. Add helper function to abstract
> the cdev open.
> 
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   MAINTAINERS                 |  6 +++
>   include/qemu/chardev_open.h | 16 ++++++++
>   util/chardev_open.c         | 81 +++++++++++++++++++++++++++++++++++++
>   util/meson.build            |  1 +
>   4 files changed, 104 insertions(+)
>   create mode 100644 include/qemu/chardev_open.h
>   create mode 100644 util/chardev_open.c
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 6f35159255..eada773975 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3473,6 +3473,12 @@ S: Maintained
>   F: include/qemu/iova-tree.h
>   F: util/iova-tree.c
>   
> +cdev Open
> +M: Yi Liu <yi.l.liu@intel.com>
> +S: Maintained
> +F: include/qemu/chardev_open.h
> +F: util/chardev_open.c

Maybe move this under the iommufd entry instead?


Thanks,

C.


  





> +
>   elf2dmp
>   M: Viktor Prutyanov <viktor.prutyanov@phystech.edu>
>   S: Maintained
> diff --git a/include/qemu/chardev_open.h b/include/qemu/chardev_open.h
> new file mode 100644
> index 0000000000..64e8fcfdcb
> --- /dev/null
> +++ b/include/qemu/chardev_open.h
> @@ -0,0 +1,16 @@
> +/*
> + * QEMU Chardev Helper
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2.  See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_CHARDEV_OPEN_H
> +#define QEMU_CHARDEV_OPEN_H
> +
> +int open_cdev(const char *devpath, dev_t cdev);
> +#endif
> diff --git a/util/chardev_open.c b/util/chardev_open.c
> new file mode 100644
> index 0000000000..f776429788
> --- /dev/null
> +++ b/util/chardev_open.c
> @@ -0,0 +1,81 @@
> +/*
> + * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
> + * Copyright (C) 2023 Intel Corporation.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *      Redistribution and use in source and binary forms, with or
> + *      without modification, are permitted provided that the following
> + *      conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *
> + * Copied from
> + * https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/chardev_open.h"
> +
> +static int open_cdev_internal(const char *path, dev_t cdev)
> +{
> +    struct stat st;
> +    int fd;
> +
> +    fd = qemu_open_old(path, O_RDWR);
> +    if (fd == -1) {
> +        return -1;
> +    }
> +    if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
> +        (cdev != 0 && st.st_rdev != cdev)) {
> +        close(fd);
> +        return -1;
> +    }
> +    return fd;
> +}
> +
> +static int open_cdev_robust(dev_t cdev)
> +{
> +    g_autofree char *devpath = NULL;
> +
> +    /*
> +     * This assumes that udev is being used and is creating the /dev/char/
> +     * symlinks.
> +     */
> +    devpath = g_strdup_printf("/dev/char/%u:%u", major(cdev), minor(cdev));
> +    return open_cdev_internal(devpath, cdev);
> +}
> +
> +int open_cdev(const char *devpath, dev_t cdev)
> +{
> +    int fd;
> +
> +    fd = open_cdev_internal(devpath, cdev);
> +    if (fd == -1 && cdev != 0) {
> +        return open_cdev_robust(cdev);
> +    }
> +    return fd;
> +}
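
For readers who have not seen the /dev/char fallback before: the path is
derived purely from the device number, so it works even when the
/dev/vfio/devices/vfioX node is missing (provided udev created the
symlinks, as the comment in open_cdev_robust() notes). A small
standalone sketch of just the path construction, Linux/glibc only;
demo_cdev_path() is a hypothetical helper and not part of the patch:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/sysmacros.h>   /* major(), minor(), makedev() */

/*
 * Hypothetical helper: format the /dev/char/<major>:<minor> path for a
 * character device number. Returns 0 on success, -1 if 'buf' is too
 * small to hold the whole path.
 */
static int demo_cdev_path(dev_t cdev, char *buf, size_t len)
{
    int n;

    assert(buf && len > 0);
    n = snprintf(buf, len, "/dev/char/%u:%u", major(cdev), minor(cdev));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

In the patch the resulting path is still validated after opening
(S_ISCHR() and st_rdev == cdev), so a stale or wrong symlink is rejected
rather than trusted.
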
> diff --git a/util/meson.build b/util/meson.build
> index eb677b40c2..eda0b06062 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -107,6 +107,7 @@ if have_block
>       util_ss.add(files('filemonitor-stub.c'))
>     endif
>     util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c'))
> +  util_ss.add(when: 'CONFIG_LINUX', if_true: files('chardev_open.c'))
>   endif
>   
>   if cpu == 'aarch64'




* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-02  7:12 ` [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend Zhenzhong Duan
@ 2023-11-07 13:41   ` Cédric Le Goater
  2023-11-08  5:45     ` Duan, Zhenzhong
  2023-11-08  2:59   ` Matthew Rosato
  1 sibling, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 13:41 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
> 
> Add the iommufd backend. The IOMMUFD container class is implemented
> based on the new /dev/iommu user API. This backend obviously depends
> on CONFIG_IOMMUFD.
> 
> So far, the iommufd backend doesn't support dirty page sync yet due
> to missing support in the host kernel.
> 
> Co-authored-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>

I think one tag for Eric is enough.

> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v4: use SPDX identifier, use iommufd_cdev_* prefix, merge with manual alloc patch
> 
>   include/hw/vfio/vfio-common.h |  23 ++
>   hw/vfio/common.c              |  19 +-
>   hw/vfio/iommufd.c             | 504 ++++++++++++++++++++++++++++++++++
>   hw/vfio/meson.build           |   3 +
>   hw/vfio/trace-events          |  13 +
>   5 files changed, 558 insertions(+), 4 deletions(-)
>   create mode 100644 hw/vfio/iommufd.c
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 24ecc0e7ee..3f1a39a991 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -89,6 +89,23 @@ typedef struct VFIOHostDMAWindow {
>       QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
>   } VFIOHostDMAWindow;
>   
> +#ifdef CONFIG_IOMMUFD

Please remove the #ifdef.

> +typedef struct VFIOIOASHwpt {
> +    uint32_t hwpt_id;
> +    QLIST_HEAD(, VFIODevice) device_list;
> +    QLIST_ENTRY(VFIOIOASHwpt) next;
> +} VFIOIOASHwpt;
> +
> +typedef struct IOMMUFDBackend IOMMUFDBackend;
> +
> +typedef struct VFIOIOMMUFDContainer {
> +    VFIOContainerBase bcontainer;
> +    IOMMUFDBackend *be;
> +    uint32_t ioas_id;
> +    QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
> +} VFIOIOMMUFDContainer;
> +#endif
> +
>   typedef struct VFIODeviceOps VFIODeviceOps;
>   
>   typedef struct VFIODevice {
> @@ -116,6 +133,11 @@ typedef struct VFIODevice {
>       OnOffAuto pre_copy_dirty_page_tracking;
>       bool dirty_pages_supported;
>       bool dirty_tracking;
> +#ifdef CONFIG_IOMMUFD
> +    int devid;
> +    VFIOIOASHwpt *hwpt;
> +    IOMMUFDBackend *iommufd;
> +#endif
>   } VFIODevice;
>   
>   struct VFIODeviceOps {
> @@ -201,6 +223,7 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
>   extern VFIOGroupList vfio_group_list;
>   extern VFIODeviceList vfio_device_list;
>   extern const VFIOIOMMUOps vfio_legacy_ops;
> +extern const VFIOIOMMUOps vfio_iommufd_ops;
>   extern const MemoryListener vfio_memory_listener;
>   extern int vfio_kvm_device_fd;
>   
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 572ae7c934..a61dce2845 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1462,10 +1462,13 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
>   
>   void vfio_put_address_space(VFIOAddressSpace *space)
>   {
> -    if (QLIST_EMPTY(&space->containers)) {
> -        QLIST_REMOVE(space, list);
> -        g_free(space);
> +    if (!QLIST_EMPTY(&space->containers)) {
> +        return;
>       }
> +
> +    QLIST_REMOVE(space, list);
> +    g_free(space);
> +
>       if (QLIST_EMPTY(&vfio_address_spaces)) {
>           qemu_unregister_reset(vfio_reset_handler, NULL);
>       }
> @@ -1498,8 +1501,16 @@ retry:
>   int vfio_attach_device(char *name, VFIODevice *vbasedev,
>                          AddressSpace *as, Error **errp)
>   {
> -    const VFIOIOMMUOps *ops = &vfio_legacy_ops;
> +    const VFIOIOMMUOps *ops;
>   
> +#ifdef CONFIG_IOMMUFD

You can keep this one though.

> +    if (vbasedev->iommufd) {
> +        ops = &vfio_iommufd_ops;
> +    } else
> +#endif
> +    {
> +        ops = &vfio_legacy_ops;
> +    }
>       return ops->attach_device(name, vbasedev, as, errp);
>   }
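
If the "} else / #endif / {" construct above turns out to be hard to
read, an early return inside the #ifdef gives the same selection without
a brace spanning the preprocessor block. A standalone sketch of that
shape, under the assumption that nothing else needs to run between the
selection and the call; DemoOps, DemoDevice, demo_select_ops() and
DEMO_CONFIG_IOMMUFD are hypothetical stand-ins, not QEMU names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for VFIOIOMMUOps and VFIODevice. */
typedef struct DemoOps { const char *name; } DemoOps;
typedef struct DemoDevice { void *iommufd; } DemoDevice;

static const DemoOps demo_legacy_ops = { "legacy" };
#ifdef DEMO_CONFIG_IOMMUFD
static const DemoOps demo_iommufd_ops = { "iommufd" };
#endif

/* Pick the backend ops; the iommufd branch only exists when configured. */
static const DemoOps *demo_select_ops(const DemoDevice *dev)
{
    assert(dev);
#ifdef DEMO_CONFIG_IOMMUFD
    if (dev->iommufd) {
        return &demo_iommufd_ops;
    }
#endif
    return &demo_legacy_ops;
}
```

Built without -DDEMO_CONFIG_IOMMUFD this always yields the legacy ops,
even for a device with an iommufd pointer set; built with it, such a
device gets the iommufd ops.
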
>   
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> new file mode 100644
> index 0000000000..1bb55ca2c4
> --- /dev/null
> +++ b/hw/vfio/iommufd.c
> @@ -0,0 +1,504 @@
> +/*
> + * iommufd container backend
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + * Copyright Red Hat, Inc. 2023
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *          Eric Auger <eric.auger@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include <sys/ioctl.h>
> +#include <linux/vfio.h>
> +#include <linux/iommufd.h>
> +
> +#include "hw/vfio/vfio-common.h"
> +#include "qemu/error-report.h"
> +#include "trace.h"
> +#include "qapi/error.h"
> +#include "sysemu/iommufd.h"
> +#include "hw/qdev-core.h"
> +#include "sysemu/reset.h"
> +#include "qemu/cutils.h"
> +#include "qemu/chardev_open.h"
> +
> +static int iommufd_map(VFIOContainerBase *bcontainer, hwaddr iova,
> +                       ram_addr_t size, void *vaddr, bool readonly)
> +{
> +    VFIOIOMMUFDContainer *container =
> +        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> +
> +    return iommufd_backend_map_dma(container->be,
> +                                   container->ioas_id,
> +                                   iova, size, vaddr, readonly);
> +}
> +
> +static int iommufd_unmap(VFIOContainerBase *bcontainer,
> +                         hwaddr iova, ram_addr_t size,
> +                         IOMMUTLBEntry *iotlb)
> +{
> +    VFIOIOMMUFDContainer *container =
> +        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> +
> +    /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
> +    return iommufd_backend_unmap_dma(container->be,
> +                                     container->ioas_id, iova, size);
> +}
> +
> +static void iommufd_cdev_kvm_device_add(VFIODevice *vbasedev)
> +{
> +    Error *err = NULL;
> +
> +    if (vfio_kvm_device_add_fd(vbasedev->fd, &err)) {
> +        error_report_err(err);
> +    }
> +}
> +
> +static void iommufd_cdev_kvm_device_del(VFIODevice *vbasedev)
> +{
> +    Error *err = NULL;
> +
> +    if (vfio_kvm_device_del_fd(vbasedev->fd, &err)) {
> +        error_report_err(err);
> +    }
> +}
> +
> +static int iommufd_connect_and_bind(VFIODevice *vbasedev, Error **errp)
> +{
> +    IOMMUFDBackend *iommufd = vbasedev->iommufd;
> +    struct vfio_device_bind_iommufd bind = {
> +        .argsz = sizeof(bind),
> +        .flags = 0,
> +    };
> +    int ret;
> +
> +    ret = iommufd_backend_connect(iommufd, errp);
> +    if (ret) {
> +        return ret;
> +    }
> +
> +    /*
> +     * Add device to kvm-vfio to be prepared for the tracking
> +     * in KVM. Especially for some emulated devices, it requires
> +     * to have kvm information in the device open.
> +     */
> +    iommufd_cdev_kvm_device_add(vbasedev);
> +
> +    /* Bind device to iommufd */
> +    bind.iommufd = iommufd->fd;
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
> +    if (ret) {
> +        error_setg_errno(errp, errno, "error bind device fd=%d to iommufd=%d",
> +                         vbasedev->fd, bind.iommufd);
> +        goto err_bind;
> +    }
> +
> +    vbasedev->devid = bind.out_devid;
> +    trace_iommufd_connect_and_bind(bind.iommufd, vbasedev->name, vbasedev->fd,
> +                                   vbasedev->devid);
> +    return ret;
> +err_bind:
> +    iommufd_cdev_kvm_device_del(vbasedev);
> +    iommufd_backend_disconnect(iommufd);
> +    return ret;
> +}
> +
> +static void iommufd_unbind_and_disconnect(VFIODevice *vbasedev)
> +{
> +    /* Unbind is automatically conducted when device fd is closed */
> +    iommufd_cdev_kvm_device_del(vbasedev);
> +    iommufd_backend_disconnect(vbasedev->iommufd);
> +}
> +
> +static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> +{
> +    long int ret = -ENOTTY;
> +    char *path, *vfio_dev_path = NULL, *vfio_path = NULL;
> +    DIR *dir = NULL;
> +    struct dirent *dent;
> +    gchar *contents;
> +    struct stat st;
> +    gsize length;
> +    int major, minor;
> +    dev_t vfio_devt;
> +
> +    path = g_strdup_printf("%s/vfio-dev", sysfs_path);
> +    if (stat(path, &st) < 0) {
> +        error_setg_errno(errp, errno, "no such host device");
> +        goto out_free_path;
> +    }
> +
> +    dir = opendir(path);
> +    if (!dir) {
> +        error_setg_errno(errp, errno, "couldn't open directory %s", path);
> +        goto out_free_path;
> +    }
> +
> +    while ((dent = readdir(dir))) {
> +        if (!strncmp(dent->d_name, "vfio", 4)) {
> +            vfio_dev_path = g_strdup_printf("%s/%s/dev", path, dent->d_name);
> +            break;
> +        }
> +    }
> +
> +    if (!vfio_dev_path) {
> +        error_setg(errp, "failed to find vfio-dev/vfioX/dev");
> +        goto out_close_dir;
> +    }
> +
> +    if (!g_file_get_contents(vfio_dev_path, &contents, &length, NULL)) {
> +        error_setg(errp, "failed to load \"%s\"", vfio_dev_path);
> +        goto out_free_dev_path;
> +    }
> +
> +    if (sscanf(contents, "%d:%d", &major, &minor) != 2) {
> +        error_setg(errp, "failed to get major:minor for \"%s\"", vfio_dev_path);
> +        goto out_free_dev_path;
> +    }
> +    g_free(contents);
> +    vfio_devt = makedev(major, minor);
> +
> +    vfio_path = g_strdup_printf("/dev/vfio/devices/%s", dent->d_name);
> +    ret = open_cdev(vfio_path, vfio_devt);
> +    if (ret < 0) {
> +        error_setg(errp, "Failed to open %s", vfio_path);
> +    }
> +
> +    trace_iommufd_cdev_getfd(vfio_path, ret);
> +    g_free(vfio_path);
> +
> +out_free_dev_path:
> +    g_free(vfio_dev_path);
> +out_close_dir:
> +    closedir(dir);
> +out_free_path:
> +    if (*errp) {
> +        error_prepend(errp, VFIO_MSG_PREFIX, path);
> +    }
> +    g_free(path);
> +
> +    return ret;
> +}
> +
> +static VFIOIOASHwpt *iommufd_container_get_hwpt(VFIOIOMMUFDContainer *container,
> +                                                uint32_t hwpt_id)
> +{
> +    VFIOIOASHwpt *hwpt;
> +
> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> +        if (hwpt->hwpt_id == hwpt_id) {
> +            return hwpt;
> +        }
> +    }
> +
> +    hwpt = g_malloc0(sizeof(*hwpt));
> +
> +    hwpt->hwpt_id = hwpt_id;
> +    QLIST_INIT(&hwpt->device_list);
> +    QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
> +
> +    return hwpt;
> +}
> +
> +static void iommufd_container_put_hwpt(IOMMUFDBackend *be, VFIOIOASHwpt *hwpt)
> +{
> +    QLIST_REMOVE(hwpt, next);
> +    iommufd_backend_free_id(be->fd, hwpt->hwpt_id);
> +    g_free(hwpt);
> +}
> +
> +static int iommufd_cdev_attach_hwpt(VFIODevice *vbasedev, uint32_t hwpt_id,
> +                                    Error **errp)
> +{
> +    int ret, iommufd = vbasedev->iommufd->fd;
> +    struct vfio_device_attach_iommufd_pt attach_data = {
> +        .argsz = sizeof(attach_data),
> +        .flags = 0,
> +        .pt_id = hwpt_id,
> +    };
> +
> +    /* Attach device to an hwpt within iommufd */
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
> +    if (ret) {
> +        error_setg_errno(errp, errno,
> +                         "[iommufd=%d] error attach %s (%d) to hwpt_id=%d",
> +                         iommufd, vbasedev->name, vbasedev->fd, hwpt_id);
> +    }
> +    trace_iommufd_cdev_attach_hwpt(iommufd, vbasedev->name, vbasedev->fd,
> +                                   hwpt_id);
> +    return ret;
> +}
> +
> +static int iommufd_cdev_detach_hwpt(VFIODevice *vbasedev, Error **errp)
> +{
> +    int ret, iommufd = vbasedev->iommufd->fd;
> +    struct vfio_device_detach_iommufd_pt detach_data = {
> +        .argsz = sizeof(detach_data),
> +        .flags = 0,
> +    };
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_data);
> +    if (ret) {
> +        error_setg_errno(errp, errno, "detach %s from ioas failed",
> +                         vbasedev->name);
> +    }
> +    trace_iommufd_cdev_detach_hwpt(iommufd, vbasedev->name,
> +                                   vbasedev->hwpt->hwpt_id);
> +    return ret;
> +}
> +
> +static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
> +                                         VFIOIOMMUFDContainer *container,
> +                                         Error **errp)
> +{
> +    int ret, iommufd = vbasedev->iommufd->fd;
> +    VFIOIOASHwpt *hwpt;
> +    uint32_t hwpt_id;
> +    Error *err = NULL;
> +
> +    /* try to attach to an existing hwpt in this container */
> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> +        ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt->hwpt_id, &err);
> +        if (ret) {
> +            const char *msg = error_get_pretty(err);
> +
> +            trace_iommufd_cdev_fail_attach_existing_hwpt(msg);
> +            error_free(err);
> +            err = NULL;
> +        } else {
> +            goto found_hwpt;
> +        }
> +    }
> +
> +    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> +                                     container->ioas_id, &hwpt_id);
> +
> +    if (ret) {
> +        error_setg_errno(errp, errno, "error alloc shadow hwpt");
> +        return ret;
> +    }
> +
> +    /* Attach cdev to a new allocated hwpt within iommufd */
> +    ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt_id, errp);
> +    if (ret) {
> +        iommufd_backend_free_id(iommufd, hwpt_id);
> +        return ret;
> +    }
> +
> +    hwpt = iommufd_container_get_hwpt(container, hwpt_id);
> +found_hwpt:
> +    QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, next);
> +    vbasedev->hwpt = hwpt;
> +
> +    trace_iommufd_cdev_attach_container(iommufd, vbasedev->name, vbasedev->fd,
> +                                        container->ioas_id, hwpt->hwpt_id);
> +    return ret;
> +}
> +
> +static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
> +                                          VFIOIOMMUFDContainer *container)
> +{
> +    VFIOIOASHwpt *hwpt = vbasedev->hwpt;
> +    Error *err = NULL;
> +    int ret;
> +
> +    ret = iommufd_cdev_detach_hwpt(vbasedev, &err);
> +    if (ret) {
> +        error_report_err(err);
> +    }
> +
> +    QLIST_REMOVE(vbasedev, next);
> +    vbasedev->hwpt = NULL;
> +    if (QLIST_EMPTY(&hwpt->device_list)) {
> +        iommufd_container_put_hwpt(vbasedev->iommufd, hwpt);
> +    }
> +
> +    trace_iommufd_cdev_detach_container(container->be->fd, vbasedev->name,
> +                                        container->ioas_id);
> +}
> +
> +static void iommufd_container_destroy(VFIOIOMMUFDContainer *container)
> +{
> +    VFIOContainerBase *bcontainer = &container->bcontainer;
> +
> +    if (!QLIST_EMPTY(&container->hwpt_list)) {
> +        return;
> +    }
> +    memory_listener_unregister(&bcontainer->listener);
> +    vfio_container_destroy(bcontainer);
> +    iommufd_backend_put_ioas(container->be, container->ioas_id);
> +    g_free(container);
> +}
> +
> +static int iommufd_ram_block_discard_disable(bool state)
> +{
> +    /*
> +     * We support coordinated discarding of RAM via the RamDiscardManager.
> +     */
> +    return ram_block_uncoordinated_discard_disable(state);
> +}
> +
> +static int iommufd_attach_device(const char *name, VFIODevice *vbasedev,
> +                                 AddressSpace *as, Error **errp)
> +{
> +    VFIOContainerBase *bcontainer;
> +    VFIOIOMMUFDContainer *container;
> +    VFIOAddressSpace *space;
> +    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
> +    int ret, devfd;
> +    uint32_t ioas_id;
> +    Error *err = NULL;
> +
> +    devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
> +    if (devfd < 0) {
> +        return devfd;
> +    }
> +    vbasedev->fd = devfd;
> +
> +    ret = iommufd_connect_and_bind(vbasedev, errp);
> +    if (ret) {
> +        goto err_connect_bind;
> +    }
> +
> +    space = vfio_get_address_space(as);
> +
> +    /* try to attach to an existing container in this space */
> +    QLIST_FOREACH(bcontainer, &space->containers, next) {
> +        container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> +        if (bcontainer->ops != &vfio_iommufd_ops ||
> +            vbasedev->iommufd != container->be) {
> +            continue;
> +        }
> +        if (iommufd_cdev_attach_container(vbasedev, container, &err)) {
> +            const char *msg = error_get_pretty(err);
> +
> +            trace_iommufd_cdev_fail_attach_existing_container(msg);
> +            error_free(err);
> +            err = NULL;
> +        } else {
> +            ret = iommufd_ram_block_discard_disable(true);
> +            if (ret) {
> +                error_setg(errp,
> +                              "Cannot set discarding of RAM broken (%d)", ret);
> +                goto err_discard_disable;
> +            }
> +            goto found_container;
> +        }
> +    }
> +
> +    /* Need to allocate a new dedicated container */
> +    ret = iommufd_backend_get_ioas(vbasedev->iommufd, &ioas_id);
> +    if (ret < 0) {
> +        error_setg_errno(errp, errno, "Failed to alloc ioas");
> +        goto err_get_ioas;
> +    }
> +
> +    trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
> +
> +    container = g_malloc0(sizeof(*container));
> +    container->be = vbasedev->iommufd;
> +    container->ioas_id = ioas_id;
> +    QLIST_INIT(&container->hwpt_list);
> +
> +    bcontainer = &container->bcontainer;
> +    vfio_container_init(bcontainer, space, &vfio_iommufd_ops);
> +    QLIST_INSERT_HEAD(&space->containers, bcontainer, next);
> +
> +    ret = iommufd_cdev_attach_container(vbasedev, container, errp);
> +    if (ret) {
> +        goto err_attach_container;
> +    }
> +
> +    ret = iommufd_ram_block_discard_disable(true);
> +    if (ret) {
> +        goto err_discard_disable;
> +    }
> +
> +    bcontainer->pgsizes = qemu_real_host_page_size();
> +
> +    bcontainer->listener = vfio_memory_listener;
> +    memory_listener_register(&bcontainer->listener, bcontainer->space->as);
> +
> +    if (bcontainer->error) {
> +        ret = -1;
> +        error_propagate_prepend(errp, bcontainer->error,
> +                                "memory listener initialization failed: ");
> +        goto err_listener_register;
> +    }
> +
> +    bcontainer->initialized = true;
> +
> +found_container:
> +    ret = ioctl(devfd, VFIO_DEVICE_GET_INFO, &dev_info);
> +    if (ret) {
> +        error_setg_errno(errp, errno, "error getting device info");
> +        goto err_listener_register;
> +    }
> +
> +    /*
> +     * TODO: examine RAM_BLOCK_DISCARD stuff, should we do group level
> +     * for discarding incompatibility check as well?
> +     */
> +    if (vbasedev->ram_block_discard_allowed) {
> +        iommufd_ram_block_discard_disable(false);
> +    }
> +
> +    vbasedev->group = 0;
> +    vbasedev->num_irqs = dev_info.num_irqs;
> +    vbasedev->num_regions = dev_info.num_regions;
> +    vbasedev->flags = dev_info.flags;
> +    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
> +    vbasedev->bcontainer = bcontainer;
> +    QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
> +    QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
> +
> +    trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
> +                                   vbasedev->num_regions, vbasedev->flags);
> +    return 0;
> +
> +err_listener_register:
> +    iommufd_ram_block_discard_disable(false);
> +err_discard_disable:
> +    iommufd_cdev_detach_container(vbasedev, container);
> +err_attach_container:
> +    iommufd_container_destroy(container);
> +err_get_ioas:
> +    vfio_put_address_space(space);
> +    iommufd_unbind_and_disconnect(vbasedev);
> +err_connect_bind:
> +    close(vbasedev->fd);
> +    return ret;
> +}
> +
> +static void iommufd_detach_device(VFIODevice *vbasedev)
> +{
> +    VFIOContainerBase *bcontainer = vbasedev->bcontainer;
> +    VFIOIOMMUFDContainer *container;
> +    VFIOAddressSpace *space = bcontainer->space;
> +
> +    QLIST_REMOVE(vbasedev, global_next);
> +    QLIST_REMOVE(vbasedev, container_next);
> +    vbasedev->bcontainer = NULL;
> +
> +    if (!vbasedev->ram_block_discard_allowed) {
> +        iommufd_ram_block_discard_disable(false);
> +    }
> +
> +    container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> +    iommufd_cdev_detach_container(vbasedev, container);
> +    iommufd_container_destroy(container);
> +    vfio_put_address_space(space);
> +
> +    iommufd_unbind_and_disconnect(vbasedev);
> +    close(vbasedev->fd);
> +}
> +
> +const VFIOIOMMUOps vfio_iommufd_ops = {
> +    .dma_map = iommufd_map,
> +    .dma_unmap = iommufd_unmap,
> +    .attach_device = iommufd_attach_device,
> +    .detach_device = iommufd_detach_device,
> +};
> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
> index eb6ce6229d..9cae2c9e21 100644
> --- a/hw/vfio/meson.build
> +++ b/hw/vfio/meson.build
> @@ -7,6 +7,9 @@ vfio_ss.add(files(
>     'spapr.c',
>     'migration.c',
>   ))
> +if have_iommufd
> +  vfio_ss.add(files('iommufd.c'))
> +endif

Instead, use the Kconfig-driven form:

vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files(
   'iommufd.c',
))


Thanks,

C.



>   vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
>     'display.c',
>     'pci-quirks.c',
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 08a1f9dfa4..d85342b65f 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -164,3 +164,16 @@ vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcop
>   vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
>   vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
>   vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> +
> +#iommufd.c
> +
> +iommufd_connect_and_bind(int iommufd, const char *name, int devfd, int devid) " [iommufd=%d] Successfully bound device %s (fd=%d): output devid=%d"
> +iommufd_cdev_getfd(const char *dev, int devfd) " %s (fd=%d)"
> +iommufd_cdev_attach_hwpt(int iommufd, const char *name, int devfd, int hwptid) " [iommufd=%d] Successfully attached device %s (%d) to hwptd=%d"
> +iommufd_cdev_detach_hwpt(int iommufd, const char *name, int hwptid) " [iommufd=%d] Detached %s from hwpt=%d"
> +iommufd_cdev_fail_attach_existing_hwpt(const char *msg) " %s"
> +iommufd_cdev_attach_container(int iommufd, const char *name, int devfd, int ioasid, int hwptid) " [iommufd=%d] Successfully attached device %s (%d) to ioasid=%d: output hwptd=%d"
> +iommufd_cdev_detach_container(int iommufd, const char *name, int ioasid) " [iommufd=%d] Detached %s from ioasid=%d"
> +iommufd_cdev_fail_attach_existing_container(const char *msg) " %s"
> +iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD container with ioasid=%d"
> +iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 41/41] vfio: Compile out iommufd for PPC target
  2023-11-02  7:13 ` [PATCH v4 41/41] vfio: Compile out iommufd for PPC target Zhenzhong Duan
@ 2023-11-07 13:44   ` Cédric Le Goater
  2023-11-08  4:31     ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 13:44 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Thomas Huth

On 11/2/23 08:13, Zhenzhong Duan wrote:
> Since PPC doesn't support IOMMUFD, compile out the iommufd related
> code.
> 
> Suggested-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Please drop this patch.

Instead, add

     imply IOMMUFD

in hw/{i386,s390x,arm}/Kconfig for platforms supporting IOMMUFD.

Thanks,

C.



> ---
>   hw/vfio/common.c     | 2 +-
>   hw/vfio/pci.c        | 2 +-
>   hw/vfio/platform.c   | 2 +-
>   backends/meson.build | 4 ++--
>   hw/vfio/meson.build  | 2 +-
>   5 files changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 1c9203183d..000717cef3 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1504,7 +1504,7 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
>   {
>       const VFIOIOMMUOps *ops;
>   
> -#ifdef CONFIG_IOMMUFD
> +#if defined(CONFIG_IOMMUFD) && !defined(TARGET_PPC)
>       if (vbasedev->iommufd) {
>           ops = &vfio_iommufd_ops;
>       } else
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index d8f658ea47..2287e45119 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3550,7 +3550,7 @@ static Property vfio_pci_dev_properties[] = {
>                                      qdev_prop_nv_gpudirect_clique, uint8_t),
>       DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
>                                   OFF_AUTOPCIBAR_OFF),
> -#ifdef CONFIG_IOMMUFD
> +#if defined(CONFIG_IOMMUFD) && !defined(TARGET_PPC)
>       DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
>                        TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
>   #endif
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index aa0b2b9583..c8f4ae5a06 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -648,7 +648,7 @@ static Property vfio_platform_dev_properties[] = {
>       DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
>                          mmap_timeout, 1100),
>       DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
> -#ifdef CONFIG_IOMMUFD
> +#if defined(CONFIG_IOMMUFD) && !defined(TARGET_PPC)
>       DEFINE_PROP_LINK("iommufd", VFIOPlatformDevice, vbasedev.iommufd,
>                        TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
>   #endif
> diff --git a/backends/meson.build b/backends/meson.build
> index 05ac57ff15..9dbdfa87f7 100644
> --- a/backends/meson.build
> +++ b/backends/meson.build
> @@ -21,9 +21,9 @@ if have_vhost_user
>   endif
>   system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost.c'))
>   if have_iommufd
> -  system_ss.add(files('iommufd.c'))
> +  system_ss.add(when: 'TARGET_PPC', if_false: files('iommufd.c'))
>   else
> -  system_ss.add(files('iommufd-stub.c'))
> +  system_ss.add(when: 'TARGET_PPC', if_false: files('iommufd-stub.c'))
>   endif
>   if have_vhost_user_crypto
>     system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost-user.c'))
> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
> index 9cae2c9e21..4423bb3cd4 100644
> --- a/hw/vfio/meson.build
> +++ b/hw/vfio/meson.build
> @@ -8,7 +8,7 @@ vfio_ss.add(files(
>     'migration.c',
>   ))
>   if have_iommufd
> -  vfio_ss.add(files('iommufd.c'))
> +  vfio_ss.add(when: 'TARGET_PPC', if_false: files('iommufd.c'))
>   endif
>   vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
>     'display.c',



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 31/41] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
  2023-11-02  7:12 ` [PATCH v4 31/41] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info Zhenzhong Duan
@ 2023-11-07 13:48   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 13:48 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> This helper will be used by both legacy and iommufd backends.
> 
> No functional changes intended.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>
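
For readers following along, the helper extracted here uses the usual
two-call sizing pattern: the first call is made with a header-only buffer
and is expected to fail with ENOSPC while reporting the entry count, and
the second call is made with a buffer grown to fit. Below is a minimal,
self-contained sketch of that pattern. It makes no real VFIO ioctl; all
demo_* names are hypothetical stand-ins, and libc calloc/realloc replace
g_malloc0/g_realloc.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vfio_pci_hot_reset_info. */
struct demo_info {
    uint32_t argsz;      /* size of the buffer supplied by the caller */
    uint32_t count;      /* number of entries the provider has */
    uint32_t devices[];  /* flexible array, like info->devices[] */
};

#define DEMO_NDEVS 3

/*
 * Hypothetical stand-in for the ioctl: always reports the count, and
 * fails with ENOSPC when the supplied buffer cannot hold all entries.
 */
static int demo_query(struct demo_info *info)
{
    size_t need = sizeof(*info) + DEMO_NDEVS * sizeof(info->devices[0]);

    info->count = DEMO_NDEVS;
    if (info->argsz < need) {
        errno = ENOSPC;
        return -1;
    }
    for (uint32_t i = 0; i < DEMO_NDEVS; i++) {
        info->devices[i] = 100 + i;
    }
    return 0;
}

/* Returns 0 and sets *info_p on success; returns -errno and frees on error. */
static int demo_get_info(struct demo_info **info_p)
{
    struct demo_info *info, *tmp;
    size_t size;
    int ret;

    assert(info_p && !*info_p);

    info = calloc(1, sizeof(*info));
    if (!info) {
        return -ENOMEM;
    }
    info->argsz = sizeof(*info);

    /* First call: only learns the count; ENOSPC is the expected outcome. */
    if (demo_query(info) && errno != ENOSPC) {
        ret = -errno;
        free(info);
        return ret;
    }

    /* Grow the buffer to hold header plus 'count' entries. */
    size = sizeof(*info) + info->count * sizeof(info->devices[0]);
    tmp = realloc(info, size);
    if (!tmp) {
        free(info);
        return -ENOMEM;
    }
    info = tmp;
    info->argsz = (uint32_t)size;

    /* Second call: the buffer is large enough, so any failure is real. */
    if (demo_query(info)) {
        ret = -errno;
        free(info);
        return ret;
    }

    *info_p = info;
    return 0;
}
```

As in the patch, the caller owns the returned buffer only on success, so
every error path can simply return without extra cleanup at the call site.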

Thanks,

C.


> ---
>   hw/vfio/pci.h |  3 +++
>   hw/vfio/pci.c | 54 +++++++++++++++++++++++++++++++++++----------------
>   2 files changed, 40 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index fba8737ab2..1006061afb 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -218,6 +218,9 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr);
>   
>   extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
>   
> +int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
> +                                    struct vfio_pci_hot_reset_info **info_p);
> +
>   int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
>   
>   int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index c62c02f7b6..eb55e8ae88 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2445,22 +2445,13 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
>       return (strcmp(tmp, name) == 0);
>   }
>   
> -static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> +int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
> +                                    struct vfio_pci_hot_reset_info **info_p)
>   {
> -    VFIOGroup *group;
>       struct vfio_pci_hot_reset_info *info;
> -    struct vfio_pci_dependent_device *devices;
> -    struct vfio_pci_hot_reset *reset;
> -    int32_t *fds;
> -    int ret, i, count;
> -    bool multi = false;
> +    int ret, count;
>   
> -    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> -
> -    if (!single) {
> -        vfio_pci_pre_reset(vdev);
> -    }
> -    vdev->vbasedev.needs_reset = false;
> +    assert(info_p && !*info_p);
>   
>       info = g_malloc0(sizeof(*info));
>       info->argsz = sizeof(*info);
> @@ -2468,24 +2459,53 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>       ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>       if (ret && errno != ENOSPC) {
>           ret = -errno;
> +        g_free(info);
>           if (!vdev->has_pm_reset) {
>               error_report("vfio: Cannot reset device %s, "
>                            "no available reset mechanism.", vdev->vbasedev.name);
>           }
> -        goto out_single;
> +        return ret;
>       }
>   
>       count = info->count;
> -    info = g_realloc(info, sizeof(*info) + (count * sizeof(*devices)));
> -    info->argsz = sizeof(*info) + (count * sizeof(*devices));
> -    devices = &info->devices[0];
> +    info = g_realloc(info, sizeof(*info) + (count * sizeof(info->devices[0])));
> +    info->argsz = sizeof(*info) + (count * sizeof(info->devices[0]));
>   
>       ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
>       if (ret) {
>           ret = -errno;
> +        g_free(info);
>           error_report("vfio: hot reset info failed: %m");
> +        return ret;
> +    }
> +
> +    *info_p = info;
> +    return 0;
> +}
> +
> +static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> +{
> +    VFIOGroup *group;
> +    struct vfio_pci_hot_reset_info *info = NULL;
> +    struct vfio_pci_dependent_device *devices;
> +    struct vfio_pci_hot_reset *reset;
> +    int32_t *fds;
> +    int ret, i, count;
> +    bool multi = false;
> +
> +    trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> +
> +    if (!single) {
> +        vfio_pci_pre_reset(vdev);
> +    }
> +    vdev->vbasedev.needs_reset = false;
> +
> +    ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
> +
> +    if (ret) {
>           goto out_single;
>       }
> +    devices = &info->devices[0];
>   
>       trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
>   



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 32/41] vfio/pci: Introduce a vfio pci hot reset interface
  2023-11-02  7:12 ` [PATCH v4 32/41] vfio/pci: Introduce a vfio pci hot reset interface Zhenzhong Duan
@ 2023-11-07 13:52   ` Cédric Le Goater
  2023-11-08  5:46     ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 13:52 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> Legacy vfio pci and iommufd cdev use different processes to hot reset
> a vfio device. Expand the current code to abstract out a pci_hot_reset
> callback for legacy vfio; the same interface will also be used by the
> iommufd cdev vfio device.
> 
> Suggested-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
>   hw/vfio/pci.h                         |  1 +
>   include/hw/vfio/vfio-container-base.h |  3 +++
>   hw/vfio/container.c                   |  2 ++
>   hw/vfio/pci.c                         | 11 ++++++++++-
>   4 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 1006061afb..12cc765821 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -220,6 +220,7 @@ extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
>   
>   int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
>                                       struct vfio_pci_hot_reset_info **info_p);
> +int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single);
>   
>   int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
>   
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 4b6f017c6f..45bb19c767 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -106,6 +106,9 @@ struct VFIOIOMMUOps {
>       int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
>       int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
>                                 hwaddr iova, hwaddr size);
> +    /* PCI specific */
> +    int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
> +
>       /* SPAPR specific */
>       int (*add_window)(VFIOContainerBase *bcontainer,
>                         MemoryRegionSection *section,
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index ed2d721b2b..f27cc15d09 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -33,6 +33,7 @@
>   #include "trace.h"
>   #include "qapi/error.h"
>   #include "migration/migration.h"
> +#include "pci.h"
>   
>   VFIOGroupList vfio_group_list =
>       QLIST_HEAD_INITIALIZER(vfio_group_list);
> @@ -929,4 +930,5 @@ const VFIOIOMMUOps vfio_legacy_ops = {
>       .detach_device = vfio_legacy_detach_device,
>       .set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
>       .query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
> +    .pci_hot_reset = vfio_legacy_pci_hot_reset,
>   };
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index eb55e8ae88..a6194b7bfe 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2483,8 +2483,9 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
>       return 0;
>   }
>   
> -static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> +int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single)

Could we move this routine to container.c?
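
To illustrate the split I have in mind: the backend-specific
implementation would sit next to its ops table, and pci.c would keep only
the thin dispatcher. A rough, self-contained sketch follows. The Demo*
types and demo_* functions are hypothetical and only loosely modelled on
VFIOIOMMUOps and VFIODevice; the reset bodies are placeholders.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct DemoDevice DemoDevice;

/* Per-backend callbacks, loosely modelled on VFIOIOMMUOps. */
typedef struct DemoOps {
    int (*pci_hot_reset)(DemoDevice *dev, bool single);
} DemoOps;

struct DemoDevice {
    const DemoOps *ops;  /* backend selected at attach time */
    int last_reset;      /* records which path ran; demo only */
};

/* Would live with the legacy backend (container.c in the series). */
static int demo_legacy_hot_reset(DemoDevice *dev, bool single)
{
    dev->last_reset = single ? 1 : 2;
    return 0;
}

static const DemoOps demo_legacy_ops = {
    .pci_hot_reset = demo_legacy_hot_reset,
};

/* Would live with the iommufd backend (iommufd.c in the series). */
static int demo_iommufd_hot_reset(DemoDevice *dev, bool single)
{
    dev->last_reset = single ? 11 : 12;
    return 0;
}

static const DemoOps demo_iommufd_ops = {
    .pci_hot_reset = demo_iommufd_hot_reset,
};

/* The only part pci.c needs: dispatch through the backend's ops. */
static int demo_pci_hot_reset(DemoDevice *dev, bool single)
{
    assert(dev && dev->ops && dev->ops->pci_hot_reset);
    return dev->ops->pci_hot_reset(dev, single);
}
```

With this layout the PCI code does not need to know which backend is in
use; swapping dev->ops is enough to change the reset path.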


Thanks,

C.


>   {
> +    VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>       VFIOGroup *group;
>       struct vfio_pci_hot_reset_info *info = NULL;
>       struct vfio_pci_dependent_device *devices;
> @@ -2647,6 +2648,14 @@ out_single:
>       return ret;
>   }
>   
> +static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> +{
> +    VFIODevice *vbasedev = &vdev->vbasedev;
> +    const VFIOIOMMUOps *ops = vbasedev->bcontainer->ops;
> +
> +    return ops->pci_hot_reset(vbasedev, single);
> +}
> +
>   /*
>    * We want to differentiate hot reset of multiple in-use devices vs hot reset
>    * of a single in-use device.  VFIO_DEVICE_RESET will already handle the case



^ permalink raw reply	[flat|nested] 114+ messages in thread

* Re: [PATCH v4 25/41] Add iommufd configure option
  2023-11-07 13:14   ` Cédric Le Goater
@ 2023-11-07 14:37     ` Cédric Le Goater
  2023-11-08  6:08       ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 14:37 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Paolo Bonzini, Marc-André Lureau, Daniel P. Berrangé,
	Thomas Huth, Philippe Mathieu-Daudé

On 11/7/23 14:14, Cédric Le Goater wrote:
> On 11/2/23 08:12, Zhenzhong Duan wrote:
>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
>> iommufd support, enabled by default.
> 
> I don't think a configure option is the right approach. I will
> comment on other patches to propose another solution relying on
> Kconfig and activating IOMMUFD for aarch64, s390x, x86_64 only.

Here is an example based on your series:

   https://github.com/legoater/qemu/commits/vfio-8.2

The backend is always compiled (since it is common), but the VFIO frontend
and the 'iommufd' object are only available on x86_64, arm and s390x.

This looks like a good compromise. Please tell me what you think about it.

Thanks,

C.





> 
> Please drop this patch.
> 
> Thanks,
> 
> C.
> 
> 
> 
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   meson.build                   | 6 ++++++
>>   meson_options.txt             | 2 ++
>>   scripts/meson-buildoptions.sh | 3 +++
>>   3 files changed, 11 insertions(+)
>>
>> diff --git a/meson.build b/meson.build
>> index dcef8b1e79..72a57288a0 100644
>> --- a/meson.build
>> +++ b/meson.build
>> @@ -560,6 +560,10 @@ have_tpm = get_option('tpm') \
>>     .require(targetos != 'windows', error_message: 'TPM emulation only available on POSIX systems') \
>>     .allowed()
>> +have_iommufd = get_option('iommufd') \
>> +  .require(targetos == 'linux', error_message: 'iommufd is supported only on Linux') \
>> +  .allowed()
>> +
>>   # vhost
>>   have_vhost_user = get_option('vhost_user') \
>>     .disable_auto_if(targetos != 'linux') \
>> @@ -2133,6 +2137,7 @@ if get_option('tcg').allowed()
>>   endif
>>   config_host_data.set('CONFIG_TPM', have_tpm)
>>   config_host_data.set('CONFIG_TSAN', get_option('tsan'))
>> +config_host_data.set('CONFIG_IOMMUFD', have_iommufd)
>>   config_host_data.set('CONFIG_USB_LIBUSB', libusb.found())
>>   config_host_data.set('CONFIG_VDE', vde.found())
>>   config_host_data.set('CONFIG_VHOST', have_vhost)
>> @@ -4075,6 +4080,7 @@ summary_info += {'vhost-user-crypto support': have_vhost_user_crypto}
>>   summary_info += {'vhost-user-blk server support': have_vhost_user_blk_server}
>>   summary_info += {'vhost-vdpa support': have_vhost_vdpa}
>>   summary_info += {'build guest agent': have_ga}
>> +summary_info += {'iommufd support': have_iommufd}
>>   summary(summary_info, bool_yn: true, section: 'Configurable features')
>>   # Compilation information
>> diff --git a/meson_options.txt b/meson_options.txt
>> index 3c7398f3c6..91bb958cae 100644
>> --- a/meson_options.txt
>> +++ b/meson_options.txt
>> @@ -109,6 +109,8 @@ option('dbus_display', type: 'feature', value: 'auto',
>>          description: '-display dbus support')
>>   option('tpm', type : 'feature', value : 'auto',
>>          description: 'TPM support')
>> +option('iommufd', type : 'feature', value : 'auto',
>> +       description: 'iommufd support')
>>   # Do not enable it by default even for Mingw32, because it doesn't
>>   # work on Wine.
>> diff --git a/scripts/meson-buildoptions.sh b/scripts/meson-buildoptions.sh
>> index 7ca4b77eae..1effc46f7d 100644
>> --- a/scripts/meson-buildoptions.sh
>> +++ b/scripts/meson-buildoptions.sh
>> @@ -125,6 +125,7 @@ meson_options_help() {
>>     printf "%s\n" '  guest-agent-msi Build MSI package for the QEMU Guest Agent'
>>     printf "%s\n" '  hvf             HVF acceleration support'
>>     printf "%s\n" '  iconv           Font glyph conversion support'
>> +  printf "%s\n" '  iommufd         iommufd support'
>>     printf "%s\n" '  jack            JACK sound support'
>>     printf "%s\n" '  keyring         Linux keyring support'
>>     printf "%s\n" '  kvm             KVM acceleration support'
>> @@ -342,6 +343,8 @@ _meson_option_parse() {
>>       --enable-install-blobs) printf "%s" -Dinstall_blobs=true ;;
>>       --disable-install-blobs) printf "%s" -Dinstall_blobs=false ;;
>>       --interp-prefix=*) quote_sh "-Dinterp_prefix=$2" ;;
>> +    --enable-iommufd) printf "%s" -Diommufd=enabled ;;
>> +    --disable-iommufd) printf "%s" -Diommufd=disabled ;;
>>       --enable-jack) printf "%s" -Djack=enabled ;;
>>       --disable-jack) printf "%s" -Djack=disabled ;;
>>       --enable-keyring) printf "%s" -Dkeyring=enabled ;;
> 




* Re: [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE add/del_section_window
  2023-11-02  7:12 ` [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE add/del_section_window Zhenzhong Duan
  2023-11-06 17:33   ` Cédric Le Goater
@ 2023-11-07 17:34   ` Cédric Le Goater
  1 sibling, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 17:34 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Nicholas Piggin, Daniel Henrique Barboza, David Gibson,
	Harsh Prateek Bora, open list:sPAPR (pseries)

On 11/2/23 08:12, Zhenzhong Duan wrote:
> No fucntional change intended.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   include/hw/vfio/vfio-common.h         |  5 -----
>   include/hw/vfio/vfio-container-base.h |  5 +++++
>   hw/vfio/common.c                      |  8 ++------
>   hw/vfio/container-base.c              | 21 +++++++++++++++++++++
>   hw/vfio/spapr.c                       | 19 ++++++++++++++-----
>   5 files changed, 42 insertions(+), 16 deletions(-)
> 
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index b9e5a0e64b..055f679363 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -169,11 +169,6 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as);
>   void vfio_put_address_space(VFIOAddressSpace *space);
>   
>   /* SPAPR specific */
> -int vfio_container_add_section_window(VFIOContainer *container,
> -                                      MemoryRegionSection *section,
> -                                      Error **errp);
> -void vfio_container_del_section_window(VFIOContainer *container,
> -                                       MemoryRegionSection *section);
>   int vfio_spapr_container_init(VFIOContainer *container, Error **errp);
>   void vfio_spapr_container_deinit(VFIOContainer *container);
>   
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index f62a14ac73..4b6f017c6f 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -75,6 +75,11 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>   int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>                                hwaddr iova, ram_addr_t size,
>                                IOMMUTLBEntry *iotlb);
> +int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
> +                                      MemoryRegionSection *section,
> +                                      Error **errp);
> +void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
> +                                       MemoryRegionSection *section);
>   int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>                                              bool start);
>   int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 483ba82089..572ae7c934 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -571,8 +571,6 @@ static void vfio_listener_region_add(MemoryListener *listener,
>   {
>       VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>                                                    listener);
> -    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> -                                            bcontainer);
>       hwaddr iova, end;
>       Int128 llend, llsize;
>       void *vaddr;
> @@ -595,7 +593,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>           return;
>       }
>   
> -    if (vfio_container_add_section_window(container, section, &err)) {
> +    if (vfio_container_add_section_window(bcontainer, section, &err)) {
>           goto fail;
>       }
>   
> @@ -738,8 +736,6 @@ static void vfio_listener_region_del(MemoryListener *listener,
>   {
>       VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
>                                                    listener);
> -    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> -                                            bcontainer);
>       hwaddr iova, end;
>       Int128 llend, llsize;
>       int ret;
> @@ -818,7 +814,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>   
>       memory_region_unref(section->mr);
>   
> -    vfio_container_del_section_window(container, section);
> +    vfio_container_del_section_window(bcontainer, section);
>   }
>   
>   typedef struct VFIODirtyRanges {
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 0177f43741..71f7274973 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -31,6 +31,27 @@ int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>       return bcontainer->ops->dma_unmap(bcontainer, iova, size, iotlb);
>   }
>   
> +int vfio_container_add_section_window(VFIOContainerBase *bcontainer,
> +                                      MemoryRegionSection *section,
> +                                      Error **errp)
> +{
> +    if (!bcontainer->ops->add_window) {
> +        return 0;
> +    }
> +
> +    return bcontainer->ops->add_window(bcontainer, section, errp);
> +}
> +
> +void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
> +                                       MemoryRegionSection *section)
> +{
> +    if (!bcontainer->ops->del_window) {
> +        return;
> +    }
> +
> +    return bcontainer->ops->del_window(bcontainer, section);
> +}
> +
>   int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
>                                              bool start)
>   {
> diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
> index e1a6b35563..5be1911aad 100644
> --- a/hw/vfio/spapr.c
> +++ b/hw/vfio/spapr.c
> @@ -319,10 +319,13 @@ static int vfio_spapr_create_window(VFIOContainer *container,
>       return 0;
>   }
>   
> -int vfio_container_add_section_window(VFIOContainer *container,
> -                                      MemoryRegionSection *section,
> -                                      Error **errp)
> +static int
> +vfio_spapr_container_add_section_window(VFIOContainerBase *bcontainer,
> +                                        MemoryRegionSection *section,
> +                                        Error **errp)
>   {
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
>       VFIOHostDMAWindow *hostwin;
>       hwaddr pgsize = 0;
>       int ret;
> @@ -407,9 +410,13 @@ int vfio_container_add_section_window(VFIOContainer *container,
>       return 0;
>   }
>   
> -void vfio_container_del_section_window(VFIOContainer *container,
> -                                       MemoryRegionSection *section)
> +static void
> +vfio_spapr_container_del_section_window(VFIOContainerBase *bcontainer,
> +                                        MemoryRegionSection *section)
>   {
> +    VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> +                                            bcontainer);
> +
>       if (container->iommu_type != VFIO_SPAPR_TCE_v2_IOMMU) {
>           return;
>       }
> @@ -430,6 +437,8 @@ static VFIOIOMMUOps vfio_iommu_spapr_ops;
>   static void setup_spapr_ops(VFIOContainerBase *bcontainer)
>   {
>       vfio_iommu_spapr_ops = *bcontainer->ops;
> +    vfio_iommu_spapr_ops.add_window = vfio_spapr_container_add_section_window;
> +    vfio_iommu_spapr_ops.del_window = vfio_spapr_container_del_section_window;
>       bcontainer->ops = &vfio_iommu_spapr_ops;
>   }
>   
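
For readers following the quoted patch, the pattern it uses can be sketched outside QEMU. This is a minimal illustration only: all Demo* and demo_* names are hypothetical stand-ins, not QEMU types. It shows the two ideas in the diff: the generic wrapper treats a missing callback as a no-op, and the sPAPR setup copies the existing ops table and overrides only the window callbacks.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for VFIOContainerBase / VFIOIOMMUOps. */
typedef struct DemoContainer DemoContainer;

typedef struct DemoOps {
    int (*dma_map)(DemoContainer *c);
    /* Optional callbacks: only the sPAPR-like backend fills these in. */
    int (*add_window)(DemoContainer *c, int section);
    void (*del_window)(DemoContainer *c, int section);
} DemoOps;

struct DemoContainer {
    const DemoOps *ops;
    int windows; /* number of live windows, demo bookkeeping only */
};

/* Generic wrapper: a backend without add_window succeeds trivially. */
int demo_container_add_section_window(DemoContainer *c, int section)
{
    if (!c->ops->add_window) {
        return 0;
    }
    return c->ops->add_window(c, section);
}

/* Generic wrapper: a backend without del_window has nothing to undo. */
void demo_container_del_section_window(DemoContainer *c, int section)
{
    if (!c->ops->del_window) {
        return;
    }
    c->ops->del_window(c, section);
}

static int demo_legacy_dma_map(DemoContainer *c)
{
    (void)c;
    return 0;
}

/* Legacy ops table: window callbacks are left NULL. */
static const DemoOps demo_legacy_ops = { .dma_map = demo_legacy_dma_map };

static int demo_spapr_add_window(DemoContainer *c, int section)
{
    (void)section;
    c->windows++;
    return 0;
}

static void demo_spapr_del_window(DemoContainer *c, int section)
{
    (void)section;
    c->windows--;
}

static DemoOps demo_spapr_ops;

/* Copy the current ops, then override only the window callbacks. */
void demo_setup_spapr_ops(DemoContainer *c)
{
    demo_spapr_ops = *c->ops;
    demo_spapr_ops.add_window = demo_spapr_add_window;
    demo_spapr_ops.del_window = demo_spapr_del_window;
    c->ops = &demo_spapr_ops;
}
```

With this shape the common listener code never needs to know which backend it is talking to, which is what lets the patch drop the container_of() casts from common.c.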




* Re: [PATCH v4 36/41] vfio: Allow the selection of a given iommu backend for platform ap and ccw
  2023-11-02  7:12 ` [PATCH v4 36/41] vfio: Allow the selection of a given iommu backend for platform ap and ccw Zhenzhong Duan
@ 2023-11-07 18:18   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 18:18 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Thomas Huth, Tony Krowiak, Halil Pasic, Jason Herne, Eric Farman,
	Matthew Rosato, open list:S390 general arch...

On 11/2/23 08:12, Zhenzhong Duan wrote:
> Previously we added support to select the iommu backend for the vfio pci
> device. Now we add it for the others, e.g. platform, ap and ccw.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

We would need an Ack from the Z team for this change.

Thanks,

C.


> ---
>   include/hw/vfio/vfio-platform.h | 1 +
>   hw/vfio/ap.c                    | 5 +++++
>   hw/vfio/ccw.c                   | 5 +++++
>   hw/vfio/platform.c              | 4 ++++
>   4 files changed, 15 insertions(+)
> 
> diff --git a/include/hw/vfio/vfio-platform.h b/include/hw/vfio/vfio-platform.h
> index c414c3dffc..f57f4276f2 100644
> --- a/include/hw/vfio/vfio-platform.h
> +++ b/include/hw/vfio/vfio-platform.h
> @@ -18,6 +18,7 @@
>   
>   #include "hw/sysbus.h"
>   #include "hw/vfio/vfio-common.h"
> +#include "sysemu/iommufd.h"
>   #include "qemu/event_notifier.h"
>   #include "qemu/queue.h"
>   #include "qom/object.h"
> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
> index bbf69ff55a..6a4186ccd3 100644
> --- a/hw/vfio/ap.c
> +++ b/hw/vfio/ap.c
> @@ -15,6 +15,7 @@
>   #include <sys/ioctl.h>
>   #include "qapi/error.h"
>   #include "hw/vfio/vfio-common.h"
> +#include "sysemu/iommufd.h"
>   #include "hw/s390x/ap-device.h"
>   #include "qemu/error-report.h"
>   #include "qemu/event_notifier.h"
> @@ -204,6 +205,10 @@ static void vfio_ap_unrealize(DeviceState *dev)
>   
>   static Property vfio_ap_properties[] = {
>       DEFINE_PROP_STRING("sysfsdev", VFIOAPDevice, vdev.sysfsdev),
> +#ifdef CONFIG_IOMMUFD
> +    DEFINE_PROP_LINK("iommufd", VFIOAPDevice, vdev.iommufd,
> +                     TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
> +#endif
>       DEFINE_PROP_END_OF_LIST(),
>   };
>   
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index d857bb8d0f..7695ede0fc 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -21,6 +21,7 @@
>   
>   #include "qapi/error.h"
>   #include "hw/vfio/vfio-common.h"
> +#include "sysemu/iommufd.h"
>   #include "hw/s390x/s390-ccw.h"
>   #include "hw/s390x/vfio-ccw.h"
>   #include "hw/qdev-properties.h"
> @@ -677,6 +678,10 @@ static void vfio_ccw_unrealize(DeviceState *dev)
>   static Property vfio_ccw_properties[] = {
>       DEFINE_PROP_STRING("sysfsdev", VFIOCCWDevice, vdev.sysfsdev),
>       DEFINE_PROP_BOOL("force-orb-pfch", VFIOCCWDevice, force_orb_pfch, false),
> +#ifdef CONFIG_IOMMUFD
> +    DEFINE_PROP_LINK("iommufd", VFIOCCWDevice, vdev.iommufd,
> +                     TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
> +#endif
>       DEFINE_PROP_END_OF_LIST(),
>   };
>   
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index 8e3d4ac458..a1c25e0337 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -649,6 +649,10 @@ static Property vfio_platform_dev_properties[] = {
>       DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
>                          mmap_timeout, 1100),
>       DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
> +#ifdef CONFIG_IOMMUFD
> +    DEFINE_PROP_LINK("iommufd", VFIOPlatformDevice, vbasedev.iommufd,
> +                     TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
> +#endif
>       DEFINE_PROP_END_OF_LIST(),
>   };
>   




* Re: [PATCH v4 38/41] vfio/ap: Make vfio cdev pre-openable by passing a file handle
  2023-11-02  7:12 ` [PATCH v4 38/41] vfio/ap: " Zhenzhong Duan
@ 2023-11-07 18:19   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 18:19 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Thomas Huth, Tony Krowiak, Halil Pasic, Jason Herne,
	open list:S390 general arch...

On 11/2/23 08:12, Zhenzhong Duan wrote:
> This gives management tools like libvirt a chance to open the vfio
> cdev with privilege and pass FD to qemu. This way qemu never needs
> to have privilege to open a VFIO or iommu cdev node.
> 
> Opportunistically, remove some unnecessary double-casts.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>


We would need an Ack from the Z team for this change.

Thanks,

C.

> ---
>   hw/vfio/ap.c | 32 +++++++++++++++++++++++++++++++-
>   1 file changed, 31 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
> index 6a4186ccd3..0a810f8b88 100644
> --- a/hw/vfio/ap.c
> +++ b/hw/vfio/ap.c
> @@ -29,6 +29,7 @@
>   #include "hw/s390x/ap-bridge.h"
>   #include "exec/address-spaces.h"
>   #include "qom/object.h"
> +#include "monitor/monitor.h"
>   
>   #define TYPE_VFIO_AP_DEVICE      "vfio-ap"
>   
> @@ -159,7 +160,10 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
>       VFIOAPDevice *vapdev = VFIO_AP_DEVICE(dev);
>       VFIODevice *vbasedev = &vapdev->vdev;
>   
> -    vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
> +    if (vfio_device_get_name(vbasedev, errp)) {
> +        return;
> +    }
> +
>       vbasedev->ops = &vfio_ap_ops;
>       vbasedev->type = VFIO_DEVICE_TYPE_AP;
>       vbasedev->dev = dev;
> @@ -229,11 +233,36 @@ static const VMStateDescription vfio_ap_vmstate = {
>       .unmigratable = 1,
>   };
>   
> +static void vfio_ap_instance_init(Object *obj)
> +{
> +    VFIOAPDevice *vapdev = VFIO_AP_DEVICE(obj);
> +
> +    vapdev->vdev.fd = -1;
> +}
> +
> +#ifdef CONFIG_IOMMUFD
> +static void vfio_ap_set_fd(Object *obj, const char *str, Error **errp)
> +{
> +    VFIOAPDevice *vapdev = VFIO_AP_DEVICE(obj);
> +    int fd = -1;
> +
> +    fd = monitor_fd_param(monitor_cur(), str, errp);
> +    if (fd == -1) {
> +        error_prepend(errp, "Could not parse remote object fd %s:", str);
> +        return;
> +    }
> +    vapdev->vdev.fd = fd;
> +}
> +#endif
> +
>   static void vfio_ap_class_init(ObjectClass *klass, void *data)
>   {
>       DeviceClass *dc = DEVICE_CLASS(klass);
>   
>       device_class_set_props(dc, vfio_ap_properties);
> +#ifdef CONFIG_IOMMUFD
> +    object_class_property_add_str(klass, "fd", NULL, vfio_ap_set_fd);
> +#endif
>       dc->vmsd = &vfio_ap_vmstate;
>       dc->desc = "VFIO-based AP device assignment";
>       set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> @@ -248,6 +277,7 @@ static const TypeInfo vfio_ap_info = {
>       .name = TYPE_VFIO_AP_DEVICE,
>       .parent = TYPE_AP_DEVICE,
>       .instance_size = sizeof(VFIOAPDevice),
> +    .instance_init = vfio_ap_instance_init,
>       .class_init = vfio_ap_class_init,
>   };
>   
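
As an aside on the "fd" property above: the setter hands the string to monitor_fd_param(), and the instance_init default of -1 marks "no fd passed". The numeric case can be sketched standalone as below. This is an assumption-laden illustration, not QEMU's implementation: demo_parse_inherited_fd() is a hypothetical helper, and the real monitor code is understood to do more (for example, resolving fd names registered with the monitor).

```c
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal file descriptor number from a
 * property string and check that the descriptor is actually open in
 * this process. Returns the fd on success, -1 on any failure.
 */
int demo_parse_inherited_fd(const char *str)
{
    char *end = NULL;
    long val;

    if (!str || *str == '\0') {
        return -1;
    }

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno != 0 || *end != '\0' || val < 0 || val > INT_MAX) {
        return -1; /* not a clean non-negative decimal number */
    }

    /* F_GETFD fails with EBADF when the descriptor is not open. */
    if (fcntl((int)val, F_GETFD) == -1) {
        return -1;
    }

    return (int)val;
}
```

A management tool would open the cdev with privilege, let the child inherit the descriptor, and pass its number in the fd= option, so the receiving side only has to validate it.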




* Re: [PATCH v4 39/41] vfio/ccw: Make vfio cdev pre-openable by passing a file handle
  2023-11-02  7:13 ` [PATCH v4 39/41] vfio/ccw: " Zhenzhong Duan
@ 2023-11-07 18:20   ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 18:20 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Eric Farman, Matthew Rosato, Thomas Huth, open list:vfio-ccw

On 11/2/23 08:13, Zhenzhong Duan wrote:
> This gives management tools like libvirt a chance to open the vfio
> cdev with privilege and pass FD to qemu. This way qemu never needs
> to have privilege to open a VFIO or iommu cdev node.
> 
> Opportunistically, remove a redundant definition of TYPE_VFIO_CCW.
> 
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>

We would need an Ack from the Z team for this change.

Thanks,

C.


> ---
>   hw/vfio/ccw.c | 34 +++++++++++++++++++++++++++++++---
>   1 file changed, 31 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index 7695ede0fc..a674bd8d6d 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -30,6 +30,7 @@
>   #include "qemu/error-report.h"
>   #include "qemu/main-loop.h"
>   #include "qemu/module.h"
> +#include "monitor/monitor.h"
>   
>   struct VFIOCCWDevice {
>       S390CCWDevice cdev;
> @@ -589,11 +590,12 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
>           }
>       }
>   
> +    if (vfio_device_get_name(vbasedev, errp)) {
> +        return;
> +    }
> +
>       vbasedev->ops = &vfio_ccw_ops;
>       vbasedev->type = VFIO_DEVICE_TYPE_CCW;
> -    vbasedev->name = g_strdup_printf("%x.%x.%04x", vcdev->cdev.hostid.cssid,
> -                           vcdev->cdev.hostid.ssid,
> -                           vcdev->cdev.hostid.devid);
>       vbasedev->dev = dev;
>   
>       /*
> @@ -690,12 +692,37 @@ static const VMStateDescription vfio_ccw_vmstate = {
>       .unmigratable = 1,
>   };
>   
> +static void vfio_ccw_instance_init(Object *obj)
> +{
> +    VFIOCCWDevice *vcdev = VFIO_CCW(obj);
> +
> +    vcdev->vdev.fd = -1;
> +}
> +
> +#ifdef CONFIG_IOMMUFD
> +static void vfio_ccw_set_fd(Object *obj, const char *str, Error **errp)
> +{
> +    VFIOCCWDevice *vcdev = VFIO_CCW(obj);
> +    int fd = -1;
> +
> +    fd = monitor_fd_param(monitor_cur(), str, errp);
> +    if (fd == -1) {
> +        error_prepend(errp, "Could not parse remote object fd %s:", str);
> +        return;
> +    }
> +    vcdev->vdev.fd = fd;
> +}
> +#endif
> +
>   static void vfio_ccw_class_init(ObjectClass *klass, void *data)
>   {
>       DeviceClass *dc = DEVICE_CLASS(klass);
>       S390CCWDeviceClass *cdc = S390_CCW_DEVICE_CLASS(klass);
>   
>       device_class_set_props(dc, vfio_ccw_properties);
> +#ifdef CONFIG_IOMMUFD
> +    object_class_property_add_str(klass, "fd", NULL, vfio_ccw_set_fd);
> +#endif
>       dc->vmsd = &vfio_ccw_vmstate;
>       dc->desc = "VFIO-based subchannel assignment";
>       set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> @@ -713,6 +740,7 @@ static const TypeInfo vfio_ccw_info = {
>       .name = TYPE_VFIO_CCW,
>       .parent = TYPE_S390_CCW,
>       .instance_size = sizeof(VFIOCCWDevice),
> +    .instance_init = vfio_ccw_instance_init,
>       .class_init = vfio_ccw_class_init,
>   };
>   




* Re: [PATCH v4 00/41] vfio: Adopt iommufd
  2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
                   ` (41 preceding siblings ...)
  2023-11-06 14:23 ` [PATCH v4 00/41] vfio: Adopt iommufd Cédric Le Goater
@ 2023-11-07 18:28 ` Cédric Le Goater
  2023-11-08  3:26   ` Matthew Rosato
  42 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-07 18:28 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng

On 11/2/23 08:12, Zhenzhong Duan wrote:
> Hi,
> 
> Thanks all for giving guides and comments on previous series, here is
> the v4 of pure iommufd support part.
> 
> Based on Cédric's suggestion, this series includes an effort to remove
> spapr code from container.c, now all spapr functions are moved to spapr.c
> or spapr_pci_vfio.c, but there are still a few trivial checks on
> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce many
> callbacks and duplicate code just to remove them. Some functions are moved
> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
> functions to those spapr functions moved.
> 
> 
> PATCH 1-5: Move spapr functions to spapr*.c
> PATCH 6-20: Abstract out base container
> PATCH 21-24: Introduce spapr container and its specific interface

PATCH 6-24 applied to vfio-next :

   https://github.com/legoater/qemu/commits/vfio-next

(with a global s/fucntional/functional/)


I also pushed the remaining patches on :

   https://github.com/legoater/qemu/commits/vfio-8.2

with a slight rework of the IOMMUFD configuration, now done per platform.
The VFIO frontend and the 'iommufd' object are only available on x86_64,
arm, s390x.

Thanks,

C.


> PATCH 25: Add --enable/--disable-iommufd config support
> PATCH 26: Introduce iommufd object
> PATCH 27-33: add IOMMUFD container and cdev support
> PATCH 34-39: fd passing for IOMMUFD object and cdev
> PATCH 40: make VFIOContainerBase parameter const
> PATCH 41: Compile out for PPC
> 
> 
> We have tested a wide range of combinations, e.g.:
> - PCI devices were tested
> - FD passing and hot reset with some trick.
> - device hotplug test with legacy and iommufd backends
> - with or without vIOMMU for legacy and iommufd backends
> - devices linked to different iommufds
> - VFIO migration with an E800 net card (no dirty sync support) passthrough
> - platform, ccw and ap were only compile-tested due to environment limits
> 
> 
> Given some iommufd kernel limitations, the iommufd backend is
> not yet fully on par with the legacy backend w.r.t. features like:
> - p2p mappings (you will see related error traces)
> - dirty page sync
> - etc.
> 
> 
> qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v4
> Based on vfio-next, commit id: f686924775
> 
> --------------------------------------------------------------------------
> 
> Below is some background and a diagram of the design:
> 
> With the introduction of iommufd, the Linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to kernel
> for assigned devices. This series ports the VFIO devices
> onto the /dev/iommu uapi and lets them coexist with the legacy implementation.
> 
> At QEMU level, interactions with the /dev/iommu are abstracted by a new
> iommufd object (compiled in with the CONFIG_IOMMUFD option).
> 
> Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
> linked with an iommufd object. In this series, the vfio-pci device is
> granted with such capability (other VFIO devices are not yet ready):
> 
> It gets a new optional parameter named iommufd which allows passing
> an iommufd object:
> 
>      -object iommufd,id=iommufd0
>      -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
> 
> Note the /dev/iommu and vfio cdev can be externally opened by a
> management layer. In such a case the fd is passed:
> 
>      -object iommufd,id=iommufd0,fd=22
>      -device vfio-pci,iommufd=iommufd0,fd=23
> 
> If the fd parameter is not passed, the fd is opened by QEMU.
> See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
> for a detailed discussion of this requirement.
> 
> If no iommufd option is passed to the vfio-pci device, iommufd is not
> used and the end-user gets the behavior based on the legacy vfio iommu
> interfaces:
> 
>      -device vfio-pci,host=0000:02:00.0
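
The three command lines quoted above suggest a simple selection rule, sketched here in standalone C. Everything named Demo* or demo_* is hypothetical and not QEMU code. The rule that an fd without an iommufd link is rejected is an assumption made for the sketch (fd passing targets the cdev path); the cover letter itself only states that a missing iommufd option selects the legacy backend.

```c
#include <stddef.h>

typedef enum {
    DEMO_BE_INVALID = -1,
    DEMO_BE_LEGACY = 0,
    DEMO_BE_IOMMUFD = 1,
} DemoBackend;

/* Hypothetical view of the relevant -device vfio-pci options. */
typedef struct {
    const char *host;    /* e.g. "0000:02:00.0", or NULL if not given */
    const char *iommufd; /* id of the linked iommufd object, or NULL */
    int fd;              /* pre-opened vfio cdev fd, or -1 if not given */
} DemoDeviceOpts;

DemoBackend demo_select_backend(const DemoDeviceOpts *o)
{
    /* The device must be identified either by host address or by fd. */
    if (o->fd < 0 && !o->host) {
        return DEMO_BE_INVALID;
    }
    /* An iommufd link selects the new device-centric backend. */
    if (o->iommufd) {
        return DEMO_BE_IOMMUFD;
    }
    /* Assumption: a passed fd only makes sense together with iommufd. */
    if (o->fd >= 0) {
        return DEMO_BE_INVALID;
    }
    /* No iommufd option: fall back to the legacy group-based backend. */
    return DEMO_BE_LEGACY;
}
```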
> 
> While the legacy kernel interface is group-centric, the new iommufd
> interface is device-centric, relying on device fd and iommufd.
> 
> To support both interfaces in the QEMU VFIO device we reworked the vfio
> container abstraction so that the generic VFIO code can use either
> backend.
> 
> The VFIOContainer object becomes a base object derived into
> a) the legacy VFIO container and
> b) the new iommufd based container.
> 
> The base object implements generic code such as code related to
> memory_listener and address space management whereas the derived
> objects implement callbacks specific to each BE, legacy or
> iommufd. Indeed each backend has its own way to set up the secure context
> and its own DMA management interface. The diagram below shows how it
> looks with both BEs.
> 
>                      VFIO                           AddressSpace/Memory
>      +-------+  +----------+  +-----+  +-----+
>      |  pci  |  | platform |  |  ap |  | ccw |
>      +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>          |           |           |        |        |   AddressSpace       |
>          |           |           |        |        +------------+---------+
>      +---V-----------V-----------V--------V----+               /
>      |           VFIOAddressSpace              | <------------+
>      |                  |                      |  MemoryListener
>      |          VFIOContainer list             |
>      +-------+----------------------------+----+
>              |                            |
>              |                            |
>      +-------V------+            +--------V----------+
>      |   iommufd    |            |    vfio legacy    |
>      |  container   |            |     container     |
>      +-------+------+            +--------+----------+
>              |                            |
>              | /dev/iommu                 | /dev/vfio/vfio
>              | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
> Userspace   |                            |
> ============+============================+===========================
> Kernel      |  device fd                 |
>              +---------------+            | group/container fd
>              | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>              |  ATTACH_IOAS) |            | device fd
>              |               |            |
>              |       +-------V------------V-----------------+
>      iommufd |       |                vfio                  |
> (map/unmap  |       +---------+--------------------+-------+
> ioas_copy)  |                 |                    | map/unmap
>              |                 |                    |
>       +------V------+    +-----V------+      +------V--------+
>       |iommufd core |    |  device    |      |  vfio iommu   |
>       +-------------+    +------------+      +---------------+
> 
> [Secure Context setup]
> - iommufd BE: uses device fd and iommufd to setup secure context
>                (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses group fd and container fd to setup secure context
>                    (set_container, set_iommu)
> [Device access]
> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: device fd is retrieved from group fd ioctl
> [DMA Mapping flow]
> 1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> 2. VFIO populates DMA map/unmap via the container BEs
>     *) iommufd BE: uses iommufd
>     *) vfio legacy BE: uses container fd
> 
> 
> Changelog:
> v4:
> - add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus)
> - add doc for default case without fd (Markus)
> - Fix build issue reported by Markus and Cédric
> - Simply use SPDX identifier in new file (Cédric)
> - make vfio_container_init/destroy helper a separate patch (Cédric)
> - make vrdl_list movement a separate patch (Cédric)
> - add const for some callback parameters (Cédric)
> - add g_assert in VFIOIOMMUOps callback (Cédric)
> - introduce pci_hot_reset callback (Cédric)
> - remove VFIOIOMMUSpaprOps (Cédric)
> - initialize g_autofree to NULL (Cédric)
> - adjust func name prefix and trace event in iommufd.c (Cédric)
> - add RB
> 
> v3:
> - Rename base container as VFIOContainerBase and legacy container as container (Cédric)
> - Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric)
> - Cleanup container.c by introducing spapr backend and move spapr code out (Cédric)
> - Introduce vfio_iommu_spapr_ops (Cédric)
> - Add doc of iommufd in qom.json and have iommufd member sorted (Markus)
> - patch19 and patch21 are split into two parts to facilitate review
> 
> v2:
> - patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric
> - add fd passing to platform/ap/ccw vfio device
> - add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
> - rename char_dev.h to chardev_open.h for same naming scheme per Daniel
> - add full copyright per Daniel and Jason
> 
> 
> Note changelog below are from full IOMMUFD series:
> 
> v1:
> - Alloc hwpt instead of using auto hwpt
> - elaborate iommufd code per Nicolin
> - consolidate two patches and drop as.c
> - typo error fix and function rename
> 
> rfcv4:
> - rebase on top of v8.0.3
> - Add one patch from Yi which is about vfio device add in kvm
> - Remove IOAS_COPY optimization and focus on functions in this patchset
> - Fix wrong name issue reported and fix suggested by Matthew
> - Fix compilation issue reported and fix suggested by Nicolin
> - Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
> granularity
> - Add dev_iter_next() callback to avoid adding so many callback
>    at container scope, add VFIODevice.hwpt to support that
> - Restore all functions back to common from container whenever possible,
>    mainly migration and reset related functions
> - Add --enable/disable-iommufd config option, enabled by default in linux
> - Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
> - Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
> - vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
> redundant code
> - Add FD passing support for vfio device backed by IOMMUFD
> - Fix hot unplug resource leak issue in vfio_legacy_detach_device()
> - Fix FD leak in vfio_get_devicefd()
> 
> rfcv3:
> - rebase on top of v7.2.0
> - Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
>    VFIO backends
> - Fix use after free in error path, reported by Alister
> - Split common.c in several steps to ease the review
> 
> rfcv2:
> - remove the first three patches of rfcv1
> - add open cdev helper suggested by Jason
> - remove the QOMification of the VFIOContainer and simply use standard ops
> (David)
> - add "-object iommufd" suggested by Alex
> 
> Thanks
> Zhenzhong
> 
> Eric Auger (11):
>    vfio/container: Switch to dma_map|unmap API
>    vfio/common: Move giommu_list in base container
>    vfio/container: Move space field to base container
>    vfio/container: Switch to IOMMU BE
>      set_dirty_page_tracking/query_dirty_bitmap API
>    vfio/container: Convert functions to base container
>    vfio/container: Move pgsizes and dma_max_mappings to base container
>    vfio/container: Move listener to base container
>    vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size to base
>      container
>    vfio/container: Implement attach/detach_device
>    backends/iommufd: Introduce the iommufd object
>    vfio/pci: Allow the selection of a given iommu backend
> 
> Yi Liu (2):
>    util/char_dev: Add open_cdev()
>    vfio/iommufd: Implement the iommufd backend
> 
> Zhenzhong Duan (28):
>    vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c
>    vfio/container: Move vfio_container_add/del_section_window into
>      spapr.c
>    vfio/container: Move spapr specific init/deinit into spapr.c
>    vfio/spapr: Make vfio_spapr_create/remove_window static
>    vfio/common: Move vfio_host_win_add/del into spapr.c
>    vfio: Introduce base object for VFIOContainer and targeted interface
>    vfio/container: Introduce a empty VFIOIOMMUOps
>    vfio/common: Introduce vfio_container_init/destroy helper
>    vfio/container: Move per container device list in base container
>    vfio/container: Move vrdl_list to base container
>    vfio/container: Move iova_ranges to base container
>    vfio/spapr: Introduce spapr backend and target interface
>    vfio/spapr: switch to spapr IOMMU BE add/del_section_window
>    vfio/spapr: Move prereg_listener into spapr container
>    vfio/spapr: Move hostwin_list into spapr container
>    Add iommufd configure option
>    vfio/iommufd: Relax assert check for iommufd backend
>    vfio/iommufd: Add support for iova_ranges
>    vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
>    vfio/pci: Introduce a vfio pci hot reset interface
>    vfio/iommufd: Enable pci hot reset through iommufd cdev interface
>    vfio/pci: Make vfio cdev pre-openable by passing a file handle
>    vfio: Allow the selection of a given iommu backend for platform ap and
>      ccw
>    vfio/platform: Make vfio cdev pre-openable by passing a file handle
>    vfio/ap: Make vfio cdev pre-openable by passing a file handle
>    vfio/ccw: Make vfio cdev pre-openable by passing a file handle
>    vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps
>      callbacks
>    vfio: Compile out iommufd for PPC target
> 
>   MAINTAINERS                           |  13 +
>   meson.build                           |   6 +
>   qapi/qom.json                         |  22 +
>   hw/vfio/pci.h                         |   6 +
>   include/hw/vfio/vfio-common.h         | 118 ++---
>   include/hw/vfio/vfio-container-base.h | 121 +++++
>   include/hw/vfio/vfio-platform.h       |   1 +
>   include/hw/vfio/vfio.h                |   7 -
>   include/qemu/chardev_open.h           |  16 +
>   include/sysemu/iommufd.h              |  46 ++
>   backends/iommufd-stub.c               |  59 +++
>   backends/iommufd.c                    | 257 ++++++++++
>   hw/ppc/spapr_pci_vfio.c               | 100 +++-
>   hw/vfio/ap.c                          |  38 +-
>   hw/vfio/ccw.c                         |  40 +-
>   hw/vfio/common.c                      | 330 ++++++------
>   hw/vfio/container-base.c              | 101 ++++
>   hw/vfio/container.c                   | 443 ++++------------
>   hw/vfio/helpers.c                     |  34 +-
>   hw/vfio/iommufd.c                     | 697 ++++++++++++++++++++++++++
>   hw/vfio/pci.c                         | 112 +++--
>   hw/vfio/platform.c                    |  45 +-
>   hw/vfio/spapr.c                       | 338 ++++++++++++-
>   util/chardev_open.c                   |  81 +++
>   backends/Kconfig                      |   4 +
>   backends/meson.build                  |   5 +
>   backends/trace-events                 |  12 +
>   hw/vfio/meson.build                   |   4 +
>   hw/vfio/trace-events                  |  18 +-
>   meson_options.txt                     |   2 +
>   qemu-options.hx                       |  13 +
>   scripts/meson-buildoptions.sh         |   3 +
>   util/meson.build                      |   1 +
>   33 files changed, 2403 insertions(+), 690 deletions(-)
>   create mode 100644 include/hw/vfio/vfio-container-base.h
>   delete mode 100644 include/hw/vfio/vfio.h
>   create mode 100644 include/qemu/chardev_open.h
>   create mode 100644 include/sysemu/iommufd.h
>   create mode 100644 backends/iommufd-stub.c
>   create mode 100644 backends/iommufd.c
>   create mode 100644 hw/vfio/container-base.c
>   create mode 100644 hw/vfio/iommufd.c
>   create mode 100644 util/chardev_open.c
> 




* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-02  7:12 ` [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend Zhenzhong Duan
  2023-11-07 13:41   ` Cédric Le Goater
@ 2023-11-08  2:59   ` Matthew Rosato
  2023-11-08  7:16     ` Duan, Zhenzhong
  1 sibling, 1 reply; 114+ messages in thread
From: Matthew Rosato @ 2023-11-08  2:59 UTC (permalink / raw)
  To: Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Thomas Huth, Eric Farman, Halil Pasic, Jason J. Herne,
	Tony Krowiak

On 11/2/23 3:12 AM, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
> 
> Add the iommufd backend. The IOMMUFD container class is implemented
> based on the new /dev/iommu user API. This backend obviously depends
> on CONFIG_IOMMUFD.
> 
> So far, the iommufd backend doesn't support dirty page sync yet due
> to missing support in the host kernel.
> 
> Co-authored-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---

[...]

> +static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
> +                                         VFIOIOMMUFDContainer *container,
> +                                         Error **errp)
> +{
> +    int ret, iommufd = vbasedev->iommufd->fd;
> +    VFIOIOASHwpt *hwpt;
> +    uint32_t hwpt_id;
> +    Error *err = NULL;
> +
> +    /* try to attach to an existing hwpt in this container */
> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
> +        ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt->hwpt_id, &err);
> +        if (ret) {
> +            const char *msg = error_get_pretty(err);
> +
> +            trace_iommufd_cdev_fail_attach_existing_hwpt(msg);
> +            error_free(err);
> +            err = NULL;
> +        } else {
> +            goto found_hwpt;
> +        }
> +    }
> +
> +    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> +                                     container->ioas_id, &hwpt_id);
> +
> +    if (ret) {
> +        error_setg_errno(errp, errno, "error alloc shadow hwpt");
> +        return ret;
> +    }

The above alloc_hwpt fails for mdevs (at least, it fails for me attempting to use iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing in the kernel because it can't find an IOMMUFD_OBJ_DEVICE.

AFAIU that's because the mdevs are meant to instead use kernel access via vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when looking at the kernel vfio compat container.

As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd backend by just skipping this alloc_hwpt above and instead passing container->ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the vfio_iommufd_emulated_attach_ioas call in the kernel.
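To make the workaround described above concrete, here is a minimal, self-contained sketch of the control flow only.  It is not the QEMU code: struct sketch_dev, the is_mdev flag and the sketch_* helpers are hypothetical stand-ins (the real functions are iommufd_backend_alloc_hwpt() and iommufd_cdev_attach_hwpt()), and how QEMU would actually detect an mdev is left open.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device state; not the real VFIODevice. */
struct sketch_dev {
    bool is_mdev;      /* assumed flag: emulated device (vfio-ccw, vfio-ap) */
    uint32_t attached; /* id that was last passed to the attach step */
};

/* Hypothetical stub for iommufd_backend_alloc_hwpt(): invents a hwpt id. */
static int sketch_alloc_hwpt(uint32_t ioas_id, uint32_t *hwpt_id)
{
    *hwpt_id = ioas_id + 1000;
    return 0;
}

/* Hypothetical stub for iommufd_cdev_attach_hwpt(): records the id used. */
static int sketch_attach(struct sketch_dev *dev, uint32_t pt_id)
{
    dev->attached = pt_id;
    return 0;
}

/*
 * mdevs skip the hwpt allocation and attach directly to the IOAS id;
 * physical devices allocate a hwpt first and attach to that.
 */
static int sketch_attach_container(struct sketch_dev *dev, uint32_t ioas_id)
{
    uint32_t hwpt_id;
    int ret;

    assert(dev != NULL);
    if (dev->is_mdev) {
        return sketch_attach(dev, ioas_id);
    }
    ret = sketch_alloc_hwpt(ioas_id, &hwpt_id);
    if (ret) {
        return ret;
    }
    return sketch_attach(dev, hwpt_id);
}
```

With is_mdev set, the id recorded by the attach step is the IOAS id itself, which is the case that reaches vfio_iommufd_emulated_attach_ioas in the kernel according to the test above.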

> +
> +    /* Attach cdev to a new allocated hwpt within iommufd */
> +    ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt_id, errp);
> +    if (ret) {
> +        iommufd_backend_free_id(iommufd, hwpt_id);
> +        return ret;
> +    }
> +
> +    hwpt = iommufd_container_get_hwpt(container, hwpt_id);
> +found_hwpt:
> +    QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, next);
> +    vbasedev->hwpt = hwpt;
> +
> +    trace_iommufd_cdev_attach_container(iommufd, vbasedev->name, vbasedev->fd,
> +                                        container->ioas_id, hwpt->hwpt_id);
> +    return ret;
> +}




* Re: [PATCH v4 00/41] vfio: Adopt iommufd
  2023-11-07 18:28 ` Cédric Le Goater
@ 2023-11-08  3:26   ` Matthew Rosato
  2023-11-08  8:37     ` Duan, Zhenzhong
  2023-11-08  9:21     ` Cédric Le Goater
  0 siblings, 2 replies; 114+ messages in thread
From: Matthew Rosato @ 2023-11-08  3:26 UTC (permalink / raw)
  To: Cédric Le Goater, Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Thomas Huth, Eric Farman, Halil Pasic, Jason J. Herne,
	Tony Krowiak

On 11/7/23 1:28 PM, Cédric Le Goater wrote:
> On 11/2/23 08:12, Zhenzhong Duan wrote:
>> Hi,
>>
>> Thanks all for giving guides and comments on previous series, here is
>> the v4 of pure iommufd support part.
>>
>> Based on Cédric's suggestion, this series includes an effort to remove
>> spapr code from container.c, now all spapr functions are moved to spapr.c
>> or spapr_pci_vfio.c, but there are still a few trivial checks on
>> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce many
>> callbacks and duplicate code just to remove them. Some functions are moved
>> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
>> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
>> functions to those spapr functions moved.
>>
>>
>> PATCH 1-5: Move spapr functions to spapr*.c
>> PATCH 6-20: Abstract out base container
>> PATCH 21-24: Introduce spapr container and its specific interface
> 
> PATCH 6-24 applied to vfio-next :
> 
>   https://github.com/legoater/qemu/commits/vfio-next
> 
> (with a global s/fucntional/functional/)
> 
> 
> I also pushed the remaining patches on :
> 
>   https://github.com/legoater/qemu/commits/vfio-8.2
> 
> with a slight rework of the IOMMUFD configuration, now done per platform.
> The VFIO frontend and the 'iommufd' object are only available on x86_64,
> arm, s390x.

FYI, I first tried this vfio-8.2 branch on s390x but wasn't actually able to use the iommufd backend (was getting errors like Property 'vfio-pci.iommufd' not found) so I think something isn't actually enabling IOMMUFD as expected with your change...

Instead I tested on s390x using vfio-next + patches 25-41 of this series on top.  

Legacy backend regression testing worked fine for vfio-pci, vfio-ap and vfio-ccw.

Using the iommufd backend for vfio-pci on s390 exposes an s390-only issue related to accounting of the vfio DMA limit (code in hw/s390x/s390-pci-vfio.c assumes VFIODevice.group is never NULL, but that's no longer true when we use the iommufd backend with cdev).  We don't even need to track this limit when using the iommufd backend.  With that issue bypassed, vfio-pci testing on s390x looks good so far.  I'll send a separate fix for that.
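For illustration only, the kind of guard implied here could look like the sketch below.  It is not the actual fix: struct sketch_group, struct sketch_device and sketch_tracks_dma_limit() are hypothetical names, and the only assumption carried over from the report is that a cdev/iommufd-backed device has no group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real VFIOGroup / VFIODevice. */
struct sketch_group {
    int fd;
};

struct sketch_device {
    struct sketch_group *group; /* NULL when backed by iommufd + cdev */
};

/*
 * DMA limit accounting only applies to the legacy (group) backend, so a
 * device without a group is skipped instead of being dereferenced.
 */
static bool sketch_tracks_dma_limit(const struct sketch_device *dev)
{
    assert(dev != NULL);
    return dev->group != NULL;
}
```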

Using the iommufd backend for vfio-ccw and vfio-ap did not work, see response to patch 28.

Thanks,
Matt




* RE: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-07 13:33   ` Cédric Le Goater
@ 2023-11-08  3:35     ` Duan, Zhenzhong
  2023-11-08  9:40       ` Cédric Le Goater
  2023-11-08  5:50     ` Markus Armbruster
  1 sibling, 1 reply; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08  3:35 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Paolo Bonzini, Eric Blake, Markus Armbruster,
	Daniel P. Berrangé,
	Eduardo Habkost, Thomas Huth



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 7, 2023 9:33 PM
>Subject: Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
>
>On 11/2/23 08:12, Zhenzhong Duan wrote:
>> From: Eric Auger <eric.auger@redhat.com>
[...]
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index c53ef978ff..27300add48 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -794,6 +794,24 @@
>>   { 'struct': 'VfioUserServerProperties',
>>     'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>
>> +##
>> +# @IOMMUFDProperties:
>> +#
>> +# Properties for iommufd objects.
>> +#
>> +# @fd: file descriptor name previously passed via 'getfd' command,
>> +#     which represents a pre-opened /dev/iommu.  This allows the
>> +#     iommufd object to be shared accross several subsystems
>> +#     (VFIO, VDPA, ...), and the file descriptor to be shared
>> +#     with other process, e.g. DPDK.  (default: QEMU opens
>> +#     /dev/iommu by itself)
>> +#
>> +# Since: 8.2
>> +##
>> +{ 'struct': 'IOMMUFDProperties',
>> +  'data': { '*fd': 'str' },
>> +  'if': 'CONFIG_IOMMUFD' }
>
>
>Activating or not IOMMUFD on a platform is a configuration choice
>and it is not a dependency on an external resource. I would make
>things simpler and drop all the #ifdef in the documentation files.

Yes, that will be cleaner.

>
>There might be a way to remove the documentation also. Not a big
>issue for now.
[...]
>> diff --git a/backends/iommufd-stub.c b/backends/iommufd-stub.c
>
>I don't think this stub file is needed. Please drop.

Will do.

>
>> new file mode 100644
>> index 0000000000..02ac844c17
>> --- /dev/null
>> +++ b/backends/iommufd-stub.c
>> @@ -0,0 +1,59 @@
>> +/*
>> + * iommufd container backend stub
>> + *
>> + * Copyright (C) 2023 Intel Corporation.
>> + * Copyright Red Hat, Inc. 2023
>> + *
>> + * Authors: Yi Liu <yi.l.liu@intel.com>
>> + *          Eric Auger <eric.auger@redhat.com>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> +
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> +
>> + * You should have received a copy of the GNU General Public License along
>> + * with this program; if not, see <http://www.gnu.org/licenses/>.
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "sysemu/iommufd.h"
>> +#include "qemu/error-report.h"
>> +
>> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
>> +{
>> +    return 0;
>> +}
>> +void iommufd_backend_disconnect(IOMMUFDBackend *be)
>> +{
>> +}
>> +void iommufd_backend_free_id(int fd, uint32_t id)
>> +{
>> +}
>> +int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id)
>> +{
>> +    return 0;
>> +}
>> +void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id)
>> +{
>> +}
>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>> +                            ram_addr_t size, void *vaddr, bool readonly)
>> +{
>> +    return 0;
>> +}
>> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> +                              hwaddr iova, ram_addr_t size)
>> +{
>> +    return 0;
>> +}
>> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>> +                               uint32_t pt_id, uint32_t *out_hwpt)
>> +{
>> +    return 0;
>> +}
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> new file mode 100644
>> index 0000000000..a526d58824
>> --- /dev/null
>> +++ b/backends/iommufd.c
>> @@ -0,0 +1,257 @@
>> +/*
>> + * iommufd container backend
>> + *
>> + * Copyright (C) 2023 Intel Corporation.
>> + * Copyright Red Hat, Inc. 2023
>> + *
>> + * Authors: Yi Liu <yi.l.liu@intel.com>
>> + *          Eric Auger <eric.auger@redhat.com>
>> + *
>> + * SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "sysemu/iommufd.h"
>> +#include "qapi/error.h"
>> +#include "qapi/qmp/qerror.h"
>> +#include "qemu/module.h"
>> +#include "qom/object_interfaces.h"
>> +#include "qemu/error-report.h"
>> +#include "monitor/monitor.h"
>> +#include "trace.h"
>> +#include <sys/ioctl.h>
>> +#include <linux/iommufd.h>
>> +
>> +static void iommufd_backend_init(Object *obj)
>> +{
>> +    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
>> +
>> +    be->fd = -1;
>> +    be->users = 0;
>> +    be->owned = true;
>> +    qemu_mutex_init(&be->lock);
>> +}
>> +
>> +static void iommufd_backend_finalize(Object *obj)
>> +{
>> +    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
>> +
>> +    if (be->owned) {
>> +        close(be->fd);
>> +        be->fd = -1;
>> +    }
>> +}
>> +
>> +static void iommufd_backend_set_fd(Object *obj, const char *str, Error **errp)
>> +{
>> +    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
>> +    int fd = -1;
>> +
>> +    fd = monitor_fd_param(monitor_cur(), str, errp);
>> +    if (fd == -1) {
>> +        error_prepend(errp, "Could not parse remote object fd %s:", str);
>> +        return;
>> +    }
>> +    qemu_mutex_lock(&be->lock);
>> +    be->fd = fd;
>> +    be->owned = false;
>> +    qemu_mutex_unlock(&be->lock);
>> +    trace_iommu_backend_set_fd(be->fd);
>> +}
>> +
>> +static void iommufd_backend_class_init(ObjectClass *oc, void *data)
>> +{
>> +    object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
>> +}
>> +
>> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
>> +{
>> +    int fd, ret = 0;
>> +
>> +    qemu_mutex_lock(&be->lock);
>> +    if (be->users == UINT32_MAX) {
>> +        error_setg(errp, "too many connections");
>> +        ret = -E2BIG;
>> +        goto out;
>> +    }
>> +    if (be->owned && !be->users) {
>> +        fd = qemu_open_old("/dev/iommu", O_RDWR);
>> +        if (fd < 0) {
>> +            error_setg_errno(errp, errno, "/dev/iommu opening failed");
>> +            ret = fd;
>> +            goto out;
>> +        }
>> +        be->fd = fd;
>> +    }
>> +    be->users++;
>> +out:
>> +    trace_iommufd_backend_connect(be->fd, be->owned,
>> +                                  be->users, ret);
>> +    qemu_mutex_unlock(&be->lock);
>> +    return ret;
>> +}
>> +
>> +void iommufd_backend_disconnect(IOMMUFDBackend *be)
>> +{
>> +    qemu_mutex_lock(&be->lock);
>> +    if (!be->users) {
>> +        goto out;
>> +    }
>> +    be->users--;
>> +    if (!be->users && be->owned) {
>> +        close(be->fd);
>> +        be->fd = -1;
>> +    }
>> +out:
>> +    trace_iommufd_backend_disconnect(be->fd, be->users);
>> +    qemu_mutex_unlock(&be->lock);
>> +}
>> +
>> +static int iommufd_backend_alloc_ioas(int fd, uint32_t *ioas_id)
>> +{
>> +    int ret;
>> +    struct iommu_ioas_alloc alloc_data  = {
>> +        .size = sizeof(alloc_data),
>> +        .flags = 0,
>> +    };
>> +
>> +    ret = ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data);
>> +    if (ret) {
>> +        error_report("Failed to allocate ioas %m");
>> +    }
>> +
>> +    *ioas_id = alloc_data.out_ioas_id;
>> +    trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret);
>> +
>> +    return ret;
>> +}
>> +
>> +void iommufd_backend_free_id(int fd, uint32_t id)
>> +{
>> +    int ret;
>> +    struct iommu_destroy des = {
>> +        .size = sizeof(des),
>> +        .id = id,
>> +    };
>> +
>> +    ret = ioctl(fd, IOMMU_DESTROY, &des);
>> +    trace_iommufd_backend_free_id(fd, id, ret);
>> +    if (ret) {
>> +        error_report("Failed to free id: %u %m", id);
>> +    }
>> +}
>> +
>> +int iommufd_backend_get_ioas(IOMMUFDBackend *be, uint32_t *ioas_id)
>> +{
>> +    int ret;
>> +
>> +    ret = iommufd_backend_alloc_ioas(be->fd, ioas_id);
>> +    trace_iommufd_backend_get_ioas(be->fd, *ioas_id, ret);
>> +    return ret;
>> +}
>> +
>> +void iommufd_backend_put_ioas(IOMMUFDBackend *be, uint32_t ioas_id)
>> +{
>> +    iommufd_backend_free_id(be->fd, ioas_id);
>> +    trace_iommufd_backend_put_ioas(be->fd, ioas_id);
>> +}
>> +
>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>> +                            ram_addr_t size, void *vaddr, bool readonly)
>> +{
>> +    int ret;
>> +    struct iommu_ioas_map map = {
>> +        .size = sizeof(map),
>> +        .flags = IOMMU_IOAS_MAP_READABLE |
>> +                 IOMMU_IOAS_MAP_FIXED_IOVA,
>> +        .ioas_id = ioas_id,
>> +        .__reserved = 0,
>> +        .user_va = (uintptr_t)vaddr,
>> +        .iova = iova,
>> +        .length = size,
>> +    };
>> +
>> +    if (!readonly) {
>> +        map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
>> +    }
>> +
>> +    ret = ioctl(be->fd, IOMMU_IOAS_MAP, &map);
>> +    trace_iommufd_backend_map_dma(be->fd, ioas_id, iova, size,
>> +                                  vaddr, readonly, ret);
>> +    if (ret) {
>> +        error_report("IOMMU_IOAS_MAP failed: %m");
>> +    }
>> +    return !ret ? 0 : -errno;
>> +}
>> +
>> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> +                              hwaddr iova, ram_addr_t size)
>> +{
>> +    int ret;
>> +    struct iommu_ioas_unmap unmap = {
>> +        .size = sizeof(unmap),
>> +        .ioas_id = ioas_id,
>> +        .iova = iova,
>> +        .length = size,
>> +    };
>> +
>> +    ret = ioctl(be->fd, IOMMU_IOAS_UNMAP, &unmap);
>> +    trace_iommufd_backend_unmap_dma(be->fd, ioas_id, iova, size, ret);
>> +    /*
>> +     * TODO: IOMMUFD doesn't support mapping PCI BARs for now.
>> +     * It's not a problem if there is no p2p dma, relax it here
>> +     * and avoid many noisy trigger from vIOMMU side.
>
>Should we add a warn_report() ?

The purpose of checking "ret && errno == ENOENT" is to avoid a flood of
error_report() messages for PCI BARs. If we add a warn_report() there, we
will still get many messages printed for PCI BARs.
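As a small sketch of the behaviour being discussed (not the patch itself), the mapping from the IOMMU_IOAS_UNMAP ioctl result to the backend's return value can be written as a pure function.  unmap_result_to_errno() is a hypothetical helper name, and passing in an errno value saved right after the ioctl is an assumption of this sketch rather than what the patch does.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper: map the raw ioctl() result of IOMMU_IOAS_UNMAP and
 * the errno saved right after the call to the value the backend returns.
 */
static int unmap_result_to_errno(int ioctl_ret, int saved_errno)
{
    /* ioctl() returns 0 on success and -1 on failure. */
    assert(ioctl_ret == 0 || ioctl_ret == -1);

    if (ioctl_ret == 0) {
        return 0;
    }
    /*
     * ENOENT means nothing was mapped at that IOVA (e.g. a PCI BAR that
     * was never mapped), so it is silently treated as success.
     */
    if (saved_errno == ENOENT) {
        return 0;
    }
    return -saved_errno;
}
```

Only the last branch would report an error, which is why neither error_report() nor a warn_report() fires for the PCI BAR case.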

>
>> +     */
>> +    if (ret && errno == ENOENT) {
>> +        ret = 0;
>> +    }
>> +    if (ret) {
>> +        error_report("IOMMU_IOAS_UNMAP failed: %m");
>> +    }
>> +    return !ret ? 0 : -errno;
>> +}
>> +
>> +int iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id,
>> +                               uint32_t pt_id, uint32_t *out_hwpt)
>> +{
>> +    int ret;
>> +    struct iommu_hwpt_alloc alloc_hwpt = {
>> +        .size = sizeof(struct iommu_hwpt_alloc),
>> +        .flags = 0,
>> +        .dev_id = dev_id,
>> +        .pt_id = pt_id,
>> +        .__reserved = 0,
>> +    };
>> +
>> +    ret = ioctl(iommufd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
>> +    trace_iommufd_backend_alloc_hwpt(iommufd, dev_id, pt_id,
>> +                                     alloc_hwpt.out_hwpt_id, ret);
>> +
>> +    if (ret) {
>> +        error_report("IOMMU_HWPT_ALLOC failed: %m");
>> +    } else {
>> +        *out_hwpt = alloc_hwpt.out_hwpt_id;
>> +    }
>> +    return !ret ? 0 : -errno;
>> +}
>> +
>> +static const TypeInfo iommufd_backend_info = {
>> +    .name = TYPE_IOMMUFD_BACKEND,
>> +    .parent = TYPE_OBJECT,
>> +    .instance_size = sizeof(IOMMUFDBackend),
>> +    .instance_init = iommufd_backend_init,
>> +    .instance_finalize = iommufd_backend_finalize,
>> +    .class_size = sizeof(IOMMUFDBackendClass),
>> +    .class_init = iommufd_backend_class_init,
>> +    .interfaces = (InterfaceInfo[]) {
>> +        { TYPE_USER_CREATABLE },
>> +        { }
>> +    }
>> +};
>> +
>> +static void register_types(void)
>> +{
>> +    type_register_static(&iommufd_backend_info);
>> +}
>> +
>> +type_init(register_types);
>> diff --git a/backends/Kconfig b/backends/Kconfig
>> index f35abc1609..2cb23f62fa 100644
>> --- a/backends/Kconfig
>> +++ b/backends/Kconfig
>> @@ -1 +1,5 @@
>>   source tpm/Kconfig
>> +
>> +config IOMMUFD
>> +    bool
>> +    depends on VFIO
>> diff --git a/backends/meson.build b/backends/meson.build
>> index 914c7c4afb..05ac57ff15 100644
>> --- a/backends/meson.build
>> +++ b/backends/meson.build
>> @@ -20,6 +20,11 @@ if have_vhost_user
>>     system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('vhost-user.c'))
>>   endif
>>   system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost.c'))
>> +if have_iommufd
>> +  system_ss.add(files('iommufd.c'))
>> +else
>> +  system_ss.add(files('iommufd-stub.c'))
>> +endif
>
>replace with :
>
>  system_ss.add(when: 'CONFIG_IOMMUFD', if_true: files('iommufd.c'))
>
>and drop iommufd-stub.c which will become useless.

Will do.

>
>
>
>>   if have_vhost_user_crypto
>>     system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost-user.c'))
>>   endif
>> diff --git a/backends/trace-events b/backends/trace-events
>> index 652eb76a57..e5f828bca2 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -5,3 +5,15 @@ dbus_vmstate_pre_save(void)
>>   dbus_vmstate_post_load(int version_id) "version_id: %d"
>>   dbus_vmstate_loading(const char *id) "id: %s"
>>   dbus_vmstate_saving(const char *id) "id: %s"
>> +
>> +# iommufd.c
>> +iommufd_backend_connect(int fd, bool owned, uint32_t users, int ret) "fd=%d owned=%d users=%d (%d)"
>> +iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
>> +iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
>> +iommufd_backend_get_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
>> +iommufd_backend_put_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
>> +iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
>> +iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>> +iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
>> +iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
>> +iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u out_hwpt=%u (%d)"
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index e26230bac5..ddfaddf8ce 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -5210,6 +5210,19 @@ SRST
>>
>>           The ``share`` boolean option is on by default with memfd.
>>
>> +#ifdef CONFIG_IOMMUFD
>
>Please remove.

Will do.

Thanks
Zhenzhong


* RE: [PATCH v4 27/41] util/char_dev: Add open_cdev()
  2023-11-07 13:37   ` Cédric Le Goater
@ 2023-11-08  4:29     ` Duan, Zhenzhong
  0 siblings, 0 replies; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08  4:29 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 7, 2023 9:38 PM
>Subject: Re: [PATCH v4 27/41] util/char_dev: Add open_cdev()
>
>On 11/2/23 08:12, Zhenzhong Duan wrote:
>> From: Yi Liu <yi.l.liu@intel.com>
>>
>> /dev/vfio/devices/vfioX may not exist. In that case it is still possible
>> to open /dev/char/$major:$minor instead. Add helper function to abstract
>> the cdev open.
>>
>> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   MAINTAINERS                 |  6 +++
>>   include/qemu/chardev_open.h | 16 ++++++++
>>   util/chardev_open.c         | 81 +++++++++++++++++++++++++++++++++++++
>>   util/meson.build            |  1 +
>>   4 files changed, 104 insertions(+)
>>   create mode 100644 include/qemu/chardev_open.h
>>   create mode 100644 util/chardev_open.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 6f35159255..eada773975 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -3473,6 +3473,12 @@ S: Maintained
>>   F: include/qemu/iova-tree.h
>>   F: util/iova-tree.c
>>
>> +cdev Open
>> +M: Yi Liu <yi.l.liu@intel.com>
>> +S: Maintained
>> +F: include/qemu/chardev_open.h
>> +F: util/chardev_open.c
>
>May be move under the IOMMUFD entry instead ?

Sure, will do.

Thanks
Zhenzhong


* RE: [PATCH v4 41/41] vfio: Compile out iommufd for PPC target
  2023-11-07 13:44   ` Cédric Le Goater
@ 2023-11-08  4:31     ` Duan, Zhenzhong
  0 siblings, 0 replies; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08  4:31 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Thomas Huth



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 7, 2023 9:44 PM
>Subject: Re: [PATCH v4 41/41] vfio: Compile out iommufd for PPC target
>
>On 11/2/23 08:13, Zhenzhong Duan wrote:
>> Since PPC doesn't support IOMMUFD, make iommufd related code
>> compiled out.
>>
>> Suggested-by: Cédric Le Goater <clg@redhat.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>Please drop this patch.
>
>Instead, add
>
>     imply IOMMUFD
>
>in hw/{i386,s390x,arm}/Kconfig for platforms supporting IOMMUFD.

Good suggestions, will do.

Thanks
Zhenzhong


* RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-07 13:41   ` Cédric Le Goater
@ 2023-11-08  5:45     ` Duan, Zhenzhong
  0 siblings, 0 replies; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08  5:45 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 7, 2023 9:41 PM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On 11/2/23 08:12, Zhenzhong Duan wrote:
>> From: Yi Liu <yi.l.liu@intel.com>
>>
>> Add the iommufd backend. The IOMMUFD container class is implemented
>> based on the new /dev/iommu user API. This backend obviously depends
>> on CONFIG_IOMMUFD.
>>
>> So far, the iommufd backend doesn't support dirty page sync yet due
>> to missing support in the host kernel.
>>
>> Co-authored-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>
>I think one tag for Eric is enough.
>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> v4: use SPDX identifier, use iommufd_cdev_* prefix, merge with manual alloc patch
>>
>>   include/hw/vfio/vfio-common.h |  23 ++
>>   hw/vfio/common.c              |  19 +-
>>   hw/vfio/iommufd.c             | 504 ++++++++++++++++++++++++++++++++++
>>   hw/vfio/meson.build           |   3 +
>>   hw/vfio/trace-events          |  13 +
>>   5 files changed, 558 insertions(+), 4 deletions(-)
>>   create mode 100644 hw/vfio/iommufd.c
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 24ecc0e7ee..3f1a39a991 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -89,6 +89,23 @@ typedef struct VFIOHostDMAWindow {
>>       QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
>>   } VFIOHostDMAWindow;
>>
>> +#ifdef CONFIG_IOMMUFD
>
>Please remove the #ifdef.

Will do.

>
>> +typedef struct VFIOIOASHwpt {
>> +    uint32_t hwpt_id;
>> +    QLIST_HEAD(, VFIODevice) device_list;
>> +    QLIST_ENTRY(VFIOIOASHwpt) next;
>> +} VFIOIOASHwpt;
>> +
>> +typedef struct IOMMUFDBackend IOMMUFDBackend;
>> +
>> +typedef struct VFIOIOMMUFDContainer {
>> +    VFIOContainerBase bcontainer;
>> +    IOMMUFDBackend *be;
>> +    uint32_t ioas_id;
>> +    QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
>> +} VFIOIOMMUFDContainer;
>> +#endif
>> +
>>   typedef struct VFIODeviceOps VFIODeviceOps;
>>
>>   typedef struct VFIODevice {
>> @@ -116,6 +133,11 @@ typedef struct VFIODevice {
>>       OnOffAuto pre_copy_dirty_page_tracking;
>>       bool dirty_pages_supported;
>>       bool dirty_tracking;
>> +#ifdef CONFIG_IOMMUFD
>> +    int devid;
>> +    VFIOIOASHwpt *hwpt;
>> +    IOMMUFDBackend *iommufd;
>> +#endif
>>   } VFIODevice;
>>
>>   struct VFIODeviceOps {
>> @@ -201,6 +223,7 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
>>   extern VFIOGroupList vfio_group_list;
>>   extern VFIODeviceList vfio_device_list;
>>   extern const VFIOIOMMUOps vfio_legacy_ops;
>> +extern const VFIOIOMMUOps vfio_iommufd_ops;
>>   extern const MemoryListener vfio_memory_listener;
>>   extern int vfio_kvm_device_fd;
>>
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 572ae7c934..a61dce2845 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -1462,10 +1462,13 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
>>
>>   void vfio_put_address_space(VFIOAddressSpace *space)
>>   {
>> -    if (QLIST_EMPTY(&space->containers)) {
>> -        QLIST_REMOVE(space, list);
>> -        g_free(space);
>> +    if (!QLIST_EMPTY(&space->containers)) {
>> +        return;
>>       }
>> +
>> +    QLIST_REMOVE(space, list);
>> +    g_free(space);
>> +
>>       if (QLIST_EMPTY(&vfio_address_spaces)) {
>>           qemu_unregister_reset(vfio_reset_handler, NULL);
>>       }
>> @@ -1498,8 +1501,16 @@ retry:
>>   int vfio_attach_device(char *name, VFIODevice *vbasedev,
>>                          AddressSpace *as, Error **errp)
>>   {
>> -    const VFIOIOMMUOps *ops = &vfio_legacy_ops;
>> +    const VFIOIOMMUOps *ops;
>>
>> +#ifdef CONFIG_IOMMUFD
>
>You can keep this one though.

Will do.

>
>> +    if (vbasedev->iommufd) {
>> +        ops = &vfio_iommufd_ops;
>> +    } else
>> +#endif
>> +    {
>> +        ops = &vfio_legacy_ops;
>> +    }
>>       return ops->attach_device(name, vbasedev, as, errp);
>>   }
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> new file mode 100644
>> index 0000000000..1bb55ca2c4
>> --- /dev/null
>> +++ b/hw/vfio/iommufd.c
>> @@ -0,0 +1,504 @@
>> +/*
>> + * iommufd container backend
>> + *
>> + * Copyright (C) 2023 Intel Corporation.
>> + * Copyright Red Hat, Inc. 2023
>> + *
>> + * Authors: Yi Liu <yi.l.liu@intel.com>
>> + *          Eric Auger <eric.auger@redhat.com>
>> + *
>> + * SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include <sys/ioctl.h>
>> +#include <linux/vfio.h>
>> +#include <linux/iommufd.h>
>> +
>> +#include "hw/vfio/vfio-common.h"
>> +#include "qemu/error-report.h"
>> +#include "trace.h"
>> +#include "qapi/error.h"
>> +#include "sysemu/iommufd.h"
>> +#include "hw/qdev-core.h"
>> +#include "sysemu/reset.h"
>> +#include "qemu/cutils.h"
>> +#include "qemu/chardev_open.h"
>> +
>> +static int iommufd_map(VFIOContainerBase *bcontainer, hwaddr iova,
>> +                       ram_addr_t size, void *vaddr, bool readonly)
>> +{
>> +    VFIOIOMMUFDContainer *container =
>> +        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>> +
>> +    return iommufd_backend_map_dma(container->be,
>> +                                   container->ioas_id,
>> +                                   iova, size, vaddr, readonly);
>> +}
>> +
>> +static int iommufd_unmap(VFIOContainerBase *bcontainer,
>> +                         hwaddr iova, ram_addr_t size,
>> +                         IOMMUTLBEntry *iotlb)
>> +{
>> +    VFIOIOMMUFDContainer *container =
>> +        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>> +
>> +    /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
>> +    return iommufd_backend_unmap_dma(container->be,
>> +                                     container->ioas_id, iova, size);
>> +}
>> +
>> +static void iommufd_cdev_kvm_device_add(VFIODevice *vbasedev)
>> +{
>> +    Error *err = NULL;
>> +
>> +    if (vfio_kvm_device_add_fd(vbasedev->fd, &err)) {
>> +        error_report_err(err);
>> +    }
>> +}
>> +
>> +static void iommufd_cdev_kvm_device_del(VFIODevice *vbasedev)
>> +{
>> +    Error *err = NULL;
>> +
>> +    if (vfio_kvm_device_del_fd(vbasedev->fd, &err)) {
>> +        error_report_err(err);
>> +    }
>> +}
>> +
>> +static int iommufd_connect_and_bind(VFIODevice *vbasedev, Error **errp)
>> +{
>> +    IOMMUFDBackend *iommufd = vbasedev->iommufd;
>> +    struct vfio_device_bind_iommufd bind = {
>> +        .argsz = sizeof(bind),
>> +        .flags = 0,
>> +    };
>> +    int ret;
>> +
>> +    ret = iommufd_backend_connect(iommufd, errp);
>> +    if (ret) {
>> +        return ret;
>> +    }
>> +
>> +    /*
>> +     * Add device to kvm-vfio to be prepared for the tracking
>> +     * in KVM. Especially for some emulated devices, it requires
>> +     * to have kvm information in the device open.
>> +     */
>> +    iommufd_cdev_kvm_device_add(vbasedev);
>> +
>> +    /* Bind device to iommufd */
>> +    bind.iommufd = iommufd->fd;
>> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
>> +    if (ret) {
>> +        error_setg_errno(errp, errno, "error bind device fd=%d to iommufd=%d",
>> +                         vbasedev->fd, bind.iommufd);
>> +        goto err_bind;
>> +    }
>> +
>> +    vbasedev->devid = bind.out_devid;
>> +    trace_iommufd_connect_and_bind(bind.iommufd, vbasedev->name, vbasedev->fd,
>> +                                   vbasedev->devid);
>> +    return ret;
>> +err_bind:
>> +    iommufd_cdev_kvm_device_del(vbasedev);
>> +    iommufd_backend_disconnect(iommufd);
>> +    return ret;
>> +}
>> +
>> +static void iommufd_unbind_and_disconnect(VFIODevice *vbasedev)
>> +{
>> +    /* Unbind is automatically conducted when device fd is closed */
>> +    iommufd_cdev_kvm_device_del(vbasedev);
>> +    iommufd_backend_disconnect(vbasedev->iommufd);
>> +}
>> +
>> +static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
>> +{
>> +    long int ret = -ENOTTY;
>> +    char *path, *vfio_dev_path = NULL, *vfio_path = NULL;
>> +    DIR *dir = NULL;
>> +    struct dirent *dent;
>> +    gchar *contents;
>> +    struct stat st;
>> +    gsize length;
>> +    int major, minor;
>> +    dev_t vfio_devt;
>> +
>> +    path = g_strdup_printf("%s/vfio-dev", sysfs_path);
>> +    if (stat(path, &st) < 0) {
>> +        error_setg_errno(errp, errno, "no such host device");
>> +        goto out_free_path;
>> +    }
>> +
>> +    dir = opendir(path);
>> +    if (!dir) {
>> +        error_setg_errno(errp, errno, "couldn't open dirrectory %s", path);
>> +        goto out_free_path;
>> +    }
>> +
>> +    while ((dent = readdir(dir))) {
>> +        if (!strncmp(dent->d_name, "vfio", 4)) {
>> +            vfio_dev_path = g_strdup_printf("%s/%s/dev", path, dent->d_name);
>> +            break;
>> +        }
>> +    }
>> +
>> +    if (!vfio_dev_path) {
>> +        error_setg(errp, "failed to find vfio-dev/vfioX/dev");
>> +        goto out_close_dir;
>> +    }
>> +
>> +    if (!g_file_get_contents(vfio_dev_path, &contents, &length, NULL)) {
>> +        error_setg(errp, "failed to load \"%s\"", vfio_dev_path);
>> +        goto out_free_dev_path;
>> +    }
>> +
>> +    if (sscanf(contents, "%d:%d", &major, &minor) != 2) {
>> +        error_setg(errp, "failed to get major:minor for \"%s\"", vfio_dev_path);
>> +        goto out_free_dev_path;
>> +    }
>> +    g_free(contents);
>> +    vfio_devt = makedev(major, minor);
>> +
>> +    vfio_path = g_strdup_printf("/dev/vfio/devices/%s", dent->d_name);
>> +    ret = open_cdev(vfio_path, vfio_devt);
>> +    if (ret < 0) {
>> +        error_setg(errp, "Failed to open %s", vfio_path);
>> +    }
>> +
>> +    trace_iommufd_cdev_getfd(vfio_path, ret);
>> +    g_free(vfio_path);
>> +
>> +out_free_dev_path:
>> +    g_free(vfio_dev_path);
>> +out_close_dir:
>> +    closedir(dir);
>> +out_free_path:
>> +    if (*errp) {
>> +        error_prepend(errp, VFIO_MSG_PREFIX, path);
>> +    }
>> +    g_free(path);
>> +
>> +    return ret;
>> +}
>> +
>> +static VFIOIOASHwpt *iommufd_container_get_hwpt(VFIOIOMMUFDContainer *container,
>> +                                                uint32_t hwpt_id)
>> +{
>> +    VFIOIOASHwpt *hwpt;
>> +
>> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> +        if (hwpt->hwpt_id == hwpt_id) {
>> +            return hwpt;
>> +        }
>> +    }
>> +
>> +    hwpt = g_malloc0(sizeof(*hwpt));
>> +
>> +    hwpt->hwpt_id = hwpt_id;
>> +    QLIST_INIT(&hwpt->device_list);
>> +    QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
>> +
>> +    return hwpt;
>> +}
>> +
>> +static void iommufd_container_put_hwpt(IOMMUFDBackend *be, VFIOIOASHwpt *hwpt)
>> +{
>> +    QLIST_REMOVE(hwpt, next);
>> +    iommufd_backend_free_id(be->fd, hwpt->hwpt_id);
>> +    g_free(hwpt);
>> +}
>> +
>> +static int iommufd_cdev_attach_hwpt(VFIODevice *vbasedev, uint32_t hwpt_id,
>> +                                    Error **errp)
>> +{
>> +    int ret, iommufd = vbasedev->iommufd->fd;
>> +    struct vfio_device_attach_iommufd_pt attach_data = {
>> +        .argsz = sizeof(attach_data),
>> +        .flags = 0,
>> +        .pt_id = hwpt_id,
>> +    };
>> +
>> +    /* Attach device to an hwpt within iommufd */
>> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
>> +    if (ret) {
>> +        error_setg_errno(errp, errno,
>> +                         "[iommufd=%d] error attach %s (%d) to hwpt_id=%d",
>> +                         iommufd, vbasedev->name, vbasedev->fd, hwpt_id);
>> +    }
>> +    trace_iommufd_cdev_attach_hwpt(iommufd, vbasedev->name, vbasedev->fd,
>> +                                   hwpt_id);
>> +    return ret;
>> +}
>> +
>> +static int iommufd_cdev_detach_hwpt(VFIODevice *vbasedev, Error **errp)
>> +{
>> +    int ret, iommufd = vbasedev->iommufd->fd;
>> +    struct vfio_device_detach_iommufd_pt detach_data = {
>> +        .argsz = sizeof(detach_data),
>> +        .flags = 0,
>> +    };
>> +
>> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_data);
>> +    if (ret) {
>> +        error_setg_errno(errp, errno, "detach %s from ioas failed",
>> +                         vbasedev->name);
>> +    }
>> +    trace_iommufd_cdev_detach_hwpt(iommufd, vbasedev->name,
>> +                                   vbasedev->hwpt->hwpt_id);
>> +    return ret;
>> +}
>> +
>> +static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
>> +                                         VFIOIOMMUFDContainer *container,
>> +                                         Error **errp)
>> +{
>> +    int ret, iommufd = vbasedev->iommufd->fd;
>> +    VFIOIOASHwpt *hwpt;
>> +    uint32_t hwpt_id;
>> +    Error *err = NULL;
>> +
>> +    /* try to attach to an existing hwpt in this container */
>> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> +        ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt->hwpt_id, &err);
>> +        if (ret) {
>> +            const char *msg = error_get_pretty(err);
>> +
>> +            trace_iommufd_cdev_fail_attach_existing_hwpt(msg);
>> +            error_free(err);
>> +            err = NULL;
>> +        } else {
>> +            goto found_hwpt;
>> +        }
>> +    }
>> +
>> +    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>> +                                     container->ioas_id, &hwpt_id);
>> +
>> +    if (ret) {
>> +        error_setg_errno(errp, errno, "error alloc shadow hwpt");
>> +        return ret;
>> +    }
>> +
>> +    /* Attach cdev to a new allocated hwpt within iommufd */
>> +    ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt_id, errp);
>> +    if (ret) {
>> +        iommufd_backend_free_id(iommufd, hwpt_id);
>> +        return ret;
>> +    }
>> +
>> +    hwpt = iommufd_container_get_hwpt(container, hwpt_id);
>> +found_hwpt:
>> +    QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, next);
>> +    vbasedev->hwpt = hwpt;
>> +
>> +    trace_iommufd_cdev_attach_container(iommufd, vbasedev->name, vbasedev->fd,
>> +                                        container->ioas_id, hwpt->hwpt_id);
>> +    return ret;
>> +}
>> +
>> +static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
>> +                                          VFIOIOMMUFDContainer *container)
>> +{
>> +    VFIOIOASHwpt *hwpt = vbasedev->hwpt;
>> +    Error *err = NULL;
>> +    int ret;
>> +
>> +    ret = iommufd_cdev_detach_hwpt(vbasedev, &err);
>> +    if (ret) {
>> +        error_report_err(err);
>> +    }
>> +
>> +    QLIST_REMOVE(vbasedev, next);
>> +    vbasedev->hwpt = NULL;
>> +    if (QLIST_EMPTY(&hwpt->device_list)) {
>> +        iommufd_container_put_hwpt(vbasedev->iommufd, hwpt);
>> +    }
>> +
>> +    trace_iommufd_cdev_detach_container(container->be->fd, vbasedev->name,
>> +                                        container->ioas_id);
>> +}
>> +
>> +static void iommufd_container_destroy(VFIOIOMMUFDContainer *container)
>> +{
>> +    VFIOContainerBase *bcontainer = &container->bcontainer;
>> +
>> +    if (!QLIST_EMPTY(&container->hwpt_list)) {
>> +        return;
>> +    }
>> +    memory_listener_unregister(&bcontainer->listener);
>> +    vfio_container_destroy(bcontainer);
>> +    iommufd_backend_put_ioas(container->be, container->ioas_id);
>> +    g_free(container);
>> +}
>> +
>> +static int iommufd_ram_block_discard_disable(bool state)
>> +{
>> +    /*
>> +     * We support coordinated discarding of RAM via the RamDiscardManager.
>> +     */
>> +    return ram_block_uncoordinated_discard_disable(state);
>> +}
>> +
>> +static int iommufd_attach_device(const char *name, VFIODevice *vbasedev,
>> +                                 AddressSpace *as, Error **errp)
>> +{
>> +    VFIOContainerBase *bcontainer;
>> +    VFIOIOMMUFDContainer *container;
>> +    VFIOAddressSpace *space;
>> +    struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
>> +    int ret, devfd;
>> +    uint32_t ioas_id;
>> +    Error *err = NULL;
>> +
>> +    devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
>> +    if (devfd < 0) {
>> +        return devfd;
>> +    }
>> +    vbasedev->fd = devfd;
>> +
>> +    ret = iommufd_connect_and_bind(vbasedev, errp);
>> +    if (ret) {
>> +        goto err_connect_bind;
>> +    }
>> +
>> +    space = vfio_get_address_space(as);
>> +
>> +    /* try to attach to an existing container in this space */
>> +    QLIST_FOREACH(bcontainer, &space->containers, next) {
>> +        container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>> +        if (bcontainer->ops != &vfio_iommufd_ops ||
>> +            vbasedev->iommufd != container->be) {
>> +            continue;
>> +        }
>> +        if (iommufd_cdev_attach_container(vbasedev, container, &err)) {
>> +            const char *msg = error_get_pretty(err);
>> +
>> +            trace_iommufd_cdev_fail_attach_existing_container(msg);
>> +            error_free(err);
>> +            err = NULL;
>> +        } else {
>> +            ret = iommufd_ram_block_discard_disable(true);
>> +            if (ret) {
>> +                error_setg(errp,
>> +                              "Cannot set discarding of RAM broken (%d)", ret);
>> +                goto err_discard_disable;
>> +            }
>> +            goto found_container;
>> +        }
>> +    }
>> +
>> +    /* Need to allocate a new dedicated container */
>> +    ret = iommufd_backend_get_ioas(vbasedev->iommufd, &ioas_id);
>> +    if (ret < 0) {
>> +        error_setg_errno(errp, errno, "Failed to alloc ioas");
>> +        goto err_get_ioas;
>> +    }
>> +
>> +    trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
>> +
>> +    container = g_malloc0(sizeof(*container));
>> +    container->be = vbasedev->iommufd;
>> +    container->ioas_id = ioas_id;
>> +    QLIST_INIT(&container->hwpt_list);
>> +
>> +    bcontainer = &container->bcontainer;
>> +    vfio_container_init(bcontainer, space, &vfio_iommufd_ops);
>> +    QLIST_INSERT_HEAD(&space->containers, bcontainer, next);
>> +
>> +    ret = iommufd_cdev_attach_container(vbasedev, container, errp);
>> +    if (ret) {
>> +        goto err_attach_container;
>> +    }
>> +
>> +    ret = iommufd_ram_block_discard_disable(true);
>> +    if (ret) {
>> +        goto err_discard_disable;
>> +    }
>> +
>> +    bcontainer->pgsizes = qemu_real_host_page_size();
>> +
>> +    bcontainer->listener = vfio_memory_listener;
>> +    memory_listener_register(&bcontainer->listener, bcontainer->space->as);
>> +
>> +    if (bcontainer->error) {
>> +        ret = -1;
>> +        error_propagate_prepend(errp, bcontainer->error,
>> +                                "memory listener initialization failed: ");
>> +        goto err_listener_register;
>> +    }
>> +
>> +    bcontainer->initialized = true;
>> +
>> +found_container:
>> +    ret = ioctl(devfd, VFIO_DEVICE_GET_INFO, &dev_info);
>> +    if (ret) {
>> +        error_setg_errno(errp, errno, "error getting device info");
>> +        goto err_listener_register;
>> +    }
>> +
>> +    /*
>> +     * TODO: examine RAM_BLOCK_DISCARD stuff, should we do group level
>> +     * for discarding incompatibility check as well?
>> +     */
>> +    if (vbasedev->ram_block_discard_allowed) {
>> +        iommufd_ram_block_discard_disable(false);
>> +    }
>> +
>> +    vbasedev->group = 0;
>> +    vbasedev->num_irqs = dev_info.num_irqs;
>> +    vbasedev->num_regions = dev_info.num_regions;
>> +    vbasedev->flags = dev_info.flags;
>> +    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
>> +    vbasedev->bcontainer = bcontainer;
>> +    QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
>> +    QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
>> +
>> +    trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
>> +                                   vbasedev->num_regions, vbasedev->flags);
>> +    return 0;
>> +
>> +err_listener_register:
>> +    iommufd_ram_block_discard_disable(false);
>> +err_discard_disable:
>> +    iommufd_cdev_detach_container(vbasedev, container);
>> +err_attach_container:
>> +    iommufd_container_destroy(container);
>> +err_get_ioas:
>> +    vfio_put_address_space(space);
>> +    iommufd_unbind_and_disconnect(vbasedev);
>> +err_connect_bind:
>> +    close(vbasedev->fd);
>> +    return ret;
>> +}
>> +
>> +static void iommufd_detach_device(VFIODevice *vbasedev)
>> +{
>> +    VFIOContainerBase *bcontainer = vbasedev->bcontainer;
>> +    VFIOIOMMUFDContainer *container;
>> +    VFIOAddressSpace *space = bcontainer->space;
>> +
>> +    QLIST_REMOVE(vbasedev, global_next);
>> +    QLIST_REMOVE(vbasedev, container_next);
>> +    vbasedev->bcontainer = NULL;
>> +
>> +    if (!vbasedev->ram_block_discard_allowed) {
>> +        iommufd_ram_block_discard_disable(false);
>> +    }
>> +
>> +    container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>> +    iommufd_cdev_detach_container(vbasedev, container);
>> +    iommufd_container_destroy(container);
>> +    vfio_put_address_space(space);
>> +
>> +    iommufd_unbind_and_disconnect(vbasedev);
>> +    close(vbasedev->fd);
>> +}
>> +
>> +const VFIOIOMMUOps vfio_iommufd_ops = {
>> +    .dma_map = iommufd_map,
>> +    .dma_unmap = iommufd_unmap,
>> +    .attach_device = iommufd_attach_device,
>> +    .detach_device = iommufd_detach_device,
>> +};
>> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
>> index eb6ce6229d..9cae2c9e21 100644
>> --- a/hw/vfio/meson.build
>> +++ b/hw/vfio/meson.build
>> @@ -7,6 +7,9 @@ vfio_ss.add(files(
>>     'spapr.c',
>>     'migration.c',
>>   ))
>> +if have_iommufd
>> +  vfio_ss.add(files('iommufd.c'))
>> +endif
>
>Instead,
>
>vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files(
>   'iommufd.c',
>))

Will do.
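
For reference, the one #ifdef CONFIG_IOMMUFD that stays (the backend
selection in vfio_attach_device() quoted earlier in this mail) can be
sketched in isolation as below. This is only an illustration of the
"dangling else" preprocessor pattern, not QEMU code: Ops, Dev and
select_ops() are made-up stand-ins, and CONFIG_IOMMUFD is defined by hand
here, whereas in QEMU it would come from the build configuration.

```c
#include <stddef.h>

/*
 * Assumption: defined by hand so the sketch builds with the iommufd
 * branch enabled. Remove this line to see the legacy-only build.
 */
#define CONFIG_IOMMUFD 1

/* Hypothetical stand-ins for VFIOIOMMUOps and VFIODevice. */
typedef struct Ops {
    const char *name;
} Ops;

typedef struct Dev {
    void *iommufd;      /* non-NULL when an iommufd object is linked */
} Dev;

static const Ops legacy_ops = { "legacy" };
#ifdef CONFIG_IOMMUFD
static const Ops iommufd_ops = { "iommufd" };
#endif

static const Ops *select_ops(const Dev *dev)
{
    const Ops *ops;

#ifdef CONFIG_IOMMUFD
    if (dev->iommufd) {
        ops = &iommufd_ops;
    } else
#endif
    {
        /*
         * With CONFIG_IOMMUFD undefined, only this block is left and
         * the legacy ops are always chosen.
         */
        ops = &legacy_ops;
    }
    return ops;
}
```

The braces after #endif serve as the else branch when the option is set
and as a plain block when it is not, so the legacy assignment is written
only once.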

Thanks
Zhenzhong


* RE: [PATCH v4 32/41] vfio/pci: Introduce a vfio pci hot reset interface
  2023-11-07 13:52   ` Cédric Le Goater
@ 2023-11-08  5:46     ` Duan, Zhenzhong
  0 siblings, 0 replies; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08  5:46 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 7, 2023 9:53 PM
>Subject: Re: [PATCH v4 32/41] vfio/pci: Introduce a vfio pci hot reset interface
>
>On 11/2/23 08:12, Zhenzhong Duan wrote:
>> Legacy vfio pci and iommufd cdev have different process to hot reset
>> vfio device, expand current code to abstract out pci_hot_reset callback
>> for legacy vfio, this same interface will also be used by iommufd
>> cdev vfio device.
>>
>> Suggested-by: Cédric Le Goater <clg@redhat.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>>   hw/vfio/pci.h                         |  1 +
>>   include/hw/vfio/vfio-container-base.h |  3 +++
>>   hw/vfio/container.c                   |  2 ++
>>   hw/vfio/pci.c                         | 11 ++++++++++-
>>   4 files changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index 1006061afb..12cc765821 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -220,6 +220,7 @@ extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
>>
>>   int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
>>                                       struct vfio_pci_hot_reset_info **info_p);
>> +int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single);
>>
>>   int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
>>
>> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
>> index 4b6f017c6f..45bb19c767 100644
>> --- a/include/hw/vfio/vfio-container-base.h
>> +++ b/include/hw/vfio/vfio-container-base.h
>> @@ -106,6 +106,9 @@ struct VFIOIOMMUOps {
>>       int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
>>       int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
>>                                 hwaddr iova, hwaddr size);
>> +    /* PCI specific */
>> +    int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
>> +
>>       /* SPAPR specific */
>>       int (*add_window)(VFIOContainerBase *bcontainer,
>>                         MemoryRegionSection *section,
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index ed2d721b2b..f27cc15d09 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -33,6 +33,7 @@
>>   #include "trace.h"
>>   #include "qapi/error.h"
>>   #include "migration/migration.h"
>> +#include "pci.h"
>>
>>   VFIOGroupList vfio_group_list =
>>       QLIST_HEAD_INITIALIZER(vfio_group_list);
>> @@ -929,4 +930,5 @@ const VFIOIOMMUOps vfio_legacy_ops = {
>>       .detach_device = vfio_legacy_detach_device,
>>       .set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
>>       .query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
>> +    .pci_hot_reset = vfio_legacy_pci_hot_reset,
>>   };
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index eb55e8ae88..a6194b7bfe 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2483,8 +2483,9 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
>>       return 0;
>>   }
>>
>> -static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
>> +int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single)
>
>Could we move this routine to container.c?

Good idea, will do.
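
As a rough illustration of the arrangement discussed above (the reset
routine living next to the legacy ops table, with callers going through
the pci_hot_reset callback), here is a minimal standalone sketch. The
types and names (Device, IOMMUOps, device_hot_reset, no_reset_ops) are
hypothetical stand-ins, not the QEMU definitions, and returning -ENOTSUP
for a missing callback is an assumption made for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for VFIODevice and VFIOIOMMUOps. */
typedef struct Device Device;

typedef struct IOMMUOps {
    /* PCI specific; a backend may leave this NULL. */
    int (*pci_hot_reset)(Device *dev, bool single);
} IOMMUOps;

struct Device {
    const IOMMUOps *ops;
    int resets;         /* counts hot resets, for illustration only */
};

/* Backend side: defined in the same file as the ops table. */
static int legacy_pci_hot_reset(Device *dev, bool single)
{
    (void)single;       /* a real backend would act on this flag */
    dev->resets++;
    return 0;
}

static const IOMMUOps legacy_ops = {
    .pci_hot_reset = legacy_pci_hot_reset,
};

/* A backend that does not implement hot reset. */
static const IOMMUOps no_reset_ops = {
    .pci_hot_reset = NULL,
};

/* Caller side: dispatch through the callback, never a direct call. */
static int device_hot_reset(Device *dev, bool single)
{
    if (!dev->ops || !dev->ops->pci_hot_reset) {
        return -ENOTSUP;
    }
    return dev->ops->pci_hot_reset(dev, single);
}
```

With this layout the reset routine can stay static in the backend file,
and the caller needs only the ops pointer, not a prototype in pci.h.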

Thanks
Zhenzhong


* Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-07 13:33   ` Cédric Le Goater
  2023-11-08  3:35     ` Duan, Zhenzhong
@ 2023-11-08  5:50     ` Markus Armbruster
  2023-11-08 10:03       ` Cédric Le Goater
  1 sibling, 1 reply; 114+ messages in thread
From: Markus Armbruster @ 2023-11-08  5:50 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Zhenzhong Duan, qemu-devel, alex.williamson, jgg, nicolinc,
	joao.m.martins, eric.auger, peterx, jasowang, kevin.tian,
	yi.l.liu, yi.y.sun, chao.p.peng, Paolo Bonzini, Eric Blake,
	Markus Armbruster, Daniel P. Berrangé,
	Eduardo Habkost, Thomas Huth

Cédric Le Goater <clg@redhat.com> writes:

> On 11/2/23 08:12, Zhenzhong Duan wrote:
>> From: Eric Auger <eric.auger@redhat.com>
>> Introduce an iommufd object which allows the interaction
>> with the host /dev/iommu device.
>> The /dev/iommu can have been already pre-opened outside of qemu,
>> in which case the fd can be passed directly along with the
>> iommufd object:
>> This allows the iommufd object to be shared accross several
>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>> the /dev/iommu once.
>> If no fd is passed along with the iommufd object, the /dev/iommu
>> is opened by the qemu code.
>> The CONFIG_IOMMUFD option must be set to compile this new object.
>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> v4: add CONFIG_IOMMUFD check, document default case
>>   MAINTAINERS              |   7 ++
>>   qapi/qom.json            |  22 ++++
>>   include/sysemu/iommufd.h |  46 +++++++
>>   backends/iommufd-stub.c  |  59 +++++++++
>>   backends/iommufd.c       | 257 +++++++++++++++++++++++++++++++++++++++
>>   backends/Kconfig         |   4 +
>>   backends/meson.build     |   5 +
>>   backends/trace-events    |  12 ++
>>   qemu-options.hx          |  13 ++
>>   9 files changed, 425 insertions(+)
>>   create mode 100644 include/sysemu/iommufd.h
>>   create mode 100644 backends/iommufd-stub.c
>>   create mode 100644 backends/iommufd.c
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index cd8d6b140f..6f35159255 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -2135,6 +2135,13 @@ F: hw/vfio/ap.c
>>   F: docs/system/s390x/vfio-ap.rst
>>   L: qemu-s390x@nongnu.org
>>   +iommufd
>> +M: Yi Liu <yi.l.liu@intel.com>
>> +M: Eric Auger <eric.auger@redhat.com>
>> +S: Supported
>> +F: backends/iommufd.c
>> +F: include/sysemu/iommufd.h
>> +
>>   vhost
>>   M: Michael S. Tsirkin <mst@redhat.com>
>>   S: Supported
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index c53ef978ff..27300add48 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -794,6 +794,24 @@
>>   { 'struct': 'VfioUserServerProperties',
>>     'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>> +##
>> +# @IOMMUFDProperties:
>> +#
>> +# Properties for iommufd objects.
>> +#
>> +# @fd: file descriptor name previously passed via 'getfd' command,
>> +#     which represents a pre-opened /dev/iommu.  This allows the
>> +#     iommufd object to be shared accross several subsystems
>> +#     (VFIO, VDPA, ...), and the file descriptor to be shared
>> +#     with other process, e.g. DPDK.  (default: QEMU opens
>> +#     /dev/iommu by itself)
>> +#
>> +# Since: 8.2
>> +##
>> +{ 'struct': 'IOMMUFDProperties',
>> +  'data': { '*fd': 'str' },
>> +  'if': 'CONFIG_IOMMUFD' }
>
>
> Activating or not IOMMUFD on a platform is a configuration choice
> and it is not a dependency on an external resource. I would make
> things simpler and drop all the #ifdef in the documentation files.

What exactly are you proposing?

The use of 'if': 'CONFIG_IOMMUFD' in the QAPI schema enables
introspection with query-qmp-schema: when ObjectType @iommufd exists,
QEMU supports creating the object.  Or am I confused?

> There might be a way to remove the documentation also. Not a big
> issue for now.
>
>
>> +
>>   ##
>>   # @RngProperties:
>>   #
>> @@ -934,6 +952,8 @@
>>      'input-barrier',
>>      { 'name': 'input-linux',
>>        'if': 'CONFIG_LINUX' },
>> +    { 'name': 'iommufd',
>> +      'if': 'CONFIG_IOMMUFD' },
>>      'iothread',
>>      'main-loop',
>>      { 'name': 'memory-backend-epc',
>> @@ -1003,6 +1023,8 @@
>>        'input-barrier':              'InputBarrierProperties',
>>        'input-linux':                { 'type': 'InputLinuxProperties',
>>                                         'if': 'CONFIG_LINUX' },
>> +      'iommufd':                    { 'type': 'IOMMUFDProperties',
>> +                                      'if': 'CONFIG_IOMMUFD' },
>>        'iothread':                   'IothreadProperties',
>>        'main-loop':                  'MainLoopProperties',
>>        'memory-backend-epc':         { 'type': 'MemoryBackendEpcProperties',

[...]




* RE: [PATCH v4 25/41] Add iommufd configure option
  2023-11-07 14:37     ` Cédric Le Goater
@ 2023-11-08  6:08       ` Duan, Zhenzhong
  0 siblings, 0 replies; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08  6:08 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Paolo Bonzini, Marc-André Lureau,
	Daniel P. Berrangé, Thomas Huth, Philippe Mathieu-Daudé



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 7, 2023 10:37 PM
>Subject: Re: [PATCH v4 25/41] Add iommufd configure option
>
>On 11/7/23 14:14, Cédric Le Goater wrote:
>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>> This adds "--enable-iommufd/--disable-iommufd" to enable or disable
>>> iommufd support, enabled by default.
>>
>> I don't think a configure option is the right approach. I will
>> comment other patches to propose another solution relying on
>> Kconfig and activating IOMMUFD for aarch64, s390x, x86_64 only.
>
>Here is an example on your series :
>
>   https://github.com/legoater/qemu/commits/vfio-8.2
>
>The backend is always compiled (since it is common) but the VFIO frontend
>and the 'iommufd' object are only available on x86_64, arm, s390x.

It looks like the iommufd backend is compiled only for x86_64, arm or s390x.
This makes sense to me: on other platforms that don't support iommufd,
there is no need to compile in a useless iommufd backend.

>
>Looks like a good compromise. Please tell me what you think about it.

Yes, this looks better to me. I'll include your change in v5.

Thanks
Zhenzhong


* RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-08  2:59   ` Matthew Rosato
@ 2023-11-08  7:16     ` Duan, Zhenzhong
  2023-11-08 12:48       ` Jason Gunthorpe
  0 siblings, 1 reply; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08  7:16 UTC (permalink / raw)
  To: Matthew Rosato, qemu-devel
  Cc: alex.williamson, clg, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Thomas Huth, Eric Farman, Halil Pasic, Jason J. Herne,
	Tony Krowiak

Hi Matthew,

>-----Original Message-----
>From: Matthew Rosato <mjrosato@linux.ibm.com>
>Sent: Wednesday, November 8, 2023 11:00 AM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On 11/2/23 3:12 AM, Zhenzhong Duan wrote:
>> From: Yi Liu <yi.l.liu@intel.com>
>>
>> Add the iommufd backend. The IOMMUFD container class is implemented
>> based on the new /dev/iommu user API. This backend obviously depends
>> on CONFIG_IOMMUFD.
>>
>> So far, the iommufd backend doesn't support dirty page sync yet due
>> to missing support in the host kernel.
>>
>> Co-authored-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>
>[...]
>
>> +static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
>> +                                         VFIOIOMMUFDContainer *container,
>> +                                         Error **errp)
>> +{
>> +    int ret, iommufd = vbasedev->iommufd->fd;
>> +    VFIOIOASHwpt *hwpt;
>> +    uint32_t hwpt_id;
>> +    Error *err = NULL;
>> +
>> +    /* try to attach to an existing hwpt in this container */
>> +    QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
>> +        ret = iommufd_cdev_attach_hwpt(vbasedev, hwpt->hwpt_id, &err);
>> +        if (ret) {
>> +            const char *msg = error_get_pretty(err);
>> +
>> +            trace_iommufd_cdev_fail_attach_existing_hwpt(msg);
>> +            error_free(err);
>> +            err = NULL;
>> +        } else {
>> +            goto found_hwpt;
>> +        }
>> +    }
>> +
>> +    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>> +                                     container->ioas_id, &hwpt_id);
>> +
>> +    if (ret) {
>> +        error_setg_errno(errp, errno, "error alloc shadow hwpt");
>> +        return ret;
>> +    }
>
>The above alloc_hwpt fails for mdevs (at least, it fails for me attempting to use
>iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing in the
>kernel because it can't find an IOMMUFD_OBJ_DEVICE.
>
>AFAIU that's because the mdevs are meant to instead use kernel access via
>vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
>looking at the kernel vfio compat container.
>
>As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
>backend by just skipping this alloc_hwpt above and instead passing container-
>>ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
>vfio_iommufd_emulated_attach_ioas call in the kernel.

Thanks for helping with the testing and investigation.
I was only focusing on real devices and missed the mdev particularity, sorry.
You are right, there is no hwpt support for mdev, not even an emulated hwpt.
I'll dig into this and see how to distinguish an mdev from a real device in
this low-level function.

BRs.
Zhenzhong



* RE: [PATCH v4 00/41] vfio: Adopt iommufd
  2023-11-08  3:26   ` Matthew Rosato
@ 2023-11-08  8:37     ` Duan, Zhenzhong
  2023-11-08  9:07       ` Duan, Zhenzhong
  2023-11-08  9:21     ` Cédric Le Goater
  1 sibling, 1 reply; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08  8:37 UTC (permalink / raw)
  To: Matthew Rosato, Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Thomas Huth, Eric Farman, Halil Pasic, Jason J. Herne,
	Tony Krowiak



>-----Original Message-----
>From: Matthew Rosato <mjrosato@linux.ibm.com>
>Sent: Wednesday, November 8, 2023 11:27 AM
>Subject: Re: [PATCH v4 00/41] vfio: Adopt iommufd
>
>On 11/7/23 1:28 PM, Cédric Le Goater wrote:
>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>> Hi,
>>>
>>> Thanks all for giving guides and comments on previous series, here is
>>> the v4 of pure iommufd support part.
>>>
>>> Based on Cédric's suggestion, this series includes an effort to remove
>>> spapr code from container.c, now all spapr functions are moved to spapr.c
>>> or spapr_pci_vfio.c, but there are still a few trival check on
>>> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce
>many
>>> callbacks and duplicate code just to remove them. Some functions are moved
>>> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
>>> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
>>> functions to those spapr functions moved.
>>>
>>>
>>> PATCH 1-5: Move spapr functions to spapr*.c
>>> PATCH 6-20: Abstract out base container
>>> PATCH 21-24: Introduce sparpr container and its specific interface
>>
>> PATCH 6-24 applied to vfio-next :
>>
>>   https://github.com/legoater/qemu/commits/vfio-next
>>
>> (with a global s/fucntional/functional/)
>>
>>
>> I also pushed the remaining patches on :
>>
>>   https://github.com/legoater/qemu/commits/vfio-8.2
>>
>> with a slight rework of the IOMMUFD configuration, now done per platform.
>> The VFIO frontend and the 'iommufd' object are only available on x86_64,
>> arm, s390x.

Thanks Cédric.

>
>FYI, I first tried this vfio-8.2 branch on s390x but wasn't actually able to use the
>iommufd backend (was getting errors like Property 'vfio-pci.iommufd' not found)
>so I think something isn't actually enabling IOMMUFD as expected with your
>change...

It looks like CONFIG_IOMMUFD is recognized by the Kconfig subsystem but is not
passed to the compiler. I'm still digging into how to pass CONFIG_IOMMUFD to
the compiler.

>
>Instead I tested on s390x using vfio-next + patches 25-41 of this series on top.
>
>Legacy backend regression testing worked fine for vfio-pci, vfio-ap and vfio-ccw.
>
>Using iommufd backend for vfio-pci on s390 exposes an s390-only issue related
>to accounting of vfio DMA limit (code in hw/s390x/s390-pci-vfio.c assumes
>VFIODevice.group is never null, but that's no longer true when we use the
>iommufd backend with cdev).  We don't even need to track this when using the
>iommufd backend -- With that issue bypassed, vfio-pci testing on s390x looks
>good so far.  I'll send a separate fix for that.

Thanks for fixing that.

BRs.
Zhenzhong

>
>Using the iommufd backend for vfio-ccw and vfio-ap did not work, see response
>to patch 28.
>
>Thanks,
>Matt



* RE: [PATCH v4 00/41] vfio: Adopt iommufd
  2023-11-08  8:37     ` Duan, Zhenzhong
@ 2023-11-08  9:07       ` Duan, Zhenzhong
  2023-11-08  9:23         ` Cédric Le Goater
  0 siblings, 1 reply; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08  9:07 UTC (permalink / raw)
  To: Matthew Rosato, Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Thomas Huth, Eric Farman, Halil Pasic, Jason J. Herne,
	Tony Krowiak



>-----Original Message-----
>From: Duan, Zhenzhong
>Sent: Wednesday, November 8, 2023 4:38 PM
>Subject: RE: [PATCH v4 00/41] vfio: Adopt iommufd
>
>
>
>>-----Original Message-----
>>From: Matthew Rosato <mjrosato@linux.ibm.com>
>>Sent: Wednesday, November 8, 2023 11:27 AM
>>Subject: Re: [PATCH v4 00/41] vfio: Adopt iommufd
>>
>>On 11/7/23 1:28 PM, Cédric Le Goater wrote:
>>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>>> Hi,
>>>>
>>>> Thanks all for giving guides and comments on previous series, here is
>>>> the v4 of pure iommufd support part.
>>>>
>>>> Based on Cédric's suggestion, this series includes an effort to remove
>>>> spapr code from container.c, now all spapr functions are moved to spapr.c
>>>> or spapr_pci_vfio.c, but there are still a few trival check on
>>>> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce
>>many
>>>> callbacks and duplicate code just to remove them. Some functions are moved
>>>> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
>>>> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
>>>> functions to those spapr functions moved.
>>>>
>>>>
>>>> PATCH 1-5: Move spapr functions to spapr*.c
>>>> PATCH 6-20: Abstract out base container
>>>> PATCH 21-24: Introduce sparpr container and its specific interface
>>>
>>> PATCH 6-24 applied to vfio-next :
>>>
>>>   https://github.com/legoater/qemu/commits/vfio-next
>>>
>>> (with a global s/fucntional/functional/)
>>>
>>>
>>> I also pushed the remaining patches on :
>>>
>>>   https://github.com/legoater/qemu/commits/vfio-8.2
>>>
>>> with a slight rework of the IOMMUFD configuration, now done per platform.
>>> The VFIO frontend and the 'iommufd' object are only available on x86_64,
>>> arm, s390x.
>
>Thanks Cédric.
>
>>
>>FYI, I first tried this vfio-8.2 branch on s390x but wasn't actually able to use the
>>iommufd backend (was getting errors like Property 'vfio-pci.iommufd' not found)
>>so I think something isn't actually enabling IOMMUFD as expected with your
>>change...
>
>It looks CONFIG_IOMMUFD is recognized by Kconfig sub-system but not received
>by compiler. I'm still digging how to pass CONFIG_IOMMUFD to compiler.

The change below is needed to pass CONFIG_IOMMUFD to the compiler.

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 0a810f8b88..2a3263b51f 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -30,6 +30,7 @@
 #include "exec/address-spaces.h"
 #include "qom/object.h"
 #include "monitor/monitor.h"
+#include CONFIG_DEVICES

 #define TYPE_VFIO_AP_DEVICE      "vfio-ap"

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index a674bd8d6d..08101ad445 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -31,6 +31,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
 #include "monitor/monitor.h"
+#include CONFIG_DEVICES

 struct VFIOCCWDevice {
     S390CCWDevice cdev;
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d8f658ea47..3121b5f985 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -44,6 +44,7 @@
 #include "migration/qemu-file.h"
 #include "sysemu/iommufd.h"
 #include "monitor/monitor.h"
+#include CONFIG_DEVICES

 #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"

Thanks
Zhenzhong


* Re: [PATCH v4 00/41] vfio: Adopt iommufd
  2023-11-08  3:26   ` Matthew Rosato
  2023-11-08  8:37     ` Duan, Zhenzhong
@ 2023-11-08  9:21     ` Cédric Le Goater
  1 sibling, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-08  9:21 UTC (permalink / raw)
  To: Matthew Rosato, Zhenzhong Duan, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
	peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
	Thomas Huth, Eric Farman, Halil Pasic, Jason J. Herne,
	Tony Krowiak

On 11/8/23 04:26, Matthew Rosato wrote:
> On 11/7/23 1:28 PM, Cédric Le Goater wrote:
>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>> Hi,
>>>
>>> Thanks all for giving guides and comments on previous series, here is
>>> the v4 of pure iommufd support part.
>>>
>>> Based on Cédric's suggestion, this series includes an effort to remove
>>> spapr code from container.c, now all spapr functions are moved to spapr.c
>>> or spapr_pci_vfio.c, but there are still a few trival check on
>>> VFIO_SPAPR_TCE_*_IOMMU which I am not sure if deserved to introduce many
>>> callbacks and duplicate code just to remove them. Some functions are moved
>>> to spapr.c instead of spapr_pci_vfio.c to avoid compile issue because
>>> spapr_pci_vfio.c is arch specific, or else we need to introduce stub
>>> functions to those spapr functions moved.
>>>
>>>
>>> PATCH 1-5: Move spapr functions to spapr*.c
>>> PATCH 6-20: Abstract out base container
>>> PATCH 21-24: Introduce sparpr container and its specific interface
>>
>> PATCH 6-24 applied to vfio-next :
>>
>>    https://github.com/legoater/qemu/commits/vfio-next
>>
>> (with a global s/fucntional/functional/)
>>
>>
>> I also pushed the remaining patches on :
>>
>>    https://github.com/legoater/qemu/commits/vfio-8.2
>>
>> with a slight rework of the IOMMUFD configuration, now done per platform.
>> The VFIO frontend and the 'iommufd' object are only available on x86_64,
>> arm, s390x.
> 
> FYI, I first tried this vfio-8.2 branch on s390x but wasn't actually able to use the iommufd backend (was getting errors like Property 'vfio-pci.iommufd' not found) so I think something isn't actually enabling IOMMUFD as expected with your change...

Yes. The previous method, enabling the IOMMUFD device with a
./configure script option, exposed the CONFIG_IOMMUFD define
globally.

The current method using the Kconfig files requires an extra :

#include CONFIG_DEVICES

in each file using CONFIG_IOMMUFD.

I didn't see it because when compiled natively on x86_64 the
CONFIG_IOMMUFD define is included for some (magic) reason.
It is not the case on other arches, ppc64, aarch64, s390x.

I did the update and repushed vfio-8.2. Should work now.
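
For anyone puzzled by why this showed up as a missing property at run time
and not as a build error, here is a small standalone sketch. It is not
QEMU code: the property table and the `sketch_*` names are made up for
illustration, and the local `#define` stands in for the generated
per-target header that `#include CONFIG_DEVICES` is assumed to pull in.
Code guarded by `#ifdef CONFIG_IOMMUFD` is silently dropped when the
define is not visible in that file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for the per-target device config header. Delete this line
 * to simulate a source file that forgot `#include CONFIG_DEVICES`:
 * the file still compiles, but the "iommufd" entry below disappears.
 */
#define CONFIG_IOMMUFD 1

/* Illustrative property table; the contents are an assumption. */
static const char *const sketch_vfio_pci_props[] = {
    "host",
    "sysfsdev",
#ifdef CONFIG_IOMMUFD
    "iommufd",
#endif
};

/* Return 1 if the named property is in the table, 0 otherwise. */
static int sketch_has_property(const char *name)
{
    size_t n = sizeof(sketch_vfio_pci_props) /
               sizeof(sketch_vfio_pci_props[0]);
    size_t i;

    assert(name != NULL);
    for (i = 0; i < n; i++) {
        if (strcmp(sketch_vfio_pci_props[i], name) == 0) {
            return 1;
        }
    }
    return 0;
}
```

With the define visible, looking up "iommufd" succeeds. Without it the
lookup fails, which is the same shape as the "Property 'vfio-pci.iommufd'
not found" error Matthew reported.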

> 
> Instead I tested on s390x using vfio-next + patches 25-41 of this series on top.
> 
> Legacy backend regression testing worked fine for vfio-pci, vfio-ap and vfio-ccw.

ok. Good. This means that vfio-next is in good shape.

> Using iommufd backend for vfio-pci on s390 exposes an s390-only issue related to accounting of vfio DMA limit (code in hw/s390x/s390-pci-vfio.c assumes VFIODevice.group is never null, but that's no longer true when we use the iommufd backend with cdev).  We don't even need to track this when using the iommufd backend -- With that issue bypassed, vfio-pci testing on s390x looks good so far.  I'll send a separate fix for that.

Thanks,

C.

  
> Using the iommufd backend for vfio-ccw and vfio-ap did not work, see response to patch 28.
> 
> Thanks,
> Matt
> 




* Re: [PATCH v4 00/41] vfio: Adopt iommufd
  2023-11-08  9:07       ` Duan, Zhenzhong
@ 2023-11-08  9:23         ` Cédric Le Goater
  0 siblings, 0 replies; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-08  9:23 UTC (permalink / raw)
  To: Duan, Zhenzhong, Matthew Rosato, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Thomas Huth, Eric Farman, Halil Pasic, Jason J. Herne,
	Tony Krowiak

>>> FYI, I first tried this vfio-8.2 branch on s390x but wasn't actually able to use the
>>> iommufd backend (was getting errors like Property 'vfio-pci.iommufd' not found)
>>> so I think something isn't actually enabling IOMMUFD as expected with your
>>> change...
>>
>> It looks CONFIG_IOMMUFD is recognized by Kconfig sub-system but not received
>> by compiler. I'm still digging how to pass CONFIG_IOMMUFD to compiler.
> 
> Need below change to pass CONFIG_IOMMUFD to compiler.
> 
> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
> index 0a810f8b88..2a3263b51f 100644
> --- a/hw/vfio/ap.c
> +++ b/hw/vfio/ap.c
> @@ -30,6 +30,7 @@
>   #include "exec/address-spaces.h"
>   #include "qom/object.h"
>   #include "monitor/monitor.h"
> +#include CONFIG_DEVICES
> 
>   #define TYPE_VFIO_AP_DEVICE      "vfio-ap"
> 
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index a674bd8d6d..08101ad445 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -31,6 +31,7 @@
>   #include "qemu/main-loop.h"
>   #include "qemu/module.h"
>   #include "monitor/monitor.h"
> +#include CONFIG_DEVICES
> 
>   struct VFIOCCWDevice {
>       S390CCWDevice cdev;
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index d8f658ea47..3121b5f985 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -44,6 +44,7 @@
>   #include "migration/qemu-file.h"
>   #include "sysemu/iommufd.h"
>   #include "monitor/monitor.h"
> +#include CONFIG_DEVICES
> 
>   #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"


Yep. I force-pushed vfio-8.2 with these changes.

Thanks,

C.





* Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-08  3:35     ` Duan, Zhenzhong
@ 2023-11-08  9:40       ` Cédric Le Goater
  2023-11-08  9:43         ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-08  9:40 UTC (permalink / raw)
  To: Duan, Zhenzhong, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Paolo Bonzini, Eric Blake, Markus Armbruster,
	Daniel P. Berrangé,
	Eduardo Habkost, Thomas Huth

>>> +                              hwaddr iova, ram_addr_t size)
>>> +{
>>> +    int ret;
>>> +    struct iommu_ioas_unmap unmap = {
>>> +        .size = sizeof(unmap),
>>> +        .ioas_id = ioas_id,
>>> +        .iova = iova,
>>> +        .length = size,
>>> +    };
>>> +
>>> +    ret = ioctl(be->fd, IOMMU_IOAS_UNMAP, &unmap);
>>> +    trace_iommufd_backend_unmap_dma(be->fd, ioas_id, iova, size, ret);
>>> +    /*
>>> +     * TODO: IOMMUFD doesn't support mapping PCI BARs for now.
>>> +     * It's not a problem if there is no p2p dma, relax it here
>>> +     * and avoid many noisy trigger from vIOMMU side.
>>
>> Should we add a warn_report() ?
> 
> The purpose of checking "ret && errno == ENOENT" is to avoid many
> error_report() for PCI BARs, If we add warn_report(), there will still be
> many print for PCI BARs.

a trace event then ?

Thanks,

C.




* RE: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-08  9:40       ` Cédric Le Goater
@ 2023-11-08  9:43         ` Duan, Zhenzhong
  0 siblings, 0 replies; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08  9:43 UTC (permalink / raw)
  To: Cédric Le Goater, qemu-devel
  Cc: alex.williamson, jgg, nicolinc, Martins, Joao, eric.auger,
	peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng,
	Chao P, Paolo Bonzini, Eric Blake, Markus Armbruster,
	Daniel P. Berrangé,
	Eduardo Habkost, Thomas Huth



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Wednesday, November 8, 2023 5:41 PM
>Subject: Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
>
>>>> +                              hwaddr iova, ram_addr_t size)
>>>> +{
>>>> +    int ret;
>>>> +    struct iommu_ioas_unmap unmap = {
>>>> +        .size = sizeof(unmap),
>>>> +        .ioas_id = ioas_id,
>>>> +        .iova = iova,
>>>> +        .length = size,
>>>> +    };
>>>> +
>>>> +    ret = ioctl(be->fd, IOMMU_IOAS_UNMAP, &unmap);
>>>> +    trace_iommufd_backend_unmap_dma(be->fd, ioas_id, iova, size, ret);
>>>> +    /*
>>>> +     * TODO: IOMMUFD doesn't support mapping PCI BARs for now.
>>>> +     * It's not a problem if there is no p2p dma, relax it here
>>>> +     * and avoid many noisy trigger from vIOMMU side.
>>>
>>> Should we add a warn_report() ?
>>
>> The purpose of checking "ret && errno == ENOENT" is to avoid many
>> error_report() for PCI BARs, If we add warn_report(), there will still be
>> many print for PCI BARs.
>
>a trace event then ?

Good idea, will do.
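
Roughly, the relaxed check could look like the sketch below. This is only
an illustration under assumptions: `sketch_check_unmap()` and the counter
are hypothetical names, and the counter stands in for the trace event,
which in the real code would be a trace point declared in
backends/trace-events.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical counter standing in for a trace event being fired. */
static unsigned sketch_relaxed_unmaps;

/*
 * Decide what to report for an IOMMU_IOAS_UNMAP result.
 *   ret         - return value of ioctl()
 *   saved_errno - errno captured right after the ioctl
 * ENOENT is treated as success (the range was never mapped, e.g. a
 * PCI BAR) and is only traced; any other failure is returned as a
 * negative errno value.
 */
static int sketch_check_unmap(int ret, int saved_errno)
{
    if (ret == 0) {
        return 0;
    }
    assert(saved_errno > 0);

    if (saved_errno == ENOENT) {
        sketch_relaxed_unmaps++;    /* trace only, no warn/error report */
        return 0;
    }
    return -saved_errno;
}
```

This keeps the console quiet for the PCI BAR case while still leaving a
record that can be enabled when debugging p2p DMA.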

Thanks
Zhenzhong


* Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-08  5:50     ` Markus Armbruster
@ 2023-11-08 10:03       ` Cédric Le Goater
  2023-11-08 10:30         ` Markus Armbruster
  0 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-08 10:03 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Zhenzhong Duan, qemu-devel, alex.williamson, jgg, nicolinc,
	joao.m.martins, eric.auger, peterx, jasowang, kevin.tian,
	yi.l.liu, yi.y.sun, chao.p.peng, Paolo Bonzini, Eric Blake,
	Daniel P. Berrangé,
	Eduardo Habkost, Thomas Huth

Hello Markus,

On 11/8/23 06:50, Markus Armbruster wrote:
> Cédric Le Goater <clg@redhat.com> writes:
> 
>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>> From: Eric Auger <eric.auger@redhat.com>
>>> Introduce an iommufd object which allows the interaction
>>> with the host /dev/iommu device.
>>> The /dev/iommu can have been already pre-opened outside of qemu,
>>> in which case the fd can be passed directly along with the
>>> iommufd object:
>>> This allows the iommufd object to be shared accross several
>>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>>> the /dev/iommu once.
>>> If no fd is passed along with the iommufd object, the /dev/iommu
>>> is opened by the qemu code.
>>> The CONFIG_IOMMUFD option must be set to compile this new object.
>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>> v4: add CONFIG_IOMMUFD check, document default case
>>>    MAINTAINERS              |   7 ++
>>>    qapi/qom.json            |  22 ++++
>>>    include/sysemu/iommufd.h |  46 +++++++
>>>    backends/iommufd-stub.c  |  59 +++++++++
>>>    backends/iommufd.c       | 257 +++++++++++++++++++++++++++++++++++++++
>>>    backends/Kconfig         |   4 +
>>>    backends/meson.build     |   5 +
>>>    backends/trace-events    |  12 ++
>>>    qemu-options.hx          |  13 ++
>>>    9 files changed, 425 insertions(+)
>>>    create mode 100644 include/sysemu/iommufd.h
>>>    create mode 100644 backends/iommufd-stub.c
>>>    create mode 100644 backends/iommufd.c
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index cd8d6b140f..6f35159255 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -2135,6 +2135,13 @@ F: hw/vfio/ap.c
>>>    F: docs/system/s390x/vfio-ap.rst
>>>    L: qemu-s390x@nongnu.org
>>>    +iommufd
>>> +M: Yi Liu <yi.l.liu@intel.com>
>>> +M: Eric Auger <eric.auger@redhat.com>
>>> +S: Supported
>>> +F: backends/iommufd.c
>>> +F: include/sysemu/iommufd.h
>>> +
>>>    vhost
>>>    M: Michael S. Tsirkin <mst@redhat.com>
>>>    S: Supported
>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>> index c53ef978ff..27300add48 100644
>>> --- a/qapi/qom.json
>>> +++ b/qapi/qom.json
>>> @@ -794,6 +794,24 @@
>>>    { 'struct': 'VfioUserServerProperties',
>>>      'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>> +##
>>> +# @IOMMUFDProperties:
>>> +#
>>> +# Properties for iommufd objects.
>>> +#
>>> +# @fd: file descriptor name previously passed via 'getfd' command,
>>> +#     which represents a pre-opened /dev/iommu.  This allows the
>>> +#     iommufd object to be shared accross several subsystems
>>> +#     (VFIO, VDPA, ...), and the file descriptor to be shared
>>> +#     with other process, e.g. DPDK.  (default: QEMU opens
>>> +#     /dev/iommu by itself)
>>> +#
>>> +# Since: 8.2
>>> +##
>>> +{ 'struct': 'IOMMUFDProperties',
>>> +  'data': { '*fd': 'str' },
>>> +  'if': 'CONFIG_IOMMUFD' }
>>
>>
>> Activating or not IOMMUFD on a platform is a configuration choice
>> and it is not a dependency on an external resource. I would make
>> things simpler and drop all the #ifdef in the documentation files.
> 
> What exactly are you proposing?

I would like to simplify the configuration part of this new IOMMUFD
feature and avoid a ./configure option to enable/disable the feature
since it has no external dependencies and can be compiled on all
platforms.

However, we know that it only makes sense to have the IOMMUFD backend
on platforms s390x, aarch64, x86_64. So I am proposing as an improvement
to enable IOMMUFD only on these platforms with this addition :

   imply IOMMUFD

to hw/{i386,s390x,arm}/Kconfig files.

This gives us the possibility to compile out the feature downstream
if something goes wrong, using the files under : configs/devices/.


Given that the IOMMUFD feature doesn't have any external dependencies
and that the IOMMUFD backend object is common to all platforms, I am
also proposing to remove all the CONFIG_IOMMUFD define usage in the
documentation file "qemu-options.hx" and the schema file "qapi/qom.json".

> 
> The use of 'if': 'CONFIG_IOMMUFD' in the QAPI schema enables
> introspection with query-qmp-schema: when ObjectType @iommufd exists,
> QEMU supports creating the object.  Or am I confused?
Object iommufd should always exist since it is common to all.

Is that acceptable ?

Thanks,

C.

> 
>> There might be a way to remove the documentation also. Not a big
>> issue for now.
>>
>>
>>> +
>>>    ##
>>>    # @RngProperties:
>>>    #
>>> @@ -934,6 +952,8 @@
>>>       'input-barrier',
>>>       { 'name': 'input-linux',
>>>         'if': 'CONFIG_LINUX' },
>>> +    { 'name': 'iommufd',
>>> +      'if': 'CONFIG_IOMMUFD' },
>>>       'iothread',
>>>       'main-loop',
>>>       { 'name': 'memory-backend-epc',
>>> @@ -1003,6 +1023,8 @@
>>>         'input-barrier':              'InputBarrierProperties',
>>>         'input-linux':                { 'type': 'InputLinuxProperties',
>>>                                          'if': 'CONFIG_LINUX' },
>>> +      'iommufd':                    { 'type': 'IOMMUFDProperties',
>>> +                                      'if': 'CONFIG_IOMMUFD' },
>>>         'iothread':                   'IothreadProperties',
>>>         'main-loop':                  'MainLoopProperties',
>>>         'memory-backend-epc':         { 'type': 'MemoryBackendEpcProperties',
> 
> [...]
> 




* Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-08 10:03       ` Cédric Le Goater
@ 2023-11-08 10:30         ` Markus Armbruster
  2023-11-08 13:48           ` Cédric Le Goater
  0 siblings, 1 reply; 114+ messages in thread
From: Markus Armbruster @ 2023-11-08 10:30 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Zhenzhong Duan, qemu-devel, alex.williamson, jgg, nicolinc,
	joao.m.martins, eric.auger, peterx, jasowang, kevin.tian,
	yi.l.liu, yi.y.sun, chao.p.peng, Paolo Bonzini, Eric Blake,
	Daniel P. Berrangé,
	Eduardo Habkost, Thomas Huth

Cédric Le Goater <clg@redhat.com> writes:

> Hello Markus,
>
> On 11/8/23 06:50, Markus Armbruster wrote:
>> Cédric Le Goater <clg@redhat.com> writes:
>> 
>>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>>> From: Eric Auger <eric.auger@redhat.com>
>>>> Introduce an iommufd object which allows the interaction
>>>> with the host /dev/iommu device.
>>>> The /dev/iommu can have been already pre-opened outside of qemu,
>>>> in which case the fd can be passed directly along with the
>>>> iommufd object:
>>>> This allows the iommufd object to be shared accross several
>>>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>>>> the /dev/iommu once.
>>>> If no fd is passed along with the iommufd object, the /dev/iommu
>>>> is opened by the qemu code.
>>>> The CONFIG_IOMMUFD option must be set to compile this new object.
>>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>> v4: add CONFIG_IOMMUFD check, document default case
>>>>    MAINTAINERS              |   7 ++
>>>>    qapi/qom.json            |  22 ++++
>>>>    include/sysemu/iommufd.h |  46 +++++++
>>>>    backends/iommufd-stub.c  |  59 +++++++++
>>>>    backends/iommufd.c       | 257 +++++++++++++++++++++++++++++++++++++++
>>>>    backends/Kconfig         |   4 +
>>>>    backends/meson.build     |   5 +
>>>>    backends/trace-events    |  12 ++
>>>>    qemu-options.hx          |  13 ++
>>>>    9 files changed, 425 insertions(+)
>>>>    create mode 100644 include/sysemu/iommufd.h
>>>>    create mode 100644 backends/iommufd-stub.c
>>>>    create mode 100644 backends/iommufd.c
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index cd8d6b140f..6f35159255 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -2135,6 +2135,13 @@ F: hw/vfio/ap.c
>>>>    F: docs/system/s390x/vfio-ap.rst
>>>>    L: qemu-s390x@nongnu.org
>>>>    +iommufd
>>>> +M: Yi Liu <yi.l.liu@intel.com>
>>>> +M: Eric Auger <eric.auger@redhat.com>
>>>> +S: Supported
>>>> +F: backends/iommufd.c
>>>> +F: include/sysemu/iommufd.h
>>>> +
>>>>    vhost
>>>>    M: Michael S. Tsirkin <mst@redhat.com>
>>>>    S: Supported
>>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>>> index c53ef978ff..27300add48 100644
>>>> --- a/qapi/qom.json
>>>> +++ b/qapi/qom.json
>>>> @@ -794,6 +794,24 @@
>>>>    { 'struct': 'VfioUserServerProperties',
>>>>      'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>>> +##
>>>> +# @IOMMUFDProperties:
>>>> +#
>>>> +# Properties for iommufd objects.
>>>> +#
>>>> +# @fd: file descriptor name previously passed via 'getfd' command,
>>>> +#     which represents a pre-opened /dev/iommu.  This allows the
>>>> +#     iommufd object to be shared accross several subsystems
>>>> +#     (VFIO, VDPA, ...), and the file descriptor to be shared
>>>> +#     with other process, e.g. DPDK.  (default: QEMU opens
>>>> +#     /dev/iommu by itself)
>>>> +#
>>>> +# Since: 8.2
>>>> +##
>>>> +{ 'struct': 'IOMMUFDProperties',
>>>> +  'data': { '*fd': 'str' },
>>>> +  'if': 'CONFIG_IOMMUFD' }
>>>
>>>
>>> Activating or not IOMMUFD on a platform is a configuration choice
>>> and it is not a dependency on an external resource. I would make
>>> things simpler and drop all the #ifdef in the documentation files.
>>
>> What exactly are you proposing?
>
> I would like to simplify the configuration part of this new IOMMUFD
> feature and avoid a ./configure option to enable/disable the feature
> since it has no external dependencies and can be compiled on all
> platforms.
>
> However, we know that it only makes sense to have the IOMMUFD backend
> on platforms s390x, aarch64, x86_64. So I am proposing as an improvement
> to enable IOMMUFD only on these platforms with this addition :
>
>   imply IOMMUFD
>
> to hw/{i386,s390x,arm}/Kconfig files.
>
> This gives us the possibility to compile out the feature downstream
> if something goes wrong, using the files under : configs/devices/.

Shouldn't we then compile out the relevant parts of the QAPI schema,
too?

> Given that the IOMMUFD feature doesn't have any external dependencies
> and that the IOMMUFD backend object is common to all platforms, I am
> also proposing to remove all the CONFIG_IOMMUFD define usage in the
> documentation file "qemu-options.hx" and the schema file "qapi/qom.json".

Any CONFIG_IOMMUFD left elsewhere?

>> The use of 'if': 'CONFIG_IOMMUFD' in the QAPI schema enables
>> introspection with query-qmp-schema: when ObjectType @iommufd exists,
>> QEMU supports creating the object.  Or am I confused?
>
> Object iommufd should always exist since it is common to all.
>
> Is that acceptable ?

Perhaps the question to ask is whether a management application needs to
know whether this version of QEMU supports iommufd objects.  If yes,
then query-qmp-schema is an obvious way to find out.  What could go
wrong when this returns "supported" when it actually isn't?




* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-08  7:16     ` Duan, Zhenzhong
@ 2023-11-08 12:48       ` Jason Gunthorpe
  2023-11-08 13:25         ` Duan, Zhenzhong
  2023-11-09 12:17         ` Joao Martins
  0 siblings, 2 replies; 114+ messages in thread
From: Jason Gunthorpe @ 2023-11-08 12:48 UTC (permalink / raw)
  To: Duan, Zhenzhong
  Cc: Matthew Rosato, qemu-devel, alex.williamson, clg, nicolinc,
	Martins, Joao, eric.auger, peterx, jasowang, Tian, Kevin, Liu,
	Yi L, Sun, Yi Y, Peng, Chao P, Thomas Huth, Eric Farman,
	Halil Pasic, Jason J. Herne, Tony Krowiak

On Wed, Nov 08, 2023 at 07:16:52AM +0000, Duan, Zhenzhong wrote:

> >> +    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> >> +                                     container->ioas_id, &hwpt_id);
> >> +
> >> +    if (ret) {
> >> +        error_setg_errno(errp, errno, "error alloc shadow hwpt");
> >> +        return ret;
> >> +    }
> >
> >The above alloc_hwpt fails for mdevs (at least, it fails for me attempting to use
> >iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing in the
> >kernel because it can't find an IOMMUFD_OBJ_DEVICE.
> >
> >AFAIU that's because the mdevs are meant to instead use kernel access via
> >vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
> >looking at the kernel vfio compat container.
> >
> >As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
> >backend by just skipping this alloc_hwpt above and instead passing
> >container->ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
> >vfio_iommufd_emulated_attach_ioas call in the kernel.
> 
> Thanks for help test and investigation.
> I was only focusing on real device and missed the mdev particularity, sorry.
> You are right, there is no hwpt support for mdev, not even an emulated hwpt.
> I'll digging into this and see how to distinguish mdev with real device in
> this low level function.

I was expecting that hwpt manipulation would be done exclusively
inside the device-specific vIOMMU userspace driver. Generic code paths
that don't have that knowledge should use the IOAS for everything

Jason



* RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-08 12:48       ` Jason Gunthorpe
@ 2023-11-08 13:25         ` Duan, Zhenzhong
  2023-11-08 14:19           ` Jason Gunthorpe
  2023-11-09 12:17         ` Joao Martins
  1 sibling, 1 reply; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-08 13:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matthew Rosato, qemu-devel, alex.williamson, clg, nicolinc,
	Martins, Joao, eric.auger, peterx, jasowang, Tian, Kevin, Liu,
	Yi L, Sun, Yi Y, Peng, Chao P, Thomas Huth, Eric Farman,
	Halil Pasic, Jason J. Herne, Tony Krowiak



>-----Original Message-----
>From: Jason Gunthorpe <jgg@nvidia.com>
>Sent: Wednesday, November 8, 2023 8:48 PM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On Wed, Nov 08, 2023 at 07:16:52AM +0000, Duan, Zhenzhong wrote:
>
>> >> +    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>> >> +                                     container->ioas_id, &hwpt_id);
>> >> +
>> >> +    if (ret) {
>> >> +        error_setg_errno(errp, errno, "error alloc shadow hwpt");
>> >> +        return ret;
>> >> +    }
>> >
>> >The above alloc_hwpt fails for mdevs (at least, it fails for me attempting to use
>> >iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing in the
>> >kernel because it can't find an IOMMUFD_OBJ_DEVICE.
>> >
>> >AFAIU that's because the mdevs are meant to instead use kernel access via
>> >vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
>> >looking at the kernel vfio compat container.
>> >
>> >As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
>> >backend by just skipping this alloc_hwpt above and instead passing
>> >container->ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
>> >vfio_iommufd_emulated_attach_ioas call in the kernel.
>>
>> Thanks for help test and investigation.
>> I was only focusing on real device and missed the mdev particularity, sorry.
>> You are right, there is no hwpt support for mdev, not even an emulated hwpt.
>> I'll digging into this and see how to distinguish mdev with real device in
>> this low level function.
>
>I was expecting that hwpt manipulation would be done exclusively
>inside the device-specific vIOMMU userspace driver. Generic code paths
>that don't have that knowledge should use the IOAS for everything

Yes, this way we don't need to distinguish between mdev and real device,
just attach to IOAS. But we lose the benefit that the same hwpt could be
passed into vIOMMU to be used as S2 hwpt in nesting.

If you don't have a strong opinion on using IOAS for everything, I'm thinking
about adding a bool variable is_mdev in VFIODevice and checking this bool
to decide whether to attach to a manually allocated hwpt or to the IOAS.
For vfio-ap and vfio-ccw, is_mdev is set to true; for vfio-pci, we check
"/sys/bus/mdev" from vbasedev->sysfsdev to decide if it's true.

Another choice is to add VFIO_DEVICE_FLAGS_MDEV in vfio_device_info.flags
on the kernel side; QEMU can then tell whether the device is an mdev by
checking the flag from the kernel, and this works even in the fd passing case.
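
To make the first option concrete, here is a minimal sketch, assuming the
caller has already resolved the device's sysfs "subsystem" symlink (for
example with realpath()). Both helper names are hypothetical and are not part
of the series; the sketch only shows the is_mdev decision and the resulting
choice of attach target.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not from the series: decide whether a device is an
 * mdev from the already-resolved target of its sysfs "subsystem" symlink
 * (e.g. the result of realpath() on "<sysfsdev>/subsystem").
 */
bool subsystem_is_mdev(const char *resolved_subsystem)
{
    assert(resolved_subsystem != NULL);
    return strcmp(resolved_subsystem, "/sys/bus/mdev") == 0;
}

/*
 * Hypothetical helper: choose the object ID handed to the attach call.
 * An mdev has no hwpt, so it attaches to the IOAS; a physical device
 * attaches to the manually allocated hwpt.
 */
uint32_t pick_attach_id(bool is_mdev, uint32_t ioas_id, uint32_t hwpt_id)
{
    return is_mdev ? ioas_id : hwpt_id;
}
```

With this shape, vfio-ap and vfio-ccw would simply pass is_mdev = true, and
only vfio-pci would need the sysfs lookup.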

Thanks
Zhenzhong



* Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-08 10:30         ` Markus Armbruster
@ 2023-11-08 13:48           ` Cédric Le Goater
  2023-11-09  9:05             ` Markus Armbruster
  0 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-08 13:48 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Zhenzhong Duan, qemu-devel, alex.williamson, jgg, nicolinc,
	joao.m.martins, eric.auger, peterx, jasowang, kevin.tian,
	yi.l.liu, yi.y.sun, chao.p.peng, Paolo Bonzini, Eric Blake,
	Daniel P. Berrangé,
	Eduardo Habkost, Thomas Huth

On 11/8/23 11:30, Markus Armbruster wrote:
> Cédric Le Goater <clg@redhat.com> writes:
> 
>> Hello Markus,
>>
>> On 11/8/23 06:50, Markus Armbruster wrote:
>>> Cédric Le Goater <clg@redhat.com> writes:
>>>
>>>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>>>> From: Eric Auger <eric.auger@redhat.com>
>>>>> Introduce an iommufd object which allows the interaction
>>>>> with the host /dev/iommu device.
>>>>> The /dev/iommu can have been already pre-opened outside of qemu,
>>>>> in which case the fd can be passed directly along with the
>>>>> iommufd object:
>>>>> This allows the iommufd object to be shared across several
>>>>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>>>>> the /dev/iommu once.
>>>>> If no fd is passed along with the iommufd object, the /dev/iommu
>>>>> is opened by the qemu code.
>>>>> The CONFIG_IOMMUFD option must be set to compile this new object.
>>>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>> ---
>>>>> v4: add CONFIG_IOMMUFD check, document default case
>>>>>     MAINTAINERS              |   7 ++
>>>>>     qapi/qom.json            |  22 ++++
>>>>>     include/sysemu/iommufd.h |  46 +++++++
>>>>>     backends/iommufd-stub.c  |  59 +++++++++
>>>>>     backends/iommufd.c       | 257 +++++++++++++++++++++++++++++++++++++++
>>>>>     backends/Kconfig         |   4 +
>>>>>     backends/meson.build     |   5 +
>>>>>     backends/trace-events    |  12 ++
>>>>>     qemu-options.hx          |  13 ++
>>>>>     9 files changed, 425 insertions(+)
>>>>>     create mode 100644 include/sysemu/iommufd.h
>>>>>     create mode 100644 backends/iommufd-stub.c
>>>>>     create mode 100644 backends/iommufd.c
>>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>>> index cd8d6b140f..6f35159255 100644
>>>>> --- a/MAINTAINERS
>>>>> +++ b/MAINTAINERS
>>>>> @@ -2135,6 +2135,13 @@ F: hw/vfio/ap.c
>>>>>     F: docs/system/s390x/vfio-ap.rst
>>>>>     L: qemu-s390x@nongnu.org
>>>>>     +iommufd
>>>>> +M: Yi Liu <yi.l.liu@intel.com>
>>>>> +M: Eric Auger <eric.auger@redhat.com>
>>>>> +S: Supported
>>>>> +F: backends/iommufd.c
>>>>> +F: include/sysemu/iommufd.h
>>>>> +
>>>>>     vhost
>>>>>     M: Michael S. Tsirkin <mst@redhat.com>
>>>>>     S: Supported
>>>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>>>> index c53ef978ff..27300add48 100644
>>>>> --- a/qapi/qom.json
>>>>> +++ b/qapi/qom.json
>>>>> @@ -794,6 +794,24 @@
>>>>>     { 'struct': 'VfioUserServerProperties',
>>>>>       'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>>>> +##
>>>>> +# @IOMMUFDProperties:
>>>>> +#
>>>>> +# Properties for iommufd objects.
>>>>> +#
>>>>> +# @fd: file descriptor name previously passed via 'getfd' command,
>>>>> +#     which represents a pre-opened /dev/iommu.  This allows the
>>>>> +#     iommufd object to be shared across several subsystems
>>>>> +#     (VFIO, VDPA, ...), and the file descriptor to be shared
>>>>> +#     with other process, e.g. DPDK.  (default: QEMU opens
>>>>> +#     /dev/iommu by itself)
>>>>> +#
>>>>> +# Since: 8.2
>>>>> +##
>>>>> +{ 'struct': 'IOMMUFDProperties',
>>>>> +  'data': { '*fd': 'str' },
>>>>> +  'if': 'CONFIG_IOMMUFD' }
>>>>
>>>>
>>>> Activating or not IOMMUFD on a platform is a configuration choice
>>>> and it is not a dependency on an external resource. I would make
>>>> things simpler and drop all the #ifdef in the documentation files.
>>>
>>> What exactly are you proposing?
>>
>> I would like to simplify the configuration part of this new IOMMUFD
>> feature and avoid a ./configure option to enable/disable the feature
>> since it has no external dependencies and can be compiled on all
>> platforms.
>>
>> However, we know that it only makes sense to have the IOMMUFD backend
>> on platforms s390x, aarch64, x86_64. So I am proposing as an improvement
>> to enable IOMMUFD only on these platforms with this addition :
>>
>>    imply IOMMUFD
>>
>> to hw/{i386,s390x,arm}/Kconfig files.
>>
>> This gives us the possibility to compile out the feature downstream
>> if something goes wrong, using the files under : configs/devices/.
> 
> Shouldn't we then compile out the relevant parts of the QAPI schema,
> too?

Is it possible with Kconfig options ?
  
>> Given that the IOMMUFD feature doesn't have any external dependencies
>> and that the IOMMUFD backend object is common to all platforms, I am
>> also proposing to remove all the CONFIG_IOMMUFD define usage in the
>> documentation file "qemu-options.hx" and the schema file "qapi/qom.json".
> 
> Any CONFIG_IOMMUFD left elsewhere?

There are. To expose or not vfio properties. Stuff like :

#ifdef CONFIG_IOMMUFD
     DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
                      TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
#endif
     DEFINE_PROP_END_OF_LIST(),

and

#ifdef CONFIG_IOMMUFD
     object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
#endif


>>> The use of 'if': 'CONFIG_IOMMUFD' in the QAPI schema enables
>>> introspection with query-qmp-schema: when ObjectType @iommufd exists,
>>> QEMU supports creating the object.  Or am I confused?
>>
>> Object iommufd should always exist since it is common to all.
>>
>> Is that acceptable ?
> 
> Perhaps the question to ask is whether a management application needs to
> know whether this version of QEMU supports iommufd objects.  If yes,
> then query-qmp-schema is an obvious way to find out.  

Thanks for reminding me of this possibility. In that case, we should
indeed avoid returning iommufd support when !CONFIG_IOMMUFD.

Can it be implemented in qapi/qom.json with Kconfig options ?

> What could go
> wrong when this returns "supported" when it actually isn't?
  
The management layer would build an invalid QEMU command line and
QEMU would return "invalid object type: iommufd"

Thanks,

C.







* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-08 13:25         ` Duan, Zhenzhong
@ 2023-11-08 14:19           ` Jason Gunthorpe
  2023-11-09  2:45             ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Jason Gunthorpe @ 2023-11-08 14:19 UTC (permalink / raw)
  To: Duan, Zhenzhong
  Cc: Matthew Rosato, qemu-devel, alex.williamson, clg, nicolinc,
	Martins, Joao, eric.auger, peterx, jasowang, Tian, Kevin, Liu,
	Yi L, Sun, Yi Y, Peng, Chao P, Thomas Huth, Eric Farman,
	Halil Pasic, Jason J. Herne, Tony Krowiak

On Wed, Nov 08, 2023 at 01:25:34PM +0000, Duan, Zhenzhong wrote:

> >I was expecting that hwpt manipulation would be done exclusively
> >inside the device-specific vIOMMU userspace driver. Generic code paths
> >that don't have that knowledge should use the IOAS for everything
> 
> Yes, this way we don't need to distinguish between mdev and real device,
> just attach to IOAS. But lose the benefit that same hwpt could be passed
> into vIOMMU to be used as S2 hwpt in nesting.

If you have a nesting capable vIOMMU driver then it should be
creating the HWPTs and managing them in its layer. Maybe the core code
provides some helpers.

Obviously you can't link a mdev to a nesting vIOMMU driver in the
first place. Mdev should be connected to a different IOMMU driver that
doesn't use HWPT at all.

I think it will make a lot of trouble to put the hwpt in the wrong
layer as there shouldn't really be much generic code touching it.

Jason



* RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-08 14:19           ` Jason Gunthorpe
@ 2023-11-09  2:45             ` Duan, Zhenzhong
  0 siblings, 0 replies; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-09  2:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Matthew Rosato, qemu-devel, alex.williamson, clg, nicolinc,
	Martins, Joao, eric.auger, peterx, jasowang, Tian, Kevin, Liu,
	Yi L, Sun, Yi Y, Peng, Chao P, Thomas Huth, Eric Farman,
	Halil Pasic, Jason J. Herne, Tony Krowiak



>-----Original Message-----
>From: Jason Gunthorpe <jgg@nvidia.com>
>Sent: Wednesday, November 8, 2023 10:19 PM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On Wed, Nov 08, 2023 at 01:25:34PM +0000, Duan, Zhenzhong wrote:
>
>> >I was expecting that hwpt manipulation would be done exclusively
>> >inside the device-specific vIOMMU userspace driver. Generic code paths
>> >that don't have that knowledge should use the IOAS for everything
>>
>> Yes, this way we don't need to distinguish between mdev and real device,
>> just attach to IOAS. But lose the benefit that same hwpt could be passed
>> into vIOMMU to be used as S2 hwpt in nesting.
>
>If you have a nesting capable vIOMMU driver then it should be
>creating the HWPTs and managing them in its layer. Maybe the core code
>provides some helpers.

OK, thanks for the suggestion.

>
>Obviously you can't link a mdev to a nesting vIOMMU driver in the
>first place. Mdev should be connected to a different IOMMU driver that
>doesn't use HWPT at all.
>
>I think it will make a lot of trouble to put the hwpt in the wrong
>layer as there shouldn't really be much generic code touching it.

I'll send v5 with your suggested changes.

Thanks
Zhenzhong



* Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-08 13:48           ` Cédric Le Goater
@ 2023-11-09  9:05             ` Markus Armbruster
  2023-11-10  2:03               ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Markus Armbruster @ 2023-11-09  9:05 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Markus Armbruster, Zhenzhong Duan, qemu-devel, alex.williamson,
	jgg, nicolinc, joao.m.martins, eric.auger, peterx, jasowang,
	kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng, Paolo Bonzini,
	Eric Blake, Daniel P. Berrangé,
	Eduardo Habkost, Thomas Huth

Cédric Le Goater <clg@redhat.com> writes:

> On 11/8/23 11:30, Markus Armbruster wrote:
>> Cédric Le Goater <clg@redhat.com> writes:
>> 
>>> Hello Markus,
>>>
>>> On 11/8/23 06:50, Markus Armbruster wrote:
>>>> Cédric Le Goater <clg@redhat.com> writes:
>>>>
>>>>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>>>>> From: Eric Auger <eric.auger@redhat.com>
>>>>>> Introduce an iommufd object which allows the interaction
>>>>>> with the host /dev/iommu device.
>>>>>> The /dev/iommu can have been already pre-opened outside of qemu,
>>>>>> in which case the fd can be passed directly along with the
>>>>>> iommufd object:
>>>>>> This allows the iommufd object to be shared across several
>>>>>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>>>>>> the /dev/iommu once.
>>>>>> If no fd is passed along with the iommufd object, the /dev/iommu
>>>>>> is opened by the qemu code.
>>>>>> The CONFIG_IOMMUFD option must be set to compile this new object.
>>>>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>> ---
>>>>>> v4: add CONFIG_IOMMUFD check, document default case
>>>>>>     MAINTAINERS              |   7 ++
>>>>>>     qapi/qom.json            |  22 ++++
>>>>>>     include/sysemu/iommufd.h |  46 +++++++
>>>>>>     backends/iommufd-stub.c  |  59 +++++++++
>>>>>>     backends/iommufd.c       | 257 +++++++++++++++++++++++++++++++++++++++
>>>>>>     backends/Kconfig         |   4 +
>>>>>>     backends/meson.build     |   5 +
>>>>>>     backends/trace-events    |  12 ++
>>>>>>     qemu-options.hx          |  13 ++
>>>>>>     9 files changed, 425 insertions(+)
>>>>>>     create mode 100644 include/sysemu/iommufd.h
>>>>>>     create mode 100644 backends/iommufd-stub.c
>>>>>>     create mode 100644 backends/iommufd.c
>>>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>>>> index cd8d6b140f..6f35159255 100644
>>>>>> --- a/MAINTAINERS
>>>>>> +++ b/MAINTAINERS
>>>>>> @@ -2135,6 +2135,13 @@ F: hw/vfio/ap.c
>>>>>>     F: docs/system/s390x/vfio-ap.rst
>>>>>>     L: qemu-s390x@nongnu.org
>>>>>>     +iommufd
>>>>>> +M: Yi Liu <yi.l.liu@intel.com>
>>>>>> +M: Eric Auger <eric.auger@redhat.com>
>>>>>> +S: Supported
>>>>>> +F: backends/iommufd.c
>>>>>> +F: include/sysemu/iommufd.h
>>>>>> +
>>>>>>     vhost
>>>>>>     M: Michael S. Tsirkin <mst@redhat.com>
>>>>>>     S: Supported
>>>>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>>>>> index c53ef978ff..27300add48 100644
>>>>>> --- a/qapi/qom.json
>>>>>> +++ b/qapi/qom.json
>>>>>> @@ -794,6 +794,24 @@
>>>>>>     { 'struct': 'VfioUserServerProperties',
>>>>>>       'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>>>>> +##
>>>>>> +# @IOMMUFDProperties:
>>>>>> +#
>>>>>> +# Properties for iommufd objects.
>>>>>> +#
>>>>>> +# @fd: file descriptor name previously passed via 'getfd' command,
>>>>>> +#     which represents a pre-opened /dev/iommu.  This allows the
>>>>>> +#     iommufd object to be shared across several subsystems
>>>>>> +#     (VFIO, VDPA, ...), and the file descriptor to be shared
>>>>>> +#     with other process, e.g. DPDK.  (default: QEMU opens
>>>>>> +#     /dev/iommu by itself)
>>>>>> +#
>>>>>> +# Since: 8.2
>>>>>> +##
>>>>>> +{ 'struct': 'IOMMUFDProperties',
>>>>>> +  'data': { '*fd': 'str' },
>>>>>> +  'if': 'CONFIG_IOMMUFD' }
>>>>>
>>>>>
>>>>> Activating or not IOMMUFD on a platform is a configuration choice
>>>>> and it is not a dependency on an external resource. I would make
>>>>> things simpler and drop all the #ifdef in the documentation files.
>>>>
>>>> What exactly are you proposing?
>>>
>>> I would like to simplify the configuration part of this new IOMMUFD
>>> feature and avoid a ./configure option to enable/disable the feature
>>> since it has no external dependencies and can be compiled on all
>>> platforms.
>>>
>>> However, we know that it only makes sense to have the IOMMUFD backend
>>> on platforms s390x, aarch64, x86_64. So I am proposing as an improvement
>>> to enable IOMMUFD only on these platforms with this addition :
>>>
>>>    imply IOMMUFD
>>>
>>> to hw/{i386,s390x,arm}/Kconfig files.
>>>
>>> This gives us the possibility to compile out the feature downstream
>>> if something goes wrong, using the files under : configs/devices/.
>> 
>> Shouldn't we then compile out the relevant parts of the QAPI schema,
>> too?
>
> Is it possible with Kconfig options ?

See below.

>>> Given that the IOMMUFD feature doesn't have any external dependencies
>>> and that the IOMMUFD backend object is common to all platforms, I am
>>> also proposing to remove all the CONFIG_IOMMUFD define usage in the
>>> documentation file "qemu-options.hx" and the schema file "qapi/qom.json".
>> 
>> Any CONFIG_IOMMUFD left elsewhere?
>
> There are. To expose or not vfio properties. Stuff like :
>
> #ifdef CONFIG_IOMMUFD
>      DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
>                       TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
> #endif
>      DEFINE_PROP_END_OF_LIST(),
>
> and
>
> #ifdef CONFIG_IOMMUFD
>      object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
> #endif
>
>
>>>> The use of 'if': 'CONFIG_IOMMUFD' in the QAPI schema enables
>>>> introspection with query-qmp-schema: when ObjectType @iommufd exists,
>>>> QEMU supports creating the object.  Or am I confused?
>>>
>>> Object iommufd should always exist since it is common to all.
>>>
>>> Is that acceptable ?
>> 
>> Perhaps the question to ask is whether a management application needs to
>> know whether this version of QEMU supports iommufd objects.  If yes,
>> then query-qmp-schema is an obvious way to find out.  
>
> Thanks for reminding me of this possibility. In that case, we should
> indeed avoid returning iommufd support when !CONFIG_IOMMUFD.
>
> Can it be implemented in qapi/qom.json with Kconfig options ?

The only tool we have for configuring the schema is the 'if'
conditional.  'if': 'CONFIG_IOMMUFD' compiles to #if
defined(CONFIG_IOMMUFD) ... #endif.  Your use of #ifdef CONFIG_IOMMUFD
above suggests this is fine here.

Symbols that are only defined in target-dependent compiles (see
exec/poison.h) can only be used in target-dependent schema modules,
i.e. the *-target.json.
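
As a rough illustration of the effect described above (not the code the QAPI
generator actually emits), the sketch below guards one entry of a type list
with the same #if. The list contents and the lookup function are hypothetical;
the point is only that a build without CONFIG_IOMMUFD never lists "iommufd",
so a client inspecting the list cannot be told it is supported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative list only; the real one is generated from the QAPI schema. */
const char *const object_types[] = {
    "memory-backend-ram",
#if defined(CONFIG_IOMMUFD)
    "iommufd",              /* present only when CONFIG_IOMMUFD is defined */
#endif
};

/* Hypothetical lookup standing in for what a client learns from introspection. */
bool object_type_listed(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(object_types) / sizeof(object_types[0]); i++) {
        if (strcmp(object_types[i], name) == 0) {
            return true;
        }
    }
    return false;
}
```

Compiled without -DCONFIG_IOMMUFD, object_type_listed("iommufd") is false;
compiled with it, the entry appears and the lookup succeeds.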

>> What could go
>> wrong when this returns "supported" when it actually isn't?
>   
> The management layer would build an invalid QEMU command line and
> QEMU would return "invalid object type: iommufd"
>
> Thanks,
>
> C.




* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-08 12:48       ` Jason Gunthorpe
  2023-11-08 13:25         ` Duan, Zhenzhong
@ 2023-11-09 12:17         ` Joao Martins
  2023-11-09 12:57           ` Jason Gunthorpe
  1 sibling, 1 reply; 114+ messages in thread
From: Joao Martins @ 2023-11-09 12:17 UTC (permalink / raw)
  To: Jason Gunthorpe, Duan, Zhenzhong
  Cc: Matthew Rosato, qemu-devel, alex.williamson, clg, nicolinc,
	eric.auger, peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
	Peng, Chao P, Thomas Huth, Eric Farman, Halil Pasic,
	Jason J. Herne, Tony Krowiak



On 08/11/2023 12:48, Jason Gunthorpe wrote:
> On Wed, Nov 08, 2023 at 07:16:52AM +0000, Duan, Zhenzhong wrote:
> 
>>>> +    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>> +                                     container->ioas_id, &hwpt_id);
>>>> +
>>>> +    if (ret) {
>>>> +        error_setg_errno(errp, errno, "error alloc shadow hwpt");
>>>> +        return ret;
>>>> +    }
>>>
>>> The above alloc_hwpt fails for mdevs (at least, it fails for me attempting to use
>>> iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing in the
>>> kernel because it can't find an IOMMUFD_OBJ_DEVICE.
>>>
>>> AFAIU that's because the mdevs are meant to instead use kernel access via
>>> vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
>>> looking at the kernel vfio compat container.
>>>
>>> As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
>>> backend by just skipping this alloc_hwpt above and instead passing
>>> container->ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
>>> vfio_iommufd_emulated_attach_ioas call in the kernel.
>>
>> Thanks for help test and investigation.
>> I was only focusing on real device and missed the mdev particularity, sorry.
>> You are right, there is no hwpt support for mdev, not even an emulated hwpt.
>> I'll digging into this and see how to distinguish mdev with real device in
>> this low level function.
> 
> I was expecting that hwpt manipulation would be done exclusively
> inside the device-specific vIOMMU userspace driver. Generic code paths
> that don't have that knowledge should use the IOAS for everything

I am probably late in noticing this given Zhenzhong's v5; but aren't we
forgetting that the enforcing of dirty tracking in HWPT is done /via/ ALLOC_HWPT ?

We decided some time ago that the domain_alloc_user flow (and thus enforcement of
dirty tracking) would go via hwpt manip as opposed to the autodomains flow.

Otherwise if I need to resurrect the autodomains support we will need an
ATTACH_IOAS flag replicating this enforcement to pass into the HWPT auto allocation.

Or I can add the hwpt manip on the qemu dirty tracking support of iommufd.

	Joao



* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-09 12:17         ` Joao Martins
@ 2023-11-09 12:57           ` Jason Gunthorpe
  2023-11-09 12:59             ` Joao Martins
  0 siblings, 1 reply; 114+ messages in thread
From: Jason Gunthorpe @ 2023-11-09 12:57 UTC (permalink / raw)
  To: Joao Martins
  Cc: Duan, Zhenzhong, Matthew Rosato, qemu-devel, alex.williamson,
	clg, nicolinc, eric.auger, peterx, jasowang, Tian, Kevin, Liu,
	Yi L, Sun, Yi Y, Peng, Chao P, Thomas Huth, Eric Farman,
	Halil Pasic, Jason J. Herne, Tony Krowiak

On Thu, Nov 09, 2023 at 12:17:35PM +0000, Joao Martins wrote:
> 
> 
> On 08/11/2023 12:48, Jason Gunthorpe wrote:
> > On Wed, Nov 08, 2023 at 07:16:52AM +0000, Duan, Zhenzhong wrote:
> > 
> >>>> +    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
> >>>> +                                     container->ioas_id, &hwpt_id);
> >>>> +
> >>>> +    if (ret) {
> >>>> +        error_setg_errno(errp, errno, "error alloc shadow hwpt");
> >>>> +        return ret;
> >>>> +    }
> >>>
> >>> The above alloc_hwpt fails for mdevs (at least, it fails for me attempting to use
> >>> iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing in the
> >>> kernel because it can't find an IOMMUFD_OBJ_DEVICE.
> >>>
> >>> AFAIU that's because the mdevs are meant to instead use kernel access via
> >>> vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
> >>> looking at the kernel vfio compat container.
> >>>
> >>> As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
> >>> backend by just skipping this alloc_hwpt above and instead passing
> >>> container->ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
> >>> vfio_iommufd_emulated_attach_ioas call in the kernel.
> >>
> >> Thanks for help test and investigation.
> >> I was only focusing on real device and missed the mdev particularity, sorry.
> >> You are right, there is no hwpt support for mdev, not even an emulated hwpt.
> >> I'll digging into this and see how to distinguish mdev with real device in
> >> this low level function.
> > 
> > I was expecting that hwpt manipulation would be done exclusively
> > inside the device-specific vIOMMU userspace driver. Generic code paths
> > that don't have that knowledge should use the IOAS for everything
> 
> I am probably late into noticing this given Zhenzhong v5; but aren't we
> forgetting the enforcing of dirty tracking in HWPT is done /via/
> ALLOC_HWPT ?

The underlying viommu driver supporting mdev cannot support dirty
tracking via the hwpt flag, so it doesn't matter.

The entire point is that a mdev doesn't have a hwpt or any of the hwpt
linked features including dirty tracking.

Jason



* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-09 12:57           ` Jason Gunthorpe
@ 2023-11-09 12:59             ` Joao Martins
  2023-11-09 13:03               ` Joao Martins
  0 siblings, 1 reply; 114+ messages in thread
From: Joao Martins @ 2023-11-09 12:59 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Duan, Zhenzhong, Matthew Rosato, qemu-devel, alex.williamson,
	clg, nicolinc, eric.auger, peterx, jasowang, Tian, Kevin, Liu,
	Yi L, Sun, Yi Y, Peng, Chao P, Thomas Huth, Eric Farman,
	Halil Pasic, Jason J. Herne, Tony Krowiak

On 09/11/2023 12:57, Jason Gunthorpe wrote:
> On Thu, Nov 09, 2023 at 12:17:35PM +0000, Joao Martins wrote:
>> On 08/11/2023 12:48, Jason Gunthorpe wrote:
>>> On Wed, Nov 08, 2023 at 07:16:52AM +0000, Duan, Zhenzhong wrote:
>>>
>>>>>> +    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>>>> +                                     container->ioas_id, &hwpt_id);
>>>>>> +
>>>>>> +    if (ret) {
>>>>>> +        error_setg_errno(errp, errno, "error alloc shadow hwpt");
>>>>>> +        return ret;
>>>>>> +    }
>>>>>
>>>>> The above alloc_hwpt fails for mdevs (at least, it fails for me attempting to use
>>>>> iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing in the
>>>>> kernel because it can't find an IOMMUFD_OBJ_DEVICE.
>>>>>
>>>>> AFAIU that's because the mdevs are meant to instead use kernel access via
>>>>> vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
>>>>> looking at the kernel vfio compat container.
>>>>>
>>>>> As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
>>>>> backend by just skipping this alloc_hwpt above and instead passing
>>>>> container->ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
>>>>> vfio_iommufd_emulated_attach_ioas call in the kernel.
>>>>
>>>> Thanks for help test and investigation.
>>>> I was only focusing on real device and missed the mdev particularity, sorry.
>>>> You are right, there is no hwpt support for mdev, not even an emulated hwpt.
>>>> I'll digging into this and see how to distinguish mdev with real device in
>>>> this low level function.
>>>
>>> I was expecting that hwpt manipulation would be done exclusively
>>> inside the device-specific vIOMMU userspace driver. Generic code paths
>>> that don't have that knowledge should use the IOAS for everything
>>
>> I am probably late into noticing this given Zhenzhong v5; but aren't we
>> forgetting the enforcing of dirty tracking in HWPT is done /via/
>> ALLOC_HWPT ?
> 
> The underlying viommu driver supporting mdev cannot support dirty
> tracking via the hwpt flag, so it doesn't matter.
> 
> The entire point is that a mdev doesn't have a hwpt or any of the hwpt
> linked features including dirty tracking.

I am not talking about mdevs; but rather the regular (non mdev) case not being
able to use dirty tracking with autodomains hwpt allocation.

	Joao



* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-09 12:59             ` Joao Martins
@ 2023-11-09 13:03               ` Joao Martins
  2023-11-09 13:09                 ` Jason Gunthorpe
  0 siblings, 1 reply; 114+ messages in thread
From: Joao Martins @ 2023-11-09 13:03 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Duan, Zhenzhong, Matthew Rosato, qemu-devel, alex.williamson,
	clg, nicolinc, eric.auger, peterx, jasowang, Tian, Kevin, Liu,
	Yi L, Sun, Yi Y, Peng, Chao P, Thomas Huth, Eric Farman,
	Halil Pasic, Jason J. Herne, Tony Krowiak

On 09/11/2023 12:59, Joao Martins wrote:
> On 09/11/2023 12:57, Jason Gunthorpe wrote:
>> On Thu, Nov 09, 2023 at 12:17:35PM +0000, Joao Martins wrote:
>>> On 08/11/2023 12:48, Jason Gunthorpe wrote:
>>>> On Wed, Nov 08, 2023 at 07:16:52AM +0000, Duan, Zhenzhong wrote:
>>>>
>>>>>>> +    ret = iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
>>>>>>> +                                     container->ioas_id, &hwpt_id);
>>>>>>> +
>>>>>>> +    if (ret) {
>>>>>>> +        error_setg_errno(errp, errno, "error alloc shadow hwpt");
>>>>>>> +        return ret;
>>>>>>> +    }
>>>>>>
>>>>>> The above alloc_hwpt fails for mdevs (at least, it fails for me attempting to use
>>>>>> iommufd backend with vfio-ccw and vfio-ap on s390).  The ioctl is failing in the
>>>>>> kernel because it can't find an IOMMUFD_OBJ_DEVICE.
>>>>>>
>>>>>> AFAIU that's because the mdevs are meant to instead use kernel access via
>>>>>> vfio_iommufd_emulated_attach_ioas, not hwpt.  That's how mdevs behave when
>>>>>> looking at the kernel vfio compat container.
>>>>>>
>>>>>> As a test, I was able to get vfio-ccw and vfio-ap working using the iommufd
>>>>>> backend by just skipping this alloc_hwpt above and instead passing
>>>>>> container->ioas_id into the iommufd_cdev_attach_hwpt below.  That triggers the
>>>>>> vfio_iommufd_emulated_attach_ioas call in the kernel.
>>>>>
>>>>> Thanks for help test and investigation.
>>>>> I was only focusing on real device and missed the mdev particularity, sorry.
>>>>> You are right, there is no hwpt support for mdev, not even an emulated hwpt.
>>>>> I'll digging into this and see how to distinguish mdev with real device in
>>>>> this low level function.
>>>>
>>>> I was expecting that hwpt manipulation would be done exclusively
>>>> inside the device-specific vIOMMU userspace driver. Generic code paths
>>>> that don't have that knowledge should use the IOAS for everything
>>>
>>> I am probably late into noticing this given Zhenzhong v5; but aren't we
>>> forgetting the enforcing of dirty tracking in HWPT is done /via/
>>> ALLOC_HWPT ?
>>
>> The underlying viommu driver supporting mdev cannot support dirty
>> tracking via the hwpt flag, so it doesn't matter.
>>
>> The entire point is that a mdev doesn't have a hwpt or any of the hwpt
>> linked features including dirty tracking.
> 
> I am not talking about mdevs; but rather the regular (non mdev) case not being
> able to use dirty tracking with autodomains hwpt allocation.

... without any vIOMMU.



* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-09 13:03               ` Joao Martins
@ 2023-11-09 13:09                 ` Jason Gunthorpe
  2023-11-09 13:21                   ` Joao Martins
  0 siblings, 1 reply; 114+ messages in thread
From: Jason Gunthorpe @ 2023-11-09 13:09 UTC (permalink / raw)
  To: Joao Martins
  Cc: Duan, Zhenzhong, Matthew Rosato, qemu-devel, alex.williamson,
	clg, nicolinc, eric.auger, peterx, jasowang, Tian, Kevin, Liu,
	Yi L, Sun, Yi Y, Peng, Chao P, Thomas Huth, Eric Farman,
	Halil Pasic, Jason J. Herne, Tony Krowiak

On Thu, Nov 09, 2023 at 01:03:02PM +0000, Joao Martins wrote:

> > I am not talking about mdevs; but rather the regular (non mdev) case not being
> > able to use dirty tracking with autodomains hwpt allocation.
> 
> ... without any vIOMMU.

Ah, well, that is troublesome isn't it..

So do we teach autodomains to be more featured in the kernel or do we
teach the generic qemu code to effectively implement autodomains in
userspace?

Jason



* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-09 13:09                 ` Jason Gunthorpe
@ 2023-11-09 13:21                   ` Joao Martins
  2023-11-09 14:34                     ` Jason Gunthorpe
  0 siblings, 1 reply; 114+ messages in thread
From: Joao Martins @ 2023-11-09 13:21 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Duan, Zhenzhong, Matthew Rosato, qemu-devel, alex.williamson,
	clg, nicolinc, eric.auger, peterx, jasowang, Tian, Kevin, Liu,
	Yi L, Sun, Yi Y, Peng, Chao P, Thomas Huth, Eric Farman,
	Halil Pasic, Jason J. Herne, Tony Krowiak

On 09/11/2023 13:09, Jason Gunthorpe wrote:
> On Thu, Nov 09, 2023 at 01:03:02PM +0000, Joao Martins wrote:
> 
>>> I am not talking about mdevs; but rather the regular (non mdev) case not being
>>> able to use dirty tracking with autodomains hwpt allocation.
>>
>> ... without any vIOMMU.
> 
> Ah, well, that is troublesome isn't it..
> 
> So do we teach autodomains to be more featured in the kernel or do we
> teach the generic qemu code to effectively implement autodomains in
> userspace?

The latter is actually what we have been doing. Well, I wouldn't call it
autodomains in qemu, but rather just allocating a hwpt instead of attaching the
IOAS directly. But mdevs don't have domains, and we overlooked that. I would
treat the exception as an exception rather than making it the norm; it doesn't
look like much complexity would be added?

What I last recall is that autodomains represent the 'simple users' that
don't care much beyond the basics of IOMMU features (I recall the example was
DPDK apps and the like). You could say that for current needs IOMMU autodomains
suffice for qemu.

For more advanced features we have been advocating our new iommu domain
manipulation, i.e. the more advanced API for manipulating domain objects. Nesting
is obviously the one that stresses 99% of the hwpt APIs (beyond alloc), and the
other one has been dirty tracking, as the domain is where we enforce current
device support and future device attachments.

Connecting autodomains to this enforcement on the hwpt is relatively simple btw;
it just needs to give the dirty tracking flag the same semantics as its
hwpt-alloc equivalent and pass the hwpt flags into the domain allocation.

It's more a question of what the user should expect when using ATTACH_HWPT
with an IOAS_ID versus direct manipulation of the HWPT. I am wondering if dirty
tracking is alone here or whether there are more features that start to muddy
the simplicity of autodomains and bring it closer to hwpt-alloc.

	Joao



* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-09 13:21                   ` Joao Martins
@ 2023-11-09 14:34                     ` Jason Gunthorpe
  2023-11-10  3:15                       ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Jason Gunthorpe @ 2023-11-09 14:34 UTC (permalink / raw)
  To: Joao Martins
  Cc: Duan, Zhenzhong, Matthew Rosato, qemu-devel, alex.williamson,
	clg, nicolinc, eric.auger, peterx, jasowang, Tian, Kevin, Liu,
	Yi L, Sun, Yi Y, Peng, Chao P, Thomas Huth, Eric Farman,
	Halil Pasic, Jason J. Herne, Tony Krowiak

On Thu, Nov 09, 2023 at 01:21:59PM +0000, Joao Martins wrote:
> On 09/11/2023 13:09, Jason Gunthorpe wrote:
> > On Thu, Nov 09, 2023 at 01:03:02PM +0000, Joao Martins wrote:
> > 
> >>> I am not talking about mdevs; but rather the regular (non mdev) case not being
> >>> able to use dirty tracking with autodomains hwpt allocation.
> >>
> >> ... without any vIOMMU.
> > 
> > Ah, well, that is troublesome isn't it..
> > 
> > So do we teach autodomains to be more featured in the kernel or do we
> > teach the generic qemu code to effectively implement autodomains in
> > userspace?
> 
> The latter is actually what we have been doing. Well I wouldn't call autodomains
> in qemu, but rather just allocate a hwpt, instead of attaching the IOAS
> directly. But well mdevs don't have domains and we overlooked that. I would turn
> the exception into an exception rather than making the norm, doesn't look to be
> much complexity added?

Autodomains are complex because of things like mdev and iommu
non-uniformities. Qemu can't just allocate a single HWPT, it needs to
be annoyingly managed.

> What I last re-collect is that autodomains represents the 'simple users' that
> don't care much beyond the basics of IOMMU features (I recall the example was
> DPDK apps and the like). You could say that for current needs IOMMU autodomains
> suffices for qemu.

Yes, that was my intention. Aside from that it primarily exists to
support vfio compatibility

> Connecting autodomains to this enforcing on the hwpt is relatively simple btw,
> it just needs to connect the dirty tracking flag with same semantic of
> hwpt-alloc equivalent and pass the hwpt flags into the domain allocation.

Yes

> It's more of what of a question should be the expectations to the user when
> using ATTACH_HWPT with an IOAS_ID versus direct manipulation of HWPT. I am
> wondering if dirty tracking is alone here or whether there's more features that
> start to mud the simplicity of autodomains that would approximate of hwpt-alloc.

This is why I had been thinking of a pure HWPT based scheme

So it seems we cannot have a simple model where the generic qemu layer
just works in IOAS :( It might as well always work in HWPT and
understand all the auto domains complexity itself.
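
As a rough illustration of what "implementing autodomains in userspace" could
involve, here is a minimal sketch. It assumes a purely hypothetical model:
Container, Device, Hwpt and pick_attach_target() are made-up names, not QEMU
code and not the iommufd uAPI. It only shows the three cases raised in this
thread: mdevs attach to the IOAS directly, real devices reuse a compatible
hwpt when one exists, and a new hwpt is allocated otherwise.

```c
/* Sketch only: all types and functions are hypothetical stand-ins that model
 * the discussion. Nothing here is QEMU code or the iommufd uAPI. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_HWPT 8

typedef struct {
    uint32_t id;          /* object id, as an allocation would return it */
    uint32_t iommu;       /* physical IOMMU instance backing this hwpt */
    bool dirty_tracking;  /* allocated with dirty tracking enforced */
} Hwpt;

typedef struct {
    uint32_t ioas_id;
    uint32_t next_id;     /* next free object id in this model */
    size_t nr_hwpts;
    Hwpt hwpts[MAX_HWPT];
} Container;

typedef struct {
    bool is_mdev;         /* emulated access only, never has a hwpt */
    uint32_t iommu;       /* physical IOMMU instance the device sits behind */
    bool dirty_capable;   /* that IOMMU can track dirty pages */
} Device;

/* Return the object id the device should attach to, or 0 on failure. */
static uint32_t pick_attach_target(Container *c, const Device *d,
                                   bool want_dirty)
{
    assert(c != NULL && d != NULL);

    /* mdev: attach the IOAS directly, there is no hwpt to manage. */
    if (d->is_mdev) {
        return c->ioas_id;
    }

    /* Enforcing dirty tracking on hardware without it must fail. */
    if (want_dirty && !d->dirty_capable) {
        return 0;
    }

    /* Reuse an existing hwpt when it is compatible with this device. */
    for (size_t i = 0; i < c->nr_hwpts; i++) {
        const Hwpt *h = &c->hwpts[i];
        if (h->iommu == d->iommu && h->dirty_tracking == want_dirty) {
            return h->id;
        }
    }

    /* Otherwise allocate a new hwpt on top of the IOAS. */
    if (c->nr_hwpts == MAX_HWPT) {
        return 0;
    }
    Hwpt *h = &c->hwpts[c->nr_hwpts++];
    h->id = c->next_id++;
    h->iommu = d->iommu;
    h->dirty_tracking = want_dirty;
    return h->id;
}
```

In this model, compatibility is reduced to "same IOMMU instance, same dirty
tracking setting"; a real implementation would have to discover compatibility
by attempting the attach, which is where the annoying management comes from.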

Jason



* RE: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-09  9:05             ` Markus Armbruster
@ 2023-11-10  2:03               ` Duan, Zhenzhong
  2023-11-14  9:40                 ` Cédric Le Goater
  0 siblings, 1 reply; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-10  2:03 UTC (permalink / raw)
  To: Markus Armbruster, Cédric Le Goater
  Cc: qemu-devel, alex.williamson, jgg, nicolinc, Martins, Joao,
	eric.auger, peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
	Peng, Chao P, Paolo Bonzini, Eric Blake, Daniel P.Berrangé,
	Eduardo Habkost, Thomas Huth


>-----Original Message-----
>From: Markus Armbruster <armbru@redhat.com>
>Sent: Thursday, November 9, 2023 5:05 PM
>Subject: Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
>
>Cédric Le Goater <clg@redhat.com> writes:
>
>> On 11/8/23 11:30, Markus Armbruster wrote:
>>> Cédric Le Goater <clg@redhat.com> writes:
>>>
>>>> Hello Markus,
>>>>
>>>> On 11/8/23 06:50, Markus Armbruster wrote:
>>>>> Cédric Le Goater <clg@redhat.com> writes:
>>>>>
>>>>>> On 11/2/23 08:12, Zhenzhong Duan wrote:
>>>>>>> From: Eric Auger <eric.auger@redhat.com>
>>>>>>> Introduce an iommufd object which allows the interaction
>>>>>>> with the host /dev/iommu device.
>>>>>>> The /dev/iommu can have been already pre-opened outside of qemu,
>>>>>>> in which case the fd can be passed directly along with the
>>>>>>> iommufd object:
>>>>>>> This allows the iommufd object to be shared accross several
>>>>>>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>>>>>>> the /dev/iommu once.
>>>>>>> If no fd is passed along with the iommufd object, the /dev/iommu
>>>>>>> is opened by the qemu code.
>>>>>>> The CONFIG_IOMMUFD option must be set to compile this new object.
>>>>>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>> ---
>>>>>>> v4: add CONFIG_IOMMUFD check, document default case
>>>>>>>     MAINTAINERS              |   7 ++
>>>>>>>     qapi/qom.json            |  22 ++++
>>>>>>>     include/sysemu/iommufd.h |  46 +++++++
>>>>>>>     backends/iommufd-stub.c  |  59 +++++++++
>>>>>>>     backends/iommufd.c       | 257 +++++++++++++++++++++++++++++++++++++++
>>>>>>>     backends/Kconfig         |   4 +
>>>>>>>     backends/meson.build     |   5 +
>>>>>>>     backends/trace-events    |  12 ++
>>>>>>>     qemu-options.hx          |  13 ++
>>>>>>>     9 files changed, 425 insertions(+)
>>>>>>>     create mode 100644 include/sysemu/iommufd.h
>>>>>>>     create mode 100644 backends/iommufd-stub.c
>>>>>>>     create mode 100644 backends/iommufd.c
>>>>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>>>>> index cd8d6b140f..6f35159255 100644
>>>>>>> --- a/MAINTAINERS
>>>>>>> +++ b/MAINTAINERS
>>>>>>> @@ -2135,6 +2135,13 @@ F: hw/vfio/ap.c
>>>>>>>     F: docs/system/s390x/vfio-ap.rst
>>>>>>>     L: qemu-s390x@nongnu.org
>>>>>>>     +iommufd
>>>>>>> +M: Yi Liu <yi.l.liu@intel.com>
>>>>>>> +M: Eric Auger <eric.auger@redhat.com>
>>>>>>> +S: Supported
>>>>>>> +F: backends/iommufd.c
>>>>>>> +F: include/sysemu/iommufd.h
>>>>>>> +
>>>>>>>     vhost
>>>>>>>     M: Michael S. Tsirkin <mst@redhat.com>
>>>>>>>     S: Supported
>>>>>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>>>>>> index c53ef978ff..27300add48 100644
>>>>>>> --- a/qapi/qom.json
>>>>>>> +++ b/qapi/qom.json
>>>>>>> @@ -794,6 +794,24 @@
>>>>>>>     { 'struct': 'VfioUserServerProperties',
>>>>>>>       'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>>>>>> +##
>>>>>>> +# @IOMMUFDProperties:
>>>>>>> +#
>>>>>>> +# Properties for iommufd objects.
>>>>>>> +#
>>>>>>> +# @fd: file descriptor name previously passed via 'getfd' command,
>>>>>>> +#     which represents a pre-opened /dev/iommu.  This allows the
>>>>>>> +#     iommufd object to be shared accross several subsystems
>>>>>>> +#     (VFIO, VDPA, ...), and the file descriptor to be shared
>>>>>>> +#     with other process, e.g. DPDK.  (default: QEMU opens
>>>>>>> +#     /dev/iommu by itself)
>>>>>>> +#
>>>>>>> +# Since: 8.2
>>>>>>> +##
>>>>>>> +{ 'struct': 'IOMMUFDProperties',
>>>>>>> +  'data': { '*fd': 'str' },
>>>>>>> +  'if': 'CONFIG_IOMMUFD' }
>>>>>>
>>>>>>
>>>>>> Activating or not IOMMUFD on a platform is a configuration choice
>>>>>> and it is not a dependency on an external resource. I would make
>>>>>> things simpler and drop all the #ifdef in the documentation files.
>>>>>
>>>>> What exactly are you proposing?
>>>>
>>>> I would like to simplify the configuration part of this new IOMMUFD
>>>> feature and avoid a ./configure option to enable/disable the feature
>>>> since it has no external dependencies and can be compiled on all
>>>> platforms.
>>>>
>>>> However, we know that it only makes sense to have the IOMMUFD backend
>>>> on platforms s390x, aarch64, x86_64. So I am proposing as an improvement
>>>> to enable IOMMUFD only on these platforms with this addition :
>>>>
>>>>    imply IOMMUFD
>>>>
>>>> to hw/{i386,s390x,arm}/Kconfig files.
>>>>
>>>> This gives us the possibility to compile out the feature downstream
>>>> if something goes wrong, using the files under : configs/devices/.
>>>
>>> Shouldn't we then compile out the relevant parts of the QAPI schema,
>>> too?
>>
>> Is it possible with Kconfig options ?
>
>See below.
>
>>>> Given that the IOMMUFD feature doesn't have any external dependencies
>>>> and that the IOMMUFD backend object is common to all platforms, I am
>>>> also proposing to remove all the CONFIG_IOMMUFD define usage in the
>>>> documentation file "qemu-options.hx" and the schema file "qapi/qom.json".
>>>
>>> Any CONFIG_IOMMUFD left elsewhere?
>>
>> There are. To expose or not vfio properties. Stuff like :
>>
>> #ifdef CONFIG_IOMMUFD
>>      DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
>>                       TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
>> #endif
>>      DEFINE_PROP_END_OF_LIST(),
>>
>> and
>>
>> #ifdef CONFIG_IOMMUFD
>>      object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
>> #endif
>>
>>
>>>>> The use of 'if': 'CONFIG_IOMMUFD' in the QAPI schema enables
>>>>> introspection with query-qmp-schema: when ObjectType @iommufd exists,
>>>>> QEMU supports creating the object.  Or am I confused?
>>>>
>>>> Object iommufd should always exist since it is common to all.
>>>>
>>>> Is that acceptable ?
>>>
>>> Perhaps the question to ask is whether a management application needs to
>>> know whether this version of QEMU supports iommufd objects.  If yes,
>>> then query-qmp-schema is an obvious way to find out.
>>
>> Thanks for reminding me of this possibility. In that case, we should
>> indeed avoid returning iommufd support when !CONFIG_IOMMUFD.
>>
>> Can it be implemented in qapi/qom.json with Kconfig options ?
>
>The only tool we have for configuring the schema is the 'if'
>conditional.  'if': 'CONFIG_IOMMUFD' compiles to #if
>defined(CONFIG_IOMMUFD) ... #endif.  Your use of #ifdef CONFIG_IOMMUFD
>above suggests this is fine here.
>
>Symbols that are only defined in target-dependent compiles (see
>exec/poison.h) can only be used in target-dependent schema modules,
>i.e. the *-target.json.

I'm new to Kconfig & QAPI, but I have a rough idea:
remove the conditional check for backends/iommufd.c, like:

system_ss.add(files('iommufd.c'))

Then the iommufd object is common and always supported, and we will not see
"invalid object type: iommufd", even on platforms other than i386, s390x, arm.

On platforms that do not support iommufd, we can create an iommufd object
which is a dummy, as nothing will link to it to open /dev/iommu

Thanks
Zhenzhong

>
>>> What could go
>>> wrong when this returns "supported" when it actually isn't?
>>
>> The management layer would build an invalid QEMU command line and
>> QEMU would return "invalid object type: iommufd"
>>
>> Thanks,
>>
>> C.



* RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-09 14:34                     ` Jason Gunthorpe
@ 2023-11-10  3:15                       ` Duan, Zhenzhong
  2023-11-10 13:09                         ` Joao Martins
  0 siblings, 1 reply; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-10  3:15 UTC (permalink / raw)
  To: Jason Gunthorpe, Martins, Joao
  Cc: Matthew Rosato, qemu-devel, alex.williamson, clg, nicolinc,
	eric.auger, peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
	Peng, Chao P, Thomas Huth, Eric Farman, Halil Pasic,
	Jason J. Herne, Tony Krowiak

Hi Jason, Joao,

>-----Original Message-----
>From: Jason Gunthorpe <jgg@nvidia.com>
>Sent: Thursday, November 9, 2023 10:35 PM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On Thu, Nov 09, 2023 at 01:21:59PM +0000, Joao Martins wrote:
>> On 09/11/2023 13:09, Jason Gunthorpe wrote:
>> > On Thu, Nov 09, 2023 at 01:03:02PM +0000, Joao Martins wrote:
>> >
>> >>> I am not talking about mdevs; but rather the regular (non mdev) case not being
>> >>> able to use dirty tracking with autodomains hwpt allocation.
>> >>
>> >> ... without any vIOMMU.
>> >
>> > Ah, well, that is troublesome isn't it..
>> >
>> > So do we teach autodomains to be more featured in the kernel or do we
>> > teach the generic qemu code to effectively implement autodomains in
>> > userspace?
>>
>> The latter is actually what we have been doing. Well I wouldn't call autodomains
>> in qemu, but rather just allocate a hwpt, instead of attaching the IOAS
>> directly. But well mdevs don't have domains and we overlooked that. I would turn
>> the exception into an exception rather than making the norm, doesn't look to be
>> much complexity added?
>
>Autodomains are complex because of things like mdev and iommu
>non-uniformities. Qemu can't just allocate a single HWPT, it needs to
>be annoyingly managed.
>
>> What I last re-collect is that autodomains represents the 'simple users' that
>> don't care much beyond the basics of IOMMU features (I recall the example was
>> DPDK apps and the like). You could say that for current needs IOMMU autodomains
>> suffices for qemu.
>
>Yes, that was my intention. Aside from that it primarily exists to
>support vfio compatibility
>
>> Connecting autodomains to this enforcing on the hwpt is relatively simple btw,
>> it just needs to connect the dirty tracking flag with same semantic of
>> hwpt-alloc equivalent and pass the hwpt flags into the domain allocation.
>
>Yes
>
>> It's more of what of a question should be the expectations to the user when
>> using ATTACH_HWPT with an IOAS_ID versus direct manipulation of HWPT. I am
>> wondering if dirty tracking is alone here or whether there's more features that
>> start to mud the simplicity of autodomains that would approximate of hwpt-alloc.
>
>This is why I had been thinking of a pure HWPT based scheme
>
>So it seems we cannot have a simple model where the generic qemu layer
>just works in IOAS :( It might as well always work in HWPT and
>understand all the auto domains complexity itself.

Let me know if there is anything I can do in this series to facilitate
future qemu dirty tracking support for iommufd. It is not clear to me whether
I should revert to the manual HWPT_ALLOC method used in v4.

Thanks
Zhenzhong



* Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-10  3:15                       ` Duan, Zhenzhong
@ 2023-11-10 13:09                         ` Joao Martins
  2023-11-13  3:17                           ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Joao Martins @ 2023-11-10 13:09 UTC (permalink / raw)
  To: Duan, Zhenzhong, Jason Gunthorpe
  Cc: Matthew Rosato, qemu-devel, alex.williamson, clg, nicolinc,
	eric.auger, peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
	Peng, Chao P, Thomas Huth, Eric Farman, Halil Pasic,
	Jason J. Herne, Tony Krowiak

On 10/11/2023 03:15, Duan, Zhenzhong wrote:
> Hi Jason, Joao,
> 
>> -----Original Message-----
>> From: Jason Gunthorpe <jgg@nvidia.com>
>> Sent: Thursday, November 9, 2023 10:35 PM
>> Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>>
>> On Thu, Nov 09, 2023 at 01:21:59PM +0000, Joao Martins wrote:
>>> On 09/11/2023 13:09, Jason Gunthorpe wrote:
>>>> On Thu, Nov 09, 2023 at 01:03:02PM +0000, Joao Martins wrote:
>>>>
>>>>>> I am not talking about mdevs; but rather the regular (non mdev) case not being
>>>>>> able to use dirty tracking with autodomains hwpt allocation.
>>>>>
>>>>> ... without any vIOMMU.
>>>>
>>>> Ah, well, that is troublesome isn't it..
>>>>
>>>> So do we teach autodomains to be more featured in the kernel or do we
>>>> teach the generic qemu code to effectively implement autodomains in
>>>> userspace?
>>>
>>> The latter is actually what we have been doing. Well I wouldn't call autodomains
>>> in qemu, but rather just allocate a hwpt, instead of attaching the IOAS
>>> directly. But well mdevs don't have domains and we overlooked that. I would turn
>>> the exception into an exception rather than making the norm, doesn't look to be
>>> much complexity added?
>>
>> Autodomains are complex because of things like mdev and iommu
>> non-uniformities. Qemu can't just allocate a single HWPT, it needs to
>> be annoyingly managed.
>>
>>> What I last re-collect is that autodomains represents the 'simple users' that
>>> don't care much beyond the basics of IOMMU features (I recall the example was
>>> DPDK apps and the like). You could say that for current needs IOMMU autodomains
>>> suffices for qemu.
>>
>> Yes, that was my intention. Aside from that it primarily exists to
>> support vfio compatibility
>>
>>> Connecting autodomains to this enforcing on the hwpt is relatively simple btw,
>>> it just needs to connect the dirty tracking flag with same semantic of
>>> hwpt-alloc equivalent and pass the hwpt flags into the domain allocation.
>>
>> Yes
>>
>>> It's more of what of a question should be the expectations to the user when
>>> using ATTACH_HWPT with an IOAS_ID versus direct manipulation of HWPT. I am
>>> wondering if dirty tracking is alone here or whether there's more features that
>>> start to mud the simplicity of autodomains that would approximate of hwpt-alloc.
>>
>> This is why I had been thinking of a pure HWPT based scheme
>>
>> So it seems we cannot have a simple model where the generic qemu layer
>> just works in IOAS :( It might as well always work in HWPT and
>> understand all the auto domains complexity itself.
> 
> Let me know if there is anything I can do in this series to facilitate
> future qemu dirty tracking support of iommufd. Not clear if I should
> restore to the manual HWPT_ALLOC method in v4.

If we want the closest equivalent to type1-iommu support, from what we have been
discussing... it sounds like IOAS is the easiest first step to get barebones
iommufd support. Which sort of makes sense since this is the introduction of
iommufd and it already requires a lot of churn & refactoring to get there.

For the new iommufd-only features (nesting/dirty-tracking) we will need the auto
domains done by Qemu IIUC -- unless nesting is meant to coexist with autodomains
with its own hwpts somehow (?)

Right now I don't have the QEMU equivalent of the autodomains structure in mind
to suggest an alternative path to v5. Looking at the kernel autodomains path,
aside from mdev I am not sure yet what annoyances the autodomains path in qemu
is going to generate; more worryingly, whether we have enough information to
tackle the non-uniformity, e.g. whether we are talking about features or about
different devices being behind different IOMMUs.



* RE: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
  2023-11-10 13:09                         ` Joao Martins
@ 2023-11-13  3:17                           ` Duan, Zhenzhong
  0 siblings, 0 replies; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-13  3:17 UTC (permalink / raw)
  To: Joao Martins, Jason Gunthorpe
  Cc: Matthew Rosato, qemu-devel, alex.williamson, clg, nicolinc,
	eric.auger, peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
	Peng, Chao P, Thomas Huth, Eric Farman, Halil Pasic,
	Jason J. Herne, Tony Krowiak



>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Sent: Friday, November 10, 2023 9:09 PM
>Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>
>On 10/11/2023 03:15, Duan, Zhenzhong wrote:
>> Hi Jason, Joao,
>>
>>> -----Original Message-----
>>> From: Jason Gunthorpe <jgg@nvidia.com>
>>> Sent: Thursday, November 9, 2023 10:35 PM
>>> Subject: Re: [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend
>>>
>>> On Thu, Nov 09, 2023 at 01:21:59PM +0000, Joao Martins wrote:
>>>> On 09/11/2023 13:09, Jason Gunthorpe wrote:
>>>>> On Thu, Nov 09, 2023 at 01:03:02PM +0000, Joao Martins wrote:
>>>>>
>>>>>>> I am not talking about mdevs; but rather the regular (non mdev) case not being
>>>>>>> able to use dirty tracking with autodomains hwpt allocation.
>>>>>>
>>>>>> ... without any vIOMMU.
>>>>>
>>>>> Ah, well, that is troublesome isn't it..
>>>>>
>>>>> So do we teach autodomains to be more featured in the kernel or do we
>>>>> teach the generic qemu code to effectively implement autodomains in
>>>>> userspace?
>>>>
>>>> The latter is actually what we have been doing. Well I wouldn't call autodomains
>>>> in qemu, but rather just allocate a hwpt, instead of attaching the IOAS
>>>> directly. But well mdevs don't have domains and we overlooked that. I would turn
>>>> the exception into an exception rather than making the norm, doesn't look to be
>>>> much complexity added?
>>>
>>> Autodomains are complex because of things like mdev and iommu
>>> non-uniformities. Qemu can't just allocate a single HWPT, it needs to
>>> be annoyingly managed.
>>>
>>>> What I last re-collect is that autodomains represents the 'simple users' that
>>>> don't care much beyond the basics of IOMMU features (I recall the example was
>>>> DPDK apps and the like). You could say that for current needs IOMMU autodomains
>>>> suffices for qemu.
>>>
>>> Yes, that was my intention. Aside from that it primarily exists to
>>> support vfio compatibility
>>>
>>>> Connecting autodomains to this enforcing on the hwpt is relatively simple btw,
>>>> it just needs to connect the dirty tracking flag with same semantic of
>>>> hwpt-alloc equivalent and pass the hwpt flags into the domain allocation.
>>>
>>> Yes
>>>
>>>> It's more of what of a question should be the expectations to the user when
>>>> using ATTACH_HWPT with an IOAS_ID versus direct manipulation of HWPT. I am
>>>> wondering if dirty tracking is alone here or whether there's more features that
>>>> start to mud the simplicity of autodomains that would approximate of hwpt-alloc.
>>>
>>> This is why I had been thinking of a pure HWPT based scheme
>>>
>>> So it seems we cannot have a simple model where the generic qemu layer
>>> just works in IOAS :( It might as well always work in HWPT and
>>> understand all the auto domains complexity itself.
>>
>> Let me know if there is anything I can do in this series to facilitate
>> future qemu dirty tracking support of iommufd. Not clear if I should
>> restore to the manual HWPT_ALLOC method in v4.
>
>If we want to have the closest support as type1-iommu, from what we have been
>discussing... it sounds like IOAS is the easiest first step to get barebones
>iommufd support. Which sort of makes sense since this is the introduction of
>iommufd and it already requires a lot of churn & refactoring to get there.
Agree.

>
>For the new iommufd-only features (nesting/dirty-tracking) we will need the auto
>domains done by Qemu IIUC -- unless nesting is meant to coexist with autodomains
>with its own hwpts somehow (?)

We have a draft nesting implementation which has its own hwpts and coexists
with autodomains.
See https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_cdev_v5_nesting
if you are interested.

>
>Right now I don't have the autodomains QEMU equivalent structure in mind to
>suggest a path in alternative to v5; Looking at the kernel autodomains path,
>aside from mdev I am not sure yet what annoyances the autodomains path in qemu
>is going to generate: more worryingly whether we have enough information to
>tackle the non-uniformity e.g. if we are talking about features or whether
>different devices are behind different IOMMUs.

OK, it looks like more thinking and discussion are needed, aside from mdev.
I'd like to keep this series as basic iommufd support with IOAS attaching.
QEMU autodomains may be another series addressing the new iommufd-only features.

Thanks
Zhenzhong



* Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-10  2:03               ` Duan, Zhenzhong
@ 2023-11-14  9:40                 ` Cédric Le Goater
  2023-11-14 10:18                   ` Duan, Zhenzhong
  0 siblings, 1 reply; 114+ messages in thread
From: Cédric Le Goater @ 2023-11-14  9:40 UTC (permalink / raw)
  To: Duan, Zhenzhong, Markus Armbruster
  Cc: qemu-devel, alex.williamson, jgg, nicolinc, Martins, Joao,
	eric.auger, peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
	Peng, Chao P, Paolo Bonzini, Eric Blake, Daniel P.Berrangé,
	Eduardo Habkost, Thomas Huth


>> The only tool we have for configuring the schema is the 'if'
>> conditional.  'if': 'CONFIG_IOMMUFD' compiles to #if
>> defined(CONFIG_IOMMUFD) ... #endif.  Your use of #ifdef CONFIG_IOMMUFD
>> above suggests this is fine here.
>>
>> Symbols that are only defined in target-dependent compiles (see
>> exec/poison.h) can only be used in target-dependent schema modules,
>> i.e. the *-target.json.
> 
> I'm fresh on Kconfig & qapi, but I have a weak idea:
> Remove conditional check for backends/iommufd.c, like:
> 
> system_ss.add(files('iommufd.c'))
> 
> Then iommufd object is common and always supported, we will not see
> "invalid object type: iommufd", even for platform other than i386,s390x,arm.
>
> On those platform not supporting iommufd, we can create an iommufd object
> which is dummy, as no one will link to it to open /dev/iommufd

In that case, the management layer would define a crippled vfio-pci
device. I'd rather let the error occur or find a way to move the
"iommufd" object and properties to a target dependent file. I don't
see how this could be done though.

Thanks,

C.





* RE: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
  2023-11-14  9:40                 ` Cédric Le Goater
@ 2023-11-14 10:18                   ` Duan, Zhenzhong
  0 siblings, 0 replies; 114+ messages in thread
From: Duan, Zhenzhong @ 2023-11-14 10:18 UTC (permalink / raw)
  To: Cédric Le Goater, Markus Armbruster
  Cc: qemu-devel, alex.williamson, jgg, nicolinc, Martins, Joao,
	eric.auger, peterx, jasowang, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
	Peng, Chao P, Paolo Bonzini, Eric Blake, Daniel P.Berrangé,
	Eduardo Habkost, Thomas Huth



>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 14, 2023 5:41 PM
>Subject: Re: [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object
>
>
>>> The only tool we have for configuring the schema is the 'if'
>>> conditional.  'if': 'CONFIG_IOMMUFD' compiles to #if
>>> defined(CONFIG_IOMMUFD) ... #endif.  Your use of #ifdef CONFIG_IOMMUFD
>>> above suggests this is fine here.
>>>
>>> Symbols that are only defined in target-dependent compiles (see
>>> exec/poison.h) can only be used in target-dependent schema modules,
>>> i.e. the *-target.json.
>>
>> I'm fresh on Kconfig & qapi, but I have a weak idea:
>> Remove conditional check for backends/iommufd.c, like:
>>
>> system_ss.add(files('iommufd.c'))
>>
>> Then iommufd object is common and always supported, we will not see
>> "invalid object type: iommufd", even for platform other than i386,s390x,arm.
>>
>> On those platform not supporting iommufd, we can create an iommufd object
>> which is dummy, as no one will link to it to open /dev/iommufd
>
>In that case, the management layer would define a crippled vfio-pci
>device. I'd rather let the error occur or find a way to move the
>"iommufd" object and properties to a target dependent file. I don't
>see how this could be done though.

I see, letting the error occur is better than defining a crippled vfio-pci
device. Otherwise we would need to teach libvirt to also check for the
existence of /dev/iommu.
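
For illustration only, the kind of existence check mentioned above could be
sketched as below. This is a minimal sketch under stated assumptions: the
helper name path_is_char_device() is hypothetical and is not part of QEMU or
libvirt, and whether a management layer would probe the node this way is an
assumption. It uses only POSIX stat() and S_ISCHR().

```c
#include <stdbool.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not a QEMU or libvirt API): returns true when
 * 'path' exists and refers to a character device node.
 */
static bool path_is_char_device(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        /* Node is missing, or we are not allowed to stat it. */
        return false;
    }
    return S_ISCHR(st.st_mode);
}
```

A caller would pass "/dev/iommu" and treat a false result as "no iommufd
support on this host". Note that this only shows the node is present, not
that opening it would succeed.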

Thanks
Zhenzhong



end of thread, other threads:[~2023-11-14 10:19 UTC | newest]

Thread overview: 114+ messages
2023-11-02  7:12 [PATCH v4 00/41] vfio: Adopt iommufd Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 01/41] vfio/container: Move IBM EEH related functions into spapr_pci_vfio.c Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 02/41] vfio/container: Move vfio_container_add/del_section_window into spapr.c Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 03/41] vfio/container: Move spapr specific init/deinit " Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 04/41] vfio/spapr: Make vfio_spapr_create/remove_window static Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 05/41] vfio/common: Move vfio_host_win_add/del into spapr.c Zhenzhong Duan
2023-11-06  9:33   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 06/41] vfio: Introduce base object for VFIOContainer and targeted interface Zhenzhong Duan
2023-11-06 16:36   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 07/41] vfio/container: Introduce a empty VFIOIOMMUOps Zhenzhong Duan
2023-11-06 16:36   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 08/41] vfio/container: Switch to dma_map|unmap API Zhenzhong Duan
2023-11-06 16:37   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 09/41] vfio/common: Introduce vfio_container_init/destroy helper Zhenzhong Duan
2023-11-06 16:37   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 10/41] vfio/common: Move giommu_list in base container Zhenzhong Duan
2023-11-06 16:50   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 11/41] vfio/container: Move space field to " Zhenzhong Duan
2023-11-06 16:50   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 12/41] vfio/container: Switch to IOMMU BE set_dirty_page_tracking/query_dirty_bitmap API Zhenzhong Duan
2023-11-06 16:50   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 13/41] vfio/container: Move per container device list in base container Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 14/41] vfio/container: Convert functions to " Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 15/41] vfio/container: Move pgsizes and dma_max_mappings " Zhenzhong Duan
2023-11-06 16:53   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 16/41] vfio/container: Move vrdl_list " Zhenzhong Duan
2023-11-06 16:53   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 17/41] vfio/container: Move listener " Zhenzhong Duan
2023-11-06 16:57   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 18/41] vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size " Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 19/41] vfio/container: Move iova_ranges " Zhenzhong Duan
2023-11-06 16:58   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 20/41] vfio/container: Implement attach/detach_device Zhenzhong Duan
2023-11-06 16:59   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 21/41] vfio/spapr: Introduce spapr backend and target interface Zhenzhong Duan
2023-11-06 17:30   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 22/41] vfio/spapr: switch to spapr IOMMU BE add/del_section_window Zhenzhong Duan
2023-11-06 17:33   ` Cédric Le Goater
2023-11-07  3:06     ` Duan, Zhenzhong
2023-11-07 13:07       ` Cédric Le Goater
2023-11-07 17:34   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 23/41] vfio/spapr: Move prereg_listener into spapr container Zhenzhong Duan
2023-11-06 17:34   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 24/41] vfio/spapr: Move hostwin_list " Zhenzhong Duan
2023-11-06 17:35   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 25/41] Add iommufd configure option Zhenzhong Duan
2023-11-07 13:14   ` Cédric Le Goater
2023-11-07 14:37     ` Cédric Le Goater
2023-11-08  6:08       ` Duan, Zhenzhong
2023-11-02  7:12 ` [PATCH v4 26/41] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
2023-11-07 13:33   ` Cédric Le Goater
2023-11-08  3:35     ` Duan, Zhenzhong
2023-11-08  9:40       ` Cédric Le Goater
2023-11-08  9:43         ` Duan, Zhenzhong
2023-11-08  5:50     ` Markus Armbruster
2023-11-08 10:03       ` Cédric Le Goater
2023-11-08 10:30         ` Markus Armbruster
2023-11-08 13:48           ` Cédric Le Goater
2023-11-09  9:05             ` Markus Armbruster
2023-11-10  2:03               ` Duan, Zhenzhong
2023-11-14  9:40                 ` Cédric Le Goater
2023-11-14 10:18                   ` Duan, Zhenzhong
2023-11-02  7:12 ` [PATCH v4 27/41] util/char_dev: Add open_cdev() Zhenzhong Duan
2023-11-07 13:37   ` Cédric Le Goater
2023-11-08  4:29     ` Duan, Zhenzhong
2023-11-02  7:12 ` [PATCH v4 28/41] vfio/iommufd: Implement the iommufd backend Zhenzhong Duan
2023-11-07 13:41   ` Cédric Le Goater
2023-11-08  5:45     ` Duan, Zhenzhong
2023-11-08  2:59   ` Matthew Rosato
2023-11-08  7:16     ` Duan, Zhenzhong
2023-11-08 12:48       ` Jason Gunthorpe
2023-11-08 13:25         ` Duan, Zhenzhong
2023-11-08 14:19           ` Jason Gunthorpe
2023-11-09  2:45             ` Duan, Zhenzhong
2023-11-09 12:17         ` Joao Martins
2023-11-09 12:57           ` Jason Gunthorpe
2023-11-09 12:59             ` Joao Martins
2023-11-09 13:03               ` Joao Martins
2023-11-09 13:09                 ` Jason Gunthorpe
2023-11-09 13:21                   ` Joao Martins
2023-11-09 14:34                     ` Jason Gunthorpe
2023-11-10  3:15                       ` Duan, Zhenzhong
2023-11-10 13:09                         ` Joao Martins
2023-11-13  3:17                           ` Duan, Zhenzhong
2023-11-02  7:12 ` [PATCH v4 29/41] vfio/iommufd: Relax assert check for " Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 30/41] vfio/iommufd: Add support for iova_ranges Zhenzhong Duan
2023-11-06 17:19   ` Cédric Le Goater
2023-11-07  3:07     ` Duan, Zhenzhong
2023-11-02  7:12 ` [PATCH v4 31/41] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info Zhenzhong Duan
2023-11-07 13:48   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 32/41] vfio/pci: Introduce a vfio pci hot reset interface Zhenzhong Duan
2023-11-07 13:52   ` Cédric Le Goater
2023-11-08  5:46     ` Duan, Zhenzhong
2023-11-02  7:12 ` [PATCH v4 33/41] vfio/iommufd: Enable pci hot reset through iommufd cdev interface Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 34/41] vfio/pci: Allow the selection of a given iommu backend Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 35/41] vfio/pci: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 36/41] vfio: Allow the selection of a given iommu backend for platform ap and ccw Zhenzhong Duan
2023-11-07 18:18   ` Cédric Le Goater
2023-11-02  7:12 ` [PATCH v4 37/41] vfio/platform: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
2023-11-02  7:12 ` [PATCH v4 38/41] vfio/ap: " Zhenzhong Duan
2023-11-07 18:19   ` Cédric Le Goater
2023-11-02  7:13 ` [PATCH v4 39/41] vfio/ccw: " Zhenzhong Duan
2023-11-07 18:20   ` Cédric Le Goater
2023-11-02  7:13 ` [PATCH v4 40/41] vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps callbacks Zhenzhong Duan
2023-11-02  7:13 ` [PATCH v4 41/41] vfio: Compile out iommufd for PPC target Zhenzhong Duan
2023-11-07 13:44   ` Cédric Le Goater
2023-11-08  4:31     ` Duan, Zhenzhong
2023-11-06 14:23 ` [PATCH v4 00/41] vfio: Adopt iommufd Cédric Le Goater
2023-11-07 18:28 ` Cédric Le Goater
2023-11-08  3:26   ` Matthew Rosato
2023-11-08  8:37     ` Duan, Zhenzhong
2023-11-08  9:07       ` Duan, Zhenzhong
2023-11-08  9:23         ` Cédric Le Goater
2023-11-08  9:21     ` Cédric Le Goater
