From: Zhenzhong Duan <zhenzhong.duan@intel.com>
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com,
	nicolinc@nvidia.com, joao.m.martins@oracle.com,
	eric.auger@redhat.com, peterx@redhat.com, jasowang@redhat.com,
	kevin.tian@intel.com, yi.l.liu@intel.com, yi.y.sun@intel.com,
	chao.p.peng@intel.com, Zhenzhong Duan <zhenzhong.duan@intel.com>
Subject: [PATCH v2 00/27] vfio: Adopt iommufd
Date: Mon, 16 Oct 2023 16:31:56 +0800
Message-ID: <20231016083223.1519410-1-zhenzhong.duan@intel.com>

Hi,

Thanks to everyone for the guidance and comments on the previous series.
Here is the pure iommufd support part.


PATCH 1-15: Abstract out base container
PATCH 16: Add --enable/--disable-iommufd config support
PATCH 17: Introduce iommufd object
PATCH 18-21: add IOMMUFD container and cdev support
PATCH 22-27: fd passing for IOMMUFD object and cdev


We have tested a wide range of combinations, e.g.:
- PCI devices were tested
- FD passing and hot reset, with some tricks
- device hotplug with the legacy and iommufd backends
- with and without vIOMMU for the legacy and iommufd backends
- devices linked to different iommufd backends
- VFIO migration with an E800 net card (no dirty sync support) passed through
- platform, ccw and ap were only compile-tested due to environment limitations


Given some iommufd kernel limitations, the iommufd backend is
not yet fully on par with the legacy backend w.r.t. features like:
- p2p mappings (you will see related error traces)
- dirty page sync
- etc.


qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v2

--------------------------------------------------------------------------

Below is some background and a diagram of the design:

With the introduction of iommufd, the Linux kernel provides a generic
interface for userspace drivers to propagate their DMA mappings to kernel
for assigned devices. This series ports the VFIO devices onto the
/dev/iommu uAPI and lets it coexist with the legacy implementation.

At QEMU level, interactions with the /dev/iommu are abstracted by a new
iommufd object (compiled in with the CONFIG_IOMMUFD option).

Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
linked with an iommufd object. In this series, the vfio-pci device
gains this capability (other VFIO devices are not yet ready).

It gets a new optional parameter named iommufd, which allows passing
an iommufd object:

    -object iommufd,id=iommufd0
    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0

Note the /dev/iommu and vfio cdev can be externally opened by a
management layer. In such a case the fd is passed:

    -object iommufd,id=iommufd0,fd=22
    -device vfio-pci,iommufd=iommufd0,fd=23

If the fd parameter is not passed, the fd is opened by QEMU.
See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
for a detailed discussion of this requirement.
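
As a rough sketch of the fd-passing case, a management layer could open
both nodes itself and format the two options with the descriptor numbers.
The helper name format_iommufd_args() is hypothetical and not part of this
series; only the option syntax is taken from the examples above.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of QEMU): format the two QEMU options
 * for pre-opened fds.
 *
 * iommu_fd would come from open("/dev/iommu", O_RDWR) and dev_fd from
 * opening the vfio cdev of the device; both must be inherited by QEMU,
 * i.e. not marked close-on-exec.
 *
 * Returns 0 on success, -1 on an invalid fd or if buf is too small.
 */
int format_iommufd_args(char *buf, size_t len, const char *id,
                        int iommu_fd, int dev_fd)
{
    int n;

    if (iommu_fd < 0 || dev_fd < 0) {
        return -1;
    }

    n = snprintf(buf, len,
                 "-object iommufd,id=%s,fd=%d "
                 "-device vfio-pci,iommufd=%s,fd=%d",
                 id, iommu_fd, id, dev_fd);

    /* snprintf reports the length it wanted; treat truncation as failure */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Calling it with id "iommufd0" and fds 22 and 23 yields the same two options
as in the example above.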

If no iommufd option is passed to the vfio-pci device, iommufd is not
used and the end-user gets the behavior based on the legacy vfio iommu
interfaces:

    -device vfio-pci,host=0000:02:00.0

While the legacy kernel interface is group-centric, the new iommufd
interface is device-centric, relying on device fd and iommufd.
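
To make the group-centric versus device-centric difference concrete, here
is a minimal sketch of how a caller might pick a backend and the node to
open first. The enum and both function names are hypothetical; the node
paths are the ones shown in the diagram in this letter.

```c
#include <stdio.h>

/* Hypothetical names for the two backends described in this letter. */
enum vfio_backend { VFIO_BE_LEGACY, VFIO_BE_IOMMUFD };

/* No iommufd option (NULL or empty id) means the legacy backend. */
enum vfio_backend vfio_pick_backend(const char *iommufd_id)
{
    return (iommufd_id && iommufd_id[0] != '\0') ? VFIO_BE_IOMMUFD
                                                 : VFIO_BE_LEGACY;
}

/*
 * Build the path of the node that is opened first:
 *  - legacy:  /dev/vfio/$group_id, the device fd is then retrieved
 *             through a group fd ioctl
 *  - iommufd: /dev/vfio/devices/vfioX, the opened fd is the device fd
 * 'index' is the group id or the X of vfioX.
 * Returns 0 on success, -1 if buf is too small.
 */
int vfio_node_path(enum vfio_backend be, unsigned int index,
                   char *buf, size_t len)
{
    int n;

    if (be == VFIO_BE_IOMMUFD) {
        n = snprintf(buf, len, "/dev/vfio/devices/vfio%u", index);
    } else {
        n = snprintf(buf, len, "/dev/vfio/%u", index);
    }
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```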

To support both interfaces in the QEMU VFIO device we reworked the vfio
container abstraction so that the generic VFIO code can use either
backend.

The VFIOContainer object becomes a base object derived into
a) the legacy VFIO container and
b) the new iommufd based container.

The base object implements generic code, such as the code related to
memory_listener and address space management, whereas the derived
objects implement callbacks specific to each BE (legacy or
iommufd). Each backend has its own way to set up a secure context
and its own DMA management interface. The diagram below shows how this
looks with both BEs.

                    VFIO                           AddressSpace/Memory
    +-------+  +----------+  +-----+  +-----+
    |  pci  |  | platform |  |  ap |  | ccw |
    +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
        |           |           |        |        |   AddressSpace       |
        |           |           |        |        +------------+---------+
    +---V-----------V-----------V--------V----+               /
    |           VFIOAddressSpace              | <------------+
    |                  |                      |  MemoryListener
    |          VFIOContainer list             |
    +-------+----------------------------+----+
            |                            |
            |                            |
    +-------V------+            +--------V----------+
    |   iommufd    |            |    vfio legacy    |
    |  container   |            |     container     |
    +-------+------+            +--------+----------+
            |                            |
            | /dev/iommu                 | /dev/vfio/vfio
            | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
Userspace   |                            |
============+============================+===========================
Kernel      |  device fd                 |
            +---------------+            | group/container fd
            | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
            |  ATTACH_IOAS) |            | device fd
            |               |            |
            |       +-------V------------V-----------------+
    iommufd |       |                vfio                  |
(map/unmap  |       +---------+--------------------+-------+
ioas_copy)  |                 |                    | map/unmap
            |                 |                    |
     +------V------+    +-----V------+      +------V--------+
     |iommufd core |    |  device    |      |  vfio iommu   |
     +-------------+    +------------+      +---------------+

[Secure Context setup]
- iommufd BE: uses device fd and iommufd to set up secure context
              (bind_iommufd, attach_ioas)
- vfio legacy BE: uses group fd and container fd to set up secure context
                  (set_container, set_iommu)
[Device access]
- iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
- vfio legacy BE: device fd is retrieved from group fd ioctl
[DMA Mapping flow]
1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
2. VFIO populates DMA map/unmap via the container BEs
   *) iommufd BE: uses iommufd
   *) vfio legacy BE: uses container fd
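
The split between generic code and backend callbacks can be sketched as a
small ops table. This is only an illustration of the pattern: the struct
and function names below are hypothetical and do not match the series'
actual definitions, and the backend callbacks are stubs where a real
backend would issue its map/unmap ioctl.

```c
#include <stddef.h>
#include <stdint.h>

struct container;   /* stands in for the base container object */

/* Hypothetical per-backend callback table. */
struct container_ops {
    const char *name;
    int (*dma_map)(struct container *c, uint64_t iova, uint64_t size,
                   void *vaddr);
    int (*dma_unmap)(struct container *c, uint64_t iova, uint64_t size);
};

struct container {
    const struct container_ops *ops;
    unsigned int nr_mappings;   /* generic bookkeeping kept in the base */
};

/* Stub callbacks: they only reject empty ranges, no ioctl is issued. */
static int stub_map(struct container *c, uint64_t iova, uint64_t size,
                    void *vaddr)
{
    (void)c; (void)iova; (void)vaddr;
    return size ? 0 : -1;
}

static int stub_unmap(struct container *c, uint64_t iova, uint64_t size)
{
    (void)c; (void)iova;
    return size ? 0 : -1;
}

/* One table per backend; real backends would use distinct callbacks. */
static const struct container_ops legacy_ops  = { "legacy",  stub_map, stub_unmap };
static const struct container_ops iommufd_ops = { "iommufd", stub_map, stub_unmap };

/* Generic code: dispatches to whichever backend the container uses. */
int container_dma_map(struct container *c, uint64_t iova, uint64_t size,
                      void *vaddr)
{
    int ret = c->ops->dma_map(c, iova, size, vaddr);

    if (ret == 0) {
        c->nr_mappings++;
    }
    return ret;
}

int container_dma_unmap(struct container *c, uint64_t iova, uint64_t size)
{
    int ret = c->ops->dma_unmap(c, iova, size);

    if (ret == 0 && c->nr_mappings > 0) {
        c->nr_mappings--;
    }
    return ret;
}
```

With this shape, the MemoryListener path only ever calls
container_dma_map()/container_dma_unmap() and does not need to know which
backend sits behind the container.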


Changelog:
v2:
- rebase to vfio-next
- split PATCH "vfio: Add base container" in v1 to PATCH1-15 per Cédric
- add fd passing to platform/ap/ccw vfio device
- add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
- rename char_dev.h to chardev_open.h for same naming scheme per Daniel
- add full copyright per Daniel and Jason


Note: the changelog entries below are from the full IOMMUFD series:

v1:
- Alloc hwpt instead of using auto hwpt
- elaborate iommufd code per Nicolin
- consolidate two patches and drop as.c
- typo error fix and function rename

rfcv4:
- rebase on top of v8.0.3
- Add one patch from Yi which is about vfio device add in kvm
- Remove IOAS_COPY optimization and focus on functions in this patchset
- Fix wrong name issue; reported and fix suggested by Matthew
- Fix compilation issue; reported and fix suggested by Nicolin
- Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
  granularity
- Add dev_iter_next() callback to avoid adding so many callback
  at container scope, add VFIODevice.hwpt to support that
- Restore all functions back to common from container whenever possible,
  mainly migration and reset related functions
- Add --enable/disable-iommufd config option, enabled by default in linux
- Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
- Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
- vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
  redundant code
- Add FD passing support for vfio device backed by IOMMUFD
- Fix hot unplug resource leak issue in vfio_legacy_detach_device()
- Fix FD leak in vfio_get_devicefd()

rfcv3:
- rebase on top of v7.2.0
- Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
  VFIO backends
- Fix use after free in error path, reported by Alister
- Split common.c in several steps to ease the review

rfcv2:
- remove the first three patches of rfcv1
- add open cdev helper suggested by Jason
- remove the QOMification of the VFIOContainer and simply use standard ops
  (David)
- add "-object iommufd" suggested by Alex

Thanks
Zhenzhong


Eric Auger (16):
  vfio: Rename VFIOContainer into VFIOLegacyContainer
  vfio: Introduce base object for VFIOContainer and targetted interface
  VFIO/container: Introduce dummy VFIOContainerClass implementation
  vfio/container: Switch to dma_map|unmap API
  vfio/common: Move giommu_list in base container
  vfio/container: Move space field to base container
  vfio/container: switch to IOMMU BE add/del_section_window
  vfio/container: Move hostwin_list in base container
  vfio/container: Switch to IOMMU BE
    set_dirty_page_tracking/query_dirty_bitmap API
  vfio/container: Convert functions to base container
  vfio/container: Move vrdl_list, pgsizes and dma_max_mappings to base
    container
  vfio/container: Move listener to base container
  vfio/container: Move dirty_pgsizes and max_dirty_bitmap_size to base
    container
  vfio/container: Implement attach/detach_device
  backends/iommufd: Introduce the iommufd object
  vfio/pci: Allow the selection of a given iommu backend

Yi Liu (2):
  util/char_dev: Add open_cdev()
  vfio/iommufd: Implement the iommufd backend

Zhenzhong Duan (9):
  vfio/container: Move per container device list in base container
  Add iommufd configure option
  vfio/container: Bypass EEH if iommufd backend
  vfio/pci: Adapt vfio pci hot reset support with iommufd BE
  vfio/pci: Make vfio cdev pre-openable by passing a file handle
  vfio: Allow the selection of a given iommu backend for platform ap and
    ccw
  vfio/platform: Make vfio cdev pre-openable by passing a file handle
  vfio/ap: Make vfio cdev pre-openable by passing a file handle
  vfio/ccw: Make vfio cdev pre-openable by passing a file handle

 MAINTAINERS                           |  13 +
 meson.build                           |   6 +
 qapi/qom.json                         |  18 +-
 include/hw/vfio/vfio-common.h         | 110 ++----
 include/hw/vfio/vfio-container-base.h | 153 ++++++++
 include/hw/vfio/vfio-platform.h       |   1 +
 include/qemu/chardev_open.h           |  16 +
 include/sysemu/iommufd.h              |  46 +++
 backends/iommufd-stub.c               |  59 +++
 backends/iommufd.c                    | 268 +++++++++++++
 hw/vfio/ap.c                          |  37 +-
 hw/vfio/ccw.c                         |  39 +-
 hw/vfio/common.c                      | 274 +++++++------
 hw/vfio/container-base.c              | 150 +++++++
 hw/vfio/container.c                   | 243 +++++++-----
 hw/vfio/helpers.c                     |  33 ++
 hw/vfio/iommufd.c                     | 539 ++++++++++++++++++++++++++
 hw/vfio/pci.c                         | 257 ++++++++++--
 hw/vfio/platform.c                    |  45 ++-
 hw/vfio/spapr.c                       |  23 +-
 util/chardev_open.c                   |  81 ++++
 backends/Kconfig                      |   4 +
 backends/meson.build                  |   5 +
 backends/trace-events                 |  12 +
 hw/vfio/meson.build                   |   4 +
 hw/vfio/trace-events                  |  17 +-
 meson_options.txt                     |   2 +
 qemu-options.hx                       |  13 +
 scripts/meson-buildoptions.sh         |   3 +
 util/meson.build                      |   1 +
 30 files changed, 2122 insertions(+), 350 deletions(-)
 create mode 100644 include/hw/vfio/vfio-container-base.h
 create mode 100644 include/qemu/chardev_open.h
 create mode 100644 include/sysemu/iommufd.h
 create mode 100644 backends/iommufd-stub.c
 create mode 100644 backends/iommufd.c
 create mode 100644 hw/vfio/container-base.c
 create mode 100644 hw/vfio/iommufd.c
 create mode 100644 util/chardev_open.c

-- 
2.34.1



