kvm.vger.kernel.org archive mirror
* [DRAFT RFC] ioeventfd/ioregionfd support in vfio-user
@ 2021-01-21 16:09 John Levon
  2021-01-26 11:01 ` Stefan Hajnoczi
  0 siblings, 1 reply; 3+ messages in thread
From: John Levon @ 2021-01-21 16:09 UTC (permalink / raw)
  To: libvfio-user-devel, kvm, Stefan Hajnoczi, Cornelia Huck,
	john.g.johnson, qemu-devel


Hi, please take a look. For context, this addition is against the current
https://github.com/nutanix/libvfio-user/blob/master/docs/vfio-user.rst
specification.

kvm@ readers: Stefan suggested this may be of interest from a VFIO point of
view, in case there is any potential cross-over in defining how to wire up these
fds.

Note that this is a new message rather than a new region capability for
VFIO_USER_DEVICE_GET_REGION_INFO: with a new capability, there's no way for the
server to know if ioeventfd/ioregionfd is actually used/requested by the client
(a client that didn't want to use them would just have to close the fds). An
explicit new call is much clearer for this.

The ioregionfd feature itself is at proposal stage, so there's a good chance of
further changes depending on that.

I also have these pending issues so far:

1) I'm not familiar with the CCW bus, so I don't know whether the
KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY flag makes sense or is supportable in a
vfio-user context

2) if ioregionfd subsumes all ioeventfd use cases, we can remove the distinction
from this API, and the client caller can translate to ioregionfd or ioeventfd as
needed

3) is it necessary for the client to indicate support (e.g. lacking ioregionfd
support) ?

4) (ioregionfd issue) region_id is 4 bytes, which seems a little awkward from
the server side: this has to encode both the region ID as well as the offset of
the sub-region within that region. Can this be 64 bits wide?

thanks
john

(NB: I edited the diff so the new sections are more readable.)

diff --git a/docs/vfio-user.rst b/docs/vfio-user.rst
index e3adc7a..e7274a2 100644
--- a/docs/vfio-user.rst
+++ b/docs/vfio-user.rst
@@ -161,6 +161,17 @@ in the region info reply of a device-specific region. Such regions are reflected
 in ``struct vfio_device_info.num_regions``. Thus, for PCI devices this value can
 be equal to, or higher than, VFIO_PCI_NUM_REGIONS.
 
+Region I/O via file descriptors
+-------------------------------
+
+For unmapped regions, region I/O from the client is done via
+VFIO_USER_REGION_READ/WRITE.  As an optimization, ioeventfds or ioregionfds may
+be configured for sub-regions of some regions. A client may request information
+on these sub-regions via VFIO_USER_DEVICE_GET_REGION_IO_FDS; by configuring the
+returned file descriptors as ioeventfds or ioregionfds, the server can be
+directly notified of I/O (for example, by KVM) without taking a trip through the
+client.
+
 Interrupts
 ^^^^^^^^^^
 The client uses VFIO_USER_DEVICE_GET_IRQ_INFO messages to query the server for
@@ -293,37 +304,39 @@ Commands
 The following table lists the VFIO message command IDs, and whether the
 message command is sent from the client or the server.
 
-+----------------------------------+---------+-------------------+
-| Name                             | Command | Request Direction |
-+==================================+=========+===================+
-| VFIO_USER_VERSION                | 1       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_MAP                | 2       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_UNMAP              | 3       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_GET_INFO        | 4       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_GET_REGION_INFO | 5       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_GET_IRQ_INFO    | 6       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_SET_IRQS        | 7       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_REGION_READ            | 8       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_REGION_WRITE           | 9       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_READ               | 10      | server -> client  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_WRITE              | 11      | server -> client  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_VM_INTERRUPT           | 12      | server -> client  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_RESET           | 13      | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DIRTY_PAGES            | 14      | client -> server  |
-+----------------------------------+---------+-------------------+
++------------------------------------+---------+-------------------+
+| Name                               | Command | Request Direction |
++====================================+=========+===================+
+| VFIO_USER_VERSION                  | 1       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_MAP                  | 2       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_UNMAP                | 3       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_INFO          | 4       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_REGION_INFO   | 5       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_REGION_IO_FDS | 6       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_IRQ_INFO      | 7       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_SET_IRQS          | 8       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_REGION_READ              | 9       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_REGION_WRITE             | 10      | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_READ                 | 11      | server -> client  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_WRITE                | 12      | server -> client  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_VM_INTERRUPT             | 13      | server -> client  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_RESET             | 14      | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DIRTY_PAGES              | 15      | client -> server  |
++------------------------------------+---------+-------------------+
 
 
 .. Note:: Some VFIO defines cannot be reused since their values are
@@ -1130,6 +1143,161 @@ client must write data on the same order and transction size as read.
 If an error occurs then the server must fail the read or write operation. It is
 an implementation detail of the client how to handle errors.
 
VFIO_USER_DEVICE_GET_REGION_IO_FDS
----------------------------------

Message format
^^^^^^^^^^^^^^

+--------------+------------------------+
| Name         | Value                  |
+==============+========================+
| Message ID   | <ID>                   |
+--------------+------------------------+
| Command      | 6                      |
+--------------+------------------------+
| Message size | 32 + subregion info    |
+--------------+------------------------+
| Flags        | Reply bit set in reply |
+--------------+------------------------+
| Error        | 0/errno                |
+--------------+------------------------+
| Region info  | Region IO FD info      |
+--------------+------------------------+

Clients can access regions via VFIO_USER_REGION_READ/WRITE or, if provided, by
mmap()ing a file descriptor provided by the server.

VFIO_USER_DEVICE_GET_REGION_IO_FDS provides an alternative access mechanism via
file descriptors. This is an optional feature intended for performance
improvements where an underlying sub-system (such as KVM) supports communication
across such file descriptors to the vfio-user server, without needing to
round-trip through the client.

The server returns an array describing sub-regions of the given region. For each
sub-region, the server specifies the requested file descriptor notification
mechanism to use for that sub-region.  Each sub-region in the response message
may choose to use a different method, as defined below.  The two mechanisms
supported in this specification are ioeventfds and ioregionfds.

A client should then hook up the returned file descriptors with the notification
method requested.

Region IO FD info format
^^^^^^^^^^^^^^^^^^^^^^^^

+------------+--------+------+
| Name       | Offset | Size |
+============+========+======+
| argsz      | 16     | 4    |
+------------+--------+------+
| flags      | 20     | 4    |
+------------+--------+------+
| index      | 24     | 4    |
+------------+--------+------+
| count      | 28     | 4    |
+------------+--------+------+
| subregions | 32     | ...  |
+------------+--------+------+

* *argsz* is the size of the region IO FD info structure plus the
  total size of the subregion array. Thus, each array entry "i" is at offset
    i * ((argsz - 32) / count)
* *flags* must be zero
* *index* is the index of the memory region being queried
* *count* is the number of sub-regions in the array
* *subregions* is the array of Sub-Region IO FD info structures

The client must set ``flags`` to zero and specify the region being queried in
the ``index``.
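
As an illustration of the layout above, the following is a minimal sketch of
how a client might pack and unpack the four fixed fields that follow the 16-byte
message header. The function names are hypothetical. Little-endian byte order
and a request ``count`` of zero are assumptions of this sketch, not something
this draft states.

```python
import struct

# Region IO FD info: argsz, flags, index, count (four 32-bit fields).
# Little-endian byte order is an assumption of this sketch.
REGION_IO_FDS_FMT = "<IIII"
REGION_IO_FDS_SIZE = struct.calcsize(REGION_IO_FDS_FMT)  # 16 bytes


def pack_region_io_fds_request(index, max_reply_size):
    """Build the request payload that follows the message header."""
    flags = 0  # the client must set flags to zero
    count = 0  # assumption: left as zero in the request, set by the server
    return struct.pack(REGION_IO_FDS_FMT, max_reply_size, flags, index, count)


def unpack_region_io_fds(payload):
    """Decode the fixed fields; the sub-region array (if any) follows them."""
    argsz, flags, index, count = struct.unpack_from(REGION_IO_FDS_FMT, payload)
    return {"argsz": argsz, "flags": flags, "index": index, "count": count}
```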

The client sets the ``argsz`` field to indicate the maximum size of the response
that the server can send, which must be at least the size of the response header
plus space for the subregion array. If the full response size exceeds ``argsz``,
then the server must respond only with the response header and the Region IO FD
info structure, setting in ``argsz`` the buffer size required to store the full
response. In this case, no file descriptors are passed back.  The client then
retries the operation with a larger receive buffer.
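
The retry behaviour described above could be sketched roughly as follows. The
``exchange`` callback is a hypothetical stand-in for one request/reply round
trip over the vfio-user socket; it is not part of any existing library, and the
bound on attempts is an arbitrary choice for the sketch.

```python
def query_region_io_fds(exchange, index, argsz, max_attempts=3):
    """Query sub-region FDs, growing the receive buffer when the server asks.

    `exchange(index, argsz)` is a hypothetical transport callback returning
    (reply_argsz, subregions, fds) for one request/reply round trip.
    """
    for _ in range(max_attempts):
        reply_argsz, subregions, fds = exchange(index, argsz)
        if reply_argsz <= argsz:
            # The full response fitted: sub-regions and FDs are valid.
            return subregions, fds
        # Truncated reply: only the info structure came back, with no FDs.
        # Retry with the size the server reported it needs.
        argsz = reply_argsz
    raise RuntimeError("server kept asking for a larger buffer")
```

Bounding the number of attempts guards against a server whose required size
keeps growing between calls.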

The reply message will additionally include at least one file descriptor in the
ancillary data. Note that more than one subregion may share the same file
descriptor.

Each sub-region given in the response has one of two possible structures,
depending on whether *type* is ``VFIO_USER_IO_FD_TYPE_IOEVENTFD`` or
``VFIO_USER_IO_FD_TYPE_IOREGIONFD``:

Sub-Region IO FD info format (ioeventfd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| padding   | 28     | 4    |
+-----------+--------+------+
| datamatch | 32     | 8    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region)
* *size* is the length of the sub-region. This may be zero if the access size is
  not relevant, which may allow for optimizations
* *fd_index* is the index in the ancillary data of the FD to use for ioeventfd
  notification; it may be shared.
* *type* is ``VFIO_USER_IO_FD_TYPE_IOEVENTFD``
* *flags* is any of:

  * ``KVM_IOEVENTFD_FLAG_DATAMATCH``
  * ``KVM_IOEVENTFD_FLAG_PIO``
  * ``KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY`` (FIXME: makes sense?)

* *datamatch* is the datamatch value if needed

See https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt, section 4.59
(KVM_IOEVENTFD), for further context on the ioeventfd-specific fields.
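
To make the 40-byte ioeventfd layout concrete, here is a minimal decoding
sketch. The flag bit values are those from Linux's ``<linux/kvm.h>``. The
numeric value of ``VFIO_USER_IO_FD_TYPE_IOEVENTFD``, the little-endian byte
order, and the function name are assumptions, since this draft does not define
them.

```python
import struct

# offset, size (u64); fd_index, type, flags, padding (u32); datamatch (u64).
# Little-endian byte order is an assumption of this sketch.
IOEVENTFD_FMT = "<QQIIIIQ"
IOEVENTFD_SIZE = struct.calcsize(IOEVENTFD_FMT)  # 40 bytes

# Flag bits as defined in Linux's <linux/kvm.h>.
KVM_IOEVENTFD_FLAG_DATAMATCH = 1 << 0
KVM_IOEVENTFD_FLAG_PIO = 1 << 1
KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY = 1 << 3

# Assumed value: the draft does not assign a number to this type.
VFIO_USER_IO_FD_TYPE_IOEVENTFD = 0


def parse_ioeventfd_subregion(buf, fds, pos=0):
    """Decode one ioeventfd sub-region and resolve fd_index against `fds`."""
    (offset, size, fd_index, type_, flags,
     _padding, datamatch) = struct.unpack_from(IOEVENTFD_FMT, buf, pos)
    if type_ != VFIO_USER_IO_FD_TYPE_IOEVENTFD:
        raise ValueError("not an ioeventfd sub-region")
    if fd_index >= len(fds):
        raise ValueError("fd_index outside the received ancillary data")
    has_match = bool(flags & KVM_IOEVENTFD_FLAG_DATAMATCH)
    return {
        "offset": offset,
        "size": size,
        "fd": fds[fd_index],  # several sub-regions may share one FD
        "pio": bool(flags & KVM_IOEVENTFD_FLAG_PIO),
        "datamatch": datamatch if has_match else None,
    }
```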

Sub-Region IO FD info format (ioregionfd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| region_id | 28     | 4    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region)
* *size* is the length of the sub-region. FIXME: may allow zero?
* *fd_index* is the index in the ancillary data of the FD to use for ioregionfd
  messages; it may be shared
* *type* is ``VFIO_USER_IO_FD_TYPE_IOREGIONFD``
* *flags* is any of:

  * ``KVM_IOREGIONFD_FLAG_PIO``
  * ``KVM_IOREGIONFD_FLAG_POSTED_WRITES``

* *region_id* is an opaque value passed back to the server via a message on the
  file descriptor

See https://www.spinics.net/lists/kvm/msg208139.html (FIXME) for further context
on the ioregionfd-specific fields.
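
A matching sketch for the 32-byte ioregionfd layout follows. Both sub-region
formats place *type* at byte offset 20, so a client can inspect that field
before choosing a decoder. The numeric value of
``VFIO_USER_IO_FD_TYPE_IOREGIONFD``, the little-endian byte order, and the
function names are assumptions. The flag bits are passed through undecoded
because ioregionfd is still at proposal stage.

```python
import struct

# offset, size (u64); fd_index, type, flags, region_id (u32).
# Little-endian byte order is an assumption of this sketch.
IOREGIONFD_FMT = "<QQIIII"
IOREGIONFD_SIZE = struct.calcsize(IOREGIONFD_FMT)  # 32 bytes

# Both sub-region layouts keep the type field at the same byte offset.
TYPE_FIELD_OFFSET = 20

# Assumed value: the draft does not assign a number to this type.
VFIO_USER_IO_FD_TYPE_IOREGIONFD = 1


def subregion_type(buf, pos=0):
    """Peek at the type field to decide which layout applies."""
    return struct.unpack_from("<I", buf, pos + TYPE_FIELD_OFFSET)[0]


def parse_ioregionfd_subregion(buf, fds, pos=0):
    """Decode one ioregionfd sub-region and resolve fd_index against `fds`."""
    (offset, size, fd_index, type_,
     flags, region_id) = struct.unpack_from(IOREGIONFD_FMT, buf, pos)
    if type_ != VFIO_USER_IO_FD_TYPE_IOREGIONFD:
        raise ValueError("not an ioregionfd sub-region")
    if fd_index >= len(fds):
        raise ValueError("fd_index outside the received ancillary data")
    return {
        "offset": offset,
        "size": size,
        "fd": fds[fd_index],
        "flags": flags,          # raw bits; values depend on the proposal
        "region_id": region_id,  # opaque to the client, echoed to the server
    }
```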

 VFIO_USER_DEVICE_GET_IRQ_INFO
 -----------------------------
 
@@ -1141,7 +1309,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 6                      |
+| Command      | 7                      |
 +--------------+------------------------+
 | Message size | 32                     |
 +--------------+------------------------+
@@ -1212,7 +1380,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 7                      |
+| Command      | 8                      |
 +--------------+------------------------+
 | Message size | 36 + any data          |
 +--------------+------------------------+
@@ -1370,7 +1538,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 8                      |
+| Command      | 9                      |
 +--------------+------------------------+
 | Message size | 32 + data size         |
 +--------------+------------------------+
@@ -1397,7 +1565,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 9                      |
+| Command      | 10                     |
 +--------------+------------------------+
 | Message size | 32 + data size         |
 +--------------+------------------------+
@@ -1424,7 +1592,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 10                     |
+| Command      | 11                     |
 +--------------+------------------------+
 | Message size | 28 + data size         |
 +--------------+------------------------+
@@ -1451,7 +1619,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 11                     |
+| Command      | 12                     |
 +--------------+------------------------+
 | Message size | 28 + data size         |
 +--------------+------------------------+
@@ -1478,7 +1646,7 @@ Message format
 +================+========================+
 | Message ID     | <ID>                   |
 +----------------+------------------------+
-| Command        | 12                     |
+| Command        | 13                     |
 +----------------+------------------------+
 | Message size   | 20                     |
 +----------------+------------------------+
@@ -1515,7 +1683,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 13                     |
+| Command      | 14                     |
 +--------------+------------------------+
 | Message size | 16                     |
 +--------------+------------------------+
@@ -1537,7 +1705,7 @@ Message format
 +====================+========================+
 | Message ID         | <ID>                   |
 +--------------------+------------------------+
-| Command            | 14                     |
+| Command            | 15                     |
 +--------------------+------------------------+
 | Message size       | 16                     |
 +--------------------+------------------------+


end of thread, other threads:[~2021-01-28 15:02 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-21 16:09 [DRAFT RFC] ioeventfd/ioregionfd support in vfio-user John Levon
2021-01-26 11:01 ` Stefan Hajnoczi
2021-01-28 14:38   ` John Levon
