KVM Archive on lore.kernel.org
 help / color / Atom feed
* [DRAFT RFC] ioeventfd/ioregionfd support in vfio-user
@ 2021-01-21 16:09 John Levon
  2021-01-26 11:01 ` Stefan Hajnoczi
  0 siblings, 1 reply; 3+ messages in thread
From: John Levon @ 2021-01-21 16:09 UTC (permalink / raw)
  To: libvfio-user-devel, kvm, Stefan Hajnoczi, Cornelia Huck,
	john.g.johnson, qemu-devel


Hi, please take a look. For context, this addition is against the current
https://github.com/nutanix/libvfio-user/blob/master/docs/vfio-user.rst
specification.

kvm@ readers: Stefan suggested this may be of interest from a VFIO point of
view, in case there is any potential cross-over in defining how to wire up these
fds.

Note that this is a new message instead of adding a new region capability to
VFIO_USER_DEVICE_GET_REGION_INFO: with a new capability, there's no way for the
server to know if ioeventfd/ioregionfd is actually used/requested by the client
(the client would just have to close those fds if it didn't want to use FDs). An
explicit new call is much clearer for this.

The ioregionfd feature itself is at proposal stage, so there's a good chance of
further changes depending on that.

I also have these pending issues so far:

1) I'm not familiar with CCW bus, so don't know if
KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY flag makes sense or is supportable in
vfio-user context

2) if ioregionfd subsumes all ioeventfd use cases, we can remove the distinction
from this API, and the client caller can translate to ioregionfd or ioeventfd as
needed

3) is it neccessary for the client to indicate support (e.g. lacking ioregionfd
support) ?

4) (ioregionfd issue) region_id is 4 bytes, which seems a little awkward from
the server side: this has to encode both the region ID as well as the offset of
the sub-region within that region. Can this be 64 bits wide?

thanks
john

(NB: I edited the diff so the new sections are more readable.)

diff --git a/docs/vfio-user.rst b/docs/vfio-user.rst
index e3adc7a..e7274a2 100644
--- a/docs/vfio-user.rst
+++ b/docs/vfio-user.rst
@@ -161,6 +161,17 @@ in the region info reply of a device-specific region. Such regions are reflected
 in ``struct vfio_device_info.num_regions``. Thus, for PCI devices this value can
 be equal to, or higher than, VFIO_PCI_NUM_REGIONS.
 
+Region I/O via file descriptors
+-------------------------------
+
+For unmapped regions, region I/O from the client is done via
+VFIO_USER_REGION_READ/WRITE.  As an optimization, ioeventfds or ioregionfds may
+be configured for sub-regions of some regions. A client may request information
+on these sub-regions via VFIO_USER_DEVICE_GET_REGION_IO_FDS; by configuring the
+returned file descriptors as ioeventfds or ioregionfds, the server can be
+directly notified of I/O (for example, by KVM) without taking a trip through the
+client.
+
 Interrupts
 ^^^^^^^^^^
 The client uses VFIO_USER_DEVICE_GET_IRQ_INFO messages to query the server for
@@ -293,37 +304,39 @@ Commands
 The following table lists the VFIO message command IDs, and whether the
 message command is sent from the client or the server.
 
-+----------------------------------+---------+-------------------+
-| Name                             | Command | Request Direction |
-+==================================+=========+===================+
-| VFIO_USER_VERSION                | 1       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_MAP                | 2       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_UNMAP              | 3       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_GET_INFO        | 4       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_GET_REGION_INFO | 5       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_GET_IRQ_INFO    | 6       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_SET_IRQS        | 7       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_REGION_READ            | 8       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_REGION_WRITE           | 9       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_READ               | 10      | server -> client  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_WRITE              | 11      | server -> client  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_VM_INTERRUPT           | 12      | server -> client  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_RESET           | 13      | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DIRTY_PAGES            | 14      | client -> server  |
-+----------------------------------+---------+-------------------+
++------------------------------------+---------+-------------------+
+| Name                               | Command | Request Direction |
++====================================+=========+===================+
+| VFIO_USER_VERSION                  | 1       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_MAP                  | 2       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_UNMAP                | 3       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_INFO          | 4       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_REGION_INFO   | 5       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_REGION_IO_FDS | 6       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_IRQ_INFO      | 7       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_SET_IRQS          | 8       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_REGION_READ              | 9       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_REGION_WRITE             | 10      | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_READ                 | 11      | server -> client  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_WRITE                | 12      | server -> client  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_VM_INTERRUPT             | 13      | server -> client  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_RESET             | 14      | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DIRTY_PAGES              | 15      | client -> server  |
++------------------------------------+---------+-------------------+
 
 
 .. Note:: Some VFIO defines cannot be reused since their values are
@@ -1130,6 +1143,161 @@ client must write data on the same order and transction size as read.
 If an error occurs then the server must fail the read or write operation. It is
 an implementation detail of the client how to handle errors.
 
VFIO_USER_DEVICE_GET_REGION_IO_FDS
----------------------------------

Message format
^^^^^^^^^^^^^^

+--------------+------------------------+
| Name         | Value                  |
+==============+========================+
| Message ID   | <ID>                   |
+--------------+------------------------+
| Command      | 6                      |
+--------------+------------------------+
| Message size | 32 + subregion info    |
+--------------+------------------------+
| Flags        | Reply bit set in reply |
+--------------+------------------------+
| Error        | 0/errno                |
+--------------+------------------------+
| Region info  | Region IO FD info      |
+--------------+------------------------+

Clients can access regions via VFIO_USER_REGION_READ/WRITE or, if provided, by
mmap()ing a file descriptor provided by the server.

VFIO_USER_DEVICE_GET_REGION_IO_FDS provides an alternative access mechanism via
file descriptors. This is an optional feature intended for performance
improvements where an underlying sub-system (such as KVM) supports communication
across such file descriptors to the vfio-user server, without needing to
round-trip through the client.

The server returns an array describing sub-regions of the given region along
with the server specifies a set of sub-regions and the requested file descriptor
notification mechanism to use for that sub-region.  Each sub-region in the
response message may choose to use a different method, as defined below.  The
two mechanisms supported in this specification are ioeventfds and ioregionfds.

A client should then hook up the returned file descriptors with the notification
method requested.

Region IO FD info format
^^^^^^^^^^^^^^^^^^^^^^^^

+------------+--------+------+
| Name       | Offset | Size |
+============+========+======+
| argsz      | 16     | 4    |
+------------+--------+------+
| flags      | 20     | 4    |
+------------+--------+------+
| index      | 24     | 4    |
+------------+--------+------+
| count      | 28     | 4    |
+------------+--------+------+
| subregions | 32     | ...  |
+------------+--------+------+

* *argsz* is the size of the region IO FD info structure plus the
  total size of the subregion array. Thus, each array entry "i" is at offset
    i * ((argsz - 32) / count)
* *flags* must be zero
* *index* is the index of memory region being queried
* *count* is the number of sub-regions in the array
* *subregions* is the array of Sub-Region IO FD info structures

The client must set ``flags`` to zero and specify the region being queried in
the ``index``.

The client sets the ``argsz`` field to indicate the maximum size of the response
that the server can send, which must be at least the size of the response header
plus space for the subregion array. If the full response size exceeds ``argsz``,
then the server must respond only with the response header and the Region IO FD
info structure, setting in ``argsz`` the buffer size required to store the full
response. In this case, no file descriptors are passed back.  The client then
retries the operation with a larger receive buffer.

The reply message will additionally include at least one file descriptor in the
ancillary data. Note that more than one subregion may share the same file
descriptor.

Each sub-region given in the response has one of two possible structures,
depending whether *type* is `VFIO_USER_IO_FD_TYPE_IOEVENTFD` or
`VFIO_USER_IO_FD_TYPE_IOREGIONFD`:

Sub-Region IO FD info format (ioeventfd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| padding   | 28     | 4    |
+-----------+--------+------+
| datamatch | 32     | 8    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region)
* *size* is the length of the sub-region. This may be zero if the access size is
  not relevant, which may allow for optimizations
* *fd_index* is the index in the ancillary data of the FD to use for ioeventfd
  notification; it may be shared.
* *type* is `VFIO_USER_IO_FD_TYPE_IOEVENTFD`
* *flags* is any of:
  * `KVM_IOEVENTFD_FLAG_DATAMATCH`
  * `KVM_IOEVENTFD_FLAG_PIO`
  * `KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY` (FIXME: makes sense?)
* *datamatch* is the datamatch value if needed

See https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt 4.59
KVM_IOEVENTFD for further context on the ioeventfd-specific fields.

Sub-Region IO FD info format (ioregionfd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| region_id | 28     | 4    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region)
* *size* is the length of the sub-region. FIXME: may allow zero?
* *fd_index* is the index in the ancillary data of the FD to use for ioregionfd
  messages; it may be shared
* *type* is `VFIO_USER_IO_FD_TYPE_IOREGIONFD`
* *flags* is any of:
  * `KVM_IOREGIONFD_FLAG_PIO`
  * `KVM_IOREGIONFD_FLAG_POSTED_WRITES`
* *region_id* is an opaque value passed back to the server via a message on the
  file descriptor

See https://www.spinics.net/lists/kvm/msg208139.html (FIXME) for further context
on the ioregionfd-specific fields.

 VFIO_USER_DEVICE_GET_IRQ_INFO
 -----------------------------
 
@@ -1141,7 +1309,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 6                      |
+| Command      | 7                      |
 +--------------+------------------------+
 | Message size | 32                     |
 +--------------+------------------------+
@@ -1212,7 +1380,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 7                      |
+| Command      | 8                      |
 +--------------+------------------------+
 | Message size | 36 + any data          |
 +--------------+------------------------+
@@ -1370,7 +1538,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 8                      |
+| Command      | 9                      |
 +--------------+------------------------+
 | Message size | 32 + data size         |
 +--------------+------------------------+
@@ -1397,7 +1565,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 9                      |
+| Command      | 10                     |
 +--------------+------------------------+
 | Message size | 32 + data size         |
 +--------------+------------------------+
@@ -1424,7 +1592,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 10                     |
+| Command      | 11                     |
 +--------------+------------------------+
 | Message size | 28 + data size         |
 +--------------+------------------------+
@@ -1451,7 +1619,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 11                     |
+| Command      | 12                     |
 +--------------+------------------------+
 | Message size | 28 + data size         |
 +--------------+------------------------+
@@ -1478,7 +1646,7 @@ Message format
 +================+========================+
 | Message ID     | <ID>                   |
 +----------------+------------------------+
-| Command        | 12                     |
+| Command        | 13                     |
 +----------------+------------------------+
 | Message size   | 20                     |
 +----------------+------------------------+
@@ -1515,7 +1683,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 13                     |
+| Command      | 14                     |
 +--------------+------------------------+
 | Message size | 16                     |
 +--------------+------------------------+
@@ -1537,7 +1705,7 @@ Message format
 +====================+========================+
 | Message ID         | <ID>                   |
 +--------------------+------------------------+
-| Command            | 14                     |
+| Command            | 15                     |
 +--------------------+------------------------+
 | Message size       | 16                     |
 +--------------------+------------------------+

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [DRAFT RFC] ioeventfd/ioregionfd support in vfio-user
  2021-01-21 16:09 [DRAFT RFC] ioeventfd/ioregionfd support in vfio-user John Levon
@ 2021-01-26 11:01 ` Stefan Hajnoczi
  2021-01-28 14:38   ` John Levon
  0 siblings, 1 reply; 3+ messages in thread
From: Stefan Hajnoczi @ 2021-01-26 11:01 UTC (permalink / raw)
  To: John Levon
  Cc: libvfio-user-devel, kvm, Cornelia Huck, john.g.johnson, qemu-devel


[-- Attachment #1: Type: text/plain, Size: 19982 bytes --]

On Thu, Jan 21, 2021 at 04:09:09PM +0000, John Levon wrote:
> 
> Hi, please take a look. For context, this addition is against the current
> https://github.com/nutanix/libvfio-user/blob/master/docs/vfio-user.rst
> specification.
> 
> kvm@ readers: Stefan suggested this may be of interest from a VFIO point of
> view, in case there is any potential cross-over in defining how to wire up these
> fds.

After talking to Alex Williamson about including these APIs in the
kernel VFIO ioctls it became clear to me that there is an architectural
difference between classic vfio-pci devices and vfio-user devices:

The current model with kernel VFIO is that the VMM sets up ioeventfds
using ioctl(VFIO_DEVICE_IOEVENTFD). The VMM has device-specific
knowledge that allows it to add ioeventfds for specific hardware
registers.

In vfio-user the VMM has no knowledge of the specific device. Instead
the device tells the VMM how to configure ioeventfds and ioregionfds.
There are a few reasons for this:

1. It is not practical to add knowledge of every vfio-user device
   implementation into the VMM. A cleaner solution is for the device to
   advertise its desired configuration to the VMM instead.

2. The device implementation may prefer a specific ioeventfd or
   ioregionfd configuration to fit its multi-threading architecture. A
   multi-threaded (multi-queue) device implementation may want
   ioeventfds and ioregionfds to be configured so that the right thread
   receives certain MMIO/PIO accesses.

3. The device implementation may want to specify ioregionfd's user_data
   value. This is the application-specific value that is passed along
   with an MMIO/PIO access.

I think mdev fits in somewhere between these two models. I can imagine
complex mdev devices wanting to control the ioeventfd and ioregionfd
configuration just like vfio-user. But in simple cases it doesn't
matter.

I suggest including John's ioeventfd/ioregionfd work into the kernel
VFIO interface so mdev devices can take advantage of it too.

> Note that this is a new message instead of adding a new region capability to
> VFIO_USER_DEVICE_GET_REGION_INFO: with a new capability, there's no way for the
> server to know if ioeventfd/ioregionfd is actually used/requested by the client
> (the client would just have to close those fds if it didn't want to use FDs). An
> explicit new call is much clearer for this.
> 
> The ioregionfd feature itself is at proposal stage, so there's a good chance of
> further changes depending on that.
> 
> I also have these pending issues so far:
> 
> 1) I'm not familiar with CCW bus, so don't know if
> KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY flag makes sense or is supportable in
> vfio-user context
> 
> 2) if ioregionfd subsumes all ioeventfd use cases, we can remove the distinction
> from this API, and the client caller can translate to ioregionfd or ioeventfd as
> needed
> 
> 3) is it neccessary for the client to indicate support (e.g. lacking ioregionfd
> support) ?
> 
> 4) (ioregionfd issue) region_id is 4 bytes, which seems a little awkward from
> the server side: this has to encode both the region ID as well as the offset of
> the sub-region within that region. Can this be 64 bits wide?

The user_data field in Elena's ioregionfd patches is 64 bits. Does this
solve the problem?

> 
> thanks
> john
> 
> (NB: I edited the diff so the new sections are more readable.)
> 
> diff --git a/docs/vfio-user.rst b/docs/vfio-user.rst
> index e3adc7a..e7274a2 100644
> --- a/docs/vfio-user.rst
> +++ b/docs/vfio-user.rst
> @@ -161,6 +161,17 @@ in the region info reply of a device-specific region. Such regions are reflected
>  in ``struct vfio_device_info.num_regions``. Thus, for PCI devices this value can
>  be equal to, or higher than, VFIO_PCI_NUM_REGIONS.
>  
> +Region I/O via file descriptors
> +-------------------------------
> +
> +For unmapped regions, region I/O from the client is done via
> +VFIO_USER_REGION_READ/WRITE.  As an optimization, ioeventfds or ioregionfds may
> +be configured for sub-regions of some regions. A client may request information
> +on these sub-regions via VFIO_USER_DEVICE_GET_REGION_IO_FDS; by configuring the
> +returned file descriptors as ioeventfds or ioregionfds, the server can be
> +directly notified of I/O (for example, by KVM) without taking a trip through the
> +client.
> +
>  Interrupts
>  ^^^^^^^^^^
>  The client uses VFIO_USER_DEVICE_GET_IRQ_INFO messages to query the server for
> @@ -293,37 +304,39 @@ Commands
>  The following table lists the VFIO message command IDs, and whether the
>  message command is sent from the client or the server.
>  
> -+----------------------------------+---------+-------------------+
> -| Name                             | Command | Request Direction |
> -+==================================+=========+===================+
> -| VFIO_USER_VERSION                | 1       | client -> server  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_DMA_MAP                | 2       | client -> server  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_DMA_UNMAP              | 3       | client -> server  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_DEVICE_GET_INFO        | 4       | client -> server  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_DEVICE_GET_REGION_INFO | 5       | client -> server  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_DEVICE_GET_IRQ_INFO    | 6       | client -> server  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_DEVICE_SET_IRQS        | 7       | client -> server  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_REGION_READ            | 8       | client -> server  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_REGION_WRITE           | 9       | client -> server  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_DMA_READ               | 10      | server -> client  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_DMA_WRITE              | 11      | server -> client  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_VM_INTERRUPT           | 12      | server -> client  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_DEVICE_RESET           | 13      | client -> server  |
> -+----------------------------------+---------+-------------------+
> -| VFIO_USER_DIRTY_PAGES            | 14      | client -> server  |
> -+----------------------------------+---------+-------------------+
> ++------------------------------------+---------+-------------------+
> +| Name                               | Command | Request Direction |
> ++====================================+=========+===================+
> +| VFIO_USER_VERSION                  | 1       | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DMA_MAP                  | 2       | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DMA_UNMAP                | 3       | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DEVICE_GET_INFO          | 4       | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DEVICE_GET_REGION_INFO   | 5       | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DEVICE_GET_REGION_IO_FDS | 6       | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DEVICE_GET_IRQ_INFO      | 7       | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DEVICE_SET_IRQS          | 8       | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_REGION_READ              | 9       | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_REGION_WRITE             | 10      | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DMA_READ                 | 11      | server -> client  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DMA_WRITE                | 12      | server -> client  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_VM_INTERRUPT             | 13      | server -> client  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DEVICE_RESET             | 14      | client -> server  |
> ++------------------------------------+---------+-------------------+
> +| VFIO_USER_DIRTY_PAGES              | 15      | client -> server  |
> ++------------------------------------+---------+-------------------+
>  
>  
>  .. Note:: Some VFIO defines cannot be reused since their values are
> @@ -1130,6 +1143,161 @@ client must write data on the same order and transction size as read.
>  If an error occurs then the server must fail the read or write operation. It is
>  an implementation detail of the client how to handle errors.
>  
> VFIO_USER_DEVICE_GET_REGION_IO_FDS
> ----------------------------------
> 
> Message format
> ^^^^^^^^^^^^^^
> 
> +--------------+------------------------+
> | Name         | Value                  |
> +==============+========================+
> | Message ID   | <ID>                   |
> +--------------+------------------------+
> | Command      | 6                      |
> +--------------+------------------------+
> | Message size | 32 + subregion info    |
> +--------------+------------------------+
> | Flags        | Reply bit set in reply |
> +--------------+------------------------+
> | Error        | 0/errno                |
> +--------------+------------------------+
> | Region info  | Region IO FD info      |
> +--------------+------------------------+
> 
> Clients can access regions via VFIO_USER_REGION_READ/WRITE or, if provided, by
> mmap()ing a file descriptor provided by the server.
> 
> VFIO_USER_DEVICE_GET_REGION_IO_FDS provides an alternative access mechanism via
> file descriptors. This is an optional feature intended for performance
> improvements where an underlying sub-system (such as KVM) supports communication
> across such file descriptors to the vfio-user server, without needing to
> round-trip through the client.
> 
> The server returns an array describing sub-regions of the given region along
> with the server specifies a set of sub-regions and the requested file descriptor
> notification mechanism to use for that sub-region.  Each sub-region in the
> response message may choose to use a different method, as defined below.  The
> two mechanisms supported in this specification are ioeventfds and ioregionfds.
> 
> A client should then hook up the returned file descriptors with the notification
> method requested.
> 
> Region IO FD info format
> ^^^^^^^^^^^^^^^^^^^^^^^^
> 
> +------------+--------+------+
> | Name       | Offset | Size |
> +============+========+======+
> | argsz      | 16     | 4    |
> +------------+--------+------+
> | flags      | 20     | 4    |
> +------------+--------+------+
> | index      | 24     | 4    |
> +------------+--------+------+
> | count      | 28     | 4    |
> +------------+--------+------+
> | subregions | 32     | ...  |
> +------------+--------+------+
> 
> * *argsz* is the size of the region IO FD info structure plus the
>   total size of the subregion array. Thus, each array entry "i" is at offset
>     i * ((argsz - 32) / count)
> * *flags* must be zero
> * *index* is the index of memory region being queried
> * *count* is the number of sub-regions in the array
> * *subregions* is the array of Sub-Region IO FD info structures
> 
> The client must set ``flags`` to zero and specify the region being queried in
> the ``index``.
> 
> The client sets the ``argsz`` field to indicate the maximum size of the response
> that the server can send, which must be at least the size of the response header
> plus space for the subregion array. If the full response size exceeds ``argsz``,
> then the server must respond only with the response header and the Region IO FD
> info structure, setting in ``argsz`` the buffer size required to store the full
> response. In this case, no file descriptors are passed back.  The client then
> retries the operation with a larger receive buffer.
> 
> The reply message will additionally include at least one file descriptor in the
> ancillary data. Note that more than one subregion may share the same file
> descriptor.
> 
> Each sub-region given in the response has one of two possible structures,
> depending whether *type* is `VFIO_USER_IO_FD_TYPE_IOEVENTFD` or
> `VFIO_USER_IO_FD_TYPE_IOREGIONFD`:
> 
> Sub-Region IO FD info format (ioeventfd)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> +-----------+--------+------+
> | Name      | Offset | Size |
> +===========+========+======+
> | offset    | 0      | 8    |
> +-----------+--------+------+
> | size      | 8      | 8    |
> +-----------+--------+------+
> | fd_index  | 16     | 4    |
> +-----------+--------+------+
> | type      | 20     | 4    |
> +-----------+--------+------+
> | flags     | 24     | 4    |
> +-----------+--------+------+
> | padding   | 28     | 4    |
> +-----------+--------+------+
> | datamatch | 32     | 8    |
> +-----------+--------+------+
> 
> * *offset* is the offset of the start of the sub-region within the region
>   requested ("physical address offset" for the region)
> * *size* is the length of the sub-region. This may be zero if the access size is
>   not relevant, which may allow for optimizations
> * *fd_index* is the index in the ancillary data of the FD to use for ioeventfd
>   notification; it may be shared.
> * *type* is `VFIO_USER_IO_FD_TYPE_IOEVENTFD`
> * *flags* is any of:
>   * `KVM_IOEVENTFD_FLAG_DATAMATCH`
>   * `KVM_IOEVENTFD_FLAG_PIO`
>   * `KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY` (FIXME: makes sense?)
> * *datamatch* is the datamatch value if needed
> 
> See https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt 4.59
> KVM_IOEVENTFD for further context on the ioeventfd-specific fields.
> 
> Sub-Region IO FD info format (ioregionfd)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> +-----------+--------+------+
> | Name      | Offset | Size |
> +===========+========+======+
> | offset    | 0      | 8    |
> +-----------+--------+------+
> | size      | 8      | 8    |
> +-----------+--------+------+
> | fd_index  | 16     | 4    |
> +-----------+--------+------+
> | type      | 20     | 4    |
> +-----------+--------+------+
> | flags     | 24     | 4    |
> +-----------+--------+------+
> | region_id | 28     | 4    |
> +-----------+--------+------+
> 
> * *offset* is the offset of the start of the sub-region within the region
>   requested ("physical address offset" for the region)
> * *size* is the length of the sub-region. FIXME: may allow zero?
> * *fd_index* is the index in the ancillary data of the FD to use for ioregionfd
>   messages; it may be shared
> * *type* is `VFIO_USER_IO_FD_TYPE_IOREGIONFD`
> * *flags* is any of:
>   * `KVM_IOREGIONFD_FLAG_PIO`
>   * `KVM_IOREGIONFD_FLAG_POSTED_WRITES`
> * *region_id* is an opaque value passed back to the server via a message on the
>   file descriptor
> 
> See https://www.spinics.net/lists/kvm/msg208139.html (FIXME) for further context
> on the ioregionfd-specific fields.
> 
>  VFIO_USER_DEVICE_GET_IRQ_INFO
>  -----------------------------
>  
> @@ -1141,7 +1309,7 @@ Message format
>  +==============+========================+
>  | Message ID   | <ID>                   |
>  +--------------+------------------------+
> -| Command      | 6                      |
> +| Command      | 7                      |
>  +--------------+------------------------+
>  | Message size | 32                     |
>  +--------------+------------------------+
> @@ -1212,7 +1380,7 @@ Message format
>  +==============+========================+
>  | Message ID   | <ID>                   |
>  +--------------+------------------------+
> -| Command      | 7                      |
> +| Command      | 8                      |
>  +--------------+------------------------+
>  | Message size | 36 + any data          |
>  +--------------+------------------------+
> @@ -1370,7 +1538,7 @@ Message format
>  +==============+========================+
>  | Message ID   | <ID>                   |
>  +--------------+------------------------+
> -| Command      | 8                      |
> +| Command      | 9                      |
>  +--------------+------------------------+
>  | Message size | 32 + data size         |
>  +--------------+------------------------+
> @@ -1397,7 +1565,7 @@ Message format
>  +==============+========================+
>  | Message ID   | <ID>                   |
>  +--------------+------------------------+
> -| Command      | 9                      |
> +| Command      | 10                     |
>  +--------------+------------------------+
>  | Message size | 32 + data size         |
>  +--------------+------------------------+
> @@ -1424,7 +1592,7 @@ Message format
>  +==============+========================+
>  | Message ID   | <ID>                   |
>  +--------------+------------------------+
> -| Command      | 10                     |
> +| Command      | 11                     |
>  +--------------+------------------------+
>  | Message size | 28 + data size         |
>  +--------------+------------------------+
> @@ -1451,7 +1619,7 @@ Message format
>  +==============+========================+
>  | Message ID   | <ID>                   |
>  +--------------+------------------------+
> -| Command      | 11                     |
> +| Command      | 12                     |
>  +--------------+------------------------+
>  | Message size | 28 + data size         |
>  +--------------+------------------------+
> @@ -1478,7 +1646,7 @@ Message format
>  +================+========================+
>  | Message ID     | <ID>                   |
>  +----------------+------------------------+
> -| Command        | 12                     |
> +| Command        | 13                     |
>  +----------------+------------------------+
>  | Message size   | 20                     |
>  +----------------+------------------------+
> @@ -1515,7 +1683,7 @@ Message format
>  +==============+========================+
>  | Message ID   | <ID>                   |
>  +--------------+------------------------+
> -| Command      | 13                     |
> +| Command      | 14                     |
>  +--------------+------------------------+
>  | Message size | 16                     |
>  +--------------+------------------------+
> @@ -1537,7 +1705,7 @@ Message format
>  +====================+========================+
>  | Message ID         | <ID>                   |
>  +--------------------+------------------------+
> -| Command            | 14                     |
> +| Command            | 15                     |
>  +--------------------+------------------------+
>  | Message size       | 16                     |
>  +--------------------+------------------------+
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [DRAFT RFC] ioeventfd/ioregionfd support in vfio-user
  2021-01-26 11:01 ` Stefan Hajnoczi
@ 2021-01-28 14:38   ` John Levon
  0 siblings, 0 replies; 3+ messages in thread
From: John Levon @ 2021-01-28 14:38 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: John Levon, libvfio-user-devel, kvm, Cornelia Huck,
	john.g.johnson, qemu-devel

On Tue, Jan 26, 2021 at 11:01:04AM +0000, Stefan Hajnoczi wrote:

> > 4) (ioregionfd issue) region_id is 4 bytes, which seems a little awkward from
> > the server side: this has to encode both the region ID as well as the offset of
> > the sub-region within that region. Can this be 64 bits wide?
> 
> The user_data field in Elena's ioregionfd patches is 64 bits. Does this
> solve the problem?

I had missed this had been changed from the proposal at https://www.spinics.net/lists/kvm/msg208139.html

Thanks for pointing this out.

regards
john

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, back to index

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-01-21 16:09 [DRAFT RFC] ioeventfd/ioregionfd support in vfio-user John Levon
2021-01-26 11:01 ` Stefan Hajnoczi
2021-01-28 14:38   ` John Levon

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git