KVM Archive on lore.kernel.org
 help / color / Atom feed
From: John Levon <john.levon@nutanix.com>
To: "libvfio-user-devel@nongnu.org" <libvfio-user-devel@nongnu.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	"john.g.johnson@oracle.com" <john.g.johnson@oracle.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Subject: [DRAFT RFC] ioeventfd/ioregionfd support in vfio-user
Date: Thu, 21 Jan 2021 16:09:09 +0000
Message-ID: <20210121160905.GA314820@sent> (raw)


Hi, please take a look. For context, this addition is against the current
https://github.com/nutanix/libvfio-user/blob/master/docs/vfio-user.rst
specification.

kvm@ readers: Stefan suggested this may be of interest from a VFIO point of
view, in case there is any potential cross-over in defining how to wire up these
fds.

Note that this is a new message instead of adding a new region capability to
VFIO_USER_DEVICE_GET_REGION_INFO: with a new capability, there's no way for the
server to know if ioeventfd/ioregionfd is actually used/requested by the client
(the client would just have to close those fds if it didn't want to use FDs). An
explicit new call is much clearer for this.

The ioregionfd feature itself is at proposal stage, so there's a good chance of
further changes depending on that.

I also have these pending issues so far:

1) I'm not familiar with CCW bus, so don't know if
KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY flag makes sense or is supportable in
vfio-user context

2) if ioregionfd subsumes all ioeventfd use cases, we can remove the distinction
from this API, and the client caller can translate to ioregionfd or ioeventfd as
needed

3) is it neccessary for the client to indicate support (e.g. lacking ioregionfd
support) ?

4) (ioregionfd issue) region_id is 4 bytes, which seems a little awkward from
the server side: this has to encode both the region ID as well as the offset of
the sub-region within that region. Can this be 64 bits wide?

thanks
john

(NB: I edited the diff so the new sections are more readable.)

diff --git a/docs/vfio-user.rst b/docs/vfio-user.rst
index e3adc7a..e7274a2 100644
--- a/docs/vfio-user.rst
+++ b/docs/vfio-user.rst
@@ -161,6 +161,17 @@ in the region info reply of a device-specific region. Such regions are reflected
 in ``struct vfio_device_info.num_regions``. Thus, for PCI devices this value can
 be equal to, or higher than, VFIO_PCI_NUM_REGIONS.
 
+Region I/O via file descriptors
+-------------------------------
+
+For unmapped regions, region I/O from the client is done via
+VFIO_USER_REGION_READ/WRITE.  As an optimization, ioeventfds or ioregionfds may
+be configured for sub-regions of some regions. A client may request information
+on these sub-regions via VFIO_USER_DEVICE_GET_REGION_IO_FDS; by configuring the
+returned file descriptors as ioeventfds or ioregionfds, the server can be
+directly notified of I/O (for example, by KVM) without taking a trip through the
+client.
+
 Interrupts
 ^^^^^^^^^^
 The client uses VFIO_USER_DEVICE_GET_IRQ_INFO messages to query the server for
@@ -293,37 +304,39 @@ Commands
 The following table lists the VFIO message command IDs, and whether the
 message command is sent from the client or the server.
 
-+----------------------------------+---------+-------------------+
-| Name                             | Command | Request Direction |
-+==================================+=========+===================+
-| VFIO_USER_VERSION                | 1       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_MAP                | 2       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_UNMAP              | 3       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_GET_INFO        | 4       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_GET_REGION_INFO | 5       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_GET_IRQ_INFO    | 6       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_SET_IRQS        | 7       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_REGION_READ            | 8       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_REGION_WRITE           | 9       | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_READ               | 10      | server -> client  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DMA_WRITE              | 11      | server -> client  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_VM_INTERRUPT           | 12      | server -> client  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DEVICE_RESET           | 13      | client -> server  |
-+----------------------------------+---------+-------------------+
-| VFIO_USER_DIRTY_PAGES            | 14      | client -> server  |
-+----------------------------------+---------+-------------------+
++------------------------------------+---------+-------------------+
+| Name                               | Command | Request Direction |
++====================================+=========+===================+
+| VFIO_USER_VERSION                  | 1       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_MAP                  | 2       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_UNMAP                | 3       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_INFO          | 4       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_REGION_INFO   | 5       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_REGION_IO_FDS | 6       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_GET_IRQ_INFO      | 7       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_SET_IRQS          | 8       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_REGION_READ              | 9       | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_REGION_WRITE             | 10      | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_READ                 | 11      | server -> client  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DMA_WRITE                | 12      | server -> client  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_VM_INTERRUPT             | 13      | server -> client  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DEVICE_RESET             | 14      | client -> server  |
++------------------------------------+---------+-------------------+
+| VFIO_USER_DIRTY_PAGES              | 15      | client -> server  |
++------------------------------------+---------+-------------------+
 
 
 .. Note:: Some VFIO defines cannot be reused since their values are
@@ -1130,6 +1143,161 @@ client must write data on the same order and transction size as read.
 If an error occurs then the server must fail the read or write operation. It is
 an implementation detail of the client how to handle errors.
 
VFIO_USER_DEVICE_GET_REGION_IO_FDS
----------------------------------

Message format
^^^^^^^^^^^^^^

+--------------+------------------------+
| Name         | Value                  |
+==============+========================+
| Message ID   | <ID>                   |
+--------------+------------------------+
| Command      | 6                      |
+--------------+------------------------+
| Message size | 32 + subregion info    |
+--------------+------------------------+
| Flags        | Reply bit set in reply |
+--------------+------------------------+
| Error        | 0/errno                |
+--------------+------------------------+
| Region info  | Region IO FD info      |
+--------------+------------------------+

Clients can access regions via VFIO_USER_REGION_READ/WRITE or, if provided, by
mmap()ing a file descriptor provided by the server.

VFIO_USER_DEVICE_GET_REGION_IO_FDS provides an alternative access mechanism via
file descriptors. This is an optional feature intended for performance
improvements where an underlying sub-system (such as KVM) supports communication
across such file descriptors to the vfio-user server, without needing to
round-trip through the client.

The server returns an array describing sub-regions of the given region along
with the server specifies a set of sub-regions and the requested file descriptor
notification mechanism to use for that sub-region.  Each sub-region in the
response message may choose to use a different method, as defined below.  The
two mechanisms supported in this specification are ioeventfds and ioregionfds.

A client should then hook up the returned file descriptors with the notification
method requested.

Region IO FD info format
^^^^^^^^^^^^^^^^^^^^^^^^

+------------+--------+------+
| Name       | Offset | Size |
+============+========+======+
| argsz      | 16     | 4    |
+------------+--------+------+
| flags      | 20     | 4    |
+------------+--------+------+
| index      | 24     | 4    |
+------------+--------+------+
| count      | 28     | 4    |
+------------+--------+------+
| subregions | 32     | ...  |
+------------+--------+------+

* *argsz* is the size of the region IO FD info structure plus the
  total size of the subregion array. Thus, each array entry "i" is at offset
    i * ((argsz - 32) / count)
* *flags* must be zero
* *index* is the index of memory region being queried
* *count* is the number of sub-regions in the array
* *subregions* is the array of Sub-Region IO FD info structures

The client must set ``flags`` to zero and specify the region being queried in
the ``index``.

The client sets the ``argsz`` field to indicate the maximum size of the response
that the server can send, which must be at least the size of the response header
plus space for the subregion array. If the full response size exceeds ``argsz``,
then the server must respond only with the response header and the Region IO FD
info structure, setting in ``argsz`` the buffer size required to store the full
response. In this case, no file descriptors are passed back.  The client then
retries the operation with a larger receive buffer.

The reply message will additionally include at least one file descriptor in the
ancillary data. Note that more than one subregion may share the same file
descriptor.

Each sub-region given in the response has one of two possible structures,
depending whether *type* is `VFIO_USER_IO_FD_TYPE_IOEVENTFD` or
`VFIO_USER_IO_FD_TYPE_IOREGIONFD`:

Sub-Region IO FD info format (ioeventfd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| padding   | 28     | 4    |
+-----------+--------+------+
| datamatch | 32     | 8    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region)
* *size* is the length of the sub-region. This may be zero if the access size is
  not relevant, which may allow for optimizations
* *fd_index* is the index in the ancillary data of the FD to use for ioeventfd
  notification; it may be shared.
* *type* is `VFIO_USER_IO_FD_TYPE_IOEVENTFD`
* *flags* is any of:
  * `KVM_IOEVENTFD_FLAG_DATAMATCH`
  * `KVM_IOEVENTFD_FLAG_PIO`
  * `KVM_IOEVENTFD_FLAG_VIRTIO_CCW_NOTIFY` (FIXME: makes sense?)
* *datamatch* is the datamatch value if needed

See https://www.kernel.org/doc/Documentation/virtual/kvm/api.txt 4.59
KVM_IOEVENTFD for further context on the ioeventfd-specific fields.

Sub-Region IO FD info format (ioregionfd)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+-----------+--------+------+
| Name      | Offset | Size |
+===========+========+======+
| offset    | 0      | 8    |
+-----------+--------+------+
| size      | 8      | 8    |
+-----------+--------+------+
| fd_index  | 16     | 4    |
+-----------+--------+------+
| type      | 20     | 4    |
+-----------+--------+------+
| flags     | 24     | 4    |
+-----------+--------+------+
| region_id | 28     | 4    |
+-----------+--------+------+

* *offset* is the offset of the start of the sub-region within the region
  requested ("physical address offset" for the region)
* *size* is the length of the sub-region. FIXME: may allow zero?
* *fd_index* is the index in the ancillary data of the FD to use for ioregionfd
  messages; it may be shared
* *type* is `VFIO_USER_IO_FD_TYPE_IOREGIONFD`
* *flags* is any of:
  * `KVM_IOREGIONFD_FLAG_PIO`
  * `KVM_IOREGIONFD_FLAG_POSTED_WRITES`
* *region_id* is an opaque value passed back to the server via a message on the
  file descriptor

See https://www.spinics.net/lists/kvm/msg208139.html (FIXME) for further context
on the ioregionfd-specific fields.

 VFIO_USER_DEVICE_GET_IRQ_INFO
 -----------------------------
 
@@ -1141,7 +1309,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 6                      |
+| Command      | 7                      |
 +--------------+------------------------+
 | Message size | 32                     |
 +--------------+------------------------+
@@ -1212,7 +1380,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 7                      |
+| Command      | 8                      |
 +--------------+------------------------+
 | Message size | 36 + any data          |
 +--------------+------------------------+
@@ -1370,7 +1538,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 8                      |
+| Command      | 9                      |
 +--------------+------------------------+
 | Message size | 32 + data size         |
 +--------------+------------------------+
@@ -1397,7 +1565,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 9                      |
+| Command      | 10                     |
 +--------------+------------------------+
 | Message size | 32 + data size         |
 +--------------+------------------------+
@@ -1424,7 +1592,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 10                     |
+| Command      | 11                     |
 +--------------+------------------------+
 | Message size | 28 + data size         |
 +--------------+------------------------+
@@ -1451,7 +1619,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 11                     |
+| Command      | 12                     |
 +--------------+------------------------+
 | Message size | 28 + data size         |
 +--------------+------------------------+
@@ -1478,7 +1646,7 @@ Message format
 +================+========================+
 | Message ID     | <ID>                   |
 +----------------+------------------------+
-| Command        | 12                     |
+| Command        | 13                     |
 +----------------+------------------------+
 | Message size   | 20                     |
 +----------------+------------------------+
@@ -1515,7 +1683,7 @@ Message format
 +==============+========================+
 | Message ID   | <ID>                   |
 +--------------+------------------------+
-| Command      | 13                     |
+| Command      | 14                     |
 +--------------+------------------------+
 | Message size | 16                     |
 +--------------+------------------------+
@@ -1537,7 +1705,7 @@ Message format
 +====================+========================+
 | Message ID         | <ID>                   |
 +--------------------+------------------------+
-| Command            | 14                     |
+| Command            | 15                     |
 +--------------------+------------------------+
 | Message size       | 16                     |
 +--------------------+------------------------+

             reply index

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-21 16:09 John Levon [this message]
2021-01-26 11:01 ` Stefan Hajnoczi
2021-01-28 14:38   ` John Levon

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210121160905.GA314820@sent \
    --to=john.levon@nutanix.com \
    --cc=cohuck@redhat.com \
    --cc=john.g.johnson@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=libvfio-user-devel@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git