From: John Levon <levon@movementarian.org>
To: Thanos Makatos <thanos.makatos@nutanix.com>
Cc: "benjamin.walker@intel.com" <benjamin.walker@intel.com>,
"Elena Ufimtseva" <elena.ufimtseva@oracle.com>,
"jag.raman@oracle.com" <jag.raman@oracle.com>,
"james.r.harris@intel.com" <james.r.harris@intel.com>,
"Swapnil Ingle" <swapnil.ingle@nutanix.com>,
"john.g.johnson@oracle.com" <john.g.johnson@oracle.com>,
"yuvalkashtan@gmail.com" <yuvalkashtan@gmail.com>,
"konrad.wilk@oracle.com" <konrad.wilk@oracle.com>,
"tina.zhang@intel.com" <tina.zhang@intel.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"dgilbert@redhat.com" <dgilbert@redhat.com>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"ismael@linux.com" <ismael@linux.com>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
"Stefan Hajnoczi" <stefanha@redhat.com>,
"Felipe Franciosi" <felipe@nutanix.com>,
"xiuchun.lu@intel.com" <xiuchun.lu@intel.com>,
"tomassetti.andrea@gmail.com" <tomassetti.andrea@gmail.com>,
"changpeng.liu@intel.com" <changpeng.liu@intel.com>,
"Raphael Norwitz" <raphael.norwitz@nutanix.com>,
"Kanth.Ghatraju@oracle.com" <Kanth.Ghatraju@oracle.com>
Subject: Re: [PATCH v5] introduce vfio-user protocol specification
Date: Fri, 30 Oct 2020 17:03:06 +0000
Message-ID: <20201030170306.GA2544852@li1368-133.members.linode.com>
In-Reply-To: <SN1PR02MB3725C85DCD4BF652FF6FBB8D8B170@SN1PR02MB3725.namprd02.prod.outlook.com>
On Wed, Oct 28, 2020 at 04:41:31PM +0000, Thanos Makatos wrote:
> FYI here's v5 of the vfio-user protocol, my --cc in git send-email got messed up somehow
Hi Thanos, this looks great; I just have some minor questions below.
> Command Concurrency
> -------------------
> A client may pipeline multiple commands without waiting for previous command
> replies. The server will process commands in the order they are received.
> A consequence of this is if a client issues a command with the *No_reply* bit,
> then subsequently issues a command without *No_reply*, the older command will
> have been processed before the reply to the younger command is sent by the
> server. The client must be aware of the device's capability to process concurrent
> commands if pipelining is used. For example, pipelining allows multiple client
> threads to concurrently access device memory; the client must ensure these accesses
> obey device semantics.
>
> An example is a frame buffer device, where the device may allow concurrent access
> to different areas of video memory, but may have indeterminate behavior if concurrent
> accesses are performed to command or status registers.
Is it valid for an unrelated server->client message to appear in between a
client->server request/reply, or not? And vice versa? Either way, seems useful
for the spec to say.
> | | +-----+------------+ |
> | | | Bit | Definition | |
> | | +=====+============+ |
> | | | 0-3 | Type | |
> | | +-----+------------+ |
> | | | 4 | No_reply | |
> | | +-----+------------+ |
> | | | 5 | Error | |
> | | +-----+------------+ |
> +----------------+--------+-------------+
> | Error | 12 | 4 |
> +----------------+--------+-------------+
>
> * *Message ID* identifies the message, and is echoed in the command's reply message.
Is it valid to re-use an ID? When/when not?
> * *Error* in a reply message indicates the command being acknowledged had
> an error. In this case, the *Error* field will be valid.
>
> * *Error* in a reply message is a UNIX errno value. It is reserved in a command message.
I'm not quite following why we need a bit flag and an error field. Do you
anticipate a failure, but with errno==0?
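To make the question concrete, the flags layout quoted above could be decoded roughly as follows. This is only a sketch: it assumes the bits are numbered from the least significant bit of an integer flags word, which the quoted table does not state, and `decode_flags` and `reply_errno` are hypothetical names.

```python
# Sketch only: numbering bits from the LSB is an assumption, not spec text.
TYPE_MASK = 0xF          # bits 0-3: Type
NO_REPLY_BIT = 1 << 4    # bit 4: No_reply
ERROR_BIT = 1 << 5       # bit 5: Error


def decode_flags(flags):
    """Split a header flags word into the three fields from the table."""
    return {
        "type": flags & TYPE_MASK,
        "no_reply": bool(flags & NO_REPLY_BIT),
        "error": bool(flags & ERROR_BIT),
    }


def reply_errno(flags, error):
    """Return the errno field if the Error bit is set, else None.

    One possible reading of the text: the Error field is only
    meaningful when the Error bit is set.
    """
    return error if decode_flags(flags)["error"] else None
```

Under this reading a reply with the Error bit set and an errno of 0 is representable, which is the case the question above is about.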
> VFIO_USER_VERSION
> -----------------
>
> +--------------+------------------------+
> | Message size | 16 + version length |
Terminating NUL included?
> +--------------+--------+---------------------------------------------------+
> | Name | Type | Description |
> +==============+========+===================================================+
> | version | object | ``{"major": <number>, "minor": <number>}`` |
> | | | |
> | | | Version supported by the sender, e.g. "0.1". |
It seems quite unlikely but this should specify it's strings not floating point
values maybe?
Definitely applies to max_fds too.
> Common capabilities:
>
> +---------------+------------------------------------------------------------+
> | Name | Description |
> +===============+============================================================+
> | ``max_fds`` | Maximum number of file descriptors that can be received by |
> | | the sender. Optional. |
Could specify the meaning when absent?
By array I presume you mean associative array i.e. an Object. Does the whole
thing look like this:
    {
        "major": ..
        "minor": ..
        "capabilities": {
            "max_fds": ..,
            "migration
        }
    }
or something else?
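A sketch of how a sender might build that payload, to show why the NUL and number-type questions matter. Everything here is an assumption for illustration: the nesting follows the guess above, `build_version_payload` and `message_size` are hypothetical names, major/minor and max_fds are encoded as JSON integers, and the 16 comes from the quoted "16 + version length".

```python
import json

HEADER_SIZE = 16  # taken from the quoted "16 + version length"


def build_version_payload(major, minor, max_fds=None, nul_terminated=False):
    """Encode the version data as JSON bytes (assumed layout)."""
    body = {"major": int(major), "minor": int(minor)}
    if max_fds is not None:
        # Assumed nesting: capabilities is an object keyed by name.
        body["capabilities"] = {"max_fds": int(max_fds)}
    data = json.dumps(body).encode("utf-8")
    if nul_terminated:
        # Whether the terminator counts toward the size is an open question.
        data += b"\x00"
    return data


def message_size(payload):
    """Header size plus the length of the version data."""
    return HEADER_SIZE + len(payload)
```

With this encoding the two choices for the terminator differ by exactly one byte in the message size, so both ends have to agree on it.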
> Versioning and Feature Support
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Upon accepting a connection, the server must send a VFIO_USER_VERSION message
> proposing a protocol version and a set of capabilities. The client compares
> these with the versions and capabilities it supports and sends a
> VFIO_USER_VERSION reply according to the following rules.
I'm curious if there was a specific reason it's this way around, when it seems
more natural for the client to propose first, and the server to reply?
> VFIO_USER_DMA_MAP and VFIO_USER_DMA_UNMAP
> -----------------------------------------
Huge nit, but why are these DMA_*MAP when vfio uses *MAP_DMA ?
> VFIO bitmap format
> * *size* the size for the bitmap, in bytes.
Should this clarify it does *not* include the bitmap header in its size, unlike
other size fields?
> VFIO_USER_DMA_MAP
> """""""""""""""""
> If a DMA region being added can be directly mapped by the server, an array of
> file descriptors must be sent as part of the message meta-data. Each region
> entry must have a corresponding file descriptor.
"Each mappable region entry" ?
> descriptors must be passed as SCM_RIGHTS type ancillary data. Otherwise, if a
> DMA region cannot be directly mapped by the server, it can be accessed by the
> server using VFIO_USER_DMA_READ and VFIO_USER_DMA_WRITE messages, explained in
> `Read and Write Operations`_. A command to map over an existing region must be
> failed by the server with ``EEXIST`` set in error field in the reply.
>
> VFIO_USER_DMA_UNMAP
> """""""""""""""""""
> Upon receiving a VFIO_USER_DMA_UNMAP command, if the file descriptor is mapped
> then the server must release all references to that DMA region before replying,
> which includes potentially in flight DMA transactions. Removing a portion of a
> DMA region is possible. If the VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP bit is set
> in the request, the server must append to the header the ``struct vfio_bitmap``
> received in the command, followed by the bitmap. Thus, the message size the
> client should expect is the size of the header plus the size of
> ``struct vfio_bitmap`` plus ``vfio_bitmap.size`` bytes. Each bit in the bitmap
> represents one page of size ``vfio_bitmap.pgsize``.
I'm finding this makes the sizing a bit confusing between map and unmap. Could
we maybe separate them out, and always define a vfio_bitmap slot for unmap?
Also, shouldn't the client expect sizeof (header) + (nr_table_entries_in_request
* (each vfio_bitmap's size)) in the server's response?
Does the reply header size field reflect this?
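The arithmetic in question, as a sketch. The constants are assumptions, not spec text: a 16-byte message header (the figure quoted for VFIO_USER_VERSION) and a 16-byte ``struct vfio_bitmap`` header made of 64-bit pgsize and size fields. Rounding the bitmap up to a whole byte is also assumed; the spec may well require coarser alignment. The function names are hypothetical.

```python
HEADER_SIZE = 16       # assumed message header size
BITMAP_HDR_SIZE = 16   # assumed: pgsize + size as two 64-bit fields


def bitmap_bytes(region_len, pgsize):
    """Bytes needed for one bit per page, rounded up to a whole byte."""
    pages = -(-region_len // pgsize)   # ceiling division
    return -(-pages // 8)


def unmap_reply_size(bitmap_sizes):
    """Header plus, for each table entry, a vfio_bitmap header and its bitmap."""
    return HEADER_SIZE + sum(BITMAP_HDR_SIZE + size for size in bitmap_sizes)
```

For example, a 1 MiB region with 4 KiB pages needs a 32-byte bitmap, and a reply covering that region plus a 9-page region would be 16 + (16 + 32) + (16 + 2) bytes under these assumptions.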
> VFIO_USER_DMA_WRITE
> -------------------
>
> This command message is sent from the server to the client to write to server
> memory.
"write to client memory"?
> VFIO_USER_DIRY_PAGES
Nit, "DIRTY"
thanks
john