From: John Levon
To: Thanos Makatos
Cc: "benjamin.walker@intel.com", Elena Ufimtseva, "jag.raman@oracle.com",
    "james.r.harris@intel.com", Swapnil Ingle, "john.g.johnson@oracle.com",
    "yuvalkashtan@gmail.com", "konrad.wilk@oracle.com", "tina.zhang@intel.com",
    "qemu-devel@nongnu.org", "dgilbert@redhat.com", Marc-André Lureau,
    "ismael@linux.com", "alex.williamson@redhat.com", Stefan Hajnoczi,
    Felipe Franciosi, "xiuchun.lu@intel.com", "tomassetti.andrea@gmail.com",
    "changpeng.liu@intel.com", Raphael Norwitz, "Kanth.Ghatraju@oracle.com"
Subject: Re: [PATCH v5] introduce vfio-user protocol specification
Date: Fri, 30 Oct 2020 17:03:06 +0000
Message-ID: <20201030170306.GA2544852@li1368-133.members.linode.com>

On Wed, Oct 28, 2020 at 04:41:31PM +0000, Thanos Makatos wrote:

> FYI here's v5 of the vfio-user protocol, my --cc in git send-email got messed up somehow

Hi Thanos, this looks great, I just had some minor questions below.

> Command Concurrency
> -------------------
> A client may pipeline multiple commands without waiting for previous command
> replies. The server will process commands in the order they are received.
> A consequence of this is if a client issues a command with the *No_reply* bit,
> then subsequently issues a command without *No_reply*, the older command will
> have been processed before the reply to the younger command is sent by the
> server. The client must be aware of the device's capability to process concurrent
> commands if pipelining is used. For example, pipelining allows multiple client
> threads to concurrently access device memory; the client must ensure these accesses
> obey device semantics.
>
> An example is a frame buffer device, where the device may allow concurrent access
> to different areas of video memory, but may have indeterminate behavior if concurrent
> accesses are performed to command or status registers.

Is it valid for an unrelated server->client message to appear in between a
client->server request/reply, or not? And vice versa? Either way, seems useful
for the spec to say.

> |                |        | +-----+------------+ |
> |                |        | | Bit | Definition | |
> |                |        | +=====+============+ |
> |                |        | | 0-3 | Type       | |
> |                |        | +-----+------------+ |
> |                |        | | 4   | No_reply   | |
> |                |        | +-----+------------+ |
> |                |        | | 5   | Error      | |
> |                |        | +-----+------------+ |
> +----------------+--------+-------------+
> | Error          | 12     | 4           |
> +----------------+--------+-------------+
>
> * *Message ID* identifies the message, and is echoed in the command's reply message.

Is it valid to re-use an ID? When/when not?

> * *Error* in a reply message indicates the command being acknowledged had
>   an error. In this case, the *Error* field will be valid.
>
> * *Error* in a reply message is a UNIX errno value. It is reserved in a command message.

I'm not quite following why we need a bit flag and an error field. Do you
anticipate a failure, but with errno==0?
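
To make the question concrete, here is a rough sketch of how the quoted flags
layout could be decoded (Type in bits 0-3, No_reply in bit 4, Error in bit 5).
The names `Flags`, `decode_flags` and `error_state_disagrees` are made up for
illustration and are not from the spec; the only thing taken from the quoted
text is the bit positions. The second helper flags the ambiguous case asked
about above, where the Error bit and the errno field disagree.

```python
from typing import NamedTuple


class Flags(NamedTuple):
    """Decoded view of the flags word (hypothetical helper type)."""
    msg_type: int   # bits 0-3
    no_reply: bool  # bit 4
    error: bool     # bit 5


def decode_flags(flags: int) -> Flags:
    """Split a flags word using the bit positions from the quoted table."""
    return Flags(
        msg_type=flags & 0xF,
        no_reply=bool(flags & (1 << 4)),
        error=bool(flags & (1 << 5)),
    )


def error_state_disagrees(flags: int, errno_field: int) -> bool:
    """True when the Error bit and the errno field tell different stories,
    e.g. the bit is set but errno is 0, or errno is set without the bit."""
    return decode_flags(flags).error != (errno_field != 0)
```

If the errno field alone were authoritative, the second helper would never be
needed, which is the point of the question.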

> VFIO_USER_VERSION
> -----------------
>
> +--------------+------------------------+
> | Message size | 16 + version length    |

Terminating NUL included?

> +--------------+--------+---------------------------------------------------+
> | Name         | Type   | Description                                       |
> +==============+========+===================================================+
> | version      | object | ``{"major": , "minor": }``                        |
> |              |        |                                                   |
> |              |        | Version supported by the sender, e.g. "0.1".      |

It seems quite unlikely but this should specify it's strings not floating point
values maybe? Definitely applies to max_fds too.

> Common capabilities:
>
> +---------------+------------------------------------------------------------+
> | Name          | Description                                                |
> +===============+============================================================+
> | ``max_fds``   | Maximum number of file descriptors that can be received by |
> |               | the sender. Optional.                                      |

Could specify the meaning when absent?

By array I presume you mean associative array i.e. an Object. Does the whole
thing look like this:

 {
   "major": ..
   "minor": ..
   "capabilities": {
        "max_fds": ..,
        "migration
   }
 }

or something else?

> Versioning and Feature Support
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> Upon accepting a connection, the server must send a VFIO_USER_VERSION message
> proposing a protocol version and a set of capabilities. The client compares
> these with the versions and capabilities it supports and sends a
> VFIO_USER_VERSION reply according to the following rules.

I'm curious if there was a specific reason it's this way around, when it seems
more natural for the client to propose first, and the server to reply?

> VFIO_USER_DMA_MAP and VFIO_USER_DMA_UNMAP
> -----------------------------------------

Huge nit, but why are these DMA_*MAP when vfio uses *MAP_DMA ?

> VFIO bitmap format
> * *size* the size for the bitmap, in bytes.

Should this clarify it does *not* include the bitmap header in its size, unlike
other size fields?
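
As a sketch of why the distinction matters, the two readings of *size* give
different byte counts for the same region. Everything named here is
hypothetical: `BITMAP_HEADER_SIZE` assumes the bitmap header is an 8-byte
pgsize followed by an 8-byte size, and the helpers assume one bit per page with
no extra alignment. A real implementation might additionally round the bitmap
up to a multiple of 8 bytes; that is not modelled.

```python
# Assumption for illustration only: header = 8-byte pgsize + 8-byte size.
BITMAP_HEADER_SIZE = 16


def bitmap_bytes(region_len: int, pgsize: int) -> int:
    """Bytes needed for one bit per page, excluding the bitmap header."""
    if region_len < 0 or pgsize <= 0:
        raise ValueError("region_len must be >= 0 and pgsize must be > 0")
    pages = -(-region_len // pgsize)  # ceiling division
    return -(-pages // 8)             # 8 pages tracked per byte


def bitmap_bytes_with_header(region_len: int, pgsize: int) -> int:
    """The other reading: *size* also counts the bitmap header."""
    return BITMAP_HEADER_SIZE + bitmap_bytes(region_len, pgsize)
```

For a 1 MiB region with 4 KiB pages the two readings differ by exactly the
assumed header size, so a client sizing its receive buffer from the wrong one
would be off by that amount.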

> VFIO_USER_DMA_MAP
> """""""""""""""""
> If a DMA region being added can be directly mapped by the server, an array of
> file descriptors must be sent as part of the message meta-data. Each region
> entry must have a corresponding file descriptor.

"Each mappable region entry" ?

> descriptors must be passed as SCM_RIGHTS type ancillary data. Otherwise, if a
> DMA region cannot be directly mapped by the server, it can be accessed by the
> server using VFIO_USER_DMA_READ and VFIO_USER_DMA_WRITE messages, explained in
> `Read and Write Operations`_. A command to map over an existing region must be
> failed by the server with ``EEXIST`` set in error field in the reply.
>
> VFIO_USER_DMA_UNMAP
> """""""""""""""""""
> Upon receiving a VFIO_USER_DMA_UNMAP command, if the file descriptor is mapped
> then the server must release all references to that DMA region before replying,
> which includes potentially in flight DMA transactions. Removing a portion of a
> DMA region is possible. If the VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP bit is set
> in the request, the server must append to the header the ``struct vfio_bitmap``
> received in the command, followed by the bitmap. Thus, the message size the
> client should expect is the size of the header plus the size of
> ``struct vfio_bitmap`` plus ``vfio_bitmap.size`` bytes. Each bit in the bitmap
> represents one page of size ``vfio_bitmap.pgsize``.

I'm finding this makes the sizing a bit confusing between map and unmap, could
we maybe separate them out, and always define a vfio_bitmap slot for unmap?

Also, shouldn't the client expect sizeof (header) + (nr_table_entries_in_request *
(each vfio_bitmap's size)) in the server's response? Does the reply header size
field reflect this?

> VFIO_USER_DMA_WRITE
> -------------------
>
> This command message is sent from the server to the client to write to server
> memory.

"write to client memory"?

> VFIO_USER_DIRY_PAGES

Nit, "DIRTY"

thanks
john