kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Cornelia Huck <cohuck@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>, alex.williamson@redhat.com
Cc: jgg@nvidia.com, corbet@lwn.net, kvm@vger.kernel.org,
	linux-doc@vger.kernel.org, farman@linux.ibm.com,
	mjrosato@linux.ibm.com, pasic@linux.ibm.com
Subject: Re: [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state
Date: Mon, 20 Dec 2021 18:38:26 +0100	[thread overview]
Message-ID: <87v8zjp46l.fsf@redhat.com> (raw)
In-Reply-To: <163909282574.728533.7460416142511440919.stgit@omen>

On Thu, Dec 09 2021, Alex Williamson <alex.williamson@redhat.com> wrote:

> A new NDMA state is being proposed to support a quiescent state for
> contexts containing multiple devices with peer-to-peer DMA support.
> Formally define it.

[I'm wondering if we would want to use NDMA in other cases as well. Just
thinking out loud below.]

>
> Clarify various aspects of the migration region data fields and
> protocol.  Remove QEMU related terminology and flows from the uAPI;
> these will be provided in Documentation/ so as not to confuse the
> device_state bitfield with a finite state machine with restricted
> state transitions.
>
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  include/uapi/linux/vfio.h |  405 ++++++++++++++++++++++++---------------------
>  1 file changed, 214 insertions(+), 191 deletions(-)
>
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index ef33ea002b0b..1fdbc928f886 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h

(...)

> + *   The device_state field defines the following bitfield use:
> + *
> + *     - Bit 0 (RUNNING) [REQUIRED]:
> + *        - Setting this bit indicates the device is fully operational, the
> + *          device may generate interrupts, DMA, respond to MMIO, all vfio
> + *          device regions are functional, and the device may advance its
> + *          internal state.  The default device_state must indicate the device
> + *          in exclusively the RUNNING state, with no other bits in this field
> + *          set.
> + *        - Clearing this bit (ie. !RUNNING) must stop the operation of the
> + *          device.  The device must not generate interrupts, DMA, or advance
> + *          its internal state.  The user should take steps to restrict access
> + *          to vfio device regions other than the migration region while the
> + *          device is !RUNNING or risk corruption of the device migration data
> + *          stream.  The device and kernel migration driver must accept and
> + *          respond to interaction to support external subsystems in the
> + *          !RUNNING state, for example PCI MSI-X and PCI config space.
> + *          Failure by the user to restrict device access while !RUNNING must
> + *          not result in error conditions outside the user context (ex.
> + *          host system faults).

If I consider ccw, this would mean that user space would need to stop
writing to the regions that initiate start/halt/... when RUNNING is
cleared (makes sense) and that the subchannel must be idle or even
disabled (so that it does not become status pending). The question is,
does it make sense to stop new requests and wait for the subchannel to
become idle during the !RUNNING transition (or even forcefully kill
outstanding I/O), or...

> + *     - Bit 1 (SAVING) [REQUIRED]:
> + *        - Setting this bit enables and initializes the migration region data
> + *          window and associated fields within vfio_device_migration_info for
> + *          capturing the migration data stream for the device.  The migration
> + *          driver may perform actions such as enabling dirty logging of device
> + *          state with this bit.  The SAVING bit is mutually exclusive with the
> + *          RESUMING bit defined below.
> + *        - Clearing this bit (ie. !SAVING) de-initializes the migration region
> + *          data window and indicates the completion or termination of the
> + *          migration data stream for the device.
> + *     - Bit 2 (RESUMING) [REQUIRED]:
> + *        - Setting this bit enables and initializes the migration region data
> + *          window and associated fields within vfio_device_migration_info for
> + *          restoring the device from a migration data stream captured from a
> + *          SAVING session with a compatible device.  The migration driver may
> + *          perform internal device resets as necessary to reinitialize the
> + *          internal device state for the incoming migration data.
> + *        - Clearing this bit (ie. !RESUMING) de-initializes the migration
> + *          region data window and indicates the end of a resuming session for
> + *          the device.  The kernel migration driver should complete the
> + *          incorporation of data written to the migration data window into the
> + *          device internal state and perform final validity and consistency
> + *          checking of the new device state.  If the user provided data is
> + *          found to be incomplete, inconsistent, or otherwise invalid, the
> + *          migration driver must indicate a write(2) error and follow the
> + *          previously described protocol to return either the previous state
> + *          or an error state.
> + *     - Bit 3 (NDMA) [OPTIONAL]:
> + *        The NDMA or "No DMA" state is intended to be a quiescent state for
> + *        the device for the purposes of managing multiple devices within a
> + *        user context where peer-to-peer DMA between devices may be active.
> + *        Support for the NDMA bit is indicated through the presence of the
> + *        VFIO_REGION_INFO_CAP_MIG_NDMA capability as reported by
> + *        VFIO_DEVICE_GET_REGION_INFO for the associated device migration
> + *        region.
> + *        - Setting this bit must prevent the device from initiating any
> + *          new DMA or interrupt transactions.  The migration driver must
> + *          complete any such outstanding operations prior to completing
> + *          the transition to the NDMA state.  The NDMA device_state
> + *          essentially represents a sub-set of the !RUNNING state for the
> + *          purpose of quiescing the device, therefore the NDMA device_state
> + *          bit is superfluous in combinations including !RUNNING.
> + *        - Clearing this bit (ie. !NDMA) negates the device operational
> + *          restrictions required by the NDMA state.

...should we use NDMA as the "stop new requests" state, but allow
running channel programs to conclude? I'm not entirely sure whether
that's in the spirit of NDMA (subchannels are independent of each
other), but it would be kind of "quiescing" already.

(We should probably clarify things like that in the Documentation/
file.)


  parent reply	other threads:[~2021-12-20 17:38 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-09 23:34 [RFC PATCH] vfio: Update/Clarify migration uAPI, add NDMA state Alex Williamson
2021-12-10  1:25 ` Jason Gunthorpe
2021-12-13 20:40   ` Alex Williamson
2021-12-14 12:08     ` Cornelia Huck
2021-12-14 16:26     ` Jason Gunthorpe
2021-12-20 22:26       ` Alex Williamson
2022-01-04 20:28         ` Jason Gunthorpe
2022-01-06 18:17           ` Alex Williamson
2022-01-06 21:20             ` Jason Gunthorpe
2022-01-10  7:55               ` Tian, Kevin
2022-01-10 17:34                 ` Alex Williamson
2022-01-11  2:41                   ` Tian, Kevin
2022-01-10 18:11                 ` Jason Gunthorpe
2022-01-11  3:14                   ` Tian, Kevin
2022-01-11 18:19                     ` Jason Gunthorpe
2022-01-04  3:49       ` Tian, Kevin
2022-01-04 16:09         ` Jason Gunthorpe
2022-01-05  1:59           ` Tian, Kevin
2022-01-05 12:45             ` Jason Gunthorpe
2022-01-06  6:32               ` Tian, Kevin
2022-01-06 15:42                 ` Jason Gunthorpe
2022-01-07  0:00                   ` Tian, Kevin
2022-01-07  0:29                     ` Jason Gunthorpe
2022-01-07  2:01                       ` Tian, Kevin
2022-01-07 17:23                         ` Jason Gunthorpe
2022-01-10  3:14                           ` Tian, Kevin
2022-01-10 17:52                             ` Jason Gunthorpe
2022-01-11  2:57                               ` Tian, Kevin
2022-01-05  3:06           ` Tian, Kevin
2021-12-20 17:38 ` Cornelia Huck [this message]
2021-12-20 22:49   ` Alex Williamson
2021-12-21 11:24     ` Cornelia Huck
2022-01-07  8:03 ` Tian, Kevin
2022-01-07 16:36   ` Alex Williamson
2022-01-10  6:01     ` Tian, Kevin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87v8zjp46l.fsf@redhat.com \
    --to=cohuck@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=corbet@lwn.net \
    --cc=farman@linux.ibm.com \
    --cc=jgg@nvidia.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=mjrosato@linux.ibm.com \
    --cc=pasic@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).