kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Cornelia Huck <cohuck@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>, <cjia@nvidia.com>,
	<kevin.tian@intel.com>, <ziye.yang@intel.com>,
	<changpeng.liu@intel.com>, <yi.l.liu@intel.com>,
	<mlevitsk@redhat.com>, <eskultet@redhat.com>,
	<dgilbert@redhat.com>, <jonathan.davies@nutanix.com>,
	<eauger@redhat.com>, <aik@ozlabs.ru>, <pasic@linux.ibm.com>,
	<felipe@nutanix.com>, <Zhengxiao.zx@Alibaba-inc.com>,
	<shuangtai.tst@alibaba-inc.com>, <Ken.Xue@amd.com>,
	<zhi.a.wang@intel.com>, <yan.y.zhao@intel.com>,
	<qemu-devel@nongnu.org>, <kvm@vger.kernel.org>
Subject: Re: [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state
Date: Wed, 13 Nov 2019 11:24:17 +0100	[thread overview]
Message-ID: <20191113112417.6e40ce96.cohuck@redhat.com> (raw)
In-Reply-To: <20191112153005.53bf324c@x1.home>

On Tue, 12 Nov 2019 15:30:05 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Tue, 12 Nov 2019 22:33:36 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
> > - Defined MIGRATION region type and sub-type.
> > - Used 3 bits to define VFIO device states.
> >     Bit 0 => _RUNNING
> >     Bit 1 => _SAVING
> >     Bit 2 => _RESUMING
> >     Combination of these bits defines VFIO device's state during migration
> >     _RUNNING => Normal VFIO device running state. When its reset, it
> > 		indicates _STOPPED state. when device is changed to
> > 		_STOPPED, driver should stop device before write()
> > 		returns.
> >     _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
> >                           start saving state of device i.e. pre-copy state
> >     _SAVING  => vCPUs are stopped, VFIO device should be stopped, and  
> 
> s/should/must/
> 
> >                 save device state,i.e. stop-n-copy state
> >     _RESUMING => VFIO device resuming state.
> >     _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states  
> 
> A table might be useful here and in the uapi header to indicate valid
> states:

I like that.

> 
> | _RESUMING | _SAVING | _RUNNING | Description
> +-----------+---------+----------+------------------------------------------
> |     0     |    0    |     0    | Stopped, not saving or resuming (a)
> +-----------+---------+----------+------------------------------------------
> |     0     |    0    |     1    | Running, default state
> +-----------+---------+----------+------------------------------------------
> |     0     |    1    |     0    | Stopped, migration interface in save mode
> +-----------+---------+----------+------------------------------------------
> |     0     |    1    |     1    | Running, save mode interface, iterative
> +-----------+---------+----------+------------------------------------------
> |     1     |    0    |     0    | Stopped, migration resume interface active
> +-----------+---------+----------+------------------------------------------
> |     1     |    0    |     1    | Invalid (b)
> +-----------+---------+----------+------------------------------------------
> |     1     |    1    |     0    | Invalid (c)
> +-----------+---------+----------+------------------------------------------
> |     1     |    1    |     1    | Invalid (d)
> 
> I think we need to consider whether we define (a) as generally
> available, for instance we might want to use it for diagnostics or a
> fatal error condition outside of migration.
> 
> Are there hidden assumptions between state transitions here or are
> there specific next possible state diagrams that we need to include as
> well?

Some kind of state-change diagram might be useful in addition to the
textual description anyway. Let me try, just to make sure I understand
this correctly:

1) 0/0/1 ---(trigger driver to start gathering state info)---> 0/1/1
2) 0/0/1 ---(tell driver to stop)---> 0/0/0
3) 0/1/1 ---(tell driver to stop)---> 0/1/0
4) 0/0/1 ---(tell driver to resume with provided info)---> 1/0/0
5) 1/0/0 ---(driver is ready)---> 0/0/1
6) 0/1/1 ---(tell driver to stop saving)---> 0/0/1

Not sure about the usefulness of 2). Also, is 4) the only way to
trigger resuming? And is the change in 5) performed by the driver, or
by userspace?

Are any other state transitions valid?

(...)

> > + * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy phase
> > + * and for _SAVING device state or stop-and-copy phase:
> > + * a. read pending_bytes. If pending_bytes > 0, go through below steps.
> > + * b. read data_offset, indicates kernel driver to write data to staging buffer.
> > + *    Kernel driver should return this read operation only after writing data to
> > + *    staging buffer is done.  
> 
> "staging buffer" implies a vendor driver implementation, perhaps we
> could just state that data is available from (region + data_offset) to
> (region + data_offset + data_size) upon return of this read operation.
> 
> > + * c. read data_size, amount of data in bytes written by vendor driver in
> > + *    migration region.
> > + * d. read data_size bytes of data from data_offset in the migration region.
> > + * e. process data.
> > + * f. Loop through a to e. Next read on pending_bytes indicates that read data
> > + *    operation from migration region for previous iteration is done.  
> 
> I think this indicate that step (f) should be to read pending_bytes, the
> read sequence is not complete until this step.  Optionally the user can
> then proceed to step (b).  There are no read side-effects of (a) afaict.
> 
> Is the use required to reach pending_bytes == 0 before changing
> device_state, particularly transitioning to !_RUNNING?  Presumably the
> user can exit this sequence at any time by clearing _SAVING.

That would be transition 6) above (abort saving and continue). I think
it makes sense not to forbid this.

> 
> > + *
> > + * Sequence to be followed while _RESUMING device state:
> > + * While data for this device is available, repeat below steps:
> > + * a. read data_offset from where user application should write data.
> > + * b. write data of data_size to migration region from data_offset.
> > + * c. write data_size which indicates vendor driver that data is written in
> > + *    staging buffer. Vendor driver should read this data from migration
> > + *    region and resume device's state.  
> 
> The device defaults to _RUNNING state, so a prerequisite is to set
> _RESUMING and clear _RUNNING, right?

Transition 4) above. Do we need
7) 0/0/0 ---(tell driver to resume with provided info)---> 1/0/0
as well? (Probably depends on how sensible the 0/0/0 state is.)

> 
> > + *
> > + * For user application, data is opaque. User should write data in the same
> > + * order as received.
> > + */


  parent reply	other threads:[~2019-11-13 10:24 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-12 17:03 [PATCH v9 Kernel 0/5] Add KABIs to support migration for VFIO devices Kirti Wankhede
2019-11-12 17:03 ` [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-13  3:23     ` Yan Zhao
2019-11-13 19:02       ` Kirti Wankhede
2019-11-14  0:36         ` Yan Zhao
2019-11-14 18:55           ` Kirti Wankhede
2019-11-13 10:24     ` Cornelia Huck [this message]
2019-11-13 18:27       ` Alex Williamson
2019-11-13 19:29         ` Kirti Wankhede
2019-11-13 19:48           ` Alex Williamson
2019-11-13 20:17             ` Kirti Wankhede
2019-11-13 20:40               ` Alex Williamson
2019-11-14 18:49                 ` Kirti Wankhede
2019-11-12 17:03 ` [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-13 19:37     ` Kirti Wankhede
2019-11-13 20:07       ` Alex Williamson
2019-11-14 18:56         ` Kirti Wankhede
2019-11-14 21:06           ` Alex Williamson
2019-11-15  2:40             ` Yan Zhao
2019-11-15  3:21               ` Alex Williamson
2019-11-15  5:10                 ` Tian, Kevin
2019-11-19 23:16                   ` Alex Williamson
2019-11-20  1:04                     ` Tian, Kevin
2019-11-20  1:51                 ` Yan Zhao
2019-11-26  0:57             ` Yan Zhao
2019-12-03 18:04               ` Alex Williamson
2019-12-04 18:10                 ` Kirti Wankhede
2019-12-04 18:34                   ` Alex Williamson
2019-12-05  1:28                     ` Yan Zhao
2019-12-05  5:42                       ` Kirti Wankhede
2019-12-05  5:47                         ` Yan Zhao
2019-12-05  5:56                         ` Alex Williamson
2019-12-05  6:19                           ` Kirti Wankhede
2019-12-05  6:40                             ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 3/5] vfio iommu: Add ioctl defination to unmap IOVA and return dirty bitmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-13 19:52     ` Kirti Wankhede
2019-11-13 20:22       ` Alex Williamson
2019-11-14 18:56         ` Kirti Wankhede
2019-11-14 21:08           ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 4/5] vfio iommu: Implementation of ioctl to get dirty pages bitmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 5/5] vfio iommu: Implementation of ioctl to get dirty bitmap before unmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191113112417.6e40ce96.cohuck@redhat.com \
    --to=cohuck@redhat.com \
    --cc=Ken.Xue@amd.com \
    --cc=Zhengxiao.zx@Alibaba-inc.com \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=changpeng.liu@intel.com \
    --cc=cjia@nvidia.com \
    --cc=dgilbert@redhat.com \
    --cc=eauger@redhat.com \
    --cc=eskultet@redhat.com \
    --cc=felipe@nutanix.com \
    --cc=jonathan.davies@nutanix.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=mlevitsk@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shuangtai.tst@alibaba-inc.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yi.l.liu@intel.com \
    --cc=zhi.a.wang@intel.com \
    --cc=ziye.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).