kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: Kirti Wankhede <kwankhede@nvidia.com>, <cjia@nvidia.com>,
	<kevin.tian@intel.com>, <ziye.yang@intel.com>,
	<changpeng.liu@intel.com>, <yi.l.liu@intel.com>,
	<mlevitsk@redhat.com>, <eskultet@redhat.com>,
	<dgilbert@redhat.com>, <jonathan.davies@nutanix.com>,
	<eauger@redhat.com>, <aik@ozlabs.ru>, <pasic@linux.ibm.com>,
	<felipe@nutanix.com>, <Zhengxiao.zx@Alibaba-inc.com>,
	<shuangtai.tst@alibaba-inc.com>, <Ken.Xue@amd.com>,
	<zhi.a.wang@intel.com>, <yan.y.zhao@intel.com>,
	<qemu-devel@nongnu.org>, <kvm@vger.kernel.org>
Subject: Re: [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state
Date: Wed, 13 Nov 2019 11:27:33 -0700	[thread overview]
Message-ID: <20191113112733.49542ebc@x1.home> (raw)
In-Reply-To: <20191113112417.6e40ce96.cohuck@redhat.com>

On Wed, 13 Nov 2019 11:24:17 +0100
Cornelia Huck <cohuck@redhat.com> wrote:

> On Tue, 12 Nov 2019 15:30:05 -0700
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Tue, 12 Nov 2019 22:33:36 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> > > - Defined MIGRATION region type and sub-type.
> > > - Used 3 bits to define VFIO device states.
> > >     Bit 0 => _RUNNING
> > >     Bit 1 => _SAVING
> > >     Bit 2 => _RESUMING
> > >     Combination of these bits defines VFIO device's state during migration
> > >     _RUNNING => Normal VFIO device running state. When its reset, it
> > > 		indicates _STOPPED state. when device is changed to
> > > 		_STOPPED, driver should stop device before write()
> > > 		returns.
> > >     _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
> > >                           start saving state of device i.e. pre-copy state
> > >     _SAVING  => vCPUs are stopped, VFIO device should be stopped, and    
> > 
> > s/should/must/
> >   
> > >                 save device state,i.e. stop-n-copy state
> > >     _RESUMING => VFIO device resuming state.
> > >     _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states    
> > 
> > A table might be useful here and in the uapi header to indicate valid
> > states:  
> 
> I like that.
> 
> > 
> > | _RESUMING | _SAVING | _RUNNING | Description
> > +-----------+---------+----------+------------------------------------------
> > |     0     |    0    |     0    | Stopped, not saving or resuming (a)
> > +-----------+---------+----------+------------------------------------------
> > |     0     |    0    |     1    | Running, default state
> > +-----------+---------+----------+------------------------------------------
> > |     0     |    1    |     0    | Stopped, migration interface in save mode
> > +-----------+---------+----------+------------------------------------------
> > |     0     |    1    |     1    | Running, save mode interface, iterative
> > +-----------+---------+----------+------------------------------------------
> > |     1     |    0    |     0    | Stopped, migration resume interface active
> > +-----------+---------+----------+------------------------------------------
> > |     1     |    0    |     1    | Invalid (b)
> > +-----------+---------+----------+------------------------------------------
> > |     1     |    1    |     0    | Invalid (c)
> > +-----------+---------+----------+------------------------------------------
> > |     1     |    1    |     1    | Invalid (d)
> > 
> > I think we need to consider whether we define (a) as generally
> > available, for instance we might want to use it for diagnostics or a
> > fatal error condition outside of migration.
> > 
> > Are there hidden assumptions between state transitions here or are
> > there specific next possible state diagrams that we need to include as
> > well?  
> 
> Some kind of state-change diagram might be useful in addition to the
> textual description anyway. Let me try, just to make sure I understand
> this correctly:
> 
> 1) 0/0/1 ---(trigger driver to start gathering state info)---> 0/1/1
> 2) 0/0/1 ---(tell driver to stop)---> 0/0/0
> 3) 0/1/1 ---(tell driver to stop)---> 0/1/0
> 4) 0/0/1 ---(tell driver to resume with provided info)---> 1/0/0

I think this is to switch into resuming mode, the data will follow

> 5) 1/0/0 ---(driver is ready)---> 0/0/1
> 6) 0/1/1 ---(tell driver to stop saving)---> 0/0/1

I think also:

0/0/1 --> 0/1/0 If user chooses to go directly to stop and copy

0/0/0 and 0/0/1 should be reachable from any state, though I could see
that a vendor driver could fail transition from 1/0/0 -> 0/0/1 if the
received state is incomplete.  Somehow though a user always needs to
return the device to the initial state, so how does device_state
interact with the reset ioctl?  Would this automatically manipulate
device_state back to 0/0/1?
 
> Not sure about the usefulness of 2). Also, is 4) the only way to
> trigger resuming? And is the change in 5) performed by the driver, or
> by userspace?
> 
> Are any other state transitions valid?
> 
> (...)
> 
> > > + * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy phase
> > > + * and for _SAVING device state or stop-and-copy phase:
> > > + * a. read pending_bytes. If pending_bytes > 0, go through below steps.
> > > + * b. read data_offset, indicates kernel driver to write data to staging buffer.
> > > + *    Kernel driver should return this read operation only after writing data to
> > > + *    staging buffer is done.    
> > 
> > "staging buffer" implies a vendor driver implementation, perhaps we
> > could just state that data is available from (region + data_offset) to
> > (region + data_offset + data_size) upon return of this read operation.
> >   
> > > + * c. read data_size, amount of data in bytes written by vendor driver in
> > > + *    migration region.
> > > + * d. read data_size bytes of data from data_offset in the migration region.
> > > + * e. process data.
> > > + * f. Loop through a to e. Next read on pending_bytes indicates that read data
> > > + *    operation from migration region for previous iteration is done.    
> > 
> > I think this indicate that step (f) should be to read pending_bytes, the
> > read sequence is not complete until this step.  Optionally the user can
> > then proceed to step (b).  There are no read side-effects of (a) afaict.
> > 
> > Is the use required to reach pending_bytes == 0 before changing
> > device_state, particularly transitioning to !_RUNNING?  Presumably the
> > user can exit this sequence at any time by clearing _SAVING.  
> 
> That would be transition 6) above (abort saving and continue). I think
> it makes sense not to forbid this.
> 
> >   
> > > + *
> > > + * Sequence to be followed while _RESUMING device state:
> > > + * While data for this device is available, repeat below steps:
> > > + * a. read data_offset from where user application should write data.
> > > + * b. write data of data_size to migration region from data_offset.
> > > + * c. write data_size which indicates vendor driver that data is written in
> > > + *    staging buffer. Vendor driver should read this data from migration
> > > + *    region and resume device's state.    
> > 
> > The device defaults to _RUNNING state, so a prerequisite is to set
> > _RESUMING and clear _RUNNING, right?  
> 
> Transition 4) above. Do we need
> 7) 0/0/0 ---(tell driver to resume with provided info)---> 1/0/0
> as well? (Probably depends on how sensible the 0/0/0 state is.)

I think we must unless we require the user to transition from 0/0/1 to
1/0/0 in a single operation, but I'd prefer to make 0/0/0 generally
available.  Thanks,

Alex


  reply	other threads:[~2019-11-13 18:27 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-12 17:03 [PATCH v9 Kernel 0/5] Add KABIs to support migration for VFIO devices Kirti Wankhede
2019-11-12 17:03 ` [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-13  3:23     ` Yan Zhao
2019-11-13 19:02       ` Kirti Wankhede
2019-11-14  0:36         ` Yan Zhao
2019-11-14 18:55           ` Kirti Wankhede
2019-11-13 10:24     ` Cornelia Huck
2019-11-13 18:27       ` Alex Williamson [this message]
2019-11-13 19:29         ` Kirti Wankhede
2019-11-13 19:48           ` Alex Williamson
2019-11-13 20:17             ` Kirti Wankhede
2019-11-13 20:40               ` Alex Williamson
2019-11-14 18:49                 ` Kirti Wankhede
2019-11-12 17:03 ` [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-13 19:37     ` Kirti Wankhede
2019-11-13 20:07       ` Alex Williamson
2019-11-14 18:56         ` Kirti Wankhede
2019-11-14 21:06           ` Alex Williamson
2019-11-15  2:40             ` Yan Zhao
2019-11-15  3:21               ` Alex Williamson
2019-11-15  5:10                 ` Tian, Kevin
2019-11-19 23:16                   ` Alex Williamson
2019-11-20  1:04                     ` Tian, Kevin
2019-11-20  1:51                 ` Yan Zhao
2019-11-26  0:57             ` Yan Zhao
2019-12-03 18:04               ` Alex Williamson
2019-12-04 18:10                 ` Kirti Wankhede
2019-12-04 18:34                   ` Alex Williamson
2019-12-05  1:28                     ` Yan Zhao
2019-12-05  5:42                       ` Kirti Wankhede
2019-12-05  5:47                         ` Yan Zhao
2019-12-05  5:56                         ` Alex Williamson
2019-12-05  6:19                           ` Kirti Wankhede
2019-12-05  6:40                             ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 3/5] vfio iommu: Add ioctl defination to unmap IOVA and return dirty bitmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-13 19:52     ` Kirti Wankhede
2019-11-13 20:22       ` Alex Williamson
2019-11-14 18:56         ` Kirti Wankhede
2019-11-14 21:08           ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 4/5] vfio iommu: Implementation of ioctl to get dirty pages bitmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson
2019-11-12 17:03 ` [PATCH v9 Kernel 5/5] vfio iommu: Implementation of ioctl to get dirty bitmap before unmap Kirti Wankhede
2019-11-12 22:30   ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191113112733.49542ebc@x1.home \
    --to=alex.williamson@redhat.com \
    --cc=Ken.Xue@amd.com \
    --cc=Zhengxiao.zx@Alibaba-inc.com \
    --cc=aik@ozlabs.ru \
    --cc=changpeng.liu@intel.com \
    --cc=cjia@nvidia.com \
    --cc=cohuck@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eauger@redhat.com \
    --cc=eskultet@redhat.com \
    --cc=felipe@nutanix.com \
    --cc=jonathan.davies@nutanix.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=mlevitsk@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shuangtai.tst@alibaba-inc.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yi.l.liu@intel.com \
    --cc=zhi.a.wang@intel.com \
    --cc=ziye.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).