All of lore.kernel.org
 help / color / mirror / Atom feed
From: Erik Skultety <eskultet@redhat.com>
To: Zhao Yan <yan.y.zhao@intel.com>
Cc: "cjia@nvidia.com" <cjia@nvidia.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"aik@ozlabs.ru" <aik@ozlabs.ru>,
	"Zhengxiao.zx@alibaba-inc.com" <Zhengxiao.zx@alibaba-inc.com>,
	"shuangtai.tst@alibaba-inc.com" <shuangtai.tst@alibaba-inc.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
	"eauger@redhat.com" <eauger@redhat.com>,
	"Liu, Yi L" <yi.l.liu@intel.com>,
	"Yang, Ziye" <ziye.yang@intel.com>,
	"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
	"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
	"Gonglei \(Arei\)" <arei.gonglei@huawei.com>,
	"felipe@nutanix.com" <felipe@nutanix.com>,
	"Ken.Xue@amd.com" <Ken.Xue@amd.com>,
	"Tian, Kevin" <kevin.tian@intel.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	"intel-gvt-dev@lists.freedesktop.org"
	<intel-gvt-dev@lists.freedesktop.org>,
	"Liu
Subject: Re: [PATCH 0/5] QEMU VFIO live migration
Date: Thu, 28 Mar 2019 10:21:38 +0100	[thread overview]
Message-ID: <20190328092138.GA1989@beluga.usersys.redhat.com> (raw)
In-Reply-To: <20190328083602.GE14681@joy-OptiPlex-7040>

On Thu, Mar 28, 2019 at 04:36:03AM -0400, Zhao Yan wrote:
> hi Alex and Dave,
> Thanks for your replies.
> Please see my comments inline.
>
> On Thu, Mar 28, 2019 at 06:10:20AM +0800, Alex Williamson wrote:
> > On Wed, 27 Mar 2019 20:18:54 +0000
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >
> > > * Zhao Yan (yan.y.zhao@intel.com) wrote:
> > > > On Wed, Feb 20, 2019 at 07:42:42PM +0800, Cornelia Huck wrote:
> > > > > > > > >   b) How do we detect if we're migrating from/to the wrong device or
> > > > > > > > > version of device?  Or say to a device with older firmware or perhaps
> > > > > > > > > a device that has less device memory ?
> > > > > > > > Actually it's still an open for VFIO migration. Need to think about
> > > > > > > > whether it's better to check that in libvirt or qemu (like a device magic
> > > > > > > > along with verion ?).
> > > > > >
> > > > > > We must keep the hardware generation is the same with one POD of public cloud
> > > > > > providers. But we still think about the live migration between from the the lower
> > > > > > generation of hardware migrated to the higher generation.
> > > > >
> > > > > Agreed, lower->higher is the one direction that might make sense to
> > > > > support.
> > > > >
> > > > > But regardless of that, I think we need to make sure that incompatible
> > > > > devices/versions fail directly instead of failing in a subtle, hard to
> > > > > debug way. Might be useful to do some initial sanity checks in libvirt
> > > > > as well.
> > > > >
> > > > > How easy is it to obtain that information in a form that can be
> > > > > consumed by higher layers? Can we find out the device type at least?
> > > > > What about some kind of revision?
> > > > hi Alex and Cornelia
> > > > for device compatibility, do you think it's a good idea to use "version"
> > > > and "device version" fields?
> > > >
> > > > version field: identify live migration interface's version. it can have a
> > > > sort of backward compatibility, like target machine's version >= source
> > > > machine's version. something like that.
> >
> > Don't we essentially already have this via the device specific region?
> > The struct vfio_info_cap_header includes id and version fields, so we
> > can declare a migration id and increment the version for any
> > incompatible changes to the protocol.
> yes, good idea!
> so, what about declaring below new cap?
>     #define VFIO_REGION_INFO_CAP_MIGRATION 4
>     struct vfio_region_info_cap_migration {
>         struct vfio_info_cap_header header;
>         __u32 device_version_len;
>         __u8  device_version[];
>     };
>
>
> > > >
> > > > device_version field consists two parts:
> > > > 1. vendor id : it takes 32 bits. e.g. 0x8086.
> >
> > Who allocates IDs?  If we're going to use PCI vendor IDs, then I'd
> > suggest we use a bit to flag it as such so we can reserve that portion
> > of the 32bit address space.  See for example:
> >
> > #define VFIO_REGION_TYPE_PCI_VENDOR_TYPE        (1 << 31)
> > #define VFIO_REGION_TYPE_PCI_VENDOR_MASK        (0xffff)
> >
> > For vendor specific regions.
> Yes, use PCI vendor ID.
> you are right, we need to use highest bit (VFIO_REGION_TYPE_PCI_VENDOR_TYPE)
> to identify it's a PCI ID.
> Thanks for pointing it out.
> But, I have a question. what is VFIO_REGION_TYPE_PCI_VENDOR_MASK used for?
> why it's 0xffff? I searched QEMU and kernel code and did not find anywhere
> uses it.
>
>
> > > > 2. vendor proprietary string: it can be any string that a vendor driver
> > > > thinks can identify a source device. e.g. pciid + mdev type.
> > > > "vendor id" is to avoid overlap of "vendor proprietary string".
> > > >
> > > >
> > > > struct vfio_device_state_ctl {
> > > >      __u32 version;            /* ro */
> > > >      __u8 device_version[MAX_DEVICE_VERSION_LEN];            /* ro */
> > > >      struct {
> > > >      	__u32 action; /* GET_BUFFER, SET_BUFFER, IS_COMPATIBLE*/
> > > > 	...
> > > >      }data;
> > > >      ...
> > > >  };
> >
> > We have a buffer area where we can read and write data from the vendor
> > driver, why would we have this IS_COMPATIBLE protocol to write the
> > device version string but use a static fixed length version string in
> > the control header to read it?  IOW, let's use GET_VERSION,
> > CHECK_VERSION actions that make use of the buffer area and allow vendor
> > specific version information length.
> you are right, such static fixed length version string is bad :)
> To get device version, do you think which approach below is better?
> 1. use GET_VERSION action, and read from region buffer
> 2. get it when querying cap VFIO_REGION_INFO_CAP_MIGRATION
>
> seems approach 1 is better, and cap VFIO_REGION_INFO_CAP_MIGRATION is only
> for checking migration interface's version?
>
> > > >
> > > > Then, an action IS_COMPATIBLE is added to check device compatibility.
> > > >
> > > > The flow to figure out whether a source device is migratable to target device
> > > > is like that:
> > > > 1. in source side's .save_setup, save source device's device_version string
> > > > 2. in target side's .load_state, load source device's device version string
> > > > and write it to data region, and call IS_COMPATIBLE action to ask vendor driver
> > > > to check whether the source device is compatible to it.
> > > >
> > > > The advantage of adding an IS_COMPATIBLE action is that, vendor driver can
> > > > maintain a compatibility table and decide whether source device is compatible
> > > > to target device according to its proprietary table.
> > > > In device_version string, vendor driver only has to describe the source
> > > > device as elaborately as possible and resorts to vendor driver in target side
> > > > to figure out whether they are compatible.
> >
> > I agree, it's too complicated and restrictive to try to create an
> > interface for the user to determine compatibility, let the driver
> > declare it compatible or not.
> :)
>
> > > It would also be good if the 'IS_COMPATIBLE' was somehow callable
> > > externally - so we could be able to answer a question like 'can we
> > > migrate this VM to this host' - from the management layer before it
> > > actually starts the migration.
>
> so qemu needs to expose two qmp/sysfs interfaces: GET_VERSION and CHECK_VERSION.
> GET_VERSION returns a vm's device's version string.
> CHECK_VERSION's input is device version string and return
> compatible/non-compatible.
> Do you think it's good?
>
> > I think we'd need to mirror this capability in sysfs to support that,
> > or create a qmp interface through QEMU that the device owner could make
> > the request on behalf of the management layer.  Getting access to the
> > vfio device requires an iommu context that's already in use by the
> > device owner, we have no intention of supporting a model that allows
> > independent tasks access to a device.  Thanks,
> > Alex
> >
> do you think two sysfs nodes under a device node is ok?
> e.g.
> /sys/devices/pci0000\:00/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/get_version
> /sys/devices/pci0000\:00/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/check_version

Why do you need both sysfs and QMP at the same time? I can see it gives us some
flexibility, but is there something more to that?

Normally, I'd prefer a QMP interface from libvirt's perspective (with an
appropriate capability that libvirt can check for QEMU support) because I imagine large nodes having a
bunch of GPUs with different revisions which might not be backwards compatible.
Libvirt might query the version string on source and check it on dest via the
QMP in a way that QEMU, talking to the driver, would return either a list or a
single physical device to which we can migrate, because neither QEMU nor
libvirt know that, only the driver does, so that's an important information
rather than looping through all the devices and trying to find one that is
compatible. However, you might have a hard time making all the necessary
changes in QMP introspectable, a new command would be fine, but if you also
wanted to extend say vfio-pci options, IIRC that would not appear in the QAPI
schema and libvirt would not be able to detect support for it.

On the other hand, the presence of a QMP interface IMO doesn't help mgmt apps
much, as it still carries the burden of being able to check this only at the
time of migration, which e.g. OpenStack would like to know long before that.

So, having sysfs attributes would work for both libvirt (even though libvirt
would benefit from a QMP much more) and OpenStack. OpenStack would IMO then
have to figure out how to create the mappings between compatible devices across
several nodes which are non-uniform.

Regards,
Erik

  reply	other threads:[~2019-03-28  9:21 UTC|newest]

Thread overview: 133+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-19  8:50 [PATCH 0/5] QEMU VFIO live migration Yan Zhao
2019-02-19  8:50 ` [Qemu-devel] " Yan Zhao
2019-02-19  8:52 ` [PATCH 1/5] vfio/migration: define kernel interfaces Yan Zhao
2019-02-19  8:52   ` [Qemu-devel] " Yan Zhao
2019-02-19 13:09   ` Cornelia Huck
2019-02-19 13:09     ` [Qemu-devel] " Cornelia Huck
2019-02-20  7:36     ` Zhao Yan
2019-02-20  7:36       ` [Qemu-devel] " Zhao Yan
2019-02-20 17:08       ` Cornelia Huck
2019-02-20 17:08         ` [Qemu-devel] " Cornelia Huck
2019-02-21  1:47         ` Zhao Yan
2019-02-21  1:47           ` [Qemu-devel] " Zhao Yan
2019-02-19  8:52 ` [PATCH 2/5] vfio/migration: support device of device config capability Yan Zhao
2019-02-19  8:52   ` [Qemu-devel] " Yan Zhao
2019-02-19 11:01   ` Dr. David Alan Gilbert
2019-02-19 11:01     ` [Qemu-devel] " Dr. David Alan Gilbert
2019-02-20  5:12     ` Zhao Yan
2019-02-20  5:12       ` [Qemu-devel] " Zhao Yan
2019-02-20 10:57       ` Dr. David Alan Gilbert
2019-02-20 10:57         ` [Qemu-devel] " Dr. David Alan Gilbert
2019-02-19 14:37   ` Cornelia Huck
2019-02-19 14:37     ` [Qemu-devel] " Cornelia Huck
2019-02-20 22:54     ` Zhao Yan
2019-02-20 22:54       ` [Qemu-devel] " Zhao Yan
2019-02-21 10:56       ` Cornelia Huck
2019-02-21 10:56         ` [Qemu-devel] " Cornelia Huck
2019-02-19  8:52 ` [PATCH 3/5] vfio/migration: tracking of dirty page in system memory Yan Zhao
2019-02-19  8:52   ` [Qemu-devel] " Yan Zhao
2019-02-19  8:52 ` [PATCH 4/5] vfio/migration: turn on migration Yan Zhao
2019-02-19  8:52   ` [Qemu-devel] " Yan Zhao
2019-02-19  8:53 ` [PATCH 5/5] vfio/migration: support device memory capability Yan Zhao
2019-02-19  8:53   ` [Qemu-devel] " Yan Zhao
2019-02-19 11:25   ` Dr. David Alan Gilbert
2019-02-19 11:25     ` [Qemu-devel] " Dr. David Alan Gilbert
2019-02-20  5:17     ` Zhao Yan
2019-02-20  5:17       ` [Qemu-devel] " Zhao Yan
2019-02-19 14:42   ` Christophe de Dinechin
2019-02-19 14:42     ` [Qemu-devel] " Christophe de Dinechin
2019-02-20  7:58     ` Zhao Yan
2019-02-20  7:58       ` [Qemu-devel] " Zhao Yan
2019-02-20 10:14       ` Christophe de Dinechin
2019-02-20 10:14         ` [Qemu-devel] " Christophe de Dinechin
2019-02-21  0:07         ` Zhao Yan
2019-02-21  0:07           ` [Qemu-devel] " Zhao Yan
2019-02-19 11:32 ` [PATCH 0/5] QEMU VFIO live migration Dr. David Alan Gilbert
2019-02-19 11:32   ` [Qemu-devel] " Dr. David Alan Gilbert
2019-02-20  5:28   ` Zhao Yan
2019-02-20  5:28     ` [Qemu-devel] " Zhao Yan
2019-02-20 11:01     ` Dr. David Alan Gilbert
2019-02-20 11:01       ` [Qemu-devel] " Dr. David Alan Gilbert
2019-02-20 11:28       ` Gonglei (Arei)
2019-02-20 11:28         ` [Qemu-devel] " Gonglei (Arei)
2019-02-20 11:42         ` Cornelia Huck
2019-02-20 11:42           ` [Qemu-devel] " Cornelia Huck
2019-02-20 12:07           ` Gonglei (Arei)
2019-02-20 12:07             ` [Qemu-devel] " Gonglei (Arei)
2019-03-27  6:35           ` Zhao Yan
2019-03-27 20:18             ` Dr. David Alan Gilbert
2019-03-27 22:10               ` Alex Williamson
2019-03-28  8:36                 ` Zhao Yan
2019-03-28  9:21                   ` Erik Skultety [this message]
2019-03-28 16:04                     ` Alex Williamson
2019-03-29  2:47                       ` Zhao Yan
2019-03-29 14:26                         ` Alex Williamson
2019-03-29 23:10                           ` Zhao Yan
2019-03-30 14:14                             ` Alex Williamson
2019-04-01  2:17                               ` Zhao Yan
2019-04-01  8:14                 ` Cornelia Huck
2019-04-01  8:14                   ` [Qemu-devel] " Cornelia Huck
2019-04-01  8:40                   ` Yan Zhao
2019-04-01  8:40                     ` [Qemu-devel] " Yan Zhao
2019-04-01 14:15                     ` Alex Williamson
2019-04-01 14:15                       ` [Qemu-devel] " Alex Williamson
2019-02-21  0:31       ` Zhao Yan
2019-02-21  0:31         ` [Qemu-devel] " Zhao Yan
2019-02-21  9:15         ` Dr. David Alan Gilbert
2019-02-21  9:15           ` [Qemu-devel] " Dr. David Alan Gilbert
2019-02-20 11:56 ` Gonglei (Arei)
2019-02-20 11:56   ` [Qemu-devel] " Gonglei (Arei)
2019-02-21  0:24   ` Zhao Yan
2019-02-21  0:24     ` [Qemu-devel] " Zhao Yan
2019-02-21  1:35     ` Gonglei (Arei)
2019-02-21  1:35       ` [Qemu-devel] " Gonglei (Arei)
2019-02-21  1:58       ` Zhao Yan
2019-02-21  1:58         ` [Qemu-devel] " Zhao Yan
2019-02-21  3:33         ` Gonglei (Arei)
2019-02-21  3:33           ` [Qemu-devel] " Gonglei (Arei)
2019-02-21  4:08           ` Zhao Yan
2019-02-21  4:08             ` [Qemu-devel] " Zhao Yan
2019-02-21  5:46             ` Gonglei (Arei)
2019-02-21  5:46               ` [Qemu-devel] " Gonglei (Arei)
2019-02-21  2:04       ` Zhao Yan
2019-02-21  2:04         ` [Qemu-devel] " Zhao Yan
2019-02-21  3:16         ` Gonglei (Arei)
2019-02-21  3:16           ` [Qemu-devel] " Gonglei (Arei)
2019-02-21  4:21           ` Zhao Yan
2019-02-21  4:21             ` [Qemu-devel] " Zhao Yan
2019-02-21  5:56             ` Gonglei (Arei)
2019-02-21  5:56               ` [Qemu-devel] " Gonglei (Arei)
2019-02-21 20:40 ` Alex Williamson
2019-02-21 20:40   ` [Qemu-devel] " Alex Williamson
2019-02-25  2:22   ` Zhao Yan
2019-02-25  2:22     ` [Qemu-devel] " Zhao Yan
2019-03-06  0:22     ` Zhao Yan
2019-03-06  0:22       ` [Qemu-devel] " Zhao Yan
2019-03-07 17:44     ` Alex Williamson
2019-03-07 17:44       ` [Qemu-devel] " Alex Williamson
2019-03-07 23:20       ` Tian, Kevin
2019-03-07 23:20         ` [Qemu-devel] " Tian, Kevin
2019-03-08 16:11         ` Alex Williamson
2019-03-08 16:11           ` [Qemu-devel] " Alex Williamson
2019-03-08 16:21           ` Dr. David Alan Gilbert
2019-03-08 16:21             ` [Qemu-devel] " Dr. David Alan Gilbert
2019-03-08 22:02             ` Alex Williamson
2019-03-08 22:02               ` [Qemu-devel] " Alex Williamson
2019-03-11  2:33               ` Tian, Kevin
2019-03-11  2:33                 ` [Qemu-devel] " Tian, Kevin
2019-03-11 20:19                 ` Alex Williamson
2019-03-11 20:19                   ` [Qemu-devel] " Alex Williamson
2019-03-12  2:48                   ` Tian, Kevin
2019-03-12  2:48                     ` [Qemu-devel] " Tian, Kevin
2019-03-13 19:57                     ` Alex Williamson
2019-03-12  2:57       ` Zhao Yan
2019-03-12  2:57         ` [Qemu-devel] " Zhao Yan
2019-03-13  1:13         ` Zhao Yan
2019-03-13 19:14           ` Alex Williamson
2019-03-14  1:12             ` Zhao Yan
2019-03-14 22:44               ` Alex Williamson
2019-03-14 23:05                 ` Zhao Yan
2019-03-15  2:24                   ` Alex Williamson
2019-03-18  2:51                     ` Zhao Yan
2019-03-18  3:09                       ` Alex Williamson
2019-03-18  3:27                         ` Zhao Yan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190328092138.GA1989@beluga.usersys.redhat.com \
    --to=eskultet@redhat.com \
    --cc=Ken.Xue@amd.com \
    --cc=Zhengxiao.zx@alibaba-inc.com \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=arei.gonglei@huawei.com \
    --cc=cjia@nvidia.com \
    --cc=dgilbert@redhat.com \
    --cc=eauger@redhat.com \
    --cc=felipe@nutanix.com \
    --cc=intel-gvt-dev@lists.freedesktop.org \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=mlevitsk@redhat.com \
    --cc=pasic@linux.ibm.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shuangtai.tst@alibaba-inc.com \
    --cc=yan.y.zhao@intel.com \
    --cc=yi.l.liu@intel.com \
    --cc=ziye.yang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.