From: Yan Zhao <yan.y.zhao@intel.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Tian, Kevin" <kevin.tian@intel.com>,
"cjia@nvidia.com" <cjia@nvidia.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>,
"libvir-list@redhat.com" <libvir-list@redhat.com>,
"Zhengxiao.zx@alibaba-inc.com" <Zhengxiao.zx@alibaba-inc.com>,
"shuangtai.tst@alibaba-inc.com" <shuangtai.tst@alibaba-inc.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"kwankhede@nvidia.com" <kwankhede@nvidia.com>,
"eauger@redhat.com" <eauger@redhat.com>,
"corbet@lwn.net" <corbet@lwn.net>,
"Liu, Yi L" <yi.l.liu@intel.com>,
"eskultet@redhat.com" <eskultet@redhat.com>,
"Yang, Ziye" <ziye.yang@intel.com>,
"mlevitsk@redhat.com" <mlevitsk@redhat.com>,
"pasic@linux.ibm.com" <pasic@linux.ibm.com>,
"aik@ozlabs.ru" <aik@ozlabs.ru>,
"felipe@nutanix.com" <felipe@nutanix.com>,
"Ken.Xue@amd.com" <Ken.Xue@amd.com>,
"Zeng, Xin" <xin.zeng@intel.com>,
"zhenyuw@linux.intel.com" <zhenyuw@linux.intel.com>,
"dinechin@redhat.com" <dinechin@redhat.com>,
"intel-gvt-dev@lists.freedesktop.org"
<intel-gvt-dev@lists.freedesktop.org>,
"Liu, Changpeng" <changpeng.liu@intel.com>,
"berrange@redhat.com" <berrange@redhat.com>,
Cornelia Huck <cohuck@redhat.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Wang, Zhi A" <zhi.a.wang@intel.com>,
"jonathan.davies@nutanix.com" <jonathan.davies@nutanix.com>,
"He, Shaopeng" <shaopeng.he@intel.com>
Subject: Re: [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration
Date: Wed, 3 Jun 2020 01:24:43 -0400 [thread overview]
Message-ID: <20200603052443.GC12300@joy-OptiPlex-7040> (raw)
In-Reply-To: <20200602215528.7a1008f0@x1.home>
On Tue, Jun 02, 2020 at 09:55:28PM -0600, Alex Williamson wrote:
> On Tue, 2 Jun 2020 23:19:48 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
>
> > On Tue, Jun 02, 2020 at 04:55:27PM -0600, Alex Williamson wrote:
> > > On Wed, 29 Apr 2020 20:39:50 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >
> > > > On Wed, Apr 29, 2020 at 05:48:44PM +0800, Dr. David Alan Gilbert wrote:
> > > > <snip>
> > > > > > > > > > > > > > > > > An mdev type is meant to define a software compatible interface, so in
> > > > > > > > > > > > > > > > > the case of mdev->mdev migration, doesn't migrating to a different type
> > > > > > > > > > > > > > > > > fail the most basic of compatibility tests that we expect userspace to
> > > > > > > > > > > > > > > > > perform? IOW, if two mdev types are migration compatible, it seems a
> > > > > > > > > > > > > > > > > prerequisite to that is that they provide the same software interface,
> > > > > > > > > > > > > > > > > which means they should be the same mdev type.
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > In the hybrid cases of mdev->phys or phys->mdev, how does a
> > > > > > > > > > > > > > > > management
> > > > > > > > > > > > > > > > > tool begin to even guess what might be compatible? Are we expecting
> > > > > > > > > > > > > > > > > libvirt to probe ever device with this attribute in the system? Is
> > > > > > > > > > > > > > > > > there going to be a new class hierarchy created to enumerate all
> > > > > > > > > > > > > > > > > possible migrate-able devices?
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > yes, management tool needs to guess and test migration compatible
> > > > > > > > > > > > > > > > between two devices. But I think it's not the problem only for
> > > > > > > > > > > > > > > > mdev->phys or phys->mdev. even for mdev->mdev, management tool needs
> > > > > > > > > > > > > > > > to
> > > > > > > > > > > > > > > > first assume that the two mdevs have the same type of parent devices
> > > > > > > > > > > > > > > > (e.g.their pciids are equal). otherwise, it's still enumerating
> > > > > > > > > > > > > > > > possibilities.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > on the other hand, for two mdevs,
> > > > > > > > > > > > > > > > mdev1 from pdev1, its mdev_type is 1/2 of pdev1;
> > > > > > > > > > > > > > > > mdev2 from pdev2, its mdev_type is 1/4 of pdev2;
> > > > > > > > > > > > > > > > if pdev2 is exactly 2 times of pdev1, why not allow migration between
> > > > > > > > > > > > > > > > mdev1 <-> mdev2.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > How could the manage tool figure out that 1/2 of pdev1 is equivalent
> > > > > > > > > > > > > > > to 1/4 of pdev2? If we really want to allow such thing happen, the best
> > > > > > > > > > > > > > > choice is to report the same mdev type on both pdev1 and pdev2.
> > > > > > > > > > > > > > I think that's exactly the value of this migration_version interface.
> > > > > > > > > > > > > > the management tool can take advantage of this interface to know if two
> > > > > > > > > > > > > > devices are migration compatible, no matter they are mdevs, non-mdevs,
> > > > > > > > > > > > > > or mix.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > as I know, (please correct me if not right), current libvirt still
> > > > > > > > > > > > > > requires manually generating mdev devices, and it just duplicates src vm
> > > > > > > > > > > > > > configuration to the target vm.
> > > > > > > > > > > > > > for libvirt, currently it's always phys->phys and mdev->mdev (and of the
> > > > > > > > > > > > > > same mdev type).
> > > > > > > > > > > > > > But it does not justify that hybrid cases should not be allowed. otherwise,
> > > > > > > > > > > > > > why do we need to introduce this migration_version interface and leave
> > > > > > > > > > > > > > the judgement of migration compatibility to vendor driver? why not simply
> > > > > > > > > > > > > > set the criteria to something like "pciids of parent devices are equal,
> > > > > > > > > > > > > > and mdev types are equal" ?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > btw mdev<->phys just brings trouble to upper stack as Alex pointed out.
> > > > > > > > > > > > > > could you help me understand why it will bring trouble to upper stack?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think it just needs to read src migration_version under src dev node,
> > > > > > > > > > > > > > and test it in target migration version under target dev node.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > after all, through this interface we just help the upper layer
> > > > > > > > > > > > > > knowing available options through reading and testing, and they decide
> > > > > > > > > > > > > > to use it or not.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Can we simplify the requirement by allowing only mdev<->mdev and
> > > > > > > > > > > > > > > phys<->phys migration? If an customer does want to migrate between a
> > > > > > > > > > > > > > > mdev and phys, he could wrap physical device into a wrapped mdev
> > > > > > > > > > > > > > > instance (with the same type as the source mdev) instead of using vendor
> > > > > > > > > > > > > > > ops. Doing so does add some burden but if mdev<->phys is not dominant
> > > > > > > > > > > > > > > usage then such tradeoff might be worthywhile...
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > If the interfaces for phys<->phys and mdev<->mdev are consistent, it makes no
> > > > > > > > > > > > > > difference to phys<->mdev, right?
> > > > > > > > > > > > > > I think the vendor string for a mdev device is something like:
> > > > > > > > > > > > > > "Parent PCIID + mdev type + software version", and
> > > > > > > > > > > > > > that for a phys device is something like:
> > > > > > > > > > > > > > "PCIID + software version".
> > > > > > > > > > > > > > as long as we don't migrate between devices from different vendors, it's
> > > > > > > > > > > > > > easy for vendor driver to tell if a phys device is migration compatible
> > > > > > > > > > > > > > to a mdev device according it supports it or not.
> > > > > > > > > > > > >
> > > > > > > > > > > > > It surprises me that the PCIID matching is a requirement; I'd assumed
> > > > > > > > > > > > > with this clever mdev name setup that you could migrate between two
> > > > > > > > > > > > > different models in a series, or to a newer model, as long as they
> > > > > > > > > > > > > both supported the same mdev view.
> > > > > > > > > > > > >
> > > > > > > > > > > > hi Dave
> > > > > > > > > > > > the migration_version string is transparent to userspace, and is
> > > > > > > > > > > > completely defined by vendor driver.
> > > > > > > > > > > > I put it there just as an example of how vendor driver may implement it.
> > > > > > > > > > > > e.g.
> > > > > > > > > > > > the src migration_version string is "src PCIID + src software version",
> > > > > > > > > > > > then when this string is write to target migration_version node,
> > > > > > > > > > > > the vendor driver in the target device will compare it with its own
> > > > > > > > > > > > device info and software version.
> > > > > > > > > > > > If different models are allowed, the write just succeeds even
> > > > > > > > > > > > PCIIDs in src and target are different.
> > > > > > > > > > > >
> > > > > > > > > > > > so, it is the vendor driver to define whether two devices are able to
> > > > > > > > > > > > migrate, no matter their PCIIDs, mdev types, software versions..., which
> > > > > > > > > > > > provides vendor driver full flexibility.
> > > > > > > > > > > >
> > > > > > > > > > > > do you think it's good?
> > > > > > > > > > >
> > > > > > > > > > > Yeh that's OK; I guess it's going to need to have a big table in their
> > > > > > > > > > > with all the PCIIDs in.
> > > > > > > > > > > The alternative would be to abstract it a little; e.g. to say it's
> > > > > > > > > > > an Intel-gpu-core-v4 and then it would be less worried about the exact
> > > > > > > > > > > clock speed etc - but yes you might be right htat PCIIDs might be best
> > > > > > > > > > > for checking for quirks.
> > > > > > > > > > >
> > > > > > > > > > glad that you are agreed with it:)
> > > > > > > > > > I think the vendor driver still can choose a way to abstract a little
> > > > > > > > > > (e.g. Intel-gpu-core-v4...) if they think it's better. In that case, the
> > > > > > > > > > migration_string would be something like "Intel-gpu-core-v4 + instance
> > > > > > > > > > number + software version".
> > > > > > > > > > IOW, they can choose anything they think appropriate to identify migration
> > > > > > > > > > compatibility of a device.
> > > > > > > > > > But Alex is right, we have to prevent namespace overlapping. So I think
> > > > > > > > > > we need to ensure src and target devices are from the same vendors.
> > > > > > > > > > or, any other ideas?
> > > > > > > > >
> > > > > > > > > That's why I kept the 'Intel' in that example; or PCI vendor ID; I was
> > > > > > > > Yes, it's a good idea!
> > > > > > > > could we add a line in the doc saying that
> > > > > > > > it is the vendor driver to add a unique string to avoid namespace
> > > > > > > > collision?
> > > > > > >
> > > > > > > So why don't we split the difference; lets say that it should start with
> > > > > > > the hex PCI Vendor ID.
> > > > > > >
> > > > > > The problem is for mdev devices, if the parent devices are not PCI devices,
> > > > > > they don't have PCI vendor IDs.
> > > > >
> > > > > Hmm it would be best not to invent a whole new way of giving unique
> > > > > idenitifiers for vendors if we can.
> > > > >
> > > > what about leveraging the flags in vfio device info ?
> > > >
> > > > #define VFIO_DEVICE_FLAGS_RESET (1 << 0) /* Device supports reset */
> > > > #define VFIO_DEVICE_FLAGS_PCI (1 << 1) /* vfio-pci device */
> > > > #define VFIO_DEVICE_FLAGS_PLATFORM (1 << 2) /* vfio-platform device */
> > > > #define VFIO_DEVICE_FLAGS_AMBA (1 << 3) /* vfio-amba device */
> > > > #define VFIO_DEVICE_FLAGS_CCW (1 << 4) /* vfio-ccw device */
> > > > #define VFIO_DEVICE_FLAGS_AP (1 << 5) /* vfio-ap device */
> > > >
> > > > Then for migration_version string,
> > > > The first 64 bits are for device type, the second 64 bits are for device id.
> > > > e.g.
> > > > for PCI devices, it could be
> > > > VFIO_DEVICE_FLAGS_PCI + PCI ID.
> > > >
> > > > Currently in the doc, we only define PCI devices to use PCI ID as the second
> > > > 64 bits. In future, if other types of devices want to support migration,
> > > > they can define their own parts of device id. e.g. use ACPI ID as the
> > > > second 64-bit...
> > > >
> > > > sounds good?
> > >
> > > [dead thread resurrection alert]
> > >
> > > Not really. We're deep into territory that we were trying to avoid.
> > > We had previously defined the version string as opaque (not
> > > transparent) specifically because we did not want userspace to make
> > > assumptions about compatibility based on the content of the string. It
> > > was 100% left to the vendor driver to determine compatibility. The
> > > mdev type was the full extent of the first level filter that userspace
> > > could use to narrow the set of potentially compatible devices. If we
> > > remove that due to physical device migration support, I'm not sure how
> > > we simplify the problem for userspace.
> > >
> > > We need to step away from PCI IDs and parent devices. We're not
> > > designing a solution that only works for PCI, there's no guarantee that
> > > parent devices are similar or even from the same vendor.
> > >
> > > Does the mdev type sufficiently solve the problem for mdev devices? If
> > > so, then what can we learn from it and how can we apply an equivalence
> > > to physical devices? For example, should a vfio bus driver (vfio-pci
> > > or vfio-mdev) expose vfio_migration_type and vfio_migration_version
> > > attributes under the device in sysfs where the _type provides the first
> > > level, user transparent, matching string (ex. mdev type for mdev
> > > devices) while the _version provides the user opaque, vendor known
> > > compatibility test?
> > >
> > > This pushes the problem out to the drivers where we can perhaps
> > > incorporate the module name to avoid collisions. For example Yan's
> > > vendor extension proposal makes use of vfio-pci with extension modules
> > > loaded via an alias incorporating the PCI vendor and device ID. So
> > > vfio-pci might use a type of "vfio-pci:$ALIAS".
> > >
> > > It's still a bit messy that someone needs to go evaluate all these
> > > types between devices that exist and mdev devices that might exist if
> > > created, but I don't have any good ideas to resolve that (maybe a new
> > > class hierarchy?). Thanks,
> >
> > hi Alex
> >
> > yes, with the same mdev_type, user still has to enumerate all parent
> > devices and test between the supported mdev_types to know whether two mdev
> > devices are compatible.
> > maybe this is not a problem? in reality, it is the administrator that
> > specifies two devices and the management tool feedbacks compatibility
> > result. management tool is not required to pre-test and setup the
> > compatibility map beforehand.
>
> That's exactly the purpose of this interface though is to give the
> management tools some indication that a migration has a chance of
> working.
>
> > If so, then the only problem left is namespace collision.
> > given that the migration_version nodes is exported by vendor driver,
> > maybe it can also embed its module name in the migration version string,
> > like "i915" in "i915-GVTg_V5_8", as you suggested above.
>
> No, we've already decided that the version string is opaque, the user
> is not to attempt to infer anything from it. That's why I've suggested
> another attribute in sysfs that does present type information that a
> user can compare. Thanks,
>
> Alex
>
ok. got it.
one more thing I want to confirm is that do you think it's a necessary
restriction that "The mdev devices are of the same type" ?
could mdev and phys devices both expose "vfio_migration_type" and
"vfio_migration_version" under device sysfs so that it may not be
confined in mdev_type? (e.g. when aggregator is enabled, though two
mdevs are of the same mdev_type, they are not actually compatible; and
two mdevs are compatible though their mdev_type is not equal.)
for mdev devices, we could still expose vfio_migration_version
attribute under mdev_type for detection before mdev generated.
Thanks
Yan
> > with module name as the first mandatory field in version string and
> > skipping the enumeration/testing problem, we can happyly unify migration
> > across mdev and phys devices. e.g. it is possible to migrate between
> > VFs in sriov and mdevs in siov to achieve backwards compatibility.
> >
> > Thanks
> > Yan
> >
> >
> >
> > >
> >
>
next prev parent reply other threads:[~2020-06-03 5:34 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-04-13 5:52 [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Yan Zhao
2020-04-13 5:54 ` [PATCH v5 1/4] vfio/mdev: add migration_version attribute for mdev (under mdev_type node) Yan Zhao
2020-04-15 7:28 ` Erik Skultety
2020-04-15 8:58 ` Yan Zhao
2020-04-13 5:54 ` [PATCH v5 2/4] drm/i915/gvt: export migration_version to mdev sysfs " Yan Zhao
2020-04-13 5:55 ` [PATCH v5 3/4] vfio/mdev: add migration_version attribute for mdev (under mdev device node) Yan Zhao
2020-04-15 7:42 ` Erik Skultety
2020-04-15 9:02 ` Yan Zhao
2020-04-13 5:55 ` [PATCH v5 4/4] drm/i915/gvt: export migration_version to mdev sysfs " Yan Zhao
2020-04-17 8:44 ` [PATCH v5 0/4] introduction of migration_version attribute for VFIO live migration Cornelia Huck
2020-04-17 9:52 ` Yan Zhao
2020-04-17 11:24 ` Cornelia Huck
2020-04-20 1:24 ` Yan Zhao
2020-04-20 22:56 ` Alex Williamson
2020-04-21 2:37 ` Yan Zhao
2020-04-21 12:08 ` Tian, Kevin
2020-04-22 7:36 ` Yan Zhao
2020-04-24 19:10 ` Dr. David Alan Gilbert
2020-04-26 1:36 ` Yan Zhao
2020-04-27 15:37 ` Dr. David Alan Gilbert
2020-04-28 0:54 ` Yan Zhao
2020-04-28 14:14 ` Dr. David Alan Gilbert
2020-04-29 7:26 ` Yan Zhao
2020-04-29 8:22 ` Dr. David Alan Gilbert
2020-04-29 9:35 ` Yan Zhao
2020-04-29 9:48 ` Dr. David Alan Gilbert
2020-04-30 0:39 ` Yan Zhao
2020-06-02 22:55 ` Alex Williamson
2020-06-03 3:19 ` Yan Zhao
2020-06-03 3:55 ` Alex Williamson
2020-06-03 5:24 ` Yan Zhao [this message]
2020-06-03 16:26 ` Alex Williamson
2020-06-05 10:22 ` Dr. David Alan Gilbert
2020-06-05 14:31 ` Alex Williamson
2020-06-05 14:39 ` Dr. David Alan Gilbert
2020-06-10 0:37 ` Yan Zhao
2020-06-19 22:40 ` Alex Williamson
2020-06-22 2:28 ` Yan Zhao
2020-04-29 14:13 ` Eric Blake
2020-04-30 0:45 ` Yan Zhao
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200603052443.GC12300@joy-OptiPlex-7040 \
--to=yan.y.zhao@intel.com \
--cc=Ken.Xue@amd.com \
--cc=Zhengxiao.zx@alibaba-inc.com \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=berrange@redhat.com \
--cc=changpeng.liu@intel.com \
--cc=cjia@nvidia.com \
--cc=cohuck@redhat.com \
--cc=corbet@lwn.net \
--cc=dgilbert@redhat.com \
--cc=dinechin@redhat.com \
--cc=eauger@redhat.com \
--cc=eskultet@redhat.com \
--cc=felipe@nutanix.com \
--cc=intel-gvt-dev@lists.freedesktop.org \
--cc=jonathan.davies@nutanix.com \
--cc=kevin.tian@intel.com \
--cc=kvm@vger.kernel.org \
--cc=kwankhede@nvidia.com \
--cc=libvir-list@redhat.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mlevitsk@redhat.com \
--cc=pasic@linux.ibm.com \
--cc=qemu-devel@nongnu.org \
--cc=shaopeng.he@intel.com \
--cc=shuangtai.tst@alibaba-inc.com \
--cc=xin.zeng@intel.com \
--cc=yi.l.liu@intel.com \
--cc=zhenyuw@linux.intel.com \
--cc=zhi.a.wang@intel.com \
--cc=ziye.yang@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).