From: Yan Zhao <yan.y.zhao@intel.com>
To: Jiri Pirko <jiri@mellanox.com>
Cc: Jason Wang <jasowang@redhat.com>,
Cornelia Huck <cohuck@redhat.com>,
Alex Williamson <alex.williamson@redhat.com>,
kvm@vger.kernel.org, libvir-list@redhat.com,
qemu-devel@nongnu.org, kwankhede@nvidia.com, eauger@redhat.com,
xin-ran.wang@intel.com, corbet@lwn.net,
openstack-discuss@lists.openstack.org, shaohe.feng@intel.com,
kevin.tian@intel.com, eskultet@redhat.com,
jian-feng.ding@intel.com, dgilbert@redhat.com,
zhenyuw@linux.intel.com, hejie.xu@intel.com,
bao.yumeng@zte.com.cn, smooney@redhat.com,
intel-gvt-dev@lists.freedesktop.org, berrange@redhat.com,
dinechin@redhat.com, devel@ovirt.org,
Parav Pandit <parav@mellanox.com>
Subject: Re: device compatibility interface for live migration with assigned devices
Date: Mon, 10 Aug 2020 15:46:31 +0800
Message-ID: <20200810074631.GA29059@joy-OptiPlex-7040>
In-Reply-To: <20200805105319.GF2177@nanopsycho>
On Wed, Aug 05, 2020 at 12:53:19PM +0200, Jiri Pirko wrote:
> Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.zhao@intel.com wrote:
> >On Wed, Aug 05, 2020 at 04:02:48PM +0800, Jason Wang wrote:
> >>
> >> On 2020/8/5 3:56 PM, Jiri Pirko wrote:
> >> > Wed, Aug 05, 2020 at 04:41:54AM CEST, jasowang@redhat.com wrote:
> >> > > On 2020/8/5 10:16 AM, Yan Zhao wrote:
> >> > > > On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote:
> >> > > > > On 2020/8/5 12:35 AM, Cornelia Huck wrote:
> >> > > > > > [sorry about not chiming in earlier]
> >> > > > > >
> >> > > > > > On Wed, 29 Jul 2020 16:05:03 +0800
> >> > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >> > > > > >
> >> > > > > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote:
> >> > > > > > (...)
> >> > > > > >
> >> > > > > > > > Based on the feedback we've received, the previously proposed interface
> >> > > > > > > > is not viable. I think there's agreement that the user needs to be
> >> > > > > > > > able to parse and interpret the version information. Using json seems
> >> > > > > > > > viable, but I don't know if it's the best option. Is there any
> >> > > > > > > > precedent of markup strings returned via sysfs we could follow?
> >> > > > > > I don't think encoding complex information in a sysfs file is a viable
> >> > > > > > approach. Quoting Documentation/filesystems/sysfs.rst:
> >> > > > > >
> >> > > > > > "Attributes should be ASCII text files, preferably with only one value
> >> > > > > > per file. It is noted that it may not be efficient to contain only one
> >> > > > > > value per file, so it is socially acceptable to express an array of
> >> > > > > > values of the same type.
> >> > > > > > Mixing types, expressing multiple lines of data, and doing fancy
> >> > > > > > formatting of data is heavily frowned upon."
> >> > > > > >
> >> > > > > > Even though this is an older file, I think these restrictions still
> >> > > > > > apply.
> >> > > > > +1, that's another reason why devlink(netlink) is better.
> >> > > > >
> >> > > > hi Jason,
> >> > > > do you have any materials or sample code about devlink, so we can have a good
> >> > > > study of it?
> >> > > > I found some kernel docs about it but my preliminary study didn't show me the
> >> > > > advantage of devlink.
> >> > >
> >> > > CC Jiri and Parav for a better answer for this.
> >> > >
> >> > > My understanding is that the following advantages are obvious (as I replied
> >> > > in another thread):
> >> > >
> >> > > - existing users (NIC, crypto, SCSI, ib), mature and stable
> >> > > - much better error reporting (ext_ack other than string or errno)
> >> > > - namespace aware
> >> > > - do not couple with kobject
> >> > Jason, what is your use case?
> >>
> >>
> >> I think the use case is to report device compatibility for live migration.
> >> Yan proposed a simple sysfs based migration version first, but it looks not
> >> sufficient and something based on JSON is discussed.
> >>
> >> Yan, can you help to summarize the discussion so far for Jiri as a
> >> reference?
> >>
> >yes.
> >we are currently defining a device live migration compatibility
> >interface so that user space, such as openstack and libvirt, knows
> >which two devices are live migration compatible.
> >currently the devices include mdev (a kernel emulated virtual device)
> >and physical devices (e.g. a VF of a PCI SRIOV device).
> >
> >the attributes we want user space to compare include
> >common attributes:
> > device_api: vfio-pci, vfio-ccw...
> > mdev_type: mdev type of mdev or similar signature for physical device
> > It specifies a device's hardware capability. e.g.
> > i915-GVTg_V5_4 means it's of 1/4 of a gen9 Intel graphics
> > device.
> > software_version: device driver's version.
> > in <major>.<minor>[.bugfix] scheme, where there is no
> > compatibility across major versions, minor versions have
> > forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) and
> > bugfix version number indicates some degree of internal
> > improvement that is not visible to the user in terms of
> > features or compatibility,
> >
> >vendor specific attributes: each vendor may define different attributes
> > device id : device id of a physical device or of an mdev's parent pci device.
> > it could be equal to pci id for pci devices
> > aggregator: used together with mdev_type. e.g. aggregator=2 together
> > with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel
> > graphics device.
> > remote_url: for a local NVMe VF, it may be configured with a remote
> > url of a remote storage and all data is stored in the
> > remote side specified by the remote url.
> > ...
> >
> >Comparing those attributes by user space alone is not an easy job, as it
> >can't simply assume an equal relationship between source attributes and
> >target attributes. e.g.
> >for a source device of mdev_type=i915-GVTg_V5_4,aggregator=2, (1/2 of
> >gen9), it actually could find a compatible device of
> >mdev_type=i915-GVTg_V5_8,aggregator=4 (also 1/2 of gen9),
> >if mdev_type of i915-GVTg_V5_4 is not available in the target machine.
> >
> >So, in our current proposal, we want to create two sysfs attributes
> >under a device sysfs node.
> >/sys/<path to device>/migration/self
> >/sys/<path to device>/migration/compatible
> >
> >#cat /sys/<path to device>/migration/self
> >device_type=vfio_pci
> >mdev_type=i915-GVTg_V5_4
> >device_id=8086591d
> >aggregator=2
> >software_version=1.0.0
> >
> >#cat /sys/<path to device>/migration/compatible
> >device_type=vfio_pci
> >mdev_type=i915-GVTg_V5_{val1:int:2,4,8}
> >device_id=8086591d
> >aggregator={val1}/2
> >software_version=1.0.0
> >
> >The /sys/<path to device>/migration/self specifies self attributes of
> >a device.
> >The /sys/<path to device>/migration/compatible specifies the list of
> >compatible devices of a device. as in the example, compatible devices
> >could have
> > device_type == vfio_pci &&
> > device_id == 8086591d &&
> > software_version == 1.0.0 &&
> > (
> > (mdev_type of i915-GVTg_V5_2 && aggregator==1) ||
> > (mdev_type of i915-GVTg_V5_4 && aggregator==2) ||
> > (mdev_type of i915-GVTg_V5_8 && aggregator==4)
> > )
> >
> >by checking whether a target device is in the compatible list of the
> >source device, user space can know whether the two devices are live
> >migration compatible.
> >
> >Additional notes:
> >1)software_version in the compatible list may not be necessary as it
> >already has a major.minor.bugfix scheme.
> >2)for vendor attribute like remote_url, it may not be statically
> >assigned and could be changed with a device interface.
> >
> >So, as Cornelia pointed out that it's not good to use a complex format in
> >a sysfs attribute, we'd like to know whether there are other good ways to
> >handle our use case, e.g. splitting a single attribute into multiple simple
> >sysfs attributes, as Cornelia suggested, or devlink, which Jason has
> >strongly recommended.
>
> Hi Yan.
>
Hi Jiri,
> Thanks for the explanation, I'm still fuzzy about the details.
> Anyway, I suggest you to check "devlink dev info" command we have
> implemented for multiple drivers. You can try netdevsim to test this.
> I think that the info you need to expose might be put there.
Do you mean drivers/net/netdevsim/?
>
> Devlink creates instance per-device. Specific device driver calls into
> devlink core to create the instance. What device do you have? What
Is the devlink core net/core/devlink.c?
> driver is it handled by?
It looks like devlink is specific to network devices. The header comment
in devlink.h says
  include/uapi/linux/devlink.h - Network physical device Netlink interface
so I feel it's not very appropriate for a GPU driver to use this
interface. Is that right?
Thanks
Yan
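
For reference, the matching rule in the quoted proposal can be sketched in
user space roughly as below. This is only an illustration under stated
assumptions, not an agreed interface: the function names are hypothetical,
the template syntax is read as "{name:int:a,b,c}" declaring a variable and
"{name}" or "{name}/N" referencing it with integer division, and attributes
in the compatible list are matched by exact string equality as in the quoted
example. version_compatible() is a separate helper that follows the
<major>.<minor>[.bugfix] rule described above.

```python
import itertools
import re

# Assumed template syntax, taken from the quoted example only.
DECL = re.compile(r"\{(\w+):int:([\d,]+)\}")  # e.g. {val1:int:2,4,8}
REF = re.compile(r"\{(\w+)\}(?:/(\d+))?")     # e.g. {val1} or {val1}/2

# The "compatible" attribute content from the quoted example.
SOURCE_COMPATIBLE = """\
device_type=vfio_pci
mdev_type=i915-GVTg_V5_{val1:int:2,4,8}
device_id=8086591d
aggregator={val1}/2
software_version=1.0.0
"""


def parse_attrs(text):
    """Parse 'key=value' lines into a dict."""
    attrs = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition("=")
        attrs[key] = value
    return attrs


def expand(compatible):
    """Yield one concrete attribute dict per combination of variable values."""
    variables = {}
    for value in compatible.values():
        for name, choices in DECL.findall(value):
            variables[name] = [int(c) for c in choices.split(",")]
    names = sorted(variables)
    for combo in itertools.product(*(variables[n] for n in names)):
        binding = dict(zip(names, combo))

        def reference(match):
            # "{name}" yields the bound value, "{name}/N" divides it by N.
            number = binding[match.group(1)]
            if match.group(2):
                number //= int(match.group(2))
            return str(number)

        def substitute(value):
            value = DECL.sub(lambda m: str(binding[m.group(1)]), value)
            return REF.sub(reference, value)

        yield {key: substitute(value) for key, value in compatible.items()}


def is_compatible(source_compatible, target_self):
    """True if the target's 'self' attributes match any expanded candidate."""
    target = parse_attrs(target_self)
    return any(
        all(target.get(key) == value for key, value in candidate.items())
        for candidate in expand(parse_attrs(source_compatible))
    )


def version_compatible(src, dst):
    """Same major, target minor not older than source minor, bugfix ignored."""
    s_major, s_minor = (int(p) for p in src.split(".")[:2])
    d_major, d_minor = (int(p) for p in dst.split(".")[:2])
    return s_major == d_major and d_minor >= s_minor
```

With the example data, a target reporting mdev_type=i915-GVTg_V5_8 and
aggregator=4 is accepted, while the same mdev_type with aggregator=2 is not.
A caller that prefers the version scheme over exact equality could compare
software_version with version_compatible() instead.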