kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Sean Mooney <smooney@redhat.com>
To: Jiri Pirko <jiri@mellanox.com>, Yan Zhao <yan.y.zhao@intel.com>
Cc: Jason Wang <jasowang@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	kvm@vger.kernel.org, libvir-list@redhat.com,
	qemu-devel@nongnu.org, kwankhede@nvidia.com, eauger@redhat.com,
	xin-ran.wang@intel.com, corbet@lwn.net,
	openstack-discuss@lists.openstack.org, shaohe.feng@intel.com,
	kevin.tian@intel.com, eskultet@redhat.com,
	jian-feng.ding@intel.com, dgilbert@redhat.com,
	zhenyuw@linux.intel.com, hejie.xu@intel.com,
	bao.yumeng@zte.com.cn, intel-gvt-dev@lists.freedesktop.org,
	berrange@redhat.com, dinechin@redhat.com, devel@ovirt.org,
	Parav Pandit <parav@mellanox.com>
Subject: Re: device compatibility interface for live migration with assigned devices
Date: Wed, 05 Aug 2020 12:35:01 +0100	[thread overview]
Message-ID: <4cf2824c803c96496e846c5b06767db305e9fb5a.camel@redhat.com> (raw)
In-Reply-To: <20200805105319.GF2177@nanopsycho>

On Wed, 2020-08-05 at 12:53 +0200, Jiri Pirko wrote:
> Wed, Aug 05, 2020 at 11:33:38AM CEST, yan.y.zhao@intel.com wrote:
> > On Wed, Aug 05, 2020 at 04:02:48PM +0800, Jason Wang wrote:
> > > 
> > > On 2020/8/5 下午3:56, Jiri Pirko wrote:
> > > > Wed, Aug 05, 2020 at 04:41:54AM CEST, jasowang@redhat.com wrote:
> > > > > On 2020/8/5 上午10:16, Yan Zhao wrote:
> > > > > > On Wed, Aug 05, 2020 at 10:22:15AM +0800, Jason Wang wrote:
> > > > > > > On 2020/8/5 上午12:35, Cornelia Huck wrote:
> > > > > > > > [sorry about not chiming in earlier]
> > > > > > > > 
> > > > > > > > On Wed, 29 Jul 2020 16:05:03 +0800
> > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > 
> > > > > > > > > On Mon, Jul 27, 2020 at 04:23:21PM -0600, Alex Williamson wrote:
> > > > > > > > 
> > > > > > > > (...)
> > > > > > > > 
> > > > > > > > > > Based on the feedback we've received, the previously proposed interface
> > > > > > > > > > is not viable.  I think there's agreement that the user needs to be
> > > > > > > > > > able to parse and interpret the version information.  Using json seems
> > > > > > > > > > viable, but I don't know if it's the best option.  Is there any
> > > > > > > > > > precedent of markup strings returned via sysfs we could follow?
> > > > > > > > 
> > > > > > > > I don't think encoding complex information in a sysfs file is a viable
> > > > > > > > approach. Quoting Documentation/filesystems/sysfs.rst:
> > > > > > > > 
> > > > > > > > "Attributes should be ASCII text files, preferably with only one value
> > > > > > > > per file. It is noted that it may not be efficient to contain only one
> > > > > > > > value per file, so it is socially acceptable to express an array of
> > > > > > > > values of the same type.
> > > > > > > > Mixing types, expressing multiple lines of data, and doing fancy
> > > > > > > > formatting of data is heavily frowned upon."
> > > > > > > > 
> > > > > > > > Even though this is an older file, I think these restrictions still
> > > > > > > > apply.
> > > > > > > 
> > > > > > > +1, that's another reason why devlink(netlink) is better.
> > > > > > > 
> > > > > > 
> > > > > > hi Jason,
> > > > > > do you have any materials or sample code about devlink, so we can have a good
> > > > > > study of it?
> > > > > > I found some kernel docs about it but my preliminary study didn't show me the
> > > > > > advantage of devlink.
> > > > > 
> > > > > CC Jiri and Parav for a better answer for this.
> > > > > 
> > > > > My understanding is that the following advantages are obvious (as I replied
> > > > > in another thread):
> > > > > 
> > > > > - existing users (NIC, crypto, SCSI, ib), mature and stable
> > > > > - much better error reporting (ext_ack other than string or errno)
> > > > > - namespace aware
> > > > > - do not couple with kobject
> > > > 
> > > > Jason, what is your use case?
> > > 
> > > 
> > > I think the use case is to report device compatibility for live migration.
> > > Yan proposed a simple sysfs based migration version first, but it looks not
> > > sufficient and something based on JSON is discussed.
> > > 
> > > Yan, can you help to summarize the discussion so far for Jiri as a
> > > reference?
> > > 
> > 
> > yes.
> > we are currently defining an device live migration compatibility
> > interface in order to let user space like openstack and libvirt knows
> > which two devices are live migration compatible.
> > currently the devices include mdev (a kernel emulated virtual device)
> > and physical devices (e.g.  a VF of a PCI SRIOV device).
> > 
> > the attributes we want user space to compare including
> > common attribues:
> >    device_api: vfio-pci, vfio-ccw...
> >    mdev_type: mdev type of mdev or similar signature for physical device
> >               It specifies a device's hardware capability. e.g.
> > 	       i915-GVTg_V5_4 means it's of 1/4 of a gen9 Intel graphics
> > 	       device.
by the way this nameing sceam works the opisite of how it would have expected
i woudl have expected to i915-GVTg_V5 to be the same as i915-GVTg_V5_1 and 
i915-GVTg_V5_4 to use 4 times the amount of resouce as i915-GVTg_V5_1 not 1 quarter.

i would much rather see i915-GVTg_V5_4 express as aggreataor:i915-GVTg_V5=4
e.g. that it is 4 of the basic i915-GVTg_V5 type
the invertion of the relationship makes this much harder to resonabout IMO.

if i915-GVTg_V5_8 and i915-GVTg_V5_4 are both actully claiming the same resouce
and both can be used at the same time with your suggested nameing scemem i have have
to fine the mdevtype with the largest value and store that then do math by devidign it by the suffix
of the requested type every time i want to claim the resouce in our placement inventoies.

if we represent it the way i suggest we dont
if it i915-GVTg_V5_8 i know its using 8 of i915-GVTg_V5
it makes it significantly simpler.

> >    software_version: device driver's version.
> >               in <major>.<minor>[.bugfix] scheme, where there is no
> > 	       compatibility across major versions, minor versions have
> > 	       forward compatibility (ex. 1-> 2 is ok, 2 -> 1 is not) and
> > 	       bugfix version number indicates some degree of internal
> > 	       improvement that is not visible to the user in terms of
> > 	       features or compatibility,
> > 
> > vendor specific attributes: each vendor may define different attributes
> >   device id : device id of a physical devices or mdev's parent pci device.
> >               it could be equal to pci id for pci devices
> >   aggregator: used together with mdev_type. e.g. aggregator=2 together
> >               with i915-GVTg_V5_4 means 2*1/4=1/2 of a gen9 Intel
> > 	       graphics device.
> >   remote_url: for a local NVMe VF, it may be configured with a remote
> >               url of a remote storage and all data is stored in the
> > 	       remote side specified by the remote url.
> >   ...
just a minor not that i find ^ much more simmple to understand then
the current proposal with self and compatiable.
if i have well defiend attibute that i can parse and understand that allow
me to calulate the what is and is not compatible that is likely going to
more useful as you wont have to keep maintianing a list of other compatible
devices every time a new sku is released.

in anycase thank for actully shareing ^ as it make it simpler to reson about what
you have previously proposed.
> > 
> > Comparing those attributes by user space alone is not an easy job, as it
> > can't simply assume an equal relationship between source attributes and
> > target attributes. e.g.
> > for a source device of mdev_type=i915-GVTg_V5_4,aggregator=2, (1/2 of
> > gen9), it actually could find a compatible device of
> > mdev_type=i915-GVTg_V5_8,aggregator=4 (also 1/2 of gen9),
> > if mdev_type of i915-GVTg_V5_4 is not available in the target machine.
> > 
> > So, in our current proposal, we want to create two sysfs attributes
> > under a device sysfs node.
> > /sys/<path to device>/migration/self
> > /sys/<path to device>/migration/compatible
> > 
> > #cat /sys/<path to device>/migration/self
> > device_type=vfio_pci
> > mdev_type=i915-GVTg_V5_4
> > device_id=8086591d
> > aggregator=2
> > software_version=1.0.0
> > 
> > #cat /sys/<path to device>/migration/compatible
> > device_type=vfio_pci
> > mdev_type=i915-GVTg_V5_{val1:int:2,4,8}
> > device_id=8086591d
> > aggregator={val1}/2
> > software_version=1.0.0
> > 
> > The /sys/<path to device>/migration/self specifies self attributes of
> > a device.
> > The /sys/<path to device>/migration/compatible specifies the list of
> > compatible devices of a device. as in the example, compatible devices
> > could have
> > 	device_type == vfio_pci &&
> > 	device_id == 8086591d   &&
> > 	software_version == 1.0.0 &&
> >        (
> > 	(mdev_type of i915-GVTg_V5_2 && aggregator==1) ||
> > 	(mdev_type of i915-GVTg_V5_4 && aggregator==2) ||
> > 	(mdev_type of i915-GVTg_V5_8 && aggregator=4)
> > 	)
> > 
> > by comparing whether a target device is in compatible list of source
> > device, the user space can know whether a two devices are live migration
> > compatible.
> > 
> > Additional notes:
> > 1)software_version in the compatible list may not be necessary as it
> > already has a major.minor.bugfix scheme.
> > 2)for vendor attribute like remote_url, it may not be statically
> > assigned and could be changed with a device interface.
> > 
> > So, as Cornelia pointed that it's not good to use complex format in
> > a sysfs attribute, we'd like to know whether there're other good ways to
> > our use case, e.g. splitting a single attribute to multiple simple sysfs
> > attributes as what Cornelia suggested or devlink that Jason has strongly
> > recommended.
> 
> Hi Yan.
> 
> Thanks for the explanation, I'm still fuzzy about the details.
> Anyway, I suggest you to check "devlink dev info" command we have
> implemented for multiple drivers.

is devlink exposed as a filesytem we can read with just open?
openstack will likely try to leverage libvirt to get this info but when we
cant its much simpler to read sysfs then it is to take a a depenency on a commandline
too and have to fork shell to execute it and parse the cli output.
pyroute2 which we use in some openstack poject has basic python binding for devlink but im not
sure how complete it is as i think its relitivly new addtion. if we need to take a dependcy
we will but that would be a drawback fo devlink not that that is a large one just something
to keep in mind.

>  You can try netdevsim to test this.
> I think that the info you need to expose might be put there.
> 
> Devlink creates instance per-device. Specific device driver calls into
> devlink core to create the instance.  What device do you have? What
> driver is it handled by?
> 
> 
> > 
> > Thanks
> > Yan
> > 
> > 
> > 
> 
> 


  reply	other threads:[~2020-08-05 17:18 UTC|newest]

Thread overview: 114+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-13 23:29 device compatibility interface for live migration with assigned devices Yan Zhao
2020-07-14 10:21 ` Daniel P. Berrangé
2020-07-14 12:33   ` Sean Mooney
     [not found]     ` <20200714110148.0471c03c@x1.home>
     [not found]       ` <eb705c72cdc8b6b8959b6ebaeeac6069a718d524.camel@redhat.com>
2020-07-14 21:15         ` Sean Mooney
2020-07-14 16:16   ` Alex Williamson
2020-07-14 16:47     ` Daniel P. Berrangé
2020-07-14 20:47       ` Alex Williamson
2020-07-15  9:16         ` Daniel P. Berrangé
2020-07-14 17:19     ` Dr. David Alan Gilbert
2020-07-14 20:59       ` Alex Williamson
2020-07-15  8:20         ` Yan Zhao
2020-07-15  8:49           ` Feng, Shaohe
2020-07-17 14:59           ` Alex Williamson
2020-07-17 18:03             ` Dr. David Alan Gilbert
2020-07-17 18:30               ` Alex Williamson
2020-07-15  8:23         ` Dr. David Alan Gilbert
     [not found]         ` <CAH7mGatPWsczh_rbVhx4a+psJXvkZgKou3r5HrEQTqE7SqZkKA@mail.gmail.com>
2020-07-17 15:18           ` Alex Williamson
2020-07-16  4:16 ` Jason Wang
2020-07-16  8:32   ` Yan Zhao
2020-07-16  9:30     ` Jason Wang
2020-07-17 16:12     ` Alex Williamson
2020-07-20  3:41       ` Jason Wang
2020-07-20 10:39         ` Sean Mooney
2020-07-21  2:11           ` Jason Wang
2020-07-21  0:51       ` Yan Zhao
2020-07-27  7:24         ` Yan Zhao
2020-07-27 22:23           ` Alex Williamson
2020-07-29  8:05             ` Yan Zhao
2020-07-29 11:28               ` Sean Mooney
2020-07-29 19:12                 ` Alex Williamson
2020-07-30  3:41                   ` Yan Zhao
2020-07-30 13:24                     ` Sean Mooney
2020-07-30 17:29                     ` Alex Williamson
2020-08-04  8:37                       ` Yan Zhao
2020-08-05  9:44                         ` Dr. David Alan Gilbert
2020-07-30  1:56                 ` Yan Zhao
2020-07-30 13:14                   ` Sean Mooney
2020-08-04 16:35               ` Cornelia Huck
2020-08-05  2:22                 ` Jason Wang
2020-08-05  2:16                   ` Yan Zhao
2020-08-05  2:41                     ` Jason Wang
2020-08-05  7:56                       ` Jiri Pirko
2020-08-05  8:02                         ` Jason Wang
2020-08-05  9:33                           ` Yan Zhao
2020-08-05 10:53                             ` Jiri Pirko
2020-08-05 11:35                               ` Sean Mooney [this message]
2020-08-07 11:59                                 ` Cornelia Huck
2020-08-13 15:33                                   ` Cornelia Huck
2020-08-13 19:02                                     ` Eric Farman
2020-08-17  6:38                                       ` Cornelia Huck
2020-08-10  7:46                               ` Yan Zhao
2020-08-13  4:24                                 ` Jason Wang
2020-08-14  5:16                                   ` Yan Zhao
2020-08-14 12:30                                     ` Sean Mooney
2020-08-17  1:52                                       ` Yan Zhao
2020-08-18  3:24                                     ` Jason Wang
2020-08-18  8:55                                       ` Daniel P. Berrangé
2020-08-18  9:06                                         ` Cornelia Huck
2020-08-18  9:24                                           ` Daniel P. Berrangé
2020-08-18  9:38                                             ` Cornelia Huck
     [not found]                                         ` <3a073222-dcfe-c02d-198b-29f6a507b2e1@redhat.com>
2020-08-18  9:16                                           ` Daniel P. Berrangé
2020-08-18  9:36                                             ` Cornelia Huck
2020-08-18  9:39                                               ` Parav Pandit
2020-08-19  3:30                                                 ` Yan Zhao
2020-08-19  5:58                                                   ` Parav Pandit
2020-08-19  9:41                                                     ` Jason Wang
2020-08-19  6:57                                                   ` [ovirt-devel] " Jason Wang
2020-08-19  6:59                                                     ` Yan Zhao
2020-08-19  7:39                                                       ` Jason Wang
2020-08-19  8:13                                                         ` Yan Zhao
2020-08-19  9:28                                                           ` Jason Wang
2020-08-20 12:27                                                             ` Cornelia Huck
2020-08-21  3:14                                                               ` Jason Wang
2020-08-21 14:52                                                                 ` Cornelia Huck
2020-08-31  3:07                                                                   ` Jason Wang
2020-08-19 17:50                                                   ` Alex Williamson
2020-08-20  0:18                                                     ` Yan Zhao
2020-08-20  3:13                                                       ` Alex Williamson
2020-08-20  3:09                                                         ` Yan Zhao
2020-08-19  2:54                                               ` Jason Wang
2020-08-20  0:39                                               ` Yan Zhao
2020-08-20  1:29                                                 ` Sean Mooney
2020-08-20  4:01                                                   ` Yan Zhao
2020-08-20  5:16                                                     ` Sean Mooney
2020-08-20  6:27                                                       ` Yan Zhao
2020-08-20 13:24                                                         ` Sean Mooney
2020-08-26  8:54                                                           ` Yan Zhao
2020-08-20  3:22                                                 ` Alex Williamson
2020-08-20  3:16                                                   ` Yan Zhao
2020-08-25 14:39                                                     ` Cornelia Huck
2020-08-26  6:41                                                       ` Yan Zhao
2020-08-28 13:47                                                         ` Cornelia Huck
2020-08-28 14:04                                                           ` Sean Mooney
2020-08-31  4:43                                                             ` Yan Zhao
2020-09-08 14:41                                                               ` Cornelia Huck
2020-09-09  2:13                                                                 ` Yan Zhao
2020-09-10 12:38                                                                   ` Cornelia Huck
2020-09-10 12:50                                                                     ` Sean Mooney
2020-09-10 18:02                                                                       ` Alex Williamson
2020-09-11  0:56                                                                         ` Yan Zhao
2020-09-11 10:08                                                                           ` Cornelia Huck
2020-09-11 10:18                                                                             ` Tian, Kevin
2020-09-11 16:51                                                                           ` Alex Williamson
2020-09-14 13:48                                                                             ` Zeng, Xin
2020-09-14 14:44                                                                               ` Alex Williamson
2020-09-09  5:37                                                               ` Yan Zhao
2020-08-31  2:23                                                           ` Yan Zhao
2020-08-19  2:38                                             ` Jason Wang
2020-08-18  9:32                                           ` Parav Pandit
2020-08-19  2:45                                             ` Jason Wang
2020-08-19  5:26                                               ` Parav Pandit
2020-08-19  6:48                                                 ` Jason Wang
2020-08-19  6:53                                                   ` Parav Pandit
2020-07-29 19:05             ` Dr. David Alan Gilbert

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4cf2824c803c96496e846c5b06767db305e9fb5a.camel@redhat.com \
    --to=smooney@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=bao.yumeng@zte.com.cn \
    --cc=berrange@redhat.com \
    --cc=cohuck@redhat.com \
    --cc=corbet@lwn.net \
    --cc=devel@ovirt.org \
    --cc=dgilbert@redhat.com \
    --cc=dinechin@redhat.com \
    --cc=eauger@redhat.com \
    --cc=eskultet@redhat.com \
    --cc=hejie.xu@intel.com \
    --cc=intel-gvt-dev@lists.freedesktop.org \
    --cc=jasowang@redhat.com \
    --cc=jian-feng.ding@intel.com \
    --cc=jiri@mellanox.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=kwankhede@nvidia.com \
    --cc=libvir-list@redhat.com \
    --cc=openstack-discuss@lists.openstack.org \
    --cc=parav@mellanox.com \
    --cc=qemu-devel@nongnu.org \
    --cc=shaohe.feng@intel.com \
    --cc=xin-ran.wang@intel.com \
    --cc=yan.y.zhao@intel.com \
    --cc=zhenyuw@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).