From: "Daniel P. Berrange" <berrange@redhat.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: "libvir-list@redhat.com" <libvir-list@redhat.com>,
	Andy Currid <ACurrid@nvidia.com>,
	"Tian, Kevin" <kevin.tian@intel.com>, Neo Jia <cjia@nvidia.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	"Song, Jike" <jike.song@intel.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Gerd Hoffmann <kraxel@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"bjsdjshi@linux.vnet.ibm.com" <bjsdjshi@linux.vnet.ibm.com>
Subject: Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
Date: Tue, 20 Sep 2016 10:47:53 +0100
Message-ID: <20160920094753.GB25490@redhat.com>
In-Reply-To: <00d96f24-5df0-d16b-d4e1-838333989dee@nvidia.com>

On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
> 
> Hi libvirt experts,
> 
> Thanks for valuable input on v1 version of RFC.
> 
> Quick brief, VFIO based mediated device framework provides a way to
> virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> and IBM's channel IO. This framework reuses VFIO APIs for all the
> functionalities for mediated devices which are currently being used for
> pass through devices. This framework introduces a set of new sysfs files
> for device creation and its life cycle management.
> 
> Here is the summary of discussion on v1:
> 1. Discover mediated device:
> As part of physical device initialization process, vendor driver will
> register their physical devices, which will be used to create virtual
> device (mediated device, aka mdev) to the mediated framework.
> 
> Vendor driver should specify mdev_supported_types in directory format.
> This format is class based, for example, display class directory format
> should be as below. We need to define such set for each class of devices
> which would be supported by mediated device framework.
> 
>  --- mdev_destroy
>  --- mdev_supported_types
>      |-- 11
>      |   |-- create
>      |   |-- name
>      |   |-- fb_length
>      |   |-- resolution
>      |   |-- heads
>      |   |-- max_instances
>      |   |-- params
>      |   |-- requires_group
>      |-- 12
>      |   |-- create
>      |   |-- name
>      |   |-- fb_length
>      |   |-- resolution
>      |   |-- heads
>      |   |-- max_instances
>      |   |-- params
>      |   |-- requires_group
>      |-- 13
>          |-- create
>          |-- name
>          |-- fb_length
>          |-- resolution
>          |-- heads
>          |-- max_instances
>          |-- params
>          |-- requires_group
> 
> 
> In the above example directory '11' represents a type id of mdev device.
> 'name', 'fb_length', 'resolution', 'heads', 'max_instances' and
> 'requires_group' would be Read-Only files that the vendor would provide
> to describe that type.
> 
> 'create':
>     Write-only file. Mandatory.
>     Accepts string to create mediated device.
> 
> 'name':
>     Read-Only file. Mandatory.
>     Returns string, the name of that type id.

Presumably this is a human-targeted title/description of
the device.

> 
> 'fb_length':
>     Read-only file. Mandatory.
>     Returns <number>{K,M,G}, size of framebuffer.
> 
> 'resolution':
>     Read-Only file. Mandatory.
>     Returns 'hres x vres' format. Maximum supported resolution.
> 
> 'heads':
>     Read-Only file. Mandatory.
>     Returns integer. Number of maximum heads supported.

None of these should be mandatory as that makes the mdev
useless for non-GPU devices.

I'd expect to see a 'class' or 'type' attribute in the
directory which tells you what kind of mdev it is. A
valid 'class' value would be 'gpu'. The fb_length,
resolution, and heads parameters would only be mandatory
when class==gpu.
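
To make the class-based rule concrete, here is a minimal Python sketch
of how a consumer might check one type directory. The 'class' file, the
COMMON_REQUIRED and REQUIRED_BY_CLASS tables, and both helper functions
are assumptions made up for illustration; none of this is an agreed
sysfs ABI.

```python
import os
import tempfile

# Assumed for illustration: files every type directory must expose.
COMMON_REQUIRED = {"class", "name", "create", "max_instances"}
# Assumed for illustration: extra files that become mandatory per class.
REQUIRED_BY_CLASS = {"gpu": {"fb_length", "resolution", "heads"}}


def missing_attributes(type_dir):
    """Return the sorted list of mandatory files absent from one type directory."""
    present = set(os.listdir(type_dir))
    missing = COMMON_REQUIRED - present
    class_file = os.path.join(type_dir, "class")
    if os.path.isfile(class_file):
        with open(class_file) as f:
            dev_class = f.read().strip()
        # Classes not listed in the table add no extra requirements here.
        missing |= REQUIRED_BY_CLASS.get(dev_class, set()) - present
    return sorted(missing)


def make_fake_type_dir(root, type_id, files):
    """Create a throwaway directory standing in for a sysfs type directory."""
    type_dir = os.path.join(root, type_id)
    os.makedirs(type_dir)
    for name, content in files.items():
        with open(os.path.join(type_dir, name), "w") as f:
            f.write(content + "\n")
    return type_dir
```

With this shape a non-GPU type only has to provide the common files,
while a type that declares class 'gpu' is also checked for fb_length,
resolution and heads.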

> 'max_instances':
>     Read-Only file. Mandatory.
>     Returns integer. The maximum number of mdev devices that could be
> created at the moment this file is read. This count would be updated by
> the vendor driver. Before creating an mdev device of this type, check
> that max_instances is > 0.
> 
> 'params'
>     Write-Only file. Optional.
>     String input. Libvirt would pass the string given in XML file to
> this file and then create mdev device. Set empty string to clear params.
> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> limiter for performance benchmarking, then create device of type 11. The
> device created would have that parameter set by vendor driver.

Nope, libvirt will explicitly *NEVER* allow arbitrary opaque
passthrough of vendor specific data in this way.

> The parent device would look like:
> 
>    <device>
>      <name>pci_0000_86_00_0</name>
>      <capability type='pci'>
>        <domain>0</domain>
>        <bus>134</bus>
>        <slot>0</slot>
>        <function>0</function>
>        <capability type='mdev'>
>          <!-- one type element per sysfs directory -->
>          <type id='11'>
>            <!-- one element per sysfs file roughly -->
>            <name>GRID M60-0B</name>
>            <attribute name='fb_length'>512M</attribute>
>            <attribute name='resolution'>2560x1600</attribute>
>            <attribute name='heads'>2</attribute>
>            <attribute name='max_instances'>16</attribute>
>            <attribute name='requires_group'>1</attribute>
>          </type>

There would need to be a <class> element, eg <class>gpu</class>

We would then have further elements based on the class. eg

          <type id='11'>
            <!-- one element per sysfs file roughly -->
            <name>GRID M60-0B</name>
            <fb_length>512M</fb_length>
            <resolution>2560x1600</resolution>
            <heads>2</heads>
            <max_instances>16</max_instances>
            <requires_group>1</requires_group>
          </type>
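
As a rough sketch of why explicit elements help, a consumer can read
them into typed fields with nothing more than a stock XML parser. The
sample document, the <class> element placement, the parse_size() and
parse_types() helpers, and the assumption that K/M/G are powers of 1024
are all illustrative only.

```python
import xml.etree.ElementTree as ET

# Sample fragment in the element-per-attribute layout. The <class> element
# is placed here as an assumption; its final position is not settled.
SAMPLE = """
<capability type='mdev'>
  <type id='11'>
    <class>gpu</class>
    <name>GRID M60-0B</name>
    <fb_length>512M</fb_length>
    <resolution>2560x1600</resolution>
    <heads>2</heads>
    <max_instances>16</max_instances>
    <requires_group>1</requires_group>
  </type>
</capability>
"""

# Assumption: K/M/G are powers of 1024; the proposal does not say.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a '<number>{K,M,G}' string to bytes."""
    return int(text[:-1]) * UNITS[text[-1]]


def parse_types(xml_text):
    """Return {type id: dict of typed fields} for each <type> element."""
    result = {}
    for t in ET.fromstring(xml_text).findall("type"):
        width, height = t.findtext("resolution").split("x")
        result[t.get("id")] = {
            "class": t.findtext("class"),
            "name": t.findtext("name"),
            "fb_bytes": parse_size(t.findtext("fb_length")),
            "resolution": (int(width), int(height)),
            "heads": int(t.findtext("heads")),
            "max_instances": int(t.findtext("max_instances")),
        }
    return result
```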



>        </capability>
>        <product id='...'>GRID M60</product>
>        <vendor id='0x10de'>NVIDIA</vendor>
>      </capability>
>    </device>
> 
> 2. Create/destroy mediated device
> 
> With above example, vGPU device XML would look like:
> 
>    <device>
>      <name>my-vgpu</name>
>      <parent>pci_0000_86_00_0</parent>
>      <capability type='mdev'>
>        <type id='11'/>
>        <group>1</group>
>        <params>'frame_rate_limiter=0'</params>

No, we will not support <params> in this manner in libvirt.

The entire purpose of libvirt is to represent data in a
vendor agnostic manner and not do arbitrary passthrough
of vendor specific data. Simply saying this field is
optional does not get around that either.

>      </capability>
>    </device>
> 
> 'type id' is mandatory.
> 'group' is optional. It should be a unique number in the system among
> all the groups created for mdev devices. Its usage is:
>   - not needed if single vGPU device is being assigned to a domain.
>   - only need to be set if multiple vGPUs need to be assigned to a
> domain and vendor driver have 'requires_group' file in type id directory.
>   - if type id directory include 'requires_group' and user tries to
> assign multiple vGPUs to a domain without having <group> field in XML,
> it will create single vGPU.
> 
> 'params' is optional field. User should set this field if extra
> parameters need to be set for a particular vGPU device. Libvirt don't
> need to parse these params. These are meant for vendor driver.
> 
> Libvirt need to follow the sequence to create device:
> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
> then only proceed else fail.
> 
> * Set extra params if 'params' field exist in device XML and 'params'
> file exist in type id directory
> 
>     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params

We cannot do that step.

> 
> * Autogenerate UUID
> * Create device:
> 
>     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
> 
>     where <group> is optional. Group should be unique number among all
> the groups created for mdev devices.
> 
> * Clear params, if set earlier:
> 
>     echo "" > /sys/../0000\:86\:00.0/11/params
> 
> * To destroy device:
> 
>     echo $UUID > /sys/../0000\:86\:00.0/mdev_destroy
> 
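
For illustration only, the create/destroy sequence without the params
steps could be sketched in Python as below. The function names are
hypothetical, the directories are passed in by the caller because the
sysfs paths in the proposal are abbreviated, and writing the bare UUID
when no group is given is an assumption about how the optional
"<group>" part would be omitted.

```python
import os
import tempfile
import uuid


def create_mdev(type_dir, group=None):
    """Create one mdev of the type at type_dir and return its UUID string."""
    with open(os.path.join(type_dir, "max_instances")) as f:
        if int(f.read().strip()) <= 0:
            raise RuntimeError("no instances of this type available")
    dev_uuid = str(uuid.uuid4())
    # The proposal writes "$UUID:<group>"; the group part is optional.
    # Assumption: without a group, only the UUID is written.
    request = dev_uuid if group is None else f"{dev_uuid}:{group}"
    with open(os.path.join(type_dir, "create"), "w") as f:
        f.write(request)
    return dev_uuid


def destroy_mdev(parent_dir, dev_uuid):
    """Ask the parent device to destroy the mdev with the given UUID."""
    with open(os.path.join(parent_dir, "mdev_destroy"), "w") as f:
        f.write(dev_uuid)
```

Here type_dir would be the '11' directory and parent_dir the directory
holding 'mdev_destroy'. Note that max_instances can change between the
read and the write, so the write to 'create' still has to be treated as
fallible.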
> 
> 3. Start/stop mediated device
> 
> No change or requirement for libvirt as this will be handled by open()
> and close() callbacks to vendor driver. In case of multiple devices and
> 'requires_group' set, this will be handled in 'first open()' and 'last
> close()' on device in that group.
> 
> 4. Launch QEMU/VM
> 
>  Pass the mdev sysfs path to QEMU as vfio-pci device.
>  For above vGPU device example:
> 
>     -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID
> 
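
A minimal sketch of building that argument in Python follows. The
vfio_mdev_args() helper and its sysfs_root parameter are hypothetical;
only the "-device vfio-pci,sysfsdev=..." form comes from the example
above.

```python
import uuid


def vfio_mdev_args(dev_uuid, sysfs_root="/sys/bus/mdev/devices"):
    """Return the QEMU arguments for one mdev, validating the UUID first."""
    # uuid.UUID raises ValueError for a malformed string, which keeps
    # unexpected characters out of the sysfs path.
    canonical = str(uuid.UUID(dev_uuid))
    return ["-device", f"vfio-pci,sysfsdev={sysfs_root}/{canonical}"]
```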
> 5. QEMU/VM Shutdown sequence
> 
> No change or requirement for libvirt.
> 
> 6. VM Reset
> 
> No change or requirement for libvirt as this will be handled via VFIO
> reset API and QEMU process will keep running as before.
> 
> 7. Hot-plug
> 
> It is same syntax to create a virtual device for hot-plug.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
