All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
@ 2016-09-19 20:35 Kirti Wankhede
  2016-09-19 21:36 ` Alex Williamson
                   ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-19 20:35 UTC (permalink / raw)
  To: libvir-list
  Cc: qemu-devel, Alex Williamson, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann


Hi libvirt experts,

Thanks for valuable input on v1 version of RFC.

Quick brief, VFIO based mediated device framework provides a way to
virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
and IBM's channel IO. This framework reuses VFIO APIs for all the
functionalities for mediated devices which are currently being used for
pass through devices. This framework introduces a set of new sysfs files
for device creation and its life cycle management.

Here is the summary of discussion on v1:
1. Discover mediated device:
As part of physical device initialization process, vendor driver will
register their physical devices, which will be used to create virtual
device (mediated device, aka mdev) to the mediated framework.

Vendor driver should specify mdev_supported_types in directory format.
This format is class based, for example, display class directory format
should be as below. We need to define such set for each class of devices
which would be supported by mediated device framework.

 --- mdev_destroy
 --- mdev_supported_types
     |-- 11
     |   |-- create
     |   |-- name
     |   |-- fb_length
     |   |-- resolution
     |   |-- heads
     |   |-- max_instances
     |   |-- params
     |   |-- requires_group
     |-- 12
     |   |-- create
     |   |-- name
     |   |-- fb_length
     |   |-- resolution
     |   |-- heads
     |   |-- max_instances
     |   |-- params
     |   |-- requires_group
     |-- 13
         |-- create
         |-- name
         |-- fb_length
         |-- resolution
         |-- heads
         |-- max_instances
         |-- params
         |-- requires_group


In the above example directory '11' represents a type id of mdev device.
'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
'requires_group' would be Read-Only files that vendor would provide to
describe about that type.

'create':
    Write-only file. Mandatory.
    Accepts string to create mediated device.

'name':
    Read-Only file. Mandatory.
    Returns string, the name of that type id.

'fb_length':
    Read-only file. Mandatory.
    Returns <number>{K,M,G}, size of framebuffer.

'resolution':
    Read-Only file. Mandatory.
    Returns 'hres x vres' format. Maximum supported resolution.

'heads':
    Read-Only file. Mandatory.
    Returns integer. Number of maximum heads supported.

'max_instance':
    Read-Only file. Mandatory.
    Returns integer.  Returns maximum mdev device could be created
at the moment when this file is read. This count would be updated by
vendor driver. Before creating mdev device of this type, check if
max_instance is > 0.

'params'
    Write-Only file. Optional.
    String input. Libvirt would pass the string given in XML file to
this file and then create mdev device. Set empty string to clear params.
For example, set parameter 'frame_rate_limiter=0' to disable frame rate
limiter for performance benchmarking, then create device of type 11. The
device created would have that parameter set by vendor driver.

'requires_group'
    Read-Only file. Optional.
    This should be provided by vendor driver if vendor driver need to
group mdev devices in one domain so that vendor driver can use 'first
open' to commit resources of all mdev devices associated to that domain
and 'last close' to free those.

The parent device would look like:

   <device>
     <name>pci_0000_86_00_0</name>
     <capability type='pci'>
       <domain>0</domain>
       <bus>134</bus>
       <slot>0</slot>
       <function>0</function>
       <capability type='mdev'>
         <!-- one type element per sysfs directory -->
         <type id='11'>
           <!-- one element per sysfs file roughly -->
           <name>GRID M60-0B</name>
           <attribute name='fb_length'>512M</attribute>
           <attribute name='resolution'>2560x1600</attribute>
           <attribute name='heads'>2</attribute>
           <attribute name='max_instances'>16</attribute>
           <attribute name='requires_group'>1</attribute>
         </type>
       </capability>
       <product id='...'>GRID M60</product>
       <vendor id='0x10de'>NVIDIA</vendor>
     </capability>
   </device>

2. Create/destroy mediated device

With above example, vGPU device XML would look like:

   <device>
     <name>my-vgpu</name>
     <parent>pci_0000_86_00_0</parent>
     <capability type='mdev'>
       <type id='11'/>
       <group>1</group>
       <params>'frame_rate_limiter=0'</params>
     </capability>
   </device>

'type id' is mandatory.
'group' is optional. It should be a unique number in the system among
all the groups created for mdev devices. Its usage is:
  - not needed if single vGPU device is being assigned to a domain.
  - only need to be set if multiple vGPUs need to be assigned to a
domain and vendor driver have 'requires_group' file in type id directory.
  - if type id directory include 'requires_group' and user tries to
assign multiple vGPUs to a domain without having <group> field in XML,
it will create single vGPU.

'params' is optional field. User should set this field if extra
parameters need to be set for a particular vGPU device. Libvirt don't
need to parse these params. These are meant for vendor driver.

Libvirt need to follow the sequence to create device:
* Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
then only proceed else fail.

* Set extra params if 'params' field exist in device XML and 'params'
file exist in type id directory

    echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params

* Autogenerate UUID
* Create device:

    echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create

    where <group> is optional. Group should be unique number among all
the groups created for mdev devices.

* Clear params, if set earlier:

    echo "" > /sys/../0000\:86\:00.0/11/params

* To destroy device:

    echo $UUID > /sys/../0000\:86\:00.0/mdev_destroy


3. Start/stop mediated device

No change or requirement for libvirt as this will be handled by open()
and close() callbacks to vendor driver. In case of multiple devices and
'requires_group' set, this will be handled in 'first open()' and 'last
close()' on device in that group.

4. Launch QEMU/VM

 Pass the mdev sysfs path to QEMU as vfio-pci device.
 For above vGPU device example:

    -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID

5. QEMU/VM Shutdown sequence

No change or requirement for libvirt.

6. VM Reset

No change or requirement for libvirt as this will be handled via VFIO
reset API and QEMU process will keep running as before.

7. Hot-plug

It is same syntax to create a virtual device for hot-plug.

Thanks,
Kirti

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-19 20:35 [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration Kirti Wankhede
@ 2016-09-19 21:36 ` Alex Williamson
  2016-09-19 21:50   ` Paolo Bonzini
                     ` (2 more replies)
  2016-09-20  1:37 ` Tian, Kevin
  2016-09-20  9:47 ` Daniel P. Berrange
  2 siblings, 3 replies; 62+ messages in thread
From: Alex Williamson @ 2016-09-19 21:36 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann

On Tue, 20 Sep 2016 02:05:52 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> Hi libvirt experts,
> 
> Thanks for valuable input on v1 version of RFC.
> 
> Quick brief, VFIO based mediated device framework provides a way to
> virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> and IBM's channel IO. This framework reuses VFIO APIs for all the
> functionalities for mediated devices which are currently being used for
> pass through devices. This framework introduces a set of new sysfs files
> for device creation and its life cycle management.
> 
> Here is the summary of discussion on v1:
> 1. Discover mediated device:
> As part of physical device initialization process, vendor driver will
> register their physical devices, which will be used to create virtual
> device (mediated device, aka mdev) to the mediated framework.
> 
> Vendor driver should specify mdev_supported_types in directory format.
> This format is class based, for example, display class directory format
> should be as below. We need to define such set for each class of devices
> which would be supported by mediated device framework.
> 
>  --- mdev_destroy

I thought we replaced mdev_destroy with a remove attribute for each
mdev device.

>  --- mdev_supported_types
>      |-- 11
>      |   |-- create
>      |   |-- name
>      |   |-- fb_length
>      |   |-- resolution
>      |   |-- heads
>      |   |-- max_instances
>      |   |-- params
>      |   |-- requires_group
>      |-- 12
>      |   |-- create
>      |   |-- name
>      |   |-- fb_length
>      |   |-- resolution
>      |   |-- heads
>      |   |-- max_instances
>      |   |-- params
>      |   |-- requires_group
>      |-- 13
>          |-- create
>          |-- name
>          |-- fb_length
>          |-- resolution
>          |-- heads
>          |-- max_instances
>          |-- params
>          |-- requires_group
> 
> 
> In the above example directory '11' represents a type id of mdev device.
> 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> 'requires_group' would be Read-Only files that vendor would provide to
> describe about that type.
> 
> 'create':
>     Write-only file. Mandatory.
>     Accepts string to create mediated device.
> 
> 'name':
>     Read-Only file. Mandatory.
>     Returns string, the name of that type id.
> 
> 'fb_length':
>     Read-only file. Mandatory.
>     Returns <number>{K,M,G}, size of framebuffer.

This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
just one user of mediated devices.

> 
> 'resolution':
>     Read-Only file. Mandatory.
>     Returns 'hres x vres' format. Maximum supported resolution.

Same.

> 
> 'heads':
>     Read-Only file. Mandatory.
>     Returns integer. Number of maximum heads supported.

Same.

> 
> 'max_instance':
>     Read-Only file. Mandatory.
>     Returns integer.  Returns maximum mdev device could be created
> at the moment when this file is read. This count would be updated by
> vendor driver. Before creating mdev device of this type, check if
> max_instance is > 0.

Didn't we discuss this being being something like "available_instances"
with a "devices" sub-directory linking current devices of that type?
 
> 'params'
>     Write-Only file. Optional.
>     String input. Libvirt would pass the string given in XML file to
> this file and then create mdev device. Set empty string to clear params.
> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> limiter for performance benchmarking, then create device of type 11. The
> device created would have that parameter set by vendor driver.

Might as well just call it "magic", there's absolutely no ability to
introspect what parameters are allowed or even valid here.  Can all
parameters be changed dynamically?  Do parameter changes apply to
existing devices or only future devices?  This is a poorly specified
interface.  I'd prefer this be done via module options on the vendor
driver.

> 'requires_group'
>     Read-Only file. Optional.
>     This should be provided by vendor driver if vendor driver need to
> group mdev devices in one domain so that vendor driver can use 'first
> open' to commit resources of all mdev devices associated to that domain
> and 'last close' to free those.
> 
> The parent device would look like:
> 
>    <device>
>      <name>pci_0000_86_00_0</name>
>      <capability type='pci'>
>        <domain>0</domain>
>        <bus>134</bus>
>        <slot>0</slot>
>        <function>0</function>

This is the parent device address?  (I think we'd re-use the
specification for assigned devices)  Is this optional?  Couldn't
libvirt choose to pick any parent device supplying the specified type
if not supplied?

>        <capability type='mdev'>
>          <!-- one type element per sysfs directory -->
>          <type id='11'>
>            <!-- one element per sysfs file roughly -->
>            <name>GRID M60-0B</name>
>            <attribute name='fb_length'>512M</attribute>
>            <attribute name='resolution'>2560x1600</attribute>
>            <attribute name='heads'>2</attribute>
>            <attribute name='max_instances'>16</attribute>
>            <attribute name='requires_group'>1</attribute>
>          </type>
>        </capability>
>        <product id='...'>GRID M60</product>
>        <vendor id='0x10de'>NVIDIA</vendor>

What are these specifying?

>      </capability>
>    </device>
> 
> 2. Create/destroy mediated device
> 
> With above example, vGPU device XML would look like:
> 
>    <device>
>      <name>my-vgpu</name>
>      <parent>pci_0000_86_00_0</parent>
>      <capability type='mdev'>
>        <type id='11'/>
>        <group>1</group>
>        <params>'frame_rate_limiter=0'</params>
>      </capability>
>    </device>
> 
> 'type id' is mandatory.

I was under the impression that the vendor supplied "name" was the one
unique identifier.  We're certainly not going to create a registrar to
hand out type ids to each vendor, so what makes type id unique?  I have
a hard time believing that a given vendor can even allocate unique type
ids for their own devices.  Unique type id across vendors is not
practical.  So which attribute are we actually using to identify which
type of mdev device we need and why would we ever specify additional
attributes like fb_length?  Doesn't the vendor guarantee that "GRID
M60-0B" has a fixed setup of those attributes?

> 'group' is optional. It should be a unique number in the system among
> all the groups created for mdev devices. Its usage is:
>   - not needed if single vGPU device is being assigned to a domain.
>   - only need to be set if multiple vGPUs need to be assigned to a
> domain and vendor driver have 'requires_group' file in type id directory.
>   - if type id directory include 'requires_group' and user tries to
> assign multiple vGPUs to a domain without having <group> field in XML,
> it will create single vGPU.

We never finished our discussion of how this gets implemented or
whether a group applies only to devices from the same vendor, same
device, how heterogeneous groups are handled, etc.

> 'params' is optional field. User should set this field if extra
> parameters need to be set for a particular vGPU device. Libvirt don't
> need to parse these params. These are meant for vendor driver.

ie. magic black hole.  nope.

> Libvirt need to follow the sequence to create device:
> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
> then only proceed else fail.
> 
> * Set extra params if 'params' field exist in device XML and 'params'
> file exist in type id directory
> 
>     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params
> 
> * Autogenerate UUID
> * Create device:
> 
>     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
> 
>     where <group> is optional. Group should be unique number among all
> the groups created for mdev devices.

Is it an integer, a UUID, a string?  How is the group for an mdev
device discovered?  Can it be changed later?

> * Clear params, if set earlier:
> 
>     echo "" > /sys/../0000\:86\:00.0/11/params

So params only applies to the devices created while those params are
set?  This is inherently racy.

> * To destroy device:
> 
>     echo $UUID > /sys/../0000\:86\:00.0/mdev_destroy

echo 1 > /sys/path/to/mdev/device/remove
 
> 3. Start/stop mediated device
> 
> No change or requirement for libvirt as this will be handled by open()
> and close() callbacks to vendor driver. In case of multiple devices and
> 'requires_group' set, this will be handled in 'first open()' and 'last
> close()' on device in that group.
> 
> 4. Launch QEMU/VM
> 
>  Pass the mdev sysfs path to QEMU as vfio-pci device.
>  For above vGPU device example:
> 
>     -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID
> 
> 5. QEMU/VM Shutdown sequence
> 
> No change or requirement for libvirt.
> 
> 6. VM Reset
> 
> No change or requirement for libvirt as this will be handled via VFIO
> reset API and QEMU process will keep running as before.
> 
> 7. Hot-plug
> 
> It is same syntax to create a virtual device for hot-plug.

How do groups work with hotplug?  Can a device be creating into an
existing, running group?  Can a device be removed from an existing,
running group?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-19 21:36 ` Alex Williamson
@ 2016-09-19 21:50   ` Paolo Bonzini
  2016-09-19 22:25     ` Alex Williamson
  2016-09-20  1:25   ` Tian, Kevin
  2016-09-20 14:21   ` Kirti Wankhede
  2 siblings, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2016-09-19 21:50 UTC (permalink / raw)
  To: Alex Williamson, Kirti Wankhede
  Cc: libvir-list, qemu-devel, Neo Jia, Andy Currid, Tian, Kevin, Song,
	Jike, bjsdjshi, Gerd Hoffmann



On 19/09/2016 23:36, Alex Williamson wrote:
> On Tue, 20 Sep 2016 02:05:52 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>> 'fb_length':
>>     Read-only file. Mandatory.
>>     Returns <number>{K,M,G}, size of framebuffer.
> 
> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> just one user of mediated devices. [...]
> 
>> 'params'
>>     Write-Only file. Optional.
>>     String input. Libvirt would pass the string given in XML file to
>> this file and then create mdev device. Set empty string to clear params.
>> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
>> limiter for performance benchmarking, then create device of type 11. The
>> device created would have that parameter set by vendor driver.
> 
> Might as well just call it "magic", there's absolutely no ability to
> introspect what parameters are allowed or even valid here.  Can all
> parameters be changed dynamically?  Do parameter changes apply to
> existing devices or only future devices?  This is a poorly specified
> interface.  I'd prefer this be done via module options on the vendor
> driver.

Or additional files in the mdev's sysfs directory?

>> The parent device would look like:
>>
>>    <device>
>>      <name>pci_0000_86_00_0</name>
>>      <capability type='pci'>
>>        <domain>0</domain>
>>        <bus>134</bus>
>>        <slot>0</slot>
>>        <function>0</function>
> 
> This is the parent device address?

(Since this is taken from an email of mine, this is libvirt's XML
interpretation of the udev database and the contents of sysfs.  Try
"virsh nodedev-dumpxml pci_0000_00_02_0" for an example that is
unrelated to mdev).

>> 2. Create/destroy mediated device
>>
>> With above example, vGPU device XML would look like:
>>
>>    <device>
>>      <name>my-vgpu</name>
>>      <parent>pci_0000_86_00_0</parent>
>>      <capability type='mdev'>
>>        <type id='11'/>
>>        <group>1</group>
>>        <params>'frame_rate_limiter=0'</params>
>>      </capability>
>>    </device>
>>
>> 'type id' is mandatory.
> 
> I was under the impression that the vendor supplied "name" was the one
> unique identifier.  We're certainly not going to create a registrar to
> hand out type ids to each vendor, so what makes type id unique?

Why does it have to be unique?  It's just parroting what libvirt said in
the <capability type='mdev'> block of the parent's nodedev-dumpxml.

(Though I guess going by name could be useful as well.  Perhaps both
could be allowed).

>> * Clear params, if set earlier:
>>
>>     echo "" > /sys/../0000\:86\:00.0/11/params
> 
> So params only applies to the devices created while those params are
> set?  This is inherently racy.

I agree, these extra attributes should be set on the mdev, not the type
directory.

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-19 21:50   ` Paolo Bonzini
@ 2016-09-19 22:25     ` Alex Williamson
  2016-09-20 14:35       ` Kirti Wankhede
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Williamson @ 2016-09-19 22:25 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kirti Wankhede, libvir-list, qemu-devel, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann

On Mon, 19 Sep 2016 23:50:56 +0200
Paolo Bonzini <pbonzini@redhat.com> wrote:

> On 19/09/2016 23:36, Alex Williamson wrote:
> > On Tue, 20 Sep 2016 02:05:52 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:  
> >> 'fb_length':
> >>     Read-only file. Mandatory.
> >>     Returns <number>{K,M,G}, size of framebuffer.  
> > 
> > This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> > just one user of mediated devices. [...]
> >   
> >> 'params'
> >>     Write-Only file. Optional.
> >>     String input. Libvirt would pass the string given in XML file to
> >> this file and then create mdev device. Set empty string to clear params.
> >> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> >> limiter for performance benchmarking, then create device of type 11. The
> >> device created would have that parameter set by vendor driver.  
> > 
> > Might as well just call it "magic", there's absolutely no ability to
> > introspect what parameters are allowed or even valid here.  Can all
> > parameters be changed dynamically?  Do parameter changes apply to
> > existing devices or only future devices?  This is a poorly specified
> > interface.  I'd prefer this be done via module options on the vendor
> > driver.  
> 
> Or additional files in the mdev's sysfs directory?

Sure, the vendor driver could certainly support that, it'd be vendor
specific of course.
 
> >> The parent device would look like:
> >>
> >>    <device>
> >>      <name>pci_0000_86_00_0</name>
> >>      <capability type='pci'>
> >>        <domain>0</domain>
> >>        <bus>134</bus>
> >>        <slot>0</slot>
> >>        <function>0</function>  
> > 
> > This is the parent device address?  
> 
> (Since this is taken from an email of mine, this is libvirt's XML
> interpretation of the udev database and the contents of sysfs.  Try
> "virsh nodedev-dumpxml pci_0000_00_02_0" for an example that is
> unrelated to mdev).
> 
> >> 2. Create/destroy mediated device
> >>
> >> With above example, vGPU device XML would look like:
> >>
> >>    <device>
> >>      <name>my-vgpu</name>
> >>      <parent>pci_0000_86_00_0</parent>
> >>      <capability type='mdev'>
> >>        <type id='11'/>
> >>        <group>1</group>
> >>        <params>'frame_rate_limiter=0'</params>
> >>      </capability>
> >>    </device>
> >>
> >> 'type id' is mandatory.  
> > 
> > I was under the impression that the vendor supplied "name" was the one
> > unique identifier.  We're certainly not going to create a registrar to
> > hand out type ids to each vendor, so what makes type id unique?  
> 
> Why does it have to be unique?  It's just parroting what libvirt said in
> the <capability type='mdev'> block of the parent's nodedev-dumpxml.
> 
> (Though I guess going by name could be useful as well.  Perhaps both
> could be allowed).

Do we not want to be able to migrate a VM to another host and have the
XML mean something?  Off-line or on-line, mdev device may support
migration.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-19 21:36 ` Alex Williamson
  2016-09-19 21:50   ` Paolo Bonzini
@ 2016-09-20  1:25   ` Tian, Kevin
  2016-09-20 14:21   ` Kirti Wankhede
  2 siblings, 0 replies; 62+ messages in thread
From: Tian, Kevin @ 2016-09-20  1:25 UTC (permalink / raw)
  To: Alex Williamson, Kirti Wankhede
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Song, Jike, bjsdjshi, Gerd Hoffmann

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Tuesday, September 20, 2016 5:36 AM
> 
> > In the above example directory '11' represents a type id of mdev device.
> > 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> > 'requires_group' would be Read-Only files that vendor would provide to
> > describe about that type.
> >
> > 'create':
> >     Write-only file. Mandatory.
> >     Accepts string to create mediated device.
> >
> > 'name':
> >     Read-Only file. Mandatory.
> >     Returns string, the name of that type id.
> >
> > 'fb_length':
> >     Read-only file. Mandatory.
> >     Returns <number>{K,M,G}, size of framebuffer.
> 
> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> just one user of mediated devices.

Agree. at least it doesn't make sense for s390 passthrough usage.

> 
> >
> > 'max_instance':
> >     Read-Only file. Mandatory.
> >     Returns integer.  Returns maximum mdev device could be created
> > at the moment when this file is read. This count would be updated by
> > vendor driver. Before creating mdev device of this type, check if
> > max_instance is > 0.
> 
> Didn't we discuss this being being something like "available_instances"
> with a "devices" sub-directory linking current devices of that type?

Right. "available_instances" which is dynamically changed under two scenarios:

a) create a new mdev under one type decrements "available_instances" of this
type by 1;
b) create a new mdev under one type decrements "available_instances" under
other types by type-specific number, if vendor supports creating multiple types
simultaneously like in Intel solution;

> 
> > 'params'
> >     Write-Only file. Optional.
> >     String input. Libvirt would pass the string given in XML file to
> > this file and then create mdev device. Set empty string to clear params.
> > For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> > limiter for performance benchmarking, then create device of type 11. The
> > device created would have that parameter set by vendor driver.
> 
> Might as well just call it "magic", there's absolutely no ability to
> introspect what parameters are allowed or even valid here.  Can all
> parameters be changed dynamically?  Do parameter changes apply to
> existing devices or only future devices?  This is a poorly specified
> interface.  I'd prefer this be done via module options on the vendor
> driver.

I guess it might be mdev specific for nvidia's case, then module option
is not suitable. Then this magic string should be placed under mdev
directory instead of here, otherwise there'll be race condition.

> 
> >
> > 2. Create/destroy mediated device
> >
> > With above example, vGPU device XML would look like:
> >
> >    <device>
> >      <name>my-vgpu</name>
> >      <parent>pci_0000_86_00_0</parent>
> >      <capability type='mdev'>
> >        <type id='11'/>
> >        <group>1</group>
> >        <params>'frame_rate_limiter=0'</params>
> >      </capability>
> >    </device>
> >
> > 'type id' is mandatory.
> 
> I was under the impression that the vendor supplied "name" was the one
> unique identifier.  We're certainly not going to create a registrar to
> hand out type ids to each vendor, so what makes type id unique?  I have
> a hard time believing that a given vendor can even allocate unique type
> ids for their own devices.  Unique type id across vendors is not
> practical.  So which attribute are we actually using to identify which
> type of mdev device we need and why would we ever specify additional
> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
> M60-0B" has a fixed setup of those attributes?

I'm a bit confused about type ID and type name here. Just keeping one 
should be enough (ether a number or descriptive name), which is 
per-device scope, not necessarily to cross vendors. As long as the vendor
defines the ID or name unique cross its own generations, looks should be
sufficient (I guess cross-vendor migration is not an interest here)

> 
> > 'params' is optional field. User should set this field if extra
> > parameters need to be set for a particular vGPU device. Libvirt don't
> > need to parse these params. These are meant for vendor driver.
> 
> ie. magic black hole.  nope.

A curious question here, though it's not a mandatory requirement for
Intel's GPU virtualization solution now, but allow such extension might
be useful in the long term. Has libvirt ever supported similar black 
string in other usages? I can think it's a problem for public cloud usage,
where any configuration on the managed device should be controlled
by orchestration layer (e.g. openstack) automatically. It's not appropriate
to allow a cloud tenant specifying such parameter, so in that manner
any configuration parameter should be explicitly understood from upper
level stack to libvirt then vendor specific parameters would be hard to
handle. On the other hand, if we consider enterprise virtualization 
scenario which is more proprietary, administrator may add such vendor
specific parameters as a black string through libvirt, as long as he
knows the underlying capability on purchased devices. Then is the latter
a valid value to consider such extension? :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-19 20:35 [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration Kirti Wankhede
  2016-09-19 21:36 ` Alex Williamson
@ 2016-09-20  1:37 ` Tian, Kevin
  2016-09-20  9:47 ` Daniel P. Berrange
  2 siblings, 0 replies; 62+ messages in thread
From: Tian, Kevin @ 2016-09-20  1:37 UTC (permalink / raw)
  To: Kirti Wankhede, libvir-list
  Cc: qemu-devel, Alex Williamson, Paolo Bonzini, Neo Jia, Andy Currid,
	Song, Jike, bjsdjshi, Gerd Hoffmann

> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> Sent: Tuesday, September 20, 2016 4:36 AM
> 
> 
> Hi libvirt experts,
> 
> Thanks for valuable input on v1 version of RFC.
> 
> Quick brief, VFIO based mediated device framework provides a way to
> virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> and IBM's channel IO. This framework reuses VFIO APIs for all the
> functionalities for mediated devices which are currently being used for
> pass through devices. This framework introduces a set of new sysfs files
> for device creation and its life cycle management.
> 
> Here is the summary of discussion on v1:
> 1. Discover mediated device:
> As part of physical device initialization process, vendor driver will
> register their physical devices, which will be used to create virtual
> device (mediated device, aka mdev) to the mediated framework.
> 
> Vendor driver should specify mdev_supported_types in directory format.
> This format is class based, for example, display class directory format
> should be as below. We need to define such set for each class of devices
> which would be supported by mediated device framework.
> 
>  --- mdev_destroy
>  --- mdev_supported_types
>      |-- 11
>      |   |-- create
>      |   |-- name
>      |   |-- fb_length
>      |   |-- resolution
>      |   |-- heads
>      |   |-- max_instances
>      |   |-- params
>      |   |-- requires_group
>      |-- 12
>      |   |-- create
>      |   |-- name
>      |   |-- fb_length
>      |   |-- resolution
>      |   |-- heads
>      |   |-- max_instances
>      |   |-- params
>      |   |-- requires_group
>      |-- 13
>          |-- create
>          |-- name
>          |-- fb_length
>          |-- resolution
>          |-- heads
>          |-- max_instances
>          |-- params
>          |-- requires_group
> 
> 

Per previous discussion, I'd think below layout is more general and clearer:

  --- mdev_supported_types
      |-- ID (or name)
      |   |-- create
      |   |-- available_instances
      |   |-- requires_group
      |-- ID (or name)
      |   |-- create
      |   |-- available_instances
      |   |-- requires_group
	 ...

Then under each mdev directory:
      |-- type (link to type directory)
      |-- destroy
      |-- online
      |-- reset
      |-- Destroy

Group association may be done under mdev directory too, per Alex's earlier
suggestion.

Optional vendor-agnostic attributes can be extended in the future, e.g.:
      |-- priority
      |-- weight

Vendor-specific or class-specific parameters (black string or explicit attributes) 
could be also extended per-mdev directory, if necessary to support.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-19 20:35 [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration Kirti Wankhede
  2016-09-19 21:36 ` Alex Williamson
  2016-09-20  1:37 ` Tian, Kevin
@ 2016-09-20  9:47 ` Daniel P. Berrange
  2016-09-28 19:48   ` Neo Jia
  2 siblings, 1 reply; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-20  9:47 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: libvir-list, Andy Currid, Tian, Kevin, Neo Jia, qemu-devel, Song,
	Jike, Alex Williamson, Gerd Hoffmann, Paolo Bonzini, bjsdjshi

On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
> 
> Hi libvirt experts,
> 
> Thanks for valuable input on v1 version of RFC.
> 
> Quick brief, VFIO based mediated device framework provides a way to
> virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> and IBM's channel IO. This framework reuses VFIO APIs for all the
> functionalities for mediated devices which are currently being used for
> pass through devices. This framework introduces a set of new sysfs files
> for device creation and its life cycle management.
> 
> Here is the summary of discussion on v1:
> 1. Discover mediated device:
> As part of physical device initialization process, vendor driver will
> register their physical devices, which will be used to create virtual
> device (mediated device, aka mdev) to the mediated framework.
> 
> Vendor driver should specify mdev_supported_types in directory format.
> This format is class based, for example, display class directory format
> should be as below. We need to define such set for each class of devices
> which would be supported by mediated device framework.
> 
>  --- mdev_destroy
>  --- mdev_supported_types
>      |-- 11
>      |   |-- create
>      |   |-- name
>      |   |-- fb_length
>      |   |-- resolution
>      |   |-- heads
>      |   |-- max_instances
>      |   |-- params
>      |   |-- requires_group
>      |-- 12
>      |   |-- create
>      |   |-- name
>      |   |-- fb_length
>      |   |-- resolution
>      |   |-- heads
>      |   |-- max_instances
>      |   |-- params
>      |   |-- requires_group
>      |-- 13
>          |-- create
>          |-- name
>          |-- fb_length
>          |-- resolution
>          |-- heads
>          |-- max_instances
>          |-- params
>          |-- requires_group
> 
> 
> In the above example directory '11' represents a type id of mdev device.
> 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> 'requires_group' would be Read-Only files that vendor would provide to
> describe about that type.
> 
> 'create':
>     Write-only file. Mandatory.
>     Accepts string to create mediated device.
> 
> 'name':
>     Read-Only file. Mandatory.
>     Returns string, the name of that type id.

Presumably this is a human-targetted title/description of
the device.

> 
> 'fb_length':
>     Read-only file. Mandatory.
>     Returns <number>{K,M,G}, size of framebuffer.
> 
> 'resolution':
>     Read-Only file. Mandatory.
>     Returns 'hres x vres' format. Maximum supported resolution.
> 
> 'heads':
>     Read-Only file. Mandatory.
>     Returns integer. Number of maximum heads supported.

None of these should be mandatory as that makes the mdev
useless for non-GPU devices.

I'd expect to see a 'class' or 'type' attribute in the
directory whcih tells you what kind of mdev it is. A
valid 'class' value would be 'gpu'. The fb_length,
resolution, and heads parameters would only be mandatory
when class==gpu.

> 'max_instance':
>     Read-Only file. Mandatory.
>     Returns integer.  Returns maximum mdev device could be created
> at the moment when this file is read. This count would be updated by
> vendor driver. Before creating mdev device of this type, check if
> max_instance is > 0.
> 
> 'params'
>     Write-Only file. Optional.
>     String input. Libvirt would pass the string given in XML file to
> this file and then create mdev device. Set empty string to clear params.
> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> limiter for performance benchmarking, then create device of type 11. The
> device created would have that parameter set by vendor driver.

Nope, libvirt will explicitly *NEVER* allow arbitrary opaque
passthrough of vendor specific data in this way.

> The parent device would look like:
> 
>    <device>
>      <name>pci_0000_86_00_0</name>
>      <capability type='pci'>
>        <domain>0</domain>
>        <bus>134</bus>
>        <slot>0</slot>
>        <function>0</function>
>        <capability type='mdev'>
>          <!-- one type element per sysfs directory -->
>          <type id='11'>
>            <!-- one element per sysfs file roughly -->
>            <name>GRID M60-0B</name>
>            <attribute name='fb_length'>512M</attribute>
>            <attribute name='resolution'>2560x1600</attribute>
>            <attribute name='heads'>2</attribute>
>            <attribute name='max_instances'>16</attribute>
>            <attribute name='requires_group'>1</attribute>
>          </type>

There would need to be a <class> element, eg <class>gpu</class>

We would then have further elements based on the class. eg

          <type id='11'>
            <!-- one element per sysfs file roughly -->
            <name>GRID M60-0B</name>
            <fb_length>512M</fb_length>
            <resolution>2560x1600</resolution>
            <heads>2</heads>
            <max_instances>16</max_instances>
            <requires_group>1</requires_group>
          </type>



>        </capability>
>        <product id='...'>GRID M60</product>
>        <vendor id='0x10de'>NVIDIA</vendor>
>      </capability>
>    </device>
> 
> 2. Create/destroy mediated device
> 
> With above example, vGPU device XML would look like:
> 
>    <device>
>      <name>my-vgpu</name>
>      <parent>pci_0000_86_00_0</parent>
>      <capability type='mdev'>
>        <type id='11'/>
>        <group>1</group>
>        <params>'frame_rate_limiter=0'</params>

No, we will not support <params> in this manner in libvirt.

The entire purpose of libvirt is to represent data in a
vendor agnostic manner and not do abitrary passthrough
of vendor specific data. Simply saying this field is
optional does not get around that either.

>      </capability>
>    </device>
> 
> 'type id' is mandatory.
> 'group' is optional. It should be a unique number in the system among
> all the groups created for mdev devices. Its usage is:
>   - not needed if single vGPU device is being assigned to a domain.
>   - only need to be set if multiple vGPUs need to be assigned to a
> domain and vendor driver have 'requires_group' file in type id directory.
>   - if type id directory include 'requires_group' and user tries to
> assign multiple vGPUs to a domain without having <group> field in XML,
> it will create single vGPU.
> 
> 'params' is optional field. User should set this field if extra
> parameters need to be set for a particular vGPU device. Libvirt don't
> need to parse these params. These are meant for vendor driver.
> 
> Libvirt need to follow the sequence to create device:
> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
> then only proceed else fail.
> 
> * Set extra params if 'params' field exist in device XML and 'params'
> file exist in type id directory
> 
>     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params

We cannot do that step.

> 
> * Autogenerate UUID
> * Create device:
> 
>     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
> 
>     where <group> is optional. Group should be unique number among all
> the groups created for mdev devices.
> 
> * Clear params, if set earlier:
> 
>     echo "" > /sys/../0000\:86\:00.0/11/params
> 
> * To destroy device:
> 
>     echo $UUID > /sys/../0000\:86\:00.0/mdev_destroy
> 
> 
> 3. Start/stop mediated device
> 
> No change or requirement for libvirt as this will be handled by open()
> and close() callbacks to vendor driver. In case of multiple devices and
> 'requires_group' set, this will be handled in 'first open()' and 'last
> close()' on device in that group.
> 
> 4. Launch QEMU/VM
> 
>  Pass the mdev sysfs path to QEMU as vfio-pci device.
>  For above vGPU device example:
> 
>     -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID
> 
> 5. QEMU/VM Shutdown sequence
> 
> No change or requirement for libvirt.
> 
> 6. VM Reset
> 
> No change or requirement for libvirt as this will be handled via VFIO
> reset API and QEMU process will keep running as before.
> 
> 7. Hot-plug
> 
> It is same syntax to create a virtual device for hot-plug.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-19 21:36 ` Alex Williamson
  2016-09-19 21:50   ` Paolo Bonzini
  2016-09-20  1:25   ` Tian, Kevin
@ 2016-09-20 14:21   ` Kirti Wankhede
  2016-09-20 14:43     ` Alex Williamson
  2016-09-21  3:56     ` Tian, Kevin
  2 siblings, 2 replies; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-20 14:21 UTC (permalink / raw)
  To: Alex Williamson
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann



On 9/20/2016 3:06 AM, Alex Williamson wrote:
> On Tue, 20 Sep 2016 02:05:52 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> Hi libvirt experts,
>>
>> Thanks for valuable input on v1 version of RFC.
>>
>> Quick brief, VFIO based mediated device framework provides a way to
>> virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
>> and IBM's channel IO. This framework reuses VFIO APIs for all the
>> functionalities for mediated devices which are currently being used for
>> pass through devices. This framework introduces a set of new sysfs files
>> for device creation and its life cycle management.
>>
>> Here is the summary of discussion on v1:
>> 1. Discover mediated device:
>> As part of physical device initialization process, vendor driver will
>> register their physical devices, which will be used to create virtual
>> device (mediated device, aka mdev) to the mediated framework.
>>
>> Vendor driver should specify mdev_supported_types in directory format.
>> This format is class based, for example, display class directory format
>> should be as below. We need to define such set for each class of devices
>> which would be supported by mediated device framework.
>>
>>  --- mdev_destroy
> 
> I thought we replaced mdev_destroy with a remove attribute for each
> mdev device.
> 
>>  --- mdev_supported_types
>>      |-- 11
>>      |   |-- create
>>      |   |-- name
>>      |   |-- fb_length
>>      |   |-- resolution
>>      |   |-- heads
>>      |   |-- max_instances
>>      |   |-- params
>>      |   |-- requires_group
>>      |-- 12
>>      |   |-- create
>>      |   |-- name
>>      |   |-- fb_length
>>      |   |-- resolution
>>      |   |-- heads
>>      |   |-- max_instances
>>      |   |-- params
>>      |   |-- requires_group
>>      |-- 13
>>          |-- create
>>          |-- name
>>          |-- fb_length
>>          |-- resolution
>>          |-- heads
>>          |-- max_instances
>>          |-- params
>>          |-- requires_group
>>
>>
>> In the above example directory '11' represents a type id of mdev device.
>> 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
>> 'requires_group' would be Read-Only files that vendor would provide to
>> describe about that type.
>>
>> 'create':
>>     Write-only file. Mandatory.
>>     Accepts string to create mediated device.
>>
>> 'name':
>>     Read-Only file. Mandatory.
>>     Returns string, the name of that type id.
>>
>> 'fb_length':
>>     Read-only file. Mandatory.
>>     Returns <number>{K,M,G}, size of framebuffer.
> 
> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> just one user of mediated devices.
> 

As per our discussion in BOF, we would define class and its attributes.
As I mentioned above these are attributes of "display" class.

>>
>> 'resolution':
>>     Read-Only file. Mandatory.
>>     Returns 'hres x vres' format. Maximum supported resolution.
> 
> Same.
> 
>>
>> 'heads':
>>     Read-Only file. Mandatory.
>>     Returns integer. Number of maximum heads supported.
> 
> Same.
> 
>>
>> 'max_instance':
>>     Read-Only file. Mandatory.
>>     Returns integer.  Returns maximum mdev device could be created
>> at the moment when this file is read. This count would be updated by
>> vendor driver. Before creating mdev device of this type, check if
>> max_instance is > 0.
> 
> Didn't we discuss this being being something like "available_instances"
> with a "devices" sub-directory linking current devices of that type?
>

I'm ok with 'available_instances' as name of this file.
Yes, mdev device directory will have links to devices of that type. I
think that is part of mdev core module discussion. I'm trying to list
sysfs files for libvirt interface here to focus more on libvirt
interface. Hence didn't add that. I'll make sure to add all such details
in future.


>> 'params'
>>     Write-Only file. Optional.
>>     String input. Libvirt would pass the string given in XML file to
>> this file and then create mdev device. Set empty string to clear params.
>> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
>> limiter for performance benchmarking, then create device of type 11. The
>> device created would have that parameter set by vendor driver.
> 
> Might as well just call it "magic", there's absolutely no ability to
> introspect what parameters are allowed or even valid here.

Libvirt don't need to introspect these parameters. These are meant for
vendor driver. Let vendor driver interpret these parameters and take
action based on that.

>  Can all parameters be changed dynamically?

Parameters here would be extra parameters which are not listed above in
mandatory list. If such parameters are required to set, these parameters
should be set per mdev device before VM is booted.

>  Do parameter changes apply to existing devices or only future devices?

No, it should be applied at the time when mdev device is created or
after mdev device is created but before VM is booted. It will not be
applicable to existing devices.

>  This is a poorly specified
> interface.  I'd prefer this be done via module options on the vendor
> driver.
> 

Module option is not suitable here, in that case such parameters would
be applied to all mdev device created for vendor driver and that is not
what we want to.

>> 'requires_group'
>>     Read-Only file. Optional.
>>     This should be provided by vendor driver if vendor driver need to
>> group mdev devices in one domain so that vendor driver can use 'first
>> open' to commit resources of all mdev devices associated to that domain
>> and 'last close' to free those.
>>
>> The parent device would look like:
>>
>>    <device>
>>      <name>pci_0000_86_00_0</name>
>>      <capability type='pci'>
>>        <domain>0</domain>
>>        <bus>134</bus>
>>        <slot>0</slot>
>>        <function>0</function>
> 
> This is the parent device address?  (I think we'd re-use the
> specification for assigned devices)  Is this optional?  Couldn't
> libvirt choose to pick any parent device supplying the specified type
> if not supplied?
> 
>>        <capability type='mdev'>
>>          <!-- one type element per sysfs directory -->
>>          <type id='11'>
>>            <!-- one element per sysfs file roughly -->
>>            <name>GRID M60-0B</name>
>>            <attribute name='fb_length'>512M</attribute>
>>            <attribute name='resolution'>2560x1600</attribute>
>>            <attribute name='heads'>2</attribute>
>>            <attribute name='max_instances'>16</attribute>
>>            <attribute name='requires_group'>1</attribute>
>>          </type>
>>        </capability>
>>        <product id='...'>GRID M60</product>
>>        <vendor id='0x10de'>NVIDIA</vendor>
> 
> What are these specifying?
> 
>>      </capability>
>>    </device>
>>
>> 2. Create/destroy mediated device
>>
>> With above example, vGPU device XML would look like:
>>
>>    <device>
>>      <name>my-vgpu</name>
>>      <parent>pci_0000_86_00_0</parent>
>>      <capability type='mdev'>
>>        <type id='11'/>
>>        <group>1</group>
>>        <params>'frame_rate_limiter=0'</params>
>>      </capability>
>>    </device>
>>
>> 'type id' is mandatory.
> 
> I was under the impression that the vendor supplied "name" was the one
> unique identifier.  We're certainly not going to create a registrar to
> hand out type ids to each vendor, so what makes type id unique?

Type id is unique, 'name' is the name (string) of device that would be
displayed in guest OS for that type of mdev device.

>  I have
> a hard time believing that a given vendor can even allocate unique type
> ids for their own devices.  Unique type id across vendors is not
> practical.  So which attribute are we actually using to identify which
> type of mdev device we need and why would we ever specify additional
> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
> M60-0B" has a fixed setup of those attributes?
> 

Specifying attributes here is not our requirement. Yes we have fixed set
of attributes for "GRID-M60-0B" and on.
We are defining the attributed here for "display" class for all other
vendor of gpu can use.


>> 'group' is optional. It should be a unique number in the system among
>> all the groups created for mdev devices. Its usage is:
>>   - not needed if single vGPU device is being assigned to a domain.
>>   - only need to be set if multiple vGPUs need to be assigned to a
>> domain and vendor driver have 'requires_group' file in type id directory.
>>   - if type id directory include 'requires_group' and user tries to
>> assign multiple vGPUs to a domain without having <group> field in XML,
>> it will create single vGPU.
> 
> We never finished our discussion of how this gets implemented or
> whether a group applies only to devices from the same vendor, same
> device, how heterogeneous groups are handled, etc.
> 

We agreed on above points that I mentioned here and we wish to know what
libvirt folks think, right?
Since there were no inputs on that thread I thought I should summarize
what had been discussed and get libvirt experts thoughts on this.

>> 'params' is optional field. User should set this field if extra
>> parameters need to be set for a particular vGPU device. Libvirt don't
>> need to parse these params. These are meant for vendor driver.
> 
> ie. magic black hole.  nope.
> 
>> Libvirt need to follow the sequence to create device:
>> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
>> then only proceed else fail.
>>
>> * Set extra params if 'params' field exist in device XML and 'params'
>> file exist in type id directory
>>
>>     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params
>>
>> * Autogenerate UUID
>> * Create device:
>>
>>     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
>>
>>     where <group> is optional. Group should be unique number among all
>> the groups created for mdev devices.
> 
> Is it an integer, a UUID, a string?  How is the group for an mdev
> device discovered?  Can it be changed later?
> 

Group number could be integer, that's where we need input from libvirt
folks. It should be provided at the time of mdev device creation. It
can't be changed after that. If user want to change group number
destroy/remove mdev device and create it again with proper group number.

>> * Clear params, if set earlier:
>>
>>     echo "" > /sys/../0000\:86\:00.0/11/params
> 
> So params only applies to the devices created while those params are
> set?  This is inherently racy.
> 
>> * To destroy device:
>>
>>     echo $UUID > /sys/../0000\:86\:00.0/mdev_destroy
> 
> echo 1 > /sys/path/to/mdev/device/remove
>  
>> 3. Start/stop mediated device
>>
>> No change or requirement for libvirt as this will be handled by open()
>> and close() callbacks to vendor driver. In case of multiple devices and
>> 'requires_group' set, this will be handled in 'first open()' and 'last
>> close()' on device in that group.
>>
>> 4. Launch QEMU/VM
>>
>>  Pass the mdev sysfs path to QEMU as vfio-pci device.
>>  For above vGPU device example:
>>
>>     -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID
>>
>> 5. QEMU/VM Shutdown sequence
>>
>> No change or requirement for libvirt.
>>
>> 6. VM Reset
>>
>> No change or requirement for libvirt as this will be handled via VFIO
>> reset API and QEMU process will keep running as before.
>>
>> 7. Hot-plug
>>
>> It is same syntax to create a virtual device for hot-plug.
> 
> How do groups work with hotplug?  Can a device be creating into an
> existing, running group?  Can a device be removed from an existing,
> running group?  Thanks,
> 

Group is for vendor driver to decide when to commit resources for a
group of devices to support multiple devices in a domain. It will be
upto vendor driver how it manage it.

Kirti

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-19 22:25     ` Alex Williamson
@ 2016-09-20 14:35       ` Kirti Wankhede
  2016-09-20 14:41         ` Daniel P. Berrange
  2016-09-20 14:52         ` Alex Williamson
  0 siblings, 2 replies; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-20 14:35 UTC (permalink / raw)
  To: Alex Williamson, Paolo Bonzini
  Cc: libvir-list, qemu-devel, Neo Jia, Andy Currid, Tian, Kevin, Song,
	Jike, bjsdjshi, Gerd Hoffmann



On 9/20/2016 3:55 AM, Alex Williamson wrote:
> On Mon, 19 Sep 2016 23:50:56 +0200
> Paolo Bonzini <pbonzini@redhat.com> wrote:
> 
>> On 19/09/2016 23:36, Alex Williamson wrote:
>>> On Tue, 20 Sep 2016 02:05:52 +0530
>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:  
>>>> 'fb_length':
>>>>     Read-only file. Mandatory.
>>>>     Returns <number>{K,M,G}, size of framebuffer.  
>>>
>>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
>>> just one user of mediated devices. [...]
>>>   
>>>> 'params'
>>>>     Write-Only file. Optional.
>>>>     String input. Libvirt would pass the string given in XML file to
>>>> this file and then create mdev device. Set empty string to clear params.
>>>> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
>>>> limiter for performance benchmarking, then create device of type 11. The
>>>> device created would have that parameter set by vendor driver.  
>>>
>>> Might as well just call it "magic", there's absolutely no ability to
>>> introspect what parameters are allowed or even valid here.  Can all
>>> parameters be changed dynamically?  Do parameter changes apply to
>>> existing devices or only future devices?  This is a poorly specified
>>> interface.  I'd prefer this be done via module options on the vendor
>>> driver.  
>>
>> Or additional files in the mdev's sysfs directory?
> 
> Sure, the vendor driver could certainly support that, it'd be vendor
> specific of course.
>

Right, it would be vendor specific. But if this is not set through mdev
device XML, user have to create mdev device with XML first and then
manually set such parameter. Also these parameters would not retain if
mdev device is destroyed and created again. That's why if libvirt sets
these parameters by reading it from XML file, it would be simple for user.

If we move 'params' file to mdev device, could libvirt do the following
to create mdev device:

* Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
then only proceed else fail.

* Autogenerate UUID
* Create device:

    echo "$UUID" > /sys/../0000\:86\:00.0/11/create

* Set extra params if 'params' field exist in device XML and 'params'
file exist mdev device directory:

    echo "frame_rate_limiter=0" > /sys/bus/mdev/devices/$UUID/params

Kirti

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 14:35       ` Kirti Wankhede
@ 2016-09-20 14:41         ` Daniel P. Berrange
  2016-09-20 14:49           ` Paolo Bonzini
  2016-09-20 14:52         ` Alex Williamson
  1 sibling, 1 reply; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-20 14:41 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: Alex Williamson, Paolo Bonzini, Andy Currid, Tian, Kevin,
	Neo Jia, libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	bjsdjshi

On Tue, Sep 20, 2016 at 08:05:01PM +0530, Kirti Wankhede wrote:
> 
> 
> On 9/20/2016 3:55 AM, Alex Williamson wrote:
> > On Mon, 19 Sep 2016 23:50:56 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> > 
> >> On 19/09/2016 23:36, Alex Williamson wrote:
> >>> On Tue, 20 Sep 2016 02:05:52 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:  
> >>>> 'fb_length':
> >>>>     Read-only file. Mandatory.
> >>>>     Returns <number>{K,M,G}, size of framebuffer.  
> >>>
> >>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> >>> just one user of mediated devices. [...]
> >>>   
> >>>> 'params'
> >>>>     Write-Only file. Optional.
> >>>>     String input. Libvirt would pass the string given in XML file to
> >>>> this file and then create mdev device. Set empty string to clear params.
> >>>> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> >>>> limiter for performance benchmarking, then create device of type 11. The
> >>>> device created would have that parameter set by vendor driver.  
> >>>
> >>> Might as well just call it "magic", there's absolutely no ability to
> >>> introspect what parameters are allowed or even valid here.  Can all
> >>> parameters be changed dynamically?  Do parameter changes apply to
> >>> existing devices or only future devices?  This is a poorly specified
> >>> interface.  I'd prefer this be done via module options on the vendor
> >>> driver.  
> >>
> >> Or additional files in the mdev's sysfs directory?
> > 
> > Sure, the vendor driver could certainly support that, it'd be vendor
> > specific of course.
> >
> 
> Right, it would be vendor specific. But if this is not set through mdev
> device XML, user have to create mdev device with XML first and then
> manually set such parameter. Also these parameters would not retain if
> mdev device is destroyed and created again. That's why if libvirt sets
> these parameters by reading it from XML file, it would be simple for user.
> 
> If we move 'params' file to mdev device, could libvirt do the following
> to create mdev device:
> 
> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
> then only proceed else fail.
> 
> * Autogenerate UUID
> * Create device:
> 
>     echo "$UUID" > /sys/../0000\:86\:00.0/11/create
> 
> * Set extra params if 'params' field exist in device XML and 'params'
> file exist mdev device directory:
> 
>     echo "frame_rate_limiter=0" > /sys/bus/mdev/devices/$UUID/params

As I've said in my earlier reply - libvirt will *NOT* support passing
arbitrary vendor specific parameters as a blob via the XML. Everything
that appears in the XML must be *fully* specified and explicitly
represented in the XML  as a distinct attribute or element.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 14:21   ` Kirti Wankhede
@ 2016-09-20 14:43     ` Alex Williamson
  2016-09-20 16:23       ` Kirti Wankhede
  2016-09-21  3:56     ` Tian, Kevin
  1 sibling, 1 reply; 62+ messages in thread
From: Alex Williamson @ 2016-09-20 14:43 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann

On Tue, 20 Sep 2016 19:51:58 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 9/20/2016 3:06 AM, Alex Williamson wrote:
> > On Tue, 20 Sep 2016 02:05:52 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> >> Hi libvirt experts,
> >>
> >> Thanks for valuable input on v1 version of RFC.
> >>
> >> Quick brief, VFIO based mediated device framework provides a way to
> >> virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> >> and IBM's channel IO. This framework reuses VFIO APIs for all the
> >> functionalities for mediated devices which are currently being used for
> >> pass through devices. This framework introduces a set of new sysfs files
> >> for device creation and its life cycle management.
> >>
> >> Here is the summary of discussion on v1:
> >> 1. Discover mediated device:
> >> As part of physical device initialization process, vendor driver will
> >> register their physical devices, which will be used to create virtual
> >> device (mediated device, aka mdev) to the mediated framework.
> >>
> >> Vendor driver should specify mdev_supported_types in directory format.
> >> This format is class based, for example, display class directory format
> >> should be as below. We need to define such set for each class of devices
> >> which would be supported by mediated device framework.
> >>
> >>  --- mdev_destroy  
> > 
> > I thought we replaced mdev_destroy with a remove attribute for each
> > mdev device.
> >   
> >>  --- mdev_supported_types
> >>      |-- 11
> >>      |   |-- create
> >>      |   |-- name
> >>      |   |-- fb_length
> >>      |   |-- resolution
> >>      |   |-- heads
> >>      |   |-- max_instances
> >>      |   |-- params
> >>      |   |-- requires_group
> >>      |-- 12
> >>      |   |-- create
> >>      |   |-- name
> >>      |   |-- fb_length
> >>      |   |-- resolution
> >>      |   |-- heads
> >>      |   |-- max_instances
> >>      |   |-- params
> >>      |   |-- requires_group
> >>      |-- 13
> >>          |-- create
> >>          |-- name
> >>          |-- fb_length
> >>          |-- resolution
> >>          |-- heads
> >>          |-- max_instances
> >>          |-- params
> >>          |-- requires_group
> >>
> >>
> >> In the above example directory '11' represents a type id of mdev device.
> >> 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> >> 'requires_group' would be Read-Only files that vendor would provide to
> >> describe about that type.
> >>
> >> 'create':
> >>     Write-only file. Mandatory.
> >>     Accepts string to create mediated device.
> >>
> >> 'name':
> >>     Read-Only file. Mandatory.
> >>     Returns string, the name of that type id.
> >>
> >> 'fb_length':
> >>     Read-only file. Mandatory.
> >>     Returns <number>{K,M,G}, size of framebuffer.  
> > 
> > This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> > just one user of mediated devices.
> >   
> 
> As per our discussion in BOF, we would define class and its attributes.
> As I mentioned above these are attributes of "display" class.

Just as I didn't know here, how does libvirt know the class of a given
type id?
 
> >>
> >> 'resolution':
> >>     Read-Only file. Mandatory.
> >>     Returns 'hres x vres' format. Maximum supported resolution.  
> > 
> > Same.
> >   
> >>
> >> 'heads':
> >>     Read-Only file. Mandatory.
> >>     Returns integer. Number of maximum heads supported.  
> > 
> > Same.
> >   
> >>
> >> 'max_instance':
> >>     Read-Only file. Mandatory.
> >>     Returns integer.  Returns maximum mdev device could be created
> >> at the moment when this file is read. This count would be updated by
> >> vendor driver. Before creating mdev device of this type, check if
> >> max_instance is > 0.  
> > 
> > Didn't we discuss this being being something like "available_instances"
> > with a "devices" sub-directory linking current devices of that type?
> >  
> 
> I'm ok with 'available_instances' as name of this file.
> Yes, mdev device directory will have links to devices of that type. I
> think that is part of mdev core module discussion. I'm trying to list
> sysfs files for libvirt interface here to focus more on libvirt
> interface. Hence didn't add that. I'll make sure to add all such details
> in future.
> 
> 
> >> 'params'
> >>     Write-Only file. Optional.
> >>     String input. Libvirt would pass the string given in XML file to
> >> this file and then create mdev device. Set empty string to clear params.
> >> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> >> limiter for performance benchmarking, then create device of type 11. The
> >> device created would have that parameter set by vendor driver.  
> > 
> > Might as well just call it "magic", there's absolutely no ability to
> > introspect what parameters are allowed or even valid here.  
> 
> Libvirt don't need to introspect these parameters. These are meant for
> vendor driver. Let vendor driver interpret these parameters and take
> action based on that.

Doesn't matter, there's clearly as strong feeling that libvirt wants
nothing to do with passing opaque vendor defined strings to a device.
Paolo suggested these could be attributed on the mdev device itself
that could be set by some 3rd party.  I suggested that these could be
module options on the vendor driver whether to expose types with these
properties.  There are other, more reasonable ways to do this.
 
> >  Can all parameters be changed dynamically?  
> 
> Parameters here would be extra parameters which are not listed above in
> mandatory list. If such parameters are required to set, these parameters
> should be set per mdev device before VM is booted.
> 
> >  Do parameter changes apply to existing devices or only future devices?  
> 
> No, it should be applied at the time when mdev device is created or
> after mdev device is created but before VM is booted. It will not be
> applicable to existing devices.
> 
> >  This is a poorly specified
> > interface.  I'd prefer this be done via module options on the vendor
> > driver.
> >   
> 
> Module option is not suitable here, in that case such parameters would
> be applied to all mdev device created for vendor driver and that is not
> what we want to.

Then use attributes on the mdev device itself, this is not an
acceptable interface.

> >> 'requires_group'
> >>     Read-Only file. Optional.
> >>     This should be provided by vendor driver if vendor driver need to
> >> group mdev devices in one domain so that vendor driver can use 'first
> >> open' to commit resources of all mdev devices associated to that domain
> >> and 'last close' to free those.
> >>
> >> The parent device would look like:
> >>
> >>    <device>
> >>      <name>pci_0000_86_00_0</name>
> >>      <capability type='pci'>
> >>        <domain>0</domain>
> >>        <bus>134</bus>
> >>        <slot>0</slot>
> >>        <function>0</function>  
> > 
> > This is the parent device address?  (I think we'd re-use the
> > specification for assigned devices)  Is this optional?  Couldn't
> > libvirt choose to pick any parent device supplying the specified type
> > if not supplied?
> >   
> >>        <capability type='mdev'>
> >>          <!-- one type element per sysfs directory -->
> >>          <type id='11'>
> >>            <!-- one element per sysfs file roughly -->
> >>            <name>GRID M60-0B</name>
> >>            <attribute name='fb_length'>512M</attribute>
> >>            <attribute name='resolution'>2560x1600</attribute>
> >>            <attribute name='heads'>2</attribute>
> >>            <attribute name='max_instances'>16</attribute>
> >>            <attribute name='requires_group'>1</attribute>
> >>          </type>
> >>        </capability>
> >>        <product id='...'>GRID M60</product>
> >>        <vendor id='0x10de'>NVIDIA</vendor>  
> > 
> > What are these specifying?
> >   
> >>      </capability>
> >>    </device>
> >>
> >> 2. Create/destroy mediated device
> >>
> >> With above example, vGPU device XML would look like:
> >>
> >>    <device>
> >>      <name>my-vgpu</name>
> >>      <parent>pci_0000_86_00_0</parent>
> >>      <capability type='mdev'>
> >>        <type id='11'/>
> >>        <group>1</group>
> >>        <params>'frame_rate_limiter=0'</params>
> >>      </capability>
> >>    </device>
> >>
> >> 'type id' is mandatory.  
> > 
> > I was under the impression that the vendor supplied "name" was the one
> > unique identifier.  We're certainly not going to create a registrar to
> > hand out type ids to each vendor, so what makes type id unique?  
> 
> Type id is unique, 'name' is the name (string) of device that would be
> displayed in guest OS for that type of mdev device.

We were told at the BoF that name was the unique string which would be
consistent and you again indicate below that "GRID-M60-0B" has a fixed
set of attributes, so why do we care what type id is associated with
that name?
 
> >  I have
> > a hard time believing that a given vendor can even allocate unique type
> > ids for their own devices.  Unique type id across vendors is not
> > practical.  So which attribute are we actually using to identify which
> > type of mdev device we need and why would we ever specify additional
> > attributes like fb_length?  Doesn't the vendor guarantee that "GRID
> > M60-0B" has a fixed setup of those attributes?
> >   
> 
> Specifying attributes here is not our requirement. Yes we have fixed set
> of attributes for "GRID-M60-0B" and on.
> We are defining the attributed here for "display" class for all other
> vendor of gpu can use.
> 
> 
> >> 'group' is optional. It should be a unique number in the system among
> >> all the groups created for mdev devices. Its usage is:
> >>   - not needed if single vGPU device is being assigned to a domain.
> >>   - only need to be set if multiple vGPUs need to be assigned to a
> >> domain and vendor driver have 'requires_group' file in type id directory.
> >>   - if type id directory include 'requires_group' and user tries to
> >> assign multiple vGPUs to a domain without having <group> field in XML,
> >> it will create single vGPU.  
> > 
> > We never finished our discussion of how this gets implemented or
> > whether a group applies only to devices from the same vendor, same
> > device, how heterogeneous groups are handled, etc.
> >   
> 
> We agreed on above points that I mentioned here and we wish to know what
> libvirt folks think, right?
> Since there were no inputs on that thread I thought I should summarize
> what had been discussed and get libvirt experts thoughts on this.

No, that was just an idea that was never full defined.  We had that
idea, the IOMMU group idea, the invalid container idea...  This is
still an open question in the design.

> >> 'params' is optional field. User should set this field if extra
> >> parameters need to be set for a particular vGPU device. Libvirt don't
> >> need to parse these params. These are meant for vendor driver.  
> > 
> > ie. magic black hole.  nope.
> >   
> >> Libvirt need to follow the sequence to create device:
> >> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
> >> then only proceed else fail.
> >>
> >> * Set extra params if 'params' field exist in device XML and 'params'
> >> file exist in type id directory
> >>
> >>     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params
> >>
> >> * Autogenerate UUID
> >> * Create device:
> >>
> >>     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
> >>
> >>     where <group> is optional. Group should be unique number among all
> >> the groups created for mdev devices.  
> > 
> > Is it an integer, a UUID, a string?  How is the group for an mdev
> > device discovered?  Can it be changed later?
> >   
> 
> Group number could be integer, that's where we need input from libvirt
> folks. It should be provided at the time of mdev device creation. It
> can't be changed after that. If user want to change group number
> destroy/remove mdev device and create it again with proper group number.

My concern is that a type id seems arbitrary but we're specifying that
it be unique.  We already have something unique, the name.  So why try
to make the type id unique as well?  A vendor can accidentally create
their vendor driver so that a given name means something very
specific.  On the other hand they need to be extremely deliberate to
coordinate that a type id means a unique thing across all their product
lines.

> >> * Clear params, if set earlier:
> >>
> >>     echo "" > /sys/../0000\:86\:00.0/11/params  
> > 
> > So params only applies to the devices created while those params are
> > set?  This is inherently racy.
> >   
> >> * To destroy device:
> >>
> >>     echo $UUID > /sys/../0000\:86\:00.0/mdev_destroy  
> > 
> > echo 1 > /sys/path/to/mdev/device/remove
> >    
> >> 3. Start/stop mediated device
> >>
> >> No change or requirement for libvirt as this will be handled by open()
> >> and close() callbacks to vendor driver. In case of multiple devices and
> >> 'requires_group' set, this will be handled in 'first open()' and 'last
> >> close()' on device in that group.
> >>
> >> 4. Launch QEMU/VM
> >>
> >>  Pass the mdev sysfs path to QEMU as vfio-pci device.
> >>  For above vGPU device example:
> >>
> >>     -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID
> >>
> >> 5. QEMU/VM Shutdown sequence
> >>
> >> No change or requirement for libvirt.
> >>
> >> 6. VM Reset
> >>
> >> No change or requirement for libvirt as this will be handled via VFIO
> >> reset API and QEMU process will keep running as before.
> >>
> >> 7. Hot-plug
> >>
> >> It is same syntax to create a virtual device for hot-plug.  
> > 
> > How do groups work with hotplug?  Can a device be creating into an
> > existing, running group?  Can a device be removed from an existing,
> > running group?  Thanks,
> >   
> 
> Group is for vendor driver to decide when to commit resources for a
> group of devices to support multiple devices in a domain. It will be
> upto vendor driver how it manage it.

Then it's not sufficiently specified for libvirt to use.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 14:41         ` Daniel P. Berrange
@ 2016-09-20 14:49           ` Paolo Bonzini
  2016-09-20 14:58             ` Daniel P. Berrange
  0 siblings, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2016-09-20 14:49 UTC (permalink / raw)
  To: Daniel P. Berrange, Kirti Wankhede
  Cc: Alex Williamson, Andy Currid, Tian, Kevin, Neo Jia, libvir-list,
	qemu-devel, Song, Jike, Gerd Hoffmann, bjsdjshi



On 20/09/2016 16:41, Daniel P. Berrange wrote:
> As I've said in my earlier reply - libvirt will *NOT* support passing
> arbitrary vendor specific parameters as a blob via the XML. Everything
> that appears in the XML must be *fully* specified and explicitly
> represented in the XML  as a distinct attribute or element.

Are generic key/value attributes (e.g. a <attribute> element) acceptable?

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 14:35       ` Kirti Wankhede
  2016-09-20 14:41         ` Daniel P. Berrange
@ 2016-09-20 14:52         ` Alex Williamson
  1 sibling, 0 replies; 62+ messages in thread
From: Alex Williamson @ 2016-09-20 14:52 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: Paolo Bonzini, libvir-list, qemu-devel, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann

On Tue, 20 Sep 2016 20:05:01 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 9/20/2016 3:55 AM, Alex Williamson wrote:
> > On Mon, 19 Sep 2016 23:50:56 +0200
> > Paolo Bonzini <pbonzini@redhat.com> wrote:
> >   
> >> On 19/09/2016 23:36, Alex Williamson wrote:  
> >>> On Tue, 20 Sep 2016 02:05:52 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:    
> >>>> 'fb_length':
> >>>>     Read-only file. Mandatory.
> >>>>     Returns <number>{K,M,G}, size of framebuffer.    
> >>>
> >>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> >>> just one user of mediated devices. [...]
> >>>     
> >>>> 'params'
> >>>>     Write-Only file. Optional.
> >>>>     String input. Libvirt would pass the string given in XML file to
> >>>> this file and then create mdev device. Set empty string to clear params.
> >>>> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> >>>> limiter for performance benchmarking, then create device of type 11. The
> >>>> device created would have that parameter set by vendor driver.    
> >>>
> >>> Might as well just call it "magic", there's absolutely no ability to
> >>> introspect what parameters are allowed or even valid here.  Can all
> >>> parameters be changed dynamically?  Do parameter changes apply to
> >>> existing devices or only future devices?  This is a poorly specified
> >>> interface.  I'd prefer this be done via module options on the vendor
> >>> driver.    
> >>
> >> Or additional files in the mdev's sysfs directory?  
> > 
> > Sure, the vendor driver could certainly support that, it'd be vendor
> > specific of course.
> >  
> 
> Right, it would be vendor specific. But if this is not set through mdev
> device XML, user have to create mdev device with XML first and then
> manually set such parameter. Also these parameters would not retain if
> mdev device is destroyed and created again. That's why if libvirt sets
> these parameters by reading it from XML file, it would be simple for user.
> 
> If we move 'params' file to mdev device, could libvirt do the following
> to create mdev device:
> 
> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
> then only proceed else fail.
> 
> * Autogenerate UUID
> * Create device:
> 
>     echo "$UUID" > /sys/../0000\:86\:00.0/11/create
> 
> * Set extra params if 'params' field exist in device XML and 'params'
> file exist mdev device directory:
> 
>     echo "frame_rate_limiter=0" > /sys/bus/mdev/devices/$UUID/params

Why does libvirt even need to participate in setting these additional
parameters?  The example here is for benchmarking, which doesn't
express to me a strong need to be automated through libvirt.  For
hostdev devices libvirt supports a managed='no' option that requires
the user to setup the device in advance.  Something similar sounds
reasonable here too.

This is also where I see that a vendor module option could be used,
something like:

modprobe nvidia enable-no-frl-mdev

You say you don't want to apply this to all mdev devices, but maybe it
just makes an additional type-id available "GRID-M60-0B-NOFRL".  Then
you've removed libvirt from the equation, the VM simply specifies this
type-id that incorporates that feature.  Obviously it gets complicated
if there are multiple features to enable, but we've only seen one
example so far of why this parameter is needed.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 14:49           ` Paolo Bonzini
@ 2016-09-20 14:58             ` Daniel P. Berrange
  2016-09-20 15:05               ` Paolo Bonzini
  0 siblings, 1 reply; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-20 14:58 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kirti Wankhede, Alex Williamson, Andy Currid, Tian, Kevin,
	Neo Jia, libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	bjsdjshi

On Tue, Sep 20, 2016 at 04:49:09PM +0200, Paolo Bonzini wrote:
> 
> 
> On 20/09/2016 16:41, Daniel P. Berrange wrote:
> > As I've said in my earlier reply - libvirt will *NOT* support passing
> > arbitrary vendor specific parameters as a blob via the XML. Everything
> > that appears in the XML must be *fully* specified and explicitly
> > represented in the XML  as a distinct attribute or element.
> 
> Are generic key/value attributes (e.g. a <attribute> element) acceptable?

Only if libvirt has a known list of valid attribute key names upfront.
We don't want to just blindly expose arbitary vendor specific keys exposed
by the kernel. Libvirt's job is to ensure the XML representation is vendor
portable and stable across software versions. If we just blindly expose
the strings reported by the host, then the XML will change if the vendor
arbitararily renames an attribute in a software update, or if two
vendors have the same concept there's no guaranteed name between them.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 14:58             ` Daniel P. Berrange
@ 2016-09-20 15:05               ` Paolo Bonzini
  2016-09-20 15:14                 ` Daniel P. Berrange
  0 siblings, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2016-09-20 15:05 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kirti Wankhede, Alex Williamson, Andy Currid, Tian, Kevin,
	Neo Jia, libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	bjsdjshi



On 20/09/2016 16:58, Daniel P. Berrange wrote:
> > > As I've said in my earlier reply - libvirt will *NOT* support passing
> > > arbitrary vendor specific parameters as a blob via the XML. Everything
> > > that appears in the XML must be *fully* specified and explicitly
> > > represented in the XML  as a distinct attribute or element.
> > 
> > Are generic key/value attributes (e.g. a <attribute> element) acceptable?
>
> Only if libvirt has a known list of valid attribute key names upfront.
> We don't want to just blindly expose arbitary vendor specific keys exposed
> by the kernel. Libvirt's job is to ensure the XML representation is vendor
> portable

Ok, then I guess vendor-specific mdev parameters are out completely.  Or
could they be added under a separate namespace, like QEMU passthrough?

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 15:05               ` Paolo Bonzini
@ 2016-09-20 15:14                 ` Daniel P. Berrange
  2016-09-20 16:31                   ` Kirti Wankhede
  2016-09-20 17:21                   ` Paolo Bonzini
  0 siblings, 2 replies; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-20 15:14 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kirti Wankhede, Alex Williamson, Andy Currid, Tian, Kevin,
	Neo Jia, libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	bjsdjshi

On Tue, Sep 20, 2016 at 05:05:43PM +0200, Paolo Bonzini wrote:
> 
> 
> On 20/09/2016 16:58, Daniel P. Berrange wrote:
> > > > As I've said in my earlier reply - libvirt will *NOT* support passing
> > > > arbitrary vendor specific parameters as a blob via the XML. Everything
> > > > that appears in the XML must be *fully* specified and explicitly
> > > > represented in the XML  as a distinct attribute or element.
> > > 
> > > Are generic key/value attributes (e.g. a <attribute> element) acceptable?
> >
> > Only if libvirt has a known list of valid attribute key names upfront.
> > We don't want to just blindly expose arbitary vendor specific keys exposed
> > by the kernel. Libvirt's job is to ensure the XML representation is vendor
> > portable
> 
> Ok, then I guess vendor-specific mdev parameters are out completely.  Or
> could they be added under a separate namespace, like QEMU passthrough?

The QEMU XML namespace that allows passthrough is intended either as a
short term hack to let people experiment with new QEMU features, before
they get explicitly represented in the main XML namespace, or in a few
cases, for things we never want to support officially.  Any VM which
uses the separate namespace is tainted, which means if theres a bug
report we'll require the reported to remove whatever config caused
the tainting and then reproduce the problem.

If the vendor specific mdev parameters are things that apps will often
need to be able to set, then allowing it via a separate namespace is
just providing a backdoor that everyone needs to use, defeating the
point of using a separate namespace.

We don't want to knowingly design a system which defacto requires
the app to use the separate namespace most/all of the time.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 14:43     ` Alex Williamson
@ 2016-09-20 16:23       ` Kirti Wankhede
  2016-09-20 16:50         ` Alex Williamson
                           ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-20 16:23 UTC (permalink / raw)
  To: Alex Williamson
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann



On 9/20/2016 8:13 PM, Alex Williamson wrote:
> On Tue, 20 Sep 2016 19:51:58 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> On 9/20/2016 3:06 AM, Alex Williamson wrote:
>>> On Tue, 20 Sep 2016 02:05:52 +0530
>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>>>   
>>>> Hi libvirt experts,
>>>>
>>>>
>>>> 'create':
>>>>     Write-only file. Mandatory.
>>>>     Accepts string to create mediated device.
>>>>
>>>> 'name':
>>>>     Read-Only file. Mandatory.
>>>>     Returns string, the name of that type id.
>>>>
>>>> 'fb_length':
>>>>     Read-only file. Mandatory.
>>>>     Returns <number>{K,M,G}, size of framebuffer.  
>>>
>>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
>>> just one user of mediated devices.
>>>   
>>
>> As per our discussion in BOF, we would define class and its attributes.
>> As I mentioned above these are attributes of "display" class.
> 
> Just as I didn't know here, how does libvirt know the class of a given
> type id?
>  

Yes, we need some way to identify the class as Daniel mentioned in his
suggestion. Add a file 'class' in mdev_supported_types that would return
the class.


>>>  Can all parameters be changed dynamically?  
>>
>> Parameters here would be extra parameters which are not listed above in
>> mandatory list. If such parameters are required to set, these parameters
>> should be set per mdev device before VM is booted.
>>
>>>  Do parameter changes apply to existing devices or only future devices?  
>>
>> No, it should be applied at the time when mdev device is created or
>> after mdev device is created but before VM is booted. It will not be
>> applicable to existing devices.
>>
>>>  This is a poorly specified
>>> interface.  I'd prefer this be done via module options on the vendor
>>> driver.
>>>   
>>
>> Module option is not suitable here, in that case such parameters would
>> be applied to all mdev device created for vendor driver and that is not
>> what we want to.
> 
> Then use attributes on the mdev device itself, this is not an
> acceptable interface.
> 
>>>>
>>>> With above example, vGPU device XML would look like:
>>>>
>>>>    <device>
>>>>      <name>my-vgpu</name>
>>>>      <parent>pci_0000_86_00_0</parent>
>>>>      <capability type='mdev'>
>>>>        <type id='11'/>
>>>>        <group>1</group>
>>>>        <params>'frame_rate_limiter=0'</params>
>>>>      </capability>
>>>>    </device>
>>>>
>>>> 'type id' is mandatory.  
>>>
>>> I was under the impression that the vendor supplied "name" was the one
>>> unique identifier.  We're certainly not going to create a registrar to
>>> hand out type ids to each vendor, so what makes type id unique?  
>>
>> Type id is unique, 'name' is the name (string) of device that would be
>> displayed in guest OS for that type of mdev device.
> 
> We were told at the BoF that name was the unique string which would be
> consistent and you again indicate below that "GRID-M60-0B" has a fixed
> set of attributes, so why do we care what type id is associated with
> that name?
>  
>>>  I have
>>> a hard time believing that a given vendor can even allocate unique type
>>> ids for their own devices.  Unique type id across vendors is not
>>> practical.  So which attribute are we actually using to identify which
>>> type of mdev device we need and why would we ever specify additional
>>> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
>>> M60-0B" has a fixed setup of those attributes?
>>>   
>>
>> Specifying attributes here is not our requirement. Yes we have fixed set
>> of attributes for "GRID-M60-0B" and on.
>> We are defining the attributed here for "display" class for all other
>> vendor of gpu can use.
>>
>>
>>>> 'group' is optional. It should be a unique number in the system among
>>>> all the groups created for mdev devices. Its usage is:
>>>>   - not needed if single vGPU device is being assigned to a domain.
>>>>   - only need to be set if multiple vGPUs need to be assigned to a
>>>> domain and vendor driver have 'requires_group' file in type id directory.
>>>>   - if type id directory include 'requires_group' and user tries to
>>>> assign multiple vGPUs to a domain without having <group> field in XML,
>>>> it will create single vGPU.  
>>>
>>> We never finished our discussion of how this gets implemented or
>>> whether a group applies only to devices from the same vendor, same
>>> device, how heterogeneous groups are handled, etc.
>>>   
>>
>> We agreed on above points that I mentioned here and we wish to know what
>> libvirt folks think, right?
>> Since there were no inputs on that thread I thought I should summarize
>> what had been discussed and get libvirt experts thoughts on this.
> 
> No, that was just an idea that was never full defined.  We had that
> idea, the IOMMU group idea, the invalid container idea...  This is
> still an open question in the design.
> 

We do want to close down on the design as soon as possible.
I had tried to address open questions in earlier discussion also. Let me
address your concerns you have:

> whether a group applies only to devices from the same vendor,

Yes.

> same device,

Group can be of mdev devices from different supported types and from
different physical device but should be from same vendor.

>how heterogeneous groups are handled,

'heterogeneous' does that mean device grouped together with different
vendors?

>>>> 'params' is optional field. User should set this field if extra
>>>> parameters need to be set for a particular vGPU device. Libvirt don't
>>>> need to parse these params. These are meant for vendor driver.  
>>>
>>> ie. magic black hole.  nope.
>>>   
>>>> Libvirt need to follow the sequence to create device:
>>>> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
>>>> then only proceed else fail.
>>>>
>>>> * Set extra params if 'params' field exist in device XML and 'params'
>>>> file exist in type id directory
>>>>
>>>>     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params
>>>>
>>>> * Autogenerate UUID
>>>> * Create device:
>>>>
>>>>     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
>>>>
>>>>     where <group> is optional. Group should be unique number among all
>>>> the groups created for mdev devices.  
>>>
>>> Is it an integer, a UUID, a string?  How is the group for an mdev
>>> device discovered?  Can it be changed later?
>>>   
>>
>> Group number could be integer, that's where we need input from libvirt
>> folks. It should be provided at the time of mdev device creation. It
>> can't be changed after that. If user want to change group number
>> destroy/remove mdev device and create it again with proper group number.
> 
> My concern is that a type id seems arbitrary but we're specifying that
> it be unique.  We already have something unique, the name.  So why try
> to make the type id unique as well?  A vendor can accidentally create
> their vendor driver so that a given name means something very
> specific.  On the other hand they need to be extremely deliberate to
> coordinate that a type id means a unique thing across all their product
> lines.
> 

Let me clarify, type id should be unique in the list of
mdev_supported_types. You can't have 2 directories in with same name.

I think here 'name' is the problem and it is not used in xml file.
Lets not make it mandatory. But vendor driver can have it which 3rd
party can use to read.
Lets remove 'id' from type id in XML if that is the concern. Supported
types is going to be defined by vendor driver, so let vendor driver
decide what to use for directory name and same should be used in device
xml file, it could be '11' or "GRID M60-0B":

    <device>
      <name>my-vgpu</name>
      <parent>pci_0000_86_00_0</parent>
      <capability type='mdev'>
        <type='11'/>
        ...
      </capability>
    </device>


>>>>
>>>> 7. Hot-plug
>>>>
>>>> It is same syntax to create a virtual device for hot-plug.  
>>>
>>> How do groups work with hotplug?  Can a device be creating into an
>>> existing, running group?  Can a device be removed from an existing,
>>> running group?  Thanks,
>>>   
>>
>> Group is for vendor driver to decide when to commit resources for a
>> group of devices to support multiple devices in a domain. It will be
>> upto vendor driver how it manage it.
> 
> Then it's not sufficiently specified for libvirt to use.  Thanks,
> 

Can you clarify what if required for libvirt to use?

Kirti.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 15:14                 ` Daniel P. Berrange
@ 2016-09-20 16:31                   ` Kirti Wankhede
  2016-09-20 16:36                     ` Daniel P. Berrange
  2016-09-20 16:46                     ` Daniel P. Berrange
  2016-09-20 17:21                   ` Paolo Bonzini
  1 sibling, 2 replies; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-20 16:31 UTC (permalink / raw)
  To: Daniel P. Berrange, Paolo Bonzini
  Cc: Alex Williamson, Andy Currid, Tian, Kevin, Neo Jia, libvir-list,
	qemu-devel, Song, Jike, Gerd Hoffmann, bjsdjshi



On 9/20/2016 8:44 PM, Daniel P. Berrange wrote:
> On Tue, Sep 20, 2016 at 05:05:43PM +0200, Paolo Bonzini wrote:
>>
>>
>> On 20/09/2016 16:58, Daniel P. Berrange wrote:
>>>>> As I've said in my earlier reply - libvirt will *NOT* support passing
>>>>> arbitrary vendor specific parameters as a blob via the XML. Everything
>>>>> that appears in the XML must be *fully* specified and explicitly
>>>>> represented in the XML  as a distinct attribute or element.
>>>>
>>>> Are generic key/value attributes (e.g. a <attribute> element) acceptable?
>>>
>>> Only if libvirt has a known list of valid attribute key names upfront.
>>> We don't want to just blindly expose arbitary vendor specific keys exposed
>>> by the kernel. Libvirt's job is to ensure the XML representation is vendor
>>> portable
>>

In key/value attributes (taking example from proposed xml file)

<attribute name='resolution'>2560x1600</attribute>

'Key' (i.e. 'resolution') should be known upfront, not the value, right?

Kirti


>> Ok, then I guess vendor-specific mdev parameters are out completely.  Or
>> could they be added under a separate namespace, like QEMU passthrough?
> 
> The QEMU XML namespace that allows passthrough is intended either as a
> short term hack to let people experiment with new QEMU features, before
> they get explicitly represented in the main XML namespace, or in a few
> cases, for things we never want to support officially.  Any VM which
> uses the separate namespace is tainted, which means if theres a bug
> report we'll require the reported to remove whatever config caused
> the tainting and then reproduce the problem.
> 
> If the vendor specific mdev parameters are things that apps will often
> need to be able to set, then allowing it via a separate namespace is
> just providing a backdoor that everyone needs to use, defeating the
> point of using a separate namespace.
> 
> We don't want to knowingly design a system which defacto requires
> the app to use the separate namespace most/all of the time.
> 
> 
> Regards,
> Daniel
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 16:31                   ` Kirti Wankhede
@ 2016-09-20 16:36                     ` Daniel P. Berrange
  2016-09-20 16:42                       ` Kirti Wankhede
  2016-09-20 16:46                     ` Daniel P. Berrange
  1 sibling, 1 reply; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-20 16:36 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: Paolo Bonzini, Alex Williamson, Andy Currid, Tian, Kevin,
	Neo Jia, libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	bjsdjshi

On Tue, Sep 20, 2016 at 10:01:18PM +0530, Kirti Wankhede wrote:
> 
> 
> On 9/20/2016 8:44 PM, Daniel P. Berrange wrote:
> > On Tue, Sep 20, 2016 at 05:05:43PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 20/09/2016 16:58, Daniel P. Berrange wrote:
> >>>>> As I've said in my earlier reply - libvirt will *NOT* support passing
> >>>>> arbitrary vendor specific parameters as a blob via the XML. Everything
> >>>>> that appears in the XML must be *fully* specified and explicitly
> >>>>> represented in the XML  as a distinct attribute or element.
> >>>>
> >>>> Are generic key/value attributes (e.g. a <attribute> element) acceptable?
> >>>
> >>> Only if libvirt has a known list of valid attribute key names upfront.
> >>> We don't want to just blindly expose arbitary vendor specific keys exposed
> >>> by the kernel. Libvirt's job is to ensure the XML representation is vendor
> >>> portable
> >>
> 
> In key/value attributes (taking example from proposed xml file)
> 
> <attribute name='resolution'>2560x1600</attribute>
> 
> 'Key' (i.e. 'resolution') should be known upfront, not the value, right?

Yes, the actual value is not important - only its structured.
ie, libvirt would check that it is in the format '$WIDTHx$HEIGHT'
and reject it if not.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 16:36                     ` Daniel P. Berrange
@ 2016-09-20 16:42                       ` Kirti Wankhede
  2016-09-20 16:44                         ` Daniel P. Berrange
  0 siblings, 1 reply; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-20 16:42 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Paolo Bonzini, Alex Williamson, Andy Currid, Tian, Kevin,
	Neo Jia, libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	bjsdjshi



On 9/20/2016 10:06 PM, Daniel P. Berrange wrote:
> On Tue, Sep 20, 2016 at 10:01:18PM +0530, Kirti Wankhede wrote:
>>
>>
>> On 9/20/2016 8:44 PM, Daniel P. Berrange wrote:
>>> On Tue, Sep 20, 2016 at 05:05:43PM +0200, Paolo Bonzini wrote:
>>>>
>>>>
>>>> On 20/09/2016 16:58, Daniel P. Berrange wrote:
>>>>>>> As I've said in my earlier reply - libvirt will *NOT* support passing
>>>>>>> arbitrary vendor specific parameters as a blob via the XML. Everything
>>>>>>> that appears in the XML must be *fully* specified and explicitly
>>>>>>> represented in the XML  as a distinct attribute or element.
>>>>>>
>>>>>> Are generic key/value attributes (e.g. a <attribute> element) acceptable?
>>>>>
>>>>> Only if libvirt has a known list of valid attribute key names upfront.
>>>>> We don't want to just blindly expose arbitary vendor specific keys exposed
>>>>> by the kernel. Libvirt's job is to ensure the XML representation is vendor
>>>>> portable
>>>>
>>
>> In key/value attributes (taking example from proposed xml file)
>>
>> <attribute name='resolution'>2560x1600</attribute>
>>
>> 'Key' (i.e. 'resolution') should be known upfront, not the value, right?
> 
> Yes, the actual value is not important - only its structured.
> ie, libvirt would check that it is in the format '$WIDTHx$HEIGHT'
> and reject it if not.
> 

In this particular example, libvirt checks if its integer? or value
could be 2560x1600 or 4096x4096 both are valid?

Does libvirt accept string value?

Kirti

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 16:42                       ` Kirti Wankhede
@ 2016-09-20 16:44                         ` Daniel P. Berrange
  0 siblings, 0 replies; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-20 16:44 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: Paolo Bonzini, Alex Williamson, Andy Currid, Tian, Kevin,
	Neo Jia, libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	bjsdjshi

On Tue, Sep 20, 2016 at 10:12:20PM +0530, Kirti Wankhede wrote:
> 
> 
> On 9/20/2016 10:06 PM, Daniel P. Berrange wrote:
> > On Tue, Sep 20, 2016 at 10:01:18PM +0530, Kirti Wankhede wrote:
> >>
> >>
> >> On 9/20/2016 8:44 PM, Daniel P. Berrange wrote:
> >>> On Tue, Sep 20, 2016 at 05:05:43PM +0200, Paolo Bonzini wrote:
> >>>>
> >>>>
> >>>> On 20/09/2016 16:58, Daniel P. Berrange wrote:
> >>>>>>> As I've said in my earlier reply - libvirt will *NOT* support passing
> >>>>>>> arbitrary vendor specific parameters as a blob via the XML. Everything
> >>>>>>> that appears in the XML must be *fully* specified and explicitly
> >>>>>>> represented in the XML  as a distinct attribute or element.
> >>>>>>
> >>>>>> Are generic key/value attributes (e.g. a <attribute> element) acceptable?
> >>>>>
> >>>>> Only if libvirt has a known list of valid attribute key names upfront.
> >>>>> We don't want to just blindly expose arbitary vendor specific keys exposed
> >>>>> by the kernel. Libvirt's job is to ensure the XML representation is vendor
> >>>>> portable
> >>>>
> >>
> >> In key/value attributes (taking example from proposed xml file)
> >>
> >> <attribute name='resolution'>2560x1600</attribute>
> >>
> >> 'Key' (i.e. 'resolution') should be known upfront, not the value, right?
> > 
> > Yes, the actual value is not important - only its structured.
> > ie, libvirt would check that it is in the format '$WIDTHx$HEIGHT'
> > and reject it if not.
> > 
> 
> In this particular example, libvirt checks if its integer? or value
> could be 2560x1600 or 4096x4096 both are valid?
>
> Does libvirt accept string value?

Err, as I said, we'd validate that its in the format '$WIDTHx$HEIGHT'


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 16:31                   ` Kirti Wankhede
  2016-09-20 16:36                     ` Daniel P. Berrange
@ 2016-09-20 16:46                     ` Daniel P. Berrange
  1 sibling, 0 replies; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-20 16:46 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: Paolo Bonzini, Alex Williamson, Andy Currid, Tian, Kevin,
	Neo Jia, libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	bjsdjshi

On Tue, Sep 20, 2016 at 10:01:18PM +0530, Kirti Wankhede wrote:
> 
> 
> On 9/20/2016 8:44 PM, Daniel P. Berrange wrote:
> > On Tue, Sep 20, 2016 at 05:05:43PM +0200, Paolo Bonzini wrote:
> >>
> >>
> >> On 20/09/2016 16:58, Daniel P. Berrange wrote:
> >>>>> As I've said in my earlier reply - libvirt will *NOT* support passing
> >>>>> arbitrary vendor specific parameters as a blob via the XML. Everything
> >>>>> that appears in the XML must be *fully* specified and explicitly
> >>>>> represented in the XML  as a distinct attribute or element.
> >>>>
> >>>> Are generic key/value attributes (e.g. a <attribute> element) acceptable?
> >>>
> >>> Only if libvirt has a known list of valid attribute key names upfront.
> >>> We don't want to just blindly expose arbitary vendor specific keys exposed
> >>> by the kernel. Libvirt's job is to ensure the XML representation is vendor
> >>> portable
> >>
> 
> In key/value attributes (taking example from proposed xml file)
> 
> <attribute name='resolution'>2560x1600</attribute>
> 
> 'Key' (i.e. 'resolution') should be known upfront, not the value, right?

NB, resolution is a good example of why we would *not* use the
generic <attribute name='$key'>$value</attribute> format in
the XML. Instead we'd represent it as

   <resolution width='2560' height='1600'/>

as applications should not have to further parse data after extracting
it from an XML attribute/element.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 16:23       ` Kirti Wankhede
@ 2016-09-20 16:50         ` Alex Williamson
  2016-09-21 18:34           ` Kirti Wankhede
  2016-09-21  4:10         ` Tian, Kevin
  2016-09-22 19:25         ` Tian, Kevin
  2 siblings, 1 reply; 62+ messages in thread
From: Alex Williamson @ 2016-09-20 16:50 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann

On Tue, 20 Sep 2016 21:53:16 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 9/20/2016 8:13 PM, Alex Williamson wrote:
> > On Tue, 20 Sep 2016 19:51:58 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> >> On 9/20/2016 3:06 AM, Alex Williamson wrote:  
> >>> On Tue, 20 Sep 2016 02:05:52 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>     
> >>>> Hi libvirt experts,
> >>>>
> >>>>
> >>>> 'create':
> >>>>     Write-only file. Mandatory.
> >>>>     Accepts string to create mediated device.
> >>>>
> >>>> 'name':
> >>>>     Read-Only file. Mandatory.
> >>>>     Returns string, the name of that type id.
> >>>>
> >>>> 'fb_length':
> >>>>     Read-only file. Mandatory.
> >>>>     Returns <number>{K,M,G}, size of framebuffer.    
> >>>
> >>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> >>> just one user of mediated devices.
> >>>     
> >>
> >> As per our discussion in BOF, we would define class and its attributes.
> >> As I mentioned above these are attributes of "display" class.  
> > 
> > Just as I didn't know here, how does libvirt know the class of a given
> > type id?
> >    
> 
> Yes, we need some way to identify the class as Daniel mentioned in his
> suggestion. Add a file 'class' in mdev_supported_types that would return
> the class.
> 
> 
> >>>  Can all parameters be changed dynamically?    
> >>
> >> Parameters here would be extra parameters which are not listed above in
> >> mandatory list. If such parameters are required to set, these parameters
> >> should be set per mdev device before VM is booted.
> >>  
> >>>  Do parameter changes apply to existing devices or only future devices?    
> >>
> >> No, it should be applied at the time when mdev device is created or
> >> after mdev device is created but before VM is booted. It will not be
> >> applicable to existing devices.
> >>  
> >>>  This is a poorly specified
> >>> interface.  I'd prefer this be done via module options on the vendor
> >>> driver.
> >>>     
> >>
> >> Module option is not suitable here, in that case such parameters would
> >> be applied to all mdev device created for vendor driver and that is not
> >> what we want to.  
> > 
> > Then use attributes on the mdev device itself, this is not an
> > acceptable interface.
> >   
> >>>>
> >>>> With above example, vGPU device XML would look like:
> >>>>
> >>>>    <device>
> >>>>      <name>my-vgpu</name>
> >>>>      <parent>pci_0000_86_00_0</parent>
> >>>>      <capability type='mdev'>
> >>>>        <type id='11'/>
> >>>>        <group>1</group>
> >>>>        <params>'frame_rate_limiter=0'</params>
> >>>>      </capability>
> >>>>    </device>
> >>>>
> >>>> 'type id' is mandatory.    
> >>>
> >>> I was under the impression that the vendor supplied "name" was the one
> >>> unique identifier.  We're certainly not going to create a registrar to
> >>> hand out type ids to each vendor, so what makes type id unique?    
> >>
> >> Type id is unique, 'name' is the name (string) of device that would be
> >> displayed in guest OS for that type of mdev device.  
> > 
> > We were told at the BoF that name was the unique string which would be
> > consistent and you again indicate below that "GRID-M60-0B" has a fixed
> > set of attributes, so why do we care what type id is associated with
> > that name?
> >    
> >>>  I have
> >>> a hard time believing that a given vendor can even allocate unique type
> >>> ids for their own devices.  Unique type id across vendors is not
> >>> practical.  So which attribute are we actually using to identify which
> >>> type of mdev device we need and why would we ever specify additional
> >>> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
> >>> M60-0B" has a fixed setup of those attributes?
> >>>     
> >>
> >> Specifying attributes here is not our requirement. Yes we have fixed set
> >> of attributes for "GRID-M60-0B" and on.
> >> We are defining the attributed here for "display" class for all other
> >> vendor of gpu can use.
> >>
> >>  
> >>>> 'group' is optional. It should be a unique number in the system among
> >>>> all the groups created for mdev devices. Its usage is:
> >>>>   - not needed if single vGPU device is being assigned to a domain.
> >>>>   - only need to be set if multiple vGPUs need to be assigned to a
> >>>> domain and vendor driver have 'requires_group' file in type id directory.
> >>>>   - if type id directory include 'requires_group' and user tries to
> >>>> assign multiple vGPUs to a domain without having <group> field in XML,
> >>>> it will create single vGPU.    
> >>>
> >>> We never finished our discussion of how this gets implemented or
> >>> whether a group applies only to devices from the same vendor, same
> >>> device, how heterogeneous groups are handled, etc.
> >>>     
> >>
> >> We agreed on above points that I mentioned here and we wish to know what
> >> libvirt folks think, right?
> >> Since there were no inputs on that thread I thought I should summarize
> >> what had been discussed and get libvirt experts thoughts on this.  
> > 
> > No, that was just an idea that was never full defined.  We had that
> > idea, the IOMMU group idea, the invalid container idea...  This is
> > still an open question in the design.
> >   
> 
> We do want to close down on the design as soon as possible.
> I had tried to address open questions in earlier discussion also. Let me
> address your concerns you have:
> 
> > whether a group applies only to devices from the same vendor,  
> 
> Yes.

So should we call this a vendor_group?  How does libvirt learn the
vendor?  We can't assume that the parent device will always be a PCI
device.  Maybe it's actually a driver_group and libvirt should put any
mdev devices with the same ->parent->driver into the same group.

If these groups are managed via the vendor driver, we don't really need
a global namespace of group ids, isn't it per vendor driver?  Maybe the
group should be a uuid rather than an integer to effectively remove the
collision/uniqueness problem.
 
> > same device,  
> 
> Group can be of mdev devices from different supported types and from
> different physical device but should be from same vendor.

I think we're just reinforcing that this is a driver-group.  Another
vendor might require a device-group, where devices must all be attached
to the same parent device.  "group" is too generic.

> >how heterogeneous groups are handled,  
> 
> 'heterogeneous' does that mean device grouped together with different
> vendors?

Different vendors or different type-ids.  I think you've clarified both
of these, but it certainly wasn't specified previously.
 
> >>>> 'params' is optional field. User should set this field if extra
> >>>> parameters need to be set for a particular vGPU device. Libvirt don't
> >>>> need to parse these params. These are meant for vendor driver.    
> >>>
> >>> ie. magic black hole.  nope.
> >>>     
> >>>> Libvirt need to follow the sequence to create device:
> >>>> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
> >>>> then only proceed else fail.
> >>>>
> >>>> * Set extra params if 'params' field exist in device XML and 'params'
> >>>> file exist in type id directory
> >>>>
> >>>>     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params
> >>>>
> >>>> * Autogenerate UUID
> >>>> * Create device:
> >>>>
> >>>>     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
> >>>>
> >>>>     where <group> is optional. Group should be unique number among all
> >>>> the groups created for mdev devices.    
> >>>
> >>> Is it an integer, a UUID, a string?  How is the group for an mdev
> >>> device discovered?  Can it be changed later?
> >>>     
> >>
> >> Group number could be integer, that's where we need input from libvirt
> >> folks. It should be provided at the time of mdev device creation. It
> >> can't be changed after that. If user want to change group number
> >> destroy/remove mdev device and create it again with proper group number.  
> > 
> > My concern is that a type id seems arbitrary but we're specifying that
> > it be unique.  We already have something unique, the name.  So why try
> > to make the type id unique as well?  A vendor can accidentally create
> > their vendor driver so that a given name means something very
> > specific.  On the other hand they need to be extremely deliberate to
> > coordinate that a type id means a unique thing across all their product
> > lines.
> >   
> 
> Let me clarify, type id should be unique in the list of
> mdev_supported_types. You can't have 2 directories in with same name.

Of course, but does that mean it's only unique to the machine I'm
currently running on?  Let's say I have a Tesla P100 on my system and
type-id 11 is named "GRID-M60-0B".  At some point in the future I
replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
new card still going to be a "GRID-M60-0B"?  If not then we've based
our XML on the wrong attribute.  If the new device does not support
"GRID-M60-0B" then we should generate an error, not simply initialize
whatever type-id 11 happens to be on this new card.

Again, we've defined that name is unique, so why should type-id be
anything other than arbitrary?  It relates only to the current instance
on the machine.
 
> I think here 'name' is the problem and it is not used in xml file.
> Lets not make it mandatory. But vendor driver can have it which 3rd
> party can use to read.
> Lets remove 'id' from type id in XML if that is the concern. Supported
> types is going to be defined by vendor driver, so let vendor driver
> decide what to use for directory name and same should be used in device
> xml file, it could be '11' or "GRID M60-0B":
> 
>     <device>
>       <name>my-vgpu</name>
>       <parent>pci_0000_86_00_0</parent>
>       <capability type='mdev'>
>         <type='11'/>
>         ...
>       </capability>
>     </device>
> 
> 
> >>>>
> >>>> 7. Hot-plug
> >>>>
> >>>> It is same syntax to create a virtual device for hot-plug.    
> >>>
> >>> How do groups work with hotplug?  Can a device be creating into an
> >>> existing, running group?  Can a device be removed from an existing,
> >>> running group?  Thanks,
> >>>     
> >>
> >> Group is for vendor driver to decide when to commit resources for a
> >> group of devices to support multiple devices in a domain. It will be
> >> upto vendor driver how it manage it.  
> > 
> > Then it's not sufficiently specified for libvirt to use.  Thanks,
> >   
> 
> Can you clarify what if required for libvirt to use?

Everything we're discussing above.  We can't simply say that group
management is defined by the vendor driver, that means that libvirt
needs to know how to uniquely behave for each vendor driver.  We want
to create a consistent framework where libvirt doesn't care what the
vendor driver is.  Therefore we need to make group manipulation fully
specified, regardless of the vendor driver.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 15:14                 ` Daniel P. Berrange
  2016-09-20 16:31                   ` Kirti Wankhede
@ 2016-09-20 17:21                   ` Paolo Bonzini
  2016-09-21  8:34                     ` Daniel P. Berrange
  1 sibling, 1 reply; 62+ messages in thread
From: Paolo Bonzini @ 2016-09-20 17:21 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kirti Wankhede, Alex Williamson, Andy Currid, Tian, Kevin,
	Neo Jia, libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	bjsdjshi



On 20/09/2016 17:14, Daniel P. Berrange wrote:
> Any VM which
> uses the separate namespace is tainted, which means if theres a bug
> report we'll require the reported to remove whatever config caused
> the tainting and then reproduce the problem.
> 
> If the vendor specific mdev parameters are things that apps will often
> need to be able to set, then allowing it via a separate namespace is
> just providing a backdoor that everyone needs to use, defeating the
> point of using a separate namespace.

No, they don't need to be often set, and tainting the domain is a good idea.

Paolo

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 14:21   ` Kirti Wankhede
  2016-09-20 14:43     ` Alex Williamson
@ 2016-09-21  3:56     ` Tian, Kevin
  2016-09-21  4:36       ` Alex Williamson
  1 sibling, 1 reply; 62+ messages in thread
From: Tian, Kevin @ 2016-09-21  3:56 UTC (permalink / raw)
  To: Kirti Wankhede, Alex Williamson
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Song, Jike, bjsdjshi, Gerd Hoffmann

> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> Sent: Tuesday, September 20, 2016 10:22 PM
> >>
> >> 'max_instance':
> >>     Read-Only file. Mandatory.
> >>     Returns integer.  Returns maximum mdev device could be created
> >> at the moment when this file is read. This count would be updated by
> >> vendor driver. Before creating mdev device of this type, check if
> >> max_instance is > 0.
> >
> > Didn't we discuss this being being something like "available_instances"
> > with a "devices" sub-directory linking current devices of that type?
> >
> 
> I'm ok with 'available_instances' as name of this file.
> Yes, mdev device directory will have links to devices of that type. I
> think that is part of mdev core module discussion. I'm trying to list
> sysfs files for libvirt interface here to focus more on libvirt
> interface. Hence didn't add that. I'll make sure to add all such details
> in future.
> 
> 

Definitely you need provide those details since they also contain
libvirt interface, e.g. 'destroy' which we discussed should be in
mdev directory. Another example is class-specific attributes such
as 'resolution', etc. which you listed under 'display class'. All those
attributes should be moved to mdev directory. Only class ID is
required under each type.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 16:23       ` Kirti Wankhede
  2016-09-20 16:50         ` Alex Williamson
@ 2016-09-21  4:10         ` Tian, Kevin
  2016-09-21  4:43           ` Alex Williamson
  2016-09-22 19:25         ` Tian, Kevin
  2 siblings, 1 reply; 62+ messages in thread
From: Tian, Kevin @ 2016-09-21  4:10 UTC (permalink / raw)
  To: Kirti Wankhede, Alex Williamson
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Song, Jike, bjsdjshi, Gerd Hoffmann

> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> Sent: Wednesday, September 21, 2016 12:23 AM
> 
> 
> On 9/20/2016 8:13 PM, Alex Williamson wrote:
> > On Tue, 20 Sep 2016 19:51:58 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >
> >> On 9/20/2016 3:06 AM, Alex Williamson wrote:
> >>> On Tue, 20 Sep 2016 02:05:52 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>
> >>>> Hi libvirt experts,
> >>>>
> >>>>
> >>>> 'create':
> >>>>     Write-only file. Mandatory.
> >>>>     Accepts string to create mediated device.
> >>>>
> >>>> 'name':
> >>>>     Read-Only file. Mandatory.
> >>>>     Returns string, the name of that type id.
> >>>>
> >>>> 'fb_length':
> >>>>     Read-only file. Mandatory.
> >>>>     Returns <number>{K,M,G}, size of framebuffer.
> >>>
> >>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> >>> just one user of mediated devices.
> >>>
> >>
> >> As per our discussion in BOF, we would define class and its attributes.
> >> As I mentioned above these are attributes of "display" class.
> >
> > Just as I didn't know here, how does libvirt know the class of a given
> > type id?
> >
> 
> Yes, we need some way to identify the class as Daniel mentioned in his
> suggestion. Add a file 'class' in mdev_supported_types that would return
> the class.
> 
> 

'display' class or 'gpu' class? display represents only part of GPU 
functionalities. I assume you want to provide an unique class ID
for each type, instead of allowing multiple classes each identifying
a subset of controllable GPU resources (e.g. 'display', 'render', etc.)...

for unique class ID, I once considered whether it could be directly
inherited from class of parent device, since typically a vendor driver
creates mdev devices using the same type as physical device. But later
I realized one potential value of keep a class field for mdev types,
e.g. if we want to extend mdev to FPGA which could be programmed
as different classes of functionalities. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-21  3:56     ` Tian, Kevin
@ 2016-09-21  4:36       ` Alex Williamson
  2016-09-22  2:33         ` Tian, Kevin
       [not found]         ` <AADFC41AFE54684AB9EE6CBC0274A5D18DF86F5F@SHSMSX101.ccr.corp.intel.com>
  0 siblings, 2 replies; 62+ messages in thread
From: Alex Williamson @ 2016-09-21  4:36 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Kirti Wankhede, libvir-list, qemu-devel, Paolo Bonzini, Neo Jia,
	Andy Currid, Song, Jike, bjsdjshi, Gerd Hoffmann

On Wed, 21 Sep 2016 03:56:21 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> > Sent: Tuesday, September 20, 2016 10:22 PM  
> > >>
> > >> 'max_instance':
> > >>     Read-Only file. Mandatory.
> > >>     Returns integer.  Returns maximum mdev device could be created
> > >> at the moment when this file is read. This count would be updated by
> > >> vendor driver. Before creating mdev device of this type, check if
> > >> max_instance is > 0.  
> > >
> > > Didn't we discuss this being being something like "available_instances"
> > > with a "devices" sub-directory linking current devices of that type?
> > >  
> > 
> > I'm ok with 'available_instances' as name of this file.
> > Yes, mdev device directory will have links to devices of that type. I
> > think that is part of mdev core module discussion. I'm trying to list
> > sysfs files for libvirt interface here to focus more on libvirt
> > interface. Hence didn't add that. I'll make sure to add all such details
> > in future.
> > 
> >   
> 
> Definitely you need provide those details since they also contain
> libvirt interface, e.g. 'destroy' which we discussed should be in
> mdev directory.

$ find /sys/devices -name remove | wc -l
24
$ find /sys/devices -name destroy | wc -l
0

'remove' is the sysfs deconstructor we should use.

> Another example is class-specific attributes such
> as 'resolution', etc. which you listed under 'display class'. All those
> attributes should be moved to mdev directory. Only class ID is
> required under each type.

Can you elaborate on what you're proposing here?  If we don't have
attributes like 'resolution' under the type-id then we can't describe
to the user the features of the mdev before we create it.  I don't
think anybody wants a 'create it and find out' type of interface.
Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-21  4:10         ` Tian, Kevin
@ 2016-09-21  4:43           ` Alex Williamson
  2016-09-22  2:43             ` Tian, Kevin
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Williamson @ 2016-09-21  4:43 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Kirti Wankhede, libvir-list, qemu-devel, Paolo Bonzini, Neo Jia,
	Andy Currid, Song, Jike, bjsdjshi, Gerd Hoffmann

On Wed, 21 Sep 2016 04:10:53 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> > Sent: Wednesday, September 21, 2016 12:23 AM
> > 
> > 
> > On 9/20/2016 8:13 PM, Alex Williamson wrote:  
> > > On Tue, 20 Sep 2016 19:51:58 +0530
> > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >  
> > >> On 9/20/2016 3:06 AM, Alex Williamson wrote:  
> > >>> On Tue, 20 Sep 2016 02:05:52 +0530
> > >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >>>  
> > >>>> Hi libvirt experts,
> > >>>>
> > >>>>
> > >>>> 'create':
> > >>>>     Write-only file. Mandatory.
> > >>>>     Accepts string to create mediated device.
> > >>>>
> > >>>> 'name':
> > >>>>     Read-Only file. Mandatory.
> > >>>>     Returns string, the name of that type id.
> > >>>>
> > >>>> 'fb_length':
> > >>>>     Read-only file. Mandatory.
> > >>>>     Returns <number>{K,M,G}, size of framebuffer.  
> > >>>
> > >>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> > >>> just one user of mediated devices.
> > >>>  
> > >>
> > >> As per our discussion in BOF, we would define class and its attributes.
> > >> As I mentioned above these are attributes of "display" class.  
> > >
> > > Just as I didn't know here, how does libvirt know the class of a given
> > > type id?
> > >  
> > 
> > Yes, we need some way to identify the class as Daniel mentioned in his
> > suggestion. Add a file 'class' in mdev_supported_types that would return
> > the class.
> > 
> >   
> 
> 'display' class or 'gpu' class? display represents only part of GPU 
> functionalities. I assume you want to provide an unique class ID
> for each type, instead of allowing multiple classes each identifying
> a subset of controllable GPU resources (e.g. 'display', 'render', etc.)...

Not sure where you're going with this, we're not creating a programming
interface for the device, that exists through the vfio API.  I believe
the intention here is simply a user level classification for the
purpose of being able to interpret the attributes provided in a logical
way.

> for unique class ID, I once considered whether it could be directly
> inherited from class of parent device, since typically a vendor driver
> creates mdev devices using the same type as physical device. But later
> I realized one potential value of keep a class field for mdev types,
> e.g. if we want to extend mdev to FPGA which could be programmed
> as different classes of functionalities. :-)

And how do we generically determine the class of a parent device?  We
cannot assume the parent device is PCI.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 17:21                   ` Paolo Bonzini
@ 2016-09-21  8:34                     ` Daniel P. Berrange
  0 siblings, 0 replies; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-21  8:34 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Kirti Wankhede, Alex Williamson, Andy Currid, Tian, Kevin,
	Neo Jia, libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	bjsdjshi

On Tue, Sep 20, 2016 at 07:21:07PM +0200, Paolo Bonzini wrote:
> 
> 
> On 20/09/2016 17:14, Daniel P. Berrange wrote:
> > Any VM which
> > uses the separate namespace is tainted, which means if theres a bug
> > report we'll require the reported to remove whatever config caused
> > the tainting and then reproduce the problem.
> > 
> > If the vendor specific mdev parameters are things that apps will often
> > need to be able to set, then allowing it via a separate namespace is
> > just providing a backdoor that everyone needs to use, defeating the
> > point of using a separate namespace.
> 
> No, they don't need to be often set, and tainting the domain is a good idea.

Basically think of the XML namespace support as

 "a mechanism for accessing underlying platform features before they
  are exposed in the main XML, that happens to use the vendor specific
  data format"

*not* as

 "a mechanism for exposing vendor specific functionality"

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 16:50         ` Alex Williamson
@ 2016-09-21 18:34           ` Kirti Wankhede
  2016-09-21 19:03             ` Alex Williamson
  0 siblings, 1 reply; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-21 18:34 UTC (permalink / raw)
  To: Alex Williamson
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann



On 9/20/2016 10:20 PM, Alex Williamson wrote:
> On Tue, 20 Sep 2016 21:53:16 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
>> On 9/20/2016 8:13 PM, Alex Williamson wrote:
>>> On Tue, 20 Sep 2016 19:51:58 +0530
>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>>>   
>>>> On 9/20/2016 3:06 AM, Alex Williamson wrote:  
>>>>> On Tue, 20 Sep 2016 02:05:52 +0530
>>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>>>>>     
>>>>>> Hi libvirt experts,
>>>>>>
>>>>>>
>>>>>> 'create':
>>>>>>     Write-only file. Mandatory.
>>>>>>     Accepts string to create mediated device.
>>>>>>
>>>>>> 'name':
>>>>>>     Read-Only file. Mandatory.
>>>>>>     Returns string, the name of that type id.
>>>>>>
>>>>>> 'fb_length':
>>>>>>     Read-only file. Mandatory.
>>>>>>     Returns <number>{K,M,G}, size of framebuffer.    
>>>>>
>>>>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
>>>>> just one user of mediated devices.
>>>>>     
>>>>
>>>> As per our discussion in BOF, we would define class and its attributes.
>>>> As I mentioned above these are attributes of "display" class.  
>>>
>>> Just as I didn't know here, how does libvirt know the class of a given
>>> type id?
>>>    
>>
>> Yes, we need some way to identify the class as Daniel mentioned in his
>> suggestion. Add a file 'class' in mdev_supported_types that would return
>> the class.
>>
>>
>>>>>  Can all parameters be changed dynamically?    
>>>>
>>>> Parameters here would be extra parameters which are not listed above in
>>>> mandatory list. If such parameters are required to set, these parameters
>>>> should be set per mdev device before VM is booted.
>>>>  
>>>>>  Do parameter changes apply to existing devices or only future devices?    
>>>>
>>>> No, it should be applied at the time when mdev device is created or
>>>> after mdev device is created but before VM is booted. It will not be
>>>> applicable to existing devices.
>>>>  
>>>>>  This is a poorly specified
>>>>> interface.  I'd prefer this be done via module options on the vendor
>>>>> driver.
>>>>>     
>>>>
>>>> Module option is not suitable here, in that case such parameters would
>>>> be applied to all mdev device created for vendor driver and that is not
>>>> what we want to.  
>>>
>>> Then use attributes on the mdev device itself, this is not an
>>> acceptable interface.
>>>   
>>>>>>
>>>>>> With above example, vGPU device XML would look like:
>>>>>>
>>>>>>    <device>
>>>>>>      <name>my-vgpu</name>
>>>>>>      <parent>pci_0000_86_00_0</parent>
>>>>>>      <capability type='mdev'>
>>>>>>        <type id='11'/>
>>>>>>        <group>1</group>
>>>>>>        <params>'frame_rate_limiter=0'</params>
>>>>>>      </capability>
>>>>>>    </device>
>>>>>>
>>>>>> 'type id' is mandatory.    
>>>>>
>>>>> I was under the impression that the vendor supplied "name" was the one
>>>>> unique identifier.  We're certainly not going to create a registrar to
>>>>> hand out type ids to each vendor, so what makes type id unique?    
>>>>
>>>> Type id is unique, 'name' is the name (string) of device that would be
>>>> displayed in guest OS for that type of mdev device.  
>>>
>>> We were told at the BoF that name was the unique string which would be
>>> consistent and you again indicate below that "GRID-M60-0B" has a fixed
>>> set of attributes, so why do we care what type id is associated with
>>> that name?
>>>    
>>>>>  I have
>>>>> a hard time believing that a given vendor can even allocate unique type
>>>>> ids for their own devices.  Unique type id across vendors is not
>>>>> practical.  So which attribute are we actually using to identify which
>>>>> type of mdev device we need and why would we ever specify additional
>>>>> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
>>>>> M60-0B" has a fixed setup of those attributes?
>>>>>     
>>>>
>>>> Specifying attributes here is not our requirement. Yes we have fixed set
>>>> of attributes for "GRID-M60-0B" and on.
>>>> We are defining the attributed here for "display" class for all other
>>>> vendor of gpu can use.
>>>>
>>>>  
>>>>>> 'group' is optional. It should be a unique number in the system among
>>>>>> all the groups created for mdev devices. Its usage is:
>>>>>>   - not needed if single vGPU device is being assigned to a domain.
>>>>>>   - only need to be set if multiple vGPUs need to be assigned to a
>>>>>> domain and vendor driver have 'requires_group' file in type id directory.
>>>>>>   - if type id directory include 'requires_group' and user tries to
>>>>>> assign multiple vGPUs to a domain without having <group> field in XML,
>>>>>> it will create single vGPU.    
>>>>>
>>>>> We never finished our discussion of how this gets implemented or
>>>>> whether a group applies only to devices from the same vendor, same
>>>>> device, how heterogeneous groups are handled, etc.
>>>>>     
>>>>
>>>> We agreed on above points that I mentioned here and we wish to know what
>>>> libvirt folks think, right?
>>>> Since there were no inputs on that thread I thought I should summarize
>>>> what had been discussed and get libvirt experts thoughts on this.  
>>>
>>> No, that was just an idea that was never full defined.  We had that
>>> idea, the IOMMU group idea, the invalid container idea...  This is
>>> still an open question in the design.
>>>   
>>
>> We do want to close down on the design as soon as possible.
>> I had tried to address open questions in earlier discussion also. Let me
>> address your concerns you have:
>>
>>> whether a group applies only to devices from the same vendor,  
>>
>> Yes.
> 
> So should we call this a vendor_group?  How does libvirt learn the
> vendor?  We can't assume that the parent device will always be a PCI
> device.  

In sysfs, devices, even other than PCI devices keeps link to its owner
driver. We can make use of that to find the owner driver.

> Maybe it's actually a driver_group and libvirt should put any
> mdev devices with the same ->parent->driver into the same group.
> 

Yes.

> If these groups are managed via the vendor driver, we don't really need
> a global namespace of group ids, isn't it per vendor driver?  Maybe the
> group should be a uuid rather than an integer to effectively remove the
> collision/uniqueness problem.
>  

Ok, UUID is also fine. It should be unique.


>>> same device,  
>>
>> Group can be of mdev devices from different supported types and from
>> different physical device but should be from same vendor.
> 
> I think we're just reinforcing that this is a driver-group.  Another
> vendor might require a device-group, where devices must all be attached
> to the same parent device.  "group" is too generic.
> 
>>> how heterogeneous groups are handled,  
>>
>> 'heterogeneous' does that mean device grouped together with different
>> vendors?
> 
> Different vendors or different type-ids.  I think you've clarified both
> of these, but it certainly wasn't specified previously.
>  
>>>>>> 'params' is optional field. User should set this field if extra
>>>>>> parameters need to be set for a particular vGPU device. Libvirt don't
>>>>>> need to parse these params. These are meant for vendor driver.    
>>>>>
>>>>> ie. magic black hole.  nope.
>>>>>     
>>>>>> Libvirt need to follow the sequence to create device:
>>>>>> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
>>>>>> then only proceed else fail.
>>>>>>
>>>>>> * Set extra params if 'params' field exist in device XML and 'params'
>>>>>> file exist in type id directory
>>>>>>
>>>>>>     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params
>>>>>>
>>>>>> * Autogenerate UUID
>>>>>> * Create device:
>>>>>>
>>>>>>     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
>>>>>>
>>>>>>     where <group> is optional. Group should be unique number among all
>>>>>> the groups created for mdev devices.    
>>>>>
>>>>> Is it an integer, a UUID, a string?  How is the group for an mdev
>>>>> device discovered?  Can it be changed later?
>>>>>     
>>>>
>>>> Group number could be integer, that's where we need input from libvirt
>>>> folks. It should be provided at the time of mdev device creation. It
>>>> can't be changed after that. If user want to change group number
>>>> destroy/remove mdev device and create it again with proper group number.  
>>>
>>> My concern is that a type id seems arbitrary but we're specifying that
>>> it be unique.  We already have something unique, the name.  So why try
>>> to make the type id unique as well?  A vendor can accidentally create
>>> their vendor driver so that a given name means something very
>>> specific.  On the other hand they need to be extremely deliberate to
>>> coordinate that a type id means a unique thing across all their product
>>> lines.
>>>   
>>
>> Let me clarify, type id should be unique in the list of
>> mdev_supported_types. You can't have 2 directories in with same name.
> 
> Of course, but does that mean it's only unique to the machine I'm
> currently running on?  Let's say I have a Tesla P100 on my system and
> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> new card still going to be a "GRID-M60-0B"?  If not then we've based
> our XML on the wrong attribute.  If the new device does not support
> "GRID-M60-0B" then we should generate an error, not simply initialize
> whatever type-id 11 happens to be on this new card.
> 

If there are 2 M60 in the system then you would find '11' type directory
in mdev_supported_types of both M60. If you have P100, '11' type would
not be there in its mdev_supported_types, it will have different types.

For example, if you replace M60 with P100, but XML is not updated. XML
have type '11'. When libvirt would try to create mdev device, libvirt
would have to find 'create' file in sysfs in following directory format:

 --- mdev_supported_types
     |-- 11
     |   |-- create

but now for P100, '11' directory is not there, so libvirt should throw
error on not able to find '11' directory.


> Again, we've defined that name is unique, so why should type-id be
> anything other than arbitrary?  It relates only to the current instance
> on the machine.
>  
>> I think here 'name' is the problem and it is not used in xml file.
>> Lets not make it mandatory. But vendor driver can have it which 3rd
>> party can use to read.
>> Lets remove 'id' from type id in XML if that is the concern. Supported
>> types is going to be defined by vendor driver, so let vendor driver
>> decide what to use for directory name and same should be used in device
>> xml file, it could be '11' or "GRID M60-0B":
>>
>>     <device>
>>       <name>my-vgpu</name>
>>       <parent>pci_0000_86_00_0</parent>
>>       <capability type='mdev'>
>>         <type='11'/>
>>         ...
>>       </capability>
>>     </device>
>>
>>
>>>>>>
>>>>>> 7. Hot-plug
>>>>>>
>>>>>> It is same syntax to create a virtual device for hot-plug.    
>>>>>
>>>>> How do groups work with hotplug?  Can a device be creating into an
>>>>> existing, running group?  Can a device be removed from an existing,
>>>>> running group?  Thanks,
>>>>>     
>>>>
>>>> Group is for vendor driver to decide when to commit resources for a
>>>> group of devices to support multiple devices in a domain. It will be
>>>> upto vendor driver how it manage it.  
>>>
>>> Then it's not sufficiently specified for libvirt to use.  Thanks,
>>>   
>>
>> Can you clarify what if required for libvirt to use?
> 
> Everything we're discussing above.  We can't simply say that group
> management is defined by the vendor driver, that means that libvirt
> needs to know how to uniquely behave for each vendor driver.  We want
> to create a consistent framework where libvirt doesn't care what the
> vendor driver is.  Therefore we need to make group manipulation fully
> specified, regardless of the vendor driver.  Thanks,
>

There is no communication between different vendor drivers. So if
libvirt creates a group with one device from one vendor and one device
from another vendor, both vendor driver would think they have a single
device and accordingly commit resources for that single device from its
own driver. Ideally nothing would fail. But that is not the main aim of
grouping, right?

To make this independent of vendor driver, we had initial proposal of
having same UUID and different instance numbers. But that was also not
acceptable.

Thanks,
Kirti

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-21 18:34           ` Kirti Wankhede
@ 2016-09-21 19:03             ` Alex Williamson
  2016-09-22  4:11               ` Kirti Wankhede
  2016-09-29 14:22               ` [Qemu-devel] " Kirti Wankhede
  0 siblings, 2 replies; 62+ messages in thread
From: Alex Williamson @ 2016-09-21 19:03 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann

On Thu, 22 Sep 2016 00:04:14 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 9/20/2016 10:20 PM, Alex Williamson wrote:
> > On Tue, 20 Sep 2016 21:53:16 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> >> On 9/20/2016 8:13 PM, Alex Williamson wrote:  
> >>> On Tue, 20 Sep 2016 19:51:58 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>     
> >>>> On 9/20/2016 3:06 AM, Alex Williamson wrote:    
> >>>>> On Tue, 20 Sep 2016 02:05:52 +0530
> >>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>>>       
> >>>>>> Hi libvirt experts,
> >>>>>>
> >>>>>>
> >>>>>> 'create':
> >>>>>>     Write-only file. Mandatory.
> >>>>>>     Accepts string to create mediated device.
> >>>>>>
> >>>>>> 'name':
> >>>>>>     Read-Only file. Mandatory.
> >>>>>>     Returns string, the name of that type id.
> >>>>>>
> >>>>>> 'fb_length':
> >>>>>>     Read-only file. Mandatory.
> >>>>>>     Returns <number>{K,M,G}, size of framebuffer.      
> >>>>>
> >>>>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> >>>>> just one user of mediated devices.
> >>>>>       
> >>>>
> >>>> As per our discussion in BOF, we would define class and its attributes.
> >>>> As I mentioned above these are attributes of "display" class.    
> >>>
> >>> Just as I didn't know here, how does libvirt know the class of a given
> >>> type id?
> >>>      
> >>
> >> Yes, we need some way to identify the class as Daniel mentioned in his
> >> suggestion. Add a file 'class' in mdev_supported_types that would return
> >> the class.
> >>
> >>  
> >>>>>  Can all parameters be changed dynamically?      
> >>>>
> >>>> Parameters here would be extra parameters which are not listed above in
> >>>> mandatory list. If such parameters are required to set, these parameters
> >>>> should be set per mdev device before VM is booted.
> >>>>    
> >>>>>  Do parameter changes apply to existing devices or only future devices?      
> >>>>
> >>>> No, it should be applied at the time when mdev device is created or
> >>>> after mdev device is created but before VM is booted. It will not be
> >>>> applicable to existing devices.
> >>>>    
> >>>>>  This is a poorly specified
> >>>>> interface.  I'd prefer this be done via module options on the vendor
> >>>>> driver.
> >>>>>       
> >>>>
> >>>> Module option is not suitable here, in that case such parameters would
> >>>> be applied to all mdev device created for vendor driver and that is not
> >>>> what we want to.    
> >>>
> >>> Then use attributes on the mdev device itself, this is not an
> >>> acceptable interface.
> >>>     
> >>>>>>
> >>>>>> With above example, vGPU device XML would look like:
> >>>>>>
> >>>>>>    <device>
> >>>>>>      <name>my-vgpu</name>
> >>>>>>      <parent>pci_0000_86_00_0</parent>
> >>>>>>      <capability type='mdev'>
> >>>>>>        <type id='11'/>
> >>>>>>        <group>1</group>
> >>>>>>        <params>'frame_rate_limiter=0'</params>
> >>>>>>      </capability>
> >>>>>>    </device>
> >>>>>>
> >>>>>> 'type id' is mandatory.      
> >>>>>
> >>>>> I was under the impression that the vendor supplied "name" was the one
> >>>>> unique identifier.  We're certainly not going to create a registrar to
> >>>>> hand out type ids to each vendor, so what makes type id unique?      
> >>>>
> >>>> Type id is unique, 'name' is the name (string) of device that would be
> >>>> displayed in guest OS for that type of mdev device.    
> >>>
> >>> We were told at the BoF that name was the unique string which would be
> >>> consistent and you again indicate below that "GRID-M60-0B" has a fixed
> >>> set of attributes, so why do we care what type id is associated with
> >>> that name?
> >>>      
> >>>>>  I have
> >>>>> a hard time believing that a given vendor can even allocate unique type
> >>>>> ids for their own devices.  Unique type id across vendors is not
> >>>>> practical.  So which attribute are we actually using to identify which
> >>>>> type of mdev device we need and why would we ever specify additional
> >>>>> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
> >>>>> M60-0B" has a fixed setup of those attributes?
> >>>>>       
> >>>>
> >>>> Specifying attributes here is not our requirement. Yes we have fixed set
> >>>> of attributes for "GRID-M60-0B" and on.
> >>>> We are defining the attributed here for "display" class for all other
> >>>> vendor of gpu can use.
> >>>>
> >>>>    
> >>>>>> 'group' is optional. It should be a unique number in the system among
> >>>>>> all the groups created for mdev devices. Its usage is:
> >>>>>>   - not needed if single vGPU device is being assigned to a domain.
> >>>>>>   - only need to be set if multiple vGPUs need to be assigned to a
> >>>>>> domain and vendor driver have 'requires_group' file in type id directory.
> >>>>>>   - if type id directory include 'requires_group' and user tries to
> >>>>>> assign multiple vGPUs to a domain without having <group> field in XML,
> >>>>>> it will create single vGPU.      
> >>>>>
> >>>>> We never finished our discussion of how this gets implemented or
> >>>>> whether a group applies only to devices from the same vendor, same
> >>>>> device, how heterogeneous groups are handled, etc.
> >>>>>       
> >>>>
> >>>> We agreed on above points that I mentioned here and we wish to know what
> >>>> libvirt folks think, right?
> >>>> Since there were no inputs on that thread I thought I should summarize
> >>>> what had been discussed and get libvirt experts thoughts on this.    
> >>>
> >>> No, that was just an idea that was never full defined.  We had that
> >>> idea, the IOMMU group idea, the invalid container idea...  This is
> >>> still an open question in the design.
> >>>     
> >>
> >> We do want to close down on the design as soon as possible.
> >> I had tried to address open questions in earlier discussion also. Let me
> >> address your concerns you have:
> >>  
> >>> whether a group applies only to devices from the same vendor,    
> >>
> >> Yes.  
> > 
> > So should we call this a vendor_group?  How does libvirt learn the
> > vendor?  We can't assume that the parent device will always be a PCI
> > device.    
> 
> In sysfs, devices, even other than PCI devices keeps link to its owner
> driver. We can make use of that to find the owner driver.
> 
> > Maybe it's actually a driver_group and libvirt should put any
> > mdev devices with the same ->parent->driver into the same group.
> >   
> 
> Yes.
> 
> > If these groups are managed via the vendor driver, we don't really need
> > a global namespace of group ids, isn't it per vendor driver?  Maybe the
> > group should be a uuid rather than an integer to effectively remove the
> > collision/uniqueness problem.
> >    
> 
> Ok, UUID is also fine. It should be unique.
> 
> 
> >>> same device,    
> >>
> >> Group can be of mdev devices from different supported types and from
> >> different physical device but should be from same vendor.  
> > 
> > I think we're just reinforcing that this is a driver-group.  Another
> > vendor might require a device-group, where devices must all be attached
> > to the same parent device.  "group" is too generic.
> >   
> >>> how heterogeneous groups are handled,    
> >>
> >> 'heterogeneous' does that mean device grouped together with different
> >> vendors?  
> > 
> > Different vendors or different type-ids.  I think you've clarified both
> > of these, but it certainly wasn't specified previously.
> >    
> >>>>>> 'params' is optional field. User should set this field if extra
> >>>>>> parameters need to be set for a particular vGPU device. Libvirt don't
> >>>>>> need to parse these params. These are meant for vendor driver.      
> >>>>>
> >>>>> ie. magic black hole.  nope.
> >>>>>       
> >>>>>> Libvirt need to follow the sequence to create device:
> >>>>>> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
> >>>>>> then only proceed else fail.
> >>>>>>
> >>>>>> * Set extra params if 'params' field exist in device XML and 'params'
> >>>>>> file exist in type id directory
> >>>>>>
> >>>>>>     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params
> >>>>>>
> >>>>>> * Autogenerate UUID
> >>>>>> * Create device:
> >>>>>>
> >>>>>>     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
> >>>>>>
> >>>>>>     where <group> is optional. Group should be unique number among all
> >>>>>> the groups created for mdev devices.      
> >>>>>
> >>>>> Is it an integer, a UUID, a string?  How is the group for an mdev
> >>>>> device discovered?  Can it be changed later?
> >>>>>       
> >>>>
> >>>> Group number could be integer, that's where we need input from libvirt
> >>>> folks. It should be provided at the time of mdev device creation. It
> >>>> can't be changed after that. If user want to change group number
> >>>> destroy/remove mdev device and create it again with proper group number.    
> >>>
> >>> My concern is that a type id seems arbitrary but we're specifying that
> >>> it be unique.  We already have something unique, the name.  So why try
> >>> to make the type id unique as well?  A vendor can accidentally create
> >>> their vendor driver so that a given name means something very
> >>> specific.  On the other hand they need to be extremely deliberate to
> >>> coordinate that a type id means a unique thing across all their product
> >>> lines.
> >>>     
> >>
> >> Let me clarify, type id should be unique in the list of
> >> mdev_supported_types. You can't have 2 directories in with same name.  
> > 
> > Of course, but does that mean it's only unique to the machine I'm
> > currently running on?  Let's say I have a Tesla P100 on my system and
> > type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > new card still going to be a "GRID-M60-0B"?  If not then we've based
> > our XML on the wrong attribute.  If the new device does not support
> > "GRID-M60-0B" then we should generate an error, not simply initialize
> > whatever type-id 11 happens to be on this new card.
> >   
> 
> If there are 2 M60 in the system then you would find '11' type directory
> in mdev_supported_types of both M60. If you have P100, '11' type would
> not be there in its mdev_supported_types, it will have different types.
> 
> For example, if you replace M60 with P100, but XML is not updated. XML
> have type '11'. When libvirt would try to create mdev device, libvirt
> would have to find 'create' file in sysfs in following directory format:
> 
>  --- mdev_supported_types
>      |-- 11
>      |   |-- create
> 
> but now for P100, '11' directory is not there, so libvirt should throw
> error on not able to find '11' directory.

This really seems like an accident waiting to happen.  What happens
when the user replaces their M60 with an Intel XYZ device that happens
to expose a type 11 mdev class gpu device?  How is libvirt supposed to
know that the XML used to refer to a GRID-M60-0B and now it's an
INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
yet another arbitrary requirement that we have some sort of globally
unique type-id database make a lot of sense?  The same issue applies
for simple debug-ability, if I'm reviewing the XML for a domain and the
name is the primary index for the mdev device, I know what it is.
Seeing type-id='11' is meaningless.
 
> > Again, we've defined that name is unique, so why should type-id be
> > anything other than arbitrary?  It relates only to the current instance
> > on the machine.
> >    
> >> I think here 'name' is the problem and it is not used in xml file.
> >> Lets not make it mandatory. But vendor driver can have it which 3rd
> >> party can use to read.
> >> Lets remove 'id' from type id in XML if that is the concern. Supported
> >> types is going to be defined by vendor driver, so let vendor driver
> >> decide what to use for directory name and same should be used in device
> >> xml file, it could be '11' or "GRID M60-0B":
> >>
> >>     <device>
> >>       <name>my-vgpu</name>
> >>       <parent>pci_0000_86_00_0</parent>
> >>       <capability type='mdev'>
> >>         <type='11'/>
> >>         ...
> >>       </capability>
> >>     </device>
> >>
> >>  
> >>>>>>
> >>>>>> 7. Hot-plug
> >>>>>>
> >>>>>> It is same syntax to create a virtual device for hot-plug.      
> >>>>>
> >>>>> How do groups work with hotplug?  Can a device be creating into an
> >>>>> existing, running group?  Can a device be removed from an existing,
> >>>>> running group?  Thanks,
> >>>>>       
> >>>>
> >>>> Group is for vendor driver to decide when to commit resources for a
> >>>> group of devices to support multiple devices in a domain. It will be
> >>>> upto vendor driver how it manage it.    
> >>>
> >>> Then it's not sufficiently specified for libvirt to use.  Thanks,
> >>>     
> >>
> >> Can you clarify what if required for libvirt to use?  
> > 
> > Everything we're discussing above.  We can't simply say that group
> > management is defined by the vendor driver, that means that libvirt
> > needs to know how to uniquely behave for each vendor driver.  We want
> > to create a consistent framework where libvirt doesn't care what the
> > vendor driver is.  Therefore we need to make group manipulation fully
> > specified, regardless of the vendor driver.  Thanks,
> >  
> 
> There is no communication between different vendor drivers. So if
> libvirt creates a group with one device from one vendor and one device
> from another vendor, both vendor driver would think they have a single
> device and accordingly commit resources for that single device from its
> own driver. Ideally nothing would fail. But that is not the main aim of
> grouping, right?
> 
> To make this independent of vendor driver, we had initial proposal of
> having same UUID and different instance numbers. But that was also not
> acceptable.

We're defining the infrastructure, we get to define the rules, but we
need to define all the rules so that libvirt can actually implement to
them.  We can't say "groups are this nebulous thing and each vendor
will manage them however they want", nobody wants to care that NVIDIA
did it one way and Intel did it another and we need to support both.
That's a design failure.  Therefore we \must\ make the group definition
we need well specified, any hopefully generic enough that others can
use it.  If they cannot then we'd need to add a new type of group.
Therefore we need to \specify\ things like for what set of mdev should
a group be created?  Do I need a group for a single device?  How do we
introspect a group to know which mdevs it contains?  Can we dynamically
add or remove mdev devices from a group?  Can we attach multiple
groups to the same user?  Can we change the association of an mdev
device without removing it and recreating it? etc, etc, etc.  We don't
need vendor coordination, we need a specification of exactly how this
type of group is meant to work. Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-21  4:36       ` Alex Williamson
@ 2016-09-22  2:33         ` Tian, Kevin
  2016-09-22  3:01           ` Alex Williamson
       [not found]         ` <AADFC41AFE54684AB9EE6CBC0274A5D18DF86F5F@SHSMSX101.ccr.corp.intel.com>
  1 sibling, 1 reply; 62+ messages in thread
From: Tian, Kevin @ 2016-09-22  2:33 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Kirti Wankhede, libvir-list, qemu-devel, Paolo Bonzini, Neo Jia,
	Andy Currid, Song, Jike, bjsdjshi, Gerd Hoffmann

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, September 21, 2016 12:37 PM
> 
> On Wed, 21 Sep 2016 03:56:21 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> > > Sent: Tuesday, September 20, 2016 10:22 PM
> > > >>
> > > >> 'max_instance':
> > > >>     Read-Only file. Mandatory.
> > > >>     Returns integer.  Returns maximum mdev device could be created
> > > >> at the moment when this file is read. This count would be updated by
> > > >> vendor driver. Before creating mdev device of this type, check if
> > > >> max_instance is > 0.
> > > >
> > > > Didn't we discuss this being being something like "available_instances"
> > > > with a "devices" sub-directory linking current devices of that type?
> > > >
> > >
> > > I'm ok with 'available_instances' as name of this file.
> > > Yes, mdev device directory will have links to devices of that type. I
> > > think that is part of mdev core module discussion. I'm trying to list
> > > sysfs files for libvirt interface here to focus more on libvirt
> > > interface. Hence didn't add that. I'll make sure to add all such details
> > > in future.
> > >
> > >
> >
> > Definitely you need provide those details since they also contain
> > libvirt interface, e.g. 'destroy' which we discussed should be in
> > mdev directory.
> 
> $ find /sys/devices -name remove | wc -l
> 24
> $ find /sys/devices -name destroy | wc -l
> 0
> 
> 'remove' is the sysfs deconstructor we should use.

make sense.

> 
> > Another example is class-specific attributes such
> > as 'resolution', etc. which you listed under 'display class'. All those
> > attributes should be moved to mdev directory. Only class ID is
> > required under each type.
> 
> Can you elaborate on what you're proposing here?  If we don't have
> attributes like 'resolution' under the type-id then we can't describe
> to the user the features of the mdev before we create it.  I don't
> think anybody wants a 'create it and find out' type of interface.
> Thanks,
> 

I think such way would be racy. What about two clients creating mdev 
devices simultaneously? How to guarantee their configuration of class
specific attributes not mutual-impacted since before creation any such
configuration would be global under that type?

My feeling is that we'd better keep create simple - just write a UUID
to "type-id/create". Any class-specific attribute, if we really want to
support, should be able to be individually configured with required
resource allocated incrementally on top of basic resources allocated 
at create stage. Then libvirt can set those attributes between create
and open a mdev device. If this direction is accepted, then naturally
such attributes should be put under mdev directory. Possibly we 
don't need a class description under type-id. libvirt just checks directly
whether any known class represented in each mdev directory (vendor
driver will expose it on demand), and then operate attributes under
that class.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-21  4:43           ` Alex Williamson
@ 2016-09-22  2:43             ` Tian, Kevin
  0 siblings, 0 replies; 62+ messages in thread
From: Tian, Kevin @ 2016-09-22  2:43 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Kirti Wankhede, libvir-list, qemu-devel, Paolo Bonzini, Neo Jia,
	Andy Currid, Song, Jike, bjsdjshi, Gerd Hoffmann

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Wednesday, September 21, 2016 12:43 PM
> 
> On Wed, 21 Sep 2016 04:10:53 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> > > Sent: Wednesday, September 21, 2016 12:23 AM
> > >
> > >
> > > On 9/20/2016 8:13 PM, Alex Williamson wrote:
> > > > On Tue, 20 Sep 2016 19:51:58 +0530
> > > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > >
> > > >> On 9/20/2016 3:06 AM, Alex Williamson wrote:
> > > >>> On Tue, 20 Sep 2016 02:05:52 +0530
> > > >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > >>>
> > > >>>> Hi libvirt experts,
> > > >>>>
> > > >>>>
> > > >>>> 'create':
> > > >>>>     Write-only file. Mandatory.
> > > >>>>     Accepts string to create mediated device.
> > > >>>>
> > > >>>> 'name':
> > > >>>>     Read-Only file. Mandatory.
> > > >>>>     Returns string, the name of that type id.
> > > >>>>
> > > >>>> 'fb_length':
> > > >>>>     Read-only file. Mandatory.
> > > >>>>     Returns <number>{K,M,G}, size of framebuffer.
> > > >>>
> > > >>> This can't be mandatory, it's only relevant to vGPU devices, vGPUs are
> > > >>> just one user of mediated devices.
> > > >>>
> > > >>
> > > >> As per our discussion in BOF, we would define class and its attributes.
> > > >> As I mentioned above these are attributes of "display" class.
> > > >
> > > > Just as I didn't know here, how does libvirt know the class of a given
> > > > type id?
> > > >
> > >
> > > Yes, we need some way to identify the class as Daniel mentioned in his
> > > suggestion. Add a file 'class' in mdev_supported_types that would return
> > > the class.
> > >
> > >
> >
> > 'display' class or 'gpu' class? display represents only part of GPU
> > functionalities. I assume you want to provide an unique class ID
> > for each type, instead of allowing multiple classes each identifying
> > a subset of controllable GPU resources (e.g. 'display', 'render', etc.)...
> 
> Not sure where you're going with this, we're not creating a programming
> interface for the device, that exists through the vfio API.  I believe
> the intention here is simply a user level classification for the
> purpose of being able to interpret the attributes provided in a logical
> way.

I'm not against that purpose.

My main intention is that 'display' is not an accurate classification here.
If we allow classification in 'display' level, it implies more finer-grained
classifications may be further defined upon which vendor specific GPU 
resources is allowed to be configured, which might be an overkill. Here 
we just need a general classification like 'gpu' to cover all possible 
vendor-agnostic attributes beyond display.

> 
> > for unique class ID, I once considered whether it could be directly
> > inherited from class of parent device, since typically a vendor driver
> > creates mdev devices using the same type as physical device. But later
> > I realized one potential value of keep a class field for mdev types,
> > e.g. if we want to extend mdev to FPGA which could be programmed
> > as different classes of functionalities. :-)
> 
> And how do we generically determine the class of a parent device?  We
> cannot assume the parent device is PCI.  Thanks,
> 

Yes, this is also a good point.

Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
       [not found]         ` <AADFC41AFE54684AB9EE6CBC0274A5D18DF86F5F@SHSMSX101.ccr.corp.intel.com>
@ 2016-09-22  2:59           ` Tian, Kevin
  0 siblings, 0 replies; 62+ messages in thread
From: Tian, Kevin @ 2016-09-22  2:59 UTC (permalink / raw)
  To: 'Alex Williamson'
  Cc: Kirti Wankhede, libvir-list, qemu-devel, Paolo Bonzini, Neo Jia,
	Andy Currid, Song, Jike, bjsdjshi, Gerd Hoffmann

> From: Tian, Kevin
> Sent: Thursday, September 22, 2016 10:33 AM
> 
> >
> > > Another example is class-specific attributes such
> > > as 'resolution', etc. which you listed under 'display class'. All those
> > > attributes should be moved to mdev directory. Only class ID is
> > > required under each type.
> >
> > Can you elaborate on what you're proposing here?  If we don't have
> > attributes like 'resolution' under the type-id then we can't describe
> > to the user the features of the mdev before we create it.  I don't
> > think anybody wants a 'create it and find out' type of interface.
> > Thanks,
> >
> 
> I think such way would be racy. What about two clients creating mdev
> devices simultaneously? How to guarantee their configuration of class
> specific attributes not mutual-impacted since before creation any such
> configuration would be global under that type?
> 
> My feeling is that we'd better keep create simple - just write a UUID
> to "type-id/create". Any class-specific attribute, if we really want to
> support, should be able to be individually configured with required
> resource allocated incrementally on top of basic resources allocated
> at create stage. Then libvirt can set those attributes between create
> and open a mdev device. If this direction is accepted, then naturally
> such attributes should be put under mdev directory. Possibly we
> don't need a class description under type-id. libvirt just checks directly
> whether any known class represented in each mdev directory (vendor
> driver will expose it on demand), and then operate attributes under
> that class.
> 

Have a better understanding of your concern now. User needs to know
and set a list of attributes before requesting to create a new mdev
device. From this angle all attributes should be enumerated per type-id.
But as I explained above, writing multiple sysfs nodes to create a
mdev device is racy. We either need a way to guarantee transactional
operations on those nodes, or having type-id directory to only expose
supported attributes while programming those attributes has to be 
through per-mdev directory after mdev device is created...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-22  2:33         ` Tian, Kevin
@ 2016-09-22  3:01           ` Alex Williamson
  2016-09-22  3:42             ` Tian, Kevin
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Williamson @ 2016-09-22  3:01 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Kirti Wankhede, libvir-list, qemu-devel, Paolo Bonzini, Neo Jia,
	Andy Currid, Song, Jike, bjsdjshi, Gerd Hoffmann

On Thu, 22 Sep 2016 02:33:01 +0000
"Tian, Kevin" <kevin.tian@intel.com> wrote:

> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Wednesday, September 21, 2016 12:37 PM
> > 
> > On Wed, 21 Sep 2016 03:56:21 +0000
> > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> >   
> > > > From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> > > > Sent: Tuesday, September 20, 2016 10:22 PM  
> > > > >>
> > > > >> 'max_instance':
> > > > >>     Read-Only file. Mandatory.
> > > > >>     Returns integer.  Returns maximum mdev device could be created
> > > > >> at the moment when this file is read. This count would be updated by
> > > > >> vendor driver. Before creating mdev device of this type, check if
> > > > >> max_instance is > 0.  
> > > > >
> > > > > Didn't we discuss this being being something like "available_instances"
> > > > > with a "devices" sub-directory linking current devices of that type?
> > > > >  
> > > >
> > > > I'm ok with 'available_instances' as name of this file.
> > > > Yes, mdev device directory will have links to devices of that type. I
> > > > think that is part of mdev core module discussion. I'm trying to list
> > > > sysfs files for libvirt interface here to focus more on libvirt
> > > > interface. Hence didn't add that. I'll make sure to add all such details
> > > > in future.
> > > >
> > > >  
> > >
> > > Definitely you need provide those details since they also contain
> > > libvirt interface, e.g. 'destroy' which we discussed should be in
> > > mdev directory.  
> > 
> > $ find /sys/devices -name remove | wc -l
> > 24
> > $ find /sys/devices -name destroy | wc -l
> > 0
> > 
> > 'remove' is the sysfs deconstructor we should use.  
> 
> make sense.
> 
> >   
> > > Another example is class-specific attributes such
> > > as 'resolution', etc. which you listed under 'display class'. All those
> > > attributes should be moved to mdev directory. Only class ID is
> > > required under each type.  
> > 
> > Can you elaborate on what you're proposing here?  If we don't have
> > attributes like 'resolution' under the type-id then we can't describe
> > to the user the features of the mdev before we create it.  I don't
> > think anybody wants a 'create it and find out' type of interface.
> > Thanks,
> >   
> 
> I think such way would be racy. What about two clients creating mdev 
> devices simultaneously? How to guarantee their configuration of class
> specific attributes not mutual-impacted since before creation any such
> configuration would be global under that type?

A simple mutex in the sysfs store attribute to serialize, return write
error if insufficient resources to create additional devices...  Easy.
Let's not exaggerate the issue.
 
> My feeling is that we'd better keep create simple - just write a UUID
> to "type-id/create". Any class-specific attribute, if we really want to
> support, should be able to be individually configured with required
> resource allocated incrementally on top of basic resources allocated 
> at create stage. Then libvirt can set those attributes between create
> and open a mdev device. If this direction is accepted, then naturally
> such attributes should be put under mdev directory. Possibly we 
> don't need a class description under type-id. libvirt just checks directly
> whether any known class represented in each mdev directory (vendor
> driver will expose it on demand), and then operate attributes under
> that class.

It seems like this just moves the hypothetical race to the point where
we're adjusting individual attributes for the mdev device, but now the
problem is worse because there's no simple locking or serialization
that guarantees we can get all the attributes we want.  We might get
half the attributes for one device and half the attributes for another
and not be able to complete initialization of either.  I think we also
lose a lot of our ability to expose pre-defined configurations to the
user and the vendor's test matrix explodes.  We also can't provide any
sort of reasonable estimate of how many devices can be created because
we don't know the resource requirements of each device.  I don't like
this idea at all, sorry.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-22  3:01           ` Alex Williamson
@ 2016-09-22  3:42             ` Tian, Kevin
  0 siblings, 0 replies; 62+ messages in thread
From: Tian, Kevin @ 2016-09-22  3:42 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Kirti Wankhede, libvir-list, qemu-devel, Paolo Bonzini, Neo Jia,
	Andy Currid, Song, Jike, bjsdjshi, Gerd Hoffmann

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Thursday, September 22, 2016 11:02 AM
> 
> On Thu, 22 Sep 2016 02:33:01 +0000
> "Tian, Kevin" <kevin.tian@intel.com> wrote:
> 
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Wednesday, September 21, 2016 12:37 PM
> > >
> > > On Wed, 21 Sep 2016 03:56:21 +0000
> > > "Tian, Kevin" <kevin.tian@intel.com> wrote:
> > >
> > > > > From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> > > > > Sent: Tuesday, September 20, 2016 10:22 PM
> > > > > >>
> > > > > >> 'max_instance':
> > > > > >>     Read-Only file. Mandatory.
> > > > > >>     Returns integer.  Returns maximum mdev device could be created
> > > > > >> at the moment when this file is read. This count would be updated by
> > > > > >> vendor driver. Before creating mdev device of this type, check if
> > > > > >> max_instance is > 0.
> > > > > >
> > > > > > Didn't we discuss this being being something like "available_instances"
> > > > > > with a "devices" sub-directory linking current devices of that type?
> > > > > >
> > > > >
> > > > > I'm ok with 'available_instances' as name of this file.
> > > > > Yes, mdev device directory will have links to devices of that type. I
> > > > > think that is part of mdev core module discussion. I'm trying to list
> > > > > sysfs files for libvirt interface here to focus more on libvirt
> > > > > interface. Hence didn't add that. I'll make sure to add all such details
> > > > > in future.
> > > > >
> > > > >
> > > >
> > > > Definitely you need provide those details since they also contain
> > > > libvirt interface, e.g. 'destroy' which we discussed should be in
> > > > mdev directory.
> > >
> > > $ find /sys/devices -name remove | wc -l
> > > 24
> > > $ find /sys/devices -name destroy | wc -l
> > > 0
> > >
> > > 'remove' is the sysfs deconstructor we should use.
> >
> > make sense.
> >
> > >
> > > > Another example is class-specific attributes such
> > > > as 'resolution', etc. which you listed under 'display class'. All those
> > > > attributes should be moved to mdev directory. Only class ID is
> > > > required under each type.
> > >
> > > Can you elaborate on what you're proposing here?  If we don't have
> > > attributes like 'resolution' under the type-id then we can't describe
> > > to the user the features of the mdev before we create it.  I don't
> > > think anybody wants a 'create it and find out' type of interface.
> > > Thanks,
> > >
> >
> > I think such way would be racy. What about two clients creating mdev
> > devices simultaneously? How to guarantee their configuration of class
> > specific attributes not mutual-impacted since before creation any such
> > configuration would be global under that type?
> 
> A simple mutex in the sysfs store attribute to serialize, return write
> error if insufficient resources to create additional devices...  Easy.
> Let's not exaggerate the issue.
> 
> > My feeling is that we'd better keep create simple - just write a UUID
> > to "type-id/create". Any class-specific attribute, if we really want to
> > support, should be able to be individually configured with required
> > resource allocated incrementally on top of basic resources allocated
> > at create stage. Then libvirt can set those attributes between create
> > and open a mdev device. If this direction is accepted, then naturally
> > such attributes should be put under mdev directory. Possibly we
> > don't need a class description under type-id. libvirt just checks directly
> > whether any known class represented in each mdev directory (vendor
> > driver will expose it on demand), and then operate attributes under
> > that class.
> 
> It seems like this just moves the hypothetical race to the point where
> we're adjusting individual attributes for the mdev device, but now the
> problem is worse because there's no simple locking or serialization
> that guarantees we can get all the attributes we want.  We might get
> half the attributes for one device and half the attributes for another
> and not be able to complete initialization of either.  I think we also
> lose a lot of our ability to expose pre-defined configurations to the
> user and the vendor's test matrix explodes.  We also can't provide any
> sort of reasonable estimate of how many devices can be created because
> we don't know the resource requirements of each device.  I don't like
> this idea at all, sorry.  Thanks,
> 

Yes, it's a valid concern. Alex, curious about your opinion of supporting
some device-agnostic optional attributes. One example is to set QoS
parameters (priority, weight, etc.), which may be enabled by vendor
driver to allow dynamic adjustment even after a mdev device is created.
Do we still put a QoS class under type-id, meaning must-be-configured 
before creation (not a bad limitation since from SLA p.o.v all QoS 
attributes should be negotiated down in advance)? Would there be any 
concern if a type-id contains multiple classes (e.g. include both 'gpu' 
class and 'qos' class)? Or not need to generalize a QoS class at all... 
just define those attributes under every possible device-specific class,
regardless of whether they are device-agnostic?

Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-21 19:03             ` Alex Williamson
@ 2016-09-22  4:11               ` Kirti Wankhede
  2016-09-22 14:19                 ` Alex Williamson
  2016-09-29 14:22               ` [Qemu-devel] " Kirti Wankhede
  1 sibling, 1 reply; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-22  4:11 UTC (permalink / raw)
  To: Alex Williamson
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann




>>>>> My concern is that a type id seems arbitrary but we're specifying that
>>>>> it be unique.  We already have something unique, the name.  So why try
>>>>> to make the type id unique as well?  A vendor can accidentally create
>>>>> their vendor driver so that a given name means something very
>>>>> specific.  On the other hand they need to be extremely deliberate to
>>>>> coordinate that a type id means a unique thing across all their product
>>>>> lines.
>>>>>     
>>>>
>>>> Let me clarify, type id should be unique in the list of
>>>> mdev_supported_types. You can't have 2 directories in with same name.  
>>>
>>> Of course, but does that mean it's only unique to the machine I'm
>>> currently running on?  Let's say I have a Tesla P100 on my system and
>>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
>>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
>>> new card still going to be a "GRID-M60-0B"?  If not then we've based
>>> our XML on the wrong attribute.  If the new device does not support
>>> "GRID-M60-0B" then we should generate an error, not simply initialize
>>> whatever type-id 11 happens to be on this new card.
>>>   
>>
>> If there are 2 M60 in the system then you would find '11' type directory
>> in mdev_supported_types of both M60. If you have P100, '11' type would
>> not be there in its mdev_supported_types, it will have different types.
>>
>> For example, if you replace M60 with P100, but XML is not updated. XML
>> have type '11'. When libvirt would try to create mdev device, libvirt
>> would have to find 'create' file in sysfs in following directory format:
>>
>>  --- mdev_supported_types
>>      |-- 11
>>      |   |-- create
>>
>> but now for P100, '11' directory is not there, so libvirt should throw
>> error on not able to find '11' directory.
> 
> This really seems like an accident waiting to happen.  What happens
> when the user replaces their M60 with an Intel XYZ device that happens
> to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> know that the XML used to refer to a GRID-M60-0B and now it's an
> INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> yet another arbitrary requirement that we have some sort of globally
> unique type-id database make a lot of sense?  The same issue applies
> for simple debug-ability, if I'm reviewing the XML for a domain and the
> name is the primary index for the mdev device, I know what it is.
> Seeing type-id='11' is meaningless.
>

Let me clarify again, type '11' is a string that vendor driver would
define (see my previous reply below) it could be "11" or "GRID-M60-0B".
If 2 vendors used same string we can't control that. right?


>>>> Lets remove 'id' from type id in XML if that is the concern. Supported
>>>> types is going to be defined by vendor driver, so let vendor driver
>>>> decide what to use for directory name and same should be used in device
>>>> xml file, it could be '11' or "GRID M60-0B":
>>>>
>>>>     <device>
>>>>       <name>my-vgpu</name>
>>>>       <parent>pci_0000_86_00_0</parent>
>>>>       <capability type='mdev'>
>>>>         <type='11'/>
>>>>         ...
>>>>       </capability>
>>>>     </device>
>>>>
>>>>  

Kirti

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-22  4:11               ` Kirti Wankhede
@ 2016-09-22 14:19                 ` Alex Williamson
  2016-09-22 14:26                   ` [Qemu-devel] [libvirt] " Daniel P. Berrange
  0 siblings, 1 reply; 62+ messages in thread
From: Alex Williamson @ 2016-09-22 14:19 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann

On Thu, 22 Sep 2016 09:41:20 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> >>>>> My concern is that a type id seems arbitrary but we're specifying that
> >>>>> it be unique.  We already have something unique, the name.  So why try
> >>>>> to make the type id unique as well?  A vendor can accidentally create
> >>>>> their vendor driver so that a given name means something very
> >>>>> specific.  On the other hand they need to be extremely deliberate to
> >>>>> coordinate that a type id means a unique thing across all their product
> >>>>> lines.
> >>>>>       
> >>>>
> >>>> Let me clarify, type id should be unique in the list of
> >>>> mdev_supported_types. You can't have 2 directories in with same name.    
> >>>
> >>> Of course, but does that mean it's only unique to the machine I'm
> >>> currently running on?  Let's say I have a Tesla P100 on my system and
> >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> >>> our XML on the wrong attribute.  If the new device does not support
> >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> >>> whatever type-id 11 happens to be on this new card.
> >>>     
> >>
> >> If there are 2 M60 in the system then you would find '11' type directory
> >> in mdev_supported_types of both M60. If you have P100, '11' type would
> >> not be there in its mdev_supported_types, it will have different types.
> >>
> >> For example, if you replace M60 with P100, but XML is not updated. XML
> >> have type '11'. When libvirt would try to create mdev device, libvirt
> >> would have to find 'create' file in sysfs in following directory format:
> >>
> >>  --- mdev_supported_types
> >>      |-- 11
> >>      |   |-- create
> >>
> >> but now for P100, '11' directory is not there, so libvirt should throw
> >> error on not able to find '11' directory.  
> > 
> > This really seems like an accident waiting to happen.  What happens
> > when the user replaces their M60 with an Intel XYZ device that happens
> > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > know that the XML used to refer to a GRID-M60-0B and now it's an
> > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > yet another arbitrary requirement that we have some sort of globally
> > unique type-id database make a lot of sense?  The same issue applies
> > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > name is the primary index for the mdev device, I know what it is.
> > Seeing type-id='11' is meaningless.
> >  
> 
> Let me clarify again, type '11' is a string that vendor driver would
> define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> If 2 vendors used same string we can't control that. right?
> 
> 
> >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> >>>> types is going to be defined by vendor driver, so let vendor driver
> >>>> decide what to use for directory name and same should be used in device
> >>>> xml file, it could be '11' or "GRID M60-0B":
> >>>>
> >>>>     <device>
> >>>>       <name>my-vgpu</name>
> >>>>       <parent>pci_0000_86_00_0</parent>
> >>>>       <capability type='mdev'>
> >>>>         <type='11'/>
> >>>>         ...
> >>>>       </capability>
> >>>>     </device>

Then let's get rid of the 'name' attribute and let the sysfs directory
simply be the name.  Then we can get rid of 'type' altogether so we
don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-22 14:19                 ` Alex Williamson
@ 2016-09-22 14:26                   ` Daniel P. Berrange
  2016-09-28 19:22                     ` Neo Jia
  0 siblings, 1 reply; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-22 14:26 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Kirti Wankhede, Andy Currid, Tian, Kevin, Neo Jia, libvir-list,
	qemu-devel, Song, Jike, Gerd Hoffmann, Paolo Bonzini, bjsdjshi

On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> On Thu, 22 Sep 2016 09:41:20 +0530
> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> 
> > >>>>> My concern is that a type id seems arbitrary but we're specifying that
> > >>>>> it be unique.  We already have something unique, the name.  So why try
> > >>>>> to make the type id unique as well?  A vendor can accidentally create
> > >>>>> their vendor driver so that a given name means something very
> > >>>>> specific.  On the other hand they need to be extremely deliberate to
> > >>>>> coordinate that a type id means a unique thing across all their product
> > >>>>> lines.
> > >>>>>       
> > >>>>
> > >>>> Let me clarify, type id should be unique in the list of
> > >>>> mdev_supported_types. You can't have 2 directories in with same name.    
> > >>>
> > >>> Of course, but does that mean it's only unique to the machine I'm
> > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > >>> our XML on the wrong attribute.  If the new device does not support
> > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > >>> whatever type-id 11 happens to be on this new card.
> > >>>     
> > >>
> > >> If there are 2 M60 in the system then you would find '11' type directory
> > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > >> not be there in its mdev_supported_types, it will have different types.
> > >>
> > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > >> would have to find 'create' file in sysfs in following directory format:
> > >>
> > >>  --- mdev_supported_types
> > >>      |-- 11
> > >>      |   |-- create
> > >>
> > >> but now for P100, '11' directory is not there, so libvirt should throw
> > >> error on not able to find '11' directory.  
> > > 
> > > This really seems like an accident waiting to happen.  What happens
> > > when the user replaces their M60 with an Intel XYZ device that happens
> > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > yet another arbitrary requirement that we have some sort of globally
> > > unique type-id database make a lot of sense?  The same issue applies
> > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > name is the primary index for the mdev device, I know what it is.
> > > Seeing type-id='11' is meaningless.
> > >  
> > 
> > Let me clarify again, type '11' is a string that vendor driver would
> > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > If 2 vendors used same string we can't control that. right?
> > 
> > 
> > >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> > >>>> types is going to be defined by vendor driver, so let vendor driver
> > >>>> decide what to use for directory name and same should be used in device
> > >>>> xml file, it could be '11' or "GRID M60-0B":
> > >>>>
> > >>>>     <device>
> > >>>>       <name>my-vgpu</name>
> > >>>>       <parent>pci_0000_86_00_0</parent>
> > >>>>       <capability type='mdev'>
> > >>>>         <type='11'/>
> > >>>>         ...
> > >>>>       </capability>
> > >>>>     </device>
> 
> Then let's get rid of the 'name' attribute and let the sysfs directory
> simply be the name.  Then we can get rid of 'type' altogether so we
> don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,

That sounds nice to me - we don't need two unique identifiers if
one will do.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20 16:23       ` Kirti Wankhede
  2016-09-20 16:50         ` Alex Williamson
  2016-09-21  4:10         ` Tian, Kevin
@ 2016-09-22 19:25         ` Tian, Kevin
  2016-09-23 18:34           ` Kirti Wankhede
  2 siblings, 1 reply; 62+ messages in thread
From: Tian, Kevin @ 2016-09-22 19:25 UTC (permalink / raw)
  To: Kirti Wankhede, Alex Williamson
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Song, Jike, bjsdjshi, Gerd Hoffmann

> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
> Sent: Wednesday, September 21, 2016 12:23 AM
> >
> >>>  I have
> >>> a hard time believing that a given vendor can even allocate unique type
> >>> ids for their own devices.  Unique type id across vendors is not
> >>> practical.  So which attribute are we actually using to identify which
> >>> type of mdev device we need and why would we ever specify additional
> >>> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
> >>> M60-0B" has a fixed setup of those attributes?
> >>>
> >>
> >> Specifying attributes here is not our requirement. Yes we have fixed set
> >> of attributes for "GRID-M60-0B" and on.
> >> We are defining the attributed here for "display" class for all other
> >> vendor of gpu can use.
> >>

Hi, Kirti, 

We decide to go with above type-based interface for KVMGT, with fixed setup 
of attributes too. If both Intel and NVIDIA solutions use such fixed manner,
can we go with a proposal which doesn't include 'class' extension for now?
Later if there is a concrete usage requiring such class-specific attribute setting,
then it's easy to extend following discussion in this thread. I'm thinking how we
can converge the discussion here into something simple enough (and extensible)
to accelerate upstreaming of both Intel/NVIDIA solutions...

Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-22 19:25         ` Tian, Kevin
@ 2016-09-23 18:34           ` Kirti Wankhede
  0 siblings, 0 replies; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-23 18:34 UTC (permalink / raw)
  To: Tian, Kevin, Alex Williamson
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Song, Jike, bjsdjshi, Gerd Hoffmann



On 9/23/2016 12:55 AM, Tian, Kevin wrote:
>> From: Kirti Wankhede [mailto:kwankhede@nvidia.com]
>> Sent: Wednesday, September 21, 2016 12:23 AM
>>>
>>>>>  I have
>>>>> a hard time believing that a given vendor can even allocate unique type
>>>>> ids for their own devices.  Unique type id across vendors is not
>>>>> practical.  So which attribute are we actually using to identify which
>>>>> type of mdev device we need and why would we ever specify additional
>>>>> attributes like fb_length?  Doesn't the vendor guarantee that "GRID
>>>>> M60-0B" has a fixed setup of those attributes?
>>>>>
>>>>
>>>> Specifying attributes here is not our requirement. Yes we have fixed set
>>>> of attributes for "GRID-M60-0B" and on.
>>>> We are defining the attributed here for "display" class for all other
>>>> vendor of gpu can use.
>>>>
> 
> Hi, Kirti, 
> 
> We decide to go with above type-based interface for KVMGT, with fixed setup 
> of attributes too. If both Intel and NVIDIA solutions use such fixed manner,
> can we go with a proposal which doesn't include 'class' extension for now?
> Later if there is a concrete usage requiring such class-specific attribute setting,
> then it's easy to extend following discussion in this thread. I'm thinking how we
> can converge the discussion here into something simple enough (and extensible)
> to accelerate upstreaming of both Intel/NVIDIA solutions...
> 

Hi Kevin,

We have fixed set of attributes which are GPU/graphics specific, like
framebuffer_length, resolution, number of heads, max supported instances.
If you are going with fixed set of attributes, how are the vGPU types on
KVMGT looks like from attributes point of view? attributes are graphics
specific attributes?

Thanks,
Kirti

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-22 14:26                   ` [Qemu-devel] [libvirt] " Daniel P. Berrange
@ 2016-09-28 19:22                     ` Neo Jia
  2016-09-28 19:45                       ` Tian, Kevin
                                         ` (2 more replies)
  0 siblings, 3 replies; 62+ messages in thread
From: Neo Jia @ 2016-09-28 19:22 UTC (permalink / raw)
  To: Daniel P. Berrange, Alex Williamson
  Cc: Kirti Wankhede, Andy Currid, Tian, Kevin, libvir-list,
	qemu-devel, Song, Jike, Gerd Hoffmann, Paolo Bonzini, bjsdjshi

On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > On Thu, 22 Sep 2016 09:41:20 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > 
> > > >>>>> My concern is that a type id seems arbitrary but we're specifying that
> > > >>>>> it be unique.  We already have something unique, the name.  So why try
> > > >>>>> to make the type id unique as well?  A vendor can accidentally create
> > > >>>>> their vendor driver so that a given name means something very
> > > >>>>> specific.  On the other hand they need to be extremely deliberate to
> > > >>>>> coordinate that a type id means a unique thing across all their product
> > > >>>>> lines.
> > > >>>>>       
> > > >>>>
> > > >>>> Let me clarify, type id should be unique in the list of
> > > >>>> mdev_supported_types. You can't have 2 directories in with same name.    
> > > >>>
> > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > >>> our XML on the wrong attribute.  If the new device does not support
> > > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > > >>> whatever type-id 11 happens to be on this new card.
> > > >>>     
> > > >>
> > > >> If there are 2 M60 in the system then you would find '11' type directory
> > > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > > >> not be there in its mdev_supported_types, it will have different types.
> > > >>
> > > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > > >> would have to find 'create' file in sysfs in following directory format:
> > > >>
> > > >>  --- mdev_supported_types
> > > >>      |-- 11
> > > >>      |   |-- create
> > > >>
> > > >> but now for P100, '11' directory is not there, so libvirt should throw
> > > >> error on not able to find '11' directory.  
> > > > 
> > > > This really seems like an accident waiting to happen.  What happens
> > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > yet another arbitrary requirement that we have some sort of globally
> > > > unique type-id database make a lot of sense?  The same issue applies
> > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > name is the primary index for the mdev device, I know what it is.
> > > > Seeing type-id='11' is meaningless.
> > > >  
> > > 
> > > Let me clarify again, type '11' is a string that vendor driver would
> > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > If 2 vendors used same string we can't control that. right?
> > > 
> > > 
> > > >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> > > >>>> types is going to be defined by vendor driver, so let vendor driver
> > > >>>> decide what to use for directory name and same should be used in device
> > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > >>>>
> > > >>>>     <device>
> > > >>>>       <name>my-vgpu</name>
> > > >>>>       <parent>pci_0000_86_00_0</parent>
> > > >>>>       <capability type='mdev'>
> > > >>>>         <type='11'/>
> > > >>>>         ...
> > > >>>>       </capability>
> > > >>>>     </device>
> > 
> > Then let's get rid of the 'name' attribute and let the sysfs directory
> > simply be the name.  Then we can get rid of 'type' altogether so we
> > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,
> 
> That sounds nice to me - we don't need two unique identifiers if
> one will do.

Hi Alex and Daniel,

I just had some internal discussions here within NVIDIA and found out that
actually the name/label potentially might not be unique and the "id" will be. 
So I think we still would like to keep both so the id is the programmatic id
and the name/label is a human readable string for it, which might get changed to
be non-unique by outside of engineering.

Sorry for the change.

Thanks,
Neo

> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-28 19:22                     ` Neo Jia
@ 2016-09-28 19:45                       ` Tian, Kevin
  2016-09-28 19:59                         ` Neo Jia
  2016-09-28 19:55                       ` Alex Williamson
  2016-09-29  8:03                       ` Daniel P. Berrange
  2 siblings, 1 reply; 62+ messages in thread
From: Tian, Kevin @ 2016-09-28 19:45 UTC (permalink / raw)
  To: Neo Jia, Daniel P. Berrange, Alex Williamson
  Cc: Kirti Wankhede, Andy Currid, libvir-list, qemu-devel, Song, Jike,
	Gerd Hoffmann, Paolo Bonzini, bjsdjshi

> From: Neo Jia [mailto:cjia@nvidia.com]
> Sent: Thursday, September 29, 2016 3:23 AM
> 
> On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >
> > > > >>>>> My concern is that a type id seems arbitrary but we're specifying that
> > > > >>>>> it be unique.  We already have something unique, the name.  So why try
> > > > >>>>> to make the type id unique as well?  A vendor can accidentally create
> > > > >>>>> their vendor driver so that a given name means something very
> > > > >>>>> specific.  On the other hand they need to be extremely deliberate to
> > > > >>>>> coordinate that a type id means a unique thing across all their product
> > > > >>>>> lines.
> > > > >>>>>
> > > > >>>>
> > > > >>>> Let me clarify, type id should be unique in the list of
> > > > >>>> mdev_supported_types. You can't have 2 directories in with same name.
> > > > >>>
> > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > > >>> our XML on the wrong attribute.  If the new device does not support
> > > > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > >>> whatever type-id 11 happens to be on this new card.
> > > > >>>
> > > > >>
> > > > >> If there are 2 M60 in the system then you would find '11' type directory
> > > > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > >> not be there in its mdev_supported_types, it will have different types.
> > > > >>
> > > > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > > > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > > > >> would have to find 'create' file in sysfs in following directory format:
> > > > >>
> > > > >>  --- mdev_supported_types
> > > > >>      |-- 11
> > > > >>      |   |-- create
> > > > >>
> > > > >> but now for P100, '11' directory is not there, so libvirt should throw
> > > > >> error on not able to find '11' directory.
> > > > >
> > > > > This really seems like an accident waiting to happen.  What happens
> > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > name is the primary index for the mdev device, I know what it is.
> > > > > Seeing type-id='11' is meaningless.
> > > > >
> > > >
> > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > > If 2 vendors used same string we can't control that. right?
> > > >
> > > >
> > > > >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> > > > >>>> types is going to be defined by vendor driver, so let vendor driver
> > > > >>>> decide what to use for directory name and same should be used in device
> > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > >>>>
> > > > >>>>     <device>
> > > > >>>>       <name>my-vgpu</name>
> > > > >>>>       <parent>pci_0000_86_00_0</parent>
> > > > >>>>       <capability type='mdev'>
> > > > >>>>         <type='11'/>
> > > > >>>>         ...
> > > > >>>>       </capability>
> > > > >>>>     </device>
> > >
> > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > simply be the name.  Then we can get rid of 'type' altogether so we
> > > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,
> >
> > That sounds nice to me - we don't need two unique identifiers if
> > one will do.
> 
> Hi Alex and Daniel,
> 
> I just had some internal discussions here within NVIDIA and found out that
> actually the name/label potentially might not be unique and the "id" will be.
> So I think we still would like to keep both so the id is the programmatic id
> and the name/label is a human readable string for it, which might get changed to
> be non-unique by outside of engineering.
> 
> Sorry for the change.
> 
> Thanks,
> Neo
> 

A curious question. How do we expect such a descriptive name/label used
by upper-level stack (e.g. openstack)? Should openstack define a vGPU
flavor just using ID (GRID-type11) or using both ID/name (GRID-type11-
M60-0B) for end customer to choose? If it's only for human information,
does it make sense e.g. providing only unique ID in sysfs while relying on 
vendor specific documentation to describe what the ID actually means?

Thanks,
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-20  9:47 ` Daniel P. Berrange
@ 2016-09-28 19:48   ` Neo Jia
  2016-09-29  8:06     ` Daniel P. Berrange
  0 siblings, 1 reply; 62+ messages in thread
From: Neo Jia @ 2016-09-28 19:48 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Kirti Wankhede, libvir-list, Andy Currid, Tian, Kevin,
	qemu-devel, Song, Jike, Alex Williamson, Gerd Hoffmann,
	Paolo Bonzini, bjsdjshi

On Tue, Sep 20, 2016 at 10:47:53AM +0100, Daniel P. Berrange wrote:
> On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
> > 
> > Hi libvirt experts,
> > 
> > Thanks for valuable input on v1 version of RFC.
> > 
> > Quick brief, VFIO based mediated device framework provides a way to
> > virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> > and IBM's channel IO. This framework reuses VFIO APIs for all the
> > functionalities for mediated devices which are currently being used for
> > pass through devices. This framework introduces a set of new sysfs files
> > for device creation and its life cycle management.
> > 
> > Here is the summary of discussion on v1:
> > 1. Discover mediated device:
> > As part of physical device initialization process, vendor driver will
> > register their physical devices, which will be used to create virtual
> > device (mediated device, aka mdev) to the mediated framework.
> > 
> > Vendor driver should specify mdev_supported_types in directory format.
> > This format is class based, for example, display class directory format
> > should be as below. We need to define such set for each class of devices
> > which would be supported by mediated device framework.
> > 
> >  --- mdev_destroy
> >  --- mdev_supported_types
> >      |-- 11
> >      |   |-- create
> >      |   |-- name
> >      |   |-- fb_length
> >      |   |-- resolution
> >      |   |-- heads
> >      |   |-- max_instances
> >      |   |-- params
> >      |   |-- requires_group
> >      |-- 12
> >      |   |-- create
> >      |   |-- name
> >      |   |-- fb_length
> >      |   |-- resolution
> >      |   |-- heads
> >      |   |-- max_instances
> >      |   |-- params
> >      |   |-- requires_group
> >      |-- 13
> >          |-- create
> >          |-- name
> >          |-- fb_length
> >          |-- resolution
> >          |-- heads
> >          |-- max_instances
> >          |-- params
> >          |-- requires_group
> > 
> > 
> > In the above example directory '11' represents a type id of mdev device.
> > 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> > 'requires_group' would be Read-Only files that vendor would provide to
> > describe about that type.
> > 
> > 'create':
> >     Write-only file. Mandatory.
> >     Accepts string to create mediated device.
> > 
> > 'name':
> >     Read-Only file. Mandatory.
> >     Returns string, the name of that type id.
> 
> Presumably this is a human-targetted title/description of
> the device.
> 
> > 
> > 'fb_length':
> >     Read-only file. Mandatory.
> >     Returns <number>{K,M,G}, size of framebuffer.
> > 
> > 'resolution':
> >     Read-Only file. Mandatory.
> >     Returns 'hres x vres' format. Maximum supported resolution.
> > 
> > 'heads':
> >     Read-Only file. Mandatory.
> >     Returns integer. Number of maximum heads supported.
> 
> None of these should be mandatory as that makes the mdev
> useless for non-GPU devices.
> 
> I'd expect to see a 'class' or 'type' attribute in the
> directory whcih tells you what kind of mdev it is. A
> valid 'class' value would be 'gpu'. The fb_length,
> resolution, and heads parameters would only be mandatory
> when class==gpu.
> 

Hi Daniel,

Here you are proposing to add a class named "gpu", which will make all those gpu
related attributes mandatory, which libvirt can allow user to better
parse/present a particular mdev configuration?

I am just wondering if there is another option that we just make all those
attributes that a mdev device can have as optional but still meaningful to
libvirt, so libvirt can still parse / recognize them as an class "mdev".

In general, I am just trying to understand the requirement from libvirt and see
how we can fit in this requirement for both Intel and NVIDIA since Intel is also
moving to the type-based interface although they don't have "class" concept yet.

Thanks,
Neo

> > 'max_instance':
> >     Read-Only file. Mandatory.
> >     Returns integer.  Returns maximum mdev device could be created
> > at the moment when this file is read. This count would be updated by
> > vendor driver. Before creating mdev device of this type, check if
> > max_instance is > 0.
> > 
> > 'params'
> >     Write-Only file. Optional.
> >     String input. Libvirt would pass the string given in XML file to
> > this file and then create mdev device. Set empty string to clear params.
> > For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> > limiter for performance benchmarking, then create device of type 11. The
> > device created would have that parameter set by vendor driver.
> 
> Nope, libvirt will explicitly *NEVER* allow arbitrary opaque
> passthrough of vendor specific data in this way.
> 
> > The parent device would look like:
> > 
> >    <device>
> >      <name>pci_0000_86_00_0</name>
> >      <capability type='pci'>
> >        <domain>0</domain>
> >        <bus>134</bus>
> >        <slot>0</slot>
> >        <function>0</function>
> >        <capability type='mdev'>
> >          <!-- one type element per sysfs directory -->
> >          <type id='11'>
> >            <!-- one element per sysfs file roughly -->
> >            <name>GRID M60-0B</name>
> >            <attribute name='fb_length'>512M</attribute>
> >            <attribute name='resolution'>2560x1600</attribute>
> >            <attribute name='heads'>2</attribute>
> >            <attribute name='max_instances'>16</attribute>
> >            <attribute name='requires_group'>1</attribute>
> >          </type>
> 
> There would need to be a <class> element, eg <class>gpu</class>
> 
> We would then have further elements based on the class. eg
> 
>           <type id='11'>
>             <!-- one element per sysfs file roughly -->
>             <name>GRID M60-0B</name>
>             <fb_length>512M</fb_length>
>             <resolution>2560x1600</resolution>
>             <heads>2</heads>
>             <max_instances>16</max_instances>
>             <requires_group>1</requires_group>
>           </type>
> 
> 
> 
> >        </capability>
> >        <product id='...'>GRID M60</product>
> >        <vendor id='0x10de'>NVIDIA</vendor>
> >      </capability>
> >    </device>
> > 
> > 2. Create/destroy mediated device
> > 
> > With above example, vGPU device XML would look like:
> > 
> >    <device>
> >      <name>my-vgpu</name>
> >      <parent>pci_0000_86_00_0</parent>
> >      <capability type='mdev'>
> >        <type id='11'/>
> >        <group>1</group>
> >        <params>'frame_rate_limiter=0'</params>
> 
> No, we will not support <params> in this manner in libvirt.
> 
> The entire purpose of libvirt is to represent data in a
> vendor agnostic manner and not do abitrary passthrough
> of vendor specific data. Simply saying this field is
> optional does not get around that either.
> 
> >      </capability>
> >    </device>
> > 
> > 'type id' is mandatory.
> > 'group' is optional. It should be a unique number in the system among
> > all the groups created for mdev devices. Its usage is:
> >   - not needed if single vGPU device is being assigned to a domain.
> >   - only need to be set if multiple vGPUs need to be assigned to a
> > domain and vendor driver have 'requires_group' file in type id directory.
> >   - if type id directory include 'requires_group' and user tries to
> > assign multiple vGPUs to a domain without having <group> field in XML,
> > it will create single vGPU.
> > 
> > 'params' is optional field. User should set this field if extra
> > parameters need to be set for a particular vGPU device. Libvirt don't
> > need to parse these params. These are meant for vendor driver.
> > 
> > Libvirt need to follow the sequence to create device:
> > * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
> > then only proceed else fail.
> > 
> > * Set extra params if 'params' field exist in device XML and 'params'
> > file exist in type id directory
> > 
> >     echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params
> 
> We cannot do that step.
> 
> > 
> > * Autogenerate UUID
> > * Create device:
> > 
> >     echo "$UUID:<group>" > /sys/../0000\:86\:00.0/11/create
> > 
> >     where <group> is optional. Group should be unique number among all
> > the groups created for mdev devices.
> > 
> > * Clear params, if set earlier:
> > 
> >     echo "" > /sys/../0000\:86\:00.0/11/params
> > 
> > * To destroy device:
> > 
> >     echo $UUID > /sys/../0000\:86\:00.0/mdev_destroy
> > 
> > 
> > 3. Start/stop mediated device
> > 
> > No change or requirement for libvirt as this will be handled by open()
> > and close() callbacks to vendor driver. In case of multiple devices and
> > 'requires_group' set, this will be handled in 'first open()' and 'last
> > close()' on device in that group.
> > 
> > 4. Launch QEMU/VM
> > 
> >  Pass the mdev sysfs path to QEMU as vfio-pci device.
> >  For above vGPU device example:
> > 
> >     -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID
> > 
> > 5. QEMU/VM Shutdown sequence
> > 
> > No change or requirement for libvirt.
> > 
> > 6. VM Reset
> > 
> > No change or requirement for libvirt as this will be handled via VFIO
> > reset API and QEMU process will keep running as before.
> > 
> > 7. Hot-plug
> > 
> > It is same syntax to create a virtual device for hot-plug.
> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-28 19:22                     ` Neo Jia
  2016-09-28 19:45                       ` Tian, Kevin
@ 2016-09-28 19:55                       ` Alex Williamson
  2016-09-28 20:06                         ` Neo Jia
  2016-09-29  8:03                       ` Daniel P. Berrange
  2 siblings, 1 reply; 62+ messages in thread
From: Alex Williamson @ 2016-09-28 19:55 UTC (permalink / raw)
  To: Neo Jia
  Cc: Daniel P. Berrange, Kirti Wankhede, Andy Currid, Tian, Kevin,
	libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	Paolo Bonzini, bjsdjshi

On Wed, 28 Sep 2016 12:22:35 -0700
Neo Jia <cjia@nvidia.com> wrote:

> On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:  
> > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > >   
> > > > >>>>> My concern is that a type id seems arbitrary but we're specifying that
> > > > >>>>> it be unique.  We already have something unique, the name.  So why try
> > > > >>>>> to make the type id unique as well?  A vendor can accidentally create
> > > > >>>>> their vendor driver so that a given name means something very
> > > > >>>>> specific.  On the other hand they need to be extremely deliberate to
> > > > >>>>> coordinate that a type id means a unique thing across all their product
> > > > >>>>> lines.
> > > > >>>>>         
> > > > >>>>
> > > > >>>> Let me clarify, type id should be unique in the list of
> > > > >>>> mdev_supported_types. You can't have 2 directories in with same name.      
> > > > >>>
> > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > > >>> our XML on the wrong attribute.  If the new device does not support
> > > > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > >>> whatever type-id 11 happens to be on this new card.
> > > > >>>       
> > > > >>
> > > > >> If there are 2 M60 in the system then you would find '11' type directory
> > > > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > >> not be there in its mdev_supported_types, it will have different types.
> > > > >>
> > > > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > > > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > > > >> would have to find 'create' file in sysfs in following directory format:
> > > > >>
> > > > >>  --- mdev_supported_types
> > > > >>      |-- 11
> > > > >>      |   |-- create
> > > > >>
> > > > >> but now for P100, '11' directory is not there, so libvirt should throw
> > > > >> error on not able to find '11' directory.    
> > > > > 
> > > > > This really seems like an accident waiting to happen.  What happens
> > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > name is the primary index for the mdev device, I know what it is.
> > > > > Seeing type-id='11' is meaningless.
> > > > >    
> > > > 
> > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > > If 2 vendors used same string we can't control that. right?
> > > > 
> > > >   
> > > > >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> > > > >>>> types is going to be defined by vendor driver, so let vendor driver
> > > > >>>> decide what to use for directory name and same should be used in device
> > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > >>>>
> > > > >>>>     <device>
> > > > >>>>       <name>my-vgpu</name>
> > > > >>>>       <parent>pci_0000_86_00_0</parent>
> > > > >>>>       <capability type='mdev'>
> > > > >>>>         <type='11'/>
> > > > >>>>         ...
> > > > >>>>       </capability>
> > > > >>>>     </device>  
> > > 
> > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > simply be the name.  Then we can get rid of 'type' altogether so we
> > > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,  
> > 
> > That sounds nice to me - we don't need two unique identifiers if
> > one will do.  
> 
> Hi Alex and Daniel,
> 
> I just had some internal discussions here within NVIDIA and found out that
> actually the name/label potentially might not be unique and the "id" will be. 
> So I think we still would like to keep both so the id is the programmatic id
> and the name/label is a human readable string for it, which might get changed to
> be non-unique by outside of engineering.

I think your discovery only means that for your vendor driver, the name
will be "11" (as a string).  Perhaps you'd like some sort of vendor
provided description within each type, but I am not in favor of having
an arbitrary integer value imply something specific within the sysfs
interface.  IOW, the NVIDIA vendor driver should be able to create:

11
├── create
├── description
├── etc
└── resolution

While Intel might create:

Skylake-vGPU
├── create
├── description
├── etc
└── resolution

Maybe "description" is optional for vendors that use useful names?
Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-28 19:45                       ` Tian, Kevin
@ 2016-09-28 19:59                         ` Neo Jia
  2016-09-28 20:31                           ` Laine Stump
  2016-09-28 22:49                           ` Alex Williamson
  0 siblings, 2 replies; 62+ messages in thread
From: Neo Jia @ 2016-09-28 19:59 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Daniel P. Berrange, Alex Williamson, Kirti Wankhede, Andy Currid,
	libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	Paolo Bonzini, bjsdjshi

On Wed, Sep 28, 2016 at 07:45:38PM +0000, Tian, Kevin wrote:
> > From: Neo Jia [mailto:cjia@nvidia.com]
> > Sent: Thursday, September 29, 2016 3:23 AM
> > 
> > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > >
> > > > > >>>>> My concern is that a type id seems arbitrary but we're specifying that
> > > > > >>>>> it be unique.  We already have something unique, the name.  So why try
> > > > > >>>>> to make the type id unique as well?  A vendor can accidentally create
> > > > > >>>>> their vendor driver so that a given name means something very
> > > > > >>>>> specific.  On the other hand they need to be extremely deliberate to
> > > > > >>>>> coordinate that a type id means a unique thing across all their product
> > > > > >>>>> lines.
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>> Let me clarify, type id should be unique in the list of
> > > > > >>>> mdev_supported_types. You can't have 2 directories in with same name.
> > > > > >>>
> > > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > > > >>> our XML on the wrong attribute.  If the new device does not support
> > > > > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > > >>> whatever type-id 11 happens to be on this new card.
> > > > > >>>
> > > > > >>
> > > > > >> If there are 2 M60 in the system then you would find '11' type directory
> > > > > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > > >> not be there in its mdev_supported_types, it will have different types.
> > > > > >>
> > > > > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > > > > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > > > > >> would have to find 'create' file in sysfs in following directory format:
> > > > > >>
> > > > > >>  --- mdev_supported_types
> > > > > >>      |-- 11
> > > > > >>      |   |-- create
> > > > > >>
> > > > > >> but now for P100, '11' directory is not there, so libvirt should throw
> > > > > >> error on not able to find '11' directory.
> > > > > >
> > > > > > This really seems like an accident waiting to happen.  What happens
> > > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > Seeing type-id='11' is meaningless.
> > > > > >
> > > > >
> > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > > > If 2 vendors used same string we can't control that. right?
> > > > >
> > > > >
> > > > > >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> > > > > >>>> types is going to be defined by vendor driver, so let vendor driver
> > > > > >>>> decide what to use for directory name and same should be used in device
> > > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > > >>>>
> > > > > >>>>     <device>
> > > > > >>>>       <name>my-vgpu</name>
> > > > > >>>>       <parent>pci_0000_86_00_0</parent>
> > > > > >>>>       <capability type='mdev'>
> > > > > >>>>         <type='11'/>
> > > > > >>>>         ...
> > > > > >>>>       </capability>
> > > > > >>>>     </device>
> > > >
> > > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > > simply be the name.  Then we can get rid of 'type' altogether so we
> > > > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,
> > >
> > > That sounds nice to me - we don't need two unique identifiers if
> > > one will do.
> > 
> > Hi Alex and Daniel,
> > 
> > I just had some internal discussions here within NVIDIA and found out that
> > actually the name/label potentially might not be unique and the "id" will be.
> > So I think we still would like to keep both so the id is the programmatic id
> > and the name/label is a human readable string for it, which might get changed to
> > be non-unique by outside of engineering.
> > 
> > Sorry for the change.
> > 
> > Thanks,
> > Neo
> > 
> 
> A curious question. How do we expect such a descriptive name/label used
> by upper-level stack (e.g. openstack)? Should openstack define a vGPU
> flavor just using ID (GRID-type11) or using both ID/name (GRID-type11-
> M60-0B) for end customer to choose? If it's only for human information,
> does it make sense e.g. providing only unique ID in sysfs while relying on 
> vendor specific documentation to describe what the ID actually means?

Hi Kevin,

The id is not visible to the upper-level stack, only the name / label will be
shown to the end customer to choose, such as "GRID-M60-0B", as we might expose
the same virtual device (name/label) with some internal difference which will 
be tracked by the different unique id.

I think having the ability to allow libvirt or upper-level stack to display a
human readable string for a given type of vgpu will make the user life easier.

Thanks,
Neo

> 
> Thanks,
> Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-28 19:55                       ` Alex Williamson
@ 2016-09-28 20:06                         ` Neo Jia
  2016-09-28 22:39                           ` Alex Williamson
  0 siblings, 1 reply; 62+ messages in thread
From: Neo Jia @ 2016-09-28 20:06 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Daniel P. Berrange, Kirti Wankhede, Andy Currid, Tian, Kevin,
	libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	Paolo Bonzini, bjsdjshi

On Wed, Sep 28, 2016 at 01:55:47PM -0600, Alex Williamson wrote:
> On Wed, 28 Sep 2016 12:22:35 -0700
> Neo Jia <cjia@nvidia.com> wrote:
> 
> > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:  
> > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > >   
> > > > > >>>>> My concern is that a type id seems arbitrary but we're specifying that
> > > > > >>>>> it be unique.  We already have something unique, the name.  So why try
> > > > > >>>>> to make the type id unique as well?  A vendor can accidentally create
> > > > > >>>>> their vendor driver so that a given name means something very
> > > > > >>>>> specific.  On the other hand they need to be extremely deliberate to
> > > > > >>>>> coordinate that a type id means a unique thing across all their product
> > > > > >>>>> lines.
> > > > > >>>>>         
> > > > > >>>>
> > > > > >>>> Let me clarify, type id should be unique in the list of
> > > > > >>>> mdev_supported_types. You can't have 2 directories in with same name.      
> > > > > >>>
> > > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > > > >>> our XML on the wrong attribute.  If the new device does not support
> > > > > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > > >>> whatever type-id 11 happens to be on this new card.
> > > > > >>>       
> > > > > >>
> > > > > >> If there are 2 M60 in the system then you would find '11' type directory
> > > > > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > > >> not be there in its mdev_supported_types, it will have different types.
> > > > > >>
> > > > > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > > > > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > > > > >> would have to find 'create' file in sysfs in following directory format:
> > > > > >>
> > > > > >>  --- mdev_supported_types
> > > > > >>      |-- 11
> > > > > >>      |   |-- create
> > > > > >>
> > > > > >> but now for P100, '11' directory is not there, so libvirt should throw
> > > > > >> error on not able to find '11' directory.    
> > > > > > 
> > > > > > This really seems like an accident waiting to happen.  What happens
> > > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > Seeing type-id='11' is meaningless.
> > > > > >    
> > > > > 
> > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > > > If 2 vendors used same string we can't control that. right?
> > > > > 
> > > > >   
> > > > > >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> > > > > >>>> types is going to be defined by vendor driver, so let vendor driver
> > > > > >>>> decide what to use for directory name and same should be used in device
> > > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > > >>>>
> > > > > >>>>     <device>
> > > > > >>>>       <name>my-vgpu</name>
> > > > > >>>>       <parent>pci_0000_86_00_0</parent>
> > > > > >>>>       <capability type='mdev'>
> > > > > >>>>         <type='11'/>
> > > > > >>>>         ...
> > > > > >>>>       </capability>
> > > > > >>>>     </device>  
> > > > 
> > > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > > simply be the name.  Then we can get rid of 'type' altogether so we
> > > > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,  
> > > 
> > > That sounds nice to me - we don't need two unique identifiers if
> > > one will do.  
> > 
> > Hi Alex and Daniel,
> > 
> > I just had some internal discussions here within NVIDIA and found out that
> > actually the name/label potentially might not be unique and the "id" will be. 
> > So I think we still would like to keep both so the id is the programmatic id
> > and the name/label is a human readable string for it, which might get changed to
> > be non-unique by outside of engineering.
> 
> I think your discovery only means that for your vendor driver, the name
> will be "11" (as a string).  Perhaps you'd like some sort of vendor
> provided description within each type, but I am not in favor of having
> an arbitrary integer value imply something specific within the sysfs
> interface.  IOW, the NVIDIA vendor driver should be able to create:
> 
> 11
> ├── create
> ├── description
> ├── etc
> └── resolution
> 
> While Intel might create:
> 
> Skylake-vGPU
> ├── create
> ├── description
> ├── etc
> └── resolution
> 
> Maybe "description" is optional for vendors that use useful names?
> Thanks,

I think we should be able to have a unique vendor type string instead of an
arbitrary integer value there as long as we are allowed to have a description
field that can be used to show to the end user as "name / label". 

Thanks,
Neo

> 
> Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-28 19:59                         ` Neo Jia
@ 2016-09-28 20:31                           ` Laine Stump
  2016-09-28 20:47                             ` Neo Jia
  2016-09-28 22:49                           ` Alex Williamson
  1 sibling, 1 reply; 62+ messages in thread
From: Laine Stump @ 2016-09-28 20:31 UTC (permalink / raw)
  To: qemu-devel
  Cc: Neo Jia, Tian, Kevin, Currid, Song, Jike, libvir-list,
	Kirti Wankhede, Andy, Gerd Hoffmann, Paolo Bonzini, bjsdjshi

On 09/28/2016 03:59 PM, Neo Jia wrote:
> On Wed, Sep 28, 2016 at 07:45:38PM +0000, Tian, Kevin wrote:
>>> From: Neo Jia [mailto:cjia@nvidia.com]
>>> Sent: Thursday, September 29, 2016 3:23 AM
>>>
>>> On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
>>>> On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
>>>>> On Thu, 22 Sep 2016 09:41:20 +0530
>>>>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
>>>>>
>>>>>>>>>>> My concern is that a type id seems arbitrary but we're specifying that
>>>>>>>>>>> it be unique.  We already have something unique, the name.  So why try
>>>>>>>>>>> to make the type id unique as well?  A vendor can accidentally create
>>>>>>>>>>> their vendor driver so that a given name means something very
>>>>>>>>>>> specific.  On the other hand they need to be extremely deliberate to
>>>>>>>>>>> coordinate that a type id means a unique thing across all their product
>>>>>>>>>>> lines.
>>>>>>>>>>>
>>>>>>>>>> Let me clarify, type id should be unique in the list of
>>>>>>>>>> mdev_supported_types. You can't have 2 directories in with same name.
>>>>>>>>> Of course, but does that mean it's only unique to the machine I'm
>>>>>>>>> currently running on?  Let's say I have a Tesla P100 on my system and
>>>>>>>>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
>>>>>>>>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
>>>>>>>>> new card still going to be a "GRID-M60-0B"?  If not then we've based
>>>>>>>>> our XML on the wrong attribute.  If the new device does not support
>>>>>>>>> "GRID-M60-0B" then we should generate an error, not simply initialize
>>>>>>>>> whatever type-id 11 happens to be on this new card.
>>>>>>>>>
>>>>>>>> If there are 2 M60 in the system then you would find '11' type directory
>>>>>>>> in mdev_supported_types of both M60. If you have P100, '11' type would
>>>>>>>> not be there in its mdev_supported_types, it will have different types.
>>>>>>>>
>>>>>>>> For example, if you replace M60 with P100, but XML is not updated. XML
>>>>>>>> have type '11'. When libvirt would try to create mdev device, libvirt
>>>>>>>> would have to find 'create' file in sysfs in following directory format:
>>>>>>>>
>>>>>>>>   --- mdev_supported_types
>>>>>>>>       |-- 11
>>>>>>>>       |   |-- create
>>>>>>>>
>>>>>>>> but now for P100, '11' directory is not there, so libvirt should throw
>>>>>>>> error on not able to find '11' directory.
>>>>>>> This really seems like an accident waiting to happen.  What happens
>>>>>>> when the user replaces their M60 with an Intel XYZ device that happens
>>>>>>> to expose a type 11 mdev class gpu device?  How is libvirt supposed to
>>>>>>> know that the XML used to refer to a GRID-M60-0B and now it's an
>>>>>>> INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
>>>>>>> yet another arbitrary requirement that we have some sort of globally
>>>>>>> unique type-id database make a lot of sense?  The same issue applies
>>>>>>> for simple debug-ability, if I'm reviewing the XML for a domain and the
>>>>>>> name is the primary index for the mdev device, I know what it is.
>>>>>>> Seeing type-id='11' is meaningless.
>>>>>>>
>>>>>> Let me clarify again, type '11' is a string that vendor driver would
>>>>>> define (see my previous reply below) it could be "11" or "GRID-M60-0B".
>>>>>> If 2 vendors used same string we can't control that. right?
>>>>>>
>>>>>>
>>>>>>>>>> Lets remove 'id' from type id in XML if that is the concern. Supported
>>>>>>>>>> types is going to be defined by vendor driver, so let vendor driver
>>>>>>>>>> decide what to use for directory name and same should be used in device
>>>>>>>>>> xml file, it could be '11' or "GRID M60-0B":
>>>>>>>>>>
>>>>>>>>>>      <device>
>>>>>>>>>>        <name>my-vgpu</name>
>>>>>>>>>>        <parent>pci_0000_86_00_0</parent>
>>>>>>>>>>        <capability type='mdev'>
>>>>>>>>>>          <type='11'/>
>>>>>>>>>>          ...
>>>>>>>>>>        </capability>
>>>>>>>>>>      </device>
>>>>> Then let's get rid of the 'name' attribute and let the sysfs directory
>>>>> simply be the name.  Then we can get rid of 'type' altogether so we
>>>>> don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,
>>>> That sounds nice to me - we don't need two unique identifiers if
>>>> one will do.
>>> Hi Alex and Daniel,
>>>
>>> I just had some internal discussions here within NVIDIA and found out that
>>> actually the name/label potentially might not be unique and the "id" will be.

My comment below follows the above statement^^^^.

>>> So I think we still would like to keep both so the id is the programmatic id
>>> and the name/label is a human readable string for it, which might get changed to
>>> be non-unique by outside of engineering.
>>>
>>> Sorry for the change.
>>>
>>> Thanks,
>>> Neo
>>>
>> A curious question. How do we expect such a descriptive name/label used
>> by upper-level stack (e.g. openstack)? Should openstack define a vGPU
>> flavor just using ID (GRID-type11) or using both ID/name (GRID-type11-
>> M60-0B) for end customer to choose? If it's only for human information,
>> does it make sense e.g. providing only unique ID in sysfs while relying on
>> vendor specific documentation to describe what the ID actually means?
> Hi Kevin,
>
> The id is not visible to the upper-level stack, only the name / label will be
> shown to the end customer to choose, such as "GRID-M60-0B", as we might expose
> the same virtual device (name/label) with some internal difference which will
> be tracked by the different unique id.

If the upper layer will only see the descriptive name/label, then that 
label must be unique. It's not acceptable for a particular key used by 
management software to sometimes lead to one flavor of device and 
sometimes another (no matter how small the differences may be). So if 
only the ID is unique, then the ID is what must be used in any 
configuration at any level.

>
> I think having the ability to allow libvirt or upper-level stack to display a
> human readable string for a given type of vgpu will make the user life easier.
>
> Thanks,
> Neo
>
>> Thanks,
>> Kevin
>
>
>
>
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-28 20:31                           ` Laine Stump
@ 2016-09-28 20:47                             ` Neo Jia
  0 siblings, 0 replies; 62+ messages in thread
From: Neo Jia @ 2016-09-28 20:47 UTC (permalink / raw)
  To: Laine Stump
  Cc: qemu-devel, Tian, Kevin, Currid, Song, Jike, libvir-list,
	Kirti Wankhede, Andy, Gerd Hoffmann, Paolo Bonzini, bjsdjshi

On Wed, Sep 28, 2016 at 04:31:25PM -0400, Laine Stump wrote:
> On 09/28/2016 03:59 PM, Neo Jia wrote:
> > On Wed, Sep 28, 2016 at 07:45:38PM +0000, Tian, Kevin wrote:
> > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > Sent: Thursday, September 29, 2016 3:23 AM
> > > > 
> > > > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > > > > 
> > > > > > > > > > > > My concern is that a type id seems arbitrary but we're specifying that
> > > > > > > > > > > > it be unique.  We already have something unique, the name.  So why try
> > > > > > > > > > > > to make the type id unique as well?  A vendor can accidentally create
> > > > > > > > > > > > their vendor driver so that a given name means something very
> > > > > > > > > > > > specific.  On the other hand they need to be extremely deliberate to
> > > > > > > > > > > > coordinate that a type id means a unique thing across all their product
> > > > > > > > > > > > lines.
> > > > > > > > > > > > 
> > > > > > > > > > > Let me clarify, type id should be unique in the list of
> > > > > > > > > > > mdev_supported_types. You can't have 2 directories in with same name.
> > > > > > > > > > Of course, but does that mean it's only unique to the machine I'm
> > > > > > > > > > currently running on?  Let's say I have a Tesla P100 on my system and
> > > > > > > > > > type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > > > > > > > replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > > > > > > > > new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > > > > > > > > our XML on the wrong attribute.  If the new device does not support
> > > > > > > > > > "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > > > > > > > whatever type-id 11 happens to be on this new card.
> > > > > > > > > > 
> > > > > > > > > If there are 2 M60 in the system then you would find '11' type directory
> > > > > > > > > in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > > > > > > not be there in its mdev_supported_types, it will have different types.
> > > > > > > > > 
> > > > > > > > > For example, if you replace M60 with P100, but XML is not updated. XML
> > > > > > > > > have type '11'. When libvirt would try to create mdev device, libvirt
> > > > > > > > > would have to find 'create' file in sysfs in following directory format:
> > > > > > > > > 
> > > > > > > > >   --- mdev_supported_types
> > > > > > > > >       |-- 11
> > > > > > > > >       |   |-- create
> > > > > > > > > 
> > > > > > > > > but now for P100, '11' directory is not there, so libvirt should throw
> > > > > > > > > error on not able to find '11' directory.
> > > > > > > > This really seems like an accident waiting to happen.  What happens
> > > > > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > > > Seeing type-id='11' is meaningless.
> > > > > > > > 
> > > > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > > > > > If 2 vendors used same string we can't control that. right?
> > > > > > > 
> > > > > > > 
> > > > > > > > > > > Lets remove 'id' from type id in XML if that is the concern. Supported
> > > > > > > > > > > types is going to be defined by vendor driver, so let vendor driver
> > > > > > > > > > > decide what to use for directory name and same should be used in device
> > > > > > > > > > > xml file, it could be '11' or "GRID M60-0B":
> > > > > > > > > > > 
> > > > > > > > > > >      <device>
> > > > > > > > > > >        <name>my-vgpu</name>
> > > > > > > > > > >        <parent>pci_0000_86_00_0</parent>
> > > > > > > > > > >        <capability type='mdev'>
> > > > > > > > > > >          <type='11'/>
> > > > > > > > > > >          ...
> > > > > > > > > > >        </capability>
> > > > > > > > > > >      </device>
> > > > > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > > > > simply be the name.  Then we can get rid of 'type' altogether so we
> > > > > > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,
> > > > > That sounds nice to me - we don't need two unique identifiers if
> > > > > one will do.
> > > > Hi Alex and Daniel,
> > > > 
> > > > I just had some internal discussions here within NVIDIA and found out that
> > > > actually the name/label potentially might not be unique and the "id" will be.
> 
> My comment below follows the above statement^^^^.
> 
> > > > So I think we still would like to keep both so the id is the programmatic id
> > > > and the name/label is a human readable string for it, which might get changed to
> > > > be non-unique by outside of engineering.
> > > > 
> > > > Sorry for the change.
> > > > 
> > > > Thanks,
> > > > Neo
> > > > 
> > > A curious question. How do we expect such a descriptive name/label used
> > > by upper-level stack (e.g. openstack)? Should openstack define a vGPU
> > > flavor just using ID (GRID-type11) or using both ID/name (GRID-type11-
> > > M60-0B) for end customer to choose? If it's only for human information,
> > > does it make sense e.g. providing only unique ID in sysfs while relying on
> > > vendor specific documentation to describe what the ID actually means?
> > Hi Kevin,
> > 
> > The id is not visible to the upper-level stack, only the name / label will be
> > shown to the end customer to choose, such as "GRID-M60-0B", as we might expose
> > the same virtual device (name/label) with some internal difference which will
> > be tracked by the different unique id.
> 
> If the upper layer will only see the descriptive name/label, then that label
> must be unique. It's not acceptable for a particular key used by management
> software to sometimes lead to one flavor of device and sometimes another (no
> matter how small the differences may be). So if only the ID is unique, then
> the ID is what must be used in any configuration at any level.

Probably I should be clear about the "upper layer" as I was replying to Kevin
regarding what the end user will see from their interface.

The "key" will be unique throughout all management stacks and you should use the
"key / ID" for all configurations. But when you want to show a human readable
description of a virtual device, it will come from the "description" field.

Thanks,
Neo

> 
> > 
> > I think having the ability to allow libvirt or upper-level stack to display a
> > human readable string for a given type of vgpu will make the user life easier.
> > 
> > Thanks,
> > Neo
> > 
> > > Thanks,
> > > Kevin
> > 
> > 
> > 
> > 
> > --
> > libvir-list mailing list
> > libvir-list@redhat.com
> > https://www.redhat.com/mailman/listinfo/libvir-list
> > 
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-28 20:06                         ` Neo Jia
@ 2016-09-28 22:39                           ` Alex Williamson
  0 siblings, 0 replies; 62+ messages in thread
From: Alex Williamson @ 2016-09-28 22:39 UTC (permalink / raw)
  To: Neo Jia
  Cc: Daniel P. Berrange, Kirti Wankhede, Andy Currid, Tian, Kevin,
	libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	Paolo Bonzini, bjsdjshi

On Wed, 28 Sep 2016 13:06:31 -0700
Neo Jia <cjia@nvidia.com> wrote:

> On Wed, Sep 28, 2016 at 01:55:47PM -0600, Alex Williamson wrote:
> > On Wed, 28 Sep 2016 12:22:35 -0700
> > Neo Jia <cjia@nvidia.com> wrote:
> >   
> > > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:  
> > > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:    
> > > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > > >     
> > > > > > >>>>> My concern is that a type id seems arbitrary but we're specifying that
> > > > > > >>>>> it be unique.  We already have something unique, the name.  So why try
> > > > > > >>>>> to make the type id unique as well?  A vendor can accidentally create
> > > > > > >>>>> their vendor driver so that a given name means something very
> > > > > > >>>>> specific.  On the other hand they need to be extremely deliberate to
> > > > > > >>>>> coordinate that a type id means a unique thing across all their product
> > > > > > >>>>> lines.
> > > > > > >>>>>           
> > > > > > >>>>
> > > > > > >>>> Let me clarify, type id should be unique in the list of
> > > > > > >>>> mdev_supported_types. You can't have 2 directories in with same name.        
> > > > > > >>>
> > > > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > > > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > > > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > > > > >>> our XML on the wrong attribute.  If the new device does not support
> > > > > > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > > > >>> whatever type-id 11 happens to be on this new card.
> > > > > > >>>         
> > > > > > >>
> > > > > > >> If there are 2 M60 in the system then you would find '11' type directory
> > > > > > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > > > >> not be there in its mdev_supported_types, it will have different types.
> > > > > > >>
> > > > > > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > > > > > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > > > > > >> would have to find 'create' file in sysfs in following directory format:
> > > > > > >>
> > > > > > >>  --- mdev_supported_types
> > > > > > >>      |-- 11
> > > > > > >>      |   |-- create
> > > > > > >>
> > > > > > >> but now for P100, '11' directory is not there, so libvirt should throw
> > > > > > >> error on not able to find '11' directory.      
> > > > > > > 
> > > > > > > This really seems like an accident waiting to happen.  What happens
> > > > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > > Seeing type-id='11' is meaningless.
> > > > > > >      
> > > > > > 
> > > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > > > > If 2 vendors used same string we can't control that. right?
> > > > > > 
> > > > > >     
> > > > > > >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> > > > > > >>>> types is going to be defined by vendor driver, so let vendor driver
> > > > > > >>>> decide what to use for directory name and same should be used in device
> > > > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > > > >>>>
> > > > > > >>>>     <device>
> > > > > > >>>>       <name>my-vgpu</name>
> > > > > > >>>>       <parent>pci_0000_86_00_0</parent>
> > > > > > >>>>       <capability type='mdev'>
> > > > > > >>>>         <type='11'/>
> > > > > > >>>>         ...
> > > > > > >>>>       </capability>
> > > > > > >>>>     </device>    
> > > > > 
> > > > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > > > simply be the name.  Then we can get rid of 'type' altogether so we
> > > > > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,    
> > > > 
> > > > That sounds nice to me - we don't need two unique identifiers if
> > > > one will do.    
> > > 
> > > Hi Alex and Daniel,
> > > 
> > > I just had some internal discussions here within NVIDIA and found out that
> > > actually the name/label potentially might not be unique and the "id" will be. 
> > > So I think we still would like to keep both so the id is the programmatic id
> > > and the name/label is a human readable string for it, which might get changed to
> > > be non-unique by outside of engineering.  
> > 
> > I think your discovery only means that for your vendor driver, the name
> > will be "11" (as a string).  Perhaps you'd like some sort of vendor
> > provided description within each type, but I am not in favor of having
> > an arbitrary integer value imply something specific within the sysfs
> > interface.  IOW, the NVIDIA vendor driver should be able to create:
> > 
> > 11
> > ├── create
> > ├── description
> > ├── etc
> > └── resolution
> > 
> > While Intel might create:
> > 
> > Skylake-vGPU
> > ├── create
> > ├── description
> > ├── etc
> > └── resolution
> > 
> > Maybe "description" is optional for vendors that use useful names?
> > Thanks,  
> 
> I think we should be able to have a unique vendor type string instead of an
> arbitrary integer value there as long as we are allowed to have a description
> field that can be used to show to the end user as "name / label". 

The key is what goes into the XML and I think that should be unique,
both between vendors and within the vendor, so "11" is probably a poor
choice.  We could perhaps get a per-vendor-driver namespace by
prefixing the name of the vendor driver, then you could have
nvidia-vgpu-mdev-11 (or whatever you end up naming your vendor
driver).  Of course if the vendor driver wanted to translate "11" into
a more stable, human readable name, you could export something like
"nvidia-vgpu-mdev-GRID11" (I have no idea what your unique types are
based on).  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-28 19:59                         ` Neo Jia
  2016-09-28 20:31                           ` Laine Stump
@ 2016-09-28 22:49                           ` Alex Williamson
  1 sibling, 0 replies; 62+ messages in thread
From: Alex Williamson @ 2016-09-28 22:49 UTC (permalink / raw)
  To: Neo Jia
  Cc: Tian, Kevin, Daniel P. Berrange, Kirti Wankhede, Andy Currid,
	libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	Paolo Bonzini, bjsdjshi

On Wed, 28 Sep 2016 12:59:59 -0700
Neo Jia <cjia@nvidia.com> wrote:

> On Wed, Sep 28, 2016 at 07:45:38PM +0000, Tian, Kevin wrote:
> > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > Sent: Thursday, September 29, 2016 3:23 AM
> > > 
> > > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:  
> > > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:  
> > > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > > >  
> > > > > > >>>>> My concern is that a type id seems arbitrary but we're specifying that
> > > > > > >>>>> it be unique.  We already have something unique, the name.  So why try
> > > > > > >>>>> to make the type id unique as well?  A vendor can accidentally create
> > > > > > >>>>> their vendor driver so that a given name means something very
> > > > > > >>>>> specific.  On the other hand they need to be extremely deliberate to
> > > > > > >>>>> coordinate that a type id means a unique thing across all their product
> > > > > > >>>>> lines.
> > > > > > >>>>>  
> > > > > > >>>>
> > > > > > >>>> Let me clarify, type id should be unique in the list of
> > > > > > >>>> mdev_supported_types. You can't have 2 directories in with same name.  
> > > > > > >>>
> > > > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > > > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > > > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > > > > >>> our XML on the wrong attribute.  If the new device does not support
> > > > > > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > > > >>> whatever type-id 11 happens to be on this new card.
> > > > > > >>>  
> > > > > > >>
> > > > > > >> If there are 2 M60 in the system then you would find '11' type directory
> > > > > > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > > > >> not be there in its mdev_supported_types, it will have different types.
> > > > > > >>
> > > > > > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > > > > > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > > > > > >> would have to find 'create' file in sysfs in following directory format:
> > > > > > >>
> > > > > > >>  --- mdev_supported_types
> > > > > > >>      |-- 11
> > > > > > >>      |   |-- create
> > > > > > >>
> > > > > > >> but now for P100, '11' directory is not there, so libvirt should throw
> > > > > > >> error on not able to find '11' directory.  
> > > > > > >
> > > > > > > This really seems like an accident waiting to happen.  What happens
> > > > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > > Seeing type-id='11' is meaningless.
> > > > > > >  
> > > > > >
> > > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > > > > If 2 vendors used same string we can't control that. right?
> > > > > >
> > > > > >  
> > > > > > >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> > > > > > >>>> types is going to be defined by vendor driver, so let vendor driver
> > > > > > >>>> decide what to use for directory name and same should be used in device
> > > > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > > > >>>>
> > > > > > >>>>     <device>
> > > > > > >>>>       <name>my-vgpu</name>
> > > > > > >>>>       <parent>pci_0000_86_00_0</parent>
> > > > > > >>>>       <capability type='mdev'>
> > > > > > >>>>         <type='11'/>
> > > > > > >>>>         ...
> > > > > > >>>>       </capability>
> > > > > > >>>>     </device>  
> > > > >
> > > > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > > > simply be the name.  Then we can get rid of 'type' altogether so we
> > > > > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,  
> > > >
> > > > That sounds nice to me - we don't need two unique identifiers if
> > > > one will do.  
> > > 
> > > Hi Alex and Daniel,
> > > 
> > > I just had some internal discussions here within NVIDIA and found out that
> > > actually the name/label potentially might not be unique and the "id" will be.
> > > So I think we still would like to keep both so the id is the programmatic id
> > > and the name/label is a human readable string for it, which might get changed to
> > > be non-unique by outside of engineering.
> > > 
> > > Sorry for the change.
> > > 
> > > Thanks,
> > > Neo
> > >   
> > 
> > A curious question. How do we expect such a descriptive name/label used
> > by upper-level stack (e.g. openstack)? Should openstack define a vGPU
> > flavor just using ID (GRID-type11) or using both ID/name (GRID-type11-
> > M60-0B) for end customer to choose? If it's only for human information,
> > does it make sense e.g. providing only unique ID in sysfs while relying on 
> > vendor specific documentation to describe what the ID actually means?  
> 
> Hi Kevin,
> 
> The id is not visible to the upper-level stack, only the name / label will be
> shown to the end customer to choose, such as "GRID-M60-0B", as we might expose
> the same virtual device (name/label) with some internal difference which will 
> be tracked by the different unique id.
> 
> I think having the ability to allow libvirt or upper-level stack to display a
> human readable string for a given type of vgpu will make the user life easier.

Again, we need a stable and unique string to go into the XML.  That
should represent a consistent device regardless of host driver versions
or specific hardware.  I would consider that string to be user visible.

Also, can you define exactly what you mean by "unique"?  What's the
purpose of the label "GRID-M60-0B" if it's not unique?  Does type 11
mean "GRID-M60-0B" as implemented on a specific card and type 12 might
mean "GRID-M60-0B" as implemented on a different card?  Do you want
your users to be able to instantiate their VM on any "GRID-M60-0B"
mdev, or does it need to be a type 11 GRID-M60-0B mdev?  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-28 19:22                     ` Neo Jia
  2016-09-28 19:45                       ` Tian, Kevin
  2016-09-28 19:55                       ` Alex Williamson
@ 2016-09-29  8:03                       ` Daniel P. Berrange
  2016-09-29  8:12                         ` Neo Jia
  2 siblings, 1 reply; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-29  8:03 UTC (permalink / raw)
  To: Neo Jia
  Cc: Alex Williamson, Kirti Wankhede, Andy Currid, Tian, Kevin,
	libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	Paolo Bonzini, bjsdjshi

On Wed, Sep 28, 2016 at 12:22:35PM -0700, Neo Jia wrote:
> On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > 
> > > > >>>>> My concern is that a type id seems arbitrary but we're specifying that
> > > > >>>>> it be unique.  We already have something unique, the name.  So why try
> > > > >>>>> to make the type id unique as well?  A vendor can accidentally create
> > > > >>>>> their vendor driver so that a given name means something very
> > > > >>>>> specific.  On the other hand they need to be extremely deliberate to
> > > > >>>>> coordinate that a type id means a unique thing across all their product
> > > > >>>>> lines.
> > > > >>>>>       
> > > > >>>>
> > > > >>>> Let me clarify, type id should be unique in the list of
> > > > >>>> mdev_supported_types. You can't have 2 directories in with same name.    
> > > > >>>
> > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > > >>> our XML on the wrong attribute.  If the new device does not support
> > > > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > >>> whatever type-id 11 happens to be on this new card.
> > > > >>>     
> > > > >>
> > > > >> If there are 2 M60 in the system then you would find '11' type directory
> > > > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > >> not be there in its mdev_supported_types, it will have different types.
> > > > >>
> > > > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > > > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > > > >> would have to find 'create' file in sysfs in following directory format:
> > > > >>
> > > > >>  --- mdev_supported_types
> > > > >>      |-- 11
> > > > >>      |   |-- create
> > > > >>
> > > > >> but now for P100, '11' directory is not there, so libvirt should throw
> > > > >> error on not able to find '11' directory.  
> > > > > 
> > > > > This really seems like an accident waiting to happen.  What happens
> > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > name is the primary index for the mdev device, I know what it is.
> > > > > Seeing type-id='11' is meaningless.
> > > > >  
> > > > 
> > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > > If 2 vendors used same string we can't control that. right?
> > > > 
> > > > 
> > > > >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> > > > >>>> types is going to be defined by vendor driver, so let vendor driver
> > > > >>>> decide what to use for directory name and same should be used in device
> > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > >>>>
> > > > >>>>     <device>
> > > > >>>>       <name>my-vgpu</name>
> > > > >>>>       <parent>pci_0000_86_00_0</parent>
> > > > >>>>       <capability type='mdev'>
> > > > >>>>         <type='11'/>
> > > > >>>>         ...
> > > > >>>>       </capability>
> > > > >>>>     </device>
> > > 
> > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > simply be the name.  Then we can get rid of 'type' altogether so we
> > > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,
> > 
> > That sounds nice to me - we don't need two unique identifiers if
> > one will do.
> 
> Hi Alex and Daniel,
> 
> I just had some internal discussions here within NVIDIA and found out that
> actually the name/label potentially might not be unique and the "id" will be. 
> So I think we still would like to keep both so the id is the programmatic id
> and the name/label is a human readable string for it, which might get changed to
> be non-unique by outside of engineering.

The name is what the end user will used to decide which type of device
to create. If the name is not unique, how is the user supposed to decide
which of the duplicates to use. Allowing duplicate names just sounds
broken to me.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-28 19:48   ` Neo Jia
@ 2016-09-29  8:06     ` Daniel P. Berrange
  2016-09-29 14:35       ` Tian, Kevin
  0 siblings, 1 reply; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-29  8:06 UTC (permalink / raw)
  To: Neo Jia
  Cc: Kirti Wankhede, libvir-list, Andy Currid, Tian, Kevin,
	qemu-devel, Song, Jike, Alex Williamson, Gerd Hoffmann,
	Paolo Bonzini, bjsdjshi

On Wed, Sep 28, 2016 at 12:48:33PM -0700, Neo Jia wrote:
> On Tue, Sep 20, 2016 at 10:47:53AM +0100, Daniel P. Berrange wrote:
> > On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
> > > 
> > > Hi libvirt experts,
> > > 
> > > Thanks for valuable input on v1 version of RFC.
> > > 
> > > Quick brief, VFIO based mediated device framework provides a way to
> > > virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> > > and IBM's channel IO. This framework reuses VFIO APIs for all the
> > > functionalities for mediated devices which are currently being used for
> > > pass through devices. This framework introduces a set of new sysfs files
> > > for device creation and its life cycle management.
> > > 
> > > Here is the summary of discussion on v1:
> > > 1. Discover mediated device:
> > > As part of physical device initialization process, vendor driver will
> > > register their physical devices, which will be used to create virtual
> > > device (mediated device, aka mdev) to the mediated framework.
> > > 
> > > Vendor driver should specify mdev_supported_types in directory format.
> > > This format is class based, for example, display class directory format
> > > should be as below. We need to define such set for each class of devices
> > > which would be supported by mediated device framework.
> > > 
> > >  --- mdev_destroy
> > >  --- mdev_supported_types
> > >      |-- 11
> > >      |   |-- create
> > >      |   |-- name
> > >      |   |-- fb_length
> > >      |   |-- resolution
> > >      |   |-- heads
> > >      |   |-- max_instances
> > >      |   |-- params
> > >      |   |-- requires_group
> > >      |-- 12
> > >      |   |-- create
> > >      |   |-- name
> > >      |   |-- fb_length
> > >      |   |-- resolution
> > >      |   |-- heads
> > >      |   |-- max_instances
> > >      |   |-- params
> > >      |   |-- requires_group
> > >      |-- 13
> > >          |-- create
> > >          |-- name
> > >          |-- fb_length
> > >          |-- resolution
> > >          |-- heads
> > >          |-- max_instances
> > >          |-- params
> > >          |-- requires_group
> > > 
> > > 
> > > In the above example directory '11' represents a type id of mdev device.
> > > 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> > > 'requires_group' would be Read-Only files that vendor would provide to
> > > describe about that type.
> > > 
> > > 'create':
> > >     Write-only file. Mandatory.
> > >     Accepts string to create mediated device.
> > > 
> > > 'name':
> > >     Read-Only file. Mandatory.
> > >     Returns string, the name of that type id.
> > 
> > Presumably this is a human-targetted title/description of
> > the device.
> > 
> > > 
> > > 'fb_length':
> > >     Read-only file. Mandatory.
> > >     Returns <number>{K,M,G}, size of framebuffer.
> > > 
> > > 'resolution':
> > >     Read-Only file. Mandatory.
> > >     Returns 'hres x vres' format. Maximum supported resolution.
> > > 
> > > 'heads':
> > >     Read-Only file. Mandatory.
> > >     Returns integer. Number of maximum heads supported.
> > 
> > None of these should be mandatory as that makes the mdev
> > useless for non-GPU devices.
> > 
> > I'd expect to see a 'class' or 'type' attribute in the
> > directory whcih tells you what kind of mdev it is. A
> > valid 'class' value would be 'gpu'. The fb_length,
> > resolution, and heads parameters would only be mandatory
> > when class==gpu.
> > 
> 
> Hi Daniel,
> 
> Here you are proposing to add a class named "gpu", which will make all those gpu
> related attributes mandatory, which libvirt can allow user to better
> parse/present a particular mdev configuration?
> 
> I am just wondering if there is another option that we just make all those
> attributes that a mdev device can have as optional but still meaningful to
> libvirt, so libvirt can still parse / recognize them as an class "mdev".

'mdev' isn't a class - mdev is the name of the kernel module. The class
refers to the broad capability of the device. class would be things
like "gpu", "nic", "fpga" or other such things. The point of the class
is to identify which other attributes will be considered mandatory.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
  2016-09-29  8:03                       ` Daniel P. Berrange
@ 2016-09-29  8:12                         ` Neo Jia
  0 siblings, 0 replies; 62+ messages in thread
From: Neo Jia @ 2016-09-29  8:12 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Alex Williamson, Kirti Wankhede, Andy Currid, Tian, Kevin,
	libvir-list, qemu-devel, Song, Jike, Gerd Hoffmann,
	Paolo Bonzini, bjsdjshi

On Thu, Sep 29, 2016 at 09:03:40AM +0100, Daniel P. Berrange wrote:
> On Wed, Sep 28, 2016 at 12:22:35PM -0700, Neo Jia wrote:
> > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote:
> > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> > > > 
> > > > > >>>>> My concern is that a type id seems arbitrary but we're specifying that
> > > > > >>>>> it be unique.  We already have something unique, the name.  So why try
> > > > > >>>>> to make the type id unique as well?  A vendor can accidentally create
> > > > > >>>>> their vendor driver so that a given name means something very
> > > > > >>>>> specific.  On the other hand they need to be extremely deliberate to
> > > > > >>>>> coordinate that a type id means a unique thing across all their product
> > > > > >>>>> lines.
> > > > > >>>>>       
> > > > > >>>>
> > > > > >>>> Let me clarify, type id should be unique in the list of
> > > > > >>>> mdev_supported_types. You can't have 2 directories in with same name.    
> > > > > >>>
> > > > > >>> Of course, but does that mean it's only unique to the machine I'm
> > > > > >>> currently running on?  Let's say I have a Tesla P100 on my system and
> > > > > >>> type-id 11 is named "GRID-M60-0B".  At some point in the future I
> > > > > >>> replace the Tesla P100 with a Q1000 (made up).  Is type-id 11 on that
> > > > > >>> new card still going to be a "GRID-M60-0B"?  If not then we've based
> > > > > >>> our XML on the wrong attribute.  If the new device does not support
> > > > > >>> "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > > >>> whatever type-id 11 happens to be on this new card.
> > > > > >>>     
> > > > > >>
> > > > > >> If there are 2 M60 in the system then you would find '11' type directory
> > > > > >> in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > > >> not be there in its mdev_supported_types, it will have different types.
> > > > > >>
> > > > > >> For example, if you replace M60 with P100, but XML is not updated. XML
> > > > > >> have type '11'. When libvirt would try to create mdev device, libvirt
> > > > > >> would have to find 'create' file in sysfs in following directory format:
> > > > > >>
> > > > > >>  --- mdev_supported_types
> > > > > >>      |-- 11
> > > > > >>      |   |-- create
> > > > > >>
> > > > > >> but now for P100, '11' directory is not there, so libvirt should throw
> > > > > >> error on not able to find '11' directory.  
> > > > > > 
> > > > > > This really seems like an accident waiting to happen.  What happens
> > > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > > to expose a type 11 mdev class gpu device?  How is libvirt supposed to
> > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > INTEL-IGD-XYZ?  Doesn't basing the XML entry on the name and removing
> > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > unique type-id database make a lot of sense?  The same issue applies
> > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > Seeing type-id='11' is meaningless.
> > > > > >  
> > > > > 
> > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > > > If 2 vendors used same string we can't control that. right?
> > > > > 
> > > > > 
> > > > > >>>> Lets remove 'id' from type id in XML if that is the concern. Supported
> > > > > >>>> types is going to be defined by vendor driver, so let vendor driver
> > > > > >>>> decide what to use for directory name and same should be used in device
> > > > > >>>> xml file, it could be '11' or "GRID M60-0B":
> > > > > >>>>
> > > > > >>>>     <device>
> > > > > >>>>       <name>my-vgpu</name>
> > > > > >>>>       <parent>pci_0000_86_00_0</parent>
> > > > > >>>>       <capability type='mdev'>
> > > > > >>>>         <type='11'/>
> > > > > >>>>         ...
> > > > > >>>>       </capability>
> > > > > >>>>     </device>
> > > > 
> > > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > > simply be the name.  Then we can get rid of 'type' altogether so we
> > > > don't have this '11' vs 'GRID-M60-0B' issue.  Thanks,
> > > 
> > > That sounds nice to me - we don't need two unique identifiers if
> > > one will do.
> > 
> > Hi Alex and Daniel,
> > 
> > I just had some internal discussions here within NVIDIA and found out that
> > actually the name/label potentially might not be unique and the "id" will be. 
> > So I think we still would like to keep both so the id is the programmatic id
> > and the name/label is a human readable string for it, which might get changed to
> > be non-unique by outside of engineering.
> 
> The name is what the end user will used to decide which type of device
> to create. If the name is not unique, how is the user supposed to decide
> which of the duplicates to use. Allowing duplicate names just sounds
> broken to me.

Yes, the name is what the end user will use to decide and from their point of
view the virtual device is presented to them same. For example, it might come
from two different physical devices (I think Alex has mentioned the same on his
response to me). To achieve the same "virtual device", there
will be internal implementation differences, therefore the "id" will be unique.

Thanks,
Neo

> 
> 
> Regards,
> Daniel
> -- 
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-21 19:03             ` Alex Williamson
  2016-09-22  4:11               ` Kirti Wankhede
@ 2016-09-29 14:22               ` Kirti Wankhede
  1 sibling, 0 replies; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-29 14:22 UTC (permalink / raw)
  To: Alex Williamson
  Cc: libvir-list, qemu-devel, Paolo Bonzini, Neo Jia, Andy Currid,
	Tian, Kevin, Song, Jike, bjsdjshi, Gerd Hoffmann


>>>>>>>>
>>>>>>>> 7. Hot-plug
>>>>>>>>
>>>>>>>> It is same syntax to create a virtual device for hot-plug.      
>>>>>>>
>>>>>>> How do groups work with hotplug?  Can a device be creating into an
>>>>>>> existing, running group?  Can a device be removed from an existing,
>>>>>>> running group?  Thanks,
>>>>>>>       
>>>>>>
>>>>>> Group is for vendor driver to decide when to commit resources for a
>>>>>> group of devices to support multiple devices in a domain. It will be
>>>>>> upto vendor driver how it manage it.    
>>>>>
>>>>> Then it's not sufficiently specified for libvirt to use.  Thanks,
>>>>>     
>>>>
>>>> Can you clarify what if required for libvirt to use?  
>>>
>>> Everything we're discussing above.  We can't simply say that group
>>> management is defined by the vendor driver, that means that libvirt
>>> needs to know how to uniquely behave for each vendor driver.  We want
>>> to create a consistent framework where libvirt doesn't care what the
>>> vendor driver is.  Therefore we need to make group manipulation fully
>>> specified, regardless of the vendor driver.  Thanks,
>>>  
>>
>> There is no communication between different vendor drivers. So if
>> libvirt creates a group with one device from one vendor and one device
>> from another vendor, both vendor driver would think they have a single
>> device and accordingly commit resources for that single device from its
>> own driver. Ideally nothing would fail. But that is not the main aim of
>> grouping, right?
>>
>> To make this independent of vendor driver, we had initial proposal of
>> having same UUID and different instance numbers. But that was also not
>> acceptable.
> 
> We're defining the infrastructure, we get to define the rules, but we
> need to define all the rules so that libvirt can actually implement to
> them.  We can't say "groups are this nebulous thing and each vendor
> will manage them however they want", nobody wants to care that NVIDIA
> did it one way and Intel did it another and we need to support both.
> That's a design failure.  Therefore we \must\ make the group definition
> we need well specified, any hopefully generic enough that others can
> use it.  If they cannot then we'd need to add a new type of group.
> Therefore we need to \specify\ things like for what set of mdev should
> a group be created?  Do I need a group for a single device?  How do we
> introspect a group to know which mdevs it contains?  Can we dynamically
> add or remove mdev devices from a group?  Can we attach multiple
> groups to the same user?  Can we change the association of an mdev
> device without removing it and recreating it? etc, etc, etc.  We don't
> need vendor coordination, we need a specification of exactly how this
> type of group is meant to work. Thanks,
> 

We decided to handle "multiple vGPUs per VM" through internal design
changes. With that we don't need grouping in mdev framework. Hope this
helps to accelerate and settle on sysfs interface soon.

Thanks,
Kirti

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-29  8:06     ` Daniel P. Berrange
@ 2016-09-29 14:35       ` Tian, Kevin
  2016-09-29 14:38         ` Daniel P. Berrange
  0 siblings, 1 reply; 62+ messages in thread
From: Tian, Kevin @ 2016-09-29 14:35 UTC (permalink / raw)
  To: Daniel P. Berrange, Neo Jia
  Cc: Kirti Wankhede, libvir-list, Andy Currid, qemu-devel, Song, Jike,
	Alex Williamson, Gerd Hoffmann, Paolo Bonzini, bjsdjshi

> From: Daniel P. Berrange [mailto:berrange@redhat.com]
> Sent: Thursday, September 29, 2016 4:06 PM
> 
> On Wed, Sep 28, 2016 at 12:48:33PM -0700, Neo Jia wrote:
> > On Tue, Sep 20, 2016 at 10:47:53AM +0100, Daniel P. Berrange wrote:
> > > On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
> > > >
> > > > Hi libvirt experts,
> > > >
> > > > Thanks for valuable input on v1 version of RFC.
> > > >
> > > > Quick brief, VFIO based mediated device framework provides a way to
> > > > virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> > > > and IBM's channel IO. This framework reuses VFIO APIs for all the
> > > > functionalities for mediated devices which are currently being used for
> > > > pass through devices. This framework introduces a set of new sysfs files
> > > > for device creation and its life cycle management.
> > > >
> > > > Here is the summary of discussion on v1:
> > > > 1. Discover mediated device:
> > > > As part of physical device initialization process, vendor driver will
> > > > register their physical devices, which will be used to create virtual
> > > > device (mediated device, aka mdev) to the mediated framework.
> > > >
> > > > Vendor driver should specify mdev_supported_types in directory format.
> > > > This format is class based, for example, display class directory format
> > > > should be as below. We need to define such set for each class of devices
> > > > which would be supported by mediated device framework.
> > > >
> > > >  --- mdev_destroy
> > > >  --- mdev_supported_types
> > > >      |-- 11
> > > >      |   |-- create
> > > >      |   |-- name
> > > >      |   |-- fb_length
> > > >      |   |-- resolution
> > > >      |   |-- heads
> > > >      |   |-- max_instances
> > > >      |   |-- params
> > > >      |   |-- requires_group
> > > >      |-- 12
> > > >      |   |-- create
> > > >      |   |-- name
> > > >      |   |-- fb_length
> > > >      |   |-- resolution
> > > >      |   |-- heads
> > > >      |   |-- max_instances
> > > >      |   |-- params
> > > >      |   |-- requires_group
> > > >      |-- 13
> > > >          |-- create
> > > >          |-- name
> > > >          |-- fb_length
> > > >          |-- resolution
> > > >          |-- heads
> > > >          |-- max_instances
> > > >          |-- params
> > > >          |-- requires_group
> > > >
> > > >
> > > > In the above example directory '11' represents a type id of mdev device.
> > > > 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> > > > 'requires_group' would be Read-Only files that vendor would provide to
> > > > describe about that type.
> > > >
> > > > 'create':
> > > >     Write-only file. Mandatory.
> > > >     Accepts string to create mediated device.
> > > >
> > > > 'name':
> > > >     Read-Only file. Mandatory.
> > > >     Returns string, the name of that type id.
> > >
> > > Presumably this is a human-targetted title/description of
> > > the device.
> > >
> > > >
> > > > 'fb_length':
> > > >     Read-only file. Mandatory.
> > > >     Returns <number>{K,M,G}, size of framebuffer.
> > > >
> > > > 'resolution':
> > > >     Read-Only file. Mandatory.
> > > >     Returns 'hres x vres' format. Maximum supported resolution.
> > > >
> > > > 'heads':
> > > >     Read-Only file. Mandatory.
> > > >     Returns integer. Number of maximum heads supported.
> > >
> > > None of these should be mandatory as that makes the mdev
> > > useless for non-GPU devices.
> > >
> > > I'd expect to see a 'class' or 'type' attribute in the
> > > directory whcih tells you what kind of mdev it is. A
> > > valid 'class' value would be 'gpu'. The fb_length,
> > > resolution, and heads parameters would only be mandatory
> > > when class==gpu.
> > >
> >
> > Hi Daniel,
> >
> > Here you are proposing to add a class named "gpu", which will make all those gpu
> > related attributes mandatory, which libvirt can allow user to better
> > parse/present a particular mdev configuration?
> >
> > I am just wondering if there is another option that we just make all those
> > attributes that a mdev device can have as optional but still meaningful to
> > libvirt, so libvirt can still parse / recognize them as an class "mdev".
> 
> 'mdev' isn't a class - mdev is the name of the kernel module. The class
> refers to the broad capability of the device. class would be things
> like "gpu", "nic", "fpga" or other such things. The point of the class
> is to identify which other attributes will be considered mandatory.
> 
> 

Thanks Daniel. This class definition makes sense to me.

However I'm not sure whether we should define such common mandatory attributes
of a 'gpu' class now. Intel will go with a 2's power sharing of type definition... actual 
type name to be finalized, but an example looks like below:

[GVTG-SKL-x2]: available instances (2)
[GVTG-SKL-x4]: available instances (4)
[GVTG-SKL-x8]: available instances (8)
...

User can create different types of vGPUs simultaneously. A GVTG-SKL-x2 type
vGPU will get half of the physical GPU resource, while a GVTG-SKL-x4 type will
get a quarter. However it's unclear to me how we want to enumerate those 
resources into resolution or heads. I feel it'd be more reasonable for us to push 
initial libvirt mdev support w/o vgpu specific class definition, until we see 
a clear value of doing so (at that time we then follow Daniel's guideline to define
mandatory attributes common to all GPU vendors).

Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-29 14:35       ` Tian, Kevin
@ 2016-09-29 14:38         ` Daniel P. Berrange
  2016-09-29 14:42           ` Tian, Kevin
  0 siblings, 1 reply; 62+ messages in thread
From: Daniel P. Berrange @ 2016-09-29 14:38 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Neo Jia, Kirti Wankhede, libvir-list, Andy Currid, qemu-devel,
	Song, Jike, Alex Williamson, Gerd Hoffmann, Paolo Bonzini,
	bjsdjshi

On Thu, Sep 29, 2016 at 02:35:48PM +0000, Tian, Kevin wrote:
> > From: Daniel P. Berrange [mailto:berrange@redhat.com]
> > Sent: Thursday, September 29, 2016 4:06 PM
> > 
> > On Wed, Sep 28, 2016 at 12:48:33PM -0700, Neo Jia wrote:
> > > On Tue, Sep 20, 2016 at 10:47:53AM +0100, Daniel P. Berrange wrote:
> > > > On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
> > > > >
> > > > > Hi libvirt experts,
> > > > >
> > > > > Thanks for valuable input on v1 version of RFC.
> > > > >
> > > > > Quick brief, VFIO based mediated device framework provides a way to
> > > > > virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> > > > > and IBM's channel IO. This framework reuses VFIO APIs for all the
> > > > > functionalities for mediated devices which are currently being used for
> > > > > pass through devices. This framework introduces a set of new sysfs files
> > > > > for device creation and its life cycle management.
> > > > >
> > > > > Here is the summary of discussion on v1:
> > > > > 1. Discover mediated device:
> > > > > As part of physical device initialization process, vendor driver will
> > > > > register their physical devices, which will be used to create virtual
> > > > > device (mediated device, aka mdev) to the mediated framework.
> > > > >
> > > > > Vendor driver should specify mdev_supported_types in directory format.
> > > > > This format is class based, for example, display class directory format
> > > > > should be as below. We need to define such set for each class of devices
> > > > > which would be supported by mediated device framework.
> > > > >
> > > > >  --- mdev_destroy
> > > > >  --- mdev_supported_types
> > > > >      |-- 11
> > > > >      |   |-- create
> > > > >      |   |-- name
> > > > >      |   |-- fb_length
> > > > >      |   |-- resolution
> > > > >      |   |-- heads
> > > > >      |   |-- max_instances
> > > > >      |   |-- params
> > > > >      |   |-- requires_group
> > > > >      |-- 12
> > > > >      |   |-- create
> > > > >      |   |-- name
> > > > >      |   |-- fb_length
> > > > >      |   |-- resolution
> > > > >      |   |-- heads
> > > > >      |   |-- max_instances
> > > > >      |   |-- params
> > > > >      |   |-- requires_group
> > > > >      |-- 13
> > > > >          |-- create
> > > > >          |-- name
> > > > >          |-- fb_length
> > > > >          |-- resolution
> > > > >          |-- heads
> > > > >          |-- max_instances
> > > > >          |-- params
> > > > >          |-- requires_group
> > > > >
> > > > >
> > > > > In the above example directory '11' represents a type id of mdev device.
> > > > > 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> > > > > 'requires_group' would be Read-Only files that vendor would provide to
> > > > > describe about that type.
> > > > >
> > > > > 'create':
> > > > >     Write-only file. Mandatory.
> > > > >     Accepts string to create mediated device.
> > > > >
> > > > > 'name':
> > > > >     Read-Only file. Mandatory.
> > > > >     Returns string, the name of that type id.
> > > >
> > > > Presumably this is a human-targetted title/description of
> > > > the device.
> > > >
> > > > >
> > > > > 'fb_length':
> > > > >     Read-only file. Mandatory.
> > > > >     Returns <number>{K,M,G}, size of framebuffer.
> > > > >
> > > > > 'resolution':
> > > > >     Read-Only file. Mandatory.
> > > > >     Returns 'hres x vres' format. Maximum supported resolution.
> > > > >
> > > > > 'heads':
> > > > >     Read-Only file. Mandatory.
> > > > >     Returns integer. Number of maximum heads supported.
> > > >
> > > > None of these should be mandatory as that makes the mdev
> > > > useless for non-GPU devices.
> > > >
> > > > I'd expect to see a 'class' or 'type' attribute in the
> > > > directory whcih tells you what kind of mdev it is. A
> > > > valid 'class' value would be 'gpu'. The fb_length,
> > > > resolution, and heads parameters would only be mandatory
> > > > when class==gpu.
> > > >
> > >
> > > Hi Daniel,
> > >
> > > Here you are proposing to add a class named "gpu", which will make all those gpu
> > > related attributes mandatory, which libvirt can allow user to better
> > > parse/present a particular mdev configuration?
> > >
> > > I am just wondering if there is another option that we just make all those
> > > attributes that a mdev device can have as optional but still meaningful to
> > > libvirt, so libvirt can still parse / recognize them as an class "mdev".
> > 
> > 'mdev' isn't a class - mdev is the name of the kernel module. The class
> > refers to the broad capability of the device. class would be things
> > like "gpu", "nic", "fpga" or other such things. The point of the class
> > is to identify which other attributes will be considered mandatory.
> > 
> > 
> 
> Thanks Daniel. This class definition makes sense to me.
> 
> However I'm not sure whether we should define such common mandatory attributes
> of a 'gpu' class now. Intel will go with a 2's power sharing of type definition... actual 
> type name to be finalized, but an example looks like below:
> 
> [GVTG-SKL-x2]: available instances (2)
> [GVTG-SKL-x4]: available instances (4)
> [GVTG-SKL-x8]: available instances (8)
> ...
> 
> User can create different types of vGPUs simultaneously. A GVTG-SKL-x2 type
> vGPU will get half of the physical GPU resource, while a GVTG-SKL-x4 type will
> get a quarter. However it's unclear to me how we want to enumerate those 
> resources into resolution or heads. I feel it'd be more reasonable for us to push 
> initial libvirt mdev support w/o vgpu specific class definition, until we see 
> a clear value of doing so (at that time we then follow Daniel's guideline to define
> mandatory attributes common to all GPU vendors).

Libvirt won't report arbitrary vendor define attributes. So if we are not
going to define a gpu class & associated attributes, then there will be
no reporting of the 'heads', 'resolution', 'fb_length' data described
above.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-29 14:38         ` Daniel P. Berrange
@ 2016-09-29 14:42           ` Tian, Kevin
  2016-09-30  5:19             ` Kirti Wankhede
  0 siblings, 1 reply; 62+ messages in thread
From: Tian, Kevin @ 2016-09-29 14:42 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Neo Jia, Kirti Wankhede, libvir-list, Andy Currid, qemu-devel,
	Song, Jike, Alex Williamson, Gerd Hoffmann, Paolo Bonzini,
	bjsdjshi

> From: Daniel P. Berrange [mailto:berrange@redhat.com]
> Sent: Thursday, September 29, 2016 10:39 PM
> 
> On Thu, Sep 29, 2016 at 02:35:48PM +0000, Tian, Kevin wrote:
> > > From: Daniel P. Berrange [mailto:berrange@redhat.com]
> > > Sent: Thursday, September 29, 2016 4:06 PM
> > >
> > > On Wed, Sep 28, 2016 at 12:48:33PM -0700, Neo Jia wrote:
> > > > On Tue, Sep 20, 2016 at 10:47:53AM +0100, Daniel P. Berrange wrote:
> > > > > On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
> > > > > >
> > > > > > Hi libvirt experts,
> > > > > >
> > > > > > Thanks for valuable input on v1 version of RFC.
> > > > > >
> > > > > > Quick brief, VFIO based mediated device framework provides a way to
> > > > > > virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> > > > > > and IBM's channel IO. This framework reuses VFIO APIs for all the
> > > > > > functionalities for mediated devices which are currently being used for
> > > > > > pass through devices. This framework introduces a set of new sysfs files
> > > > > > for device creation and its life cycle management.
> > > > > >
> > > > > > Here is the summary of discussion on v1:
> > > > > > 1. Discover mediated device:
> > > > > > As part of physical device initialization process, vendor driver will
> > > > > > register their physical devices, which will be used to create virtual
> > > > > > device (mediated device, aka mdev) to the mediated framework.
> > > > > >
> > > > > > Vendor driver should specify mdev_supported_types in directory format.
> > > > > > This format is class based, for example, display class directory format
> > > > > > should be as below. We need to define such set for each class of devices
> > > > > > which would be supported by mediated device framework.
> > > > > >
> > > > > >  --- mdev_destroy
> > > > > >  --- mdev_supported_types
> > > > > >      |-- 11
> > > > > >      |   |-- create
> > > > > >      |   |-- name
> > > > > >      |   |-- fb_length
> > > > > >      |   |-- resolution
> > > > > >      |   |-- heads
> > > > > >      |   |-- max_instances
> > > > > >      |   |-- params
> > > > > >      |   |-- requires_group
> > > > > >      |-- 12
> > > > > >      |   |-- create
> > > > > >      |   |-- name
> > > > > >      |   |-- fb_length
> > > > > >      |   |-- resolution
> > > > > >      |   |-- heads
> > > > > >      |   |-- max_instances
> > > > > >      |   |-- params
> > > > > >      |   |-- requires_group
> > > > > >      |-- 13
> > > > > >          |-- create
> > > > > >          |-- name
> > > > > >          |-- fb_length
> > > > > >          |-- resolution
> > > > > >          |-- heads
> > > > > >          |-- max_instances
> > > > > >          |-- params
> > > > > >          |-- requires_group
> > > > > >
> > > > > >
> > > > > > In the above example directory '11' represents a type id of mdev device.
> > > > > > 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> > > > > > 'requires_group' would be Read-Only files that vendor would provide to
> > > > > > describe about that type.
> > > > > >
> > > > > > 'create':
> > > > > >     Write-only file. Mandatory.
> > > > > >     Accepts string to create mediated device.
> > > > > >
> > > > > > 'name':
> > > > > >     Read-Only file. Mandatory.
> > > > > >     Returns string, the name of that type id.
> > > > >
> > > > > Presumably this is a human-targetted title/description of
> > > > > the device.
> > > > >
> > > > > >
> > > > > > 'fb_length':
> > > > > >     Read-only file. Mandatory.
> > > > > >     Returns <number>{K,M,G}, size of framebuffer.
> > > > > >
> > > > > > 'resolution':
> > > > > >     Read-Only file. Mandatory.
> > > > > >     Returns 'hres x vres' format. Maximum supported resolution.
> > > > > >
> > > > > > 'heads':
> > > > > >     Read-Only file. Mandatory.
> > > > > >     Returns integer. Number of maximum heads supported.
> > > > >
> > > > > None of these should be mandatory as that makes the mdev
> > > > > useless for non-GPU devices.
> > > > >
> > > > > I'd expect to see a 'class' or 'type' attribute in the
> > > > > directory whcih tells you what kind of mdev it is. A
> > > > > valid 'class' value would be 'gpu'. The fb_length,
> > > > > resolution, and heads parameters would only be mandatory
> > > > > when class==gpu.
> > > > >
> > > >
> > > > Hi Daniel,
> > > >
> > > > Here you are proposing to add a class named "gpu", which will make all those gpu
> > > > related attributes mandatory, which libvirt can allow user to better
> > > > parse/present a particular mdev configuration?
> > > >
> > > > I am just wondering if there is another option that we just make all those
> > > > attributes that a mdev device can have as optional but still meaningful to
> > > > libvirt, so libvirt can still parse / recognize them as an class "mdev".
> > >
> > > 'mdev' isn't a class - mdev is the name of the kernel module. The class
> > > refers to the broad capability of the device. class would be things
> > > like "gpu", "nic", "fpga" or other such things. The point of the class
> > > is to identify which other attributes will be considered mandatory.
> > >
> > >
> >
> > Thanks Daniel. This class definition makes sense to me.
> >
> > However I'm not sure whether we should define such common mandatory attributes
> > of a 'gpu' class now. Intel will go with a 2's power sharing of type definition... actual
> > type name to be finalized, but an example looks like below:
> >
> > [GVTG-SKL-x2]: available instances (2)
> > [GVTG-SKL-x4]: available instances (4)
> > [GVTG-SKL-x8]: available instances (8)
> > ...
> >
> > User can create different types of vGPUs simultaneously. A GVTG-SKL-x2 type
> > vGPU will get half of the physical GPU resource, while a GVTG-SKL-x4 type will
> > get a quarter. However it's unclear to me how we want to enumerate those
> > resources into resolution or heads. I feel it'd be more reasonable for us to push
> > initial libvirt mdev support w/o vgpu specific class definition, until we see
> > a clear value of doing so (at that time we then follow Daniel's guideline to define
> > mandatory attributes common to all GPU vendors).
> 
> Libvirt won't report arbitrary vendor define attributes. So if we are not
> going to define a gpu class & associated attributes, then there will be
> no reporting of the 'heads', 'resolution', 'fb_length' data described
> above.
> 

yes, that's my point. I think nvidia may put them into the 'description' attribute
just for descriptive purpose for now.
Thanks
Kevin

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-29 14:42           ` Tian, Kevin
@ 2016-09-30  5:19             ` Kirti Wankhede
  2016-10-03  8:20               ` Kirti Wankhede
  0 siblings, 1 reply; 62+ messages in thread
From: Kirti Wankhede @ 2016-09-30  5:19 UTC (permalink / raw)
  To: Tian, Kevin, Daniel P. Berrange
  Cc: Neo Jia, libvir-list, Andy Currid, qemu-devel, Song, Jike,
	Alex Williamson, Gerd Hoffmann, Paolo Bonzini, bjsdjshi



On 9/29/2016 8:12 PM, Tian, Kevin wrote:
>> From: Daniel P. Berrange [mailto:berrange@redhat.com]
>> Sent: Thursday, September 29, 2016 10:39 PM
>>
>> On Thu, Sep 29, 2016 at 02:35:48PM +0000, Tian, Kevin wrote:
>>>> From: Daniel P. Berrange [mailto:berrange@redhat.com]
>>>> Sent: Thursday, September 29, 2016 4:06 PM
>>>>
>>>> On Wed, Sep 28, 2016 at 12:48:33PM -0700, Neo Jia wrote:
>>>>> On Tue, Sep 20, 2016 at 10:47:53AM +0100, Daniel P. Berrange wrote:
>>>>>> On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
>>>>>>>
>>>>>>> Hi libvirt experts,
>>>>>>>
>>>>>>> Thanks for valuable input on v1 version of RFC.
>>>>>>>
>>>>>>> Quick brief, VFIO based mediated device framework provides a way to
>>>>>>> virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
>>>>>>> and IBM's channel IO. This framework reuses VFIO APIs for all the
>>>>>>> functionalities for mediated devices which are currently being used for
>>>>>>> pass through devices. This framework introduces a set of new sysfs files
>>>>>>> for device creation and its life cycle management.
>>>>>>>
>>>>>>> Here is the summary of discussion on v1:
>>>>>>> 1. Discover mediated device:
>>>>>>> As part of physical device initialization process, vendor driver will
>>>>>>> register their physical devices, which will be used to create virtual
>>>>>>> device (mediated device, aka mdev) to the mediated framework.
>>>>>>>
>>>>>>> Vendor driver should specify mdev_supported_types in directory format.
>>>>>>> This format is class based, for example, display class directory format
>>>>>>> should be as below. We need to define such set for each class of devices
>>>>>>> which would be supported by mediated device framework.
>>>>>>>
>>>>>>>  --- mdev_destroy
>>>>>>>  --- mdev_supported_types
>>>>>>>      |-- 11
>>>>>>>      |   |-- create
>>>>>>>      |   |-- name
>>>>>>>      |   |-- fb_length
>>>>>>>      |   |-- resolution
>>>>>>>      |   |-- heads
>>>>>>>      |   |-- max_instances
>>>>>>>      |   |-- params
>>>>>>>      |   |-- requires_group
>>>>>>>      |-- 12
>>>>>>>      |   |-- create
>>>>>>>      |   |-- name
>>>>>>>      |   |-- fb_length
>>>>>>>      |   |-- resolution
>>>>>>>      |   |-- heads
>>>>>>>      |   |-- max_instances
>>>>>>>      |   |-- params
>>>>>>>      |   |-- requires_group
>>>>>>>      |-- 13
>>>>>>>          |-- create
>>>>>>>          |-- name
>>>>>>>          |-- fb_length
>>>>>>>          |-- resolution
>>>>>>>          |-- heads
>>>>>>>          |-- max_instances
>>>>>>>          |-- params
>>>>>>>          |-- requires_group
>>>>>>>
>>>>>>>
>>>>>>> In the above example directory '11' represents a type id of mdev device.
>>>>>>> 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
>>>>>>> 'requires_group' would be Read-Only files that vendor would provide to
>>>>>>> describe about that type.
>>>>>>>
>>>>>>> 'create':
>>>>>>>     Write-only file. Mandatory.
>>>>>>>     Accepts string to create mediated device.
>>>>>>>
>>>>>>> 'name':
>>>>>>>     Read-Only file. Mandatory.
>>>>>>>     Returns string, the name of that type id.
>>>>>>
>>>>>> Presumably this is a human-targetted title/description of
>>>>>> the device.
>>>>>>
>>>>>>>
>>>>>>> 'fb_length':
>>>>>>>     Read-only file. Mandatory.
>>>>>>>     Returns <number>{K,M,G}, size of framebuffer.
>>>>>>>
>>>>>>> 'resolution':
>>>>>>>     Read-Only file. Mandatory.
>>>>>>>     Returns 'hres x vres' format. Maximum supported resolution.
>>>>>>>
>>>>>>> 'heads':
>>>>>>>     Read-Only file. Mandatory.
>>>>>>>     Returns integer. Number of maximum heads supported.
>>>>>>
>>>>>> None of these should be mandatory as that makes the mdev
>>>>>> useless for non-GPU devices.
>>>>>>
>>>>>> I'd expect to see a 'class' or 'type' attribute in the
>>>>>> directory whcih tells you what kind of mdev it is. A
>>>>>> valid 'class' value would be 'gpu'. The fb_length,
>>>>>> resolution, and heads parameters would only be mandatory
>>>>>> when class==gpu.
>>>>>>
>>>>>
>>>>> Hi Daniel,
>>>>>
>>>>> Here you are proposing to add a class named "gpu", which will make all those gpu
>>>>> related attributes mandatory, which libvirt can allow user to better
>>>>> parse/present a particular mdev configuration?
>>>>>
>>>>> I am just wondering if there is another option that we just make all those
>>>>> attributes that a mdev device can have as optional but still meaningful to
>>>>> libvirt, so libvirt can still parse / recognize them as an class "mdev".
>>>>
>>>> 'mdev' isn't a class - mdev is the name of the kernel module. The class
>>>> refers to the broad capability of the device. class would be things
>>>> like "gpu", "nic", "fpga" or other such things. The point of the class
>>>> is to identify which other attributes will be considered mandatory.
>>>>
>>>>
>>>
>>> Thanks Daniel. This class definition makes sense to me.
>>>
>>> However I'm not sure whether we should define such common mandatory attributes
>>> of a 'gpu' class now. Intel will go with a 2's power sharing of type definition... actual
>>> type name to be finalized, but an example looks like below:
>>>
>>> [GVTG-SKL-x2]: available instances (2)
>>> [GVTG-SKL-x4]: available instances (4)
>>> [GVTG-SKL-x8]: available instances (8)
>>> ...
>>>
>>> User can create different types of vGPUs simultaneously. A GVTG-SKL-x2 type
>>> vGPU will get half of the physical GPU resource, while a GVTG-SKL-x4 type will
>>> get a quarter. However it's unclear to me how we want to enumerate those
>>> resources into resolution or heads. I feel it'd be more reasonable for us to push
>>> initial libvirt mdev support w/o vgpu specific class definition, until we see
>>> a clear value of doing so (at that time we then follow Daniel's guideline to define
>>> mandatory attributes common to all GPU vendors).
>>
>> Libvirt won't report arbitrary vendor define attributes. So if we are not
>> going to define a gpu class & associated attributes, then there will be
>> no reporting of the 'heads', 'resolution', 'fb_length' data described
>> above.
>>
> 
> yes, that's my point. I think nvidia may put them into the 'description' attribute
> just for descriptive purpose for now.


Will libvirt report 'description' RO attribute, its output would be
string, so that user could be able to see the configuration of that
profile?

We can have 'class' as optional attribute. So Intel don't have to
provide 'class' attribute and they don't have to specify mandatory
attributes of that class. We would provide 'class' attribute and provide
mandatory attributes.

Thanks,
Kirti

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-09-30  5:19             ` Kirti Wankhede
@ 2016-10-03  8:20               ` Kirti Wankhede
  2016-10-07  5:16                 ` Kirti Wankhede
  0 siblings, 1 reply; 62+ messages in thread
From: Kirti Wankhede @ 2016-10-03  8:20 UTC (permalink / raw)
  To: Tian, Kevin, Daniel P. Berrange
  Cc: Neo Jia, libvir-list, Andy Currid, qemu-devel, Song, Jike,
	Alex Williamson, Gerd Hoffmann, Paolo Bonzini, bjsdjshi



On 9/30/2016 10:49 AM, Kirti Wankhede wrote:
> 
...

>>>>>> Hi Daniel,
>>>>>>
>>>>>> Here you are proposing to add a class named "gpu", which will make all those gpu
>>>>>> related attributes mandatory, which libvirt can allow user to better
>>>>>> parse/present a particular mdev configuration?
>>>>>>
>>>>>> I am just wondering if there is another option that we just make all those
>>>>>> attributes that a mdev device can have as optional but still meaningful to
>>>>>> libvirt, so libvirt can still parse / recognize them as an class "mdev".
>>>>>
>>>>> 'mdev' isn't a class - mdev is the name of the kernel module. The class
>>>>> refers to the broad capability of the device. class would be things
>>>>> like "gpu", "nic", "fpga" or other such things. The point of the class
>>>>> is to identify which other attributes will be considered mandatory.
>>>>>
>>>>>
>>>>
>>>> Thanks Daniel. This class definition makes sense to me.
>>>>
>>>> However I'm not sure whether we should define such common mandatory attributes
>>>> of a 'gpu' class now. Intel will go with a 2's power sharing of type definition... actual
>>>> type name to be finalized, but an example looks like below:
>>>>
>>>> [GVTG-SKL-x2]: available instances (2)
>>>> [GVTG-SKL-x4]: available instances (4)
>>>> [GVTG-SKL-x8]: available instances (8)
>>>> ...
>>>>
>>>> User can create different types of vGPUs simultaneously. A GVTG-SKL-x2 type
>>>> vGPU will get half of the physical GPU resource, while a GVTG-SKL-x4 type will
>>>> get a quarter. However it's unclear to me how we want to enumerate those
>>>> resources into resolution or heads. I feel it'd be more reasonable for us to push
>>>> initial libvirt mdev support w/o vgpu specific class definition, until we see
>>>> a clear value of doing so (at that time we then follow Daniel's guideline to define
>>>> mandatory attributes common to all GPU vendors).
>>>
>>> Libvirt won't report arbitrary vendor define attributes. So if we are not
>>> going to define a gpu class & associated attributes, then there will be
>>> no reporting of the 'heads', 'resolution', 'fb_length' data described
>>> above.
>>>
>>
>> yes, that's my point. I think nvidia may put them into the 'description' attribute
>> just for descriptive purpose for now.
> 
> 
> Will libvirt report 'description' RO attribute, its output would be
> string, so that user could be able to see the configuration of that
> profile?
>

Daniel,
Waiting for your input on this.

> We can have 'class' as optional attribute. So Intel don't have to
> provide 'class' attribute and they don't have to specify mandatory
> attributes of that class. We would provide 'class' attribute and provide
> mandatory attributes.


Thanks,
Kirti

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-10-03  8:20               ` Kirti Wankhede
@ 2016-10-07  5:16                 ` Kirti Wankhede
  2016-10-07 19:09                   ` Alex Williamson
  0 siblings, 1 reply; 62+ messages in thread
From: Kirti Wankhede @ 2016-10-07  5:16 UTC (permalink / raw)
  To: Tian, Kevin, Daniel P. Berrange
  Cc: Andy Currid, Song, Jike, Neo Jia, libvir-list, qemu-devel,
	Alex Williamson, Gerd Hoffmann, Paolo Bonzini, bjsdjshi

Ping..

Pulling the questions at the top.

>> Will libvirt report 'description' RO attribute, its output would be
>> string, so that user could be able to see the configuration of that
>> profile?
>>
>
> Daniel,
> Waiting for your input on this.
>
>> We can have 'class' as optional attribute. So Intel don't have to
>> provide 'class' attribute and they don't have to specify mandatory
>> attributes of that class. We would provide 'class' attribute and provide
>> mandatory attributes.
>

Thanks,
Kirti


On 10/3/2016 1:50 PM, Kirti Wankhede wrote:
> 
> 
> On 9/30/2016 10:49 AM, Kirti Wankhede wrote:
>>
> ...
> 
>>>>>>> Hi Daniel,
>>>>>>>
>>>>>>> Here you are proposing to add a class named "gpu", which will make all those gpu
>>>>>>> related attributes mandatory, which libvirt can allow user to better
>>>>>>> parse/present a particular mdev configuration?
>>>>>>>
>>>>>>> I am just wondering if there is another option that we just make all those
>>>>>>> attributes that a mdev device can have as optional but still meaningful to
>>>>>>> libvirt, so libvirt can still parse / recognize them as an class "mdev".
>>>>>>
>>>>>> 'mdev' isn't a class - mdev is the name of the kernel module. The class
>>>>>> refers to the broad capability of the device. class would be things
>>>>>> like "gpu", "nic", "fpga" or other such things. The point of the class
>>>>>> is to identify which other attributes will be considered mandatory.
>>>>>>
>>>>>>
>>>>>
>>>>> Thanks Daniel. This class definition makes sense to me.
>>>>>
>>>>> However I'm not sure whether we should define such common mandatory attributes
>>>>> of a 'gpu' class now. Intel will go with a 2's power sharing of type definition... actual
>>>>> type name to be finalized, but an example looks like below:
>>>>>
>>>>> [GVTG-SKL-x2]: available instances (2)
>>>>> [GVTG-SKL-x4]: available instances (4)
>>>>> [GVTG-SKL-x8]: available instances (8)
>>>>> ...
>>>>>
>>>>> User can create different types of vGPUs simultaneously. A GVTG-SKL-x2 type
>>>>> vGPU will get half of the physical GPU resource, while a GVTG-SKL-x4 type will
>>>>> get a quarter. However it's unclear to me how we want to enumerate those
>>>>> resources into resolution or heads. I feel it'd be more reasonable for us to push
>>>>> initial libvirt mdev support w/o vgpu specific class definition, until we see
>>>>> a clear value of doing so (at that time we then follow Daniel's guideline to define
>>>>> mandatory attributes common to all GPU vendors).
>>>>
>>>> Libvirt won't report arbitrary vendor define attributes. So if we are not
>>>> going to define a gpu class & associated attributes, then there will be
>>>> no reporting of the 'heads', 'resolution', 'fb_length' data described
>>>> above.
>>>>
>>>
>>> yes, that's my point. I think nvidia may put them into the 'description' attribute
>>> just for descriptive purpose for now.
>>
>>
>> Will libvirt report 'description' RO attribute, its output would be
>> string, so that user could be able to see the configuration of that
>> profile?
>>
> 
> Daniel,
> Waiting for your input on this.
> 
>> We can have 'class' as optional attribute. So Intel don't have to
>> provide 'class' attribute and they don't have to specify mandatory
>> attributes of that class. We would provide 'class' attribute and provide
>> mandatory attributes.
> 
> 
> Thanks,
> Kirti
> 

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
  2016-10-07  5:16                 ` Kirti Wankhede
@ 2016-10-07 19:09                   ` Alex Williamson
  0 siblings, 0 replies; 62+ messages in thread
From: Alex Williamson @ 2016-10-07 19:09 UTC (permalink / raw)
  To: Kirti Wankhede
  Cc: Tian, Kevin, Daniel P. Berrange, Andy Currid, Song, Jike,
	Neo Jia, libvir-list, qemu-devel, Gerd Hoffmann, Paolo Bonzini,
	bjsdjshi

On Fri, 7 Oct 2016 10:46:22 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> Ping..
> 
> Pulling the questions at the top.
> 
> >> Will libvirt report 'description' RO attribute, its output would be
> >> string, so that user could be able to see the configuration of that
> >> profile?
> >>  
> >
> > Daniel,
> > Waiting for your input on this.
> >  
> >> We can have 'class' as optional attribute. So Intel don't have to
> >> provide 'class' attribute and they don't have to specify mandatory
> >> attributes of that class. We would provide 'class' attribute and provide
> >> mandatory attributes.  

Hi Kirti,

I would proceed with your design and posting, this is not a
sufficiently significant issue to delay a new version.  The next
posting will likely not be the last, more feedback will come.

Personally I don't see an issue.  We should have a small number
of mandatory sysfs interfaces for mdev devices, essentially just
enough to manage the life cycle of a device, even if it requires
vendor specific knowledge to understand the exact composition of
that device.  Some sort of human readable description seems like
a reasonable optional feature for a userspace tool like libvirt
to take advantage of.  A class definition with an available set
of attributes per class would clearly enhance the ability of a
userspace tool to understand the device, but does not seem
strictly required for managing the device.  Thanks,

Alex

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2016-10-07 19:09 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-19 20:35 [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration Kirti Wankhede
2016-09-19 21:36 ` Alex Williamson
2016-09-19 21:50   ` Paolo Bonzini
2016-09-19 22:25     ` Alex Williamson
2016-09-20 14:35       ` Kirti Wankhede
2016-09-20 14:41         ` Daniel P. Berrange
2016-09-20 14:49           ` Paolo Bonzini
2016-09-20 14:58             ` Daniel P. Berrange
2016-09-20 15:05               ` Paolo Bonzini
2016-09-20 15:14                 ` Daniel P. Berrange
2016-09-20 16:31                   ` Kirti Wankhede
2016-09-20 16:36                     ` Daniel P. Berrange
2016-09-20 16:42                       ` Kirti Wankhede
2016-09-20 16:44                         ` Daniel P. Berrange
2016-09-20 16:46                     ` Daniel P. Berrange
2016-09-20 17:21                   ` Paolo Bonzini
2016-09-21  8:34                     ` Daniel P. Berrange
2016-09-20 14:52         ` Alex Williamson
2016-09-20  1:25   ` Tian, Kevin
2016-09-20 14:21   ` Kirti Wankhede
2016-09-20 14:43     ` Alex Williamson
2016-09-20 16:23       ` Kirti Wankhede
2016-09-20 16:50         ` Alex Williamson
2016-09-21 18:34           ` Kirti Wankhede
2016-09-21 19:03             ` Alex Williamson
2016-09-22  4:11               ` Kirti Wankhede
2016-09-22 14:19                 ` Alex Williamson
2016-09-22 14:26                   ` [Qemu-devel] [libvirt] " Daniel P. Berrange
2016-09-28 19:22                     ` Neo Jia
2016-09-28 19:45                       ` Tian, Kevin
2016-09-28 19:59                         ` Neo Jia
2016-09-28 20:31                           ` Laine Stump
2016-09-28 20:47                             ` Neo Jia
2016-09-28 22:49                           ` Alex Williamson
2016-09-28 19:55                       ` Alex Williamson
2016-09-28 20:06                         ` Neo Jia
2016-09-28 22:39                           ` Alex Williamson
2016-09-29  8:03                       ` Daniel P. Berrange
2016-09-29  8:12                         ` Neo Jia
2016-09-29 14:22               ` [Qemu-devel] " Kirti Wankhede
2016-09-21  4:10         ` Tian, Kevin
2016-09-21  4:43           ` Alex Williamson
2016-09-22  2:43             ` Tian, Kevin
2016-09-22 19:25         ` Tian, Kevin
2016-09-23 18:34           ` Kirti Wankhede
2016-09-21  3:56     ` Tian, Kevin
2016-09-21  4:36       ` Alex Williamson
2016-09-22  2:33         ` Tian, Kevin
2016-09-22  3:01           ` Alex Williamson
2016-09-22  3:42             ` Tian, Kevin
     [not found]         ` <AADFC41AFE54684AB9EE6CBC0274A5D18DF86F5F@SHSMSX101.ccr.corp.intel.com>
2016-09-22  2:59           ` Tian, Kevin
2016-09-20  1:37 ` Tian, Kevin
2016-09-20  9:47 ` Daniel P. Berrange
2016-09-28 19:48   ` Neo Jia
2016-09-29  8:06     ` Daniel P. Berrange
2016-09-29 14:35       ` Tian, Kevin
2016-09-29 14:38         ` Daniel P. Berrange
2016-09-29 14:42           ` Tian, Kevin
2016-09-30  5:19             ` Kirti Wankhede
2016-10-03  8:20               ` Kirti Wankhede
2016-10-07  5:16                 ` Kirti Wankhede
2016-10-07 19:09                   ` Alex Williamson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.