From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:56870) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bmHel-00058Z-Ns for qemu-devel@nongnu.org; Tue, 20 Sep 2016 05:48:05 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bmHeh-0004Uc-DH for qemu-devel@nongnu.org; Tue, 20 Sep 2016 05:48:02 -0400
Received: from mx1.redhat.com ([209.132.183.28]:57842) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bmHeh-0004TM-2Y for qemu-devel@nongnu.org; Tue, 20 Sep 2016 05:47:59 -0400
Date: Tue, 20 Sep 2016 10:47:53 +0100
From: "Daniel P. Berrange"
Message-ID: <20160920094753.GB25490@redhat.com>
Reply-To: "Daniel P. Berrange"
References: <00d96f24-5df0-d16b-d4e1-838333989dee@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <00d96f24-5df0-d16b-d4e1-838333989dee@nvidia.com>
Subject: Re: [Qemu-devel] [RFC v2] libvirt vGPU QEMU integration
To: Kirti Wankhede
Cc: "libvir-list@redhat.com" , Andy Currid , "Tian, Kevin" , Neo Jia , qemu-devel , "Song, Jike" , Alex Williamson , Gerd Hoffmann , Paolo Bonzini , "bjsdjshi@linux.vnet.ibm.com"

On Tue, Sep 20, 2016 at 02:05:52AM +0530, Kirti Wankhede wrote:
>
> Hi libvirt experts,
>
> Thanks for valuable input on v1 version of RFC.
>
> Quick brief, VFIO based mediated device framework provides a way to
> virtualize their devices without SR-IOV, like NVIDIA vGPU, Intel KVMGT
> and IBM's channel IO. This framework reuses VFIO APIs for all the
> functionalities for mediated devices which are currently being used for
> pass through devices. This framework introduces a set of new sysfs files
> for device creation and its life cycle management.
>
> Here is the summary of discussion on v1:
> 1.
Discover mediated device:
> As part of physical device initialization process, vendor driver will
> register their physical devices, which will be used to create virtual
> device (mediated device, aka mdev) to the mediated framework.
>
> Vendor driver should specify mdev_supported_types in directory format.
> This format is class based, for example, display class directory format
> should be as below. We need to define such set for each class of devices
> which would be supported by mediated device framework.
>
> --- mdev_destroy
> --- mdev_supported_types
>     |-- 11
>     |   |-- create
>     |   |-- name
>     |   |-- fb_length
>     |   |-- resolution
>     |   |-- heads
>     |   |-- max_instances
>     |   |-- params
>     |   |-- requires_group
>     |-- 12
>     |   |-- create
>     |   |-- name
>     |   |-- fb_length
>     |   |-- resolution
>     |   |-- heads
>     |   |-- max_instances
>     |   |-- params
>     |   |-- requires_group
>     |-- 13
>         |-- create
>         |-- name
>         |-- fb_length
>         |-- resolution
>         |-- heads
>         |-- max_instances
>         |-- params
>         |-- requires_group
>
>
> In the above example directory '11' represents a type id of mdev device.
> 'name', 'fb_length', 'resolution', 'heads', 'max_instance' and
> 'requires_group' would be Read-Only files that vendor would provide to
> describe about that type.
>
> 'create':
> Write-only file. Mandatory.
> Accepts string to create mediated device.
>
> 'name':
> Read-Only file. Mandatory.
> Returns string, the name of that type id.

Presumably this is a human-targeted title/description of
the device.

>
> 'fb_length':
> Read-only file. Mandatory.
> Returns {K,M,G}, size of framebuffer.
>
> 'resolution':
> Read-Only file. Mandatory.
> Returns 'hres x vres' format. Maximum supported resolution.
>
> 'heads':
> Read-Only file. Mandatory.
> Returns integer. Number of maximum heads supported.

None of these should be mandatory as that makes the mdev
useless for non-GPU devices.

I'd expect to see a 'class' or 'type' attribute in the
directory which tells you what kind of mdev it is. A
valid 'class' value would be 'gpu'.
The fb_length, resolution, and heads parameters would only
be mandatory when class==gpu.

> 'max_instance':
> Read-Only file. Mandatory.
> Returns integer. Returns maximum mdev device could be created
> at the moment when this file is read. This count would be updated by
> vendor driver. Before creating mdev device of this type, check if
> max_instance is > 0.
>
> 'params'
> Write-Only file. Optional.
> String input. Libvirt would pass the string given in XML file to
> this file and then create mdev device. Set empty string to clear params.
> For example, set parameter 'frame_rate_limiter=0' to disable frame rate
> limiter for performance benchmarking, then create device of type 11. The
> device created would have that parameter set by vendor driver.

Nope, libvirt will explicitly *NEVER* allow arbitrary opaque
passthrough of vendor specific data in this way.

> The parent device would look like:
>
>
> pci_0000_86_00_0
>
> 0
> 134
> 0
> 0
>
>
>
>
> GRID M60-0B
> 512M
> 2560x1600
> 2
> 16
> 1
>

There would need to be a element, eg

  gpu

We would then have further elements based on the class. eg

  GRID M60-0B
  512M
  2560x1600
  2
  16
  1

>
> GRID M60
> NVIDIA
>
>
>
> 2. Create/destroy mediated device
>
> With above example, vGPU device XML would look like:
>
>
> my-vgpu
> pci_0000_86_00_0
>
>
> 1
> 'frame_rate_limiter=0'

No, we will not support in this manner in libvirt. The
entire purpose of libvirt is to represent data in a vendor
agnostic manner and not do arbitrary passthrough of vendor
specific data. Simply saying this field is optional does not
get around that either.

>
>
>
> 'type id' is mandatory.
> 'group' is optional. It should be a unique number in the system among
> all the groups created for mdev devices. Its usage is:
> - not needed if single vGPU device is being assigned to a domain.
> - only need to be set if multiple vGPUs need to be assigned to a
> domain and vendor driver have 'requires_group' file in type id directory.
> - if type id directory include 'requires_group' and user tries to
> assign multiple vGPUs to a domain without having field in XML,
> it will create single vGPU.
>
> 'params' is optional field. User should set this field if extra
> parameters need to be set for a particular vGPU device. Libvirt don't
> need to parse these params. These are meant for vendor driver.
>
> Libvirt need to follow the sequence to create device:
> * Read /sys/../0000\:86\:00.0/11/max_instances. If it is greater than 0,
> then only proceed else fail.
>
> * Set extra params if 'params' field exist in device XML and 'params'
> file exist in type id directory
>
> echo "frame_rate_limiter=0" > /sys/../0000\:86\:00.0/11/params

We cannot do that step.

>
> * Autogenerate UUID
> * Create device:
>
> echo "$UUID:" > /sys/../0000\:86\:00.0/11/create
>
> where is optional. Group should be unique number among all
> the groups created for mdev devices.
>
> * Clear params, if set earlier:
>
> echo "" > /sys/../0000\:86\:00.0/11/params
>
> * To destroy device:
>
> echo $UUID > /sys/../0000\:86\:00.0/mdev_destroy
>
>
> 3. Start/stop mediated device
>
> No change or requirement for libvirt as this will be handled by open()
> and close() callbacks to vendor driver. In case of multiple devices and
> 'requires_group' set, this will be handled in 'first open()' and 'last
> close()' on device in that group.
>
> 4. Launch QEMU/VM
>
> Pass the mdev sysfs path to QEMU as vfio-pci device.
> For above vGPU device example:
>
> -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID
>
> 5. QEMU/VM Shutdown sequence
>
> No change or requirement for libvirt.
>
> 6. VM Reset
>
> No change or requirement for libvirt as this will be handled via VFIO
> reset API and QEMU process will keep running as before.
>
> 7. Hot-plug
>
> It is same syntax to create a virtual device for hot-plug.
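
To illustrate, here is a rough sketch of how the creation sequence might
look with a class attribute and without any 'params' step. It is only a
sketch under assumptions: the 'class' file is the attribute suggested in
this reply and is not part of the RFC, the function names mandatory_attrs
and create_mdev are made up for illustration, the type directory is passed
in by the caller rather than hardcoded, and the optional group suffix on
the 'create' string is left out.

```shell
#!/bin/sh
# Sketch only. The 'class' file is the attribute suggested in this reply,
# not an existing sysfs file; mandatory_attrs and create_mdev are
# hypothetical helper names.

# Print the class-specific attribute files that must be present.
mandatory_attrs() {
    case "$1" in
        gpu) echo "fb_length resolution heads" ;;
        *)   echo "" ;;
    esac
}

# create_mdev TYPE_DIR UUID
# Validates the type directory, then writes UUID to its 'create' file.
# Returns 0 on success and 1 if any check fails.
create_mdev() {
    type_dir=$1
    uuid=$2

    # The class decides which descriptive attributes are required.
    [ -r "$type_dir/class" ] || { echo "missing class" >&2; return 1; }
    class=$(cat "$type_dir/class")

    for attr in $(mandatory_attrs "$class"); do
        [ -r "$type_dir/$attr" ] || {
            echo "missing $attr for class $class" >&2
            return 1
        }
    done

    # Only proceed if the vendor driver reports free instances.
    [ -r "$type_dir/max_instances" ] || { echo "missing max_instances" >&2; return 1; }
    max=$(cat "$type_dir/max_instances")
    [ "$max" -gt 0 ] || { echo "no instances available" >&2; return 1; }

    # No vendor-specific 'params' write takes place before creation.
    printf '%s\n' "$uuid" > "$type_dir/create"
}
```

It can be tried against an ordinary directory populated with plain files
standing in for the sysfs attributes, eg create_mdev "$dir" "$UUID".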

Regards,
Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|