From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Tian, Kevin"
Subject: RE: [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.
Date: Wed, 17 Feb 2016 05:04:31 +0000
References: <1454527963.18969.8.camel@redhat.com> <20160216071304.GA6867@nvidia.com> <20160216073647.GB6867@nvidia.com> <20160216075310.GC6867@nvidia.com> <20160216084855.GA7717@nvidia.com> <20160217041743.GA7903@nvidia.com>
In-Reply-To: <20160217041743.GA7903@nvidia.com>
To: Neo Jia
Cc: Alex Williamson, Gerd Hoffmann, Kirti Wankhede, Paolo Bonzini, "Ruan, Shuai", "Song, Jike", "Lv, Zhiyuan", "kvm@vger.kernel.org", qemu-devel

> From: Neo Jia
> Sent: Wednesday, February 17, 2016 12:18 PM
>
> On Wed, Feb 17, 2016 at 03:31:24AM +0000, Tian, Kevin wrote:
> > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > Sent: Tuesday, February 16, 2016 4:49 PM
> > >
> > > On Tue, Feb 16, 2016 at 08:10:42AM +0000, Tian, Kevin wrote:
> > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > Sent: Tuesday, February 16, 2016 3:53 PM
> > > > >
> > > > > On Tue, Feb 16, 2016 at 07:40:47AM +0000, Tian, Kevin wrote:
> > > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > > Sent: Tuesday, February 16, 2016 3:37 PM
> > > > > > >
> > > > > > > On Tue, Feb 16, 2016 at 07:27:09AM +0000, Tian, Kevin wrote:
> > > > > > > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > > > > > > Sent: Tuesday, February 16, 2016 3:13 PM
> > > > > > > > >
> > > > > > > > > On Tue, Feb 16, 2016 at 06:49:30AM +0000, Tian, Kevin wrote:
> > > > > > > > > > > From: Alex Williamson
> > > > > > > > > > > [mailto:alex.williamson@redhat.com]
> > > > > > > > > > > Sent: Thursday, February 04, 2016 3:33 AM
> > > > > > > > > > >
> > > > > > > > > > > On Wed, 2016-02-03 at 09:28 +0100, Gerd Hoffmann wrote:
> > > > > > > > > > > >   Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > > Actually I have a long puzzle in this area. Definitely libvirt will
> > > > > > > > > > > > > use UUID to mark a VM. And obviously UUID is not recorded within
> > > > > > > > > > > > > KVM. Then how does libvirt talk to KVM based on UUID? It could be a
> > > > > > > > > > > > > good reference for this design.
> > > > > > > > > > > >
> > > > > > > > > > > > libvirt keeps track of which qemu instance belongs to which vm.
> > > > > > > > > > > > qemu also gets started with "-uuid ...", so one can query qemu via the
> > > > > > > > > > > > monitor ("info uuid") to figure out what the uuid is.  It is also in the
> > > > > > > > > > > > smbios tables so the guest can see it in the system information table.
> > > > > > > > > > > >
> > > > > > > > > > > > The uuid is not visible to the kernel though; the kvm kernel driver
> > > > > > > > > > > > doesn't know what the uuid is (and neither does vfio).  qemu uses file
> > > > > > > > > > > > handles to talk to both kvm and vfio.  qemu notifies both kvm and vfio
> > > > > > > > > > > > about any relevant events (guest address space changes etc) and
> > > > > > > > > > > > connects file descriptors (eventfd -> irqfd).
> > > > > > > > > > >
> > > > > > > > > > > I think the original link to using a VM UUID for the vGPU comes from
> > > > > > > > > > > NVIDIA having a userspace component which might get launched from a udev
> > > > > > > > > > > event as the vGPU is created or the set of vGPUs within that UUID is
> > > > > > > > > > > started.  Using the VM UUID then gives them a way to associate that
> > > > > > > > > > > userspace process with a VM instance.  Maybe it could register with
> > > > > > > > > > > libvirt for some sort of service provided for the VM, I don't know.
> > > > > > > > > >
> > > > > > > > > > Intel doesn't have this requirement. It should be enough as long as
> > > > > > > > > > libvirt maintains which sysfs vgpu node is associated with a VM UUID.
> > > > > > > > > >
> > > > > > > > > > > > qemu needs a sysfs node as a handle to the vfio device, something
> > > > > > > > > > > > like /sys/devices/virtual/vgpu/<name>.  <name> can be a uuid if you
> > > > > > > > > > > > want to have it that way, but it could be pretty much anything.  The
> > > > > > > > > > > > sysfs node will probably show up as-is in the libvirt xml when
> > > > > > > > > > > > assigning a vgpu to a vm.  So the name should be something stable
> > > > > > > > > > > > (i.e. when using a uuid as the name you had better not generate a
> > > > > > > > > > > > new one on each boot).
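Gerd's point about name stability can be sketched in userspace; this is a hypothetical helper (the function name and persistence scheme are illustrative assumptions, not part of any proposed kernel API):

```python
import uuid

# Hypothetical persistence helper: mint the node name once, then always
# reuse the stored value so the sysfs path stays stable across boots.
def make_vgpu_node_name(persisted=None):
    if persisted is not None:
        return persisted          # reuse the name recorded earlier
    return str(uuid.uuid4())      # first use: mint a fresh uuid

name = make_vgpu_node_name()              # generated once, then persisted
assert make_vgpu_node_name(name) == name  # identical on every later "boot"
```

A management stack would store `name` (e.g. in its VM config) and pass it back on every later boot instead of generating a new one, which is exactly the stability Gerd asks for.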
> > > > > > > > > > >
> > > > > > > > > > > Actually I don't think there's really a persistent naming issue; that's
> > > > > > > > > > > probably where we diverge from the SR-IOV model.  SR-IOV cannot
> > > > > > > > > > > dynamically add a new VF; it needs to reset the number of VFs to zero,
> > > > > > > > > > > then re-allocate all of them up to the new desired count.  That has some
> > > > > > > > > > > obvious implications.  I think with both vendors here, we can
> > > > > > > > > > > dynamically allocate new vGPUs, so I would expect that libvirt would
> > > > > > > > > > > create each vGPU instance as it's needed.  None would be created by
> > > > > > > > > > > default without user interaction.
> > > > > > > > > > >
> > > > > > > > > > > Personally I think using a UUID makes sense, but it needs to be
> > > > > > > > > > > userspace policy whether that UUID has any implicit meaning like
> > > > > > > > > > > matching the VM UUID.  Having an index within a UUID bothers me a bit,
> > > > > > > > > > > but it doesn't seem like too much of a concession to enable the use case
> > > > > > > > > > > that NVIDIA is trying to achieve.  Thanks,
> > > > > > > > > >
> > > > > > > > > > I would prefer to make UUID an optional parameter, while not tying
> > > > > > > > > > sysfs vgpu naming to UUID. This would be more flexible for different
> > > > > > > > > > scenarios where UUID might not be required.
> > > > > > > > >
> > > > > > > > > Hi Kevin,
> > > > > > > > >
> > > > > > > > > Happy Chinese New Year!
> > > > > > > > >
> > > > > > > > > I think having UUID as the vgpu device name will allow us to have a
> > > > > > > > > gpu vendor agnostic solution for the upper layer software stack such
> > > > > > > > > as QEMU, which is supposed to open the device.
> > > > > > > >
> > > > > > > > Qemu can use whatever sysfs path is provided to open the device, regardless
> > > > > > > > of whether there is a UUID within the path...
> > > > > > >
> > > > > > > Hi Kevin,
> > > > > > >
> > > > > > > Then it will provide even more benefit of using UUID, as libvirt can be
> > > > > > > implemented as gpu vendor agnostic, right? :-)
> > > > > > >
> > > > > > > The UUID can be the VM UUID or a vGPU group object UUID, which really
> > > > > > > depends on the high level software stack; again, the benefit is being gpu
> > > > > > > vendor agnostic.
> > > > > >
> > > > > > There are cases where libvirt is not used and the mgmt. stack doesn't use
> > > > > > UUID, e.g. in some Xen scenarios. So it's not about being GPU vendor
> > > > > > agnostic; it's about being high level mgmt. stack agnostic. That's why we
> > > > > > need to make UUID optional in this vGPU-core framework.
> > > > >
> > > > > Hi Kevin,
> > > > >
> > > > > As long as you have to create an object to represent a vGPU or vGPU group,
> > > > > you will have a UUID, no matter which management stack you are going to use.
> > > > >
> > > > > UUID is the most agnostic way to represent an object, I think.
> > > > >
> > > > > (a bit off topic since we are supposed to focus on VFIO on KVM)
> > > > >
> > > > > Since you are now talking about Xen, I am very happy to discuss that with
> > > > > you. You can check how Xen has managed its objects via UUID in xapi.
> > > >
> > > > Well, I'm not the expert in this area. IMHO UUID is just a user level
> > > > attribute, which can be associated with any sysfs node and managed by the
> > > > mgmt. stack itself, and then the sysfs path can be opened as the
> > > > bridge between user/kernel. I don't understand the necessity of binding
> > > > UUID internally within the vGPU core framework here.
> > > > Alex gave one example of udev, but I didn't quite catch why only UUID can
> > > > work there. Maybe you can elaborate on that requirement.
> > >
> > > Hi Kevin,
> > >
> > > UUID is just a way to represent an object.
> > >
> > > It is not binding, it is just a representation. I think here we are just
> > > creating a convenient and generic way to represent a virtual gpu device on
> > > sysfs.
> > >
> > > Having the UUID as part of the virtual gpu device name allows us to easily
> > > find out the mapping.
> > >
> > > UUID can be anything; you can always use a UUID to represent the VMID in the
> > > example you listed below, so you are actually gaining flexibility by using
> > > UUID instead of VMID, as it can be supported by both KVM and Xen. :-)
> > >
> > > Thanks,
> > > Neo
> >
> > Thanks Neo. I understand UUID has its merit in many usages. As you
> > may see from my earlier reply, my main concern is whether it's a must
> > to record this information within the kernel vGPU-core framework. We can
> > still make it hypervisor agnostic even without using UUID, as long as there's
> > a unified namespace created for all vgpus, like:
> > vgpu-vendor-0, vgpu-vendor-1, ...
> >
> > Then the high level mgmt. stack can associate a UUID with that namespace. So
> > I hope you can help elaborate on the description below:
> >
> > > Having the UUID as part of the virtual gpu device name allows us to easily
> > > find out the mapping.
> >
> Hi Kevin,
>
> The answer is simple: having a UUID as part of the device name will give you a
> unique sysfs path that will be opened by QEMU.
>
> vgpu-vendor-0 and vgpu-vendor-1 will not be unique, as we can have multiple
> virtual gpu devices per VM coming from the same or different physical devices.

That is not a problem. We can add physical device info too, like vgpu-vendor-0-0,
vgpu-vendor-1-0, ...

Please note Qemu doesn't care about the actual name. It just accepts a sysfs
path to open.
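Kevin's counter-proposal above can be sketched as a small naming scheme; the helper and the exact `vgpu-<vendor>-<physical>-<instance>` format are illustrative assumptions, not the patch's actual interface:

```python
# Illustrative only: encode vendor, physical device index and vGPU
# instance index into the node name so it stays unique even with
# multiple vGPUs per VM spread across several physical GPUs.
def vgpu_name(vendor, phys_idx, inst_idx):
    return "vgpu-{}-{}-{}".format(vendor, phys_idx, inst_idx)

assert vgpu_name("vendor", 0, 0) == "vgpu-vendor-0-0"
# The same instance index on different physical devices still yields
# distinct names, addressing Neo's uniqueness concern:
assert vgpu_name("vendor", 0, 0) != vgpu_name("vendor", 1, 0)
```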
>
> If you are worried about losing a meaningful name here, we can create a sysfs
> file to capture the vendor device description if you like.
>

Having the vgpu name descriptive is more informative imo. Users can simply check
sysfs names to know raw information w/o relying on a 3rd party agent to query
information around an opaque UUID.

That's why I prefer making UUID optional. By default the vgpu name would be
some description string, and when a UUID is provided, the UUID can be appended
to the string to serve your purpose.

Thanks
Kevin
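The "UUID optional" naming Kevin describes could look something like this sketch (the function name and the `-` separator are assumptions for illustration, not anything from the patch):

```python
# Default: a descriptive base string; when the management stack supplies
# a UUID, append it so tools can still recover the mapping from the name.
def vgpu_sysfs_name(description, vm_uuid=None):
    return "{}-{}".format(description, vm_uuid) if vm_uuid else description

assert vgpu_sysfs_name("vgpu-vendor-0") == "vgpu-vendor-0"
assert vgpu_sysfs_name("vgpu-vendor-0", "beef") == "vgpu-vendor-0-beef"
```

This keeps the sysfs name human-readable in the no-UUID case while still giving NVIDIA's stack the name-to-VM mapping when a UUID is passed in.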