From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36687) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bpNWH-00042V-FW for qemu-devel@nongnu.org; Wed, 28 Sep 2016 18:40:06 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bpNWC-0000w9-1d for qemu-devel@nongnu.org; Wed, 28 Sep 2016 18:40:05 -0400 Received: from mx1.redhat.com ([209.132.183.28]:55418) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bpNWB-0000vw-Pd for qemu-devel@nongnu.org; Wed, 28 Sep 2016 18:40:00 -0400 Date: Wed, 28 Sep 2016 16:39:57 -0600 From: Alex Williamson Message-ID: <20160928163957.199b26c4@t450s.home> In-Reply-To: <20160928200631.GC26800@nvidia.com> References: <20160920084323.55a032fc@t450s.home> <20160920105025.4ef2cd40@t450s.home> <20160921130353.62bf309c@t450s.home> <11355037-d88e-0f28-bd07-14a7c86f85b0@nvidia.com> <20160922081921.57d31e47@t450s.home> <20160922142638.GR352@redhat.com> <20160928192233.GA25369@nvidia.com> <20160928135547.280daff0@t450s.home> <20160928200631.GC26800@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Neo Jia Cc: "Daniel P. Berrange" , Kirti Wankhede , Andy Currid , "Tian, Kevin" , "libvir-list@redhat.com" , qemu-devel , "Song, Jike" , Gerd Hoffmann , Paolo Bonzini , "bjsdjshi@linux.vnet.ibm.com" On Wed, 28 Sep 2016 13:06:31 -0700 Neo Jia wrote: > On Wed, Sep 28, 2016 at 01:55:47PM -0600, Alex Williamson wrote: > > On Wed, 28 Sep 2016 12:22:35 -0700 > > Neo Jia wrote: > > =20 > > > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P. Berrange wrote: = =20 > > > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote: = =20 > > > > > On Thu, 22 Sep 2016 09:41:20 +0530 > > > > > Kirti Wankhede wrote: > > > > > =20 > > > > > > >>>>> My concern is that a type id seems arbitrary but we're sp= ecifying that > > > > > > >>>>> it be unique. We already have something unique, the name= . So why try > > > > > > >>>>> to make the type id unique as well? A vendor can acciden= tally create > > > > > > >>>>> their vendor driver so that a given name means something = very > > > > > > >>>>> specific. On the other hand they need to be extremely de= liberate to > > > > > > >>>>> coordinate that a type id means a unique thing across all= their product > > > > > > >>>>> lines. > > > > > > >>>>> =20 > > > > > > >>>> > > > > > > >>>> Let me clarify, type id should be unique in the list of > > > > > > >>>> mdev_supported_types. You can't have 2 directories in with= same name. =20 > > > > > > >>> > > > > > > >>> Of course, but does that mean it's only unique to the machi= ne I'm > > > > > > >>> currently running on? Let's say I have a Tesla P100 on my = system and > > > > > > >>> type-id 11 is named "GRID-M60-0B". At some point in the fu= ture I > > > > > > >>> replace the Tesla P100 with a Q1000 (made up). Is type-id = 11 on that > > > > > > >>> new card still going to be a "GRID-M60-0B"? If not then we= 've based > > > > > > >>> our XML on the wrong attribute. If the new device does not= support > > > > > > >>> "GRID-M60-0B" then we should generate an error, not simply = initialize > > > > > > >>> whatever type-id 11 happens to be on this new card. > > > > > > >>> =20 > > > > > > >> > > > > > > >> If there are 2 M60 in the system then you would find '11' ty= pe directory > > > > > > >> in mdev_supported_types of both M60. If you have P100, '11' = type would > > > > > > >> not be there in its mdev_supported_types, it will have diffe= rent types. > > > > > > >> > > > > > > >> For example, if you replace M60 with P100, but XML is not up= dated. XML > > > > > > >> have type '11'. When libvirt would try to create mdev device= , libvirt > > > > > > >> would have to find 'create' file in sysfs in following direc= tory format: > > > > > > >> > > > > > > >> --- mdev_supported_types > > > > > > >> |-- 11 > > > > > > >> | |-- create > > > > > > >> > > > > > > >> but now for P100, '11' directory is not there, so libvirt sh= ould throw > > > > > > >> error on not able to find '11' directory. =20 > > > > > > >=20 > > > > > > > This really seems like an accident waiting to happen. What h= appens > > > > > > > when the user replaces their M60 with an Intel XYZ device tha= t happens > > > > > > > to expose a type 11 mdev class gpu device? How is libvirt su= pposed to > > > > > > > know that the XML used to refer to a GRID-M60-0B and now it's= an > > > > > > > INTEL-IGD-XYZ? Doesn't basing the XML entry on the name and = removing > > > > > > > yet another arbitrary requirement that we have some sort of g= lobally > > > > > > > unique type-id database make a lot of sense? The same issue = applies > > > > > > > for simple debug-ability, if I'm reviewing the XML for a doma= in and the > > > > > > > name is the primary index for the mdev device, I know what it= is. > > > > > > > Seeing type-id=3D'11' is meaningless. > > > > > > > =20 > > > > > >=20 > > > > > > Let me clarify again, type '11' is a string that vendor driver = would > > > > > > define (see my previous reply below) it could be "11" or "GRID-= M60-0B". > > > > > > If 2 vendors used same string we can't control that. right? > > > > > >=20 > > > > > > =20 > > > > > > >>>> Lets remove 'id' from type id in XML if that is the concer= n. Supported > > > > > > >>>> types is going to be defined by vendor driver, so let vend= or driver > > > > > > >>>> decide what to use for directory name and same should be u= sed in device > > > > > > >>>> xml file, it could be '11' or "GRID M60-0B": > > > > > > >>>> > > > > > > >>>> > > > > > > >>>> my-vgpu > > > > > > >>>> pci_0000_86_00_0 > > > > > > >>>> > > > > > > >>>> > > > > > > >>>> ... > > > > > > >>>> > > > > > > >>>> =20 > > > > >=20 > > > > > Then let's get rid of the 'name' attribute and let the sysfs dire= ctory > > > > > simply be the name. Then we can get rid of 'type' altogether so = we > > > > > don't have this '11' vs 'GRID-M60-0B' issue. Thanks, =20 > > > >=20 > > > > That sounds nice to me - we don't need two unique identifiers if > > > > one will do. =20 > > >=20 > > > Hi Alex and Daniel, > > >=20 > > > I just had some internal discussions here within NVIDIA and found out= that > > > actually the name/label potentially might not be unique and the "id" = will be.=20 > > > So I think we still would like to keep both so the id is the programm= atic id > > > and the name/label is a human readable string for it, which might get= changed to > > > be non-unique by outside of engineering. =20 > >=20 > > I think your discovery only means that for your vendor driver, the name > > will be "11" (as a string). Perhaps you'd like some sort of vendor > > provided description within each type, but I am not in favor of having > > an arbitrary integer value imply something specific within the sysfs > > interface. IOW, the NVIDIA vendor driver should be able to create: > >=20 > > 11 > > =E2=94=9C=E2=94=80=E2=94=80 create > > =E2=94=9C=E2=94=80=E2=94=80 description > > =E2=94=9C=E2=94=80=E2=94=80 etc > > =E2=94=94=E2=94=80=E2=94=80 resolution > >=20 > > While Intel might create: > >=20 > > Skylake-vGPU > > =E2=94=9C=E2=94=80=E2=94=80 create > > =E2=94=9C=E2=94=80=E2=94=80 description > > =E2=94=9C=E2=94=80=E2=94=80 etc > > =E2=94=94=E2=94=80=E2=94=80 resolution > >=20 > > Maybe "description" is optional for vendors that use useful names? > > Thanks, =20 >=20 > I think we should be able to have a unique vendor type string instead of = an > arbitrary integer value there as long as we are allowed to have a descrip= tion > field that can be used to show to the end user as "name / label".=20 The key is what goes into the XML and I think that should be unique, both between vendors and within the vendor, so "11" is probably a poor choice. We could perhaps get a per-vendor-driver namespace by prefixing the name of the vendor driver, then you could have nvidia-vgpu-mdev-11 (or whatever you end up naming your vendor driver). Of course if the vendor driver wanted to translate "11" into a more stable, human readable name, you could export something like "nvidia-vgpu-mdev-GRID11" (I have no idea what your unique types are based on). Thanks, Alex