From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38729)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1bpLlW-0007Ca-38 for qemu-devel@nongnu.org;
 Wed, 28 Sep 2016 16:47:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1bpLlS-0003Fe-2w for qemu-devel@nongnu.org;
 Wed, 28 Sep 2016 16:47:42 -0400
Received: from hqemgate15.nvidia.com ([216.228.121.64]:18958)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
 id 1bpLlR-0003FX-Nn for qemu-devel@nongnu.org;
 Wed, 28 Sep 2016 16:47:38 -0400
Date: Wed, 28 Sep 2016 13:47:34 -0700
From: Neo Jia
Message-ID: <20160928204734.GA27575@nvidia.com>
References: <20160920105025.4ef2cd40@t450s.home>
 <20160921130353.62bf309c@t450s.home>
 <11355037-d88e-0f28-bd07-14a7c86f85b0@nvidia.com>
 <20160922081921.57d31e47@t450s.home>
 <20160922142638.GR352@redhat.com>
 <20160928192233.GA25369@nvidia.com>
 <20160928195959.GB26800@nvidia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline
In-Reply-To:
Subject: Re: [Qemu-devel] [libvirt] [RFC v2] libvirt vGPU QEMU integration
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Laine Stump
Cc: qemu-devel , "Tian, Kevin" , Currid , "Song, Jike" ,
 "libvir-list@redhat.com" , Kirti Wankhede , Andy@redhat.com,
 Gerd Hoffmann , Paolo Bonzini , "bjsdjshi@linux.vnet.ibm.com"

On Wed, Sep 28, 2016 at 04:31:25PM -0400, Laine Stump wrote:
> On 09/28/2016 03:59 PM, Neo Jia wrote:
> > On Wed, Sep 28, 2016 at 07:45:38PM +0000, Tian, Kevin wrote:
> > > > From: Neo Jia [mailto:cjia@nvidia.com]
> > > > Sent: Thursday, September 29, 2016 3:23 AM
> > > >
> > > > On Thu, Sep 22, 2016 at 03:26:38PM +0100, Daniel P.
Berrange wrote:
> > > > > On Thu, Sep 22, 2016 at 08:19:21AM -0600, Alex Williamson wrote:
> > > > > > On Thu, 22 Sep 2016 09:41:20 +0530
> > > > > > Kirti Wankhede wrote:
> > > > > >
> > > > > > > > > > > > My concern is that a type id seems arbitrary but we're specifying that
> > > > > > > > > > > > it be unique. We already have something unique, the name. So why try
> > > > > > > > > > > > to make the type id unique as well? A vendor can accidentally create
> > > > > > > > > > > > their vendor driver so that a given name means something very
> > > > > > > > > > > > specific. On the other hand they need to be extremely deliberate to
> > > > > > > > > > > > coordinate that a type id means a unique thing across all their product
> > > > > > > > > > > > lines.
> > > > > > > > > > > >
> > > > > > > > > > > Let me clarify, type id should be unique in the list of
> > > > > > > > > > > mdev_supported_types. You can't have 2 directories in with same name.
> > > > > > > > > > Of course, but does that mean it's only unique to the machine I'm
> > > > > > > > > > currently running on? Let's say I have a Tesla P100 on my system and
> > > > > > > > > > type-id 11 is named "GRID-M60-0B". At some point in the future I
> > > > > > > > > > replace the Tesla P100 with a Q1000 (made up). Is type-id 11 on that
> > > > > > > > > > new card still going to be a "GRID-M60-0B"? If not then we've based
> > > > > > > > > > our XML on the wrong attribute. If the new device does not support
> > > > > > > > > > "GRID-M60-0B" then we should generate an error, not simply initialize
> > > > > > > > > > whatever type-id 11 happens to be on this new card.
> > > > > > > > > >
> > > > > > > > > If there are 2 M60 in the system then you would find '11' type directory
> > > > > > > > > in mdev_supported_types of both M60. If you have P100, '11' type would
> > > > > > > > > not be there in its mdev_supported_types, it will have different types.
> > > > > > > > >
> > > > > > > > > For example, if you replace M60 with P100, but XML is not updated. XML
> > > > > > > > > have type '11'. When libvirt would try to create mdev device, libvirt
> > > > > > > > > would have to find 'create' file in sysfs in following directory format:
> > > > > > > > >
> > > > > > > > > --- mdev_supported_types
> > > > > > > > >     |-- 11
> > > > > > > > >     |   |-- create
> > > > > > > > >
> > > > > > > > > but now for P100, '11' directory is not there, so libvirt should throw
> > > > > > > > > error on not able to find '11' directory.
> > > > > > > > This really seems like an accident waiting to happen. What happens
> > > > > > > > when the user replaces their M60 with an Intel XYZ device that happens
> > > > > > > > to expose a type 11 mdev class gpu device? How is libvirt supposed to
> > > > > > > > know that the XML used to refer to a GRID-M60-0B and now it's an
> > > > > > > > INTEL-IGD-XYZ? Doesn't basing the XML entry on the name and removing
> > > > > > > > yet another arbitrary requirement that we have some sort of globally
> > > > > > > > unique type-id database make a lot of sense? The same issue applies
> > > > > > > > for simple debug-ability, if I'm reviewing the XML for a domain and the
> > > > > > > > name is the primary index for the mdev device, I know what it is.
> > > > > > > > Seeing type-id='11' is meaningless.
> > > > > > > >
> > > > > > > Let me clarify again, type '11' is a string that vendor driver would
> > > > > > > define (see my previous reply below) it could be "11" or "GRID-M60-0B".
> > > > > > > If 2 vendors used same string we can't control that. right?
> > > > > > >
> > > > > > >
> > > > > > > > > > > Lets remove 'id' from type id in XML if that is the concern.
Supported
> > > > > > > > > > > types is going to be defined by vendor driver, so let vendor driver
> > > > > > > > > > > decide what to use for directory name and same should be used in device
> > > > > > > > > > > xml file, it could be '11' or "GRID M60-0B":
> > > > > > > > > > >
> > > > > > > > > > > my-vgpu
> > > > > > > > > > > pci_0000_86_00_0
> > > > > > > > > > >
> > > > > > > > > > > ...
> > > > > > > > > > >
> > > > > > Then let's get rid of the 'name' attribute and let the sysfs directory
> > > > > > simply be the name. Then we can get rid of 'type' altogether so we
> > > > > > don't have this '11' vs 'GRID-M60-0B' issue. Thanks,
> > > > > That sounds nice to me - we don't need two unique identifiers if
> > > > > one will do.
> > > > Hi Alex and Daniel,
> > > >
> > > > I just had some internal discussions here within NVIDIA and found out that
> > > > actually the name/label potentially might not be unique and the "id" will be.
> > My comment below follows the above statement^^^^.
>
> > > > So I think we still would like to keep both so the id is the programmatic id
> > > > and the name/label is a human readable string for it, which might get changed to
> > > > be non-unique by outside of engineering.
> > > >
> > > > Sorry for the change.
> > > >
> > > > Thanks,
> > > > Neo
> > > >
> > > A curious question. How do we expect such a descriptive name/label used
> > > by upper-level stack (e.g. openstack)? Should openstack define a vGPU
> > > flavor just using ID (GRID-type11) or using both ID/name (GRID-type11-
> > > M60-0B) for end customer to choose? If it's only for human information,
> > > does it make sense e.g. providing only unique ID in sysfs while relying on
> > > vendor specific documentation to describe what the ID actually means?
> > Hi Kevin,
> >
> > The id is not visible to the upper-level stack, only the name / label will be
> > shown to the end customer to choose, such as "GRID-M60-0B", as we might expose
> > the same virtual device (name/label) with some internal difference which will
> > be tracked by the different unique id.
>
> If the upper layer will only see the descriptive name/label, then that label
> must be unique. It's not acceptable for a particular key used by management
> software to sometimes lead to one flavor of device and sometimes another (no
> matter how small the differences may be). So if only the ID is unique, then
> the ID is what must be used in any configuration at any level.

I should probably be clearer about the "upper layer": I was replying to Kevin
about what the end user will see in their interface.

The "key" will be unique throughout all management stacks, and you should use
the "key / ID" for all configurations. When you want to show a human-readable
description of a virtual device, it will come from the "description" field.

Thanks,
Neo

>
> >
> > I think having the ability to allow libvirt or upper-level stack to display a
> > human readable string for a given type of vgpu will make the user life easier.
> >
> > Thanks,
> > Neo
> >
> > > Thanks,
> > > Kevin
> > >
> > >
> >
> > --
> > libvir-list mailing list
> > libvir-list@redhat.com
> > https://www.redhat.com/mailman/listinfo/libvir-list
> >
>
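
For illustration only, here is a rough Python sketch of the lookup discussed in this thread: configuration is keyed on the unique type key/ID, a missing type directory is an error, and the human-readable text is read separately for display. This is not libvirt or vendor-driver code. The function names are hypothetical, and the "description" file name is an assumption based on the "description" field mentioned above; only the mdev_supported_types/<type>/create layout comes from the thread.

```python
import os

# Directory name as quoted in the thread.
MDEV_TYPES_DIR = "mdev_supported_types"


def find_create_path(parent_sysfs_dir, type_key):
    """Return the path of the 'create' file for type_key.

    Raises FileNotFoundError when the parent device does not expose
    that type, instead of silently falling back to another type.
    """
    create_path = os.path.join(
        parent_sysfs_dir, MDEV_TYPES_DIR, type_key, "create"
    )
    if not os.path.isfile(create_path):
        raise FileNotFoundError(
            "type %r is not supported by %s" % (type_key, parent_sysfs_dir)
        )
    return create_path


def read_description(parent_sysfs_dir, type_key):
    """Return display text for type_key, for user interfaces only.

    ASSUMPTION: the text lives in a file named 'description' inside the
    type directory. If it is absent, fall back to the key itself.
    """
    path = os.path.join(
        parent_sysfs_dir, MDEV_TYPES_DIR, type_key, "description"
    )
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return type_key
```

With this split, replacing the parent device with one that lacks the configured key makes find_create_path() fail loudly, while the description stays free to change or repeat because nothing is keyed on it.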