From mboxrd@z Thu Jan 1 00:00:00 1970
From: Neo Jia
Subject: Re: VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Tue, 26 Jan 2016 13:30:33 -0800
Message-ID: <20160126213033.GA21927@nvidia.com>
References: <569CA8AD.6070200@intel.com> <1453143919.32741.169.camel@redhat.com> <569F4C86.2070501@intel.com> <56A6083E.10703@intel.com> <1453757426.32741.614.camel@redhat.com> <56A72313.9030009@intel.com> <56A77D2D.40109@gmail.com> <1453826249.26652.54.camel@redhat.com>
To: "Tian, Kevin"
Cc: Alex Williamson , Yang Zhang , "Song, Jike" , Gerd Hoffmann , Paolo Bonzini , "Lv, Zhiyuan" , "Ruan, Shuai" , "kvm@vger.kernel.org" , qemu-devel , "igvt-g@lists.01.org"
Sender: kvm-owner@vger.kernel.org

On Tue, Jan 26, 2016 at 09:21:42PM +0000, Tian, Kevin wrote:
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Wednesday, January 27, 2016 12:37 AM
> >
> > On Tue, 2016-01-26 at 22:05 +0800, Yang Zhang wrote:
> > > On 2016/1/26 15:41, Jike Song wrote:
> > > > On 01/26/2016 05:30 AM, Alex Williamson wrote:
> > > > > [cc +Neo @Nvidia]
> > > > >
> > > > > Hi Jike,
> > > > >
> > > > > On Mon, 2016-01-25 at 19:34 +0800, Jike Song wrote:
> > > > > > On 01/20/2016 05:05 PM, Tian, Kevin wrote:
> > > > > > > I would expect we can spell out next level tasks toward above
> > > > > > > direction, upon which Alex can easily judge whether there are
> > > > > > > some common VFIO framework changes that he can help :-)
> > > > > >
> > > > > > Hi Alex,
> > > > > >
> > > > > > Here is a draft task list after a short discussion w/ Kevin,
> > > > > > would you please have a look?
> > > > > >
> > > > > > Bus Driver
> > > > > >
> > > > > > { in i915/vgt/xxx.c }
> > > > > >
> > > > > > - define a subset of vfio_pci interfaces
> > > > > > - selective pass-through (say aperture)
> > > > > > - trap MMIO: interface w/ QEMU
> > > > >
> > > > > What's included in the subset?  Certainly the bus reset ioctls really
> > > > > don't apply, but you'll need to support the full device interface,
> > > > > right?  That includes the region info ioctl and access through the vfio
> > > > > device file descriptor as well as the interrupt info and setup ioctls.
> > > > >
> > > >
> > > > [All interfaces I thought are via ioctl:)  For other stuff like file
> > > > descriptor we'll definitely keep it.]
> > > >
> > > > The list of ioctl commands provided by vfio_pci:
> > > >
> > > > - VFIO_DEVICE_GET_PCI_HOT_RESET_INFO
> > > > - VFIO_DEVICE_PCI_HOT_RESET
> > > >
> > > > As you said, above 2 don't apply. But for this:
> > > >
> > > > - VFIO_DEVICE_RESET
> > > >
> > > > In my opinion it should be kept, no matter what will be provided in
> > > > the bus driver.
> > > >
> > > > - VFIO_PCI_ROM_REGION_INDEX
> > > > - VFIO_PCI_VGA_REGION_INDEX
> > > >
> > > > I suppose above 2 don't apply neither? For a vgpu we don't provide a
> > > > ROM BAR or VGA region.
> > > >
> > > > - VFIO_DEVICE_GET_INFO
> > > > - VFIO_DEVICE_GET_REGION_INFO
> > > > - VFIO_DEVICE_GET_IRQ_INFO
> > > > - VFIO_DEVICE_SET_IRQS
> > > >
> > > > Above 4 are needed of course.
> > > >
> > > > We will need to extend:
> > > >
> > > > - VFIO_DEVICE_GET_REGION_INFO
> > > >
> > > >
> > > > a) adding a flag: DONT_MAP. For example, the MMIO of vgpu
> > > > should be trapped instead of being mmap-ed.
> > >
> > > I may not in the context, but i am curious how to handle the DONT_MAP in
> > > vfio driver? Since there are no real MMIO maps into the region and i
> > > suppose the access to the region should be handled by vgpu in i915
> > > driver, but currently most of the mmio accesses are handled by Qemu.
> >
> > VFIO supports the following region attributes:
> >
> > #define VFIO_REGION_INFO_FLAG_READ      (1 << 0) /* Region supports read */
> > #define VFIO_REGION_INFO_FLAG_WRITE     (1 << 1) /* Region supports write */
> > #define VFIO_REGION_INFO_FLAG_MMAP      (1 << 2) /* Region supports mmap */
> >
> > If MMAP is not set, then the QEMU driver will do pread and/or pwrite to
> > the specified offsets of the device file descriptor, depending on what
> > accesses are supported.  This is all reported through the REGION_INFO
> > ioctl for a given index.  If mmap is supported, the VM will have direct
> > access to the area, without faulting to KVM other than to populate the
> > mapping.  Without mmap support, a VM MMIO access traps into KVM, which
> > returns out to QEMU to service the request, which then finds the
> > MemoryRegion serviced through vfio, which will then perform a
> > pread/pwrite through to the kernel vfio bus driver to handle the
> > access.  Thanks,
> >
>
> Today KVMGT (not using VFIO yet) registers I/O emulation callbacks to
> KVM, so VM MMIO access will be forwarded to KVMGT directly for
> emulation in kernel. If we reuse above R/W flags, the whole emulation
> path would be unnecessarily long with obvious performance impact. We
> either need a new flag here to indicate in-kernel emulation (bias from
> passthrough support), or just hide the region alternatively (let KVMGT
> to handle I/O emulation itself like today).
>

Hi Kevin,

Maybe there is some confusion about the VFIO interface that we are going to use
here. I thought we were going to adopt VFIO so nobody would need to directly
plug into kvm module.

Thanks,
Neo

> Thanks
> Kevin
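
The region-flag behaviour Alex describes in the quoted text can be sketched in C roughly as follows. This is only an illustration of the decision a userspace driver makes after reading the flags from the REGION_INFO ioctl, not QEMU's actual code: the three flag values are the ones quoted in the thread, while `enum region_access`, `pick_region_access()` and `region_access_name()` are hypothetical names introduced here, and the sketch assumes that an advertised mmap always takes precedence over read/write.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values as quoted in the thread. */
#define VFIO_REGION_INFO_FLAG_READ  (1 << 0) /* Region supports read */
#define VFIO_REGION_INFO_FLAG_WRITE (1 << 1) /* Region supports write */
#define VFIO_REGION_INFO_FLAG_MMAP  (1 << 2) /* Region supports mmap */

/* Hypothetical type, for illustration only. */
enum region_access {
    REGION_ACCESS_NONE,    /* region advertises no access at all */
    REGION_ACCESS_DIRECT,  /* mmap: the VM touches the area directly */
    REGION_ACCESS_TRAPPED  /* access traps and ends in pread/pwrite on the device fd */
};

/* Choose the access path from the flags reported for one region index.
 * Assumption: mmap is preferred whenever it is advertised. */
static enum region_access pick_region_access(uint32_t flags)
{
    if (flags & VFIO_REGION_INFO_FLAG_MMAP)
        return REGION_ACCESS_DIRECT;
    if (flags & (VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE))
        return REGION_ACCESS_TRAPPED;
    return REGION_ACCESS_NONE;
}

/* Short label for each path, handy for debug output. */
static const char *region_access_name(enum region_access path)
{
    switch (path) {
    case REGION_ACCESS_DIRECT:  return "mmap";
    case REGION_ACCESS_TRAPPED: return "pread/pwrite";
    case REGION_ACCESS_NONE:    return "none";
    }
    assert(!"unknown region_access value");
    return "";
}
```

Under these assumptions, a region reported with READ | WRITE but without MMAP takes the trapped path, which is the case the DONT_MAP proposal is after: `region_access_name(pick_region_access(VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE))` yields "pread/pwrite".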