From: Neo Jia
Subject: Re: [iGVT-g] RE: VFIO based vGPU (was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Wed, 3 Feb 2016 19:52:22 -0800
Message-ID: <20160204035222.GA7092@nvidia.com>
References: <1454532287.18969.14.camel@redhat.com>
To: "Tian, Kevin"
Cc: Alex Williamson, "Lv, Zhiyuan", Gerd Hoffmann, Yang Zhang, "igvt-g@lists.01.org", qemu-devel, "kvm@vger.kernel.org", Paolo Bonzini, Kirti Wankhede

On Thu, Feb 04, 2016 at 03:01:36AM +0000, Tian, Kevin wrote:
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Thursday, February 04, 2016 4:45 AM
> >
> > > First, Jike told me before his vacation, that we cannot do any change to
> > > KVM module according to community comments. Now I think it's not true.
> > > We can do necessary changes, as long as it is done in a structural/layered
> > > approach, w/o hard assumption on KVMGT as the only user. That's the
> > > guideline we need to obey. :-)
> >
> > We certainly need to separate the functionality that you're trying to
> > enable from the more pure concept of vfio. vfio is a userspace driver
> > interface, not a userspace driver interface for KVM-based virtual
> > machines. Maybe it's more of a gimmick that we can assign PCI devices
> > to QEMU tcg VMs, but that's really just the proof of concept for more
> > useful capabilities, like supporting DPDK applications. So, I
> > begrudgingly agree that structured/layered interactions are acceptable,
> > but consider what use cases may be excluded by doing so.
>
> Understand. We shouldn't assume VFIO is always connected to KVM. For
> example, once we have vfio-vgpu ready, it can be used to drive container
> usage too, not always connecting with KVM/Qemu. Actually, thinking
> more from this angle, there is a new open which I'll describe in the end...
>
> >
> > > 4) Map/unmap guest memory
> > > --
> > > It's there for KVM.
> >
> > Map and unmap for who? For the vGPU or for the VM? It seems like we
> > know how to map guest memory for the vGPU without KVM, but that's
> > covered in 7), so I'm not entirely sure what this is specifying.
>
> Map guest memory for emulation purposes in the vGPU device model, e.g. to r/w
> the guest GPU page table, command buffer, etc. It's the basic requirement we
> see in any device model.
>
> 7) provides the database (both GPA->IOVA and GPA->HPA), where GPA->HPA
> can be used to implement this interface for KVM. However for Xen it's
> different, as a special foreign domain mapping hypercall is involved which is
> Xen specific, so it is not appropriate to be in VFIO.
>
> That's why we list this interface separately as a key requirement (though
> it's obvious for KVM).

Hi Kevin,

It seems you are trying to map the guest physical memory into your kernel
driver on the host, right?

If yes, I think we already have the required information to achieve that.
The type1 IOMMU VGPU interface has provided , which is enough for us to do
any lookup.
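To make that concrete, QEMU (or any VFIO userspace) already registers every
guest RAM region with the type1 IOMMU via VFIO_IOMMU_MAP_DMA, roughly as in
the sketch below (error handling trimmed; container_fd and guest_ram are
assumed to be set up elsewhere). The <iova, vaddr, size> records created by
these calls are the information a vGPU-aware type1 backend could consult for
GPA lookups:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Minimal sketch: container_fd is assumed to be an already-opened VFIO
 * container with the TYPE1 IOMMU enabled, and guest_ram the userspace
 * mapping that backs guest RAM.
 */
static int map_guest_ram(int container_fd, void *guest_ram,
                         uint64_t guest_phys_addr, uint64_t size)
{
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (uintptr_t)guest_ram;    /* process VA backing guest RAM */
        map.iova  = guest_phys_addr;         /* GPA used as the IOVA */
        map.size  = size;

        /* Type1 pins the pages and records the <iova, vaddr, size> mapping. */
        return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}

The inverse is VFIO_IOMMU_UNMAP_DMA, which drops the record and unpins the
pages, so the lifetime of the lookup data follows the userspace mappings.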
>
> >
> > > 5) Pin/unpin guest memory
> > > --
> > > IGD or any PCI passthru should have the same requirement. So we should be
> > > able to leverage existing code in VFIO. The only tricky thing (Jike may
> > > elaborate after he is back) is that KVMGT requires pinning EPT entries too,
> > > which requires some further change on the KVM side. But I'm not sure whether
> > > it still holds true after some design changes made in this thread. So I'll
> > > leave it to Jike to further comment.
> >
> > PCI assignment requires pinning all of guest memory, I would think that
> > IGD would only need to pin selective memory, so is this simply stating
> > that both have the need to pin memory, not that they'll do it to the
> > same extent?
>
> For simplicity let's first pin all memory, while taking selective pinning as a
> future enhancement.
>
> The tricky thing is that the existing 'pin' action in VFIO doesn't actually pin
> the EPT entry too (it only pins host page tables for the Qemu process). There are
> various places where EPT entries might be invalidated while the guest is
> running, while KVMGT requires EPT entries to be pinned too. Let's wait
> for Jike to elaborate whether this part is still required today.

Sorry, I don't quite follow the logic here. The current VFIO TYPE1 IOMMU
(including the API and the underlying IOMMU implementation) will pin the guest
physical memory and install those pages into the proper device domain. Yes, it
is only for the QEMU process, as that is what the VM runs as.

Do I miss something here?

>
> >
> > > 6) Write-protect a guest memory page
> > > --
> > > The primary purpose is for GPU page table shadowing. We need to track
> > > modifications on the guest GPU page table, so the shadow part can be synchronized
> > > accordingly. Just think about CPU page table shadowing. An old example,
> > > as Zhiyuan pointed out, is to write-protect the guest cmd buffer, but that is
> > > not necessary now.
> > >
> > > So we need KVM to provide an interface so some agents can request such
> > > a write-protection action (not just for KVMGT; it could be for other tracking
> > > usages). Guangrong has been working on a general page tracking mechanism,
> > > upon which write-protection can be easily built. The review is still in
> > > progress.
> >
> > I have a hard time believing we don't have the mechanics to do this
> > outside of KVM. We should be able to write protect user pages from the
> > kernel, this is how copy-on-write generally works. So it seems like we
> > should be able to apply those same mechanics to our userspace process,
> > which just happens to be a KVM VM. I'm hoping that Paolo might have
> > some ideas how to make this work or maybe Intel has some virtual memory
> > experts that can point us in the right direction.
>
> What we want to write-protect is access that happens inside the VM.
> I don't know how any tricks in the host page table can help here. The only
> way is to tweak the page tables used in non-root mode (either EPT or
> shadow page tables).
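For reference, here is a very rough sketch of the kind of KVM-side hook being
discussed; every name and signature below is invented for illustration and is
not the page-tracking interface actually under review:

/*
 * Illustrative only -- these names are made up to show the shape of a
 * write-protect/notify interface, not the real KVM page-track API.
 */
#include <linux/kvm_host.h>

struct vgpu_wp_notifier {
        /* called when the guest writes a gfn we asked to track */
        void (*on_write)(struct kvm *kvm, gpa_t gpa,
                         const void *new_val, int bytes, void *opaque);
        void *opaque;
};

/* Hypothetical entry points a vGPU device model could use. */
int  vgpu_wp_page(struct kvm *kvm, gfn_t gfn);     /* clear write bit in EPT/shadow */
void vgpu_unwp_page(struct kvm *kvm, gfn_t gfn);   /* restore write access */
int  vgpu_register_wp_notifier(struct kvm *kvm, struct vgpu_wp_notifier *n);

The point being that the protection has to be applied in the non-root-mode
page tables, so the hook has to live on the KVM (or hypervisor) side rather
than in the host process page tables.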
>
> >
> > Thanks for your summary, Kevin. It does seem like there are only a few
> > outstanding issues which should be manageable and hopefully the overall
> > approach is cleaner for QEMU, management tools, and provides a more
> > consistent user interface as well. If we can translate the solution to
> > Xen, that's even better. Thanks,
> >
>
> Here is the main open in my head, after thinking about the role of VFIO:
>
> The above 7 services required by the vGPU device model fall into
> two categories:
>
> a) services to connect the vGPU with the VM, which are essentially what a device
> driver is doing (so VFIO can fit here), including:
> 1) Selectively pass through a region to a VM
> 2) Trap-and-emulate a region
> 3) Inject a virtual interrupt
> 5) Pin/unpin guest memory
> 7) GPA->IOVA/HVA translation (as a side-effect)
>
> b) services to support device emulation, which are going to be hypervisor
> specific, including:
> 4) Map/unmap guest memory

I think we have the ability to support this already with VFIO, see my comments above.

Thanks,
Neo

> 6) Write-protect a guest memory page
>
> VFIO can fulfill category a), but not b). A possible abstraction would
> be in the vGPU core driver, allowing a specific hypervisor to register callbacks
> for category b) (which means a KVMGT-specific file, say KVM-vGPU, will
> be added to KVM to connect both together).
>
> Then the likely layered blocks would look like:
>
>   VFIO-vGPU  <--------->  vGPU Core  <------------->  KVMGT-vGPU
>                            ^      ^
>                            |      |
>                            |      |
>                            v      v
>                         nvidia   intel
>                          vGPU     vGPU
>
> Xen will register its own vGPU bus driver (not using VFIO today) and
> also hypervisor services using the same framework. With this design,
> everything is abstracted/registered through the vGPU core driver, instead
> of components talking to each other directly.
>
> Thoughts?
>
> P.S. From the description of the above requirements, the whole framework
> might also be extended to cover any device type using the same mediated
> pass-through approach. Though graphics has some special requirements,
> the majority are actually device agnostic. Maybe better not to limit it
> with a vGPU name at all. :-)
>
> Thanks
> Kevin
> _______________________________________________
> iGVT-g mailing list
> iGVT-g@lists.01.org
> https://lists.01.org/mailman/listinfo/igvt-g
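One more illustration of the layering Kevin proposes above: a sketch, with
entirely made-up names, of the per-hypervisor callback table a vGPU core
module might accept for the category b) services. This is only meant to show
the shape of the abstraction, not a proposed interface:

/*
 * Purely illustrative -- all names invented. The vGPU core would route
 * category b) requests from vendor vGPU drivers to whichever hypervisor
 * module (KVMGT-vGPU, a Xen equivalent, ...) registered these callbacks.
 */
#include <linux/types.h>

struct vgpu_hypervisor_ops {
        const char *name;                    /* "kvmgt", "xengt", ... */

        /* 4) Map/unmap guest memory for emulation */
        void *(*map_gpa)(void *vm_handle, u64 gpa, u64 size);
        void  (*unmap_gpa)(void *vm_handle, void *hva, u64 size);

        /* 6) Write-protect / unprotect a guest page */
        int   (*wp_page)(void *vm_handle, u64 gfn);
        int   (*unwp_page)(void *vm_handle, u64 gfn);
};

int  vgpu_core_register_hypervisor(const struct vgpu_hypervisor_ops *ops);
void vgpu_core_unregister_hypervisor(const struct vgpu_hypervisor_ops *ops);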