From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Williamson
Subject: Re: VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Tue, 26 Jan 2016 15:56:15 -0700
Message-ID: <1453848975.18049.7.camel@redhat.com>
References: <569C5071.6080004@intel.com> <1453092476.32741.67.camel@redhat.com>
 <569CA8AD.6070200@intel.com> <1453143919.32741.169.camel@redhat.com>
 <569F4C86.2070501@intel.com> <56A6083E.10703@intel.com>
 <1453757426.32741.614.camel@redhat.com> <56A72313.9030009@intel.com>
 <56A77D2D.40109@gmail.com> <1453826249.26652.54.camel@redhat.com>
 <1453844613.18049.1.camel@redhat.com> <1453846073.18049.3.camel@redhat.com>
 <1453847250.18049.5.camel@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Gerd Hoffmann , Paolo Bonzini , "Lv, Zhiyuan" , "Ruan, Shuai" , "kvm@vger.kernel.org" , qemu-devel , "igvt-g@lists.01.org" , Neo Jia
To: "Tian, Kevin" , Yang Zhang , "Song, Jike"
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:51483 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753119AbcAZW4R (ORCPT ); Tue, 26 Jan 2016 17:56:17 -0500
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Tue, 2016-01-26 at 22:39 +0000, Tian, Kevin wrote:
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Wednesday, January 27, 2016 6:27 AM
> >
> > On Tue, 2016-01-26 at 22:15 +0000, Tian, Kevin wrote:
> > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > Sent: Wednesday, January 27, 2016 6:08 AM
> > > >
> > > > > > > >
> > > > > > >
> > > > > > > Today KVMGT (not using VFIO yet) registers I/O emulation callbacks to
> > > > > > > KVM, so VM MMIO access will be forwarded to KVMGT directly for
> > > > > > > emulation in kernel. If we reuse above R/W flags, the whole emulation
> > > > > > > path would be unnecessarily long with obvious performance impact. We
> > > > > > > either need a new flag here to indicate in-kernel emulation (bias from
> > > > > > > passthrough support), or just hide the region alternatively (let KVMGT
> > > > > > > to handle I/O emulation itself like today).
> > > > > >
> > > > > > That sounds like a future optimization TBH.  There's very strict
> > > > > > layering between vfio and kvm.  Physical device assignment could make
> > > > > > use of it as well, avoiding a round trip through userspace when an
> > > > > > ioread/write would do.  Userspace also needs to orchestrate those kinds
> > > > > > of accelerators, there might be cases where userspace wants to see those
> > > > > > transactions for debugging or manipulating the device.  We can't simply
> > > > > > take shortcuts to provide such direct access.  Thanks,
> > > > > >
> > > > >
> > > > > But we have to balance such debugging flexibility and acceptable performance.
> > > > > To me the latter one is more important otherwise there'd be no real usage
> > > > > around this technique, while for debugging there are other alternative (e.g.
> > > > > ftrace) Consider some extreme case with 100k traps/second and then see
> > > > > how much impact a 2-3x longer emulation path can bring...
> > > >
> > > > Are you jumping to the conclusion that it cannot be done with proper
> > > > layering in place?  Performance is important, but it's not an excuse to
> > > > abandon designing interfaces between independent components.  Thanks,
> > > >
> > >
> > > Two are not controversial. My point is to remove unnecessary long trip
> > > as possible. After another thought, yes we can reuse existing read/write
> > > flags:
> > >   - KVMGT will expose a private control variable whether in-kernel
> > > delivery is required;
> >
> > But in-kernel delivery is never *required*.  Wouldn't userspace want to
> > deliver in-kernel any time it possibly could?
> >
> > >   - when the variable is true, KVMGT will register in-kernel MMIO
> > > emulation callbacks then VM MMIO request will be delivered to KVMGT
> > > directly;
> > >   - when the variable is false, KVMGT will not register anything.
> > > VM MMIO request will then be delivered to Qemu and then ioread/write
> > > will be used to finally reach KVMGT emulation logic;
> >
> > No, that means the interface is entirely dependent on a backdoor through
> > KVM.  Why can't userspace (QEMU) do something like register an MMIO
> > region with KVM handled via a provided file descriptor and offset,
> > couldn't KVM then call the file ops without a kernel exit?  Thanks,
> >
>
> Could you elaborate this thought? If it can achieve the purpose w/o
> a kernel exit definitely we can adapt to it. :-)

I only thought of it when replying to the last email and have been doing
some research, but we already do quite a bit of synchronization through
file descriptors.  The kvm-vfio pseudo device uses a group file
descriptor to ensure a user has access to a group, allowing some degree
of interaction between modules.  Eventfds and irqfds already make use of
f_ops on file descriptors to poke data.  So, if KVM had information that
an MMIO region was backed by a file descriptor for which it already has
a reference via fdget() (and verified access rights and whatnot), then
it ought to be a simple matter to get to f_ops->read/write knowing the
base offset of that MMIO region.  Perhaps it could even simply use
__vfs_read/write().  Then we've got a proper reference to the file
descriptor for ownership purposes and we've transparently jumped across
modules without any implicit knowledge of the other end.  Could it work?
Thanks,

Alex
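To make the fd-plus-offset idea concrete, here is a rough userspace sketch. It is not kernel code and not an existing KVM interface. It assumes a hypothetical table of MMIO windows, each tied to a file descriptor and a base offset, and it uses pread()/pwrite() as stand-ins for reaching f_ops->read/write directly. The names mmio_fd_region, mmio_lookup, mmio_read, mmio_write and mmio_demo are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L /* pread(), pwrite(), fileno() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical record: one guest MMIO window backed by a file descriptor. */
struct mmio_fd_region {
    uint64_t gpa_base;  /* guest-physical start of the window */
    uint64_t size;      /* window length in bytes */
    int fd;             /* descriptor handed over by userspace (e.g. QEMU) */
    off_t fd_offset;    /* offset inside fd where the window begins */
};

/* Return the window that fully contains [gpa, gpa + len), or NULL. */
const struct mmio_fd_region *
mmio_lookup(const struct mmio_fd_region *tbl, size_t n, uint64_t gpa, size_t len)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        const struct mmio_fd_region *r = &tbl[i];
        /* Compare offsets rather than computing gpa + len, which could wrap. */
        if (gpa >= r->gpa_base && len <= r->size &&
            gpa - r->gpa_base <= r->size - len)
            return r;
    }
    return NULL;
}

/* Forward a trapped guest write to the backing file. Returns 0 on success,
 * -1 if no window matches or the file operation comes up short. */
int mmio_write(const struct mmio_fd_region *tbl, size_t n,
               uint64_t gpa, const void *data, size_t len)
{
    const struct mmio_fd_region *r = mmio_lookup(tbl, n, gpa, len);
    if (r == NULL)
        return -1; /* the caller would fall back to the userspace path */
    off_t pos = r->fd_offset + (off_t)(gpa - r->gpa_base);
    return pwrite(r->fd, data, len, pos) == (ssize_t)len ? 0 : -1;
}

/* Forward a trapped guest read to the backing file. Same return convention. */
int mmio_read(const struct mmio_fd_region *tbl, size_t n,
              uint64_t gpa, void *data, size_t len)
{
    const struct mmio_fd_region *r = mmio_lookup(tbl, n, gpa, len);
    if (r == NULL)
        return -1;
    off_t pos = r->fd_offset + (off_t)(gpa - r->gpa_base);
    return pread(r->fd, data, len, pos) == (ssize_t)len ? 0 : -1;
}

/* Usage sketch: back a 4 KiB window with a temporary file and round-trip
 * one 32-bit value through it. Returns 0 when the value reads back intact. */
int mmio_demo(void)
{
    FILE *backing = tmpfile();
    if (backing == NULL)
        return -1;
    struct mmio_fd_region tbl[] = {
        { 0xfe000000u, 0x1000, fileno(backing), 0x100 },
    };
    uint32_t in = 0xdeadbeefu, out = 0;
    int rc = -1;
    if (mmio_write(tbl, 1, 0xfe000010u, &in, sizeof in) == 0 &&
        mmio_read(tbl, 1, 0xfe000010u, &out, sizeof out) == 0 &&
        out == in)
        rc = 0;
    fclose(backing);
    return rc;
}
```

With the window registered as in mmio_demo(), a 4-byte guest write at 0xfe000010 lands at file offset 0x110, and an access that falls outside every window returns -1 so the caller can take the ordinary userspace route. The dispatcher knows only the descriptor and the offset, nothing about what sits behind them, which is the layering property I'm after.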