From mboxrd@z Thu Jan 1 00:00:00 1970
From: Alex Williamson
Subject: Re: kvm PCI assignment & VFIO ramblings
Date: Tue, 23 Aug 2011 11:08:29 -0600
Message-ID: <1314119311.2859.59.camel@bling.home>
References: <20110802082848.GD29719@yookeroo.fritz.box>
 <1312308847.2653.467.camel@bling.home>
 <1312310121.2653.470.camel@bling.home>
 <20110803020422.GF29719@yookeroo.fritz.box>
 <4E3F9E33.5000706@redhat.com>
 <1312932258.4524.55.camel@bling.home>
 <1312944513.29273.28.camel@pasglop>
 <1313859105.6866.192.camel@x201.home>
 <20110822172508.GJ2079@amd.com>
 <1314040622.6866.268.camel@x201.home>
 <20110823131441.GN2079@amd.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Benjamin Herrenschmidt, chrisw, Alexey Kardashevskiy,
 "kvm@vger.kernel.org", Paul Mackerras, "linux-pci@vger.kernel.org",
 qemu-devel, iommu, Avi Kivity, Anthony Liguori, linuxppc-dev,
 "benve@cisco.com"
To: "Roedel, Joerg"
Return-path:
In-Reply-To: <20110823131441.GN2079@amd.com>
Sender: linux-pci-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On Tue, 2011-08-23 at 15:14 +0200, Roedel, Joerg wrote:
> On Mon, Aug 22, 2011 at 03:17:00PM -0400, Alex Williamson wrote:
> > On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote:
>
> > > I am in favour of /dev/vfio/$GROUP. If multiple devices should be
> > > assigned to a guest, there can also be an ioctl to bind a group to an
> > > address-space of another group (certainly needs some care to not allow
> > > that both groups belong to different processes).
> >
> > That's an interesting idea. Maybe an interface similar to the current
> > uiommu interface, where you open() the 2nd group fd and pass the fd via
> > ioctl to the primary group. IOMMUs that don't support this would fail
> > the attach device callback, which would fail the ioctl to bind them. It
> > will need to be designed so any group can be removed from the super-set
> > and the remaining group(s) still works. This feels like something that
> > can be added after we get an initial implementation.
>
> Handling it through fds is a good idea. This makes sure that everything
> belongs to one process. I am not really sure yet if we go the way to
> just bind plain groups together or if we create meta-groups. The
> meta-groups thing seems somewhat cleaner, though.

I'm leaning towards binding because we need to make it dynamic, but I
don't really have a good picture of the lifecycle of a meta-group.

> > > Btw, a problem we havn't talked about yet entirely is
> > > driver-deassignment. User space can decide to de-assign the device from
> > > vfio while a fd is open on it. With PCI there is no way to let this fail
> > > (the .release function returns void last time i checked). Is this a
> > > problem, and yes, how we handle that?
> >
> > The current vfio has the same problem, we can't unbind a device from
> > vfio while it's attached to a guest. I think we'd use the same solution
> > too; send out a netlink packet for a device removal and have the .remove
> > call sleep on a wait_event(, refcnt == 0). We could also set a timeout
> > and SIGBUS the PIDs holding the device if they don't return it
> > willingly. Thanks,
>
> Putting the process to sleep (which would be uninterruptible) seems bad.
> The process would sleep until the guest releases the device-group, which
> can take days or months.
> The best thing (and the most intrusive :-) ) is to change PCI core to
> allow unbindings to fail, I think. But this probably further complicates
> the way to upstream VFIO...

Yes, it's not ideal but I think it's sufficient for now and if we later
get support for returning an error from release, we can set a timeout
after notifying the user to make use of that.
Thanks,

Alex
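
The removal sequence discussed in this message (notify the holder, wait
for the reference count to reach zero, and escalate after a timeout) can
be modeled in userspace as a rough sketch. This is not kernel code and
not the vfio implementation: the class DeviceRef and the notify and
force callbacks are hypothetical names chosen for illustration. notify
stands in for the netlink removal message and force stands in for
sending SIGBUS to the holding PIDs.

```python
import threading


class DeviceRef:
    """Userspace model of a device whose removal waits for its users.

    Hypothetical illustration only; the names do not come from vfio.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._refcnt = 0
        self.removal_requested = False

    def get(self):
        """Take a reference, as a user does when opening the device."""
        with self._cond:
            if self.removal_requested:
                raise RuntimeError("device is being removed")
            self._refcnt += 1

    def put(self):
        """Drop a reference and wake a pending remove() when none remain."""
        with self._cond:
            self._refcnt -= 1
            if self._refcnt == 0:
                self._cond.notify_all()

    def remove(self, timeout, notify, force):
        """Ask holders to release, wait up to `timeout` seconds, then force.

        Returns True if every holder released voluntarily, and False if
        the timeout expired and `force` had to be called.
        """
        with self._cond:
            self.removal_requested = True
        # Called without the lock held so a holder may call put() from here.
        notify()
        with self._cond:
            # Analogous to sleeping on wait_event(, refcnt == 0) with a timeout.
            released = self._cond.wait_for(lambda: self._refcnt == 0, timeout)
        if not released:
            force()
        return released
```

With a cooperative holder, remove() returns as soon as the last
reference is dropped. With an unresponsive holder, it returns after the
timeout and the force callback runs, which avoids the open-ended
uninterruptible sleep raised as a concern in the quoted text.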