From mboxrd@z Thu Jan 1 00:00:00 1970
From: Joerg Roedel
Subject: Re: kvm PCI assignment & VFIO ramblings
Date: Mon, 22 Aug 2011 19:25:08 +0200
Message-ID: <20110822172508.GJ2079@amd.com>
References: <1311983933.8793.42.camel@pasglop> <1312050011.2265.185.camel@x201.home> <20110802082848.GD29719@yookeroo.fritz.box> <1312308847.2653.467.camel@bling.home> <1312310121.2653.470.camel@bling.home> <20110803020422.GF29719@yookeroo.fritz.box> <4E3F9E33.5000706@redhat.com> <1312932258.4524.55.camel@bling.home> <1312944513.29273.28.camel@pasglop> <1313859105.6866.192.camel@x201.home>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Cc: Alexey Kardashevskiy , "kvm@vger.kernel.org" , Paul Mackerras , qemu-devel , chrisw , iommu , Avi Kivity , "linux-pci@vger.kernel.org" , linuxppc-dev , "benve@cisco.com"
To: Alex Williamson
Content-Disposition: inline
In-Reply-To: <1313859105.6866.192.camel@x201.home>
Errors-To: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
Sender: qemu-devel-bounces+gceq-qemu-devel=gmane.org@nongnu.org
List-Id: kvm.vger.kernel.org

On Sat, Aug 20, 2011 at 12:51:39PM -0400, Alex Williamson wrote:
> We had an extremely productive VFIO BoF on Monday. Here's my attempt to
> capture the plan that I think we agreed to:
>
> We need to address both the description and enforcement of device
> groups. Groups are formed any time the iommu does not have resolution
> between a set of devices. On x86, this typically happens when a
> PCI-to-PCI bridge exists between the set of devices and the iommu. For
> Power, partitionable endpoints define a group. Grouping information
> needs to be exposed for both userspace and kernel internal usage. This
> will be a sysfs attribute setup by the iommu drivers. Perhaps:
>
> # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> 42

Right, that is mainly for libvirt to provide that information to the
user in a meaningful way.
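To make the libvirt use case concrete: once each device exposes a group number, a management tool can list the other devices that share a group with the one being assigned and warn the user about them. A minimal sketch in C, assuming the tool has already read each device's proposed iommu_group attribute into memory; struct dev_group, parse_iommu_group() and group_siblings() are hypothetical names for illustration, not an existing libvirt or kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: one entry per device, filled in by reading each device's
 * proposed iommu_group sysfs attribute. */
struct dev_group {
    const char *name;   /* e.g. "0000:00:19.0" */
    long group;         /* parsed contents of iommu_group, >= 0 */
};

/* Parse attribute text such as "42\n". Returns 0 and stores the group
 * number in *out, or -1 if the text is not a non-negative integer. */
int parse_iommu_group(const char *text, long *out)
{
    char *end;
    long v;

    errno = 0;
    v = strtol(text, &end, 10);
    if (end == text || errno != 0 || v < 0)
        return -1;
    if (*end == '\n')           /* sysfs attributes usually end in '\n' */
        end++;
    if (*end != '\0')
        return -1;
    *out = v;
    return 0;
}

/* List the other devices in the same group as target: the devices that
 * may stop working for the host once target is assigned to a guest.
 * Stores at most max names in out; returns the total number of such
 * devices (possibly more than max), or -1 if target is unknown. */
int group_siblings(const struct dev_group *devs, size_t n,
                   const char *target, const char **out, size_t max)
{
    long group = -1;
    int count = 0;
    size_t i;

    assert(devs != NULL && target != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(devs[i].name, target) == 0)
            group = devs[i].group;
    if (group < 0)
        return -1;

    for (i = 0; i < n; i++) {
        if (devs[i].group != group || strcmp(devs[i].name, target) == 0)
            continue;
        if ((size_t)count < max)
            out[count] = devs[i].name;
        count++;
    }
    return count;
}
```

The attribute is kept as a plain integer here, mirroring the "42" example in the quoted proposal; a real tool would also need to read the sysfs files themselves, which is left out of the sketch.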
With the group number exposed, userspace knows that other devices in the
group might stop working when it assigns one of them to a guest.

>
> (I use a PCI example here, but attribute should not be PCI specific)
>
> From there we have a few options. In the BoF we discussed a model where
> binding a device to vfio creates a /dev/vfio$GROUP character device
> file. This "group" fd provides provides dma mapping ioctls as well as
> ioctls to enumerate and return a "device" fd for each attached member of
> the group (similar to KVM_CREATE_VCPU). We enforce grouping by
> returning an error on open() of the group fd if there are members of the
> group not bound to the vfio driver. Each device fd would then support a
> similar set of ioctls and mapping (mmio/pio/config) interface as current
> vfio, except for the obvious domain and dma ioctls superseded by the
> group fd.
>
> Another valid model might be that /dev/vfio/$GROUP is created for all
> groups when the vfio module is loaded. The group fd would allow open()
> and some set of iommu querying and device enumeration ioctls, but would
> error on dma mapping and retrieving device fds until all of the group
> devices are bound to the vfio driver.

I am in favour of /dev/vfio/$GROUP. If multiple devices should be
assigned to a guest, there can also be an ioctl to bind a group to the
address-space of another group (this certainly needs some care to
prevent binding two groups that belong to different processes).

Btw, a problem we haven't fully talked about yet is driver
de-assignment. User space can decide to de-assign the device from vfio
while an fd is open on it. With PCI there is no way to let this fail
(the .release function returns void, last time I checked). Is this a
problem, and if so, how do we handle it?

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr.
43632
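The second model from the BoF (group fd opens freely, but DMA mapping is refused until every member is bound to vfio), together with the proposed ioctl for attaching one group to another group's address space, can be summarised as a small state model. A minimal sketch in C, assuming a single owning process per group fd and using -EBUSY/-EPERM as the error codes; struct group_model, group_open(), group_map_dma() and group_merge() are hypothetical illustrations, not VFIO's actual interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MEMBERS 8

/* Hypothetical model of one iommu group, not kernel code. */
struct group_model {
    int id;                     /* iommu group number */
    size_t nmembers;
    bool bound[MAX_MEMBERS];    /* is member i bound to the vfio driver? */
    int owner;                  /* pid holding the group fd, 0 = none */
    int address_space;          /* iommu domain the group is attached to */
};

/* A group is usable only when every member is bound to vfio. */
static bool group_viable(const struct group_model *g)
{
    assert(g->nmembers <= MAX_MEMBERS);
    for (size_t i = 0; i < g->nmembers; i++)
        if (!g->bound[i])
            return false;
    return true;
}

/* open(): allowed even if the group is not yet viable; records the owner.
 * Assumption: a second process may not open a group that is already held. */
int group_open(struct group_model *g, int pid)
{
    if (g->owner != 0 && g->owner != pid)
        return -EBUSY;
    g->owner = pid;
    return 0;
}

/* DMA mapping (and, likewise, handing out device fds) is refused
 * until all group members are bound to vfio. */
int group_map_dma(const struct group_model *g, int pid)
{
    if (g->owner != pid)
        return -EPERM;
    if (!group_viable(g))
        return -EBUSY;
    return 0;
}

/* Attach g to the address space of other, refusing if the two groups
 * are held by different processes. */
int group_merge(struct group_model *g, const struct group_model *other,
                int pid)
{
    if (g->owner != pid || other->owner != pid)
        return -EPERM;
    if (!group_viable(g) || !group_viable(other))
        return -EBUSY;
    g->address_space = other->address_space;
    return 0;
}
```

Moving the group_viable() check into group_open() would turn this into the first BoF model instead, where open() itself fails while any member is still bound to a host driver. The open de-assignment question is deliberately not modelled here.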