From: Chris Wright
Subject: Re: kvm PCI assignment & VFIO ramblings
Date: Fri, 26 Aug 2011 14:06:19 -0700
To: Aaron Fabbri
Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
    "Roedel, Joerg", Alexander Graf, qemu-devel, Chris Wright, iommu,
    Avi Kivity, linux-pci@vger.kernel.org, linuxppc-dev, benve@cisco.com

* Aaron Fabbri (aafabbri@cisco.com) wrote:
> On 8/26/11 12:35 PM, "Chris Wright" wrote:
> > * Aaron Fabbri (aafabbri@cisco.com) wrote:
> >> Each process will open vfio devices on the fly, and they need to be
> >> able to share IOMMU resources.
> >
> > How do you share IOMMU resources w/ multiple processes, are the
> > processes sharing memory?
>
> Sorry, bad wording.  I share IOMMU domains *within* each process.

Ah, got it.  Thanks.

> E.g. if one process has 3 devices and another has 10, I can get by with
> two iommu domains (and can share buffers among devices within each
> process).
>
> If I ever need to share devices across processes, the shared memory
> case might be interesting.
>
> >> So I need the ability to dynamically bring up devices and assign
> >> them to a group.  The number of actual devices and how they map to
> >> iommu domains is not known ahead of time.  We have a single piece of
> >> silicon that can expose hundreds of pci devices.
> >
> > This does not seem fundamentally different from the KVM use case.
> >
> > We have 2 kinds of groupings.
> >
> > 1) low-level system or topology grouping
> >
> >    Some may have multiple devices in a single group
> >
> >    * the PCIe-PCI bridge example
> >    * the POWER partitionable endpoint
> >
> >    Many will not
> >
> >    * singleton group, e.g. typical x86 PCIe function (majority of
> >      assigned devices)
> >
> >    Not sure it makes sense to have these administratively defined as
> >    opposed to system defined.
> >
> > 2) logical grouping
> >
> >    * multiple low-level groups (singleton or otherwise) attached to
> >      the same process, allowing things like a single set of io page
> >      tables where applicable.
> >
> >    These are nominally administratively defined.  In the KVM case,
> >    there is likely a privileged task (i.e. libvirtd) involved w/
> >    making the device available to the guest, and it can do things
> >    like group merging.  In your userspace case, perhaps it should be
> >    directly exposed.
>
> Yes.  In essence, I'd rather not have to run any other admin processes.
> Doing things programmatically, on the fly, from each process, is the
> cleanest model right now.

I don't see an issue w/ this.  As long as it cannot add devices to the
system-defined groups, it's not a privileged operation.  So we still
need the iommu domain concept exposed in some form to logically put
groups into a single iommu domain (if desired).

In fact, I believe Alex covered this in his most recent recap:

  ...The group fd will provide interfaces for enumerating the devices
  in the group, returning a file descriptor for each device in the
  group (the "device fd"), binding groups together, and returning a
  file descriptor for iommu operations (the "iommu fd").
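To make that concrete, here is a rough sketch of how that flow might
look from an unprivileged process.  Every ioctl name, number, and
device path below is made up for illustration; nothing here is a
settled interface, only the proposed group-fd / device-fd / iommu-fd
split from the recap:

    /*
     * Hypothetical sketch only: these ioctls and paths do not exist;
     * they stand in for the proposed group/device/iommu fd interface.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Made-up ioctl requests for the proposed group fd operations. */
    #define GROUP_MERGE          _IOW('V', 1, int)    /* bind another group  */
    #define GROUP_GET_IOMMU_FD   _IO('V', 2)          /* fd for iommu ops    */
    #define GROUP_GET_DEVICE_FD  _IOW('V', 3, char *) /* device fd by name   */

    int main(void)
    {
        /* Open two system-defined groups (paths are illustrative). */
        int grp_a = open("/dev/vfio/group-a", O_RDWR);
        int grp_b = open("/dev/vfio/group-b", O_RDWR);
        if (grp_a < 0 || grp_b < 0) {
            perror("open group");
            return 1;
        }

        /* Merge group B into group A: both now share one iommu domain. */
        if (ioctl(grp_a, GROUP_MERGE, grp_b) < 0) {
            perror("merge groups");
            return 1;
        }

        /* One iommu fd covers the merged group's io page tables. */
        int iommu = ioctl(grp_a, GROUP_GET_IOMMU_FD);
        if (iommu < 0) {
            perror("get iommu fd");
            return 1;
        }

        /* Per-device fd for programming an individual device. */
        int dev = ioctl(grp_a, GROUP_GET_DEVICE_FD, "0000:01:00.0");
        if (dev < 0) {
            perror("get device fd");
            return 1;
        }

        /* ... map buffers via the iommu fd, drive devices via dev ... */
        close(dev);
        close(iommu);
        close(grp_b);
        close(grp_a);
        return 0;
    }

The nice property is that merging operates on group fds the process has
already been allowed to open, so it can combine its own groups into a
single iommu domain without ever being able to alter the system-defined
group membership.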
thanks,
-chris