From: "Michael S. Tsirkin"
Subject: Re: [PATCH 2/7] pci: memory access API and IOMMU support
Date: Sun, 5 Sep 2010 10:10:31 +0300
Message-ID: <20100905071029.GA16014@redhat.com>
References: <1283007298-10942-1-git-send-email-eduard.munteanu@linux360.ro>
 <1283007298-10942-3-git-send-email-eduard.munteanu@linux360.ro>
 <20100902052826.GB4273@redhat.com>
 <20100902084058.GA7211@localhost>
 <20100902094947.GA9085@redhat.com>
To: Blue Swirl
Cc: Eduard - Gabriel Munteanu, joro@8bytes.org, paul@codesourcery.com,
 avi@redhat.com, anthony@codemonkey.ws, av1474@comtv.ru,
 yamahata@valinux.co.jp, kvm@vger.kernel.org, qemu-devel@nongnu.org

On Sat, Sep 04, 2010 at 09:01:06AM +0000, Blue Swirl wrote:
> On Thu, Sep 2, 2010 at 9:49 AM, Michael S. Tsirkin wrote:
> > On Thu, Sep 02, 2010 at 11:40:58AM +0300, Eduard - Gabriel Munteanu wrote:
> >> On Thu, Sep 02, 2010 at 08:28:26AM +0300, Michael S. Tsirkin wrote:
> >> > On Sat, Aug 28, 2010 at 05:54:53PM +0300, Eduard - Gabriel Munteanu wrote:
> >> > > PCI devices should access memory through pci_memory_*() instead of
> >> > > cpu_physical_memory_*(). This also provides support for translation and
> >> > > access checking in case an IOMMU is emulated.
> >> > >
> >> > > Memory maps are treated as remote IOTLBs (that is, translation caches
> >> > > belonging to the IOMMU-aware device itself). Clients (devices) must
> >> > > provide callbacks for map invalidation in case these maps are
> >> > > persistent beyond the current I/O context, e.g. AIO DMA transfers.
> >> > >
> >> > > Signed-off-by: Eduard - Gabriel Munteanu
> >> >
> >> > I am concerned about adding more pointer chasing on the data path.
> >> > Could we have
> >> > 1. an iommu pointer in a device, inherited by secondary buses
> >> >    when they are created and by devices from buses when they are attached.
> >> > 2. translation pointer in the iommu instead of the bus
> >>
> >> The first solution I proposed was based on qdev, that is, each
> >> DeviceState had an 'iommu' field. Translation would be done by
> >> recursively looking in the parent buses/devices for an IOMMU.
> >>
> >> But Anthony said we're better off with bus-specific APIs, mostly because
> >> (IIRC) there may be different types of addresses and it might be
> >> difficult to abstract those properly.
> >
> > Well, we ended up with casting
> > away types to make the pci callbacks fit in the ide structure,
> > and silently assuming that all addresses are in fact 64 bit.
> > So maybe it's hard to abstract addresses properly, but
> > it appears we'll have to, to avoid even worse problems.
> >
> >> I suppose I could revisit the idea by integrating the IOMMU in a
> >> PCIDevice as opposed to a DeviceState.
> >>
> >> Anthony, Paul, any thoughts on this?
> >
> > Just to clarify: this is an optimization idea:
> > instead of a bus walk on each access, do the walk
> > when the device is attached to the bus, and copy the iommu
> > from the root to the device itself.
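
To make the idea above concrete, here is a minimal sketch of the attach-time
propagation (all names below are illustrative assumptions, not the actual
patch or current QEMU API):

typedef struct PCIIOMMU PCIIOMMU;

typedef struct PCIBus {
    struct PCIBus *parent;   /* NULL for the root bus */
    PCIIOMMU *iommu;         /* set only on buses that have an IOMMU */
} PCIBus;

typedef struct PCIDevice {
    PCIBus *bus;
    PCIIOMMU *iommu;         /* cached copy, filled in at attach time */
} PCIDevice;

/* Walk the bus hierarchy once, when the device is attached,
 * instead of on every DMA access. */
static void pci_device_inherit_iommu(PCIDevice *dev, PCIBus *bus)
{
    PCIBus *b;

    dev->bus = bus;
    dev->iommu = NULL;
    for (b = bus; b != NULL; b = b->parent) {
        if (b->iommu) {
            dev->iommu = b->iommu;
            break;
        }
    }
}
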
> >
> > This will also make it possible to create a
> > DMADeviceState structure which would have this iommu field,
> > and we'd use this structure instead of the void pointers all over.
> >
> >> > 3. pci_memory_XX functions inline, doing a fast path for the non-iommu case:
> >> >
> >> >     if (__builtin_expect(!dev->iommu, 1))
> >> >             return cpu_memory_rw
> >>
> >> But isn't this some sort of 'if (likely(!dev->iommu))' from the Linux
> >> kernel? If so, it puts the IOMMU-enabled case at a disadvantage.
> >
> > IOMMU has a ton of indirections anyway.
> >
> >> I suppose most emulated systems would have at least some theoretical
> >> reasons to enable the IOMMU, e.g. as a GART replacement (say for 32-bit
> >> PCI devices) or for userspace drivers.
> >> So there are reasons to enable
> >> the IOMMU even when you don't have a real host IOMMU and you're not
> >> using nested guests.
> >
> > The time when most people enable the iommu for all devices in both real
> > and virtualized systems appears distant; one of the reasons is that it
> > has a lot of overhead. Let's start with not adding overhead for existing
> > users, makes sense?
>
> I think the goal architecture (not for IOMMU, but in general) is one
> with zero copy DMA. This means we have stage one where the addresses
> are translated to host pointers and stage two where the read/write
> operation happens. The devices need to be converted.
>
> Now, let's consider the IOMMU in this zero copy architecture. It's one
> stage of address translation; for the access operation it will not
> matter. We can add translation caching at the device level (or even at
> any intermediate level), but that needs a cache invalidation callback
> system as discussed earlier. This can be implemented later; we need
> the zero copy stuff first.
>
> Given this overall picture, I think eliminating some pointer
> dereference overheads in the non-zero copy architecture is a very
> premature optimization, and it may even steer the architecture in the
> wrong direction.
>
> If the performance degradation at this point is not acceptable, we
> could also postpone merging the IOMMU until the zero copy conversion has
> happened, or make the IOMMU a compile-time option. But it would be nice
> to back the decision with performance figures.

I agree; a minimal benchmark showing no performance impact when the IOMMU
is disabled would put these concerns to rest.

-- 
MST
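
For reference, a minimal sketch of the inline fast path discussed above
(PCIIOMMU, its translate callback and the PCIDevice layout are assumed,
illustrative names; cpu_physical_memory_rw is only declared as a stand-in
for the existing flat memory access):

#include <stdint.h>

typedef uint64_t pci_addr_t;

typedef struct PCIIOMMU {
    /* Translate a bus address and check access rights; returns 0 on
     * success and fills in the translated physical address. */
    int (*translate)(struct PCIIOMMU *iommu, pci_addr_t addr, int len,
                     int is_write, uint64_t *paddr);
} PCIIOMMU;

typedef struct PCIDevice {
    PCIIOMMU *iommu;     /* NULL when no IOMMU is configured */
} PCIDevice;

/* Stand-in declaration for the existing flat access routine. */
void cpu_physical_memory_rw(uint64_t addr, uint8_t *buf, int len, int is_write);

static inline void pci_memory_rw(PCIDevice *dev, pci_addr_t addr,
                                 uint8_t *buf, int len, int is_write)
{
    /* Fast path: no IOMMU, no translation, no extra indirection. */
    if (__builtin_expect(!dev->iommu, 1)) {
        cpu_physical_memory_rw(addr, buf, len, is_write);
        return;
    }

    /* Slow path: translate through the IOMMU, then access memory. */
    uint64_t paddr;
    if (dev->iommu->translate(dev->iommu, addr, len, is_write, &paddr) == 0) {
        cpu_physical_memory_rw(paddr, buf, len, is_write);
    }
    /* In this sketch a failed translation simply drops the access. */
}
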