From: Dante Cinco
Subject: Re: swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough
Date: Tue, 16 Nov 2010 11:43:11 -0800
References: <20101112165541.GA10339@dumpdata.com> <20101112223333.GD26189@dumpdata.com> <20101116185748.GA11549@dumpdata.com>
In-Reply-To: <20101116185748.GA11549@dumpdata.com>
To: Konrad Rzeszutek Wilk
Cc: Xen-devel
List-Id: xen-devel@lists.xenproject.org

On Tue, Nov 16, 2010 at 10:57 AM, Konrad Rzeszutek Wilk wrote:
>> >> Using the bounce buffers limits the DMA operations to under 32-bit.
>> >> So could it be that you are using some casting macro that casts a
>> >> PFN to unsigned long or vice-versa and we end up truncating it to
>> >> 32-bit? (I've seen this issue actually with InfiniBand drivers back
>> >> in RHEL5 days..). Lastly, do you set your DMA mask on the device to
>> >> 32BIT?
>> >>
>> >> The tachyon chip supports both 32-bit & 45-bit dma. Some features
>> >> need to set 32-bit physical addr to chip. Others need to set 45-bit
>> >> physical addr to chip.
>> >
>> > Oh boy. That complicates it.
>> >
>> >> The driver doesn't set DMA mask on the device to 32 bit.
>> >
>> > Is it set then to 45bit?
>> >
>>
>> We were not explicitly setting the DMA mask. pci_alloc_coherent was
>
> You should. But only once (during startup).
>
>> always returning 32 bits but pci_map_single was returning a 34-bit
>> address which we truncate by casting it to a uint32_t since the
>
> Truncating any bus (DMA) address is a big no no.
>
>> Tachyon's HBA register is only 32 bits. With swiotlb=force, both
>
> Not knowing the driver I can't comment here much, but
>  1). When you say 'HBA registers' I think PCI MMIO BARs.
>     Those are usually found beneath the 4GB limit and you get the
>     virtual address when doing ioremap (or the pci equivalent). And the
>     bus address is definitely under the 4GB.
>  2). After you have done that, set your pci_dma_mask to 34-bit, and then
>  3). For all other operations where you can do 34-bit use the
>     pci_map_single. The swiotlb buffer looks at the dma_mask (and if
>     none is set it assumes 32bit), and if it finds the physical address
>     to be within the DMA mask it will gladly translate the physical
>     to bus and nothing else. If however the physical address is way
>     beyond the bus address it will give you the bounce buffer which
>     you will later have to copy from (using pci_sync..). I've written
>     a little blurb at the bottom of the email explaining this in more detail.
>
> Or is the issue that when you write to your HBA register the DMA
> address, the HBA register can _only_ deal with 32-bit values (4bytes)?

The HBA register which is using the address returned by pci_map_single
is limited to a 32-bit value.

> In which case the PCI device seems to be limited to addressing only up to 4GB, right?

The HBA has some 32-bit registers and some that are 45-bit.

>
>> returned 32 bits without explicitly setting the DMA mask. Once we set
>> the mask to 32 bits using pci_set_dma_mask, the NMIs stopped. However
>> with iommu=soft (and no more swiotlb=force), we're still stuck with
>> the abysmal I/O performance (same as when we had swiotlb=force).
>
> Right, that is expected.

So with iommu=soft, all I/Os have to go through Xen-SWIOTLB, which
explains why we're seeing the abysmal I/O performance, right?

Is it true then that with an HVM domU kernel and PCI passthrough, it
does not use Xen-SWIOTLB and therefore results in better performance?

>
>> In pvops domU (xen-pcifront-0.8.2), what does iommu=soft do? What's
>> the default if we don't specify it?
>> Without it, we get no I/Os (it
>
> If you don't specify it you can't do PCI passthrough in PV guests.
> It is automatically enabled when you boot Linux as Dom0.
>
>> seems the interrupts and/or DMA don't work).
>
> It has two purposes:
>
>  1). The predominant one, which is used for both DomU and Dom0, is to
>     translate physical address to machine frame numbers (PFNs->MFNs).
>     Xen PV guests have a P2M array that is consulted when setting
>     virtual addresses (PTEs). For PCI BARs, they are equivalent
>     (PFN == MFN), but for memory regions they can be discontiguous,
>     and in decreasing order. If you would traverse the P2M list you
>     could see: p2m(0x1000)==0x5121, p2m(0x1001)==0x5120, p2m(0x1002)==0x5119.
>
>     So obviously we need a lookup mechanism to say find for
>     virtual address 0xfffff8000010000 the DMA address (bus address).
>     Naively on baremetal on X86 you could use virt_to_phys which would
>     get you PFN 0x10000. On Xen however, we need to consult the P2M array.
>     For example, for p2m[0x10000], the real machine frame number might be 0x102323.
>
>     So when you do 'pci_map_*' Xen-SWIOTLB looks up the P2M to find you the
>     machine frame number and returns that (dma address aka bus address). That
>     is the value you tell the HBA to transform from/to.
>
>     If you don't enable Xen-SWIOTLB, and use the native one (or none at all),
>     you end up programming the PCI driver with bogus data since the bus address you
>     are giving the card does not correspond to the real bus address.
>
>  2). Using our example before, the p2m[0x10000] returned MFN 0x102323. That
>     MFN is above 4GB (0x100000) and if your device can _only_ do PCI Memory Write
>     and PCI Memory Read b/c it only has 32-bit address bits we need some way
>     of still getting the contents of 0x102323 to the PCI card.
>     This is where bounce buffers come into play. During bootup, Xen-SWIOTLB
>     initializes a 64MB chunk of space that is underneath the 4GB space - it
>     is also contiguous.
>     When you do 'pci_map_*' Xen-SWIOTLB looks at the DMA mask you have, the MFN,
>     and if the MFN's bus address exceeds the DMA mask it copies the value from
>     0x102323 to one of its buffers, gives you the MFN of its buffer (say 0x20000)
>     and you program that in the PCI card. When you get an interrupt from the
>     PCI card, you call pci_sync_* which copies from MFN 0x20000 to 0x102323 and
>     sticks the MFN 0x20000 back on the list of buffers to be used. And now you
>     have in MFN 0x102323 the result.
>
>>
>> Are there any profiling tools you can suggest for domU? I was able to
>> apply Dulloor's xenoprofile patch to our dom0 kernel (2.6.32.25-pvops)
>> but not to xen-pcifront-0.8.2.
>
> Oh boy. I don't, sorry.
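
To check my understanding of the map/sync flow described above, here is a
minimal user-space sketch in C. It is not kernel or Xen-SWIOTLB code: every
name prefixed with toy_ or TOY_ is hypothetical, the single bounce page stands
in for the 64MB pool, and the MFN values (0x102323 and 0x20000) are just the
ones from the example in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)

/* Mask with the low n bits set (assumption: mirrors the usual DMA mask idea). */
#define TOY_DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical model of one page of memory and the MFN backing it. */
struct toy_page {
    uint64_t mfn;
    uint8_t  data[TOY_PAGE_SIZE];
};

/* Result of a mapping: the bus address the driver would program. */
struct toy_mapping {
    uint64_t bus_addr;
    int      bounced;   /* non-zero when the bounce page was used */
};

/* Can the device reach the whole buffer with this DMA mask? */
static int toy_dma_capable(uint64_t dma_mask, uint64_t bus_addr, size_t len)
{
    return bus_addr + len - 1 <= dma_mask;
}

/* Toy stand-in for pci_map_single: map directly or fall back to bouncing. */
static struct toy_mapping toy_map_single(struct toy_page *guest,
                                         struct toy_page *bounce,
                                         uint64_t dma_mask, size_t len)
{
    struct toy_mapping m;
    uint64_t bus = guest->mfn << TOY_PAGE_SHIFT;

    assert(len > 0 && len <= TOY_PAGE_SIZE);

    if (toy_dma_capable(dma_mask, bus, len)) {
        /* Within the mask: hand back the real bus address, no copy. */
        m.bus_addr = bus;
        m.bounced = 0;
    } else {
        /* Beyond the mask: copy into the low bounce page and use its address. */
        memcpy(bounce->data, guest->data, len);
        m.bus_addr = bounce->mfn << TOY_PAGE_SHIFT;
        m.bounced = 1;
    }
    return m;
}

/* Toy stand-in for pci_sync_*: copy device-written data back to the guest page. */
static void toy_sync_for_cpu(struct toy_page *guest,
                             const struct toy_page *bounce,
                             const struct toy_mapping *m, size_t len)
{
    if (m->bounced)
        memcpy(guest->data, bounce->data, len);
}
```

With a 32-bit mask the page at MFN 0x102323 is bounced through MFN 0x20000;
with a 34-bit mask the same page maps directly to bus address 0x102323000.
Casting that value to a uint32_t would leave 0x02323000, which is the kind of
truncation we were doing in the driver.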