From: Dante Cinco
Subject: Re: swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough
Date: Wed, 17 Nov 2010 17:09:24 -0800
References: <20101112165541.GA10339@dumpdata.com> <20101112223333.GD26189@dumpdata.com> <20101116185748.GA11549@dumpdata.com> <20101116201349.GA18315@dumpdata.com>
In-Reply-To: <20101116201349.GA18315@dumpdata.com>
To: Konrad Rzeszutek Wilk
Cc: Xen-devel
List-Id: xen-devel@lists.xenproject.org

On Tue, Nov 16, 2010 at 12:15 PM, Konrad Rzeszutek Wilk wrote:
>> > Or is the issue that when you write to your HBA register the DMA
>> > address, the HBA register can _only_ deal with 32-bit values (4bytes)?
>>
>> The HBA register which is using the address returned by pci_map_single
>> is limited to a 32-bit value.
>>
>> > In which case the PCI device seems to be limited to addressing only up to 4GB, right?
>>
>> The HBA has some 32-bit registers and some that are 45-bit.
>
> Ugh, so can you set up pci coherent DMA pools at startup for the 32-bit
> registers. Then set the pci_dma_mask to 45-bit and use pci_map_single for
> all others.
>>
>> >> returned 32 bits without explicitly setting the DMA mask. Once we set
>> >> the mask to 32 bits using pci_set_dma_mask, the NMIs stopped. However
>> >> with iommu=soft (and no more swiotlb=force), we're still stuck with
>> >> the abysmal I/O performance (same as when we had swiotlb=force).
>> >
>> > Right, that is expected.
>>
>> So with iommu=soft, all I/Os have to go through Xen-SWIOTLB which
>> explains why we're seeing the abysmal I/O performance, right?
>
> You are simplifying it. You are seeing abysmal I/O performance b/c you
> are doing bounce buffering. You can fix this by making the driver
> have a 32-bit pool allocated at startup and use that just for the
> HBA registers that can only do 32-bit, and then for the rest use
> the pci_map_single and use DMA mask 45-bit.

I wanted to confirm that bounce buffering was indeed occurring, so I
modified swiotlb.c in the kernel and added printks in the following
functions:

swiotlb_bounce
swiotlb_tbl_map_single
swiotlb_tbl_unmap_single

Sure enough, we were calling all 3 five times per I/O.

We took your suggestion and replaced pci_map_single with
pci_pool_alloc. The swiotlb calls were gone, but the I/O performance
only improved 6% (29k IOPS to 31k IOPS), which is still abysmal. Any
suggestions on where to look next?

I have one question about the P2M array: does the P2M lookup occur on
every DMA or just during the allocation? What I'm getting at is this:
is the Xen-SWIOTLB a central resource that could be a bottleneck?
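For reference, the change we made is roughly along the lines below (a
simplified sketch only, not our actual driver source; the struct, the
function names and the descriptor size are made up for illustration).
The idea is the one you described: create a coherent pool below 4GB
once at startup for anything that has to land in a 32-bit HBA
register, and give the streaming mappings a 45-bit mask so
pci_map_single no longer has to bounce:

    /* Sketch only -- hypothetical driver names, not the real HBA driver. */
    #include <linux/pci.h>
    #include <linux/dma-mapping.h>
    #include <linux/gfp.h>

    #define HBA_DESC_SIZE 64                /* made-up descriptor size */

    struct my_hba {                         /* hypothetical driver state */
            struct pci_dev *pdev;
            struct pci_pool *desc_pool;     /* coherent pool, kept below 4GB */
    };

    /* Called once at probe/startup. */
    static int my_hba_dma_setup(struct my_hba *hba)
    {
            /* Streaming (per-I/O) mappings may use the full 45-bit range. */
            if (pci_set_dma_mask(hba->pdev, DMA_BIT_MASK(45)))
                    return -EIO;

            /* Coherent allocations stay below 4GB so their bus addresses
             * always fit the 32-bit HBA registers. */
            if (pci_set_consistent_dma_mask(hba->pdev, DMA_BIT_MASK(32)))
                    return -EIO;

            hba->desc_pool = pci_pool_create("hba-desc", hba->pdev,
                                             HBA_DESC_SIZE, HBA_DESC_SIZE, 0);
            if (!hba->desc_pool)
                    return -ENOMEM;
            return 0;
    }

    /* Per-I/O path: descriptor comes from the 32-bit pool, payload is a
     * streaming mapping under the 45-bit mask (so no bounce buffering). */
    static int my_hba_map_io(struct my_hba *hba, void *buf, size_t len,
                             void **desc, dma_addr_t *desc_dma,
                             dma_addr_t *buf_dma)
    {
            *desc = pci_pool_alloc(hba->desc_pool, GFP_ATOMIC, desc_dma);
            if (!*desc)
                    return -ENOMEM;

            *buf_dma = pci_map_single(hba->pdev, buf, len, PCI_DMA_TODEVICE);
            if (pci_dma_mapping_error(hba->pdev, *buf_dma)) {
                    pci_pool_free(hba->desc_pool, *desc, *desc_dma);
                    return -EIO;
            }
            /* *desc_dma is guaranteed to fit in a 32-bit register;
             * *buf_dma may be anywhere in the 45-bit range. */
            return 0;
    }

The pool is created once at probe time, so the per-I/O cost is just
pci_pool_alloc plus a pci_map_single that no longer shows up in
swiotlb_tbl_map_single.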
>> Is it true then that with an HVM domU kernel and PCI passthrough, it
>> does not use Xen-SWIOTLB and therefore results in better performance?
>
> Yes and no.
>
> If you allocate to your HVM guests more than 4GB you are going to
> hit the same issues with the bounce buffer.
>
> If you give your guest less than 4GB, there is no SWIOTLB running in the guest
> and QEMU along with the hypervisor end up using the hardware one (currently
> Xen hypervisor supports AMD V-i and Intel VT-d). In your case it is the VT-d
> - at which point the VT-d will remap your GMFN to MFNs. And the VT-d will
> be responsible for translating the DMA address that the PCI card will
> try to access to the real MFN.