From: Dante Cinco <dantecinco@gmail.com>
To: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Xen-devel <xen-devel@lists.xensource.com>
Subject: Re: swiotlb=force in Konrad's xen-pcifront-0.8.2 pvops domU kernel with PCI passthrough
Date: Tue, 16 Nov 2010 11:43:11 -0800
Message-ID: <AANLkTikw8reKXwd9CcXc3qqHuXKjbMEatAVfn19uwzs3@mail.gmail.com>
In-Reply-To: <20101116185748.GA11549@dumpdata.com>

On Tue, Nov 16, 2010 at 10:57 AM, Konrad Rzeszutek Wilk
<konrad.wilk@oracle.com> wrote:
>> >> Using the bounce buffers limits the DMA operations to under 32-bit. So could it be that you are using some casting macro that casts a PFN to unsigned long or vice-versa and we end up truncating it to 32-bit? (I've seen this issue actually with InfiniBand drivers back in RHEL5 days..). Lastly, do you set your DMA mask on the device to 32BIT?
>> >>
>> >> The tachyon chip supports both 32-bit & 45-bit DMA. Some features require programming a 32-bit physical address into the chip; others require a 45-bit physical address.
>> >
>> > Oh boy. That complicates it.
>> >
>> >> The driver doesn't set DMA mask on the device to 32 bit.
>> >
>> > Is it set then to 45bit?
>> >
>>
>> We were not explicitly setting the DMA mask. pci_alloc_coherent was
>
> You should. But only once (during startup).
>
>> always returning 32 bits but pci_map_single was returning a 34-bit
>> address which we truncate by casting it to a uint32_t since the
>
> Truncating any bus (DMA) address is a big no no.
>
>> Tachyon's HBA register is only 32 bits. With swiotlb=force, both
>
> Not knowing the driver I can't comment here much, but
>  1). When you say 'HBA registers' I think PCI MMIO BARs. Those are
>     usually found beneath the 4GB limit and you get the virtual
>     address when doing ioremap (or the pci equivalent). And the
>     bus address is definitely under the 4GB.
>  2). After you have done that, set your pci_dma_mask to 34-bit, and then
>  3). For all other operations where you can do 34-bit use
>     pci_map_single. The swiotlb code looks at the dma_mask (and if
>     none is set it assumes 32-bit), and if it finds the physical address
>     to be within the DMA mask it will gladly translate the physical
>     to bus and nothing else. If however the physical address is
>     beyond what the DMA mask covers it will give you the bounce buffer,
>     which you will later have to copy from (using pci_sync..). I've written
>     a little blurb at the bottom of the email explaining this in more detail.
>
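A minimal user-space sketch of the mask widths under discussion (32, 34 and 45 bits) and of the "no mask set means 32-bit" default described above. It is not driver code: `toy_dev`, `toy_dma_bit_mask` and `toy_effective_mask` are hypothetical names, and the mask arithmetic simply mirrors what the kernel's DMA_BIT_MASK(n) macro computes.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low n bits set, same arithmetic as the kernel's DMA_BIT_MASK(n). */
static uint64_t toy_dma_bit_mask(unsigned int n)
{
    assert(n >= 1 && n <= 64);
    return (n == 64) ? ~0ULL : ((1ULL << n) - 1);
}

/* Hypothetical stand-in for a device; dma_mask == 0 means "driver never set one". */
struct toy_dev {
    uint64_t dma_mask;
};

/* Assumption taken from the text above: an unset mask is treated as 32-bit. */
static uint64_t toy_effective_mask(const struct toy_dev *dev)
{
    return dev->dma_mask ? dev->dma_mask : toy_dma_bit_mask(32);
}
```

With a zeroed `toy_dev` the effective mask is 0xffffffff; after storing `toy_dma_bit_mask(34)` it becomes 0x3ffffffff, which is the difference between bouncing and not bouncing a 34-bit bus address.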
> Or is the issue that when you write to your HBA register the DMA
> address, the HBA register can _only_ deal with 32-bit values (4bytes)?

The HBA register which is using the address returned by pci_map_single
is limited to a 32-bit value.
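
To make the truncation problem concrete, here is a small user-space sketch, assuming bus addresses are carried in a 64-bit type. `toy_bus_addr_t` and the three helpers are hypothetical names for illustration only; none of this is taken from the Tachyon driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit bus (DMA) address type. */
typedef uint64_t toy_bus_addr_t;

/* True when addr can be stored in a 32-bit register without losing bits. */
static bool toy_fits_in_reg32(toy_bus_addr_t addr)
{
    return (addr >> 32) == 0;
}

/* What an unchecked cast does: bits 32 and above are silently dropped. */
static uint32_t toy_truncate(toy_bus_addr_t addr)
{
    return (uint32_t)addr;
}

/* Checked variant: refuses (via assert) an address that would be truncated. */
static uint32_t toy_reg32_value(toy_bus_addr_t addr)
{
    assert(toy_fits_in_reg32(addr));
    return (uint32_t)addr;
}
```

For a made-up 34-bit address such as 0x212345000, `toy_truncate` yields 0x12345000, so the device would be pointed at a different page than the one that was mapped.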

> In which case the PCI device seems to be limited to addressing only up to 4GB, right?

The HBA has some 32-bit registers and some that are 45-bit.

>
>> returned 32 bits without explicitly setting the DMA mask. Once we set
>> the mask to 32 bits using pci_set_dma_mask, the NMIs stopped. However
>> with iommu=soft (and no more swiotlb=force), we're still stuck with
>> the abysmal I/O performance (same as when we had swiotlb=force).
>
> Right, that is expected.

So with iommu=soft, all I/Os have to go through Xen-SWIOTLB which
explains why we're seeing the abysmal I/O performance, right?

Is it true then that with an HVM domU kernel and PCI passthrough, it
does not use Xen-SWIOTLB and therefore results in better performance?

>
>> In pvops domU (xen-pcifront-0.8.2), what does iommu=soft do? What's
>> the default if we don't specify it? Without it, we get no I/Os (it
>
> If you don't specify it you can't do PCI passthrough in PV guests.
> It is automatically enabled when you boot Linux as Dom0.
>
>> seems the interrupts and/or DMA don't work).
>
> It has two purposes:
>
>  1). The predominant one, which is used for both DomU and Dom0, is to
>     translate physical frame numbers to machine frame numbers (PFNs->MFNs).
>     Xen PV guests have a P2M array that is consulted when setting
>     virtual addresses (PTEs). For PCI BARs, they are equivalent
>     (PFN == MFN), but for memory regions they can be discontiguous,
>     and in decreasing order. If you would traverse the P2M list you
>     could see: p2m(0x1000)==0x5121, p2m(0x1001)==0x5120, p2m(0x1002)==0x5119.
>
>     So obviously we need a lookup mechanism to, say, find for
>     virtual address 0xfffff8000010000 the DMA address (bus address).
>     Naively on baremetal on X86 you could use virt_to_phys which would
>     get you PFN 0x10000. On Xen however, we need to consult the P2M array.
>     For example, for p2m[0x10000], the real machine frame number might be 0x102323.
>
>     So when you do 'pci_map_*' Xen-SWIOTLB looks up the P2M to find you the
>     machine frame number and returns that (dma address aka bus address). That
>     is the value you tell the HBA to transfer from/to.
>
>     If you don't enable Xen-SWIOTLB, and use the native one (or none at all),
>     you end up programming the PCI device with bogus data since the bus address you
>     are giving the card does not correspond to the real bus address.
>
>  2). Using our example before, the p2m[0x10000] returned MFN 0x102323. That
>     MFN is above 4GB (0x100000) and if your device can _only_ do PCI Memory Write
>     and PCI Memory Read below that b/c it only has 32 address bits we need some way
>     of still getting the contents of 0x102323 to the PCI card. This is where
>     bounce buffers come into play. During bootup, Xen-SWIOTLB initializes a 64MB
>     chunk of space that is underneath the 4GB space - it is also contiguous.
>     When you do 'pci_map_*' Xen-SWIOTLB looks at the DMA mask you have and the MFN,
>     and if the MFN's bus address does not fit within the DMA mask it copies the
>     contents of 0x102323 to one of its buffers, gives you the MFN of its buffer
>     (say 0x20000) and you program that in the PCI card.  When you get an
>     interrupt from the PCI card, you call pci_sync_* which copies from MFN
>     0x20000 to 0x102323 and sticks the MFN 0x20000 back on the list of buffers
>     to be used. And now you have in MFN 0x102323 the result.
>
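The two purposes described above can be sketched in user-space C: a P2M lookup that turns a guest physical address into a bus address, and a check for whether that bus address fits under the device's DMA mask. This is only a toy model under stated assumptions: 4KB pages, a four-entry table populated with the illustrative PFN/MFN pairs from the explanation, and hypothetical `toy_*` names. It is not the Xen-SWIOTLB implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12          /* assumption: 4KB pages */
#define TOY_NO_MFN     UINT64_MAX  /* sentinel: PFN not present in the toy table */

/* Toy P2M entries; the pairs are the illustrative values from the text. */
struct toy_p2m_entry {
    uint64_t pfn;
    uint64_t mfn;
};

static const struct toy_p2m_entry toy_p2m[] = {
    { 0x1000,  0x5121 },
    { 0x1001,  0x5120 },
    { 0x1002,  0x5119 },
    { 0x10000, 0x102323 },
};

/* Linear search is enough for a toy table; a real P2M is an indexed array. */
static uint64_t toy_pfn_to_mfn(uint64_t pfn)
{
    for (size_t i = 0; i < sizeof(toy_p2m) / sizeof(toy_p2m[0]); i++)
        if (toy_p2m[i].pfn == pfn)
            return toy_p2m[i].mfn;
    return TOY_NO_MFN;
}

/* Guest physical address -> bus address: swap the PFN for its MFN, keep the offset. */
static uint64_t toy_phys_to_bus(uint64_t phys)
{
    uint64_t mfn = toy_pfn_to_mfn(phys >> TOY_PAGE_SHIFT);

    assert(mfn != TOY_NO_MFN);
    return (mfn << TOY_PAGE_SHIFT) | (phys & ((1ULL << TOY_PAGE_SHIFT) - 1));
}

/* True when some byte of [bus, bus + len) lies above dma_mask, i.e. a bounce is needed. */
static bool toy_needs_bounce(uint64_t bus, size_t len, uint64_t dma_mask)
{
    assert(len > 0);
    return bus > dma_mask || len - 1 > dma_mask - bus;
}
```

In this model PFN 0x10000 maps to a bus address above 4GB, so a 32-bit mask forces a bounce while a 34-bit mask does not; PFN 0x1001 maps below 4GB and never needs one.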
>>
>> Are there any profiling tools you can suggest for domU? I was able to
>> apply Dulloor's xenoprofile patch to our dom0 kernel (2.6.32.25-pvops)
>> but not to xen-pcifront-0.8.2.
>
> Oh boy. I don't, sorry.
>
