linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* IOAT DMA w/IOMMU
@ 2018-08-09 18:14 Eric Pilmore
  2018-08-09 18:43 ` Bjorn Helgaas
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Pilmore @ 2018-08-09 18:14 UTC (permalink / raw)
  To: linux-pci

Didn't get any response on the IRC channel so trying here.

Was wondering if anybody here has used IOAT DMA engines with an
IOMMU turned on (Xeon based system)? My specific question is really
whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
destination without having to map that address to the IOVA space of the
DMA engine first (assuming the IOMMU is on)?

I am encountering issues where I see PTE Errors reported from DMAR in this
scenario, but I do not if I use a different DMA engine that's sitting downstream
off the PCI tree. I'm wondering if the IOAT DMA failure is some
artifact of these
engines sitting behind the Host Bridge.

Thanks in advance!
Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 18:14 IOAT DMA w/IOMMU Eric Pilmore
@ 2018-08-09 18:43 ` Bjorn Helgaas
  2018-08-09 18:51   ` Eric Pilmore
  0 siblings, 1 reply; 48+ messages in thread
From: Bjorn Helgaas @ 2018-08-09 18:43 UTC (permalink / raw)
  To: Eric Pilmore
  Cc: linux-pci, David Woodhouse, Logan Gunthorpe, Alex Williamson, iommu

[+cc David, Logan, Alex, iommu list]

On Thu, Aug 09, 2018 at 11:14:13AM -0700, Eric Pilmore wrote:
> Didn't get any response on the IRC channel so trying here.
> 
> Was wondering if anybody here has used IOAT DMA engines with an
> IOMMU turned on (Xeon based system)? My specific question is really
> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
> destination without having to map that address to the IOVA space of
> the DMA engine first (assuming the IOMMU is on)?

So is this a peer-to-peer DMA scenario?  You mention DMA, which would
be a transaction initiated by a PCI device, to a PCI BAR address, so
it doesn't sound like system memory is involved.

I copied some folks who know a lot more about this than I do.

> I am encountering issues where I see PTE Errors reported from DMAR
> in this scenario, but I do not if I use a different DMA engine
> that's sitting downstream off the PCI tree. I'm wondering if the
> IOAT DMA failure is some artifact of these engines sitting behind
> the Host Bridge.
> 
> Thanks in advance!
> Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 18:43 ` Bjorn Helgaas
@ 2018-08-09 18:51   ` Eric Pilmore
  2018-08-09 19:35     ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Pilmore @ 2018-08-09 18:51 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Logan Gunthorpe, Alex Williamson, iommu

On Thu, Aug 9, 2018 at 11:43 AM, Bjorn Helgaas <helgaas@kernel.org> wrote:
> [+cc David, Logan, Alex, iommu list]
>
> On Thu, Aug 09, 2018 at 11:14:13AM -0700, Eric Pilmore wrote:
>> Didn't get any response on the IRC channel so trying here.
>>
>> Was wondering if anybody here has used IOAT DMA engines with an
>> IOMMU turned on (Xeon based system)? My specific question is really
>> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
>> destination without having to map that address to the IOVA space of
>> the DMA engine first (assuming the IOMMU is on)?
>
> So is this a peer-to-peer DMA scenario?  You mention DMA, which would
> be a transaction initiated by a PCI device, to a PCI BAR address, so
> it doesn't sound like system memory is involved.

No, not peer-to-peer.  This is from system memory (e.g. SKB buffer which
has had an IOMMU mapping created) to a PCI BAR address.

>
> I copied some folks who know a lot more about this than I do.

Thank you!


>
>> I am encountering issues where I see PTE Errors reported from DMAR
>> in this scenario, but I do not if I use a different DMA engine
>> that's sitting downstream off the PCI tree. I'm wondering if the
>> IOAT DMA failure is some artifact of these engines sitting behind
>> the Host Bridge.
>>
>> Thanks in advance!
>> Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 18:51   ` Eric Pilmore
@ 2018-08-09 19:35     ` Logan Gunthorpe
  2018-08-09 19:47       ` Kit Chow
  2018-08-09 21:31       ` Eric Pilmore
  0 siblings, 2 replies; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-09 19:35 UTC (permalink / raw)
  To: Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu

Hey,

On 09/08/18 12:51 PM, Eric Pilmore wrote:
>>> Was wondering if anybody here has used IOAT DMA engines with an
>>> IOMMU turned on (Xeon based system)? My specific question is really
>>> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
>>> destination without having to map that address to the IOVA space of
>>> the DMA engine first (assuming the IOMMU is on)?

I haven't tested this scenario but my guess would be that IOAT would
indeed go through the IOMMU and the PCI BAR address would need to be
properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
is probably a good indication that this is the case. I really don't know
why you'd want to DMA something without mapping it.

>> So is this a peer-to-peer DMA scenario?  You mention DMA, which would
>> be a transaction initiated by a PCI device, to a PCI BAR address, so
>> it doesn't sound like system memory is involved.
> 
> No, not peer-to-peer.  This is from system memory (e.g. SKB buffer which
> has had an IOMMU mapping created) to a PCI BAR address.

It's definitely peer-to-peer in the case where you are using a DMA
engine in the PCI tree. You have the DMA PCI device sending TLPs
directly to the PCI BAR device. So, if everything is done right, the
TLPs will avoid the root complex completely. (Though, ACS settings could
also prevent this from working and you'd either get similar DMAR errors
or they'd disappear into a black hole).

When using the IOAT, it is part of the CPU so I wouldn't say it's really
peer-to-peer. But an argument could be made that it is. Though, this is
exactly what the existing ntb_transport is doing: DMAing from system
memory to a PCI BAR and vice versa using IOAT.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 19:35     ` Logan Gunthorpe
@ 2018-08-09 19:47       ` Kit Chow
  2018-08-09 20:11         ` Logan Gunthorpe
  2018-08-09 21:31       ` Eric Pilmore
  1 sibling, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-09 19:47 UTC (permalink / raw)
  To: Logan Gunthorpe, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu


On 08/09/2018 12:35 PM, Logan Gunthorpe wrote:
> Hey,
>
> On 09/08/18 12:51 PM, Eric Pilmore wrote:
>>>> Was wondering if anybody here has used IOAT DMA engines with an
>>>> IOMMU turned on (Xeon based system)? My specific question is really
>>>> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
>>>> destination without having to map that address to the IOVA space of
>>>> the DMA engine first (assuming the IOMMU is on)?
> I haven't tested this scenario but my guess would be that IOAT would
> indeed go through the IOMMU and the PCI BAR address would need to be
> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
> is probably a good indication that this is the case. I really don't know
> why you'd want to DMA something without mapping it.
I have experimented with changing ntb_async_tx_submit to dma_map the PCI 
BAR
address. With this, I get a different DMAR error:

DMAR: [DMA Write] Request device [07:00.4] fault addr ffd00000
[fault reason 12] non-zero reserved fields in PTE

Kit
>>> So is this a peer-to-peer DMA scenario?  You mention DMA, which would
>>> be a transaction initiated by a PCI device, to a PCI BAR address, so
>>> it doesn't sound like system memory is involved.
>> No, not peer-to-peer.  This is from system memory (e.g. SKB buffer which
>> has had an IOMMU mapping created) to a PCI BAR address.
> It's definitely peer-to-peer in the case where you are using a DMA
> engine in the PCI tree. You have the DMA PCI device sending TLPs
> directly to the PCI BAR device. So, if everything is done right, the
> TLPs will avoid the root complex completely. (Though, ACS settings could
> also prevent this from working and you'd either get similar DMAR errors
> or they'd disappear into a black hole).
>
> When using the IOAT, it is part of the CPU so I wouldn't say it's really
> peer-to-peer. But an argument could be made that it is. Though, this is
> exactly what the existing ntb_transport is doing: DMAing from system
> memory to a PCI BAR and vice versa using IOAT.
>
> Logan
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 19:47       ` Kit Chow
@ 2018-08-09 20:11         ` Logan Gunthorpe
  2018-08-09 20:57           ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-09 20:11 UTC (permalink / raw)
  To: Kit Chow, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 09/08/18 01:47 PM, Kit Chow wrote:
>> I haven't tested this scenario but my guess would be that IOAT would
>> indeed go through the IOMMU and the PCI BAR address would need to be
>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
>> is probably a good indication that this is the case. I really don't know
>> why you'd want to DMA something without mapping it.
> I have experimented with changing ntb_async_tx_submit to dma_map the PCI 
> BAR
> address. With this, I get a different DMAR error:

What code did you use to do this?


> DMAR: [DMA Write] Request device [07:00.4] fault addr ffd00000
> [fault reason 12] non-zero reserved fields in PTE

Also, what device corresponds to 07:00.4 on your system?

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 20:11         ` Logan Gunthorpe
@ 2018-08-09 20:57           ` Kit Chow
  2018-08-09 21:11             ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-09 20:57 UTC (permalink / raw)
  To: Logan Gunthorpe, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 08/09/2018 01:11 PM, Logan Gunthorpe wrote:
>
> On 09/08/18 01:47 PM, Kit Chow wrote:
>>> I haven't tested this scenario but my guess would be that IOAT would
>>> indeed go through the IOMMU and the PCI BAR address would need to be
>>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
>>> is probably a good indication that this is the case. I really don't know
>>> why you'd want to DMA something without mapping it.
>> I have experimented with changing ntb_async_tx_submit to dma_map the PCI
>> BAR
>> address. With this, I get a different DMAR error:
> What code did you use to do this?
If you mean version of linux, it is 4.15.7.  Or specific dma_map call, I 
believe it was dma_map_single.

>
>> DMAR: [DMA Write] Request device [07:00.4] fault addr ffd00000
>> [fault reason 12] non-zero reserved fields in PTE
> Also, what device corresponds to 07:00.4 on your system?
I believe 07.00.4 was the PLX dma device. I get the same error with ioat.

Kit


> Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 20:57           ` Kit Chow
@ 2018-08-09 21:11             ` Logan Gunthorpe
  2018-08-09 21:47               ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-09 21:11 UTC (permalink / raw)
  To: Kit Chow, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 09/08/18 02:57 PM, Kit Chow wrote:
> 
> 
> On 08/09/2018 01:11 PM, Logan Gunthorpe wrote:
>>
>> On 09/08/18 01:47 PM, Kit Chow wrote:
>>>> I haven't tested this scenario but my guess would be that IOAT would
>>>> indeed go through the IOMMU and the PCI BAR address would need to be
>>>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
>>>> is probably a good indication that this is the case. I really don't know
>>>> why you'd want to DMA something without mapping it.
>>> I have experimented with changing ntb_async_tx_submit to dma_map the PCI
>>> BAR
>>> address. With this, I get a different DMAR error:
>> What code did you use to do this?
> If you mean version of linux, it is 4.15.7.  Or specific dma_map call, I 
> believe it was dma_map_single.

I mean the complete code you use to do the mapping so we can see if it's
correct. dma_map_single() seems like an odd choice, I expected to see
dma_map_resource().

>>> DMAR: [DMA Write] Request device [07:00.4] fault addr ffd00000
>>> [fault reason 12] non-zero reserved fields in PTE
>> Also, what device corresponds to 07:00.4 on your system?
> I believe 07.00.4 was the PLX dma device. I get the same error with ioat.

Using the mapping with the PLX dma device likely converts it from a pure
P2P request to one where the TLPs pass through the IOMMU. So the fact
that you get the same error with both means IOAT almost certainly goes
through the IOMMU and there's something wrong with the mapping setup.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 19:35     ` Logan Gunthorpe
  2018-08-09 19:47       ` Kit Chow
@ 2018-08-09 21:31       ` Eric Pilmore
  2018-08-09 21:36         ` Logan Gunthorpe
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Pilmore @ 2018-08-09 21:31 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Bjorn Helgaas, linux-pci, David Woodhouse, Alex Williamson, iommu

On Thu, Aug 9, 2018 at 12:35 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> Hey,
>
> On 09/08/18 12:51 PM, Eric Pilmore wrote:
>>>> Was wondering if anybody here has used IOAT DMA engines with an
>>>> IOMMU turned on (Xeon based system)? My specific question is really
>>>> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
>>>> destination without having to map that address to the IOVA space of
>>>> the DMA engine first (assuming the IOMMU is on)?
>
> I haven't tested this scenario but my guess would be that IOAT would
> indeed go through the IOMMU and the PCI BAR address would need to be
> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
> is probably a good indication that this is the case. I really don't know
> why you'd want to DMA something without mapping it.

The thought was to avoid paying the price of having to go through yet another
translation and also because it was not believed to be necessary anyway since
the DMA device could go straight to a PCI BAR address without the need for a
mapping.  We have been playing with two DMA engines, IOAT and PLX. The
PLX does not have any issues going straight to the PCI BAR address, but unlike
IOAT, PLX is sitting "in the PCI tree".

>
>>> So is this a peer-to-peer DMA scenario?  You mention DMA, which would
>>> be a transaction initiated by a PCI device, to a PCI BAR address, so
>>> it doesn't sound like system memory is involved.
>>
>> No, not peer-to-peer.  This is from system memory (e.g. SKB buffer which
>> has had an IOMMU mapping created) to a PCI BAR address.
>
> It's definitely peer-to-peer in the case where you are using a DMA
> engine in the PCI tree. You have the DMA PCI device sending TLPs
> directly to the PCI BAR device. So, if everything is done right, the
> TLPs will avoid the root complex completely. (Though, ACS settings could
> also prevent this from working and you'd either get similar DMAR errors
> or they'd disappear into a black hole).
>
> When using the IOAT, it is part of the CPU so I wouldn't say it's really
> peer-to-peer. But an argument could be made that it is. Though, this is
> exactly what the existing ntb_transport is doing: DMAing from system
> memory to a PCI BAR and vice versa using IOAT.
>
> Logan
>



-- 
Eric Pilmore
epilmore@gigaio.com
http://gigaio.com
Phone: (858) 775 2514

This e-mail message is intended only for the individual(s) to whom it
is addressed and
may contain information that is privileged, confidential, proprietary,
or otherwise exempt
from disclosure under applicable law. If you believe you have received
this message in
error, please advise the sender by return e-mail and delete it from
your mailbox.
Thank you.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 21:31       ` Eric Pilmore
@ 2018-08-09 21:36         ` Logan Gunthorpe
  2018-08-16 17:16           ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-09 21:36 UTC (permalink / raw)
  To: Eric Pilmore
  Cc: Bjorn Helgaas, linux-pci, David Woodhouse, Alex Williamson, iommu



On 09/08/18 03:31 PM, Eric Pilmore wrote:
> On Thu, Aug 9, 2018 at 12:35 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
>> Hey,
>>
>> On 09/08/18 12:51 PM, Eric Pilmore wrote:
>>>>> Was wondering if anybody here has used IOAT DMA engines with an
>>>>> IOMMU turned on (Xeon based system)? My specific question is really
>>>>> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
>>>>> destination without having to map that address to the IOVA space of
>>>>> the DMA engine first (assuming the IOMMU is on)?
>>
>> I haven't tested this scenario but my guess would be that IOAT would
>> indeed go through the IOMMU and the PCI BAR address would need to be
>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
>> is probably a good indication that this is the case. I really don't know
>> why you'd want to DMA something without mapping it.
> 
> The thought was to avoid paying the price of having to go through yet another
> translation and also because it was not believed to be necessary anyway since
> the DMA device could go straight to a PCI BAR address without the need for a
> mapping.  We have been playing with two DMA engines, IOAT and PLX. The
> PLX does not have any issues going straight to the PCI BAR address, but unlike
> IOAT, PLX is sitting "in the PCI tree".

Yes, you can do true P2P transactions with the PLX DMA engine. (Though
you do have to be careful as technically you need to translate to a PCI
bus address not the CPU physical address which are the same on x86; and
you need to watch that ACS doesn't mess it up).

It doesn't sound like it's possible to avoid the IOMMU when using the
IOAT so you need the mapping. I would not expect the extra translation
in the IOMMU to be noticeable, performance wise.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 21:11             ` Logan Gunthorpe
@ 2018-08-09 21:47               ` Kit Chow
  2018-08-09 22:40                 ` Jiang, Dave
  0 siblings, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-09 21:47 UTC (permalink / raw)
  To: Logan Gunthorpe, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 08/09/2018 02:11 PM, Logan Gunthorpe wrote:
>
> On 09/08/18 02:57 PM, Kit Chow wrote:
>>
>> On 08/09/2018 01:11 PM, Logan Gunthorpe wrote:
>>> On 09/08/18 01:47 PM, Kit Chow wrote:
>>>>> I haven't tested this scenario but my guess would be that IOAT would
>>>>> indeed go through the IOMMU and the PCI BAR address would need to be
>>>>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
>>>>> is probably a good indication that this is the case. I really don't know
>>>>> why you'd want to DMA something without mapping it.
>>>> I have experimented with changing ntb_async_tx_submit to dma_map the PCI
>>>> BAR
>>>> address. With this, I get a different DMAR error:
>>> What code did you use to do this?
>> If you mean version of linux, it is 4.15.7.  Or specific dma_map call, I
>> believe it was dma_map_single.
> I mean the complete code you use to do the mapping so we can see if it's
> correct. dma_map_single() seems like an odd choice, I expected to see
> dma_map_resource().
Thanks for the suggestion!  Will try out dma_map_resource and report back.

Kit

>>>> DMAR: [DMA Write] Request device [07:00.4] fault addr ffd00000
>>>> [fault reason 12] non-zero reserved fields in PTE
>>> Also, what device corresponds to 07:00.4 on your system?
>> I believe 07.00.4 was the PLX dma device. I get the same error with ioat.
> Using the mapping with the PLX dma device likely converts it from a pure
> P2P request to one where the TLPs pass through the IOMMU. So the fact
> that you get the same error with both means IOAT almost certainly goes
> through the IOMMU and there's something wrong with the mapping setup.
>
> Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* RE: IOAT DMA w/IOMMU
  2018-08-09 21:47               ` Kit Chow
@ 2018-08-09 22:40                 ` Jiang, Dave
  2018-08-09 22:48                   ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Jiang, Dave @ 2018-08-09 22:40 UTC (permalink / raw)
  To: Kit Chow, Logan Gunthorpe, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu

PiAtLS0tLU9yaWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBsaW51eC1wY2ktb3duZXJAdmdl
ci5rZXJuZWwub3JnIFttYWlsdG86bGludXgtcGNpLW93bmVyQHZnZXIua2VybmVsLm9yZ10gT24g
QmVoYWxmIE9mIEtpdCBDaG93DQo+IFNlbnQ6IFRodXJzZGF5LCBBdWd1c3QgOSwgMjAxOCAyOjQ4
IFBNDQo+IFRvOiBMb2dhbiBHdW50aG9ycGUgPGxvZ2FuZ0BkZWx0YXRlZS5jb20+OyBFcmljIFBp
bG1vcmUgPGVwaWxtb3JlQGdpZ2Fpby5jb20+OyBCam9ybiBIZWxnYWFzIDxoZWxnYWFzQGtlcm5l
bC5vcmc+DQo+IENjOiBsaW51eC1wY2lAdmdlci5rZXJuZWwub3JnOyBEYXZpZCBXb29kaG91c2Ug
PGR3bXcyQGluZnJhZGVhZC5vcmc+OyBBbGV4IFdpbGxpYW1zb24gPGFsZXgud2lsbGlhbXNvbkBy
ZWRoYXQuY29tPjsNCj4gaW9tbXVAbGlzdHMubGludXgtZm91bmRhdGlvbi5vcmcNCj4gU3ViamVj
dDogUmU6IElPQVQgRE1BIHcvSU9NTVUNCj4gDQo+IA0KPiANCj4gT24gMDgvMDkvMjAxOCAwMjox
MSBQTSwgTG9nYW4gR3VudGhvcnBlIHdyb3RlOg0KPiA+DQo+ID4gT24gMDkvMDgvMTggMDI6NTcg
UE0sIEtpdCBDaG93IHdyb3RlOg0KPiA+Pg0KPiA+PiBPbiAwOC8wOS8yMDE4IDAxOjExIFBNLCBM
b2dhbiBHdW50aG9ycGUgd3JvdGU6DQo+ID4+PiBPbiAwOS8wOC8xOCAwMTo0NyBQTSwgS2l0IENo
b3cgd3JvdGU6DQo+ID4+Pj4+IEkgaGF2ZW4ndCB0ZXN0ZWQgdGhpcyBzY2VuYXJpbyBidXQgbXkg
Z3Vlc3Mgd291bGQgYmUgdGhhdCBJT0FUIHdvdWxkDQo+ID4+Pj4+IGluZGVlZCBnbyB0aHJvdWdo
IHRoZSBJT01NVSBhbmQgdGhlIFBDSSBCQVIgYWRkcmVzcyB3b3VsZCBuZWVkIHRvIGJlDQo+ID4+
Pj4+IHByb3Blcmx5IG1hcHBlZCBpbnRvIHRoZSBJT0FUJ3MgSU9WQS4gVGhlIGZhY3QgdGhhdCB5
b3Ugc2VlIERNQVIgZXJyb3JzDQo+ID4+Pj4+IGlzIHByb2JhYmx5IGEgZ29vZCBpbmRpY2F0aW9u
IHRoYXQgdGhpcyBpcyB0aGUgY2FzZS4gSSByZWFsbHkgZG9uJ3Qga25vdw0KPiA+Pj4+PiB3aHkg
eW91J2Qgd2FudCB0byBETUEgc29tZXRoaW5nIHdpdGhvdXQgbWFwcGluZyBpdC4NCj4gPj4+PiBJ
IGhhdmUgZXhwZXJpbWVudGVkIHdpdGggY2hhbmdpbmcgbnRiX2FzeW5jX3R4X3N1Ym1pdCB0byBk
bWFfbWFwIHRoZSBQQ0kNCj4gPj4+PiBCQVINCj4gPj4+PiBhZGRyZXNzLiBXaXRoIHRoaXMsIEkg
Z2V0IGEgZGlmZmVyZW50IERNQVIgZXJyb3I6DQo+ID4+PiBXaGF0IGNvZGUgZGlkIHlvdSB1c2Ug
dG8gZG8gdGhpcz8NCj4gPj4gSWYgeW91IG1lYW4gdmVyc2lvbiBvZiBsaW51eCwgaXQgaXMgNC4x
NS43LsKgIE9yIHNwZWNpZmljIGRtYV9tYXAgY2FsbCwgSQ0KPiA+PiBiZWxpZXZlIGl0IHdhcyBk
bWFfbWFwX3NpbmdsZS4NCj4gPiBJIG1lYW4gdGhlIGNvbXBsZXRlIGNvZGUgeW91IHVzZSB0byBk
byB0aGUgbWFwcGluZyBzbyB3ZSBjYW4gc2VlIGlmIGl0J3MNCj4gPiBjb3JyZWN0LiBkbWFfbWFw
X3NpbmdsZSgpIHNlZW1zIGxpa2UgYW4gb2RkIGNob2ljZSwgSSBleHBlY3RlZCB0byBzZWUNCj4g
PiBkbWFfbWFwX3Jlc291cmNlKCkuDQo+IFRoYW5rcyBmb3IgdGhlIHN1Z2dlc3Rpb24hwqAgV2ls
bCB0cnkgb3V0IGRtYV9tYXBfcmVzb3VyY2UgYW5kIHJlcG9ydCBiYWNrLg0KPiANCj4gS2l0DQoN
CktpdCwgSSB3YXMgYWJsZSB0byB0cnkgdGhpcyBvbiB0aGUgU2t5bGFrZSBYZW9uIHBsYXRmb3Jt
IHdpdGggSW50ZWwgTlRCIGFuZCBpb2F0ZG1hIGFuZCBJIGRpZCBub3QgZW5jb3VudGVyIGFueSBp
c3N1ZXMuIFVuZm9ydHVuYXRlbHkgSSBkbyBub3QgaGF2ZSBteSBwcmUtU2t5bGFrZSBwbGF0Zm9y
bSB0byB0ZXN0IGFueW1vcmUgdG8gdmVyaWZ5Lg0KDQo+IA0KPiA+Pj4+IERNQVI6IFtETUEgV3Jp
dGVdIFJlcXVlc3QgZGV2aWNlIFswNzowMC40XSBmYXVsdCBhZGRyIGZmZDAwMDAwDQo+ID4+Pj4g
W2ZhdWx0IHJlYXNvbiAxMl0gbm9uLXplcm8gcmVzZXJ2ZWQgZmllbGRzIGluIFBURQ0KPiA+Pj4g
QWxzbywgd2hhdCBkZXZpY2UgY29ycmVzcG9uZHMgdG8gMDc6MDAuNCBvbiB5b3VyIHN5c3RlbT8N
Cj4gPj4gSSBiZWxpZXZlIDA3LjAwLjQgd2FzIHRoZSBQTFggZG1hIGRldmljZS4gSSBnZXQgdGhl
IHNhbWUgZXJyb3Igd2l0aCBpb2F0Lg0KPiA+IFVzaW5nIHRoZSBtYXBwaW5nIHdpdGggdGhlIFBM
WCBkbWEgZGV2aWNlIGxpa2VseSBjb252ZXJ0cyBpdCBmcm9tIGEgcHVyZQ0KPiA+IFAyUCByZXF1
ZXN0IHRvIG9uZSB3aGVyZSB0aGUgVExQcyBwYXNzIHRocm91Z2ggdGhlIElPTU1VLiBTbyB0aGUg
ZmFjdA0KPiA+IHRoYXQgeW91IGdldCB0aGUgc2FtZSBlcnJvciB3aXRoIGJvdGggbWVhbnMgSU9B
VCBhbG1vc3QgY2VydGFpbmx5IGdvZXMNCj4gPiB0aHJvdWdoIHRoZSBJT01NVSBhbmQgdGhlcmUn
cyBzb21ldGhpbmcgd3Jvbmcgd2l0aCB0aGUgbWFwcGluZyBzZXR1cC4NCj4gPg0KPiA+IExvZ2Fu
DQoNCg==

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 22:40                 ` Jiang, Dave
@ 2018-08-09 22:48                   ` Kit Chow
  2018-08-09 22:50                     ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-09 22:48 UTC (permalink / raw)
  To: Jiang, Dave, Logan Gunthorpe, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 08/09/2018 03:40 PM, Jiang, Dave wrote:
>> -----Original Message-----
>> From: linux-pci-owner@vger.kernel.org [mailto:linux-pci-owner@vger.kernel.org] On Behalf Of Kit Chow
>> Sent: Thursday, August 9, 2018 2:48 PM
>> To: Logan Gunthorpe <logang@deltatee.com>; Eric Pilmore <epilmore@gigaio.com>; Bjorn Helgaas <helgaas@kernel.org>
>> Cc: linux-pci@vger.kernel.org; David Woodhouse <dwmw2@infradead.org>; Alex Williamson <alex.williamson@redhat.com>;
>> iommu@lists.linux-foundation.org
>> Subject: Re: IOAT DMA w/IOMMU
>>
>>
>>
>> On 08/09/2018 02:11 PM, Logan Gunthorpe wrote:
>>> On 09/08/18 02:57 PM, Kit Chow wrote:
>>>> On 08/09/2018 01:11 PM, Logan Gunthorpe wrote:
>>>>> On 09/08/18 01:47 PM, Kit Chow wrote:
>>>>>>> I haven't tested this scenario but my guess would be that IOAT would
>>>>>>> indeed go through the IOMMU and the PCI BAR address would need to be
>>>>>>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
>>>>>>> is probably a good indication that this is the case. I really don't know
>>>>>>> why you'd want to DMA something without mapping it.
>>>>>> I have experimented with changing ntb_async_tx_submit to dma_map the PCI
>>>>>> BAR
>>>>>> address. With this, I get a different DMAR error:
>>>>> What code did you use to do this?
>>>> If you mean version of linux, it is 4.15.7.  Or specific dma_map call, I
>>>> believe it was dma_map_single.
>>> I mean the complete code you use to do the mapping so we can see if it's
>>> correct. dma_map_single() seems like an odd choice, I expected to see
>>> dma_map_resource().
>> Thanks for the suggestion!  Will try out dma_map_resource and report back.
>>
>> Kit
> Kit, I was able to try this on the Skylake Xeon platform with Intel NTB and ioatdma and I did not encounter any issues. Unfortunately I do not have my pre-Skylake platform to test anymore to verify.
We are expecting to get skylakes within a couple of weeks and I'll have 
a chance to give it a go on there.

Based on Logan's comments, I am very hopeful that the dma_map_resource 
will make things work on the older platforms...

Thanks
Kit


>>>>>> DMAR: [DMA Write] Request device [07:00.4] fault addr ffd00000
>>>>>> [fault reason 12] non-zero reserved fields in PTE
>>>>> Also, what device corresponds to 07:00.4 on your system?
>>>> I believe 07.00.4 was the PLX dma device. I get the same error with ioat.
>>> Using the mapping with the PLX dma device likely converts it from a pure
>>> P2P request to one where the TLPs pass through the IOMMU. So the fact
>>> that you get the same error with both means IOAT almost certainly goes
>>> through the IOMMU and there's something wrong with the mapping setup.
>>>
>>> Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 22:48                   ` Kit Chow
@ 2018-08-09 22:50                     ` Logan Gunthorpe
  2018-08-09 23:00                       ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-09 22:50 UTC (permalink / raw)
  To: Kit Chow, Jiang, Dave, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 09/08/18 04:48 PM, Kit Chow wrote:
> Based on Logan's comments, I am very hopeful that the dma_map_resource 
> will make things work on the older platforms...

Well, I *think* dma_map_single() would still work. So I'm not that
confident that's the root of your problem. I'd still like to see the
actual code snippet you are using.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 22:50                     ` Logan Gunthorpe
@ 2018-08-09 23:00                       ` Kit Chow
  2018-08-10 16:02                         ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-09 23:00 UTC (permalink / raw)
  To: Logan Gunthorpe, Jiang, Dave, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 08/09/2018 03:50 PM, Logan Gunthorpe wrote:
>
> On 09/08/18 04:48 PM, Kit Chow wrote:
>> Based on Logan's comments, I am very hopeful that the dma_map_resource
>> will make things work on the older platforms...
> Well, I *think* dma_map_single() would still work. So I'm not that
> confident that's the root of your problem. I'd still like to see the
> actual code snippet you are using.
>
> Logan
Here's the code snippet - (ntbdebug & 4) path does dma_map_resource of 
the pci bar address.

It was:
                 unmap->addr[1] = dma_map_single(device->dev, (void 
*)dest, len,
                     DMA_TO_DEVICE);

Kit
---


static int ntb_async_tx_submit(struct ntb_transport_qp *qp,
                                struct ntb_queue_entry *entry)
{
         struct dma_async_tx_descriptor *txd;
         struct dma_chan *chan = qp->tx_dma_chan;
         struct dma_device *device;
         size_t len = entry->len;
         void *buf = entry->buf;
         size_t dest_off, buff_off;
         struct dmaengine_unmap_data *unmap;
         dma_addr_t dest;
         dma_cookie_t cookie;
         int     unmapcnt;

         device = chan->device;

         dest = qp->tx_mw_phys + qp->tx_max_frame * entry->tx_index;

         buff_off = (size_t)buf & ~PAGE_MASK;
         dest_off = (size_t)dest & ~PAGE_MASK;

         if (!is_dma_copy_aligned(device, buff_off, dest_off, len))
                 goto err;


         if (ntbdebug & 0x4) {
                 unmapcnt = 2;
         } else {
                 unmapcnt = 1;
         }

         unmap = dmaengine_get_unmap_data(device->dev, unmapcnt, 
GFP_NOWAIT);
         if (!unmap)
                 goto err;

         unmap->len = len;
         unmap->addr[0] = dma_map_page(device->dev, virt_to_page(buf),
                                       buff_off, len, DMA_TO_DEVICE);
         if (dma_mapping_error(device->dev, unmap->addr[0]))
                 goto err_get_unmap;

         if (ntbdebug & 0x4) {
                 unmap->addr[1] = dma_map_resource(device->dev,
                     (phys_addr_t)dest, len, DMA_TO_DEVICE, 0);
                 if (dma_mapping_error(device->dev, unmap->addr[1]))
                         goto err_get_unmap;
                 unmap->to_cnt = 2;
         } else {
                 unmap->addr[1] = dest;
                 unmap->to_cnt = 1;
         }

         txd = device->device_prep_dma_memcpy(chan, unmap->addr[1],
             unmap->addr[0], len, DMA_PREP_INTERRUPT);

         if (!txd)
                 goto err_get_unmap;

         txd->callback_result = ntb_tx_copy_callback;
         txd->callback_param = entry;
         dma_set_unmap(txd, unmap);

         cookie = dmaengine_submit(txd);
         if (dma_submit_error(cookie))
                 goto err_set_unmap;

         dmaengine_unmap_put(unmap);

         dma_async_issue_pending(chan);

         return 0;

err_set_unmap:
         dma_descriptor_unmap(txd);
         txd->desc_free(txd);
err_get_unmap:
         dmaengine_unmap_put(unmap);
err:
         return -ENXIO;
}

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 23:00                       ` Kit Chow
@ 2018-08-10 16:02                         ` Kit Chow
  2018-08-10 16:23                           ` Kit Chow
  2018-08-10 16:24                           ` Logan Gunthorpe
  0 siblings, 2 replies; 48+ messages in thread
From: Kit Chow @ 2018-08-10 16:02 UTC (permalink / raw)
  To: Logan Gunthorpe, Jiang, Dave, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu

Turns out there is no dma_map_resource routine on x86. get_dma_ops 
returns intel_dma_ops which has map_resource pointing to NULL.

(gdb) p intel_dma_ops
$7 = {alloc = 0xffffffff8150f310 <intel_alloc_coherent>,
   free = 0xffffffff8150ec20 <intel_free_coherent>,
   mmap = 0x0 <irq_stack_union>, get_sgtable = 0x0 <irq_stack_union>,
   map_page = 0xffffffff8150f2d0 <intel_map_page>,
   unmap_page = 0xffffffff8150ec10 <intel_unmap_page>,
   map_sg = 0xffffffff8150ef40 <intel_map_sg>,
   unmap_sg = 0xffffffff8150eb80 <intel_unmap_sg>,
   map_resource = 0x0 <irq_stack_union>,
   unmap_resource = 0x0 <irq_stack_union>,
   sync_single_for_cpu = 0x0 <irq_stack_union>,
   sync_single_for_device = 0x0 <irq_stack_union>,
   sync_sg_for_cpu = 0x0 <irq_stack_union>,
   sync_sg_for_device = 0x0 <irq_stack_union>,
   cache_sync = 0x0 <irq_stack_union>,
   mapping_error = 0xffffffff815095f0 <intel_mapping_error>,
   dma_supported = 0xffffffff81033830 <x86_dma_supported>, is_phys = 0}

Will poke around some in the intel_map_page code but can you actually 
get a valid struct page for a pci bar address (dma_map_single calls 
virt_to_page)?  If not, does a map_resource routine that can properly 
map a pci bar address need to be implemented?

Kit

---


static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
                                               size_t size,
                                               enum dma_data_direction dir,
                                               unsigned long attrs)
{
         const struct dma_map_ops *ops = get_dma_ops(dev);
         dma_addr_t addr;

         BUG_ON(!valid_dma_direction(dir));
         addr = ops->map_page(dev, virt_to_page(ptr),
                              offset_in_page(ptr), size,
                              dir, attrs);
         debug_dma_map_page(dev, virt_to_page(ptr),
                            offset_in_page(ptr), size,
                            dir, addr, true);
         return addr;
}






On 08/09/2018 04:00 PM, Kit Chow wrote:
>
>
> On 08/09/2018 03:50 PM, Logan Gunthorpe wrote:
>>
>> On 09/08/18 04:48 PM, Kit Chow wrote:
>>> Based on Logan's comments, I am very hopeful that the dma_map_resource
>>> will make things work on the older platforms...
>> Well, I *think* dma_map_single() would still work. So I'm not that
>> confident that's the root of your problem. I'd still like to see the
>> actual code snippet you are using.
>>
>> Logan
> Here's the code snippet - (ntbdebug & 4) path does dma_map_resource of 
> the pci bar address.
>
> It was:
>                 unmap->addr[1] = dma_map_single(device->dev, (void 
> *)dest, len,
>                     DMA_TO_DEVICE);
>
> Kit
> ---
>
>
> static int ntb_async_tx_submit(struct ntb_transport_qp *qp,
>                                struct ntb_queue_entry *entry)
> {
>         struct dma_async_tx_descriptor *txd;
>         struct dma_chan *chan = qp->tx_dma_chan;
>         struct dma_device *device;
>         size_t len = entry->len;
>         void *buf = entry->buf;
>         size_t dest_off, buff_off;
>         struct dmaengine_unmap_data *unmap;
>         dma_addr_t dest;
>         dma_cookie_t cookie;
>         int     unmapcnt;
>
>         device = chan->device;
>
>         dest = qp->tx_mw_phys + qp->tx_max_frame * entry->tx_index;
>
>         buff_off = (size_t)buf & ~PAGE_MASK;
>         dest_off = (size_t)dest & ~PAGE_MASK;
>
>         if (!is_dma_copy_aligned(device, buff_off, dest_off, len))
>                 goto err;
>
>
>         if (ntbdebug & 0x4) {
>                 unmapcnt = 2;
>         } else {
>                 unmapcnt = 1;
>         }
>
>         unmap = dmaengine_get_unmap_data(device->dev, unmapcnt, 
> GFP_NOWAIT);
>         if (!unmap)
>                 goto err;
>
>         unmap->len = len;
>         unmap->addr[0] = dma_map_page(device->dev, virt_to_page(buf),
>                                       buff_off, len, DMA_TO_DEVICE);
>         if (dma_mapping_error(device->dev, unmap->addr[0]))
>                 goto err_get_unmap;
>
>         if (ntbdebug & 0x4) {
>                 unmap->addr[1] = dma_map_resource(device->dev,
>                     (phys_addr_t)dest, len, DMA_TO_DEVICE, 0);
>                 if (dma_mapping_error(device->dev, unmap->addr[1]))
>                         goto err_get_unmap;
>                 unmap->to_cnt = 2;
>         } else {
>                 unmap->addr[1] = dest;
>                 unmap->to_cnt = 1;
>         }
>
>         txd = device->device_prep_dma_memcpy(chan, unmap->addr[1],
>             unmap->addr[0], len, DMA_PREP_INTERRUPT);
>
>         if (!txd)
>                 goto err_get_unmap;
>
>         txd->callback_result = ntb_tx_copy_callback;
>         txd->callback_param = entry;
>         dma_set_unmap(txd, unmap);
>
>         cookie = dmaengine_submit(txd);
>         if (dma_submit_error(cookie))
>                 goto err_set_unmap;
>
>         dmaengine_unmap_put(unmap);
>
>         dma_async_issue_pending(chan);
>
>         return 0;
>
> err_set_unmap:
>         dma_descriptor_unmap(txd);
>         txd->desc_free(txd);
> err_get_unmap:
>         dmaengine_unmap_put(unmap);
> err:
>         return -ENXIO;
> }
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-10 16:02                         ` Kit Chow
@ 2018-08-10 16:23                           ` Kit Chow
  2018-08-10 16:24                             ` Logan Gunthorpe
  2018-08-10 16:24                           ` Logan Gunthorpe
  1 sibling, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-10 16:23 UTC (permalink / raw)
  To: Logan Gunthorpe, Jiang, Dave, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu

There is an internal routine (__intel_map_single) inside the intel iommu 
code that does the actual mapping using a phys_addr_t. Think I'll try to 
implement a intel_map_resource routine that calls that routine directly 
without all of the conversions done for dma_map_{single,page} (pci bar 
addr -> page -> phys_addr)...


On 08/10/2018 09:02 AM, Kit Chow wrote:
> Turns out there is no dma_map_resource routine on x86. get_dma_ops 
> returns intel_dma_ops which has map_resource pointing to NULL.
>
> (gdb) p intel_dma_ops
> $7 = {alloc = 0xffffffff8150f310 <intel_alloc_coherent>,
>   free = 0xffffffff8150ec20 <intel_free_coherent>,
>   mmap = 0x0 <irq_stack_union>, get_sgtable = 0x0 <irq_stack_union>,
>   map_page = 0xffffffff8150f2d0 <intel_map_page>,
>   unmap_page = 0xffffffff8150ec10 <intel_unmap_page>,
>   map_sg = 0xffffffff8150ef40 <intel_map_sg>,
>   unmap_sg = 0xffffffff8150eb80 <intel_unmap_sg>,
>   map_resource = 0x0 <irq_stack_union>,
>   unmap_resource = 0x0 <irq_stack_union>,
>   sync_single_for_cpu = 0x0 <irq_stack_union>,
>   sync_single_for_device = 0x0 <irq_stack_union>,
>   sync_sg_for_cpu = 0x0 <irq_stack_union>,
>   sync_sg_for_device = 0x0 <irq_stack_union>,
>   cache_sync = 0x0 <irq_stack_union>,
>   mapping_error = 0xffffffff815095f0 <intel_mapping_error>,
>   dma_supported = 0xffffffff81033830 <x86_dma_supported>, is_phys = 0}
>
> Will poke around some in the intel_map_page code but can you actually 
> get a valid struct page for a pci bar address (dma_map_single calls 
> virt_to_page)?  If not, does a map_resource routine that can properly 
> map a pci bar address need to be implemented?
>
> Kit
>
> ---
>
>
> static inline dma_addr_t dma_map_single_attrs(struct device *dev, void 
> *ptr,
>                                               size_t size,
>                                               enum dma_data_direction 
> dir,
>                                               unsigned long attrs)
> {
>         const struct dma_map_ops *ops = get_dma_ops(dev);
>         dma_addr_t addr;
>
>         BUG_ON(!valid_dma_direction(dir));
>         addr = ops->map_page(dev, virt_to_page(ptr),
>                              offset_in_page(ptr), size,
>                              dir, attrs);
>         debug_dma_map_page(dev, virt_to_page(ptr),
>                            offset_in_page(ptr), size,
>                            dir, addr, true);
>         return addr;
> }
>
>
>
>
>
>
> On 08/09/2018 04:00 PM, Kit Chow wrote:
>>
>>
>> On 08/09/2018 03:50 PM, Logan Gunthorpe wrote:
>>>
>>> On 09/08/18 04:48 PM, Kit Chow wrote:
>>>> Based on Logan's comments, I am very hopeful that the dma_map_resource
>>>> will make things work on the older platforms...
>>> Well, I *think* dma_map_single() would still work. So I'm not that
>>> confident that's the root of your problem. I'd still like to see the
>>> actual code snippet you are using.
>>>
>>> Logan
>> Here's the code snippet - (ntbdebug & 4) path does dma_map_resource 
>> of the pci bar address.
>>
>> It was:
>>                 unmap->addr[1] = dma_map_single(device->dev, (void 
>> *)dest, len,
>>                     DMA_TO_DEVICE);
>>
>> Kit
>> ---
>>
>>
>> static int ntb_async_tx_submit(struct ntb_transport_qp *qp,
>>                                struct ntb_queue_entry *entry)
>> {
>>         struct dma_async_tx_descriptor *txd;
>>         struct dma_chan *chan = qp->tx_dma_chan;
>>         struct dma_device *device;
>>         size_t len = entry->len;
>>         void *buf = entry->buf;
>>         size_t dest_off, buff_off;
>>         struct dmaengine_unmap_data *unmap;
>>         dma_addr_t dest;
>>         dma_cookie_t cookie;
>>         int     unmapcnt;
>>
>>         device = chan->device;
>>
>>         dest = qp->tx_mw_phys + qp->tx_max_frame * entry->tx_index;
>>
>>         buff_off = (size_t)buf & ~PAGE_MASK;
>>         dest_off = (size_t)dest & ~PAGE_MASK;
>>
>>         if (!is_dma_copy_aligned(device, buff_off, dest_off, len))
>>                 goto err;
>>
>>
>>         if (ntbdebug & 0x4) {
>>                 unmapcnt = 2;
>>         } else {
>>                 unmapcnt = 1;
>>         }
>>
>>         unmap = dmaengine_get_unmap_data(device->dev, unmapcnt, 
>> GFP_NOWAIT);
>>         if (!unmap)
>>                 goto err;
>>
>>         unmap->len = len;
>>         unmap->addr[0] = dma_map_page(device->dev, virt_to_page(buf),
>>                                       buff_off, len, DMA_TO_DEVICE);
>>         if (dma_mapping_error(device->dev, unmap->addr[0]))
>>                 goto err_get_unmap;
>>
>>         if (ntbdebug & 0x4) {
>>                 unmap->addr[1] = dma_map_resource(device->dev,
>>                     (phys_addr_t)dest, len, DMA_TO_DEVICE, 0);
>>                 if (dma_mapping_error(device->dev, unmap->addr[1]))
>>                         goto err_get_unmap;
>>                 unmap->to_cnt = 2;
>>         } else {
>>                 unmap->addr[1] = dest;
>>                 unmap->to_cnt = 1;
>>         }
>>
>>         txd = device->device_prep_dma_memcpy(chan, unmap->addr[1],
>>             unmap->addr[0], len, DMA_PREP_INTERRUPT);
>>
>>         if (!txd)
>>                 goto err_get_unmap;
>>
>>         txd->callback_result = ntb_tx_copy_callback;
>>         txd->callback_param = entry;
>>         dma_set_unmap(txd, unmap);
>>
>>         cookie = dmaengine_submit(txd);
>>         if (dma_submit_error(cookie))
>>                 goto err_set_unmap;
>>
>>         dmaengine_unmap_put(unmap);
>>
>>         dma_async_issue_pending(chan);
>>
>>         return 0;
>>
>> err_set_unmap:
>>         dma_descriptor_unmap(txd);
>>         txd->desc_free(txd);
>> err_get_unmap:
>>         dmaengine_unmap_put(unmap);
>> err:
>>         return -ENXIO;
>> }
>>
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-10 16:02                         ` Kit Chow
  2018-08-10 16:23                           ` Kit Chow
@ 2018-08-10 16:24                           ` Logan Gunthorpe
  2018-08-10 16:31                             ` Dave Jiang
  1 sibling, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-10 16:24 UTC (permalink / raw)
  To: Kit Chow, Jiang, Dave, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 10/08/18 10:02 AM, Kit Chow wrote:
> Turns out there is no dma_map_resource routine on x86. get_dma_ops 
> returns intel_dma_ops which has map_resource pointing to NULL.

Oh, yup. I wasn't aware of that. From a cursory view, it looks like it
shouldn't be too hard to implement though.

> Will poke around some in the intel_map_page code but can you actually 
> get a valid struct page for a pci bar address (dma_map_single calls 
> virt_to_page)?  If not, does a map_resource routine that can properly 
> map a pci bar address need to be implemented?

Yes, you can not get a struct page for a PCI bar address unless it's
mapped with ZONE_DEVICE like in my p2p work. So that would explain why
dma_map_single() didn't work.

This all implies that ntb_transport doesn't work with DMA and the IOMMU
turned on. I'm not sure I've ever tried that configuration myself but it
is a bit surprising.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-10 16:23                           ` Kit Chow
@ 2018-08-10 16:24                             ` Logan Gunthorpe
  0 siblings, 0 replies; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-10 16:24 UTC (permalink / raw)
  To: Kit Chow, Jiang, Dave, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 10/08/18 10:23 AM, Kit Chow wrote:
> There is an internal routine (__intel_map_single) inside the intel iommu 
> code that does the actual mapping using a phys_addr_t. Think I'll try to 
> implement a intel_map_resource routine that calls that routine directly 
> without all of the conversions done for dma_map_{single,page} (pci bar 
> addr -> page -> phys_addr)...

Nice, yes, that was what I was thinking.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-10 16:24                           ` Logan Gunthorpe
@ 2018-08-10 16:31                             ` Dave Jiang
  2018-08-10 16:33                               ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Jiang @ 2018-08-10 16:31 UTC (permalink / raw)
  To: Logan Gunthorpe, Kit Chow, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 08/10/2018 09:24 AM, Logan Gunthorpe wrote:
> 
> 
> On 10/08/18 10:02 AM, Kit Chow wrote:
>> Turns out there is no dma_map_resource routine on x86. get_dma_ops 
>> returns intel_dma_ops which has map_resource pointing to NULL.
> 
> Oh, yup. I wasn't aware of that. From a cursory view, it looks like it
> shouldn't be too hard to implement though.
> 
>> Will poke around some in the intel_map_page code but can you actually 
>> get a valid struct page for a pci bar address (dma_map_single calls 
>> virt_to_page)?  If not, does a map_resource routine that can properly 
>> map a pci bar address need to be implemented?
> 
> Yes, you can not get a struct page for a PCI bar address unless it's
> mapped with ZONE_DEVICE like in my p2p work. So that would explain why
> dma_map_single() didn't work.
> 
> This all implies that ntb_transport doesn't work with DMA and the IOMMU
> turned on. I'm not sure I've ever tried that configuration myself but it
> is a bit surprising.

Hmm....that's surprising because it seems to work on Skylake platform
when I tested it yesterday with Intel NTB. Kit is using a Haswell
platform at the moment I think. Although I'm curious if it works with
the PLX NTB he's using on Skylake.

> 
> Logan
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-10 16:31                             ` Dave Jiang
@ 2018-08-10 16:33                               ` Logan Gunthorpe
  2018-08-10 17:01                                 ` Dave Jiang
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-10 16:33 UTC (permalink / raw)
  To: Dave Jiang, Kit Chow, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 10/08/18 10:31 AM, Dave Jiang wrote:
> 
> 
> On 08/10/2018 09:24 AM, Logan Gunthorpe wrote:
>>
>>
>> On 10/08/18 10:02 AM, Kit Chow wrote:
>>> Turns out there is no dma_map_resource routine on x86. get_dma_ops 
>>> returns intel_dma_ops which has map_resource pointing to NULL.
>>
>> Oh, yup. I wasn't aware of that. From a cursory view, it looks like it
>> shouldn't be too hard to implement though.
>>
>>> Will poke around some in the intel_map_page code but can you actually 
>>> get a valid struct page for a pci bar address (dma_map_single calls 
>>> virt_to_page)?  If not, does a map_resource routine that can properly 
>>> map a pci bar address need to be implemented?
>>
>> Yes, you can not get a struct page for a PCI bar address unless it's
>> mapped with ZONE_DEVICE like in my p2p work. So that would explain why
>> dma_map_single() didn't work.
>>
>> This all implies that ntb_transport doesn't work with DMA and the IOMMU
>> turned on. I'm not sure I've ever tried that configuration myself but it
>> is a bit surprising.
> 
> Hmm....that's surprising because it seems to work on Skylake platform
> when I tested it yesterday with Intel NTB. Kit is using a Haswell
> platform at the moment I think. Although I'm curious if it works with
> the PLX NTB he's using on Skylake.

Does that mean on Skylake the IOAT can bypass the IOMMU? Because it
looks like the ntb_transport code doesn't map the physical address of
the NTB MW into the IOMMU when doing DMA...

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-10 16:33                               ` Logan Gunthorpe
@ 2018-08-10 17:01                                 ` Dave Jiang
  2018-08-10 17:15                                   ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Jiang @ 2018-08-10 17:01 UTC (permalink / raw)
  To: Logan Gunthorpe, Kit Chow, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 08/10/2018 09:33 AM, Logan Gunthorpe wrote:
> 
> 
> On 10/08/18 10:31 AM, Dave Jiang wrote:
>>
>>
>> On 08/10/2018 09:24 AM, Logan Gunthorpe wrote:
>>>
>>>
>>> On 10/08/18 10:02 AM, Kit Chow wrote:
>>>> Turns out there is no dma_map_resource routine on x86. get_dma_ops 
>>>> returns intel_dma_ops which has map_resource pointing to NULL.
>>>
>>> Oh, yup. I wasn't aware of that. From a cursory view, it looks like it
>>> shouldn't be too hard to implement though.
>>>
>>>> Will poke around some in the intel_map_page code but can you actually 
>>>> get a valid struct page for a pci bar address (dma_map_single calls 
>>>> virt_to_page)?  If not, does a map_resource routine that can properly 
>>>> map a pci bar address need to be implemented?
>>>
>>> Yes, you can not get a struct page for a PCI bar address unless it's
>>> mapped with ZONE_DEVICE like in my p2p work. So that would explain why
>>> dma_map_single() didn't work.
>>>
>>> This all implies that ntb_transport doesn't work with DMA and the IOMMU
>>> turned on. I'm not sure I've ever tried that configuration myself but it
>>> is a bit surprising.
>>
>> Hmm....that's surprising because it seems to work on Skylake platform
>> when I tested it yesterday with Intel NTB. Kit is using a Haswell
>> platform at the moment I think. Although I'm curious if it works with
>> the PLX NTB he's using on Skylake.
> 
> Does that mean on Skylake the IOAT can bypass the IOMMU? Because it
> looks like the ntb_transport code doesn't map the physical address of
> the NTB MW into the IOMMU when doing DMA...

Or if the BIOS has provided mapping for the Intel NTB device
specifically? Is that a possibility? NTB does go through the IOMMU.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-10 17:01                                 ` Dave Jiang
@ 2018-08-10 17:15                                   ` Logan Gunthorpe
  2018-08-10 17:46                                     ` Dave Jiang
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-10 17:15 UTC (permalink / raw)
  To: Dave Jiang, Kit Chow, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 10/08/18 11:01 AM, Dave Jiang wrote:
> Or if the BIOS has provided mapping for the Intel NTB device
> specifically? Is that a possibility? NTB does go through the IOMMU.

I don't know but if the BIOS is doing it, but that would only at best
work for Intel NTB.... I see no hope in getting the BIOS to map the MW
for a Switchtec NTB device...

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-10 17:15                                   ` Logan Gunthorpe
@ 2018-08-10 17:46                                     ` Dave Jiang
  2018-08-11  0:53                                       ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Jiang @ 2018-08-10 17:46 UTC (permalink / raw)
  To: Logan Gunthorpe, Kit Chow, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 08/10/2018 10:15 AM, Logan Gunthorpe wrote:
> 
> 
> On 10/08/18 11:01 AM, Dave Jiang wrote:
>> Or if the BIOS has provided mapping for the Intel NTB device
>> specifically? Is that a possibility? NTB does go through the IOMMU.
> 
> I don't know but if the BIOS is doing it, but that would only at best
> work for Intel NTB.... I see no hope in getting the BIOS to map the MW
> for a Switchtec NTB device...

Right. I'm just speculating why it may possibly work. But yes, I think
kernel will need appropriate mapping if it's not happening right now.
Hopefully Kit is onto something.

> 
> Logan
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-10 17:46                                     ` Dave Jiang
@ 2018-08-11  0:53                                       ` Kit Chow
  2018-08-11  2:10                                         ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-11  0:53 UTC (permalink / raw)
  To: Dave Jiang, Logan Gunthorpe, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu

Success!

I've implemented a new intel_map_resource (and intel_unmap_resource) 
routine which is called by dma_map_resource. As mentioned previously, 
the primary job of dma_map_resource/intel_map_resource is to call the 
intel iommu internal mapping routine (__intel_map_single) without 
translating the pci bar address into a page and then back to a phys addr 
as dma_map_page and dma_map_single are doing.

With the new dma_map_resource routine mapping the pci bar address in 
ntb_transport_tx_submit, I was still getting the "PTE Write access is 
not set" error.

The __intel_map_single routine sets prot based on the DMA direction.

         if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL || \
                         !cap_zlr(iommu->cap))
                 prot |= DMA_PTE_READ;
         if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
                 prot |= DMA_PTE_WRITE;

I was able to finally succeed in doing the dma transfers over ioat only 
when prot has DMA_PTE_WRITE set by setting the direction to either 
DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings 
need to be changed? Are there any bad side effects if I used 
DMA_BIDIRECTIONAL?

Given that using the pci bar address as is without getting an iommu 
address results in the same "PTE Write access" error, I wonder if there 
is some internal 'prot' associated with the non-translated pci bar 
address that just needs to be tweaked to include DMA_PTE_WRITE???

Thanks!




On 08/10/2018 10:46 AM, Dave Jiang wrote:
>
> On 08/10/2018 10:15 AM, Logan Gunthorpe wrote:
>>
>> On 10/08/18 11:01 AM, Dave Jiang wrote:
>>> Or if the BIOS has provided mapping for the Intel NTB device
>>> specifically? Is that a possibility? NTB does go through the IOMMU.
>> I don't know but if the BIOS is doing it, but that would only at best
>> work for Intel NTB.... I see no hope in getting the BIOS to map the MW
>> for a Switchtec NTB device...
> Right. I'm just speculating why it may possibly work. But yes, I think
> kernel will need appropriate mapping if it's not happening right now.
> Hopefully Kit is onto something.
>
>> Logan
>>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-11  0:53                                       ` Kit Chow
@ 2018-08-11  2:10                                         ` Logan Gunthorpe
  2018-08-13 14:23                                           ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-11  2:10 UTC (permalink / raw)
  To: Kit Chow, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 10/08/18 06:53 PM, Kit Chow wrote:
> I was able to finally succeed in doing the dma transfers over ioat only
> when prot has DMA_PTE_WRITE set by setting the direction to either
> DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
> need to be changed? Are there any bad side effects if I used
> DMA_BIDIRECTIONAL?

Good to hear it. Without digging into the direction much all I can say
is that it can sometimes be very confusing what the direction is. Adding
another PCI device just adds to the confusion.

I believe, the direction should be from the IOAT's point of view. So if
the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
data is going to the IOAT).

Using DMA_BIDIRECTIONAL just forgoes any hardware security / protection
that the buffer would have in terms of direction. Generally it's good
practice to use the strictest direction you can.

> Given that using the pci bar address as is without getting an iommu
> address results in the same "PTE Write access" error, I wonder if there
> is some internal 'prot' associated with the non-translated pci bar
> address that just needs to be tweaked to include DMA_PTE_WRITE???

No, I don't think so. The 'prot' will be a property of the IOMMU. Not
having an entry is probably just the same (from the perspective of the
error you see) as only having an entry for reading.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-11  2:10                                         ` Logan Gunthorpe
@ 2018-08-13 14:23                                           ` Kit Chow
  2018-08-13 14:59                                             ` Robin Murphy
  0 siblings, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-13 14:23 UTC (permalink / raw)
  To: Logan Gunthorpe, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, David Woodhouse, Alex Williamson, iommu



On 08/10/2018 07:10 PM, Logan Gunthorpe wrote:
>
> On 10/08/18 06:53 PM, Kit Chow wrote:
>> I was able to finally succeed in doing the dma transfers over ioat only
>> when prot has DMA_PTE_WRITE set by setting the direction to either
>> DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
>> need to be changed? Are there any bad side effects if I used
>> DMA_BIDIRECTIONAL?
> Good to hear it. Without digging into the direction much all I can say
> is that it can sometimes be very confusing what the direction is. Adding
> another PCI device just adds to the confusion.
Yep, confusing :).

======================= =============================================
DMA_NONE		no direction (used for debugging)
DMA_TO_DEVICE		data is going from the memory to the device
DMA_FROM_DEVICE		data is coming from the device to the memory
DMA_BIDIRECTIONAL	direction isn't known
======================= =============================================

> I believe, the direction should be from the IOAT's point of view. So if
> the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
> coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
> data is going to the IOAT).
It would certainly seem like DMA_TO_DEVICE would be the proper choice; 
IOAT is the plumbing to move host data (memory) to the bar address (device).

Will go with what works and set DMA_FROM_DEVICE.

In ntb_async_tx_submit, does the direction used for the dma_map routines 
for the src and dest addresses need to be consistent?

And does the direction setting for the dmaengine_unmap_data have to be 
consistent with the direction used in dma_map_*?

BTW, dmaengine_unmap routine only calls dma_unmap_page. Should it keep 
track of the dma_map routine used and call the corresponding dma_unmap 
routine?  In the case of the intel iommu, it doesn't matter.

Thanks
Kit

>
> Using DMA_BIDIRECTIONAL just forgoes any hardware security / protection
> that the buffer would have in terms of direction. Generally it's good
> practice to use the strictest direction you can.
>
>> Given that using the pci bar address as is without getting an iommu
>> address results in the same "PTE Write access" error, I wonder if there
>> is some internal 'prot' associated with the non-translated pci bar
>> address that just needs to be tweaked to include DMA_PTE_WRITE???
> No, I don't think so. The 'prot' will be a property of the IOMMU. Not
> having an entry is probably just the same (from the perspective of the
> error you see) as only having an entry for reading.
>
> Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-13 14:23                                           ` Kit Chow
@ 2018-08-13 14:59                                             ` Robin Murphy
  2018-08-13 15:21                                               ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Robin Murphy @ 2018-08-13 14:59 UTC (permalink / raw)
  To: Kit Chow, Logan Gunthorpe, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, Alex Williamson, David Woodhouse, iommu

On 13/08/18 15:23, Kit Chow wrote:
> On 08/10/2018 07:10 PM, Logan Gunthorpe wrote:
>>
>> On 10/08/18 06:53 PM, Kit Chow wrote:
>>> I was able to finally succeed in doing the dma transfers over ioat only
>>> when prot has DMA_PTE_WRITE set by setting the direction to either
>>> DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
>>> need to be changed? Are there any bad side effects if I used
>>> DMA_BIDIRECTIONAL?
>> Good to hear it. Without digging into the direction much all I can say
>> is that it can sometimes be very confusing what the direction is. Adding
>> another PCI device just adds to the confusion.
> Yep, confusing :).
> 
> ======================= =============================================
> DMA_NONE        no direction (used for debugging)
> DMA_TO_DEVICE        data is going from the memory to the device
> DMA_FROM_DEVICE        data is coming from the device to the memory
> DMA_BIDIRECTIONAL    direction isn't known
> ======================= =============================================
> 
>> I believe, the direction should be from the IOAT's point of view. So if
>> the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
>> coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
>> data is going to the IOAT).
> It would certainly seem like DMA_TO_DEVICE would be the proper choice; 
> IOAT is the plumbing to move host data (memory) to the bar address 
> (device).

Except that the "device" in question is the IOAT itself (more generally, 
it means the device represented by the first argument to dma_map_*() - 
the one actually emitting the reads and writes). The context of a DMA 
API call is the individual mapping in question, not whatever overall 
operation it may be part of - your example already involves two separate 
mappings: one "from" system memory "to" the DMA engine, and one "from" 
the DMA engine "to" PCI BAR memory.

Note that the DMA API's dma_direction is also distinct from the 
dmaengine API's dma_transfer_direction, and there's plenty of fun to be 
had mapping between the two - see pl330.c or rcar-dmac.c for other 
examples of dma_map_resource() for slave devices - no guarantees that 
those implementations are entirely correct (especially the one I did!), 
but in practice they do make the "DMA engine behind an IOMMU" case work 
for UARTs and similar straightforward slaves.

> Will go with what works and set DMA_FROM_DEVICE.
> 
> In ntb_async_tx_submit, does the direction used for the dma_map routines 
> for the src and dest addresses need to be consistent?

In general, the mappings of source and destination addresses would 
typically have opposite directions as above, unless they're both 
bidirectional.

> And does the direction setting for the dmaengine_unmap_data have to be 
> consistent with the direction used in dma_map_*?

Yes, the arguments to an unmap are expected to match whatever was passed 
to the corresponding map call. CONFIG_DMA_API_DEBUG should help catch 
any mishaps.

Robin.

> BTW, dmaengine_unmap routine only calls dma_unmap_page. Should it keep 
> track of the dma_map routine used and call the corresponding dma_unmap 
> routine?  In the case of the intel iommu, it doesn't matter.
> 
> Thanks
> Kit
> 
>>
>> Using DMA_BIDIRECTIONAL just forgoes any hardware security / protection
>> that the buffer would have in terms of direction. Generally it's good
>> practice to use the strictest direction you can.
>>
>>> Given that using the pci bar address as is without getting an iommu
>>> address results in the same "PTE Write access" error, I wonder if there
>>> is some internal 'prot' associated with the non-translated pci bar
>>> address that just needs to be tweaked to include DMA_PTE_WRITE???
>> No, I don't think so. The 'prot' will be a property of the IOMMU. Not
>> having an entry is probably just the same (from the perspective of the
>> error you see) as only having an entry for reading.
>>
>> Logan
> 
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-13 14:59                                             ` Robin Murphy
@ 2018-08-13 15:21                                               ` Kit Chow
  2018-08-13 23:30                                                 ` Kit Chow
  2018-08-13 23:36                                                 ` Kit Chow
  0 siblings, 2 replies; 48+ messages in thread
From: Kit Chow @ 2018-08-13 15:21 UTC (permalink / raw)
  To: Robin Murphy, Logan Gunthorpe, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, Alex Williamson, David Woodhouse, iommu



On 08/13/2018 07:59 AM, Robin Murphy wrote:
> On 13/08/18 15:23, Kit Chow wrote:
>> On 08/10/2018 07:10 PM, Logan Gunthorpe wrote:
>>>
>>> On 10/08/18 06:53 PM, Kit Chow wrote:
>>>> I was able to finally succeed in doing the dma transfers over ioat 
>>>> only
>>>> when prot has DMA_PTE_WRITE set by setting the direction to either
>>>> DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
>>>> need to be changed? Are there any bad side effects if I used
>>>> DMA_BIDIRECTIONAL?
>>> Good to hear it. Without digging into the direction much all I can say
>>> is that it can sometimes be very confusing what the direction is. 
>>> Adding
>>> another PCI device just adds to the confusion.
>> Yep, confusing :).
>>
>> ======================= =============================================
>> DMA_NONE        no direction (used for debugging)
>> DMA_TO_DEVICE        data is going from the memory to the device
>> DMA_FROM_DEVICE        data is coming from the device to the memory
>> DMA_BIDIRECTIONAL    direction isn't known
>> ======================= =============================================
>>
>>> I believe, the direction should be from the IOAT's point of view. So if
>>> the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
>>> coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
>>> data is going to the IOAT).
>> It would certainly seem like DMA_TO_DEVICE would be the proper 
>> choice; IOAT is the plumbing to move host data (memory) to the bar 
>> address (device).
>
> Except that the "device" in question is the IOAT itself (more 
> generally, it means the device represented by the first argument to 
> dma_map_*() - the one actually emitting the reads and writes). The 
> context of a DMA API call is the individual mapping in question, not 
> whatever overall operation it may be part of - your example already 
> involves two separate mappings: one "from" system memory "to" the DMA 
> engine, and one "from" the DMA engine "to" PCI BAR memory.

OK, that makes sense.  The middleman (aka DMA engine device) is the key 
in the to/from puzzle. Thanks!


>
> Note that the DMA API's dma_direction is also distinct from the 
> dmaengine API's dma_transfer_direction, and there's plenty of fun to 
> be had mapping between the two - see pl330.c or rcar-dmac.c for other 
> examples of dma_map_resource() for slave devices - no guarantees that 
> those implementations are entirely correct (especially the one I 
> did!), but in practice they do make the "DMA engine behind an IOMMU" 
> case work for UARTs and similar straightforward slaves.
>
>> Will go with what works and set DMA_FROM_DEVICE.
>>
>> In ntb_async_tx_submit, does the direction used for the dma_map 
>> routines for the src and dest addresses need to be consistent?
>
> In general, the mappings of source and destination addresses would 
> typically have opposite directions as above, unless they're both 
> bidirectional.
>
>> And does the direction setting for the dmaengine_unmap_data have to 
>> be consistent with the direction used in dma_map_*?
>
> Yes, the arguments to an unmap are expected to match whatever was 
> passed to the corresponding map call. CONFIG_DMA_API_DEBUG should help 
> catch any mishaps.
>
> Robin.
>
>> BTW, dmaengine_unmap routine only calls dma_unmap_page. Should it 
>> keep track of the dma_map routine used and call the corresponding 
>> dma_unmap routine?  In the case of the intel iommu, it doesn't matter.
>>
>> Thanks
>> Kit
>>
>>>
>>> Using DMA_BIDIRECTIONAL just forgoes any hardware security / protection
>>> that the buffer would have in terms of direction. Generally it's good
>>> practice to use the strictest direction you can.
>>>
>>>> Given that using the pci bar address as is without getting an iommu
>>>> address results in the same "PTE Write access" error, I wonder if 
>>>> there
>>>> is some internal 'prot' associated with the non-translated pci bar
>>>> address that just needs to be tweaked to include DMA_PTE_WRITE???
>>> No, I don't think so. The 'prot' will be a property of the IOMMU. Not
>>> having an entry is probably just the same (from the perspective of the
>>> error you see) as only having an entry for reading.
>>>
>>> Logan
>>
>> _______________________________________________
>> iommu mailing list
>> iommu@lists.linux-foundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-13 15:21                                               ` Kit Chow
@ 2018-08-13 23:30                                                 ` Kit Chow
  2018-08-13 23:39                                                   ` Logan Gunthorpe
  2018-08-13 23:36                                                 ` Kit Chow
  1 sibling, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-13 23:30 UTC (permalink / raw)
  To: Robin Murphy, Logan Gunthorpe, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, Alex Williamson, David Woodhouse, iommu

[-- Attachment #1: Type: text/plain, Size: 5857 bytes --]

Taking a step back, I was a little surprised that dma_map_single 
successfully returned an iommu address for the pci bar address passed 
into it during my initial experiment...

Internally, dma_map_single calls virt_to_page() to translate the 
"virtual address" into a page and intel_map page then calls 
page_to_phys() to convert the page to a dma_addr_t.

The virt_to_page and page_to_phys routines don't appear to do any 
validation and just uses arithmetic to do the conversions.

the pci bar address (0x383c70e51578) does fall into a valid VA range in 
x86_64 so it could conceivably be a valid VA.  So I tried an virtual 
address inside the VA hole and and it too returned without any errors.

virt_to_page(0x800000000000) -> 0xffffede000000000
page_to_phys(0xffffede000000000) -> 0xf80000000000

In arch/x86/include/asm/page.h, there is the following comment in 
regards to validating the virtual address.

/*
  * virt_to_page(kaddr) returns a valid pointer if and only if
  * virt_addr_valid(kaddr) returns true.
  */
#define virt_to_page(kaddr)     pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)

So it looks like the validation by virt_addr_valid was somehow dropped 
from the virt_to_page code path. Does anyone have any ideas what 
happended to it?

Kit


On 08/13/2018 08:21 AM, Kit Chow wrote:
>
>
> On 08/13/2018 07:59 AM, Robin Murphy wrote:
>> On 13/08/18 15:23, Kit Chow wrote:
>>> On 08/10/2018 07:10 PM, Logan Gunthorpe wrote:
>>>>
>>>> On 10/08/18 06:53 PM, Kit Chow wrote:
>>>>> I was able to finally succeed in doing the dma transfers over ioat 
>>>>> only
>>>>> when prot has DMA_PTE_WRITE set by setting the direction to either
>>>>> DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
>>>>> need to be changed? Are there any bad side effects if I used
>>>>> DMA_BIDIRECTIONAL?
>>>> Good to hear it. Without digging into the direction much all I can say
>>>> is that it can sometimes be very confusing what the direction is. 
>>>> Adding
>>>> another PCI device just adds to the confusion.
>>> Yep, confusing :).
>>>
>>> ======================= =============================================
>>> DMA_NONE        no direction (used for debugging)
>>> DMA_TO_DEVICE        data is going from the memory to the device
>>> DMA_FROM_DEVICE        data is coming from the device to the memory
>>> DMA_BIDIRECTIONAL    direction isn't known
>>> ======================= =============================================
>>>
>>>> I believe, the direction should be from the IOAT's point of view. 
>>>> So if
>>>> the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
>>>> coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
>>>> data is going to the IOAT).
>>> It would certainly seem like DMA_TO_DEVICE would be the proper 
>>> choice; IOAT is the plumbing to move host data (memory) to the bar 
>>> address (device).
>>
>> Except that the "device" in question is the IOAT itself (more 
>> generally, it means the device represented by the first argument to 
>> dma_map_*() - the one actually emitting the reads and writes). The 
>> context of a DMA API call is the individual mapping in question, not 
>> whatever overall operation it may be part of - your example already 
>> involves two separate mappings: one "from" system memory "to" the DMA 
>> engine, and one "from" the DMA engine "to" PCI BAR memory.
>
> OK, that makes sense.  The middleman (aka DMA engine device) is the 
> key in the to/from puzzle. Thanks!
>
>
>>
>> Note that the DMA API's dma_direction is also distinct from the 
>> dmaengine API's dma_transfer_direction, and there's plenty of fun to 
>> be had mapping between the two - see pl330.c or rcar-dmac.c for other 
>> examples of dma_map_resource() for slave devices - no guarantees that 
>> those implementations are entirely correct (especially the one I 
>> did!), but in practice they do make the "DMA engine behind an IOMMU" 
>> case work for UARTs and similar straightforward slaves.
>>
>>> Will go with what works and set DMA_FROM_DEVICE.
>>>
>>> In ntb_async_tx_submit, does the direction used for the dma_map 
>>> routines for the src and dest addresses need to be consistent?
>>
>> In general, the mappings of source and destination addresses would 
>> typically have opposite directions as above, unless they're both 
>> bidirectional.
>>
>>> And does the direction setting for the dmaengine_unmap_data have to 
>>> be consistent with the direction used in dma_map_*?
>>
>> Yes, the arguments to an unmap are expected to match whatever was 
>> passed to the corresponding map call. CONFIG_DMA_API_DEBUG should 
>> help catch any mishaps.
>>
>> Robin.
>>
>>> BTW, dmaengine_unmap routine only calls dma_unmap_page. Should it 
>>> keep track of the dma_map routine used and call the corresponding 
>>> dma_unmap routine?  In the case of the intel iommu, it doesn't matter.
>>>
>>> Thanks
>>> Kit
>>>
>>>>
>>>> Using DMA_BIDIRECTIONAL just forgoes any hardware security / 
>>>> protection
>>>> that the buffer would have in terms of direction. Generally it's good
>>>> practice to use the strictest direction you can.
>>>>
>>>>> Given that using the pci bar address as is without getting an iommu
>>>>> address results in the same "PTE Write access" error, I wonder if 
>>>>> there
>>>>> is some internal 'prot' associated with the non-translated pci bar
>>>>> address that just needs to be tweaked to include DMA_PTE_WRITE???
>>>> No, I don't think so. The 'prot' will be a property of the IOMMU. Not
>>>> having an entry is probably just the same (from the perspective of the
>>>> error you see) as only having an entry for reading.
>>>>
>>>> Logan
>>>
>>> _______________________________________________
>>> iommu mailing list
>>> iommu@lists.linux-foundation.org
>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>


[-- Attachment #2: Type: text/html, Size: 9231 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-13 15:21                                               ` Kit Chow
  2018-08-13 23:30                                                 ` Kit Chow
@ 2018-08-13 23:36                                                 ` Kit Chow
  1 sibling, 0 replies; 48+ messages in thread
From: Kit Chow @ 2018-08-13 23:36 UTC (permalink / raw)
  To: Robin Murphy, Logan Gunthorpe, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, Alex Williamson, David Woodhouse, iommu


Taking a step back, I was a little surprised that dma_map_single 
successfully returned an
iommu address for the pci bar address passed into it during my initial 
experiment...

Internally, dma_map_single calls virt_to_page() to translate the 
"virtual address" into a
page and intel_map page then calls page_to_phys() to convert the page to 
a dma_addr_t.

The virt_to_page and page_to_phys routines don't appear to do any 
validation and just
uses arithmetic to do the conversions.

the pci bar address (0x383c70e51578) does fall into a valid VA range in 
x86_64 so
it could conceivably be a valid VA.  So I tried an virtual address 
inside the VA hole
  and and it too returned without any errors.

virt_to_page(0x800000000000) -> 0xffffede000000000
page_to_phys(0xffffede000000000) -> 0xf80000000000

In arch/x86/include/asm/page.h, there is the following comment in regards to
validating the virtual address.

/*
  * virt_to_page(kaddr) returns a valid pointer if and only if
  * virt_addr_valid(kaddr) returns true.
  */
#define virt_to_page(kaddr)     pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)

So it looks like the validation by virt_addr_valid was somehow dropped 
from the
virt_to_page code path. Does anyone have any ideas what happended to it?

Kit

(Resending, earlier message tagged as containing html subpart)

On 08/13/2018 08:21 AM, Kit Chow wrote:
>
>
> On 08/13/2018 07:59 AM, Robin Murphy wrote:
>> On 13/08/18 15:23, Kit Chow wrote:
>>> On 08/10/2018 07:10 PM, Logan Gunthorpe wrote:
>>>>
>>>> On 10/08/18 06:53 PM, Kit Chow wrote:
>>>>> I was able to finally succeed in doing the dma transfers over ioat 
>>>>> only
>>>>> when prot has DMA_PTE_WRITE set by setting the direction to either
>>>>> DMA_FROM_DEVICE or DMA_BIDIRECTIONAL. Any ideas if the prot settings
>>>>> need to be changed? Are there any bad side effects if I used
>>>>> DMA_BIDIRECTIONAL?
>>>> Good to hear it. Without digging into the direction much all I can say
>>>> is that it can sometimes be very confusing what the direction is. 
>>>> Adding
>>>> another PCI device just adds to the confusion.
>>> Yep, confusing :).
>>>
>>> ======================= =============================================
>>> DMA_NONE        no direction (used for debugging)
>>> DMA_TO_DEVICE        data is going from the memory to the device
>>> DMA_FROM_DEVICE        data is coming from the device to the memory
>>> DMA_BIDIRECTIONAL    direction isn't known
>>> ======================= =============================================
>>>
>>>> I believe, the direction should be from the IOAT's point of view. 
>>>> So if
>>>> the IOAT is writing to the BAR you'd set DMA_FROM_DEVICE (ie. data is
>>>> coming from the IOAT) and if it's reading you'd set DMA_TO_DEVICE (ie.
>>>> data is going to the IOAT).
>>> It would certainly seem like DMA_TO_DEVICE would be the proper 
>>> choice; IOAT is the plumbing to move host data (memory) to the bar 
>>> address (device).
>>
>> Except that the "device" in question is the IOAT itself (more 
>> generally, it means the device represented by the first argument to 
>> dma_map_*() - the one actually emitting the reads and writes). The 
>> context of a DMA API call is the individual mapping in question, not 
>> whatever overall operation it may be part of - your example already 
>> involves two separate mappings: one "from" system memory "to" the DMA 
>> engine, and one "from" the DMA engine "to" PCI BAR memory.
>
> OK, that makes sense.  The middleman (aka DMA engine device) is the 
> key in the to/from puzzle. Thanks!
>
>
>>
>> Note that the DMA API's dma_direction is also distinct from the 
>> dmaengine API's dma_transfer_direction, and there's plenty of fun to 
>> be had mapping between the two - see pl330.c or rcar-dmac.c for other 
>> examples of dma_map_resource() for slave devices - no guarantees that 
>> those implementations are entirely correct (especially the one I 
>> did!), but in practice they do make the "DMA engine behind an IOMMU" 
>> case work for UARTs and similar straightforward slaves.
>>
>>> Will go with what works and set DMA_FROM_DEVICE.
>>>
>>> In ntb_async_tx_submit, does the direction used for the dma_map 
>>> routines for the src and dest addresses need to be consistent?
>>
>> In general, the mappings of source and destination addresses would 
>> typically have opposite directions as above, unless they're both 
>> bidirectional.
>>
>>> And does the direction setting for the dmaengine_unmap_data have to 
>>> be consistent with the direction used in dma_map_*?
>>
>> Yes, the arguments to an unmap are expected to match whatever was 
>> passed to the corresponding map call. CONFIG_DMA_API_DEBUG should 
>> help catch any mishaps.
>>
>> Robin.
>>
>>> BTW, dmaengine_unmap routine only calls dma_unmap_page. Should it 
>>> keep track of the dma_map routine used and call the corresponding 
>>> dma_unmap routine?  In the case of the intel iommu, it doesn't matter.
>>>
>>> Thanks
>>> Kit
>>>
>>>>
>>>> Using DMA_BIDIRECTIONAL just forgoes any hardware security / 
>>>> protection
>>>> that the buffer would have in terms of direction. Generally it's good
>>>> practice to use the strictest direction you can.
>>>>
>>>>> Given that using the pci bar address as is without getting an iommu
>>>>> address results in the same "PTE Write access" error, I wonder if 
>>>>> there
>>>>> is some internal 'prot' associated with the non-translated pci bar
>>>>> address that just needs to be tweaked to include DMA_PTE_WRITE???
>>>> No, I don't think so. The 'prot' will be a property of the IOMMU. Not
>>>> having an entry is probably just the same (from the perspective of the
>>>> error you see) as only having an entry for reading.
>>>>
>>>> Logan
>>>
>>> _______________________________________________
>>> iommu mailing list
>>> iommu@lists.linux-foundation.org
>>> https://lists.linuxfoundation.org/mailman/listinfo/iommu
>

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-13 23:30                                                 ` Kit Chow
@ 2018-08-13 23:39                                                   ` Logan Gunthorpe
  2018-08-13 23:48                                                     ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-13 23:39 UTC (permalink / raw)
  To: Kit Chow, Robin Murphy, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, Alex Williamson, David Woodhouse, iommu



On 13/08/18 05:30 PM, Kit Chow wrote:
> In arch/x86/include/asm/page.h, there is the following comment in
> regards to validating the virtual address.
> 
> /*
>  * virt_to_page(kaddr) returns a valid pointer if and only if
>  * virt_addr_valid(kaddr) returns true.
>  */
> #define virt_to_page(kaddr)     pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
> 
> So it looks like the validation by virt_addr_valid was somehow dropped
> from the virt_to_page code path. Does anyone have any ideas what
> happended to it?

I don't think it was ever validated (though I haven't been around long
enough to say for certain). What the comment is saying is that you
shouldn't rely on virt_to_page() unless you know virt_addr_valid() is
true (which in most cases you can without the extra check). virt_to_page
is meant to be really fast so adding an extra validation would probably
be a significant performance regression for the entire kernel.

The fact that this can happen through dma_map_single() is non-ideal at
best. It assumes the caller is mapping regular memory and doesn't check
this at all. It may make sense to fix that but I think people expect
dma_map_single() to be as fast as possible as well...

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-13 23:39                                                   ` Logan Gunthorpe
@ 2018-08-13 23:48                                                     ` Kit Chow
  2018-08-13 23:50                                                       ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-13 23:48 UTC (permalink / raw)
  To: Logan Gunthorpe, Robin Murphy, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, Alex Williamson, David Woodhouse, iommu



On 08/13/2018 04:39 PM, Logan Gunthorpe wrote:
>
> On 13/08/18 05:30 PM, Kit Chow wrote:
>> In arch/x86/include/asm/page.h, there is the following comment in
>> regards to validating the virtual address.
>>
>> /*
>>   * virt_to_page(kaddr) returns a valid pointer if and only if
>>   * virt_addr_valid(kaddr) returns true.
>>   */
>> #define virt_to_page(kaddr)     pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
>>
>> So it looks like the validation by virt_addr_valid was somehow dropped
>> from the virt_to_page code path. Does anyone have any ideas what
>> happended to it?
> I don't think it was ever validated (though I haven't been around long
> enough to say for certain). What the comment is saying is that you
> shouldn't rely on virt_to_page() unless you know virt_addr_valid() is
> true (which in most cases you can without the extra check). virt_to_page
> is meant to be really fast so adding an extra validation would probably
> be a significant performance regression for the entire kernel.
>
> The fact that this can happen through dma_map_single() is non-ideal at
> best. It assumes the caller is mapping regular memory and doesn't check
> this at all. It may make sense to fix that but I think people expect
> dma_map_single() to be as fast as possible as well...
>
Perhaps include the validation with some debug turned on?

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-13 23:48                                                     ` Kit Chow
@ 2018-08-13 23:50                                                       ` Logan Gunthorpe
  2018-08-14 13:47                                                         ` Kit Chow
  2018-08-14 14:03                                                         ` Robin Murphy
  0 siblings, 2 replies; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-13 23:50 UTC (permalink / raw)
  To: Kit Chow, Robin Murphy, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, Alex Williamson, David Woodhouse, iommu



On 13/08/18 05:48 PM, Kit Chow wrote:
> On 08/13/2018 04:39 PM, Logan Gunthorpe wrote:
>>
>> On 13/08/18 05:30 PM, Kit Chow wrote:
>>> In arch/x86/include/asm/page.h, there is the following comment in
>>> regards to validating the virtual address.
>>>
>>> /*
>>>   * virt_to_page(kaddr) returns a valid pointer if and only if
>>>   * virt_addr_valid(kaddr) returns true.
>>>   */
>>> #define virt_to_page(kaddr)     pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
>>>
>>> So it looks like the validation by virt_addr_valid was somehow dropped
>>> from the virt_to_page code path. Does anyone have any ideas what
>>> happended to it?
>> I don't think it was ever validated (though I haven't been around long
>> enough to say for certain). What the comment is saying is that you
>> shouldn't rely on virt_to_page() unless you know virt_addr_valid() is
>> true (which in most cases you can without the extra check). virt_to_page
>> is meant to be really fast so adding an extra validation would probably
>> be a significant performance regression for the entire kernel.
>>
>> The fact that this can happen through dma_map_single() is non-ideal at
>> best. It assumes the caller is mapping regular memory and doesn't check
>> this at all. It may make sense to fix that but I think people expect
>> dma_map_single() to be as fast as possible as well...
>>
> Perhaps include the validation with some debug turned on?

The problem is how often do you develop code with any of the debug
config options turned on?

There's already a couple of BUG_ONs in dma_map_single so maybe another
one with virt_addr_valid wouldn't be so bad.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-13 23:50                                                       ` Logan Gunthorpe
@ 2018-08-14 13:47                                                         ` Kit Chow
  2018-08-14 14:03                                                         ` Robin Murphy
  1 sibling, 0 replies; 48+ messages in thread
From: Kit Chow @ 2018-08-14 13:47 UTC (permalink / raw)
  To: Logan Gunthorpe, Robin Murphy, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, Alex Williamson, David Woodhouse, iommu



On 08/13/2018 04:50 PM, Logan Gunthorpe wrote:
>
> On 13/08/18 05:48 PM, Kit Chow wrote:
>> On 08/13/2018 04:39 PM, Logan Gunthorpe wrote:
>>> On 13/08/18 05:30 PM, Kit Chow wrote:
>>>> In arch/x86/include/asm/page.h, there is the following comment in
>>>> regards to validating the virtual address.
>>>>
>>>> /*
>>>>    * virt_to_page(kaddr) returns a valid pointer if and only if
>>>>    * virt_addr_valid(kaddr) returns true.
>>>>    */
>>>> #define virt_to_page(kaddr)     pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
>>>>
>>>> So it looks like the validation by virt_addr_valid was somehow dropped
>>>> from the virt_to_page code path. Does anyone have any ideas what
>>>> happended to it?
>>> I don't think it was ever validated (though I haven't been around long
>>> enough to say for certain). What the comment is saying is that you
>>> shouldn't rely on virt_to_page() unless you know virt_addr_valid() is
>>> true (which in most cases you can without the extra check). virt_to_page
>>> is meant to be really fast so adding an extra validation would probably
>>> be a significant performance regression for the entire kernel.
>>>
>>> The fact that this can happen through dma_map_single() is non-ideal at
>>> best. It assumes the caller is mapping regular memory and doesn't check
>>> this at all. It may make sense to fix that but I think people expect
>>> dma_map_single() to be as fast as possible as well...
>>>
>> Perhaps include the validation with some debug turned on?
> The problem is how often do you develop code with any of the debug
> config options turned on?
>
> There's already a couple of BUG_ONs in dma_map_single so maybe another
> one with virt_addr_valid wouldn't be so bad.
Had my very first Linux crash on the dma direction BUG_ON when I tried 
DMA_NONE :).

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-13 23:50                                                       ` Logan Gunthorpe
  2018-08-14 13:47                                                         ` Kit Chow
@ 2018-08-14 14:03                                                         ` Robin Murphy
  1 sibling, 0 replies; 48+ messages in thread
From: Robin Murphy @ 2018-08-14 14:03 UTC (permalink / raw)
  To: Logan Gunthorpe, Kit Chow, Dave Jiang, Eric Pilmore, Bjorn Helgaas
  Cc: linux-pci, Alex Williamson, David Woodhouse, iommu

On 14/08/18 00:50, Logan Gunthorpe wrote:
> On 13/08/18 05:48 PM, Kit Chow wrote:
>> On 08/13/2018 04:39 PM, Logan Gunthorpe wrote:
>>>
>>> On 13/08/18 05:30 PM, Kit Chow wrote:
>>>> In arch/x86/include/asm/page.h, there is the following comment in
>>>> regards to validating the virtual address.
>>>>
>>>> /*
>>>>    * virt_to_page(kaddr) returns a valid pointer if and only if
>>>>    * virt_addr_valid(kaddr) returns true.
>>>>    */
>>>> #define virt_to_page(kaddr)     pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
>>>>
>>>> So it looks like the validation by virt_addr_valid was somehow dropped
>>>> from the virt_to_page code path. Does anyone have any ideas what
>>>> happended to it?
>>> I don't think it was ever validated (though I haven't been around long
>>> enough to say for certain). What the comment is saying is that you
>>> shouldn't rely on virt_to_page() unless you know virt_addr_valid() is
>>> true (which in most cases you can without the extra check). virt_to_page
>>> is meant to be really fast so adding an extra validation would probably
>>> be a significant performance regression for the entire kernel.
>>>
>>> The fact that this can happen through dma_map_single() is non-ideal at
>>> best. It assumes the caller is mapping regular memory and doesn't check
>>> this at all. It may make sense to fix that but I think people expect
>>> dma_map_single() to be as fast as possible as well...

dma_map_single() is already documented as only supporting lowmem (for 
which virt_to_page() can be assumed to be valid). You might get away 
with feeding it bogus addresses on x86, but on non-coherent 
architectures which convert the page back to a virtual address to 
perform cache maintenance you can expect that to crash and burn rapidly.

There may be some minimal-overhead sanity checking of fundamentals, but 
in general it's not really the DMA API's job to police its callers 
exhaustively; consider that the mm layer doesn't go out of its way to 
stop you from doing things like "kfree(kfree);" either.

>>>
>> Perhaps include the validation with some debug turned on?
> 
> The problem is how often do you develop code with any of the debug
> config options turned on?
> 
> There's already a couple of BUG_ONs in dma_map_single so maybe another
> one with virt_addr_valid wouldn't be so bad.

Note that virt_addr_valid() may be pretty heavyweight in itself. For 
example the arm64 implementation involves memblock_search(); that really 
isn't viable in a DMA mapping fastpath.

Robin.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-09 21:36         ` Logan Gunthorpe
@ 2018-08-16 17:16           ` Kit Chow
  2018-08-16 17:21             ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-16 17:16 UTC (permalink / raw)
  To: Logan Gunthorpe, Eric Pilmore
  Cc: Bjorn Helgaas, linux-pci, David Woodhouse, Alex Williamson,
	iommu, linux-pci


On 08/09/2018 02:36 PM, Logan Gunthorpe wrote:
>
> On 09/08/18 03:31 PM, Eric Pilmore wrote:
>> On Thu, Aug 9, 2018 at 12:35 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
>>> Hey,
>>>
>>> On 09/08/18 12:51 PM, Eric Pilmore wrote:
>>>>>> Was wondering if anybody here has used IOAT DMA engines with an
>>>>>> IOMMU turned on (Xeon based system)? My specific question is really
>>>>>> whether it is possible to DMA (w/IOAT) to a PCI BAR address as the
>>>>>> destination without having to map that address to the IOVA space of
>>>>>> the DMA engine first (assuming the IOMMU is on)?
>>> I haven't tested this scenario but my guess would be that IOAT would
>>> indeed go through the IOMMU and the PCI BAR address would need to be
>>> properly mapped into the IOAT's IOVA. The fact that you see DMAR errors
>>> is probably a good indication that this is the case. I really don't know
>>> why you'd want to DMA something without mapping it.
>> The thought was to avoid paying the price of having to go through yet another
>> translation and also because it was not believed to be necessary anyway since
>> the DMA device could go straight to a PCI BAR address without the need for a
>> mapping.  We have been playing with two DMA engines, IOAT and PLX. The
>> PLX does not have any issues going straight to the PCI BAR address, but unlike
>> IOAT, PLX is sitting "in the PCI tree".
> Yes, you can do true P2P transactions with the PLX DMA engine. (Though
> you do have to be careful as technically you need to translate to a PCI
> bus address not the CPU physical address which are the same on x86; and
> you need to watch that ACS doesn't mess it up).
>
> It doesn't sound like it's possible to avoid the IOMMU when using the
> IOAT so you need the mapping. I would not expect the extra translation
> in the IOMMU to be noticeable, performance wise.
>
> Logan
Hi Logan,

Based on Dave Jiang's suggestion, I am looking into upstreaming the
ntb change to dma map the pci bar address in ntb_async_tx_submit() 
(along with
the intel iommu change to support dma_map_resource).

I only have access to intel hosts for testing (and possibly an AMD
host currently collecting dust) and am not sure how to go about getting
the proper test coverage for other architectures.

Any suggestions? Do you anticipate any issues with dma mapping the pci bar
address in ntb_async_tx_submit on non-x86 archs?  Do you or know of folks
who have ntb test setups with non-x86 hosts who might be able to help?

Thanks
Kit

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-16 17:16           ` Kit Chow
@ 2018-08-16 17:21             ` Logan Gunthorpe
  2018-08-16 18:53               ` Kit Chow
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-16 17:21 UTC (permalink / raw)
  To: Kit Chow, Eric Pilmore
  Cc: Bjorn Helgaas, linux-pci, David Woodhouse, Alex Williamson, iommu



On 16/08/18 11:16 AM, Kit Chow wrote:
> I only have access to intel hosts for testing (and possibly an AMD
> host currently collecting dust) and am not sure how to go about getting
> the proper test coverage for other architectures.

Well, I thought you were only changing the Intel IOMMU implementation...
So testing on Intel hardware seems fine to me.

> Any suggestions? Do you anticipate any issues with dma mapping the pci bar
> address in ntb_async_tx_submit on non-x86 archs?  Do you or know of folks
> who have ntb test setups with non-x86 hosts who might be able to help?

I expect other IOMMUs will need to be updated to properly support
dma_map_resource but that will probably need to be done on a case by
case basis as the need arises. Unfortunately it seems the work to add
dma_map_resource() didn't really consider that the IOMMUs had to have
proper support and probably should have errored out instead of just
passing along the physical address if the IOMMU didn't have support.

I don't know of anyone with an NTB setup on anything but x86 at the moment.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-16 17:21             ` Logan Gunthorpe
@ 2018-08-16 18:53               ` Kit Chow
  2018-08-16 18:56                 ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Kit Chow @ 2018-08-16 18:53 UTC (permalink / raw)
  To: Logan Gunthorpe, Eric Pilmore
  Cc: Bjorn Helgaas, linux-pci, David Woodhouse, Alex Williamson, iommu



On 08/16/2018 10:21 AM, Logan Gunthorpe wrote:
>
> On 16/08/18 11:16 AM, Kit Chow wrote:
>> I only have access to intel hosts for testing (and possibly an AMD
>> host currently collecting dust) and am not sure how to go about getting
>> the proper test coverage for other architectures.
> Well, I thought you were only changing the Intel IOMMU implementation...
> So testing on Intel hardware seems fine to me.
For the ntb change, I wasn't sure if there was some other arch where
ntb_async_tx_submit would work ok with the pci bar address but
would break with the dma map (assuming map_resource has been implemented).
If its all x86 at the moment, I guess I'm good to go. Please confirm.

Thanks
Kit

>> Any suggestions? Do you anticipate any issues with dma mapping the pci bar
>> address in ntb_async_tx_submit on non-x86 archs?  Do you or know of folks
>> who have ntb test setups with non-x86 hosts who might be able to help?
> I expect other IOMMUs will need to be updated to properly support
> dma_map_resource but that will probably need to be done on a case by
> case basis as the need arises. Unfortunately it seems the work to add
> dma_map_resource() didn't really consider that the IOMMUs had to have
> proper support and probably should have errored out instead of just
> passing along the physical address if the IOMMU didn't have support.
>
> I don't know of anyone with an NTB setup on anything but x86 at the moment.
>
> Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-16 18:53               ` Kit Chow
@ 2018-08-16 18:56                 ` Logan Gunthorpe
  2018-08-21 23:18                   ` Eric Pilmore
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-16 18:56 UTC (permalink / raw)
  To: Kit Chow, Eric Pilmore
  Cc: Bjorn Helgaas, linux-pci, David Woodhouse, Alex Williamson, iommu



On 16/08/18 12:53 PM, Kit Chow wrote:
> 
> 
> On 08/16/2018 10:21 AM, Logan Gunthorpe wrote:
>>
>> On 16/08/18 11:16 AM, Kit Chow wrote:
>>> I only have access to intel hosts for testing (and possibly an AMD
>>> host currently collecting dust) and am not sure how to go about getting
>>> the proper test coverage for other architectures.
>> Well, I thought you were only changing the Intel IOMMU implementation...
>> So testing on Intel hardware seems fine to me.
> For the ntb change, I wasn't sure if there was some other arch where
> ntb_async_tx_submit would work ok with the pci bar address but
> would break with the dma map (assuming map_resource has been implemented).
> If its all x86 at the moment, I guess I'm good to go. Please confirm.

I expect lots of arches have issues with dma_map_resource(), it's pretty
new and if it didn't work correctly on x86 I imagine many other arches
are wrong too. I an arch is doesn't work, someone will have to fix that
arches Iommu implementation. But using dma_map_resource() in
ntb_transport is definitely correct compared to what was done before.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-16 18:56                 ` Logan Gunthorpe
@ 2018-08-21 23:18                   ` Eric Pilmore
  2018-08-21 23:20                     ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Pilmore @ 2018-08-21 23:18 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kit Chow, Bjorn Helgaas, linux-pci, David Woodhouse,
	Alex Williamson, iommu

On Thu, Aug 16, 2018 at 11:56 AM, Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
>
> On 16/08/18 12:53 PM, Kit Chow wrote:
> >
> >
> > On 08/16/2018 10:21 AM, Logan Gunthorpe wrote:
> >>
> >> On 16/08/18 11:16 AM, Kit Chow wrote:
> >>> I only have access to intel hosts for testing (and possibly an AMD
> >>> host currently collecting dust) and am not sure how to go about getting
> >>> the proper test coverage for other architectures.
> >> Well, I thought you were only changing the Intel IOMMU implementation...
> >> So testing on Intel hardware seems fine to me.
> > For the ntb change, I wasn't sure if there was some other arch where
> > ntb_async_tx_submit would work ok with the pci bar address but
> > would break with the dma map (assuming map_resource has been implemented).
> > If its all x86 at the moment, I guess I'm good to go. Please confirm.
>
> I expect lots of arches have issues with dma_map_resource(), it's pretty
> new and if it didn't work correctly on x86 I imagine many other arches
> are wrong too. I an arch is doesn't work, someone will have to fix that
> arches Iommu implementation. But using dma_map_resource() in
> ntb_transport is definitely correct compared to what was done before.
>
> Logan


We have been running locally with Kit's change for dma_map_resource and its
incorporation in ntb_async_tx_submit for the destination address and
it runs fine
under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4 @ 2.40GHz) based system,
regardless of whether the DMA engine being used is IOAT or a PLX
device sitting in
the PCIe tree. However, when we go back to a i7 (i7-7700K CPU @ 4.20GHz) based
system it seems to run into issues, specifically when put under a
load. In this case,
just having a load using a single ping command with an interval=0, i.e. no delay
between ping packets, after a few thousand packets the system just hangs. No
panic or watchdogs.  Note that in this scenario I can only use a PLX DMA engine.

Just wondering if anybody might have a clue as to why the i7 system might run
into issues, but a Xeon system not. IOMMU differences? My gut reaction is that
the problem may not necessarily be IOMMU hardware related, but perhaps simply
a timing issue given the difference in cpu speeds between these systems and
maybe we're hitting some window in the Intel IOMMU mapping/unmapping logic.

Thanks,
Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-21 23:18                   ` Eric Pilmore
@ 2018-08-21 23:20                     ` Logan Gunthorpe
  2018-08-21 23:28                       ` Eric Pilmore
  2018-08-21 23:30                       ` Eric Pilmore
  0 siblings, 2 replies; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-21 23:20 UTC (permalink / raw)
  To: Eric Pilmore
  Cc: Kit Chow, Bjorn Helgaas, linux-pci, David Woodhouse,
	Alex Williamson, iommu



On 21/08/18 05:18 PM, Eric Pilmore wrote:
> We have been running locally with Kit's change for dma_map_resource and its
> incorporation in ntb_async_tx_submit for the destination address and
> it runs fine
> under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4 @ 2.40GHz) based system,
> regardless of whether the DMA engine being used is IOAT or a PLX
> device sitting in
> the PCIe tree. However, when we go back to a i7 (i7-7700K CPU @ 4.20GHz) based
> system it seems to run into issues, specifically when put under a
> load. In this case,
> just having a load using a single ping command with an interval=0, i.e. no delay
> between ping packets, after a few thousand packets the system just hangs. No
> panic or watchdogs.  Note that in this scenario I can only use a PLX DMA engine.

This is just my best guess: but it sounds to me like a bug in the PLX
DMA driver or hardware.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-21 23:20                     ` Logan Gunthorpe
@ 2018-08-21 23:28                       ` Eric Pilmore
  2018-08-21 23:35                         ` Logan Gunthorpe
  2018-08-21 23:30                       ` Eric Pilmore
  1 sibling, 1 reply; 48+ messages in thread
From: Eric Pilmore @ 2018-08-21 23:28 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kit Chow, Bjorn Helgaas, linux-pci, David Woodhouse,
	Alex Williamson, iommu

[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]

On Tue, Aug 21, 2018 at 4:20 PM, Logan Gunthorpe <logang@deltatee.com>
wrote:

>
>
> On 21/08/18 05:18 PM, Eric Pilmore wrote:
> > We have been running locally with Kit's change for dma_map_resource and
> its
> > incorporation in ntb_async_tx_submit for the destination address and
> > it runs fine
> > under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4 @ 2.40GHz) based
> system,
> > regardless of whether the DMA engine being used is IOAT or a PLX
> > device sitting in
> > the PCIe tree. However, when we go back to a i7 (i7-7700K CPU @ 4.20GHz)
> based
> > system it seems to run into issues, specifically when put under a
> > load. In this case,
> > just having a load using a single ping command with an interval=0, i.e.
> no delay
> > between ping packets, after a few thousand packets the system just
> hangs. No
> > panic or watchdogs.  Note that in this scenario I can only use a PLX DMA
> engine.
>
> This is just my best guess: but it sounds to me like a bug in the PLX
> DMA driver or hardware.
>
>
The PLX DMA driver?  But the PLX driver isn't really even involved in the
mapping
stage.  Are you thinking maybe the stage at which the DMA descriptor is
freed and
the PLX DMA driver does a dma_descriptor_unmap?

Again, PLX did not exhibit any issues on the Xeon system.

Eric

[-- Attachment #2: Type: text/html, Size: 1888 bytes --]

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-21 23:20                     ` Logan Gunthorpe
  2018-08-21 23:28                       ` Eric Pilmore
@ 2018-08-21 23:30                       ` Eric Pilmore
  1 sibling, 0 replies; 48+ messages in thread
From: Eric Pilmore @ 2018-08-21 23:30 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kit Chow, Bjorn Helgaas, linux-pci, David Woodhouse,
	Alex Williamson, iommu

On Tue, Aug 21, 2018 at 4:20 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
> On 21/08/18 05:18 PM, Eric Pilmore wrote:
>> We have been running locally with Kit's change for dma_map_resource and its
>> incorporation in ntb_async_tx_submit for the destination address and
>> it runs fine
>> under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4 @ 2.40GHz) based system,
>> regardless of whether the DMA engine being used is IOAT or a PLX
>> device sitting in
>> the PCIe tree. However, when we go back to a i7 (i7-7700K CPU @ 4.20GHz) based
>> system it seems to run into issues, specifically when put under a
>> load. In this case,
>> just having a load using a single ping command with an interval=0, i.e. no delay
>> between ping packets, after a few thousand packets the system just hangs. No
>> panic or watchdogs.  Note that in this scenario I can only use a PLX DMA engine.
>
> This is just my best guess: but it sounds to me like a bug in the PLX
> DMA driver or hardware.
>
> Logan

(sorry, html in last response.  resending..)

The PLX DMA driver?  But the PLX driver isn't really even involved in
the mapping
stage.  Are you thinking maybe the stage at which the DMA descriptor
is freed and
the PLX DMA driver does a dma_descriptor_unmap?

Again, PLX did not exhibit any issues on the Xeon system.

Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-21 23:28                       ` Eric Pilmore
@ 2018-08-21 23:35                         ` Logan Gunthorpe
  2018-08-21 23:45                           ` Eric Pilmore
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-21 23:35 UTC (permalink / raw)
  To: Eric Pilmore
  Cc: Kit Chow, Bjorn Helgaas, linux-pci, David Woodhouse,
	Alex Williamson, iommu



On 21/08/18 05:28 PM, Eric Pilmore wrote:
> 
> 
> On Tue, Aug 21, 2018 at 4:20 PM, Logan Gunthorpe <logang@deltatee.com
> <mailto:logang@deltatee.com>> wrote:
> 
> 
> 
>     On 21/08/18 05:18 PM, Eric Pilmore wrote:
>     > We have been running locally with Kit's change for dma_map_resource and its
>     > incorporation in ntb_async_tx_submit for the destination address and
>     > it runs fine
>     > under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4 @ 2.40GHz) based system,
>     > regardless of whether the DMA engine being used is IOAT or a PLX
>     > device sitting in
>     > the PCIe tree. However, when we go back to a i7 (i7-7700K CPU @ 4.20GHz) based
>     > system it seems to run into issues, specifically when put under a
>     > load. In this case,
>     > just having a load using a single ping command with an interval=0, i.e. no delay
>     > between ping packets, after a few thousand packets the system just hangs. No
>     > panic or watchdogs.  Note that in this scenario I can only use a PLX DMA engine.
> 
>     This is just my best guess: but it sounds to me like a bug in the PLX
>     DMA driver or hardware.
> 
> 
> The PLX DMA driver?  But the PLX driver isn't really even involved in
> the mapping
> stage.  Are you thinking maybe the stage at which the DMA descriptor is
> freed and
> the PLX DMA driver does a dma_descriptor_unmap?

Hmm, well what would make you think the hang is during
mapping/unmapping? I would expect a hang to be in handling completions
from the DMA engine or something like that.

> Again, PLX did not exhibit any issues on the Xeon system.

Oh, I missed that. That puts a crinkle in my theory but, as you say, it
could be a timing issue.

Also, it's VERY strange that it would hang the entire system. That makes
things very hard to debug...

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-21 23:35                         ` Logan Gunthorpe
@ 2018-08-21 23:45                           ` Eric Pilmore
  2018-08-21 23:53                             ` Logan Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Eric Pilmore @ 2018-08-21 23:45 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kit Chow, Bjorn Helgaas, linux-pci, David Woodhouse,
	Alex Williamson, iommu

On Tue, Aug 21, 2018 at 4:35 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
>
>
> On 21/08/18 05:28 PM, Eric Pilmore wrote:
>>
>>
>> On Tue, Aug 21, 2018 at 4:20 PM, Logan Gunthorpe <logang@deltatee.com
>> <mailto:logang@deltatee.com>> wrote:
>>
>>
>>
>>     On 21/08/18 05:18 PM, Eric Pilmore wrote:
>>     > We have been running locally with Kit's change for dma_map_resource and its
>>     > incorporation in ntb_async_tx_submit for the destination address and
>>     > it runs fine
>>     > under "load" (iperf) on a Xeon (Xeon(R) CPU E5-2680 v4 @ 2.40GHz) based system,
>>     > regardless of whether the DMA engine being used is IOAT or a PLX
>>     > device sitting in
>>     > the PCIe tree. However, when we go back to a i7 (i7-7700K CPU @ 4.20GHz) based
>>     > system it seems to run into issues, specifically when put under a
>>     > load. In this case,
>>     > just having a load using a single ping command with an interval=0, i.e. no delay
>>     > between ping packets, after a few thousand packets the system just hangs. No
>>     > panic or watchdogs.  Note that in this scenario I can only use a PLX DMA engine.
>>
>>     This is just my best guess: but it sounds to me like a bug in the PLX
>>     DMA driver or hardware.
>>
>>
>> The PLX DMA driver?  But the PLX driver isn't really even involved in
>> the mapping
>> stage.  Are you thinking maybe the stage at which the DMA descriptor is
>> freed and
>> the PLX DMA driver does a dma_descriptor_unmap?
>
> Hmm, well what would make you think the hang is during
> mapping/unmapping?

Well, the only difference between success and failure is running with the
call to dma_map_resource for the destination address, which is a PCI BAR
address. Prior to Kit introducing this call, we never created a mapping for the
destination PCI BAR address and it worked fine on all systems when using
PLX DMA. It was only when we went to a Xeon system and attempted to use
IOAT DMA that we found we needed a mapping for that destination PCI BAR
address.

The only thing the PLX driver does related to "mappings" is a call to
dma_descriptor_unmap when the descriptor is freed, however that is more
of an administrative step to clean up the unmap-data data structure used
when the mapping was originally established.

> I would expect a hang to be in handling completions
> from the DMA engine or something like that.
>
>> Again, PLX did not exhibit any issues on the Xeon system.
>
> Oh, I missed that. That puts a crinkle in my theory but, as you say, it
> could be a timing issue.
>
> Also, it's VERY strange that it would hang the entire system. That makes
> things very hard to debug...

Tell me about it!  ;-)

Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-21 23:45                           ` Eric Pilmore
@ 2018-08-21 23:53                             ` Logan Gunthorpe
  2018-08-21 23:59                               ` Eric Pilmore
  0 siblings, 1 reply; 48+ messages in thread
From: Logan Gunthorpe @ 2018-08-21 23:53 UTC (permalink / raw)
  To: Eric Pilmore
  Cc: Kit Chow, Bjorn Helgaas, linux-pci, David Woodhouse,
	Alex Williamson, iommu



On 21/08/18 05:45 PM, Eric Pilmore wrote:
> Well, the only difference between success and failure is running with the
> call to dma_map_resource for the destination address, which is a PCI BAR
> address. Prior to Kit introducing this call, we never created a mapping for the
> destination PCI BAR address and it worked fine on all systems when using
> PLX DMA. It was only when we went to a Xeon system and attempted to use
> IOAT DMA that we found we needed a mapping for that destination PCI BAR
> address.
> 
> The only thing the PLX driver does related to "mappings" is a call to
> dma_descriptor_unmap when the descriptor is freed, however that is more
> of an administrative step to clean up the unmap-data data structure used
> when the mapping was originally established.

Ah, so there's a big difference here in hardware. Without the mapping
call you are going to essentially be doing a P2P transaction. So the
TLPs won't even hit the CPU. With the mapping call, the TLPs will now
go through the CPU and IOMMU.

CPUs don't always have good support for routing PCI P2P transactions.
However, I would have expected a relatively new i7 CPU to support it
well. It may simply be that this CPU does not have good support, however
 that comes as a bit of a surprise to me. Our plans for the P2P
patch-set was to have a white-list for CPUs that work.

If you can, it would be worth hooking up a PCI analyzer to see what's
happening to the TLPs in both cases.

Logan

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: IOAT DMA w/IOMMU
  2018-08-21 23:53                             ` Logan Gunthorpe
@ 2018-08-21 23:59                               ` Eric Pilmore
  0 siblings, 0 replies; 48+ messages in thread
From: Eric Pilmore @ 2018-08-21 23:59 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Kit Chow, Bjorn Helgaas, linux-pci, David Woodhouse,
	Alex Williamson, iommu

On Tue, Aug 21, 2018 at 4:53 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
>
> CPUs don't always have good support for routing PCI P2P transactions.
> However, I would have expected a relatively new i7 CPU to support it
> well. It may simply be that this CPU does not have good support, however
>  that comes as a bit of a surprise to me. Our plans for the P2P
> patch-set was to have a white-list for CPUs that work.
>
> If you can, it would be worth hooking up a PCI analyzer to see what's
> happening to the TLPs in both cases.

We do have one, but it will be a bit of a logistical effort to get it set-up
in this config. Will be a while before I can gather that data. Will forward
along when I do.

Thanks,
Eric

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2018-08-21 23:59 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-09 18:14 IOAT DMA w/IOMMU Eric Pilmore
2018-08-09 18:43 ` Bjorn Helgaas
2018-08-09 18:51   ` Eric Pilmore
2018-08-09 19:35     ` Logan Gunthorpe
2018-08-09 19:47       ` Kit Chow
2018-08-09 20:11         ` Logan Gunthorpe
2018-08-09 20:57           ` Kit Chow
2018-08-09 21:11             ` Logan Gunthorpe
2018-08-09 21:47               ` Kit Chow
2018-08-09 22:40                 ` Jiang, Dave
2018-08-09 22:48                   ` Kit Chow
2018-08-09 22:50                     ` Logan Gunthorpe
2018-08-09 23:00                       ` Kit Chow
2018-08-10 16:02                         ` Kit Chow
2018-08-10 16:23                           ` Kit Chow
2018-08-10 16:24                             ` Logan Gunthorpe
2018-08-10 16:24                           ` Logan Gunthorpe
2018-08-10 16:31                             ` Dave Jiang
2018-08-10 16:33                               ` Logan Gunthorpe
2018-08-10 17:01                                 ` Dave Jiang
2018-08-10 17:15                                   ` Logan Gunthorpe
2018-08-10 17:46                                     ` Dave Jiang
2018-08-11  0:53                                       ` Kit Chow
2018-08-11  2:10                                         ` Logan Gunthorpe
2018-08-13 14:23                                           ` Kit Chow
2018-08-13 14:59                                             ` Robin Murphy
2018-08-13 15:21                                               ` Kit Chow
2018-08-13 23:30                                                 ` Kit Chow
2018-08-13 23:39                                                   ` Logan Gunthorpe
2018-08-13 23:48                                                     ` Kit Chow
2018-08-13 23:50                                                       ` Logan Gunthorpe
2018-08-14 13:47                                                         ` Kit Chow
2018-08-14 14:03                                                         ` Robin Murphy
2018-08-13 23:36                                                 ` Kit Chow
2018-08-09 21:31       ` Eric Pilmore
2018-08-09 21:36         ` Logan Gunthorpe
2018-08-16 17:16           ` Kit Chow
2018-08-16 17:21             ` Logan Gunthorpe
2018-08-16 18:53               ` Kit Chow
2018-08-16 18:56                 ` Logan Gunthorpe
2018-08-21 23:18                   ` Eric Pilmore
2018-08-21 23:20                     ` Logan Gunthorpe
2018-08-21 23:28                       ` Eric Pilmore
2018-08-21 23:35                         ` Logan Gunthorpe
2018-08-21 23:45                           ` Eric Pilmore
2018-08-21 23:53                             ` Logan Gunthorpe
2018-08-21 23:59                               ` Eric Pilmore
2018-08-21 23:30                       ` Eric Pilmore

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).