linux-pci.vger.kernel.org archive mirror
* AMD IO_PAGE_FAULT w/NTB on Write ops?
@ 2019-04-20  9:06 Eric Pilmore
  2019-04-22 17:14 ` Logan Gunthorpe
       [not found] ` <CAADLhr49ke_3s25gW11qZ+H-Jjje-E00WMHiMDbKU=mcCQtb3g@mail.gmail.com>
  0 siblings, 2 replies; 8+ messages in thread
From: Eric Pilmore @ 2019-04-20  9:06 UTC (permalink / raw)
  To: linux-ntb, linux-pci; +Cc: S Taylor, D Meyer

Hi Folks,

Before I ask my questions, here is a little background on the
environment I have:
- 2 hosts: 1 Xeon based (Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz),
                1 AMD based (AMD EPYC 7401 24-Core Processor)
- Each host is interconnected via an external PCI-e (switchtec) switch.
- The two hosts are exporting memory to each other via NTB.
- IOMMU is enabled in both hosts. The Xeon platform requires some BIOS
settings and a kernel parameter (intel_iommu=on), however as far as I
have been able to determine, the AMD only requires the IOMMU BIOS
setting to be enabled and no special kernel boot parameters. Does that
sound right for AMD?
- Region of memory exported to each host is acquired/mapped via
dma_alloc_coherent() using the "device" of the respective external
PCI-e switch.
- The dma_addr returned from dma_alloc_coherent() is relayed to the
peer host, which then adds that value (i.e. the IOVA offset) to its local
PCI BAR representing the switch, and then ioremap()s the resulting
address to get a CPU virtual address on which it can now perform
ioread/iowrite operations.
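
To make the address arithmetic in the last bullet concrete, here is a
minimal userspace sketch. It is only an illustration: the function name
is hypothetical, and the bounds check against the BAR size is an
assumption of this sketch rather than something the setup above is known
to do. In the kernel, the resulting address is what would be handed to
ioremap().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: compute the address the peer would ioremap(),
 * i.e. local BAR base + dma_addr relayed by the exporting host.
 * Returns false if [dma_addr, dma_addr + len) does not fit in the BAR.
 */
bool peer_window_addr(uint64_t bar_base, uint64_t bar_size,
                      uint64_t dma_addr, uint64_t len, uint64_t *out)
{
    assert(out != NULL);

    /* Reject empty ranges and ranges that run past the end of the BAR
     * (written so that dma_addr + len cannot overflow). */
    if (len == 0 || dma_addr > bar_size || len > bar_size - dma_addr)
        return false;

    /* Reject a BAR base so high that the sum would wrap around. */
    if (bar_base > UINT64_MAX - dma_addr)
        return false;

    *out = bar_base + dma_addr;
    return true;
}
```

For example, with a made-up BAR at 0x100000000 of size 0x400000, a
relayed dma_addr of 0x1c000 yields 0x10001c000.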

What we have found is that the Xeon based host can successfully ioread
from this mapped shared buffer, but whenever it attempts an iowrite to
this region, it results in an IO_PAGE_FAULT on the AMD based host:

AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000
address=0x00000000fde1c18c flags=0x0070]

Going in the opposite direction there are no issues, i.e. the AMD
based host can successfully ioread/iowrite to the mapped in buffer
exported by the Xeon host.  And if both hosts are Xeons, then
everything works fine as well.

I have looked high and low, and have not been able to interpret what
the "flags=0x0070" represents. I assume it indicates some write
permission error, but was wondering if anybody here might know?

More importantly, does anybody know why the AMD IOMMU might seemingly
default to not allow Write operations to the exported memory? Is there
some additional BIOS or kernel boot parameter setting that needs to be
set?

lspci on the AMD hosts of the external PCI-e switch:
   23:00.0 PCI bridge: PMC-Sierra Inc. Device 8536
   23:00.1 Bridge: PMC-Sierra Inc. Device 8536

The 23:00.1 BDF is the NTB bridge. The BDF (23:01.2) in the error
message represents the "NTB translated" BDF of the request that came
from the peer, i.e. the 01.2 is the proxy-id. Is there a chance that
this proxy-id is causing some confusion for the AMD IOMMU?

Would greatly appreciate any assistance!

Thanks!

-- 
Eric Pilmore
epilmore@gigaio.com
http://gigaio.com
Phone: (858) 775 2514

This e-mail message is intended only for the individual(s) to whom it
is addressed and
may contain information that is privileged, confidential, proprietary,
or otherwise exempt
from disclosure under applicable law. If you believe you have received
this message in
error, please advise the sender by return e-mail and delete it from
your mailbox.
Thank you.


* Re: AMD IO_PAGE_FAULT w/NTB on Write ops?
  2019-04-20  9:06 AMD IO_PAGE_FAULT w/NTB on Write ops? Eric Pilmore
@ 2019-04-22 17:14 ` Logan Gunthorpe
  2019-04-22 17:31   ` Logan Gunthorpe
       [not found] ` <CAADLhr49ke_3s25gW11qZ+H-Jjje-E00WMHiMDbKU=mcCQtb3g@mail.gmail.com>
  1 sibling, 1 reply; 8+ messages in thread
From: Logan Gunthorpe @ 2019-04-22 17:14 UTC (permalink / raw)
  To: Eric Pilmore, linux-ntb, linux-pci; +Cc: S Taylor, D Meyer



On 2019-04-20 3:06 a.m., Eric Pilmore wrote:
> What we have found is that the Xeon based host can successfully ioread
> from this mapped shared buffer, but whenever it attempts an iowrite to
> this region, it results in an IO_PAGE_FAULT on the AMD based host:
> 
> AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000
> address=0x00000000fde1c18c flags=0x0070]
> 
> Going in the opposite direction there are no issues, i.e. the AMD
> based host can successfully ioread/iowrite to the mapped in buffer
> exported by the Xeon host.  And if both hosts are Xeons, then
> everything works fine as well.
> 
> I have looked high and low, and have not been able to interpret what
> the "flags=0x0070" represents. I assume it indicates some write
> permission error, but was wondering if anybody here might know?

See Figure 51 of the AMD IOMMU spec [1]. A flags value of 0x0070
indicates that the PE, RW and PR bits are set, which means a write
request to a present page was denied because the peripheral did not
have permission.
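
As a rough illustration of that decoding, here is a small userspace
sketch. The macro and function names are made up for this example, and
the individual bit positions (PR = 0x10, RW = 0x20, PE = 0x40) are an
assumption that should be checked against the spec; the only thing taken
from the log is that the three bits together make up 0x0070.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions within the logged flags value; verify against [1]. */
#define EVT_FLAG_PR 0x010u /* PR: the page was present */
#define EVT_FLAG_RW 0x020u /* RW: the transaction was a write */
#define EVT_FLAG_PE 0x040u /* PE: the peripheral lacked permission */

/* The three bits together must reproduce the value seen in the log. */
static_assert((EVT_FLAG_PR | EVT_FLAG_RW | EVT_FLAG_PE) == 0x0070u,
              "PR | RW | PE should equal the logged flags value");

struct fault_info {
    bool present;     /* PR set */
    bool write;       /* RW set */
    bool perm_denied; /* PE set */
};

/* Split a flags value from an IO_PAGE_FAULT log line into the three bits. */
struct fault_info decode_fault_flags(uint16_t flags)
{
    struct fault_info info = {
        .present     = (flags & EVT_FLAG_PR) != 0,
        .write       = (flags & EVT_FLAG_RW) != 0,
        .perm_denied = (flags & EVT_FLAG_PE) != 0,
    };
    return info;
}
```

Calling decode_fault_flags(0x0070) sets all three fields, which matches
the reading above: a write to a present page, denied for lack of
permission.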

> More importantly, does anybody know why the AMD IOMMU might seemingly
> default to not allow Write operations to the exported memory? Is there
> some additional BIOS or kernel boot parameter setting that needs to be
> set?

Yeah, I don't think the IOMMU defaults to allow write operations to
exported memory. That would be extremely broken....

> lspci on the AMD hosts of the external PCI-e switch:
>    23:00.0 PCI bridge: PMC-Sierra Inc. Device 8536
>    23:00.1 Bridge: PMC-Sierra Inc. Device 8536
> 
> The 23:00.1 BDF is the NTB bridge. The BDF (23:01.2) in the error
> message represents the "NTB translated" BDF of the request that came
> from the peer, i.e. the 01.2 is the proxy-id. Is there a chance that
> this proxy-id is causing some confusion for the AMD IOMMU?

I suspect the proxy IDs are the problem. On Intel hardware, we had to
add support so that it allowed requests for all proxy IDs for a given
device. We probably have to do something similar in the AMD IOMMU driver.

My guess is that the reason writes work and not reads is because the
write TLPs are posted and thus the switch doesn't apply the Proxy ID
seeing it doesn't expect a completion. Thus the IOMMU sees the TLPs as
coming from a permitted peripheral and doesn't complain.

Logan

[1] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf


* Re: AMD IO_PAGE_FAULT w/NTB on Write ops?
  2019-04-22 17:14 ` Logan Gunthorpe
@ 2019-04-22 17:31   ` Logan Gunthorpe
  2019-09-06 23:48     ` AMD Epyc iperf performance issues over NTB Kit Chow
  0 siblings, 1 reply; 8+ messages in thread
From: Logan Gunthorpe @ 2019-04-22 17:31 UTC (permalink / raw)
  To: Eric Pilmore, linux-ntb, linux-pci; +Cc: S Taylor, D Meyer



On 2019-04-22 11:14 a.m., Logan Gunthorpe wrote:
> My guess is that the reason writes work and not reads is because the
> write TLPs are posted and thus the switch doesn't apply the Proxy ID
> seeing it doesn't expect a completion. Thus the IOMMU sees the TLPs as
> coming from a permitted peripheral and doesn't complain.

Oh, oops, sounds like I got that backwards as you seem to indicate reads
work but not writes. That doesn't make as much sense to me, but I still
think it's a proxy_id problem.

Take a look at [1]. It reads to me like the AMD IOMMU only supports the
last DMA alias. So most of the proxy IDs for the switchtec device we
register are probably ignored...

One way or another I expect the working cases are because they come from
a specific proxy ID and the broken cases come from a proxy ID that the
AMD IOMMU doesn't consider.

Logan


[1]
https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd_iommu.c#L245


* Re: Fwd: AMD IO_PAGE_FAULT w/NTB on Write ops?
       [not found] ` <CAADLhr49ke_3s25gW11qZ+H-Jjje-E00WMHiMDbKU=mcCQtb3g@mail.gmail.com>
@ 2019-04-23 11:00   ` Sanjay R Mehta
  2019-04-24 22:04     ` Eric Pilmore
  0 siblings, 1 reply; 8+ messages in thread
From: Sanjay R Mehta @ 2019-04-23 11:00 UTC (permalink / raw)
  To: epilmore, S Taylor, D Meyer, linux-ntb, linux-pci


> From: Eric Pilmore <epilmore@gigaio.com>
> Date: Sat, Apr 20, 2019 at 2:36 PM
> Subject: AMD IO_PAGE_FAULT w/NTB on Write ops?
> To: linux-ntb <linux-ntb@googlegroups.com>, <linux-pci@vger.kernel.org>
> Cc: S Taylor <staylor@gigaio.com>, D Meyer <dmeyer@gigaio.com>
>
>
> Hi Folks,
>
> Before I ask my questions, here is a little background on the
> environment I have:
> - 2 hosts: 1 Xeon based (Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz),
>                 1 AMD based (AMD EPYC 7401 24-Core Processor)
> - Each host is interconnected via an external PCI-e (switchtec) switch.
> - The two hosts are exporting memory to each other via NTB.
> - IOMMU is enabled in both hosts. The Xeon platform requires some BIOS
> settings and a kernel parameter (intel_iommu=on), however as far as I
> have been able to determine, the AMD only requires the IOMMU BIOS
> setting to be enabled and no special kernel boot parameters. Does that
> sound right for AMD?
Yes, you are correct, Eric.
> - Region of memory exported to each host is acquired/mapped via
> dma_alloc_coherent() using the "device" of the respective external
> PCI-e switch.
> - The dma_addr returned from dma_alloc_coherent() is relayed to the
> peer host, which then adds that value (i.e. the IOVA offset) to its local
> PCI BAR representing the switch, and then ioremap()s the resulting
> address to get a CPU virtual address on which it can now perform
> ioread/iowrite operations.
>
> What we have found is that the Xeon based host can successfully ioread
> from this mapped shared buffer, but whenever it attempts an iowrite to
> this region, it results in an IO_PAGE_FAULT on the AMD based host:
>
> AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000
> address=0x00000000fde1c18c flags=0x0070]

The address in the above log looks to be the physical address of the memory window. Am I right?

If so, the first parameter of dma_alloc_coherent() should be passed as below:

dma_alloc_coherent(&ntb->pdev->dev, ...) instead of dma_alloc_coherent(&ntb->dev, ...).

Hopefully this solves your problem.

>
> Going in the opposite direction there are no issues, i.e. the AMD
> based host can successfully ioread/iowrite to the mapped in buffer
> exported by the Xeon host.  And if both hosts are Xeons, then
> everything works fine as well.
>
> I have looked high and low, and have not been able to interpret what
> the "flags=0x0070" represents. I assume it indicates some write
> permission error, but was wondering if anybody here might know?
>
> More importantly, does anybody know why the AMD IOMMU might seemingly
> default to not allow Write operations to the exported memory? Is there
> some additional BIOS or kernel boot parameter setting that needs to be
> set?
>
> lspci on the AMD hosts of the external PCI-e switch:
>    23:00.0 PCI bridge: PMC-Sierra Inc. Device 8536
>    23:00.1 Bridge: PMC-Sierra Inc. Device 8536
>
> The 23:00.1 BDF is the NTB bridge. The BDF (23:01.2) in the error
> message represents the "NTB translated" BDF of the request that came
> from the peer, i.e. the 01.2 is the proxy-id. Is there a chance that
> this proxy-id is causing some confusion for the AMD IOMMU?
>
> Would greatly appreciate any assistance!
>
> Thanks!
>


* Re: Fwd: AMD IO_PAGE_FAULT w/NTB on Write ops?
  2019-04-23 11:00   ` Fwd: AMD IO_PAGE_FAULT w/NTB on Write ops? Sanjay R Mehta
@ 2019-04-24 22:04     ` Eric Pilmore
  2019-05-09 20:03       ` Gary R Hook
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Pilmore @ 2019-04-24 22:04 UTC (permalink / raw)
  To: Sanjay R Mehta; +Cc: S Taylor, D Meyer, linux-ntb, linux-pci

On Tue, Apr 23, 2019 at 4:00 AM Sanjay R Mehta <sanmehta@amd.com> wrote:
>
>
> > AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000
> > address=0x00000000fde1c18c flags=0x0070]
>
> The address in the above log looks to be the physical address of the memory window. Am I right?
>
> If so, the first parameter of dma_alloc_coherent() should be passed as below:
>
> dma_alloc_coherent(&ntb->pdev->dev, ...) instead of dma_alloc_coherent(&ntb->dev, ...).
>
> Hopefully this solves your problem.

Hi Sanjay,

Thanks for the response.  We are using the correct device for the
dma_alloc_coherent(). Upon further investigation what we are finding
is that apparently the AMD IOMMU support can only manage one alias, as
opposed to Intel IOMMU support which can support multiple. Not clear
at this time if it's a software limitation in the AMD IOMMU kernel
support or an imposed limitation of the hardware. Still investigating.

Thanks,
Eric


* Re: Fwd: AMD IO_PAGE_FAULT w/NTB on Write ops?
  2019-04-24 22:04     ` Eric Pilmore
@ 2019-05-09 20:03       ` Gary R Hook
  2019-06-04 21:15         ` Eric Pilmore
  0 siblings, 1 reply; 8+ messages in thread
From: Gary R Hook @ 2019-05-09 20:03 UTC (permalink / raw)
  To: Eric Pilmore, Mehta, Sanju; +Cc: S Taylor, D Meyer, linux-ntb, linux-pci

On 4/24/19 5:04 PM, Eric Pilmore wrote:
> On Tue, Apr 23, 2019 at 4:00 AM Sanjay R Mehta <sanmehta@amd.com> wrote:
>>
>>
>>> AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000
>>> address=0x00000000fde1c18c flags=0x0070]
>>
>> The address in the above log looks to be the physical address of the memory window. Am I right?
>>
>> If so, the first parameter of dma_alloc_coherent() should be passed as below:
>>
>> dma_alloc_coherent(&ntb->pdev->dev, ...) instead of dma_alloc_coherent(&ntb->dev, ...).
>>
>> Hopefully this solves your problem.
> 
> Hi Sanjay,
> 
> Thanks for the response.  We are using the correct device for the
> dma_alloc_coherent(). Upon further investigation what we are finding
> is that apparently the AMD IOMMU support can only manage one alias, as
> opposed to Intel IOMMU support which can support multiple. Not clear
> at this time if it's a software limitation in the AMD IOMMU kernel
> support or an imposed limitation of the hardware. Still investigating.

Please define 'alias'?

The IO_PAGE_FAULT error is described on page 142 of the AMD IOMMU spec, 
document #48882. Easily found via a search.

The flags value of 0x0070 translates to PE, RW, PR. The page was 
present, the transaction was a write, and the peripheral didn't have 
permission. That implies that mapping hadn't been done.

Not being sure how that device presents, or what you're doing with IVHD 
info, I can't comment further. I can say that the AMD IOMMU provides for 
a single exclusion range, but as many unity ranges as you wish.

HTH

grh


* Re: Fwd: AMD IO_PAGE_FAULT w/NTB on Write ops?
  2019-05-09 20:03       ` Gary R Hook
@ 2019-06-04 21:15         ` Eric Pilmore
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Pilmore @ 2019-06-04 21:15 UTC (permalink / raw)
  To: Gary R Hook; +Cc: Mehta, Sanju, S Taylor, D Meyer, linux-ntb, linux-pci

On Thu, May 9, 2019 at 1:03 PM Gary R Hook <ghook@amd.com> wrote:
>
> On 4/24/19 5:04 PM, Eric Pilmore wrote:
> >
> > Thanks for the response.  We are using the correct device for the
> > dma_alloc_coherent(). Upon further investigation what we are finding
> > is that apparently the AMD IOMMU support can only manage one alias, as
> > opposed to Intel IOMMU support which can support multiple. Not clear
> > at this time if it's a software limitation in the AMD IOMMU kernel
> > support or an imposed limitation of the hardware. Still investigating.
>
> Please define 'alias'?

Hi Gary,

I appreciate the response. Sorry for the late reply. Got sidetracked
with other stuff.

I will try to answer this as best I can. Sorry if my terminology might
be off as I'm still a relative newbie with some of this.

The "alias" is basically another BDF (or ProxyID) that wants to be
associated with the same IOMMU resources as some primary BDF.
See drivers/pci/quirks.c. In our scenario we are using NTB, and the
requests (TLPs) that come through this bridge will not necessarily
carry the BDF of the switch device that contains the bridge as their
ReqID. Instead, the ReqID will be a "translated" (proxy) BDF of the
sourcing device on the other side of the Non-Transparent Bridge. In
our case the NTB is a Switchtec device, and the quirk
quirk_switchtec_ntb_dma_alias() is used to associate these aliases
(aka ProxyID or translated ReqID) with the NT endpoint in the local
host. On Xeon platforms, the framework allows multiple aliases to be
defined for a particular IOMMU and everything works great. However,
with the AMD CPU, it appears the IOMMU framework accepts only one
alias. Note Logan's earlier response (Mon, Apr 22, 10:31 AM). In our
case the alias that is accepted is the one used on the path for a
processor Read, but processor Writes go through a slightly different
path, resulting in a different ReqID. As Logan points out, since the
AMD IOMMU code accepts only one alias, the Write ReqID looks foreign
and thus results in the IOMMU faults.

>
> The IO_PAGE_FAULT error is described on page 142 of the AMD IOMMU spec,
> document #48882. Easily found via a search.
>
> The flags value of 0x0070 translates to PE, RW, PR. The page was
> present, the transaction was a write, and the peripheral didn't have
> permission. That implies that mapping hadn't been done.
>
> Not being sure how that device presents, or what you're doing with IVHD
> info, I can't comment further. I can say that the AMD IOMMU provides for
> a single exclusion range, but as many unity ranges as you wish.

I'm currently not doing anything with IVHD. The devices on the other
side of the NTB that need to be aliased can be anything from a remote
Host processor, NVMe drive, GPU, etc., anything that wants to send a
memory transaction to the local host.

If you have any insight into how the AMD IOMMU support in the kernel
could be extended for multiple aliases, or whether there is a hardware
limitation that restricts it to just one, that would be greatly
appreciated.

Thanks,
Eric




-- 
Eric Pilmore
epilmore@gigaio.com
http://gigaio.com
Phone: (858) 775 2514



* Re: AMD Epyc iperf performance issues over NTB
  2019-04-22 17:31   ` Logan Gunthorpe
@ 2019-09-06 23:48     ` Kit Chow
  0 siblings, 0 replies; 8+ messages in thread
From: Kit Chow @ 2019-09-06 23:48 UTC (permalink / raw)
  To: linux-ntb, linux-pci; +Cc: Logan Gunthorpe, Eric Pilmore (GigaIO)

This is a follow-up to the initial problems encountered trying to get
the AMD Epyc 7401 server to do host-to-host communication through NTB.
(Please see the thread for background info.)

The IO_PAGE_FAULT flags=0x0070 seen on write ops was in fact related to
proxy ID setup, as Logan had suggested. The AMD IOMMU code only processed
the 'last' proxy ID/DMA alias; the last proxy ID was associated with
Reads, which allowed Read ops to succeed and caused Write ops to fail.
After adding support for processing all of the proxy IDs in the AMD
IOMMU code (plus adding dma_map_resource support), the AMD Epyc server
can now be configured in a 4 host NTB setup and communicate over NTB
(TCP/IP over ntb_netdev) with the other 3 hosts.
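
For anyone skimming the thread, the effect can be shown with a toy
userspace model. This is not the kernel change itself: the table, the
function names and the alias list are all made up for illustration, and
the Read proxy ID used in the note below (23:01.1) is hypothetical. Only
23:01.2 comes from the fault log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Requester ID layout: bus[15:8] | device[7:3] | function[2:0]. */
#define BDF(bus, dev, fn) \
    ((uint16_t)(((bus) << 8) | (((dev) & 0x1f) << 3) | ((fn) & 0x7)))

#define NUM_DEVIDS 65536

/* Toy stand-in for a device table: one "mapping attached" flag per ID. */
struct toy_dev_table {
    bool attached[NUM_DEVIDS];
};

/* Attach the device plus only the last alias in the list. */
void attach_last_alias(struct toy_dev_table *t, uint16_t dev,
                       const uint16_t *aliases, size_t n)
{
    assert(t != NULL);
    t->attached[dev] = true;
    if (n > 0)
        t->attached[aliases[n - 1]] = true;
}

/* Attach the device plus every alias in the list. */
void attach_all_aliases(struct toy_dev_table *t, uint16_t dev,
                        const uint16_t *aliases, size_t n)
{
    assert(t != NULL);
    t->attached[dev] = true;
    for (size_t i = 0; i < n; i++)
        t->attached[aliases[i]] = true;
}

/* A request is accepted only if its requester ID has a mapping attached. */
bool request_allowed(const struct toy_dev_table *t, uint16_t reqid)
{
    assert(t != NULL);
    return t->attached[reqid];
}
```

With the aliases { 23:01.2, 23:01.1 } registered in that order for the
NT endpoint 23:00.1, attach_last_alias() accepts requests from 23:01.1
only, so a write arriving as 23:01.2 is rejected; attach_all_aliases()
accepts both.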

The problem we are now experiencing with the AMD Epyc 7401 server, and
for which I could use some help, is very poor iperf performance over
NTB/ntb_netdev.

The iperf numbers over NTB start off at around 800 Mbits/s and quickly
degrade to the 20 Mbits/s range. Running 'top' during iperf, I see many
instances (up to 25+) of ksoftirqd running, which suggests that the
system is being overwhelmed by interrupt processing.

/proc/interrupts shows lots of 'ccp-5' DMA interrupt activity as well as
ntb_netdev interrupt activity. Even after eliminating the netdev
interrupts by configuring netdev with 'use_poll' and leaving ccp in
place, the poor iperf performance persists.

As a comparison, I can replace the ccp DMA with the PLX DMA (found on
the host adapter card) on the AMD server and get a steady 9.4 Gbits/s
with iperf over NTB.

I've optimized for NUMA via numactl in all test runs.

So it appears that the iperf NTB performance issues on the AMD Epyc
server are related to the ccp DMA and its interrupt processing.

Does anyone with experience with the ccp DMA have any ideas that might help?

Any help or suggestions on how to proceed would be very much appreciated.

Thanks
Kit

kchow@gigaio.com


