* AMD IO_PAGE_FAULT w/NTB on Write ops?
From: Eric Pilmore @ 2019-04-20 9:06 UTC
To: linux-ntb, linux-pci; +Cc: S Taylor, D Meyer

Hi Folks,

Before I ask my questions, here is a little background on the
environment I have:
- 2 hosts: 1 Xeon based (Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz),
  1 AMD based (AMD EPYC 7401 24-Core Processor)
- The two hosts are interconnected via an external PCI-e (switchtec) switch.
- The two hosts are exporting memory to each other via NTB.
- IOMMU is enabled in both hosts. The Xeon platform requires some BIOS
  settings and a kernel parameter (intel_iommu=on), however as far as I
  have been able to determine, the AMD only requires the IOMMU BIOS
  setting to be enabled and no special kernel boot parameters. Does that
  sound right for AMD?
- Region of memory exported to each host is acquired/mapped via
  dma_alloc_coherent() using the "device" of the respective external
  PCI-e switch.
- The dma_addr returned from dma_alloc_coherent() is relayed to the
  peer host, which then adds that value (i.e. IOVA offset) to its local
  PCI BAR representing the switch, and then ioremap()'s the resulting
  address to get a CPU virtual address on which it can now perform
  ioread/iowrite operations.

What we have found is that the Xeon based host can successfully ioread
to this mapped shared buffer, but whenever it attempts an iowrite to
this region, it results in an IO_PAGE_FAULT on the AMD based host:

  AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000 address=0x00000000fde1c18c flags=0x0070]

Going in the opposite direction there are no issues, i.e. the AMD
based host can successfully ioread/iowrite to the mapped in buffer
exported by the Xeon host. Or if both hosts are Xeons, then
everything works fine also.

I have looked high and low, and have not been able to interpret what
the "flags=0x0070" represent. I assume they are indicating some write
permission error, but was wondering if anybody here might know?

More importantly, does anybody know why the AMD IOMMU might seemingly
default to not allow Write operations to the exported memory? Is there
some additional BIOS or kernel boot parameter setting that needs to be
set?

lspci on the AMD host of the external PCI-e switch:
  23:00.0 PCI bridge: PMC-Sierra Inc. Device 8536
  23:00.1 Bridge: PMC-Sierra Inc. Device 8536

The 23:00.1 BDF is the NTB bridge. The BDF (23:01.2) in the error
message represents the "NTB translated" BDF of the request that came
from the peer, i.e. the 01.2 is the proxy-id. Is there a chance that
this proxy-id is causing some confusion for the AMD IOMMU?

Would greatly appreciate any assistance!

Thanks!

--
Eric Pilmore
epilmore@gigaio.com
http://gigaio.com
Phone: (858) 775 2514
* Re: AMD IO_PAGE_FAULT w/NTB on Write ops?
From: Logan Gunthorpe @ 2019-04-22 17:14 UTC
To: Eric Pilmore, linux-ntb, linux-pci; +Cc: S Taylor, D Meyer

On 2019-04-20 3:06 a.m., Eric Pilmore wrote:
> What we have found is that the Xeon based host can successfully ioread
> to this mapped shared buffer, but whenever it attempts an iowrite to
> this region, it results in an IO_PAGE_FAULT on the AMD based host:
>
> AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000
> address=0x00000000fde1c18c flags=0x0070]
>
> Going in the opposite direction there are no issues, i.e. the AMD
> based host can successfully ioread/iowrite to the mapped in buffer
> exported by the Xeon host. Or if both hosts are Xeons, then
> everything works fine also.
>
> I have looked high and low, and have not been able to interpret what
> the "flags=0x0070" represent. I assume they are indicating some write
> permission error, but was wondering if anybody here might know?

See the AMD IOMMU spec[1], Figure 51. 0x0070 indicates the PE, RW and PR
bits are set, which means a Write request to a present page was denied
because the peripheral did not have permission.

> More importantly, does anybody know why the AMD IOMMU might seemingly
> default to not allow Write operations to the exported memory? Is there
> some additional BIOS or kernel boot parameter setting that needs to be
> set?

Yeah, I don't think the IOMMU defaults to allow write operations to
exported memory. That would be extremely broken....

> lspci on the AMD host of the external PCI-e switch:
> 23:00.0 PCI bridge: PMC-Sierra Inc. Device 8536
> 23:00.1 Bridge: PMC-Sierra Inc. Device 8536
>
> The 23:00.1 BDF is the NTB bridge. The BDF (23:01.2) in the error
> message represents the "NTB translated" BDF of the request that came
> from the peer, i.e. the 01.2 is the proxy-id. Is there a chance that
> this proxy-id is causing some confusion for the AMD IOMMU?

I suspect the proxy IDs are the problem. On Intel hardware, we had to
add support so that it allowed requests for all proxy IDs for a given
device. We probably have to do something similar to the AMD IOMMU
driver.

My guess is that the reason writes work and not reads is because the
write TLPs are posted and thus the switch doesn't apply the Proxy ID
seeing it doesn't expect a completion. Thus the IOMMU sees the TLPs as
coming from a permitted peripheral and doesn't complain.

Logan

[1] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf
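[Editorial sketch] The flags decoding described above can be illustrated with a few lines of Python. This is only a sketch: the thread confirms that 0x0070 means PE, RW and PR, but the individual bit positions used below (PR = 0x010, RW = 0x020, PE = 0x040) are an assumption taken from a reading of the IO_PAGE_FAULT event layout and should be checked against the spec. The function name is hypothetical.

```python
# Assumed bit positions within the IO_PAGE_FAULT "flags" field.
# The thread only confirms that 0x0070 == PE | RW | PR; which bit is
# which is an assumption that should be verified against the AMD spec.
IO_PAGE_FAULT_FLAGS = {
    0x010: "PR",  # page was present
    0x020: "RW",  # the request was a write
    0x040: "PE",  # the peripheral did not have permission
}


def decode_io_page_fault_flags(flags):
    """Return the names of the known bits set in a flags value."""
    names = [name for bit, name in sorted(IO_PAGE_FAULT_FLAGS.items())
             if flags & bit]
    # Report any bits this sketch does not know about instead of hiding them.
    unknown = flags & ~sum(IO_PAGE_FAULT_FLAGS)
    if unknown:
        names.append(f"unknown(0x{unknown:03x})")
    return names


print(decode_io_page_fault_flags(0x0070))
```

Bits outside the three named ones are reported as "unknown" rather than guessed at.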
* Re: AMD IO_PAGE_FAULT w/NTB on Write ops?
From: Logan Gunthorpe @ 2019-04-22 17:31 UTC
To: Eric Pilmore, linux-ntb, linux-pci; +Cc: S Taylor, D Meyer

On 2019-04-22 11:14 a.m., Logan Gunthorpe wrote:
> My guess is that the reason writes work and not reads is because the
> write TLPs are posted and thus the switch doesn't apply the Proxy ID
> seeing it doesn't expect a completion. Thus the IOMMU sees the TLPs as
> coming from a permitted peripheral and doesn't complain.

Oh, oops, sounds like I got that backwards as you seem to indicate reads
work but not writes. That doesn't make as much sense to me, but I still
think it's a proxy_id problem.

Take a look at [1]. It reads to me like the AMD IOMMU only supports the
last DMA alias. So most of the proxy IDs for the switchtec device we
register are probably ignored...

One way or another I expect the working cases are because they come from
a specific proxy ID and the broken cases come from a proxy ID that the
AMD IOMMU doesn't consider.

Logan

[1] https://elixir.bootlin.com/linux/latest/source/drivers/iommu/amd_iommu.c#L245
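[Editorial sketch] The "only the last DMA alias" hypothesis can be pictured with a toy model. This is not the kernel code: the function names are hypothetical, the set-based permission check is a deliberate simplification, and the read-path ID 23:01.3 is made up for illustration (only 23:00.1 and 23:01.2 appear in the thread). It only shows why a request from an alias that was dropped would fault while one from the retained alias would not.

```python
def permitted_ids_last_alias(device_id, aliases):
    """Toy model: the device itself plus only the LAST registered alias."""
    permitted = {device_id}
    if aliases:
        permitted.add(aliases[-1])
    return permitted


def permitted_ids_all_aliases(device_id, aliases):
    """Toy model: the device itself plus EVERY registered alias."""
    return {device_id, *aliases}


def check_request(requester_id, permitted):
    """Return what the modelled IOMMU would do with a request."""
    return "ok" if requester_id in permitted else "IO_PAGE_FAULT"


NTB_DEVICE = "23:00.1"            # NTB bridge, from the lspci output
WRITE_PATH = "23:01.2"            # proxy ID seen in the fault log
READ_PATH = "23:01.3"             # hypothetical proxy ID for the read path
ALIASES = [WRITE_PATH, READ_PATH]  # registration order is an assumption

last_only = permitted_ids_last_alias(NTB_DEVICE, ALIASES)
everything = permitted_ids_all_aliases(NTB_DEVICE, ALIASES)

for name, rid in (("read", READ_PATH), ("write", WRITE_PATH)):
    print(name, check_request(rid, last_only), check_request(rid, everything))
```

Under these assumptions the read path passes in both models, while the write path faults only when a single alias is kept.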
* Re: AMD Epyc iperf performance issues over NTB
From: Kit Chow @ 2019-09-06 23:48 UTC
To: linux-ntb, linux-pci; +Cc: Logan Gunthorpe, Eric Pilmore (GigaIO)

This is a follow-up to the initial problems encountered trying to get
the AMD Epyc 7401 server to do host to host communication through NTB
(please see the thread for background info).

The IO_PAGE_FAULT flags=0x0070 seen on write ops was in fact related to
proxy ID setup, as Logan had suggested. The AMD iommu code only
processed the 'last' proxy ID/dma alias; the last proxy ID was
associated with Reads, and this allowed Read ops to succeed and Write
ops to fail.

After adding support to process all of the proxy IDs in the AMD iommu
code (plus adding dma_map_resource support), the AMD Epyc server can now
be configured in a 4 host NTB setup and communicate over NTB (tcp/ip
over ntb_netdev) with the other 3 hosts.

The problem that we are now experiencing with the AMD Epyc 7401 server,
and for which I could use some help, is very poor iperf performance over
NTB/ntb_netdev. The iperf numbers over NTB start off at around
800 Mbits/s and quickly degrade to the 20 Mbits/s range. Running 'top'
during iperf, I see many instances (up to 25+) of ksoftirqd running,
which suggests that interrupts are overwhelming the interrupt
processing. /proc/interrupts shows lots of 'ccp-5' dma interrupt
activity as well as ntb_netdev interrupt activity. After eliminating
netdev interrupts by configuring netdev to 'use_poll' and leaving ccp,
the poor iperf performance persists.

As a comparison, I can replace the ccp dma with the plx dma (found on
the host adapter card) on the AMD server and get a steady 9.4 Gbits/s
with iperf over NTB.

I've optimized for NUMA via numactl in all test runs.

So it appears that the iperf NTB performance issues on the AMD Epyc
server are related to the ccp dma and its interrupt processing.

Does anyone have any experience with the ccp dma that might be able to
help? Any help or suggestions on how to proceed would be very much
appreciated.

Thanks
Kit
kchow@gigaio.com
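[Editorial sketch] One way to put numbers on the "lots of 'ccp-5' dma interrupt activity" observation is to diff two snapshots of /proc/interrupts. The sketch below is a minimal, assumption-laden example: the function names are hypothetical, and the two snapshots are made-up sample text in the usual /proc/interrupts layout, not data from the systems in this thread.

```python
def interrupt_totals(text):
    """Sum the per-CPU counts of each row, keyed by the row's description."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields:
            continue
        counts = []
        for field in fields[1:1 + ncpu]:
            if not field.isdigit():       # some rows carry fewer counters
                break
            counts.append(int(field))
        label = " ".join(fields[1 + len(counts):]) or fields[0].rstrip(":")
        totals[label] = totals.get(label, 0) + sum(counts)
    return totals


def interrupt_delta(before, after, needle):
    """Interrupts raised between two snapshots on rows mentioning needle."""
    old, new = interrupt_totals(before), interrupt_totals(after)
    return sum(new[k] - old.get(k, 0) for k in new if needle in k)


# Made-up sample snapshots, for illustration only.
BEFORE = """
           CPU0       CPU1
 45:        100        200   PCI-MSI 1048576-edge      ccp-5
 46:         10         20   PCI-MSI 2097152-edge      ntb_netdev
"""
AFTER = """
           CPU0       CPU1
 45:       5100       7200   PCI-MSI 1048576-edge      ccp-5
 46:         15         25   PCI-MSI 2097152-edge      ntb_netdev
"""

print("ccp-5:", interrupt_delta(BEFORE, AFTER, "ccp-5"))
print("ntb_netdev:", interrupt_delta(BEFORE, AFTER, "ntb_netdev"))
```

On a real host the two strings would come from reading /proc/interrupts a known interval apart while iperf is running; dividing the delta by the interval gives an interrupt rate to compare between the ccp and plx dma runs.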
* Re: Fwd: AMD IO_PAGE_FAULT w/NTB on Write ops?
From: Sanjay R Mehta @ 2019-04-23 11:00 UTC
To: epilmore, S Taylor, D Meyer, linux-ntb, linux-pci

> From: Eric Pilmore <epilmore@gigaio.com>
> Date: Sat, Apr 20, 2019 at 2:36 PM
> Subject: AMD IO_PAGE_FAULT w/NTB on Write ops?
> To: linux-ntb <linux-ntb@googlegroups.com>, <linux-pci@vger.kernel.org>
> Cc: S Taylor <staylor@gigaio.com>, D Meyer <dmeyer@gigaio.com>
>
> Hi Folks,
>
> Before I ask my questions, here is a little background on the
> environment I have:
> - 2 hosts: 1 Xeon based (Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz),
>   1 AMD based (AMD EPYC 7401 24-Core Processor)
> - The two hosts are interconnected via an external PCI-e (switchtec) switch.
> - The two hosts are exporting memory to each other via NTB.
> - IOMMU is enabled in both hosts. The Xeon platform requires some BIOS
>   settings and a kernel parameter (intel_iommu=on), however as far as I
>   have been able to determine, the AMD only requires the IOMMU BIOS
>   setting to be enabled and no special kernel boot parameters. Does that
>   sound right for AMD?

Yes, you are correct, Eric.

> - Region of memory exported to each host is acquired/mapped via
>   dma_alloc_coherent() using the "device" of the respective external
>   PCI-e switch.
> - The dma_addr returned from dma_alloc_coherent() is relayed to the
>   peer host, which then adds that value (i.e. IOVA offset) to its local
>   PCI BAR representing the switch, and then ioremap()'s the resulting
>   address to get a CPU virtual address on which it can now perform
>   ioread/iowrite operations.
>
> What we have found is that the Xeon based host can successfully ioread
> to this mapped shared buffer, but whenever it attempts an iowrite to
> this region, it results in an IO_PAGE_FAULT on the AMD based host:
>
> AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000
> address=0x00000000fde1c18c flags=0x0070]

The address in the above log looks to be the physical address of the
memory window. Am I right?

If so, the first parameter of dma_alloc_coherent() should be passed as
below:

dma_alloc_coherent(&ntb->pdev->dev, ...) instead of dma_alloc_coherent(&ntb->dev, ...).

Hope this solves your problem.
* Re: Fwd: AMD IO_PAGE_FAULT w/NTB on Write ops?
From: Eric Pilmore @ 2019-04-24 22:04 UTC
To: Sanjay R Mehta; +Cc: S Taylor, D Meyer, linux-ntb, linux-pci

On Tue, Apr 23, 2019 at 4:00 AM Sanjay R Mehta <sanmehta@amd.com> wrote:
>
> > AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000
> > address=0x00000000fde1c18c flags=0x0070]
>
> The address in the above log looks to be the physical address of the
> memory window. Am I right?
>
> If so, the first parameter of dma_alloc_coherent() should be passed as
> below:
>
> dma_alloc_coherent(&ntb->pdev->dev, ...) instead of dma_alloc_coherent(&ntb->dev, ...).
>
> Hope this solves your problem.

Hi Sanjay,

Thanks for the response. We are using the correct device for
dma_alloc_coherent(). Upon further investigation, what we are finding
is that apparently the AMD IOMMU support can only manage one alias, as
opposed to the Intel IOMMU support, which can manage multiple. It is not
clear at this time whether this is a software limitation in the AMD
IOMMU kernel support or an imposed limitation of the hardware. Still
investigating.

Thanks,
Eric
* Re: Fwd: AMD IO_PAGE_FAULT w/NTB on Write ops?
From: Gary R Hook @ 2019-05-09 20:03 UTC
To: Eric Pilmore, Mehta, Sanju; +Cc: S Taylor, D Meyer, linux-ntb, linux-pci

On 4/24/19 5:04 PM, Eric Pilmore wrote:
> On Tue, Apr 23, 2019 at 4:00 AM Sanjay R Mehta <sanmehta@amd.com> wrote:
>>
>>> AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000
>>> address=0x00000000fde1c18c flags=0x0070]
>>
>> The address in the above log looks to be the physical address of the
>> memory window. Am I right?
>>
>> If so, the first parameter of dma_alloc_coherent() should be passed as
>> below:
>>
>> dma_alloc_coherent(&ntb->pdev->dev, ...) instead of dma_alloc_coherent(&ntb->dev, ...).
>>
>> Hope this solves your problem.
>
> Hi Sanjay,
>
> Thanks for the response. We are using the correct device for
> dma_alloc_coherent(). Upon further investigation, what we are finding
> is that apparently the AMD IOMMU support can only manage one alias, as
> opposed to the Intel IOMMU support, which can manage multiple. It is not
> clear at this time whether this is a software limitation in the AMD
> IOMMU kernel support or an imposed limitation of the hardware. Still
> investigating.

Please define 'alias'?

The IO_PAGE_FAULT error is described on page 142 of the AMD IOMMU spec,
document #48882. Easily found via a search.

The flags value of 0x0070 translates to PE, RW, PR. The page was
present, the transaction was a write, and the peripheral didn't have
permission. That implies that mapping hadn't been done.

Not being sure how that device presents, or what you're doing with IVHD
info, I can't comment further. I can say that the AMD IOMMU provides for
a single exclusion range, but as many unity ranges as you wish.

HTH
grh
* Re: Fwd: AMD IO_PAGE_FAULT w/NTB on Write ops?
From: Eric Pilmore @ 2019-06-04 21:15 UTC
To: Gary R Hook; +Cc: Mehta, Sanju, S Taylor, D Meyer, linux-ntb, linux-pci

On Thu, May 9, 2019 at 1:03 PM Gary R Hook <ghook@amd.com> wrote:
>
> On 4/24/19 5:04 PM, Eric Pilmore wrote:
> >
> > Thanks for the response. We are using the correct device for
> > dma_alloc_coherent(). Upon further investigation, what we are finding
> > is that apparently the AMD IOMMU support can only manage one alias, as
> > opposed to the Intel IOMMU support, which can manage multiple. It is not
> > clear at this time whether this is a software limitation in the AMD
> > IOMMU kernel support or an imposed limitation of the hardware. Still
> > investigating.
>
> Please define 'alias'?

Hi Gary,

I appreciate the response. Sorry for the late reply. Got sidetracked
with other stuff.

I will try to answer this as best I can. Sorry if my terminology is
off, as I'm still a relative newbie with some of this.

The "alias" is basically another BDF (or ProxyID) that wants to be
associated with the same IOMMU resources as some primary BDF.
See drivers/pci/quirks.c. In our scenario we are using NTB, and the
requests (TLPs) that come through this bridge will not necessarily have
a ReqID equal to the BDF of the switch device that contains the bridge.
Instead, the ReqID will be a "translated" (Proxy) BDF of the sourcing
devices on the other side of the Non-Transparent Bridge. In our case
the NTB is a Switchtec device, and the quirk
quirk_switchtec_ntb_dma_alias() is used as a means of associating these
aliases (aka ProxyID or Translated ReqID) with the NT endpoint in the
local host.

On Xeon platforms, the framework allows multiple aliases to be defined
for a particular IOMMU and everything works great. However, with the
AMD CPU, it appears the IOMMU framework accepts just one alias. Note
Logan's earlier response @ Mon, Apr 22, 10:31 AM. In our case the one
that is accepted is the one on the path for a processor Read, but
processor Writes go through a slightly different path, resulting in a
different ReqID. As Logan points out, since the AMD IOMMU code seems to
accept only one alias, the Write ReqID looks foreign and thus results
in the IOMMU faults.

>
> The IO_PAGE_FAULT error is described on page 142 of the AMD IOMMU spec,
> document #48882. Easily found via a search.
>
> The flags value of 0x0070 translates to PE, RW, PR. The page was
> present, the transaction was a write, and the peripheral didn't have
> permission. That implies that mapping hadn't been done.
>
> Not being sure how that device presents, or what you're doing with IVHD
> info, I can't comment further. I can say that the AMD IOMMU provides for
> a single exclusion range, but as many unity ranges as you wish.

I'm currently not doing anything with IVHD. The devices on the other
side of the NTB that need to be aliased can be anything from a remote
Host processor, NVMe drive, GPU, etc., anything that wants to send a
memory transaction to the local host.

If you have any insight into how the AMD IOMMU support in the kernel
could be extended for multiple aliases, or whether there is a hardware
limitation that restricts it to just one, that would be greatly
appreciated.

Thanks,
Eric

--
Eric Pilmore
epilmore@gigaio.com
http://gigaio.com
Phone: (858) 775 2514
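[Editorial sketch] Since the discussion above turns on ReqIDs and BDFs, here is a small illustration of how a bus:device.function string such as the 23:01.2 from the fault log maps to the 16-bit requester ID carried in a TLP (8 bits of bus, 5 bits of device, 3 bits of function). The helper names are hypothetical; the packing itself is the standard PCI layout.

```python
def bdf_to_requester_id(bdf):
    """Convert 'bb:dd.f' (hex, as printed by lspci) to a 16-bit requester ID."""
    bus_str, devfn = bdf.split(":")
    dev_str, fn_str = devfn.split(".")
    bus, dev, fn = int(bus_str, 16), int(dev_str, 16), int(fn_str, 16)
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= fn <= 0x7):
        raise ValueError(f"BDF out of range: {bdf}")
    return (bus << 8) | (dev << 3) | fn


def requester_id_to_bdf(rid):
    """Convert a 16-bit requester ID back to 'bb:dd.f' notation."""
    return f"{rid >> 8:02x}:{(rid >> 3) & 0x1F:02x}.{rid & 0x7:x}"


# The NTB endpoint from lspci and the proxy ID from the fault log.
for bdf in ("23:00.1", "23:01.2"):
    print(bdf, hex(bdf_to_requester_id(bdf)))
```

With this packing the NTB endpoint 23:00.1 and the proxy ID 23:01.2 are simply two different 16-bit values, which is why the IOMMU treats the proxy ID as a separate requester unless it has been registered as an alias.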