linux-kernel.vger.kernel.org archive mirror
* IOMMU Page faults when running DMA transfers from PCIe device
@ 2019-04-15 16:04 Patrick Brunner
  2019-04-16 15:33 ` Jerome Glisse
  0 siblings, 1 reply; 6+ messages in thread
From: Patrick Brunner @ 2019-04-15 16:04 UTC (permalink / raw)
  To: linux-kernel

Dear all,

I'm encountering very nasty problems regarding DMA transfers from an external 
PCIe device to main memory while the IOMMU is enabled, and I'm running out 
of ideas. I'm not even sure whether it's a kernel issue or not, but I would 
highly appreciate any hints from experienced developers on how to proceed to 
solve this issue.

The problem: An FPGA (see details below) should write a small amount of data 
(~128 bytes) over a PCIe 2.0 x1 link to an address in the CPU's memory space. 
The destination address (64 bits) for the Mem Write TLP is written to a BAR-
mapped register beforehand.

On the system side, the driver consists of the usual setup code:
- request PCI regions
- pci_set_master
- I/O remapping of BARs
- setting DMA mask (dma_set_mask_and_coherent), tried both 32/64 bits
- allocating DMA buffers with dma_alloc_coherent (4096 bytes, but also tried 
smaller numbers)
- allocating IRQ lines (MSI) with pci_alloc_irq_vectors and pci_irq_vector
- writing the DMA buffer's bus address (the dma_addr_t handle returned by 
dma_alloc_coherent) to a BAR-mapped register

There is also an IRQ handler dumping the first 2 DWs from the DMA buffer when 
triggered.
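
Roughly, the probe path looks like this (a condensed sketch rather than the 
actual driver: BAR index, register offsets and names like "fpga-demo" are 
placeholders, and error unwinding is omitted):

#include <linux/module.h>
#include <linux/pci.h>
#include <linux/dma-mapping.h>
#include <linux/interrupt.h>

#define DMA_BUF_SIZE    4096
#define REG_DMA_ADDR_LO 0x10    /* placeholder BAR0 register offsets */
#define REG_DMA_ADDR_HI 0x14

struct fpga_dev {
        void __iomem *bar0;
        void *dma_buf;
        dma_addr_t dma_handle;
};

static irqreturn_t fpga_isr(int irq, void *data)
{
        struct fpga_dev *fd = data;
        u32 *buf = fd->dma_buf;

        /* dump the first 2 DWs written by the FPGA */
        pr_info("fpga-demo: DW0=%08x DW1=%08x\n", buf[0], buf[1]);
        return IRQ_HANDLED;
}

static int fpga_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        struct fpga_dev *fd;
        int ret, nvec;

        fd = devm_kzalloc(&pdev->dev, sizeof(*fd), GFP_KERNEL);
        if (!fd)
                return -ENOMEM;

        ret = pcim_enable_device(pdev);
        if (ret)
                return ret;
        ret = pci_request_regions(pdev, "fpga-demo");
        if (ret)
                return ret;
        pci_set_master(pdev);

        fd->bar0 = pci_iomap(pdev, 0, 0);
        if (!fd->bar0)
                return -ENOMEM;

        ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
        if (ret)
                ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
        if (ret)
                return ret;

        fd->dma_buf = dma_alloc_coherent(&pdev->dev, DMA_BUF_SIZE,
                                         &fd->dma_handle, GFP_KERNEL);
        if (!fd->dma_buf)
                return -ENOMEM;

        nvec = pci_alloc_irq_vectors(pdev, 1, 2, PCI_IRQ_MSI);
        if (nvec < 0)
                return nvec;
        ret = request_irq(pci_irq_vector(pdev, 0), fpga_isr, 0,
                          "fpga-demo", fd);
        if (ret)
                return ret;

        /* hand the bus address (the dma_addr_t, not virt_to_phys) to the FPGA */
        iowrite32(lower_32_bits(fd->dma_handle), fd->bar0 + REG_DMA_ADDR_LO);
        iowrite32(upper_32_bits(fd->dma_handle), fd->bar0 + REG_DMA_ADDR_HI);

        pci_set_drvdata(pdev, fd);
        return 0;
}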

The FPGA part will initiate the following transfers at an interval of 2.5 ms:
- Memory write to DMA address
- Send MSI (to signal that transfer is done)
- Memory read from DMA address+offset

And now the crux: everything works fine with the IOMMU disabled (iommu=off), 
i.e. the 2 DWs dumped in the ISR contain valid data. But if the IOMMU 
is enabled (iommu=soft or force), I receive an IO page fault (sometimes even 
more, depending on the payload size) on every transfer, and the data is all 
zeros:

[   49.001605] IO_PAGE_FAULT device=00:00.0 domain=0x0000 
address=0x00000000ffbf8000 flags=0x0070]

The device ID corresponds to the host bridge, and the address corresponds to 
the DMA handle I got from dma_alloc_coherent.

The big question is: what do I need to do to convince the IOMMU that my DMA 
transfer is legal? To my understanding, the IOMMU should be completely 
transparent as regarded from my device driver. What am I missing?

Some notes:
- Disabling the IOMMU is not an option, as then only 1 of the 2 MSIs on the 
FPGA is recognised by pci_alloc_irq_vectors. This is another problem I don't 
understand. Well, if I could get both IRQs working, I could disable the 
IOMMU for good...
- The second MSI is used for some UARTs in the FPGA, not related to the DMA-
part.
- The kernel device driver code is based on example code by Lattice for a demo 
based on the ECP3.
- The same system includes a CAN-adapter on a Mini-PCIe-Card where the driver 
code runs basically the same initialisation sequence (see above), including 
setup of the DMA buffer address via BAR-mapped register. This adapter is based 
on an ECP3 FPGA too. I don't see any IO page faults for this adapter though, 
when the IOMMU is enabled.
- When performing a Mem Read from the FPGA, no IO page faults are issued.
- The same approach was used in 3 other designs (various combos of CPUs, RAM 
and x86, some with ECP3, others with ECP5) without any problems, but none so 
far had an IOMMU involved.

System/FPGA details:
- CPU: AMD Embedded R-Series RX-416GD Radeon R6 (family: 0x15, model: 0x60, 
stepping: 0x1)
- FPGA: Lattice ECP5-45 with Lattice PCIe x1 Endpoint IP
- Kernel: 4.9.80 (same results with 4.20.17)

Note: we need to use the older kernel version because the final design must 
be patched with the RTAI extension.

I'm quite desperate at this point, after spending days trying out various 
things, reading driver code for several similar devices, and crawling through 
hundreds of Google search results, all to no avail... So, sorry for the 
noise, but I'd be much obliged for any help/directions.

I'm happy to provide more details, such as the full driver code, if someone 
is willing to take a deeper look.

Thanks and best regards,

Patrick






* Re: IOMMU Page faults when running DMA transfers from PCIe device
  2019-04-15 16:04 IOMMU Page faults when running DMA transfers from PCIe device Patrick Brunner
@ 2019-04-16 15:33 ` Jerome Glisse
  2019-04-17 14:17   ` Patrick Brunner
  2019-04-18  9:37   ` David Laight
  0 siblings, 2 replies; 6+ messages in thread
From: Jerome Glisse @ 2019-04-16 15:33 UTC (permalink / raw)
  To: Patrick Brunner; +Cc: linux-kernel

On Mon, Apr 15, 2019 at 06:04:11PM +0200, Patrick Brunner wrote:
> Dear all,
> 
> I'm encountering very nasty problems regarding DMA transfers from an external 
> PCIe device to the main memory while the IOMMU is enabled, and I'm running out 
> of ideas. I'm not even sure, whether it's a kernel issue or not. But I would 
> highly appreciate any hints from experienced developers how to proceed to 
> solve that issue.
> 
> The problem: An FPGA (see details below) should write a small amount of data 
> (~128 bytes) over a PCIe 2.0 x1 link to an address in the CPU's memory space. 
> The destination address (64 bits) for the Mem Write TLP is written to a BAR-
> mapped register before-hand.
> 
> On the system side, the driver consists of the usual setup code:
> - request PCI regions
> - pci_set_master
> - I/O remapping of BARs
> - setting DMA mask (dma_set_mask_and_coherent), tried both 32/64 bits
> - allocating DMA buffers with dma_alloc_coherent (4096 bytes, but also tried 
> smaller numbers)
> - allocating IRQ lines (MSI) with pci_alloc_irq_vectors and pci_irq_vector
> - writing the DMA buffers' logical address (as returned in dma_handle_t from 
> dma_alloc_coherent) to a BAR-mapped register
> 
> There is also an IRQ handler dumping the first 2 DWs from the DMA buffer when 
> triggered.
> 
> The FPGA part will initiate following transfers at an interval of 2.5ms:
> - Memory write to DMA address
> - Send MSI (to signal that transfer is done)
> - Memory read from DMA address+offset
> 
> And now, the clue: everything works fine with the IOMMU disabled (iommu=off), 
> i.e. the 2 DWs dumped in the ISR handler contain valid data. But if the IOMMU 
> is enabled (iommu=soft or force), I receive an IO page fault (sometimes even 
> more, depending on the payload size) on every transfer, and the data is all 
> zeros:
> 
> [   49.001605] IO_PAGE_FAULT device=00:00.0 domain=0x0000 
> address=0x00000000ffbf8000 flags=0x0070]
> 
> Where the device ID corresponds to the Host bridge, and the address 
> corresponds to the DMA handle I got from dma_alloc_coherent respectively.

I am no expert, but I am guessing your FPGA sets the requester field in the
PCIe TLP write packet to 00:00.0. This might work when the IOMMU is off, but
not when it is on: with the IOMMU enabled, your device should set the
requester field to the FPGA's PCIe ID so that the IOMMU knows which device a
PCIe write or read packet belongs to, and thus which IOMMU page table to
check it against.
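
If it helps, this is roughly where that field sits in a 4DW (64-bit address)
Memory Write TLP header -- my reading of the PCIe spec, not anything taken
from the Lattice IP, and the names below are just illustrative, so
double-check against the endpoint IP documentation:

#include <linux/types.h>

/*
 * Rough layout of a 4DW (64-bit address) Memory Write TLP header
 * (ignoring on-the-wire byte ordering, which is the FPGA IP's concern):
 *
 *   DW0: Fmt/Type (MWr, 4DW) | TC | attrs | Length (in DWs)
 *   DW1: Requester ID [31:16] | Tag [15:8] | Last BE [7:4] | First BE [3:0]
 *   DW2: Address[63:32]
 *   DW3: Address[31:2] | 00
 *
 * The Requester ID is bus[15:8] dev[7:3] func[2:0] of the endpoint.
 * Leaving it at 0000h makes the write appear to come from 00:00.0,
 * which is exactly what the IO_PAGE_FAULT above reports.
 */
struct mwr64_tlp_hdr {
        u32 dw0;        /* fmt/type/length */
        u32 dw1;        /* requester id, tag, byte enables */
        u32 addr_hi;    /* address[63:32] */
        u32 addr_lo;    /* address[31:2], bits [1:0] reserved */
};

static inline u32 mwr_tlp_dw1(u16 requester_id, u8 tag, u8 last_be, u8 first_be)
{
        return ((u32)requester_id << 16) | ((u32)tag << 8) |
               ((last_be & 0xf) << 4) | (first_be & 0xf);
}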

Cheers,
Jérôme


* Re: IOMMU Page faults when running DMA transfers from PCIe device
  2019-04-16 15:33 ` Jerome Glisse
@ 2019-04-17 14:17   ` Patrick Brunner
  2019-04-17 14:37     ` Jerome Glisse
  2019-04-18  9:37   ` David Laight
  1 sibling, 1 reply; 6+ messages in thread
From: Patrick Brunner @ 2019-04-17 14:17 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: linux-kernel

Am Dienstag, 16. April 2019, 17:33:07 CEST schrieb Jerome Glisse:
> On Mon, Apr 15, 2019 at 06:04:11PM +0200, Patrick Brunner wrote:
> > Dear all,
> > 
> > I'm encountering very nasty problems regarding DMA transfers from an
> > external PCIe device to the main memory while the IOMMU is enabled, and
> > I'm running out of ideas. I'm not even sure, whether it's a kernel issue
> > or not. But I would highly appreciate any hints from experienced
> > developers how to proceed to solve that issue.
> > 
> > The problem: An FPGA (see details below) should write a small amount of
> > data (~128 bytes) over a PCIe 2.0 x1 link to an address in the CPU's
> > memory space. The destination address (64 bits) for the Mem Write TLP is
> > written to a BAR- mapped register before-hand.
> > 
> > On the system side, the driver consists of the usual setup code:
> > - request PCI regions
> > - pci_set_master
> > - I/O remapping of BARs
> > - setting DMA mask (dma_set_mask_and_coherent), tried both 32/64 bits
> > - allocating DMA buffers with dma_alloc_coherent (4096 bytes, but also
> > tried smaller numbers)
> > - allocating IRQ lines (MSI) with pci_alloc_irq_vectors and pci_irq_vector
> > - writing the DMA buffers' logical address (as returned in dma_handle_t
> > from dma_alloc_coherent) to a BAR-mapped register
> > 
> > There is also an IRQ handler dumping the first 2 DWs from the DMA buffer
> > when triggered.
> > 
> > The FPGA part will initiate following transfers at an interval of 2.5ms:
> > - Memory write to DMA address
> > - Send MSI (to signal that transfer is done)
> > - Memory read from DMA address+offset
> > 
> > And now, the clue: everything works fine with the IOMMU disabled
> > (iommu=off), i.e. the 2 DWs dumped in the ISR handler contain valid data.
> > But if the IOMMU is enabled (iommu=soft or force), I receive an IO page
> > fault (sometimes even more, depending on the payload size) on every
> > transfer, and the data is all zeros:
> > 
> > [   49.001605] IO_PAGE_FAULT device=00:00.0 domain=0x0000
> > address=0x00000000ffbf8000 flags=0x0070]
> > 
> > Where the device ID corresponds to the Host bridge, and the address
> > corresponds to the DMA handle I got from dma_alloc_coherent respectively.
> 
> I am no expert but i am guessing your FPGA set the request field in the
> PCIE TLP write packet to 00:00.0 and this might work when IOMMU is off but
> might not work when IOMMU is on ie when IOMMU is on your device should set
> the request field to the FPGA PCIE id so that the IOMMU knows for which
> device the PCIE write or read packet is and thus against which IOMMU page
> table.
> 
> Cheers,
> Jérôme

Hi Jérôme

Thank you very much for your response.

You hit the nail on the head! That was exactly the root cause of the problem. 
The requester field was properly filled in for the Memory Read TLP, but not 
for the Memory Write TLP, where it was all zeroes.

If I may ask another question: is it possible to map a buffer for DMA that 
was allocated by other means? For the second phase, we are going to use the 
RTAI extension(*), which provides its own memory allocation routines (e.g. 
rt_shm_alloc()). There, you may pass the flag USE_GFP_DMA to indicate that 
the buffer should be suitable for DMA. I've tried translating this memory 
area with virt_to_phys() and using the resulting address for the DMA transfer 
from the FPGA, but I get other IO page faults, e.g.:

[   70.100140] IO_PAGE_FAULT device=01:00.0 domain=0x0001 
address=0x0000000000080000 flags=0x0020]

It's remarkable that the logical addresses returned from dma_alloc_coherent 
(e.g. ffbd8000) look quite different from those returned by rt_shm_alloc
+virt_to_phys (e.g. 00080000).

Unfortunately, it does not seem possible to do that the other way round, i.e. 
forcing RTAI to use the buffer from dma_alloc_coherent.

(*) I'm aware that questions regarding the RTAI extension do not really 
belong on this mailing list, but I've read similar questions regarding DMA on 
the RTAI ML which never got answered...

Thanks again for your hint. It saved us many more hours of debugging! :-)

Regards,

Patrick





* Re: IOMMU Page faults when running DMA transfers from PCIe device
  2019-04-17 14:17   ` Patrick Brunner
@ 2019-04-17 14:37     ` Jerome Glisse
  0 siblings, 0 replies; 6+ messages in thread
From: Jerome Glisse @ 2019-04-17 14:37 UTC (permalink / raw)
  To: Patrick Brunner; +Cc: linux-kernel

On Wed, Apr 17, 2019 at 04:17:09PM +0200, Patrick Brunner wrote:
> Am Dienstag, 16. April 2019, 17:33:07 CEST schrieb Jerome Glisse:
> > On Mon, Apr 15, 2019 at 06:04:11PM +0200, Patrick Brunner wrote:
> > > Dear all,
> > > 
> > > I'm encountering very nasty problems regarding DMA transfers from an
> > > external PCIe device to the main memory while the IOMMU is enabled, and
> > > I'm running out of ideas. I'm not even sure, whether it's a kernel issue
> > > or not. But I would highly appreciate any hints from experienced
> > > developers how to proceed to solve that issue.
> > > 
> > > The problem: An FPGA (see details below) should write a small amount of
> > > data (~128 bytes) over a PCIe 2.0 x1 link to an address in the CPU's
> > > memory space. The destination address (64 bits) for the Mem Write TLP is
> > > written to a BAR- mapped register before-hand.
> > > 
> > > On the system side, the driver consists of the usual setup code:
> > > - request PCI regions
> > > - pci_set_master
> > > - I/O remapping of BARs
> > > - setting DMA mask (dma_set_mask_and_coherent), tried both 32/64 bits
> > > - allocating DMA buffers with dma_alloc_coherent (4096 bytes, but also
> > > tried smaller numbers)
> > > - allocating IRQ lines (MSI) with pci_alloc_irq_vectors and pci_irq_vector
> > > - writing the DMA buffers' logical address (as returned in dma_handle_t
> > > from dma_alloc_coherent) to a BAR-mapped register
> > > 
> > > There is also an IRQ handler dumping the first 2 DWs from the DMA buffer
> > > when triggered.
> > > 
> > > The FPGA part will initiate following transfers at an interval of 2.5ms:
> > > - Memory write to DMA address
> > > - Send MSI (to signal that transfer is done)
> > > - Memory read from DMA address+offset
> > > 
> > > And now, the clue: everything works fine with the IOMMU disabled
> > > (iommu=off), i.e. the 2 DWs dumped in the ISR handler contain valid data.
> > > But if the IOMMU is enabled (iommu=soft or force), I receive an IO page
> > > fault (sometimes even more, depending on the payload size) on every
> > > transfer, and the data is all zeros:
> > > 
> > > [   49.001605] IO_PAGE_FAULT device=00:00.0 domain=0x0000
> > > address=0x00000000ffbf8000 flags=0x0070]
> > > 
> > > Where the device ID corresponds to the Host bridge, and the address
> > > corresponds to the DMA handle I got from dma_alloc_coherent respectively.
> > 
> > I am no expert but i am guessing your FPGA set the request field in the
> > PCIE TLP write packet to 00:00.0 and this might work when IOMMU is off but
> > might not work when IOMMU is on ie when IOMMU is on your device should set
> > the request field to the FPGA PCIE id so that the IOMMU knows for which
> > device the PCIE write or read packet is and thus against which IOMMU page
> > table.
> > 
> > Cheers,
> > Jérôme
> 
> Hi Jérôme
> 
> Thank you very much for your response.
> 
> You hit the nail! That was exactly the root cause of the problem. The request 
> field was properly filled in for the Memory Read TLP, but not for the Memory 
> Write TLP, where it was all-zeroes.
> 
> If I may ask another question: Is it possible to remap a buffer for DMA which 
> was allocated by other means? For the second phase, we are going to use the 
> RTAI extension(*) which provides its own memory allocation routines (e.g. 
> rt_shm_alloc()). There, you may pass the flag USE_GFP_DMA to indicate that 
> this buffer should be suitable for DMA. I've tried to remap this memory area 
> using virt_to_phys() and use the resulting address for the DMA transfer from 
> the FPGA, getting other IO page faults. E.g.:
> 
> [   70.100140] IO_PAGE_FAULT device=01:00.0 domain=0x0001 
> address=0x0000000000080000 flags=0x0020]
> 
> It's remarkable that the logical addresses returned from dma_alloc_coherent 
> (e.g. ffbd8000) look quite different from those returned by rt_shm_alloc
> +virt_to_phys (e.g. 00080000).
> 
> Unfortunately, it does not seem possible to do that the other way round, i.e. 
> forcing RTAI to use the buffer from dma_alloc_coherent.

You can use pci_map_page() or dma_map_page(). First you must get the struct
page that corresponds to the virtual address (maybe with get_user_pages*(),
but I would advise against it as it comes with a long list of gotchas, and
there is no other alternative unless your device is advanced enough).

Once you have the page for the virtual address you can call either
dma_map_page() or pci_map_page(). I am sure you can find examples of their
usage within the kernel.

It is also documented somewhere in Documentation/.
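
Something along these lines might work (an untested sketch; it assumes the
RTAI buffer is ordinary lowmem kernel memory so that virt_to_page() is valid
for it -- check how rt_shm_alloc() really allocates before relying on that):

#include <linux/dma-mapping.h>
#include <linux/mm.h>

/*
 * Untested sketch: create a streaming DMA mapping for a buffer that was
 * allocated elsewhere (e.g. by rt_shm_alloc()).  Assumes buf is lowmem,
 * so virt_to_page() is valid for it.
 */
static dma_addr_t map_existing_buf(struct device *dev, void *buf, size_t size)
{
        dma_addr_t dma;

        dma = dma_map_page(dev, virt_to_page(buf), offset_in_page(buf),
                           size, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, dma)) {
                dev_err(dev, "cannot map buffer for DMA\n");
                return 0;       /* treat 0 as failure in this sketch */
        }

        /*
         * dma (not virt_to_phys(buf)) is the address to hand to the FPGA;
         * with a streaming mapping, use dma_sync_*_for_cpu()/for_device()
         * around accesses, and dma_unmap_page() at teardown.
         */
        return dma;
}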

Hope this helps.

Cheers,
Jérôme


* RE: IOMMU Page faults when running DMA transfers from PCIe device
  2019-04-16 15:33 ` Jerome Glisse
  2019-04-17 14:17   ` Patrick Brunner
@ 2019-04-18  9:37   ` David Laight
  2019-04-18 14:58     ` Jerome Glisse
  1 sibling, 1 reply; 6+ messages in thread
From: David Laight @ 2019-04-18  9:37 UTC (permalink / raw)
  To: 'Jerome Glisse', Patrick Brunner; +Cc: linux-kernel

From: Jerome Glisse
> Sent: 16 April 2019 16:33
...
> I am no expert but i am guessing your FPGA set the request field in the
> PCIE TLP write packet to 00:00.0 and this might work when IOMMU is off but
> might not work when IOMMU is on ie when IOMMU is on your device should set
> the request field to the FPGA PCIE id so that the IOMMU knows for which
> device the PCIE write or read packet is and thus against which IOMMU page
> table.

Interesting.
Does that mean that a malicious PCIe device can send write TLPs
that contain the 'wrong' id (IIRC that is bus:dev:fn) and so
write to areas that it shouldn't access?

For any degree of security the PCIe bridge nearest the target
needs to verify the id as well.
Actually all bridges need to verify the 'bus' part.
Then boards with 'dodgy' bridges can only write to locations
that other dev:fn on the same board can access.

	David




* Re: IOMMU Page faults when running DMA transfers from PCIe device
  2019-04-18  9:37   ` David Laight
@ 2019-04-18 14:58     ` Jerome Glisse
  0 siblings, 0 replies; 6+ messages in thread
From: Jerome Glisse @ 2019-04-18 14:58 UTC (permalink / raw)
  To: David Laight; +Cc: Patrick Brunner, linux-kernel

On Thu, Apr 18, 2019 at 09:37:58AM +0000, David Laight wrote:
> From: Jerome Glisse
> > Sent: 16 April 2019 16:33
> ...
> > I am no expert but i am guessing your FPGA set the request field in the
> > PCIE TLP write packet to 00:00.0 and this might work when IOMMU is off but
> > might not work when IOMMU is on ie when IOMMU is on your device should set
> > the request field to the FPGA PCIE id so that the IOMMU knows for which
> > device the PCIE write or read packet is and thus against which IOMMU page
> > table.
> 
> Interesting.
> Does that mean that a malicious PCIe device can send write TLP
> that contain the 'wrong' id (IIRC that is bus:dev:fn) and so
> write to areas that it shouldn't access?

Yes it does; there are a bunch of papers on that, look for "IOMMU DMA
attack".

> 
> For any degree of security the PCIe bridge nearest the target
> needs to verify the id as well.
> Actually all bridges need to verify the 'bus' part.
> Then boards with 'dodgy' bridges can only write to locations
> that other dev:fn on the same board can access.

Yes, they should, but it has a cost, and AFAIK no bridge, not even
the root port, does that. PCIe bandwidth is high, which means a lot
of packets can go through a PCIe switch or bridge, and I believe such
PCIe packet inspection has been considered too costly. After all, if
someone can plug a rogue device into your computer (laptops aside),
they can do more harm with easier methods. FPGA accelerators exposed
as PCIe devices might open a door for clever and _resourceful_ people
to try to use them as a remote attack vector.

Cheers,
Jérôme


end of thread, other threads:[~2019-04-18 14:58 UTC | newest]

Thread overview: 6+ messages
2019-04-15 16:04 IOMMU Page faults when running DMA transfers from PCIe device Patrick Brunner
2019-04-16 15:33 ` Jerome Glisse
2019-04-17 14:17   ` Patrick Brunner
2019-04-17 14:37     ` Jerome Glisse
2019-04-18  9:37   ` David Laight
2019-04-18 14:58     ` Jerome Glisse
