All of lore.kernel.org
 help / color / mirror / Atom feed
From: Robin Murphy <robin.murphy@arm.com>
To: Nikita Yushchenko <nikita.yoush@cogentembedded.com>,
	Arnd Bergmann <arnd@arndb.de>
Cc: Will Deacon <will.deacon@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-kernel@vger.kernel.org, linux-renesas-soc@vger.kernel.org,
	Simon Horman <horms@verge.net.au>,
	Bjorn Helgaas <bhelgaas@google.com>,
	artemi.ivanov@cogentembedded.com, fkan@apm.com,
	Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v2] arm64: do not set dma masks that device connection can't handle
Date: Wed, 11 Jan 2017 18:28:15 +0000	[thread overview]
Message-ID: <b6648c47-89ef-d9d6-72bb-91b49bab8f96@arm.com> (raw)
In-Reply-To: <7c6a1523-e41b-ad83-501a-27c260b9f9ee@cogentembedded.com>

On 11/01/17 12:37, Nikita Yushchenko wrote:
>> I actually have a third variation of this problem involving a PCI root
>> complex which *could* drive full-width (40-bit) addresses, but won't,
>> due to the way its PCI<->AXI interface is programmed. That would require
>> even more complicated dma-ranges handling to describe the windows of
>> valid physical addresses which it *will* pass, so I'm not pressing the
>> issue - let's just get the basic DMA mask case fixed first.
> 
> R-Car + NVMe is actually not "basic case".

I meant "basic" in terms of what needs to be done in Linux - simply
preventing device drivers from overwriting the DT-configured DMA mask
will make everything work as well as well as it possibly can on R-Car,
both with or without the IOMMU, since apparently all you need is to
ensure a PCI device never gets given a DMA address above 4GB. The
situation where PCI devices *can* DMA to all of physical memory, but
can't use arbitrary addresses *outside* it - which only becomes a
problem with an IOMMU - is considerably trickier.

> It has PCI<->AXI interface involved.
> PCI addresses are 64-bit and controller does handle 64-bit addresses
> there. Mapping between PCI addresses and AXI addresses is defined. But
> AXI is 32-bit.
> 
> SoC has iommu that probably could be used between PCIe module and RAM.
> Although AFAIK nobody made that working yet.
> 
> Board I work with has 4G of RAM, in 4 banks, located at different parts
> of wide address space, and only one of them is below 4G. But if iommu is
> capable of translating addresses such that 4 gigabyte banks map to first
> 4 gigabytes of address space, then all memory will become available for
> DMA from PCIe device.

The aforementioned situation on Juno is similar yet different - the PLDA
XR3 root complex uses an address-based lookup table to translate
outgoing PCI memory space transactions to AXI bus addresses with the
appropriate attributes, in power-of-two-sized regions. The firmware
configures 3 LUT entries - 2GB at 0x8000_0000 and 8GB at 0x8_8000_0000
with cache-coherent attributes to cover the DRAM areas, plus a small one
with device attributes covering the GICv2m MSI frame. The issue is that
there is no "no match" translation, so any transaction not within one of
those regions never makes it out of the root complex at all.

That's fine in normal operation, as there's nothing outside those
regions in the physical memory map a PCI device should be accessing
anyway, but turning on the SMMU is another matter - since the IOVA
allocator runs top-down, a PCI device with a 64-bit DMA mask will do a
dma_map or dma_alloc, get the physical page mapped to an IOVA up around
FF_FFFF_F000 (the SMMU will constrain things to the system bus width of
40 bits), then try to access that address and get a termination straight
back from the RC. Similarly, A KVM guest which wants to place its memory
at arbitrary locations and expect device passthrough to work is going to
have a bad time.

I don't know if it's feasible to have the firmware set the LUT up
differently, as that might lead to other problems when not using the
SMMU, and/or just require far more than the 8 available LUT entries
(assuming they have to be non-overlapping - I'm not 100% sure and
documentation is sparse). Thus it seems appropriate to describe the
currently valid PCI-AXI translations with dma-ranges, but then we'd have
multiple entries - last time I looked Linux simply ignores all but the
last one in that case - which can't be combined into a simple bitmask,
so I'm not entirely sure where to go from there. Especially as so far it
seems to be a problem exclusive to one not-widely-available ageing
early-access development platform...

It happens that limiting all PCI DMA masks to 32 bits would bodge around
this problem thanks to the current IOVA allocator behaviour, but that's
pretty yuck, and would force unnecessary bouncing for the non-SMMU case.
My other hack to carve up IOVA domains to reserve all addresses not
matching memblocks is hardly any more realistic, hence why the SMMU is
in the Juno DT in a change-it-at-your-own-peril "disabled" state ;)

Robin.

WARNING: multiple messages have this Message-ID (diff)
From: robin.murphy@arm.com (Robin Murphy)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v2] arm64: do not set dma masks that device connection can't handle
Date: Wed, 11 Jan 2017 18:28:15 +0000	[thread overview]
Message-ID: <b6648c47-89ef-d9d6-72bb-91b49bab8f96@arm.com> (raw)
In-Reply-To: <7c6a1523-e41b-ad83-501a-27c260b9f9ee@cogentembedded.com>

On 11/01/17 12:37, Nikita Yushchenko wrote:
>> I actually have a third variation of this problem involving a PCI root
>> complex which *could* drive full-width (40-bit) addresses, but won't,
>> due to the way its PCI<->AXI interface is programmed. That would require
>> even more complicated dma-ranges handling to describe the windows of
>> valid physical addresses which it *will* pass, so I'm not pressing the
>> issue - let's just get the basic DMA mask case fixed first.
> 
> R-Car + NVMe is actually not "basic case".

I meant "basic" in terms of what needs to be done in Linux - simply
preventing device drivers from overwriting the DT-configured DMA mask
will make everything work as well as well as it possibly can on R-Car,
both with or without the IOMMU, since apparently all you need is to
ensure a PCI device never gets given a DMA address above 4GB. The
situation where PCI devices *can* DMA to all of physical memory, but
can't use arbitrary addresses *outside* it - which only becomes a
problem with an IOMMU - is considerably trickier.

> It has PCI<->AXI interface involved.
> PCI addresses are 64-bit and controller does handle 64-bit addresses
> there. Mapping between PCI addresses and AXI addresses is defined. But
> AXI is 32-bit.
> 
> SoC has iommu that probably could be used between PCIe module and RAM.
> Although AFAIK nobody made that working yet.
> 
> Board I work with has 4G of RAM, in 4 banks, located at different parts
> of wide address space, and only one of them is below 4G. But if iommu is
> capable of translating addresses such that 4 gigabyte banks map to first
> 4 gigabytes of address space, then all memory will become available for
> DMA from PCIe device.

The aforementioned situation on Juno is similar yet different - the PLDA
XR3 root complex uses an address-based lookup table to translate
outgoing PCI memory space transactions to AXI bus addresses with the
appropriate attributes, in power-of-two-sized regions. The firmware
configures 3 LUT entries - 2GB at 0x8000_0000 and 8GB at 0x8_8000_0000
with cache-coherent attributes to cover the DRAM areas, plus a small one
with device attributes covering the GICv2m MSI frame. The issue is that
there is no "no match" translation, so any transaction not within one of
those regions never makes it out of the root complex at all.

That's fine in normal operation, as there's nothing outside those
regions in the physical memory map a PCI device should be accessing
anyway, but turning on the SMMU is another matter - since the IOVA
allocator runs top-down, a PCI device with a 64-bit DMA mask will do a
dma_map or dma_alloc, get the physical page mapped to an IOVA up around
FF_FFFF_F000 (the SMMU will constrain things to the system bus width of
40 bits), then try to access that address and get a termination straight
back from the RC. Similarly, A KVM guest which wants to place its memory
at arbitrary locations and expect device passthrough to work is going to
have a bad time.

I don't know if it's feasible to have the firmware set the LUT up
differently, as that might lead to other problems when not using the
SMMU, and/or just require far more than the 8 available LUT entries
(assuming they have to be non-overlapping - I'm not 100% sure and
documentation is sparse). Thus it seems appropriate to describe the
currently valid PCI-AXI translations with dma-ranges, but then we'd have
multiple entries - last time I looked Linux simply ignores all but the
last one in that case - which can't be combined into a simple bitmask,
so I'm not entirely sure where to go from there. Especially as so far it
seems to be a problem exclusive to one not-widely-available ageing
early-access development platform...

It happens that limiting all PCI DMA masks to 32 bits would bodge around
this problem thanks to the current IOVA allocator behaviour, but that's
pretty yuck, and would force unnecessary bouncing for the non-SMMU case.
My other hack to carve up IOVA domains to reserve all addresses not
matching memblocks is hardly any more realistic, hence why the SMMU is
in the Juno DT in a change-it-at-your-own-peril "disabled" state ;)

Robin.

  parent reply	other threads:[~2017-01-11 18:28 UTC|newest]

Thread overview: 75+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-09  7:30 [PATCH v2] arm64: do not set dma masks that device connection can't handle Nikita Yushchenko
2017-01-09  7:30 ` Nikita Yushchenko
2017-01-10 11:51 ` Will Deacon
2017-01-10 11:51   ` Will Deacon
2017-01-10 12:47   ` Nikita Yushchenko
2017-01-10 12:47     ` Nikita Yushchenko
2017-01-10 13:12     ` Arnd Bergmann
2017-01-10 13:12       ` Arnd Bergmann
2017-01-10 13:25     ` Robin Murphy
2017-01-10 13:25       ` Robin Murphy
2017-01-10 13:42       ` Arnd Bergmann
2017-01-10 13:42         ` Arnd Bergmann
2017-01-10 14:16         ` Robin Murphy
2017-01-10 14:16           ` Robin Murphy
2017-01-10 15:06           ` Arnd Bergmann
2017-01-10 15:06             ` Arnd Bergmann
2017-01-11 12:37           ` Nikita Yushchenko
2017-01-11 12:37             ` Nikita Yushchenko
2017-01-11 16:21             ` Arnd Bergmann
2017-01-11 16:21               ` Arnd Bergmann
2017-01-11 18:28             ` Robin Murphy [this message]
2017-01-11 18:28               ` Robin Murphy
2017-01-10 14:59         ` Christoph Hellwig
2017-01-10 14:59           ` Christoph Hellwig
2017-01-10 14:00       ` [PATCH] arm64: avoid increasing DMA masks above what hardware supports Nikita Yushchenko
2017-01-10 14:00         ` Nikita Yushchenko
2017-01-10 17:14         ` Robin Murphy
2017-01-10 17:14           ` Robin Murphy
2017-01-11  7:59           ` Nikita Yushchenko
2017-01-11  7:59             ` Nikita Yushchenko
2017-01-11 11:54             ` Robin Murphy
2017-01-11 11:54               ` Robin Murphy
2017-01-11 13:41               ` Nikita Yushchenko
2017-01-11 13:41                 ` Nikita Yushchenko
2017-01-11 14:50                 ` Robin Murphy
2017-01-11 14:50                   ` Robin Murphy
2017-01-11 16:03                   ` Nikita Yushchenko
2017-01-11 16:50                     ` Robin Murphy
2017-01-11 16:50                       ` Robin Murphy
2017-01-11 18:31           ` [PATCH 0/2] arm64: fix handling of DMA masks wider than bus supports Nikita Yushchenko
2017-01-11 18:31             ` Nikita Yushchenko
2017-01-11 18:31             ` [PATCH 1/2] dma-mapping: let arch know origin of dma range passed to arch_setup_dma_ops() Nikita Yushchenko
2017-01-11 18:31               ` Nikita Yushchenko
2017-01-11 21:08               ` Arnd Bergmann
2017-01-11 21:08                 ` Arnd Bergmann
2017-01-12  5:52                 ` Nikita Yushchenko
2017-01-12  5:52                   ` Nikita Yushchenko
2017-01-12  6:33                   ` Nikita Yushchenko
2017-01-12  6:33                     ` Nikita Yushchenko
2017-01-12 13:28                     ` Arnd Bergmann
2017-01-12 13:28                       ` Arnd Bergmann
2017-01-12 13:39                       ` Nikita Yushchenko
2017-01-12 13:39                         ` Nikita Yushchenko
2017-01-12 12:16                   ` Will Deacon
2017-01-12 12:16                     ` Will Deacon
2017-01-12 13:25                     ` Arnd Bergmann
2017-01-12 13:25                       ` Arnd Bergmann
2017-01-12 13:43                       ` Robin Murphy
2017-01-12 13:43                         ` Robin Murphy
2017-01-13 10:40               ` kbuild test robot
2017-01-13 10:40                 ` kbuild test robot
2017-01-11 18:31             ` [PATCH 2/2] arm64: avoid increasing DMA masks above what hardware supports Nikita Yushchenko
2017-01-11 18:31               ` Nikita Yushchenko
2017-01-11 21:11               ` Arnd Bergmann
2017-01-11 21:11                 ` Arnd Bergmann
2017-01-12  5:53                 ` Nikita Yushchenko
2017-01-12  5:53                   ` Nikita Yushchenko
2017-01-13 10:16               ` kbuild test robot
2017-01-13 10:16                 ` kbuild test robot
2017-01-10 14:01       ` [PATCH v2] arm64: do not set dma masks that device connection can't handle Nikita Yushchenko
2017-01-10 14:01         ` Nikita Yushchenko
2017-01-10 14:57       ` Christoph Hellwig
2017-01-10 14:57         ` Christoph Hellwig
2017-01-10 14:51     ` Christoph Hellwig
2017-01-10 14:51       ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b6648c47-89ef-d9d6-72bb-91b49bab8f96@arm.com \
    --to=robin.murphy@arm.com \
    --cc=arnd@arndb.de \
    --cc=artemi.ivanov@cogentembedded.com \
    --cc=bhelgaas@google.com \
    --cc=catalin.marinas@arm.com \
    --cc=fkan@apm.com \
    --cc=hch@lst.de \
    --cc=horms@verge.net.au \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-renesas-soc@vger.kernel.org \
    --cc=nikita.yoush@cogentembedded.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.