All of lore.kernel.org
 help / color / mirror / Atom feed
* DMABOUNCE in pci-rcar
@ 2014-02-24 11:00 ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-24 11:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Magnus,

I noticed during randconfig testing that you enabled DMABOUNCE for the
pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30

I didn't see the original post unfortunately, but I fear we have to
revert it and come up with a better solution, as your approach seems
wrong on a number of levels:

* We really don't want any new users of DMABOUNCE on ARM but instead leave
  it to the PXA/IXP/SA1100 based platforms using it today.

* If your SoCs have an IOMMU, you should really use it, as that would
  give you much better performance than bounce buffers anyway

* If the hardware is so screwed up that you have to use bounce buffers,
  use the SWIOTLB code that is reasonably modern.

* The window base and size in the driver should not be hardcoded, as
  this is likely not a device specific property but rather an artifact of
  how it's connected. Use the standard "dma-ranges" property.

* You should not need your own dma_map_ops in the driver. What you
  do there isn't really specific to your host but should live in some
  place where it can be shared with other drivers.

* The implementation of the dma_map_ops is wrong: at the very least,
  you cannot use the dma_mask of the pci host when translating
  calls from the device, since each pci device can have a different
  dma mask that is set by the driver if it needs larger or smaller
  than 32-bit DMA space.

* The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
  this is how I noticed your changes.

* The block layer assumes that it will bounce any highmem pages,
  but that isn't necessarily what you want here: if you have 2GB
  of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
  lowmem pages that need bouncing.

* (completely unrelated, I noticed you set up a bogus I/O space
  window. Don't do that, you will get get a panic if a device happens
  to use it. Not providing an resource should work fine though)

On the bright side, your other changes in the same series all look
good. Thanks especially for sorting out the probe() function, I was
already wondering if we could change that the way you did.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-24 11:00 ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-24 11:00 UTC (permalink / raw)
  To: Magnus Damm, linux-pci, Russell King, Simon Horman
  Cc: linux-kernel, Bjorn Helgaas, linux-arm-kernel, linux-sh, Ben Dooks

Hi Magnus,

I noticed during randconfig testing that you enabled DMABOUNCE for the
pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30

I didn't see the original post unfortunately, but I fear we have to
revert it and come up with a better solution, as your approach seems
wrong on a number of levels:

* We really don't want any new users of DMABOUNCE on ARM but instead leave
  it to the PXA/IXP/SA1100 based platforms using it today.

* If your SoCs have an IOMMU, you should really use it, as that would
  give you much better performance than bounce buffers anyway

* If the hardware is so screwed up that you have to use bounce buffers,
  use the SWIOTLB code that is reasonably modern.

* The window base and size in the driver should not be hardcoded, as
  this is likely not a device specific property but rather an artifact of
  how it's connected. Use the standard "dma-ranges" property.

* You should not need your own dma_map_ops in the driver. What you
  do there isn't really specific to your host but should live in some
  place where it can be shared with other drivers.

* The implementation of the dma_map_ops is wrong: at the very least,
  you cannot use the dma_mask of the pci host when translating
  calls from the device, since each pci device can have a different
  dma mask that is set by the driver if it needs larger or smaller
  than 32-bit DMA space.

* The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
  this is how I noticed your changes.

* The block layer assumes that it will bounce any highmem pages,
  but that isn't necessarily what you want here: if you have 2GB
  of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
  lowmem pages that need bouncing.

* (completely unrelated, I noticed you set up a bogus I/O space
  window. Don't do that, you will get get a panic if a device happens
  to use it. Not providing an resource should work fine though)

On the bright side, your other changes in the same series all look
good. Thanks especially for sorting out the probe() function, I was
already wondering if we could change that the way you did.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-24 11:00 ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-24 11:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Magnus,

I noticed during randconfig testing that you enabled DMABOUNCE for the
pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30

I didn't see the original post unfortunately, but I fear we have to
revert it and come up with a better solution, as your approach seems
wrong on a number of levels:

* We really don't want any new users of DMABOUNCE on ARM but instead leave
  it to the PXA/IXP/SA1100 based platforms using it today.

* If your SoCs have an IOMMU, you should really use it, as that would
  give you much better performance than bounce buffers anyway

* If the hardware is so screwed up that you have to use bounce buffers,
  use the SWIOTLB code that is reasonably modern.

* The window base and size in the driver should not be hardcoded, as
  this is likely not a device specific property but rather an artifact of
  how it's connected. Use the standard "dma-ranges" property.

* You should not need your own dma_map_ops in the driver. What you
  do there isn't really specific to your host but should live in some
  place where it can be shared with other drivers.

* The implementation of the dma_map_ops is wrong: at the very least,
  you cannot use the dma_mask of the pci host when translating
  calls from the device, since each pci device can have a different
  dma mask that is set by the driver if it needs larger or smaller
  than 32-bit DMA space.

* The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
  this is how I noticed your changes.

* The block layer assumes that it will bounce any highmem pages,
  but that isn't necessarily what you want here: if you have 2GB
  of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
  lowmem pages that need bouncing.

* (completely unrelated, I noticed you set up a bogus I/O space
  window. Don't do that, you will get get a panic if a device happens
  to use it. Not providing an resource should work fine though)

On the bright side, your other changes in the same series all look
good. Thanks especially for sorting out the probe() function, I was
already wondering if we could change that the way you did.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-02-24 11:00 ` Arnd Bergmann
  (?)
  (?)
@ 2014-02-24 23:49   ` Magnus Damm
  -1 siblings, 0 replies; 78+ messages in thread
From: Magnus Damm @ 2014-02-24 23:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> Hi Magnus,
>
> I noticed during randconfig testing that you enabled DMABOUNCE for the
> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30

Hi Arnd,

The patch that you are referring to has been updated a few times, but
I believe your comments are still valid for the series
[PATCH v2 00/08] PCI: rcar: Recent driver patches from Ben Dooks and me (V2)

> I didn't see the original post unfortunately, but I fear we have to
> revert it and come up with a better solution, as your approach seems
> wrong on a number of levels:
>
> * We really don't want any new users of DMABOUNCE on ARM but instead leave
>   it to the PXA/IXP/SA1100 based platforms using it today.
>
> * If your SoCs have an IOMMU, you should really use it, as that would
>   give you much better performance than bounce buffers anyway
>
> * If the hardware is so screwed up that you have to use bounce buffers,
>   use the SWIOTLB code that is reasonably modern.

From my point of view we need some kind of bounce buffer unless we
have IOMMU support. I understand that an IOMMU would be much better
than a software-based implementation. If it is possible to use an
IOMMU with these devices remain to be seen.

I didn't know about the SWIOTLB code, neither did I know that
DMABOUNCE was supposed to be avoided. Now I do!

I do realize that my following patches madly mix potential bus code
and actual device support, however..

[PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

.. without my patches the driver does not handle CONFIG_BOUNCE and
CONFIG_VMSPLIT_2G.

> * The window base and size in the driver should not be hardcoded, as
>   this is likely not a device specific property but rather an artifact of
>   how it's connected. Use the standard "dma-ranges" property.

I'm afraid that I may not understand your concern fully here. From my
view the existing driver (without my patch) is having hard coded
board-specific memory base and a hard coded size in it, and with my
patches I try to rework that. Do you have any existing example code to
recommend me?

Regarding what the device can do and/or how it is connected - here is
some hardware information:

1) 3 on-chip PCI bridges per SoC with USB EHCI/OHCI controllers on
each of them (PCI bridge code is also shared between multiple SoCs and
of course multiple boards).

2) The systems have 40-bit physical address space and the CPU can
address this when LPAE is enabled.

3) System RAM is available in two banks - one bank in the lowest
32-bits (top 8 bits set to 0) and another bank in higher space.

4) The PCI bridge has a 32-bit base address for the windows with
alignment requirement (needs to be evenly aligned based on size)

5) Each PCI bridge instance has two windows, but supported size
differs. One window supports up to 2 GiB, another 256MiB.

Without IOMMU available I came to the conclusion that I need both
BOUNCE and DMABOUNCE to support the above hardware.

> * You should not need your own dma_map_ops in the driver. What you
>   do there isn't really specific to your host but should live in some
>   place where it can be shared with other drivers.

I think this boils down to the less-than-32-bit bus master capability
and also the poor match to a DMA zone. Consider 24-bit ISA DMA address
space vs 31-bit DMA space on a 32-bit system. I may be wrong, but a
DMA zone in my mind is something suitable for the classic ISA DMA, not
so much for modern complex systems with multiple devices that come
with different bus master capabilities.

That said, of course it makes sense to share code whenever possible.
Can you explain a bit more what kind of code you would like to have
broken out?

> * The implementation of the dma_map_ops is wrong: at the very least,
>   you cannot use the dma_mask of the pci host when translating
>   calls from the device, since each pci device can have a different
>   dma mask that is set by the driver if it needs larger or smaller
>   than 32-bit DMA space.

The current version of the driver (with or without my patch) seems to
leave all dma mask handling left out. I understand that this is poor
programming practise and I would like to see that fixed. As you
noticed, I did not try to fix this issue in my patch.

It may be worth noticing that the PCI devices hanging off the PCI
bridge instances are all on-chip fixed to OHCI and EHCI and the
drivers for these USB host controllers do not seem to try to set the
dma mask as it is today. So we are talking about a fixed set of PCI
devices here and not external PCI bus with general purpose PCI
drivers. I'm not sure if that makes things much better though..

So yes, I'd like the PCI bridge driver to be fixed with proper dma
mask handling. The question is just what that mask is supposed to
represent on a system where we a) may have IOMMU (and if so we can
address 40 bits), and if not we b) are limited to 31 bits but we also
have a non-zero base address. I'm not sure how to represent that
information with a single dma mask. Any suggestions?

> * The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
>   this is how I noticed your changes.

Do you mean that the following patch is causing some kind of build error?
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

If possible, can you please let me know which patches that you want me
to rework?

> * The block layer assumes that it will bounce any highmem pages,
>   but that isn't necessarily what you want here: if you have 2GB
>   of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
>   lowmem pages that need bouncing.

Good to hear that you also came to the conclusion that the two cases
need to be handled separately. =)

The lowmem bounce case (CONFIG_VMSPLIT_2G) is implemented in:
[PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support

The block layer bounce is also enabled in:
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

> * (completely unrelated, I noticed you set up a bogus I/O space
>   window. Don't do that, you will get get a panic if a device happens
>   to use it. Not providing an resource should work fine though)

Right. To be honest, I'm not sure why the original author implemented
this. Similar to the dma mask bits this is something that I would like
to fix, but it has been out of scope for my patches so far.

> On the bright side, your other changes in the same series all look
> good. Thanks especially for sorting out the probe() function, I was
> already wondering if we could change that the way you did.

Thanks. Ideally I'd like to support bind and unbind on the bridge too
(CARDBUS can hotplug PCI, so should we!), but I suppose that sorting
out the bounce bits is more important.

Also, the SWIOTLB code, are you aware of any existing ARM users? I
also wonder if the code can work with multiple devices - the bits in
lib/swiotlb.c looks like single instance with global system wide
support only - perhaps it needs rework to support 3 PCI bridges?

Thanks for your help, I appreciate your feedback.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-24 23:49   ` Magnus Damm
  0 siblings, 0 replies; 78+ messages in thread
From: Magnus Damm @ 2014-02-24 23:49 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Magnus Damm, linux-pci, Russell King, Simon Horman, linux-kernel,
	Bjorn Helgaas, linux-arm-kernel, SH-Linux, Ben Dooks

On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> Hi Magnus,
>
> I noticed during randconfig testing that you enabled DMABOUNCE for the
> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30

Hi Arnd,

The patch that you are referring to has been updated a few times, but
I believe your comments are still valid for the series
[PATCH v2 00/08] PCI: rcar: Recent driver patches from Ben Dooks and me (V2)

> I didn't see the original post unfortunately, but I fear we have to
> revert it and come up with a better solution, as your approach seems
> wrong on a number of levels:
>
> * We really don't want any new users of DMABOUNCE on ARM but instead leave
>   it to the PXA/IXP/SA1100 based platforms using it today.
>
> * If your SoCs have an IOMMU, you should really use it, as that would
>   give you much better performance than bounce buffers anyway
>
> * If the hardware is so screwed up that you have to use bounce buffers,
>   use the SWIOTLB code that is reasonably modern.

>From my point of view we need some kind of bounce buffer unless we
have IOMMU support. I understand that an IOMMU would be much better
than a software-based implementation. If it is possible to use an
IOMMU with these devices remain to be seen.

I didn't know about the SWIOTLB code, neither did I know that
DMABOUNCE was supposed to be avoided. Now I do!

I do realize that my following patches madly mix potential bus code
and actual device support, however..

[PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

.. without my patches the driver does not handle CONFIG_BOUNCE and
CONFIG_VMSPLIT_2G.

> * The window base and size in the driver should not be hardcoded, as
>   this is likely not a device specific property but rather an artifact of
>   how it's connected. Use the standard "dma-ranges" property.

I'm afraid that I may not understand your concern fully here. From my
view the existing driver (without my patch) is having hard coded
board-specific memory base and a hard coded size in it, and with my
patches I try to rework that. Do you have any existing example code to
recommend me?

Regarding what the device can do and/or how it is connected - here is
some hardware information:

1) 3 on-chip PCI bridges per SoC with USB EHCI/OHCI controllers on
each of them (PCI bridge code is also shared between multiple SoCs and
of course multiple boards).

2) The systems have 40-bit physical address space and the CPU can
address this when LPAE is enabled.

3) System RAM is available in two banks - one bank in the lowest
32-bits (top 8 bits set to 0) and another bank in higher space.

4) The PCI bridge has a 32-bit base address for the windows with
alignment requirement (needs to be evenly aligned based on size)

5) Each PCI bridge instance has two windows, but supported size
differs. One window supports up to 2 GiB, another 256MiB.

Without IOMMU available I came to the conclusion that I need both
BOUNCE and DMABOUNCE to support the above hardware.

> * You should not need your own dma_map_ops in the driver. What you
>   do there isn't really specific to your host but should live in some
>   place where it can be shared with other drivers.

I think this boils down to the less-than-32-bit bus master capability
and also the poor match to a DMA zone. Consider 24-bit ISA DMA address
space vs 31-bit DMA space on a 32-bit system. I may be wrong, but a
DMA zone in my mind is something suitable for the classic ISA DMA, not
so much for modern complex systems with multiple devices that come
with different bus master capabilities.

That said, of course it makes sense to share code whenever possible.
Can you explain a bit more what kind of code you would like to have
broken out?

> * The implementation of the dma_map_ops is wrong: at the very least,
>   you cannot use the dma_mask of the pci host when translating
>   calls from the device, since each pci device can have a different
>   dma mask that is set by the driver if it needs larger or smaller
>   than 32-bit DMA space.

The current version of the driver (with or without my patch) seems to
leave all dma mask handling left out. I understand that this is poor
programming practise and I would like to see that fixed. As you
noticed, I did not try to fix this issue in my patch.

It may be worth noticing that the PCI devices hanging off the PCI
bridge instances are all on-chip fixed to OHCI and EHCI and the
drivers for these USB host controllers do not seem to try to set the
dma mask as it is today. So we are talking about a fixed set of PCI
devices here and not external PCI bus with general purpose PCI
drivers. I'm not sure if that makes things much better though..

So yes, I'd like the PCI bridge driver to be fixed with proper dma
mask handling. The question is just what that mask is supposed to
represent on a system where we a) may have IOMMU (and if so we can
address 40 bits), and if not we b) are limited to 31 bits but we also
have a non-zero base address. I'm not sure how to represent that
information with a single dma mask. Any suggestions?

> * The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
>   this is how I noticed your changes.

Do you mean that the following patch is causing some kind of build error?
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

If possible, can you please let me know which patches that you want me
to rework?

> * The block layer assumes that it will bounce any highmem pages,
>   but that isn't necessarily what you want here: if you have 2GB
>   of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
>   lowmem pages that need bouncing.

Good to hear that you also came to the conclusion that the two cases
need to be handled separately. =)

The lowmem bounce case (CONFIG_VMSPLIT_2G) is implemented in:
[PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support

The block layer bounce is also enabled in:
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

> * (completely unrelated, I noticed you set up a bogus I/O space
>   window. Don't do that, you will get get a panic if a device happens
>   to use it. Not providing an resource should work fine though)

Right. To be honest, I'm not sure why the original author implemented
this. Similar to the dma mask bits this is something that I would like
to fix, but it has been out of scope for my patches so far.

> On the bright side, your other changes in the same series all look
> good. Thanks especially for sorting out the probe() function, I was
> already wondering if we could change that the way you did.

Thanks. Ideally I'd like to support bind and unbind on the bridge too
(CARDBUS can hotplug PCI, so should we!), but I suppose that sorting
out the bounce bits is more important.

Also, the SWIOTLB code, are you aware of any existing ARM users? I
also wonder if the code can work with multiple devices - the bits in
lib/swiotlb.c looks like single instance with global system wide
support only - perhaps it needs rework to support 3 PCI bridges?

Thanks for your help, I appreciate your feedback.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-24 23:49   ` Magnus Damm
  0 siblings, 0 replies; 78+ messages in thread
From: Magnus Damm @ 2014-02-24 23:49 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Magnus Damm, linux-pci, Russell King, Simon Horman, linux-kernel,
	Bjorn Helgaas, linux-arm-kernel, SH-Linux, Ben Dooks

On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> Hi Magnus,
>
> I noticed during randconfig testing that you enabled DMABOUNCE for the
> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30

Hi Arnd,

The patch that you are referring to has been updated a few times, but
I believe your comments are still valid for the series
[PATCH v2 00/08] PCI: rcar: Recent driver patches from Ben Dooks and me (V2)

> I didn't see the original post unfortunately, but I fear we have to
> revert it and come up with a better solution, as your approach seems
> wrong on a number of levels:
>
> * We really don't want any new users of DMABOUNCE on ARM but instead leave
>   it to the PXA/IXP/SA1100 based platforms using it today.
>
> * If your SoCs have an IOMMU, you should really use it, as that would
>   give you much better performance than bounce buffers anyway
>
> * If the hardware is so screwed up that you have to use bounce buffers,
>   use the SWIOTLB code that is reasonably modern.

>From my point of view we need some kind of bounce buffer unless we
have IOMMU support. I understand that an IOMMU would be much better
than a software-based implementation. If it is possible to use an
IOMMU with these devices remain to be seen.

I didn't know about the SWIOTLB code, neither did I know that
DMABOUNCE was supposed to be avoided. Now I do!

I do realize that my following patches madly mix potential bus code
and actual device support, however..

[PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

.. without my patches the driver does not handle CONFIG_BOUNCE and
CONFIG_VMSPLIT_2G.

> * The window base and size in the driver should not be hardcoded, as
>   this is likely not a device specific property but rather an artifact of
>   how it's connected. Use the standard "dma-ranges" property.

I'm afraid that I may not understand your concern fully here. From my
view the existing driver (without my patch) is having hard coded
board-specific memory base and a hard coded size in it, and with my
patches I try to rework that. Do you have any existing example code to
recommend me?

Regarding what the device can do and/or how it is connected - here is
some hardware information:

1) 3 on-chip PCI bridges per SoC with USB EHCI/OHCI controllers on
each of them (PCI bridge code is also shared between multiple SoCs and
of course multiple boards).

2) The systems have 40-bit physical address space and the CPU can
address this when LPAE is enabled.

3) System RAM is available in two banks - one bank in the lowest
32-bits (top 8 bits set to 0) and another bank in higher space.

4) The PCI bridge has a 32-bit base address for the windows with
alignment requirement (needs to be evenly aligned based on size)

5) Each PCI bridge instance has two windows, but supported size
differs. One window supports up to 2 GiB, another 256MiB.

Without IOMMU available I came to the conclusion that I need both
BOUNCE and DMABOUNCE to support the above hardware.

> * You should not need your own dma_map_ops in the driver. What you
>   do there isn't really specific to your host but should live in some
>   place where it can be shared with other drivers.

I think this boils down to the less-than-32-bit bus master capability
and also the poor match to a DMA zone. Consider 24-bit ISA DMA address
space vs 31-bit DMA space on a 32-bit system. I may be wrong, but a
DMA zone in my mind is something suitable for the classic ISA DMA, not
so much for modern complex systems with multiple devices that come
with different bus master capabilities.

That said, of course it makes sense to share code whenever possible.
Can you explain a bit more what kind of code you would like to have
broken out?

> * The implementation of the dma_map_ops is wrong: at the very least,
>   you cannot use the dma_mask of the pci host when translating
>   calls from the device, since each pci device can have a different
>   dma mask that is set by the driver if it needs larger or smaller
>   than 32-bit DMA space.

The current version of the driver (with or without my patch) seems to
leave all dma mask handling left out. I understand that this is poor
programming practise and I would like to see that fixed. As you
noticed, I did not try to fix this issue in my patch.

It may be worth noticing that the PCI devices hanging off the PCI
bridge instances are all on-chip fixed to OHCI and EHCI and the
drivers for these USB host controllers do not seem to try to set the
dma mask as it is today. So we are talking about a fixed set of PCI
devices here and not external PCI bus with general purpose PCI
drivers. I'm not sure if that makes things much better though..

So yes, I'd like the PCI bridge driver to be fixed with proper dma
mask handling. The question is just what that mask is supposed to
represent on a system where we a) may have IOMMU (and if so we can
address 40 bits), and if not we b) are limited to 31 bits but we also
have a non-zero base address. I'm not sure how to represent that
information with a single dma mask. Any suggestions?

> * The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
>   this is how I noticed your changes.

Do you mean that the following patch is causing some kind of build error?
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

If possible, can you please let me know which patches that you want me
to rework?

> * The block layer assumes that it will bounce any highmem pages,
>   but that isn't necessarily what you want here: if you have 2GB
>   of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
>   lowmem pages that need bouncing.

Good to hear that you also came to the conclusion that the two cases
need to be handled separately. =)

The lowmem bounce case (CONFIG_VMSPLIT_2G) is implemented in:
[PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support

The block layer bounce is also enabled in:
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

> * (completely unrelated, I noticed you set up a bogus I/O space
>   window. Don't do that, you will get get a panic if a device happens
>   to use it. Not providing an resource should work fine though)

Right. To be honest, I'm not sure why the original author implemented
this. Similar to the dma mask bits this is something that I would like
to fix, but it has been out of scope for my patches so far.

> On the bright side, your other changes in the same series all look
> good. Thanks especially for sorting out the probe() function, I was
> already wondering if we could change that the way you did.

Thanks. Ideally I'd like to support bind and unbind on the bridge too
(CARDBUS can hotplug PCI, so should we!), but I suppose that sorting
out the bounce bits is more important.

Also, the SWIOTLB code, are you aware of any existing ARM users? I
also wonder if the code can work with multiple devices - the bits in
lib/swiotlb.c looks like single instance with global system wide
support only - perhaps it needs rework to support 3 PCI bridges?

Thanks for your help, I appreciate your feedback.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-24 23:49   ` Magnus Damm
  0 siblings, 0 replies; 78+ messages in thread
From: Magnus Damm @ 2014-02-24 23:49 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> Hi Magnus,
>
> I noticed during randconfig testing that you enabled DMABOUNCE for the
> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30

Hi Arnd,

The patch that you are referring to has been updated a few times, but
I believe your comments are still valid for the series
[PATCH v2 00/08] PCI: rcar: Recent driver patches from Ben Dooks and me (V2)

> I didn't see the original post unfortunately, but I fear we have to
> revert it and come up with a better solution, as your approach seems
> wrong on a number of levels:
>
> * We really don't want any new users of DMABOUNCE on ARM but instead leave
>   it to the PXA/IXP/SA1100 based platforms using it today.
>
> * If your SoCs have an IOMMU, you should really use it, as that would
>   give you much better performance than bounce buffers anyway
>
> * If the hardware is so screwed up that you have to use bounce buffers,
>   use the SWIOTLB code that is reasonably modern.

>From my point of view we need some kind of bounce buffer unless we
have IOMMU support. I understand that an IOMMU would be much better
than a software-based implementation. If it is possible to use an
IOMMU with these devices remain to be seen.

I didn't know about the SWIOTLB code, neither did I know that
DMABOUNCE was supposed to be avoided. Now I do!

I do realize that my following patches madly mix potential bus code
and actual device support, however..

[PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

.. without my patches the driver does not handle CONFIG_BOUNCE and
CONFIG_VMSPLIT_2G.

> * The window base and size in the driver should not be hardcoded, as
>   this is likely not a device specific property but rather an artifact of
>   how it's connected. Use the standard "dma-ranges" property.

I'm afraid that I may not understand your concern fully here. From my
view the existing driver (without my patch) is having hard coded
board-specific memory base and a hard coded size in it, and with my
patches I try to rework that. Do you have any existing example code to
recommend me?

Regarding what the device can do and/or how it is connected - here is
some hardware information:

1) 3 on-chip PCI bridges per SoC with USB EHCI/OHCI controllers on
each of them (PCI bridge code is also shared between multiple SoCs and
of course multiple boards).

2) The systems have 40-bit physical address space and the CPU can
address this when LPAE is enabled.

3) System RAM is available in two banks - one bank in the lowest
32-bits (top 8 bits set to 0) and another bank in higher space.

4) The PCI bridge has a 32-bit base address for the windows with
alignment requirement (needs to be evenly aligned based on size)

5) Each PCI bridge instance has two windows, but supported size
differs. One window supports up to 2 GiB, another 256MiB.

Without IOMMU available I came to the conclusion that I need both
BOUNCE and DMABOUNCE to support the above hardware.

> * You should not need your own dma_map_ops in the driver. What you
>   do there isn't really specific to your host but should live in some
>   place where it can be shared with other drivers.

I think this boils down to the less-than-32-bit bus master capability
and also the poor match to a DMA zone. Consider 24-bit ISA DMA address
space vs 31-bit DMA space on a 32-bit system. I may be wrong, but a
DMA zone in my mind is something suitable for the classic ISA DMA, not
so much for modern complex systems with multiple devices that come
with different bus master capabilities.

That said, of course it makes sense to share code whenever possible.
Can you explain a bit more what kind of code you would like to have
broken out?

> * The implementation of the dma_map_ops is wrong: at the very least,
>   you cannot use the dma_mask of the pci host when translating
>   calls from the device, since each pci device can have a different
>   dma mask that is set by the driver if it needs larger or smaller
>   than 32-bit DMA space.

The current version of the driver (with or without my patch) seems to
leave all dma mask handling left out. I understand that this is poor
programming practise and I would like to see that fixed. As you
noticed, I did not try to fix this issue in my patch.

It may be worth noticing that the PCI devices hanging off the PCI
bridge instances are all on-chip fixed to OHCI and EHCI and the
drivers for these USB host controllers do not seem to try to set the
dma mask as it is today. So we are talking about a fixed set of PCI
devices here and not external PCI bus with general purpose PCI
drivers. I'm not sure if that makes things much better though..

So yes, I'd like the PCI bridge driver to be fixed with proper dma
mask handling. The question is just what that mask is supposed to
represent on a system where we a) may have IOMMU (and if so we can
address 40 bits), and if not we b) are limited to 31 bits but we also
have a non-zero base address. I'm not sure how to represent that
information with a single dma mask. Any suggestions?

> * The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
>   this is how I noticed your changes.

Do you mean that the following patch is causing some kind of build error?
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

If possible, can you please let me know which patches that you want me
to rework?

> * The block layer assumes that it will bounce any highmem pages,
>   but that isn't necessarily what you want here: if you have 2GB
>   of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
>   lowmem pages that need bouncing.

Good to hear that you also came to the conclusion that the two cases
need to be handled separately. =)

The lowmem bounce case (CONFIG_VMSPLIT_2G) is implemented in:
[PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support

The block layer bounce is also enabled in:
[PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

> * (completely unrelated, I noticed you set up a bogus I/O space
>   window. Don't do that, you will get get a panic if a device happens
>   to use it. Not providing an resource should work fine though)

Right. To be honest, I'm not sure why the original author implemented
this. Similar to the dma mask bits this is something that I would like
to fix, but it has been out of scope for my patches so far.

> On the bright side, your other changes in the same series all look
> good. Thanks especially for sorting out the probe() function, I was
> already wondering if we could change that the way you did.

Thanks. Ideally I'd like to support bind and unbind on the bridge too
(CARDBUS can hotplug PCI, so should we!), but I suppose that sorting
out the bounce bits is more important.

Also, the SWIOTLB code, are you aware of any existing ARM users? I
also wonder if the code can work with multiple devices - the bits in
lib/swiotlb.c looks like single instance with global system wide
support only - perhaps it needs rework to support 3 PCI bridges?

Thanks for your help, I appreciate your feedback.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-02-24 23:49   ` Magnus Damm
  (?)
  (?)
@ 2014-02-25  0:17     ` Russell King - ARM Linux
  -1 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-02-25  0:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 25, 2014 at 08:49:28AM +0900, Magnus Damm wrote:
> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >From my point of view we need some kind of bounce buffer unless we
> have IOMMU support. I understand that an IOMMU would be much better
> than a software-based implementation. If it is possible to use an
> IOMMU with these devices remain to be seen.
> 
> I didn't know about the SWIOTLB code, neither did I know that
> DMABOUNCE was supposed to be avoided. Now I do!

The reason DMABOUNCE should be avoided is because it is a known source
of OOMs, and that has never been investigated and fixed.  You can read
about some of the kinds of problems this code creates here:

http://webcache.googleusercontent.com/search?qÊche:jwl4g8hqWa8J:comments.gmane.org/gmane.linux.ports.arm.kernel/15850+&cd=2&hl=en&ct=clnk&gl=uk&client=firefox-a

That was never got to the bottom of.  I could harp on about not having
the hardware, the people with the hardware not being capable of debugging
it, or not willing to litter their kernels with printks when they've
found a reproducable way to trigger it, etc - but none of that really
matters.

What matters is the end result is nothing was ever done to investigate
the causes, so it remains "unsafe" to use.

> I do realize that my following patches madly mix potential bus code
> and actual device support, however..
> 
> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
> 
> .. without my patches the driver does not handle CONFIG_BOUNCE and
> CONFIG_VMSPLIT_2G.

Can we please kill the idea that CONFIG_VMSPLIT_* has something to do
with DMA?  It doesn't.  VMSPLIT sets where the boundary between userspace
and kernel space is placed in virtual memory.  It doesn't really change
which memory is DMA-able.

There is the BLK_BOUNCE_HIGH option, but that's more to do with drivers
saying "I don't handle highmem pages because I'm old and no one's updated
me".

The same is true of highmem vs bouncing for DMA.  Highmem is purely a
virtual memory concept and has /nothing/ to do with whether the memory
can be DMA'd to.

Let's take an extreme example.  Let's say I set a 3G VM split, so kernel
memory starts at 0xc0000000.  I then set the vmalloc space to be 1024M -
but the kernel strinks that down to the maximum that can be accomodated,
which leaves something like 16MB of lowmem.  Let's say I have 512MB of
RAM in the machine.

Now let's consider I do the same thing, but with a 2G VM split.  Has the
memory pages which can be DMA'd to changed at all?  Yes, the CPU's view
of pages has changed, but the DMA engine's view hasn't changed /one/ /bit/.

Now consider when vmalloc space isn't expanded to maximum and all that
RAM is mapped into the kernel direct mapped region.  Again, any
difference as far as the DMA engine goes?  No there isn't.

So, the idea that highmem or vmsplit has any kind of impact on whether
memory can be DMA'd to by the hardware is absolutely absurd.

VMsplit and highmem are a CPU visible concept, and has very little to do
with whether the memory is DMA-able.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-25  0:17     ` Russell King - ARM Linux
  0 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-02-25  0:17 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Arnd Bergmann, Magnus Damm, linux-pci, Simon Horman,
	linux-kernel, Bjorn Helgaas, linux-arm-kernel, SH-Linux,
	Ben Dooks

On Tue, Feb 25, 2014 at 08:49:28AM +0900, Magnus Damm wrote:
> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >From my point of view we need some kind of bounce buffer unless we
> have IOMMU support. I understand that an IOMMU would be much better
> than a software-based implementation. If it is possible to use an
> IOMMU with these devices remain to be seen.
> 
> I didn't know about the SWIOTLB code, neither did I know that
> DMABOUNCE was supposed to be avoided. Now I do!

The reason DMABOUNCE should be avoided is because it is a known source
of OOMs, and that has never been investigated and fixed.  You can read
about some of the kinds of problems this code creates here:

http://webcache.googleusercontent.com/search?q=cache:jwl4g8hqWa8J:comments.gmane.org/gmane.linux.ports.arm.kernel/15850+&cd=2&hl=en&ct=clnk&gl=uk&client=firefox-a

That was never got to the bottom of.  I could harp on about not having
the hardware, the people with the hardware not being capable of debugging
it, or not willing to litter their kernels with printks when they've
found a reproducable way to trigger it, etc - but none of that really
matters.

What matters is the end result is nothing was ever done to investigate
the causes, so it remains "unsafe" to use.

> I do realize that my following patches madly mix potential bus code
> and actual device support, however..
> 
> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
> 
> .. without my patches the driver does not handle CONFIG_BOUNCE and
> CONFIG_VMSPLIT_2G.

Can we please kill the idea that CONFIG_VMSPLIT_* has something to do
with DMA?  It doesn't.  VMSPLIT sets where the boundary between userspace
and kernel space is placed in virtual memory.  It doesn't really change
which memory is DMA-able.

There is the BLK_BOUNCE_HIGH option, but that's more to do with drivers
saying "I don't handle highmem pages because I'm old and no one's updated
me".

The same is true of highmem vs bouncing for DMA.  Highmem is purely a
virtual memory concept and has /nothing/ to do with whether the memory
can be DMA'd to.

Let's take an extreme example.  Let's say I set a 3G VM split, so kernel
memory starts at 0xc0000000.  I then set the vmalloc space to be 1024M -
but the kernel strinks that down to the maximum that can be accomodated,
which leaves something like 16MB of lowmem.  Let's say I have 512MB of
RAM in the machine.

Now let's consider I do the same thing, but with a 2G VM split.  Has the
memory pages which can be DMA'd to changed at all?  Yes, the CPU's view
of pages has changed, but the DMA engine's view hasn't changed /one/ /bit/.

Now consider when vmalloc space isn't expanded to maximum and all that
RAM is mapped into the kernel direct mapped region.  Again, any
difference as far as the DMA engine goes?  No there isn't.

So, the idea that highmem or vmsplit has any kind of impact on whether
memory can be DMA'd to by the hardware is absolutely absurd.

VMsplit and highmem are a CPU visible concept, and has very little to do
with whether the memory is DMA-able.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-25  0:17     ` Russell King - ARM Linux
  0 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-02-25  0:17 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Arnd Bergmann, Magnus Damm, linux-pci, Simon Horman,
	linux-kernel, Bjorn Helgaas, linux-arm-kernel, SH-Linux,
	Ben Dooks

On Tue, Feb 25, 2014 at 08:49:28AM +0900, Magnus Damm wrote:
> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >From my point of view we need some kind of bounce buffer unless we
> have IOMMU support. I understand that an IOMMU would be much better
> than a software-based implementation. If it is possible to use an
> IOMMU with these devices remain to be seen.
> 
> I didn't know about the SWIOTLB code, neither did I know that
> DMABOUNCE was supposed to be avoided. Now I do!

The reason DMABOUNCE should be avoided is because it is a known source
of OOMs, and that has never been investigated and fixed.  You can read
about some of the kinds of problems this code creates here:

http://webcache.googleusercontent.com/search?q=cache:jwl4g8hqWa8J:comments.gmane.org/gmane.linux.ports.arm.kernel/15850+&cd=2&hl=en&ct=clnk&gl=uk&client=firefox-a

That was never got to the bottom of.  I could harp on about not having
the hardware, the people with the hardware not being capable of debugging
it, or not willing to litter their kernels with printks when they've
found a reproducable way to trigger it, etc - but none of that really
matters.

What matters is the end result is nothing was ever done to investigate
the causes, so it remains "unsafe" to use.

> I do realize that my following patches madly mix potential bus code
> and actual device support, however..
> 
> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
> 
> .. without my patches the driver does not handle CONFIG_BOUNCE and
> CONFIG_VMSPLIT_2G.

Can we please kill the idea that CONFIG_VMSPLIT_* has something to do
with DMA?  It doesn't.  VMSPLIT sets where the boundary between userspace
and kernel space is placed in virtual memory.  It doesn't really change
which memory is DMA-able.

There is the BLK_BOUNCE_HIGH option, but that's more to do with drivers
saying "I don't handle highmem pages because I'm old and no one's updated
me".

The same is true of highmem vs bouncing for DMA.  Highmem is purely a
virtual memory concept and has /nothing/ to do with whether the memory
can be DMA'd to.

Let's take an extreme example.  Let's say I set a 3G VM split, so kernel
memory starts at 0xc0000000.  I then set the vmalloc space to be 1024M -
but the kernel strinks that down to the maximum that can be accomodated,
which leaves something like 16MB of lowmem.  Let's say I have 512MB of
RAM in the machine.

Now let's consider I do the same thing, but with a 2G VM split.  Has the
memory pages which can be DMA'd to changed at all?  Yes, the CPU's view
of pages has changed, but the DMA engine's view hasn't changed /one/ /bit/.

Now consider when vmalloc space isn't expanded to maximum and all that
RAM is mapped into the kernel direct mapped region.  Again, any
difference as far as the DMA engine goes?  No there isn't.

So, the idea that highmem or vmsplit has any kind of impact on whether
memory can be DMA'd to by the hardware is absolutely absurd.

VMsplit and highmem are a CPU visible concept, and has very little to do
with whether the memory is DMA-able.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-25  0:17     ` Russell King - ARM Linux
  0 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-02-25  0:17 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Feb 25, 2014 at 08:49:28AM +0900, Magnus Damm wrote:
> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> >From my point of view we need some kind of bounce buffer unless we
> have IOMMU support. I understand that an IOMMU would be much better
> than a software-based implementation. If it is possible to use an
> IOMMU with these devices remain to be seen.
> 
> I didn't know about the SWIOTLB code, neither did I know that
> DMABOUNCE was supposed to be avoided. Now I do!

The reason DMABOUNCE should be avoided is because it is a known source
of OOMs, and that has never been investigated and fixed.  You can read
about some of the kinds of problems this code creates here:

http://webcache.googleusercontent.com/search?q=cache:jwl4g8hqWa8J:comments.gmane.org/gmane.linux.ports.arm.kernel/15850+&cd=2&hl=en&ct=clnk&gl=uk&client=firefox-a

That was never got to the bottom of.  I could harp on about not having
the hardware, the people with the hardware not being capable of debugging
it, or not willing to litter their kernels with printks when they've
found a reproducable way to trigger it, etc - but none of that really
matters.

What matters is the end result is nothing was ever done to investigate
the causes, so it remains "unsafe" to use.

> I do realize that my following patches madly mix potential bus code
> and actual device support, however..
> 
> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
> 
> .. without my patches the driver does not handle CONFIG_BOUNCE and
> CONFIG_VMSPLIT_2G.

Can we please kill the idea that CONFIG_VMSPLIT_* has something to do
with DMA?  It doesn't.  VMSPLIT sets where the boundary between userspace
and kernel space is placed in virtual memory.  It doesn't really change
which memory is DMA-able.

There is the BLK_BOUNCE_HIGH option, but that's more to do with drivers
saying "I don't handle highmem pages because I'm old and no one's updated
me".

The same is true of highmem vs bouncing for DMA.  Highmem is purely a
virtual memory concept and has /nothing/ to do with whether the memory
can be DMA'd to.

Let's take an extreme example.  Let's say I set a 3G VM split, so kernel
memory starts at 0xc0000000.  I then set the vmalloc space to be 1024M -
but the kernel strinks that down to the maximum that can be accomodated,
which leaves something like 16MB of lowmem.  Let's say I have 512MB of
RAM in the machine.

Now let's consider I do the same thing, but with a 2G VM split.  Has the
memory pages which can be DMA'd to changed at all?  Yes, the CPU's view
of pages has changed, but the DMA engine's view hasn't changed /one/ /bit/.

Now consider when vmalloc space isn't expanded to maximum and all that
RAM is mapped into the kernel direct mapped region.  Again, any
difference as far as the DMA engine goes?  No there isn't.

So, the idea that highmem or vmsplit has any kind of impact on whether
memory can be DMA'd to by the hardware is absolutely absurd.

VMsplit and highmem are a CPU visible concept, and has very little to do
with whether the memory is DMA-able.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-02-25  0:17     ` Russell King - ARM Linux
  (?)
  (?)
@ 2014-02-25  2:00       ` Magnus Damm
  -1 siblings, 0 replies; 78+ messages in thread
From: Magnus Damm @ 2014-02-25  2:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On Tue, Feb 25, 2014 at 9:17 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Feb 25, 2014 at 08:49:28AM +0900, Magnus Damm wrote:
>> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> >From my point of view we need some kind of bounce buffer unless we
>> have IOMMU support. I understand that an IOMMU would be much better
>> than a software-based implementation. If it is possible to use an
>> IOMMU with these devices remain to be seen.
>>
>> I didn't know about the SWIOTLB code, neither did I know that
>> DMABOUNCE was supposed to be avoided. Now I do!
>
> The reason DMABOUNCE should be avoided is because it is a known source
> of OOMs, and that has never been investigated and fixed.  You can read
> about some of the kinds of problems this code creates here:
>
> http://webcache.googleusercontent.com/search?qÊche:jwl4g8hqWa8J:comments.gmane.org/gmane.linux.ports.arm.kernel/15850+&cd=2&hl=en&ct=clnk&gl=uk&client=firefox-a
>
> That was never got to the bottom of.  I could harp on about not having
> the hardware, the people with the hardware not being capable of debugging
> it, or not willing to litter their kernels with printks when they've
> found a reproducable way to trigger it, etc - but none of that really
> matters.
>
> What matters is the end result is nothing was ever done to investigate
> the causes, so it remains "unsafe" to use.

Thanks for the pointer! It is good to know.

>> I do realize that my following patches madly mix potential bus code
>> and actual device support, however..
>>
>> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
>> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>
>> .. without my patches the driver does not handle CONFIG_BOUNCE and
>> CONFIG_VMSPLIT_2G.
>
> Can we please kill the idea that CONFIG_VMSPLIT_* has something to do
> with DMA?  It doesn't.  VMSPLIT sets where the boundary between userspace
> and kernel space is placed in virtual memory.  It doesn't really change
> which memory is DMA-able.
>
> There is the BLK_BOUNCE_HIGH option, but that's more to do with drivers
> saying "I don't handle highmem pages because I'm old and no one's updated
> me".

Spot on! =)

From my observations drivers saying that they don't support HIGHMEM
may actually mean that they have a certain physical address
limitation. For instance, if you want to misuse the zones then on a
32-bit system not supporting HIGHMEM will guarantee that your memory
is within 32-bits. I'm not saying anyone should do that, but I'm sure
that kind of stuff is all over the place. =)

> The same is true of highmem vs bouncing for DMA.  Highmem is purely a
> virtual memory concept and has /nothing/ to do with whether the memory
> can be DMA'd to.
>
> Let's take an extreme example.  Let's say I set a 3G VM split, so kernel
> memory starts at 0xc0000000.  I then set the vmalloc space to be 1024M -
> but the kernel strinks that down to the maximum that can be accomodated,
> which leaves something like 16MB of lowmem.  Let's say I have 512MB of
> RAM in the machine.
>
> Now let's consider I do the same thing, but with a 2G VM split.  Has the
> memory pages which can be DMA'd to changed at all?  Yes, the CPU's view
> of pages has changed, but the DMA engine's view hasn't changed /one/ /bit/.
>
> Now consider when vmalloc space isn't expanded to maximum and all that
> RAM is mapped into the kernel direct mapped region.  Again, any
> difference as far as the DMA engine goes?  No there isn't.
>
> So, the idea that highmem or vmsplit has any kind of impact on whether
> memory can be DMA'd to by the hardware is absolutely absurd.
>
> VMsplit and highmem are a CPU visible concept, and has very little to do
> with whether the memory is DMA-able.

I totally agree with what you are saying. If the memory is arranged as
DMA zone or lowmem or HIGHMEM does not matter much from a hardware
point or view. The hardware addressing limitation and the software
concepts of memory zones are often mixed together - I suppose the DMA
zone is the only case where they may have any relation.

The most basic hardware limitation we have with this particular PCI
bridge is that it can only do bus master memory access within the
lowest 32-bits of physical address space (no high LPAE memory banks).
And to make it more complicated, the hardware is even more restriced
than that - the physical address space where bus mastering can happen
is limited to 1 GiB. (The PCI bridge hardware itself can do 2 GiB
window but it must be mapped a 0x8... The on-board memory banks are
designed with 2 GiB memory starting from 0x4.. so out of that only
1GiB remains useful)

The reason why the VMSPLIT was brought up was that the existing
hard-coded 1GiB window happens to work with the common 3G VM split
because lowmem also happens to be within 1 GiB. I suppose the code was
written with "luck" so to say.

The issue is when VMSPLIT is changed to 2G then the 1GiB PCI bus
master limitation will be visible and things will bomb out. I solve
that by using DMABOUNCE to support any VMSPLIT configuration with 1GiB
of bus mastering ability.

And the DMABOUNCE code does not support HIGHMEM, so because of that
the block layer BOUNCE is also used.

Thanks for your help.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-25  2:00       ` Magnus Damm
  0 siblings, 0 replies; 78+ messages in thread
From: Magnus Damm @ 2014-02-25  2:00 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Magnus Damm, linux-pci, Simon Horman,
	linux-kernel, Bjorn Helgaas, linux-arm-kernel, SH-Linux,
	Ben Dooks

Hi Russell,

On Tue, Feb 25, 2014 at 9:17 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Feb 25, 2014 at 08:49:28AM +0900, Magnus Damm wrote:
>> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> >From my point of view we need some kind of bounce buffer unless we
>> have IOMMU support. I understand that an IOMMU would be much better
>> than a software-based implementation. If it is possible to use an
>> IOMMU with these devices remain to be seen.
>>
>> I didn't know about the SWIOTLB code, neither did I know that
>> DMABOUNCE was supposed to be avoided. Now I do!
>
> The reason DMABOUNCE should be avoided is because it is a known source
> of OOMs, and that has never been investigated and fixed.  You can read
> about some of the kinds of problems this code creates here:
>
> http://webcache.googleusercontent.com/search?q=cache:jwl4g8hqWa8J:comments.gmane.org/gmane.linux.ports.arm.kernel/15850+&cd=2&hl=en&ct=clnk&gl=uk&client=firefox-a
>
> That was never got to the bottom of.  I could harp on about not having
> the hardware, the people with the hardware not being capable of debugging
> it, or not willing to litter their kernels with printks when they've
> found a reproducable way to trigger it, etc - but none of that really
> matters.
>
> What matters is the end result is nothing was ever done to investigate
> the causes, so it remains "unsafe" to use.

Thanks for the pointer! It is good to know.

>> I do realize that my following patches madly mix potential bus code
>> and actual device support, however..
>>
>> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
>> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>
>> .. without my patches the driver does not handle CONFIG_BOUNCE and
>> CONFIG_VMSPLIT_2G.
>
> Can we please kill the idea that CONFIG_VMSPLIT_* has something to do
> with DMA?  It doesn't.  VMSPLIT sets where the boundary between userspace
> and kernel space is placed in virtual memory.  It doesn't really change
> which memory is DMA-able.
>
> There is the BLK_BOUNCE_HIGH option, but that's more to do with drivers
> saying "I don't handle highmem pages because I'm old and no one's updated
> me".

Spot on! =)

>From my observations drivers saying that they don't support HIGHMEM
may actually mean that they have a certain physical address
limitation. For instance, if you want to misuse the zones then on a
32-bit system not supporting HIGHMEM will guarantee that your memory
is within 32-bits. I'm not saying anyone should do that, but I'm sure
that kind of stuff is all over the place. =)

> The same is true of highmem vs bouncing for DMA.  Highmem is purely a
> virtual memory concept and has /nothing/ to do with whether the memory
> can be DMA'd to.
>
> Let's take an extreme example.  Let's say I set a 3G VM split, so kernel
> memory starts at 0xc0000000.  I then set the vmalloc space to be 1024M -
> but the kernel strinks that down to the maximum that can be accomodated,
> which leaves something like 16MB of lowmem.  Let's say I have 512MB of
> RAM in the machine.
>
> Now let's consider I do the same thing, but with a 2G VM split.  Has the
> memory pages which can be DMA'd to changed at all?  Yes, the CPU's view
> of pages has changed, but the DMA engine's view hasn't changed /one/ /bit/.
>
> Now consider when vmalloc space isn't expanded to maximum and all that
> RAM is mapped into the kernel direct mapped region.  Again, any
> difference as far as the DMA engine goes?  No there isn't.
>
> So, the idea that highmem or vmsplit has any kind of impact on whether
> memory can be DMA'd to by the hardware is absolutely absurd.
>
> VMsplit and highmem are a CPU visible concept, and has very little to do
> with whether the memory is DMA-able.

I totally agree with what you are saying. If the memory is arranged as
DMA zone or lowmem or HIGHMEM does not matter much from a hardware
point or view. The hardware addressing limitation and the software
concepts of memory zones are often mixed together - I suppose the DMA
zone is the only case where they may have any relation.

The most basic hardware limitation we have with this particular PCI
bridge is that it can only do bus master memory access within the
lowest 32-bits of physical address space (no high LPAE memory banks).
And to make it more complicated, the hardware is even more restriced
than that - the physical address space where bus mastering can happen
is limited to 1 GiB. (The PCI bridge hardware itself can do 2 GiB
window but it must be mapped a 0x8... The on-board memory banks are
designed with 2 GiB memory starting from 0x4.. so out of that only
1GiB remains useful)

The reason why the VMSPLIT was brought up was that the existing
hard-coded 1GiB window happens to work with the common 3G VM split
because lowmem also happens to be within 1 GiB. I suppose the code was
written with "luck" so to say.

The issue is when VMSPLIT is changed to 2G then the 1GiB PCI bus
master limitation will be visible and things will bomb out. I solve
that by using DMABOUNCE to support any VMSPLIT configuration with 1GiB
of bus mastering ability.

And the DMABOUNCE code does not support HIGHMEM, so because of that
the block layer BOUNCE is also used.

Thanks for your help.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-25  2:00       ` Magnus Damm
  0 siblings, 0 replies; 78+ messages in thread
From: Magnus Damm @ 2014-02-25  2:00 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Magnus Damm, linux-pci, Simon Horman,
	linux-kernel, Bjorn Helgaas, linux-arm-kernel, SH-Linux,
	Ben Dooks

Hi Russell,

On Tue, Feb 25, 2014 at 9:17 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Feb 25, 2014 at 08:49:28AM +0900, Magnus Damm wrote:
>> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> >From my point of view we need some kind of bounce buffer unless we
>> have IOMMU support. I understand that an IOMMU would be much better
>> than a software-based implementation. If it is possible to use an
>> IOMMU with these devices remain to be seen.
>>
>> I didn't know about the SWIOTLB code, neither did I know that
>> DMABOUNCE was supposed to be avoided. Now I do!
>
> The reason DMABOUNCE should be avoided is because it is a known source
> of OOMs, and that has never been investigated and fixed.  You can read
> about some of the kinds of problems this code creates here:
>
> http://webcache.googleusercontent.com/search?q=cache:jwl4g8hqWa8J:comments.gmane.org/gmane.linux.ports.arm.kernel/15850+&cd=2&hl=en&ct=clnk&gl=uk&client=firefox-a
>
> That was never got to the bottom of.  I could harp on about not having
> the hardware, the people with the hardware not being capable of debugging
> it, or not willing to litter their kernels with printks when they've
> found a reproducable way to trigger it, etc - but none of that really
> matters.
>
> What matters is the end result is nothing was ever done to investigate
> the causes, so it remains "unsafe" to use.

Thanks for the pointer! It is good to know.

>> I do realize that my following patches madly mix potential bus code
>> and actual device support, however..
>>
>> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
>> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>
>> .. without my patches the driver does not handle CONFIG_BOUNCE and
>> CONFIG_VMSPLIT_2G.
>
> Can we please kill the idea that CONFIG_VMSPLIT_* has something to do
> with DMA?  It doesn't.  VMSPLIT sets where the boundary between userspace
> and kernel space is placed in virtual memory.  It doesn't really change
> which memory is DMA-able.
>
> There is the BLK_BOUNCE_HIGH option, but that's more to do with drivers
> saying "I don't handle highmem pages because I'm old and no one's updated
> me".

Spot on! =)

>From my observations drivers saying that they don't support HIGHMEM
may actually mean that they have a certain physical address
limitation. For instance, if you want to misuse the zones then on a
32-bit system not supporting HIGHMEM will guarantee that your memory
is within 32-bits. I'm not saying anyone should do that, but I'm sure
that kind of stuff is all over the place. =)

> The same is true of highmem vs bouncing for DMA.  Highmem is purely a
> virtual memory concept and has /nothing/ to do with whether the memory
> can be DMA'd to.
>
> Let's take an extreme example.  Let's say I set a 3G VM split, so kernel
> memory starts at 0xc0000000.  I then set the vmalloc space to be 1024M -
> but the kernel strinks that down to the maximum that can be accomodated,
> which leaves something like 16MB of lowmem.  Let's say I have 512MB of
> RAM in the machine.
>
> Now let's consider I do the same thing, but with a 2G VM split.  Has the
> memory pages which can be DMA'd to changed at all?  Yes, the CPU's view
> of pages has changed, but the DMA engine's view hasn't changed /one/ /bit/.
>
> Now consider when vmalloc space isn't expanded to maximum and all that
> RAM is mapped into the kernel direct mapped region.  Again, any
> difference as far as the DMA engine goes?  No there isn't.
>
> So, the idea that highmem or vmsplit has any kind of impact on whether
> memory can be DMA'd to by the hardware is absolutely absurd.
>
> VMsplit and highmem are a CPU visible concept, and has very little to do
> with whether the memory is DMA-able.

I totally agree with what you are saying. If the memory is arranged as
DMA zone or lowmem or HIGHMEM does not matter much from a hardware
point or view. The hardware addressing limitation and the software
concepts of memory zones are often mixed together - I suppose the DMA
zone is the only case where they may have any relation.

The most basic hardware limitation we have with this particular PCI
bridge is that it can only do bus master memory access within the
lowest 32-bits of physical address space (no high LPAE memory banks).
And to make it more complicated, the hardware is even more restriced
than that - the physical address space where bus mastering can happen
is limited to 1 GiB. (The PCI bridge hardware itself can do 2 GiB
window but it must be mapped a 0x8... The on-board memory banks are
designed with 2 GiB memory starting from 0x4.. so out of that only
1GiB remains useful)

The reason why the VMSPLIT was brought up was that the existing
hard-coded 1GiB window happens to work with the common 3G VM split
because lowmem also happens to be within 1 GiB. I suppose the code was
written with "luck" so to say.

The issue is when VMSPLIT is changed to 2G then the 1GiB PCI bus
master limitation will be visible and things will bomb out. I solve
that by using DMABOUNCE to support any VMSPLIT configuration with 1GiB
of bus mastering ability.

And the DMABOUNCE code does not support HIGHMEM, so because of that
the block layer BOUNCE is also used.

Thanks for your help.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-25  2:00       ` Magnus Damm
  0 siblings, 0 replies; 78+ messages in thread
From: Magnus Damm @ 2014-02-25  2:00 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On Tue, Feb 25, 2014 at 9:17 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Tue, Feb 25, 2014 at 08:49:28AM +0900, Magnus Damm wrote:
>> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:
>> >From my point of view we need some kind of bounce buffer unless we
>> have IOMMU support. I understand that an IOMMU would be much better
>> than a software-based implementation. If it is possible to use an
>> IOMMU with these devices remain to be seen.
>>
>> I didn't know about the SWIOTLB code, neither did I know that
>> DMABOUNCE was supposed to be avoided. Now I do!
>
> The reason DMABOUNCE should be avoided is because it is a known source
> of OOMs, and that has never been investigated and fixed.  You can read
> about some of the kinds of problems this code creates here:
>
> http://webcache.googleusercontent.com/search?q=cache:jwl4g8hqWa8J:comments.gmane.org/gmane.linux.ports.arm.kernel/15850+&cd=2&hl=en&ct=clnk&gl=uk&client=firefox-a
>
> That was never got to the bottom of.  I could harp on about not having
> the hardware, the people with the hardware not being capable of debugging
> it, or not willing to litter their kernels with printks when they've
> found a reproducable way to trigger it, etc - but none of that really
> matters.
>
> What matters is the end result is nothing was ever done to investigate
> the causes, so it remains "unsafe" to use.

Thanks for the pointer! It is good to know.

>> I do realize that my following patches madly mix potential bus code
>> and actual device support, however..
>>
>> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
>> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>
>> .. without my patches the driver does not handle CONFIG_BOUNCE and
>> CONFIG_VMSPLIT_2G.
>
> Can we please kill the idea that CONFIG_VMSPLIT_* has something to do
> with DMA?  It doesn't.  VMSPLIT sets where the boundary between userspace
> and kernel space is placed in virtual memory.  It doesn't really change
> which memory is DMA-able.
>
> There is the BLK_BOUNCE_HIGH option, but that's more to do with drivers
> saying "I don't handle highmem pages because I'm old and no one's updated
> me".

Spot on! =)

>From my observations drivers saying that they don't support HIGHMEM
may actually mean that they have a certain physical address
limitation. For instance, if you want to misuse the zones then on a
32-bit system not supporting HIGHMEM will guarantee that your memory
is within 32-bits. I'm not saying anyone should do that, but I'm sure
that kind of stuff is all over the place. =)

> The same is true of highmem vs bouncing for DMA.  Highmem is purely a
> virtual memory concept and has /nothing/ to do with whether the memory
> can be DMA'd to.
>
> Let's take an extreme example.  Let's say I set a 3G VM split, so kernel
> memory starts at 0xc0000000.  I then set the vmalloc space to be 1024M -
> but the kernel strinks that down to the maximum that can be accomodated,
> which leaves something like 16MB of lowmem.  Let's say I have 512MB of
> RAM in the machine.
>
> Now let's consider I do the same thing, but with a 2G VM split.  Has the
> memory pages which can be DMA'd to changed at all?  Yes, the CPU's view
> of pages has changed, but the DMA engine's view hasn't changed /one/ /bit/.
>
> Now consider when vmalloc space isn't expanded to maximum and all that
> RAM is mapped into the kernel direct mapped region.  Again, any
> difference as far as the DMA engine goes?  No there isn't.
>
> So, the idea that highmem or vmsplit has any kind of impact on whether
> memory can be DMA'd to by the hardware is absolutely absurd.
>
> VMsplit and highmem are a CPU visible concept, and has very little to do
> with whether the memory is DMA-able.

I totally agree with what you are saying. If the memory is arranged as
DMA zone or lowmem or HIGHMEM does not matter much from a hardware
point or view. The hardware addressing limitation and the software
concepts of memory zones are often mixed together - I suppose the DMA
zone is the only case where they may have any relation.

The most basic hardware limitation we have with this particular PCI
bridge is that it can only do bus master memory access within the
lowest 32-bits of physical address space (no high LPAE memory banks).
And to make it more complicated, the hardware is even more restriced
than that - the physical address space where bus mastering can happen
is limited to 1 GiB. (The PCI bridge hardware itself can do 2 GiB
window but it must be mapped a 0x8... The on-board memory banks are
designed with 2 GiB memory starting from 0x4.. so out of that only
1GiB remains useful)

The reason why the VMSPLIT was brought up was that the existing
hard-coded 1GiB window happens to work with the common 3G VM split
because lowmem also happens to be within 1 GiB. I suppose the code was
written with "luck" so to say.

The issue is when VMSPLIT is changed to 2G then the 1GiB PCI bus
master limitation will be visible and things will bomb out. I solve
that by using DMABOUNCE to support any VMSPLIT configuration with 1GiB
of bus mastering ability.

And the DMABOUNCE code does not support HIGHMEM, so because of that
the block layer BOUNCE is also used.

Thanks for your help.

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-02-24 23:49   ` Magnus Damm
  (?)
  (?)
@ 2014-02-25 12:15     ` Arnd Bergmann
  -1 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-25 12:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 25 February 2014, Magnus Damm wrote:
> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:

> > * The window base and size in the driver should not be hardcoded, as
> >   this is likely not a device specific property but rather an artifact of
> >   how it's connected. Use the standard "dma-ranges" property.
> 
> I'm afraid that I may not understand your concern fully here. From my
> view the existing driver (without my patch) is having hard coded
> board-specific memory base and a hard coded size in it, and with my
> patches I try to rework that. Do you have any existing example code to
> recommend me?

You are right, this is a preexisting condition, and your patch someone
improves this, just not the way I was hoping for.

At the moment, "dma-ranges" is only used by powerpc platforms, but
we have discussed using it on ARM a couple of times when similar
problems came up. Just today, Santosh Shilimkar posted patches for
mach-keystone, and the PCI support patches that are under review for
arm64 x-gene have a related requirement.

There are two parts to this problem: smaller-than-4GB windows and
windows that are offset to the start of memory. The dma-ranges
approach should handle both. In theory it can also deal with
multiple windows, but we have so far not needed that.

> Regarding what the device can do and/or how it is connected - here is
> some hardware information:
> 
> 1) 3 on-chip PCI bridges per SoC with USB EHCI/OHCI controllers on
> each of them (PCI bridge code is also shared between multiple SoCs and
> of course multiple boards).
> 
> 2) The systems have 40-bit physical address space and the CPU can
> address this when LPAE is enabled.
> 
> 3) System RAM is available in two banks - one bank in the lowest
> 32-bits (top 8 bits set to 0) and another bank in higher space.
> 
> 4) The PCI bridge has a 32-bit base address for the windows with
> alignment requirement (needs to be evenly aligned based on size)
> 
> 5) Each PCI bridge instance has two windows, but supported size
> differs. One window supports up to 2 GiB, another 256MiB.

Thanks for the background. Now, the only real problem is the case where
the window doesn't span all of the RAM below the 4GiB boundary, but
this would be the case at least in the 256MiB window example.

For systems where you have e.g. 2GB of RAM visible below the 4GB
boundary, you shouldn't need any special code aside from refusing
to set a 64-bit dma mask from the driver.

> Without IOMMU available I came to the conclusion that I need both
> BOUNCE and DMABOUNCE to support the above hardware.

You definitely need either DMABOUNCE or SWIOTLB for this case, yes.
The reason for this is that the PCI code assumes that every DMA
master can access all memory below the 4GB boundary if the device
supports it.

I'm less sure about CONFIG_BOUNCE, and Russell already explained
a few things about it that I didn't know. Normally I would expect
the SWIOTLB code to kick in during dma_map_sg() for block devices
just like it does for network and other devices.

> > * You should not need your own dma_map_ops in the driver. What you
> >   do there isn't really specific to your host but should live in some
> >   place where it can be shared with other drivers.
> 
> I think this boils down to the less-than-32-bit bus master capability
> and also the poor match to a DMA zone. Consider 24-bit ISA DMA address
> space vs 31-bit DMA space on a 32-bit system. I may be wrong, but a
> DMA zone in my mind is something suitable for the classic ISA DMA, not
> so much for modern complex systems with multiple devices that come
> with different bus master capabilities.
> 
> That said, of course it makes sense to share code whenever possible.
> Can you explain a bit more what kind of code you would like to have
> broken out?

I was hoping that we can get to a point where we automatically check
the dma-ranges property for platform devices and set the swiotlb
dma_map_ops if necessary. For PCI device, I think each device should
inherit the map_ops from its parent.

> > * The implementation of the dma_map_ops is wrong: at the very least,
> >   you cannot use the dma_mask of the pci host when translating
> >   calls from the device, since each pci device can have a different
> >   dma mask that is set by the driver if it needs larger or smaller
> >   than 32-bit DMA space.
> 
> The current version of the driver (with or without my patch) seems to
> leave all dma mask handling left out. I understand that this is poor
> programming practise and I would like to see that fixed. As you
> noticed, I did not try to fix this issue in my patch.
> 
> It may be worth noticing that the PCI devices hanging off the PCI
> bridge instances are all on-chip fixed to OHCI and EHCI and the
> drivers for these USB host controllers do not seem to try to set the
> dma mask as it is today. So we are talking about a fixed set of PCI
> devices here and not external PCI bus with general purpose PCI
> drivers. I'm not sure if that makes things much better though..

You still have EHCI calling dma_set_mask(DMA_BIS_MASK(64)) to allow high
DMA, while OHCI doesn't allow it, so even with just two possible devices,
you have a conflict.

> So yes, I'd like the PCI bridge driver to be fixed with proper dma
> mask handling. The question is just what that mask is supposed to
> represent on a system where we a) may have IOMMU (and if so we can
> address 40 bits), and if not we b) are limited to 31 bits but we also
> have a non-zero base address. I'm not sure how to represent that
> information with a single dma mask. Any suggestions?

With nonzero base address, do you mean you have to add a number to
the bus address to get to the CPU address? That case is handled by the
patches that Santosh just posted.

Do you always have a 31-bit mask when there is no IOMMU, and have an
IOMMU when there is a smaller mask? That may somewhat simplify the
problem space.

> > * The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
> >   this is how I noticed your changes.
> 
> Do you mean that the following patch is causing some kind of build error?
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
> 
> If possible, can you please let me know which patches that you want me
> to rework?

The build error could be worked around by changing it to 

	select BOUNCE if BLOCK && MMU && HIGHMEM

but I really think you shouldn't need BOUNCE at all, so this needs more
investigation.

> > * The block layer assumes that it will bounce any highmem pages,
> >   but that isn't necessarily what you want here: if you have 2GB
> >   of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
> >   lowmem pages that need bouncing.
> 
> Good to hear that you also came to the conclusion that the two cases
> need to be handled separately. =)
> 
> The lowmem bounce case (CONFIG_VMSPLIT_2G) is implemented in:
> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
> 
> The block layer bounce is also enabled in:
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

But what do here is to only bounce highmem pages. My point is
that highmem is the wrong key here (as Russell also mentioned)
and that you may also have to bounce lowmem pages.

> > On the bright side, your other changes in the same series all look
> > good. Thanks especially for sorting out the probe() function, I was
> > already wondering if we could change that the way you did.
> 
> Thanks. Ideally I'd like to support bind and unbind on the bridge too
> (CARDBUS can hotplug PCI, so should we!), but I suppose that sorting
> out the bounce bits is more important.

Agreed.

> Also, the SWIOTLB code, are you aware of any existing ARM users? I
> also wonder if the code can work with multiple devices - the bits in
> lib/swiotlb.c looks like single instance with global system wide
> support only - perhaps it needs rework to support 3 PCI bridges?

I haven't looked at the code much, but Xen seems to use it. We can definitely
extend it if you see problems. For instance I think we may have to
wrap them to handle noncoherent buses, as the code was written for x86
and ia64 that are both cache-coherent.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-25 12:15     ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-25 12:15 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Magnus Damm, linux-pci, Russell King, Simon Horman, linux-kernel,
	Bjorn Helgaas, linux-arm-kernel, SH-Linux, Ben Dooks

On Tuesday 25 February 2014, Magnus Damm wrote:
> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:

> > * The window base and size in the driver should not be hardcoded, as
> >   this is likely not a device specific property but rather an artifact of
> >   how it's connected. Use the standard "dma-ranges" property.
> 
> I'm afraid that I may not understand your concern fully here. From my
> view the existing driver (without my patch) is having hard coded
> board-specific memory base and a hard coded size in it, and with my
> patches I try to rework that. Do you have any existing example code to
> recommend me?

You are right, this is a preexisting condition, and your patch someone
improves this, just not the way I was hoping for.

At the moment, "dma-ranges" is only used by powerpc platforms, but
we have discussed using it on ARM a couple of times when similar
problems came up. Just today, Santosh Shilimkar posted patches for
mach-keystone, and the PCI support patches that are under review for
arm64 x-gene have a related requirement.

There are two parts to this problem: smaller-than-4GB windows and
windows that are offset to the start of memory. The dma-ranges
approach should handle both. In theory it can also deal with
multiple windows, but we have so far not needed that.

> Regarding what the device can do and/or how it is connected - here is
> some hardware information:
> 
> 1) 3 on-chip PCI bridges per SoC with USB EHCI/OHCI controllers on
> each of them (PCI bridge code is also shared between multiple SoCs and
> of course multiple boards).
> 
> 2) The systems have 40-bit physical address space and the CPU can
> address this when LPAE is enabled.
> 
> 3) System RAM is available in two banks - one bank in the lowest
> 32-bits (top 8 bits set to 0) and another bank in higher space.
> 
> 4) The PCI bridge has a 32-bit base address for the windows with
> alignment requirement (needs to be evenly aligned based on size)
> 
> 5) Each PCI bridge instance has two windows, but supported size
> differs. One window supports up to 2 GiB, another 256MiB.

Thanks for the background. Now, the only real problem is the case where
the window doesn't span all of the RAM below the 4GiB boundary, but
this would be the case at least in the 256MiB window example.

For systems where you have e.g. 2GB of RAM visible below the 4GB
boundary, you shouldn't need any special code aside from refusing
to set a 64-bit dma mask from the driver.

> Without IOMMU available I came to the conclusion that I need both
> BOUNCE and DMABOUNCE to support the above hardware.

You definitely need either DMABOUNCE or SWIOTLB for this case, yes.
The reason for this is that the PCI code assumes that every DMA
master can access all memory below the 4GB boundary if the device
supports it.

I'm less sure about CONFIG_BOUNCE, and Russell already explained
a few things about it that I didn't know. Normally I would expect
the SWIOTLB code to kick in during dma_map_sg() for block devices
just like it does for network and other devices.

> > * You should not need your own dma_map_ops in the driver. What you
> >   do there isn't really specific to your host but should live in some
> >   place where it can be shared with other drivers.
> 
> I think this boils down to the less-than-32-bit bus master capability
> and also the poor match to a DMA zone. Consider 24-bit ISA DMA address
> space vs 31-bit DMA space on a 32-bit system. I may be wrong, but a
> DMA zone in my mind is something suitable for the classic ISA DMA, not
> so much for modern complex systems with multiple devices that come
> with different bus master capabilities.
> 
> That said, of course it makes sense to share code whenever possible.
> Can you explain a bit more what kind of code you would like to have
> broken out?

I was hoping that we can get to a point where we automatically check
the dma-ranges property for platform devices and set the swiotlb
dma_map_ops if necessary. For PCI device, I think each device should
inherit the map_ops from its parent.

> > * The implementation of the dma_map_ops is wrong: at the very least,
> >   you cannot use the dma_mask of the pci host when translating
> >   calls from the device, since each pci device can have a different
> >   dma mask that is set by the driver if it needs larger or smaller
> >   than 32-bit DMA space.
> 
> The current version of the driver (with or without my patch) seems to
> leave all dma mask handling left out. I understand that this is poor
> programming practise and I would like to see that fixed. As you
> noticed, I did not try to fix this issue in my patch.
> 
> It may be worth noticing that the PCI devices hanging off the PCI
> bridge instances are all on-chip fixed to OHCI and EHCI and the
> drivers for these USB host controllers do not seem to try to set the
> dma mask as it is today. So we are talking about a fixed set of PCI
> devices here and not external PCI bus with general purpose PCI
> drivers. I'm not sure if that makes things much better though..

You still have EHCI calling dma_set_mask(DMA_BIS_MASK(64)) to allow high
DMA, while OHCI doesn't allow it, so even with just two possible devices,
you have a conflict.

> So yes, I'd like the PCI bridge driver to be fixed with proper dma
> mask handling. The question is just what that mask is supposed to
> represent on a system where we a) may have IOMMU (and if so we can
> address 40 bits), and if not we b) are limited to 31 bits but we also
> have a non-zero base address. I'm not sure how to represent that
> information with a single dma mask. Any suggestions?

With nonzero base address, do you mean you have to add a number to
the bus address to get to the CPU address? That case is handled by the
patches that Santosh just posted.

Do you always have a 31-bit mask when there is no IOMMU, and have an
IOMMU when there is a smaller mask? That may somewhat simplify the
problem space.

> > * The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
> >   this is how I noticed your changes.
> 
> Do you mean that the following patch is causing some kind of build error?
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
> 
> If possible, can you please let me know which patches that you want me
> to rework?

The build error could be worked around by changing it to 

	select BOUNCE if BLOCK && MMU && HIGHMEM

but I really think you shouldn't need BOUNCE at all, so this needs more
investigation.

> > * The block layer assumes that it will bounce any highmem pages,
> >   but that isn't necessarily what you want here: if you have 2GB
> >   of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
> >   lowmem pages that need bouncing.
> 
> Good to hear that you also came to the conclusion that the two cases
> need to be handled separately. =)
> 
> The lowmem bounce case (CONFIG_VMSPLIT_2G) is implemented in:
> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
> 
> The block layer bounce is also enabled in:
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

But what do here is to only bounce highmem pages. My point is
that highmem is the wrong key here (as Russell also mentioned)
and that you may also have to bounce lowmem pages.

> > On the bright side, your other changes in the same series all look
> > good. Thanks especially for sorting out the probe() function, I was
> > already wondering if we could change that the way you did.
> 
> Thanks. Ideally I'd like to support bind and unbind on the bridge too
> (CARDBUS can hotplug PCI, so should we!), but I suppose that sorting
> out the bounce bits is more important.

Agreed.

> Also, the SWIOTLB code, are you aware of any existing ARM users? I
> also wonder if the code can work with multiple devices - the bits in
> lib/swiotlb.c looks like single instance with global system wide
> support only - perhaps it needs rework to support 3 PCI bridges?

I haven't looked at the code much, but Xen seems to use it. We can definitely
extend it if you see problems. For instance I think we may have to
wrap them to handle noncoherent buses, as the code was written for x86
and ia64 that are both cache-coherent.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-25 12:15     ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-25 12:15 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Magnus Damm, linux-pci, Russell King, Simon Horman, linux-kernel,
	Bjorn Helgaas, linux-arm-kernel, SH-Linux, Ben Dooks

On Tuesday 25 February 2014, Magnus Damm wrote:
> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:

> > * The window base and size in the driver should not be hardcoded, as
> >   this is likely not a device specific property but rather an artifact of
> >   how it's connected. Use the standard "dma-ranges" property.
> 
> I'm afraid that I may not understand your concern fully here. From my
> view the existing driver (without my patch) is having hard coded
> board-specific memory base and a hard coded size in it, and with my
> patches I try to rework that. Do you have any existing example code to
> recommend me?

You are right, this is a preexisting condition, and your patch someone
improves this, just not the way I was hoping for.

At the moment, "dma-ranges" is only used by powerpc platforms, but
we have discussed using it on ARM a couple of times when similar
problems came up. Just today, Santosh Shilimkar posted patches for
mach-keystone, and the PCI support patches that are under review for
arm64 x-gene have a related requirement.

There are two parts to this problem: smaller-than-4GB windows and
windows that are offset to the start of memory. The dma-ranges
approach should handle both. In theory it can also deal with
multiple windows, but we have so far not needed that.

> Regarding what the device can do and/or how it is connected - here is
> some hardware information:
> 
> 1) 3 on-chip PCI bridges per SoC with USB EHCI/OHCI controllers on
> each of them (PCI bridge code is also shared between multiple SoCs and
> of course multiple boards).
> 
> 2) The systems have 40-bit physical address space and the CPU can
> address this when LPAE is enabled.
> 
> 3) System RAM is available in two banks - one bank in the lowest
> 32-bits (top 8 bits set to 0) and another bank in higher space.
> 
> 4) The PCI bridge has a 32-bit base address for the windows with
> alignment requirement (needs to be evenly aligned based on size)
> 
> 5) Each PCI bridge instance has two windows, but supported size
> differs. One window supports up to 2 GiB, another 256MiB.

Thanks for the background. Now, the only real problem is the case where
the window doesn't span all of the RAM below the 4GiB boundary, but
this would be the case at least in the 256MiB window example.

For systems where you have e.g. 2GB of RAM visible below the 4GB
boundary, you shouldn't need any special code aside from refusing
to set a 64-bit dma mask from the driver.

> Without IOMMU available I came to the conclusion that I need both
> BOUNCE and DMABOUNCE to support the above hardware.

You definitely need either DMABOUNCE or SWIOTLB for this case, yes.
The reason for this is that the PCI code assumes that every DMA
master can access all memory below the 4GB boundary if the device
supports it.

I'm less sure about CONFIG_BOUNCE, and Russell already explained
a few things about it that I didn't know. Normally I would expect
the SWIOTLB code to kick in during dma_map_sg() for block devices
just like it does for network and other devices.

> > * You should not need your own dma_map_ops in the driver. What you
> >   do there isn't really specific to your host but should live in some
> >   place where it can be shared with other drivers.
> 
> I think this boils down to the less-than-32-bit bus master capability
> and also the poor match to a DMA zone. Consider 24-bit ISA DMA address
> space vs 31-bit DMA space on a 32-bit system. I may be wrong, but a
> DMA zone in my mind is something suitable for the classic ISA DMA, not
> so much for modern complex systems with multiple devices that come
> with different bus master capabilities.
> 
> That said, of course it makes sense to share code whenever possible.
> Can you explain a bit more what kind of code you would like to have
> broken out?

I was hoping that we can get to a point where we automatically check
the dma-ranges property for platform devices and set the swiotlb
dma_map_ops if necessary. For PCI device, I think each device should
inherit the map_ops from its parent.

> > * The implementation of the dma_map_ops is wrong: at the very least,
> >   you cannot use the dma_mask of the pci host when translating
> >   calls from the device, since each pci device can have a different
> >   dma mask that is set by the driver if it needs larger or smaller
> >   than 32-bit DMA space.
> 
> The current version of the driver (with or without my patch) seems to
> leave all dma mask handling left out. I understand that this is poor
> programming practise and I would like to see that fixed. As you
> noticed, I did not try to fix this issue in my patch.
> 
> It may be worth noticing that the PCI devices hanging off the PCI
> bridge instances are all on-chip fixed to OHCI and EHCI and the
> drivers for these USB host controllers do not seem to try to set the
> dma mask as it is today. So we are talking about a fixed set of PCI
> devices here and not external PCI bus with general purpose PCI
> drivers. I'm not sure if that makes things much better though..

You still have EHCI calling dma_set_mask(DMA_BIS_MASK(64)) to allow high
DMA, while OHCI doesn't allow it, so even with just two possible devices,
you have a conflict.

> So yes, I'd like the PCI bridge driver to be fixed with proper dma
> mask handling. The question is just what that mask is supposed to
> represent on a system where we a) may have IOMMU (and if so we can
> address 40 bits), and if not we b) are limited to 31 bits but we also
> have a non-zero base address. I'm not sure how to represent that
> information with a single dma mask. Any suggestions?

With nonzero base address, do you mean you have to add a number to
the bus address to get to the CPU address? That case is handled by the
patches that Santosh just posted.

Do you always have a 31-bit mask when there is no IOMMU, and have an
IOMMU when there is a smaller mask? That may somewhat simplify the
problem space.

> > * The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
> >   this is how I noticed your changes.
> 
> Do you mean that the following patch is causing some kind of build error?
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
> 
> If possible, can you please let me know which patches that you want me
> to rework?

The build error could be worked around by changing it to 

	select BOUNCE if BLOCK && MMU && HIGHMEM

but I really think you shouldn't need BOUNCE at all, so this needs more
investigation.

> > * The block layer assumes that it will bounce any highmem pages,
> >   but that isn't necessarily what you want here: if you have 2GB
> >   of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
> >   lowmem pages that need bouncing.
> 
> Good to hear that you also came to the conclusion that the two cases
> need to be handled separately. =)
> 
> The lowmem bounce case (CONFIG_VMSPLIT_2G) is implemented in:
> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
> 
> The block layer bounce is also enabled in:
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

But what do here is to only bounce highmem pages. My point is
that highmem is the wrong key here (as Russell also mentioned)
and that you may also have to bounce lowmem pages.

> > On the bright side, your other changes in the same series all look
> > good. Thanks especially for sorting out the probe() function, I was
> > already wondering if we could change that the way you did.
> 
> Thanks. Ideally I'd like to support bind and unbind on the bridge too
> (CARDBUS can hotplug PCI, so should we!), but I suppose that sorting
> out the bounce bits is more important.

Agreed.

> Also, the SWIOTLB code, are you aware of any existing ARM users? I
> also wonder if the code can work with multiple devices - the bits in
> lib/swiotlb.c looks like single instance with global system wide
> support only - perhaps it needs rework to support 3 PCI bridges?

I haven't looked at the code much, but Xen seems to use it. We can definitely
extend it if you see problems. For instance I think we may have to
wrap them to handle noncoherent buses, as the code was written for x86
and ia64 that are both cache-coherent.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-25 12:15     ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-25 12:15 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 25 February 2014, Magnus Damm wrote:
> On Mon, Feb 24, 2014 at 8:00 PM, Arnd Bergmann <arnd@arndb.de> wrote:

> > * The window base and size in the driver should not be hardcoded, as
> >   this is likely not a device specific property but rather an artifact of
> >   how it's connected. Use the standard "dma-ranges" property.
> 
> I'm afraid that I may not understand your concern fully here. From my
> view the existing driver (without my patch) is having hard coded
> board-specific memory base and a hard coded size in it, and with my
> patches I try to rework that. Do you have any existing example code to
> recommend me?

You are right, this is a preexisting condition, and your patch someone
improves this, just not the way I was hoping for.

At the moment, "dma-ranges" is only used by powerpc platforms, but
we have discussed using it on ARM a couple of times when similar
problems came up. Just today, Santosh Shilimkar posted patches for
mach-keystone, and the PCI support patches that are under review for
arm64 x-gene have a related requirement.

There are two parts to this problem: smaller-than-4GB windows and
windows that are offset to the start of memory. The dma-ranges
approach should handle both. In theory it can also deal with
multiple windows, but we have so far not needed that.

> Regarding what the device can do and/or how it is connected - here is
> some hardware information:
> 
> 1) 3 on-chip PCI bridges per SoC with USB EHCI/OHCI controllers on
> each of them (PCI bridge code is also shared between multiple SoCs and
> of course multiple boards).
> 
> 2) The systems have 40-bit physical address space and the CPU can
> address this when LPAE is enabled.
> 
> 3) System RAM is available in two banks - one bank in the lowest
> 32-bits (top 8 bits set to 0) and another bank in higher space.
> 
> 4) The PCI bridge has a 32-bit base address for the windows with
> alignment requirement (needs to be evenly aligned based on size)
> 
> 5) Each PCI bridge instance has two windows, but supported size
> differs. One window supports up to 2 GiB, another 256MiB.

Thanks for the background. Now, the only real problem is the case where
the window doesn't span all of the RAM below the 4GiB boundary, but
this would be the case at least in the 256MiB window example.

For systems where you have e.g. 2GB of RAM visible below the 4GB
boundary, you shouldn't need any special code aside from refusing
to set a 64-bit dma mask from the driver.

> Without IOMMU available I came to the conclusion that I need both
> BOUNCE and DMABOUNCE to support the above hardware.

You definitely need either DMABOUNCE or SWIOTLB for this case, yes.
The reason for this is that the PCI code assumes that every DMA
master can access all memory below the 4GB boundary if the device
supports it.

I'm less sure about CONFIG_BOUNCE, and Russell already explained
a few things about it that I didn't know. Normally I would expect
the SWIOTLB code to kick in during dma_map_sg() for block devices
just like it does for network and other devices.

> > * You should not need your own dma_map_ops in the driver. What you
> >   do there isn't really specific to your host but should live in some
> >   place where it can be shared with other drivers.
> 
> I think this boils down to the less-than-32-bit bus master capability
> and also the poor match to a DMA zone. Consider 24-bit ISA DMA address
> space vs 31-bit DMA space on a 32-bit system. I may be wrong, but a
> DMA zone in my mind is something suitable for the classic ISA DMA, not
> so much for modern complex systems with multiple devices that come
> with different bus master capabilities.
> 
> That said, of course it makes sense to share code whenever possible.
> Can you explain a bit more what kind of code you would like to have
> broken out?

I was hoping that we can get to a point where we automatically check
the dma-ranges property for platform devices and set the swiotlb
dma_map_ops if necessary. For PCI device, I think each device should
inherit the map_ops from its parent.

> > * The implementation of the dma_map_ops is wrong: at the very least,
> >   you cannot use the dma_mask of the pci host when translating
> >   calls from the device, since each pci device can have a different
> >   dma mask that is set by the driver if it needs larger or smaller
> >   than 32-bit DMA space.
> 
> The current version of the driver (with or without my patch) seems to
> leave all dma mask handling left out. I understand that this is poor
> programming practise and I would like to see that fixed. As you
> noticed, I did not try to fix this issue in my patch.
> 
> It may be worth noticing that the PCI devices hanging off the PCI
> bridge instances are all on-chip fixed to OHCI and EHCI and the
> drivers for these USB host controllers do not seem to try to set the
> dma mask as it is today. So we are talking about a fixed set of PCI
> devices here and not external PCI bus with general purpose PCI
> drivers. I'm not sure if that makes things much better though..

You still have EHCI calling dma_set_mask(DMA_BIS_MASK(64)) to allow high
DMA, while OHCI doesn't allow it, so even with just two possible devices,
you have a conflict.

> So yes, I'd like the PCI bridge driver to be fixed with proper dma
> mask handling. The question is just what that mask is supposed to
> represent on a system where we a) may have IOMMU (and if so we can
> address 40 bits), and if not we b) are limited to 31 bits but we also
> have a non-zero base address. I'm not sure how to represent that
> information with a single dma mask. Any suggestions?

With nonzero base address, do you mean you have to add a number to
the bus address to get to the CPU address? That case is handled by the
patches that Santosh just posted.

Do you always have a 31-bit mask when there is no IOMMU, and have an
IOMMU when there is a smaller mask? That may somewhat simplify the
problem space.

> > * The block bounce code can't be enabled if CONFIG_BLOCK is disabled,
> >   this is how I noticed your changes.
> 
> Do you mean that the following patch is causing some kind of build error?
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM
> 
> If possible, can you please let me know which patches that you want me
> to rework?

The build error could be worked around by changing it to 

	select BOUNCE if BLOCK && MMU && HIGHMEM

but I really think you shouldn't need BOUNCE at all, so this needs more
investigation.

> > * The block layer assumes that it will bounce any highmem pages,
> >   but that isn't necessarily what you want here: if you have 2GB
> >   of lowmem space (CONFIG_VMSPLIT_2G) or more, you will also have
> >   lowmem pages that need bouncing.
> 
> Good to hear that you also came to the conclusion that the two cases
> need to be handled separately. =)
> 
> The lowmem bounce case (CONFIG_VMSPLIT_2G) is implemented in:
> [PATCH v2 06/08] PCI: rcar: Add DMABOUNCE support
> 
> The block layer bounce is also enabled in:
> [PATCH 07/08] PCI: rcar: Enable BOUNCE in case of HIGHMEM

But what do here is to only bounce highmem pages. My point is
that highmem is the wrong key here (as Russell also mentioned)
and that you may also have to bounce lowmem pages.

> > On the bright side, your other changes in the same series all look
> > good. Thanks especially for sorting out the probe() function, I was
> > already wondering if we could change that the way you did.
> 
> Thanks. Ideally I'd like to support bind and unbind on the bridge too
> (CARDBUS can hotplug PCI, so should we!), but I suppose that sorting
> out the bounce bits is more important.

Agreed.

> Also, the SWIOTLB code, are you aware of any existing ARM users? I
> also wonder if the code can work with multiple devices - the bits in
> lib/swiotlb.c looks like single instance with global system wide
> support only - perhaps it needs rework to support 3 PCI bridges?

I haven't looked at the code much, but Xen seems to use it. We can definitely
extend it if you see problems. For instance I think we may have to
wrap them to handle noncoherent buses, as the code was written for x86
and ia64 that are both cache-coherent.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-02-25  2:00       ` Magnus Damm
  (?)
  (?)
@ 2014-02-25 15:44         ` Arnd Bergmann
  -1 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-25 15:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 25 February 2014, Magnus Damm wrote:
> And the DMABOUNCE code does not support HIGHMEM, so because of that
> the block layer BOUNCE is also used.

Ah, I misunderstood this part previously. I understand better what's
going on now, but this also enforces the impression that both BOUNCE
and DMABOUNCE are not what you should be doing here.

On a related note, I've had some more discussions with Santosh on IRC,
and I think he's in the exact same position on mach-keystone, so we
should make sure that whatever solution either of you comes up with
also works for the other one.

The situation on keystone may be a little worse even, because all their
DMA masters have a 2GB limit, but it's also possible that the same
is true for you. Which categories of DMA masters do you have on R-Car?

a) less than 32-bit mask, with IOMMU
b) less than 32-bit mask, without IOMMU
c) 32-bit mask
d) 64-bit mask

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-25 15:44         ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-25 15:44 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Russell King - ARM Linux, Magnus Damm, linux-pci, Simon Horman,
	linux-kernel, Bjorn Helgaas, linux-arm-kernel, SH-Linux,
	Ben Dooks, Santosh Shilimkar

On Tuesday 25 February 2014, Magnus Damm wrote:
> And the DMABOUNCE code does not support HIGHMEM, so because of that
> the block layer BOUNCE is also used.

Ah, I misunderstood this part previously. I understand better what's
going on now, but this also enforces the impression that both BOUNCE
and DMABOUNCE are not what you should be doing here.

On a related note, I've had some more discussions with Santosh on IRC,
and I think he's in the exact same position on mach-keystone, so we
should make sure that whatever solution either of you comes up with
also works for the other one.

The situation on keystone may be a little worse even, because all their
DMA masters have a 2GB limit, but it's also possible that the same
is true for you. Which categories of DMA masters do you have on R-Car?

a) less than 32-bit mask, with IOMMU
b) less than 32-bit mask, without IOMMU
c) 32-bit mask
d) 64-bit mask

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-25 15:44         ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-25 15:44 UTC (permalink / raw)
  To: Magnus Damm
  Cc: Russell King - ARM Linux, Magnus Damm, linux-pci, Simon Horman,
	linux-kernel, Bjorn Helgaas, linux-arm-kernel, SH-Linux,
	Ben Dooks, Santosh Shilimkar

On Tuesday 25 February 2014, Magnus Damm wrote:
> And the DMABOUNCE code does not support HIGHMEM, so because of that
> the block layer BOUNCE is also used.

Ah, I misunderstood this part previously. I understand better what's
going on now, but this also enforces the impression that both BOUNCE
and DMABOUNCE are not what you should be doing here.

On a related note, I've had some more discussions with Santosh on IRC,
and I think he's in the exact same position on mach-keystone, so we
should make sure that whatever solution either of you comes up with
also works for the other one.

The situation on keystone may be a little worse even, because all their
DMA masters have a 2GB limit, but it's also possible that the same
is true for you. Which categories of DMA masters do you have on R-Car?

a) less than 32-bit mask, with IOMMU
b) less than 32-bit mask, without IOMMU
c) 32-bit mask
d) 64-bit mask

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-25 15:44         ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-25 15:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Tuesday 25 February 2014, Magnus Damm wrote:
> And the DMABOUNCE code does not support HIGHMEM, so because of that
> the block layer BOUNCE is also used.

Ah, I misunderstood this part previously. I understand better what's
going on now, but this also enforces the impression that both BOUNCE
and DMABOUNCE are not what you should be doing here.

On a related note, I've had some more discussions with Santosh on IRC,
and I think he's in the exact same position on mach-keystone, so we
should make sure that whatever solution either of you comes up with
also works for the other one.

The situation on keystone may be a little worse even, because all their
DMA masters have a 2GB limit, but it's also possible that the same
is true for you. Which categories of DMA masters do you have on R-Car?

a) less than 32-bit mask, with IOMMU
b) less than 32-bit mask, without IOMMU
c) 32-bit mask
d) 64-bit mask

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-02-24 11:00 ` Arnd Bergmann
  (?)
@ 2014-02-26 19:48   ` Bjorn Helgaas
  -1 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> Hi Magnus,
>
> I noticed during randconfig testing that you enabled DMABOUNCE for the
> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>
> I didn't see the original post unfortunately, but I fear we have to
> revert it and come up with a better solution, ...

Sounds like I should drop the following patches from my pci/host-rcar
branch for now?

  PCI: rcar: Add DMABOUNCE support
  PCI: rcar: Enable BOUNCE in case of HIGHMEM
  PCI: rcar: Make the Kconfig dependencies more generic

Bjorn

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-26 19:48   ` Bjorn Helgaas
  0 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Magnus Damm, linux-pci, Russell King, Simon Horman, linux-kernel,
	linux-arm, linux-sh, Ben Dooks

On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> Hi Magnus,
>
> I noticed during randconfig testing that you enabled DMABOUNCE for the
> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>
> I didn't see the original post unfortunately, but I fear we have to
> revert it and come up with a better solution, ...

Sounds like I should drop the following patches from my pci/host-rcar
branch for now?

  PCI: rcar: Add DMABOUNCE support
  PCI: rcar: Enable BOUNCE in case of HIGHMEM
  PCI: rcar: Make the Kconfig dependencies more generic

Bjorn

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-26 19:48   ` Bjorn Helgaas
  0 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 19:48 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> Hi Magnus,
>
> I noticed during randconfig testing that you enabled DMABOUNCE for the
> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>
> I didn't see the original post unfortunately, but I fear we have to
> revert it and come up with a better solution, ...

Sounds like I should drop the following patches from my pci/host-rcar
branch for now?

  PCI: rcar: Add DMABOUNCE support
  PCI: rcar: Enable BOUNCE in case of HIGHMEM
  PCI: rcar: Make the Kconfig dependencies more generic

Bjorn

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-02-26 19:48   ` Bjorn Helgaas
  (?)
@ 2014-02-26 19:57     ` Arnd Bergmann
  -1 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-26 19:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > Hi Magnus,
> >
> > I noticed during randconfig testing that you enabled DMABOUNCE for the
> > pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
> >
> > I didn't see the original post unfortunately, but I fear we have to
> > revert it and come up with a better solution, ...
> 
> Sounds like I should drop the following patches from my pci/host-rcar
> branch for now?
> 
>   PCI: rcar: Add DMABOUNCE support
>   PCI: rcar: Enable BOUNCE in case of HIGHMEM
>   PCI: rcar: Make the Kconfig dependencies more generic

Sounds good to me. The last patch is actually fine, but you'll have to
fix the context to apply it without the other two.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-26 19:57     ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-26 19:57 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Bjorn Helgaas, Russell King, linux-sh, linux-pci, linux-kernel,
	Magnus Damm, Ben Dooks, Simon Horman

On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > Hi Magnus,
> >
> > I noticed during randconfig testing that you enabled DMABOUNCE for the
> > pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
> >
> > I didn't see the original post unfortunately, but I fear we have to
> > revert it and come up with a better solution, ...
> 
> Sounds like I should drop the following patches from my pci/host-rcar
> branch for now?
> 
>   PCI: rcar: Add DMABOUNCE support
>   PCI: rcar: Enable BOUNCE in case of HIGHMEM
>   PCI: rcar: Make the Kconfig dependencies more generic

Sounds good to me. The last patch is actually fine, but you'll have to
fix the context to apply it without the other two.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-26 19:57     ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-26 19:57 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> > Hi Magnus,
> >
> > I noticed during randconfig testing that you enabled DMABOUNCE for the
> > pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
> >
> > I didn't see the original post unfortunately, but I fear we have to
> > revert it and come up with a better solution, ...
> 
> Sounds like I should drop the following patches from my pci/host-rcar
> branch for now?
> 
>   PCI: rcar: Add DMABOUNCE support
>   PCI: rcar: Enable BOUNCE in case of HIGHMEM
>   PCI: rcar: Make the Kconfig dependencies more generic

Sounds good to me. The last patch is actually fine, but you'll have to
fix the context to apply it without the other two.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-02-26 19:57     ` Arnd Bergmann
  (?)
@ 2014-02-26 21:02       ` Bjorn Helgaas
  -1 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 21:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 26, 2014 at 12:57 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > Hi Magnus,
>> >
>> > I noticed during randconfig testing that you enabled DMABOUNCE for the
>> > pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>> >
>> > I didn't see the original post unfortunately, but I fear we have to
>> > revert it and come up with a better solution, ...
>>
>> Sounds like I should drop the following patches from my pci/host-rcar
>> branch for now?
>>
>>   PCI: rcar: Add DMABOUNCE support
>>   PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>   PCI: rcar: Make the Kconfig dependencies more generic
>
> Sounds good to me. The last patch is actually fine, but you'll have to
> fix the context to apply it without the other two.

OK, I dropped the DMABOUNCE and BOUNCE patches and force-updated my
"next" branch.

Bjorn

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-26 21:02       ` Bjorn Helgaas
  0 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 21:02 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm, Russell King, linux-sh, linux-pci, linux-kernel,
	Magnus Damm, Ben Dooks, Simon Horman

On Wed, Feb 26, 2014 at 12:57 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > Hi Magnus,
>> >
>> > I noticed during randconfig testing that you enabled DMABOUNCE for the
>> > pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>> >
>> > I didn't see the original post unfortunately, but I fear we have to
>> > revert it and come up with a better solution, ...
>>
>> Sounds like I should drop the following patches from my pci/host-rcar
>> branch for now?
>>
>>   PCI: rcar: Add DMABOUNCE support
>>   PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>   PCI: rcar: Make the Kconfig dependencies more generic
>
> Sounds good to me. The last patch is actually fine, but you'll have to
> fix the context to apply it without the other two.

OK, I dropped the DMABOUNCE and BOUNCE patches and force-updated my
"next" branch.

Bjorn

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-26 21:02       ` Bjorn Helgaas
  0 siblings, 0 replies; 78+ messages in thread
From: Bjorn Helgaas @ 2014-02-26 21:02 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Feb 26, 2014 at 12:57 PM, Arnd Bergmann <arnd@arndb.de> wrote:
> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>> > Hi Magnus,
>> >
>> > I noticed during randconfig testing that you enabled DMABOUNCE for the
>> > pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>> >
>> > I didn't see the original post unfortunately, but I fear we have to
>> > revert it and come up with a better solution, ...
>>
>> Sounds like I should drop the following patches from my pci/host-rcar
>> branch for now?
>>
>>   PCI: rcar: Add DMABOUNCE support
>>   PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>   PCI: rcar: Make the Kconfig dependencies more generic
>
> Sounds good to me. The last patch is actually fine, but you'll have to
> fix the context to apply it without the other two.

OK, I dropped the DMABOUNCE and BOUNCE patches and force-updated my
"next" branch.

Bjorn

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-02-26 21:02       ` Bjorn Helgaas
  (?)
@ 2014-02-26 21:52         ` Arnd Bergmann
  -1 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-26 21:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday 26 February 2014, Bjorn Helgaas wrote:
> OK, I dropped the DMABOUNCE and BOUNCE patches and force-updated my
> "next" branch.

Thanks!

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-02-26 21:52         ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-26 21:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-arm, Russell King, linux-sh, linux-pci, linux-kernel,
	Magnus Damm, Ben Dooks, Simon Horman

On Wednesday 26 February 2014, Bjorn Helgaas wrote:
> OK, I dropped the DMABOUNCE and BOUNCE patches and force-updated my
> "next" branch.

Thanks!

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-02-26 21:52         ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-02-26 21:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Wednesday 26 February 2014, Bjorn Helgaas wrote:
> OK, I dropped the DMABOUNCE and BOUNCE patches and force-updated my
> "next" branch.

Thanks!

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-02-26 19:57     ` Arnd Bergmann
  (?)
@ 2014-03-20 15:04       ` Ben Dooks
  -1 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 15:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/02/14 20:57, Arnd Bergmann wrote:
> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> Hi Magnus,
>>>
>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>
>>> I didn't see the original post unfortunately, but I fear we have to
>>> revert it and come up with a better solution, ...
>>
>> Sounds like I should drop the following patches from my pci/host-rcar
>> branch for now?
>>
>>    PCI: rcar: Add DMABOUNCE support
>>    PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>    PCI: rcar: Make the Kconfig dependencies more generic
>
> Sounds good to me. The last patch is actually fine, but you'll have to
> fix the context to apply it without the other two.

As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
controllers no longer work with full 2GiB RAM enabled in the device
tree.

Could we work around this by having 1GiB of memory defined in the
32bit memory and then add the rest of the 3GiB from the >32bit
area via LPAE? Will the kernel ever try to allocate DMA memory from
anything >32bit?

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 15:04       ` Ben Dooks
  0 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 15:04 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Russell King, linux-sh, linux-pci,
	linux-kernel, Magnus Damm, Ben Dooks, Bjorn Helgaas,
	Simon Horman

On 26/02/14 20:57, Arnd Bergmann wrote:
> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> Hi Magnus,
>>>
>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>
>>> I didn't see the original post unfortunately, but I fear we have to
>>> revert it and come up with a better solution, ...
>>
>> Sounds like I should drop the following patches from my pci/host-rcar
>> branch for now?
>>
>>    PCI: rcar: Add DMABOUNCE support
>>    PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>    PCI: rcar: Make the Kconfig dependencies more generic
>
> Sounds good to me. The last patch is actually fine, but you'll have to
> fix the context to apply it without the other two.

As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
controllers no longer work with full 2GiB RAM enabled in the device
tree.

Could we work around this by having 1GiB of memory defined in the
32bit memory and then add the rest of the 3GiB from the >32bit
area via LPAE? Will the kernel ever try to allocate DMA memory from
anything >32bit?

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 15:04       ` Ben Dooks
  0 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 15:04 UTC (permalink / raw)
  To: linux-arm-kernel

On 26/02/14 20:57, Arnd Bergmann wrote:
> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>> Hi Magnus,
>>>
>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>
>>> I didn't see the original post unfortunately, but I fear we have to
>>> revert it and come up with a better solution, ...
>>
>> Sounds like I should drop the following patches from my pci/host-rcar
>> branch for now?
>>
>>    PCI: rcar: Add DMABOUNCE support
>>    PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>    PCI: rcar: Make the Kconfig dependencies more generic
>
> Sounds good to me. The last patch is actually fine, but you'll have to
> fix the context to apply it without the other two.

As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
controllers no longer work with full 2GiB RAM enabled in the device
tree.

Could we work around this by having 1GiB of memory defined in the
32bit memory and then add the rest of the 3GiB from the >32bit
area via LPAE? Will the kernel ever try to allocate DMA memory from
anything >32bit?

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 15:04       ` Ben Dooks
  (?)
@ 2014-03-20 15:26         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 15:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 20, 2014 at 04:04:36PM +0100, Ben Dooks wrote:
> On 26/02/14 20:57, Arnd Bergmann wrote:
>> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>> Hi Magnus,
>>>>
>>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>>
>>>> I didn't see the original post unfortunately, but I fear we have to
>>>> revert it and come up with a better solution, ...
>>>
>>> Sounds like I should drop the following patches from my pci/host-rcar
>>> branch for now?
>>>
>>>    PCI: rcar: Add DMABOUNCE support
>>>    PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>>    PCI: rcar: Make the Kconfig dependencies more generic
>>
>> Sounds good to me. The last patch is actually fine, but you'll have to
>> fix the context to apply it without the other two.
>
> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
> controllers no longer work with full 2GiB RAM enabled in the device
> tree.
>
> Could we work around this by having 1GiB of memory defined in the
> 32bit memory and then add the rest of the 3GiB from the >32bit
> area via LPAE? Will the kernel ever try to allocate DMA memory from
> anything >32bit?

Ignore highmem.  What is the actual hardware restriction concerning memory
it can access?

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 15:26         ` Russell King - ARM Linux
  0 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 15:26 UTC (permalink / raw)
  To: Ben Dooks
  Cc: Arnd Bergmann, linux-sh, linux-pci, linux-kernel, Magnus Damm,
	Ben Dooks, Bjorn Helgaas, Simon Horman, linux-arm-kernel

On Thu, Mar 20, 2014 at 04:04:36PM +0100, Ben Dooks wrote:
> On 26/02/14 20:57, Arnd Bergmann wrote:
>> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>> Hi Magnus,
>>>>
>>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>>
>>>> I didn't see the original post unfortunately, but I fear we have to
>>>> revert it and come up with a better solution, ...
>>>
>>> Sounds like I should drop the following patches from my pci/host-rcar
>>> branch for now?
>>>
>>>    PCI: rcar: Add DMABOUNCE support
>>>    PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>>    PCI: rcar: Make the Kconfig dependencies more generic
>>
>> Sounds good to me. The last patch is actually fine, but you'll have to
>> fix the context to apply it without the other two.
>
> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
> controllers no longer work with full 2GiB RAM enabled in the device
> tree.
>
> Could we work around this by having 1GiB of memory defined in the
> 32bit memory and then add the rest of the 3GiB from the >32bit
> area via LPAE? Will the kernel ever try to allocate DMA memory from
> anything >32bit?

Ignore highmem.  What is the actual hardware restriction concerning memory
it can access?

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 15:26         ` Russell King - ARM Linux
  0 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 15:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 20, 2014 at 04:04:36PM +0100, Ben Dooks wrote:
> On 26/02/14 20:57, Arnd Bergmann wrote:
>> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>> Hi Magnus,
>>>>
>>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>>
>>>> I didn't see the original post unfortunately, but I fear we have to
>>>> revert it and come up with a better solution, ...
>>>
>>> Sounds like I should drop the following patches from my pci/host-rcar
>>> branch for now?
>>>
>>>    PCI: rcar: Add DMABOUNCE support
>>>    PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>>    PCI: rcar: Make the Kconfig dependencies more generic
>>
>> Sounds good to me. The last patch is actually fine, but you'll have to
>> fix the context to apply it without the other two.
>
> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
> controllers no longer work with full 2GiB RAM enabled in the device
> tree.
>
> Could we work around this by having 1GiB of memory defined in the
> 32bit memory and then add the rest of the 3GiB from the >32bit
> area via LPAE? Will the kernel ever try to allocate DMA memory from
> anything >32bit?

Ignore highmem.  What is the actual hardware restriction concerning memory
it can access?

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 15:26         ` Russell King - ARM Linux
  (?)
@ 2014-03-20 15:36           ` Ben Dooks
  -1 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/03/14 16:26, Russell King - ARM Linux wrote:
> On Thu, Mar 20, 2014 at 04:04:36PM +0100, Ben Dooks wrote:
>> On 26/02/14 20:57, Arnd Bergmann wrote:
>>> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>>>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>> Hi Magnus,
>>>>>
>>>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>>>
>>>>> I didn't see the original post unfortunately, but I fear we have to
>>>>> revert it and come up with a better solution, ...
>>>>
>>>> Sounds like I should drop the following patches from my pci/host-rcar
>>>> branch for now?
>>>>
>>>>     PCI: rcar: Add DMABOUNCE support
>>>>     PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>>>     PCI: rcar: Make the Kconfig dependencies more generic
>>>
>>> Sounds good to me. The last patch is actually fine, but you'll have to
>>> fix the context to apply it without the other two.
>>
>> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
>> controllers no longer work with full 2GiB RAM enabled in the device
>> tree.
>>
>> Could we work around this by having 1GiB of memory defined in the
>> 32bit memory and then add the rest of the 3GiB from the >32bit
>> area via LPAE? Will the kernel ever try to allocate DMA memory from
>> anything >32bit?
>
> Ignore highmem.  What is the actual hardware restriction concerning memory
> it can access?

If the window is set to 2GiB it can only access 2GiB aligned to 2GiB
boundary, however DRAM spans a 2GiB boundary (being at 1-3GiB)


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 15:36           ` Ben Dooks
  0 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 15:36 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, linux-sh, linux-pci, linux-kernel, Magnus Damm,
	Ben Dooks, Bjorn Helgaas, Simon Horman, linux-arm-kernel

On 20/03/14 16:26, Russell King - ARM Linux wrote:
> On Thu, Mar 20, 2014 at 04:04:36PM +0100, Ben Dooks wrote:
>> On 26/02/14 20:57, Arnd Bergmann wrote:
>>> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>>>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>> Hi Magnus,
>>>>>
>>>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>>>
>>>>> I didn't see the original post unfortunately, but I fear we have to
>>>>> revert it and come up with a better solution, ...
>>>>
>>>> Sounds like I should drop the following patches from my pci/host-rcar
>>>> branch for now?
>>>>
>>>>     PCI: rcar: Add DMABOUNCE support
>>>>     PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>>>     PCI: rcar: Make the Kconfig dependencies more generic
>>>
>>> Sounds good to me. The last patch is actually fine, but you'll have to
>>> fix the context to apply it without the other two.
>>
>> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
>> controllers no longer work with full 2GiB RAM enabled in the device
>> tree.
>>
>> Could we work around this by having 1GiB of memory defined in the
>> 32bit memory and then add the rest of the 3GiB from the >32bit
>> area via LPAE? Will the kernel ever try to allocate DMA memory from
>> anything >32bit?
>
> Ignore highmem.  What is the actual hardware restriction concerning memory
> it can access?

If the window is set to 2GiB it can only access 2GiB aligned to 2GiB
boundary, however DRAM spans a 2GiB boundary (being at 1-3GiB)


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 15:36           ` Ben Dooks
  0 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 15:36 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/03/14 16:26, Russell King - ARM Linux wrote:
> On Thu, Mar 20, 2014 at 04:04:36PM +0100, Ben Dooks wrote:
>> On 26/02/14 20:57, Arnd Bergmann wrote:
>>> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>>>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>> Hi Magnus,
>>>>>
>>>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>>>
>>>>> I didn't see the original post unfortunately, but I fear we have to
>>>>> revert it and come up with a better solution, ...
>>>>
>>>> Sounds like I should drop the following patches from my pci/host-rcar
>>>> branch for now?
>>>>
>>>>     PCI: rcar: Add DMABOUNCE support
>>>>     PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>>>     PCI: rcar: Make the Kconfig dependencies more generic
>>>
>>> Sounds good to me. The last patch is actually fine, but you'll have to
>>> fix the context to apply it without the other two.
>>
>> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
>> controllers no longer work with full 2GiB RAM enabled in the device
>> tree.
>>
>> Could we work around this by having 1GiB of memory defined in the
>> 32bit memory and then add the rest of the 3GiB from the >32bit
>> area via LPAE? Will the kernel ever try to allocate DMA memory from
>> anything >32bit?
>
> Ignore highmem.  What is the actual hardware restriction concerning memory
> it can access?

If the window is set to 2GiB it can only access 2GiB aligned to 2GiB
boundary, however DRAM spans a 2GiB boundary (being at 1-3GiB)


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 15:04       ` Ben Dooks
  (?)
@ 2014-03-20 16:09         ` Arnd Bergmann
  -1 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 16:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:
> On 26/02/14 20:57, Arnd Bergmann wrote:
> > On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
> >> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> >>> Hi Magnus,
> >>>
> >>> I noticed during randconfig testing that you enabled DMABOUNCE for the
> >>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
> >>>
> >>> I didn't see the original post unfortunately, but I fear we have to
> >>> revert it and come up with a better solution, ...
> >>
> >> Sounds like I should drop the following patches from my pci/host-rcar
> >> branch for now?
> >>
> >>    PCI: rcar: Add DMABOUNCE support
> >>    PCI: rcar: Enable BOUNCE in case of HIGHMEM
> >>    PCI: rcar: Make the Kconfig dependencies more generic
> >
> > Sounds good to me. The last patch is actually fine, but you'll have to
> > fix the context to apply it without the other two.
> 
> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
> controllers no longer work with full 2GiB RAM enabled in the device
> tree.

Did it work before these patches got applied initially?

> Could we work around this by having 1GiB of memory defined in the
> 32bit memory and then add the rest of the 3GiB from the >32bit
> area via LPAE? Will the kernel ever try to allocate DMA memory from
> anything >32bit?

You can solve the case for dma_alloc_coherent() this way, or by
setting the mask correctly. It won't help you for dma_map_* though,
which still requires someone to add support for swiotlb or using
an IOMMU if present.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 16:09         ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 16:09 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Ben Dooks, Russell King, linux-sh, linux-pci, linux-kernel,
	Magnus Damm, Ben Dooks, Bjorn Helgaas, Simon Horman

On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:
> On 26/02/14 20:57, Arnd Bergmann wrote:
> > On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
> >> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> >>> Hi Magnus,
> >>>
> >>> I noticed during randconfig testing that you enabled DMABOUNCE for the
> >>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
> >>>
> >>> I didn't see the original post unfortunately, but I fear we have to
> >>> revert it and come up with a better solution, ...
> >>
> >> Sounds like I should drop the following patches from my pci/host-rcar
> >> branch for now?
> >>
> >>    PCI: rcar: Add DMABOUNCE support
> >>    PCI: rcar: Enable BOUNCE in case of HIGHMEM
> >>    PCI: rcar: Make the Kconfig dependencies more generic
> >
> > Sounds good to me. The last patch is actually fine, but you'll have to
> > fix the context to apply it without the other two.
> 
> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
> controllers no longer work with full 2GiB RAM enabled in the device
> tree.

Did it work before these patches got applied initially?

> Could we work around this by having 1GiB of memory defined in the
> 32bit memory and then add the rest of the 3GiB from the >32bit
> area via LPAE? Will the kernel ever try to allocate DMA memory from
> anything >32bit?

You can solve the case for dma_alloc_coherent() this way, or by
setting the mask correctly. It won't help you for dma_map_* though,
which still requires someone to add support for swiotlb or using
an IOMMU if present.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 16:09         ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 16:09 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:
> On 26/02/14 20:57, Arnd Bergmann wrote:
> > On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
> >> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
> >>> Hi Magnus,
> >>>
> >>> I noticed during randconfig testing that you enabled DMABOUNCE for the
> >>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
> >>>
> >>> I didn't see the original post unfortunately, but I fear we have to
> >>> revert it and come up with a better solution, ...
> >>
> >> Sounds like I should drop the following patches from my pci/host-rcar
> >> branch for now?
> >>
> >>    PCI: rcar: Add DMABOUNCE support
> >>    PCI: rcar: Enable BOUNCE in case of HIGHMEM
> >>    PCI: rcar: Make the Kconfig dependencies more generic
> >
> > Sounds good to me. The last patch is actually fine, but you'll have to
> > fix the context to apply it without the other two.
> 
> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
> controllers no longer work with full 2GiB RAM enabled in the device
> tree.

Did it work before these patches got applied initially?

> Could we work around this by having 1GiB of memory defined in the
> 32bit memory and then add the rest of the 3GiB from the >32bit
> area via LPAE? Will the kernel ever try to allocate DMA memory from
> anything >32bit?

You can solve the case for dma_alloc_coherent() this way, or by
setting the mask correctly. It won't help you for dma_map_* though,
which still requires someone to add support for swiotlb or using
an IOMMU if present.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 16:09         ` Arnd Bergmann
  (?)
@ 2014-03-20 16:12           ` Ben Dooks
  -1 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 16:12 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/03/14 17:09, Arnd Bergmann wrote:
> On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:
>> On 26/02/14 20:57, Arnd Bergmann wrote:
>>> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>>>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>> Hi Magnus,
>>>>>
>>>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>>>
>>>>> I didn't see the original post unfortunately, but I fear we have to
>>>>> revert it and come up with a better solution, ...
>>>>
>>>> Sounds like I should drop the following patches from my pci/host-rcar
>>>> branch for now?
>>>>
>>>>     PCI: rcar: Add DMABOUNCE support
>>>>     PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>>>     PCI: rcar: Make the Kconfig dependencies more generic
>>>
>>> Sounds good to me. The last patch is actually fine, but you'll have to
>>> fix the context to apply it without the other two.
>>
>> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
>> controllers no longer work with full 2GiB RAM enabled in the device
>> tree.
>
> Did it work before these patches got applied initially?

It did, but I cannot remember if we where limiting DRAM to 1GiB
or not.

>> Could we work around this by having 1GiB of memory defined in the
>> 32bit memory and then add the rest of the 3GiB from the >32bit
>> area via LPAE? Will the kernel ever try to allocate DMA memory from
>> anything >32bit?
>
> You can solve the case for dma_alloc_coherent() this way, or by
> setting the mask correctly. It won't help you for dma_map_* though,
> which still requires someone to add support for swiotlb or using
> an IOMMU if present.

We do not have an IOMMU present at the moment. Not sure how
to go about setting a mask on a pci-probed device.


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 16:12           ` Ben Dooks
  0 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 16:12 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Russell King, linux-sh, linux-pci,
	linux-kernel, Magnus Damm, Ben Dooks, Bjorn Helgaas,
	Simon Horman

On 20/03/14 17:09, Arnd Bergmann wrote:
> On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:
>> On 26/02/14 20:57, Arnd Bergmann wrote:
>>> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>>>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>> Hi Magnus,
>>>>>
>>>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>>>
>>>>> I didn't see the original post unfortunately, but I fear we have to
>>>>> revert it and come up with a better solution, ...
>>>>
>>>> Sounds like I should drop the following patches from my pci/host-rcar
>>>> branch for now?
>>>>
>>>>     PCI: rcar: Add DMABOUNCE support
>>>>     PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>>>     PCI: rcar: Make the Kconfig dependencies more generic
>>>
>>> Sounds good to me. The last patch is actually fine, but you'll have to
>>> fix the context to apply it without the other two.
>>
>> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
>> controllers no longer work with full 2GiB RAM enabled in the device
>> tree.
>
> Did it work before these patches got applied initially?

It did, but I cannot remember if we where limiting DRAM to 1GiB
or not.

>> Could we work around this by having 1GiB of memory defined in the
>> 32bit memory and then add the rest of the 3GiB from the >32bit
>> area via LPAE? Will the kernel ever try to allocate DMA memory from
>> anything >32bit?
>
> You can solve the case for dma_alloc_coherent() this way, or by
> setting the mask correctly. It won't help you for dma_map_* though,
> which still requires someone to add support for swiotlb or using
> an IOMMU if present.

We do not have an IOMMU present at the moment. Not sure how
to go about setting a mask on a pci-probed device.


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 16:12           ` Ben Dooks
  0 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 16:12 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/03/14 17:09, Arnd Bergmann wrote:
> On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:
>> On 26/02/14 20:57, Arnd Bergmann wrote:
>>> On Wednesday 26 February 2014 12:48:17 Bjorn Helgaas wrote:
>>>> On Mon, Feb 24, 2014 at 4:00 AM, Arnd Bergmann <arnd@arndb.de> wrote:
>>>>> Hi Magnus,
>>>>>
>>>>> I noticed during randconfig testing that you enabled DMABOUNCE for the
>>>>> pci-rcar-gen2 driver as posted in this patch https://lkml.org/lkml/2014/2/5/30
>>>>>
>>>>> I didn't see the original post unfortunately, but I fear we have to
>>>>> revert it and come up with a better solution, ...
>>>>
>>>> Sounds like I should drop the following patches from my pci/host-rcar
>>>> branch for now?
>>>>
>>>>     PCI: rcar: Add DMABOUNCE support
>>>>     PCI: rcar: Enable BOUNCE in case of HIGHMEM
>>>>     PCI: rcar: Make the Kconfig dependencies more generic
>>>
>>> Sounds good to me. The last patch is actually fine, but you'll have to
>>> fix the context to apply it without the other two.
>>
>> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
>> controllers no longer work with full 2GiB RAM enabled in the device
>> tree.
>
> Did it work before these patches got applied initially?

It did, but I cannot remember if we where limiting DRAM to 1GiB
or not.

>> Could we work around this by having 1GiB of memory defined in the
>> 32bit memory and then add the rest of the 3GiB from the >32bit
>> area via LPAE? Will the kernel ever try to allocate DMA memory from
>> anything >32bit?
>
> You can solve the case for dma_alloc_coherent() this way, or by
> setting the mask correctly. It won't help you for dma_map_* though,
> which still requires someone to add support for swiotlb or using
> an IOMMU if present.

We do not have an IOMMU present at the moment. Not sure how
to go about setting a mask on a pci-probed device.


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 16:12           ` Ben Dooks
  (?)
@ 2014-03-20 16:41             ` Arnd Bergmann
  -1 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 16:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 20 March 2014 17:12:43 Ben Dooks wrote:
> On 20/03/14 17:09, Arnd Bergmann wrote:
> > On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:

> >> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
> >> controllers no longer work with full 2GiB RAM enabled in the device
> >> tree.
> >
> > Did it work before these patches got applied initially?
> 
> It did, but I cannot remember if we where limiting DRAM to 1GiB
> or not.

I would assume you did, or you happened to never actually use
memory higher than that for DMA during tests.

> >> Could we work around this by having 1GiB of memory defined in the
> >> 32bit memory and then add the rest of the 3GiB from the >32bit
> >> area via LPAE? Will the kernel ever try to allocate DMA memory from
> >> anything >32bit?
> >
> > You can solve the case for dma_alloc_coherent() this way, or by
> > setting the mask correctly. It won't help you for dma_map_* though,
> > which still requires someone to add support for swiotlb or using
> > an IOMMU if present.
> 
> We do not have an IOMMU present at the moment. Not sure how
> to go about setting a mask on a pci-probed device.

Ah, right. The mask is a problem because PCI devices assume that
they can do DMA to any 32-bit masked address without calling
dma_set_mask. Your trick to describe the system memory at a different
physical alias would solve this part. Another option might be to
add a quirk in the OHCI/EHCI drivers, if you are able to detect
this special case from the PCI IDs. Whether we can allow the PCI host
controller to just set a mask is an open question. If we decide to go
that route, you should be able to do it using the add_bus() callback
in the host controller driver. However, it would be a departure from
the normal way of doing PCI DMA, and I can't foresee what the implications
would be.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 16:41             ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 16:41 UTC (permalink / raw)
  To: Ben Dooks
  Cc: linux-arm-kernel, Russell King, linux-sh, linux-pci,
	linux-kernel, Magnus Damm, Ben Dooks, Bjorn Helgaas,
	Simon Horman

On Thursday 20 March 2014 17:12:43 Ben Dooks wrote:
> On 20/03/14 17:09, Arnd Bergmann wrote:
> > On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:

> >> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
> >> controllers no longer work with full 2GiB RAM enabled in the device
> >> tree.
> >
> > Did it work before these patches got applied initially?
> 
> It did, but I cannot remember if we where limiting DRAM to 1GiB
> or not.

I would assume you did, or you happened to never actually use
memory higher than that for DMA during tests.

> >> Could we work around this by having 1GiB of memory defined in the
> >> 32bit memory and then add the rest of the 3GiB from the >32bit
> >> area via LPAE? Will the kernel ever try to allocate DMA memory from
> >> anything >32bit?
> >
> > You can solve the case for dma_alloc_coherent() this way, or by
> > setting the mask correctly. It won't help you for dma_map_* though,
> > which still requires someone to add support for swiotlb or using
> > an IOMMU if present.
> 
> We do not have an IOMMU present at the moment. Not sure how
> to go about setting a mask on a pci-probed device.

Ah, right. The mask is a problem because PCI devices assume that
they can do DMA to any 32-bit masked address without calling
dma_set_mask. Your trick to describe the system memory at a different
physical alias would solve this part. Another option might be to
add a quirk in the OHCI/EHCI drivers, if you are able to detect
this special case from the PCI IDs. Whether we can allow the PCI host
controller to just set a mask is an open question. If we decide to go
that route, you should be able to do it using the add_bus() callback
in the host controller driver. However, it would be a departure from
the normal way of doing PCI DMA, and I can't foresee what the implications
would be.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 16:41             ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 16:41 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 20 March 2014 17:12:43 Ben Dooks wrote:
> On 20/03/14 17:09, Arnd Bergmann wrote:
> > On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:

> >> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
> >> controllers no longer work with full 2GiB RAM enabled in the device
> >> tree.
> >
> > Did it work before these patches got applied initially?
> 
> It did, but I cannot remember if we where limiting DRAM to 1GiB
> or not.

I would assume you did, or you happened to never actually use
memory higher than that for DMA during tests.

> >> Could we work around this by having 1GiB of memory defined in the
> >> 32bit memory and then add the rest of the 3GiB from the >32bit
> >> area via LPAE? Will the kernel ever try to allocate DMA memory from
> >> anything >32bit?
> >
> > You can solve the case for dma_alloc_coherent() this way, or by
> > setting the mask correctly. It won't help you for dma_map_* though,
> > which still requires someone to add support for swiotlb or using
> > an IOMMU if present.
> 
> We do not have an IOMMU present at the moment. Not sure how
> to go about setting a mask on a pci-probed device.

Ah, right. The mask is a problem because PCI devices assume that
they can do DMA to any 32-bit masked address without calling
dma_set_mask. Your trick to describe the system memory at a different
physical alias would solve this part. Another option might be to
add a quirk in the OHCI/EHCI drivers, if you are able to detect
this special case from the PCI IDs. Whether we can allow the PCI host
controller to just set a mask is an open question. If we decide to go
that route, you should be able to do it using the add_bus() callback
in the host controller driver. However, it would be a departure from
the normal way of doing PCI DMA, and I can't foresee what the implications
would be.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 16:41             ` Arnd Bergmann
  (?)
@ 2014-03-20 17:25               ` Ben Dooks
  -1 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 17:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/03/14 17:41, Arnd Bergmann wrote:
> On Thursday 20 March 2014 17:12:43 Ben Dooks wrote:
>> On 20/03/14 17:09, Arnd Bergmann wrote:
>>> On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:
>
>>>> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
>>>> controllers no longer work with full 2GiB RAM enabled in the device
>>>> tree.
>>>
>>> Did it work before these patches got applied initially?
>>
>> It did, but I cannot remember if we where limiting DRAM to 1GiB
>> or not.
>
> I would assume you did, or you happened to never actually use
> memory higher than that for DMA during tests.
>
>>>> Could we work around this by having 1GiB of memory defined in the
>>>> 32bit memory and then add the rest of the 3GiB from the >32bit
>>>> area via LPAE? Will the kernel ever try to allocate DMA memory from
>>>> anything >32bit?
>>>
>>> You can solve the case for dma_alloc_coherent() this way, or by
>>> setting the mask correctly. It won't help you for dma_map_* though,
>>> which still requires someone to add support for swiotlb or using
>>> an IOMMU if present.
>>
>> We do not have an IOMMU present at the moment. Not sure how
>> to go about setting a mask on a pci-probed device.
>
> Ah, right. The mask is a problem because PCI devices assume that
> they can do DMA to any 32-bit masked address without calling
> dma_set_mask. Your trick to describe the system memory at a different
> physical alias would solve this part. Another option might be to
> add a quirk in the OHCI/EHCI drivers, if you are able to detect
> this special case from the PCI IDs. Whether we can allow the PCI host
> controller to just set a mask is an open question. If we decide to go
> that route, you should be able to do it using the add_bus() callback
> in the host controller driver. However, it would be a departure from
> the normal way of doing PCI DMA, and I can't foresee what the implications
> would be.

So doing:

static void pci_rcar_fixup(struct pci_dev *dev)
{
	if (dev->bus->ops = &rcar_pci_ops) {
		dev_info(&dev->dev, "applying new dma mask\n");
		dev->dma_mask = DMA_BIT_MASK(31);
	}
}

DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);

Did not work for me :(

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 17:25               ` Ben Dooks
  0 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 17:25 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: linux-arm-kernel, Russell King, linux-sh, linux-pci,
	linux-kernel, Magnus Damm, Ben Dooks, Bjorn Helgaas,
	Simon Horman

On 20/03/14 17:41, Arnd Bergmann wrote:
> On Thursday 20 March 2014 17:12:43 Ben Dooks wrote:
>> On 20/03/14 17:09, Arnd Bergmann wrote:
>>> On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:
>
>>>> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
>>>> controllers no longer work with full 2GiB RAM enabled in the device
>>>> tree.
>>>
>>> Did it work before these patches got applied initially?
>>
>> It did, but I cannot remember if we where limiting DRAM to 1GiB
>> or not.
>
> I would assume you did, or you happened to never actually use
> memory higher than that for DMA during tests.
>
>>>> Could we work around this by having 1GiB of memory defined in the
>>>> 32bit memory and then add the rest of the 3GiB from the >32bit
>>>> area via LPAE? Will the kernel ever try to allocate DMA memory from
>>>> anything >32bit?
>>>
>>> You can solve the case for dma_alloc_coherent() this way, or by
>>> setting the mask correctly. It won't help you for dma_map_* though,
>>> which still requires someone to add support for swiotlb or using
>>> an IOMMU if present.
>>
>> We do not have an IOMMU present at the moment. Not sure how
>> to go about setting a mask on a pci-probed device.
>
> Ah, right. The mask is a problem because PCI devices assume that
> they can do DMA to any 32-bit masked address without calling
> dma_set_mask. Your trick to describe the system memory at a different
> physical alias would solve this part. Another option might be to
> add a quirk in the OHCI/EHCI drivers, if you are able to detect
> this special case from the PCI IDs. Whether we can allow the PCI host
> controller to just set a mask is an open question. If we decide to go
> that route, you should be able to do it using the add_bus() callback
> in the host controller driver. However, it would be a departure from
> the normal way of doing PCI DMA, and I can't foresee what the implications
> would be.

So doing:

static void pci_rcar_fixup(struct pci_dev *dev)
{
	if (dev->bus->ops == &rcar_pci_ops) {
		dev_info(&dev->dev, "applying new dma mask\n");
		dev->dma_mask = DMA_BIT_MASK(31);
	}
}

DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);

Did not work for me :(

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 17:25               ` Ben Dooks
  0 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 17:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/03/14 17:41, Arnd Bergmann wrote:
> On Thursday 20 March 2014 17:12:43 Ben Dooks wrote:
>> On 20/03/14 17:09, Arnd Bergmann wrote:
>>> On Thursday 20 March 2014 16:04:36 Ben Dooks wrote:
>
>>>> As a note, if we now boot a Lager with DT on 3.14-rc3 series the USB
>>>> controllers no longer work with full 2GiB RAM enabled in the device
>>>> tree.
>>>
>>> Did it work before these patches got applied initially?
>>
>> It did, but I cannot remember if we where limiting DRAM to 1GiB
>> or not.
>
> I would assume you did, or you happened to never actually use
> memory higher than that for DMA during tests.
>
>>>> Could we work around this by having 1GiB of memory defined in the
>>>> 32bit memory and then add the rest of the 3GiB from the >32bit
>>>> area via LPAE? Will the kernel ever try to allocate DMA memory from
>>>> anything >32bit?
>>>
>>> You can solve the case for dma_alloc_coherent() this way, or by
>>> setting the mask correctly. It won't help you for dma_map_* though,
>>> which still requires someone to add support for swiotlb or using
>>> an IOMMU if present.
>>
>> We do not have an IOMMU present at the moment. Not sure how
>> to go about setting a mask on a pci-probed device.
>
> Ah, right. The mask is a problem because PCI devices assume that
> they can do DMA to any 32-bit masked address without calling
> dma_set_mask. Your trick to describe the system memory at a different
> physical alias would solve this part. Another option might be to
> add a quirk in the OHCI/EHCI drivers, if you are able to detect
> this special case from the PCI IDs. Whether we can allow the PCI host
> controller to just set a mask is an open question. If we decide to go
> that route, you should be able to do it using the add_bus() callback
> in the host controller driver. However, it would be a departure from
> the normal way of doing PCI DMA, and I can't foresee what the implications
> would be.

So doing:

static void pci_rcar_fixup(struct pci_dev *dev)
{
	if (dev->bus->ops == &rcar_pci_ops) {
		dev_info(&dev->dev, "applying new dma mask\n");
		dev->dma_mask = DMA_BIT_MASK(31);
	}
}

DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);

Did not work for me :(

-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 17:25               ` Ben Dooks
  (?)
@ 2014-03-20 17:31                 ` Jason Gunthorpe
  -1 siblings, 0 replies; 78+ messages in thread
From: Jason Gunthorpe @ 2014-03-20 17:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
> So doing:
> 
> static void pci_rcar_fixup(struct pci_dev *dev)
> {
> 	if (dev->bus->ops = &rcar_pci_ops) {
> 		dev_info(&dev->dev, "applying new dma mask\n");
> 		dev->dma_mask = DMA_BIT_MASK(31);
> 	}
> }
> 
> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
> 
> Did not work for me :(

Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?

Jason

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 17:31                 ` Jason Gunthorpe
  0 siblings, 0 replies; 78+ messages in thread
From: Jason Gunthorpe @ 2014-03-20 17:31 UTC (permalink / raw)
  To: Ben Dooks
  Cc: Arnd Bergmann, Russell King, linux-sh, linux-pci, linux-kernel,
	Magnus Damm, Ben Dooks, Bjorn Helgaas, Simon Horman,
	linux-arm-kernel

On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
> So doing:
> 
> static void pci_rcar_fixup(struct pci_dev *dev)
> {
> 	if (dev->bus->ops == &rcar_pci_ops) {
> 		dev_info(&dev->dev, "applying new dma mask\n");
> 		dev->dma_mask = DMA_BIT_MASK(31);
> 	}
> }
> 
> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
> 
> Did not work for me :(

Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?

Jason

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 17:31                 ` Jason Gunthorpe
  0 siblings, 0 replies; 78+ messages in thread
From: Jason Gunthorpe @ 2014-03-20 17:31 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
> So doing:
> 
> static void pci_rcar_fixup(struct pci_dev *dev)
> {
> 	if (dev->bus->ops == &rcar_pci_ops) {
> 		dev_info(&dev->dev, "applying new dma mask\n");
> 		dev->dma_mask = DMA_BIT_MASK(31);
> 	}
> }
> 
> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
> 
> Did not work for me :(

Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?

Jason

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 17:31                 ` Jason Gunthorpe
  (?)
@ 2014-03-20 17:32                   ` Ben Dooks
  -1 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 17:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/03/14 18:31, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
>> So doing:
>>
>> static void pci_rcar_fixup(struct pci_dev *dev)
>> {
>> 	if (dev->bus->ops = &rcar_pci_ops) {
>> 		dev_info(&dev->dev, "applying new dma mask\n");
>> 		dev->dma_mask = DMA_BIT_MASK(31);
>> 	}
>> }
>>
>> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
>>
>> Did not work for me :(
>
> Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?

Yes, see the print happening, just still PCI OHCI dies horribly.


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 17:32                   ` Ben Dooks
  0 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 17:32 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Arnd Bergmann, Russell King, linux-sh, linux-pci, linux-kernel,
	Magnus Damm, Ben Dooks, Bjorn Helgaas, Simon Horman,
	linux-arm-kernel

On 20/03/14 18:31, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
>> So doing:
>>
>> static void pci_rcar_fixup(struct pci_dev *dev)
>> {
>> 	if (dev->bus->ops == &rcar_pci_ops) {
>> 		dev_info(&dev->dev, "applying new dma mask\n");
>> 		dev->dma_mask = DMA_BIT_MASK(31);
>> 	}
>> }
>>
>> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
>>
>> Did not work for me :(
>
> Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?

Yes, see the print happening, just still PCI OHCI dies horribly.


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 17:32                   ` Ben Dooks
  0 siblings, 0 replies; 78+ messages in thread
From: Ben Dooks @ 2014-03-20 17:32 UTC (permalink / raw)
  To: linux-arm-kernel

On 20/03/14 18:31, Jason Gunthorpe wrote:
> On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
>> So doing:
>>
>> static void pci_rcar_fixup(struct pci_dev *dev)
>> {
>> 	if (dev->bus->ops == &rcar_pci_ops) {
>> 		dev_info(&dev->dev, "applying new dma mask\n");
>> 		dev->dma_mask = DMA_BIT_MASK(31);
>> 	}
>> }
>>
>> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
>>
>> Did not work for me :(
>
> Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?

Yes, see the print happening, just still PCI OHCI dies horribly.


-- 
Ben Dooks				http://www.codethink.co.uk/
Senior Engineer				Codethink - Providing Genius

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 17:32                   ` Ben Dooks
  (?)
@ 2014-03-20 18:29                     ` Arnd Bergmann
  -1 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 18:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 20 March 2014, Ben Dooks wrote:
> On 20/03/14 18:31, Jason Gunthorpe wrote:
> > On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
> >> So doing:
> >>
> >> static void pci_rcar_fixup(struct pci_dev *dev)
> >> {
> >>      if (dev->bus->ops = &rcar_pci_ops) {
> >>              dev_info(&dev->dev, "applying new dma mask\n");
> >>              dev->dma_mask = DMA_BIT_MASK(31);
> >>      }
> >> }
> >>
> >> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
> >>
> >> Did not work for me :(
> >
> > Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?
> 
> Yes, see the print happening, just still PCI OHCI dies horribly.

Shouldn't that mask be 30 instead of 31 when you only support DMA
to the first GB?

Another possibility is that 'EARLY' means it gets applied before
the normal mask is set.

Finally, setting the mask itself is not enough. As I mentioned you
also need to use the swiotlb_dma_ops. Did you already implement
those?

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 18:29                     ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 18:29 UTC (permalink / raw)
  To: Ben Dooks
  Cc: Jason Gunthorpe, Russell King, linux-sh, linux-pci, linux-kernel,
	Magnus Damm, Ben Dooks, Bjorn Helgaas, Simon Horman,
	linux-arm-kernel

On Thursday 20 March 2014, Ben Dooks wrote:
> On 20/03/14 18:31, Jason Gunthorpe wrote:
> > On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
> >> So doing:
> >>
> >> static void pci_rcar_fixup(struct pci_dev *dev)
> >> {
> >>      if (dev->bus->ops == &rcar_pci_ops) {
> >>              dev_info(&dev->dev, "applying new dma mask\n");
> >>              dev->dma_mask = DMA_BIT_MASK(31);
> >>      }
> >> }
> >>
> >> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
> >>
> >> Did not work for me :(
> >
> > Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?
> 
> Yes, see the print happening, just still PCI OHCI dies horribly.

Shouldn't that mask be 30 instead of 31 when you only support DMA
to the first GB?

Another possibility is that 'EARLY' means it gets applied before
the normal mask is set.

Finally, setting the mask itself is not enough. As I mentioned you
also need to use the swiotlb_dma_ops. Did you already implement
those?

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 18:29                     ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 18:29 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 20 March 2014, Ben Dooks wrote:
> On 20/03/14 18:31, Jason Gunthorpe wrote:
> > On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
> >> So doing:
> >>
> >> static void pci_rcar_fixup(struct pci_dev *dev)
> >> {
> >>      if (dev->bus->ops == &rcar_pci_ops) {
> >>              dev_info(&dev->dev, "applying new dma mask\n");
> >>              dev->dma_mask = DMA_BIT_MASK(31);
> >>      }
> >> }
> >>
> >> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
> >>
> >> Did not work for me :(
> >
> > Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?
> 
> Yes, see the print happening, just still PCI OHCI dies horribly.

Shouldn't that mask be 30 instead of 31 when you only support DMA
to the first GB?

Another possibility is that 'EARLY' means it gets applied before
the normal mask is set.

Finally, setting the mask itself is not enough. As I mentioned you
also need to use the swiotlb_dma_ops. Did you already implement
those?

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 18:29                     ` Arnd Bergmann
  (?)
@ 2014-03-20 18:39                       ` Russell King - ARM Linux
  -1 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 18:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 20, 2014 at 07:29:13PM +0100, Arnd Bergmann wrote:
> On Thursday 20 March 2014, Ben Dooks wrote:
> > On 20/03/14 18:31, Jason Gunthorpe wrote:
> > > On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
> > >> So doing:
> > >>
> > >> static void pci_rcar_fixup(struct pci_dev *dev)
> > >> {
> > >>      if (dev->bus->ops = &rcar_pci_ops) {
> > >>              dev_info(&dev->dev, "applying new dma mask\n");
> > >>              dev->dma_mask = DMA_BIT_MASK(31);
> > >>      }
> > >> }
> > >>
> > >> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
> > >>
> > >> Did not work for me :(
> > >
> > > Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?
> > 
> > Yes, see the print happening, just still PCI OHCI dies horribly.
> 
> Shouldn't that mask be 30 instead of 31 when you only support DMA
> to the first GB?
> 
> Another possibility is that 'EARLY' means it gets applied before
> the normal mask is set.
> 
> Finally, setting the mask itself is not enough. As I mentioned you
> also need to use the swiotlb_dma_ops. Did you already implement
> those?

And the above is no way to handle the DMA mask - it will get
overwritten if the PCI device calls pci_set_dma_mask().  Please people,
read the documentation on how this stuff is supposed to work:

  For correct operation, you must interrogate the kernel in your device
  probe routine to see if the DMA controller on the machine can properly
  support the DMA addressing limitation your device has.  It is good
  style to do this even if your device holds the default setting,
  because this shows that you did think about these issues wrt. your
  device.

Yes, OHCI's PCI driver doesn't do that at the moment, but that's not
to say that it won't in the future - at which point hacks like the
above suddenly stop working.

We have other platforms which suffer from this - IXP is one of those
examples where PCI only has 64MB available.  The unfortunate thing is
we haven't yet got a good multi-platform way to handle dma_set_mask()
and dma_set_coherent_mask() with these kinds of restrictions.

However, the right solution here is that OHCI _should_ be making those
calls, and we _should_ be applying the platform specific fixups like
IXP does, but in a more generic way.  That includes using swiotlb
rather than dmabounce.

That all said, what I find most annoying is that Ben has ignored my
question about exactly what the nature of the limitation is on this
platform.  My response to that is - if you're not going to do the
curtesy of answering such a simple question, then you don't get to
patch the kernel for this problem.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 18:39                       ` Russell King - ARM Linux
  0 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 18:39 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ben Dooks, Jason Gunthorpe, linux-sh, linux-pci, linux-kernel,
	Magnus Damm, Ben Dooks, Bjorn Helgaas, Simon Horman,
	linux-arm-kernel

On Thu, Mar 20, 2014 at 07:29:13PM +0100, Arnd Bergmann wrote:
> On Thursday 20 March 2014, Ben Dooks wrote:
> > On 20/03/14 18:31, Jason Gunthorpe wrote:
> > > On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
> > >> So doing:
> > >>
> > >> static void pci_rcar_fixup(struct pci_dev *dev)
> > >> {
> > >>      if (dev->bus->ops == &rcar_pci_ops) {
> > >>              dev_info(&dev->dev, "applying new dma mask\n");
> > >>              dev->dma_mask = DMA_BIT_MASK(31);
> > >>      }
> > >> }
> > >>
> > >> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
> > >>
> > >> Did not work for me :(
> > >
> > > Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?
> > 
> > Yes, see the print happening, just still PCI OHCI dies horribly.
> 
> Shouldn't that mask be 30 instead of 31 when you only support DMA
> to the first GB?
> 
> Another possibility is that 'EARLY' means it gets applied before
> the normal mask is set.
> 
> Finally, setting the mask itself is not enough. As I mentioned you
> also need to use the swiotlb_dma_ops. Did you already implement
> those?

And the above is no way to handle the DMA mask - it will get
overwritten if the PCI device calls pci_set_dma_mask().  Please people,
read the documentation on how this stuff is supposed to work:

  For correct operation, you must interrogate the kernel in your device
  probe routine to see if the DMA controller on the machine can properly
  support the DMA addressing limitation your device has.  It is good
  style to do this even if your device holds the default setting,
  because this shows that you did think about these issues wrt. your
  device.

Yes, OHCI's PCI driver doesn't do that at the moment, but that's not
to say that it won't in the future - at which point hacks like the
above suddenly stop working.

We have other platforms which suffer from this - IXP is one of those
examples where PCI only has 64MB available.  The unfortunate thing is
we haven't yet got a good multi-platform way to handle dma_set_mask()
and dma_set_coherent_mask() with these kinds of restrictions.

However, the right solution here is that OHCI _should_ be making those
calls, and we _should_ be applying the platform specific fixups like
IXP does, but in a more generic way.  That includes using swiotlb
rather than dmabounce.

That all said, what I find most annoying is that Ben has ignored my
question about exactly what the nature of the limitation is on this
platform.  My response to that is - if you're not going to do the
curtesy of answering such a simple question, then you don't get to
patch the kernel for this problem.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 18:39                       ` Russell King - ARM Linux
  0 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 18:39 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 20, 2014 at 07:29:13PM +0100, Arnd Bergmann wrote:
> On Thursday 20 March 2014, Ben Dooks wrote:
> > On 20/03/14 18:31, Jason Gunthorpe wrote:
> > > On Thu, Mar 20, 2014 at 06:25:25PM +0100, Ben Dooks wrote:
> > >> So doing:
> > >>
> > >> static void pci_rcar_fixup(struct pci_dev *dev)
> > >> {
> > >>      if (dev->bus->ops == &rcar_pci_ops) {
> > >>              dev_info(&dev->dev, "applying new dma mask\n");
> > >>              dev->dma_mask = DMA_BIT_MASK(31);
> > >>      }
> > >> }
> > >>
> > >> DECLARE_PCI_FIXUP_EARLY(PCI_ANY_ID, PCI_ANY_ID, pci_rcar_fixup);
> > >>
> > >> Did not work for me :(
> > >
> > > Seems like it should work, do you have CONFIG_PCI_QUIRKS turned on?
> > 
> > Yes, see the print happening, just still PCI OHCI dies horribly.
> 
> Shouldn't that mask be 30 instead of 31 when you only support DMA
> to the first GB?
> 
> Another possibility is that 'EARLY' means it gets applied before
> the normal mask is set.
> 
> Finally, setting the mask itself is not enough. As I mentioned you
> also need to use the swiotlb_dma_ops. Did you already implement
> those?

And the above is no way to handle the DMA mask - it will get
overwritten if the PCI device calls pci_set_dma_mask().  Please people,
read the documentation on how this stuff is supposed to work:

  For correct operation, you must interrogate the kernel in your device
  probe routine to see if the DMA controller on the machine can properly
  support the DMA addressing limitation your device has.  It is good
  style to do this even if your device holds the default setting,
  because this shows that you did think about these issues wrt. your
  device.

Yes, OHCI's PCI driver doesn't do that at the moment, but that's not
to say that it won't in the future - at which point hacks like the
above suddenly stop working.

We have other platforms which suffer from this - IXP is one of those
examples where PCI only has 64MB available.  The unfortunate thing is
we haven't yet got a good multi-platform way to handle dma_set_mask()
and dma_set_coherent_mask() with these kinds of restrictions.

However, the right solution here is that OHCI _should_ be making those
calls, and we _should_ be applying the platform specific fixups like
IXP does, but in a more generic way.  That includes using swiotlb
rather than dmabounce.

That all said, what I find most annoying is that Ben has ignored my
question about exactly what the nature of the limitation is on this
platform.  My response to that is - if you're not going to do the
curtesy of answering such a simple question, then you don't get to
patch the kernel for this problem.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 18:39                       ` Russell King - ARM Linux
  (?)
  (?)
@ 2014-03-20 19:04                         ` Geert Uytterhoeven
  -1 siblings, 0 replies; 78+ messages in thread
From: Geert Uytterhoeven @ 2014-03-20 19:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On Thu, Mar 20, 2014 at 7:39 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> That all said, what I find most annoying is that Ben has ignored my
> question about exactly what the nature of the limitation is on this
> platform.  My response to that is - if you're not going to do the
> curtesy of answering such a simple question, then you don't get to
> patch the kernel for this problem.

Ben did answer:
"If the window is set to 2GiB it can only access 2GiB aligned to 2GiB
 boundary, however DRAM spans a 2GiB boundary (being at 1-3GiB)"

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 19:04                         ` Geert Uytterhoeven
  0 siblings, 0 replies; 78+ messages in thread
From: Geert Uytterhoeven @ 2014-03-20 19:04 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Ben Dooks, Jason Gunthorpe, Linux-sh list,
	linux-pci, linux-kernel, Magnus Damm, Ben Dooks, Bjorn Helgaas,
	Simon Horman, linux-arm-kernel

Hi Russell,

On Thu, Mar 20, 2014 at 7:39 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> That all said, what I find most annoying is that Ben has ignored my
> question about exactly what the nature of the limitation is on this
> platform.  My response to that is - if you're not going to do the
> curtesy of answering such a simple question, then you don't get to
> patch the kernel for this problem.

Ben did answer:
"If the window is set to 2GiB it can only access 2GiB aligned to 2GiB
 boundary, however DRAM spans a 2GiB boundary (being at 1-3GiB)"

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 19:04                         ` Geert Uytterhoeven
  0 siblings, 0 replies; 78+ messages in thread
From: Geert Uytterhoeven @ 2014-03-20 19:04 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Arnd Bergmann, Ben Dooks, Jason Gunthorpe, Linux-sh list,
	linux-pci, linux-kernel, Magnus Damm, Ben Dooks, Bjorn Helgaas,
	Simon Horman, linux-arm-kernel

Hi Russell,

On Thu, Mar 20, 2014 at 7:39 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> That all said, what I find most annoying is that Ben has ignored my
> question about exactly what the nature of the limitation is on this
> platform.  My response to that is - if you're not going to do the
> curtesy of answering such a simple question, then you don't get to
> patch the kernel for this problem.

Ben did answer:
"If the window is set to 2GiB it can only access 2GiB aligned to 2GiB
 boundary, however DRAM spans a 2GiB boundary (being at 1-3GiB)"

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 19:04                         ` Geert Uytterhoeven
  0 siblings, 0 replies; 78+ messages in thread
From: Geert Uytterhoeven @ 2014-03-20 19:04 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Russell,

On Thu, Mar 20, 2014 at 7:39 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> That all said, what I find most annoying is that Ben has ignored my
> question about exactly what the nature of the limitation is on this
> platform.  My response to that is - if you're not going to do the
> curtesy of answering such a simple question, then you don't get to
> patch the kernel for this problem.

Ben did answer:
"If the window is set to 2GiB it can only access 2GiB aligned to 2GiB
 boundary, however DRAM spans a 2GiB boundary (being at 1-3GiB)"

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 18:39                       ` Russell King - ARM Linux
  (?)
@ 2014-03-20 22:11                         ` Arnd Bergmann
  -1 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 22:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 20 March 2014 18:39:32 Russell King - ARM Linux wrote:
> On Thu, Mar 20, 2014 at 07:29:13PM +0100, Arnd Bergmann wrote:
> > On Thursday 20 March 2014, Ben Dooks wrote:

> > Shouldn't that mask be 30 instead of 31 when you only support DMA
> > to the first GB?
> > 
> > Another possibility is that 'EARLY' means it gets applied before
> > the normal mask is set.
> > 
> > Finally, setting the mask itself is not enough. As I mentioned you
> > also need to use the swiotlb_dma_ops. Did you already implement
> > those?
> 
> And the above is no way to handle the DMA mask - it will get
> overwritten if the PCI device calls pci_set_dma_mask().  Please people,
> read the documentation on how this stuff is supposed to work:
> 
>   For correct operation, you must interrogate the kernel in your device
>   probe routine to see if the DMA controller on the machine can properly
>   support the DMA addressing limitation your device has.  It is good
>   style to do this even if your device holds the default setting,
>   because this shows that you did think about these issues wrt. your
>   device.

The trouble is that in the documentation it is also assumed that
the bus can do 32-bit DMA, and only the device may have further
restrictions. Here it is the other way round: the device can do
32-bit DMA, but the bus is limited to 31-bit, and at least the
documentation does not say explicitly how that should be handled.

> Yes, OHCI's PCI driver doesn't do that at the moment, but that's not
> to say that it won't in the future - at which point hacks like the
> above suddenly stop working.

If the OHCI-PCI driver is changed to call
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))), what do you
think should happen? The way I read it, that call must fail on
this platform, causing the probe() function to bail out and not
use the hardware, which is clearly not what we want here.

> We have other platforms which suffer from this - IXP is one of those
> examples where PCI only has 64MB available.  The unfortunate thing is
> we haven't yet got a good multi-platform way to handle dma_set_mask()
> and dma_set_coherent_mask() with these kinds of restrictions.
> 
> However, the right solution here is that OHCI _should_ be making those
> calls, and we _should_ be applying the platform specific fixups like
> IXP does, but in a more generic way.  That includes using swiotlb
> rather than dmabounce.

IXP uses defines a ixp4xx_pci_platform_notify() function that basically
does the same thing that Ben tried adding here, and then it has a
dma_set_coherent_mask function that checks whether the mask that the
device tries to set is even smaller than that, in which case it returns
a failure, but it returns success if the device tries to set a larger
mask. In no case does it try to actually set the mask. While this seems
like a practical approach, I don't see how that actually matches up
with the documentation, e.g. "dma_set_coherent_mask will always be able
to set the same or a smaller mask as the streaming mask." and
"When dma_set_mask() or dma_set_mask_and_coherent() is successful, and
returns zero, the kernel saves away this mask you have provided."

Once we have figured out what the code should actually do, I think
we can solve the question of how to do it for multiplatform. The
approach I would take here is to treat the legacy platforms with
special requirements the same way we do now -- we will never need
multiplatform support on ixp4xx,  CM-X255/CM-X270 or sa1111.
The majority of the platforms we support today are fine with
always assuming that all of memory can be reached at the physical
address, and the exceptions are either the legacy non-multiplatform
ones, or very new platforms with LPAE that can only be probed using
devicetree. I would think that we can support them all with the
same dma_set_mask/dma_set_coherent_mask implementation that
scans the DT for platform and PCI devices.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 22:11                         ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 22:11 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Ben Dooks, Jason Gunthorpe, linux-sh, linux-pci, linux-kernel,
	Magnus Damm, Ben Dooks, Bjorn Helgaas, Simon Horman,
	linux-arm-kernel

On Thursday 20 March 2014 18:39:32 Russell King - ARM Linux wrote:
> On Thu, Mar 20, 2014 at 07:29:13PM +0100, Arnd Bergmann wrote:
> > On Thursday 20 March 2014, Ben Dooks wrote:

> > Shouldn't that mask be 30 instead of 31 when you only support DMA
> > to the first GB?
> > 
> > Another possibility is that 'EARLY' means it gets applied before
> > the normal mask is set.
> > 
> > Finally, setting the mask itself is not enough. As I mentioned you
> > also need to use the swiotlb_dma_ops. Did you already implement
> > those?
> 
> And the above is no way to handle the DMA mask - it will get
> overwritten if the PCI device calls pci_set_dma_mask().  Please people,
> read the documentation on how this stuff is supposed to work:
> 
>   For correct operation, you must interrogate the kernel in your device
>   probe routine to see if the DMA controller on the machine can properly
>   support the DMA addressing limitation your device has.  It is good
>   style to do this even if your device holds the default setting,
>   because this shows that you did think about these issues wrt. your
>   device.

The trouble is that in the documentation it is also assumed that
the bus can do 32-bit DMA, and only the device may have further
restrictions. Here it is the other way round: the device can do
32-bit DMA, but the bus is limited to 31-bit, and at least the
documentation does not say explicitly how that should be handled.

> Yes, OHCI's PCI driver doesn't do that at the moment, but that's not
> to say that it won't in the future - at which point hacks like the
> above suddenly stop working.

If the OHCI-PCI driver is changed to call
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))), what do you
think should happen? The way I read it, that call must fail on
this platform, causing the probe() function to bail out and not
use the hardware, which is clearly not what we want here.

> We have other platforms which suffer from this - IXP is one of those
> examples where PCI only has 64MB available.  The unfortunate thing is
> we haven't yet got a good multi-platform way to handle dma_set_mask()
> and dma_set_coherent_mask() with these kinds of restrictions.
> 
> However, the right solution here is that OHCI _should_ be making those
> calls, and we _should_ be applying the platform specific fixups like
> IXP does, but in a more generic way.  That includes using swiotlb
> rather than dmabounce.

IXP uses defines a ixp4xx_pci_platform_notify() function that basically
does the same thing that Ben tried adding here, and then it has a
dma_set_coherent_mask function that checks whether the mask that the
device tries to set is even smaller than that, in which case it returns
a failure, but it returns success if the device tries to set a larger
mask. In no case does it try to actually set the mask. While this seems
like a practical approach, I don't see how that actually matches up
with the documentation, e.g. "dma_set_coherent_mask will always be able
to set the same or a smaller mask as the streaming mask." and
"When dma_set_mask() or dma_set_mask_and_coherent() is successful, and
returns zero, the kernel saves away this mask you have provided."

Once we have figured out what the code should actually do, I think
we can solve the question of how to do it for multiplatform. The
approach I would take here is to treat the legacy platforms with
special requirements the same way we do now -- we will never need
multiplatform support on ixp4xx,  CM-X255/CM-X270 or sa1111.
The majority of the platforms we support today are fine with
always assuming that all of memory can be reached at the physical
address, and the exceptions are either the legacy non-multiplatform
ones, or very new platforms with LPAE that can only be probed using
devicetree. I would think that we can support them all with the
same dma_set_mask/dma_set_coherent_mask implementation that
scans the DT for platform and PCI devices.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 22:11                         ` Arnd Bergmann
  0 siblings, 0 replies; 78+ messages in thread
From: Arnd Bergmann @ 2014-03-20 22:11 UTC (permalink / raw)
  To: linux-arm-kernel

On Thursday 20 March 2014 18:39:32 Russell King - ARM Linux wrote:
> On Thu, Mar 20, 2014 at 07:29:13PM +0100, Arnd Bergmann wrote:
> > On Thursday 20 March 2014, Ben Dooks wrote:

> > Shouldn't that mask be 30 instead of 31 when you only support DMA
> > to the first GB?
> > 
> > Another possibility is that 'EARLY' means it gets applied before
> > the normal mask is set.
> > 
> > Finally, setting the mask itself is not enough. As I mentioned you
> > also need to use the swiotlb_dma_ops. Did you already implement
> > those?
> 
> And the above is no way to handle the DMA mask - it will get
> overwritten if the PCI device calls pci_set_dma_mask().  Please people,
> read the documentation on how this stuff is supposed to work:
> 
>   For correct operation, you must interrogate the kernel in your device
>   probe routine to see if the DMA controller on the machine can properly
>   support the DMA addressing limitation your device has.  It is good
>   style to do this even if your device holds the default setting,
>   because this shows that you did think about these issues wrt. your
>   device.

The trouble is that in the documentation it is also assumed that
the bus can do 32-bit DMA, and only the device may have further
restrictions. Here it is the other way round: the device can do
32-bit DMA, but the bus is limited to 31-bit, and at least the
documentation does not say explicitly how that should be handled.

> Yes, OHCI's PCI driver doesn't do that at the moment, but that's not
> to say that it won't in the future - at which point hacks like the
> above suddenly stop working.

If the OHCI-PCI driver is changed to call
dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))), what do you
think should happen? The way I read it, that call must fail on
this platform, causing the probe() function to bail out and not
use the hardware, which is clearly not what we want here.

> We have other platforms which suffer from this - IXP is one of those
> examples where PCI only has 64MB available.  The unfortunate thing is
> we haven't yet got a good multi-platform way to handle dma_set_mask()
> and dma_set_coherent_mask() with these kinds of restrictions.
> 
> However, the right solution here is that OHCI _should_ be making those
> calls, and we _should_ be applying the platform specific fixups like
> IXP does, but in a more generic way.  That includes using swiotlb
> rather than dmabounce.

IXP uses defines a ixp4xx_pci_platform_notify() function that basically
does the same thing that Ben tried adding here, and then it has a
dma_set_coherent_mask function that checks whether the mask that the
device tries to set is even smaller than that, in which case it returns
a failure, but it returns success if the device tries to set a larger
mask. In no case does it try to actually set the mask. While this seems
like a practical approach, I don't see how that actually matches up
with the documentation, e.g. "dma_set_coherent_mask will always be able
to set the same or a smaller mask as the streaming mask." and
"When dma_set_mask() or dma_set_mask_and_coherent() is successful, and
returns zero, the kernel saves away this mask you have provided."

Once we have figured out what the code should actually do, I think
we can solve the question of how to do it for multiplatform. The
approach I would take here is to treat the legacy platforms with
special requirements the same way we do now -- we will never need
multiplatform support on ixp4xx,  CM-X255/CM-X270 or sa1111.
The majority of the platforms we support today are fine with
always assuming that all of memory can be reached at the physical
address, and the exceptions are either the legacy non-multiplatform
ones, or very new platforms with LPAE that can only be probed using
devicetree. I would think that we can support them all with the
same dma_set_mask/dma_set_coherent_mask implementation that
scans the DT for platform and PCI devices.

	Arnd

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
  2014-03-20 22:11                         ` Arnd Bergmann
  (?)
@ 2014-03-20 22:50                           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 22:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 20, 2014 at 11:11:04PM +0100, Arnd Bergmann wrote:
> On Thursday 20 March 2014 18:39:32 Russell King - ARM Linux wrote:
> > Yes, OHCI's PCI driver doesn't do that at the moment, but that's not
> > to say that it won't in the future - at which point hacks like the
> > above suddenly stop working.
> 
> If the OHCI-PCI driver is changed to call
> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))), what do you
> think should happen? The way I read it, that call must fail on
> this platform, causing the probe() function to bail out and not
> use the hardware, which is clearly not what we want here.

That depends how you interpret what's going on here.  It's saying
"I may allocate and try and map memory which result in 32-bits
of DMA address".  Under normal situations where there is a 1:1
relationship and a simple mapping done by the DMA API, then yes,
such memory would end up with 32-bits of DMA address.

Now, if the DMA implementation has a way to cope with providing a
restricted DMA address to the device due to bus limitations which
the DMA implementation knows about, then the answer to "can you
cope with memory which would normally result in 32-bits of DMA
address" is yes, even if you can only handle 30 bits.

The reason being is that you're agreeing to the driver using
memory which would be 32-bits, but internally you're guaranteeing
not to violate the bus limitation of 30 bits.

Things get harder with the coherent API, and that's where IXP4xx
has to intercept the call and limit the coherent mask, so the
coherent allocator knows to restrict the memory returned - there's
no way that such memory could be fixed up after the fact.

> IXP uses defines a ixp4xx_pci_platform_notify() function that basically
> does the same thing that Ben tried adding here, and then it has a
> dma_set_coherent_mask function that checks whether the mask that the
> device tries to set is even smaller than that, in which case it returns
> a failure, but it returns success if the device tries to set a larger
> mask.

Yes, IXP has a two-pronged approach because of problem drivers like
OHCI which don't call the DMA mask setting functions.  If everyone
did, then the hooks in the mask setting functions would be all that
it needs, since it could apply its restriction there rather than via
notifiers.

So, what IXP does is it overrides dma_set_coherent_mask() to be a
function which just validates the mask, and doesn't actually allow
the mask to be changed.  This coupled with the notifier sets the
coherent mask to 64M, and ensures that it isn't changed (except by
buggy drivers which access it directly.)

An alternative approach for that would be to accept 64M or larger
masks in dma_set_coherent_mask() but always set the actual coherent
DMA mask to no larger than 64M - but that's only possible where all
drivers properly call dma_set_coherent_mask().

The streaming DMA mask has become more convoluted - it's now set
such that any device which has dmabounce attached to it will ignore
attempts to change the streaming DMA mask value - just like the
above.

> In no case does it try to actually set the mask. While this seems
> like a practical approach, I don't see how that actually matches up
> with the documentation, e.g. "dma_set_coherent_mask will always be able
> to set the same or a smaller mask as the streaming mask." and
> "When dma_set_mask() or dma_set_mask_and_coherent() is successful, and
> returns zero, the kernel saves away this mask you have provided."

IXP4xx used to conform - 6fee48cd330c:

-int
-pci_set_dma_mask(struct pci_dev *dev, u64 mask)
-{
-       if (mask >= SZ_64M - 1 )
-               return 0;
-
-       return -EIO;
-}
-    
-int
-pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
-{
-       if (mask >= SZ_64M - 1 )
-               return 0;
-
-       return -EIO;
-}

Here, both functions have the same behaviour.  The bit about "smaller
mask" is slightly bogus - you can never set a mask that is smaller
than the system's DMA memory pool, because the system never has a way
to guarantee you memory below GFP_DMA allocations.  In this case on
IXP4xx, that is 64MB of memory, so thse functions reject on 64MB or
smaller allocations.

However, because of the bouncing that is implemented, it can accept
drivers asking for larger DMA masks, and because it limits the
coherent memory allocations to 64MB internally within the DMA API,
larger coherent masks are permissible.

So.. there's no problem here.  What is a problem is:

static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
{
        if (dev->archdata.dmabounce)
                return 0;

        return arm_dma_ops.set_dma_mask(dev, dma_mask);
}

which says "I'll accept anything if I'm a DMA bounce-hindered device"
which isn't actually correct, since we should still be rejecting
masks smaller than the system can allocate for.

That's checked for by dma_supported() which should really be in 
the above.

The remaining gotcha for multiplatform is the coherent mask, which
doesn't have a way for it to be intercepted by the platform.  That
means any /good/ driver which calls dma_set_coherent_mask() results
in resetting the coherent mask to the driver specified mask.

> The majority of the platforms we support today are fine with
> always assuming that all of memory can be reached at the physical
> address,

That's only because they're now all fine with 32-bit addresses.

> and the exceptions are either the legacy non-multiplatform
> ones, or very new platforms with LPAE that can only be probed using
> devicetree.

Remember with LPAE and its strange memory setups, physical address !DMA address, and that's going to be something that everyone is going
to /have/ to learn to get used to.  No longer will it be possible to
assume that the two are the same, or you can just crop the bits above
32-bit off to make it fit in a 32-bit controller - consider 8G of
memory on LPAE with a 32-bit DMA controller.

That's where dealing with the DMA mask correctly (representing the number
of bits the /device/ can accept rather than the number of physical address
bits) matters, and thankfully that should now be solved.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* Re: DMABOUNCE in pci-rcar
@ 2014-03-20 22:50                           ` Russell King - ARM Linux
  0 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 22:50 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Ben Dooks, Jason Gunthorpe, linux-sh, linux-pci, linux-kernel,
	Magnus Damm, Ben Dooks, Bjorn Helgaas, Simon Horman,
	linux-arm-kernel

On Thu, Mar 20, 2014 at 11:11:04PM +0100, Arnd Bergmann wrote:
> On Thursday 20 March 2014 18:39:32 Russell King - ARM Linux wrote:
> > Yes, OHCI's PCI driver doesn't do that at the moment, but that's not
> > to say that it won't in the future - at which point hacks like the
> > above suddenly stop working.
> 
> If the OHCI-PCI driver is changed to call
> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))), what do you
> think should happen? The way I read it, that call must fail on
> this platform, causing the probe() function to bail out and not
> use the hardware, which is clearly not what we want here.

That depends how you interpret what's going on here.  It's saying
"I may allocate and try and map memory which result in 32-bits
of DMA address".  Under normal situations where there is a 1:1
relationship and a simple mapping done by the DMA API, then yes,
such memory would end up with 32-bits of DMA address.

Now, if the DMA implementation has a way to cope with providing a
restricted DMA address to the device due to bus limitations which
the DMA implementation knows about, then the answer to "can you
cope with memory which would normally result in 32-bits of DMA
address" is yes, even if you can only handle 30 bits.

The reason being is that you're agreeing to the driver using
memory which would be 32-bits, but internally you're guaranteeing
not to violate the bus limitation of 30 bits.

Things get harder with the coherent API, and that's where IXP4xx
has to intercept the call and limit the coherent mask, so the
coherent allocator knows to restrict the memory returned - there's
no way that such memory could be fixed up after the fact.

> IXP uses defines a ixp4xx_pci_platform_notify() function that basically
> does the same thing that Ben tried adding here, and then it has a
> dma_set_coherent_mask function that checks whether the mask that the
> device tries to set is even smaller than that, in which case it returns
> a failure, but it returns success if the device tries to set a larger
> mask.

Yes, IXP has a two-pronged approach because of problem drivers like
OHCI which don't call the DMA mask setting functions.  If everyone
did, then the hooks in the mask setting functions would be all that
it needs, since it could apply its restriction there rather than via
notifiers.

So, what IXP does is it overrides dma_set_coherent_mask() to be a
function which just validates the mask, and doesn't actually allow
the mask to be changed.  This coupled with the notifier sets the
coherent mask to 64M, and ensures that it isn't changed (except by
buggy drivers which access it directly.)

An alternative approach for that would be to accept 64M or larger
masks in dma_set_coherent_mask() but always set the actual coherent
DMA mask to no larger than 64M - but that's only possible where all
drivers properly call dma_set_coherent_mask().

The streaming DMA mask has become more convoluted - it's now set
such that any device which has dmabounce attached to it will ignore
attempts to change the streaming DMA mask value - just like the
above.

> In no case does it try to actually set the mask. While this seems
> like a practical approach, I don't see how that actually matches up
> with the documentation, e.g. "dma_set_coherent_mask will always be able
> to set the same or a smaller mask as the streaming mask." and
> "When dma_set_mask() or dma_set_mask_and_coherent() is successful, and
> returns zero, the kernel saves away this mask you have provided."

IXP4xx used to conform - 6fee48cd330c:

-int
-pci_set_dma_mask(struct pci_dev *dev, u64 mask)
-{
-       if (mask >= SZ_64M - 1 )
-               return 0;
-
-       return -EIO;
-}
-    
-int
-pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
-{
-       if (mask >= SZ_64M - 1 )
-               return 0;
-
-       return -EIO;
-}

Here, both functions have the same behaviour.  The bit about "smaller
mask" is slightly bogus - you can never set a mask that is smaller
than the system's DMA memory pool, because the system never has a way
to guarantee you memory below GFP_DMA allocations.  In this case on
IXP4xx, that is 64MB of memory, so thse functions reject on 64MB or
smaller allocations.

However, because of the bouncing that is implemented, it can accept
drivers asking for larger DMA masks, and because it limits the
coherent memory allocations to 64MB internally within the DMA API,
larger coherent masks are permissible.

So.. there's no problem here.  What is a problem is:

static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
{
        if (dev->archdata.dmabounce)
                return 0;

        return arm_dma_ops.set_dma_mask(dev, dma_mask);
}

which says "I'll accept anything if I'm a DMA bounce-hindered device"
which isn't actually correct, since we should still be rejecting
masks smaller than the system can allocate for.

That's checked for by dma_supported() which should really be in 
the above.

The remaining gotcha for multiplatform is the coherent mask, which
doesn't have a way for it to be intercepted by the platform.  That
means any /good/ driver which calls dma_set_coherent_mask() results
in resetting the coherent mask to the driver specified mask.

> The majority of the platforms we support today are fine with
> always assuming that all of memory can be reached at the physical
> address,

That's only because they're now all fine with 32-bit addresses.

> and the exceptions are either the legacy non-multiplatform
> ones, or very new platforms with LPAE that can only be probed using
> devicetree.

Remember with LPAE and its strange memory setups, physical address !=
DMA address, and that's going to be something that everyone is going
to /have/ to learn to get used to.  No longer will it be possible to
assume that the two are the same, or you can just crop the bits above
32-bit off to make it fit in a 32-bit controller - consider 8G of
memory on LPAE with a 32-bit DMA controller.

That's where dealing with the DMA mask correctly (representing the number
of bits the /device/ can accept rather than the number of physical address
bits) matters, and thankfully that should now be solved.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

* DMABOUNCE in pci-rcar
@ 2014-03-20 22:50                           ` Russell King - ARM Linux
  0 siblings, 0 replies; 78+ messages in thread
From: Russell King - ARM Linux @ 2014-03-20 22:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Mar 20, 2014 at 11:11:04PM +0100, Arnd Bergmann wrote:
> On Thursday 20 March 2014 18:39:32 Russell King - ARM Linux wrote:
> > Yes, OHCI's PCI driver doesn't do that at the moment, but that's not
> > to say that it won't in the future - at which point hacks like the
> > above suddenly stop working.
> 
> If the OHCI-PCI driver is changed to call
> dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32))), what do you
> think should happen? The way I read it, that call must fail on
> this platform, causing the probe() function to bail out and not
> use the hardware, which is clearly not what we want here.

That depends how you interpret what's going on here.  It's saying
"I may allocate and try and map memory which result in 32-bits
of DMA address".  Under normal situations where there is a 1:1
relationship and a simple mapping done by the DMA API, then yes,
such memory would end up with 32-bits of DMA address.

Now, if the DMA implementation has a way to cope with providing a
restricted DMA address to the device due to bus limitations which
the DMA implementation knows about, then the answer to "can you
cope with memory which would normally result in 32-bits of DMA
address" is yes, even if you can only handle 30 bits.

The reason being is that you're agreeing to the driver using
memory which would be 32-bits, but internally you're guaranteeing
not to violate the bus limitation of 30 bits.

Things get harder with the coherent API, and that's where IXP4xx
has to intercept the call and limit the coherent mask, so the
coherent allocator knows to restrict the memory returned - there's
no way that such memory could be fixed up after the fact.

> IXP uses defines a ixp4xx_pci_platform_notify() function that basically
> does the same thing that Ben tried adding here, and then it has a
> dma_set_coherent_mask function that checks whether the mask that the
> device tries to set is even smaller than that, in which case it returns
> a failure, but it returns success if the device tries to set a larger
> mask.

Yes, IXP has a two-pronged approach because of problem drivers like
OHCI which don't call the DMA mask setting functions.  If everyone
did, then the hooks in the mask setting functions would be all that
it needs, since it could apply its restriction there rather than via
notifiers.

So, what IXP does is it overrides dma_set_coherent_mask() to be a
function which just validates the mask, and doesn't actually allow
the mask to be changed.  This coupled with the notifier sets the
coherent mask to 64M, and ensures that it isn't changed (except by
buggy drivers which access it directly.)

An alternative approach for that would be to accept 64M or larger
masks in dma_set_coherent_mask() but always set the actual coherent
DMA mask to no larger than 64M - but that's only possible where all
drivers properly call dma_set_coherent_mask().

The streaming DMA mask has become more convoluted - it's now set
such that any device which has dmabounce attached to it will ignore
attempts to change the streaming DMA mask value - just like the
above.

> In no case does it try to actually set the mask. While this seems
> like a practical approach, I don't see how that actually matches up
> with the documentation, e.g. "dma_set_coherent_mask will always be able
> to set the same or a smaller mask as the streaming mask." and
> "When dma_set_mask() or dma_set_mask_and_coherent() is successful, and
> returns zero, the kernel saves away this mask you have provided."

IXP4xx used to conform - 6fee48cd330c:

-int
-pci_set_dma_mask(struct pci_dev *dev, u64 mask)
-{
-       if (mask >= SZ_64M - 1 )
-               return 0;
-
-       return -EIO;
-}
-    
-int
-pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
-{
-       if (mask >= SZ_64M - 1 )
-               return 0;
-
-       return -EIO;
-}

Here, both functions have the same behaviour.  The bit about "smaller
mask" is slightly bogus - you can never set a mask that is smaller
than the system's DMA memory pool, because the system never has a way
to guarantee you memory below GFP_DMA allocations.  In this case on
IXP4xx, that is 64MB of memory, so thse functions reject on 64MB or
smaller allocations.

However, because of the bouncing that is implemented, it can accept
drivers asking for larger DMA masks, and because it limits the
coherent memory allocations to 64MB internally within the DMA API,
larger coherent masks are permissible.

So.. there's no problem here.  What is a problem is:

static int dmabounce_set_mask(struct device *dev, u64 dma_mask)
{
        if (dev->archdata.dmabounce)
                return 0;

        return arm_dma_ops.set_dma_mask(dev, dma_mask);
}

which says "I'll accept anything if I'm a DMA bounce-hindered device"
which isn't actually correct, since we should still be rejecting
masks smaller than the system can allocate for.

That's checked for by dma_supported() which should really be in 
the above.

The remaining gotcha for multiplatform is the coherent mask, which
doesn't have a way for it to be intercepted by the platform.  That
means any /good/ driver which calls dma_set_coherent_mask() results
in resetting the coherent mask to the driver specified mask.

> The majority of the platforms we support today are fine with
> always assuming that all of memory can be reached at the physical
> address,

That's only because they're now all fine with 32-bit addresses.

> and the exceptions are either the legacy non-multiplatform
> ones, or very new platforms with LPAE that can only be probed using
> devicetree.

Remember with LPAE and its strange memory setups, physical address !=
DMA address, and that's going to be something that everyone is going
to /have/ to learn to get used to.  No longer will it be possible to
assume that the two are the same, or you can just crop the bits above
32-bit off to make it fit in a 32-bit controller - consider 8G of
memory on LPAE with a 32-bit DMA controller.

That's where dealing with the DMA mask correctly (representing the number
of bits the /device/ can accept rather than the number of physical address
bits) matters, and thankfully that should now be solved.

-- 
FTTC broadband for 0.8mile line: now at 9.7Mbps down 460kbps up... slowly
improving, and getting towards what was expected from it.

^ permalink raw reply	[flat|nested] 78+ messages in thread

end of thread, other threads:[~2014-03-20 22:51 UTC | newest]

Thread overview: 78+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-24 11:00 DMABOUNCE in pci-rcar Arnd Bergmann
2014-02-24 11:00 ` Arnd Bergmann
2014-02-24 11:00 ` Arnd Bergmann
2014-02-24 23:49 ` Magnus Damm
2014-02-24 23:49   ` Magnus Damm
2014-02-24 23:49   ` Magnus Damm
2014-02-24 23:49   ` Magnus Damm
2014-02-25  0:17   ` Russell King - ARM Linux
2014-02-25  0:17     ` Russell King - ARM Linux
2014-02-25  0:17     ` Russell King - ARM Linux
2014-02-25  0:17     ` Russell King - ARM Linux
2014-02-25  2:00     ` Magnus Damm
2014-02-25  2:00       ` Magnus Damm
2014-02-25  2:00       ` Magnus Damm
2014-02-25  2:00       ` Magnus Damm
2014-02-25 15:44       ` Arnd Bergmann
2014-02-25 15:44         ` Arnd Bergmann
2014-02-25 15:44         ` Arnd Bergmann
2014-02-25 15:44         ` Arnd Bergmann
2014-02-25 12:15   ` Arnd Bergmann
2014-02-25 12:15     ` Arnd Bergmann
2014-02-25 12:15     ` Arnd Bergmann
2014-02-25 12:15     ` Arnd Bergmann
2014-02-26 19:48 ` Bjorn Helgaas
2014-02-26 19:48   ` Bjorn Helgaas
2014-02-26 19:48   ` Bjorn Helgaas
2014-02-26 19:57   ` Arnd Bergmann
2014-02-26 19:57     ` Arnd Bergmann
2014-02-26 19:57     ` Arnd Bergmann
2014-02-26 21:02     ` Bjorn Helgaas
2014-02-26 21:02       ` Bjorn Helgaas
2014-02-26 21:02       ` Bjorn Helgaas
2014-02-26 21:52       ` Arnd Bergmann
2014-02-26 21:52         ` Arnd Bergmann
2014-02-26 21:52         ` Arnd Bergmann
2014-03-20 15:04     ` Ben Dooks
2014-03-20 15:04       ` Ben Dooks
2014-03-20 15:04       ` Ben Dooks
2014-03-20 15:26       ` Russell King - ARM Linux
2014-03-20 15:26         ` Russell King - ARM Linux
2014-03-20 15:26         ` Russell King - ARM Linux
2014-03-20 15:36         ` Ben Dooks
2014-03-20 15:36           ` Ben Dooks
2014-03-20 15:36           ` Ben Dooks
2014-03-20 16:09       ` Arnd Bergmann
2014-03-20 16:09         ` Arnd Bergmann
2014-03-20 16:09         ` Arnd Bergmann
2014-03-20 16:12         ` Ben Dooks
2014-03-20 16:12           ` Ben Dooks
2014-03-20 16:12           ` Ben Dooks
2014-03-20 16:41           ` Arnd Bergmann
2014-03-20 16:41             ` Arnd Bergmann
2014-03-20 16:41             ` Arnd Bergmann
2014-03-20 17:25             ` Ben Dooks
2014-03-20 17:25               ` Ben Dooks
2014-03-20 17:25               ` Ben Dooks
2014-03-20 17:31               ` Jason Gunthorpe
2014-03-20 17:31                 ` Jason Gunthorpe
2014-03-20 17:31                 ` Jason Gunthorpe
2014-03-20 17:32                 ` Ben Dooks
2014-03-20 17:32                   ` Ben Dooks
2014-03-20 17:32                   ` Ben Dooks
2014-03-20 18:29                   ` Arnd Bergmann
2014-03-20 18:29                     ` Arnd Bergmann
2014-03-20 18:29                     ` Arnd Bergmann
2014-03-20 18:39                     ` Russell King - ARM Linux
2014-03-20 18:39                       ` Russell King - ARM Linux
2014-03-20 18:39                       ` Russell King - ARM Linux
2014-03-20 19:04                       ` Geert Uytterhoeven
2014-03-20 19:04                         ` Geert Uytterhoeven
2014-03-20 19:04                         ` Geert Uytterhoeven
2014-03-20 19:04                         ` Geert Uytterhoeven
2014-03-20 22:11                       ` Arnd Bergmann
2014-03-20 22:11                         ` Arnd Bergmann
2014-03-20 22:11                         ` Arnd Bergmann
2014-03-20 22:50                         ` Russell King - ARM Linux
2014-03-20 22:50                           ` Russell King - ARM Linux
2014-03-20 22:50                           ` Russell King - ARM Linux

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.