* [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
@ 2016-12-29 20:45 ` Nikita Yushchenko
  0 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2016-12-29 20:45 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Arnd Bergmann, linux-arm-kernel,
	Simon Horman, Bjorn Helgaas, linux-pci, linux-renesas-soc
  Cc: artemi.ivanov, linux-kernel, Nikita Yushchenko

It is possible that PCI device supports 64-bit DMA addressing, and thus
it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
bridge has limitations on inbound transactions addressing. Example of
such setup is NVME SSD device connected to RCAR PCIe controller.

Previously there was attempt to handle this via bus notifier: after
driver is attached to PCI device, bridge driver gets notifier callback,
and resets dma_mask from there. However, this is racy: PCI device driver
could already allocate buffers and/or start i/o in probe routine.
In NVME case, i/o is started in workqueue context, and this race gives
"sometimes works, sometimes not" effect.

Proper solution should make driver's dma_set_mask() call to fail if host
bridge can't support mask being set.

This patch makes __swiotlb_dma_supported() to check mask being set for
PCI device against dma_mask of struct device corresponding to PCI host
bridge (one with name "pciXXXX:YY"), if that dma_mask is set.

This is the least destructive approach: currently dma_mask of that device
object is not used anyhow, thus all existing setups will work as before,
and modification is required only in actually affected components -
driver of particular PCI host bridge, and dma_map_ops of particular
platform.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
---
 arch/arm64/mm/dma-mapping.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 290a84f..49645277 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -28,6 +28,7 @@
 #include <linux/dma-contiguous.h>
 #include <linux/vmalloc.h>
 #include <linux/swiotlb.h>
+#include <linux/pci.h>

 #include <asm/cacheflush.h>

@@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,

 static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
 {
+#ifdef CONFIG_PCI
+	if (dev_is_pci(hwdev)) {
+		struct pci_dev *pdev = to_pci_dev(hwdev);
+		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
+
+		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
+				(mask & (*br->dev.dma_mask)) != mask)
+			return 0;
+	}
+#endif
 	if (swiotlb)
 		return swiotlb_dma_supported(hwdev, mask);
 	return 1;
--
2.1.4

^ permalink raw reply related	[flat|nested] 115+ messages in thread
* [PATCH 2/2] rcar-pcie: set host bridge's DMA mask
  2016-12-29 20:45 ` Nikita Yushchenko
@ 2016-12-29 20:45   ` Nikita Yushchenko
  -1 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2016-12-29 20:45 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, Arnd Bergmann, linux-arm-kernel,
	Simon Horman, Bjorn Helgaas, linux-pci, linux-renesas-soc
  Cc: artemi.ivanov, linux-kernel, Nikita Yushchenko

This gives platform DMA mapping code a chance to disallow setting device
DMA mask to something that host bridge can't support.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
---
 drivers/pci/host/pcie-rcar.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/host/pcie-rcar.c b/drivers/pci/host/pcie-rcar.c
index aca85be..b1edc3c 100644
--- a/drivers/pci/host/pcie-rcar.c
+++ b/drivers/pci/host/pcie-rcar.c
@@ -451,6 +451,7 @@ static int rcar_pcie_enable(struct rcar_pcie *pcie)
 {
 	struct device *dev = pcie->dev;
 	struct pci_bus *bus, *child;
+	struct pci_host_bridge *bridge;
 	LIST_HEAD(res);

 	/* Try setting 5 GT/s link speed */
@@ -480,6 +481,10 @@ static int rcar_pcie_enable(struct rcar_pcie *pcie)
 	list_for_each_entry(child, &bus->children, node)
 		pcie_bus_configure_settings(child);

+	bridge = pci_find_host_bridge(bus);
+	bridge->dev.coherent_dma_mask = DMA_BIT_MASK(32);
+	bridge->dev.dma_mask = &bridge->dev.coherent_dma_mask;
+
 	pci_bus_add_devices(bus);

 	return 0;
--
2.1.4

^ permalink raw reply related	[flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2016-12-29 20:45 ` Nikita Yushchenko
@ 2016-12-29 21:18   ` Arnd Bergmann
  -1 siblings, 0 replies; 115+ messages in thread
From: Arnd Bergmann @ 2016-12-29 21:18 UTC (permalink / raw)
  To: Nikita Yushchenko
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, Simon Horman,
	Bjorn Helgaas, linux-pci, linux-renesas-soc, artemi.ivanov,
	linux-kernel

On Thursday, December 29, 2016 11:45:03 PM CET Nikita Yushchenko wrote:
>
>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> +#ifdef CONFIG_PCI
> +	if (dev_is_pci(hwdev)) {
> +		struct pci_dev *pdev = to_pci_dev(hwdev);
> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
> +
> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
> +				(mask & (*br->dev.dma_mask)) != mask)
> +			return 0;
> +	}
> +#endif
> 	if (swiotlb)
> 		return swiotlb_dma_supported(hwdev, mask);
> 	return 1;

I think it's wrong to make this a special case for PCI. Instead, we should
follow the dma-ranges properties during dma_set_mask() to ensure we don't
set a mask that any of the parents up to the root cannot support.

	Arnd

^ permalink raw reply	[flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2016-12-29 20:45 ` Nikita Yushchenko
@ 2016-12-30  9:46   ` Sergei Shtylyov
  -1 siblings, 0 replies; 115+ messages in thread
From: Sergei Shtylyov @ 2016-12-30  9:46 UTC (permalink / raw)
  To: Nikita Yushchenko, Catalin Marinas, Will Deacon, Arnd Bergmann,
	linux-arm-kernel, Simon Horman, Bjorn Helgaas, linux-pci,
	linux-renesas-soc
  Cc: artemi.ivanov, linux-kernel

Hello!

On 12/29/2016 11:45 PM, Nikita Yushchenko wrote:

> It is possible that PCI device supports 64-bit DMA addressing, and thus
> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host

   Its.

> bridge has limitations on inbound transactions addressing. Example of
> such setup is NVME

   Isn't it called NVMe?

> SSD device connected to RCAR PCIe controller.

   R=Car.

> Previously there was attempt to handle this via bus notifier: after
> driver is attached to PCI device, bridge driver gets notifier callback,
> and resets dma_mask from there. However, this is racy: PCI device driver
> could already allocate buffers and/or start i/o in probe routine.
> In NVME case, i/o is started in workqueue context, and this race gives
> "sometimes works, sometimes not" effect.
>
> Proper solution should make driver's dma_set_mask() call to fail if host
> bridge can't support mask being set.
>
> This patch makes __swiotlb_dma_supported() to check mask being set for

   "To" not needed here.

> PCI device against dma_mask of struct device corresponding to PCI host
> bridge (one with name "pciXXXX:YY"), if that dma_mask is set.
>
> This is the least destructive approach: currently dma_mask of that device
> object is not used anyhow, thus all existing setups will work as before,
> and modification is required only in actually affected components -
> driver of particular PCI host bridge, and dma_map_ops of particular
> platform.
>
> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
> ---
>  arch/arm64/mm/dma-mapping.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 290a84f..49645277 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
[...]
> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
>
>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> +#ifdef CONFIG_PCI
> +	if (dev_is_pci(hwdev)) {
> +		struct pci_dev *pdev = to_pci_dev(hwdev);
> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
> +
> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
> +				(mask & (*br->dev.dma_mask)) != mask)

   Hum, inner parens not necessary?

[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2016-12-30  9:46 ` Sergei Shtylyov
@ 2016-12-30 10:06   ` Sergei Shtylyov
  -1 siblings, 0 replies; 115+ messages in thread
From: Sergei Shtylyov @ 2016-12-30 10:06 UTC (permalink / raw)
  To: Nikita Yushchenko, Catalin Marinas, Will Deacon, Arnd Bergmann,
	linux-arm-kernel, Simon Horman, Bjorn Helgaas, linux-pci,
	linux-renesas-soc
  Cc: artemi.ivanov, linux-kernel

On 12/30/2016 12:46 PM, Sergei Shtylyov wrote:

>> It is possible that PCI device supports 64-bit DMA addressing, and thus
>> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
>
>    Its.
>
>> bridge has limitations on inbound transactions addressing. Example of
>> such setup is NVME
>
>    Isn't it called NVMe?
>
>> SSD device connected to RCAR PCIe controller.
>
>    R=Car.

   Sorry, R-Car. :-)

[...]

MBR, Sergei

^ permalink raw reply	[flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2016-12-29 20:45 ` Nikita Yushchenko
@ 2017-01-03 18:44   ` Will Deacon
  -1 siblings, 0 replies; 115+ messages in thread
From: Will Deacon @ 2017-01-03 18:44 UTC (permalink / raw)
  To: Nikita Yushchenko
  Cc: Catalin Marinas, Arnd Bergmann, linux-arm-kernel, Simon Horman,
	Bjorn Helgaas, linux-pci, linux-renesas-soc, artemi.ivanov,
	linux-kernel

On Thu, Dec 29, 2016 at 11:45:03PM +0300, Nikita Yushchenko wrote:
> It is possible that PCI device supports 64-bit DMA addressing, and thus
> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host
> bridge has limitations on inbound transactions addressing. Example of
> such setup is NVME SSD device connected to RCAR PCIe controller.
>
> Previously there was attempt to handle this via bus notifier: after
> driver is attached to PCI device, bridge driver gets notifier callback,
> and resets dma_mask from there. However, this is racy: PCI device driver
> could already allocate buffers and/or start i/o in probe routine.
> In NVME case, i/o is started in workqueue context, and this race gives
> "sometimes works, sometimes not" effect.
>
> Proper solution should make driver's dma_set_mask() call to fail if host
> bridge can't support mask being set.
>
> This patch makes __swiotlb_dma_supported() to check mask being set for
> PCI device against dma_mask of struct device corresponding to PCI host
> bridge (one with name "pciXXXX:YY"), if that dma_mask is set.
>
> This is the least destructive approach: currently dma_mask of that device
> object is not used anyhow, thus all existing setups will work as before,
> and modification is required only in actually affected components -
> driver of particular PCI host bridge, and dma_map_ops of particular
> platform.
>
> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
> ---
>  arch/arm64/mm/dma-mapping.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 290a84f..49645277 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -28,6 +28,7 @@
>  #include <linux/dma-contiguous.h>
>  #include <linux/vmalloc.h>
>  #include <linux/swiotlb.h>
> +#include <linux/pci.h>
>
>  #include <asm/cacheflush.h>
>
> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt,
>
>  static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  {
> +#ifdef CONFIG_PCI
> +	if (dev_is_pci(hwdev)) {
> +		struct pci_dev *pdev = to_pci_dev(hwdev);
> +		struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus);
> +
> +		if (br->dev.dma_mask && (*br->dev.dma_mask) &&
> +				(mask & (*br->dev.dma_mask)) != mask)
> +			return 0;
> +	}
> +#endif

Hmm, but this makes it look like the problem is both arm64 and swiotlb
specific, when in reality it's not. Perhaps another hack you could try
would be to register a PCI bus notifier in the host bridge looking for
BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child
device before the driver has probed, but adding a dma_set_mask callback
to limit the mask to what you need?

I agree that it would be better if dma_set_mask handled all of this
transparently, but it's all based on the underlying ops rather than the
bus type.

Will

^ permalink raw reply	[flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask 2017-01-03 19:01 ` Nikita Yushchenko (?) @ 2017-01-03 20:13 ` Grygorii Strashko -1 siblings, 0 replies; 115+ messages in thread From: Grygorii Strashko @ 2017-01-03 20:13 UTC (permalink / raw) To: Nikita Yushchenko, Will Deacon Cc: Arnd Bergmann, Catalin Marinas, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, linux-arm-kernel On 01/03/2017 01:01 PM, Nikita Yushchenko wrote: >>> It is possible that PCI device supports 64-bit DMA addressing, and thus >>> it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host >>> bridge has limitations on inbound transactions addressing. Example of >>> such setup is NVME SSD device connected to RCAR PCIe controller. >>> >>> Previously there was attempt to handle this via bus notifier: after >>> driver is attached to PCI device, bridge driver gets notifier callback, >>> and resets dma_mask from there. However, this is racy: PCI device driver >>> could already allocate buffers and/or start i/o in probe routine. >>> In NVME case, i/o is started in workqueue context, and this race gives >>> "sometimes works, sometimes not" effect. >>> >>> Proper solution should make driver's dma_set_mask() call to fail if host >>> bridge can't support mask being set. >>> >>> This patch makes __swiotlb_dma_supported() to check mask being set for >>> PCI device against dma_mask of struct device corresponding to PCI host >>> bridge (one with name "pciXXXX:YY"), if that dma_mask is set. >>> >>> This is the least destructive approach: currently dma_mask of that device >>> object is not used anyhow, thus all existing setups will work as before, >>> and modification is required only in actually affected components - >>> driver of particular PCI host bridge, and dma_map_ops of particular >>> platform. 
>>> >>> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com> >>> --- >>> arch/arm64/mm/dma-mapping.c | 11 +++++++++++ >>> 1 file changed, 11 insertions(+) >>> >>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c >>> index 290a84f..49645277 100644 >>> --- a/arch/arm64/mm/dma-mapping.c >>> +++ b/arch/arm64/mm/dma-mapping.c >>> @@ -28,6 +28,7 @@ >>> #include <linux/dma-contiguous.h> >>> #include <linux/vmalloc.h> >>> #include <linux/swiotlb.h> >>> +#include <linux/pci.h> >>> >>> #include <asm/cacheflush.h> >>> >>> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt, >>> >>> static int __swiotlb_dma_supported(struct device *hwdev, u64 mask) >>> { >>> +#ifdef CONFIG_PCI >>> + if (dev_is_pci(hwdev)) { >>> + struct pci_dev *pdev = to_pci_dev(hwdev); >>> + struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus); >>> + >>> + if (br->dev.dma_mask && (*br->dev.dma_mask) && >>> + (mask & (*br->dev.dma_mask)) != mask) >>> + return 0; >>> + } >>> +#endif >> >> Hmm, but this makes it look like the problem is both arm64 and swiotlb >> specific, when in reality it's not. Perhaps another hack you could try >> would be to register a PCI bus notifier in the host bridge looking for >> BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child >> device before the driver has probed, but adding a dma_set_mask callback >> to limit the mask to what you need? > > This is what Renesas BSP tries to do and it does not work. > > BUS_NOTIFY_BIND_DRIVER arrives after driver's probe routine exits, but > i/o can be started before that. Hm. This is strange statement: really_probe |->driver_sysfs_add |-> blocking_notifier_call_chain(&dev->bus->p->bus_notifier, BUS_NOTIFY_BIND_DRIVER, dev); ... |- ret = drv->probe(dev); ... |- driver_bound(dev); |- blocking_notifier_call_chain(&dev->bus->p->bus_notifier, BUS_NOTIFY_BOUND_DRIVER, dev); Am I missing smth? 
-- regards, -grygorii ^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask 2017-01-03 20:13 ` Grygorii Strashko (?) @ 2017-01-03 20:23 ` Nikita Yushchenko -1 siblings, 0 replies; 115+ messages in thread From: Nikita Yushchenko @ 2017-01-03 20:23 UTC (permalink / raw) To: Grygorii Strashko, Will Deacon Cc: Arnd Bergmann, Catalin Marinas, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, linux-arm-kernel >>>> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c >>>> index 290a84f..49645277 100644 >>>> --- a/arch/arm64/mm/dma-mapping.c >>>> +++ b/arch/arm64/mm/dma-mapping.c >>>> @@ -28,6 +28,7 @@ >>>> #include <linux/dma-contiguous.h> >>>> #include <linux/vmalloc.h> >>>> #include <linux/swiotlb.h> >>>> +#include <linux/pci.h> >>>> >>>> #include <asm/cacheflush.h> >>>> >>>> @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt, >>>> >>>> static int __swiotlb_dma_supported(struct device *hwdev, u64 mask) >>>> { >>>> +#ifdef CONFIG_PCI >>>> + if (dev_is_pci(hwdev)) { >>>> + struct pci_dev *pdev = to_pci_dev(hwdev); >>>> + struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus); >>>> + >>>> + if (br->dev.dma_mask && (*br->dev.dma_mask) && >>>> + (mask & (*br->dev.dma_mask)) != mask) >>>> + return 0; >>>> + } >>>> +#endif >>> >>> Hmm, but this makes it look like the problem is both arm64 and swiotlb >>> specific, when in reality it's not. Perhaps another hack you could try >>> would be to register a PCI bus notifier in the host bridge looking for >>> BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child >>> device before the driver has probed, but adding a dma_set_mask callback >>> to limit the mask to what you need? >> >> This is what Renesas BSP tries to do and it does not work. >> >> BUS_NOTIFY_BIND_DRIVER arrives after driver's probe routine exits, but >> i/o can be started before that. > > Hm. 
> This is strange statement: > really_probe > |->driver_sysfs_add > |-> blocking_notifier_call_chain(&dev->bus->p->bus_notifier, > BUS_NOTIFY_BIND_DRIVER, dev); > ... > |- ret = drv->probe(dev); > ... > |- driver_bound(dev); > |- blocking_notifier_call_chain(&dev->bus->p->bus_notifier, > BUS_NOTIFY_BOUND_DRIVER, dev); > > Am I missing smth? I misinterpreted your message, sorry. The BSP attaches to BUS_NOTIFY_BOUND_DRIVER, not to BUS_NOTIFY_BIND_DRIVER, and simply overwrites the device's dma_mask there. You are suggesting something completely different; I'll check whether your approach is practical. The powerpc architecture currently implements one more approach: it uses the pci_controller structure provided by the host bridge driver, which has a set_dma_mask() hook. Maybe extending this beyond powerpc could be a good idea. However, that would require changing quite a few host bridge drivers, with no gain for most of them... ^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask 2017-01-03 18:44 ` Will Deacon (?) @ 2017-01-03 23:13 ` Arnd Bergmann -1 siblings, 0 replies; 115+ messages in thread From: Arnd Bergmann @ 2017-01-03 23:13 UTC (permalink / raw) To: linux-arm-kernel Cc: Will Deacon, Nikita Yushchenko, Catalin Marinas, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov On Tuesday, January 3, 2017 6:44:44 PM CET Will Deacon wrote: > > @@ -347,6 +348,16 @@ static int __swiotlb_get_sgtable(struct device *dev, struct sg_table *sgt, > > > > static int __swiotlb_dma_supported(struct device *hwdev, u64 mask) > > { > > +#ifdef CONFIG_PCI > > + if (dev_is_pci(hwdev)) { > > + struct pci_dev *pdev = to_pci_dev(hwdev); > > + struct pci_host_bridge *br = pci_find_host_bridge(pdev->bus); > > + > > + if (br->dev.dma_mask && (*br->dev.dma_mask) && > > + (mask & (*br->dev.dma_mask)) != mask) > > + return 0; > > + } > > +#endif > > Hmm, but this makes it look like the problem is both arm64 and swiotlb > specific, when in reality it's not. Perhaps another hack you could try > would be to register a PCI bus notifier in the host bridge looking for > BUS_NOTIFY_BIND_DRIVER, then you could proxy the DMA ops for each child > device before the driver has probed, but adding a dma_set_mask callback > to limit the mask to what you need? > > I agree that it would be better if dma_set_mask handled all of this > transparently, but it's all based on the underlying ops rather than the > bus type. This is what I prototyped a long time ago when this first came up. I still think this needs to be solved properly for all of arm64, not with a PCI specific hack, and in particular not using notifiers. 
	Arnd

commit 9a57d58d116800a535510053136c6dd7a9c26e25
Author: Arnd Bergmann <arnd@arndb.de>
Date:   Tue Nov 17 14:06:55 2015 +0100

    [EXPERIMENTAL] ARM64: check implement dma_set_mask

    Needs work for coherent mask

    Signed-off-by: Arnd Bergmann <arnd@arndb.de>

diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 243ef256b8c9..a57e7bb10e71 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -22,6 +22,7 @@ struct dev_archdata {
 	void *iommu;			/* private IOMMU data */
 #endif
 	bool dma_coherent;
+	u64 parent_dma_mask;
 };

 struct pdev_archdata {

diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 290a84f3351f..aa65875c611b 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -352,6 +352,31 @@ static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
 	return 1;
 }

+static int __swiotlb_set_dma_mask(struct device *dev, u64 mask)
+{
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	/* mask is below swiotlb bounce buffer, so fail */
+	if (!swiotlb_dma_supported(dev, mask))
+		return -EIO;
+
+	/*
+	 * because of the swiotlb, we can return success for
+	 * larger masks, but need to ensure that bounce buffers
+	 * are used above parent_dma_mask, so set that as
+	 * the effective mask.
+	 */
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static struct dma_map_ops swiotlb_dma_ops = {
 	.alloc = __dma_alloc,
 	.free = __dma_free,
@@ -367,6 +392,7 @@ static struct dma_map_ops swiotlb_dma_ops = {
 	.sync_sg_for_device = __swiotlb_sync_sg_for_device,
 	.dma_supported = __swiotlb_dma_supported,
 	.mapping_error = swiotlb_dma_mapping_error,
+	.set_dma_mask = __swiotlb_set_dma_mask,
 };

 static int __init atomic_pool_init(void)
@@ -957,6 +983,18 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (!dev->archdata.dma_ops)
 		dev->archdata.dma_ops = &swiotlb_dma_ops;

+	/*
+	 * we don't yet support buses that have a non-zero mapping.
+	 * Let's hope we won't need it
+	 */
+	WARN_ON(dma_base != 0);
+
+	/*
+	 * Whatever the parent bus can set. A device must not set
+	 * a DMA mask larger than this.
+	 */
+	dev->archdata.parent_dma_mask = size;
+
 	dev->archdata.dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 }

^ permalink raw reply related	[flat|nested] 115+ messages in thread
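The essential behavior of the prototype's __swiotlb_set_dma_mask() is the clamp: a request wider than the parent bus limit still succeeds, but the effective mask is capped so swiotlb bounce buffers cover everything above the limit. A standalone userspace model of just that clamp (the function name is a sketch, not the kernel API):

```c
#include <stdint.h>

/* Model of the clamp inside __swiotlb_set_dma_mask():
 * parent_dma_mask mirrors dev->archdata.parent_dma_mask, and the return
 * value mirrors what would be written to *dev->dma_mask. */
static uint64_t clamp_dma_mask(uint64_t requested, uint64_t parent_dma_mask)
{
	if (requested > parent_dma_mask)
		requested = parent_dma_mask;
	return requested;
}
```

The design choice is that dma_set_mask() reports success for a 64-bit request, while the stored mask silently becomes the bus limit; drivers keep working, with bouncing handling the out-of-range pages.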
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2017-01-03 23:13 ` Arnd Bergmann
@ 2017-01-04  6:24 ` Nikita Yushchenko
  0 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2017-01-04 6:24 UTC (permalink / raw)
To: Arnd Bergmann, linux-arm-kernel
Cc: Will Deacon, Catalin Marinas, linux-kernel, linux-renesas-soc,
    Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov

> commit 9a57d58d116800a535510053136c6dd7a9c26e25
> Author: Arnd Bergmann <arnd@arndb.de>
> Date:   Tue Nov 17 14:06:55 2015 +0100
>
>     [EXPERIMENTAL] ARM64: check implement dma_set_mask
>
>     Needs work for coherent mask
>
>     Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Unfortunately this is far from complete.

> @@ -957,6 +983,18 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> 	if (!dev->archdata.dma_ops)
> 		dev->archdata.dma_ops = &swiotlb_dma_ops;
>
> +	/*
> +	 * we don't yet support buses that have a non-zero mapping.
> +	 * Let's hope we won't need it
> +	 */
> +	WARN_ON(dma_base != 0);
> +
> +	/*
> +	 * Whatever the parent bus can set. A device must not set
> +	 * a DMA mask larger than this.
> +	 */
> +	dev->archdata.parent_dma_mask = size;
> +

... because the size/mask passed here for PCI devices is meaningless.

For OF platforms, this is called via of_dma_configure(), which checks
the dma-ranges property of the node that is the *parent* of the host
bridge. The host bridge currently does not control this at all.

In current device trees, no dma-ranges is defined for nodes that are
parents of PCI host bridges. This makes of_dma_configure() fall back to
a 32-bit size for all devices on all current platforms. Thus applying
this patch will immediately break 64-bit DMA masks on all hardware that
supports them.

Also related: the dma-ranges property used by several PCI host bridges
is *not* compatible with the "legacy" dma-ranges parsed by
of_dma_get_range() - the former uses an additional flags word at the
beginning.

^ permalink raw reply	[flat|nested] 115+ messages in thread
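The objection above rests on the of_dma_configure() fallback: when the host bridge's parent node carries no dma-ranges, the configured DMA window defaults to a 32-bit size, so any wider mask a driver requests would end up clamped. A toy model of that interaction - of_dma_window_size() and effective_mask() are hypothetical helpers, not the kernel functions:

```c
#include <stdint.h>

/* If the parent node has no dma-ranges, fall back to a 32-bit window. */
static uint64_t of_dma_window_size(int have_dma_ranges, uint64_t ranges_size)
{
	return have_dma_ranges ? ranges_size : (1ULL << 32);
}

/* A device requesting a mask wider than the window gets limited to it.
 * Assumes the window size is a power of two, so size - 1 is its mask. */
static uint64_t effective_mask(uint64_t requested, uint64_t window_size)
{
	uint64_t window_mask = window_size - 1;

	return requested > window_mask ? window_mask : requested;
}
```

This illustrates why Nikita expects the prototype to regress 64-bit DMA everywhere the property is absent, and why Arnd considers the 32-bit fallback (backed by swiotlb) the correct conservative default.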
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2017-01-04  6:24 ` Nikita Yushchenko
@ 2017-01-04 13:29 ` Arnd Bergmann
  0 siblings, 0 replies; 115+ messages in thread
From: Arnd Bergmann @ 2017-01-04 13:29 UTC (permalink / raw)
To: Nikita Yushchenko
Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, linux-kernel,
    linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov

On Wednesday, January 4, 2017 9:24:09 AM CET Nikita Yushchenko wrote:
> > commit 9a57d58d116800a535510053136c6dd7a9c26e25
> > Author: Arnd Bergmann <arnd@arndb.de>
> > Date:   Tue Nov 17 14:06:55 2015 +0100
> >
> >     [EXPERIMENTAL] ARM64: check implement dma_set_mask
> >
> >     Needs work for coherent mask
> >
> >     Signed-off-by: Arnd Bergmann <arnd@arndb.de>
>
> Unfortunately this is far from complete.
>
> > @@ -957,6 +983,18 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > 	if (!dev->archdata.dma_ops)
> > 		dev->archdata.dma_ops = &swiotlb_dma_ops;
> >
> > +	/*
> > +	 * we don't yet support buses that have a non-zero mapping.
> > +	 * Let's hope we won't need it
> > +	 */
> > +	WARN_ON(dma_base != 0);
> > +
> > +	/*
> > +	 * Whatever the parent bus can set. A device must not set
> > +	 * a DMA mask larger than this.
> > +	 */
> > +	dev->archdata.parent_dma_mask = size;
> > +
>
> ... because the size/mask passed here for PCI devices is meaningless.
>
> For OF platforms, this is called via of_dma_configure(), which checks
> the dma-ranges property of the node that is the *parent* of the host
> bridge. The host bridge currently does not control this at all.

We need to think about this a bit. Is it actually the PCI host bridge
that limits the ranges here, or the bus that it is connected to? In the
latter case, the caller needs to be adapted to handle both.

> In current device trees, no dma-ranges is defined for nodes that are
> parents of PCI host bridges. This makes of_dma_configure() fall back
> to a 32-bit size for all devices on all current platforms. Thus
> applying this patch will immediately break 64-bit DMA masks on all
> hardware that supports them.

No, it won't break anything: it will just fall back to swiotlb for all
the ones that lack the dma-ranges property. I think this is correct
behavior.

> Also related: the dma-ranges property used by several PCI host bridges
> is *not* compatible with the "legacy" dma-ranges parsed by
> of_dma_get_range() - the former uses an additional flags word at the
> beginning.

Can you elaborate? Do we have PCI host bridges that use wrongly
formatted dma-ranges properties?

	Arnd

^ permalink raw reply	[flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2017-01-04 13:29 ` Arnd Bergmann
@ 2017-01-04 14:30 ` Nikita Yushchenko
  0 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2017-01-04 14:30 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, linux-kernel,
    linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov

>> For OF platforms, this is called via of_dma_configure(), which checks
>> the dma-ranges property of the node that is the *parent* of the host
>> bridge. The host bridge currently does not control this at all.
>
> We need to think about this a bit. Is it actually the PCI host bridge
> that limits the ranges here, or the bus that it is connected to? In
> the latter case, the caller needs to be adapted to handle both.

In the r-car case, I'm not sure what the source of the limitation is at
the physical level.

The pcie-rcar driver configures ranges for PCIe inbound transactions
based on the dma-ranges property in its device tree node. In the
current device tree for this platform, that contains only one range,
and it is in lower memory.

The NVMe driver tries I/O to a kmalloc()ed area. That returns
0x5xxxxxxxx addresses here. As a quick experiment, I tried to add a
second range to pcie-rcar's dma-ranges to cover the 0x5xxxxxxxx area -
but that did not make DMA to high addresses work.

My current understanding is that the host bridge hardware module can't
handle inbound transactions to PCI addresses above 4G - and this
limitation comes from the host bridge itself.

I've read somewhere in the lists that the pcie-rcar hardware is
"32-bit" - but I don't remember where, and don't know the low-level
details. Maybe somebody from linux-renesas can elaborate?

>> In current device trees, no dma-ranges is defined for nodes that are
>> parents of PCI host bridges. This makes of_dma_configure() fall back
>> to a 32-bit size for all devices on all current platforms. Thus
>> applying this patch will immediately break 64-bit DMA masks on all
>> hardware that supports them.
>
> No, it won't break anything: it will just fall back to swiotlb for all
> the ones that lack the dma-ranges property. I think this is correct
> behavior.

I'd say - for all the ones that have parents without a dma-ranges
property.

As of 4.10-rc2, I see only two definitions of wide parent dma-ranges
under arch/arm64/boot/dts/ - in amd/amd-seattle-soc.dtsi and
apm/apm-storm.dtsi. Are these the only arm64 platforms that can do DMA
to high addresses? I'm not an arm64 expert, but I'd be surprised if
that's the case.

>> Also related: the dma-ranges property used by several PCI host bridges
>> is *not* compatible with the "legacy" dma-ranges parsed by
>> of_dma_get_range() - the former uses an additional flags word at the
>> beginning.
>
> Can you elaborate? Do we have PCI host bridges that use wrongly
> formatted dma-ranges properties?

of_dma_get_range() expects the <dma_addr cpu_addr size> format.

pcie-rcar.c, pci-rcar-gen2.c, pci-xgene.c and pcie-iproc.c from
drivers/pci/host/ all parse dma-ranges using of_pci_range_parser, which
uses the <flags pci-addr cpu-addr size> format - i.e. something
different from what of_dma_get_range() uses.

Nikita

^ permalink raw reply	[flat|nested] 115+ messages in thread
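The format mismatch described above can be made concrete: of_dma_get_range() reads <dma_addr cpu_addr size> triplets, while the PCI-style property carries an extra leading flags cell, so feeding one layout to the other parser misinterprets every field. A sketch with two hypothetical parsers over raw 32-bit cells (assuming 2-cell addresses and sizes, already converted to host byte order; these are illustrations, not the kernel parsers):

```c
#include <stdint.h>

/* PCI-style entry: <flags pci_addr cpu_addr size>, 7 cells total. */
struct pci_dma_range { uint32_t flags; uint64_t pci_addr, cpu_addr, size; };

/* "Legacy" entry as of_dma_get_range() expects: <dma_addr cpu_addr size>. */
struct legacy_dma_range { uint64_t dma_addr, cpu_addr, size; };

/* Combine two 32-bit cells into one 64-bit value. */
static uint64_t cells64(const uint32_t *c)
{
	return ((uint64_t)c[0] << 32) | c[1];
}

static struct pci_dma_range parse_pci_dma_range(const uint32_t *c)
{
	struct pci_dma_range r = { c[0], cells64(c + 1), cells64(c + 3),
				   cells64(c + 5) };
	return r;
}

static struct legacy_dma_range parse_legacy_dma_range(const uint32_t *c)
{
	struct legacy_dma_range r = { cells64(c), cells64(c + 2),
				      cells64(c + 4) };
	return r;
}
```

Running the legacy parser over a PCI-style entry absorbs the flags cell into the high half of the address, which is exactly why the two encodings cannot be used interchangeably.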
* [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask @ 2017-01-04 14:30 ` Nikita Yushchenko 0 siblings, 0 replies; 115+ messages in thread From: Nikita Yushchenko @ 2017-01-04 14:30 UTC (permalink / raw) To: linux-arm-kernel >> For OF platforms, this is called via of_dma_configure(), that checks >> dma-ranges of node that is *parent* for host bridge. Host bridge >> currently does not control this at all. > > We need to think about this a bit. Is it actually the PCI host > bridge that limits the ranges here, or the bus that it is connected > to. In the latter case, the caller needs to be adapted to handle > both. In r-car case, I'm not sure what is the source of limitation at physical level. pcie-rcar driver configures ranges for PCIe inbound transactions based on dma-ranges property in it's device tree node. In the current device tree for this platform, that only contains one range and it is in lower memory. NVMe driver tries i/o to kmalloc()ed area. That returns 0x5xxxxxxxx addresses here. As a quick experiment, I tried to add second range to pcie-rcar's dma-ranges to cover 0x5xxxxxxxx area - but that did not make DMA to high addresses working. My current understanding is that host bridge hardware module can't handle inbound transactions to PCI addresses above 4G - and this limitations comes from host bridge itself. I've read somewhere in the lists that pcie-rcar hardware is "32-bit" - but I don't remember where, and don't know lowlevel details. Maybe somebody from linux-renesas can elaborate? >> In current device trees no dma-ranges is defined for nodes that are >> parents to pci host bridges. This will make of_dma_configure() to fall >> back to 32-bit size for all devices on all current platforms. Thus >> applying this patch will immediately break 64-bit dma masks on all >> hardware that supports it. > > No, it won't break it, it will just fall back to swiotlb for all the > ones that are lacking the dma-ranges property. 
I think this is correct > behavior. I'd say - for all ones that have parents without dma-ranges property. As of 4.10-rc2, I see only two definitions of wide parent dma-ranges under arch/arm64/boot/dts/ - in amd/amd-seattle-soc.dtsi and apm/apm-storm.dtsi Are these the only arm64 platforms that can to DMA to high addresses? I'm not arm64 expert but I'd be surprised if that's the case. >> Also related: dma-ranges property used by several pci host bridges is >> *not* compatible with "legacy" dma-ranges parsed by of_get_dma_range() - >> former uses additional flags word at beginning. > > Can you elaborate? Do we have PCI host bridges that use wrongly formatted > dma-ranges properties? of_dma_get_range() expects <dma_addr cpu_addr size> format. pcie-rcar.c, pci-rcar-gen2.c, pci-xgene.c and pcie-iproc.c from drivers/pci/host/ all parse dma-ranges using of_pci_range_parser that uses <flags pci-addr cpu-addr size> format - i.e. something different from what of_dma_get_range() uses. Nikita ^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
@ 2017-01-04 14:30 ` Nikita Yushchenko
  0 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2017-01-04 14:30 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc,
    Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov,
    linux-arm-kernel

>> For OF platforms, this is called via of_dma_configure(), that checks
>> dma-ranges of the node that is the *parent* of the host bridge. The
>> host bridge currently does not control this at all.
>
> We need to think about this a bit. Is it actually the PCI host
> bridge that limits the ranges here, or the bus that it is connected
> to? In the latter case, the caller needs to be adapted to handle
> both.

In the R-Car case, I'm not sure what the source of the limitation is
at the physical level.

The pcie-rcar driver configures ranges for PCIe inbound transactions
based on the dma-ranges property in its device tree node. In the
current device tree for this platform, that contains only one range,
and it is in lower memory.

The NVMe driver tries i/o to a kmalloc()ed area. That returns
0x5xxxxxxxx addresses here. As a quick experiment, I tried to add a
second range to pcie-rcar's dma-ranges to cover the 0x5xxxxxxxx area -
but that did not make DMA to high addresses work.

My current understanding is that the host bridge hardware module can't
handle inbound transactions to PCI addresses above 4G - and this
limitation comes from the host bridge itself.

I've read somewhere in the lists that pcie-rcar hardware is "32-bit" -
but I don't remember where, and don't know the low-level details.
Maybe somebody from linux-renesas can elaborate?

>> In current device trees no dma-ranges is defined for nodes that are
>> parents to pci host bridges. This will make of_dma_configure() fall
>> back to a 32-bit size for all devices on all current platforms. Thus
>> applying this patch will immediately break 64-bit dma masks on all
>> hardware that supports it.
>
> No, it won't break it, it will just fall back to swiotlb for all the
> ones that are lacking the dma-ranges property. I think this is
> correct behavior.

I'd say - for all the ones whose parents lack a dma-ranges property.

As of 4.10-rc2, I see only two definitions of wide parent dma-ranges
under arch/arm64/boot/dts/ - in amd/amd-seattle-soc.dtsi and
apm/apm-storm.dtsi.

Are these the only arm64 platforms that can do DMA to high addresses?
I'm not an arm64 expert, but I'd be surprised if that's the case.

>> Also related: the dma-ranges property used by several pci host
>> bridges is *not* compatible with the "legacy" dma-ranges parsed by
>> of_dma_get_range() - the former uses an additional flags word at the
>> beginning.
>
> Can you elaborate? Do we have PCI host bridges that use wrongly
> formatted dma-ranges properties?

of_dma_get_range() expects <dma_addr cpu_addr size> format.

pcie-rcar.c, pci-rcar-gen2.c, pci-xgene.c and pcie-iproc.c from
drivers/pci/host/ all parse dma-ranges using of_pci_range_parser,
which uses <flags pci-addr cpu-addr size> format - i.e. something
different from what of_dma_get_range() uses.

Nikita

^ permalink raw reply [flat|nested] 115+ messages in thread
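[Editorial note: the format mismatch described above can be illustrated with a hypothetical device-tree fragment. Node names, addresses and sizes below are invented for illustration and not taken from any real .dts file.]

```dts
/* A simple bus: of_dma_get_range() expects plain
 * <dma-addr cpu-addr size> entries, with the address widths set by
 * #address-cells / #size-cells (both 2 here). */
bus@40000000 {
	#address-cells = <2>;
	#size-cells = <2>;
	/* bus address 0x0 maps to CPU address 0x40000000, 1 GiB */
	dma-ranges = <0x0 0x0  0x0 0x40000000  0x0 0x40000000>;
};

/* A PCI host bridge: #address-cells = <3>, so every PCI address
 * starts with a 32-bit flags word (address-space code, prefetchable
 * bit, ...), which is what of_pci_range_parser consumes and plain
 * of_dma_get_range() does not understand. */
pcie@fe000000 {
	#address-cells = <3>;
	#size-cells = <2>;
	/* flags 0x42000000: prefetchable 32-bit memory space */
	dma-ranges = <0x42000000 0x0 0x40000000
		      0x0 0x40000000  0x0 0x40000000>;
};
```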
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2017-01-04 14:30 ` Nikita Yushchenko
@ 2017-01-04 14:46 ` Arnd Bergmann
  0 siblings, 0 replies; 115+ messages in thread
From: Arnd Bergmann @ 2017-01-04 14:46 UTC (permalink / raw)
To: Nikita Yushchenko
Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, linux-kernel,
    linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas,
    artemi.ivanov

On Wednesday, January 4, 2017 5:30:19 PM CET Nikita Yushchenko wrote:
> >> For OF platforms, this is called via of_dma_configure(), that
> >> checks dma-ranges of the node that is the *parent* of the host
> >> bridge. The host bridge currently does not control this at all.
> >
> > We need to think about this a bit. Is it actually the PCI host
> > bridge that limits the ranges here, or the bus that it is connected
> > to? In the latter case, the caller needs to be adapted to handle
> > both.
>
> In the R-Car case, I'm not sure what the source of the limitation is
> at the physical level.
>
> The pcie-rcar driver configures ranges for PCIe inbound transactions
> based on the dma-ranges property in its device tree node. In the
> current device tree for this platform, that contains only one range,
> and it is in lower memory.
>
> The NVMe driver tries i/o to a kmalloc()ed area. That returns
> 0x5xxxxxxxx addresses here. As a quick experiment, I tried to add a
> second range to pcie-rcar's dma-ranges to cover the 0x5xxxxxxxx area
> - but that did not make DMA to high addresses work.
>
> My current understanding is that the host bridge hardware module
> can't handle inbound transactions to PCI addresses above 4G - and
> this limitation comes from the host bridge itself.
>
> I've read somewhere in the lists that pcie-rcar hardware is "32-bit"
> - but I don't remember where, and don't know the low-level details.
> Maybe somebody from linux-renesas can elaborate?

Just a guess, but if the inbound translation windows in the host
bridge are wider than 32-bit, the reason for setting up a single
32-bit window is probably because that is what the parent bus
supports.

> >> In current device trees no dma-ranges is defined for nodes that
> >> are parents to pci host bridges. This will make of_dma_configure()
> >> fall back to a 32-bit size for all devices on all current
> >> platforms. Thus applying this patch will immediately break 64-bit
> >> dma masks on all hardware that supports it.
> >
> > No, it won't break it, it will just fall back to swiotlb for all
> > the ones that are lacking the dma-ranges property. I think this is
> > correct behavior.
>
> I'd say - for all the ones whose parents lack a dma-ranges property.
>
> As of 4.10-rc2, I see only two definitions of wide parent dma-ranges
> under arch/arm64/boot/dts/ - in amd/amd-seattle-soc.dtsi and
> apm/apm-storm.dtsi.
>
> Are these the only arm64 platforms that can do DMA to high addresses?
> I'm not an arm64 expert, but I'd be surprised if that's the case.

It's likely that a few others also do high DMA, but a lot of arm64
chips are actually derived from earlier 32-bit chips and don't even
support any RAM above 4GB, as well as having a lot of 32-bit DMA
masters.

> >> Also related: the dma-ranges property used by several pci host
> >> bridges is *not* compatible with the "legacy" dma-ranges parsed by
> >> of_dma_get_range() - the former uses an additional flags word at
> >> the beginning.
> >
> > Can you elaborate? Do we have PCI host bridges that use wrongly
> > formatted dma-ranges properties?
>
> of_dma_get_range() expects <dma_addr cpu_addr size> format.
>
> pcie-rcar.c, pci-rcar-gen2.c, pci-xgene.c and pcie-iproc.c from
> drivers/pci/host/ all parse dma-ranges using of_pci_range_parser,
> which uses <flags pci-addr cpu-addr size> format - i.e. something
> different from what of_dma_get_range() uses.

The "dma_addr" here is expressed in terms of #address-cells of the bus
it is in, and that is "3" in the case of PCI, where the first 32-bit
word is a bit pattern containing various things and the other two
cells are a 64-bit address.

I think this is correct, but we may need to add some special handling
for parsing PCI host bridges in of_dma_get_range(), to ensure we
actually look at translations for the memory space.

	Arnd

^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2017-01-04 14:46 ` Arnd Bergmann
@ 2017-01-04 15:29 ` Nikita Yushchenko
  0 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2017-01-04 15:29 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, linux-kernel,
    linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas,
    artemi.ivanov

>>>> For OF platforms, this is called via of_dma_configure(), that
>>>> checks dma-ranges of the node that is the *parent* of the host
>>>> bridge. The host bridge currently does not control this at all.
>>>
>>> We need to think about this a bit. Is it actually the PCI host
>>> bridge that limits the ranges here, or the bus that it is connected
>>> to? In the latter case, the caller needs to be adapted to handle
>>> both.
>>
>> In the R-Car case, I'm not sure what the source of the limitation is
>> at the physical level.
>>
>> The pcie-rcar driver configures ranges for PCIe inbound transactions
>> based on the dma-ranges property in its device tree node. In the
>> current device tree for this platform, that contains only one range,
>> and it is in lower memory.
>>
>> The NVMe driver tries i/o to a kmalloc()ed area. That returns
>> 0x5xxxxxxxx addresses here. As a quick experiment, I tried to add a
>> second range to pcie-rcar's dma-ranges to cover the 0x5xxxxxxxx area
>> - but that did not make DMA to high addresses work.
>>
>> My current understanding is that the host bridge hardware module
>> can't handle inbound transactions to PCI addresses above 4G - and
>> this limitation comes from the host bridge itself.
>>
>> I've read somewhere in the lists that pcie-rcar hardware is "32-bit"
>> - but I don't remember where, and don't know the low-level details.
>> Maybe somebody from linux-renesas can elaborate?
>
> Just a guess, but if the inbound translation windows in the host
> bridge are wider than 32-bit, the reason for setting up a single
> 32-bit window is probably because that is what the parent bus
> supports.

Well, anyway, applying a patch similar to yours will fix the
pcie-rcar + nvme case - thus I don't object :) But it can break other
cases ...

But why do you hook into set_dma_mask() and overwrite the mask inside,
instead of hooking into dma_supported() and rejecting the unsupported
mask?

I think the latter is better, because it lets drivers handle the
unsupported high-DMA case, as documented in DMA-API-HOWTO.

^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2017-01-04 15:29 ` Nikita Yushchenko
@ 2017-01-06 11:10 ` Arnd Bergmann
  0 siblings, 0 replies; 115+ messages in thread
From: Arnd Bergmann @ 2017-01-06 11:10 UTC (permalink / raw)
To: Nikita Yushchenko
Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, linux-kernel,
    linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas,
    artemi.ivanov

On Wednesday, January 4, 2017 6:29:39 PM CET Nikita Yushchenko wrote:
> > Just a guess, but if the inbound translation windows in the host
> > bridge are wider than 32-bit, the reason for setting up a single
> > 32-bit window is probably because that is what the parent bus
> > supports.
>
> Well, anyway, applying a patch similar to yours will fix the
> pcie-rcar + nvme case - thus I don't object :) But it can break other
> cases ...
>
> But why do you hook into set_dma_mask() and overwrite the mask
> inside, instead of hooking into dma_supported() and rejecting the
> unsupported mask?
>
> I think the latter is better, because it lets drivers handle the
> unsupported high-DMA case, as documented in DMA-API-HOWTO.

I think the behavior I put in there is required for swiotlb to make
sense; otherwise you would rely on the driver to handle dma_set_mask()
failure gracefully with its own bounce buffers (as network and scsi
drivers do, but others don't).

Having swiotlb or an iommu enabled should result in dma_set_mask()
always succeeding unless the mask is too small to cover the swiotlb
bounce buffer area or the iommu virtual address space. This behavior
is particularly important in case the bus address space is narrower
than 32-bit, as we have to guarantee that the fallback to 32-bit DMA
always succeeds. There are also a lot of drivers that try to set a
64-bit mask but don't implement bounce buffers for streaming mappings
if that fails, and swiotlb is what we use to make those drivers work.

And yes, the API is a horrible mess.

	Arnd

^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2017-01-06 11:10 ` Arnd Bergmann
@ 2017-01-06 13:47 ` Nikita Yushchenko
  0 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2017-01-06 13:47 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, linux-kernel,
    linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas,
    artemi.ivanov

>>> Just a guess, but if the inbound translation windows in the host
>>> bridge are wider than 32-bit, the reason for setting up a single
>>> 32-bit window is probably because that is what the parent bus
>>> supports.

I've re-checked the rcar-pcie hardware documentation. It indeed
mentions that the AXI bus it sits on is 32-bit.

>> Well, anyway, applying a patch similar to yours will fix the
>> pcie-rcar + nvme case - thus I don't object :) But it can break
>> other cases ...
>>
>> But why do you hook into set_dma_mask() and overwrite the mask
>> inside, instead of hooking into dma_supported() and rejecting the
>> unsupported mask?
>>
>> I think the latter is better, because it lets drivers handle the
>> unsupported high-DMA case, as documented in DMA-API-HOWTO.
>
> I think the behavior I put in there is required for swiotlb to make
> sense; otherwise you would rely on the driver to handle
> dma_set_mask() failure gracefully with its own bounce buffers (as
> network and scsi drivers do, but others don't).
>
> Having swiotlb or an iommu enabled should result in dma_set_mask()
> always succeeding unless the mask is too small to cover the swiotlb
> bounce buffer area or the iommu virtual address space. This behavior
> is particularly important in case the bus address space is narrower
> than 32-bit, as we have to guarantee that the fallback to 32-bit DMA
> always succeeds. There are also a lot of drivers that try to set a
> 64-bit mask but don't implement bounce buffers for streaming mappings
> if that fails, and swiotlb is what we use to make those drivers work.
>
> And yes, the API is a horrible mess.

With my patch applied, and thus a 32-bit dma_mask set for the NVMe
device, I do see high addresses passed to the dma_map_*() routines and
handled by swiotlb.

Thus your statement that the "succeed the 64-bit dma_set_mask()
operation but silently replace the mask behind the scenes" behavior is
required for swiotlb to be used does not match reality.

It can be interpreted as breakage elsewhere, but it's hard to point at
a particular "root cause". The entire infrastructure to allocate and
use DMA memory is messy.

Still, the current code does not work, thus a fix is needed.

Perhaps we need to introduce some generic API to "allocate memory best
suited for DMA to a particular device", and fix allocation points (in
drivers, filesystems, etc.) to use it. Such an API could try to
allocate an area that the hardware can DMA to, and fall back to other
memory that can be used via swiotlb or another bounce buffer
implementation.

But for now, we have to stay with dma masks. I will follow up with a
patch based on yours, but with coherent mask handling added.

Nikita

^ permalink raw reply [flat|nested] 115+ messages in thread
* [PATCH] arm64: do not set dma masks that device connection can't handle
  2017-01-06 13:47 ` Nikita Yushchenko
@ 2017-01-06 14:38 ` Nikita Yushchenko
  -1 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2017-01-06 14:38 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, linux-kernel,
	linux-renesas-soc, Simon Horman, Bjorn Helgaas, artemi.ivanov,
	Nikita Yushchenko

It is possible that a device is capable of 64-bit DMA addresses and its
driver tries to set a wide DMA mask, while the bridge or bus used to
connect the device to the system can't handle wide addresses.

With swiotlb, memory above 4G can still be used by drivers for streaming
DMA, but *dev->dma_mask and dev->coherent_dma_mask must keep values that
the hardware can handle physically.

This patch enforces that. Based on the original version by
Arnd Bergmann <arnd@arndb.de>, extended with coherent mask handling.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
CC: Arnd Bergmann <arnd@arndb.de>
---
 arch/arm64/Kconfig              |  3 +++
 arch/arm64/include/asm/device.h |  1 +
 arch/arm64/mm/dma-mapping.c     | 40 ++++++++++++++++++++++++++++++++++++++++
 3 files changed, 44 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1117421..afb2c08 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -216,6 +216,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
 	def_bool y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y
 
diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 243ef25..a57e7bb 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -22,6 +22,7 @@ struct dev_archdata {
 	void *iommu;			/* private IOMMU data */
 #endif
 	bool dma_coherent;
+	u64 parent_dma_mask;
 };
 
 struct pdev_archdata {
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 290a84f..be3632e 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -352,6 +352,31 @@ static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
 	return 1;
 }
 
+static int __swiotlb_set_dma_mask(struct device *dev, u64 mask)
+{
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	/* mask is below swiotlb bounce buffer, so fail */
+	if (!swiotlb_dma_supported(dev, mask))
+		return -EIO;
+
+	/*
+	 * because of the swiotlb, we can return success for
+	 * larger masks, but need to ensure that bounce buffers
+	 * are used above parent_dma_mask, so set that as
+	 * the effective mask.
+	 */
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static struct dma_map_ops swiotlb_dma_ops = {
 	.alloc = __dma_alloc,
 	.free = __dma_free,
@@ -367,8 +392,23 @@ static struct dma_map_ops swiotlb_dma_ops = {
 	.sync_sg_for_device = __swiotlb_sync_sg_for_device,
 	.dma_supported = __swiotlb_dma_supported,
 	.mapping_error = swiotlb_dma_mapping_error,
+	.set_dma_mask = __swiotlb_set_dma_mask,
 };
 
+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+	if (!dma_supported(dev, mask))
+		return -EIO;
+
+	if (get_dma_ops(dev) == &swiotlb_dma_ops &&
+	    mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	dev->coherent_dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
 static int __init atomic_pool_init(void)
 {
 	pgprot_t prot = __pgprot(PROT_NORMAL_NC);
-- 
2.1.4
* [PATCH] arm64: do not set dma masks that device connection can't handle
  2017-01-06 13:47 ` Nikita Yushchenko
@ 2017-01-06 14:45 ` Nikita Yushchenko
  -1 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2017-01-06 14:45 UTC (permalink / raw)
To: Arnd Bergmann
Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, linux-kernel,
	linux-renesas-soc, Simon Horman, Bjorn Helgaas, artemi.ivanov,
	Nikita Yushchenko

It is possible that a device is capable of 64-bit DMA addresses and its
driver tries to set a wide DMA mask, while the bridge or bus used to
connect the device to the system can't handle wide addresses.

With swiotlb, memory above 4G can still be used by drivers for streaming
DMA, but *dev->dma_mask and dev->coherent_dma_mask must keep values that
the hardware can handle physically.

This patch enforces that. Based on the original version by
Arnd Bergmann <arnd@arndb.de>, extended with coherent mask handling.

Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
CC: Arnd Bergmann <arnd@arndb.de>
---

... now with the initially missed change in arch_setup_dma_ops() ...

 arch/arm64/Kconfig              |  3 +++
 arch/arm64/include/asm/device.h |  1 +
 arch/arm64/mm/dma-mapping.c     | 52 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 1117421..afb2c08 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -216,6 +216,9 @@ config NEED_DMA_MAP_STATE
 config NEED_SG_DMA_LENGTH
 	def_bool y
 
+config ARCH_HAS_DMA_SET_COHERENT_MASK
+	def_bool y
+
 config SMP
 	def_bool y
 
diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 243ef25..a57e7bb 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -22,6 +22,7 @@ struct dev_archdata {
 	void *iommu;			/* private IOMMU data */
 #endif
 	bool dma_coherent;
+	u64 parent_dma_mask;
 };
 
 struct pdev_archdata {
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 290a84f..09c7900 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -352,6 +352,31 @@ static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
 	return 1;
 }
 
+static int __swiotlb_set_dma_mask(struct device *dev, u64 mask)
+{
+	/* device is not DMA capable */
+	if (!dev->dma_mask)
+		return -EIO;
+
+	/* mask is below swiotlb bounce buffer, so fail */
+	if (!swiotlb_dma_supported(dev, mask))
+		return -EIO;
+
+	/*
+	 * because of the swiotlb, we can return success for
+	 * larger masks, but need to ensure that bounce buffers
+	 * are used above parent_dma_mask, so set that as
+	 * the effective mask.
+	 */
+	if (mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+
+	*dev->dma_mask = mask;
+
+	return 0;
+}
+
 static struct dma_map_ops swiotlb_dma_ops = {
 	.alloc = __dma_alloc,
 	.free = __dma_free,
@@ -367,8 +392,23 @@ static struct dma_map_ops swiotlb_dma_ops = {
 	.sync_sg_for_device = __swiotlb_sync_sg_for_device,
 	.dma_supported = __swiotlb_dma_supported,
 	.mapping_error = swiotlb_dma_mapping_error,
+	.set_dma_mask = __swiotlb_set_dma_mask,
 };
 
+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+	if (!dma_supported(dev, mask))
+		return -EIO;
+
+	if (get_dma_ops(dev) == &swiotlb_dma_ops &&
+	    mask > dev->archdata.parent_dma_mask)
+		mask = dev->archdata.parent_dma_mask;
+
+	dev->coherent_dma_mask = mask;
+	return 0;
+}
+EXPORT_SYMBOL(dma_set_coherent_mask);
+
 static int __init atomic_pool_init(void)
 {
 	pgprot_t prot = __pgprot(PROT_NORMAL_NC);
@@ -957,6 +997,18 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	if (!dev->archdata.dma_ops)
 		dev->archdata.dma_ops = &swiotlb_dma_ops;
 
+	/*
+	 * we don't yet support buses that have a non-zero mapping.
+	 * Let's hope we won't need it
+	 */
+	WARN_ON(dma_base != 0);
+
+	/*
+	 * Whatever the parent bus can set. A device must not set
+	 * a DMA mask larger than this.
+	 */
+	dev->archdata.parent_dma_mask = size;
+
 	dev->archdata.dma_coherent = coherent;
 	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
 }
-- 
2.1.4
* Re: [PATCH] arm64: do not set dma masks that device connection can't handle
  2017-01-06 14:45 ` Nikita Yushchenko
@ 2017-01-08  7:09 ` Sergei Shtylyov
  -1 siblings, 0 replies; 115+ messages in thread
From: Sergei Shtylyov @ 2017-01-08 7:09 UTC (permalink / raw)
To: Nikita Yushchenko, Arnd Bergmann
Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, linux-kernel,
	linux-renesas-soc, Simon Horman, Bjorn Helgaas, artemi.ivanov

Hello!

On 1/6/2017 5:45 PM, Nikita Yushchenko wrote:

> It is possible that a device is capable of 64-bit DMA addresses and its
> driver tries to set a wide DMA mask, while the bridge or bus used to
> connect the device to the system can't handle wide addresses.
>
> With swiotlb, memory above 4G can still be used by drivers for streaming
> DMA, but *dev->dma_mask and dev->coherent_dma_mask must keep values that
> the hardware can handle physically.
>
> This patch enforces that. Based on the original version by
> Arnd Bergmann <arnd@arndb.de>, extended with coherent mask handling.
>
> Signed-off-by: Nikita Yushchenko <nikita.yoush@cogentembedded.com>
> CC: Arnd Bergmann <arnd@arndb.de>
[...]
> diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
> index 290a84f..09c7900 100644
> --- a/arch/arm64/mm/dma-mapping.c
> +++ b/arch/arm64/mm/dma-mapping.c
> @@ -352,6 +352,31 @@ static int __swiotlb_dma_supported(struct device *hwdev, u64 mask)
>  	return 1;
>  }
>
> +static int __swiotlb_set_dma_mask(struct device *dev, u64 mask)
> +{
> +	/* device is not DMA capable */
> +	if (!dev->dma_mask)
> +		return -EIO;
> +
> +	/* mask is below swiotlb bounce buffer, so fail */
> +	if (!swiotlb_dma_supported(dev, mask))
> +		return -EIO;
> +
> +	/*
> +	 * because of the swiotlb, we can return success for
> +	 * larger masks, but need to ensure that bounce buffers
> +	 * are used above parent_dma_mask, so set that as
> +	 * the effective mask.
> +	 */
> +	if (mask > dev->archdata.parent_dma_mask)
> +		mask = dev->archdata.parent_dma_mask;
> +
> +

   One empty line is enough...

> +	*dev->dma_mask = mask;
> +
> +	return 0;
> +}
> +
>  static struct dma_map_ops swiotlb_dma_ops = {
>  	.alloc = __dma_alloc,
>  	.free = __dma_free,
[...]
> @@ -957,6 +997,18 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>  	if (!dev->archdata.dma_ops)
>  		dev->archdata.dma_ops = &swiotlb_dma_ops;
>
> +	/*
> +	 * we don't yet support buses that have a non-zero mapping.
> +	 * Let's hope we won't need it
> +	 */
> +	WARN_ON(dma_base != 0);
> +
> +	/*
> +	 * Whatever the parent bus can set. A device must not set
> +	 * a DMA mask larger than this.
> +	 */
> +	dev->archdata.parent_dma_mask = size;

   Not 'size - 1'?

> +
>  	dev->archdata.dma_coherent = coherent;
>  	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
>  }

MBR, Sergei
* Re: [PATCH] arm64: do not set dma masks that device connection can't handle
  2017-01-08 7:09 ` Sergei Shtylyov
@ 2017-01-09  6:56 ` Nikita Yushchenko
  -1 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2017-01-09 6:56 UTC (permalink / raw)
To: Sergei Shtylyov, Arnd Bergmann
Cc: linux-arm-kernel, Will Deacon, Catalin Marinas, linux-kernel,
	linux-renesas-soc, Simon Horman, Bjorn Helgaas, artemi.ivanov

>> +	if (mask > dev->archdata.parent_dma_mask)
>> +		mask = dev->archdata.parent_dma_mask;
>> +
>> +
>
> One empty line is enough...

Ok

>> +	/*
>> +	 * Whatever the parent bus can set. A device must not set
>> +	 * a DMA mask larger than this.
>> +	 */
>> +	dev->archdata.parent_dma_mask = size;
>
> Not 'size - 1'?

Good question. Indeed, of_dma_configure() calls arch_setup_dma_ops()
with a size, not a mask, which implies the '- 1' is needed here.

Although a better fix may be to change the caller side, to make the
DMA_BIT_MASK(64) case cleaner.

Will repost the patch.

Nikita
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2017-01-06 13:47 ` Nikita Yushchenko
@ 2017-01-09 14:05 ` Arnd Bergmann
  -1 siblings, 0 replies; 115+ messages in thread
From: Arnd Bergmann @ 2017-01-09 14:05 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Nikita Yushchenko, Catalin Marinas, Will Deacon, linux-kernel,
	linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas,
	artemi.ivanov

On Friday, January 6, 2017 4:47:59 PM CET Nikita Yushchenko wrote:
> >>> Just a guess, but if the inbound translation windows in the host
> >>> bridge are wider than 32-bit, the reason for setting up a single
> >>> 32-bit window is probably because that is what the parent bus supports.
>
> I've re-checked the rcar-pcie hardware documentation.
>
> It does indeed mention that the AXI bus it sits on is 32-bit.
>
> >> Well, anyway, applying a patch similar to yours will fix the pcie-rcar +
> >> nvme case - thus I don't object :)  But it can break other cases ...
> >>
> >> But why do you hook at set_dma_mask() and overwrite the mask inside,
> >> instead of hooking at dma_supported() and rejecting the unsupported mask?
> >>
> >> I think the latter is better, because it lets drivers handle the
> >> unsupported high-DMA case, as documented in DMA-API-HOWTO.
> >
> > I think the behavior I put in there is required for swiotlb to make
> > sense, otherwise you would rely on the driver to handle dma_set_mask()
> > failure gracefully with its own bounce buffers (as network and
> > scsi drivers do but others don't).
> >
> > Having swiotlb or iommu enabled should result in dma_set_mask() always
> > succeeding unless the mask is too small to cover the swiotlb
> > bounce buffer area or the iommu virtual address space. This behavior
> > is particularly important in case the bus address space is narrower
> > than 32-bit, as we have to guarantee that the fallback to 32-bit
> > DMA always succeeds. There are also a lot of drivers that try to
> > set a 64-bit mask but don't implement bounce buffers for streaming
> > mappings if that fails, and swiotlb is what we use to make those
> > drivers work.
> >
> > And yes, the API is a horrible mess.
>
> With my patch applied, and thus a 32-bit dma_mask set for the NVMe device,
> I do see high addresses passed to dma_map_*() routines and handled by
> swiotlb. Thus your statement that the behavior "succeed a 64-bit
> dma_set_mask() operation but silently replace the mask behind the scenes"
> is required for swiotlb to be used does not match reality.

See my point about drivers that don't implement bounce buffering.
Apparently NVMe is one of them, unlike the SATA/SCSI/MMC storage drivers
that do their own thing. The problem again is the inconsistency of the
API.

> It can be interpreted as breakage elsewhere, but it's hard to point at a
> particular root cause. The entire infrastructure for allocating and using
> DMA memory is messy.

Absolutely. What I think happened here, in chronological order:

- In the old days, 64-bit architectures tended to use an IOMMU
  all the time to work around 32-bit limitations on DMA masters.

- Some architectures had no IOMMU that fully solved this, and the
  dma-mapping API required drivers to set the right mask and check the
  return code. If this failed, the driver needed to use its own bounce
  buffers, as network and scsi do. See also the grossly misnamed
  "PCI_DMA_BUS_IS_PHYS" macro.

- As we never had support for bounce buffers in all drivers, and early
  64-bit Intel machines had no IOMMU, the swiotlb code was introduced
  as a workaround, so we can use the IOMMU case without driver-specific
  bounce buffers everywhere.

- As most of the important 64-bit architectures (x86, arm64, powerpc)
  now always have either IOMMU or swiotlb enabled, drivers like NVMe
  started relying on it, and no longer handle dma_set_mask() failure
  properly.

We may need to audit how drivers typically handle dma_set_mask()
failure. The NVMe driver in its current state will probably cause
silent data corruption when used on a 64-bit architecture that has a
32-bit bus but neither swiotlb nor iommu enabled at runtime.

I would argue that the driver should be fixed to either refuse working
in that configuration to avoid data corruption, or to implement bounce
buffering like SCSI does. If we make it simply not work, then your
suggestion of making dma_set_mask() fail will break your system in a
different way.

> Still, the current code does not work, thus a fix is needed.
>
> Perhaps we need to introduce some generic API to "allocate memory best
> suited for DMA to a particular device", and fix the allocation points (in
> drivers, filesystems, etc.) to use it. Such an API could try to allocate
> an area that the hardware can DMA to directly, and fall back to other
> memory that can be used via swiotlb or another bounce buffer
> implementation.

The DMA mapping API is meant to do this, but we can definitely improve
it or clarify some of the rules.

> But for now, we have to stay with DMA masks. I will follow up with a
> patch based on yours, but with coherent mask handling added.

Ok.

	Arnd
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2017-01-09 14:05 ` Arnd Bergmann
@ 2017-01-09 20:34 ` Nikita Yushchenko
  0 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2017-01-09 20:34 UTC (permalink / raw)
To: Arnd Bergmann, linux-arm-kernel
Cc: Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc,
    Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Keith Busch,
    Jens Axboe, Christoph Hellwig, Sagi Grimberg, linux-nvme

[CCing NVMe maintainers since we are discussing issues in that driver]

>> With my patch applied, and thus a 32-bit dma_mask set for the NVMe
>> device, I do see high addresses passed to dma_map_*() routines and
>> handled by swiotlb. Thus your statement that the behavior "succeed
>> the 64-bit dma_set_mask() operation but silently replace the mask
>> behind the scenes" is required for swiotlb to be used does not
>> match reality.
>
> See my point about drivers that don't implement bounce buffering.
> Apparently NVMe is one of them, unlike the SATA/SCSI/MMC storage
> drivers that do their own thing.

I believe the bounce buffering code you refer to is not in
SATA/SCSI/MMC but in the block layer; in particular, it should be
controlled by blk_queue_bounce_limit(). [Yes, there is
CONFIG_MMC_BLOCK_BOUNCE, but it is something completely different,
namely request merging for hardware not supporting scatter-gather.]
And NVMe also uses the block layer and thus should get the same
support.

But blk_queue_bounce_limit() is somewhat broken: it has very strange
code under #if BITS_PER_LONG == 64 that makes setting max_addr to
0xffffffff not work if max_low_pfn is above 4G.

Maybe fixing that, together with making NVMe use this API, could stop
it from issuing dma_map()s of addresses beyond the mask.
> What I think happened here in chronological order is:
>
> - In the old days, 64-bit architectures tended to use an IOMMU all
>   the time to work around 32-bit limitations on DMA masters
> - Some architectures had no IOMMU that fully solved this and the
>   dma-mapping API required drivers to set the right mask and check
>   the return code. If this failed, the driver needed to use its own
>   bounce buffers as network and scsi do. See also the grossly
>   misnamed "PCI_DMA_BUS_IS_PHYS" macro.
> - As we never had support for bounce buffers in all drivers, and
>   early 64-bit Intel machines had no IOMMU, the swiotlb code was
>   introduced as a workaround, so we can use the IOMMU case without
>   driver-specific bounce buffers everywhere
> - As most of the important 64-bit architectures (x86, arm64,
>   powerpc) now always have either an IOMMU or swiotlb enabled,
>   drivers like NVMe started relying on it, and no longer handle a
>   dma_set_mask failure properly.

... and architectures started to add to this breakage by not handling
dma_set_mask() as documented.

As for PCI_DMA_BUS_IS_PHYS - ironically, what all current usages of
this macro in the kernel do is *disable* bounce buffers in the block
layer if PCI_DMA_BUS_IS_PHYS is zero. Defining it to zero (as arm64
currently does) on a system with memory above 4G makes all block
drivers depend on swiotlb (or an IOMMU). The affected drivers are
SCSI and IDE.

> We may need to audit how drivers typically handle dma_set_mask()
> failure. The NVMe driver in its current state will probably cause
> silent data corruption when used on a 64-bit architecture that has
> a 32-bit bus but neither swiotlb nor iommu enabled at runtime.

With the current code, NVMe causes system memory breakage even if
swiotlb is there - because its dma_set_mask_and_coherent(DMA_BIT_MASK(64))
call has the effect of silently disabling swiotlb.
> I would argue that the driver should be fixed either to refuse to
> work in that configuration, to avoid data corruption, or to
> implement bounce buffering like SCSI does.

The difference from "SCSI" (actually, from the block drivers that
work) is exactly that dma_set_mask_and_coherent(DMA_BIT_MASK(64))
call: a driver that does not make it works; a driver that does make
it fails. Per the documentation, a driver *should* make that call if
its hardware supports 64-bit DMA, and the platform *should* either
fail the call, or ensure that 64-bit addresses can be dma_map()ed
successfully.

So what we have on arm64 is: drivers that follow the documented
procedure fail, while drivers that don't follow it work. That's
nonsense.

> If we make it
> simply not work, then your suggestion of making dma_set_mask()
> fail will break your system in a different way.

A proper fix should fix *both* the architecture and NVMe:

- the architecture should stop breaking 64-bit DMA when a driver
  attempts to set a 64-bit dma mask,
- NVMe should issue a proper blk_queue_bounce_limit() call based on
  the actually-set mask,
- and blk_queue_bounce_limit() should also be fixed to actually set a
  0xffffffff limit, instead of replacing it with
  (max_low_pfn << PAGE_SHIFT) as it does now.

>> Still, the current code does not work, thus a fix is needed.
>>
>> Perhaps we need to introduce some generic API to "allocate memory
>> best suited for DMA to a particular device", and fix allocation
>> points (in drivers, filesystems, etc.) to use it. Such an API could
>> try to allocate an area that can be DMAed by the hardware, and fall
>> back to other memory that can be used via swiotlb or another bounce
>> buffer implementation.
>
> The DMA mapping API is meant to do this, but we can definitely
> improve it or clarify some of the rules.

The DMA mapping API can't help here: it's about mapping, not about
allocation. What I mean is some API to allocate memory for use with
streaming DMA in such a way that bounce buffers won't be needed.
There are many cases where, at buffer allocation time, it is already
known that the buffer will be used for DMA with a particular device.
Bounce buffers will still be needed in cases where no such
information is available at allocation time, or when there is no
directly-DMAable memory available at allocation time.

>> But for now, we have to stay with DMA masks. Will follow up with a
>> patch based on yours, but with coherent mask handling added.
>
> Ok.

Already posted. Can we have that merged? At least it will make things
stop breaking memory and start working.

Nikita

^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
  2017-01-09 20:34 ` Nikita Yushchenko
@ 2017-01-09 20:57 ` Christoph Hellwig
  0 siblings, 0 replies; 115+ messages in thread
From: Christoph Hellwig @ 2017-01-09 20:57 UTC (permalink / raw)
To: Nikita Yushchenko
Cc: Arnd Bergmann, linux-arm-kernel, Catalin Marinas, Will Deacon,
    linux-kernel, linux-renesas-soc, Simon Horman, linux-pci,
    Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe,
    Christoph Hellwig, Sagi Grimberg, linux-nvme

On Mon, Jan 09, 2017 at 11:34:55PM +0300, Nikita Yushchenko wrote:
> I believe the bounce buffering code you refer to is not in
> SATA/SCSI/MMC but in the block layer; in particular, it should be
> controlled by blk_queue_bounce_limit(). [Yes, there is
> CONFIG_MMC_BLOCK_BOUNCE, but it is something completely different,
> namely request merging for hardware not supporting scatter-gather.]
> And NVMe also uses the block layer and thus should get the same
> support.

NVMe shouldn't have to call blk_queue_bounce_limit -
blk_queue_bounce_limit is there to set the DMA addressing limit of
the device. NVMe devices must support unlimited 64-bit addressing,
so calling blk_queue_bounce_limit from NVMe does not make sense.

That being said, the current default for a queue set up without a
call to blk_queue_make_request does the wrong thing on highmem
setups, so we should fix it. In fact, BLK_BOUNCE_HIGH as-is doesn't
really make much sense these days, as no driver should ever
dereference pages passed to it directly.

> Maybe fixing that, together with making NVMe use this API, could
> stop it from issuing dma_map()s of addresses beyond the mask.

NVMe should never bounce; the fact that it currently possibly does
for highmem pages is a bug.

> As for PCI_DMA_BUS_IS_PHYS - ironically, what all current usages of
> this macro in the kernel do is *disable* bounce buffers in the block
> layer if PCI_DMA_BUS_IS_PHYS is zero.
That's not ironic but the whole point of the macro (the horrible
name, and the fact that it should be a dma_ops setting, aside).

> - the architecture should stop breaking 64-bit DMA when a driver
>   attempts to set a 64-bit dma mask,
> - NVMe should issue a proper blk_queue_bounce_limit() call based on
>   the actually-set mask,

Or even better, remove the call to dma_set_mask_and_coherent with
DMA_BIT_MASK(32). NVMe is designed around having proper 64-bit DMA
addressing; there is no point in trying to pretend it works without
that.

> - and blk_queue_bounce_limit() should also be fixed to actually set
>   a 0xffffffff limit, instead of replacing it with
>   (max_low_pfn << PAGE_SHIFT) as it does now.

We need to kill off BLK_BOUNCE_HIGH; it just doesn't make sense to
mix the highmem aspect with the addressing limits. In fact, the whole
block bouncing scheme doesn't make much sense at all these days; we
should rely on swiotlb instead.

> What I mean is some API to allocate memory for use with streaming
> DMA in such a way that bounce buffers won't be needed. There are
> many cases where, at buffer allocation time, it is already known
> that the buffer will be used for DMA with a particular device.
> Bounce buffers will still be needed in cases where no such
> information is available at allocation time, or when there is no
> directly-DMAable memory available at allocation time.

For block I/O that is never the case.

^ permalink raw reply [flat|nested] 115+ messages in thread
* NVMe vs DMA addressing limitations
2017-01-10 6:47 ` Nikita Yushchenko
From: Nikita Yushchenko @ 2017-01-10 6:47 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Arnd Bergmann, linux-arm-kernel, Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme

>> I believe the bounce buffering code you refer to is not in SATA/SCSI/MMC
>> but in the block layer; in particular it should be controlled by
>> blk_queue_bounce_limit(). [Yes, there is CONFIG_MMC_BLOCK_BOUNCE, but it
>> is something completely different, namely request merging for hw not
>> supporting scatter-gather.] NVMe also uses the block layer and thus
>> should get the same support.
>
> NVMe shouldn't have to call blk_queue_bounce_limit -
> blk_queue_bounce_limit is for setting the DMA addressing limit of the
> device. NVMe devices must support unlimited 64-bit addressing, so
> calling blk_queue_bounce_limit from NVMe does not make sense.

I'm now working with HW that:
- is in no way "low end" or "obsolete" - it has 4G of RAM and 8 CPU
cores, and is still being manufactured and developed,
- has 75% of its RAM located beyond the first 4G of address space,
- can't physically handle incoming PCIe transactions addressed to memory
beyond 4G.

Swiotlb is used there, sure (once a bug in the arm64 arch code is
patched). But that setup still has at least two issues:

(1) it constantly runs out of swiotlb space; logs are full of warnings
despite rate limiting,

(2) it runs far suboptimally due to bounce-buffering almost all i/o,
despite lots of free memory in the area where direct DMA is possible.

I'm looking for a proper way to address these. Shooting the HW designer,
as you suggested elsewhere, doesn't look like a practical solution. Any
better ideas?

Per my current understanding, blk-level bounce buffering will at least
help with (1) - if done properly it will allocate bounce buffers within
the entire memory below 4G, not within the dedicated swiotlb space
(which is small, and enlarging it makes memory permanently unavailable
for other use). This looks simple and safe (in the sense of not breaking
unrelated use cases).

Addressing (2) looks much more difficult, because a different memory
allocation policy is required for that.

> That being said, the current default for a queue without a call to
> blk_queue_make_request does the wrong thing on highmem setups, so we
> should fix it. In fact BLK_BOUNCE_HIGH as-is doesn't really make much
> sense these days, as no driver should ever dereference pages passed to
> it directly.
>
>> Maybe fixing that, together with making NVMe use this API, could stop
>> it from issuing dma_map()s of addresses beyond the mask.
>
> NVMe should never bounce; the fact that it currently possibly does for
> highmem pages is a bug.

The entire topic is absolutely not related to highmem (i.e. memory not
directly addressable by a 32-bit kernel). What we are discussing is a
hw-originated restriction on where DMA is possible.

> Or even better, remove the call to dma_set_mask_and_coherent with
> DMA_BIT_MASK(32). NVMe is designed around having proper 64-bit DMA
> addressing; there is no point in trying to pretend it works without that

Are you claiming that the NVMe driver in mainline is intentionally
designed not to work on HW that can't do DMA to the entire 64-bit space?
Such setups do exist, and there is interest in making them work.

> We need to kill off BLK_BOUNCE_HIGH; it just doesn't make sense to mix
> the highmem aspect with the addressing limits. In fact the whole block
> bouncing scheme doesn't make much sense at all these days; we should
> rely on swiotlb instead.

I agree that centralized bounce buffering is better than
subsystem-implemented bounce buffering.

I still claim that even better - especially from a performance point of
view - is some memory allocation policy that is aware of HW limitations
and avoids bounce buffering at least when that is possible.

>> What I mean is some API to allocate memory for use with streaming DMA
>> in such a way that bounce buffers won't be needed. There are many
>> cases when, at buffer allocation time, it is already known that the
>> buffer will be used for DMA with a particular device. Bounce buffers
>> will still be needed in cases when no such information is available at
>> allocation time, or when there is no directly-DMAable memory available
>> at allocation time.
>
> For block I/O that is never the case.

Quite a few pages used for block I/O are allocated by filemap code - and
at allocation point it is known what inode the page is being allocated
for. If this inode is from a filesystem located on a known device with
known DMA limitations, this knowledge can be used to allocate a page
that can be DMAed directly.

Sure, there are lots of cases when at allocation time there is no idea
what device will run DMA on the page being allocated, or perhaps the
page is going to be shared, or whatever. Such cases unavoidably require
bounce buffers if the page ends up being used with a device with DMA
limitations. But still there are cases when better allocation can remove
the need for bounce buffers - without any hurt for other cases.

Nikita
* Re: NVMe vs DMA addressing limitations
2017-01-10 7:07 ` Christoph Hellwig
From: Christoph Hellwig @ 2017-01-10 7:07 UTC (permalink / raw)
To: Nikita Yushchenko
Cc: Christoph Hellwig, Arnd Bergmann, linux-arm-kernel, Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme

On Tue, Jan 10, 2017 at 09:47:21AM +0300, Nikita Yushchenko wrote:
> I'm now working with HW that:
> - is in no way "low end" or "obsolete" - it has 4G of RAM and 8 CPU
> cores, and is still being manufactured and developed,
> - has 75% of its RAM located beyond the first 4G of address space,
> - can't physically handle incoming PCIe transactions addressed to
> memory beyond 4G.

It might not be low end or obsolete, but it's absolutely braindead. Your
I/O performance will suffer badly for the life of the platform because
someone tried to save 2 cents, and there is not much we can do about it.

> (1) it constantly runs out of swiotlb space; logs are full of warnings
> despite rate limiting,
> Per my current understanding, blk-level bounce buffering will at least
> help with (1) - if done properly it will allocate bounce buffers within
> the entire memory below 4G, not within the dedicated swiotlb space
> (which is small, and enlarging it makes memory permanently unavailable
> for other use). This looks simple and safe (in the sense of not
> breaking unrelated use cases).

Yes. Although there is absolutely no reason why swiotlb could not do the
same.

> (2) it runs far suboptimally due to bounce-buffering almost all i/o,
> despite lots of free memory in the area where direct DMA is possible.
> Addressing (2) looks much more difficult, because a different memory
> allocation policy is required for that.

It's basically not possible. Every piece of memory in a Linux kernel is
a possible source of I/O, and depending on the workload type it might
even be the prime source of I/O.

> > NVMe should never bounce; the fact that it currently possibly does
> > for highmem pages is a bug.
>
> The entire topic is absolutely not related to highmem (i.e. memory not
> directly addressable by a 32-bit kernel).

I did not say this affects you, but thanks to your mail I noticed that
NVMe has a suboptimal setting there. Also note that highmem does not
have to imply a 32-bit kernel, just physical memory that is not in the
kernel mapping.

> What we are discussing is a hw-originated restriction on where DMA is
> possible.

Yes, where hw means the SoC, and not the actual I/O device, which is an
important distinction.

> > Or even better, remove the call to dma_set_mask_and_coherent with
> > DMA_BIT_MASK(32). NVMe is designed around having proper 64-bit DMA
> > addressing; there is no point in trying to pretend it works without
> > that
>
> Are you claiming that the NVMe driver in mainline is intentionally
> designed not to work on HW that can't do DMA to the entire 64-bit
> space?

It is not intended to handle the case where the SoC / chipset can't
handle DMA to all physical memory, yes.

> Such setups do exist, and there is interest in making them work.

Sure, but it's not the job of the NVMe driver to work around such a
broken system. It's something your architecture code needs to do, maybe
with a bit of core kernel support.

> Quite a few pages used for block I/O are allocated by filemap code -
> and at allocation point it is known what inode the page is being
> allocated for. If this inode is from a filesystem located on a known
> device with known DMA limitations, this knowledge can be used to
> allocate a page that can be DMAed directly.

But in other cases we might never DMA to it. Or we rarely DMA to it, say
for a machine running databases or qemu and using lots of direct I/O. Or
a storage target using its local alloc_pages buffers.

> Sure, there are lots of cases when at allocation time there is no idea
> what device will run DMA on the page being allocated, or perhaps the
> page is going to be shared, or whatever. Such cases unavoidably require
> bounce buffers if the page ends up being used with a device with DMA
> limitations. But still there are cases when better allocation can
> remove the need for bounce buffers - without any hurt for other cases.

It takes your max 1GB of DMA-addressable memory away from other uses,
and introduces the crazy highmem VM tuning issues we had with big 32-bit
x86 systems in the past.
* NVMe vs DMA addressing limitations @ 2017-01-10 7:07 ` Christoph Hellwig 0 siblings, 0 replies; 115+ messages in thread From: Christoph Hellwig @ 2017-01-10 7:07 UTC (permalink / raw) To: linux-arm-kernel On Tue, Jan 10, 2017 at 09:47:21AM +0300, Nikita Yushchenko wrote: > I'm now working with HW that: > - is now way "low end" or "obsolete", it has 4G of RAM and 8 CPU cores, > and is being manufactured and developed, > - has 75% of it's RAM located beyond first 4G of address space, > - can't physically handle incoming PCIe transactions addressed to memory > beyond 4G. It might not be low end or obselete, but it's absolutely braindead. Your I/O performance will suffer badly for the life of the platform because someone tries to save 2 cents, and there is not much we can do about it. > (1) it constantly runs of swiotlb space, logs are full of warnings > despite of rate limiting, > Per my current understanding, blk-level bounce buffering will at least > help with (1) - if done properly it will allocate bounce buffers within > entire memory below 4G, not within dedicated swiotlb space (that is > small and enlarging it makes memory permanently unavailable for other > use). This looks simple and safe (in sense of not anyhow breaking > unrelated use cases). Yes. Although there is absolutely no reason why swiotlb could not do the same. > (2) it runs far suboptimal due to bounce-buffering almost all i/o, > despite of lots of free memory in area where direct DMA is possible. > Addressing (2) looks much more difficult because different memory > allocation policy is required for that. It's basically not possible. Every piece of memory in a Linux kernel is a possible source of I/O, and depending on the workload type it might even be a the prime source of I/O. > > NVMe should never bounce, the fact that it currently possibly does > > for highmem pages is a bug. > > The entire topic is absolutely not related to highmem (i.e. memory not > directly addressable by 32-bit kernel). 
I did not say this affects you, but thanks to your mail I noticed that NVMe has a suboptimal setting there. Also note that highmem does not have to imply a 32-bit kernel, just physical memory that is not in the kernel mapping. > What we are discussing is hw-originated restriction on where DMA is > possible. Yes, where hw means the SOC, and not the actual I/O device, which is an important distinction. > > Or even better remove the call to dma_set_mask_and_coherent with > > DMA_BIT_MASK(32). NVMe is designed around having proper 64-bit DMA > > addressing, there is not point in trying to pretent it works without that > > Are you claiming that NVMe driver in mainline is intentionally designed > to not work on HW that can't do DMA to entire 64-bit space? It is not intenteded to handle the case where the SOC / chipset can't handle DMA to all physical memoery, yes. > Such setups do exist and there is interest to make them working. Sure, but it's not the job of the NVMe driver to work around such a broken system. It's something your architecture code needs to do, maybe with a bit of core kernel support. > Quite a few pages used for block I/O are allocated by filemap code - and > at allocation point it is known what inode page is being allocated for. > If this inode is from filesystem located on a known device with known > DMA limitations, this knowledge can be used to allocate page that can be > DMAed directly. But in other cases we might never DMA to it. Or we rarely DMA to it, say for a machine running databses or qemu and using lots of direct I/O. Or a storage target using it's local alloc_pages buffers. > Sure there are lots of cases when at allocation time there is no idea > what device will run DMA on page being allocated, or perhaps page is > going to be shared, or whatever. Such cases unavoidably require bounce > buffers if page ends to be used with device with DMA limitations. 
But > still there are cases when better allocation can remove need for bounce > buffers - without any hurt for other cases. It takes your max 1GB DMA addressable memoery away from other uses, and introduce the crazy highmem VM tuning issues we had with big 32-bit x86 systems in the past. ^ permalink raw reply [flat|nested] 115+ messages in thread
* NVMe vs DMA addressing limitations @ 2017-01-10 7:07 ` Christoph Hellwig 0 siblings, 0 replies; 115+ messages in thread From: Christoph Hellwig @ 2017-01-10 7:07 UTC (permalink / raw) On Tue, Jan 10, 2017@09:47:21AM +0300, Nikita Yushchenko wrote: > I'm now working with HW that: > - is now way "low end" or "obsolete", it has 4G of RAM and 8 CPU cores, > and is being manufactured and developed, > - has 75% of it's RAM located beyond first 4G of address space, > - can't physically handle incoming PCIe transactions addressed to memory > beyond 4G. It might not be low end or obselete, but it's absolutely braindead. Your I/O performance will suffer badly for the life of the platform because someone tries to save 2 cents, and there is not much we can do about it. > (1) it constantly runs of swiotlb space, logs are full of warnings > despite of rate limiting, > Per my current understanding, blk-level bounce buffering will at least > help with (1) - if done properly it will allocate bounce buffers within > entire memory below 4G, not within dedicated swiotlb space (that is > small and enlarging it makes memory permanently unavailable for other > use). This looks simple and safe (in sense of not anyhow breaking > unrelated use cases). Yes. Although there is absolutely no reason why swiotlb could not do the same. > (2) it runs far suboptimal due to bounce-buffering almost all i/o, > despite of lots of free memory in area where direct DMA is possible. > Addressing (2) looks much more difficult because different memory > allocation policy is required for that. It's basically not possible. Every piece of memory in a Linux kernel is a possible source of I/O, and depending on the workload type it might even be a the prime source of I/O. > > NVMe should never bounce, the fact that it currently possibly does > > for highmem pages is a bug. > > The entire topic is absolutely not related to highmem (i.e. memory not > directly addressable by 32-bit kernel). 
I did not say this affects you, but thanks to your mail I noticed that NVMe has a suboptimal setting there. Also note that highmem does not have to imply a 32-bit kernel, just physical memory that is not in the kernel mapping. > What we are discussing is hw-originated restriction on where DMA is > possible. Yes, where hw means the SOC, and not the actual I/O device, which is an important distinction. > > Or even better remove the call to dma_set_mask_and_coherent with > > DMA_BIT_MASK(32). NVMe is designed around having proper 64-bit DMA > > addressing, there is not point in trying to pretent it works without that > > Are you claiming that NVMe driver in mainline is intentionally designed > to not work on HW that can't do DMA to entire 64-bit space? It is not intenteded to handle the case where the SOC / chipset can't handle DMA to all physical memoery, yes. > Such setups do exist and there is interest to make them working. Sure, but it's not the job of the NVMe driver to work around such a broken system. It's something your architecture code needs to do, maybe with a bit of core kernel support. > Quite a few pages used for block I/O are allocated by filemap code - and > at allocation point it is known what inode page is being allocated for. > If this inode is from filesystem located on a known device with known > DMA limitations, this knowledge can be used to allocate page that can be > DMAed directly. But in other cases we might never DMA to it. Or we rarely DMA to it, say for a machine running databses or qemu and using lots of direct I/O. Or a storage target using it's local alloc_pages buffers. > Sure there are lots of cases when at allocation time there is no idea > what device will run DMA on page being allocated, or perhaps page is > going to be shared, or whatever. Such cases unavoidably require bounce > buffers if page ends to be used with device with DMA limitations. 
But > still there are cases when better allocation can remove need for bounce > buffers - without any hurt for other cases. It takes your max 1GB DMA addressable memoery away from other uses, and introduce the crazy highmem VM tuning issues we had with big 32-bit x86 systems in the past. ^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: NVMe vs DMA addressing limitations @ 2017-01-10 7:07 ` Christoph Hellwig 0 siblings, 0 replies; 115+ messages in thread From: Christoph Hellwig @ 2017-01-10 7:07 UTC (permalink / raw) To: Nikita Yushchenko Cc: Keith Busch, Sagi Grimberg, Jens Axboe, Catalin Marinas, Will Deacon, linux-kernel, linux-nvme, Arnd Bergmann, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Christoph Hellwig, linux-arm-kernel On Tue, Jan 10, 2017 at 09:47:21AM +0300, Nikita Yushchenko wrote: > I'm now working with HW that: > - is now way "low end" or "obsolete", it has 4G of RAM and 8 CPU cores, > and is being manufactured and developed, > - has 75% of it's RAM located beyond first 4G of address space, > - can't physically handle incoming PCIe transactions addressed to memory > beyond 4G. It might not be low end or obselete, but it's absolutely braindead. Your I/O performance will suffer badly for the life of the platform because someone tries to save 2 cents, and there is not much we can do about it. > (1) it constantly runs of swiotlb space, logs are full of warnings > despite of rate limiting, > Per my current understanding, blk-level bounce buffering will at least > help with (1) - if done properly it will allocate bounce buffers within > entire memory below 4G, not within dedicated swiotlb space (that is > small and enlarging it makes memory permanently unavailable for other > use). This looks simple and safe (in sense of not anyhow breaking > unrelated use cases). Yes. Although there is absolutely no reason why swiotlb could not do the same. > (2) it runs far suboptimal due to bounce-buffering almost all i/o, > despite of lots of free memory in area where direct DMA is possible. > Addressing (2) looks much more difficult because different memory > allocation policy is required for that. It's basically not possible. 
Every piece of memory in a Linux kernel is a possible source of I/O, and depending on the workload type it might even be a the prime source of I/O. > > NVMe should never bounce, the fact that it currently possibly does > > for highmem pages is a bug. > > The entire topic is absolutely not related to highmem (i.e. memory not > directly addressable by 32-bit kernel). I did not say this affects you, but thanks to your mail I noticed that NVMe has a suboptimal setting there. Also note that highmem does not have to imply a 32-bit kernel, just physical memory that is not in the kernel mapping. > What we are discussing is hw-originated restriction on where DMA is > possible. Yes, where hw means the SOC, and not the actual I/O device, which is an important distinction. > > Or even better remove the call to dma_set_mask_and_coherent with > > DMA_BIT_MASK(32). NVMe is designed around having proper 64-bit DMA > > addressing, there is not point in trying to pretent it works without that > > Are you claiming that NVMe driver in mainline is intentionally designed > to not work on HW that can't do DMA to entire 64-bit space? It is not intenteded to handle the case where the SOC / chipset can't handle DMA to all physical memoery, yes. > Such setups do exist and there is interest to make them working. Sure, but it's not the job of the NVMe driver to work around such a broken system. It's something your architecture code needs to do, maybe with a bit of core kernel support. > Quite a few pages used for block I/O are allocated by filemap code - and > at allocation point it is known what inode page is being allocated for. > If this inode is from filesystem located on a known device with known > DMA limitations, this knowledge can be used to allocate page that can be > DMAed directly. But in other cases we might never DMA to it. Or we rarely DMA to it, say for a machine running databses or qemu and using lots of direct I/O. Or a storage target using it's local alloc_pages buffers. 
> Sure there are lots of cases when at allocation time there is no idea
> what device will run DMA on the page being allocated, or perhaps the page is
> going to be shared, or whatever. Such cases unavoidably require bounce
> buffers if the page ends up being used with a device with DMA limitations. But
> still there are cases when better allocation can remove the need for bounce
> buffers - without hurting other cases.

It takes your max 1GB of DMA-addressable memory away from other uses,
and introduces the crazy highmem VM tuning issues we had with big
32-bit x86 systems in the past.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 115+ messages in thread
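The bounce-buffering the two are arguing about reduces to pure address arithmetic: a buffer must be bounced whenever any of its bytes lies above what the device, or here the SOC's PCIe bridge, can address. A standalone userspace sketch of that decision, with the R-Car-like 32-bit inbound limit hard-coded as an assumption (`needs_bounce` and `BRIDGE_DMA_MASK` are illustrative names, not kernel API):

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative only: a 32-bit inbound limit like the R-Car PCIe bridge. */
#define BRIDGE_DMA_MASK ((uint64_t)0xffffffff)

/* Bounce whenever any byte of the buffer lies above the DMA mask. */
static bool needs_bounce(uint64_t phys, uint64_t len, uint64_t dma_mask)
{
    return phys + len - 1 > dma_mask;
}
```

With 75% of RAM above 4G, roughly three quarters of randomly placed pages fail this test, which is why almost all i/o on the platform ends up bounce-buffered.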
* Re: NVMe vs DMA addressing limitations
  2017-01-10  7:07 ` Christoph Hellwig
@ 2017-01-10  7:31 ` Nikita Yushchenko
  -1 siblings, 0 replies; 115+ messages in thread
From: Nikita Yushchenko @ 2017-01-10  7:31 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Arnd Bergmann, linux-arm-kernel, Catalin Marinas, Will Deacon,
	linux-kernel, linux-renesas-soc, Simon Horman, linux-pci,
	Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe,
	Sagi Grimberg, linux-nvme

Christoph, thanks for the clear input.

Arnd, I think that given this discussion, the best short-term solution is
indeed the patch I submitted yesterday. That is, your version +
coherent mask support. With that, set_dma_mask(DMA_BIT_MASK(64)) will
succeed and the hardware will work with swiotlb.

A possible next step is to teach swiotlb to dynamically allocate bounce
buffers within the entirety of arm64's ZONE_DMA.

Also there is some hope that R-Car *can* iommu-translate addresses that
the PCIe module issues to the system bus. Although previous attempts to
make that work failed, additional research is needed here.

Nikita

> On Tue, Jan 10, 2017 at 09:47:21AM +0300, Nikita Yushchenko wrote:
>> I'm now working with HW that:
>> - is no way "low end" or "obsolete": it has 4G of RAM and 8 CPU cores,
>>   and is still being manufactured and developed,
>> - has 75% of its RAM located beyond the first 4G of address space,
>> - can't physically handle incoming PCIe transactions addressed to memory
>>   beyond 4G.
>
> It might not be low end or obsolete, but it's absolutely braindead.
> Your I/O performance will suffer badly for the life of the platform
> because someone tried to save 2 cents, and there is not much we can do
> about it.
>> (1) it constantly runs out of swiotlb space, logs are full of warnings
>> despite rate limiting,
>
>> Per my current understanding, blk-level bounce buffering will at least
>> help with (1) - if done properly it will allocate bounce buffers within
>> the entire memory below 4G, not within the dedicated swiotlb space (which is
>> small, and enlarging it makes memory permanently unavailable for other
>> use). This looks simple and safe (in the sense of not breaking
>> unrelated use cases).
>
> Yes. Although there is absolutely no reason why swiotlb could not
> do the same.
>
>> (2) it runs far from optimally due to bounce-buffering almost all i/o,
>> despite lots of free memory in the area where direct DMA is possible.
>
>> Addressing (2) looks much more difficult because a different memory
>> allocation policy is required for that.
>
> It's basically not possible. Every piece of memory in a Linux
> kernel is a possible source of I/O, and depending on the workload
> type it might even be the prime source of I/O.
>
>>> NVMe should never bounce, the fact that it currently possibly does
>>> for highmem pages is a bug.
>>
>> The entire topic is absolutely not related to highmem (i.e. memory not
>> directly addressable by a 32-bit kernel).
>
> I did not say this affects you, but thanks to your mail I noticed that
> NVMe has a suboptimal setting there. Also note that highmem does not
> have to imply a 32-bit kernel, just physical memory that is not in the
> kernel mapping.
>
>> What we are discussing is a hw-originated restriction on where DMA is
>> possible.
>
> Yes, where hw means the SOC, and not the actual I/O device, which is an
> important distinction.
>
>>> Or even better remove the call to dma_set_mask_and_coherent with
>>> DMA_BIT_MASK(32).
>>> NVMe is designed around having proper 64-bit DMA
>>> addressing, there is no point in trying to pretend it works without that
>>
>> Are you claiming that the NVMe driver in mainline is intentionally designed
>> to not work on HW that can't do DMA to the entire 64-bit space?
>
> It is not intended to handle the case where the SOC / chipset
> can't handle DMA to all physical memory, yes.
>
>> Such setups do exist and there is interest in making them work.
>
> Sure, but it's not the job of the NVMe driver to work around such a broken
> system. It's something your architecture code needs to do, maybe with
> a bit of core kernel support.
>
>> Quite a few pages used for block I/O are allocated by filemap code - and
>> at allocation point it is known what inode the page is being allocated for.
>> If this inode is from a filesystem located on a known device with known
>> DMA limitations, this knowledge can be used to allocate a page that can be
>> DMAed directly.
>
> But in other cases we might never DMA to it. Or we rarely DMA to it, say
> for a machine running databases or qemu and using lots of direct I/O. Or
> a storage target using its local alloc_pages buffers.
>
>> Sure there are lots of cases when at allocation time there is no idea
>> what device will run DMA on the page being allocated, or perhaps the page is
>> going to be shared, or whatever. Such cases unavoidably require bounce
>> buffers if the page ends up being used with a device with DMA limitations. But
>> still there are cases when better allocation can remove the need for bounce
>> buffers - without hurting other cases.
>
> It takes your max 1GB of DMA-addressable memory away from other uses,
> and introduces the crazy highmem VM tuning issues we had with big
> 32-bit x86 systems in the past.

^ permalink raw reply	[flat|nested] 115+ messages in thread
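For reference, DMA_BIT_MASK(n) from the dma_set_mask calls quoted above expands to a mask of n low bits (with n == 64 special-cased to avoid an undefined 64-bit shift), and the check added by the patch at the head of this thread is simple mask containment. A userspace model of both, assuming nothing beyond the bit arithmetic:

```c
#include <stdbool.h>
#include <stdint.h>

/* Same expansion as the kernel's DMA_BIT_MASK() macro. */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Model of the __swiotlb_dma_supported() check from the patch: a device
 * mask is supported only if it is fully contained in the host bridge's
 * mask.  A zero bridge mask means "no limit declared".
 */
static bool mask_supported(uint64_t dev_mask, uint64_t bridge_mask)
{
    if (bridge_mask == 0)
        return true;
    return (dev_mask & bridge_mask) == dev_mask;
}
```

Under that check a DMA_BIT_MASK(64) request fails against a 32-bit bridge; the approach settled on here instead lets the 64-bit mask succeed and relies on swiotlb to bounce.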
* Re: NVMe vs DMA addressing limitations
  2017-01-10  7:31 ` Nikita Yushchenko
@ 2017-01-10 11:01 ` Arnd Bergmann
  -1 siblings, 0 replies; 115+ messages in thread
From: Arnd Bergmann @ 2017-01-10 11:01 UTC (permalink / raw)
  To: Nikita Yushchenko
  Cc: Christoph Hellwig, linux-arm-kernel, Catalin Marinas, Will Deacon,
	linux-kernel, linux-renesas-soc, Simon Horman, linux-pci,
	Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe,
	Sagi Grimberg, linux-nvme

On Tuesday, January 10, 2017 10:31:47 AM CET Nikita Yushchenko wrote:
> Christoph, thanks for the clear input.
>
> Arnd, I think that given this discussion, the best short-term solution is
> indeed the patch I submitted yesterday. That is, your version +
> coherent mask support. With that, set_dma_mask(DMA_BIT_MASK(64)) will
> succeed and the hardware will work with swiotlb.

Ok, good.

> A possible next step is to teach swiotlb to dynamically allocate bounce
> buffers within the entirety of arm64's ZONE_DMA.

That seems reasonable, yes. We probably have to do both, as there are
cases where a device has a dma_mask smaller than ZONE_DMA but the
swiotlb bounce area is low enough to work anyway.

Another workaround we might need is to limit the amount of concurrent DMA
in the NVMe driver based on some platform quirk. The way that NVMe works,
it can have very large amounts of data concurrently mapped into
the device. SWIOTLB is one case where this currently fails, but another
example would be old PowerPC servers that have a 256MB window of virtual
I/O addresses per VM guest in their IOMMU. Those will likely fail the
same way that yours does.

> Also there is some hope that R-Car *can* iommu-translate addresses that
> the PCIe module issues to the system bus. Although previous attempts to
> make that work failed, additional research is needed here.

Does this IOMMU support remapping data within a virtual machine?
I believe there are some that only do one of the two -- either you can
have guest machines with DMA access to their low memory, or you can
remap data on the fly in the host.

	Arnd

^ permalink raw reply	[flat|nested] 115+ messages in thread
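Arnd's throttling idea, capping how much DMA a driver keeps mapped at once so a small window (swiotlb, or a 256MB per-guest IOMMU window) is never exhausted, can be sketched as a simple byte counter. All names below are hypothetical; no such generic helper existed in the kernel at the time:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-platform DMA window accounting. */
struct dma_window {
    uint64_t limit;       /* total addressable window, in bytes */
    uint64_t outstanding; /* bytes currently mapped */
};

/* Fail mappings that would overflow the window; caller throttles. */
static bool window_map(struct dma_window *w, uint64_t len)
{
    if (w->outstanding + len > w->limit)
        return false;
    w->outstanding += len;
    return true;
}

static void window_unmap(struct dma_window *w, uint64_t len)
{
    w->outstanding -= len;
}
```

A failed window_map would be propagated to the block layer, which already knows how to back off and retry when a mapping fails.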
* NVMe vs DMA addressing limitations @ 2017-01-10 11:01 ` Arnd Bergmann 0 siblings, 0 replies; 115+ messages in thread From: Arnd Bergmann @ 2017-01-10 11:01 UTC (permalink / raw) To: linux-arm-kernel On Tuesday, January 10, 2017 10:31:47 AM CET Nikita Yushchenko wrote: > Christoph, thanks for clear input. > > Arnd, I think that given this discussion, best short-term solution is > indeed the patch I've submitted yesterday. That is, your version + > coherent mask support. With that, set_dma_mask(DMA_BIT_MASK(64)) will > succeed and hardware with work with swiotlb. Ok, good. > Possible next step is to teach swiotlb to dynamically allocate bounce > buffers within entire arm64's ZONE_DMA. That seems reasonable, yes. We probably have to do both, as there are cases where a device has dma_mask smaller than ZONE_DMA but the swiotlb bounce area is low enough to work anyway. Another workaround me might need is to limit amount of concurrent DMA in the NVMe driver based on some platform quirk. The way that NVMe works, it can have very large amounts of data that is concurrently mapped into the device. SWIOTLB is one case where this currently fails, but another example would be old PowerPC servers that have a 256MB window of virtual I/O addresses per VM guest in their IOMMU. Those will likely fail the same way that your does. > Also there is some hope that R-Car *can* iommu-translate addresses that > PCIe module issues to system bus. Although previous attempts to make > that working failed. Additional research is needed here. Does this IOMMU support remapping data within a virtual machine? I believe there are some that only do one of the two -- either you can have guest machines with DMA access to their low memory, or you can remap data on the fly in the host. Arnd ^ permalink raw reply [flat|nested] 115+ messages in thread
* NVMe vs DMA addressing limitations @ 2017-01-10 11:01 ` Arnd Bergmann 0 siblings, 0 replies; 115+ messages in thread From: Arnd Bergmann @ 2017-01-10 11:01 UTC (permalink / raw) On Tuesday, January 10, 2017 10:31:47 AM CET Nikita Yushchenko wrote: > Christoph, thanks for clear input. > > Arnd, I think that given this discussion, best short-term solution is > indeed the patch I've submitted yesterday. That is, your version + > coherent mask support. With that, set_dma_mask(DMA_BIT_MASK(64)) will > succeed and hardware with work with swiotlb. Ok, good. > Possible next step is to teach swiotlb to dynamically allocate bounce > buffers within entire arm64's ZONE_DMA. That seems reasonable, yes. We probably have to do both, as there are cases where a device has dma_mask smaller than ZONE_DMA but the swiotlb bounce area is low enough to work anyway. Another workaround me might need is to limit amount of concurrent DMA in the NVMe driver based on some platform quirk. The way that NVMe works, it can have very large amounts of data that is concurrently mapped into the device. SWIOTLB is one case where this currently fails, but another example would be old PowerPC servers that have a 256MB window of virtual I/O addresses per VM guest in their IOMMU. Those will likely fail the same way that your does. > Also there is some hope that R-Car *can* iommu-translate addresses that > PCIe module issues to system bus. Although previous attempts to make > that working failed. Additional research is needed here. Does this IOMMU support remapping data within a virtual machine? I believe there are some that only do one of the two -- either you can have guest machines with DMA access to their low memory, or you can remap data on the fly in the host. Arnd ^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: NVMe vs DMA addressing limitations @ 2017-01-10 11:01 ` Arnd Bergmann 0 siblings, 0 replies; 115+ messages in thread From: Arnd Bergmann @ 2017-01-10 11:01 UTC (permalink / raw) To: Nikita Yushchenko Cc: Keith Busch, Sagi Grimberg, Jens Axboe, Catalin Marinas, Will Deacon, linux-kernel, linux-nvme, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Christoph Hellwig, linux-arm-kernel On Tuesday, January 10, 2017 10:31:47 AM CET Nikita Yushchenko wrote: > Christoph, thanks for clear input. > > Arnd, I think that given this discussion, best short-term solution is > indeed the patch I've submitted yesterday. That is, your version + > coherent mask support. With that, set_dma_mask(DMA_BIT_MASK(64)) will > succeed and hardware with work with swiotlb. Ok, good. > Possible next step is to teach swiotlb to dynamically allocate bounce > buffers within entire arm64's ZONE_DMA. That seems reasonable, yes. We probably have to do both, as there are cases where a device has dma_mask smaller than ZONE_DMA but the swiotlb bounce area is low enough to work anyway. Another workaround me might need is to limit amount of concurrent DMA in the NVMe driver based on some platform quirk. The way that NVMe works, it can have very large amounts of data that is concurrently mapped into the device. SWIOTLB is one case where this currently fails, but another example would be old PowerPC servers that have a 256MB window of virtual I/O addresses per VM guest in their IOMMU. Those will likely fail the same way that your does. > Also there is some hope that R-Car *can* iommu-translate addresses that > PCIe module issues to system bus. Although previous attempts to make > that working failed. Additional research is needed here. Does this IOMMU support remapping data within a virtual machine? I believe there are some that only do one of the two -- either you can have guest machines with DMA access to their low memory, or you can remap data on the fly in the host. 
Arnd _______________________________________________ linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel ^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: NVMe vs DMA addressing limitations
  2017-01-10 11:01 ` Arnd Bergmann
@ 2017-01-10 14:48 ` Christoph Hellwig
  -1 siblings, 0 replies; 115+ messages in thread
From: Christoph Hellwig @ 2017-01-10 14:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Nikita Yushchenko, Christoph Hellwig, linux-arm-kernel,
	Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc,
	Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov,
	Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme

On Tue, Jan 10, 2017 at 12:01:05PM +0100, Arnd Bergmann wrote:
> Another workaround we might need is to limit the amount of concurrent DMA
> in the NVMe driver based on some platform quirk. The way that NVMe works,
> it can have very large amounts of data concurrently mapped into
> the device.

That's not really just NVMe - other storage and network controllers can
also DMA map giant amounts of memory. There are a couple aspects to it:

 - dma coherent memory - right now NVMe doesn't use too much of it, but
   upcoming low-end NVMe controllers will soon start to require fairly
   large amounts of it for the host memory buffer feature that allows
   for DRAM-less controller designs. As an interesting quirk, that is
   memory used only by the PCIe device, and never accessed by the Linux
   host at all.

 - size vs number of the dynamic mappings. We probably want the dma_ops
   to specify a maximum mapping size for a given device. As long as we
   can make progress with a few mappings, swiotlb / the iommu can just
   fail a mapping and the driver will propagate that to the block layer,
   which throttles I/O.

^ permalink raw reply	[flat|nested] 115+ messages in thread
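The "maximum mapping size" idea in the second bullet amounts to the block layer splitting requests so that no single mapping exceeds the device's cap (mainline later grew dma_max_mapping_size() for exactly this; the thread predates it). A minimal sketch of the split calculation, with a made-up helper name:

```c
#include <stdint.h>

/*
 * Round-up division: how many separate DMA mappings an I/O of io_bytes
 * needs if no single mapping may exceed max_mapping bytes.
 */
static uint64_t nr_mappings(uint64_t io_bytes, uint64_t max_mapping)
{
    return (io_bytes + max_mapping - 1) / max_mapping;
}
```

As long as the driver can make progress with a few such mappings, a mapping failure can simply be propagated up and the block layer will throttle.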
* Re: NVMe vs DMA addressing limitations
From: Arnd Bergmann @ 2017-01-10 15:02 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Christoph Hellwig, Nikita Yushchenko, Keith Busch, Sagi Grimberg, Jens Axboe, Catalin Marinas, Will Deacon, linux-kernel, linux-nvme, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov

On Tuesday, January 10, 2017 3:48:39 PM CET Christoph Hellwig wrote:
> - dma coherent memory - right now NVMe doesn't use too much of it, but
>   upcoming low-end NVMe controllers will soon start to require fairly
>   large amounts of it for the host memory buffer feature that allows
>   for DRAM-less controller designs. As an interesting quirk, that is
>   memory only used by the PCIe device, and never accessed by the Linux
>   host at all.

Right, that is going to become interesting, as some platforms are very
limited with their coherent allocations.

> - size vs number of the dynamic mappings. We probably want the dma_ops
>   to specify a maximum mapping size for a given device. As long as we
>   can make progress with a few mappings, swiotlb / the iommu can just
>   fail a mapping and the driver will propagate that to the block layer,
>   which throttles I/O.

Good idea.

	Arnd
* Re: NVMe vs DMA addressing limitations
From: Sagi Grimberg @ 2017-01-12 10:09 UTC (permalink / raw)
To: Christoph Hellwig, Arnd Bergmann
Cc: Nikita Yushchenko, linux-arm-kernel, Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe, linux-nvme

> - dma coherent memory - right now NVMe doesn't use too much of it, but
>   upcoming low-end NVMe controllers will soon start to require fairly
>   large amounts of it for the host memory buffer feature that allows
>   for DRAM-less controller designs.

Would it make sense to convert the nvme driver to use normal allocations
and use the DMA streaming APIs (dma_sync_single_for_[cpu|device]) for
both the queues and the future HMB?

> - size vs number of the dynamic mappings. We probably want the dma_ops
>   to specify a maximum mapping size for a given device. As long as we
>   can make progress with a few mappings, swiotlb / the iommu can just
>   fail a mapping and the driver will propagate that to the block layer,
>   which throttles I/O.

Isn't a maximum mapping size per device too restrictive? It is possible
that not all devices possess active mappings concurrently.
* Re: NVMe vs DMA addressing limitations
From: Arnd Bergmann @ 2017-01-12 11:56 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Christoph Hellwig, Nikita Yushchenko, linux-arm-kernel, Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe, linux-nvme

On Thursday, January 12, 2017 12:09:11 PM CET Sagi Grimberg wrote:
> Would it make sense to convert the nvme driver to use normal
> allocations and use the DMA streaming APIs
> (dma_sync_single_for_[cpu|device]) for both the queues and the future
> HMB?

That is an interesting question: we actually have
"DMA_ATTR_NO_KERNEL_MAPPING" for this case, and ARM implements it in the
coherent interface, so that might be a good fit. Implementing it in the
streaming API makes no sense, since there we already have a kernel
mapping, but using a normal allocation (possibly with
DMA_ATTR_NON_CONSISTENT or DMA_ATTR_SKIP_CPU_SYNC, need to check) might
help on other architectures that have limited amounts of coherent memory
and no CMA.

Another benefit of the coherent API for this kind of buffer is that we
can use CMA, where available, to get a large contiguous chunk of RAM on
architectures without an IOMMU when normal memory is no longer available
because of fragmentation.

	Arnd
* Re: NVMe vs DMA addressing limitations
From: Christoph Hellwig @ 2017-01-12 13:07 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Sagi Grimberg, Christoph Hellwig, Nikita Yushchenko, linux-arm-kernel, Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe, linux-nvme

On Thu, Jan 12, 2017 at 12:56:07PM +0100, Arnd Bergmann wrote:
> That is an interesting question: we actually have
> "DMA_ATTR_NO_KERNEL_MAPPING" for this case, and ARM implements it in
> the coherent interface, so that might be a good fit.

Yes, my WIP HMB patch uses DMA_ATTR_NO_KERNEL_MAPPING, although I'm
working on x86 at the moment, where it's a no-op.

> Implementing it in the streaming API makes no sense, since there we
> already have a kernel mapping, but using a normal allocation (possibly
> with DMA_ATTR_NON_CONSISTENT or DMA_ATTR_SKIP_CPU_SYNC, need to check)
> might help on other architectures that have limited amounts of
> coherent memory and no CMA.

I thought about that - but in the end DMA_ATTR_NO_KERNEL_MAPPING implies
those, so instead of using lots of flags in the driver I'd rather fix up
more dma_ops implementations to take advantage of
DMA_ATTR_NO_KERNEL_MAPPING.
* Re: NVMe vs DMA addressing limitations
From: Arnd Bergmann @ 2017-01-10 10:54 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Nikita Yushchenko, linux-arm-kernel, Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme

On Tuesday, January 10, 2017 8:07:20 AM CET Christoph Hellwig wrote:
> On Tue, Jan 10, 2017 at 09:47:21AM +0300, Nikita Yushchenko wrote:
> > I'm now working with HW that:
> > - is in no way "low end" or "obsolete": it has 4G of RAM and 8 CPU
> >   cores, and is being manufactured and developed,
> > - has 75% of its RAM located beyond the first 4G of address space,
> > - can't physically handle incoming PCIe transactions addressed to
> >   memory beyond 4G.
>
> It might not be low end or obsolete, but it's absolutely braindead.
> Your I/O performance will suffer badly for the life of the platform
> because someone tried to save 2 cents, and there is not much we can do
> about it.

Unfortunately it is a common problem for arm64 chips that were designed
by taking a 32-bit SoC and replacing the CPU core. The swiotlb is the
right workaround for this, and I think we all agree that we should just
make it work correctly.

	Arnd
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
2017-01-09 20:57 ` Christoph Hellwig
@ 2017-01-10 10:47 ` Arnd Bergmann
0 siblings, 0 replies; 115+ messages in thread
From: Arnd Bergmann @ 2017-01-10 10:47 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Nikita Yushchenko, linux-arm-kernel, Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme

On Monday, January 9, 2017 9:57:46 PM CET Christoph Hellwig wrote:
> > - architecture should stop breaking 64-bit DMA when driver attempts to
> >   set 64-bit dma mask,
> >
> > - NVMe should issue proper blk_queue_bounce_limit() call based on what
> >   mask is actually set,
>
> Or even better, remove the call to dma_set_mask_and_coherent with
> DMA_BIT_MASK(32). NVMe is designed around having proper 64-bit DMA
> addressing; there is no point in trying to pretend it works without that.

Agreed, let's just fail the probe() if DMA_BIT_MASK(64) fails, and have
swiotlb work around machines that for some reason need bounce buffers.

> > - and blk_queue_bounce_limit() should also be fixed to actually set
> >   the 0xffffffff limit, instead of replacing it with (max_low_pfn <<
> >   PAGE_SHIFT) as it does now.
>
> We need to kill off BLK_BOUNCE_HIGH, it just doesn't make sense to
> mix the highmem aspect with the addressing limits. In fact the whole
> block bouncing scheme doesn't make much sense at all these days; we
> should rely on swiotlb instead.

If we do this, we should probably have another look at the respective
NETIF_F_HIGHDMA support in the network stack, which does the same thing
and mixes up highmem on 32-bit architectures with the DMA address limit.

(Side note: there are actually cases in which you have a 31-bit DMA mask
but 3 GB of lowmem using CONFIG_VMSPLIT_1G, so BLK_BOUNCE_HIGH and
!NETIF_F_HIGHDMA both miss the limit, causing data corruption without
swiotlb.)

Before we rely too much on swiotlb, we may also need to consider which
architectures today rely on bouncing in blk and network. I see that we
have CONFIG_ARCH_PHYS_ADDR_T_64BIT on a couple of 32-bit architectures
without swiotlb (arc, arm, some mips32), and there are several 64-bit
architectures that do not have swiotlb (alpha, parisc, s390, sparc).
I believe that alpha, s390 and sparc always use some form of IOMMU, but
the other four apparently don't, so we would need to add swiotlb support
there to remove all the bounce buffering in the network and block layers.

Arnd

^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
2017-01-10 10:47 ` Arnd Bergmann
@ 2017-01-10 14:44 ` Christoph Hellwig
0 siblings, 0 replies; 115+ messages in thread
From: Christoph Hellwig @ 2017-01-10 14:44 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Christoph Hellwig, Nikita Yushchenko, linux-arm-kernel, Catalin Marinas, Will Deacon, linux-kernel, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov, Keith Busch, Jens Axboe, Sagi Grimberg, linux-nvme

On Tue, Jan 10, 2017 at 11:47:42AM +0100, Arnd Bergmann wrote:
> I see that we have CONFIG_ARCH_PHYS_ADDR_T_64BIT on a couple of
> 32-bit architectures without swiotlb (arc, arm, some mips32), and
> there are several 64-bit architectures that do not have swiotlb
> (alpha, parisc, s390, sparc). I believe that alpha, s390 and sparc
> always use some form of IOMMU, but the other four apparently don't,
> so we would need to add swiotlb support there to remove all the
> bounce buffering in network and block layers.

mips has lots of weird swiotlb wire-up in its board code (the swiotlb
arch glue really needs some major cleanup..), as does arm. Not sure
about the others.

Getting rid of highmem bouncing in the block layer will take some time,
as various PIO-only drivers rely on it at the moment. These should all
be convertible to kmap the data, but it needs a careful audit first.
For 4.11 I plan to switch away from bouncing highmem by default at
least, though, and maybe also convert a few PIO drivers.

^ permalink raw reply [flat|nested] 115+ messages in thread
* Re: [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask
2017-01-10 14:44 ` Christoph Hellwig
@ 2017-01-10 15:00 ` Arnd Bergmann
0 siblings, 0 replies; 115+ messages in thread
From: Arnd Bergmann @ 2017-01-10 15:00 UTC (permalink / raw)
To: linux-arm-kernel
Cc: Christoph Hellwig, Nikita Yushchenko, Keith Busch, Sagi Grimberg, Jens Axboe, Catalin Marinas, Will Deacon, linux-kernel, linux-nvme, linux-renesas-soc, Simon Horman, linux-pci, Bjorn Helgaas, artemi.ivanov

On Tuesday, January 10, 2017 3:44:53 PM CET Christoph Hellwig wrote:
> On Tue, Jan 10, 2017 at 11:47:42AM +0100, Arnd Bergmann wrote:
> > I see that we have CONFIG_ARCH_PHYS_ADDR_T_64BIT on a couple of
> > 32-bit architectures without swiotlb (arc, arm, some mips32), and
> > there are several 64-bit architectures that do not have swiotlb
> > (alpha, parisc, s390, sparc). I believe that alpha, s390 and sparc
> > always use some form of IOMMU, but the other four apparently don't,
> > so we would need to add swiotlb support there to remove all the
> > bounce buffering in network and block layers.
>
> mips has lots of weird swiotlb wire-up in its board code (the swiotlb
> arch glue really needs some major cleanup..)

My reading of the MIPS code was that only the 64-bit platforms use it,
but there are a number of 32-bit platforms that have 64-bit physical
addresses and don't.

> as does arm. Not sure about the others.

32-bit ARM doesn't actually use SWIOTLB at all, despite selecting it in
Kconfig. I think Xen uses it for its own purposes, but nothing else
does. Most ARM platforms can't actually have RAM beyond 4GB, and the
ones that do have it tend to also come with an IOMMU, but I remember at
least BCM53xx actually needing swiotlb on some chip revisions that are
widely used and that cannot DMA to the second memory bank from PCI (!).

Arnd

^ permalink raw reply [flat|nested] 115+ messages in thread
end of thread, other threads:[~2017-02-16 16:13 UTC | newest]

Thread overview: 115+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-29 20:45 [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask Nikita Yushchenko
2016-12-29 20:45 ` [PATCH 2/2] rcar-pcie: set host bridge's DMA mask Nikita Yushchenko
2016-12-29 21:18 ` [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask Arnd Bergmann
2017-02-16 16:12 ` Arnd Bergmann
2016-12-30 9:46 ` Sergei Shtylyov
2016-12-30 10:06 ` Sergei Shtylyov
2017-01-03 18:44 ` Will Deacon
2017-01-03 19:00 ` Nikita Yushchenko
2017-01-03 19:01 ` Nikita Yushchenko
2017-01-03 20:13 ` Grygorii Strashko
2017-01-03 20:23 ` Nikita Yushchenko
2017-01-03 23:13 ` Arnd Bergmann
2017-01-04 6:24 ` Nikita Yushchenko
2017-01-04 13:29 ` Arnd Bergmann
2017-01-04 14:30 ` Nikita Yushchenko
2017-01-04 14:46 ` Arnd Bergmann
2017-01-04 15:29 ` Nikita Yushchenko
2017-01-06 11:10 ` Arnd Bergmann
2017-01-06 13:47 ` Nikita Yushchenko
2017-01-06 14:38 ` [PATCH] arm64: do not set dma masks that device connection can't handle Nikita Yushchenko
2017-01-06 14:45 ` Nikita Yushchenko
2017-01-08 7:09 ` Sergei Shtylyov
2017-01-09 6:56 ` Nikita Yushchenko
2017-01-09 14:05 ` [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask Arnd Bergmann
2017-01-09 20:34 ` Nikita Yushchenko
2017-01-09 20:57 ` Christoph Hellwig
2017-01-10 6:47 ` NVMe vs DMA addressing limitations Nikita Yushchenko
2017-01-10 7:07 ` Christoph Hellwig
2017-01-10 7:31 ` Nikita Yushchenko
2017-01-10 11:01 ` Arnd Bergmann
2017-01-10 14:48 ` Christoph Hellwig
2017-01-10 15:02 ` Arnd Bergmann
2017-01-12 10:09 ` Sagi Grimberg
2017-01-12 11:56 ` Arnd Bergmann
2017-01-12 13:07 ` Christoph Hellwig
2017-01-10 10:54 ` Arnd Bergmann
2017-01-10 10:47 ` [PATCH 1/2] arm64: dma_mapping: allow PCI host driver to limit DMA mask Arnd Bergmann
2017-01-10 14:44 ` Christoph Hellwig
2017-01-10 15:00 ` Arnd Bergmann