* [RFC 0/3] Unify IOMMU-based DMA-mapping code for ARM and ARM64
@ 2016-02-19  8:22 ` Marek Szyprowski
  0 siblings, 0 replies; 45+ messages in thread
From: Marek Szyprowski @ 2016-02-19  8:22 UTC (permalink / raw)
  To: iommu, linux-arm-kernel, linux-kernel
  Cc: Marek Szyprowski, linaro-mm-sig, dri-devel, Arnd Bergmann,
	Will Deacon, Catalin Marinas, Robin Murphy,
	Russell King - ARM Linux, Joerg Roedel, Laurent Pinchart,
	Sakari Ailus, Mark Yao, Heiko Stuebner, Tomasz Figa, Inki Dae,
	Bartlomiej Zolnierkiewicz, Krzysztof Kozlowski

Dear All,

This is an initial RFC on the unification of IOMMU-based DMA-mapping
code for ARM and ARM64 architectures.

Right now the ARM architecture still uses my old code for the IOMMU-based
DMA-mapping glue, initially merged in commit
4ce63fcd919c32d22528e54dcd89506962933719 ("ARM: dma-mapping: add support
for IOMMU mapper"). In the meantime ARM64 got a new, slightly improved
implementation provided by Robin Murphy in commit
13b8629f651164d71f4d38b821925f93ba4236c8 ("arm64: Add IOMMU dma_ops").

Both implementations are very similar, so unifying them is desirable
to avoid duplicating future work and to simplify the code that uses this
layer on both architectures. In this patchset I've selected the new
implementation (from the ARM64 architecture) as a base. This means that
the old, ARM-specific interface (arm_iommu_* functions) for configuring
IOMMU domains will no longer be available and its users have to be
converted to the new API.

Besides the lack of the old interface, the second difference is additional
requirements for IOMMU drivers. The new code relies on support for
IOMMU_DOMAIN_DMA and a default IOMMU domain, which is automatically
attached by the IOMMU core.
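
As a rough illustration of the first requirement, the userspace sketch
below models a driver's domain allocation callback that accepts the DMA
domain type in addition to unmanaged domains. All names in it
(sketch_domain, sketch_domain_alloc, SKETCH_DOMAIN_*) are hypothetical
and are not taken from any IOMMU driver; the has_dma_state flag merely
stands in for whatever per-domain DMA state a real driver would set up.

```c
/* Userspace model only: these types and names are not the kernel's. */
#include <assert.h>
#include <stdlib.h>

enum sketch_domain_type {
	SKETCH_DOMAIN_UNMANAGED,	/* managed explicitly by the caller */
	SKETCH_DOMAIN_DMA,		/* used by the common DMA-mapping layer */
	SKETCH_DOMAIN_IDENTITY,		/* not supported by this model driver */
};

struct sketch_domain {
	enum sketch_domain_type type;
	int has_dma_state;	/* 1 when DMA-mapping state was set up */
};

/* Allocate a domain; returns NULL for unsupported types or on OOM. */
static struct sketch_domain *sketch_domain_alloc(enum sketch_domain_type type)
{
	struct sketch_domain *d;

	if (type != SKETCH_DOMAIN_UNMANAGED && type != SKETCH_DOMAIN_DMA)
		return NULL;

	d = calloc(1, sizeof(*d));
	if (!d)
		return NULL;

	d->type = type;
	d->has_dma_state = (type == SKETCH_DOMAIN_DMA);
	return d;
}

/* Release a domain obtained from sketch_domain_alloc(). */
static void sketch_domain_free(struct sketch_domain *d)
{
	assert(d != NULL);
	free(d);
}
```

A driver that rejects the DMA type here would presumably never get a
default domain usable by the common code.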

The new code also assumes that the IOMMU-based DMA-mapping ops are
mainly configured from the arch_setup_dma_ops() function, which means that
the IOMMU driver should provide the needed of_xlate callbacks and initialize
IOMMU ops for device nodes. However, it should also be possible to
initialize IOMMU-based DMA-mapping ops for client devices directly from
IOMMU drivers by calling common_iommu_setup_dma_ops() (some drivers use
such an approach).

IOMMU drivers should also be aware that the default domain is attached
via device_attach, and that the device_attach callback can then be called
again with a different domain without a prior detach from the default
domain. For more information on this issue, see the following thread:
https://lists.linaro.org/pipermail/linaro-mm-sig/2016-February/004625.html
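
To make the re-attach case concrete, here is a minimal userspace model
of an attach callback that tolerates being called while the device is
still attached to another (e.g. default) domain. The types and names
(model_domain, model_device, model_attach_dev) are hypothetical and only
sketch the bookkeeping; a real driver would also have to reprogram the
hardware.

```c
/* Userspace model only: these types and names are not the kernel's. */
#include <assert.h>
#include <stddef.h>

struct model_domain {
	int id;
	int attached_devices;	/* devices currently using this domain */
};

struct model_device {
	struct model_domain *domain;	/* domain the device is attached to */
};

/*
 * Attach dev to domain. If the device is still attached to a different
 * domain, perform the implicit detach here instead of failing or
 * leaking the old attachment.
 */
static int model_attach_dev(struct model_domain *domain,
			    struct model_device *dev)
{
	assert(domain != NULL && dev != NULL);

	if (dev->domain == domain)
		return 0;	/* already attached, nothing to do */

	if (dev->domain != NULL)
		dev->domain->attached_devices--;	/* implicit detach */

	dev->domain = domain;
	domain->attached_devices++;
	return 0;
}
```

Attaching a device first to a default domain and then to a second one
leaves it counted only in the second, which is the behaviour the
Exynos DRM conversion relies on when it moves sub-devices into its
common domain.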

Currently there are 4 users of the old arm_iommu_* interface:
1. Exynos DRM driver
2. Rockchip DRM driver
3. OMAP3 ISP camera driver
4. Renesas VMSA-compatible IPMMU driver

In this patchset I've converted the Exynos DRM driver to the new API (patch
1). This required some changes in the memory management model inside the
driver and removal of some hacks, which were used to set up IOMMU-based
DMA-mapping ops on the 'exynos-drm' virtual device and a common IOMMU
domain for all Exynos DRM sub-devices. Those changes have been posted
separately here: http://www.spinics.net/lists/dri-devel/msg100861.html
The Rockchip DRM driver requires a similar conversion.

Converting the OMAP3 ISP camera driver to the new API requires adding support
for IOMMU groups to the OMAP IOMMU driver, because the new DMA/IOMMU code
uses IOMMU_DOMAIN_DMA type domains and default groups.

The Renesas IPMMU driver also needs to be extended with IOMMU_DOMAIN_DMA
domain type support. It can also be prepared for IOMMU_OF_DECLARE and
of_xlate callback-based initialization to let the core automatically set up
the IOMMU-based DMA-mapping implementation.

Patch 2 moves the existing code from arch/arm64 to drivers/iommu and
introduces some minor changes in function names, mainly adding an arch_
prefix to some dma-mapping internal functions which stay in arch/arm64/
(functions with similar names are present in arch/arm). Patch 3 adapts the
ARM architecture to the common code.

I would like to get your comments on the proposed approach. There is
still some work that needs to be done to convert the remaining users of the
old API and to update IOMMU drivers to the new API requirements. This
change needs to be tested on all the affected ARM sub-architectures.

So far the patches have been tested only on Exynos-based boards: ARM 32-bit
Exynos4412 and Exynos5422 boards, and ARM 64-bit Exynos 5433 (with some
out-of-tree DTS).

To ease testing I've prepared a branch with all the needed patches
(including all the patches for the Exynos subarch that have been posted
as separate patchsets):
https://git.linaro.org/people/marek.szyprowski/linux-srpol.git v4.5-dma-iommu-unification

Patches are based on Linux v4.5-rc4 vanilla tree.

Best regards
Marek Szyprowski
Samsung R&D Institute Poland


Patch summary:

Marek Szyprowski (3):
  drm/exynos: rewrite IOMMU support code
  iommu: dma-iommu: move IOMMU/DMA-mapping code from ARM64 arch to drivers
  iommu: dma-iommu: use common implementation also on ARM architecture

 arch/arm/Kconfig                          |   22 +-
 arch/arm/include/asm/device.h             |    9 -
 arch/arm/include/asm/dma-iommu.h          |   37 -
 arch/arm/include/asm/dma-mapping.h        |   59 +-
 arch/arm/mm/dma-mapping.c                 | 1158 +----------------------------
 arch/arm64/include/asm/dma-mapping.h      |   39 +-
 arch/arm64/mm/dma-mapping.c               |  491 +-----------
 drivers/gpu/drm/exynos/Kconfig            |    2 +-
 drivers/gpu/drm/exynos/exynos_drm_drv.c   |    7 +-
 drivers/gpu/drm/exynos/exynos_drm_drv.h   |    2 +-
 drivers/gpu/drm/exynos/exynos_drm_iommu.c |   91 ++-
 drivers/gpu/drm/exynos/exynos_drm_iommu.h |    2 +-
 drivers/gpu/drm/rockchip/Kconfig          |    1 +
 drivers/iommu/Kconfig                     |    1 +
 drivers/iommu/Makefile                    |    2 +-
 drivers/iommu/dma-iommu-ops.c             |  471 ++++++++++++
 drivers/media/platform/Kconfig            |    1 +
 include/linux/dma-iommu.h                 |   14 +
 18 files changed, 679 insertions(+), 1730 deletions(-)
 delete mode 100644 arch/arm/include/asm/dma-iommu.h
 create mode 100644 drivers/iommu/dma-iommu-ops.c

-- 
1.9.2


* [RFC 1/3] drm/exynos: rewrite IOMMU support code
@ 2016-02-19  8:22   ` Marek Szyprowski
  0 siblings, 0 replies; 45+ messages in thread
From: Marek Szyprowski @ 2016-02-19  8:22 UTC (permalink / raw)
  To: iommu, linux-arm-kernel, linux-kernel
  Cc: Marek Szyprowski, linaro-mm-sig, dri-devel, Arnd Bergmann,
	Will Deacon, Catalin Marinas, Robin Murphy,
	Russell King - ARM Linux, Joerg Roedel, Laurent Pinchart,
	Sakari Ailus, Mark Yao, Heiko Stuebner, Tomasz Figa, Inki Dae,
	Bartlomiej Zolnierkiewicz, Krzysztof Kozlowski

This patch replaces usage of ARM-specific IOMMU/DMA-mapping related calls
with new generic code for managing the DMA-IOMMU integration layer. It also
removes all the hacks which were needed to configure a common DMA/IO address
space on the virtual exynos-drm device. Since the Exynos GEM code was moved
to use one of the real devices for DMA-mapping operations, such hacks are no
longer needed. The only requirement is to have all the devices which make up
Exynos DRM attached to the same IOMMU domain (to share the IO address space).

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 drivers/gpu/drm/exynos/Kconfig            |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_drv.c   |  7 +--
 drivers/gpu/drm/exynos/exynos_drm_drv.h   |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_iommu.c | 91 +++++++++++++++++++------------
 drivers/gpu/drm/exynos/exynos_drm_iommu.h |  2 +-
 5 files changed, 59 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 83efca941388..b0d0aaa7fea5 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -15,7 +15,7 @@ if DRM_EXYNOS
 
 config DRM_EXYNOS_IOMMU
 	bool
-	depends on EXYNOS_IOMMU && ARM_DMA_USE_IOMMU
+	depends on EXYNOS_IOMMU && IOMMU_DMA
 	default y
 
 comment "CRTCs"
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index c7fce7ffeef5..45aa480f1890 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -159,12 +159,7 @@ static int exynos_drm_load(struct drm_device *dev, unsigned long flags)
 	DRM_INFO("Exynos DRM: using %s device for DMA mapping operations\n",
 		 dev_name(private->dma_dev));
 
-	/*
-	 * create mapping to manage iommu table and set a pointer to iommu
-	 * mapping structure to iommu_mapping of private data.
-	 * also this iommu_mapping can be used to check if iommu is supported
-	 * or not.
-	 */
+	/* create common IOMMU mapping for all devices attached to Exynos DRM */
 	ret = drm_create_iommu_mapping(dev);
 	if (ret < 0) {
 		DRM_ERROR("failed to create iommu mapping.\n");
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.h b/drivers/gpu/drm/exynos/exynos_drm_drv.h
index 303056311c0c..b107f77d0897 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.h
@@ -222,7 +222,7 @@ struct exynos_drm_private {
 	struct device *dma_dev;
 	unsigned long da_start;
 	unsigned long da_space_size;
-	void *mapping;
+	struct iommu_domain *domain;
 
 	unsigned int pipe;
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.c b/drivers/gpu/drm/exynos/exynos_drm_iommu.c
index 146ac88078ae..89e51ed6499d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.c
@@ -14,13 +14,28 @@
 
 #include <linux/dma-mapping.h>
 #include <linux/iommu.h>
-#include <linux/kref.h>
-
-#include <asm/dma-iommu.h>
+#include <linux/dma-iommu.h>
 
 #include "exynos_drm_drv.h"
 #include "exynos_drm_iommu.h"
 
+static inline int configure_dma_max_seg_size(struct device *dev)
+{
+	if (!dev->dma_parms)
+		dev->dma_parms = kzalloc(sizeof(*dev->dma_parms), GFP_KERNEL);
+	if (!dev->dma_parms)
+		return -ENOMEM;
+
+	dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
+	return 0;
+}
+
+static inline void clear_dma_max_seg_size(struct device *dev)
+{
+	kfree(dev->dma_parms);
+	dev->dma_parms = NULL;
+}
+
 /*
  * drm_create_iommu_mapping - create a mapping structure
  *
@@ -28,38 +43,48 @@
  */
 int drm_create_iommu_mapping(struct drm_device *drm_dev)
 {
-	struct dma_iommu_mapping *mapping = NULL;
 	struct exynos_drm_private *priv = drm_dev->dev_private;
+	struct device *dev = to_dma_dev(drm_dev);
+	int ret;
 
 	if (!priv->da_start)
 		priv->da_start = EXYNOS_DEV_ADDR_START;
 	if (!priv->da_space_size)
 		priv->da_space_size = EXYNOS_DEV_ADDR_SIZE;
 
-	mapping = arm_iommu_create_mapping(&platform_bus_type, priv->da_start,
-						priv->da_space_size);
+	priv->domain = iommu_domain_alloc(dev->bus);
+	if (!priv->domain)
+		return -ENOMEM;
 
-	if (IS_ERR(mapping))
-		return PTR_ERR(mapping);
+	ret = iommu_get_dma_cookie(priv->domain);
+	if (ret)
+		goto free_domain;
 
-	priv->mapping = mapping;
+	ret = iommu_dma_init_domain(priv->domain, priv->da_start,
+				    priv->da_space_size);
+	if (ret)
+		goto put_cookie;
 
 	return 0;
+
+put_cookie:
+	iommu_put_dma_cookie(priv->domain);
+free_domain:
+	iommu_domain_free(priv->domain);
+	return ret;
 }
 
 /*
  * drm_release_iommu_mapping - release iommu mapping structure
  *
  * @drm_dev: DRM device
- *
- * if mapping->kref becomes 0 then all things related to iommu mapping
- * will be released
  */
 void drm_release_iommu_mapping(struct drm_device *drm_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
 
-	arm_iommu_release_mapping(priv->mapping);
+	iommu_put_dma_cookie(priv->domain);
+	iommu_domain_free(priv->domain);
 }
 
 /*
@@ -75,29 +100,25 @@ int drm_iommu_attach_device(struct drm_device *drm_dev,
 				struct device *subdrv_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
+	struct device *dev = to_dma_dev(drm_dev);
+	struct iommu_domain *domain = priv->domain;
 	int ret;
 
-	if (!priv->mapping)
-		return 0;
-
-	subdrv_dev->dma_parms = devm_kzalloc(subdrv_dev,
-					sizeof(*subdrv_dev->dma_parms),
-					GFP_KERNEL);
-	if (!subdrv_dev->dma_parms)
-		return -ENOMEM;
-
-	dma_set_max_seg_size(subdrv_dev, 0xffffffffu);
-
-	if (subdrv_dev->archdata.mapping)
-		arm_iommu_detach_device(subdrv_dev);
+	if (get_dma_ops(dev) != get_dma_ops(subdrv_dev)) {
+		DRM_ERROR("Device %s lacks support for IOMMU\n",
+			  dev_name(subdrv_dev));
+		return -EINVAL;
+	}
 
-	ret = arm_iommu_attach_device(subdrv_dev, priv->mapping);
-	if (ret < 0) {
-		DRM_DEBUG_KMS("failed iommu attach.\n");
+	ret = configure_dma_max_seg_size(subdrv_dev);
+	if (ret)
 		return ret;
-	}
 
-	return 0;
+	ret = iommu_attach_device(domain, subdrv_dev);
+	if (ret != 0)
+		clear_dma_max_seg_size(subdrv_dev);
+
+	return ret;
 }
 
 /*
@@ -113,10 +134,8 @@ void drm_iommu_detach_device(struct drm_device *drm_dev,
 				struct device *subdrv_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
-	struct dma_iommu_mapping *mapping = priv->mapping;
-
-	if (!mapping || !mapping->domain)
-		return;
+	struct iommu_domain *domain = priv->domain;
 
-	arm_iommu_detach_device(subdrv_dev);
+	iommu_detach_device(domain, subdrv_dev);
+	clear_dma_max_seg_size(subdrv_dev);
 }
diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
index c1584f24d23d..8318ebe2de6d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
@@ -30,7 +30,7 @@ void drm_iommu_detach_device(struct drm_device *dev_dev,
 static inline bool is_drm_iommu_supported(struct drm_device *drm_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
-	return priv->mapping ? true : false;
+	return priv->domain ? true : false;
 }
 
 #else
-- 
1.9.2

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC 1/3] drm/exynos: rewrite IOMMU support code
@ 2016-02-19  8:22   ` Marek Szyprowski
  0 siblings, 0 replies; 45+ messages in thread
From: Marek Szyprowski @ 2016-02-19  8:22 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: Inki Dae, Krzysztof Kozlowski, Russell King - ARM Linux,
	Heiko Stuebner, Arnd Bergmann, Bartlomiej Zolnierkiewicz,
	Catalin Marinas, Will Deacon,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linaro-mm-sig-cunTk1MwBs8s++Sfvej+rw, Sakari Ailus,
	Laurent Pinchart, Mark Yao

This patch replaces usage of ARM-specific IOMMU/DMA-mapping related calls
with new generic code for managing DMA-IOMMU integration layer. It also
removes all the hacks, which were needed to configure common DMA/IO address
space on the virtual exynos-drm device. Since moving Exynos GEM code to use
on of real devices for DMA-mapping operations, such hacks are no longer
needed. The only requirement is to have all the devices, which build
Exynos DRM, attached to the same IOMMU domain (to share IO address space).

Signed-off-by: Marek Szyprowski <m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
---
 drivers/gpu/drm/exynos/Kconfig            |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_drv.c   |  7 +--
 drivers/gpu/drm/exynos/exynos_drm_drv.h   |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_iommu.c | 91 +++++++++++++++++++------------
 drivers/gpu/drm/exynos/exynos_drm_iommu.h |  2 +-
 5 files changed, 59 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 83efca941388..b0d0aaa7fea5 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -15,7 +15,7 @@ if DRM_EXYNOS
 
 config DRM_EXYNOS_IOMMU
 	bool
-	depends on EXYNOS_IOMMU && ARM_DMA_USE_IOMMU
+	depends on EXYNOS_IOMMU && IOMMU_DMA
 	default y
 
 comment "CRTCs"
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index c7fce7ffeef5..45aa480f1890 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -159,12 +159,7 @@ static int exynos_drm_load(struct drm_device *dev, unsigned long flags)
 	DRM_INFO("Exynos DRM: using %s device for DMA mapping operations\n",
 		 dev_name(private->dma_dev));
 
-	/*
-	 * create mapping to manage iommu table and set a pointer to iommu
-	 * mapping structure to iommu_mapping of private data.
-	 * also this iommu_mapping can be used to check if iommu is supported
-	 * or not.
-	 */
+	/* create common IOMMU mapping for all devices attached to Exynos DRM */
 	ret = drm_create_iommu_mapping(dev);
 	if (ret < 0) {
 		DRM_ERROR("failed to create iommu mapping.\n");
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.h b/drivers/gpu/drm/exynos/exynos_drm_drv.h
index 303056311c0c..b107f77d0897 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.h
@@ -222,7 +222,7 @@ struct exynos_drm_private {
 	struct device *dma_dev;
 	unsigned long da_start;
 	unsigned long da_space_size;
-	void *mapping;
+	struct iommu_domain *domain;
 
 	unsigned int pipe;
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.c b/drivers/gpu/drm/exynos/exynos_drm_iommu.c
index 146ac88078ae..89e51ed6499d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.c
@@ -14,13 +14,28 @@
 
 #include <linux/dma-mapping.h>
 #include <linux/iommu.h>
-#include <linux/kref.h>
-
-#include <asm/dma-iommu.h>
+#include <linux/dma-iommu.h>
 
 #include "exynos_drm_drv.h"
 #include "exynos_drm_iommu.h"
 
+static inline int configure_dma_max_seg_size(struct device *dev)
+{
+	if (!dev->dma_parms)
+		dev->dma_parms = kzalloc(sizeof(*dev->dma_parms), GFP_KERNEL);
+	if (!dev->dma_parms)
+		return -ENOMEM;
+
+	dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
+	return 0;
+}
+
+static inline void clear_dma_max_seg_size(struct device *dev)
+{
+	kfree(dev->dma_parms);
+	dev->dma_parms = NULL;
+}
+
 /*
  * drm_create_iommu_mapping - create a mapping structure
  *
@@ -28,38 +43,48 @@
  */
 int drm_create_iommu_mapping(struct drm_device *drm_dev)
 {
-	struct dma_iommu_mapping *mapping = NULL;
 	struct exynos_drm_private *priv = drm_dev->dev_private;
+	struct device *dev = to_dma_dev(drm_dev);
+	int ret;
 
 	if (!priv->da_start)
 		priv->da_start = EXYNOS_DEV_ADDR_START;
 	if (!priv->da_space_size)
 		priv->da_space_size = EXYNOS_DEV_ADDR_SIZE;
 
-	mapping = arm_iommu_create_mapping(&platform_bus_type, priv->da_start,
-						priv->da_space_size);
+	priv->domain = iommu_domain_alloc(dev->bus);
+	if (!priv->domain)
+		return -ENOMEM;
 
-	if (IS_ERR(mapping))
-		return PTR_ERR(mapping);
+	ret = iommu_get_dma_cookie(priv->domain);
+	if (ret)
+		goto free_domain;
 
-	priv->mapping = mapping;
+	ret = iommu_dma_init_domain(priv->domain, priv->da_start,
+				    priv->da_space_size);
+	if (ret)
+		goto put_cookie;
 
 	return 0;
+
+put_cookie:
+	iommu_put_dma_cookie(priv->domain);
+free_domain:
+	iommu_domain_free(priv->domain);
+	return ret;
 }
 
 /*
  * drm_release_iommu_mapping - release iommu mapping structure
  *
  * @drm_dev: DRM device
- *
- * if mapping->kref becomes 0 then all things related to iommu mapping
- * will be released
  */
 void drm_release_iommu_mapping(struct drm_device *drm_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
 
-	arm_iommu_release_mapping(priv->mapping);
+	iommu_put_dma_cookie(priv->domain);
+	iommu_domain_free(priv->domain);
 }
 
 /*
@@ -75,29 +100,25 @@ int drm_iommu_attach_device(struct drm_device *drm_dev,
 				struct device *subdrv_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
+	struct device *dev = to_dma_dev(drm_dev);
+	struct iommu_domain *domain = priv->domain;
 	int ret;
 
-	if (!priv->mapping)
-		return 0;
-
-	subdrv_dev->dma_parms = devm_kzalloc(subdrv_dev,
-					sizeof(*subdrv_dev->dma_parms),
-					GFP_KERNEL);
-	if (!subdrv_dev->dma_parms)
-		return -ENOMEM;
-
-	dma_set_max_seg_size(subdrv_dev, 0xffffffffu);
-
-	if (subdrv_dev->archdata.mapping)
-		arm_iommu_detach_device(subdrv_dev);
+	if (get_dma_ops(dev) != get_dma_ops(subdrv_dev)) {
+		DRM_ERROR("Device %s lacks support for IOMMU\n",
+			  dev_name(subdrv_dev));
+		return -EINVAL;
+	}
 
-	ret = arm_iommu_attach_device(subdrv_dev, priv->mapping);
-	if (ret < 0) {
-		DRM_DEBUG_KMS("failed iommu attach.\n");
+	ret = configure_dma_max_seg_size(subdrv_dev);
+	if (ret)
 		return ret;
-	}
 
-	return 0;
+	ret = iommu_attach_device(domain, subdrv_dev);
+	if (ret != 0)
+		clear_dma_max_seg_size(subdrv_dev);
+
+	return ret;
 }
 
 /*
@@ -113,10 +134,8 @@ void drm_iommu_detach_device(struct drm_device *drm_dev,
 				struct device *subdrv_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
-	struct dma_iommu_mapping *mapping = priv->mapping;
-
-	if (!mapping || !mapping->domain)
-		return;
+	struct iommu_domain *domain = priv->domain;
 
-	arm_iommu_detach_device(subdrv_dev);
+	iommu_detach_device(domain, subdrv_dev);
+	clear_dma_max_seg_size(subdrv_dev);
 }
diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
index c1584f24d23d..8318ebe2de6d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
@@ -30,7 +30,7 @@ void drm_iommu_detach_device(struct drm_device *dev_dev,
 static inline bool is_drm_iommu_supported(struct drm_device *drm_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
-	return priv->mapping ? true : false;
+	return priv->domain ? true : false;
 }
 
 #else
-- 
1.9.2

* [RFC 1/3] drm/exynos: rewrite IOMMU support code
@ 2016-02-19  8:22   ` Marek Szyprowski
  0 siblings, 0 replies; 45+ messages in thread
From: Marek Szyprowski @ 2016-02-19  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

This patch replaces the ARM-specific IOMMU/DMA-mapping calls with the new
generic code for managing the DMA-IOMMU integration layer. It also removes
all the hacks that were needed to configure a common DMA/IO address space
on the virtual exynos-drm device. Since the Exynos GEM code now uses one of
the real devices for DMA-mapping operations, such hacks are no longer
needed. The only requirement is that all the devices making up Exynos DRM
are attached to the same IOMMU domain (to share the IO address space).

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
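Not part of the patch: for anyone who wants to exercise the setup/unwind
ordering outside the kernel, here is a minimal userspace sketch of the
control flow in the new drm_create_iommu_mapping(). The mock_* functions,
the fail_* knobs and live_domains are hypothetical names invented for this
illustration. They only imitate iommu_domain_alloc(), iommu_get_dma_cookie()
and iommu_dma_init_domain() and are not kernel API. One deliberate
difference from the patch: the sketch clears the caller's pointer on
failure, while the patch leaves priv->domain untouched after freeing it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct iommu_domain. */
struct mock_domain {
	bool has_cookie;
	bool initialized;
};

/* Failure injection knobs, for illustration only. */
static bool fail_cookie;
static bool fail_init;
static int live_domains;	/* domains currently allocated */

static struct mock_domain *mock_domain_alloc(void)
{
	struct mock_domain *d = calloc(1, sizeof(*d));

	if (d)
		live_domains++;
	return d;
}

static void mock_domain_free(struct mock_domain *d)
{
	/* The cookie has to be dropped before the domain goes away. */
	assert(d && !d->has_cookie);
	free(d);
	live_domains--;
}

static int mock_get_cookie(struct mock_domain *d)
{
	if (fail_cookie)
		return -ENOMEM;
	d->has_cookie = true;
	return 0;
}

static void mock_put_cookie(struct mock_domain *d)
{
	d->has_cookie = false;
}

static int mock_init_domain(struct mock_domain *d)
{
	if (fail_init)
		return -EINVAL;
	d->initialized = true;
	return 0;
}

/* Same shape as the patched function: each label undoes one earlier step. */
static int create_mapping(struct mock_domain **out)
{
	struct mock_domain *d;
	int ret;

	*out = NULL;	/* assumption: never hand back a stale pointer */

	d = mock_domain_alloc();
	if (!d)
		return -ENOMEM;

	ret = mock_get_cookie(d);
	if (ret)
		goto free_domain;

	ret = mock_init_domain(d);
	if (ret)
		goto put_cookie;

	*out = d;
	return 0;

put_cookie:
	mock_put_cookie(d);
free_domain:
	mock_domain_free(d);
	return ret;
}

/* Release in the reverse order of setup, as drm_release_iommu_mapping() does. */
static void release_mapping(struct mock_domain *d)
{
	mock_put_cookie(d);
	mock_domain_free(d);
}
```

Calling create_mapping() with fail_cookie or fail_init set should return the
injected error and leave live_domains at zero. Whether the driver itself
needs to reset priv->domain on failure depends on whether anything consults
it after a failed load, which this sketch does not try to answer.
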
 drivers/gpu/drm/exynos/Kconfig            |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_drv.c   |  7 +--
 drivers/gpu/drm/exynos/exynos_drm_drv.h   |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_iommu.c | 91 +++++++++++++++++++------------
 drivers/gpu/drm/exynos/exynos_drm_iommu.h |  2 +-
 5 files changed, 59 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/exynos/Kconfig b/drivers/gpu/drm/exynos/Kconfig
index 83efca941388..b0d0aaa7fea5 100644
--- a/drivers/gpu/drm/exynos/Kconfig
+++ b/drivers/gpu/drm/exynos/Kconfig
@@ -15,7 +15,7 @@ if DRM_EXYNOS
 
 config DRM_EXYNOS_IOMMU
 	bool
-	depends on EXYNOS_IOMMU && ARM_DMA_USE_IOMMU
+	depends on EXYNOS_IOMMU && IOMMU_DMA
 	default y
 
 comment "CRTCs"
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.c b/drivers/gpu/drm/exynos/exynos_drm_drv.c
index c7fce7ffeef5..45aa480f1890 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.c
@@ -159,12 +159,7 @@ static int exynos_drm_load(struct drm_device *dev, unsigned long flags)
 	DRM_INFO("Exynos DRM: using %s device for DMA mapping operations\n",
 		 dev_name(private->dma_dev));
 
-	/*
-	 * create mapping to manage iommu table and set a pointer to iommu
-	 * mapping structure to iommu_mapping of private data.
-	 * also this iommu_mapping can be used to check if iommu is supported
-	 * or not.
-	 */
+	/* create common IOMMU mapping for all devices attached to Exynos DRM */
 	ret = drm_create_iommu_mapping(dev);
 	if (ret < 0) {
 		DRM_ERROR("failed to create iommu mapping.\n");
diff --git a/drivers/gpu/drm/exynos/exynos_drm_drv.h b/drivers/gpu/drm/exynos/exynos_drm_drv.h
index 303056311c0c..b107f77d0897 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_drv.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_drv.h
@@ -222,7 +222,7 @@ struct exynos_drm_private {
 	struct device *dma_dev;
 	unsigned long da_start;
 	unsigned long da_space_size;
-	void *mapping;
+	struct iommu_domain *domain;
 
 	unsigned int pipe;
 
diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.c b/drivers/gpu/drm/exynos/exynos_drm_iommu.c
index 146ac88078ae..89e51ed6499d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.c
@@ -14,13 +14,28 @@
 
 #include <linux/dma-mapping.h>
 #include <linux/iommu.h>
-#include <linux/kref.h>
-
-#include <asm/dma-iommu.h>
+#include <linux/dma-iommu.h>
 
 #include "exynos_drm_drv.h"
 #include "exynos_drm_iommu.h"
 
+static inline int configure_dma_max_seg_size(struct device *dev)
+{
+	if (!dev->dma_parms)
+		dev->dma_parms = kzalloc(sizeof(*dev->dma_parms), GFP_KERNEL);
+	if (!dev->dma_parms)
+		return -ENOMEM;
+
+	dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
+	return 0;
+}
+
+static inline void clear_dma_max_seg_size(struct device *dev)
+{
+	kfree(dev->dma_parms);
+	dev->dma_parms = NULL;
+}
+
 /*
  * drm_create_iommu_mapping - create a mapping structure
  *
@@ -28,38 +43,48 @@
  */
 int drm_create_iommu_mapping(struct drm_device *drm_dev)
 {
-	struct dma_iommu_mapping *mapping = NULL;
 	struct exynos_drm_private *priv = drm_dev->dev_private;
+	struct device *dev = to_dma_dev(drm_dev);
+	int ret;
 
 	if (!priv->da_start)
 		priv->da_start = EXYNOS_DEV_ADDR_START;
 	if (!priv->da_space_size)
 		priv->da_space_size = EXYNOS_DEV_ADDR_SIZE;
 
-	mapping = arm_iommu_create_mapping(&platform_bus_type, priv->da_start,
-						priv->da_space_size);
+	priv->domain = iommu_domain_alloc(dev->bus);
+	if (!priv->domain)
+		return -ENOMEM;
 
-	if (IS_ERR(mapping))
-		return PTR_ERR(mapping);
+	ret = iommu_get_dma_cookie(priv->domain);
+	if (ret)
+		goto free_domain;
 
-	priv->mapping = mapping;
+	ret = iommu_dma_init_domain(priv->domain, priv->da_start,
+				    priv->da_space_size);
+	if (ret)
+		goto put_cookie;
 
 	return 0;
+
+put_cookie:
+	iommu_put_dma_cookie(priv->domain);
+free_domain:
+	iommu_domain_free(priv->domain);
+	return ret;
 }
 
 /*
  * drm_release_iommu_mapping - release iommu mapping structure
  *
  * @drm_dev: DRM device
- *
- * if mapping->kref becomes 0 then all things related to iommu mapping
- * will be released
  */
 void drm_release_iommu_mapping(struct drm_device *drm_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
 
-	arm_iommu_release_mapping(priv->mapping);
+	iommu_put_dma_cookie(priv->domain);
+	iommu_domain_free(priv->domain);
 }
 
 /*
@@ -75,29 +100,25 @@ int drm_iommu_attach_device(struct drm_device *drm_dev,
 				struct device *subdrv_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
+	struct device *dev = to_dma_dev(drm_dev);
+	struct iommu_domain *domain = priv->domain;
 	int ret;
 
-	if (!priv->mapping)
-		return 0;
-
-	subdrv_dev->dma_parms = devm_kzalloc(subdrv_dev,
-					sizeof(*subdrv_dev->dma_parms),
-					GFP_KERNEL);
-	if (!subdrv_dev->dma_parms)
-		return -ENOMEM;
-
-	dma_set_max_seg_size(subdrv_dev, 0xffffffffu);
-
-	if (subdrv_dev->archdata.mapping)
-		arm_iommu_detach_device(subdrv_dev);
+	if (get_dma_ops(dev) != get_dma_ops(subdrv_dev)) {
+		DRM_ERROR("Device %s lacks support for IOMMU\n",
+			  dev_name(subdrv_dev));
+		return -EINVAL;
+	}
 
-	ret = arm_iommu_attach_device(subdrv_dev, priv->mapping);
-	if (ret < 0) {
-		DRM_DEBUG_KMS("failed iommu attach.\n");
+	ret = configure_dma_max_seg_size(subdrv_dev);
+	if (ret)
 		return ret;
-	}
 
-	return 0;
+	ret = iommu_attach_device(domain, subdrv_dev);
+	if (ret != 0)
+		clear_dma_max_seg_size(subdrv_dev);
+
+	return ret;
 }
 
 /*
@@ -113,10 +134,8 @@ void drm_iommu_detach_device(struct drm_device *drm_dev,
 				struct device *subdrv_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
-	struct dma_iommu_mapping *mapping = priv->mapping;
-
-	if (!mapping || !mapping->domain)
-		return;
+	struct iommu_domain *domain = priv->domain;
 
-	arm_iommu_detach_device(subdrv_dev);
+	iommu_detach_device(domain, subdrv_dev);
+	clear_dma_max_seg_size(subdrv_dev);
 }
diff --git a/drivers/gpu/drm/exynos/exynos_drm_iommu.h b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
index c1584f24d23d..8318ebe2de6d 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_iommu.h
+++ b/drivers/gpu/drm/exynos/exynos_drm_iommu.h
@@ -30,7 +30,7 @@ void drm_iommu_detach_device(struct drm_device *dev_dev,
 static inline bool is_drm_iommu_supported(struct drm_device *drm_dev)
 {
 	struct exynos_drm_private *priv = drm_dev->dev_private;
-	return priv->mapping ? true : false;
+	return priv->domain ? true : false;
 }
 
 #else
-- 
1.9.2

* [RFC 2/3] iommu: dma-iommu: move IOMMU/DMA-mapping code from ARM64 arch to drivers
  2016-02-19  8:22 ` Marek Szyprowski
  (?)
@ 2016-02-19  8:22   ` Marek Szyprowski
  -1 siblings, 0 replies; 45+ messages in thread
From: Marek Szyprowski @ 2016-02-19  8:22 UTC (permalink / raw)
  To: iommu, linux-arm-kernel, linux-kernel
  Cc: Marek Szyprowski, linaro-mm-sig, dri-devel, Arnd Bergmann,
	Will Deacon, Catalin Marinas, Robin Murphy,
	Russell King - ARM Linux, Joerg Roedel, Laurent Pinchart,
	Sakari Ailus, Mark Yao, Heiko Stuebner, Tomasz Figa, Inki Dae,
	Bartlomiej Zolnierkiewicz, Krzysztof Kozlowski

This patch moves all the IOMMU-based DMA-mapping code from arch/arm64/mm
to drivers/iommu/dma-iommu-ops.c. This way it can be easily shared with
the ARM architecture, which will also use it.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
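Not part of the patch: a small userspace sketch of the layering this move
relies on, in case it helps review. The common code installs the IOMMU ops
through an arch-provided setter, and the arch code falls back to its
platform ops when the common setup reports failure. All mock_* names are
hypothetical. The sketch assumes that common_iommu_setup_dma_ops() returns
true once IOMMU ops are installed and that it goes through
arch_set_dma_ops() to do so; both points are inferred from the arm64 hunks
rather than stated by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct dma_map_ops and struct device. */
struct mock_dma_ops {
	const char *name;
};

struct mock_device {
	const struct mock_dma_ops *dma_ops;
	bool dma_coherent;
};

static const struct mock_dma_ops mock_iommu_ops = { "iommu" };
static const struct mock_dma_ops mock_swiotlb_ops = { "swiotlb" };

/* Arch side: the only code that knows where the ops pointer is stored. */
static void mock_arch_set_dma_ops(struct mock_device *dev,
				  const struct mock_dma_ops *ops)
{
	assert(dev && ops);
	dev->dma_ops = ops;
}

/*
 * Common side: returns true only if IOMMU-backed ops were installed.
 * have_iommu models "an IOMMU was found and the domain could be set up".
 */
static bool mock_common_iommu_setup(struct mock_device *dev, bool have_iommu)
{
	if (!have_iommu)
		return false;

	mock_arch_set_dma_ops(dev, &mock_iommu_ops);
	return true;
}

/* Arch entry point: record coherency, try the IOMMU, else fall back. */
static void mock_arch_setup_dma_ops(struct mock_device *dev, bool have_iommu,
				    bool coherent)
{
	dev->dma_coherent = coherent;

	if (!mock_common_iommu_setup(dev, have_iommu))
		mock_arch_set_dma_ops(dev, &mock_swiotlb_ops);
}
```

With this split the common file never touches dev->archdata directly, which
is presumably what lets the same code be reused on ARM, where the ops
pointer may be stored differently.
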
 arch/arm64/include/asm/dma-mapping.h |  39 ++-
 arch/arm64/mm/dma-mapping.c          | 491 ++---------------------------------
 drivers/iommu/Makefile               |   2 +-
 drivers/iommu/dma-iommu-ops.c        | 471 +++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h            |  14 +
 5 files changed, 538 insertions(+), 479 deletions(-)
 create mode 100644 drivers/iommu/dma-iommu-ops.c

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index ba437f090a74..3a582d820717 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -22,6 +22,7 @@
 #include <linux/vmalloc.h>
 
 #include <xen/xen.h>
+#include <asm/cacheflush.h>
 #include <asm/xen/hypervisor.h>
 
 #define DMA_ERROR_CODE	(~(dma_addr_t)0)
@@ -47,14 +48,17 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 		return __generic_dma_ops(dev);
 }
 
+static inline void arch_set_dma_ops(struct device *dev, struct dma_map_ops *ops)
+{
+	dev->archdata.dma_ops = ops;
+}
+
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			struct iommu_ops *iommu, bool coherent);
 #define arch_setup_dma_ops	arch_setup_dma_ops
 
-#ifdef CONFIG_IOMMU_DMA
 void arch_teardown_dma_ops(struct device *dev);
 #define arch_teardown_dma_ops	arch_teardown_dma_ops
-#endif
 
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
@@ -86,5 +90,36 @@ static inline void dma_mark_clean(void *addr, size_t size)
 {
 }
 
+static inline void arch_flush_page(struct device *dev, const void *virt,
+				   phys_addr_t phys)
+{
+	__dma_flush_range(virt, virt + PAGE_SIZE);
+}
+
+static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
+				     enum dma_data_direction dir)
+{
+	__dma_map_area(phys_to_virt(phys), size, dir);
+}
+
+static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
+				       enum dma_data_direction dir)
+{
+	__dma_unmap_area(phys_to_virt(phys), size, dir);
+}
+
+static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
+					pgprot_t prot, bool coherent)
+{
+	if (!coherent || dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs))
+		return pgprot_writecombine(prot);
+	return prot;
+}
+
+extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
+					 gfp_t flags);
+extern bool arch_in_atomic_pool(void *start, size_t size);
+extern int arch_free_from_atomic_pool(void *start, size_t size);
+
 #endif	/* __KERNEL__ */
 #endif	/* __ASM_DMA_MAPPING_H */
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index a6e757cbab77..d8cb8552bbff 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -24,19 +24,12 @@
 #include <linux/genalloc.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
+#include <linux/dma-iommu.h>
 #include <linux/vmalloc.h>
 #include <linux/swiotlb.h>
 
 #include <asm/cacheflush.h>
 
-static pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot,
-				 bool coherent)
-{
-	if (!coherent || dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs))
-		return pgprot_writecombine(prot);
-	return prot;
-}
-
 static struct gen_pool *atomic_pool;
 
 #define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
@@ -49,7 +42,7 @@ static int __init early_coherent_pool(char *p)
 }
 early_param("coherent_pool", early_coherent_pool);
 
-static void *__alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags)
+void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page, gfp_t flags)
 {
 	unsigned long val;
 	void *ptr = NULL;
@@ -71,14 +64,14 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags)
 	return ptr;
 }
 
-static bool __in_atomic_pool(void *start, size_t size)
+bool arch_in_atomic_pool(void *start, size_t size)
 {
 	return addr_in_gen_pool(atomic_pool, (unsigned long)start, size);
 }
 
-static int __free_from_pool(void *start, size_t size)
+int arch_free_from_atomic_pool(void *start, size_t size)
 {
-	if (!__in_atomic_pool(start, size))
+	if (!arch_in_atomic_pool(start, size))
 		return 0;
 
 	gen_pool_free(atomic_pool, (unsigned long)start, size);
@@ -142,13 +135,13 @@ static void *__dma_alloc(struct device *dev, size_t size,
 	struct page *page;
 	void *ptr, *coherent_ptr;
 	bool coherent = is_device_dma_coherent(dev);
-	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, false);
+	pgprot_t prot = arch_get_dma_pgprot(attrs, PAGE_KERNEL, false);
 
 	size = PAGE_ALIGN(size);
 
 	if (!coherent && !gfpflags_allow_blocking(flags)) {
 		struct page *page = NULL;
-		void *addr = __alloc_from_pool(size, &page, flags);
+		void *addr = arch_alloc_from_atomic_pool(size, &page, flags);
 
 		if (addr)
 			*dma_handle = phys_to_dma(dev, page_to_phys(page));
@@ -192,7 +185,7 @@ static void __dma_free(struct device *dev, size_t size,
 	size = PAGE_ALIGN(size);
 
 	if (!is_device_dma_coherent(dev)) {
-		if (__free_from_pool(vaddr, size))
+		if (arch_free_from_atomic_pool(vaddr, size))
 			return;
 		vunmap(vaddr);
 	}
@@ -312,7 +305,7 @@ static int __swiotlb_mmap(struct device *dev,
 	unsigned long pfn = dma_to_phys(dev, dma_addr) >> PAGE_SHIFT;
 	unsigned long off = vma->vm_pgoff;
 
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
+	vma->vm_page_prot = arch_get_dma_pgprot(attrs, vma->vm_page_prot,
 					     is_device_dma_coherent(dev));
 
 	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
@@ -526,470 +519,16 @@ static int __init dma_debug_do_init(void)
 }
 fs_initcall(dma_debug_do_init);
 
-
-#ifdef CONFIG_IOMMU_DMA
-#include <linux/dma-iommu.h>
-#include <linux/platform_device.h>
-#include <linux/amba/bus.h>
-
-/* Thankfully, all cache ops are by VA so we can ignore phys here */
-static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
-{
-	__dma_flush_range(virt, virt + PAGE_SIZE);
-}
-
-static void *__iommu_alloc_attrs(struct device *dev, size_t size,
-				 dma_addr_t *handle, gfp_t gfp,
-				 struct dma_attrs *attrs)
-{
-	bool coherent = is_device_dma_coherent(dev);
-	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
-	size_t iosize = size;
-	void *addr;
-
-	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
-		return NULL;
-
-	size = PAGE_ALIGN(size);
-
-	/*
-	 * Some drivers rely on this, and we probably don't want the
-	 * possibility of stale kernel data being read by devices anyway.
-	 */
-	gfp |= __GFP_ZERO;
-
-	if (gfpflags_allow_blocking(gfp)) {
-		struct page **pages;
-		pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);
-
-		pages = iommu_dma_alloc(dev, iosize, gfp, ioprot, handle,
-					flush_page);
-		if (!pages)
-			return NULL;
-
-		addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
-					      __builtin_return_address(0));
-		if (!addr)
-			iommu_dma_free(dev, pages, iosize, handle);
-	} else {
-		struct page *page;
-		/*
-		 * In atomic context we can't remap anything, so we'll only
-		 * get the virtually contiguous buffer we need by way of a
-		 * physically contiguous allocation.
-		 */
-		if (coherent) {
-			page = alloc_pages(gfp, get_order(size));
-			addr = page ? page_address(page) : NULL;
-		} else {
-			addr = __alloc_from_pool(size, &page, gfp);
-		}
-		if (!addr)
-			return NULL;
-
-		*handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-		if (iommu_dma_mapping_error(dev, *handle)) {
-			if (coherent)
-				__free_pages(page, get_order(size));
-			else
-				__free_from_pool(addr, size);
-			addr = NULL;
-		}
-	}
-	return addr;
-}
-
-static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-			       dma_addr_t handle, struct dma_attrs *attrs)
-{
-	size_t iosize = size;
-
-	size = PAGE_ALIGN(size);
-	/*
-	 * @cpu_addr will be one of 3 things depending on how it was allocated:
-	 * - A remapped array of pages from iommu_dma_alloc(), for all
-	 *   non-atomic allocations.
-	 * - A non-cacheable alias from the atomic pool, for atomic
-	 *   allocations by non-coherent devices.
-	 * - A normal lowmem address, for atomic allocations by
-	 *   coherent devices.
-	 * Hence how dodgy the below logic looks...
-	 */
-	if (__in_atomic_pool(cpu_addr, size)) {
-		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
-		__free_from_pool(cpu_addr, size);
-	} else if (is_vmalloc_addr(cpu_addr)){
-		struct vm_struct *area = find_vm_area(cpu_addr);
-
-		if (WARN_ON(!area || !area->pages))
-			return;
-		iommu_dma_free(dev, area->pages, iosize, &handle);
-		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
-	} else {
-		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
-		__free_pages(virt_to_page(cpu_addr), get_order(size));
-	}
-}
-
-static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
-			      void *cpu_addr, dma_addr_t dma_addr, size_t size,
-			      struct dma_attrs *attrs)
-{
-	struct vm_struct *area;
-	int ret;
-
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
-					     is_device_dma_coherent(dev));
-
-	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
-		return ret;
-
-	area = find_vm_area(cpu_addr);
-	if (WARN_ON(!area || !area->pages))
-		return -ENXIO;
-
-	return iommu_dma_mmap(area->pages, size, vma);
-}
-
-static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
-			       void *cpu_addr, dma_addr_t dma_addr,
-			       size_t size, struct dma_attrs *attrs)
-{
-	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	struct vm_struct *area = find_vm_area(cpu_addr);
-
-	if (WARN_ON(!area || !area->pages))
-		return -ENXIO;
-
-	return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
-					 GFP_KERNEL);
-}
-
-static void __iommu_sync_single_for_cpu(struct device *dev,
-					dma_addr_t dev_addr, size_t size,
-					enum dma_data_direction dir)
-{
-	phys_addr_t phys;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
-	__dma_unmap_area(phys_to_virt(phys), size, dir);
-}
-
-static void __iommu_sync_single_for_device(struct device *dev,
-					   dma_addr_t dev_addr, size_t size,
-					   enum dma_data_direction dir)
-{
-	phys_addr_t phys;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
-	__dma_map_area(phys_to_virt(phys), size, dir);
-}
-
-static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
-				   unsigned long offset, size_t size,
-				   enum dma_data_direction dir,
-				   struct dma_attrs *attrs)
-{
-	bool coherent = is_device_dma_coherent(dev);
-	int prot = dma_direction_to_prot(dir, coherent);
-	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
-
-	if (!iommu_dma_mapping_error(dev, dev_addr) &&
-	    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
-
-	return dev_addr;
-}
-
-static void __iommu_unmap_page(struct device *dev, dma_addr_t dev_addr,
-			       size_t size, enum dma_data_direction dir,
-			       struct dma_attrs *attrs)
-{
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_single_for_cpu(dev, dev_addr, size, dir);
-
-	iommu_dma_unmap_page(dev, dev_addr, size, dir, attrs);
-}
-
-static void __iommu_sync_sg_for_cpu(struct device *dev,
-				    struct scatterlist *sgl, int nelems,
-				    enum dma_data_direction dir)
-{
-	struct scatterlist *sg;
-	int i;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	for_each_sg(sgl, sg, nelems, i)
-		__dma_unmap_area(sg_virt(sg), sg->length, dir);
-}
-
-static void __iommu_sync_sg_for_device(struct device *dev,
-				       struct scatterlist *sgl, int nelems,
-				       enum dma_data_direction dir)
-{
-	struct scatterlist *sg;
-	int i;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	for_each_sg(sgl, sg, nelems, i)
-		__dma_map_area(sg_virt(sg), sg->length, dir);
-}
-
-static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
-				int nelems, enum dma_data_direction dir,
-				struct dma_attrs *attrs)
-{
-	bool coherent = is_device_dma_coherent(dev);
-
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
-
-	return iommu_dma_map_sg(dev, sgl, nelems,
-			dma_direction_to_prot(dir, coherent));
-}
-
-static void __iommu_unmap_sg_attrs(struct device *dev,
-				   struct scatterlist *sgl, int nelems,
-				   enum dma_data_direction dir,
-				   struct dma_attrs *attrs)
-{
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_sg_for_cpu(dev, sgl, nelems, dir);
-
-	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
-}
-
-static struct dma_map_ops iommu_dma_ops = {
-	.alloc = __iommu_alloc_attrs,
-	.free = __iommu_free_attrs,
-	.mmap = __iommu_mmap_attrs,
-	.get_sgtable = __iommu_get_sgtable,
-	.map_page = __iommu_map_page,
-	.unmap_page = __iommu_unmap_page,
-	.map_sg = __iommu_map_sg_attrs,
-	.unmap_sg = __iommu_unmap_sg_attrs,
-	.sync_single_for_cpu = __iommu_sync_single_for_cpu,
-	.sync_single_for_device = __iommu_sync_single_for_device,
-	.sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
-	.sync_sg_for_device = __iommu_sync_sg_for_device,
-	.dma_supported = iommu_dma_supported,
-	.mapping_error = iommu_dma_mapping_error,
-};
-
-/*
- * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
- * everything it needs to - the device is only partially created and the
- * IOMMU driver hasn't seen it yet, so it can't have a group. Thus we
- * need this delayed attachment dance. Once IOMMU probe ordering is sorted
- * to move the arch_setup_dma_ops() call later, all the notifier bits below
- * become unnecessary, and will go away.
- */
-struct iommu_dma_notifier_data {
-	struct list_head list;
-	struct device *dev;
-	const struct iommu_ops *ops;
-	u64 dma_base;
-	u64 size;
-};
-static LIST_HEAD(iommu_dma_masters);
-static DEFINE_MUTEX(iommu_dma_notifier_lock);
-
-/*
- * Temporarily "borrow" a domain feature flag to to tell if we had to resort
- * to creating our own domain here, in case we need to clean it up again.
- */
-#define __IOMMU_DOMAIN_FAKE_DEFAULT		(1U << 31)
-
-static bool do_iommu_attach(struct device *dev, const struct iommu_ops *ops,
-			   u64 dma_base, u64 size)
-{
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-
-	/*
-	 * Best case: The device is either part of a group which was
-	 * already attached to a domain in a previous call, or it's
-	 * been put in a default DMA domain by the IOMMU core.
-	 */
-	if (!domain) {
-		/*
-		 * Urgh. The IOMMU core isn't going to do default domains
-		 * for non-PCI devices anyway, until it has some means of
-		 * abstracting the entirely implementation-specific
-		 * sideband data/SoC topology/unicorn dust that may or
-		 * may not differentiate upstream masters.
-		 * So until then, HORRIBLE HACKS!
-		 */
-		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
-		if (!domain)
-			goto out_no_domain;
-
-		domain->ops = ops;
-		domain->type = IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_FAKE_DEFAULT;
-
-		if (iommu_attach_device(domain, dev))
-			goto out_put_domain;
-	}
-
-	if (iommu_dma_init_domain(domain, dma_base, size))
-		goto out_detach;
-
-	dev->archdata.dma_ops = &iommu_dma_ops;
-	return true;
-
-out_detach:
-	iommu_detach_device(domain, dev);
-out_put_domain:
-	if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
-		iommu_domain_free(domain);
-out_no_domain:
-	pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA ops\n",
-		dev_name(dev));
-	return false;
-}
-
-static void queue_iommu_attach(struct device *dev, const struct iommu_ops *ops,
-			      u64 dma_base, u64 size)
-{
-	struct iommu_dma_notifier_data *iommudata;
-
-	iommudata = kzalloc(sizeof(*iommudata), GFP_KERNEL);
-	if (!iommudata)
-		return;
-
-	iommudata->dev = dev;
-	iommudata->ops = ops;
-	iommudata->dma_base = dma_base;
-	iommudata->size = size;
-
-	mutex_lock(&iommu_dma_notifier_lock);
-	list_add(&iommudata->list, &iommu_dma_masters);
-	mutex_unlock(&iommu_dma_notifier_lock);
-}
-
-static int __iommu_attach_notifier(struct notifier_block *nb,
-				   unsigned long action, void *data)
-{
-	struct iommu_dma_notifier_data *master, *tmp;
-
-	if (action != BUS_NOTIFY_ADD_DEVICE)
-		return 0;
-
-	mutex_lock(&iommu_dma_notifier_lock);
-	list_for_each_entry_safe(master, tmp, &iommu_dma_masters, list) {
-		if (do_iommu_attach(master->dev, master->ops,
-				master->dma_base, master->size)) {
-			list_del(&master->list);
-			kfree(master);
-		}
-	}
-	mutex_unlock(&iommu_dma_notifier_lock);
-	return 0;
-}
-
-static int __init register_iommu_dma_ops_notifier(struct bus_type *bus)
-{
-	struct notifier_block *nb = kzalloc(sizeof(*nb), GFP_KERNEL);
-	int ret;
-
-	if (!nb)
-		return -ENOMEM;
-	/*
-	 * The device must be attached to a domain before the driver probe
-	 * routine gets a chance to start allocating DMA buffers. However,
-	 * the IOMMU driver also needs a chance to configure the iommu_group
-	 * via its add_device callback first, so we need to make the attach
-	 * happen between those two points. Since the IOMMU core uses a bus
-	 * notifier with default priority for add_device, do the same but
-	 * with a lower priority to ensure the appropriate ordering.
-	 */
-	nb->notifier_call = __iommu_attach_notifier;
-	nb->priority = -100;
-
-	ret = bus_register_notifier(bus, nb);
-	if (ret) {
-		pr_warn("Failed to register DMA domain notifier; IOMMU DMA ops unavailable on bus '%s'\n",
-			bus->name);
-		kfree(nb);
-	}
-	return ret;
-}
-
-static int __init __iommu_dma_init(void)
-{
-	int ret;
-
-	ret = iommu_dma_init();
-	if (!ret)
-		ret = register_iommu_dma_ops_notifier(&platform_bus_type);
-	if (!ret)
-		ret = register_iommu_dma_ops_notifier(&amba_bustype);
-
-	/* handle devices queued before this arch_initcall */
-	if (!ret)
-		__iommu_attach_notifier(NULL, BUS_NOTIFY_ADD_DEVICE, NULL);
-	return ret;
-}
-arch_initcall(__iommu_dma_init);
-
-static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				  const struct iommu_ops *ops)
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent)
 {
-	struct iommu_group *group;
+	dev->archdata.dma_coherent = coherent;
 
-	if (!ops)
-		return;
-	/*
-	 * TODO: As a concession to the future, we're ready to handle being
-	 * called both early and late (i.e. after bus_add_device). Once all
-	 * the platform bus code is reworked to call us late and the notifier
-	 * junk above goes away, move the body of do_iommu_attach here.
-	 */
-	group = iommu_group_get(dev);
-	if (group) {
-		do_iommu_attach(dev, ops, dma_base, size);
-		iommu_group_put(group);
-	} else {
-		queue_iommu_attach(dev, ops, dma_base, size);
-	}
+	if (!common_iommu_setup_dma_ops(dev, dma_base, size, iommu))
+		arch_set_dma_ops(dev, &swiotlb_dma_ops);
 }
 
 void arch_teardown_dma_ops(struct device *dev)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-
-	if (domain) {
-		iommu_detach_device(domain, dev);
-		if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
-			iommu_domain_free(domain);
-	}
-
-	dev->archdata.dma_ops = NULL;
-}
-
-#else
-
-static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				  struct iommu_ops *iommu)
-{ }
-
-#endif  /* CONFIG_IOMMU_DMA */
-
-void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			struct iommu_ops *iommu, bool coherent)
-{
-	if (!dev->archdata.dma_ops)
-		dev->archdata.dma_ops = &swiotlb_dma_ops;
-
-	dev->archdata.dma_coherent = coherent;
-	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
+	common_iommu_teardown_dma_ops(dev);
 }
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 42fc0c25cf1a..c0dbf765bf45 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,7 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
-obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o dma-iommu-ops.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/dma-iommu-ops.c b/drivers/iommu/dma-iommu-ops.c
new file mode 100644
index 000000000000..047c47e3c0ab
--- /dev/null
+++ b/drivers/iommu/dma-iommu-ops.c
@@ -0,0 +1,471 @@
+/*
+ * A common IOMMU based DMA-API implementation for ARM and ARM64 architectures.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-iommu.h>
+#include <linux/gfp.h>
+#include <linux/huge_mm.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+#include <linux/vmalloc.h>
+
+#include <linux/platform_device.h>
+#include <linux/amba/bus.h>
+
+#include <asm/dma-mapping.h>
+
+static void *__iommu_alloc_attrs(struct device *dev, size_t size,
+				 dma_addr_t *handle, gfp_t gfp,
+				 struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+	size_t iosize = size;
+	void *addr;
+
+	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
+		return NULL;
+
+	size = PAGE_ALIGN(size);
+
+	/*
+	 * Some drivers rely on this, and we probably don't want the
+	 * possibility of stale kernel data being read by devices anyway.
+	 */
+	gfp |= __GFP_ZERO;
+
+	if (gfpflags_allow_blocking(gfp)) {
+		struct page **pages;
+		pgprot_t prot = arch_get_dma_pgprot(attrs, PAGE_KERNEL,
+						    coherent);
+
+		pages = iommu_dma_alloc(dev, iosize, gfp, ioprot, handle,
+					arch_flush_page);
+		if (!pages)
+			return NULL;
+
+		addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
+					      __builtin_return_address(0));
+		if (!addr)
+			iommu_dma_free(dev, pages, iosize, handle);
+	} else {
+		struct page *page;
+		/*
+		 * In atomic context we can't remap anything, so we'll only
+		 * get the virtually contiguous buffer we need by way of a
+		 * physically contiguous allocation.
+		 */
+		if (coherent) {
+			page = alloc_pages(gfp, get_order(size));
+			addr = page ? page_address(page) : NULL;
+		} else {
+			addr = arch_alloc_from_atomic_pool(size, &page, gfp);
+		}
+		if (!addr)
+			return NULL;
+
+		*handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
+		if (iommu_dma_mapping_error(dev, *handle)) {
+			if (coherent)
+				__free_pages(page, get_order(size));
+			else
+				arch_free_from_atomic_pool(addr, size);
+			addr = NULL;
+		}
+	}
+	return addr;
+}
+
+static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+			       dma_addr_t handle, struct dma_attrs *attrs)
+{
+	size_t iosize = size;
+
+	size = PAGE_ALIGN(size);
+	/*
+	 * @cpu_addr will be one of 3 things depending on how it was allocated:
+	 * - A remapped array of pages from iommu_dma_alloc(), for all
+	 *   non-atomic allocations.
+	 * - A non-cacheable alias from the atomic pool, for atomic
+	 *   allocations by non-coherent devices.
+	 * - A normal lowmem address, for atomic allocations by
+	 *   coherent devices.
+	 * Hence how dodgy the below logic looks...
+	 */
+	if (arch_in_atomic_pool(cpu_addr, size)) {
+		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
+		arch_free_from_atomic_pool(cpu_addr, size);
+	} else if (is_vmalloc_addr(cpu_addr)) {
+		struct vm_struct *area = find_vm_area(cpu_addr);
+
+		if (WARN_ON(!area || !area->pages))
+			return;
+		iommu_dma_free(dev, area->pages, iosize, &handle);
+		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+	} else {
+		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
+		__free_pages(virt_to_page(cpu_addr), get_order(size));
+	}
+}
+
+static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+			      void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			      struct dma_attrs *attrs)
+{
+	struct vm_struct *area;
+	int ret;
+
+	vma->vm_page_prot = arch_get_dma_pgprot(attrs, vma->vm_page_prot,
+					        is_device_dma_coherent(dev));
+
+	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+		return ret;
+
+	area = find_vm_area(cpu_addr);
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return iommu_dma_mmap(area->pages, size, vma);
+}
+
+static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
+			       void *cpu_addr, dma_addr_t dma_addr,
+			       size_t size, struct dma_attrs *attrs)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct vm_struct *area = find_vm_area(cpu_addr);
+
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
+					 GFP_KERNEL);
+}
+
+static void __iommu_sync_single_for_cpu(struct device *dev,
+					dma_addr_t dev_addr, size_t size,
+					enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	arch_dma_unmap_area(phys, size, dir);
+}
+
+static void __iommu_sync_single_for_device(struct device *dev,
+					   dma_addr_t dev_addr, size_t size,
+					   enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	arch_dma_map_area(phys, size, dir);
+}
+
+static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int prot = dma_direction_to_prot(dir, coherent);
+	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
+
+	if (!iommu_dma_mapping_error(dev, dev_addr) &&
+	    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+
+	return dev_addr;
+}
+
+static void __iommu_unmap_page(struct device *dev, dma_addr_t dev_addr,
+			       size_t size, enum dma_data_direction dir,
+			       struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_cpu(dev, dev_addr, size, dir);
+
+	iommu_dma_unmap_page(dev, dev_addr, size, dir, attrs);
+}
+
+static void __iommu_sync_sg_for_cpu(struct device *dev,
+				    struct scatterlist *sgl, int nelems,
+				    enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		arch_dma_unmap_area(sg_phys(sg), sg->length, dir);
+}
+
+static void __iommu_sync_sg_for_device(struct device *dev,
+				       struct scatterlist *sgl, int nelems,
+				       enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		arch_dma_map_area(sg_phys(sg), sg->length, dir);
+}
+
+static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
+				int nelems, enum dma_data_direction dir,
+				struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
+
+	return iommu_dma_map_sg(dev, sgl, nelems,
+			dma_direction_to_prot(dir, coherent));
+}
+
+static void __iommu_unmap_sg_attrs(struct device *dev,
+				   struct scatterlist *sgl, int nelems,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_cpu(dev, sgl, nelems, dir);
+
+	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
+}
+
+static struct dma_map_ops iommu_dma_ops = {
+	.alloc = __iommu_alloc_attrs,
+	.free = __iommu_free_attrs,
+	.mmap = __iommu_mmap_attrs,
+	.get_sgtable = __iommu_get_sgtable,
+	.map_page = __iommu_map_page,
+	.unmap_page = __iommu_unmap_page,
+	.map_sg = __iommu_map_sg_attrs,
+	.unmap_sg = __iommu_unmap_sg_attrs,
+	.sync_single_for_cpu = __iommu_sync_single_for_cpu,
+	.sync_single_for_device = __iommu_sync_single_for_device,
+	.sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
+	.sync_sg_for_device = __iommu_sync_sg_for_device,
+	.dma_supported = iommu_dma_supported,
+	.mapping_error = iommu_dma_mapping_error,
+};
+
+/*
+ * TODO: Right now common_iommu_setup_dma_ops() gets called too early to do
+ * everything it needs to - the device is only partially created and the
+ * IOMMU driver hasn't seen it yet, so it can't have a group. Thus we
+ * need this delayed attachment dance. Once IOMMU probe ordering is sorted
+ * to move the arch_setup_dma_ops() call later, all the notifier bits below
+ * become unnecessary, and will go away.
+ */
+struct iommu_dma_notifier_data {
+	struct list_head list;
+	struct device *dev;
+	const struct iommu_ops *ops;
+	u64 dma_base;
+	u64 size;
+};
+static LIST_HEAD(iommu_dma_masters);
+static DEFINE_MUTEX(iommu_dma_notifier_lock);
+
+/*
+ * Temporarily "borrow" a domain feature flag to tell if we had to resort
+ * to creating our own domain here, in case we need to clean it up again.
+ */
+#define __IOMMU_DOMAIN_FAKE_DEFAULT		(1U << 31)
+
+static bool do_iommu_attach(struct device *dev, const struct iommu_ops *ops,
+			   u64 dma_base, u64 size)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	/*
+	 * Best case: The device is either part of a group which was
+	 * already attached to a domain in a previous call, or it's
+	 * been put in a default DMA domain by the IOMMU core.
+	 */
+	if (!domain) {
+		/*
+		 * Urgh. The IOMMU core isn't going to do default domains
+		 * for non-PCI devices anyway, until it has some means of
+		 * abstracting the entirely implementation-specific
+		 * sideband data/SoC topology/unicorn dust that may or
+		 * may not differentiate upstream masters.
+		 * So until then, HORRIBLE HACKS!
+		 */
+		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
+		if (!domain)
+			goto out_no_domain;
+
+		domain->ops = ops;
+		domain->type = IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_FAKE_DEFAULT;
+
+		if (iommu_attach_device(domain, dev))
+			goto out_put_domain;
+	}
+
+	if (iommu_dma_init_domain(domain, dma_base, size))
+		goto out_detach;
+
+	arch_set_dma_ops(dev, &iommu_dma_ops);
+	return true;
+
+out_detach:
+	iommu_detach_device(domain, dev);
+out_put_domain:
+	if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
+		iommu_domain_free(domain);
+out_no_domain:
+	pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA ops\n",
+		dev_name(dev));
+	return false;
+}
+
+static void queue_iommu_attach(struct device *dev, const struct iommu_ops *ops,
+			      u64 dma_base, u64 size)
+{
+	struct iommu_dma_notifier_data *iommudata;
+
+	iommudata = kzalloc(sizeof(*iommudata), GFP_KERNEL);
+	if (!iommudata)
+		return;
+
+	iommudata->dev = dev;
+	iommudata->ops = ops;
+	iommudata->dma_base = dma_base;
+	iommudata->size = size;
+
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_add(&iommudata->list, &iommu_dma_masters);
+	mutex_unlock(&iommu_dma_notifier_lock);
+}
+
+static int __iommu_attach_notifier(struct notifier_block *nb,
+				   unsigned long action, void *data)
+{
+	struct iommu_dma_notifier_data *master, *tmp;
+
+	if (action != BUS_NOTIFY_ADD_DEVICE)
+		return 0;
+
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_for_each_entry_safe(master, tmp, &iommu_dma_masters, list) {
+		if (do_iommu_attach(master->dev, master->ops,
+				master->dma_base, master->size)) {
+			list_del(&master->list);
+			kfree(master);
+		}
+	}
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int __init register_iommu_dma_ops_notifier(struct bus_type *bus)
+{
+	struct notifier_block *nb = kzalloc(sizeof(*nb), GFP_KERNEL);
+	int ret;
+
+	if (!nb)
+		return -ENOMEM;
+	/*
+	 * The device must be attached to a domain before the driver probe
+	 * routine gets a chance to start allocating DMA buffers. However,
+	 * the IOMMU driver also needs a chance to configure the iommu_group
+	 * via its add_device callback first, so we need to make the attach
+	 * happen between those two points. Since the IOMMU core uses a bus
+	 * notifier with default priority for add_device, do the same but
+	 * with a lower priority to ensure the appropriate ordering.
+	 */
+	nb->notifier_call = __iommu_attach_notifier;
+	nb->priority = -100;
+
+	ret = bus_register_notifier(bus, nb);
+	if (ret) {
+		pr_warn("Failed to register DMA domain notifier; IOMMU DMA ops unavailable on bus '%s'\n",
+			bus->name);
+		kfree(nb);
+	}
+	return ret;
+}
+
+static int __init __iommu_dma_init(void)
+{
+	int ret;
+
+	ret = iommu_dma_init();
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&platform_bus_type);
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&amba_bustype);
+
+	/* handle devices queued before this arch_initcall */
+	if (!ret)
+		__iommu_attach_notifier(NULL, BUS_NOTIFY_ADD_DEVICE, NULL);
+	return ret;
+}
+arch_initcall(__iommu_dma_init);
+
+bool common_iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  const struct iommu_ops *ops)
+{
+	struct iommu_group *group;
+
+	if (!ops)
+		return false;
+	/*
+	 * TODO: As a concession to the future, we're ready to handle being
+	 * called both early and late (i.e. after bus_add_device). Once all
+	 * the platform bus code is reworked to call us late and the notifier
+	 * junk above goes away, move the body of do_iommu_attach here.
+	 */
+	group = iommu_group_get(dev);
+	if (group) {
+		do_iommu_attach(dev, ops, dma_base, size);
+		iommu_group_put(group);
+	} else {
+		queue_iommu_attach(dev, ops, dma_base, size);
+	}
+
+	return true;
+}
+
+void common_iommu_teardown_dma_ops(struct device *dev)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (domain) {
+		iommu_detach_device(domain, dev);
+		if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
+			iommu_domain_free(domain);
+	}
+
+	arch_set_dma_ops(dev, NULL);
+}
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index fc481037478a..01a836c43dc3 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -62,6 +62,10 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 int iommu_dma_supported(struct device *dev, u64 mask);
 int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 
+bool common_iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  const struct iommu_ops *ops);
+void common_iommu_teardown_dma_ops(struct device *dev);
+
 #else
 
 struct iommu_domain;
@@ -80,6 +84,16 @@ static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
 }
 
+static inline bool common_iommu_setup_dma_ops(struct device *dev, u64 dma_base,
+					u64 size, const struct iommu_ops *ops)
+{
+	return false;
+}
+
+static inline void common_iommu_teardown_dma_ops(struct device *dev)
+{
+}
+
 #endif	/* CONFIG_IOMMU_DMA */
 #endif	/* __KERNEL__ */
 #endif	/* __DMA_IOMMU_H */
-- 
1.9.2
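
The delayed-attachment scheme in dma-iommu-ops.c above (queue_iommu_attach()
plus the low-priority bus notifier) can be sketched as a small userspace
model. This is only an illustration under simplifying assumptions: every
model_* name is hypothetical and not kernel API, "has_group" stands in for
iommu_group_get() returning a group, and the attach step is reduced to a
flag instead of real domain allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device; not a kernel type. */
struct model_dev {
	const char *name;
	bool has_group;	/* models iommu_group_get(dev) != NULL */
	bool attached;	/* models dma_ops == &iommu_dma_ops */
};

/* Models struct iommu_dma_notifier_data on the iommu_dma_masters list. */
struct pending {
	struct pending *next;
	struct model_dev *dev;
};

static struct pending *pending_head;

/* Models do_iommu_attach(): can only succeed once a group exists. */
static bool model_attach(struct model_dev *dev)
{
	if (!dev->has_group)
		return false;
	dev->attached = true;
	return true;
}

/* Models queue_iommu_attach(): remember the device for a later retry. */
static void model_queue(struct model_dev *dev)
{
	struct pending *p = malloc(sizeof(*p));

	if (!p)
		return;
	p->dev = dev;
	p->next = pending_head;
	pending_head = p;
}

/* Models __iommu_attach_notifier(): retry all, drop the ones that attach. */
static void model_notifier(void)
{
	struct pending **pp = &pending_head;

	while (*pp) {
		struct pending *p = *pp;

		if (model_attach(p->dev)) {
			*pp = p->next;
			free(p);
		} else {
			pp = &p->next;
		}
	}
}

/* Models common_iommu_setup_dma_ops() when called with non-NULL ops. */
static void model_setup(struct model_dev *dev)
{
	assert(dev != NULL);
	if (dev->has_group)
		model_attach(dev);
	else
		model_queue(dev);
}

/* Helper for inspection only: number of devices still waiting. */
static int model_pending_count(void)
{
	int n = 0;

	for (struct pending *p = pending_head; p; p = p->next)
		n++;
	return n;
}
```

In this model, calling model_setup() on a device without a group leaves it
on the pending list, and it is attached by the first model_notifier() call
made after has_group becomes true. Locking is left out on purpose; the
patch serialises the list with iommu_dma_notifier_lock.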


* [RFC 2/3] iommu: dma-iommu: move IOMMU/DMA-mapping code from ARM64 arch to drivers
@ 2016-02-19  8:22   ` Marek Szyprowski
From: Marek Szyprowski @ 2016-02-19  8:22 UTC (permalink / raw)
  To: iommu, linux-arm-kernel, linux-kernel
  Cc: Krzysztof Kozlowski, Russell King - ARM Linux, Arnd Bergmann,
	Bartlomiej Zolnierkiewicz, Catalin Marinas, Will Deacon,
	dri-devel, Tomasz Figa, linaro-mm-sig, Sakari Ailus,
	Laurent Pinchart, Robin Murphy, Marek Szyprowski

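This patch moves all the IOMMU-based DMA-mapping code from arch/arm64/mm
to drivers/iommu/dma-iommu-ops.c, so that it can easily be shared with the
ARM architecture, which will use it as well. The architecture-specific
pieces (cache maintenance, pgprot selection and the atomic pool) stay in
arch code and are reached through new arch_* helpers.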

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm64/include/asm/dma-mapping.h |  39 ++-
 arch/arm64/mm/dma-mapping.c          | 491 ++---------------------------------
 drivers/iommu/Makefile               |   2 +-
 drivers/iommu/dma-iommu-ops.c        | 471 +++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h            |  14 +
 5 files changed, 538 insertions(+), 479 deletions(-)
 create mode 100644 drivers/iommu/dma-iommu-ops.c
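
For reviewers, the control flow of the reworked arch_setup_dma_ops() can be
sketched in userspace roughly as below. This is a simplified model, not the
kernel code: all model_* names are hypothetical, the IOMMU ops pointer is
an opaque "const void *", and the attach itself is reduced to installing an
ops pointer when the device already has a group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_map_ops. */
struct model_ops {
	const char *name;
};

static const struct model_ops model_swiotlb_ops = { "swiotlb" };
static const struct model_ops model_iommu_ops = { "iommu" };

/* Hypothetical stand-in for struct device and its archdata. */
struct model_dev {
	const struct model_ops *dma_ops;
	bool coherent;
	bool has_group;	/* models iommu_group_get(dev) != NULL */
};

/* Models arch_set_dma_ops(): the only way common code touches dma_ops. */
static void model_arch_set_dma_ops(struct model_dev *dev,
				   const struct model_ops *ops)
{
	dev->dma_ops = ops;
}

/*
 * Models common_iommu_setup_dma_ops(): returns false only when there is
 * no IOMMU at all. With an IOMMU it returns true whether the attach
 * happened right away or was merely queued for the bus notifier.
 */
static bool model_common_setup(struct model_dev *dev, const void *iommu)
{
	if (!iommu)
		return false;
	if (dev->has_group)
		model_arch_set_dma_ops(dev, &model_iommu_ops);
	/* else: attach is deferred, dma_ops is left untouched for now */
	return true;
}

/* Models the new arch_setup_dma_ops() body from this patch. */
static void model_arch_setup_dma_ops(struct model_dev *dev,
				     const void *iommu, bool coherent)
{
	assert(dev != NULL);
	dev->coherent = coherent;
	if (!model_common_setup(dev, iommu))
		model_arch_set_dma_ops(dev, &model_swiotlb_ops);
}
```

In this model a device without an IOMMU ends up with the swiotlb ops, and a
device with an IOMMU and a group gets the IOMMU ops immediately. A device
with an IOMMU but no group yet keeps dma_ops == NULL until the deferred
attach runs, whereas the code being removed installed swiotlb_dma_ops
before trying the IOMMU path.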

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index ba437f090a74..3a582d820717 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -22,6 +22,7 @@
 #include <linux/vmalloc.h>
 
 #include <xen/xen.h>
+#include <asm/cacheflush.h>
 #include <asm/xen/hypervisor.h>
 
 #define DMA_ERROR_CODE	(~(dma_addr_t)0)
@@ -47,14 +48,17 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 		return __generic_dma_ops(dev);
 }
 
+static inline void arch_set_dma_ops(struct device *dev, struct dma_map_ops *ops)
+{
+	dev->archdata.dma_ops = ops;
+}
+
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			struct iommu_ops *iommu, bool coherent);
 #define arch_setup_dma_ops	arch_setup_dma_ops
 
-#ifdef CONFIG_IOMMU_DMA
 void arch_teardown_dma_ops(struct device *dev);
 #define arch_teardown_dma_ops	arch_teardown_dma_ops
-#endif
 
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
@@ -86,5 +90,36 @@ static inline void dma_mark_clean(void *addr, size_t size)
 {
 }
 
+static inline void arch_flush_page(struct device *dev, const void *virt,
+				   phys_addr_t phys)
+{
+	__dma_flush_range(virt, virt + PAGE_SIZE);
+}
+
+static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
+				     enum dma_data_direction dir)
+{
+	__dma_map_area(phys_to_virt(phys), size, dir);
+}
+
+static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
+				       enum dma_data_direction dir)
+{
+	__dma_unmap_area(phys_to_virt(phys), size, dir);
+}
+
+static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
+					pgprot_t prot, bool coherent)
+{
+	if (!coherent || dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs))
+		return pgprot_writecombine(prot);
+	return prot;
+}
+
+extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
+					 gfp_t flags);
+extern bool arch_in_atomic_pool(void *start, size_t size);
+extern int arch_free_from_atomic_pool(void *start, size_t size);
+
 #endif	/* __KERNEL__ */
 #endif	/* __ASM_DMA_MAPPING_H */
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index a6e757cbab77..d8cb8552bbff 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -24,19 +24,12 @@
 #include <linux/genalloc.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
+#include <linux/dma-iommu.h>
 #include <linux/vmalloc.h>
 #include <linux/swiotlb.h>
 
 #include <asm/cacheflush.h>
 
-static pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot,
-				 bool coherent)
-{
-	if (!coherent || dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs))
-		return pgprot_writecombine(prot);
-	return prot;
-}
-
 static struct gen_pool *atomic_pool;
 
 #define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
@@ -49,7 +42,7 @@ static int __init early_coherent_pool(char *p)
 }
 early_param("coherent_pool", early_coherent_pool);
 
-static void *__alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags)
+void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page, gfp_t flags)
 {
 	unsigned long val;
 	void *ptr = NULL;
@@ -71,14 +64,14 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags)
 	return ptr;
 }
 
-static bool __in_atomic_pool(void *start, size_t size)
+bool arch_in_atomic_pool(void *start, size_t size)
 {
 	return addr_in_gen_pool(atomic_pool, (unsigned long)start, size);
 }
 
-static int __free_from_pool(void *start, size_t size)
+int arch_free_from_atomic_pool(void *start, size_t size)
 {
-	if (!__in_atomic_pool(start, size))
+	if (!arch_in_atomic_pool(start, size))
 		return 0;
 
 	gen_pool_free(atomic_pool, (unsigned long)start, size);
@@ -142,13 +135,13 @@ static void *__dma_alloc(struct device *dev, size_t size,
 	struct page *page;
 	void *ptr, *coherent_ptr;
 	bool coherent = is_device_dma_coherent(dev);
-	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, false);
+	pgprot_t prot = arch_get_dma_pgprot(attrs, PAGE_KERNEL, false);
 
 	size = PAGE_ALIGN(size);
 
 	if (!coherent && !gfpflags_allow_blocking(flags)) {
 		struct page *page = NULL;
-		void *addr = __alloc_from_pool(size, &page, flags);
+		void *addr = arch_alloc_from_atomic_pool(size, &page, flags);
 
 		if (addr)
 			*dma_handle = phys_to_dma(dev, page_to_phys(page));
@@ -192,7 +185,7 @@ static void __dma_free(struct device *dev, size_t size,
 	size = PAGE_ALIGN(size);
 
 	if (!is_device_dma_coherent(dev)) {
-		if (__free_from_pool(vaddr, size))
+		if (arch_free_from_atomic_pool(vaddr, size))
 			return;
 		vunmap(vaddr);
 	}
@@ -312,7 +305,7 @@ static int __swiotlb_mmap(struct device *dev,
 	unsigned long pfn = dma_to_phys(dev, dma_addr) >> PAGE_SHIFT;
 	unsigned long off = vma->vm_pgoff;
 
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
+	vma->vm_page_prot = arch_get_dma_pgprot(attrs, vma->vm_page_prot,
 					     is_device_dma_coherent(dev));
 
 	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
@@ -526,470 +519,16 @@ static int __init dma_debug_do_init(void)
 }
 fs_initcall(dma_debug_do_init);
 
-
-#ifdef CONFIG_IOMMU_DMA
-#include <linux/dma-iommu.h>
-#include <linux/platform_device.h>
-#include <linux/amba/bus.h>
-
-/* Thankfully, all cache ops are by VA so we can ignore phys here */
-static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
-{
-	__dma_flush_range(virt, virt + PAGE_SIZE);
-}
-
-static void *__iommu_alloc_attrs(struct device *dev, size_t size,
-				 dma_addr_t *handle, gfp_t gfp,
-				 struct dma_attrs *attrs)
-{
-	bool coherent = is_device_dma_coherent(dev);
-	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
-	size_t iosize = size;
-	void *addr;
-
-	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
-		return NULL;
-
-	size = PAGE_ALIGN(size);
-
-	/*
-	 * Some drivers rely on this, and we probably don't want the
-	 * possibility of stale kernel data being read by devices anyway.
-	 */
-	gfp |= __GFP_ZERO;
-
-	if (gfpflags_allow_blocking(gfp)) {
-		struct page **pages;
-		pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);
-
-		pages = iommu_dma_alloc(dev, iosize, gfp, ioprot, handle,
-					flush_page);
-		if (!pages)
-			return NULL;
-
-		addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
-					      __builtin_return_address(0));
-		if (!addr)
-			iommu_dma_free(dev, pages, iosize, handle);
-	} else {
-		struct page *page;
-		/*
-		 * In atomic context we can't remap anything, so we'll only
-		 * get the virtually contiguous buffer we need by way of a
-		 * physically contiguous allocation.
-		 */
-		if (coherent) {
-			page = alloc_pages(gfp, get_order(size));
-			addr = page ? page_address(page) : NULL;
-		} else {
-			addr = __alloc_from_pool(size, &page, gfp);
-		}
-		if (!addr)
-			return NULL;
-
-		*handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-		if (iommu_dma_mapping_error(dev, *handle)) {
-			if (coherent)
-				__free_pages(page, get_order(size));
-			else
-				__free_from_pool(addr, size);
-			addr = NULL;
-		}
-	}
-	return addr;
-}
-
-static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-			       dma_addr_t handle, struct dma_attrs *attrs)
-{
-	size_t iosize = size;
-
-	size = PAGE_ALIGN(size);
-	/*
-	 * @cpu_addr will be one of 3 things depending on how it was allocated:
-	 * - A remapped array of pages from iommu_dma_alloc(), for all
-	 *   non-atomic allocations.
-	 * - A non-cacheable alias from the atomic pool, for atomic
-	 *   allocations by non-coherent devices.
-	 * - A normal lowmem address, for atomic allocations by
-	 *   coherent devices.
-	 * Hence how dodgy the below logic looks...
-	 */
-	if (__in_atomic_pool(cpu_addr, size)) {
-		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
-		__free_from_pool(cpu_addr, size);
-	} else if (is_vmalloc_addr(cpu_addr)){
-		struct vm_struct *area = find_vm_area(cpu_addr);
-
-		if (WARN_ON(!area || !area->pages))
-			return;
-		iommu_dma_free(dev, area->pages, iosize, &handle);
-		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
-	} else {
-		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
-		__free_pages(virt_to_page(cpu_addr), get_order(size));
-	}
-}
-
-static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
-			      void *cpu_addr, dma_addr_t dma_addr, size_t size,
-			      struct dma_attrs *attrs)
-{
-	struct vm_struct *area;
-	int ret;
-
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
-					     is_device_dma_coherent(dev));
-
-	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
-		return ret;
-
-	area = find_vm_area(cpu_addr);
-	if (WARN_ON(!area || !area->pages))
-		return -ENXIO;
-
-	return iommu_dma_mmap(area->pages, size, vma);
-}
-
-static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
-			       void *cpu_addr, dma_addr_t dma_addr,
-			       size_t size, struct dma_attrs *attrs)
-{
-	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	struct vm_struct *area = find_vm_area(cpu_addr);
-
-	if (WARN_ON(!area || !area->pages))
-		return -ENXIO;
-
-	return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
-					 GFP_KERNEL);
-}
-
-static void __iommu_sync_single_for_cpu(struct device *dev,
-					dma_addr_t dev_addr, size_t size,
-					enum dma_data_direction dir)
-{
-	phys_addr_t phys;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
-	__dma_unmap_area(phys_to_virt(phys), size, dir);
-}
-
-static void __iommu_sync_single_for_device(struct device *dev,
-					   dma_addr_t dev_addr, size_t size,
-					   enum dma_data_direction dir)
-{
-	phys_addr_t phys;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
-	__dma_map_area(phys_to_virt(phys), size, dir);
-}
-
-static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
-				   unsigned long offset, size_t size,
-				   enum dma_data_direction dir,
-				   struct dma_attrs *attrs)
-{
-	bool coherent = is_device_dma_coherent(dev);
-	int prot = dma_direction_to_prot(dir, coherent);
-	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
-
-	if (!iommu_dma_mapping_error(dev, dev_addr) &&
-	    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
-
-	return dev_addr;
-}
-
-static void __iommu_unmap_page(struct device *dev, dma_addr_t dev_addr,
-			       size_t size, enum dma_data_direction dir,
-			       struct dma_attrs *attrs)
-{
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_single_for_cpu(dev, dev_addr, size, dir);
-
-	iommu_dma_unmap_page(dev, dev_addr, size, dir, attrs);
-}
-
-static void __iommu_sync_sg_for_cpu(struct device *dev,
-				    struct scatterlist *sgl, int nelems,
-				    enum dma_data_direction dir)
-{
-	struct scatterlist *sg;
-	int i;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	for_each_sg(sgl, sg, nelems, i)
-		__dma_unmap_area(sg_virt(sg), sg->length, dir);
-}
-
-static void __iommu_sync_sg_for_device(struct device *dev,
-				       struct scatterlist *sgl, int nelems,
-				       enum dma_data_direction dir)
-{
-	struct scatterlist *sg;
-	int i;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	for_each_sg(sgl, sg, nelems, i)
-		__dma_map_area(sg_virt(sg), sg->length, dir);
-}
-
-static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
-				int nelems, enum dma_data_direction dir,
-				struct dma_attrs *attrs)
-{
-	bool coherent = is_device_dma_coherent(dev);
-
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
-
-	return iommu_dma_map_sg(dev, sgl, nelems,
-			dma_direction_to_prot(dir, coherent));
-}
-
-static void __iommu_unmap_sg_attrs(struct device *dev,
-				   struct scatterlist *sgl, int nelems,
-				   enum dma_data_direction dir,
-				   struct dma_attrs *attrs)
-{
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_sg_for_cpu(dev, sgl, nelems, dir);
-
-	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
-}
-
-static struct dma_map_ops iommu_dma_ops = {
-	.alloc = __iommu_alloc_attrs,
-	.free = __iommu_free_attrs,
-	.mmap = __iommu_mmap_attrs,
-	.get_sgtable = __iommu_get_sgtable,
-	.map_page = __iommu_map_page,
-	.unmap_page = __iommu_unmap_page,
-	.map_sg = __iommu_map_sg_attrs,
-	.unmap_sg = __iommu_unmap_sg_attrs,
-	.sync_single_for_cpu = __iommu_sync_single_for_cpu,
-	.sync_single_for_device = __iommu_sync_single_for_device,
-	.sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
-	.sync_sg_for_device = __iommu_sync_sg_for_device,
-	.dma_supported = iommu_dma_supported,
-	.mapping_error = iommu_dma_mapping_error,
-};
-
-/*
- * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
- * everything it needs to - the device is only partially created and the
- * IOMMU driver hasn't seen it yet, so it can't have a group. Thus we
- * need this delayed attachment dance. Once IOMMU probe ordering is sorted
- * to move the arch_setup_dma_ops() call later, all the notifier bits below
- * become unnecessary, and will go away.
- */
-struct iommu_dma_notifier_data {
-	struct list_head list;
-	struct device *dev;
-	const struct iommu_ops *ops;
-	u64 dma_base;
-	u64 size;
-};
-static LIST_HEAD(iommu_dma_masters);
-static DEFINE_MUTEX(iommu_dma_notifier_lock);
-
-/*
- * Temporarily "borrow" a domain feature flag to to tell if we had to resort
- * to creating our own domain here, in case we need to clean it up again.
- */
-#define __IOMMU_DOMAIN_FAKE_DEFAULT		(1U << 31)
-
-static bool do_iommu_attach(struct device *dev, const struct iommu_ops *ops,
-			   u64 dma_base, u64 size)
-{
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-
-	/*
-	 * Best case: The device is either part of a group which was
-	 * already attached to a domain in a previous call, or it's
-	 * been put in a default DMA domain by the IOMMU core.
-	 */
-	if (!domain) {
-		/*
-		 * Urgh. The IOMMU core isn't going to do default domains
-		 * for non-PCI devices anyway, until it has some means of
-		 * abstracting the entirely implementation-specific
-		 * sideband data/SoC topology/unicorn dust that may or
-		 * may not differentiate upstream masters.
-		 * So until then, HORRIBLE HACKS!
-		 */
-		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
-		if (!domain)
-			goto out_no_domain;
-
-		domain->ops = ops;
-		domain->type = IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_FAKE_DEFAULT;
-
-		if (iommu_attach_device(domain, dev))
-			goto out_put_domain;
-	}
-
-	if (iommu_dma_init_domain(domain, dma_base, size))
-		goto out_detach;
-
-	dev->archdata.dma_ops = &iommu_dma_ops;
-	return true;
-
-out_detach:
-	iommu_detach_device(domain, dev);
-out_put_domain:
-	if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
-		iommu_domain_free(domain);
-out_no_domain:
-	pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA ops\n",
-		dev_name(dev));
-	return false;
-}
-
-static void queue_iommu_attach(struct device *dev, const struct iommu_ops *ops,
-			      u64 dma_base, u64 size)
-{
-	struct iommu_dma_notifier_data *iommudata;
-
-	iommudata = kzalloc(sizeof(*iommudata), GFP_KERNEL);
-	if (!iommudata)
-		return;
-
-	iommudata->dev = dev;
-	iommudata->ops = ops;
-	iommudata->dma_base = dma_base;
-	iommudata->size = size;
-
-	mutex_lock(&iommu_dma_notifier_lock);
-	list_add(&iommudata->list, &iommu_dma_masters);
-	mutex_unlock(&iommu_dma_notifier_lock);
-}
-
-static int __iommu_attach_notifier(struct notifier_block *nb,
-				   unsigned long action, void *data)
-{
-	struct iommu_dma_notifier_data *master, *tmp;
-
-	if (action != BUS_NOTIFY_ADD_DEVICE)
-		return 0;
-
-	mutex_lock(&iommu_dma_notifier_lock);
-	list_for_each_entry_safe(master, tmp, &iommu_dma_masters, list) {
-		if (do_iommu_attach(master->dev, master->ops,
-				master->dma_base, master->size)) {
-			list_del(&master->list);
-			kfree(master);
-		}
-	}
-	mutex_unlock(&iommu_dma_notifier_lock);
-	return 0;
-}
-
-static int __init register_iommu_dma_ops_notifier(struct bus_type *bus)
-{
-	struct notifier_block *nb = kzalloc(sizeof(*nb), GFP_KERNEL);
-	int ret;
-
-	if (!nb)
-		return -ENOMEM;
-	/*
-	 * The device must be attached to a domain before the driver probe
-	 * routine gets a chance to start allocating DMA buffers. However,
-	 * the IOMMU driver also needs a chance to configure the iommu_group
-	 * via its add_device callback first, so we need to make the attach
-	 * happen between those two points. Since the IOMMU core uses a bus
-	 * notifier with default priority for add_device, do the same but
-	 * with a lower priority to ensure the appropriate ordering.
-	 */
-	nb->notifier_call = __iommu_attach_notifier;
-	nb->priority = -100;
-
-	ret = bus_register_notifier(bus, nb);
-	if (ret) {
-		pr_warn("Failed to register DMA domain notifier; IOMMU DMA ops unavailable on bus '%s'\n",
-			bus->name);
-		kfree(nb);
-	}
-	return ret;
-}
-
-static int __init __iommu_dma_init(void)
-{
-	int ret;
-
-	ret = iommu_dma_init();
-	if (!ret)
-		ret = register_iommu_dma_ops_notifier(&platform_bus_type);
-	if (!ret)
-		ret = register_iommu_dma_ops_notifier(&amba_bustype);
-
-	/* handle devices queued before this arch_initcall */
-	if (!ret)
-		__iommu_attach_notifier(NULL, BUS_NOTIFY_ADD_DEVICE, NULL);
-	return ret;
-}
-arch_initcall(__iommu_dma_init);
-
-static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				  const struct iommu_ops *ops)
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent)
 {
-	struct iommu_group *group;
+	dev->archdata.dma_coherent = coherent;
 
-	if (!ops)
-		return;
-	/*
-	 * TODO: As a concession to the future, we're ready to handle being
-	 * called both early and late (i.e. after bus_add_device). Once all
-	 * the platform bus code is reworked to call us late and the notifier
-	 * junk above goes away, move the body of do_iommu_attach here.
-	 */
-	group = iommu_group_get(dev);
-	if (group) {
-		do_iommu_attach(dev, ops, dma_base, size);
-		iommu_group_put(group);
-	} else {
-		queue_iommu_attach(dev, ops, dma_base, size);
-	}
+	if (!common_iommu_setup_dma_ops(dev, dma_base, size, iommu))
+		arch_set_dma_ops(dev, &swiotlb_dma_ops);
 }
 
 void arch_teardown_dma_ops(struct device *dev)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-
-	if (domain) {
-		iommu_detach_device(domain, dev);
-		if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
-			iommu_domain_free(domain);
-	}
-
-	dev->archdata.dma_ops = NULL;
-}
-
-#else
-
-static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				  struct iommu_ops *iommu)
-{ }
-
-#endif  /* CONFIG_IOMMU_DMA */
-
-void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			struct iommu_ops *iommu, bool coherent)
-{
-	if (!dev->archdata.dma_ops)
-		dev->archdata.dma_ops = &swiotlb_dma_ops;
-
-	dev->archdata.dma_coherent = coherent;
-	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
+	common_iommu_teardown_dma_ops(dev);
 }
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 42fc0c25cf1a..c0dbf765bf45 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,7 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
-obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o dma-iommu-ops.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/dma-iommu-ops.c b/drivers/iommu/dma-iommu-ops.c
new file mode 100644
index 000000000000..047c47e3c0ab
--- /dev/null
+++ b/drivers/iommu/dma-iommu-ops.c
@@ -0,0 +1,471 @@
+/*
+ * A common IOMMU based DMA-API implementation for ARM and ARM64 architectures.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-iommu.h>
+#include <linux/gfp.h>
+#include <linux/huge_mm.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+#include <linux/vmalloc.h>
+
+#include <linux/platform_device.h>
+#include <linux/amba/bus.h>
+
+#include <asm/dma-mapping.h>
+
+static void *__iommu_alloc_attrs(struct device *dev, size_t size,
+				 dma_addr_t *handle, gfp_t gfp,
+				 struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+	size_t iosize = size;
+	void *addr;
+
+	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
+		return NULL;
+
+	size = PAGE_ALIGN(size);
+
+	/*
+	 * Some drivers rely on this, and we probably don't want the
+	 * possibility of stale kernel data being read by devices anyway.
+	 */
+	gfp |= __GFP_ZERO;
+
+	if (gfpflags_allow_blocking(gfp)) {
+		struct page **pages;
+		pgprot_t prot = arch_get_dma_pgprot(attrs, PAGE_KERNEL,
+						    coherent);
+
+		pages = iommu_dma_alloc(dev, iosize, gfp, ioprot, handle,
+					arch_flush_page);
+		if (!pages)
+			return NULL;
+
+		addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
+					      __builtin_return_address(0));
+		if (!addr)
+			iommu_dma_free(dev, pages, iosize, handle);
+	} else {
+		struct page *page;
+		/*
+		 * In atomic context we can't remap anything, so we'll only
+		 * get the virtually contiguous buffer we need by way of a
+		 * physically contiguous allocation.
+		 */
+		if (coherent) {
+			page = alloc_pages(gfp, get_order(size));
+			addr = page ? page_address(page) : NULL;
+		} else {
+			addr = arch_alloc_from_atomic_pool(size, &page, gfp);
+		}
+		if (!addr)
+			return NULL;
+
+		*handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
+		if (iommu_dma_mapping_error(dev, *handle)) {
+			if (coherent)
+				__free_pages(page, get_order(size));
+			else
+				arch_free_from_atomic_pool(addr, size);
+			addr = NULL;
+		}
+	}
+	return addr;
+}
+
+static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+			       dma_addr_t handle, struct dma_attrs *attrs)
+{
+	size_t iosize = size;
+
+	size = PAGE_ALIGN(size);
+	/*
+	 * @cpu_addr will be one of 3 things depending on how it was allocated:
+	 * - A remapped array of pages from iommu_dma_alloc(), for all
+	 *   non-atomic allocations.
+	 * - A non-cacheable alias from the atomic pool, for atomic
+	 *   allocations by non-coherent devices.
+	 * - A normal lowmem address, for atomic allocations by
+	 *   coherent devices.
+	 * Hence how dodgy the below logic looks...
+	 */
+	if (arch_in_atomic_pool(cpu_addr, size)) {
+		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
+		arch_free_from_atomic_pool(cpu_addr, size);
+	} else if (is_vmalloc_addr(cpu_addr)) {
+		struct vm_struct *area = find_vm_area(cpu_addr);
+
+		if (WARN_ON(!area || !area->pages))
+			return;
+		iommu_dma_free(dev, area->pages, iosize, &handle);
+		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+	} else {
+		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
+		__free_pages(virt_to_page(cpu_addr), get_order(size));
+	}
+}
+
+static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+			      void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			      struct dma_attrs *attrs)
+{
+	struct vm_struct *area;
+	int ret;
+
+	vma->vm_page_prot = arch_get_dma_pgprot(attrs, vma->vm_page_prot,
+					        is_device_dma_coherent(dev));
+
+	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+		return ret;
+
+	area = find_vm_area(cpu_addr);
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return iommu_dma_mmap(area->pages, size, vma);
+}
+
+static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
+			       void *cpu_addr, dma_addr_t dma_addr,
+			       size_t size, struct dma_attrs *attrs)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct vm_struct *area = find_vm_area(cpu_addr);
+
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
+					 GFP_KERNEL);
+}
+
+static void __iommu_sync_single_for_cpu(struct device *dev,
+					dma_addr_t dev_addr, size_t size,
+					enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	arch_dma_unmap_area(phys, size, dir);
+}
+
+static void __iommu_sync_single_for_device(struct device *dev,
+					   dma_addr_t dev_addr, size_t size,
+					   enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	arch_dma_map_area(phys, size, dir);
+}
+
+static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int prot = dma_direction_to_prot(dir, coherent);
+	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
+
+	if (!iommu_dma_mapping_error(dev, dev_addr) &&
+	    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+
+	return dev_addr;
+}
+
+static void __iommu_unmap_page(struct device *dev, dma_addr_t dev_addr,
+			       size_t size, enum dma_data_direction dir,
+			       struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_cpu(dev, dev_addr, size, dir);
+
+	iommu_dma_unmap_page(dev, dev_addr, size, dir, attrs);
+}
+
+static void __iommu_sync_sg_for_cpu(struct device *dev,
+				    struct scatterlist *sgl, int nelems,
+				    enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		arch_dma_unmap_area(sg_phys(sg), sg->length, dir);
+}
+
+static void __iommu_sync_sg_for_device(struct device *dev,
+				       struct scatterlist *sgl, int nelems,
+				       enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		arch_dma_map_area(sg_phys(sg), sg->length, dir);
+}
+
+static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
+				int nelems, enum dma_data_direction dir,
+				struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
+
+	return iommu_dma_map_sg(dev, sgl, nelems,
+			dma_direction_to_prot(dir, coherent));
+}
+
+static void __iommu_unmap_sg_attrs(struct device *dev,
+				   struct scatterlist *sgl, int nelems,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_cpu(dev, sgl, nelems, dir);
+
+	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
+}
+
+static struct dma_map_ops iommu_dma_ops = {
+	.alloc = __iommu_alloc_attrs,
+	.free = __iommu_free_attrs,
+	.mmap = __iommu_mmap_attrs,
+	.get_sgtable = __iommu_get_sgtable,
+	.map_page = __iommu_map_page,
+	.unmap_page = __iommu_unmap_page,
+	.map_sg = __iommu_map_sg_attrs,
+	.unmap_sg = __iommu_unmap_sg_attrs,
+	.sync_single_for_cpu = __iommu_sync_single_for_cpu,
+	.sync_single_for_device = __iommu_sync_single_for_device,
+	.sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
+	.sync_sg_for_device = __iommu_sync_sg_for_device,
+	.dma_supported = iommu_dma_supported,
+	.mapping_error = iommu_dma_mapping_error,
+};
+
+/*
+ * TODO: Right now common_iommu_setup_dma_ops() gets called too early to do
+ * everything it needs to - the device is only partially created and the
+ * IOMMU driver hasn't seen it yet, so it can't have a group. Thus we
+ * need this delayed attachment dance. Once IOMMU probe ordering is sorted
+ * to move the arch_setup_dma_ops() call later, all the notifier bits below
+ * become unnecessary, and will go away.
+ */
+struct iommu_dma_notifier_data {
+	struct list_head list;
+	struct device *dev;
+	const struct iommu_ops *ops;
+	u64 dma_base;
+	u64 size;
+};
+static LIST_HEAD(iommu_dma_masters);
+static DEFINE_MUTEX(iommu_dma_notifier_lock);
+
+/*
+ * Temporarily "borrow" a domain feature flag to tell if we had to resort
+ * to creating our own domain here, in case we need to clean it up again.
+ */
+#define __IOMMU_DOMAIN_FAKE_DEFAULT		(1U << 31)
+
+static bool do_iommu_attach(struct device *dev, const struct iommu_ops *ops,
+			   u64 dma_base, u64 size)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	/*
+	 * Best case: The device is either part of a group which was
+	 * already attached to a domain in a previous call, or it's
+	 * been put in a default DMA domain by the IOMMU core.
+	 */
+	if (!domain) {
+		/*
+		 * Urgh. The IOMMU core isn't going to do default domains
+		 * for non-PCI devices anyway, until it has some means of
+		 * abstracting the entirely implementation-specific
+		 * sideband data/SoC topology/unicorn dust that may or
+		 * may not differentiate upstream masters.
+		 * So until then, HORRIBLE HACKS!
+		 */
+		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
+		if (!domain)
+			goto out_no_domain;
+
+		domain->ops = ops;
+		domain->type = IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_FAKE_DEFAULT;
+
+		if (iommu_attach_device(domain, dev))
+			goto out_put_domain;
+	}
+
+	if (iommu_dma_init_domain(domain, dma_base, size))
+		goto out_detach;
+
+	arch_set_dma_ops(dev, &iommu_dma_ops);
+	return true;
+
+out_detach:
+	iommu_detach_device(domain, dev);
+out_put_domain:
+	if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
+		iommu_domain_free(domain);
+out_no_domain:
+	pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA ops\n",
+		dev_name(dev));
+	return false;
+}
+
+static void queue_iommu_attach(struct device *dev, const struct iommu_ops *ops,
+			      u64 dma_base, u64 size)
+{
+	struct iommu_dma_notifier_data *iommudata;
+
+	iommudata = kzalloc(sizeof(*iommudata), GFP_KERNEL);
+	if (!iommudata)
+		return;
+
+	iommudata->dev = dev;
+	iommudata->ops = ops;
+	iommudata->dma_base = dma_base;
+	iommudata->size = size;
+
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_add(&iommudata->list, &iommu_dma_masters);
+	mutex_unlock(&iommu_dma_notifier_lock);
+}
+
+static int __iommu_attach_notifier(struct notifier_block *nb,
+				   unsigned long action, void *data)
+{
+	struct iommu_dma_notifier_data *master, *tmp;
+
+	if (action != BUS_NOTIFY_ADD_DEVICE)
+		return 0;
+
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_for_each_entry_safe(master, tmp, &iommu_dma_masters, list) {
+		if (do_iommu_attach(master->dev, master->ops,
+				master->dma_base, master->size)) {
+			list_del(&master->list);
+			kfree(master);
+		}
+	}
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int __init register_iommu_dma_ops_notifier(struct bus_type *bus)
+{
+	struct notifier_block *nb = kzalloc(sizeof(*nb), GFP_KERNEL);
+	int ret;
+
+	if (!nb)
+		return -ENOMEM;
+	/*
+	 * The device must be attached to a domain before the driver probe
+	 * routine gets a chance to start allocating DMA buffers. However,
+	 * the IOMMU driver also needs a chance to configure the iommu_group
+	 * via its add_device callback first, so we need to make the attach
+	 * happen between those two points. Since the IOMMU core uses a bus
+	 * notifier with default priority for add_device, do the same but
+	 * with a lower priority to ensure the appropriate ordering.
+	 */
+	nb->notifier_call = __iommu_attach_notifier;
+	nb->priority = -100;
+
+	ret = bus_register_notifier(bus, nb);
+	if (ret) {
+		pr_warn("Failed to register DMA domain notifier; IOMMU DMA ops unavailable on bus '%s'\n",
+			bus->name);
+		kfree(nb);
+	}
+	return ret;
+}
+
+static int __init __iommu_dma_init(void)
+{
+	int ret;
+
+	ret = iommu_dma_init();
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&platform_bus_type);
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&amba_bustype);
+
+	/* handle devices queued before this arch_initcall */
+	if (!ret)
+		__iommu_attach_notifier(NULL, BUS_NOTIFY_ADD_DEVICE, NULL);
+	return ret;
+}
+arch_initcall(__iommu_dma_init);
+
+bool common_iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  const struct iommu_ops *ops)
+{
+	struct iommu_group *group;
+
+	if (!ops)
+		return false;
+	/*
+	 * TODO: As a concession to the future, we're ready to handle being
+	 * called both early and late (i.e. after bus_add_device). Once all
+	 * the platform bus code is reworked to call us late and the notifier
+	 * junk above goes away, move the body of do_iommu_attach here.
+	 */
+	group = iommu_group_get(dev);
+	if (group) {
+		do_iommu_attach(dev, ops, dma_base, size);
+		iommu_group_put(group);
+	} else {
+		queue_iommu_attach(dev, ops, dma_base, size);
+	}
+
+	return true;
+}
+
+void common_iommu_teardown_dma_ops(struct device *dev)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (domain) {
+		iommu_detach_device(domain, dev);
+		if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
+			iommu_domain_free(domain);
+	}
+
+	arch_set_dma_ops(dev, NULL);
+}
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index fc481037478a..01a836c43dc3 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -62,6 +62,10 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 int iommu_dma_supported(struct device *dev, u64 mask);
 int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 
+bool common_iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  const struct iommu_ops *ops);
+void common_iommu_teardown_dma_ops(struct device *dev);
+
 #else
 
 struct iommu_domain;
@@ -80,6 +84,16 @@ static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
 }
 
+static inline bool common_iommu_setup_dma_ops(struct device *dev, u64 dma_base,
+					u64 size, const struct iommu_ops *ops)
+{
+	return false;
+}
+
+static inline void common_iommu_teardown_dma_ops(struct device *dev)
+{
+}
+
 #endif	/* CONFIG_IOMMU_DMA */
 #endif	/* __KERNEL__ */
 #endif	/* __DMA_IOMMU_H */
-- 
1.9.2
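
The delayed attachment described in the TODO comments of dma-iommu-ops.c
can be modelled in plain userspace C. This is only a minimal sketch under
stated assumptions: every model_* name is hypothetical and not part of the
kernel, a fixed array stands in for the list and mutex, and "the device has
a group" stands in for everything do_iommu_attach() needs in order to
succeed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct device; tracks only what the sketch needs. */
struct model_dev {
	const char *name;
	bool has_group;      /* true once the modelled IOMMU driver has seen the device */
	bool uses_iommu_ops; /* true once the modelled attach has succeeded */
};

#define MODEL_MAX_PENDING 8
static struct model_dev *model_pending[MODEL_MAX_PENDING];
static size_t model_npending;

/* Models do_iommu_attach(): assumed to succeed only when a group exists. */
static bool model_attach(struct model_dev *dev)
{
	if (!dev->has_group)
		return false;
	dev->uses_iommu_ops = true;
	return true;
}

/* Models queue_iommu_attach(): remember the device for a later retry. */
static bool model_queue(struct model_dev *dev)
{
	if (model_npending == MODEL_MAX_PENDING)
		return false;
	model_pending[model_npending++] = dev;
	return true;
}

/* Models common_iommu_setup_dma_ops(): attach now if possible, else defer. */
static void model_setup(struct model_dev *dev)
{
	if (dev->has_group)
		model_attach(dev);
	else
		model_queue(dev);
}

/* Models __iommu_attach_notifier(): retry every queued device, keep failures. */
static void model_notifier(void)
{
	size_t i, kept = 0;

	for (i = 0; i < model_npending; i++)
		if (!model_attach(model_pending[i]))
			model_pending[kept++] = model_pending[i];
	model_npending = kept;
}
```

In this model a device set up before it has a group stays queued across
notifier runs until the group appears, which is the ordering the low
notifier priority in the patch is meant to guarantee.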


* [RFC 2/3] iommu: dma-iommu: move IOMMU/DMA-mapping code from ARM64 arch to drivers
@ 2016-02-19  8:22   ` Marek Szyprowski
  0 siblings, 0 replies; 45+ messages in thread
From: Marek Szyprowski @ 2016-02-19  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

This patch moves all the IOMMU-based DMA-mapping code from arch/arm64/mm
to drivers/iommu/dma-iommu-ops.c. This way it can be easily shared with
the ARM architecture, which will also use it.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm64/include/asm/dma-mapping.h |  39 ++-
 arch/arm64/mm/dma-mapping.c          | 491 ++---------------------------------
 drivers/iommu/Makefile               |   2 +-
 drivers/iommu/dma-iommu-ops.c        | 471 +++++++++++++++++++++++++++++++++
 include/linux/dma-iommu.h            |  14 +
 5 files changed, 538 insertions(+), 479 deletions(-)
 create mode 100644 drivers/iommu/dma-iommu-ops.c
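
The split between common code and arch hooks that this patch introduces can
be illustrated with a small userspace model. It is a sketch only, not the
kernel code: all sketch_* names are hypothetical, the ops structures are
reduced to a name, and the common setup installs the IOMMU ops immediately,
whereas in the patch that step may be deferred until the device has a group.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dma_map_ops. */
struct sketch_ops {
	const char *name;
};

static const struct sketch_ops sketch_swiotlb_ops = { "swiotlb" };
static const struct sketch_ops sketch_iommu_ops = { "iommu" };

/* Hypothetical stand-in for struct device and its archdata. */
struct sketch_dev {
	const struct sketch_ops *dma_ops;
	bool coherent;
};

/* Arch side: the only code that knows where the ops pointer is stored. */
static void sketch_arch_set_dma_ops(struct sketch_dev *dev,
				    const struct sketch_ops *ops)
{
	dev->dma_ops = ops;
}

/* Common side: returns false when there is no IOMMU to set up. */
static bool sketch_common_setup(struct sketch_dev *dev, bool have_iommu)
{
	if (!have_iommu)
		return false;
	sketch_arch_set_dma_ops(dev, &sketch_iommu_ops);
	return true;
}

/* Arch side: fall back to the non-IOMMU ops when the common code declines. */
static void sketch_arch_setup_dma_ops(struct sketch_dev *dev, bool have_iommu,
				      bool coherent)
{
	dev->coherent = coherent;
	if (!sketch_common_setup(dev, have_iommu))
		sketch_arch_set_dma_ops(dev, &sketch_swiotlb_ops);
}
```

The point of the indirection is that the common file never touches
dev->archdata directly, so each architecture can keep its own storage for
the ops pointer behind arch_set_dma_ops().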

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index ba437f090a74..3a582d820717 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -22,6 +22,7 @@
 #include <linux/vmalloc.h>
 
 #include <xen/xen.h>
+#include <asm/cacheflush.h>
 #include <asm/xen/hypervisor.h>
 
 #define DMA_ERROR_CODE	(~(dma_addr_t)0)
@@ -47,14 +48,17 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 		return __generic_dma_ops(dev);
 }
 
+static inline void arch_set_dma_ops(struct device *dev, struct dma_map_ops *ops)
+{
+	dev->archdata.dma_ops = ops;
+}
+
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			struct iommu_ops *iommu, bool coherent);
 #define arch_setup_dma_ops	arch_setup_dma_ops
 
-#ifdef CONFIG_IOMMU_DMA
 void arch_teardown_dma_ops(struct device *dev);
 #define arch_teardown_dma_ops	arch_teardown_dma_ops
-#endif
 
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
@@ -86,5 +90,36 @@ static inline void dma_mark_clean(void *addr, size_t size)
 {
 }
 
+static inline void arch_flush_page(struct device *dev, const void *virt,
+				   phys_addr_t phys)
+{
+	__dma_flush_range(virt, virt + PAGE_SIZE);
+}
+
+static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
+				     enum dma_data_direction dir)
+{
+	__dma_map_area(phys_to_virt(phys), size, dir);
+}
+
+static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
+				       enum dma_data_direction dir)
+{
+	__dma_unmap_area(phys_to_virt(phys), size, dir);
+}
+
+static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
+					pgprot_t prot, bool coherent)
+{
+	if (!coherent || dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs))
+		return pgprot_writecombine(prot);
+	return prot;
+}
+
+extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
+					 gfp_t flags);
+extern bool arch_in_atomic_pool(void *start, size_t size);
+extern int arch_free_from_atomic_pool(void *start, size_t size);
+
 #endif	/* __KERNEL__ */
 #endif	/* __ASM_DMA_MAPPING_H */
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index a6e757cbab77..d8cb8552bbff 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -24,19 +24,12 @@
 #include <linux/genalloc.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
+#include <linux/dma-iommu.h>
 #include <linux/vmalloc.h>
 #include <linux/swiotlb.h>
 
 #include <asm/cacheflush.h>
 
-static pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot,
-				 bool coherent)
-{
-	if (!coherent || dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs))
-		return pgprot_writecombine(prot);
-	return prot;
-}
-
 static struct gen_pool *atomic_pool;
 
 #define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
@@ -49,7 +42,7 @@ static int __init early_coherent_pool(char *p)
 }
 early_param("coherent_pool", early_coherent_pool);
 
-static void *__alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags)
+void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page, gfp_t flags)
 {
 	unsigned long val;
 	void *ptr = NULL;
@@ -71,14 +64,14 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags)
 	return ptr;
 }
 
-static bool __in_atomic_pool(void *start, size_t size)
+bool arch_in_atomic_pool(void *start, size_t size)
 {
 	return addr_in_gen_pool(atomic_pool, (unsigned long)start, size);
 }
 
-static int __free_from_pool(void *start, size_t size)
+int arch_free_from_atomic_pool(void *start, size_t size)
 {
-	if (!__in_atomic_pool(start, size))
+	if (!arch_in_atomic_pool(start, size))
 		return 0;
 
 	gen_pool_free(atomic_pool, (unsigned long)start, size);
@@ -142,13 +135,13 @@ static void *__dma_alloc(struct device *dev, size_t size,
 	struct page *page;
 	void *ptr, *coherent_ptr;
 	bool coherent = is_device_dma_coherent(dev);
-	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, false);
+	pgprot_t prot = arch_get_dma_pgprot(attrs, PAGE_KERNEL, false);
 
 	size = PAGE_ALIGN(size);
 
 	if (!coherent && !gfpflags_allow_blocking(flags)) {
 		struct page *page = NULL;
-		void *addr = __alloc_from_pool(size, &page, flags);
+		void *addr = arch_alloc_from_atomic_pool(size, &page, flags);
 
 		if (addr)
 			*dma_handle = phys_to_dma(dev, page_to_phys(page));
@@ -192,7 +185,7 @@ static void __dma_free(struct device *dev, size_t size,
 	size = PAGE_ALIGN(size);
 
 	if (!is_device_dma_coherent(dev)) {
-		if (__free_from_pool(vaddr, size))
+		if (arch_free_from_atomic_pool(vaddr, size))
 			return;
 		vunmap(vaddr);
 	}
@@ -312,7 +305,7 @@ static int __swiotlb_mmap(struct device *dev,
 	unsigned long pfn = dma_to_phys(dev, dma_addr) >> PAGE_SHIFT;
 	unsigned long off = vma->vm_pgoff;
 
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
+	vma->vm_page_prot = arch_get_dma_pgprot(attrs, vma->vm_page_prot,
 					     is_device_dma_coherent(dev));
 
 	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
@@ -526,470 +519,16 @@ static int __init dma_debug_do_init(void)
 }
 fs_initcall(dma_debug_do_init);
 
-
-#ifdef CONFIG_IOMMU_DMA
-#include <linux/dma-iommu.h>
-#include <linux/platform_device.h>
-#include <linux/amba/bus.h>
-
-/* Thankfully, all cache ops are by VA so we can ignore phys here */
-static void flush_page(struct device *dev, const void *virt, phys_addr_t phys)
-{
-	__dma_flush_range(virt, virt + PAGE_SIZE);
-}
-
-static void *__iommu_alloc_attrs(struct device *dev, size_t size,
-				 dma_addr_t *handle, gfp_t gfp,
-				 struct dma_attrs *attrs)
-{
-	bool coherent = is_device_dma_coherent(dev);
-	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
-	size_t iosize = size;
-	void *addr;
-
-	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
-		return NULL;
-
-	size = PAGE_ALIGN(size);
-
-	/*
-	 * Some drivers rely on this, and we probably don't want the
-	 * possibility of stale kernel data being read by devices anyway.
-	 */
-	gfp |= __GFP_ZERO;
-
-	if (gfpflags_allow_blocking(gfp)) {
-		struct page **pages;
-		pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, coherent);
-
-		pages = iommu_dma_alloc(dev, iosize, gfp, ioprot, handle,
-					flush_page);
-		if (!pages)
-			return NULL;
-
-		addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
-					      __builtin_return_address(0));
-		if (!addr)
-			iommu_dma_free(dev, pages, iosize, handle);
-	} else {
-		struct page *page;
-		/*
-		 * In atomic context we can't remap anything, so we'll only
-		 * get the virtually contiguous buffer we need by way of a
-		 * physically contiguous allocation.
-		 */
-		if (coherent) {
-			page = alloc_pages(gfp, get_order(size));
-			addr = page ? page_address(page) : NULL;
-		} else {
-			addr = __alloc_from_pool(size, &page, gfp);
-		}
-		if (!addr)
-			return NULL;
-
-		*handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
-		if (iommu_dma_mapping_error(dev, *handle)) {
-			if (coherent)
-				__free_pages(page, get_order(size));
-			else
-				__free_from_pool(addr, size);
-			addr = NULL;
-		}
-	}
-	return addr;
-}
-
-static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-			       dma_addr_t handle, struct dma_attrs *attrs)
-{
-	size_t iosize = size;
-
-	size = PAGE_ALIGN(size);
-	/*
-	 * @cpu_addr will be one of 3 things depending on how it was allocated:
-	 * - A remapped array of pages from iommu_dma_alloc(), for all
-	 *   non-atomic allocations.
-	 * - A non-cacheable alias from the atomic pool, for atomic
-	 *   allocations by non-coherent devices.
-	 * - A normal lowmem address, for atomic allocations by
-	 *   coherent devices.
-	 * Hence how dodgy the below logic looks...
-	 */
-	if (__in_atomic_pool(cpu_addr, size)) {
-		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
-		__free_from_pool(cpu_addr, size);
-	} else if (is_vmalloc_addr(cpu_addr)){
-		struct vm_struct *area = find_vm_area(cpu_addr);
-
-		if (WARN_ON(!area || !area->pages))
-			return;
-		iommu_dma_free(dev, area->pages, iosize, &handle);
-		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
-	} else {
-		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
-		__free_pages(virt_to_page(cpu_addr), get_order(size));
-	}
-}
-
-static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
-			      void *cpu_addr, dma_addr_t dma_addr, size_t size,
-			      struct dma_attrs *attrs)
-{
-	struct vm_struct *area;
-	int ret;
-
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot,
-					     is_device_dma_coherent(dev));
-
-	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
-		return ret;
-
-	area = find_vm_area(cpu_addr);
-	if (WARN_ON(!area || !area->pages))
-		return -ENXIO;
-
-	return iommu_dma_mmap(area->pages, size, vma);
-}
-
-static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
-			       void *cpu_addr, dma_addr_t dma_addr,
-			       size_t size, struct dma_attrs *attrs)
-{
-	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	struct vm_struct *area = find_vm_area(cpu_addr);
-
-	if (WARN_ON(!area || !area->pages))
-		return -ENXIO;
-
-	return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
-					 GFP_KERNEL);
-}
-
-static void __iommu_sync_single_for_cpu(struct device *dev,
-					dma_addr_t dev_addr, size_t size,
-					enum dma_data_direction dir)
-{
-	phys_addr_t phys;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
-	__dma_unmap_area(phys_to_virt(phys), size, dir);
-}
-
-static void __iommu_sync_single_for_device(struct device *dev,
-					   dma_addr_t dev_addr, size_t size,
-					   enum dma_data_direction dir)
-{
-	phys_addr_t phys;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
-	__dma_map_area(phys_to_virt(phys), size, dir);
-}
-
-static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
-				   unsigned long offset, size_t size,
-				   enum dma_data_direction dir,
-				   struct dma_attrs *attrs)
-{
-	bool coherent = is_device_dma_coherent(dev);
-	int prot = dma_direction_to_prot(dir, coherent);
-	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
-
-	if (!iommu_dma_mapping_error(dev, dev_addr) &&
-	    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
-
-	return dev_addr;
-}
-
-static void __iommu_unmap_page(struct device *dev, dma_addr_t dev_addr,
-			       size_t size, enum dma_data_direction dir,
-			       struct dma_attrs *attrs)
-{
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_single_for_cpu(dev, dev_addr, size, dir);
-
-	iommu_dma_unmap_page(dev, dev_addr, size, dir, attrs);
-}
-
-static void __iommu_sync_sg_for_cpu(struct device *dev,
-				    struct scatterlist *sgl, int nelems,
-				    enum dma_data_direction dir)
-{
-	struct scatterlist *sg;
-	int i;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	for_each_sg(sgl, sg, nelems, i)
-		__dma_unmap_area(sg_virt(sg), sg->length, dir);
-}
-
-static void __iommu_sync_sg_for_device(struct device *dev,
-				       struct scatterlist *sgl, int nelems,
-				       enum dma_data_direction dir)
-{
-	struct scatterlist *sg;
-	int i;
-
-	if (is_device_dma_coherent(dev))
-		return;
-
-	for_each_sg(sgl, sg, nelems, i)
-		__dma_map_area(sg_virt(sg), sg->length, dir);
-}
-
-static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
-				int nelems, enum dma_data_direction dir,
-				struct dma_attrs *attrs)
-{
-	bool coherent = is_device_dma_coherent(dev);
-
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
-
-	return iommu_dma_map_sg(dev, sgl, nelems,
-			dma_direction_to_prot(dir, coherent));
-}
-
-static void __iommu_unmap_sg_attrs(struct device *dev,
-				   struct scatterlist *sgl, int nelems,
-				   enum dma_data_direction dir,
-				   struct dma_attrs *attrs)
-{
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__iommu_sync_sg_for_cpu(dev, sgl, nelems, dir);
-
-	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
-}
-
-static struct dma_map_ops iommu_dma_ops = {
-	.alloc = __iommu_alloc_attrs,
-	.free = __iommu_free_attrs,
-	.mmap = __iommu_mmap_attrs,
-	.get_sgtable = __iommu_get_sgtable,
-	.map_page = __iommu_map_page,
-	.unmap_page = __iommu_unmap_page,
-	.map_sg = __iommu_map_sg_attrs,
-	.unmap_sg = __iommu_unmap_sg_attrs,
-	.sync_single_for_cpu = __iommu_sync_single_for_cpu,
-	.sync_single_for_device = __iommu_sync_single_for_device,
-	.sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
-	.sync_sg_for_device = __iommu_sync_sg_for_device,
-	.dma_supported = iommu_dma_supported,
-	.mapping_error = iommu_dma_mapping_error,
-};
-
-/*
- * TODO: Right now __iommu_setup_dma_ops() gets called too early to do
- * everything it needs to - the device is only partially created and the
- * IOMMU driver hasn't seen it yet, so it can't have a group. Thus we
- * need this delayed attachment dance. Once IOMMU probe ordering is sorted
- * to move the arch_setup_dma_ops() call later, all the notifier bits below
- * become unnecessary, and will go away.
- */
-struct iommu_dma_notifier_data {
-	struct list_head list;
-	struct device *dev;
-	const struct iommu_ops *ops;
-	u64 dma_base;
-	u64 size;
-};
-static LIST_HEAD(iommu_dma_masters);
-static DEFINE_MUTEX(iommu_dma_notifier_lock);
-
-/*
- * Temporarily "borrow" a domain feature flag to to tell if we had to resort
- * to creating our own domain here, in case we need to clean it up again.
- */
-#define __IOMMU_DOMAIN_FAKE_DEFAULT		(1U << 31)
-
-static bool do_iommu_attach(struct device *dev, const struct iommu_ops *ops,
-			   u64 dma_base, u64 size)
-{
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-
-	/*
-	 * Best case: The device is either part of a group which was
-	 * already attached to a domain in a previous call, or it's
-	 * been put in a default DMA domain by the IOMMU core.
-	 */
-	if (!domain) {
-		/*
-		 * Urgh. The IOMMU core isn't going to do default domains
-		 * for non-PCI devices anyway, until it has some means of
-		 * abstracting the entirely implementation-specific
-		 * sideband data/SoC topology/unicorn dust that may or
-		 * may not differentiate upstream masters.
-		 * So until then, HORRIBLE HACKS!
-		 */
-		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
-		if (!domain)
-			goto out_no_domain;
-
-		domain->ops = ops;
-		domain->type = IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_FAKE_DEFAULT;
-
-		if (iommu_attach_device(domain, dev))
-			goto out_put_domain;
-	}
-
-	if (iommu_dma_init_domain(domain, dma_base, size))
-		goto out_detach;
-
-	dev->archdata.dma_ops = &iommu_dma_ops;
-	return true;
-
-out_detach:
-	iommu_detach_device(domain, dev);
-out_put_domain:
-	if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
-		iommu_domain_free(domain);
-out_no_domain:
-	pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA ops\n",
-		dev_name(dev));
-	return false;
-}
-
-static void queue_iommu_attach(struct device *dev, const struct iommu_ops *ops,
-			      u64 dma_base, u64 size)
-{
-	struct iommu_dma_notifier_data *iommudata;
-
-	iommudata = kzalloc(sizeof(*iommudata), GFP_KERNEL);
-	if (!iommudata)
-		return;
-
-	iommudata->dev = dev;
-	iommudata->ops = ops;
-	iommudata->dma_base = dma_base;
-	iommudata->size = size;
-
-	mutex_lock(&iommu_dma_notifier_lock);
-	list_add(&iommudata->list, &iommu_dma_masters);
-	mutex_unlock(&iommu_dma_notifier_lock);
-}
-
-static int __iommu_attach_notifier(struct notifier_block *nb,
-				   unsigned long action, void *data)
-{
-	struct iommu_dma_notifier_data *master, *tmp;
-
-	if (action != BUS_NOTIFY_ADD_DEVICE)
-		return 0;
-
-	mutex_lock(&iommu_dma_notifier_lock);
-	list_for_each_entry_safe(master, tmp, &iommu_dma_masters, list) {
-		if (do_iommu_attach(master->dev, master->ops,
-				master->dma_base, master->size)) {
-			list_del(&master->list);
-			kfree(master);
-		}
-	}
-	mutex_unlock(&iommu_dma_notifier_lock);
-	return 0;
-}
-
-static int __init register_iommu_dma_ops_notifier(struct bus_type *bus)
-{
-	struct notifier_block *nb = kzalloc(sizeof(*nb), GFP_KERNEL);
-	int ret;
-
-	if (!nb)
-		return -ENOMEM;
-	/*
-	 * The device must be attached to a domain before the driver probe
-	 * routine gets a chance to start allocating DMA buffers. However,
-	 * the IOMMU driver also needs a chance to configure the iommu_group
-	 * via its add_device callback first, so we need to make the attach
-	 * happen between those two points. Since the IOMMU core uses a bus
-	 * notifier with default priority for add_device, do the same but
-	 * with a lower priority to ensure the appropriate ordering.
-	 */
-	nb->notifier_call = __iommu_attach_notifier;
-	nb->priority = -100;
-
-	ret = bus_register_notifier(bus, nb);
-	if (ret) {
-		pr_warn("Failed to register DMA domain notifier; IOMMU DMA ops unavailable on bus '%s'\n",
-			bus->name);
-		kfree(nb);
-	}
-	return ret;
-}
-
-static int __init __iommu_dma_init(void)
-{
-	int ret;
-
-	ret = iommu_dma_init();
-	if (!ret)
-		ret = register_iommu_dma_ops_notifier(&platform_bus_type);
-	if (!ret)
-		ret = register_iommu_dma_ops_notifier(&amba_bustype);
-
-	/* handle devices queued before this arch_initcall */
-	if (!ret)
-		__iommu_attach_notifier(NULL, BUS_NOTIFY_ADD_DEVICE, NULL);
-	return ret;
-}
-arch_initcall(__iommu_dma_init);
-
-static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				  const struct iommu_ops *ops)
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+			struct iommu_ops *iommu, bool coherent)
 {
-	struct iommu_group *group;
+	dev->archdata.dma_coherent = coherent;
 
-	if (!ops)
-		return;
-	/*
-	 * TODO: As a concession to the future, we're ready to handle being
-	 * called both early and late (i.e. after bus_add_device). Once all
-	 * the platform bus code is reworked to call us late and the notifier
-	 * junk above goes away, move the body of do_iommu_attach here.
-	 */
-	group = iommu_group_get(dev);
-	if (group) {
-		do_iommu_attach(dev, ops, dma_base, size);
-		iommu_group_put(group);
-	} else {
-		queue_iommu_attach(dev, ops, dma_base, size);
-	}
+	if (!common_iommu_setup_dma_ops(dev, dma_base, size, iommu))
+		arch_set_dma_ops(dev, &swiotlb_dma_ops);
 }
 
 void arch_teardown_dma_ops(struct device *dev)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-
-	if (domain) {
-		iommu_detach_device(domain, dev);
-		if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
-			iommu_domain_free(domain);
-	}
-
-	dev->archdata.dma_ops = NULL;
-}
-
-#else
-
-static void __iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				  struct iommu_ops *iommu)
-{ }
-
-#endif  /* CONFIG_IOMMU_DMA */
-
-void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
-			struct iommu_ops *iommu, bool coherent)
-{
-	if (!dev->archdata.dma_ops)
-		dev->archdata.dma_ops = &swiotlb_dma_ops;
-
-	dev->archdata.dma_coherent = coherent;
-	__iommu_setup_dma_ops(dev, dma_base, size, iommu);
+	common_iommu_teardown_dma_ops(dev);
 }
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 42fc0c25cf1a..c0dbf765bf45 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -1,7 +1,7 @@
 obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
-obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o dma-iommu-ops.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
 obj-$(CONFIG_IOMMU_IOVA) += iova.o
diff --git a/drivers/iommu/dma-iommu-ops.c b/drivers/iommu/dma-iommu-ops.c
new file mode 100644
index 000000000000..047c47e3c0ab
--- /dev/null
+++ b/drivers/iommu/dma-iommu-ops.c
@@ -0,0 +1,471 @@
+/*
+ * A common IOMMU based DMA-API implementation for ARM and ARM64 architectures.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/device.h>
+#include <linux/dma-iommu.h>
+#include <linux/gfp.h>
+#include <linux/huge_mm.h>
+#include <linux/iommu.h>
+#include <linux/iova.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+#include <linux/vmalloc.h>
+
+#include <linux/platform_device.h>
+#include <linux/amba/bus.h>
+
+#include <asm/dma-mapping.h>
+
+static void *__iommu_alloc_attrs(struct device *dev, size_t size,
+				 dma_addr_t *handle, gfp_t gfp,
+				 struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int ioprot = dma_direction_to_prot(DMA_BIDIRECTIONAL, coherent);
+	size_t iosize = size;
+	void *addr;
+
+	if (WARN(!dev, "cannot create IOMMU mapping for unknown device\n"))
+		return NULL;
+
+	size = PAGE_ALIGN(size);
+
+	/*
+	 * Some drivers rely on this, and we probably don't want the
+	 * possibility of stale kernel data being read by devices anyway.
+	 */
+	gfp |= __GFP_ZERO;
+
+	if (gfpflags_allow_blocking(gfp)) {
+		struct page **pages;
+		pgprot_t prot = arch_get_dma_pgprot(attrs, PAGE_KERNEL,
+						    coherent);
+
+		pages = iommu_dma_alloc(dev, iosize, gfp, ioprot, handle,
+					arch_flush_page);
+		if (!pages)
+			return NULL;
+
+		addr = dma_common_pages_remap(pages, size, VM_USERMAP, prot,
+					      __builtin_return_address(0));
+		if (!addr)
+			iommu_dma_free(dev, pages, iosize, handle);
+	} else {
+		struct page *page;
+		/*
+		 * In atomic context we can't remap anything, so we'll only
+		 * get the virtually contiguous buffer we need by way of a
+		 * physically contiguous allocation.
+		 */
+		if (coherent) {
+			page = alloc_pages(gfp, get_order(size));
+			addr = page ? page_address(page) : NULL;
+		} else {
+			addr = arch_alloc_from_atomic_pool(size, &page, gfp);
+		}
+		if (!addr)
+			return NULL;
+
+		*handle = iommu_dma_map_page(dev, page, 0, iosize, ioprot);
+		if (iommu_dma_mapping_error(dev, *handle)) {
+			if (coherent)
+				__free_pages(page, get_order(size));
+			else
+				arch_free_from_atomic_pool(addr, size);
+			addr = NULL;
+		}
+	}
+	return addr;
+}
+
+static void __iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
+			       dma_addr_t handle, struct dma_attrs *attrs)
+{
+	size_t iosize = size;
+
+	size = PAGE_ALIGN(size);
+	/*
+	 * @cpu_addr will be one of 3 things depending on how it was allocated:
+	 * - A remapped array of pages from iommu_dma_alloc(), for all
+	 *   non-atomic allocations.
+	 * - A non-cacheable alias from the atomic pool, for atomic
+	 *   allocations by non-coherent devices.
+	 * - A normal lowmem address, for atomic allocations by
+	 *   coherent devices.
+	 * Hence how dodgy the below logic looks...
+	 */
+	if (arch_in_atomic_pool(cpu_addr, size)) {
+		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
+		arch_free_from_atomic_pool(cpu_addr, size);
+	} else if (is_vmalloc_addr(cpu_addr)) {
+		struct vm_struct *area = find_vm_area(cpu_addr);
+
+		if (WARN_ON(!area || !area->pages))
+			return;
+		iommu_dma_free(dev, area->pages, iosize, &handle);
+		dma_common_free_remap(cpu_addr, size, VM_USERMAP);
+	} else {
+		iommu_dma_unmap_page(dev, handle, iosize, 0, NULL);
+		__free_pages(virt_to_page(cpu_addr), get_order(size));
+	}
+}
+
+static int __iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
+			      void *cpu_addr, dma_addr_t dma_addr, size_t size,
+			      struct dma_attrs *attrs)
+{
+	struct vm_struct *area;
+	int ret;
+
+	vma->vm_page_prot = arch_get_dma_pgprot(attrs, vma->vm_page_prot,
+					        is_device_dma_coherent(dev));
+
+	if (dma_mmap_from_coherent(dev, vma, cpu_addr, size, &ret))
+		return ret;
+
+	area = find_vm_area(cpu_addr);
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return iommu_dma_mmap(area->pages, size, vma);
+}
+
+static int __iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
+			       void *cpu_addr, dma_addr_t dma_addr,
+			       size_t size, struct dma_attrs *attrs)
+{
+	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
+	struct vm_struct *area = find_vm_area(cpu_addr);
+
+	if (WARN_ON(!area || !area->pages))
+		return -ENXIO;
+
+	return sg_alloc_table_from_pages(sgt, area->pages, count, 0, size,
+					 GFP_KERNEL);
+}
+
+static void __iommu_sync_single_for_cpu(struct device *dev,
+					dma_addr_t dev_addr, size_t size,
+					enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	arch_dma_unmap_area(phys, size, dir);
+}
+
+static void __iommu_sync_single_for_device(struct device *dev,
+					   dma_addr_t dev_addr, size_t size,
+					   enum dma_data_direction dir)
+{
+	phys_addr_t phys;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	phys = iommu_iova_to_phys(iommu_get_domain_for_dev(dev), dev_addr);
+	arch_dma_map_area(phys, size, dir);
+}
+
+static dma_addr_t __iommu_map_page(struct device *dev, struct page *page,
+				   unsigned long offset, size_t size,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+	int prot = dma_direction_to_prot(dir, coherent);
+	dma_addr_t dev_addr = iommu_dma_map_page(dev, page, offset, size, prot);
+
+	if (!iommu_dma_mapping_error(dev, dev_addr) &&
+	    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_device(dev, dev_addr, size, dir);
+
+	return dev_addr;
+}
+
+static void __iommu_unmap_page(struct device *dev, dma_addr_t dev_addr,
+			       size_t size, enum dma_data_direction dir,
+			       struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_single_for_cpu(dev, dev_addr, size, dir);
+
+	iommu_dma_unmap_page(dev, dev_addr, size, dir, attrs);
+}
+
+static void __iommu_sync_sg_for_cpu(struct device *dev,
+				    struct scatterlist *sgl, int nelems,
+				    enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		arch_dma_unmap_area(sg_phys(sg), sg->length, dir);
+}
+
+static void __iommu_sync_sg_for_device(struct device *dev,
+				       struct scatterlist *sgl, int nelems,
+				       enum dma_data_direction dir)
+{
+	struct scatterlist *sg;
+	int i;
+
+	if (is_device_dma_coherent(dev))
+		return;
+
+	for_each_sg(sgl, sg, nelems, i)
+		arch_dma_map_area(sg_phys(sg), sg->length, dir);
+}
+
+static int __iommu_map_sg_attrs(struct device *dev, struct scatterlist *sgl,
+				int nelems, enum dma_data_direction dir,
+				struct dma_attrs *attrs)
+{
+	bool coherent = is_device_dma_coherent(dev);
+
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_device(dev, sgl, nelems, dir);
+
+	return iommu_dma_map_sg(dev, sgl, nelems,
+			dma_direction_to_prot(dir, coherent));
+}
+
+static void __iommu_unmap_sg_attrs(struct device *dev,
+				   struct scatterlist *sgl, int nelems,
+				   enum dma_data_direction dir,
+				   struct dma_attrs *attrs)
+{
+	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
+		__iommu_sync_sg_for_cpu(dev, sgl, nelems, dir);
+
+	iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs);
+}
+
+static struct dma_map_ops iommu_dma_ops = {
+	.alloc = __iommu_alloc_attrs,
+	.free = __iommu_free_attrs,
+	.mmap = __iommu_mmap_attrs,
+	.get_sgtable = __iommu_get_sgtable,
+	.map_page = __iommu_map_page,
+	.unmap_page = __iommu_unmap_page,
+	.map_sg = __iommu_map_sg_attrs,
+	.unmap_sg = __iommu_unmap_sg_attrs,
+	.sync_single_for_cpu = __iommu_sync_single_for_cpu,
+	.sync_single_for_device = __iommu_sync_single_for_device,
+	.sync_sg_for_cpu = __iommu_sync_sg_for_cpu,
+	.sync_sg_for_device = __iommu_sync_sg_for_device,
+	.dma_supported = iommu_dma_supported,
+	.mapping_error = iommu_dma_mapping_error,
+};
+
+/*
+ * TODO: Right now common_iommu_setup_dma_ops() gets called too early to do
+ * everything it needs to - the device is only partially created and the
+ * IOMMU driver hasn't seen it yet, so it can't have a group. Thus we
+ * need this delayed attachment dance. Once IOMMU probe ordering is sorted
+ * to move the arch_setup_dma_ops() call later, all the notifier bits below
+ * become unnecessary, and will go away.
+ */
+struct iommu_dma_notifier_data {
+	struct list_head list;
+	struct device *dev;
+	const struct iommu_ops *ops;
+	u64 dma_base;
+	u64 size;
+};
+static LIST_HEAD(iommu_dma_masters);
+static DEFINE_MUTEX(iommu_dma_notifier_lock);
+
+/*
+ * Temporarily "borrow" a domain feature flag to tell if we had to resort
+ * to creating our own domain here, in case we need to clean it up again.
+ */
+#define __IOMMU_DOMAIN_FAKE_DEFAULT		(1U << 31)
+
+static bool do_iommu_attach(struct device *dev, const struct iommu_ops *ops,
+			   u64 dma_base, u64 size)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	/*
+	 * Best case: The device is either part of a group which was
+	 * already attached to a domain in a previous call, or it's
+	 * been put in a default DMA domain by the IOMMU core.
+	 */
+	if (!domain) {
+		/*
+		 * Urgh. The IOMMU core isn't going to do default domains
+		 * for non-PCI devices anyway, until it has some means of
+		 * abstracting the entirely implementation-specific
+		 * sideband data/SoC topology/unicorn dust that may or
+		 * may not differentiate upstream masters.
+		 * So until then, HORRIBLE HACKS!
+		 */
+		domain = ops->domain_alloc(IOMMU_DOMAIN_DMA);
+		if (!domain)
+			goto out_no_domain;
+
+		domain->ops = ops;
+		domain->type = IOMMU_DOMAIN_DMA | __IOMMU_DOMAIN_FAKE_DEFAULT;
+
+		if (iommu_attach_device(domain, dev))
+			goto out_put_domain;
+	}
+
+	if (iommu_dma_init_domain(domain, dma_base, size))
+		goto out_detach;
+
+	arch_set_dma_ops(dev, &iommu_dma_ops);
+	return true;
+
+out_detach:
+	iommu_detach_device(domain, dev);
+out_put_domain:
+	if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
+		iommu_domain_free(domain);
+out_no_domain:
+	pr_warn("Failed to set up IOMMU for device %s; retaining platform DMA ops\n",
+		dev_name(dev));
+	return false;
+}
+
+static void queue_iommu_attach(struct device *dev, const struct iommu_ops *ops,
+			      u64 dma_base, u64 size)
+{
+	struct iommu_dma_notifier_data *iommudata;
+
+	iommudata = kzalloc(sizeof(*iommudata), GFP_KERNEL);
+	if (!iommudata)
+		return;
+
+	iommudata->dev = dev;
+	iommudata->ops = ops;
+	iommudata->dma_base = dma_base;
+	iommudata->size = size;
+
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_add(&iommudata->list, &iommu_dma_masters);
+	mutex_unlock(&iommu_dma_notifier_lock);
+}
+
+static int __iommu_attach_notifier(struct notifier_block *nb,
+				   unsigned long action, void *data)
+{
+	struct iommu_dma_notifier_data *master, *tmp;
+
+	if (action != BUS_NOTIFY_ADD_DEVICE)
+		return 0;
+
+	mutex_lock(&iommu_dma_notifier_lock);
+	list_for_each_entry_safe(master, tmp, &iommu_dma_masters, list) {
+		if (do_iommu_attach(master->dev, master->ops,
+				master->dma_base, master->size)) {
+			list_del(&master->list);
+			kfree(master);
+		}
+	}
+	mutex_unlock(&iommu_dma_notifier_lock);
+	return 0;
+}
+
+static int __init register_iommu_dma_ops_notifier(struct bus_type *bus)
+{
+	struct notifier_block *nb = kzalloc(sizeof(*nb), GFP_KERNEL);
+	int ret;
+
+	if (!nb)
+		return -ENOMEM;
+	/*
+	 * The device must be attached to a domain before the driver probe
+	 * routine gets a chance to start allocating DMA buffers. However,
+	 * the IOMMU driver also needs a chance to configure the iommu_group
+	 * via its add_device callback first, so we need to make the attach
+	 * happen between those two points. Since the IOMMU core uses a bus
+	 * notifier with default priority for add_device, do the same but
+	 * with a lower priority to ensure the appropriate ordering.
+	 */
+	nb->notifier_call = __iommu_attach_notifier;
+	nb->priority = -100;
+
+	ret = bus_register_notifier(bus, nb);
+	if (ret) {
+		pr_warn("Failed to register DMA domain notifier; IOMMU DMA ops unavailable on bus '%s'\n",
+			bus->name);
+		kfree(nb);
+	}
+	return ret;
+}
+
+static int __init __iommu_dma_init(void)
+{
+	int ret;
+
+	ret = iommu_dma_init();
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&platform_bus_type);
+	if (!ret)
+		ret = register_iommu_dma_ops_notifier(&amba_bustype);
+
+	/* handle devices queued before this arch_initcall */
+	if (!ret)
+		__iommu_attach_notifier(NULL, BUS_NOTIFY_ADD_DEVICE, NULL);
+	return ret;
+}
+arch_initcall(__iommu_dma_init);
+
+bool common_iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  const struct iommu_ops *ops)
+{
+	struct iommu_group *group;
+
+	if (!ops)
+		return false;
+	/*
+	 * TODO: As a concession to the future, we're ready to handle being
+	 * called both early and late (i.e. after bus_add_device). Once all
+	 * the platform bus code is reworked to call us late and the notifier
+	 * junk above goes away, move the body of do_iommu_attach here.
+	 */
+	group = iommu_group_get(dev);
+	if (group) {
+		do_iommu_attach(dev, ops, dma_base, size);
+		iommu_group_put(group);
+	} else {
+		queue_iommu_attach(dev, ops, dma_base, size);
+	}
+
+	return true;
+}
+
+void common_iommu_teardown_dma_ops(struct device *dev)
+{
+	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+
+	if (domain) {
+		iommu_detach_device(domain, dev);
+		if (domain->type & __IOMMU_DOMAIN_FAKE_DEFAULT)
+			iommu_domain_free(domain);
+	}
+
+	arch_set_dma_ops(dev, NULL);
+}
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index fc481037478a..01a836c43dc3 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -62,6 +62,10 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
 int iommu_dma_supported(struct device *dev, u64 mask);
 int iommu_dma_mapping_error(struct device *dev, dma_addr_t dma_addr);
 
+bool common_iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+				  const struct iommu_ops *ops);
+void common_iommu_teardown_dma_ops(struct device *dev);
+
 #else
 
 struct iommu_domain;
@@ -80,6 +84,16 @@ static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
 {
 }
 
+static inline bool common_iommu_setup_dma_ops(struct device *dev, u64 dma_base,
+					u64 size, const struct iommu_ops *ops)
+{
+	return false;
+}
+
+static inline void common_iommu_teardown_dma_ops(struct device *dev)
+{
+}
+
 #endif	/* CONFIG_IOMMU_DMA */
 #endif	/* __KERNEL__ */
 #endif	/* __DMA_IOMMU_H */
-- 
1.9.2
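
Note on the deferred attachment above: a device that has no iommu_group
yet is queued, and every later BUS_NOTIFY_ADD_DEVICE event retries all
queued devices, dropping the ones that attach successfully. Below is a
minimal user-space sketch of that retry loop. All sketch_* names are
hypothetical stand-ins and not part of the patch; the "has_group" flag
only models the condition under which the real attach can succeed, and
locking is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct device: only tracks the two states. */
struct sketch_dev {
	bool has_group;	/* set once the IOMMU driver has seen the device */
	bool attached;	/* set once IOMMU DMA ops would be installed */
};

/* Hypothetical stand-in for struct iommu_dma_notifier_data. */
struct sketch_pending {
	struct sketch_pending *next;
	struct sketch_dev *dev;
};

static struct sketch_pending *sketch_masters;	/* models iommu_dma_masters */

/* Models do_iommu_attach(): here it succeeds only once a group exists. */
static bool sketch_attach(struct sketch_dev *dev)
{
	if (!dev->has_group)
		return false;
	dev->attached = true;
	return true;
}

/* Models queue_iommu_attach(): remember the device for a later retry. */
static bool sketch_queue(struct sketch_dev *dev)
{
	struct sketch_pending *p;

	assert(dev);
	p = malloc(sizeof(*p));
	if (!p)
		return false;
	p->dev = dev;
	p->next = sketch_masters;
	sketch_masters = p;
	return true;
}

/*
 * Models __iommu_attach_notifier(): retry every queued device, unlink and
 * free the entries that attached, and return how many are still pending.
 */
static unsigned int sketch_replay(void)
{
	struct sketch_pending **link = &sketch_masters;
	unsigned int remaining = 0;

	while (*link) {
		struct sketch_pending *p = *link;

		if (sketch_attach(p->dev)) {
			*link = p->next;
			free(p);
		} else {
			link = &p->next;
			remaining++;
		}
	}
	return remaining;
}
```

Entries that fail to attach stay on the list, so a device is simply
retried on the next notification rather than being dropped.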


* [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
  2016-02-19  8:22 ` Marek Szyprowski
  (?)
@ 2016-02-19  8:22   ` Marek Szyprowski
  -1 siblings, 0 replies; 45+ messages in thread
From: Marek Szyprowski @ 2016-02-19  8:22 UTC (permalink / raw)
  To: iommu, linux-arm-kernel, linux-kernel
  Cc: Marek Szyprowski, linaro-mm-sig, dri-devel, Arnd Bergmann,
	Will Deacon, Catalin Marinas, Robin Murphy,
	Russell King - ARM Linux, Joerg Roedel, Laurent Pinchart,
	Sakari Ailus, Mark Yao, Heiko Stuebner, Tomasz Figa, Inki Dae,
	Bartlomiej Zolnierkiewicz, Krzysztof Kozlowski

This patch replaces the ARM-specific IOMMU-based DMA-mapping
implementation with the generic IOMMU DMA-mapping code shared with the
ARM64 architecture. A side effect of this change is a switch from
bitmap-based IO address space management to tree-based code. There
should be no functional changes for drivers that rely on initialization
through the generic arch_setup_dma_ops() interface. Code that used the
old arm_iommu_* functions must be updated to the new interface.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
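Note for reviewers: the arch_dma_map_area() and arch_dma_unmap_area()
helpers added to asm/dma-mapping.h below take a physical address and
split it into a page-aligned base and an in-page offset before calling
__dma_page_cpu_to_dev() / __dma_page_dev_to_cpu(). A minimal user-space
sketch of that split follows. It assumes 4 KiB pages, and the sketch_*
names are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the kernel takes PAGE_SHIFT from the arch. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK	(~(SKETCH_PAGE_SIZE - 1))

/* Hypothetical stand-in for the (page, offset) pair the helpers pass on. */
struct sketch_split {
	uint64_t page_base;	/* corresponds to phys & PAGE_MASK */
	unsigned int offset;	/* corresponds to phys & ~PAGE_MASK */
};

static struct sketch_split sketch_split_phys(uint64_t phys)
{
	struct sketch_split s;

	s.page_base = phys & SKETCH_PAGE_MASK;
	s.offset = (unsigned int)(phys & ~SKETCH_PAGE_MASK);
	/* The two halves always recombine to the original address. */
	assert(s.page_base + s.offset == phys);
	return s;
}
```

For example, sketch_split_phys(0x80001234) gives page_base 0x80001000
and offset 0x234.
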
 arch/arm/Kconfig                   |   22 +-
 arch/arm/include/asm/device.h      |    9 -
 arch/arm/include/asm/dma-iommu.h   |   37 --
 arch/arm/include/asm/dma-mapping.h |   59 +-
 arch/arm/mm/dma-mapping.c          | 1158 +-----------------------------------
 drivers/gpu/drm/rockchip/Kconfig   |    1 +
 drivers/iommu/Kconfig              |    1 +
 drivers/media/platform/Kconfig     |    1 +
 8 files changed, 82 insertions(+), 1206 deletions(-)
 delete mode 100644 arch/arm/include/asm/dma-iommu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4f799e567fc8..ed45f0d63cee 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -107,27 +107,7 @@ config ARM_DMA_USE_IOMMU
 	bool
 	select ARM_HAS_SG_CHAIN
 	select NEED_SG_DMA_LENGTH
-
-if ARM_DMA_USE_IOMMU
-
-config ARM_DMA_IOMMU_ALIGNMENT
-	int "Maximum PAGE_SIZE order of alignment for DMA IOMMU buffers"
-	range 4 9
-	default 8
-	help
-	  DMA mapping framework by default aligns all buffers to the smallest
-	  PAGE_SIZE order which is greater than or equal to the requested buffer
-	  size. This works well for buffers up to a few hundreds kilobytes, but
-	  for larger buffers it just a waste of address space. Drivers which has
-	  relatively small addressing window (like 64Mib) might run out of
-	  virtual space with just a few allocations.
-
-	  With this parameter you can specify the maximum PAGE_SIZE order for
-	  DMA IOMMU buffers. Larger buffers will be aligned only to this
-	  specified order. The order is expressed as a power of two multiplied
-	  by the PAGE_SIZE.
-
-endif
+	select IOMMU_DMA
 
 config MIGHT_HAVE_PCI
 	bool
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 4111592f0130..6ea939c39cd4 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,9 +14,6 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu; /* private IOMMU data */
 #endif
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-	struct dma_iommu_mapping	*mapping;
-#endif
 	bool dma_coherent;
 };
 
@@ -28,10 +25,4 @@ struct pdev_archdata {
 #endif
 };
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-#define to_dma_iommu_mapping(dev) ((dev)->archdata.mapping)
-#else
-#define to_dma_iommu_mapping(dev) NULL
-#endif
-
 #endif
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
deleted file mode 100644
index 2ef282f96651..000000000000
--- a/arch/arm/include/asm/dma-iommu.h
+++ /dev/null
@@ -1,37 +0,0 @@
-#ifndef ASMARM_DMA_IOMMU_H
-#define ASMARM_DMA_IOMMU_H
-
-#ifdef __KERNEL__
-
-#include <linux/mm_types.h>
-#include <linux/scatterlist.h>
-#include <linux/dma-debug.h>
-#include <linux/kmemcheck.h>
-#include <linux/kref.h>
-
-struct dma_iommu_mapping {
-	/* iommu specific data */
-	struct iommu_domain	*domain;
-
-	unsigned long		**bitmaps;	/* array of bitmaps */
-	unsigned int		nr_bitmaps;	/* nr of elements in array */
-	unsigned int		extensions;
-	size_t			bitmap_size;	/* size of a single bitmap */
-	size_t			bits;		/* per bitmap */
-	dma_addr_t		base;
-
-	spinlock_t		lock;
-	struct kref		kref;
-};
-
-struct dma_iommu_mapping *
-arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size);
-
-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);
-
-int arm_iommu_attach_device(struct device *dev,
-					struct dma_iommu_mapping *mapping);
-void arm_iommu_detach_device(struct device *dev);
-
-#endif /* __KERNEL__ */
-#endif
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 6ad1ceda62a5..08bedb0c02c6 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -8,6 +8,7 @@
 #include <linux/dma-attrs.h>
 #include <linux/dma-debug.h>
 
+#include <asm/cacheflush.h>
 #include <asm/memory.h>
 
 #include <xen/xen.h>
@@ -32,7 +33,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 		return __generic_dma_ops(dev);
 }
 
-static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
+static inline void arch_set_dma_ops(struct device *dev, struct dma_map_ops *ops)
 {
 	BUG_ON(!dev);
 	dev->archdata.dma_ops = ops;
@@ -275,5 +276,61 @@ extern int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		struct dma_attrs *attrs);
 
+/*
+ * The DMA API is built upon the notion of "buffer ownership".  A buffer
+ * is either exclusively owned by the CPU (and therefore may be accessed
+ * by it) or exclusively owned by the DMA device.  These helper functions
+ * represent the transitions between these two ownership states.
+ *
+ * Note, however, that on later ARMs, this notion does not work due to
+ * speculative prefetches.  We model our approach on the assumption that
+ * the CPU does do speculative prefetches, which means we clean caches
+ * before transfers and delay cache invalidation until transfer completion.
+ *
+ */
+extern void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t,
+				  enum dma_data_direction);
+extern void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t,
+				  enum dma_data_direction);
+
+static inline void arch_flush_page(struct device *dev, const void *virt,
+			    phys_addr_t phys)
+{
+	dmac_flush_range(virt, virt + PAGE_SIZE);
+	outer_flush_range(phys, phys + PAGE_SIZE);
+}
+
+static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
+				     enum dma_data_direction dir)
+{
+	unsigned int offset = phys & ~PAGE_MASK;
+	__dma_page_cpu_to_dev(phys_to_page(phys & PAGE_MASK), offset, size, dir);
+}
+
+static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
+				       enum dma_data_direction dir)
+{
+	unsigned int offset = phys & ~PAGE_MASK;
+	__dma_page_dev_to_cpu(phys_to_page(phys & PAGE_MASK), offset, size, dir);
+}
+
+static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
+					pgprot_t prot, bool coherent)
+{
+	if (coherent)
+		return prot;
+
+	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
+			    pgprot_writecombine(prot) :
+			    pgprot_dmacoherent(prot);
+	return prot;
+}
+
+extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
+					 gfp_t flags);
+extern bool arch_in_atomic_pool(void *start, size_t size);
+extern int arch_free_from_atomic_pool(void *start, size_t size);
+
+
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0eca3812527e..5d497f3c5924 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -20,6 +20,7 @@
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
+#include <linux/dma-iommu.h>
 #include <linux/highmem.h>
 #include <linux/memblock.h>
 #include <linux/slab.h>
@@ -34,7 +35,6 @@
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 #include <asm/mach/arch.h>
-#include <asm/dma-iommu.h>
 #include <asm/mach/map.h>
 #include <asm/system_info.h>
 #include <asm/dma-contiguous.h>
@@ -42,23 +42,6 @@
 #include "dma.h"
 #include "mm.h"
 
-/*
- * The DMA API is built upon the notion of "buffer ownership".  A buffer
- * is either exclusively owned by the CPU (and therefore may be accessed
- * by it) or exclusively owned by the DMA device.  These helper functions
- * represent the transitions between these two ownership states.
- *
- * Note, however, that on later ARMs, this notion does not work due to
- * speculative prefetches.  We model our approach on the assumption that
- * the CPU does do speculative prefetches, which means we clean caches
- * before transfers and delay cache invalidation until transfer completion.
- *
- */
-static void __dma_page_cpu_to_dev(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-static void __dma_page_dev_to_cpu(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
 /**
  * arm_dma_map_page - map a portion of a page for streaming DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -492,7 +475,7 @@ static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
 	return ptr;
 }
 
-static void *__alloc_from_pool(size_t size, struct page **ret_page)
+void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page, gfp_t gfp)
 {
 	unsigned long val;
 	void *ptr = NULL;
@@ -513,14 +496,14 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page)
 	return ptr;
 }
 
-static bool __in_atomic_pool(void *start, size_t size)
+bool arch_in_atomic_pool(void *start, size_t size)
 {
 	return addr_in_gen_pool(atomic_pool, (unsigned long)start, size);
 }
 
-static int __free_from_pool(void *start, size_t size)
+int arch_free_from_atomic_pool(void *start, size_t size)
 {
-	if (!__in_atomic_pool(start, size))
+	if (!arch_in_atomic_pool(start, size))
 		return 0;
 
 	gen_pool_free(atomic_pool, (unsigned long)start, size);
@@ -574,25 +557,21 @@ static void __free_from_contiguous(struct device *dev, struct page *page,
 	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
 }
 
-static inline pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot)
-{
-	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
-			    pgprot_writecombine(prot) :
-			    pgprot_dmacoherent(prot);
-	return prot;
-}
-
 #define nommu() 0
 
+#define __alloc_from_pool(size, ret, gfp) arch_alloc_from_atomic_pool(size, ret, gfp)
+#define __free_from_pool(addr, size) arch_free_from_atomic_pool(addr, size)
+#define __get_dma_pgprot(attrs, prot, coherent) arch_get_dma_pgprot(attrs, prot, coherent)
+
 #else	/* !CONFIG_MMU */
 
 #define nommu() 1
 
-#define __get_dma_pgprot(attrs, prot)				__pgprot(0)
+#define __get_dma_pgprot(attrs, prot, coherent)				__pgprot(0)
 #define __alloc_remap_buffer(dev, size, gfp, prot, ret, c, wv)	NULL
-#define __alloc_from_pool(size, ret_page)			NULL
+#define __alloc_from_pool(size, ret_page, gfp)			NULL
 #define __alloc_from_contiguous(dev, size, prot, ret, c, wv)	NULL
-#define __free_from_pool(cpu_addr, size)			0
+#define __free_from_pool(cpu_addr, size)			0
 #define __free_from_contiguous(dev, page, cpu_addr, size, wv)	do { } while (0)
 #define __dma_free_remap(cpu_addr, size)			do { } while (0)
 
@@ -657,7 +636,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 	else if (is_coherent)
 		addr = __alloc_simple_buffer(dev, size, gfp, &page);
 	else if (!gfpflags_allow_blocking(gfp))
-		addr = __alloc_from_pool(size, &page);
+		addr = __alloc_from_pool(size, &page, gfp);
 	else
 		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page,
 					    caller, want_vaddr);
@@ -675,7 +654,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 		    gfp_t gfp, struct dma_attrs *attrs)
 {
-	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
+	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, false);
 
 	return __dma_alloc(dev, size, handle, gfp, prot, false,
 			   attrs, __builtin_return_address(0));
@@ -728,7 +707,7 @@ int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
 		 struct dma_attrs *attrs)
 {
 #ifdef CONFIG_MMU
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
+	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot, false);
 #endif	/* CONFIG_MMU */
 	return __arm_dma_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
 }
@@ -842,7 +821,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
  * platforms with CONFIG_DMABOUNCE.
  * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
  */
-static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
+void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr;
@@ -858,7 +837,7 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	/* FIXME: non-speculating: flush on bidirectional mappings? */
 }
 
-static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
+void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = page_to_phys(page) + off;
@@ -1023,1098 +1002,6 @@ static int __init dma_debug_do_init(void)
 }
 fs_initcall(dma_debug_do_init);
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-
-/* IOMMU */
-
-static int extend_iommu_mapping(struct dma_iommu_mapping *mapping);
-
-static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
-				      size_t size)
-{
-	unsigned int order = get_order(size);
-	unsigned int align = 0;
-	unsigned int count, start;
-	size_t mapping_size = mapping->bits << PAGE_SHIFT;
-	unsigned long flags;
-	dma_addr_t iova;
-	int i;
-
-	if (order > CONFIG_ARM_DMA_IOMMU_ALIGNMENT)
-		order = CONFIG_ARM_DMA_IOMMU_ALIGNMENT;
-
-	count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	align = (1 << order) - 1;
-
-	spin_lock_irqsave(&mapping->lock, flags);
-	for (i = 0; i < mapping->nr_bitmaps; i++) {
-		start = bitmap_find_next_zero_area(mapping->bitmaps[i],
-				mapping->bits, 0, count, align);
-
-		if (start > mapping->bits)
-			continue;
-
-		bitmap_set(mapping->bitmaps[i], start, count);
-		break;
-	}
-
-	/*
-	 * No unused range found. Try to extend the existing mapping
-	 * and perform a second attempt to reserve an IO virtual
-	 * address range of size bytes.
-	 */
-	if (i == mapping->nr_bitmaps) {
-		if (extend_iommu_mapping(mapping)) {
-			spin_unlock_irqrestore(&mapping->lock, flags);
-			return DMA_ERROR_CODE;
-		}
-
-		start = bitmap_find_next_zero_area(mapping->bitmaps[i],
-				mapping->bits, 0, count, align);
-
-		if (start > mapping->bits) {
-			spin_unlock_irqrestore(&mapping->lock, flags);
-			return DMA_ERROR_CODE;
-		}
-
-		bitmap_set(mapping->bitmaps[i], start, count);
-	}
-	spin_unlock_irqrestore(&mapping->lock, flags);
-
-	iova = mapping->base + (mapping_size * i);
-	iova += start << PAGE_SHIFT;
-
-	return iova;
-}
-
-static inline void __free_iova(struct dma_iommu_mapping *mapping,
-			       dma_addr_t addr, size_t size)
-{
-	unsigned int start, count;
-	size_t mapping_size = mapping->bits << PAGE_SHIFT;
-	unsigned long flags;
-	dma_addr_t bitmap_base;
-	u32 bitmap_index;
-
-	if (!size)
-		return;
-
-	bitmap_index = (u32) (addr - mapping->base) / (u32) mapping_size;
-	BUG_ON(addr < mapping->base || bitmap_index > mapping->extensions);
-
-	bitmap_base = mapping->base + mapping_size * bitmap_index;
-
-	start = (addr - bitmap_base) >>	PAGE_SHIFT;
-
-	if (addr + size > bitmap_base + mapping_size) {
-		/*
-		 * The address range to be freed reaches into the iova
-		 * range of the next bitmap. This should not happen as
-		 * we don't allow this in __alloc_iova (at the
-		 * moment).
-		 */
-		BUG();
-	} else
-		count = size >> PAGE_SHIFT;
-
-	spin_lock_irqsave(&mapping->lock, flags);
-	bitmap_clear(mapping->bitmaps[bitmap_index], start, count);
-	spin_unlock_irqrestore(&mapping->lock, flags);
-}
-
-static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
-					  gfp_t gfp, struct dma_attrs *attrs)
-{
-	struct page **pages;
-	int count = size >> PAGE_SHIFT;
-	int array_size = count * sizeof(struct page *);
-	int i = 0;
-
-	if (array_size <= PAGE_SIZE)
-		pages = kzalloc(array_size, GFP_KERNEL);
-	else
-		pages = vzalloc(array_size);
-	if (!pages)
-		return NULL;
-
-	if (dma_get_attr(DMA_ATTR_FORCE_CONTIGUOUS, attrs))
-	{
-		unsigned long order = get_order(size);
-		struct page *page;
-
-		page = dma_alloc_from_contiguous(dev, count, order);
-		if (!page)
-			goto error;
-
-		__dma_clear_buffer(page, size);
-
-		for (i = 0; i < count; i++)
-			pages[i] = page + i;
-
-		return pages;
-	}
-
-	/*
-	 * IOMMU can map any pages, so himem can also be used here
-	 */
-	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
-
-	while (count) {
-		int j, order;
-
-		for (order = __fls(count); order > 0; --order) {
-			/*
-			 * We do not want OOM killer to be invoked as long
-			 * as we can fall back to single pages, so we force
-			 * __GFP_NORETRY for orders higher than zero.
-			 */
-			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
-			if (pages[i])
-				break;
-		}
-
-		if (!pages[i]) {
-			/*
-			 * Fall back to single page allocation.
-			 * Might invoke OOM killer as last resort.
-			 */
-			pages[i] = alloc_pages(gfp, 0);
-			if (!pages[i])
-				goto error;
-		}
-
-		if (order) {
-			split_page(pages[i], order);
-			j = 1 << order;
-			while (--j)
-				pages[i + j] = pages[i] + j;
-		}
-
-		__dma_clear_buffer(pages[i], PAGE_SIZE << order);
-		i += 1 << order;
-		count -= 1 << order;
-	}
-
-	return pages;
-error:
-	while (i--)
-		if (pages[i])
-			__free_pages(pages[i], 0);
-	kvfree(pages);
-	return NULL;
-}
-
-static int __iommu_free_buffer(struct device *dev, struct page **pages,
-			       size_t size, struct dma_attrs *attrs)
-{
-	int count = size >> PAGE_SHIFT;
-	int i;
-
-	if (dma_get_attr(DMA_ATTR_FORCE_CONTIGUOUS, attrs)) {
-		dma_release_from_contiguous(dev, pages[0], count);
-	} else {
-		for (i = 0; i < count; i++)
-			if (pages[i])
-				__free_pages(pages[i], 0);
-	}
-
-	kvfree(pages);
-	return 0;
-}
-
-/*
- * Create a CPU mapping for a specified pages
- */
-static void *
-__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot,
-		    const void *caller)
-{
-	return dma_common_pages_remap(pages, size,
-			VM_ARM_DMA_CONSISTENT | VM_USERMAP, prot, caller);
-}
-
-/*
- * Create a mapping in device IO address space for specified pages
- */
-static dma_addr_t
-__iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	dma_addr_t dma_addr, iova;
-	int i;
-
-	dma_addr = __alloc_iova(mapping, size);
-	if (dma_addr == DMA_ERROR_CODE)
-		return dma_addr;
-
-	iova = dma_addr;
-	for (i = 0; i < count; ) {
-		int ret;
-
-		unsigned int next_pfn = page_to_pfn(pages[i]) + 1;
-		phys_addr_t phys = page_to_phys(pages[i]);
-		unsigned int len, j;
-
-		for (j = i + 1; j < count; j++, next_pfn++)
-			if (page_to_pfn(pages[j]) != next_pfn)
-				break;
-
-		len = (j - i) << PAGE_SHIFT;
-		ret = iommu_map(mapping->domain, iova, phys, len,
-				IOMMU_READ|IOMMU_WRITE);
-		if (ret < 0)
-			goto fail;
-		iova += len;
-		i = j;
-	}
-	return dma_addr;
-fail:
-	iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
-	__free_iova(mapping, dma_addr, size);
-	return DMA_ERROR_CODE;
-}
-
-static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-	/*
-	 * add optional in-page offset from iova to size and align
-	 * result to page size
-	 */
-	size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
-	iova &= PAGE_MASK;
-
-	iommu_unmap(mapping->domain, iova, size);
-	__free_iova(mapping, iova, size);
-	return 0;
-}
-
-static struct page **__atomic_get_pages(void *addr)
-{
-	struct page *page;
-	phys_addr_t phys;
-
-	phys = gen_pool_virt_to_phys(atomic_pool, (unsigned long)addr);
-	page = phys_to_page(phys);
-
-	return (struct page **)page;
-}
-
-static struct page **__iommu_get_pages(void *cpu_addr, struct dma_attrs *attrs)
-{
-	struct vm_struct *area;
-
-	if (__in_atomic_pool(cpu_addr, PAGE_SIZE))
-		return __atomic_get_pages(cpu_addr);
-
-	if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
-		return cpu_addr;
-
-	area = find_vm_area(cpu_addr);
-	if (area && (area->flags & VM_ARM_DMA_CONSISTENT))
-		return area->pages;
-	return NULL;
-}
-
-static void *__iommu_alloc_atomic(struct device *dev, size_t size,
-				  dma_addr_t *handle)
-{
-	struct page *page;
-	void *addr;
-
-	addr = __alloc_from_pool(size, &page);
-	if (!addr)
-		return NULL;
-
-	*handle = __iommu_create_mapping(dev, &page, size);
-	if (*handle == DMA_ERROR_CODE)
-		goto err_mapping;
-
-	return addr;
-
-err_mapping:
-	__free_from_pool(addr, size);
-	return NULL;
-}
-
-static void __iommu_free_atomic(struct device *dev, void *cpu_addr,
-				dma_addr_t handle, size_t size)
-{
-	__iommu_remove_mapping(dev, handle, size);
-	__free_from_pool(cpu_addr, size);
-}
-
-static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
-	    dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
-{
-	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
-	struct page **pages;
-	void *addr = NULL;
-
-	*handle = DMA_ERROR_CODE;
-	size = PAGE_ALIGN(size);
-
-	if (!gfpflags_allow_blocking(gfp))
-		return __iommu_alloc_atomic(dev, size, handle);
-
-	/*
-	 * Following is a work-around (a.k.a. hack) to prevent pages
-	 * with __GFP_COMP being passed to split_page() which cannot
-	 * handle them.  The real problem is that this flag probably
-	 * should be 0 on ARM as it is not supported on this
-	 * platform; see CONFIG_HUGETLBFS.
-	 */
-	gfp &= ~(__GFP_COMP);
-
-	pages = __iommu_alloc_buffer(dev, size, gfp, attrs);
-	if (!pages)
-		return NULL;
-
-	*handle = __iommu_create_mapping(dev, pages, size);
-	if (*handle == DMA_ERROR_CODE)
-		goto err_buffer;
-
-	if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
-		return pages;
-
-	addr = __iommu_alloc_remap(pages, size, gfp, prot,
-				   __builtin_return_address(0));
-	if (!addr)
-		goto err_mapping;
-
-	return addr;
-
-err_mapping:
-	__iommu_remove_mapping(dev, *handle, size);
-err_buffer:
-	__iommu_free_buffer(dev, pages, size, attrs);
-	return NULL;
-}
-
-static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
-		    void *cpu_addr, dma_addr_t dma_addr, size_t size,
-		    struct dma_attrs *attrs)
-{
-	unsigned long uaddr = vma->vm_start;
-	unsigned long usize = vma->vm_end - vma->vm_start;
-	struct page **pages = __iommu_get_pages(cpu_addr, attrs);
-	unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	unsigned long off = vma->vm_pgoff;
-
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
-
-	if (!pages)
-		return -ENXIO;
-
-	if (off >= nr_pages || (usize >> PAGE_SHIFT) > nr_pages - off)
-		return -ENXIO;
-
-	pages += off;
-
-	do {
-		int ret = vm_insert_page(vma, uaddr, *pages++);
-		if (ret) {
-			pr_err("Remapping memory failed: %d\n", ret);
-			return ret;
-		}
-		uaddr += PAGE_SIZE;
-		usize -= PAGE_SIZE;
-	} while (usize > 0);
-
-	return 0;
-}
-
-/*
- * free a page as defined by the above mapping.
- * Must not be called with IRQs disabled.
- */
-void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-			  dma_addr_t handle, struct dma_attrs *attrs)
-{
-	struct page **pages;
-	size = PAGE_ALIGN(size);
-
-	if (__in_atomic_pool(cpu_addr, size)) {
-		__iommu_free_atomic(dev, cpu_addr, handle, size);
-		return;
-	}
-
-	pages = __iommu_get_pages(cpu_addr, attrs);
-	if (!pages) {
-		WARN(1, "trying to free invalid coherent area: %p\n", cpu_addr);
-		return;
-	}
-
-	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs)) {
-		dma_common_free_remap(cpu_addr, size,
-			VM_ARM_DMA_CONSISTENT | VM_USERMAP);
-	}
-
-	__iommu_remove_mapping(dev, handle, size);
-	__iommu_free_buffer(dev, pages, size, attrs);
-}
-
-static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
-				 void *cpu_addr, dma_addr_t dma_addr,
-				 size_t size, struct dma_attrs *attrs)
-{
-	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	struct page **pages = __iommu_get_pages(cpu_addr, attrs);
-
-	if (!pages)
-		return -ENXIO;
-
-	return sg_alloc_table_from_pages(sgt, pages, count, 0, size,
-					 GFP_KERNEL);
-}
-
-static int __dma_direction_to_prot(enum dma_data_direction dir)
-{
-	int prot;
-
-	switch (dir) {
-	case DMA_BIDIRECTIONAL:
-		prot = IOMMU_READ | IOMMU_WRITE;
-		break;
-	case DMA_TO_DEVICE:
-		prot = IOMMU_READ;
-		break;
-	case DMA_FROM_DEVICE:
-		prot = IOMMU_WRITE;
-		break;
-	default:
-		prot = 0;
-	}
-
-	return prot;
-}
-
-/*
- * Map a part of the scatter-gather list into contiguous io address space
- */
-static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
-			  size_t size, dma_addr_t *handle,
-			  enum dma_data_direction dir, struct dma_attrs *attrs,
-			  bool is_coherent)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova, iova_base;
-	int ret = 0;
-	unsigned int count;
-	struct scatterlist *s;
-	int prot;
-
-	size = PAGE_ALIGN(size);
-	*handle = DMA_ERROR_CODE;
-
-	iova_base = iova = __alloc_iova(mapping, size);
-	if (iova == DMA_ERROR_CODE)
-		return -ENOMEM;
-
-	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
-		phys_addr_t phys = page_to_phys(sg_page(s));
-		unsigned int len = PAGE_ALIGN(s->offset + s->length);
-
-		if (!is_coherent &&
-			!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-			__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
-
-		prot = __dma_direction_to_prot(dir);
-
-		ret = iommu_map(mapping->domain, iova, phys, len, prot);
-		if (ret < 0)
-			goto fail;
-		count += len >> PAGE_SHIFT;
-		iova += len;
-	}
-	*handle = iova_base;
-
-	return 0;
-fail:
-	iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
-	__free_iova(mapping, iova_base, size);
-	return ret;
-}
-
-static int __iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
-		     enum dma_data_direction dir, struct dma_attrs *attrs,
-		     bool is_coherent)
-{
-	struct scatterlist *s = sg, *dma = sg, *start = sg;
-	int i, count = 0;
-	unsigned int offset = s->offset;
-	unsigned int size = s->offset + s->length;
-	unsigned int max = dma_get_max_seg_size(dev);
-
-	for (i = 1; i < nents; i++) {
-		s = sg_next(s);
-
-		s->dma_address = DMA_ERROR_CODE;
-		s->dma_length = 0;
-
-		if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) {
-			if (__map_sg_chunk(dev, start, size, &dma->dma_address,
-			    dir, attrs, is_coherent) < 0)
-				goto bad_mapping;
-
-			dma->dma_address += offset;
-			dma->dma_length = size - offset;
-
-			size = offset = s->offset;
-			start = s;
-			dma = sg_next(dma);
-			count += 1;
-		}
-		size += s->length;
-	}
-	if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir, attrs,
-		is_coherent) < 0)
-		goto bad_mapping;
-
-	dma->dma_address += offset;
-	dma->dma_length = size - offset;
-
-	return count+1;
-
-bad_mapping:
-	for_each_sg(sg, s, count, i)
-		__iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s));
-	return 0;
-}
-
-/**
- * arm_coherent_iommu_map_sg - map a set of SG buffers for streaming mode DMA
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map
- * @dir: DMA transfer direction
- *
- * Map a set of i/o coherent buffers described by scatterlist in streaming
- * mode for DMA. The scatter gather list elements are merged together (if
- * possible) and tagged with the appropriate dma address and length. They are
- * obtained via sg_dma_{address,length}.
- */
-int arm_coherent_iommu_map_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	return __iommu_map_sg(dev, sg, nents, dir, attrs, true);
-}
-
-/**
- * arm_iommu_map_sg - map a set of SG buffers for streaming mode DMA
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map
- * @dir: DMA transfer direction
- *
- * Map a set of buffers described by scatterlist in streaming mode for DMA.
- * The scatter gather list elements are merged together (if possible) and
- * tagged with the appropriate dma address and length. They are obtained via
- * sg_dma_{address,length}.
- */
-int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	return __iommu_map_sg(dev, sg, nents, dir, attrs, false);
-}
-
-static void __iommu_unmap_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs,
-		bool is_coherent)
-{
-	struct scatterlist *s;
-	int i;
-
-	for_each_sg(sg, s, nents, i) {
-		if (sg_dma_len(s))
-			__iommu_remove_mapping(dev, sg_dma_address(s),
-					       sg_dma_len(s));
-		if (!is_coherent &&
-		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-			__dma_page_dev_to_cpu(sg_page(s), s->offset,
-					      s->length, dir);
-	}
-}
-
-/**
- * arm_coherent_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- *
- * Unmap a set of streaming mode DMA translations.  Again, CPU access
- * rules concerning calls here are the same as for dma_unmap_single().
- */
-void arm_coherent_iommu_unmap_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	__iommu_unmap_sg(dev, sg, nents, dir, attrs, true);
-}
-
-/**
- * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- *
- * Unmap a set of streaming mode DMA translations.  Again, CPU access
- * rules concerning calls here are the same as for dma_unmap_single().
- */
-void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
-			enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	__iommu_unmap_sg(dev, sg, nents, dir, attrs, false);
-}
-
-/**
- * arm_iommu_sync_sg_for_cpu
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map (returned from dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- */
-void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
-			int nents, enum dma_data_direction dir)
-{
-	struct scatterlist *s;
-	int i;
-
-	for_each_sg(sg, s, nents, i)
-		__dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
-
-}
-
-/**
- * arm_iommu_sync_sg_for_device
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map (returned from dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- */
-void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
-			int nents, enum dma_data_direction dir)
-{
-	struct scatterlist *s;
-	int i;
-
-	for_each_sg(sg, s, nents, i)
-		__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
-}
-
-
-/**
- * arm_coherent_iommu_map_page
- * @dev: valid struct device pointer
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Coherent IOMMU aware version of arm_dma_map_page()
- */
-static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir,
-	     struct dma_attrs *attrs)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t dma_addr;
-	int ret, prot, len = PAGE_ALIGN(size + offset);
-
-	dma_addr = __alloc_iova(mapping, len);
-	if (dma_addr == DMA_ERROR_CODE)
-		return dma_addr;
-
-	prot = __dma_direction_to_prot(dir);
-
-	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
-	if (ret < 0)
-		goto fail;
-
-	return dma_addr + offset;
-fail:
-	__free_iova(mapping, dma_addr, len);
-	return DMA_ERROR_CODE;
-}
-
-/**
- * arm_iommu_map_page
- * @dev: valid struct device pointer
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * IOMMU aware version of arm_dma_map_page()
- */
-static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir,
-	     struct dma_attrs *attrs)
-{
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__dma_page_cpu_to_dev(page, offset, size, dir);
-
-	return arm_coherent_iommu_map_page(dev, page, offset, size, dir, attrs);
-}
-
-/**
- * arm_coherent_iommu_unmap_page
- * @dev: valid struct device pointer
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
- *
- * Coherent IOMMU aware version of arm_dma_unmap_page()
- */
-static void arm_coherent_iommu_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir,
-		struct dma_attrs *attrs)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	int offset = handle & ~PAGE_MASK;
-	int len = PAGE_ALIGN(size + offset);
-
-	if (!iova)
-		return;
-
-	iommu_unmap(mapping->domain, iova, len);
-	__free_iova(mapping, iova, len);
-}
-
-/**
- * arm_iommu_unmap_page
- * @dev: valid struct device pointer
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
- *
- * IOMMU aware version of arm_dma_unmap_page()
- */
-static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir,
-		struct dma_attrs *attrs)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	int offset = handle & ~PAGE_MASK;
-	int len = PAGE_ALIGN(size + offset);
-
-	if (!iova)
-		return;
-
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__dma_page_dev_to_cpu(page, offset, size, dir);
-
-	iommu_unmap(mapping->domain, iova, len);
-	__free_iova(mapping, iova, len);
-}
-
-static void arm_iommu_sync_single_for_cpu(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	unsigned int offset = handle & ~PAGE_MASK;
-
-	if (!iova)
-		return;
-
-	__dma_page_dev_to_cpu(page, offset, size, dir);
-}
-
-static void arm_iommu_sync_single_for_device(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	unsigned int offset = handle & ~PAGE_MASK;
-
-	if (!iova)
-		return;
-
-	__dma_page_cpu_to_dev(page, offset, size, dir);
-}
-
-struct dma_map_ops iommu_ops = {
-	.alloc		= arm_iommu_alloc_attrs,
-	.free		= arm_iommu_free_attrs,
-	.mmap		= arm_iommu_mmap_attrs,
-	.get_sgtable	= arm_iommu_get_sgtable,
-
-	.map_page		= arm_iommu_map_page,
-	.unmap_page		= arm_iommu_unmap_page,
-	.sync_single_for_cpu	= arm_iommu_sync_single_for_cpu,
-	.sync_single_for_device	= arm_iommu_sync_single_for_device,
-
-	.map_sg			= arm_iommu_map_sg,
-	.unmap_sg		= arm_iommu_unmap_sg,
-	.sync_sg_for_cpu	= arm_iommu_sync_sg_for_cpu,
-	.sync_sg_for_device	= arm_iommu_sync_sg_for_device,
-
-	.set_dma_mask		= arm_dma_set_mask,
-};
-
-struct dma_map_ops iommu_coherent_ops = {
-	.alloc		= arm_iommu_alloc_attrs,
-	.free		= arm_iommu_free_attrs,
-	.mmap		= arm_iommu_mmap_attrs,
-	.get_sgtable	= arm_iommu_get_sgtable,
-
-	.map_page	= arm_coherent_iommu_map_page,
-	.unmap_page	= arm_coherent_iommu_unmap_page,
-
-	.map_sg		= arm_coherent_iommu_map_sg,
-	.unmap_sg	= arm_coherent_iommu_unmap_sg,
-
-	.set_dma_mask	= arm_dma_set_mask,
-};
-
-/**
- * arm_iommu_create_mapping
- * @bus: pointer to the bus holding the client device (for IOMMU calls)
- * @base: start address of the valid IO address space
- * @size: maximum size of the valid IO address space
- *
- * Creates a mapping structure which holds information about used/unused
- * IO address ranges, which is required to perform memory allocation and
- * mapping with IOMMU aware functions.
- *
- * The client device need to be attached to the mapping with
- * arm_iommu_attach_device function.
- */
-struct dma_iommu_mapping *
-arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size)
-{
-	unsigned int bits = size >> PAGE_SHIFT;
-	unsigned int bitmap_size = BITS_TO_LONGS(bits) * sizeof(long);
-	struct dma_iommu_mapping *mapping;
-	int extensions = 1;
-	int err = -ENOMEM;
-
-	/* currently only 32-bit DMA address space is supported */
-	if (size > DMA_BIT_MASK(32) + 1)
-		return ERR_PTR(-ERANGE);
-
-	if (!bitmap_size)
-		return ERR_PTR(-EINVAL);
-
-	if (bitmap_size > PAGE_SIZE) {
-		extensions = bitmap_size / PAGE_SIZE;
-		bitmap_size = PAGE_SIZE;
-	}
-
-	mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL);
-	if (!mapping)
-		goto err;
-
-	mapping->bitmap_size = bitmap_size;
-	mapping->bitmaps = kzalloc(extensions * sizeof(unsigned long *),
-				GFP_KERNEL);
-	if (!mapping->bitmaps)
-		goto err2;
-
-	mapping->bitmaps[0] = kzalloc(bitmap_size, GFP_KERNEL);
-	if (!mapping->bitmaps[0])
-		goto err3;
-
-	mapping->nr_bitmaps = 1;
-	mapping->extensions = extensions;
-	mapping->base = base;
-	mapping->bits = BITS_PER_BYTE * bitmap_size;
-
-	spin_lock_init(&mapping->lock);
-
-	mapping->domain = iommu_domain_alloc(bus);
-	if (!mapping->domain)
-		goto err4;
-
-	kref_init(&mapping->kref);
-	return mapping;
-err4:
-	kfree(mapping->bitmaps[0]);
-err3:
-	kfree(mapping->bitmaps);
-err2:
-	kfree(mapping);
-err:
-	return ERR_PTR(err);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_create_mapping);
-
-static void release_iommu_mapping(struct kref *kref)
-{
-	int i;
-	struct dma_iommu_mapping *mapping =
-		container_of(kref, struct dma_iommu_mapping, kref);
-
-	iommu_domain_free(mapping->domain);
-	for (i = 0; i < mapping->nr_bitmaps; i++)
-		kfree(mapping->bitmaps[i]);
-	kfree(mapping->bitmaps);
-	kfree(mapping);
-}
-
-static int extend_iommu_mapping(struct dma_iommu_mapping *mapping)
-{
-	int next_bitmap;
-
-	if (mapping->nr_bitmaps >= mapping->extensions)
-		return -EINVAL;
-
-	next_bitmap = mapping->nr_bitmaps;
-	mapping->bitmaps[next_bitmap] = kzalloc(mapping->bitmap_size,
-						GFP_ATOMIC);
-	if (!mapping->bitmaps[next_bitmap])
-		return -ENOMEM;
-
-	mapping->nr_bitmaps++;
-
-	return 0;
-}
-
-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
-{
-	if (mapping)
-		kref_put(&mapping->kref, release_iommu_mapping);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_release_mapping);
-
-static int __arm_iommu_attach_device(struct device *dev,
-				     struct dma_iommu_mapping *mapping)
-{
-	int err;
-
-	err = iommu_attach_device(mapping->domain, dev);
-	if (err)
-		return err;
-
-	kref_get(&mapping->kref);
-	to_dma_iommu_mapping(dev) = mapping;
-
-	pr_debug("Attached IOMMU controller to %s device.\n", dev_name(dev));
-	return 0;
-}
-
-/**
- * arm_iommu_attach_device
- * @dev: valid struct device pointer
- * @mapping: io address space mapping structure (returned from
- *	arm_iommu_create_mapping)
- *
- * Attaches specified io address space mapping to the provided device.
- * This replaces the dma operations (dma_map_ops pointer) with the
- * IOMMU aware version.
- *
- * More than one client might be attached to the same io address space
- * mapping.
- */
-int arm_iommu_attach_device(struct device *dev,
-			    struct dma_iommu_mapping *mapping)
-{
-	int err;
-
-	err = __arm_iommu_attach_device(dev, mapping);
-	if (err)
-		return err;
-
-	set_dma_ops(dev, &iommu_ops);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(arm_iommu_attach_device);
-
-static void __arm_iommu_detach_device(struct device *dev)
-{
-	struct dma_iommu_mapping *mapping;
-
-	mapping = to_dma_iommu_mapping(dev);
-	if (!mapping) {
-		dev_warn(dev, "Not attached\n");
-		return;
-	}
-
-	iommu_detach_device(mapping->domain, dev);
-	kref_put(&mapping->kref, release_iommu_mapping);
-	to_dma_iommu_mapping(dev) = NULL;
-
-	pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
-}
-
-/**
- * arm_iommu_detach_device
- * @dev: valid struct device pointer
- *
- * Detaches the provided device from a previously attached map.
- * This voids the dma operations (dma_map_ops pointer)
- */
-void arm_iommu_detach_device(struct device *dev)
-{
-	__arm_iommu_detach_device(dev);
-	set_dma_ops(dev, NULL);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_detach_device);
-
-static struct dma_map_ops *arm_get_iommu_dma_map_ops(bool coherent)
-{
-	return coherent ? &iommu_coherent_ops : &iommu_ops;
-}
-
-static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    struct iommu_ops *iommu)
-{
-	struct dma_iommu_mapping *mapping;
-
-	if (!iommu)
-		return false;
-
-	mapping = arm_iommu_create_mapping(dev->bus, dma_base, size);
-	if (IS_ERR(mapping)) {
-		pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
-				size, dev_name(dev));
-		return false;
-	}
-
-	if (__arm_iommu_attach_device(dev, mapping)) {
-		pr_warn("Failed to attached device %s to IOMMU_mapping\n",
-				dev_name(dev));
-		arm_iommu_release_mapping(mapping);
-		return false;
-	}
-
-	return true;
-}
-
-static void arm_teardown_iommu_dma_ops(struct device *dev)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-	if (!mapping)
-		return;
-
-	__arm_iommu_detach_device(dev);
-	arm_iommu_release_mapping(mapping);
-}
-
-#else
-
-static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    struct iommu_ops *iommu)
-{
-	return false;
-}
-
-static void arm_teardown_iommu_dma_ops(struct device *dev) { }
-
-#define arm_get_iommu_dma_map_ops arm_get_dma_map_ops
-
-#endif	/* CONFIG_ARM_DMA_USE_IOMMU */
-
 static struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
@@ -2123,18 +1010,13 @@ static struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			struct iommu_ops *iommu, bool coherent)
 {
-	struct dma_map_ops *dma_ops;
-
 	dev->archdata.dma_coherent = coherent;
-	if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
-		dma_ops = arm_get_iommu_dma_map_ops(coherent);
-	else
-		dma_ops = arm_get_dma_map_ops(coherent);
 
-	set_dma_ops(dev, dma_ops);
+	if (!common_iommu_setup_dma_ops(dev, dma_base, size, iommu))
+		arch_set_dma_ops(dev, arm_get_dma_map_ops(coherent));
 }
 
 void arch_teardown_dma_ops(struct device *dev)
 {
-	arm_teardown_iommu_dma_ops(dev);
+	common_iommu_teardown_dma_ops(dev);
 }
diff --git a/drivers/gpu/drm/rockchip/Kconfig b/drivers/gpu/drm/rockchip/Kconfig
index 85739859dffc..7bdb5cf64ba3 100644
--- a/drivers/gpu/drm/rockchip/Kconfig
+++ b/drivers/gpu/drm/rockchip/Kconfig
@@ -1,5 +1,6 @@
 config DRM_ROCKCHIP
 	tristate "DRM Support for Rockchip"
+	depends on BROKEN
 	depends on DRM && ROCKCHIP_IOMMU
 	depends on RESET_CONTROLLER
 	select DRM_KMS_HELPER
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1674de1cfed0..8a99210f1cbc 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -265,6 +265,7 @@ config EXYNOS_IOMMU_DEBUG
 
 config IPMMU_VMSA
 	bool "Renesas VMSA-compatible IPMMU"
+	depends on BROKEN
 	depends on ARM_LPAE
 	depends on ARCH_SHMOBILE || COMPILE_TEST
 	select IOMMU_API
diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 8b89ebe16d94..fb8bb372e489 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -88,6 +88,7 @@ config VIDEO_OMAP3
 	depends on VIDEO_V4L2 && I2C && VIDEO_V4L2_SUBDEV_API && ARCH_OMAP3
 	depends on HAS_DMA && OF
 	depends on OMAP_IOMMU
+	depends on BROKEN
 	select ARM_DMA_USE_IOMMU
 	select VIDEOBUF2_DMA_CONTIG
 	select MFD_SYSCON
-- 
1.9.2


* [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
@ 2016-02-19  8:22   ` Marek Szyprowski
From: Marek Szyprowski @ 2016-02-19  8:22 UTC (permalink / raw)
  To: iommu, linux-arm-kernel, linux-kernel
  Cc: Krzysztof Kozlowski, Russell King - ARM Linux, Arnd Bergmann,
	Bartlomiej Zolnierkiewicz, Catalin Marinas, Will Deacon,
	dri-devel, Tomasz Figa, linaro-mm-sig, Sakari Ailus,
	Laurent Pinchart, Robin Murphy, Marek Szyprowski

This patch replaces the ARM-specific IOMMU-based DMA-mapping implementation
with the generic IOMMU DMA-mapping code shared with the ARM64 architecture.
A side effect of this change is a switch from bitmap-based IO address space
management to tree-based code. There should be no functional changes for
drivers that rely on initialization from the generic arch_setup_dma_ops()
interface. Code that used the old arm_iommu_* functions must be updated to
the new interface.

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
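Note for readers unfamiliar with the removed allocator: the sketch below is
a simplified user-space model of the bitmap-style first-fit search that the
old __alloc_iova() performed. It is not the kernel code. All sketch_* names,
the 64-page window and the one-byte-per-page "bitmap" are assumptions made
for illustration only; locking and the multi-bitmap extension logic are left
out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12u          /* assumed 4 KiB pages */
#define SKETCH_NR_PAGES   64u          /* hypothetical 256 KiB IO window */
#define SKETCH_IOVA_ERROR UINT32_MAX   /* stands in for DMA_ERROR_CODE */

/* One byte per page instead of a packed bitmap, to keep the model short. */
struct sketch_iova_space {
	uint32_t base;
	unsigned char used[SKETCH_NR_PAGES];
};

static void sketch_init(struct sketch_iova_space *s, uint32_t base)
{
	s->base = base;
	memset(s->used, 0, sizeof(s->used));
}

/*
 * First-fit search for 'count' free pages. Candidate start positions are
 * multiples of (align_mask + 1) pages; align_mask must be 2^n - 1.
 */
static uint32_t sketch_alloc(struct sketch_iova_space *s, unsigned int count,
			     unsigned int align_mask)
{
	unsigned int start, i;

	assert(((align_mask + 1) & align_mask) == 0);

	if (count == 0 || count > SKETCH_NR_PAGES)
		return SKETCH_IOVA_ERROR;

	for (start = 0; start + count <= SKETCH_NR_PAGES;
	     start += align_mask + 1) {
		for (i = 0; i < count; i++)
			if (s->used[start + i])
				break;
		if (i == count) {
			/* Whole range is free: mark it used and return it. */
			memset(&s->used[start], 1, count);
			return s->base + ((uint32_t)start << SKETCH_PAGE_SHIFT);
		}
	}

	return SKETCH_IOVA_ERROR;
}

/* Release 'count' pages starting at 'iova'; out-of-range requests are ignored. */
static void sketch_free(struct sketch_iova_space *s, uint32_t iova,
			unsigned int count)
{
	unsigned int start;

	if (iova < s->base)
		return;

	start = (iova - s->base) >> SKETCH_PAGE_SHIFT;
	if (start >= SKETCH_NR_PAGES || count > SKETCH_NR_PAGES - start)
		return;

	memset(&s->used[start], 0, count);
}
```

Every allocation in this model scans linearly from the start of the window,
and the window size is fixed when the space is created. The tree-based code
that this patch switches to tracks allocated ranges in a tree instead of
keeping one bit per page.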
 arch/arm/Kconfig                   |   22 +-
 arch/arm/include/asm/device.h      |    9 -
 arch/arm/include/asm/dma-iommu.h   |   37 --
 arch/arm/include/asm/dma-mapping.h |   59 +-
 arch/arm/mm/dma-mapping.c          | 1158 +-----------------------------------
 drivers/gpu/drm/rockchip/Kconfig   |    1 +
 drivers/iommu/Kconfig              |    1 +
 drivers/media/platform/Kconfig     |    1 +
 8 files changed, 82 insertions(+), 1206 deletions(-)
 delete mode 100644 arch/arm/include/asm/dma-iommu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4f799e567fc8..ed45f0d63cee 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -107,27 +107,7 @@ config ARM_DMA_USE_IOMMU
 	bool
 	select ARM_HAS_SG_CHAIN
 	select NEED_SG_DMA_LENGTH
-
-if ARM_DMA_USE_IOMMU
-
-config ARM_DMA_IOMMU_ALIGNMENT
-	int "Maximum PAGE_SIZE order of alignment for DMA IOMMU buffers"
-	range 4 9
-	default 8
-	help
-	  DMA mapping framework by default aligns all buffers to the smallest
-	  PAGE_SIZE order which is greater than or equal to the requested buffer
-	  size. This works well for buffers up to a few hundreds kilobytes, but
-	  for larger buffers it just a waste of address space. Drivers which has
-	  relatively small addressing window (like 64Mib) might run out of
-	  virtual space with just a few allocations.
-
-	  With this parameter you can specify the maximum PAGE_SIZE order for
-	  DMA IOMMU buffers. Larger buffers will be aligned only to this
-	  specified order. The order is expressed as a power of two multiplied
-	  by the PAGE_SIZE.
-
-endif
+	select IOMMU_DMA
 
 config MIGHT_HAVE_PCI
 	bool
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 4111592f0130..6ea939c39cd4 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,9 +14,6 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu; /* private IOMMU data */
 #endif
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-	struct dma_iommu_mapping	*mapping;
-#endif
 	bool dma_coherent;
 };
 
@@ -28,10 +25,4 @@ struct pdev_archdata {
 #endif
 };
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-#define to_dma_iommu_mapping(dev) ((dev)->archdata.mapping)
-#else
-#define to_dma_iommu_mapping(dev) NULL
-#endif
-
 #endif
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
deleted file mode 100644
index 2ef282f96651..000000000000
--- a/arch/arm/include/asm/dma-iommu.h
+++ /dev/null
@@ -1,37 +0,0 @@
-#ifndef ASMARM_DMA_IOMMU_H
-#define ASMARM_DMA_IOMMU_H
-
-#ifdef __KERNEL__
-
-#include <linux/mm_types.h>
-#include <linux/scatterlist.h>
-#include <linux/dma-debug.h>
-#include <linux/kmemcheck.h>
-#include <linux/kref.h>
-
-struct dma_iommu_mapping {
-	/* iommu specific data */
-	struct iommu_domain	*domain;
-
-	unsigned long		**bitmaps;	/* array of bitmaps */
-	unsigned int		nr_bitmaps;	/* nr of elements in array */
-	unsigned int		extensions;
-	size_t			bitmap_size;	/* size of a single bitmap */
-	size_t			bits;		/* per bitmap */
-	dma_addr_t		base;
-
-	spinlock_t		lock;
-	struct kref		kref;
-};
-
-struct dma_iommu_mapping *
-arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size);
-
-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);
-
-int arm_iommu_attach_device(struct device *dev,
-					struct dma_iommu_mapping *mapping);
-void arm_iommu_detach_device(struct device *dev);
-
-#endif /* __KERNEL__ */
-#endif
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 6ad1ceda62a5..08bedb0c02c6 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -8,6 +8,7 @@
 #include <linux/dma-attrs.h>
 #include <linux/dma-debug.h>
 
+#include <asm/cacheflush.h>
 #include <asm/memory.h>
 
 #include <xen/xen.h>
@@ -32,7 +33,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 		return __generic_dma_ops(dev);
 }
 
-static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
+static inline void arch_set_dma_ops(struct device *dev, struct dma_map_ops *ops)
 {
 	BUG_ON(!dev);
 	dev->archdata.dma_ops = ops;
@@ -275,5 +276,61 @@ extern int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		struct dma_attrs *attrs);
 
+/*
+ * The DMA API is built upon the notion of "buffer ownership".  A buffer
+ * is either exclusively owned by the CPU (and therefore may be accessed
+ * by it) or exclusively owned by the DMA device.  These helper functions
+ * represent the transitions between these two ownership states.
+ *
+ * Note, however, that on later ARMs, this notion does not work due to
+ * speculative prefetches.  We model our approach on the assumption that
+ * the CPU does do speculative prefetches, which means we clean caches
+ * before transfers and delay cache invalidation until transfer completion.
+ *
+ */
+extern void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t,
+				  enum dma_data_direction);
+extern void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t,
+				  enum dma_data_direction);
+
+static inline void arch_flush_page(struct device *dev, const void *virt,
+			    phys_addr_t phys)
+{
+	dmac_flush_range(virt, virt + PAGE_SIZE);
+	outer_flush_range(phys, phys + PAGE_SIZE);
+}
+
+static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
+				     enum dma_data_direction dir)
+{
+	unsigned int offset = phys & ~PAGE_MASK;
+	__dma_page_cpu_to_dev(phys_to_page(phys & PAGE_MASK), offset, size, dir);
+}
+
+static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
+				       enum dma_data_direction dir)
+{
+	unsigned int offset = phys & ~PAGE_MASK;
+	__dma_page_dev_to_cpu(phys_to_page(phys & PAGE_MASK), offset, size, dir);
+}
+
+static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
+					pgprot_t prot, bool coherent)
+{
+	if (coherent)
+		return prot;
+
+	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
+			    pgprot_writecombine(prot) :
+			    pgprot_dmacoherent(prot);
+	return prot;
+}
+
+extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
+					 gfp_t flags);
+extern bool arch_in_atomic_pool(void *start, size_t size);
+extern int arch_free_from_atomic_pool(void *start, size_t size);
+
+
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0eca3812527e..5d497f3c5924 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -20,6 +20,7 @@
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
+#include <linux/dma-iommu.h>
 #include <linux/highmem.h>
 #include <linux/memblock.h>
 #include <linux/slab.h>
@@ -34,7 +35,6 @@
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 #include <asm/mach/arch.h>
-#include <asm/dma-iommu.h>
 #include <asm/mach/map.h>
 #include <asm/system_info.h>
 #include <asm/dma-contiguous.h>
@@ -42,23 +42,6 @@
 #include "dma.h"
 #include "mm.h"
 
-/*
- * The DMA API is built upon the notion of "buffer ownership".  A buffer
- * is either exclusively owned by the CPU (and therefore may be accessed
- * by it) or exclusively owned by the DMA device.  These helper functions
- * represent the transitions between these two ownership states.
- *
- * Note, however, that on later ARMs, this notion does not work due to
- * speculative prefetches.  We model our approach on the assumption that
- * the CPU does do speculative prefetches, which means we clean caches
- * before transfers and delay cache invalidation until transfer completion.
- *
- */
-static void __dma_page_cpu_to_dev(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-static void __dma_page_dev_to_cpu(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
 /**
  * arm_dma_map_page - map a portion of a page for streaming DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -492,7 +475,7 @@ static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
 	return ptr;
 }
 
-static void *__alloc_from_pool(size_t size, struct page **ret_page)
+void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page, gfp_t gfp)
 {
 	unsigned long val;
 	void *ptr = NULL;
@@ -513,14 +496,14 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page)
 	return ptr;
 }
 
-static bool __in_atomic_pool(void *start, size_t size)
+bool arch_in_atomic_pool(void *start, size_t size)
 {
 	return addr_in_gen_pool(atomic_pool, (unsigned long)start, size);
 }
 
-static int __free_from_pool(void *start, size_t size)
+int arch_free_from_atomic_pool(void *start, size_t size)
 {
-	if (!__in_atomic_pool(start, size))
+	if (!arch_in_atomic_pool(start, size))
 		return 0;
 
 	gen_pool_free(atomic_pool, (unsigned long)start, size);
@@ -574,25 +557,21 @@ static void __free_from_contiguous(struct device *dev, struct page *page,
 	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
 }
 
-static inline pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot)
-{
-	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
-			    pgprot_writecombine(prot) :
-			    pgprot_dmacoherent(prot);
-	return prot;
-}
-
 #define nommu() 0
 
+#define __alloc_from_pool(size, ret, gfp) arch_alloc_from_atomic_pool(size, ret, gfp)
+#define __free_from_pool(addr, size) arch_free_from_atomic_pool(addr, size)
+#define __get_dma_pgprot(attrs, prot, coherent) arch_get_dma_pgprot(attrs, prot, coherent)
+
 #else	/* !CONFIG_MMU */
 
 #define nommu() 1
 
-#define __get_dma_pgprot(attrs, prot)				__pgprot(0)
+#define __get_dma_pgprot(attrs, prot, coherent)				__pgprot(0)
 #define __alloc_remap_buffer(dev, size, gfp, prot, ret, c, wv)	NULL
-#define __alloc_from_pool(size, ret_page)			NULL
+#define __alloc_from_pool(size, ret_page, gfp)			NULL
 #define __alloc_from_contiguous(dev, size, prot, ret, c, wv)	NULL
-#define __free_from_pool(cpu_addr, size)			0
+#define __free_from_atomic_pool(cpu_addr, size)			0
 #define __free_from_contiguous(dev, page, cpu_addr, size, wv)	do { } while (0)
 #define __dma_free_remap(cpu_addr, size)			do { } while (0)
 
@@ -657,7 +636,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 	else if (is_coherent)
 		addr = __alloc_simple_buffer(dev, size, gfp, &page);
 	else if (!gfpflags_allow_blocking(gfp))
-		addr = __alloc_from_pool(size, &page);
+		addr = __alloc_from_pool(size, &page, gfp);
 	else
 		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page,
 					    caller, want_vaddr);
@@ -675,7 +654,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 		    gfp_t gfp, struct dma_attrs *attrs)
 {
-	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
+	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, false);
 
 	return __dma_alloc(dev, size, handle, gfp, prot, false,
 			   attrs, __builtin_return_address(0));
@@ -728,7 +707,7 @@ int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
 		 struct dma_attrs *attrs)
 {
 #ifdef CONFIG_MMU
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
+	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot, false);
 #endif	/* CONFIG_MMU */
 	return __arm_dma_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
 }
@@ -842,7 +821,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
  * platforms with CONFIG_DMABOUNCE.
  * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
  */
-static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
+void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr;
@@ -858,7 +837,7 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	/* FIXME: non-speculating: flush on bidirectional mappings? */
 }
 
-static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
+void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = page_to_phys(page) + off;
@@ -1023,1098 +1002,6 @@ static int __init dma_debug_do_init(void)
 }
 fs_initcall(dma_debug_do_init);
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-
-/* IOMMU */
-
-static int extend_iommu_mapping(struct dma_iommu_mapping *mapping);
-
-static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
-				      size_t size)
-{
-	unsigned int order = get_order(size);
-	unsigned int align = 0;
-	unsigned int count, start;
-	size_t mapping_size = mapping->bits << PAGE_SHIFT;
-	unsigned long flags;
-	dma_addr_t iova;
-	int i;
-
-	if (order > CONFIG_ARM_DMA_IOMMU_ALIGNMENT)
-		order = CONFIG_ARM_DMA_IOMMU_ALIGNMENT;
-
-	count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	align = (1 << order) - 1;
-
-	spin_lock_irqsave(&mapping->lock, flags);
-	for (i = 0; i < mapping->nr_bitmaps; i++) {
-		start = bitmap_find_next_zero_area(mapping->bitmaps[i],
-				mapping->bits, 0, count, align);
-
-		if (start > mapping->bits)
-			continue;
-
-		bitmap_set(mapping->bitmaps[i], start, count);
-		break;
-	}
-
-	/*
-	 * No unused range found. Try to extend the existing mapping
-	 * and perform a second attempt to reserve an IO virtual
-	 * address range of size bytes.
-	 */
-	if (i == mapping->nr_bitmaps) {
-		if (extend_iommu_mapping(mapping)) {
-			spin_unlock_irqrestore(&mapping->lock, flags);
-			return DMA_ERROR_CODE;
-		}
-
-		start = bitmap_find_next_zero_area(mapping->bitmaps[i],
-				mapping->bits, 0, count, align);
-
-		if (start > mapping->bits) {
-			spin_unlock_irqrestore(&mapping->lock, flags);
-			return DMA_ERROR_CODE;
-		}
-
-		bitmap_set(mapping->bitmaps[i], start, count);
-	}
-	spin_unlock_irqrestore(&mapping->lock, flags);
-
-	iova = mapping->base + (mapping_size * i);
-	iova += start << PAGE_SHIFT;
-
-	return iova;
-}
-
-static inline void __free_iova(struct dma_iommu_mapping *mapping,
-			       dma_addr_t addr, size_t size)
-{
-	unsigned int start, count;
-	size_t mapping_size = mapping->bits << PAGE_SHIFT;
-	unsigned long flags;
-	dma_addr_t bitmap_base;
-	u32 bitmap_index;
-
-	if (!size)
-		return;
-
-	bitmap_index = (u32) (addr - mapping->base) / (u32) mapping_size;
-	BUG_ON(addr < mapping->base || bitmap_index > mapping->extensions);
-
-	bitmap_base = mapping->base + mapping_size * bitmap_index;
-
-	start = (addr - bitmap_base) >>	PAGE_SHIFT;
-
-	if (addr + size > bitmap_base + mapping_size) {
-		/*
-		 * The address range to be freed reaches into the iova
-		 * range of the next bitmap. This should not happen as
-		 * we don't allow this in __alloc_iova (at the
-		 * moment).
-		 */
-		BUG();
-	} else
-		count = size >> PAGE_SHIFT;
-
-	spin_lock_irqsave(&mapping->lock, flags);
-	bitmap_clear(mapping->bitmaps[bitmap_index], start, count);
-	spin_unlock_irqrestore(&mapping->lock, flags);
-}
-
-static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
-					  gfp_t gfp, struct dma_attrs *attrs)
-{
-	struct page **pages;
-	int count = size >> PAGE_SHIFT;
-	int array_size = count * sizeof(struct page *);
-	int i = 0;
-
-	if (array_size <= PAGE_SIZE)
-		pages = kzalloc(array_size, GFP_KERNEL);
-	else
-		pages = vzalloc(array_size);
-	if (!pages)
-		return NULL;
-
-	if (dma_get_attr(DMA_ATTR_FORCE_CONTIGUOUS, attrs))
-	{
-		unsigned long order = get_order(size);
-		struct page *page;
-
-		page = dma_alloc_from_contiguous(dev, count, order);
-		if (!page)
-			goto error;
-
-		__dma_clear_buffer(page, size);
-
-		for (i = 0; i < count; i++)
-			pages[i] = page + i;
-
-		return pages;
-	}
-
-	/*
-	 * IOMMU can map any pages, so himem can also be used here
-	 */
-	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
-
-	while (count) {
-		int j, order;
-
-		for (order = __fls(count); order > 0; --order) {
-			/*
-			 * We do not want OOM killer to be invoked as long
-			 * as we can fall back to single pages, so we force
-			 * __GFP_NORETRY for orders higher than zero.
-			 */
-			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
-			if (pages[i])
-				break;
-		}
-
-		if (!pages[i]) {
-			/*
-			 * Fall back to single page allocation.
-			 * Might invoke OOM killer as last resort.
-			 */
-			pages[i] = alloc_pages(gfp, 0);
-			if (!pages[i])
-				goto error;
-		}
-
-		if (order) {
-			split_page(pages[i], order);
-			j = 1 << order;
-			while (--j)
-				pages[i + j] = pages[i] + j;
-		}
-
-		__dma_clear_buffer(pages[i], PAGE_SIZE << order);
-		i += 1 << order;
-		count -= 1 << order;
-	}
-
-	return pages;
-error:
-	while (i--)
-		if (pages[i])
-			__free_pages(pages[i], 0);
-	kvfree(pages);
-	return NULL;
-}
-
-static int __iommu_free_buffer(struct device *dev, struct page **pages,
-			       size_t size, struct dma_attrs *attrs)
-{
-	int count = size >> PAGE_SHIFT;
-	int i;
-
-	if (dma_get_attr(DMA_ATTR_FORCE_CONTIGUOUS, attrs)) {
-		dma_release_from_contiguous(dev, pages[0], count);
-	} else {
-		for (i = 0; i < count; i++)
-			if (pages[i])
-				__free_pages(pages[i], 0);
-	}
-
-	kvfree(pages);
-	return 0;
-}
-
-/*
- * Create a CPU mapping for a specified pages
- */
-static void *
-__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot,
-		    const void *caller)
-{
-	return dma_common_pages_remap(pages, size,
-			VM_ARM_DMA_CONSISTENT | VM_USERMAP, prot, caller);
-}
-
-/*
- * Create a mapping in device IO address space for specified pages
- */
-static dma_addr_t
-__iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	dma_addr_t dma_addr, iova;
-	int i;
-
-	dma_addr = __alloc_iova(mapping, size);
-	if (dma_addr == DMA_ERROR_CODE)
-		return dma_addr;
-
-	iova = dma_addr;
-	for (i = 0; i < count; ) {
-		int ret;
-
-		unsigned int next_pfn = page_to_pfn(pages[i]) + 1;
-		phys_addr_t phys = page_to_phys(pages[i]);
-		unsigned int len, j;
-
-		for (j = i + 1; j < count; j++, next_pfn++)
-			if (page_to_pfn(pages[j]) != next_pfn)
-				break;
-
-		len = (j - i) << PAGE_SHIFT;
-		ret = iommu_map(mapping->domain, iova, phys, len,
-				IOMMU_READ|IOMMU_WRITE);
-		if (ret < 0)
-			goto fail;
-		iova += len;
-		i = j;
-	}
-	return dma_addr;
-fail:
-	iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
-	__free_iova(mapping, dma_addr, size);
-	return DMA_ERROR_CODE;
-}
-
-static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-	/*
-	 * add optional in-page offset from iova to size and align
-	 * result to page size
-	 */
-	size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
-	iova &= PAGE_MASK;
-
-	iommu_unmap(mapping->domain, iova, size);
-	__free_iova(mapping, iova, size);
-	return 0;
-}
-
-static struct page **__atomic_get_pages(void *addr)
-{
-	struct page *page;
-	phys_addr_t phys;
-
-	phys = gen_pool_virt_to_phys(atomic_pool, (unsigned long)addr);
-	page = phys_to_page(phys);
-
-	return (struct page **)page;
-}
-
-static struct page **__iommu_get_pages(void *cpu_addr, struct dma_attrs *attrs)
-{
-	struct vm_struct *area;
-
-	if (__in_atomic_pool(cpu_addr, PAGE_SIZE))
-		return __atomic_get_pages(cpu_addr);
-
-	if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
-		return cpu_addr;
-
-	area = find_vm_area(cpu_addr);
-	if (area && (area->flags & VM_ARM_DMA_CONSISTENT))
-		return area->pages;
-	return NULL;
-}
-
-static void *__iommu_alloc_atomic(struct device *dev, size_t size,
-				  dma_addr_t *handle)
-{
-	struct page *page;
-	void *addr;
-
-	addr = __alloc_from_pool(size, &page);
-	if (!addr)
-		return NULL;
-
-	*handle = __iommu_create_mapping(dev, &page, size);
-	if (*handle == DMA_ERROR_CODE)
-		goto err_mapping;
-
-	return addr;
-
-err_mapping:
-	__free_from_pool(addr, size);
-	return NULL;
-}
-
-static void __iommu_free_atomic(struct device *dev, void *cpu_addr,
-				dma_addr_t handle, size_t size)
-{
-	__iommu_remove_mapping(dev, handle, size);
-	__free_from_pool(cpu_addr, size);
-}
-
-static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
-	    dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
-{
-	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
-	struct page **pages;
-	void *addr = NULL;
-
-	*handle = DMA_ERROR_CODE;
-	size = PAGE_ALIGN(size);
-
-	if (!gfpflags_allow_blocking(gfp))
-		return __iommu_alloc_atomic(dev, size, handle);
-
-	/*
-	 * Following is a work-around (a.k.a. hack) to prevent pages
-	 * with __GFP_COMP being passed to split_page() which cannot
-	 * handle them.  The real problem is that this flag probably
-	 * should be 0 on ARM as it is not supported on this
-	 * platform; see CONFIG_HUGETLBFS.
-	 */
-	gfp &= ~(__GFP_COMP);
-
-	pages = __iommu_alloc_buffer(dev, size, gfp, attrs);
-	if (!pages)
-		return NULL;
-
-	*handle = __iommu_create_mapping(dev, pages, size);
-	if (*handle == DMA_ERROR_CODE)
-		goto err_buffer;
-
-	if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
-		return pages;
-
-	addr = __iommu_alloc_remap(pages, size, gfp, prot,
-				   __builtin_return_address(0));
-	if (!addr)
-		goto err_mapping;
-
-	return addr;
-
-err_mapping:
-	__iommu_remove_mapping(dev, *handle, size);
-err_buffer:
-	__iommu_free_buffer(dev, pages, size, attrs);
-	return NULL;
-}
-
-static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
-		    void *cpu_addr, dma_addr_t dma_addr, size_t size,
-		    struct dma_attrs *attrs)
-{
-	unsigned long uaddr = vma->vm_start;
-	unsigned long usize = vma->vm_end - vma->vm_start;
-	struct page **pages = __iommu_get_pages(cpu_addr, attrs);
-	unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	unsigned long off = vma->vm_pgoff;
-
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
-
-	if (!pages)
-		return -ENXIO;
-
-	if (off >= nr_pages || (usize >> PAGE_SHIFT) > nr_pages - off)
-		return -ENXIO;
-
-	pages += off;
-
-	do {
-		int ret = vm_insert_page(vma, uaddr, *pages++);
-		if (ret) {
-			pr_err("Remapping memory failed: %d\n", ret);
-			return ret;
-		}
-		uaddr += PAGE_SIZE;
-		usize -= PAGE_SIZE;
-	} while (usize > 0);
-
-	return 0;
-}
-
-/*
- * free a page as defined by the above mapping.
- * Must not be called with IRQs disabled.
- */
-void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-			  dma_addr_t handle, struct dma_attrs *attrs)
-{
-	struct page **pages;
-	size = PAGE_ALIGN(size);
-
-	if (__in_atomic_pool(cpu_addr, size)) {
-		__iommu_free_atomic(dev, cpu_addr, handle, size);
-		return;
-	}
-
-	pages = __iommu_get_pages(cpu_addr, attrs);
-	if (!pages) {
-		WARN(1, "trying to free invalid coherent area: %p\n", cpu_addr);
-		return;
-	}
-
-	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs)) {
-		dma_common_free_remap(cpu_addr, size,
-			VM_ARM_DMA_CONSISTENT | VM_USERMAP);
-	}
-
-	__iommu_remove_mapping(dev, handle, size);
-	__iommu_free_buffer(dev, pages, size, attrs);
-}
-
-static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
-				 void *cpu_addr, dma_addr_t dma_addr,
-				 size_t size, struct dma_attrs *attrs)
-{
-	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	struct page **pages = __iommu_get_pages(cpu_addr, attrs);
-
-	if (!pages)
-		return -ENXIO;
-
-	return sg_alloc_table_from_pages(sgt, pages, count, 0, size,
-					 GFP_KERNEL);
-}
-
-static int __dma_direction_to_prot(enum dma_data_direction dir)
-{
-	int prot;
-
-	switch (dir) {
-	case DMA_BIDIRECTIONAL:
-		prot = IOMMU_READ | IOMMU_WRITE;
-		break;
-	case DMA_TO_DEVICE:
-		prot = IOMMU_READ;
-		break;
-	case DMA_FROM_DEVICE:
-		prot = IOMMU_WRITE;
-		break;
-	default:
-		prot = 0;
-	}
-
-	return prot;
-}
-
-/*
- * Map a part of the scatter-gather list into contiguous io address space
- */
-static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
-			  size_t size, dma_addr_t *handle,
-			  enum dma_data_direction dir, struct dma_attrs *attrs,
-			  bool is_coherent)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova, iova_base;
-	int ret = 0;
-	unsigned int count;
-	struct scatterlist *s;
-	int prot;
-
-	size = PAGE_ALIGN(size);
-	*handle = DMA_ERROR_CODE;
-
-	iova_base = iova = __alloc_iova(mapping, size);
-	if (iova == DMA_ERROR_CODE)
-		return -ENOMEM;
-
-	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
-		phys_addr_t phys = page_to_phys(sg_page(s));
-		unsigned int len = PAGE_ALIGN(s->offset + s->length);
-
-		if (!is_coherent &&
-			!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-			__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
-
-		prot = __dma_direction_to_prot(dir);
-
-		ret = iommu_map(mapping->domain, iova, phys, len, prot);
-		if (ret < 0)
-			goto fail;
-		count += len >> PAGE_SHIFT;
-		iova += len;
-	}
-	*handle = iova_base;
-
-	return 0;
-fail:
-	iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
-	__free_iova(mapping, iova_base, size);
-	return ret;
-}
-
-static int __iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
-		     enum dma_data_direction dir, struct dma_attrs *attrs,
-		     bool is_coherent)
-{
-	struct scatterlist *s = sg, *dma = sg, *start = sg;
-	int i, count = 0;
-	unsigned int offset = s->offset;
-	unsigned int size = s->offset + s->length;
-	unsigned int max = dma_get_max_seg_size(dev);
-
-	for (i = 1; i < nents; i++) {
-		s = sg_next(s);
-
-		s->dma_address = DMA_ERROR_CODE;
-		s->dma_length = 0;
-
-		if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) {
-			if (__map_sg_chunk(dev, start, size, &dma->dma_address,
-			    dir, attrs, is_coherent) < 0)
-				goto bad_mapping;
-
-			dma->dma_address += offset;
-			dma->dma_length = size - offset;
-
-			size = offset = s->offset;
-			start = s;
-			dma = sg_next(dma);
-			count += 1;
-		}
-		size += s->length;
-	}
-	if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir, attrs,
-		is_coherent) < 0)
-		goto bad_mapping;
-
-	dma->dma_address += offset;
-	dma->dma_length = size - offset;
-
-	return count+1;
-
-bad_mapping:
-	for_each_sg(sg, s, count, i)
-		__iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s));
-	return 0;
-}
-
-/**
- * arm_coherent_iommu_map_sg - map a set of SG buffers for streaming mode DMA
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map
- * @dir: DMA transfer direction
- *
- * Map a set of i/o coherent buffers described by scatterlist in streaming
- * mode for DMA. The scatter gather list elements are merged together (if
- * possible) and tagged with the appropriate dma address and length. They are
- * obtained via sg_dma_{address,length}.
- */
-int arm_coherent_iommu_map_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	return __iommu_map_sg(dev, sg, nents, dir, attrs, true);
-}
-
-/**
- * arm_iommu_map_sg - map a set of SG buffers for streaming mode DMA
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map
- * @dir: DMA transfer direction
- *
- * Map a set of buffers described by scatterlist in streaming mode for DMA.
- * The scatter gather list elements are merged together (if possible) and
- * tagged with the appropriate dma address and length. They are obtained via
- * sg_dma_{address,length}.
- */
-int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	return __iommu_map_sg(dev, sg, nents, dir, attrs, false);
-}
-
-static void __iommu_unmap_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs,
-		bool is_coherent)
-{
-	struct scatterlist *s;
-	int i;
-
-	for_each_sg(sg, s, nents, i) {
-		if (sg_dma_len(s))
-			__iommu_remove_mapping(dev, sg_dma_address(s),
-					       sg_dma_len(s));
-		if (!is_coherent &&
-		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-			__dma_page_dev_to_cpu(sg_page(s), s->offset,
-					      s->length, dir);
-	}
-}
-
-/**
- * arm_coherent_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- *
- * Unmap a set of streaming mode DMA translations.  Again, CPU access
- * rules concerning calls here are the same as for dma_unmap_single().
- */
-void arm_coherent_iommu_unmap_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	__iommu_unmap_sg(dev, sg, nents, dir, attrs, true);
-}
-
-/**
- * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- *
- * Unmap a set of streaming mode DMA translations.  Again, CPU access
- * rules concerning calls here are the same as for dma_unmap_single().
- */
-void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
-			enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	__iommu_unmap_sg(dev, sg, nents, dir, attrs, false);
-}
-
-/**
- * arm_iommu_sync_sg_for_cpu
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map (returned from dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- */
-void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
-			int nents, enum dma_data_direction dir)
-{
-	struct scatterlist *s;
-	int i;
-
-	for_each_sg(sg, s, nents, i)
-		__dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
-
-}
-
-/**
- * arm_iommu_sync_sg_for_device
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map (returned from dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- */
-void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
-			int nents, enum dma_data_direction dir)
-{
-	struct scatterlist *s;
-	int i;
-
-	for_each_sg(sg, s, nents, i)
-		__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
-}
-
-
-/**
- * arm_coherent_iommu_map_page
- * @dev: valid struct device pointer
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Coherent IOMMU aware version of arm_dma_map_page()
- */
-static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir,
-	     struct dma_attrs *attrs)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t dma_addr;
-	int ret, prot, len = PAGE_ALIGN(size + offset);
-
-	dma_addr = __alloc_iova(mapping, len);
-	if (dma_addr == DMA_ERROR_CODE)
-		return dma_addr;
-
-	prot = __dma_direction_to_prot(dir);
-
-	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
-	if (ret < 0)
-		goto fail;
-
-	return dma_addr + offset;
-fail:
-	__free_iova(mapping, dma_addr, len);
-	return DMA_ERROR_CODE;
-}
-
-/**
- * arm_iommu_map_page
- * @dev: valid struct device pointer
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * IOMMU aware version of arm_dma_map_page()
- */
-static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir,
-	     struct dma_attrs *attrs)
-{
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__dma_page_cpu_to_dev(page, offset, size, dir);
-
-	return arm_coherent_iommu_map_page(dev, page, offset, size, dir, attrs);
-}
-
-/**
- * arm_coherent_iommu_unmap_page
- * @dev: valid struct device pointer
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
- *
- * Coherent IOMMU aware version of arm_dma_unmap_page()
- */
-static void arm_coherent_iommu_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir,
-		struct dma_attrs *attrs)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	int offset = handle & ~PAGE_MASK;
-	int len = PAGE_ALIGN(size + offset);
-
-	if (!iova)
-		return;
-
-	iommu_unmap(mapping->domain, iova, len);
-	__free_iova(mapping, iova, len);
-}
-
-/**
- * arm_iommu_unmap_page
- * @dev: valid struct device pointer
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
- *
- * IOMMU aware version of arm_dma_unmap_page()
- */
-static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir,
-		struct dma_attrs *attrs)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	int offset = handle & ~PAGE_MASK;
-	int len = PAGE_ALIGN(size + offset);
-
-	if (!iova)
-		return;
-
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__dma_page_dev_to_cpu(page, offset, size, dir);
-
-	iommu_unmap(mapping->domain, iova, len);
-	__free_iova(mapping, iova, len);
-}
-
-static void arm_iommu_sync_single_for_cpu(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	unsigned int offset = handle & ~PAGE_MASK;
-
-	if (!iova)
-		return;
-
-	__dma_page_dev_to_cpu(page, offset, size, dir);
-}
-
-static void arm_iommu_sync_single_for_device(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	unsigned int offset = handle & ~PAGE_MASK;
-
-	if (!iova)
-		return;
-
-	__dma_page_cpu_to_dev(page, offset, size, dir);
-}
-
-struct dma_map_ops iommu_ops = {
-	.alloc		= arm_iommu_alloc_attrs,
-	.free		= arm_iommu_free_attrs,
-	.mmap		= arm_iommu_mmap_attrs,
-	.get_sgtable	= arm_iommu_get_sgtable,
-
-	.map_page		= arm_iommu_map_page,
-	.unmap_page		= arm_iommu_unmap_page,
-	.sync_single_for_cpu	= arm_iommu_sync_single_for_cpu,
-	.sync_single_for_device	= arm_iommu_sync_single_for_device,
-
-	.map_sg			= arm_iommu_map_sg,
-	.unmap_sg		= arm_iommu_unmap_sg,
-	.sync_sg_for_cpu	= arm_iommu_sync_sg_for_cpu,
-	.sync_sg_for_device	= arm_iommu_sync_sg_for_device,
-
-	.set_dma_mask		= arm_dma_set_mask,
-};
-
-struct dma_map_ops iommu_coherent_ops = {
-	.alloc		= arm_iommu_alloc_attrs,
-	.free		= arm_iommu_free_attrs,
-	.mmap		= arm_iommu_mmap_attrs,
-	.get_sgtable	= arm_iommu_get_sgtable,
-
-	.map_page	= arm_coherent_iommu_map_page,
-	.unmap_page	= arm_coherent_iommu_unmap_page,
-
-	.map_sg		= arm_coherent_iommu_map_sg,
-	.unmap_sg	= arm_coherent_iommu_unmap_sg,
-
-	.set_dma_mask	= arm_dma_set_mask,
-};
-
-/**
- * arm_iommu_create_mapping
- * @bus: pointer to the bus holding the client device (for IOMMU calls)
- * @base: start address of the valid IO address space
- * @size: maximum size of the valid IO address space
- *
- * Creates a mapping structure which holds information about used/unused
- * IO address ranges, which is required to perform memory allocation and
- * mapping with IOMMU aware functions.
- *
- * The client device need to be attached to the mapping with
- * arm_iommu_attach_device function.
- */
-struct dma_iommu_mapping *
-arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size)
-{
-	unsigned int bits = size >> PAGE_SHIFT;
-	unsigned int bitmap_size = BITS_TO_LONGS(bits) * sizeof(long);
-	struct dma_iommu_mapping *mapping;
-	int extensions = 1;
-	int err = -ENOMEM;
-
-	/* currently only 32-bit DMA address space is supported */
-	if (size > DMA_BIT_MASK(32) + 1)
-		return ERR_PTR(-ERANGE);
-
-	if (!bitmap_size)
-		return ERR_PTR(-EINVAL);
-
-	if (bitmap_size > PAGE_SIZE) {
-		extensions = bitmap_size / PAGE_SIZE;
-		bitmap_size = PAGE_SIZE;
-	}
-
-	mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL);
-	if (!mapping)
-		goto err;
-
-	mapping->bitmap_size = bitmap_size;
-	mapping->bitmaps = kzalloc(extensions * sizeof(unsigned long *),
-				GFP_KERNEL);
-	if (!mapping->bitmaps)
-		goto err2;
-
-	mapping->bitmaps[0] = kzalloc(bitmap_size, GFP_KERNEL);
-	if (!mapping->bitmaps[0])
-		goto err3;
-
-	mapping->nr_bitmaps = 1;
-	mapping->extensions = extensions;
-	mapping->base = base;
-	mapping->bits = BITS_PER_BYTE * bitmap_size;
-
-	spin_lock_init(&mapping->lock);
-
-	mapping->domain = iommu_domain_alloc(bus);
-	if (!mapping->domain)
-		goto err4;
-
-	kref_init(&mapping->kref);
-	return mapping;
-err4:
-	kfree(mapping->bitmaps[0]);
-err3:
-	kfree(mapping->bitmaps);
-err2:
-	kfree(mapping);
-err:
-	return ERR_PTR(err);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_create_mapping);
-
-static void release_iommu_mapping(struct kref *kref)
-{
-	int i;
-	struct dma_iommu_mapping *mapping =
-		container_of(kref, struct dma_iommu_mapping, kref);
-
-	iommu_domain_free(mapping->domain);
-	for (i = 0; i < mapping->nr_bitmaps; i++)
-		kfree(mapping->bitmaps[i]);
-	kfree(mapping->bitmaps);
-	kfree(mapping);
-}
-
-static int extend_iommu_mapping(struct dma_iommu_mapping *mapping)
-{
-	int next_bitmap;
-
-	if (mapping->nr_bitmaps >= mapping->extensions)
-		return -EINVAL;
-
-	next_bitmap = mapping->nr_bitmaps;
-	mapping->bitmaps[next_bitmap] = kzalloc(mapping->bitmap_size,
-						GFP_ATOMIC);
-	if (!mapping->bitmaps[next_bitmap])
-		return -ENOMEM;
-
-	mapping->nr_bitmaps++;
-
-	return 0;
-}
-
-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
-{
-	if (mapping)
-		kref_put(&mapping->kref, release_iommu_mapping);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_release_mapping);
-
-static int __arm_iommu_attach_device(struct device *dev,
-				     struct dma_iommu_mapping *mapping)
-{
-	int err;
-
-	err = iommu_attach_device(mapping->domain, dev);
-	if (err)
-		return err;
-
-	kref_get(&mapping->kref);
-	to_dma_iommu_mapping(dev) = mapping;
-
-	pr_debug("Attached IOMMU controller to %s device.\n", dev_name(dev));
-	return 0;
-}
-
-/**
- * arm_iommu_attach_device
- * @dev: valid struct device pointer
- * @mapping: io address space mapping structure (returned from
- *	arm_iommu_create_mapping)
- *
- * Attaches specified io address space mapping to the provided device.
- * This replaces the dma operations (dma_map_ops pointer) with the
- * IOMMU aware version.
- *
- * More than one client might be attached to the same io address space
- * mapping.
- */
-int arm_iommu_attach_device(struct device *dev,
-			    struct dma_iommu_mapping *mapping)
-{
-	int err;
-
-	err = __arm_iommu_attach_device(dev, mapping);
-	if (err)
-		return err;
-
-	set_dma_ops(dev, &iommu_ops);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(arm_iommu_attach_device);
-
-static void __arm_iommu_detach_device(struct device *dev)
-{
-	struct dma_iommu_mapping *mapping;
-
-	mapping = to_dma_iommu_mapping(dev);
-	if (!mapping) {
-		dev_warn(dev, "Not attached\n");
-		return;
-	}
-
-	iommu_detach_device(mapping->domain, dev);
-	kref_put(&mapping->kref, release_iommu_mapping);
-	to_dma_iommu_mapping(dev) = NULL;
-
-	pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
-}
-
-/**
- * arm_iommu_detach_device
- * @dev: valid struct device pointer
- *
- * Detaches the provided device from a previously attached map.
- * This voids the dma operations (dma_map_ops pointer)
- */
-void arm_iommu_detach_device(struct device *dev)
-{
-	__arm_iommu_detach_device(dev);
-	set_dma_ops(dev, NULL);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_detach_device);
-
-static struct dma_map_ops *arm_get_iommu_dma_map_ops(bool coherent)
-{
-	return coherent ? &iommu_coherent_ops : &iommu_ops;
-}
-
-static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    struct iommu_ops *iommu)
-{
-	struct dma_iommu_mapping *mapping;
-
-	if (!iommu)
-		return false;
-
-	mapping = arm_iommu_create_mapping(dev->bus, dma_base, size);
-	if (IS_ERR(mapping)) {
-		pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
-				size, dev_name(dev));
-		return false;
-	}
-
-	if (__arm_iommu_attach_device(dev, mapping)) {
-		pr_warn("Failed to attached device %s to IOMMU_mapping\n",
-				dev_name(dev));
-		arm_iommu_release_mapping(mapping);
-		return false;
-	}
-
-	return true;
-}
-
-static void arm_teardown_iommu_dma_ops(struct device *dev)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-	if (!mapping)
-		return;
-
-	__arm_iommu_detach_device(dev);
-	arm_iommu_release_mapping(mapping);
-}
-
-#else
-
-static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    struct iommu_ops *iommu)
-{
-	return false;
-}
-
-static void arm_teardown_iommu_dma_ops(struct device *dev) { }
-
-#define arm_get_iommu_dma_map_ops arm_get_dma_map_ops
-
-#endif	/* CONFIG_ARM_DMA_USE_IOMMU */
-
 static struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
@@ -2123,18 +1010,13 @@ static struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			struct iommu_ops *iommu, bool coherent)
 {
-	struct dma_map_ops *dma_ops;
-
 	dev->archdata.dma_coherent = coherent;
-	if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
-		dma_ops = arm_get_iommu_dma_map_ops(coherent);
-	else
-		dma_ops = arm_get_dma_map_ops(coherent);
 
-	set_dma_ops(dev, dma_ops);
+	if (!common_iommu_setup_dma_ops(dev, dma_base, size, iommu))
+		arch_set_dma_ops(dev, arm_get_dma_map_ops(coherent));
 }
 
 void arch_teardown_dma_ops(struct device *dev)
 {
-	arm_teardown_iommu_dma_ops(dev);
+	common_iommu_teardown_dma_ops(dev);
 }
diff --git a/drivers/gpu/drm/rockchip/Kconfig b/drivers/gpu/drm/rockchip/Kconfig
index 85739859dffc..7bdb5cf64ba3 100644
--- a/drivers/gpu/drm/rockchip/Kconfig
+++ b/drivers/gpu/drm/rockchip/Kconfig
@@ -1,5 +1,6 @@
 config DRM_ROCKCHIP
 	tristate "DRM Support for Rockchip"
+	depends on BROKEN
 	depends on DRM && ROCKCHIP_IOMMU
 	depends on RESET_CONTROLLER
 	select DRM_KMS_HELPER
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1674de1cfed0..8a99210f1cbc 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -265,6 +265,7 @@ config EXYNOS_IOMMU_DEBUG
 
 config IPMMU_VMSA
 	bool "Renesas VMSA-compatible IPMMU"
+	depends on BROKEN
 	depends on ARM_LPAE
 	depends on ARCH_SHMOBILE || COMPILE_TEST
 	select IOMMU_API
diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 8b89ebe16d94..fb8bb372e489 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -88,6 +88,7 @@ config VIDEO_OMAP3
 	depends on VIDEO_V4L2 && I2C && VIDEO_V4L2_SUBDEV_API && ARCH_OMAP3
 	depends on HAS_DMA && OF
 	depends on OMAP_IOMMU
+	depends on BROKEN
 	select ARM_DMA_USE_IOMMU
 	select VIDEOBUF2_DMA_CONTIG
 	select MFD_SYSCON
-- 
1.9.2


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
@ 2016-02-19  8:22   ` Marek Szyprowski
  0 siblings, 0 replies; 45+ messages in thread
From: Marek Szyprowski @ 2016-02-19  8:22 UTC (permalink / raw)
  To: linux-arm-kernel

This patch replaces the ARM-specific IOMMU-based DMA-mapping implementation
with the generic IOMMU DMA-mapping code shared with the ARM64 architecture.
A side effect of this change is a switch from bitmap-based IO address space
management to tree-based code. There should be no functional changes for
drivers that rely on initialization through the generic arch_setup_dma_ops()
interface. Code that used the old arm_iommu_* functions must be updated to
the new interface.
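
For readers comparing the two schemes, the bitmap-based IO address space
management removed here can be modeled in user space roughly as below. This
is a simplified sketch under stated assumptions, not the kernel code: all
names (model_mapping, model_alloc_iova, model_free_iova, MODEL_*) are
hypothetical, one byte per page stands in for one bit per page, and the
locking, on-demand bitmap extension and CONFIG_ARM_DMA_IOMMU_ALIGNMENT
clamping done by the real __alloc_iova() are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_PAGE_SHIFT 12
#define MODEL_PAGE_SIZE  (1u << MODEL_PAGE_SHIFT)
#define MODEL_NR_PAGES   64u         /* size of the modeled IO address window */
#define MODEL_ERROR      UINT32_MAX  /* stands in for DMA_ERROR_CODE */

struct model_mapping {
	uint32_t base;                      /* first IO virtual address */
	unsigned char used[MODEL_NR_PAGES]; /* one entry per page, 1 = allocated */
};

static void model_init(struct model_mapping *m, uint32_t base)
{
	m->base = base;
	memset(m->used, 0, sizeof(m->used));
}

/*
 * First-fit search for enough free pages to hold `size` bytes, starting
 * only at page indexes that are multiples of `align_pages`.
 */
static uint32_t model_alloc_iova(struct model_mapping *m, size_t size,
				 unsigned int align_pages)
{
	size_t count = (size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
	size_t start, i;

	if (count == 0 || count > MODEL_NR_PAGES || align_pages == 0)
		return MODEL_ERROR;

	for (start = 0; start + count <= MODEL_NR_PAGES; start += align_pages) {
		for (i = 0; i < count; i++)
			if (m->used[start + i])
				break;
		if (i == count) {
			/* Range is free: mark it used and return its address. */
			memset(&m->used[start], 1, count);
			return m->base + (uint32_t)(start << MODEL_PAGE_SHIFT);
		}
	}
	return MODEL_ERROR;
}

/* Release a range previously returned by model_alloc_iova(). */
static void model_free_iova(struct model_mapping *m, uint32_t iova, size_t size)
{
	size_t count = (size + MODEL_PAGE_SIZE - 1) >> MODEL_PAGE_SHIFT;
	size_t start;

	assert(iova >= m->base);
	start = (iova - m->base) >> MODEL_PAGE_SHIFT;
	assert(start + count <= MODEL_NR_PAGES);
	memset(&m->used[start], 0, count);
}
```

The search cost in such a scheme grows with the size of the window, and the
window itself has to be sized when the mapping is created, which is why the
old code needed the base/size arguments of arm_iommu_create_mapping().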

Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
---
 arch/arm/Kconfig                   |   22 +-
 arch/arm/include/asm/device.h      |    9 -
 arch/arm/include/asm/dma-iommu.h   |   37 --
 arch/arm/include/asm/dma-mapping.h |   59 +-
 arch/arm/mm/dma-mapping.c          | 1158 +-----------------------------------
 drivers/gpu/drm/rockchip/Kconfig   |    1 +
 drivers/iommu/Kconfig              |    1 +
 drivers/media/platform/Kconfig     |    1 +
 8 files changed, 82 insertions(+), 1206 deletions(-)
 delete mode 100644 arch/arm/include/asm/dma-iommu.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 4f799e567fc8..ed45f0d63cee 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -107,27 +107,7 @@ config ARM_DMA_USE_IOMMU
 	bool
 	select ARM_HAS_SG_CHAIN
 	select NEED_SG_DMA_LENGTH
-
-if ARM_DMA_USE_IOMMU
-
-config ARM_DMA_IOMMU_ALIGNMENT
-	int "Maximum PAGE_SIZE order of alignment for DMA IOMMU buffers"
-	range 4 9
-	default 8
-	help
-	  DMA mapping framework by default aligns all buffers to the smallest
-	  PAGE_SIZE order which is greater than or equal to the requested buffer
-	  size. This works well for buffers up to a few hundreds kilobytes, but
-	  for larger buffers it just a waste of address space. Drivers which has
-	  relatively small addressing window (like 64Mib) might run out of
-	  virtual space with just a few allocations.
-
-	  With this parameter you can specify the maximum PAGE_SIZE order for
-	  DMA IOMMU buffers. Larger buffers will be aligned only to this
-	  specified order. The order is expressed as a power of two multiplied
-	  by the PAGE_SIZE.
-
-endif
+	select IOMMU_DMA
 
 config MIGHT_HAVE_PCI
 	bool
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 4111592f0130..6ea939c39cd4 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,9 +14,6 @@ struct dev_archdata {
 #ifdef CONFIG_IOMMU_API
 	void *iommu; /* private IOMMU data */
 #endif
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-	struct dma_iommu_mapping	*mapping;
-#endif
 	bool dma_coherent;
 };
 
@@ -28,10 +25,4 @@ struct pdev_archdata {
 #endif
 };
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-#define to_dma_iommu_mapping(dev) ((dev)->archdata.mapping)
-#else
-#define to_dma_iommu_mapping(dev) NULL
-#endif
-
 #endif
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
deleted file mode 100644
index 2ef282f96651..000000000000
--- a/arch/arm/include/asm/dma-iommu.h
+++ /dev/null
@@ -1,37 +0,0 @@
-#ifndef ASMARM_DMA_IOMMU_H
-#define ASMARM_DMA_IOMMU_H
-
-#ifdef __KERNEL__
-
-#include <linux/mm_types.h>
-#include <linux/scatterlist.h>
-#include <linux/dma-debug.h>
-#include <linux/kmemcheck.h>
-#include <linux/kref.h>
-
-struct dma_iommu_mapping {
-	/* iommu specific data */
-	struct iommu_domain	*domain;
-
-	unsigned long		**bitmaps;	/* array of bitmaps */
-	unsigned int		nr_bitmaps;	/* nr of elements in array */
-	unsigned int		extensions;
-	size_t			bitmap_size;	/* size of a single bitmap */
-	size_t			bits;		/* per bitmap */
-	dma_addr_t		base;
-
-	spinlock_t		lock;
-	struct kref		kref;
-};
-
-struct dma_iommu_mapping *
-arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size);
-
-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);
-
-int arm_iommu_attach_device(struct device *dev,
-					struct dma_iommu_mapping *mapping);
-void arm_iommu_detach_device(struct device *dev);
-
-#endif /* __KERNEL__ */
-#endif
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 6ad1ceda62a5..08bedb0c02c6 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -8,6 +8,7 @@
 #include <linux/dma-attrs.h>
 #include <linux/dma-debug.h>
 
+#include <asm/cacheflush.h>
 #include <asm/memory.h>
 
 #include <xen/xen.h>
@@ -32,7 +33,7 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 		return __generic_dma_ops(dev);
 }
 
-static inline void set_dma_ops(struct device *dev, struct dma_map_ops *ops)
+static inline void arch_set_dma_ops(struct device *dev, struct dma_map_ops *ops)
 {
 	BUG_ON(!dev);
 	dev->archdata.dma_ops = ops;
@@ -275,5 +276,61 @@ extern int arm_dma_get_sgtable(struct device *dev, struct sg_table *sgt,
 		void *cpu_addr, dma_addr_t dma_addr, size_t size,
 		struct dma_attrs *attrs);
 
+/*
+ * The DMA API is built upon the notion of "buffer ownership".  A buffer
+ * is either exclusively owned by the CPU (and therefore may be accessed
+ * by it) or exclusively owned by the DMA device.  These helper functions
+ * represent the transitions between these two ownership states.
+ *
+ * Note, however, that on later ARMs, this notion does not work due to
+ * speculative prefetches.  We model our approach on the assumption that
+ * the CPU does do speculative prefetches, which means we clean caches
+ * before transfers and delay cache invalidation until transfer completion.
+ *
+ */
+extern void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t,
+				  enum dma_data_direction);
+extern void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t,
+				  enum dma_data_direction);
+
+static inline void arch_flush_page(struct device *dev, const void *virt,
+			    phys_addr_t phys)
+{
+	dmac_flush_range(virt, virt + PAGE_SIZE);
+	outer_flush_range(phys, phys + PAGE_SIZE);
+}
+
+static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
+				     enum dma_data_direction dir)
+{
+	unsigned int offset = phys & ~PAGE_MASK;
+	__dma_page_cpu_to_dev(phys_to_page(phys & PAGE_MASK), offset, size, dir);
+}
+
+static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
+				       enum dma_data_direction dir)
+{
+	unsigned int offset = phys & ~PAGE_MASK;
+	__dma_page_dev_to_cpu(phys_to_page(phys & PAGE_MASK), offset, size, dir);
+}
+
+static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
+					pgprot_t prot, bool coherent)
+{
+	if (coherent)
+		return prot;
+
+	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
+			    pgprot_writecombine(prot) :
+			    pgprot_dmacoherent(prot);
+	return prot;
+}
+
+extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
+					 gfp_t flags);
+extern bool arch_in_atomic_pool(void *start, size_t size);
+extern int arch_free_from_atomic_pool(void *start, size_t size);
+
+
 #endif /* __KERNEL__ */
 #endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 0eca3812527e..5d497f3c5924 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -20,6 +20,7 @@
 #include <linux/device.h>
 #include <linux/dma-mapping.h>
 #include <linux/dma-contiguous.h>
+#include <linux/dma-iommu.h>
 #include <linux/highmem.h>
 #include <linux/memblock.h>
 #include <linux/slab.h>
@@ -34,7 +35,6 @@
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
 #include <asm/mach/arch.h>
-#include <asm/dma-iommu.h>
 #include <asm/mach/map.h>
 #include <asm/system_info.h>
 #include <asm/dma-contiguous.h>
@@ -42,23 +42,6 @@
 #include "dma.h"
 #include "mm.h"
 
-/*
- * The DMA API is built upon the notion of "buffer ownership".  A buffer
- * is either exclusively owned by the CPU (and therefore may be accessed
- * by it) or exclusively owned by the DMA device.  These helper functions
- * represent the transitions between these two ownership states.
- *
- * Note, however, that on later ARMs, this notion does not work due to
- * speculative prefetches.  We model our approach on the assumption that
- * the CPU does do speculative prefetches, which means we clean caches
- * before transfers and delay cache invalidation until transfer completion.
- *
- */
-static void __dma_page_cpu_to_dev(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-static void __dma_page_dev_to_cpu(struct page *, unsigned long,
-		size_t, enum dma_data_direction);
-
 /**
  * arm_dma_map_page - map a portion of a page for streaming DMA
  * @dev: valid struct device pointer, or NULL for ISA and EISA-like devices
@@ -492,7 +475,7 @@ static void *__alloc_remap_buffer(struct device *dev, size_t size, gfp_t gfp,
 	return ptr;
 }
 
-static void *__alloc_from_pool(size_t size, struct page **ret_page)
+void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page, gfp_t gfp)
 {
 	unsigned long val;
 	void *ptr = NULL;
@@ -513,14 +496,14 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page)
 	return ptr;
 }
 
-static bool __in_atomic_pool(void *start, size_t size)
+bool arch_in_atomic_pool(void *start, size_t size)
 {
 	return addr_in_gen_pool(atomic_pool, (unsigned long)start, size);
 }
 
-static int __free_from_pool(void *start, size_t size)
+int arch_free_from_atomic_pool(void *start, size_t size)
 {
-	if (!__in_atomic_pool(start, size))
+	if (!arch_in_atomic_pool(start, size))
 		return 0;
 
 	gen_pool_free(atomic_pool, (unsigned long)start, size);
@@ -574,25 +557,21 @@ static void __free_from_contiguous(struct device *dev, struct page *page,
 	dma_release_from_contiguous(dev, page, size >> PAGE_SHIFT);
 }
 
-static inline pgprot_t __get_dma_pgprot(struct dma_attrs *attrs, pgprot_t prot)
-{
-	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
-			    pgprot_writecombine(prot) :
-			    pgprot_dmacoherent(prot);
-	return prot;
-}
-
 #define nommu() 0
 
+#define __alloc_from_pool(size, ret, gfp) arch_alloc_from_atomic_pool(size, ret, gfp)
+#define __free_from_pool(addr, size) arch_free_from_atomic_pool(addr, size)
+#define __get_dma_pgprot(attrs, prot, coherent) arch_get_dma_pgprot(attrs, prot, coherent)
+
 #else	/* !CONFIG_MMU */
 
 #define nommu() 1
 
-#define __get_dma_pgprot(attrs, prot)				__pgprot(0)
+#define __get_dma_pgprot(attrs, prot, coherent)				__pgprot(0)
 #define __alloc_remap_buffer(dev, size, gfp, prot, ret, c, wv)	NULL
-#define __alloc_from_pool(size, ret_page)			NULL
+#define __alloc_from_pool(size, ret_page, gfp)			NULL
 #define __alloc_from_contiguous(dev, size, prot, ret, c, wv)	NULL
-#define __free_from_pool(cpu_addr, size)			0
+#define __free_from_pool(cpu_addr, size)			0
 #define __free_from_contiguous(dev, page, cpu_addr, size, wv)	do { } while (0)
 #define __dma_free_remap(cpu_addr, size)			do { } while (0)
 
@@ -657,7 +636,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 	else if (is_coherent)
 		addr = __alloc_simple_buffer(dev, size, gfp, &page);
 	else if (!gfpflags_allow_blocking(gfp))
-		addr = __alloc_from_pool(size, &page);
+		addr = __alloc_from_pool(size, &page, gfp);
 	else
 		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page,
 					    caller, want_vaddr);
@@ -675,7 +654,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 void *arm_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 		    gfp_t gfp, struct dma_attrs *attrs)
 {
-	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
+	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL, false);
 
 	return __dma_alloc(dev, size, handle, gfp, prot, false,
 			   attrs, __builtin_return_address(0));
@@ -728,7 +707,7 @@ int arm_dma_mmap(struct device *dev, struct vm_area_struct *vma,
 		 struct dma_attrs *attrs)
 {
 #ifdef CONFIG_MMU
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
+	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot, false);
 #endif	/* CONFIG_MMU */
 	return __arm_dma_mmap(dev, vma, cpu_addr, dma_addr, size, attrs);
 }
@@ -842,7 +821,7 @@ static void dma_cache_maint_page(struct page *page, unsigned long offset,
  * platforms with CONFIG_DMABOUNCE.
  * Use the driver DMA support - see dma-mapping.h (dma_sync_*)
  */
-static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
+void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr;
@@ -858,7 +837,7 @@ static void __dma_page_cpu_to_dev(struct page *page, unsigned long off,
 	/* FIXME: non-speculating: flush on bidirectional mappings? */
 }
 
-static void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
+void __dma_page_dev_to_cpu(struct page *page, unsigned long off,
 	size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = page_to_phys(page) + off;
@@ -1023,1098 +1002,6 @@ static int __init dma_debug_do_init(void)
 }
 fs_initcall(dma_debug_do_init);
 
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-
-/* IOMMU */
-
-static int extend_iommu_mapping(struct dma_iommu_mapping *mapping);
-
-static inline dma_addr_t __alloc_iova(struct dma_iommu_mapping *mapping,
-				      size_t size)
-{
-	unsigned int order = get_order(size);
-	unsigned int align = 0;
-	unsigned int count, start;
-	size_t mapping_size = mapping->bits << PAGE_SHIFT;
-	unsigned long flags;
-	dma_addr_t iova;
-	int i;
-
-	if (order > CONFIG_ARM_DMA_IOMMU_ALIGNMENT)
-		order = CONFIG_ARM_DMA_IOMMU_ALIGNMENT;
-
-	count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	align = (1 << order) - 1;
-
-	spin_lock_irqsave(&mapping->lock, flags);
-	for (i = 0; i < mapping->nr_bitmaps; i++) {
-		start = bitmap_find_next_zero_area(mapping->bitmaps[i],
-				mapping->bits, 0, count, align);
-
-		if (start > mapping->bits)
-			continue;
-
-		bitmap_set(mapping->bitmaps[i], start, count);
-		break;
-	}
-
-	/*
-	 * No unused range found. Try to extend the existing mapping
-	 * and perform a second attempt to reserve an IO virtual
-	 * address range of size bytes.
-	 */
-	if (i == mapping->nr_bitmaps) {
-		if (extend_iommu_mapping(mapping)) {
-			spin_unlock_irqrestore(&mapping->lock, flags);
-			return DMA_ERROR_CODE;
-		}
-
-		start = bitmap_find_next_zero_area(mapping->bitmaps[i],
-				mapping->bits, 0, count, align);
-
-		if (start > mapping->bits) {
-			spin_unlock_irqrestore(&mapping->lock, flags);
-			return DMA_ERROR_CODE;
-		}
-
-		bitmap_set(mapping->bitmaps[i], start, count);
-	}
-	spin_unlock_irqrestore(&mapping->lock, flags);
-
-	iova = mapping->base + (mapping_size * i);
-	iova += start << PAGE_SHIFT;
-
-	return iova;
-}
-
-static inline void __free_iova(struct dma_iommu_mapping *mapping,
-			       dma_addr_t addr, size_t size)
-{
-	unsigned int start, count;
-	size_t mapping_size = mapping->bits << PAGE_SHIFT;
-	unsigned long flags;
-	dma_addr_t bitmap_base;
-	u32 bitmap_index;
-
-	if (!size)
-		return;
-
-	bitmap_index = (u32) (addr - mapping->base) / (u32) mapping_size;
-	BUG_ON(addr < mapping->base || bitmap_index > mapping->extensions);
-
-	bitmap_base = mapping->base + mapping_size * bitmap_index;
-
-	start = (addr - bitmap_base) >>	PAGE_SHIFT;
-
-	if (addr + size > bitmap_base + mapping_size) {
-		/*
-		 * The address range to be freed reaches into the iova
-		 * range of the next bitmap. This should not happen as
-		 * we don't allow this in __alloc_iova (at the
-		 * moment).
-		 */
-		BUG();
-	} else
-		count = size >> PAGE_SHIFT;
-
-	spin_lock_irqsave(&mapping->lock, flags);
-	bitmap_clear(mapping->bitmaps[bitmap_index], start, count);
-	spin_unlock_irqrestore(&mapping->lock, flags);
-}
-
-static struct page **__iommu_alloc_buffer(struct device *dev, size_t size,
-					  gfp_t gfp, struct dma_attrs *attrs)
-{
-	struct page **pages;
-	int count = size >> PAGE_SHIFT;
-	int array_size = count * sizeof(struct page *);
-	int i = 0;
-
-	if (array_size <= PAGE_SIZE)
-		pages = kzalloc(array_size, GFP_KERNEL);
-	else
-		pages = vzalloc(array_size);
-	if (!pages)
-		return NULL;
-
-	if (dma_get_attr(DMA_ATTR_FORCE_CONTIGUOUS, attrs))
-	{
-		unsigned long order = get_order(size);
-		struct page *page;
-
-		page = dma_alloc_from_contiguous(dev, count, order);
-		if (!page)
-			goto error;
-
-		__dma_clear_buffer(page, size);
-
-		for (i = 0; i < count; i++)
-			pages[i] = page + i;
-
-		return pages;
-	}
-
-	/*
-	 * IOMMU can map any pages, so himem can also be used here
-	 */
-	gfp |= __GFP_NOWARN | __GFP_HIGHMEM;
-
-	while (count) {
-		int j, order;
-
-		for (order = __fls(count); order > 0; --order) {
-			/*
-			 * We do not want OOM killer to be invoked as long
-			 * as we can fall back to single pages, so we force
-			 * __GFP_NORETRY for orders higher than zero.
-			 */
-			pages[i] = alloc_pages(gfp | __GFP_NORETRY, order);
-			if (pages[i])
-				break;
-		}
-
-		if (!pages[i]) {
-			/*
-			 * Fall back to single page allocation.
-			 * Might invoke OOM killer as last resort.
-			 */
-			pages[i] = alloc_pages(gfp, 0);
-			if (!pages[i])
-				goto error;
-		}
-
-		if (order) {
-			split_page(pages[i], order);
-			j = 1 << order;
-			while (--j)
-				pages[i + j] = pages[i] + j;
-		}
-
-		__dma_clear_buffer(pages[i], PAGE_SIZE << order);
-		i += 1 << order;
-		count -= 1 << order;
-	}
-
-	return pages;
-error:
-	while (i--)
-		if (pages[i])
-			__free_pages(pages[i], 0);
-	kvfree(pages);
-	return NULL;
-}
-
-static int __iommu_free_buffer(struct device *dev, struct page **pages,
-			       size_t size, struct dma_attrs *attrs)
-{
-	int count = size >> PAGE_SHIFT;
-	int i;
-
-	if (dma_get_attr(DMA_ATTR_FORCE_CONTIGUOUS, attrs)) {
-		dma_release_from_contiguous(dev, pages[0], count);
-	} else {
-		for (i = 0; i < count; i++)
-			if (pages[i])
-				__free_pages(pages[i], 0);
-	}
-
-	kvfree(pages);
-	return 0;
-}
-
-/*
- * Create a CPU mapping for a specified pages
- */
-static void *
-__iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot,
-		    const void *caller)
-{
-	return dma_common_pages_remap(pages, size,
-			VM_ARM_DMA_CONSISTENT | VM_USERMAP, prot, caller);
-}
-
-/*
- * Create a mapping in device IO address space for specified pages
- */
-static dma_addr_t
-__iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	dma_addr_t dma_addr, iova;
-	int i;
-
-	dma_addr = __alloc_iova(mapping, size);
-	if (dma_addr == DMA_ERROR_CODE)
-		return dma_addr;
-
-	iova = dma_addr;
-	for (i = 0; i < count; ) {
-		int ret;
-
-		unsigned int next_pfn = page_to_pfn(pages[i]) + 1;
-		phys_addr_t phys = page_to_phys(pages[i]);
-		unsigned int len, j;
-
-		for (j = i + 1; j < count; j++, next_pfn++)
-			if (page_to_pfn(pages[j]) != next_pfn)
-				break;
-
-		len = (j - i) << PAGE_SHIFT;
-		ret = iommu_map(mapping->domain, iova, phys, len,
-				IOMMU_READ|IOMMU_WRITE);
-		if (ret < 0)
-			goto fail;
-		iova += len;
-		i = j;
-	}
-	return dma_addr;
-fail:
-	iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
-	__free_iova(mapping, dma_addr, size);
-	return DMA_ERROR_CODE;
-}
-
-static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-	/*
-	 * add optional in-page offset from iova to size and align
-	 * result to page size
-	 */
-	size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
-	iova &= PAGE_MASK;
-
-	iommu_unmap(mapping->domain, iova, size);
-	__free_iova(mapping, iova, size);
-	return 0;
-}
-
-static struct page **__atomic_get_pages(void *addr)
-{
-	struct page *page;
-	phys_addr_t phys;
-
-	phys = gen_pool_virt_to_phys(atomic_pool, (unsigned long)addr);
-	page = phys_to_page(phys);
-
-	return (struct page **)page;
-}
-
-static struct page **__iommu_get_pages(void *cpu_addr, struct dma_attrs *attrs)
-{
-	struct vm_struct *area;
-
-	if (__in_atomic_pool(cpu_addr, PAGE_SIZE))
-		return __atomic_get_pages(cpu_addr);
-
-	if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
-		return cpu_addr;
-
-	area = find_vm_area(cpu_addr);
-	if (area && (area->flags & VM_ARM_DMA_CONSISTENT))
-		return area->pages;
-	return NULL;
-}
-
-static void *__iommu_alloc_atomic(struct device *dev, size_t size,
-				  dma_addr_t *handle)
-{
-	struct page *page;
-	void *addr;
-
-	addr = __alloc_from_pool(size, &page);
-	if (!addr)
-		return NULL;
-
-	*handle = __iommu_create_mapping(dev, &page, size);
-	if (*handle == DMA_ERROR_CODE)
-		goto err_mapping;
-
-	return addr;
-
-err_mapping:
-	__free_from_pool(addr, size);
-	return NULL;
-}
-
-static void __iommu_free_atomic(struct device *dev, void *cpu_addr,
-				dma_addr_t handle, size_t size)
-{
-	__iommu_remove_mapping(dev, handle, size);
-	__free_from_pool(cpu_addr, size);
-}
-
-static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
-	    dma_addr_t *handle, gfp_t gfp, struct dma_attrs *attrs)
-{
-	pgprot_t prot = __get_dma_pgprot(attrs, PAGE_KERNEL);
-	struct page **pages;
-	void *addr = NULL;
-
-	*handle = DMA_ERROR_CODE;
-	size = PAGE_ALIGN(size);
-
-	if (!gfpflags_allow_blocking(gfp))
-		return __iommu_alloc_atomic(dev, size, handle);
-
-	/*
-	 * Following is a work-around (a.k.a. hack) to prevent pages
-	 * with __GFP_COMP being passed to split_page() which cannot
-	 * handle them.  The real problem is that this flag probably
-	 * should be 0 on ARM as it is not supported on this
-	 * platform; see CONFIG_HUGETLBFS.
-	 */
-	gfp &= ~(__GFP_COMP);
-
-	pages = __iommu_alloc_buffer(dev, size, gfp, attrs);
-	if (!pages)
-		return NULL;
-
-	*handle = __iommu_create_mapping(dev, pages, size);
-	if (*handle == DMA_ERROR_CODE)
-		goto err_buffer;
-
-	if (dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs))
-		return pages;
-
-	addr = __iommu_alloc_remap(pages, size, gfp, prot,
-				   __builtin_return_address(0));
-	if (!addr)
-		goto err_mapping;
-
-	return addr;
-
-err_mapping:
-	__iommu_remove_mapping(dev, *handle, size);
-err_buffer:
-	__iommu_free_buffer(dev, pages, size, attrs);
-	return NULL;
-}
-
-static int arm_iommu_mmap_attrs(struct device *dev, struct vm_area_struct *vma,
-		    void *cpu_addr, dma_addr_t dma_addr, size_t size,
-		    struct dma_attrs *attrs)
-{
-	unsigned long uaddr = vma->vm_start;
-	unsigned long usize = vma->vm_end - vma->vm_start;
-	struct page **pages = __iommu_get_pages(cpu_addr, attrs);
-	unsigned long nr_pages = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	unsigned long off = vma->vm_pgoff;
-
-	vma->vm_page_prot = __get_dma_pgprot(attrs, vma->vm_page_prot);
-
-	if (!pages)
-		return -ENXIO;
-
-	if (off >= nr_pages || (usize >> PAGE_SHIFT) > nr_pages - off)
-		return -ENXIO;
-
-	pages += off;
-
-	do {
-		int ret = vm_insert_page(vma, uaddr, *pages++);
-		if (ret) {
-			pr_err("Remapping memory failed: %d\n", ret);
-			return ret;
-		}
-		uaddr += PAGE_SIZE;
-		usize -= PAGE_SIZE;
-	} while (usize > 0);
-
-	return 0;
-}
-
-/*
- * free a page as defined by the above mapping.
- * Must not be called with IRQs disabled.
- */
-void arm_iommu_free_attrs(struct device *dev, size_t size, void *cpu_addr,
-			  dma_addr_t handle, struct dma_attrs *attrs)
-{
-	struct page **pages;
-	size = PAGE_ALIGN(size);
-
-	if (__in_atomic_pool(cpu_addr, size)) {
-		__iommu_free_atomic(dev, cpu_addr, handle, size);
-		return;
-	}
-
-	pages = __iommu_get_pages(cpu_addr, attrs);
-	if (!pages) {
-		WARN(1, "trying to free invalid coherent area: %p\n", cpu_addr);
-		return;
-	}
-
-	if (!dma_get_attr(DMA_ATTR_NO_KERNEL_MAPPING, attrs)) {
-		dma_common_free_remap(cpu_addr, size,
-			VM_ARM_DMA_CONSISTENT | VM_USERMAP);
-	}
-
-	__iommu_remove_mapping(dev, handle, size);
-	__iommu_free_buffer(dev, pages, size, attrs);
-}
-
-static int arm_iommu_get_sgtable(struct device *dev, struct sg_table *sgt,
-				 void *cpu_addr, dma_addr_t dma_addr,
-				 size_t size, struct dma_attrs *attrs)
-{
-	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-	struct page **pages = __iommu_get_pages(cpu_addr, attrs);
-
-	if (!pages)
-		return -ENXIO;
-
-	return sg_alloc_table_from_pages(sgt, pages, count, 0, size,
-					 GFP_KERNEL);
-}
-
-static int __dma_direction_to_prot(enum dma_data_direction dir)
-{
-	int prot;
-
-	switch (dir) {
-	case DMA_BIDIRECTIONAL:
-		prot = IOMMU_READ | IOMMU_WRITE;
-		break;
-	case DMA_TO_DEVICE:
-		prot = IOMMU_READ;
-		break;
-	case DMA_FROM_DEVICE:
-		prot = IOMMU_WRITE;
-		break;
-	default:
-		prot = 0;
-	}
-
-	return prot;
-}
-
-/*
- * Map a part of the scatter-gather list into contiguous io address space
- */
-static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
-			  size_t size, dma_addr_t *handle,
-			  enum dma_data_direction dir, struct dma_attrs *attrs,
-			  bool is_coherent)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova, iova_base;
-	int ret = 0;
-	unsigned int count;
-	struct scatterlist *s;
-	int prot;
-
-	size = PAGE_ALIGN(size);
-	*handle = DMA_ERROR_CODE;
-
-	iova_base = iova = __alloc_iova(mapping, size);
-	if (iova == DMA_ERROR_CODE)
-		return -ENOMEM;
-
-	for (count = 0, s = sg; count < (size >> PAGE_SHIFT); s = sg_next(s)) {
-		phys_addr_t phys = page_to_phys(sg_page(s));
-		unsigned int len = PAGE_ALIGN(s->offset + s->length);
-
-		if (!is_coherent &&
-			!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-			__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
-
-		prot = __dma_direction_to_prot(dir);
-
-		ret = iommu_map(mapping->domain, iova, phys, len, prot);
-		if (ret < 0)
-			goto fail;
-		count += len >> PAGE_SHIFT;
-		iova += len;
-	}
-	*handle = iova_base;
-
-	return 0;
-fail:
-	iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
-	__free_iova(mapping, iova_base, size);
-	return ret;
-}
-
-static int __iommu_map_sg(struct device *dev, struct scatterlist *sg, int nents,
-		     enum dma_data_direction dir, struct dma_attrs *attrs,
-		     bool is_coherent)
-{
-	struct scatterlist *s = sg, *dma = sg, *start = sg;
-	int i, count = 0;
-	unsigned int offset = s->offset;
-	unsigned int size = s->offset + s->length;
-	unsigned int max = dma_get_max_seg_size(dev);
-
-	for (i = 1; i < nents; i++) {
-		s = sg_next(s);
-
-		s->dma_address = DMA_ERROR_CODE;
-		s->dma_length = 0;
-
-		if (s->offset || (size & ~PAGE_MASK) || size + s->length > max) {
-			if (__map_sg_chunk(dev, start, size, &dma->dma_address,
-			    dir, attrs, is_coherent) < 0)
-				goto bad_mapping;
-
-			dma->dma_address += offset;
-			dma->dma_length = size - offset;
-
-			size = offset = s->offset;
-			start = s;
-			dma = sg_next(dma);
-			count += 1;
-		}
-		size += s->length;
-	}
-	if (__map_sg_chunk(dev, start, size, &dma->dma_address, dir, attrs,
-		is_coherent) < 0)
-		goto bad_mapping;
-
-	dma->dma_address += offset;
-	dma->dma_length = size - offset;
-
-	return count+1;
-
-bad_mapping:
-	for_each_sg(sg, s, count, i)
-		__iommu_remove_mapping(dev, sg_dma_address(s), sg_dma_len(s));
-	return 0;
-}
-
-/**
- * arm_coherent_iommu_map_sg - map a set of SG buffers for streaming mode DMA
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map
- * @dir: DMA transfer direction
- *
- * Map a set of i/o coherent buffers described by scatterlist in streaming
- * mode for DMA. The scatter gather list elements are merged together (if
- * possible) and tagged with the appropriate dma address and length. They are
- * obtained via sg_dma_{address,length}.
- */
-int arm_coherent_iommu_map_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	return __iommu_map_sg(dev, sg, nents, dir, attrs, true);
-}
-
-/**
- * arm_iommu_map_sg - map a set of SG buffers for streaming mode DMA
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map
- * @dir: DMA transfer direction
- *
- * Map a set of buffers described by scatterlist in streaming mode for DMA.
- * The scatter gather list elements are merged together (if possible) and
- * tagged with the appropriate dma address and length. They are obtained via
- * sg_dma_{address,length}.
- */
-int arm_iommu_map_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	return __iommu_map_sg(dev, sg, nents, dir, attrs, false);
-}
-
-static void __iommu_unmap_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs,
-		bool is_coherent)
-{
-	struct scatterlist *s;
-	int i;
-
-	for_each_sg(sg, s, nents, i) {
-		if (sg_dma_len(s))
-			__iommu_remove_mapping(dev, sg_dma_address(s),
-					       sg_dma_len(s));
-		if (!is_coherent &&
-		    !dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-			__dma_page_dev_to_cpu(sg_page(s), s->offset,
-					      s->length, dir);
-	}
-}
-
-/**
- * arm_coherent_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- *
- * Unmap a set of streaming mode DMA translations.  Again, CPU access
- * rules concerning calls here are the same as for dma_unmap_single().
- */
-void arm_coherent_iommu_unmap_sg(struct device *dev, struct scatterlist *sg,
-		int nents, enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	__iommu_unmap_sg(dev, sg, nents, dir, attrs, true);
-}
-
-/**
- * arm_iommu_unmap_sg - unmap a set of SG buffers mapped by dma_map_sg
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to unmap (same as was passed to dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- *
- * Unmap a set of streaming mode DMA translations.  Again, CPU access
- * rules concerning calls here are the same as for dma_unmap_single().
- */
-void arm_iommu_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
-			enum dma_data_direction dir, struct dma_attrs *attrs)
-{
-	__iommu_unmap_sg(dev, sg, nents, dir, attrs, false);
-}
-
-/**
- * arm_iommu_sync_sg_for_cpu
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map (returned from dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- */
-void arm_iommu_sync_sg_for_cpu(struct device *dev, struct scatterlist *sg,
-			int nents, enum dma_data_direction dir)
-{
-	struct scatterlist *s;
-	int i;
-
-	for_each_sg(sg, s, nents, i)
-		__dma_page_dev_to_cpu(sg_page(s), s->offset, s->length, dir);
-
-}
-
-/**
- * arm_iommu_sync_sg_for_device
- * @dev: valid struct device pointer
- * @sg: list of buffers
- * @nents: number of buffers to map (returned from dma_map_sg)
- * @dir: DMA transfer direction (same as was passed to dma_map_sg)
- */
-void arm_iommu_sync_sg_for_device(struct device *dev, struct scatterlist *sg,
-			int nents, enum dma_data_direction dir)
-{
-	struct scatterlist *s;
-	int i;
-
-	for_each_sg(sg, s, nents, i)
-		__dma_page_cpu_to_dev(sg_page(s), s->offset, s->length, dir);
-}
-
-
-/**
- * arm_coherent_iommu_map_page
- * @dev: valid struct device pointer
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * Coherent IOMMU aware version of arm_dma_map_page()
- */
-static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir,
-	     struct dma_attrs *attrs)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t dma_addr;
-	int ret, prot, len = PAGE_ALIGN(size + offset);
-
-	dma_addr = __alloc_iova(mapping, len);
-	if (dma_addr == DMA_ERROR_CODE)
-		return dma_addr;
-
-	prot = __dma_direction_to_prot(dir);
-
-	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
-	if (ret < 0)
-		goto fail;
-
-	return dma_addr + offset;
-fail:
-	__free_iova(mapping, dma_addr, len);
-	return DMA_ERROR_CODE;
-}
-
-/**
- * arm_iommu_map_page
- * @dev: valid struct device pointer
- * @page: page that buffer resides in
- * @offset: offset into page for start of buffer
- * @size: size of buffer to map
- * @dir: DMA transfer direction
- *
- * IOMMU aware version of arm_dma_map_page()
- */
-static dma_addr_t arm_iommu_map_page(struct device *dev, struct page *page,
-	     unsigned long offset, size_t size, enum dma_data_direction dir,
-	     struct dma_attrs *attrs)
-{
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__dma_page_cpu_to_dev(page, offset, size, dir);
-
-	return arm_coherent_iommu_map_page(dev, page, offset, size, dir, attrs);
-}
-
-/**
- * arm_coherent_iommu_unmap_page
- * @dev: valid struct device pointer
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
- *
- * Coherent IOMMU aware version of arm_dma_unmap_page()
- */
-static void arm_coherent_iommu_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir,
-		struct dma_attrs *attrs)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	int offset = handle & ~PAGE_MASK;
-	int len = PAGE_ALIGN(size + offset);
-
-	if (!iova)
-		return;
-
-	iommu_unmap(mapping->domain, iova, len);
-	__free_iova(mapping, iova, len);
-}
-
-/**
- * arm_iommu_unmap_page
- * @dev: valid struct device pointer
- * @handle: DMA address of buffer
- * @size: size of buffer (same as passed to dma_map_page)
- * @dir: DMA transfer direction (same as passed to dma_map_page)
- *
- * IOMMU aware version of arm_dma_unmap_page()
- */
-static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
-		size_t size, enum dma_data_direction dir,
-		struct dma_attrs *attrs)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	int offset = handle & ~PAGE_MASK;
-	int len = PAGE_ALIGN(size + offset);
-
-	if (!iova)
-		return;
-
-	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
-		__dma_page_dev_to_cpu(page, offset, size, dir);
-
-	iommu_unmap(mapping->domain, iova, len);
-	__free_iova(mapping, iova, len);
-}
-
-static void arm_iommu_sync_single_for_cpu(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	unsigned int offset = handle & ~PAGE_MASK;
-
-	if (!iova)
-		return;
-
-	__dma_page_dev_to_cpu(page, offset, size, dir);
-}
-
-static void arm_iommu_sync_single_for_device(struct device *dev,
-		dma_addr_t handle, size_t size, enum dma_data_direction dir)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
-	unsigned int offset = handle & ~PAGE_MASK;
-
-	if (!iova)
-		return;
-
-	__dma_page_cpu_to_dev(page, offset, size, dir);
-}
-
-struct dma_map_ops iommu_ops = {
-	.alloc		= arm_iommu_alloc_attrs,
-	.free		= arm_iommu_free_attrs,
-	.mmap		= arm_iommu_mmap_attrs,
-	.get_sgtable	= arm_iommu_get_sgtable,
-
-	.map_page		= arm_iommu_map_page,
-	.unmap_page		= arm_iommu_unmap_page,
-	.sync_single_for_cpu	= arm_iommu_sync_single_for_cpu,
-	.sync_single_for_device	= arm_iommu_sync_single_for_device,
-
-	.map_sg			= arm_iommu_map_sg,
-	.unmap_sg		= arm_iommu_unmap_sg,
-	.sync_sg_for_cpu	= arm_iommu_sync_sg_for_cpu,
-	.sync_sg_for_device	= arm_iommu_sync_sg_for_device,
-
-	.set_dma_mask		= arm_dma_set_mask,
-};
-
-struct dma_map_ops iommu_coherent_ops = {
-	.alloc		= arm_iommu_alloc_attrs,
-	.free		= arm_iommu_free_attrs,
-	.mmap		= arm_iommu_mmap_attrs,
-	.get_sgtable	= arm_iommu_get_sgtable,
-
-	.map_page	= arm_coherent_iommu_map_page,
-	.unmap_page	= arm_coherent_iommu_unmap_page,
-
-	.map_sg		= arm_coherent_iommu_map_sg,
-	.unmap_sg	= arm_coherent_iommu_unmap_sg,
-
-	.set_dma_mask	= arm_dma_set_mask,
-};
-
-/**
- * arm_iommu_create_mapping
- * @bus: pointer to the bus holding the client device (for IOMMU calls)
- * @base: start address of the valid IO address space
- * @size: maximum size of the valid IO address space
- *
- * Creates a mapping structure which holds information about used/unused
- * IO address ranges, which is required to perform memory allocation and
- * mapping with IOMMU aware functions.
- *
- * The client device need to be attached to the mapping with
- * arm_iommu_attach_device function.
- */
-struct dma_iommu_mapping *
-arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size)
-{
-	unsigned int bits = size >> PAGE_SHIFT;
-	unsigned int bitmap_size = BITS_TO_LONGS(bits) * sizeof(long);
-	struct dma_iommu_mapping *mapping;
-	int extensions = 1;
-	int err = -ENOMEM;
-
-	/* currently only 32-bit DMA address space is supported */
-	if (size > DMA_BIT_MASK(32) + 1)
-		return ERR_PTR(-ERANGE);
-
-	if (!bitmap_size)
-		return ERR_PTR(-EINVAL);
-
-	if (bitmap_size > PAGE_SIZE) {
-		extensions = bitmap_size / PAGE_SIZE;
-		bitmap_size = PAGE_SIZE;
-	}
-
-	mapping = kzalloc(sizeof(struct dma_iommu_mapping), GFP_KERNEL);
-	if (!mapping)
-		goto err;
-
-	mapping->bitmap_size = bitmap_size;
-	mapping->bitmaps = kzalloc(extensions * sizeof(unsigned long *),
-				GFP_KERNEL);
-	if (!mapping->bitmaps)
-		goto err2;
-
-	mapping->bitmaps[0] = kzalloc(bitmap_size, GFP_KERNEL);
-	if (!mapping->bitmaps[0])
-		goto err3;
-
-	mapping->nr_bitmaps = 1;
-	mapping->extensions = extensions;
-	mapping->base = base;
-	mapping->bits = BITS_PER_BYTE * bitmap_size;
-
-	spin_lock_init(&mapping->lock);
-
-	mapping->domain = iommu_domain_alloc(bus);
-	if (!mapping->domain)
-		goto err4;
-
-	kref_init(&mapping->kref);
-	return mapping;
-err4:
-	kfree(mapping->bitmaps[0]);
-err3:
-	kfree(mapping->bitmaps);
-err2:
-	kfree(mapping);
-err:
-	return ERR_PTR(err);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_create_mapping);
-
-static void release_iommu_mapping(struct kref *kref)
-{
-	int i;
-	struct dma_iommu_mapping *mapping =
-		container_of(kref, struct dma_iommu_mapping, kref);
-
-	iommu_domain_free(mapping->domain);
-	for (i = 0; i < mapping->nr_bitmaps; i++)
-		kfree(mapping->bitmaps[i]);
-	kfree(mapping->bitmaps);
-	kfree(mapping);
-}
-
-static int extend_iommu_mapping(struct dma_iommu_mapping *mapping)
-{
-	int next_bitmap;
-
-	if (mapping->nr_bitmaps >= mapping->extensions)
-		return -EINVAL;
-
-	next_bitmap = mapping->nr_bitmaps;
-	mapping->bitmaps[next_bitmap] = kzalloc(mapping->bitmap_size,
-						GFP_ATOMIC);
-	if (!mapping->bitmaps[next_bitmap])
-		return -ENOMEM;
-
-	mapping->nr_bitmaps++;
-
-	return 0;
-}
-
-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
-{
-	if (mapping)
-		kref_put(&mapping->kref, release_iommu_mapping);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_release_mapping);
-
-static int __arm_iommu_attach_device(struct device *dev,
-				     struct dma_iommu_mapping *mapping)
-{
-	int err;
-
-	err = iommu_attach_device(mapping->domain, dev);
-	if (err)
-		return err;
-
-	kref_get(&mapping->kref);
-	to_dma_iommu_mapping(dev) = mapping;
-
-	pr_debug("Attached IOMMU controller to %s device.\n", dev_name(dev));
-	return 0;
-}
-
-/**
- * arm_iommu_attach_device
- * @dev: valid struct device pointer
- * @mapping: io address space mapping structure (returned from
- *	arm_iommu_create_mapping)
- *
- * Attaches specified io address space mapping to the provided device.
- * This replaces the dma operations (dma_map_ops pointer) with the
- * IOMMU aware version.
- *
- * More than one client might be attached to the same io address space
- * mapping.
- */
-int arm_iommu_attach_device(struct device *dev,
-			    struct dma_iommu_mapping *mapping)
-{
-	int err;
-
-	err = __arm_iommu_attach_device(dev, mapping);
-	if (err)
-		return err;
-
-	set_dma_ops(dev, &iommu_ops);
-	return 0;
-}
-EXPORT_SYMBOL_GPL(arm_iommu_attach_device);
-
-static void __arm_iommu_detach_device(struct device *dev)
-{
-	struct dma_iommu_mapping *mapping;
-
-	mapping = to_dma_iommu_mapping(dev);
-	if (!mapping) {
-		dev_warn(dev, "Not attached\n");
-		return;
-	}
-
-	iommu_detach_device(mapping->domain, dev);
-	kref_put(&mapping->kref, release_iommu_mapping);
-	to_dma_iommu_mapping(dev) = NULL;
-
-	pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
-}
-
-/**
- * arm_iommu_detach_device
- * @dev: valid struct device pointer
- *
- * Detaches the provided device from a previously attached map.
- * This voids the dma operations (dma_map_ops pointer)
- */
-void arm_iommu_detach_device(struct device *dev)
-{
-	__arm_iommu_detach_device(dev);
-	set_dma_ops(dev, NULL);
-}
-EXPORT_SYMBOL_GPL(arm_iommu_detach_device);
-
-static struct dma_map_ops *arm_get_iommu_dma_map_ops(bool coherent)
-{
-	return coherent ? &iommu_coherent_ops : &iommu_ops;
-}
-
-static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    struct iommu_ops *iommu)
-{
-	struct dma_iommu_mapping *mapping;
-
-	if (!iommu)
-		return false;
-
-	mapping = arm_iommu_create_mapping(dev->bus, dma_base, size);
-	if (IS_ERR(mapping)) {
-		pr_warn("Failed to create %llu-byte IOMMU mapping for device %s\n",
-				size, dev_name(dev));
-		return false;
-	}
-
-	if (__arm_iommu_attach_device(dev, mapping)) {
-		pr_warn("Failed to attached device %s to IOMMU_mapping\n",
-				dev_name(dev));
-		arm_iommu_release_mapping(mapping);
-		return false;
-	}
-
-	return true;
-}
-
-static void arm_teardown_iommu_dma_ops(struct device *dev)
-{
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
-
-	if (!mapping)
-		return;
-
-	__arm_iommu_detach_device(dev);
-	arm_iommu_release_mapping(mapping);
-}
-
-#else
-
-static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
-				    struct iommu_ops *iommu)
-{
-	return false;
-}
-
-static void arm_teardown_iommu_dma_ops(struct device *dev) { }
-
-#define arm_get_iommu_dma_map_ops arm_get_dma_map_ops
-
-#endif	/* CONFIG_ARM_DMA_USE_IOMMU */
-
 static struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 {
 	return coherent ? &arm_coherent_dma_ops : &arm_dma_ops;
@@ -2123,18 +1010,13 @@ static struct dma_map_ops *arm_get_dma_map_ops(bool coherent)
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 			struct iommu_ops *iommu, bool coherent)
 {
-	struct dma_map_ops *dma_ops;
-
 	dev->archdata.dma_coherent = coherent;
-	if (arm_setup_iommu_dma_ops(dev, dma_base, size, iommu))
-		dma_ops = arm_get_iommu_dma_map_ops(coherent);
-	else
-		dma_ops = arm_get_dma_map_ops(coherent);
 
-	set_dma_ops(dev, dma_ops);
+	if (!common_iommu_setup_dma_ops(dev, dma_base, size, iommu))
+		arch_set_dma_ops(dev, arm_get_dma_map_ops(coherent));
 }
 
 void arch_teardown_dma_ops(struct device *dev)
 {
-	arm_teardown_iommu_dma_ops(dev);
+	common_iommu_teardown_dma_ops(dev);
 }
diff --git a/drivers/gpu/drm/rockchip/Kconfig b/drivers/gpu/drm/rockchip/Kconfig
index 85739859dffc..7bdb5cf64ba3 100644
--- a/drivers/gpu/drm/rockchip/Kconfig
+++ b/drivers/gpu/drm/rockchip/Kconfig
@@ -1,5 +1,6 @@
 config DRM_ROCKCHIP
 	tristate "DRM Support for Rockchip"
+	depends on BROKEN
 	depends on DRM && ROCKCHIP_IOMMU
 	depends on RESET_CONTROLLER
 	select DRM_KMS_HELPER
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1674de1cfed0..8a99210f1cbc 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -265,6 +265,7 @@ config EXYNOS_IOMMU_DEBUG
 
 config IPMMU_VMSA
 	bool "Renesas VMSA-compatible IPMMU"
+	depends on BROKEN
 	depends on ARM_LPAE
 	depends on ARCH_SHMOBILE || COMPILE_TEST
 	select IOMMU_API
diff --git a/drivers/media/platform/Kconfig b/drivers/media/platform/Kconfig
index 8b89ebe16d94..fb8bb372e489 100644
--- a/drivers/media/platform/Kconfig
+++ b/drivers/media/platform/Kconfig
@@ -88,6 +88,7 @@ config VIDEO_OMAP3
 	depends on VIDEO_V4L2 && I2C && VIDEO_V4L2_SUBDEV_API && ARCH_OMAP3
 	depends on HAS_DMA && OF
 	depends on OMAP_IOMMU
+	depends on BROKEN
 	select ARM_DMA_USE_IOMMU
 	select VIDEOBUF2_DMA_CONTIG
 	select MFD_SYSCON
-- 
1.9.2

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
  2016-02-19  8:22   ` Marek Szyprowski
  (?)
@ 2016-02-19 10:30     ` Arnd Bergmann
  -1 siblings, 0 replies; 45+ messages in thread
From: Arnd Bergmann @ 2016-02-19 10:30 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: iommu, linux-arm-kernel, linux-kernel, linaro-mm-sig, dri-devel,
	Will Deacon, Catalin Marinas, Robin Murphy,
	Russell King - ARM Linux, Joerg Roedel, Laurent Pinchart,
	Sakari Ailus, Mark Yao, Heiko Stuebner, Tomasz Figa, Inki Dae,
	Bartlomiej Zolnierkiewicz, Krzysztof Kozlowski

On Friday 19 February 2016 09:22:44 Marek Szyprowski wrote:
> This patch replaces the ARM-specific IOMMU-based DMA-mapping
> implementation with generic IOMMU DMA-mapping code shared with the ARM64
> architecture. A side effect of this change is a switch from bitmap-based
> IO address space management to tree-based code. There should be no
> functional changes for drivers that rely on initialization from the
> generic arch_setup_dma_ops() interface. Code that used the old
> arm_iommu_* functions must be updated to the new interface.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

I like the overall idea. However, this interface from the iommu
subsystem into architecture-specific code:

> +/*
> + * The DMA API is built upon the notion of "buffer ownership".  A buffer
> + * is either exclusively owned by the CPU (and therefore may be accessed
> + * by it) or exclusively owned by the DMA device.  These helper functions
> + * represent the transitions between these two ownership states.
> + *
> + * Note, however, that on later ARMs, this notion does not work due to
> + * speculative prefetches.  We model our approach on the assumption that
> + * the CPU does do speculative prefetches, which means we clean caches
> + * before transfers and delay cache invalidation until transfer completion.
> + *
> + */
> +extern void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t,
> +				  enum dma_data_direction);
> +extern void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t,
> +				  enum dma_data_direction);
> +
> +static inline void arch_flush_page(struct device *dev, const void *virt,
> +			    phys_addr_t phys)
> +{
> +	dmac_flush_range(virt, virt + PAGE_SIZE);
> +	outer_flush_range(phys, phys + PAGE_SIZE);
> +}
> +
> +static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
> +				     enum dma_data_direction dir)
> +{
> +	unsigned int offset = phys & ~PAGE_MASK;
> +	__dma_page_cpu_to_dev(phys_to_page(phys & PAGE_MASK), offset, size, dir);
> +}
> +
> +static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
> +				       enum dma_data_direction dir)
> +{
> +	unsigned int offset = phys & ~PAGE_MASK;
> +	__dma_page_dev_to_cpu(phys_to_page(phys & PAGE_MASK), offset, size, dir);
> +}
> +
> +static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
> +					pgprot_t prot, bool coherent)
> +{
> +	if (coherent)
> +		return prot;
> +
> +	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
> +			    pgprot_writecombine(prot) :
> +			    pgprot_dmacoherent(prot);
> +	return prot;
> +}
> +
> +extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
> +					 gfp_t flags);
> +extern bool arch_in_atomic_pool(void *start, size_t size);
> +extern int arch_free_from_atomic_pool(void *start, size_t size);
> +
> +

doesn't feel completely right yet. In particular the arch_flush_page()
interface is probably still too specific to ARM/ARM64 and won't work
that way on other architectures.

I think it would be better to make this either more generic or less generic:

a) leave the iommu_dma_map_ops definition in the architecture-specific
   code, but make it call helper functions in drivers/iommu to do all
   of the really generic parts.

b) clarify that this is only applicable to arch/arm and arch/arm64, and
   unify things further between these two, as they have very similar
   requirements in the CPU architecture.
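
As a rough illustration of option (a), here is a minimal userspace sketch,
not kernel code. Every sketch_* name in it is hypothetical and does not
appear in the patch, and the "IOMMU" is reduced to a bump allocator. The
point is only the split of responsibilities: the generic helper manages IO
addresses, while cache maintenance and the ops tables stay with the
architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types and constants. */
typedef uint64_t sketch_dma_addr_t;
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_DMA_ERROR ((sketch_dma_addr_t)0)

/* Toy IO address space: a bump allocator over [next_iova, limit). */
struct sketch_domain {
	sketch_dma_addr_t next_iova;
	sketch_dma_addr_t limit;
};

static size_t sketch_page_align(size_t n)
{
	return (n + SKETCH_PAGE_SIZE - 1) & ~((size_t)SKETCH_PAGE_SIZE - 1);
}

/*
 * "Generic" part, the kind of helper that could live in drivers/iommu.
 * It reserves a page-aligned IO address range and knows nothing about
 * CPU caches. A real helper would also program the IOMMU page tables.
 */
static sketch_dma_addr_t sketch_generic_iommu_map(struct sketch_domain *d,
						  uint64_t phys, size_t size)
{
	size_t offset = phys & (SKETCH_PAGE_SIZE - 1);
	size_t len = sketch_page_align(size + offset);
	sketch_dma_addr_t iova;

	assert(d != NULL);
	if (d->limit - d->next_iova < len)
		return SKETCH_DMA_ERROR;

	iova = d->next_iova;
	d->next_iova += len;
	return iova + offset;
}

/* "Architecture" part: cache maintenance, simulated by a counter. */
static int sketch_arch_cache_cleans;

static void sketch_arch_sync_for_device(uint64_t phys, size_t size)
{
	(void)phys;
	(void)size;
	sketch_arch_cache_cleans++;
}

struct sketch_dma_ops {
	sketch_dma_addr_t (*map)(struct sketch_domain *d, uint64_t phys,
				 size_t size);
};

/* Non-coherent device: hand the buffer to the device, then map it. */
static sketch_dma_addr_t sketch_arch_iommu_map(struct sketch_domain *d,
					       uint64_t phys, size_t size)
{
	sketch_arch_sync_for_device(phys, size);
	return sketch_generic_iommu_map(d, phys, size);
}

/* Coherent device: no cache maintenance, just the generic helper. */
static sketch_dma_addr_t
sketch_arch_coherent_iommu_map(struct sketch_domain *d, uint64_t phys,
			       size_t size)
{
	return sketch_generic_iommu_map(d, phys, size);
}

/* The ops tables are defined by the architecture, as in option (a). */
static const struct sketch_dma_ops sketch_arch_iommu_ops = {
	.map = sketch_arch_iommu_map,
};

static const struct sketch_dma_ops sketch_arch_coherent_iommu_ops = {
	.map = sketch_arch_coherent_iommu_map,
};
```

With this split, a non-coherent device would use sketch_arch_iommu_ops and
a coherent one sketch_arch_coherent_iommu_ops, and another architecture
could supply its own wrappers around the same generic helper.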

	Arnd

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
@ 2016-02-19 10:30     ` Arnd Bergmann
  0 siblings, 0 replies; 45+ messages in thread
From: Arnd Bergmann @ 2016-02-19 10:30 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: Krzysztof Kozlowski, Russell King - ARM Linux,
	Bartlomiej Zolnierkiewicz, Catalin Marinas, Will Deacon,
	linux-kernel, dri-devel, Tomasz Figa, linaro-mm-sig, iommu,
	Sakari Ailus, Laurent Pinchart, Robin Murphy, linux-arm-kernel

On Friday 19 February 2016 09:22:44 Marek Szyprowski wrote:
> This patch replaces the ARM-specific IOMMU-based DMA-mapping
> implementation with generic IOMMU DMA-mapping code shared with the ARM64
> architecture. A side effect of this change is a switch from bitmap-based
> IO address space management to tree-based code. There should be no
> functional changes for drivers that rely on initialization from the
> generic arch_setup_dma_ops() interface. Code that used the old
> arm_iommu_* functions must be updated to the new interface.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

I like the overall idea. However, this interface from the iommu
subsystem into architecture specific code:

> +/*
> + * The DMA API is built upon the notion of "buffer ownership".  A buffer
> + * is either exclusively owned by the CPU (and therefore may be accessed
> + * by it) or exclusively owned by the DMA device.  These helper functions
> + * represent the transitions between these two ownership states.
> + *
> + * Note, however, that on later ARMs, this notion does not work due to
> + * speculative prefetches.  We model our approach on the assumption that
> + * the CPU does do speculative prefetches, which means we clean caches
> + * before transfers and delay cache invalidation until transfer completion.
> + *
> + */
> +extern void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t,
> +				  enum dma_data_direction);
> +extern void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t,
> +				  enum dma_data_direction);
> +
> +static inline void arch_flush_page(struct device *dev, const void *virt,
> +			    phys_addr_t phys)
> +{
> +	dmac_flush_range(virt, virt + PAGE_SIZE);
> +	outer_flush_range(phys, phys + PAGE_SIZE);
> +}
> +
> +static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
> +				     enum dma_data_direction dir)
> +{
> +	unsigned int offset = phys & ~PAGE_MASK;
> +	__dma_page_cpu_to_dev(phys_to_page(phys & PAGE_MASK), offset, size, dir);
> +}
> +
> +static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
> +				       enum dma_data_direction dir)
> +{
> +	unsigned int offset = phys & ~PAGE_MASK;
> +	__dma_page_dev_to_cpu(phys_to_page(phys & PAGE_MASK), offset, size, dir);
> +}
> +
> +static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
> +					pgprot_t prot, bool coherent)
> +{
> +	if (coherent)
> +		return prot;
> +
> +	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
> +			    pgprot_writecombine(prot) :
> +			    pgprot_dmacoherent(prot);
> +	return prot;
> +}
> +
> +extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
> +					 gfp_t flags);
> +extern bool arch_in_atomic_pool(void *start, size_t size);
> +extern int arch_free_from_atomic_pool(void *start, size_t size);
> +
> +

doesn't feel completely right yet. In particular the arch_flush_page()
interface is probably still too specific to ARM/ARM64 and won't work
that way on other architectures.

I think it would be better to do this either more generic, or less generic:

a) leave the iommu_dma_map_ops definition in the architecture specific
   code, but make it call helper functions in the drivers/iommu to do all
   of the really generic parts.

b) clarify that this is only applicable to arch/arm and arch/arm64, and
   unify things further between these two, as they have very similar
   requirements in the CPU architecture.

	Arnd
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
@ 2016-02-19 10:30     ` Arnd Bergmann
  0 siblings, 0 replies; 45+ messages in thread
From: Arnd Bergmann @ 2016-02-19 10:30 UTC (permalink / raw)
  To: linux-arm-kernel

On Friday 19 February 2016 09:22:44 Marek Szyprowski wrote:
> This patch replaces ARM-specific IOMMU-based DMA-mapping implementation
> with generic IOMMU DMA-mapping code shared with ARM64 architecture. The
> side-effect of this change is a switch from bitmap-based IO address space
> management to tree-based code. There should be no functional changes
> for drivers, which rely on initialization from generic arch_setup_dna_ops()
> interface. Code, which used old arm_iommu_* functions must be updated to
> new interface.
> 
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

I like the overall idea. However, this interface from the iommu
subsystem into architecture specific code:

> +/*
> + * The DMA API is built upon the notion of "buffer ownership".  A buffer
> + * is either exclusively owned by the CPU (and therefore may be accessed
> + * by it) or exclusively owned by the DMA device.  These helper functions
> + * represent the transitions between these two ownership states.
> + *
> + * Note, however, that on later ARMs, this notion does not work due to
> + * speculative prefetches.  We model our approach on the assumption that
> + * the CPU does do speculative prefetches, which means we clean caches
> + * before transfers and delay cache invalidation until transfer completion.
> + *
> + */
> +extern void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t,
> +				  enum dma_data_direction);
> +extern void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t,
> +				  enum dma_data_direction);
> +
> +static inline void arch_flush_page(struct device *dev, const void *virt,
> +			    phys_addr_t phys)
> +{
> +	dmac_flush_range(virt, virt + PAGE_SIZE);
> +	outer_flush_range(phys, phys + PAGE_SIZE);
> +}
> +
> +static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
> +				     enum dma_data_direction dir)
> +{
> +	unsigned int offset = phys & ~PAGE_MASK;
> +	__dma_page_cpu_to_dev(phys_to_page(phys & PAGE_MASK), offset, size, dir);
> +}
> +
> +static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
> +				       enum dma_data_direction dir)
> +{
> +	unsigned int offset = phys & ~PAGE_MASK;
> +	__dma_page_dev_to_cpu(phys_to_page(phys & PAGE_MASK), offset, size, dir);
> +}
> +
> +static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
> +					pgprot_t prot, bool coherent)
> +{
> +	if (coherent)
> +		return prot;
> +
> +	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
> +			    pgprot_writecombine(prot) :
> +			    pgprot_dmacoherent(prot);
> +	return prot;
> +}
> +
> +extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
> +					 gfp_t flags);
> +extern bool arch_in_atomic_pool(void *start, size_t size);
> +extern int arch_free_from_atomic_pool(void *start, size_t size);
> +
> +

doesn't feel completely right yet. In particular the arch_flush_page()
interface is probably still too specific to ARM/ARM64 and won't work
that way on other architectures.

I think it would be better to do this either more generic, or less generic:

a) leave the iommu_dma_map_ops definition in the architecture specific
   code, but make it call helper functions in the drivers/iommu to do all
   of the really generic parts.

b) clarify that this is only applicable to arch/arm and arch/arm64, and
   unify things further between these two, as they have very similar
   requirements in the CPU architecture.

	Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
  2016-02-19 10:30     ` Arnd Bergmann
  (?)
@ 2016-02-25 12:26       ` Marek Szyprowski
  -1 siblings, 0 replies; 45+ messages in thread
From: Marek Szyprowski @ 2016-02-25 12:26 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: iommu, linux-arm-kernel, linux-kernel, linaro-mm-sig, dri-devel,
	Will Deacon, Catalin Marinas, Robin Murphy,
	Russell King - ARM Linux, Joerg Roedel, Laurent Pinchart,
	Sakari Ailus, Mark Yao, Heiko Stuebner, Tomasz Figa, Inki Dae,
	Bartlomiej Zolnierkiewicz, Krzysztof Kozlowski

Hello,

On 2016-02-19 11:30, Arnd Bergmann wrote:
> On Friday 19 February 2016 09:22:44 Marek Szyprowski wrote:
>> This patch replaces ARM-specific IOMMU-based DMA-mapping implementation
>> with generic IOMMU DMA-mapping code shared with ARM64 architecture. The
>> side-effect of this change is a switch from bitmap-based IO address space
>> management to tree-based code. There should be no functional changes
>> for drivers, which rely on initialization from generic arch_setup_dma_ops()
>> interface. Code, which used old arm_iommu_* functions must be updated to
>> new interface.
>>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> I like the overall idea. However, this interface from the iommu
> subsystem into architecture specific code:
>
>> +/*
>> + * The DMA API is built upon the notion of "buffer ownership".  A buffer
>> + * is either exclusively owned by the CPU (and therefore may be accessed
>> + * by it) or exclusively owned by the DMA device.  These helper functions
>> + * represent the transitions between these two ownership states.
>> + *
>> + * Note, however, that on later ARMs, this notion does not work due to
>> + * speculative prefetches.  We model our approach on the assumption that
>> + * the CPU does do speculative prefetches, which means we clean caches
>> + * before transfers and delay cache invalidation until transfer completion.
>> + *
>> + */
>> +extern void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t,
>> +				  enum dma_data_direction);
>> +extern void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t,
>> +				  enum dma_data_direction);
>> +
>> +static inline void arch_flush_page(struct device *dev, const void *virt,
>> +			    phys_addr_t phys)
>> +{
>> +	dmac_flush_range(virt, virt + PAGE_SIZE);
>> +	outer_flush_range(phys, phys + PAGE_SIZE);
>> +}
>> +
>> +static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
>> +				     enum dma_data_direction dir)
>> +{
>> +	unsigned int offset = phys & ~PAGE_MASK;
>> +	__dma_page_cpu_to_dev(phys_to_page(phys & PAGE_MASK), offset, size, dir);
>> +}
>> +
>> +static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
>> +				       enum dma_data_direction dir)
>> +{
>> +	unsigned int offset = phys & ~PAGE_MASK;
>> +	__dma_page_dev_to_cpu(phys_to_page(phys & PAGE_MASK), offset, size, dir);
>> +}
>> +
>> +static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
>> +					pgprot_t prot, bool coherent)
>> +{
>> +	if (coherent)
>> +		return prot;
>> +
>> +	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
>> +			    pgprot_writecombine(prot) :
>> +			    pgprot_dmacoherent(prot);
>> +	return prot;
>> +}
>> +
>> +extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
>> +					 gfp_t flags);
>> +extern bool arch_in_atomic_pool(void *start, size_t size);
>> +extern int arch_free_from_atomic_pool(void *start, size_t size);
>> +
>> +
> doesn't feel completely right yet. In particular the arch_flush_page()
> interface is probably still too specific to ARM/ARM64 and won't work
> that way on other architectures.
>
> I think it would be better to do this either more generic, or less generic:
>
> a) leave the iommu_dma_map_ops definition in the architecture specific
>     code, but make it call helper functions in the drivers/iommu to do all
>     of the really generic parts.
>
> b) clarify that this is only applicable to arch/arm and arch/arm64, and
>     unify things further between these two, as they have very similar
>     requirements in the CPU architecture.

Some really generic parts are already in iommu/dma-iommu.c, and one can
build one's own IOMMU/DMA-mapping code for a non-ARM CPU architecture on
top of them. Initially I also wanted to use that generic code directly on
both ARM and ARM64, but it turned out that the two architectures would
duplicate 99% of the code that uses these 'generic' functions. This is why
I decided to move all that common code from
arch/{arm,arm64}/mm/dma-mapping.c to drivers/iommu/dma-iommu-ops.c.

I'm not sure I can design all the changes needed to make
drivers/iommu/dma-iommu-ops.c more generic. Maybe once someone tries to use
that code with arch glue code for another, non-ARM architecture, a better
abstraction can be developed. For now I would like to keep all this code in
a common place so that both arm and arm64 benefit from improvements made
there.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
@ 2016-02-25 14:44         ` Arnd Bergmann
  0 siblings, 0 replies; 45+ messages in thread
From: Arnd Bergmann @ 2016-02-25 14:44 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: iommu, linux-arm-kernel, linux-kernel, linaro-mm-sig, dri-devel,
	Will Deacon, Catalin Marinas, Robin Murphy,
	Russell King - ARM Linux, Joerg Roedel, Laurent Pinchart,
	Sakari Ailus, Mark Yao, Heiko Stuebner, Tomasz Figa, Inki Dae,
	Bartlomiej Zolnierkiewicz, Krzysztof Kozlowski

On Thursday 25 February 2016 13:26:17 Marek Szyprowski wrote:
> >> +}
> >> +
> >> +extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
> >> +                                     gfp_t flags);
> >> +extern bool arch_in_atomic_pool(void *start, size_t size);
> >> +extern int arch_free_from_atomic_pool(void *start, size_t size);
> >> +
> >> +
> > doesn't feel completely right yet. In particular the arch_flush_page()
> > interface is probably still too specific to ARM/ARM64 and won't work
> > that way on other architectures.
> >
> > I think it would be better to do this either more generic, or less generic:
> >
> > a) leave the iommu_dma_map_ops definition in the architecture specific
> >     code, but make it call helper functions in the drivers/iommu to do all
> >     of the really generic parts.
> >
> > b) clarify that this is only applicable to arch/arm and arch/arm64, and
> >     unify things further between these two, as they have very similar
> >     requirements in the CPU architecture.
> 
> Some really generic parts are already in iommu/dma-iommu.c, and one can
> build one's own IOMMU/DMA-mapping code for a non-ARM CPU architecture on
> top of them. Initially I also wanted to use that generic code directly on
> both ARM and ARM64, but it turned out that the two architectures would
> duplicate 99% of the code that uses these 'generic' functions. This is why
> I decided to move all that common code from
> arch/{arm,arm64}/mm/dma-mapping.c to drivers/iommu/dma-iommu-ops.c.
> 
> I'm not sure I can design all the changes needed to make
> drivers/iommu/dma-iommu-ops.c more generic. Maybe once someone tries to use
> that code with arch glue code for another, non-ARM architecture, a better
> abstraction can be developed. For now I would like to keep all this code in
> a common place so that both arm and arm64 benefit from improvements made
> there.

Fair enough. Let's stay with your approach then.

	Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
  2016-02-19  8:22   ` Marek Szyprowski
  (?)
@ 2016-03-15 11:18     ` Magnus Damm
  -1 siblings, 0 replies; 45+ messages in thread
From: Magnus Damm @ 2016-03-15 11:18 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: iommu, linux-arm-kernel, linux-kernel, Inki Dae,
	Krzysztof Kozlowski, Russell King - ARM Linux, Heiko Stuebner,
	Arnd Bergmann, Bartlomiej Zolnierkiewicz, Catalin Marinas,
	Joerg Roedel, Will Deacon, dri-devel, Tomasz Figa, linaro-mm-sig,
	Sakari Ailus, Laurent Pinchart, Mark Yao, Robin Murphy

Hi Marek,

On Fri, Feb 19, 2016 at 5:22 PM, Marek Szyprowski
<m.szyprowski@samsung.com> wrote:
> This patch replaces ARM-specific IOMMU-based DMA-mapping implementation
> with generic IOMMU DMA-mapping code shared with ARM64 architecture. The
> side-effect of this change is a switch from bitmap-based IO address space
> management to tree-based code. There should be no functional changes
> for drivers, which rely on initialization from generic arch_setup_dma_ops()
> interface. Code, which used old arm_iommu_* functions must be updated to
> new interface.
>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
> ---

Thanks for your efforts and my apologies for late comments. Just FYI
I'll try your patch (and this series) with the ipmmu-vmsa.c driver on
32-bit ARM and see how it goes. Nice not to have to support multiple
interfaces depending on architecture!

One question that comes to mind is how to handle features.

For instance, the 32-bit ARM code supports DMA_ATTR_FORCE_CONTIGUOUS
while the shared code in drivers/iommu/dma-iommu.c does not. I assume
existing users may rely on such features so from my point of view it
probably makes sense to carry over features from the 32-bit ARM code
into the shared code before pulling the plug.
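
To make the feature gap concrete, here is a small userspace model of what
DMA_ATTR_FORCE_CONTIGUOUS changes for an IOMMU-backed allocation: with the
attribute, the buffer is backed by a single physically contiguous region;
without it, the backing memory may be made up of separate pages. Everything
here is an assumption for illustration only. The MODEL_* constants and
model_chunk_count() are hypothetical, the page size is fixed at 4 KiB, and
the non-contiguous case is modelled as the worst case of one chunk per page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical attribute bit for this sketch; not the kernel's definition. */
#define MODEL_ATTR_FORCE_CONTIGUOUS	(1UL << 0)

/* Assumed page size for the model. */
#define MODEL_PAGE_SIZE			4096UL

/*
 * Number of separate physical chunks the model uses to back a buffer.
 * With the contiguous attribute the answer is always one; otherwise the
 * model assumes the worst case of one chunk per page.
 */
static size_t model_chunk_count(size_t size, unsigned long attrs)
{
	size_t pages;

	assert(size > 0);
	pages = (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;

	if (attrs & MODEL_ATTR_FORCE_CONTIGUOUS)
		return 1;

	return pages;
}
```

A shared implementation that ignored the attribute would behave like the
second branch for every caller, which is the kind of silent behaviour change
existing users of the 32-bit ARM code could run into.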

I also wonder if it is possible to do a step-by-step migration and
support both old and new interfaces in the same binary? That may make
things easier for multiplatform enablement. So far I've managed to
make one IOMMU driver support both 32-bit ARM and 64-bit ARM with some
ugly magic, so adjusting 32-bit ARM dma-mapping code to coexist with
the shared code in drivers/iommu/dma-iommu.c may also be possible. And
probably involving even more ugly magic. =)

Cheers,

/ magnus

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
@ 2016-03-15 11:45       ` Robin Murphy
  0 siblings, 0 replies; 45+ messages in thread
From: Robin Murphy @ 2016-03-15 11:45 UTC (permalink / raw)
  To: Magnus Damm, Marek Szyprowski
  Cc: linaro-mm-sig, Krzysztof Kozlowski, Russell King - ARM Linux,
	Heiko Stuebner, Arnd Bergmann, Bartlomiej Zolnierkiewicz,
	Catalin Marinas, Will Deacon, linux-kernel, dri-devel, Inki Dae,
	iommu, Sakari Ailus, Laurent Pinchart, linux-arm-kernel,
	Mark Yao

Hi Magnus,

On 15/03/16 11:18, Magnus Damm wrote:
> Hi Marek,
>
> On Fri, Feb 19, 2016 at 5:22 PM, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> This patch replaces the ARM-specific IOMMU-based DMA-mapping implementation
>> with the generic IOMMU DMA-mapping code shared with the ARM64 architecture.
>> A side effect of this change is a switch from bitmap-based IO address space
>> management to tree-based code. There should be no functional changes for
>> drivers that rely on initialization from the generic arch_setup_dma_ops()
>> interface. Code that used the old arm_iommu_* functions must be updated to
>> the new interface.
>>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> ---
>
> Thanks for your efforts and my apologies for late comments. Just FYI
> I'll try your patch (and this series) with the ipmmu-vmsa.c driver on
> 32-bit ARM and see how it goes. Nice not to have to support multiple
> interfaces depending on architecture!
>
> One question that comes to mind is how to handle features.
>
> For instance, the 32-bit ARM code supports DMA_ATTR_FORCE_CONTIGUOUS
> while the shared code in drivers/iommu/dma-iommu.c does not. I assume
> existing users may rely on such features so from my point of view it
> probably makes sense to carry over features from the 32-bit ARM code
> into the shared code before pulling the plug.

Indeed - the patch I posted the other day doing proper scatterlist 
merging in the common code is largely to that end.

> I also wonder if it is possible to do a step-by-step migration and
> support both old and new interfaces in the same binary? That may make
> things easier for multiplatform enablement. So far I've managed to
> make one IOMMU driver support both 32-bit ARM and 64-bit ARM with some
> ugly magic, so adjusting 32-bit ARM dma-mapping code to coexist with
> the shared code in drivers/iommu/dma-iommu.c may also be possible. And
> probably involving even more ugly magic. =)

That was also my thought when I tried to look at this a while ago - I 
started on some patches moving the bitmap from dma_iommu_mapping into 
the iommu_domain->iova_cookie so that the existing code and users could 
then be converted to just passing iommu_domains around, after which it 
should be fairly painless to swap out the back-end implementation 
transparently. That particular effort ground to a halt upon realising 
the number of IOMMU and DRM drivers I'd have no way of testing - if
you're interested, I've dug out the diff below from an old
work-in-progress branch (which probably doesn't even compile).
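
As an aside for readers, the indirection described above can be
modelled in plain user-space C. This is only a sketch under stated
assumptions: the three structs are cut-down stand-ins rather than the
kernel definitions, and get_domain_for_dev(), create_mapping() and
mapping_for_dev() are hypothetical helpers that mirror what
iommu_get_domain_for_dev() and arm_iommu_create_mapping() do in the
diff.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures; only the fields needed
 * to show the pointer relationships are modelled (assumption). */
struct iommu_domain {
	void *iova_cookie;		/* allocator-private data hangs here */
};

struct dma_iommu_mapping {
	struct iommu_domain *domain;	/* back-pointer kept by the old code */
	unsigned long base;		/* start of the IO address window */
};

struct device {
	struct iommu_domain *domain;	/* set when the device is attached */
};

/* Hypothetical stand-in for iommu_get_domain_for_dev(). */
static struct iommu_domain *get_domain_for_dev(struct device *dev)
{
	return dev->domain;
}

/* Link the two objects and hand back only the domain to the caller. */
static struct iommu_domain *create_mapping(struct iommu_domain *dom,
					   struct dma_iommu_mapping *mapping,
					   unsigned long base)
{
	assert(dom && mapping);

	mapping->base = base;
	mapping->domain = dom;
	dom->iova_cookie = mapping;
	return dom;
}

/* Recover the allocator state starting from nothing but the device. */
static struct dma_iommu_mapping *mapping_for_dev(struct device *dev)
{
	struct iommu_domain *dom = get_domain_for_dev(dev);

	return dom ? dom->iova_cookie : NULL;
}
```

Because callers hold only a struct iommu_domain pointer, whatever sits
behind iova_cookie (a bitmap today, something else later) can change
without touching them, which is the point of the conversion.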

Robin.

>
> Cheers,
>
> / magnus

--->8---
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 4111592..6ea939c 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,9 +14,6 @@ struct dev_archdata {
  #ifdef CONFIG_IOMMU_API
  	void *iommu; /* private IOMMU data */
  #endif
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-	struct dma_iommu_mapping	*mapping;
-#endif
  	bool dma_coherent;
  };

@@ -28,10 +25,4 @@ struct pdev_archdata {
  #endif
  };

-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-#define to_dma_iommu_mapping(dev) ((dev)->archdata.mapping)
-#else
-#define to_dma_iommu_mapping(dev) NULL
-#endif
-
  #endif
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
index 2ef282f..e15197d 100644
--- a/arch/arm/include/asm/dma-iommu.h
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -24,13 +24,12 @@ struct dma_iommu_mapping {
  	struct kref		kref;
  };

-struct dma_iommu_mapping *
+struct iommu_domain *
  arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size);

-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);
+void arm_iommu_release_mapping(struct iommu_domain *mapping);

-int arm_iommu_attach_device(struct device *dev,
-					struct dma_iommu_mapping *mapping);
+int arm_iommu_attach_device(struct device *dev, struct iommu_domain *mapping);
  void arm_iommu_detach_device(struct device *dev);

  #endif /* __KERNEL__ */
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e62400e..dfb5001 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1246,7 +1246,8 @@ __iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot,
  static dma_addr_t
  __iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
+	struct dma_iommu_mapping *mapping = dom->iova_cookie;
  	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
  	dma_addr_t dma_addr, iova;
  	int i;
@@ -1268,8 +1269,7 @@ __iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
  				break;

  		len = (j - i) << PAGE_SHIFT;
-		ret = iommu_map(mapping->domain, iova, phys, len,
-				IOMMU_READ|IOMMU_WRITE);
+		ret = iommu_map(dom, iova, phys, len, IOMMU_READ|IOMMU_WRITE);
  		if (ret < 0)
  			goto fail;
  		iova += len;
@@ -1277,14 +1277,14 @@ __iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
  	}
  	return dma_addr;
  fail:
-	iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
+	iommu_unmap(dom, dma_addr, iova-dma_addr);
  	__free_iova(mapping, dma_addr, size);
  	return DMA_ERROR_CODE;
  }

  static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);

  	/*
  	 * add optional in-page offset from iova to size and align
@@ -1293,8 +1293,8 @@ static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t si
  	size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
  	iova &= PAGE_MASK;

-	iommu_unmap(mapping->domain, iova, size);
-	__free_iova(mapping, iova, size);
+	iommu_unmap(dom, iova, size);
+	__free_iova(dom->iova_cookie, iova, size);
  	return 0;
  }

@@ -1506,7 +1506,8 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
  			  enum dma_data_direction dir, struct dma_attrs *attrs,
  			  bool is_coherent)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
+	struct dma_iommu_mapping *mapping = dom->iova_cookie;
  	dma_addr_t iova, iova_base;
  	int ret = 0;
  	unsigned int count;
@@ -1530,7 +1531,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,

  		prot = __dma_direction_to_prot(dir);

-		ret = iommu_map(mapping->domain, iova, phys, len, prot);
+		ret = iommu_map(dom, iova, phys, len, prot);
  		if (ret < 0)
  			goto fail;
  		count += len >> PAGE_SHIFT;
@@ -1540,7 +1541,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,

  	return 0;
  fail:
-	iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
+	iommu_unmap(dom, iova_base, count * PAGE_SIZE);
  	__free_iova(mapping, iova_base, size);
  	return ret;
  }
@@ -1727,7 +1728,8 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p
  	     unsigned long offset, size_t size, enum dma_data_direction dir,
  	     struct dma_attrs *attrs)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
+	struct dma_iommu_mapping *mapping = dom->iova_cookie;
  	dma_addr_t dma_addr;
  	int ret, prot, len = PAGE_ALIGN(size + offset);

@@ -1737,7 +1739,7 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p

  	prot = __dma_direction_to_prot(dir);

-	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
+	ret = iommu_map(dom, dma_addr, page_to_phys(page), len, prot);
  	if (ret < 0)
  		goto fail;

@@ -1780,7 +1782,7 @@ static void arm_coherent_iommu_unmap_page(struct device *dev, dma_addr_t handle,
  		size_t size, enum dma_data_direction dir,
  		struct dma_attrs *attrs)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
  	dma_addr_t iova = handle & PAGE_MASK;
  	int offset = handle & ~PAGE_MASK;
  	int len = PAGE_ALIGN(size + offset);
@@ -1788,8 +1790,8 @@ static void arm_coherent_iommu_unmap_page(struct device *dev, dma_addr_t handle,
  	if (!iova)
  		return;

-	iommu_unmap(mapping->domain, iova, len);
-	__free_iova(mapping, iova, len);
+	iommu_unmap(dom, iova, len);
+	__free_iova(dom->iova_cookie, iova, len);
  }

  /**
@@ -1805,9 +1807,9 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
  		size_t size, enum dma_data_direction dir,
  		struct dma_attrs *attrs)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
  	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+	struct page *page = phys_to_page(iommu_iova_to_phys(dom, iova));
  	int offset = handle & ~PAGE_MASK;
  	int len = PAGE_ALIGN(size + offset);

@@ -1817,16 +1819,16 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
  	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
  		__dma_page_dev_to_cpu(page, offset, size, dir);

-	iommu_unmap(mapping->domain, iova, len);
-	__free_iova(mapping, iova, len);
+	iommu_unmap(dom, iova, len);
+	__free_iova(dom->iova_cookie, iova, len);
  }

  static void arm_iommu_sync_single_for_cpu(struct device *dev,
  		dma_addr_t handle, size_t size, enum dma_data_direction dir)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
  	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+	struct page *page = phys_to_page(iommu_iova_to_phys(dom, iova));
  	unsigned int offset = handle & ~PAGE_MASK;

  	if (!iova)
@@ -1838,9 +1840,9 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
  static void arm_iommu_sync_single_for_device(struct device *dev,
  		dma_addr_t handle, size_t size, enum dma_data_direction dir)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
  	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+	struct page *page = phys_to_page(iommu_iova_to_phys(dom, iova));
  	unsigned int offset = handle & ~PAGE_MASK;

  	if (!iova)
@@ -1896,12 +1898,13 @@ struct dma_map_ops iommu_coherent_ops = {
   * The client device need to be attached to the mapping with
   * arm_iommu_attach_device function.
   */
-struct dma_iommu_mapping *
+struct iommu_domain *
  arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size)
  {
  	unsigned int bits = size >> PAGE_SHIFT;
  	unsigned int bitmap_size = BITS_TO_LONGS(bits) * sizeof(long);
  	struct dma_iommu_mapping *mapping;
+	struct iommu_domain *dom;
  	int extensions = 1;
  	int err = -ENOMEM;

@@ -1938,12 +1941,14 @@ arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size)

  	spin_lock_init(&mapping->lock);

-	mapping->domain = iommu_domain_alloc(bus);
-	if (!mapping->domain)
+	dom = iommu_domain_alloc(bus);
+	if (!dom)
  		goto err4;

+	mapping->domain = dom;
+	dom->iova_cookie = mapping;
  	kref_init(&mapping->kref);
-	return mapping;
+	return dom;
  err4:
  	kfree(mapping->bitmaps[0]);
  err3:
@@ -1986,24 +1991,27 @@ static int extend_iommu_mapping(struct dma_iommu_mapping *mapping)
  	return 0;
  }

-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
+void arm_iommu_release_mapping(struct iommu_domain *domain)
  {
-	if (mapping)
+	if (domain) {
+		struct dma_iommu_mapping *mapping = domain->iova_cookie;
+
  		kref_put(&mapping->kref, release_iommu_mapping);
+	}
  }
  EXPORT_SYMBOL_GPL(arm_iommu_release_mapping);

  static int __arm_iommu_attach_device(struct device *dev,
-				     struct dma_iommu_mapping *mapping)
+				     struct iommu_domain *domain)
  {
  	int err;
+	struct dma_iommu_mapping *mapping = domain->iova_cookie;

-	err = iommu_attach_device(mapping->domain, dev);
+	err = iommu_attach_device(domain, dev);
  	if (err)
  		return err;

  	kref_get(&mapping->kref);
-	to_dma_iommu_mapping(dev) = mapping;

  	pr_debug("Attached IOMMU controller to %s device.\n", dev_name(dev));
  	return 0;
@@ -2023,7 +2031,7 @@ static int __arm_iommu_attach_device(struct device *dev,
   * mapping.
   */
  int arm_iommu_attach_device(struct device *dev,
-			    struct dma_iommu_mapping *mapping)
+			    struct iommu_domain *mapping)
  {
  	int err;

@@ -2039,16 +2047,17 @@ EXPORT_SYMBOL_GPL(arm_iommu_attach_device);
  static void __arm_iommu_detach_device(struct device *dev)
  {
  	struct dma_iommu_mapping *mapping;
+	struct iommu_domain *dom;

-	mapping = to_dma_iommu_mapping(dev);
-	if (!mapping) {
+	dom = iommu_get_domain_for_dev(dev);
+	if (!dom) {
  		dev_warn(dev, "Not attached\n");
  		return;
  	}

-	iommu_detach_device(mapping->domain, dev);
+	mapping = dom->iova_cookie;
+	iommu_detach_device(dom, dev);
  	kref_put(&mapping->kref, release_iommu_mapping);
-	to_dma_iommu_mapping(dev) = NULL;

  	pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
  }
@@ -2075,7 +2084,7 @@ static struct dma_map_ops *arm_get_iommu_dma_map_ops(bool coherent)
  static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
  				    struct iommu_ops *iommu)
  {
-	struct dma_iommu_mapping *mapping;
+	struct iommu_domain *mapping;

  	if (!iommu)
  		return false;
@@ -2099,13 +2108,13 @@ static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,

  static void arm_teardown_iommu_dma_ops(struct device *dev)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *mapping = iommu_get_domain_for_dev(dev);

  	if (!mapping)
  		return;

  	__arm_iommu_detach_device(dev);
-	arm_iommu_release_mapping(mapping);
+	arm_iommu_release_mapping(mapping);
  }

  #else

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
@ 2016-03-15 11:45       ` Robin Murphy
  0 siblings, 0 replies; 45+ messages in thread
From: Robin Murphy @ 2016-03-15 11:45 UTC (permalink / raw)
  To: Magnus Damm, Marek Szyprowski
  Cc: Inki Dae, Krzysztof Kozlowski, Russell King - ARM Linux,
	Heiko Stuebner, Arnd Bergmann, Bartlomiej Zolnierkiewicz,
	Catalin Marinas, Will Deacon, linux-kernel,
	dri-devel, linaro-mm-sig, iommu, Sakari Ailus, Laurent Pinchart,
	linux-arm-kernel, Mark Yao

Hi Magnus,

On 15/03/16 11:18, Magnus Damm wrote:
> Hi Marek,
>
> On Fri, Feb 19, 2016 at 5:22 PM, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> This patch replaces the ARM-specific IOMMU-based DMA-mapping implementation
>> with the generic IOMMU DMA-mapping code shared with the ARM64 architecture.
>> A side effect of this change is a switch from bitmap-based IO address space
>> management to tree-based code. There should be no functional changes for
>> drivers that rely on initialization from the generic arch_setup_dma_ops()
>> interface. Code that used the old arm_iommu_* functions must be updated to
>> the new interface.
>>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> ---
>
> Thanks for your efforts and my apologies for late comments. Just FYI
> I'll try your patch (and this series) with the ipmmu-vmsa.c driver on
> 32-bit ARM and see how it goes. Nice not to have to support multiple
> interfaces depending on architecture!
>
> One question that comes to mind is how to handle features.
>
> For instance, the 32-bit ARM code supports DMA_ATTR_FORCE_CONTIGUOUS
> while the shared code in drivers/iommu/dma-iommu.c does not. I assume
> existing users may rely on such features so from my point of view it
> probably makes sense to carry over features from the 32-bit ARM code
> into the shared code before pulling the plug.

Indeed - the patch I posted the other day doing proper scatterlist 
merging in the common code is largely to that end.

> I also wonder if it is possible to do a step-by-step migration and
> support both old and new interfaces in the same binary? That may make
> things easier for multiplatform enablement. So far I've managed to
> make one IOMMU driver support both 32-bit ARM and 64-bit ARM with some
> ugly magic, so adjusting 32-bit ARM dma-mapping code to coexist with
> the shared code in drivers/iommu/dma-iommu.c may also be possible. And
> probably involving even more ugly magic. =)

That was also my thought when I tried to look at this a while ago - I 
started on some patches moving the bitmap from dma_iommu_mapping into 
the iommu_domain->iova_cookie so that the existing code and users could 
then be converted to just passing iommu_domains around, after which it 
should be fairly painless to swap out the back-end implementation 
transparently. That particular effort ground to a halt upon realising 
the number of IOMMU and DRM drivers I'd have no way of testing - if
you're interested, I've dug out the diff below from an old
work-in-progress branch (which probably doesn't even compile).

Robin.

>
> Cheers,
>
> / magnus

--->8---
diff --git a/arch/arm/include/asm/device.h b/arch/arm/include/asm/device.h
index 4111592..6ea939c 100644
--- a/arch/arm/include/asm/device.h
+++ b/arch/arm/include/asm/device.h
@@ -14,9 +14,6 @@ struct dev_archdata {
  #ifdef CONFIG_IOMMU_API
  	void *iommu; /* private IOMMU data */
  #endif
-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-	struct dma_iommu_mapping	*mapping;
-#endif
  	bool dma_coherent;
  };

@@ -28,10 +25,4 @@ struct pdev_archdata {
  #endif
  };

-#ifdef CONFIG_ARM_DMA_USE_IOMMU
-#define to_dma_iommu_mapping(dev) ((dev)->archdata.mapping)
-#else
-#define to_dma_iommu_mapping(dev) NULL
-#endif
-
  #endif
diff --git a/arch/arm/include/asm/dma-iommu.h b/arch/arm/include/asm/dma-iommu.h
index 2ef282f..e15197d 100644
--- a/arch/arm/include/asm/dma-iommu.h
+++ b/arch/arm/include/asm/dma-iommu.h
@@ -24,13 +24,12 @@ struct dma_iommu_mapping {
  	struct kref		kref;
  };

-struct dma_iommu_mapping *
+struct iommu_domain *
  arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size);

-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping);
+void arm_iommu_release_mapping(struct iommu_domain *mapping);

-int arm_iommu_attach_device(struct device *dev,
-					struct dma_iommu_mapping *mapping);
+int arm_iommu_attach_device(struct device *dev, struct iommu_domain *mapping);
  void arm_iommu_detach_device(struct device *dev);

  #endif /* __KERNEL__ */
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index e62400e..dfb5001 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1246,7 +1246,8 @@ __iommu_alloc_remap(struct page **pages, size_t size, gfp_t gfp, pgprot_t prot,
  static dma_addr_t
  __iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
+	struct dma_iommu_mapping *mapping = dom->iova_cookie;
  	unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
  	dma_addr_t dma_addr, iova;
  	int i;
@@ -1268,8 +1269,7 @@ __iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
  				break;

  		len = (j - i) << PAGE_SHIFT;
-		ret = iommu_map(mapping->domain, iova, phys, len,
-				IOMMU_READ|IOMMU_WRITE);
+		ret = iommu_map(dom, iova, phys, len, IOMMU_READ|IOMMU_WRITE);
  		if (ret < 0)
  			goto fail;
  		iova += len;
@@ -1277,14 +1277,14 @@ __iommu_create_mapping(struct device *dev, struct page **pages, size_t size)
  	}
  	return dma_addr;
  fail:
-	iommu_unmap(mapping->domain, dma_addr, iova-dma_addr);
+	iommu_unmap(dom, dma_addr, iova-dma_addr);
  	__free_iova(mapping, dma_addr, size);
  	return DMA_ERROR_CODE;
  }

  static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t size)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);

  	/*
  	 * add optional in-page offset from iova to size and align
@@ -1293,8 +1293,8 @@ static int __iommu_remove_mapping(struct device *dev, dma_addr_t iova, size_t si
  	size = PAGE_ALIGN((iova & ~PAGE_MASK) + size);
  	iova &= PAGE_MASK;

-	iommu_unmap(mapping->domain, iova, size);
-	__free_iova(mapping, iova, size);
+	iommu_unmap(dom, iova, size);
+	__free_iova(dom->iova_cookie, iova, size);
  	return 0;
  }

@@ -1506,7 +1506,8 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,
  			  enum dma_data_direction dir, struct dma_attrs *attrs,
  			  bool is_coherent)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
+	struct dma_iommu_mapping *mapping = dom->iova_cookie;
  	dma_addr_t iova, iova_base;
  	int ret = 0;
  	unsigned int count;
@@ -1530,7 +1531,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,

  		prot = __dma_direction_to_prot(dir);

-		ret = iommu_map(mapping->domain, iova, phys, len, prot);
+		ret = iommu_map(dom, iova, phys, len, prot);
  		if (ret < 0)
  			goto fail;
  		count += len >> PAGE_SHIFT;
@@ -1540,7 +1541,7 @@ static int __map_sg_chunk(struct device *dev, struct scatterlist *sg,

  	return 0;
  fail:
-	iommu_unmap(mapping->domain, iova_base, count * PAGE_SIZE);
+	iommu_unmap(dom, iova_base, count * PAGE_SIZE);
  	__free_iova(mapping, iova_base, size);
  	return ret;
  }
@@ -1727,7 +1728,8 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p
  	     unsigned long offset, size_t size, enum dma_data_direction dir,
  	     struct dma_attrs *attrs)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
+	struct dma_iommu_mapping *mapping = dom->iova_cookie;
  	dma_addr_t dma_addr;
  	int ret, prot, len = PAGE_ALIGN(size + offset);

@@ -1737,7 +1739,7 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p

  	prot = __dma_direction_to_prot(dir);

-	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
+	ret = iommu_map(dom, dma_addr, page_to_phys(page), len, prot);
  	if (ret < 0)
  		goto fail;

@@ -1780,7 +1782,7 @@ static void arm_coherent_iommu_unmap_page(struct device *dev, dma_addr_t handle,
  		size_t size, enum dma_data_direction dir,
  		struct dma_attrs *attrs)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
  	dma_addr_t iova = handle & PAGE_MASK;
  	int offset = handle & ~PAGE_MASK;
  	int len = PAGE_ALIGN(size + offset);
@@ -1788,8 +1790,8 @@ static void arm_coherent_iommu_unmap_page(struct device *dev, dma_addr_t handle,
  	if (!iova)
  		return;

-	iommu_unmap(mapping->domain, iova, len);
-	__free_iova(mapping, iova, len);
+	iommu_unmap(dom, iova, len);
+	__free_iova(dom->iova_cookie, iova, len);
  }

  /**
@@ -1805,9 +1807,9 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
  		size_t size, enum dma_data_direction dir,
  		struct dma_attrs *attrs)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
  	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+	struct page *page = phys_to_page(iommu_iova_to_phys(dom, iova));
  	int offset = handle & ~PAGE_MASK;
  	int len = PAGE_ALIGN(size + offset);

@@ -1817,16 +1819,16 @@ static void arm_iommu_unmap_page(struct device *dev, dma_addr_t handle,
  	if (!dma_get_attr(DMA_ATTR_SKIP_CPU_SYNC, attrs))
  		__dma_page_dev_to_cpu(page, offset, size, dir);

-	iommu_unmap(mapping->domain, iova, len);
-	__free_iova(mapping, iova, len);
+	iommu_unmap(dom, iova, len);
+	__free_iova(dom->iova_cookie, iova, len);
  }

  static void arm_iommu_sync_single_for_cpu(struct device *dev,
  		dma_addr_t handle, size_t size, enum dma_data_direction dir)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
  	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+	struct page *page = phys_to_page(iommu_iova_to_phys(dom, iova));
  	unsigned int offset = handle & ~PAGE_MASK;

  	if (!iova)
@@ -1838,9 +1840,9 @@ static void arm_iommu_sync_single_for_cpu(struct device *dev,
  static void arm_iommu_sync_single_for_device(struct device *dev,
  		dma_addr_t handle, size_t size, enum dma_data_direction dir)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *dom = iommu_get_domain_for_dev(dev);
  	dma_addr_t iova = handle & PAGE_MASK;
-	struct page *page = phys_to_page(iommu_iova_to_phys(mapping->domain, iova));
+	struct page *page = phys_to_page(iommu_iova_to_phys(dom, iova));
  	unsigned int offset = handle & ~PAGE_MASK;

  	if (!iova)
@@ -1896,12 +1898,13 @@ struct dma_map_ops iommu_coherent_ops = {
   * The client device need to be attached to the mapping with
   * arm_iommu_attach_device function.
   */
-struct dma_iommu_mapping *
+struct iommu_domain *
  arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size)
  {
  	unsigned int bits = size >> PAGE_SHIFT;
  	unsigned int bitmap_size = BITS_TO_LONGS(bits) * sizeof(long);
  	struct dma_iommu_mapping *mapping;
+	struct iommu_domain *dom;
  	int extensions = 1;
  	int err = -ENOMEM;

@@ -1938,12 +1941,14 @@ arm_iommu_create_mapping(struct bus_type *bus, dma_addr_t base, u64 size)

  	spin_lock_init(&mapping->lock);

-	mapping->domain = iommu_domain_alloc(bus);
-	if (!mapping->domain)
+	dom = iommu_domain_alloc(bus);
+	if (!dom)
  		goto err4;

+	mapping->domain = dom;
+	dom->iova_cookie = mapping;
  	kref_init(&mapping->kref);
-	return mapping;
+	return dom;
  err4:
  	kfree(mapping->bitmaps[0]);
  err3:
@@ -1986,24 +1991,27 @@ static int extend_iommu_mapping(struct dma_iommu_mapping *mapping)
  	return 0;
  }

-void arm_iommu_release_mapping(struct dma_iommu_mapping *mapping)
+void arm_iommu_release_mapping(struct iommu_domain *domain)
  {
-	if (mapping)
+	if (domain) {
+		struct dma_iommu_mapping *mapping = domain->iova_cookie;
+
  		kref_put(&mapping->kref, release_iommu_mapping);
+	}
  }
  EXPORT_SYMBOL_GPL(arm_iommu_release_mapping);

  static int __arm_iommu_attach_device(struct device *dev,
-				     struct dma_iommu_mapping *mapping)
+				     struct iommu_domain *domain)
  {
  	int err;
+	struct dma_iommu_mapping *mapping = domain->iova_cookie;

-	err = iommu_attach_device(mapping->domain, dev);
+	err = iommu_attach_device(domain, dev);
  	if (err)
  		return err;

  	kref_get(&mapping->kref);
-	to_dma_iommu_mapping(dev) = mapping;

  	pr_debug("Attached IOMMU controller to %s device.\n", dev_name(dev));
  	return 0;
@@ -2023,7 +2031,7 @@ static int __arm_iommu_attach_device(struct device *dev,
   * mapping.
   */
  int arm_iommu_attach_device(struct device *dev,
-			    struct dma_iommu_mapping *mapping)
+			    struct iommu_domain *mapping)
  {
  	int err;

@@ -2039,16 +2047,17 @@ EXPORT_SYMBOL_GPL(arm_iommu_attach_device);
  static void __arm_iommu_detach_device(struct device *dev)
  {
  	struct dma_iommu_mapping *mapping;
+	struct iommu_domain *dom;

-	mapping = to_dma_iommu_mapping(dev);
-	if (!mapping) {
+	dom = iommu_get_domain_for_dev(dev);
+	if (!dom) {
  		dev_warn(dev, "Not attached\n");
  		return;
  	}

-	iommu_detach_device(mapping->domain, dev);
+	mapping = dom->iova_cookie;
+	iommu_detach_device(dom, dev);
  	kref_put(&mapping->kref, release_iommu_mapping);
-	to_dma_iommu_mapping(dev) = NULL;

  	pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
  }
@@ -2075,7 +2084,7 @@ static struct dma_map_ops *arm_get_iommu_dma_map_ops(bool coherent)
  static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,
  				    struct iommu_ops *iommu)
  {
-	struct dma_iommu_mapping *mapping;
+	struct iommu_domain *mapping;

  	if (!iommu)
  		return false;
@@ -2099,13 +2108,13 @@ static bool arm_setup_iommu_dma_ops(struct device *dev, u64 dma_base, u64 size,

  static void arm_teardown_iommu_dma_ops(struct device *dev)
  {
-	struct dma_iommu_mapping *mapping = to_dma_iommu_mapping(dev);
+	struct iommu_domain *mapping = iommu_get_domain_for_dev(dev);

  	if (!mapping)
  		return;

  	__arm_iommu_detach_device(dev);
-	arm_iommu_release_mapping(mapping);
+	arm_iommu_release_mapping(mapping);
  }

  #else

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
@ 2016-03-15 12:03       ` Marek Szyprowski
  0 siblings, 0 replies; 45+ messages in thread
From: Marek Szyprowski @ 2016-03-15 12:03 UTC (permalink / raw)
  To: Magnus Damm
  Cc: iommu, linux-arm-kernel, linux-kernel, Inki Dae,
	Krzysztof Kozlowski, Russell King - ARM Linux, Heiko Stuebner,
	Arnd Bergmann, Bartlomiej Zolnierkiewicz, Catalin Marinas,
	Joerg Roedel, Will Deacon, dri-devel, Tomasz Figa, linaro-mm-sig,
	Sakari Ailus, Laurent Pinchart, Mark Yao, Robin Murphy

Hello,

On 2016-03-15 12:18, Magnus Damm wrote:
> Hi Marek,
>
> On Fri, Feb 19, 2016 at 5:22 PM, Marek Szyprowski
> <m.szyprowski@samsung.com> wrote:
>> This patch replaces ARM-specific IOMMU-based DMA-mapping implementation
>> with generic IOMMU DMA-mapping code shared with ARM64 architecture. The
>> side-effect of this change is a switch from bitmap-based IO address space
>> management to tree-based code. There should be no functional changes
>> for drivers, which rely on initialization from generic arch_setup_dma_ops()
>> interface. Code, which used old arm_iommu_* functions must be updated to
>> new interface.
>>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>> ---
> Thanks for your efforts and my apologies for late comments. Just FYI
> I'll try your patch (and this series) with the ipmmu-vmsa.c driver on
> 32-bit ARM and see how it goes. Nice not to have to support multiple
> interfaces depending on architecture!

Thanks for testing!

> One question that comes to mind is how to handle features.
>
> For instance, the 32-bit ARM code supports DMA_ATTR_FORCE_CONTIGUOUS
> while the shared code in drivers/iommu/dma-iommu.c does not. I assume
> existing users may rely on such features so from my point of view it
> probably makes sense to carry over features from the 32-bit ARM code
> into the shared code before pulling the plug.

Right, this has to be added to common code before merging.

> I also wonder if it is possible to do a step-by-step migration and
> support both old and new interfaces in the same binary? That may make
> things easier for multiplatform enablement. So far I've managed to
> make one IOMMU driver support both 32-bit ARM and 64-bit ARM with some
> ugly magic, so adjusting 32-bit ARM dma-mapping code to coexist with
> the shared code in drivers/iommu/dma-iommu.c may also be possible. And
> probably involving even more ugly magic. =)

Having one IOMMU driver for both 32-bit and 64-bit ARM archs is quite easy
IF you rely on the iommu core to set up everything. See the exynos-iommu
driver: after my last patches it now works fine on both archs (using
arch-specific interfaces). Most of the magic is done automatically by
arch_setup_dma_ops().
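
As a rough userspace illustration of that selection step (this is not
kernel code, and every model_* name below is made up for the sketch): a
setup hook picks the DMA ops once per device, so the driver itself never
has to know which architecture or which mapping backend it runs on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; illustration only. */
struct model_dma_ops {
	const char *name;
};

struct model_iommu {
	const char *name;
};

struct model_device {
	const struct model_dma_ops *dma_ops;
};

static const struct model_dma_ops model_direct_ops = { "direct" };
static const struct model_dma_ops model_iommu_ops = { "iommu" };
static const struct model_dma_ops model_iommu_coherent_ops = { "iommu-coherent" };

/*
 * Model of an arch setup hook: called once per device, it installs
 * IOMMU-backed ops when an IOMMU is present (coherent or non-coherent
 * flavour) and falls back to direct ops otherwise.
 */
static void model_setup_dma_ops(struct model_device *dev,
				const struct model_iommu *iommu, bool coherent)
{
	assert(dev != NULL);

	if (!iommu)
		dev->dma_ops = &model_direct_ops;
	else
		dev->dma_ops = coherent ? &model_iommu_coherent_ops
					: &model_iommu_ops;
}
```

A driver in this model only ever goes through dev->dma_ops, which is why
it does not need any arch-specific attach calls of its own.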

The real problem is that there are drivers (like DRM) which rely on
specific dma-mapping functions from the ARM architecture and which need
to be rewritten.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
  2016-02-19 10:30     ` Arnd Bergmann
  (?)
@ 2016-03-15 12:33       ` Robin Murphy
  -1 siblings, 0 replies; 45+ messages in thread
From: Robin Murphy @ 2016-03-15 12:33 UTC (permalink / raw)
  To: Arnd Bergmann, Marek Szyprowski
  Cc: Inki Dae, Krzysztof Kozlowski, Russell King - ARM Linux,
	Heiko Stuebner, Bartlomiej Zolnierkiewicz, Catalin Marinas,
	Will Deacon, linux-kernel, dri-devel, linaro-mm-sig, iommu,
	Sakari Ailus, Laurent Pinchart, linux-arm-kernel, Mark Yao

Hi Marek, Arnd,

On 19/02/16 10:30, Arnd Bergmann wrote:
> On Friday 19 February 2016 09:22:44 Marek Szyprowski wrote:
>> This patch replaces ARM-specific IOMMU-based DMA-mapping implementation
>> with generic IOMMU DMA-mapping code shared with ARM64 architecture. The
>> side-effect of this change is a switch from bitmap-based IO address space
>> management to tree-based code. There should be no functional changes
>> for drivers, which rely on initialization from generic arch_setup_dma_ops()
>> interface. Code, which used old arm_iommu_* functions must be updated to
>> new interface.
>>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>
> I like the overall idea. However, this interface from the iommu
> subsystem into architecture specific code:
>
>> +/*
>> + * The DMA API is built upon the notion of "buffer ownership".  A buffer
>> + * is either exclusively owned by the CPU (and therefore may be accessed
>> + * by it) or exclusively owned by the DMA device.  These helper functions
>> + * represent the transitions between these two ownership states.
>> + *
>> + * Note, however, that on later ARMs, this notion does not work due to
>> + * speculative prefetches.  We model our approach on the assumption that
>> + * the CPU does do speculative prefetches, which means we clean caches
>> + * before transfers and delay cache invalidation until transfer completion.
>> + *
>> + */
>> +extern void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t,
>> +				  enum dma_data_direction);
>> +extern void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t,
>> +				  enum dma_data_direction);
>> +
>> +static inline void arch_flush_page(struct device *dev, const void *virt,
>> +			    phys_addr_t phys)
>> +{
>> +	dmac_flush_range(virt, virt + PAGE_SIZE);
>> +	outer_flush_range(phys, phys + PAGE_SIZE);
>> +}
>> +
>> +static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
>> +				     enum dma_data_direction dir)
>> +{
>> +	unsigned int offset = phys & ~PAGE_MASK;
>> +	__dma_page_cpu_to_dev(phys_to_page(phys & PAGE_MASK), offset, size, dir);
>> +}
>> +
>> +static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
>> +				       enum dma_data_direction dir)
>> +{
>> +	unsigned int offset = phys & ~PAGE_MASK;
>> +	__dma_page_dev_to_cpu(phys_to_page(phys & PAGE_MASK), offset, size, dir);
>> +}
>> +
>> +static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
>> +					pgprot_t prot, bool coherent)
>> +{
>> +	if (coherent)
>> +		return prot;
>> +
>> +	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
>> +			    pgprot_writecombine(prot) :
>> +			    pgprot_dmacoherent(prot);
>> +	return prot;
>> +}
>> +
>> +extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
>> +					 gfp_t flags);
>> +extern bool arch_in_atomic_pool(void *start, size_t size);
>> +extern int arch_free_from_atomic_pool(void *start, size_t size);
>> +
>> +
>
> doesn't feel completely right yet. In particular the arch_flush_page()
> interface is probably still too specific to ARM/ARM64 and won't work
> that way on other architectures.
>
> I think it would be better to do this either more generic, or less generic:
>
> a) leave the iommu_dma_map_ops definition in the architecture specific
>     code, but make it call helper functions in the drivers/iommu to do all
>     of the really generic parts.

This was certainly the original intent of the arm64 code. The division 
of responsibility there is a conscious decision - IOMMU-API-wrangling 
goes in the common code, cache maintenance and actual dma_map_ops stay 
hidden in architecture-private code, safe from abuse. It's very much 
modelled on SWIOTLB.
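
To make that split concrete, here is a minimal userspace sketch (not the
arm64 code; all model_* names, the bump allocator and the 4K page size
are assumptions for illustration). The "common" function only does
address bookkeeping, while the "arch" wrapper owns the cache maintenance
and is the only thing a dma_map_ops-style caller would see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 0x1000ULL
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/* ---- "common" layer: IOVA bookkeeping only, no cache knowledge ---- */
static uint64_t model_next_iova = 0x10000;

static uint64_t model_common_map(uint64_t phys, uint64_t size)
{
	uint64_t offset = phys & ~MODEL_PAGE_MASK;
	/* Round the in-page offset plus size up to whole pages. */
	uint64_t len = (offset + size + MODEL_PAGE_SIZE - 1) & MODEL_PAGE_MASK;
	uint64_t iova = model_next_iova;

	assert(size > 0);
	model_next_iova += len;	/* trivial bump allocator, never frees */
	return iova + offset;
}

/* ---- "arch" layer: cache maintenance stays private to this side ---- */
static unsigned int model_cache_cleans;

static void model_arch_cache_clean(uint64_t phys, uint64_t size)
{
	(void)phys;
	(void)size;
	model_cache_cleans++;	/* stands in for real cache maintenance */
}

static uint64_t model_arch_map_page(uint64_t phys, uint64_t size, bool coherent)
{
	/* Non-coherent devices need the CPU caches cleaned before DMA. */
	if (!coherent)
		model_arch_cache_clean(phys, size);
	return model_common_map(phys, size);
}
```

Nothing in the common half can reach the cache helpers here, which is
the property the paragraph above is arguing for.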

Given all the work Russell did last year getting rid of direct uses of 
the dmac_* cache maintenance functions by ARM drivers, I don't think 
bringing all of that back is a good way to go. Personally, I'd much 
rather see several dozen lines of very similar-looking (other than 
highmem and outer cache stuff) arch-private code if it maintains a 
robust and clearly defined abstraction (and avoids yet another level of 
indirection). It also seems a little odd to factor out only half the 
file on the grounds of architectural similarity, when that argument 
applies equally to the other (non-IOMMU) half too. I think the recent 
tree-wide conversion to generic dma_map_ops was in part motivated by the 
thought of common implementations, so I'm sure that's something we can 
revisit in due course.

Robin.

>
> b) clarify that this is only applicable to arch/arm and arch/arm64, and
>     unify things further between these two, as they have very similar
>     requirements in the CPU architecture.
>
> 	Arnd

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
@ 2016-03-15 12:33       ` Robin Murphy
  0 siblings, 0 replies; 45+ messages in thread
From: Robin Murphy @ 2016-03-15 12:33 UTC (permalink / raw)
  To: Arnd Bergmann, Marek Szyprowski
  Cc: linaro-mm-sig, Krzysztof Kozlowski, Russell King - ARM Linux,
	Bartlomiej Zolnierkiewicz, Catalin Marinas, Will Deacon,
	linux-kernel, dri-devel, iommu, Sakari Ailus, Laurent Pinchart,
	linux-arm-kernel

Hi Marek, Arnd,

On 19/02/16 10:30, Arnd Bergmann wrote:
> On Friday 19 February 2016 09:22:44 Marek Szyprowski wrote:
>> This patch replaces ARM-specific IOMMU-based DMA-mapping implementation
>> with generic IOMMU DMA-mapping code shared with ARM64 architecture. The
>> side-effect of this change is a switch from bitmap-based IO address space
>> management to tree-based code. There should be no functional changes
>> for drivers that rely on initialization from the generic arch_setup_dma_ops()
>> interface. Code that used the old arm_iommu_* functions must be updated to
>> the new interface.
>>
>> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
>
> I like the overall idea. However, this interface from the iommu
> subsystem into architecture specific code:
>
>> +/*
>> + * The DMA API is built upon the notion of "buffer ownership".  A buffer
>> + * is either exclusively owned by the CPU (and therefore may be accessed
>> + * by it) or exclusively owned by the DMA device.  These helper functions
>> + * represent the transitions between these two ownership states.
>> + *
>> + * Note, however, that on later ARMs, this notion does not work due to
>> + * speculative prefetches.  We model our approach on the assumption that
>> + * the CPU does do speculative prefetches, which means we clean caches
>> + * before transfers and delay cache invalidation until transfer completion.
>> + *
>> + */
>> +extern void __dma_page_cpu_to_dev(struct page *, unsigned long, size_t,
>> +				  enum dma_data_direction);
>> +extern void __dma_page_dev_to_cpu(struct page *, unsigned long, size_t,
>> +				  enum dma_data_direction);
>> +
>> +static inline void arch_flush_page(struct device *dev, const void *virt,
>> +			    phys_addr_t phys)
>> +{
>> +	dmac_flush_range(virt, virt + PAGE_SIZE);
>> +	outer_flush_range(phys, phys + PAGE_SIZE);
>> +}
>> +
>> +static inline void arch_dma_map_area(phys_addr_t phys, size_t size,
>> +				     enum dma_data_direction dir)
>> +{
>> +	unsigned int offset = phys & ~PAGE_MASK;
>> +	__dma_page_cpu_to_dev(phys_to_page(phys & PAGE_MASK), offset, size, dir);
>> +}
>> +
>> +static inline void arch_dma_unmap_area(phys_addr_t phys, size_t size,
>> +				       enum dma_data_direction dir)
>> +{
>> +	unsigned int offset = phys & ~PAGE_MASK;
>> +	__dma_page_dev_to_cpu(phys_to_page(phys & PAGE_MASK), offset, size, dir);
>> +}
>> +
>> +static inline pgprot_t arch_get_dma_pgprot(struct dma_attrs *attrs,
>> +					pgprot_t prot, bool coherent)
>> +{
>> +	if (coherent)
>> +		return prot;
>> +
>> +	prot = dma_get_attr(DMA_ATTR_WRITE_COMBINE, attrs) ?
>> +			    pgprot_writecombine(prot) :
>> +			    pgprot_dmacoherent(prot);
>> +	return prot;
>> +}
>> +
>> +extern void *arch_alloc_from_atomic_pool(size_t size, struct page **ret_page,
>> +					 gfp_t flags);
>> +extern bool arch_in_atomic_pool(void *start, size_t size);
>> +extern int arch_free_from_atomic_pool(void *start, size_t size);
>> +
>> +
>
> doesn't feel completely right yet. In particular the arch_flush_page()
> interface is probably still too specific to ARM/ARM64 and won't work
> that way on other architectures.
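
As a side note on the quoted helpers: arch_dma_map_area() and
arch_dma_unmap_area() split a physical address into a page-aligned base
(phys & PAGE_MASK) and an in-page offset (phys & ~PAGE_MASK). The following
is a minimal userspace sketch of that arithmetic, not kernel code. The 4 KiB
page size and all sketch_* names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  ((uint64_t)1 << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

struct sketch_page_span {
	uint64_t page_base; /* physical address rounded down to a page */
	uint64_t offset;    /* byte offset inside that page */
};

/* Mirrors the "phys & PAGE_MASK" / "phys & ~PAGE_MASK" split. */
static struct sketch_page_span sketch_split_phys(uint64_t phys)
{
	struct sketch_page_span s;

	s.page_base = phys & SKETCH_PAGE_MASK;
	s.offset = phys & ~SKETCH_PAGE_MASK;
	return s;
}

/* Number of pages touched by a buffer of 'size' bytes starting at 'phys'. */
static uint64_t sketch_pages_touched(uint64_t phys, size_t size)
{
	uint64_t offset = phys & ~SKETCH_PAGE_MASK;

	assert(size > 0);
	return (offset + size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```

A buffer that starts near the end of a page can span two pages even when it
is much smaller than a page, which is why the offset is passed along with
the page rather than dropped.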
>
> I think it would be better to do this either more generic, or less generic:
>
> a) leave the iommu_dma_map_ops definition in the architecture specific
>     code, but make it call helper functions in the drivers/iommu to do all
>     of the really generic parts.

This was certainly the original intent of the arm64 code. The division 
of responsibility there is a conscious decision - IOMMU-API-wrangling 
goes in the common code, cache maintenance and actual dma_map_ops stay 
hidden in architecture-private code, safe from abuse. It's very much 
modelled on SWIOTLB.
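
The split described above can be illustrated with a small userspace mock.
This is only a sketch of the layering, not the arm64 implementation: every
sketch_* name is hypothetical, the "common" helper merely hands out
increasing addresses, and the "cache clean" just counts calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the common (drivers/iommu) layer: it only
 * hands out I/O virtual addresses and knows nothing about CPU caches. */
struct sketch_domain {
	uint64_t next_iova;
};

static uint64_t sketch_common_map(struct sketch_domain *d, uint64_t phys,
				  size_t size)
{
	uint64_t iova;

	assert(d != NULL);
	(void)phys; /* a real implementation would program page tables here */
	iova = d->next_iova;
	d->next_iova += size;
	return iova;
}

/* Hypothetical arch-private layer: the only place that touches caches. */
static unsigned int sketch_cache_cleans;

static void sketch_arch_clean_cache(uint64_t phys, size_t size)
{
	(void)phys;
	(void)size;
	sketch_cache_cleans++; /* counts calls; real code would clean lines */
}

/* Mock of an arch-private map_page op: it delegates address allocation to
 * the common helper and does cache maintenance itself when needed. */
static uint64_t sketch_arch_map_page(struct sketch_domain *d, uint64_t phys,
				     size_t size, bool coherent)
{
	uint64_t iova = sketch_common_map(d, phys, size);

	if (!coherent)
		sketch_arch_clean_cache(phys, size);
	return iova;
}
```

In this arrangement nothing outside the arch-private function needs to call
a cache maintenance primitive, which is the property the paragraph above
argues for.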

Given all the work Russell did last year getting rid of direct uses of
the dmac_* cache maintenance functions by ARM drivers, I don't think
bringing all of that back is a good way to go. Personally, I'd much
rather see several dozen lines of very similar-looking (other than
highmem and outer cache stuff) arch-private code if it maintains a
robust and clearly-defined abstraction (and avoids yet another level of
indirection). It does also seem a little odd to factor out only half the
file on the grounds of architectural similarity, when that argument
applies equally to the other (non-IOMMU) half too. I think the recent
tree-wide conversion to generic dma_map_ops was in part motivated by the
thought of common implementations, so I'm sure that's something we can
revisit in due course.

Robin.

>
> b) clarify that this is only applicable to arch/arm and arch/arm64, and
>     unify things further between these two, as they have very similar
>     requirements in the CPU architecture.
>
> 	Arnd
>

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC 0/3] Unify IOMMU-based DMA-mapping code for ARM and ARM64
  2016-02-19  8:22 ` Marek Szyprowski
  (?)
@ 2016-04-18  2:18   ` Mark yao
  -1 siblings, 0 replies; 45+ messages in thread
From: Mark yao @ 2016-04-18  2:18 UTC (permalink / raw)
  To: Marek Szyprowski, iommu, linux-arm-kernel, linux-kernel
  Cc: linaro-mm-sig, dri-devel, Arnd Bergmann, Will Deacon,
	Catalin Marinas, Robin Murphy, Russell King - ARM Linux,
	Joerg Roedel, Laurent Pinchart, Sakari Ailus, Heiko Stuebner,
	Tomasz Figa, Inki Dae, Bartlomiej Zolnierkiewicz,
	Krzysztof Kozlowski

Hi Marek

With your patches and a rewrite of the drm/rockchip dma-mapping code, IOMMU
works well on the drm/rockchip ARM64 platform.

Thanks for your patches.

On 2016-02-19 16:22, Marek Szyprowski wrote:
> Dear All,
>
> This is an initial RFC on the unification of IOMMU-based DMA-mapping
> code for ARM and ARM64 architectures.
>
> Right now the ARM architecture still uses my old code for the IOMMU-based
> DMA-mapping glue, initially merged in commit
> 4ce63fcd919c32d22528e54dcd89506962933719 ("ARM: dma-mapping: add support
> for IOMMU mapper"). In the meantime, ARM64 got a new, slightly improved
> implementation provided by Robin Murphy in commit
> 13b8629f651164d71f4d38b821925f93ba4236c8 ("arm64: Add IOMMU dma_ops").
>
> Both implementations are very similar, so unifying them is desirable
> to avoid duplicating future work and to simplify code that uses this
> layer on both architectures. In this patchset I've selected the new
> implementation (from the ARM64 architecture) as a base. This means that
> the old, ARM-specific interface (arm_iommu_* functions) for configuring
> IOMMU domains will no longer be available and its users have to be
> converted to the new API.
>
> Besides the lack of the old interface, the second difference is additional
> requirements for IOMMU drivers. The new code relies on support for
> IOMMU_DOMAIN_DMA and the default IOMMU domain, which is automatically
> attached by the IOMMU core.
>
> The new code also assumes that the IOMMU-based DMA-mapping ops are
> mainly configured from the arch_setup_dma_ops() function, which means that
> the IOMMU driver should provide the needed of_xlate callbacks and initialize
> IOMMU ops for device nodes. However, it should also be possible to
> initialize IOMMU-based DMA-mapping ops for client devices directly from
> IOMMU drivers by calling common_iommu_setup_dma_ops() (some drivers used
> such an approach).
>
> IOMMU drivers should also be aware of the fact that the
> default domain is attached via device_attach, and that the device_attach
> callback can then be called once again with a different domain without a
> previous detach from the default domain. For more information on this
> issue, see the following thread:
> https://lists.linaro.org/pipermail/linaro-mm-sig/2016-February/004625.html
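
One way an IOMMU driver might cope with the re-attach behaviour described
above is to remember the domain a device is currently attached to and treat
a new attach as an implicit detach. A minimal userspace sketch under that
assumption follows. The structures and sketch_attach_dev() are hypothetical
and do not correspond to any real driver or to the IOMMU core API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping, for illustration only. */
struct sketch_domain {
	int id;
	int attached_devices;
};

struct sketch_device {
	struct sketch_domain *cur; /* domain currently attached, or NULL */
};

/* Attach 'dev' to 'new_dom' even if no detach from the previous domain
 * (for example the default one) was issued first. */
static int sketch_attach_dev(struct sketch_domain *new_dom,
			     struct sketch_device *dev)
{
	assert(new_dom != NULL && dev != NULL);

	if (dev->cur == new_dom)
		return 0; /* already attached: nothing to do */

	if (dev->cur != NULL)
		dev->cur->attached_devices--; /* implicit detach */

	new_dom->attached_devices++;
	dev->cur = new_dom;
	return 0;
}
```

The point of the sketch is only that the attach path cannot assume the
device starts out unattached.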
>
> Currently there are 4 users of the old arm_iommu_* interface:
> 1. Exynos DRM driver
> 2. Rockchip DRM driver
> 3. OMAP3 ISP camera driver
> 4. Renesas VMSA-compatible IPMMU driver
>
> In this patchset I've converted the Exynos DRM driver to the new API (patch
> 1). This required some changes in the memory management model inside the
> driver and removal of some hacks, which were used to set up IOMMU-based
> DMA-mapping ops on the 'exynos-drm' virtual device and a common IOMMU
> domain for all Exynos DRM sub-devices. Those changes have been posted
> separately here: http://www.spinics.net/lists/dri-devel/msg100861.html
> The Rockchip DRM driver requires a similar conversion.
>
> Converting the OMAP3 ISP camera driver to the new API requires adding support
> for IOMMU groups to the OMAP IOMMU driver, because the new DMA/IOMMU code
> uses IOMMU_DOMAIN_DMA type domains and default groups.
>
> The Renesas IPMMU driver also needs to be extended with IOMMU_DOMAIN_DMA
> domain type support. It can also be prepared for IOMMU_OF_DECLARE and of_xlate
> callback-based initialization to let the core automatically set up the
> IOMMU-based DMA mapping implementation.
>
> Patch 2 moves existing code from arch/arm64 to drivers/iommu and
> introduces some minor changes in function names - mainly adding arch_
> prefix to some dma-mapping internal functions, which stay in arch/arm64/
> (functions of similar names are present in arch/arm). Patch 3 adapts ARM
> architecture for the common code.
>
> I would like to get your comments on the proposed approach. There is
> still some work that needs to be done to convert the remaining users of the
> old API and to update IOMMU drivers to the new API requirements. This
> change needs to be tested on all the affected ARM sub-architectures.
>
> Right now the patches have been tested only on Exynos-based boards: ARM 32-bit
> Exynos4412 and Exynos5422 boards, and ARM 64-bit Exynos 5433 (with some
> out-of-tree DTS).
>
> To ease testing I've prepared a branch with all the patches needed
> (it includes all the patches needed for the Exynos subarch, which have been
> posted as separate patchsets):
> https://git.linaro.org/people/marek.szyprowski/linux-srpol.git v4.5-dma-iommu-unification
>
> Patches are based on Linux v4.5-rc4 vanilla tree.
>
> Best regards
> Marek Szyprowski
> Samsung R&D Institute Poland
>
>
> Patch summary:
>
> Marek Szyprowski (3):
>    drm/exynos: rewrite IOMMU support code
>    iommu: dma-iommu: move IOMMU/DMA-mapping code from ARM64 arch to drivers
>    iommu: dma-iommu: use common implementation also on ARM architecture
>
>   arch/arm/Kconfig                          |   22 +-
>   arch/arm/include/asm/device.h             |    9 -
>   arch/arm/include/asm/dma-iommu.h          |   37 -
>   arch/arm/include/asm/dma-mapping.h        |   59 +-
>   arch/arm/mm/dma-mapping.c                 | 1158 +----------------------------
>   arch/arm64/include/asm/dma-mapping.h      |   39 +-
>   arch/arm64/mm/dma-mapping.c               |  491 +-----------
>   drivers/gpu/drm/exynos/Kconfig            |    2 +-
>   drivers/gpu/drm/exynos/exynos_drm_drv.c   |    7 +-
>   drivers/gpu/drm/exynos/exynos_drm_drv.h   |    2 +-
>   drivers/gpu/drm/exynos/exynos_drm_iommu.c |   91 ++-
>   drivers/gpu/drm/exynos/exynos_drm_iommu.h |    2 +-
>   drivers/gpu/drm/rockchip/Kconfig          |    1 +
>   drivers/iommu/Kconfig                     |    1 +
>   drivers/iommu/Makefile                    |    2 +-
>   drivers/iommu/dma-iommu-ops.c             |  471 ++++++++++++
>   drivers/media/platform/Kconfig            |    1 +
>   include/linux/dma-iommu.h                 |   14 +
>   18 files changed, 679 insertions(+), 1730 deletions(-)
>   delete mode 100644 arch/arm/include/asm/dma-iommu.h
>   create mode 100644 drivers/iommu/dma-iommu-ops.c
>


-- 
Mark Yao

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC 2/3] iommu: dma-iommu: move IOMMU/DMA-mapping code from ARM64 arch to drivers
  2016-02-19  8:22   ` Marek Szyprowski
  (?)
@ 2016-04-18  2:20     ` Mark yao
  -1 siblings, 0 replies; 45+ messages in thread
From: Mark yao @ 2016-04-18  2:20 UTC (permalink / raw)
  To: Marek Szyprowski, iommu, linux-arm-kernel, linux-kernel
  Cc: linaro-mm-sig, dri-devel, Arnd Bergmann, Will Deacon,
	Catalin Marinas, Robin Murphy, Russell King - ARM Linux,
	Joerg Roedel, Laurent Pinchart, Sakari Ailus, Heiko Stuebner,
	Tomasz Figa, Inki Dae, Bartlomiej Zolnierkiewicz,
	Krzysztof Kozlowski

On 2016-02-19 16:22, Marek Szyprowski wrote:
> This patch moves all the IOMMU-based DMA-mapping code from arch/arm64/mm
> to drivers/iommu/dma-iommu-ops.c. This way it can be easily shared with
> the ARM architecture, which will also use it.
>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>

IOMMU works well on the drm/rockchip ARM64 platform with this patch, so:

Tested-by: Mark Yao <mark.yao@rock-chips.com>

Thanks.

-- 
Mark Yao

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture
  2016-02-19  8:22   ` Marek Szyprowski
  (?)
@ 2016-04-18  2:20     ` Mark yao
  -1 siblings, 0 replies; 45+ messages in thread
From: Mark yao @ 2016-04-18  2:20 UTC (permalink / raw)
  To: Marek Szyprowski, iommu, linux-arm-kernel, linux-kernel
  Cc: linaro-mm-sig, dri-devel, Arnd Bergmann, Will Deacon,
	Catalin Marinas, Robin Murphy, Russell King - ARM Linux,
	Joerg Roedel, Laurent Pinchart, Sakari Ailus, Heiko Stuebner,
	Tomasz Figa, Inki Dae, Bartlomiej Zolnierkiewicz,
	Krzysztof Kozlowski

On 2016-02-19 16:22, Marek Szyprowski wrote:
> This patch replaces the ARM-specific IOMMU-based DMA-mapping implementation
> with the generic IOMMU DMA-mapping code shared with the ARM64 architecture.
> A side effect of this change is a switch from bitmap-based IO address space
> management to tree-based code. There should be no functional changes
> for drivers that rely on initialization from the generic arch_setup_dma_ops()
> interface. Code that used the old arm_iommu_* functions must be updated to
> the new interface.
>
> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
The IOMMU works well on the drm/rockchip ARM64 platform with this patch, so:

Tested-by: Mark Yao <mark.yao@rock-chips.com>

Thanks.

-- 
Mark Yao

^ permalink raw reply	[flat|nested] 45+ messages in thread
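
For readers unfamiliar with the distinction drawn in the quoted description,
bitmap-based IO address space management can be sketched roughly as below.
This is a minimal first-fit illustration in user-space C, not the kernel's
implementation: the names iova_alloc() and iova_free(), the 64-page window,
the 4 KiB page size, and the window base are all assumptions chosen for the
example. One bit tracks each page, so finding free space means scanning bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical window: 64 pages of 4 KiB starting at an assumed base. */
#define IOVA_BASE       0x10000000u
#define IOVA_PAGE_SHIFT 12
#define IOVA_PAGES      64u

/* Bit i set means page i of the window is in use. */
static uint64_t iova_bitmap;

/* Return true if pages [start, start + count) are all free. */
static bool iova_range_free(unsigned int start, unsigned int count)
{
	for (unsigned int i = start; i < start + count; i++)
		if (iova_bitmap & (UINT64_C(1) << i))
			return false;
	return true;
}

/*
 * First-fit allocation of `count` contiguous pages.
 * Returns the IO virtual address, or 0 if no run of free pages fits.
 */
uint32_t iova_alloc(unsigned int count)
{
	if (count == 0 || count > IOVA_PAGES)
		return 0;

	for (unsigned int start = 0; start + count <= IOVA_PAGES; start++) {
		if (!iova_range_free(start, count))
			continue;
		for (unsigned int i = start; i < start + count; i++)
			iova_bitmap |= UINT64_C(1) << i;
		return IOVA_BASE + (start << IOVA_PAGE_SHIFT);
	}
	return 0;
}

/* Release `count` pages previously returned by iova_alloc(). */
void iova_free(uint32_t addr, unsigned int count)
{
	unsigned int start;

	assert(addr >= IOVA_BASE);
	start = (addr - IOVA_BASE) >> IOVA_PAGE_SHIFT;
	assert(start + count <= IOVA_PAGES);

	for (unsigned int i = start; i < start + count; i++)
		iova_bitmap &= ~(UINT64_C(1) << i);
}
```

A tree-based allocator instead records each allocated range as a node in an
ordered tree, so its bookkeeping grows with the number of live allocations
rather than with the size of the address window.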

* [PATCH] drm/rockchip: rewrite IOMMU support code
  2016-02-19  8:22 ` Marek Szyprowski
  (?)
@ 2016-04-19  3:17   ` Mark Yao
  -1 siblings, 0 replies; 45+ messages in thread
From: Mark Yao @ 2016-04-19  3:17 UTC (permalink / raw)
  To: David Airlie, Heiko Stuebner, dri-devel, linux-arm-kernel,
	linux-rockchip, linux-kernel
  Cc: Mark Yao, Marek Szyprowski

This patch is based on Marek Szyprowski's patch series:
	[RFC 0/3] Unify IOMMU-based DMA-mapping code for ARM and ARM64
	[https://lkml.org/lkml/2016/2/19/79]

It is modeled on Marek Szyprowski's Exynos patch:
  (drm/exynos: rewrite IOMMU support code)

The patch replaces the ARM-specific IOMMU/DMA-mapping calls with the new
generic code for managing the DMA-IOMMU integration layer. It also removes
all the hacks that were needed to configure a common DMA/IO address space
on the virtual rockchip-drm device.

This patch works on the Rockchip ARM64 rk3399 platform.

Cc: Marek Szyprowski <m.szyprowski@samsung.com>

Signed-off-by: Mark Yao <mark.yao@rock-chips.com>
---
 drivers/gpu/drm/rockchip/Kconfig            |    3 +-
 drivers/gpu/drm/rockchip/rockchip_drm_drv.c |   81 +++++++++++++++++++--------
 drivers/gpu/drm/rockchip/rockchip_drm_drv.h |    1 +
 3 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/Kconfig b/drivers/gpu/drm/rockchip/Kconfig
index d4c6a89..731f5c3 100644
--- a/drivers/gpu/drm/rockchip/Kconfig
+++ b/drivers/gpu/drm/rockchip/Kconfig
@@ -1,7 +1,6 @@
 config DRM_ROCKCHIP
 	tristate "DRM Support for Rockchip"
-	depends on BROKEN
-	depends on DRM && ROCKCHIP_IOMMU
+	depends on DRM && IOMMU_DMA
 	depends on RESET_CONTROLLER
 	select DRM_KMS_HELPER
 	select DRM_KMS_FB_HELPER
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
index 7176483..772bca3 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.c
@@ -14,7 +14,7 @@
  * GNU General Public License for more details.
  */
 
-#include <asm/dma-iommu.h>
+#include <linux/dma-iommu.h>
 
 #include <drm/drmP.h>
 #include <drm/drm_crtc_helper.h>
@@ -44,7 +44,8 @@
 int rockchip_drm_dma_attach_device(struct drm_device *drm_dev,
 				   struct device *dev)
 {
-	struct dma_iommu_mapping *mapping = drm_dev->dev->archdata.mapping;
+	struct rockchip_drm_private *private = drm_dev->dev_private;
+	struct iommu_domain *domain = private->domain;
 	int ret;
 
 	ret = dma_set_coherent_mask(dev, DMA_BIT_MASK(32));
@@ -52,14 +53,28 @@ int rockchip_drm_dma_attach_device(struct drm_device *drm_dev,
 		return ret;
 
 	dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
+	ret = iommu_attach_device(domain, dev);
+	if (ret) {
+		dev_err(dev, "Failed to attach iommu device\n");
+		return ret;
+	}
 
-	return arm_iommu_attach_device(dev, mapping);
+	if (!common_iommu_setup_dma_ops(dev, 0x10000000, SZ_2G, domain->ops)) {
+		dev_err(dev, "Failed to set dma_ops\n");
+		iommu_detach_device(domain, dev);
+		ret = -ENODEV;
+	}
+
+	return ret;
 }
 
 void rockchip_drm_dma_detach_device(struct drm_device *drm_dev,
 				    struct device *dev)
 {
-	arm_iommu_detach_device(dev);
+	struct rockchip_drm_private *private = drm_dev->dev_private;
+	struct iommu_domain *domain = private->domain;
+
+	iommu_detach_device(domain, dev);
 }
 
 int rockchip_register_crtc_funcs(struct drm_crtc *crtc,
@@ -127,9 +142,9 @@ static void rockchip_drm_crtc_disable_vblank(struct drm_device *dev,
 static int rockchip_drm_load(struct drm_device *drm_dev, unsigned long flags)
 {
 	struct rockchip_drm_private *private;
-	struct dma_iommu_mapping *mapping;
 	struct device *dev = drm_dev->dev;
 	struct drm_connector *connector;
+	struct iommu_group *group;
 	int ret;
 
 	private = devm_kzalloc(drm_dev->dev, sizeof(*private), GFP_KERNEL);
@@ -152,23 +167,36 @@ static int rockchip_drm_load(struct drm_device *drm_dev, unsigned long flags)
 		goto err_config_cleanup;
 	}
 
-	/* TODO(djkurtz): fetch the mapping start/size from somewhere */
-	mapping = arm_iommu_create_mapping(&platform_bus_type, 0x00000000,
-					   SZ_2G);
-	if (IS_ERR(mapping)) {
-		ret = PTR_ERR(mapping);
-		goto err_config_cleanup;
-	}
+	private->domain = iommu_domain_alloc(&platform_bus_type);
+	if (!private->domain)
+		return -ENOMEM;
 
-	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(32));
+	ret = iommu_get_dma_cookie(private->domain);
 	if (ret)
-		goto err_release_mapping;
-
-	dma_set_max_seg_size(dev, DMA_BIT_MASK(32));
+		goto err_free_domain;
+
+	group = iommu_group_get(dev);
+	if (!group) {
+		group = iommu_group_alloc();
+		if (IS_ERR(group)) {
+			dev_err(dev, "Failed to allocate IOMMU group\n");
+			goto err_put_cookie;
+		}
 
-	ret = arm_iommu_attach_device(dev, mapping);
+		ret = iommu_group_add_device(group, dev);
+		iommu_group_put(group);
+		if (ret) {
+			dev_err(dev, "failed to add device to IOMMU group\n");
+			goto err_put_cookie;
+		}
+	}
+	/*
+	 * Attach virtual iommu device, sub iommu device can share the same
+	 * mapping with it.
+	 */
+	ret = rockchip_drm_dma_attach_device(drm_dev, dev);
 	if (ret)
-		goto err_release_mapping;
+		goto err_group_remove_device;
 
 	/* Try to bind all sub drivers. */
 	ret = component_bind_all(dev, drm_dev);
@@ -226,9 +254,13 @@ err_kms_helper_poll_fini:
 err_unbind:
 	component_unbind_all(dev, drm_dev);
 err_detach_device:
-	arm_iommu_detach_device(dev);
-err_release_mapping:
-	arm_iommu_release_mapping(dev->archdata.mapping);
+	rockchip_drm_dma_detach_device(drm_dev, dev);
+err_group_remove_device:
+	iommu_group_remove_device(dev);
+err_put_cookie:
+	iommu_put_dma_cookie(private->domain);
+err_free_domain:
+	iommu_domain_free(private->domain);
 err_config_cleanup:
 	drm_mode_config_cleanup(drm_dev);
 	drm_dev->dev_private = NULL;
@@ -238,13 +270,16 @@ err_config_cleanup:
 static int rockchip_drm_unload(struct drm_device *drm_dev)
 {
 	struct device *dev = drm_dev->dev;
+	struct rockchip_drm_private *private = drm_dev->dev_private;
 
 	rockchip_drm_fbdev_fini(drm_dev);
 	drm_vblank_cleanup(drm_dev);
 	drm_kms_helper_poll_fini(drm_dev);
 	component_unbind_all(dev, drm_dev);
-	arm_iommu_detach_device(dev);
-	arm_iommu_release_mapping(dev->archdata.mapping);
+	rockchip_drm_dma_detach_device(drm_dev, dev);
+	iommu_group_remove_device(dev);
+	iommu_put_dma_cookie(private->domain);
+	iommu_domain_free(private->domain);
 	drm_mode_config_cleanup(drm_dev);
 	drm_dev->dev_private = NULL;
 
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_drv.h b/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
index 234cec2..2677b95 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_drv.h
@@ -62,6 +62,7 @@ struct rockchip_drm_private {
 	const struct rockchip_crtc_funcs *crtc_funcs[ROCKCHIP_MAX_CRTC];
 
 	struct rockchip_atomic_commit commit;
+	struct iommu_domain *domain;
 };
 
 void rockchip_drm_atomic_work(struct work_struct *work);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 45+ messages in thread
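
The error paths in rockchip_drm_load() in the patch above use the usual
goto-unwind pattern: resources are acquired in order (domain, DMA cookie,
group, attach) and released in reverse through a chain of labels. A minimal
user-space sketch of that pattern follows. It is an illustration under
stated assumptions, not the driver code: step(), demo_load(), and the logged
step names are hypothetical stand-ins that only record what would be called.
In the sketch every failing branch assigns ret before jumping. As posted, the
!private->domain branch returns directly and the IS_ERR(group) branch jumps
without assigning ret, which appears to deviate from this convention.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-ins for the IOMMU calls; they only log their names. */
#define MAX_LOG 16
static const char *call_log[MAX_LOG];
static int call_count;
static int fail_step;	/* 0 = no failure; N = the Nth setup step fails */

static void log_call(const char *name)
{
	assert(call_count < MAX_LOG);
	call_log[call_count++] = name;
}

/* Each setup step returns 0 on success or a negative errno on failure. */
static int step(int n, const char *name)
{
	log_call(name);
	return (fail_step == n) ? -ENOMEM : 0;
}

/*
 * Acquire four resources in order. On failure, set ret and jump to the
 * label that releases only what has been acquired so far, in reverse.
 */
int demo_load(int fail_at)
{
	int ret;

	call_count = 0;
	fail_step = fail_at;

	ret = step(1, "domain_alloc");
	if (ret)
		goto err_out;		/* nothing to undo yet */

	ret = step(2, "get_dma_cookie");
	if (ret)
		goto err_free_domain;

	ret = step(3, "group_add_device");
	if (ret)
		goto err_put_cookie;

	ret = step(4, "attach_device");
	if (ret)
		goto err_group_remove_device;

	return 0;

err_group_remove_device:
	log_call("group_remove_device");
err_put_cookie:
	log_call("put_dma_cookie");
err_free_domain:
	log_call("domain_free");
err_out:
	return ret;
}

/* Number of calls recorded by the most recent demo_load(). */
int demo_log_len(void)
{
	return call_count;
}

/* Return 1 if the i-th recorded call has the given name, else 0. */
int demo_log_is(int i, const char *name)
{
	if (i < 0 || i >= call_count)
		return 0;
	return strcmp(call_log[i], name) == 0;
}
```

Calling demo_load(3) simulates a failure while adding the device to the
group; the recorded calls then end with put_dma_cookie followed by
domain_free, and the negative errno is propagated to the caller.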

end of thread, other threads:[~2016-04-19  3:19 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-19  8:22 [RFC 0/3] Unify IOMMU-based DMA-mapping code for ARM and ARM64 Marek Szyprowski
2016-02-19  8:22 ` Marek Szyprowski
2016-02-19  8:22 ` Marek Szyprowski
2016-02-19  8:22 ` [RFC 1/3] drm/exynos: rewrite IOMMU support code Marek Szyprowski
2016-02-19  8:22   ` Marek Szyprowski
2016-02-19  8:22   ` Marek Szyprowski
2016-02-19  8:22 ` [RFC 2/3] iommu: dma-iommu: move IOMMU/DMA-mapping code from ARM64 arch to drivers Marek Szyprowski
2016-02-19  8:22   ` Marek Szyprowski
2016-02-19  8:22   ` Marek Szyprowski
2016-04-18  2:20   ` Mark yao
2016-04-18  2:20     ` Mark yao
2016-04-18  2:20     ` Mark yao
2016-02-19  8:22 ` [RFC 3/3] iommu: dma-iommu: use common implementation also on ARM architecture Marek Szyprowski
2016-02-19  8:22   ` Marek Szyprowski
2016-02-19  8:22   ` Marek Szyprowski
2016-02-19 10:30   ` Arnd Bergmann
2016-02-19 10:30     ` Arnd Bergmann
2016-02-19 10:30     ` Arnd Bergmann
2016-02-25 12:26     ` Marek Szyprowski
2016-02-25 12:26       ` Marek Szyprowski
2016-02-25 12:26       ` Marek Szyprowski
2016-02-25 14:44       ` Arnd Bergmann
2016-02-25 14:44         ` Arnd Bergmann
2016-02-25 14:44         ` Arnd Bergmann
2016-03-15 12:33     ` Robin Murphy
2016-03-15 12:33       ` Robin Murphy
2016-03-15 12:33       ` Robin Murphy
2016-03-15 11:18   ` Magnus Damm
2016-03-15 11:18     ` Magnus Damm
2016-03-15 11:18     ` Magnus Damm
2016-03-15 11:45     ` Robin Murphy
2016-03-15 11:45       ` Robin Murphy
2016-03-15 11:45       ` Robin Murphy
2016-03-15 12:03     ` Marek Szyprowski
2016-03-15 12:03       ` Marek Szyprowski
2016-03-15 12:03       ` Marek Szyprowski
2016-04-18  2:20   ` Mark yao
2016-04-18  2:20     ` Mark yao
2016-04-18  2:20     ` Mark yao
2016-04-18  2:18 ` [RFC 0/3] Unify IOMMU-based DMA-mapping code for ARM and ARM64 Mark yao
2016-04-18  2:18   ` Mark yao
2016-04-18  2:18   ` Mark yao
2016-04-19  3:17 ` [PATCH] drm/rockchip: rewrite IOMMU support code Mark Yao
2016-04-19  3:17   ` Mark Yao
2016-04-19  3:17   ` Mark Yao
