* [PATCH 0/9] Add support for ARM SMMU architectures 1 and 2
From: Will Deacon @ 2013-06-10 18:34 UTC
  To: iommu
  Cc: devicetree-discuss, Will Deacon, linux-arm-kernel

Hi all,

This patch series adds support for ARM's SMMU architectures 1 and 2 to
Linux. It has been tested with models of MMU-400 (AArch32) and MMU-500
(AArch32 and AArch64) coupled with four PL330s doing memory-to-memory
DMA.

The first six patches are preparatory and fix problems that I
encountered whilst trying to use both the ARM IOMMU API and the PL330
driver. The remaining patches add the device-tree binding, previously
discussed at:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2013-April/160663.html
  http://lists.infradead.org/pipermail/linux-arm-kernel/2013-April/162945.html

as well as the driver implementation and an update to the MAINTAINERS
file.

The driver is written only with single-stage (possibly stage 2), LPAE,
non-secure translation in mind. Nested translation may of course be added
later via KVM, but that will require some changes to the core IOMMU API
in Linux. Both arm and arm64 are supported, as well as chained SMMUs,
although we only install our translation in the SMMU furthest from the
device. We're also currently limited in our IPA size, due to input size
restrictions at stage 2. This can be resolved later by munging about in
the page table allocation code, where we currently re-use parts of the
CPU page table helpers.

Read and write protection is supported as far as the pte formats allow
(i.e. no write-only at stage-1) and memory attributes are either normal,
non-cacheable (default) or normal, cacheable, write-back, write-allocate.
TLB broadcasting is not used for either ASIDs (since we don't support
user tables for DMA in Linux) or VMIDs (which are instead used to tag
address spaces in order to limit the scope of TLB invalidation
operations).

Both 4k and 64k (AArch64 only) pages are supported, with the contiguous
hint bit being used in the pte entries for mappings that allow it.

Support for stream-indexing and non-coherent table walking is provided,
but untested.

All comments welcome,

Will


Will Deacon (9):
  dma: pl330: rip out broken, redundant ID probing
  dma: pl330: use dma_addr_t for describing bus addresses
  ARM: dma-mapping: convert DMA direction into IOMMU protection
    attributes
  ARM: dma-mapping: NULLify dev->archdata.mapping pointer on detach
  arm64: pgtable: use pte_index instead of __pte_index
  arm64: device: add iommu pointer to device archdata
  documentation: iommu: add description of ARM System MMU binding
  iommu: add support for ARM Ltd. System MMU architecture
  MAINTAINERS: add entry for ARM system MMU driver

 .../devicetree/bindings/iommu/arm,smmu.txt         |   70 +
 MAINTAINERS                                        |    6 +
 arch/arm/mm/dma-mapping.c                          |   20 +-
 arch/arm64/include/asm/device.h                    |    3 +
 arch/arm64/include/asm/pgtable.h                   |    4 +-
 drivers/dma/pl330.c                                |   29 +-
 drivers/iommu/Kconfig                              |   13 +
 drivers/iommu/Makefile                             |    1 +
 drivers/iommu/arm-smmu.c                           | 1965 ++++++++++++++++++++
 9 files changed, 2081 insertions(+), 30 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu.txt
 create mode 100644 drivers/iommu/arm-smmu.c

-- 
1.8.2.2


* [PATCH 1/9] dma: pl330: rip out broken, redundant ID probing
From: Will Deacon @ 2013-06-10 18:34 UTC
  To: iommu
  Cc: Vinod Koul, Jassi Brar, devicetree-discuss, Will Deacon,
	linux-arm-kernel

The PL330 driver probes the peripheral and primecell IDs of the device to
make sure that it is indeed an AMBA PL330. However, it does this by
making byte accesses to a device mapping of the word-aligned ID
registers, which is either UNPREDICTABLE or generates an alignment fault
(depending on the presence of the virtualisation extensions).

Rather than fix this code, we can actually rip most of it out and let
the AMBA bus driver correctly do the probing for us.
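
As a rough illustration of the access pattern that avoids the problem, the sketch below assembles a 32-bit ID from four word-aligned registers by doing a full word read of each and keeping only the low byte. It is a user-space model under stated assumptions, not the AMBA bus code itself: assemble_id is a hypothetical helper and a plain array stands in for the MMIO region.

```c
#include <stdint.h>

/*
 * Hypothetical model: four consecutive 32-bit ID registers, each carrying
 * one byte of the ID in bits [7:0]. A plain array stands in for MMIO, so
 * each regs[i] access models one word-sized read.
 */
static uint32_t assemble_id(const volatile uint32_t regs[4])
{
	uint32_t id = 0;
	int i;

	/* Word read, then mask: no sub-word access to the device mapping. */
	for (i = 0; i < 4; i++)
		id |= (regs[i] & 0xffu) << (i * 8);

	return id;
}
```

Feeding it the bytes 0x0d, 0xf0, 0x05 and 0xb1 (lowest register first) yields 0xb105f00d, the PCELL_ID_VAL that the removed check compared against.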

Cc: Jassi Brar <jaswinder.singh@linaro.org>
Cc: Vinod Koul <vinod.koul@intel.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/dma/pl330.c | 27 +++------------------------
 1 file changed, 3 insertions(+), 24 deletions(-)

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 24e0754..22e2a8f 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -157,7 +157,6 @@ enum pl330_reqtype {
 #define PERIPH_REV_R0P0		0
 #define PERIPH_REV_R1P0		1
 #define PERIPH_REV_R1P1		2
-#define PCELL_ID		0xff0
 
 #define CR0_PERIPH_REQ_SET	(1 << 0)
 #define CR0_BOOT_EN_SET		(1 << 1)
@@ -193,8 +192,6 @@ enum pl330_reqtype {
 #define INTEG_CFG		0x0
 #define PERIPH_ID_VAL		((PART << 0) | (DESIGNER << 12))
 
-#define PCELL_ID_VAL		0xb105f00d
-
 #define PL330_STATE_STOPPED		(1 << 0)
 #define PL330_STATE_EXECUTING		(1 << 1)
 #define PL330_STATE_WFE			(1 << 2)
@@ -292,7 +289,6 @@ static unsigned cmd_line;
 /* Populated by the PL330 core driver for DMA API driver's info */
 struct pl330_config {
 	u32	periph_id;
-	u32	pcell_id;
 #define DMAC_MODE_NS	(1 << 0)
 	unsigned int	mode;
 	unsigned int	data_bus_width:10; /* In number of bits */
@@ -650,19 +646,6 @@ static inline bool _manager_ns(struct pl330_thread *thrd)
 	return (pl330->pinfo->pcfg.mode & DMAC_MODE_NS) ? true : false;
 }
 
-static inline u32 get_id(struct pl330_info *pi, u32 off)
-{
-	void __iomem *regs = pi->base;
-	u32 id = 0;
-
-	id |= (readb(regs + off + 0x0) << 0);
-	id |= (readb(regs + off + 0x4) << 8);
-	id |= (readb(regs + off + 0x8) << 16);
-	id |= (readb(regs + off + 0xc) << 24);
-
-	return id;
-}
-
 static inline u32 get_revision(u32 periph_id)
 {
 	return (periph_id >> PERIPH_REV_SHIFT) & PERIPH_REV_MASK;
@@ -1986,9 +1969,6 @@ static void read_dmac_config(struct pl330_info *pi)
 	pi->pcfg.num_events = val;
 
 	pi->pcfg.irq_ns = readl(regs + CR3);
-
-	pi->pcfg.periph_id = get_id(pi, PERIPH_ID);
-	pi->pcfg.pcell_id = get_id(pi, PCELL_ID);
 }
 
 static inline void _reset_thread(struct pl330_thread *thrd)
@@ -2098,10 +2078,8 @@ static int pl330_add(struct pl330_info *pi)
 	regs = pi->base;
 
 	/* Check if we can handle this DMAC */
-	if ((get_id(pi, PERIPH_ID) & 0xfffff) != PERIPH_ID_VAL
-	   || get_id(pi, PCELL_ID) != PCELL_ID_VAL) {
-		dev_err(pi->dev, "PERIPH_ID 0x%x, PCELL_ID 0x%x !\n",
-			get_id(pi, PERIPH_ID), get_id(pi, PCELL_ID));
+	if ((pi->pcfg.periph_id & 0xfffff) != PERIPH_ID_VAL) {
+		dev_err(pi->dev, "PERIPH_ID 0x%x !\n", pi->pcfg.periph_id);
 		return -EINVAL;
 	}
 
@@ -2922,6 +2900,7 @@ pl330_probe(struct amba_device *adev, const struct amba_id *id)
 	if (ret)
 		return ret;
 
+	pi->pcfg.periph_id = adev->periphid;
 	ret = pl330_add(pi);
 	if (ret)
 		goto probe_err1;
-- 
1.8.2.2


* [PATCH 2/9] dma: pl330: use dma_addr_t for describing bus addresses
From: Will Deacon @ 2013-06-10 18:34 UTC
  To: iommu
  Cc: Vinod Koul, Jassi Brar, devicetree-discuss, Will Deacon,
	linux-arm-kernel

The microcode bus address (pl330_dmac.mcode_bus) is currently a u32,
which fails to compile when building on a system with 64-bit bus
addresses.

This patch uses dma_addr_t to represent the address instead.
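
Beyond the build failure, a 32-bit field would also be too narrow to hold a wide bus address. The sketch below shows that effect in isolation. It assumes a 64-bit bus address; bus_addr_t and both helpers are hypothetical stand-ins, not kernel types or functions.

```c
#include <stdint.h>

typedef uint64_t bus_addr_t;	/* hypothetical stand-in for a 64-bit dma_addr_t */

/* Storing a bus address in a 32-bit field silently drops bits [63:32]. */
static uint32_t store_as_u32(bus_addr_t addr)
{
	return (uint32_t)addr;
}

/* A field of the full width keeps the address intact. */
static bus_addr_t store_as_bus_addr(bus_addr_t addr)
{
	return addr;
}
```

With an address such as 0x123456000, the first helper returns 0x23456000 while the second returns the original value.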

Cc: Jassi Brar <jaswinder.singh-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Cc: Vinod Koul <vinod.koul-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
---
 drivers/dma/pl330.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index 22e2a8f..f1bc593 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -501,7 +501,7 @@ struct pl330_dmac {
 	/* Maximum possible events/irqs */
 	int			events[32];
 	/* BUS address of MicroCode buffer */
-	u32			mcode_bus;
+	dma_addr_t		mcode_bus;
 	/* CPU address of MicroCode buffer */
 	void			*mcode_cpu;
 	/* List of all Channel threads */
-- 
1.8.2.2


* [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
From: Will Deacon @ 2013-06-10 18:34 UTC
  To: iommu
  Cc: devicetree-discuss, Will Deacon, linux-arm-kernel

IOMMU mappings take a prot parameter, identifying the protection bits
to enforce on the newly created mapping (READ or WRITE). The ARM
dma-mapping framework currently just passes 0 as the prot argument,
resulting in faulting mappings.

This patch infers the protection attributes based on the direction of
the DMA transfer.
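
The direction names are from the CPU's point of view, which makes the mapping easy to get backwards: a transfer "to device" means the device reads memory, so it needs read permission. A stand-alone sketch of that rule, with local stand-ins for the kernel's direction enum and protection flags (all SK_ names and their numeric values are illustrative assumptions):

```c
/* Local stand-ins so the sketch builds outside the kernel. */
enum dma_dir_sketch {
	SK_BIDIRECTIONAL,
	SK_TO_DEVICE,
	SK_FROM_DEVICE,
	SK_NONE
};

#define SK_IOMMU_READ	(1 << 0)
#define SK_IOMMU_WRITE	(1 << 1)

/* Hypothetical helper: protection bits implied by a transfer direction. */
static int dir_to_prot(enum dma_dir_sketch dir)
{
	switch (dir) {
	case SK_BIDIRECTIONAL:
		return SK_IOMMU_READ | SK_IOMMU_WRITE;
	case SK_TO_DEVICE:
		return SK_IOMMU_READ;	/* device reads from memory */
	case SK_FROM_DEVICE:
		return SK_IOMMU_WRITE;	/* device writes to memory */
	default:
		return 0;		/* no access: any transfer faults */
	}
}
```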

Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/mm/dma-mapping.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6fb80cf..d119de7 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1636,13 +1636,27 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p
 {
 	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
 	dma_addr_t dma_addr;
-	int ret, len = PAGE_ALIGN(size + offset);
+	int ret, prot, len = PAGE_ALIGN(size + offset);
 
 	dma_addr = __alloc_iova(mapping, len);
 	if (dma_addr == DMA_ERROR_CODE)
 		return dma_addr;
 
-	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 0);
+	switch (dir) {
+	case DMA_BIDIRECTIONAL:
+		prot = IOMMU_READ | IOMMU_WRITE;
+		break;
+	case DMA_TO_DEVICE:
+		prot = IOMMU_READ;
+		break;
+	case DMA_FROM_DEVICE:
+		prot = IOMMU_WRITE;
+		break;
+	default:
+		prot = 0;
+	}
+
+	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
 	if (ret < 0)
 		goto fail;
 
-- 
1.8.2.2


* [PATCH 4/9] ARM: dma-mapping: NULLify dev->archdata.mapping pointer on detach
From: Will Deacon @ 2013-06-10 18:34 UTC
  To: iommu
  Cc: devicetree-discuss, Will Deacon, linux-arm-kernel

The current code only clobbers a local variable, so the device is left
with a stale mapping pointer.
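
The pattern is easy to reproduce in isolation. In the sketch below (hypothetical types and function names, not the kernel structures), the buggy variant clears only its local copy of the pointer, while the fixed variant clears the field the device actually holds.

```c
#include <stddef.h>

struct mapping_sketch { int refs; };			/* hypothetical stand-in */
struct device_sketch { struct mapping_sketch *mapping; };

/* Buggy: the assignment only affects the local variable. */
static void detach_buggy(struct device_sketch *dev)
{
	struct mapping_sketch *mapping = dev->mapping;

	mapping = NULL;
	(void)mapping;		/* dev->mapping is still set */
}

/* Fixed: clear the pointer stored in the device itself. */
static void detach_fixed(struct device_sketch *dev)
{
	dev->mapping = NULL;
}
```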

Cc: Hiroshi Doyu <hdoyu@nvidia.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/mm/dma-mapping.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index d119de7..10282db 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -1921,7 +1921,7 @@ void arm_iommu_detach_device(struct device *dev)
 
 	iommu_detach_device(mapping->domain, dev);
 	kref_put(&mapping->kref, release_iommu_mapping);
-	mapping = NULL;
+	dev->archdata.mapping = NULL;
 	set_dma_ops(dev, NULL);
 
 	pr_debug("Detached IOMMU controller from %s device.\n", dev_name(dev));
-- 
1.8.2.2


* [PATCH 5/9] arm64: pgtable: use pte_index instead of __pte_index
From: Will Deacon @ 2013-06-10 18:34 UTC
  To: iommu
  Cc: Catalin Marinas, devicetree-discuss, Will Deacon,
	linux-arm-kernel

pte_index is a useful helper outside of arch/arm64, for things like the
ARM SMMU driver, so rename __pte_index to pte_index to be consistent
with both arch/arm/ and also the definitions of pmd_index and pgd_index.
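
For readers unfamiliar with the helper, here is a worked sketch of the computation it performs. The constants are assumptions chosen for illustration (4k pages, 512 entries per table); the real values come from the kernel configuration, and the SKETCH_ names are not kernel identifiers.

```c
#define SKETCH_PAGE_SHIFT	12	/* assumes 4k pages */
#define SKETCH_PTRS_PER_PTE	512	/* assumes 8-byte entries in a 4k table */

/* Same shape as the macro in the patch: page number modulo table size. */
static unsigned long sketch_pte_index(unsigned long addr)
{
	return (addr >> SKETCH_PAGE_SHIFT) & (SKETCH_PTRS_PER_PTE - 1);
}
```

Under those assumptions, address 0x403000 is page number 0x403, which lands in slot 3 of its table, and 0x200000 wraps back to slot 0.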

Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/pgtable.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e333a24..b93bc23 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -119,7 +119,7 @@ extern struct page *empty_zero_page;
 #define pte_none(pte)		(!pte_val(pte))
 #define pte_clear(mm,addr,ptep)	set_pte(ptep, __pte(0))
 #define pte_page(pte)		(pfn_to_page(pte_pfn(pte)))
-#define pte_offset_kernel(dir,addr)	(pmd_page_vaddr(*(dir)) + __pte_index(addr))
+#define pte_offset_kernel(dir,addr)	(pmd_page_vaddr(*(dir)) + pte_index(addr))
 
 #define pte_offset_map(dir,addr)	pte_offset_kernel((dir), (addr))
 #define pte_offset_map_nested(dir,addr)	pte_offset_kernel((dir), (addr))
@@ -263,7 +263,7 @@ static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
 #endif
 
 /* Find an entry in the third-level page table.. */
-#define __pte_index(addr)	(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
+#define pte_index(addr)		(((addr) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1))
 
 static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
 {
-- 
1.8.2.2


* [PATCH 6/9] arm64: device: add iommu pointer to device archdata
From: Will Deacon @ 2013-06-10 18:34 UTC
  To: iommu
  Cc: Catalin Marinas, devicetree-discuss, Will Deacon,
	linux-arm-kernel

When using an IOMMU for device mappings, it is necessary to keep a
pointer between the device and the IOMMU to which it is attached in
order to obtain the correct IOMMU when attaching the device to a domain.

This patch adds an iommu pointer to the dev_archdata structure, in a
similar manner to other architectures (ARM, PowerPC, x86, ...).
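
A minimal sketch of how such a pointer might be used, assuming a driver records its private state when it discovers a master and looks it up again at attach time. Every type and function name here is hypothetical; only the idea of an opaque iommu field in the archdata comes from the patch.

```c
#include <stddef.h>

struct smmu_sketch { int id; };			/* hypothetical per-IOMMU state */
struct archdata_sketch { void *iommu; };	/* mirrors the new opaque field */
struct master_sketch { struct archdata_sketch archdata; };

/* Record which IOMMU sits in front of this master. */
static void link_master(struct master_sketch *dev, struct smmu_sketch *smmu)
{
	dev->archdata.iommu = smmu;
}

/* Later, recover the IOMMU without searching any global list. */
static struct smmu_sketch *find_smmu(struct master_sketch *dev)
{
	return dev->archdata.iommu;
}
```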

Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/device.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h
index 07dd743..01fac31 100644
--- a/arch/arm64/include/asm/device.h
+++ b/arch/arm64/include/asm/device.h
@@ -18,6 +18,9 @@
 
 struct dev_archdata {
 	struct dma_map_ops *dma_ops;
+#ifdef CONFIG_IOMMU_API
+	void *iommu; /* private IOMMU data */
+#endif
 	struct dma_iommu_mapping	*mapping;
 };
 
-- 
1.8.2.2


* [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
From: Will Deacon @ 2013-06-10 18:34 UTC
  To: iommu
  Cc: devicetree-discuss, Will Deacon, Rob Herring, Andreas Herrmann,
	linux-arm-kernel

This patch adds a description of the device tree binding for the ARM
System MMU architecture.

Cc: Rob Herring <robherring2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Cc: Andreas Herrmann <andreas.herrmann-bsGFqQB8/DxBDgjK7y7TUQ@public.gmane.org>
Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 .../devicetree/bindings/iommu/arm,smmu.txt         | 70 ++++++++++++++++++++++
 1 file changed, 70 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu.txt

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
new file mode 100644
index 0000000..e34c6cd
--- /dev/null
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
@@ -0,0 +1,70 @@
+* ARM System MMU Architecture Implementation
+
+ARM SoCs may contain an implementation of the ARM System Memory
+Management Unit Architecture, which can be used to provide 1 or 2 stages
+of address translation to bus masters external to the CPU.
+
+The SMMU may also raise interrupts in response to various fault
+conditions.
+
+** System MMU required properties:
+
+- compatible    : Should be one of:
+
+                        "arm,smmu-v1"
+                        "arm,smmu-v2"
+                        "arm,mmu-400"
+                        "arm,mmu-500"
+
+                  depending on the particular implementation and/or the
+                  version of the architecture implemented.
+
+- reg           : Base address and size of the SMMU.
+
+- #global-interrupts : The number of global interrupts exposed by the
+                       device.
+
+- interrupts    : Interrupt list, with the first #global-interrupts entries
+                  corresponding to the global interrupts and any
+                  following entries corresponding to context interrupts,
+                  specified in order of their indexing by the SMMU.
+
+                  For SMMUv2 implementations, there must be exactly one
+                  interrupt per context bank. In the case of a single,
+                  combined interrupt, it must be listed multiple times.
+
+- mmu-masters   : A list of phandles to device nodes representing bus
+                  masters for which the SMMU can provide a translation
+                  and their corresponding StreamIDs (see example below).
+                  Each device node linked from this list must have a
+                  "#stream-id-cells" property, indicating the number of
+                  StreamIDs associated with it.
+
+** System MMU optional properties:
+
+- smmu-parent   : When multiple SMMUs are chained together, this
+                  property can be used to provide a phandle to the
+                  node of this SMMU's parent, that is, the next SMMU
+                  on the path going from the mmu-masters towards
+                  memory.
+
+Example:
+
+        smmu {
+                compatible = "arm,smmu-v1";
+                reg = <0xba5e0000 0x10000>;
+                #global-interrupts = <2>;
+                interrupts = <0 32 4>,
+                             <0 33 4>,
+                             <0 34 4>, /* This is the first context interrupt */
+                             <0 35 4>,
+                             <0 36 4>,
+                             <0 37 4>;
+
+                /*
+                 * Two DMA controllers, the first with two StreamIDs (0xd01d
+                 * and 0xd01e) and the second with only one (0xd11c).
+                 */
+                mmu-masters = <&dma0 0xd01d 0xd01e>,
+                              <&dma1 0xd11c>;
+        };
-- 
1.8.2.2


* [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture
  2013-06-10 18:34 ` Will Deacon
@ 2013-06-10 18:34     ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-10 18:34 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Olav Haugan, devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	Will Deacon, Rob Herring, Andreas Herrmann,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

This patch adds support for SMMUs implementing the ARM System MMU
architecture versions 1 or 2. Both arm and arm64 are supported, although
the v7s descriptor format is not used.

Cc: Rob Herring <robherring2@gmail.com>
Cc: Andreas Herrmann <andreas.herrmann@calxeda.com>
Cc: Olav Haugan <ohaugan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
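For orientation, the register layout implied by the ARM_SMMU_GR0,
ARM_SMMU_GR1, ARM_SMMU_CB_BASE and ARM_SMMU_CB macros in this patch can
be sketched as plain offset arithmetic: global space 0 sits at the base,
global space 1 one register page later, and the context banks occupy the
upper half of the region, one page each. The struct and function names
below are hypothetical, and the 4K register page size used in the
example is an assumption rather than something the binding states.

```c
#include <assert.h>

struct smmu_layout {
	unsigned long size;	/* size of the whole register region */
	unsigned long pagesize;	/* size of one register page */
};

/* Global register space 0 starts at the base of the region. */
static unsigned long smmu_gr0_offset(const struct smmu_layout *l)
{
	(void)l;
	return 0;
}

/* Global register space 1 follows one register page later. */
static unsigned long smmu_gr1_offset(const struct smmu_layout *l)
{
	return l->pagesize;
}

/* Context bank n: upper half of the region, one page per bank. */
static unsigned long smmu_cb_offset(const struct smmu_layout *l,
				    unsigned int n)
{
	unsigned long off = (l->size >> 1) + (unsigned long)n * l->pagesize;

	assert(off + l->pagesize <= l->size);	/* bank must fit */
	return off;
}
```

With the 0x10000-byte region from the binding example and an assumed
0x1000-byte register page, context bank 0 lands at offset 0x8000 and
context bank 3 at offset 0xb000.
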
 drivers/iommu/Kconfig    |   13 +
 drivers/iommu/Makefile   |    1 +
 drivers/iommu/arm-smmu.c | 1965 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1979 insertions(+)
 create mode 100644 drivers/iommu/arm-smmu.c
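
The decision logic in arm_smmu_context_fault() can be read in isolation
from the register accesses. The sketch below restates it as a pure
function over the FSR value, using the bit definitions from this patch.
The fault_decision struct and triage_context_fault() are hypothetical
names, and 'handled' stands in for report_iommu_fault() returning 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the FSR definitions in the patch. */
#define FSR_MULTI	(1u << 31)
#define FSR_SS		(1u << 30)
#define FSR_UUT		(1u << 8)
#define FSR_ASF		(1u << 7)
#define FSR_TLBLKF	(1u << 6)
#define FSR_TLBMCF	(1u << 5)
#define FSR_EF		(1u << 4)
#define FSR_PF		(1u << 3)
#define FSR_AFF		(1u << 2)
#define FSR_TF		(1u << 1)

#define FSR_IGN		(FSR_AFF | FSR_ASF | FSR_TLBMCF | FSR_TLBLKF)
#define FSR_FAULT	(FSR_MULTI | FSR_SS | FSR_UUT | \
			 FSR_EF | FSR_PF | FSR_TF)

#define RESUME_RETRY		0u
#define RESUME_TERMINATE	1u

struct fault_decision {
	bool ours;		/* false: handler would return IRQ_NONE early */
	bool unexpected;	/* a bit from FSR_IGN was also set */
	bool write_resume;	/* transaction stalled, RESUME must be written */
	uint32_t resume;	/* value to write if write_resume is set */
};

static struct fault_decision triage_context_fault(uint32_t fsr, bool handled)
{
	struct fault_decision d = { false, false, false, RESUME_TERMINATE };

	if (!(fsr & FSR_FAULT))
		return d;

	d.ours = true;
	d.unexpected = (fsr & FSR_IGN) != 0;
	d.resume = handled ? RESUME_RETRY : RESUME_TERMINATE;
	d.write_resume = (fsr & FSR_SS) != 0;
	return d;
}
```

A translation fault that stalled the transaction (FSR_SS | FSR_TF) and
was handled by the registered fault handler results in a RESUME_RETRY
write, whereas an FSR with only FSR_AFF set is not treated as a fault for
this context at all.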

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c332fb9..957cfd4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -261,4 +261,17 @@ config SHMOBILE_IOMMU_L1SIZE
 	default 256 if SHMOBILE_IOMMU_ADDRSIZE_64MB
 	default 128 if SHMOBILE_IOMMU_ADDRSIZE_32MB
 
+config ARM_SMMU
+	bool "ARM Ltd. System MMU (SMMU) Support"
+	depends on ARM64 || (ARM_LPAE && OF)
+	select IOMMU_API
+	select ARM_DMA_USE_IOMMU if ARM
+	help
+	  Support for implementations of the ARM System MMU architecture
+	  versions 1 and 2. The driver supports both v7l and v8l table
+	  formats with 4k and 64k page sizes.
+
+	  Say Y here if your SoC includes an IOMMU device implementing
+	  the ARM SMMU architecture.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index ef0e520..bbe7041 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o msm_iommu_dev.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += iova.o intel-iommu.o
 obj-$(CONFIG_IRQ_REMAP) += intel_irq_remapping.o irq_remapping.o
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
new file mode 100644
index 0000000..57ec597
--- /dev/null
+++ b/drivers/iommu/arm-smmu.c
@@ -0,0 +1,1965 @@
+/*
+ * IOMMU API for ARM architected SMMU implementations.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2013 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ *
+ * This driver currently supports:
+ *	- SMMUv1 and v2 implementations
+ *	- Stream-matching and stream-indexing
+ *	- v7/v8 long-descriptor format
+ *	- Non-secure access to the SMMU
+ *	- 4k and 64k pages, with contiguous pte hints.
+ *	- Up to 39-bit addressing
+ *	- Context fault reporting
+ */
+
+#define pr_fmt(fmt) "arm-smmu: " fmt
+
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iommu.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include <linux/amba/bus.h>
+
+#include <asm/pgalloc.h>
+
+/* Maximum number of stream IDs assigned to a single device */
+#define MAX_MASTER_STREAMIDS		8
+
+/* Maximum number of context banks per SMMU */
+#define ARM_SMMU_MAX_CBS		128
+
+/* Maximum number of mapping groups per SMMU */
+#define ARM_SMMU_MAX_SMRS		128
+
+/* Number of VMIDs per SMMU */
+#define ARM_SMMU_NUM_VMIDS		256
+
+/* SMMU global address space */
+#define ARM_SMMU_GR0(smmu)		((smmu)->base)
+#define ARM_SMMU_GR1(smmu)		((smmu)->base + (smmu)->pagesize)
+
+/* Page table bits */
+#define ARM_SMMU_PTE_PAGE		(((pteval_t)3) << 0)
+#define ARM_SMMU_PTE_CONT		(((pteval_t)1) << 52)
+#define ARM_SMMU_PTE_AF			(((pteval_t)1) << 10)
+#define ARM_SMMU_PTE_SH_NS		(((pteval_t)0) << 8)
+#define ARM_SMMU_PTE_SH_OS		(((pteval_t)2) << 8)
+#define ARM_SMMU_PTE_SH_IS		(((pteval_t)3) << 8)
+
+#if PAGE_SIZE == SZ_4K
+#define ARM_SMMU_PTE_CONT_ENTRIES	16
+#elif PAGE_SIZE == SZ_64K
+#define ARM_SMMU_PTE_CONT_ENTRIES	32
+#else
+#define ARM_SMMU_PTE_CONT_ENTRIES	1
+#endif
+
+#define ARM_SMMU_PTE_CONT_SIZE		(PAGE_SIZE * ARM_SMMU_PTE_CONT_ENTRIES)
+#define ARM_SMMU_PTE_CONT_MASK		(~(ARM_SMMU_PTE_CONT_SIZE - 1))
+#define ARM_SMMU_PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(pte_t))
+
+/* Stage-1 PTE */
+#define ARM_SMMU_PTE_AP_UNPRIV		(((pteval_t)1) << 6)
+#define ARM_SMMU_PTE_AP_RDONLY		(((pteval_t)2) << 6)
+#define ARM_SMMU_PTE_ATTRINDX_SHIFT	2
+
+/* Stage-2 PTE */
+#define ARM_SMMU_PTE_HAP_FAULT		(((pteval_t)0) << 6)
+#define ARM_SMMU_PTE_HAP_READ		(((pteval_t)1) << 6)
+#define ARM_SMMU_PTE_HAP_WRITE		(((pteval_t)2) << 6)
+#define ARM_SMMU_PTE_MEMATTR_OIWB	(((pteval_t)0xf) << 2)
+#define ARM_SMMU_PTE_MEMATTR_NC		(((pteval_t)0x5) << 2)
+#define ARM_SMMU_PTE_MEMATTR_DEV	(((pteval_t)0x1) << 2)
+
+/* Configuration registers */
+#define ARM_SMMU_GR0_sCR0		0x0
+#define sCR0_CLIENTPD			(1 << 0)
+#define sCR0_GFRE			(1 << 1)
+#define sCR0_GFIE			(1 << 2)
+#define sCR0_GCFGFRE			(1 << 4)
+#define sCR0_GCFGFIE			(1 << 5)
+#define sCR0_USFCFG			(1 << 10)
+#define sCR0_VMIDPNE			(1 << 11)
+#define sCR0_PTM			(1 << 12)
+#define sCR0_FB				(1 << 13)
+#define sCR0_BSU_SHIFT			14
+#define sCR0_BSU_MASK			0x3
+
+/* Identification registers */
+#define ARM_SMMU_GR0_ID0		0x20
+#define ARM_SMMU_GR0_ID1		0x24
+#define ARM_SMMU_GR0_ID2		0x28
+#define ARM_SMMU_GR0_ID3		0x2c
+#define ARM_SMMU_GR0_ID4		0x30
+#define ARM_SMMU_GR0_ID5		0x34
+#define ARM_SMMU_GR0_ID6		0x38
+#define ARM_SMMU_GR0_ID7		0x3c
+#define ARM_SMMU_GR0_sGFSR		0x48
+#define ARM_SMMU_GR0_sGFSYNR0		0x50
+#define ARM_SMMU_GR0_sGFSYNR1		0x54
+#define ARM_SMMU_GR0_sGFSYNR2		0x58
+#define ARM_SMMU_GR0_PIDR0		0xfe0
+#define ARM_SMMU_GR0_PIDR1		0xfe4
+#define ARM_SMMU_GR0_PIDR2		0xfe8
+
+#define ID0_S1TS			(1 << 30)
+#define ID0_S2TS			(1 << 29)
+#define ID0_NTS				(1 << 28)
+#define ID0_SMS				(1 << 27)
+#define ID0_PTFS_SHIFT			24
+#define ID0_PTFS_MASK			0x2
+#define ID0_PTFS_V8_ONLY		0x2
+#define ID0_CTTW			(1 << 14)
+#define ID0_NUMIRPT_SHIFT		16
+#define ID0_NUMIRPT_MASK		0xff
+#define ID0_NUMSMRG_SHIFT		0
+#define ID0_NUMSMRG_MASK		0xff
+
+#define ID1_PAGESIZE			(1 << 31)
+#define ID1_NUMPAGENDXB_SHIFT		28
+#define ID1_NUMPAGENDXB_MASK		7
+#define ID1_NUMS2CB_SHIFT		16
+#define ID1_NUMS2CB_MASK		0xff
+#define ID1_NUMCB_SHIFT			0
+#define ID1_NUMCB_MASK			0xff
+
+#define ID2_OAS_SHIFT			4
+#define ID2_OAS_MASK			0xf
+#define ID2_IAS_SHIFT			0
+#define ID2_IAS_MASK			0xf
+#define ID2_UBS_SHIFT			8
+#define ID2_UBS_MASK			0xf
+#define ID2_PTFS_4K			(1 << 12)
+#define ID2_PTFS_16K			(1 << 13)
+#define ID2_PTFS_64K			(1 << 14)
+
+#define PIDR2_ARCH_SHIFT		4
+#define PIDR2_ARCH_MASK			0xf
+
+/* Global TLB invalidation */
+#define ARM_SMMU_GR0_STLBIALL		0x60
+#define ARM_SMMU_GR0_TLBIVMID		0x64
+#define ARM_SMMU_GR0_TLBIALLNSNH	0x68
+#define ARM_SMMU_GR0_TLBIALLH		0x6c
+#define ARM_SMMU_GR0_sTLBGSYNC		0x70
+#define ARM_SMMU_GR0_sTLBGSTATUS	0x74
+#define sTLBGSTATUS_GSACTIVE		(1 << 0)
+
+/* Stream mapping registers */
+#define ARM_SMMU_GR0_SMR(n)		(0x800 + ((n) << 2))
+#define SMR_VALID			(1 << 31)
+#define SMR_MASK_SHIFT			16
+#define SMR_MASK_MASK			0x7fff
+#define SMR_ID_SHIFT			0
+#define SMR_ID_MASK			0x7fff
+
+#define ARM_SMMU_GR0_S2CR(n)		(0xc00 + ((n) << 2))
+#define S2CR_CBNDX_SHIFT		0
+#define S2CR_CBNDX_MASK			0xff
+#define S2CR_TYPE_SHIFT			16
+#define S2CR_TYPE_MASK			0x3
+#define S2CR_TYPE_TRANS			(0 << S2CR_TYPE_SHIFT)
+#define S2CR_TYPE_BYPASS		(1 << S2CR_TYPE_SHIFT)
+#define S2CR_TYPE_FAULT			(2 << S2CR_TYPE_SHIFT)
+
+/* Context bank attribute registers */
+#define ARM_SMMU_GR1_CBAR(n)		(0x0 + ((n) << 2))
+#define CBAR_VMID_SHIFT			0
+#define CBAR_VMID_MASK			0xff
+#define CBAR_S1_MEMATTR_SHIFT		12
+#define CBAR_S1_MEMATTR_MASK		0xf
+#define CBAR_S1_MEMATTR_WB		0xf
+#define CBAR_TYPE_SHIFT			16
+#define CBAR_TYPE_MASK			0x3
+#define CBAR_TYPE_S2_TRANS		(0 << CBAR_TYPE_SHIFT)
+#define CBAR_TYPE_S1_TRANS_S2_BYPASS	(1 << CBAR_TYPE_SHIFT)
+#define CBAR_TYPE_S1_TRANS_S2_FAULT	(2 << CBAR_TYPE_SHIFT)
+#define CBAR_TYPE_S1_TRANS_S2_TRANS	(3 << CBAR_TYPE_SHIFT)
+#define CBAR_IRPTNDX_SHIFT		24
+#define CBAR_IRPTNDX_MASK		0xff
+
+#define ARM_SMMU_GR1_CBA2R(n)		(0x800 + ((n) << 2))
+#define CBA2R_RW64_32BIT		(0 << 0)
+#define CBA2R_RW64_64BIT		(1 << 0)
+
+/* Translation context bank */
+#define ARM_SMMU_CB_BASE(smmu)		((smmu)->base + ((smmu)->size >> 1))
+#define ARM_SMMU_CB(smmu, n)		((n) * (smmu)->pagesize)
+
+#define ARM_SMMU_CB_SCTLR		0x0
+#define ARM_SMMU_CB_RESUME		0x8
+#define ARM_SMMU_CB_TTBCR2		0x10
+#define ARM_SMMU_CB_TTBR0_LO		0x20
+#define ARM_SMMU_CB_TTBR0_HI		0x24
+#define ARM_SMMU_CB_TTBCR		0x30
+#define ARM_SMMU_CB_S1_MAIR0		0x38
+#define ARM_SMMU_CB_FSR			0x58
+#define ARM_SMMU_CB_FAR_LO		0x60
+#define ARM_SMMU_CB_FAR_HI		0x64
+#define ARM_SMMU_CB_FSYNR0		0x68
+
+#define SCTLR_S1_ASIDPNE		(1 << 12)
+#define SCTLR_CFCFG			(1 << 7)
+#define SCTLR_CFIE			(1 << 6)
+#define SCTLR_CFRE			(1 << 5)
+#define SCTLR_E				(1 << 4)
+#define SCTLR_AFE			(1 << 2)
+#define SCTLR_TRE			(1 << 1)
+#define SCTLR_M				(1 << 0)
+#define SCTLR_EAE_SBOP			(SCTLR_AFE | SCTLR_TRE)
+
+#define RESUME_RETRY			(0 << 0)
+#define RESUME_TERMINATE		(1 << 0)
+
+#define TTBCR_EAE			(1 << 31)
+
+#define TTBCR_PASIZE_SHIFT		16
+#define TTBCR_PASIZE_MASK		0x7
+
+#define TTBCR_TG0_4K			(0 << 14)
+#define TTBCR_TG0_64K			(1 << 14)
+
+#define TTBCR_SH0_SHIFT			12
+#define TTBCR_SH0_MASK			0x3
+#define TTBCR_SH_NS			0
+#define TTBCR_SH_OS			2
+#define TTBCR_SH_IS			3
+
+#define TTBCR_ORGN0_SHIFT		10
+#define TTBCR_IRGN0_SHIFT		8
+#define TTBCR_RGN_MASK			0x3
+#define TTBCR_RGN_NC			0
+#define TTBCR_RGN_WBWA			1
+#define TTBCR_RGN_WT			2
+#define TTBCR_RGN_WB			3
+
+#define TTBCR_SL0_SHIFT			6
+#define TTBCR_SL0_MASK			0x3
+#define TTBCR_SL0_LVL_2			0
+#define TTBCR_SL0_LVL_1			1
+
+#define TTBCR_T1SZ_SHIFT		16
+#define TTBCR_T0SZ_SHIFT		0
+#define TTBCR_SZ_MASK			0xf
+
+#define TTBCR2_SEP_SHIFT		15
+#define TTBCR2_SEP_MASK			0x7
+
+#define TTBCR2_PASIZE_SHIFT		0
+#define TTBCR2_PASIZE_MASK		0x7
+
+/* Common definitions for PASize and SEP fields */
+#define TTBCR2_ADDR_32			0
+#define TTBCR2_ADDR_36			1
+#define TTBCR2_ADDR_40			2
+#define TTBCR2_ADDR_42			3
+#define TTBCR2_ADDR_44			4
+#define TTBCR2_ADDR_48			5
+
+#define MAIR_ATTR_SHIFT(n)		((n) << 3)
+#define MAIR_ATTR_MASK			0xff
+#define MAIR_ATTR_DEVICE		0x04
+#define MAIR_ATTR_NC			0x44
+#define MAIR_ATTR_WBRWA			0xff
+#define MAIR_ATTR_IDX_NC		0
+#define MAIR_ATTR_IDX_CACHE		1
+#define MAIR_ATTR_IDX_DEV		2
+
+#define FSR_MULTI			(1 << 31)
+#define FSR_SS				(1 << 30)
+#define FSR_UUT				(1 << 8)
+#define FSR_ASF				(1 << 7)
+#define FSR_TLBLKF			(1 << 6)
+#define FSR_TLBMCF			(1 << 5)
+#define FSR_EF				(1 << 4)
+#define FSR_PF				(1 << 3)
+#define FSR_AFF				(1 << 2)
+#define FSR_TF				(1 << 1)
+
+#define FSR_IGN				(FSR_AFF | FSR_ASF | FSR_TLBMCF |	\
+					 FSR_TLBLKF)
+#define FSR_FAULT			(FSR_MULTI | FSR_SS | FSR_UUT |		\
+					 FSR_EF | FSR_PF | FSR_TF)
+
+#define FSYNR0_WNR			(1 << 4)
+
+struct arm_smmu_smr {
+	u8				idx;
+	u16				mask;
+	u16				id;
+};
+
+struct arm_smmu_master {
+	struct device_node		*of_node;
+
+	/*
+	 * The following is specific to the master's position in the
+	 * SMMU chain.
+	 */
+	struct rb_node			node;
+	int				num_streamids;
+	u16				streamids[MAX_MASTER_STREAMIDS];
+
+	/*
+	 * We only need to allocate these on the root SMMU, as we
+	 * configure unmatched streams to bypass translation.
+	 */
+	struct arm_smmu_smr		*smrs;
+};
+
+struct arm_smmu_device {
+	struct device			*dev;
+	struct device_node		*parent_of_node;
+
+	void __iomem			*base;
+	unsigned long			size;
+	unsigned long			pagesize;
+
+#define ARM_SMMU_FEAT_COHERENT_WALK	(1 << 0)
+#define ARM_SMMU_FEAT_STREAM_MATCH	(1 << 1)
+#define ARM_SMMU_FEAT_TRANS_S1		(1 << 2)
+#define ARM_SMMU_FEAT_TRANS_S2		(1 << 3)
+#define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
+	u32				features;
+	int				version;
+
+	u32				num_context_banks;
+	u32				num_s2_context_banks;
+	DECLARE_BITMAP(context_map, ARM_SMMU_MAX_CBS);
+	atomic_t			irptndx;
+
+	u32				num_mapping_groups;
+	DECLARE_BITMAP(smr_map, ARM_SMMU_MAX_SMRS);
+
+	unsigned long			input_size;
+	unsigned long			s1_output_size;
+	unsigned long			s2_output_size;
+
+	u32				num_global_irqs;
+	u32				num_context_irqs;
+	unsigned int			*irqs;
+
+	DECLARE_BITMAP(vmid_map, ARM_SMMU_NUM_VMIDS);
+
+	struct list_head		list;
+	struct rb_root			masters;
+};
+
+struct arm_smmu_cfg {
+	struct arm_smmu_device		*smmu;
+	u8				vmid;
+	u8				cbndx;
+	int				irptndx;
+	u32				cbar;
+	pgd_t				*pgd;
+};
+
+struct arm_smmu_domain {
+	/*
+	 * A domain can span across multiple, chained SMMUs and requires
+	 * all devices within the domain to follow the same translation
+	 * path.
+	 */
+	struct arm_smmu_device		*leaf_smmu;
+	struct arm_smmu_cfg		root_cfg;
+
+	spinlock_t			lock;
+};
+
+static DEFINE_SPINLOCK(arm_smmu_devices_lock);
+static LIST_HEAD(arm_smmu_devices);
+
+static struct arm_smmu_master *find_smmu_master(struct arm_smmu_device *smmu,
+						struct device_node *dev_node)
+{
+	struct rb_node *node = smmu->masters.rb_node;
+
+	while (node) {
+		struct arm_smmu_master *master;
+		master = container_of(node, struct arm_smmu_master, node);
+
+		if (dev_node < master->of_node)
+			node = node->rb_left;
+		else if (dev_node > master->of_node)
+			node = node->rb_right;
+		else
+			return master;
+	}
+
+	return NULL;
+}
+
+static int insert_smmu_master(struct arm_smmu_device *smmu,
+			      struct arm_smmu_master *master)
+{
+	struct rb_node **new, *parent;
+
+	new = &smmu->masters.rb_node;
+	parent = NULL;
+	while (*new) {
+		struct arm_smmu_master *this;
+		this = container_of(*new, struct arm_smmu_master, node);
+
+		parent = *new;
+		if (master->of_node < this->of_node)
+			new = &((*new)->rb_left);
+		else if (master->of_node > this->of_node)
+			new = &((*new)->rb_right);
+		else
+			return -EEXIST;
+	}
+
+	rb_link_node(&master->node, parent, new);
+	rb_insert_color(&master->node, &smmu->masters);
+	return 0;
+}
+
+static int register_smmu_master(struct arm_smmu_device *smmu,
+				struct device *dev,
+				struct of_phandle_args *masterspec)
+{
+	int i;
+	struct arm_smmu_master *master;
+
+	master = find_smmu_master(smmu, masterspec->np);
+	if (master) {
+		dev_err(dev,
+			"rejecting multiple registrations for master device %s\n",
+			masterspec->np->name);
+		return -EBUSY;
+	}
+
+	if (masterspec->args_count > MAX_MASTER_STREAMIDS) {
+		dev_err(dev,
+			"reached maximum number (%d) of stream IDs for master device %s\n",
+			MAX_MASTER_STREAMIDS, masterspec->np->name);
+		return -ENOSPC;
+	}
+
+	master = devm_kzalloc(dev, sizeof(*master), GFP_KERNEL);
+	if (!master)
+		return -ENOMEM;
+
+	master->of_node		= masterspec->np;
+	master->num_streamids	= masterspec->args_count;
+
+	for (i = 0; i < master->num_streamids; ++i)
+		master->streamids[i] = masterspec->args[i];
+
+	return insert_smmu_master(smmu, master);
+}
+
+static struct arm_smmu_device *find_parent_smmu(struct arm_smmu_device *smmu)
+{
+	struct arm_smmu_device *parent, *tmp;
+
+	if (!smmu->parent_of_node)
+		return NULL;
+
+	list_for_each_entry_safe(parent, tmp, &arm_smmu_devices, list)
+		if (parent->dev->of_node == smmu->parent_of_node)
+			return parent;
+
+	dev_warn(smmu->dev,
+		 "Failed to find SMMU parent despite parent in DT\n");
+	return NULL;
+}
+
+static struct arm_smmu_device *find_root_smmu(struct device *dev)
+{
+	struct arm_smmu_device *root, *parent;
+
+	/*
+	 * Walk the SMMU chain to find the root device for this chain.
+	 * We assume that no masters have translations which terminate
+	 * early, and therefore check that the root SMMU does indeed have
+	 * a StreamID for the master in question.
+	 */
+	parent = dev->archdata.iommu;
+	do {
+		root = parent;
+	} while ((parent = find_parent_smmu(root)));
+
+	if (!find_smmu_master(root, dev->of_node))
+		return NULL;
+
+	return root;
+}
+
+static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
+{
+	int idx;
+
+	do {
+		idx = find_next_zero_bit(map, end, start);
+		if (idx == end)
+			return -ENOSPC;
+	} while (test_and_set_bit(idx, map));
+
+	return idx;
+}
+
+static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
+{
+	clear_bit(idx, map);
+}
+
+/* Wait for any pending TLB invalidations to complete */
+static void arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
+{
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
+	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
+	       & sTLBGSTATUS_GSACTIVE)
+		cpu_relax();
+}
+
+static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
+{
+	int flags, ret;
+	u32 fsr, far, fsynr, resume;
+	unsigned long iova;
+	struct iommu_domain *domain = dev;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+	void __iomem *cb_base;
+
+	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, root_cfg->cbndx);
+	fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+
+	if (!(fsr & FSR_FAULT))
+		return IRQ_NONE;
+
+	if (fsr & FSR_IGN)
+		dev_err_ratelimited(smmu->dev,
+				    "Unexpected context fault (fsr 0x%x)\n",
+				    fsr);
+
+	fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+	flags = fsynr & FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ;
+
+	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_LO);
+	iova = far;
+#ifdef CONFIG_64BIT
+	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_HI);
+	iova |= ((unsigned long)far << 32);
+#endif
+
+	if (!report_iommu_fault(domain, smmu->dev, iova, flags)) {
+		ret = IRQ_HANDLED;
+		resume = RESUME_RETRY;
+	} else {
+		ret = IRQ_NONE;
+		resume = RESUME_TERMINATE;
+	}
+
+	/* Clear the faulting FSR */
+	writel(fsr, cb_base + ARM_SMMU_CB_FSR);
+
+	/* Retry or terminate any stalled transactions */
+	if (fsr & FSR_SS)
+		writel_relaxed(resume, cb_base + ARM_SMMU_CB_RESUME);
+
+	return ret;
+}
+
+static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
+{
+	u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+	struct arm_smmu_device *smmu = dev;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+	gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+	gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+	gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+	dev_err_ratelimited(smmu->dev,
+		"Unexpected global fault, this could be serious\n");
+	dev_err_ratelimited(smmu->dev,
+		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+		gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+	writel(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+	return IRQ_NONE;
+}
+
+static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain)
+{
+	u32 reg;
+	bool stage1;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+	void __iomem *cb_base, *gr0_base, *gr1_base;
+
+	gr0_base = ARM_SMMU_GR0(smmu);
+	gr1_base = ARM_SMMU_GR1(smmu);
+	stage1 = root_cfg->cbar != CBAR_TYPE_S2_TRANS;
+	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, root_cfg->cbndx);
+
+	/* CBAR */
+	reg = root_cfg->cbar |
+	      (root_cfg->vmid << CBAR_VMID_SHIFT);
+	if (smmu->version == 1)
+		reg |= root_cfg->irptndx << CBAR_IRPTNDX_SHIFT;
+
+	/* Use the weakest memory type, so it is overridden by the pte */
+	if (stage1)
+		reg |= (CBAR_S1_MEMATTR_WB << CBAR_S1_MEMATTR_SHIFT);
+	writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBAR(root_cfg->cbndx));
+
+	if (smmu->version > 1) {
+		/* CBA2R */
+#ifdef CONFIG_64BIT
+		reg = CBA2R_RW64_64BIT;
+#else
+		reg = CBA2R_RW64_32BIT;
+#endif
+		writel_relaxed(reg,
+			       gr1_base + ARM_SMMU_GR1_CBA2R(root_cfg->cbndx));
+
+		/* TTBCR2 */
+		switch (smmu->input_size) {
+		case 32:
+			reg = (TTBCR2_ADDR_32 << TTBCR2_SEP_SHIFT);
+			break;
+		case 36:
+			reg = (TTBCR2_ADDR_36 << TTBCR2_SEP_SHIFT);
+			break;
+		case 39:
+			reg = (TTBCR2_ADDR_40 << TTBCR2_SEP_SHIFT);
+			break;
+		case 42:
+			reg = (TTBCR2_ADDR_42 << TTBCR2_SEP_SHIFT);
+			break;
+		case 44:
+			reg = (TTBCR2_ADDR_44 << TTBCR2_SEP_SHIFT);
+			break;
+		case 48:
+			reg = (TTBCR2_ADDR_48 << TTBCR2_SEP_SHIFT);
+			break;
+		}
+
+		switch (smmu->s1_output_size) {
+		case 32:
+			reg |= (TTBCR2_ADDR_32 << TTBCR2_PASIZE_SHIFT);
+			break;
+		case 36:
+			reg |= (TTBCR2_ADDR_36 << TTBCR2_PASIZE_SHIFT);
+			break;
+		case 39:
+			reg |= (TTBCR2_ADDR_40 << TTBCR2_PASIZE_SHIFT);
+			break;
+		case 42:
+			reg |= (TTBCR2_ADDR_42 << TTBCR2_PASIZE_SHIFT);
+			break;
+		case 44:
+			reg |= (TTBCR2_ADDR_44 << TTBCR2_PASIZE_SHIFT);
+			break;
+		case 48:
+			reg |= (TTBCR2_ADDR_48 << TTBCR2_PASIZE_SHIFT);
+			break;
+		}
+
+		if (stage1)
+			writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBCR2);
+	}
+
+	/* TTBR0 */
+	reg = __pa(root_cfg->pgd);
+#ifndef __BIG_ENDIAN
+	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_LO);
+	reg = (phys_addr_t)__pa(root_cfg->pgd) >> 32;
+	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_HI);
+#else
+	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_HI);
+	reg = (phys_addr_t)__pa(root_cfg->pgd) >> 32;
+	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_LO);
+#endif
+
+	/*
+	 * TTBCR
+	 * We use long descriptor, with inner-shareable WBWA tables in TTBR0.
+	 */
+	if (smmu->version > 1) {
+		if (PAGE_SIZE == SZ_4K)
+			reg = TTBCR_TG0_4K;
+		else
+			reg = TTBCR_TG0_64K;
+
+		if (!stage1) {
+			switch (smmu->s2_output_size) {
+			case 32:
+				reg |= (TTBCR2_ADDR_32 << TTBCR_PASIZE_SHIFT);
+				break;
+			case 36:
+				reg |= (TTBCR2_ADDR_36 << TTBCR_PASIZE_SHIFT);
+				break;
+			case 40:
+				reg |= (TTBCR2_ADDR_40 << TTBCR_PASIZE_SHIFT);
+				break;
+			case 42:
+				reg |= (TTBCR2_ADDR_42 << TTBCR_PASIZE_SHIFT);
+				break;
+			case 44:
+				reg |= (TTBCR2_ADDR_44 << TTBCR_PASIZE_SHIFT);
+				break;
+			case 48:
+				reg |= (TTBCR2_ADDR_48 << TTBCR_PASIZE_SHIFT);
+				break;
+			}
+		} else {
+			reg |= (64 - smmu->s1_output_size) << TTBCR_T0SZ_SHIFT;
+		}
+	} else {
+		reg = 0;
+	}
+
+	reg |= TTBCR_EAE |
+	      (TTBCR_SH_IS << TTBCR_SH0_SHIFT) |
+	      (TTBCR_RGN_WBWA << TTBCR_ORGN0_SHIFT) |
+	      (TTBCR_RGN_WBWA << TTBCR_IRGN0_SHIFT) |
+	      (TTBCR_SL0_LVL_1 << TTBCR_SL0_SHIFT);
+	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBCR);
+
+	/* MAIR0 (stage-1 only) */
+	if (stage1) {
+		reg = (MAIR_ATTR_NC << MAIR_ATTR_SHIFT(MAIR_ATTR_IDX_NC)) |
+		      (MAIR_ATTR_WBRWA << MAIR_ATTR_SHIFT(MAIR_ATTR_IDX_CACHE)) |
+		      (MAIR_ATTR_DEVICE << MAIR_ATTR_SHIFT(MAIR_ATTR_IDX_DEV));
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_S1_MAIR0);
+	}
+
+	/* Nuke the TLB */
+	writel_relaxed(root_cfg->vmid, gr0_base + ARM_SMMU_GR0_TLBIVMID);
+	arm_smmu_tlb_sync(smmu);
+
+	/* SCTLR */
+	reg = SCTLR_CFCFG | SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;
+	if (stage1)
+		reg |= SCTLR_S1_ASIDPNE;
+#ifdef __BIG_ENDIAN
+	reg |= SCTLR_E;
+#endif
+	writel(reg, cb_base + ARM_SMMU_CB_SCTLR);
+}
+
+static int arm_smmu_init_domain_context(struct iommu_domain *domain)
+{
+	int irq, ret, start;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+
+	ret = __arm_smmu_alloc_bitmap(smmu->vmid_map, 0, ARM_SMMU_NUM_VMIDS);
+	if (IS_ERR_VALUE(ret))
+		goto out;
+
+	root_cfg->vmid = ret;
+	if (smmu->features & ARM_SMMU_FEAT_TRANS_NESTED) {
+		/*
+		 * We will likely want to change this if/when KVM gets
+		 * involved.
+		 */
+		root_cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
+		start = smmu->num_s2_context_banks;
+	} else if (smmu->features & ARM_SMMU_FEAT_TRANS_S2) {
+		root_cfg->cbar = CBAR_TYPE_S2_TRANS;
+		start = 0;
+	} else {
+		root_cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
+		start = smmu->num_s2_context_banks;
+	}
+
+	ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
+				      smmu->num_context_banks);
+	if (IS_ERR_VALUE(ret))
+		goto out_free_vmid;
+
+	root_cfg->cbndx = ret;
+
+	if (smmu->version == 1) {
+		root_cfg->irptndx = atomic_inc_return(&smmu->irptndx);
+		root_cfg->irptndx %= smmu->num_context_irqs;
+	} else {
+		root_cfg->irptndx = root_cfg->cbndx;
+	}
+
+	irq = smmu->irqs[smmu->num_global_irqs + root_cfg->irptndx];
+	ret = request_irq(irq, arm_smmu_context_fault, IRQF_SHARED,
+			  "arm-smmu-context-fault", domain);
+	if (IS_ERR_VALUE(ret)) {
+		dev_err(smmu->dev, "failed to request context IRQ %d (%u)\n",
+			root_cfg->irptndx, irq);
+		root_cfg->irptndx = -1;
+		goto out_free_context;
+	}
+
+	arm_smmu_init_context_bank(smmu_domain);
+out:
+	return ret;
+
+out_free_context:
+	__arm_smmu_free_bitmap(smmu->context_map, root_cfg->cbndx);
+out_free_vmid:
+	__arm_smmu_free_bitmap(smmu->vmid_map, root_cfg->vmid);
+	return ret;
+}
+
+static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+	int irq;
+
+	if (!smmu)
+		return;
+
+	if (root_cfg->irptndx != -1) {
+		irq = smmu->irqs[smmu->num_global_irqs + root_cfg->irptndx];
+		free_irq(irq, domain);
+	}
+
+	__arm_smmu_free_bitmap(smmu->vmid_map, root_cfg->vmid);
+	__arm_smmu_free_bitmap(smmu->context_map, root_cfg->cbndx);
+}
+
+static int arm_smmu_domain_init(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain;
+	pgd_t *pgd;
+
+	/*
+	 * Allocate the domain and initialise some of its data structures.
+	 * We can't really do anything meaningful until we've added a
+	 * master.
+	 */
+	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
+	if (!smmu_domain)
+		return -ENOMEM;
+
+	pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	if (!pgd)
+		goto out_free_domain;
+	smmu_domain->root_cfg.pgd = pgd;
+
+	spin_lock_init(&smmu_domain->lock);
+	domain->priv = smmu_domain;
+	return 0;
+
+out_free_domain:
+	kfree(smmu_domain);
+	return -ENOMEM;
+}
+
+static void arm_smmu_free_ptes(pmd_t *pmd)
+{
+	pgtable_t table = pmd_pgtable(*pmd);
+	pgtable_page_dtor(table);
+	__free_page(table);
+}
+
+static void arm_smmu_free_pmds(pud_t *pud)
+{
+	int i;
+	pmd_t *pmd, *pmd_base = pmd_offset(pud, 0);
+
+	pmd = pmd_base;
+	for (i = 0; i < PTRS_PER_PMD; ++i, ++pmd) {
+		/* Advance pmd on every iteration, including empty entries */
+		if (pmd_none(*pmd))
+			continue;
+
+		arm_smmu_free_ptes(pmd);
+	}
+
+	pmd_free(NULL, pmd_base);
+}
+
+static void arm_smmu_free_puds(pgd_t *pgd)
+{
+	int i;
+	pud_t *pud, *pud_base = pud_offset(pgd, 0);
+
+	pud = pud_base;
+	for (i = 0; i < PTRS_PER_PUD; ++i, ++pud) {
+		/* Advance pud on every iteration, including empty entries */
+		if (pud_none(*pud))
+			continue;
+
+		arm_smmu_free_pmds(pud);
+	}
+
+	pud_free(NULL, pud_base);
+}
+
+static void arm_smmu_free_pgtables(struct arm_smmu_domain *smmu_domain)
+{
+	int i;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	pgd_t *pgd, *pgd_base = root_cfg->pgd;
+
+	/*
+	 * Recursively free the page tables for this domain. We don't
+	 * care about speculative TLB filling, because the TLB will be
+	 * nuked next time this context bank is re-allocated and no devices
+	 * currently map to these tables.
+	 */
+	pgd = pgd_base;
+	for (i = 0; i < PTRS_PER_PGD; ++i, ++pgd) {
+		if (pgd_none(*pgd))
+			continue;
+
+		arm_smmu_free_puds(pgd);
+	}
+
+	kfree(pgd_base);
+}
+
+static void arm_smmu_domain_destroy(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	arm_smmu_destroy_domain_context(domain);
+	arm_smmu_free_pgtables(smmu_domain);
+	kfree(smmu_domain);
+}
+
+static int arm_smmu_master_configure_smrs(struct arm_smmu_device *smmu,
+					  struct arm_smmu_master *master)
+{
+	int i;
+	struct arm_smmu_smr *smrs;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	if (!(smmu->features & ARM_SMMU_FEAT_STREAM_MATCH))
+		return 0;
+
+	if (master->smrs)
+		return -EEXIST;
+
+	smrs = kmalloc(sizeof(*smrs) * master->num_streamids, GFP_KERNEL);
+	if (!smrs) {
+		dev_err(smmu->dev, "failed to allocate %d SMRs for master %s\n",
+			master->num_streamids, master->of_node->name);
+		return -ENOMEM;
+	}
+
+	/* Allocate the SMRs on the root SMMU */
+	for (i = 0; i < master->num_streamids; ++i) {
+		int idx = __arm_smmu_alloc_bitmap(smmu->smr_map, 0,
+						  smmu->num_mapping_groups);
+		if (IS_ERR_VALUE(idx)) {
+			dev_err(smmu->dev, "failed to allocate free SMR\n");
+			goto err_free_smrs;
+		}
+
+		smrs[i] = (struct arm_smmu_smr) {
+			.idx	= idx,
+			.mask	= 0, /* We don't currently share SMRs */
+			.id	= master->streamids[i],
+		};
+	}
+
+	/* It worked! Now, poke the actual hardware */
+	for (i = 0; i < master->num_streamids; ++i) {
+		u32 reg = SMR_VALID | smrs[i].id << SMR_ID_SHIFT |
+			  smrs[i].mask << SMR_MASK_SHIFT;
+		writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_SMR(smrs[i].idx));
+	}
+
+	master->smrs = smrs;
+	return 0;
+
+err_free_smrs:
+	while (--i >= 0)
+		__arm_smmu_free_bitmap(smmu->smr_map, smrs[i].idx);
+	kfree(smrs);
+	return -ENOSPC;
+}
+
+static void arm_smmu_master_free_smrs(struct arm_smmu_device *smmu,
+				      struct arm_smmu_master *master)
+{
+	int i;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+	struct arm_smmu_smr *smrs = master->smrs;
+
+	/* Invalidate the SMRs before freeing back to the allocator */
+	for (i = 0; i < master->num_streamids; ++i) {
+		u8 idx = smrs[i].idx;
+		writel_relaxed(~SMR_VALID, gr0_base + ARM_SMMU_GR0_SMR(idx));
+		__arm_smmu_free_bitmap(smmu->smr_map, idx);
+	}
+
+	master->smrs = NULL;
+	kfree(smrs);
+}
+
+static void arm_smmu_bypass_stream_mapping(struct arm_smmu_device *smmu,
+					   struct arm_smmu_master *master)
+{
+	int i;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	for (i = 0; i < master->num_streamids; ++i) {
+		u16 sid = master->streamids[i];
+		writel_relaxed(S2CR_TYPE_BYPASS,
+			       gr0_base + ARM_SMMU_GR0_S2CR(sid));
+	}
+}
+
+static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
+				      struct arm_smmu_master *master)
+{
+	int i, ret;
+	struct arm_smmu_device *parent, *smmu = smmu_domain->root_cfg.smmu;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	ret = arm_smmu_master_configure_smrs(smmu, master);
+	if (ret)
+		return ret;
+
+	/* Bypass the leaves */
+	smmu = smmu_domain->leaf_smmu;
+	while ((parent = find_parent_smmu(smmu))) {
+		/*
+		 * We won't have a StreamID match for anything but the root
+		 * smmu, so we only need to worry about StreamID indexing,
+		 * where we must install bypass entries in the S2CRs.
+		 */
+		if (!(smmu->features & ARM_SMMU_FEAT_STREAM_MATCH))
+			arm_smmu_bypass_stream_mapping(smmu, master);
+
+		/* Always step towards the root, or the loop never ends */
+		smmu = parent;
+	}
+
+	/* Now we're at the root, time to point at our context bank */
+	for (i = 0; i < master->num_streamids; ++i) {
+		u32 idx, s2cr;
+		idx = master->smrs ? master->smrs[i].idx : master->streamids[i];
+		s2cr = (S2CR_TYPE_TRANS << S2CR_TYPE_SHIFT) |
+		       (smmu_domain->root_cfg.cbndx << S2CR_CBNDX_SHIFT);
+		writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));
+	}
+
+	return 0;
+}
+
+static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
+					  struct arm_smmu_master *master)
+{
+	struct arm_smmu_device *smmu = smmu_domain->root_cfg.smmu;
+
+	/*
+	 * We *must* clear the S2CR first, because freeing the SMR means
+	 * that it can be re-allocated immediately.
+	 */
+	arm_smmu_bypass_stream_mapping(smmu, master);
+	arm_smmu_master_free_smrs(smmu, master);
+}
+
+static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
+{
+	int ret = -EINVAL;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_device *device_smmu = dev->archdata.iommu;
+	struct arm_smmu_master *master;
+
+	if (!device_smmu) {
+		dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
+		return -ENXIO;
+	}
+
+
+	/*
+	 * Sanity check the domain. We don't currently support domains
+	 * that cross between different SMMU chains.
+	 */
+	spin_lock(&smmu_domain->lock);
+	if (!smmu_domain->leaf_smmu) {
+		smmu_domain->root_cfg.smmu = find_root_smmu(dev);
+		if (!smmu_domain->root_cfg.smmu) {
+			dev_err(dev, "unable to find root SMMU for device\n");
+			goto err_unlock;
+		}
+
+		/* Now that we have a master, we can finalise the domain */
+		ret = arm_smmu_init_domain_context(domain);
+		if (IS_ERR_VALUE(ret))
+			goto err_unlock;
+
+		smmu_domain->leaf_smmu = device_smmu;
+	} else if (smmu_domain->leaf_smmu != device_smmu) {
+		dev_err(dev,
+			"cannot attach to SMMU %s whilst already attached to domain on SMMU %s\n",
+			dev_name(device_smmu->dev),
+			dev_name(smmu_domain->leaf_smmu->dev));
+		goto err_unlock;
+	}
+	spin_unlock(&smmu_domain->lock);
+
+	/* Looks ok, so add the device to the domain */
+	master = find_smmu_master(smmu_domain->leaf_smmu, dev->of_node);
+	if (!master)
+		return -ENODEV;
+
+	return arm_smmu_domain_add_master(smmu_domain, master);
+
+err_unlock:
+	spin_unlock(&smmu_domain->lock);
+	return ret;
+}
+
+static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_master *master;
+
+	master = find_smmu_master(smmu_domain->leaf_smmu, dev->of_node);
+	if (master)
+		arm_smmu_domain_remove_master(smmu_domain, master);
+}
+
+static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
+				   size_t size)
+{
+	unsigned long offset = (unsigned long)addr & ~PAGE_MASK;
+
+	/*
+	 * If the SMMU can't walk tables in the CPU caches, treat them
+	 * like non-coherent DMA...
+	 */
+	if (!(smmu->features & ARM_SMMU_FEAT_COHERENT_WALK))
+		dma_map_page(smmu->dev, virt_to_page(addr), offset, size,
+			     DMA_TO_DEVICE);
+}
+
+static bool arm_smmu_pte_is_contiguous_range(unsigned long addr,
+					     unsigned long end)
+{
+	return !(addr & ~ARM_SMMU_PTE_CONT_MASK) &&
+		(addr + ARM_SMMU_PTE_CONT_SIZE <= end);
+}
+
+static int arm_smmu_alloc_init_pte(struct arm_smmu_device *smmu, pmd_t *pmd,
+				   unsigned long addr, unsigned long end,
+				   unsigned long pfn, int flags, int stage)
+{
+	pte_t *pte, *start;
+	pteval_t pteval = ARM_SMMU_PTE_PAGE | ARM_SMMU_PTE_AF;
+
+	if (pmd_none(*pmd)) {
+		/* Allocate a new set of tables */
+		pgtable_t table = alloc_page(PGALLOC_GFP);
+		if (!table)
+			return -ENOMEM;
+
+		arm_smmu_flush_pgtable(smmu, page_address(table),
+				       ARM_SMMU_PTE_HWTABLE_SIZE);
+		pgtable_page_ctor(table);
+		pmd_populate(NULL, pmd, table);
+		arm_smmu_flush_pgtable(smmu, pmd, sizeof(*pmd));
+	}
+
+	if (stage == 1) {
+		pteval |= ARM_SMMU_PTE_AP_UNPRIV;
+		if (!(flags & IOMMU_WRITE) && (flags & IOMMU_READ))
+			pteval |= ARM_SMMU_PTE_AP_RDONLY;
+
+		if (flags & IOMMU_CACHE)
+			pteval |= (MAIR_ATTR_IDX_CACHE <<
+				   ARM_SMMU_PTE_ATTRINDX_SHIFT);
+	} else {
+		pteval |= ARM_SMMU_PTE_HAP_FAULT;
+		if (flags & IOMMU_READ)
+			pteval |= ARM_SMMU_PTE_HAP_READ;
+		if (flags & IOMMU_WRITE)
+			pteval |= ARM_SMMU_PTE_HAP_WRITE;
+		if (flags & IOMMU_CACHE)
+			pteval |= ARM_SMMU_PTE_MEMATTR_OIWB;
+		else
+			pteval |= ARM_SMMU_PTE_MEMATTR_NC;
+	}
+
+	/* If no access, create a faulting entry to avoid TLB fills */
+	if (!(flags & (IOMMU_READ | IOMMU_WRITE)))
+		pteval &= ~ARM_SMMU_PTE_PAGE;
+
+	pteval |= ARM_SMMU_PTE_SH_IS;
+	start = pmd_page_vaddr(*pmd) + pte_index(addr);
+	pte = start;
+
+	/*
+	 * Install the page table entries. This is fairly complicated
+	 * since we attempt to make use of the contiguous hint in the
+	 * ptes where possible. The contiguous hint indicates a series
+	 * of ARM_SMMU_PTE_CONT_ENTRIES ptes mapping a physically
+	 * contiguous region with the following constraints:
+	 *
+	 *   - The region start is aligned to ARM_SMMU_PTE_CONT_SIZE
+	 *   - Each pte in the region has the contiguous hint bit set
+	 *
+	 * This complicates unmapping (also handled by this code, when
+	 * neither IOMMU_READ nor IOMMU_WRITE is set) because it is
+	 * possible, yet highly unlikely, that a client may unmap only
+	 * part of a contiguous range. This requires clearing of the
+	 * contiguous hint bits in the range before installing the new
+	 * faulting entries.
+	 *
+	 * Note that re-mapping an address range without first unmapping
+	 * it is not supported, so TLB invalidation is not required here
+	 * and is instead performed at unmap and domain-init time.
+	 */
+	do {
+		int i = 1;
+		pteval &= ~ARM_SMMU_PTE_CONT;
+
+		if (arm_smmu_pte_is_contiguous_range(addr, end)) {
+			i = ARM_SMMU_PTE_CONT_ENTRIES;
+			pteval |= ARM_SMMU_PTE_CONT;
+		} else if (pte_val(*pte) &
+			   (ARM_SMMU_PTE_CONT | ARM_SMMU_PTE_PAGE)) {
+			int j;
+			pte_t *cont_start;
+			unsigned long idx = pte_index(addr);
+
+			idx &= ~(ARM_SMMU_PTE_CONT_ENTRIES - 1);
+			cont_start = pmd_page_vaddr(*pmd) + idx;
+			for (j = 0; j < ARM_SMMU_PTE_CONT_ENTRIES; ++j)
+				pte_val(*(cont_start + j)) &= ~ARM_SMMU_PTE_CONT;
+
+			arm_smmu_flush_pgtable(smmu, cont_start,
+					       sizeof(*pte) *
+					       ARM_SMMU_PTE_CONT_ENTRIES);
+		}
+
+		do {
+			*pte = pfn_pte(pfn, __pgprot(pteval));
+		} while (pte++, pfn++, addr += PAGE_SIZE, --i);
+	} while (addr != end);
+
+	arm_smmu_flush_pgtable(smmu, start, sizeof(*pte) * (pte - start));
+	return 0;
+}
+
+static int arm_smmu_alloc_init_pmd(struct arm_smmu_device *smmu, pud_t *pud,
+				   unsigned long addr, unsigned long end,
+				   phys_addr_t phys, int flags, int stage)
+{
+	int ret;
+	pmd_t *pmd;
+	unsigned long next, pfn = __phys_to_pfn(phys);
+
+#ifndef __PAGETABLE_PMD_FOLDED
+	if (pud_none(*pud)) {
+		pmd = pmd_alloc_one(NULL, addr);
+		if (!pmd)
+			return -ENOMEM;
+	} else
+#endif
+		pmd = pmd_offset(pud, addr);
+
+	do {
+		next = pmd_addr_end(addr, end);
+		ret = arm_smmu_alloc_init_pte(smmu, pmd, addr, next, pfn,
+					      flags, stage);
+		pud_populate(NULL, pud, pmd);
+		arm_smmu_flush_pgtable(smmu, pud, sizeof(*pud));
+		phys += next - addr;
+		pfn = __phys_to_pfn(phys);
+	} while (pmd++, addr = next, addr < end);
+
+	return ret;
+}
+
+static int arm_smmu_alloc_init_pud(struct arm_smmu_device *smmu, pgd_t *pgd,
+				   unsigned long addr, unsigned long end,
+				   phys_addr_t phys, int flags, int stage)
+{
+	int ret = 0;
+	pud_t *pud;
+	unsigned long next;
+
+#ifndef __PAGETABLE_PUD_FOLDED
+	if (pgd_none(*pgd)) {
+		pud = pud_alloc_one(NULL, addr);
+		if (!pud)
+			return -ENOMEM;
+	} else
+#endif
+		pud = pud_offset(pgd, addr);
+
+	do {
+		next = pud_addr_end(addr, end);
+		ret = arm_smmu_alloc_init_pmd(smmu, pud, addr, next, phys,
+					      flags, stage);
+		pgd_populate(NULL, pgd, pud);
+		arm_smmu_flush_pgtable(smmu, pgd, sizeof(*pgd));
+		phys += next - addr;
+	} while (pud++, addr = next, addr < end);
+
+	return ret;
+}
+
+static int arm_smmu_create_mapping(struct arm_smmu_domain *smmu_domain,
+				   unsigned long iova, phys_addr_t paddr,
+				   size_t size, int flags)
+{
+	int ret, stage;
+	unsigned long end;
+	phys_addr_t input_mask, output_mask;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	pgd_t *pgd = root_cfg->pgd;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+
+	if (root_cfg->cbar == CBAR_TYPE_S2_TRANS) {
+		stage = 2;
+		output_mask = (1ULL << smmu->s2_output_size) - 1;
+	} else {
+		stage = 1;
+		output_mask = (1ULL << smmu->s1_output_size) - 1;
+	}
+
+	if (!pgd)
+		return -EINVAL;
+
+	if (size & ~PAGE_MASK)
+		return -EINVAL;
+
+	input_mask = (1ULL << smmu->input_size) - 1;
+	if ((phys_addr_t)iova & ~input_mask)
+		return -ERANGE;
+
+	if (paddr & ~output_mask)
+		return -ERANGE;
+
+	spin_lock(&smmu_domain->lock);
+	pgd += pgd_index(iova);
+	end = iova + size;
+	do {
+		unsigned long next = pgd_addr_end(iova, end);
+
+		ret = arm_smmu_alloc_init_pud(smmu, pgd, iova, next, paddr,
+					      flags, stage);
+		if (ret)
+			goto out_unlock;
+
+		paddr += next - iova;
+		iova = next;
+	} while (pgd++, iova != end);
+
+out_unlock:
+	spin_unlock(&smmu_domain->lock);
+
+	/* Ensure new page tables are visible to the hardware walker */
+	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
+		dsb();
+
+	return ret;
+}
+
+static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
+			phys_addr_t paddr, size_t size, int flags)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_device *smmu = smmu_domain->leaf_smmu;
+
+	if (!smmu_domain || !smmu)
+		return -ENODEV;
+
+	/*
+	 * Check for silent address truncation up the SMMU chain.
+	 */
+	do {
+		phys_addr_t output_mask = (1ULL << smmu->s2_output_size) - 1;
+		if ((phys_addr_t)iova & ~output_mask)
+			return -ERANGE;
+	} while ((smmu = find_parent_smmu(smmu)));
+
+	return arm_smmu_create_mapping(smmu_domain, iova, paddr, size, flags);
+}
+
+static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
+			     size_t size)
+{
+	int ret;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	ret = arm_smmu_create_mapping(smmu_domain, iova, 0, size, 0);
+	writel_relaxed(root_cfg->vmid, gr0_base + ARM_SMMU_GR0_TLBIVMID);
+	arm_smmu_tlb_sync(smmu);
+	return ret ? 0 : size;
+}
+
+static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+					 dma_addr_t iova)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+
+	spin_lock(&smmu_domain->lock);
+	pgd = root_cfg->pgd;
+	if (!pgd)
+		goto err_unlock;
+
+	pgd += pgd_index(iova);
+	if (pgd_none_or_clear_bad(pgd))
+		goto err_unlock;
+
+	pud = pud_offset(pgd, iova);
+	if (pud_none_or_clear_bad(pud))
+		goto err_unlock;
+
+	pmd = pmd_offset(pud, iova);
+	if (pmd_none_or_clear_bad(pmd))
+		goto err_unlock;
+
+	pte = pmd_page_vaddr(*pmd) + pte_index(iova);
+	if (pte_none(*pte))
+		goto err_unlock;
+
+	spin_unlock(&smmu_domain->lock);
+	return __pfn_to_phys(pte_pfn(*pte)) | (iova & ~PAGE_MASK);
+
+err_unlock:
+	spin_unlock(&smmu_domain->lock);
+	dev_warn(smmu->dev,
+		 "invalid (corrupt?) page tables detected for iova 0x%llx\n",
+		 (unsigned long long)iova);
+	return 0;
+}
+
+static int arm_smmu_domain_has_cap(struct iommu_domain *domain,
+				   unsigned long cap)
+{
+	unsigned long caps = 0;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+
+	if (smmu_domain->root_cfg.smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
+		caps |= IOMMU_CAP_CACHE_COHERENCY;
+
+	return !!(cap & caps);
+}
+
+static int arm_smmu_add_device(struct device *dev)
+{
+	struct arm_smmu_device *child, *parent, *smmu;
+	struct arm_smmu_device *tmp[2];
+	struct arm_smmu_master *master = NULL;
+
+	list_for_each_entry_safe(parent, tmp[0], &arm_smmu_devices, list) {
+		smmu = parent;
+
+		/* Try to find a child of the current SMMU. */
+		list_for_each_entry_safe(child, tmp[1], &arm_smmu_devices, list) {
+			if (child->parent_of_node == parent->dev->of_node) {
+				/* Does the child sit above our master? */
+				master = find_smmu_master(child, dev->of_node);
+				if (master) {
+					smmu = NULL;
+					break;
+				}
+			}
+		}
+
+		/* We found some children, so keep searching. */
+		if (!smmu) {
+			master = NULL;
+			continue;
+		}
+
+		master = find_smmu_master(smmu, dev->of_node);
+		if (master)
+			break;
+	}
+
+	if (!master)
+		return -ENODEV;
+
+	dev->archdata.iommu = smmu;
+	return 0;
+}
+
+static void arm_smmu_remove_device(struct device *dev)
+{
+	dev->archdata.iommu = NULL;
+}
+
+static struct iommu_ops arm_smmu_ops = {
+	.domain_init	= arm_smmu_domain_init,
+	.domain_destroy	= arm_smmu_domain_destroy,
+	.attach_dev	= arm_smmu_attach_dev,
+	.detach_dev	= arm_smmu_detach_dev,
+	.map		= arm_smmu_map,
+	.unmap		= arm_smmu_unmap,
+	.iova_to_phys	= arm_smmu_iova_to_phys,
+	.domain_has_cap	= arm_smmu_domain_has_cap,
+	.add_device	= arm_smmu_add_device,
+	.remove_device	= arm_smmu_remove_device,
+	.pgsize_bitmap	= (SECTION_SIZE |
+			   ARM_SMMU_PTE_CONT_SIZE |
+			   PAGE_SIZE),
+};
+
+static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
+{
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+	int i = 0;
+	u32 scr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sCR0);
+
+	/* Mark all SMRn as invalid and all S2CRn as bypass */
+	for (i = 0; i < smmu->num_mapping_groups; ++i) {
+		writel_relaxed(~SMR_VALID, gr0_base + ARM_SMMU_GR0_SMR(i));
+		writel_relaxed(S2CR_TYPE_BYPASS, gr0_base + ARM_SMMU_GR0_S2CR(i));
+	}
+
+	/* Invalidate the TLB, just in case */
+	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_STLBIALL);
+	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
+	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
+
+	/* Enable fault reporting */
+	scr0 |= (sCR0_GFRE | sCR0_GFIE | sCR0_GCFGFRE | sCR0_GCFGFIE);
+
+	/* Disable TLB broadcasting. */
+	scr0 |= (sCR0_VMIDPNE | sCR0_PTM);
+
+	/* Enable client access, but bypass when no mapping is found */
+	scr0 &= ~(sCR0_CLIENTPD | sCR0_USFCFG);
+
+	/* Disable forced broadcasting */
+	scr0 &= ~sCR0_FB;
+
+	/* Don't upgrade barriers */
+	scr0 &= ~(sCR0_BSU_MASK << sCR0_BSU_SHIFT);
+
+	/* Push the button */
+	arm_smmu_tlb_sync(smmu);
+	writel(scr0, gr0_base + ARM_SMMU_GR0_sCR0);
+}
+
+static int arm_smmu_id_size_to_bits(int size)
+{
+	switch (size) {
+	case 0:
+		return 32;
+	case 1:
+		return 36;
+	case 2:
+		return 40;
+	case 3:
+		return 42;
+	case 4:
+		return 44;
+	case 5:
+	default:
+		return 48;
+	}
+}
+
+static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
+{
+	unsigned long size;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+	u32 id;
+
+	dev_notice(smmu->dev, "probing hardware configuration...\n");
+
+	/* Primecell ID */
+	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_PIDR2);
+	smmu->version = ((id >> PIDR2_ARCH_SHIFT) & PIDR2_ARCH_MASK) + 1;
+	dev_notice(smmu->dev, "SMMUv%d with:\n", smmu->version);
+
+	/* ID0 */
+	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID0);
+#ifndef CONFIG_64BIT
+	if (((id >> ID0_PTFS_SHIFT) & ID0_PTFS_MASK) == ID0_PTFS_V8_ONLY) {
+		dev_err(smmu->dev, "\tno v7 descriptor support!\n");
+		return -ENODEV;
+	}
+#endif
+	if (id & ID0_S1TS) {
+		smmu->features |= ARM_SMMU_FEAT_TRANS_S1;
+		dev_notice(smmu->dev, "\tstage 1 translation\n");
+	}
+
+	if (id & ID0_S2TS) {
+		smmu->features |= ARM_SMMU_FEAT_TRANS_S2;
+		dev_notice(smmu->dev, "\tstage 2 translation\n");
+	}
+
+	if (id & ID0_NTS) {
+		smmu->features |= ARM_SMMU_FEAT_TRANS_NESTED;
+		dev_notice(smmu->dev, "\tnested translation\n");
+	}
+
+	if (!(smmu->features &
+		(ARM_SMMU_FEAT_TRANS_S1 | ARM_SMMU_FEAT_TRANS_S2 |
+		 ARM_SMMU_FEAT_TRANS_NESTED))) {
+		dev_err(smmu->dev, "\tno translation support!\n");
+		return -ENODEV;
+	}
+
+	if (id & ID0_CTTW) {
+		smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
+		dev_notice(smmu->dev, "\tcoherent table walk\n");
+	}
+
+	if (id & ID0_SMS) {
+		u32 smr, sid, mask;
+
+		smmu->features |= ARM_SMMU_FEAT_STREAM_MATCH;
+		smmu->num_mapping_groups = (id >> ID0_NUMSMRG_SHIFT) &
+					   ID0_NUMSMRG_MASK;
+		if (smmu->num_mapping_groups == 0) {
+			dev_err(smmu->dev,
+				"stream-matching supported, but no SMRs present!\n");
+			return -ENODEV;
+		}
+
+		smr = SMR_MASK_MASK << SMR_MASK_SHIFT;
+		smr |= (SMR_ID_MASK << SMR_ID_SHIFT);
+		writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(0));
+		smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(0));
+
+		mask = (smr >> SMR_MASK_SHIFT) & SMR_MASK_MASK;
+		sid = (smr >> SMR_ID_SHIFT) & SMR_ID_MASK;
+		if ((mask & sid) != sid) {
+			dev_err(smmu->dev,
+				"SMR mask bits (0x%x) insufficient for ID field (0x%x)\n",
+				mask, sid);
+			return -ENODEV;
+		}
+
+		dev_notice(smmu->dev,
+			   "\tstream matching with %u register groups, mask 0x%x\n",
+			   smmu->num_mapping_groups, mask);
+	}
+
+	/* ID1 */
+	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID1);
+	smmu->pagesize = (id & ID1_PAGESIZE) ? SZ_64K : SZ_4K;
+
+	/* Check that we ioremapped enough */
+	size = 1 << (((id >> ID1_NUMPAGENDXB_SHIFT) & ID1_NUMPAGENDXB_MASK) + 1);
+	size *= (smmu->pagesize << 1);
+	if (smmu->size < size)
+		dev_warn(smmu->dev,
+			 "device is 0x%lx bytes but only mapped 0x%lx!\n",
+			 size, smmu->size);
+
+	smmu->num_s2_context_banks = (id >> ID1_NUMS2CB_SHIFT) &
+				      ID1_NUMS2CB_MASK;
+	smmu->num_context_banks = (id >> ID1_NUMCB_SHIFT) & ID1_NUMCB_MASK;
+	if (smmu->num_s2_context_banks > smmu->num_context_banks) {
+		dev_err(smmu->dev, "impossible number of S2 context banks!\n");
+		return -ENODEV;
+	}
+	dev_notice(smmu->dev, "\t%u context banks (%u stage-2 only)\n",
+		   smmu->num_context_banks, smmu->num_s2_context_banks);
+
+	/* ID2 */
+	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID2);
+	size = arm_smmu_id_size_to_bits((id >> ID2_IAS_SHIFT) & ID2_IAS_MASK);
+
+	/*
+	 * Stage-1 output limited by stage-2 input size due to pgd
+	 * allocation (PTRS_PER_PGD).
+	 */
+#ifdef CONFIG_64BIT
+	/* Current maximum output size of 39 bits */
+	smmu->s1_output_size = min(39UL, size);
+#else
+	smmu->s1_output_size = min(32UL, size);
+#endif
+
+	/* The stage-2 output mask is also applied for bypass */
+	size = arm_smmu_id_size_to_bits((id >> ID2_OAS_SHIFT) & ID2_OAS_MASK);
+	smmu->s2_output_size = min((unsigned long)PHYS_MASK_SHIFT, size);
+
+	if (smmu->version == 1) {
+		smmu->input_size = 32;
+	} else {
+#ifdef CONFIG_64BIT
+		size = (id >> ID2_UBS_SHIFT) & ID2_UBS_MASK;
+		size = min(39, arm_smmu_id_size_to_bits(size));
+#else
+		size = 32;
+#endif
+		smmu->input_size = size;
+
+		if ((PAGE_SIZE == SZ_4K && !(id & ID2_PTFS_4K)) ||
+		    (PAGE_SIZE == SZ_64K && !(id & ID2_PTFS_64K)) ||
+		    (PAGE_SIZE != SZ_4K && PAGE_SIZE != SZ_64K)) {
+			dev_err(smmu->dev, "CPU page size 0x%lx unsupported\n",
+				PAGE_SIZE);
+			return -ENODEV;
+		}
+	}
+
+	dev_notice(smmu->dev,
+		   "\t%lu-bit VA, %lu-bit IPA, %lu-bit PA\n",
+		   smmu->input_size, smmu->s1_output_size, smmu->s2_output_size);
+	return 0;
+}
+
+static int arm_smmu_device_dt_probe(struct platform_device *pdev)
+{
+	struct resource *res;
+	struct arm_smmu_device *smmu;
+	struct device_node *dev_node;
+	struct device *dev = &pdev->dev;
+	struct rb_node *node;
+	struct of_phandle_args masterspec;
+	int num_irqs, i, err;
+
+	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
+	if (!smmu) {
+		dev_err(dev, "failed to allocate arm_smmu_device\n");
+		return -ENOMEM;
+	}
+	smmu->dev = dev;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res) {
+		dev_err(dev, "missing base address/size\n");
+		return -ENODEV;
+	}
+
+	smmu->size = resource_size(res);
+	smmu->base = devm_request_and_ioremap(dev, res);
+	if (!smmu->base)
+		return -EADDRNOTAVAIL;
+
+	if (of_property_read_u32(dev->of_node, "#global-interrupts",
+				 &smmu->num_global_irqs)) {
+		dev_err(dev, "missing #global-interrupts property\n");
+		return -ENODEV;
+	}
+
+	num_irqs = 0;
+	while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
+		num_irqs++;
+		if (num_irqs > smmu->num_global_irqs)
+			smmu->num_context_irqs++;
+	}
+
+	if (num_irqs < smmu->num_global_irqs) {
+		dev_warn(dev, "found %d interrupts but expected at least %d\n",
+			 num_irqs, smmu->num_global_irqs);
+		smmu->num_global_irqs = num_irqs;
+	}
+	smmu->num_context_irqs = num_irqs - smmu->num_global_irqs;
+
+	smmu->irqs = devm_kzalloc(dev, sizeof(*smmu->irqs) * num_irqs,
+				  GFP_KERNEL);
+	if (!smmu->irqs) {
+		dev_err(dev, "failed to allocate %d irqs\n", num_irqs);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < num_irqs; ++i) {
+		int irq = platform_get_irq(pdev, i);
+		if (irq < 0) {
+			dev_err(dev, "failed to get irq index %d\n", i);
+			return -ENODEV;
+		}
+		smmu->irqs[i] = irq;
+	}
+
+	i = 0;
+	smmu->masters = RB_ROOT;
+	while (!of_parse_phandle_with_args(dev->of_node, "mmu-masters",
+					   "#stream-id-cells", i,
+					   &masterspec)) {
+		err = register_smmu_master(smmu, dev, &masterspec);
+		if (err) {
+			dev_err(dev, "failed to add master %s\n",
+				masterspec.np->name);
+			goto out_put_masters;
+		}
+
+		i++;
+	}
+	dev_notice(dev, "registered %d master devices\n", i);
+
+	if ((dev_node = of_parse_phandle(dev->of_node, "smmu-parent", 0)))
+		smmu->parent_of_node = dev_node;
+
+	err = arm_smmu_device_cfg_probe(smmu);
+	if (err)
+		goto out_put_parent;
+
+	if (smmu->version > 1 &&
+	    smmu->num_context_banks != smmu->num_context_irqs) {
+		dev_err(dev,
+			"found only %d context interrupt(s) but %d required\n",
+			smmu->num_context_irqs, smmu->num_context_banks);
+		err = -ENODEV;
+		goto out_put_parent;
+	}
+
+	arm_smmu_device_reset(smmu);
+
+	for (i = 0; i < smmu->num_global_irqs; ++i) {
+		err = request_irq(smmu->irqs[i],
+				  arm_smmu_global_fault,
+				  IRQF_SHARED,
+				  "arm-smmu global fault",
+				  smmu);
+		if (err) {
+			dev_err(dev, "failed to request global IRQ %d (%u)\n",
+				i, smmu->irqs[i]);
+			goto out_free_irqs;
+		}
+	}
+
+	INIT_LIST_HEAD(&smmu->list);
+	spin_lock(&arm_smmu_devices_lock);
+	list_add(&smmu->list, &arm_smmu_devices);
+	spin_unlock(&arm_smmu_devices_lock);
+	return 0;
+
+out_free_irqs:
+	while (i--)
+		free_irq(smmu->irqs[i], smmu);
+
+out_put_parent:
+	if (smmu->parent_of_node)
+		of_node_put(smmu->parent_of_node);
+
+out_put_masters:
+	for (node = rb_first(&smmu->masters); node; node = rb_next(node)) {
+		struct arm_smmu_master *master;
+		master = container_of(node, struct arm_smmu_master, node);
+		of_node_put(master->of_node);
+	}
+
+	return err;
+}
+
+static int arm_smmu_device_remove(struct platform_device *pdev)
+{
+	int i;
+	struct device *dev = &pdev->dev;
+	struct arm_smmu_device *curr, *tmp, *smmu = NULL;
+	struct rb_node *node;
+
+	list_for_each_entry_safe(curr, tmp, &arm_smmu_devices, list) {
+		if (curr->dev == dev) {
+			smmu = curr;
+			break;
+		}
+	}
+
+	if (!smmu)
+		return -ENODEV;
+
+	spin_lock(&arm_smmu_devices_lock);
+	list_del(&smmu->list);
+	spin_unlock(&arm_smmu_devices_lock);
+
+	if (smmu->parent_of_node)
+		of_node_put(smmu->parent_of_node);
+
+	for (node = rb_first(&smmu->masters); node; node = rb_next(node)) {
+		struct arm_smmu_master *master;
+		master = container_of(node, struct arm_smmu_master, node);
+		of_node_put(master->of_node);
+	}
+
+	if (!bitmap_empty(smmu->vmid_map, ARM_SMMU_NUM_VMIDS))
+		dev_err(dev, "removing device with active domains!\n");
+
+	for (i = 0; i < smmu->num_global_irqs; ++i)
+		free_irq(smmu->irqs[i], smmu);
+
+	/* Turn the thing off */
+	writel(sCR0_CLIENTPD, ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_sCR0);
+	return 0;
+}
+
+#ifdef CONFIG_OF
+static struct of_device_id arm_smmu_of_match[] = {
+	{ .compatible = "arm,smmu-v1", },
+	{ .compatible = "arm,smmu-v2", },
+	{ .compatible = "arm,mmu-400", },
+	{ .compatible = "arm,mmu-500", },
+	{ },
+};
+MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
+#endif
+
+static struct platform_driver arm_smmu_driver = {
+	.driver	= {
+		.owner		= THIS_MODULE,
+		.name		= "arm-smmu",
+		.of_match_table	= of_match_ptr(arm_smmu_of_match),
+	},
+	.probe	= arm_smmu_device_dt_probe,
+	.remove	= arm_smmu_device_remove,
+};
+
+static int __init arm_smmu_init(void)
+{
+	int ret;
+
+	ret = platform_driver_register(&arm_smmu_driver);
+	if (ret)
+		return ret;
+
+	/* Oh, for a proper bus abstraction */
+	if (!iommu_present(&platform_bus_type))
+		bus_set_iommu(&platform_bus_type, &arm_smmu_ops);
+
+	if (!iommu_present(&amba_bustype))
+		bus_set_iommu(&amba_bustype, &arm_smmu_ops);
+
+	return 0;
+}
+
+static void __exit arm_smmu_exit(void)
+{
+	platform_driver_unregister(&arm_smmu_driver);
+}
+
+module_init(arm_smmu_init);
+module_exit(arm_smmu_exit);
+
+MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations");
+MODULE_AUTHOR("Will Deacon <will.deacon@arm.com>");
+MODULE_LICENSE("GPL v2");
-- 
1.8.2.2

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture
@ 2013-06-10 18:34     ` Will Deacon
  0 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-10 18:34 UTC (permalink / raw)
  To: linux-arm-kernel

This patch adds support for SMMUs implementing the ARM System MMU
architecture versions 1 or 2. Both arm and arm64 are supported, although
the v7s descriptor format is not used.
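
For anyone reading along without the architecture spec to hand, here is a
rough user-space sketch of how the driver decodes the address-size fields
it reads from ID2. It is an illustration only and not part of the patch:
id_size_to_bits() and clamp_bits() are hypothetical stand-alone names, and
the 39-bit limit mirrors the cap the driver applies under CONFIG_64BIT.

```c
#include <assert.h>

/*
 * Hypothetical stand-alone mirror of the driver's size decoding:
 * encodings 0..4 map to 32/36/40/42/44 bits, and encoding 5 (or
 * anything else) is treated as 48 bits.
 */
int id_size_to_bits(int size)
{
	static const int bits[] = { 32, 36, 40, 42, 44, 48 };

	if (size < 0 || size > 5)
		return 48;
	return bits[size];
}

/*
 * Clamp a decoded size to what the page table code can handle
 * (assumed here to be passed in by the caller, e.g. 39 or 32).
 */
int clamp_bits(int bits, int limit)
{
	return bits < limit ? bits : limit;
}
```

With these helpers, a hardware-reported 48-bit size (encoding 5) is clamped
to 39 bits on a 64-bit kernel, while a 36-bit size (encoding 1) is used
unchanged.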

Cc: Rob Herring <robherring2@gmail.com>
Cc: Andreas Herrmann <andreas.herrmann@calxeda.com>
Cc: Olav Haugan <ohaugan@codeaurora.org>
Cc: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 drivers/iommu/Kconfig    |   13 +
 drivers/iommu/Makefile   |    1 +
 drivers/iommu/arm-smmu.c | 1965 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1979 insertions(+)
 create mode 100644 drivers/iommu/arm-smmu.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index c332fb9..957cfd4 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -261,4 +261,17 @@ config SHMOBILE_IOMMU_L1SIZE
 	default 256 if SHMOBILE_IOMMU_ADDRSIZE_64MB
 	default 128 if SHMOBILE_IOMMU_ADDRSIZE_32MB
 
+config ARM_SMMU
+	bool "ARM Ltd. System MMU (SMMU) Support"
+	depends on ARM64 || (ARM_LPAE && OF)
+	select IOMMU_API
+	select ARM_DMA_USE_IOMMU if ARM
+	help
+	  Support for implementations of the ARM System MMU architecture
+	  versions 1 and 2. The driver supports both v7l and v8l table
+	  formats with 4k and 64k page sizes.
+
+	  Say Y here if your SoC includes an IOMMU device implementing
+	  the ARM SMMU architecture.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index ef0e520..bbe7041 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_OF_IOMMU)	+= of_iommu.o
 obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o msm_iommu_dev.o
 obj-$(CONFIG_AMD_IOMMU) += amd_iommu.o amd_iommu_init.o
 obj-$(CONFIG_AMD_IOMMU_V2) += amd_iommu_v2.o
+obj-$(CONFIG_ARM_SMMU) += arm-smmu.o
 obj-$(CONFIG_DMAR_TABLE) += dmar.o
 obj-$(CONFIG_INTEL_IOMMU) += iova.o intel-iommu.o
 obj-$(CONFIG_IRQ_REMAP) += intel_irq_remapping.o irq_remapping.o
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
new file mode 100644
index 0000000..57ec597
--- /dev/null
+++ b/drivers/iommu/arm-smmu.c
@@ -0,0 +1,1965 @@
+/*
+ * IOMMU API for ARM architected SMMU implementations.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) 2013 ARM Limited
+ *
+ * Author: Will Deacon <will.deacon@arm.com>
+ *
+ * This driver currently supports:
+ *	- SMMUv1 and v2 implementations
+ *	- Stream-matching and stream-indexing
+ *	- v7/v8 long-descriptor format
+ *	- Non-secure access to the SMMU
+ *	- 4k and 64k pages, with contiguous pte hints
+ *	- Up to 39-bit addressing
+ *	- Context fault reporting
+ */
+
+#define pr_fmt(fmt) "arm-smmu: " fmt
+
+#include <linux/dma-mapping.h>
+#include <linux/err.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <linux/iommu.h>
+#include <linux/mm.h>
+#include <linux/module.h>
+#include <linux/of.h>
+#include <linux/platform_device.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include <linux/amba/bus.h>
+
+#include <asm/pgalloc.h>
+
+/* Maximum number of stream IDs assigned to a single device */
+#define MAX_MASTER_STREAMIDS		8
+
+/* Maximum number of context banks per SMMU */
+#define ARM_SMMU_MAX_CBS		128
+
+/* Maximum number of mapping groups per SMMU */
+#define ARM_SMMU_MAX_SMRS		128
+
+/* Number of VMIDs per SMMU */
+#define ARM_SMMU_NUM_VMIDS		256
+
+/* SMMU global address space */
+#define ARM_SMMU_GR0(smmu)		((smmu)->base)
+#define ARM_SMMU_GR1(smmu)		((smmu)->base + (smmu)->pagesize)
+
+/* Page table bits */
+#define ARM_SMMU_PTE_PAGE		(((pteval_t)3) << 0)
+#define ARM_SMMU_PTE_CONT		(((pteval_t)1) << 52)
+#define ARM_SMMU_PTE_AF			(((pteval_t)1) << 10)
+#define ARM_SMMU_PTE_SH_NS		(((pteval_t)0) << 8)
+#define ARM_SMMU_PTE_SH_OS		(((pteval_t)2) << 8)
+#define ARM_SMMU_PTE_SH_IS		(((pteval_t)3) << 8)
+
+#if PAGE_SIZE == SZ_4K
+#define ARM_SMMU_PTE_CONT_ENTRIES	16
+#elif PAGE_SIZE == SZ_64K
+#define ARM_SMMU_PTE_CONT_ENTRIES	32
+#else
+#define ARM_SMMU_PTE_CONT_ENTRIES	1
+#endif
+
+#define ARM_SMMU_PTE_CONT_SIZE		(PAGE_SIZE * ARM_SMMU_PTE_CONT_ENTRIES)
+#define ARM_SMMU_PTE_CONT_MASK		(~(ARM_SMMU_PTE_CONT_SIZE - 1))
+#define ARM_SMMU_PTE_HWTABLE_SIZE	(PTRS_PER_PTE * sizeof(pte_t))
+
+/* Stage-1 PTE */
+#define ARM_SMMU_PTE_AP_UNPRIV		(((pteval_t)1) << 6)
+#define ARM_SMMU_PTE_AP_RDONLY		(((pteval_t)2) << 6)
+#define ARM_SMMU_PTE_ATTRINDX_SHIFT	2
+
+/* Stage-2 PTE */
+#define ARM_SMMU_PTE_HAP_FAULT		(((pteval_t)0) << 6)
+#define ARM_SMMU_PTE_HAP_READ		(((pteval_t)1) << 6)
+#define ARM_SMMU_PTE_HAP_WRITE		(((pteval_t)2) << 6)
+#define ARM_SMMU_PTE_MEMATTR_OIWB	(((pteval_t)0xf) << 2)
+#define ARM_SMMU_PTE_MEMATTR_NC		(((pteval_t)0x5) << 2)
+#define ARM_SMMU_PTE_MEMATTR_DEV	(((pteval_t)0x1) << 2)
+
+/* Configuration registers */
+#define ARM_SMMU_GR0_sCR0		0x0
+#define sCR0_CLIENTPD			(1 << 0)
+#define sCR0_GFRE			(1 << 1)
+#define sCR0_GFIE			(1 << 2)
+#define sCR0_GCFGFRE			(1 << 4)
+#define sCR0_GCFGFIE			(1 << 5)
+#define sCR0_USFCFG			(1 << 10)
+#define sCR0_VMIDPNE			(1 << 11)
+#define sCR0_PTM			(1 << 12)
+#define sCR0_FB				(1 << 13)
+#define sCR0_BSU_SHIFT			14
+#define sCR0_BSU_MASK			0x3
+
+/* Identification registers */
+#define ARM_SMMU_GR0_ID0		0x20
+#define ARM_SMMU_GR0_ID1		0x24
+#define ARM_SMMU_GR0_ID2		0x28
+#define ARM_SMMU_GR0_ID3		0x2c
+#define ARM_SMMU_GR0_ID4		0x30
+#define ARM_SMMU_GR0_ID5		0x34
+#define ARM_SMMU_GR0_ID6		0x38
+#define ARM_SMMU_GR0_ID7		0x3c
+#define ARM_SMMU_GR0_sGFSR		0x48
+#define ARM_SMMU_GR0_sGFSYNR0		0x50
+#define ARM_SMMU_GR0_sGFSYNR1		0x54
+#define ARM_SMMU_GR0_sGFSYNR2		0x58
+#define ARM_SMMU_GR0_PIDR0		0xfe0
+#define ARM_SMMU_GR0_PIDR1		0xfe4
+#define ARM_SMMU_GR0_PIDR2		0xfe8
+
+#define ID0_S1TS			(1 << 30)
+#define ID0_S2TS			(1 << 29)
+#define ID0_NTS				(1 << 28)
+#define ID0_SMS				(1 << 27)
+#define ID0_PTFS_SHIFT			24
+#define ID0_PTFS_MASK			0x2
+#define ID0_PTFS_V8_ONLY		0x2
+#define ID0_CTTW			(1 << 14)
+#define ID0_NUMIRPT_SHIFT		16
+#define ID0_NUMIRPT_MASK		0xff
+#define ID0_NUMSMRG_SHIFT		0
+#define ID0_NUMSMRG_MASK		0xff
+
+#define ID1_PAGESIZE			(1 << 31)
+#define ID1_NUMPAGENDXB_SHIFT		28
+#define ID1_NUMPAGENDXB_MASK		7
+#define ID1_NUMS2CB_SHIFT		16
+#define ID1_NUMS2CB_MASK		0xff
+#define ID1_NUMCB_SHIFT			0
+#define ID1_NUMCB_MASK			0xff
+
+#define ID2_OAS_SHIFT			4
+#define ID2_OAS_MASK			0xf
+#define ID2_IAS_SHIFT			0
+#define ID2_IAS_MASK			0xf
+#define ID2_UBS_SHIFT			8
+#define ID2_UBS_MASK			0xf
+#define ID2_PTFS_4K			(1 << 12)
+#define ID2_PTFS_16K			(1 << 13)
+#define ID2_PTFS_64K			(1 << 14)
+
+#define PIDR2_ARCH_SHIFT		4
+#define PIDR2_ARCH_MASK			0xf
+
+/* Global TLB invalidation */
+#define ARM_SMMU_GR0_STLBIALL		0x60
+#define ARM_SMMU_GR0_TLBIVMID		0x64
+#define ARM_SMMU_GR0_TLBIALLNSNH	0x68
+#define ARM_SMMU_GR0_TLBIALLH		0x6c
+#define ARM_SMMU_GR0_sTLBGSYNC		0x70
+#define ARM_SMMU_GR0_sTLBGSTATUS	0x74
+#define sTLBGSTATUS_GSACTIVE		(1 << 0)
+
+/* Stream mapping registers */
+#define ARM_SMMU_GR0_SMR(n)		(0x800 + ((n) << 2))
+#define SMR_VALID			(1 << 31)
+#define SMR_MASK_SHIFT			16
+#define SMR_MASK_MASK			0x7fff
+#define SMR_ID_SHIFT			0
+#define SMR_ID_MASK			0x7fff
+
+#define ARM_SMMU_GR0_S2CR(n)		(0xc00 + ((n) << 2))
+#define S2CR_CBNDX_SHIFT		0
+#define S2CR_CBNDX_MASK			0xff
+#define S2CR_TYPE_SHIFT			16
+#define S2CR_TYPE_MASK			0x3
+#define S2CR_TYPE_TRANS			(0 << S2CR_TYPE_SHIFT)
+#define S2CR_TYPE_BYPASS		(1 << S2CR_TYPE_SHIFT)
+#define S2CR_TYPE_FAULT			(2 << S2CR_TYPE_SHIFT)
+
+/* Context bank attribute registers */
+#define ARM_SMMU_GR1_CBAR(n)		(0x0 + ((n) << 2))
+#define CBAR_VMID_SHIFT			0
+#define CBAR_VMID_MASK			0xff
+#define CBAR_S1_MEMATTR_SHIFT		12
+#define CBAR_S1_MEMATTR_MASK		0xf
+#define CBAR_S1_MEMATTR_WB		0xf
+#define CBAR_TYPE_SHIFT			16
+#define CBAR_TYPE_MASK			0x3
+#define CBAR_TYPE_S2_TRANS		(0 << CBAR_TYPE_SHIFT)
+#define CBAR_TYPE_S1_TRANS_S2_BYPASS	(1 << CBAR_TYPE_SHIFT)
+#define CBAR_TYPE_S1_TRANS_S2_FAULT	(2 << CBAR_TYPE_SHIFT)
+#define CBAR_TYPE_S1_TRANS_S2_TRANS	(3 << CBAR_TYPE_SHIFT)
+#define CBAR_IRPTNDX_SHIFT		24
+#define CBAR_IRPTNDX_MASK		0xff
+
+#define ARM_SMMU_GR1_CBA2R(n)		(0x800 + ((n) << 2))
+#define CBA2R_RW64_32BIT		(0 << 0)
+#define CBA2R_RW64_64BIT		(1 << 0)
+
+/* Translation context bank */
+#define ARM_SMMU_CB_BASE(smmu)		((smmu)->base + ((smmu)->size >> 1))
+#define ARM_SMMU_CB(smmu, n)		((n) * (smmu)->pagesize)
+
+#define ARM_SMMU_CB_SCTLR		0x0
+#define ARM_SMMU_CB_RESUME		0x8
+#define ARM_SMMU_CB_TTBCR2		0x10
+#define ARM_SMMU_CB_TTBR0_LO		0x20
+#define ARM_SMMU_CB_TTBR0_HI		0x24
+#define ARM_SMMU_CB_TTBCR		0x30
+#define ARM_SMMU_CB_S1_MAIR0		0x38
+#define ARM_SMMU_CB_FSR			0x58
+#define ARM_SMMU_CB_FAR_LO		0x60
+#define ARM_SMMU_CB_FAR_HI		0x64
+#define ARM_SMMU_CB_FSYNR0		0x68
+
+#define SCTLR_S1_ASIDPNE		(1 << 12)
+#define SCTLR_CFCFG			(1 << 7)
+#define SCTLR_CFIE			(1 << 6)
+#define SCTLR_CFRE			(1 << 5)
+#define SCTLR_E				(1 << 4)
+#define SCTLR_AFE			(1 << 2)
+#define SCTLR_TRE			(1 << 1)
+#define SCTLR_M				(1 << 0)
+#define SCTLR_EAE_SBOP			(SCTLR_AFE | SCTLR_TRE)
+
+#define RESUME_RETRY			(0 << 0)
+#define RESUME_TERMINATE		(1 << 0)
+
+#define TTBCR_EAE			(1 << 31)
+
+#define TTBCR_PASIZE_SHIFT		16
+#define TTBCR_PASIZE_MASK		0x7
+
+#define TTBCR_TG0_4K			(0 << 14)
+#define TTBCR_TG0_64K			(1 << 14)
+
+#define TTBCR_SH0_SHIFT			12
+#define TTBCR_SH0_MASK			0x3
+#define TTBCR_SH_NS			0
+#define TTBCR_SH_OS			2
+#define TTBCR_SH_IS			3
+
+#define TTBCR_ORGN0_SHIFT		10
+#define TTBCR_IRGN0_SHIFT		8
+#define TTBCR_RGN_MASK			0x3
+#define TTBCR_RGN_NC			0
+#define TTBCR_RGN_WBWA			1
+#define TTBCR_RGN_WT			2
+#define TTBCR_RGN_WB			3
+
+#define TTBCR_SL0_SHIFT			6
+#define TTBCR_SL0_MASK			0x3
+#define TTBCR_SL0_LVL_2			0
+#define TTBCR_SL0_LVL_1			1
+
+#define TTBCR_T1SZ_SHIFT		16
+#define TTBCR_T0SZ_SHIFT		0
+#define TTBCR_SZ_MASK			0xf
+
+#define TTBCR2_SEP_SHIFT		15
+#define TTBCR2_SEP_MASK			0x7
+
+#define TTBCR2_PASIZE_SHIFT		0
+#define TTBCR2_PASIZE_MASK		0x7
+
+/* Common definitions for PASize and SEP fields */
+#define TTBCR2_ADDR_32			0
+#define TTBCR2_ADDR_36			1
+#define TTBCR2_ADDR_40			2
+#define TTBCR2_ADDR_42			3
+#define TTBCR2_ADDR_44			4
+#define TTBCR2_ADDR_48			5
+
+#define MAIR_ATTR_SHIFT(n)		((n) << 3)
+#define MAIR_ATTR_MASK			0xff
+#define MAIR_ATTR_DEVICE		0x04
+#define MAIR_ATTR_NC			0x44
+#define MAIR_ATTR_WBRWA			0xff
+#define MAIR_ATTR_IDX_NC		0
+#define MAIR_ATTR_IDX_CACHE		1
+#define MAIR_ATTR_IDX_DEV		2
+
+#define FSR_MULTI			(1 << 31)
+#define FSR_SS				(1 << 30)
+#define FSR_UUT				(1 << 8)
+#define FSR_ASF				(1 << 7)
+#define FSR_TLBLKF			(1 << 6)
+#define FSR_TLBMCF			(1 << 5)
+#define FSR_EF				(1 << 4)
+#define FSR_PF				(1 << 3)
+#define FSR_AFF				(1 << 2)
+#define FSR_TF				(1 << 1)
+
+#define FSR_IGN				(FSR_AFF | FSR_ASF | FSR_TLBMCF |	\
+					 FSR_TLBLKF)
+#define FSR_FAULT			(FSR_MULTI | FSR_SS | FSR_UUT |		\
+					 FSR_EF | FSR_PF | FSR_TF)
+
+#define FSYNR0_WNR			(1 << 4)
+
+struct arm_smmu_smr {
+	u8				idx;
+	u16				mask;
+	u16				id;
+};
+
+struct arm_smmu_master {
+	struct device_node		*of_node;
+
+	/*
+	 * The following is specific to the master's position in the
+	 * SMMU chain.
+	 */
+	struct rb_node			node;
+	int				num_streamids;
+	u16				streamids[MAX_MASTER_STREAMIDS];
+
+	/*
+	 * We only need to allocate these on the root SMMU, as we
+	 * configure unmatched streams to bypass translation.
+	 */
+	struct arm_smmu_smr		*smrs;
+};
+
+struct arm_smmu_device {
+	struct device			*dev;
+	struct device_node		*parent_of_node;
+
+	void __iomem			*base;
+	unsigned long			size;
+	unsigned long			pagesize;
+
+#define ARM_SMMU_FEAT_COHERENT_WALK	(1 << 0)
+#define ARM_SMMU_FEAT_STREAM_MATCH	(1 << 1)
+#define ARM_SMMU_FEAT_TRANS_S1		(1 << 2)
+#define ARM_SMMU_FEAT_TRANS_S2		(1 << 3)
+#define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
+	u32				features;
+	int				version;
+
+	u32				num_context_banks;
+	u32				num_s2_context_banks;
+	DECLARE_BITMAP(context_map, ARM_SMMU_MAX_CBS);
+	atomic_t			irptndx;
+
+	u32				num_mapping_groups;
+	DECLARE_BITMAP(smr_map, ARM_SMMU_MAX_SMRS);
+
+	unsigned long			input_size;
+	unsigned long			s1_output_size;
+	unsigned long			s2_output_size;
+
+	u32				num_global_irqs;
+	u32				num_context_irqs;
+	unsigned int			*irqs;
+
+	DECLARE_BITMAP(vmid_map, ARM_SMMU_NUM_VMIDS);
+
+	struct list_head		list;
+	struct rb_root			masters;
+};
+
+struct arm_smmu_cfg {
+	struct arm_smmu_device		*smmu;
+	u8				vmid;
+	u8				cbndx;
+	u8				irptndx;
+	u32				cbar;
+	pgd_t				*pgd;
+};
+
+struct arm_smmu_domain {
+	/*
+	 * A domain can span across multiple, chained SMMUs and requires
+	 * all devices within the domain to follow the same translation
+	 * path.
+	 */
+	struct arm_smmu_device		*leaf_smmu;
+	struct arm_smmu_cfg		root_cfg;
+
+	spinlock_t			lock;
+};
+
+static DEFINE_SPINLOCK(arm_smmu_devices_lock);
+static LIST_HEAD(arm_smmu_devices);
+
+static struct arm_smmu_master *find_smmu_master(struct arm_smmu_device *smmu,
+						struct device_node *dev_node)
+{
+	struct rb_node *node = smmu->masters.rb_node;
+
+	while (node) {
+		struct arm_smmu_master *master;
+		master = container_of(node, struct arm_smmu_master, node);
+
+		if (dev_node < master->of_node)
+			node = node->rb_left;
+		else if (dev_node > master->of_node)
+			node = node->rb_right;
+		else
+			return master;
+	}
+
+	return NULL;
+}
+
+static int insert_smmu_master(struct arm_smmu_device *smmu,
+			      struct arm_smmu_master *master)
+{
+	struct rb_node **new, *parent;
+
+	new = &smmu->masters.rb_node;
+	parent = NULL;
+	while (*new) {
+		struct arm_smmu_master *this;
+		this = container_of(*new, struct arm_smmu_master, node);
+
+		parent = *new;
+		if (master->of_node < this->of_node)
+			new = &((*new)->rb_left);
+		else if (master->of_node > this->of_node)
+			new = &((*new)->rb_right);
+		else
+			return -EEXIST;
+	}
+
+	rb_link_node(&master->node, parent, new);
+	rb_insert_color(&master->node, &smmu->masters);
+	return 0;
+}
+
+static int register_smmu_master(struct arm_smmu_device *smmu,
+				struct device *dev,
+				struct of_phandle_args *masterspec)
+{
+	int i;
+	struct arm_smmu_master *master;
+
+	master = find_smmu_master(smmu, masterspec->np);
+	if (master) {
+		dev_err(dev,
+			"rejecting multiple registrations for master device %s\n",
+			masterspec->np->name);
+		return -EBUSY;
+	}
+
+	if (masterspec->args_count > MAX_MASTER_STREAMIDS) {
+		dev_err(dev,
+			"reached maximum number (%d) of stream IDs for master device %s\n",
+			MAX_MASTER_STREAMIDS, masterspec->np->name);
+		return -ENOSPC;
+	}
+
+	master = devm_kzalloc(dev, sizeof(*master), GFP_KERNEL);
+	if (!master)
+		return -ENOMEM;
+
+	master->of_node		= masterspec->np;
+	master->num_streamids	= masterspec->args_count;
+
+	for (i = 0; i < master->num_streamids; ++i)
+		master->streamids[i] = masterspec->args[i];
+
+	return insert_smmu_master(smmu, master);
+}
+
+static struct arm_smmu_device *find_parent_smmu(struct arm_smmu_device *smmu)
+{
+	struct arm_smmu_device *parent, *tmp;
+
+	if (!smmu->parent_of_node)
+		return NULL;
+
+	list_for_each_entry_safe(parent, tmp, &arm_smmu_devices, list)
+		if (parent->dev->of_node == smmu->parent_of_node)
+			return parent;
+
+	dev_warn(smmu->dev,
+		 "Failed to find SMMU parent despite parent in DT\n");
+	return NULL;
+}
+
+static struct arm_smmu_device *find_root_smmu(struct device *dev)
+{
+	struct arm_smmu_device *root, *parent;
+
+	/*
+	 * Walk the SMMU chain to find the root device for this chain.
+	 * We assume that no masters have translations which terminate
+	 * early, and therefore check that the root SMMU does indeed have
+	 * a StreamID for the master in question.
+	 */
+	parent = dev->archdata.iommu;
+	do {
+		root = parent;
+	} while ((parent = find_parent_smmu(root)));
+
+	if (!find_smmu_master(root, dev->of_node))
+		return NULL;
+
+	return root;
+}
+
+static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
+{
+	int idx;
+
+	do {
+		idx = find_next_zero_bit(map, end, start);
+		if (idx == end)
+			return -ENOSPC;
+	} while (test_and_set_bit(idx, map));
+
+	return idx;
+}
+
+static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
+{
+	clear_bit(idx, map);
+}
+
+/* Wait for any pending TLB invalidations to complete */
+static void arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
+{
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
+	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
+	       & sTLBGSTATUS_GSACTIVE)
+		cpu_relax();
+}
+
+static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
+{
+	int flags, ret;
+	u32 fsr, far, fsynr, resume;
+	unsigned long iova;
+	struct iommu_domain *domain = dev;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+	void __iomem *cb_base;
+
+	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, root_cfg->cbndx);
+	fsr = readl_relaxed(cb_base + ARM_SMMU_CB_FSR);
+
+	if (!(fsr & FSR_FAULT))
+		return IRQ_NONE;
+
+	if (fsr & FSR_IGN)
+		dev_err_ratelimited(smmu->dev,
+				    "Unexpected context fault (fsr 0x%x)\n",
+				    fsr);
+
+	fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
+	flags = fsynr & FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ;
+
+	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_LO);
+	iova = far;
+#ifdef CONFIG_64BIT
+	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_HI);
+	iova |= ((unsigned long)far << 32);
+#endif
+
+	if (!report_iommu_fault(domain, smmu->dev, iova, flags)) {
+		ret = IRQ_HANDLED;
+		resume = RESUME_RETRY;
+	} else {
+		ret = IRQ_NONE;
+		resume = RESUME_TERMINATE;
+	}
+
+	/* Clear the faulting FSR */
+	writel(fsr, cb_base + ARM_SMMU_CB_FSR);
+
+	/* Retry or terminate any stalled transactions */
+	if (fsr & FSR_SS)
+		writel_relaxed(resume, cb_base + ARM_SMMU_CB_RESUME);
+
+	return ret;
+}
+
+static irqreturn_t arm_smmu_global_fault(int irq, void *dev)
+{
+	u32 gfsr, gfsynr0, gfsynr1, gfsynr2;
+	struct arm_smmu_device *smmu = dev;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	gfsr = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSR);
+	gfsynr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR0);
+	gfsynr1 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR1);
+	gfsynr2 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sGFSYNR2);
+
+	dev_err_ratelimited(smmu->dev,
+		"Unexpected global fault, this could be serious\n");
+	dev_err_ratelimited(smmu->dev,
+		"\tGFSR 0x%08x, GFSYNR0 0x%08x, GFSYNR1 0x%08x, GFSYNR2 0x%08x\n",
+		gfsr, gfsynr0, gfsynr1, gfsynr2);
+
+	writel(gfsr, gr0_base + ARM_SMMU_GR0_sGFSR);
+	return IRQ_NONE;
+}
+
+static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain)
+{
+	u32 reg;
+	bool stage1;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+	void __iomem *cb_base, *gr0_base, *gr1_base;
+
+	gr0_base = ARM_SMMU_GR0(smmu);
+	gr1_base = ARM_SMMU_GR1(smmu);
+	stage1 = root_cfg->cbar != CBAR_TYPE_S2_TRANS;
+	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, root_cfg->cbndx);
+
+	/* CBAR */
+	reg = root_cfg->cbar |
+	      (root_cfg->vmid << CBAR_VMID_SHIFT);
+	if (smmu->version == 1)
+		reg |= root_cfg->irptndx << CBAR_IRPTNDX_SHIFT;
+
+	/* Use the weakest memory type, so it is overridden by the pte */
+	if (stage1)
+		reg |= (CBAR_S1_MEMATTR_WB << CBAR_S1_MEMATTR_SHIFT);
+	writel_relaxed(reg, gr1_base + ARM_SMMU_GR1_CBAR(root_cfg->cbndx));
+
+	if (smmu->version > 1) {
+		/* CBA2R */
+#ifdef CONFIG_64BIT
+		reg = CBA2R_RW64_64BIT;
+#else
+		reg = CBA2R_RW64_32BIT;
+#endif
+		writel_relaxed(reg,
+			       gr1_base + ARM_SMMU_GR1_CBA2R(root_cfg->cbndx));
+
+		/* TTBCR2 */
+		switch (smmu->input_size) {
+		case 32:
+			reg = (TTBCR2_ADDR_32 << TTBCR2_SEP_SHIFT);
+			break;
+		case 36:
+			reg = (TTBCR2_ADDR_36 << TTBCR2_SEP_SHIFT);
+			break;
+		case 39:
+			reg = (TTBCR2_ADDR_40 << TTBCR2_SEP_SHIFT);
+			break;
+		case 42:
+			reg = (TTBCR2_ADDR_42 << TTBCR2_SEP_SHIFT);
+			break;
+		case 44:
+			reg = (TTBCR2_ADDR_44 << TTBCR2_SEP_SHIFT);
+			break;
+		case 48:
+			reg = (TTBCR2_ADDR_48 << TTBCR2_SEP_SHIFT);
+			break;
+		}
+
+		switch (smmu->s1_output_size) {
+		case 32:
+			reg |= (TTBCR2_ADDR_32 << TTBCR2_PASIZE_SHIFT);
+			break;
+		case 36:
+			reg |= (TTBCR2_ADDR_36 << TTBCR2_PASIZE_SHIFT);
+			break;
+		case 39:
+			reg |= (TTBCR2_ADDR_40 << TTBCR2_PASIZE_SHIFT);
+			break;
+		case 42:
+			reg |= (TTBCR2_ADDR_42 << TTBCR2_PASIZE_SHIFT);
+			break;
+		case 44:
+			reg |= (TTBCR2_ADDR_44 << TTBCR2_PASIZE_SHIFT);
+			break;
+		case 48:
+			reg |= (TTBCR2_ADDR_48 << TTBCR2_PASIZE_SHIFT);
+			break;
+		}
+
+		if (stage1)
+			writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBCR2);
+	}
+
+	/* TTBR0 */
+	reg = __pa(root_cfg->pgd);
+#ifndef __BIG_ENDIAN
+	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_LO);
+	reg = (phys_addr_t)__pa(root_cfg->pgd) >> 32;
+	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_HI);
+#else
+	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_HI);
+	reg = (phys_addr_t)__pa(root_cfg->pgd) >> 32;
+	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBR0_LO);
+#endif
+
+	/*
+	 * TTBCR
+	 * We use long descriptor, with inner-shareable WBWA tables in TTBR0.
+	 */
+	if (smmu->version > 1) {
+		if (PAGE_SIZE == SZ_4K)
+			reg = TTBCR_TG0_4K;
+		else
+			reg = TTBCR_TG0_64K;
+
+		if (!stage1) {
+			switch (smmu->s2_output_size) {
+			case 32:
+				reg |= (TTBCR2_ADDR_32 << TTBCR_PASIZE_SHIFT);
+				break;
+			case 36:
+				reg |= (TTBCR2_ADDR_36 << TTBCR_PASIZE_SHIFT);
+				break;
+			case 40:
+				reg |= (TTBCR2_ADDR_40 << TTBCR_PASIZE_SHIFT);
+				break;
+			case 42:
+				reg |= (TTBCR2_ADDR_42 << TTBCR_PASIZE_SHIFT);
+				break;
+			case 44:
+				reg |= (TTBCR2_ADDR_44 << TTBCR_PASIZE_SHIFT);
+				break;
+			case 48:
+				reg |= (TTBCR2_ADDR_48 << TTBCR_PASIZE_SHIFT);
+				break;
+			}
+		} else {
+			reg |= (64 - smmu->s1_output_size) << TTBCR_T0SZ_SHIFT;
+		}
+	} else {
+		reg = 0;
+	}
+
+	reg |= TTBCR_EAE |
+	      (TTBCR_SH_IS << TTBCR_SH0_SHIFT) |
+	      (TTBCR_RGN_WBWA << TTBCR_ORGN0_SHIFT) |
+	      (TTBCR_RGN_WBWA << TTBCR_IRGN0_SHIFT) |
+	      (TTBCR_SL0_LVL_1 << TTBCR_SL0_SHIFT);
+	writel_relaxed(reg, cb_base + ARM_SMMU_CB_TTBCR);
+
+	/* MAIR0 (stage-1 only) */
+	if (stage1) {
+		reg = (MAIR_ATTR_NC << MAIR_ATTR_SHIFT(MAIR_ATTR_IDX_NC)) |
+		      (MAIR_ATTR_WBRWA << MAIR_ATTR_SHIFT(MAIR_ATTR_IDX_CACHE)) |
+		      (MAIR_ATTR_DEVICE << MAIR_ATTR_SHIFT(MAIR_ATTR_IDX_DEV));
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_S1_MAIR0);
+	}
+
+	/* Nuke the TLB */
+	writel_relaxed(root_cfg->vmid, gr0_base + ARM_SMMU_GR0_TLBIVMID);
+	arm_smmu_tlb_sync(smmu);
+
+	/* SCTLR */
+	reg = SCTLR_CFCFG | SCTLR_CFIE | SCTLR_CFRE | SCTLR_M | SCTLR_EAE_SBOP;
+	if (stage1)
+		reg |= SCTLR_S1_ASIDPNE;
+#ifdef __BIG_ENDIAN
+	reg |= SCTLR_E;
+#endif
+	writel(reg, cb_base + ARM_SMMU_CB_SCTLR);
+}
+
+static int arm_smmu_init_domain_context(struct iommu_domain *domain)
+{
+	int irq, ret, start;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+
+	ret = __arm_smmu_alloc_bitmap(smmu->vmid_map, 0, ARM_SMMU_NUM_VMIDS);
+	if (IS_ERR_VALUE(ret))
+		goto out;
+
+	root_cfg->vmid = ret;
+	if (smmu->features & ARM_SMMU_FEAT_TRANS_NESTED) {
+		/*
+		 * We will likely want to change this if/when KVM gets
+		 * involved.
+		 */
+		root_cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
+		start = smmu->num_s2_context_banks;
+	} else if (smmu->features & ARM_SMMU_FEAT_TRANS_S2) {
+		root_cfg->cbar = CBAR_TYPE_S2_TRANS;
+		start = 0;
+	} else {
+		root_cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
+		start = smmu->num_s2_context_banks;
+	}
+
+	ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
+				      smmu->num_context_banks);
+	if (IS_ERR_VALUE(ret))
+		goto out_free_vmid;
+
+	root_cfg->cbndx = ret;
+
+	if (smmu->version == 1) {
+		root_cfg->irptndx = atomic_inc_return(&smmu->irptndx);
+		root_cfg->irptndx %= smmu->num_context_irqs;
+	} else {
+		root_cfg->irptndx = root_cfg->cbndx;
+	}
+
+	irq = smmu->irqs[smmu->num_global_irqs + root_cfg->irptndx];
+	ret = request_irq(irq, arm_smmu_context_fault, IRQF_SHARED,
+			  "arm-smmu-context-fault", domain);
+	if (IS_ERR_VALUE(ret)) {
+		dev_err(smmu->dev, "failed to request context IRQ %d (%u)\n",
+			root_cfg->irptndx, irq);
+		root_cfg->irptndx = -1;
+		goto out_free_context;
+	}
+
+	arm_smmu_init_context_bank(smmu_domain);
+out:
+	return ret;
+
+out_free_context:
+	__arm_smmu_free_bitmap(smmu->context_map, root_cfg->cbndx);
+out_free_vmid:
+	__arm_smmu_free_bitmap(smmu->vmid_map, root_cfg->vmid);
+	return ret;
+}
+
+static void arm_smmu_destroy_domain_context(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+	int irq;
+
+	if (!smmu)
+		return;
+
+	if (root_cfg->irptndx != -1) {
+		irq = smmu->irqs[smmu->num_global_irqs + root_cfg->irptndx];
+		free_irq(irq, domain);
+	}
+
+	__arm_smmu_free_bitmap(smmu->vmid_map, root_cfg->vmid);
+	__arm_smmu_free_bitmap(smmu->context_map, root_cfg->cbndx);
+}
+
+static int arm_smmu_domain_init(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain;
+	pgd_t *pgd;
+
+	/*
+	 * Allocate the domain and initialise some of its data structures.
+	 * We can't really do anything meaningful until we've added a
+	 * master.
+	 */
+	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
+	if (!smmu_domain)
+		return -ENOMEM;
+
+	pgd = kzalloc(PTRS_PER_PGD * sizeof(pgd_t), GFP_KERNEL);
+	if (!pgd)
+		goto out_free_domain;
+	smmu_domain->root_cfg.pgd = pgd;
+
+	spin_lock_init(&smmu_domain->lock);
+	domain->priv = smmu_domain;
+	return 0;
+
+out_free_domain:
+	kfree(smmu_domain);
+	return -ENOMEM;
+}
+
+static void arm_smmu_free_ptes(pmd_t *pmd)
+{
+	pgtable_t table = pmd_pgtable(*pmd);
+	pgtable_page_dtor(table);
+	__free_page(table);
+}
+
+static void arm_smmu_free_pmds(pud_t *pud)
+{
+	int i;
+	pmd_t *pmd, *pmd_base = pmd_offset(pud, 0);
+
+	pmd = pmd_base;
+	for (i = 0; i < PTRS_PER_PMD; ++i, ++pmd) {
+		/* Skip empty entries, but still advance to the next one */
+		if (pmd_none(*pmd))
+			continue;
+
+		arm_smmu_free_ptes(pmd);
+	}
+
+	pmd_free(NULL, pmd_base);
+}
+
+static void arm_smmu_free_puds(pgd_t *pgd)
+{
+	int i;
+	pud_t *pud, *pud_base = pud_offset(pgd, 0);
+
+	pud = pud_base;
+	for (i = 0; i < PTRS_PER_PUD; ++i, ++pud) {
+		/* Skip empty entries, but still advance to the next one */
+		if (pud_none(*pud))
+			continue;
+
+		arm_smmu_free_pmds(pud);
+	}
+
+	pud_free(NULL, pud_base);
+}
+
+static void arm_smmu_free_pgtables(struct arm_smmu_domain *smmu_domain)
+{
+	int i;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	pgd_t *pgd, *pgd_base = root_cfg->pgd;
+
+	/*
+	 * Recursively free the page tables for this domain. We don't
+	 * care about speculative TLB filling, because the TLB will be
+	 * nuked next time this context bank is re-allocated and no devices
+	 * currently map to these tables.
+	 */
+	pgd = pgd_base;
+	for (i = 0; i < PTRS_PER_PGD; ++i, ++pgd) {
+		/* Skip empty entries, but still advance to the next one */
+		if (pgd_none(*pgd))
+			continue;
+		arm_smmu_free_puds(pgd);
+	}
+
+	kfree(pgd_base);
+}
+
+static void arm_smmu_domain_destroy(struct iommu_domain *domain)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	arm_smmu_destroy_domain_context(domain);
+	arm_smmu_free_pgtables(smmu_domain);
+	kfree(smmu_domain);
+}
+
+static int arm_smmu_master_configure_smrs(struct arm_smmu_device *smmu,
+					  struct arm_smmu_master *master)
+{
+	int i;
+	struct arm_smmu_smr *smrs;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	if (!(smmu->features & ARM_SMMU_FEAT_STREAM_MATCH))
+		return 0;
+
+	if (master->smrs)
+		return -EEXIST;
+
+	smrs = kmalloc(sizeof(*smrs) * master->num_streamids, GFP_KERNEL);
+	if (!smrs) {
+		dev_err(smmu->dev, "failed to allocate %d SMRs for master %s\n",
+			master->num_streamids, master->of_node->name);
+		return -ENOMEM;
+	}
+
+	/* Allocate the SMRs on the root SMMU */
+	for (i = 0; i < master->num_streamids; ++i) {
+		int idx = __arm_smmu_alloc_bitmap(smmu->smr_map, 0,
+						  smmu->num_mapping_groups);
+		if (IS_ERR_VALUE(idx)) {
+			dev_err(smmu->dev, "failed to allocate free SMR\n");
+			goto err_free_smrs;
+		}
+
+		smrs[i] = (struct arm_smmu_smr) {
+			.idx	= idx,
+			.mask	= 0, /* We don't currently share SMRs */
+			.id	= master->streamids[i],
+		};
+	}
+
+	/* It worked! Now, poke the actual hardware */
+	for (i = 0; i < master->num_streamids; ++i) {
+		u32 reg = SMR_VALID | smrs[i].id << SMR_ID_SHIFT |
+			  smrs[i].mask << SMR_MASK_SHIFT;
+		writel_relaxed(reg, gr0_base + ARM_SMMU_GR0_SMR(smrs[i].idx));
+	}
+
+	master->smrs = smrs;
+	return 0;
+
+err_free_smrs:
+	while (--i >= 0)
+		__arm_smmu_free_bitmap(smmu->smr_map, smrs[i].idx);
+	kfree(smrs);
+	return -ENOSPC;
+}
+
+static void arm_smmu_master_free_smrs(struct arm_smmu_device *smmu,
+				      struct arm_smmu_master *master)
+{
+	int i;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+	struct arm_smmu_smr *smrs = master->smrs;
+
+	/* Invalidate the SMRs before freeing back to the allocator */
+	for (i = 0; i < master->num_streamids; ++i) {
+		u8 idx = smrs[i].idx;
+		writel_relaxed(~SMR_VALID, gr0_base + ARM_SMMU_GR0_SMR(idx));
+		__arm_smmu_free_bitmap(smmu->smr_map, idx);
+	}
+
+	master->smrs = NULL;
+	kfree(smrs);
+}
+
+static void arm_smmu_bypass_stream_mapping(struct arm_smmu_device *smmu,
+					   struct arm_smmu_master *master)
+{
+	int i;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	for (i = 0; i < master->num_streamids; ++i) {
+		u16 sid = master->streamids[i];
+		writel_relaxed(S2CR_TYPE_BYPASS,
+			       gr0_base + ARM_SMMU_GR0_S2CR(sid));
+	}
+}
+
+static int arm_smmu_domain_add_master(struct arm_smmu_domain *smmu_domain,
+				      struct arm_smmu_master *master)
+{
+	int i, ret;
+	struct arm_smmu_device *parent, *smmu = smmu_domain->root_cfg.smmu;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	ret = arm_smmu_master_configure_smrs(smmu, master);
+	if (ret)
+		return ret;
+
+	/* Bypass the leaves */
+	smmu = smmu_domain->leaf_smmu;
+	while ((parent = find_parent_smmu(smmu))) {
+		/*
+		 * We won't have a StreamID match for anything but the root
+		 * smmu, so we only need to worry about StreamID indexing,
+		 * where we must install bypass entries in the S2CRs.
+		 */
+		if (!(smmu->features & ARM_SMMU_FEAT_STREAM_MATCH))
+			arm_smmu_bypass_stream_mapping(smmu, master);
+
+		/* Always move up the chain, even when nothing was written */
+		smmu = parent;
+	}
+
+	/* Now we're at the root, time to point at our context bank */
+	for (i = 0; i < master->num_streamids; ++i) {
+		u32 idx, s2cr;
+		idx = master->smrs ? master->smrs[i].idx : master->streamids[i];
+		s2cr = S2CR_TYPE_TRANS |
+		       (smmu_domain->root_cfg.cbndx << S2CR_CBNDX_SHIFT);
+		writel_relaxed(s2cr, gr0_base + ARM_SMMU_GR0_S2CR(idx));
+	}
+
+	return 0;
+}
+
+static void arm_smmu_domain_remove_master(struct arm_smmu_domain *smmu_domain,
+					  struct arm_smmu_master *master)
+{
+	struct arm_smmu_device *smmu = smmu_domain->root_cfg.smmu;
+
+	/*
+	 * We *must* clear the S2CR first, because freeing the SMR means
+	 * that it can be re-allocated immediately.
+	 */
+	arm_smmu_bypass_stream_mapping(smmu, master);
+	arm_smmu_master_free_smrs(smmu, master);
+}
+
+static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
+{
+	int ret = -EINVAL;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_device *device_smmu = dev->archdata.iommu;
+	struct arm_smmu_master *master;
+
+	if (!device_smmu) {
+		dev_err(dev, "cannot attach to SMMU, is it on the same bus?\n");
+		return -ENXIO;
+	}
+
+
+	/*
+	 * Sanity check the domain. We don't currently support domains
+	 * that cross between different SMMU chains.
+	 */
+	spin_lock(&smmu_domain->lock);
+	if (!smmu_domain->leaf_smmu) {
+		smmu_domain->root_cfg.smmu = find_root_smmu(dev);
+		if (!smmu_domain->root_cfg.smmu) {
+			dev_err(dev, "unable to find root SMMU for device\n");
+			goto err_unlock;
+		}
+
+		/* Now that we have a master, we can finalise the domain */
+		ret = arm_smmu_init_domain_context(domain);
+		if (IS_ERR_VALUE(ret))
+			goto err_unlock;
+
+		smmu_domain->leaf_smmu = device_smmu;
+	} else if (smmu_domain->leaf_smmu != device_smmu) {
+		dev_err(dev,
+			"cannot attach to SMMU %s whilst already attached to domain on SMMU %s\n",
+			dev_name(smmu_domain->leaf_smmu->dev),
+			dev_name(device_smmu->dev));
+		goto err_unlock;
+	}
+	spin_unlock(&smmu_domain->lock);
+
+	/* Looks ok, so add the device to the domain */
+	master = find_smmu_master(smmu_domain->leaf_smmu, dev->of_node);
+	if (!master)
+		return -ENODEV;
+
+	return arm_smmu_domain_add_master(smmu_domain, master);
+
+err_unlock:
+	spin_unlock(&smmu_domain->lock);
+	return ret;
+}
+
+static void arm_smmu_detach_dev(struct iommu_domain *domain, struct device *dev)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_master *master;
+
+	master = find_smmu_master(smmu_domain->leaf_smmu, dev->of_node);
+	if (master)
+		arm_smmu_domain_remove_master(smmu_domain, master);
+}
+
+static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
+				   size_t size)
+{
+	unsigned long offset = (unsigned long)addr & ~PAGE_MASK;
+
+	/*
+	 * If the SMMU can't walk tables in the CPU caches, treat them
+	 * like non-coherent DMA...
+	 */
+	if (!(smmu->features & ARM_SMMU_FEAT_COHERENT_WALK))
+		dma_map_page(smmu->dev, virt_to_page(addr), offset, size,
+			     DMA_TO_DEVICE);
+}
+
+static bool arm_smmu_pte_is_contiguous_range(unsigned long addr,
+					     unsigned long end)
+{
+	return !(addr & ~ARM_SMMU_PTE_CONT_MASK) &&
+		(addr + ARM_SMMU_PTE_CONT_SIZE <= end);
+}
+
+static int arm_smmu_alloc_init_pte(struct arm_smmu_device *smmu, pmd_t *pmd,
+				   unsigned long addr, unsigned long end,
+				   unsigned long pfn, int flags, int stage)
+{
+	pte_t *pte, *start;
+	pteval_t pteval = ARM_SMMU_PTE_PAGE | ARM_SMMU_PTE_AF;
+
+	if (pmd_none(*pmd)) {
+		/* Allocate a new set of tables */
+		pgtable_t table = alloc_page(PGALLOC_GFP);
+		if (!table)
+			return -ENOMEM;
+
+		arm_smmu_flush_pgtable(smmu, page_address(table),
+				       ARM_SMMU_PTE_HWTABLE_SIZE);
+		pgtable_page_ctor(table);
+		pmd_populate(NULL, pmd, table);
+		arm_smmu_flush_pgtable(smmu, pmd, sizeof(*pmd));
+	}
+
+	if (stage == 1) {
+		pteval |= ARM_SMMU_PTE_AP_UNPRIV;
+		if (!(flags & IOMMU_WRITE) && (flags & IOMMU_READ))
+			pteval |= ARM_SMMU_PTE_AP_RDONLY;
+
+		if (flags & IOMMU_CACHE)
+			pteval |= (MAIR_ATTR_IDX_CACHE <<
+				   ARM_SMMU_PTE_ATTRINDX_SHIFT);
+	} else {
+		pteval |= ARM_SMMU_PTE_HAP_FAULT;
+		if (flags & IOMMU_READ)
+			pteval |= ARM_SMMU_PTE_HAP_READ;
+		if (flags & IOMMU_WRITE)
+			pteval |= ARM_SMMU_PTE_HAP_WRITE;
+		if (flags & IOMMU_CACHE)
+			pteval |= ARM_SMMU_PTE_MEMATTR_OIWB;
+		else
+			pteval |= ARM_SMMU_PTE_MEMATTR_NC;
+	}
+
+	/* If no access, create a faulting entry to avoid TLB fills */
+	if (!(flags & (IOMMU_READ | IOMMU_WRITE)))
+		pteval &= ~ARM_SMMU_PTE_PAGE;
+
+	pteval |= ARM_SMMU_PTE_SH_IS;
+	start = pmd_page_vaddr(*pmd) + pte_index(addr);
+	pte = start;
+
+	/*
+	 * Install the page table entries. This is fairly complicated
+	 * since we attempt to make use of the contiguous hint in the
+	 * ptes where possible. The contiguous hint indicates a series
+	 * of ARM_SMMU_PTE_CONT_ENTRIES ptes mapping a physically
+	 * contiguous region with the following constraints:
+	 *
+	 *   - The region start is aligned to ARM_SMMU_PTE_CONT_SIZE
+	 *   - Each pte in the region has the contiguous hint bit set
+	 *
+	 * This complicates unmapping (also handled by this code, when
+	 * neither IOMMU_READ nor IOMMU_WRITE is set) because it is
+	 * possible, yet highly unlikely, that a client may unmap only
+	 * part of a contiguous range. This requires clearing of the
+	 * contiguous hint bits in the range before installing the new
+	 * faulting entries.
+	 *
+	 * Note that re-mapping an address range without first unmapping
+	 * it is not supported, so TLB invalidation is not required here
+	 * and is instead performed at unmap and domain-init time.
+	 */
+	do {
+		int i = 1;
+		pteval &= ~ARM_SMMU_PTE_CONT;
+
+		if (arm_smmu_pte_is_contiguous_range(addr, end)) {
+			i = ARM_SMMU_PTE_CONT_ENTRIES;
+			pteval |= ARM_SMMU_PTE_CONT;
+		} else if (pte_val(*pte) &
+			   (ARM_SMMU_PTE_CONT | ARM_SMMU_PTE_PAGE)) {
+			int j;
+			pte_t *cont_start;
+			unsigned long idx = pte_index(addr);
+
+			idx &= ~(ARM_SMMU_PTE_CONT_ENTRIES - 1);
+			cont_start = pmd_page_vaddr(*pmd) + idx;
+			for (j = 0; j < ARM_SMMU_PTE_CONT_ENTRIES; ++j)
+				pte_val(*(cont_start + j)) &= ~ARM_SMMU_PTE_CONT;
+
+			arm_smmu_flush_pgtable(smmu, cont_start,
+					       sizeof(*pte) *
+					       ARM_SMMU_PTE_CONT_ENTRIES);
+		}
+
+		do {
+			*pte = pfn_pte(pfn, __pgprot(pteval));
+		} while (pte++, pfn++, addr += PAGE_SIZE, --i);
+	} while (addr != end);
+
+	arm_smmu_flush_pgtable(smmu, start, sizeof(*pte) * (pte - start));
+	return 0;
+}
+
+static int arm_smmu_alloc_init_pmd(struct arm_smmu_device *smmu, pud_t *pud,
+				   unsigned long addr, unsigned long end,
+				   phys_addr_t phys, int flags, int stage)
+{
+	int ret;
+	pmd_t *pmd;
+	unsigned long next, pfn = __phys_to_pfn(phys);
+
+#ifndef __PAGETABLE_PMD_FOLDED
+	if (pud_none(*pud)) {
+		pmd = pmd_alloc_one(NULL, addr);
+		if (!pmd)
+			return -ENOMEM;
+	} else
+#endif
+		pmd = pmd_offset(pud, addr);
+
+	do {
+		next = pmd_addr_end(addr, end);
+		ret = arm_smmu_alloc_init_pte(smmu, pmd, addr, next, pfn,
+					      flags, stage);
+		pud_populate(NULL, pud, pmd);
+		arm_smmu_flush_pgtable(smmu, pud, sizeof(*pud));
+		phys += next - addr;
+	} while (pmd++, addr = next, addr < end);
+
+	return ret;
+}
+
+static int arm_smmu_alloc_init_pud(struct arm_smmu_device *smmu, pgd_t *pgd,
+				   unsigned long addr, unsigned long end,
+				   phys_addr_t phys, int flags, int stage)
+{
+	int ret = 0;
+	pud_t *pud;
+	unsigned long next;
+
+#ifndef __PAGETABLE_PUD_FOLDED
+	if (pgd_none(*pgd)) {
+		pud = pud_alloc_one(NULL, addr);
+		if (!pud)
+			return -ENOMEM;
+	} else
+#endif
+		pud = pud_offset(pgd, addr);
+
+	do {
+		next = pud_addr_end(addr, end);
+		ret = arm_smmu_alloc_init_pmd(smmu, pud, addr, next, phys,
+					      flags, stage);
+		pgd_populate(NULL, pgd, pud);
+		arm_smmu_flush_pgtable(smmu, pgd, sizeof(*pgd));
+		phys += next - addr;
+	} while (pud++, addr = next, addr < end);
+
+	return ret;
+}
+
+static int arm_smmu_create_mapping(struct arm_smmu_domain *smmu_domain,
+				   unsigned long iova, phys_addr_t paddr,
+				   size_t size, int flags)
+{
+	int ret, stage;
+	unsigned long end;
+	phys_addr_t input_mask, output_mask;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	pgd_t *pgd = root_cfg->pgd;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+
+	if (root_cfg->cbar == CBAR_TYPE_S2_TRANS) {
+		stage = 2;
+		output_mask = (1ULL << smmu->s2_output_size) - 1;
+	} else {
+		stage = 1;
+		output_mask = (1ULL << smmu->s1_output_size) - 1;
+	}
+
+	if (!pgd)
+		return -EINVAL;
+
+	if (size & ~PAGE_MASK)
+		return -EINVAL;
+
+	input_mask = (1ULL << smmu->input_size) - 1;
+	if ((phys_addr_t)iova & ~input_mask)
+		return -ERANGE;
+
+	if (paddr & ~output_mask)
+		return -ERANGE;
+
+	spin_lock(&smmu_domain->lock);
+	pgd += pgd_index(iova);
+	end = iova + size;
+	do {
+		unsigned long next = pgd_addr_end(iova, end);
+
+		ret = arm_smmu_alloc_init_pud(smmu, pgd, iova, next, paddr,
+					      flags, stage);
+		if (ret)
+			goto out_unlock;
+
+		paddr += next - iova;
+		iova = next;
+	} while (pgd++, iova != end);
+
+out_unlock:
+	spin_unlock(&smmu_domain->lock);
+
+	/* Ensure new page tables are visible to the hardware walker */
+	if (smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
+		dsb();
+
+	return ret;
+}
+
+static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
+			phys_addr_t paddr, size_t size, int flags)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_device *smmu = smmu_domain->leaf_smmu;
+
+	if (!smmu_domain || !smmu)
+		return -ENODEV;
+
+	/*
+	 * Check for silent address truncation up the SMMU chain.
+	 */
+	do {
+		phys_addr_t output_mask = (1ULL << smmu->s2_output_size) - 1;
+		if ((phys_addr_t)iova & ~output_mask)
+			return -ERANGE;
+	} while ((smmu = find_parent_smmu(smmu)));
+
+	return arm_smmu_create_mapping(smmu_domain, iova, paddr, size, flags);
+}
+
+static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
+			     size_t size)
+{
+	int ret;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+
+	ret = arm_smmu_create_mapping(smmu_domain, iova, 0, size, 0);
+	writel_relaxed(root_cfg->vmid, gr0_base + ARM_SMMU_GR0_TLBIVMID);
+	arm_smmu_tlb_sync(smmu);
+	return ret ? ret : size;
+}
+
+static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+					 dma_addr_t iova)
+{
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
+	struct arm_smmu_device *smmu = root_cfg->smmu;
+
+	spin_lock(&smmu_domain->lock);
+	pgd = root_cfg->pgd;
+	if (!pgd)
+		goto err_unlock;
+
+	pgd += pgd_index(iova);
+	if (pgd_none_or_clear_bad(pgd))
+		goto err_unlock;
+
+	pud = pud_offset(pgd, iova);
+	if (pud_none_or_clear_bad(pud))
+		goto err_unlock;
+
+	pmd = pmd_offset(pud, iova);
+	if (pmd_none_or_clear_bad(pmd))
+		goto err_unlock;
+
+	pte = pmd_page_vaddr(*pmd) + pte_index(iova);
+	if (pte_none(*pte))
+		goto err_unlock;
+
+	spin_unlock(&smmu_domain->lock);
+	return __pfn_to_phys(pte_pfn(*pte)) | (iova & ~PAGE_MASK);
+
+err_unlock:
+	spin_unlock(&smmu_domain->lock);
+	dev_warn(smmu->dev,
+		 "invalid (corrupt?) page tables detected for iova 0x%llx\n",
+		 (unsigned long long)iova);
+	return 0;
+}
+
+static int arm_smmu_domain_has_cap(struct iommu_domain *domain,
+				   unsigned long cap)
+{
+	unsigned long caps = 0;
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+
+	if (smmu_domain->root_cfg.smmu->features & ARM_SMMU_FEAT_COHERENT_WALK)
+		caps |= IOMMU_CAP_CACHE_COHERENCY;
+
+	return !!(cap & caps);
+}
+
+static int arm_smmu_add_device(struct device *dev)
+{
+	struct arm_smmu_device *child, *parent, *smmu;
+	struct arm_smmu_device *tmp[2];
+	struct arm_smmu_master *master = NULL;
+
+	list_for_each_entry_safe(parent, tmp[0], &arm_smmu_devices, list) {
+		smmu = parent;
+
+		/* Try to find a child of the current SMMU. */
+		list_for_each_entry_safe(child, tmp[1], &arm_smmu_devices, list) {
+			if (child->parent_of_node == parent->dev->of_node) {
+				/* Does the child sit above our master? */
+				master = find_smmu_master(child, dev->of_node);
+				if (master) {
+					smmu = NULL;
+					break;
+				}
+			}
+		}
+
+		/* We found some children, so keep searching. */
+		if (!smmu) {
+			master = NULL;
+			continue;
+		}
+
+		master = find_smmu_master(smmu, dev->of_node);
+		if (master)
+			break;
+	}
+
+	if (!master)
+		return -ENODEV;
+
+	dev->archdata.iommu = smmu;
+	return 0;
+}
+
+static void arm_smmu_remove_device(struct device *dev)
+{
+	dev->archdata.iommu = NULL;
+}
+
+static struct iommu_ops arm_smmu_ops = {
+	.domain_init	= arm_smmu_domain_init,
+	.domain_destroy	= arm_smmu_domain_destroy,
+	.attach_dev	= arm_smmu_attach_dev,
+	.detach_dev	= arm_smmu_detach_dev,
+	.map		= arm_smmu_map,
+	.unmap		= arm_smmu_unmap,
+	.iova_to_phys	= arm_smmu_iova_to_phys,
+	.domain_has_cap	= arm_smmu_domain_has_cap,
+	.add_device	= arm_smmu_add_device,
+	.remove_device	= arm_smmu_remove_device,
+	.pgsize_bitmap	= (SECTION_SIZE |
+			   ARM_SMMU_PTE_CONT_SIZE |
+			   PAGE_SIZE),
+};
+
+static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
+{
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+	int i = 0;
+	u32 scr0 = readl_relaxed(gr0_base + ARM_SMMU_GR0_sCR0);
+
+	/* Mark all SMRn as invalid and all S2CRn as bypass */
+	for (i = 0; i < smmu->num_mapping_groups; ++i) {
+		writel_relaxed(~SMR_VALID, gr0_base + ARM_SMMU_GR0_SMR(i));
+		writel_relaxed(S2CR_TYPE_BYPASS, gr0_base + ARM_SMMU_GR0_S2CR(i));
+	}
+
+	/* Invalidate the TLB, just in case */
+	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_STLBIALL);
+	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLH);
+	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_TLBIALLNSNH);
+
+	/* Enable fault reporting */
+	scr0 |= (sCR0_GFRE | sCR0_GFIE | sCR0_GCFGFRE | sCR0_GCFGFIE);
+
+	/* Disable TLB broadcasting. */
+	scr0 |= (sCR0_VMIDPNE | sCR0_PTM);
+
+	/* Enable client access, but bypass when no mapping is found */
+	scr0 &= ~(sCR0_CLIENTPD | sCR0_USFCFG);
+
+	/* Disable forced broadcasting */
+	scr0 &= ~sCR0_FB;
+
+	/* Don't upgrade barriers */
+	scr0 &= ~(sCR0_BSU_MASK << sCR0_BSU_SHIFT);
+
+	/* Push the button */
+	arm_smmu_tlb_sync(smmu);
+	writel(scr0, gr0_base + ARM_SMMU_GR0_sCR0);
+}
+
+static int arm_smmu_id_size_to_bits(int size)
+{
+	switch (size) {
+	case 0:
+		return 32;
+	case 1:
+		return 36;
+	case 2:
+		return 40;
+	case 3:
+		return 42;
+	case 4:
+		return 44;
+	case 5:
+	default:
+		return 48;
+	}
+}
+
+static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
+{
+	unsigned long size;
+	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
+	u32 id;
+
+	dev_notice(smmu->dev, "probing hardware configuration...\n");
+
+	/* Primecell ID */
+	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_PIDR2);
+	smmu->version = ((id >> PIDR2_ARCH_SHIFT) & PIDR2_ARCH_MASK) + 1;
+	dev_notice(smmu->dev, "SMMUv%d with:\n", smmu->version);
+
+	/* ID0 */
+	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID0);
+#ifndef CONFIG_64BIT
+	if (((id >> ID0_PTFS_SHIFT) & ID0_PTFS_MASK) == ID0_PTFS_V8_ONLY) {
+		dev_err(smmu->dev, "\tno v7 descriptor support!\n");
+		return -ENODEV;
+	}
+#endif
+	if (id & ID0_S1TS) {
+		smmu->features |= ARM_SMMU_FEAT_TRANS_S1;
+		dev_notice(smmu->dev, "\tstage 1 translation\n");
+	}
+
+	if (id & ID0_S2TS) {
+		smmu->features |= ARM_SMMU_FEAT_TRANS_S2;
+		dev_notice(smmu->dev, "\tstage 2 translation\n");
+	}
+
+	if (id & ID0_NTS) {
+		smmu->features |= ARM_SMMU_FEAT_TRANS_NESTED;
+		dev_notice(smmu->dev, "\tnested translation\n");
+	}
+
+	if (!(smmu->features &
+		(ARM_SMMU_FEAT_TRANS_S1 | ARM_SMMU_FEAT_TRANS_S2 |
+		 ARM_SMMU_FEAT_TRANS_NESTED))) {
+		dev_err(smmu->dev, "\tno translation support!\n");
+		return -ENODEV;
+	}
+
+	if (id & ID0_CTTW) {
+		smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
+		dev_notice(smmu->dev, "\tcoherent table walk\n");
+	}
+
+	if (id & ID0_SMS) {
+		u32 smr, sid, mask;
+
+		smmu->features |= ARM_SMMU_FEAT_STREAM_MATCH;
+		smmu->num_mapping_groups = (id >> ID0_NUMSMRG_SHIFT) &
+					   ID0_NUMSMRG_MASK;
+		if (smmu->num_mapping_groups == 0) {
+			dev_err(smmu->dev,
+				"stream-matching supported, but no SMRs present!\n");
+			return -ENODEV;
+		}
+
+		smr = SMR_MASK_MASK << SMR_MASK_SHIFT;
+		smr |= (SMR_ID_MASK << SMR_ID_SHIFT);
+		writel_relaxed(smr, gr0_base + ARM_SMMU_GR0_SMR(0));
+		smr = readl_relaxed(gr0_base + ARM_SMMU_GR0_SMR(0));
+
+		mask = (smr >> SMR_MASK_SHIFT) & SMR_MASK_MASK;
+		sid = (smr >> SMR_ID_SHIFT) & SMR_ID_MASK;
+		if ((mask & sid) != sid) {
+			dev_err(smmu->dev,
+				"SMR mask bits (0x%x) insufficient for ID field (0x%x)\n",
+				mask, sid);
+			return -ENODEV;
+		}
+
+		dev_notice(smmu->dev,
+			   "\tstream matching with %u register groups, mask 0x%x\n",
+			   smmu->num_mapping_groups, mask);
+	}
+
+	/* ID1 */
+	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID1);
+	smmu->pagesize = (id & ID1_PAGESIZE) ? SZ_64K : SZ_4K;
+
+	/* Check that we ioremapped enough */
+	size = 1 << (((id >> ID1_NUMPAGENDXB_SHIFT) & ID1_NUMPAGENDXB_MASK) + 1);
+	size *= (smmu->pagesize << 1);
+	if (smmu->size < size)
+		dev_warn(smmu->dev,
+			 "device is 0x%lx bytes but only mapped 0x%lx!\n",
+			 size, smmu->size);
+
+	smmu->num_s2_context_banks = (id >> ID1_NUMS2CB_SHIFT) &
+				      ID1_NUMS2CB_MASK;
+	smmu->num_context_banks = (id >> ID1_NUMCB_SHIFT) & ID1_NUMCB_MASK;
+	if (smmu->num_s2_context_banks > smmu->num_context_banks) {
+		dev_err(smmu->dev, "impossible number of S2 context banks!\n");
+		return -ENODEV;
+	}
+	dev_notice(smmu->dev, "\t%u context banks (%u stage-2 only)\n",
+		   smmu->num_context_banks, smmu->num_s2_context_banks);
+
+	/* ID2 */
+	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID2);
+	size = arm_smmu_id_size_to_bits((id >> ID2_IAS_SHIFT) & ID2_IAS_MASK);
+
+	/*
+	 * Stage-1 output limited by stage-2 input size due to pgd
+	 * allocation (PTRS_PER_PGD).
+	 */
+#ifdef CONFIG_64BIT
+	/* Current maximum output size of 39 bits */
+	smmu->s1_output_size = min(39UL, size);
+#else
+	smmu->s1_output_size = min(32UL, size);
+#endif
+
+	/* The stage-2 output mask is also applied for bypass */
+	size = arm_smmu_id_size_to_bits((id >> ID2_OAS_SHIFT) & ID2_OAS_MASK);
+	smmu->s2_output_size = min((unsigned long)PHYS_MASK_SHIFT, size);
+
+	if (smmu->version == 1) {
+		smmu->input_size = 32;
+	} else {
+#ifdef CONFIG_64BIT
+		size = (id >> ID2_UBS_SHIFT) & ID2_UBS_MASK;
+		size = min(39, arm_smmu_id_size_to_bits(size));
+#else
+		size = 32;
+#endif
+		smmu->input_size = size;
+
+		if ((PAGE_SIZE == SZ_4K && !(id & ID2_PTFS_4K)) ||
+		    (PAGE_SIZE == SZ_64K && !(id & ID2_PTFS_64K)) ||
+		    (PAGE_SIZE != SZ_4K && PAGE_SIZE != SZ_64K)) {
+			dev_err(smmu->dev, "CPU page size 0x%lx unsupported\n",
+				PAGE_SIZE);
+			return -ENODEV;
+		}
+	}
+
+	dev_notice(smmu->dev,
+		   "\t%lu-bit VA, %lu-bit IPA, %lu-bit PA\n",
+		   smmu->input_size, smmu->s1_output_size, smmu->s2_output_size);
+	return 0;
+}
+
+static int arm_smmu_device_dt_probe(struct platform_device *pdev)
+{
+	struct resource *res;
+	struct arm_smmu_device *smmu;
+	struct device_node *dev_node;
+	struct device *dev = &pdev->dev;
+	struct rb_node *node;
+	struct of_phandle_args masterspec;
+	int num_irqs, i, err;
+
+	smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL);
+	if (!smmu) {
+		dev_err(dev, "failed to allocate arm_smmu_device\n");
+		return -ENOMEM;
+	}
+	smmu->dev = dev;
+
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res) {
+		dev_err(dev, "missing base address/size\n");
+		return -ENODEV;
+	}
+
+	smmu->size = resource_size(res);
+	smmu->base = devm_request_and_ioremap(dev, res);
+	if (!smmu->base)
+		return -EADDRNOTAVAIL;
+
+	if (of_property_read_u32(dev->of_node, "#global-interrupts",
+				 &smmu->num_global_irqs)) {
+		dev_err(dev, "missing #global-interrupts property\n");
+		return -ENODEV;
+	}
+
+	num_irqs = 0;
+	while ((res = platform_get_resource(pdev, IORESOURCE_IRQ, num_irqs))) {
+		num_irqs++;
+		if (num_irqs > smmu->num_global_irqs)
+			smmu->num_context_irqs++;
+	}
+
+	if (num_irqs < smmu->num_global_irqs) {
+		dev_warn(dev, "found %d interrupts but expected at least %d\n",
+			 num_irqs, smmu->num_global_irqs);
+		smmu->num_global_irqs = num_irqs;
+	}
+	smmu->num_context_irqs = num_irqs - smmu->num_global_irqs;
+
+	smmu->irqs = devm_kzalloc(dev, sizeof(*smmu->irqs) * num_irqs,
+				  GFP_KERNEL);
+	if (!smmu->irqs) {
+		dev_err(dev, "failed to allocate %d irqs\n", num_irqs);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < num_irqs; ++i) {
+		int irq = platform_get_irq(pdev, i);
+		if (irq < 0) {
+			dev_err(dev, "failed to get irq index %d\n", i);
+			return -ENODEV;
+		}
+		smmu->irqs[i] = irq;
+	}
+
+	i = 0;
+	smmu->masters = RB_ROOT;
+	while (!of_parse_phandle_with_args(dev->of_node, "mmu-masters",
+					   "#stream-id-cells", i,
+					   &masterspec)) {
+		err = register_smmu_master(smmu, dev, &masterspec);
+		if (err) {
+			dev_err(dev, "failed to add master %s\n",
+				masterspec.np->name);
+			goto out_put_masters;
+		}
+
+		i++;
+	}
+	dev_notice(dev, "registered %d master devices\n", i);
+
+	if ((dev_node = of_parse_phandle(dev->of_node, "smmu-parent", 0)))
+		smmu->parent_of_node = dev_node;
+
+	err = arm_smmu_device_cfg_probe(smmu);
+	if (err)
+		goto out_put_parent;
+
+	if (smmu->version > 1 &&
+	    smmu->num_context_banks != smmu->num_context_irqs) {
+		dev_err(dev, "found only %d context interrupt(s) but %d required\n",
+			smmu->num_context_irqs, smmu->num_context_banks);
+		err = -ENODEV;
+		goto out_put_parent;
+	}
+
+	arm_smmu_device_reset(smmu);
+
+	for (i = 0; i < smmu->num_global_irqs; ++i) {
+		err = request_irq(smmu->irqs[i],
+				  arm_smmu_global_fault,
+				  IRQF_SHARED,
+				  "arm-smmu global fault",
+				  smmu);
+		if (err) {
+			dev_err(dev, "failed to request global IRQ %d (%u)\n",
+				i, smmu->irqs[i]);
+			goto out_free_irqs;
+		}
+	}
+
+	INIT_LIST_HEAD(&smmu->list);
+	spin_lock(&arm_smmu_devices_lock);
+	list_add(&smmu->list, &arm_smmu_devices);
+	spin_unlock(&arm_smmu_devices_lock);
+	return 0;
+
+out_free_irqs:
+	while (i--)
+		free_irq(smmu->irqs[i], smmu);
+
+out_put_parent:
+	if (smmu->parent_of_node)
+		of_node_put(smmu->parent_of_node);
+
+out_put_masters:
+	for (node = rb_first(&smmu->masters); node; node = rb_next(node)) {
+		struct arm_smmu_master *master;
+		master = container_of(node, struct arm_smmu_master, node);
+		of_node_put(master->of_node);
+	}
+
+	return err;
+}
+
+static int arm_smmu_device_remove(struct platform_device *pdev)
+{
+	int i;
+	struct device *dev = &pdev->dev;
+	struct arm_smmu_device *curr, *tmp, *smmu = NULL;
+	struct rb_node *node;
+
+	list_for_each_entry_safe(curr, tmp, &arm_smmu_devices, list) {
+		if (curr->dev == dev) {
+			smmu = curr;
+			break;
+		}
+	}
+
+	if (!smmu)
+		return -ENODEV;
+
+	spin_lock(&arm_smmu_devices_lock);
+	list_del(&smmu->list);
+	spin_unlock(&arm_smmu_devices_lock);
+
+	if (smmu->parent_of_node)
+		of_node_put(smmu->parent_of_node);
+
+	for (node = rb_first(&smmu->masters); node; node = rb_next(node)) {
+		struct arm_smmu_master *master;
+		master = container_of(node, struct arm_smmu_master, node);
+		of_node_put(master->of_node);
+	}
+
+	if (!bitmap_empty(smmu->vmid_map, ARM_SMMU_NUM_VMIDS))
+		dev_err(dev, "removing device with active domains!\n");
+
+	for (i = 0; i < smmu->num_global_irqs; ++i)
+		free_irq(smmu->irqs[i], smmu);
+
+	/* Turn the thing off */
+	writel(sCR0_CLIENTPD, ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_sCR0);
+	return 0;
+}
+
+#ifdef CONFIG_OF
+static struct of_device_id arm_smmu_of_match[] = {
+	{ .compatible = "arm,smmu-v1", },
+	{ .compatible = "arm,smmu-v2", },
+	{ .compatible = "arm,mmu-400", },
+	{ .compatible = "arm,mmu-500", },
+	{ },
+};
+MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
+#endif
+
+static struct platform_driver arm_smmu_driver = {
+	.driver	= {
+		.owner		= THIS_MODULE,
+		.name		= "arm-smmu",
+		.of_match_table	= of_match_ptr(arm_smmu_of_match),
+	},
+	.probe	= arm_smmu_device_dt_probe,
+	.remove	= arm_smmu_device_remove,
+};
+
+static int __init arm_smmu_init(void)
+{
+	int ret;
+
+	ret = platform_driver_register(&arm_smmu_driver);
+	if (ret)
+		return ret;
+
+	/* Oh, for a proper bus abstraction */
+	if (!iommu_present(&platform_bus_type))
+		bus_set_iommu(&platform_bus_type, &arm_smmu_ops);
+
+	if (!iommu_present(&amba_bustype))
+		bus_set_iommu(&amba_bustype, &arm_smmu_ops);
+
+	return 0;
+}
+
+static void __exit arm_smmu_exit(void)
+{
+	platform_driver_unregister(&arm_smmu_driver);
+}
+
+module_init(arm_smmu_init);
+module_exit(arm_smmu_exit);
+
+MODULE_DESCRIPTION("IOMMU API for ARM architected SMMU implementations");
+MODULE_AUTHOR("Will Deacon <will.deacon@arm.com>");
+MODULE_LICENSE("GPL v2");
-- 
1.8.2.2
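
For readers following the contiguous-hint comment in arm_smmu_alloc_init_pte
above, the alignment test can be sketched in isolation. This is only an
illustration in user-space C, not code from the patch: the SKETCH_* constants
(4KiB pages, 16 ptes per contiguous run) and both function names are
assumptions, since the real ARM_SMMU_PTE_CONT_* definitions are outside this
hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration: 4KiB pages, 16 ptes per contiguous run. */
#define SKETCH_PAGE_SIZE	4096UL
#define SKETCH_CONT_ENTRIES	16UL
#define SKETCH_CONT_SIZE	(SKETCH_CONT_ENTRIES * SKETCH_PAGE_SIZE)
#define SKETCH_CONT_MASK	(~(SKETCH_CONT_SIZE - 1))

/* True if [addr, end) begins with a full, aligned contiguous run. */
static bool sketch_is_contiguous_range(unsigned long addr, unsigned long end)
{
	return !(addr & ~SKETCH_CONT_MASK) &&
		(addr + SKETCH_CONT_SIZE <= end);
}

/* Count how many ptes in [addr, end) would carry the contiguous hint. */
static size_t sketch_count_hinted_ptes(unsigned long addr, unsigned long end)
{
	size_t hinted = 0;

	while (addr < end) {
		if (sketch_is_contiguous_range(addr, end)) {
			/* A whole aligned run fits: hint every pte in it. */
			hinted += SKETCH_CONT_ENTRIES;
			addr += SKETCH_CONT_SIZE;
		} else {
			/* Unaligned head or short tail: plain single pte. */
			addr += SKETCH_PAGE_SIZE;
		}
	}

	return hinted;
}
```

With these assumed sizes, a mapping of [0xf000, 0x21000) would get one plain
pte at the head, one hinted run of 16 ptes starting at 0x10000, and one plain
pte at the tail.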

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 9/9] MAINTAINERS: add entry for ARM system MMU driver
  2013-06-10 18:34 ` Will Deacon
@ 2013-06-10 18:34     ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-10 18:34 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Will Deacon,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Add myself as maintainer for the ARM system MMU driver.

Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 250dc97..84df4bd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1310,6 +1310,12 @@ T:	git git://git.xilinx.com/linux-xlnx.git
 S:	Supported
 F:	arch/arm/mach-zynq/
 
+ARM SMMU DRIVER
+M:	Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
+L:	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org (moderated for non-subscribers)
+S:	Maintained
+F:	drivers/iommu/arm-smmu.c
+
 ARM64 PORT (AARCH64 ARCHITECTURE)
 M:	Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
 M:	Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
-- 
1.8.2.2

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 9/9] MAINTAINERS: add entry for ARM system MMU driver
@ 2013-06-10 18:34     ` Will Deacon
  0 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-10 18:34 UTC (permalink / raw)
  To: linux-arm-kernel

Add myself as maintainer for the ARM system MMU driver.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 250dc97..84df4bd 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1310,6 +1310,12 @@ T:	git git://git.xilinx.com/linux-xlnx.git
 S:	Supported
 F:	arch/arm/mach-zynq/
 
+ARM SMMU DRIVER
+M:	Will Deacon <will.deacon@arm.com>
+L:	linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+S:	Maintained
+F:	drivers/iommu/arm-smmu.c
+
 ARM64 PORT (AARCH64 ARCHITECTURE)
 M:	Catalin Marinas <catalin.marinas@arm.com>
 M:	Will Deacon <will.deacon@arm.com>
-- 
1.8.2.2

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: [PATCH 2/9] dma: pl330: use dma_addr_t for describing bus addresses
       [not found]     ` <1370889285-22799-3-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
@ 2013-06-11  4:37       ` Jassi Brar
       [not found]         ` <CAJe_ZheKMVQgq42Vx5N1TXXdgFJ2sp50ixU30A7beXhmSVHnZQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  2013-06-11 22:32         ` Grant Likely
  1 sibling, 1 reply; 97+ messages in thread
From: Jassi Brar @ 2013-06-11  4:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: Vinod Koul, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	device-tree, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 1492 bytes --]

On 11 June 2013 00:04, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:

> The microcode bus address (pl330_dmac.mcode_bus) is currently a u32,
> which fails to compile when building on a system with 64-bit bus
> addresses.
>
> This patch uses dma_addr_t to represent the address instead.
>
> Acked-by: Jassi Brar <jaswinder.singh-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>


> Cc: Jassi Brar <jaswinder.singh-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
> Cc: Vinod Koul <vinod.koul-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> ---
>  drivers/dma/pl330.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index 22e2a8f..f1bc593 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -501,7 +501,7 @@ struct pl330_dmac {
>         /* Maximum possible events/irqs */
>         int                     events[32];
>         /* BUS address of MicroCode buffer */
> -       u32                     mcode_bus;
> +       dma_addr_t              mcode_bus;
>         /* CPU address of MicroCode buffer */
>         void                    *mcode_cpu;
>         /* List of all Channel threads */
> --
> 1.8.2.2
>
>


-- 
Linaro.org │ Open source software for ARM SoCs | Follow Linaro
http://facebook.com/pages/Linaro/155974581091106  -
http://twitter.com/#!/linaroorg - http://linaro.org/linaro-blog

[-- Attachment #1.2: Type: text/html, Size: 2778 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 1/9] dma: pl330: rip out broken, redundant ID probing
       [not found]     ` <1370889285-22799-2-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
@ 2013-06-11  4:37       ` Jassi Brar
  2013-06-11 22:31         ` Grant Likely
  2013-06-12  5:31         ` Vinod Koul
  2 siblings, 0 replies; 97+ messages in thread
From: Jassi Brar @ 2013-06-11  4:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: Vinod Koul, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	device-tree, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r


[-- Attachment #1.1: Type: text/plain, Size: 649 bytes --]

On 11 June 2013 00:04, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:

> The PL330 driver probes the peripheral and primecell IDs of the device to
> make sure that it is indeed an AMBA PL330. However, it does this by
> making byte accesses to a device mapping of the word-aligned ID
> registers, which is either UNPREDICTABLE or generates an alignment fault
> (depending on the presence of the virtualisation extensions).
>
> Rather than fix this code, we can actually rip most of it out and let
> the AMBA bus driver correctly do the probing for us.
>
> Acked-by: Jassi Brar <jaswinder.singh-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>

[-- Attachment #1.2: Type: text/html, Size: 1139 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 2/9] dma: pl330: use dma_addr_t for describing bus addresses
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-11  4:39       ` Jassi Brar
  -1 siblings, 0 replies; 97+ messages in thread
From: Jassi Brar @ 2013-06-11  4:39 UTC (permalink / raw)
  To: Will Deacon; +Cc: Vinod Koul, iommu, device-tree, linux-arm-kernel

On 11 June 2013 00:04, Will Deacon <will.deacon@arm.com> wrote:
> The microcode bus address (pl330_dmac.mcode_bus) is currently a u32,
> which fails to compile when building on a system with 64-bit bus
> addresses.
>
> This patch uses dma_addr_t to represent the address instead.
>
Acked-by: Jassi Brar <jaswinder.singh@linaro.org>
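
As a rough illustration of why the field type matters (a user-space sketch,
not the pl330 code): the bus address is presumably handed back through a
pointer to the field, so the field has to have the bus-address type itself.
The sketch_* names and the example address below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel type on a system with 64-bit bus addresses. */
typedef uint64_t sketch_dma_addr_t;

struct sketch_dmac {
	/* Bus address of the microcode buffer; a 32-bit field would not fit. */
	sketch_dma_addr_t mcode_bus;
};

/* Hypothetical allocator: reports the bus address through a typed pointer. */
static void sketch_alloc_coherent(sketch_dma_addr_t *handle)
{
	*handle = 0x880000000ULL;	/* an address above 4GiB */
}

static sketch_dma_addr_t sketch_setup(struct sketch_dmac *dmac)
{
	/* &dmac->mcode_bus has exactly the pointer type the allocator wants. */
	sketch_alloc_coherent(&dmac->mcode_bus);
	return dmac->mcode_bus;
}
```

Had the field been a 32-bit integer, the pointer types would not match, and
a value like the one above would lose its upper bits if it were forced in.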

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 2/9] dma: pl330: use dma_addr_t for describing bus addresses
@ 2013-06-11  4:39       ` Jassi Brar
  0 siblings, 0 replies; 97+ messages in thread
From: Jassi Brar @ 2013-06-11  4:39 UTC (permalink / raw)
  To: linux-arm-kernel

On 11 June 2013 00:04, Will Deacon <will.deacon@arm.com> wrote:
> The microcode bus address (pl330_dmac.mcode_bus) is currently a u32,
> which fails to compile when building on a system with 64-bit bus
> addresses.
>
> This patch uses dma_addr_t to represent the address instead.
>
Acked-by: Jassi Brar <jaswinder.singh@linaro.org>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 1/9] dma: pl330: rip out broken, redundant ID probing
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-11  4:40       ` Jassi Brar
  -1 siblings, 0 replies; 97+ messages in thread
From: Jassi Brar @ 2013-06-11  4:40 UTC (permalink / raw)
  To: Will Deacon; +Cc: Vinod Koul, iommu, device-tree, linux-arm-kernel

On 11 June 2013 00:04, Will Deacon <will.deacon@arm.com> wrote:
> The PL330 driver probes the peripheral and primecell IDs of the device to
> make sure that it is indeed an AMBA PL330. However, it does this by
> making byte accesses to a device mapping of the word-aligned ID
> registers, which is either UNPREDICTABLE or generates an alignment fault
> (depending on the presence of the virtualisation extensions).
>
> Rather than fix this code, we can actually rip most of it out and let
> the AMBA bus driver correctly do the probing for us.
>
Acked-by: Jassi Brar <jaswinder.singh@linaro.org>
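
A minimal sketch of the word-access approach, assuming four word-aligned ID
registers that each carry one significant byte. It is simulated in user-space
C with a plain array standing in for the device mapping; the register values
and the sketch_* names are illustrative assumptions, not taken from the
driver.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated ID block: four word-aligned registers, low byte significant. */
static const volatile uint32_t sketch_periph_id_regs[4] = {
	0x30, 0x13, 0x04, 0x00,	/* illustrative values only */
};

/* Read each register with a full 32-bit access and keep its low byte. */
static uint32_t sketch_read_id(const volatile uint32_t *regs)
{
	uint32_t id = 0;
	int i;

	for (i = 0; i < 4; i++)
		id |= (regs[i] & 0xff) << (i * 8);

	return id;
}
```

Every access here is a naturally aligned 32-bit load, which sidesteps the
byte-access problem described in the commit message.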

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 1/9] dma: pl330: rip out broken, redundant ID probing
@ 2013-06-11  4:40       ` Jassi Brar
  0 siblings, 0 replies; 97+ messages in thread
From: Jassi Brar @ 2013-06-11  4:40 UTC (permalink / raw)
  To: linux-arm-kernel

On 11 June 2013 00:04, Will Deacon <will.deacon@arm.com> wrote:
> The PL330 driver probes the peripheral and primecell IDs of the device to
> make sure that it is indeed an AMBA PL330. However, it does this by
> making byte accesses to a device mapping of the word-aligned ID
> registers, which is either UNPREDICTABLE or generates an alignment fault
> (depending on the presence of the virtualisation extensions).
>
> Rather than fix this code, we can actually rip most of it out and let
> the AMBA bus driver correctly do the probing for us.
>
Acked-by: Jassi Brar <jaswinder.singh@linaro.org>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 4/9] ARM: dma-mapping: NULLify dev->archdata.mapping pointer on detach
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-11  5:34         ` Hiroshi Doyu
  -1 siblings, 0 replies; 97+ messages in thread
From: Hiroshi Doyu @ 2013-06-11  5:34 UTC (permalink / raw)
  To: m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ, will.deacon-5wv7dgnIgG8
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

Hi Will,

Will Deacon <will.deacon@arm.com> wrote @ Mon, 10 Jun 2013 20:34:40 +0200:

> The current code only clobbers a local variable, so the device is left
> with a stale mapping pointer.

True. This is my bad. Thanks.
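
[Illustration] A minimal userspace sketch of the bug class acknowledged
above. The struct and function names are made up for illustration and
are not the kernel's: assigning NULL to a local copy of a pointer leaves
the pointer stored in the device structure untouched, so the device
keeps a stale reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct fake_mapping { int refcount; };
struct fake_device  { struct fake_mapping *mapping; };

/* Buggy detach: only the local variable is cleared. */
static void detach_buggy(struct fake_device *dev)
{
	struct fake_mapping *mapping;

	assert(dev != NULL);
	mapping = dev->mapping;

	mapping = NULL;		/* dev->mapping still points at the old mapping */
	(void)mapping;		/* silence "set but not used" warnings */
}

/* Fixed detach: clear the pointer held by the device itself. */
static void detach_fixed(struct fake_device *dev)
{
	assert(dev != NULL);
	dev->mapping = NULL;
}
```

After detach_buggy() the device still holds the old pointer; after
detach_fixed() it holds NULL, which is the behaviour the patch title
describes.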

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 1/9] dma: pl330: rip out broken, redundant ID probing
  2013-06-11  4:40       ` Jassi Brar
@ 2013-06-11  8:45           ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-11  8:45 UTC (permalink / raw)
  To: Jassi Brar
  Cc: Vinod Koul, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	device-tree, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, Jun 11, 2013 at 05:40:36AM +0100, Jassi Brar wrote:
> On 11 June 2013 00:04, Will Deacon <will.deacon@arm.com> wrote:
> > The PL330 driver probes the peripheral and primecell IDs of the device to
> > make sure that it is indeed an AMBA PL330. However, it does this by
> > making byte accesses to a device mapping of the word-aligned ID
> > registers, which is either UNPREDICTABLE or generates an alignment fault
> > (depending on the presence of the virtualisation extensions).
> >
> > Rather than fix this code, we can actually rip most of it out and let
> > the AMBA bus driver correctly do the probing for us.
> >
> Acked-by: Jassi Brar <jaswinder.singh@linaro.org>

Thanks for the acks Jassi!

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 4/9] ARM: dma-mapping: NULLify dev->archdata.mapping pointer on detach
  2013-06-11  5:34         ` Hiroshi Doyu
@ 2013-06-11  8:50             ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-11  8:50 UTC (permalink / raw)
  To: Hiroshi Doyu
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, Jun 11, 2013 at 06:34:55AM +0100, Hiroshi Doyu wrote:
> Hi Will,
> 
> Will Deacon <will.deacon@arm.com> wrote @ Mon, 10 Jun 2013 20:34:40 +0200:
> 
> > The current code only clobbers a local variable, so the device is left
> > with a stale mapping pointer.
> 
> True. This is my bad. Thanks.

That's alright, it's easy to fix. Mind if I add your ack please?

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 4/9] ARM: dma-mapping: NULLify dev->archdata.mapping pointer on detach
  2013-06-11  8:50             ` Will Deacon
@ 2013-06-11  9:39                 ` Hiroshi Doyu
  -1 siblings, 0 replies; 97+ messages in thread
From: Hiroshi Doyu @ 2013-06-11  9:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, 11 Jun 2013 10:50:15 +0200
Will Deacon <will.deacon@arm.com> wrote:

> On Tue, Jun 11, 2013 at 06:34:55AM +0100, Hiroshi Doyu wrote:
> > Hi Will,
> > 
> > Will Deacon <will.deacon@arm.com> wrote @ Mon, 10 Jun 2013 20:34:40 +0200:
> > 
> > > The current code only clobbers a local variable, so the device is left
> > > with a stale mapping pointer.
> > 
> > True. This is my bad. Thanks.
> 
> That's alright, it's easy to fix. Mind if I add your ack please?

Feel free to add: Acked-by: Hiroshi Doyu <hdoyu@nvidia.com>

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 1/9] dma: pl330: rip out broken, redundant ID probing
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-11 22:31         ` Grant Likely
  -1 siblings, 0 replies; 97+ messages in thread
From: Grant Likely @ 2013-06-11 22:31 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Vinod Koul, Jassi Brar,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Will Deacon,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, 10 Jun 2013 19:34:37 +0100, Will Deacon <will.deacon@arm.com> wrote:
> The PL330 driver probes the peripheral and primecell IDs of the device to
> make sure that it is indeed an AMBA PL330. However, it does this by
> making byte accesses to a device mapping of the word-aligned ID
> registers, which is either UNPREDICTABLE or generates an alignment fault
> (depending on the presence of the virtualisation extensions).
> 
> Rather than fix this code, we can actually rip most of it out and let
> the AMBA bus driver correctly do the probing for us.
> 
> Cc: Jassi Brar <jaswinder.singh@linaro.org>
> Cc: Vinod Koul <vinod.koul@intel.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Right, device drivers shouldn't be coding their own primecell ID parser.

Acked-by: Grant Likely <grant.likely@linaro.org>

> ---
>  drivers/dma/pl330.c | 27 +++------------------------
>  1 file changed, 3 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index 24e0754..22e2a8f 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -157,7 +157,6 @@ enum pl330_reqtype {
>  #define PERIPH_REV_R0P0		0
>  #define PERIPH_REV_R1P0		1
>  #define PERIPH_REV_R1P1		2
> -#define PCELL_ID		0xff0
>  
>  #define CR0_PERIPH_REQ_SET	(1 << 0)
>  #define CR0_BOOT_EN_SET		(1 << 1)
> @@ -193,8 +192,6 @@ enum pl330_reqtype {
>  #define INTEG_CFG		0x0
>  #define PERIPH_ID_VAL		((PART << 0) | (DESIGNER << 12))
>  
> -#define PCELL_ID_VAL		0xb105f00d
> -
>  #define PL330_STATE_STOPPED		(1 << 0)
>  #define PL330_STATE_EXECUTING		(1 << 1)
>  #define PL330_STATE_WFE			(1 << 2)
> @@ -292,7 +289,6 @@ static unsigned cmd_line;
>  /* Populated by the PL330 core driver for DMA API driver's info */
>  struct pl330_config {
>  	u32	periph_id;
> -	u32	pcell_id;
>  #define DMAC_MODE_NS	(1 << 0)
>  	unsigned int	mode;
>  	unsigned int	data_bus_width:10; /* In number of bits */
> @@ -650,19 +646,6 @@ static inline bool _manager_ns(struct pl330_thread *thrd)
>  	return (pl330->pinfo->pcfg.mode & DMAC_MODE_NS) ? true : false;
>  }
>  
> -static inline u32 get_id(struct pl330_info *pi, u32 off)
> -{
> -	void __iomem *regs = pi->base;
> -	u32 id = 0;
> -
> -	id |= (readb(regs + off + 0x0) << 0);
> -	id |= (readb(regs + off + 0x4) << 8);
> -	id |= (readb(regs + off + 0x8) << 16);
> -	id |= (readb(regs + off + 0xc) << 24);
> -
> -	return id;
> -}
> -
>  static inline u32 get_revision(u32 periph_id)
>  {
>  	return (periph_id >> PERIPH_REV_SHIFT) & PERIPH_REV_MASK;
> @@ -1986,9 +1969,6 @@ static void read_dmac_config(struct pl330_info *pi)
>  	pi->pcfg.num_events = val;
>  
>  	pi->pcfg.irq_ns = readl(regs + CR3);
> -
> -	pi->pcfg.periph_id = get_id(pi, PERIPH_ID);
> -	pi->pcfg.pcell_id = get_id(pi, PCELL_ID);
>  }
>  
>  static inline void _reset_thread(struct pl330_thread *thrd)
> @@ -2098,10 +2078,8 @@ static int pl330_add(struct pl330_info *pi)
>  	regs = pi->base;
>  
>  	/* Check if we can handle this DMAC */
> -	if ((get_id(pi, PERIPH_ID) & 0xfffff) != PERIPH_ID_VAL
> -	   || get_id(pi, PCELL_ID) != PCELL_ID_VAL) {
> -		dev_err(pi->dev, "PERIPH_ID 0x%x, PCELL_ID 0x%x !\n",
> -			get_id(pi, PERIPH_ID), get_id(pi, PCELL_ID));
> +	if ((pi->pcfg.periph_id & 0xfffff) != PERIPH_ID_VAL) {
> +		dev_err(pi->dev, "PERIPH_ID 0x%x !\n", pi->pcfg.periph_id);
>  		return -EINVAL;
>  	}
>  
> @@ -2922,6 +2900,7 @@ pl330_probe(struct amba_device *adev, const struct amba_id *id)
>  	if (ret)
>  		return ret;
>  
> +	pi->pcfg.periph_id = adev->periphid;
>  	ret = pl330_add(pi);
>  	if (ret)
>  		goto probe_err1;
> -- 
> 1.8.2.2

-- 
Grant Likely, B.Sc, P.Eng.
Secret Lab Technologies, Ltd.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 2/9] dma: pl330: use dma_addr_t for describing bus addresses
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-11 22:32         ` Grant Likely
  -1 siblings, 0 replies; 97+ messages in thread
From: Grant Likely @ 2013-06-11 22:32 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Vinod Koul, Jassi Brar,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Will Deacon,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, 10 Jun 2013 19:34:38 +0100, Will Deacon <will.deacon@arm.com> wrote:
> The microcode bus address (pl330_dmac.mcode_bus) is currently a u32,
> which fails to compile when building on a system with 64-bit bus
> addresses.
> 
> This patch uses dma_addr_t to represent the address instead.
> 
> Cc: Jassi Brar <jaswinder.singh@linaro.org>
> Cc: Vinod Koul <vinod.koul@intel.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Grant Likely <grant.likely@linaro.org>
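
[Illustration] A small sketch of why the field type matters, assuming a
configuration where bus addresses are 64 bits wide. The quoted commit
message reports a build failure; this userspace model only shows the
underlying width mismatch. bus_addr_t is a hypothetical stand-in for the
kernel's dma_addr_t, and the struct and helper names are invented for
this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for dma_addr_t on a system with 64-bit bus addresses. */
typedef uint64_t bus_addr_t;

/* The premise of the sketch: bus addresses are wider than 32 bits. */
_Static_assert(sizeof(bus_addr_t) > sizeof(uint32_t),
	       "sketch assumes 64-bit bus addresses");

struct dmac_narrow { uint32_t   mcode_bus; };	/* before: fixed 32-bit field */
struct dmac_wide   { bus_addr_t mcode_bus; };	/* after: as wide as a bus address */

/* Returns nonzero if addr survives a round trip through the 32-bit field. */
static int fits_in_u32(bus_addr_t addr)
{
	struct dmac_narrow d = { (uint32_t)addr };	/* explicit truncation */

	return (bus_addr_t)d.mcode_bus == addr;
}
```

Any address at or above 4 GiB fails fits_in_u32(), whereas the wide
struct stores it unchanged.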

> ---
>  drivers/dma/pl330.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index 22e2a8f..f1bc593 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -501,7 +501,7 @@ struct pl330_dmac {
>  	/* Maximum possible events/irqs */
>  	int			events[32];
>  	/* BUS address of MicroCode buffer */
> -	u32			mcode_bus;
> +	dma_addr_t		mcode_bus;
>  	/* CPU address of MicroCode buffer */
>  	void			*mcode_cpu;
>  	/* List of all Channel threads */
> -- 
> 1.8.2.2

-- 
Grant Likely, B.Sc, P.Eng.
Secret Lab Technologies, Ltd.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 1/9] dma: pl330: rip out broken, redundant ID probing
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-12  5:31         ` Vinod Koul
  -1 siblings, 0 replies; 97+ messages in thread
From: Vinod Koul @ 2013-06-12  5:31 UTC (permalink / raw)
  To: Will Deacon
  Cc: Jassi Brar, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, Jun 10, 2013 at 07:34:37PM +0100, Will Deacon wrote:
> The PL330 driver probes the peripheral and primecell IDs of the device to
> make sure that it is indeed an AMBA PL330. However, it does this by
> making byte accesses to a device mapping of the word-aligned ID
> registers, which is either UNPREDICTABLE or generates an alignment fault
> (depending on the presence of the virtualisation extensions).
> 
> Rather than fix this code, we can actually rip most of it out and let
> the AMBA bus driver correctly do the probing for us.
> 
> Cc: Jassi Brar <jaswinder.singh@linaro.org>
> Cc: Vinod Koul <vinod.koul@intel.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Applied, thanks

--
~Vinod

> ---
>  drivers/dma/pl330.c | 27 +++------------------------
>  1 file changed, 3 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> index 24e0754..22e2a8f 100644
> --- a/drivers/dma/pl330.c
> +++ b/drivers/dma/pl330.c
> @@ -157,7 +157,6 @@ enum pl330_reqtype {
>  #define PERIPH_REV_R0P0		0
>  #define PERIPH_REV_R1P0		1
>  #define PERIPH_REV_R1P1		2
> -#define PCELL_ID		0xff0
>  
>  #define CR0_PERIPH_REQ_SET	(1 << 0)
>  #define CR0_BOOT_EN_SET		(1 << 1)
> @@ -193,8 +192,6 @@ enum pl330_reqtype {
>  #define INTEG_CFG		0x0
>  #define PERIPH_ID_VAL		((PART << 0) | (DESIGNER << 12))
>  
> -#define PCELL_ID_VAL		0xb105f00d
> -
>  #define PL330_STATE_STOPPED		(1 << 0)
>  #define PL330_STATE_EXECUTING		(1 << 1)
>  #define PL330_STATE_WFE			(1 << 2)
> @@ -292,7 +289,6 @@ static unsigned cmd_line;
>  /* Populated by the PL330 core driver for DMA API driver's info */
>  struct pl330_config {
>  	u32	periph_id;
> -	u32	pcell_id;
>  #define DMAC_MODE_NS	(1 << 0)
>  	unsigned int	mode;
>  	unsigned int	data_bus_width:10; /* In number of bits */
> @@ -650,19 +646,6 @@ static inline bool _manager_ns(struct pl330_thread *thrd)
>  	return (pl330->pinfo->pcfg.mode & DMAC_MODE_NS) ? true : false;
>  }
>  
> -static inline u32 get_id(struct pl330_info *pi, u32 off)
> -{
> -	void __iomem *regs = pi->base;
> -	u32 id = 0;
> -
> -	id |= (readb(regs + off + 0x0) << 0);
> -	id |= (readb(regs + off + 0x4) << 8);
> -	id |= (readb(regs + off + 0x8) << 16);
> -	id |= (readb(regs + off + 0xc) << 24);
> -
> -	return id;
> -}
> -
>  static inline u32 get_revision(u32 periph_id)
>  {
>  	return (periph_id >> PERIPH_REV_SHIFT) & PERIPH_REV_MASK;
> @@ -1986,9 +1969,6 @@ static void read_dmac_config(struct pl330_info *pi)
>  	pi->pcfg.num_events = val;
>  
>  	pi->pcfg.irq_ns = readl(regs + CR3);
> -
> -	pi->pcfg.periph_id = get_id(pi, PERIPH_ID);
> -	pi->pcfg.pcell_id = get_id(pi, PCELL_ID);
>  }
>  
>  static inline void _reset_thread(struct pl330_thread *thrd)
> @@ -2098,10 +2078,8 @@ static int pl330_add(struct pl330_info *pi)
>  	regs = pi->base;
>  
>  	/* Check if we can handle this DMAC */
> -	if ((get_id(pi, PERIPH_ID) & 0xfffff) != PERIPH_ID_VAL
> -	   || get_id(pi, PCELL_ID) != PCELL_ID_VAL) {
> -		dev_err(pi->dev, "PERIPH_ID 0x%x, PCELL_ID 0x%x !\n",
> -			get_id(pi, PERIPH_ID), get_id(pi, PCELL_ID));
> +	if ((pi->pcfg.periph_id & 0xfffff) != PERIPH_ID_VAL) {
> +		dev_err(pi->dev, "PERIPH_ID 0x%x !\n", pi->pcfg.periph_id);
>  		return -EINVAL;
>  	}
>  
> @@ -2922,6 +2900,7 @@ pl330_probe(struct amba_device *adev, const struct amba_id *id)
>  	if (ret)
>  		return ret;
>  
> +	pi->pcfg.periph_id = adev->periphid;
>  	ret = pl330_add(pi);
>  	if (ret)
>  		goto probe_err1;
> -- 
> 1.8.2.2

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 2/9] dma: pl330: use dma_addr_t for describing bus addresses
  2013-06-11  4:37       ` Jassi Brar
@ 2013-06-12  5:31             ` Vinod Koul
  0 siblings, 0 replies; 97+ messages in thread
From: Vinod Koul @ 2013-06-12  5:31 UTC (permalink / raw)
  To: Jassi Brar
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Will Deacon,
	device-tree, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, Jun 11, 2013 at 10:07:12AM +0530, Jassi Brar wrote:
> On 11 June 2013 00:04, Will Deacon <will.deacon@arm.com> wrote:
> 
> > The microcode bus address (pl330_dmac.mcode_bus) is currently a u32,
> > which fails to compile when building on a system with 64-bit bus
> > addresses.
> >
> > This patch uses dma_addr_t to represent the address instead.
> >
> > Acked-by: Jassi Brar <jaswinder.singh@linaro.org>
> 
> 
> > Cc: Jassi Brar <jaswinder.singh@linaro.org>
> > Cc: Vinod Koul <vinod.koul@intel.com>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>

Applied, thanks

--
~Vinod

> > ---
> >  drivers/dma/pl330.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
> > index 22e2a8f..f1bc593 100644
> > --- a/drivers/dma/pl330.c
> > +++ b/drivers/dma/pl330.c
> > @@ -501,7 +501,7 @@ struct pl330_dmac {
> >         /* Maximum possible events/irqs */
> >         int                     events[32];
> >         /* BUS address of MicroCode buffer */
> > -       u32                     mcode_bus;
> > +       dma_addr_t              mcode_bus;
> >         /* CPU address of MicroCode buffer */
> >         void                    *mcode_cpu;
> >         /* List of all Channel threads */
> > --
> > 1.8.2.2

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-12  8:44         ` Grant Likely
  -1 siblings, 0 replies; 97+ messages in thread
From: Grant Likely @ 2013-06-12  8:44 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: Andreas Herrmann, devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	Will Deacon, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, 10 Jun 2013 19:34:43 +0100, Will Deacon <will.deacon@arm.com> wrote:
> This patch adds a description of the device tree binding for the ARM
> System MMU architecture.
> 
> Cc: Rob Herring <robherring2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Andreas Herrmann <andreas.herrmann-bsGFqQB8/DxBDgjK7y7TUQ@public.gmane.org>
> Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Grant Likely <grant.likely@linaro.org>

> ---
>  .../devicetree/bindings/iommu/arm,smmu.txt         | 70 ++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu.txt
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> new file mode 100644
> index 0000000..e34c6cd
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> @@ -0,0 +1,70 @@
> +* ARM System MMU Architecture Implementation
> +
> +ARM SoCs may contain an implementation of the ARM System Memory
> +Management Unit Architecture, which can be used to provide 1 or 2 stages
> +of address translation to bus masters external to the CPU.
> +
> +The SMMU may also raise interrupts in response to various fault
> +conditions.
> +
> +** System MMU required properties:
> +
> +- compatible    : Should be one of:
> +
> +                        "arm,smmu-v1"
> +                        "arm,smmu-v2"
> +                        "arm,mmu-400"
> +                        "arm,mmu-500"
> +
> +                  depending on the particular implementation and/or the
> +                  version of the architecture implemented.
> +
> +- reg           : Base address and size of the SMMU.
> +
> +- #global-interrupts : The number of global interrupts exposed by the
> +                       device.
> +
> +- interrupts    : Interrupt list, with the first #global-irqs entries
> +                  corresponding to the global interrupts and any
> +                  following entries corresponding to context interrupts,
> +                  specified in order of their indexing by the SMMU.
> +
> +                  For SMMUv2 implementations, there must be exactly one
> +                  interrupt per context bank. In the case of a single,
> +                  combined interrupt, it must be listed multiple times.
> +
> +- mmu-masters   : A list of phandles to device nodes representing bus
> +                  masters for which the SMMU can provide a translation
> +                  and their corresponding StreamIDs (see example below).
> +                  Each device node linked from this list must have a
> +                  "#stream-id-cells" property, indicating the number of
> +                  StreamIDs associated with it.
> +
> +** System MMU optional properties:
> +
> +- smmu-parent   : When multiple SMMUs are chained together, this
> +                  property can be used to provide a phandle to the
> +                  parent SMMU (that is the next SMMU on the path going
> +                  from the mmu-masters towards memory) node for this
> +                  SMMU.
> +
> +Example:
> +
> +        smmu {
> +                compatible = "arm,smmu-v1";
> +                reg = <0xba5e0000 0x10000>;
> +                #global-interrupts = <2>;
> +                interrupts = <0 32 4>,
> +                             <0 33 4>,
> +                             <0 34 4>, /* This is the first context interrupt */
> +                             <0 35 4>,
> +                             <0 36 4>,
> +                             <0 37 4>;
> +
> +                /*
> +                 * Two DMA controllers, the first with two StreamIDs (0xd01d
> +                 * and 0xd01e) and the second with only one (0xd11c).
> +                 */
> +                mmu-masters = <&dma0 0xd01d 0xd01e>,
> +                              <&dma1 0xd11c>;
> +        };
> -- 
> 1.8.2.2
> 
> _______________________________________________
> devicetree-discuss mailing list
> devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
> https://lists.ozlabs.org/listinfo/devicetree-discuss

-- 
Grant Likely, B.Sc, P.Eng.
Secret Lab Technologies, Ltd.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
@ 2013-06-12  8:44         ` Grant Likely
  0 siblings, 0 replies; 97+ messages in thread
From: Grant Likely @ 2013-06-12  8:44 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 10 Jun 2013 19:34:43 +0100, Will Deacon <will.deacon@arm.com> wrote:
> This patch adds a description of the device tree binding for the ARM
> System MMU architecture.
> 
> Cc: Rob Herring <robherring2@gmail.com>
> Cc: Andreas Herrmann <andreas.herrmann@calxeda.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Grant Likely <grant.likely@linaro.org>

> ---
>  .../devicetree/bindings/iommu/arm,smmu.txt         | 70 ++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu.txt
> 
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> new file mode 100644
> index 0000000..e34c6cd
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> @@ -0,0 +1,70 @@
> +* ARM System MMU Architecture Implementation
> +
> +ARM SoCs may contain an implementation of the ARM System Memory
> +Management Unit Architecture, which can be used to provide 1 or 2 stages
> +of address translation to bus masters external to the CPU.
> +
> +The SMMU may also raise interrupts in response to various fault
> +conditions.
> +
> +** System MMU required properties:
> +
> +- compatible    : Should be one of:
> +
> +                        "arm,smmu-v1"
> +                        "arm,smmu-v2"
> +                        "arm,mmu-400"
> +                        "arm,mmu-500"
> +
> +                  depending on the particular implementation and/or the
> +                  version of the architecture implemented.
> +
> +- reg           : Base address and size of the SMMU.
> +
> +- #global-interrupts : The number of global interrupts exposed by the
> +                       device.
> +
> +- interrupts    : Interrupt list, with the first #global-irqs entries
> +                  corresponding to the global interrupts and any
> +                  following entries corresponding to context interrupts,
> +                  specified in order of their indexing by the SMMU.
> +
> +                  For SMMUv2 implementations, there must be exactly one
> +                  interrupt per context bank. In the case of a single,
> +                  combined interrupt, it must be listed multiple times.
> +
> +- mmu-masters   : A list of phandles to device nodes representing bus
> +                  masters for which the SMMU can provide a translation
> +                  and their corresponding StreamIDs (see example below).
> +                  Each device node linked from this list must have a
> +                  "#stream-id-cells" property, indicating the number of
> +                  StreamIDs associated with it.
> +
> +** System MMU optional properties:
> +
> +- smmu-parent   : When multiple SMMUs are chained together, this
> +                  property can be used to provide a phandle to the
> +                  parent SMMU (that is the next SMMU on the path going
> +                  from the mmu-masters towards memory) node for this
> +                  SMMU.
> +
> +Example:
> +
> +        smmu {
> +                compatible = "arm,smmu-v1";
> +                reg = <0xba5e0000 0x10000>;
> +                #global-interrupts = <2>;
> +                interrupts = <0 32 4>,
> +                             <0 33 4>,
> +                             <0 34 4>, /* This is the first context interrupt */
> +                             <0 35 4>,
> +                             <0 36 4>,
> +                             <0 37 4>;
> +
> +                /*
> +                 * Two DMA controllers, the first with two StreamIDs (0xd01d
> +                 * and 0xd01e) and the second with only one (0xd11c).
> +                 */
> +                mmu-masters = <&dma0 0xd01d 0xd01e>,
> +                              <&dma1 0xd11c>;
> +        };
> -- 
> 1.8.2.2
> 
> _______________________________________________
> devicetree-discuss mailing list
> devicetree-discuss at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/devicetree-discuss

-- 
Grant Likely, B.Sc, P.Eng.
Secret Lab Technologies, Ltd.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 9/9] MAINTAINERS: add entry for ARM system MMU driver
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-12  8:45         ` Grant Likely
  -1 siblings, 0 replies; 97+ messages in thread
From: Grant Likely @ 2013-06-12  8:45 UTC (permalink / raw)
  To: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA
  Cc: devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ, Will Deacon,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, 10 Jun 2013 19:34:45 +0100, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> Add myself as maintainer for the ARM system MMU driver.
> 
> Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>

Acked-by: Grant Likely <grant.likely-s3s/WqlpOiPyB63q8FvJNQ@public.gmane.org>

> ---
>  MAINTAINERS | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 250dc97..84df4bd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1310,6 +1310,12 @@ T:	git git://git.xilinx.com/linux-xlnx.git
>  S:	Supported
>  F:	arch/arm/mach-zynq/
>  
> +ARM SMMU DRIVER
> +M:	Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> +L:	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org (moderated for non-subscribers)
> +S:	Maintained
> +F:	drivers/iommu/arm-smmu.c
> +
>  ARM64 PORT (AARCH64 ARCHITECTURE)
>  M:	Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
>  M:	Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> -- 
> 1.8.2.2
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r@public.gmane.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
Grant Likely, B.Sc, P.Eng.
Secret Lab Technologies, Ltd.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 9/9] MAINTAINERS: add entry for ARM system MMU driver
@ 2013-06-12  8:45         ` Grant Likely
  0 siblings, 0 replies; 97+ messages in thread
From: Grant Likely @ 2013-06-12  8:45 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, 10 Jun 2013 19:34:45 +0100, Will Deacon <will.deacon@arm.com> wrote:
> Add myself as maintainer for the ARM system MMU driver.
> 
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Acked-by: Grant Likely <grant.likely@secretlab.ca>

> ---
>  MAINTAINERS | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 250dc97..84df4bd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -1310,6 +1310,12 @@ T:	git git://git.xilinx.com/linux-xlnx.git
>  S:	Supported
>  F:	arch/arm/mach-zynq/
>  
> +ARM SMMU DRIVER
> +M:	Will Deacon <will.deacon@arm.com>
> +L:	linux-arm-kernel at lists.infradead.org (moderated for non-subscribers)
> +S:	Maintained
> +F:	drivers/iommu/arm-smmu.c
> +
>  ARM64 PORT (AARCH64 ARCHITECTURE)
>  M:	Catalin Marinas <catalin.marinas@arm.com>
>  M:	Will Deacon <will.deacon@arm.com>
> -- 
> 1.8.2.2
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
Grant Likely, B.Sc, P.Eng.
Secret Lab Technologies, Ltd.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-19  8:37       ` Marek Szyprowski
  -1 siblings, 0 replies; 97+ messages in thread
From: Marek Szyprowski @ 2013-06-19  8:37 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu, devicetree-discuss, linux-arm-kernel

Hello,

On 6/10/2013 8:34 PM, Will Deacon wrote:
> IOMMU mappings take a prot parameter, identifying the protection bits
> to enforce on the newly created mapping (READ or WRITE). The ARM
> dma-mapping framework currently just passes 0 as the prot argument,
> resulting in faulting mappings.
>
> This patch infers the protection attributes based on the direction of
> the DMA transfer.
>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Thanks for fixing this issue. Could I take this patch into my dma-mapping tree
with the other dma-mapping changes I've collected recently, or do you want my
ack so that you can push it via another tree?

> ---
>   arch/arm/mm/dma-mapping.c | 18 ++++++++++++++++--
>   1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6fb80cf..d119de7 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1636,13 +1636,27 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p
>   {
>   	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
>   	dma_addr_t dma_addr;
> -	int ret, len = PAGE_ALIGN(size + offset);
> +	int ret, prot, len = PAGE_ALIGN(size + offset);
>   
>   	dma_addr = __alloc_iova(mapping, len);
>   	if (dma_addr == DMA_ERROR_CODE)
>   		return dma_addr;
>   
> -	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 0);
> +	switch (dir) {
> +	case DMA_BIDIRECTIONAL:
> +		prot = IOMMU_READ | IOMMU_WRITE;
> +		break;
> +	case DMA_TO_DEVICE:
> +		prot = IOMMU_READ;
> +		break;
> +	case DMA_FROM_DEVICE:
> +		prot = IOMMU_WRITE;
> +		break;
> +	default:
> +		prot = 0;
> +	}
> +
> +	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
>   	if (ret < 0)
>   		goto fail;
>   

Best regards
-- 
Marek Szyprowski
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
@ 2013-06-19  8:37       ` Marek Szyprowski
  0 siblings, 0 replies; 97+ messages in thread
From: Marek Szyprowski @ 2013-06-19  8:37 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On 6/10/2013 8:34 PM, Will Deacon wrote:
> IOMMU mappings take a prot parameter, identifying the protection bits
> to enforce on the newly created mapping (READ or WRITE). The ARM
> dma-mapping framework currently just passes 0 as the prot argument,
> resulting in faulting mappings.
>
> This patch infers the protection attributes based on the direction of
> the DMA transfer.
>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

Thanks for fixing this issue. Could I take this patch into my dma-mapping tree
with the other dma-mapping changes I've collected recently, or do you want my
ack so that you can push it via another tree?

> ---
>   arch/arm/mm/dma-mapping.c | 18 ++++++++++++++++--
>   1 file changed, 16 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
> index 6fb80cf..d119de7 100644
> --- a/arch/arm/mm/dma-mapping.c
> +++ b/arch/arm/mm/dma-mapping.c
> @@ -1636,13 +1636,27 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p
>   {
>   	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
>   	dma_addr_t dma_addr;
> -	int ret, len = PAGE_ALIGN(size + offset);
> +	int ret, prot, len = PAGE_ALIGN(size + offset);
>   
>   	dma_addr = __alloc_iova(mapping, len);
>   	if (dma_addr == DMA_ERROR_CODE)
>   		return dma_addr;
>   
> -	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 0);
> +	switch (dir) {
> +	case DMA_BIDIRECTIONAL:
> +		prot = IOMMU_READ | IOMMU_WRITE;
> +		break;
> +	case DMA_TO_DEVICE:
> +		prot = IOMMU_READ;
> +		break;
> +	case DMA_FROM_DEVICE:
> +		prot = IOMMU_WRITE;
> +		break;
> +	default:
> +		prot = 0;
> +	}
> +
> +	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
>   	if (ret < 0)
>   		goto fail;
>   

Best regards
-- 
Marek Szyprowski
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
  2013-06-19  8:37       ` Marek Szyprowski
@ 2013-06-19  8:52           ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-19  8:52 UTC (permalink / raw)
  To: Marek Szyprowski
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Jun 19, 2013 at 09:37:03AM +0100, Marek Szyprowski wrote:
> Hello,

Hi Marek,

> On 6/10/2013 8:34 PM, Will Deacon wrote:
> > IOMMU mappings take a prot parameter, identifying the protection bits
> > to enforce on the newly created mapping (READ or WRITE). The ARM
> > dma-mapping framework currently just passes 0 as the prot argument,
> > resulting in faulting mappings.
> >
> > This patch infers the protection attributes based on the direction of
> > the DMA transfer.
> >
> > Cc: Marek Szyprowski <m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
> > Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> 
> > Thanks for fixing this issue. Could I take this patch into my dma-mapping tree
> > with the other dma-mapping changes I've collected recently, or do you want my
> > ack so that you can push it via another tree?

Please, feel free to take it via the dma-mapping tree! That's probably the
best route for it and there's nothing that really depends on it anyway.

Cheers,

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
@ 2013-06-19  8:52           ` Will Deacon
  0 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-19  8:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 19, 2013 at 09:37:03AM +0100, Marek Szyprowski wrote:
> Hello,

Hi Marek,

> On 6/10/2013 8:34 PM, Will Deacon wrote:
> > IOMMU mappings take a prot parameter, identifying the protection bits
> > to enforce on the newly created mapping (READ or WRITE). The ARM
> > dma-mapping framework currently just passes 0 as the prot argument,
> > resulting in faulting mappings.
> >
> > This patch infers the protection attributes based on the direction of
> > the DMA transfer.
> >
> > Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> > Thanks for fixing this issue. Could I take this patch into my dma-mapping tree
> > with the other dma-mapping changes I've collected recently, or do you want my
> > ack so that you can push it via another tree?

Please, feel free to take it via the dma-mapping tree! That's probably the
best route for it and there's nothing that really depends on it anyway.

Cheers,

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
  2013-06-19  8:52           ` Will Deacon
@ 2013-06-19  8:57               ` Marek Szyprowski
  -1 siblings, 0 replies; 97+ messages in thread
From: Marek Szyprowski @ 2013-06-19  8:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hello,

On 6/19/2013 10:52 AM, Will Deacon wrote:
> On Wed, Jun 19, 2013 at 09:37:03AM +0100, Marek Szyprowski wrote:
> > Hello,
>
> Hi Marek,
>
> > On 6/10/2013 8:34 PM, Will Deacon wrote:
> > > IOMMU mappings take a prot parameter, identifying the protection bits
> > > to enforce on the newly created mapping (READ or WRITE). The ARM
> > > dma-mapping framework currently just passes 0 as the prot argument,
> > > resulting in faulting mappings.
> > >
> > > This patch infers the protection attributes based on the direction of
> > > the DMA transfer.
> > >
> > > Cc: Marek Szyprowski <m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
> > > Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> >
> > > Thanks for fixing this issue. Could I take this patch into my dma-mapping tree
> > > with the other dma-mapping changes I've collected recently, or do you want my
> > > ack so that you can push it via another tree?
>
> Please, feel free to take it via the dma-mapping tree! That's probably the
> best route for it and there's nothing that really depends on it anyway.

OK, thanks, I will put it in my tree. Thanks for your patches!

Best regards
-- 
Marek Szyprowski
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
@ 2013-06-19  8:57               ` Marek Szyprowski
  0 siblings, 0 replies; 97+ messages in thread
From: Marek Szyprowski @ 2013-06-19  8:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On 6/19/2013 10:52 AM, Will Deacon wrote:
> On Wed, Jun 19, 2013 at 09:37:03AM +0100, Marek Szyprowski wrote:
> > Hello,
>
> Hi Marek,
>
> > On 6/10/2013 8:34 PM, Will Deacon wrote:
> > > IOMMU mappings take a prot parameter, identifying the protection bits
> > > to enforce on the newly created mapping (READ or WRITE). The ARM
> > > dma-mapping framework currently just passes 0 as the prot argument,
> > > resulting in faulting mappings.
> > >
> > > This patch infers the protection attributes based on the direction of
> > > the DMA transfer.
> > >
> > > Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> > > Signed-off-by: Will Deacon <will.deacon@arm.com>
> >
> > > Thanks for fixing this issue. Could I take this patch into my dma-mapping tree
> > > with the other dma-mapping changes I've collected recently, or do you want my
> > > ack so that you can push it via another tree?
>
> Please, feel free to take it via the dma-mapping tree! That's probably the
> best route for it and there's nothing that really depends on it anyway.

OK, thanks, I will put it in my tree. Thanks for your patches!

Best regards
-- 
Marek Szyprowski
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 4/9] ARM: dma-mapping: NULLify dev->archdata.mapping pointer on detach
  2013-06-11  9:39                 ` Hiroshi Doyu
@ 2013-06-19  8:59                     ` Marek Szyprowski
  -1 siblings, 0 replies; 97+ messages in thread
From: Marek Szyprowski @ 2013-06-19  8:59 UTC (permalink / raw)
  To: Hiroshi Doyu
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Will Deacon,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hello,

On 6/11/2013 11:39 AM, Hiroshi Doyu wrote:
> On Tue, 11 Jun 2013 10:50:15 +0200
> Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
>
> > On Tue, Jun 11, 2013 at 06:34:55AM +0100, Hiroshi Doyu wrote:
> > > Hi Will,
> > >
> > > Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote @ Mon, 10 Jun 2013 20:34:40 +0200:
> > >
> > > > The current code only clobbers a local variable, so the device is left
> > > > with a stale mapping pointer.
> > >
> > > > True. This is my bad. Thanks.
> >
> > That's alright, it's easy to fix. Mind if I add your ack please?
>
> Feel free to add: Acked-by: Hiroshi Doyu <hdoyu-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>

Thanks for spotting the bug. I've taken the patch into my dma-mapping tree.

Best regards
-- 
Marek Szyprowski
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 4/9] ARM: dma-mapping: NULLify dev->archdata.mapping pointer on detach
@ 2013-06-19  8:59                     ` Marek Szyprowski
  0 siblings, 0 replies; 97+ messages in thread
From: Marek Szyprowski @ 2013-06-19  8:59 UTC (permalink / raw)
  To: linux-arm-kernel

Hello,

On 6/11/2013 11:39 AM, Hiroshi Doyu wrote:
> On Tue, 11 Jun 2013 10:50:15 +0200
> Will Deacon <will.deacon@arm.com> wrote:
>
> > On Tue, Jun 11, 2013 at 06:34:55AM +0100, Hiroshi Doyu wrote:
> > > Hi Will,
> > >
> > > Will Deacon <will.deacon@arm.com> wrote @ Mon, 10 Jun 2013 20:34:40 +0200:
> > >
> > > > The current code only clobbers a local variable, so the device is left
> > > > with a stale mapping pointer.
> > >
> > > > True. This is my bad. Thanks.
> >
> > That's alright, it's easy to fix. Mind if I add your ack please?
>
> Feel free to add: Acked-by: Hiroshi Doyu <hdoyu@nvidia.com>

Thanks for spotting the bug. I've taken the patch into my dma-mapping tree.

Best regards
-- 
Marek Szyprowski
Samsung R&D Institute Poland

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-20 20:08         ` Joerg Roedel
  -1 siblings, 0 replies; 97+ messages in thread
From: Joerg Roedel @ 2013-06-20 20:08 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, Jun 10, 2013 at 07:34:43PM +0100, Will Deacon wrote:
> This patch adds a description of the device tree binding for the ARM
> System MMU architecture.

Interesting, will this be a common driver to replace existing ARM IOMMU
drivers or is it a common driver for ARM IOMMUs found in future chips?

> +- smmu-parent   : When multiple SMMUs are chained together, this
> +                  property can be used to provide a phandle to the
> +                  parent SMMU (that is the next SMMU on the path going
> +                  from the mmu-masters towards memory) node for this
> +                  SMMU.

What happens when SMMUs are chained? Will the second SMMU seeing the DMA
just pass it through or is it translated again?


Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
@ 2013-06-20 20:08         ` Joerg Roedel
  0 siblings, 0 replies; 97+ messages in thread
From: Joerg Roedel @ 2013-06-20 20:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 10, 2013 at 07:34:43PM +0100, Will Deacon wrote:
> This patch adds a description of the device tree binding for the ARM
> System MMU architecture.

Interesting, will this be a common driver to replace existing ARM IOMMU
drivers or is it a common driver for ARM IOMMUs found in future chips?

> +- smmu-parent   : When multiple SMMUs are chained together, this
> +                  property can be used to provide a phandle to the
> +                  parent SMMU (that is the next SMMU on the path going
> +                  from the mmu-masters towards memory) node for this
> +                  SMMU.

What happens when SMMUs are chained? Will the second SMMU seeing the DMA
just pass it through or is it translated again?


Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-20 21:26         ` Joerg Roedel
  -1 siblings, 0 replies; 97+ messages in thread
From: Joerg Roedel @ 2013-06-20 21:26 UTC (permalink / raw)
  To: Will Deacon
  Cc: Olav Haugan, devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	Andreas Herrmann,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Will,

On Mon, Jun 10, 2013 at 07:34:44PM +0100, Will Deacon wrote:
> This patch adds support for SMMUs implementing the ARM System MMU
> architecture versions 1 or 2. Both arm and arm64 are supported, although
> the v7s descriptor format is not used.
> 
> Cc: Rob Herring <robherring2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Andreas Herrmann <andreas.herrmann-bsGFqQB8/DxBDgjK7y7TUQ@public.gmane.org>
> Cc: Olav Haugan <ohaugan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
> Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>

A few general questions:

How have you tested this code? Has it been run on real hardware? What
were the results?

The code looks good and clean in general, apart from a few places mentioned
below where I have questions and/or suggestions:

> +static struct arm_smmu_device *find_parent_smmu(struct arm_smmu_device *smmu)
> +{
> +	struct arm_smmu_device *parent, *tmp;
> +
> +	if (!smmu->parent_of_node)
> +		return NULL;
> +
> +	list_for_each_entry_safe(parent, tmp, &arm_smmu_devices, list)
> +		if (parent->dev->of_node == smmu->parent_of_node)
> +			return parent;

Why do you need the _safe variant here? You are not changing the list in
this loop so you should be fine with list_for_each_entry().
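
As a rough illustration of the difference (a user-space sketch with a
hand-rolled singly linked list, not the kernel's list.h macros; struct
node, find_node(), remove_all() and push() are made-up names): a
read-only lookup can simply follow pos->next, whereas a loop that frees
entries has to save the successor first, which is the job of the extra
cursor in the _safe variant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical list element, standing in for an SMMU device entry. */
struct node {
	int id;
	struct node *next;
};

/* Read-only lookup: nothing is unlinked, so plain iteration is enough. */
static struct node *find_node(struct node *head, int id)
{
	struct node *pos;

	for (pos = head; pos; pos = pos->next)
		if (pos->id == id)
			return pos;
	return NULL;
}

/* Removal frees pos inside the loop, so its successor is saved first. */
static struct node *remove_all(struct node *head, int id)
{
	struct node **link = &head;
	struct node *pos, *tmp;

	for (pos = head; pos; pos = tmp) {
		tmp = pos->next;	/* saved before pos may be freed */
		if (pos->id == id) {
			*link = tmp;
			free(pos);
		} else {
			link = &pos->next;
		}
	}
	return head;
}

/* Prepend a new element; returns the new head. */
static struct node *push(struct node *head, int id)
{
	struct node *n = malloc(sizeof(*n));

	assert(n);
	n->id = id;
	n->next = head;
	return n;
}
```

find_parent_smmu() only ever does the first kind of walk, so under that
reading the temporary cursor buys nothing there.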

> +
> +	dev_warn(smmu->dev,
> +		 "Failed to find SMMU parent despite parent in DT\n");
> +	return NULL;
> +}

> +/* Wait for any pending TLB invalidations to complete */
> +static void arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
> +{
> +	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> +
> +	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
> +	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
> +	       & sTLBGSTATUS_GSACTIVE)
> +		cpu_relax();

Other IOMMU drivers have a timeout for this loop and report an error
when the state does not change. I think this makes sense here too, so
that the kernel does not spin in that loop forever if something goes
wrong but prints an error instead.
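
A minimal sketch of such a bounded wait, written as user-space C so it
can be run without hardware. Everything here is an assumption made for
illustration: the retry budget SYNC_MAX_SPINS, the read_status_fn
callback that stands in for the readl_relaxed() of sTLBGSTATUS, and the
fake_status() register model. A real driver would probably also want a
delay between reads and a dev_err() on the failure path.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define GSACTIVE	(1u << 0)	/* stand-in for sTLBGSTATUS_GSACTIVE */
#define SYNC_MAX_SPINS	1000u		/* hypothetical retry budget */

/* Hypothetical register-read callback, so the sketch needs no MMIO. */
typedef uint32_t (*read_status_fn)(void *ctx);

/* Poll until GSACTIVE clears, or give up after SYNC_MAX_SPINS reads. */
static int tlb_sync_bounded(read_status_fn read_status, void *ctx)
{
	unsigned int spins;

	assert(read_status);
	for (spins = 0; spins < SYNC_MAX_SPINS; spins++) {
		if (!(read_status(ctx) & GSACTIVE))
			return 0;
	}
	return -ETIMEDOUT;	/* caller reports the error */
}

/* Fake status register: reports busy for the next *ctx reads. */
static uint32_t fake_status(void *ctx)
{
	unsigned int *remaining = ctx;

	if (*remaining == 0)
		return 0;
	(*remaining)--;
	return GSACTIVE;
}
```

With a counter of 3 the fake register goes idle on the fourth read and
the function returns 0; with a counter larger than the budget it returns
-ETIMEDOUT instead of spinning indefinitely.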

> +}
> +static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
> +				   size_t size)
> +{
> +	unsigned long offset = (unsigned long)addr & ~PAGE_MASK;
> +
> +	/*
> +	 * If the SMMU can't walk tables in the CPU caches, treat them
> +	 * like non-coherent DMA...
> +	 */
> +	if (!(smmu->features & ARM_SMMU_FEAT_COHERENT_WALK))
> +		dma_map_page(smmu->dev, virt_to_page(addr), offset, size,
> +			     DMA_TO_DEVICE);

Why can you call into DMA-API here? A DMA-API implementation may call
back into this IOMMU driver, no? So this looks a little bit like a
layering violation.

> +}

> +static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int flags)
> +{
> +	struct arm_smmu_domain *smmu_domain = domain->priv;
> +	struct arm_smmu_device *smmu = smmu_domain->leaf_smmu;
> +
> +	if (!smmu_domain || !smmu)
> +		return -ENODEV;
> +
> +	/*
> +	 * Check for silent address truncation up the SMMU chain.
> +	 */
> +	do {
> +		phys_addr_t output_mask = (1ULL << smmu->s2_output_size) - 1;
> +		if ((phys_addr_t)iova & ~output_mask)
> +			return -ERANGE;
> +	} while ((smmu = find_parent_smmu(smmu)));

This looks a bit too expensive to have in the map path. How about saving
something like an effective_output_mask (or output_size) which contains
the logical OR of every mask up the path? This would make this check a
lot cheaper.
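
One possible shape for this, as a user-space sketch: walk the chain once
(for example when the domain is attached), remember the narrowest output
size, and test each IOVA against that single value in the map path. The
struct smmu type and the effective_output_size() and check_iova()
helpers are invented for illustration and are not part of the patch;
keeping the smallest size is just one way to combine the per-SMMU
limits.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct arm_smmu_device. */
struct smmu {
	unsigned int s2_output_size;	/* output address bits */
	struct smmu *parent;		/* next SMMU towards memory, or NULL */
};

/* Walk the chain once and keep the narrowest output size. */
static unsigned int effective_output_size(const struct smmu *leaf)
{
	unsigned int size = 64;

	for (; leaf; leaf = leaf->parent)
		if (leaf->s2_output_size < size)
			size = leaf->s2_output_size;
	return size;
}

/* Map-path check: a single mask test instead of a chain walk. */
static int check_iova(uint64_t iova, unsigned int output_size)
{
	uint64_t mask;

	assert(output_size >= 1 && output_size <= 64);
	/* Shifting a 64-bit value by 64 is undefined, so special-case it. */
	mask = output_size == 64 ? UINT64_MAX : (1ULL << output_size) - 1;
	return (iova & ~mask) ? -ERANGE : 0;
}
```

For a 40-bit leaf SMMU behind a 32-bit parent, the cached size is 32, so
any IOVA with bits set above bit 31 is rejected without looking at the
parent again.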

> +
> +	return arm_smmu_create_mapping(smmu_domain, iova, paddr, size, flags);
> +}
> +
> +static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size)
> +{
> +	int ret;
> +	struct arm_smmu_domain *smmu_domain = domain->priv;
> +	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
> +	struct arm_smmu_device *smmu = root_cfg->smmu;
> +	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> +
> +	ret = arm_smmu_create_mapping(smmu_domain, iova, 0, size, 0);

Since this function also handles unmapping, how about renaming it to
arm_smmu_handle_mapping()? The 'create' part in there is misleading.

> +	writel_relaxed(root_cfg->vmid, gr0_base + ARM_SMMU_GR0_TLBIVMID);
> +	arm_smmu_tlb_sync(smmu);
> +	return ret ? ret : size;
> +}
> +static int arm_smmu_add_device(struct device *dev)
> +{
> +	struct arm_smmu_device *child, *parent, *smmu;
> +	struct arm_smmu_device *tmp[2];
> +	struct arm_smmu_master *master = NULL;
> +
> +	list_for_each_entry_safe(parent, tmp[0], &arm_smmu_devices, list) {

Again, why do you use the _safe variant? You do not seem to change the
lists traversed here.

> +		smmu = parent;
> +
> +		/* Try to find a child of the current SMMU. */
> +		list_for_each_entry_safe(child, tmp[1], &arm_smmu_devices, list) {
> +			if (child->parent_of_node == parent->dev->of_node) {
> +				/* Does the child sit above our master? */
> +				master = find_smmu_master(child, dev->of_node);
> +				if (master) {
> +					smmu = NULL;
> +					break;
> +				}
> +			}
> +		}
> +
> +		/* We found some children, so keep searching. */
> +		if (!smmu) {
> +			master = NULL;
> +			continue;
> +		}
> +
> +		master = find_smmu_master(smmu, dev->of_node);
> +		if (master)
> +			break;
> +	}
> +
> +	if (!master)
> +		return -ENODEV;
> +
> +	dev->archdata.iommu = smmu;
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture
@ 2013-06-20 21:26         ` Joerg Roedel
  0 siblings, 0 replies; 97+ messages in thread
From: Joerg Roedel @ 2013-06-20 21:26 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Mon, Jun 10, 2013 at 07:34:44PM +0100, Will Deacon wrote:
> This patch adds support for SMMUs implementing the ARM System MMU
> architecture versions 1 or 2. Both arm and arm64 are supported, although
> the v7s descriptor format is not used.
> 
> Cc: Rob Herring <robherring2@gmail.com>
> Cc: Andreas Herrmann <andreas.herrmann@calxeda.com>
> Cc: Olav Haugan <ohaugan@codeaurora.org>
> Cc: Joerg Roedel <joro@8bytes.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

A few general questions:

How have you tested this code? Has it been run on real hardware? What
were the results?

The code looks good and clean in general, minus a few places mentioned
below where I have questions and/or suggestions:

> +static struct arm_smmu_device *find_parent_smmu(struct arm_smmu_device *smmu)
> +{
> +	struct arm_smmu_device *parent, *tmp;
> +
> +	if (!smmu->parent_of_node)
> +		return NULL;
> +
> +	list_for_each_entry_safe(parent, tmp, &arm_smmu_devices, list)
> +		if (parent->dev->of_node == smmu->parent_of_node)
> +			return parent;

Why do you need the _safe variant here? You are not changing the list in
this loop so you should be fine with list_for_each_entry().

> +
> +	dev_warn(smmu->dev,
> +		 "Failed to find SMMU parent despite parent in DT\n");
> +	return NULL;
> +}

> +/* Wait for any pending TLB invalidations to complete */
> +static void arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
> +{
> +	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> +
> +	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
> +	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
> +	       & sTLBGSTATUS_GSACTIVE)
> +		cpu_relax();

Other IOMMU drivers have a timeout for this loop and report an error
when the state does not change. I think this makes sense here too, so
that the kernel does not spin in that loop forever if something goes
wrong but prints an error instead.
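
For illustration, a minimal userspace sketch of such a bounded wait. The
fake_smmu type, read_status() and the SYNC_MAX_POLL budget are assumptions
made up for this example and are not part of the patch; a real driver would
read the hardware status register and would probably delay between reads.

```c
#include <stdint.h>
#include <stdio.h>

#define GSACTIVE      (1u << 0)	/* stands in for sTLBGSTATUS_GSACTIVE */
#define SYNC_MAX_POLL 1000	/* hypothetical poll budget */

/* Fake status register: reports "active" for the first busy_reads reads. */
struct fake_smmu {
	unsigned int busy_reads;
};

static uint32_t read_status(struct fake_smmu *smmu)
{
	if (smmu->busy_reads) {
		smmu->busy_reads--;
		return GSACTIVE;
	}
	return 0;
}

/* Returns 0 once the sync completes, -1 if the poll budget runs out. */
static int tlb_sync_bounded(struct fake_smmu *smmu)
{
	int i;

	for (i = 0; i < SYNC_MAX_POLL; i++) {
		if (!(read_status(smmu) & GSACTIVE))
			return 0;
	}

	/* Report the stuck state instead of spinning forever. */
	fprintf(stderr, "TLB sync timed out, SMMU may be wedged\n");
	return -1;
}
```

With busy_reads set to a small value the function returns 0; with a value
above the budget it prints the error and returns -1, so the caller can
decide how to recover.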

> +}
> +static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
> +				   size_t size)
> +{
> +	unsigned long offset = (unsigned long)addr & ~PAGE_MASK;
> +
> +	/*
> +	 * If the SMMU can't walk tables in the CPU caches, treat them
> +	 * like non-coherent DMA...
> +	 */
> +	if (!(smmu->features & ARM_SMMU_FEAT_COHERENT_WALK))
> +		dma_map_page(smmu->dev, virt_to_page(addr), offset, size,
> +			     DMA_TO_DEVICE);

Why can you call into DMA-API here? A DMA-API implementation may call
back into this IOMMU driver, no? So this looks a little bit like a
layering violation.

> +}

> +static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> +			phys_addr_t paddr, size_t size, int flags)
> +{
> +	struct arm_smmu_domain *smmu_domain = domain->priv;
> +	struct arm_smmu_device *smmu = smmu_domain->leaf_smmu;
> +
> +	if (!smmu_domain || !smmu)
> +		return -ENODEV;
> +
> +	/*
> +	 * Check for silent address truncation up the SMMU chain.
> +	 */
> +	do {
> +		phys_addr_t output_mask = (1ULL << smmu->s2_output_size) - 1;
> +		if ((phys_addr_t)iova & ~output_mask)
> +			return -ERANGE;
> +	} while ((smmu = find_parent_smmu(smmu)));

This looks a bit too expensive to have in the map path. How about saving
something like an effective_output_mask (or output_size) which contains
the logical OR of every mask up the path? This would make this check a
lot cheaper.

> +
> +	return arm_smmu_create_mapping(smmu_domain, iova, paddr, size, flags);
> +}
> +
> +static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> +			     size_t size)
> +{
> +	int ret;
> +	struct arm_smmu_domain *smmu_domain = domain->priv;
> +	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
> +	struct arm_smmu_device *smmu = root_cfg->smmu;
> +	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> +
> +	ret = arm_smmu_create_mapping(smmu_domain, iova, 0, size, 0);

Since this function also does unmapping, how about renaming it to
arm_smmu_handle_mapping()? The 'create' part in there is misleading.

> +	writel_relaxed(root_cfg->vmid, gr0_base + ARM_SMMU_GR0_TLBIVMID);
> +	arm_smmu_tlb_sync(smmu);
> +	return ret ? ret : size;
> +}
> +static int arm_smmu_add_device(struct device *dev)
> +{
> +	struct arm_smmu_device *child, *parent, *smmu;
> +	struct arm_smmu_device *tmp[2];
> +	struct arm_smmu_master *master = NULL;
> +
> +	list_for_each_entry_safe(parent, tmp[0], &arm_smmu_devices, list) {

Again, why do you use the _safe variant? You do not seem to change the
lists traversed here.

> +		smmu = parent;
> +
> +		/* Try to find a child of the current SMMU. */
> +		list_for_each_entry_safe(child, tmp[1], &arm_smmu_devices, list) {
> +			if (child->parent_of_node == parent->dev->of_node) {
> +				/* Does the child sit above our master? */
> +				master = find_smmu_master(child, dev->of_node);
> +				if (master) {
> +					smmu = NULL;
> +					break;
> +				}
> +			}
> +		}
> +
> +		/* We found some children, so keep searching. */
> +		if (!smmu) {
> +			master = NULL;
> +			continue;
> +		}
> +
> +		master = find_smmu_master(smmu, dev->of_node);
> +		if (master)
> +			break;
> +	}
> +
> +	if (!master)
> +		return -ENODEV;
> +
> +	dev->archdata.iommu = smmu;
> +	return 0;
> +}

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-20 20:08         ` Joerg Roedel
@ 2013-06-21  9:57             ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-21  9:57 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Joerg,

On Thu, Jun 20, 2013 at 09:08:45PM +0100, Joerg Roedel wrote:
> On Mon, Jun 10, 2013 at 07:34:43PM +0100, Will Deacon wrote:
> > This patch adds a description of the device tree binding for the ARM
> > System MMU architecture.
> 
> Interesting, will this be a common driver to replace existing ARM IOMMU
> drivers or is it a common driver for ARM IOMMUs found in future chips?

This is a common driver that will support any IOMMUs compatible with v1 or
v2 of the ARM SMMU architecture. Currently, that includes SMMUs known
informatively as MMU-400, MMU-401 and MMU-500. I had a look at the other
IOMMU drivers in the kernel and they seem to be driving incompatible IOMMUs,
so I don't see how this driver can replace those.

However, we'll hopefully see ARM SMMU-compatible devices turning up soon (I
know of many SoCs in development that are looking at them).

> > +- smmu-parent   : When multiple SMMUs are chained together, this
> > +                  property can be used to provide a phandle to the
> > +                  parent SMMU (that is the next SMMU on the path going
> > +                  from the mmu-masters towards memory) node for this
> > +                  SMMU.
> 
> What happens when SMMUs are chained? Will the second SMMU seeing the DMA
> just pass it through or is it translated again?

Chaining is really horrible and exists as a hack to support virtualisation
using two separate SMMUs, where neither of them can support nested
translation.

The current driver just programs the translation in the SMMU nearest the
device, then sets the other SMMU into `bypass' mode (it was simple enough to
generalise the chain to contain an arbitrary number of SMMUs, so the driver
can actually deal with any number of the things). In theory (and if we
extended the IOMMU API to distinguish between guest and host mappings --
something which I plan to look at in the future), KVM could install mappings
in the second SMMU, but I think we should draw a line in the sand and mandate
support for nested translation for KVM.

The only thing to take care of is that the SMMUs in the chain don't silently
truncate addresses.

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
@ 2013-06-21  9:57             ` Will Deacon
  0 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-21  9:57 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Joerg,

On Thu, Jun 20, 2013 at 09:08:45PM +0100, Joerg Roedel wrote:
> On Mon, Jun 10, 2013 at 07:34:43PM +0100, Will Deacon wrote:
> > This patch adds a description of the device tree binding for the ARM
> > System MMU architecture.
> 
> Interesting, will this be a common driver to replace existing ARM IOMMU
> drivers or is it a common driver for ARM IOMMUs found in future chips?

This is a common driver that will support any IOMMUs compatible with v1 or
v2 of the ARM SMMU architecture. Currently, that includes SMMUs known
informatively as MMU-400, MMU-401 and MMU-500. I had a look at the other
IOMMU drivers in the kernel and they seem to be driving incompatible IOMMUs,
so I don't see how this driver can replace those.

However, we'll hopefully see ARM SMMU-compatible devices turning up soon (I
know of many SoCs in development that are looking at them).

> > +- smmu-parent   : When multiple SMMUs are chained together, this
> > +                  property can be used to provide a phandle to the
> > +                  parent SMMU (that is the next SMMU on the path going
> > +                  from the mmu-masters towards memory) node for this
> > +                  SMMU.
> 
> What happens when SMMUs are chained? Will the second SMMU seeing the DMA
> just pass it through or is it translated again?

Chaining is really horrible and exists as a hack to support virtualisation
using two separate SMMUs, where neither of them can support nested
translation.

The current driver just programs the translation in the SMMU nearest the
device, then sets the other SMMU into `bypass' mode (it was simple enough to
generalise the chain to contain an arbitrary number of SMMUs, so the driver
can actually deal with any number of the things). In theory (and if we
extended the IOMMU API to distinguish between guest and host mappings --
something which I plan to look at in the future), KVM could install mappings
in the second SMMU, but I think we should draw a line in the sand and mandate
support for nested translation for KVM.

The only thing to take care of is that the SMMUs in the chain don't silently
truncate addresses.

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture
  2013-06-20 21:26         ` Joerg Roedel
@ 2013-06-21 10:23             ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-21 10:23 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Olav Haugan, devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	Andreas Herrmann,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Jun 20, 2013 at 10:26:46PM +0100, Joerg Roedel wrote:
> Hi Will,

Hi Joerg,

First, thanks for looking at this patch. I really appreciate it.

> On Mon, Jun 10, 2013 at 07:34:44PM +0100, Will Deacon wrote:
> > This patch adds support for SMMUs implementing the ARM System MMU
> > architecture versions 1 or 2. Both arm and arm64 are supported, although
> > the v7s descriptor format is not used.
> > 
> > Cc: Rob Herring <robherring2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> > Cc: Andreas Herrmann <andreas.herrmann-bsGFqQB8/DxBDgjK7y7TUQ@public.gmane.org>
> > Cc: Olav Haugan <ohaugan-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> > Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
> > Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> 
> A few general questions:
> 
> How have you tested this code? Has it been run on real hardware? What
> were the results?

I've only tested it with software models, however they are fairly
sophisticated and provide *lots* of tunables, so you can test different
configurations which don't actually exist in hardware yet. You also get a
tonne of trace output, so you can inspect things like page table walking and
TLB filling to make sure the driver is setting things up as you planned.

I have also tested on MMU-400 (AArch32) and MMU-500 (both AArch32 and
AArch64, with 4k and 64k pages in the latter case) models. The setup I used
was 4 PL330 DMA controllers driven by dmatest.ko, which I hacked to call into
the ARM dma-mapping API to create a separate domain for each controller.

The results were that the memory-to-memory DMA didn't show any corruption. I
also managed to tickle access faults by messing around with the permissions,
then remap the buffers and resume the transfers.

> The code looks good and clean in general, minus a few places mentioned
> below where I have questions and/or suggestions:

Cheers, responses inline.

> > +static struct arm_smmu_device *find_parent_smmu(struct arm_smmu_device *smmu)
> > +{
> > +	struct arm_smmu_device *parent, *tmp;
> > +
> > +	if (!smmu->parent_of_node)
> > +		return NULL;
> > +
> > +	list_for_each_entry_safe(parent, tmp, &arm_smmu_devices, list)
> > +		if (parent->dev->of_node == smmu->parent_of_node)
> > +			return parent;
> 
> Why do you need the _safe variant here? You are not changing the list in
> this loop so you should be fine with list_for_each_entry().

For a system with multiple SMMUs (regardless of chaining), couldn't this
code run in parallel with probing of another SMMU (which has to add to the
arm_smmu_devices list)? The same applies for device removal, which could
perhaps be driven from some power-management code.

> > +
> > +	dev_warn(smmu->dev,
> > +		 "Failed to find SMMU parent despite parent in DT\n");
> > +	return NULL;
> > +}
> 
> > +/* Wait for any pending TLB invalidations to complete */
> > +static void arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
> > +{
> > +	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> > +
> > +	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
> > +	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
> > +	       & sTLBGSTATUS_GSACTIVE)
> > +		cpu_relax();
> 
> Other IOMMU drivers have a timeout for this loop and report an error
> when the state does not change. I think this makes sense here too, so
> that the kernel does not spin in that loop forever if something goes
> wrong but prints an error instead.

Good idea, I'll add that.

> > +}
> > +static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
> > +				   size_t size)
> > +{
> > +	unsigned long offset = (unsigned long)addr & ~PAGE_MASK;
> > +
> > +	/*
> > +	 * If the SMMU can't walk tables in the CPU caches, treat them
> > +	 * like non-coherent DMA...
> > +	 */
> > +	if (!(smmu->features & ARM_SMMU_FEAT_COHERENT_WALK))
> > +		dma_map_page(smmu->dev, virt_to_page(addr), offset, size,
> > +			     DMA_TO_DEVICE);
> 
> Why can you call into DMA-API here? A DMA-API implementation may call
> back into this IOMMU driver, no? So this looks a little bit like a
> layering violation.

So this is subtle, and the comment could probably do with more explanation.

The problem is that the SMMU hardware page table walker might not be able to
snoop the CPU caches. This means that after writing page table entries on
the CPU, we have to flush them *all* the way out to main memory, so the SMMU
can pick them up. There are only two other places I can think of where we
have to do something like this in Linux:

  (1) In the bowels of CPU suspend, where we may need to clean all of our
      caches prior to shutdown. This is all hidden away in low-level arch
      code.

  (2) When dealing with non-coherent DMA, since we have to transfer buffer
      ownership between the CPU and the device. There is an API for this.

so, in fact, non-coherent DMA is a perfect fit for what we're trying to
achieve with the SMMU page tables! I admit it looks odd, but it can't
possibly go recursive because the master in question is a page-table walker,
rather than anything backed by a struct device.

> > +}
> 
> > +static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> > +			phys_addr_t paddr, size_t size, int flags)
> > +{
> > +	struct arm_smmu_domain *smmu_domain = domain->priv;
> > +	struct arm_smmu_device *smmu = smmu_domain->leaf_smmu;
> > +
> > +	if (!smmu_domain || !smmu)
> > +		return -ENODEV;
> > +
> > +	/*
> > +	 * Check for silent address truncation up the SMMU chain.
> > +	 */
> > +	do {
> > +		phys_addr_t output_mask = (1ULL << smmu->s2_output_size) - 1;
> > +		if ((phys_addr_t)iova & ~output_mask)
> > +			return -ERANGE;
> > +	} while ((smmu = find_parent_smmu(smmu)));
> 
> This looks a bit too expensive to have in the map path. How about saving
> something like an effective_output_mask (or output_size) which contains
> the logical OR of every mask up the path? This would make this check a
> lot cheaper.

As mentioned in the DT binding thread, it's rare that this loop would
execute more than once, and largely inconceivable that it would execute more
than twice, so I don't know how much we need to worry about the cost.

> > +
> > +	return arm_smmu_create_mapping(smmu_domain, iova, paddr, size, flags);
> > +}
> > +
> > +static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> > +			     size_t size)
> > +{
> > +	int ret;
> > +	struct arm_smmu_domain *smmu_domain = domain->priv;
> > +	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
> > +	struct arm_smmu_device *smmu = root_cfg->smmu;
> > +	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> > +
> > +	ret = arm_smmu_create_mapping(smmu_domain, iova, 0, size, 0);
> 
> Since this function also does unmapping, how about renaming it to
> arm_smmu_handle_mapping()? The 'create' part in there is misleading.

Yep, that's a hangover from when I originally had separate functions. Will
fix.

> > +	writel_relaxed(root_cfg->vmid, gr0_base + ARM_SMMU_GR0_TLBIVMID);
> > +	arm_smmu_tlb_sync(smmu);
> > +	return ret ? ret : size;
> > +}
> > +static int arm_smmu_add_device(struct device *dev)
> > +{
> > +	struct arm_smmu_device *child, *parent, *smmu;
> > +	struct arm_smmu_device *tmp[2];
> > +	struct arm_smmu_master *master = NULL;
> > +
> > +	list_for_each_entry_safe(parent, tmp[0], &arm_smmu_devices, list) {
> 
> Again, why do you use the _safe variant? You do not seem to change the
> lists traversed here.

Same reasoning as above. Happy to remove it if it's not actually required...

Cheers again,

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture
@ 2013-06-21 10:23             ` Will Deacon
  0 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-21 10:23 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Jun 20, 2013 at 10:26:46PM +0100, Joerg Roedel wrote:
> Hi Will,

Hi Joerg,

First, thanks for looking at this patch. I really appreciate it.

> On Mon, Jun 10, 2013 at 07:34:44PM +0100, Will Deacon wrote:
> > This patch adds support for SMMUs implementing the ARM System MMU
> > architecture versions 1 or 2. Both arm and arm64 are supported, although
> > the v7s descriptor format is not used.
> > 
> > Cc: Rob Herring <robherring2@gmail.com>
> > Cc: Andreas Herrmann <andreas.herrmann@calxeda.com>
> > Cc: Olav Haugan <ohaugan@codeaurora.org>
> > Cc: Joerg Roedel <joro@8bytes.org>
> > Signed-off-by: Will Deacon <will.deacon@arm.com>
> 
> A few general questions:
> 
> How have you tested this code? Has it been run on real hardware? What
> were the results?

I've only tested it with software models, however they are fairly
sophisticated and provide *lots* of tunables, so you can test different
configurations which don't actually exist in hardware yet. You also get a
tonne of trace output, so you can inspect things like page table walking and
TLB filling to make sure the driver is setting things up as you planned.

I have also tested on MMU-400 (AArch32) and MMU-500 (both AArch32 and
AArch64, with 4k and 64k pages in the latter case) models. The setup I used
was 4 PL330 DMA controllers driven by dmatest.ko, which I hacked to call into
the ARM dma-mapping API to create a separate domain for each controller.

The results were that the memory-to-memory DMA didn't show any corruption. I
also managed to tickle access faults by messing around with the permissions,
then remap the buffers and resume the transfers.

> The code looks good and clean in general, minus a few places mentioned
> below where I have questions and/or suggestions:

Cheers, responses inline.

> > +static struct arm_smmu_device *find_parent_smmu(struct arm_smmu_device *smmu)
> > +{
> > +	struct arm_smmu_device *parent, *tmp;
> > +
> > +	if (!smmu->parent_of_node)
> > +		return NULL;
> > +
> > +	list_for_each_entry_safe(parent, tmp, &arm_smmu_devices, list)
> > +		if (parent->dev->of_node == smmu->parent_of_node)
> > +			return parent;
> 
> Why do you need the _safe variant here? You are not changing the list in
> this loop so you should be fine with list_for_each_entry().

For a system with multiple SMMUs (regardless of chaining), couldn't this
code run in parallel with probing of another SMMU (which has to add to the
arm_smmu_devices list)? The same applies for device removal, which could
perhaps be driven from some power-management code.

> > +
> > +	dev_warn(smmu->dev,
> > +		 "Failed to find SMMU parent despite parent in DT\n");
> > +	return NULL;
> > +}
> 
> > +/* Wait for any pending TLB invalidations to complete */
> > +static void arm_smmu_tlb_sync(struct arm_smmu_device *smmu)
> > +{
> > +	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> > +
> > +	writel_relaxed(0, gr0_base + ARM_SMMU_GR0_sTLBGSYNC);
> > +	while (readl_relaxed(gr0_base + ARM_SMMU_GR0_sTLBGSTATUS)
> > +	       & sTLBGSTATUS_GSACTIVE)
> > +		cpu_relax();
> 
> Other IOMMU drivers have a timeout for this loop and report an error
> when the state does not change. I think this makes sense here too, so
> that the kernel does not spin in that loop forever if something goes
> wrong but prints an error instead.

Good idea, I'll add that.

> > +}
> > +static void arm_smmu_flush_pgtable(struct arm_smmu_device *smmu, void *addr,
> > +				   size_t size)
> > +{
> > +	unsigned long offset = (unsigned long)addr & ~PAGE_MASK;
> > +
> > +	/*
> > +	 * If the SMMU can't walk tables in the CPU caches, treat them
> > +	 * like non-coherent DMA...
> > +	 */
> > +	if (!(smmu->features & ARM_SMMU_FEAT_COHERENT_WALK))
> > +		dma_map_page(smmu->dev, virt_to_page(addr), offset, size,
> > +			     DMA_TO_DEVICE);
> 
> Why can you call into DMA-API here? A DMA-API implementation may call
> back into this IOMMU driver, no? So this looks a little bit like a
> layering violation.

So this is subtle, and the comment could probably do with more explanation.

The problem is that the SMMU hardware page table walker might not be able to
snoop the CPU caches. This means that after writing page table entries on
the CPU, we have to flush them *all* the way out to main memory, so the SMMU
can pick them up. There are only two other places I can think of where we
have to do something like this in Linux:

  (1) In the bowels of CPU suspend, where we may need to clean all of our
      caches prior to shutdown. This is all hidden away in low-level arch
      code.

  (2) When dealing with non-coherent DMA, since we have to transfer buffer
      ownership between the CPU and the device. There is an API for this.

so, in fact, non-coherent DMA is a perfect fit for what we're trying to
achieve with the SMMU page tables! I admit it looks odd, but it can't
possibly go recursive because the master in question is a page-table walker,
rather than anything backed by a struct device.

> > +}
> 
> > +static int arm_smmu_map(struct iommu_domain *domain, unsigned long iova,
> > +			phys_addr_t paddr, size_t size, int flags)
> > +{
> > +	struct arm_smmu_domain *smmu_domain = domain->priv;
> > +	struct arm_smmu_device *smmu = smmu_domain->leaf_smmu;
> > +
> > +	if (!smmu_domain || !smmu)
> > +		return -ENODEV;
> > +
> > +	/*
> > +	 * Check for silent address truncation up the SMMU chain.
> > +	 */
> > +	do {
> > +		phys_addr_t output_mask = (1ULL << smmu->s2_output_size) - 1;
> > +		if ((phys_addr_t)iova & ~output_mask)
> > +			return -ERANGE;
> > +	} while ((smmu = find_parent_smmu(smmu)));
> 
> This looks a bit too expensive to have in the map path. How about saving
> something like an effective_output_mask (or output_size) which contains
> the logical OR of every mask up the path? This would make this check a
> lot cheaper.

As mentioned in the DT binding thread, it's rare that this loop would
execute more than once, and largely inconceivable that it would execute more
than twice, so I don't know how much we need to worry about the cost.

> > +
> > +	return arm_smmu_create_mapping(smmu_domain, iova, paddr, size, flags);
> > +}
> > +
> > +static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
> > +			     size_t size)
> > +{
> > +	int ret;
> > +	struct arm_smmu_domain *smmu_domain = domain->priv;
> > +	struct arm_smmu_cfg *root_cfg = &smmu_domain->root_cfg;
> > +	struct arm_smmu_device *smmu = root_cfg->smmu;
> > +	void __iomem *gr0_base = ARM_SMMU_GR0(smmu);
> > +
> > +	ret = arm_smmu_create_mapping(smmu_domain, iova, 0, size, 0);
> 
> Since this function also does unmapping, how about renaming it to
> arm_smmu_handle_mapping()? The 'create' part in there is misleading.

Yep, that's a hangover from when I originally had separate functions. Will
fix.

> > +	writel_relaxed(root_cfg->vmid, gr0_base + ARM_SMMU_GR0_TLBIVMID);
> > +	arm_smmu_tlb_sync(smmu);
> > +	return ret ? ret : size;
> > +}
> > +static int arm_smmu_add_device(struct device *dev)
> > +{
> > +	struct arm_smmu_device *child, *parent, *smmu;
> > +	struct arm_smmu_device *tmp[2];
> > +	struct arm_smmu_master *master = NULL;
> > +
> > +	list_for_each_entry_safe(parent, tmp[0], &arm_smmu_devices, list) {
> 
> Again, why do you use the _safe variant? You do not seem to change the
> lists traversed here.

Same reasoning as above. Happy to remove it if it's not actually required...

Cheers again,

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-21  9:57             ` Will Deacon
@ 2013-06-21 13:55                 ` Joerg Roedel
  -1 siblings, 0 replies; 97+ messages in thread
From: Joerg Roedel @ 2013-06-21 13:55 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Will,

On Fri, Jun 21, 2013 at 10:57:29AM +0100, Will Deacon wrote:
> This is a common driver that will support any IOMMUs compatible with v1 or
> v2 of the ARM SMMU architecture. Currently, that includes SMMUs known
> informatively as MMU-400, MMU-401 and MMU-500. I had a look at the other
> IOMMU drivers in the kernel and they seem to be driving incompatible IOMMUs,
> so I don't see how this driver can replace those.
> 
> However, we'll hopefully see ARM SMMU-compatible devices turning up soon (I
> know of many SoCs in development that are looking at them).

That sounds great. So there will be a single driver for most future
SoCs. This is a real improvement over the current situation.

From NVidia there is another driver with 'smmu' in its name, is that one
also incompatible?

> Chaining is really horrible and exists as a hack to support virtualisation
> using two separate SMMUs, where neither of them can support nested
> translation.

I see, thanks for the explanation.


	Joerg

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
@ 2013-06-21 13:55                 ` Joerg Roedel
  0 siblings, 0 replies; 97+ messages in thread
From: Joerg Roedel @ 2013-06-21 13:55 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Will,

On Fri, Jun 21, 2013 at 10:57:29AM +0100, Will Deacon wrote:
> This is a common driver that will support any IOMMUs compatible with v1 or
> v2 of the ARM SMMU architecture. Currently, that includes SMMUs known
> informatively as MMU-400, MMU-401 and MMU-500. I had a look at the other
> IOMMU drivers in the kernel and they seem to be driving incompatible IOMMUs,
> so I don't see how this driver can replace those.
> 
> However, we'll hopefully see ARM SMMU-compatible devices turning up soon (I
> know of many SoCs in development that are looking at them).

That sounds great. So there will be a single driver for most future
SoCs. This is a real improvement over the current situation.

From NVidia there is another driver with 'smmu' in its name, is that one
also incompatible?

> Chaining is really horrible and exists as a hack to support virtualisation
> using two separate SMMUs, where neither of them can support nested
> translation.

I see, thanks for the explanation.


	Joerg

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture
  2013-06-21 10:23             ` Will Deacon
@ 2013-06-21 14:13                 ` Joerg Roedel
  -1 siblings, 0 replies; 97+ messages in thread
From: Joerg Roedel @ 2013-06-21 14:13 UTC (permalink / raw)
  To: Will Deacon
  Cc: Olav Haugan, devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	Andreas Herrmann,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Jun 21, 2013 at 11:23:18AM +0100, Will Deacon wrote:
> The results were that the memory-to-memory DMA didn't show any corruption. I
> also managed to tickle access faults by messing around with the permissions,
> then remap the buffers and resume the transfers.

That sounds pretty conclusive. So when real hardware shows up it should
work reasonably well.

> > > +static struct arm_smmu_device *find_parent_smmu(struct arm_smmu_device *smmu)
> > > +{
> > > +	struct arm_smmu_device *parent, *tmp;
> > > +
> > > +	if (!smmu->parent_of_node)
> > > +		return NULL;
> > > +
> > > +	list_for_each_entry_safe(parent, tmp, &arm_smmu_devices, list)
> > > +		if (parent->dev->of_node == smmu->parent_of_node)
> > > +			return parent;
> > 
> > Why do you need the _safe variant here? You are not changing the list in
> > this loop so you should be fine with list_for_each_entry().
> 
> For a system with multiple SMMUs (regardless of chaining), couldn't this
> code run in parallel with probing of another SMMU (which has to add to the
> arm_smmu_devices list)? The same applies for device removal, which could
> perhaps be driven from some power-management code.

Well, the '_safe' does not mean it is safe from concurrent list
manipulations. If you want to protect against that you still need a lock.
The '_safe' variant only allows you to remove the current element from
the list while traversing it.
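
To make the distinction concrete, here is a small userspace sketch of the
idea behind the '_safe' iterators. It uses a hand-rolled singly linked list
rather than the kernel's list.h, and the names node, push(), remove_odd()
and count() are invented for this example. The point is only that the next
pointer is saved before the current element may be freed; nothing here
protects against another thread changing the list.

```c
#include <stdlib.h>

struct node {
	int val;
	struct node *next;
};

/* Push a new node on the front of the list; returns the new head. */
static struct node *push(struct node *head, int val)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return head;
	n->val = val;
	n->next = head;
	return n;
}

/*
 * Remove every node with an odd value. 'tmp' caches the next node before
 * the current one is freed, which is the same trick the _safe iterators
 * rely on. No locking is involved.
 */
static struct node *remove_odd(struct node *head)
{
	struct node **link = &head;
	struct node *cur, *tmp;

	for (cur = head; cur; cur = tmp) {
		tmp = cur->next;	/* saved before cur may be freed */
		if (cur->val & 1) {
			*link = tmp;
			free(cur);
		} else {
			link = &cur->next;
		}
	}
	return head;
}

/* Number of nodes in the list. */
static int count(const struct node *head)
{
	int n = 0;

	for (; head; head = head->next)
		n++;
	return n;
}
```

A loop that only reads the list, as find_parent_smmu() does, never frees
the current element and so gains nothing from caching the next pointer.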

> > > +	do {
> > > +		phys_addr_t output_mask = (1ULL << smmu->s2_output_size) - 1;
> > > +		if ((phys_addr_t)iova & ~output_mask)
> > > +			return -ERANGE;
> > > +	} while ((smmu = find_parent_smmu(smmu)));
> > 
> > This looks a bit too expensive to have in the map path. How about saving
> > something like an effective_output_mask (or output_size) which contains
> > the logical OR of every mask up the path? This would make this check a
> > lot cheaper.
> 
> As mentioned in the DT binding thread, it's rare that this loop would
> execute more than once, and largely inconceivable that it would execute more
> than twice, so I don't know how much we need to worry about the cost.

But still, this code is a challenge for the branch-predictor, plus
the additional function calls to find_parent_smmu(). I still think it is
worth optimizing this away. The map function is supposed to be a
fast-path function.


	Joerg

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture
  2013-06-21 14:13                 ` Joerg Roedel
@ 2013-06-21 15:00                   ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-21 15:00 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Olav Haugan, devicetree-discuss, iommu, Andreas Herrmann,
	linux-arm-kernel

On Fri, Jun 21, 2013 at 03:13:07PM +0100, Joerg Roedel wrote:
> On Fri, Jun 21, 2013 at 11:23:18AM +0100, Will Deacon wrote:
> > The results were that the memory-to-memory DMA didn't show any corruption. I
> > also managed to tickle access faults by messing around with the permissions,
> > then remap the buffers and resume the transfers.
> 
> That sounds pretty conclusive. So when real hardware shows up it should
> work reasonably well.

Fingers crossed. Of course, as I get to run on real RTL I'll address any
issues that might crop up.

> > > > +static struct arm_smmu_device *find_parent_smmu(struct arm_smmu_device *smmu)
> > > > +{
> > > > +	struct arm_smmu_device *parent, *tmp;
> > > > +
> > > > +	if (!smmu->parent_of_node)
> > > > +		return NULL;
> > > > +
> > > > +	list_for_each_entry_safe(parent, tmp, &arm_smmu_devices, list)
> > > > +		if (parent->dev->of_node == smmu->parent_of_node)
> > > > +			return parent;
> > > 
> > > Why do you need the _safe variant here? You are not changing the list in
> > > this loop so you should be fine with list_for_each_entry().
> > 
> > For a system with multiple SMMUs (regardless of chaining), couldn't this
> > code run in parallel with probing of another SMMU (which has to add to the
> > arm_smmu_devices list)? The same applies for device removal, which could
> > perhaps be driven from some power-management code.
> 
> Well, the '_safe' does not mean it is safe from concurrent list
> manipulations. If you want to protect from that you still need a lock.
> The '_safe' variant only allows you to remove the current element from the
> list while traversing it.

Damn, I was hoping to avoid locking on the map path. In fact, this is a good
argument to go with your suggestion below (otherwise I'd need to use
reader/writer locks which seem to be frowned on).

> > > > +	do {
> > > > +		phys_addr_t output_mask = (1ULL << smmu->s2_output_size) - 1;
> > > > +		if ((phys_addr_t)iova & ~output_mask)
> > > > +			return -ERANGE;
> > > > +	} while ((smmu = find_parent_smmu(smmu)));
> > > 
> > > This looks a bit too expensive to have in the map path. How about saving
> > > something like an effective_output_mask (or output_size) which contains
> > > the logical OR of every mask up the path? This would make this check a
> > > lot cheaper.
> > 
> > As mentioned in the DT binding thread, it's rare that this loop would
> > execute more than once, and largely inconceivable that it would execute more
> > than twice, so I don't know how much we need to worry about the cost.
> 
> But still, this code is a challenge for the branch-predictor, plus
> the additional function calls to find_parent_smmu(). I still think it is
> worth optimizing this away. The map function is supposed to be a
> fast-path function.

Yes, the locking I'd need to introduce in find_parent_smmu seals the deal.
I'll have a go at constructing the mask statically for each SMMU (although I
think I want to use logical AND rather than OR).
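
A minimal userspace sketch of the cached mask, assuming a hypothetical
struct smmu with a direct parent pointer (the posted driver looks the parent
up through parent_of_node instead). The names size_to_mask(),
smmu_cache_output_mask() and smmu_check_addr() are invented for illustration.
AND is used because an address has to fit through every SMMU on the path;
OR would keep the widest mask and let through addresses that a narrower
SMMU cannot output.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's per-SMMU state. */
struct smmu {
	unsigned int s2_output_size;	/* output address bits */
	struct smmu *parent;		/* next SMMU towards memory, or NULL */
	uint64_t effective_output_mask;	/* cached on the slow path */
};

/* Turn a size in bits into a mask, without shifting by 64 or more. */
static uint64_t size_to_mask(unsigned int bits)
{
	return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/* Slow path (probe/attach): walk the chain once and AND the masks. */
static void smmu_cache_output_mask(struct smmu *smmu)
{
	uint64_t mask = UINT64_MAX;
	struct smmu *s;

	for (s = smmu; s; s = s->parent)
		mask &= size_to_mask(s->s2_output_size);
	smmu->effective_output_mask = mask;
}

/* Fast path (map): one test, no list walk and no locking. */
static int smmu_check_addr(const struct smmu *smmu, uint64_t addr)
{
	return (addr & ~smmu->effective_output_mask) ? -ERANGE : 0;
}
```

With a 40-bit SMMU chained behind a 32-bit parent, the cached mask comes out
as 0xffffffff, so an address with bit 32 set is rejected on the map path
without visiting the parent.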

Cheers,

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture
  2013-06-21 15:00                   ` Will Deacon
@ 2013-06-21 15:30                       ` Joerg Roedel
  -1 siblings, 0 replies; 97+ messages in thread
From: Joerg Roedel @ 2013-06-21 15:30 UTC (permalink / raw)
  To: Will Deacon
  Cc: Olav Haugan, devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	Andreas Herrmann,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Jun 21, 2013 at 04:00:06PM +0100, Will Deacon wrote:
> Damn, I was hoping to avoid locking on the map path. In fact, this is a good
> argument to go with your suggestion below (otherwise I'd need to use
> reader/writer locks which seem to be frowned on).

You should look into using rcu-lists instead. You still need a lock, but
only when you are actually manipulating the lists. For traversing them it is
sufficient to take the rcu_read_lock() which has very low overhead.


	Joerg

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture
  2013-06-21 15:30                       ` Joerg Roedel
@ 2013-06-21 16:40                           ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-21 16:40 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Olav Haugan, devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	Andreas Herrmann,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Jun 21, 2013 at 04:30:44PM +0100, Joerg Roedel wrote:
> On Fri, Jun 21, 2013 at 04:00:06PM +0100, Will Deacon wrote:
> > Damn, I was hoping to avoid locking on the map path. In fact, this is a good
> > argument to go with your suggestion below (otherwise I'd need to use
> > reader/writer locks which seem to be frowned on).
> 
> You should look into using rcu-lists instead. You still need a lock, but
> only when you are actually manipulating the lists. For traversing them it is
> sufficient to take the rcu_read_lock() which has very low overhead.

Indeed, I'd got RCU-protected lists mixed up with the _safe iterators.
However, if I construct a compound output_mask as you suggest, all the
locking is on the slow path (add/attach of devices) so it's no longer an
issue.

I'll post a v2 next week.

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-21 13:55                 ` Joerg Roedel
@ 2013-06-21 16:41                     ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-21 16:41 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Jun 21, 2013 at 02:55:08PM +0100, Joerg Roedel wrote:
> On Fri, Jun 21, 2013 at 10:57:29AM +0100, Will Deacon wrote:
> > This is a common driver that will support any IOMMUs compatible with v1 or
> > v2 of the ARM SMMU architecture. Currently, that includes SMMUs known
> > informatively as MMU-400, MMU-401 and MMU-500. I had a look at the other
> > IOMMU drivers in the kernel and they seem to be driving incompatible IOMMUs,
> > so I don't see how this driver can replace those.
> > 
> > However, we'll hopefully see ARM SMMU-compatible devices turning up soon (I
> > know of many SoCs in development that are looking at them).
> 
> That sounds great. So in the future there will be a single driver for
> most SoCs. This is a real improvement over the current situation.
> 
> There is another driver from NVIDIA with 'smmu' in its name; is that one
> also incompatible?

Yup. I even checked with Stephen Warren before I started my driver, just in
case :)

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-25 10:12         ` Hiroshi Doyu
  -1 siblings, 0 replies; 97+ messages in thread
From: Hiroshi Doyu @ 2013-06-25 10:12 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-tegra-u79uwXL29TY76Z2rM5mHXA

Hi Will,

On Mon, 10 Jun 2013 20:34:39 +0200
Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
...
> @@ -1636,13 +1636,27 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p
>  {
>  	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
>  	dma_addr_t dma_addr;
> -	int ret, len = PAGE_ALIGN(size + offset);
> +	int ret, prot, len = PAGE_ALIGN(size + offset);
>  
>  	dma_addr = __alloc_iova(mapping, len);
>  	if (dma_addr == DMA_ERROR_CODE)
>  		return dma_addr;
>  
> -	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 0);
> +	switch (dir) {
> +	case DMA_BIDIRECTIONAL:
> +		prot = IOMMU_READ | IOMMU_WRITE;
> +		break;
> +	case DMA_TO_DEVICE:
> +		prot = IOMMU_READ;
> +		break;
> +	case DMA_FROM_DEVICE:
> +		prot = IOMMU_WRITE;
> +		break;
> +	default:
> +		prot = 0;
> +	}
> +
> +	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);

Do we need similar changes for the map_sg case as well? It still passes '0' as prot.

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
  2013-06-25 10:12         ` Hiroshi Doyu
@ 2013-06-25 11:37             ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-25 11:37 UTC (permalink / raw)
  To: Hiroshi Doyu
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, Jun 25, 2013 at 11:12:15AM +0100, Hiroshi Doyu wrote:
> Hi Will,

Hi Hiroshi,

> On Mon, 10 Jun 2013 20:34:39 +0200
> Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> ...
> > @@ -1636,13 +1636,27 @@ static dma_addr_t arm_coherent_iommu_map_page(struct device *dev, struct page *p
> >  {
> >  	struct dma_iommu_mapping *mapping = dev->archdata.mapping;
> >  	dma_addr_t dma_addr;
> > -	int ret, len = PAGE_ALIGN(size + offset);
> > +	int ret, prot, len = PAGE_ALIGN(size + offset);
> >  
> >  	dma_addr = __alloc_iova(mapping, len);
> >  	if (dma_addr == DMA_ERROR_CODE)
> >  		return dma_addr;
> >  
> > -	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, 0);
> > +	switch (dir) {
> > +	case DMA_BIDIRECTIONAL:
> > +		prot = IOMMU_READ | IOMMU_WRITE;
> > +		break;
> > +	case DMA_TO_DEVICE:
> > +		prot = IOMMU_READ;
> > +		break;
> > +	case DMA_FROM_DEVICE:
> > +		prot = IOMMU_WRITE;
> > +		break;
> > +	default:
> > +		prot = 0;
> > +	}
> > +
> > +	ret = iommu_map(mapping->domain, dma_addr, page_to_phys(page), len, prot);
> 
> Do we need similar changes for the map_sg case as well? It still passes '0' as prot.

Yes, we could use the same trick there (probably worth moving the logic into
a helper function for translating dma_data_direction into IOMMU_* values).

There are also iommu_map calls when allocating DMA buffers, but I think 0 is
the right thing to pass there (i.e. no permission until pages have been
explicitly mapped). Although, to be honest, I don't see why we need to map
the buffer at all when we allocate it.

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
       [not found]             ` <20130625113714.GF31838-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
  2013-06-25 11:52                 ` Hiroshi Doyu
@ 2013-06-25 11:52                 ` Hiroshi Doyu
  0 siblings, 0 replies; 97+ messages in thread
From: Hiroshi Doyu @ 2013-06-25 11:52 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8, m.szyprowski-Sze3O3UU22JBDgjK7y7TUQ
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote @ Tue, 25 Jun 2013 13:37:14 +0200:
...
> > Do we need similar changes for the map_sg case as well? It still passes '0' as prot.
> 
> Yes, we could use the same trick there (probably worth moving the logic into
> a helper function for translating dma_data_direction into IOMMU_* values).
> 
> There are also iommu_map calls when allocating DMA buffers, but I think 0 is
> the right thing to pass there (i.e. no permission until pages have been
> explicitly mapped). Although, to be honest, I don't see why we need to map
> the buffer at all when we allocate it.

Yes, I thought so too. I have a patch for that, below. If you like,
I'll rebase it and send it for merging together with the one that changes
dma-mapping.c.

From 699e6bd4fef86383d197775486b47bcbdc594f4a Mon Sep 17 00:00:00 2001
From: Hiroshi Doyu <hdoyu-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
Date: Tue, 25 Jun 2013 13:43:29 +0300
Subject: [PATCH 1/2] iommu/core: convert DMA direction into IOMMU protection
 attributes

Introduce a new function to convert DMA direction into IOMMU
protection attributes.

Signed-off-by: Hiroshi Doyu <hdoyu-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
---
 include/linux/iommu.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 00af00f..ce3be78 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -21,11 +21,26 @@
 
 #include <linux/errno.h>
 #include <linux/types.h>
+#include <linux/dma-direction.h>
 
 #define IOMMU_READ	(1)
 #define IOMMU_WRITE	(2)
 #define IOMMU_CACHE	(4) /* DMA cache coherency */
 
+static inline int to_iommu_prot(enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_BIDIRECTIONAL:
+		return IOMMU_READ | IOMMU_WRITE;
+	case DMA_TO_DEVICE:
+		return IOMMU_READ;
+	case DMA_FROM_DEVICE:
+		return IOMMU_WRITE;
+	default:
+		return 0;
+	}
+}
+
 struct iommu_ops;
 struct iommu_group;
 struct bus_type;
-- 
1.8.1.5
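
A userspace sketch of how the helper above could be used for the map_sg
case. The enum values and IOMMU_* flags are local stand-ins mirroring the
kernel definitions so the snippet builds outside the tree; struct seg and
map_segments() are hypothetical and only record the protection bits instead
of calling iommu_map(). Note the direction of the mapping: DMA_TO_DEVICE
means the device reads memory, hence IOMMU_READ.

```c
#include <stddef.h>

/* Local stand-ins mirroring the kernel definitions. */
enum dma_data_direction {
	DMA_BIDIRECTIONAL = 0,
	DMA_TO_DEVICE = 1,
	DMA_FROM_DEVICE = 2,
	DMA_NONE = 3,
};

#define IOMMU_READ	(1)
#define IOMMU_WRITE	(2)

/* Same logic as the helper in the patch above. */
static inline int to_iommu_prot(enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_BIDIRECTIONAL:
		return IOMMU_READ | IOMMU_WRITE;
	case DMA_TO_DEVICE:
		return IOMMU_READ;
	case DMA_FROM_DEVICE:
		return IOMMU_WRITE;
	default:
		return 0;
	}
}

/* Hypothetical scatterlist segment; only the recorded prot matters here. */
struct seg {
	int prot;
};

/* Translate the direction once, then apply it to every segment. */
static void map_segments(struct seg *segs, size_t nsegs,
			 enum dma_data_direction dir)
{
	int prot = to_iommu_prot(dir);
	size_t i;

	for (i = 0; i < nsegs; i++)
		segs[i].prot = prot;	/* a real caller would pass prot to iommu_map() */
}
```

Translating once per call keeps the switch out of the per-segment loop.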

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* Re: [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
  2013-06-25 11:52                 ` Hiroshi Doyu
@ 2013-06-25 12:34                     ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-25 12:34 UTC (permalink / raw)
  To: Hiroshi Doyu
  Cc: linux-tegra-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Tue, Jun 25, 2013 at 12:52:26PM +0100, Hiroshi Doyu wrote:
> Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote @ Tue, 25 Jun 2013 13:37:14 +0200:
> ...
> > > Do we need similar changes for map_sg case as well? They still passes '0' as prot.
> > 
> > Yes, we could use the same trick there (probably worth moving the logic into
> > a helper function for translating dma_data_direction into IOMMU_* values).
> > 
> > There are also iommu_map calls when allocating DMA buffers, but I think 0 is
> > the right thing to pass there (i.e. no permission until pages have been
> > explicitly mapped). Although, to be honest, I don't see why we need to map
> > the buffer at all when we allocate it.
> 
> Yes, I thought too. I have a patch for that as below. If you like,
> I'll rebase and send for merge with the one which changes
> dma-mapping.c.

Yes, please send the series and I'll take a look. Marek's already picked up
my original patch, so it's better if you can base against a stable branch
from him.

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes
@ 2013-06-25 12:34                     ` Will Deacon
  0 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-25 12:34 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Jun 25, 2013 at 12:52:26PM +0100, Hiroshi Doyu wrote:
> Will Deacon <will.deacon@arm.com> wrote @ Tue, 25 Jun 2013 13:37:14 +0200:
> ...
> > > Do we need similar changes for map_sg case as well? They still passes '0' as prot.
> > 
> > Yes, we could use the same trick there (probably worth moving the logic into
> > a helper function for translating dma_data_direction into IOMMU_* values).
> > 
> > There are also iommu_map calls when allocating DMA buffers, but I think 0 is
> > the right thing to pass there (i.e. no permission until pages have been
> > explicitly mapped). Although, to be honest, I don't see why we need to map
> > the buffer at all when we allocate it.
> 
> Yes, I thought too. I have a patch for that as below. If you like,
> I'll rebase and send for merge with the one which changes
> dma-mapping.c.

Yes, please send the series and I'll take a look. Marek's already picked up
my original patch, so it's better if you can base against a stable branch
from him.

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-10 18:34     ` Will Deacon
@ 2013-06-25 19:18         ` Stuart Yoder
  -1 siblings, 0 replies; 97+ messages in thread
From: Stuart Yoder @ 2013-06-25 19:18 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
	Discuss, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, Jun 10, 2013 at 1:34 PM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> This patch adds a description of the device tree binding for the ARM
> System MMU architecture.
>
> Cc: Rob Herring <robherring2-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
> Cc: Andreas Herrmann <andreas.herrmann-bsGFqQB8/DxBDgjK7y7TUQ@public.gmane.org>
> Cc: Joerg Roedel <joro-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
> Signed-off-by: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> ---
>  .../devicetree/bindings/iommu/arm,smmu.txt         | 70 ++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu.txt
>
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> new file mode 100644
> index 0000000..e34c6cd
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> @@ -0,0 +1,70 @@
> +* ARM System MMU Architecture Implementation
> +
> +ARM SoCs may contain an implementation of the ARM System Memory
> +Management Unit Architecture, which can be used to provide 1 or 2 stages
> +of address translation to bus masters external to the CPU.
> +
> +The SMMU may also raise interrupts in response to various fault
> +conditions.
> +
> +** System MMU required properties:
> +
> +- compatible    : Should be one of:
> +
> +                        "arm,smmu-v1"
> +                        "arm,smmu-v2"
> +                        "arm,mmu-400"
> +                        "arm,mmu-500"
> +
> +                  depending on the particular implementation and/or the
> +                  version of the architecture implemented.
> +
> +- reg           : Base address and size of the SMMU.
> +
> +- #global-interrupts : The number of global interrupts exposed by the
> +                       device.
> +
> +- interrupts    : Interrupt list, with the first #global-irqs entries
> +                  corresponding to the global interrupts

It seems like we don't have enough information here.   It's not enough
for the OS to know that there are 2, 4, etc global interrupts, no?  It needs
to know which hardware interrupt each corresponds to.   That kind of
stuff is normally defined in the device binding.

What is it that determines the number of global interrupts?

> and any
> +                  following entries corresponding to context interrupts,
> +                  specified in order of their indexing by the SMMU.
> +
> +                  For SMMUv2 implementations, there must be exactly one
> +                  interrupt per context bank. In the case of a single,
> +                  combined interrupt, it must be listed multiple times.
> +
> +- mmu-masters   : A list of phandles to device nodes representing bus
> +                  masters for which the SMMU can provide a translation
> +                  and their corresponding StreamIDs (see example below).
> +                  Each device node linked from this list must have a
> +                  "#stream-id-cells" property, indicating the number of
> +                  StreamIDs associated with it.

So to find the SMMU for a given device, I walk up the bus hierarchy
until I find an SMMU?

> +** System MMU optional properties:
> +
> +- smmu-parent   : When multiple SMMUs are chained together, this
> +                  property can be used to provide a phandle to the
> +                  parent SMMU (that is the next SMMU on the path going
> +                  from the mmu-masters towards memory) node for this
> +                  SMMU.

Why is an explicit phandle link needed here when you don't need a
smmu-parent phandle in each mmu-master?    Won't walking the bus
hierarchy identify the parent SMMU if things are chained?

Thanks,

Stuart Yoder
Freescale Semiconductor

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
@ 2013-06-25 19:18         ` Stuart Yoder
  0 siblings, 0 replies; 97+ messages in thread
From: Stuart Yoder @ 2013-06-25 19:18 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Jun 10, 2013 at 1:34 PM, Will Deacon <will.deacon@arm.com> wrote:
> This patch adds a description of the device tree binding for the ARM
> System MMU architecture.
>
> Cc: Rob Herring <robherring2@gmail.com>
> Cc: Andreas Herrmann <andreas.herrmann@calxeda.com>
> Cc: Joerg Roedel <joro@8bytes.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>
> ---
>  .../devicetree/bindings/iommu/arm,smmu.txt         | 70 ++++++++++++++++++++++
>  1 file changed, 70 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/iommu/arm,smmu.txt
>
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> new file mode 100644
> index 0000000..e34c6cd
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> @@ -0,0 +1,70 @@
> +* ARM System MMU Architecture Implementation
> +
> +ARM SoCs may contain an implementation of the ARM System Memory
> +Management Unit Architecture, which can be used to provide 1 or 2 stages
> +of address translation to bus masters external to the CPU.
> +
> +The SMMU may also raise interrupts in response to various fault
> +conditions.
> +
> +** System MMU required properties:
> +
> +- compatible    : Should be one of:
> +
> +                        "arm,smmu-v1"
> +                        "arm,smmu-v2"
> +                        "arm,mmu-400"
> +                        "arm,mmu-500"
> +
> +                  depending on the particular implementation and/or the
> +                  version of the architecture implemented.
> +
> +- reg           : Base address and size of the SMMU.
> +
> +- #global-interrupts : The number of global interrupts exposed by the
> +                       device.
> +
> +- interrupts    : Interrupt list, with the first #global-irqs entries
> +                  corresponding to the global interrupts

It seems like we don't have enough information here.   It's not enough
for the OS to know that there are 2, 4, etc global interrupts, no?  It needs
to know which hardware interrupt each corresponds to.   That kind of
stuff is normally defined in the device binding.

What is it that determines the number of global interrupts?

> and any
> +                  following entries corresponding to context interrupts,
> +                  specified in order of their indexing by the SMMU.
> +
> +                  For SMMUv2 implementations, there must be exactly one
> +                  interrupt per context bank. In the case of a single,
> +                  combined interrupt, it must be listed multiple times.
> +
> +- mmu-masters   : A list of phandles to device nodes representing bus
> +                  masters for which the SMMU can provide a translation
> +                  and their corresponding StreamIDs (see example below).
> +                  Each device node linked from this list must have a
> +                  "#stream-id-cells" property, indicating the number of
> +                  StreamIDs associated with it.

So to find the SMMU for a given device, I walk up the bus hierarchy
until I find an SMMU?

> +** System MMU optional properties:
> +
> +- smmu-parent   : When multiple SMMUs are chained together, this
> +                  property can be used to provide a phandle to the
> +                  parent SMMU (that is the next SMMU on the path going
> +                  from the mmu-masters towards memory) node for this
> +                  SMMU.

Why is an explicit phandle link needed here when you don't need a
smmu-parent phandle in each mmu-master?    Won't walking the bus
hierarchy identify the parent SMMU if things are chained?

Thanks,

Stuart Yoder
Freescale Semiconductor

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-25 19:18         ` Stuart Yoder
@ 2013-06-26 13:39             ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-26 13:39 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
	Discuss, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Stuart,

On Tue, Jun 25, 2013 at 08:18:19PM +0100, Stuart Yoder wrote:
> On Mon, Jun 10, 2013 at 1:34 PM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> > new file mode 100644
> > index 0000000..e34c6cd
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> > @@ -0,0 +1,70 @@
> > +* ARM System MMU Architecture Implementation
> > +
> > +ARM SoCs may contain an implementation of the ARM System Memory
> > +Management Unit Architecture, which can be used to provide 1 or 2 stages
> > +of address translation to bus masters external to the CPU.
> > +
> > +The SMMU may also raise interrupts in response to various fault
> > +conditions.
> > +
> > +** System MMU required properties:
> > +
> > +- compatible    : Should be one of:
> > +
> > +                        "arm,smmu-v1"
> > +                        "arm,smmu-v2"
> > +                        "arm,mmu-400"
> > +                        "arm,mmu-500"
> > +
> > +                  depending on the particular implementation and/or the
> > +                  version of the architecture implemented.
> > +
> > +- reg           : Base address and size of the SMMU.
> > +
> > +- #global-interrupts : The number of global interrupts exposed by the
> > +                       device.
> > +
> > +- interrupts    : Interrupt list, with the first #global-irqs entries
> > +                  corresponding to the global interrupts
> 
> It seems like we don't have enough information here.   It's not enough
> for the OS to know that there are 2, 4, etc global interrupts, no?  It needs
> to know which hardware interrupt each corresponds to.   That kind of
> stuff is normally defined in the device binding.
> 
> What is it that determines the number of global interrupts?

I'd suggest looking at the driver I posted to get a gist of how the parsing
code works, but suffice to say that we describe both the number of
interrupts and the actual interrupt numbers here.
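
As a rough illustration of the layout the binding describes (a sketch
under stated assumptions, not the parsing code from the posted driver):
the first #global-interrupts entries of the "interrupts" list are the
global interrupts, and everything after them is a context interrupt,
in the order the SMMU indexes its context banks. The struct and
function names below are hypothetical.

```c
#include <stddef.h>

/* Hypothetical view of the "interrupts" list after it has been read. */
struct smmu_irq_layout {
	const unsigned int *global;	/* first num_global entries */
	size_t num_global;
	const unsigned int *context;	/* remaining entries, context-bank order */
	size_t num_context;
};

/*
 * Split a flat interrupt list into its global and context parts.
 * num_global models the value of the "#global-interrupts" property.
 * Returns 0 on success, -1 if the list has fewer than num_global entries.
 */
static int smmu_split_irqs(const unsigned int *irqs, size_t num_irqs,
			   size_t num_global, struct smmu_irq_layout *out)
{
	if (num_irqs < num_global)
		return -1;

	out->global = irqs;
	out->num_global = num_global;
	out->context = irqs + num_global;
	out->num_context = num_irqs - num_global;
	return 0;
}
```

With five entries and #global-interrupts = <2>, the first two entries
are treated as global and the remaining three as context interrupts.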

> > +- mmu-masters   : A list of phandles to device nodes representing bus
> > +                  masters for which the SMMU can provide a translation
> > +                  and their corresponding StreamIDs (see example below).
> > +                  Each device node linked from this list must have a
> > +                  "#stream-id-cells" property, indicating the number of
> > +                  StreamIDs associated with it.
> 
> So to find a the SMMU for a given device, I walk up the bus hierarchy
> until I find an SMMU?

Again, the code is better than any explanation I can give here, but we
basically construct a tree of masters for each SMMU in the system based on
the mmu-masters property, which we can later search.

> > +** System MMU optional properties:
> > +
> > +- smmu-parent   : When multiple SMMUs are chained together, this
> > +                  property can be used to provide a phandle to the
> > +                  parent SMMU (that is the next SMMU on the path going
> > +                  from the mmu-masters towards memory) node for this
> > +                  SMMU.
> 
> Why is an explicit phandle link needed here when you don't need a
> smmu-parent phandle in each mmu-master?    Won't walking the bus
> hierarchy identify the parent SMMU if things are chained?

What bus hierarchy? If I have two SMMU device nodes, how do I infer any
topological information without an explicit linkage?

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
@ 2013-06-26 13:39             ` Will Deacon
  0 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-26 13:39 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Stuart,

On Tue, Jun 25, 2013 at 08:18:19PM +0100, Stuart Yoder wrote:
> On Mon, Jun 10, 2013 at 1:34 PM, Will Deacon <will.deacon@arm.com> wrote:
> > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> > new file mode 100644
> > index 0000000..e34c6cd
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
> > @@ -0,0 +1,70 @@
> > +* ARM System MMU Architecture Implementation
> > +
> > +ARM SoCs may contain an implementation of the ARM System Memory
> > +Management Unit Architecture, which can be used to provide 1 or 2 stages
> > +of address translation to bus masters external to the CPU.
> > +
> > +The SMMU may also raise interrupts in response to various fault
> > +conditions.
> > +
> > +** System MMU required properties:
> > +
> > +- compatible    : Should be one of:
> > +
> > +                        "arm,smmu-v1"
> > +                        "arm,smmu-v2"
> > +                        "arm,mmu-400"
> > +                        "arm,mmu-500"
> > +
> > +                  depending on the particular implementation and/or the
> > +                  version of the architecture implemented.
> > +
> > +- reg           : Base address and size of the SMMU.
> > +
> > +- #global-interrupts : The number of global interrupts exposed by the
> > +                       device.
> > +
> > +- interrupts    : Interrupt list, with the first #global-irqs entries
> > +                  corresponding to the global interrupts
> 
> It seems like we don't have enough information here.   It's not enough
> for the OS to know that there are 2, 4, etc global interrupts, no?  It needs
> to know which hardware interrupt each corresponds to.   That kind of
> stuff is normally defined in the device binding.
> 
> What is it that determines the number of global interrupts?

I'd suggest looking at the driver I posted to get a gist of how the parsing
code works, but suffice to say that we describe both the number of
interrupts and the actual interrupt numbers here.

> > +- mmu-masters   : A list of phandles to device nodes representing bus
> > +                  masters for which the SMMU can provide a translation
> > +                  and their corresponding StreamIDs (see example below).
> > +                  Each device node linked from this list must have a
> > +                  "#stream-id-cells" property, indicating the number of
> > +                  StreamIDs associated with it.
> 
> So to find a the SMMU for a given device, I walk up the bus hierarchy
> until I find an SMMU?

Again, the code is better than any explanation I can give here, but we
basically construct a tree of masters for each SMMU in the system based on
the mmu-masters property, which we can later search.

> > +** System MMU optional properties:
> > +
> > +- smmu-parent   : When multiple SMMUs are chained together, this
> > +                  property can be used to provide a phandle to the
> > +                  parent SMMU (that is the next SMMU on the path going
> > +                  from the mmu-masters towards memory) node for this
> > +                  SMMU.
> 
> Why is an explicit phandle link needed here when you don't need a
> smmu-parent phandle in each mmu-master?    Won't walking the bus
> hierarchy identify the parent SMMU if things are chained?

What bus hierarchy? If I have two SMMU device nodes, how do I infer any
topological information without an explicit linkage?

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-26 13:39             ` Will Deacon
@ 2013-06-26 16:19                 ` Stuart Yoder
  -1 siblings, 0 replies; 97+ messages in thread
From: Stuart Yoder @ 2013-06-26 16:19 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
	Discuss, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Jun 26, 2013 at 8:39 AM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> Hi Stuart,
>
> On Tue, Jun 25, 2013 at 08:18:19PM +0100, Stuart Yoder wrote:
>> On Mon, Jun 10, 2013 at 1:34 PM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
>> > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
>> > new file mode 100644
>> > index 0000000..e34c6cd
>> > --- /dev/null
>> > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
>> > @@ -0,0 +1,70 @@
>> > +* ARM System MMU Architecture Implementation
>> > +
>> > +ARM SoCs may contain an implementation of the ARM System Memory
>> > +Management Unit Architecture, which can be used to provide 1 or 2 stages
>> > +of address translation to bus masters external to the CPU.
>> > +
>> > +The SMMU may also raise interrupts in response to various fault
>> > +conditions.
>> > +
>> > +** System MMU required properties:
>> > +
>> > +- compatible    : Should be one of:
>> > +
>> > +                        "arm,smmu-v1"
>> > +                        "arm,smmu-v2"
>> > +                        "arm,mmu-400"
>> > +                        "arm,mmu-500"
>> > +
>> > +                  depending on the particular implementation and/or the
>> > +                  version of the architecture implemented.
>> > +
>> > +- reg           : Base address and size of the SMMU.
>> > +
>> > +- #global-interrupts : The number of global interrupts exposed by the
>> > +                       device.
>> > +
>> > +- interrupts    : Interrupt list, with the first #global-irqs entries
>> > +                  corresponding to the global interrupts
>>
>> It seems like we don't have enough information here.   It's not enough
>> for the OS to know that there are 2, 4, etc global interrupts, no?  It needs
>> to know which hardware interrupt each corresponds to.   That kind of
>> stuff is normally defined in the device binding.
>>
>> What is it that determines the number of global interrupts?
>
> I'd suggest looking at the driver I posted to get a gist of how the parsing
> code works, but suffice to say that we describe both the number of
> interrupts and the actual interrupt numbers here.

I understand that the number of interrupts and actual interrupt numbers
are described here, but was referring to the _meaning_ of the interrupt
numbers.   A binding for a device with 2 interrupts, a TX and RX would
normally identify which interrupt specifier is for TX and which is for RX.

Based on your code, the 2 global interrupts seem to be the secure
and non-secure fault interrupts...which your driver does not differentiate.
However, the device tree is describing hardware and you can't assume
that all drivers don't care which is which.

So, I would suggest that the binding identify which interrupt specifier
is secure and which is non-secure.

If there are other interrupts added in the future like the performance interrupt
the definition could be expanded to add that.

But a question.... why do you need the #global-interrupts property
at all?   Is the number of "global" interrupts really implementation dependent?
Doesn't a specific compatible string imply the number of interrupts that there
actually are?

>> > +- mmu-masters   : A list of phandles to device nodes representing bus
>> > +                  masters for which the SMMU can provide a translation
>> > +                  and their corresponding StreamIDs (see example below).
>> > +                  Each device node linked from this list must have a
>> > +                  "#stream-id-cells" property, indicating the number of
>> > +                  StreamIDs associated with it.
>>
>> So to find a the SMMU for a given device, I walk up the bus hierarchy
>> until I find an SMMU?
>
> Again, the code is better than any explanation I can give here, but we
> basically construct a tree of masters for each SMMU in the system based on
> the mmu-masters property, which we can later search.

I see.

>> > +** System MMU optional properties:
>> > +
>> > +- smmu-parent   : When multiple SMMUs are chained together, this
>> > +                  property can be used to provide a phandle to the
>> > +                  parent SMMU (that is the next SMMU on the path going
>> > +                  from the mmu-masters towards memory) node for this
>> > +                  SMMU.
>>
>> Why is an explicit phandle link needed here when you don't need a
>> smmu-parent phandle in each mmu-master?    Won't walking the bus
>> hierarchy identify the parent SMMU if things are chained?
>
> What bus hierarchy? If I have two SMMU device nodes, how do I infer any
> topological information without an explicit linkage?

I really don't know anything about SMMU chaining, but am inferring what
this might look like.   Does the "child" SMMU look just like another
mmu-master to the parent?   If so, why not just use the mmu-masters
property to establish the relationship?

I just found it a bit strange that the relationship between I/O mmu-masters
is defined from parent SMMU node pointing to the device, and the
relationship between a 'child' SMMU and a parent is established
in the other direction.   Not a big deal.

Stuart

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
@ 2013-06-26 16:19                 ` Stuart Yoder
  0 siblings, 0 replies; 97+ messages in thread
From: Stuart Yoder @ 2013-06-26 16:19 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 26, 2013 at 8:39 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Stuart,
>
> On Tue, Jun 25, 2013 at 08:18:19PM +0100, Stuart Yoder wrote:
>> On Mon, Jun 10, 2013 at 1:34 PM, Will Deacon <will.deacon@arm.com> wrote:
>> > diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.txt b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
>> > new file mode 100644
>> > index 0000000..e34c6cd
>> > --- /dev/null
>> > +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.txt
>> > @@ -0,0 +1,70 @@
>> > +* ARM System MMU Architecture Implementation
>> > +
>> > +ARM SoCs may contain an implementation of the ARM System Memory
>> > +Management Unit Architecture, which can be used to provide 1 or 2 stages
>> > +of address translation to bus masters external to the CPU.
>> > +
>> > +The SMMU may also raise interrupts in response to various fault
>> > +conditions.
>> > +
>> > +** System MMU required properties:
>> > +
>> > +- compatible    : Should be one of:
>> > +
>> > +                        "arm,smmu-v1"
>> > +                        "arm,smmu-v2"
>> > +                        "arm,mmu-400"
>> > +                        "arm,mmu-500"
>> > +
>> > +                  depending on the particular implementation and/or the
>> > +                  version of the architecture implemented.
>> > +
>> > +- reg           : Base address and size of the SMMU.
>> > +
>> > +- #global-interrupts : The number of global interrupts exposed by the
>> > +                       device.
>> > +
>> > +- interrupts    : Interrupt list, with the first #global-irqs entries
>> > +                  corresponding to the global interrupts
>>
>> It seems like we don't have enough information here.   It's not enough
>> for the OS to know that there are 2, 4, etc global interrupts, no?  It needs
>> to know which hardware interrupt each corresponds to.   That kind of
>> stuff is normally defined in the device binding.
>>
>> What is it that determines the number of global interrupts?
>
> I'd suggest looking at the driver I posted to get a gist of how the parsing
> code works, but suffice to say that we describe both the number of
> interrupts and the actual interrupt numbers here.

I understand that the number of interrupts and actual interrupt numbers
are described here, but was referring to the _meaning_ of the interrupt
numbers.   A binding for a device with 2 interrupts, a TX and RX would
normally identify which interrupt specifier is for TX and which is for RX.

Based on your code, the 2 global interrupts seem to be the secure
and non-secure fault interrupts...which your driver does not differentiate.
However, the device tree is describing hardware and you can't assume
that all drivers don't care which is which.

So, I would suggest that the binding identify which interrupt specifier
is secure and which is non-secure.

If there are other interrupts added in the future like the performance interrupt
the definition could be expanded to add that.

But a question.... why do you need the #global-interrupts property
at all?   Is the number of "global" interrupts really implementation dependent?
Doesn't a specific compatible string imply the number of interrupts that there
actually are?

>> > +- mmu-masters   : A list of phandles to device nodes representing bus
>> > +                  masters for which the SMMU can provide a translation
>> > +                  and their corresponding StreamIDs (see example below).
>> > +                  Each device node linked from this list must have a
>> > +                  "#stream-id-cells" property, indicating the number of
>> > +                  StreamIDs associated with it.
>>
>> So to find a the SMMU for a given device, I walk up the bus hierarchy
>> until I find an SMMU?
>
> Again, the code is better than any explanation I can give here, but we
> basically construct a tree of masters for each SMMU in the system based on
> the mmu-masters property, which we can later search.

I see.

>> > +** System MMU optional properties:
>> > +
>> > +- smmu-parent   : When multiple SMMUs are chained together, this
>> > +                  property can be used to provide a phandle to the
>> > +                  parent SMMU (that is the next SMMU on the path going
>> > +                  from the mmu-masters towards memory) node for this
>> > +                  SMMU.
>>
>> Why is an explicit phandle link needed here when you don't need a
>> smmu-parent phandle in each mmu-master?    Won't walking the bus
>> hierarchy identify the parent SMMU if things are chained?
>
> What bus hierarchy? If I have two SMMU device nodes, how do I infer any
> topological information without an explicit linkage?

I really don't know anything about SMMU chaining, but am inferring what
this might look like.   Does the "child" SMMU look just like another
mmu-master to the parent?   If so, why not just use the mmu-masters
property to establish the relationship?

I just found it a bit strange that the relationship between I/O mmu-masters
is defined from parent SMMU node pointing to the device, and the
relationship between a 'child' SMMU and a parent is established
in the other direction.   Not a big deal.

Stuart

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-26 16:19                 ` Stuart Yoder
@ 2013-06-26 17:42                     ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-26 17:42 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
	Discuss, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Jun 26, 2013 at 05:19:48PM +0100, Stuart Yoder wrote:
> On Wed, Jun 26, 2013 at 8:39 AM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> > I'd suggest looking at the driver I posted to get a gist of how the parsing
> > code works, but suffice to say that we describe both the number of
> > interrupts and the actual interrupt numbers here.
> 
> I understand that the number of interrupts and actual interrupt numbers
> are described here, but was referring to the _meaning_ of the interrupt
> numbers.   A binding for a device with 2 interrupts, a TX and RX would
> normally identify which interrupt specifier is for TX and which is for RX.
> 
> Based on your code, the 2 global interrupts seem to be the secure
> and non-secure fault interrupts...which your driver does not differentiate.
> However, the device tree is describing hardware and  you can't assume
> that all drivers don't care which is which.

Currently, the driver only works when Linux is running as non-secure, which is
becoming more and more common since it is required to be able to make use of
hyp mode.

There are actually two global interrupts for SMMUv1 and SMMUv2, which
correspond to configuration faults and `other' global faults.

> If there are other interrupts added in the future like the performance interrupt
> the definition could be expanded to add that.

PMU interrupts would likely have a separate property, since the PMU would be
driven by a largely independent piece of code.

> But a question.... why do you need the #global-interrupts property
> at all?   Is the number of "global" interrupts really implementation dependent?
> Does't a specific compatible string imply the number of interrupts that there
> actually are?

Integrators have a nasty habit of ORing interrupts together or not wiring
them at all and I don't want to be caught out by that.
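
To make the role of the count concrete, here is a rough user-space sketch of
how a parser could split the flat "interrupts" list once it knows
#global-interrupts. It is not the driver code; the struct and function names
are made up for illustration, and it assumes the global entries come first in
the list with everything after them treated as context interrupts.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result of splitting the "interrupts" list. */
struct smmu_irq_layout {
	size_t num_global;   /* taken from #global-interrupts */
	size_t num_context;  /* whatever is left in the list */
};

/*
 * Split a flat list of `total` interrupt entries into a global part and
 * a context part. Returns 0 on success, or -1 if the count claims more
 * global interrupts than the list actually contains.
 */
static int smmu_split_irqs(size_t total, size_t num_global,
			   struct smmu_irq_layout *out)
{
	assert(out != NULL);

	if (num_global > total)
		return -1;

	out->num_global = num_global;
	out->num_context = total - num_global;
	return 0;
}
```

With an explicit count, a board that wires only one global line simply sets
the count to 1 and the same split still works.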

> > What bus hierarchy? If I have two SMMU device nodes, how do I infer any
> > topological information without an explicit linkage?
> 
> I really don't know anything about SMMU chaining, but am inferring what
> this might look like.   Does the "child" SMMU look just like another
> mmu-master to the parent?   If so, why not just use the mmu-masters
> property to establish the relationship.

No, the child SMMU does not look like a master. It can remaster upstream
StreamIDs, for example.

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-26 17:42                     ` Will Deacon
@ 2013-06-27 18:22                         ` Stuart Yoder
  -1 siblings, 0 replies; 97+ messages in thread
From: Stuart Yoder @ 2013-06-27 18:22 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
	Discuss, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Jun 26, 2013 at 12:42 PM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> On Wed, Jun 26, 2013 at 05:19:48PM +0100, Stuart Yoder wrote:
>> On Wed, Jun 26, 2013 at 8:39 AM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
>> > I'd suggest looking at the driver I posted to get a gist of how the parsing
>> > code works, but suffice to say that we describe both the number of
>> > interrupts and the actual interrupt numbers here.
>>
>> I understand that the number of interrupts and actual interrupt numbers
>> are described here, but was referring to the _meaning_ of the interrupt
>> numbers.   A binding for a device with 2 interrupts, a TX and RX would
>> normally identify which interrupt specifier is for TX and which is for RX.
>>
>> Based on your code, the 2 global interrupts seem to be the secure
>> and non-secure fault interrupts...which your driver does not differentiate.
>> However, the device tree is describing hardware and  you can't assume
>> that all drivers don't care which is which.
>
> Currently, the driver only works when Linux is running as non-secure, which is
> becoming more and more common since it is required to be able to make use of
> hyp mode.
>
> There are actually two global interrupts for SMMUv1 and SMMUv2, which
> correspond to configuration faults and `other' global faults.

So, why don't we define which interrupt is which in this binding? For example:
"The first interrupt specifier is for the configuration access fault interrupt,
the second interrupt specifier is for other global faults."

How else is software supposed to know which interrupt corresponds to what
event?

Thanks,
Stuart

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-27 18:22                         ` Stuart Yoder
@ 2013-06-28  9:06                             ` Will Deacon
  -1 siblings, 0 replies; 97+ messages in thread
From: Will Deacon @ 2013-06-28  9:06 UTC (permalink / raw)
  To: Stuart Yoder
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
	Discuss, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Thu, Jun 27, 2013 at 07:22:30PM +0100, Stuart Yoder wrote:
> On Wed, Jun 26, 2013 at 12:42 PM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> > On Wed, Jun 26, 2013 at 05:19:48PM +0100, Stuart Yoder wrote:
> >> On Wed, Jun 26, 2013 at 8:39 AM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> >> > I'd suggest looking at the driver I posted to get a gist of how the parsing
> >> > code works, but suffice to say that we describe both the number of
> >> > interrupts and the actual interrupt numbers here.
> >>
> >> I understand that the number of interrupts and actual interrupt numbers
> >> are described here, but was referring to the _meaning_ of the interrupt
> >> numbers.   A binding for a device with 2 interrupts, a TX and RX would
> >> normally identify which interrupt specifier is for TX and which is for RX.
> >>
> >> Based on your code, the 2 global interrupts seem to be the secure
> >> and non-secure fault interrupts...which your driver does not differentiate.
> >> However, the device tree is describing hardware and  you can't assume
> >> that all drivers don't care which is which.
> >
> > Currently, the driver only works when Linux is running as non-secure, which is
> > becoming more and more common since it is required to be able to make use of
> > hyp mode.
> >
> > There are actually two global interrupts for SMMUv1 and SMMUv2, which
> > correspond to configuration faults and `other' global faults.
> 
> So, why don't we define which interrupt is which in this binding?
> ...e.g. "The first
> interrupt specifier is for the configuration access fault interrupt, the second
> interrupt specifier is for other global faults."

Well, first of all, they're both global interrupts and the handler will have
to go and access exactly the same registers to deal with the fault, so you
don't really gain anything.

However, the main reason is that you need to be able to handle:

  (1) Only one of the interrupts is routed to the interrupt controller
  (2) The interrupts are ORed into a single line

> How else is software supposed to know which interrupt corresponds to what
> event.

It really doesn't care. It can't, otherwise we break in the above
situations.
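
As a rough illustration of why the line identity does not matter, here is a
small user-space model of a shared global fault handler. It is only a sketch:
the fake_smmu struct and its gfsr field are hypothetical stand-ins for the
real fault status register, and acknowledging a fault is modelled as simply
zeroing the field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SMMU's global fault state. */
struct fake_smmu {
	uint32_t gfsr;         /* pending global fault bits */
	unsigned int handled;  /* number of faults dealt with so far */
};

/*
 * One handler for every global interrupt line. The irq argument is
 * deliberately ignored: the fault type is read from the status
 * register, not inferred from which line fired.
 * Returns 1 if a fault was handled, 0 if nothing was pending.
 */
static int smmu_global_fault(int irq, struct fake_smmu *smmu)
{
	uint32_t status;

	(void)irq;
	assert(smmu != NULL);

	status = smmu->gfsr;
	if (!status)
		return 0;

	smmu->handled++;
	smmu->gfsr = 0;  /* acknowledge (modelled as a plain clear) */
	return 1;
}
```

The same function can be registered once, twice, or on a single ORed line
without any change in behaviour, which is the property the binding relies on.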

Will

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding
  2013-06-28  9:06                             ` Will Deacon
@ 2013-06-28 16:03                                 ` Stuart Yoder
  -1 siblings, 0 replies; 97+ messages in thread
From: Stuart Yoder @ 2013-06-28 16:03 UTC (permalink / raw)
  To: Will Deacon
  Cc: Andreas Herrmann,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA, Rob Herring,
	devicetree-discuss-uLR06cmDAlY/bJ5BZ2RsiQ@public.gmane.org
	Discuss, linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Jun 28, 2013 at 4:06 AM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
> On Thu, Jun 27, 2013 at 07:22:30PM +0100, Stuart Yoder wrote:
>> On Wed, Jun 26, 2013 at 12:42 PM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
>> > On Wed, Jun 26, 2013 at 05:19:48PM +0100, Stuart Yoder wrote:
>> >> On Wed, Jun 26, 2013 at 8:39 AM, Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org> wrote:
>> >> > I'd suggest looking at the driver I posted to get a gist of how the parsing
>> >> > code works, but suffice to say that we describe both the number of
>> >> > interrupts and the actual interrupt numbers here.
>> >>
>> >> I understand that the number of interrupts and actual interrupt numbers
>> >> are described here, but was referring to the _meaning_ of the interrupt
>> >> numbers.   A binding for a device with 2 interrupts, a TX and RX would
>> >> normally identify which interrupt specifier is for TX and which is for RX.
>> >>
>> >> Based on your code, the 2 global interrupts seem to be the secure
>> >> and non-secure fault interrupts...which your driver does not differentiate.
>> >> However, the device tree is describing hardware and  you can't assume
>> >> that all drivers don't care which is which.
>> >
>> > Currently, the driver only works when Linux is running as non-secure, which is
>> > becoming more and more common since it is required to be able to make use of
>> > hyp mode.
>> >
>> > There are actually two global interrupts for SMMUv1 and SMMUv2, which
>> > correspond to configuration faults and `other' global faults.
>>
>> So, why don't we define which interrupt is which in this binding?
>> ...e.g. "The first
>> interrupt specifier is for the configuration access fault interrupt, the second
>> interrupt specifier is for other global faults."
>
> Well, first of all, they're both global interrupts and the handler will have
> to go and access exactly the same registers to deal with the fault, so you
> don't really gain anything.

It should be up to the OS to handle 2 interrupts the same, not
something

> However, the main reason is that you need to be able to handle:
>
>   (1) Only one of the interrupts is routed to the interrupt controller
>   (2) The interrupts are ORd into a single line

That's a particular SoC implementation, no?  It shouldn't
dictate the SMMU binding.

If they are ORed then the interrupt controller can't distinguish them
and both interrupt specifiers should have the _same_ interrupt number:

+                #global-interrupts = <2>;
+                interrupts = <0 32 4>,
+                             <0 32 4>,

(...or perhaps only define 1 global interrupt-- interrupt 32)

If an implementation chose to not OR them, then they will
have different interrupt numbers and need to be distinguished:

+                #global-interrupts = <2>;
+                interrupts = <0 32 4>,
+                             <0 33 4>,

...and in that case the binding needs to differentiate which
is which.
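
For what it's worth, software could tell the two wirings above apart just by
comparing the interrupt numbers. A minimal sketch, assuming the global
interrupt numbers have already been extracted into a plain array (the
function name is hypothetical and not from the posted driver):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count how many distinct interrupt numbers appear in `irqs`.
 * With two global specifiers, a result of 1 means they share a line
 * (the ORed case) and 2 means they are wired separately.
 */
static size_t smmu_distinct_irqs(const unsigned int *irqs, size_t n)
{
	size_t distinct = 0;

	assert(irqs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		size_t j;

		/* Look for an earlier entry with the same number. */
		for (j = 0; j < i; j++)
			if (irqs[j] == irqs[i])
				break;
		if (j == i)
			distinct++;
	}
	return distinct;
}
```

Called on the numbers { 32, 32 } this reports one line, and on { 32, 33 } it
reports two.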

Stuart

^ permalink raw reply	[flat|nested] 97+ messages in thread

end of thread, other threads:[~2013-06-28 16:03 UTC | newest]

Thread overview: 97+ messages
2013-06-10 18:34 [PATCH 0/9] Add support for ARM SMMU architectures 1 and 2 Will Deacon
2013-06-10 18:34 ` Will Deacon
     [not found] ` <1370889285-22799-1-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2013-06-10 18:34   ` [PATCH 1/9] dma: pl330: rip out broken, redundant ID probing Will Deacon
2013-06-10 18:34     ` Will Deacon
     [not found]     ` <1370889285-22799-2-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2013-06-11  4:37       ` Jassi Brar
2013-06-11 22:31       ` Grant Likely
2013-06-11 22:31         ` Grant Likely
2013-06-12  5:31       ` Vinod Koul
2013-06-12  5:31         ` Vinod Koul
2013-06-11  4:40     ` Jassi Brar
2013-06-11  4:40       ` Jassi Brar
     [not found]       ` <CAJe_Zhc1UoTC4q4oaW=dzyi_10Q7EoezoT=G8_v+yCmBxV75+A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-11  8:45         ` Will Deacon
2013-06-11  8:45           ` Will Deacon
2013-06-10 18:34   ` [PATCH 2/9] dma: pl330: use dma_addr_t for describing bus addresses Will Deacon
2013-06-10 18:34     ` Will Deacon
     [not found]     ` <1370889285-22799-3-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2013-06-11  4:37       ` Jassi Brar
     [not found]         ` <CAJe_ZheKMVQgq42Vx5N1TXXdgFJ2sp50ixU30A7beXhmSVHnZQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-12  5:31           ` Vinod Koul
2013-06-12  5:31             ` Vinod Koul
2013-06-11 22:32       ` Grant Likely
2013-06-11 22:32         ` Grant Likely
2013-06-11  4:39     ` Jassi Brar
2013-06-11  4:39       ` Jassi Brar
2013-06-10 18:34   ` [PATCH 3/9] ARM: dma-mapping: convert DMA direction into IOMMU protection attributes Will Deacon
2013-06-10 18:34     ` Will Deacon
2013-06-19  8:37     ` Marek Szyprowski
2013-06-19  8:37       ` Marek Szyprowski
     [not found]       ` <51C16DAF.1090205-Sze3O3UU22JBDgjK7y7TUQ@public.gmane.org>
2013-06-19  8:52         ` Will Deacon
2013-06-19  8:52           ` Will Deacon
     [not found]           ` <20130619085202.GC20351-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
2013-06-19  8:57             ` Marek Szyprowski
2013-06-19  8:57               ` Marek Szyprowski
     [not found]     ` <1370889285-22799-4-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2013-06-25 10:12       ` Hiroshi Doyu
2013-06-25 10:12         ` Hiroshi Doyu
     [not found]         ` <20130625131215.d3cea2a5668a3d41dbbeb064-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2013-06-25 11:37           ` Will Deacon
2013-06-25 11:37             ` Will Deacon
     [not found]             ` <20130625113714.GF31838-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
2013-06-25 11:52               ` Hiroshi Doyu
2013-06-25 11:52                 ` Hiroshi Doyu
2013-06-25 11:52                 ` Hiroshi Doyu
     [not found]                 ` <20130625.145226.1632119404634300971.hdoyu-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2013-06-25 12:34                   ` Will Deacon
2013-06-25 12:34                     ` Will Deacon
2013-06-10 18:34   ` [PATCH 4/9] ARM: dma-mapping: NULLify dev->archdata.mapping pointer on detach Will Deacon
2013-06-10 18:34     ` Will Deacon
     [not found]     ` <1370889285-22799-5-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2013-06-11  5:34       ` Hiroshi Doyu
2013-06-11  5:34         ` Hiroshi Doyu
     [not found]         ` <20130611.083455.1500863288897785600.hdoyu-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2013-06-11  8:50           ` Will Deacon
2013-06-11  8:50             ` Will Deacon
     [not found]             ` <20130611085015.GC24729-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
2013-06-11  9:39               ` Hiroshi Doyu
2013-06-11  9:39                 ` Hiroshi Doyu
     [not found]                 ` <20130611123933.4d278ff4e056f395788ad060-DDmLM1+adcrQT0dZR+AlfA@public.gmane.org>
2013-06-19  8:59                   ` Marek Szyprowski
2013-06-19  8:59                     ` Marek Szyprowski
2013-06-10 18:34   ` [PATCH 5/9] arm64: pgtable: use pte_index instead of __pte_index Will Deacon
2013-06-10 18:34     ` Will Deacon
2013-06-10 18:34   ` [PATCH 6/9] arm64: device: add iommu pointer to device archdata Will Deacon
2013-06-10 18:34     ` Will Deacon
2013-06-10 18:34   ` [PATCH 7/9] documentation: iommu: add description of ARM System MMU binding Will Deacon
2013-06-10 18:34     ` Will Deacon
     [not found]     ` <1370889285-22799-8-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2013-06-12  8:44       ` Grant Likely
2013-06-12  8:44         ` Grant Likely
2013-06-20 20:08       ` Joerg Roedel
2013-06-20 20:08         ` Joerg Roedel
     [not found]         ` <20130620200845.GF11309-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2013-06-21  9:57           ` Will Deacon
2013-06-21  9:57             ` Will Deacon
     [not found]             ` <20130621095729.GA7766-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
2013-06-21 13:55               ` Joerg Roedel
2013-06-21 13:55                 ` Joerg Roedel
     [not found]                 ` <20130621135507.GI11309-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2013-06-21 16:41                   ` Will Deacon
2013-06-21 16:41                     ` Will Deacon
2013-06-25 19:18       ` Stuart Yoder
2013-06-25 19:18         ` Stuart Yoder
     [not found]         ` <CALRxmdBxFWoRKv+bUu8VEwNNcAJUej9jM2V8N0rrqrr_Vpe8fQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-26 13:39           ` Will Deacon
2013-06-26 13:39             ` Will Deacon
     [not found]             ` <20130626133941.GD7417-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
2013-06-26 16:19               ` Stuart Yoder
2013-06-26 16:19                 ` Stuart Yoder
     [not found]                 ` <CALRxmdCycFK2wW=C4aU79mudSaT+2vU8nzXxepdstubg+YSdQg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-26 17:42                   ` Will Deacon
2013-06-26 17:42                     ` Will Deacon
     [not found]                     ` <20130626174231.GH10333-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
2013-06-27 18:22                       ` Stuart Yoder
2013-06-27 18:22                         ` Stuart Yoder
     [not found]                         ` <CALRxmdD5fyp06xW+z=rWagJc_bcJmpr1H9Zbdf=xbg9cCzvVfw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-06-28  9:06                           ` Will Deacon
2013-06-28  9:06                             ` Will Deacon
     [not found]                             ` <20130628090635.GB29002-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
2013-06-28 16:03                               ` Stuart Yoder
2013-06-28 16:03                                 ` Stuart Yoder
2013-06-10 18:34   ` [PATCH 8/9] iommu: add support for ARM Ltd. System MMU architecture Will Deacon
2013-06-10 18:34     ` Will Deacon
     [not found]     ` <1370889285-22799-9-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2013-06-20 21:26       ` Joerg Roedel
2013-06-20 21:26         ` Joerg Roedel
     [not found]         ` <20130620212646.GG11309-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2013-06-21 10:23           ` Will Deacon
2013-06-21 10:23             ` Will Deacon
     [not found]             ` <20130621102318.GB7766-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
2013-06-21 14:13               ` Joerg Roedel
2013-06-21 14:13                 ` Joerg Roedel
2013-06-21 15:00                 ` Will Deacon
2013-06-21 15:00                   ` Will Deacon
     [not found]                   ` <20130621150006.GG7766-MRww78TxoiP5vMa5CHWGZ34zcgK1vI+I0E9HWUfgJXw@public.gmane.org>
2013-06-21 15:30                     ` Joerg Roedel
2013-06-21 15:30                       ` Joerg Roedel
     [not found]                       ` <20130621153044.GL11309-zLv9SwRftAIdnm+yROfE0A@public.gmane.org>
2013-06-21 16:40                         ` Will Deacon
2013-06-21 16:40                           ` Will Deacon
2013-06-10 18:34   ` [PATCH 9/9] MAINTAINERS: add entry for ARM system MMU driver Will Deacon
2013-06-10 18:34     ` Will Deacon
     [not found]     ` <1370889285-22799-10-git-send-email-will.deacon-5wv7dgnIgG8@public.gmane.org>
2013-06-12  8:45       ` Grant Likely
2013-06-12  8:45         ` Grant Likely
