* [PATCH 0/7] arm-smmu: Implementation and context format differentiation
@ 2016-04-13 17:12 ` Robin Murphy
  0 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-13 17:12 UTC (permalink / raw)
  To: will.deacon@arm.com, joro@8bytes.org
  Cc: iommu@lists.linux-foundation.org, brian.starkey@arm.com,
	linux-arm-kernel@lists.infradead.org, tchalamarla@caviumnetworks.com

Hi all,

This is almost three separate sub-series, but there's enough interdependency,
and there are enough horrid merge conflicts, that I've been keeping it all together.

- Patches 1-3 rationalise implementation-specific details and errata,
  in order to start shovelling in yet more wonderful workarounds.
- Patches 4-6 do a broadly similar thing for context formats, breaking
  the effective hard-coding to allow it to become a runtime choice.
  (With the bonus of unblocking short-descriptor support, once I find
  and revive the hac^Wpatches I was using for that.)
- Patch 7 then takes advantage of the previous changes to support the
  hilarious raison d'etre of MMU-401, in all its glory.

This is all based on top of the ThunderX patches in Will's iommu/devel
branch. It also conflicts somewhat with my per-domain page size series;
since this is probably less contentious, I expect I'll optimistically
rebase that on top of this one before I post it again.

Robin.
---

Robin Murphy (7):
  iommu/arm-smmu: Differentiate specific implementations
  iommu/arm-smmu: Convert ThunderX workaround to new method
  iommu/arm-smmu: Work around MMU-500 prefetch errata
  io-64-nonatomic: Add relaxed accessor variants
  iommu/arm-smmu: Tidy up 64-bit/atomic I/O accesses
  iommu/arm-smmu: Decouple context format from kernel config
  iommu/arm-smmu: Support SMMUv1 64KB supplement

 Documentation/arm64/silicon-errata.txt |   1 +
 drivers/iommu/arm-smmu.c               | 238 +++++++++++++++++++++++----------
 include/linux/io-64-nonatomic-hi-lo.h  |  25 ++++
 include/linux/io-64-nonatomic-lo-hi.h  |  25 ++++
 4 files changed, 216 insertions(+), 73 deletions(-)

-- 
2.7.3.dirty

* [PATCH 1/7] iommu/arm-smmu: Differentiate specific implementations
From: Robin Murphy @ 2016-04-13 17:12 UTC (permalink / raw)
  To: will.deacon@arm.com, joro@8bytes.org
  Cc: iommu@lists.linux-foundation.org, brian.starkey@arm.com,
	linux-arm-kernel@lists.infradead.org, tchalamarla@caviumnetworks.com

As the inevitable reality of implementation-specific errata workarounds
begins to accrue alongside our integration quirk handling, it's about
time the driver had a decent way of keeping track. Extend the per-SMMU
data so we can identify specific implementations in an efficient and
firmware-agnostic manner.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index e933679..2d5f357 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -278,6 +278,10 @@ enum arm_smmu_arch_version {
 	ARM_SMMU_V2,
 };
 
+enum arm_smmu_implementation {
+	GENERIC_SMMU,
+};
+
 struct arm_smmu_smr {
 	u8				idx;
 	u16				mask;
@@ -315,6 +319,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
 	u32				options;
 	enum arm_smmu_arch_version	version;
+	enum arm_smmu_implementation	model;
 
 	u32				num_context_banks;
 	u32				num_s2_context_banks;
@@ -1735,13 +1740,24 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 	return 0;
 }
 
+struct arm_smmu_match_data {
+	enum arm_smmu_arch_version version;
+	enum arm_smmu_implementation model;
+};
+
+#define ARM_SMMU_MATCH_DATA(name, ver, imp)	\
+static struct arm_smmu_match_data name = { .version = ver, .model = imp }
+
+ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
+ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
+
 static const struct of_device_id arm_smmu_of_match[] = {
-	{ .compatible = "arm,smmu-v1", .data = (void *)ARM_SMMU_V1 },
-	{ .compatible = "arm,smmu-v2", .data = (void *)ARM_SMMU_V2 },
-	{ .compatible = "arm,mmu-400", .data = (void *)ARM_SMMU_V1 },
-	{ .compatible = "arm,mmu-401", .data = (void *)ARM_SMMU_V1 },
-	{ .compatible = "arm,mmu-500", .data = (void *)ARM_SMMU_V2 },
-	{ .compatible = "cavium,smmu-v2", .data = (void *)ARM_SMMU_V2 },
+	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
+	{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
+	{ .compatible = "arm,mmu-400", .data = &smmu_generic_v1 },
+	{ .compatible = "arm,mmu-401", .data = &smmu_generic_v1 },
+	{ .compatible = "arm,mmu-500", .data = &smmu_generic_v2 },
+	{ .compatible = "cavium,smmu-v2", .data = &smmu_generic_v2 },
 	{ },
 };
 MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
@@ -1749,6 +1765,7 @@ MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
 static int arm_smmu_device_dt_probe(struct platform_device *pdev)
 {
 	const struct of_device_id *of_id;
+	const struct arm_smmu_match_data *data;
 	struct resource *res;
 	struct arm_smmu_device *smmu;
 	struct device *dev = &pdev->dev;
@@ -1764,7 +1781,9 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev)
 	smmu->dev = dev;
 
 	of_id = of_match_node(arm_smmu_of_match, dev->of_node);
-	smmu->version = (enum arm_smmu_arch_version)of_id->data;
+	data = of_id->data;
+	smmu->version = data->version;
+	smmu->model = data->model;
 
 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	smmu->base = devm_ioremap_resource(dev, res);
-- 
2.7.3.dirty

* [PATCH 2/7] iommu/arm-smmu: Convert ThunderX workaround to new method
From: Robin Murphy @ 2016-04-13 17:12 UTC (permalink / raw)
  To: will.deacon@arm.com, joro@8bytes.org
  Cc: iommu@lists.linux-foundation.org, brian.starkey@arm.com,
	linux-arm-kernel@lists.infradead.org, tchalamarla@caviumnetworks.com

With a framework for implementation-specific functionality in place, the
currently-FDT-dependent ThunderX workaround gets to be the first user.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 2d5f357..d8bc20a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -280,6 +280,7 @@ enum arm_smmu_arch_version {
 
 enum arm_smmu_implementation {
 	GENERIC_SMMU,
+	CAVIUM_SMMUV2,
 };
 
 struct arm_smmu_smr {
@@ -1686,6 +1687,17 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 	}
 	dev_notice(smmu->dev, "\t%u context banks (%u stage-2 only)\n",
 		   smmu->num_context_banks, smmu->num_s2_context_banks);
+	/*
+	 * Cavium CN88xx erratum #27704.
+	 * Ensure ASID and VMID allocation is unique across all SMMUs in
+	 * the system.
+	 */
+	if (smmu->model == CAVIUM_SMMUV2) {
+		smmu->cavium_id_base =
+			atomic_add_return(smmu->num_context_banks,
+					  &cavium_smmu_context_count);
+		smmu->cavium_id_base -= smmu->num_context_banks;
+	}
 
 	/* ID2 */
 	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID2);
@@ -1750,6 +1762,7 @@ static struct arm_smmu_match_data name = { .version = ver, .model = imp }
 
 ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
+ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
 
 static const struct of_device_id arm_smmu_of_match[] = {
 	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
@@ -1757,7 +1770,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
 	{ .compatible = "arm,mmu-400", .data = &smmu_generic_v1 },
 	{ .compatible = "arm,mmu-401", .data = &smmu_generic_v1 },
 	{ .compatible = "arm,mmu-500", .data = &smmu_generic_v2 },
-	{ .compatible = "cavium,smmu-v2", .data = &smmu_generic_v2 },
+	{ .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 },
 	{ },
 };
 MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
@@ -1871,18 +1884,6 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev)
 		}
 	}
 
-	/*
-	 * Cavium CN88xx erratum #27704.
-	 * Ensure ASID and VMID allocation is unique across all SMMUs in
-	 * the system.
-	 */
-	if (of_device_is_compatible(dev->of_node, "cavium,smmu-v2")) {
-		smmu->cavium_id_base =
-			atomic_add_return(smmu->num_context_banks,
-					  &cavium_smmu_context_count);
-		smmu->cavium_id_base -= smmu->num_context_banks;
-	}
-
 	INIT_LIST_HEAD(&smmu->list);
 	spin_lock(&arm_smmu_devices_lock);
 	list_add(&smmu->list, &arm_smmu_devices);
-- 
2.7.3.dirty

* [PATCH 3/7] iommu/arm-smmu: Work around MMU-500 prefetch errata
From: Robin Murphy @ 2016-04-13 17:12 UTC (permalink / raw)
  To: will.deacon@arm.com, joro@8bytes.org
  Cc: Catalin Marinas, iommu@lists.linux-foundation.org,
	tchalamarla@caviumnetworks.com, brian.starkey@arm.com,
	linux-arm-kernel@lists.infradead.org

MMU-500 erratum #841119 is tickled by a particular set of circumstances
interacting with the next-page prefetcher. Since said prefetcher is
quite dumb and actually detrimental to performance in some cases (by
causing unwanted TLB evictions for non-sequential access patterns), we
lose very little by turning it off, and what we gain is a guarantee that
the erratum is never hit.

As a bonus, the same workaround will also prevent erratum #826419 once
v7 short descriptor support is implemented.

CC: Catalin Marinas <catalin.marinas@arm.com>
CC: Will Deacon <will.deacon@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 Documentation/arm64/silicon-errata.txt |  1 +
 drivers/iommu/arm-smmu.c               | 16 +++++++++++++++-
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
index 806f91c..c6938e5 100644
--- a/Documentation/arm64/silicon-errata.txt
+++ b/Documentation/arm64/silicon-errata.txt
@@ -53,6 +53,7 @@ stable kernels.
 | ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075    |
 | ARM            | Cortex-A57      | #852523         | N/A                     |
 | ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220    |
+| ARM            | MMU-500         | #841119,#826419 | N/A                     |
 |                |                 |                 |                         |
 | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
 | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d8bc20a..085fc8d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -203,6 +203,7 @@
 #define ARM_SMMU_CB(smmu, n)		((n) * (1 << (smmu)->pgshift))
 
 #define ARM_SMMU_CB_SCTLR		0x0
+#define ARM_SMMU_CB_ACTLR		0x4
 #define ARM_SMMU_CB_RESUME		0x8
 #define ARM_SMMU_CB_TTBCR2		0x10
 #define ARM_SMMU_CB_TTBR0		0x20
@@ -234,6 +235,8 @@
 #define SCTLR_M				(1 << 0)
 #define SCTLR_EAE_SBOP			(SCTLR_AFE | SCTLR_TRE)
 
+#define ARM_MMU500_ACTLR_CPRE		(1 << 1)
+
 #define CB_PAR_F			(1 << 0)
 
 #define ATSR_ACTIVE			(1 << 0)
@@ -280,6 +283,7 @@ enum arm_smmu_arch_version {
 
 enum arm_smmu_implementation {
 	GENERIC_SMMU,
+	ARM_MMU500,
 	CAVIUM_SMMUV2,
 };
 
@@ -1517,6 +1521,15 @@ static void arm_smmu_device_reset(struct arm_smmu_device *smmu)
 		cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, i);
 		writel_relaxed(0, cb_base + ARM_SMMU_CB_SCTLR);
 		writel_relaxed(FSR_FAULT, cb_base + ARM_SMMU_CB_FSR);
+		/*
+		 * Disable MMU-500's not-particularly-beneficial next-page
+		 * prefetcher for the sake of errata #841119 and #826419.
+		 */
+		if (smmu->model == ARM_MMU500) {
+			reg = readl_relaxed(cb_base + ARM_SMMU_CB_ACTLR);
+			reg &= ~ARM_MMU500_ACTLR_CPRE;
+			writel_relaxed(reg, cb_base + ARM_SMMU_CB_ACTLR);
+		}
 	}
 
 	/* Invalidate the TLB, just in case */
@@ -1762,6 +1775,7 @@ static struct arm_smmu_match_data name = { .version = ver, .model = imp }
 
 ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
+ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
 ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
 
 static const struct of_device_id arm_smmu_of_match[] = {
@@ -1769,7 +1783,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
 	{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
 	{ .compatible = "arm,mmu-400", .data = &smmu_generic_v1 },
 	{ .compatible = "arm,mmu-401", .data = &smmu_generic_v1 },
-	{ .compatible = "arm,mmu-500", .data = &smmu_generic_v2 },
+	{ .compatible = "arm,mmu-500", .data = &arm_mmu500 },
 	{ .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 },
 	{ },
 };
-- 
2.7.3.dirty

* [PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants
From: Robin Murphy @ 2016-04-13 17:13 UTC (permalink / raw)
  To: will.deacon@arm.com, joro@8bytes.org
  Cc: Arnd Bergmann, Hitoshi Mitake, Darren Hart, Christoph Hellwig,
	iommu@lists.linux-foundation.org, tchalamarla@caviumnetworks.com,
	brian.starkey@arm.com, linux-arm-kernel@lists.infradead.org

Whilst commit 9439eb3ab9d1 ("asm-generic: io: implement relaxed
accessor macros as conditional wrappers") makes the *_relaxed forms of
I/O accessors universally available to drivers, in cases where writeq()
is implemented via the io-64-nonatomic helpers, writeq_relaxed() will
end up falling back to writel() regardless of whether writel_relaxed()
is available (identically for s/write/read/).

Add corresponding relaxed forms of the nonatomic helpers to delegate
to the equivalent 32-bit accessors as appropriate.

CC: Arnd Bergmann <arnd@arndb.de>
CC: Christoph Hellwig <hch@lst.de>
CC: Darren Hart <dvhart@linux.intel.com>
CC: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 include/linux/io-64-nonatomic-hi-lo.h | 25 +++++++++++++++++++++++++
 include/linux/io-64-nonatomic-lo-hi.h | 25 +++++++++++++++++++++++++
 2 files changed, 50 insertions(+)

diff --git a/include/linux/io-64-nonatomic-hi-lo.h b/include/linux/io-64-nonatomic-hi-lo.h
index 11d7e84..1a85566 100644
--- a/include/linux/io-64-nonatomic-hi-lo.h
+++ b/include/linux/io-64-nonatomic-hi-lo.h
@@ -21,6 +21,23 @@ static inline void hi_lo_writeq(__u64 val, volatile void __iomem *addr)
 	writel(val, addr);
 }
 
+static inline __u64 hi_lo_readq_relaxed(const volatile void __iomem *addr)
+{
+	const volatile u32 __iomem *p = addr;
+	u32 low, high;
+
+	high = readl_relaxed(p + 1);
+	low = readl_relaxed(p);
+
+	return low + ((u64)high << 32);
+}
+
+static inline void hi_lo_writeq_relaxed(__u64 val, volatile void __iomem *addr)
+{
+	writel_relaxed(val >> 32, addr + 4);
+	writel_relaxed(val, addr);
+}
+
 #ifndef readq
 #define readq hi_lo_readq
 #endif
@@ -29,4 +46,12 @@ static inline void hi_lo_writeq(__u64 val, volatile void __iomem *addr)
 #define writeq hi_lo_writeq
 #endif
 
+#ifndef readq_relaxed
+#define readq_relaxed hi_lo_readq_relaxed
+#endif
+
+#ifndef writeq_relaxed
+#define writeq_relaxed hi_lo_writeq_relaxed
+#endif
+
 #endif	/* _LINUX_IO_64_NONATOMIC_HI_LO_H_ */
diff --git a/include/linux/io-64-nonatomic-lo-hi.h b/include/linux/io-64-nonatomic-lo-hi.h
index 1a4315f..1b7411d 100644
--- a/include/linux/io-64-nonatomic-lo-hi.h
+++ b/include/linux/io-64-nonatomic-lo-hi.h
@@ -21,6 +21,23 @@ static inline void lo_hi_writeq(__u64 val, volatile void __iomem *addr)
 	writel(val >> 32, addr + 4);
 }
 
+static inline __u64 lo_hi_readq_relaxed(const volatile void __iomem *addr)
+{
+	const volatile u32 __iomem *p = addr;
+	u32 low, high;
+
+	low = readl_relaxed(p);
+	high = readl_relaxed(p + 1);
+
+	return low + ((u64)high << 32);
+}
+
+static inline void lo_hi_writeq_relaxed(__u64 val, volatile void __iomem *addr)
+{
+	writel_relaxed(val, addr);
+	writel_relaxed(val >> 32, addr + 4);
+}
+
 #ifndef readq
 #define readq lo_hi_readq
 #endif
@@ -29,4 +46,12 @@ static inline void lo_hi_writeq(__u64 val, volatile void __iomem *addr)
 #define writeq lo_hi_writeq
 #endif
 
+#ifndef readq_relaxed
+#define readq_relaxed lo_hi_readq_relaxed
+#endif
+
+#ifndef writeq_relaxed
+#define writeq_relaxed lo_hi_writeq_relaxed
+#endif
+
 #endif	/* _LINUX_IO_64_NONATOMIC_LO_HI_H_ */
-- 
2.7.3.dirty

* [PATCH 5/7] iommu/arm-smmu: Tidy up 64-bit/atomic I/O accesses
  2016-04-13 17:12 ` Robin Murphy
@ 2016-04-13 17:13     ` Robin Murphy
  -1 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-13 17:13 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

With {read,write}q_relaxed now able to fall back to the common
nonatomic-hi-lo helper, make use of that so that we don't have to
open-code our own. In the process, also convert the other remaining
split accesses, and repurpose the custom accessor to smooth out the
couple of troublesome instances where we really want to avoid
nonatomic writes (and a 64-bit access is unnecessary in the 32-bit
context formats we would use on a 32-bit CPU).

This paves the way for getting rid of some of the assumptions currently
baked into the driver which make it really awkward to use 32-bit context
formats with SMMUv2 under a 64-bit kernel.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu.c | 48 +++++++++++++++++++-----------------------------
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 085fc8d..acff332 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -34,6 +34,7 @@
 #include <linux/err.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
+#include <linux/io-64-nonatomic-hi-lo.h>
 #include <linux/iommu.h>
 #include <linux/iopoll.h>
 #include <linux/module.h>
@@ -71,16 +72,15 @@
 		((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS)	\
 			? 0x400 : 0))
 
+/*
+ * Some 64-bit registers only make sense to write atomically, but in such
+ * cases all the data relevant to AArch32 formats lies within the lower word,
+ * therefore this actually makes more sense than it might first appear.
+ */
 #ifdef CONFIG_64BIT
-#define smmu_writeq	writeq_relaxed
+#define smmu_write_atomic_lq		writeq_relaxed
 #else
-#define smmu_writeq(reg64, addr)				\
-	do {							\
-		u64 __val = (reg64);				\
-		void __iomem *__addr = (addr);			\
-		writel_relaxed(__val >> 32, __addr + 4);	\
-		writel_relaxed(__val, __addr);			\
-	} while (0)
+#define smmu_write_atomic_lq		writel_relaxed
 #endif
 
 /* Configuration registers */
@@ -211,11 +211,9 @@
 #define ARM_SMMU_CB_TTBCR		0x30
 #define ARM_SMMU_CB_S1_MAIR0		0x38
 #define ARM_SMMU_CB_S1_MAIR1		0x3c
-#define ARM_SMMU_CB_PAR_LO		0x50
-#define ARM_SMMU_CB_PAR_HI		0x54
+#define ARM_SMMU_CB_PAR			0x50
 #define ARM_SMMU_CB_FSR			0x58
-#define ARM_SMMU_CB_FAR_LO		0x60
-#define ARM_SMMU_CB_FAR_HI		0x64
+#define ARM_SMMU_CB_FAR			0x60
 #define ARM_SMMU_CB_FSYNR0		0x68
 #define ARM_SMMU_CB_S1_TLBIVA		0x600
 #define ARM_SMMU_CB_S1_TLBIASID		0x610
@@ -645,7 +643,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 			      ARM_SMMU_CB_S2_TLBIIPAS2;
 		iova >>= 12;
 		do {
-			writeq_relaxed(iova, reg);
+			smmu_write_atomic_lq(iova, reg);
 			iova += granule >> 12;
 		} while (size -= granule);
 #endif
@@ -664,7 +662,7 @@ static struct iommu_gather_ops arm_smmu_gather_ops = {
 static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 {
 	int flags, ret;
-	u32 fsr, far, fsynr, resume;
+	u32 fsr, fsynr, resume;
 	unsigned long iova;
 	struct iommu_domain *domain = dev;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -686,13 +684,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
 	flags = fsynr & FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ;
 
-	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_LO);
-	iova = far;
-#ifdef CONFIG_64BIT
-	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_HI);
-	iova |= ((unsigned long)far << 32);
-#endif
-
+	iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
 	if (!report_iommu_fault(domain, smmu->dev, iova, flags)) {
 		ret = IRQ_HANDLED;
 		resume = RESUME_RETRY;
@@ -788,14 +780,14 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 		reg64 = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
 
 		reg64 |= ((u64)ARM_SMMU_CB_ASID(smmu, cfg)) << TTBRn_ASID_SHIFT;
-		smmu_writeq(reg64, cb_base + ARM_SMMU_CB_TTBR0);
+		writeq_relaxed(reg64, cb_base + ARM_SMMU_CB_TTBR0);
 
 		reg64 = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[1];
 		reg64 |= ((u64)ARM_SMMU_CB_ASID(smmu, cfg)) << TTBRn_ASID_SHIFT;
-		smmu_writeq(reg64, cb_base + ARM_SMMU_CB_TTBR1);
+		writeq_relaxed(reg64, cb_base + ARM_SMMU_CB_TTBR1);
 	} else {
 		reg64 = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
-		smmu_writeq(reg64, cb_base + ARM_SMMU_CB_TTBR0);
+		writeq_relaxed(reg64, cb_base + ARM_SMMU_CB_TTBR0);
 	}
 
 	/* TTBCR */
@@ -1263,8 +1255,8 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
 	/* ATS1 registers can only be written atomically */
 	va = iova & ~0xfffUL;
 	if (smmu->version == ARM_SMMU_V2)
-		smmu_writeq(va, cb_base + ARM_SMMU_CB_ATS1PR);
-	else
+		smmu_write_atomic_lq(va, cb_base + ARM_SMMU_CB_ATS1PR);
+	else /* Register is only 32-bit in v1 */
 		writel_relaxed(va, cb_base + ARM_SMMU_CB_ATS1PR);
 
 	if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
@@ -1275,9 +1267,7 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
 		return ops->iova_to_phys(ops, iova);
 	}
 
-	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
-	phys |= ((u64)readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
-
+	phys = readq_relaxed(cb_base + ARM_SMMU_CB_PAR);
 	if (phys & CB_PAR_F) {
 		dev_err(dev, "translation fault!\n");
 		dev_err(dev, "PAR = 0x%llx\n", phys);
-- 
2.7.3.dirty

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 5/7] iommu/arm-smmu: Tidy up 64-bit/atomic I/O accesses
@ 2016-04-13 17:13     ` Robin Murphy
  0 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-13 17:13 UTC (permalink / raw)
  To: linux-arm-kernel

With {read,write}q_relaxed now able to fall back to the common
nonatomic-hi-lo helper, make use of that so that we don't have to
open-code our own. In the process, also convert the other remaining
split accesses, and repurpose the custom accessor to smooth out the
couple of troublesome instances where we really want to avoid
nonatomic writes (and a 64-bit access is unnecessary in the 32-bit
context formats we would use on a 32-bit CPU).

This paves the way for getting rid of some of the assumptions currently
baked into the driver which make it really awkward to use 32-bit context
formats with SMMUv2 under a 64-bit kernel.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu.c | 48 +++++++++++++++++++-----------------------------
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 085fc8d..acff332 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -34,6 +34,7 @@
 #include <linux/err.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
+#include <linux/io-64-nonatomic-hi-lo.h>
 #include <linux/iommu.h>
 #include <linux/iopoll.h>
 #include <linux/module.h>
@@ -71,16 +72,15 @@
 		((smmu->options & ARM_SMMU_OPT_SECURE_CFG_ACCESS)	\
 			? 0x400 : 0))
 
+/*
+ * Some 64-bit registers only make sense to write atomically, but in such
+ * cases all the data relevant to AArch32 formats lies within the lower word,
+ * therefore this actually makes more sense than it might first appear.
+ */
 #ifdef CONFIG_64BIT
-#define smmu_writeq	writeq_relaxed
+#define smmu_write_atomic_lq		writeq_relaxed
 #else
-#define smmu_writeq(reg64, addr)				\
-	do {							\
-		u64 __val = (reg64);				\
-		void __iomem *__addr = (addr);			\
-		writel_relaxed(__val >> 32, __addr + 4);	\
-		writel_relaxed(__val, __addr);			\
-	} while (0)
+#define smmu_write_atomic_lq		writel_relaxed
 #endif
 
 /* Configuration registers */
@@ -211,11 +211,9 @@
 #define ARM_SMMU_CB_TTBCR		0x30
 #define ARM_SMMU_CB_S1_MAIR0		0x38
 #define ARM_SMMU_CB_S1_MAIR1		0x3c
-#define ARM_SMMU_CB_PAR_LO		0x50
-#define ARM_SMMU_CB_PAR_HI		0x54
+#define ARM_SMMU_CB_PAR			0x50
 #define ARM_SMMU_CB_FSR			0x58
-#define ARM_SMMU_CB_FAR_LO		0x60
-#define ARM_SMMU_CB_FAR_HI		0x64
+#define ARM_SMMU_CB_FAR			0x60
 #define ARM_SMMU_CB_FSYNR0		0x68
 #define ARM_SMMU_CB_S1_TLBIVA		0x600
 #define ARM_SMMU_CB_S1_TLBIASID		0x610
@@ -645,7 +643,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 			      ARM_SMMU_CB_S2_TLBIIPAS2;
 		iova >>= 12;
 		do {
-			writeq_relaxed(iova, reg);
+			smmu_write_atomic_lq(iova, reg);
 			iova += granule >> 12;
 		} while (size -= granule);
 #endif
@@ -664,7 +662,7 @@ static struct iommu_gather_ops arm_smmu_gather_ops = {
 static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 {
 	int flags, ret;
-	u32 fsr, far, fsynr, resume;
+	u32 fsr, fsynr, resume;
 	unsigned long iova;
 	struct iommu_domain *domain = dev;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -686,13 +684,7 @@ static irqreturn_t arm_smmu_context_fault(int irq, void *dev)
 	fsynr = readl_relaxed(cb_base + ARM_SMMU_CB_FSYNR0);
 	flags = fsynr & FSYNR0_WNR ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ;
 
-	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_LO);
-	iova = far;
-#ifdef CONFIG_64BIT
-	far = readl_relaxed(cb_base + ARM_SMMU_CB_FAR_HI);
-	iova |= ((unsigned long)far << 32);
-#endif
-
+	iova = readq_relaxed(cb_base + ARM_SMMU_CB_FAR);
 	if (!report_iommu_fault(domain, smmu->dev, iova, flags)) {
 		ret = IRQ_HANDLED;
 		resume = RESUME_RETRY;
@@ -788,14 +780,14 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 		reg64 = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
 
 		reg64 |= ((u64)ARM_SMMU_CB_ASID(smmu, cfg)) << TTBRn_ASID_SHIFT;
-		smmu_writeq(reg64, cb_base + ARM_SMMU_CB_TTBR0);
+		writeq_relaxed(reg64, cb_base + ARM_SMMU_CB_TTBR0);
 
 		reg64 = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[1];
 		reg64 |= ((u64)ARM_SMMU_CB_ASID(smmu, cfg)) << TTBRn_ASID_SHIFT;
-		smmu_writeq(reg64, cb_base + ARM_SMMU_CB_TTBR1);
+		writeq_relaxed(reg64, cb_base + ARM_SMMU_CB_TTBR1);
 	} else {
 		reg64 = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
-		smmu_writeq(reg64, cb_base + ARM_SMMU_CB_TTBR0);
+		writeq_relaxed(reg64, cb_base + ARM_SMMU_CB_TTBR0);
 	}
 
 	/* TTBCR */
@@ -1263,8 +1255,8 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
 	/* ATS1 registers can only be written atomically */
 	va = iova & ~0xfffUL;
 	if (smmu->version == ARM_SMMU_V2)
-		smmu_writeq(va, cb_base + ARM_SMMU_CB_ATS1PR);
-	else
+		smmu_write_atomic_lq(va, cb_base + ARM_SMMU_CB_ATS1PR);
+	else /* Register is only 32-bit in v1 */
 		writel_relaxed(va, cb_base + ARM_SMMU_CB_ATS1PR);
 
 	if (readl_poll_timeout_atomic(cb_base + ARM_SMMU_CB_ATSR, tmp,
@@ -1275,9 +1267,7 @@ static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
 		return ops->iova_to_phys(ops, iova);
 	}
 
-	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
-	phys |= ((u64)readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
-
+	phys = readq_relaxed(cb_base + ARM_SMMU_CB_PAR);
 	if (phys & CB_PAR_F) {
 		dev_err(dev, "translation fault!\n");
 		dev_err(dev, "PAR = 0x%llx\n", phys);
-- 
2.7.3.dirty

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 6/7] iommu/arm-smmu: Decouple context format from kernel config
  2016-04-13 17:12 ` Robin Murphy
@ 2016-04-13 17:13     ` Robin Murphy
  -1 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-13 17:13 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

The way the driver currently forces an AArch32 or AArch64 context format
based on the kernel config and SMMU architecture version is suboptimal,
in that it makes it very hard to support oddball mix-and-match cases
like the SMMUv1 64KB supplement, or situations where the reduced table
depth of an AArch32 short descriptor context may be desirable under an
AArch64 kernel. It also only happens to work on current implementations
which do support all the relevant formats.

Introduce an explicit notion of context format, so we can manage that
independently and get rid of the inflexible #ifdeffery.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu.c | 98 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 80 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index acff332..1d4285f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -117,6 +117,8 @@
 #define ID0_NTS				(1 << 28)
 #define ID0_SMS				(1 << 27)
 #define ID0_ATOSNS			(1 << 26)
+#define ID0_PTFS_NO_AARCH32		(1 << 25)
+#define ID0_PTFS_NO_AARCH32S		(1 << 24)
 #define ID0_CTTW			(1 << 14)
 #define ID0_NUMIRPT_SHIFT		16
 #define ID0_NUMIRPT_MASK		0xff
@@ -317,6 +319,11 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
 #define ARM_SMMU_FEAT_TRANS_OPS		(1 << 5)
 #define ARM_SMMU_FEAT_VMID16		(1 << 6)
+#define ARM_SMMU_FEAT_FMT_AARCH64_4K	(1 << 7)
+#define ARM_SMMU_FEAT_FMT_AARCH64_16K	(1 << 8)
+#define ARM_SMMU_FEAT_FMT_AARCH64_64K	(1 << 9)
+#define ARM_SMMU_FEAT_FMT_AARCH32_L	(1 << 10)
+#define ARM_SMMU_FEAT_FMT_AARCH32_S	(1 << 11)
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
@@ -346,10 +353,18 @@ struct arm_smmu_device {
 	u32				cavium_id_base; /* Specific to Cavium */
 };
 
+enum arm_smmu_context_fmt {
+	ARM_SMMU_CTX_FMT_NONE,
+	ARM_SMMU_CTX_FMT_AARCH64,
+	ARM_SMMU_CTX_FMT_AARCH32_L,
+	ARM_SMMU_CTX_FMT_AARCH32_S,
+};
+
 struct arm_smmu_cfg {
 	u8				cbndx;
 	u8				irptndx;
 	u32				cbar;
+	enum arm_smmu_context_fmt	fmt;
 };
 #define INVALID_IRPTNDX			0xff
 
@@ -619,14 +634,13 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 		reg = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
 
-		if (!IS_ENABLED(CONFIG_64BIT) || smmu->version == ARM_SMMU_V1) {
+		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
 			iova &= ~12UL;
 			iova |= ARM_SMMU_CB_ASID(smmu, cfg);
 			do {
 				writel_relaxed(iova, reg);
 				iova += granule;
 			} while (size -= granule);
-#ifdef CONFIG_64BIT
 		} else {
 			iova >>= 12;
 			iova |= (u64)ARM_SMMU_CB_ASID(smmu, cfg) << 48;
@@ -634,9 +648,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 				writeq_relaxed(iova, reg);
 				iova += granule >> 12;
 			} while (size -= granule);
-#endif
 		}
-#ifdef CONFIG_64BIT
 	} else if (smmu->version == ARM_SMMU_V2) {
 		reg = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
@@ -646,7 +658,6 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 			smmu_write_atomic_lq(iova, reg);
 			iova += granule >> 12;
 		} while (size -= granule);
-#endif
 	} else {
 		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
 		writel_relaxed(ARM_SMMU_CB_VMID(smmu, cfg), reg);
@@ -745,11 +756,10 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 
 	if (smmu->version > ARM_SMMU_V1) {
-#ifdef CONFIG_64BIT
-		reg = CBA2R_RW64_64BIT;
-#else
-		reg = CBA2R_RW64_32BIT;
-#endif
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
+			reg = CBA2R_RW64_64BIT;
+		else
+			reg = CBA2R_RW64_32BIT;
 		/* 16-bit VMIDs live in CBA2R */
 		if (smmu->features & ARM_SMMU_FEAT_VMID16)
 			reg |= ARM_SMMU_CB_VMID(smmu, cfg) << CBA2R_VMID_SHIFT;
@@ -860,16 +870,48 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
 		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
+	/*
+	 * Choosing a suitable context format is even more fiddly. Until we
+	 * grow some way for the caller to express a preference, just aim for
+	 * the closest match to the CPU format out of what we might have.
+	 */
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_L)
+		cfg->fmt = ARM_SMMU_CTX_FMT_AARCH32_L;
+	if (IS_ENABLED(CONFIG_64BIT) || cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
+		switch (PAGE_SIZE) {
+		case SZ_64K:
+			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K) {
+				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
+				break;
+			} /* else fall through */
+		case SZ_16K:
+			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K) {
+				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
+				break;
+			} /* else fall through */
+		case SZ_4K:
+			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_4K)
+				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
+		}
+	}
+	if (cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
 	switch (smmu_domain->stage) {
 	case ARM_SMMU_DOMAIN_S1:
 		cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
 		start = smmu->num_s2_context_banks;
 		ias = smmu->va_size;
 		oas = smmu->ipa_size;
-		if (IS_ENABLED(CONFIG_64BIT))
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
 			fmt = ARM_64_LPAE_S1;
-		else
+		} else {
 			fmt = ARM_32_LPAE_S1;
+			ias = min(ias, 32UL);
+			oas = min(oas, 40UL);
+		}
 		break;
 	case ARM_SMMU_DOMAIN_NESTED:
 		/*
@@ -881,10 +923,13 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		start = 0;
 		ias = smmu->ipa_size;
 		oas = smmu->pa_size;
-		if (IS_ENABLED(CONFIG_64BIT))
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
 			fmt = ARM_64_LPAE_S2;
-		else
+		} else {
 			fmt = ARM_32_LPAE_S2;
+			ias = min(ias, 40UL);
+			oas = min(oas, 40UL);
+		}
 		break;
 	default:
 		ret = -EINVAL;
@@ -1670,6 +1715,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 					   ID0_NUMSIDB_MASK;
 	}
 
+	if (smmu->version < ARM_SMMU_V2 || !(id & ID0_PTFS_NO_AARCH32)) {
+		smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_L;
+		if (!(id & ID0_PTFS_NO_AARCH32S))
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_S;
+	}
+
 	/* ID1 */
 	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID1);
 	smmu->pgshift = (id & ID1_PAGESIZE) ? 16 : 12;
@@ -1725,7 +1776,6 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 
 	if (smmu->version == ARM_SMMU_V1) {
 		smmu->va_size = smmu->ipa_size;
-		size = SZ_4K | SZ_2M | SZ_1G;
 	} else {
 		size = (id >> ID2_UBS_SHIFT) & ID2_UBS_MASK;
 		smmu->va_size = arm_smmu_id_size_to_bits(size);
@@ -1734,13 +1784,25 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 #endif
 		size = 0;
 		if (id & ID2_PTFS_4K)
-			size |= SZ_4K | SZ_2M | SZ_1G;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_4K;
 		if (id & ID2_PTFS_16K)
-			size |= SZ_16K | SZ_32M;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_16K;
 		if (id & ID2_PTFS_64K)
-			size |= SZ_64K | SZ_512M;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_64K;
 	}
 
+	/* Now we've corralled the various formats, what'll it do? */
+	size = 0;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_S)
+		size |= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
+	if (smmu->features &
+	    (ARM_SMMU_FEAT_FMT_AARCH32_L | ARM_SMMU_FEAT_FMT_AARCH64_4K))
+		size |= SZ_4K | SZ_2M | SZ_1G;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K)
+		size |= SZ_16K | SZ_32M;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K)
+		size |= SZ_64K | SZ_512M;
+
 	arm_smmu_ops.pgsize_bitmap &= size;
 	dev_notice(smmu->dev, "\tSupported page sizes: 0x%08lx\n", size);
 
-- 
2.7.3.dirty

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 6/7] iommu/arm-smmu: Decouple context format from kernel config
@ 2016-04-13 17:13     ` Robin Murphy
  0 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-13 17:13 UTC (permalink / raw)
  To: linux-arm-kernel

The way the driver currently forces an AArch32 or AArch64 context format
based on the kernel config and SMMU architecture version is suboptimal,
in that it makes it very hard to support oddball mix-and-match cases
like the SMMUv1 64KB supplement, or situations where the reduced table
depth of an AArch32 short descriptor context may be desirable under an
AArch64 kernel. It also only happens to work on current implementations
which do support all the relevant formats.

Introduce an explicit notion of context format, so we can manage that
independently and get rid of the inflexible #ifdeffery.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu.c | 98 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 80 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index acff332..1d4285f 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -117,6 +117,8 @@
 #define ID0_NTS				(1 << 28)
 #define ID0_SMS				(1 << 27)
 #define ID0_ATOSNS			(1 << 26)
+#define ID0_PTFS_NO_AARCH32		(1 << 25)
+#define ID0_PTFS_NO_AARCH32S		(1 << 24)
 #define ID0_CTTW			(1 << 14)
 #define ID0_NUMIRPT_SHIFT		16
 #define ID0_NUMIRPT_MASK		0xff
@@ -317,6 +319,11 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
 #define ARM_SMMU_FEAT_TRANS_OPS		(1 << 5)
 #define ARM_SMMU_FEAT_VMID16		(1 << 6)
+#define ARM_SMMU_FEAT_FMT_AARCH64_4K	(1 << 7)
+#define ARM_SMMU_FEAT_FMT_AARCH64_16K	(1 << 8)
+#define ARM_SMMU_FEAT_FMT_AARCH64_64K	(1 << 9)
+#define ARM_SMMU_FEAT_FMT_AARCH32_L	(1 << 10)
+#define ARM_SMMU_FEAT_FMT_AARCH32_S	(1 << 11)
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
@@ -346,10 +353,18 @@ struct arm_smmu_device {
 	u32				cavium_id_base; /* Specific to Cavium */
 };
 
+enum arm_smmu_context_fmt {
+	ARM_SMMU_CTX_FMT_NONE,
+	ARM_SMMU_CTX_FMT_AARCH64,
+	ARM_SMMU_CTX_FMT_AARCH32_L,
+	ARM_SMMU_CTX_FMT_AARCH32_S,
+};
+
 struct arm_smmu_cfg {
 	u8				cbndx;
 	u8				irptndx;
 	u32				cbar;
+	enum arm_smmu_context_fmt	fmt;
 };
 #define INVALID_IRPTNDX			0xff
 
@@ -619,14 +634,13 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 		reg = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
 
-		if (!IS_ENABLED(CONFIG_64BIT) || smmu->version == ARM_SMMU_V1) {
+		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
 			iova &= ~12UL;
 			iova |= ARM_SMMU_CB_ASID(smmu, cfg);
 			do {
 				writel_relaxed(iova, reg);
 				iova += granule;
 			} while (size -= granule);
-#ifdef CONFIG_64BIT
 		} else {
 			iova >>= 12;
 			iova |= (u64)ARM_SMMU_CB_ASID(smmu, cfg) << 48;
@@ -634,9 +648,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 				writeq_relaxed(iova, reg);
 				iova += granule >> 12;
 			} while (size -= granule);
-#endif
 		}
-#ifdef CONFIG_64BIT
 	} else if (smmu->version == ARM_SMMU_V2) {
 		reg = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
@@ -646,7 +658,6 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 			smmu_write_atomic_lq(iova, reg);
 			iova += granule >> 12;
 		} while (size -= granule);
-#endif
 	} else {
 		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
 		writel_relaxed(ARM_SMMU_CB_VMID(smmu, cfg), reg);
@@ -745,11 +756,10 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 
 	if (smmu->version > ARM_SMMU_V1) {
-#ifdef CONFIG_64BIT
-		reg = CBA2R_RW64_64BIT;
-#else
-		reg = CBA2R_RW64_32BIT;
-#endif
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
+			reg = CBA2R_RW64_64BIT;
+		else
+			reg = CBA2R_RW64_32BIT;
 		/* 16-bit VMIDs live in CBA2R */
 		if (smmu->features & ARM_SMMU_FEAT_VMID16)
 			reg |= ARM_SMMU_CB_VMID(smmu, cfg) << CBA2R_VMID_SHIFT;
@@ -860,16 +870,48 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
 		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
+	/*
+	 * Choosing a suitable context format is even more fiddly. Until we
+	 * grow some way for the caller to express a preference, just aim for
+	 * the closest match to the CPU format out of what we might have.
+	 */
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_L)
+		cfg->fmt = ARM_SMMU_CTX_FMT_AARCH32_L;
+	if (IS_ENABLED(CONFIG_64BIT) || cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
+		switch (PAGE_SIZE) {
+		case SZ_64K:
+			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K) {
+				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
+				break;
+			} /* else fall through */
+		case SZ_16K:
+			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K) {
+				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
+				break;
+			} /* else fall through */
+		case SZ_4K:
+			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_4K)
+				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
+		}
+	}
+	if (cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
 	switch (smmu_domain->stage) {
 	case ARM_SMMU_DOMAIN_S1:
 		cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
 		start = smmu->num_s2_context_banks;
 		ias = smmu->va_size;
 		oas = smmu->ipa_size;
-		if (IS_ENABLED(CONFIG_64BIT))
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
 			fmt = ARM_64_LPAE_S1;
-		else
+		} else {
 			fmt = ARM_32_LPAE_S1;
+			ias = min(ias, 32UL);
+			oas = min(oas, 40UL);
+		}
 		break;
 	case ARM_SMMU_DOMAIN_NESTED:
 		/*
@@ -881,10 +923,13 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		start = 0;
 		ias = smmu->ipa_size;
 		oas = smmu->pa_size;
-		if (IS_ENABLED(CONFIG_64BIT))
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
 			fmt = ARM_64_LPAE_S2;
-		else
+		} else {
 			fmt = ARM_32_LPAE_S2;
+			ias = min(ias, 40UL);
+			oas = min(oas, 40UL);
+		}
 		break;
 	default:
 		ret = -EINVAL;
@@ -1670,6 +1715,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 					   ID0_NUMSIDB_MASK;
 	}
 
+	if (smmu->version < ARM_SMMU_V2 || !(id & ID0_PTFS_NO_AARCH32)) {
+		smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_L;
+		if (!(id & ID0_PTFS_NO_AARCH32S))
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_S;
+	}
+
 	/* ID1 */
 	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID1);
 	smmu->pgshift = (id & ID1_PAGESIZE) ? 16 : 12;
@@ -1725,7 +1776,6 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 
 	if (smmu->version == ARM_SMMU_V1) {
 		smmu->va_size = smmu->ipa_size;
-		size = SZ_4K | SZ_2M | SZ_1G;
 	} else {
 		size = (id >> ID2_UBS_SHIFT) & ID2_UBS_MASK;
 		smmu->va_size = arm_smmu_id_size_to_bits(size);
@@ -1734,13 +1784,25 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 #endif
 		size = 0;
 		if (id & ID2_PTFS_4K)
-			size |= SZ_4K | SZ_2M | SZ_1G;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_4K;
 		if (id & ID2_PTFS_16K)
-			size |= SZ_16K | SZ_32M;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_16K;
 		if (id & ID2_PTFS_64K)
-			size |= SZ_64K | SZ_512M;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_64K;
 	}
 
+	/* Now we've corralled the various formats, what'll it do? */
+	size = 0;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_S)
+		size |= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
+	if (smmu->features &
+	    (ARM_SMMU_FEAT_FMT_AARCH32_L | ARM_SMMU_FEAT_FMT_AARCH64_4K))
+		size |= SZ_4K | SZ_2M | SZ_1G;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K)
+		size |= SZ_16K | SZ_32M;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K)
+		size |= SZ_64K | SZ_512M;
+
 	arm_smmu_ops.pgsize_bitmap &= size;
 	dev_notice(smmu->dev, "\tSupported page sizes: 0x%08lx\n", size);
 
-- 
2.7.3.dirty

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [PATCH 7/7] iommu/arm-smmu: Support SMMUv1 64KB supplement
  2016-04-13 17:12 ` Robin Murphy
@ 2016-04-13 17:13     ` Robin Murphy
  -1 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-13 17:13 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A
  Cc: Eric Auger, iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

The 64KB Translation Granule Supplement to the SMMUv1 architecture
allows an SMMUv1 implementation to support 64KB pages for stage 2
translations, using a constrained VMSAv8 descriptor format limited
to 40-bit addresses. Now that we can freely mix and match context
formats, we can actually handle having 4KB pages via an AArch32
context but 64KB pages via an AArch64 context, so plumb it in.

It is assumed that any implementations will have hardware capabilities
matching the format constraints, thus obviating the need for excessive
sanity-checking; this is the case for MMU-401, the only ARM Ltd.
implementation.

CC: Eric Auger <eric.auger-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org>
Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---
 drivers/iommu/arm-smmu.c | 22 ++++++++++++++--------
 1 file changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 1d4285f..00aa948 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -277,7 +277,8 @@ MODULE_PARM_DESC(disable_bypass,
 	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
 
 enum arm_smmu_arch_version {
-	ARM_SMMU_V1 = 1,
+	ARM_SMMU_V1,
+	ARM_SMMU_V1_64K,
 	ARM_SMMU_V2,
 };
 
@@ -769,7 +770,7 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 
 	/* CBAR */
 	reg = cfg->cbar;
-	if (smmu->version == ARM_SMMU_V1)
+	if (smmu->version < ARM_SMMU_V2)
 		reg |= cfg->irptndx << CBAR_IRPTNDX_SHIFT;
 
 	/*
@@ -942,7 +943,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		goto out_unlock;
 
 	cfg->cbndx = ret;
-	if (smmu->version == ARM_SMMU_V1) {
+	if (smmu->version < ARM_SMMU_V2) {
 		cfg->irptndx = atomic_inc_return(&smmu->irptndx);
 		cfg->irptndx %= smmu->num_context_irqs;
 	} else {
@@ -1627,7 +1628,8 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 	bool cttw_dt, cttw_reg;
 
 	dev_notice(smmu->dev, "probing hardware configuration...\n");
-	dev_notice(smmu->dev, "SMMUv%d with:\n", smmu->version);
+	dev_notice(smmu->dev, "SMMUv%d with:\n",
+			smmu->version == ARM_SMMU_V2 ? 2 : 1);
 
 	/* ID0 */
 	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID0);
@@ -1659,7 +1661,8 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 		return -ENODEV;
 	}
 
-	if ((id & ID0_S1TS) && ((smmu->version == 1) || !(id & ID0_ATOSNS))) {
+	if ((id & ID0_S1TS) &&
+		((smmu->version < ARM_SMMU_V2) || !(id & ID0_ATOSNS))) {
 		smmu->features |= ARM_SMMU_FEAT_TRANS_OPS;
 		dev_notice(smmu->dev, "\taddress translation ops\n");
 	}
@@ -1774,8 +1777,10 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 		dev_warn(smmu->dev,
 			 "failed to set DMA mask for table walker\n");
 
-	if (smmu->version == ARM_SMMU_V1) {
+	if (smmu->version < ARM_SMMU_V2) {
 		smmu->va_size = smmu->ipa_size;
+		if (smmu->version == ARM_SMMU_V1_64K)
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_64K;
 	} else {
 		size = (id >> ID2_UBS_SHIFT) & ID2_UBS_MASK;
 		smmu->va_size = arm_smmu_id_size_to_bits(size);
@@ -1827,6 +1832,7 @@ static struct arm_smmu_match_data name = { .version = ver, .model = imp }
 
 ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
+ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
 ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
 
@@ -1834,7 +1840,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
 	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
 	{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
 	{ .compatible = "arm,mmu-400", .data = &smmu_generic_v1 },
-	{ .compatible = "arm,mmu-401", .data = &smmu_generic_v1 },
+	{ .compatible = "arm,mmu-401", .data = &arm_mmu401 },
 	{ .compatible = "arm,mmu-500", .data = &arm_mmu500 },
 	{ .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 },
 	{ },
@@ -1928,7 +1934,7 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev)
 
 	parse_driver_options(smmu);
 
-	if (smmu->version > ARM_SMMU_V1 &&
+	if (smmu->version == ARM_SMMU_V2 &&
 	    smmu->num_context_banks != smmu->num_context_irqs) {
 		dev_err(dev,
 			"found only %d context interrupt(s) but %d required\n",
-- 
2.7.3.dirty

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH 1/7] iommu/arm-smmu: Differentiate specific implementations
  2016-04-13 17:12     ` Robin Murphy
@ 2016-04-13 21:15         ` Chalamarla, Tirumalesh
  -1 siblings, 0 replies; 56+ messages in thread
From: Chalamarla, Tirumalesh @ 2016-04-13 21:15 UTC (permalink / raw)
  To: Robin Murphy, will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r






On 4/13/16, 10:12 AM, "Robin Murphy" <robin.murphy-5wv7dgnIgG8@public.gmane.org> wrote:

>As the inevitable reality of implementation-specific errata workarounds
>begin to accrue alongside our integration quirk handling, it's about
>time the driver had a decent way of keeping track. Extend the per-SMMU
>data so we can identify specific implementations in an efficient and
>firmware-agnostic manner.
>
>Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>---
> drivers/iommu/arm-smmu.c | 33 ++++++++++++++++++++++++++-------
> 1 file changed, 26 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>index e933679..2d5f357 100644
>--- a/drivers/iommu/arm-smmu.c
>+++ b/drivers/iommu/arm-smmu.c
>@@ -278,6 +278,10 @@ enum arm_smmu_arch_version {
> 	ARM_SMMU_V2,
> };
> 
>+enum arm_smmu_implementation {
>+	GENERIC_SMMU,
>+};
>+
> struct arm_smmu_smr {
> 	u8				idx;
> 	u16				mask;
>@@ -315,6 +319,7 @@ struct arm_smmu_device {
> #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
> 	u32				options;
> 	enum arm_smmu_arch_version	version;
>+	enum arm_smmu_implementation	model;
> 
> 	u32				num_context_banks;
> 	u32				num_s2_context_banks;
>@@ -1735,13 +1740,24 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
> 	return 0;
> }
> 
>+struct arm_smmu_match_data {
>+	enum arm_smmu_arch_version version;
>+	enum arm_smmu_implementation model;
>+};
>+
>+#define ARM_SMMU_MATCH_DATA(name, ver, imp)	\
>+static struct arm_smmu_match_data name = { .version = ver, .model = imp }
>+
>+ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
>+ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
>+
> static const struct of_device_id arm_smmu_of_match[] = {
>-	{ .compatible = "arm,smmu-v1", .data = (void *)ARM_SMMU_V1 },
>-	{ .compatible = "arm,smmu-v2", .data = (void *)ARM_SMMU_V2 },
>-	{ .compatible = "arm,mmu-400", .data = (void *)ARM_SMMU_V1 },
>-	{ .compatible = "arm,mmu-401", .data = (void *)ARM_SMMU_V1 },
>-	{ .compatible = "arm,mmu-500", .data = (void *)ARM_SMMU_V2 },
>-	{ .compatible = "cavium,smmu-v2", .data = (void *)ARM_SMMU_V2 },
>+	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
>+	{ .compatible = "arm,smmu-v2", .data = &smmu_generic_v2 },
>+	{ .compatible = "arm,mmu-400", .data = &smmu_generic_v1 },
>+	{ .compatible = "arm,mmu-401", .data = &smmu_generic_v1 },
>+	{ .compatible = "arm,mmu-500", .data = &smmu_generic_v2 },
>+	{ .compatible = "cavium,smmu-v2", .data = &smmu_generic_v2 },
> 	{ },
> };
> MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
>@@ -1749,6 +1765,7 @@ MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
> static int arm_smmu_device_dt_probe(struct platform_device *pdev)
> {
> 	const struct of_device_id *of_id;
>+	const struct arm_smmu_match_data *data;
> 	struct resource *res;
> 	struct arm_smmu_device *smmu;
> 	struct device *dev = &pdev->dev;
>@@ -1764,7 +1781,9 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev)
> 	smmu->dev = dev;
> 
> 	of_id = of_match_node(arm_smmu_of_match, dev->of_node);
>-	smmu->version = (enum arm_smmu_arch_version)of_id->data;
>+	data = of_id->data;
>+	smmu->version = data->version;
>+	smmu->model = data->model;
> 
> 	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> 	smmu->base = devm_ioremap_resource(dev, res);
>-- 
>2.7.3.dirty
Looks good to me. 

Acked-by: Tirumalesh Chalamarla <tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 2/7] iommu/arm-smmu: Convert ThunderX workaround to new method
  2016-04-13 17:12     ` Robin Murphy
@ 2016-04-13 21:16         ` Chalamarla, Tirumalesh
  -1 siblings, 0 replies; 56+ messages in thread
From: Chalamarla, Tirumalesh @ 2016-04-13 21:16 UTC (permalink / raw)
  To: Robin Murphy, will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r






On 4/13/16, 10:12 AM, "Robin Murphy" <robin.murphy-5wv7dgnIgG8@public.gmane.org> wrote:

>With a framework for implementation-specific functionality in place, the
>currently-FDT-dependent ThunderX workaround gets to be the first user.
>
>Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>---
> drivers/iommu/arm-smmu.c | 27 ++++++++++++++-------------
> 1 file changed, 14 insertions(+), 13 deletions(-)
>
>diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>index 2d5f357..d8bc20a 100644
>--- a/drivers/iommu/arm-smmu.c
>+++ b/drivers/iommu/arm-smmu.c
>@@ -280,6 +280,7 @@ enum arm_smmu_arch_version {
> 
> enum arm_smmu_implementation {
> 	GENERIC_SMMU,
>+	CAVIUM_SMMUV2,
> };
> 
> struct arm_smmu_smr {
>@@ -1686,6 +1687,17 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
> 	}
> 	dev_notice(smmu->dev, "\t%u context banks (%u stage-2 only)\n",
> 		   smmu->num_context_banks, smmu->num_s2_context_banks);
>+	/*
>+	 * Cavium CN88xx erratum #27704.
>+	 * Ensure ASID and VMID allocation is unique across all SMMUs in
>+	 * the system.
>+	 */
>+	if (smmu->model == CAVIUM_SMMUV2) {
>+		smmu->cavium_id_base =
>+			atomic_add_return(smmu->num_context_banks,
>+					  &cavium_smmu_context_count);
>+		smmu->cavium_id_base -= smmu->num_context_banks;
>+	}
> 
> 	/* ID2 */
> 	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID2);
>@@ -1750,6 +1762,7 @@ static struct arm_smmu_match_data name = { .version = ver, .model = imp }
> 
> ARM_SMMU_MATCH_DATA(smmu_generic_v1, ARM_SMMU_V1, GENERIC_SMMU);
> ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU);
>+ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
> 
> static const struct of_device_id arm_smmu_of_match[] = {
> 	{ .compatible = "arm,smmu-v1", .data = &smmu_generic_v1 },
>@@ -1757,7 +1770,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
> 	{ .compatible = "arm,mmu-400", .data = &smmu_generic_v1 },
> 	{ .compatible = "arm,mmu-401", .data = &smmu_generic_v1 },
> 	{ .compatible = "arm,mmu-500", .data = &smmu_generic_v2 },
>-	{ .compatible = "cavium,smmu-v2", .data = &smmu_generic_v2 },
>+	{ .compatible = "cavium,smmu-v2", .data = &cavium_smmuv2 },
> 	{ },
> };
> MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
>@@ -1871,18 +1884,6 @@ static int arm_smmu_device_dt_probe(struct platform_device *pdev)
> 		}
> 	}
> 
>-	/*
>-	 * Cavium CN88xx erratum #27704.
>-	 * Ensure ASID and VMID allocation is unique across all SMMUs in
>-	 * the system.
>-	 */
>-	if (of_device_is_compatible(dev->of_node, "cavium,smmu-v2")) {
>-		smmu->cavium_id_base =
>-			atomic_add_return(smmu->num_context_banks,
>-					  &cavium_smmu_context_count);
>-		smmu->cavium_id_base -= smmu->num_context_banks;
>-	}
>-
> 	INIT_LIST_HEAD(&smmu->list);
> 	spin_lock(&arm_smmu_devices_lock);
> 	list_add(&smmu->list, &arm_smmu_devices);
>-- 
>2.7.3.dirty
This looks fine too.
Acked-by: Tirumalesh Chalamarla <tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>

>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 3/7] iommu/arm-smmu: Work around MMU-500 prefetch errata
  2016-04-13 17:12     ` Robin Murphy
@ 2016-04-21 16:15         ` Will Deacon
  -1 siblings, 0 replies; 56+ messages in thread
From: Will Deacon @ 2016-04-21 16:15 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Catalin Marinas,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Apr 13, 2016 at 06:12:59PM +0100, Robin Murphy wrote:
> MMU-500 erratum #841119 is tickled by a particular set of circumstances
> interacting with the next-page prefetcher. Since said prefetcher is
> quite dumb and actually detrimental to performance in some cases (by
> causing unwanted TLB evictions for non-sequential access patterns), we
> lose very little by turning it off, and what we gain is a guarantee that
> the erratum is never hit.
> 
> As a bonus, the same workaround will also prevent erratum #826419 once
> v7 short descriptor support is implemented.
> 
> CC: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
> CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  Documentation/arm64/silicon-errata.txt |  1 +
>  drivers/iommu/arm-smmu.c               | 16 +++++++++++++++-
>  2 files changed, 16 insertions(+), 1 deletion(-)

Looks fine, but can you update silicon-errata.txt too, please? It's a
handy reference for "what is Linux actively working around" and this
one is somewhat hidden away.

Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 3/7] iommu/arm-smmu: Work around MMU-500 prefetch errata
  2016-04-13 17:12     ` Robin Murphy
@ 2016-04-21 16:16         ` Will Deacon
  -1 siblings, 0 replies; 56+ messages in thread
From: Will Deacon @ 2016-04-21 16:16 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Catalin Marinas,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Wed, Apr 13, 2016 at 06:12:59PM +0100, Robin Murphy wrote:
> MMU-500 erratum #841119 is tickled by a particular set of circumstances
> interacting with the next-page prefetcher. Since said prefetcher is
> quite dumb and actually detrimental to performance in some cases (by
> causing unwanted TLB evictions for non-sequential access patterns), we
> lose very little by turning it off, and what we gain is a guarantee that
> the erratum is never hit.
> 
> As a bonus, the same workaround will also prevent erratum #826419 once
> v7 short descriptor support is implemented.
> 
> CC: Catalin Marinas <catalin.marinas-5wv7dgnIgG8@public.gmane.org>
> CC: Will Deacon <will.deacon-5wv7dgnIgG8@public.gmane.org>
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  Documentation/arm64/silicon-errata.txt |  1 +
>  drivers/iommu/arm-smmu.c               | 16 +++++++++++++++-
>  2 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index 806f91c..c6938e5 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -53,6 +53,7 @@ stable kernels.
>  | ARM            | Cortex-A57      | #832075         | ARM64_ERRATUM_832075    |
>  | ARM            | Cortex-A57      | #852523         | N/A                     |
>  | ARM            | Cortex-A57      | #834220         | ARM64_ERRATUM_834220    |
> +| ARM            | MMU-500         | #841119,#826419 | N/A                     |

Found it :)

Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants
  2016-04-13 17:13     ` Robin Murphy
@ 2016-04-21 16:18         ` Will Deacon
  -1 siblings, 0 replies; 56+ messages in thread
From: Will Deacon @ 2016-04-21 16:18 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Arnd Bergmann, Hitoshi Mitake, Christoph Hellwig,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Darren Hart

On Wed, Apr 13, 2016 at 06:13:00PM +0100, Robin Murphy wrote:
> Whilst commit 9439eb3ab9d1 ("asm-generic: io: implement relaxed
> accessor macros as conditional wrappers") makes the *_relaxed forms of
> I/O accessors universally available to drivers, in cases where writeq()
> is implemented via the io-64-nonatomic helpers, writeq_relaxed() will
> end up falling back to writel() regardless of whether writel_relaxed()
> is available (identically for s/write/read/).
> 
> Add corresponding relaxed forms of the nonatomic helpers to delegate
> to the equivalent 32-bit accessors as appropriate.
> 
> CC: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
> CC: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> CC: Darren Hart <dvhart-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
> CC: Hitoshi Mitake <mitake.hitoshi-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
>  include/linux/io-64-nonatomic-hi-lo.h | 25 +++++++++++++++++++++++++
>  include/linux/io-64-nonatomic-lo-hi.h | 25 +++++++++++++++++++++++++
>  2 files changed, 50 insertions(+)
> 
> diff --git a/include/linux/io-64-nonatomic-hi-lo.h b/include/linux/io-64-nonatomic-hi-lo.h
> index 11d7e84..1a85566 100644
> --- a/include/linux/io-64-nonatomic-hi-lo.h
> +++ b/include/linux/io-64-nonatomic-hi-lo.h
> @@ -21,6 +21,23 @@ static inline void hi_lo_writeq(__u64 val, volatile void __iomem *addr)
>  	writel(val, addr);
>  }
>  
> +static inline __u64 hi_lo_readq_relaxed(const volatile void __iomem *addr)
> +{
> +	const volatile u32 __iomem *p = addr;
> +	u32 low, high;
> +
> +	high = readl_relaxed(p + 1);
> +	low = readl_relaxed(p);
> +
> +	return low + ((u64)high << 32);
> +}
> +
> +static inline void hi_lo_writeq_relaxed(__u64 val, volatile void __iomem *addr)
> +{
> +	writel_relaxed(val >> 32, addr + 4);
> +	writel_relaxed(val, addr);
> +}

Could we not generate the _relaxed variants with some macro magic?

Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/7] iommu/arm-smmu: Decouple context format from kernel config
  2016-04-13 17:13     ` Robin Murphy
@ 2016-04-21 16:30         ` Will Deacon
  -1 siblings, 0 replies; 56+ messages in thread
From: Will Deacon @ 2016-04-21 16:30 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Robin,

On Wed, Apr 13, 2016 at 06:13:02PM +0100, Robin Murphy wrote:
> The way the driver currently forces an AArch32 or AArch64 context format
> based on the kernel config and SMMU architecture version is suboptimal,
> in that it makes it very hard to support oddball mix-and-match cases
> like the SMMUv1 64KB supplement, or situations where the reduced table
> depth of an AArch32 short descriptor context may be desirable under an
> AArch64 kernel. It also only happens to work on current implementations
> which do support all the relevant formats.
> 
> Introduce an explicit notion of context format, so we can manage that
> independently and get rid of the inflexible #ifdeffery.

Thanks for doing all of this! One comment on the page size stuff:

> +	if (IS_ENABLED(CONFIG_64BIT) || cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
> +		switch (PAGE_SIZE) {
> +		case SZ_64K:
> +			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K) {
> +				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
> +				break;
> +			} /* else fall through */
> +		case SZ_16K:
> +			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K) {
> +				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
> +				break;
> +			} /* else fall through */
> +		case SZ_4K:
> +			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_4K)
> +				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
> +		}
> +	}

The io-pgtable code (arm_lpae_restrict_pgsizes) already does something
*very* similar to this, using the pgsize_bitmap as input. Can you not
just choose the ARM_SMMU_CTX_FMT_AARCH64 here and set the pgsize_bitmap
to represent all page sizes supported by the hardware? That way, you should
end up with the best option coming back from the pgtable code (I do this
in the v3 driver, fwiw).

Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants
  2016-04-21 16:18         ` Will Deacon
@ 2016-04-22 17:08             ` Robin Murphy
  -1 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-22 17:08 UTC (permalink / raw)
  To: Will Deacon
  Cc: Arnd Bergmann, Hitoshi Mitake, Christoph Hellwig,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Darren Hart

On 21/04/16 17:18, Will Deacon wrote:
> On Wed, Apr 13, 2016 at 06:13:00PM +0100, Robin Murphy wrote:
>> Whilst commit 9439eb3ab9d1 ("asm-generic: io: implement relaxed
>> accessor macros as conditional wrappers") makes the *_relaxed forms of
>> I/O accessors universally available to drivers, in cases where writeq()
>> is implemented via the io-64-nonatomic helpers, writeq_relaxed() will
>> end up falling back to writel() regardless of whether writel_relaxed()
>> is available (identically for s/write/read/).
>>
>> Add corresponding relaxed forms of the nonatomic helpers to delegate
>> to the equivalent 32-bit accessors as appropriate.
>>
>> CC: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
>> CC: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
>> CC: Darren Hart <dvhart-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
>> CC: Hitoshi Mitake <mitake.hitoshi-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
>> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>> ---
>>   include/linux/io-64-nonatomic-hi-lo.h | 25 +++++++++++++++++++++++++
>>   include/linux/io-64-nonatomic-lo-hi.h | 25 +++++++++++++++++++++++++
>>   2 files changed, 50 insertions(+)
>>
>> diff --git a/include/linux/io-64-nonatomic-hi-lo.h b/include/linux/io-64-nonatomic-hi-lo.h
>> index 11d7e84..1a85566 100644
>> --- a/include/linux/io-64-nonatomic-hi-lo.h
>> +++ b/include/linux/io-64-nonatomic-hi-lo.h
>> @@ -21,6 +21,23 @@ static inline void hi_lo_writeq(__u64 val, volatile void __iomem *addr)
>>   	writel(val, addr);
>>   }
>>
>> +static inline __u64 hi_lo_readq_relaxed(const volatile void __iomem *addr)
>> +{
>> +	const volatile u32 __iomem *p = addr;
>> +	u32 low, high;
>> +
>> +	high = readl_relaxed(p + 1);
>> +	low = readl_relaxed(p);
>> +
>> +	return low + ((u64)high << 32);
>> +}
>> +
>> +static inline void hi_lo_writeq_relaxed(__u64 val, volatile void __iomem *addr)
>> +{
>> +	writel_relaxed(val >> 32, addr + 4);
>> +	writel_relaxed(val, addr);
>> +}
>
> Could we not generate the _relaxed variants with some macro magic?

We _could_ - indeed I started doing that, but then decided that the 
obfuscation of horrible macro-templated functions wasn't worth saving a 
couple of hundred bytes in some code that isn't exactly difficult to 
maintain and has needed touching once in 4 years.

If you did want to go down the macro route, I may as well also generate 
both lo-hi and hi-lo headers all from a single template, it'd be really 
clever... <alarm bells> ;)

Robin.

>
> Will
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/7] iommu/arm-smmu: Decouple context format from kernel config
  2016-04-21 16:30         ` Will Deacon
@ 2016-04-22 17:38             ` Robin Murphy
  -1 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-22 17:38 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 21/04/16 17:30, Will Deacon wrote:
> Hi Robin,
>
> On Wed, Apr 13, 2016 at 06:13:02PM +0100, Robin Murphy wrote:
>> The way the driver currently forces an AArch32 or AArch64 context format
>> based on the kernel config and SMMU architecture version is suboptimal,
>> in that it makes it very hard to support oddball mix-and-match cases
>> like the SMMUv1 64KB supplement, or situations where the reduced table
>> depth of an AArch32 short descriptor context may be desirable under an
>> AArch64 kernel. It also only happens to work on current implementations
>> which do support all the relevant formats.
>>
>> Introduce an explicit notion of context format, so we can manage that
>> independently and get rid of the inflexible #ifdeffery.
>
> Thanks for doing all of this! One comment on the page size stuff:
>
>> +	if (IS_ENABLED(CONFIG_64BIT) || cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
>> +		switch (PAGE_SIZE) {
>> +		case SZ_64K:
>> +			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K) {
>> +				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
>> +				break;
>> +			} /* else fall through */
>> +		case SZ_16K:
>> +			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K) {
>> +				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
>> +				break;
>> +			} /* else fall through */
>> +		case SZ_4K:
>> +			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_4K)
>> +				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
>> +		}
>> +	}
>
> The io-pgtable code (arm_lpae_restrict_pgsizes) already does something
> *very* similar to this, using the pgsize_bitmap as input. Can you not
> just choose the ARM_SMMU_CTX_FMT_AARCH64 here and set the pgsize_bitmap
> to represent all page sizes supported by the hardware? That way, you should
> end up with the best option coming back from the pgtable code (I do this
> in the v3 driver, fwiw).

It took a while to come back to me, but the problem is that we can't 
call alloc_io_pgtable_ops(fmt...) before choosing either ARM_64_LPAE_Sx 
or ARM_32_LPAE_Sx for fmt, but if we commit to 64-bit we'll get stuck 
later without the possibility of falling back to AArch32 if it turns out 
we don't have a viable AArch64 granule. Plus we'd have to remove the 
LPAE page sizes from the bitmap beforehand in the case we don't support 
AARCH64_4K lest we end up with a phantom format the hardware doesn't 
actually do, and it all starts getting rather horrible...

Robin.

> Will
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/7] iommu/arm-smmu: Decouple context format from kernel config
  2016-04-22 17:38             ` Robin Murphy
@ 2016-04-25 11:02                 ` Will Deacon
  -1 siblings, 0 replies; 56+ messages in thread
From: Will Deacon @ 2016-04-25 11:02 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Apr 22, 2016 at 06:38:04PM +0100, Robin Murphy wrote:
> On 21/04/16 17:30, Will Deacon wrote:
> >Hi Robin,
> >
> >On Wed, Apr 13, 2016 at 06:13:02PM +0100, Robin Murphy wrote:
> >>The way the driver currently forces an AArch32 or AArch64 context format
> >>based on the kernel config and SMMU architecture version is suboptimal,
> >>in that it makes it very hard to support oddball mix-and-match cases
> >>like the SMMUv1 64KB supplement, or situations where the reduced table
> >>depth of an AArch32 short descriptor context may be desirable under an
> >>AArch64 kernel. It also only happens to work on current implementations
> >>which do support all the relevant formats.
> >>
> >>Introduce an explicit notion of context format, so we can manage that
> >>independently and get rid of the inflexible #ifdeffery.
> >
> >Thanks for doing all of this! One comment on the page size stuff:
> >
> >>+	if (IS_ENABLED(CONFIG_64BIT) || cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
> >>+		switch (PAGE_SIZE) {
> >>+		case SZ_64K:
> >>+			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K) {
> >>+				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
> >>+				break;
> >>+			} /* else fall through */
> >>+		case SZ_16K:
> >>+			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K) {
> >>+				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
> >>+				break;
> >>+			} /* else fall through */
> >>+		case SZ_4K:
> >>+			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_4K)
> >>+				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
> >>+		}
> >>+	}
> >
> >The io-pgtable code (arm_lpae_restrict_pgsizes) already does something
> >*very* similar to this, using the pgsize_bitmap as input. Can you not
> >just choose the ARM_SMMU_CTX_FMT_AARCH64 here and set the pgsize_bitmap
> >to represent all page sizes supported by the hardware? That way, you should
> >end up with the best option coming back from the pgtable code (I do this
> >in the v3 driver, fwiw).
> 
> It took a while to come back to me, but the problem is that we can't call
> alloc_io_pgtable_ops(fmt...) before choosing either ARM_64_LPAE_Sx or
> ARM_32_LPAE_Sx for fmt, but if we commit to 64-bit we'll get stuck later
> without the possibility of falling back to AArch32 if it turns out we don't
> have a viable AArch64 granule.

In what case would you not have a viable AArch64 granule, but the option
of falling back to AArch32 makes things work?

> Plus we'd have to remove the LPAE page sizes
> from the bitmap beforehand in the case we don't support AARCH64_4K lest we
> end up with a phantom format the hardware doesn't actually do, and it all
> starts getting rather horrible...

I'm not following. The io-pgtable code shouldn't give you back a phantom
format.

Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/7] iommu/arm-smmu: Decouple context format from kernel config
  2016-04-25 11:02                 ` Will Deacon
@ 2016-04-25 13:14                     ` Robin Murphy
  -1 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-25 13:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On 25/04/16 12:02, Will Deacon wrote:
> On Fri, Apr 22, 2016 at 06:38:04PM +0100, Robin Murphy wrote:
>> On 21/04/16 17:30, Will Deacon wrote:
>>> Hi Robin,
>>>
>>> On Wed, Apr 13, 2016 at 06:13:02PM +0100, Robin Murphy wrote:
>>>> The way the driver currently forces an AArch32 or AArch64 context format
>>>> based on the kernel config and SMMU architecture version is suboptimal,
>>>> in that it makes it very hard to support oddball mix-and-match cases
>>>> like the SMMUv1 64KB supplement, or situations where the reduced table
>>>> depth of an AArch32 short descriptor context may be desirable under an
>>>> AArch64 kernel. It also only happens to work on current implementations
>>>> which do support all the relevant formats.
>>>>
>>>> Introduce an explicit notion of context format, so we can manage that
>>>> independently and get rid of the inflexible #ifdeffery.
>>>
>>> Thanks for doing all of this! One comment on the page size stuff:
>>>
>>>> +	if (IS_ENABLED(CONFIG_64BIT) || cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
>>>> +		switch (PAGE_SIZE) {
>>>> +		case SZ_64K:
>>>> +			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K) {
>>>> +				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
>>>> +				break;
>>>> +			} /* else fall through */
>>>> +		case SZ_16K:
>>>> +			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K) {
>>>> +				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
>>>> +				break;
>>>> +			} /* else fall through */
>>>> +		case SZ_4K:
>>>> +			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_4K)
>>>> +				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
>>>> +		}
>>>> +	}
>>>
>>> The io-pgtable code (arm_lpae_restrict_pgsizes) already does something
>>> *very* similar to this, using the pgsize_bitmap as input. Can you not
>>> just choose the ARM_SMMU_CTX_FMT_AARCH64 here and set the pgsize_bitmap
>>> to represent all page sizes supported by the hardware? That way, you should
>>> end up with the best option coming back from the pgtable code (I do this
>>> in the v3 driver, fwiw).
>>
>> It took a while to come back to me, but the problem is that we can't call
>> alloc_io_pgtable_ops(fmt...) before choosing either ARM_64_LPAE_Sx or
>> ARM_32_LPAE_Sx for fmt, but if we commit to 64-bit we'll get stuck later
>> without the possibility of falling back to AArch32 if it turns out we don't
>> have a viable AArch64 granule.
>
> In what case would you not have a viable AArch64 granule, but the option
> of falling back to AArch32 makes things work?

A 4KB page kernel with MMU-401, whose only supported AArch64 granule is 
supposedly 64KB.

>> Plus we'd have to remove the LPAE page sizes
>> from the bitmap beforehand in the case we don't support AARCH64_4K lest we
>> end up with a phantom format the hardware doesn't actually do, and it all
>> starts getting rather horrible...
>
> I'm not following. The io-pgtable code shouldn't give you back a phanton
> format.

If you call alloc_io_pgtable_ops() with ARM_64_LPAE_S2 and an unmolested 
MMU-401 pgsize_bitmap, you get back a 4K granule v8 pagetable, which per 
a strict reading of the TRM isn't supported (although does appear to 
work in practice...)

Consider also a stage 1 SMMU supporting all formats but only with 4K 
granules: with a 64K page arm64 kernel, the presence of AARCH64_4K 
support would lead us into arm_64_lpae_alloc_pgtable_s1(), which would 
be tricked into giving us back a 64K granule v8 format by the short 
descriptor large page size matching PAGE_SIZE, and things would 
definitely go downhill from there.

Given that, I think the only truly safe thing to do is to pass an 
explicit granule in the io_pgtable_cfg and just get rid of the 
bitmap-based guessing in arm_lpae_restrict_pgsizes() - otherwise, we'd 
still have to pre-restrict the bitmap, making it pretty much redundant 
anyway. I'll have a hack at that - in the meantime please feel free to 
queue patches 1-3 if you're happy with them, as that part is still 
feature-complete without all this context stuff.

Robin.

>
> Will
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants
  2016-04-22 17:08             ` Robin Murphy
@ 2016-04-25 13:32                 ` Will Deacon
  -1 siblings, 0 replies; 56+ messages in thread
From: Will Deacon @ 2016-04-25 13:32 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Arnd Bergmann, Hitoshi Mitake, Christoph Hellwig,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Darren Hart

On Fri, Apr 22, 2016 at 06:08:46PM +0100, Robin Murphy wrote:
> On 21/04/16 17:18, Will Deacon wrote:
> >On Wed, Apr 13, 2016 at 06:13:00PM +0100, Robin Murphy wrote:
> >>Whilst commit 9439eb3ab9d1 ("asm-generic: io: implement relaxed
> >>accessor macros as conditional wrappers") makes the *_relaxed forms of
> >>I/O accessors universally available to drivers, in cases where writeq()
> >>is implemented via the io-64-nonatomic helpers, writeq_relaxed() will
> >>end up falling back to writel() regardless of whether writel_relaxed()
> >>is available (identically for s/write/read/).
> >>
> >>Add corresponding relaxed forms of the nonatomic helpers to delegate
> >>to the equivalent 32-bit accessors as appropriate.
> >>
> >>CC: Arnd Bergmann <arnd@arndb.de>
> >>CC: Christoph Hellwig <hch@lst.de>
> >>CC: Darren Hart <dvhart@linux.intel.com>
> >>CC: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
> >>Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>---
> >>  include/linux/io-64-nonatomic-hi-lo.h | 25 +++++++++++++++++++++++++
> >>  include/linux/io-64-nonatomic-lo-hi.h | 25 +++++++++++++++++++++++++
> >>  2 files changed, 50 insertions(+)
> >>
> >>diff --git a/include/linux/io-64-nonatomic-hi-lo.h b/include/linux/io-64-nonatomic-hi-lo.h
> >>index 11d7e84..1a85566 100644
> >>--- a/include/linux/io-64-nonatomic-hi-lo.h
> >>+++ b/include/linux/io-64-nonatomic-hi-lo.h
> >>@@ -21,6 +21,23 @@ static inline void hi_lo_writeq(__u64 val, volatile void __iomem *addr)
> >>  	writel(val, addr);
> >>  }
> >>
> >>+static inline __u64 hi_lo_readq_relaxed(const volatile void __iomem *addr)
> >>+{
> >>+	const volatile u32 __iomem *p = addr;
> >>+	u32 low, high;
> >>+
> >>+	high = readl_relaxed(p + 1);
> >>+	low = readl_relaxed(p);
> >>+
> >>+	return low + ((u64)high << 32);
> >>+}
> >>+
> >>+static inline void hi_lo_writeq_relaxed(__u64 val, volatile void __iomem *addr)
> >>+{
> >>+	writel_relaxed(val >> 32, addr + 4);
> >>+	writel_relaxed(val, addr);
> >>+}
> >
> >Could we not generate the _relaxed variants with some macro magic?
> 
> We _could_ - indeed I started doing that, but then decided that the
> obfuscation of horrible macro-templated functions wasn't worth saving a
> couple of hundred bytes in some code that isn't exactly difficult to
> maintain and has needed touching once in 4 years.
> 
> If you did want to go down the macro route, I may as well also generate both
> lo-hi and hi-lo headers all from a single template, it'd be really clever...
> <alarm bells> ;)

I certainly wasn't suggesting any more than the obvious macroisation,
but I'll leave it up to Arnd, as I think this falls on his lap.
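
For reference, the kind of macro template under discussion might look
roughly like this (a hypothetical sketch with userspace stand-ins for
the readl()/writel() accessors so it is self-contained; the posted
patch deliberately keeps the helpers open-coded instead):

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Userspace stand-ins for the real MMIO accessors */
static u32 fake_readl_relaxed(const volatile u32 *p)	{ return *p; }
static void fake_writel_relaxed(u32 v, volatile u32 *p)	{ *p = v; }

/*
 * One template generating a matched pair of 64-bit helpers from the
 * given 32-bit accessors; instantiating it once with readl()/writel()
 * and once with the _relaxed variants would yield all four functions.
 */
#define BUILD_HI_LO_Q(suffix, readl_fn, writel_fn)			\
static inline u64 hi_lo_readq##suffix(const volatile u32 *p)		\
{									\
	u32 high = readl_fn(p + 1);					\
	u32 low = readl_fn(p);						\
	return low + ((u64)high << 32);					\
}									\
static inline void hi_lo_writeq##suffix(u64 val, volatile u32 *p)	\
{									\
	writel_fn(val >> 32, p + 1);					\
	writel_fn(val, p);						\
}

BUILD_HI_LO_Q(_relaxed, fake_readl_relaxed, fake_writel_relaxed)

static u32 mem[2];

/* Round-trip a value through the generated pair, for checking */
static u64 roundtrip_relaxed(u64 val)
{
	hi_lo_writeq_relaxed(val, mem);
	return hi_lo_readq_relaxed(mem);
}
```

This is the trade-off Robin describes: one macro saves a little
duplication at the cost of grep-ability of the generated names.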

Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/7] iommu/arm-smmu: Decouple context format from kernel config
  2016-04-25 13:14                     ` Robin Murphy
@ 2016-04-25 13:41                         ` Will Deacon
  -1 siblings, 0 replies; 56+ messages in thread
From: Will Deacon @ 2016-04-25 13:41 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, Apr 25, 2016 at 02:14:57PM +0100, Robin Murphy wrote:
> On 25/04/16 12:02, Will Deacon wrote:
> >In what case would you not have a viable AArch64 granule, but the option
> >of falling back to AArch32 makes things work?
> 
> A 4KB page kernel with MMU-401, whose only supported AArch64 granule is
> supposedly 64KB.

That doesn't sound right at all!

> >>Plus we'd have to remove the LPAE page sizes
> >>from the bitmap beforehand in the case we don't support AARCH64_4K lest we
> >>end up with a phantom format the hardware doesn't actually do, and it all
> >>starts getting rather horrible...
> >
> >I'm not following. The io-pgtable code shouldn't give you back a phantom
> >format.
> 
> If you call alloc_io_pgtable_ops() with ARM_64_LPAE_S2 and an unmolested
> MMU-401 pgsize_bitmap, you get back a 4K granule v8 pagetable, which per a
> strict reading of the TRM isn't supported (although does appear to work in
> practice...)

I suspect that's a badly worded/incorrect TRM. In reality, I'd be happy
to assume that a given granule is supported across all formats (in as
much as it can be) if it's supported by one of them, but acknowledge
that's not guaranteed by the architecture.

> Consider also a stage 1 SMMU supporting all formats but only with 4K
> granules: with a 64K page arm64 kernel, the presence of AARCH64_4K support
> would lead us into arm_64_lpae_alloc_pgtable_s1(), which would be tricked
> into giving us back a 64K granule v8 format by the short descriptor large
> page size matching PAGE_SIZE, and things would definitely go downhill from
> there.

We could improve this by having a separate pgsize_bitmap per format, and
passing the relevant one to the relevant page table constructor.

> Given that, I think the only truly safe thing to do is to pass an explicit
> granule in the io_pgtable_cfg and just get rid of the bitmap-based guessing
> in arm_lpae_restrict_pgsizes() - otherwise, we'd still have to pre-restrict
> the bitmap, making it pretty much redundant anyway. I'll have a hack at that

Actually, I think I prefer to go the other way. Let's try moving more of
this common logic into the io-pgtable code. For example, an IOMMU driver
could say "I support these formats, and these options (page sizes, quirks,
etc) for each format" and the io-pgtable code can choose the most
appropriate thing.
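
Sketched as code, that interface could amount to each driver handing
the io-pgtable core a per-format capability table (all names below are
hypothetical; nothing like this exists in the kernel as of this
thread):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-format capabilities an IOMMU driver might advertise */
enum pgtbl_fmt { FMT_ARM_64_LPAE_S1, FMT_ARM_32_LPAE_S1, FMT_NONE };

struct pgtbl_fmt_caps {
	enum pgtbl_fmt fmt;
	unsigned long pgsize_bitmap;	/* page sizes valid for this format */
	unsigned long quirks;
};

/*
 * Toy stand-in for "io-pgtable chooses the most appropriate thing":
 * take the first advertised format whose own bitmap covers the CPU
 * page size, instead of guessing from one shared bitmap.
 */
static enum pgtbl_fmt pick_fmt(const struct pgtbl_fmt_caps *caps,
			       size_t n, unsigned long page_size)
{
	for (size_t i = 0; i < n; i++)
		if (caps[i].pgsize_bitmap & page_size)
			return caps[i].fmt;
	return FMT_NONE;
}

/* Demo: a stage 1 SMMU supporting both formats, but 4K granules only */
static enum pgtbl_fmt pick_for_demo(unsigned long page_size)
{
	const struct pgtbl_fmt_caps caps[] = {
		{ FMT_ARM_64_LPAE_S1, 0x40201000UL, 0 }, /* 4K | 2M | 1G */
		{ FMT_ARM_32_LPAE_S1, 0x40201000UL, 0 },
	};

	return pick_fmt(caps, 2, page_size);
}
```

On a 64K-page kernel (page size 0x10000) this fails cleanly with
FMT_NONE rather than being tricked into a 64K-granule v8 table.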

> - in the meantime please feel free to queue patches 1-3 if you're happy with
> them, as that part is still feature-complete without all this context stuff.

Queued those, thanks!

Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants
  2016-04-25 13:32                 ` Will Deacon
@ 2016-04-25 15:21                     ` Arnd Bergmann
  -1 siblings, 0 replies; 56+ messages in thread
From: Arnd Bergmann @ 2016-04-25 15:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: Hitoshi Mitake, Christoph Hellwig,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Darren Hart

On Monday 25 April 2016 14:32:42 Will Deacon wrote:
> > >>
> > >>+static inline __u64 hi_lo_readq_relaxed(const volatile void __iomem *addr)
> > >>+{
> > >>+   const volatile u32 __iomem *p = addr;
> > >>+   u32 low, high;
> > >>+
> > >>+   high = readl_relaxed(p + 1);
> > >>+   low = readl_relaxed(p);
> > >>+
> > >>+   return low + ((u64)high << 32);
> > >>+}
> > >>+
> > >>+static inline void hi_lo_writeq_relaxed(__u64 val, volatile void __iomem *addr)
> > >>+{
> > >>+   writel_relaxed(val >> 32, addr + 4);
> > >>+   writel_relaxed(val, addr);
> > >>+}
> > >
> > >Could we not generate the _relaxed variants with some macro magic?
> > 
> > We _could_ - indeed I started doing that, but then decided that the
> > obfuscation of horrible macro-templated functions wasn't worth saving a
> > couple of hundred bytes in some code that isn't exactly difficult to
> > maintain and has needed touching once in 4 years.
> > 
> > If you did want to go down the macro route, I may as well also generate both
> > lo-hi and hi-lo headers all from a single template, it'd be really clever...
> > <alarm bells> 
> 
> I certainly wasn't suggesting any more than the obvious macroisation,
> but I'll leave it up to Arnd, as I think this falls on his lap.

I'd prefer the open-coded variant as well.

	Arnd

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants
  2016-04-25 15:21                     ` Arnd Bergmann
@ 2016-04-25 15:28                       ` Robin Murphy
  -1 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-25 15:28 UTC (permalink / raw)
  To: Arnd Bergmann, Will Deacon
  Cc: Hitoshi Mitake, Christoph Hellwig,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Darren Hart

Hi Arnd,

On 25/04/16 16:21, Arnd Bergmann wrote:
> On Monday 25 April 2016 14:32:42 Will Deacon wrote:
>>>>>
>>>>> +static inline __u64 hi_lo_readq_relaxed(const volatile void __iomem *addr)
>>>>> +{
>>>>> +   const volatile u32 __iomem *p = addr;
>>>>> +   u32 low, high;
>>>>> +
>>>>> +   high = readl_relaxed(p + 1);
>>>>> +   low = readl_relaxed(p);
>>>>> +
>>>>> +   return low + ((u64)high << 32);
>>>>> +}
>>>>> +
>>>>> +static inline void hi_lo_writeq_relaxed(__u64 val, volatile void __iomem *addr)
>>>>> +{
>>>>> +   writel_relaxed(val >> 32, addr + 4);
>>>>> +   writel_relaxed(val, addr);
>>>>> +}
>>>>
>>>> Could we not generate the _relaxed variants with some macro magic?
>>>
>>> We _could_ - indeed I started doing that, but then decided that the
>>> obfuscation of horrible macro-templated functions wasn't worth saving a
>>> couple of hundred bytes in some code that isn't exactly difficult to
>>> maintain and has needed touching once in 4 years.
>>>
>>> If you did want to go down the macro route, I may as well also generate both
>>> lo-hi and hi-lo headers all from a single template, it'd be really clever...
>>> <alarm bells>
>>
>> I certainly wasn't suggesting any more than the obvious macroisation,
>> but I'll leave it up to Arnd, as I think this falls on his lap.
>
> I'd prefer the open-coded variant as well.

By that, do you mean sticking with the smmu_writeq() implementation in 
the driver and dropping this patch, or merging this patch as-is without 
further macro-magic?

Thanks,
Robin.

>
> 	Arnd
>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants
  2016-04-25 15:28                       ` Robin Murphy
@ 2016-04-25 15:41                           ` Arnd Bergmann
  -1 siblings, 0 replies; 56+ messages in thread
From: Arnd Bergmann @ 2016-04-25 15:41 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Hitoshi Mitake, Will Deacon, Christoph Hellwig,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Darren Hart

On Monday 25 April 2016 16:28:01 Robin Murphy wrote:
> >>>
> >>> We _could_ - indeed I started doing that, but then decided that the
> >>> obfuscation of horrible macro-templated functions wasn't worth saving a
> >>> couple of hundred bytes in some code that isn't exactly difficult to
> >>> maintain and has needed touching once in 4 years.
> >>>
> >>> If you did want to go down the macro route, I may as well also generate both
> >>> lo-hi and hi-lo headers all from a single template, it'd be really clever...
> >>> <alarm bells>
> >>
> >> I certainly wasn't suggesting any more than the obvious macroisation,
> >> but I'll leave it up to Arnd, as I think this falls on his lap.
> >
> > I'd prefer the open-coded variant as well.
> 
> By that, do you mean sticking with the smmu_writeq() implementation in 
> the driver and dropping this patch, or merging this patch as-is without 
> further macro-magic?
> 

Sorry, that was really ambiguous on my end. I meant leaving patch 4/7
as it is in the version you posted.

However, leaving the open-coded writel_relaxed() in the driver or just
using the non-relaxed hi_lo_readq() would be totally fine too.

	Arnd

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants
  2016-04-25 15:41                           ` Arnd Bergmann
@ 2016-04-25 16:11                             ` Will Deacon
  -1 siblings, 0 replies; 56+ messages in thread
From: Will Deacon @ 2016-04-25 16:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Hitoshi Mitake, Christoph Hellwig,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Darren Hart

On Mon, Apr 25, 2016 at 05:41:21PM +0200, Arnd Bergmann wrote:
> On Monday 25 April 2016 16:28:01 Robin Murphy wrote:
> > >>>
> > >>> We _could_ - indeed I started doing that, but then decided that the
> > >>> obfuscation of horrible macro-templated functions wasn't worth saving a
> > >>> couple of hundred bytes in some code that isn't exactly difficult to
> > >>> maintain and has needed touching once in 4 years.
> > >>>
> > >>> If you did want to go down the macro route, I may as well also generate both
> > >>> lo-hi and hi-lo headers all from a single template, it'd be really clever...
> > >>> <alarm bells>
> > >>
> > >> I certainly wasn't suggesting any more than the obvious macroisation,
> > >> but I'll leave it up to Arnd, as I think this falls on his lap.
> > >
> > > I'd prefer the open-coded variant as well.
> > 
> > By that, do you mean sticking with the smmu_writeq() implementation in 
> > the driver and dropping this patch, or merging this patch as-is without 
> > further macro-magic?
> > 
> 
> Sorry, that was really ambiguous on my end. I meant leaving patch 4/7
> as it is in the version you posted.

I'm happy with that. Could I have your ack, so that I can queue it with
the related SMMU patches, please?

Will

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants
  2016-04-13 17:13     ` Robin Murphy
@ 2016-04-25 16:11         ` Arnd Bergmann
  -1 siblings, 0 replies; 56+ messages in thread
From: Arnd Bergmann @ 2016-04-25 16:11 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Hitoshi Mitake, will.deacon-5wv7dgnIgG8, Christoph Hellwig,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Darren Hart

On Wednesday 13 April 2016 18:13:00 Robin Murphy wrote:
> Whilst commit 9439eb3ab9d1 ("asm-generic: io: implement relaxed
> accessor macros as conditional wrappers") makes the *_relaxed forms of
> I/O accessors universally available to drivers, in cases where writeq()
> is implemented via the io-64-nonatomic helpers, writeq_relaxed() will
> end up falling back to writel() regardless of whether writel_relaxed()
> is available (identically for s/write/read/).
> 
> Add corresponding relaxed forms of the nonatomic helpers to delegate
> to the equivalent 32-bit accessors as appropriate.
> 
> CC: Arnd Bergmann <arnd@arndb.de>
> CC: Christoph Hellwig <hch@lst.de>
> CC: Darren Hart <dvhart@linux.intel.com>
> CC: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 

Acked-by: Arnd Bergmann <arnd@arndb.de>

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH 6/7] iommu/arm-smmu: Decouple context format from kernel config
  2016-04-25 13:41                         ` Will Deacon
@ 2016-04-25 16:21                             ` Robin Murphy
  -1 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-25 16:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	tchalamarla-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8

On 25/04/16 14:41, Will Deacon wrote:
> On Mon, Apr 25, 2016 at 02:14:57PM +0100, Robin Murphy wrote:
>> On 25/04/16 12:02, Will Deacon wrote:
>>> In what case would you not have a viable AArch64 granule, but the option
>>> of falling back to AArch32 makes things work?
>>
>> A 4KB page kernel with MMU-401, whose only supported AArch64 granule is
>> supposedly 64KB.
>
> That doesn't sound right at all!
>
>>>> Plus we'd have to remove the LPAE page sizes
>>>> from the bitmap beforehand in the case we don't support AARCH64_4K lest we
>>>> end up with a phantom format the hardware doesn't actually do, and it all
>>>> starts getting rather horrible...
>>>
>>> I'm not following. The io-pgtable code shouldn't give you back a phantom
>>> format.
>>
>> If you call alloc_io_pgtable_ops() with ARM_64_LPAE_S2 and an unmolested
>> MMU-401 pgsize_bitmap, you get back a 4K granule v8 pagetable, which per a
>> strict reading of the TRM isn't supported (although does appear to work in
>> practice...)
>
> I suspect that's a badly worded/incorrect TRM. In reality, I'd be happy
> to assume that a given granule is supported across all formats (in as
> much as it can be) if it's supported by one of them, but acknowledge
> that's not guaranteed by the architecture.
>
>> Consider also a stage 1 SMMU supporting all formats but only with 4K
>> granules: with a 64K page arm64 kernel, the presence of AARCH64_4K support
>> would lead us into arm_64_lpae_alloc_pgtable_s1(), which would be tricked
>> into giving us back a 64K granule v8 format by the short descriptor large
>> page size matching PAGE_SIZE, and things would definitely go downhill from
>> there.
>
> We could improve this by having a separate pgsize_bitmap per format, and
> passing the relevant one to the relevant page table constructor.
>
>> Given that, I think the only truly safe thing to do is to pass an explicit
>> granule in the io_pgtable_cfg and just get rid of the bitmap-based guessing
>> in arm_lpae_restrict_pgsizes() - otherwise, we'd still have to pre-restrict
>> the bitmap, making it pretty much redundant anyway. I'll have a hack at that
>
> Actually, I think I prefer to go the other way. Let's try moving more of
> this common logic into the io-pgtable code. For example, an IOMMU driver
> could say "I support these formats, and these options (page sizes, quirks,
> etc) for each format" and the io-pgtable code can choose the most
> appropriate thing.

Agreed - that sounds like the best approach in the long run, and I'd be 
happy to put some thought into designing it once the to-do list shrinks 
a bit. Right now, though, how about a quick fix like I just hacked up 
below? (diff on top of this series because laziness)

Robin.

--->8---
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a917ad0e949f..0798926f28f1 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -784,28 +784,20 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,

  	/*
  	 * Choosing a suitable context format is even more fiddly. Until we
-	 * grow some way for the caller to express a preference, just aim for
-	 * the closest match to the CPU format out of what we might have.
+	 * grow some way for the caller to express a preference, and/or move
+	 * the decision into the io-pgtable code where it arguably belongs,
+	 * just aim for the closest thing to the rest of the system, and hope
+	 * that the hardware isn't mental enough that we can't assume AArch64
+	 * support to be a superset of AArch32 support...
  	 */
  	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_L)
  		cfg->fmt = ARM_SMMU_CTX_FMT_AARCH32_L;
-	if (IS_ENABLED(CONFIG_64BIT) || cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
-		switch (PAGE_SIZE) {
-		case SZ_64K:
-			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K) {
-				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
-				break;
-			} /* else fall through */
-		case SZ_16K:
-			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K) {
-				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
-				break;
-			} /* else fall through */
-		case SZ_4K:
-			if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_4K)
-				cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
-		}
-	}
+	if ((IS_ENABLED(CONFIG_64BIT) || cfg->fmt == ARM_SMMU_CTX_FMT_NONE) &&
+	    (smmu->features & (ARM_SMMU_FEAT_FMT_AARCH64_64K |
+			       ARM_SMMU_FEAT_FMT_AARCH64_16K |
+			       ARM_SMMU_FEAT_FMT_AARCH64_4K)))
+		cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
+
  	if (cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
  		ret = -EINVAL;
  		goto out_unlock;


* [PATCH v2] io-64-nonatomic: Add relaxed accessor variants
  2016-04-13 17:13     ` Robin Murphy
@ 2016-04-26 10:38         ` Robin Murphy
  -1 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-26 10:38 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8, arnd-r2nGTMty4D4
  Cc: Hitoshi Mitake,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Christoph Hellwig,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Darren Hart

Whilst commit 9439eb3ab9d1 ("asm-generic: io: implement relaxed
accessor macros as conditional wrappers") makes the *_relaxed forms of
I/O accessors universally available to drivers, in cases where writeq()
is implemented via the io-64-nonatomic helpers, writeq_relaxed() will
end up falling back to writel() regardless of whether writel_relaxed()
is available (identically for s/write/read/).

Add corresponding relaxed forms of the nonatomic helpers to delegate
to the equivalent 32-bit accessors as appropriate. We also need to fix
io.h to avoid defining default relaxed variants if the basic accessors
themselves don't exist.

CC: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
CC: Darren Hart <dvhart-VuQAYsv1563Yd54FQh9/CA@public.gmane.org>
CC: Hitoshi Mitake <mitake.hitoshi-Zyj7fXuS5i5L9jVzuh4AOg@public.gmane.org>
Acked-by: Arnd Bergmann <arnd-r2nGTMty4D4@public.gmane.org>
Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---

Changes since v1:
- Fix the embarrassing copy-paste errors which we all managed to miss.
- Fix io.h so that the nonatomic relaxed definitions are actually used.

Arnd, I've taken the liberty of leaving your ack in place for the
additional io.h tweak - please yell at me if that's not OK.

Robin.

 include/asm-generic/io.h              |  4 ++--
 include/linux/io-64-nonatomic-hi-lo.h | 25 +++++++++++++++++++++++++
 include/linux/io-64-nonatomic-lo-hi.h | 25 +++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index eed3bbe88c8a..002b81f6f2bc 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -191,7 +191,7 @@ static inline void writeq(u64 value, volatile void __iomem *addr)
 #define readl_relaxed readl
 #endif
 
-#ifndef readq_relaxed
+#if defined(readq) && !defined(readq_relaxed)
 #define readq_relaxed readq
 #endif
 
@@ -207,7 +207,7 @@ static inline void writeq(u64 value, volatile void __iomem *addr)
 #define writel_relaxed writel
 #endif
 
-#ifndef writeq_relaxed
+#if defined(writeq) && !defined(writeq_relaxed)
 #define writeq_relaxed writeq
 #endif
 
diff --git a/include/linux/io-64-nonatomic-hi-lo.h b/include/linux/io-64-nonatomic-hi-lo.h
index 11d7e840d913..defcc4644ce3 100644
--- a/include/linux/io-64-nonatomic-hi-lo.h
+++ b/include/linux/io-64-nonatomic-hi-lo.h
@@ -21,6 +21,23 @@ static inline void hi_lo_writeq(__u64 val, volatile void __iomem *addr)
 	writel(val, addr);
 }
 
+static inline __u64 hi_lo_readq_relaxed(const volatile void __iomem *addr)
+{
+	const volatile u32 __iomem *p = addr;
+	u32 low, high;
+
+	high = readl_relaxed(p + 1);
+	low = readl_relaxed(p);
+
+	return low + ((u64)high << 32);
+}
+
+static inline void hi_lo_writeq_relaxed(__u64 val, volatile void __iomem *addr)
+{
+	writel_relaxed(val >> 32, addr + 4);
+	writel_relaxed(val, addr);
+}
+
 #ifndef readq
 #define readq hi_lo_readq
 #endif
@@ -29,4 +46,12 @@ static inline void hi_lo_writeq(__u64 val, volatile void __iomem *addr)
 #define writeq hi_lo_writeq
 #endif
 
+#ifndef readq_relaxed
+#define readq_relaxed hi_lo_readq_relaxed
+#endif
+
+#ifndef writeq_relaxed
+#define writeq_relaxed hi_lo_writeq_relaxed
+#endif
+
 #endif	/* _LINUX_IO_64_NONATOMIC_HI_LO_H_ */
diff --git a/include/linux/io-64-nonatomic-lo-hi.h b/include/linux/io-64-nonatomic-lo-hi.h
index 1a4315f97360..084461a4e5ab 100644
--- a/include/linux/io-64-nonatomic-lo-hi.h
+++ b/include/linux/io-64-nonatomic-lo-hi.h
@@ -21,6 +21,23 @@ static inline void lo_hi_writeq(__u64 val, volatile void __iomem *addr)
 	writel(val >> 32, addr + 4);
 }
 
+static inline __u64 lo_hi_readq_relaxed(const volatile void __iomem *addr)
+{
+	const volatile u32 __iomem *p = addr;
+	u32 low, high;
+
+	low = readl_relaxed(p);
+	high = readl_relaxed(p + 1);
+
+	return low + ((u64)high << 32);
+}
+
+static inline void lo_hi_writeq_relaxed(__u64 val, volatile void __iomem *addr)
+{
+	writel_relaxed(val, addr);
+	writel_relaxed(val >> 32, addr + 4);
+}
+
 #ifndef readq
 #define readq lo_hi_readq
 #endif
@@ -29,4 +46,12 @@ static inline void lo_hi_writeq(__u64 val, volatile void __iomem *addr)
 #define writeq lo_hi_writeq
 #endif
 
+#ifndef readq_relaxed
+#define readq_relaxed lo_hi_readq_relaxed
+#endif
+
+#ifndef writeq_relaxed
+#define writeq_relaxed lo_hi_writeq_relaxed
+#endif
+
 #endif	/* _LINUX_IO_64_NONATOMIC_LO_HI_H_ */
-- 
2.8.1.dirty


* [PATCH v2] iommu/arm-smmu: Decouple context format from kernel config
  2016-04-13 17:13     ` Robin Murphy
@ 2016-04-28 16:12         ` Robin Murphy
  -1 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-28 16:12 UTC (permalink / raw)
  To: will.deacon-5wv7dgnIgG8, joro-zLv9SwRftAIdnm+yROfE0A
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	brian.starkey-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

The way the driver currently forces an AArch32 or AArch64 context format
based on the kernel config and SMMU architecture version is suboptimal,
in that it makes it very hard to support oddball mix-and-match cases
like the SMMUv1 64KB supplement, or situations where the reduced table
depth of an AArch32 short descriptor context may be desirable under an
AArch64 kernel. It also only happens to work on current implementations
which do support all the relevant formats.

Introduce an explicit notion of context format, so we can manage that
independently and get rid of the inflexible #ifdeffery.

Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
---

Changes since v1:
- Simplify the format decision to let the io-pgtable code pick AArch64
  granules itself, since (pl330 shenanigans aside) MMU-401 actually
  works better than it claims to. I'll look into abstracting this away
  completely into io-pgtable so it can be shared with SMMUv3 and others.
- Remove a couple more now-dead lines from arm_smmu_device_cfg_probe()
  which I missed first time around.

Robin.

 drivers/iommu/arm-smmu.c | 94 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 72 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index acff3326f818..625348431c5e 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -117,6 +117,8 @@
 #define ID0_NTS				(1 << 28)
 #define ID0_SMS				(1 << 27)
 #define ID0_ATOSNS			(1 << 26)
+#define ID0_PTFS_NO_AARCH32		(1 << 25)
+#define ID0_PTFS_NO_AARCH32S		(1 << 24)
 #define ID0_CTTW			(1 << 14)
 #define ID0_NUMIRPT_SHIFT		16
 #define ID0_NUMIRPT_MASK		0xff
@@ -317,6 +319,11 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
 #define ARM_SMMU_FEAT_TRANS_OPS		(1 << 5)
 #define ARM_SMMU_FEAT_VMID16		(1 << 6)
+#define ARM_SMMU_FEAT_FMT_AARCH64_4K	(1 << 7)
+#define ARM_SMMU_FEAT_FMT_AARCH64_16K	(1 << 8)
+#define ARM_SMMU_FEAT_FMT_AARCH64_64K	(1 << 9)
+#define ARM_SMMU_FEAT_FMT_AARCH32_L	(1 << 10)
+#define ARM_SMMU_FEAT_FMT_AARCH32_S	(1 << 11)
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
@@ -346,10 +353,18 @@ struct arm_smmu_device {
 	u32				cavium_id_base; /* Specific to Cavium */
 };
 
+enum arm_smmu_context_fmt {
+	ARM_SMMU_CTX_FMT_NONE,
+	ARM_SMMU_CTX_FMT_AARCH64,
+	ARM_SMMU_CTX_FMT_AARCH32_L,
+	ARM_SMMU_CTX_FMT_AARCH32_S,
+};
+
 struct arm_smmu_cfg {
 	u8				cbndx;
 	u8				irptndx;
 	u32				cbar;
+	enum arm_smmu_context_fmt	fmt;
 };
 #define INVALID_IRPTNDX			0xff
 
@@ -619,14 +634,13 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 		reg = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
 
-		if (!IS_ENABLED(CONFIG_64BIT) || smmu->version == ARM_SMMU_V1) {
+		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
 			iova &= ~12UL;
 			iova |= ARM_SMMU_CB_ASID(smmu, cfg);
 			do {
 				writel_relaxed(iova, reg);
 				iova += granule;
 			} while (size -= granule);
-#ifdef CONFIG_64BIT
 		} else {
 			iova >>= 12;
 			iova |= (u64)ARM_SMMU_CB_ASID(smmu, cfg) << 48;
@@ -634,9 +648,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 				writeq_relaxed(iova, reg);
 				iova += granule >> 12;
 			} while (size -= granule);
-#endif
 		}
-#ifdef CONFIG_64BIT
 	} else if (smmu->version == ARM_SMMU_V2) {
 		reg = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
@@ -646,7 +658,6 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 			smmu_write_atomic_lq(iova, reg);
 			iova += granule >> 12;
 		} while (size -= granule);
-#endif
 	} else {
 		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
 		writel_relaxed(ARM_SMMU_CB_VMID(smmu, cfg), reg);
@@ -745,11 +756,10 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 
 	if (smmu->version > ARM_SMMU_V1) {
-#ifdef CONFIG_64BIT
-		reg = CBA2R_RW64_64BIT;
-#else
-		reg = CBA2R_RW64_32BIT;
-#endif
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
+			reg = CBA2R_RW64_64BIT;
+		else
+			reg = CBA2R_RW64_32BIT;
 		/* 16-bit VMIDs live in CBA2R */
 		if (smmu->features & ARM_SMMU_FEAT_VMID16)
 			reg |= ARM_SMMU_CB_VMID(smmu, cfg) << CBA2R_VMID_SHIFT;
@@ -860,16 +870,40 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
 		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
+	/*
+	 * Choosing a suitable context format is even more fiddly. Until we
+	 * grow some way for the caller to express a preference, and/or move
+	 * the decision into the io-pgtable code where it arguably belongs,
+	 * just aim for the closest thing to the rest of the system, and hope
+	 * that the hardware isn't mental enough that we can't assume AArch64
+	 * support to be a superset of AArch32 support...
+	 */
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_L)
+		cfg->fmt = ARM_SMMU_CTX_FMT_AARCH32_L;
+	if ((IS_ENABLED(CONFIG_64BIT) || cfg->fmt == ARM_SMMU_CTX_FMT_NONE) &&
+	    (smmu->features & (ARM_SMMU_FEAT_FMT_AARCH64_64K |
+			       ARM_SMMU_FEAT_FMT_AARCH64_16K |
+			       ARM_SMMU_FEAT_FMT_AARCH64_4K)))
+		cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
+
+	if (cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
 	switch (smmu_domain->stage) {
 	case ARM_SMMU_DOMAIN_S1:
 		cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
 		start = smmu->num_s2_context_banks;
 		ias = smmu->va_size;
 		oas = smmu->ipa_size;
-		if (IS_ENABLED(CONFIG_64BIT))
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
 			fmt = ARM_64_LPAE_S1;
-		else
+		} else {
 			fmt = ARM_32_LPAE_S1;
+			ias = min(ias, 32UL);
+			oas = min(oas, 40UL);
+		}
 		break;
 	case ARM_SMMU_DOMAIN_NESTED:
 		/*
@@ -881,10 +915,13 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		start = 0;
 		ias = smmu->ipa_size;
 		oas = smmu->pa_size;
-		if (IS_ENABLED(CONFIG_64BIT))
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
 			fmt = ARM_64_LPAE_S2;
-		else
+		} else {
 			fmt = ARM_32_LPAE_S2;
+			ias = min(ias, 40UL);
+			oas = min(oas, 40UL);
+		}
 		break;
 	default:
 		ret = -EINVAL;
@@ -1670,6 +1707,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 					   ID0_NUMSIDB_MASK;
 	}
 
+	if (smmu->version < ARM_SMMU_V2 || !(id & ID0_PTFS_NO_AARCH32)) {
+		smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_L;
+		if (!(id & ID0_PTFS_NO_AARCH32S))
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_S;
+	}
+
 	/* ID1 */
 	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID1);
 	smmu->pgshift = (id & ID1_PAGESIZE) ? 16 : 12;
@@ -1725,22 +1768,29 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 
 	if (smmu->version == ARM_SMMU_V1) {
 		smmu->va_size = smmu->ipa_size;
-		size = SZ_4K | SZ_2M | SZ_1G;
 	} else {
 		size = (id >> ID2_UBS_SHIFT) & ID2_UBS_MASK;
 		smmu->va_size = arm_smmu_id_size_to_bits(size);
-#ifndef CONFIG_64BIT
-		smmu->va_size = min(32UL, smmu->va_size);
-#endif
-		size = 0;
 		if (id & ID2_PTFS_4K)
-			size |= SZ_4K | SZ_2M | SZ_1G;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_4K;
 		if (id & ID2_PTFS_16K)
-			size |= SZ_16K | SZ_32M;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_16K;
 		if (id & ID2_PTFS_64K)
-			size |= SZ_64K | SZ_512M;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_64K;
 	}
 
+	/* Now we've corralled the various formats, what'll it do? */
+	size = 0;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_S)
+		size |= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
+	if (smmu->features &
+	    (ARM_SMMU_FEAT_FMT_AARCH32_L | ARM_SMMU_FEAT_FMT_AARCH64_4K))
+		size |= SZ_4K | SZ_2M | SZ_1G;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K)
+		size |= SZ_16K | SZ_32M;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K)
+		size |= SZ_64K | SZ_512M;
+
 	arm_smmu_ops.pgsize_bitmap &= size;
 	dev_notice(smmu->dev, "\tSupported page sizes: 0x%08lx\n", size);
 
-- 
2.8.1.dirty


* [PATCH v2] iommu/arm-smmu: Decouple context format from kernel config
@ 2016-04-28 16:12         ` Robin Murphy
  0 siblings, 0 replies; 56+ messages in thread
From: Robin Murphy @ 2016-04-28 16:12 UTC (permalink / raw)
  To: linux-arm-kernel

The way the driver currently forces an AArch32 or AArch64 context format
based on the kernel config and SMMU architecture version is suboptimal,
in that it makes it very hard to support oddball mix-and-match cases
like the SMMUv1 64KB supplement, or situations where the reduced table
depth of an AArch32 short descriptor context may be desirable under an
AArch64 kernel. It also only happens to work on current implementations
which do support all the relevant formats.

Introduce an explicit notion of context format, so we can manage that
independently and get rid of the inflexible #ifdeffery.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

Changes since v1:
- Simplify the format decision to let the io-pgtable code pick AArch64
  granules itself, since (pl330 shenanigans aside) MMU-401 actually
  works better than it claims to. I'll look into abstracting this away
  completely into io-pgtable so it can be shared with SMMUv3 and others.
- Remove a couple more now-dead lines from arm_smmu_device_cfg_probe()
  which I missed first time around.

Robin.

 drivers/iommu/arm-smmu.c | 94 ++++++++++++++++++++++++++++++++++++------------
 1 file changed, 72 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index acff3326f818..625348431c5e 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -117,6 +117,8 @@
 #define ID0_NTS				(1 << 28)
 #define ID0_SMS				(1 << 27)
 #define ID0_ATOSNS			(1 << 26)
+#define ID0_PTFS_NO_AARCH32		(1 << 25)
+#define ID0_PTFS_NO_AARCH32S		(1 << 24)
 #define ID0_CTTW			(1 << 14)
 #define ID0_NUMIRPT_SHIFT		16
 #define ID0_NUMIRPT_MASK		0xff
@@ -317,6 +319,11 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
 #define ARM_SMMU_FEAT_TRANS_OPS		(1 << 5)
 #define ARM_SMMU_FEAT_VMID16		(1 << 6)
+#define ARM_SMMU_FEAT_FMT_AARCH64_4K	(1 << 7)
+#define ARM_SMMU_FEAT_FMT_AARCH64_16K	(1 << 8)
+#define ARM_SMMU_FEAT_FMT_AARCH64_64K	(1 << 9)
+#define ARM_SMMU_FEAT_FMT_AARCH32_L	(1 << 10)
+#define ARM_SMMU_FEAT_FMT_AARCH32_S	(1 << 11)
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
@@ -346,10 +353,18 @@ struct arm_smmu_device {
 	u32				cavium_id_base; /* Specific to Cavium */
 };
 
+enum arm_smmu_context_fmt {
+	ARM_SMMU_CTX_FMT_NONE,
+	ARM_SMMU_CTX_FMT_AARCH64,
+	ARM_SMMU_CTX_FMT_AARCH32_L,
+	ARM_SMMU_CTX_FMT_AARCH32_S,
+};
+
 struct arm_smmu_cfg {
 	u8				cbndx;
 	u8				irptndx;
 	u32				cbar;
+	enum arm_smmu_context_fmt	fmt;
 };
 #define INVALID_IRPTNDX			0xff
 
@@ -619,14 +634,13 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 		reg = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 		reg += leaf ? ARM_SMMU_CB_S1_TLBIVAL : ARM_SMMU_CB_S1_TLBIVA;
 
-		if (!IS_ENABLED(CONFIG_64BIT) || smmu->version == ARM_SMMU_V1) {
+		if (cfg->fmt != ARM_SMMU_CTX_FMT_AARCH64) {
 			iova &= ~12UL;
 			iova |= ARM_SMMU_CB_ASID(smmu, cfg);
 			do {
 				writel_relaxed(iova, reg);
 				iova += granule;
 			} while (size -= granule);
-#ifdef CONFIG_64BIT
 		} else {
 			iova >>= 12;
 			iova |= (u64)ARM_SMMU_CB_ASID(smmu, cfg) << 48;
@@ -634,9 +648,7 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 				writeq_relaxed(iova, reg);
 				iova += granule >> 12;
 			} while (size -= granule);
-#endif
 		}
-#ifdef CONFIG_64BIT
 	} else if (smmu->version == ARM_SMMU_V2) {
 		reg = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 		reg += leaf ? ARM_SMMU_CB_S2_TLBIIPAS2L :
@@ -646,7 +658,6 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
 			smmu_write_atomic_lq(iova, reg);
 			iova += granule >> 12;
 		} while (size -= granule);
-#endif
 	} else {
 		reg = ARM_SMMU_GR0(smmu) + ARM_SMMU_GR0_TLBIVMID;
 		writel_relaxed(ARM_SMMU_CB_VMID(smmu, cfg), reg);
@@ -745,11 +756,10 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
 
 	if (smmu->version > ARM_SMMU_V1) {
-#ifdef CONFIG_64BIT
-		reg = CBA2R_RW64_64BIT;
-#else
-		reg = CBA2R_RW64_32BIT;
-#endif
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
+			reg = CBA2R_RW64_64BIT;
+		else
+			reg = CBA2R_RW64_32BIT;
 		/* 16-bit VMIDs live in CBA2R */
 		if (smmu->features & ARM_SMMU_FEAT_VMID16)
 			reg |= ARM_SMMU_CB_VMID(smmu, cfg) << CBA2R_VMID_SHIFT;
@@ -860,16 +870,40 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2))
 		smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 
+	/*
+	 * Choosing a suitable context format is even more fiddly. Until we
+	 * grow some way for the caller to express a preference, and/or move
+	 * the decision into the io-pgtable code where it arguably belongs,
+	 * just aim for the closest thing to the rest of the system, and hope
+	 * that the hardware isn't mental enough that we can't assume AArch64
+	 * support to be a superset of AArch32 support...
+	 */
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_L)
+		cfg->fmt = ARM_SMMU_CTX_FMT_AARCH32_L;
+	if ((IS_ENABLED(CONFIG_64BIT) || cfg->fmt == ARM_SMMU_CTX_FMT_NONE) &&
+	    (smmu->features & (ARM_SMMU_FEAT_FMT_AARCH64_64K |
+			       ARM_SMMU_FEAT_FMT_AARCH64_16K |
+			       ARM_SMMU_FEAT_FMT_AARCH64_4K)))
+		cfg->fmt = ARM_SMMU_CTX_FMT_AARCH64;
+
+	if (cfg->fmt == ARM_SMMU_CTX_FMT_NONE) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
 	switch (smmu_domain->stage) {
 	case ARM_SMMU_DOMAIN_S1:
 		cfg->cbar = CBAR_TYPE_S1_TRANS_S2_BYPASS;
 		start = smmu->num_s2_context_banks;
 		ias = smmu->va_size;
 		oas = smmu->ipa_size;
-		if (IS_ENABLED(CONFIG_64BIT))
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
 			fmt = ARM_64_LPAE_S1;
-		else
+		} else {
 			fmt = ARM_32_LPAE_S1;
+			ias = min(ias, 32UL);
+			oas = min(oas, 40UL);
+		}
 		break;
 	case ARM_SMMU_DOMAIN_NESTED:
 		/*
@@ -881,10 +915,13 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		start = 0;
 		ias = smmu->ipa_size;
 		oas = smmu->pa_size;
-		if (IS_ENABLED(CONFIG_64BIT))
+		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64) {
 			fmt = ARM_64_LPAE_S2;
-		else
+		} else {
 			fmt = ARM_32_LPAE_S2;
+			ias = min(ias, 40UL);
+			oas = min(oas, 40UL);
+		}
 		break;
 	default:
 		ret = -EINVAL;
@@ -1670,6 +1707,12 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 					   ID0_NUMSIDB_MASK;
 	}
 
+	if (smmu->version < ARM_SMMU_V2 || !(id & ID0_PTFS_NO_AARCH32)) {
+		smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_L;
+		if (!(id & ID0_PTFS_NO_AARCH32S))
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH32_S;
+	}
+
 	/* ID1 */
 	id = readl_relaxed(gr0_base + ARM_SMMU_GR0_ID1);
 	smmu->pgshift = (id & ID1_PAGESIZE) ? 16 : 12;
@@ -1725,22 +1768,29 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 
 	if (smmu->version == ARM_SMMU_V1) {
 		smmu->va_size = smmu->ipa_size;
-		size = SZ_4K | SZ_2M | SZ_1G;
 	} else {
 		size = (id >> ID2_UBS_SHIFT) & ID2_UBS_MASK;
 		smmu->va_size = arm_smmu_id_size_to_bits(size);
-#ifndef CONFIG_64BIT
-		smmu->va_size = min(32UL, smmu->va_size);
-#endif
-		size = 0;
 		if (id & ID2_PTFS_4K)
-			size |= SZ_4K | SZ_2M | SZ_1G;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_4K;
 		if (id & ID2_PTFS_16K)
-			size |= SZ_16K | SZ_32M;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_16K;
 		if (id & ID2_PTFS_64K)
-			size |= SZ_64K | SZ_512M;
+			smmu->features |= ARM_SMMU_FEAT_FMT_AARCH64_64K;
 	}
 
+	/* Now we've corralled the various formats, what'll it do? */
+	size = 0;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH32_S)
+		size |= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
+	if (smmu->features &
+	    (ARM_SMMU_FEAT_FMT_AARCH32_L | ARM_SMMU_FEAT_FMT_AARCH64_4K))
+		size |= SZ_4K | SZ_2M | SZ_1G;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_16K)
+		size |= SZ_16K | SZ_32M;
+	if (smmu->features & ARM_SMMU_FEAT_FMT_AARCH64_64K)
+		size |= SZ_64K | SZ_512M;
+
 	arm_smmu_ops.pgsize_bitmap &= size;
 	dev_notice(smmu->dev, "\tSupported page sizes: 0x%08lx\n", size);
 
-- 
2.8.1.dirty


Thread overview: 56+ messages
2016-04-13 17:12 [PATCH 0/7] arm-smmu: Implementation and context format differentiation Robin Murphy
2016-04-13 17:12 ` [PATCH 1/7] iommu/arm-smmu: Differentiate specific implementations Robin Murphy
2016-04-13 21:15   ` Chalamarla, Tirumalesh
2016-04-13 17:12 ` [PATCH 2/7] iommu/arm-smmu: Convert ThunderX workaround to new method Robin Murphy
2016-04-13 21:16   ` Chalamarla, Tirumalesh
2016-04-13 17:12 ` [PATCH 3/7] iommu/arm-smmu: Work around MMU-500 prefetch errata Robin Murphy
2016-04-21 16:15   ` Will Deacon
2016-04-21 16:16   ` Will Deacon
2016-04-13 17:13 ` [PATCH 4/7] io-64-nonatomic: Add relaxed accessor variants Robin Murphy
2016-04-21 16:18   ` Will Deacon
2016-04-22 17:08     ` Robin Murphy
2016-04-25 13:32       ` Will Deacon
2016-04-25 15:21         ` Arnd Bergmann
2016-04-25 15:28           ` Robin Murphy
2016-04-25 15:41             ` Arnd Bergmann
2016-04-25 16:11               ` Will Deacon
2016-04-25 16:11   ` Arnd Bergmann
2016-04-26 10:38   ` [PATCH v2] " Robin Murphy
2016-04-13 17:13 ` [PATCH 5/7] iommu/arm-smmu: Tidy up 64-bit/atomic I/O accesses Robin Murphy
2016-04-13 17:13 ` [PATCH 6/7] iommu/arm-smmu: Decouple context format from kernel config Robin Murphy
2016-04-21 16:30   ` Will Deacon
2016-04-22 17:38     ` Robin Murphy
2016-04-25 11:02       ` Will Deacon
2016-04-25 13:14         ` Robin Murphy
2016-04-25 13:41           ` Will Deacon
2016-04-25 16:21             ` Robin Murphy
2016-04-28 16:12   ` [PATCH v2] " Robin Murphy
2016-04-13 17:13 ` [PATCH 7/7] iommu/arm-smmu: Support SMMUv1 64KB supplement Robin Murphy
