* [PATCH v2 00/10] iommu/io-pgtable: Cleanup and prep for split tables
@ 2019-10-25 18:08 ` Robin Murphy
  0 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

Hi all,

Since the flawed first attempt, I've reworked things with an abstracted
TCR and an explicit TTBR1 quirk. I originally envisaged the need to pass
the quirk all the way down to the TLBI calls, hence getting diverted
into trying to make the parameter passing less cluttered in general, but
in the end it turned out fairly neat to just fix the indexing such that
we can always just pass around the original unmodified IOVA. Most of the
new patches come from staring at that indexing code for long enough to
see the subtle inefficiencies that were worth ironing out, plus a bit of
random cleanup which doesn't feel worth posting separately.

Note that these patches depend on the fixes already queued in -rc4,
otherwise there will be conflicts in arm_mali_lpae_alloc_pgtable().

Robin.


Robin Murphy (10):
  iommu/io-pgtable: Make selftest gubbins consistently __init
  iommu/io-pgtable-arm: Rationalise size check
  iommu/io-pgtable-arm: Simplify bounds checks
  iommu/io-pgtable-arm: Simplify start level lookup
  iommu/io-pgtable-arm: Simplify PGD size handling
  iommu/io-pgtable-arm: Simplify level indexing
  iommu/io-pgtable-arm: Rationalise MAIR handling
  iommu/io-pgtable-arm: Rationalise TTBRn handling
  iommu/io-pgtable-arm: Rationalise TCR handling
  iommu/io-pgtable-arm: Prepare for TTBR1 usage

 drivers/iommu/arm-smmu-v3.c        |  45 ++----
 drivers/iommu/arm-smmu.c           |  20 +--
 drivers/iommu/arm-smmu.h           |  27 ++++
 drivers/iommu/io-pgtable-arm-v7s.c |  37 +++--
 drivers/iommu/io-pgtable-arm.c     | 238 ++++++++++++++---------------
 drivers/iommu/io-pgtable.c         |   2 +-
 drivers/iommu/ipmmu-vmsa.c         |   4 +-
 drivers/iommu/msm_iommu.c          |   4 +-
 drivers/iommu/mtk_iommu.c          |   4 +-
 drivers/iommu/qcom_iommu.c         |  15 +-
 include/linux/io-pgtable.h         |  19 ++-
 11 files changed, 209 insertions(+), 206 deletions(-)

-- 
2.21.0.dirty

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH v2 01/10] iommu/io-pgtable: Make selftest gubbins consistently __init
  2019-10-25 18:08 ` Robin Murphy
@ 2019-10-25 18:08   ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

The selftests run as an initcall, but the annotation of the various
callbacks and data seems to be somewhat arbitrary. Add it consistently
for everything related to the selftests.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/io-pgtable-arm-v7s.c | 15 ++++++++-------
 drivers/iommu/io-pgtable-arm.c     | 13 +++++++------
 2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 4cb394937700..7c3bd2c3cdca 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -846,27 +846,28 @@ struct io_pgtable_init_fns io_pgtable_arm_v7s_init_fns = {
 
 #ifdef CONFIG_IOMMU_IO_PGTABLE_ARMV7S_SELFTEST
 
-static struct io_pgtable_cfg *cfg_cookie;
+static struct io_pgtable_cfg *cfg_cookie __initdata;
 
-static void dummy_tlb_flush_all(void *cookie)
+static void __init dummy_tlb_flush_all(void *cookie)
 {
 	WARN_ON(cookie != cfg_cookie);
 }
 
-static void dummy_tlb_flush(unsigned long iova, size_t size, size_t granule,
-			    void *cookie)
+static void __init dummy_tlb_flush(unsigned long iova, size_t size,
+				   size_t granule, void *cookie)
 {
 	WARN_ON(cookie != cfg_cookie);
 	WARN_ON(!(size & cfg_cookie->pgsize_bitmap));
 }
 
-static void dummy_tlb_add_page(struct iommu_iotlb_gather *gather,
-			       unsigned long iova, size_t granule, void *cookie)
+static void __init dummy_tlb_add_page(struct iommu_iotlb_gather *gather,
+				      unsigned long iova, size_t granule,
+				      void *cookie)
 {
 	dummy_tlb_flush(iova, granule, granule, cookie);
 }
 
-static const struct iommu_flush_ops dummy_tlb_ops = {
+static const struct iommu_flush_ops dummy_tlb_ops __initconst = {
 	.tlb_flush_all	= dummy_tlb_flush_all,
 	.tlb_flush_walk	= dummy_tlb_flush,
 	.tlb_flush_leaf	= dummy_tlb_flush,
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index fdc1a8308a1c..afa61b32b052 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -1100,22 +1100,23 @@ struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = {
 
 #ifdef CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST
 
-static struct io_pgtable_cfg *cfg_cookie;
+static struct io_pgtable_cfg *cfg_cookie __initdata;
 
-static void dummy_tlb_flush_all(void *cookie)
+static void __init dummy_tlb_flush_all(void *cookie)
 {
 	WARN_ON(cookie != cfg_cookie);
 }
 
-static void dummy_tlb_flush(unsigned long iova, size_t size, size_t granule,
-			    void *cookie)
+static void __init dummy_tlb_flush(unsigned long iova, size_t size,
+				   size_t granule, void *cookie)
 {
 	WARN_ON(cookie != cfg_cookie);
 	WARN_ON(!(size & cfg_cookie->pgsize_bitmap));
 }
 
-static void dummy_tlb_add_page(struct iommu_iotlb_gather *gather,
-			       unsigned long iova, size_t granule, void *cookie)
+static void __init dummy_tlb_add_page(struct iommu_iotlb_gather *gather,
+				      unsigned long iova, size_t granule,
+				      void *cookie)
 {
 	dummy_tlb_flush(iova, granule, granule, cookie);
 }
-- 
2.21.0.dirty

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [PATCH v2 02/10] iommu/io-pgtable-arm: Rationalise size check
  2019-10-25 18:08 ` Robin Murphy
@ 2019-10-25 18:08   ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

It makes little sense to only validate the requested size after we think
we've found a matching block size - making the check up-front is simple,
and far more logical than waiting to walk off the bottom of the table to
infer that we must have been passed a bogus size to start with.

We're missing an equivalent check on the unmap path, so add that as well
for consistency.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/io-pgtable-arm.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index afa61b32b052..2cef0f5335e4 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -392,7 +392,7 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova,
 	ptep += ARM_LPAE_LVL_IDX(iova, lvl, data);
 
 	/* If we can install a leaf entry at this level, then do so */
-	if (size == block_size && (size & cfg->pgsize_bitmap))
+	if (size == block_size)
 		return arm_lpae_init_pte(data, iova, paddr, prot, lvl, ptep);
 
 	/* We can't allocate tables at the final level */
@@ -479,6 +479,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 			phys_addr_t paddr, size_t size, int iommu_prot)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_lpae_iopte *ptep = data->pgd;
 	int ret, lvl = ARM_LPAE_START_LVL(data);
 	arm_lpae_iopte prot;
@@ -487,6 +488,9 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
 		return 0;
 
+	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+		return -EINVAL;
+
 	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
 		    paddr >= (1ULL << data->iop.cfg.oas)))
 		return -ERANGE;
@@ -652,9 +656,13 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 			     size_t size, struct iommu_iotlb_gather *gather)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
+	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_lpae_iopte *ptep = data->pgd;
 	int lvl = ARM_LPAE_START_LVL(data);
 
+	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
+		return 0;
+
 	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
 		return 0;
 
-- 
2.21.0.dirty

^ permalink raw reply related	[flat|nested] 69+ messages in thread
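
For readers following along, the up-front size check from patch 02 can be
tried in isolation. What follows is only a minimal userspace sketch, not
the kernel code: EXAMPLE_PGSIZE_BITMAP, size_is_bogus() and
example_map_check() are hypothetical names invented for illustration, and
the 4K/2M/1G bitmap is an assumed example value. Only the condition itself
is taken from the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed example bitmap: 4K, 2M and 1G page sizes (hypothetical value). */
#define EXAMPLE_PGSIZE_BITMAP ((size_t)0x1000 | 0x200000 | 0x40000000)

/*
 * Same condition as the patch: reject a zero size, and reject any size
 * that has a bit set which the page-size bitmap does not have.
 */
static bool size_is_bogus(size_t size, size_t pgsize_bitmap)
{
	return !size || (size & pgsize_bitmap) != size;
}

/* Hypothetical stand-in for the first step of the map path. */
static int example_map_check(size_t size, size_t pgsize_bitmap)
{
	/* The check is only meaningful against a non-empty bitmap. */
	assert(pgsize_bitmap != 0);

	if (size_is_bogus(size, pgsize_bitmap))
		return -EINVAL;
	return 0;
}
```

Calling example_map_check() with 0x1000 or 0x200000 returns 0, while 0,
0x2000 and 0x3000 all return -EINVAL. A combined value such as 0x201000
would pass this particular expression, so the sketch assumes, as the patch
appears to, that callers only ever pass a single page size.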

* [PATCH v2 03/10] iommu/io-pgtable-arm: Simplify bounds checks
  2019-10-25 18:08 ` Robin Murphy
@ 2019-10-25 18:08   ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

We're merely checking that the relevant upper bits of each address
are all zero, so there are cheaper ways to achieve that.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/io-pgtable-arm.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 2cef0f5335e4..a9dff0ecf0c3 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -491,8 +491,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
 		return -EINVAL;
 
-	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias) ||
-		    paddr >= (1ULL << data->iop.cfg.oas)))
+	if (WARN_ON(iova >> data->iop.cfg.ias || paddr >> data->iop.cfg.oas))
 		return -ERANGE;
 
 	prot = arm_lpae_prot_to_pte(data, iommu_prot);
@@ -663,7 +662,7 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
 		return 0;
 
-	if (WARN_ON(iova >= (1ULL << data->iop.cfg.ias)))
+	if (WARN_ON(iova >> data->iop.cfg.ias))
 		return 0;
 
 	return __arm_lpae_unmap(data, gather, iova, size, lvl, ptep);
-- 
2.21.0.dirty

^ permalink raw reply related	[flat|nested] 69+ messages in thread
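
The equivalence that patch 03 relies on, "addr >= 2^ias" versus "addr >> ias
is non-zero", can be checked with a small userspace sketch. The function
names below are hypothetical, and the sketch deliberately uses uint64_t
with ias restricted to less than 64 so that both shifts are well defined;
it makes no claim about how the kernel types behave on every configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Old form: compare the address against 2^ias. */
static bool out_of_range_cmp(uint64_t addr, unsigned int ias)
{
	assert(ias < 64);	/* 1ULL << 64 would be undefined */
	return addr >= (1ULL << ias);
}

/* New form: any bit at or above position ias leaves a non-zero result. */
static bool out_of_range_shift(uint64_t addr, unsigned int ias)
{
	assert(ias < 64);	/* shifting a 64-bit value by 64 is undefined */
	return (addr >> ias) != 0;
}

/* Probe the boundary values for every width and compare the two forms. */
static bool forms_agree(void)
{
	for (unsigned int ias = 1; ias < 64; ias++) {
		uint64_t limit = 1ULL << ias;
		uint64_t probes[] = { 0, limit - 1, limit, limit + 1, UINT64_MAX };

		for (size_t i = 0; i < sizeof(probes) / sizeof(probes[0]); i++)
			if (out_of_range_cmp(probes[i], ias) !=
			    out_of_range_shift(probes[i], ias))
				return false;
	}
	return true;
}
```

forms_agree() returns true under these assumptions. The shift form avoids
materialising the 2^ias constant, which is presumably where the "cheaper"
in the commit message comes from.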

* [PATCH v2 04/10] iommu/io-pgtable-arm: Simplify start level lookup
  2019-10-25 18:08 ` Robin Murphy
@ 2019-10-25 18:08   ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

Beyond a couple of allocation-time calculations, data->levels is only
ever used to derive the start level. Storing the start level directly
leads to a small reduction in object code, which should help eke out a
little more efficiency, and slightly more readable source to boot.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/io-pgtable-arm.c | 45 +++++++++++++++-------------------
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index a9dff0ecf0c3..fb5d30e04001 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -31,19 +31,13 @@
 #define io_pgtable_ops_to_data(x)					\
 	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
 
-/*
- * For consistency with the architecture, we always consider
- * ARM_LPAE_MAX_LEVELS levels, with the walk starting at level n >=0
- */
-#define ARM_LPAE_START_LVL(d)		(ARM_LPAE_MAX_LEVELS - (d)->levels)
-
 /*
  * Calculate the right shift amount to get to the portion describing level l
  * in a virtual address mapped by the pagetable in d.
  */
 #define ARM_LPAE_LVL_SHIFT(l,d)						\
-	((((d)->levels - ((l) - ARM_LPAE_START_LVL(d) + 1))		\
-	  * (d)->bits_per_level) + (d)->pg_shift)
+	(((ARM_LPAE_MAX_LEVELS - 1 - (l)) * (d)->bits_per_level) +	\
+	(d)->pg_shift)
 
 #define ARM_LPAE_GRANULE(d)		(1UL << (d)->pg_shift)
 
@@ -55,7 +49,7 @@
  * pagetable in d.
  */
 #define ARM_LPAE_PGD_IDX(l,d)						\
-	((l) == ARM_LPAE_START_LVL(d) ? ilog2(ARM_LPAE_PAGES_PER_PGD(d)) : 0)
+	((l) == (d)->start_level ? ilog2(ARM_LPAE_PAGES_PER_PGD(d)) : 0)
 
 #define ARM_LPAE_LVL_IDX(a,l,d)						\
 	(((u64)(a) >> ARM_LPAE_LVL_SHIFT(l,d)) &			\
@@ -180,7 +174,7 @@
 struct arm_lpae_io_pgtable {
 	struct io_pgtable	iop;
 
-	int			levels;
+	int			start_level;
 	size_t			pgd_size;
 	unsigned long		pg_shift;
 	unsigned long		bits_per_level;
@@ -481,7 +475,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_lpae_iopte *ptep = data->pgd;
-	int ret, lvl = ARM_LPAE_START_LVL(data);
+	int ret, lvl = data->start_level;
 	arm_lpae_iopte prot;
 
 	/* If no access, then nothing to do */
@@ -511,7 +505,7 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 	arm_lpae_iopte *start, *end;
 	unsigned long table_size;
 
-	if (lvl == ARM_LPAE_START_LVL(data))
+	if (lvl == data->start_level)
 		table_size = data->pgd_size;
 	else
 		table_size = ARM_LPAE_GRANULE(data);
@@ -540,7 +534,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
 
-	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd);
+	__arm_lpae_free_pgtable(data, data->start_level, data->pgd);
 	kfree(data);
 }
 
@@ -657,7 +651,6 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_lpae_iopte *ptep = data->pgd;
-	int lvl = ARM_LPAE_START_LVL(data);
 
 	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
 		return 0;
@@ -665,7 +658,7 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 	if (WARN_ON(iova >> data->iop.cfg.ias))
 		return 0;
 
-	return __arm_lpae_unmap(data, gather, iova, size, lvl, ptep);
+	return __arm_lpae_unmap(data, gather, iova, size, data->start_level, ptep);
 }
 
 static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
@@ -673,7 +666,7 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	arm_lpae_iopte pte, *ptep = data->pgd;
-	int lvl = ARM_LPAE_START_LVL(data);
+	int lvl = data->start_level;
 
 	do {
 		/* Valid IOPTE pointer? */
@@ -752,6 +745,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 {
 	unsigned long va_bits, pgd_bits;
 	struct arm_lpae_io_pgtable *data;
+	int levels;
 
 	arm_lpae_restrict_pgsizes(cfg);
 
@@ -777,10 +771,11 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 	data->bits_per_level = data->pg_shift - ilog2(sizeof(arm_lpae_iopte));
 
 	va_bits = cfg->ias - data->pg_shift;
-	data->levels = DIV_ROUND_UP(va_bits, data->bits_per_level);
+	levels = DIV_ROUND_UP(va_bits, data->bits_per_level);
+	data->start_level = ARM_LPAE_MAX_LEVELS - levels;
 
 	/* Calculate the actual size of our pgd (without concatenation) */
-	pgd_bits = va_bits - (data->bits_per_level * (data->levels - 1));
+	pgd_bits = va_bits - (data->bits_per_level * (levels - 1));
 	data->pgd_size = 1UL << (pgd_bits + ilog2(sizeof(arm_lpae_iopte)));
 
 	data->iop.ops = (struct io_pgtable_ops) {
@@ -910,13 +905,13 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	 * Concatenate PGDs at level 1 if possible in order to reduce
 	 * the depth of the stage-2 walk.
 	 */
-	if (data->levels == ARM_LPAE_MAX_LEVELS) {
+	if (data->start_level == 0) {
 		unsigned long pgd_pages;
 
 		pgd_pages = data->pgd_size >> ilog2(sizeof(arm_lpae_iopte));
 		if (pgd_pages <= ARM_LPAE_S2_MAX_CONCAT_PAGES) {
 			data->pgd_size = pgd_pages << data->pg_shift;
-			data->levels--;
+			data->start_level++;
 		}
 	}
 
@@ -926,7 +921,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
 	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
 
-	sl = ARM_LPAE_START_LVL(data);
+	sl = data->start_level;
 
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
@@ -1041,8 +1036,8 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 		return NULL;
 
 	/* Mali seems to need a full 4-level table regardless of IAS */
-	if (data->levels < ARM_LPAE_MAX_LEVELS) {
-		data->levels = ARM_LPAE_MAX_LEVELS;
+	if (data->start_level > 0) {
+		data->start_level = 0;
 		data->pgd_size = sizeof(arm_lpae_iopte);
 	}
 	/*
@@ -1143,8 +1138,8 @@ static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
 	pr_err("cfg: pgsize_bitmap 0x%lx, ias %u-bit\n",
 		cfg->pgsize_bitmap, cfg->ias);
 	pr_err("data: %d levels, 0x%zx pgd_size, %lu pg_shift, %lu bits_per_level, pgd @ %p\n",
-		data->levels, data->pgd_size, data->pg_shift,
-		data->bits_per_level, data->pgd);
+		ARM_LPAE_MAX_LEVELS - data->start_level, data->pgd_size,
+		data->pg_shift, data->bits_per_level, data->pgd);
 }
 
 #define __FAIL(ops, i)	({						\
-- 
2.21.0.dirty

^ permalink raw reply related	[flat|nested] 69+ messages in thread
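
To see why patch 04 can drop data->levels, it may help to work through the
arithmetic outside the kernel. The sketch below is a hedged userspace
illustration: struct example_pgtable, example_init(), shift_old(),
shift_new() and shifts_agree() are hypothetical names, MAX_LEVELS stands in
for ARM_LPAE_MAX_LEVELS, and an 8-byte PTE (so ilog2 of its size is 3) is
assumed. The formulas themselves follow the two versions of
ARM_LPAE_LVL_SHIFT and the allocation-time calculation shown in the diff.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_LEVELS 4	/* stands in for ARM_LPAE_MAX_LEVELS */

/* Hypothetical, trimmed-down stand-in for struct arm_lpae_io_pgtable. */
struct example_pgtable {
	int start_level;
	unsigned int pg_shift;
	unsigned int bits_per_level;
	unsigned int pgd_bits;
};

/* Follows the allocation-time arithmetic as it reads after the patch. */
static struct example_pgtable example_init(unsigned int ias,
					   unsigned int pg_shift)
{
	struct example_pgtable d;
	unsigned int va_bits, levels;

	assert(ias > pg_shift);
	d.pg_shift = pg_shift;
	d.bits_per_level = pg_shift - 3;	/* assumes 8-byte PTEs */
	va_bits = ias - pg_shift;
	/* Open-coded DIV_ROUND_UP(va_bits, bits_per_level) */
	levels = (va_bits + d.bits_per_level - 1) / d.bits_per_level;
	assert(levels <= MAX_LEVELS);
	d.start_level = MAX_LEVELS - (int)levels;
	d.pgd_bits = va_bits - d.bits_per_level * (levels - 1);
	return d;
}

/* Old shift: expressed through the level count and derived start level. */
static unsigned int shift_old(int l, int levels,
			      const struct example_pgtable *d)
{
	int start = MAX_LEVELS - levels;

	return (unsigned int)(levels - (l - start + 1)) * d->bits_per_level +
	       d->pg_shift;
}

/* New shift: depends only on the level, not on where the walk starts. */
static unsigned int shift_new(int l, const struct example_pgtable *d)
{
	return (unsigned int)(MAX_LEVELS - 1 - l) * d->bits_per_level +
	       d->pg_shift;
}

/* Compare both forms for every level the walk can visit. */
static bool shifts_agree(const struct example_pgtable *d)
{
	int levels = MAX_LEVELS - d->start_level;

	for (int l = d->start_level; l < MAX_LEVELS; l++)
		if (shift_old(l, levels, d) != shift_new(l, d))
			return false;
	return true;
}
```

With a 48-bit IAS and a 4K granule (pg_shift 12), example_init() yields
start_level 0 and 9 PGD index bits; with a 32-bit IAS it yields start_level
1 and 2 PGD index bits. In both cases the old and new shift expressions
agree, because levels - (l - start + 1) reduces to MAX_LEVELS - 1 - l once
start is substituted.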

* [PATCH v2 04/10] iommu/io-pgtable-arm: Simplify start level lookup
@ 2019-10-25 18:08   ` Robin Murphy
  0 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, jcrouse, linux-arm-kernel

Beyond a couple of allocation-time calculations, data->levels is only
ever used to derive the start level. Storing the start level directly
leads to a small reduction in object code, which should help eke out a
little more efficiency, and slightly more readable source to boot.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/io-pgtable-arm.c | 45 +++++++++++++++-------------------
 1 file changed, 20 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index a9dff0ecf0c3..fb5d30e04001 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -31,19 +31,13 @@
 #define io_pgtable_ops_to_data(x)					\
 	io_pgtable_to_data(io_pgtable_ops_to_pgtable(x))
 
-/*
- * For consistency with the architecture, we always consider
- * ARM_LPAE_MAX_LEVELS levels, with the walk starting at level n >=0
- */
-#define ARM_LPAE_START_LVL(d)		(ARM_LPAE_MAX_LEVELS - (d)->levels)
-
 /*
  * Calculate the right shift amount to get to the portion describing level l
  * in a virtual address mapped by the pagetable in d.
  */
 #define ARM_LPAE_LVL_SHIFT(l,d)						\
-	((((d)->levels - ((l) - ARM_LPAE_START_LVL(d) + 1))		\
-	  * (d)->bits_per_level) + (d)->pg_shift)
+	(((ARM_LPAE_MAX_LEVELS - 1 - (l)) * (d)->bits_per_level) +	\
+	(d)->pg_shift)
 
 #define ARM_LPAE_GRANULE(d)		(1UL << (d)->pg_shift)
 
@@ -55,7 +49,7 @@
  * pagetable in d.
  */
 #define ARM_LPAE_PGD_IDX(l,d)						\
-	((l) == ARM_LPAE_START_LVL(d) ? ilog2(ARM_LPAE_PAGES_PER_PGD(d)) : 0)
+	((l) == (d)->start_level ? ilog2(ARM_LPAE_PAGES_PER_PGD(d)) : 0)
 
 #define ARM_LPAE_LVL_IDX(a,l,d)						\
 	(((u64)(a) >> ARM_LPAE_LVL_SHIFT(l,d)) &			\
@@ -180,7 +174,7 @@
 struct arm_lpae_io_pgtable {
 	struct io_pgtable	iop;
 
-	int			levels;
+	int			start_level;
 	size_t			pgd_size;
 	unsigned long		pg_shift;
 	unsigned long		bits_per_level;
@@ -481,7 +475,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_lpae_iopte *ptep = data->pgd;
-	int ret, lvl = ARM_LPAE_START_LVL(data);
+	int ret, lvl = data->start_level;
 	arm_lpae_iopte prot;
 
 	/* If no access, then nothing to do */
@@ -511,7 +505,7 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 	arm_lpae_iopte *start, *end;
 	unsigned long table_size;
 
-	if (lvl == ARM_LPAE_START_LVL(data))
+	if (lvl == data->start_level)
 		table_size = data->pgd_size;
 	else
 		table_size = ARM_LPAE_GRANULE(data);
@@ -540,7 +534,7 @@ static void arm_lpae_free_pgtable(struct io_pgtable *iop)
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_to_data(iop);
 
-	__arm_lpae_free_pgtable(data, ARM_LPAE_START_LVL(data), data->pgd);
+	__arm_lpae_free_pgtable(data, data->start_level, data->pgd);
 	kfree(data);
 }
 
@@ -657,7 +651,6 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_lpae_iopte *ptep = data->pgd;
-	int lvl = ARM_LPAE_START_LVL(data);
 
 	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
 		return 0;
@@ -665,7 +658,7 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 	if (WARN_ON(iova >> data->iop.cfg.ias))
 		return 0;
 
-	return __arm_lpae_unmap(data, gather, iova, size, lvl, ptep);
+	return __arm_lpae_unmap(data, gather, iova, size, data->start_level, ptep);
 }
 
 static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
@@ -673,7 +666,7 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 {
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	arm_lpae_iopte pte, *ptep = data->pgd;
-	int lvl = ARM_LPAE_START_LVL(data);
+	int lvl = data->start_level;
 
 	do {
 		/* Valid IOPTE pointer? */
@@ -752,6 +745,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 {
 	unsigned long va_bits, pgd_bits;
 	struct arm_lpae_io_pgtable *data;
+	int levels;
 
 	arm_lpae_restrict_pgsizes(cfg);
 
@@ -777,10 +771,11 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 	data->bits_per_level = data->pg_shift - ilog2(sizeof(arm_lpae_iopte));
 
 	va_bits = cfg->ias - data->pg_shift;
-	data->levels = DIV_ROUND_UP(va_bits, data->bits_per_level);
+	levels = DIV_ROUND_UP(va_bits, data->bits_per_level);
+	data->start_level = ARM_LPAE_MAX_LEVELS - levels;
 
 	/* Calculate the actual size of our pgd (without concatenation) */
-	pgd_bits = va_bits - (data->bits_per_level * (data->levels - 1));
+	pgd_bits = va_bits - (data->bits_per_level * (levels - 1));
 	data->pgd_size = 1UL << (pgd_bits + ilog2(sizeof(arm_lpae_iopte)));
 
 	data->iop.ops = (struct io_pgtable_ops) {
@@ -910,13 +905,13 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	 * Concatenate PGDs at level 1 if possible in order to reduce
 	 * the depth of the stage-2 walk.
 	 */
-	if (data->levels == ARM_LPAE_MAX_LEVELS) {
+	if (data->start_level == 0) {
 		unsigned long pgd_pages;
 
 		pgd_pages = data->pgd_size >> ilog2(sizeof(arm_lpae_iopte));
 		if (pgd_pages <= ARM_LPAE_S2_MAX_CONCAT_PAGES) {
 			data->pgd_size = pgd_pages << data->pg_shift;
-			data->levels--;
+			data->start_level++;
 		}
 	}
 
@@ -926,7 +921,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
 	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
 
-	sl = ARM_LPAE_START_LVL(data);
+	sl = data->start_level;
 
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
@@ -1041,8 +1036,8 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 		return NULL;
 
 	/* Mali seems to need a full 4-level table regardless of IAS */
-	if (data->levels < ARM_LPAE_MAX_LEVELS) {
-		data->levels = ARM_LPAE_MAX_LEVELS;
+	if (data->start_level > 0) {
+		data->start_level = 0;
 		data->pgd_size = sizeof(arm_lpae_iopte);
 	}
 	/*
@@ -1143,8 +1138,8 @@ static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
 	pr_err("cfg: pgsize_bitmap 0x%lx, ias %u-bit\n",
 		cfg->pgsize_bitmap, cfg->ias);
 	pr_err("data: %d levels, 0x%zx pgd_size, %lu pg_shift, %lu bits_per_level, pgd @ %p\n",
-		data->levels, data->pgd_size, data->pg_shift,
-		data->bits_per_level, data->pgd);
+		ARM_LPAE_MAX_LEVELS - data->start_level, data->pgd_size,
+		data->pg_shift, data->bits_per_level, data->pgd);
 }
 
 #define __FAIL(ops, i)	({						\
-- 
2.21.0.dirty


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

* [PATCH v2 05/10] iommu/io-pgtable-arm: Simplify PGD size handling
  2019-10-25 18:08 ` Robin Murphy
@ 2019-10-25 18:08   ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

We use data->pgd_size directly for the one-off allocation and freeing of
the top-level table, but otherwise it serves for ARM_LPAE_PGD_IDX() to
repeatedly re-calculate the effective number of top-level address bits
it represents. Flip this around so we store the form we most commonly
need, and derive the lesser-used one instead. This cuts a whole bunch of
code out of the map/unmap/iova_to_phys fast-paths.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
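For anyone following along, the trade described in the commit message can be
sketched in isolation as below. This is only an illustration, not the driver
code: struct pgd_geom, pgd_size() and pgd_extra_idx_bits() are hypothetical
names, and uint64_t stands in for arm_lpae_iopte.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t iopte_t;	/* assumption: stand-in for arm_lpae_iopte */

/* Hypothetical container holding only the two fields this sketch needs. */
struct pgd_geom {
	int pgd_bits;		/* address bits resolved by the top-level table */
	int bits_per_level;	/* address bits resolved by each lower level */
};

/* Lesser-used form: byte size of the top-level table, derived on demand. */
static size_t pgd_size(const struct pgd_geom *g)
{
	return sizeof(iopte_t) << g->pgd_bits;
}

/*
 * Commonly needed form: how many index bits the start level has beyond
 * (or, if negative, short of) a regular level. With pgd_bits stored
 * directly this is a single subtraction.
 */
static int pgd_extra_idx_bits(const struct pgd_geom *g)
{
	return g->pgd_bits - g->bits_per_level;
}
```

With a 4K granule (bits_per_level = 9) and a stage-2 table concatenated to
pgd_bits = 10, pgd_size() gives 8192 bytes (two pages) and
pgd_extra_idx_bits() gives 1. The Mali case of pgd_bits = 0 gives a size of
sizeof(iopte_t), matching the value the old code stored in pgd_size.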
 drivers/iommu/io-pgtable-arm.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index fb5d30e04001..4b1483eb0ccf 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -40,16 +40,15 @@
 	(d)->pg_shift)
 
 #define ARM_LPAE_GRANULE(d)		(1UL << (d)->pg_shift)
-
-#define ARM_LPAE_PAGES_PER_PGD(d)					\
-	DIV_ROUND_UP((d)->pgd_size, ARM_LPAE_GRANULE(d))
+#define ARM_LPAE_PGD_SIZE(d)						\
+	(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
 
 /*
  * Calculate the index at level l used to map virtual address a using the
  * pagetable in d.
  */
 #define ARM_LPAE_PGD_IDX(l,d)						\
-	((l) == (d)->start_level ? ilog2(ARM_LPAE_PAGES_PER_PGD(d)) : 0)
+	((l) == (d)->start_level ? (d)->pgd_bits - (d)->bits_per_level : 0)
 
 #define ARM_LPAE_LVL_IDX(a,l,d)						\
 	(((u64)(a) >> ARM_LPAE_LVL_SHIFT(l,d)) &			\
@@ -174,8 +173,8 @@
 struct arm_lpae_io_pgtable {
 	struct io_pgtable	iop;
 
+	int			pgd_bits;
 	int			start_level;
-	size_t			pgd_size;
 	unsigned long		pg_shift;
 	unsigned long		bits_per_level;
 
@@ -506,7 +505,7 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl,
 	unsigned long table_size;
 
 	if (lvl == data->start_level)
-		table_size = data->pgd_size;
+		table_size = ARM_LPAE_PGD_SIZE(data);
 	else
 		table_size = ARM_LPAE_GRANULE(data);
 
@@ -743,7 +742,7 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 static struct arm_lpae_io_pgtable *
 arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 {
-	unsigned long va_bits, pgd_bits;
+	unsigned long va_bits;
 	struct arm_lpae_io_pgtable *data;
 	int levels;
 
@@ -775,8 +774,7 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 	data->start_level = ARM_LPAE_MAX_LEVELS - levels;
 
 	/* Calculate the actual size of our pgd (without concatenation) */
-	pgd_bits = va_bits - (data->bits_per_level * (levels - 1));
-	data->pgd_size = 1UL << (pgd_bits + ilog2(sizeof(arm_lpae_iopte)));
+	data->pgd_bits = va_bits - (data->bits_per_level * (levels - 1));
 
 	data->iop.ops = (struct io_pgtable_ops) {
 		.map		= arm_lpae_map,
@@ -870,7 +868,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_lpae_s1_cfg.mair[1] = 0;
 
 	/* Looking good; allocate a pgd */
-	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
+	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
+					   GFP_KERNEL, cfg);
 	if (!data->pgd)
 		goto out_free_data;
 
@@ -908,9 +907,9 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	if (data->start_level == 0) {
 		unsigned long pgd_pages;
 
-		pgd_pages = data->pgd_size >> ilog2(sizeof(arm_lpae_iopte));
+		pgd_pages = ARM_LPAE_PGD_SIZE(data) / sizeof(arm_lpae_iopte);
 		if (pgd_pages <= ARM_LPAE_S2_MAX_CONCAT_PAGES) {
-			data->pgd_size = pgd_pages << data->pg_shift;
+			data->pgd_bits += data->bits_per_level;
 			data->start_level++;
 		}
 	}
@@ -967,7 +966,8 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	cfg->arm_lpae_s2_cfg.vtcr = reg;
 
 	/* Allocate pgd pages */
-	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
+	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
+					   GFP_KERNEL, cfg);
 	if (!data->pgd)
 		goto out_free_data;
 
@@ -1038,7 +1038,7 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 	/* Mali seems to need a full 4-level table regardless of IAS */
 	if (data->start_level > 0) {
 		data->start_level = 0;
-		data->pgd_size = sizeof(arm_lpae_iopte);
+		data->pgd_bits = 0;
 	}
 	/*
 	 * MEMATTR: Mali has no actual notion of a non-cacheable type, so the
@@ -1055,7 +1055,8 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie)
 		(ARM_MALI_LPAE_MEMATTR_IMP_DEF
 		 << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV));
 
-	data->pgd = __arm_lpae_alloc_pages(data->pgd_size, GFP_KERNEL, cfg);
+	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL,
+					   cfg);
 	if (!data->pgd)
 		goto out_free_data;
 
@@ -1138,7 +1139,7 @@ static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
 	pr_err("cfg: pgsize_bitmap 0x%lx, ias %u-bit\n",
 		cfg->pgsize_bitmap, cfg->ias);
 	pr_err("data: %d levels, 0x%zx pgd_size, %lu pg_shift, %lu bits_per_level, pgd @ %p\n",
-		ARM_LPAE_MAX_LEVELS - data->start_level, data->pgd_size,
+		ARM_LPAE_MAX_LEVELS - data->start_level, ARM_LPAE_PGD_SIZE(data),
 		data->pg_shift, data->bits_per_level, data->pgd);
 }
 
-- 
2.21.0.dirty

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

* [PATCH v2 06/10] iommu/io-pgtable-arm: Simplify level indexing
  2019-10-25 18:08 ` Robin Murphy
@ 2019-10-25 18:08   ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

The nature of the LPAE format means that data->pg_shift is always
redundant with data->bits_per_level, since they represent the size of a
page and the number of PTEs per page respectively, and the size of a PTE
is constant. Thus it works out more efficient to only store the latter,
and derive the former via a trivial addition where necessary.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
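The redundancy argued in the commit message can be checked with a small
standalone sketch. The names here (MAX_LEVELS, PTE_SHIFT, granule(),
lvl_shift()) are hypothetical stand-ins for the driver macros, and PTEs are
assumed to be 8 bytes as in the LPAE format.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_LEVELS	4	/* assumption: mirrors ARM_LPAE_MAX_LEVELS */
#define PTE_SHIFT	3	/* log2(sizeof(uint64_t)), i.e. 8-byte PTEs */

/* Page size recovered from bits_per_level alone: PTE size times PTEs per page. */
static size_t granule(int bits_per_level)
{
	return sizeof(uint64_t) << bits_per_level;
}

/*
 * Address shift for level lvl. The last level (MAX_LEVELS - 1) lands on
 * bits_per_level + PTE_SHIFT, which is exactly the old pg_shift.
 */
static int lvl_shift(int lvl, int bits_per_level)
{
	return (MAX_LEVELS - lvl) * bits_per_level + PTE_SHIFT;
}
```

For a 4K granule (bits_per_level = 9), granule() returns 4096 and
lvl_shift(3, 9) is 12. For a 64K granule (bits_per_level = 13), granule()
returns 65536 and lvl_shift(2, 13) is 29, i.e. 512MB blocks.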
 drivers/iommu/io-pgtable-arm.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 4b1483eb0ccf..15b4927ce36b 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -36,10 +36,11 @@
  * in a virtual address mapped by the pagetable in d.
  */
 #define ARM_LPAE_LVL_SHIFT(l,d)						\
-	(((ARM_LPAE_MAX_LEVELS - 1 - (l)) * (d)->bits_per_level) +	\
-	(d)->pg_shift)
+	(((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level) +		\
+	ilog2(sizeof(arm_lpae_iopte)))
 
-#define ARM_LPAE_GRANULE(d)		(1UL << (d)->pg_shift)
+#define ARM_LPAE_GRANULE(d)						\
+	(sizeof(arm_lpae_iopte) << (d)->bits_per_level)
 #define ARM_LPAE_PGD_SIZE(d)						\
 	(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
 
@@ -55,9 +56,7 @@
 	 ((1 << ((d)->bits_per_level + ARM_LPAE_PGD_IDX(l,d))) - 1))
 
 /* Calculate the block/page mapping size at level l for pagetable in d. */
-#define ARM_LPAE_BLOCK_SIZE(l,d)					\
-	(1ULL << (ilog2(sizeof(arm_lpae_iopte)) +			\
-		((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level)))
+#define ARM_LPAE_BLOCK_SIZE(l,d)	(1ULL << ARM_LPAE_LVL_SHIFT(l,d))
 
 /* Page table bits */
 #define ARM_LPAE_PTE_TYPE_SHIFT		0
@@ -175,8 +174,7 @@ struct arm_lpae_io_pgtable {
 
 	int			pgd_bits;
 	int			start_level;
-	unsigned long		pg_shift;
-	unsigned long		bits_per_level;
+	int			bits_per_level;
 
 	void			*pgd;
 };
@@ -206,7 +204,7 @@ static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
 {
 	u64 paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
 
-	if (data->pg_shift < 16)
+	if (data->bits_per_level < 13) /* i.e. 64K granule */
 		return paddr;
 
 	/* Rotate the packed high-order bits back to the top */
@@ -742,9 +740,8 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 static struct arm_lpae_io_pgtable *
 arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 {
-	unsigned long va_bits;
 	struct arm_lpae_io_pgtable *data;
-	int levels;
+	int levels, va_bits, pg_shift;
 
 	arm_lpae_restrict_pgsizes(cfg);
 
@@ -766,10 +763,10 @@ arm_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg)
 	if (!data)
 		return NULL;
 
-	data->pg_shift = __ffs(cfg->pgsize_bitmap);
-	data->bits_per_level = data->pg_shift - ilog2(sizeof(arm_lpae_iopte));
+	pg_shift = __ffs(cfg->pgsize_bitmap);
+	data->bits_per_level = pg_shift - ilog2(sizeof(arm_lpae_iopte));
 
-	va_bits = cfg->ias - data->pg_shift;
+	va_bits = cfg->ias - pg_shift;
 	levels = DIV_ROUND_UP(va_bits, data->bits_per_level);
 	data->start_level = ARM_LPAE_MAX_LEVELS - levels;
 
@@ -1138,9 +1135,9 @@ static void __init arm_lpae_dump_ops(struct io_pgtable_ops *ops)
 
 	pr_err("cfg: pgsize_bitmap 0x%lx, ias %u-bit\n",
 		cfg->pgsize_bitmap, cfg->ias);
-	pr_err("data: %d levels, 0x%zx pgd_size, %lu pg_shift, %lu bits_per_level, pgd @ %p\n",
+	pr_err("data: %d levels, 0x%zx pgd_size, %u pg_shift, %u bits_per_level, pgd @ %p\n",
 		ARM_LPAE_MAX_LEVELS - data->start_level, ARM_LPAE_PGD_SIZE(data),
-		data->pg_shift, data->bits_per_level, data->pgd);
+		ilog2(ARM_LPAE_GRANULE(data)), data->bits_per_level, data->pgd);
 }
 
 #define __FAIL(ops, i)	({						\
-- 
2.21.0.dirty

* [PATCH v2 07/10] iommu/io-pgtable-arm: Rationalise MAIR handling
  2019-10-25 18:08 ` Robin Murphy
@ 2019-10-25 18:08   ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

Between VMSAv8-64 and the various 32-bit formats, there is either one
64-bit MAIR or a pair of 32-bit MAIR0/MAIR1 or PRRR/NMRR registers.
As such, keeping two 64-bit values in io_pgtable_cfg has always been
overkill.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
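As a rough illustration of how a consumer with two 32-bit registers can use
the single 64-bit value, here is a minimal sketch. mair_lo() and mair_hi()
are hypothetical helpers; the callers in this patch simply rely on
truncation and ">> 32".

```c
#include <stdint.h>

/* Low word: what a 32-bit MAIR0 write would receive. */
static uint32_t mair_lo(uint64_t mair)
{
	return (uint32_t)mair;
}

/* High word: what a 32-bit MAIR1 write would receive. */
static uint32_t mair_hi(uint64_t mair)
{
	return (uint32_t)(mair >> 32);
}
```

For an arbitrary example value of 0x11223344aabbccdd, mair_lo() returns
0xaabbccdd and mair_hi() returns 0x11223344.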
 drivers/iommu/arm-smmu-v3.c    | 2 +-
 drivers/iommu/arm-smmu.c       | 4 ++--
 drivers/iommu/io-pgtable-arm.c | 3 +--
 drivers/iommu/ipmmu-vmsa.c     | 2 +-
 drivers/iommu/qcom_iommu.c     | 4 ++--
 include/linux/io-pgtable.h     | 2 +-
 6 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 8da93e730d6f..3f20e548f1ec 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2172,7 +2172,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	cfg->cd.asid	= (u16)asid;
 	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
 	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
-	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
+	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 	return 0;
 
 out_free_asid:
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 080af0326816..2bc3e93b11e6 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -552,8 +552,8 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 			cb->mair[0] = pgtbl_cfg->arm_v7s_cfg.prrr;
 			cb->mair[1] = pgtbl_cfg->arm_v7s_cfg.nmrr;
 		} else {
-			cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
-			cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair[1];
+			cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair;
+			cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32;
 		}
 	}
 }
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 15b4927ce36b..1795df8f7a51 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -861,8 +861,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	      (ARM_LPAE_MAIR_ATTR_INC_OWBRWA
 	       << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE));
 
-	cfg->arm_lpae_s1_cfg.mair[0] = reg;
-	cfg->arm_lpae_s1_cfg.mair[1] = 0;
+	cfg->arm_lpae_s1_cfg.mair = reg;
 
 	/* Looking good; allocate a pgd */
 	data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data),
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 9da8309f7170..e4da6efbda49 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -438,7 +438,7 @@ static void ipmmu_domain_setup_context(struct ipmmu_vmsa_domain *domain)
 
 	/* MAIR0 */
 	ipmmu_ctx_write_root(domain, IMMAIR0,
-			     domain->cfg.arm_lpae_s1_cfg.mair[0]);
+			     domain->cfg.arm_lpae_s1_cfg.mair);
 
 	/* IMBUSCR */
 	if (domain->mmu->features->setup_imbuscr)
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index c31e7bc4ccbe..66e9b40e9275 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -284,9 +284,9 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 
 		/* MAIRs (stage-1 only) */
 		iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR0,
-				pgtbl_cfg.arm_lpae_s1_cfg.mair[0]);
+				pgtbl_cfg.arm_lpae_s1_cfg.mair);
 		iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR1,
-				pgtbl_cfg.arm_lpae_s1_cfg.mair[1]);
+				pgtbl_cfg.arm_lpae_s1_cfg.mair >> 32);
 
 		/* SCTLR */
 		reg = SCTLR_CFIE | SCTLR_CFRE | SCTLR_AFE | SCTLR_TRE |
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ec7a13405f10..ee21eedafe98 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -102,7 +102,7 @@ struct io_pgtable_cfg {
 		struct {
 			u64	ttbr[2];
 			u64	tcr;
-			u64	mair[2];
+			u64	mair;
 		} arm_lpae_s1_cfg;
 
 		struct {
-- 
2.21.0.dirty

* [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling
  2019-10-25 18:08 ` Robin Murphy
@ 2019-10-25 18:08   ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

TTBR1 values have so far been redundant since no users implement any
support for split address spaces. Crucially, though, one of the main
reasons for wanting to do so is to be able to manage each half entirely
independently, e.g. context-switching one set of mappings without
disturbing the other. Thus it seems unlikely that tying two tables
together in a single io_pgtable_cfg would ever be particularly desirable
or useful.

Streamline the configs to just a single conceptual TTBR value
representing the allocated table. This paves the way for future users to
support split address spaces by simply allocating a table and dealing
with the detailed TTBRn logistics themselves.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu-v3.c        |  2 +-
 drivers/iommu/arm-smmu.c           |  9 ++++-----
 drivers/iommu/io-pgtable-arm-v7s.c | 16 +++++++---------
 drivers/iommu/io-pgtable-arm.c     |  5 ++---
 drivers/iommu/ipmmu-vmsa.c         |  2 +-
 drivers/iommu/msm_iommu.c          |  4 ++--
 drivers/iommu/mtk_iommu.c          |  4 ++--
 drivers/iommu/qcom_iommu.c         |  3 +--
 include/linux/io-pgtable.h         |  4 ++--
 9 files changed, 22 insertions(+), 27 deletions(-)
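
As an illustration of the "detailed TTBRn logistics" that a driver now owns, here is a minimal standalone sketch of what the arm-smmu.c LPAE hunk below does: the single table address goes into TTBR0 together with the ASID, while TTBR1 carries the ASID only. The struct ctx_ttbrs and make_lpae_ttbrs() names are hypothetical; only the ASID field position (bits 63:48, as in TTBRn_ASID) is taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define TTBR_ASID_SHIFT	48	/* TTBRn_ASID covers bits 63:48 */

/* Hypothetical container for the two register values. */
struct ctx_ttbrs {
	uint64_t ttbr0;
	uint64_t ttbr1;
};

/* Build both TTBR values from one table address and an ASID. */
static struct ctx_ttbrs make_lpae_ttbrs(uint64_t table_pa, uint16_t asid)
{
	uint64_t asid_field = (uint64_t)asid << TTBR_ASID_SHIFT;
	struct ctx_ttbrs t;

	/* The table address must not spill into the ASID field. */
	assert((table_pa >> TTBR_ASID_SHIFT) == 0);

	t.ttbr0 = table_pa | asid_field;	/* the one allocated table */
	t.ttbr1 = asid_field;			/* no table, ASID only */
	return t;
}
```

A driver wanting split address spaces would presumably allocate a second table and fill in ttbr1 itself, which is the flexibility the commit message is after.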

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3f20e548f1ec..da31e607698f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2170,7 +2170,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	}
 
 	cfg->cd.asid	= (u16)asid;
-	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
+	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
 	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 	return 0;
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 2bc3e93b11e6..a249e4e49ead 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -534,13 +534,12 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 	/* TTBRs */
 	if (stage1) {
 		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
-			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr[0];
-			cb->ttbr[1] = pgtbl_cfg->arm_v7s_cfg.ttbr[1];
+			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
+			cb->ttbr[1] = 0;
 		} else {
-			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
+			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
 			cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
-			cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[1];
-			cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
+			cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
 		}
 	} else {
 		cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 7c3bd2c3cdca..4d2c1e7f67c4 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -822,15 +822,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
-	/* TTBRs */
-	cfg->arm_v7s_cfg.ttbr[0] = virt_to_phys(data->pgd) |
-				   ARM_V7S_TTBR_S | ARM_V7S_TTBR_NOS |
-				   (cfg->coherent_walk ?
-				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
-				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
-				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
-				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
-	cfg->arm_v7s_cfg.ttbr[1] = 0;
+	/* TTBR */
+	cfg->arm_v7s_cfg.ttbr = virt_to_phys(data->pgd) | ARM_V7S_TTBR_S |
+				(cfg->coherent_walk ? (ARM_V7S_TTBR_NOS |
+				  ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
+				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
+				 (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
+				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
 	return &data->iop;
 
 out_free_data:
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 1795df8f7a51..bc0841040ebe 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -872,9 +872,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
-	/* TTBRs */
-	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
-	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
+	/* TTBR */
+	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(data->pgd);
 	return &data->iop;
 
 out_free_data:
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index e4da6efbda49..4fe0ff3216ce 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -416,7 +416,7 @@ static void ipmmu_domain_setup_context(struct ipmmu_vmsa_domain *domain)
 	u32 tmp;
 
 	/* TTBR0 */
-	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
+	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr;
 	ipmmu_ctx_write_root(domain, IMTTLBR0, ttbr);
 	ipmmu_ctx_write_root(domain, IMTTUBR0, ttbr >> 32);
 
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index be99d408cf35..9ceec140fa67 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -279,8 +279,8 @@ static void __program_context(void __iomem *base, int ctx,
 	SET_V2PCFG(base, ctx, 0x3);
 
 	SET_TTBCR(base, ctx, priv->cfg.arm_v7s_cfg.tcr);
-	SET_TTBR0(base, ctx, priv->cfg.arm_v7s_cfg.ttbr[0]);
-	SET_TTBR1(base, ctx, priv->cfg.arm_v7s_cfg.ttbr[1]);
+	SET_TTBR0(base, ctx, priv->cfg.arm_v7s_cfg.ttbr);
+	SET_TTBR1(base, ctx, 0);
 
 	/* Set prrr and nmrr */
 	SET_PRRR(base, ctx, priv->cfg.arm_v7s_cfg.prrr);
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 67a483c1a935..ef0b36eeb83d 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -392,7 +392,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
 	/* Update the pgtable base address register of the M4U HW */
 	if (!data->m4u_dom) {
 		data->m4u_dom = dom;
-		writel(dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
+		writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
 		       data->base + REG_MMU_PT_BASE_ADDR);
 	}
 
@@ -797,7 +797,7 @@ static int __maybe_unused mtk_iommu_resume(struct device *dev)
 	writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
 	writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG);
 	if (m4u_dom)
-		writel(m4u_dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
+		writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
 		       base + REG_MMU_PT_BASE_ADDR);
 	return 0;
 }
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 66e9b40e9275..9a57eb6c253c 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -269,10 +269,9 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 
 		/* TTBRs */
 		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
-				pgtbl_cfg.arm_lpae_s1_cfg.ttbr[0] |
+				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
 				FIELD_PREP(TTBRn_ASID, ctx->asid));
 		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
-				pgtbl_cfg.arm_lpae_s1_cfg.ttbr[1] |
 				FIELD_PREP(TTBRn_ASID, ctx->asid));
 
 		/* TCR */
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ee21eedafe98..53bca5343f52 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -100,7 +100,7 @@ struct io_pgtable_cfg {
 	/* Low-level data specific to the table format */
 	union {
 		struct {
-			u64	ttbr[2];
+			u64	ttbr;
 			u64	tcr;
 			u64	mair;
 		} arm_lpae_s1_cfg;
@@ -111,7 +111,7 @@ struct io_pgtable_cfg {
 		} arm_lpae_s2_cfg;
 
 		struct {
-			u32	ttbr[2];
+			u32	ttbr;
 			u32	tcr;
 			u32	nmrr;
 			u32	prrr;
-- 
2.21.0.dirty


* [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling
@ 2019-10-25 18:08   ` Robin Murphy
  0 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, jcrouse, linux-arm-kernel

TTBR1 values have so far been redundant since no users implement any
support for split address spaces. Crucially, though, one of the main
reasons for wanting to do so is to be able to manage each half entirely
independently, e.g. context-switching one set of mappings without
disturbing the other. Thus it seems unlikely that tying two tables
together in a single io_pgtable_cfg would ever be particularly desirable
or useful.

Streamline the configs to just a single conceptual TTBR value
representing the allocated table. This paves the way for future users to
support split address spaces by simply allocating a table and dealing
with the detailed TTBRn logistics themselves.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu-v3.c        |  2 +-
 drivers/iommu/arm-smmu.c           |  9 ++++-----
 drivers/iommu/io-pgtable-arm-v7s.c | 16 +++++++---------
 drivers/iommu/io-pgtable-arm.c     |  5 ++---
 drivers/iommu/ipmmu-vmsa.c         |  2 +-
 drivers/iommu/msm_iommu.c          |  4 ++--
 drivers/iommu/mtk_iommu.c          |  4 ++--
 drivers/iommu/qcom_iommu.c         |  3 +--
 include/linux/io-pgtable.h         |  4 ++--
 9 files changed, 22 insertions(+), 27 deletions(-)
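
The v7s hunk below also folds the NOS bit into the coherent-walk case. A rough standalone sketch of the resulting TTBR composition follows. The IRGN encoding matches the ARM_V7S_TTBR_IRGN_ATTR() macro in the driver; the bit positions used for S, NOS and ORGN and the region attribute values are not shown in this patch and are assumptions based on the Armv7-A short-descriptor TTBR layout. The v7s_ttbr() helper itself is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed short-descriptor TTBR attribute bits (not shown in the patch). */
#define V7S_TTBR_S		(1u << 1)
#define V7S_TTBR_NOS		(1u << 5)
#define V7S_TTBR_ORGN_ATTR(a)	(((a) & 0x3u) << 3)
/* IRGN is split: attr bit 0 goes to bit 6, attr bit 1 goes to bit 0. */
#define V7S_TTBR_IRGN_ATTR(a)	((((a) & 0x1u) << 6) | (((a) & 0x2u) >> 1))

#define V7S_RGN_NC		0u	/* non-cacheable (assumed encoding) */
#define V7S_RGN_WBWA		1u	/* write-back write-allocate (assumed) */

/* Compose a TTBR value from the pgd address and the walk coherency. */
static uint32_t v7s_ttbr(uint32_t pgd_pa, bool coherent_walk)
{
	uint32_t attrs;

	/* The attribute bits occupy the low 7 bits of the register. */
	assert((pgd_pa & 0x7fu) == 0);

	if (coherent_walk)
		attrs = V7S_TTBR_NOS |
			V7S_TTBR_IRGN_ATTR(V7S_RGN_WBWA) |
			V7S_TTBR_ORGN_ATTR(V7S_RGN_WBWA);
	else
		attrs = V7S_TTBR_IRGN_ATTR(V7S_RGN_NC) |
			V7S_TTBR_ORGN_ATTR(V7S_RGN_NC);

	return pgd_pa | V7S_TTBR_S | attrs;
}
```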

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 3f20e548f1ec..da31e607698f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2170,7 +2170,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	}
 
 	cfg->cd.asid	= (u16)asid;
-	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
+	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
 	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 	return 0;
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 2bc3e93b11e6..a249e4e49ead 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -534,13 +534,12 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 	/* TTBRs */
 	if (stage1) {
 		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
-			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr[0];
-			cb->ttbr[1] = pgtbl_cfg->arm_v7s_cfg.ttbr[1];
+			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
+			cb->ttbr[1] = 0;
 		} else {
-			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
+			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
 			cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
-			cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[1];
-			cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
+			cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
 		}
 	} else {
 		cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 7c3bd2c3cdca..4d2c1e7f67c4 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -822,15 +822,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
-	/* TTBRs */
-	cfg->arm_v7s_cfg.ttbr[0] = virt_to_phys(data->pgd) |
-				   ARM_V7S_TTBR_S | ARM_V7S_TTBR_NOS |
-				   (cfg->coherent_walk ?
-				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
-				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
-				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
-				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
-	cfg->arm_v7s_cfg.ttbr[1] = 0;
+	/* TTBR */
+	cfg->arm_v7s_cfg.ttbr = virt_to_phys(data->pgd) | ARM_V7S_TTBR_S |
+				(cfg->coherent_walk ? (ARM_V7S_TTBR_NOS |
+				  ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
+				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
+				 (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
+				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
 	return &data->iop;
 
 out_free_data:
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 1795df8f7a51..bc0841040ebe 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -872,9 +872,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	/* Ensure the empty pgd is visible before any actual TTBR write */
 	wmb();
 
-	/* TTBRs */
-	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
-	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
+	/* TTBR */
+	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(data->pgd);
 	return &data->iop;
 
 out_free_data:
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index e4da6efbda49..4fe0ff3216ce 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -416,7 +416,7 @@ static void ipmmu_domain_setup_context(struct ipmmu_vmsa_domain *domain)
 	u32 tmp;
 
 	/* TTBR0 */
-	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
+	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr;
 	ipmmu_ctx_write_root(domain, IMTTLBR0, ttbr);
 	ipmmu_ctx_write_root(domain, IMTTUBR0, ttbr >> 32);
 
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index be99d408cf35..9ceec140fa67 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -279,8 +279,8 @@ static void __program_context(void __iomem *base, int ctx,
 	SET_V2PCFG(base, ctx, 0x3);
 
 	SET_TTBCR(base, ctx, priv->cfg.arm_v7s_cfg.tcr);
-	SET_TTBR0(base, ctx, priv->cfg.arm_v7s_cfg.ttbr[0]);
-	SET_TTBR1(base, ctx, priv->cfg.arm_v7s_cfg.ttbr[1]);
+	SET_TTBR0(base, ctx, priv->cfg.arm_v7s_cfg.ttbr);
+	SET_TTBR1(base, ctx, 0);
 
 	/* Set prrr and nmrr */
 	SET_PRRR(base, ctx, priv->cfg.arm_v7s_cfg.prrr);
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 67a483c1a935..ef0b36eeb83d 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -392,7 +392,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
 	/* Update the pgtable base address register of the M4U HW */
 	if (!data->m4u_dom) {
 		data->m4u_dom = dom;
-		writel(dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
+		writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
 		       data->base + REG_MMU_PT_BASE_ADDR);
 	}
 
@@ -797,7 +797,7 @@ static int __maybe_unused mtk_iommu_resume(struct device *dev)
 	writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
 	writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG);
 	if (m4u_dom)
-		writel(m4u_dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
+		writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
 		       base + REG_MMU_PT_BASE_ADDR);
 	return 0;
 }
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 66e9b40e9275..9a57eb6c253c 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -269,10 +269,9 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 
 		/* TTBRs */
 		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
-				pgtbl_cfg.arm_lpae_s1_cfg.ttbr[0] |
+				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
 				FIELD_PREP(TTBRn_ASID, ctx->asid));
 		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
-				pgtbl_cfg.arm_lpae_s1_cfg.ttbr[1] |
 				FIELD_PREP(TTBRn_ASID, ctx->asid));
 
 		/* TCR */
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index ee21eedafe98..53bca5343f52 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -100,7 +100,7 @@ struct io_pgtable_cfg {
 	/* Low-level data specific to the table format */
 	union {
 		struct {
-			u64	ttbr[2];
+			u64	ttbr;
 			u64	tcr;
 			u64	mair;
 		} arm_lpae_s1_cfg;
@@ -111,7 +111,7 @@ struct io_pgtable_cfg {
 		} arm_lpae_s2_cfg;
 
 		struct {
-			u32	ttbr[2];
+			u32	ttbr;
 			u32	tcr;
 			u32	nmrr;
 			u32	prrr;
-- 
2.21.0.dirty



* [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
  2019-10-25 18:08 ` Robin Murphy
@ 2019-10-25 18:08   ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

Although it's conceptually nice for the io_pgtable_cfg to provide a
standard VMSA TCR value, the reality is that no VMSA-compliant IOMMU
looks exactly like an Arm CPU, and they all have various other TCR
controls which io-pgtable can't be expected to understand. Thus since
there is an expectation that drivers will have to add to the given TCR
value anyway, let's strip it down to just the essentials that are
directly relevant to io-pgtable's inner workings - namely the various
sizes and the walk attributes.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/iommu/arm-smmu-v3.c        | 41 +++----------
 drivers/iommu/arm-smmu.c           |  7 ++-
 drivers/iommu/arm-smmu.h           | 27 ++++++++
 drivers/iommu/io-pgtable-arm-v7s.c |  6 +-
 drivers/iommu/io-pgtable-arm.c     | 98 ++++++++++++------------------
 drivers/iommu/io-pgtable.c         |  2 +-
 drivers/iommu/qcom_iommu.c         |  8 +--
 include/linux/io-pgtable.h         |  9 ++-
 8 files changed, 94 insertions(+), 104 deletions(-)
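
To make the driver-side packing concrete, the following is a minimal standalone sketch of what arm_smmu_lpae_tcr() in the arm-smmu.h hunk below amounts to, written with plain shifts instead of FIELD_PREP(). The struct lpae_tcr type mirrors the new bitfield in io_pgtable_cfg; the lpae_tsz() and pack_smmuv2_tcr() names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the abstracted TCR fields that io-pgtable hands back. */
struct lpae_tcr {
	unsigned int ips:3;
	unsigned int tg:2;
	unsigned int sh:2;
	unsigned int orgn:2;
	unsigned int irgn:2;
	unsigned int tsz:6;
};

/* T0SZ is 64 minus the input address size. */
static unsigned int lpae_tsz(unsigned int ias)
{
	assert(ias >= 1 && ias <= 64);	/* result must fit in 6 bits */
	return 64 - ias;
}

/* Pack the fields into the SMMUv2 context bank TCR layout. */
static uint32_t pack_smmuv2_tcr(const struct lpae_tcr *tcr)
{
	return (1u << 23) |			/* EPD1: no walks via TTBR1 */
	       ((uint32_t)tcr->tg << 14) |	/* TG0, bits 15:14 */
	       ((uint32_t)tcr->sh << 12) |	/* SH0, bits 13:12 */
	       ((uint32_t)tcr->orgn << 10) |	/* ORGN0, bits 11:10 */
	       ((uint32_t)tcr->irgn << 8) |	/* IRGN0, bits 9:8 */
	       (uint32_t)tcr->tsz;		/* T0SZ, bits 5:0 */
}
```

Hardware-specific bits such as TCR_EAE or TCR2_AS stay with the caller, as in the arm-smmu.c and qcom_iommu.c hunks.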

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index da31e607698f..ca72cd777955 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -261,27 +261,18 @@
 /* Context descriptor (stage-1 only) */
 #define CTXDESC_CD_DWORDS		8
 #define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
-#define ARM64_TCR_T0SZ			GENMASK_ULL(5, 0)
 #define CTXDESC_CD_0_TCR_TG0		GENMASK_ULL(7, 6)
-#define ARM64_TCR_TG0			GENMASK_ULL(15, 14)
 #define CTXDESC_CD_0_TCR_IRGN0		GENMASK_ULL(9, 8)
-#define ARM64_TCR_IRGN0			GENMASK_ULL(9, 8)
 #define CTXDESC_CD_0_TCR_ORGN0		GENMASK_ULL(11, 10)
-#define ARM64_TCR_ORGN0			GENMASK_ULL(11, 10)
 #define CTXDESC_CD_0_TCR_SH0		GENMASK_ULL(13, 12)
-#define ARM64_TCR_SH0			GENMASK_ULL(13, 12)
 #define CTXDESC_CD_0_TCR_EPD0		(1ULL << 14)
-#define ARM64_TCR_EPD0			(1ULL << 7)
 #define CTXDESC_CD_0_TCR_EPD1		(1ULL << 30)
-#define ARM64_TCR_EPD1			(1ULL << 23)
 
 #define CTXDESC_CD_0_ENDI		(1UL << 15)
 #define CTXDESC_CD_0_V			(1UL << 31)
 
 #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
-#define ARM64_TCR_IPS			GENMASK_ULL(34, 32)
 #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
-#define ARM64_TCR_TBI0			(1ULL << 37)
 
 #define CTXDESC_CD_0_AA64		(1UL << 41)
 #define CTXDESC_CD_0_S			(1UL << 44)
@@ -292,10 +283,6 @@
 
 #define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
 
-/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
-#define ARM_SMMU_TCR2CD(tcr, fld)	FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \
-					FIELD_GET(ARM64_TCR_##fld, tcr))
-
 /* Command queue */
 #define CMDQ_ENT_SZ_SHIFT		4
 #define CMDQ_ENT_DWORDS			((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
@@ -1443,23 +1430,6 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 }
 
 /* Context descriptor manipulation functions */
-static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
-{
-	u64 val = 0;
-
-	/* Repack the TCR. Just care about TTBR0 for now */
-	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
-	val |= ARM_SMMU_TCR2CD(tcr, TG0);
-	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
-	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
-	val |= ARM_SMMU_TCR2CD(tcr, SH0);
-	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
-	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
-	val |= ARM_SMMU_TCR2CD(tcr, IPS);
-
-	return val;
-}
-
 static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
 				    struct arm_smmu_s1_cfg *cfg)
 {
@@ -1469,7 +1439,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
 	 * We don't need to issue any invalidation here, as we'll invalidate
 	 * the STE when installing the new entry anyway.
 	 */
-	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
+	val = cfg->cd.tcr |
 #ifdef __BIG_ENDIAN
 	      CTXDESC_CD_0_ENDI |
 #endif
@@ -2155,6 +2125,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	int asid;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 
 	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
 	if (asid < 0)
@@ -2171,7 +2142,13 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 
 	cfg->cd.asid	= (u16)asid;
 	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
+	cfg->cd.tcr	= FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) |
+			  FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) |
+			  FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) |
+			  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
+			  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
+			  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
+			  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 	return 0;
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a249e4e49ead..ade323ab0484 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -521,11 +521,12 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
 			cb->tcr[0] = pgtbl_cfg->arm_v7s_cfg.tcr;
 		} else {
-			cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
-			cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
-			cb->tcr[1] |= FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
+			cb->tcr[0] = arm_smmu_lpae_tcr(pgtbl_cfg);
+			cb->tcr[1] = arm_smmu_lpae_tcr2(pgtbl_cfg);
 			if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
 				cb->tcr[1] |= TCR2_AS;
+			else
+				cb->tcr[0] |= TCR_EAE;
 		}
 	} else {
 		cb->tcr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 409716410b0d..98db074281ac 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -158,12 +158,24 @@ enum arm_smmu_cbar_type {
 #define TCR2_SEP			GENMASK(17, 15)
 #define TCR2_SEP_UPSTREAM		0x7
 #define TCR2_AS				BIT(4)
+#define TCR2_PASIZE			GENMASK(3, 0)
 
 #define ARM_SMMU_CB_TTBR0		0x20
 #define ARM_SMMU_CB_TTBR1		0x28
 #define TTBRn_ASID			GENMASK_ULL(63, 48)
 
+/* arm64 headers leak this somehow :( */
+#undef TCR_T0SZ
+
 #define ARM_SMMU_CB_TCR			0x30
+#define TCR_EAE				BIT(31)
+#define TCR_EPD1			BIT(23)
+#define TCR_TG0				GENMASK(15, 14)
+#define TCR_SH0				GENMASK(13, 12)
+#define TCR_ORGN0			GENMASK(11, 10)
+#define TCR_IRGN0			GENMASK(9, 8)
+#define TCR_T0SZ			GENMASK(5, 0)
+
 #define ARM_SMMU_CB_CONTEXTIDR		0x34
 #define ARM_SMMU_CB_S1_MAIR0		0x38
 #define ARM_SMMU_CB_S1_MAIR1		0x3c
@@ -318,6 +330,21 @@ struct arm_smmu_domain {
 	struct iommu_domain		domain;
 };
 
+static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
+{
+	return TCR_EPD1 |
+	       FIELD_PREP(TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
+	       FIELD_PREP(TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
+	       FIELD_PREP(TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
+	       FIELD_PREP(TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
+	       FIELD_PREP(TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+}
+
+static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
+{
+	return FIELD_PREP(TCR2_PASIZE, cfg->arm_lpae_s1_cfg.tcr.ips) |
+	       FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
+}
 
 /* Implementation details, yay! */
 struct arm_smmu_impl {
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 4d2c1e7f67c4..d8e4562ce478 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -149,8 +149,6 @@
 #define ARM_V7S_TTBR_IRGN_ATTR(attr)					\
 	((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1))
 
-#define ARM_V7S_TCR_PD1			BIT(5)
-
 #ifdef CONFIG_ZONE_DMA32
 #define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
 #define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA32
@@ -798,8 +796,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 	 */
 	cfg->pgsize_bitmap &= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
 
-	/* TCR: T0SZ=0, disable TTBR1 */
-	cfg->arm_v7s_cfg.tcr = ARM_V7S_TCR_PD1;
+	/* TCR: T0SZ=0, EAE=0 (if applicable) */
+	cfg->arm_v7s_cfg.tcr = 0;
 
 	/*
 	 * TEX remap: the indices used map to the closest equivalent types
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index bc0841040ebe..9b1912ede000 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -100,40 +100,32 @@
 #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
 
 /* Register bits */
-#define ARM_32_LPAE_TCR_EAE		(1 << 31)
-#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
+#define ARM_64_LPAE_VTCR_RES1		(1 << 31)
 
-#define ARM_LPAE_TCR_EPD1		(1 << 23)
-
-#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
-#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
-#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
+#define ARM_LPAE_VTCR_TG0_SHIFT		14
+#define ARM_LPAE_TCR_TG0_4K		0
+#define ARM_LPAE_TCR_TG0_64K		1
+#define ARM_LPAE_TCR_TG0_16K		2
 
 #define ARM_LPAE_TCR_SH0_SHIFT		12
-#define ARM_LPAE_TCR_SH0_MASK		0x3
 #define ARM_LPAE_TCR_SH_NS		0
 #define ARM_LPAE_TCR_SH_OS		2
 #define ARM_LPAE_TCR_SH_IS		3
 
 #define ARM_LPAE_TCR_ORGN0_SHIFT	10
 #define ARM_LPAE_TCR_IRGN0_SHIFT	8
-#define ARM_LPAE_TCR_RGN_MASK		0x3
 #define ARM_LPAE_TCR_RGN_NC		0
 #define ARM_LPAE_TCR_RGN_WBWA		1
 #define ARM_LPAE_TCR_RGN_WT		2
 #define ARM_LPAE_TCR_RGN_WB		3
 
-#define ARM_LPAE_TCR_SL0_SHIFT		6
-#define ARM_LPAE_TCR_SL0_MASK		0x3
+#define ARM_LPAE_VTCR_SL0_SHIFT		6
+#define ARM_LPAE_VTCR_SL0_MASK		0x3
 
 #define ARM_LPAE_TCR_T0SZ_SHIFT		0
-#define ARM_LPAE_TCR_SZ_MASK		0xf
 
-#define ARM_LPAE_TCR_PS_SHIFT		16
-#define ARM_LPAE_TCR_PS_MASK		0x7
-
-#define ARM_LPAE_TCR_IPS_SHIFT		32
-#define ARM_LPAE_TCR_IPS_MASK		0x7
+#define ARM_LPAE_VTCR_PS_SHIFT		16
+#define ARM_LPAE_VTCR_PS_MASK		0x7
 
 #define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
 #define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
@@ -787,6 +779,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 {
 	u64 reg;
 	struct arm_lpae_io_pgtable *data;
+	typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
 
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
 			    IO_PGTABLE_QUIRK_NON_STRICT))
@@ -798,58 +791,54 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 
 	/* TCR */
 	if (cfg->coherent_walk) {
-		reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
-		      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
-		      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
+		tcr->sh = ARM_LPAE_TCR_SH_IS;
+		tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
+		tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
 	} else {
-		reg = (ARM_LPAE_TCR_SH_OS << ARM_LPAE_TCR_SH0_SHIFT) |
-		      (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT) |
-		      (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT);
+		tcr->sh = ARM_LPAE_TCR_SH_OS;
+		tcr->irgn = ARM_LPAE_TCR_RGN_NC;
+		tcr->orgn = ARM_LPAE_TCR_RGN_NC;
 	}
 
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
-		reg |= ARM_LPAE_TCR_TG0_4K;
+		tcr->tg = ARM_LPAE_TCR_TG0_4K;
 		break;
 	case SZ_16K:
-		reg |= ARM_LPAE_TCR_TG0_16K;
+		tcr->tg = ARM_LPAE_TCR_TG0_16K;
 		break;
 	case SZ_64K:
-		reg |= ARM_LPAE_TCR_TG0_64K;
+		tcr->tg = ARM_LPAE_TCR_TG0_64K;
 		break;
 	}
 
 	switch (cfg->oas) {
 	case 32:
-		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_32_BIT;
 		break;
 	case 36:
-		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_36_BIT;
 		break;
 	case 40:
-		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_40_BIT;
 		break;
 	case 42:
-		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_42_BIT;
 		break;
 	case 44:
-		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_44_BIT;
 		break;
 	case 48:
-		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_48_BIT;
 		break;
 	case 52:
-		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_52_BIT;
 		break;
 	default:
 		goto out_free_data;
 	}
 
-	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
-
-	/* Disable speculative walks through TTBR1 */
-	reg |= ARM_LPAE_TCR_EPD1;
-	cfg->arm_lpae_s1_cfg.tcr = reg;
+	tcr->tsz = 64ULL - cfg->ias;
 
 	/* MAIRs */
 	reg = (ARM_LPAE_MAIR_ATTR_NC
@@ -910,7 +899,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	}
 
 	/* VTCR */
-	reg = ARM_64_LPAE_S2_TCR_RES1 |
+	reg = ARM_64_LPAE_VTCR_RES1 |
 	     (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
 	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
 	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
@@ -919,45 +908,45 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
-		reg |= ARM_LPAE_TCR_TG0_4K;
+		reg |= (ARM_LPAE_TCR_TG0_4K << ARM_LPAE_VTCR_TG0_SHIFT);
 		sl++; /* SL0 format is different for 4K granule size */
 		break;
 	case SZ_16K:
-		reg |= ARM_LPAE_TCR_TG0_16K;
+		reg |= (ARM_LPAE_TCR_TG0_16K << ARM_LPAE_VTCR_TG0_SHIFT);
 		break;
 	case SZ_64K:
-		reg |= ARM_LPAE_TCR_TG0_64K;
+		reg |= (ARM_LPAE_TCR_TG0_64K << ARM_LPAE_VTCR_TG0_SHIFT);
 		break;
 	}
 
 	switch (cfg->oas) {
 	case 32:
-		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 36:
-		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 40:
-		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 42:
-		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 44:
-		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 48:
-		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 52:
-		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	default:
 		goto out_free_data;
 	}
 
 	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
-	reg |= (~sl & ARM_LPAE_TCR_SL0_MASK) << ARM_LPAE_TCR_SL0_SHIFT;
+	reg |= (~sl & ARM_LPAE_VTCR_SL0_MASK) << ARM_LPAE_VTCR_SL0_SHIFT;
 	cfg->arm_lpae_s2_cfg.vtcr = reg;
 
 	/* Allocate pgd pages */
@@ -981,19 +970,12 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 static struct io_pgtable *
 arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 {
-	struct io_pgtable *iop;
-
 	if (cfg->ias > 32 || cfg->oas > 40)
 		return NULL;
 
 	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
-	iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
-	if (iop) {
-		cfg->arm_lpae_s1_cfg.tcr |= ARM_32_LPAE_TCR_EAE;
-		cfg->arm_lpae_s1_cfg.tcr &= 0xffffffff;
-	}
 
-	return iop;
+	return arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
 }
 
 static struct io_pgtable *
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index ced53e5b72b5..94394c81468f 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -63,7 +63,7 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops)
 	if (!ops)
 		return;
 
-	iop = container_of(ops, struct io_pgtable, ops);
+	iop = io_pgtable_ops_to_pgtable(ops);
 	io_pgtable_tlb_flush_all(iop);
 	io_pgtable_init_table[iop->fmt]->free(iop);
 }
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 9a57eb6c253c..059be7e21030 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
 				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
 				FIELD_PREP(TTBRn_ASID, ctx->asid));
-		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
-				FIELD_PREP(TTBRn_ASID, ctx->asid));
+		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);
 
 		/* TCR */
 		iommu_writel(ctx, ARM_SMMU_CB_TCR2,
-				(pgtbl_cfg.arm_lpae_s1_cfg.tcr >> 32) |
-				FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM));
+				arm_smmu_lpae_tcr2(&pgtbl_cfg));
 		iommu_writel(ctx, ARM_SMMU_CB_TCR,
-				pgtbl_cfg.arm_lpae_s1_cfg.tcr);
+				arm_smmu_lpae_tcr(&pgtbl_cfg) | TCR_EAE);
 
 		/* MAIRs (stage-1 only) */
 		iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR0,
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 53bca5343f52..6ae104cedfd7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -101,7 +101,14 @@ struct io_pgtable_cfg {
 	union {
 		struct {
 			u64	ttbr;
-			u64	tcr;
+			struct {
+				u32	ips:3;
+				u32	tg:2;
+				u32	sh:2;
+				u32	orgn:2;
+				u32	irgn:2;
+				u32	tsz:6;
+			}	tcr;
 			u64	mair;
 		} arm_lpae_s1_cfg;
 
-- 
2.21.0.dirty


* [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
@ 2019-10-25 18:08   ` Robin Murphy
  0 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, jcrouse, linux-arm-kernel

Although it's conceptually nice for the io_pgtable_cfg to provide a
standard VMSA TCR value, the reality is that no VMSA-compliant IOMMU
looks exactly like an Arm CPU, and they all have various other TCR
controls which io-pgtable can't be expected to understand. Thus since
there is an expectation that drivers will have to add to the given TCR
value anyway, let's strip it down to just the essentials that are
directly relevant to io-pgtable's inner workings - namely the various
sizes and the walk attributes.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
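For anyone following along, here is a minimal standalone sketch of the
idea, not the kernel code itself: io-pgtable fills in abstract fields,
and each driver packs them into its own register layout. The bit
positions mirror the TCR_* and TCR2_* masks added to arm-smmu.h below;
struct lpae_tcr, field_prep(), pack_smmu_tcr() and pack_smmu_tcr2() are
hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the new arm_lpae_s1_cfg.tcr bitfield. */
struct lpae_tcr {
	uint32_t ips:3;
	uint32_t tg:2;
	uint32_t sh:2;
	uint32_t orgn:2;
	uint32_t irgn:2;
	uint32_t tsz:6;
};

/* Simplified FIELD_PREP(): place val in a field starting at bit `shift`. */
static uint32_t field_prep(unsigned int shift, unsigned int width, uint32_t val)
{
	assert(val < (1u << width));	/* value must fit in the field */
	return val << shift;
}

/* Pack the abstract fields into an SMMUv2-style context bank TCR. */
static uint32_t pack_smmu_tcr(const struct lpae_tcr *tcr)
{
	return (1u << 23) |			/* EPD1: no walks via TTBR1 */
	       field_prep(14, 2, tcr->tg) |	/* TG0   [15:14] */
	       field_prep(12, 2, tcr->sh) |	/* SH0   [13:12] */
	       field_prep(10, 2, tcr->orgn) |	/* ORGN0 [11:10] */
	       field_prep(8, 2, tcr->irgn) |	/* IRGN0 [9:8]   */
	       field_prep(0, 6, tcr->tsz);	/* T0SZ  [5:0]   */
}

/* Pack the output size plus the driver-specific SEP field into TCR2. */
static uint32_t pack_smmu_tcr2(const struct lpae_tcr *tcr)
{
	return field_prep(0, 4, tcr->ips) |	/* PASIZE [3:0]   */
	       field_prep(15, 3, 0x7);		/* SEP    [17:15], "upstream" */
}
```

With a coherent walk, 4K granule and 48-bit input and output sizes
(ips = 5, tg = 0, sh = 3, orgn = irgn = 1, tsz = 16), these helpers
give 0x803510 for TCR and 0x38005 for TCR2. Driver-only bits such as
EAE or AS would be ORed in by the caller, as the hunks below do.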
 drivers/iommu/arm-smmu-v3.c        | 41 +++----------
 drivers/iommu/arm-smmu.c           |  7 ++-
 drivers/iommu/arm-smmu.h           | 27 ++++++++
 drivers/iommu/io-pgtable-arm-v7s.c |  6 +-
 drivers/iommu/io-pgtable-arm.c     | 98 ++++++++++++------------------
 drivers/iommu/io-pgtable.c         |  2 +-
 drivers/iommu/qcom_iommu.c         |  8 +--
 include/linux/io-pgtable.h         |  9 ++-
 8 files changed, 94 insertions(+), 104 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index da31e607698f..ca72cd777955 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -261,27 +261,18 @@
 /* Context descriptor (stage-1 only) */
 #define CTXDESC_CD_DWORDS		8
 #define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
-#define ARM64_TCR_T0SZ			GENMASK_ULL(5, 0)
 #define CTXDESC_CD_0_TCR_TG0		GENMASK_ULL(7, 6)
-#define ARM64_TCR_TG0			GENMASK_ULL(15, 14)
 #define CTXDESC_CD_0_TCR_IRGN0		GENMASK_ULL(9, 8)
-#define ARM64_TCR_IRGN0			GENMASK_ULL(9, 8)
 #define CTXDESC_CD_0_TCR_ORGN0		GENMASK_ULL(11, 10)
-#define ARM64_TCR_ORGN0			GENMASK_ULL(11, 10)
 #define CTXDESC_CD_0_TCR_SH0		GENMASK_ULL(13, 12)
-#define ARM64_TCR_SH0			GENMASK_ULL(13, 12)
 #define CTXDESC_CD_0_TCR_EPD0		(1ULL << 14)
-#define ARM64_TCR_EPD0			(1ULL << 7)
 #define CTXDESC_CD_0_TCR_EPD1		(1ULL << 30)
-#define ARM64_TCR_EPD1			(1ULL << 23)
 
 #define CTXDESC_CD_0_ENDI		(1UL << 15)
 #define CTXDESC_CD_0_V			(1UL << 31)
 
 #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
-#define ARM64_TCR_IPS			GENMASK_ULL(34, 32)
 #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
-#define ARM64_TCR_TBI0			(1ULL << 37)
 
 #define CTXDESC_CD_0_AA64		(1UL << 41)
 #define CTXDESC_CD_0_S			(1UL << 44)
@@ -292,10 +283,6 @@
 
 #define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
 
-/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
-#define ARM_SMMU_TCR2CD(tcr, fld)	FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \
-					FIELD_GET(ARM64_TCR_##fld, tcr))
-
 /* Command queue */
 #define CMDQ_ENT_SZ_SHIFT		4
 #define CMDQ_ENT_DWORDS			((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
@@ -1443,23 +1430,6 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
 }
 
 /* Context descriptor manipulation functions */
-static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
-{
-	u64 val = 0;
-
-	/* Repack the TCR. Just care about TTBR0 for now */
-	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
-	val |= ARM_SMMU_TCR2CD(tcr, TG0);
-	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
-	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
-	val |= ARM_SMMU_TCR2CD(tcr, SH0);
-	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
-	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
-	val |= ARM_SMMU_TCR2CD(tcr, IPS);
-
-	return val;
-}
-
 static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
 				    struct arm_smmu_s1_cfg *cfg)
 {
@@ -1469,7 +1439,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
 	 * We don't need to issue any invalidation here, as we'll invalidate
 	 * the STE when installing the new entry anyway.
 	 */
-	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
+	val = cfg->cd.tcr |
 #ifdef __BIG_ENDIAN
 	      CTXDESC_CD_0_ENDI |
 #endif
@@ -2155,6 +2125,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	int asid;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
+	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 
 	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
 	if (asid < 0)
@@ -2171,7 +2142,13 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 
 	cfg->cd.asid	= (u16)asid;
 	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
+	cfg->cd.tcr	= FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) |
+			  FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) |
+			  FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) |
+			  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
+			  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
+			  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
+			  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
 	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
 	return 0;
 
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index a249e4e49ead..ade323ab0484 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -521,11 +521,12 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
 			cb->tcr[0] = pgtbl_cfg->arm_v7s_cfg.tcr;
 		} else {
-			cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
-			cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
-			cb->tcr[1] |= FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
+			cb->tcr[0] = arm_smmu_lpae_tcr(pgtbl_cfg);
+			cb->tcr[1] = arm_smmu_lpae_tcr2(pgtbl_cfg);
 			if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
 				cb->tcr[1] |= TCR2_AS;
+			else
+				cb->tcr[0] |= TCR_EAE;
 		}
 	} else {
 		cb->tcr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 409716410b0d..98db074281ac 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -158,12 +158,24 @@ enum arm_smmu_cbar_type {
 #define TCR2_SEP			GENMASK(17, 15)
 #define TCR2_SEP_UPSTREAM		0x7
 #define TCR2_AS				BIT(4)
+#define TCR2_PASIZE			GENMASK(3, 0)
 
 #define ARM_SMMU_CB_TTBR0		0x20
 #define ARM_SMMU_CB_TTBR1		0x28
 #define TTBRn_ASID			GENMASK_ULL(63, 48)
 
+/* arm64 headers leak this somehow :( */
+#undef TCR_T0SZ
+
 #define ARM_SMMU_CB_TCR			0x30
+#define TCR_EAE				BIT(31)
+#define TCR_EPD1			BIT(23)
+#define TCR_TG0				GENMASK(15, 14)
+#define TCR_SH0				GENMASK(13, 12)
+#define TCR_ORGN0			GENMASK(11, 10)
+#define TCR_IRGN0			GENMASK(9, 8)
+#define TCR_T0SZ			GENMASK(5, 0)
+
 #define ARM_SMMU_CB_CONTEXTIDR		0x34
 #define ARM_SMMU_CB_S1_MAIR0		0x38
 #define ARM_SMMU_CB_S1_MAIR1		0x3c
@@ -318,6 +330,21 @@ struct arm_smmu_domain {
 	struct iommu_domain		domain;
 };
 
+static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
+{
+	return TCR_EPD1 |
+	       FIELD_PREP(TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
+	       FIELD_PREP(TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
+	       FIELD_PREP(TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
+	       FIELD_PREP(TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
+	       FIELD_PREP(TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+}
+
+static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
+{
+	return FIELD_PREP(TCR2_PASIZE, cfg->arm_lpae_s1_cfg.tcr.ips) |
+	       FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
+}
 
 /* Implementation details, yay! */
 struct arm_smmu_impl {
diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 4d2c1e7f67c4..d8e4562ce478 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -149,8 +149,6 @@
 #define ARM_V7S_TTBR_IRGN_ATTR(attr)					\
 	((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1))
 
-#define ARM_V7S_TCR_PD1			BIT(5)
-
 #ifdef CONFIG_ZONE_DMA32
 #define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
 #define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA32
@@ -798,8 +796,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
 	 */
 	cfg->pgsize_bitmap &= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
 
-	/* TCR: T0SZ=0, disable TTBR1 */
-	cfg->arm_v7s_cfg.tcr = ARM_V7S_TCR_PD1;
+	/* TCR: T0SZ=0, EAE=0 (if applicable) */
+	cfg->arm_v7s_cfg.tcr = 0;
 
 	/*
 	 * TEX remap: the indices used map to the closest equivalent types
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index bc0841040ebe..9b1912ede000 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -100,40 +100,32 @@
 #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
 
 /* Register bits */
-#define ARM_32_LPAE_TCR_EAE		(1 << 31)
-#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
+#define ARM_64_LPAE_VTCR_RES1		(1 << 31)
 
-#define ARM_LPAE_TCR_EPD1		(1 << 23)
-
-#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
-#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
-#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
+#define ARM_LPAE_VTCR_TG0_SHIFT		14
+#define ARM_LPAE_TCR_TG0_4K		0
+#define ARM_LPAE_TCR_TG0_64K		1
+#define ARM_LPAE_TCR_TG0_16K		2
 
 #define ARM_LPAE_TCR_SH0_SHIFT		12
-#define ARM_LPAE_TCR_SH0_MASK		0x3
 #define ARM_LPAE_TCR_SH_NS		0
 #define ARM_LPAE_TCR_SH_OS		2
 #define ARM_LPAE_TCR_SH_IS		3
 
 #define ARM_LPAE_TCR_ORGN0_SHIFT	10
 #define ARM_LPAE_TCR_IRGN0_SHIFT	8
-#define ARM_LPAE_TCR_RGN_MASK		0x3
 #define ARM_LPAE_TCR_RGN_NC		0
 #define ARM_LPAE_TCR_RGN_WBWA		1
 #define ARM_LPAE_TCR_RGN_WT		2
 #define ARM_LPAE_TCR_RGN_WB		3
 
-#define ARM_LPAE_TCR_SL0_SHIFT		6
-#define ARM_LPAE_TCR_SL0_MASK		0x3
+#define ARM_LPAE_VTCR_SL0_SHIFT		6
+#define ARM_LPAE_VTCR_SL0_MASK		0x3
 
 #define ARM_LPAE_TCR_T0SZ_SHIFT		0
-#define ARM_LPAE_TCR_SZ_MASK		0xf
 
-#define ARM_LPAE_TCR_PS_SHIFT		16
-#define ARM_LPAE_TCR_PS_MASK		0x7
-
-#define ARM_LPAE_TCR_IPS_SHIFT		32
-#define ARM_LPAE_TCR_IPS_MASK		0x7
+#define ARM_LPAE_VTCR_PS_SHIFT		16
+#define ARM_LPAE_VTCR_PS_MASK		0x7
 
 #define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
 #define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
@@ -787,6 +779,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 {
 	u64 reg;
 	struct arm_lpae_io_pgtable *data;
+	typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
 
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
 			    IO_PGTABLE_QUIRK_NON_STRICT))
@@ -798,58 +791,54 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 
 	/* TCR */
 	if (cfg->coherent_walk) {
-		reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
-		      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
-		      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
+		tcr->sh = ARM_LPAE_TCR_SH_IS;
+		tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
+		tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
 	} else {
-		reg = (ARM_LPAE_TCR_SH_OS << ARM_LPAE_TCR_SH0_SHIFT) |
-		      (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT) |
-		      (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT);
+		tcr->sh = ARM_LPAE_TCR_SH_OS;
+		tcr->irgn = ARM_LPAE_TCR_RGN_NC;
+		tcr->orgn = ARM_LPAE_TCR_RGN_NC;
 	}
 
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
-		reg |= ARM_LPAE_TCR_TG0_4K;
+		tcr->tg = ARM_LPAE_TCR_TG0_4K;
 		break;
 	case SZ_16K:
-		reg |= ARM_LPAE_TCR_TG0_16K;
+		tcr->tg = ARM_LPAE_TCR_TG0_16K;
 		break;
 	case SZ_64K:
-		reg |= ARM_LPAE_TCR_TG0_64K;
+		tcr->tg = ARM_LPAE_TCR_TG0_64K;
 		break;
 	}
 
 	switch (cfg->oas) {
 	case 32:
-		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_32_BIT;
 		break;
 	case 36:
-		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_36_BIT;
 		break;
 	case 40:
-		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_40_BIT;
 		break;
 	case 42:
-		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_42_BIT;
 		break;
 	case 44:
-		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_44_BIT;
 		break;
 	case 48:
-		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_48_BIT;
 		break;
 	case 52:
-		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		tcr->ips = ARM_LPAE_TCR_PS_52_BIT;
 		break;
 	default:
 		goto out_free_data;
 	}
 
-	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
-
-	/* Disable speculative walks through TTBR1 */
-	reg |= ARM_LPAE_TCR_EPD1;
-	cfg->arm_lpae_s1_cfg.tcr = reg;
+	tcr->tsz = 64ULL - cfg->ias;
 
 	/* MAIRs */
 	reg = (ARM_LPAE_MAIR_ATTR_NC
@@ -910,7 +899,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	}
 
 	/* VTCR */
-	reg = ARM_64_LPAE_S2_TCR_RES1 |
+	reg = ARM_64_LPAE_VTCR_RES1 |
 	     (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
 	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
 	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
@@ -919,45 +908,45 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
-		reg |= ARM_LPAE_TCR_TG0_4K;
+		reg |= (ARM_LPAE_TCR_TG0_4K << ARM_LPAE_VTCR_TG0_SHIFT);
 		sl++; /* SL0 format is different for 4K granule size */
 		break;
 	case SZ_16K:
-		reg |= ARM_LPAE_TCR_TG0_16K;
+		reg |= (ARM_LPAE_TCR_TG0_16K << ARM_LPAE_VTCR_TG0_SHIFT);
 		break;
 	case SZ_64K:
-		reg |= ARM_LPAE_TCR_TG0_64K;
+		reg |= (ARM_LPAE_TCR_TG0_64K << ARM_LPAE_VTCR_TG0_SHIFT);
 		break;
 	}
 
 	switch (cfg->oas) {
 	case 32:
-		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 36:
-		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 40:
-		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 42:
-		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 44:
-		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 48:
-		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	case 52:
-		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_VTCR_PS_SHIFT);
 		break;
 	default:
 		goto out_free_data;
 	}
 
 	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
-	reg |= (~sl & ARM_LPAE_TCR_SL0_MASK) << ARM_LPAE_TCR_SL0_SHIFT;
+	reg |= (~sl & ARM_LPAE_VTCR_SL0_MASK) << ARM_LPAE_VTCR_SL0_SHIFT;
 	cfg->arm_lpae_s2_cfg.vtcr = reg;
 
 	/* Allocate pgd pages */
@@ -981,19 +970,12 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 static struct io_pgtable *
 arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 {
-	struct io_pgtable *iop;
-
 	if (cfg->ias > 32 || cfg->oas > 40)
 		return NULL;
 
 	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
-	iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
-	if (iop) {
-		cfg->arm_lpae_s1_cfg.tcr |= ARM_32_LPAE_TCR_EAE;
-		cfg->arm_lpae_s1_cfg.tcr &= 0xffffffff;
-	}
 
-	return iop;
+	return arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
 }
 
 static struct io_pgtable *
diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
index ced53e5b72b5..94394c81468f 100644
--- a/drivers/iommu/io-pgtable.c
+++ b/drivers/iommu/io-pgtable.c
@@ -63,7 +63,7 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops)
 	if (!ops)
 		return;
 
-	iop = container_of(ops, struct io_pgtable, ops);
+	iop = io_pgtable_ops_to_pgtable(ops);
 	io_pgtable_tlb_flush_all(iop);
 	io_pgtable_init_table[iop->fmt]->free(iop);
 }
diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 9a57eb6c253c..059be7e21030 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
 		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
 				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
 				FIELD_PREP(TTBRn_ASID, ctx->asid));
-		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
-				FIELD_PREP(TTBRn_ASID, ctx->asid));
+		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);
 
 		/* TCR */
 		iommu_writel(ctx, ARM_SMMU_CB_TCR2,
-				(pgtbl_cfg.arm_lpae_s1_cfg.tcr >> 32) |
-				FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM));
+				arm_smmu_lpae_tcr2(&pgtbl_cfg));
 		iommu_writel(ctx, ARM_SMMU_CB_TCR,
-				pgtbl_cfg.arm_lpae_s1_cfg.tcr);
+				arm_smmu_lpae_tcr(&pgtbl_cfg) | TCR_EAE);
 
 		/* MAIRs (stage-1 only) */
 		iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR0,
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 53bca5343f52..6ae104cedfd7 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -101,7 +101,14 @@ struct io_pgtable_cfg {
 	union {
 		struct {
 			u64	ttbr;
-			u64	tcr;
+			struct {
+				u32	ips:3;
+				u32	tg:2;
+				u32	sh:2;
+				u32	orgn:2;
+				u32	irgn:2;
+				u32	tsz:6;
+			}	tcr;
 			u64	mair;
 		} arm_lpae_s1_cfg;
 
-- 
2.21.0.dirty



* [PATCH v2 10/10] iommu/io-pgtable-arm: Prepare for TTBR1 usage
  2019-10-25 18:08 ` Robin Murphy
@ 2019-10-25 18:08   ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-25 18:08 UTC (permalink / raw)
  To: will; +Cc: iommu, linux-arm-kernel

Now that we can correctly extract top-level indices without relying on
the remaining upper bits being zero, the only remaining impediments to
using a given table for TTBR1 are the address validation on map/unmap
and the awkward TCR translation granule format. Add a quirk so that we
can do the right thing at those points.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
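As a rough illustration of the two points the quirk touches, here is a
standalone sketch under stated assumptions rather than the kernel code:
iova_in_range() mirrors the sign-extension check added to map/unmap, and
tcr_tg() mirrors the TG0 vs TG1 granule encodings. Both function names
and enum granule are hypothetical; the encodings are taken from the
ARM_LPAE_TCR_TG0_* and ARM_LPAE_TCR_TG1_* defines in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if iova is valid for a table covering `ias` bits of input address.
 * Assumes >> on a negative int64_t is an arithmetic shift, which ISO C
 * leaves implementation-defined but GCC and Clang both provide.
 */
static bool iova_in_range(uint64_t iova, unsigned int ias, bool ttbr1)
{
	int64_t iaext;

	assert(ias > 0 && ias < 64);
	iaext = (int64_t)iova >> ias;

	/* TTBR1 addresses have all upper bits set, so invert before testing */
	if (ttbr1)
		iaext = ~iaext;
	return iaext == 0;
}

enum granule { GRANULE_4K, GRANULE_16K, GRANULE_64K };

/* TG0 and TG1 encode the same three granules with different values. */
static unsigned int tcr_tg(enum granule g, bool ttbr1)
{
	switch (g) {
	case GRANULE_4K:
		return ttbr1 ? 2 : 0;
	case GRANULE_16K:
		return ttbr1 ? 1 : 2;
	case GRANULE_64K:
		return ttbr1 ? 3 : 1;
	}
	return 0;
}
```

For a 48-bit table, 0xffff000000000000 passes the check only with the
quirk set, and 0x0000ffffffffffff passes only without it, which is the
behaviour the single inverted comparison is meant to give.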
 drivers/iommu/io-pgtable-arm.c | 25 +++++++++++++++++++------
 include/linux/io-pgtable.h     |  4 ++++
 2 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 9b1912ede000..e53edff56e54 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -107,6 +107,10 @@
 #define ARM_LPAE_TCR_TG0_64K		1
 #define ARM_LPAE_TCR_TG0_16K		2
 
+#define ARM_LPAE_TCR_TG1_16K		1
+#define ARM_LPAE_TCR_TG1_4K		2
+#define ARM_LPAE_TCR_TG1_64K		3
+
 #define ARM_LPAE_TCR_SH0_SHIFT		12
 #define ARM_LPAE_TCR_SH_NS		0
 #define ARM_LPAE_TCR_SH_OS		2
@@ -466,6 +470,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 	arm_lpae_iopte *ptep = data->pgd;
 	int ret, lvl = data->start_level;
 	arm_lpae_iopte prot;
+	long iaext = (long)iova >> cfg->ias;
 
 	/* If no access, then nothing to do */
 	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
@@ -474,7 +479,9 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
 	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
 		return -EINVAL;
 
-	if (WARN_ON(iova >> data->iop.cfg.ias || paddr >> data->iop.cfg.oas))
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext || paddr >> cfg->oas))
 		return -ERANGE;
 
 	prot = arm_lpae_prot_to_pte(data, iommu_prot);
@@ -640,11 +647,14 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
 	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
 	struct io_pgtable_cfg *cfg = &data->iop.cfg;
 	arm_lpae_iopte *ptep = data->pgd;
+	long iaext = (long)iova >> cfg->ias;
 
 	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
 		return 0;
 
-	if (WARN_ON(iova >> data->iop.cfg.ias))
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+		iaext = ~iaext;
+	if (WARN_ON(iaext))
 		return 0;
 
 	return __arm_lpae_unmap(data, gather, iova, size, data->start_level, ptep);
@@ -780,9 +790,11 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	u64 reg;
 	struct arm_lpae_io_pgtable *data;
 	typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
+	bool tg1;
 
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
-			    IO_PGTABLE_QUIRK_NON_STRICT))
+			    IO_PGTABLE_QUIRK_NON_STRICT |
+			    IO_PGTABLE_QUIRK_ARM_TTBR1))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
@@ -800,15 +812,16 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 		tcr->orgn = ARM_LPAE_TCR_RGN_NC;
 	}
 
+	tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1;
 	switch (ARM_LPAE_GRANULE(data)) {
 	case SZ_4K:
-		tcr->tg = ARM_LPAE_TCR_TG0_4K;
+		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_4K : ARM_LPAE_TCR_TG0_4K;
 		break;
 	case SZ_16K:
-		tcr->tg = ARM_LPAE_TCR_TG0_16K;
+		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_16K : ARM_LPAE_TCR_TG0_16K;
 		break;
 	case SZ_64K:
-		tcr->tg = ARM_LPAE_TCR_TG0_64K;
+		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_64K : ARM_LPAE_TCR_TG0_64K;
 		break;
 	}
 
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 6ae104cedfd7..d7c5cb685e50 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -83,12 +83,16 @@ struct io_pgtable_cfg {
 	 * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
 	 *	on unmap, for DMA domains using the flush queue mechanism for
 	 *	delayed invalidation.
+	 *
+	 * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table
+	 *	for use in the upper half of a split address space.
 	 */
 	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
 	#define IO_PGTABLE_QUIRK_TLBI_ON_MAP	BIT(2)
 	#define IO_PGTABLE_QUIRK_ARM_MTK_EXT	BIT(3)
 	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(4)
+	#define IO_PGTABLE_QUIRK_ARM_TTBR1	BIT(5)
 	unsigned long			quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
-- 
2.21.0.dirty


* Re: [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling
  2019-10-25 18:08   ` Robin Murphy
@ 2019-10-28 15:09     ` Steven Price
  -1 siblings, 0 replies; 69+ messages in thread
From: Steven Price @ 2019-10-28 15:09 UTC (permalink / raw)
  To: Robin Murphy, will; +Cc: iommu, linux-arm-kernel

On 25/10/2019 19:08, Robin Murphy wrote:
> TTBR1 values have so far been redundant since no users implement any
> support for split address spaces. Crucially, though, one of the main
> reasons for wanting to do so is to be able to manage each half entirely
> independently, e.g. context-switching one set of mappings without
> disturbing the other. Thus it seems unlikely that tying two tables
> together in a single io_pgtable_cfg would ever be particularly desirable
> or useful.
> 
> Streamline the configs to just a single conceptual TTBR value
> representing the allocated table. This paves the way for future users to
> support split address spaces by simply allocating a table and dealing
> with the detailed TTBRn logistics themselves.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c        |  2 +-
>  drivers/iommu/arm-smmu.c           |  9 ++++-----
>  drivers/iommu/io-pgtable-arm-v7s.c | 16 +++++++---------
>  drivers/iommu/io-pgtable-arm.c     |  5 ++---
>  drivers/iommu/ipmmu-vmsa.c         |  2 +-
>  drivers/iommu/msm_iommu.c          |  4 ++--
>  drivers/iommu/mtk_iommu.c          |  4 ++--
>  drivers/iommu/qcom_iommu.c         |  3 +--
>  include/linux/io-pgtable.h         |  4 ++--
>  9 files changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 3f20e548f1ec..da31e607698f 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2170,7 +2170,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  	}
>  
>  	cfg->cd.asid	= (u16)asid;
> -	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> +	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
>  	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>  	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
>  	return 0;
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 2bc3e93b11e6..a249e4e49ead 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -534,13 +534,12 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
>  	/* TTBRs */
>  	if (stage1) {
>  		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
> -			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr[0];
> -			cb->ttbr[1] = pgtbl_cfg->arm_v7s_cfg.ttbr[1];
> +			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
> +			cb->ttbr[1] = 0;
>  		} else {
> -			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> +			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
>  			cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
> -			cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[1];
> -			cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
> +			cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
>  		}
>  	} else {
>  		cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 7c3bd2c3cdca..4d2c1e7f67c4 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -822,15 +822,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>  	/* Ensure the empty pgd is visible before any actual TTBR write */
>  	wmb();
>  
> -	/* TTBRs */
> -	cfg->arm_v7s_cfg.ttbr[0] = virt_to_phys(data->pgd) |
> -				   ARM_V7S_TTBR_S | ARM_V7S_TTBR_NOS |
> -				   (cfg->coherent_walk ?
> -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
> -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
> -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
> -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
> -	cfg->arm_v7s_cfg.ttbr[1] = 0;
> +	/* TTBR */
> +	cfg->arm_v7s_cfg.ttbr = virt_to_phys(data->pgd) | ARM_V7S_TTBR_S |
> +				(cfg->coherent_walk ? (ARM_V7S_TTBR_NOS |
> +				  ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
> +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
> +				 (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
> +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));

ARM_V7S_TTBR_NOS seems to have sneaked into the cfg->coherent_walk
condition here - which you haven't mentioned in the commit log, so it
doesn't look like it should be in this commit.

Steve

>  	return &data->iop;
>  
>  out_free_data:
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 1795df8f7a51..bc0841040ebe 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -872,9 +872,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  	/* Ensure the empty pgd is visible before any actual TTBR write */
>  	wmb();
>  
> -	/* TTBRs */
> -	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
> -	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
> +	/* TTBR */
> +	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(data->pgd);
>  	return &data->iop;
>  
>  out_free_data:
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index e4da6efbda49..4fe0ff3216ce 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -416,7 +416,7 @@ static void ipmmu_domain_setup_context(struct ipmmu_vmsa_domain *domain)
>  	u32 tmp;
>  
>  	/* TTBR0 */
> -	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
> +	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr;
>  	ipmmu_ctx_write_root(domain, IMTTLBR0, ttbr);
>  	ipmmu_ctx_write_root(domain, IMTTUBR0, ttbr >> 32);
>  
> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> index be99d408cf35..9ceec140fa67 100644
> --- a/drivers/iommu/msm_iommu.c
> +++ b/drivers/iommu/msm_iommu.c
> @@ -279,8 +279,8 @@ static void __program_context(void __iomem *base, int ctx,
>  	SET_V2PCFG(base, ctx, 0x3);
>  
>  	SET_TTBCR(base, ctx, priv->cfg.arm_v7s_cfg.tcr);
> -	SET_TTBR0(base, ctx, priv->cfg.arm_v7s_cfg.ttbr[0]);
> -	SET_TTBR1(base, ctx, priv->cfg.arm_v7s_cfg.ttbr[1]);
> +	SET_TTBR0(base, ctx, priv->cfg.arm_v7s_cfg.ttbr);
> +	SET_TTBR1(base, ctx, 0);
>  
>  	/* Set prrr and nmrr */
>  	SET_PRRR(base, ctx, priv->cfg.arm_v7s_cfg.prrr);
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 67a483c1a935..ef0b36eeb83d 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -392,7 +392,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
>  	/* Update the pgtable base address register of the M4U HW */
>  	if (!data->m4u_dom) {
>  		data->m4u_dom = dom;
> -		writel(dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
> +		writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
>  		       data->base + REG_MMU_PT_BASE_ADDR);
>  	}
>  
> @@ -797,7 +797,7 @@ static int __maybe_unused mtk_iommu_resume(struct device *dev)
>  	writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
>  	writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG);
>  	if (m4u_dom)
> -		writel(m4u_dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
> +		writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
>  		       base + REG_MMU_PT_BASE_ADDR);
>  	return 0;
>  }
> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index 66e9b40e9275..9a57eb6c253c 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -269,10 +269,9 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  
>  		/* TTBRs */
>  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
> -				pgtbl_cfg.arm_lpae_s1_cfg.ttbr[0] |
> +				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
>  				FIELD_PREP(TTBRn_ASID, ctx->asid));
>  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
> -				pgtbl_cfg.arm_lpae_s1_cfg.ttbr[1] |
>  				FIELD_PREP(TTBRn_ASID, ctx->asid));
>  
>  		/* TCR */
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index ee21eedafe98..53bca5343f52 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -100,7 +100,7 @@ struct io_pgtable_cfg {
>  	/* Low-level data specific to the table format */
>  	union {
>  		struct {
> -			u64	ttbr[2];
> +			u64	ttbr;
>  			u64	tcr;
>  			u64	mair;
>  		} arm_lpae_s1_cfg;
> @@ -111,7 +111,7 @@ struct io_pgtable_cfg {
>  		} arm_lpae_s2_cfg;
>  
>  		struct {
> -			u32	ttbr[2];
> +			u32	ttbr;
>  			u32	tcr;
>  			u32	nmrr;
>  			u32	prrr;
> 

* Re: [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling
  2019-10-28 15:09     ` Steven Price
@ 2019-10-28 18:51       ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-10-28 18:51 UTC (permalink / raw)
  To: Steven Price, will; +Cc: iommu, linux-arm-kernel

On 28/10/2019 15:09, Steven Price wrote:
[...]
>> --- a/drivers/iommu/io-pgtable-arm-v7s.c
>> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
>> @@ -822,15 +822,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>>   	/* Ensure the empty pgd is visible before any actual TTBR write */
>>   	wmb();
>>   
>> -	/* TTBRs */
>> -	cfg->arm_v7s_cfg.ttbr[0] = virt_to_phys(data->pgd) |
>> -				   ARM_V7S_TTBR_S | ARM_V7S_TTBR_NOS |
>> -				   (cfg->coherent_walk ?
>> -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
>> -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
>> -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
>> -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
>> -	cfg->arm_v7s_cfg.ttbr[1] = 0;
>> +	/* TTBR */
>> +	cfg->arm_v7s_cfg.ttbr = virt_to_phys(data->pgd) | ARM_V7S_TTBR_S |
>> +				(cfg->coherent_walk ? (ARM_V7S_TTBR_NOS |
>> +				  ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
>> +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
>> +				 (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
>> +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
> 
> ARM_V7S_TTBR_NOS seems to have sneaked into the cfg->coherent_walk
> condition here - which you haven't mentioned in the commit log, so it
> doesn't look like it should be in this commit.

Ah, yes, it's taken a while to remember whether this was something 
important that got muddled up in rebasing, but it's actually just 
trivial cleanup. For !coherent_walk, the non-cacheable output attribute 
makes shareable accesses implicitly outer-shareable, so setting TTBR.NOS 
for that case actually does nothing except look misleading. Thus this is 
essentially just a cosmetic change included in the reformatting for 
clarity and consistency with the LPAE version. I'll call that out in the 
commit message, thanks for spotting!

Robin.
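
To make the effect of the reshuffled expression concrete, here is a minimal
standalone sketch (not kernel code) that composes the v7s TTBR attribute bits
both ways. The bit positions are assumptions modelled on the ARM_V7S_TTBR_*
macros in io-pgtable-arm-v7s.c, and the helper names ttbr_attrs_old() and
ttbr_attrs_new() are hypothetical.

```c
#include <stdint.h>

/* Assumed bit layout, modelled on the ARM_V7S_TTBR_* macros. */
#define TTBR_S           (1u << 1)
#define TTBR_NOS         (1u << 5)
#define TTBR_ORGN(attr)  (((attr) & 0x3u) << 3)
#define TTBR_IRGN(attr)  ((((attr) & 0x1u) << 6) | (((attr) & 0x2u) >> 1))
#define RGN_NC           0u
#define RGN_WBWA         1u

/* Attribute bits as composed before the patch: NOS set unconditionally. */
static uint32_t ttbr_attrs_old(int coherent_walk)
{
	return TTBR_S | TTBR_NOS |
	       (coherent_walk ?
		(TTBR_IRGN(RGN_WBWA) | TTBR_ORGN(RGN_WBWA)) :
		(TTBR_IRGN(RGN_NC) | TTBR_ORGN(RGN_NC)));
}

/* Attribute bits as composed after the patch: NOS only for coherent walks. */
static uint32_t ttbr_attrs_new(int coherent_walk)
{
	return TTBR_S |
	       (coherent_walk ?
		(TTBR_NOS | TTBR_IRGN(RGN_WBWA) | TTBR_ORGN(RGN_WBWA)) :
		(TTBR_IRGN(RGN_NC) | TTBR_ORGN(RGN_NC)));
}
```

Under these assumptions the two helpers return the same value for
coherent_walk != 0 and differ only in the NOS bit for coherent_walk == 0,
which is the case described above as having no functional effect.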

* Re: [PATCH v2 06/10] iommu/io-pgtable-arm: Simplify level indexing
  2019-10-25 18:08   ` Robin Murphy
@ 2019-11-04 18:17     ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-04 18:17 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel

On Fri, Oct 25, 2019 at 07:08:35PM +0100, Robin Murphy wrote:
> The nature of the LPAE format means that data->pg_shift is always
> redundant with data->bits_per_level, since they represent the size of a
> page and the number of PTEs per page respectively, and the size of a PTE
> is constant. Thus it works out more efficient to only store the latter,
> and derive the former via a trivial addition where necessary.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/io-pgtable-arm.c | 29 +++++++++++++----------------
>  1 file changed, 13 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 4b1483eb0ccf..15b4927ce36b 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -36,10 +36,11 @@
>   * in a virtual address mapped by the pagetable in d.
>   */
>  #define ARM_LPAE_LVL_SHIFT(l,d)						\
> -	(((ARM_LPAE_MAX_LEVELS - 1 - (l)) * (d)->bits_per_level) +	\
> -	(d)->pg_shift)
> +	(((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level) +		\
> +	ilog2(sizeof(arm_lpae_iopte)))
>  
> -#define ARM_LPAE_GRANULE(d)		(1UL << (d)->pg_shift)
> +#define ARM_LPAE_GRANULE(d)						\
> +	(sizeof(arm_lpae_iopte) << (d)->bits_per_level)
>  #define ARM_LPAE_PGD_SIZE(d)						\
>  	(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
>  
> @@ -55,9 +56,7 @@
>  	 ((1 << ((d)->bits_per_level + ARM_LPAE_PGD_IDX(l,d))) - 1))
>  
>  /* Calculate the block/page mapping size at level l for pagetable in d. */
> -#define ARM_LPAE_BLOCK_SIZE(l,d)					\
> -	(1ULL << (ilog2(sizeof(arm_lpae_iopte)) +			\
> -		((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level)))
> +#define ARM_LPAE_BLOCK_SIZE(l,d)	(1ULL << ARM_LPAE_LVL_SHIFT(l,d))
>  
>  /* Page table bits */
>  #define ARM_LPAE_PTE_TYPE_SHIFT		0
> @@ -175,8 +174,7 @@ struct arm_lpae_io_pgtable {
>  
>  	int			pgd_bits;
>  	int			start_level;
> -	unsigned long		pg_shift;
> -	unsigned long		bits_per_level;
> +	int			bits_per_level;
>  
>  	void			*pgd;
>  };
> @@ -206,7 +204,7 @@ static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
>  {
>  	u64 paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
>  
> -	if (data->pg_shift < 16)
> +	if (data->bits_per_level < 13) /* i.e. 64K granule */

nit, but:

	if (ARM_LPAE_GRANULE(data) < SZ_64K)

might be clearer and avoid the need for a comment?

(I can make the change locally if you agree)

Will
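
The redundancy described in the quoted commit message can be checked with a
small standalone sketch (not kernel code). It assumes 8-byte PTEs and four
levels, matching arm_lpae_iopte and ARM_LPAE_MAX_LEVELS as far as I know;
the helper names are hypothetical.

```c
#include <stdint.h>

#define MAX_LEVELS 4	/* assumed value of ARM_LPAE_MAX_LEVELS */
#define PTE_SHIFT  3	/* ilog2(sizeof(uint64_t)), i.e. 8-byte PTEs */

/* Old ARM_LPAE_LVL_SHIFT form: needs both pg_shift and bits_per_level. */
static int lvl_shift_old(int lvl, int pg_shift, int bits_per_level)
{
	return (MAX_LEVELS - 1 - lvl) * bits_per_level + pg_shift;
}

/* New form: bits_per_level alone, plus the constant PTE size. */
static int lvl_shift_new(int lvl, int bits_per_level)
{
	return (MAX_LEVELS - lvl) * bits_per_level + PTE_SHIFT;
}

/* Returns 1 if both forms agree at every level for the given page shift. */
static int forms_agree(int pg_shift)
{
	int bits_per_level = pg_shift - PTE_SHIFT;

	for (int lvl = 0; lvl < MAX_LEVELS; lvl++)
		if (lvl_shift_old(lvl, pg_shift, bits_per_level) !=
		    lvl_shift_new(lvl, bits_per_level))
			return 0;
	return 1;
}
```

Since pg_shift is bits_per_level + PTE_SHIFT, the old expression reduces to
the new one algebraically; forms_agree() simply confirms that for page
shifts of 12, 14 and 16 (4K, 16K and 64K granules).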

* Re: [PATCH v2 07/10] iommu/io-pgtable-arm: Rationalise MAIR handling
  2019-10-25 18:08   ` Robin Murphy
@ 2019-11-04 18:20     ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-04 18:20 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel

On Fri, Oct 25, 2019 at 07:08:36PM +0100, Robin Murphy wrote:
> Between VMSAv8-64 and the various 32-bit formats, there is either one
> 64-bit MAIR or a pair of 32-bit MAIR0/MAIR1 or NMRR/PRRR registers.
> As such, keeping two 64-bit values in io_pgtable_cfg has always been
> overkill.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c    | 2 +-
>  drivers/iommu/arm-smmu.c       | 4 ++--
>  drivers/iommu/io-pgtable-arm.c | 3 +--
>  drivers/iommu/ipmmu-vmsa.c     | 2 +-
>  drivers/iommu/qcom_iommu.c     | 4 ++--
>  include/linux/io-pgtable.h     | 2 +-
>  6 files changed, 8 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 8da93e730d6f..3f20e548f1ec 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2172,7 +2172,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  	cfg->cd.asid	= (u16)asid;
>  	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
>  	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> -	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
> +	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
>  	return 0;
>  
>  out_free_asid:
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 080af0326816..2bc3e93b11e6 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -552,8 +552,8 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
>  			cb->mair[0] = pgtbl_cfg->arm_v7s_cfg.prrr;
>  			cb->mair[1] = pgtbl_cfg->arm_v7s_cfg.nmrr;
>  		} else {
> -			cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
> -			cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair[1];
> +			cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair;
> +			cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32;

Does this work correctly for big-endian?

Will
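
As background to the question, a minimal standalone sketch (not kernel code)
of the two ways to split a 64-bit value. The helper names are hypothetical.
It only illustrates the C semantics: it does not settle how the driver's
register accessors behave on a big-endian kernel.

```c
#include <stdint.h>
#include <string.h>

/* Value-level split: truncation keeps bits [31:0] on any byte order. */
static uint32_t mair_lo(uint64_t mair)
{
	return (uint32_t)mair;
}

/* Value-level split: the shift selects bits [63:32] on any byte order. */
static uint32_t mair_hi(uint64_t mair)
{
	return (uint32_t)(mair >> 32);
}

/*
 * Memory-level view, for contrast: reads the first four bytes of the
 * object. This yields the low word on little-endian and the high word
 * on big-endian, so it is the pattern that would be endian-sensitive.
 */
static uint32_t first_word_in_memory(uint64_t mair)
{
	uint32_t word;

	memcpy(&word, &mair, sizeof(word));
	return word;
}
```

The assignments in the quoted hunk look like the value-level form (an
implicit conversion and a shift), which the C language defines in terms of
values rather than storage layout.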

* Re: [PATCH v2 06/10] iommu/io-pgtable-arm: Simplify level indexing
  2019-11-04 18:17     ` Will Deacon
@ 2019-11-04 18:36       ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-11-04 18:36 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu, linux-arm-kernel

On 04/11/2019 18:17, Will Deacon wrote:
> On Fri, Oct 25, 2019 at 07:08:35PM +0100, Robin Murphy wrote:
>> The nature of the LPAE format means that data->pg_shift is always
>> redundant with data->bits_per_level, since they represent the size of a
>> page and the number of PTEs per page respectively, and the size of a PTE
>> is constant. Thus it works out more efficient to only store the latter,
>> and derive the former via a trivial addition where necessary.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/iommu/io-pgtable-arm.c | 29 +++++++++++++----------------
>>   1 file changed, 13 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> index 4b1483eb0ccf..15b4927ce36b 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -36,10 +36,11 @@
>>    * in a virtual address mapped by the pagetable in d.
>>    */
>>   #define ARM_LPAE_LVL_SHIFT(l,d)						\
>> -	(((ARM_LPAE_MAX_LEVELS - 1 - (l)) * (d)->bits_per_level) +	\
>> -	(d)->pg_shift)
>> +	(((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level) +		\
>> +	ilog2(sizeof(arm_lpae_iopte)))
>>   
>> -#define ARM_LPAE_GRANULE(d)		(1UL << (d)->pg_shift)
>> +#define ARM_LPAE_GRANULE(d)						\
>> +	(sizeof(arm_lpae_iopte) << (d)->bits_per_level)
>>   #define ARM_LPAE_PGD_SIZE(d)						\
>>   	(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
>>   
>> @@ -55,9 +56,7 @@
>>   	 ((1 << ((d)->bits_per_level + ARM_LPAE_PGD_IDX(l,d))) - 1))
>>   
>>   /* Calculate the block/page mapping size at level l for pagetable in d. */
>> -#define ARM_LPAE_BLOCK_SIZE(l,d)					\
>> -	(1ULL << (ilog2(sizeof(arm_lpae_iopte)) +			\
>> -		((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level)))
>> +#define ARM_LPAE_BLOCK_SIZE(l,d)	(1ULL << ARM_LPAE_LVL_SHIFT(l,d))
>>   
>>   /* Page table bits */
>>   #define ARM_LPAE_PTE_TYPE_SHIFT		0
>> @@ -175,8 +174,7 @@ struct arm_lpae_io_pgtable {
>>   
>>   	int			pgd_bits;
>>   	int			start_level;
>> -	unsigned long		pg_shift;
>> -	unsigned long		bits_per_level;
>> +	int			bits_per_level;
>>   
>>   	void			*pgd;
>>   };
>> @@ -206,7 +204,7 @@ static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
>>   {
>>   	u64 paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
>>   
>> -	if (data->pg_shift < 16)
>> +	if (data->bits_per_level < 13) /* i.e. 64K granule */
> 
> nit, but:
> 
> 	if (ARM_LPAE_GRANULE(data) < SZ_64K)
> 
> might be clearer and avoid the need for a comment?

Unfortunately GCC doesn't treat the two as directly equivalent 
(presumably due to boundary conditions) so will emit the additional faff 
to actually compute and compare the intermediate value every time, 
rather than just trivially testing the shift. I figured the minor 
I$/register pressure win was worth the small price of a comment.

Robin.
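
For reference, a minimal standalone sketch (not kernel code) of the two
spellings of the test being compared. It assumes 8-byte PTEs, so the granule
is sizeof(PTE) << bits_per_level as in the new ARM_LPAE_GRANULE(); the helper
names are hypothetical. It says nothing about what any compiler generates,
only that the two predicates agree for the valid granules.

```c
#include <stdint.h>

#define SZ_64K 0x10000UL

/* Granule size in bytes, derived from bits_per_level with 8-byte PTEs. */
static unsigned long granule(int bits_per_level)
{
	return (unsigned long)sizeof(uint64_t) << bits_per_level;
}

/* The test as written in the patch: compare the shift directly. */
static int below_64k_by_shift(int bits_per_level)
{
	return bits_per_level < 13;
}

/* The test as suggested in review: compare the computed size. */
static int below_64k_by_size(int bits_per_level)
{
	return granule(bits_per_level) < SZ_64K;
}
```

For bits_per_level of 9, 11 and 13 (4K, 16K and 64K granules) both
predicates return the same result; they are only guaranteed to match while
the shift stays within the range where the size computation is well defined.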

> (I can make the change locally if you agree)
> 
> Will
> 

* Re: [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling
  2019-10-28 18:51       ` Robin Murphy
@ 2019-11-04 18:36         ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-04 18:36 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel, Steven Price

On Mon, Oct 28, 2019 at 06:51:55PM +0000, Robin Murphy wrote:
> On 28/10/2019 15:09, Steven Price wrote:
> [...]
> > > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > > @@ -822,15 +822,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
> > >   	/* Ensure the empty pgd is visible before any actual TTBR write */
> > >   	wmb();
> > > -	/* TTBRs */
> > > -	cfg->arm_v7s_cfg.ttbr[0] = virt_to_phys(data->pgd) |
> > > -				   ARM_V7S_TTBR_S | ARM_V7S_TTBR_NOS |
> > > -				   (cfg->coherent_walk ?
> > > -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
> > > -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
> > > -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
> > > -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
> > > -	cfg->arm_v7s_cfg.ttbr[1] = 0;
> > > +	/* TTBR */
> > > +	cfg->arm_v7s_cfg.ttbr = virt_to_phys(data->pgd) | ARM_V7S_TTBR_S |
> > > +				(cfg->coherent_walk ? (ARM_V7S_TTBR_NOS |
> > > +				  ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
> > > +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
> > > +				 (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
> > > +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
> > 
> > ARM_V7S_TTBR_NOS seems to have sneaked into the cfg->coherent_walk
> > condition here - which you haven't mentioned in the commit log, so it
> > doesn't look like it should be in this commit.
> 
> Ah, yes, it's taken a while to remember whether this was something important
> that got muddled up in rebasing, but it's actually just trivial cleanup. For
> !coherent_walk, the non-cacheable output attribute makes shareable accesses
> implicitly outer-shareable, so setting TTBR.NOS for that case actually does
> nothing except look misleading. Thus this is essentially just a cosmetic
> change included in the reformatting for clarity and consistency with the
> LPAE version. I'll call that out in the commit message, thanks for spotting!

I vaguely remember a case where you had to mark non-cacheable accesses as
outer-shareable explicitly to avoid unpredictable behaviour. Hmm.

/me looks at the Arm ARM

Ok, it looks like this changed between ARMv7 and ARMv8. The ARMv7 ARM
states:

  | A memory region with a resultant memory type attribute of Normal, and a
  | resultant cacheability attribute of Inner Non-cacheable, Outer
  | Non-cacheable, must have a resultant shareability attribute of Outer
  | Shareable, otherwise shareability is UNPREDICTABLE.

Although this only seems to be the case for LPAE! The short descriptor docs are
less clear, but I think it might be wise to ensure that non-cacheable mappings
are always outer-shareable for consistency.

Will

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 07/10] iommu/io-pgtable-arm: Rationalise MAIR handling
  2019-11-04 18:20     ` Will Deacon
@ 2019-11-04 18:43       ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-11-04 18:43 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu, linux-arm-kernel

On 04/11/2019 18:20, Will Deacon wrote:
> On Fri, Oct 25, 2019 at 07:08:36PM +0100, Robin Murphy wrote:
>> Between VMSAv8-64 and the various 32-bit formats, there is either one
>> 64-bit MAIR or a pair of 32-bit MAIR0/MAIR1 or PRRR/NMRR registers.
>> As such, keeping two 64-bit values in io_pgtable_cfg has always been
>> overkill.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/iommu/arm-smmu-v3.c    | 2 +-
>>   drivers/iommu/arm-smmu.c       | 4 ++--
>>   drivers/iommu/io-pgtable-arm.c | 3 +--
>>   drivers/iommu/ipmmu-vmsa.c     | 2 +-
>>   drivers/iommu/qcom_iommu.c     | 4 ++--
>>   include/linux/io-pgtable.h     | 2 +-
>>   6 files changed, 8 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index 8da93e730d6f..3f20e548f1ec 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -2172,7 +2172,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>>   	cfg->cd.asid	= (u16)asid;
>>   	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
>>   	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>> -	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
>> +	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
>>   	return 0;
>>   
>>   out_free_asid:
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 080af0326816..2bc3e93b11e6 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -552,8 +552,8 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
>>   			cb->mair[0] = pgtbl_cfg->arm_v7s_cfg.prrr;
>>   			cb->mair[1] = pgtbl_cfg->arm_v7s_cfg.nmrr;
>>   		} else {
>> -			cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
>> -			cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair[1];
>> +			cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair;
>> +			cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32;
> 
> Does this work correctly for big-endian?

I don't see why it wouldn't - cfg.mair is read and written as a u64, so 
this should always return its most significant word regardless of the 
storage format. We're not doing anything dodgy like trying to type-pun 
the u64 directly into the u32[2].
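
As a rough userspace sketch of that distinction (the function names here are invented for the example, not taken from any driver): splitting by shift works on the numeric value and gives the same answer on any host, whereas copying the raw bytes into a u32[2] depends on the storage format.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value-based split: operates on the number, so byte order is irrelevant */
static void split_by_shift(uint64_t mair, uint32_t out[2])
{
	out[0] = (uint32_t)mair;		/* low word (truncating conversion) */
	out[1] = (uint32_t)(mair >> 32);	/* high word */
}

/* Storage-based split: copies raw bytes, so the result depends on byte order */
static void split_by_punning(uint64_t mair, uint32_t out[2])
{
	memcpy(out, &mair, sizeof(mair));
}

/* Returns 1 on a little-endian host, 0 on a big-endian one */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, sizeof(first));
	return first == 1;
}

/* The shift-based result is fixed; the punned one swaps words on big-endian */
static void check_split(void)
{
	uint32_t a[2], b[2];

	split_by_shift(0x1122334455667788ULL, a);
	split_by_punning(0x1122334455667788ULL, b);
	assert(a[0] == 0x55667788u && a[1] == 0x11223344u);
	if (host_is_little_endian())
		assert(b[0] == a[0] && b[1] == a[1]);
	else
		assert(b[0] == a[1] && b[1] == a[0]);
}
```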

Robin.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling
  2019-11-04 18:36         ` Will Deacon
@ 2019-11-04 19:12           ` Robin Murphy
  -1 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-11-04 19:12 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu, linux-arm-kernel, Steven Price

On 04/11/2019 18:36, Will Deacon wrote:
> On Mon, Oct 28, 2019 at 06:51:55PM +0000, Robin Murphy wrote:
>> On 28/10/2019 15:09, Steven Price wrote:
>> [...]
>>>> --- a/drivers/iommu/io-pgtable-arm-v7s.c
>>>> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
>>>> @@ -822,15 +822,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>>>>    	/* Ensure the empty pgd is visible before any actual TTBR write */
>>>>    	wmb();
>>>> -	/* TTBRs */
>>>> -	cfg->arm_v7s_cfg.ttbr[0] = virt_to_phys(data->pgd) |
>>>> -				   ARM_V7S_TTBR_S | ARM_V7S_TTBR_NOS |
>>>> -				   (cfg->coherent_walk ?
>>>> -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
>>>> -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
>>>> -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
>>>> -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
>>>> -	cfg->arm_v7s_cfg.ttbr[1] = 0;
>>>> +	/* TTBR */
>>>> +	cfg->arm_v7s_cfg.ttbr = virt_to_phys(data->pgd) | ARM_V7S_TTBR_S |
>>>> +				(cfg->coherent_walk ? (ARM_V7S_TTBR_NOS |
>>>> +				  ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
>>>> +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
>>>> +				 (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
>>>> +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
>>>
>>> ARM_V7S_TTBR_NOS seems to have sneaked into the cfg->coherent_walk
>>> condition here - which you haven't mentioned in the commit log, so it
>>> doesn't look like it should be in this commit.
>>
>> Ah, yes, it's taken a while to remember whether this was something important
>> that got muddled up in rebasing, but it's actually just trivial cleanup. For
>> !coherent_walk, the non-cacheable output attribute makes shareable accesses
>> implicitly outer-shareable, so setting TTBR.NOS for that case actually does
>> nothing except look misleading. Thus this is essentially just a cosmetic
>> change included in the reformatting for clarity and consistency with the
>> LPAE version. I'll call that out in the commit message, thanks for spotting!
> 
> I vaguely remember a case where you had to mark non-cacheable accesses as
> outer-shareable explicitly to avoid unpredictable behaviour. Hmm.
> 
> /me looks at the Arm ARM
> 
> Ok, it looks like this changed between ARMv7 and ARMv8. The ARMv7 ARM
> states:
> 
>    | A memory region with a resultant memory type attribute of Normal, and a
>    | resultant cacheability attribute of Inner Non-cacheable, Outer
>    | Non-cacheable, must have a resultant shareability attribute of Outer
>    | Shareable, otherwise shareability is UNPREDICTABLE.
> 

Although, SMMUv2 does go a bit further in saying:

"In SMMUv2, the SMMU treats final attributes that are Normal Inner 
Non-cacheable or Normal Outer Non-cacheable as Outer Shareable. In 
SMMUv1, it is IMPLEMENTATION DEFINED how the SMMU treats such attributes."

and SMMUv3 follows similar lines:

"The SMMU does not output inconsistent attributes as a result of 
misconfiguration. Outer Shareable is used as the effective Shareability 
when Device or Normal Inner Non-cacheable Outer Non-cacheable types are 
configured."

> Although this only seems to be the case for LPAE! The short descriptor docs are
> less clear, but I think it might be wise to ensure that non-cacheable mappings
> are always outer-shareable for consistency.

Agreed, despite the above I think it does make sense to be explicit and 
not rely on subtleties. Between 9e6ea59f3ff3 and this patch we should 
have walks covered, so I can spin a followup to fix actual mappings as well.
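
For reference, a small userspace sketch of the walk attributes the quoted hunk ends up with. The IRGN macro is copied from the driver; the other bit positions (S at bit 1, RGN at bits 4:3, NOS at bit 5), the NC/WBWA encodings, and the reading that S=1 with NOS=0 means Outer Shareable are assumptions taken from the ARMv7 short-descriptor TTBR layout, and the macro and function names are shortened for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed ARMv7 short-descriptor TTBR fields (names shortened for the sketch) */
#define TTBR_S			(UINT32_C(1) << 1)	/* walks are Shareable */
#define TTBR_NOS		(UINT32_C(1) << 5)	/* Not Outer Shareable */
#define TTBR_ORGN_ATTR(attr)	(((attr) & 0x3u) << 3)
#define TTBR_IRGN_ATTR(attr)	((((attr) & 0x1u) << 6) | (((attr) & 0x2u) >> 1))

#define RGN_NC		0u	/* Non-cacheable */
#define RGN_WBWA	1u	/* Write-Back, Write-Allocate */

/* Same shape as the "+" side of the hunk: NOS only for coherent walks */
static uint32_t make_ttbr(uint32_t pgd_phys, bool coherent_walk)
{
	uint32_t attrs = coherent_walk ?
		(TTBR_NOS | TTBR_IRGN_ATTR(RGN_WBWA) | TTBR_ORGN_ATTR(RGN_WBWA)) :
		(TTBR_IRGN_ATTR(RGN_NC) | TTBR_ORGN_ATTR(RGN_NC));

	return pgd_phys | TTBR_S | attrs;
}

/* Shareable with NOS clear is explicitly Outer Shareable */
static bool walk_is_outer_shareable(uint32_t ttbr)
{
	return (ttbr & TTBR_S) && !(ttbr & TTBR_NOS);
}

/* Non-cacheable walks come out explicitly Outer Shareable */
static void check_ttbr(void)
{
	assert(walk_is_outer_shareable(make_ttbr(0x80004000u, false)));
	assert(!walk_is_outer_shareable(make_ttbr(0x80004000u, true)));
}
```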

Robin.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
  2019-10-25 18:08   ` Robin Murphy
@ 2019-11-04 19:14     ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-04 19:14 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel

On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:
> Although it's conceptually nice for the io_pgtable_cfg to provide a
> standard VMSA TCR value, the reality is that no VMSA-compliant IOMMU
> looks exactly like an Arm CPU, and they all have various other TCR
> controls which io-pgtable can't be expected to understand. Thus since
> there is an expectation that drivers will have to add to the given TCR
> value anyway, let's strip it down to just the essentials that are
> directly relevant to io-pgatble's inner workings - namely the various

typo: "io-pgatble"

> sizes and the walk attributes.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c        | 41 +++----------
>  drivers/iommu/arm-smmu.c           |  7 ++-
>  drivers/iommu/arm-smmu.h           | 27 ++++++++
>  drivers/iommu/io-pgtable-arm-v7s.c |  6 +-
>  drivers/iommu/io-pgtable-arm.c     | 98 ++++++++++++------------------
>  drivers/iommu/io-pgtable.c         |  2 +-
>  drivers/iommu/qcom_iommu.c         |  8 +--
>  include/linux/io-pgtable.h         |  9 ++-
>  8 files changed, 94 insertions(+), 104 deletions(-)

Generally, I *really* like this patch, but I do have a bunch of comments:

> @@ -2155,6 +2125,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  	int asid;
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;

I find this pretty grotty, but I couldn't think of something better and
exporting format-specific types out of the iopgtable layer also feels
nasty.

> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index 409716410b0d..98db074281ac 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -158,12 +158,24 @@ enum arm_smmu_cbar_type {
>  #define TCR2_SEP			GENMASK(17, 15)
>  #define TCR2_SEP_UPSTREAM		0x7
>  #define TCR2_AS				BIT(4)
> +#define TCR2_PASIZE			GENMASK(3, 0)
>  
>  #define ARM_SMMU_CB_TTBR0		0x20
>  #define ARM_SMMU_CB_TTBR1		0x28
>  #define TTBRn_ASID			GENMASK_ULL(63, 48)
>  
> +/* arm64 headers leak this somehow :( */
> +#undef TCR_T0SZ

Urgh. I suppose we should prefix these things with ARM_SMMU too :(
Obviously, that's a separate patch.

>  #define ARM_SMMU_CB_TCR			0x30
> +#define TCR_EAE				BIT(31)
> +#define TCR_EPD1			BIT(23)
> +#define TCR_TG0				GENMASK(15, 14)
> +#define TCR_SH0				GENMASK(13, 12)
> +#define TCR_ORGN0			GENMASK(11, 10)
> +#define TCR_IRGN0			GENMASK(9, 8)
> +#define TCR_T0SZ			GENMASK(5, 0)
> +
>  #define ARM_SMMU_CB_CONTEXTIDR		0x34
>  #define ARM_SMMU_CB_S1_MAIR0		0x38
>  #define ARM_SMMU_CB_S1_MAIR1		0x3c
> @@ -318,6 +330,21 @@ struct arm_smmu_domain {
>  	struct iommu_domain		domain;
>  };
>  
> +static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
> +{
> +	return TCR_EPD1 |
> +	       FIELD_PREP(TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
> +	       FIELD_PREP(TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
> +	       FIELD_PREP(TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
> +	       FIELD_PREP(TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
> +	       FIELD_PREP(TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
> +}
> +
> +static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
> +{
> +	return FIELD_PREP(TCR2_PASIZE, cfg->arm_lpae_s1_cfg.tcr.ips) |
> +	       FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
> +}
>  
>  /* Implementation details, yay! */
>  struct arm_smmu_impl {
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 4d2c1e7f67c4..d8e4562ce478 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -149,8 +149,6 @@
>  #define ARM_V7S_TTBR_IRGN_ATTR(attr)					\
>  	((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1))
>  
> -#define ARM_V7S_TCR_PD1			BIT(5)
> -
>  #ifdef CONFIG_ZONE_DMA32
>  #define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
>  #define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA32
> @@ -798,8 +796,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>  	 */
>  	cfg->pgsize_bitmap &= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
>  
> -	/* TCR: T0SZ=0, disable TTBR1 */
> -	cfg->arm_v7s_cfg.tcr = ARM_V7S_TCR_PD1;
> +	/* TCR: T0SZ=0, EAE=0 (if applicable) */
> +	cfg->arm_v7s_cfg.tcr = 0;
>  
>  	/*
>  	 * TEX remap: the indices used map to the closest equivalent types
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index bc0841040ebe..9b1912ede000 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -100,40 +100,32 @@
>  #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
>  
>  /* Register bits */
> -#define ARM_32_LPAE_TCR_EAE		(1 << 31)
> -#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
> +#define ARM_64_LPAE_VTCR_RES1		(1 << 31)

I know you're just renaming things here, but this looks really dodgy to
me. Won't it be treated as signed...

> @@ -910,7 +899,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>  	}
>  
>  	/* VTCR */
> -	reg = ARM_64_LPAE_S2_TCR_RES1 |
> +	reg = ARM_64_LPAE_VTCR_RES1 |
>  	     (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
>  	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
>  	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);

... and then sign-extended here?
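
To spell the concern out, here is a standalone sketch, assuming reg is a 64-bit unsigned type and int is 32 bits. Since (1 << 31) itself overflows a signed int, the example uses INT32_MIN, which is the bit pattern GCC produces for it in practice. The function names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A signed 32-bit value with only bit 31 set, widened to 64 bits */
static uint64_t widen_signed_bit31(void)
{
	int32_t res1 = INT32_MIN;	/* stands in for (1 << 31) */

	return (uint64_t)res1;		/* negative value: upper 32 bits become ones */
}

/* The unsigned equivalent, as (1U << 31) or BIT(31) would give */
static uint64_t widen_unsigned_bit31(void)
{
	uint32_t res1 = UINT32_C(1) << 31;

	return (uint64_t)res1;		/* zero-extends */
}

/* Only the unsigned form leaves the upper half of the register clear */
static void check_widening(void)
{
	assert((widen_signed_bit31() >> 32) == 0xffffffffu);
	assert((widen_unsigned_bit31() >> 32) == 0);
}
```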

> @@ -919,45 +908,45 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>  
>  	switch (ARM_LPAE_GRANULE(data)) {
>  	case SZ_4K:
> -		reg |= ARM_LPAE_TCR_TG0_4K;
> +		reg |= (ARM_LPAE_TCR_TG0_4K << ARM_LPAE_VTCR_TG0_SHIFT);

Why don't we do the bitfield thing for vtcr as well? Yeah, there's only one,
but the nice thing about naming all of the fields in the structure is that
it makes it obvious what you get back from the io-pgtable code.
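
Something along these lines, perhaps. This is only a sketch: struct vtcr_fields, its field names and pack_vtcr() are hypothetical, modelled on the stage 1 tcr fields visible in the patch, and the bit positions are an assumption based on the architectural VTCR layout (T0SZ [5:0], SL0 [7:6], IRGN0 [9:8], ORGN0 [11:10], SH0 [13:12], TG0 [15:14], PS [18:16], RES1 at bit 31).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical named fields that io-pgtable could hand back for stage 2 */
struct vtcr_fields {
	uint32_t ps:3;
	uint32_t tg:2;
	uint32_t sh:2;
	uint32_t orgn:2;
	uint32_t irgn:2;
	uint32_t sl:2;
	uint32_t tsz:6;
};

#define VTCR_RES1	(UINT32_C(1) << 31)	/* unsigned, so no sign extension */

/* Driver-side packing into a register value, under the assumed layout */
static uint32_t pack_vtcr(const struct vtcr_fields *f)
{
	return VTCR_RES1 |
	       ((uint32_t)f->ps << 16) |
	       ((uint32_t)f->tg << 14) |
	       ((uint32_t)f->sh << 12) |
	       ((uint32_t)f->orgn << 10) |
	       ((uint32_t)f->irgn << 8) |
	       ((uint32_t)f->sl << 6) |
	       (uint32_t)f->tsz;
}

/* An all-zero field set packs to just the RES1 bit */
static void check_pack(void)
{
	struct vtcr_fields zero = { 0 };

	assert(pack_vtcr(&zero) == VTCR_RES1);
}
```

With named fields the caller can see at a glance which values it is getting, and each driver keeps control of where they land in its own register.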

> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index 9a57eb6c253c..059be7e21030 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
>  				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
>  				FIELD_PREP(TTBRn_ASID, ctx->asid));
> -		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
> -				FIELD_PREP(TTBRn_ASID, ctx->asid));
> +		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);

Are you sure it's safe to drop the ASID here? Just want to make sure there
wasn't some "quirk" this was helping with.

Will

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
@ 2019-11-04 19:14     ` Will Deacon
  0 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-04 19:14 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, jcrouse, linux-arm-kernel

On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:
> Although it's conceptually nice for the io_pgtable_cfg to provide a
> standard VMSA TCR value, the reality is that no VMSA-compliant IOMMU
> looks exactly like an Arm CPU, and they all have various other TCR
> controls which io-pgtable can't be expected to understand. Thus since
> there is an expectation that drivers will have to add to the given TCR
> value anyway, let's strip it down to just the essentials that are
> directly relevant to io-pgatble's inner workings - namely the various

typo: "io-pgatble"

> sizes and the walk attributes.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c        | 41 +++----------
>  drivers/iommu/arm-smmu.c           |  7 ++-
>  drivers/iommu/arm-smmu.h           | 27 ++++++++
>  drivers/iommu/io-pgtable-arm-v7s.c |  6 +-
>  drivers/iommu/io-pgtable-arm.c     | 98 ++++++++++++------------------
>  drivers/iommu/io-pgtable.c         |  2 +-
>  drivers/iommu/qcom_iommu.c         |  8 +--
>  include/linux/io-pgtable.h         |  9 ++-
>  8 files changed, 94 insertions(+), 104 deletions(-)

Generally, I *really* like this patch, but I do have a bunch of comments:

> @@ -2155,6 +2125,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  	int asid;
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;

I find this pretty grotty, but I couldn't think of something better and
exporting format-specific types out of the iopgtable layer also feels
nasty.

> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index 409716410b0d..98db074281ac 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -158,12 +158,24 @@ enum arm_smmu_cbar_type {
>  #define TCR2_SEP			GENMASK(17, 15)
>  #define TCR2_SEP_UPSTREAM		0x7
>  #define TCR2_AS				BIT(4)
> +#define TCR2_PASIZE			GENMASK(3, 0)
>  
>  #define ARM_SMMU_CB_TTBR0		0x20
>  #define ARM_SMMU_CB_TTBR1		0x28
>  #define TTBRn_ASID			GENMASK_ULL(63, 48)
>  
> +/* arm64 headers leak this somehow :( */
> +#undef TCR_T0SZ

Urgh. I suppose we should prefix these things with ARM_SMMU too :(
Obviously, that's a separate patch.

>  #define ARM_SMMU_CB_TCR			0x30
> +#define TCR_EAE				BIT(31)
> +#define TCR_EPD1			BIT(23)
> +#define TCR_TG0				GENMASK(15, 14)
> +#define TCR_SH0				GENMASK(13, 12)
> +#define TCR_ORGN0			GENMASK(11, 10)
> +#define TCR_IRGN0			GENMASK(9, 8)
> +#define TCR_T0SZ			GENMASK(5, 0)
> +
>  #define ARM_SMMU_CB_CONTEXTIDR		0x34
>  #define ARM_SMMU_CB_S1_MAIR0		0x38
>  #define ARM_SMMU_CB_S1_MAIR1		0x3c
> @@ -318,6 +330,21 @@ struct arm_smmu_domain {
>  	struct iommu_domain		domain;
>  };
>  
> +static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
> +{
> +	return TCR_EPD1 |
> +	       FIELD_PREP(TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
> +	       FIELD_PREP(TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
> +	       FIELD_PREP(TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
> +	       FIELD_PREP(TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
> +	       FIELD_PREP(TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
> +}
> +
> +static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
> +{
> +	return FIELD_PREP(TCR2_PASIZE, cfg->arm_lpae_s1_cfg.tcr.ips) |
> +	       FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
> +}
>  
>  /* Implementation details, yay! */
>  struct arm_smmu_impl {
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 4d2c1e7f67c4..d8e4562ce478 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -149,8 +149,6 @@
>  #define ARM_V7S_TTBR_IRGN_ATTR(attr)					\
>  	((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1))
>  
> -#define ARM_V7S_TCR_PD1			BIT(5)
> -
>  #ifdef CONFIG_ZONE_DMA32
>  #define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
>  #define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA32
> @@ -798,8 +796,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>  	 */
>  	cfg->pgsize_bitmap &= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
>  
> -	/* TCR: T0SZ=0, disable TTBR1 */
> -	cfg->arm_v7s_cfg.tcr = ARM_V7S_TCR_PD1;
> +	/* TCR: T0SZ=0, EAE=0 (if applicable) */
> +	cfg->arm_v7s_cfg.tcr = 0;
>  
>  	/*
>  	 * TEX remap: the indices used map to the closest equivalent types
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index bc0841040ebe..9b1912ede000 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -100,40 +100,32 @@
>  #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
>  
>  /* Register bits */
> -#define ARM_32_LPAE_TCR_EAE		(1 << 31)
> -#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
> +#define ARM_64_LPAE_VTCR_RES1		(1 << 31)

I know you're just renaming things here, but this looks really dodgy to
me. Won't it be treated as signed...

> @@ -910,7 +899,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>  	}
>  
>  	/* VTCR */
> -	reg = ARM_64_LPAE_S2_TCR_RES1 |
> +	reg = ARM_64_LPAE_VTCR_RES1 |
>  	     (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
>  	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
>  	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);

... and then sign-extended here?
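
A minimal sketch of the concern, not taken from the patch: `(1 << 31)` has type int, so if it is OR-ed into a 64-bit accumulator it is converted to the wider type first and the sign bit is copied into bits 63:32. The helper names below are made up for illustration, and INT32_MIN stands in for `(1 << 31)` because shifting a 1 into the sign bit of an int is formally undefined in ISO C, even though compilers used for the kernel produce that value.

```c
#include <stdint.h>

/* Stand-in for (1 << 31): a signed 32-bit value with only the top bit set. */
#define RES1_SIGNED	INT32_MIN
/* Unsigned spelling of the same bit, as (1U << 31) or BIT(31) would give. */
#define RES1_UNSIGNED	(UINT32_C(1) << 31)

/* OR the signed constant into a 64-bit register image. */
static inline uint64_t vtcr_with_signed_res1(uint64_t reg)
{
	/* The int is converted to uint64_t first, so bits 63:32 become ones. */
	return reg | RES1_SIGNED;
}

/* OR the unsigned constant into a 64-bit register image. */
static inline uint64_t vtcr_with_unsigned_res1(uint64_t reg)
{
	/* Zero-extended: only bit 31 is set. */
	return reg | RES1_UNSIGNED;
}
```

Starting from reg = 0, the first helper returns 0xffffffff80000000 and the second 0x80000000. Whether the stray upper bits are visible in practice depends on whether the value is later truncated to 32 bits before it reaches the hardware.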

> @@ -919,45 +908,45 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>  
>  	switch (ARM_LPAE_GRANULE(data)) {
>  	case SZ_4K:
> -		reg |= ARM_LPAE_TCR_TG0_4K;
> +		reg |= (ARM_LPAE_TCR_TG0_4K << ARM_LPAE_VTCR_TG0_SHIFT);

Why don't we do the bitfield thing for vtcr as well? Yeah, there's only one,
but the nice thing about naming all of the fields in the structure is that
it makes it obvious what you get back from the io-pgtable code.
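
One way to picture the suggestion, as a sketch only: the struct and the packing helper below are hypothetical and are not what this version of the series defines. The bit offsets are assumed from the architectural VTCR layout (T0SZ [5:0], SL0 [7:6], IRGN0 [9:8], ORGN0 [11:10], SH0 [13:12], TG0 [15:14], PS [18:16], RES1 at bit 31).

```c
#include <stdint.h>

/* Hypothetical named-field description of the stage-2 walk settings. */
struct vtcr_fields {
	uint32_t ps:3;		/* physical address size */
	uint32_t tg:2;		/* translation granule */
	uint32_t sh:2;		/* shareability */
	uint32_t orgn:2;	/* outer cacheability */
	uint32_t irgn:2;	/* inner cacheability */
	uint32_t sl:2;		/* starting level */
	uint32_t tsz:6;		/* input address size */
};

/* Driver-side packing into a 32-bit register image (offsets assumed above). */
static inline uint32_t pack_vtcr(const struct vtcr_fields *f)
{
	return (UINT32_C(1) << 31) |		/* RES1 */
	       ((uint32_t)f->ps << 16) |
	       ((uint32_t)f->tg << 14) |
	       ((uint32_t)f->sh << 12) |
	       ((uint32_t)f->orgn << 10) |
	       ((uint32_t)f->irgn << 8) |
	       ((uint32_t)f->sl << 6) |
	       (uint32_t)f->tsz;
}
```

With this shape the page-table code only fills in the named fields, and each driver decides how they map onto its own register, which is the same split the stage-1 tcr struct in this patch aims for.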

> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index 9a57eb6c253c..059be7e21030 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
>  				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
>  				FIELD_PREP(TTBRn_ASID, ctx->asid));
> -		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
> -				FIELD_PREP(TTBRn_ASID, ctx->asid));
> +		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);

Are you sure it's safe to drop the ASID here? Just want to make sure there
wasn't some "quirk" this was helping with.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 06/10] iommu/io-pgtable-arm: Simplify level indexing
  2019-11-04 18:36       ` Robin Murphy
@ 2019-11-04 19:20         ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-04 19:20 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel

On Mon, Nov 04, 2019 at 06:36:51PM +0000, Robin Murphy wrote:
> On 04/11/2019 18:17, Will Deacon wrote:
> > On Fri, Oct 25, 2019 at 07:08:35PM +0100, Robin Murphy wrote:
> > > The nature of the LPAE format means that data->pg_shift is always
> > > redundant with data->bits_per_level, since they represent the size of a
> > > page and the number of PTEs per page respectively, and the size of a PTE
> > > is constant. Thus it works out more efficient to only store the latter,
> > > and derive the former via a trivial addition where necessary.
> > > 
> > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > >   drivers/iommu/io-pgtable-arm.c | 29 +++++++++++++----------------
> > >   1 file changed, 13 insertions(+), 16 deletions(-)
> > > 
> > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > > index 4b1483eb0ccf..15b4927ce36b 100644
> > > --- a/drivers/iommu/io-pgtable-arm.c
> > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > @@ -36,10 +36,11 @@
> > >    * in a virtual address mapped by the pagetable in d.
> > >    */
> > >   #define ARM_LPAE_LVL_SHIFT(l,d)						\
> > > -	(((ARM_LPAE_MAX_LEVELS - 1 - (l)) * (d)->bits_per_level) +	\
> > > -	(d)->pg_shift)
> > > +	(((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level) +		\
> > > +	ilog2(sizeof(arm_lpae_iopte)))
> > > -#define ARM_LPAE_GRANULE(d)		(1UL << (d)->pg_shift)
> > > +#define ARM_LPAE_GRANULE(d)						\
> > > +	(sizeof(arm_lpae_iopte) << (d)->bits_per_level)
> > >   #define ARM_LPAE_PGD_SIZE(d)						\
> > >   	(sizeof(arm_lpae_iopte) << (d)->pgd_bits)
> > > @@ -55,9 +56,7 @@
> > >   	 ((1 << ((d)->bits_per_level + ARM_LPAE_PGD_IDX(l,d))) - 1))
> > >   /* Calculate the block/page mapping size at level l for pagetable in d. */
> > > -#define ARM_LPAE_BLOCK_SIZE(l,d)					\
> > > -	(1ULL << (ilog2(sizeof(arm_lpae_iopte)) +			\
> > > -		((ARM_LPAE_MAX_LEVELS - (l)) * (d)->bits_per_level)))
> > > +#define ARM_LPAE_BLOCK_SIZE(l,d)	(1ULL << ARM_LPAE_LVL_SHIFT(l,d))
> > >   /* Page table bits */
> > >   #define ARM_LPAE_PTE_TYPE_SHIFT		0
> > > @@ -175,8 +174,7 @@ struct arm_lpae_io_pgtable {
> > >   	int			pgd_bits;
> > >   	int			start_level;
> > > -	unsigned long		pg_shift;
> > > -	unsigned long		bits_per_level;
> > > +	int			bits_per_level;
> > >   	void			*pgd;
> > >   };
> > > @@ -206,7 +204,7 @@ static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
> > >   {
> > >   	u64 paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
> > > -	if (data->pg_shift < 16)
> > > +	if (data->bits_per_level < 13) /* i.e. 64K granule */
> > 
> > nit, but:
> > 
> > 	if (ARM_LPAE_GRANULE(data) < SZ_64K)
> > 
> > might be clearer and avoid the need for a comment?
> 
> Unfortunately GCC doesn't treat the two as directly equivalent (presumably
> due to boundary conditions) so will emit the additional faff to actually
> compute and compare the intermediate value every time, rather than just
> trivially testing the shift. I figured the minor I$/register pressure win
> was worth the small price of a comment.

Bet ya can't measure the difference ;)

I'd prefer the readable version in the absence of numbers.
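
For reference, the two spellings of the test give the same answer for the 4K, 16K and 64K granules, which a small sketch can confirm. The struct and macros here are simplified stand-ins that assume an 8-byte PTE; they are not the kernel definitions.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t pte_t;			/* stand-in for arm_lpae_iopte */
#define SZ_64K	0x10000UL

struct pgtable {
	int bits_per_level;		/* log2 of PTEs per table */
};

/* Granule size: PTE size times the number of PTEs per table. */
#define GRANULE(d)	(sizeof(pte_t) << (d)->bits_per_level)

/* The check as written in the patch: compare the shift directly. */
static inline bool small_granule_by_bits(const struct pgtable *d)
{
	return d->bits_per_level < 13;
}

/* The check as suggested in review: compare the derived granule size. */
static inline bool small_granule_by_size(const struct pgtable *d)
{
	return GRANULE(d) < SZ_64K;
}
```

The values of bits_per_level for the three granules are 9, 11 and 13, so both helpers return true for 4K and 16K and false for 64K. The sketch says nothing about the code-generation difference under discussion; that would need measuring.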

Will
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 07/10] iommu/io-pgtable-arm: Rationalise MAIR handling
  2019-11-04 18:43       ` Robin Murphy
@ 2019-11-04 19:20         ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-04 19:20 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel

On Mon, Nov 04, 2019 at 06:43:06PM +0000, Robin Murphy wrote:
> On 04/11/2019 18:20, Will Deacon wrote:
> > On Fri, Oct 25, 2019 at 07:08:36PM +0100, Robin Murphy wrote:
> > > Between VMSAv8-64 and the various 32-bit formats, there is either one
> > > 64-bit MAIR or a pair of 32-bit MAIR0/MAIR1 or NMRR/PMRR registers.
> > > As such, keeping two 64-bit values in io_pgtable_cfg has always been
> > > overkill.
> > > 
> > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > >   drivers/iommu/arm-smmu-v3.c    | 2 +-
> > >   drivers/iommu/arm-smmu.c       | 4 ++--
> > >   drivers/iommu/io-pgtable-arm.c | 3 +--
> > >   drivers/iommu/ipmmu-vmsa.c     | 2 +-
> > >   drivers/iommu/qcom_iommu.c     | 4 ++--
> > >   include/linux/io-pgtable.h     | 2 +-
> > >   6 files changed, 8 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > > index 8da93e730d6f..3f20e548f1ec 100644
> > > --- a/drivers/iommu/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm-smmu-v3.c
> > > @@ -2172,7 +2172,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
> > >   	cfg->cd.asid	= (u16)asid;
> > >   	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> > >   	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> > > -	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
> > > +	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
> > >   	return 0;
> > >   out_free_asid:
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > index 080af0326816..2bc3e93b11e6 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -552,8 +552,8 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
> > >   			cb->mair[0] = pgtbl_cfg->arm_v7s_cfg.prrr;
> > >   			cb->mair[1] = pgtbl_cfg->arm_v7s_cfg.nmrr;
> > >   		} else {
> > > -			cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
> > > -			cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair[1];
> > > +			cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair;
> > > +			cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32;
> > 
> > Does this work correctly for big-endian?
> 
> I don't see why it wouldn't - cfg.mair is read and written as a u64, so this
> should always return its most significant word regardless of the storage
> format. We're not doing anything dodgy like trying to type-pun the u64
> directly into the u32[2].

Urgh, I need to convince myself about this then. Off to draw those silly
ABCD DCBA diagrams on some paper.

Will

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 00/10] iommu/io-pgtable: Cleanup and prep for split tables
  2019-10-25 18:08 ` Robin Murphy
@ 2019-11-04 19:22   ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-04 19:22 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel

Hi Robin,

On Fri, Oct 25, 2019 at 07:08:29PM +0100, Robin Murphy wrote:
> Since the flawed first attempt, I've reworked things with an abstracted
> TCR and an explicit TTBR1 quirk. I originally envisaged the need to pass
> the quirk all the way down to the TLBI calls, hence getting diverted
> into trying to make the parameter passing less cluttered in general, but
> in the end it turned out fairly neat to just fix the indexing such that
> we can always just pass around the original unmodified IOVA. Most of the
> new patches come from staring at that indexing code for long enough to
> see the subtle inefficiencies that were worth ironing out, plus a bit of
> random cleanup which doesn't feel worth posting separately.
> 
> Note that these patches depend on the fixes already queued in -rc4,
> otherwise there will be conflicts in arm_mali_lpae_alloc_pgtable().
> 
> Robin.
> 
> 
> Robin Murphy (10):
>   iommu/io-pgtable: Make selftest gubbins consistently __init
>   iommu/io-pgtable-arm: Rationalise size check
>   iommu/io-pgtable-arm: Simplify bounds checks
>   iommu/io-pgtable-arm: Simplify start level lookup
>   iommu/io-pgtable-arm: Simplify PGD size handling
>   iommu/io-pgtable-arm: Simplify level indexing
>   iommu/io-pgtable-arm: Rationalise MAIR handling
>   iommu/io-pgtable-arm: Rationalise TTBRn handling
>   iommu/io-pgtable-arm: Rationalise TCR handling
>   iommu/io-pgtable-arm: Prepare for TTBR1 usage

Overall, this looks really good to me. There's a bit more work to do
(see my comments) and I'd like Jordan to have a look as well, but on the
whole it's a big improvement. Thanks.

Will

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 07/10] iommu/io-pgtable-arm: Rationalise MAIR handling
  2019-11-04 19:20         ` Will Deacon
@ 2019-11-04 19:57           ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-04 19:57 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel

On Mon, Nov 04, 2019 at 07:20:58PM +0000, Will Deacon wrote:
> On Mon, Nov 04, 2019 at 06:43:06PM +0000, Robin Murphy wrote:
> > On 04/11/2019 18:20, Will Deacon wrote:
> > > On Fri, Oct 25, 2019 at 07:08:36PM +0100, Robin Murphy wrote:
> > > > Between VMSAv8-64 and the various 32-bit formats, there is either one
> > > > 64-bit MAIR or a pair of 32-bit MAIR0/MAIR1 or NMRR/PMRR registers.
> > > > As such, keeping two 64-bit values in io_pgtable_cfg has always been
> > > > overkill.
> > > > 
> > > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > > ---
> > > >   drivers/iommu/arm-smmu-v3.c    | 2 +-
> > > >   drivers/iommu/arm-smmu.c       | 4 ++--
> > > >   drivers/iommu/io-pgtable-arm.c | 3 +--
> > > >   drivers/iommu/ipmmu-vmsa.c     | 2 +-
> > > >   drivers/iommu/qcom_iommu.c     | 4 ++--
> > > >   include/linux/io-pgtable.h     | 2 +-
> > > >   6 files changed, 8 insertions(+), 9 deletions(-)
> > > > 
> > > > diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> > > > index 8da93e730d6f..3f20e548f1ec 100644
> > > > --- a/drivers/iommu/arm-smmu-v3.c
> > > > +++ b/drivers/iommu/arm-smmu-v3.c
> > > > @@ -2172,7 +2172,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
> > > >   	cfg->cd.asid	= (u16)asid;
> > > >   	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> > > >   	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> > > > -	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
> > > > +	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
> > > >   	return 0;
> > > >   out_free_asid:
> > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > > index 080af0326816..2bc3e93b11e6 100644
> > > > --- a/drivers/iommu/arm-smmu.c
> > > > +++ b/drivers/iommu/arm-smmu.c
> > > > @@ -552,8 +552,8 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
> > > >   			cb->mair[0] = pgtbl_cfg->arm_v7s_cfg.prrr;
> > > >   			cb->mair[1] = pgtbl_cfg->arm_v7s_cfg.nmrr;
> > > >   		} else {
> > > > -			cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair[0];
> > > > -			cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair[1];
> > > > +			cb->mair[0] = pgtbl_cfg->arm_lpae_s1_cfg.mair;
> > > > +			cb->mair[1] = pgtbl_cfg->arm_lpae_s1_cfg.mair >> 32;
> > > 
> > > Does this work correctly for big-endian?
> > 
> > I don't see why it wouldn't - cfg.mair is read and written as a u64, so this
> > should always return its most significant word regardless of the storage
> > format. We're not doing anything dodgy like trying to type-pun the u64
> > directly into the u32[2].
> 
> Urgh, I need to convince myself about this then. Off to draw those silly
> ABCD DCBA diagrams on some paper.

Yes, you're right, it's fine. I was worried about explicitly writing
2x32-bit MAIRs and then loading them as one, but that's not what is going on
here.
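
The distinction can be sketched as follows; the function names are illustrative only. Splitting with a shift operates on the value, so the result is the same on any byte order, whereas copying the storage of a u64 into a u32[2] depends on the byte order of the host.

```c
#include <stdint.h>
#include <string.h>

/* Value-level split: out[0] is always the low word, out[1] the high word. */
static inline void split_by_shift(uint64_t mair, uint32_t out[2])
{
	out[0] = (uint32_t)mair;
	out[1] = (uint32_t)(mair >> 32);
}

/* Storage-level split: which word lands in out[0] follows host byte order. */
static inline void split_by_pun(uint64_t mair, uint32_t out[2])
{
	memcpy(out, &mair, sizeof(mair));
}
```

For mair = 0x1122334455667788, split_by_shift() always gives out[0] = 0x55667788 and out[1] = 0x11223344. The patch uses the shift form, which is why the big-endian worry does not apply.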

Will

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 00/10] iommu/io-pgtable: Cleanup and prep for split tables
  2019-11-04 19:22   ` Will Deacon
@ 2019-11-04 20:20     ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-04 20:20 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel

On Mon, Nov 04, 2019 at 07:22:28PM +0000, Will Deacon wrote:
> On Fri, Oct 25, 2019 at 07:08:29PM +0100, Robin Murphy wrote:
> > Since the flawed first attempt, I've reworked things with an abstracted
> > TCR and an explicit TTBR1 quirk. I originally envisaged the need to pass
> > the quirk all the way down to the TLBI calls, hence getting diverted
> > into trying to make the parameter passing less cluttered in general, but
> > in the end it turned out fairly neat to just fix the indexing such that
> > we can always just pass around the original unmodified IOVA. Most of the
> > new patches come from staring at that indexing code for long enough to
> > see the subtle inefficiencies that were worth ironing out, plus a bit of
> > random cleanup which doesn't feel worth posting separately.
> > 
> > Note that these patches depend on the fixes already queued in -rc4,
> > otherwise there will be conflicts in arm_mali_lpae_alloc_pgtable().
> > 
> > Robin.
> > 
> > 
> > Robin Murphy (10):
> >   iommu/io-pgtable: Make selftest gubbins consistently __init
> >   iommu/io-pgtable-arm: Rationalise size check
> >   iommu/io-pgtable-arm: Simplify bounds checks
> >   iommu/io-pgtable-arm: Simplify start level lookup
> >   iommu/io-pgtable-arm: Simplify PGD size handling
> >   iommu/io-pgtable-arm: Simplify level indexing
> >   iommu/io-pgtable-arm: Rationalise MAIR handling
> >   iommu/io-pgtable-arm: Rationalise TTBRn handling
> >   iommu/io-pgtable-arm: Rationalise TCR handling
> >   iommu/io-pgtable-arm: Prepare for TTBR1 usage
> 
> Overall, this looks really good to me. There's a bit more work to do
> (see my comments) and I'd like Jordan to have a look as well, but on the
> whole it's a big improvement. Thanks.

Also, I've merged the first 7 patches to save you having to repost those:

https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates

Will

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
  2019-11-04 19:14     ` Will Deacon
@ 2019-11-04 23:27       ` Jordan Crouse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jordan Crouse @ 2019-11-04 23:27 UTC (permalink / raw)
  To: Will Deacon; +Cc: iommu, Robin Murphy, linux-arm-kernel

On Mon, Nov 04, 2019 at 07:14:45PM +0000, Will Deacon wrote:
> On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:
> > Although it's conceptually nice for the io_pgtable_cfg to provide a
> > standard VMSA TCR value, the reality is that no VMSA-compliant IOMMU
> > looks exactly like an Arm CPU, and they all have various other TCR
> > controls which io-pgtable can't be expected to understand. Thus since
> > there is an expectation that drivers will have to add to the given TCR
> > value anyway, let's strip it down to just the essentials that are
> > directly relevant to io-pgatble's inner workings - namely the various
> 
> typo: "io-pgatble"
> 
> > sizes and the walk attributes.
> > 
> > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > ---
> >  drivers/iommu/arm-smmu-v3.c        | 41 +++----------
> >  drivers/iommu/arm-smmu.c           |  7 ++-
> >  drivers/iommu/arm-smmu.h           | 27 ++++++++
> >  drivers/iommu/io-pgtable-arm-v7s.c |  6 +-
> >  drivers/iommu/io-pgtable-arm.c     | 98 ++++++++++++------------------
> >  drivers/iommu/io-pgtable.c         |  2 +-
> >  drivers/iommu/qcom_iommu.c         |  8 +--
> >  include/linux/io-pgtable.h         |  9 ++-
> >  8 files changed, 94 insertions(+), 104 deletions(-)
> 
> Generally, I *really* like this patch, but I do have a bunch of comments:
> 
> > @@ -2155,6 +2125,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
> >  	int asid;
> >  	struct arm_smmu_device *smmu = smmu_domain->smmu;
> >  	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> > +	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> 
> I find this pretty grotty, but I couldn't think of something better and
> exporting format-specific types out of the iopgtable layer also feels
> nasty.
> 
> > diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> > index 409716410b0d..98db074281ac 100644
> > --- a/drivers/iommu/arm-smmu.h
> > +++ b/drivers/iommu/arm-smmu.h
> > @@ -158,12 +158,24 @@ enum arm_smmu_cbar_type {
> >  #define TCR2_SEP			GENMASK(17, 15)
> >  #define TCR2_SEP_UPSTREAM		0x7
> >  #define TCR2_AS				BIT(4)
> > +#define TCR2_PASIZE			GENMASK(3, 0)
> >  
> >  #define ARM_SMMU_CB_TTBR0		0x20
> >  #define ARM_SMMU_CB_TTBR1		0x28
> >  #define TTBRn_ASID			GENMASK_ULL(63, 48)
> >  
> > +/* arm64 headers leak this somehow :( */
> > +#undef TCR_T0SZ
> 
> Urgh. I suppose we should prefix these things with ARM_SMMU too :(
> Obviously, that's a separate patch.
> 
> >  #define ARM_SMMU_CB_TCR			0x30
> > +#define TCR_EAE				BIT(31)
> > +#define TCR_EPD1			BIT(23)
> > +#define TCR_TG0				GENMASK(15, 14)
> > +#define TCR_SH0				GENMASK(13, 12)
> > +#define TCR_ORGN0			GENMASK(11, 10)
> > +#define TCR_IRGN0			GENMASK(9, 8)
> > +#define TCR_T0SZ			GENMASK(5, 0)
> > +
> >  #define ARM_SMMU_CB_CONTEXTIDR		0x34
> >  #define ARM_SMMU_CB_S1_MAIR0		0x38
> >  #define ARM_SMMU_CB_S1_MAIR1		0x3c
> > @@ -318,6 +330,21 @@ struct arm_smmu_domain {
> >  	struct iommu_domain		domain;
> >  };
> >  
> > +static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
> > +{
> > +	return TCR_EPD1 |
> > +	       FIELD_PREP(TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
> > +	       FIELD_PREP(TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
> > +	       FIELD_PREP(TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
> > +	       FIELD_PREP(TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
> > +	       FIELD_PREP(TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
> > +}
> > +
> > +static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
> > +{
> > +	return FIELD_PREP(TCR2_PASIZE, cfg->arm_lpae_s1_cfg.tcr.ips) |
> > +	       FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
> > +}
> >  
> >  /* Implementation details, yay! */
> >  struct arm_smmu_impl {
> > diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> > index 4d2c1e7f67c4..d8e4562ce478 100644
> > --- a/drivers/iommu/io-pgtable-arm-v7s.c
> > +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> > @@ -149,8 +149,6 @@
> >  #define ARM_V7S_TTBR_IRGN_ATTR(attr)					\
> >  	((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1))
> >  
> > -#define ARM_V7S_TCR_PD1			BIT(5)
> > -
> >  #ifdef CONFIG_ZONE_DMA32
> >  #define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
> >  #define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA32
> > @@ -798,8 +796,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
> >  	 */
> >  	cfg->pgsize_bitmap &= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
> >  
> > -	/* TCR: T0SZ=0, disable TTBR1 */
> > -	cfg->arm_v7s_cfg.tcr = ARM_V7S_TCR_PD1;
> > +	/* TCR: T0SZ=0, EAE=0 (if applicable) */
> > +	cfg->arm_v7s_cfg.tcr = 0;
> >  
> >  	/*
> >  	 * TEX remap: the indices used map to the closest equivalent types
> > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > index bc0841040ebe..9b1912ede000 100644
> > --- a/drivers/iommu/io-pgtable-arm.c
> > +++ b/drivers/iommu/io-pgtable-arm.c
> > @@ -100,40 +100,32 @@
> >  #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
> >  
> >  /* Register bits */
> > -#define ARM_32_LPAE_TCR_EAE		(1 << 31)
> > -#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
> > +#define ARM_64_LPAE_VTCR_RES1		(1 << 31)
> 
> I know you're just renaming things here, but this looks really dodgy to
> me. Won't it be treated as signed...
> 
> > @@ -910,7 +899,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
> >  	}
> >  
> >  	/* VTCR */
> > -	reg = ARM_64_LPAE_S2_TCR_RES1 |
> > +	reg = ARM_64_LPAE_VTCR_RES1 |
> >  	     (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> >  	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
> >  	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> 
> ... and then sign-extended here?
> 
> > @@ -919,45 +908,45 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
> >  
> >  	switch (ARM_LPAE_GRANULE(data)) {
> >  	case SZ_4K:
> > -		reg |= ARM_LPAE_TCR_TG0_4K;
> > +		reg |= (ARM_LPAE_TCR_TG0_4K << ARM_LPAE_VTCR_TG0_SHIFT);
> 
> Why don't we do the bitfield thing for vtcr as well? Yeah, there's only one,
> but the nice thing about naming all of the fields in the structure is that
> it makes it obvious what you get back from the io-pgtable code.
> 
> > diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> > index 9a57eb6c253c..059be7e21030 100644
> > --- a/drivers/iommu/qcom_iommu.c
> > +++ b/drivers/iommu/qcom_iommu.c
> > @@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
> >  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
> >  				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
> >  				FIELD_PREP(TTBRn_ASID, ctx->asid));
> > -		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
> > -				FIELD_PREP(TTBRn_ASID, ctx->asid));
> > +		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);
> 
> Are you sure it's safe to drop the ASID here? Just want to make sure there
> wasn't some "quirk" this was helping with.

I was reminded of this recently. Some of our SMMU guys told me that a 0x0 in
TTBR1 could cause an S2 fault if a faulty transaction triggered a TTBR1 lookup,
so the "quirk" was to write the ASID so that the register wasn't zero. I'm not
sure whether this is a vendor-specific blip or not.
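
On the separate (1 << 31) question quoted further up, a minimal user-space
sketch of the concern may help. It assumes a 32-bit int and uses INT32_MIN to
stand in for the bit pattern that GCC and Clang produce for (1 << 31); the
function names are made up for illustration and are not part of io-pgtable.

```c
#include <stdint.h>

/*
 * Signed case: INT32_MIN has only bit 31 set, like (1 << 31) with a
 * 32-bit int. OR-ing it with other int constants keeps the type int,
 * so the conversion to a 64-bit register value sign-extends.
 */
static uint64_t sketch_vtcr_signed(void)
{
	int32_t res1 = INT32_MIN;
	uint64_t reg = res1 | (3 << 12);	/* SH field value is illustrative */

	return reg;
}

/* Unsigned case: what BIT(31) or (1U << 31) would give instead. */
static uint64_t sketch_vtcr_unsigned(void)
{
	uint32_t res1 = UINT32_C(1) << 31;
	uint64_t reg = res1 | (UINT32_C(3) << 12);

	return reg;
}
```

With the signed constant the upper 32 bits of the result all end up set; with
the unsigned one they stay clear.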

Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 10/10] iommu/io-pgtable-arm: Prepare for TTBR1 usage
  2019-10-25 18:08   ` Robin Murphy
@ 2019-11-04 23:40     ` Jordan Crouse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jordan Crouse @ 2019-11-04 23:40 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, will, linux-arm-kernel

On Fri, Oct 25, 2019 at 07:08:39PM +0100, Robin Murphy wrote:
> Now that we can correctly extract top-level indices without relying on
> the remaining upper bits being zero, the only remaining impediments to
> using a given table for TTBR1 are the address validation on map/unmap
> and the awkward TCR translation granule format. Add a quirk so that we
> can do the right thing at those points.

This looks great.  I have one comment about the TCR.A1 bit below but otherwise
this is sane. My immediate todo this week and next is to try to get something
spun up and working on the db845 for verification.
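
For anyone wanting to poke at the address validation outside the kernel, the
iaext check this patch adds to arm_lpae_map() and arm_lpae_unmap() can be
sketched as a stand-alone helper. This is only an illustration: the function
name is hypothetical, it assumes ias < 64, and it relies on right shifts of
negative values being arithmetic, as they are with GCC and Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when every IOVA bit above 'ias' matches what the chosen
 * half of the address space expects: all zeroes for a TTBR0 table, all
 * ones for a TTBR1 table.
 */
static bool sketch_iova_valid(uint64_t iova, unsigned int ias, bool ttbr1)
{
	/* Arithmetic shift replicates the top bit into the vacated bits. */
	int64_t iaext = (int64_t)iova >> ias;

	/* For TTBR1 the upper bits must all be ones, so invert first. */
	if (ttbr1)
		iaext = ~iaext;

	return iaext == 0;
}
```

With ias = 48, an address such as 0xffff000000001000 passes only when the
TTBR1 flag is set, and an address with a stray bit 48 fails either way.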

> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/io-pgtable-arm.c | 25 +++++++++++++++++++------
>  include/linux/io-pgtable.h     |  4 ++++
>  2 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 9b1912ede000..e53edff56e54 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -107,6 +107,10 @@
>  #define ARM_LPAE_TCR_TG0_64K		1
>  #define ARM_LPAE_TCR_TG0_16K		2
>  
> +#define ARM_LPAE_TCR_TG1_16K		1
> +#define ARM_LPAE_TCR_TG1_4K		2
> +#define ARM_LPAE_TCR_TG1_64K		3
> +
>  #define ARM_LPAE_TCR_SH0_SHIFT		12
>  #define ARM_LPAE_TCR_SH_NS		0
>  #define ARM_LPAE_TCR_SH_OS		2
> @@ -466,6 +470,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
>  	arm_lpae_iopte *ptep = data->pgd;
>  	int ret, lvl = data->start_level;
>  	arm_lpae_iopte prot;
> +	long iaext = (long)iova >> cfg->ias;
>  
>  	/* If no access, then nothing to do */
>  	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
> @@ -474,7 +479,9 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
>  	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
>  		return -EINVAL;
>  
> -	if (WARN_ON(iova >> data->iop.cfg.ias || paddr >> data->iop.cfg.oas))
> +	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> +		iaext = ~iaext;
> +	if (WARN_ON(iaext || paddr >> cfg->oas))
>  		return -ERANGE;
>  
>  	prot = arm_lpae_prot_to_pte(data, iommu_prot);
> @@ -640,11 +647,14 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>  	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>  	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>  	arm_lpae_iopte *ptep = data->pgd;
> +	long iaext = (long)iova >> cfg->ias;
>  
>  	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
>  		return 0;
>  
> -	if (WARN_ON(iova >> data->iop.cfg.ias))
> +	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> +		iaext = ~iaext;
> +	if (WARN_ON(iaext))
>  		return 0;
>  
>  	return __arm_lpae_unmap(data, gather, iova, size, data->start_level, ptep);
> @@ -780,9 +790,11 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  	u64 reg;
>  	struct arm_lpae_io_pgtable *data;
>  	typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
> +	bool tg1;
>  
>  	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
> -			    IO_PGTABLE_QUIRK_NON_STRICT))
> +			    IO_PGTABLE_QUIRK_NON_STRICT |
> +			    IO_PGTABLE_QUIRK_ARM_TTBR1))
>  		return NULL;
>  
>  	data = arm_lpae_alloc_pgtable(cfg);
> @@ -800,15 +812,16 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  		tcr->orgn = ARM_LPAE_TCR_RGN_NC;
>  	}
>  
> +	tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1;
>  	switch (ARM_LPAE_GRANULE(data)) {
>  	case SZ_4K:
> -		tcr->tg = ARM_LPAE_TCR_TG0_4K;
> +		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_4K : ARM_LPAE_TCR_TG0_4K;
>  		break;
>  	case SZ_16K:
> -		tcr->tg = ARM_LPAE_TCR_TG0_16K;
> +		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_16K : ARM_LPAE_TCR_TG0_16K;
>  		break;
>  	case SZ_64K:
> -		tcr->tg = ARM_LPAE_TCR_TG0_64K;
> +		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_64K : ARM_LPAE_TCR_TG0_64K;
>  		break;
>  	}

The comment in one of the previous patches about the ASID in TTBR1 triggered
something in my brain. In the v2 TCR, A1 (bit[22]) controls which TTBR the
ASID is taken from. I'm not sure if that qualifies as a quirk here or if it
should be handled entirely within arm_smmu_lpae_tcr(), but I thought I should
point it out.
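
If it ends up on the driver side, the A1 handling could look roughly like the
sketch below. The macro and function names are hypothetical, and the bit
position is taken from the description above (A1 at bit 22) rather than from
any existing header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of TCR.A1; hypothetical name, not an existing define. */
#define SKETCH_TCR_A1	(UINT32_C(1) << 22)

/*
 * Select which TTBR supplies the ASID: set A1 to take it from TTBR1,
 * clear A1 to take it from TTBR0. All other TCR bits are left alone.
 */
static uint32_t sketch_tcr_set_asid_source(uint32_t tcr, bool asid_from_ttbr1)
{
	if (asid_from_ttbr1)
		return tcr | SKETCH_TCR_A1;

	return tcr & ~SKETCH_TCR_A1;
}
```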

>  
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 6ae104cedfd7..d7c5cb685e50 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -83,12 +83,16 @@ struct io_pgtable_cfg {
>  	 * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
>  	 *	on unmap, for DMA domains using the flush queue mechanism for
>  	 *	delayed invalidation.
> +	 *
> +	 * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table
> +	 *	for use in the upper half of a split address space.
>  	 */
>  	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
>  	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
>  	#define IO_PGTABLE_QUIRK_TLBI_ON_MAP	BIT(2)
>  	#define IO_PGTABLE_QUIRK_ARM_MTK_EXT	BIT(3)
>  	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(4)
> +	#define IO_PGTABLE_QUIRK_ARM_TTBR1	BIT(5)
>  	unsigned long			quirks;
>  	unsigned long			pgsize_bitmap;
>  	unsigned int			ias;
> -- 
> 2.21.0.dirty
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
  2019-11-04 23:27       ` Jordan Crouse
  (?)
@ 2019-11-20 15:11       ` Will Deacon
  2019-11-22 15:51           ` Robin Murphy
  -1 siblings, 1 reply; 69+ messages in thread
From: Will Deacon @ 2019-11-20 15:11 UTC (permalink / raw)
  To: Robin Murphy, iommu, linux-arm-kernel

On Mon, Nov 04, 2019 at 04:27:56PM -0700, Jordan Crouse wrote:
> On Mon, Nov 04, 2019 at 07:14:45PM +0000, Will Deacon wrote:
> > On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:
> > > diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> > > index 9a57eb6c253c..059be7e21030 100644
> > > --- a/drivers/iommu/qcom_iommu.c
> > > +++ b/drivers/iommu/qcom_iommu.c
> > > @@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
> > >  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
> > >  				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
> > >  				FIELD_PREP(TTBRn_ASID, ctx->asid));
> > > -		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
> > > -				FIELD_PREP(TTBRn_ASID, ctx->asid));
> > > +		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);
> > 
> > Are you sure it's safe to drop the ASID here? Just want to make sure there
> > wasn't some "quirk" this was helping with.
> 
> I was reminded of this recently. Some of our SMMU guys told me that a 0x0 in
> TTBR1 could cause an S2 fault if a faulty transaction triggered a TTBR1 lookup,
> so the "quirk" was to write the ASID so that the register wasn't zero. I'm not
> sure whether this is a vendor-specific blip or not.

You should be able to set EPD1 to prevent walks via TTBR1 in that case,
though. Sticking the ASID in there is still dodgy if EPD1 is clear and
TTBR1 points at junk (or even physical address 0x0).

That's probably something which should be folded into this patch.
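
As a rough user-space sketch of that, the TCR can be composed with EPD1 always
set, using the field positions from the arm-smmu.h hunk in this patch (EPD1 at
bit 23, TG0 at 15:14, SH0 at 13:12, ORGN0 at 11:10, IRGN0 at 9:8, T0SZ at
5:0). The struct, helper and macro names are made up for illustration, and the
field values in any caller are whatever io-pgtable hands back.

```c
#include <stdint.h>

#define SKETCH_TCR_EPD1	(UINT32_C(1) << 23)	/* disable walks via TTBR1 */

/* Hypothetical stand-in for the tcr fields in io_pgtable_cfg. */
struct sketch_tcr {
	uint32_t tg, sh, orgn, irgn, tsz;
};

/* Place 'val' into a field of 'width' bits starting at bit 'shift'. */
static uint32_t sketch_field(uint32_t val, unsigned int shift,
			     unsigned int width)
{
	return (val & ((UINT32_C(1) << width) - 1)) << shift;
}

static uint32_t sketch_lpae_tcr(const struct sketch_tcr *tcr)
{
	return SKETCH_TCR_EPD1 |
	       sketch_field(tcr->tg, 14, 2) |
	       sketch_field(tcr->sh, 12, 2) |
	       sketch_field(tcr->orgn, 10, 2) |
	       sketch_field(tcr->irgn, 8, 2) |
	       sketch_field(tcr->tsz, 0, 6);
}
```

With EPD1 set unconditionally, whatever is left in TTBR1 should never be used
as a walk base, which is the property being asked for here.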

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 10/10] iommu/io-pgtable-arm: Prepare for TTBR1 usage
  2019-11-04 23:40     ` Jordan Crouse
@ 2019-11-20 19:18       ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-20 19:18 UTC (permalink / raw)
  To: Robin Murphy, iommu, linux-arm-kernel

On Mon, Nov 04, 2019 at 04:40:06PM -0700, Jordan Crouse wrote:
> On Fri, Oct 25, 2019 at 07:08:39PM +0100, Robin Murphy wrote:
> > Now that we can correctly extract top-level indices without relying on
> > the remaining upper bits being zero, the only remaining impediments to
> > using a given table for TTBR1 are the address validation on map/unmap
> > and the awkward TCR translation granule format. Add a quirk so that we
> > can do the right thing at those points.
> 
> This looks great.  I have one comment about the TCR.A1 bit below but otherwise
> this is sane. My immediate todo this week and next is to try to get something
> spun up and working on the db845 for verification.

How did that go?

> > @@ -800,15 +812,16 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> >  		tcr->orgn = ARM_LPAE_TCR_RGN_NC;
> >  	}
> >  
> > +	tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1;
> >  	switch (ARM_LPAE_GRANULE(data)) {
> >  	case SZ_4K:
> > -		tcr->tg = ARM_LPAE_TCR_TG0_4K;
> > +		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_4K : ARM_LPAE_TCR_TG0_4K;
> >  		break;
> >  	case SZ_16K:
> > -		tcr->tg = ARM_LPAE_TCR_TG0_16K;
> > +		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_16K : ARM_LPAE_TCR_TG0_16K;
> >  		break;
> >  	case SZ_64K:
> > -		tcr->tg = ARM_LPAE_TCR_TG0_64K;
> > +		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_64K : ARM_LPAE_TCR_TG0_64K;
> >  		break;
> >  	}
> 
> The comment in one of the previous patches about the ASID in TTBR1 triggered
> something in my brain. v2 TCR A1, bit[22], controls from which TTBR the ASID
> is used. I'm not sure if that qualifies as a quirk here or if it should be
> entirely handled within arm_smmu_lpae_tcr(), but I thought I should point it
> out.

That should be confined entirely to the driver code though, no? The
io-pgtable code doesn't go near ASIDs or the A1 bit.
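
To illustrate that split, here is a minimal sketch of a driver-side fixup,
assuming the A1 bit position from the quoted remark. The macro and function
names are hypothetical and appear nowhere in the series; the point is only
that the page-table code hands over walk fields and the driver alone decides
which TTBR carries the ASID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit position as stated in the quoted remark (TCR.A1, bit 22). */
#define SKETCH_TCR_A1	(UINT32_C(1) << 22)

/*
 * Hypothetical driver-side helper: takes the TCR fields derived from the
 * io-pgtable configuration and adds A1 when the ASID lives in TTBR1.
 */
static uint32_t sketch_driver_tcr(uint32_t pgtable_fields, bool asid_in_ttbr1)
{
	/* Assumption of this sketch: the page-table side never sets A1. */
	assert(!(pgtable_fields & SKETCH_TCR_A1));

	return asid_in_ttbr1 ? (pgtable_fields | SKETCH_TCR_A1)
			     : pgtable_fields;
}
```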

Will
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
  2019-11-20 15:11       ` Will Deacon
@ 2019-11-22 15:51           ` Robin Murphy
  0 siblings, 0 replies; 69+ messages in thread
From: Robin Murphy @ 2019-11-22 15:51 UTC (permalink / raw)
  To: Will Deacon, iommu, linux-arm-kernel

On 20/11/2019 3:11 pm, Will Deacon wrote:
> On Mon, Nov 04, 2019 at 04:27:56PM -0700, Jordan Crouse wrote:
>> On Mon, Nov 04, 2019 at 07:14:45PM +0000, Will Deacon wrote:
>>> On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:
>>>> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
>>>> index 9a57eb6c253c..059be7e21030 100644
>>>> --- a/drivers/iommu/qcom_iommu.c
>>>> +++ b/drivers/iommu/qcom_iommu.c
>>>> @@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>>>>   		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
>>>>   				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
>>>>   				FIELD_PREP(TTBRn_ASID, ctx->asid));
>>>> -		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
>>>> -				FIELD_PREP(TTBRn_ASID, ctx->asid));
>>>> +		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);
>>>
>>> Are you sure it's safe to drop the ASID here? Just want to make sure there
>>> wasn't some "quirk" this was helping with.
>>
>> I was reminded of this recently. Some of our SMMU guys told me that a 0x0 in
>> TTBR1 could cause a S2 fault if a faulty transaction caused a ttbr1 lookup so
>> the "quirk" was writing the ASID so the register wasn't zero. I'm not sure if
>> this is a vendor specific blip or not.
> 
> You should be able to set EPD1 to prevent walks via TTBR1 in that case,
> though. Sticking the ASID in there is still dodgy if EPD1 is clear and
> TTBR1 points at junk (or even physical address 0x0).
> 
> That's probably something which should be folded into this patch.

Note that EPD1 was being set by io-pgtable-arm before this patch, and 
remains set by virtue of arm_smmu_lpae_tcr() afterwards, so presumably 
the brokenness might run a bit deeper than that. Either way, though, I'm 
somewhat dubious since the ASID could well be 0 anyway :/
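
A small sketch of why the ASID write cannot be relied on to keep TTBR1
non-zero, using the field positions from the quoted hunks (TTBRn_ASID in
bits [63:48], TCR_EPD1 in bit 23). The helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TTBRN_ASID_SHIFT	48
#define SKETCH_TCR_EPD1		(UINT32_C(1) << 23)

/* The value the old qcom_iommu code wrote to TTBR1: ASID only, no address. */
static uint64_t sketch_ttbr1_asid_only(uint32_t asid)
{
	assert(asid <= 0xffff);		/* the ASID field is 16 bits wide */
	return (uint64_t)asid << SKETCH_TTBRN_ASID_SHIFT;
}

/* True if the TCR value forbids table walks through TTBR1. */
static bool sketch_ttbr1_walks_disabled(uint32_t tcr)
{
	return (tcr & SKETCH_TCR_EPD1) != 0;
}
```

With ASID 0 the "ASID only" value is exactly the 0x0 that the workaround was
meant to avoid, so EPD1 is the only thing in this sketch that says anything
about whether TTBR1 gets walked.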

Robin.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 10/10] iommu/io-pgtable-arm: Prepare for TTBR1 usage
  2019-10-25 18:08   ` Robin Murphy
  (?)
  (?)
@ 2019-11-22 22:03   ` Jordan Crouse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jordan Crouse @ 2019-11-22 22:03 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, will, linux-arm-kernel

On Fri, Oct 25, 2019 at 07:08:39PM +0100, Robin Murphy wrote:
> Now that we can correctly extract top-level indices without relying on
> the remaining upper bits being zero, the only remaining impediments to
> using a given table for TTBR1 are the address validation on map/unmap
> and the awkward TCR translation granule format. Add a quirk so that we
> can do the right thing at those points.

Tested-by: Jordan Crouse <jcrouse@codeaurora.org>
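
For anyone following along, the address validation described above can be
sketched in isolation like this. It mirrors the iaext logic in the quoted
hunks, but the function name and the standalone form are my own. It assumes
that converting to a signed type and right-shifting a negative value
sign-extends, which the kernel code relies on as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the range check: an IOVA is valid for
 * TTBR0 if every bit above the input address size is 0, and valid for TTBR1
 * if every such bit is 1.
 */
static bool sketch_iova_in_range(uint64_t iova, unsigned int ias, bool ttbr1)
{
	int64_t iaext;

	assert(ias > 0 && ias < 64);

	/* Arithmetic shift keeps only the upper bits, sign-extended. */
	iaext = (int64_t)iova >> ias;
	if (ttbr1)
		iaext = ~iaext;	/* all-ones upper bits become zero */

	return iaext == 0;
}
```

The inversion is what lets a single comparison against zero serve both
halves of a split address space.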

> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/io-pgtable-arm.c | 25 +++++++++++++++++++------
>  include/linux/io-pgtable.h     |  4 ++++
>  2 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 9b1912ede000..e53edff56e54 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -107,6 +107,10 @@
>  #define ARM_LPAE_TCR_TG0_64K		1
>  #define ARM_LPAE_TCR_TG0_16K		2
>  
> +#define ARM_LPAE_TCR_TG1_16K		1
> +#define ARM_LPAE_TCR_TG1_4K		2
> +#define ARM_LPAE_TCR_TG1_64K		3
> +
>  #define ARM_LPAE_TCR_SH0_SHIFT		12
>  #define ARM_LPAE_TCR_SH_NS		0
>  #define ARM_LPAE_TCR_SH_OS		2
> @@ -466,6 +470,7 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
>  	arm_lpae_iopte *ptep = data->pgd;
>  	int ret, lvl = data->start_level;
>  	arm_lpae_iopte prot;
> +	long iaext = (long)iova >> cfg->ias;
>  
>  	/* If no access, then nothing to do */
>  	if (!(iommu_prot & (IOMMU_READ | IOMMU_WRITE)))
> @@ -474,7 +479,9 @@ static int arm_lpae_map(struct io_pgtable_ops *ops, unsigned long iova,
>  	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
>  		return -EINVAL;
>  
> -	if (WARN_ON(iova >> data->iop.cfg.ias || paddr >> data->iop.cfg.oas))
> +	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> +		iaext = ~iaext;
> +	if (WARN_ON(iaext || paddr >> cfg->oas))
>  		return -ERANGE;
>  
>  	prot = arm_lpae_prot_to_pte(data, iommu_prot);
> @@ -640,11 +647,14 @@ static size_t arm_lpae_unmap(struct io_pgtable_ops *ops, unsigned long iova,
>  	struct arm_lpae_io_pgtable *data = io_pgtable_ops_to_data(ops);
>  	struct io_pgtable_cfg *cfg = &data->iop.cfg;
>  	arm_lpae_iopte *ptep = data->pgd;
> +	long iaext = (long)iova >> cfg->ias;
>  
>  	if (WARN_ON(!size || (size & cfg->pgsize_bitmap) != size))
>  		return 0;
>  
> -	if (WARN_ON(iova >> data->iop.cfg.ias))
> +	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
> +		iaext = ~iaext;
> +	if (WARN_ON(iaext))
>  		return 0;
>  
>  	return __arm_lpae_unmap(data, gather, iova, size, data->start_level, ptep);
> @@ -780,9 +790,11 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  	u64 reg;
>  	struct arm_lpae_io_pgtable *data;
>  	typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
> +	bool tg1;
>  
>  	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
> -			    IO_PGTABLE_QUIRK_NON_STRICT))
> +			    IO_PGTABLE_QUIRK_NON_STRICT |
> +			    IO_PGTABLE_QUIRK_ARM_TTBR1))
>  		return NULL;
>  
>  	data = arm_lpae_alloc_pgtable(cfg);
> @@ -800,15 +812,16 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  		tcr->orgn = ARM_LPAE_TCR_RGN_NC;
>  	}
>  
> +	tg1 = cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1;
>  	switch (ARM_LPAE_GRANULE(data)) {
>  	case SZ_4K:
> -		tcr->tg = ARM_LPAE_TCR_TG0_4K;
> +		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_4K : ARM_LPAE_TCR_TG0_4K;
>  		break;
>  	case SZ_16K:
> -		tcr->tg = ARM_LPAE_TCR_TG0_16K;
> +		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_16K : ARM_LPAE_TCR_TG0_16K;
>  		break;
>  	case SZ_64K:
> -		tcr->tg = ARM_LPAE_TCR_TG0_64K;
> +		tcr->tg = tg1 ? ARM_LPAE_TCR_TG1_64K : ARM_LPAE_TCR_TG0_64K;
>  		break;
>  	}
>  
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 6ae104cedfd7..d7c5cb685e50 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -83,12 +83,16 @@ struct io_pgtable_cfg {
>  	 * IO_PGTABLE_QUIRK_NON_STRICT: Skip issuing synchronous leaf TLBIs
>  	 *	on unmap, for DMA domains using the flush queue mechanism for
>  	 *	delayed invalidation.
> +	 *
> +	 * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table
> +	 *	for use in the upper half of a split address space.
>  	 */
>  	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
>  	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
>  	#define IO_PGTABLE_QUIRK_TLBI_ON_MAP	BIT(2)
>  	#define IO_PGTABLE_QUIRK_ARM_MTK_EXT	BIT(3)
>  	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(4)
> +	#define IO_PGTABLE_QUIRK_ARM_TTBR1	BIT(5)
>  	unsigned long			quirks;
>  	unsigned long			pgsize_bitmap;
>  	unsigned int			ias;
> -- 
> 2.21.0.dirty
> 
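
The "awkward TCR translation granule format" from the commit message comes
down to TG0 and TG1 using different encodings for the same granule. A
minimal sketch of the mapping, with the values copied from the
ARM_LPAE_TCR_TG0_* and ARM_LPAE_TCR_TG1_* defines quoted above (the function
itself is hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the TG field encoding for a granule size given in bytes. */
static unsigned int sketch_tcr_tg(size_t granule, bool ttbr1)
{
	switch (granule) {
	case 4096:
		return ttbr1 ? 2 : 0;	/* TG1_4K  : TG0_4K  */
	case 16384:
		return ttbr1 ? 1 : 2;	/* TG1_16K : TG0_16K */
	case 65536:
		return ttbr1 ? 3 : 1;	/* TG1_64K : TG0_64K */
	}

	/*
	 * The quoted switch has no default case; this sketch treats any
	 * other granule as a caller bug.
	 */
	assert(!"unsupported granule");
	return 0;
}
```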

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
  2019-10-25 18:08   ` Robin Murphy
  (?)
  (?)
@ 2019-11-22 22:03   ` Jordan Crouse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jordan Crouse @ 2019-11-22 22:03 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, will, linux-arm-kernel

On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:
> Although it's conceptually nice for the io_pgtable_cfg to provide a
> standard VMSA TCR value, the reality is that no VMSA-compliant IOMMU
> looks exactly like an Arm CPU, and they all have various other TCR
> controls which io-pgtable can't be expected to understand. Thus since
> there is an expectation that drivers will have to add to the given TCR
> value anyway, let's strip it down to just the essentials that are
> directly relevant to io-pgtable's inner workings - namely the various
> sizes and the walk attributes.

Tested-by: Jordan Crouse <jcrouse@codeaurora.org>
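
A rough sketch of what the abstraction buys, assuming the field positions
from the hunks quoted below (the SMMUv2 TCR_* masks and the SMMUv3
CTXDESC_CD_0_TCR_* masks). The struct layout and function names are
hypothetical and the real field widths may differ. The same abstract values
land at different bit positions depending on the consumer:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the abstracted TCR fields. */
struct sketch_tcr {
	unsigned int tsz, tg, irgn, orgn, sh, ips;
};

/* Example values: WBWA walks (1), inner shareable (3), 36-bit PS (1). */
static struct sketch_tcr sketch_tcr_for(unsigned int ias, unsigned int tg)
{
	struct sketch_tcr t = { .tg = tg, .irgn = 1, .orgn = 1, .sh = 3, .ips = 1 };

	assert(ias >= 1 && ias <= 64);
	t.tsz = 64 - ias;	/* as in "tcr->tsz = 64ULL - cfg->ias" */
	return t;
}

/* SMMUv2 context bank layout: EPD1 bit 23, TG0 [15:14], T0SZ [5:0]. */
static uint32_t sketch_smmu_v2_tcr(struct sketch_tcr t)
{
	return (UINT32_C(1) << 23) | (t.tg << 14) | (t.sh << 12) |
	       (t.orgn << 10) | (t.irgn << 8) | t.tsz;
}

/* SMMUv3 context descriptor layout: TG0 [7:6], EPD1 bit 30, IPS [34:32]. */
static uint64_t sketch_smmu_v3_cd(struct sketch_tcr t)
{
	return (uint64_t)t.tsz | ((uint64_t)t.tg << 6) |
	       ((uint64_t)t.irgn << 8) | ((uint64_t)t.orgn << 10) |
	       ((uint64_t)t.sh << 12) | (UINT64_C(1) << 30) |
	       ((uint64_t)t.ips << 32);
}
```

Each driver packs only the fields it understands and adds its own controls
(EPD1 here), which is why the CPU-format repacking helper can go away.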

> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c        | 41 +++----------
>  drivers/iommu/arm-smmu.c           |  7 ++-
>  drivers/iommu/arm-smmu.h           | 27 ++++++++
>  drivers/iommu/io-pgtable-arm-v7s.c |  6 +-
>  drivers/iommu/io-pgtable-arm.c     | 98 ++++++++++++------------------
>  drivers/iommu/io-pgtable.c         |  2 +-
>  drivers/iommu/qcom_iommu.c         |  8 +--
>  include/linux/io-pgtable.h         |  9 ++-
>  8 files changed, 94 insertions(+), 104 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index da31e607698f..ca72cd777955 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -261,27 +261,18 @@
>  /* Context descriptor (stage-1 only) */
>  #define CTXDESC_CD_DWORDS		8
>  #define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
> -#define ARM64_TCR_T0SZ			GENMASK_ULL(5, 0)
>  #define CTXDESC_CD_0_TCR_TG0		GENMASK_ULL(7, 6)
> -#define ARM64_TCR_TG0			GENMASK_ULL(15, 14)
>  #define CTXDESC_CD_0_TCR_IRGN0		GENMASK_ULL(9, 8)
> -#define ARM64_TCR_IRGN0			GENMASK_ULL(9, 8)
>  #define CTXDESC_CD_0_TCR_ORGN0		GENMASK_ULL(11, 10)
> -#define ARM64_TCR_ORGN0			GENMASK_ULL(11, 10)
>  #define CTXDESC_CD_0_TCR_SH0		GENMASK_ULL(13, 12)
> -#define ARM64_TCR_SH0			GENMASK_ULL(13, 12)
>  #define CTXDESC_CD_0_TCR_EPD0		(1ULL << 14)
> -#define ARM64_TCR_EPD0			(1ULL << 7)
>  #define CTXDESC_CD_0_TCR_EPD1		(1ULL << 30)
> -#define ARM64_TCR_EPD1			(1ULL << 23)
>  
>  #define CTXDESC_CD_0_ENDI		(1UL << 15)
>  #define CTXDESC_CD_0_V			(1UL << 31)
>  
>  #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
> -#define ARM64_TCR_IPS			GENMASK_ULL(34, 32)
>  #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
> -#define ARM64_TCR_TBI0			(1ULL << 37)
>  
>  #define CTXDESC_CD_0_AA64		(1UL << 41)
>  #define CTXDESC_CD_0_S			(1UL << 44)
> @@ -292,10 +283,6 @@
>  
>  #define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
>  
> -/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
> -#define ARM_SMMU_TCR2CD(tcr, fld)	FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \
> -					FIELD_GET(ARM64_TCR_##fld, tcr))
> -
>  /* Command queue */
>  #define CMDQ_ENT_SZ_SHIFT		4
>  #define CMDQ_ENT_DWORDS			((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
> @@ -1443,23 +1430,6 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>  }
>  
>  /* Context descriptor manipulation functions */
> -static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
> -{
> -	u64 val = 0;
> -
> -	/* Repack the TCR. Just care about TTBR0 for now */
> -	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
> -	val |= ARM_SMMU_TCR2CD(tcr, TG0);
> -	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
> -	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
> -	val |= ARM_SMMU_TCR2CD(tcr, SH0);
> -	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
> -	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
> -	val |= ARM_SMMU_TCR2CD(tcr, IPS);
> -
> -	return val;
> -}
> -
>  static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
>  				    struct arm_smmu_s1_cfg *cfg)
>  {
> @@ -1469,7 +1439,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
>  	 * We don't need to issue any invalidation here, as we'll invalidate
>  	 * the STE when installing the new entry anyway.
>  	 */
> -	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
> +	val = cfg->cd.tcr |
>  #ifdef __BIG_ENDIAN
>  	      CTXDESC_CD_0_ENDI |
>  #endif
> @@ -2155,6 +2125,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  	int asid;
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>  
>  	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
>  	if (asid < 0)
> @@ -2171,7 +2142,13 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  
>  	cfg->cd.asid	= (u16)asid;
>  	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> -	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> +	cfg->cd.tcr	= FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) |
> +			  FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) |
> +			  FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) |
> +			  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
> +			  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
> +			  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
> +			  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
>  	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
>  	return 0;
>  
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index a249e4e49ead..ade323ab0484 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -521,11 +521,12 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
>  		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
>  			cb->tcr[0] = pgtbl_cfg->arm_v7s_cfg.tcr;
>  		} else {
> -			cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> -			cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
> -			cb->tcr[1] |= FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
> +			cb->tcr[0] = arm_smmu_lpae_tcr(pgtbl_cfg);
> +			cb->tcr[1] = arm_smmu_lpae_tcr2(pgtbl_cfg);
>  			if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
>  				cb->tcr[1] |= TCR2_AS;
> +			else
> +				cb->tcr[0] |= TCR_EAE;
>  		}
>  	} else {
>  		cb->tcr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index 409716410b0d..98db074281ac 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -158,12 +158,24 @@ enum arm_smmu_cbar_type {
>  #define TCR2_SEP			GENMASK(17, 15)
>  #define TCR2_SEP_UPSTREAM		0x7
>  #define TCR2_AS				BIT(4)
> +#define TCR2_PASIZE			GENMASK(3, 0)
>  
>  #define ARM_SMMU_CB_TTBR0		0x20
>  #define ARM_SMMU_CB_TTBR1		0x28
>  #define TTBRn_ASID			GENMASK_ULL(63, 48)
>  
> +/* arm64 headers leak this somehow :( */
> +#undef TCR_T0SZ
> +
>  #define ARM_SMMU_CB_TCR			0x30
> +#define TCR_EAE				BIT(31)
> +#define TCR_EPD1			BIT(23)
> +#define TCR_TG0				GENMASK(15, 14)
> +#define TCR_SH0				GENMASK(13, 12)
> +#define TCR_ORGN0			GENMASK(11, 10)
> +#define TCR_IRGN0			GENMASK(9, 8)
> +#define TCR_T0SZ			GENMASK(5, 0)
> +
>  #define ARM_SMMU_CB_CONTEXTIDR		0x34
>  #define ARM_SMMU_CB_S1_MAIR0		0x38
>  #define ARM_SMMU_CB_S1_MAIR1		0x3c
> @@ -318,6 +330,21 @@ struct arm_smmu_domain {
>  	struct iommu_domain		domain;
>  };
>  
> +static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
> +{
> +	return TCR_EPD1 |
> +	       FIELD_PREP(TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
> +	       FIELD_PREP(TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
> +	       FIELD_PREP(TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
> +	       FIELD_PREP(TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
> +	       FIELD_PREP(TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
> +}
> +
> +static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
> +{
> +	return FIELD_PREP(TCR2_PASIZE, cfg->arm_lpae_s1_cfg.tcr.ips) |
> +	       FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
> +}
>  
>  /* Implementation details, yay! */
>  struct arm_smmu_impl {
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 4d2c1e7f67c4..d8e4562ce478 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -149,8 +149,6 @@
>  #define ARM_V7S_TTBR_IRGN_ATTR(attr)					\
>  	((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1))
>  
> -#define ARM_V7S_TCR_PD1			BIT(5)
> -
>  #ifdef CONFIG_ZONE_DMA32
>  #define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
>  #define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA32
> @@ -798,8 +796,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>  	 */
>  	cfg->pgsize_bitmap &= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
>  
> -	/* TCR: T0SZ=0, disable TTBR1 */
> -	cfg->arm_v7s_cfg.tcr = ARM_V7S_TCR_PD1;
> +	/* TCR: T0SZ=0, EAE=0 (if applicable) */
> +	cfg->arm_v7s_cfg.tcr = 0;
>  
>  	/*
>  	 * TEX remap: the indices used map to the closest equivalent types
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index bc0841040ebe..9b1912ede000 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -100,40 +100,32 @@
>  #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
>  
>  /* Register bits */
> -#define ARM_32_LPAE_TCR_EAE		(1 << 31)
> -#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
> +#define ARM_64_LPAE_VTCR_RES1		(1 << 31)
>  
> -#define ARM_LPAE_TCR_EPD1		(1 << 23)
> -
> -#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
> -#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
> -#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
> +#define ARM_LPAE_VTCR_TG0_SHIFT		14
> +#define ARM_LPAE_TCR_TG0_4K		0
> +#define ARM_LPAE_TCR_TG0_64K		1
> +#define ARM_LPAE_TCR_TG0_16K		2
>  
>  #define ARM_LPAE_TCR_SH0_SHIFT		12
> -#define ARM_LPAE_TCR_SH0_MASK		0x3
>  #define ARM_LPAE_TCR_SH_NS		0
>  #define ARM_LPAE_TCR_SH_OS		2
>  #define ARM_LPAE_TCR_SH_IS		3
>  
>  #define ARM_LPAE_TCR_ORGN0_SHIFT	10
>  #define ARM_LPAE_TCR_IRGN0_SHIFT	8
> -#define ARM_LPAE_TCR_RGN_MASK		0x3
>  #define ARM_LPAE_TCR_RGN_NC		0
>  #define ARM_LPAE_TCR_RGN_WBWA		1
>  #define ARM_LPAE_TCR_RGN_WT		2
>  #define ARM_LPAE_TCR_RGN_WB		3
>  
> -#define ARM_LPAE_TCR_SL0_SHIFT		6
> -#define ARM_LPAE_TCR_SL0_MASK		0x3
> +#define ARM_LPAE_VTCR_SL0_SHIFT		6
> +#define ARM_LPAE_VTCR_SL0_MASK		0x3
>  
>  #define ARM_LPAE_TCR_T0SZ_SHIFT		0
> -#define ARM_LPAE_TCR_SZ_MASK		0xf
>  
> -#define ARM_LPAE_TCR_PS_SHIFT		16
> -#define ARM_LPAE_TCR_PS_MASK		0x7
> -
> -#define ARM_LPAE_TCR_IPS_SHIFT		32
> -#define ARM_LPAE_TCR_IPS_MASK		0x7
> +#define ARM_LPAE_VTCR_PS_SHIFT		16
> +#define ARM_LPAE_VTCR_PS_MASK		0x7
>  
>  #define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
>  #define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
> @@ -787,6 +779,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  {
>  	u64 reg;
>  	struct arm_lpae_io_pgtable *data;
> +	typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
>  
>  	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
>  			    IO_PGTABLE_QUIRK_NON_STRICT))
> @@ -798,58 +791,54 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  
>  	/* TCR */
>  	if (cfg->coherent_walk) {
> -		reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> -		      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
> -		      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> +		tcr->sh = ARM_LPAE_TCR_SH_IS;
> +		tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
> +		tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
>  	} else {
> -		reg = (ARM_LPAE_TCR_SH_OS << ARM_LPAE_TCR_SH0_SHIFT) |
> -		      (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT) |
> -		      (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT);
> +		tcr->sh = ARM_LPAE_TCR_SH_OS;
> +		tcr->irgn = ARM_LPAE_TCR_RGN_NC;
> +		tcr->orgn = ARM_LPAE_TCR_RGN_NC;
>  	}
>  
>  	switch (ARM_LPAE_GRANULE(data)) {
>  	case SZ_4K:
> -		reg |= ARM_LPAE_TCR_TG0_4K;
> +		tcr->tg = ARM_LPAE_TCR_TG0_4K;
>  		break;
>  	case SZ_16K:
> -		reg |= ARM_LPAE_TCR_TG0_16K;
> +		tcr->tg = ARM_LPAE_TCR_TG0_16K;
>  		break;
>  	case SZ_64K:
> -		reg |= ARM_LPAE_TCR_TG0_64K;
> +		tcr->tg = ARM_LPAE_TCR_TG0_64K;
>  		break;
>  	}
>  
>  	switch (cfg->oas) {
>  	case 32:
> -		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_32_BIT;
>  		break;
>  	case 36:
> -		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_36_BIT;
>  		break;
>  	case 40:
> -		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_40_BIT;
>  		break;
>  	case 42:
> -		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_42_BIT;
>  		break;
>  	case 44:
> -		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_44_BIT;
>  		break;
>  	case 48:
> -		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_48_BIT;
>  		break;
>  	case 52:
> -		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_52_BIT;
>  		break;
>  	default:
>  		goto out_free_data;
>  	}
>  
> -	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
> -
> -	/* Disable speculative walks through TTBR1 */
> -	reg |= ARM_LPAE_TCR_EPD1;
> -	cfg->arm_lpae_s1_cfg.tcr = reg;
> +	tcr->tsz = 64ULL - cfg->ias;
>  
>  	/* MAIRs */
>  	reg = (ARM_LPAE_MAIR_ATTR_NC
> @@ -910,7 +899,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>  	}
>  
>  	/* VTCR */
> -	reg = ARM_64_LPAE_S2_TCR_RES1 |
> +	reg = ARM_64_LPAE_VTCR_RES1 |
>  	     (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
>  	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
>  	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> @@ -919,45 +908,45 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>  
>  	switch (ARM_LPAE_GRANULE(data)) {
>  	case SZ_4K:
> -		reg |= ARM_LPAE_TCR_TG0_4K;
> +		reg |= (ARM_LPAE_TCR_TG0_4K << ARM_LPAE_VTCR_TG0_SHIFT);
>  		sl++; /* SL0 format is different for 4K granule size */
>  		break;
>  	case SZ_16K:
> -		reg |= ARM_LPAE_TCR_TG0_16K;
> +		reg |= (ARM_LPAE_TCR_TG0_16K << ARM_LPAE_VTCR_TG0_SHIFT);
>  		break;
>  	case SZ_64K:
> -		reg |= ARM_LPAE_TCR_TG0_64K;
> +		reg |= (ARM_LPAE_TCR_TG0_64K << ARM_LPAE_VTCR_TG0_SHIFT);
>  		break;
>  	}
>  
>  	switch (cfg->oas) {
>  	case 32:
> -		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 36:
> -		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 40:
> -		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 42:
> -		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 44:
> -		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 48:
> -		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 52:
> -		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	default:
>  		goto out_free_data;
>  	}
>  
>  	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
> -	reg |= (~sl & ARM_LPAE_TCR_SL0_MASK) << ARM_LPAE_TCR_SL0_SHIFT;
> +	reg |= (~sl & ARM_LPAE_VTCR_SL0_MASK) << ARM_LPAE_VTCR_SL0_SHIFT;
>  	cfg->arm_lpae_s2_cfg.vtcr = reg;
>  
>  	/* Allocate pgd pages */
> @@ -981,19 +970,12 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>  static struct io_pgtable *
>  arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  {
> -	struct io_pgtable *iop;
> -
>  	if (cfg->ias > 32 || cfg->oas > 40)
>  		return NULL;
>  
>  	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
> -	iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
> -	if (iop) {
> -		cfg->arm_lpae_s1_cfg.tcr |= ARM_32_LPAE_TCR_EAE;
> -		cfg->arm_lpae_s1_cfg.tcr &= 0xffffffff;
> -	}
>  
> -	return iop;
> +	return arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
>  }
>  
>  static struct io_pgtable *
> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index ced53e5b72b5..94394c81468f 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -63,7 +63,7 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops)
>  	if (!ops)
>  		return;
>  
> -	iop = container_of(ops, struct io_pgtable, ops);
> +	iop = io_pgtable_ops_to_pgtable(ops);
>  	io_pgtable_tlb_flush_all(iop);
>  	io_pgtable_init_table[iop->fmt]->free(iop);
>  }
> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index 9a57eb6c253c..059be7e21030 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
>  				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
>  				FIELD_PREP(TTBRn_ASID, ctx->asid));
> -		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
> -				FIELD_PREP(TTBRn_ASID, ctx->asid));
> +		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);
>  
>  		/* TCR */
>  		iommu_writel(ctx, ARM_SMMU_CB_TCR2,
> -				(pgtbl_cfg.arm_lpae_s1_cfg.tcr >> 32) |
> -				FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM));
> +				arm_smmu_lpae_tcr2(&pgtbl_cfg));
>  		iommu_writel(ctx, ARM_SMMU_CB_TCR,
> -				pgtbl_cfg.arm_lpae_s1_cfg.tcr);
> +				arm_smmu_lpae_tcr(&pgtbl_cfg) | TCR_EAE);
>  
>  		/* MAIRs (stage-1 only) */
>  		iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR0,
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 53bca5343f52..6ae104cedfd7 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -101,7 +101,14 @@ struct io_pgtable_cfg {
>  	union {
>  		struct {
>  			u64	ttbr;
> -			u64	tcr;
> +			struct {
> +				u32	ips:3;
> +				u32	tg:2;
> +				u32	sh:2;
> +				u32	orgn:2;
> +				u32	irgn:2;
> +				u32	tsz:6;
> +			}	tcr;
>  			u64	mair;
>  		} arm_lpae_s1_cfg;
>  
> -- 
> 2.21.0.dirty
> 
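
For reference, the two lookups in the quoted stage-1 allocator (OAS to IPS
encoding, IAS to T0SZ) reduce to very little code. What follows is a
standalone sketch rather than the kernel implementation: the helper names
oas_to_ips() and ias_to_tsz() are made up for illustration, and the encodings
past the 32-bit and 36-bit values visible in the patch are taken from the Arm
architecture's TCR.IPS field definition.

```c
#include <stdint.h>

/*
 * Hypothetical helper: map an output address size (in bits) to the 3-bit
 * TCR.IPS encoding. Returns -1 for unsupported sizes, which corresponds to
 * the "goto out_free_data" default case in the quoted patch.
 */
static int oas_to_ips(unsigned int oas)
{
	switch (oas) {
	case 32: return 0;
	case 36: return 1;
	case 40: return 2;
	case 42: return 3;
	case 44: return 4;
	case 48: return 5;
	case 52: return 6;
	default: return -1;
	}
}

/* Hypothetical helper: T0SZ counts the unused upper input address bits. */
static uint32_t ias_to_tsz(unsigned int ias)
{
	return 64u - ias;
}
```

With ias = 48 and oas = 48 this yields tsz = 16 and ips = 5, both of which fit
the 6-bit and 3-bit bitfields in the new struct.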

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
From: Jordan Crouse @ 2019-11-22 22:03 UTC
  To: Robin Murphy; +Cc: iommu, will, linux-arm-kernel

On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:
> Although it's conceptually nice for the io_pgtable_cfg to provide a
> standard VMSA TCR value, the reality is that no VMSA-compliant IOMMU
> looks exactly like an Arm CPU, and they all have various other TCR
> controls which io-pgtable can't be expected to understand. Thus since
> there is an expectation that drivers will have to add to the given TCR
> value anyway, let's strip it down to just the essentials that are
> directly relevant to io-pgtable's inner workings - namely the various
> sizes and the walk attributes.

Tested-by: Jordan Crouse <jcrouse@codeaurora.org>
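
To illustrate the description above, here is a minimal userspace sketch of how
a driver might turn the stripped-down fields into SMMUv2-style TCR and TCR2
values while adding its own bits (EPD1, SEP). The bit positions are copied
from the arm-smmu.h hunk below; the struct and function names (lpae_s1_tcr,
pack_tcr, pack_tcr2) are hypothetical stand-ins, not the kernel's.

```c
#include <stdint.h>

/* Same fields and widths as the bitfield struct added to io_pgtable_cfg */
struct lpae_s1_tcr {
	unsigned int ips:3;
	unsigned int tg:2;
	unsigned int sh:2;
	unsigned int orgn:2;
	unsigned int irgn:2;
	unsigned int tsz:6;
};

/* Driver-owned bit: disable walks through TTBR1 (TCR bit 23) */
#define SKETCH_TCR_EPD1		(1u << 23)
/* Driver-owned field: TCR2.SEP, bits 17:15, set to "upstream" (0x7) */
#define SKETCH_TCR2_SEP_UPSTREAM	(0x7u << 15)

/* TG0 at 15:14, SH0 at 13:12, ORGN0 at 11:10, IRGN0 at 9:8, T0SZ at 5:0 */
static uint32_t pack_tcr(const struct lpae_s1_tcr *t)
{
	return SKETCH_TCR_EPD1 |
	       ((uint32_t)t->tg << 14) |
	       ((uint32_t)t->sh << 12) |
	       ((uint32_t)t->orgn << 10) |
	       ((uint32_t)t->irgn << 8) |
	       (uint32_t)t->tsz;
}

/* PASIZE occupies TCR2 bits 3:0 and takes the IPS encoding directly */
static uint32_t pack_tcr2(const struct lpae_s1_tcr *t)
{
	return (uint32_t)t->ips | SKETCH_TCR2_SEP_UPSTREAM;
}
```

The point of the split is visible here: io-pgtable only supplies the values it
actually knows about, and everything register-specific stays in the driver.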

> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c        | 41 +++----------
>  drivers/iommu/arm-smmu.c           |  7 ++-
>  drivers/iommu/arm-smmu.h           | 27 ++++++++
>  drivers/iommu/io-pgtable-arm-v7s.c |  6 +-
>  drivers/iommu/io-pgtable-arm.c     | 98 ++++++++++++------------------
>  drivers/iommu/io-pgtable.c         |  2 +-
>  drivers/iommu/qcom_iommu.c         |  8 +--
>  include/linux/io-pgtable.h         |  9 ++-
>  8 files changed, 94 insertions(+), 104 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index da31e607698f..ca72cd777955 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -261,27 +261,18 @@
>  /* Context descriptor (stage-1 only) */
>  #define CTXDESC_CD_DWORDS		8
>  #define CTXDESC_CD_0_TCR_T0SZ		GENMASK_ULL(5, 0)
> -#define ARM64_TCR_T0SZ			GENMASK_ULL(5, 0)
>  #define CTXDESC_CD_0_TCR_TG0		GENMASK_ULL(7, 6)
> -#define ARM64_TCR_TG0			GENMASK_ULL(15, 14)
>  #define CTXDESC_CD_0_TCR_IRGN0		GENMASK_ULL(9, 8)
> -#define ARM64_TCR_IRGN0			GENMASK_ULL(9, 8)
>  #define CTXDESC_CD_0_TCR_ORGN0		GENMASK_ULL(11, 10)
> -#define ARM64_TCR_ORGN0			GENMASK_ULL(11, 10)
>  #define CTXDESC_CD_0_TCR_SH0		GENMASK_ULL(13, 12)
> -#define ARM64_TCR_SH0			GENMASK_ULL(13, 12)
>  #define CTXDESC_CD_0_TCR_EPD0		(1ULL << 14)
> -#define ARM64_TCR_EPD0			(1ULL << 7)
>  #define CTXDESC_CD_0_TCR_EPD1		(1ULL << 30)
> -#define ARM64_TCR_EPD1			(1ULL << 23)
>  
>  #define CTXDESC_CD_0_ENDI		(1UL << 15)
>  #define CTXDESC_CD_0_V			(1UL << 31)
>  
>  #define CTXDESC_CD_0_TCR_IPS		GENMASK_ULL(34, 32)
> -#define ARM64_TCR_IPS			GENMASK_ULL(34, 32)
>  #define CTXDESC_CD_0_TCR_TBI0		(1ULL << 38)
> -#define ARM64_TCR_TBI0			(1ULL << 37)
>  
>  #define CTXDESC_CD_0_AA64		(1UL << 41)
>  #define CTXDESC_CD_0_S			(1UL << 44)
> @@ -292,10 +283,6 @@
>  
>  #define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
>  
> -/* Convert between AArch64 (CPU) TCR format and SMMU CD format */
> -#define ARM_SMMU_TCR2CD(tcr, fld)	FIELD_PREP(CTXDESC_CD_0_TCR_##fld, \
> -					FIELD_GET(ARM64_TCR_##fld, tcr))
> -
>  /* Command queue */
>  #define CMDQ_ENT_SZ_SHIFT		4
>  #define CMDQ_ENT_DWORDS			((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
> @@ -1443,23 +1430,6 @@ static int arm_smmu_cmdq_issue_sync(struct arm_smmu_device *smmu)
>  }
>  
>  /* Context descriptor manipulation functions */
> -static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
> -{
> -	u64 val = 0;
> -
> -	/* Repack the TCR. Just care about TTBR0 for now */
> -	val |= ARM_SMMU_TCR2CD(tcr, T0SZ);
> -	val |= ARM_SMMU_TCR2CD(tcr, TG0);
> -	val |= ARM_SMMU_TCR2CD(tcr, IRGN0);
> -	val |= ARM_SMMU_TCR2CD(tcr, ORGN0);
> -	val |= ARM_SMMU_TCR2CD(tcr, SH0);
> -	val |= ARM_SMMU_TCR2CD(tcr, EPD0);
> -	val |= ARM_SMMU_TCR2CD(tcr, EPD1);
> -	val |= ARM_SMMU_TCR2CD(tcr, IPS);
> -
> -	return val;
> -}
> -
>  static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
>  				    struct arm_smmu_s1_cfg *cfg)
>  {
> @@ -1469,7 +1439,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
>  	 * We don't need to issue any invalidation here, as we'll invalidate
>  	 * the STE when installing the new entry anyway.
>  	 */
> -	val = arm_smmu_cpu_tcr_to_cd(cfg->cd.tcr) |
> +	val = cfg->cd.tcr |
>  #ifdef __BIG_ENDIAN
>  	      CTXDESC_CD_0_ENDI |
>  #endif
> @@ -2155,6 +2125,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  	int asid;
>  	struct arm_smmu_device *smmu = smmu_domain->smmu;
>  	struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>  
>  	asid = arm_smmu_bitmap_alloc(smmu->asid_map, smmu->asid_bits);
>  	if (asid < 0)
> @@ -2171,7 +2142,13 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  
>  	cfg->cd.asid	= (u16)asid;
>  	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> -	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> +	cfg->cd.tcr	= FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) |
> +			  FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) |
> +			  FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) |
> +			  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
> +			  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
> +			  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
> +			  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
>  	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
>  	return 0;
>  
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index a249e4e49ead..ade323ab0484 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -521,11 +521,12 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
>  		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
>  			cb->tcr[0] = pgtbl_cfg->arm_v7s_cfg.tcr;
>  		} else {
> -			cb->tcr[0] = pgtbl_cfg->arm_lpae_s1_cfg.tcr;
> -			cb->tcr[1] = pgtbl_cfg->arm_lpae_s1_cfg.tcr >> 32;
> -			cb->tcr[1] |= FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
> +			cb->tcr[0] = arm_smmu_lpae_tcr(pgtbl_cfg);
> +			cb->tcr[1] = arm_smmu_lpae_tcr2(pgtbl_cfg);
>  			if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH64)
>  				cb->tcr[1] |= TCR2_AS;
> +			else
> +				cb->tcr[0] |= TCR_EAE;
>  		}
>  	} else {
>  		cb->tcr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index 409716410b0d..98db074281ac 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -158,12 +158,24 @@ enum arm_smmu_cbar_type {
>  #define TCR2_SEP			GENMASK(17, 15)
>  #define TCR2_SEP_UPSTREAM		0x7
>  #define TCR2_AS				BIT(4)
> +#define TCR2_PASIZE			GENMASK(3, 0)
>  
>  #define ARM_SMMU_CB_TTBR0		0x20
>  #define ARM_SMMU_CB_TTBR1		0x28
>  #define TTBRn_ASID			GENMASK_ULL(63, 48)
>  
> +/* arm64 headers leak this somehow :( */
> +#undef TCR_T0SZ
> +
>  #define ARM_SMMU_CB_TCR			0x30
> +#define TCR_EAE				BIT(31)
> +#define TCR_EPD1			BIT(23)
> +#define TCR_TG0				GENMASK(15, 14)
> +#define TCR_SH0				GENMASK(13, 12)
> +#define TCR_ORGN0			GENMASK(11, 10)
> +#define TCR_IRGN0			GENMASK(9, 8)
> +#define TCR_T0SZ			GENMASK(5, 0)
> +
>  #define ARM_SMMU_CB_CONTEXTIDR		0x34
>  #define ARM_SMMU_CB_S1_MAIR0		0x38
>  #define ARM_SMMU_CB_S1_MAIR1		0x3c
> @@ -318,6 +330,21 @@ struct arm_smmu_domain {
>  	struct iommu_domain		domain;
>  };
>  
> +static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
> +{
> +	return TCR_EPD1 |
> +	       FIELD_PREP(TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
> +	       FIELD_PREP(TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
> +	       FIELD_PREP(TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
> +	       FIELD_PREP(TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
> +	       FIELD_PREP(TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
> +}
> +
> +static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
> +{
> +	return FIELD_PREP(TCR2_PASIZE, cfg->arm_lpae_s1_cfg.tcr.ips) |
> +	       FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM);
> +}
>  
>  /* Implementation details, yay! */
>  struct arm_smmu_impl {
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 4d2c1e7f67c4..d8e4562ce478 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -149,8 +149,6 @@
>  #define ARM_V7S_TTBR_IRGN_ATTR(attr)					\
>  	((((attr) & 0x1) << 6) | (((attr) & 0x2) >> 1))
>  
> -#define ARM_V7S_TCR_PD1			BIT(5)
> -
>  #ifdef CONFIG_ZONE_DMA32
>  #define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
>  #define ARM_V7S_TABLE_SLAB_FLAGS SLAB_CACHE_DMA32
> @@ -798,8 +796,8 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>  	 */
>  	cfg->pgsize_bitmap &= SZ_4K | SZ_64K | SZ_1M | SZ_16M;
>  
> -	/* TCR: T0SZ=0, disable TTBR1 */
> -	cfg->arm_v7s_cfg.tcr = ARM_V7S_TCR_PD1;
> +	/* TCR: T0SZ=0, EAE=0 (if applicable) */
> +	cfg->arm_v7s_cfg.tcr = 0;
>  
>  	/*
>  	 * TEX remap: the indices used map to the closest equivalent types
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index bc0841040ebe..9b1912ede000 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -100,40 +100,32 @@
>  #define ARM_LPAE_PTE_MEMATTR_DEV	(((arm_lpae_iopte)0x1) << 2)
>  
>  /* Register bits */
> -#define ARM_32_LPAE_TCR_EAE		(1 << 31)
> -#define ARM_64_LPAE_S2_TCR_RES1		(1 << 31)
> +#define ARM_64_LPAE_VTCR_RES1		(1 << 31)
>  
> -#define ARM_LPAE_TCR_EPD1		(1 << 23)
> -
> -#define ARM_LPAE_TCR_TG0_4K		(0 << 14)
> -#define ARM_LPAE_TCR_TG0_64K		(1 << 14)
> -#define ARM_LPAE_TCR_TG0_16K		(2 << 14)
> +#define ARM_LPAE_VTCR_TG0_SHIFT		14
> +#define ARM_LPAE_TCR_TG0_4K		0
> +#define ARM_LPAE_TCR_TG0_64K		1
> +#define ARM_LPAE_TCR_TG0_16K		2
>  
>  #define ARM_LPAE_TCR_SH0_SHIFT		12
> -#define ARM_LPAE_TCR_SH0_MASK		0x3
>  #define ARM_LPAE_TCR_SH_NS		0
>  #define ARM_LPAE_TCR_SH_OS		2
>  #define ARM_LPAE_TCR_SH_IS		3
>  
>  #define ARM_LPAE_TCR_ORGN0_SHIFT	10
>  #define ARM_LPAE_TCR_IRGN0_SHIFT	8
> -#define ARM_LPAE_TCR_RGN_MASK		0x3
>  #define ARM_LPAE_TCR_RGN_NC		0
>  #define ARM_LPAE_TCR_RGN_WBWA		1
>  #define ARM_LPAE_TCR_RGN_WT		2
>  #define ARM_LPAE_TCR_RGN_WB		3
>  
> -#define ARM_LPAE_TCR_SL0_SHIFT		6
> -#define ARM_LPAE_TCR_SL0_MASK		0x3
> +#define ARM_LPAE_VTCR_SL0_SHIFT		6
> +#define ARM_LPAE_VTCR_SL0_MASK		0x3
>  
>  #define ARM_LPAE_TCR_T0SZ_SHIFT		0
> -#define ARM_LPAE_TCR_SZ_MASK		0xf
>  
> -#define ARM_LPAE_TCR_PS_SHIFT		16
> -#define ARM_LPAE_TCR_PS_MASK		0x7
> -
> -#define ARM_LPAE_TCR_IPS_SHIFT		32
> -#define ARM_LPAE_TCR_IPS_MASK		0x7
> +#define ARM_LPAE_VTCR_PS_SHIFT		16
> +#define ARM_LPAE_VTCR_PS_MASK		0x7
>  
>  #define ARM_LPAE_TCR_PS_32_BIT		0x0ULL
>  #define ARM_LPAE_TCR_PS_36_BIT		0x1ULL
> @@ -787,6 +779,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  {
>  	u64 reg;
>  	struct arm_lpae_io_pgtable *data;
> +	typeof(&cfg->arm_lpae_s1_cfg.tcr) tcr = &cfg->arm_lpae_s1_cfg.tcr;
>  
>  	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
>  			    IO_PGTABLE_QUIRK_NON_STRICT))
> @@ -798,58 +791,54 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  
>  	/* TCR */
>  	if (cfg->coherent_walk) {
> -		reg = (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
> -		      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
> -		      (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> +		tcr->sh = ARM_LPAE_TCR_SH_IS;
> +		tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
> +		tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
>  	} else {
> -		reg = (ARM_LPAE_TCR_SH_OS << ARM_LPAE_TCR_SH0_SHIFT) |
> -		      (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_IRGN0_SHIFT) |
> -		      (ARM_LPAE_TCR_RGN_NC << ARM_LPAE_TCR_ORGN0_SHIFT);
> +		tcr->sh = ARM_LPAE_TCR_SH_OS;
> +		tcr->irgn = ARM_LPAE_TCR_RGN_NC;
> +		tcr->orgn = ARM_LPAE_TCR_RGN_NC;
>  	}
>  
>  	switch (ARM_LPAE_GRANULE(data)) {
>  	case SZ_4K:
> -		reg |= ARM_LPAE_TCR_TG0_4K;
> +		tcr->tg = ARM_LPAE_TCR_TG0_4K;
>  		break;
>  	case SZ_16K:
> -		reg |= ARM_LPAE_TCR_TG0_16K;
> +		tcr->tg = ARM_LPAE_TCR_TG0_16K;
>  		break;
>  	case SZ_64K:
> -		reg |= ARM_LPAE_TCR_TG0_64K;
> +		tcr->tg = ARM_LPAE_TCR_TG0_64K;
>  		break;
>  	}
>  
>  	switch (cfg->oas) {
>  	case 32:
> -		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_32_BIT;
>  		break;
>  	case 36:
> -		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_36_BIT;
>  		break;
>  	case 40:
> -		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_40_BIT;
>  		break;
>  	case 42:
> -		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_42_BIT;
>  		break;
>  	case 44:
> -		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_44_BIT;
>  		break;
>  	case 48:
> -		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_48_BIT;
>  		break;
>  	case 52:
> -		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_IPS_SHIFT);
> +		tcr->ips = ARM_LPAE_TCR_PS_52_BIT;
>  		break;
>  	default:
>  		goto out_free_data;
>  	}
>  
> -	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
> -
> -	/* Disable speculative walks through TTBR1 */
> -	reg |= ARM_LPAE_TCR_EPD1;
> -	cfg->arm_lpae_s1_cfg.tcr = reg;
> +	tcr->tsz = 64ULL - cfg->ias;
>  
>  	/* MAIRs */
>  	reg = (ARM_LPAE_MAIR_ATTR_NC
> @@ -910,7 +899,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>  	}
>  
>  	/* VTCR */
> -	reg = ARM_64_LPAE_S2_TCR_RES1 |
> +	reg = ARM_64_LPAE_VTCR_RES1 |
>  	     (ARM_LPAE_TCR_SH_IS << ARM_LPAE_TCR_SH0_SHIFT) |
>  	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_IRGN0_SHIFT) |
>  	     (ARM_LPAE_TCR_RGN_WBWA << ARM_LPAE_TCR_ORGN0_SHIFT);
> @@ -919,45 +908,45 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>  
>  	switch (ARM_LPAE_GRANULE(data)) {
>  	case SZ_4K:
> -		reg |= ARM_LPAE_TCR_TG0_4K;
> +		reg |= (ARM_LPAE_TCR_TG0_4K << ARM_LPAE_VTCR_TG0_SHIFT);
>  		sl++; /* SL0 format is different for 4K granule size */
>  		break;
>  	case SZ_16K:
> -		reg |= ARM_LPAE_TCR_TG0_16K;
> +		reg |= (ARM_LPAE_TCR_TG0_16K << ARM_LPAE_VTCR_TG0_SHIFT);
>  		break;
>  	case SZ_64K:
> -		reg |= ARM_LPAE_TCR_TG0_64K;
> +		reg |= (ARM_LPAE_TCR_TG0_64K << ARM_LPAE_VTCR_TG0_SHIFT);
>  		break;
>  	}
>  
>  	switch (cfg->oas) {
>  	case 32:
> -		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_32_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 36:
> -		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_36_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 40:
> -		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_40_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 42:
> -		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_42_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 44:
> -		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_44_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 48:
> -		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	case 52:
> -		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_PS_SHIFT);
> +		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_VTCR_PS_SHIFT);
>  		break;
>  	default:
>  		goto out_free_data;
>  	}
>  
>  	reg |= (64ULL - cfg->ias) << ARM_LPAE_TCR_T0SZ_SHIFT;
> -	reg |= (~sl & ARM_LPAE_TCR_SL0_MASK) << ARM_LPAE_TCR_SL0_SHIFT;
> +	reg |= (~sl & ARM_LPAE_VTCR_SL0_MASK) << ARM_LPAE_VTCR_SL0_SHIFT;
>  	cfg->arm_lpae_s2_cfg.vtcr = reg;
>  
>  	/* Allocate pgd pages */
> @@ -981,19 +970,12 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
>  static struct io_pgtable *
>  arm_32_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  {
> -	struct io_pgtable *iop;
> -
>  	if (cfg->ias > 32 || cfg->oas > 40)
>  		return NULL;
>  
>  	cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
> -	iop = arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
> -	if (iop) {
> -		cfg->arm_lpae_s1_cfg.tcr |= ARM_32_LPAE_TCR_EAE;
> -		cfg->arm_lpae_s1_cfg.tcr &= 0xffffffff;
> -	}
>  
> -	return iop;
> +	return arm_64_lpae_alloc_pgtable_s1(cfg, cookie);
>  }
>  
>  static struct io_pgtable *
> diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c
> index ced53e5b72b5..94394c81468f 100644
> --- a/drivers/iommu/io-pgtable.c
> +++ b/drivers/iommu/io-pgtable.c
> @@ -63,7 +63,7 @@ void free_io_pgtable_ops(struct io_pgtable_ops *ops)
>  	if (!ops)
>  		return;
>  
> -	iop = container_of(ops, struct io_pgtable, ops);
> +	iop = io_pgtable_ops_to_pgtable(ops);
>  	io_pgtable_tlb_flush_all(iop);
>  	io_pgtable_init_table[iop->fmt]->free(iop);
>  }
> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index 9a57eb6c253c..059be7e21030 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
>  				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
>  				FIELD_PREP(TTBRn_ASID, ctx->asid));
> -		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
> -				FIELD_PREP(TTBRn_ASID, ctx->asid));
> +		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);
>  
>  		/* TCR */
>  		iommu_writel(ctx, ARM_SMMU_CB_TCR2,
> -				(pgtbl_cfg.arm_lpae_s1_cfg.tcr >> 32) |
> -				FIELD_PREP(TCR2_SEP, TCR2_SEP_UPSTREAM));
> +				arm_smmu_lpae_tcr2(&pgtbl_cfg));
>  		iommu_writel(ctx, ARM_SMMU_CB_TCR,
> -				pgtbl_cfg.arm_lpae_s1_cfg.tcr);
> +				arm_smmu_lpae_tcr(&pgtbl_cfg) | TCR_EAE);
>  
>  		/* MAIRs (stage-1 only) */
>  		iommu_writel(ctx, ARM_SMMU_CB_S1_MAIR0,
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 53bca5343f52..6ae104cedfd7 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -101,7 +101,14 @@ struct io_pgtable_cfg {
>  	union {
>  		struct {
>  			u64	ttbr;
> -			u64	tcr;
> +			struct {
> +				u32	ips:3;
> +				u32	tg:2;
> +				u32	sh:2;
> +				u32	orgn:2;
> +				u32	irgn:2;
> +				u32	tsz:6;
> +			}	tcr;
>  			u64	mair;
>  		} arm_lpae_s1_cfg;
>  
> -- 
> 2.21.0.dirty
> 
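
As a cross-check of the SMMUv3 side, the same abstract fields land at
different positions in the context descriptor (TG0 sits at bits 7:6 there
rather than 15:14). The sketch below is a userspace approximation under
stated assumptions: genmask64() and field_prep64() are simplified stand-ins
for the kernel's GENMASK_ULL() and FIELD_PREP(), and pack_cd0() is a made-up
name. Only the field positions come from the arm-smmu-v3.c hunk above.

```c
#include <stdint.h>

/* Stand-in for GENMASK_ULL(h, l): bits h..l set, inclusive */
static uint64_t genmask64(unsigned int h, unsigned int l)
{
	return (~0ULL << l) & (~0ULL >> (63 - h));
}

/* Stand-in for FIELD_PREP(): shift val to the mask's lowest set bit */
static uint64_t field_prep64(uint64_t mask, uint64_t val)
{
	uint64_t lowest = mask & (~mask + 1);

	return (val * lowest) & mask;
}

/* Hypothetical helper building the TCR-related part of CD word 0 */
static uint64_t pack_cd0(unsigned int tsz, unsigned int tg, unsigned int irgn,
			 unsigned int orgn, unsigned int sh, unsigned int ips)
{
	return field_prep64(genmask64(5, 0), tsz) |	/* T0SZ */
	       field_prep64(genmask64(7, 6), tg) |	/* TG0 */
	       field_prep64(genmask64(9, 8), irgn) |	/* IRGN0 */
	       field_prep64(genmask64(11, 10), orgn) |	/* ORGN0 */
	       field_prep64(genmask64(13, 12), sh) |	/* SH0 */
	       field_prep64(genmask64(34, 32), ips) |	/* IPS */
	       (1ULL << 30) |				/* EPD1 */
	       (1ULL << 41);				/* AA64 */
}
```

Since each driver packs from the abstract fields itself, no intermediate
"CPU TCR" layout has to be unpacked first, which is what lets the patch drop
arm_smmu_cpu_tcr_to_cd().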

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling
From: Jordan Crouse @ 2019-11-22 22:40 UTC
  To: Robin Murphy; +Cc: iommu, will, linux-arm-kernel

On Fri, Oct 25, 2019 at 07:08:37PM +0100, Robin Murphy wrote:
> TTBR1 values have so far been redundant since no users implement any
> support for split address spaces. Crucially, though, one of the main
> reasons for wanting to do so is to be able to manage each half entirely
> independently, e.g. context-switching one set of mappings without
> disturbing the other. Thus it seems unlikely that tying two tables
> together in a single io_pgtable_cfg would ever be particularly desirable
> or useful.
> 
> Streamline the configs to just a single conceptual TTBR value
> representing the allocated table. This paves the way for future users to
> support split address spaces by simply allocating a table and dealing
> with the detailed TTBRn logistics themselves.

Tested-by: Jordan Crouse <jcrouse@codeaurora.org>
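
A small sketch of what "dealing with the TTBRn logistics themselves" can look
like on the driver side, modelled on the arm-smmu.c hunk below: io-pgtable
hands back one table address, and the driver decides what goes into each
TTBR. The struct and function names (cb_ttbrs, make_ttbrs) are hypothetical;
the ASID position (bits 63:48) matches TTBRn_ASID in arm-smmu.h.

```c
#include <stdint.h>

#define SKETCH_TTBRN_ASID_SHIFT	48

/* Hypothetical container for the two context bank TTBR values */
struct cb_ttbrs {
	uint64_t ttbr0;
	uint64_t ttbr1;
};

/*
 * Build both TTBR values from the single table address that io-pgtable now
 * reports. TTBR1 carries no table, only the ASID, as in the patched
 * arm_smmu_init_context_bank().
 */
static struct cb_ttbrs make_ttbrs(uint64_t table_pa, uint16_t asid)
{
	uint64_t asid_field = (uint64_t)asid << SKETCH_TTBRN_ASID_SHIFT;
	struct cb_ttbrs r;

	r.ttbr0 = table_pa | asid_field;
	r.ttbr1 = asid_field;
	return r;
}
```

A driver that later wants split address spaces would allocate a second table
with its own io_pgtable_cfg and place that address in ttbr1 instead.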

> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c        |  2 +-
>  drivers/iommu/arm-smmu.c           |  9 ++++-----
>  drivers/iommu/io-pgtable-arm-v7s.c | 16 +++++++---------
>  drivers/iommu/io-pgtable-arm.c     |  5 ++---
>  drivers/iommu/ipmmu-vmsa.c         |  2 +-
>  drivers/iommu/msm_iommu.c          |  4 ++--
>  drivers/iommu/mtk_iommu.c          |  4 ++--
>  drivers/iommu/qcom_iommu.c         |  3 +--
>  include/linux/io-pgtable.h         |  4 ++--
>  9 files changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 3f20e548f1ec..da31e607698f 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2170,7 +2170,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  	}
>  
>  	cfg->cd.asid	= (u16)asid;
> -	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> +	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
>  	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>  	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
>  	return 0;
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 2bc3e93b11e6..a249e4e49ead 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -534,13 +534,12 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
>  	/* TTBRs */
>  	if (stage1) {
>  		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
> -			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr[0];
> -			cb->ttbr[1] = pgtbl_cfg->arm_v7s_cfg.ttbr[1];
> +			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
> +			cb->ttbr[1] = 0;
>  		} else {
> -			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> +			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
>  			cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
> -			cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[1];
> -			cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
> +			cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
>  		}
>  	} else {
>  		cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 7c3bd2c3cdca..4d2c1e7f67c4 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -822,15 +822,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>  	/* Ensure the empty pgd is visible before any actual TTBR write */
>  	wmb();
>  
> -	/* TTBRs */
> -	cfg->arm_v7s_cfg.ttbr[0] = virt_to_phys(data->pgd) |
> -				   ARM_V7S_TTBR_S | ARM_V7S_TTBR_NOS |
> -				   (cfg->coherent_walk ?
> -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
> -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
> -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
> -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
> -	cfg->arm_v7s_cfg.ttbr[1] = 0;
> +	/* TTBR */
> +	cfg->arm_v7s_cfg.ttbr = virt_to_phys(data->pgd) | ARM_V7S_TTBR_S |
> +				(cfg->coherent_walk ? (ARM_V7S_TTBR_NOS |
> +				  ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
> +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
> +				 (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
> +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
>  	return &data->iop;
>  
>  out_free_data:
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 1795df8f7a51..bc0841040ebe 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -872,9 +872,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  	/* Ensure the empty pgd is visible before any actual TTBR write */
>  	wmb();
>  
> -	/* TTBRs */
> -	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
> -	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
> +	/* TTBR */
> +	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(data->pgd);
>  	return &data->iop;
>  
>  out_free_data:
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index e4da6efbda49..4fe0ff3216ce 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -416,7 +416,7 @@ static void ipmmu_domain_setup_context(struct ipmmu_vmsa_domain *domain)
>  	u32 tmp;
>  
>  	/* TTBR0 */
> -	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
> +	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr;
>  	ipmmu_ctx_write_root(domain, IMTTLBR0, ttbr);
>  	ipmmu_ctx_write_root(domain, IMTTUBR0, ttbr >> 32);
>  
> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> index be99d408cf35..9ceec140fa67 100644
> --- a/drivers/iommu/msm_iommu.c
> +++ b/drivers/iommu/msm_iommu.c
> @@ -279,8 +279,8 @@ static void __program_context(void __iomem *base, int ctx,
>  	SET_V2PCFG(base, ctx, 0x3);
>  
>  	SET_TTBCR(base, ctx, priv->cfg.arm_v7s_cfg.tcr);
> -	SET_TTBR0(base, ctx, priv->cfg.arm_v7s_cfg.ttbr[0]);
> -	SET_TTBR1(base, ctx, priv->cfg.arm_v7s_cfg.ttbr[1]);
> +	SET_TTBR0(base, ctx, priv->cfg.arm_v7s_cfg.ttbr);
> +	SET_TTBR1(base, ctx, 0);
>  
>  	/* Set prrr and nmrr */
>  	SET_PRRR(base, ctx, priv->cfg.arm_v7s_cfg.prrr);
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 67a483c1a935..ef0b36eeb83d 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -392,7 +392,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
>  	/* Update the pgtable base address register of the M4U HW */
>  	if (!data->m4u_dom) {
>  		data->m4u_dom = dom;
> -		writel(dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
> +		writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
>  		       data->base + REG_MMU_PT_BASE_ADDR);
>  	}
>  
> @@ -797,7 +797,7 @@ static int __maybe_unused mtk_iommu_resume(struct device *dev)
>  	writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
>  	writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG);
>  	if (m4u_dom)
> -		writel(m4u_dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
> +		writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
>  		       base + REG_MMU_PT_BASE_ADDR);
>  	return 0;
>  }
> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index 66e9b40e9275..9a57eb6c253c 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -269,10 +269,9 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  
>  		/* TTBRs */
>  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
> -				pgtbl_cfg.arm_lpae_s1_cfg.ttbr[0] |
> +				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
>  				FIELD_PREP(TTBRn_ASID, ctx->asid));
>  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
> -				pgtbl_cfg.arm_lpae_s1_cfg.ttbr[1] |
>  				FIELD_PREP(TTBRn_ASID, ctx->asid));
>  
>  		/* TCR */
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index ee21eedafe98..53bca5343f52 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -100,7 +100,7 @@ struct io_pgtable_cfg {
>  	/* Low-level data specific to the table format */
>  	union {
>  		struct {
> -			u64	ttbr[2];
> +			u64	ttbr;
>  			u64	tcr;
>  			u64	mair;
>  		} arm_lpae_s1_cfg;
> @@ -111,7 +111,7 @@ struct io_pgtable_cfg {
>  		} arm_lpae_s2_cfg;
>  
>  		struct {
> -			u32	ttbr[2];
> +			u32	ttbr;
>  			u32	tcr;
>  			u32	nmrr;
>  			u32	prrr;
> -- 
> 2.21.0.dirty
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* Re: [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling
  2019-10-25 18:08   ` Robin Murphy
                     ` (2 preceding siblings ...)
  (?)
@ 2019-11-22 22:40   ` Jordan Crouse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jordan Crouse @ 2019-11-22 22:40 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, will, linux-arm-kernel

On Fri, Oct 25, 2019 at 07:08:37PM +0100, Robin Murphy wrote:
> TTBR1 values have so far been redundant since no users implement any
> support for split address spaces. Crucially, though, one of the main
> reasons for wanting to do so is to be able to manage each half entirely
> independently, e.g. context-switching one set of mappings without
> disturbing the other. Thus it seems unlikely that tying two tables
> together in a single io_pgtable_cfg would ever be particularly desirable
> or useful.
> 
> Streamline the configs to just a single conceptual TTBR value
> representing the allocated table. This paves the way for future users to
> support split address spaces by simply allocating a table and dealing
> with the detailed TTBRn logistics themselves.

Tested-by: Jordan Crouse <jcrouse@codeaurora.org>

> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c        |  2 +-
>  drivers/iommu/arm-smmu.c           |  9 ++++-----
>  drivers/iommu/io-pgtable-arm-v7s.c | 16 +++++++---------
>  drivers/iommu/io-pgtable-arm.c     |  5 ++---
>  drivers/iommu/ipmmu-vmsa.c         |  2 +-
>  drivers/iommu/msm_iommu.c          |  4 ++--
>  drivers/iommu/mtk_iommu.c          |  4 ++--
>  drivers/iommu/qcom_iommu.c         |  3 +--
>  include/linux/io-pgtable.h         |  4 ++--
>  9 files changed, 22 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 3f20e548f1ec..da31e607698f 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2170,7 +2170,7 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
>  	}
>  
>  	cfg->cd.asid	= (u16)asid;
> -	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> +	cfg->cd.ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
>  	cfg->cd.tcr	= pgtbl_cfg->arm_lpae_s1_cfg.tcr;
>  	cfg->cd.mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
>  	return 0;
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 2bc3e93b11e6..a249e4e49ead 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -534,13 +534,12 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
>  	/* TTBRs */
>  	if (stage1) {
>  		if (cfg->fmt == ARM_SMMU_CTX_FMT_AARCH32_S) {
> -			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr[0];
> -			cb->ttbr[1] = pgtbl_cfg->arm_v7s_cfg.ttbr[1];
> +			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
> +			cb->ttbr[1] = 0;
>  		} else {
> -			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[0];
> +			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
>  			cb->ttbr[0] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
> -			cb->ttbr[1] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr[1];
> -			cb->ttbr[1] |= FIELD_PREP(TTBRn_ASID, cfg->asid);
> +			cb->ttbr[1] = FIELD_PREP(TTBRn_ASID, cfg->asid);
>  		}
>  	} else {
>  		cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
> index 7c3bd2c3cdca..4d2c1e7f67c4 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -822,15 +822,13 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
>  	/* Ensure the empty pgd is visible before any actual TTBR write */
>  	wmb();
>  
> -	/* TTBRs */
> -	cfg->arm_v7s_cfg.ttbr[0] = virt_to_phys(data->pgd) |
> -				   ARM_V7S_TTBR_S | ARM_V7S_TTBR_NOS |
> -				   (cfg->coherent_walk ?
> -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
> -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
> -				   (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
> -				    ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
> -	cfg->arm_v7s_cfg.ttbr[1] = 0;
> +	/* TTBR */
> +	cfg->arm_v7s_cfg.ttbr = virt_to_phys(data->pgd) | ARM_V7S_TTBR_S |
> +				(cfg->coherent_walk ? (ARM_V7S_TTBR_NOS |
> +				  ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_WBWA) |
> +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_WBWA)) :
> +				 (ARM_V7S_TTBR_IRGN_ATTR(ARM_V7S_RGN_NC) |
> +				  ARM_V7S_TTBR_ORGN_ATTR(ARM_V7S_RGN_NC)));
>  	return &data->iop;
>  
>  out_free_data:
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 1795df8f7a51..bc0841040ebe 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -872,9 +872,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
>  	/* Ensure the empty pgd is visible before any actual TTBR write */
>  	wmb();
>  
> -	/* TTBRs */
> -	cfg->arm_lpae_s1_cfg.ttbr[0] = virt_to_phys(data->pgd);
> -	cfg->arm_lpae_s1_cfg.ttbr[1] = 0;
> +	/* TTBR */
> +	cfg->arm_lpae_s1_cfg.ttbr = virt_to_phys(data->pgd);
>  	return &data->iop;
>  
>  out_free_data:
> diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
> index e4da6efbda49..4fe0ff3216ce 100644
> --- a/drivers/iommu/ipmmu-vmsa.c
> +++ b/drivers/iommu/ipmmu-vmsa.c
> @@ -416,7 +416,7 @@ static void ipmmu_domain_setup_context(struct ipmmu_vmsa_domain *domain)
>  	u32 tmp;
>  
>  	/* TTBR0 */
> -	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr[0];
> +	ttbr = domain->cfg.arm_lpae_s1_cfg.ttbr;
>  	ipmmu_ctx_write_root(domain, IMTTLBR0, ttbr);
>  	ipmmu_ctx_write_root(domain, IMTTUBR0, ttbr >> 32);
>  
> diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
> index be99d408cf35..9ceec140fa67 100644
> --- a/drivers/iommu/msm_iommu.c
> +++ b/drivers/iommu/msm_iommu.c
> @@ -279,8 +279,8 @@ static void __program_context(void __iomem *base, int ctx,
>  	SET_V2PCFG(base, ctx, 0x3);
>  
>  	SET_TTBCR(base, ctx, priv->cfg.arm_v7s_cfg.tcr);
> -	SET_TTBR0(base, ctx, priv->cfg.arm_v7s_cfg.ttbr[0]);
> -	SET_TTBR1(base, ctx, priv->cfg.arm_v7s_cfg.ttbr[1]);
> +	SET_TTBR0(base, ctx, priv->cfg.arm_v7s_cfg.ttbr);
> +	SET_TTBR1(base, ctx, 0);
>  
>  	/* Set prrr and nmrr */
>  	SET_PRRR(base, ctx, priv->cfg.arm_v7s_cfg.prrr);
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 67a483c1a935..ef0b36eeb83d 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -392,7 +392,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
>  	/* Update the pgtable base address register of the M4U HW */
>  	if (!data->m4u_dom) {
>  		data->m4u_dom = dom;
> -		writel(dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
> +		writel(dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
>  		       data->base + REG_MMU_PT_BASE_ADDR);
>  	}
>  
> @@ -797,7 +797,7 @@ static int __maybe_unused mtk_iommu_resume(struct device *dev)
>  	writel_relaxed(reg->ivrp_paddr, base + REG_MMU_IVRP_PADDR);
>  	writel_relaxed(reg->vld_pa_rng, base + REG_MMU_VLD_PA_RNG);
>  	if (m4u_dom)
> -		writel(m4u_dom->cfg.arm_v7s_cfg.ttbr[0] & MMU_PT_ADDR_MASK,
> +		writel(m4u_dom->cfg.arm_v7s_cfg.ttbr & MMU_PT_ADDR_MASK,
>  		       base + REG_MMU_PT_BASE_ADDR);
>  	return 0;
>  }
> diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> index 66e9b40e9275..9a57eb6c253c 100644
> --- a/drivers/iommu/qcom_iommu.c
> +++ b/drivers/iommu/qcom_iommu.c
> @@ -269,10 +269,9 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
>  
>  		/* TTBRs */
>  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
> -				pgtbl_cfg.arm_lpae_s1_cfg.ttbr[0] |
> +				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
>  				FIELD_PREP(TTBRn_ASID, ctx->asid));
>  		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
> -				pgtbl_cfg.arm_lpae_s1_cfg.ttbr[1] |
>  				FIELD_PREP(TTBRn_ASID, ctx->asid));
>  
>  		/* TCR */
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index ee21eedafe98..53bca5343f52 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -100,7 +100,7 @@ struct io_pgtable_cfg {
>  	/* Low-level data specific to the table format */
>  	union {
>  		struct {
> -			u64	ttbr[2];
> +			u64	ttbr;
>  			u64	tcr;
>  			u64	mair;
>  		} arm_lpae_s1_cfg;
> @@ -111,7 +111,7 @@ struct io_pgtable_cfg {
>  		} arm_lpae_s2_cfg;
>  
>  		struct {
> -			u32	ttbr[2];
> +			u32	ttbr;
>  			u32	tcr;
>  			u32	nmrr;
>  			u32	prrr;
> -- 
> 2.21.0.dirty
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
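
Editorial sketch of the consumer side of the change quoted above: with a
single ttbr value in the config, the driver itself decides what goes into
TTBR0 and TTBR1, as the arm-smmu.c hunk does. This is a standalone userspace
illustration, not kernel code. The sketch_* names and struct layouts are
hypothetical, and the ASID position (bits [63:48]) is an assumption made here
to mirror TTBRn_ASID.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed field position: ASID in TTBR bits [63:48], mirroring TTBRn_ASID. */
#define SKETCH_TTBR_ASID_SHIFT 48

/* Hypothetical stand-in for the LPAE stage-1 part of struct io_pgtable_cfg. */
struct sketch_lpae_s1_cfg {
	uint64_t ttbr;	/* single value: address of the allocated table */
	uint64_t tcr;
	uint64_t mair;
};

/* Hypothetical stand-in for a driver's per-context-bank register cache. */
struct sketch_cb {
	uint64_t ttbr[2];
};

/* Derive both TTBRn register values from the one table the config describes. */
static void sketch_init_ttbrs(struct sketch_cb *cb,
			      const struct sketch_lpae_s1_cfg *cfg,
			      uint16_t asid)
{
	uint64_t asid_field = (uint64_t)asid << SKETCH_TTBR_ASID_SHIFT;

	/* The table address must not spill into the ASID field. */
	assert((cfg->ttbr >> SKETCH_TTBR_ASID_SHIFT) == 0);

	cb->ttbr[0] = cfg->ttbr | asid_field;	/* table address plus ASID */
	cb->ttbr[1] = asid_field;		/* no second table: ASID only */
}
```

A driver that wanted split address spaces would presumably allocate a second
table with its own config and fill in ttbr[1] from that, rather than relying
on a paired ttbr[2] array.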

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling
  2019-11-22 15:51           ` Robin Murphy
@ 2019-11-25  7:58             ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2019-11-25  7:58 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel

On Fri, Nov 22, 2019 at 03:51:26PM +0000, Robin Murphy wrote:
> On 20/11/2019 3:11 pm, Will Deacon wrote:
> > On Mon, Nov 04, 2019 at 04:27:56PM -0700, Jordan Crouse wrote:
> > > On Mon, Nov 04, 2019 at 07:14:45PM +0000, Will Deacon wrote:
> > > > On Fri, Oct 25, 2019 at 07:08:38PM +0100, Robin Murphy wrote:
> > > > > diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
> > > > > index 9a57eb6c253c..059be7e21030 100644
> > > > > --- a/drivers/iommu/qcom_iommu.c
> > > > > +++ b/drivers/iommu/qcom_iommu.c
> > > > > @@ -271,15 +271,13 @@ static int qcom_iommu_init_domain(struct iommu_domain *domain,
> > > > >   		iommu_writeq(ctx, ARM_SMMU_CB_TTBR0,
> > > > >   				pgtbl_cfg.arm_lpae_s1_cfg.ttbr |
> > > > >   				FIELD_PREP(TTBRn_ASID, ctx->asid));
> > > > > -		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1,
> > > > > -				FIELD_PREP(TTBRn_ASID, ctx->asid));
> > > > > +		iommu_writeq(ctx, ARM_SMMU_CB_TTBR1, 0);
> > > > 
> > > > Are you sure it's safe to drop the ASID here? Just want to make sure there
> > > > wasn't some "quirk" this was helping with.
> > > 
> > > I was reminded of this recently. Some of our SMMU guys told me that a 0x0 in
> > > TTBR1 could cause a S2 fault if a faulty transaction caused a ttbr1 lookup so
> > > the "quirk" was writing the ASID so the register wasn't zero. I'm not sure if
> > > this is a vendor specific blip or not.
> > 
> > You should be able to set EPD1 to prevent walks via TTBR1 in that case,
> > though. Sticking the ASID in there is still dodgy if EPD1 is clear and
> > TTBR1 points at junk (or even physical address 0x0).
> > 
> > That's probably something which should be folded into this patch.
> 
> Note that EPD1 was being set by io-pgtable-arm before this patch, and
> remains set by virtue of arm_smmu_lpae_tcr() afterwards, so presumably the
> brokenness might run a bit deeper than that. Either way, though, I'm
> somewhat dubious since the ASID could well be 0 anyway :/

Ah, I missed that the qcom driver was calling arm_smmu_lpae_tcr() with
your patches. In which case, everything should be fine, no?

Will
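
Editorial sketch of the two points made in this exchange: walks via TTBR1 are
gated by TCR.EPD1, and writing only the ASID into TTBR1 does not guarantee a
non-zero register, because the ASID itself may be 0. This is a standalone
userspace illustration. The sketch_* names are hypothetical, and the bit
positions (EPD1 at bit 23, ASID at bits [63:48]) are assumptions stated here
rather than values taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed positions, for illustration only. */
#define SKETCH_TCR_EPD1		(UINT64_C(1) << 23)
#define SKETCH_TTBR_ASID_SHIFT	48

/* True if the TCR value disables table walks through TTBR1. */
static bool sketch_ttbr1_walks_disabled(uint64_t tcr)
{
	return (tcr & SKETCH_TCR_EPD1) != 0;
}

/* The old "quirk" value: TTBR1 carrying nothing but the ASID. */
static uint64_t sketch_ttbr1_asid_only(uint16_t asid)
{
	return (uint64_t)asid << SKETCH_TTBR_ASID_SHIFT;
}
```

With EPD1 set, the contents of TTBR1 should not matter for walks, which is the
basis of the "everything should be fine" conclusion above. With ASID 0 the
ASID-only value is 0x0 anyway, which is why the workaround was considered
dubious.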

* Re: [PATCH v2 00/10] iommu/io-pgtable: Cleanup and prep for split tables
  2019-11-04 20:20     ` Will Deacon
@ 2020-01-10 15:09       ` Will Deacon
  -1 siblings, 0 replies; 69+ messages in thread
From: Will Deacon @ 2020-01-10 15:09 UTC (permalink / raw)
  To: Robin Murphy; +Cc: iommu, linux-arm-kernel

On Mon, Nov 04, 2019 at 08:20:12PM +0000, Will Deacon wrote:
> On Mon, Nov 04, 2019 at 07:22:28PM +0000, Will Deacon wrote:
> > On Fri, Oct 25, 2019 at 07:08:29PM +0100, Robin Murphy wrote:
> > > Since the flawed first attempt, I've reworked things with an abstracted
> > > TCR and an explicit TTBR1 quirk. I originally envisaged the need to pass
> > > the quirk all the way down to the TLBI calls, hence getting diverted
> > > into trying to make the parameter passing less cluttered in general, but
> > > in the end it turned out fairly neat to just fix the indexing such that
> > > we can always just pass around the original unmodified IOVA. Most of the
> > > new patches come from staring at that indexing code for long enough to
> > > see the subtle inefficiencies that were worth ironing out, plus a bit of
> > > random cleanup which doesn't feel worth posting separately.
> > > 
> > > Note that these patches depend on the fixes already queued in -rc4,
> > > otherwise there will be conflicts in arm_mali_lpae_alloc_pgtable().
> > > 
> > > Robin.
> > > 
> > > 
> > > Robin Murphy (10):
> > >   iommu/io-pgtable: Make selftest gubbins consistently __init
> > >   iommu/io-pgtable-arm: Rationalise size check
> > >   iommu/io-pgtable-arm: Simplify bounds checks
> > >   iommu/io-pgtable-arm: Simplify start level lookup
> > >   iommu/io-pgtable-arm: Simplify PGD size handling
> > >   iommu/io-pgtable-arm: Simplify level indexing
> > >   iommu/io-pgtable-arm: Rationalise MAIR handling
> > >   iommu/io-pgtable-arm: Rationalise TTBRn handling
> > >   iommu/io-pgtable-arm: Rationalise TCR handling
> > >   iommu/io-pgtable-arm: Prepare for TTBR1 usage
> > 
> > Overall, this looks really good to me. There's a bit more work to do
> > (see my comments) and I'd like Jordan to have a look as well, but on the
> > whole it's a big improvement. Thanks.
> 
> Also, I've merged the first 7 patches to save you having to repost those:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=for-joerg/arm-smmu/updates

I've now picked up the remaining three patches, but I'll post them to the
list shortly because I've ended up trying to address my own review comments
as I'd like this stuff in before we go ahead with Jordan's patches.

Will

Thread overview: 69+ messages
2019-10-25 18:08 [PATCH v2 00/10] iommu/io-pgtable: Cleanup and prep for split tables Robin Murphy
2019-10-25 18:08 ` Robin Murphy
2019-10-25 18:08 ` [PATCH v2 01/10] iommu/io-pgtable: Make selftest gubbins consistently __init Robin Murphy
2019-10-25 18:08   ` Robin Murphy
2019-10-25 18:08 ` [PATCH v2 02/10] iommu/io-pgtable-arm: Rationalise size check Robin Murphy
2019-10-25 18:08   ` Robin Murphy
2019-10-25 18:08 ` [PATCH v2 03/10] iommu/io-pgtable-arm: Simplify bounds checks Robin Murphy
2019-10-25 18:08   ` Robin Murphy
2019-10-25 18:08 ` [PATCH v2 04/10] iommu/io-pgtable-arm: Simplify start level lookup Robin Murphy
2019-10-25 18:08   ` Robin Murphy
2019-10-25 18:08 ` [PATCH v2 05/10] iommu/io-pgtable-arm: Simplify PGD size handling Robin Murphy
2019-10-25 18:08   ` Robin Murphy
2019-10-25 18:08 ` [PATCH v2 06/10] iommu/io-pgtable-arm: Simplify level indexing Robin Murphy
2019-10-25 18:08   ` Robin Murphy
2019-11-04 18:17   ` Will Deacon
2019-11-04 18:17     ` Will Deacon
2019-11-04 18:36     ` Robin Murphy
2019-11-04 18:36       ` Robin Murphy
2019-11-04 19:20       ` Will Deacon
2019-11-04 19:20         ` Will Deacon
2019-10-25 18:08 ` [PATCH v2 07/10] iommu/io-pgtable-arm: Rationalise MAIR handling Robin Murphy
2019-10-25 18:08   ` Robin Murphy
2019-11-04 18:20   ` Will Deacon
2019-11-04 18:20     ` Will Deacon
2019-11-04 18:43     ` Robin Murphy
2019-11-04 18:43       ` Robin Murphy
2019-11-04 19:20       ` Will Deacon
2019-11-04 19:20         ` Will Deacon
2019-11-04 19:57         ` Will Deacon
2019-11-04 19:57           ` Will Deacon
2019-10-25 18:08 ` [PATCH v2 08/10] iommu/io-pgtable-arm: Rationalise TTBRn handling Robin Murphy
2019-10-25 18:08   ` Robin Murphy
2019-10-28 15:09   ` Steven Price
2019-10-28 15:09     ` Steven Price
2019-10-28 18:51     ` Robin Murphy
2019-10-28 18:51       ` Robin Murphy
2019-11-04 18:36       ` Will Deacon
2019-11-04 18:36         ` Will Deacon
2019-11-04 19:12         ` Robin Murphy
2019-11-04 19:12           ` Robin Murphy
2019-11-22 22:40   ` Jordan Crouse
2019-11-22 22:40   ` Jordan Crouse
2019-10-25 18:08 ` [PATCH v2 09/10] iommu/io-pgtable-arm: Rationalise TCR handling Robin Murphy
2019-10-25 18:08   ` Robin Murphy
2019-11-04 19:14   ` Will Deacon
2019-11-04 19:14     ` Will Deacon
2019-11-04 23:27     ` Jordan Crouse
2019-11-04 23:27       ` Jordan Crouse
2019-11-20 15:11       ` Will Deacon
2019-11-22 15:51         ` Robin Murphy
2019-11-22 15:51           ` Robin Murphy
2019-11-25  7:58           ` Will Deacon
2019-11-25  7:58             ` Will Deacon
2019-11-22 22:03   ` Jordan Crouse
2019-11-22 22:03   ` Jordan Crouse
2019-10-25 18:08 ` [PATCH v2 10/10] iommu/io-pgtable-arm: Prepare for TTBR1 usage Robin Murphy
2019-10-25 18:08   ` Robin Murphy
2019-11-04 23:40   ` Jordan Crouse
2019-11-04 23:40     ` Jordan Crouse
2019-11-20 19:18     ` Will Deacon
2019-11-20 19:18       ` Will Deacon
2019-11-22 22:03   ` Jordan Crouse
2019-11-22 22:03   ` Jordan Crouse
2019-11-04 19:22 ` [PATCH v2 00/10] iommu/io-pgtable: Cleanup and prep for split tables Will Deacon
2019-11-04 19:22   ` Will Deacon
2019-11-04 20:20   ` Will Deacon
2019-11-04 20:20     ` Will Deacon
2020-01-10 15:09     ` Will Deacon
2020-01-10 15:09       ` Will Deacon
