* [PATCH v2 0/4] SMMU 52-bit address support
From: Robin Murphy @ 2017-12-14 16:58 UTC
  To: linux-arm-kernel

Hi all,

Here's a (hopefully final) update of 52-bit SMMU support. The only
material changes from v1[1] are a small cosmetic tweak to patch #1,
and correction of the silly error in patch #2 as reported by Nate in
testing.

Robin.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-November/025073.html

Robin Murphy (4):
  iommu/arm-smmu-v3: Clean up address masking
  iommu/io-pgtable-arm: Support 52-bit physical address
  iommu/arm-smmu-v3: Support 52-bit physical address
  iommu/arm-smmu-v3: Support 52-bit virtual address

 drivers/iommu/arm-smmu-v3.c    | 76 ++++++++++++++++++++++--------------------
 drivers/iommu/io-pgtable-arm.c | 65 ++++++++++++++++++++++++++----------
 2 files changed, 86 insertions(+), 55 deletions(-)

-- 
2.13.4.dirty

* [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking
From: Robin Murphy @ 2017-12-14 16:58 UTC
  To: linux-arm-kernel

Before trying to add the SMMUv3.1 support for 52-bit addresses, make
things bearable by cleaning up the various address mask definitions to
use GENMASK_ULL() consistently. The fact that doing so reveals (and
fixes) a latent off-by-one in Q_BASE_ADDR_MASK only goes to show what a
jolly good idea it is...

Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2: Clean up one more now-unnecessary linewrap
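
For illustration only (not part of the patch): a self-contained sketch of the
old shift-and-mask pattern next to the GENMASK_ULL() form, using a stand-in
GENMASK_ULL() definition and made-up *_OLD/*_NEW names. It also shows the
Q_BASE_ADDR_MASK off-by-one: the old 44-bit mask, shifted up by 5, covers bits
[48:5] instead of the intended [47:5].

#include <stdint.h>
#include <stdio.h>

/* Stand-in for the kernel's GENMASK_ULL(); bits h..l inclusive */
#define GENMASK_ULL(h, l)	((~0ULL << (l)) & (~0ULL >> (63 - (h))))

#define Q_BASE_ADDR_SHIFT	5
#define Q_BASE_ADDR_MASK_OLD	0xfffffffffffULL	/* 44 bits set, as in the old driver */
#define Q_BASE_ADDR_MASK_NEW	GENMASK_ULL(47, 5)	/* already shifted into place */

int main(void)
{
	uint64_t base_dma = 0x0001234567890040ULL;	/* arbitrary example with bit 48 set */

	/* Old style: the mask must be shifted at every use site */
	uint64_t masked_old = base_dma & (Q_BASE_ADDR_MASK_OLD << Q_BASE_ADDR_SHIFT);
	/* New style: the mask already describes bits [47:5] */
	uint64_t masked_new = base_dma & Q_BASE_ADDR_MASK_NEW;

	printf("old keeps bits [48:5]: %#018llx\n", (unsigned long long)masked_old);
	printf("new keeps bits [47:5]: %#018llx\n", (unsigned long long)masked_new);
	return 0;
}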

 drivers/iommu/arm-smmu-v3.c | 53 ++++++++++++++++++---------------------------
 1 file changed, 21 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f122071688fd..52cad776b31b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -22,6 +22,7 @@
 
 #include <linux/acpi.h>
 #include <linux/acpi_iort.h>
+#include <linux/bitops.h>
 #include <linux/delay.h>
 #include <linux/dma-iommu.h>
 #include <linux/err.h>
@@ -158,8 +159,7 @@
 
 #define ARM_SMMU_STRTAB_BASE		0x80
 #define STRTAB_BASE_RA			(1UL << 62)
-#define STRTAB_BASE_ADDR_SHIFT		6
-#define STRTAB_BASE_ADDR_MASK		0x3ffffffffffUL
+#define STRTAB_BASE_ADDR_MASK		GENMASK_ULL(47, 6)
 
 #define ARM_SMMU_STRTAB_BASE_CFG	0x88
 #define STRTAB_BASE_CFG_LOG2SIZE_SHIFT	0
@@ -190,8 +190,7 @@
 #define ARM_SMMU_PRIQ_IRQ_CFG2		0xdc
 
 /* Common MSI config fields */
-#define MSI_CFG0_ADDR_SHIFT		2
-#define MSI_CFG0_ADDR_MASK		0x3fffffffffffUL
+#define MSI_CFG0_ADDR_MASK		GENMASK_ULL(47, 2)
 #define MSI_CFG2_SH_SHIFT		4
 #define MSI_CFG2_SH_NSH			(0UL << MSI_CFG2_SH_SHIFT)
 #define MSI_CFG2_SH_OSH			(2UL << MSI_CFG2_SH_SHIFT)
@@ -207,8 +206,7 @@
 					 Q_IDX(q, p) * (q)->ent_dwords)
 
 #define Q_BASE_RWA			(1UL << 62)
-#define Q_BASE_ADDR_SHIFT		5
-#define Q_BASE_ADDR_MASK		0xfffffffffffUL
+#define Q_BASE_ADDR_MASK		GENMASK_ULL(47, 5)
 #define Q_BASE_LOG2SIZE_SHIFT		0
 #define Q_BASE_LOG2SIZE_MASK		0x1fUL
 
@@ -225,8 +223,7 @@
 #define STRTAB_L1_DESC_DWORDS		1
 #define STRTAB_L1_DESC_SPAN_SHIFT	0
 #define STRTAB_L1_DESC_SPAN_MASK	0x1fUL
-#define STRTAB_L1_DESC_L2PTR_SHIFT	6
-#define STRTAB_L1_DESC_L2PTR_MASK	0x3ffffffffffUL
+#define STRTAB_L1_DESC_L2PTR_MASK	GENMASK_ULL(47, 6)
 
 #define STRTAB_STE_DWORDS		8
 #define STRTAB_STE_0_V			(1UL << 0)
@@ -239,8 +236,7 @@
 
 #define STRTAB_STE_0_S1FMT_SHIFT	4
 #define STRTAB_STE_0_S1FMT_LINEAR	(0UL << STRTAB_STE_0_S1FMT_SHIFT)
-#define STRTAB_STE_0_S1CTXPTR_SHIFT	6
-#define STRTAB_STE_0_S1CTXPTR_MASK	0x3ffffffffffUL
+#define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(47, 6)
 #define STRTAB_STE_0_S1CDMAX_SHIFT	59
 #define STRTAB_STE_0_S1CDMAX_MASK	0x1fUL
 
@@ -278,8 +274,7 @@
 #define STRTAB_STE_2_S2PTW		(1UL << 54)
 #define STRTAB_STE_2_S2R		(1UL << 58)
 
-#define STRTAB_STE_3_S2TTB_SHIFT	4
-#define STRTAB_STE_3_S2TTB_MASK		0xfffffffffffUL
+#define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(47, 4)
 
 /* Context descriptor (stage-1 only) */
 #define CTXDESC_CD_DWORDS		8
@@ -325,8 +320,7 @@
 #define CTXDESC_CD_0_ASID_SHIFT		48
 #define CTXDESC_CD_0_ASID_MASK		0xffffUL
 
-#define CTXDESC_CD_1_TTB0_SHIFT		4
-#define CTXDESC_CD_1_TTB0_MASK		0xfffffffffffUL
+#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(47, 4)
 
 #define CTXDESC_CD_3_MAIR_SHIFT		0
 
@@ -351,7 +345,7 @@
 
 #define CMDQ_PREFETCH_0_SID_SHIFT	32
 #define CMDQ_PREFETCH_1_SIZE_SHIFT	0
-#define CMDQ_PREFETCH_1_ADDR_MASK	~0xfffUL
+#define CMDQ_PREFETCH_1_ADDR_MASK	GENMASK_ULL(63, 12)
 
 #define CMDQ_CFGI_0_SID_SHIFT		32
 #define CMDQ_CFGI_0_SID_MASK		0xffffffffUL
@@ -362,8 +356,8 @@
 #define CMDQ_TLBI_0_VMID_SHIFT		32
 #define CMDQ_TLBI_0_ASID_SHIFT		48
 #define CMDQ_TLBI_1_LEAF		(1UL << 0)
-#define CMDQ_TLBI_1_VA_MASK		~0xfffUL
-#define CMDQ_TLBI_1_IPA_MASK		0xfffffffff000UL
+#define CMDQ_TLBI_1_VA_MASK		GENMASK_ULL(63, 12)
+#define CMDQ_TLBI_1_IPA_MASK		GENMASK_ULL(47, 12)
 
 #define CMDQ_PRI_0_SSID_SHIFT		12
 #define CMDQ_PRI_0_SSID_MASK		0xfffffUL
@@ -386,8 +380,7 @@
 #define CMDQ_SYNC_0_MSIATTR_OIWB	(0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT)
 #define CMDQ_SYNC_0_MSIDATA_SHIFT	32
 #define CMDQ_SYNC_0_MSIDATA_MASK	0xffffffffUL
-#define CMDQ_SYNC_1_MSIADDR_SHIFT	0
-#define CMDQ_SYNC_1_MSIADDR_MASK	0xffffffffffffcUL
+#define CMDQ_SYNC_1_MSIADDR_MASK	GENMASK_ULL(47, 2)
 
 /* Event queue */
 #define EVTQ_ENT_DWORDS			4
@@ -413,8 +406,7 @@
 
 #define PRIQ_1_PRG_IDX_SHIFT		0
 #define PRIQ_1_PRG_IDX_MASK		0x1ffUL
-#define PRIQ_1_ADDR_SHIFT		12
-#define PRIQ_1_ADDR_MASK		0xfffffffffffffUL
+#define PRIQ_1_ADDR_MASK		GENMASK_ULL(63, 12)
 
 /* High-level queue structures */
 #define ARM_SMMU_POLL_TIMEOUT_US	100
@@ -1093,7 +1085,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
 
 	cfg->cdptr[0] = cpu_to_le64(val);
 
-	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
+	val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK;
 	cfg->cdptr[1] = cpu_to_le64(val);
 
 	cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair << CTXDESC_CD_3_MAIR_SHIFT);
@@ -1107,8 +1099,7 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
 
 	val |= (desc->span & STRTAB_L1_DESC_SPAN_MASK)
 		<< STRTAB_L1_DESC_SPAN_SHIFT;
-	val |= desc->l2ptr_dma &
-	       STRTAB_L1_DESC_L2PTR_MASK << STRTAB_L1_DESC_L2PTR_SHIFT;
+	val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
 
 	*dst = cpu_to_le64(val);
 }
@@ -1214,8 +1205,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 		   !(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
 			dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
-		val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK
-		        << STRTAB_STE_0_S1CTXPTR_SHIFT) |
+		val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
 			STRTAB_STE_0_CFG_S1_TRANS;
 	}
 
@@ -1232,7 +1222,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
 			 STRTAB_STE_2_S2R);
 
 		dst[3] = cpu_to_le64(ste->s2_cfg->vttbr &
-			 STRTAB_STE_3_S2TTB_MASK << STRTAB_STE_3_S2TTB_SHIFT);
+			 STRTAB_STE_3_S2TTB_MASK);
 
 		val |= STRTAB_STE_0_CFG_S2_TRANS;
 	}
@@ -1337,7 +1327,7 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
 		 evt[0] & PRIQ_0_PERM_READ ? "R" : "",
 		 evt[0] & PRIQ_0_PERM_WRITE ? "W" : "",
 		 evt[0] & PRIQ_0_PERM_EXEC ? "X" : "",
-		 evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT);
+		 evt[1] & PRIQ_1_ADDR_MASK);
 
 	if (last) {
 		struct arm_smmu_cmdq_ent cmd = {
@@ -2093,7 +2083,7 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
 	q->ent_dwords	= dwords;
 
 	q->q_base  = Q_BASE_RWA;
-	q->q_base |= q->base_dma & Q_BASE_ADDR_MASK << Q_BASE_ADDR_SHIFT;
+	q->q_base |= q->base_dma & Q_BASE_ADDR_MASK;
 	q->q_base |= (q->max_n_shift & Q_BASE_LOG2SIZE_MASK)
 		     << Q_BASE_LOG2SIZE_SHIFT;
 
@@ -2230,8 +2220,7 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
 		return ret;
 
 	/* Set the strtab base address */
-	reg  = smmu->strtab_cfg.strtab_dma &
-	       STRTAB_BASE_ADDR_MASK << STRTAB_BASE_ADDR_SHIFT;
+	reg  = smmu->strtab_cfg.strtab_dma & STRTAB_BASE_ADDR_MASK;
 	reg |= STRTAB_BASE_RA;
 	smmu->strtab_cfg.strtab_base = reg;
 
@@ -2294,7 +2283,7 @@ static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
 	phys_addr_t *cfg = arm_smmu_msi_cfg[desc->platform.msi_index];
 
 	doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
-	doorbell &= MSI_CFG0_ADDR_MASK << MSI_CFG0_ADDR_SHIFT;
+	doorbell &= MSI_CFG0_ADDR_MASK;
 
 	writeq_relaxed(doorbell, smmu->base + cfg[0]);
 	writel_relaxed(msg->data, smmu->base + cfg[1]);
-- 
2.13.4.dirty

* [PATCH v2 2/4] iommu/io-pgtable-arm: Support 52-bit physical address
From: Robin Murphy @ 2017-12-14 16:58 UTC
  To: linux-arm-kernel

Bring io-pgtable-arm in line with the ARMv8.2-LPA feature allowing
52-bit physical addresses when using the 64KB translation granule.
This will be supported by SMMUv3.1.

Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2: Fix TCR_PS/TCR_IPS copy-paste error
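
For reference only (not part of the patch): a standalone sketch of how the new
paddr_to_iopte()/iopte_to_paddr() helpers fold physical address bits [51:48]
into PTE bits [15:12] and back. The example address and the simplified helpers
below are illustrative; with a 64KB granule the low 16 bits of a block or page
address are zero, so the two bit ranges never collide.

#include <stdint.h>
#include <stdio.h>

#define ADDR_MASK	0x0000fffffffff000ULL	/* PTE output address field, bits [47:12] */

/* paddr -> iopte: PA[51:48] >> 36 lands in PTE[15:12] */
static uint64_t fold(uint64_t paddr)
{
	return (paddr | (paddr >> 36)) & ADDR_MASK;
}

/* iopte -> paddr: PTE[15:12] << 36 restores PA[51:48] */
static uint64_t unfold(uint64_t pte, uint64_t granule)
{
	uint64_t paddr = pte & ADDR_MASK;
	uint64_t hi = paddr & (granule - 1);	/* bits below the granule boundary */

	/* hi's bits are a subset of paddr's, so the XOR just clears them */
	return (paddr ^ hi) | (hi << 36);
}

int main(void)
{
	uint64_t pa = 0x000f123456780000ULL;	/* made-up 52-bit, 64KB-aligned PA */
	uint64_t pte = fold(pa);

	printf("pa      = %#018llx\n", (unsigned long long)pa);
	printf("pte     = %#018llx\n", (unsigned long long)pte);
	printf("decoded = %#018llx\n", (unsigned long long)unfold(pte, 0x10000));
	return 0;
}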

 drivers/iommu/io-pgtable-arm.c | 65 ++++++++++++++++++++++++++++++------------
 1 file changed, 47 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 51e5c43caed1..10d888b02948 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -21,6 +21,7 @@
 #define pr_fmt(fmt)	"arm-lpae io-pgtable: " fmt
 
 #include <linux/atomic.h>
+#include <linux/bitops.h>
 #include <linux/iommu.h>
 #include <linux/kernel.h>
 #include <linux/sizes.h>
@@ -32,7 +33,7 @@
 
 #include "io-pgtable.h"
 
-#define ARM_LPAE_MAX_ADDR_BITS		48
+#define ARM_LPAE_MAX_ADDR_BITS		52
 #define ARM_LPAE_S2_MAX_CONCAT_PAGES	16
 #define ARM_LPAE_MAX_LEVELS		4
 
@@ -86,6 +87,8 @@
 #define ARM_LPAE_PTE_TYPE_TABLE		3
 #define ARM_LPAE_PTE_TYPE_PAGE		3
 
+#define ARM_LPAE_PTE_ADDR_MASK		GENMASK_ULL(47,12)
+
 #define ARM_LPAE_PTE_NSTABLE		(((arm_lpae_iopte)1) << 63)
 #define ARM_LPAE_PTE_XN			(((arm_lpae_iopte)3) << 53)
 #define ARM_LPAE_PTE_AF			(((arm_lpae_iopte)1) << 10)
@@ -159,6 +162,7 @@
 #define ARM_LPAE_TCR_PS_42_BIT		0x3ULL
 #define ARM_LPAE_TCR_PS_44_BIT		0x4ULL
 #define ARM_LPAE_TCR_PS_48_BIT		0x5ULL
+#define ARM_LPAE_TCR_PS_52_BIT		0x6ULL
 
 #define ARM_LPAE_MAIR_ATTR_SHIFT(n)	((n) << 3)
 #define ARM_LPAE_MAIR_ATTR_MASK		0xff
@@ -170,9 +174,7 @@
 #define ARM_LPAE_MAIR_ATTR_IDX_DEV	2
 
 /* IOPTE accessors */
-#define iopte_deref(pte,d)					\
-	(__va((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)	\
-	& ~(ARM_LPAE_GRANULE(d) - 1ULL)))
+#define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
 
 #define iopte_type(pte,l)					\
 	(((pte) >> ARM_LPAE_PTE_TYPE_SHIFT) & ARM_LPAE_PTE_TYPE_MASK)
@@ -184,12 +186,6 @@
 		(iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE) :	\
 		(iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK))
 
-#define iopte_to_pfn(pte,d)					\
-	(((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)) >> (d)->pg_shift)
-
-#define pfn_to_iopte(pfn,d)					\
-	(((pfn) << (d)->pg_shift) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1))
-
 struct arm_lpae_io_pgtable {
 	struct io_pgtable	iop;
 
@@ -203,6 +199,25 @@ struct arm_lpae_io_pgtable {
 
 typedef u64 arm_lpae_iopte;
 
+static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
+				     struct arm_lpae_io_pgtable *data)
+{
+	arm_lpae_iopte pte = paddr;
+
+	/* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
+	return (pte | (pte >> 36)) & ARM_LPAE_PTE_ADDR_MASK;
+}
+
+static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
+				  struct arm_lpae_io_pgtable *data)
+{
+	phys_addr_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
+	phys_addr_t paddr_hi = paddr & (ARM_LPAE_GRANULE(data) - 1);
+
+	/* paddr_hi spans nothing for 4K granule, and only RES0 bits for 16K */
+	return (paddr ^ paddr_hi) | (paddr_hi << 36);
+}
+
 static bool selftest_running = false;
 
 static dma_addr_t __arm_lpae_dma_addr(void *pages)
@@ -287,7 +302,7 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
 		pte |= ARM_LPAE_PTE_TYPE_BLOCK;
 
 	pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS;
-	pte |= pfn_to_iopte(paddr >> data->pg_shift, data);
+	pte |= paddr_to_iopte(paddr, data);
 
 	__arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
 }
@@ -528,7 +543,7 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
 	if (size == split_sz)
 		unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
 
-	blk_paddr = iopte_to_pfn(blk_pte, data) << data->pg_shift;
+	blk_paddr = iopte_to_paddr(blk_pte, data);
 	pte = iopte_prot(blk_pte);
 
 	for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
@@ -652,12 +667,13 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
 
 found_translation:
 	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
-	return ((phys_addr_t)iopte_to_pfn(pte,data) << data->pg_shift) | iova;
+	return iopte_to_paddr(pte, data) | iova;
 }
 
 static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 {
-	unsigned long granule;
+	unsigned long granule, page_sizes;
+	unsigned int max_addr_bits = 48;
 
 	/*
 	 * We need to restrict the supported page sizes to match the
@@ -677,17 +693,24 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
 
 	switch (granule) {
 	case SZ_4K:
-		cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
+		page_sizes = (SZ_4K | SZ_2M | SZ_1G);
 		break;
 	case SZ_16K:
-		cfg->pgsize_bitmap &= (SZ_16K | SZ_32M);
+		page_sizes = (SZ_16K | SZ_32M);
 		break;
 	case SZ_64K:
-		cfg->pgsize_bitmap &= (SZ_64K | SZ_512M);
+		max_addr_bits = 52;
+		page_sizes = (SZ_64K | SZ_512M);
+		if (cfg->oas > 48)
+			page_sizes |= 1ULL << 42; /* 4TB */
 		break;
 	default:
-		cfg->pgsize_bitmap = 0;
+		page_sizes = 0;
 	}
+
+	cfg->pgsize_bitmap &= page_sizes;
+	cfg->ias = min(cfg->ias, max_addr_bits);
+	cfg->oas = min(cfg->oas, max_addr_bits);
 }
 
 static struct arm_lpae_io_pgtable *
@@ -784,6 +807,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	case 48:
 		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_IPS_SHIFT);
 		break;
+	case 52:
+		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+		break;
 	default:
 		goto out_free_data;
 	}
@@ -891,6 +917,9 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
 	case 48:
 		reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_PS_SHIFT);
 		break;
+	case 52:
+		reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_PS_SHIFT);
+		break;
 	default:
 		goto out_free_data;
 	}
-- 
2.13.4.dirty

* [PATCH v2 3/4] iommu/arm-smmu-v3: Support 52-bit physical address
From: Robin Murphy @ 2017-12-14 16:58 UTC
  To: linux-arm-kernel

Implement SMMUv3.1 support for 52-bit physical addresses. Since a 52-bit
OAS implies 64KB translation granule support, permitting level 1 block
entries there is simple, and the rest is just extending address fields.

Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2: No change
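
As an illustrative sanity check (not part of the patch) on why the extra page
size added below is 1ULL << 42: with a 64KB granule each table level resolves
13 bits (8192 descriptors of 8 bytes each), so a level 1 block maps
16 + 13 + 13 = 42 bits of address, i.e. 4TB.

#include <stdio.h>

int main(void)
{
	unsigned int pg_shift = 16;			/* 64KB granule */
	unsigned int bits_per_level = pg_shift - 3;	/* 64KB / 8-byte descriptors = 2^13 */
	unsigned int lvl1_shift = pg_shift + 2 * bits_per_level;

	printf("level 1 block covers 2^%u bytes\n", lvl1_shift);	/* 2^42 = 4TB */
	return 0;
}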

 drivers/iommu/arm-smmu-v3.c | 35 ++++++++++++++++++++---------------
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 52cad776b31b..c9c4e6132e27 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -101,6 +101,7 @@
 #define IDR5_OAS_42_BIT			(3 << IDR5_OAS_SHIFT)
 #define IDR5_OAS_44_BIT			(4 << IDR5_OAS_SHIFT)
 #define IDR5_OAS_48_BIT			(5 << IDR5_OAS_SHIFT)
+#define IDR5_OAS_52_BIT			(6 << IDR5_OAS_SHIFT)
 
 #define ARM_SMMU_CR0			0x20
 #define CR0_CMDQEN			(1 << 3)
@@ -159,7 +160,7 @@
 
 #define ARM_SMMU_STRTAB_BASE		0x80
 #define STRTAB_BASE_RA			(1UL << 62)
-#define STRTAB_BASE_ADDR_MASK		GENMASK_ULL(47, 6)
+#define STRTAB_BASE_ADDR_MASK		GENMASK_ULL(51, 6)
 
 #define ARM_SMMU_STRTAB_BASE_CFG	0x88
 #define STRTAB_BASE_CFG_LOG2SIZE_SHIFT	0
@@ -190,7 +191,7 @@
 #define ARM_SMMU_PRIQ_IRQ_CFG2		0xdc
 
 /* Common MSI config fields */
-#define MSI_CFG0_ADDR_MASK		GENMASK_ULL(47, 2)
+#define MSI_CFG0_ADDR_MASK		GENMASK_ULL(51, 2)
 #define MSI_CFG2_SH_SHIFT		4
 #define MSI_CFG2_SH_NSH			(0UL << MSI_CFG2_SH_SHIFT)
 #define MSI_CFG2_SH_OSH			(2UL << MSI_CFG2_SH_SHIFT)
@@ -206,7 +207,7 @@
 					 Q_IDX(q, p) * (q)->ent_dwords)
 
 #define Q_BASE_RWA			(1UL << 62)
-#define Q_BASE_ADDR_MASK		GENMASK_ULL(47, 5)
+#define Q_BASE_ADDR_MASK		GENMASK_ULL(51, 5)
 #define Q_BASE_LOG2SIZE_SHIFT		0
 #define Q_BASE_LOG2SIZE_MASK		0x1fUL
 
@@ -223,7 +224,7 @@
 #define STRTAB_L1_DESC_DWORDS		1
 #define STRTAB_L1_DESC_SPAN_SHIFT	0
 #define STRTAB_L1_DESC_SPAN_MASK	0x1fUL
-#define STRTAB_L1_DESC_L2PTR_MASK	GENMASK_ULL(47, 6)
+#define STRTAB_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 6)
 
 #define STRTAB_STE_DWORDS		8
 #define STRTAB_STE_0_V			(1UL << 0)
@@ -236,7 +237,7 @@
 
 #define STRTAB_STE_0_S1FMT_SHIFT	4
 #define STRTAB_STE_0_S1FMT_LINEAR	(0UL << STRTAB_STE_0_S1FMT_SHIFT)
-#define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(47, 6)
+#define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
 #define STRTAB_STE_0_S1CDMAX_SHIFT	59
 #define STRTAB_STE_0_S1CDMAX_MASK	0x1fUL
 
@@ -274,7 +275,7 @@
 #define STRTAB_STE_2_S2PTW		(1UL << 54)
 #define STRTAB_STE_2_S2R		(1UL << 58)
 
-#define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(47, 4)
+#define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(51, 4)
 
 /* Context descriptor (stage-1 only) */
 #define CTXDESC_CD_DWORDS		8
@@ -320,7 +321,7 @@
 #define CTXDESC_CD_0_ASID_SHIFT		48
 #define CTXDESC_CD_0_ASID_MASK		0xffffUL
 
-#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(47, 4)
+#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
 
 #define CTXDESC_CD_3_MAIR_SHIFT		0
 
@@ -357,7 +358,7 @@
 #define CMDQ_TLBI_0_ASID_SHIFT		48
 #define CMDQ_TLBI_1_LEAF		(1UL << 0)
 #define CMDQ_TLBI_1_VA_MASK		GENMASK_ULL(63, 12)
-#define CMDQ_TLBI_1_IPA_MASK		GENMASK_ULL(47, 12)
+#define CMDQ_TLBI_1_IPA_MASK		GENMASK_ULL(51, 12)
 
 #define CMDQ_PRI_0_SSID_SHIFT		12
 #define CMDQ_PRI_0_SSID_MASK		0xfffffUL
@@ -380,7 +381,7 @@
 #define CMDQ_SYNC_0_MSIATTR_OIWB	(0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT)
 #define CMDQ_SYNC_0_MSIDATA_SHIFT	32
 #define CMDQ_SYNC_0_MSIDATA_MASK	0xffffffffUL
-#define CMDQ_SYNC_1_MSIADDR_MASK	GENMASK_ULL(47, 2)
+#define CMDQ_SYNC_1_MSIADDR_MASK	GENMASK_ULL(51, 2)
 
 /* Event queue */
 #define EVTQ_ENT_DWORDS			4
@@ -1686,7 +1687,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 		return -ENOMEM;
 
 	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-	domain->geometry.aperture_end = (1UL << ias) - 1;
+	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
 	domain->geometry.force_aperture = true;
 	smmu_domain->pgtbl_ops = pgtbl_ops;
 
@@ -2693,11 +2694,6 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 	if (reg & IDR5_GRAN4K)
 		smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
 
-	if (arm_smmu_ops.pgsize_bitmap == -1UL)
-		arm_smmu_ops.pgsize_bitmap = smmu->pgsize_bitmap;
-	else
-		arm_smmu_ops.pgsize_bitmap |= smmu->pgsize_bitmap;
-
 	/* Output address size */
 	switch (reg & IDR5_OAS_MASK << IDR5_OAS_SHIFT) {
 	case IDR5_OAS_32_BIT:
@@ -2715,6 +2711,10 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 	case IDR5_OAS_44_BIT:
 		smmu->oas = 44;
 		break;
+	case IDR5_OAS_52_BIT:
+		smmu->oas = 52;
+		smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */
+		break;
 	default:
 		dev_info(smmu->dev,
 			"unknown output address size. Truncating to 48-bit\n");
@@ -2723,6 +2723,11 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 		smmu->oas = 48;
 	}
 
+	if (arm_smmu_ops.pgsize_bitmap == -1UL)
+		arm_smmu_ops.pgsize_bitmap = smmu->pgsize_bitmap;
+	else
+		arm_smmu_ops.pgsize_bitmap |= smmu->pgsize_bitmap;
+
 	/* Set the DMA mask for our table walker */
 	if (dma_set_mask_and_coherent(smmu->dev, DMA_BIT_MASK(smmu->oas)))
 		dev_warn(smmu->dev,
-- 
2.13.4.dirty

* [PATCH v2 4/4] iommu/arm-smmu-v3: Support 52-bit virtual address
From: Robin Murphy @ 2017-12-14 16:58 UTC
  To: linux-arm-kernel

Stage 1 input addresses are effectively 64-bit in SMMUv3 anyway, so
really all that's involved is letting io-pgtable know the appropriate
upper bound for T0SZ.

Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---

v2: No change
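
A minimal sketch (not from this diff; the helper name is hypothetical) of the
resulting input-address clamp: the stage 1 IAS is taken from VA_BITS but
limited to 48 unless IDR5.VAX advertises 52-bit virtual addressing, and
io-pgtable then derives T0SZ as 64 - IAS.

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper, for illustration only */
static unsigned int smmu_s1_ias(unsigned int va_bits, bool have_vax)
{
	unsigned int ias = va_bits;

	if (ias > 48 && !have_vax)
		ias = 48;	/* the SMMU cannot accept 52-bit input addresses */

	return ias;		/* io-pgtable then programs TCR.T0SZ = 64 - ias */
}

int main(void)
{
	printf("ias=%u t0sz=%u\n", smmu_s1_ias(52, false), 64 - smmu_s1_ias(52, false));
	printf("ias=%u t0sz=%u\n", smmu_s1_ias(52, true), 64 - smmu_s1_ias(52, true));
	return 0;
}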

 drivers/iommu/arm-smmu-v3.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c9c4e6132e27..7059a0805bd1 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -102,6 +102,7 @@
 #define IDR5_OAS_44_BIT			(4 << IDR5_OAS_SHIFT)
 #define IDR5_OAS_48_BIT			(5 << IDR5_OAS_SHIFT)
 #define IDR5_OAS_52_BIT			(6 << IDR5_OAS_SHIFT)
+#define IDR5_VAX			(1 << 10)
 
 #define ARM_SMMU_CR0			0x20
 #define CR0_CMDQEN			(1 << 3)
@@ -604,6 +605,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_STALLS		(1 << 11)
 #define ARM_SMMU_FEAT_HYP		(1 << 12)
 #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
+#define ARM_SMMU_FEAT_VAX		(1 << 14)
 	u32				features;
 
 #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
@@ -1656,6 +1658,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	switch (smmu_domain->stage) {
 	case ARM_SMMU_DOMAIN_S1:
 		ias = VA_BITS;
+		if (VA_BITS > 48 && !(smmu->features & ARM_SMMU_FEAT_VAX))
+			ias = 48;
 		oas = smmu->ias;
 		fmt = ARM_64_LPAE_S1;
 		finalise_stage_fn = arm_smmu_domain_finalise_s1;
@@ -2694,6 +2698,10 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
 	if (reg & IDR5_GRAN4K)
 		smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
 
+	/* Input address size */
+	if (reg & IDR5_VAX)
+		smmu->features |= ARM_SMMU_FEAT_VAX;
+
 	/* Output address size */
 	switch (reg & IDR5_OAS_MASK << IDR5_OAS_SHIFT) {
 	case IDR5_OAS_32_BIT:
-- 
2.13.4.dirty

* Re: [PATCH v2 0/4] SMMU 52-bit address support
From: Will Deacon @ 2018-02-26 18:04 UTC
  To: linux-arm-kernel

Hi Robin,

On Thu, Dec 14, 2017 at 04:58:49PM +0000, Robin Murphy wrote:
> Here's a (hopefully final) update of 52-bit SMMU support. The only
> material changes from v1[1] are a small cosmetic tweak to patch #1,
> and correction of the silly error in patch #2 as reported by Nate in
> testing.

This generally looks pretty good, but I have some review comments that I
think we should address before merging.

Cheers,

Will

* Re: [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking
From: Will Deacon @ 2018-02-26 18:04 UTC
  To: linux-arm-kernel

Hi Robin,

On Thu, Dec 14, 2017 at 04:58:50PM +0000, Robin Murphy wrote:
> Before trying to add the SMMUv3.1 support for 52-bit addresses, make
> things bearable by cleaning up the various address mask definitions to
> use GENMASK_ULL() consistently. The fact that doing so reveals (and
> fixes) a latent off-by-one in Q_BASE_ADDR_MASK only goes to show what a
> jolly good idea it is...
> 
> Tested-by: Nate Watterson <nwatters@codeaurora.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 
> v2: Clean up one more now-unnecessary linewrap
> 
>  drivers/iommu/arm-smmu-v3.c | 53 ++++++++++++++++++---------------------------
>  1 file changed, 21 insertions(+), 32 deletions(-)

Whilst I agree that using GENMASK is better, this patch does mean that the
driver is (more) inconsistent with its _MASK terminology in that you can't
generally tell whether a definition that ends in _MASK is shifted or not,
and this isn't even consistent for fields within the same register.

Should we be using GENMASK/BIT for all fields instead and removing all of
the _SHIFT definitions?

Will

* Re: [PATCH v2 2/4] iommu/io-pgtable-arm: Support 52-bit physical address
From: Will Deacon @ 2018-02-26 18:05 UTC
  To: linux-arm-kernel

On Thu, Dec 14, 2017 at 04:58:51PM +0000, Robin Murphy wrote:
> Bring io-pgtable-arm in line with the ARMv8.2-LPA feature allowing
> 52-bit physical addresses when using the 64KB translation granule.
> This will be supported by SMMUv3.1.
> 
> Tested-by: Nate Watterson <nwatters@codeaurora.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
> 
> v2: Fix TCR_PS/TCR_IPS copy-paste error
> 
>  drivers/iommu/io-pgtable-arm.c | 65 ++++++++++++++++++++++++++++++------------
>  1 file changed, 47 insertions(+), 18 deletions(-)

[...]

> @@ -203,6 +199,25 @@ struct arm_lpae_io_pgtable {
>  
>  typedef u64 arm_lpae_iopte;
>  
> +static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
> +				     struct arm_lpae_io_pgtable *data)
> +{
> +	arm_lpae_iopte pte = paddr;
> +
> +	/* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
> +	return (pte | (pte >> 36)) & ARM_LPAE_PTE_ADDR_MASK;
> +}

I don't particularly like relying on properties of the paddr for correct
construction of the pte here. The existing macro doesn't have this
limitation. I suspect it's all fine at the moment because we only use TTBR0,
but I'd rather not bake that in if we can avoid it.

> +static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
> +				  struct arm_lpae_io_pgtable *data)
> +{
> +	phys_addr_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
> +	phys_addr_t paddr_hi = paddr & (ARM_LPAE_GRANULE(data) - 1);
> +
> +	/* paddr_hi spans nothing for 4K granule, and only RES0 bits for 16K */
> +	return (paddr ^ paddr_hi) | (paddr_hi << 36);

Why do we need xor here?

>  static bool selftest_running = false;
>  
>  static dma_addr_t __arm_lpae_dma_addr(void *pages)
> @@ -287,7 +302,7 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>  		pte |= ARM_LPAE_PTE_TYPE_BLOCK;
>  
>  	pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS;
> -	pte |= pfn_to_iopte(paddr >> data->pg_shift, data);
> +	pte |= paddr_to_iopte(paddr, data);
>  
>  	__arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
>  }
> @@ -528,7 +543,7 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>  	if (size == split_sz)
>  		unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
>  
> -	blk_paddr = iopte_to_pfn(blk_pte, data) << data->pg_shift;
> +	blk_paddr = iopte_to_paddr(blk_pte, data);
>  	pte = iopte_prot(blk_pte);
>  
>  	for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
> @@ -652,12 +667,13 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
>  
>  found_translation:
>  	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
> -	return ((phys_addr_t)iopte_to_pfn(pte,data) << data->pg_shift) | iova;
> +	return iopte_to_paddr(pte, data) | iova;
>  }
>  
>  static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
>  {
> -	unsigned long granule;
> +	unsigned long granule, page_sizes;
> +	unsigned int max_addr_bits = 48;
>  
>  	/*
>  	 * We need to restrict the supported page sizes to match the
> @@ -677,17 +693,24 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
>  
>  	switch (granule) {
>  	case SZ_4K:
> -		cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
> +		page_sizes = (SZ_4K | SZ_2M | SZ_1G);
>  		break;
>  	case SZ_16K:
> -		cfg->pgsize_bitmap &= (SZ_16K | SZ_32M);
> +		page_sizes = (SZ_16K | SZ_32M);
>  		break;
>  	case SZ_64K:
> -		cfg->pgsize_bitmap &= (SZ_64K | SZ_512M);
> +		max_addr_bits = 52;
> +		page_sizes = (SZ_64K | SZ_512M);
> +		if (cfg->oas > 48)
> +			page_sizes |= 1ULL << 42; /* 4TB */
>  		break;
>  	default:
> -		cfg->pgsize_bitmap = 0;
> +		page_sizes = 0;
>  	}
> +
> +	cfg->pgsize_bitmap &= page_sizes;
> +	cfg->ias = min(cfg->ias, max_addr_bits);
> +	cfg->oas = min(cfg->oas, max_addr_bits);

I don't think we should be writing to the ias/oas fields here, at least
now without auditing the drivers and updating the comments about the
io-pgtable API. For example, the SMMUv3 driver uses its own ias local
variable to initialise the domain geometry, and won't pick up any changes
made here.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 3/4] iommu/arm-smmu-v3: Support 52-bit physical address
  2017-12-14 16:58     ` Robin Murphy
@ 2018-02-26 18:05         ` Will Deacon
  -1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-02-26 18:05 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kristina.martsenko-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	steve.capper-5wv7dgnIgG8

On Thu, Dec 14, 2017 at 04:58:52PM +0000, Robin Murphy wrote:
> Implement SMMUv3.1 support for 52-bit physical addresses. Since a 52-bit
> OAS implies 64KB translation granule support, permitting level 1 block
> entries there is simple, and the rest is just extending address fields.
> 
> Tested-by: Nate Watterson <nwatters-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
> 
> v2: No change
> 
>  drivers/iommu/arm-smmu-v3.c | 35 ++++++++++++++++++++---------------
>  1 file changed, 20 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 52cad776b31b..c9c4e6132e27 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -101,6 +101,7 @@
>  #define IDR5_OAS_42_BIT			(3 << IDR5_OAS_SHIFT)
>  #define IDR5_OAS_44_BIT			(4 << IDR5_OAS_SHIFT)
>  #define IDR5_OAS_48_BIT			(5 << IDR5_OAS_SHIFT)
> +#define IDR5_OAS_52_BIT			(6 << IDR5_OAS_SHIFT)
>  
>  #define ARM_SMMU_CR0			0x20
>  #define CR0_CMDQEN			(1 << 3)
> @@ -159,7 +160,7 @@
>  
>  #define ARM_SMMU_STRTAB_BASE		0x80
>  #define STRTAB_BASE_RA			(1UL << 62)
> -#define STRTAB_BASE_ADDR_MASK		GENMASK_ULL(47, 6)
> +#define STRTAB_BASE_ADDR_MASK		GENMASK_ULL(51, 6)
>  
>  #define ARM_SMMU_STRTAB_BASE_CFG	0x88
>  #define STRTAB_BASE_CFG_LOG2SIZE_SHIFT	0
> @@ -190,7 +191,7 @@
>  #define ARM_SMMU_PRIQ_IRQ_CFG2		0xdc
>  
>  /* Common MSI config fields */
> -#define MSI_CFG0_ADDR_MASK		GENMASK_ULL(47, 2)
> +#define MSI_CFG0_ADDR_MASK		GENMASK_ULL(51, 2)
>  #define MSI_CFG2_SH_SHIFT		4
>  #define MSI_CFG2_SH_NSH			(0UL << MSI_CFG2_SH_SHIFT)
>  #define MSI_CFG2_SH_OSH			(2UL << MSI_CFG2_SH_SHIFT)
> @@ -206,7 +207,7 @@
>  					 Q_IDX(q, p) * (q)->ent_dwords)
>  
>  #define Q_BASE_RWA			(1UL << 62)
> -#define Q_BASE_ADDR_MASK		GENMASK_ULL(47, 5)
> +#define Q_BASE_ADDR_MASK		GENMASK_ULL(51, 5)
>  #define Q_BASE_LOG2SIZE_SHIFT		0
>  #define Q_BASE_LOG2SIZE_MASK		0x1fUL
>  
> @@ -223,7 +224,7 @@
>  #define STRTAB_L1_DESC_DWORDS		1
>  #define STRTAB_L1_DESC_SPAN_SHIFT	0
>  #define STRTAB_L1_DESC_SPAN_MASK	0x1fUL
> -#define STRTAB_L1_DESC_L2PTR_MASK	GENMASK_ULL(47, 6)
> +#define STRTAB_L1_DESC_L2PTR_MASK	GENMASK_ULL(51, 6)
>  
>  #define STRTAB_STE_DWORDS		8
>  #define STRTAB_STE_0_V			(1UL << 0)
> @@ -236,7 +237,7 @@
>  
>  #define STRTAB_STE_0_S1FMT_SHIFT	4
>  #define STRTAB_STE_0_S1FMT_LINEAR	(0UL << STRTAB_STE_0_S1FMT_SHIFT)
> -#define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(47, 6)
> +#define STRTAB_STE_0_S1CTXPTR_MASK	GENMASK_ULL(51, 6)
>  #define STRTAB_STE_0_S1CDMAX_SHIFT	59
>  #define STRTAB_STE_0_S1CDMAX_MASK	0x1fUL
>  
> @@ -274,7 +275,7 @@
>  #define STRTAB_STE_2_S2PTW		(1UL << 54)
>  #define STRTAB_STE_2_S2R		(1UL << 58)
>  
> -#define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(47, 4)
> +#define STRTAB_STE_3_S2TTB_MASK		GENMASK_ULL(51, 4)
>  
>  /* Context descriptor (stage-1 only) */
>  #define CTXDESC_CD_DWORDS		8
> @@ -320,7 +321,7 @@
>  #define CTXDESC_CD_0_ASID_SHIFT		48
>  #define CTXDESC_CD_0_ASID_MASK		0xffffUL
>  
> -#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(47, 4)
> +#define CTXDESC_CD_1_TTB0_MASK		GENMASK_ULL(51, 4)
>  
>  #define CTXDESC_CD_3_MAIR_SHIFT		0
>  
> @@ -357,7 +358,7 @@
>  #define CMDQ_TLBI_0_ASID_SHIFT		48
>  #define CMDQ_TLBI_1_LEAF		(1UL << 0)
>  #define CMDQ_TLBI_1_VA_MASK		GENMASK_ULL(63, 12)
> -#define CMDQ_TLBI_1_IPA_MASK		GENMASK_ULL(47, 12)
> +#define CMDQ_TLBI_1_IPA_MASK		GENMASK_ULL(51, 12)
>  
>  #define CMDQ_PRI_0_SSID_SHIFT		12
>  #define CMDQ_PRI_0_SSID_MASK		0xfffffUL
> @@ -380,7 +381,7 @@
>  #define CMDQ_SYNC_0_MSIATTR_OIWB	(0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT)
>  #define CMDQ_SYNC_0_MSIDATA_SHIFT	32
>  #define CMDQ_SYNC_0_MSIDATA_MASK	0xffffffffUL
> -#define CMDQ_SYNC_1_MSIADDR_MASK	GENMASK_ULL(47, 2)
> +#define CMDQ_SYNC_1_MSIADDR_MASK	GENMASK_ULL(51, 2)
>  
>  /* Event queue */
>  #define EVTQ_ENT_DWORDS			4
> @@ -1686,7 +1687,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
>  		return -ENOMEM;
>  
>  	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
> -	domain->geometry.aperture_end = (1UL << ias) - 1;
> +	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;

Ah good, you fixed this :)

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/4] iommu/arm-smmu-v3: Support 52-bit virtual address
  2017-12-14 16:58     ` Robin Murphy
@ 2018-02-26 18:05         ` Will Deacon
  -1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-02-26 18:05 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kristina.martsenko-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	steve.capper-5wv7dgnIgG8

On Thu, Dec 14, 2017 at 04:58:53PM +0000, Robin Murphy wrote:
> Stage 1 input addresses are effectively 64-bit in SMMUv3 anyway, so
> really all that's involved is letting io-pgtable know the appropriate
> upper bound for T0SZ.
> 
> Tested-by: Nate Watterson <nwatters-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> ---
> 
> v2: No change
> 
>  drivers/iommu/arm-smmu-v3.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index c9c4e6132e27..7059a0805bd1 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -102,6 +102,7 @@
>  #define IDR5_OAS_44_BIT			(4 << IDR5_OAS_SHIFT)
>  #define IDR5_OAS_48_BIT			(5 << IDR5_OAS_SHIFT)
>  #define IDR5_OAS_52_BIT			(6 << IDR5_OAS_SHIFT)
> +#define IDR5_VAX			(1 << 10)

I think this is actually a two-bit field, so we should check the whole
thing.

>  
>  #define ARM_SMMU_CR0			0x20
>  #define CR0_CMDQEN			(1 << 3)
> @@ -604,6 +605,7 @@ struct arm_smmu_device {
>  #define ARM_SMMU_FEAT_STALLS		(1 << 11)
>  #define ARM_SMMU_FEAT_HYP		(1 << 12)
>  #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
> +#define ARM_SMMU_FEAT_VAX		(1 << 14)
>  	u32				features;
>  
>  #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
> @@ -1656,6 +1658,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
>  	switch (smmu_domain->stage) {
>  	case ARM_SMMU_DOMAIN_S1:
>  		ias = VA_BITS;
> +		if (VA_BITS > 48 && !(smmu->features & ARM_SMMU_FEAT_VAX))

nit, but I'd rather the if condition was checking ias instead of VA_BITS.

> +			ias = 48;
>  		oas = smmu->ias;

Should we also be sanitising the oas here if the SMMU doesn't support 64k
pages? It looks like the ias/oas sanitisation isn't cleanly split between
the pgtable code and the SMMU driver.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking
  2018-02-26 18:04         ` Will Deacon
@ 2018-02-27 13:28             ` Robin Murphy
  -1 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2018-02-27 13:28 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kristina.martsenko-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	steve.capper-5wv7dgnIgG8

Hi Will,

On 26/02/18 18:04, Will Deacon wrote:
> Hi Robin,
> 
> On Thu, Dec 14, 2017 at 04:58:50PM +0000, Robin Murphy wrote:
>> Before trying to add the SMMUv3.1 support for 52-bit addresses, make
>> things bearable by cleaning up the various address mask definitions to
>> use GENMASK_ULL() consistently. The fact that doing so reveals (and
>> fixes) a latent off-by-one in Q_BASE_ADDR_MASK only goes to show what a
>> jolly good idea it is...
>>
>> Tested-by: Nate Watterson <nwatters-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
>> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>> ---
>>
>> v2: Clean up one more now-unnecessary linewrap
>>
>>   drivers/iommu/arm-smmu-v3.c | 53 ++++++++++++++++++---------------------------
>>   1 file changed, 21 insertions(+), 32 deletions(-)
> 
> Whilst I agree that using GENMASK is better, this patch does mean that the
> driver is (more) inconsistent with its _MASK terminology in that you can't
> generally tell whether a definition that ends in _MASK is shifted or not,
> and this isn't even consistent for fields within the same register.

The apparently slightly-less-than-obvious internal consistency is that 
every mask used for an *address field* is now in-place, while other 
types of field are still handled as inconsistently as they were before. 
It should also be the case that every x_MASK without a corresponding 
x_SHIFT defined next to it is unshifted.

Either way it's certainly no *worse* than the current situation where 
address masks sometimes have a nonzero shift, sometimes have zero bits 
at the bottom and a shift of 0, and sometimes have no shift defined at all.

Thinking about it some more, the address masks should only ever be 
needed when *extracting* an address from a register/structure word, or 
validating them in the context of an address *before* inserting into a 
field - if we can't trust input to be correct then just silently masking 
off bits probably isn't the best idea either way - so IMHO there is 
plenty of contextual disambiguation too.
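
Purely for illustration (this sketch is not from the thread): with the
in-place style, an address mask is applied directly to the address
value and no shift is involved in either direction, so e.g. with the
Q_BASE definitions from this series, inserting a pre-validated queue
base address and extracting it again are each a single AND:

	q->q_base  = Q_BASE_RWA;
	q->q_base |= q->base_dma & Q_BASE_ADDR_MASK;	/* insert */

	base = q->q_base & Q_BASE_ADDR_MASK;		/* extract */

where base_dma is assumed to have been allocated and checked against
the mask beforehand, and base is a local dma_addr_t.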

> Should we be using GENMASK/BIT for all fields instead and removing all of
> the _SHIFT definitions?

I'm all aboard using BIT() consistently for single-bit boolean fields, 
but for multi-bit fields in general we do have to keep an explicit shift 
defined *somewhere* in order to make sensible use of the value, i.e. either:

	val = (reg >> 22) & 0x1f;
	reg = (val & 0x1f) << 22;

or:
	val = (reg & 0x07c00000) >> 22;
	reg = (val << 22) & 0x07c00000;

[ but ideally not this mess we currently have in some places:

	val = (reg & 0x1f << 22) >> 22;
]

Again, I'd gladly clean everything up to at least be self-consistent 
(and line up more with how we did things in SMMUv2) if you think it's 
worthwhile. Although I guess that means I'd get the job of fixing up 
future stable backport conflicts too ;)
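
As an aside, and purely as an editorial sketch rather than anything
proposed in the thread: the <linux/bitfield.h> helpers are one way to
keep a single GENMASK() definition per multi-bit field and derive the
shift from it, e.g. for the hypothetical 5-bit field at bits 26:22
used in the examples above:

	#include <linux/bitfield.h>

	#define EXAMPLE_FIELD	GENMASK(26, 22)

	/* extract the field */
	val = FIELD_GET(EXAMPLE_FIELD, reg);
	/* clear and re-insert it */
	reg = (reg & ~EXAMPLE_FIELD) | FIELD_PREP(EXAMPLE_FIELD, val);

so neither a separate _SHIFT definition nor an open-coded constant is
needed at the use sites.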

Robin.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/4] iommu/io-pgtable-arm: Support 52-bit physical address
  2018-02-26 18:05         ` Will Deacon
@ 2018-02-27 13:49             ` Robin Murphy
  -1 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2018-02-27 13:49 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kristina.martsenko-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	steve.capper-5wv7dgnIgG8

On 26/02/18 18:05, Will Deacon wrote:
> On Thu, Dec 14, 2017 at 04:58:51PM +0000, Robin Murphy wrote:
>> Bring io-pgtable-arm in line with the ARMv8.2-LPA feature allowing
>> 52-bit physical addresses when using the 64KB translation granule.
>> This will be supported by SMMUv3.1.
>>
>> Tested-by: Nate Watterson <nwatters-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
>> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>> ---
>>
>> v2: Fix TCR_PS/TCR_IPS copy-paste error
>>
>>   drivers/iommu/io-pgtable-arm.c | 65 ++++++++++++++++++++++++++++++------------
>>   1 file changed, 47 insertions(+), 18 deletions(-)
> 
> [...]
> 
>> @@ -203,6 +199,25 @@ struct arm_lpae_io_pgtable {
>>   
>>   typedef u64 arm_lpae_iopte;
>>   
>> +static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
>> +				     struct arm_lpae_io_pgtable *data)
>> +{
>> +	arm_lpae_iopte pte = paddr;
>> +
>> +	/* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
>> +	return (pte | (pte >> 36)) & ARM_LPAE_PTE_ADDR_MASK;
>> +}
> 
> I don't particularly like relying on properties of the paddr for correct
> construction of the pte here. The existing macro doesn't have this
> limitation. I suspect it's all fine at the moment because we only use TTBR0,
> but I'd rather not bake that in if we can avoid it.

What's the relevance of TTBR0 to physical addresses? :/

Note that by this point paddr has been validated against cfg->oas by 
arm_lpae_map(), and validated to be granule-aligned by iommu_map(), so 
it really can't be wrong under reasonable conditions.
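
A worked example, added here for illustration only: with the 64KB
granule and ARM_LPAE_PTE_ADDR_MASK covering bits 47:12, a 52-bit
paddr of 0x000f123456780000 packs as

	pte               = 0x000f123456780000
	pte >> 36         = 0x000000000000f123
	pte | (pte >> 36) = 0x000f12345678f123
	    ... & MASK    = 0x000012345678f000

i.e. paddr bits 51:48 land in pte bits 15:12, which the 64KB granule
otherwise leaves as RES0.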

>> +static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
>> +				  struct arm_lpae_io_pgtable *data)
>> +{
>> +	phys_addr_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
>> +	phys_addr_t paddr_hi = paddr & (ARM_LPAE_GRANULE(data) - 1);
>> +
>> +	/* paddr_hi spans nothing for 4K granule, and only RES0 bits for 16K */
>> +	return (paddr ^ paddr_hi) | (paddr_hi << 36);
> 
> Why do we need xor here?

Because "(paddr ^ paddr_hi)" is more concise than "(paddr & 
~(phys_addr_t)(ARM_LPAE_GRANULE(data) - 1))" or variants thereof. It's 
potentially a teeny bit more efficient too, I think, but it's mostly 
about the readability.
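
Continuing the same illustrative example, the reverse direction unpacks
that pte value as

	paddr            = pte & MASK       = 0x000012345678f000
	paddr_hi         = paddr & 0xffff   = 0x000000000000f000
	paddr ^ paddr_hi                    = 0x0000123456780000
	paddr_hi << 36                      = 0x000f000000000000
	result                              = 0x000f123456780000

so the xor simply clears the granule-offset bits which have already
been copied out into paddr_hi, and the original 52-bit address comes
back intact.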

>>   static bool selftest_running = false;
>>   
>>   static dma_addr_t __arm_lpae_dma_addr(void *pages)
>> @@ -287,7 +302,7 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>>   		pte |= ARM_LPAE_PTE_TYPE_BLOCK;
>>   
>>   	pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS;
>> -	pte |= pfn_to_iopte(paddr >> data->pg_shift, data);
>> +	pte |= paddr_to_iopte(paddr, data);
>>   
>>   	__arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
>>   }
>> @@ -528,7 +543,7 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>>   	if (size == split_sz)
>>   		unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
>>   
>> -	blk_paddr = iopte_to_pfn(blk_pte, data) << data->pg_shift;
>> +	blk_paddr = iopte_to_paddr(blk_pte, data);
>>   	pte = iopte_prot(blk_pte);
>>   
>>   	for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
>> @@ -652,12 +667,13 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
>>   
>>   found_translation:
>>   	iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
>> -	return ((phys_addr_t)iopte_to_pfn(pte,data) << data->pg_shift) | iova;
>> +	return iopte_to_paddr(pte, data) | iova;
>>   }
>>   
>>   static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
>>   {
>> -	unsigned long granule;
>> +	unsigned long granule, page_sizes;
>> +	unsigned int max_addr_bits = 48;
>>   
>>   	/*
>>   	 * We need to restrict the supported page sizes to match the
>> @@ -677,17 +693,24 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
>>   
>>   	switch (granule) {
>>   	case SZ_4K:
>> -		cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
>> +		page_sizes = (SZ_4K | SZ_2M | SZ_1G);
>>   		break;
>>   	case SZ_16K:
>> -		cfg->pgsize_bitmap &= (SZ_16K | SZ_32M);
>> +		page_sizes = (SZ_16K | SZ_32M);
>>   		break;
>>   	case SZ_64K:
>> -		cfg->pgsize_bitmap &= (SZ_64K | SZ_512M);
>> +		max_addr_bits = 52;
>> +		page_sizes = (SZ_64K | SZ_512M);
>> +		if (cfg->oas > 48)
>> +			page_sizes |= 1ULL << 42; /* 4TB */
>>   		break;
>>   	default:
>> -		cfg->pgsize_bitmap = 0;
>> +		page_sizes = 0;
>>   	}
>> +
>> +	cfg->pgsize_bitmap &= page_sizes;
>> +	cfg->ias = min(cfg->ias, max_addr_bits);
>> +	cfg->oas = min(cfg->oas, max_addr_bits);
> 
> I don't think we should be writing to the ias/oas fields here, at least
> not without auditing the drivers and updating the comments about the
> io-pgtable API. For example, the SMMUv3 driver uses its own ias local
> variable to initialise the domain geometry, and won't pick up any changes
> made here.

As you've discovered, the driver thing is indeed true. More generally, 
though, we've always adjusted cfg->pgsize_bitmap here, so I don't see 
why other fields should be treated differently - I've always assumed the 
cfg which the driver passes in here just represents its total maximum 
capabilities, from which it's io-pgtable's job to pick an appropriate 
configuration.
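
For illustration only (an editorial sketch, not something posted in the
thread), a driver treating the cfg as in/out then just reads the
possibly-restricted values back once io-pgtable has run, much like the
SMMUv3 hunk quoted earlier:

	pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain);
	if (!pgtbl_ops)
		return -ENOMEM;

	/* use whatever io-pgtable settled on, not the requested inputs */
	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
	domain->geometry.force_aperture = true;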

Robin.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 4/4] iommu/arm-smmu-v3: Support 52-bit virtual address
  2018-02-26 18:05         ` Will Deacon
@ 2018-02-27 13:57             ` Robin Murphy
  -1 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2018-02-27 13:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kristina.martsenko-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	steve.capper-5wv7dgnIgG8

On 26/02/18 18:05, Will Deacon wrote:
> On Thu, Dec 14, 2017 at 04:58:53PM +0000, Robin Murphy wrote:
>> Stage 1 input addresses are effectively 64-bit in SMMUv3 anyway, so
>> really all that's involved is letting io-pgtable know the appropriate
>> upper bound for T0SZ.
>>
>> Tested-by: Nate Watterson <nwatters-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
>> Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
>> ---
>>
>> v2: No change
>>
>>   drivers/iommu/arm-smmu-v3.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index c9c4e6132e27..7059a0805bd1 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -102,6 +102,7 @@
>>   #define IDR5_OAS_44_BIT			(4 << IDR5_OAS_SHIFT)
>>   #define IDR5_OAS_48_BIT			(5 << IDR5_OAS_SHIFT)
>>   #define IDR5_OAS_52_BIT			(6 << IDR5_OAS_SHIFT)
>> +#define IDR5_VAX			(1 << 10)
> 
> I think this is actually a two-bit field, so we should check the whole
> thing.

Yes, I either overlooked that or took a cheeky shortcut; will fix.
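
For illustration, a sketch of what the fixed-up probe check might look
like, assuming VAX is the two-bit field at IDR5[11:10] and the value 1
denotes 52-bit VA support (names below follow the driver's existing
shift/mask style; none of this is from the thread itself):

	#define IDR5_VAX_SHIFT		10
	#define IDR5_VAX_MASK		0x3UL
	#define IDR5_VAX_52_BIT		1

	if (((reg >> IDR5_VAX_SHIFT) & IDR5_VAX_MASK) == IDR5_VAX_52_BIT)
		smmu->features |= ARM_SMMU_FEAT_VAX;

with reg being the IDR5 value already read in the probe path.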

>>   
>>   #define ARM_SMMU_CR0			0x20
>>   #define CR0_CMDQEN			(1 << 3)
>> @@ -604,6 +605,7 @@ struct arm_smmu_device {
>>   #define ARM_SMMU_FEAT_STALLS		(1 << 11)
>>   #define ARM_SMMU_FEAT_HYP		(1 << 12)
>>   #define ARM_SMMU_FEAT_STALL_FORCE	(1 << 13)
>> +#define ARM_SMMU_FEAT_VAX		(1 << 14)
>>   	u32				features;
>>   
>>   #define ARM_SMMU_OPT_SKIP_PREFETCH	(1 << 0)
>> @@ -1656,6 +1658,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
>>   	switch (smmu_domain->stage) {
>>   	case ARM_SMMU_DOMAIN_S1:
>>   		ias = VA_BITS;
>> +		if (VA_BITS > 48 && !(smmu->features & ARM_SMMU_FEAT_VAX))
> 
> nit, but I'd rather the if condition was checking ias instead of VA_BITS.
> 
>> +			ias = 48;
>>   		oas = smmu->ias;
> 
> Should we also be sanitising the oas here if the SMMU doesn't support 64k
> pages? It looks like the ias/oas sanitisation isn't cleanly split between
> the pgtable code and the SMMU driver.

Right, based on my previous argument it would be cleaner to set 48/52 
here based on the SMMU features, then constrain to VA_BITS in 
io-pgtable; I think I was more focused on minimising the diff initially.
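
As a purely illustrative sketch of that shape (not code from the
thread), the stage-1 case would then become something like

	case ARM_SMMU_DOMAIN_S1:
		ias = (smmu->features & ARM_SMMU_FEAT_VAX) ? 52 : 48;
		oas = smmu->ias;
		break;

leaving io-pgtable to constrain ias further against VA_BITS (or
whatever the CPU page-table code can actually cover).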

Robin.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking
  2018-02-27 13:28             ` Robin Murphy
@ 2018-03-06 16:02                 ` Will Deacon
  -1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-03-06 16:02 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kristina.martsenko-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	steve.capper-5wv7dgnIgG8

Hi Robin,

On Tue, Feb 27, 2018 at 01:28:31PM +0000, Robin Murphy wrote:
> On 26/02/18 18:04, Will Deacon wrote:
> >On Thu, Dec 14, 2017 at 04:58:50PM +0000, Robin Murphy wrote:
> >>Before trying to add the SMMUv3.1 support for 52-bit addresses, make
> >>things bearable by cleaning up the various address mask definitions to
> >>use GENMASK_ULL() consistently. The fact that doing so reveals (and
> >>fixes) a latent off-by-one in Q_BASE_ADDR_MASK only goes to show what a
> >>jolly good idea it is...
> >>
> >>Tested-by: Nate Watterson <nwatters-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> >>Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> >>---
> >>
> >>v2: Clean up one more now-unnecessary linewrap
> >>
> >>  drivers/iommu/arm-smmu-v3.c | 53 ++++++++++++++++++---------------------------
> >>  1 file changed, 21 insertions(+), 32 deletions(-)
> >
> >Whilst I agree that using GENMASK is better, this patch does mean that the
> >driver is (more) inconsistent with its _MASK terminology in that you can't
> >generally tell whether a definition that ends in _MASK is shifted or not,
> >and this isn't even consistent for fields within the same register.
> 
> The apparently slightly-less-than-obvious internal consistency is that every
> mask used for an *address field* is now in-place, while other types of field
> are still handled as inconsistently as they were before. It should also be
> the case that every x_MASK without a corresponding x_SHIFT defined next to
> it is unshifted.
> 
> Either way it's certainly no *worse* than the current situation where
> address masks sometimes have a nonzero shift, sometimes have zero bits at
> the bottom and a shift of 0, and sometimes have no shift defined at all.
> 
> Thinking about it some more, the address masks should only ever be needed
> when *extracting* an address from a register/structure word, or validating
> them in the context of an address *before* inserting into a field - if we
> can't trust input to be correct then just silently masking off bits probably
> isn't the best idea either way - so IMHO there is plenty of contextual
> disambiguation too.
> 
> >Should we be using GENMASK/BIT for all fields instead and removing all of
> >the _SHIFT definitions?
> 
> I'm all aboard using BIT() consistently for single-bit boolean fields, but
> for multi-bit fields in general we do have to keep an explicit shift defined
> *somewhere* in order to make sensible use of the value, i.e. either:
> 
> 	val = (reg >> 22) & 0x1f;
> 	reg = (val & 0x1f) << 22;
> 
> or:
> 	val = (reg & 0x07c00000) >> 22;
> 	reg = (val << 22) & 0x07c00000;
> 
> [ but ideally not this mess we currently have in some places:
> 
> 	val = (reg & 0x1f << 22) >> 22;
> ]
> 
> Again, I'd gladly clean everything up to at least be self-consistent (and
> line up more with how we did things in SMMUv2) if you think it's worthwhile.
> Although I guess that means I'd get the job of fixing up future stable
> backport conflicts too ;)

I reckon it would be worth the cleanup since you're in the area. I don't
mind keeping the SHIFT definitions where they're needed, but using BIT and
GENMASK wherever we can.

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH v2 2/4] iommu/io-pgtable-arm: Support 52-bit physical address
  2018-02-27 13:49             ` Robin Murphy
@ 2018-03-06 17:54                 ` Will Deacon
  -1 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-03-06 17:54 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	kristina.martsenko-5wv7dgnIgG8,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	steve.capper-5wv7dgnIgG8

Hey Robin,

On Tue, Feb 27, 2018 at 01:49:14PM +0000, Robin Murphy wrote:
> On 26/02/18 18:05, Will Deacon wrote:
> >On Thu, Dec 14, 2017 at 04:58:51PM +0000, Robin Murphy wrote:
> >>Bring io-pgtable-arm in line with the ARMv8.2-LPA feature allowing
> >>52-bit physical addresses when using the 64KB translation granule.
> >>This will be supported by SMMUv3.1.
> >>
> >>Tested-by: Nate Watterson <nwatters-sgV2jX0FEOL9JmXXK+q4OQ@public.gmane.org>
> >>Signed-off-by: Robin Murphy <robin.murphy-5wv7dgnIgG8@public.gmane.org>
> >>---
> >>
> >>v2: Fix TCR_PS/TCR_IPS copy-paste error
> >>
> >>  drivers/iommu/io-pgtable-arm.c | 65 ++++++++++++++++++++++++++++++------------
> >>  1 file changed, 47 insertions(+), 18 deletions(-)
> >
> >[...]
> >
> >>@@ -203,6 +199,25 @@ struct arm_lpae_io_pgtable {
> >>  typedef u64 arm_lpae_iopte;
> >>+static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
> >>+				     struct arm_lpae_io_pgtable *data)
> >>+{
> >>+	arm_lpae_iopte pte = paddr;
> >>+
> >>+	/* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
> >>+	return (pte | (pte >> 36)) & ARM_LPAE_PTE_ADDR_MASK;
> >>+}
> >
> >I don't particularly like relying on properties of the paddr for correct
> >construction of the pte here. The existing macro doesn't have this
> >limitation. I suspect it's all fine at the moment because we only use TTBR0,
> >but I'd rather not bake that in if we can avoid it.
> 
> What's the relevance of TTBR0 to physical addresses? :/

Good point! I clearly got confused here.

> Note that by this point paddr has been validated against cfg->oas by
> arm_lpae_map(), and validated to be granule-aligned by iommu_map(), so it
> really can't be wrong under reasonable conditions.

Fair enough, but I still think this would be a bit more readable written
along the lines of:

	arm_lpae_iopte pte_hi, pte_lo = 0;

	pte_hi = paddr & GENMASK(47, data->pg_shift);
	if (data->pg_shift == 16) {
		pte_lo = paddr & GENMASK(ARM_LPAE_MAX_ADDR_BITS - 1, 48);
		pte_lo >>= (48 - 12);
	}

	return pte_hi | pte_lo;

Ok, we don't squeeze every last cycle out of it, but I find it much more
readable. Is it just me? The magic numbers could be #defined and/or
derived if necessary.

> >>+static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
> >>+				  struct arm_lpae_io_pgtable *data)
> >>+{
> >>+	phys_addr_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
> >>+	phys_addr_t paddr_hi = paddr & (ARM_LPAE_GRANULE(data) - 1);
> >>+
> >>+	/* paddr_hi spans nothing for 4K granule, and only RES0 bits for 16K */
> >>+	return (paddr ^ paddr_hi) | (paddr_hi << 36);
> >
> >Why do we need xor here?
> 
> Because "(paddr ^ paddr_hi)" is more concise than "(paddr &
> ~(phys_addr_t)(ARM_LPAE_GRANULE(data) - 1)" or variants thereof. It's
> potentially a teeny bit more efficient too, I think, but it's mostly about
> the readability.

I don't think the bit-twiddling is super readable, to be honest. Again,
something like:

	phys_addr_t paddr_lo, paddr_hi = 0;

	paddr_lo = pte & GENMASK(47, data->pg_shift);
	if (data->pg_shift == 16) {
		paddr_hi = pte & GENMASK(15, 12);
		paddr_hi <<= (48 - 12);
	}

	return paddr_hi | paddr_lo;

but I've not even compiled this stuff...

> >>+	cfg->pgsize_bitmap &= page_sizes;
> >>+	cfg->ias = min(cfg->ias, max_addr_bits);
> >>+	cfg->oas = min(cfg->oas, max_addr_bits);
> >
> >I don't think we should be writing to the ias/oas fields here, at least
> >not without auditing the drivers and updating the comments about the
> >io-pgtable API. For example, the SMMUv3 driver uses its own ias local
> >variable to initialise the domain geometry, and won't pick up any changes
> >made here.
> 
> As you've discovered, the driver thing is indeed true. More generally,
> though, we've always adjusted cfg->pgsize_bitmap here, so I don't see why
> other fields should be treated differently - I've always assumed the cfg
> which the driver passes in here just represents its total maximum
> capabilities, from which it's io-pgtable's job to pick an appropriate
> configuration.

Can you update the header file with a comment to that effect, please?

Will

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2018-03-06 17:54 UTC | newest]

Thread overview: 30+ messages
2017-12-14 16:58 [PATCH v2 0/4] SMMU 52-bit address support Robin Murphy
2017-12-14 16:58 ` Robin Murphy
     [not found] ` <cover.1512038236.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2017-12-14 16:58   ` [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking Robin Murphy
2017-12-14 16:58     ` Robin Murphy
     [not found]     ` <24f90689e35a90a337601943a48902a7ab6a7c4d.1512038236.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2018-02-26 18:04       ` Will Deacon
2018-02-26 18:04         ` Will Deacon
     [not found]         ` <20180226180455.GB26147-5wv7dgnIgG8@public.gmane.org>
2018-02-27 13:28           ` Robin Murphy
2018-02-27 13:28             ` Robin Murphy
     [not found]             ` <0734679d-d94a-92e7-8cea-dcf0c07b0620-5wv7dgnIgG8@public.gmane.org>
2018-03-06 16:02               ` Will Deacon
2018-03-06 16:02                 ` Will Deacon
2017-12-14 16:58   ` [PATCH v2 2/4] iommu/io-pgtable-arm: Support 52-bit physical address Robin Murphy
2017-12-14 16:58     ` Robin Murphy
     [not found]     ` <fde583dacee8099ed4e952dbf1c4dfb42070e9c5.1512038236.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2018-02-26 18:05       ` Will Deacon
2018-02-26 18:05         ` Will Deacon
     [not found]         ` <20180226180501.GC26147-5wv7dgnIgG8@public.gmane.org>
2018-02-27 13:49           ` Robin Murphy
2018-02-27 13:49             ` Robin Murphy
     [not found]             ` <52ff8cd6-64ec-d7d4-2135-0b17becc4251-5wv7dgnIgG8@public.gmane.org>
2018-03-06 17:54               ` Will Deacon
2018-03-06 17:54                 ` Will Deacon
2017-12-14 16:58   ` [PATCH v2 3/4] iommu/arm-smmu-v3: " Robin Murphy
2017-12-14 16:58     ` Robin Murphy
     [not found]     ` <9d2a2eb4527987aec520e112163a178d92fc946b.1512038236.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2018-02-26 18:05       ` Will Deacon
2018-02-26 18:05         ` Will Deacon
2017-12-14 16:58   ` [PATCH v2 4/4] iommu/arm-smmu-v3: Support 52-bit virtual address Robin Murphy
2017-12-14 16:58     ` Robin Murphy
     [not found]     ` <15d196d9a8e160c5df0e0340e23a432482389f9f.1512038236.git.robin.murphy-5wv7dgnIgG8@public.gmane.org>
2018-02-26 18:05       ` Will Deacon
2018-02-26 18:05         ` Will Deacon
     [not found]         ` <20180226180508.GE26147-5wv7dgnIgG8@public.gmane.org>
2018-02-27 13:57           ` Robin Murphy
2018-02-27 13:57             ` Robin Murphy
2018-02-26 18:04   ` [PATCH v2 0/4] SMMU 52-bit address support Will Deacon
2018-02-26 18:04     ` Will Deacon
