* [PATCH v2 0/4] SMMU 52-bit address support
@ 2017-12-14 16:58 ` Robin Murphy
0 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2017-12-14 16:58 UTC (permalink / raw)
To: will.deacon@arm.com
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
Hi all,
Here's a (hopefully final) update of 52-bit SMMU support. The only
material changes from v1[1] are a small cosmetic tweak to patch #1,
and correction of the silly error in patch #2 as reported by Nate in
testing.
Robin.
[1] https://lists.linuxfoundation.org/pipermail/iommu/2017-November/025073.html
Robin Murphy (4):
iommu/arm-smmu-v3: Clean up address masking
iommu/io-pgtable-arm: Support 52-bit physical address
iommu/arm-smmu-v3: Support 52-bit physical address
iommu/arm-smmu-v3: Support 52-bit virtual address
drivers/iommu/arm-smmu-v3.c | 76 ++++++++++++++++++++++--------------------
drivers/iommu/io-pgtable-arm.c | 65 ++++++++++++++++++++++++++----------
2 files changed, 86 insertions(+), 55 deletions(-)
--
2.13.4.dirty
* [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking
2017-12-14 16:58 ` Robin Murphy
@ 2017-12-14 16:58 ` Robin Murphy
0 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2017-12-14 16:58 UTC (permalink / raw)
To: will.deacon@arm.com
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
Before trying to add the SMMUv3.1 support for 52-bit addresses, make
things bearable by cleaning up the various address mask definitions to
use GENMASK_ULL() consistently. The fact that doing so reveals (and
fixes) a latent off-by-one in Q_BASE_ADDR_MASK only goes to show what a
jolly good idea it is...
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
v2: Clean up one more now-unnecessary linewrap
drivers/iommu/arm-smmu-v3.c | 53 ++++++++++++++++++---------------------------
1 file changed, 21 insertions(+), 32 deletions(-)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index f122071688fd..52cad776b31b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -22,6 +22,7 @@
#include <linux/acpi.h>
#include <linux/acpi_iort.h>
+#include <linux/bitops.h>
#include <linux/delay.h>
#include <linux/dma-iommu.h>
#include <linux/err.h>
@@ -158,8 +159,7 @@
#define ARM_SMMU_STRTAB_BASE 0x80
#define STRTAB_BASE_RA (1UL << 62)
-#define STRTAB_BASE_ADDR_SHIFT 6
-#define STRTAB_BASE_ADDR_MASK 0x3ffffffffffUL
+#define STRTAB_BASE_ADDR_MASK GENMASK_ULL(47, 6)
#define ARM_SMMU_STRTAB_BASE_CFG 0x88
#define STRTAB_BASE_CFG_LOG2SIZE_SHIFT 0
@@ -190,8 +190,7 @@
#define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc
/* Common MSI config fields */
-#define MSI_CFG0_ADDR_SHIFT 2
-#define MSI_CFG0_ADDR_MASK 0x3fffffffffffUL
+#define MSI_CFG0_ADDR_MASK GENMASK_ULL(47, 2)
#define MSI_CFG2_SH_SHIFT 4
#define MSI_CFG2_SH_NSH (0UL << MSI_CFG2_SH_SHIFT)
#define MSI_CFG2_SH_OSH (2UL << MSI_CFG2_SH_SHIFT)
@@ -207,8 +206,7 @@
Q_IDX(q, p) * (q)->ent_dwords)
#define Q_BASE_RWA (1UL << 62)
-#define Q_BASE_ADDR_SHIFT 5
-#define Q_BASE_ADDR_MASK 0xfffffffffffUL
+#define Q_BASE_ADDR_MASK GENMASK_ULL(47, 5)
#define Q_BASE_LOG2SIZE_SHIFT 0
#define Q_BASE_LOG2SIZE_MASK 0x1fUL
@@ -225,8 +223,7 @@
#define STRTAB_L1_DESC_DWORDS 1
#define STRTAB_L1_DESC_SPAN_SHIFT 0
#define STRTAB_L1_DESC_SPAN_MASK 0x1fUL
-#define STRTAB_L1_DESC_L2PTR_SHIFT 6
-#define STRTAB_L1_DESC_L2PTR_MASK 0x3ffffffffffUL
+#define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(47, 6)
#define STRTAB_STE_DWORDS 8
#define STRTAB_STE_0_V (1UL << 0)
@@ -239,8 +236,7 @@
#define STRTAB_STE_0_S1FMT_SHIFT 4
#define STRTAB_STE_0_S1FMT_LINEAR (0UL << STRTAB_STE_0_S1FMT_SHIFT)
-#define STRTAB_STE_0_S1CTXPTR_SHIFT 6
-#define STRTAB_STE_0_S1CTXPTR_MASK 0x3ffffffffffUL
+#define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(47, 6)
#define STRTAB_STE_0_S1CDMAX_SHIFT 59
#define STRTAB_STE_0_S1CDMAX_MASK 0x1fUL
@@ -278,8 +274,7 @@
#define STRTAB_STE_2_S2PTW (1UL << 54)
#define STRTAB_STE_2_S2R (1UL << 58)
-#define STRTAB_STE_3_S2TTB_SHIFT 4
-#define STRTAB_STE_3_S2TTB_MASK 0xfffffffffffUL
+#define STRTAB_STE_3_S2TTB_MASK GENMASK_ULL(47, 4)
/* Context descriptor (stage-1 only) */
#define CTXDESC_CD_DWORDS 8
@@ -325,8 +320,7 @@
#define CTXDESC_CD_0_ASID_SHIFT 48
#define CTXDESC_CD_0_ASID_MASK 0xffffUL
-#define CTXDESC_CD_1_TTB0_SHIFT 4
-#define CTXDESC_CD_1_TTB0_MASK 0xfffffffffffUL
+#define CTXDESC_CD_1_TTB0_MASK GENMASK_ULL(47, 4)
#define CTXDESC_CD_3_MAIR_SHIFT 0
@@ -351,7 +345,7 @@
#define CMDQ_PREFETCH_0_SID_SHIFT 32
#define CMDQ_PREFETCH_1_SIZE_SHIFT 0
-#define CMDQ_PREFETCH_1_ADDR_MASK ~0xfffUL
+#define CMDQ_PREFETCH_1_ADDR_MASK GENMASK_ULL(63, 12)
#define CMDQ_CFGI_0_SID_SHIFT 32
#define CMDQ_CFGI_0_SID_MASK 0xffffffffUL
@@ -362,8 +356,8 @@
#define CMDQ_TLBI_0_VMID_SHIFT 32
#define CMDQ_TLBI_0_ASID_SHIFT 48
#define CMDQ_TLBI_1_LEAF (1UL << 0)
-#define CMDQ_TLBI_1_VA_MASK ~0xfffUL
-#define CMDQ_TLBI_1_IPA_MASK 0xfffffffff000UL
+#define CMDQ_TLBI_1_VA_MASK GENMASK_ULL(63, 12)
+#define CMDQ_TLBI_1_IPA_MASK GENMASK_ULL(47, 12)
#define CMDQ_PRI_0_SSID_SHIFT 12
#define CMDQ_PRI_0_SSID_MASK 0xfffffUL
@@ -386,8 +380,7 @@
#define CMDQ_SYNC_0_MSIATTR_OIWB (0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT)
#define CMDQ_SYNC_0_MSIDATA_SHIFT 32
#define CMDQ_SYNC_0_MSIDATA_MASK 0xffffffffUL
-#define CMDQ_SYNC_1_MSIADDR_SHIFT 0
-#define CMDQ_SYNC_1_MSIADDR_MASK 0xffffffffffffcUL
+#define CMDQ_SYNC_1_MSIADDR_MASK GENMASK_ULL(47, 2)
/* Event queue */
#define EVTQ_ENT_DWORDS 4
@@ -413,8 +406,7 @@
#define PRIQ_1_PRG_IDX_SHIFT 0
#define PRIQ_1_PRG_IDX_MASK 0x1ffUL
-#define PRIQ_1_ADDR_SHIFT 12
-#define PRIQ_1_ADDR_MASK 0xfffffffffffffUL
+#define PRIQ_1_ADDR_MASK GENMASK_ULL(63, 12)
/* High-level queue structures */
#define ARM_SMMU_POLL_TIMEOUT_US 100
@@ -1093,7 +1085,7 @@ static void arm_smmu_write_ctx_desc(struct arm_smmu_device *smmu,
cfg->cdptr[0] = cpu_to_le64(val);
- val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK << CTXDESC_CD_1_TTB0_SHIFT;
+ val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK;
cfg->cdptr[1] = cpu_to_le64(val);
cfg->cdptr[3] = cpu_to_le64(cfg->cd.mair << CTXDESC_CD_3_MAIR_SHIFT);
@@ -1107,8 +1099,7 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
val |= (desc->span & STRTAB_L1_DESC_SPAN_MASK)
<< STRTAB_L1_DESC_SPAN_SHIFT;
- val |= desc->l2ptr_dma &
- STRTAB_L1_DESC_L2PTR_MASK << STRTAB_L1_DESC_L2PTR_SHIFT;
+ val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
*dst = cpu_to_le64(val);
}
@@ -1214,8 +1205,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
- val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK
- << STRTAB_STE_0_S1CTXPTR_SHIFT) |
+ val |= (ste->s1_cfg->cdptr_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
STRTAB_STE_0_CFG_S1_TRANS;
}
@@ -1232,7 +1222,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_device *smmu, u32 sid,
STRTAB_STE_2_S2R);
dst[3] = cpu_to_le64(ste->s2_cfg->vttbr &
- STRTAB_STE_3_S2TTB_MASK << STRTAB_STE_3_S2TTB_SHIFT);
+ STRTAB_STE_3_S2TTB_MASK);
val |= STRTAB_STE_0_CFG_S2_TRANS;
}
@@ -1337,7 +1327,7 @@ static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt)
evt[0] & PRIQ_0_PERM_READ ? "R" : "",
evt[0] & PRIQ_0_PERM_WRITE ? "W" : "",
evt[0] & PRIQ_0_PERM_EXEC ? "X" : "",
- evt[1] & PRIQ_1_ADDR_MASK << PRIQ_1_ADDR_SHIFT);
+ evt[1] & PRIQ_1_ADDR_MASK);
if (last) {
struct arm_smmu_cmdq_ent cmd = {
@@ -2093,7 +2083,7 @@ static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu,
q->ent_dwords = dwords;
q->q_base = Q_BASE_RWA;
- q->q_base |= q->base_dma & Q_BASE_ADDR_MASK << Q_BASE_ADDR_SHIFT;
+ q->q_base |= q->base_dma & Q_BASE_ADDR_MASK;
q->q_base |= (q->max_n_shift & Q_BASE_LOG2SIZE_MASK)
<< Q_BASE_LOG2SIZE_SHIFT;
@@ -2230,8 +2220,7 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
return ret;
/* Set the strtab base address */
- reg = smmu->strtab_cfg.strtab_dma &
- STRTAB_BASE_ADDR_MASK << STRTAB_BASE_ADDR_SHIFT;
+ reg = smmu->strtab_cfg.strtab_dma & STRTAB_BASE_ADDR_MASK;
reg |= STRTAB_BASE_RA;
smmu->strtab_cfg.strtab_base = reg;
@@ -2294,7 +2283,7 @@ static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg)
phys_addr_t *cfg = arm_smmu_msi_cfg[desc->platform.msi_index];
doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo;
- doorbell &= MSI_CFG0_ADDR_MASK << MSI_CFG0_ADDR_SHIFT;
+ doorbell &= MSI_CFG0_ADDR_MASK;
writeq_relaxed(doorbell, smmu->base + cfg[0]);
writel_relaxed(msg->data, smmu->base + cfg[1]);
--
2.13.4.dirty
* [PATCH v2 2/4] iommu/io-pgtable-arm: Support 52-bit physical address
2017-12-14 16:58 ` Robin Murphy
@ 2017-12-14 16:58 ` Robin Murphy
0 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2017-12-14 16:58 UTC (permalink / raw)
To: will.deacon@arm.com
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
Bring io-pgtable-arm in line with the ARMv8.2-LPA feature allowing
52-bit physical addresses when using the 64KB translation granule.
This will be supported by SMMUv3.1.
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
v2: Fix TCR_PS/TCR_IPS copy-paste error
drivers/iommu/io-pgtable-arm.c | 65 ++++++++++++++++++++++++++++++------------
1 file changed, 47 insertions(+), 18 deletions(-)
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 51e5c43caed1..10d888b02948 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -21,6 +21,7 @@
#define pr_fmt(fmt) "arm-lpae io-pgtable: " fmt
#include <linux/atomic.h>
+#include <linux/bitops.h>
#include <linux/iommu.h>
#include <linux/kernel.h>
#include <linux/sizes.h>
@@ -32,7 +33,7 @@
#include "io-pgtable.h"
-#define ARM_LPAE_MAX_ADDR_BITS 48
+#define ARM_LPAE_MAX_ADDR_BITS 52
#define ARM_LPAE_S2_MAX_CONCAT_PAGES 16
#define ARM_LPAE_MAX_LEVELS 4
@@ -86,6 +87,8 @@
#define ARM_LPAE_PTE_TYPE_TABLE 3
#define ARM_LPAE_PTE_TYPE_PAGE 3
+#define ARM_LPAE_PTE_ADDR_MASK GENMASK_ULL(47,12)
+
#define ARM_LPAE_PTE_NSTABLE (((arm_lpae_iopte)1) << 63)
#define ARM_LPAE_PTE_XN (((arm_lpae_iopte)3) << 53)
#define ARM_LPAE_PTE_AF (((arm_lpae_iopte)1) << 10)
@@ -159,6 +162,7 @@
#define ARM_LPAE_TCR_PS_42_BIT 0x3ULL
#define ARM_LPAE_TCR_PS_44_BIT 0x4ULL
#define ARM_LPAE_TCR_PS_48_BIT 0x5ULL
+#define ARM_LPAE_TCR_PS_52_BIT 0x6ULL
#define ARM_LPAE_MAIR_ATTR_SHIFT(n) ((n) << 3)
#define ARM_LPAE_MAIR_ATTR_MASK 0xff
@@ -170,9 +174,7 @@
#define ARM_LPAE_MAIR_ATTR_IDX_DEV 2
/* IOPTE accessors */
-#define iopte_deref(pte,d) \
- (__va((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1) \
- & ~(ARM_LPAE_GRANULE(d) - 1ULL)))
+#define iopte_deref(pte,d) __va(iopte_to_paddr(pte, d))
#define iopte_type(pte,l) \
(((pte) >> ARM_LPAE_PTE_TYPE_SHIFT) & ARM_LPAE_PTE_TYPE_MASK)
@@ -184,12 +186,6 @@
(iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_PAGE) : \
(iopte_type(pte,l) == ARM_LPAE_PTE_TYPE_BLOCK))
-#define iopte_to_pfn(pte,d) \
- (((pte) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1)) >> (d)->pg_shift)
-
-#define pfn_to_iopte(pfn,d) \
- (((pfn) << (d)->pg_shift) & ((1ULL << ARM_LPAE_MAX_ADDR_BITS) - 1))
-
struct arm_lpae_io_pgtable {
struct io_pgtable iop;
@@ -203,6 +199,25 @@ struct arm_lpae_io_pgtable {
typedef u64 arm_lpae_iopte;
+static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
+ struct arm_lpae_io_pgtable *data)
+{
+ arm_lpae_iopte pte = paddr;
+
+ /* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
+ return (pte | (pte >> 36)) & ARM_LPAE_PTE_ADDR_MASK;
+}
+
+static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
+ struct arm_lpae_io_pgtable *data)
+{
+ phys_addr_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
+ phys_addr_t paddr_hi = paddr & (ARM_LPAE_GRANULE(data) - 1);
+
+ /* paddr_hi spans nothing for 4K granule, and only RES0 bits for 16K */
+ return (paddr ^ paddr_hi) | (paddr_hi << 36);
+}
+
static bool selftest_running = false;
static dma_addr_t __arm_lpae_dma_addr(void *pages)
@@ -287,7 +302,7 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
pte |= ARM_LPAE_PTE_TYPE_BLOCK;
pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS;
- pte |= pfn_to_iopte(paddr >> data->pg_shift, data);
+ pte |= paddr_to_iopte(paddr, data);
__arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
}
@@ -528,7 +543,7 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
if (size == split_sz)
unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
- blk_paddr = iopte_to_pfn(blk_pte, data) << data->pg_shift;
+ blk_paddr = iopte_to_paddr(blk_pte, data);
pte = iopte_prot(blk_pte);
for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
@@ -652,12 +667,13 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
found_translation:
iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
- return ((phys_addr_t)iopte_to_pfn(pte,data) << data->pg_shift) | iova;
+ return iopte_to_paddr(pte, data) | iova;
}
static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
{
- unsigned long granule;
+ unsigned long granule, page_sizes;
+ unsigned int max_addr_bits = 48;
/*
* We need to restrict the supported page sizes to match the
@@ -677,17 +693,24 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
switch (granule) {
case SZ_4K:
- cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
+ page_sizes = (SZ_4K | SZ_2M | SZ_1G);
break;
case SZ_16K:
- cfg->pgsize_bitmap &= (SZ_16K | SZ_32M);
+ page_sizes = (SZ_16K | SZ_32M);
break;
case SZ_64K:
- cfg->pgsize_bitmap &= (SZ_64K | SZ_512M);
+ max_addr_bits = 52;
+ page_sizes = (SZ_64K | SZ_512M);
+ if (cfg->oas > 48)
+ page_sizes |= 1ULL << 42; /* 4TB */
break;
default:
- cfg->pgsize_bitmap = 0;
+ page_sizes = 0;
}
+
+ cfg->pgsize_bitmap &= page_sizes;
+ cfg->ias = min(cfg->ias, max_addr_bits);
+ cfg->oas = min(cfg->oas, max_addr_bits);
}
static struct arm_lpae_io_pgtable *
@@ -784,6 +807,9 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
case 48:
reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_IPS_SHIFT);
break;
+ case 52:
+ reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_IPS_SHIFT);
+ break;
default:
goto out_free_data;
}
@@ -891,6 +917,9 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie)
case 48:
reg |= (ARM_LPAE_TCR_PS_48_BIT << ARM_LPAE_TCR_PS_SHIFT);
break;
+ case 52:
+ reg |= (ARM_LPAE_TCR_PS_52_BIT << ARM_LPAE_TCR_PS_SHIFT);
+ break;
default:
goto out_free_data;
}
--
2.13.4.dirty
* [PATCH v2 3/4] iommu/arm-smmu-v3: Support 52-bit physical address
2017-12-14 16:58 ` Robin Murphy
@ 2017-12-14 16:58 ` Robin Murphy
0 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2017-12-14 16:58 UTC (permalink / raw)
To: will.deacon@arm.com
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
Implement SMMUv3.1 support for 52-bit physical addresses. Since a 52-bit
OAS implies 64KB translation granule support, permitting level 1 block
entries there is simple, and the rest is just extending address fields.
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
v2: No change
drivers/iommu/arm-smmu-v3.c | 35 ++++++++++++++++++++---------------
1 file changed, 20 insertions(+), 15 deletions(-)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 52cad776b31b..c9c4e6132e27 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -101,6 +101,7 @@
#define IDR5_OAS_42_BIT (3 << IDR5_OAS_SHIFT)
#define IDR5_OAS_44_BIT (4 << IDR5_OAS_SHIFT)
#define IDR5_OAS_48_BIT (5 << IDR5_OAS_SHIFT)
+#define IDR5_OAS_52_BIT (6 << IDR5_OAS_SHIFT)
#define ARM_SMMU_CR0 0x20
#define CR0_CMDQEN (1 << 3)
@@ -159,7 +160,7 @@
#define ARM_SMMU_STRTAB_BASE 0x80
#define STRTAB_BASE_RA (1UL << 62)
-#define STRTAB_BASE_ADDR_MASK GENMASK_ULL(47, 6)
+#define STRTAB_BASE_ADDR_MASK GENMASK_ULL(51, 6)
#define ARM_SMMU_STRTAB_BASE_CFG 0x88
#define STRTAB_BASE_CFG_LOG2SIZE_SHIFT 0
@@ -190,7 +191,7 @@
#define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc
/* Common MSI config fields */
-#define MSI_CFG0_ADDR_MASK GENMASK_ULL(47, 2)
+#define MSI_CFG0_ADDR_MASK GENMASK_ULL(51, 2)
#define MSI_CFG2_SH_SHIFT 4
#define MSI_CFG2_SH_NSH (0UL << MSI_CFG2_SH_SHIFT)
#define MSI_CFG2_SH_OSH (2UL << MSI_CFG2_SH_SHIFT)
@@ -206,7 +207,7 @@
Q_IDX(q, p) * (q)->ent_dwords)
#define Q_BASE_RWA (1UL << 62)
-#define Q_BASE_ADDR_MASK GENMASK_ULL(47, 5)
+#define Q_BASE_ADDR_MASK GENMASK_ULL(51, 5)
#define Q_BASE_LOG2SIZE_SHIFT 0
#define Q_BASE_LOG2SIZE_MASK 0x1fUL
@@ -223,7 +224,7 @@
#define STRTAB_L1_DESC_DWORDS 1
#define STRTAB_L1_DESC_SPAN_SHIFT 0
#define STRTAB_L1_DESC_SPAN_MASK 0x1fUL
-#define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(47, 6)
+#define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 6)
#define STRTAB_STE_DWORDS 8
#define STRTAB_STE_0_V (1UL << 0)
@@ -236,7 +237,7 @@
#define STRTAB_STE_0_S1FMT_SHIFT 4
#define STRTAB_STE_0_S1FMT_LINEAR (0UL << STRTAB_STE_0_S1FMT_SHIFT)
-#define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(47, 6)
+#define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(51, 6)
#define STRTAB_STE_0_S1CDMAX_SHIFT 59
#define STRTAB_STE_0_S1CDMAX_MASK 0x1fUL
@@ -274,7 +275,7 @@
#define STRTAB_STE_2_S2PTW (1UL << 54)
#define STRTAB_STE_2_S2R (1UL << 58)
-#define STRTAB_STE_3_S2TTB_MASK GENMASK_ULL(47, 4)
+#define STRTAB_STE_3_S2TTB_MASK GENMASK_ULL(51, 4)
/* Context descriptor (stage-1 only) */
#define CTXDESC_CD_DWORDS 8
@@ -320,7 +321,7 @@
#define CTXDESC_CD_0_ASID_SHIFT 48
#define CTXDESC_CD_0_ASID_MASK 0xffffUL
-#define CTXDESC_CD_1_TTB0_MASK GENMASK_ULL(47, 4)
+#define CTXDESC_CD_1_TTB0_MASK GENMASK_ULL(51, 4)
#define CTXDESC_CD_3_MAIR_SHIFT 0
@@ -357,7 +358,7 @@
#define CMDQ_TLBI_0_ASID_SHIFT 48
#define CMDQ_TLBI_1_LEAF (1UL << 0)
#define CMDQ_TLBI_1_VA_MASK GENMASK_ULL(63, 12)
-#define CMDQ_TLBI_1_IPA_MASK GENMASK_ULL(47, 12)
+#define CMDQ_TLBI_1_IPA_MASK GENMASK_ULL(51, 12)
#define CMDQ_PRI_0_SSID_SHIFT 12
#define CMDQ_PRI_0_SSID_MASK 0xfffffUL
@@ -380,7 +381,7 @@
#define CMDQ_SYNC_0_MSIATTR_OIWB (0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT)
#define CMDQ_SYNC_0_MSIDATA_SHIFT 32
#define CMDQ_SYNC_0_MSIDATA_MASK 0xffffffffUL
-#define CMDQ_SYNC_1_MSIADDR_MASK GENMASK_ULL(47, 2)
+#define CMDQ_SYNC_1_MSIADDR_MASK GENMASK_ULL(51, 2)
/* Event queue */
#define EVTQ_ENT_DWORDS 4
@@ -1686,7 +1687,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
return -ENOMEM;
domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
- domain->geometry.aperture_end = (1UL << ias) - 1;
+ domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
domain->geometry.force_aperture = true;
smmu_domain->pgtbl_ops = pgtbl_ops;
@@ -2693,11 +2694,6 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
if (reg & IDR5_GRAN4K)
smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
- if (arm_smmu_ops.pgsize_bitmap == -1UL)
- arm_smmu_ops.pgsize_bitmap = smmu->pgsize_bitmap;
- else
- arm_smmu_ops.pgsize_bitmap |= smmu->pgsize_bitmap;
-
/* Output address size */
switch (reg & IDR5_OAS_MASK << IDR5_OAS_SHIFT) {
case IDR5_OAS_32_BIT:
@@ -2715,6 +2711,10 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
case IDR5_OAS_44_BIT:
smmu->oas = 44;
break;
+ case IDR5_OAS_52_BIT:
+ smmu->oas = 52;
+ smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */
+ break;
default:
dev_info(smmu->dev,
"unknown output address size. Truncating to 48-bit\n");
@@ -2723,6 +2723,11 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
smmu->oas = 48;
}
+ if (arm_smmu_ops.pgsize_bitmap == -1UL)
+ arm_smmu_ops.pgsize_bitmap = smmu->pgsize_bitmap;
+ else
+ arm_smmu_ops.pgsize_bitmap |= smmu->pgsize_bitmap;
+
/* Set the DMA mask for our table walker */
if (dma_set_mask_and_coherent(smmu->dev, DMA_BIT_MASK(smmu->oas)))
dev_warn(smmu->dev,
--
2.13.4.dirty
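[Editorial sketch] The change is almost entirely widening address fields; as a standalone illustration (a local GENMASK_ULL() stands in for the kernel macro, and the _V30/_V31 names are invented here), the stream table base field grows like so:

```c
#include <assert.h>

/* Local stand-in for the kernel's GENMASK_ULL(): bits h..l set. */
#define GENMASK_ULL(h, l) \
	((~0ULL >> (63 - (h))) & ~((1ULL << (l)) - 1))

/* Stream table base address field, before (48-bit) and after (52-bit);
 * bits 5:0 stay clear because the table is 64-byte aligned. */
#define STRTAB_BASE_ADDR_MASK_V30	GENMASK_ULL(47, 6)
#define STRTAB_BASE_ADDR_MASK_V31	GENMASK_ULL(51, 6)
```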
* [PATCH v2 4/4] iommu/arm-smmu-v3: Support 52-bit virtual address
2017-12-14 16:58 ` Robin Murphy
@ 2017-12-14 16:58 ` Robin Murphy
1 sibling, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2017-12-14 16:58 UTC (permalink / raw)
To: will.deacon@arm.com
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
Stage 1 input addresses are effectively 64-bit in SMMUv3 anyway, so
really all that's involved is letting io-pgtable know the appropriate
upper bound for T0SZ.
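[Editorial sketch] Concretely, the clamp amounts to the following standalone helper (the arm_smmu_s1_ias() name is invented for illustration; the driver does this inline in arm_smmu_domain_finalise()):

```c
#include <assert.h>

#define ARM_SMMU_FEAT_VAX	(1 << 14)

/* Stage-1 input address size selection: the CPU may be built with
 * VA_BITS == 52, but an SMMU without the VAX feature only accepts
 * 48-bit input addresses, so io-pgtable must be told the smaller
 * bound (T0SZ = 64 - ias). */
static int arm_smmu_s1_ias(int va_bits, unsigned int features)
{
	int ias = va_bits;

	if (va_bits > 48 && !(features & ARM_SMMU_FEAT_VAX))
		ias = 48;
	return ias;
}
```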
Tested-by: Nate Watterson <nwatters@codeaurora.org>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
v2: No change
drivers/iommu/arm-smmu-v3.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c9c4e6132e27..7059a0805bd1 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -102,6 +102,7 @@
#define IDR5_OAS_44_BIT (4 << IDR5_OAS_SHIFT)
#define IDR5_OAS_48_BIT (5 << IDR5_OAS_SHIFT)
#define IDR5_OAS_52_BIT (6 << IDR5_OAS_SHIFT)
+#define IDR5_VAX (1 << 10)
#define ARM_SMMU_CR0 0x20
#define CR0_CMDQEN (1 << 3)
@@ -604,6 +605,7 @@ struct arm_smmu_device {
#define ARM_SMMU_FEAT_STALLS (1 << 11)
#define ARM_SMMU_FEAT_HYP (1 << 12)
#define ARM_SMMU_FEAT_STALL_FORCE (1 << 13)
+#define ARM_SMMU_FEAT_VAX (1 << 14)
u32 features;
#define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
@@ -1656,6 +1658,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
switch (smmu_domain->stage) {
case ARM_SMMU_DOMAIN_S1:
ias = VA_BITS;
+ if (VA_BITS > 48 && !(smmu->features & ARM_SMMU_FEAT_VAX))
+ ias = 48;
oas = smmu->ias;
fmt = ARM_64_LPAE_S1;
finalise_stage_fn = arm_smmu_domain_finalise_s1;
@@ -2694,6 +2698,10 @@ static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
if (reg & IDR5_GRAN4K)
smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G;
+ /* Input address size */
+ if (reg & IDR5_VAX)
+ smmu->features |= ARM_SMMU_FEAT_VAX;
+
/* Output address size */
switch (reg & IDR5_OAS_MASK << IDR5_OAS_SHIFT) {
case IDR5_OAS_32_BIT:
--
2.13.4.dirty
* Re: [PATCH v2 0/4] SMMU 52-bit address support
2017-12-14 16:58 ` Robin Murphy
@ 2018-02-26 18:04 ` Will Deacon
1 sibling, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-02-26 18:04 UTC (permalink / raw)
To: Robin Murphy
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
Hi Robin,
On Thu, Dec 14, 2017 at 04:58:49PM +0000, Robin Murphy wrote:
> Here's a (hopefully final) update of 52-bit SMMU support. The only
> material changes from v1[1] are a small cosmetic tweak to patch #1,
> and correction of the silly error in patch #2 as reported by Nate in
> testing.
This generally looks pretty good, but I have some review comments that I
think we should address before merging.
Cheers,
Will
* Re: [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking
2017-12-14 16:58 ` Robin Murphy
@ 2018-02-26 18:04 ` Will Deacon
1 sibling, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-02-26 18:04 UTC (permalink / raw)
To: Robin Murphy
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
Hi Robin,
On Thu, Dec 14, 2017 at 04:58:50PM +0000, Robin Murphy wrote:
> Before trying to add the SMMUv3.1 support for 52-bit addresses, make
> things bearable by cleaning up the various address mask definitions to
> use GENMASK_ULL() consistently. The fact that doing so reveals (and
> fixes) a latent off-by-one in Q_BASE_ADDR_MASK only goes to show what a
> jolly good idea it is...
>
> Tested-by: Nate Watterson <nwatters@codeaurora.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>
> v2: Clean up one more now-unnecessary linewrap
>
> drivers/iommu/arm-smmu-v3.c | 53 ++++++++++++++++++---------------------------
> 1 file changed, 21 insertions(+), 32 deletions(-)
Whilst I agree that using GENMASK is better, this patch does mean that the
driver is (more) inconsistent with its _MASK terminology in that you can't
generally tell whether a definition that ends in _MASK is shifted or not,
and this isn't even consistent for fields within the same register.
Should we be using GENMASK/BIT for all fields instead and removing all of
the _SHIFT definitions?
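[Editorial sketch] For illustration, a minimal user-space sketch of the unshifted-mask style being suggested — FIELD_PREP()/FIELD_GET() here are hand-rolled stand-ins for the kernel's <linux/bitfield.h> helpers, and the SPAN field is just one example taken from the driver:

```c
#include <assert.h>
#include <stdint.h>

#define GENMASK_ULL(h, l) \
	((~0ULL >> (63 - (h))) & ~((1ULL << (l)) - 1))

/* Minimal stand-ins for FIELD_PREP()/FIELD_GET() from <linux/bitfield.h>:
 * the field's position is recovered from the mask itself, so no separate
 * _SHIFT definition is needed. */
#define FIELD_PREP(mask, val) \
	(((uint64_t)(val) << __builtin_ctzll(mask)) & (mask))
#define FIELD_GET(mask, reg) \
	(((uint64_t)(reg) & (mask)) >> __builtin_ctzll(mask))

/* e.g. the span field occupying bits 4:0 of an L1 stream table descriptor */
#define STRTAB_L1_DESC_SPAN	GENMASK_ULL(4, 0)
```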
Will
* Re: [PATCH v2 2/4] iommu/io-pgtable-arm: Support 52-bit physical address
2017-12-14 16:58 ` Robin Murphy
@ 2018-02-26 18:05 ` Will Deacon
1 sibling, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-02-26 18:05 UTC (permalink / raw)
To: Robin Murphy
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
On Thu, Dec 14, 2017 at 04:58:51PM +0000, Robin Murphy wrote:
> Bring io-pgtable-arm in line with the ARMv8.2-LPA feature allowing
> 52-bit physical addresses when using the 64KB translation granule.
> This will be supported by SMMUv3.1.
>
> Tested-by: Nate Watterson <nwatters@codeaurora.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>
> v2: Fix TCR_PS/TCR_IPS copy-paste error
>
> drivers/iommu/io-pgtable-arm.c | 65 ++++++++++++++++++++++++++++++------------
> 1 file changed, 47 insertions(+), 18 deletions(-)
[...]
> @@ -203,6 +199,25 @@ struct arm_lpae_io_pgtable {
>
> typedef u64 arm_lpae_iopte;
>
> +static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
> + struct arm_lpae_io_pgtable *data)
> +{
> + arm_lpae_iopte pte = paddr;
> +
> + /* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
> + return (pte | (pte >> 36)) & ARM_LPAE_PTE_ADDR_MASK;
> +}
I don't particularly like relying on properties of the paddr for correct
construction of the pte here. The existing macro doesn't have this
limitation. I suspect it's all fine at the moment because we only use TTBR0,
but I'd rather not bake that in if we can avoid it.
> +static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
> + struct arm_lpae_io_pgtable *data)
> +{
> + phys_addr_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
> + phys_addr_t paddr_hi = paddr & (ARM_LPAE_GRANULE(data) - 1);
> +
> + /* paddr_hi spans nothing for 4K granule, and only RES0 bits for 16K */
> + return (paddr ^ paddr_hi) | (paddr_hi << 36);
Why do we need xor here?
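[Editorial sketch] To illustrate the bit shuffling in the quoted helpers: with a 64K granule, PA[51:48] travel in pte[15:12], and since paddr_hi's set bits are a subset of paddr's, the XOR simply clears those low bits before they are restored up high. A standalone round-trip sketch, specialised to the 64K case:

```c
#include <assert.h>
#include <stdint.h>

#define GENMASK_ULL(h, l) \
	((~0ULL >> (63 - (h))) & ~((1ULL << (l)) - 1))
#define ARM_LPAE_PTE_ADDR_MASK	GENMASK_ULL(47, 12)
#define GRANULE			(1ULL << 16)	/* 64K */

static uint64_t paddr_to_iopte(uint64_t paddr)
{
	uint64_t pte = paddr;

	/* PA[51:48] land in pte[15:12]; with a 64K granule those pte
	 * bits are otherwise RES0, so the OR cannot corrupt anything. */
	return (pte | (pte >> 36)) & ARM_LPAE_PTE_ADDR_MASK;
}

static uint64_t iopte_to_paddr(uint64_t pte)
{
	uint64_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
	uint64_t paddr_hi = paddr & (GRANULE - 1);

	/* paddr_hi's set bits are a subset of paddr's, so the XOR just
	 * clears pte[15:12] before restoring them as PA[51:48]. */
	return (paddr ^ paddr_hi) | (paddr_hi << 36);
}
```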
> static bool selftest_running = false;
>
> static dma_addr_t __arm_lpae_dma_addr(void *pages)
> @@ -287,7 +302,7 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
> pte |= ARM_LPAE_PTE_TYPE_BLOCK;
>
> pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS;
> - pte |= pfn_to_iopte(paddr >> data->pg_shift, data);
> + pte |= paddr_to_iopte(paddr, data);
>
> __arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
> }
> @@ -528,7 +543,7 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
> if (size == split_sz)
> unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
>
> - blk_paddr = iopte_to_pfn(blk_pte, data) << data->pg_shift;
> + blk_paddr = iopte_to_paddr(blk_pte, data);
> pte = iopte_prot(blk_pte);
>
> for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
> @@ -652,12 +667,13 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
>
> found_translation:
> iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
> - return ((phys_addr_t)iopte_to_pfn(pte,data) << data->pg_shift) | iova;
> + return iopte_to_paddr(pte, data) | iova;
> }
>
> static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
> {
> - unsigned long granule;
> + unsigned long granule, page_sizes;
> + unsigned int max_addr_bits = 48;
>
> /*
> * We need to restrict the supported page sizes to match the
> @@ -677,17 +693,24 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
>
> switch (granule) {
> case SZ_4K:
> - cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
> + page_sizes = (SZ_4K | SZ_2M | SZ_1G);
> break;
> case SZ_16K:
> - cfg->pgsize_bitmap &= (SZ_16K | SZ_32M);
> + page_sizes = (SZ_16K | SZ_32M);
> break;
> case SZ_64K:
> - cfg->pgsize_bitmap &= (SZ_64K | SZ_512M);
> + max_addr_bits = 52;
> + page_sizes = (SZ_64K | SZ_512M);
> + if (cfg->oas > 48)
> + page_sizes |= 1ULL << 42; /* 4TB */
> break;
> default:
> - cfg->pgsize_bitmap = 0;
> + page_sizes = 0;
> }
> +
> + cfg->pgsize_bitmap &= page_sizes;
> + cfg->ias = min(cfg->ias, max_addr_bits);
> + cfg->oas = min(cfg->oas, max_addr_bits);
I don't think we should be writing to the ias/oas fields here, at least
not without auditing the drivers and updating the comments about the
io-pgtable API. For example, the SMMUv3 driver uses its own ias local
variable to initialise the domain geometry, and won't pick up any changes
made here.
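[Editorial sketch] Separately from the ias/oas point, the 64K-granule branch quoted above reduces to the following standalone sketch, showing where the new 4TB block size comes from (helper name invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

#define SZ_64K	0x00010000ULL
#define SZ_512M	0x20000000ULL

/* 64K-granule page sizes: a >48-bit OAS (SMMUv3.1 / ARMv8.2-LPA)
 * additionally permits level-1 block mappings, i.e. 2^42-byte (4TB)
 * entries. */
static uint64_t lpae_64k_page_sizes(unsigned int oas)
{
	uint64_t page_sizes = SZ_64K | SZ_512M;

	if (oas > 48)
		page_sizes |= 1ULL << 42; /* 4TB */
	return page_sizes;
}
```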
Will
* Re: [PATCH v2 3/4] iommu/arm-smmu-v3: Support 52-bit physical address
2017-12-14 16:58 ` Robin Murphy
@ 2018-02-26 18:05 ` Will Deacon
1 sibling, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-02-26 18:05 UTC (permalink / raw)
To: Robin Murphy
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
On Thu, Dec 14, 2017 at 04:58:52PM +0000, Robin Murphy wrote:
> Implement SMMUv3.1 support for 52-bit physical addresses. Since a 52-bit
> OAS implies 64KB translation granule support, permitting level 1 block
> entries there is simple, and the rest is just extending address fields.
>
> Tested-by: Nate Watterson <nwatters@codeaurora.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>
> v2: No change
>
> drivers/iommu/arm-smmu-v3.c | 35 ++++++++++++++++++++---------------
> 1 file changed, 20 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 52cad776b31b..c9c4e6132e27 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -101,6 +101,7 @@
> #define IDR5_OAS_42_BIT (3 << IDR5_OAS_SHIFT)
> #define IDR5_OAS_44_BIT (4 << IDR5_OAS_SHIFT)
> #define IDR5_OAS_48_BIT (5 << IDR5_OAS_SHIFT)
> +#define IDR5_OAS_52_BIT (6 << IDR5_OAS_SHIFT)
>
> #define ARM_SMMU_CR0 0x20
> #define CR0_CMDQEN (1 << 3)
> @@ -159,7 +160,7 @@
>
> #define ARM_SMMU_STRTAB_BASE 0x80
> #define STRTAB_BASE_RA (1UL << 62)
> -#define STRTAB_BASE_ADDR_MASK GENMASK_ULL(47, 6)
> +#define STRTAB_BASE_ADDR_MASK GENMASK_ULL(51, 6)
>
> #define ARM_SMMU_STRTAB_BASE_CFG 0x88
> #define STRTAB_BASE_CFG_LOG2SIZE_SHIFT 0
> @@ -190,7 +191,7 @@
> #define ARM_SMMU_PRIQ_IRQ_CFG2 0xdc
>
> /* Common MSI config fields */
> -#define MSI_CFG0_ADDR_MASK GENMASK_ULL(47, 2)
> +#define MSI_CFG0_ADDR_MASK GENMASK_ULL(51, 2)
> #define MSI_CFG2_SH_SHIFT 4
> #define MSI_CFG2_SH_NSH (0UL << MSI_CFG2_SH_SHIFT)
> #define MSI_CFG2_SH_OSH (2UL << MSI_CFG2_SH_SHIFT)
> @@ -206,7 +207,7 @@
> Q_IDX(q, p) * (q)->ent_dwords)
>
> #define Q_BASE_RWA (1UL << 62)
> -#define Q_BASE_ADDR_MASK GENMASK_ULL(47, 5)
> +#define Q_BASE_ADDR_MASK GENMASK_ULL(51, 5)
> #define Q_BASE_LOG2SIZE_SHIFT 0
> #define Q_BASE_LOG2SIZE_MASK 0x1fUL
>
> @@ -223,7 +224,7 @@
> #define STRTAB_L1_DESC_DWORDS 1
> #define STRTAB_L1_DESC_SPAN_SHIFT 0
> #define STRTAB_L1_DESC_SPAN_MASK 0x1fUL
> -#define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(47, 6)
> +#define STRTAB_L1_DESC_L2PTR_MASK GENMASK_ULL(51, 6)
>
> #define STRTAB_STE_DWORDS 8
> #define STRTAB_STE_0_V (1UL << 0)
> @@ -236,7 +237,7 @@
>
> #define STRTAB_STE_0_S1FMT_SHIFT 4
> #define STRTAB_STE_0_S1FMT_LINEAR (0UL << STRTAB_STE_0_S1FMT_SHIFT)
> -#define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(47, 6)
> +#define STRTAB_STE_0_S1CTXPTR_MASK GENMASK_ULL(51, 6)
> #define STRTAB_STE_0_S1CDMAX_SHIFT 59
> #define STRTAB_STE_0_S1CDMAX_MASK 0x1fUL
>
> @@ -274,7 +275,7 @@
> #define STRTAB_STE_2_S2PTW (1UL << 54)
> #define STRTAB_STE_2_S2R (1UL << 58)
>
> -#define STRTAB_STE_3_S2TTB_MASK GENMASK_ULL(47, 4)
> +#define STRTAB_STE_3_S2TTB_MASK GENMASK_ULL(51, 4)
>
> /* Context descriptor (stage-1 only) */
> #define CTXDESC_CD_DWORDS 8
> @@ -320,7 +321,7 @@
> #define CTXDESC_CD_0_ASID_SHIFT 48
> #define CTXDESC_CD_0_ASID_MASK 0xffffUL
>
> -#define CTXDESC_CD_1_TTB0_MASK GENMASK_ULL(47, 4)
> +#define CTXDESC_CD_1_TTB0_MASK GENMASK_ULL(51, 4)
>
> #define CTXDESC_CD_3_MAIR_SHIFT 0
>
> @@ -357,7 +358,7 @@
> #define CMDQ_TLBI_0_ASID_SHIFT 48
> #define CMDQ_TLBI_1_LEAF (1UL << 0)
> #define CMDQ_TLBI_1_VA_MASK GENMASK_ULL(63, 12)
> -#define CMDQ_TLBI_1_IPA_MASK GENMASK_ULL(47, 12)
> +#define CMDQ_TLBI_1_IPA_MASK GENMASK_ULL(51, 12)
>
> #define CMDQ_PRI_0_SSID_SHIFT 12
> #define CMDQ_PRI_0_SSID_MASK 0xfffffUL
> @@ -380,7 +381,7 @@
> #define CMDQ_SYNC_0_MSIATTR_OIWB (0xfUL << CMDQ_SYNC_0_MSIATTR_SHIFT)
> #define CMDQ_SYNC_0_MSIDATA_SHIFT 32
> #define CMDQ_SYNC_0_MSIDATA_MASK 0xffffffffUL
> -#define CMDQ_SYNC_1_MSIADDR_MASK GENMASK_ULL(47, 2)
> +#define CMDQ_SYNC_1_MSIADDR_MASK GENMASK_ULL(51, 2)
>
> /* Event queue */
> #define EVTQ_ENT_DWORDS 4
> @@ -1686,7 +1687,7 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
> return -ENOMEM;
>
> domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
> - domain->geometry.aperture_end = (1UL << ias) - 1;
> + domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
Ah good, you fixed this :)
Will
> + domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
Ah good, you fixed this :)
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2 4/4] iommu/arm-smmu-v3: Support 52-bit virtual address
2017-12-14 16:58 ` Robin Murphy
@ 2018-02-26 18:05 ` Will Deacon
0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-02-26 18:05 UTC (permalink / raw)
To: Robin Murphy
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
On Thu, Dec 14, 2017 at 04:58:53PM +0000, Robin Murphy wrote:
> Stage 1 input addresses are effectively 64-bit in SMMUv3 anyway, so
> really all that's involved is letting io-pgtable know the appropriate
> upper bound for T0SZ.
>
> Tested-by: Nate Watterson <nwatters@codeaurora.org>
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>
> v2: No change
>
> drivers/iommu/arm-smmu-v3.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index c9c4e6132e27..7059a0805bd1 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -102,6 +102,7 @@
> #define IDR5_OAS_44_BIT (4 << IDR5_OAS_SHIFT)
> #define IDR5_OAS_48_BIT (5 << IDR5_OAS_SHIFT)
> #define IDR5_OAS_52_BIT (6 << IDR5_OAS_SHIFT)
> +#define IDR5_VAX (1 << 10)
I think this is actually a two-bit field, so we should check the whole
thing.
>
> #define ARM_SMMU_CR0 0x20
> #define CR0_CMDQEN (1 << 3)
> @@ -604,6 +605,7 @@ struct arm_smmu_device {
> #define ARM_SMMU_FEAT_STALLS (1 << 11)
> #define ARM_SMMU_FEAT_HYP (1 << 12)
> #define ARM_SMMU_FEAT_STALL_FORCE (1 << 13)
> +#define ARM_SMMU_FEAT_VAX (1 << 14)
> u32 features;
>
> #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
> @@ -1656,6 +1658,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
> switch (smmu_domain->stage) {
> case ARM_SMMU_DOMAIN_S1:
> ias = VA_BITS;
> + if (VA_BITS > 48 && !(smmu->features & ARM_SMMU_FEAT_VAX))
nit, but I'd rather the if condition was checking ias instead of VA_BITS.
> + ias = 48;
> oas = smmu->ias;
Should we also be sanitising the oas here if the SMMU doesn't support 64k
pages? It looks like the ias/oas sanitisation isn't cleanly split between
the pgtable code and the SMMU driver.
Will
* Re: [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking
2018-02-26 18:04 ` Will Deacon
@ 2018-02-27 13:28 ` Robin Murphy
0 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2018-02-27 13:28 UTC (permalink / raw)
To: Will Deacon
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
Hi Will,
On 26/02/18 18:04, Will Deacon wrote:
> Hi Robin,
>
> On Thu, Dec 14, 2017 at 04:58:50PM +0000, Robin Murphy wrote:
>> Before trying to add the SMMUv3.1 support for 52-bit addresses, make
>> things bearable by cleaning up the various address mask definitions to
>> use GENMASK_ULL() consistently. The fact that doing so reveals (and
>> fixes) a latent off-by-one in Q_BASE_ADDR_MASK only goes to show what a
>> jolly good idea it is...
>>
>> Tested-by: Nate Watterson <nwatters@codeaurora.org>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>
>> v2: Clean up one more now-unnecessary linewrap
>>
>> drivers/iommu/arm-smmu-v3.c | 53 ++++++++++++++++++---------------------------
>> 1 file changed, 21 insertions(+), 32 deletions(-)
>
> Whilst I agree that using GENMASK is better, this patch does mean that the
> driver is (more) inconsistent with its _MASK terminology in that you can't
> generally tell whether a definition that ends in _MASK is shifted or not,
> and this isn't even consistent for fields within the same register.
The apparently slightly-less-than-obvious internal consistency is that
every mask used for an *address field* is now in-place, while other
types of field are still handled as inconsistently as they were before.
It should also be the case that every x_MASK without a corresponding
x_SHIFT defined next to it is unshifted.
Either way it's certainly no *worse* than the current situation where
address masks sometimes have a nonzero shift, sometimes have zero bits
at the bottom and a shift of 0, and sometimes have no shift defined at all.
Thinking about it some more, the address masks should only ever be
needed when *extracting* an address from a register/structure word, or
validating them in the context of an address *before* inserting into a
field - if we can't trust input to be correct then just silently masking
off bits probably isn't the best idea either way - so IMHO there is
plenty of contextual disambiguation too.
> Should we be using GENMASK/BIT for all fields instead and removing all of
> the _SHIFT definitions?
I'm all aboard using BIT() consistently for single-bit boolean fields,
but for multi-bit fields in general we do have to keep an explicit shift
defined *somewhere* in order to make sensible use of the value, i.e. either:
val = (reg >> 22) & 0x1f;
reg = (val & 0x1f) << 22;
or:
val = (reg & 0x07c00000) >> 22;
reg = (val << 22) & 0x07c00000;
[ but ideally not this mess we currently have in some places:
val = (reg & 0x1f << 22) >> 22;
]
Again, I'd gladly clean everything up to at least be self-consistent
(and line up more with how we did things in SMMUv2) if you think it's
worthwhile. Although I guess that means I'd get the job of fixing up
future stable backport conflicts too ;)
Robin.
* Re: [PATCH v2 2/4] iommu/io-pgtable-arm: Support 52-bit physical address
2018-02-26 18:05 ` Will Deacon
@ 2018-02-27 13:49 ` Robin Murphy
0 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2018-02-27 13:49 UTC (permalink / raw)
To: Will Deacon
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
On 26/02/18 18:05, Will Deacon wrote:
> On Thu, Dec 14, 2017 at 04:58:51PM +0000, Robin Murphy wrote:
>> Bring io-pgtable-arm in line with the ARMv8.2-LPA feature allowing
>> 52-bit physical addresses when using the 64KB translation granule.
>> This will be supported by SMMUv3.1.
>>
>> Tested-by: Nate Watterson <nwatters@codeaurora.org>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>
>> v2: Fix TCR_PS/TCR_IPS copy-paste error
>>
>> drivers/iommu/io-pgtable-arm.c | 65 ++++++++++++++++++++++++++++++------------
>> 1 file changed, 47 insertions(+), 18 deletions(-)
>
> [...]
>
>> @@ -203,6 +199,25 @@ struct arm_lpae_io_pgtable {
>>
>> typedef u64 arm_lpae_iopte;
>>
>> +static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
>> + struct arm_lpae_io_pgtable *data)
>> +{
>> + arm_lpae_iopte pte = paddr;
>> +
>> + /* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
>> + return (pte | (pte >> 36)) & ARM_LPAE_PTE_ADDR_MASK;
>> +}
>
> I don't particularly like relying on properties of the paddr for correct
> construction of the pte here. The existing macro doesn't have this
> limitation. I suspect it's all fine at the moment because we only use TTBR0,
> but I'd rather not bake that in if we can avoid it.
What's the relevance of TTBR0 to physical addresses? :/
Note that by this point paddr has been validated against cfg->oas by
arm_lpae_map(), and validated to be granule-aligned by iommu_map(), so
it really can't be wrong under reasonable conditions.
>> +static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
>> + struct arm_lpae_io_pgtable *data)
>> +{
>> + phys_addr_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
>> + phys_addr_t paddr_hi = paddr & (ARM_LPAE_GRANULE(data) - 1);
>> +
>> + /* paddr_hi spans nothing for 4K granule, and only RES0 bits for 16K */
>> + return (paddr ^ paddr_hi) | (paddr_hi << 36);
>
> Why do we need xor here?
Because "(paddr ^ paddr_hi)" is more concise than "(paddr &
~(phys_addr_t)(ARM_LPAE_GRANULE(data) - 1))" or variants thereof. It's
potentially a teeny bit more efficient too, I think, but it's mostly
about the readability.
>> static bool selftest_running = false;
>>
>> static dma_addr_t __arm_lpae_dma_addr(void *pages)
>> @@ -287,7 +302,7 @@ static void __arm_lpae_init_pte(struct arm_lpae_io_pgtable *data,
>> pte |= ARM_LPAE_PTE_TYPE_BLOCK;
>>
>> pte |= ARM_LPAE_PTE_AF | ARM_LPAE_PTE_SH_IS;
>> - pte |= pfn_to_iopte(paddr >> data->pg_shift, data);
>> + pte |= paddr_to_iopte(paddr, data);
>>
>> __arm_lpae_set_pte(ptep, pte, &data->iop.cfg);
>> }
>> @@ -528,7 +543,7 @@ static int arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data,
>> if (size == split_sz)
>> unmap_idx = ARM_LPAE_LVL_IDX(iova, lvl, data);
>>
>> - blk_paddr = iopte_to_pfn(blk_pte, data) << data->pg_shift;
>> + blk_paddr = iopte_to_paddr(blk_pte, data);
>> pte = iopte_prot(blk_pte);
>>
>> for (i = 0; i < tablesz / sizeof(pte); i++, blk_paddr += split_sz) {
>> @@ -652,12 +667,13 @@ static phys_addr_t arm_lpae_iova_to_phys(struct io_pgtable_ops *ops,
>>
>> found_translation:
>> iova &= (ARM_LPAE_BLOCK_SIZE(lvl, data) - 1);
>> - return ((phys_addr_t)iopte_to_pfn(pte,data) << data->pg_shift) | iova;
>> + return iopte_to_paddr(pte, data) | iova;
>> }
>>
>> static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
>> {
>> - unsigned long granule;
>> + unsigned long granule, page_sizes;
>> + unsigned int max_addr_bits = 48;
>>
>> /*
>> * We need to restrict the supported page sizes to match the
>> @@ -677,17 +693,24 @@ static void arm_lpae_restrict_pgsizes(struct io_pgtable_cfg *cfg)
>>
>> switch (granule) {
>> case SZ_4K:
>> - cfg->pgsize_bitmap &= (SZ_4K | SZ_2M | SZ_1G);
>> + page_sizes = (SZ_4K | SZ_2M | SZ_1G);
>> break;
>> case SZ_16K:
>> - cfg->pgsize_bitmap &= (SZ_16K | SZ_32M);
>> + page_sizes = (SZ_16K | SZ_32M);
>> break;
>> case SZ_64K:
>> - cfg->pgsize_bitmap &= (SZ_64K | SZ_512M);
>> + max_addr_bits = 52;
>> + page_sizes = (SZ_64K | SZ_512M);
>> + if (cfg->oas > 48)
>> + page_sizes |= 1ULL << 42; /* 4TB */
>> break;
>> default:
>> - cfg->pgsize_bitmap = 0;
>> + page_sizes = 0;
>> }
>> +
>> + cfg->pgsize_bitmap &= page_sizes;
>> + cfg->ias = min(cfg->ias, max_addr_bits);
>> + cfg->oas = min(cfg->oas, max_addr_bits);
>
> I don't think we should be writing to the ias/oas fields here, at least
> not without auditing the drivers and updating the comments about the
> io-pgtable API. For example, the SMMUv3 driver uses its own ias local
> variable to initialise the domain geometry, and won't pick up any changes
> made here.
As you've discovered, the driver thing is indeed true. More generally,
though, we've always adjusted cfg->pgsize_bitmap here, so I don't see
why other fields should be treated differently - I've always assumed the
cfg which the driver passes in here just represents its total maximum
capabilities, from which it's io-pgtable's job to pick an appropriate
configuration.
Robin.
* Re: [PATCH v2 4/4] iommu/arm-smmu-v3: Support 52-bit virtual address
2018-02-26 18:05 ` Will Deacon
@ 2018-02-27 13:57 ` Robin Murphy
0 siblings, 0 replies; 30+ messages in thread
From: Robin Murphy @ 2018-02-27 13:57 UTC (permalink / raw)
To: Will Deacon
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
On 26/02/18 18:05, Will Deacon wrote:
> On Thu, Dec 14, 2017 at 04:58:53PM +0000, Robin Murphy wrote:
>> Stage 1 input addresses are effectively 64-bit in SMMUv3 anyway, so
>> really all that's involved is letting io-pgtable know the appropriate
>> upper bound for T0SZ.
>>
>> Tested-by: Nate Watterson <nwatters@codeaurora.org>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>
>> v2: No change
>>
>> drivers/iommu/arm-smmu-v3.c | 8 ++++++++
>> 1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
>> index c9c4e6132e27..7059a0805bd1 100644
>> --- a/drivers/iommu/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm-smmu-v3.c
>> @@ -102,6 +102,7 @@
>> #define IDR5_OAS_44_BIT (4 << IDR5_OAS_SHIFT)
>> #define IDR5_OAS_48_BIT (5 << IDR5_OAS_SHIFT)
>> #define IDR5_OAS_52_BIT (6 << IDR5_OAS_SHIFT)
>> +#define IDR5_VAX (1 << 10)
>
> I think this is actually a two-bit field, so we should check the whole
> thing.
Yes, I either overlooked that or took a cheeky shortcut; will fix.
>>
>> #define ARM_SMMU_CR0 0x20
>> #define CR0_CMDQEN (1 << 3)
>> @@ -604,6 +605,7 @@ struct arm_smmu_device {
>> #define ARM_SMMU_FEAT_STALLS (1 << 11)
>> #define ARM_SMMU_FEAT_HYP (1 << 12)
>> #define ARM_SMMU_FEAT_STALL_FORCE (1 << 13)
>> +#define ARM_SMMU_FEAT_VAX (1 << 14)
>> u32 features;
>>
>> #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
>> @@ -1656,6 +1658,8 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
>> switch (smmu_domain->stage) {
>> case ARM_SMMU_DOMAIN_S1:
>> ias = VA_BITS;
>> + if (VA_BITS > 48 && !(smmu->features & ARM_SMMU_FEAT_VAX))
>
> nit, but I'd rather the if condition was checking ias instead of VA_BITS.
>
>> + ias = 48;
>> oas = smmu->ias;
>
> Should we also be sanitising the oas here if the SMMU doesn't support 64k
> pages? It looks like the ias/oas sanitisation isn't cleanly split between
> the pgtable code and the SMMU driver.
Right, based on my previous argument it would be cleaner to set 48/52
here based on the SMMU features, then constrain to VA_BITS in
io-pgtable; I think I was more focused on minimising the diff initially.
Robin.
* Re: [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking
2018-02-27 13:28 ` Robin Murphy
@ 2018-03-06 16:02 ` Will Deacon
0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-03-06 16:02 UTC (permalink / raw)
To: Robin Murphy
Cc: iommu@lists.linux-foundation.org,
kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org,
steve.capper@arm.com
Hi Robin,
On Tue, Feb 27, 2018 at 01:28:31PM +0000, Robin Murphy wrote:
> On 26/02/18 18:04, Will Deacon wrote:
> >On Thu, Dec 14, 2017 at 04:58:50PM +0000, Robin Murphy wrote:
> >>Before trying to add the SMMUv3.1 support for 52-bit addresses, make
> >>things bearable by cleaning up the various address mask definitions to
> >>use GENMASK_ULL() consistently. The fact that doing so reveals (and
> >>fixes) a latent off-by-one in Q_BASE_ADDR_MASK only goes to show what a
> >>jolly good idea it is...
> >>
> >>Tested-by: Nate Watterson <nwatters@codeaurora.org>
> >>Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>---
> >>
> >>v2: Clean up one more now-unnecessary linewrap
> >>
> >> drivers/iommu/arm-smmu-v3.c | 53 ++++++++++++++++++---------------------------
> >> 1 file changed, 21 insertions(+), 32 deletions(-)
> >
> >Whilst I agree that using GENMASK is better, this patch does mean that the
* Re: [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking
@ 2018-03-06 16:02 ` Will Deacon
0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-03-06 16:02 UTC (permalink / raw)
To: linux-arm-kernel
Hi Robin,
On Tue, Feb 27, 2018 at 01:28:31PM +0000, Robin Murphy wrote:
> On 26/02/18 18:04, Will Deacon wrote:
> >On Thu, Dec 14, 2017 at 04:58:50PM +0000, Robin Murphy wrote:
> >>Before trying to add the SMMUv3.1 support for 52-bit addresses, make
> >>things bearable by cleaning up the various address mask definitions to
> >>use GENMASK_ULL() consistently. The fact that doing so reveals (and
> >>fixes) a latent off-by-one in Q_BASE_ADDR_MASK only goes to show what a
> >>jolly good idea it is...
> >>
> >>Tested-by: Nate Watterson <nwatters@codeaurora.org>
> >>Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>---
> >>
> >>v2: Clean up one more now-unnecessary linewrap
> >>
> >> drivers/iommu/arm-smmu-v3.c | 53 ++++++++++++++++++---------------------------
> >> 1 file changed, 21 insertions(+), 32 deletions(-)
> >
> >Whilst I agree that using GENMASK is better, this patch does mean that the
> >driver is (more) inconsistent with its _MASK terminology in that you can't
> >generally tell whether a definition that ends in _MASK is shifted or not,
> >and this isn't even consistent for fields within the same register.
>
> The apparently slightly-less-than-obvious internal consistency is that every
> mask used for an *address field* is now in-place, while other types of field
> are still handled as inconsistently as they were before. It should also be
> the case that every x_MASK without a corresponding x_SHIFT defined next to
> it is unshifted.
>
> Either way it's certainly no *worse* than the current situation where
> address masks sometimes have a nonzero shift, sometimes have zero bits at
> the bottom and a shift of 0, and sometimes have no shift defined at all.
>
> Thinking about it some more, the address masks should only ever be needed
> when *extracting* an address from a register/structure word, or validating
> them in the context of an address *before* inserting into a field - if we
> can't trust input to be correct then just silently masking off bits probably
> isn't the best idea either way - so IMHO there is plenty of contextual
> disambiguation too.
>
> >Should we be using GENMASK/BIT for all fields instead and removing all of
> >the _SHIFT definitions?
>
> I'm all aboard using BIT() consistently for single-bit boolean fields, but
> for multi-bit fields in general we do have to keep an explicit shift defined
> *somewhere* in order to make sensible use of the value, i.e. either:
>
> val = (reg >> 22) & 0x1f;
> reg = (val & 0x1f) << 22;
>
> or:
> val = (reg & 0x07c00000) >> 22;
> reg = (val << 22) & 0x07c00000;
>
> [ but ideally not this mess we currently have in some places:
>
> val = (reg & 0x1f << 22) >> 22;
> ]
>
> Again, I'd gladly clean everything up to at least be self-consistent (and
> line up more with how we did things in SMMUv2) if you think it's worthwhile.
> Although I guess that means I'd get the job of fixing up future stable
> backport conflicts too ;)
I reckon it would be worth the cleanup since you're in the area. I don't
mind keeping the SHIFT definitions where they're needed, but using BIT and
GENMASK wherever we can.
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: [PATCH v2 2/4] iommu/io-pgtable-arm: Support 52-bit physical address
2018-02-27 13:49 ` Robin Murphy
@ 2018-03-06 17:54 ` Will Deacon
0 siblings, 0 replies; 30+ messages in thread
From: Will Deacon @ 2018-03-06 17:54 UTC (permalink / raw)
To: Robin Murphy
Cc: iommu@lists.linux-foundation.org, kristina.martsenko@arm.com,
linux-arm-kernel@lists.infradead.org, steve.capper@arm.com
Hey Robin,
On Tue, Feb 27, 2018 at 01:49:14PM +0000, Robin Murphy wrote:
> On 26/02/18 18:05, Will Deacon wrote:
> >On Thu, Dec 14, 2017 at 04:58:51PM +0000, Robin Murphy wrote:
> >>Bring io-pgtable-arm in line with the ARMv8.2-LPA feature allowing
> >>52-bit physical addresses when using the 64KB translation granule.
> >>This will be supported by SMMUv3.1.
> >>
> >>Tested-by: Nate Watterson <nwatters@codeaurora.org>
> >>Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>---
> >>
> >>v2: Fix TCR_PS/TCR_IPS copy-paste error
> >>
> >> drivers/iommu/io-pgtable-arm.c | 65 ++++++++++++++++++++++++++++++------------
> >> 1 file changed, 47 insertions(+), 18 deletions(-)
> >
> >[...]
> >
> >>@@ -203,6 +199,25 @@ struct arm_lpae_io_pgtable {
> >> typedef u64 arm_lpae_iopte;
> >>+static arm_lpae_iopte paddr_to_iopte(phys_addr_t paddr,
> >>+ struct arm_lpae_io_pgtable *data)
> >>+{
> >>+ arm_lpae_iopte pte = paddr;
> >>+
> >>+ /* Of the bits which overlap, either 51:48 or 15:12 are always RES0 */
> >>+ return (pte | (pte >> 36)) & ARM_LPAE_PTE_ADDR_MASK;
> >>+}
> >
> >I don't particularly like relying on properties of the paddr for correct
> >construction of the pte here. The existing macro doesn't have this
> >limitation. I suspect it's all fine at the moment because we only use TTBR0,
> >but I'd rather not bake that in if we can avoid it.
>
> What's the relevance of TTBR0 to physical addresses? :/
Good point! I clearly got confused here.
> Note that by this point paddr has been validated against cfg->oas by
> arm_lpae_map(), and validated to be granule-aligned by iommu_map(), so it
> really can't be wrong under reasonable conditions.
Fair enough, but I still think this would be a bit more readable written
along the lines of:
	arm_lpae_iopte pte_hi, pte_lo = 0;

	pte_hi = paddr & GENMASK(47, data->pg_shift);
	if (data->pg_shift == 16) {
		pte_lo = paddr & GENMASK(ARM_LPAE_MAX_ADDR_BITS - 1, 48);
		pte_lo >>= (48 - 12);
	}

	return pte_hi | pte_lo;
Ok, we don't squeeze every last cycle out of it, but I find it much more
readable. Is it just me? The magic numbers could be #defined and/or
derived if necessary.
> >>+static phys_addr_t iopte_to_paddr(arm_lpae_iopte pte,
> >>+ struct arm_lpae_io_pgtable *data)
> >>+{
> >>+ phys_addr_t paddr = pte & ARM_LPAE_PTE_ADDR_MASK;
> >>+ phys_addr_t paddr_hi = paddr & (ARM_LPAE_GRANULE(data) - 1);
> >>+
> >>+ /* paddr_hi spans nothing for 4K granule, and only RES0 bits for 16K */
> >>+ return (paddr ^ paddr_hi) | (paddr_hi << 36);
> >
> >Why do we need xor here?
>
> Because "(paddr ^ paddr_hi)" is more concise than "(paddr &
> ~(phys_addr_t)(ARM_LPAE_GRANULE(data) - 1)" or variants thereof. It's
> potentially a teeny bit more efficient too, I think, but it's mostly about
> the readability.
I don't think the bit-twiddling is super readable, to be honest. Again,
something like:
	phys_addr_t paddr_lo, paddr_hi = 0;

	paddr_lo = pte & GENMASK(47, data->pg_shift);
	if (data->pg_shift == 16) {
		paddr_hi = pte & GENMASK(15, 12);
		paddr_hi <<= (48 - 12);
	}

	return paddr_hi | paddr_lo;
but I've not even compiled this stuff...
> >>+ cfg->pgsize_bitmap &= page_sizes;
> >>+ cfg->ias = min(cfg->ias, max_addr_bits);
> >>+ cfg->oas = min(cfg->oas, max_addr_bits);
> >
> >I don't think we should be writing to the ias/oas fields here, at least
> >not without auditing the drivers and updating the comments about the
> >io-pgtable API. For example, the SMMUv3 driver uses its own ias local
> >variable to initialise the domain geometry, and won't pick up any changes
> >made here.
>
> As you've discovered, the driver thing is indeed true. More generally,
> though, we've always adjusted cfg->pgsize_bitmap here, so I don't see why
> other fields should be treated differently - I've always assumed the cfg
> which the driver passes in here just represents its total maximum
> capabilities, from which it's io-pgtable's job to pick an appropriate
> configuration.
Can you update the header file with a comment to that effect, please?
Will
^ permalink raw reply [flat|nested] 30+ messages in thread
end of thread, other threads:[~2018-03-06 17:54 UTC | newest]
Thread overview: 30+ messages
2017-12-14 16:58 [PATCH v2 0/4] SMMU 52-bit address support Robin Murphy
2017-12-14 16:58 ` [PATCH v2 1/4] iommu/arm-smmu-v3: Clean up address masking Robin Murphy
2018-02-26 18:04   ` Will Deacon
2018-02-27 13:28     ` Robin Murphy
2018-03-06 16:02       ` Will Deacon
2017-12-14 16:58 ` [PATCH v2 2/4] iommu/io-pgtable-arm: Support 52-bit physical address Robin Murphy
2018-02-26 18:05   ` Will Deacon
2018-02-27 13:49     ` Robin Murphy
2018-03-06 17:54       ` Will Deacon
2017-12-14 16:58 ` [PATCH v2 3/4] iommu/arm-smmu-v3: " Robin Murphy
2018-02-26 18:05   ` Will Deacon
2017-12-14 16:58 ` [PATCH v2 4/4] iommu/arm-smmu-v3: Support 52-bit virtual address Robin Murphy
2018-02-26 18:05   ` Will Deacon
2018-02-27 13:57     ` Robin Murphy
2018-02-26 18:04 ` [PATCH v2 0/4] SMMU 52-bit address support Will Deacon