iommu.lists.linux-foundation.org archive mirror
* [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3)
@ 2024-04-16 19:28 Jason Gunthorpe
  2024-04-16 19:28 ` [PATCH v7 1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code Jason Gunthorpe
                   ` (9 more replies)
  0 siblings, 10 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-16 19:28 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

This is split out from the larger part two, which aims to rework the PASID
related code.

No new functionality is introduced in these commits; they just reorganize
the CD logic to follow the same design as the new STE logic, using make
functions and a single programming flow without leaking details to
callers.

CD does not have as strong a need for this as STE, but all the code exists
and continuing with the same pattern makes for fewer things to understand
inside the driver.

The following PASID code makes use of this to rethread how the CD
programming works to take a caller created struct arm_smmu_cd and then
stick whatever that is into the live CD entry. This allows the actual
PASID and CD logic to be general and then the PAGING and SVA domain types
can sit on top of it.

There are four kinds of CDs:
 - Blocking (ie cleared)
 - S1 PAGING
 - SVA
 - SVA with a released MM (all fault)

The last two have to transition hitlessly.
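
As a rough illustration of what "hitless" means for the writer logic, here is
a userspace sketch that classifies a transition by counting how many qwords
still have a used bit to change. The two-qword toy layout (TOY_V,
TOY_WALK_DISABLE, TOY_ASID) and the names toy_get_used() and classify() are
assumptions made up for this example; they are not the real CD format or
driver API.

```c
#include <stdint.h>

#define NUM_QWORDS 8

/* Toy layout invented for this sketch (not the real CD format). */
#define TOY_V            (1ULL << 0)	/* valid bit */
#define TOY_WALK_DISABLE (1ULL << 1)	/* when set, qword 1 is ignored */
#define TOY_ASID(x)      ((uint64_t)(x) << 8)

enum update_kind { UPDATE_NONE, UPDATE_HITLESS, UPDATE_BREAKING };

/* Report which bits the (toy) hardware would read for this entry. */
static void toy_get_used(const uint64_t *entry, uint64_t *used)
{
	used[0] = TOY_V;
	if (!(entry[0] & TOY_V))
		return;
	used[0] = ~0ULL;
	if (!(entry[0] & TOY_WALK_DISABLE))
		used[1] = ~0ULL;
}

/* Bit i is set when qword i still differs after unused bits are updated. */
static uint8_t qword_diff(const uint64_t *cur, const uint64_t *target)
{
	uint64_t cur_used[NUM_QWORDS] = { 0 };
	uint64_t target_used[NUM_QWORDS] = { 0 };
	uint8_t diff = 0;
	unsigned int i;

	toy_get_used(cur, cur_used);
	toy_get_used(target, target_used);

	for (i = 0; i != NUM_QWORDS; i++) {
		/* Bits the hardware ignores right now can change freely. */
		uint64_t unused_update = (cur[i] & cur_used[i]) |
					 (target[i] & ~cur_used[i]);

		if ((unused_update & target_used[i]) != target[i])
			diff |= (uint8_t)(1u << i);
	}
	return diff;
}

/* One differing qword can be flipped with a single 64-bit store. */
enum update_kind classify(uint64_t cur0, uint64_t cur1,
			  uint64_t tgt0, uint64_t tgt1)
{
	uint64_t cur[NUM_QWORDS] = { cur0, cur1 };
	uint64_t target[NUM_QWORDS] = { tgt0, tgt1 };
	uint8_t diff = qword_diff(cur, target);

	if (!diff)
		return UPDATE_NONE;
	if ((diff & (diff - 1)) == 0)
		return UPDATE_HITLESS;
	return UPDATE_BREAKING;
}
```

In this toy model, setting TOY_WALK_DISABLE while dropping the table pointer
only needs qword 0 to change, so it classifies as hitless, whereas changing
the ASID and the pointer together would need the V=0 sequence.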

v7:
 - Rebase on Will's for-next & v6.9-rc2
 - Split series in half
 - Include the kunit test
 - Update comments to refer to the STE & CD in the writer logic
v6: https://lore.kernel.org/r/0-v6-228e7adf25eb+4155-smmuv3_newapi_p2_jgg@nvidia.com

Jason Gunthorpe (9):
  iommu/arm-smmu-v3: Add an ops indirection to the STE code
  iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  iommu/arm-smmu-v3: Move the CD generation for S1 domains into a
    function
  iommu/arm-smmu-v3: Consolidate clearing a CD table entry
  iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  iommu/arm-smmu-v3: Allocate the CD table entry in advance
  iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
  iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry

 drivers/iommu/Kconfig                         |  12 +-
 drivers/iommu/arm/arm-smmu-v3/Makefile        |   2 +
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 163 ++++--
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 467 ++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 499 +++++++++---------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  50 +-
 6 files changed, 896 insertions(+), 297 deletions(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c


base-commit: e8e4398d53f98be7ac48e0bda9ea6e26df24136d
-- 
2.43.2



* [PATCH v7 1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code
  2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
@ 2024-04-16 19:28 ` Jason Gunthorpe
  2024-04-16 20:18   ` Nicolin Chen
  2024-04-19 21:02   ` Mostafa Saleh
  2024-04-16 19:28 ` [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() Jason Gunthorpe
                   ` (8 subsequent siblings)
  9 siblings, 2 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-16 19:28 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

Prepare to put the CD code into the same mechanism. Add an ops indirection
around all the STE specific code and make the worker functions independent
of the entry content being processed.

get_used and sync ops are provided to hook the correct code.
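
For readers unfamiliar with the pattern, the indirection can be sketched in
plain userspace C roughly as below. This is only an illustration: struct
demo_writer, demo_sync() and demo_write_twice() are hypothetical names, the
sync op just counts calls instead of issuing a CFGI command, and the
container_of() macro is a simplified stand-in for the kernel one.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_QWORDS 8

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct entry_writer;

struct entry_writer_ops {
	void (*sync)(struct entry_writer *writer);
};

struct entry_writer {
	const struct entry_writer_ops *ops;
};

/* Hypothetical table-specific writer; counts syncs, issues no commands. */
struct demo_writer {
	struct entry_writer writer;
	uint32_t id;
	unsigned int syncs;
};

static void demo_sync(struct entry_writer *writer)
{
	/* Recover the outer struct from the embedded generic writer. */
	struct demo_writer *demo =
		container_of(writer, struct demo_writer, writer);

	demo->syncs++;
}

static const struct entry_writer_ops demo_ops = {
	.sync = demo_sync,
};

/* Copy qwords [start, start + len) and sync only if something changed. */
static bool entry_set(struct entry_writer *writer, uint64_t *entry,
		      const uint64_t *target, unsigned int start,
		      unsigned int len)
{
	bool changed = false;
	unsigned int i;

	for (i = start; len != 0; len--, i++) {
		if (entry[i] != target[i]) {
			entry[i] = target[i]; /* the driver uses WRITE_ONCE() */
			changed = true;
		}
	}
	if (changed)
		writer->ops->sync(writer);
	return changed;
}

/* Write the same target twice and return how many syncs were issued. */
unsigned int demo_write_twice(void)
{
	uint64_t entry[NUM_QWORDS] = { 0 };
	const uint64_t target[NUM_QWORDS] = { 1, 2, 3 };
	struct demo_writer demo = {
		.writer = { .ops = &demo_ops },
		.id = 7,
	};

	entry_set(&demo.writer, entry, target, 0, NUM_QWORDS);
	entry_set(&demo.writer, entry, target, 0, NUM_QWORDS);
	return demo.syncs;
}
```

The generic code only ever sees struct entry_writer, so a second table type
can plug in its own sync without the worker functions knowing about it.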

Signed-off-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Michael Shavit <mshavit@google.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 178 ++++++++++++--------
 1 file changed, 106 insertions(+), 72 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 79c18e95dd293e..bf105e914d38b1 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -42,8 +42,20 @@ enum arm_smmu_msi_index {
 	ARM_SMMU_MAX_MSIS,
 };
 
-static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu,
-				      ioasid_t sid);
+struct arm_smmu_entry_writer_ops;
+struct arm_smmu_entry_writer {
+	const struct arm_smmu_entry_writer_ops *ops;
+	struct arm_smmu_master *master;
+};
+
+struct arm_smmu_entry_writer_ops {
+	__le64 v_bit;
+	void (*get_used)(const __le64 *entry, __le64 *used);
+	void (*sync)(struct arm_smmu_entry_writer *writer);
+};
+
+#define NUM_ENTRY_QWORDS 8
+static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64));
 
 static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
 	[EVTQ_MSI_INDEX] = {
@@ -972,43 +984,42 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
  * would be nice if this was complete according to the spec, but minimally it
  * has to capture the bits this driver uses.
  */
-static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent,
-				  struct arm_smmu_ste *used_bits)
+static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
 {
-	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent->data[0]));
+	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
 
-	used_bits->data[0] = cpu_to_le64(STRTAB_STE_0_V);
-	if (!(ent->data[0] & cpu_to_le64(STRTAB_STE_0_V)))
+	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
+	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
 		return;
 
-	used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
+	used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
 
 	/* S1 translates */
 	if (cfg & BIT(0)) {
-		used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
-						  STRTAB_STE_0_S1CTXPTR_MASK |
-						  STRTAB_STE_0_S1CDMAX);
-		used_bits->data[1] |=
+		used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
+					    STRTAB_STE_0_S1CTXPTR_MASK |
+					    STRTAB_STE_0_S1CDMAX);
+		used_bits[1] |=
 			cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
 				    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
 				    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW |
 				    STRTAB_STE_1_EATS);
-		used_bits->data[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
+		used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
 	}
 
 	/* S2 translates */
 	if (cfg & BIT(1)) {
-		used_bits->data[1] |=
+		used_bits[1] |=
 			cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG);
-		used_bits->data[2] |=
+		used_bits[2] |=
 			cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
 				    STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
 				    STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
-		used_bits->data[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
+		used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
 	}
 
 	if (cfg == STRTAB_STE_0_CFG_BYPASS)
-		used_bits->data[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
+		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
 }
 
 /*
@@ -1017,57 +1028,55 @@ static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent,
  * unused_update is an intermediate value of entry that has unused bits set to
  * their new values.
  */
-static u8 arm_smmu_entry_qword_diff(const struct arm_smmu_ste *entry,
-				    const struct arm_smmu_ste *target,
-				    struct arm_smmu_ste *unused_update)
+static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
+				    const __le64 *entry, const __le64 *target,
+				    __le64 *unused_update)
 {
-	struct arm_smmu_ste target_used = {};
-	struct arm_smmu_ste cur_used = {};
+	__le64 target_used[NUM_ENTRY_QWORDS] = {};
+	__le64 cur_used[NUM_ENTRY_QWORDS] = {};
 	u8 used_qword_diff = 0;
 	unsigned int i;
 
-	arm_smmu_get_ste_used(entry, &cur_used);
-	arm_smmu_get_ste_used(target, &target_used);
+	writer->ops->get_used(entry, cur_used);
+	writer->ops->get_used(target, target_used);
 
-	for (i = 0; i != ARRAY_SIZE(target_used.data); i++) {
+	for (i = 0; i != NUM_ENTRY_QWORDS; i++) {
 		/*
 		 * Check that masks are up to date, the make functions are not
 		 * allowed to set a bit to 1 if the used function doesn't say it
 		 * is used.
 		 */
-		WARN_ON_ONCE(target->data[i] & ~target_used.data[i]);
+		WARN_ON_ONCE(target[i] & ~target_used[i]);
 
 		/* Bits can change because they are not currently being used */
-		unused_update->data[i] = (entry->data[i] & cur_used.data[i]) |
-					 (target->data[i] & ~cur_used.data[i]);
+		unused_update[i] = (entry[i] & cur_used[i]) |
+				   (target[i] & ~cur_used[i]);
 		/*
 		 * Each bit indicates that a used bit in a qword needs to be
 		 * changed after unused_update is applied.
 		 */
-		if ((unused_update->data[i] & target_used.data[i]) !=
-		    target->data[i])
+		if ((unused_update[i] & target_used[i]) != target[i])
 			used_qword_diff |= 1 << i;
 	}
 	return used_qword_diff;
 }
 
-static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid,
-		      struct arm_smmu_ste *entry,
-		      const struct arm_smmu_ste *target, unsigned int start,
+static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry,
+		      const __le64 *target, unsigned int start,
 		      unsigned int len)
 {
 	bool changed = false;
 	unsigned int i;
 
 	for (i = start; len != 0; len--, i++) {
-		if (entry->data[i] != target->data[i]) {
-			WRITE_ONCE(entry->data[i], target->data[i]);
+		if (entry[i] != target[i]) {
+			WRITE_ONCE(entry[i], target[i]);
 			changed = true;
 		}
 	}
 
 	if (changed)
-		arm_smmu_sync_ste_for_sid(smmu, sid);
+		writer->ops->sync(writer);
 	return changed;
 }
 
@@ -1097,24 +1106,21 @@ static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid,
  * V=0 process. This relies on the IGNORED behavior described in the
  * specification.
  */
-static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
-			       struct arm_smmu_ste *entry,
-			       const struct arm_smmu_ste *target)
+static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
+				 __le64 *entry, const __le64 *target)
 {
-	unsigned int num_entry_qwords = ARRAY_SIZE(target->data);
-	struct arm_smmu_device *smmu = master->smmu;
-	struct arm_smmu_ste unused_update;
+	__le64 unused_update[NUM_ENTRY_QWORDS];
 	u8 used_qword_diff;
 
 	used_qword_diff =
-		arm_smmu_entry_qword_diff(entry, target, &unused_update);
+		arm_smmu_entry_qword_diff(writer, entry, target, unused_update);
 	if (hweight8(used_qword_diff) == 1) {
 		/*
 		 * Only one qword needs its used bits to be changed. This is a
-		 * hitless update, update all bits the current STE is ignoring
-		 * to their new values, then update a single "critical qword" to
-		 * change the STE and finally 0 out any bits that are now unused
-		 * in the target configuration.
+		 * hitless update, update all bits the current STE/CD is
+		 * ignoring to their new values, then update a single "critical
+		 * qword" to change the STE/CD and finally 0 out any bits that
+		 * are now unused in the target configuration.
 		 */
 		unsigned int critical_qword_index = ffs(used_qword_diff) - 1;
 
@@ -1123,22 +1129,21 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
 		 * writing it in the next step anyways. This can save a sync
 		 * when the only change is in that qword.
 		 */
-		unused_update.data[critical_qword_index] =
-			entry->data[critical_qword_index];
-		entry_set(smmu, sid, entry, &unused_update, 0, num_entry_qwords);
-		entry_set(smmu, sid, entry, target, critical_qword_index, 1);
-		entry_set(smmu, sid, entry, target, 0, num_entry_qwords);
+		unused_update[critical_qword_index] =
+			entry[critical_qword_index];
+		entry_set(writer, entry, unused_update, 0, NUM_ENTRY_QWORDS);
+		entry_set(writer, entry, target, critical_qword_index, 1);
+		entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS);
 	} else if (used_qword_diff) {
 		/*
 		 * At least two qwords need their inuse bits to be changed. This
 		 * requires a breaking update, zero the V bit, write all qwords
 		 * but 0, then set qword 0
 		 */
-		unused_update.data[0] = entry->data[0] &
-					cpu_to_le64(~STRTAB_STE_0_V);
-		entry_set(smmu, sid, entry, &unused_update, 0, 1);
-		entry_set(smmu, sid, entry, target, 1, num_entry_qwords - 1);
-		entry_set(smmu, sid, entry, target, 0, 1);
+		unused_update[0] = entry[0] & (~writer->ops->v_bit);
+		entry_set(writer, entry, unused_update, 0, 1);
+		entry_set(writer, entry, target, 1, NUM_ENTRY_QWORDS - 1);
+		entry_set(writer, entry, target, 0, 1);
 	} else {
 		/*
 		 * No inuse bit changed. Sanity check that all unused bits are 0
@@ -1146,18 +1151,7 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
 		 * compute_qword_diff().
 		 */
 		WARN_ON_ONCE(
-			entry_set(smmu, sid, entry, target, 0, num_entry_qwords));
-	}
-
-	/* It's likely that we'll want to use the new STE soon */
-	if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) {
-		struct arm_smmu_cmdq_ent
-			prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG,
-					 .prefetch = {
-						 .sid = sid,
-					 } };
-
-		arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
+			entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS));
 	}
 }
 
@@ -1430,17 +1424,57 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
 	WRITE_ONCE(*dst, cpu_to_le64(val));
 }
 
-static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
+struct arm_smmu_ste_writer {
+	struct arm_smmu_entry_writer writer;
+	u32 sid;
+};
+
+static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer)
 {
+	struct arm_smmu_ste_writer *ste_writer =
+		container_of(writer, struct arm_smmu_ste_writer, writer);
 	struct arm_smmu_cmdq_ent cmd = {
 		.opcode	= CMDQ_OP_CFGI_STE,
 		.cfgi	= {
-			.sid	= sid,
+			.sid	= ste_writer->sid,
 			.leaf	= true,
 		},
 	};
 
-	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
+	arm_smmu_cmdq_issue_cmd_with_sync(writer->master->smmu, &cmd);
+}
+
+static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = {
+	.sync = arm_smmu_ste_writer_sync_entry,
+	.get_used = arm_smmu_get_ste_used,
+	.v_bit = cpu_to_le64(STRTAB_STE_0_V),
+};
+
+static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
+			       struct arm_smmu_ste *ste,
+			       const struct arm_smmu_ste *target)
+{
+	struct arm_smmu_device *smmu = master->smmu;
+	struct arm_smmu_ste_writer ste_writer = {
+		.writer = {
+			.ops = &arm_smmu_ste_writer_ops,
+			.master = master,
+		},
+		.sid = sid,
+	};
+
+	arm_smmu_write_entry(&ste_writer.writer, ste->data, target->data);
+
+	/* It's likely that we'll want to use the new STE soon */
+	if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) {
+		struct arm_smmu_cmdq_ent
+			prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG,
+					 .prefetch = {
+						 .sid = sid,
+					 } };
+
+		arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
+	}
 }
 
 static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
-- 
2.43.2



* [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
  2024-04-16 19:28 ` [PATCH v7 1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code Jason Gunthorpe
@ 2024-04-16 19:28 ` Jason Gunthorpe
  2024-04-16 20:48   ` Nicolin Chen
                     ` (2 more replies)
  2024-04-16 19:28 ` [PATCH v7 3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function Jason Gunthorpe
                   ` (7 subsequent siblings)
  9 siblings, 3 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-16 19:28 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

CD table entries and STEs have the same essential programming sequence,
just with different types.

Have arm_smmu_write_ctx_desc() generate a target CD and call
arm_smmu_write_entry() to do the programming. Because the target CD is
generated by modifying the existing CD, this alone is not enough to free
the CD callers from the ordering requirements.

The following patches will make the rest of the CD flow mirror the STE
flow with precise CD contents generated in all cases.
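
To show why a target derived from the live entry needs cleaning first, here
is a small userspace sketch of the masking step. The toy layout (SKETCH_V,
SKETCH_WALK_DISABLE) and the names sketch_get_used(), clean_entry() and
cleaned_qword1() are assumptions for illustration only, not the driver's
definitions.

```c
#include <stdint.h>

#define NUM_QWORDS 8

/* Toy layout invented for this sketch (not the real CD format). */
#define SKETCH_V            (1ULL << 0)	/* valid bit */
#define SKETCH_WALK_DISABLE (1ULL << 1)	/* when set, qword 1 is ignored */

/* Report which bits the (toy) hardware would read for this entry. */
static void sketch_get_used(const uint64_t *entry, uint64_t *used)
{
	used[0] = SKETCH_V;
	if (!(entry[0] & SKETCH_V))
		return;
	used[0] = ~0ULL;
	if (!(entry[0] & SKETCH_WALK_DISABLE))
		used[1] = ~0ULL;
}

/* Zero every bit that the used mask says the hardware ignores. */
static void clean_entry(uint64_t *target)
{
	uint64_t used[NUM_QWORDS] = { 0 };
	unsigned int i;

	sketch_get_used(target, used);
	for (i = 0; i != NUM_QWORDS; i++)
		target[i] &= used[i];
}

/* Clean a two-qword toy entry and return what is left in qword 1. */
uint64_t cleaned_qword1(uint64_t q0, uint64_t q1)
{
	uint64_t target[NUM_QWORDS] = { q0, q1 };

	clean_entry(target);
	return target[1];
}
```

A stale pointer left in an ignored qword would otherwise trip the check that
a target never sets bits outside its own used mask.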

Signed-off-by: Michael Shavit <mshavit@google.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 94 ++++++++++++++++-----
 1 file changed, 74 insertions(+), 20 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index bf105e914d38b1..3983de90c2fa01 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -56,6 +56,7 @@ struct arm_smmu_entry_writer_ops {
 
 #define NUM_ENTRY_QWORDS 8
 static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64));
+static_assert(sizeof(struct arm_smmu_cd) == NUM_ENTRY_QWORDS * sizeof(u64));
 
 static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
 	[EVTQ_MSI_INDEX] = {
@@ -1231,6 +1232,67 @@ static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
 	return &l1_desc->l2ptr[idx];
 }
 
+struct arm_smmu_cd_writer {
+	struct arm_smmu_entry_writer writer;
+	unsigned int ssid;
+};
+
+static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
+{
+	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
+	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
+		return;
+	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
+
+	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
+	if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
+		used_bits[0] &= ~cpu_to_le64(
+			CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
+			CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
+			CTXDESC_CD_0_TCR_SH0);
+		used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
+	}
+}
+
+static void arm_smmu_cd_writer_sync_entry(struct arm_smmu_entry_writer *writer)
+{
+	struct arm_smmu_cd_writer *cd_writer =
+		container_of(writer, struct arm_smmu_cd_writer, writer);
+
+	arm_smmu_sync_cd(writer->master, cd_writer->ssid, true);
+}
+
+static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = {
+	.sync = arm_smmu_cd_writer_sync_entry,
+	.get_used = arm_smmu_get_cd_used,
+	.v_bit = cpu_to_le64(CTXDESC_CD_0_V),
+};
+
+static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
+				    struct arm_smmu_cd *cdptr,
+				    const struct arm_smmu_cd *target)
+{
+	struct arm_smmu_cd_writer cd_writer = {
+		.writer = {
+			.ops = &arm_smmu_cd_writer_ops,
+			.master = master,
+		},
+		.ssid = ssid,
+	};
+
+	arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data);
+}
+
+static void arm_smmu_clean_cd_entry(struct arm_smmu_cd *target)
+{
+	struct arm_smmu_cd used = {};
+	int i;
+
+	arm_smmu_get_cd_used(target->data, used.data);
+	for (i = 0; i != ARRAY_SIZE(target->data); i++)
+		target->data[i] &= used.data[i];
+}
+
 int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
 			    struct arm_smmu_ctx_desc *cd)
 {
@@ -1247,17 +1309,20 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
 	 */
 	u64 val;
 	bool cd_live;
-	struct arm_smmu_cd *cdptr;
+	struct arm_smmu_cd target;
+	struct arm_smmu_cd *cdptr = &target;
+	struct arm_smmu_cd *cd_table_entry;
 	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
 	struct arm_smmu_device *smmu = master->smmu;
 
 	if (WARN_ON(ssid >= (1 << cd_table->s1cdmax)))
 		return -E2BIG;
 
-	cdptr = arm_smmu_get_cd_ptr(master, ssid);
-	if (!cdptr)
+	cd_table_entry = arm_smmu_get_cd_ptr(master, ssid);
+	if (!cd_table_entry)
 		return -ENOMEM;
 
+	target = *cd_table_entry;
 	val = le64_to_cpu(cdptr->data[0]);
 	cd_live = !!(val & CTXDESC_CD_0_V);
 
@@ -1279,13 +1344,6 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
 		cdptr->data[2] = 0;
 		cdptr->data[3] = cpu_to_le64(cd->mair);
 
-		/*
-		 * STE may be live, and the SMMU might read dwords of this CD in any
-		 * order. Ensure that it observes valid values before reading
-		 * V=1.
-		 */
-		arm_smmu_sync_cd(master, ssid, true);
-
 		val = cd->tcr |
 #ifdef __BIG_ENDIAN
 			CTXDESC_CD_0_ENDI |
@@ -1299,18 +1357,14 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
 		if (cd_table->stall_enabled)
 			val |= CTXDESC_CD_0_S;
 	}
-
+	cdptr->data[0] = cpu_to_le64(val);
 	/*
-	 * The SMMU accesses 64-bit values atomically. See IHI0070Ca 3.21.3
-	 * "Configuration structures and configuration invalidation completion"
-	 *
-	 *   The size of single-copy atomic reads made by the SMMU is
-	 *   IMPLEMENTATION DEFINED but must be at least 64 bits. Any single
-	 *   field within an aligned 64-bit span of a structure can be altered
-	 *   without first making the structure invalid.
+	 * Since the above is updating the CD entry based on the current value
+	 * without zeroing unused bits it needs fixing before being passed to
+	 * the programming logic.
 	 */
-	WRITE_ONCE(cdptr->data[0], cpu_to_le64(val));
-	arm_smmu_sync_cd(master, ssid, true);
+	arm_smmu_clean_cd_entry(&target);
+	arm_smmu_write_cd_entry(master, ssid, cd_table_entry, &target);
 	return 0;
 }
 
-- 
2.43.2



* [PATCH v7 3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
  2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
  2024-04-16 19:28 ` [PATCH v7 1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code Jason Gunthorpe
  2024-04-16 19:28 ` [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() Jason Gunthorpe
@ 2024-04-16 19:28 ` Jason Gunthorpe
  2024-04-16 21:22   ` Nicolin Chen
  2024-04-19 21:10   ` Mostafa Saleh
  2024-04-16 19:28 ` [PATCH v7 4/9] iommu/arm-smmu-v3: Consolidate clearing a CD table entry Jason Gunthorpe
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-16 19:28 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

Introduce arm_smmu_make_s1_cd() to build the CD from the paging S1 domain,
and reorganize all the places programming S1 domain CD table entries to
call it.

Split arm_smmu_update_s1_domain_cd_entry() from
arm_smmu_update_ctx_desc_devices() so that the S1 path has its own call
chain separate from the unrelated SVA path.

arm_smmu_update_s1_domain_cd_entry() only works on S1 domains
attached to RIDs and refreshes all their CDs.

Remove the forced clear of the CD during S1 domain attach,
arm_smmu_write_cd_entry() will do this automatically if necessary.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Michael Shavit <mshavit@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 25 +++++++-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 60 +++++++++++++------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  9 +++
 3 files changed, 76 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 41b44baef15e80..d159f60480935e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -53,6 +53,29 @@ static void arm_smmu_update_ctx_desc_devices(struct arm_smmu_domain *smmu_domain
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 }
 
+static void
+arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain)
+{
+	struct arm_smmu_master *master;
+	struct arm_smmu_cd target_cd;
+	unsigned long flags;
+
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+		struct arm_smmu_cd *cdptr;
+
+		/* S1 domains only support RID attachment right now */
+		cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID);
+		if (WARN_ON(!cdptr))
+			continue;
+
+		arm_smmu_make_s1_cd(&target_cd, master, smmu_domain);
+		arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr,
+					&target_cd);
+	}
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
+}
+
 /*
  * Check if the CPU ASID is available on the SMMU side. If a private context
  * descriptor is using it, try to replace it.
@@ -96,7 +119,7 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
 	 * be some overlap between use of both ASIDs, until we invalidate the
 	 * TLB.
 	 */
-	arm_smmu_update_ctx_desc_devices(smmu_domain, IOMMU_NO_PASID, cd);
+	arm_smmu_update_s1_domain_cd_entry(smmu_domain);
 
 	/* Invalidate TLB entries previously associated with that context */
 	arm_smmu_tlb_inv_asid(smmu, asid);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 3983de90c2fa01..d24fa13a52b4e0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1204,8 +1204,8 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst,
 	WRITE_ONCE(*dst, cpu_to_le64(val));
 }
 
-static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
-					       u32 ssid)
+struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
+					u32 ssid)
 {
 	__le64 *l1ptr;
 	unsigned int idx;
@@ -1268,9 +1268,9 @@ static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = {
 	.v_bit = cpu_to_le64(CTXDESC_CD_0_V),
 };
 
-static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
-				    struct arm_smmu_cd *cdptr,
-				    const struct arm_smmu_cd *target)
+void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
+			     struct arm_smmu_cd *cdptr,
+			     const struct arm_smmu_cd *target)
 {
 	struct arm_smmu_cd_writer cd_writer = {
 		.writer = {
@@ -1283,6 +1283,32 @@ static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
 	arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data);
 }
 
+void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
+			 struct arm_smmu_master *master,
+			 struct arm_smmu_domain *smmu_domain)
+{
+	struct arm_smmu_ctx_desc *cd = &smmu_domain->cd;
+
+	memset(target, 0, sizeof(*target));
+
+	target->data[0] = cpu_to_le64(
+		cd->tcr |
+#ifdef __BIG_ENDIAN
+		CTXDESC_CD_0_ENDI |
+#endif
+		CTXDESC_CD_0_V |
+		CTXDESC_CD_0_AA64 |
+		(master->stall_enabled ? CTXDESC_CD_0_S : 0) |
+		CTXDESC_CD_0_R |
+		CTXDESC_CD_0_A |
+		CTXDESC_CD_0_ASET |
+		FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid)
+		);
+
+	target->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
+	target->data[3] = cpu_to_le64(cd->mair);
+}
+
 static void arm_smmu_clean_cd_entry(struct arm_smmu_cd *target)
 {
 	struct arm_smmu_cd used = {};
@@ -2644,29 +2670,29 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
 	switch (smmu_domain->stage) {
-	case ARM_SMMU_DOMAIN_S1:
+	case ARM_SMMU_DOMAIN_S1: {
+		struct arm_smmu_cd target_cd;
+		struct arm_smmu_cd *cdptr;
+
 		if (!master->cd_table.cdtab) {
 			ret = arm_smmu_alloc_cd_tables(master);
 			if (ret)
 				goto out_list_del;
-		} else {
-			/*
-			 * arm_smmu_write_ctx_desc() relies on the entry being
-			 * invalid to work, clear any existing entry.
-			 */
-			ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
-						      NULL);
-			if (ret)
-				goto out_list_del;
 		}
 
-		ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd);
-		if (ret)
+		cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID);
+		if (!cdptr) {
+			ret = -ENOMEM;
 			goto out_list_del;
+		}
 
+		arm_smmu_make_s1_cd(&target_cd, master, smmu_domain);
+		arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr,
+					&target_cd);
 		arm_smmu_make_cdtable_ste(&target, master);
 		arm_smmu_install_ste_for_dev(master, &target);
 		break;
+	}
 	case ARM_SMMU_DOMAIN_S2:
 		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
 		arm_smmu_install_ste_for_dev(master, &target);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 4b767e0eeeb682..bb08f087ba39e4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -751,6 +751,15 @@ extern struct xarray arm_smmu_asid_xa;
 extern struct mutex arm_smmu_asid_lock;
 extern struct arm_smmu_ctx_desc quiet_cd;
 
+struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
+					u32 ssid);
+void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
+			 struct arm_smmu_master *master,
+			 struct arm_smmu_domain *smmu_domain);
+void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
+			     struct arm_smmu_cd *cdptr,
+			     const struct arm_smmu_cd *target);
+
 int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid,
 			    struct arm_smmu_ctx_desc *cd);
 void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
-- 
2.43.2



* [PATCH v7 4/9] iommu/arm-smmu-v3: Consolidate clearing a CD table entry
  2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
                   ` (2 preceding siblings ...)
  2024-04-16 19:28 ` [PATCH v7 3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function Jason Gunthorpe
@ 2024-04-16 19:28 ` Jason Gunthorpe
  2024-04-16 19:28 ` [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr() Jason Gunthorpe
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-16 19:28 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

A cleared entry is all 0's. Make arm_smmu_clear_cd() do this sequence.

If we are clearing an entry and for some reason it is not already
allocated in the CD table then something has gone wrong.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Moritz Fischer <moritzf@google.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 20 ++++++++++++++-----
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  1 +
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index d159f60480935e..7cf286f7a009fb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -569,7 +569,7 @@ void arm_smmu_sva_remove_dev_pasid(struct iommu_domain *domain,
 
 	mutex_lock(&sva_lock);
 
-	arm_smmu_write_ctx_desc(master, id, NULL);
+	arm_smmu_clear_cd(master, id);
 
 	list_for_each_entry(t, &master->bonds, list) {
 		if (t->mm == mm) {
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d24fa13a52b4e0..f3df1ec8d258dc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1309,6 +1309,19 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
 	target->data[3] = cpu_to_le64(cd->mair);
 }
 
+void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
+{
+	struct arm_smmu_cd target = {};
+	struct arm_smmu_cd *cdptr;
+
+	if (!master->cd_table.cdtab)
+		return;
+	cdptr = arm_smmu_get_cd_ptr(master, ssid);
+	if (WARN_ON(!cdptr))
+		return;
+	arm_smmu_write_cd_entry(master, ssid, cdptr, &target);
+}
+
 static void arm_smmu_clean_cd_entry(struct arm_smmu_cd *target)
 {
 	struct arm_smmu_cd used = {};
@@ -2696,9 +2709,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	case ARM_SMMU_DOMAIN_S2:
 		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
 		arm_smmu_install_ste_for_dev(master, &target);
-		if (master->cd_table.cdtab)
-			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
-						      NULL);
+		arm_smmu_clear_cd(master, IOMMU_NO_PASID);
 		break;
 	}
 
@@ -2746,8 +2757,7 @@ static int arm_smmu_attach_dev_ste(struct device *dev,
 	 * arm_smmu_domain->devices to avoid races updating the same context
 	 * descriptor from arm_smmu_share_asid().
 	 */
-	if (master->cd_table.cdtab)
-		arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);
+	arm_smmu_clear_cd(master, IOMMU_NO_PASID);
 	return 0;
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index bb08f087ba39e4..99fd6f24caa818 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -751,6 +751,7 @@ extern struct xarray arm_smmu_asid_xa;
 extern struct mutex arm_smmu_asid_lock;
 extern struct arm_smmu_ctx_desc quiet_cd;
 
+void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid);
 struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
 					u32 ssid);
 void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
-- 
2.43.2



* [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
                   ` (3 preceding siblings ...)
  2024-04-16 19:28 ` [PATCH v7 4/9] iommu/arm-smmu-v3: Consolidate clearing a CD table entry Jason Gunthorpe
@ 2024-04-16 19:28 ` Jason Gunthorpe
  2024-04-16 22:19   ` Nicolin Chen
  2024-04-19 21:14   ` Mostafa Saleh
  2024-04-16 19:28 ` [PATCH v7 6/9] iommu/arm-smmu-v3: Allocate the CD table entry in advance Jason Gunthorpe
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-16 19:28 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

Only the attach callers can perform an allocation for the CD table entry.
The other callers must not do so: they do not have the correct locking and
they cannot sleep. Split up the functions so this is clear.

arm_smmu_get_cd_ptr() will return a pointer to a CD table entry without
doing any kind of allocation.

arm_smmu_alloc_cd_ptr() will allocate the table and any required
leaf.

A following patch will add lockdep assertions to arm_smmu_alloc_cd_ptr()
once the restructuring is completed and arm_smmu_alloc_cd_ptr() is never
called in the wrong context.
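
The get/alloc split can be sketched in userspace roughly as follows. This
is only a model under stated assumptions: the model_ names, the table
sizes and the use of calloc() are hypothetical, and the real functions
additionally program the L1 descriptor and sync the CD cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_L2_ENTRIES 4	/* hypothetical leaf size */
#define MODEL_L1_ENTRIES 4	/* hypothetical number of leaves */

struct model_cd {
	uint64_t data[8];
};

struct model_table {
	struct model_cd *l2[MODEL_L1_ENTRIES];	/* NULL until allocated */
};

/* Pure lookup: never allocates, so it is usable where sleeping is not. */
static struct model_cd *model_get_cd_ptr(struct model_table *t,
					 unsigned int ssid)
{
	struct model_cd *leaf;

	assert(t != NULL);
	if (ssid >= MODEL_L1_ENTRIES * MODEL_L2_ENTRIES)
		return NULL;
	leaf = t->l2[ssid / MODEL_L2_ENTRIES];
	if (!leaf)
		return NULL;
	return &leaf[ssid % MODEL_L2_ENTRIES];
}

/* Allocating variant: fills in the missing leaf, then reuses the lookup. */
static struct model_cd *model_alloc_cd_ptr(struct model_table *t,
					   unsigned int ssid)
{
	unsigned int idx = ssid / MODEL_L2_ENTRIES;

	if (idx >= MODEL_L1_ENTRIES)
		return NULL;
	if (!t->l2[idx]) {
		t->l2[idx] = calloc(MODEL_L2_ENTRIES, sizeof(struct model_cd));
		if (!t->l2[idx])
			return NULL;
	}
	return model_get_cd_ptr(t, ssid);
}
```

Ending the allocating variant with a call to the lookup keeps the index
arithmetic in one place, which mirrors how the patch structures the two
functions.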

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 61 +++++++++++++--------
 1 file changed, 39 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f3df1ec8d258dc..a0d1237272936f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -98,6 +98,7 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
 
 static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain,
 				    struct arm_smmu_device *smmu);
+static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master);
 
 static void parse_driver_options(struct arm_smmu_device *smmu)
 {
@@ -1207,29 +1208,51 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst,
 struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
 					u32 ssid)
 {
-	__le64 *l1ptr;
-	unsigned int idx;
 	struct arm_smmu_l1_ctx_desc *l1_desc;
-	struct arm_smmu_device *smmu = master->smmu;
 	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
 
+	if (!cd_table->cdtab)
+		return NULL;
+
 	if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
 		return (struct arm_smmu_cd *)(cd_table->cdtab +
 					      ssid * CTXDESC_CD_DWORDS);
 
-	idx = ssid >> CTXDESC_SPLIT;
-	l1_desc = &cd_table->l1_desc[idx];
-	if (!l1_desc->l2ptr) {
-		if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
-			return NULL;
+	l1_desc = &cd_table->l1_desc[ssid / CTXDESC_L2_ENTRIES];
+	if (!l1_desc->l2ptr)
+		return NULL;
+	return &l1_desc->l2ptr[ssid % CTXDESC_L2_ENTRIES];
+}
 
-		l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS;
-		arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
-		/* An invalid L1CD can be cached */
-		arm_smmu_sync_cd(master, ssid, false);
+static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
+						 u32 ssid)
+{
+	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
+	struct arm_smmu_device *smmu = master->smmu;
+
+	if (!cd_table->cdtab) {
+		if (arm_smmu_alloc_cd_tables(master))
+			return NULL;
 	}
-	idx = ssid & (CTXDESC_L2_ENTRIES - 1);
-	return &l1_desc->l2ptr[idx];
+
+	if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) {
+		unsigned int idx = ssid >> CTXDESC_SPLIT;
+		struct arm_smmu_l1_ctx_desc *l1_desc;
+
+		l1_desc = &cd_table->l1_desc[idx];
+		if (!l1_desc->l2ptr) {
+			__le64 *l1ptr;
+
+			if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
+				return NULL;
+
+			l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS;
+			arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
+			/* An invalid L1CD can be cached */
+			arm_smmu_sync_cd(master, ssid, false);
+		}
+	}
+	return arm_smmu_get_cd_ptr(master, ssid);
 }
 
 struct arm_smmu_cd_writer {
@@ -1357,7 +1380,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
 	if (WARN_ON(ssid >= (1 << cd_table->s1cdmax)))
 		return -E2BIG;
 
-	cd_table_entry = arm_smmu_get_cd_ptr(master, ssid);
+	cd_table_entry = arm_smmu_alloc_cd_ptr(master, ssid);
 	if (!cd_table_entry)
 		return -ENOMEM;
 
@@ -2687,13 +2710,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		struct arm_smmu_cd target_cd;
 		struct arm_smmu_cd *cdptr;
 
-		if (!master->cd_table.cdtab) {
-			ret = arm_smmu_alloc_cd_tables(master);
-			if (ret)
-				goto out_list_del;
-		}
-
-		cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID);
+		cdptr = arm_smmu_alloc_cd_ptr(master, IOMMU_NO_PASID);
 		if (!cdptr) {
 			ret = -ENOMEM;
 			goto out_list_del;
-- 
2.43.2



* [PATCH v7 6/9] iommu/arm-smmu-v3: Allocate the CD table entry in advance
  2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
                   ` (4 preceding siblings ...)
  2024-04-16 19:28 ` [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr() Jason Gunthorpe
@ 2024-04-16 19:28 ` Jason Gunthorpe
  2024-04-16 19:28 ` [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function Jason Gunthorpe
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-16 19:28 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

Avoid arm_smmu_attach_dev() having to undo the changes to the
smmu_domain->devices list by acquiring the cdptr earlier, so we don't need
to handle that error.

Now there is a clear break in arm_smmu_attach_dev() where all the
prep-work has been done non-disruptively and we commit to making the HW
change, which cannot fail.

This completes transforming arm_smmu_attach_dev() so that it does not
disturb the HW if it fails.
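
The prepare-then-commit ordering described above can be modelled in a few
lines of userspace C. This is a sketch only: struct model_master, its
fields and the failure-injection flag are hypothetical stand-ins, not
driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_master {
	void *cdptr;		/* resource attach needs; allocation may fail */
	bool on_domain_list;	/* stands in for smmu_domain->devices */
	bool hw_programmed;	/* stands in for the live STE/CD state */
};

/* Hypothetical allocator with failure injection for the example. */
static void *model_alloc_cd_ptr(struct model_master *m, bool fail)
{
	assert(m != NULL);
	if (fail)
		return NULL;
	if (!m->cdptr)
		m->cdptr = malloc(64);
	return m->cdptr;
}

static int model_attach(struct model_master *m, bool fail_alloc)
{
	/* Phase 1: everything that can fail, with no visible side effects. */
	if (!model_alloc_cd_ptr(m, fail_alloc))
		return -1;	/* the driver returns -ENOMEM here */

	/* Phase 2: commit. Nothing below can fail, so no unwind labels. */
	m->on_domain_list = true;
	m->hw_programmed = true;
	return 0;
}
```

Because the only fallible step runs before any state is published, the
error path is a plain return and the list never needs to be rolled back.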

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 24 +++++++--------------
 1 file changed, 8 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a0d1237272936f..0aacd95f34a479 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2661,6 +2661,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	struct arm_smmu_device *smmu;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_master *master;
+	struct arm_smmu_cd *cdptr;
 
 	if (!fwspec)
 		return -ENOENT;
@@ -2689,6 +2690,12 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	if (ret)
 		return ret;
 
+	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+		cdptr = arm_smmu_alloc_cd_ptr(master, IOMMU_NO_PASID);
+		if (!cdptr)
+			return -ENOMEM;
+	}
+
 	/*
 	 * Prevent arm_smmu_share_asid() from trying to change the ASID
 	 * of either the old or new domain while we are working on it.
@@ -2708,13 +2715,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	switch (smmu_domain->stage) {
 	case ARM_SMMU_DOMAIN_S1: {
 		struct arm_smmu_cd target_cd;
-		struct arm_smmu_cd *cdptr;
-
-		cdptr = arm_smmu_alloc_cd_ptr(master, IOMMU_NO_PASID);
-		if (!cdptr) {
-			ret = -ENOMEM;
-			goto out_list_del;
-		}
 
 		arm_smmu_make_s1_cd(&target_cd, master, smmu_domain);
 		arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr,
@@ -2731,16 +2731,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	}
 
 	arm_smmu_enable_ats(master, smmu_domain);
-	goto out_unlock;
-
-out_list_del:
-	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	list_del_init(&master->domain_head);
-	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
-
-out_unlock:
 	mutex_unlock(&arm_smmu_asid_lock);
-	return ret;
+	return 0;
 }
 
 static int arm_smmu_attach_dev_ste(struct device *dev,
-- 
2.43.2



* [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
                   ` (5 preceding siblings ...)
  2024-04-16 19:28 ` [PATCH v7 6/9] iommu/arm-smmu-v3: Allocate the CD table entry in advance Jason Gunthorpe
@ 2024-04-16 19:28 ` Jason Gunthorpe
  2024-04-17  7:37   ` Nicolin Chen
                     ` (2 more replies)
  2024-04-16 19:28 ` [PATCH v7 8/9] iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd() Jason Gunthorpe
                   ` (2 subsequent siblings)
  9 siblings, 3 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-16 19:28 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

Pull all the calculations for building the CD table entry for an mm_struct
into arm_smmu_make_sva_cd().

Call it in the two places installing the SVA CD table entry.

Open code the last caller of arm_smmu_update_ctx_desc_devices() and remove
the function.

Remove arm_smmu_write_ctx_desc() since all callers are gone. Add the
locking assertions to arm_smmu_alloc_cd_ptr() since
arm_smmu_update_ctx_desc_devices() was the last problematic caller.

Remove quiet_cd since all users are gone; arm_smmu_make_sva_cd() creates
the same value.

The behavior of quiet_cd changes slightly. The old implementation edited
the CD in place to set CTXDESC_CD_0_TCR_EPD0, assuming it was an SVA CD
entry. This version generates a full CD entry with a 0 TTB0 and relies on
arm_smmu_write_cd_entry() to install it hitlessly.
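
To make the "with mm" versus "released mm" distinction concrete, here is
a reduced userspace sketch. The MODEL_ bit positions are illustrative
values chosen for this example and should not be read as the register
layout; the helper also omits most of the fields the real function sets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative bit positions for this sketch only. */
#define MODEL_CD_EPD0		(1ULL << 14)
#define MODEL_CD_V		(1ULL << 31)
#define MODEL_CD_S		(1ULL << 44)
#define MODEL_CD_R		(1ULL << 45)
#define MODEL_CD_ASID_SHIFT	48

struct model_cd {
	uint64_t data[8];
};

/*
 * Build a complete entry from scratch. With have_mm the entry points at a
 * page table; without it the entry stays valid but faults everything.
 */
static void model_make_sva_cd(struct model_cd *target, bool have_mm,
			      uint64_t ttb0, uint16_t asid,
			      bool stall_enabled, bool stall_force)
{
	assert(target != NULL);
	memset(target, 0, sizeof(*target));

	target->data[0] = MODEL_CD_V | MODEL_CD_R |
			  (stall_enabled ? MODEL_CD_S : 0) |
			  ((uint64_t)asid << MODEL_CD_ASID_SHIFT);

	if (have_mm) {
		target->data[1] = ttb0;
	} else {
		/* Translation disabled; TTB0 is left as 0. */
		target->data[0] |= MODEL_CD_EPD0;
		/* Abort instead of stalling when that is permitted. */
		if (stall_enabled && !stall_force)
			target->data[0] &= ~(MODEL_CD_S | MODEL_CD_R);
	}
}
```

Both variants are generated as whole entries, so the writer can compare
the live entry against the target and decide how to move between them
without ever clearing the valid bit.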

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 156 +++++++++++-------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 103 +-----------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   7 +-
 3 files changed, 108 insertions(+), 158 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 7cf286f7a009fb..80a7d559ef2d3f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -34,25 +34,6 @@ struct arm_smmu_bond {
 
 static DEFINE_MUTEX(sva_lock);
 
-/*
- * Write the CD to the CD tables for all masters that this domain is attached
- * to. Note that this is only used to update existing CD entries in the target
- * CD table, for which it's assumed that arm_smmu_write_ctx_desc can't fail.
- */
-static void arm_smmu_update_ctx_desc_devices(struct arm_smmu_domain *smmu_domain,
-					   int ssid,
-					   struct arm_smmu_ctx_desc *cd)
-{
-	struct arm_smmu_master *master;
-	unsigned long flags;
-
-	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
-		arm_smmu_write_ctx_desc(master, ssid, cd);
-	}
-	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
-}
-
 static void
 arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain)
 {
@@ -128,11 +109,86 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
 	return NULL;
 }
 
+static u64 page_size_to_cd(void)
+{
+	static_assert(PAGE_SIZE == SZ_4K || PAGE_SIZE == SZ_16K ||
+		      PAGE_SIZE == SZ_64K);
+	if (PAGE_SIZE == SZ_64K)
+		return ARM_LPAE_TCR_TG0_64K;
+	if (PAGE_SIZE == SZ_16K)
+		return ARM_LPAE_TCR_TG0_16K;
+	return ARM_LPAE_TCR_TG0_4K;
+}
+
+static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
+				 struct arm_smmu_master *master,
+				 struct mm_struct *mm, u16 asid)
+{
+	u64 par;
+
+	memset(target, 0, sizeof(*target));
+
+	par = cpuid_feature_extract_unsigned_field(
+		read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1),
+		ID_AA64MMFR0_EL1_PARANGE_SHIFT);
+
+	target->data[0] = cpu_to_le64(
+		CTXDESC_CD_0_TCR_EPD1 |
+#ifdef __BIG_ENDIAN
+		CTXDESC_CD_0_ENDI |
+#endif
+		CTXDESC_CD_0_V |
+		FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par) |
+		CTXDESC_CD_0_AA64 |
+		(master->stall_enabled ? CTXDESC_CD_0_S : 0) |
+		CTXDESC_CD_0_R |
+		CTXDESC_CD_0_A |
+		CTXDESC_CD_0_ASET |
+		FIELD_PREP(CTXDESC_CD_0_ASID, asid));
+
+	/*
+	 * If no MM is passed then this creates a SVA entry that faults
+	 * everything. arm_smmu_write_cd_entry() can hitlessly go between these
+	 * two entries types since TTB0 is ignored by HW when EPD0 is set.
+	 */
+	if (mm) {
+		target->data[0] |= cpu_to_le64(
+			FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ,
+				   64ULL - vabits_actual) |
+			FIELD_PREP(CTXDESC_CD_0_TCR_TG0, page_size_to_cd()) |
+			FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0,
+				   ARM_LPAE_TCR_RGN_WBWA) |
+			FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0,
+				   ARM_LPAE_TCR_RGN_WBWA) |
+			FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS));
+
+		target->data[1] = cpu_to_le64(virt_to_phys(mm->pgd) &
+					      CTXDESC_CD_1_TTB0_MASK);
+	} else {
+		target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_EPD0);
+
+		/*
+		 * Disable stall and immediately generate an abort if stall
+		 * disable is permitted. This speeds up cleanup for an unclean
+		 * exit if the device is still doing a lot of DMA.
+		 */
+		if (master->stall_enabled &&
+		    !(master->smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
+			target->data[0] &=
+				cpu_to_le64(~(CTXDESC_CD_0_S | CTXDESC_CD_0_R));
+	}
+
+	/*
+	 * MAIR value is pretty much constant and global, so we can just get it
+	 * from the current CPU register
+	 */
+	target->data[3] = cpu_to_le64(read_sysreg(mair_el1));
+}
+
 static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
 {
 	u16 asid;
 	int err = 0;
-	u64 tcr, par, reg;
 	struct arm_smmu_ctx_desc *cd;
 	struct arm_smmu_ctx_desc *ret = NULL;
 
@@ -166,39 +222,6 @@ static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
 	if (err)
 		goto out_free_asid;
 
-	tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) |
-	      FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
-	      FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
-	      FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
-	      CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
-
-	switch (PAGE_SIZE) {
-	case SZ_4K:
-		tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_4K);
-		break;
-	case SZ_16K:
-		tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_16K);
-		break;
-	case SZ_64K:
-		tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_64K);
-		break;
-	default:
-		WARN_ON(1);
-		err = -EINVAL;
-		goto out_free_asid;
-	}
-
-	reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
-	par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_EL1_PARANGE_SHIFT);
-	tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par);
-
-	cd->ttbr = virt_to_phys(mm->pgd);
-	cd->tcr = tcr;
-	/*
-	 * MAIR value is pretty much constant and global, so we can just get it
-	 * from the current CPU register
-	 */
-	cd->mair = read_sysreg(mair_el1);
 	cd->asid = asid;
 	cd->mm = mm;
 
@@ -276,6 +299,8 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 {
 	struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn);
 	struct arm_smmu_domain *smmu_domain = smmu_mn->domain;
+	struct arm_smmu_master *master;
+	unsigned long flags;
 
 	mutex_lock(&sva_lock);
 	if (smmu_mn->cleared) {
@@ -287,8 +312,19 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	 * DMA may still be running. Keep the cd valid to avoid C_BAD_CD events,
 	 * but disable translation.
 	 */
-	arm_smmu_update_ctx_desc_devices(smmu_domain, mm_get_enqcmd_pasid(mm),
-					 &quiet_cd);
+	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
+	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
+		struct arm_smmu_cd target;
+		struct arm_smmu_cd *cdptr;
+
+		cdptr = arm_smmu_get_cd_ptr(master, mm_get_enqcmd_pasid(mm));
+		if (WARN_ON(!cdptr))
+			continue;
+		arm_smmu_make_sva_cd(&target, master, NULL, smmu_mn->cd->asid);
+		arm_smmu_write_cd_entry(master, mm_get_enqcmd_pasid(mm), cdptr,
+					&target);
+	}
+	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
 	arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
 	arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0);
@@ -383,6 +419,8 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid,
 			       struct mm_struct *mm)
 {
 	int ret;
+	struct arm_smmu_cd target;
+	struct arm_smmu_cd *cdptr;
 	struct arm_smmu_bond *bond;
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
@@ -409,9 +447,13 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid,
 		goto err_free_bond;
 	}
 
-	ret = arm_smmu_write_ctx_desc(master, pasid, bond->smmu_mn->cd);
-	if (ret)
+	cdptr = arm_smmu_alloc_cd_ptr(master, mm_get_enqcmd_pasid(mm));
+	if (!cdptr) {
+		ret = -ENOMEM;
 		goto err_put_notifier;
+	}
+	arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid);
+	arm_smmu_write_cd_entry(master, pasid, cdptr, &target);
 
 	list_add(&bond->list, &master->bonds);
 	return 0;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0aacd95f34a479..d01b632197c0b7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -84,12 +84,6 @@ struct arm_smmu_option_prop {
 DEFINE_XARRAY_ALLOC1(arm_smmu_asid_xa);
 DEFINE_MUTEX(arm_smmu_asid_lock);
 
-/*
- * Special value used by SVA when a process dies, to quiesce a CD without
- * disabling it.
- */
-struct arm_smmu_ctx_desc quiet_cd = { 0 };
-
 static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
 	{ ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"},
@@ -1201,7 +1195,7 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst,
 	u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
 		  CTXDESC_L1_DESC_V;
 
-	/* See comment in arm_smmu_write_ctx_desc() */
+	/* The HW has 64 bit atomicity with stores to the L2 CD table */
 	WRITE_ONCE(*dst, cpu_to_le64(val));
 }
 
@@ -1224,12 +1218,15 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
 	return &l1_desc->l2ptr[ssid % CTXDESC_L2_ENTRIES];
 }
 
-static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
-						 u32 ssid)
+struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
+					  u32 ssid)
 {
 	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
 	struct arm_smmu_device *smmu = master->smmu;
 
+	might_sleep();
+	iommu_group_mutex_assert(master->dev);
+
 	if (!cd_table->cdtab) {
 		if (arm_smmu_alloc_cd_tables(master))
 			return NULL;
@@ -1345,91 +1342,6 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
 	arm_smmu_write_cd_entry(master, ssid, cdptr, &target);
 }
 
-static void arm_smmu_clean_cd_entry(struct arm_smmu_cd *target)
-{
-	struct arm_smmu_cd used = {};
-	int i;
-
-	arm_smmu_get_cd_used(target->data, used.data);
-	for (i = 0; i != ARRAY_SIZE(target->data); i++)
-		target->data[i] &= used.data[i];
-}
-
-int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
-			    struct arm_smmu_ctx_desc *cd)
-{
-	/*
-	 * This function handles the following cases:
-	 *
-	 * (1) Install primary CD, for normal DMA traffic (SSID = IOMMU_NO_PASID = 0).
-	 * (2) Install a secondary CD, for SID+SSID traffic.
-	 * (3) Update ASID of a CD. Atomically write the first 64 bits of the
-	 *     CD, then invalidate the old entry and mappings.
-	 * (4) Quiesce the context without clearing the valid bit. Disable
-	 *     translation, and ignore any translation fault.
-	 * (5) Remove a secondary CD.
-	 */
-	u64 val;
-	bool cd_live;
-	struct arm_smmu_cd target;
-	struct arm_smmu_cd *cdptr = &target;
-	struct arm_smmu_cd *cd_table_entry;
-	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
-	struct arm_smmu_device *smmu = master->smmu;
-
-	if (WARN_ON(ssid >= (1 << cd_table->s1cdmax)))
-		return -E2BIG;
-
-	cd_table_entry = arm_smmu_alloc_cd_ptr(master, ssid);
-	if (!cd_table_entry)
-		return -ENOMEM;
-
-	target = *cd_table_entry;
-	val = le64_to_cpu(cdptr->data[0]);
-	cd_live = !!(val & CTXDESC_CD_0_V);
-
-	if (!cd) { /* (5) */
-		val = 0;
-	} else if (cd == &quiet_cd) { /* (4) */
-		if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
-			val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R);
-		val |= CTXDESC_CD_0_TCR_EPD0;
-	} else if (cd_live) { /* (3) */
-		val &= ~CTXDESC_CD_0_ASID;
-		val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid);
-		/*
-		 * Until CD+TLB invalidation, both ASIDs may be used for tagging
-		 * this substream's traffic
-		 */
-	} else { /* (1) and (2) */
-		cdptr->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
-		cdptr->data[2] = 0;
-		cdptr->data[3] = cpu_to_le64(cd->mair);
-
-		val = cd->tcr |
-#ifdef __BIG_ENDIAN
-			CTXDESC_CD_0_ENDI |
-#endif
-			CTXDESC_CD_0_R | CTXDESC_CD_0_A |
-			(cd->mm ? 0 : CTXDESC_CD_0_ASET) |
-			CTXDESC_CD_0_AA64 |
-			FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
-			CTXDESC_CD_0_V;
-
-		if (cd_table->stall_enabled)
-			val |= CTXDESC_CD_0_S;
-	}
-	cdptr->data[0] = cpu_to_le64(val);
-	/*
-	 * Since the above is updating the CD entry based on the current value
-	 * without zeroing unused bits it needs fixing before being passed to
-	 * the programming logic.
-	 */
-	arm_smmu_clean_cd_entry(&target);
-	arm_smmu_write_cd_entry(master, ssid, cd_table_entry, &target);
-	return 0;
-}
-
 static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
 {
 	int ret;
@@ -1438,7 +1350,6 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
 	struct arm_smmu_device *smmu = master->smmu;
 	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
 
-	cd_table->stall_enabled = master->stall_enabled;
 	cd_table->s1cdmax = master->ssid_bits;
 	max_contexts = 1 << cd_table->s1cdmax;
 
@@ -1536,7 +1447,7 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
 	val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
 	val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
 
-	/* See comment in arm_smmu_write_ctx_desc() */
+	/* The HW has 64 bit atomicity with stores to the L2 STE table */
 	WRITE_ONCE(*dst, cpu_to_le64(val));
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 99fd6f24caa818..8098bf8836a180 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -609,8 +609,6 @@ struct arm_smmu_ctx_desc_cfg {
 	u8				s1fmt;
 	/* log2 of the maximum number of CDs supported by this table */
 	u8				s1cdmax;
-	/* Whether CD entries in this table have the stall bit set. */
-	u8				stall_enabled:1;
 };
 
 struct arm_smmu_s2_cfg {
@@ -749,11 +747,12 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 
 extern struct xarray arm_smmu_asid_xa;
 extern struct mutex arm_smmu_asid_lock;
-extern struct arm_smmu_ctx_desc quiet_cd;
 
 void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid);
 struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
 					u32 ssid);
+struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
+					  u32 ssid);
 void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
 			 struct arm_smmu_master *master,
 			 struct arm_smmu_domain *smmu_domain);
@@ -761,8 +760,6 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
 			     struct arm_smmu_cd *cdptr,
 			     const struct arm_smmu_cd *target);
 
-int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid,
-			    struct arm_smmu_ctx_desc *cd);
 void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
 void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
 				 size_t granule, bool leaf,
-- 
2.43.2



* [PATCH v7 8/9] iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
  2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
                   ` (6 preceding siblings ...)
  2024-04-16 19:28 ` [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function Jason Gunthorpe
@ 2024-04-16 19:28 ` Jason Gunthorpe
  2024-04-17  7:43   ` Nicolin Chen
  2024-04-16 19:28 ` [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry Jason Gunthorpe
  2024-04-16 19:40 ` [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Nicolin Chen
  9 siblings, 1 reply; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-16 19:28 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

Half the code was living in arm_smmu_domain_finalise_s1(). Move it here
and take the values directly from the pgtbl_ops instead of storing copies.
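
For readers unfamiliar with the FIELD_PREP() style of packing used here,
the following userspace sketch shows the idea. model_field_prep() is a
hypothetical stand-in written for this example (the kernel macro is
implemented differently), and the masks and struct model_tcr are
assumptions, not the driver's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Shift val into the field described by mask, dropping overflow bits. */
static uint64_t model_field_prep(uint64_t mask, uint64_t val)
{
	uint64_t lsb;

	assert(mask != 0);
	lsb = mask & (~mask + 1);	/* lowest set bit of the mask */
	return (val * lsb) & mask;
}

/* Hypothetical field masks for the sketch. */
#define MODEL_T0SZ	0x3fULL		/* bits 5:0 */
#define MODEL_TG0	(0x3ULL << 6)	/* bits 7:6 */
#define MODEL_SH0	(0x3ULL << 12)	/* bits 13:12 */

/* Hypothetical stand-in for the page table config the values come from. */
struct model_tcr {
	unsigned int tsz, tg, sh;
};

/* Pack the word directly from the config instead of a cached copy. */
static uint64_t model_make_cd_word0(const struct model_tcr *tcr)
{
	return model_field_prep(MODEL_T0SZ, tcr->tsz) |
	       model_field_prep(MODEL_TG0, tcr->tg) |
	       model_field_prep(MODEL_SH0, tcr->sh);
}
```

Building the word at the point of use means there is one source of truth
for the values, which is the motivation the commit message gives for
dropping the stored copies.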

Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 47 ++++++++-------------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  3 --
 2 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d01b632197c0b7..72402f6a7ed4e0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1308,15 +1308,25 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
 			 struct arm_smmu_domain *smmu_domain)
 {
 	struct arm_smmu_ctx_desc *cd = &smmu_domain->cd;
+	const struct io_pgtable_cfg *pgtbl_cfg =
+		&io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg;
+	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr =
+		&pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 
 	memset(target, 0, sizeof(*target));
 
 	target->data[0] = cpu_to_le64(
-		cd->tcr |
+		FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) |
+		FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) |
+		FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) |
+		FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
+		FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
 #ifdef __BIG_ENDIAN
 		CTXDESC_CD_0_ENDI |
 #endif
+		CTXDESC_CD_0_TCR_EPD1 |
 		CTXDESC_CD_0_V |
+		FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
 		CTXDESC_CD_0_AA64 |
 		(master->stall_enabled ? CTXDESC_CD_0_S : 0) |
 		CTXDESC_CD_0_R |
@@ -1324,9 +1334,9 @@ void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
 		CTXDESC_CD_0_ASET |
 		FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid)
 		);
-
-	target->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
-	target->data[3] = cpu_to_le64(cd->mair);
+	target->data[1] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.ttbr &
+				      CTXDESC_CD_1_TTB0_MASK);
+	target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s1_cfg.mair);
 }
 
 void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
@@ -2284,13 +2294,11 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 }
 
 static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu,
-				       struct arm_smmu_domain *smmu_domain,
-				       struct io_pgtable_cfg *pgtbl_cfg)
+				       struct arm_smmu_domain *smmu_domain)
 {
 	int ret;
 	u32 asid;
 	struct arm_smmu_ctx_desc *cd = &smmu_domain->cd;
-	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 
 	refcount_set(&cd->refs, 1);
 
@@ -2298,31 +2306,13 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu,
 	mutex_lock(&arm_smmu_asid_lock);
 	ret = xa_alloc(&arm_smmu_asid_xa, &asid, cd,
 		       XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL);
-	if (ret)
-		goto out_unlock;
-
 	cd->asid	= (u16)asid;
-	cd->ttbr	= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-	cd->tcr		= FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) |
-			  FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) |
-			  FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) |
-			  FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) |
-			  FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) |
-			  FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) |
-			  CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
-	cd->mair	= pgtbl_cfg->arm_lpae_s1_cfg.mair;
-
-	mutex_unlock(&arm_smmu_asid_lock);
-	return 0;
-
-out_unlock:
 	mutex_unlock(&arm_smmu_asid_lock);
 	return ret;
 }
 
 static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu,
-				       struct arm_smmu_domain *smmu_domain,
-				       struct io_pgtable_cfg *pgtbl_cfg)
+				       struct arm_smmu_domain *smmu_domain)
 {
 	int vmid;
 	struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
@@ -2346,8 +2336,7 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain,
 	struct io_pgtable_cfg pgtbl_cfg;
 	struct io_pgtable_ops *pgtbl_ops;
 	int (*finalise_stage_fn)(struct arm_smmu_device *smmu,
-				 struct arm_smmu_domain *smmu_domain,
-				 struct io_pgtable_cfg *pgtbl_cfg);
+				 struct arm_smmu_domain *smmu_domain);
 
 	/* Restrict the stage to what we can actually support */
 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
@@ -2390,7 +2379,7 @@ static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain,
 	smmu_domain->domain.geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
 	smmu_domain->domain.geometry.force_aperture = true;
 
-	ret = finalise_stage_fn(smmu, smmu_domain, &pgtbl_cfg);
+	ret = finalise_stage_fn(smmu, smmu_domain);
 	if (ret < 0) {
 		free_io_pgtable_ops(pgtbl_ops);
 		return ret;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 8098bf8836a180..8f791f67f9f7f4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -588,9 +588,6 @@ struct arm_smmu_strtab_l1_desc {
 
 struct arm_smmu_ctx_desc {
 	u16				asid;
-	u64				ttbr;
-	u64				tcr;
-	u64				mair;
 
 	refcount_t			refs;
 	struct mm_struct		*mm;
-- 
2.43.2


* [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
                   ` (7 preceding siblings ...)
  2024-04-16 19:28 ` [PATCH v7 8/9] iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd() Jason Gunthorpe
@ 2024-04-16 19:28 ` Jason Gunthorpe
  2024-04-17  8:09   ` Nicolin Chen
  2024-04-19 21:24   ` Mostafa Saleh
  2024-04-16 19:40 ` [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Nicolin Chen
  9 siblings, 2 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-16 19:28 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

Add tests for some of the more common STE update operations that we expect
to see, as well as some artificial STE updates to test the edges of
arm_smmu_write_entry. These also serve as a record of which common
operations are expected to be hitless, and how many syncs they require.

arm_smmu_write_entry implements a generic algorithm that updates an STE/CD
to any other arbitrary STE/CD configuration. The update requires a
sequence of write+sync operations with some invariants that must be held
true after each sync. arm_smmu_write_entry lends itself well to
unit-testing since the function's interaction with the STE/CD is already
abstracted by input callbacks that we can hook to introspect into the
sequence of operations. We can use these hooks to guarantee that
invariants are held throughout the entire update operation.
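
As a rough illustration of the hook-and-check idea described above, here is a
minimal user-space sketch. It is not the driver code: the toy_* names, the
two-configuration "used bits" rule and the bit encodings are all assumptions
made up for this example, and the update sequence is written by hand rather
than produced by arm_smmu_write_entry. The sync hook counts syncs and checks
that, under the bits used by the live entry, the entry matches either the
initial or the target value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NUM_QWORDS     8
#define TOY_V          (1ULL << 0)   /* hypothetical valid bit, qword 0 */
#define TOY_CFG_MASK   (7ULL << 1)   /* hypothetical config field, qword 0 */
#define TOY_CFG_ABORT  (0ULL << 1)
#define TOY_CFG_BYPASS (4ULL << 1)

struct toy_writer {
	const uint64_t *init;     /* entry before the update */
	const uint64_t *target;   /* entry after the update */
	uint64_t *entry;          /* live entry being rewritten */
	unsigned int num_syncs;
	bool invalid_entry_seen;
	bool invariant_broken;
};

struct toy_result {
	unsigned int num_syncs;
	bool invalid_entry_seen;
	bool invariant_broken;
};

/* Toy rule: "abort" uses only qword 0, "bypass" uses qwords 0 and 1. */
static void toy_get_used(const uint64_t *ent, uint64_t *used)
{
	memset(used, 0, NUM_QWORDS * sizeof(*used));
	used[0] = TOY_V;
	if (!(ent[0] & TOY_V))
		return;
	used[0] |= TOY_CFG_MASK;
	if ((ent[0] & TOY_CFG_MASK) == TOY_CFG_BYPASS)
		used[1] = ~0ULL;
}

/* True if ent and ref disagree in any bit the current config uses. */
static bool toy_differs(const uint64_t *ent, const uint64_t *used,
			const uint64_t *ref)
{
	for (unsigned int i = 0; i < NUM_QWORDS; i++)
		if ((ent[i] ^ ref[i]) & used[i])
			return true;
	return false;
}

/* Stand-in for the sync op: record the sync and check the invariant. */
static void toy_sync(struct toy_writer *w)
{
	uint64_t used[NUM_QWORDS];

	w->num_syncs++;
	if (!(w->entry[0] & TOY_V)) {
		w->invalid_entry_seen = true;
		return;
	}
	toy_get_used(w->entry, used);
	if (toy_differs(w->entry, used, w->init) &&
	    toy_differs(w->entry, used, w->target))
		w->invariant_broken = true;
}

/* Hand-written abort -> bypass update, in a safe or an unsafe order. */
struct toy_result toy_abort_to_bypass(bool qword1_first)
{
	const uint64_t init[NUM_QWORDS] = { TOY_V | TOY_CFG_ABORT };
	const uint64_t target[NUM_QWORDS] = { TOY_V | TOY_CFG_BYPASS, 0x1234 };
	uint64_t entry[NUM_QWORDS];
	struct toy_writer w = { .init = init, .target = target, .entry = entry };

	memcpy(entry, init, sizeof(entry));
	if (qword1_first) {
		entry[1] = target[1];   /* unused while still "abort" */
		toy_sync(&w);
		entry[0] = target[0];   /* single qword flips the config */
		toy_sync(&w);
	} else {
		entry[0] = target[0];   /* "bypass" with a stale qword 1 */
		toy_sync(&w);
		entry[1] = target[1];
		toy_sync(&w);
	}
	assert(memcmp(entry, target, sizeof(entry)) == 0);

	return (struct toy_result){ w.num_syncs, w.invalid_entry_seen,
				    w.invariant_broken };
}
```

In this sketch toy_abort_to_bypass(true) finishes in two syncs without
breaking the invariant, while toy_abort_to_bypass(false) trips it on the first
sync, which is the kind of ordering mistake the recorded syncs are meant to
catch.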

Link: https://lore.kernel.org/r/20240106083617.1173871-3-mshavit@google.com
Signed-off-by: Michael Shavit <mshavit@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/Kconfig                         |  12 +-
 drivers/iommu/arm/arm-smmu-v3/Makefile        |   2 +
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   6 +-
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 467 ++++++++++++++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  36 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  30 ++
 6 files changed, 525 insertions(+), 28 deletions(-)
 create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 0af39bbbe3a30e..2e597102baf6e5 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -397,9 +397,9 @@ config ARM_SMMU_V3
 	  Say Y here if your system includes an IOMMU device implementing
 	  the ARM SMMUv3 architecture.
 
+if ARM_SMMU_V3
 config ARM_SMMU_V3_SVA
 	bool "Shared Virtual Addressing support for the ARM SMMUv3"
-	depends on ARM_SMMU_V3
 	select IOMMU_SVA
 	select IOMMU_IOPF
 	select MMU_NOTIFIER
@@ -410,6 +410,16 @@ config ARM_SMMU_V3_SVA
 	  Say Y here if your system supports SVA extensions such as PCIe PASID
 	  and PRI.
 
+config ARM_SMMU_V3_KUNIT_TEST
+	tristate "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
+	depends on KUNIT
+	default KUNIT_ALL_TESTS
+	help
+	  Enable this option to unit-test arm-smmu-v3 driver functions.
+
+	  If unsure, say N.
+endif
+
 config S390_IOMMU
 	def_bool y if S390 && PCI
 	depends on S390 && PCI
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 54feb1ecccad89..014a997753a8a2 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -3,3 +3,5 @@ obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
 arm_smmu_v3-objs-y += arm-smmu-v3.o
 arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
+
+obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 80a7d559ef2d3f..f56a2d38012b5c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -120,9 +120,9 @@ static u64 page_size_to_cd(void)
 	return ARM_LPAE_TCR_TG0_4K;
 }
 
-static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
-				 struct arm_smmu_master *master,
-				 struct mm_struct *mm, u16 asid)
+void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
+			  struct arm_smmu_master *master, struct mm_struct *mm,
+			  u16 asid)
 {
 	u64 par;
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
new file mode 100644
index 00000000000000..14c8e40712a70e
--- /dev/null
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
@@ -0,0 +1,467 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2024 Google LLC.
+ */
+#include <kunit/test.h>
+#include <linux/io-pgtable.h>
+
+#include "arm-smmu-v3.h"
+
+struct arm_smmu_test_writer {
+	struct arm_smmu_entry_writer writer;
+	struct kunit *test;
+	const __le64 *init_entry;
+	const __le64 *target_entry;
+	__le64 *entry;
+
+	bool invalid_entry_written;
+	unsigned int num_syncs;
+};
+
+#define NUM_ENTRY_QWORDS 8
+#define NUM_EXPECTED_SYNCS(x) x
+
+static struct arm_smmu_ste bypass_ste;
+static struct arm_smmu_ste abort_ste;
+static struct arm_smmu_device smmu = {
+	.features = ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_ATTR_TYPES_OVR
+};
+
+static bool arm_smmu_entry_differs_in_used_bits(const __le64 *entry,
+						const __le64 *used_bits,
+						const __le64 *target,
+						unsigned int length)
+{
+	bool differs = false;
+	unsigned int i;
+
+	for (i = 0; i < length; i++) {
+		if ((entry[i] & used_bits[i]) != target[i])
+			differs = true;
+	}
+	return differs;
+}
+
+static void
+arm_smmu_test_writer_record_syncs(struct arm_smmu_entry_writer *writer)
+{
+	struct arm_smmu_test_writer *test_writer =
+		container_of(writer, struct arm_smmu_test_writer, writer);
+	__le64 *entry_used_bits;
+
+	entry_used_bits = kunit_kzalloc(
+		test_writer->test, sizeof(*entry_used_bits) * NUM_ENTRY_QWORDS,
+		GFP_KERNEL);
+	KUNIT_ASSERT_NOT_NULL(test_writer->test, entry_used_bits);
+
+	pr_debug("STE value is now set to: ");
+	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8,
+			     test_writer->entry,
+			     NUM_ENTRY_QWORDS * sizeof(*test_writer->entry),
+			     false);
+
+	test_writer->num_syncs += 1;
+	if (!(test_writer->entry[0] & writer->ops->v_bit)) {
+		test_writer->invalid_entry_written = true;
+	} else {
+		/*
+		 * At any stage in a hitless transition, the entry must be
+		 * equivalent to either the initial entry or the target entry
+		 * when only considering the bits used by the current
+		 * configuration.
+		 */
+		writer->ops->get_used(test_writer->entry, entry_used_bits);
+		KUNIT_EXPECT_FALSE(
+			test_writer->test,
+			arm_smmu_entry_differs_in_used_bits(
+				test_writer->entry, entry_used_bits,
+				test_writer->init_entry, NUM_ENTRY_QWORDS) &&
+				arm_smmu_entry_differs_in_used_bits(
+					test_writer->entry, entry_used_bits,
+					test_writer->target_entry,
+					NUM_ENTRY_QWORDS));
+	}
+}
+
+static void
+arm_smmu_v3_test_debug_print_used_bits(struct arm_smmu_entry_writer *writer,
+				       const __le64 *ste)
+{
+	__le64 used_bits[NUM_ENTRY_QWORDS] = {};
+
+	arm_smmu_get_ste_used(ste, used_bits);
+	pr_debug("STE used bits: ");
+	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8, used_bits,
+			     sizeof(used_bits), false);
+}
+
+static const struct arm_smmu_entry_writer_ops test_ste_ops = {
+	.v_bit = cpu_to_le64(STRTAB_STE_0_V),
+	.sync = arm_smmu_test_writer_record_syncs,
+	.get_used = arm_smmu_get_ste_used,
+};
+
+static const struct arm_smmu_entry_writer_ops test_cd_ops = {
+	.v_bit = cpu_to_le64(CTXDESC_CD_0_V),
+	.sync = arm_smmu_test_writer_record_syncs,
+	.get_used = arm_smmu_get_cd_used,
+};
+
+static void arm_smmu_v3_test_ste_expect_transition(
+	struct kunit *test, const struct arm_smmu_ste *cur,
+	const struct arm_smmu_ste *target, unsigned int num_syncs_expected,
+	bool hitless)
+{
+	struct arm_smmu_ste cur_copy = *cur;
+	struct arm_smmu_test_writer test_writer = {
+		.writer = {
+			.ops = &test_ste_ops,
+		},
+		.test = test,
+		.init_entry = cur->data,
+		.target_entry = target->data,
+		.entry = cur_copy.data,
+		.num_syncs = 0,
+		.invalid_entry_written = false,
+
+	};
+
+	pr_debug("STE initial value: ");
+	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8, cur_copy.data,
+			     sizeof(cur_copy), false);
+	arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, cur->data);
+	pr_debug("STE target value: ");
+	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8, target->data,
+			     sizeof(cur_copy), false);
+	arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer,
+					       target->data);
+
+	arm_smmu_write_entry(&test_writer.writer, cur_copy.data, target->data);
+
+	KUNIT_EXPECT_EQ(test, test_writer.invalid_entry_written, !hitless);
+	KUNIT_EXPECT_EQ(test, test_writer.num_syncs, num_syncs_expected);
+	KUNIT_EXPECT_MEMEQ(test, target->data, cur_copy.data, sizeof(cur_copy));
+}
+
+static void arm_smmu_v3_test_ste_expect_hitless_transition(
+	struct kunit *test, const struct arm_smmu_ste *cur,
+	const struct arm_smmu_ste *target, unsigned int num_syncs_expected)
+{
+	arm_smmu_v3_test_ste_expect_transition(test, cur, target,
+					       num_syncs_expected, true);
+}
+
+static const dma_addr_t fake_cdtab_dma_addr = 0xF0F0F0F0F0F0;
+
+static void arm_smmu_test_make_cdtable_ste(struct arm_smmu_ste *ste,
+					   const dma_addr_t dma_addr)
+{
+	struct arm_smmu_master master = {
+		.cd_table.cdtab_dma = dma_addr,
+		.cd_table.s1cdmax = 0xFF,
+		.cd_table.s1fmt = STRTAB_STE_0_S1FMT_64K_L2,
+		.smmu = &smmu,
+	};
+
+	arm_smmu_make_cdtable_ste(ste, &master);
+}
+
+static void arm_smmu_v3_write_ste_test_bypass_to_abort(struct kunit *test)
+{
+	/*
+	 * Bypass STEs have used bits in the first two qwords, while abort STEs
+	 * only have used bits in the first qword. Transitioning from bypass to
+	 * abort requires two syncs: the first to set the first qword and make
+	 * the STE into an abort, the second to clean up the second qword.
+	 */
+	arm_smmu_v3_test_ste_expect_hitless_transition(
+		test, &bypass_ste, &abort_ste, NUM_EXPECTED_SYNCS(2));
+}
+
+static void arm_smmu_v3_write_ste_test_abort_to_bypass(struct kunit *test)
+{
+	/*
+	 * Transitioning from abort to bypass also requires two syncs: the first
+	 * to set the second qword data required by the bypass STE, and the
+	 * second to set the first qword and switch to bypass.
+	 */
+	arm_smmu_v3_test_ste_expect_hitless_transition(
+		test, &abort_ste, &bypass_ste, NUM_EXPECTED_SYNCS(2));
+}
+
+static void arm_smmu_v3_write_ste_test_cdtable_to_abort(struct kunit *test)
+{
+	struct arm_smmu_ste ste;
+
+	arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr);
+	arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &abort_ste,
+						       NUM_EXPECTED_SYNCS(2));
+}
+
+static void arm_smmu_v3_write_ste_test_abort_to_cdtable(struct kunit *test)
+{
+	struct arm_smmu_ste ste;
+
+	arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr);
+	arm_smmu_v3_test_ste_expect_hitless_transition(test, &abort_ste, &ste,
+						       NUM_EXPECTED_SYNCS(2));
+}
+
+static void arm_smmu_v3_write_ste_test_cdtable_to_bypass(struct kunit *test)
+{
+	struct arm_smmu_ste ste;
+
+	arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr);
+	arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &bypass_ste,
+						       NUM_EXPECTED_SYNCS(3));
+}
+
+static void arm_smmu_v3_write_ste_test_bypass_to_cdtable(struct kunit *test)
+{
+	struct arm_smmu_ste ste;
+
+	arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr);
+	arm_smmu_v3_test_ste_expect_hitless_transition(test, &bypass_ste, &ste,
+						       NUM_EXPECTED_SYNCS(3));
+}
+
+static void arm_smmu_test_make_s2_ste(struct arm_smmu_ste *ste,
+				      bool ats_enabled)
+{
+	struct arm_smmu_master master = {
+		.smmu = &smmu,
+		.ats_enabled = ats_enabled,
+	};
+	struct io_pgtable io_pgtable = {};
+	struct arm_smmu_domain smmu_domain = {
+		.pgtbl_ops = &io_pgtable.ops,
+	};
+
+	io_pgtable.cfg.arm_lpae_s2_cfg.vttbr = 0xdaedbeefdeadbeefULL;
+	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.ps = 1;
+	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tg = 2;
+	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sh = 3;
+	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.orgn = 1;
+	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.irgn = 2;
+	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sl = 3;
+	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tsz = 4;
+
+	arm_smmu_make_s2_domain_ste(ste, &master, &smmu_domain);
+}
+
+static void arm_smmu_v3_write_ste_test_s2_to_abort(struct kunit *test)
+{
+	struct arm_smmu_ste ste;
+
+	arm_smmu_test_make_s2_ste(&ste, true);
+	arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &abort_ste,
+						       NUM_EXPECTED_SYNCS(2));
+}
+
+static void arm_smmu_v3_write_ste_test_abort_to_s2(struct kunit *test)
+{
+	struct arm_smmu_ste ste;
+
+	arm_smmu_test_make_s2_ste(&ste, true);
+	arm_smmu_v3_test_ste_expect_hitless_transition(test, &abort_ste, &ste,
+						       NUM_EXPECTED_SYNCS(2));
+}
+
+static void arm_smmu_v3_write_ste_test_s2_to_bypass(struct kunit *test)
+{
+	struct arm_smmu_ste ste;
+
+	arm_smmu_test_make_s2_ste(&ste, true);
+	arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &bypass_ste,
+						       NUM_EXPECTED_SYNCS(2));
+}
+
+static void arm_smmu_v3_write_ste_test_bypass_to_s2(struct kunit *test)
+{
+	struct arm_smmu_ste ste;
+
+	arm_smmu_test_make_s2_ste(&ste, true);
+	arm_smmu_v3_test_ste_expect_hitless_transition(test, &bypass_ste, &ste,
+						       NUM_EXPECTED_SYNCS(2));
+}
+
+static void arm_smmu_v3_test_cd_expect_transition(
+	struct kunit *test, const struct arm_smmu_cd *cur,
+	const struct arm_smmu_cd *target, unsigned int num_syncs_expected,
+	bool hitless)
+{
+	struct arm_smmu_cd cur_copy = *cur;
+	struct arm_smmu_test_writer test_writer = {
+		.writer = {
+			.ops = &test_cd_ops,
+		},
+		.test = test,
+		.init_entry = cur->data,
+		.target_entry = target->data,
+		.entry = cur_copy.data,
+		.num_syncs = 0,
+		.invalid_entry_written = false,
+
+	};
+
+	pr_debug("CD initial value: ");
+	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8, cur_copy.data,
+			     sizeof(cur_copy), false);
+	arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, cur->data);
+	pr_debug("CD target value: ");
+	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8, target->data,
+			     sizeof(cur_copy), false);
+	arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer,
+					       target->data);
+
+	arm_smmu_write_entry(&test_writer.writer, cur_copy.data, target->data);
+
+	KUNIT_EXPECT_EQ(test, test_writer.invalid_entry_written, !hitless);
+	KUNIT_EXPECT_EQ(test, test_writer.num_syncs, num_syncs_expected);
+	KUNIT_EXPECT_MEMEQ(test, target->data, cur_copy.data, sizeof(cur_copy));
+}
+
+static void arm_smmu_v3_test_cd_expect_non_hitless_transition(
+	struct kunit *test, const struct arm_smmu_cd *cur,
+	const struct arm_smmu_cd *target, unsigned int num_syncs_expected)
+{
+	arm_smmu_v3_test_cd_expect_transition(test, cur, target,
+					      num_syncs_expected, false);
+}
+
+static void arm_smmu_v3_test_cd_expect_hitless_transition(
+	struct kunit *test, const struct arm_smmu_cd *cur,
+	const struct arm_smmu_cd *target, unsigned int num_syncs_expected)
+{
+	arm_smmu_v3_test_cd_expect_transition(test, cur, target,
+					      num_syncs_expected, true);
+}
+
+static void arm_smmu_test_make_s1_cd(struct arm_smmu_cd *cd, unsigned int asid)
+{
+	struct arm_smmu_master master = {
+		.smmu = &smmu,
+	};
+	struct io_pgtable io_pgtable = {};
+	struct arm_smmu_domain smmu_domain = {
+		.pgtbl_ops = &io_pgtable.ops,
+		.cd = {
+			.asid = asid,
+		},
+	};
+
+	io_pgtable.cfg.arm_lpae_s1_cfg.ttbr = 0xdaedbeefdeadbeefULL;
+	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.ips = 1;
+	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.tg = 2;
+	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.sh = 3;
+	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.orgn = 1;
+	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.irgn = 2;
+	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.tsz = 4;
+	io_pgtable.cfg.arm_lpae_s1_cfg.mair = 0xabcdef012345678ULL;
+
+	arm_smmu_make_s1_cd(cd, &master, &smmu_domain);
+}
+
+static void arm_smmu_v3_write_cd_test_s1_clear(struct kunit *test)
+{
+	struct arm_smmu_cd cd = {};
+	struct arm_smmu_cd cd_2;
+
+	arm_smmu_test_make_s1_cd(&cd_2, 1997);
+	arm_smmu_v3_test_cd_expect_non_hitless_transition(
+		test, &cd, &cd_2, NUM_EXPECTED_SYNCS(2));
+	arm_smmu_v3_test_cd_expect_non_hitless_transition(
+		test, &cd_2, &cd, NUM_EXPECTED_SYNCS(2));
+}
+
+static void arm_smmu_v3_write_cd_test_s1_change_asid(struct kunit *test)
+{
+	struct arm_smmu_cd cd = {};
+	struct arm_smmu_cd cd_2;
+
+	arm_smmu_test_make_s1_cd(&cd, 778);
+	arm_smmu_test_make_s1_cd(&cd_2, 1997);
+	arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd, &cd_2,
+						      NUM_EXPECTED_SYNCS(1));
+	arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd_2, &cd,
+						      NUM_EXPECTED_SYNCS(1));
+}
+
+static void arm_smmu_test_make_sva_cd(struct arm_smmu_cd *cd, unsigned int asid)
+{
+	struct arm_smmu_master master = {
+		.smmu = &smmu,
+	};
+	struct mm_struct mm = {
+		.pgd = (void *)0xdaedbeefdeadbeefULL,
+	};
+
+	arm_smmu_make_sva_cd(cd, &master, &mm, asid);
+}
+
+static void arm_smmu_test_make_sva_release_cd(struct arm_smmu_cd *cd,
+					      unsigned int asid)
+{
+	struct arm_smmu_master master = {
+		.smmu = &smmu,
+	};
+
+	arm_smmu_make_sva_cd(cd, &master, NULL, asid);
+}
+
+static void arm_smmu_v3_write_cd_test_sva_clear(struct kunit *test)
+{
+	struct arm_smmu_cd cd = {};
+	struct arm_smmu_cd cd_2;
+
+	arm_smmu_test_make_sva_cd(&cd_2, 1997);
+	arm_smmu_v3_test_cd_expect_non_hitless_transition(
+		test, &cd, &cd_2, NUM_EXPECTED_SYNCS(2));
+	arm_smmu_v3_test_cd_expect_non_hitless_transition(
+		test, &cd_2, &cd, NUM_EXPECTED_SYNCS(2));
+}
+
+static void arm_smmu_v3_write_cd_test_sva_release(struct kunit *test)
+{
+	struct arm_smmu_cd cd;
+	struct arm_smmu_cd cd_2;
+
+	arm_smmu_test_make_sva_cd(&cd, 1997);
+	arm_smmu_test_make_sva_release_cd(&cd_2, 1997);
+	arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd, &cd_2,
+						      NUM_EXPECTED_SYNCS(2));
+	arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd_2, &cd,
+						      NUM_EXPECTED_SYNCS(2));
+}
+
+static struct kunit_case arm_smmu_v3_test_cases[] = {
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_abort),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_bypass),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_abort),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_cdtable),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_bypass),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_cdtable),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_abort),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_s2),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_bypass),
+	KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_s2),
+	KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_clear),
+	KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_change_asid),
+	KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_clear),
+	KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_release),
+	{},
+};
+
+static int arm_smmu_v3_test_suite_init(struct kunit_suite *test)
+{
+	arm_smmu_make_bypass_ste(&smmu, &bypass_ste);
+	arm_smmu_make_abort_ste(&abort_ste);
+	return 0;
+}
+
+static struct kunit_suite arm_smmu_v3_test_module = {
+	.name = "arm-smmu-v3-kunit-test",
+	.suite_init = arm_smmu_v3_test_suite_init,
+	.test_cases = arm_smmu_v3_test_cases,
+};
+kunit_test_suites(&arm_smmu_v3_test_module);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 72402f6a7ed4e0..3ffaa3b34b44bf 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -42,18 +42,6 @@ enum arm_smmu_msi_index {
 	ARM_SMMU_MAX_MSIS,
 };
 
-struct arm_smmu_entry_writer_ops;
-struct arm_smmu_entry_writer {
-	const struct arm_smmu_entry_writer_ops *ops;
-	struct arm_smmu_master *master;
-};
-
-struct arm_smmu_entry_writer_ops {
-	__le64 v_bit;
-	void (*get_used)(const __le64 *entry, __le64 *used);
-	void (*sync)(struct arm_smmu_entry_writer *writer);
-};
-
 #define NUM_ENTRY_QWORDS 8
 static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64));
 static_assert(sizeof(struct arm_smmu_cd) == NUM_ENTRY_QWORDS * sizeof(u64));
@@ -980,7 +968,7 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
  * would be nice if this was complete according to the spec, but minimally it
  * has to capture the bits this driver uses.
  */
-static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
+void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
 {
 	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
 
@@ -1102,8 +1090,8 @@ static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry,
  * V=0 process. This relies on the IGNORED behavior described in the
  * specification.
  */
-static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
-				 __le64 *entry, const __le64 *target)
+void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry,
+			  const __le64 *target)
 {
 	__le64 unused_update[NUM_ENTRY_QWORDS];
 	u8 used_qword_diff;
@@ -1257,7 +1245,7 @@ struct arm_smmu_cd_writer {
 	unsigned int ssid;
 };
 
-static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
+void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
 {
 	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
 	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
@@ -1514,7 +1502,7 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
 	}
 }
 
-static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
+void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
 {
 	memset(target, 0, sizeof(*target));
 	target->data[0] = cpu_to_le64(
@@ -1522,8 +1510,8 @@ static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
 		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT));
 }
 
-static void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
-				     struct arm_smmu_ste *target)
+void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
+			      struct arm_smmu_ste *target)
 {
 	memset(target, 0, sizeof(*target));
 	target->data[0] = cpu_to_le64(
@@ -1535,8 +1523,8 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
 							 STRTAB_STE_1_SHCFG_INCOMING));
 }
 
-static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
-				      struct arm_smmu_master *master)
+void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
+			       struct arm_smmu_master *master)
 {
 	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
 	struct arm_smmu_device *smmu = master->smmu;
@@ -1585,9 +1573,9 @@ static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
 	}
 }
 
-static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
-					struct arm_smmu_master *master,
-					struct arm_smmu_domain *smmu_domain)
+void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
+				 struct arm_smmu_master *master,
+				 struct arm_smmu_domain *smmu_domain)
 {
 	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
 	const struct io_pgtable_cfg *pgtbl_cfg =
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 8f791f67f9f7f4..0455498d24c730 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -737,6 +737,36 @@ struct arm_smmu_domain {
 	struct list_head		mmu_notifiers;
 };
 
+/* The following are exposed for testing purposes. */
+struct arm_smmu_entry_writer_ops;
+struct arm_smmu_entry_writer {
+	const struct arm_smmu_entry_writer_ops *ops;
+	struct arm_smmu_master *master;
+};
+
+struct arm_smmu_entry_writer_ops {
+	__le64 v_bit;
+	void (*get_used)(const __le64 *entry, __le64 *used);
+	void (*sync)(struct arm_smmu_entry_writer *writer);
+};
+
+void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits);
+void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits);
+void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur,
+			  const __le64 *target);
+
+void arm_smmu_make_abort_ste(struct arm_smmu_ste *target);
+void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
+			      struct arm_smmu_ste *target);
+void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
+			       struct arm_smmu_master *master);
+void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
+				 struct arm_smmu_master *master,
+				 struct arm_smmu_domain *smmu_domain);
+void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
+			  struct arm_smmu_master *master, struct mm_struct *mm,
+			  u16 asid);
+
 static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
 {
 	return container_of(dom, struct arm_smmu_domain, domain);
-- 
2.43.2


* Re: [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3)
  2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
                   ` (8 preceding siblings ...)
  2024-04-16 19:28 ` [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry Jason Gunthorpe
@ 2024-04-16 19:40 ` Nicolin Chen
  9 siblings, 0 replies; 48+ messages in thread
From: Nicolin Chen @ 2024-04-16 19:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Tue, Apr 16, 2024 at 04:28:11PM -0300, Jason Gunthorpe wrote:
> This is split out from the larger part two which aims to rework the PASID
> related code.

> v7:
>  - Rebase on Will's for-next & v6.9-rc2
>  - Split series in half
>  - Include the kunit test
>  - Update comments to refer to the STE & CD in the writer logic

Translate mode (S1DSS.SSID0) + SVA sanity passed with this series.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>

* Re: [PATCH v7 1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code
  2024-04-16 19:28 ` [PATCH v7 1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code Jason Gunthorpe
@ 2024-04-16 20:18   ` Nicolin Chen
  2024-04-19 21:02   ` Mostafa Saleh
  1 sibling, 0 replies; 48+ messages in thread
From: Nicolin Chen @ 2024-04-16 20:18 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Tue, Apr 16, 2024 at 04:28:12PM -0300, Jason Gunthorpe wrote:
> Prepare to put the CD code into the same mechanism. Add an ops indirection
> around all the STE specific code and make the worker functions independent
> of the entry content being processed.
> 
> get_used and sync ops are provided to hook the correct code.
> 
> Signed-off-by: Michael Shavit <mshavit@google.com>
> Reviewed-by: Michael Shavit <mshavit@google.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

* Re: [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  2024-04-16 19:28 ` [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() Jason Gunthorpe
@ 2024-04-16 20:48   ` Nicolin Chen
  2024-04-18 13:01   ` Robin Murphy
  2024-04-19 21:07   ` Mostafa Saleh
  2 siblings, 0 replies; 48+ messages in thread
From: Nicolin Chen @ 2024-04-16 20:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Tue, Apr 16, 2024 at 04:28:13PM -0300, Jason Gunthorpe wrote:
> CD table entries and STE's have the same essential programming sequence,
> just with different types.
> 
> Have arm_smmu_write_ctx_desc() generate a target CD and call
> arm_smmu_write_entry() to do the programming. Due to the way the target CD
> is generated by modifying the existing CD this alone is not enough for the
> CD callers to be freed of the ordering requirements.
> 
> The following patches will make the rest of the CD flow mirror the STE
> flow with precise CD contents generated in all cases.
> 
> Signed-off-by: Michael Shavit <mshavit@google.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Reviewed-by: Moritz Fischer <moritzf@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>

* Re: [PATCH v7 3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
  2024-04-16 19:28 ` [PATCH v7 3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function Jason Gunthorpe
@ 2024-04-16 21:22   ` Nicolin Chen
  2024-04-19 21:10   ` Mostafa Saleh
  1 sibling, 0 replies; 48+ messages in thread
From: Nicolin Chen @ 2024-04-16 21:22 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Tue, Apr 16, 2024 at 04:28:14PM -0300, Jason Gunthorpe wrote:
> Introduce arm_smmu_make_s1_cd() to build the CD from the paging S1 domain,
> and reorganize all the places programming S1 domain CD table entries to
> call it.
> 
> Split arm_smmu_update_s1_domain_cd_entry() from
> arm_smmu_update_ctx_desc_devices() so that the S1 path has its own call
> chain separate from the unrelated SVA path.
> 
> arm_smmu_update_s1_domain_cd_entry() only works on S1 domains
> attached to RIDs and refreshes all their CDs.
> 
> Remove the forced clear of the CD during S1 domain attach,
> arm_smmu_write_cd_entry() will do this automatically if necessary.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Reviewed-by: Michael Shavit <mshavit@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>


* Re: [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  2024-04-16 19:28 ` [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr() Jason Gunthorpe
@ 2024-04-16 22:19   ` Nicolin Chen
  2024-04-19 21:14   ` Mostafa Saleh
  1 sibling, 0 replies; 48+ messages in thread
From: Nicolin Chen @ 2024-04-16 22:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Tue, Apr 16, 2024 at 04:28:16PM -0300, Jason Gunthorpe wrote:
> Only the attach callers can perform an allocation for the CD table entry.
> The other callers must not do so: they do not have the correct locking and
> they cannot sleep. Split up the functions so this is clear.
> 
> arm_smmu_get_cd_ptr() will return a pointer to a CD table entry without
> doing any kind of allocation.
> 
> arm_smmu_alloc_cd_ptr() will allocate the table and any required
> leaf.
> 
> A following patch will add lockdep assertions to arm_smmu_alloc_cd_ptr()
> once the restructuring is completed and arm_smmu_alloc_cd_ptr() is never
> called in the wrong context.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
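
The get/alloc split described in the commit message can be modelled in
userspace roughly as below. This is only a sketch under stated assumptions:
the tiny two-level table, its sizes, calloc() standing in for the driver's
allocation, and the demo_table variable are all made up for illustration,
and only the leaf allocation is modelled, not the top-level table.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define L1_ENTRIES 4 /* hypothetical sizes, kept tiny for illustration */
#define L2_ENTRIES 4

struct cd { uint64_t data[8]; };
struct cd_table { struct cd *l1[L1_ENTRIES]; };

static struct cd_table demo_table; /* zero-initialised: no leaves yet */

/* Lookup only: never allocates, so it is safe where sleeping is not. */
static struct cd *get_cd_ptr(struct cd_table *t, unsigned int ssid)
{
	struct cd *leaf;

	if (ssid >= L1_ENTRIES * L2_ENTRIES)
		return NULL;
	leaf = t->l1[ssid / L2_ENTRIES];
	return leaf ? &leaf[ssid % L2_ENTRIES] : NULL;
}

/* Attach path only: may allocate the leaf covering this SSID. */
static struct cd *alloc_cd_ptr(struct cd_table *t, unsigned int ssid)
{
	if (ssid >= L1_ENTRIES * L2_ENTRIES)
		return NULL;
	if (!t->l1[ssid / L2_ENTRIES])
		t->l1[ssid / L2_ENTRIES] = calloc(L2_ENTRIES, sizeof(struct cd));
	return get_cd_ptr(t, ssid); /* NULL if the allocation failed */
}
```

In this model a caller that must not sleep only ever sees NULL or an
existing entry from get_cd_ptr(); the allocation is confined to the one
function that the attach path calls.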


* Re: [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  2024-04-16 19:28 ` [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function Jason Gunthorpe
@ 2024-04-17  7:37   ` Nicolin Chen
  2024-04-17 13:17     ` Jason Gunthorpe
  2024-04-17 16:26   ` Nicolin Chen
  2024-04-18  4:40   ` Michael Shavit
  2 siblings, 1 reply; 48+ messages in thread
From: Nicolin Chen @ 2024-04-17  7:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Tue, Apr 16, 2024 at 04:28:18PM -0300, Jason Gunthorpe wrote:
> Pull all the calculations for building the CD table entry for an mm_struct
> into arm_smmu_make_sva_cd().
> 
> Call it in the two places installing the SVA CD table entry.
> 
> Open code the last caller of arm_smmu_update_ctx_desc_devices() and remove
> the function.
> 
> Remove arm_smmu_write_ctx_desc() since all callers are gone. Add the
> locking assertions to arm_smmu_alloc_cd_ptr() since
> arm_smmu_update_ctx_desc_devices() was the last problematic caller.
> 
> Remove quiet_cd since all users are gone; arm_smmu_make_sva_cd() creates
> the same value.
> 
> The behavior of quiet_cd changes slightly: the old implementation edited
> the CD in place to set CTXDESC_CD_0_TCR_EPD0, assuming it was an SVA CD
> entry. This version generates a full CD entry with a 0 TTB0 and relies on
> arm_smmu_write_cd_entry() to install it hitlessly.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

> +static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
> +				 struct arm_smmu_master *master,
> +				 struct mm_struct *mm, u16 asid)
> +{
> +	u64 par;
> +
> +	memset(target, 0, sizeof(*target));
> +
> +	par = cpuid_feature_extract_unsigned_field(
> +		read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1),
> +		ID_AA64MMFR0_EL1_PARANGE_SHIFT);
> +
> +	target->data[0] = cpu_to_le64(
> +		CTXDESC_CD_0_TCR_EPD1 |
> +#ifdef __BIG_ENDIAN
> +		CTXDESC_CD_0_ENDI |
> +#endif
> +		CTXDESC_CD_0_V |
> +		FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par) |
> +		CTXDESC_CD_0_AA64 |
> +		(master->stall_enabled ? CTXDESC_CD_0_S : 0) |
> +		CTXDESC_CD_0_R |
> +		CTXDESC_CD_0_A |
> +		CTXDESC_CD_0_ASET |
> +		FIELD_PREP(CTXDESC_CD_0_ASID, asid));

The ASID is set for the new "quiet_cd" case too. IIUC, this is to ease
switching back to a normal CD, i.e. the mm != NULL case?

Thanks
Nicolin


* Re: [PATCH v7 8/9] iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd()
  2024-04-16 19:28 ` [PATCH v7 8/9] iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd() Jason Gunthorpe
@ 2024-04-17  7:43   ` Nicolin Chen
  0 siblings, 0 replies; 48+ messages in thread
From: Nicolin Chen @ 2024-04-17  7:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Tue, Apr 16, 2024 at 04:28:19PM -0300, Jason Gunthorpe wrote:
> Half the code was living in arm_smmu_domain_finalise_s1(), just move it
> here and take the values directly from the pgtbl_ops instead of storing
> copies.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Reviewed-by: Michael Shavit <mshavit@google.com>
> Reviewed-by: Mostafa Saleh <smostafa@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>


* Re: [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  2024-04-16 19:28 ` [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry Jason Gunthorpe
@ 2024-04-17  8:09   ` Nicolin Chen
  2024-04-17 14:16     ` Jason Gunthorpe
  2024-04-19 21:24   ` Mostafa Saleh
  1 sibling, 1 reply; 48+ messages in thread
From: Nicolin Chen @ 2024-04-17  8:09 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Tue, Apr 16, 2024 at 04:28:20PM -0300, Jason Gunthorpe wrote:
> Add tests for some of the more common STE update operations that we expect
> to see, as well as some artificial STE updates to test the edges of
> arm_smmu_write_entry. These also serve as a record of which common
> operations are expected to be hitless, and how many syncs they require.
> 
> arm_smmu_write_entry implements a generic algorithm that updates an STE/CD
> to any other arbitrary STE/CD configuration. The update requires a
> sequence of write+sync operations with some invariants that must be held
> true after each sync. arm_smmu_write_entry lends itself well to
> unit-testing since the function's interaction with the STE/CD is already
> abstracted by input callbacks that we can hook to introspect into the
> sequence of operations. We can use these hooks to guarantee that
> invariants are held throughout the entire update operation.
> 
> Link: https://lore.kernel.org/r/20240106083617.1173871-3-mshavit@google.com
> Signed-off-by: Michael Shavit <mshavit@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/Kconfig                         |  12 +-
>  drivers/iommu/arm/arm-smmu-v3/Makefile        |   2 +
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   6 +-
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 467 ++++++++++++++++++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  36 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  30 ++
 
> +config ARM_SMMU_V3_KUNIT_TEST
> +	tristate "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
> +	depends on KUNIT
> +	default KUNIT_ALL_TESTS
> +	help
> +	  Enable this option to unit-test arm-smmu-v3 driver functions.
> +
> +	  If unsure, say N.

I forgot that my SVA sanity test doesn't cover this patch. It looks
like there are some problems when building it with "=m":

ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o
ERROR: modpost: "arm_smmu_make_cdtable_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
ERROR: modpost: "arm_smmu_make_bypass_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
ERROR: modpost: "arm_smmu_make_abort_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
ERROR: modpost: "arm_smmu_make_s2_domain_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
ERROR: modpost: "arm_smmu_get_ste_used" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
ERROR: modpost: "arm_smmu_write_entry" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!

Likely needs MODULE_LICENSE and some EXPORT_SYMBOLs.

With built-in '=y' and a MODULE_LICENSE, tests passed:
[   13.244780] KTAP version 1
[   13.247542] 1..1
[   13.249421]     KTAP version 1
[   13.252538]     # Subtest: arm-smmu-v3-kunit-test
[   13.257344]     1..16
[   13.259727]     ok 1 arm_smmu_v3_write_ste_test_bypass_to_abort
[   13.259789]     ok 2 arm_smmu_v3_write_ste_test_abort_to_bypass
[   13.265895]     ok 3 arm_smmu_v3_write_ste_test_cdtable_to_abort
[   13.271988]     ok 4 arm_smmu_v3_write_ste_test_abort_to_cdtable
[   13.278172]     ok 5 arm_smmu_v3_write_ste_test_cdtable_to_bypass
[   13.284353]     ok 6 arm_smmu_v3_write_ste_test_bypass_to_cdtable
[   13.290636]     ok 7 arm_smmu_v3_write_ste_test_cdtable_s1dss_change
[   13.296917]     ok 8 arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass
[   13.303464]     ok 9 arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass
[   13.310357]     ok 10 arm_smmu_v3_write_ste_test_s2_to_abort
[   13.317265]     ok 11 arm_smmu_v3_write_ste_test_abort_to_s2
[   13.323104]     ok 12 arm_smmu_v3_write_ste_test_s2_to_bypass
[   13.328937]     ok 13 arm_smmu_v3_write_ste_test_bypass_to_s2
[   13.334861]     ok 14 arm_smmu_v3_write_ste_test_s1_to_s2
[   13.340787]     ok 15 arm_smmu_v3_write_ste_test_s2_to_s1
[   13.346364]     ok 16 arm_smmu_v3_write_ste_test_non_hitless
[   13.351883] # arm-smmu-v3-kunit-test: pass:16 fail:0 skip:0 total:16
[   13.357667] # Totals: pass:16 fail:0 skip:0 total:16
[   13.364163] ok 1 arm-smmu-v3-kunit-test

Once those are fixed,
Tested-by: Nicolin Chen <nicolinc@nvidia.com>


* Re: [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  2024-04-17  7:37   ` Nicolin Chen
@ 2024-04-17 13:17     ` Jason Gunthorpe
  2024-04-17 16:25       ` Nicolin Chen
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-17 13:17 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Wed, Apr 17, 2024 at 12:37:19AM -0700, Nicolin Chen wrote:
> On Tue, Apr 16, 2024 at 04:28:18PM -0300, Jason Gunthorpe wrote:
> > Pull all the calculations for building the CD table entry for an mm_struct
> > into arm_smmu_make_sva_cd().
> > 
> > Call it in the two places installing the SVA CD table entry.
> > 
> > Open code the last caller of arm_smmu_update_ctx_desc_devices() and remove
> > the function.
> > 
> > Remove arm_smmu_write_ctx_desc() since all callers are gone. Add the
> > locking assertions to arm_smmu_alloc_cd_ptr() since
> > arm_smmu_update_ctx_desc_devices() was the last problematic caller.
> > 
> > Remove quiet_cd since all users are gone; arm_smmu_make_sva_cd() creates
> > the same value.
> > 
> > The behavior of quiet_cd changes slightly: the old implementation edited
> > the CD in place to set CTXDESC_CD_0_TCR_EPD0, assuming it was an SVA CD
> > entry. This version generates a full CD entry with a 0 TTB0 and relies on
> > arm_smmu_write_cd_entry() to install it hitlessly.
> > 
> > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> 
> > +static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
> > +				 struct arm_smmu_master *master,
> > +				 struct mm_struct *mm, u16 asid)
> > +{
> > +	u64 par;
> > +
> > +	memset(target, 0, sizeof(*target));
> > +
> > +	par = cpuid_feature_extract_unsigned_field(
> > +		read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1),
> > +		ID_AA64MMFR0_EL1_PARANGE_SHIFT);
> > +
> > +	target->data[0] = cpu_to_le64(
> > +		CTXDESC_CD_0_TCR_EPD1 |
> > +#ifdef __BIG_ENDIAN
> > +		CTXDESC_CD_0_ENDI |
> > +#endif
> > +		CTXDESC_CD_0_V |
> > +		FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par) |
> > +		CTXDESC_CD_0_AA64 |
> > +		(master->stall_enabled ? CTXDESC_CD_0_S : 0) |
> > +		CTXDESC_CD_0_R |
> > +		CTXDESC_CD_0_A |
> > +		CTXDESC_CD_0_ASET |
> > +		FIELD_PREP(CTXDESC_CD_0_ASID, asid));
> 
> The ASID is set for the new "quiet_cd" case too. IIUC, this is to ease
> switching back to a normal CD, i.e. the mm != NULL case?

If ASID is used by HW (e.g. for negative caching) then this is correct.

If ASID is not used by HW then this could be 0'd and we could adjust
the used calculation. It is still functionally correct as-is, just
slightly confusing.

I didn't notice anything in the spec about ASID interaction with
EPD0. The spec was otherwise pretty clear about which fields become
IGNORED by EPD0/1.

So I'm assuming ASID can be used by HW and must be set.

AFAICT this is what the current code does: when it programs "quiet_cd"
it doesn't actually write the whole CD, it just flips EPD0 to 1
in-place. Since this is only done from a CD already programmed for
SVA, the ASID remains set.
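
A rough userspace model of the two approaches, to show why word 0 of the
CD (ASID included) ends up the same either way. This is a sketch only: the
CD_0_* names and bit positions are assumed here for illustration and
should be checked against the driver header, most CD fields are left out,
and the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Word 0 field positions, assumed here for illustration only. */
#define CD_0_TCR_EPD0   (1ULL << 14)
#define CD_0_TCR_EPD1   (1ULL << 30)
#define CD_0_V          (1ULL << 31)
#define CD_0_ASID_SHIFT 48

struct cd { uint64_t data[8]; };

/* Generate a whole SVA CD; ttb0 == 0 models an mm that was released. */
static void make_sva_cd(struct cd *target, uint64_t ttb0, uint16_t asid)
{
	memset(target, 0, sizeof(*target));
	target->data[0] = CD_0_TCR_EPD1 | CD_0_V |
			  ((uint64_t)asid << CD_0_ASID_SHIFT);
	if (ttb0)
		target->data[1] = ttb0;
	else
		target->data[0] |= CD_0_TCR_EPD0; /* every walk faults */
}

/* Old approach: start from a live SVA CD and flip EPD0 in place. */
static uint64_t quiet_word0_in_place(uint64_t ttb0, uint16_t asid)
{
	struct cd live;

	make_sva_cd(&live, ttb0, asid);
	live.data[0] |= CD_0_TCR_EPD0;
	return live.data[0];
}

/* New approach: build the quiet CD from scratch, ASID included. */
static uint64_t quiet_word0_generated(uint16_t asid)
{
	struct cd quiet;

	make_sva_cd(&quiet, 0, asid);
	return quiet.data[0];
}
```

The only difference in this model is outside word 0: the in-place flip
leaves the stale TTB0 behind, while the generated entry has a 0 TTB0.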

Jason


* Re: [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  2024-04-17  8:09   ` Nicolin Chen
@ 2024-04-17 14:16     ` Jason Gunthorpe
  2024-04-17 16:13       ` Nicolin Chen
  2024-04-18  4:39       ` Michael Shavit
  0 siblings, 2 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-17 14:16 UTC (permalink / raw)
  To: Nicolin Chen, Michael Shavit
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, patches,
	Shameerali Kolothum Thodi, Mostafa Saleh

On Wed, Apr 17, 2024 at 01:09:40AM -0700, Nicolin Chen wrote:
> On Tue, Apr 16, 2024 at 04:28:20PM -0300, Jason Gunthorpe wrote:
> > Add tests for some of the more common STE update operations that we expect
> > to see, as well as some artificial STE updates to test the edges of
> > arm_smmu_write_entry. These also serve as a record of which common
> > operations are expected to be hitless, and how many syncs they require.
> > 
> > arm_smmu_write_entry implements a generic algorithm that updates an STE/CD
> > to any other arbitrary STE/CD configuration. The update requires a
> > sequence of write+sync operations with some invariants that must be held
> > true after each sync. arm_smmu_write_entry lends itself well to
> > unit-testing since the function's interaction with the STE/CD is already
> > abstracted by input callbacks that we can hook to introspect into the
> > sequence of operations. We can use these hooks to guarantee that
> > invariants are held throughout the entire update operation.
> > 
> > Link: https://lore.kernel.org/r/20240106083617.1173871-3-mshavit@google.com
> > Signed-off-by: Michael Shavit <mshavit@google.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> >  drivers/iommu/Kconfig                         |  12 +-
> >  drivers/iommu/arm/arm-smmu-v3/Makefile        |   2 +
> >  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   6 +-
> >  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 467 ++++++++++++++++++
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  36 +-
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  30 ++
>  
> > +config ARM_SMMU_V3_KUNIT_TEST
> > +	tristate "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
> > +	depends on KUNIT
> > +	default KUNIT_ALL_TESTS
> > +	help
> > +	  Enable this option to unit-test arm-smmu-v3 driver functions.
> > +
> > +	  If unsure, say N.
> 
> I forgot that my SVA sanity test doesn't cover this patch. It looks
> like there are some problems when building it with "=m":
> 
> ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o
> ERROR: modpost: "arm_smmu_make_cdtable_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> ERROR: modpost: "arm_smmu_make_bypass_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> ERROR: modpost: "arm_smmu_make_abort_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> ERROR: modpost: "arm_smmu_make_s2_domain_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> ERROR: modpost: "arm_smmu_get_ste_used" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> ERROR: modpost: "arm_smmu_write_entry" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> 
> Likely needs MODULE_LICENSE and some EXPORT_SYMBOLs.

Oh! The kbuild never tested this kconfig combination...

I think just this? Michael?

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index d03c729c4142dc..7b6a4e244e99cf 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -411,7 +411,7 @@ config ARM_SMMU_V3_SVA
 	  and PRI.
 
 config ARM_SMMU_V3_KUNIT_TEST
-	tristate "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
+	bool "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
 	depends on KUNIT
 	default KUNIT_ALL_TESTS
 	help
diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
index 014a997753a8a2..0b97054b3929b7 100644
--- a/drivers/iommu/arm/arm-smmu-v3/Makefile
+++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
@@ -2,6 +2,5 @@
 obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
 arm_smmu_v3-objs-y += arm-smmu-v3.o
 arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
+arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
 arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
-
-obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o


Jason


* Re: [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  2024-04-17 14:16     ` Jason Gunthorpe
@ 2024-04-17 16:13       ` Nicolin Chen
  2024-04-18  4:39       ` Michael Shavit
  1 sibling, 0 replies; 48+ messages in thread
From: Nicolin Chen @ 2024-04-17 16:13 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Michael Shavit, iommu, Joerg Roedel, linux-arm-kernel,
	Robin Murphy, Will Deacon, Eric Auger, Moritz Fischer,
	Moritz Fischer, patches, Shameerali Kolothum Thodi,
	Mostafa Saleh

On Wed, Apr 17, 2024 at 11:16:18AM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 17, 2024 at 01:09:40AM -0700, Nicolin Chen wrote:
> > On Tue, Apr 16, 2024 at 04:28:20PM -0300, Jason Gunthorpe wrote:
> > > Add tests for some of the more common STE update operations that we expect
> > > to see, as well as some artificial STE updates to test the edges of
> > > arm_smmu_write_entry. These also serve as a record of which common
> > > operations are expected to be hitless, and how many syncs they require.
> > > 
> > > arm_smmu_write_entry implements a generic algorithm that updates an STE/CD
> > > to any other arbitrary STE/CD configuration. The update requires a
> > > sequence of write+sync operations with some invariants that must be held
> > > true after each sync. arm_smmu_write_entry lends itself well to
> > > unit-testing since the function's interaction with the STE/CD is already
> > > abstracted by input callbacks that we can hook to introspect into the
> > > sequence of operations. We can use these hooks to guarantee that
> > > invariants are held throughout the entire update operation.
> > > 
> > > Link: https://lore.kernel.org/r/20240106083617.1173871-3-mshavit@google.com
> > > Signed-off-by: Michael Shavit <mshavit@google.com>
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > ---
> > >  drivers/iommu/Kconfig                         |  12 +-
> > >  drivers/iommu/arm/arm-smmu-v3/Makefile        |   2 +
> > >  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   6 +-
> > >  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 467 ++++++++++++++++++
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  36 +-
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  30 ++
> >  
> > > +config ARM_SMMU_V3_KUNIT_TEST
> > > +	tristate "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
> > > +	depends on KUNIT
> > > +	default KUNIT_ALL_TESTS
> > > +	help
> > > +	  Enable this option to unit-test arm-smmu-v3 driver functions.
> > > +
> > > +	  If unsure, say N.
> > 
> > I forgot that my SVA sanity test doesn't cover this patch. It looks
> > like there are some problems when building it with "=m":
> > 
> > ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o
> > ERROR: modpost: "arm_smmu_make_cdtable_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > ERROR: modpost: "arm_smmu_make_bypass_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > ERROR: modpost: "arm_smmu_make_abort_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > ERROR: modpost: "arm_smmu_make_s2_domain_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > ERROR: modpost: "arm_smmu_get_ste_used" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > ERROR: modpost: "arm_smmu_write_entry" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > 
> > Likely needs MODULE_LICENSE and some EXPORT_SYMBOLs.
> 
> Oh! The kbuild never tested this kconfig combination...
> 
> I think just this? Michael?

Verified with the following change.

Thanks
Nicolin

> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index d03c729c4142dc..7b6a4e244e99cf 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -411,7 +411,7 @@ config ARM_SMMU_V3_SVA
>  	  and PRI.
>  
>  config ARM_SMMU_V3_KUNIT_TEST
> -	tristate "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
> +	bool "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
>  	depends on KUNIT
>  	default KUNIT_ALL_TESTS
>  	help
> diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
> index 014a997753a8a2..0b97054b3929b7 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> @@ -2,6 +2,5 @@
>  obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
>  arm_smmu_v3-objs-y += arm-smmu-v3.o
>  arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
> +arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
>  arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
> -
> -obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
> 
> 
> Jason


* Re: [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  2024-04-17 13:17     ` Jason Gunthorpe
@ 2024-04-17 16:25       ` Nicolin Chen
  0 siblings, 0 replies; 48+ messages in thread
From: Nicolin Chen @ 2024-04-17 16:25 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Wed, Apr 17, 2024 at 10:17:27AM -0300, Jason Gunthorpe wrote:
> On Wed, Apr 17, 2024 at 12:37:19AM -0700, Nicolin Chen wrote:
> > On Tue, Apr 16, 2024 at 04:28:18PM -0300, Jason Gunthorpe wrote:
> > > Pull all the calculations for building the CD table entry for an mm_struct
> > > into arm_smmu_make_sva_cd().
> > > 
> > > Call it in the two places installing the SVA CD table entry.
> > > 
> > > Open code the last caller of arm_smmu_update_ctx_desc_devices() and remove
> > > the function.
> > > 
> > > Remove arm_smmu_write_ctx_desc() since all callers are gone. Add the
> > > locking assertions to arm_smmu_alloc_cd_ptr() since
> > > arm_smmu_update_ctx_desc_devices() was the last problematic caller.
> > > 
> > > Remove quiet_cd since all users are gone; arm_smmu_make_sva_cd() creates
> > > the same value.
> > > 
> > > The behavior of quiet_cd changes slightly: the old implementation edited
> > > the CD in place to set CTXDESC_CD_0_TCR_EPD0, assuming it was an SVA CD
> > > entry. This version generates a full CD entry with a 0 TTB0 and relies on
> > > arm_smmu_write_cd_entry() to install it hitlessly.
> > > 
> > > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > 
> > > +static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
> > > +				 struct arm_smmu_master *master,
> > > +				 struct mm_struct *mm, u16 asid)
> > > +{
> > > +	u64 par;
> > > +
> > > +	memset(target, 0, sizeof(*target));
> > > +
> > > +	par = cpuid_feature_extract_unsigned_field(
> > > +		read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1),
> > > +		ID_AA64MMFR0_EL1_PARANGE_SHIFT);
> > > +
> > > +	target->data[0] = cpu_to_le64(
> > > +		CTXDESC_CD_0_TCR_EPD1 |
> > > +#ifdef __BIG_ENDIAN
> > > +		CTXDESC_CD_0_ENDI |
> > > +#endif
> > > +		CTXDESC_CD_0_V |
> > > +		FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par) |
> > > +		CTXDESC_CD_0_AA64 |
> > > +		(master->stall_enabled ? CTXDESC_CD_0_S : 0) |
> > > +		CTXDESC_CD_0_R |
> > > +		CTXDESC_CD_0_A |
> > > +		CTXDESC_CD_0_ASET |
> > > +		FIELD_PREP(CTXDESC_CD_0_ASID, asid));
> > 
> > The ASID is set for the new "quiet_cd" case too. IIUC, this is to ease
> > switching back to a normal CD, i.e. the mm != NULL case?
> 
> If ASID is used by HW (e.g. for negative caching) then this is correct.
> 
> If ASID is not used by HW then this could be 0'd and we could adjust
> the used calculation. It is still functionally correct as-is, just
> slightly confusing.
> 
> I didn't notice anything in the spec about ASID interaction with
> EPD0. The spec was otherwise pretty clear about which fields become
> IGNORED by EPD0/1.
> 
> So I'm assuming ASID can be used by HW and must be set.
> 
> AFAICT this is what the current code does: when it programs "quiet_cd"
> it doesn't actually write the whole CD, it just flips EPD0 to 1
> in-place. Since this is only done from a CD already programmed for
> SVA, the ASID remains set.

Oh right. I misunderstood the last part of the commit message
about the quiet_cd. It's clear now. Thanks!


* Re: [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  2024-04-16 19:28 ` [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function Jason Gunthorpe
  2024-04-17  7:37   ` Nicolin Chen
@ 2024-04-17 16:26   ` Nicolin Chen
  2024-04-18  4:40   ` Michael Shavit
  2 siblings, 0 replies; 48+ messages in thread
From: Nicolin Chen @ 2024-04-17 16:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Tue, Apr 16, 2024 at 04:28:18PM -0300, Jason Gunthorpe wrote:
> Pull all the calculations for building the CD table entry for an mm_struct
> into arm_smmu_make_sva_cd().
> 
> Call it in the two places installing the SVA CD table entry.
> 
> Open code the last caller of arm_smmu_update_ctx_desc_devices() and remove
> the function.
> 
> Remove arm_smmu_write_ctx_desc() since all callers are gone. Add the
> locking assertions to arm_smmu_alloc_cd_ptr() since
> arm_smmu_update_ctx_desc_devices() was the last problematic caller.
> 
> Remove quiet_cd since all users are gone; arm_smmu_make_sva_cd() creates
> the same value.
> 
> The behavior of quiet_cd changes slightly: the old implementation edited
> the CD in place to set CTXDESC_CD_0_TCR_EPD0, assuming it was an SVA CD
> entry. This version generates a full CD entry with a 0 TTB0 and relies on
> arm_smmu_write_cd_entry() to install it hitlessly.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>


* Re: [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  2024-04-17 14:16     ` Jason Gunthorpe
  2024-04-17 16:13       ` Nicolin Chen
@ 2024-04-18  4:39       ` Michael Shavit
  2024-04-18 12:48         ` Jason Gunthorpe
  1 sibling, 1 reply; 48+ messages in thread
From: Michael Shavit @ 2024-04-18  4:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, iommu, Joerg Roedel, linux-arm-kernel,
	Robin Murphy, Will Deacon, Eric Auger, Moritz Fischer,
	Moritz Fischer, patches, Shameerali Kolothum Thodi,
	Mostafa Saleh

On Wed, Apr 17, 2024 at 10:16 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Apr 17, 2024 at 01:09:40AM -0700, Nicolin Chen wrote:
> > On Tue, Apr 16, 2024 at 04:28:20PM -0300, Jason Gunthorpe wrote:
> > > Add tests for some of the more common STE update operations that we expect
> > > to see, as well as some artificial STE updates to test the edges of
> > > arm_smmu_write_entry. These also serve as a record of which common
> > > operations are expected to be hitless, and how many syncs they require.
> > >
> > > arm_smmu_write_entry implements a generic algorithm that updates an STE/CD
> > > to any other arbitrary STE/CD configuration. The update requires a
> > > sequence of write+sync operations with some invariants that must be held
> > > true after each sync. arm_smmu_write_entry lends itself well to
> > > unit-testing since the function's interaction with the STE/CD is already
> > > abstracted by input callbacks that we can hook to introspect into the
> > > sequence of operations. We can use these hooks to guarantee that
> > > invariants are held throughout the entire update operation.
> > >
> > > Link: https://lore.kernel.org/r/20240106083617.1173871-3-mshavit@google.com
> > > Signed-off-by: Michael Shavit <mshavit@google.com>
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > ---
> > >  drivers/iommu/Kconfig                         |  12 +-
> > >  drivers/iommu/arm/arm-smmu-v3/Makefile        |   2 +
> > >  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   6 +-
> > >  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 467 ++++++++++++++++++
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  36 +-
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  30 ++
> >
> > > +config ARM_SMMU_V3_KUNIT_TEST
> > > +   tristate "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
> > > +   depends on KUNIT
> > > +   default KUNIT_ALL_TESTS
> > > +   help
> > > +     Enable this option to unit-test arm-smmu-v3 driver functions.
> > > +
> > > +     If unsure, say N.
> >
> > > I forgot that my SVA sanity test doesn't cover this patch. It looks
> > > like there are some problems when building it with "=m":
> >
> > ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o
> > ERROR: modpost: "arm_smmu_make_cdtable_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > ERROR: modpost: "arm_smmu_make_bypass_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > ERROR: modpost: "arm_smmu_make_abort_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > ERROR: modpost: "arm_smmu_make_s2_domain_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > ERROR: modpost: "arm_smmu_get_ste_used" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > ERROR: modpost: "arm_smmu_write_entry" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> >
> > Likely needs MODULE_LICENSE and some EXPORT_SYMBOLs.
>
> Oh! The kbuild never tested this kconfig combination...
>
> I think just this? Michael?

Urhh I'm not sure... Should this also depend on ARM_SMMU_V3? Also what
happens if ARM_SMMU_V3=m and ARM_SMMU_V3_KUNIT_TEST=y ?

>
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index d03c729c4142dc..7b6a4e244e99cf 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -411,7 +411,7 @@ config ARM_SMMU_V3_SVA
>           and PRI.
>
>  config ARM_SMMU_V3_KUNIT_TEST
> -       tristate "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
> +       bool "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
>         depends on KUNIT
>         default KUNIT_ALL_TESTS
>         help
> diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
> index 014a997753a8a2..0b97054b3929b7 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> @@ -2,6 +2,5 @@
>  obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
>  arm_smmu_v3-objs-y += arm-smmu-v3.o
>  arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
> +arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
>  arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
> -
> -obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
>
>
> Jason

* Re: [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  2024-04-16 19:28 ` [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function Jason Gunthorpe
  2024-04-17  7:37   ` Nicolin Chen
  2024-04-17 16:26   ` Nicolin Chen
@ 2024-04-18  4:40   ` Michael Shavit
  2024-04-18 14:28     ` Jason Gunthorpe
  2 siblings, 1 reply; 48+ messages in thread
From: Michael Shavit @ 2024-04-18  4:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Nicolin Chen,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Wed, Apr 17, 2024 at 3:28 AM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> Pull all the calculations for building the CD table entry for a mm_struct
> into arm_smmu_make_sva_cd().
>
> Call it in the two places installing the SVA CD table entry.
>
> Open code the last caller of arm_smmu_update_ctx_desc_devices() and remove
> the function.
>
> Remove arm_smmu_write_ctx_desc() since all callers are gone. Add the
> locking assertions to arm_smmu_alloc_cd_ptr() since
> arm_smmu_update_ctx_desc_devices() was the last problematic caller.
>
> Remove quiet_cd since all users are gone, arm_smmu_make_sva_cd() creates
> the same value.
>
> The behavior of quiet_cd changes slightly: the old implementation edited
> the CD in place to set CTXDESC_CD_0_TCR_EPD0 assuming it was a SVA CD
> entry. This version generates a full CD entry with a 0 TTB0 and relies on
> arm_smmu_write_cd_entry() to install it hitlessly.
>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 156 +++++++++++-------
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 103 +-----------
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   7 +-
>  3 files changed, 108 insertions(+), 158 deletions(-)
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index 7cf286f7a009fb..80a7d559ef2d3f 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -34,25 +34,6 @@ struct arm_smmu_bond {
>
>  static DEFINE_MUTEX(sva_lock);
>
> -/*
> - * Write the CD to the CD tables for all masters that this domain is attached
> - * to. Note that this is only used to update existing CD entries in the target
> - * CD table, for which it's assumed that arm_smmu_write_ctx_desc can't fail.
> - */
> -static void arm_smmu_update_ctx_desc_devices(struct arm_smmu_domain *smmu_domain,
> -                                          int ssid,
> -                                          struct arm_smmu_ctx_desc *cd)
> -{
> -       struct arm_smmu_master *master;
> -       unsigned long flags;
> -
> -       spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> -       list_for_each_entry(master, &smmu_domain->devices, domain_head) {
> -               arm_smmu_write_ctx_desc(master, ssid, cd);
> -       }
> -       spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> -}
> -
>  static void
>  arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain)
>  {
> @@ -128,11 +109,86 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
>         return NULL;
>  }
>
> +static u64 page_size_to_cd(void)
> +{
> +       static_assert(PAGE_SIZE == SZ_4K || PAGE_SIZE == SZ_16K ||
> +                     PAGE_SIZE == SZ_64K);
> +       if (PAGE_SIZE == SZ_64K)
> +               return ARM_LPAE_TCR_TG0_64K;
> +       if (PAGE_SIZE == SZ_16K)
> +               return ARM_LPAE_TCR_TG0_16K;
> +       return ARM_LPAE_TCR_TG0_4K;
> +}
> +
> +static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
> +                                struct arm_smmu_master *master,
> +                                struct mm_struct *mm, u16 asid)
> +{
> +       u64 par;
> +
> +       memset(target, 0, sizeof(*target));
> +
> +       par = cpuid_feature_extract_unsigned_field(
> +               read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1),
> +               ID_AA64MMFR0_EL1_PARANGE_SHIFT);
> +
> +       target->data[0] = cpu_to_le64(
> +               CTXDESC_CD_0_TCR_EPD1 |
> +#ifdef __BIG_ENDIAN
> +               CTXDESC_CD_0_ENDI |
> +#endif
> +               CTXDESC_CD_0_V |
> +               FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par) |
> +               CTXDESC_CD_0_AA64 |
> +               (master->stall_enabled ? CTXDESC_CD_0_S : 0) |
> +               CTXDESC_CD_0_R |
> +               CTXDESC_CD_0_A |
> +               CTXDESC_CD_0_ASET |
> +               FIELD_PREP(CTXDESC_CD_0_ASID, asid));
> +
> +       /*
> +        * If no MM is passed then this creates a SVA entry that faults
> +        * everything. arm_smmu_write_cd_entry() can hitlessly go between these
> +        * two entry types since TTB0 is ignored by HW when EPD0 is set.
> +        */
> +       if (mm) {
> +               target->data[0] |= cpu_to_le64(
> +                       FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ,
> +                                  64ULL - vabits_actual) |
> +                       FIELD_PREP(CTXDESC_CD_0_TCR_TG0, page_size_to_cd()) |
> +                       FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0,
> +                                  ARM_LPAE_TCR_RGN_WBWA) |
> +                       FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0,
> +                                  ARM_LPAE_TCR_RGN_WBWA) |
> +                       FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS));
> +
> +               target->data[1] = cpu_to_le64(virt_to_phys(mm->pgd) &
> +                                             CTXDESC_CD_1_TTB0_MASK);
> +       } else {
> +               target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_EPD0);
> +
> +               /*
> +                * Disable stall and immediately generate an abort if stall
> +                * disable is permitted. This speeds up cleanup for an unclean
> +                * exit if the device is still doing a lot of DMA.
> +                */
> +               if (master->stall_enabled &&
> +                   !(master->smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
> +                       target->data[0] &=
> +                               cpu_to_le64(~(CTXDESC_CD_0_S | CTXDESC_CD_0_R));


This condition looks slightly different from the original one. Does
this imply a change in behaviour that should be noted in the commit
message?
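
A note for readers comparing the two versions: the difference in the
condition can be modelled in a few lines of standalone C. This is only a
sketch under stated assumptions. The bit positions CD_S and CD_R and the
function names are hypothetical, and the starting value assumes the live
SVA CD has R set and has S set only when stalls are enabled for the
master, as in the removed arm_smmu_write_ctx_desc() install path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only (not the driver's). */
#define CD_S (UINT64_C(1) << 0)
#define CD_R (UINT64_C(1) << 1)

/* Assumed S/R state of a live SVA CD before it is quiesced. */
uint64_t live_sr_bits(bool stall_enabled)
{
	return CD_R | (stall_enabled ? CD_S : 0);
}

/* Old quiet_cd path: clear S and R whenever STALL_FORCE is absent. */
uint64_t quiet_sr_bits_old(bool stall_enabled, bool stall_force)
{
	uint64_t val = live_sr_bits(stall_enabled);

	if (!stall_force)
		val &= ~(CD_S | CD_R);
	return val;
}

/* New path: additionally require that stalls are enabled for the master. */
uint64_t quiet_sr_bits_new(bool stall_enabled, bool stall_force)
{
	uint64_t val = live_sr_bits(stall_enabled);

	if (stall_enabled && !stall_force)
		val &= ~(CD_S | CD_R);
	return val;
}

/* Compare both versions over all four input combinations. */
void quiet_sr_bits_self_check(void)
{
	/* The only disagreement: no stall, no STALL_FORCE. */
	assert(quiet_sr_bits_old(false, false) == 0);
	assert(quiet_sr_bits_new(false, false) == CD_R);

	/* The other three combinations agree. */
	assert(quiet_sr_bits_old(true, false) == quiet_sr_bits_new(true, false));
	assert(quiet_sr_bits_old(false, true) == quiet_sr_bits_new(false, true));
	assert(quiet_sr_bits_old(true, true) == quiet_sr_bits_new(true, true));
}
```

Under these assumptions the two versions disagree only when stalls are not
enabled and STALL_FORCE is not set: the old code cleared R there, while the
new code leaves it set.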

>
> +       }
> +
> +       /*
> +        * MAIR value is pretty much constant and global, so we can just get it
> +        * from the current CPU register
> +        */
> +       target->data[3] = cpu_to_le64(read_sysreg(mair_el1));
> +}
> +
>  static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
>  {
>         u16 asid;
>         int err = 0;
> -       u64 tcr, par, reg;
>         struct arm_smmu_ctx_desc *cd;
>         struct arm_smmu_ctx_desc *ret = NULL;
>
> @@ -166,39 +222,6 @@ static struct arm_smmu_ctx_desc *arm_smmu_alloc_shared_cd(struct mm_struct *mm)
>         if (err)
>                 goto out_free_asid;
>
> -       tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, 64ULL - vabits_actual) |
> -             FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, ARM_LPAE_TCR_RGN_WBWA) |
> -             FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, ARM_LPAE_TCR_RGN_WBWA) |
> -             FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS) |
> -             CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64;
> -
> -       switch (PAGE_SIZE) {
> -       case SZ_4K:
> -               tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_4K);
> -               break;
> -       case SZ_16K:
> -               tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_16K);
> -               break;
> -       case SZ_64K:
> -               tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_TG0, ARM_LPAE_TCR_TG0_64K);
> -               break;
> -       default:
> -               WARN_ON(1);
> -               err = -EINVAL;
> -               goto out_free_asid;
> -       }
> -
> -       reg = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
> -       par = cpuid_feature_extract_unsigned_field(reg, ID_AA64MMFR0_EL1_PARANGE_SHIFT);
> -       tcr |= FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par);
> -
> -       cd->ttbr = virt_to_phys(mm->pgd);
> -       cd->tcr = tcr;
> -       /*
> -        * MAIR value is pretty much constant and global, so we can just get it
> -        * from the current CPU register
> -        */
> -       cd->mair = read_sysreg(mair_el1);
>         cd->asid = asid;
>         cd->mm = mm;
>
> @@ -276,6 +299,8 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>  {
>         struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn);
>         struct arm_smmu_domain *smmu_domain = smmu_mn->domain;
> +       struct arm_smmu_master *master;
> +       unsigned long flags;
>
>         mutex_lock(&sva_lock);
>         if (smmu_mn->cleared) {
> @@ -287,8 +312,19 @@ static void arm_smmu_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
>          * DMA may still be running. Keep the cd valid to avoid C_BAD_CD events,
>          * but disable translation.
>          */
> -       arm_smmu_update_ctx_desc_devices(smmu_domain, mm_get_enqcmd_pasid(mm),
> -                                        &quiet_cd);
> +       spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +       list_for_each_entry(master, &smmu_domain->devices, domain_head) {
> +               struct arm_smmu_cd target;
> +               struct arm_smmu_cd *cdptr;
> +
> +               cdptr = arm_smmu_get_cd_ptr(master, mm_get_enqcmd_pasid(mm));
> +               if (WARN_ON(!cdptr))
> +                       continue;
> +               arm_smmu_make_sva_cd(&target, master, NULL, smmu_mn->cd->asid);
> +               arm_smmu_write_cd_entry(master, mm_get_enqcmd_pasid(mm), cdptr,
> +                                       &target);
> +       }
> +       spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>
>         arm_smmu_tlb_inv_asid(smmu_domain->smmu, smmu_mn->cd->asid);
>         arm_smmu_atc_inv_domain(smmu_domain, mm_get_enqcmd_pasid(mm), 0, 0);
> @@ -383,6 +419,8 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid,
>                                struct mm_struct *mm)
>  {
>         int ret;
> +       struct arm_smmu_cd target;
> +       struct arm_smmu_cd *cdptr;
>         struct arm_smmu_bond *bond;
>         struct arm_smmu_master *master = dev_iommu_priv_get(dev);
>         struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> @@ -409,9 +447,13 @@ static int __arm_smmu_sva_bind(struct device *dev, ioasid_t pasid,
>                 goto err_free_bond;
>         }
>
> -       ret = arm_smmu_write_ctx_desc(master, pasid, bond->smmu_mn->cd);
> -       if (ret)
> +       cdptr = arm_smmu_alloc_cd_ptr(master, mm_get_enqcmd_pasid(mm));
> +       if (!cdptr) {
> +               ret = -ENOMEM;
>                 goto err_put_notifier;
> +       }
> +       arm_smmu_make_sva_cd(&target, master, mm, bond->smmu_mn->cd->asid);
> +       arm_smmu_write_cd_entry(master, pasid, cdptr, &target);
>
>         list_add(&bond->list, &master->bonds);
>         return 0;
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 0aacd95f34a479..d01b632197c0b7 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -84,12 +84,6 @@ struct arm_smmu_option_prop {
>  DEFINE_XARRAY_ALLOC1(arm_smmu_asid_xa);
>  DEFINE_MUTEX(arm_smmu_asid_lock);
>
> -/*
> - * Special value used by SVA when a process dies, to quiesce a CD without
> - * disabling it.
> - */
> -struct arm_smmu_ctx_desc quiet_cd = { 0 };
> -
>  static struct arm_smmu_option_prop arm_smmu_options[] = {
>         { ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" },
>         { ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"},
> @@ -1201,7 +1195,7 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst,
>         u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
>                   CTXDESC_L1_DESC_V;
>
> -       /* See comment in arm_smmu_write_ctx_desc() */
> +       /* The HW has 64 bit atomicity with stores to the L2 CD table */
>         WRITE_ONCE(*dst, cpu_to_le64(val));
>  }
>
> @@ -1224,12 +1218,15 @@ struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
>         return &l1_desc->l2ptr[ssid % CTXDESC_L2_ENTRIES];
>  }
>
> -static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
> -                                                u32 ssid)
> +struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
> +                                         u32 ssid)
>  {
>         struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
>         struct arm_smmu_device *smmu = master->smmu;
>
> +       might_sleep();
> +       iommu_group_mutex_assert(master->dev);
> +
>         if (!cd_table->cdtab) {
>                 if (arm_smmu_alloc_cd_tables(master))
>                         return NULL;
> @@ -1345,91 +1342,6 @@ void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid)
>         arm_smmu_write_cd_entry(master, ssid, cdptr, &target);
>  }
>
> -static void arm_smmu_clean_cd_entry(struct arm_smmu_cd *target)
> -{
> -       struct arm_smmu_cd used = {};
> -       int i;
> -
> -       arm_smmu_get_cd_used(target->data, used.data);
> -       for (i = 0; i != ARRAY_SIZE(target->data); i++)
> -               target->data[i] &= used.data[i];
> -}
> -
> -int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
> -                           struct arm_smmu_ctx_desc *cd)
> -{
> -       /*
> -        * This function handles the following cases:
> -        *
> -        * (1) Install primary CD, for normal DMA traffic (SSID = IOMMU_NO_PASID = 0).
> -        * (2) Install a secondary CD, for SID+SSID traffic.
> -        * (3) Update ASID of a CD. Atomically write the first 64 bits of the
> -        *     CD, then invalidate the old entry and mappings.
> -        * (4) Quiesce the context without clearing the valid bit. Disable
> -        *     translation, and ignore any translation fault.
> -        * (5) Remove a secondary CD.
> -        */
> -       u64 val;
> -       bool cd_live;
> -       struct arm_smmu_cd target;
> -       struct arm_smmu_cd *cdptr = &target;
> -       struct arm_smmu_cd *cd_table_entry;
> -       struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
> -       struct arm_smmu_device *smmu = master->smmu;
> -
> -       if (WARN_ON(ssid >= (1 << cd_table->s1cdmax)))
> -               return -E2BIG;
> -
> -       cd_table_entry = arm_smmu_alloc_cd_ptr(master, ssid);
> -       if (!cd_table_entry)
> -               return -ENOMEM;
> -
> -       target = *cd_table_entry;
> -       val = le64_to_cpu(cdptr->data[0]);
> -       cd_live = !!(val & CTXDESC_CD_0_V);
> -
> -       if (!cd) { /* (5) */
> -               val = 0;
> -       } else if (cd == &quiet_cd) { /* (4) */
> -               if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
> -                       val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R);
> -               val |= CTXDESC_CD_0_TCR_EPD0;
> -       } else if (cd_live) { /* (3) */
> -               val &= ~CTXDESC_CD_0_ASID;
> -               val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid);
> -               /*
> -                * Until CD+TLB invalidation, both ASIDs may be used for tagging
> -                * this substream's traffic
> -                */
> -       } else { /* (1) and (2) */
> -               cdptr->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
> -               cdptr->data[2] = 0;
> -               cdptr->data[3] = cpu_to_le64(cd->mair);
> -
> -               val = cd->tcr |
> -#ifdef __BIG_ENDIAN
> -                       CTXDESC_CD_0_ENDI |
> -#endif
> -                       CTXDESC_CD_0_R | CTXDESC_CD_0_A |
> -                       (cd->mm ? 0 : CTXDESC_CD_0_ASET) |
> -                       CTXDESC_CD_0_AA64 |
> -                       FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) |
> -                       CTXDESC_CD_0_V;
> -
> -               if (cd_table->stall_enabled)
> -                       val |= CTXDESC_CD_0_S;
> -       }
> -       cdptr->data[0] = cpu_to_le64(val);
> -       /*
> -        * Since the above is updating the CD entry based on the current value
> -        * without zeroing unused bits it needs fixing before being passed to
> -        * the programming logic.
> -        */
> -       arm_smmu_clean_cd_entry(&target);
> -       arm_smmu_write_cd_entry(master, ssid, cd_table_entry, &target);
> -       return 0;
> -}
> -
>  static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
>  {
>         int ret;
> @@ -1438,7 +1350,6 @@ static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master)
>         struct arm_smmu_device *smmu = master->smmu;
>         struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
>
> -       cd_table->stall_enabled = master->stall_enabled;
>         cd_table->s1cdmax = master->ssid_bits;
>         max_contexts = 1 << cd_table->s1cdmax;
>
> @@ -1536,7 +1447,7 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
>         val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span);
>         val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK;
>
> -       /* See comment in arm_smmu_write_ctx_desc() */
> +       /* The HW has 64 bit atomicity with stores to the L2 STE table */
>         WRITE_ONCE(*dst, cpu_to_le64(val));
>  }
>
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 99fd6f24caa818..8098bf8836a180 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -609,8 +609,6 @@ struct arm_smmu_ctx_desc_cfg {
>         u8                              s1fmt;
>         /* log2 of the maximum number of CDs supported by this table */
>         u8                              s1cdmax;
> -       /* Whether CD entries in this table have the stall bit set. */
> -       u8                              stall_enabled:1;
>  };
>
>  struct arm_smmu_s2_cfg {
> @@ -749,11 +747,12 @@ static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
>
>  extern struct xarray arm_smmu_asid_xa;
>  extern struct mutex arm_smmu_asid_lock;
> -extern struct arm_smmu_ctx_desc quiet_cd;
>
>  void arm_smmu_clear_cd(struct arm_smmu_master *master, ioasid_t ssid);
>  struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
>                                         u32 ssid);
> +struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
> +                                         u32 ssid);
>  void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
>                          struct arm_smmu_master *master,
>                          struct arm_smmu_domain *smmu_domain);
> @@ -761,8 +760,6 @@ void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
>                              struct arm_smmu_cd *cdptr,
>                              const struct arm_smmu_cd *target);
>
> -int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid,
> -                           struct arm_smmu_ctx_desc *cd);
>  void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
>  void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid,
>                                  size_t granule, bool leaf,
> --
> 2.43.2
>

* Re: [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  2024-04-18  4:39       ` Michael Shavit
@ 2024-04-18 12:48         ` Jason Gunthorpe
  2024-04-18 14:34           ` Michael Shavit
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-18 12:48 UTC (permalink / raw)
  To: Michael Shavit
  Cc: Nicolin Chen, iommu, Joerg Roedel, linux-arm-kernel,
	Robin Murphy, Will Deacon, Eric Auger, Moritz Fischer,
	Moritz Fischer, patches, Shameerali Kolothum Thodi,
	Mostafa Saleh

On Thu, Apr 18, 2024 at 12:39:29PM +0800, Michael Shavit wrote:
> > > > Forgot that my SVA sanity doesn't cover this patch. And it looks
> > > > like there are some problems here when building it with "=m":
> > >
> > > ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o
> > > ERROR: modpost: "arm_smmu_make_cdtable_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > ERROR: modpost: "arm_smmu_make_bypass_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > ERROR: modpost: "arm_smmu_make_abort_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > ERROR: modpost: "arm_smmu_make_s2_domain_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > ERROR: modpost: "arm_smmu_get_ste_used" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > ERROR: modpost: "arm_smmu_write_entry" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > >
> > > Likely needs MODULE_LICENSE and some EXPORT_SYMBOLs.
> >
> > Oh! The kbuild never tested this kconfig combination...
> >
> > I think just this? Michael?
> 
> Urhh I'm not sure... Should this also depend on ARM_SMMU_V3? 

It does:

if ARM_SMMU_V3
config ARM_SMMU_V3_SVA
        bool "Shared Virtual Addressing support for the ARM SMMUv3"
        select IOMMU_SVA
        select IOMMU_IOPF
        select MMU_NOTIFIER
        help
          Support for sharing process address spaces with devices using the
          SMMUv3.

          Say Y here if your system supports SVA extensions such as PCIe PASID
          and PRI.

config ARM_SMMU_V3_KUNIT_TEST
        bool "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
        depends on KUNIT
        default KUNIT_ALL_TESTS
        help
          Enable this option to unit-test arm-smmu-v3 driver functions.

          If unsure, say N.
endif

The enclosing 'if ARM_SMMU_V3' creates an automatic dependency and groups
the options into a kconfig menu.

> Also what
> happens if ARM_SMMU_V3=m and ARM_SMMU_V3_KUNIT_TEST=y ?

Works fine: the kunit symbols are exported, and we still build a single
module for smmu, so nothing internal has to cross a module boundary.

Jason

* Re: [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  2024-04-16 19:28 ` [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() Jason Gunthorpe
  2024-04-16 20:48   ` Nicolin Chen
@ 2024-04-18 13:01   ` Robin Murphy
  2024-04-18 16:08     ` Jason Gunthorpe
  2024-04-19 21:07   ` Mostafa Saleh
  2 siblings, 1 reply; 48+ messages in thread
From: Robin Murphy @ 2024-04-18 13:01 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, Joerg Roedel, linux-arm-kernel, Will Deacon
  Cc: Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi, Mostafa Saleh

On 16/04/2024 8:28 pm, Jason Gunthorpe wrote:
> CD table entries and STE's have the same essential programming sequence,
> just with different types.
> 
> Have arm_smmu_write_ctx_desc() generate a target CD and call
> arm_smmu_write_entry() to do the programming. Due to the way the target CD
> is generated by modifying the existing CD this alone is not enough for the
> CD callers to be freed of the ordering requirements.
> 
> The following patches will make the rest of the CD flow mirror the STE
> flow with precise CD contents generated in all cases.
> 
> Signed-off-by: Michael Shavit <mshavit@google.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Reviewed-by: Moritz Fischer <moritzf@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 94 ++++++++++++++++-----
>   1 file changed, 74 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index bf105e914d38b1..3983de90c2fa01 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -56,6 +56,7 @@ struct arm_smmu_entry_writer_ops {
>   
>   #define NUM_ENTRY_QWORDS 8
>   static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64));
> +static_assert(sizeof(struct arm_smmu_cd) == NUM_ENTRY_QWORDS * sizeof(u64));
>   
>   static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
>   	[EVTQ_MSI_INDEX] = {
> @@ -1231,6 +1232,67 @@ static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
>   	return &l1_desc->l2ptr[idx];
>   }
>   
> +struct arm_smmu_cd_writer {
> +	struct arm_smmu_entry_writer writer;
> +	unsigned int ssid;
> +};
> +
> +static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> +{
> +	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
> +	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> +		return;
> +	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
> +
> +	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */

They're ignored if the *effective value* of EPD0 is 1, which means you 
also need to account for when EPD0 itself is ignored, or all this 
complication is essentially meaningless.

Thanks,
Robin.
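
For readers following along, the raw-bit check in the quoted hunk can be
modelled in userspace C as below. This is only a sketch of the patch as
posted, not a fix for the point raised above: it keys on the stored EPD0
bit and does not try to compute an effective value. The macro and function
names are made up for illustration, the field positions are assumptions
that should be checked against arm-smmu-v3.h, and the little-endian
conversions are left out (host-endian words).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CD_QWORDS 8

/* Assumed field layout, for illustration only. */
#define CD0_T0SZ  (UINT64_C(0x3f) << 0)
#define CD0_TG0   (UINT64_C(0x3) << 6)
#define CD0_IRGN0 (UINT64_C(0x3) << 8)
#define CD0_ORGN0 (UINT64_C(0x3) << 10)
#define CD0_SH0   (UINT64_C(0x3) << 12)
#define CD0_EPD0  (UINT64_C(1) << 14)
#define CD0_V     (UINT64_C(1) << 31)
#define CD1_TTB0  (UINT64_C(0xffffffffffff) << 4) /* bits 51:4 */

/*
 * Fill 'used' with the bits the HW is assumed to look at for 'ent'.
 * Unlike the quoted function, this zeroes 'used' itself instead of
 * relying on the caller to pass in a zeroed buffer.
 */
void cd_get_used(const uint64_t *ent, uint64_t *used)
{
	memset(used, 0, CD_QWORDS * sizeof(*used));
	used[0] = CD0_V;
	if (!(ent[0] & CD0_V))
		return; /* invalid entry: only V is inspected */

	memset(used, 0xFF, CD_QWORDS * sizeof(*used));

	/* Raw EPD0 set: treat the TTB0 walk configuration as ignored. */
	if (ent[0] & CD0_EPD0) {
		used[0] &= ~(CD0_T0SZ | CD0_TG0 | CD0_IRGN0 |
			     CD0_ORGN0 | CD0_SH0);
		used[1] &= ~CD1_TTB0;
	}
}

/* Zero every bit of 'target' that cd_get_used() reports as unused. */
void cd_clean(uint64_t *target)
{
	uint64_t used[CD_QWORDS];
	int i;

	cd_get_used(target, used);
	for (i = 0; i != CD_QWORDS; i++)
		target[i] &= used[i];
}
```

With these assumptions, cleaning a valid entry that has EPD0 set zeroes
its T0SZ/TG0/IRGN0/ORGN0/SH0 fields and its TTB0 field, which is what lets
the writer treat those bits as free to change.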

> +	if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
> +		used_bits[0] &= ~cpu_to_le64(
> +			CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
> +			CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
> +			CTXDESC_CD_0_TCR_SH0);
> +		used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
> +	}
> +}
> +
> +static void arm_smmu_cd_writer_sync_entry(struct arm_smmu_entry_writer *writer)
> +{
> +	struct arm_smmu_cd_writer *cd_writer =
> +		container_of(writer, struct arm_smmu_cd_writer, writer);
> +
> +	arm_smmu_sync_cd(writer->master, cd_writer->ssid, true);
> +}
> +
> +static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = {
> +	.sync = arm_smmu_cd_writer_sync_entry,
> +	.get_used = arm_smmu_get_cd_used,
> +	.v_bit = cpu_to_le64(CTXDESC_CD_0_V),
> +};
> +
> +static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
> +				    struct arm_smmu_cd *cdptr,
> +				    const struct arm_smmu_cd *target)
> +{
> +	struct arm_smmu_cd_writer cd_writer = {
> +		.writer = {
> +			.ops = &arm_smmu_cd_writer_ops,
> +			.master = master,
> +		},
> +		.ssid = ssid,
> +	};
> +
> +	arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data);
> +}
> +
> +static void arm_smmu_clean_cd_entry(struct arm_smmu_cd *target)
> +{
> +	struct arm_smmu_cd used = {};
> +	int i;
> +
> +	arm_smmu_get_cd_used(target->data, used.data);
> +	for (i = 0; i != ARRAY_SIZE(target->data); i++)
> +		target->data[i] &= used.data[i];
> +}
> +
>   int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
>   			    struct arm_smmu_ctx_desc *cd)
>   {
> @@ -1247,17 +1309,20 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
>   	 */
>   	u64 val;
>   	bool cd_live;
> -	struct arm_smmu_cd *cdptr;
> +	struct arm_smmu_cd target;
> +	struct arm_smmu_cd *cdptr = &target;
> +	struct arm_smmu_cd *cd_table_entry;
>   	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
>   	struct arm_smmu_device *smmu = master->smmu;
>   
>   	if (WARN_ON(ssid >= (1 << cd_table->s1cdmax)))
>   		return -E2BIG;
>   
> -	cdptr = arm_smmu_get_cd_ptr(master, ssid);
> -	if (!cdptr)
> +	cd_table_entry = arm_smmu_get_cd_ptr(master, ssid);
> +	if (!cd_table_entry)
>   		return -ENOMEM;
>   
> +	target = *cd_table_entry;
>   	val = le64_to_cpu(cdptr->data[0]);
>   	cd_live = !!(val & CTXDESC_CD_0_V);
>   
> @@ -1279,13 +1344,6 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
>   		cdptr->data[2] = 0;
>   		cdptr->data[3] = cpu_to_le64(cd->mair);
>   
> -		/*
> -		 * STE may be live, and the SMMU might read dwords of this CD in any
> -		 * order. Ensure that it observes valid values before reading
> -		 * V=1.
> -		 */
> -		arm_smmu_sync_cd(master, ssid, true);
> -
>   		val = cd->tcr |
>   #ifdef __BIG_ENDIAN
>   			CTXDESC_CD_0_ENDI |
> @@ -1299,18 +1357,14 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
>   		if (cd_table->stall_enabled)
>   			val |= CTXDESC_CD_0_S;
>   	}
> -
> +	cdptr->data[0] = cpu_to_le64(val);
>   	/*
> -	 * The SMMU accesses 64-bit values atomically. See IHI0070Ca 3.21.3
> -	 * "Configuration structures and configuration invalidation completion"
> -	 *
> -	 *   The size of single-copy atomic reads made by the SMMU is
> -	 *   IMPLEMENTATION DEFINED but must be at least 64 bits. Any single
> -	 *   field within an aligned 64-bit span of a structure can be altered
> -	 *   without first making the structure invalid.
> +	 * Since the above is updating the CD entry based on the current value
> +	 * without zeroing unused bits it needs fixing before being passed to
> +	 * the programming logic.
>   	 */
> -	WRITE_ONCE(cdptr->data[0], cpu_to_le64(val));
> -	arm_smmu_sync_cd(master, ssid, true);
> +	arm_smmu_clean_cd_entry(&target);
> +	arm_smmu_write_cd_entry(master, ssid, cd_table_entry, &target);
>   	return 0;
>   }
>   

* Re: [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function
  2024-04-18  4:40   ` Michael Shavit
@ 2024-04-18 14:28     ` Jason Gunthorpe
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-18 14:28 UTC (permalink / raw)
  To: Michael Shavit
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Nicolin Chen,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Thu, Apr 18, 2024 at 12:40:03PM +0800, Michael Shavit wrote:

> > +static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
> > +                                struct arm_smmu_master *master,
> > +                                struct mm_struct *mm, u16 asid)
> > +{
> > +       u64 par;
> > +
> > +       memset(target, 0, sizeof(*target));
> > +
> > +       par = cpuid_feature_extract_unsigned_field(
> > +               read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1),
> > +               ID_AA64MMFR0_EL1_PARANGE_SHIFT);
> > +
> > +       target->data[0] = cpu_to_le64(
> > +               CTXDESC_CD_0_TCR_EPD1 |
> > +#ifdef __BIG_ENDIAN
> > +               CTXDESC_CD_0_ENDI |
> > +#endif
> > +               CTXDESC_CD_0_V |
> > +               FIELD_PREP(CTXDESC_CD_0_TCR_IPS, par) |
> > +               CTXDESC_CD_0_AA64 |
> > +               (master->stall_enabled ? CTXDESC_CD_0_S : 0) |
> > +               CTXDESC_CD_0_R |
> > +               CTXDESC_CD_0_A |
> > +               CTXDESC_CD_0_ASET |
> > +               FIELD_PREP(CTXDESC_CD_0_ASID, asid));
> > +
> > +       /*
> > +        * If no MM is passed then this creates a SVA entry that faults
> > +        * everything. arm_smmu_write_cd_entry() can hitlessly go between these
> > +        * two entries types since TTB0 is ignored by HW when EPD0 is set.
> > +        */
> > +       if (mm) {
> > +               target->data[0] |= cpu_to_le64(
> > +                       FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ,
> > +                                  64ULL - vabits_actual) |
> > +                       FIELD_PREP(CTXDESC_CD_0_TCR_TG0, page_size_to_cd()) |
> > +                       FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0,
> > +                                  ARM_LPAE_TCR_RGN_WBWA) |
> > +                       FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0,
> > +                                  ARM_LPAE_TCR_RGN_WBWA) |
> > +                       FIELD_PREP(CTXDESC_CD_0_TCR_SH0, ARM_LPAE_TCR_SH_IS));
> > +
> > +               target->data[1] = cpu_to_le64(virt_to_phys(mm->pgd) &
> > +                                             CTXDESC_CD_1_TTB0_MASK);
> > +       } else {
> > +               target->data[0] |= cpu_to_le64(CTXDESC_CD_0_TCR_EPD0);
> > +
> > +               /*
> > +                * Disable stall and immediately generate an abort if stall
> > +                * disable is permitted. This speeds up cleanup for an unclean
> > +                * exit if the device is still doing a lot of DMA.
> > +                */
> > +               if (master->stall_enabled &&
> > +                   !(master->smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
> > +                       target->data[0] &=
> > +                               cpu_to_le64(~(CTXDESC_CD_0_S | CTXDESC_CD_0_R));
> 
> 
> This condition looks slightly different from the original one. Does
> this imply a change in behaviour that should be noted in the commit
> message?

You mean because stall_enabled is checked? This means the R bit will
not be cleared for non-stalling devices.

Yeah, that probably shouldn't be changed in this patch, I'll adjust it.
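
To make the difference concrete, here is a minimal sketch of the two
conditions. The TOY_CD_* bit positions and the toy_* function names are
hypothetical and exist only for this illustration; they are not the real
CTXDESC_CD_0_* values or driver functions. The "old" variant assumes the
earlier code consulted only STALL_FORCE, as the exchange above implies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, NOT the real CTXDESC_CD_0_* layout. */
#define TOY_CD_S (1ULL << 0) /* stall on fault */
#define TOY_CD_R (1ULL << 1) /* record fault events */

/* Flags a CD starts with before the "released MM" adjustment. */
static uint64_t toy_base_flags(bool stall_enabled)
{
	return TOY_CD_R | (stall_enabled ? TOY_CD_S : 0);
}

/* Assumed earlier behaviour: only STALL_FORCE is consulted. */
static uint64_t toy_quiet_flags_old(bool stall_enabled, bool stall_force)
{
	uint64_t val = toy_base_flags(stall_enabled);

	if (!stall_force)
		val &= ~(TOY_CD_S | TOY_CD_R);
	return val;
}

/* Condition as posted in v7: stall_enabled is checked as well. */
static uint64_t toy_quiet_flags_v7(bool stall_enabled, bool stall_force)
{
	uint64_t val = toy_base_flags(stall_enabled);

	if (stall_enabled && !stall_force)
		val &= ~(TOY_CD_S | TOY_CD_R);
	return val;
}
```

Under these assumptions the two variants differ only for a non-stalling
device without STALL_FORCE: the old condition clears R there, the v7
condition leaves it set.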

But I think the original commit is slightly off as the PCI modes
shouldn't be changing behavior. Issuing a non-translated MemRd/Wr to
non-present IOVA should always abort and always log an event
regardless of what state the mm is in. Devices need to ensure that
their HW only issues ATS for SVA PASIDs.

Thanks,
Jason


* Re: [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  2024-04-18 12:48         ` Jason Gunthorpe
@ 2024-04-18 14:34           ` Michael Shavit
  0 siblings, 0 replies; 48+ messages in thread
From: Michael Shavit @ 2024-04-18 14:34 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Nicolin Chen, iommu, Joerg Roedel, linux-arm-kernel,
	Robin Murphy, Will Deacon, Eric Auger, Moritz Fischer,
	Moritz Fischer, patches, Shameerali Kolothum Thodi,
	Mostafa Saleh

On Thu, Apr 18, 2024 at 8:49 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Thu, Apr 18, 2024 at 12:39:29PM +0800, Michael Shavit wrote:
> > > > Forgot that my SVA sanity doesn't cover this patch. And it looks
> > > > like some problems here when building it with "=m":
> > > >
> > > > ERROR: modpost: missing MODULE_LICENSE() in drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o
> > > > ERROR: modpost: "arm_smmu_make_cdtable_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > > ERROR: modpost: "arm_smmu_make_bypass_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > > ERROR: modpost: "arm_smmu_make_abort_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > > ERROR: modpost: "arm_smmu_make_s2_domain_ste" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > > ERROR: modpost: "arm_smmu_get_ste_used" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > > ERROR: modpost: "arm_smmu_write_entry" [drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.ko] undefined!
> > > >
> > > > Likely needs MODULE_LICENSE and some EXPORT_SYMBOLs.
> > >
> > > Oh! The kbuild never tested this kconfig combination...
> > >
> > > I think just this? Michael?
> >
> > Urhh I'm not sure... Should this also depend on ARM_SMMU_V3?
>
> It does:
>
> if ARM_SMMU_V3
> config ARM_SMMU_V3_SVA
>         bool "Shared Virtual Addressing support for the ARM SMMUv3"
>         select IOMMU_SVA
>         select IOMMU_IOPF
>         select MMU_NOTIFIER
>         help
>           Support for sharing process address spaces with devices using the
>           SMMUv3.
>
>           Say Y here if your system supports SVA extensions such as PCIe PASID
>           and PRI.
>
> config ARM_SMMU_V3_KUNIT_TEST
>         bool "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
>         depends on KUNIT
>         default KUNIT_ALL_TESTS
>         help
>           Enable this option to unit-test arm-smmu-v3 driver functions.
>
>           If unsure, say N.
> endif
>
> The 'if' creates an automatic dependency and groups things into a
> kconfig menu

Ohhh, I should have looked more carefully :) . Thanks.

>
> > Also what
> > happens if ARM_SMMU_V3=m and ARM_SMMU_V3_KUNIT_TEST=y ?
>
> Works fine, the kunit symbols are exported and we still build one
> module for smmu so we don't need to have internal cross-module stuff
>
> Jason


* Re: [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  2024-04-18 13:01   ` Robin Murphy
@ 2024-04-18 16:08     ` Jason Gunthorpe
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-18 16:08 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Will Deacon, Eric Auger,
	Moritz Fischer, Moritz Fischer, Michael Shavit, Nicolin Chen,
	patches, Shameerali Kolothum Thodi, Mostafa Saleh

On Thu, Apr 18, 2024 at 02:01:31PM +0100, Robin Murphy wrote:
> On 16/04/2024 8:28 pm, Jason Gunthorpe wrote:
> > CD table entries and STE's have the same essential programming sequence,
> > just with different types.
> > 
> > Have arm_smmu_write_ctx_desc() generate a target CD and call
> > arm_smmu_write_entry() to do the programming. Due to the way the target CD
> > is generated by modifying the existing CD this alone is not enough for the
> > CD callers to be freed of the ordering requirements.
> > 
> > The following patches will make the rest of the CD flow mirror the STE
> > flow with precise CD contents generated in all cases.
> > 
> > Signed-off-by: Michael Shavit <mshavit@google.com>
> > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > Reviewed-by: Moritz Fischer <moritzf@google.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> >   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 94 ++++++++++++++++-----
> >   1 file changed, 74 insertions(+), 20 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index bf105e914d38b1..3983de90c2fa01 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -56,6 +56,7 @@ struct arm_smmu_entry_writer_ops {
> >   #define NUM_ENTRY_QWORDS 8
> >   static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64));
> > +static_assert(sizeof(struct arm_smmu_cd) == NUM_ENTRY_QWORDS * sizeof(u64));
> >   static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
> >   	[EVTQ_MSI_INDEX] = {
> > @@ -1231,6 +1232,67 @@ static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
> >   	return &l1_desc->l2ptr[idx];
> >   }
> > +struct arm_smmu_cd_writer {
> > +	struct arm_smmu_entry_writer writer;
> > +	unsigned int ssid;
> > +};
> > +
> > +static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> > +{
> > +	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
> > +	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> > +		return;
> > +	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
> > +
> > +	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
> 
> They're ignored if the *effective value* of EPD0 is 1, which means you also
> need to account for when EPD0 itself is ignored, or all this complication is
> essentially meaningless.

Do you mean this?

 Consistent with Armv8-A translation, the EPD0 and EPD1 fields are
 IGNORED (and their effective value is 0) if this CD is located from
 an STE with StreamWorld of any-EL2 or EL3. It is only possible for an
 EL1 (Secure or non-secure) or any-EL2-E2H stream to disable
 translation table walk using EPD0 or EPD1.

Regardless, part of the design is that the make functions don't set
IGNORED bits and get_used only has to process what the make functions
build, not the universe of all descriptors.

In this case the make function sets EPD0 and constructs a CD that is
only valid if EPD0 is available. It also zeros the TTB0/etc values
because they are expected to be IGNORED and the code has no valid
value to provide anyhow.

The comment was intended to be read as: if EPD0 is set [by the make
function] then TTB0/etc will be IGNORED.

I will update the comment for clarity.

The complexity you are talking about must be dealt with by the make
side. If we do need to support something where EPD0 doesn't work then
make functions must never set it.
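
The make/get_used contract described above can be sketched roughly as
follows. This is a toy model under stated assumptions: the two-qword
layout, the TOY_* bit positions and every toy_* function are hypothetical
and are not the driver's CTXDESC_CD_* definitions or its real functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy layout, NOT the real CTXDESC_CD_* bit positions. */
#define TOY_QWORDS 2
#define TOY_V      (1ULL << 0)    /* entry is valid */
#define TOY_EPD0   (1ULL << 1)    /* disable walks through TTB0 */
#define TOY_T0SZ   (0x3fULL << 8) /* qword 0, only read when EPD0 == 0 */
#define TOY_TTB0   (~0xfULL)      /* qword 1, only read when EPD0 == 0 */

/* Report the bits HW would look at for an entry built by a make function. */
static void toy_get_used(const uint64_t *ent, uint64_t *used)
{
	memset(used, 0, TOY_QWORDS * sizeof(*used));
	used[0] = TOY_V;
	if (!(ent[0] & TOY_V))
		return;
	used[0] |= TOY_EPD0;
	if (!(ent[0] & TOY_EPD0)) {
		used[0] |= TOY_T0SZ;
		used[1] |= TOY_TTB0;
	}
}

/* Make an entry that faults everything: EPD0 set, T0SZ/TTB0 left zero. */
static void toy_make_faulting(uint64_t *target)
{
	memset(target, 0, TOY_QWORDS * sizeof(*target));
	target[0] = TOY_V | TOY_EPD0;
}

/* Make an entry that walks a table at ttb0. */
static void toy_make_walking(uint64_t *target, uint64_t ttb0,
			     unsigned int t0sz)
{
	assert(t0sz <= 0x3f);
	memset(target, 0, TOY_QWORDS * sizeof(*target));
	target[0] = TOY_V | ((uint64_t)t0sz << 8);
	target[1] = ttb0 & TOY_TTB0;
}

/* The contract: a made entry never sets a bit outside its own used mask. */
static bool toy_obeys_contract(const uint64_t *target)
{
	uint64_t used[TOY_QWORDS];
	unsigned int i;

	toy_get_used(target, used);
	for (i = 0; i != TOY_QWORDS; i++)
		if (target[i] & ~used[i])
			return false;
	return true;
}
```

In this model both make functions satisfy the check, while an entry that
sets EPD0 and still carries T0SZ/TTB0 bits does not, which is the kind of
mismatch the WARN_ON_ONCE in the qword diff is meant to catch.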

Do we have a problem here? Can SVA activate in a case where EPD0 will be
ignored? That would be a security bug.

Thanks,
Jason


* Re: [PATCH v7 1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code
  2024-04-16 19:28 ` [PATCH v7 1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code Jason Gunthorpe
  2024-04-16 20:18   ` Nicolin Chen
@ 2024-04-19 21:02   ` Mostafa Saleh
  2024-04-22 13:09     ` Jason Gunthorpe
  1 sibling, 1 reply; 48+ messages in thread
From: Mostafa Saleh @ 2024-04-19 21:02 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

Hi Jason,

On Tue, Apr 16, 2024 at 04:28:12PM -0300, Jason Gunthorpe wrote:
> Prepare to put the CD code into the same mechanism. Add an ops indirection
> around all the STE specific code and make the worker functions independent
> of the entry content being processed.
> 
> get_used and sync ops are provided to hook the correct code.
> 
> Signed-off-by: Michael Shavit <mshavit@google.com>
> Reviewed-by: Michael Shavit <mshavit@google.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 178 ++++++++++++--------
>  1 file changed, 106 insertions(+), 72 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 79c18e95dd293e..bf105e914d38b1 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -42,8 +42,20 @@ enum arm_smmu_msi_index {
>  	ARM_SMMU_MAX_MSIS,
>  };
>  
> -static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu,
> -				      ioasid_t sid);
> +struct arm_smmu_entry_writer_ops;
> +struct arm_smmu_entry_writer {
> +	const struct arm_smmu_entry_writer_ops *ops;
> +	struct arm_smmu_master *master;
> +};
> +
> +struct arm_smmu_entry_writer_ops {
> +	__le64 v_bit;
> +	void (*get_used)(const __le64 *entry, __le64 *used);
> +	void (*sync)(struct arm_smmu_entry_writer *writer);
> +};
> +
> +#define NUM_ENTRY_QWORDS 8
> +static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64));
>  
>  static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
>  	[EVTQ_MSI_INDEX] = {
> @@ -972,43 +984,42 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
>   * would be nice if this was complete according to the spec, but minimally it
>   * has to capture the bits this driver uses.
>   */
> -static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent,
> -				  struct arm_smmu_ste *used_bits)
> +static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
>  {
> -	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent->data[0]));
> +	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
>  
> -	used_bits->data[0] = cpu_to_le64(STRTAB_STE_0_V);
> -	if (!(ent->data[0] & cpu_to_le64(STRTAB_STE_0_V)))
> +	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
> +	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
>  		return;
>  
> -	used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
> +	used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
>  
>  	/* S1 translates */
>  	if (cfg & BIT(0)) {
> -		used_bits->data[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
> -						  STRTAB_STE_0_S1CTXPTR_MASK |
> -						  STRTAB_STE_0_S1CDMAX);
> -		used_bits->data[1] |=
> +		used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
> +					    STRTAB_STE_0_S1CTXPTR_MASK |
> +					    STRTAB_STE_0_S1CDMAX);
> +		used_bits[1] |=
>  			cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
>  				    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
>  				    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW |
>  				    STRTAB_STE_1_EATS);
> -		used_bits->data[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
> +		used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
>  	}
>  
>  	/* S2 translates */
>  	if (cfg & BIT(1)) {
> -		used_bits->data[1] |=
> +		used_bits[1] |=
>  			cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG);
> -		used_bits->data[2] |=
> +		used_bits[2] |=
>  			cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
>  				    STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
>  				    STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
> -		used_bits->data[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
> +		used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
>  	}
>  
>  	if (cfg == STRTAB_STE_0_CFG_BYPASS)
> -		used_bits->data[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
> +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
>  }
>  
>  /*
> @@ -1017,57 +1028,55 @@ static void arm_smmu_get_ste_used(const struct arm_smmu_ste *ent,
>   * unused_update is an intermediate value of entry that has unused bits set to
>   * their new values.
>   */
> -static u8 arm_smmu_entry_qword_diff(const struct arm_smmu_ste *entry,
> -				    const struct arm_smmu_ste *target,
> -				    struct arm_smmu_ste *unused_update)
> +static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
> +				    const __le64 *entry, const __le64 *target,
> +				    __le64 *unused_update)
>  {
> -	struct arm_smmu_ste target_used = {};
> -	struct arm_smmu_ste cur_used = {};
> +	__le64 target_used[NUM_ENTRY_QWORDS] = {};
> +	__le64 cur_used[NUM_ENTRY_QWORDS] = {};
>  	u8 used_qword_diff = 0;
>  	unsigned int i;
>  
> -	arm_smmu_get_ste_used(entry, &cur_used);
> -	arm_smmu_get_ste_used(target, &target_used);
> +	writer->ops->get_used(entry, cur_used);
> +	writer->ops->get_used(target, target_used);
>  
> -	for (i = 0; i != ARRAY_SIZE(target_used.data); i++) {
> +	for (i = 0; i != NUM_ENTRY_QWORDS; i++) {
>  		/*
>  		 * Check that masks are up to date, the make functions are not
>  		 * allowed to set a bit to 1 if the used function doesn't say it
>  		 * is used.
>  		 */
> -		WARN_ON_ONCE(target->data[i] & ~target_used.data[i]);
> +		WARN_ON_ONCE(target[i] & ~target_used[i]);
>  
>  		/* Bits can change because they are not currently being used */
> -		unused_update->data[i] = (entry->data[i] & cur_used.data[i]) |
> -					 (target->data[i] & ~cur_used.data[i]);
> +		unused_update[i] = (entry[i] & cur_used[i]) |
> +				   (target[i] & ~cur_used[i]);
>  		/*
>  		 * Each bit indicates that a used bit in a qword needs to be
>  		 * changed after unused_update is applied.
>  		 */
> -		if ((unused_update->data[i] & target_used.data[i]) !=
> -		    target->data[i])
> +		if ((unused_update[i] & target_used[i]) != target[i])
>  			used_qword_diff |= 1 << i;
>  	}
>  	return used_qword_diff;
>  }
>  
> -static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid,
> -		      struct arm_smmu_ste *entry,
> -		      const struct arm_smmu_ste *target, unsigned int start,
> +static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry,
> +		      const __le64 *target, unsigned int start,
>  		      unsigned int len)
>  {
>  	bool changed = false;
>  	unsigned int i;
>  
>  	for (i = start; len != 0; len--, i++) {
> -		if (entry->data[i] != target->data[i]) {
> -			WRITE_ONCE(entry->data[i], target->data[i]);
> +		if (entry[i] != target[i]) {
> +			WRITE_ONCE(entry[i], target[i]);
>  			changed = true;
>  		}
>  	}
>  
>  	if (changed)
> -		arm_smmu_sync_ste_for_sid(smmu, sid);
> +		writer->ops->sync(writer);
>  	return changed;
>  }
>  
> @@ -1097,24 +1106,21 @@ static bool entry_set(struct arm_smmu_device *smmu, ioasid_t sid,
>   * V=0 process. This relies on the IGNORED behavior described in the
>   * specification.
>   */
> -static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
> -			       struct arm_smmu_ste *entry,
> -			       const struct arm_smmu_ste *target)
> +static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
> +				 __le64 *entry, const __le64 *target)
>  {
> -	unsigned int num_entry_qwords = ARRAY_SIZE(target->data);
> -	struct arm_smmu_device *smmu = master->smmu;
> -	struct arm_smmu_ste unused_update;
> +	__le64 unused_update[NUM_ENTRY_QWORDS];
>  	u8 used_qword_diff;
>  
>  	used_qword_diff =
> -		arm_smmu_entry_qword_diff(entry, target, &unused_update);
> +		arm_smmu_entry_qword_diff(writer, entry, target, unused_update);
>  	if (hweight8(used_qword_diff) == 1) {
>  		/*
>  		 * Only one qword needs its used bits to be changed. This is a
> -		 * hitless update, update all bits the current STE is ignoring
> -		 * to their new values, then update a single "critical qword" to
> -		 * change the STE and finally 0 out any bits that are now unused
> -		 * in the target configuration.
> +		 * hitless update, update all bits the current STE/CD is
> +		 * ignoring to their new values, then update a single "critical
> +		 * qword" to change the STE/CD and finally 0 out any bits that
> +		 * are now unused in the target configuration.
>  		 */
>  		unsigned int critical_qword_index = ffs(used_qword_diff) - 1;
>  
> @@ -1123,22 +1129,21 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
>  		 * writing it in the next step anyways. This can save a sync
>  		 * when the only change is in that qword.
>  		 */
> -		unused_update.data[critical_qword_index] =
> -			entry->data[critical_qword_index];
> -		entry_set(smmu, sid, entry, &unused_update, 0, num_entry_qwords);
> -		entry_set(smmu, sid, entry, target, critical_qword_index, 1);
> -		entry_set(smmu, sid, entry, target, 0, num_entry_qwords);
> +		unused_update[critical_qword_index] =
> +			entry[critical_qword_index];
> +		entry_set(writer, entry, unused_update, 0, NUM_ENTRY_QWORDS);
> +		entry_set(writer, entry, target, critical_qword_index, 1);
> +		entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS);
>  	} else if (used_qword_diff) {
>  		/*
>  		 * At least two qwords need their inuse bits to be changed. This
>  		 * requires a breaking update, zero the V bit, write all qwords
>  		 * but 0, then set qword 0
>  		 */
> -		unused_update.data[0] = entry->data[0] &
> -					cpu_to_le64(~STRTAB_STE_0_V);
> -		entry_set(smmu, sid, entry, &unused_update, 0, 1);
> -		entry_set(smmu, sid, entry, target, 1, num_entry_qwords - 1);
> -		entry_set(smmu, sid, entry, target, 0, 1);
> +		unused_update[0] = entry[0] & (~writer->ops->v_bit);

arm_smmu_write_entry() assumes that v_bit is in entry[0] and that “1” means
valid (which is true for both STE and CD), so why do we care about it? If we
break the STE/CD anyway, why not just do:

	unused_update[0] = 0;
	entry_set(writer, entry, unused_update, 0, 1);
	entry_set(writer, entry, target, 1, NUM_ENTRY_QWORDS - 1);
	entry_set(writer, entry, target, 0, 1);

That makes the code simpler by avoiding having the v_bit in
arm_smmu_entry_writer_ops.
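
As a rough way to compare the two variants, here is a toy model of the
breaking update. Everything in it is an assumption made for illustration:
TOY_V as bit 0 of qword 0, plain stores instead of WRITE_ONCE(), and a
counter standing in for the sync op. It is not the driver code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_QWORDS 8
#define TOY_V (1ULL << 0) /* assumed: valid bit is bit 0 of qword 0 */

static unsigned int toy_syncs; /* stands in for the writer's sync op */

/* Write len qwords starting at start; "sync" only if something changed. */
static bool toy_entry_set(uint64_t *entry, const uint64_t *target,
			  unsigned int start, unsigned int len)
{
	bool changed = false;
	unsigned int i;

	for (i = start; len != 0; len--, i++) {
		if (entry[i] != target[i]) {
			entry[i] = target[i];
			changed = true;
		}
	}
	if (changed)
		toy_syncs++;
	return changed;
}

/*
 * Breaking update. With zero_qword0 the whole first qword is cleared,
 * otherwise only the V bit is. Returns true if the entry was invalid
 * (V == 0) while qwords 1..7 were being rewritten.
 */
static bool toy_breaking_update(uint64_t *entry, const uint64_t *target,
				bool zero_qword0)
{
	uint64_t step[TOY_QWORDS] = { 0 };
	bool invalid_during_update;

	step[0] = zero_qword0 ? 0 : (entry[0] & ~TOY_V);
	toy_entry_set(entry, step, 0, 1);
	invalid_during_update = !(entry[0] & TOY_V);
	toy_entry_set(entry, target, 1, TOY_QWORDS - 1);
	toy_entry_set(entry, target, 0, 1);
	return invalid_during_update;
}

/* Compare two whole entries. */
static bool toy_entries_equal(const uint64_t *a, const uint64_t *b)
{
	assert(a && b);
	return memcmp(a, b, TOY_QWORDS * sizeof(*a)) == 0;
}
```

In this model both variants pass through a V == 0 state and end at the
same target; whether the real hardware or the driver cares about the
other bits of qword 0 in that window is not something the sketch can
answer.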


Thanks,
Mostafa

> +		entry_set(writer, entry, unused_update, 0, 1);
> +		entry_set(writer, entry, target, 1, NUM_ENTRY_QWORDS - 1);
> +		entry_set(writer, entry, target, 0, 1);
>  	} else {
>  		/*
>  		 * No inuse bit changed. Sanity check that all unused bits are 0
> @@ -1146,18 +1151,7 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
>  		 * compute_qword_diff().
>  		 */
>  		WARN_ON_ONCE(
> -			entry_set(smmu, sid, entry, target, 0, num_entry_qwords));
> -	}
> -
> -	/* It's likely that we'll want to use the new STE soon */
> -	if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) {
> -		struct arm_smmu_cmdq_ent
> -			prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG,
> -					 .prefetch = {
> -						 .sid = sid,
> -					 } };
> -
> -		arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
> +			entry_set(writer, entry, target, 0, NUM_ENTRY_QWORDS));
>  	}
>  }
>  
> @@ -1430,17 +1424,57 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
>  	WRITE_ONCE(*dst, cpu_to_le64(val));
>  }
>  
> -static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
> +struct arm_smmu_ste_writer {
> +	struct arm_smmu_entry_writer writer;
> +	u32 sid;
> +};
> +
> +static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer)
>  {
> +	struct arm_smmu_ste_writer *ste_writer =
> +		container_of(writer, struct arm_smmu_ste_writer, writer);
>  	struct arm_smmu_cmdq_ent cmd = {
>  		.opcode	= CMDQ_OP_CFGI_STE,
>  		.cfgi	= {
> -			.sid	= sid,
> +			.sid	= ste_writer->sid,
>  			.leaf	= true,
>  		},
>  	};
>  
> -	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
> +	arm_smmu_cmdq_issue_cmd_with_sync(writer->master->smmu, &cmd);
> +}
> +
> +static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = {
> +	.sync = arm_smmu_ste_writer_sync_entry,
> +	.get_used = arm_smmu_get_ste_used,
> +	.v_bit = cpu_to_le64(STRTAB_STE_0_V),
> +};
> +
> +static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
> +			       struct arm_smmu_ste *ste,
> +			       const struct arm_smmu_ste *target)
> +{
> +	struct arm_smmu_device *smmu = master->smmu;
> +	struct arm_smmu_ste_writer ste_writer = {
> +		.writer = {
> +			.ops = &arm_smmu_ste_writer_ops,
> +			.master = master,
> +		},
> +		.sid = sid,
> +	};
> +
> +	arm_smmu_write_entry(&ste_writer.writer, ste->data, target->data);
> +
> +	/* It's likely that we'll want to use the new STE soon */
> +	if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) {
> +		struct arm_smmu_cmdq_ent
> +			prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG,
> +					 .prefetch = {
> +						 .sid = sid,
> +					 } };
> +
> +		arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
> +	}
>  }
>  
>  static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
> -- 
> 2.43.2
> 


* Re: [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  2024-04-16 19:28 ` [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() Jason Gunthorpe
  2024-04-16 20:48   ` Nicolin Chen
  2024-04-18 13:01   ` Robin Murphy
@ 2024-04-19 21:07   ` Mostafa Saleh
  2024-04-22 13:29     ` Jason Gunthorpe
  2 siblings, 1 reply; 48+ messages in thread
From: Mostafa Saleh @ 2024-04-19 21:07 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

Hi Jason,

On Tue, Apr 16, 2024 at 04:28:13PM -0300, Jason Gunthorpe wrote:
> CD table entries and STE's have the same essential programming sequence,
> just with different types.
> 
> Have arm_smmu_write_ctx_desc() generate a target CD and call
> arm_smmu_write_entry() to do the programming. Due to the way the target CD
> is generated by modifying the existing CD this alone is not enough for the
> CD callers to be freed of the ordering requirements.
> 
> The following patches will make the rest of the CD flow mirror the STE
> flow with precise CD contents generated in all cases.
> 
> Signed-off-by: Michael Shavit <mshavit@google.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Reviewed-by: Moritz Fischer <moritzf@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 94 ++++++++++++++++-----
>  1 file changed, 74 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index bf105e914d38b1..3983de90c2fa01 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -56,6 +56,7 @@ struct arm_smmu_entry_writer_ops {
>  
>  #define NUM_ENTRY_QWORDS 8
>  static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64));
> +static_assert(sizeof(struct arm_smmu_cd) == NUM_ENTRY_QWORDS * sizeof(u64));
>  
>  static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
>  	[EVTQ_MSI_INDEX] = {
> @@ -1231,6 +1232,67 @@ static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
>  	return &l1_desc->l2ptr[idx];
>  }
>  
> +struct arm_smmu_cd_writer {
> +	struct arm_smmu_entry_writer writer;
> +	unsigned int ssid;
> +};
> +
> +static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> +{
> +	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
> +	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> +		return;
> +	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
> +
> +	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
> +	if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
> +		used_bits[0] &= ~cpu_to_le64(
> +			CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
> +			CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
> +			CTXDESC_CD_0_TCR_SH0);
> +		used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
> +	}
> +}
> +
> +static void arm_smmu_cd_writer_sync_entry(struct arm_smmu_entry_writer *writer)
> +{
> +	struct arm_smmu_cd_writer *cd_writer =
> +		container_of(writer, struct arm_smmu_cd_writer, writer);
> +
> +	arm_smmu_sync_cd(writer->master, cd_writer->ssid, true);
> +}
> +
> +static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = {
> +	.sync = arm_smmu_cd_writer_sync_entry,
> +	.get_used = arm_smmu_get_cd_used,
> +	.v_bit = cpu_to_le64(CTXDESC_CD_0_V),
> +};
> +
> +static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
> +				    struct arm_smmu_cd *cdptr,
> +				    const struct arm_smmu_cd *target)
> +{
> +	struct arm_smmu_cd_writer cd_writer = {
> +		.writer = {
> +			.ops = &arm_smmu_cd_writer_ops,
> +			.master = master,
> +		},
> +		.ssid = ssid,
> +	};
> +
> +	arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data);
> +}
> +
> +static void arm_smmu_clean_cd_entry(struct arm_smmu_cd *target)
> +{
> +	struct arm_smmu_cd used = {};
> +	int i;
> +
> +	arm_smmu_get_cd_used(target->data, used.data);
> +	for (i = 0; i != ARRAY_SIZE(target->data); i++)
> +		target->data[i] &= used.data[i];
> +}
> +
>  int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
>  			    struct arm_smmu_ctx_desc *cd)
>  {
> @@ -1247,17 +1309,20 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
>  	 */
>  	u64 val;
>  	bool cd_live;
> -	struct arm_smmu_cd *cdptr;
> +	struct arm_smmu_cd target;
> +	struct arm_smmu_cd *cdptr = &target;
> +	struct arm_smmu_cd *cd_table_entry;
>  	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
>  	struct arm_smmu_device *smmu = master->smmu;
>  
>  	if (WARN_ON(ssid >= (1 << cd_table->s1cdmax)))
>  		return -E2BIG;
>  
> -	cdptr = arm_smmu_get_cd_ptr(master, ssid);
> -	if (!cdptr)
> +	cd_table_entry = arm_smmu_get_cd_ptr(master, ssid);
> +	if (!cd_table_entry)
>  		return -ENOMEM;
>  
> +	target = *cd_table_entry;

As this changes the logic so that all CD manipulation now happens on a local
copy rather than on the actual CD, I believe a comment would be helpful here.

>  	val = le64_to_cpu(cdptr->data[0]);
>  	cd_live = !!(val & CTXDESC_CD_0_V);
>  
> @@ -1279,13 +1344,6 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
>  		cdptr->data[2] = 0;
>  		cdptr->data[3] = cpu_to_le64(cd->mair);
>  
> -		/*
> -		 * STE may be live, and the SMMU might read dwords of this CD in any
> -		 * order. Ensure that it observes valid values before reading
> -		 * V=1.
> -		 */
> -		arm_smmu_sync_cd(master, ssid, true);
> -
>  		val = cd->tcr |
>  #ifdef __BIG_ENDIAN
>  			CTXDESC_CD_0_ENDI |
> @@ -1299,18 +1357,14 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
>  		if (cd_table->stall_enabled)
>  			val |= CTXDESC_CD_0_S;
>  	}
> -
> +	cdptr->data[0] = cpu_to_le64(val);
>  	/*
> -	 * The SMMU accesses 64-bit values atomically. See IHI0070Ca 3.21.3
> -	 * "Configuration structures and configuration invalidation completion"
> -	 *
> -	 *   The size of single-copy atomic reads made by the SMMU is
> -	 *   IMPLEMENTATION DEFINED but must be at least 64 bits. Any single
> -	 *   field within an aligned 64-bit span of a structure can be altered
> -	 *   without first making the structure invalid.
> +	 * Since the above is updating the CD entry based on the current value
> +	 * without zeroing unused bits it needs fixing before being passed to
> +	 * the programming logic.
>  	 */
> -	WRITE_ONCE(cdptr->data[0], cpu_to_le64(val));
> -	arm_smmu_sync_cd(master, ssid, true);
> +	arm_smmu_clean_cd_entry(&target);

I am not sure I understand the logic here. Is that only needed for entry[0]?
As far as I can see, the other entries are set and not reused.

If so, I think it’d be better to make that clear. Also, as used_bits are always
0xff for all cases, I believe the EPD0 logic should be integrated into populating
the CD so that it is correct by construction, as this looks like a hack to me.

Thanks,
Mostafa

> +	arm_smmu_write_cd_entry(master, ssid, cd_table_entry, &target);
>  	return 0;
>  }
>  
> -- 
> 2.43.2
> 


* Re: [PATCH v7 3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
  2024-04-16 19:28 ` [PATCH v7 3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function Jason Gunthorpe
  2024-04-16 21:22   ` Nicolin Chen
@ 2024-04-19 21:10   ` Mostafa Saleh
  2024-04-22 13:52     ` Jason Gunthorpe
  1 sibling, 1 reply; 48+ messages in thread
From: Mostafa Saleh @ 2024-04-19 21:10 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

Hi Jason,

On Tue, Apr 16, 2024 at 04:28:14PM -0300, Jason Gunthorpe wrote:
> Introduce arm_smmu_make_s1_cd() to build the CD from the paging S1 domain,
> and reorganize all the places programming S1 domain CD table entries to
> call it.
> 
> Split arm_smmu_update_s1_domain_cd_entry() from
> arm_smmu_update_ctx_desc_devices() so that the S1 path has its own call
> chain separate from the unrelated SVA path.
> 
> arm_smmu_update_s1_domain_cd_entry() only works on S1 domains
> attached to RIDs and refreshes all their CDs.
> 
> Remove the forced clear of the CD during S1 domain attach,
> arm_smmu_write_cd_entry() will do this automatically if necessary.
> 
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Reviewed-by: Michael Shavit <mshavit@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 25 +++++++-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 60 +++++++++++++------
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  9 +++
>  3 files changed, 76 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index 41b44baef15e80..d159f60480935e 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -53,6 +53,29 @@ static void arm_smmu_update_ctx_desc_devices(struct arm_smmu_domain *smmu_domain
>  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>  }
>  
> +static void
> +arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain)

nit: shouldn’t that be arm_smmu_update_sva_domain_cd_entry?
> +{
> +	struct arm_smmu_master *master;
> +	struct arm_smmu_cd target_cd;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> +	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
> +		struct arm_smmu_cd *cdptr;
> +
> +		/* S1 domains only support RID attachment right now */
> +		cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID);
> +		if (WARN_ON(!cdptr))
> +			continue;
> +
> +		arm_smmu_make_s1_cd(&target_cd, master, smmu_domain);
> +		arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr,
> +					&target_cd);

Case ARM_SMMU_DOMAIN_S1 has the same code:
  arm_smmu_get_cd_ptr => arm_smmu_make_s1_cd => arm_smmu_write_cd_entry
I’d prefer that this were abstracted within the SMMUv3 driver, so that it provides a
higher-level API rather than exposing these low-level functions in the header file.
But no strong opinion.

> +	}
> +	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> +}
> +
>  /*
>   * Check if the CPU ASID is available on the SMMU side. If a private context
>   * descriptor is using it, try to replace it.
> @@ -96,7 +119,7 @@ arm_smmu_share_asid(struct mm_struct *mm, u16 asid)
>  	 * be some overlap between use of both ASIDs, until we invalidate the
>  	 * TLB.
>  	 */
> -	arm_smmu_update_ctx_desc_devices(smmu_domain, IOMMU_NO_PASID, cd);
> +	arm_smmu_update_s1_domain_cd_entry(smmu_domain);
>  
>  	/* Invalidate TLB entries previously associated with that context */
>  	arm_smmu_tlb_inv_asid(smmu, asid);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 3983de90c2fa01..d24fa13a52b4e0 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1204,8 +1204,8 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst,
>  	WRITE_ONCE(*dst, cpu_to_le64(val));
>  }
>  
> -static struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
> -					       u32 ssid)
> +struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
> +					u32 ssid)
>  {
>  	__le64 *l1ptr;
>  	unsigned int idx;
> @@ -1268,9 +1268,9 @@ static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = {
>  	.v_bit = cpu_to_le64(CTXDESC_CD_0_V),
>  };
>  
> -static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
> -				    struct arm_smmu_cd *cdptr,
> -				    const struct arm_smmu_cd *target)
> +void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
> +			     struct arm_smmu_cd *cdptr,
> +			     const struct arm_smmu_cd *target)
>  {
>  	struct arm_smmu_cd_writer cd_writer = {
>  		.writer = {
> @@ -1283,6 +1283,32 @@ static void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
>  	arm_smmu_write_entry(&cd_writer.writer, cdptr->data, target->data);
>  }
>  
> +void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
> +			 struct arm_smmu_master *master,
> +			 struct arm_smmu_domain *smmu_domain)
> +{
> +	struct arm_smmu_ctx_desc *cd = &smmu_domain->cd;
> +
> +	memset(target, 0, sizeof(*target));
> +
> +	target->data[0] = cpu_to_le64(
> +		cd->tcr |
> +#ifdef __BIG_ENDIAN
> +		CTXDESC_CD_0_ENDI |
> +#endif
> +		CTXDESC_CD_0_V |
> +		CTXDESC_CD_0_AA64 |
> +		(master->stall_enabled ? CTXDESC_CD_0_S : 0) |
> +		CTXDESC_CD_0_R |
> +		CTXDESC_CD_0_A |
> +		CTXDESC_CD_0_ASET |
> +		FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid)
> +		);
> +
> +	target->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
> +	target->data[3] = cpu_to_le64(cd->mair);
> +}
> +

IMO, the patches that handle CD = NULL and the quiet CD should be introduced first so
that this is easier to follow. As it stands, arm_smmu_write_ctx_desc() contains
duplicate code which is dead, and that makes this a little harder to review. If the
patches were reordered, arm_smmu_write_ctx_desc() could be removed in this patch and
we could see how the code moved.

Otherwise:
Reviewed-by: Mostafa Saleh <smostafa@google.com>

Thanks,
Mostafa
>  static void arm_smmu_clean_cd_entry(struct arm_smmu_cd *target)
>  {
>  	struct arm_smmu_cd used = {};
> @@ -2644,29 +2670,29 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>  
>  	switch (smmu_domain->stage) {
> -	case ARM_SMMU_DOMAIN_S1:
> +	case ARM_SMMU_DOMAIN_S1: {
> +		struct arm_smmu_cd target_cd;
> +		struct arm_smmu_cd *cdptr;
> +
>  		if (!master->cd_table.cdtab) {
>  			ret = arm_smmu_alloc_cd_tables(master);
>  			if (ret)
>  				goto out_list_del;
> -		} else {
> -			/*
> -			 * arm_smmu_write_ctx_desc() relies on the entry being
> -			 * invalid to work, clear any existing entry.
> -			 */
> -			ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
> -						      NULL);
> -			if (ret)
> -				goto out_list_del;
>  		}
>  
> -		ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd);
> -		if (ret)
> +		cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID);
> +		if (!cdptr) {
> +			ret = -ENOMEM;
>  			goto out_list_del;
> +		}
>  
> +		arm_smmu_make_s1_cd(&target_cd, master, smmu_domain);
> +		arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr,
> +					&target_cd);
>  		arm_smmu_make_cdtable_ste(&target, master);
>  		arm_smmu_install_ste_for_dev(master, &target);
>  		break;
> +	}
>  	case ARM_SMMU_DOMAIN_S2:
>  		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
>  		arm_smmu_install_ste_for_dev(master, &target);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 4b767e0eeeb682..bb08f087ba39e4 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -751,6 +751,15 @@ extern struct xarray arm_smmu_asid_xa;
>  extern struct mutex arm_smmu_asid_lock;
>  extern struct arm_smmu_ctx_desc quiet_cd;
>  
> +struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
> +					u32 ssid);
> +void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
> +			 struct arm_smmu_master *master,
> +			 struct arm_smmu_domain *smmu_domain);
> +void arm_smmu_write_cd_entry(struct arm_smmu_master *master, int ssid,
> +			     struct arm_smmu_cd *cdptr,
> +			     const struct arm_smmu_cd *target);
> +
>  int arm_smmu_write_ctx_desc(struct arm_smmu_master *smmu_master, int ssid,
>  			    struct arm_smmu_ctx_desc *cd);
>  void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid);
> -- 
> 2.43.2
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  2024-04-16 19:28 ` [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr() Jason Gunthorpe
  2024-04-16 22:19   ` Nicolin Chen
@ 2024-04-19 21:14   ` Mostafa Saleh
  2024-04-22 14:20     ` Jason Gunthorpe
  1 sibling, 1 reply; 48+ messages in thread
From: Mostafa Saleh @ 2024-04-19 21:14 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

Hi Jason,

On Tue, Apr 16, 2024 at 04:28:16PM -0300, Jason Gunthorpe wrote:
> Only the attach callers can perform an allocation for the CD table entry,
> the other callers must not do so, they do not have the correct locking and
> they cannot sleep. Split up the functions so this is clear.
> 
> arm_smmu_get_cd_ptr() will return pointer to a CD table entry without
> doing any kind of allocation.
> 
> arm_smmu_alloc_cd_ptr() will allocate the table and any required
> leaf.
> 
> A following patch will add lockdep assertions to arm_smmu_alloc_cd_ptr()
> once the restructuring is completed and arm_smmu_alloc_cd_ptr() is never
> called in the wrong context.
> 
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 61 +++++++++++++--------
>  1 file changed, 39 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index f3df1ec8d258dc..a0d1237272936f 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -98,6 +98,7 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
>  
>  static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain,
>  				    struct arm_smmu_device *smmu);
> +static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master);
>  
>  static void parse_driver_options(struct arm_smmu_device *smmu)
>  {
> @@ -1207,29 +1208,51 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst,
>  struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
>  					u32 ssid)
>  {
> -	__le64 *l1ptr;
> -	unsigned int idx;
>  	struct arm_smmu_l1_ctx_desc *l1_desc;
> -	struct arm_smmu_device *smmu = master->smmu;
>  	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
>  
> +	if (!cd_table->cdtab)
> +		return NULL;
> +
>  	if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
>  		return (struct arm_smmu_cd *)(cd_table->cdtab +
>  					      ssid * CTXDESC_CD_DWORDS);
>  
> -	idx = ssid >> CTXDESC_SPLIT;
> -	l1_desc = &cd_table->l1_desc[idx];
> -	if (!l1_desc->l2ptr) {
> -		if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
> -			return NULL;
> +	l1_desc = &cd_table->l1_desc[ssid / CTXDESC_L2_ENTRIES];

These operations used to be a shift and a bit mask, which made sense as that is
what the hardware does. Is there any reason you changed them to division and modulo?
I checked the disassembly and gcc does the right thing as the constants are powers
of 2, but I am just curious.
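
To illustrate why the two forms are interchangeable here, below is a small
standalone sketch (not driver code). TOY_SPLIT and TOY_L2_ENTRIES are
stand-ins I made up for CTXDESC_SPLIT and CTXDESC_L2_ENTRIES, and the value
10 is an assumption for illustration only. The equivalence holds for any
unsigned ssid as long as the entry count is a power of 2:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the driver constants (values assumed). */
#define TOY_SPLIT      10u
#define TOY_L2_ENTRIES (1u << TOY_SPLIT)

/* L1 index: shift form and division form. */
uint32_t l1_idx_shift(uint32_t ssid) { return ssid >> TOY_SPLIT; }
uint32_t l1_idx_div(uint32_t ssid)   { return ssid / TOY_L2_ENTRIES; }

/* L2 index: mask form and modulo form. */
uint32_t l2_idx_mask(uint32_t ssid) { return ssid & (TOY_L2_ENTRIES - 1); }
uint32_t l2_idx_mod(uint32_t ssid)  { return ssid % TOY_L2_ENTRIES; }

/* Returns 1 if both forms agree for every ssid in [0, limit), else 0. */
int forms_agree(uint32_t limit)
{
	uint32_t ssid;

	for (ssid = 0; ssid < limit; ssid++) {
		if (l1_idx_shift(ssid) != l1_idx_div(ssid))
			return 0;
		if (l2_idx_mask(ssid) != l2_idx_mod(ssid))
			return 0;
	}
	return 1;
}
```

For example, assert(forms_agree(1u << 20)) should hold. So the question is
only about readability and consistency, not about correctness.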

> +	if (!l1_desc->l2ptr)
> +		return NULL;
> +	return &l1_desc->l2ptr[ssid % CTXDESC_L2_ENTRIES];
> +}
>  
> -		l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS;
> -		arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
> -		/* An invalid L1CD can be cached */
> -		arm_smmu_sync_cd(master, ssid, false);
> +static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
> +						 u32 ssid)
> +{
> +	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
> +	struct arm_smmu_device *smmu = master->smmu;
> +
> +	if (!cd_table->cdtab) {
> +		if (arm_smmu_alloc_cd_tables(master))
> +			return NULL;
>  	}
> -	idx = ssid & (CTXDESC_L2_ENTRIES - 1);
> -	return &l1_desc->l2ptr[idx];
> +
> +	if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) {
> +		unsigned int idx = ssid >> CTXDESC_SPLIT;

OK, now it’s a shift. I think we should be consistent in how we
calculate the index.

> +		struct arm_smmu_l1_ctx_desc *l1_desc;
> +
> +		l1_desc = &cd_table->l1_desc[idx];
> +		if (!l1_desc->l2ptr) {
> +			__le64 *l1ptr;
> +
> +			if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
> +				return NULL;
> +
> +			l1ptr = cd_table->cdtab + idx * CTXDESC_L1_DESC_DWORDS;
> +			arm_smmu_write_cd_l1_desc(l1ptr, l1_desc);
> +			/* An invalid L1CD can be cached */
> +			arm_smmu_sync_cd(master, ssid, false);
> +		}
> +	}
> +	return arm_smmu_get_cd_ptr(master, ssid);
>  }
>  
>  struct arm_smmu_cd_writer {
> @@ -1357,7 +1380,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
>  	if (WARN_ON(ssid >= (1 << cd_table->s1cdmax)))
>  		return -E2BIG;
>  
> -	cd_table_entry = arm_smmu_get_cd_ptr(master, ssid);
> +	cd_table_entry = arm_smmu_alloc_cd_ptr(master, ssid);

The only path that allocates the main table is “arm_smmu_attach_dev”. I guess
it would be more robust to leave that as is and have two versions of get_cd,
one that allocates the leaf and one that does not allocate. What do you think?
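
Roughly what I have in mind, as a toy standalone sketch rather than driver
code. All toy_* names, the table sizes and the use of calloc() are made up
for illustration; the real code would also need to write the L1 descriptor
and sync it. The root table is only ever allocated by the attach path, and
the two lookups differ only in whether a missing leaf is allocated:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_SPLIT      2u
#define TOY_L2_ENTRIES (1u << TOY_SPLIT)
#define TOY_L1_ENTRIES 4u
#define TOY_MAX_SSID   (TOY_L1_ENTRIES * TOY_L2_ENTRIES)

struct toy_cd { unsigned long data[8]; };

struct toy_table {
	struct toy_cd **l1;	/* root table, NULL until attach allocates it */
};

/* Attach path only: allocate the root table if it does not exist yet. */
int toy_alloc_root(struct toy_table *t)
{
	if (t->l1)
		return 0;
	t->l1 = calloc(TOY_L1_ENTRIES, sizeof(*t->l1));
	return t->l1 ? 0 : -1;
}

/* Never allocates: returns NULL if the root or the leaf is missing. */
struct toy_cd *toy_get_cd(struct toy_table *t, unsigned int ssid)
{
	struct toy_cd *leaf;

	if (!t->l1 || ssid >= TOY_MAX_SSID)
		return NULL;
	leaf = t->l1[ssid >> TOY_SPLIT];
	if (!leaf)
		return NULL;
	return &leaf[ssid & (TOY_L2_ENTRIES - 1)];
}

/* Allocates a missing leaf, but never the root table. */
struct toy_cd *toy_get_cd_alloc_leaf(struct toy_table *t, unsigned int ssid)
{
	unsigned int idx = ssid >> TOY_SPLIT;

	if (!t->l1 || ssid >= TOY_MAX_SSID)
		return NULL;
	if (!t->l1[idx]) {
		t->l1[idx] = calloc(TOY_L2_ENTRIES, sizeof(struct toy_cd));
		if (!t->l1[idx])
			return NULL;
	}
	return toy_get_cd(t, ssid);
}
```

With this split, a caller that forgets to go through the attach path gets
NULL from both lookups instead of silently allocating the main table.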

Thanks,
Mostafa



>  	if (!cd_table_entry)
>  		return -ENOMEM;
>  
> @@ -2687,13 +2710,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  		struct arm_smmu_cd target_cd;
>  		struct arm_smmu_cd *cdptr;
>  
> -		if (!master->cd_table.cdtab) {
> -			ret = arm_smmu_alloc_cd_tables(master);
> -			if (ret)
> -				goto out_list_del;
> -		}
> -
> -		cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID);
> +		cdptr = arm_smmu_alloc_cd_ptr(master, IOMMU_NO_PASID);
>  		if (!cdptr) {
>  			ret = -ENOMEM;
>  			goto out_list_del;
> -- 
> 2.43.2
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  2024-04-16 19:28 ` [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry Jason Gunthorpe
  2024-04-17  8:09   ` Nicolin Chen
@ 2024-04-19 21:24   ` Mostafa Saleh
  2024-04-22 14:24     ` Jason Gunthorpe
  1 sibling, 1 reply; 48+ messages in thread
From: Mostafa Saleh @ 2024-04-19 21:24 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

Hi Jason,

I am still reviewing the patch; however, two quick notes.

On Tue, Apr 16, 2024 at 04:28:20PM -0300, Jason Gunthorpe wrote:
> Add tests for some of the more common STE update operations that we expect
> to see, as well as some artificial STE updates to test the edges of
> arm_smmu_write_entry. These also serve as a record of which common
> operation is expected to be hitless, and how many syncs they require.
> 
> arm_smmu_write_entry implements a generic algorithm that updates an STE/CD
> to any other arbitrary STE/CD configuration. The update requires a
> sequence of write+sync operations with some invariants that must be held
> true after each sync. arm_smmu_write_entry lends itself well to
> unit-testing since the function's interaction with the STE/CD is already
> abstracted by input callbacks that we can hook to introspect into the
> sequence of operations. We can use these hooks to guarantee that
> invariants are held throughout the entire update operation.
> 
> Link: https://lore.kernel.org/r/20240106083617.1173871-3-mshavit@google.com
> Signed-off-by: Michael Shavit <mshavit@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/Kconfig                         |  12 +-
>  drivers/iommu/arm/arm-smmu-v3/Makefile        |   2 +
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   6 +-
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  | 467 ++++++++++++++++++
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   |  36 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  30 ++
>  6 files changed, 525 insertions(+), 28 deletions(-)
>  create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 0af39bbbe3a30e..2e597102baf6e5 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -397,9 +397,9 @@ config ARM_SMMU_V3
>  	  Say Y here if your system includes an IOMMU device implementing
>  	  the ARM SMMUv3 architecture.
>  
> +if ARM_SMMU_V3
>  config ARM_SMMU_V3_SVA
>  	bool "Shared Virtual Addressing support for the ARM SMMUv3"
> -	depends on ARM_SMMU_V3
>  	select IOMMU_SVA
>  	select IOMMU_IOPF
>  	select MMU_NOTIFIER
> @@ -410,6 +410,16 @@ config ARM_SMMU_V3_SVA
>  	  Say Y here if your system supports SVA extensions such as PCIe PASID
>  	  and PRI.
>  
> +config ARM_SMMU_V3_KUNIT_TEST
> +	tristate "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
> +	depends on KUNIT
> +	default KUNIT_ALL_TESTS
> +	help
> +	  Enable this option to unit-test arm-smmu-v3 driver functions.
> +
> +	  If unsure, say N.
> +endif
> +
>  config S390_IOMMU
>  	def_bool y if S390 && PCI
>  	depends on S390 && PCI
> diff --git a/drivers/iommu/arm/arm-smmu-v3/Makefile b/drivers/iommu/arm/arm-smmu-v3/Makefile
> index 54feb1ecccad89..014a997753a8a2 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/Makefile
> +++ b/drivers/iommu/arm/arm-smmu-v3/Makefile
> @@ -3,3 +3,5 @@ obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
>  arm_smmu_v3-objs-y += arm-smmu-v3.o
>  arm_smmu_v3-objs-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
>  arm_smmu_v3-objs := $(arm_smmu_v3-objs-y)
> +
> +obj-$(CONFIG_ARM_SMMU_V3_KUNIT_TEST) += arm-smmu-v3-test.o
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> index 80a7d559ef2d3f..f56a2d38012b5c 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> @@ -120,9 +120,9 @@ static u64 page_size_to_cd(void)
>  	return ARM_LPAE_TCR_TG0_4K;
>  }
>  
> -static void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
> -				 struct arm_smmu_master *master,
> -				 struct mm_struct *mm, u16 asid)
> +void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
> +			  struct arm_smmu_master *master, struct mm_struct *mm,
> +			  u16 asid)
>  {
>  	u64 par;
>  
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
> new file mode 100644
> index 00000000000000..14c8e40712a70e
> --- /dev/null
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
> @@ -0,0 +1,467 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright 2024 Google LLC.
> + */
> +#include <kunit/test.h>
> +#include <linux/io-pgtable.h>
> +
> +#include "arm-smmu-v3.h"
> +
> +struct arm_smmu_test_writer {
> +	struct arm_smmu_entry_writer writer;
> +	struct kunit *test;
> +	const __le64 *init_entry;
> +	const __le64 *target_entry;
> +	__le64 *entry;
> +
> +	bool invalid_entry_written;
> +	unsigned int num_syncs;
> +};
> +
> +#define NUM_ENTRY_QWORDS 8
> +#define NUM_EXPECTED_SYNCS(x) x
> +
> +static struct arm_smmu_ste bypass_ste;
> +static struct arm_smmu_ste abort_ste;
> +static struct arm_smmu_device smmu = {
> +	.features = ARM_SMMU_FEAT_STALLS | ARM_SMMU_FEAT_ATTR_TYPES_OVR
> +};
> +
> +static bool arm_smmu_entry_differs_in_used_bits(const __le64 *entry,
> +						const __le64 *used_bits,
> +						const __le64 *target,
> +						unsigned int length)
> +{
> +	bool differs = false;
> +	unsigned int i;
> +
> +	for (i = 0; i < length; i++) {
> +		if ((entry[i] & used_bits[i]) != target[i])
> +			differs = true;
> +	}
> +	return differs;
> +}
> +
> +static void
> +arm_smmu_test_writer_record_syncs(struct arm_smmu_entry_writer *writer)
> +{
> +	struct arm_smmu_test_writer *test_writer =
> +		container_of(writer, struct arm_smmu_test_writer, writer);
> +	__le64 *entry_used_bits;
> +
> +	entry_used_bits = kunit_kzalloc(
> +		test_writer->test, sizeof(*entry_used_bits) * NUM_ENTRY_QWORDS,
> +		GFP_KERNEL);
> +	KUNIT_ASSERT_NOT_NULL(test_writer->test, entry_used_bits);
> +
> +	pr_debug("STE value is now set to: ");
> +	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8,
> +			     test_writer->entry,
> +			     NUM_ENTRY_QWORDS * sizeof(*test_writer->entry),
> +			     false);
> +
> +	test_writer->num_syncs += 1;
> +	if (!(test_writer->entry[0] & writer->ops->v_bit)) {
> +		test_writer->invalid_entry_written = true;
> +	} else {
> +		/*
> +		 * At any stage in a hitless transition, the entry must be
> +		 * equivalent to either the initial entry or the target entry
> +		 * when only considering the bits used by the current
> +		 * configuration.
> +		 */
> +		writer->ops->get_used(test_writer->entry, entry_used_bits);
> +		KUNIT_EXPECT_FALSE(
> +			test_writer->test,
> +			arm_smmu_entry_differs_in_used_bits(
> +				test_writer->entry, entry_used_bits,
> +				test_writer->init_entry, NUM_ENTRY_QWORDS) &&
> +				arm_smmu_entry_differs_in_used_bits(
> +					test_writer->entry, entry_used_bits,
> +					test_writer->target_entry,
> +					NUM_ENTRY_QWORDS));
> +	}
> +}
> +
> +static void
> +arm_smmu_v3_test_debug_print_used_bits(struct arm_smmu_entry_writer *writer,
> +				       const __le64 *ste)
> +{
> +	__le64 used_bits[NUM_ENTRY_QWORDS] = {};
> +
> +	arm_smmu_get_ste_used(ste, used_bits);
> +	pr_debug("STE used bits: ");
> +	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8, used_bits,
> +			     sizeof(used_bits), false);
> +}
> +
> +static const struct arm_smmu_entry_writer_ops test_ste_ops = {
> +	.v_bit = cpu_to_le64(STRTAB_STE_0_V),
> +	.sync = arm_smmu_test_writer_record_syncs,
> +	.get_used = arm_smmu_get_ste_used,
> +};
> +
> +static const struct arm_smmu_entry_writer_ops test_cd_ops = {
> +	.v_bit = cpu_to_le64(CTXDESC_CD_0_V),
> +	.sync = arm_smmu_test_writer_record_syncs,
> +	.get_used = arm_smmu_get_cd_used,
> +};
> +
> +static void arm_smmu_v3_test_ste_expect_transition(
> +	struct kunit *test, const struct arm_smmu_ste *cur,
> +	const struct arm_smmu_ste *target, unsigned int num_syncs_expected,
> +	bool hitless)
> +{
> +	struct arm_smmu_ste cur_copy = *cur;
> +	struct arm_smmu_test_writer test_writer = {
> +		.writer = {
> +			.ops = &test_ste_ops,
> +		},
> +		.test = test,
> +		.init_entry = cur->data,
> +		.target_entry = target->data,
> +		.entry = cur_copy.data,
> +		.num_syncs = 0,
> +		.invalid_entry_written = false,
> +
> +	};
> +
> +	pr_debug("STE initial value: ");
> +	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8, cur_copy.data,
> +			     sizeof(cur_copy), false);
> +	arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, cur->data);
> +	pr_debug("STE target value: ");
> +	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8, target->data,
> +			     sizeof(cur_copy), false);
> +	arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer,
> +					       target->data);
> +
> +	arm_smmu_write_entry(&test_writer.writer, cur_copy.data, target->data);
> +
> +	KUNIT_EXPECT_EQ(test, test_writer.invalid_entry_written, !hitless);
> +	KUNIT_EXPECT_EQ(test, test_writer.num_syncs, num_syncs_expected);
> +	KUNIT_EXPECT_MEMEQ(test, target->data, cur_copy.data, sizeof(cur_copy));
> +}
> +
> +static void arm_smmu_v3_test_ste_expect_hitless_transition(
> +	struct kunit *test, const struct arm_smmu_ste *cur,
> +	const struct arm_smmu_ste *target, unsigned int num_syncs_expected)
> +{
> +	arm_smmu_v3_test_ste_expect_transition(test, cur, target,
> +					       num_syncs_expected, true);
> +}
> +
> +static const dma_addr_t fake_cdtab_dma_addr = 0xF0F0F0F0F0F0;
> +
> +static void arm_smmu_test_make_cdtable_ste(struct arm_smmu_ste *ste,
> +					   const dma_addr_t dma_addr)
> +{
> +	struct arm_smmu_master master = {
> +		.cd_table.cdtab_dma = dma_addr,
> +		.cd_table.s1cdmax = 0xFF,
> +		.cd_table.s1fmt = STRTAB_STE_0_S1FMT_64K_L2,
> +		.smmu = &smmu,
> +	};
> +
> +	arm_smmu_make_cdtable_ste(ste, &master);
> +}
> +
> +static void arm_smmu_v3_write_ste_test_bypass_to_abort(struct kunit *test)
> +{
> +	/*
> +	 * Bypass STEs has used bits in the first two Qwords, while abort STEs
> +	 * only have used bits in the first QWord. Transitioning from bypass to
> +	 * abort requires two syncs: the first to set the first qword and make
> +	 * the STE into an abort, the second to clean up the second qword.
> +	 */
> +	arm_smmu_v3_test_ste_expect_hitless_transition(
> +		test, &bypass_ste, &abort_ste, NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static void arm_smmu_v3_write_ste_test_abort_to_bypass(struct kunit *test)
> +{
> +	/*
> +	 * Transitioning from abort to bypass also requires two syncs: the first
> +	 * to set the second qword data required by the bypass STE, and the
> +	 * second to set the first qword and switch to bypass.
> +	 */
> +	arm_smmu_v3_test_ste_expect_hitless_transition(
> +		test, &abort_ste, &bypass_ste, NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static void arm_smmu_v3_write_ste_test_cdtable_to_abort(struct kunit *test)
> +{
> +	struct arm_smmu_ste ste;
> +
> +	arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr);
> +	arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &abort_ste,
> +						       NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static void arm_smmu_v3_write_ste_test_abort_to_cdtable(struct kunit *test)
> +{
> +	struct arm_smmu_ste ste;
> +
> +	arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr);
> +	arm_smmu_v3_test_ste_expect_hitless_transition(test, &abort_ste, &ste,
> +						       NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static void arm_smmu_v3_write_ste_test_cdtable_to_bypass(struct kunit *test)
> +{
> +	struct arm_smmu_ste ste;
> +
> +	arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr);
> +	arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &bypass_ste,
> +						       NUM_EXPECTED_SYNCS(3));
> +}
> +
> +static void arm_smmu_v3_write_ste_test_bypass_to_cdtable(struct kunit *test)
> +{
> +	struct arm_smmu_ste ste;
> +
> +	arm_smmu_test_make_cdtable_ste(&ste, fake_cdtab_dma_addr);
> +	arm_smmu_v3_test_ste_expect_hitless_transition(test, &bypass_ste, &ste,
> +						       NUM_EXPECTED_SYNCS(3));
> +}
> +
> +static void arm_smmu_test_make_s2_ste(struct arm_smmu_ste *ste,
> +				      bool ats_enabled)
> +{
> +	struct arm_smmu_master master = {
> +		.smmu = &smmu,
> +		.ats_enabled = ats_enabled,
> +	};
> +	struct io_pgtable io_pgtable = {};
> +	struct arm_smmu_domain smmu_domain = {
> +		.pgtbl_ops = &io_pgtable.ops,
> +	};
> +
> +	io_pgtable.cfg.arm_lpae_s2_cfg.vttbr = 0xdaedbeefdeadbeefULL;
> +	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.ps = 1;
> +	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tg = 2;
> +	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sh = 3;
> +	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.orgn = 1;
> +	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.irgn = 2;
> +	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.sl = 3;
> +	io_pgtable.cfg.arm_lpae_s2_cfg.vtcr.tsz = 4;
> +
> +	arm_smmu_make_s2_domain_ste(ste, &master, &smmu_domain);
> +}
> +
> +static void arm_smmu_v3_write_ste_test_s2_to_abort(struct kunit *test)
> +{
> +	struct arm_smmu_ste ste;
> +
> +	arm_smmu_test_make_s2_ste(&ste, true);
> +	arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &abort_ste,
> +						       NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static void arm_smmu_v3_write_ste_test_abort_to_s2(struct kunit *test)
> +{
> +	struct arm_smmu_ste ste;
> +
> +	arm_smmu_test_make_s2_ste(&ste, true);
> +	arm_smmu_v3_test_ste_expect_hitless_transition(test, &abort_ste, &ste,
> +						       NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static void arm_smmu_v3_write_ste_test_s2_to_bypass(struct kunit *test)
> +{
> +	struct arm_smmu_ste ste;
> +
> +	arm_smmu_test_make_s2_ste(&ste, true);
> +	arm_smmu_v3_test_ste_expect_hitless_transition(test, &ste, &bypass_ste,
> +						       NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static void arm_smmu_v3_write_ste_test_bypass_to_s2(struct kunit *test)
> +{
> +	struct arm_smmu_ste ste;
> +
> +	arm_smmu_test_make_s2_ste(&ste, true);
> +	arm_smmu_v3_test_ste_expect_hitless_transition(test, &bypass_ste, &ste,
> +						       NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static void arm_smmu_v3_test_cd_expect_transition(
> +	struct kunit *test, const struct arm_smmu_cd *cur,
> +	const struct arm_smmu_cd *target, unsigned int num_syncs_expected,
> +	bool hitless)
> +{
> +	struct arm_smmu_cd cur_copy = *cur;
> +	struct arm_smmu_test_writer test_writer = {
> +		.writer = {
> +			.ops = &test_cd_ops,
> +		},
> +		.test = test,
> +		.init_entry = cur->data,
> +		.target_entry = target->data,
> +		.entry = cur_copy.data,
> +		.num_syncs = 0,
> +		.invalid_entry_written = false,
> +
> +	};
> +
> +	pr_debug("CD initial value: ");
> +	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8, cur_copy.data,
> +			     sizeof(cur_copy), false);
> +	arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer, cur->data);
> +	pr_debug("CD target value: ");
> +	print_hex_dump_debug("    ", DUMP_PREFIX_NONE, 16, 8, target->data,
> +			     sizeof(cur_copy), false);
> +	arm_smmu_v3_test_debug_print_used_bits(&test_writer.writer,
> +					       target->data);
> +
> +	arm_smmu_write_entry(&test_writer.writer, cur_copy.data, target->data);
> +
> +	KUNIT_EXPECT_EQ(test, test_writer.invalid_entry_written, !hitless);
> +	KUNIT_EXPECT_EQ(test, test_writer.num_syncs, num_syncs_expected);
> +	KUNIT_EXPECT_MEMEQ(test, target->data, cur_copy.data, sizeof(cur_copy));
> +}
> +
> +static void arm_smmu_v3_test_cd_expect_non_hitless_transition(
> +	struct kunit *test, const struct arm_smmu_cd *cur,
> +	const struct arm_smmu_cd *target, unsigned int num_syncs_expected)
> +{
> +	arm_smmu_v3_test_cd_expect_transition(test, cur, target,
> +					      num_syncs_expected, false);
> +}
> +
> +static void arm_smmu_v3_test_cd_expect_hitless_transition(
> +	struct kunit *test, const struct arm_smmu_cd *cur,
> +	const struct arm_smmu_cd *target, unsigned int num_syncs_expected)
> +{
> +	arm_smmu_v3_test_cd_expect_transition(test, cur, target,
> +					      num_syncs_expected, true);
> +}
> +
> +static void arm_smmu_test_make_s1_cd(struct arm_smmu_cd *cd, unsigned int asid)
> +{
> +	struct arm_smmu_master master = {
> +		.smmu = &smmu,
> +	};
> +	struct io_pgtable io_pgtable = {};
> +	struct arm_smmu_domain smmu_domain = {
> +		.pgtbl_ops = &io_pgtable.ops,
> +		.cd = {
> +			.asid = asid,
> +		},
> +	};
> +
> +	io_pgtable.cfg.arm_lpae_s1_cfg.ttbr = 0xdaedbeefdeadbeefULL;
> +	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.ips = 1;
> +	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.tg = 2;
> +	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.sh = 3;
> +	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.orgn = 1;
> +	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.irgn = 2;
> +	io_pgtable.cfg.arm_lpae_s1_cfg.tcr.tsz = 4;
> +	io_pgtable.cfg.arm_lpae_s1_cfg.mair = 0xabcdef012345678ULL;
> +
> +	arm_smmu_make_s1_cd(cd, &master, &smmu_domain);
> +}
> +
> +static void arm_smmu_v3_write_cd_test_s1_clear(struct kunit *test)
> +{
> +	struct arm_smmu_cd cd = {};
> +	struct arm_smmu_cd cd_2;
> +
> +	arm_smmu_test_make_s1_cd(&cd_2, 1997);
> +	arm_smmu_v3_test_cd_expect_non_hitless_transition(
> +		test, &cd, &cd_2, NUM_EXPECTED_SYNCS(2));
> +	arm_smmu_v3_test_cd_expect_non_hitless_transition(
> +		test, &cd_2, &cd, NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static void arm_smmu_v3_write_cd_test_s1_change_asid(struct kunit *test)
> +{
> +	struct arm_smmu_cd cd = {};
> +	struct arm_smmu_cd cd_2;
> +
> +	arm_smmu_test_make_s1_cd(&cd, 778);
> +	arm_smmu_test_make_s1_cd(&cd_2, 1997);
> +	arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd, &cd_2,
> +						      NUM_EXPECTED_SYNCS(1));
> +	arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd_2, &cd,
> +						      NUM_EXPECTED_SYNCS(1));
> +}
> +
> +static void arm_smmu_test_make_sva_cd(struct arm_smmu_cd *cd, unsigned int asid)
> +{
> +	struct arm_smmu_master master = {
> +		.smmu = &smmu,
> +	};
> +	struct mm_struct mm = {
> +		.pgd = (void *)0xdaedbeefdeadbeefULL,
> +	};
> +
> +	arm_smmu_make_sva_cd(cd, &master, &mm, asid);
> +}
> +
> +static void arm_smmu_test_make_sva_release_cd(struct arm_smmu_cd *cd,
> +					      unsigned int asid)
> +{
> +	struct arm_smmu_master master = {
> +		.smmu = &smmu,
> +	};
> +
> +	arm_smmu_make_sva_cd(cd, &master, NULL, asid);
> +}
> +

The test doesn’t build with SVA disabled; it fails with:
aarch64-linux-gnu-ld: Unexpected GOT/PLT entries detected!
aarch64-linux-gnu-ld: Unexpected run-time procedure linkages detected!
aarch64-linux-gnu-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o: in function `arm_smmu_test_make_sva_release_cd':
.../linux/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:409:(.text+0x17c): undefined reference to `arm_smmu_make_sva_cd'
aarch64-linux-gnu-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o: in function `arm_smmu_test_make_sva_cd':
.../linux/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:399:(.text+0x230): undefined reference to `arm_smmu_make_sva_cd'

I believe this check should be guarded under SVA.
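
As a rough standalone sketch of the kind of guard I mean (not the actual
test file): TOY_CONFIG_SVA is a made-up macro standing in for the SVA
Kconfig symbol, and all toy_* names are hypothetical. Both the helper that
references the SVA-only function and its entry in the case table sit under
the same guard, so nothing refers to the missing symbol when SVA is off:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the Kconfig symbol; define it to build SVA cases. */
/* #define TOY_CONFIG_SVA 1 */

typedef void (*toy_case_fn)(void);

/* Always built: does not depend on any SVA-only symbol. */
void toy_s1_case(void) { }

#ifdef TOY_CONFIG_SVA
/* Only provided by the SVA object file in this toy model. */
void toy_make_sva_cd(void);

void toy_sva_case(void)
{
	toy_make_sva_cd();
}
#endif

/* NULL-terminated case table; the SVA entry is guarded as well. */
toy_case_fn toy_cases[] = {
	toy_s1_case,
#ifdef TOY_CONFIG_SVA
	toy_sva_case,
#endif
	NULL,
};

/* Counts the entries before the NULL terminator. */
size_t toy_count_cases(void)
{
	size_t n = 0;

	while (toy_cases[n])
		n++;
	return n;
}
```

With the macro left undefined this links on its own and the table only
holds the S1 case. Whether a preprocessor guard or a stub in the header is
nicer for the real test is up to you.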

> +static void arm_smmu_v3_write_cd_test_sva_clear(struct kunit *test)
> +{
> +	struct arm_smmu_cd cd = {};
> +	struct arm_smmu_cd cd_2;
> +
> +	arm_smmu_test_make_sva_cd(&cd_2, 1997);
> +	arm_smmu_v3_test_cd_expect_non_hitless_transition(
> +		test, &cd, &cd_2, NUM_EXPECTED_SYNCS(2));
> +	arm_smmu_v3_test_cd_expect_non_hitless_transition(
> +		test, &cd_2, &cd, NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static void arm_smmu_v3_write_cd_test_sva_release(struct kunit *test)
> +{
> +	struct arm_smmu_cd cd;
> +	struct arm_smmu_cd cd_2;
> +
> +	arm_smmu_test_make_sva_cd(&cd, 1997);
> +	arm_smmu_test_make_sva_release_cd(&cd_2, 1997);
> +	arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd, &cd_2,
> +						      NUM_EXPECTED_SYNCS(2));
> +	arm_smmu_v3_test_cd_expect_hitless_transition(test, &cd_2, &cd,
> +						      NUM_EXPECTED_SYNCS(2));
> +}
> +
> +static struct kunit_case arm_smmu_v3_test_cases[] = {
> +	KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_abort),
> +	KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_bypass),
> +	KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_abort),
> +	KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_cdtable),
> +	KUNIT_CASE(arm_smmu_v3_write_ste_test_cdtable_to_bypass),
> +	KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_cdtable),
> +	KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_abort),
> +	KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_s2),
> +	KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_bypass),
> +	KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_s2),
> +	KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_clear),
> +	KUNIT_CASE(arm_smmu_v3_write_cd_test_s1_change_asid),
> +	KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_clear),
> +	KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_release),
> +	{},
> +};
> +
> +static int arm_smmu_v3_test_suite_init(struct kunit_suite *test)
> +{
> +	arm_smmu_make_bypass_ste(&smmu, &bypass_ste);
> +	arm_smmu_make_abort_ste(&abort_ste);
> +	return 0;
> +}
> +
> +static struct kunit_suite arm_smmu_v3_test_module = {
> +	.name = "arm-smmu-v3-kunit-test",
> +	.suite_init = arm_smmu_v3_test_suite_init,
> +	.test_cases = arm_smmu_v3_test_cases,
> +};
> +kunit_test_suites(&arm_smmu_v3_test_module);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 72402f6a7ed4e0..3ffaa3b34b44bf 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -42,18 +42,6 @@ enum arm_smmu_msi_index {
>  	ARM_SMMU_MAX_MSIS,
>  };
>  
> -struct arm_smmu_entry_writer_ops;
> -struct arm_smmu_entry_writer {
> -	const struct arm_smmu_entry_writer_ops *ops;
> -	struct arm_smmu_master *master;
> -};
> -
> -struct arm_smmu_entry_writer_ops {
> -	__le64 v_bit;
> -	void (*get_used)(const __le64 *entry, __le64 *used);
> -	void (*sync)(struct arm_smmu_entry_writer *writer);
> -};
> -
>  #define NUM_ENTRY_QWORDS 8
>  static_assert(sizeof(struct arm_smmu_ste) == NUM_ENTRY_QWORDS * sizeof(u64));
>  static_assert(sizeof(struct arm_smmu_cd) == NUM_ENTRY_QWORDS * sizeof(u64));
> @@ -980,7 +968,7 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
>   * would be nice if this was complete according to the spec, but minimally it
>   * has to capture the bits this driver uses.
>   */
> -static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> +void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)

IMO we should not export all these low-level functions unconditionally.
KUNIT already defines “VISIBLE_IF_KUNIT”, which makes symbols static
if CONFIG_KUNIT is not enabled. Or maybe even guard it for this test,
like what btrfs does with “EXPORT_FOR_TESTS”.

Thanks,
Mostafa

>  {
>  	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
>  
> @@ -1102,8 +1090,8 @@ static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry,
>   * V=0 process. This relies on the IGNORED behavior described in the
>   * specification.
>   */
> -static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
> -				 __le64 *entry, const __le64 *target)
> +void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *entry,
> +			  const __le64 *target)
>  {
>  	__le64 unused_update[NUM_ENTRY_QWORDS];
>  	u8 used_qword_diff;
> @@ -1257,7 +1245,7 @@ struct arm_smmu_cd_writer {
>  	unsigned int ssid;
>  };
>  
> -static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> +void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
>  {
>  	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
>  	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> @@ -1514,7 +1502,7 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
>  	}
>  }
>  
> -static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
> +void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
>  {
>  	memset(target, 0, sizeof(*target));
>  	target->data[0] = cpu_to_le64(
> @@ -1522,8 +1510,8 @@ static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
>  		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT));
>  }
>  
> -static void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
> -				     struct arm_smmu_ste *target)
> +void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
> +			      struct arm_smmu_ste *target)
>  {
>  	memset(target, 0, sizeof(*target));
>  	target->data[0] = cpu_to_le64(
> @@ -1535,8 +1523,8 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
>  							 STRTAB_STE_1_SHCFG_INCOMING));
>  }
>  
> -static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
> -				      struct arm_smmu_master *master)
> +void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
> +			       struct arm_smmu_master *master)
>  {
>  	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
>  	struct arm_smmu_device *smmu = master->smmu;
> @@ -1585,9 +1573,9 @@ static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
>  	}
>  }
>  
> -static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> -					struct arm_smmu_master *master,
> -					struct arm_smmu_domain *smmu_domain)
> +void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> +				 struct arm_smmu_master *master,
> +				 struct arm_smmu_domain *smmu_domain)
>  {
>  	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
>  	const struct io_pgtable_cfg *pgtbl_cfg =
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index 8f791f67f9f7f4..0455498d24c730 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -737,6 +737,36 @@ struct arm_smmu_domain {
>  	struct list_head		mmu_notifiers;
>  };
>  
> +/* The following are exposed for testing purposes. */
> +struct arm_smmu_entry_writer_ops;
> +struct arm_smmu_entry_writer {
> +	const struct arm_smmu_entry_writer_ops *ops;
> +	struct arm_smmu_master *master;
> +};
> +
> +struct arm_smmu_entry_writer_ops {
> +	__le64 v_bit;
> +	void (*get_used)(const __le64 *entry, __le64 *used);
> +	void (*sync)(struct arm_smmu_entry_writer *writer);
> +};
> +
> +void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits);
> +void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits);
> +void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur,
> +			  const __le64 *target);
> +
> +void arm_smmu_make_abort_ste(struct arm_smmu_ste *target);
> +void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
> +			      struct arm_smmu_ste *target);
> +void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
> +			       struct arm_smmu_master *master);
> +void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> +				 struct arm_smmu_master *master,
> +				 struct arm_smmu_domain *smmu_domain);
> +void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
> +			  struct arm_smmu_master *master, struct mm_struct *mm,
> +			  u16 asid);
> +
>  static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom)
>  {
>  	return container_of(dom, struct arm_smmu_domain, domain);
> -- 
> 2.43.2
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code
  2024-04-19 21:02   ` Mostafa Saleh
@ 2024-04-22 13:09     ` Jason Gunthorpe
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-22 13:09 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Fri, Apr 19, 2024 at 09:02:32PM +0000, Mostafa Saleh wrote:
> >  	} else if (used_qword_diff) {
> >  		/*
> >  		 * At least two qwords need their inuse bits to be changed. This
> >  		 * requires a breaking update, zero the V bit, write all qwords
> >  		 * but 0, then set qword 0
> >  		 */
> > -		unused_update.data[0] = entry->data[0] &
> > -					cpu_to_le64(~STRTAB_STE_0_V);
> > -		entry_set(smmu, sid, entry, &unused_update, 0, 1);
> > -		entry_set(smmu, sid, entry, target, 1, num_entry_qwords - 1);
> > -		entry_set(smmu, sid, entry, target, 0, 1);
> > +		unused_update[0] = entry[0] & (~writer->ops->v_bit);
> 
> arm_smmu_write_entry() assumes that v_bit is in entry[0] and that “1” means valid
> (which is true for both STE and CD) so why do we care about it, if we break the
> STE/CD anyway, why not just do:
> 
> 	unused_update[0] = 0;
> 	entry_set(writer, entry, unused_update, 0, 1);
> 	entry_set(writer, entry, target, 1, NUM_ENTRY_QWORDS - 1)
> 	entry_set(writer, entry, target, 0, 1);
> 
> That makes the code simpler by avoiding having the v_bit in
> arm_smmu_entry_writer_ops.

Sure, done
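
For anyone following the thread, here is a minimal standalone sketch of
the simplified breaking update suggested above. It is not the driver
code: fake_sync(), fake_entry_set() and fake_breaking_update() are
hypothetical stand-ins, and the entry is a plain uint64_t array rather
than a live STE/CD. It only illustrates the ordering: write qword 0 as
zero (so V=0), write qwords 1..7, then publish qword 0.

```c
#include <stdint.h>

#define NUM_ENTRY_QWORDS 8

/* Hypothetical stand-in for the writer's sync op: counts invocations. */
static unsigned int num_syncs;

static void fake_sync(void)
{
	num_syncs++;
}

/*
 * Copy qwords [start, start + len) from target into entry and sync only
 * if something actually changed (same shape as entry_set() in the patch).
 */
static int fake_entry_set(uint64_t *entry, const uint64_t *target,
			  unsigned int start, unsigned int len)
{
	int changed = 0;
	unsigned int i;

	for (i = start; i < start + len; i++) {
		if (entry[i] != target[i]) {
			entry[i] = target[i];
			changed = 1;
		}
	}
	if (changed)
		fake_sync();
	return changed;
}

/* Breaking update: invalidate, rewrite the body, then publish qword 0. */
static void fake_breaking_update(uint64_t *entry, const uint64_t *target)
{
	uint64_t unused_update[NUM_ENTRY_QWORDS] = { 0 };

	fake_entry_set(entry, unused_update, 0, 1);
	fake_entry_set(entry, target, 1, NUM_ENTRY_QWORDS - 1);
	fake_entry_set(entry, target, 0, 1);
}
```

Because qword 0 is written as all zeroes, the sketch needs no per-type
v_bit, which is the simplification being proposed.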

Jason

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  2024-04-19 21:07   ` Mostafa Saleh
@ 2024-04-22 13:29     ` Jason Gunthorpe
  2024-04-27 22:08       ` Mostafa Saleh
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-22 13:29 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Fri, Apr 19, 2024 at 09:07:19PM +0000, Mostafa Saleh wrote:
> > -	cdptr = arm_smmu_get_cd_ptr(master, ssid);
> > -	if (!cdptr)
> > +	cd_table_entry = arm_smmu_get_cd_ptr(master, ssid);
> > +	if (!cd_table_entry)
> >  		return -ENOMEM;
> >  
> > +	target = *cd_table_entry;
> 
> As this changes the logic where all CD manipulation is not on the actual
> CD, I believe a comment would be helpful here.

This is all deleted in a few patches, doesn't seem worth it to
me. These steps exist only for bisection.

> > @@ -1299,18 +1357,14 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
> >  		if (cd_table->stall_enabled)
> >  			val |= CTXDESC_CD_0_S;
> >  	}
> > -
> > +	cdptr->data[0] = cpu_to_le64(val);
> >  	/*
> > -	 * The SMMU accesses 64-bit values atomically. See IHI0070Ca 3.21.3
> > -	 * "Configuration structures and configuration invalidation completion"
> > -	 *
> > -	 *   The size of single-copy atomic reads made by the SMMU is
> > -	 *   IMPLEMENTATION DEFINED but must be at least 64 bits. Any single
> > -	 *   field within an aligned 64-bit span of a structure can be altered
> > -	 *   without first making the structure invalid.
> > +	 * Since the above is updating the CD entry based on the current value
> > +	 * without zeroing unused bits it needs fixing before being passed to
> > +	 * the programming logic.
> >  	 */
> > -	WRITE_ONCE(cdptr->data[0], cpu_to_le64(val));
> > -	arm_smmu_sync_cd(master, ssid, true);
> > +	arm_smmu_clean_cd_entry(&target);
> 
> I am not sure I understand the logic here, is that only needed for entry[0]
> As I see the other entries are set and not reused.

I'm not sure what you are asking?

The issue is the old logic constructs the new CD by manipulating the
existing CD in various ways "in place" that ends up creating CDs that
don't meet the requirements for the new programmer. For instance EPD0
will be set and the TTB0 will also be left programmed.
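
As a rough illustration of what "cleaning" means here, the following
standalone sketch masks a CD with its own used bits. It is an assumption
-laden toy, not the patch's arm_smmu_clean_cd_entry(): the DEMO_* bit
positions and the demo_* helpers are made up for the example, and the
used-bits rule is reduced to "TTB0 is unused when EPD0 is set".

```c
#include <stdint.h>

#define NUM_ENTRY_QWORDS 8

/* Illustrative bit positions only; the real ones live in arm-smmu-v3.h. */
#define DEMO_CD_0_V	(1ULL << 31)
#define DEMO_CD_0_EPD0	(1ULL << 14)
#define DEMO_CD_1_TTB0	0x000ffffffffffff0ULL

/* Report which bits are considered in use for this CD value. */
static void demo_get_cd_used(const uint64_t *ent, uint64_t *used)
{
	unsigned int i;

	for (i = 0; i < NUM_ENTRY_QWORDS; i++)
		used[i] = 0;

	used[0] = DEMO_CD_0_V;
	if (!(ent[0] & DEMO_CD_0_V))
		return;

	/* Simplification: a valid CD uses all of qword 0 in this sketch. */
	used[0] = ~0ULL;
	/* With EPD0 set, TTB0 walks are disabled, so TTB0 is unused. */
	if (!(ent[0] & DEMO_CD_0_EPD0))
		used[1] = DEMO_CD_1_TTB0;
}

/* Zero every bit that is not in use so the value follows the rules. */
static void demo_clean_cd_entry(uint64_t *ent)
{
	uint64_t used[NUM_ENTRY_QWORDS];
	unsigned int i;

	demo_get_cd_used(ent, used);
	for (i = 0; i < NUM_ENTRY_QWORDS; i++)
		ent[i] &= used[i];
}
```

A CD built "in place" with EPD0 set and a stale TTB0 comes out of
demo_clean_cd_entry() with qword 1 zeroed, which is the property the
writer relies on.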

> If so, I think it’d be better to make that clear, also as used_bits
> are always 0xff for all cases, I believe the EPD0 logic should be
> integrated in populating the CD so it is correct by construction, as
> this looks like a hack to me.

Yes, this is what happens, in a few more steps. We have to go and
build the missing make functions first.

There is a bit of a circular problem here: the new scheme expects that
the CD is only programmed by the new scheme and follows the rules - eg
no unused bits set. While the old scheme doesn't follow the rules.

So this patch makes the old scheme follow the rules and be compatible
with the new scheme then we go place by place and convert to the new
scheme. Then we remove the old scheme entirely. Look at the "Move the
CD generation for SVA into a function" patch.

Yes, this is a minimal hack to let the next few patches work out
correctly without breaking bisection.

How about a new commit message:

iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()

CD table entries and STE's have the same essential programming sequence,
just with different types. Use the new ops indirection to link CD
programming to the common writer.

In a few more patches all CD writers will call an appropriate make
function and then directly call arm_smmu_write_cd_entry().
arm_smmu_write_ctx_desc() will be removed.

Until then lightly tweak arm_smmu_write_ctx_desc() to also use the new
programmer by using the same logic as right now to build the target CD on
the stack, sanitizing it to meet the used rules, and then using the
writer.

This is necessary because the writer expects that the currently programmed
CD follows the used rules. Next patches add new make functions and new
direct calls to arm_smmu_write_cd_entry() which will require this.

Jason

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function
  2024-04-19 21:10   ` Mostafa Saleh
@ 2024-04-22 13:52     ` Jason Gunthorpe
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-22 13:52 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Fri, Apr 19, 2024 at 09:10:59PM +0000, Mostafa Saleh wrote:
> Hi Jason,
> 
> On Tue, Apr 16, 2024 at 04:28:14PM -0300, Jason Gunthorpe wrote:
> > Introduce arm_smmu_make_s1_cd() to build the CD from the paging S1 domain,
> > and reorganize all the places programming S1 domain CD table entries to
> > call it.
> > 
> > Split arm_smmu_update_s1_domain_cd_entry() from
> > arm_smmu_update_ctx_desc_devices() so that the S1 path has its own call
> > chain separate from the unrelated SVA path.
> > 
> > arm_smmu_update_s1_domain_cd_entry() only works on S1 domains
> > attached to RIDs and refreshes all their CDs.
> > 
> > Remove the forced clear of the CD during S1 domain attach,
> > arm_smmu_write_cd_entry() will do this automatically if necessary.
> > 
> > Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> > Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > Reviewed-by: Michael Shavit <mshavit@google.com>
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> >  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   | 25 +++++++-
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 60 +++++++++++++------
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  9 +++
> >  3 files changed, 76 insertions(+), 18 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > index 41b44baef15e80..d159f60480935e 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
> > @@ -53,6 +53,29 @@ static void arm_smmu_update_ctx_desc_devices(struct arm_smmu_domain *smmu_domain
> >  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> >  }
> >  
> > +static void
> > +arm_smmu_update_s1_domain_cd_entry(struct arm_smmu_domain *smmu_domain)
> 
> nit: shouldn’t that be arm_smmu_update_sva_domain_cd_entry?

No, I had the same confusion when I was first looking at this. The
logic updates a *S1* domain's CD, it doesn't touch a SVA CD or a SVA
domain.

It actually has nothing to do with SVA, this is part of BTM support to
change the ASID in already programmed S1 domains.

> > +{
> > +	struct arm_smmu_master *master;
> > +	struct arm_smmu_cd target_cd;
> > +	unsigned long flags;
> > +
> > +	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> > +	list_for_each_entry(master, &smmu_domain->devices, domain_head) {
> > +		struct arm_smmu_cd *cdptr;
> > +
> > +		/* S1 domains only support RID attachment right now */
> > +		cdptr = arm_smmu_get_cd_ptr(master, IOMMU_NO_PASID);
> > +		if (WARN_ON(!cdptr))
> > +			continue;
> > +
> > +		arm_smmu_make_s1_cd(&target_cd, master, smmu_domain);
> > +		arm_smmu_write_cd_entry(master, IOMMU_NO_PASID, cdptr,
> > +					&target_cd);
> 
> Case ARM_SMMU_DOMAIN_S1 has the same code:
>   arm_smmu_get_cd_ptr => arm_smmu_make_s1_cd => arm_smmu_write_cd_entry
> I’d prefer if that was abstracted with the SMMUv3 driver and it provides a higher
> level API rather than exposing these low-level functions in the header file.
> But no strong opinion.

It is only slightly the same now, and it will keep getting more
different as the patches progress. For instance "Make
arm_smmu_alloc_cd_ptr()" makes them call different alloc functions.

Later on this code will handle a SSID too.

I don't think of those functions as a lower level API, ptr/make/write
is the API design. We have different versions of each of those
functions. The call site needs to string together the right sequence
of three operations for its specific context.
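
A toy sketch of that ptr/make/write shape, for readers who have not seen
the STE side of the series. Everything prefixed toy_ is hypothetical, the
CD layout is invented for the example, and the "write" step is a plain
copy where the driver would run the sequenced, synced update.

```c
#include <stdint.h>
#include <string.h>

#define NUM_ENTRY_QWORDS 8
#define TOY_NUM_CDS 4

struct toy_cd {
	uint64_t data[NUM_ENTRY_QWORDS];
};

static struct toy_cd toy_cd_table[TOY_NUM_CDS];

/* Step 1 (ptr): locate the live table entry, or NULL if out of range. */
static struct toy_cd *toy_get_cd_ptr(uint32_t ssid)
{
	return ssid < TOY_NUM_CDS ? &toy_cd_table[ssid] : NULL;
}

/* Step 2 (make): build the complete target value from scratch. */
static void toy_make_s1_cd(struct toy_cd *target, uint16_t asid, uint64_t ttbr)
{
	memset(target, 0, sizeof(*target));
	/* Invented layout: bit 0 is "valid", ASID sits in the top 16 bits. */
	target->data[0] = 1ULL | ((uint64_t)asid << 48);
	target->data[1] = ttbr;
}

/* Step 3 (write): make the live entry match the target. */
static void toy_write_cd_entry(struct toy_cd *live, const struct toy_cd *target)
{
	*live = *target;
}

/* A call site strings the three steps together for its own context. */
static int toy_attach(uint32_t ssid, uint16_t asid, uint64_t ttbr)
{
	struct toy_cd target;
	struct toy_cd *cdptr = toy_get_cd_ptr(ssid);

	if (!cdptr)
		return -1;
	toy_make_s1_cd(&target, asid, ttbr);
	toy_write_cd_entry(cdptr, &target);
	return 0;
}
```

The point of the split is that each call site can pick a different ptr
or make variant without the write step knowing which one was used.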

At the end this is an atomic context working on S1 domains with SSID -
there isn't another case exactly like this.

> > +void arm_smmu_make_s1_cd(struct arm_smmu_cd *target,
> > +			 struct arm_smmu_master *master,
> > +			 struct arm_smmu_domain *smmu_domain)
> > +{
> > +	struct arm_smmu_ctx_desc *cd = &smmu_domain->cd;
> > +
> > +	memset(target, 0, sizeof(*target));
> > +
> > +	target->data[0] = cpu_to_le64(
> > +		cd->tcr |
> > +#ifdef __BIG_ENDIAN
> > +		CTXDESC_CD_0_ENDI |
> > +#endif
> > +		CTXDESC_CD_0_V |
> > +		CTXDESC_CD_0_AA64 |
> > +		(master->stall_enabled ? CTXDESC_CD_0_S : 0) |
> > +		CTXDESC_CD_0_R |
> > +		CTXDESC_CD_0_A |
> > +		CTXDESC_CD_0_ASET |
> > +		FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid)
> > +		);
> > +
> > +	target->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
> > +	target->data[3] = cpu_to_le64(cd->mair);
> > +}
> > +
> 
> IMO, patches to handle CD = NULL and quiet CD should be introduced first so it is
> easier to follow as now there is duplicate code in arm_smmu_write_ctx_desc() which
> is dead and makes it a little harder to review, but if reordered,
> arm_smmu_write_ctx_desc() can be removed in this patch so we can see how code moved.

arm_smmu_write_ctx_desc() can't be removed until all of S1, clear, SVA
and quiet_cd are converted. No matter what order you pick there will
be some weirdness.

The duplicate code "(1) and (2)" is also still being used for the SVA
domains, it is not unused until patch "Move the CD generation for SVA
into a function".

The only dead code here is the ASID change. So I'll bring this hunk forward:

--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1328,14 +1328,11 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
         *
         * (1) Install primary CD, for normal DMA traffic (SSID = IOMMU_NO_PASID = 0).
         * (2) Install a secondary CD, for SID+SSID traffic.
-        * (3) Update ASID of a CD. Atomically write the first 64 bits of the
-        *     CD, then invalidate the old entry and mappings.
         * (4) Quiesce the context without clearing the valid bit. Disable
         *     translation, and ignore any translation fault.
         * (5) Remove a secondary CD.
         */
        u64 val;
-       bool cd_live;
        struct arm_smmu_cd target;
        struct arm_smmu_cd *cdptr = &target;
        struct arm_smmu_cd *cd_table_entry;
@@ -1351,7 +1348,6 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
 
        target = *cd_table_entry;
        val = le64_to_cpu(cdptr->data[0]);
-       cd_live = !!(val & CTXDESC_CD_0_V);
 
        if (!cd) { /* (5) */
                val = 0;
@@ -1359,13 +1355,6 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
                if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
                        val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R);
                val |= CTXDESC_CD_0_TCR_EPD0;
-       } else if (cd_live) { /* (3) */
-               val &= ~CTXDESC_CD_0_ASID;
-               val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid);
-               /*
-                * Until CD+TLB invalidation, both ASIDs may be used for tagging
-                * this substream's traffic
-                */
        } else { /* (1) and (2) */
                cdptr->data[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK);
                cdptr->data[2] = 0;

> Otherwise:
> Reviewed-by: Mostafa Saleh <smostafa@google.com>

Thanks,
Jason

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  2024-04-19 21:14   ` Mostafa Saleh
@ 2024-04-22 14:20     ` Jason Gunthorpe
  2024-04-27 22:19       ` Mostafa Saleh
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-22 14:20 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Fri, Apr 19, 2024 at 09:14:21PM +0000, Mostafa Saleh wrote:
> Hi Jason,
> 
> On Tue, Apr 16, 2024 at 04:28:16PM -0300, Jason Gunthorpe wrote:
> > Only the attach callers can perform an allocation for the CD table entry,
> > the other callers must not do so, they do not have the correct locking and
> > they cannot sleep. Split up the functions so this is clear.
> > 
> > arm_smmu_get_cd_ptr() will return pointer to a CD table entry without
> > doing any kind of allocation.
> > 
> > arm_smmu_alloc_cd_ptr() will allocate the table and any required
> > leaf.
> > 
> > A following patch will add lockdep assertions to arm_smmu_alloc_cd_ptr()
> > once the restructuring is completed and arm_smmu_alloc_cd_ptr() is never
> > called in the wrong context.
> > 
> > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > ---
> >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 61 +++++++++++++--------
> >  1 file changed, 39 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > index f3df1ec8d258dc..a0d1237272936f 100644
> > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > @@ -98,6 +98,7 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
> >  
> >  static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain,
> >  				    struct arm_smmu_device *smmu);
> > +static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master);
> >  
> >  static void parse_driver_options(struct arm_smmu_device *smmu)
> >  {
> > @@ -1207,29 +1208,51 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst,
> >  struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
> >  					u32 ssid)
> >  {
> > -	__le64 *l1ptr;
> > -	unsigned int idx;
> >  	struct arm_smmu_l1_ctx_desc *l1_desc;
> > -	struct arm_smmu_device *smmu = master->smmu;
> >  	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
> >  
> > +	if (!cd_table->cdtab)
> > +		return NULL;
> > +
> >  	if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
> >  		return (struct arm_smmu_cd *)(cd_table->cdtab +
> >  					      ssid * CTXDESC_CD_DWORDS);
> >  
> > -	idx = ssid >> CTXDESC_SPLIT;
> > -	l1_desc = &cd_table->l1_desc[idx];
> > -	if (!l1_desc->l2ptr) {
> > -		if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
> > -			return NULL;
> > +	l1_desc = &cd_table->l1_desc[ssid / CTXDESC_L2_ENTRIES];
> 
> These operations used to be shift and bit masking which made sense as it does
> what hardware does, is there any reason you changed it to division and modulo?
> I checked the disassembly and gcc does the right thing as constants are power
> of 2, but I am just curious.

I generally prefer the clarity and succinctness of / and % instead of
hacking up bit operations that the compiler will generate
automatically anyhow.

If bit extractions should be used it is better to wrap it in
FIELD_GET() than open code it.
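
To make the equivalence concrete, a small standalone sketch comparing
the two spellings of the index split. CTXDESC_SPLIT and
CTXDESC_L2_ENTRIES are taken from the thread; struct cd_index and the
two split_* helpers are hypothetical names used only here. The results
match because CTXDESC_L2_ENTRIES is a power of two and ssid is unsigned.

```c
#include <stdint.h>

#define CTXDESC_SPLIT		10
#define CTXDESC_L2_ENTRIES	(1 << CTXDESC_SPLIT)

struct cd_index {
	uint32_t l1;	/* which L1 descriptor */
	uint32_t l2;	/* which CD inside that leaf table */
};

/* The older spelling: shift and mask, as the hardware does it. */
static struct cd_index split_shift_mask(uint32_t ssid)
{
	struct cd_index idx = {
		.l1 = ssid >> CTXDESC_SPLIT,
		.l2 = ssid & (CTXDESC_L2_ENTRIES - 1),
	};

	return idx;
}

/* The spelling used in the patch: divide and modulo. */
static struct cd_index split_div_mod(uint32_t ssid)
{
	struct cd_index idx = {
		.l1 = ssid / CTXDESC_L2_ENTRIES,
		.l2 = ssid % CTXDESC_L2_ENTRIES,
	};

	return idx;
}
```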

> > +static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
> > +						 u32 ssid)
> > +{
> > +	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
> > +	struct arm_smmu_device *smmu = master->smmu;
> > +
> > +	if (!cd_table->cdtab) {
> > +		if (arm_smmu_alloc_cd_tables(master))
> > +			return NULL;
> >  	}
> > -	idx = ssid & (CTXDESC_L2_ENTRIES - 1);
> > -	return &l1_desc->l2ptr[idx];
> > +
> > +	if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) {
> > +		unsigned int idx = ssid >> CTXDESC_SPLIT;
> 
> Ok, now it’s a shift, I think we should be consistent with how we
> calculate the index.

Sure. Changing that to / will make CTXDESC_SPLIT unused except in
computing CTXDESC_L2_ENTRIES so that can be simplified too:

-#define CTXDESC_SPLIT                  10
-#define CTXDESC_L2_ENTRIES             (1 << CTXDESC_SPLIT)
+#define CTXDESC_L2_ENTRIES             1024


> > @@ -1357,7 +1380,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
> >  	if (WARN_ON(ssid >= (1 << cd_table->s1cdmax)))
> >  		return -E2BIG;
> >  
> > -	cd_table_entry = arm_smmu_get_cd_ptr(master, ssid);
> > +	cd_table_entry = arm_smmu_alloc_cd_ptr(master, ssid);
> 
> The only path that allocates the main table is “arm_smmu_attach_dev”,

There are two places that allocate the leaf, arm_smmu_attach_dev()
(for the RID) and arm_smmu_sva_set_dev_pasid() (for a PASID)

At this moment all the paths are relying on the above to allocate the
leaf. The next patch makes arm_smmu_attach_dev() allocate the leaf
itself. A few more patches also make the PASID path allocate the leaf
itself, when the above is removed.

> I guess it would be more robust to leave that as is and have 2
> versions of get_cd, one that allocates leaf and one that is not
> allocating, what do you think?

I'm not sure what you are asking? We have two versions. One is called
alloc and one is called get. They have different locking requirements
on the caller so they have different names. I would not call them both
get?
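
A userspace sketch of the get/alloc split, under the assumption of a
simple two-level table. The demo_* names and sizes are hypothetical and
calloc() stands in for the driver's allocator; the real functions also
handle the linear format and the top-level table, which this omits.

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_L2_ENTRIES	1024
#define DEMO_L1_ENTRIES	4

struct demo_cd {
	uint64_t data[8];
};

struct demo_cd_table {
	/* Leaf tables; a slot stays NULL until something allocates it. */
	struct demo_cd *l2[DEMO_L1_ENTRIES];
};

/* Lookup only: never allocates, so it is usable where sleeping is not. */
static struct demo_cd *demo_get_cd_ptr(struct demo_cd_table *tbl, uint32_t ssid)
{
	struct demo_cd *leaf;

	if (ssid >= DEMO_L1_ENTRIES * DEMO_L2_ENTRIES)
		return NULL;
	leaf = tbl->l2[ssid / DEMO_L2_ENTRIES];
	if (!leaf)
		return NULL;
	return &leaf[ssid % DEMO_L2_ENTRIES];
}

/* Lookup that may allocate the leaf: only for callers allowed to do so. */
static struct demo_cd *demo_alloc_cd_ptr(struct demo_cd_table *tbl,
					 uint32_t ssid)
{
	uint32_t l1 = ssid / DEMO_L2_ENTRIES;

	if (l1 >= DEMO_L1_ENTRIES)
		return NULL;
	if (!tbl->l2[l1]) {
		tbl->l2[l1] = calloc(DEMO_L2_ENTRIES, sizeof(struct demo_cd));
		if (!tbl->l2[l1])
			return NULL;
	}
	return demo_get_cd_ptr(tbl, ssid);
}
```

Keeping the names distinct makes the caller's obligations visible at the
call site: get is safe anywhere, alloc is only for the attach paths.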

Thanks,
Jason

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  2024-04-19 21:24   ` Mostafa Saleh
@ 2024-04-22 14:24     ` Jason Gunthorpe
  2024-04-27 22:33       ` Mostafa Saleh
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-22 14:24 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Fri, Apr 19, 2024 at 09:24:52PM +0000, Mostafa Saleh wrote:
> > +static void arm_smmu_test_make_sva_release_cd(struct arm_smmu_cd *cd,
> > +					      unsigned int asid)
> > +{
> > +	struct arm_smmu_master master = {
> > +		.smmu = &smmu,
> > +	};
> > +
> > +	arm_smmu_make_sva_cd(cd, &master, NULL, asid);
> > +}
> > +
> 
> The test doesn’t build with SVA disabled, it fails with:
> aarch64-linux-gnu-ld: Unexpected GOT/PLT entries detected!
> aarch64-linux-gnu-ld: Unexpected run-time procedure linkages detected!
> aarch64-linux-gnu-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o: in function `arm_smmu_test_make_sva_release_cd':
> .../linux/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:409:(.text+0x17c): undefined reference to `arm_smmu_make_sva_cd'
> aarch64-linux-gnu-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o: in function `arm_smmu_test_make_sva_cd':
> .../linux/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:399:(.text+0x230): undefined reference to `arm_smmu_make_sva_cd'
> 
> I believe this check should be guarded under SVA.

Ugh yes, 0-day just hit this too.

I'm just going to do this:

 config ARM_SMMU_V3_KUNIT_TEST
        bool "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
        depends on KUNIT
+       depends on ARM_SMMU_V3_SVA
        default KUNIT_ALL_TESTS
        help
          Enable this option to unit-test arm-smmu-v3 driver functions.


Instead of adding #ifdefs.  No reason not to test the whole driver?

> > @@ -980,7 +968,7 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
> >   * would be nice if this was complete according to the spec, but minimally it
> >   * has to capture the bits this driver uses.
> >   */
> > -static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> > +void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> 
> IMO we should not export all these low level functions unconditionally.
> KUNIT already defines “VISIBLE_IF_KUNIT” which sets symbols to be static
> if CONFIG_KUNIT is not enabled. Or maybe even guard it for this test
> like what btrfs does with “EXPORT_FOR_TESTS”

Sure, that doesn't look like too much trouble long term.
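
For reference, a standalone sketch of the conditional-visibility pattern
being discussed. In the kernel the real macro is VISIBLE_IF_KUNIT from
<kunit/visibility.h>; the DEMO_* macro and demo_count_used_bits() below
are hypothetical stand-ins so the example compiles outside the kernel.

```c
/*
 * Hypothetical stand-in for CONFIG_KUNIT: build with -DDEMO_TESTS_ENABLED
 * to give the helper external linkage, otherwise it stays file-local.
 */
#ifdef DEMO_TESTS_ENABLED
#define DEMO_VISIBLE_IF_TEST
#else
#define DEMO_VISIBLE_IF_TEST static
#endif

/* The prototype a test file would see, only when tests are built. */
#ifdef DEMO_TESTS_ENABLED
unsigned int demo_count_used_bits(unsigned long long used);
#endif

/* A low-level helper that production code never needs to export. */
DEMO_VISIBLE_IF_TEST unsigned int demo_count_used_bits(unsigned long long used)
{
	unsigned int n = 0;

	while (used) {
		used &= used - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

With the switch off, the helper is static and the compiler is free to
inline or drop it, so non-test builds pay nothing for testability.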

Thanks,
Jason

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  2024-04-22 13:29     ` Jason Gunthorpe
@ 2024-04-27 22:08       ` Mostafa Saleh
  2024-04-29 14:29         ` Jason Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Mostafa Saleh @ 2024-04-27 22:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Mon, Apr 22, 2024 at 10:29:54AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 19, 2024 at 09:07:19PM +0000, Mostafa Saleh wrote:
> > > -	cdptr = arm_smmu_get_cd_ptr(master, ssid);
> > > -	if (!cdptr)
> > > +	cd_table_entry = arm_smmu_get_cd_ptr(master, ssid);
> > > +	if (!cd_table_entry)
> > >  		return -ENOMEM;
> > >  
> > > +	target = *cd_table_entry;
> > 
> > As this changes the logic where all CD manipulation is not on the actual
> > CD, I believe a comment would be helpful here.
> 
> This is all deleted in a few patches, doesn't seem worth it to
> me. These steps exist only for bisection.
> 
> > > @@ -1299,18 +1357,14 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
> > >  		if (cd_table->stall_enabled)
> > >  			val |= CTXDESC_CD_0_S;
> > >  	}
> > > -
> > > +	cdptr->data[0] = cpu_to_le64(val);
> > >  	/*
> > > -	 * The SMMU accesses 64-bit values atomically. See IHI0070Ca 3.21.3
> > > -	 * "Configuration structures and configuration invalidation completion"
> > > -	 *
> > > -	 *   The size of single-copy atomic reads made by the SMMU is
> > > -	 *   IMPLEMENTATION DEFINED but must be at least 64 bits. Any single
> > > -	 *   field within an aligned 64-bit span of a structure can be altered
> > > -	 *   without first making the structure invalid.
> > > +	 * Since the above is updating the CD entry based on the current value
> > > +	 * without zeroing unused bits it needs fixing before being passed to
> > > +	 * the programming logic.
> > >  	 */
> > > -	WRITE_ONCE(cdptr->data[0], cpu_to_le64(val));
> > > -	arm_smmu_sync_cd(master, ssid, true);
> > > +	arm_smmu_clean_cd_entry(&target);
> > 
> > > I am not sure I understand the logic here. Is that only needed for entry[0]?
> > > As far as I can see, the other entries are set and not reused.
> 
> I'm not sure what you are asking?
> 
> The issue is the old logic constructs the new CD by manipulating the
> existing CD in various ways "in place" that ends up creating CDs that
> don't meet the requirements for the new programmer. For instance EPD0
> will be set and the TTB0 will also be left programmed.
> 

I see, but what I don’t understand is why doesn't the function construct
the CD correctly, as from
	} else if (cd == &quiet_cd) { /* (4) */
		if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
			val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R);
		val |= CTXDESC_CD_0_TCR_EPD0;
		// populate the rest of the CD correctly here.
	}

I don’t think the right approach is to populate the CD incorrectly
and then clear the parts not needed for EPD0.
Also, TTB0 is ignored anyway in that case, no?

Thanks,
Mostafa

> > If so, I think it’d be better to make that clear, also as used_bits
> > are always 0xff for all cases, I believe the EPD0 logic should be
> > integrated in populating the CD so it is correct by construction, as
> > this looks like a hack to me.
> 
> Yes, this is what happens, in a few more steps. We have to go and
> build the missing make functions first.
> 
> There is a bit of a circular problem here: the new scheme expects that
> the CD is only programmed by the new scheme and follows the rules - eg
> no unused bits set. While the old scheme doesn't follow the rules.
> 
> So this patch makes the old scheme follow the rules and be compatible
> with the new scheme then we go place by place and convert to the new
> scheme. Then we remove the old scheme entirely. Look at the "Move the
> CD generation for SVA into a function" patch.
> 
> Yes, this is a minimal hack to let the next few patches work out
> correctly without breaking bisection.
> 
> How about a new commit message:
> 
> iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
> 
> CD table entries and STE's have the same essential programming sequence,
> just with different types. Use the new ops indirection to link CD
> programming to the common writer.
> 
> In a few more patches all CD writers will call an appropriate make
> function and then directly call arm_smmu_write_cd_entry().
> arm_smmu_write_ctx_desc() will be removed.
> 
> Until then lightly tweak arm_smmu_write_ctx_desc() to also use the new
> programmer by using the same logic as right now to build the target CD on
> the stack, sanitizing it to meet the used rules, and then using the
> writer.
> 
> This is necessary because the writer expects that the currently programmed
> CD follows the used rules. Next patches add new make functions and new
> direct calls to arm_smmu_write_cd_entry() which will require this.
> 
> Jason


* Re: [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  2024-04-22 14:20     ` Jason Gunthorpe
@ 2024-04-27 22:19       ` Mostafa Saleh
  2024-04-29 14:01         ` Jason Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Mostafa Saleh @ 2024-04-27 22:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Mon, Apr 22, 2024 at 11:20:53AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 19, 2024 at 09:14:21PM +0000, Mostafa Saleh wrote:
> > Hi Jason,
> > 
> > On Tue, Apr 16, 2024 at 04:28:16PM -0300, Jason Gunthorpe wrote:
> > > Only the attach callers can perform an allocation for the CD table entry,
> > > the other callers must not do so, they do not have the correct locking and
> > > they cannot sleep. Split up the functions so this is clear.
> > > 
> > > arm_smmu_get_cd_ptr() will return pointer to a CD table entry without
> > > doing any kind of allocation.
> > > 
> > > arm_smmu_alloc_cd_ptr() will allocate the table and any required
> > > leaf.
> > > 
> > > A following patch will add lockdep assertions to arm_smmu_alloc_cd_ptr()
> > > once the restructuring is completed and arm_smmu_alloc_cd_ptr() is never
> > > called in the wrong context.
> > > 
> > > Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> > > ---
> > >  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 61 +++++++++++++--------
> > >  1 file changed, 39 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > index f3df1ec8d258dc..a0d1237272936f 100644
> > > --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> > > @@ -98,6 +98,7 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
> > >  
> > >  static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain,
> > >  				    struct arm_smmu_device *smmu);
> > > +static int arm_smmu_alloc_cd_tables(struct arm_smmu_master *master);
> > >  
> > >  static void parse_driver_options(struct arm_smmu_device *smmu)
> > >  {
> > > @@ -1207,29 +1208,51 @@ static void arm_smmu_write_cd_l1_desc(__le64 *dst,
> > >  struct arm_smmu_cd *arm_smmu_get_cd_ptr(struct arm_smmu_master *master,
> > >  					u32 ssid)
> > >  {
> > > -	__le64 *l1ptr;
> > > -	unsigned int idx;
> > >  	struct arm_smmu_l1_ctx_desc *l1_desc;
> > > -	struct arm_smmu_device *smmu = master->smmu;
> > >  	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
> > >  
> > > +	if (!cd_table->cdtab)
> > > +		return NULL;
> > > +
> > >  	if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_LINEAR)
> > >  		return (struct arm_smmu_cd *)(cd_table->cdtab +
> > >  					      ssid * CTXDESC_CD_DWORDS);
> > >  
> > > -	idx = ssid >> CTXDESC_SPLIT;
> > > -	l1_desc = &cd_table->l1_desc[idx];
> > > -	if (!l1_desc->l2ptr) {
> > > -		if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc))
> > > -			return NULL;
> > > +	l1_desc = &cd_table->l1_desc[ssid / CTXDESC_L2_ENTRIES];
> > 
> > These operations used to be shift and bit masking which made sense as it does
> > what hardware does, is there any reason you changed it to division and modulo?
> > I checked the disassembly and gcc does the right thing as constants are power
> > of 2, but I am just curious.
> 
> I generally prefer the clarity and succinctness of / and % instead of
> hacking up bit operations that the compiler will generate
> automatically anyhow.
> 
> If bit extractions should be used it is better to wrap it in
> FIELD_GET() than to open code it.
> 
> > > +static struct arm_smmu_cd *arm_smmu_alloc_cd_ptr(struct arm_smmu_master *master,
> > > +						 u32 ssid)
> > > +{
> > > +	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
> > > +	struct arm_smmu_device *smmu = master->smmu;
> > > +
> > > +	if (!cd_table->cdtab) {
> > > +		if (arm_smmu_alloc_cd_tables(master))
> > > +			return NULL;
> > >  	}
> > > -	idx = ssid & (CTXDESC_L2_ENTRIES - 1);
> > > -	return &l1_desc->l2ptr[idx];
> > > +
> > > +	if (cd_table->s1fmt == STRTAB_STE_0_S1FMT_64K_L2) {
> > > +		unsigned int idx = ssid >> CTXDESC_SPLIT;
> > 
> > Ok, now it’s a shift, I think we should be consistent with how we
> > calculate the index.
> 
> Sure. Changing that to / will make CTXDESC_SPLIT unused except in
> computing CTXDESC_L2_ENTRIES so that can be simplified too:
> 
> -#define CTXDESC_SPLIT                  10
> -#define CTXDESC_L2_ENTRIES             (1 << CTXDESC_SPLIT)
> +#define CTXDESC_L2_ENTRIES             1024
> 

Sounds good, I don’t think it matters much as long as it’s consistent, but
anyway the split is defined by the spec to be either 6, 8 or 10.
So the split size has to be a power of 2.
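
As a minimal sketch of the equivalence discussed above: the helper names
below are made up for illustration and are not driver functions, while
CTXDESC_SPLIT and CTXDESC_L2_ENTRIES take the values from the diff quoted
above. Assuming the leaf size is a power of 2, both spellings give the
same L1 and L2 index:

```c
#include <assert.h>
#include <stdint.h>

#define CTXDESC_SPLIT		10
#define CTXDESC_L2_ENTRIES	(1u << CTXDESC_SPLIT)

/* Hypothetical helpers: index of the L1 descriptor covering this SSID. */
static inline uint32_t cd_l1_idx_shift(uint32_t ssid)
{
	return ssid >> CTXDESC_SPLIT;
}

static inline uint32_t cd_l1_idx_div(uint32_t ssid)
{
	return ssid / CTXDESC_L2_ENTRIES;
}

/* Hypothetical helpers: index of the CD inside the leaf table. */
static inline uint32_t cd_l2_idx_mask(uint32_t ssid)
{
	return ssid & (CTXDESC_L2_ENTRIES - 1);
}

static inline uint32_t cd_l2_idx_mod(uint32_t ssid)
{
	return ssid % CTXDESC_L2_ENTRIES;
}

/* Returns 1 when both spellings agree for every SSID below limit. */
static inline int cd_idx_forms_agree(uint32_t limit)
{
	uint32_t ssid;

	for (ssid = 0; ssid < limit; ssid++) {
		if (cd_l1_idx_shift(ssid) != cd_l1_idx_div(ssid))
			return 0;
		if (cd_l2_idx_mask(ssid) != cd_l2_idx_mod(ssid))
			return 0;
	}
	return 1;
}

/* Self-check over a 20-bit SSID space. */
static inline void cd_idx_self_check(void)
{
	assert(cd_idx_forms_agree(1u << 20));
}
```

The division form only matches the shift form because the divisor is a
power of 2; with any other leaf size the mask would no longer be a valid
modulo.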

> 
> > > @@ -1357,7 +1380,7 @@ int arm_smmu_write_ctx_desc(struct arm_smmu_master *master, int ssid,
> > >  	if (WARN_ON(ssid >= (1 << cd_table->s1cdmax)))
> > >  		return -E2BIG;
> > >  
> > > -	cd_table_entry = arm_smmu_get_cd_ptr(master, ssid);
> > > +	cd_table_entry = arm_smmu_alloc_cd_ptr(master, ssid);
> > 
> > > The only path that allocates the main table is “arm_smmu_attach_dev”,
> 
> There are two places that allocate the leaf, arm_smmu_attach_dev()
> (for the RID) and arm_smmu_sva_set_dev_pasid() (for a PASID)
> 
> At this moment all the paths are relying on the above to allocate the
> leaf. The next patch makes arm_smmu_attach_dev() allocate the leaf
> itself. A few more patches also make the PASID path allocate the leaf
> itself, when the above is removed.
> 
> > I guess it would be more robust to leave that as is and have 2
> > versions of get_cd, one that allocates leaf and one that is not
> > allocating, what do you think?
> 
> I'm not sure what you are asking? We have two versions. One is called
> alloc and one is called get. They have different locking requirements
> on the caller so they have different names. I would not call them both
> get?
> 

My point is that arm_smmu_alloc_cd_ptr() doesn’t only allocate the leaf,
but also the L1 through arm_smmu_alloc_cd_tables()

IMO, arm_smmu_alloc_cd_ptr() should only allocate leafs. And inside
arm_smmu_attach_dev() it calls arm_smmu_alloc_cd_tables().
This makes it clear which path is expected to allocate the L1 table.

And arm_smmu_get_cd_ptr() will remain as is.

Thanks,
Mostafa

> Thanks,
> Jason


* Re: [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry
  2024-04-22 14:24     ` Jason Gunthorpe
@ 2024-04-27 22:33       ` Mostafa Saleh
  0 siblings, 0 replies; 48+ messages in thread
From: Mostafa Saleh @ 2024-04-27 22:33 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Mon, Apr 22, 2024 at 11:24:29AM -0300, Jason Gunthorpe wrote:
> On Fri, Apr 19, 2024 at 09:24:52PM +0000, Mostafa Saleh wrote:
> > > +static void arm_smmu_test_make_sva_release_cd(struct arm_smmu_cd *cd,
> > > +					      unsigned int asid)
> > > +{
> > > +	struct arm_smmu_master master = {
> > > +		.smmu = &smmu,
> > > +	};
> > > +
> > > +	arm_smmu_make_sva_cd(cd, &master, NULL, asid);
> > > +}
> > > +
> > 
> > The test doesn’t build with SVA disabled, it fails with:
> > aarch64-linux-gnu-ld: Unexpected GOT/PLT entries detected!
> > aarch64-linux-gnu-ld: Unexpected run-time procedure linkages detected!
> > aarch64-linux-gnu-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o: in function `arm_smmu_test_make_sva_release_cd':
> > .../linux/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:409:(.text+0x17c): undefined reference to `arm_smmu_make_sva_cd'
> > aarch64-linux-gnu-ld: drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.o: in function `arm_smmu_test_make_sva_cd':
> > .../linux/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:399:(.text+0x230): undefined reference to `arm_smmu_make_sva_cd'
> > 
> > > I believe this check should be guarded under SVA.
> 
> Ugh yes, 0-day just hit this too.
> 
> I'm just going to do this:
> 
>  config ARM_SMMU_V3_KUNIT_TEST
>         bool "KUnit tests for arm-smmu-v3 driver"  if !KUNIT_ALL_TESTS
>         depends on KUNIT
> +       depends on ARM_SMMU_V3_SVA
>         default KUNIT_ALL_TESTS
>         help
>           Enable this option to unit-test arm-smmu-v3 driver functions.
> 
> 
> Instead of adding #ifdefs.  No reason not to test the whole driver?
> 

Sounds good, I guess that option will be only used for development.

Thanks,
Mostafa

> > > @@ -980,7 +968,7 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
> > >   * would be nice if this was complete according to the spec, but minimally it
> > >   * has to capture the bits this driver uses.
> > >   */
> > > -static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> > > +void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> > 
> > IMO we should not export all these low level functions unconditionally.
> > KUNIT already defines “VISIBLE_IF_KUNIT” which sets symbols to be static
> > if CONFIG_KUNIT is not enabled. Or maybe even guard it for this test
> > like what btrfs does with “EXPORT_FOR_TESTS”
> 
> Sure, that doesn't look like too much trouble long term.
> 
> Thanks,
> Jason


* Re: [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  2024-04-27 22:19       ` Mostafa Saleh
@ 2024-04-29 14:01         ` Jason Gunthorpe
  2024-04-29 14:47           ` Mostafa Saleh
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-29 14:01 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Sat, Apr 27, 2024 at 10:19:37PM +0000, Mostafa Saleh wrote:

> > I'm not sure what you are asking? We have two versions. One is called
> > alloc and one is called get. They have different locking requirements
> > on the caller so they have different names. I would not call them both
> > get?
> > 
> 
> My point is that arm_smmu_alloc_cd_ptr() doesn’t only allocate the leaf,
> but also the L1 through arm_smmu_alloc_cd_tables()

Sure, it is called alloc, it allocs everything to make the CD table
entry usable.

> IMO, arm_smmu_alloc_cd_ptr() should only allocate leafs. And inside
> arm_smmu_attach_dev() it calls arm_smmu_alloc_cd_tables().
> This makes it clear which path is expected to allocate the L1 table.

The PASID path sometimes has to allocate the L1 table too, why
duplicate the allocation code?

What is different about the L1 vs L2 that it should be open coded?

Jason


* Re: [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  2024-04-27 22:08       ` Mostafa Saleh
@ 2024-04-29 14:29         ` Jason Gunthorpe
  2024-04-29 15:30           ` Mostafa Saleh
  0 siblings, 1 reply; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-29 14:29 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Sat, Apr 27, 2024 at 10:08:57PM +0000, Mostafa Saleh wrote:
> > The issue is the old logic constructs the new CD by manipulating the
> > existing CD in various ways "in place" that ends up creating CDs that
> > don't meet the requirements for the new programmer. For instance EPD0
> > will be set and the TTB0 will also be left programmed.
> > 
> 
> I see, but what I don’t understand is why doesn't the function construct
> the CD correctly, as from

Why? Because it never had to before. It made minimal edits to minimize
the code.

> 	} else if (cd == &quiet_cd) { /* (4) */
> 		if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
> 			val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R);
> 		val |= CTXDESC_CD_0_TCR_EPD0;
> 		// populate the rest of the CD correctly here.
> 	}

What you are asking for is this:

        cd_live = !!(val & CTXDESC_CD_0_V);
 
        if (!cd) { /* (5) */
+               memset(cdptr, 0, sizeof(*cdptr));
                val = 0;
        } else if (cd == &quiet_cd) { /* (4) */
+               val &= ~(CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
+                        CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
+                        CTXDESC_CD_0_TCR_SH0);
                if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
                        val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R);
                val |= CTXDESC_CD_0_TCR_EPD0;
+               cdptr->data[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
        } else if (cd_live) { /* (3) */
                val &= ~CTXDESC_CD_0_ASID;
                val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid);

I think.. I've been staring at this a while now and I *think* it
covers all the cases and we won't hit the WARN_ON?

So sure, let's do it that way, the code is all deleted anyhow ..

> I don’t think the right approach is to populate the CD incorrectly
> and then clear the parts not needed for EPD0.

It is very easy to see that such a simple algorithm will not trigger
the WARN_ON. The above is somewhat trickier.

> Also, TTB0 is ignored anyway in that case, no?

Only by HW, there is a protective WARN_ON that will trigger in the
programmer, that is what this is trying to avoid. For bisection.
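
To make the reason for the sanitizing step concrete, here is a rough
user-space sketch of the "used bits" idea. It is not the driver code: the
toy_ names and the bit positions are assumptions chosen for illustration,
and the assert() merely stands in for the protective WARN_ON. The point
is only that a target with EPD0 set and a stale TTB0 violates the rule
until the unused bits are masked off:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_CD_QWORDS		8
/* Assumed toy layout, not taken from the driver headers. */
#define TOY_CD_0_V		(1ULL << 31)
#define TOY_CD_0_EPD0		(1ULL << 14)
#define TOY_CD_1_TTB0_MASK	0x000ffffffffffff0ULL

struct toy_cd {
	uint64_t data[TOY_CD_QWORDS];
};

/* Report which bits the (toy) hardware would look at for this entry. */
static inline void toy_get_cd_used(const struct toy_cd *ent,
				   struct toy_cd *used)
{
	memset(used, 0, sizeof(*used));
	used->data[0] = TOY_CD_0_V;
	if (!(ent->data[0] & TOY_CD_0_V))
		return;

	memset(used, 0xff, sizeof(*used));
	/* With EPD0 set the TTB0 walk is disabled, so TTB0 is unused. */
	if (ent->data[0] & TOY_CD_0_EPD0)
		used->data[1] &= ~TOY_CD_1_TTB0_MASK;
}

/* Returns 1 if no bit outside the used mask is set. */
static inline int toy_cd_follows_used_rules(const struct toy_cd *ent)
{
	struct toy_cd used;
	int i;

	toy_get_cd_used(ent, &used);
	for (i = 0; i < TOY_CD_QWORDS; i++) {
		if (ent->data[i] & ~used.data[i])
			return 0;
	}
	return 1;
}

/* Zero every bit the (toy) hardware ignores for this entry. */
static inline void toy_clean_cd_entry(struct toy_cd *target)
{
	struct toy_cd used;
	int i;

	toy_get_cd_used(target, &used);
	for (i = 0; i < TOY_CD_QWORDS; i++)
		target->data[i] &= used.data[i];

	/* Stand-in for the programmer's protective check. */
	assert(toy_cd_follows_used_rules(target));
}
```

Masking after the fact is trivially correct for any input, which is why
it is easy to see that it cannot trip the check.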

Jason


* Re: [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  2024-04-29 14:01         ` Jason Gunthorpe
@ 2024-04-29 14:47           ` Mostafa Saleh
  2024-04-29 14:55             ` Jason Gunthorpe
  0 siblings, 1 reply; 48+ messages in thread
From: Mostafa Saleh @ 2024-04-29 14:47 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Mon, Apr 29, 2024 at 11:01:37AM -0300, Jason Gunthorpe wrote:
> On Sat, Apr 27, 2024 at 10:19:37PM +0000, Mostafa Saleh wrote:
> 
> > > I'm not sure what you are asking? We have two versions. One is called
> > > alloc and one is called get. They have different locking requirements
> > > on the caller so they have different names. I would not call them both
> > > get?
> > > 
> > 
> > My point is that arm_smmu_alloc_cd_ptr() doesn’t only allocate the leaf,
> > but also the L1 through arm_smmu_alloc_cd_tables()
> 
> Sure, it is called alloc, it allocs everything to make the CD table
> entry usable.

Maybe if it’s called alloc_leaf, it only allocates leafs :)

> 
> > IMO, arm_smmu_alloc_cd_ptr() should only allocate leafs. And inside
> > arm_smmu_attach_dev() it calls arm_smmu_alloc_cd_tables().
> > This makes it clear which path is expected to allocate the L1 table.
> 
> The PASID path sometimes has to allocate the L1 table too, why
> duplicate the allocation code?
> 
> What is different about the L1 vs L2 that it should be open coded?
> 

I don’t think it is a big problem, but my main concern is robustness.
For example, a small erroneous code change might trigger allocation of
the L1 table from a path that shouldn’t allocate it, and that might go
unnoticed because this function allows it, leading to memory leaks or
other issues that are harder to triage later. If instead each path were
limited in which level it can allocate, the call would return NULL in
that case and fail immediately.

Thanks,
Mostafa
> Jason


* Re: [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr()
  2024-04-29 14:47           ` Mostafa Saleh
@ 2024-04-29 14:55             ` Jason Gunthorpe
  0 siblings, 0 replies; 48+ messages in thread
From: Jason Gunthorpe @ 2024-04-29 14:55 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Mon, Apr 29, 2024 at 02:47:26PM +0000, Mostafa Saleh wrote:

> > > IMO, arm_smmu_alloc_cd_ptr() should only allocate leafs. And inside
> > > arm_smmu_attach_dev() it calls arm_smmu_alloc_cd_tables().
> > > This makes it clear which path is expected to allocate the L1 table.
> > 
> > The PASID path sometimes has to allocate the L1 table too, why
> > duplicate the allocation code?
> > 
> > What is different about the L1 vs L2 that it should be open coded?
> 
> I don’t think it is a big problem, but my main concern is robustness,
> for example a small erroneous code change might trigger allocation for
> L1 table from a path that shouldn’t,

A few patches later we add a lockdep assertion, so a wrongly placed
allocation is *very* likely to hit it. If the lockdep assertion is
satisfied then the allocation is not going to cause a functional problem.

> and that might go unnoticed as
> this function will allow it, leading to memory leaks, 

Any cd table memory allocated by arm_smmu_alloc_cd_ptr() is reliably
freed in the arm_smmu_release_device().

> or other issues that might be harder to triage later. If instead each
> path were limited in which level it can allocate, the call would
> return NULL in that case and fail immediately.

All cases that need to allocate a leaf need to allocate the L1 too, it
is artificial to make a distinction between them.
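
A rough user-space sketch of the get/alloc split, under loud assumptions:
the toy_ types, the table sizes and the use of calloc() are invented for
illustration and do not mirror the driver's data structures or its
locking. The get variant never allocates and returns NULL if either level
is missing; the alloc variant creates the L1 and the leaf as needed:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_L1_ENTRIES	64
#define TOY_L2_ENTRIES	1024

struct toy_cd {
	uint64_t data[8];
};

struct toy_cd_table {
	struct toy_cd **l1;	/* NULL until the L1 table is allocated */
};

/* Lookup only: never allocates, so it is safe where sleeping is not. */
static inline struct toy_cd *toy_get_cd_ptr(struct toy_cd_table *t,
					    uint32_t ssid)
{
	uint32_t idx = ssid / TOY_L2_ENTRIES;

	if (!t->l1 || idx >= TOY_L1_ENTRIES || !t->l1[idx])
		return NULL;
	return &t->l1[idx][ssid % TOY_L2_ENTRIES];
}

/* Allocates whatever is missing (L1 and leaf) to make the entry usable. */
static inline struct toy_cd *toy_alloc_cd_ptr(struct toy_cd_table *t,
					      uint32_t ssid)
{
	uint32_t idx = ssid / TOY_L2_ENTRIES;

	if (idx >= TOY_L1_ENTRIES)
		return NULL;
	if (!t->l1) {
		t->l1 = calloc(TOY_L1_ENTRIES, sizeof(*t->l1));
		if (!t->l1)
			return NULL;
	}
	if (!t->l1[idx]) {
		t->l1[idx] = calloc(TOY_L2_ENTRIES, sizeof(**t->l1));
		if (!t->l1[idx])
			return NULL;
	}
	return toy_get_cd_ptr(t, ssid);
}

/* Single teardown point: frees every leaf and then the L1 table. */
static inline void toy_free_cd_table(struct toy_cd_table *t)
{
	uint32_t i;

	if (!t->l1)
		return;
	for (i = 0; i < TOY_L1_ENTRIES; i++)
		free(t->l1[i]);
	free(t->l1);
	t->l1 = NULL;
}
```

Because everything is released from one teardown function, an allocation
made from an unexpected path is still freed; whether the path was allowed
to allocate at all is a separate question that an assertion on the
caller's locking context would have to answer.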

Jason


* Re: [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry()
  2024-04-29 14:29         ` Jason Gunthorpe
@ 2024-04-29 15:30           ` Mostafa Saleh
  0 siblings, 0 replies; 48+ messages in thread
From: Mostafa Saleh @ 2024-04-29 15:30 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Eric Auger, Moritz Fischer, Moritz Fischer, Michael Shavit,
	Nicolin Chen, patches, Shameerali Kolothum Thodi

On Mon, Apr 29, 2024 at 11:29:05AM -0300, Jason Gunthorpe wrote:
> On Sat, Apr 27, 2024 at 10:08:57PM +0000, Mostafa Saleh wrote:
> > > The issue is the old logic constructs the new CD by manipulating the
> > > existing CD in various ways "in place" that ends up creating CDs that
> > > don't meet the requirements for the new programmer. For instance EPD0
> > > will be set and the TTB0 will also be left programmed.
> > > 
> > 
> > I see, but what I don’t understand is why doesn't the function construct
> > the CD correctly, as from
> 
> Why? Because it never had to before. It made minimal edits to minimize
> the code.

I understand, my point was why don’t we introduce new logic to construct it
correctly instead of hacking the old one, as that is much easier to reason
about (at least from my point of view).

> 
> > 	} else if (cd == &quiet_cd) { /* (4) */
> > 		if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
> > 			val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R);
> > 		val |= CTXDESC_CD_0_TCR_EPD0;
> > 		// populate the rest of the CD correctly here.
> > 	}
> 
> What you are asking for is this:
> 
>         cd_live = !!(val & CTXDESC_CD_0_V);
>  
>         if (!cd) { /* (5) */
> +               memset(cdptr, 0, sizeof(*cdptr));
>                 val = 0;
>         } else if (cd == &quiet_cd) { /* (4) */
> +               val &= ~(CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
> +                        CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
> +                        CTXDESC_CD_0_TCR_SH0);
>                 if (!(smmu->features & ARM_SMMU_FEAT_STALL_FORCE))
>                         val &= ~(CTXDESC_CD_0_S | CTXDESC_CD_0_R);
>                 val |= CTXDESC_CD_0_TCR_EPD0;
> +               cdptr->data[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
>         } else if (cd_live) { /* (3) */
>                 val &= ~CTXDESC_CD_0_ASID;
>                 val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid);
> 
> I think.. I've been staring at this a while now and I *think* it
> covers all the cases and we won't hit the WARN_ON?
> 

That’s similar to how I imagined it.

> So sure, let's do it that way, the code is all deleted anyhow ..
> 

I agree, if it's deleted anyway we shouldn't spend much time on it. I
haven't looked at the SVA patch yet.

> > I don’t think the right approach is to populate the CD incorrectly
> > and then clear the parts not needed for EPD0.
> 
> It is very easy to see that such a simple algorithm will not trigger
> the WARN_ON. The above is somewhat trickier.
> 
> > Also, TTB0 is ignored anyway in that case, no?
> 
> Only by HW, there is a protective WARN_ON that will trigger in the
> programmer, that is what this is trying to avoid. For bisection.

Makes sense.

Thanks,
Mostafa
> Jason


end of thread, other threads:[~2024-04-29 15:31 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
2024-04-16 19:28 [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Jason Gunthorpe
2024-04-16 19:28 ` [PATCH v7 1/9] iommu/arm-smmu-v3: Add an ops indirection to the STE code Jason Gunthorpe
2024-04-16 20:18   ` Nicolin Chen
2024-04-19 21:02   ` Mostafa Saleh
2024-04-22 13:09     ` Jason Gunthorpe
2024-04-16 19:28 ` [PATCH v7 2/9] iommu/arm-smmu-v3: Make CD programming use arm_smmu_write_entry() Jason Gunthorpe
2024-04-16 20:48   ` Nicolin Chen
2024-04-18 13:01   ` Robin Murphy
2024-04-18 16:08     ` Jason Gunthorpe
2024-04-19 21:07   ` Mostafa Saleh
2024-04-22 13:29     ` Jason Gunthorpe
2024-04-27 22:08       ` Mostafa Saleh
2024-04-29 14:29         ` Jason Gunthorpe
2024-04-29 15:30           ` Mostafa Saleh
2024-04-16 19:28 ` [PATCH v7 3/9] iommu/arm-smmu-v3: Move the CD generation for S1 domains into a function Jason Gunthorpe
2024-04-16 21:22   ` Nicolin Chen
2024-04-19 21:10   ` Mostafa Saleh
2024-04-22 13:52     ` Jason Gunthorpe
2024-04-16 19:28 ` [PATCH v7 4/9] iommu/arm-smmu-v3: Consolidate clearing a CD table entry Jason Gunthorpe
2024-04-16 19:28 ` [PATCH v7 5/9] iommu/arm-smmu-v3: Make arm_smmu_alloc_cd_ptr() Jason Gunthorpe
2024-04-16 22:19   ` Nicolin Chen
2024-04-19 21:14   ` Mostafa Saleh
2024-04-22 14:20     ` Jason Gunthorpe
2024-04-27 22:19       ` Mostafa Saleh
2024-04-29 14:01         ` Jason Gunthorpe
2024-04-29 14:47           ` Mostafa Saleh
2024-04-29 14:55             ` Jason Gunthorpe
2024-04-16 19:28 ` [PATCH v7 6/9] iommu/arm-smmu-v3: Allocate the CD table entry in advance Jason Gunthorpe
2024-04-16 19:28 ` [PATCH v7 7/9] iommu/arm-smmu-v3: Move the CD generation for SVA into a function Jason Gunthorpe
2024-04-17  7:37   ` Nicolin Chen
2024-04-17 13:17     ` Jason Gunthorpe
2024-04-17 16:25       ` Nicolin Chen
2024-04-17 16:26   ` Nicolin Chen
2024-04-18  4:40   ` Michael Shavit
2024-04-18 14:28     ` Jason Gunthorpe
2024-04-16 19:28 ` [PATCH v7 8/9] iommu/arm-smmu-v3: Build the whole CD in arm_smmu_make_s1_cd() Jason Gunthorpe
2024-04-17  7:43   ` Nicolin Chen
2024-04-16 19:28 ` [PATCH v7 9/9] iommu/arm-smmu-v3: Add unit tests for arm_smmu_write_entry Jason Gunthorpe
2024-04-17  8:09   ` Nicolin Chen
2024-04-17 14:16     ` Jason Gunthorpe
2024-04-17 16:13       ` Nicolin Chen
2024-04-18  4:39       ` Michael Shavit
2024-04-18 12:48         ` Jason Gunthorpe
2024-04-18 14:34           ` Michael Shavit
2024-04-19 21:24   ` Mostafa Saleh
2024-04-22 14:24     ` Jason Gunthorpe
2024-04-27 22:33       ` Mostafa Saleh
2024-04-16 19:40 ` [PATCH v7 0/9] Make the SMMUv3 CD logic match the new STE design (part 2a/3) Nicolin Chen
