* [PATCH v5 00/17] Update SMMUv3 to the modern iommu API (part 1/3)
@ 2024-02-06 15:12 ` Jason Gunthorpe
  0 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

The SMMUv3 driver was originally written in 2015 when the iommu driver-facing
API looked quite different. The API has evolved, especially lately,
and the driver has fallen behind.

This work aims to make the SMMUv3 driver the best IOMMU driver with the
most comprehensive implementation of the API. After all three parts are
applied it addresses:

 - Global static BLOCKED and IDENTITY domains with 'never fail' attach
   semantics. BLOCKED is desired for efficient VFIO.

 - Support map before attach for PAGING iommu_domains.

 - attach_dev failure does not change the HW configuration.

 - Fully hitless transitions between IDENTITY -> DMA -> IDENTITY.
   The API has IOMMU_RESV_DIRECT which is expected to be
   continuously translating.

 - Safe transitions from PAGING -> BLOCKED that never temporarily pass
   through IDENTITY. This is required for iommufd security.

 - Full PASID API support including:
    - S1/SVA domains attached to PASIDs
    - IDENTITY/BLOCKED/S1 attached to RID
    - Change of the RID domain while PASIDs are attached

 - Streamlined SVA support using the core infrastructure

 - Hitless, whenever possible, change between two domains

 - iommufd IOMMU_GET_HW_INFO, IOMMU_HWPT_ALLOC_NEST_PARENT, and
   IOMMU_DOMAIN_NESTED support

Overall, these things are going to become more accessible to iommufd, and
exposed to VMs, so it is important for the driver to have a robust
implementation of the API.

The work is split into three parts, with this part largely focusing on the
STE and building up to the BLOCKED & IDENTITY global static domains.

The second part largely focuses on the CD and builds up to having a common
PASID infrastructure that SVA and S1 domains equally use.

The third part has some random cleanups and the iommufd related parts.

Overall this takes the approach of turning the STE/CD programming upside
down where the CD/STE value is computed right at a driver callback
function and then pushed down into programming logic. The programming
logic hides the details of the required CD/STE tear-less update. This
makes the CD/STE functions independent of the arm_smmu_domain, which makes
it fairly straightforward to untangle all the different call chains and
add new ones.
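
As a rough sketch of that pattern (example_attach_s1() is only an
illustration; the real attach path also handles CD table allocation, ATS
and error unwind, see the diffs below for the actual helpers):

  static void example_attach_s1(struct arm_smmu_master *master)
  {
          struct arm_smmu_ste target = {};

          /* The callback computes the complete target STE value... */
          arm_smmu_make_cdtable_ste(&target, master);

          /*
           * ...and the programming layer hides the tear-less update
           * sequencing needed to move the live STE to that value.
           */
          arm_smmu_install_ste_for_dev(master, &target);
  }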

Further, this frees the arm_smmu_domain related logic from keeping track
of what state the STE/CD is currently in so it can carefully sequence the
correct update. There are many new update pairs that are subtly introduced
as the work progresses.

The locking to support BTM via arm_smmu_asid_lock is a bit subtle right
now and patches throughout this work adjust and tighten this so that it is
clearer and doesn't get broken.

Once the lower STE layers no longer need to touch arm_smmu_domain we can
isolate struct arm_smmu_domain to be only used for PAGING domains, audit
all the to_smmu_domain() calls to be only in PAGING domain ops, and
introduce the normal global static BLOCKED/IDENTITY domains using the new
STE infrastructure. Part 2 will ultimately migrate SVA over to use
arm_smmu_domain as well.
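
For illustration, a minimal sketch of the global static IDENTITY domain
built on the new STE helpers (the BLOCKED domain is analogous with an
abort STE; the actual patches also deal with ATS and wire up a few more
ops than shown here):

  static int arm_smmu_attach_dev_identity(struct iommu_domain *domain,
                                          struct device *dev)
  {
          struct arm_smmu_master *master = dev_iommu_priv_get(dev);
          struct arm_smmu_ste ste;

          /* A bypass STE can always be built, so this attach never fails */
          arm_smmu_make_bypass_ste(&ste);
          arm_smmu_install_ste_for_dev(master, &ste);
          return 0;
  }

  static const struct iommu_domain_ops arm_smmu_identity_ops = {
          .attach_dev = arm_smmu_attach_dev_identity,
  };

  static struct iommu_domain arm_smmu_identity_domain = {
          .type = IOMMU_DOMAIN_IDENTITY,
          .ops = &arm_smmu_identity_ops,
  };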

All parts are on github:

 https://github.com/jgunthorpe/linux/commits/smmuv3_newapi

v5:
 - Rebase on v6.8-rc3
 - Remove the writer argument to arm_smmu_entry_writer_ops get_used()
 - Swap order of hweight tests so one call to hweight8() can be removed
 - Add STRTAB_STE_2_S2VMID to the used bits for STRTAB_STE_0_CFG_S1_TRANS;
   with S2 bypass the VMID is still used by the HW, but is 0
 - Be more exact when generating STEs and store 0's explicitly to document
   that the HW is using that value and that 0 is a deliberate choice for
   VMID and SHCFG.
 - Remove cd_table argument to arm_smmu_make_cdtable_ste()
 - Put arm_smmu_rmr_install_bypass_ste() after setting up a 2 level table
 - Pull patch "Check that the RID domain is S1 in SVA" from part 2 to
   guard against memory corruption on failure paths
 - Tighten the used logic for SHCFG to accommodate nesting patches in
   part 3
 - Additional comments and commit message adjustments
v4: https://lore.kernel.org/r/0-v4-c93b774edcc4+42d2b-smmuv3_newapi_p1_jgg@nvidia.com
 - Rebase on v6.8-rc1. Patches 1-3 merged
 - Replace patch "Make STE programming independent of the callers" with
   Michael's version
    * Describe the core API desire for hitless updates
    * Replace the iterator with STE/CD specific function pointers.
      This lets the logic be written top down instead of rolled into an
      iterator
    * Optimize away a sync when the critical qword is the only qword
      to update
 - Pass master not smmu to arm_smmu_write_ste() throughout
 - arm_smmu_make_s2_domain_ste() should use data[1] = not |= since
   it is known to be zero
 - Return errno's from domain_alloc() paths
v3: https://lore.kernel.org/r/0-v3-d794f8d934da+411a-smmuv3_newapi_p1_jgg@nvidia.com
 - Use some local variables in arm_smmu_get_step_for_sid() for clarity
 - White space and spelling changes
 - Commit message updates
 - Keep master->domain_head initialized to avoid a list_del corruption
v2: https://lore.kernel.org/r/0-v2-de8b10590bf5+400-smmuv3_newapi_p1_jgg@nvidia.com
 - Rebased on v6.7-rc1
 - Improve the comment for arm_smmu_write_entry_step()
 - Fix the botched memcmp
 - Document the spec justification for the SHCFG exclusion in used
 - Include STRTAB_STE_1_SHCFG for STRTAB_STE_0_CFG_S2_TRANS in used
 - WARN_ON for unknown STEs in used
 - Fix error unwind in arm_smmu_attach_dev()
 - Whitespace, spelling, and checkpatch related items
v1: https://lore.kernel.org/r/0-v1-e289ca9121be+2be-smmuv3_newapi_p1_jgg@nvidia.com

Jason Gunthorpe (17):
  iommu/arm-smmu-v3: Make STE programming independent of the callers
  iommu/arm-smmu-v3: Consolidate the STE generation for abort/bypass
  iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste()
  iommu/arm-smmu-v3: Move the STE generation for S1 and S2 domains into
    functions
  iommu/arm-smmu-v3: Build the whole STE in
    arm_smmu_make_s2_domain_ste()
  iommu/arm-smmu-v3: Hold arm_smmu_asid_lock during all of attach_dev
  iommu/arm-smmu-v3: Compute the STE only once for each master
  iommu/arm-smmu-v3: Do not change the STE twice during
    arm_smmu_attach_dev()
  iommu/arm-smmu-v3: Put writing the context descriptor in the right
    order
  iommu/arm-smmu-v3: Pass smmu_domain to arm_enable/disable_ats()
  iommu/arm-smmu-v3: Remove arm_smmu_master->domain
  iommu/arm-smmu-v3: Check that the RID domain is S1 in SVA
  iommu/arm-smmu-v3: Add a global static IDENTITY domain
  iommu/arm-smmu-v3: Add a global static BLOCKED domain
  iommu/arm-smmu-v3: Use the identity/blocked domain during release
  iommu/arm-smmu-v3: Pass arm_smmu_domain and arm_smmu_device to
    finalize
  iommu/arm-smmu-v3: Convert to domain_alloc_paging()

 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |   8 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 778 ++++++++++++------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   5 +-
 3 files changed, 549 insertions(+), 242 deletions(-)

The diff against v4 is small:

--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -57,8 +57,7 @@ struct arm_smmu_entry_writer {
 struct arm_smmu_entry_writer_ops {
 	unsigned int num_entry_qwords;
 	__le64 v_bit;
-	void (*get_used)(struct arm_smmu_entry_writer *writer, const __le64 *entry,
-			 __le64 *used);
+	void (*get_used)(const __le64 *entry, __le64 *used);
 	void (*sync)(struct arm_smmu_entry_writer *writer);
 };
 
@@ -1006,8 +1005,8 @@ static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
 	u8 used_qword_diff = 0;
 	unsigned int i;
 
-	writer->ops->get_used(writer, entry, cur_used);
-	writer->ops->get_used(writer, target, target_used);
+	writer->ops->get_used(entry, cur_used);
+	writer->ops->get_used(target, target_used);
 
 	for (i = 0; i != writer->ops->num_entry_qwords; i++) {
 		/*
@@ -1084,17 +1083,7 @@ static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
 
 	used_qword_diff =
 		arm_smmu_entry_qword_diff(writer, entry, target, unused_update);
-	if (hweight8(used_qword_diff) > 1) {
-		/*
-		 * At least two qwords need their inuse bits to be changed. This
-		 * requires a breaking update, zero the V bit, write all qwords
-		 * but 0, then set qword 0
-		 */
-		unused_update[0] = entry[0] & (~writer->ops->v_bit);
-		entry_set(writer, entry, unused_update, 0, 1);
-		entry_set(writer, entry, target, 1, num_entry_qwords - 1);
-		entry_set(writer, entry, target, 0, 1);
-	} else if (hweight8(used_qword_diff) == 1) {
+	if (hweight8(used_qword_diff) == 1) {
 		/*
 		 * Only one qword needs its used bits to be changed. This is a
 		 * hitless update, update all bits the current STE is ignoring
@@ -1114,6 +1103,16 @@ static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
 		entry_set(writer, entry, unused_update, 0, num_entry_qwords);
 		entry_set(writer, entry, target, critical_qword_index, 1);
 		entry_set(writer, entry, target, 0, num_entry_qwords);
+	} else if (used_qword_diff) {
+		/*
+		 * At least two qwords need their inuse bits to be changed. This
+		 * requires a breaking update, zero the V bit, write all qwords
+		 * but 0, then set qword 0
+		 */
+		unused_update[0] = entry[0] & (~writer->ops->v_bit);
+		entry_set(writer, entry, unused_update, 0, 1);
+		entry_set(writer, entry, target, 1, num_entry_qwords - 1);
+		entry_set(writer, entry, target, 0, 1);
 	} else {
 		/*
 		 * No inuse bit changed. Sanity check that all unused bits are 0
@@ -1402,28 +1401,30 @@ struct arm_smmu_ste_writer {
  * would be nice if this was complete according to the spec, but minimally it
  * has to capture the bits this driver uses.
  */
-static void arm_smmu_get_ste_used(struct arm_smmu_entry_writer *writer,
-				  const __le64 *ent, __le64 *used_bits)
+static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
 {
+	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
+
 	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
 	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
 		return;
 
 	/*
-	 * If S1 is enabled S1DSS is valid, see 13.5 Summary of
-	 * attribute/permission configuration fields for the SHCFG behavior.
+	 * See 13.5 Summary of attribute/permission configuration fields for the
+	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
+	 * and S2 only.
 	 */
-	if (FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0])) & 1 &&
-	    FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
-		    STRTAB_STE_1_S1DSS_BYPASS)
+	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
+	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
+	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
+	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
+		     STRTAB_STE_1_S1DSS_BYPASS))
 		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
 
 	used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
-	switch (FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]))) {
+	switch (cfg) {
 	case STRTAB_STE_0_CFG_ABORT:
-		break;
 	case STRTAB_STE_0_CFG_BYPASS:
-		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
 		break;
 	case STRTAB_STE_0_CFG_S1_TRANS:
 		used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
@@ -1434,10 +1435,11 @@ static void arm_smmu_get_ste_used(struct arm_smmu_entry_writer *writer,
 				    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
 				    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW);
 		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
+		used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
 		break;
 	case STRTAB_STE_0_CFG_S2_TRANS:
 		used_bits[1] |=
-			cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG);
+			cpu_to_le64(STRTAB_STE_1_EATS);
 		used_bits[2] |=
 			cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
 				    STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
@@ -1519,9 +1521,9 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target)
 }
 
 static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
-				      struct arm_smmu_master *master,
-				      struct arm_smmu_ctx_desc_cfg *cd_table)
+				      struct arm_smmu_master *master)
 {
+	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
 	struct arm_smmu_device *smmu = master->smmu;
 
 	memset(target, 0, sizeof(*target));
@@ -1542,11 +1544,30 @@ static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
 			 STRTAB_STE_1_S1STALLD :
 			 0) |
 		FIELD_PREP(STRTAB_STE_1_EATS,
-			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
-		FIELD_PREP(STRTAB_STE_1_STRW,
-			   (smmu->features & ARM_SMMU_FEAT_E2H) ?
-				   STRTAB_STE_1_STRW_EL2 :
-				   STRTAB_STE_1_STRW_NSEL1));
+			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
+
+	if (smmu->features & ARM_SMMU_FEAT_E2H) {
+		/*
+		 * To support BTM the streamworld needs to match the
+		 * configuration of the CPU so that the ASID broadcasts are
+		 * properly matched. This means either S/NS-EL2-E2H (hypervisor)
+		 * or NS-EL1 (guest). Since an SVA domain can be installed in a
+		 * PASID this should always use a BTM compatible configuration
+		 * if the HW supports it.
+		 */
+		target->data[1] |= cpu_to_le64(
+			FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_EL2));
+	} else {
+		target->data[1] |= cpu_to_le64(
+			FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_NSEL1));
+
+		/*
+		 * VMID 0 is reserved for stage-2 bypass EL1 STEs, see
+		 * arm_smmu_domain_alloc_id()
+		 */
+		target->data[2] =
+			cpu_to_le64(FIELD_PREP(STRTAB_STE_2_S2VMID, 0));
+	}
 }
 
 static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
@@ -1567,7 +1588,9 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 
 	target->data[1] = cpu_to_le64(
 		FIELD_PREP(STRTAB_STE_1_EATS,
-			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
+			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
+		FIELD_PREP(STRTAB_STE_1_SHCFG,
+			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
 
 	vtcr_val = FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) |
 		   FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) |
@@ -1590,6 +1613,10 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 				      STRTAB_STE_3_S2TTB_MASK);
 }
 
+/*
+ * This can safely directly manipulate the STE memory without a sync sequence
+ * because the STE table has not been installed in the SMMU yet.
+ */
 static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab,
 				      unsigned int nent)
 {
@@ -2632,7 +2659,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		if (ret)
 			goto out_list_del;
 
-		arm_smmu_make_cdtable_ste(&target, master, &master->cd_table);
+		arm_smmu_make_cdtable_ste(&target, master);
 		arm_smmu_install_ste_for_dev(master, &target);
 		break;
 	case ARM_SMMU_DOMAIN_S2:
@@ -3325,8 +3352,6 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
 
 	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents);
 
-	/* Check for RMRs and install bypass STEs if any */
-	arm_smmu_rmr_install_bypass_ste(smmu);
 	return 0;
 }
 
@@ -3350,6 +3375,8 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
 
 	ida_init(&smmu->vmid_map);
 
+	/* Check for RMRs and install bypass STEs if any */
+	arm_smmu_rmr_install_bypass_ste(smmu);
 	return 0;
 }
 
@@ -4049,6 +4076,10 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu)
 				continue;
 			}
 
+			/*
+			 * STE table is not programmed to HW, see
+			 * arm_smmu_init_bypass_stes()
+			 */
 			arm_smmu_make_bypass_ste(
 				arm_smmu_get_step_for_sid(smmu, rmr->sids[i]));
 		}
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 23baf117e7e4b5..23d8ab9a937aa6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -249,6 +249,7 @@ struct arm_smmu_ste {
 #define STRTAB_STE_1_STRW_EL2		2UL
 
 #define STRTAB_STE_1_SHCFG		GENMASK_ULL(45, 44)
+#define STRTAB_STE_1_SHCFG_NON_SHARABLE	0UL
 #define STRTAB_STE_1_SHCFG_INCOMING	1UL
 
 #define STRTAB_STE_2_S2VMID		GENMASK_ULL(15, 0)


base-commit: 54be6c6c5ae8e0d93a6c4641cb7528eb0b6ba478
-- 
2.43.0



* [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

As the comment in arm_smmu_write_strtab_ent() explains, this routine has
been limited to only work correctly in certain scenarios that the caller
must ensure. Generally the caller must put the STE into ABORT or BYPASS
before attempting to program it to something else.

The iommu core APIs would ideally expect the driver to do a hitless change
of iommu_domain in a number of cases:

 - RESV_DIRECT support wants IDENTITY -> DMA -> IDENTITY to be hitless
   for the RESV ranges

 - PASID upgrade has IDENTITY on the RID with no PASID then a PASID paging
   domain installed. The RID should not be impacted

 - PASID downgrade has IDENTITY on the RID and all PASIDs removed.
   The RID should not be impacted

 - RID does PAGING -> BLOCKING with active PASID, PASIDs should not be
   impacted

 - NESTING -> NESTING for carrying all the above hitless cases in a VM
   into the hypervisor. To comprehensively emulate the HW in a VM we should
   assume the VM OS is running logic like this and expecting hitless updates
   to be relayed to real HW.

For CD updates arm_smmu_write_ctx_desc() has a similar comment explaining
how limited it is, and the driver does have a need for hitless CD updates:

 - SMMUv3 BTM S1 ASID re-label

 - SVA mm release should change the CD to answer not-present to all
   requests without allowing logging (EPD0)

The next patches/series are going to start removing some of this logic
from the callers, and add more complex state combinations than are
currently supported.
At the end everything that can be hitless will be hitless, including all
of the above.

Introduce arm_smmu_write_entry() which will run through the multi-qword
programming sequence to avoid creating an incoherent 'torn' STE in the HW
caches. It automatically detects which of two algorithms to use:

1) The disruptive V=0 update described in the spec which disrupts the
   entry and does three syncs to make the change:
       - Write V=0 to QWORD 0
       - Write the entire STE except QWORD 0
       - Write QWORD 0

2) A hitless update algorithm that follows the same rationale that the driver
   already uses. It is safe to change IGNORED bits that HW doesn't use:
       - Write the target value into all currently unused bits
       - Write a single QWORD, this makes the new STE live atomically
       - Ensure now unused bits are 0

The detection of which path to use and the implementation of the hitless
update rely on a "used bitmask" describing what bits the HW is actually
using based on the V/CFG/etc bits. This flows from the spec language,
typically indicated as IGNORED.

Knowing which bits the HW is using we can update the bits it does not use
and then compute how many QWORDS need to be changed. If only one qword
needs to be updated the hitless algorithm is possible.
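
As a hypothetical worked example against the STE layout used here:

 - BYPASS -> S1 translation: while CFG=BYPASS the S1 control fields in
   qword 1 are IGNORED, so they can be filled in first; the used-bit
   change (CFG, S1CTXPTR, etc.) is then confined to qword 0, giving
   used_qword_diff = 0b0001, hweight8() == 1, and the hitless
   single-qword path is taken.

 - S1 translation -> S2 translation: qword 0 (CFG) and qword 2
   (VMID/VTCR) both have used bits that must change, giving
   used_qword_diff = 0b0101, hweight8() > 1, so the disruptive V=0
   sequence is required.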

Later patches will include CD updates in this mechanism so make the
implementation generic using a struct arm_smmu_entry_writer and struct
arm_smmu_entry_writer_ops to abstract the differences between STE and CD
to be plugged in.
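
As a rough sketch, a CD writer could later plug into the same ops roughly
like this (struct arm_smmu_cd and arm_smmu_get_cd_used() do not exist in
this part yet and are shown only as hypothetical analogues):

  struct arm_smmu_cd_writer {
          struct arm_smmu_entry_writer writer;
          int ssid;
  };

  static void arm_smmu_cd_writer_sync_entry(struct arm_smmu_entry_writer *writer)
  {
          struct arm_smmu_cd_writer *cd_writer =
                  container_of(writer, struct arm_smmu_cd_writer, writer);

          /* CDs are invalidated with CMDQ_OP_CFGI_CD rather than CFGI_STE */
          arm_smmu_sync_cd(writer->master, cd_writer->ssid, true);
  }

  static const struct arm_smmu_entry_writer_ops arm_smmu_cd_writer_ops = {
          .sync = arm_smmu_cd_writer_sync_entry,
          .get_used = arm_smmu_get_cd_used,   /* hypothetical CD analogue */
          .v_bit = cpu_to_le64(CTXDESC_CD_0_V),
          .num_entry_qwords = sizeof(struct arm_smmu_cd) / sizeof(u64),
  };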

At this point it generates the same sequence of updates as the current
code, except that zeroing the VMID on entry to BYPASS/ABORT will do an
extra sync (this seems to be an existing bug).

Going forward this will use a V=0 transition instead of cycling through
ABORT if a non-hitless change is required. This seems more appropriate as ABORT
will fail DMAs without any logging, but dropping a DMA due to transient
V=0 is probably signaling a bug, so the C_BAD_STE is valuable.

Signed-off-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 330 ++++++++++++++++----
 1 file changed, 263 insertions(+), 67 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0ffb1cf17e0b2e..f0b915567cbcdc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -48,6 +48,21 @@ enum arm_smmu_msi_index {
 	ARM_SMMU_MAX_MSIS,
 };
 
+struct arm_smmu_entry_writer_ops;
+struct arm_smmu_entry_writer {
+	const struct arm_smmu_entry_writer_ops *ops;
+	struct arm_smmu_master *master;
+};
+
+struct arm_smmu_entry_writer_ops {
+	unsigned int num_entry_qwords;
+	__le64 v_bit;
+	void (*get_used)(const __le64 *entry, __le64 *used);
+	void (*sync)(struct arm_smmu_entry_writer *writer);
+};
+
+#define NUM_ENTRY_QWORDS (sizeof(struct arm_smmu_ste) / sizeof(u64))
+
 static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
 	[EVTQ_MSI_INDEX] = {
 		ARM_SMMU_EVTQ_IRQ_CFG0,
@@ -971,6 +986,140 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
 	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 }
 
+/*
+ * Figure out if we can do a hitless update of entry to become target. Returns a
+ * bit mask where 1 indicates that qword needs to be set disruptively.
+ * unused_update is an intermediate value of entry that has unused bits set to
+ * their new values.
+ */
+static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
+				    const __le64 *entry, const __le64 *target,
+				    __le64 *unused_update)
+{
+	__le64 target_used[NUM_ENTRY_QWORDS] = {};
+	__le64 cur_used[NUM_ENTRY_QWORDS] = {};
+	u8 used_qword_diff = 0;
+	unsigned int i;
+
+	writer->ops->get_used(entry, cur_used);
+	writer->ops->get_used(target, target_used);
+
+	for (i = 0; i != writer->ops->num_entry_qwords; i++) {
+		/*
+		 * Check that masks are up to date, the make functions are not
+		 * allowed to set a bit to 1 if the used function doesn't say it
+		 * is used.
+		 */
+		WARN_ON_ONCE(target[i] & ~target_used[i]);
+
+		/* Bits can change because they are not currently being used */
+		unused_update[i] = (entry[i] & cur_used[i]) |
+				   (target[i] & ~cur_used[i]);
+		/*
+		 * Each bit indicates that a used bit in a qword needs to be
+		 * changed after unused_update is applied.
+		 */
+		if ((unused_update[i] & target_used[i]) != target[i])
+			used_qword_diff |= 1 << i;
+	}
+	return used_qword_diff;
+}
+
+static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry,
+		      const __le64 *target, unsigned int start,
+		      unsigned int len)
+{
+	bool changed = false;
+	unsigned int i;
+
+	for (i = start; len != 0; len--, i++) {
+		if (entry[i] != target[i]) {
+			WRITE_ONCE(entry[i], target[i]);
+			changed = true;
+		}
+	}
+
+	if (changed)
+		writer->ops->sync(writer);
+	return changed;
+}
+
+/*
+ * Update the STE/CD to the target configuration. The transition from the
+ * current entry to the target entry takes place over multiple steps that
+ * attempts to make the transition hitless if possible. This function takes care
+ * not to create a situation where the HW can perceive a corrupted entry. HW is
+ * only required to have a 64 bit atomicity with stores from the CPU, while
+ * entries are many 64 bit values big.
+ *
+ * The difference between the current value and the target value is analyzed to
+ * determine which of three updates are required - disruptive, hitless or no
+ * change.
+ *
+ * In the most general disruptive case we can make any update in three steps:
+ *  - Disrupting the entry (V=0)
+ *  - Fill now unused qwords, except qword 0 which contains V
+ *  - Make qword 0 have the final value and valid (V=1) with a single 64
+ *    bit store
+ *
+ * However this disrupts the HW while it is happening. There are several
+ * interesting cases where a STE/CD can be updated without disturbing the HW
+ * because only a small number of bits are changing (S1DSS, CONFIG, etc) or
+ * because the used bits don't intersect. We can detect this by calculating how
+ * many 64 bit values need update after adjusting the unused bits and skip the
+ * V=0 process. This relies on the IGNORED behavior described in the
+ * specification.
+ */
+static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
+				 __le64 *entry, const __le64 *target)
+{
+	unsigned int num_entry_qwords = writer->ops->num_entry_qwords;
+	__le64 unused_update[NUM_ENTRY_QWORDS];
+	u8 used_qword_diff;
+
+	used_qword_diff =
+		arm_smmu_entry_qword_diff(writer, entry, target, unused_update);
+	if (hweight8(used_qword_diff) == 1) {
+		/*
+		 * Only one qword needs its used bits to be changed. This is a
+		 * hitless update, update all bits the current STE is ignoring
+		 * to their new values, then update a single "critical qword" to
+		 * change the STE and finally 0 out any bits that are now unused
+		 * in the target configuration.
+		 */
+		unsigned int critical_qword_index = ffs(used_qword_diff) - 1;
+
+		/*
+		 * Skip writing unused bits in the critical qword since we'll be
+		 * writing it in the next step anyways. This can save a sync
+		 * when the only change is in that qword.
+		 */
+		unused_update[critical_qword_index] =
+			entry[critical_qword_index];
+		entry_set(writer, entry, unused_update, 0, num_entry_qwords);
+		entry_set(writer, entry, target, critical_qword_index, 1);
+		entry_set(writer, entry, target, 0, num_entry_qwords);
+	} else if (used_qword_diff) {
+		/*
+		 * At least two qwords need their inuse bits to be changed. This
+		 * requires a breaking update, zero the V bit, write all qwords
+		 * but 0, then set qword 0
+		 */
+		unused_update[0] = entry[0] & (~writer->ops->v_bit);
+		entry_set(writer, entry, unused_update, 0, 1);
+		entry_set(writer, entry, target, 1, num_entry_qwords - 1);
+		entry_set(writer, entry, target, 0, 1);
+	} else {
+		/*
+		 * No inuse bit changed. Sanity check that all unused bits are 0
+		 * in the entry. The target was already sanity checked by
+		 * compute_qword_diff().
+		 */
+		WARN_ON_ONCE(
+			entry_set(writer, entry, target, 0, num_entry_qwords));
+	}
+}
+
 static void arm_smmu_sync_cd(struct arm_smmu_master *master,
 			     int ssid, bool leaf)
 {
@@ -1238,50 +1387,126 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
 	WRITE_ONCE(*dst, cpu_to_le64(val));
 }
 
-static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
+struct arm_smmu_ste_writer {
+	struct arm_smmu_entry_writer writer;
+	u32 sid;
+};
+
+/*
+ * Based on the value of ent report which bits of the STE the HW will access. It
+ * would be nice if this was complete according to the spec, but minimally it
+ * has to capture the bits this driver uses.
+ */
+static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
 {
+	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
+
+	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
+	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
+		return;
+
+	/*
+	 * See 13.5 Summary of attribute/permission configuration fields for the
+	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
+	 * and S2 only.
+	 */
+	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
+	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
+	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
+	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
+		     STRTAB_STE_1_S1DSS_BYPASS))
+		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
+
+	used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
+	switch (cfg) {
+	case STRTAB_STE_0_CFG_ABORT:
+	case STRTAB_STE_0_CFG_BYPASS:
+		break;
+	case STRTAB_STE_0_CFG_S1_TRANS:
+		used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
+					    STRTAB_STE_0_S1CTXPTR_MASK |
+					    STRTAB_STE_0_S1CDMAX);
+		used_bits[1] |=
+			cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
+				    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
+				    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW);
+		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
+		used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
+		break;
+	case STRTAB_STE_0_CFG_S2_TRANS:
+		used_bits[1] |=
+			cpu_to_le64(STRTAB_STE_1_EATS);
+		used_bits[2] |=
+			cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
+				    STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
+				    STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
+		used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
+		break;
+
+	default:
+		memset(used_bits, 0xFF, sizeof(struct arm_smmu_ste));
+		WARN_ON(true);
+	}
+}
+
+static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer)
+{
+	struct arm_smmu_ste_writer *ste_writer =
+		container_of(writer, struct arm_smmu_ste_writer, writer);
 	struct arm_smmu_cmdq_ent cmd = {
 		.opcode	= CMDQ_OP_CFGI_STE,
 		.cfgi	= {
-			.sid	= sid,
+			.sid	= ste_writer->sid,
 			.leaf	= true,
 		},
 	};
 
-	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
+	arm_smmu_cmdq_issue_cmd_with_sync(writer->master->smmu, &cmd);
+}
+
+static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = {
+	.sync = arm_smmu_ste_writer_sync_entry,
+	.get_used = arm_smmu_get_ste_used,
+	.v_bit = cpu_to_le64(STRTAB_STE_0_V),
+	.num_entry_qwords = sizeof(struct arm_smmu_ste) / sizeof(u64),
+};
+
+static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
+			       struct arm_smmu_ste *ste,
+			       const struct arm_smmu_ste *target)
+{
+	struct arm_smmu_device *smmu = master->smmu;
+	struct arm_smmu_ste_writer ste_writer = {
+		.writer = {
+			.ops = &arm_smmu_ste_writer_ops,
+			.master = master,
+		},
+		.sid = sid,
+	};
+
+	arm_smmu_write_entry(&ste_writer.writer, ste->data, target->data);
+
+	/* It's likely that we'll want to use the new STE soon */
+	if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) {
+		struct arm_smmu_cmdq_ent
+			prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG,
+					 .prefetch = {
+						 .sid = sid,
+					 } };
+
+		arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
+	}
 }
 
 static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 				      struct arm_smmu_ste *dst)
 {
-	/*
-	 * This is hideously complicated, but we only really care about
-	 * three cases at the moment:
-	 *
-	 * 1. Invalid (all zero) -> bypass/fault (init)
-	 * 2. Bypass/fault -> translation/bypass (attach)
-	 * 3. Translation/bypass -> bypass/fault (detach)
-	 *
-	 * Given that we can't update the STE atomically and the SMMU
-	 * doesn't read the thing in a defined order, that leaves us
-	 * with the following maintenance requirements:
-	 *
-	 * 1. Update Config, return (init time STEs aren't live)
-	 * 2. Write everything apart from dword 0, sync, write dword 0, sync
-	 * 3. Update Config, sync
-	 */
-	u64 val = le64_to_cpu(dst->data[0]);
-	bool ste_live = false;
+	u64 val;
 	struct arm_smmu_device *smmu = master->smmu;
 	struct arm_smmu_ctx_desc_cfg *cd_table = NULL;
 	struct arm_smmu_s2_cfg *s2_cfg = NULL;
 	struct arm_smmu_domain *smmu_domain = master->domain;
-	struct arm_smmu_cmdq_ent prefetch_cmd = {
-		.opcode		= CMDQ_OP_PREFETCH_CFG,
-		.prefetch	= {
-			.sid	= sid,
-		},
-	};
+	struct arm_smmu_ste target = {};
 
 	if (smmu_domain) {
 		switch (smmu_domain->stage) {
@@ -1296,22 +1521,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		}
 	}
 
-	if (val & STRTAB_STE_0_V) {
-		switch (FIELD_GET(STRTAB_STE_0_CFG, val)) {
-		case STRTAB_STE_0_CFG_BYPASS:
-			break;
-		case STRTAB_STE_0_CFG_S1_TRANS:
-		case STRTAB_STE_0_CFG_S2_TRANS:
-			ste_live = true;
-			break;
-		case STRTAB_STE_0_CFG_ABORT:
-			BUG_ON(!disable_bypass);
-			break;
-		default:
-			BUG(); /* STE corruption */
-		}
-	}
-
 	/* Nuke the existing STE_0 value, as we're going to rewrite it */
 	val = STRTAB_STE_0_V;
 
@@ -1322,16 +1531,11 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		else
 			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
 
-		dst->data[0] = cpu_to_le64(val);
-		dst->data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
+		target.data[0] = cpu_to_le64(val);
+		target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
 						STRTAB_STE_1_SHCFG_INCOMING));
-		dst->data[2] = 0; /* Nuke the VMID */
-		/*
-		 * The SMMU can perform negative caching, so we must sync
-		 * the STE regardless of whether the old value was live.
-		 */
-		if (smmu)
-			arm_smmu_sync_ste_for_sid(smmu, sid);
+		target.data[2] = 0; /* Nuke the VMID */
+		arm_smmu_write_ste(master, sid, dst, &target);
 		return;
 	}
 
@@ -1339,8 +1543,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ?
 			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
 
-		BUG_ON(ste_live);
-		dst->data[1] = cpu_to_le64(
+		target.data[1] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
 			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
 			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
@@ -1349,7 +1552,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
 		    !master->stall_enabled)
-			dst->data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
+			target.data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
 		val |= (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
 			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
@@ -1358,8 +1561,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	}
 
 	if (s2_cfg) {
-		BUG_ON(ste_live);
-		dst->data[2] = cpu_to_le64(
+		target.data[2] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
 			 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
 #ifdef __BIG_ENDIAN
@@ -1368,23 +1570,17 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 			 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
 			 STRTAB_STE_2_S2R);
 
-		dst->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
+		target.data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
 
 		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
 	}
 
 	if (master->ats_enabled)
-		dst->data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
+		target.data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
 						 STRTAB_STE_1_EATS_TRANS));
 
-	arm_smmu_sync_ste_for_sid(smmu, sid);
-	/* See comment in arm_smmu_write_ctx_desc() */
-	WRITE_ONCE(dst->data[0], cpu_to_le64(val));
-	arm_smmu_sync_ste_for_sid(smmu, sid);
-
-	/* It's likely that we'll want to use the new STE soon */
-	if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH))
-		arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
+	target.data[0] = cpu_to_le64(val);
+	arm_smmu_write_ste(master, sid, dst, &target);
 }
 
 static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
@ 2024-02-06 15:12   ` Jason Gunthorpe
  0 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

As the comment in arm_smmu_write_strtab_ent() explains, this routine has
been limited to only work correctly in certain scenarios that the caller
must ensure. Generally the caller must put the STE into ABORT or BYPASS
before attempting to program it to something else.

The iommu core APIs would ideally expect the driver to do a hitless change
of iommu_domain in a number of cases:

 - RESV_DIRECT support wants IDENTITY -> DMA -> IDENTITY to be hitless
   for the RESV ranges

 - PASID upgrade has IDENTIY on the RID with no PASID then a PASID paging
   domain installed. The RID should not be impacted

 - PASID downgrade has IDENTIY on the RID and all PASID's removed.
   The RID should not be impacted

 - RID does PAGING -> BLOCKING with active PASID, PASID's should not be
   impacted

 - NESTING -> NESTING for carrying all the above hitless cases in a VM
   into the hypervisor. To comprehensively emulate the HW in a VM we should
   assume the VM OS is running logic like this and expecting hitless updates
   to be relayed to real HW.

For CD updates arm_smmu_write_ctx_desc() has a similar comment explaining
how limited it is, and the driver does have a need for hitless CD updates:

 - SMMUv3 BTM S1 ASID re-label

 - SVA mm release should change the CD to answert not-present to all
   requests without allowing logging (EPD0)

The next patches/series are going to start removing some of this logic
from the callers, and add more complex state combinations than currently.
At the end everything that can be hitless will be hitless, including all
of the above.

Introduce arm_smmu_write_entry() which will run through the multi-qword
programming sequence to avoid creating an incoherent 'torn' STE in the HW
caches. It automatically detects which of two algorithms to use:

1) The disruptive V=0 update described in the spec which disrupts the
   entry and does three syncs to make the change:
       - Write V=0 to QWORD 0
       - Write the entire STE except QWORD 0
       - Write QWORD 0

2) A hitless update algorithm that follows the same rational that the driver
   already uses. It is safe to change IGNORED bits that HW doesn't use:
       - Write the target value into all currently unused bits
       - Write a single QWORD, this makes the new STE live atomically
       - Ensure now unused bits are 0

The detection of which path to use and the implementation of the hitless
update rely on a "used bitmask" describing what bits the HW is actually
using based on the V/CFG/etc bits. This flows from the spec language,
typically indicated as IGNORED.

Knowing which bits the HW is using we can update the bits it does not use
and then compute how many QWORDS need to be changed. If only one qword
needs to be updated the hitless algorithm is possible.

Later patches will include CD updates in this mechanism so make the
implementation generic using a struct arm_smmu_entry_writer and struct
arm_smmu_entry_writer_ops to abstract the differences between STE and CD
to be plugged in.

At this point it generates the same sequence of updates as the current
code, except that zeroing the VMID on entry to BYPASS/ABORT will do an
extra sync (this seems to be an existing bug).

Going forward this will use a V=0 transition instead of cycling through
ABORT if a hitfull change is required. This seems more appropriate as ABORT
will fail DMAs without any logging, but dropping a DMA due to transient
V=0 is probably signaling a bug, so the C_BAD_STE is valuable.

Signed-off-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 330 ++++++++++++++++----
 1 file changed, 263 insertions(+), 67 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 0ffb1cf17e0b2e..f0b915567cbcdc 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -48,6 +48,21 @@ enum arm_smmu_msi_index {
 	ARM_SMMU_MAX_MSIS,
 };
 
+struct arm_smmu_entry_writer_ops;
+struct arm_smmu_entry_writer {
+	const struct arm_smmu_entry_writer_ops *ops;
+	struct arm_smmu_master *master;
+};
+
+struct arm_smmu_entry_writer_ops {
+	unsigned int num_entry_qwords;
+	__le64 v_bit;
+	void (*get_used)(const __le64 *entry, __le64 *used);
+	void (*sync)(struct arm_smmu_entry_writer *writer);
+};
+
+#define NUM_ENTRY_QWORDS (sizeof(struct arm_smmu_ste) / sizeof(u64))
+
 static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = {
 	[EVTQ_MSI_INDEX] = {
 		ARM_SMMU_EVTQ_IRQ_CFG0,
@@ -971,6 +986,140 @@ void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
 	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
 }
 
+/*
+ * Figure out if we can do a hitless update of entry to become target. Returns a
+ * bit mask where 1 indicates that qword needs to be set disruptively.
+ * unused_update is an intermediate value of entry that has unused bits set to
+ * their new values.
+ */
+static u8 arm_smmu_entry_qword_diff(struct arm_smmu_entry_writer *writer,
+				    const __le64 *entry, const __le64 *target,
+				    __le64 *unused_update)
+{
+	__le64 target_used[NUM_ENTRY_QWORDS] = {};
+	__le64 cur_used[NUM_ENTRY_QWORDS] = {};
+	u8 used_qword_diff = 0;
+	unsigned int i;
+
+	writer->ops->get_used(entry, cur_used);
+	writer->ops->get_used(target, target_used);
+
+	for (i = 0; i != writer->ops->num_entry_qwords; i++) {
+		/*
+		 * Check that masks are up to date; the make functions are not
+		 * allowed to set a bit to 1 if the used function doesn't say it
+		 * is used.
+		 */
+		WARN_ON_ONCE(target[i] & ~target_used[i]);
+
+		/* Bits can change because they are not currently being used */
+		unused_update[i] = (entry[i] & cur_used[i]) |
+				   (target[i] & ~cur_used[i]);
+		/*
+		 * Each bit indicates that a used bit in a qword needs to be
+		 * changed after unused_update is applied.
+		 */
+		if ((unused_update[i] & target_used[i]) != target[i])
+			used_qword_diff |= 1 << i;
+	}
+	return used_qword_diff;
+}
+
+static bool entry_set(struct arm_smmu_entry_writer *writer, __le64 *entry,
+		      const __le64 *target, unsigned int start,
+		      unsigned int len)
+{
+	bool changed = false;
+	unsigned int i;
+
+	for (i = start; len != 0; len--, i++) {
+		if (entry[i] != target[i]) {
+			WRITE_ONCE(entry[i], target[i]);
+			changed = true;
+		}
+	}
+
+	if (changed)
+		writer->ops->sync(writer);
+	return changed;
+}
+
+/*
+ * Update the STE/CD to the target configuration. The transition from the
+ * current entry to the target entry takes place over multiple steps that
+ * attempt to make the transition hitless if possible. This function takes care
+ * not to create a situation where the HW can perceive a corrupted entry. HW is
+ * only required to have 64 bit atomicity with stores from the CPU, while
+ * entries are many 64 bit values big.
+ *
+ * The difference between the current value and the target value is analyzed to
+ * determine which of three updates are required - disruptive, hitless or no
+ * change.
+ *
+ * In the most general disruptive case we can make any update in three steps:
+ *  - Disrupting the entry (V=0)
+ *  - Fill now unused qwords, except qword 0 which contains V
+ *  - Make qword 0 have the final value and valid (V=1) with a single 64
+ *    bit store
+ *
+ * However this disrupts the HW while it is happening. There are several
+ * interesting cases where a STE/CD can be updated without disturbing the HW
+ * because only a small number of bits are changing (S1DSS, CONFIG, etc) or
+ * because the used bits don't intersect. We can detect this by calculating how
+ * many 64 bit values need update after adjusting the unused bits and skip the
+ * V=0 process. This relies on the IGNORED behavior described in the
+ * specification.
+ */
+static void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer,
+				 __le64 *entry, const __le64 *target)
+{
+	unsigned int num_entry_qwords = writer->ops->num_entry_qwords;
+	__le64 unused_update[NUM_ENTRY_QWORDS];
+	u8 used_qword_diff;
+
+	used_qword_diff =
+		arm_smmu_entry_qword_diff(writer, entry, target, unused_update);
+	if (hweight8(used_qword_diff) == 1) {
+		/*
+		 * Only one qword needs its used bits to be changed. This is a
+		 * hitless update, update all bits the current STE is ignoring
+		 * to their new values, then update a single "critical qword" to
+		 * change the STE and finally 0 out any bits that are now unused
+		 * in the target configuration.
+		 */
+		unsigned int critical_qword_index = ffs(used_qword_diff) - 1;
+
+		/*
+		 * Skip writing unused bits in the critical qword since we'll be
+		 * writing it in the next step anyways. This can save a sync
+		 * when the only change is in that qword.
+		 */
+		unused_update[critical_qword_index] =
+			entry[critical_qword_index];
+		entry_set(writer, entry, unused_update, 0, num_entry_qwords);
+		entry_set(writer, entry, target, critical_qword_index, 1);
+		entry_set(writer, entry, target, 0, num_entry_qwords);
+	} else if (used_qword_diff) {
+		/*
+		 * At least two qwords need their inuse bits to be changed. This
+		 * requires a breaking update, zero the V bit, write all qwords
+		 * but 0, then set qword 0
+		 */
+		unused_update[0] = entry[0] & (~writer->ops->v_bit);
+		entry_set(writer, entry, unused_update, 0, 1);
+		entry_set(writer, entry, target, 1, num_entry_qwords - 1);
+		entry_set(writer, entry, target, 0, 1);
+	} else {
+		/*
+		 * No inuse bit changed. Sanity check that all unused bits are 0
+		 * in the entry. The target was already sanity checked by
+		 * arm_smmu_entry_qword_diff().
+		 */
+		WARN_ON_ONCE(
+			entry_set(writer, entry, target, 0, num_entry_qwords));
+	}
+}
+
 static void arm_smmu_sync_cd(struct arm_smmu_master *master,
 			     int ssid, bool leaf)
 {
@@ -1238,50 +1387,126 @@ arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc)
 	WRITE_ONCE(*dst, cpu_to_le64(val));
 }
 
-static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid)
+struct arm_smmu_ste_writer {
+	struct arm_smmu_entry_writer writer;
+	u32 sid;
+};
+
+/*
+ * Based on the value of ent, report which bits of the STE the HW will access. It
+ * would be nice if this was complete according to the spec, but minimally it
+ * has to capture the bits this driver uses.
+ */
+static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
 {
+	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
+
+	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
+	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
+		return;
+
+	/*
+	 * See 13.5 Summary of attribute/permission configuration fields for the
+	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
+	 * and S2 only.
+	 */
+	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
+	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
+	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
+	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
+		     STRTAB_STE_1_S1DSS_BYPASS))
+		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
+
+	used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
+	switch (cfg) {
+	case STRTAB_STE_0_CFG_ABORT:
+	case STRTAB_STE_0_CFG_BYPASS:
+		break;
+	case STRTAB_STE_0_CFG_S1_TRANS:
+		used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
+					    STRTAB_STE_0_S1CTXPTR_MASK |
+					    STRTAB_STE_0_S1CDMAX);
+		used_bits[1] |=
+			cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
+				    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
+				    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW);
+		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
+		used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
+		break;
+	case STRTAB_STE_0_CFG_S2_TRANS:
+		used_bits[1] |=
+			cpu_to_le64(STRTAB_STE_1_EATS);
+		used_bits[2] |=
+			cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
+				    STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
+				    STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
+		used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
+		break;
+
+	default:
+		memset(used_bits, 0xFF, sizeof(struct arm_smmu_ste));
+		WARN_ON(true);
+	}
+}
+
+static void arm_smmu_ste_writer_sync_entry(struct arm_smmu_entry_writer *writer)
+{
+	struct arm_smmu_ste_writer *ste_writer =
+		container_of(writer, struct arm_smmu_ste_writer, writer);
 	struct arm_smmu_cmdq_ent cmd = {
 		.opcode	= CMDQ_OP_CFGI_STE,
 		.cfgi	= {
-			.sid	= sid,
+			.sid	= ste_writer->sid,
 			.leaf	= true,
 		},
 	};
 
-	arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd);
+	arm_smmu_cmdq_issue_cmd_with_sync(writer->master->smmu, &cmd);
+}
+
+static const struct arm_smmu_entry_writer_ops arm_smmu_ste_writer_ops = {
+	.sync = arm_smmu_ste_writer_sync_entry,
+	.get_used = arm_smmu_get_ste_used,
+	.v_bit = cpu_to_le64(STRTAB_STE_0_V),
+	.num_entry_qwords = sizeof(struct arm_smmu_ste) / sizeof(u64),
+};
+
+static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
+			       struct arm_smmu_ste *ste,
+			       const struct arm_smmu_ste *target)
+{
+	struct arm_smmu_device *smmu = master->smmu;
+	struct arm_smmu_ste_writer ste_writer = {
+		.writer = {
+			.ops = &arm_smmu_ste_writer_ops,
+			.master = master,
+		},
+		.sid = sid,
+	};
+
+	arm_smmu_write_entry(&ste_writer.writer, ste->data, target->data);
+
+	/* It's likely that we'll want to use the new STE soon */
+	if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) {
+		struct arm_smmu_cmdq_ent
+			prefetch_cmd = { .opcode = CMDQ_OP_PREFETCH_CFG,
+					 .prefetch = {
+						 .sid = sid,
+					 } };
+
+		arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
+	}
 }
 
 static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 				      struct arm_smmu_ste *dst)
 {
-	/*
-	 * This is hideously complicated, but we only really care about
-	 * three cases at the moment:
-	 *
-	 * 1. Invalid (all zero) -> bypass/fault (init)
-	 * 2. Bypass/fault -> translation/bypass (attach)
-	 * 3. Translation/bypass -> bypass/fault (detach)
-	 *
-	 * Given that we can't update the STE atomically and the SMMU
-	 * doesn't read the thing in a defined order, that leaves us
-	 * with the following maintenance requirements:
-	 *
-	 * 1. Update Config, return (init time STEs aren't live)
-	 * 2. Write everything apart from dword 0, sync, write dword 0, sync
-	 * 3. Update Config, sync
-	 */
-	u64 val = le64_to_cpu(dst->data[0]);
-	bool ste_live = false;
+	u64 val;
 	struct arm_smmu_device *smmu = master->smmu;
 	struct arm_smmu_ctx_desc_cfg *cd_table = NULL;
 	struct arm_smmu_s2_cfg *s2_cfg = NULL;
 	struct arm_smmu_domain *smmu_domain = master->domain;
-	struct arm_smmu_cmdq_ent prefetch_cmd = {
-		.opcode		= CMDQ_OP_PREFETCH_CFG,
-		.prefetch	= {
-			.sid	= sid,
-		},
-	};
+	struct arm_smmu_ste target = {};
 
 	if (smmu_domain) {
 		switch (smmu_domain->stage) {
@@ -1296,22 +1521,6 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		}
 	}
 
-	if (val & STRTAB_STE_0_V) {
-		switch (FIELD_GET(STRTAB_STE_0_CFG, val)) {
-		case STRTAB_STE_0_CFG_BYPASS:
-			break;
-		case STRTAB_STE_0_CFG_S1_TRANS:
-		case STRTAB_STE_0_CFG_S2_TRANS:
-			ste_live = true;
-			break;
-		case STRTAB_STE_0_CFG_ABORT:
-			BUG_ON(!disable_bypass);
-			break;
-		default:
-			BUG(); /* STE corruption */
-		}
-	}
-
 	/* Nuke the existing STE_0 value, as we're going to rewrite it */
 	val = STRTAB_STE_0_V;
 
@@ -1322,16 +1531,11 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		else
 			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
 
-		dst->data[0] = cpu_to_le64(val);
-		dst->data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
+		target.data[0] = cpu_to_le64(val);
+		target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
 						STRTAB_STE_1_SHCFG_INCOMING));
-		dst->data[2] = 0; /* Nuke the VMID */
-		/*
-		 * The SMMU can perform negative caching, so we must sync
-		 * the STE regardless of whether the old value was live.
-		 */
-		if (smmu)
-			arm_smmu_sync_ste_for_sid(smmu, sid);
+		target.data[2] = 0; /* Nuke the VMID */
+		arm_smmu_write_ste(master, sid, dst, &target);
 		return;
 	}
 
@@ -1339,8 +1543,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 		u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ?
 			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
 
-		BUG_ON(ste_live);
-		dst->data[1] = cpu_to_le64(
+		target.data[1] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
 			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
 			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
@@ -1349,7 +1552,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 
 		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
 		    !master->stall_enabled)
-			dst->data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
+			target.data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
 
 		val |= (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
 			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
@@ -1358,8 +1561,7 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	}
 
 	if (s2_cfg) {
-		BUG_ON(ste_live);
-		dst->data[2] = cpu_to_le64(
+		target.data[2] = cpu_to_le64(
 			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
 			 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
 #ifdef __BIG_ENDIAN
@@ -1368,23 +1570,17 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 			 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
 			 STRTAB_STE_2_S2R);
 
-		dst->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
+		target.data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
 
 		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
 	}
 
 	if (master->ats_enabled)
-		dst->data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
+		target.data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
 						 STRTAB_STE_1_EATS_TRANS));
 
-	arm_smmu_sync_ste_for_sid(smmu, sid);
-	/* See comment in arm_smmu_write_ctx_desc() */
-	WRITE_ONCE(dst->data[0], cpu_to_le64(val));
-	arm_smmu_sync_ste_for_sid(smmu, sid);
-
-	/* It's likely that we'll want to use the new STE soon */
-	if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH))
-		arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd);
+	target.data[0] = cpu_to_le64(val);
+	arm_smmu_write_ste(master, sid, dst, &target);
 }
 
 static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 02/17] iommu/arm-smmu-v3: Consolidate the STE generation for abort/bypass
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

This allows writing the flow of arm_smmu_write_strtab_ent() around abort
and bypass domains more naturally.

Note that the core code no longer supplies NULL domains, though there is
still a flow in the driver that ends up in arm_smmu_write_strtab_ent() with
NULL. A later patch will remove it.

Remove the duplicate calculation of the STE in arm_smmu_init_bypass_stes()
and remove the force parameter. arm_smmu_rmr_install_bypass_ste() can now
simply invoke arm_smmu_make_bypass_ste() directly.

Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 97 ++++++++++++---------
 1 file changed, 55 insertions(+), 42 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index f0b915567cbcdc..6123e5ad95822c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1498,6 +1498,24 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
 	}
 }
 
+static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
+{
+	memset(target, 0, sizeof(*target));
+	target->data[0] = cpu_to_le64(
+		STRTAB_STE_0_V |
+		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT));
+}
+
+static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target)
+{
+	memset(target, 0, sizeof(*target));
+	target->data[0] = cpu_to_le64(
+		STRTAB_STE_0_V |
+		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS));
+	target->data[1] = cpu_to_le64(
+		FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
+}
+
 static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 				      struct arm_smmu_ste *dst)
 {
@@ -1508,37 +1526,31 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	struct arm_smmu_domain *smmu_domain = master->domain;
 	struct arm_smmu_ste target = {};
 
-	if (smmu_domain) {
-		switch (smmu_domain->stage) {
-		case ARM_SMMU_DOMAIN_S1:
-			cd_table = &master->cd_table;
-			break;
-		case ARM_SMMU_DOMAIN_S2:
-			s2_cfg = &smmu_domain->s2_cfg;
-			break;
-		default:
-			break;
-		}
+	if (!smmu_domain) {
+		if (disable_bypass)
+			arm_smmu_make_abort_ste(&target);
+		else
+			arm_smmu_make_bypass_ste(&target);
+		arm_smmu_write_ste(master, sid, dst, &target);
+		return;
+	}
+
+	switch (smmu_domain->stage) {
+	case ARM_SMMU_DOMAIN_S1:
+		cd_table = &master->cd_table;
+		break;
+	case ARM_SMMU_DOMAIN_S2:
+		s2_cfg = &smmu_domain->s2_cfg;
+		break;
+	case ARM_SMMU_DOMAIN_BYPASS:
+		arm_smmu_make_bypass_ste(&target);
+		arm_smmu_write_ste(master, sid, dst, &target);
+		return;
 	}
 
 	/* Nuke the existing STE_0 value, as we're going to rewrite it */
 	val = STRTAB_STE_0_V;
 
-	/* Bypass/fault */
-	if (!smmu_domain || !(cd_table || s2_cfg)) {
-		if (!smmu_domain && disable_bypass)
-			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
-		else
-			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
-
-		target.data[0] = cpu_to_le64(val);
-		target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
-						STRTAB_STE_1_SHCFG_INCOMING));
-		target.data[2] = 0; /* Nuke the VMID */
-		arm_smmu_write_ste(master, sid, dst, &target);
-		return;
-	}
-
 	if (cd_table) {
 		u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ?
 			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
@@ -1583,22 +1595,20 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 	arm_smmu_write_ste(master, sid, dst, &target);
 }
 
+/*
+ * This can safely directly manipulate the STE memory without a sync sequence
+ * because the STE table has not been installed in the SMMU yet.
+ */
 static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab,
-				      unsigned int nent, bool force)
+				      unsigned int nent)
 {
 	unsigned int i;
-	u64 val = STRTAB_STE_0_V;
-
-	if (disable_bypass && !force)
-		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
-	else
-		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
 
 	for (i = 0; i < nent; ++i) {
-		strtab->data[0] = cpu_to_le64(val);
-		strtab->data[1] = cpu_to_le64(FIELD_PREP(
-			STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
-		strtab->data[2] = 0;
+		if (disable_bypass)
+			arm_smmu_make_abort_ste(strtab);
+		else
+			arm_smmu_make_bypass_ste(strtab);
 		strtab++;
 	}
 }
@@ -1626,7 +1636,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
 		return -ENOMEM;
 	}
 
-	arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT, false);
+	arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
 	arm_smmu_write_strtab_l1_desc(strtab, desc);
 	return 0;
 }
@@ -3245,7 +3255,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
 	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
 	cfg->strtab_base_cfg = reg;
 
-	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents, false);
+	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents);
 	return 0;
 }
 
@@ -3956,7 +3966,6 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu)
 	iort_get_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
 
 	list_for_each_entry(e, &rmr_list, list) {
-		struct arm_smmu_ste *step;
 		struct iommu_iort_rmr_data *rmr;
 		int ret, i;
 
@@ -3969,8 +3978,12 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu)
 				continue;
 			}
 
-			step = arm_smmu_get_step_for_sid(smmu, rmr->sids[i]);
-			arm_smmu_init_bypass_stes(step, 1, true);
+			/*
+			 * STE table is not programmed to HW, see
+			 * arm_smmu_init_bypass_stes()
+			 */
+			arm_smmu_make_bypass_ste(
+				arm_smmu_get_step_for_sid(smmu, rmr->sids[i]));
 		}
 	}
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 03/17] iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste()
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Logically arm_smmu_init_strtab() is the function that allocates and
populates the stream table with the initial value of the STEs. After this
function returns the stream table should be fully ready.

arm_smmu_rmr_install_bypass_ste() adjusts the initial stream table to force
any SIDs that the FW says have IOMMU_RESV_DIRECT to use bypass. This
ensures there is no disruption to the identity mapping during boot.

Put arm_smmu_rmr_install_bypass_ste() into arm_smmu_init_strtab(); it
already executes immediately after arm_smmu_init_strtab().

No functional change intended.

Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 6123e5ad95822c..2ab36dcf7c61f5 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -101,6 +101,8 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ 0, NULL},
 };
 
+static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu);
+
 static void parse_driver_options(struct arm_smmu_device *smmu)
 {
 	int i = 0;
@@ -3256,6 +3258,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
 	cfg->strtab_base_cfg = reg;
 
 	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents);
+
 	return 0;
 }
 
@@ -3279,6 +3282,8 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
 
 	ida_init(&smmu->vmid_map);
 
+	/* Check for RMRs and install bypass STEs if any */
+	arm_smmu_rmr_install_bypass_ste(smmu);
 	return 0;
 }
 
@@ -4073,9 +4078,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
 	/* Record our private device structure */
 	platform_set_drvdata(pdev, smmu);
 
-	/* Check for RMRs and install bypass STEs if any */
-	arm_smmu_rmr_install_bypass_ste(smmu);
-
 	/* Reset the device */
 	ret = arm_smmu_device_reset(smmu, bypass);
 	if (ret)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 04/17] iommu/arm-smmu-v3: Move the STE generation for S1 and S2 domains into functions
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

This is preparation to move the STE calculation higher up into the call
chain and remove arm_smmu_write_strtab_ent(). These new functions will be
called directly from attach_dev.

Reviewed-by: Moritz Fischer <mdf@kernel.org>
Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 136 ++++++++++++--------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |   1 +
 2 files changed, 84 insertions(+), 53 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2ab36dcf7c61f5..893df3e76400ec 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1518,13 +1518,89 @@ static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target)
 		FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
 }
 
+static void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
+				      struct arm_smmu_master *master)
+{
+	struct arm_smmu_ctx_desc_cfg *cd_table = &master->cd_table;
+	struct arm_smmu_device *smmu = master->smmu;
+
+	memset(target, 0, sizeof(*target));
+	target->data[0] = cpu_to_le64(
+		STRTAB_STE_0_V |
+		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
+		FIELD_PREP(STRTAB_STE_0_S1FMT, cd_table->s1fmt) |
+		(cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
+		FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax));
+
+	target->data[1] = cpu_to_le64(
+		FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
+		FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
+		FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
+		FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
+		((smmu->features & ARM_SMMU_FEAT_STALLS &&
+		  !master->stall_enabled) ?
+			 STRTAB_STE_1_S1STALLD :
+			 0) |
+		FIELD_PREP(STRTAB_STE_1_EATS,
+			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0));
+
+	if (smmu->features & ARM_SMMU_FEAT_E2H) {
+		/*
+		 * To support BTM the streamworld needs to match the
+		 * configuration of the CPU so that the ASID broadcasts are
+		 * properly matched. This means either S/NS-EL2-E2H (hypervisor)
+		 * or NS-EL1 (guest). Since an SVA domain can be installed in a
+		 * PASID this should always use a BTM compatible configuration
+		 * if the HW supports it.
+		 */
+		target->data[1] |= cpu_to_le64(
+			FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_EL2));
+	} else {
+		target->data[1] |= cpu_to_le64(
+			FIELD_PREP(STRTAB_STE_1_STRW, STRTAB_STE_1_STRW_NSEL1));
+
+		/*
+		 * VMID 0 is reserved for stage-2 bypass EL1 STEs, see
+		 * arm_smmu_domain_alloc_id()
+		 */
+		target->data[2] =
+			cpu_to_le64(FIELD_PREP(STRTAB_STE_2_S2VMID, 0));
+	}
+}
+
+static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
+					struct arm_smmu_master *master,
+					struct arm_smmu_domain *smmu_domain)
+{
+	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
+
+	memset(target, 0, sizeof(*target));
+	target->data[0] = cpu_to_le64(
+		STRTAB_STE_0_V |
+		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
+
+	target->data[1] = cpu_to_le64(
+		FIELD_PREP(STRTAB_STE_1_EATS,
+			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
+		FIELD_PREP(STRTAB_STE_1_SHCFG,
+			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
+
+	target->data[2] = cpu_to_le64(
+		FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
+		FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
+		STRTAB_STE_2_S2AA64 |
+#ifdef __BIG_ENDIAN
+		STRTAB_STE_2_S2ENDI |
+#endif
+		STRTAB_STE_2_S2PTW |
+		STRTAB_STE_2_S2R);
+
+	target->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
+}
+
 static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 				      struct arm_smmu_ste *dst)
 {
-	u64 val;
-	struct arm_smmu_device *smmu = master->smmu;
-	struct arm_smmu_ctx_desc_cfg *cd_table = NULL;
-	struct arm_smmu_s2_cfg *s2_cfg = NULL;
 	struct arm_smmu_domain *smmu_domain = master->domain;
 	struct arm_smmu_ste target = {};
 
@@ -1539,61 +1615,15 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
 
 	switch (smmu_domain->stage) {
 	case ARM_SMMU_DOMAIN_S1:
-		cd_table = &master->cd_table;
+		arm_smmu_make_cdtable_ste(&target, master);
 		break;
 	case ARM_SMMU_DOMAIN_S2:
-		s2_cfg = &smmu_domain->s2_cfg;
+		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
 		break;
 	case ARM_SMMU_DOMAIN_BYPASS:
 		arm_smmu_make_bypass_ste(&target);
-		arm_smmu_write_ste(master, sid, dst, &target);
-		return;
+		break;
 	}
-
-	/* Nuke the existing STE_0 value, as we're going to rewrite it */
-	val = STRTAB_STE_0_V;
-
-	if (cd_table) {
-		u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ?
-			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
-
-		target.data[1] = cpu_to_le64(
-			 FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) |
-			 FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) |
-			 FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) |
-			 FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) |
-			 FIELD_PREP(STRTAB_STE_1_STRW, strw));
-
-		if (smmu->features & ARM_SMMU_FEAT_STALLS &&
-		    !master->stall_enabled)
-			target.data[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD);
-
-		val |= (cd_table->cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) |
-			FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) |
-			FIELD_PREP(STRTAB_STE_0_S1CDMAX, cd_table->s1cdmax) |
-			FIELD_PREP(STRTAB_STE_0_S1FMT, cd_table->s1fmt);
-	}
-
-	if (s2_cfg) {
-		target.data[2] = cpu_to_le64(
-			 FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
-			 FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
-#ifdef __BIG_ENDIAN
-			 STRTAB_STE_2_S2ENDI |
-#endif
-			 STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 |
-			 STRTAB_STE_2_S2R);
-
-		target.data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
-
-		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS);
-	}
-
-	if (master->ats_enabled)
-		target.data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS,
-						 STRTAB_STE_1_EATS_TRANS));
-
-	target.data[0] = cpu_to_le64(val);
 	arm_smmu_write_ste(master, sid, dst, &target);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 65fb388d51734d..53695dbc9b33f3 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -249,6 +249,7 @@ struct arm_smmu_ste {
 #define STRTAB_STE_1_STRW_EL2		2UL
 
 #define STRTAB_STE_1_SHCFG		GENMASK_ULL(45, 44)
+#define STRTAB_STE_1_SHCFG_NON_SHARABLE	0UL
 #define STRTAB_STE_1_SHCFG_INCOMING	1UL
 
 #define STRTAB_STE_2_S2VMID		GENMASK_ULL(15, 0)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 05/17] iommu/arm-smmu-v3: Build the whole STE in arm_smmu_make_s2_domain_ste()
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Half the code was living in arm_smmu_domain_finalise_s2(); just move it
here and take the values directly from the pgtbl_ops instead of storing
copies.

Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 ++++++++++++---------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 --
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 893df3e76400ec..417b2c877ff311 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1573,6 +1573,11 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 					struct arm_smmu_domain *smmu_domain)
 {
 	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
+	const struct io_pgtable_cfg *pgtbl_cfg =
+		&io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg;
+	typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr =
+		&pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
+	u64 vtcr_val;
 
 	memset(target, 0, sizeof(*target));
 	target->data[0] = cpu_to_le64(
@@ -1585,9 +1590,16 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 		FIELD_PREP(STRTAB_STE_1_SHCFG,
 			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
 
+	vtcr_val = FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, vtcr->irgn) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, vtcr->orgn) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps);
 	target->data[2] = cpu_to_le64(
 		FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
-		FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
+		FIELD_PREP(STRTAB_STE_2_VTCR, vtcr_val) |
 		STRTAB_STE_2_S2AA64 |
 #ifdef __BIG_ENDIAN
 		STRTAB_STE_2_S2ENDI |
@@ -1595,7 +1607,8 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 		STRTAB_STE_2_S2PTW |
 		STRTAB_STE_2_S2R);
 
-	target->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
+	target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s2_cfg.vttbr &
+				      STRTAB_STE_3_S2TTB_MASK);
 }
 
 static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
@@ -2355,7 +2368,6 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 	int vmid;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
-	typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr;
 
 	/* Reserve VMID 0 for stage-2 bypass STEs */
 	vmid = ida_alloc_range(&smmu->vmid_map, 1, (1 << smmu->vmid_bits) - 1,
@@ -2363,16 +2375,7 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 	if (vmid < 0)
 		return vmid;
 
-	vtcr = &pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
 	cfg->vmid	= (u16)vmid;
-	cfg->vttbr	= pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
-	cfg->vtcr	= FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, vtcr->irgn) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, vtcr->orgn) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps);
 	return 0;
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 53695dbc9b33f3..cbf4b57719b7b9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -610,8 +610,6 @@ struct arm_smmu_ctx_desc_cfg {
 
 struct arm_smmu_s2_cfg {
 	u16				vmid;
-	u64				vttbr;
-	u64				vtcr;
 };
 
 struct arm_smmu_strtab_cfg {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 05/17] iommu/arm-smmu-v3: Build the whole STE in arm_smmu_make_s2_domain_ste()
@ 2024-02-06 15:12   ` Jason Gunthorpe
  0 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Half the code was living in arm_smmu_domain_finalise_s2(), just move it
here and take the values directly from the pgtbl_ops instead of storing
copies.

Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 27 ++++++++++++---------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  2 --
 2 files changed, 15 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 893df3e76400ec..417b2c877ff311 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1573,6 +1573,11 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 					struct arm_smmu_domain *smmu_domain)
 {
 	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
+	const struct io_pgtable_cfg *pgtbl_cfg =
+		&io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops)->cfg;
+	typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr =
+		&pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
+	u64 vtcr_val;
 
 	memset(target, 0, sizeof(*target));
 	target->data[0] = cpu_to_le64(
@@ -1585,9 +1590,16 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 		FIELD_PREP(STRTAB_STE_1_SHCFG,
 			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
 
+	vtcr_val = FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, vtcr->irgn) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, vtcr->orgn) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) |
+		   FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps);
 	target->data[2] = cpu_to_le64(
 		FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) |
-		FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) |
+		FIELD_PREP(STRTAB_STE_2_VTCR, vtcr_val) |
 		STRTAB_STE_2_S2AA64 |
 #ifdef __BIG_ENDIAN
 		STRTAB_STE_2_S2ENDI |
@@ -1595,7 +1607,8 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 		STRTAB_STE_2_S2PTW |
 		STRTAB_STE_2_S2R);
 
-	target->data[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK);
+	target->data[3] = cpu_to_le64(pgtbl_cfg->arm_lpae_s2_cfg.vttbr &
+				      STRTAB_STE_3_S2TTB_MASK);
 }
 
 static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
@@ -2355,7 +2368,6 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 	int vmid;
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
-	typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr;
 
 	/* Reserve VMID 0 for stage-2 bypass STEs */
 	vmid = ida_alloc_range(&smmu->vmid_map, 1, (1 << smmu->vmid_bits) - 1,
@@ -2363,16 +2375,7 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 	if (vmid < 0)
 		return vmid;
 
-	vtcr = &pgtbl_cfg->arm_lpae_s2_cfg.vtcr;
 	cfg->vmid	= (u16)vmid;
-	cfg->vttbr	= pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
-	cfg->vtcr	= FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, vtcr->irgn) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, vtcr->orgn) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) |
-			  FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps);
 	return 0;
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 53695dbc9b33f3..cbf4b57719b7b9 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -610,8 +610,6 @@ struct arm_smmu_ctx_desc_cfg {
 
 struct arm_smmu_s2_cfg {
 	u16				vmid;
-	u64				vttbr;
-	u64				vtcr;
 };
 
 struct arm_smmu_strtab_cfg {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 06/17] iommu/arm-smmu-v3: Hold arm_smmu_asid_lock during all of attach_dev
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

The BTM support wants to be able to change the ASID of any smmu_domain.
When it goes to do this it holds the arm_smmu_asid_lock and iterates over
the target domain's devices list.

During attach of a S1 domain we must ensure that the devices list and
CD are in sync, otherwise we could miss CD updates or a parallel CD update
could push an out of date CD.

This is pretty complicated, and almost works today because
arm_smmu_detach_dev() removes the master from the linked list before
working on the CD entries, preventing parallel update of the CD.

However, it does have an issue where the CD can remain programmed while the
domain appears to be unattached. arm_smmu_share_asid() will then not clear
any CD entries and will concurrently install its own CD entry with the same
ASID. This creates a small race window where the IOMMU can see two ASIDs
pointing to different translations.

Solve this by wrapping most of the attach flow in the
arm_smmu_asid_lock. This locks more than is strictly needed, in order to
prepare for the next patch, which will reorganize the order of the linked
list, STE and CD changes.

Move arm_smmu_detach_dev() until after we have initialized the domain so
that the lock is held for less time.
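
To make the rule concrete, here is a rough sketch (illustrative only, not
code from this patch) of the two sides that now serialize on
arm_smmu_asid_lock:

/*
 * Illustrative sketch, simplified from the description above.
 *
 * BTM/ASID-change side: walks the domain's devices list while
 * holding arm_smmu_asid_lock.
 */
mutex_lock(&arm_smmu_asid_lock);
/* ... allocate a new ASID, then rewrite the CD for every master
 * on smmu_domain->devices ... */
mutex_unlock(&arm_smmu_asid_lock);

/*
 * Attach side after this patch: hold the same lock across the
 * window where the devices list, STE and CD are inconsistent.
 */
mutex_lock(&arm_smmu_asid_lock);
arm_smmu_detach_dev(master);
/* ... add to smmu_domain->devices, write the CD, install the STE ... */
mutex_unlock(&arm_smmu_asid_lock);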

Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 ++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 417b2c877ff311..1229545ae6db4e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2639,8 +2639,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		return -EBUSY;
 	}
 
-	arm_smmu_detach_dev(master);
-
 	mutex_lock(&smmu_domain->init_mutex);
 
 	if (!smmu_domain->smmu) {
@@ -2655,6 +2653,16 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	if (ret)
 		return ret;
 
+	/*
+	 * Prevent arm_smmu_share_asid() from trying to change the ASID
+	 * of either the old or new domain while we are working on it.
+	 * This allows the STE and the smmu_domain->devices list to
+	 * be inconsistent during this routine.
+	 */
+	mutex_lock(&arm_smmu_asid_lock);
+
+	arm_smmu_detach_dev(master);
+
 	master->domain = smmu_domain;
 
 	/*
@@ -2680,13 +2688,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 			}
 		}
 
-		/*
-		 * Prevent SVA from concurrently modifying the CD or writing to
-		 * the CD entry
-		 */
-		mutex_lock(&arm_smmu_asid_lock);
 		ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd);
-		mutex_unlock(&arm_smmu_asid_lock);
 		if (ret) {
 			master->domain = NULL;
 			goto out_list_del;
@@ -2696,13 +2698,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	arm_smmu_install_ste_for_dev(master);
 
 	arm_smmu_enable_ats(master);
-	return 0;
+	goto out_unlock;
 
 out_list_del:
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_del(&master->domain_head);
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
+out_unlock:
+	mutex_unlock(&arm_smmu_asid_lock);
 	return ret;
 }
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 07/17] iommu/arm-smmu-v3: Compute the STE only once for each master
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Currently arm_smmu_install_ste_for_dev() iterates over every SID and
computes from scratch an identical STE. Every SID should have the same STE
contents. Turn this inside out so that the STE is supplied by the caller
and arm_smmu_install_ste_for_dev() simply installs it to every SID.

This is possible now that generating the STE value no longer dictates what
sequence should be used to program it.

This allows splitting the STE calculation up according to the call site,
which following patches will make use of, and removes the confusing NULL
domain special case that only supported arm_smmu_detach_dev().
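
The resulting call pattern, sketched for illustration (condensed from the
diff below):

	struct arm_smmu_ste target;

	/* The caller computes the STE once for this master ... */
	arm_smmu_make_cdtable_ste(&target, master);
	/* ... and the helper only copies it into every SID's slot. */
	arm_smmu_install_ste_for_dev(master, &target);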

Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-by: Mostafa Saleh <smostafa@google.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 57 ++++++++-------------
 1 file changed, 22 insertions(+), 35 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 1229545ae6db4e..1138e868c4d73e 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -1611,35 +1611,6 @@ static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
 				      STRTAB_STE_3_S2TTB_MASK);
 }
 
-static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
-				      struct arm_smmu_ste *dst)
-{
-	struct arm_smmu_domain *smmu_domain = master->domain;
-	struct arm_smmu_ste target = {};
-
-	if (!smmu_domain) {
-		if (disable_bypass)
-			arm_smmu_make_abort_ste(&target);
-		else
-			arm_smmu_make_bypass_ste(&target);
-		arm_smmu_write_ste(master, sid, dst, &target);
-		return;
-	}
-
-	switch (smmu_domain->stage) {
-	case ARM_SMMU_DOMAIN_S1:
-		arm_smmu_make_cdtable_ste(&target, master);
-		break;
-	case ARM_SMMU_DOMAIN_S2:
-		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
-		break;
-	case ARM_SMMU_DOMAIN_BYPASS:
-		arm_smmu_make_bypass_ste(&target);
-		break;
-	}
-	arm_smmu_write_ste(master, sid, dst, &target);
-}
-
 /*
  * This can safely directly manipulate the STE memory without a sync sequence
  * because the STE table has not been installed in the SMMU yet.
@@ -2466,7 +2437,8 @@ arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid)
 	}
 }
 
-static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
+static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master,
+					 const struct arm_smmu_ste *target)
 {
 	int i, j;
 	struct arm_smmu_device *smmu = master->smmu;
@@ -2483,7 +2455,7 @@ static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master)
 		if (j < i)
 			continue;
 
-		arm_smmu_write_strtab_ent(master, sid, step);
+		arm_smmu_write_ste(master, sid, step, target);
 	}
 }
 
@@ -2590,6 +2562,7 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
 	unsigned long flags;
+	struct arm_smmu_ste target;
 	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	if (!smmu_domain)
@@ -2603,7 +2576,11 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 
 	master->domain = NULL;
 	master->ats_enabled = false;
-	arm_smmu_install_ste_for_dev(master);
+	if (disable_bypass)
+		arm_smmu_make_abort_ste(&target);
+	else
+		arm_smmu_make_bypass_ste(&target);
+	arm_smmu_install_ste_for_dev(master, &target);
 	/*
 	 * Clearing the CD entry isn't strictly required to detach the domain
 	 * since the table is uninstalled anyway, but it helps avoid confusion
@@ -2618,6 +2595,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 {
 	int ret = 0;
 	unsigned long flags;
+	struct arm_smmu_ste target;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 	struct arm_smmu_device *smmu;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
@@ -2679,7 +2657,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	list_add(&master->domain_head, &smmu_domain->devices);
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
+	switch (smmu_domain->stage) {
+	case ARM_SMMU_DOMAIN_S1:
 		if (!master->cd_table.cdtab) {
 			ret = arm_smmu_alloc_cd_tables(master);
 			if (ret) {
@@ -2693,9 +2672,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 			master->domain = NULL;
 			goto out_list_del;
 		}
-	}
 
-	arm_smmu_install_ste_for_dev(master);
+		arm_smmu_make_cdtable_ste(&target, master);
+		break;
+	case ARM_SMMU_DOMAIN_S2:
+		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
+		break;
+	case ARM_SMMU_DOMAIN_BYPASS:
+		arm_smmu_make_bypass_ste(&target);
+		break;
+	}
+	arm_smmu_install_ste_for_dev(master, &target);
 
 	arm_smmu_enable_ats(master);
 	goto out_unlock;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 08/17] iommu/arm-smmu-v3: Do not change the STE twice during arm_smmu_attach_dev()
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

This was needed because the STE code required the STE to be in
ABORT/BYPASS in order to program a cdtable or S2 STE. Now that the STE code
can automatically handle all transitions we can remove this step
from the attach_dev flow.

A few small bugs exist because of this:

1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
   then there will be a moment where the STE points at BYPASS. Since
   this can be done by VFIO/IOMMUFD it is a small security race.

2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
   regions will temporarily become BLOCKED. We'd like drivers to
   work in a way that allows IOMMU_RESV_DIRECT to be continuously
   functional during these transitions.

Make arm_smmu_release_device() put the STE back to the correct
ABORT/BYPASS setting. Fix a bug where an IOMMU_RESV_DIRECT region was
ignored on this path.

As noted before the reordering of the linked list/STE/CD changes is OK
against concurrent arm_smmu_share_asid() because of the
arm_smmu_asid_lock.
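
For bug 1) above, the transient window in the old two-STE-write flow looks
roughly like this (an illustrative timeline, not code from the patch):

/*
 * Illustrative timeline of BLOCKED -> UNMANAGED with
 * disable_bypass=false under the old flow:
 *
 *   arm_smmu_attach_dev()
 *     arm_smmu_detach_dev()            STE := BYPASS   <-- transient window
 *     arm_smmu_install_ste_for_dev()   STE := S1/S2 translation
 *
 * Removing the first write, as this patch does, means the STE goes
 * directly from its previous value to the new translation.
 */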

Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 1138e868c4d73e..340f3dc82c9ce0 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2562,7 +2562,6 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
 	unsigned long flags;
-	struct arm_smmu_ste target;
 	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	if (!smmu_domain)
@@ -2576,11 +2575,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 
 	master->domain = NULL;
 	master->ats_enabled = false;
-	if (disable_bypass)
-		arm_smmu_make_abort_ste(&target);
-	else
-		arm_smmu_make_bypass_ste(&target);
-	arm_smmu_install_ste_for_dev(master, &target);
 	/*
 	 * Clearing the CD entry isn't strictly required to detach the domain
 	 * since the table is uninstalled anyway, but it helps avoid confusion
@@ -2928,9 +2922,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 static void arm_smmu_release_device(struct device *dev)
 {
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_smmu_ste target;
 
 	if (WARN_ON(arm_smmu_master_sva_enabled(master)))
 		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
+
+	/* Put the STE back to what arm_smmu_init_strtab() sets */
+	if (disable_bypass && !dev->iommu->require_direct)
+		arm_smmu_make_abort_ste(&target);
+	else
+		arm_smmu_make_bypass_ste(&target);
+	arm_smmu_install_ste_for_dev(master, &target);
+
 	arm_smmu_detach_dev(master);
 	arm_smmu_disable_pasid(master);
 	arm_smmu_remove_master(master);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 09/17] iommu/arm-smmu-v3: Put writing the context descriptor in the right order
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Get closer to the IOMMU API ideal that changes between domains can be
hitless. The ordering for the CD table entry is not entirely clean from
this perspective.

When switching away from a STE with a CD table programmed in it we should
write the new STE first, then clear any old data in the CD entry.

If we are programming a CD table for the first time to a STE then the CD
entry should be programmed before the STE is loaded.

If we are replacing a CD table entry when the STE already points at the CD
entry then we just need to do the make/break sequence.

Lift this code out of arm_smmu_detach_dev() so it can all be sequenced
properly. The only other caller is arm_smmu_release_device() and it is
going to free the cdtable anyhow, so it doesn't matter what is in it.
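
Condensed into a sketch (illustrative pseudocode of the ordering rules
above; the real code in the diff below also handles errors and ATS):

	switch (smmu_domain->stage) {
	case ARM_SMMU_DOMAIN_S1:
		/* First time pointing the STE at a CD table:
		 * program the CD entry before loading the STE. */
		arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd);
		arm_smmu_make_cdtable_ste(&target, master);
		arm_smmu_install_ste_for_dev(master, &target);
		break;
	case ARM_SMMU_DOMAIN_S2:
	case ARM_SMMU_DOMAIN_BYPASS:
		/* Switching away from a CD table: write the new STE
		 * first, then clear the stale CD entry. */
		/* ... build target with the S2 or bypass helper ... */
		arm_smmu_install_ste_for_dev(master, &target);
		if (master->cd_table.cdtab)
			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);
		break;
	}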

Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++++++++++-------
 1 file changed, 20 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 340f3dc82c9ce0..2a6ac0af932c54 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2575,14 +2575,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 
 	master->domain = NULL;
 	master->ats_enabled = false;
-	/*
-	 * Clearing the CD entry isn't strictly required to detach the domain
-	 * since the table is uninstalled anyway, but it helps avoid confusion
-	 * in the call to arm_smmu_write_ctx_desc on the next attach (which
-	 * expects the entry to be empty).
-	 */
-	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 && master->cd_table.cdtab)
-		arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);
 }
 
 static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
@@ -2659,6 +2651,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 				master->domain = NULL;
 				goto out_list_del;
 			}
+		} else {
+			/*
+			 * arm_smmu_write_ctx_desc() relies on the entry being
+			 * invalid to work, clear any existing entry.
+			 */
+			ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
+						      NULL);
+			if (ret) {
+				master->domain = NULL;
+				goto out_list_del;
+			}
 		}
 
 		ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd);
@@ -2668,15 +2671,23 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		}
 
 		arm_smmu_make_cdtable_ste(&target, master);
+		arm_smmu_install_ste_for_dev(master, &target);
 		break;
 	case ARM_SMMU_DOMAIN_S2:
 		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
+		arm_smmu_install_ste_for_dev(master, &target);
+		if (master->cd_table.cdtab)
+			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
+						      NULL);
 		break;
 	case ARM_SMMU_DOMAIN_BYPASS:
 		arm_smmu_make_bypass_ste(&target);
+		arm_smmu_install_ste_for_dev(master, &target);
+		if (master->cd_table.cdtab)
+			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
+						      NULL);
 		break;
 	}
-	arm_smmu_install_ste_for_dev(master, &target);
 
 	arm_smmu_enable_ats(master);
 	goto out_unlock;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 10/17] iommu/arm-smmu-v3: Pass smmu_domain to arm_enable/disable_ats()
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

The caller already has the domain, just pass it in. A following patch will
remove master->domain.

Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 2a6ac0af932c54..133f13f33df124 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2474,12 +2474,12 @@ static bool arm_smmu_ats_supported(struct arm_smmu_master *master)
 	return dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev));
 }
 
-static void arm_smmu_enable_ats(struct arm_smmu_master *master)
+static void arm_smmu_enable_ats(struct arm_smmu_master *master,
+				struct arm_smmu_domain *smmu_domain)
 {
 	size_t stu;
 	struct pci_dev *pdev;
 	struct arm_smmu_device *smmu = master->smmu;
-	struct arm_smmu_domain *smmu_domain = master->domain;
 
 	/* Don't enable ATS at the endpoint if it's not enabled in the STE */
 	if (!master->ats_enabled)
@@ -2495,10 +2495,9 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master)
 		dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
 }
 
-static void arm_smmu_disable_ats(struct arm_smmu_master *master)
+static void arm_smmu_disable_ats(struct arm_smmu_master *master,
+				 struct arm_smmu_domain *smmu_domain)
 {
-	struct arm_smmu_domain *smmu_domain = master->domain;
-
 	if (!master->ats_enabled)
 		return;
 
@@ -2567,7 +2566,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 	if (!smmu_domain)
 		return;
 
-	arm_smmu_disable_ats(master);
+	arm_smmu_disable_ats(master, smmu_domain);
 
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_del(&master->domain_head);
@@ -2689,7 +2688,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		break;
 	}
 
-	arm_smmu_enable_ats(master);
+	arm_smmu_enable_ats(master, smmu_domain);
 	goto out_unlock;
 
 out_list_del:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 11/17] iommu/arm-smmu-v3: Remove arm_smmu_master->domain
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Introducing global statics, which are of type struct iommu_domain rather
than struct arm_smmu_domain, makes it difficult to retain
arm_smmu_master->domain, as it can no longer point to an IDENTITY or
BLOCKED domain.

The only place that uses the value is arm_smmu_detach_dev(). Change things
to work like other drivers and call iommu_get_domain_for_dev() to obtain
the current domain.

The master->domain pointer subtly protects the domain_head from being
touched while the master is not attached; change the domain_head to be
INIT'd when the master is not attached to a domain, instead of containing
garbage/zero.
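
As a sketch of why an INIT'd node is sufficient (this relies only on
standard <linux/list.h> semantics, not on anything specific to this
driver):

	/* Illustrative sketch: a self-pointing node makes list_del_init()
	 * safe to call whether or not the master is currently attached. */
	INIT_LIST_HEAD(&master->domain_head);			/* at probe time */

	list_add(&master->domain_head, &smmu_domain->devices);	/* attach */
	list_del_init(&master->domain_head);	/* detach: node is empty again */
	list_del_init(&master->domain_head);	/* harmless no-op if repeated */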

Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++-------------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 133f13f33df124..a98707cd1efccb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2560,19 +2560,20 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
 
 static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 {
+	struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev);
+	struct arm_smmu_domain *smmu_domain;
 	unsigned long flags;
-	struct arm_smmu_domain *smmu_domain = master->domain;
 
-	if (!smmu_domain)
+	if (!domain)
 		return;
 
+	smmu_domain = to_smmu_domain(domain);
 	arm_smmu_disable_ats(master, smmu_domain);
 
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	list_del(&master->domain_head);
+	list_del_init(&master->domain_head);
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
-	master->domain = NULL;
 	master->ats_enabled = false;
 }
 
@@ -2626,8 +2627,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 	arm_smmu_detach_dev(master);
 
-	master->domain = smmu_domain;
-
 	/*
 	 * The SMMU does not support enabling ATS with bypass. When the STE is
 	 * in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests and
@@ -2646,10 +2645,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	case ARM_SMMU_DOMAIN_S1:
 		if (!master->cd_table.cdtab) {
 			ret = arm_smmu_alloc_cd_tables(master);
-			if (ret) {
-				master->domain = NULL;
+			if (ret)
 				goto out_list_del;
-			}
 		} else {
 			/*
 			 * arm_smmu_write_ctx_desc() relies on the entry being
@@ -2657,17 +2654,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 			 */
 			ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
 						      NULL);
-			if (ret) {
-				master->domain = NULL;
+			if (ret)
 				goto out_list_del;
-			}
 		}
 
 		ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd);
-		if (ret) {
-			master->domain = NULL;
+		if (ret)
 			goto out_list_del;
-		}
 
 		arm_smmu_make_cdtable_ste(&target, master);
 		arm_smmu_install_ste_for_dev(master, &target);
@@ -2693,7 +2686,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 out_list_del:
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	list_del(&master->domain_head);
+	list_del_init(&master->domain_head);
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
 out_unlock:
@@ -2894,6 +2887,7 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 	master->dev = dev;
 	master->smmu = smmu;
 	INIT_LIST_HEAD(&master->bonds);
+	INIT_LIST_HEAD(&master->domain_head);
 	dev_iommu_priv_set(dev, master);
 
 	ret = arm_smmu_insert_master(smmu, master);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cbf4b57719b7b9..587f99701ad30f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -696,7 +696,6 @@ struct arm_smmu_stream {
 struct arm_smmu_master {
 	struct arm_smmu_device		*smmu;
 	struct device			*dev;
-	struct arm_smmu_domain		*domain;
 	struct list_head		domain_head;
 	struct arm_smmu_stream		*streams;
 	/* Locked by the iommu core using the group mutex */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 12/17] iommu/arm-smmu-v3: Check that the RID domain is S1 in SVA
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

The SVA code only works if the RID domain is a S1 domain and has already
installed the cdtable.

Originally the check for this was in arm_smmu_sva_bind() but when the op
was removed the test didn't get copied over to the new
arm_smmu_sva_set_dev_pasid().

Without the test, wrong usage will usually hit a WARN_ON() in
arm_smmu_write_ctx_desc() due to a missing ctx table.

However, the next patches will change things so that an IDENTITY domain is
not a struct arm_smmu_domain, and this will lead to memory corruption if
the struct is wrongly cast.

Fail in arm_smmu_sva_set_dev_pasid() if the STE does not have a S1, which
is a proxy for the STE having a pointer to the CD table. Write it in a way
that will be compatible with the next patches.

Fixes: 386fa64fd52b ("arm-smmu-v3/sva: Add SVA domain support")
Reported-by: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>
Closes: https://lore.kernel.org/linux-iommu/2a828e481416405fb3a4cceb9e075a59@huawei.com/
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
index 05722121f00e70..540f524ecf018b 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c
@@ -387,7 +387,13 @@ static int __arm_smmu_sva_bind(struct device *dev, struct mm_struct *mm)
 	struct arm_smmu_bond *bond;
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
-	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
+	struct arm_smmu_domain *smmu_domain;
+
+	if (!(domain->type & __IOMMU_DOMAIN_PAGING))
+		return -ENODEV;
+	smmu_domain = to_smmu_domain(domain);
+	if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1)
+		return -ENODEV;
 
 	if (!master || !master->sva_enabled)
 		return -ENODEV;
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 13/17] iommu/arm-smmu-v3: Add a global static IDENTITY domain
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Move to the new static global identity domain. Move all the logic out of
arm_smmu_attach_dev() into an identity-only function.
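
For context, a rough sketch of how a static identity domain is meant to be
consumed; this is not the iommu core's actual code and example_set_identity()
is a made-up name, but the iommu_ops->identity_domain and
iommu_domain_ops->attach_dev fields it uses are the real ones (kernel-style
fragment, assumes <linux/iommu.h>):

static int example_set_identity(struct device *dev,
				const struct iommu_ops *ops)
{
	struct iommu_domain *identity = ops->identity_domain;

	if (!identity)
		return -EOPNOTSUPP;

	/*
	 * No allocation or finalise step: the domain is a global static and
	 * its attach_dev is expected not to fail for supportable devices.
	 */
	return identity->ops->attach_dev(identity, dev);
}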

Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 82 +++++++++++++++------
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
 2 files changed, 58 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a98707cd1efccb..a940b5ff96843c 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2253,8 +2253,7 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 		return arm_smmu_sva_domain_alloc();
 
 	if (type != IOMMU_DOMAIN_UNMANAGED &&
-	    type != IOMMU_DOMAIN_DMA &&
-	    type != IOMMU_DOMAIN_IDENTITY)
+	    type != IOMMU_DOMAIN_DMA)
 		return NULL;
 
 	/*
@@ -2362,11 +2361,6 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_device *smmu = smmu_domain->smmu;
 
-	if (domain->type == IOMMU_DOMAIN_IDENTITY) {
-		smmu_domain->stage = ARM_SMMU_DOMAIN_BYPASS;
-		return 0;
-	}
-
 	/* Restrict the stage to what we can actually support */
 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
 		smmu_domain->stage = ARM_SMMU_DOMAIN_S2;
@@ -2564,7 +2558,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 	struct arm_smmu_domain *smmu_domain;
 	unsigned long flags;
 
-	if (!domain)
+	if (!domain || !(domain->type & __IOMMU_DOMAIN_PAGING))
 		return;
 
 	smmu_domain = to_smmu_domain(domain);
@@ -2627,15 +2621,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 	arm_smmu_detach_dev(master);
 
-	/*
-	 * The SMMU does not support enabling ATS with bypass. When the STE is
-	 * in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests and
-	 * Translated transactions are denied as though ATS is disabled for the
-	 * stream (STE.EATS == 0b00), causing F_BAD_ATS_TREQ and
-	 * F_TRANSL_FORBIDDEN events (IHI0070Ea 5.2 Stream Table Entry).
-	 */
-	if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS)
-		master->ats_enabled = arm_smmu_ats_supported(master);
+	master->ats_enabled = arm_smmu_ats_supported(master);
 
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_add(&master->domain_head, &smmu_domain->devices);
@@ -2672,13 +2658,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
 						      NULL);
 		break;
-	case ARM_SMMU_DOMAIN_BYPASS:
-		arm_smmu_make_bypass_ste(&target);
-		arm_smmu_install_ste_for_dev(master, &target);
-		if (master->cd_table.cdtab)
-			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
-						      NULL);
-		break;
 	}
 
 	arm_smmu_enable_ats(master, smmu_domain);
@@ -2694,6 +2673,60 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	return ret;
 }
 
+static int arm_smmu_attach_dev_ste(struct device *dev,
+				   struct arm_smmu_ste *ste)
+{
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+
+	if (arm_smmu_master_sva_enabled(master))
+		return -EBUSY;
+
+	/*
+	 * Do not allow any ASID to be changed while we are working on the STE,
+	 * otherwise we could miss invalidations.
+	 */
+	mutex_lock(&arm_smmu_asid_lock);
+
+	/*
+	 * The SMMU does not support enabling ATS with bypass/abort. When the
+	 * STE is in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests
+	 * and Translated transactions are denied as though ATS is disabled for
+	 * the stream (STE.EATS == 0b00), causing F_BAD_ATS_TREQ and
+	 * F_TRANSL_FORBIDDEN events (IHI0070Ea 5.2 Stream Table Entry).
+	 */
+	arm_smmu_detach_dev(master);
+
+	arm_smmu_install_ste_for_dev(master, ste);
+	mutex_unlock(&arm_smmu_asid_lock);
+
+	/*
+	 * This has to be done after removing the master from the
+	 * arm_smmu_domain->devices to avoid races updating the same context
+	 * descriptor from arm_smmu_share_asid().
+	 */
+	if (master->cd_table.cdtab)
+		arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);
+	return 0;
+}
+
+static int arm_smmu_attach_dev_identity(struct iommu_domain *domain,
+					struct device *dev)
+{
+	struct arm_smmu_ste ste;
+
+	arm_smmu_make_bypass_ste(&ste);
+	return arm_smmu_attach_dev_ste(dev, &ste);
+}
+
+static const struct iommu_domain_ops arm_smmu_identity_ops = {
+	.attach_dev = arm_smmu_attach_dev_identity,
+};
+
+static struct iommu_domain arm_smmu_identity_domain = {
+	.type = IOMMU_DOMAIN_IDENTITY,
+	.ops = &arm_smmu_identity_ops,
+};
+
 static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int prot, gfp_t gfp, size_t *mapped)
@@ -3083,6 +3116,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 }
 
 static struct iommu_ops arm_smmu_ops = {
+	.identity_domain	= &arm_smmu_identity_domain,
 	.capable		= arm_smmu_capable,
 	.domain_alloc		= arm_smmu_domain_alloc,
 	.probe_device		= arm_smmu_probe_device,
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 587f99701ad30f..23d8ab9a937aa6 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -713,7 +713,6 @@ struct arm_smmu_master {
 enum arm_smmu_domain_stage {
 	ARM_SMMU_DOMAIN_S1 = 0,
 	ARM_SMMU_DOMAIN_S2,
-	ARM_SMMU_DOMAIN_BYPASS,
 };
 
 struct arm_smmu_domain {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 14/17] iommu/arm-smmu-v3: Add a global static BLOCKED domain
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Using the same design as the IDENTITY domain, install an
STRTAB_STE_0_CFG_ABORT STE.
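
For contrast with the IDENTITY case, a small sketch of the only real
difference between the two static domains, namely which STE they hand to
arm_smmu_attach_dev_ste(); example_make_static_ste() is an illustrative
name, while the arm_smmu_make_*_ste() helpers are the ones introduced
earlier in this series:

static void example_make_static_ste(struct arm_smmu_ste *ste, bool blocked)
{
	if (blocked)
		/* STRTAB_STE_0_CFG_ABORT: incoming transactions are terminated. */
		arm_smmu_make_abort_ste(ste);
	else
		/* STRTAB_STE_0_CFG_BYPASS: transactions pass through untranslated. */
		arm_smmu_make_bypass_ste(ste);
}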

Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index a940b5ff96843c..9271f3a035b5b8 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2727,6 +2727,24 @@ static struct iommu_domain arm_smmu_identity_domain = {
 	.ops = &arm_smmu_identity_ops,
 };
 
+static int arm_smmu_attach_dev_blocked(struct iommu_domain *domain,
+					struct device *dev)
+{
+	struct arm_smmu_ste ste;
+
+	arm_smmu_make_abort_ste(&ste);
+	return arm_smmu_attach_dev_ste(dev, &ste);
+}
+
+static const struct iommu_domain_ops arm_smmu_blocked_ops = {
+	.attach_dev = arm_smmu_attach_dev_blocked,
+};
+
+static struct iommu_domain arm_smmu_blocked_domain = {
+	.type = IOMMU_DOMAIN_BLOCKED,
+	.ops = &arm_smmu_blocked_ops,
+};
+
 static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova,
 			      phys_addr_t paddr, size_t pgsize, size_t pgcount,
 			      int prot, gfp_t gfp, size_t *mapped)
@@ -3117,6 +3135,7 @@ static void arm_smmu_remove_dev_pasid(struct device *dev, ioasid_t pasid)
 
 static struct iommu_ops arm_smmu_ops = {
 	.identity_domain	= &arm_smmu_identity_domain,
+	.blocked_domain		= &arm_smmu_blocked_domain,
 	.capable		= arm_smmu_capable,
 	.domain_alloc		= arm_smmu_domain_alloc,
 	.probe_device		= arm_smmu_probe_device,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 15/17] iommu/arm-smmu-v3: Use the identity/blocked domain during release
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Consolidate some more code by having release call
arm_smmu_attach_dev_identity/blocked() instead of open coding this.

Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 9271f3a035b5b8..27a2792a6acd76 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2977,19 +2977,16 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
 static void arm_smmu_release_device(struct device *dev)
 {
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
-	struct arm_smmu_ste target;
 
 	if (WARN_ON(arm_smmu_master_sva_enabled(master)))
 		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
 
 	/* Put the STE back to what arm_smmu_init_strtab() sets */
 	if (disable_bypass && !dev->iommu->require_direct)
-		arm_smmu_make_abort_ste(&target);
+		arm_smmu_attach_dev_blocked(&arm_smmu_blocked_domain, dev);
 	else
-		arm_smmu_make_bypass_ste(&target);
-	arm_smmu_install_ste_for_dev(master, &target);
+		arm_smmu_attach_dev_identity(&arm_smmu_identity_domain, dev);
 
-	arm_smmu_detach_dev(master);
 	arm_smmu_disable_pasid(master);
 	arm_smmu_remove_master(master);
 	if (master->cd_table.cdtab)
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 16/17] iommu/arm-smmu-v3: Pass arm_smmu_domain and arm_smmu_device to finalize
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Instead of putting container_of() casts in the internals, use the proper
type in this call chain. This makes it easier to check that the two global
static domains are not leaking into call chains they should not.

Passing the smmu in spares the only caller from having to set it and then
unset it again in the error path.

Reviewed-by: Michael Shavit <mshavit@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 34 ++++++++++-----------
 1 file changed, 17 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 27a2792a6acd76..7116874c332ffd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -102,6 +102,8 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
 };
 
 static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu);
+static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain,
+				    struct arm_smmu_device *smmu);
 
 static void parse_driver_options(struct arm_smmu_device *smmu)
 {
@@ -2295,12 +2297,12 @@ static void arm_smmu_domain_free(struct iommu_domain *domain)
 	kfree(smmu_domain);
 }
 
-static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
+static int arm_smmu_domain_finalise_s1(struct arm_smmu_device *smmu,
+				       struct arm_smmu_domain *smmu_domain,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int ret;
 	u32 asid;
-	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_ctx_desc *cd = &smmu_domain->cd;
 	typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr;
 
@@ -2332,11 +2334,11 @@ static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain,
 	return ret;
 }
 
-static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
+static int arm_smmu_domain_finalise_s2(struct arm_smmu_device *smmu,
+				       struct arm_smmu_domain *smmu_domain,
 				       struct io_pgtable_cfg *pgtbl_cfg)
 {
 	int vmid;
-	struct arm_smmu_device *smmu = smmu_domain->smmu;
 	struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg;
 
 	/* Reserve VMID 0 for stage-2 bypass STEs */
@@ -2349,17 +2351,17 @@ static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain,
 	return 0;
 }
 
-static int arm_smmu_domain_finalise(struct iommu_domain *domain)
+static int arm_smmu_domain_finalise(struct arm_smmu_domain *smmu_domain,
+				    struct arm_smmu_device *smmu)
 {
 	int ret;
 	unsigned long ias, oas;
 	enum io_pgtable_fmt fmt;
 	struct io_pgtable_cfg pgtbl_cfg;
 	struct io_pgtable_ops *pgtbl_ops;
-	int (*finalise_stage_fn)(struct arm_smmu_domain *,
-				 struct io_pgtable_cfg *);
-	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
-	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	int (*finalise_stage_fn)(struct arm_smmu_device *smmu,
+				 struct arm_smmu_domain *smmu_domain,
+				 struct io_pgtable_cfg *pgtbl_cfg);
 
 	/* Restrict the stage to what we can actually support */
 	if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1))
@@ -2398,17 +2400,18 @@ static int arm_smmu_domain_finalise(struct iommu_domain *domain)
 	if (!pgtbl_ops)
 		return -ENOMEM;
 
-	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-	domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
-	domain->geometry.force_aperture = true;
+	smmu_domain->domain.pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
+	smmu_domain->domain.geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1;
+	smmu_domain->domain.geometry.force_aperture = true;
 
-	ret = finalise_stage_fn(smmu_domain, &pgtbl_cfg);
+	ret = finalise_stage_fn(smmu, smmu_domain, &pgtbl_cfg);
 	if (ret < 0) {
 		free_io_pgtable_ops(pgtbl_ops);
 		return ret;
 	}
 
 	smmu_domain->pgtbl_ops = pgtbl_ops;
+	smmu_domain->smmu = smmu;
 	return 0;
 }
 
@@ -2600,10 +2603,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	mutex_lock(&smmu_domain->init_mutex);
 
 	if (!smmu_domain->smmu) {
-		smmu_domain->smmu = smmu;
-		ret = arm_smmu_domain_finalise(domain);
-		if (ret)
-			smmu_domain->smmu = NULL;
+		ret = arm_smmu_domain_finalise(smmu_domain, smmu);
 	} else if (smmu_domain->smmu != smmu)
 		ret = -EINVAL;
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* [PATCH v5 17/17] iommu/arm-smmu-v3: Convert to domain_alloc_paging()
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-06 15:12   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-06 15:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Now that the BLOCKED and IDENTITY behaviors are managed with their own
domains, change to the domain_alloc_paging() op.

For now SVA keeps using the old interface; eventually it will get its own
op that can pass in the device and mm_struct, which will let us have a
sane lifetime for the mmu_notifier.

Call arm_smmu_domain_finalise() early if dev is available.
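
As a rough illustration of what the op change means to a caller (this is
not the core's actual allocation code; example_alloc_paging_domain() is a
made-up helper and ops is simply passed in for brevity), paging domains now
come from a dev-aware constructor that reports failure with ERR_PTR(),
while the old domain_alloc() path is kept only for SVA:

static struct iommu_domain *
example_alloc_paging_domain(struct device *dev, const struct iommu_ops *ops)
{
	if (!ops->domain_alloc_paging)
		return ERR_PTR(-EOPNOTSUPP);

	/*
	 * With this patch the driver may finalise the domain against dev's
	 * SMMU right away; failure is reported as ERR_PTR(), not NULL.
	 */
	return ops->domain_alloc_paging(dev);
}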

Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Tested-by: Moritz Fischer <moritzf@google.com>
Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 ++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 7116874c332ffd..ab2f5ac4020d71 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2249,14 +2249,15 @@ static bool arm_smmu_capable(struct device *dev, enum iommu_cap cap)
 
 static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 {
-	struct arm_smmu_domain *smmu_domain;
 
 	if (type == IOMMU_DOMAIN_SVA)
 		return arm_smmu_sva_domain_alloc();
+	return ERR_PTR(-EOPNOTSUPP);
+}
 
-	if (type != IOMMU_DOMAIN_UNMANAGED &&
-	    type != IOMMU_DOMAIN_DMA)
-		return NULL;
+static struct iommu_domain *arm_smmu_domain_alloc_paging(struct device *dev)
+{
+	struct arm_smmu_domain *smmu_domain;
 
 	/*
 	 * Allocate the domain and initialise some of its data structures.
@@ -2265,13 +2266,23 @@ static struct iommu_domain *arm_smmu_domain_alloc(unsigned type)
 	 */
 	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
 	if (!smmu_domain)
-		return NULL;
+		return ERR_PTR(-ENOMEM);
 
 	mutex_init(&smmu_domain->init_mutex);
 	INIT_LIST_HEAD(&smmu_domain->devices);
 	spin_lock_init(&smmu_domain->devices_lock);
 	INIT_LIST_HEAD(&smmu_domain->mmu_notifiers);
 
+	if (dev) {
+		struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+		int ret;
+
+		ret = arm_smmu_domain_finalise(smmu_domain, master->smmu);
+		if (ret) {
+			kfree(smmu_domain);
+			return ERR_PTR(ret);
+		}
+	}
 	return &smmu_domain->domain;
 }
 
@@ -3135,6 +3146,7 @@ static struct iommu_ops arm_smmu_ops = {
 	.blocked_domain		= &arm_smmu_blocked_domain,
 	.capable		= arm_smmu_capable,
 	.domain_alloc		= arm_smmu_domain_alloc,
+	.domain_alloc_paging    = arm_smmu_domain_alloc_paging,
 	.probe_device		= arm_smmu_probe_device,
 	.release_device		= arm_smmu_release_device,
 	.device_group		= arm_smmu_device_group,
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 00/17] Update SMMUv3 to the modern iommu API (part 1/3)
  2024-02-06 15:12 ` Jason Gunthorpe
@ 2024-02-07  5:27   ` Nicolin Chen
  -1 siblings, 0 replies; 112+ messages in thread
From: Nicolin Chen @ 2024-02-07  5:27 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, patches, Shameer Kolothum,
	Mostafa Saleh, Zhangfei Gao

On Tue, Feb 06, 2024 at 11:12:37AM -0400, Jason Gunthorpe wrote:
> The SMMUv3 driver was originally written in 2015 when the iommu driver
> facing API looked quite different. The API has evolved, especially lately,
> and the driver has fallen behind.
> 
> This work aims to bring make the SMMUv3 driver the best IOMMU driver with
> the most comprehensive implementation of the API. After all parts it
> addresses:
> 
>  - Global static BLOCKED and IDENTITY domains with 'never fail' attach
>    semantics. BLOCKED is desired for efficient VFIO.
> 
>  - Support map before attach for PAGING iommu_domains.
> 
>  - attach_dev failure does not change the HW configuration.
> 
>  - Fully hitless transitions between IDENTITY -> DMA -> IDENTITY.
>    The API has IOMMU_RESV_DIRECT which is expected to be
>    continuously translating.
> 
>  - Safe transitions between PAGING -> BLOCKED, do not ever temporarily
>    do IDENTITY. This is required for iommufd security.
> 
>  - Full PASID API support including:
>     - S1/SVA domains attached to PASIDs
>     - IDENTITY/BLOCKED/S1 attached to RID
>     - Change of the RID domain while PASIDs are attached
> 
>  - Streamlined SVA support using the core infrastructure
> 
>  - Hitless, whenever possible, change between two domains
> 
>  - iommufd IOMMU_GET_HW_INFO, IOMMU_HWPT_ALLOC_NEST_PARENT, and
>    IOMMU_DOMAIN_NESTED support
> 
> Over all these things are going to become more accessible to iommufd, and
> exposed to VMs, so it is important for the driver to have a robust
> implementation of the API.
> 
> The work is split into three parts, with this part largely focusing on the
> STE and building up to the BLOCKED & IDENTITY global static domains.
> 
> The second part largely focuses on the CD and builds up to having a common
> PASID infrastructure that SVA and S1 domains equally use.
> 
> The third part has some random cleanups and the iommufd related parts.
> 
> Overall this takes the approach of turning the STE/CD programming upside
> down where the CD/STE value is computed right at a driver callback
> function and then pushed down into programming logic. The programming
> logic hides the details of the required CD/STE tear-less update. This
> makes the CD/STE functions independent of the arm_smmu_domain which makes
> it fairly straightforward to untangle all the different call chains, and
> add news ones.
> 
> Further, this frees the arm_smmu_domain related logic from keeping track
> of what state the STE/CD is currently in so it can carefully sequence the
> correct update. There are many new update pairs that are subtly introduced
> as the work progresses.
> 
> The locking to support BTM via arm_smmu_asid_lock is a bit subtle right
> now and patches throughout this work adjust and tighten this so that it is
> clearer and doesn't get broken.
> 
> Once the lower STE layers no longer need to touch arm_smmu_domain we can
> isolate struct arm_smmu_domain to be only used for PAGING domains, audit
> all the to_smmu_domain() calls to be only in PAGING domain ops, and
> introduce the normal global static BLOCKED/IDENTITY domains using the new
> STE infrastructure. Part 2 will ultimately migrate SVA over to use
> arm_smmu_domain as well.
> 
> All parts are on github:
> 
>  https://github.com/jgunthorpe/linux/commits/smmuv3_newapi
> 
> v5:
>  - Rebase on v6.8-rc3
>  - Remove the writer argument to arm_smmu_entry_writer_ops get_used()
>  - Swap order of hweight tests so one call to hweight8() can be removed
>  - Add STRTAB_STE_2_S2VMID used for STRTAB_STE_0_CFG_S1_TRANS, for
>    S2 bypass the VMID is used but 0
>  - Be more exact when generating STEs and store 0's to document the HW
>    is using that value and 0 is actually a deliberate choice for VMID and
>    SHCFG.
>  - Remove cd_table argument to arm_smmu_make_cdtable_ste()
>  - Put arm_smmu_rmr_install_bypass_ste() after setting up a 2 level table
>  - Pull patch "Check that the RID domain is S1 in SVA" from part 2 to
>    guard against memory corruption on failure paths
>  - Tighten the used logic for SHCFG to accommodate nesting patches in
>    part 3
>  - Additional comments and commit message adjustments

I have retested this v5 alone with SVA cases and system sanity.

I also did similar tests with part-2 in the "smmuv3_newapi" branch,
plus adding the "iommu.passthrough=y" kernel parameter to cover the
S1DSS.BYPASS use case.

After that, I retested the entire branch including part-3 with a
nested-smmu VM, to cover different STE configurations.

All results look good.

Tested-by: Nicolin Chen <nicolinc@nvidia.com>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 03/17] iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste()
  2024-02-06 15:12   ` Jason Gunthorpe
@ 2024-02-13 15:37     ` Mostafa Saleh
  -1 siblings, 0 replies; 112+ messages in thread
From: Mostafa Saleh @ 2024-02-13 15:37 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

Hi Jason,

On Tue, Feb 06, 2024 at 11:12:40AM -0400, Jason Gunthorpe wrote:
> Logically arm_smmu_init_strtab() is the function that allocates and
> populates the stream table with the initial value of the STEs. After this
> function returns the stream table should be fully ready.
> 
> arm_smmu_rmr_install_bypass_ste() adjusts the initial stream table to force
> any SIDs that the FW says have IOMMU_RESV_DIRECT to use bypass. This
> ensures there is no disruption to the identity mapping during boot.
> 
> Put arm_smmu_rmr_install_bypass_ste() into arm_smmu_init_strtab(), it
> already executes immediately after arm_smmu_init_strtab().
> 
> No functional change intended.

I think arm_smmu_init_strtab() is quite a low-level place to fold the FW
configuration into. For example, in KVM[1] we'd re-use a big part of this
driver and rely on similar low-level functions. But no strong opinion.

[1] https://lore.kernel.org/kvmarm/20230201125328.2186498-1-jean-philippe@linaro.org/


Thanks,
Mostafa

> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Moritz Fischer <moritzf@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 6123e5ad95822c..2ab36dcf7c61f5 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -101,6 +101,8 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
>  	{ 0, NULL},
>  };
>  
> +static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu);
> +
>  static void parse_driver_options(struct arm_smmu_device *smmu)
>  {
>  	int i = 0;
> @@ -3256,6 +3258,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>  	cfg->strtab_base_cfg = reg;
>  
>  	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents);
> +
>  	return 0;
>  }
>  
> @@ -3279,6 +3282,8 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
>  
>  	ida_init(&smmu->vmid_map);
>  
> +	/* Check for RMRs and install bypass STEs if any */
> +	arm_smmu_rmr_install_bypass_ste(smmu);
>  	return 0;
>  }
>  
> @@ -4073,9 +4078,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  	/* Record our private device structure */
>  	platform_set_drvdata(pdev, smmu);
>  
> -	/* Check for RMRs and install bypass STEs if any */
> -	arm_smmu_rmr_install_bypass_ste(smmu);
> -
>  	/* Reset the device */
>  	ret = arm_smmu_device_reset(smmu, bypass);
>  	if (ret)
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 06/17] iommu/arm-smmu-v3: Hold arm_smmu_asid_lock during all of attach_dev
  2024-02-06 15:12   ` Jason Gunthorpe
@ 2024-02-13 15:38     ` Mostafa Saleh
  -1 siblings, 0 replies; 112+ messages in thread
From: Mostafa Saleh @ 2024-02-13 15:38 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

Hi Jason,

On Tue, Feb 06, 2024 at 11:12:43AM -0400, Jason Gunthorpe wrote:
> The BTM support wants to be able to change the ASID of any smmu_domain.
> When it goes to do this it holds the arm_smmu_asid_lock and iterates over
> the target domain's devices list.
> 
> During attach of a S1 domain we must ensure that the devices list and
> CD are in sync, otherwise we could miss CD updates or a parallel CD update
> could push an out of date CD.
> 
> This is pretty complicated, and almost works today because
> arm_smmu_detach_dev() removes the master from the linked list before
> working on the CD entries, preventing parallel update of the CD.
> 
> However, it does have an issue where the CD can remain programed while the
> domain appears to be unattached. arm_smmu_share_asid() will then not clear
> any CD entriess and install its own CD entry with the same ASID
> concurrently. This creates a small race window where the IOMMU can see two
> ASIDs pointing to different translations.
> 
> Solve this by wrapping most of the attach flow in the
> arm_smmu_asid_lock. This locks more than strictly needed to prepare for
> the next patch which will reorganize the order of the linked list, STE and
> CD changes.
> 
> Move arm_smmu_detach_dev() till after we have initialized the domain so
> the lock can be held for less time.
>

This seems a bit theoretical to me; it also requires mis-programming, as the
master would have to issue DMA during detach. But as this is not a hot path,
I don't think the over-locking is a problem.

Reviewed-by: Mostafa Saleh <smostafa@google.com>

> Reviewed-by: Michael Shavit <mshavit@google.com>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Moritz Fischer <moritzf@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 22 ++++++++++++---------
>  1 file changed, 13 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 417b2c877ff311..1229545ae6db4e 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2639,8 +2639,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  		return -EBUSY;
>  	}
>  
> -	arm_smmu_detach_dev(master);
> -
>  	mutex_lock(&smmu_domain->init_mutex);
>  
>  	if (!smmu_domain->smmu) {
> @@ -2655,6 +2653,16 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	if (ret)
>  		return ret;
>  
> +	/*
> +	 * Prevent arm_smmu_share_asid() from trying to change the ASID
> +	 * of either the old or new domain while we are working on it.
> +	 * This allows the STE and the smmu_domain->devices list to
> +	 * be inconsistent during this routine.
> +	 */
> +	mutex_lock(&arm_smmu_asid_lock);
> +
> +	arm_smmu_detach_dev(master);
> +
>  	master->domain = smmu_domain;
>  
>  	/*
> @@ -2680,13 +2688,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  			}
>  		}
>  
> -		/*
> -		 * Prevent SVA from concurrently modifying the CD or writing to
> -		 * the CD entry
> -		 */
> -		mutex_lock(&arm_smmu_asid_lock);
>  		ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd);
> -		mutex_unlock(&arm_smmu_asid_lock);
>  		if (ret) {
>  			master->domain = NULL;
>  			goto out_list_del;
> @@ -2696,13 +2698,15 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	arm_smmu_install_ste_for_dev(master);
>  
>  	arm_smmu_enable_ats(master);
> -	return 0;
> +	goto out_unlock;
>  
>  out_list_del:
>  	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>  	list_del(&master->domain_head);
>  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>  
> +out_unlock:
> +	mutex_unlock(&arm_smmu_asid_lock);
>  	return ret;
>  }
>  
> -- 
> 2.43.0
> 

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 08/17] iommu/arm-smmu-v3: Do not change the STE twice during arm_smmu_attach_dev()
  2024-02-06 15:12   ` Jason Gunthorpe
@ 2024-02-13 15:40     ` Mostafa Saleh
  -1 siblings, 0 replies; 112+ messages in thread
From: Mostafa Saleh @ 2024-02-13 15:40 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

Hi Jason,

On Tue, Feb 06, 2024 at 11:12:45AM -0400, Jason Gunthorpe wrote:
> This was needed because the STE code required the STE to be in
> ABORT/BYPASS inorder to program a cdtable or S2 STE. Now that the STE code
> can automatically handle all transitions we can remove this step
> from the attach_dev flow.
> 
> A few small bugs exist because of this:
> 
> 1) If the core code does BLOCKED -> UNMANAGED with disable_bypass=false
>    then there will be a moment where the STE points at BYPASS. Since
>    this can be done by VFIO/IOMMUFD it is a small security race.
> 
> 2) If the core code does IDENTITY -> DMA then any IOMMU_RESV_DIRECT
>    regions will temporarily become BLOCKED. We'd like drivers to
>    work in a way that allows IOMMU_RESV_DIRECT to be continuously
>    functional during these transitions.
> 
> Make arm_smmu_release_device() put the STE back to the correct
> ABORT/BYPASS setting. Fix a bug where a IOMMU_RESV_DIRECT was ignored on
> this path.
> 
> As noted before the reordering of the linked list/STE/CD changes is OK
> against concurrent arm_smmu_share_asid() because of the
> arm_smmu_asid_lock.
> 
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Moritz Fischer <moritzf@google.com>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 1138e868c4d73e..340f3dc82c9ce0 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2562,7 +2562,6 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
>  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>  {
>  	unsigned long flags;
> -	struct arm_smmu_ste target;
>  	struct arm_smmu_domain *smmu_domain = master->domain;
>  
>  	if (!smmu_domain)
> @@ -2576,11 +2575,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>  
>  	master->domain = NULL;
>  	master->ats_enabled = false;
> -	if (disable_bypass)
> -		arm_smmu_make_abort_ste(&target);
> -	else
> -		arm_smmu_make_bypass_ste(&target);
> -	arm_smmu_install_ste_for_dev(master, &target);
>  	/*
>  	 * Clearing the CD entry isn't strictly required to detach the domain
>  	 * since the table is uninstalled anyway, but it helps avoid confusion
> @@ -2928,9 +2922,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>  static void arm_smmu_release_device(struct device *dev)
>  {
>  	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> +	struct arm_smmu_ste target;
>  
>  	if (WARN_ON(arm_smmu_master_sva_enabled(master)))
>  		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
> +
> +	/* Put the STE back to what arm_smmu_init_strtab() sets */
> +	if (disable_bypass && !dev->iommu->require_direct)
> +		arm_smmu_make_abort_ste(&target);
> +	else
> +		arm_smmu_make_bypass_ste(&target);
> +	arm_smmu_install_ste_for_dev(master, &target);
> +
>  	arm_smmu_detach_dev(master);
>  	arm_smmu_disable_pasid(master);
>  	arm_smmu_remove_master(master);
> -- 
> 2.43.0
> 
I am still reviewing patch-1 and the hitless machinery (also I think -or hope-
this can be simplified), but with the assumption that
arm_smmu_install_ste_for_dev()/arm_smmu_write_ste() will do the right thing,
this looks good to me.

However, as it changes the current behavior of the driver where disable_bypass
used to override require_direct, I am not sure if this would break any existing setups.


Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 09/17] iommu/arm-smmu-v3: Put writing the context descriptor in the right order
  2024-02-06 15:12   ` Jason Gunthorpe
@ 2024-02-13 15:42     ` Mostafa Saleh
  -1 siblings, 0 replies; 112+ messages in thread
From: Mostafa Saleh @ 2024-02-13 15:42 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

Hi Jason,

On Tue, Feb 06, 2024 at 11:12:46AM -0400, Jason Gunthorpe wrote:
> Get closer to the IOMMU API ideal that changes between domains can be
> hitless. The ordering for the CD table entry is not entirely clean from
> this perspective.
> 
> When switching away from a STE with a CD table programmed in it we should
> write the new STE first, then clear any old data in the CD entry.
> 
> If we are programming a CD table for the first time to a STE then the CD
> entry should be programmed before the STE is loaded.
> 
> If we are replacing a CD table entry when the STE already points at the CD
> entry then we just need to do the make/break sequence.
> 
> Lift this code out of arm_smmu_detach_dev() so it can all be sequenced
> properly. The only other caller is arm_smmu_release_device() and it is
> going to free the cdtable anyhow, so it doesn't matter what is in it.
> 
> Reviewed-by: Michael Shavit <mshavit@google.com>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Moritz Fischer <moritzf@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 29 ++++++++++++++-------
>  1 file changed, 20 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 340f3dc82c9ce0..2a6ac0af932c54 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2575,14 +2575,6 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>  
>  	master->domain = NULL;
>  	master->ats_enabled = false;
> -	/*
> -	 * Clearing the CD entry isn't strictly required to detach the domain
> -	 * since the table is uninstalled anyway, but it helps avoid confusion
> -	 * in the call to arm_smmu_write_ctx_desc on the next attach (which
> -	 * expects the entry to be empty).
> -	 */
> -	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 && master->cd_table.cdtab)
> -		arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);
>  }
>  
>  static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> @@ -2659,6 +2651,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  				master->domain = NULL;
>  				goto out_list_del;
>  			}
> +		} else {
> +			/*
> +			 * arm_smmu_write_ctx_desc() relies on the entry being
> +			 * invalid to work, clear any existing entry.
> +			 */
> +			ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
> +						      NULL);
> +			if (ret) {
> +				master->domain = NULL;
> +				goto out_list_del;
> +			}

Instead of duplicating

           if (ret) {
               master->domain = NULL;
               goto out_list_del;
           }

in both the if and the else branches, we could just move it outside.
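
Roughly like this (a sketch of the shape only, based on the existing code in
arm_smmu_attach_dev(); untested):

	if (!master->cd_table.cdtab) {
		ret = arm_smmu_alloc_cd_tables(master);
	} else {
		/*
		 * arm_smmu_write_ctx_desc() relies on the entry being
		 * invalid to work, clear any existing entry.
		 */
		ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, NULL);
	}
	if (ret) {
		master->domain = NULL;
		goto out_list_del;
	}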

>  		}
>  
>  		ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd);
> @@ -2668,15 +2671,23 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  		}
>  
>  		arm_smmu_make_cdtable_ste(&target, master);
> +		arm_smmu_install_ste_for_dev(master, &target);
>  		break;
>  	case ARM_SMMU_DOMAIN_S2:
>  		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
> +		arm_smmu_install_ste_for_dev(master, &target);
> +		if (master->cd_table.cdtab)
> +			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
> +						      NULL);
>  		break;
>  	case ARM_SMMU_DOMAIN_BYPASS:
>  		arm_smmu_make_bypass_ste(&target);
> +		arm_smmu_install_ste_for_dev(master, &target);
> +		if (master->cd_table.cdtab)
> +			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
> +						      NULL);
>  		break;
>  	}

This invalidates the CD while the STE is in bypass/S2, which is a new behavior
for the driver. I don't see anything in the manual about this, so I believe it
is fine.

Reviewed-by: Mostafa Saleh <smostafa@google.com>

> -	arm_smmu_install_ste_for_dev(master, &target);
>  
>  	arm_smmu_enable_ats(master);
>  	goto out_unlock;
> -- 
> 2.43.0
>

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 10/17] iommu/arm-smmu-v3: Pass smmu_domain to arm_enable/disable_ats()
  2024-02-06 15:12   ` Jason Gunthorpe
@ 2024-02-13 15:43     ` Mostafa Saleh
  -1 siblings, 0 replies; 112+ messages in thread
From: Mostafa Saleh @ 2024-02-13 15:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 06, 2024 at 11:12:47AM -0400, Jason Gunthorpe wrote:
> The caller already has the domain, just pass it in. A following patch will
> remove master->domain.
> 
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Moritz Fischer <moritzf@google.com>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 2a6ac0af932c54..133f13f33df124 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2474,12 +2474,12 @@ static bool arm_smmu_ats_supported(struct arm_smmu_master *master)
>  	return dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev));
>  }
>  
> -static void arm_smmu_enable_ats(struct arm_smmu_master *master)
> +static void arm_smmu_enable_ats(struct arm_smmu_master *master,
> +				struct arm_smmu_domain *smmu_domain)
>  {
>  	size_t stu;
>  	struct pci_dev *pdev;
>  	struct arm_smmu_device *smmu = master->smmu;
> -	struct arm_smmu_domain *smmu_domain = master->domain;
>  
>  	/* Don't enable ATS at the endpoint if it's not enabled in the STE */
>  	if (!master->ats_enabled)
> @@ -2495,10 +2495,9 @@ static void arm_smmu_enable_ats(struct arm_smmu_master *master)
>  		dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu);
>  }
>  
> -static void arm_smmu_disable_ats(struct arm_smmu_master *master)
> +static void arm_smmu_disable_ats(struct arm_smmu_master *master,
> +				 struct arm_smmu_domain *smmu_domain)
>  {
> -	struct arm_smmu_domain *smmu_domain = master->domain;
> -
>  	if (!master->ats_enabled)
>  		return;
>  
> @@ -2567,7 +2566,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>  	if (!smmu_domain)
>  		return;
>  
> -	arm_smmu_disable_ats(master);
> +	arm_smmu_disable_ats(master, smmu_domain);
>  
>  	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
>  	list_del(&master->domain_head);
> @@ -2689,7 +2688,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  		break;
>  	}
>  
> -	arm_smmu_enable_ats(master);
> +	arm_smmu_enable_ats(master, smmu_domain);
>  	goto out_unlock;
>  
>  out_list_del:
> -- 
> 2.43.0
>

Reviewed-by: Mostafa Saleh <smostafa@google.com>

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 11/17] iommu/arm-smmu-v3: Remove arm_smmu_master->domain
  2024-02-06 15:12   ` Jason Gunthorpe
@ 2024-02-13 15:45     ` Mostafa Saleh
  -1 siblings, 0 replies; 112+ messages in thread
From: Mostafa Saleh @ 2024-02-13 15:45 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

Hi Jason,

On Tue, Feb 06, 2024 at 11:12:48AM -0400, Jason Gunthorpe wrote:
> Introducing global statics which are of type struct iommu_domain, not
> struct arm_smmu_domain makes it difficult to retain
> arm_smmu_master->domain, as it can no longer point to an IDENTITY or
> BLOCKED domain.
> 
> The only place that uses the value is arm_smmu_detach_dev(). Change things
> to work like other drivers and call iommu_get_domain_for_dev() to obtain
> the current domain.
> 
> The master->domain is subtly protecting the domain_head against being
> unused, change the domain_head to be INIT'd when the master is not
> attached to a domain instead of garbage/zero.

I don't see the problem here, nor the reason for initialising the
domain_head; can you please clarify the issue?

> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Moritz Fischer <moritzf@google.com>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 26 ++++++++-------------
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 -
>  2 files changed, 10 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 133f13f33df124..a98707cd1efccb 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2560,19 +2560,20 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
>  
>  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>  {
> +	struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev);
> +	struct arm_smmu_domain *smmu_domain;
>  	unsigned long flags;
> -	struct arm_smmu_domain *smmu_domain = master->domain;
>  
> -	if (!smmu_domain)
> +	if (!domain)
>  		return;
>  
> +	smmu_domain = to_smmu_domain(domain);
>  	arm_smmu_disable_ats(master, smmu_domain);
>  
>  	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> -	list_del(&master->domain_head);
> +	list_del_init(&master->domain_head);
>  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>  
> -	master->domain = NULL;
>  	master->ats_enabled = false;
>  }
>  
> @@ -2626,8 +2627,6 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  
>  	arm_smmu_detach_dev(master);
>  
> -	master->domain = smmu_domain;
> -
>  	/*
>  	 * The SMMU does not support enabling ATS with bypass. When the STE is
>  	 * in bypass (STE.Config[2:0] == 0b100), ATS Translation Requests and
> @@ -2646,10 +2645,8 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  	case ARM_SMMU_DOMAIN_S1:
>  		if (!master->cd_table.cdtab) {
>  			ret = arm_smmu_alloc_cd_tables(master);
> -			if (ret) {
> -				master->domain = NULL;
> +			if (ret)
>  				goto out_list_del;
> -			}
>  		} else {
>  			/*
>  			 * arm_smmu_write_ctx_desc() relies on the entry being
> @@ -2657,17 +2654,13 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  			 */
>  			ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
>  						      NULL);
> -			if (ret) {
> -				master->domain = NULL;
> +			if (ret)
>  				goto out_list_del;
> -			}
>  		}
>  
>  		ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID, &smmu_domain->cd);
> -		if (ret) {
> -			master->domain = NULL;
> +		if (ret)
>  			goto out_list_del;
> -		}
>  
>  		arm_smmu_make_cdtable_ste(&target, master);
>  		arm_smmu_install_ste_for_dev(master, &target);
> @@ -2693,7 +2686,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
>  
>  out_list_del:
>  	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> -	list_del(&master->domain_head);
> +	list_del_init(&master->domain_head);
>  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
>  
>  out_unlock:
> @@ -2894,6 +2887,7 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
>  	master->dev = dev;
>  	master->smmu = smmu;
>  	INIT_LIST_HEAD(&master->bonds);
> +	INIT_LIST_HEAD(&master->domain_head);
>  	dev_iommu_priv_set(dev, master);
>  
>  	ret = arm_smmu_insert_master(smmu, master);
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> index cbf4b57719b7b9..587f99701ad30f 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
> @@ -696,7 +696,6 @@ struct arm_smmu_stream {
>  struct arm_smmu_master {
>  	struct arm_smmu_device		*smmu;
>  	struct device			*dev;
> -	struct arm_smmu_domain		*domain;
>  	struct list_head		domain_head;
>  	struct arm_smmu_stream		*streams;
>  	/* Locked by the iommu core using the group mutex */
> -- 
> 2.43.0
>
Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 03/17] iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste()
  2024-02-13 15:37     ` Mostafa Saleh
@ 2024-02-13 16:16       ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-13 16:16 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 03:37:43PM +0000, Mostafa Saleh wrote:
> Hi Jason,
> 
> On Tue, Feb 06, 2024 at 11:12:40AM -0400, Jason Gunthorpe wrote:
> > Logically arm_smmu_init_strtab() is the function that allocates and
> > populates the stream table with the initial value of the STEs. After this
> > function returns the stream table should be fully ready.
> > 
> > arm_smmu_rmr_install_bypass_ste() adjusts the initial stream table to force
> > any SIDs that the FW says have IOMMU_RESV_DIRECT to use bypass. This
> > ensures there is no disruption to the identity mapping during boot.
> > 
> > Put arm_smmu_rmr_install_bypass_ste() into arm_smmu_init_strtab(), it
> > already executes immediately after arm_smmu_init_strtab().
> > 
> > No functional change intended.
> 
> I think arm_smmu_init_strtab is quite low level to abstract FW configuration in it.
> For example in KVM[1] we'd re-use a big part of this driver and rely on similar
> low-level functions. But no strong opinion.

I'm happy to drop this patch; if there's no strong opinion I will leave it.

> [1] https://lore.kernel.org/kvmarm/20230201125328.2186498-1-jean-philippe@linaro.org/

I saw this but I didn't try too hard to figure out what it is doing.
It looks like a new iommu driver that re-uses a lot of the data
structures of SMMU but has a different HW-facing API based on
hypercalls?

It seems interesting, but my knee-jerk reaction would be that a new
iommu driver proposal needs to implement the new iommu core APIs, not
the old stuff. IMHO, when this progresses past an RFC it needs to come
as a proper submission of just the iommu driver, in the normal way.

I also wonder about the wisdom of sharing so much code. Code sharing
is not unconditionally good if it doesn't have a robust abstraction.

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 06/17] iommu/arm-smmu-v3: Hold arm_smmu_asid_lock during all of attach_dev
  2024-02-13 15:38     ` Mostafa Saleh
@ 2024-02-13 16:18       ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-13 16:18 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 03:38:57PM +0000, Mostafa Saleh wrote:

> This seems a bit theoretical to me also it requires mis-programming as the master
> will issue DMA in detach

iommufd allows VFIO userspace to do this workload as an attack on the
kernel.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 08/17] iommu/arm-smmu-v3: Do not change the STE twice during arm_smmu_attach_dev()
  2024-02-13 15:40     ` Mostafa Saleh
@ 2024-02-13 16:26       ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-13 16:26 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 03:40:34PM +0000, Mostafa Saleh wrote:
> > @@ -2928,9 +2922,18 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
> >  static void arm_smmu_release_device(struct device *dev)
> >  {
> >  	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
> > +	struct arm_smmu_ste target;
> >  
> >  	if (WARN_ON(arm_smmu_master_sva_enabled(master)))
> >  		iopf_queue_remove_device(master->smmu->evtq.iopf, dev);
> > +
> > +	/* Put the STE back to what arm_smmu_init_strtab() sets */
> > +	if (disable_bypass && !dev->iommu->require_direct)
> > +		arm_smmu_make_abort_ste(&target);
> > +	else
> > +		arm_smmu_make_bypass_ste(&target);
> > +	arm_smmu_install_ste_for_dev(master, &target);
> > +
> >  	arm_smmu_detach_dev(master);
> >  	arm_smmu_disable_pasid(master);
> >  	arm_smmu_remove_master(master);

> I am still reviewing patch-1 and the hitless machinery (also I think -or hope-
> this can be simplified), with the assumption that
> arm_smmu_install_ste_for_dev()/arm_smmu_write_ste() will do the right thing,
> that good looks good to me.

I'm interested if you can come up with something. Let me know if you
want to bounce some ideas.

> However, as it changes the current behavior of the driver where
> disable_bypass used to override require_direct, I am not sure if
> this would break any existing setups.

Yes, the commit message explains this. It is a little bug.

require_direct takes precedence when building the initial STE; release
should restore the STE back to how it was before probe.

I don't imagine a case where a system was fine with the STE during
boot but doesn't like that same STE during device hot unplug???

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 11/17] iommu/arm-smmu-v3: Remove arm_smmu_master->domain
  2024-02-13 15:45     ` Mostafa Saleh
@ 2024-02-13 16:37       ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-13 16:37 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 03:45:34PM +0000, Mostafa Saleh wrote:
> Hi Jason,
> 
> On Tue, Feb 06, 2024 at 11:12:48AM -0400, Jason Gunthorpe wrote:
> > Introducing global statics which are of type struct iommu_domain, not
> > struct arm_smmu_domain makes it difficult to retain
> > arm_smmu_master->domain, as it can no longer point to an IDENTITY or
> > BLOCKED domain.
> > 
> > The only place that uses the value is arm_smmu_detach_dev(). Change things
> > to work like other drivers and call iommu_get_domain_for_dev() to obtain
> > the current domain.
> > 
> > The master->domain is subtly protecting the domain_head against being
> > unused, change the domain_head to be INIT'd when the master is not
> > attached to a domain instead of garbage/zero.
> 
> I don't see the problem here, nor the reason for initialising the
> domain_head; can you please clarify the issue?

I didn't notice it either. Eric found it:

https://lore.kernel.org/linux-iommu/6fff20dd-46d5-4974-a4a5-fb4e7a59ce44@redhat.com/

> > @@ -2560,19 +2560,20 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
> >  
> >  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
> >  {
> > +	struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev);
> > +	struct arm_smmu_domain *smmu_domain;
> >  	unsigned long flags;
> > -	struct arm_smmu_domain *smmu_domain = master->domain;

master->domain is NULL here which happens in cases where the current
RID domain is not a PAGING domain.

> > -	if (!smmu_domain)
> > +	if (!domain)
> >  		return;

Which used to early exit

> >  
> > +	smmu_domain = to_smmu_domain(domain);
> >  	arm_smmu_disable_ats(master, smmu_domain);
> >  
> >  	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> > -	list_del(&master->domain_head);
> > +	list_del_init(&master->domain_head);
> >  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

But now it would cause the list_del() to hit a non-inited list_head
and explode.

Instead we keep the list head init'd and the list_del is a NOP.
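
To make the two list states concrete, here is a tiny standalone sketch that
mirrors the <linux/list.h> behaviour being relied on (illustrative only, this
is not driver code):

#include <stdio.h>

/* Minimal stand-ins for struct list_head, INIT_LIST_HEAD() and list_del_init() */
struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_del_init(struct list_head *e)
{
	e->next->prev = e->prev;	/* for a self-linked node these two  */
	e->prev->next = e->next;	/* stores just rewrite e with itself */
	INIT_LIST_HEAD(e);
}

int main(void)
{
	struct list_head domain_head;

	INIT_LIST_HEAD(&domain_head);	/* master never put on a PAGING domain list */
	list_del_init(&domain_head);	/* harmless no-op, unlike list_del() on a
					 * zeroed head, which would chase NULL pointers */
	printf("still self-linked: %d\n", domain_head.next == &domain_head);
	return 0;
}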

Tricky right??

I changed the comment like this:

The master->domain is subtly protecting the master->domain_head against
being unused as only PAGING domains will set master->domain and only
paging domains use the master->domain_head. To make it simple keep the
master->domain_head initialized so that the list_del() logic just does
nothing for non-PAGING domains.

OK?

Jason


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 11/17] iommu/arm-smmu-v3: Remove arm_smmu_master->domain
@ 2024-02-13 16:37       ` Jason Gunthorpe
  0 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-13 16:37 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 03:45:34PM +0000, Mostafa Saleh wrote:
> Hi Jason,
> 
> On Tue, Feb 06, 2024 at 11:12:48AM -0400, Jason Gunthorpe wrote:
> > Introducing global statics which are of type struct iommu_domain, not
> > struct arm_smmu_domain makes it difficult to retain
> > arm_smmu_master->domain, as it can no longer point to an IDENTITY or
> > BLOCKED domain.
> > 
> > The only place that uses the value is arm_smmu_detach_dev(). Change things
> > to work like other drivers and call iommu_get_domain_for_dev() to obtain
> > the current domain.
> > 
> > The master->domain is subtly protecting the domain_head against being
> > unused, change the domain_head to be INIT'd when the master is not
> > attached to a domain instead of garbage/zero.
> 
> I don't see the problem here, nor the reason for initialising the
> domain_head; can you please clarify the issue?

I didn't notice it either. Eric found it:

https://lore.kernel.org/linux-iommu/6fff20dd-46d5-4974-a4a5-fb4e7a59ce44@redhat.com/

> > @@ -2560,19 +2560,20 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
> >  
> >  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
> >  {
> > +	struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev);
> > +	struct arm_smmu_domain *smmu_domain;
> >  	unsigned long flags;
> > -	struct arm_smmu_domain *smmu_domain = master->domain;

master->domain is NULL here which happens in cases where the current
RID domain is not a PAGING domain.

> > -	if (!smmu_domain)
> > +	if (!domain)
> >  		return;

Which used to early exit

> >  
> > +	smmu_domain = to_smmu_domain(domain);
> >  	arm_smmu_disable_ats(master, smmu_domain);
> >  
> >  	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> > -	list_del(&master->domain_head);
> > +	list_del_init(&master->domain_head);
> >  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);

But now it would cause the list_del() to hit a non-inited list_head
and explode.

Instead we keep the list head init'd and the list_del is a NOP.

Tricky right??

I changed the comment like this:

The master->domain is subtly protecting the master->domain_head against
being unused as only PAGING domains will set master->domain and only
paging domains use the master->domain_head. To make it simple keep the
master->domain_head initialized so that the list_del() logic just does
nothing for non-PAGING domains.

OK?

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 03/17] iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste()
  2024-02-13 16:16       ` Jason Gunthorpe
@ 2024-02-13 16:46         ` Mostafa Saleh
  -1 siblings, 0 replies; 112+ messages in thread
From: Mostafa Saleh @ 2024-02-13 16:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 12:16:01PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 13, 2024 at 03:37:43PM +0000, Mostafa Saleh wrote:
> > Hi Jason,
> > 
> > On Tue, Feb 06, 2024 at 11:12:40AM -0400, Jason Gunthorpe wrote:
> > > Logically arm_smmu_init_strtab() is the function that allocates and
> > > populates the stream table with the initial value of the STEs. After this
> > > function returns the stream table should be fully ready.
> > > 
> > > arm_smmu_rmr_install_bypass_ste() adjusts the initial stream table to force
> > > any SIDs that the FW says have IOMMU_RESV_DIRECT to use bypass. This
> > > ensures there is no disruption to the identity mapping during boot.
> > > 
> > > Put arm_smmu_rmr_install_bypass_ste() into arm_smmu_init_strtab(), it
> > > already executes immediately after arm_smmu_init_strtab().
> > > 
> > > No functional change intended.
> > 
> > I think arm_smmu_init_strtab is quite low level to abstract FW configuration in it.
> > For example in KVM[1] we'd re-use a big part of this driver and rely on similar
> > low-level functions. But no strong opinion.
> 
> I'm happy to drop this patch, if no strong opinion I will leave it
> 
> > [1] https://lore.kernel.org/kvmarm/20230201125328.2186498-1-jean-philippe@linaro.org/
> 
> I saw this but I didn't try to figure out too much what it is
> doing.. It looks like a new iommu driver that re-uses a lot of the
> data structures of SMMU but has a different HW-facing API based on
> hypercalls?

Yes, this is an implementation of the SMMUv3 driver at EL2 in KVM that is needed
for DMA isolation for Protected KVM (pKVM), as the host is untrusted in this case
and can attack the hypervisor through DMA.

This is similar to the kernel driver but with a bunch of tricks such as page table
allocation, power management and more.

Also, we have some plans to extend it to KVM guests (initially I was considering
the KVM-VFIO device, but now with the new iommufd stuff it might fit in there).

> It seems interesting, but my knee jerk reaction would be that a new
> iommu driver proposal needs to implement the new iommu core APIs, not
> the old stuff. IMHO, when this progresses past an RFC it needs to come
> as a proper submission of just the iommu driver, in the normal way.
> 
> I also wonder about the wisdom of sharing so much code. Code sharing
> is not unconditionally good if it doesn't have a robust abstraction..
>

The idea is that since the hardware is common, and most of the dt binding as
well, we shouldn't really reinvent the wheel. So this series moves the common
code for the page-table library so it can be shared between EL1/EL2, shares the
HW/FW probe and init code between the kernel drivers, and adds a lot of the
infrastructure for IOMMUs in the hypervisor.

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 03/17] iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste()
@ 2024-02-13 16:46         ` Mostafa Saleh
  0 siblings, 0 replies; 112+ messages in thread
From: Mostafa Saleh @ 2024-02-13 16:46 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 12:16:01PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 13, 2024 at 03:37:43PM +0000, Mostafa Saleh wrote:
> > Hi Jason,
> > 
> > On Tue, Feb 06, 2024 at 11:12:40AM -0400, Jason Gunthorpe wrote:
> > > Logically arm_smmu_init_strtab() is the function that allocates and
> > > populates the stream table with the initial value of the STEs. After this
> > > function returns the stream table should be fully ready.
> > > 
> > > arm_smmu_rmr_install_bypass_ste() adjusts the initial stream table to force
> > > any SIDs that the FW says have IOMMU_RESV_DIRECT to use bypass. This
> > > ensures there is no disruption to the identity mapping during boot.
> > > 
> > > Put arm_smmu_rmr_install_bypass_ste() into arm_smmu_init_strtab(), it
> > > already executes immediately after arm_smmu_init_strtab().
> > > 
> > > No functional change intended.
> > 
> > I think arm_smmu_init_strtab is quite low level to abstract FW configuration in it.
> > For example in KVM[1] we'd re-use a big part of this driver and rely on similar
> > low-level functions. But no strong opinion.
> 
> I'm happy to drop this patch, if no strong opinion I will leave it
> 
> > [1] https://lore.kernel.org/kvmarm/20230201125328.2186498-1-jean-philippe@linaro.org/
> 
> I saw this but I didn't try to figure out too much what it is
> doing.. It looks like a new iommu driver that re-uses a lot of the
> data structures of SMMU but has a different HW-facing API based on
> hypercalls?

Yes, this is an implementation of the SMMUv3 driver at EL2 in KVM that is needed
for DMA isolation for Protected KVM (pKVM), as the host is untrusted in this case
and can attack the hypervisor through DMA.

This is similar to the kernel driver but with a bunch of tricks such as page table
allocation, power management and more.

Also, we have some plans to extend it to KVM guests (initially I was considering
the KVM-VFIO device, but now with the new iommufd stuff it might fit in there).

> It seems interesting, but my knee jerk reaction would be that a new
> iommu driver proposal needs to implement the new iommu core APIs, not
> the old stuff. IMHO, when this progresses past an RFC it needs to come
> as a proper submission of just the iommu driver, in the normal way.
> 
> I also wonder about the wisdom of sharing so much code. Code sharing
> is not unconditionally good if it doesn't have a robust abstraction..
>

The idea is that since the hardware is common, and most of the dt binding as
well, we shouldn't really reinvent the wheel. So this series moves the common
code for the page-table library so it can be shared between EL1/EL2, shares the
HW/FW probe and init code between the kernel drivers, and adds a lot of the
infrastructure for IOMMUs in the hypervisor.

Thanks,
Mostafa


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 11/17] iommu/arm-smmu-v3: Remove arm_smmu_master->domain
  2024-02-13 16:37       ` Jason Gunthorpe
@ 2024-02-13 17:00         ` Mostafa Saleh
  -1 siblings, 0 replies; 112+ messages in thread
From: Mostafa Saleh @ 2024-02-13 17:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 12:37:39PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 13, 2024 at 03:45:34PM +0000, Mostafa Saleh wrote:
> > Hi Jason,
> > 
> > On Tue, Feb 06, 2024 at 11:12:48AM -0400, Jason Gunthorpe wrote:
> > > Introducing global statics which are of type struct iommu_domain, not
> > > struct arm_smmu_domain makes it difficult to retain
> > > arm_smmu_master->domain, as it can no longer point to an IDENTITY or
> > > BLOCKED domain.
> > > 
> > > The only place that uses the value is arm_smmu_detach_dev(). Change things
> > > to work like other drivers and call iommu_get_domain_for_dev() to obtain
> > > the current domain.
> > > 
> > > The master->domain is subtly protecting the domain_head against being
> > > unused, change the domain_head to be INIT'd when the master is not
> > > attached to a domain instead of garbage/zero.
> > 
> > I don't see the problem here, nor the reason for initialising the
> > domain_head; can you please clarify the issue?
> 
> I didn't notice it either. Eric found it:
> 
> https://lore.kernel.org/linux-iommu/6fff20dd-46d5-4974-a4a5-fb4e7a59ce44@redhat.com/
> 
> > > @@ -2560,19 +2560,20 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
> > >  
> > >  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
> > >  {
> > > +	struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev);
> > > +	struct arm_smmu_domain *smmu_domain;
> > >  	unsigned long flags;
> > > -	struct arm_smmu_domain *smmu_domain = master->domain;
> 
> master->domain is NULL here which happens in cases where the current
> RID domain is not a PAGING domain.
> 
> > > -	if (!smmu_domain)
> > > +	if (!domain)
> > >  		return;
> 
> Which used to early exit
> 
> > >  
> > > +	smmu_domain = to_smmu_domain(domain);
> > >  	arm_smmu_disable_ats(master, smmu_domain);
> > >  
> > >  	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> > > -	list_del(&master->domain_head);
> > > +	list_del_init(&master->domain_head);
> > >  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> 
> But now it would cause the list_del() to hit a non-inited list_head
> and explode.
> 
> Instead we keep the list head init'd and the list_del is a NOP.
> 
> Tricky right??
> 
> I changed the comment like this:
> 
> The master->domain is subtly protecting the master->domain_head against
> being unused as only PAGING domains will set master->domain and only
> paging domains use the master->domain_head. To make it simple keep the
> master->domain_head initialized so that the list_del() logic just does
> nothing for non-PAGING domains.
> 
> OK?

Ahh, I see, as iommu_get_domain_for_dev() now returns a valid domain.
Thanks for the explanation, that makes sense.

Reviewed-by: Mostafa Saleh <smostafa@google.com>

Thanks,
Mostafa

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 11/17] iommu/arm-smmu-v3: Remove arm_smmu_master->domain
@ 2024-02-13 17:00         ` Mostafa Saleh
  0 siblings, 0 replies; 112+ messages in thread
From: Mostafa Saleh @ 2024-02-13 17:00 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 12:37:39PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 13, 2024 at 03:45:34PM +0000, Mostafa Saleh wrote:
> > Hi Jason,
> > 
> > On Tue, Feb 06, 2024 at 11:12:48AM -0400, Jason Gunthorpe wrote:
> > > Introducing global statics which are of type struct iommu_domain, not
> > > struct arm_smmu_domain makes it difficult to retain
> > > arm_smmu_master->domain, as it can no longer point to an IDENTITY or
> > > BLOCKED domain.
> > > 
> > > The only place that uses the value is arm_smmu_detach_dev(). Change things
> > > to work like other drivers and call iommu_get_domain_for_dev() to obtain
> > > the current domain.
> > > 
> > > The master->domain is subtly protecting the domain_head against being
> > > unused, change the domain_head to be INIT'd when the master is not
> > > attached to a domain instead of garbage/zero.
> > 
> > I don't see the problem here, nor the reason for initialising the
> > domain_head; can you please clarify the issue?
> 
> I didn't notice it either. Eric found it:
> 
> https://lore.kernel.org/linux-iommu/6fff20dd-46d5-4974-a4a5-fb4e7a59ce44@redhat.com/
> 
> > > @@ -2560,19 +2560,20 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
> > >  
> > >  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
> > >  {
> > > +	struct iommu_domain *domain = iommu_get_domain_for_dev(master->dev);
> > > +	struct arm_smmu_domain *smmu_domain;
> > >  	unsigned long flags;
> > > -	struct arm_smmu_domain *smmu_domain = master->domain;
> 
> master->domain is NULL here which happens in cases where the current
> RID domain is not a PAGING domain.
> 
> > > -	if (!smmu_domain)
> > > +	if (!domain)
> > >  		return;
> 
> Which used to early exit
> 
> > >  
> > > +	smmu_domain = to_smmu_domain(domain);
> > >  	arm_smmu_disable_ats(master, smmu_domain);
> > >  
> > >  	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
> > > -	list_del(&master->domain_head);
> > > +	list_del_init(&master->domain_head);
> > >  	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
> 
> But now it would cause the list_del() to hit a non-inited list_head
> and explode.
> 
> Instead we keep the list head init'd and the list_del is a NOP.
> 
> Tricky right??
> 
> I changed the comment like this:
> 
> The master->domain is subtly protecting the master->domain_head against
> being unused as only PAGING domains will set master->domain and only
> paging domains use the master->domain_head. To make it simple keep the
> master->domain_head initialized so that the list_del() logic just does
> nothing for non-PAGING domains.
> 
> OK?

Ahh, I see, as iommu_get_domain_for_dev() now returns a valid domain.
Thanks for the explanation, that makes sense.

Reviewed-by: Mostafa Saleh <smostafa@google.com>

Thanks,
Mostafa


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 09/17] iommu/arm-smmu-v3: Put writing the context descriptor in the right order
  2024-02-13 15:42     ` Mostafa Saleh
@ 2024-02-13 17:50       ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-13 17:50 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 03:42:53PM +0000, Mostafa Saleh wrote:
> > @@ -2659,6 +2651,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> >  				master->domain = NULL;
> >  				goto out_list_del;
> >  			}
> > +		} else {
> > +			/*
> > +			 * arm_smmu_write_ctx_desc() relies on the entry being
> > +			 * invalid to work, clear any existing entry.
> > +			 */
> > +			ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
> > +						      NULL);
> > +			if (ret) {
> > +				master->domain = NULL;
> > +				goto out_list_del;
> > +			}
> 
> Instead of having duplicate
>            if (ret) {
>                master->domain = NULL;
>                goto out_list_del;
>            }
> 
> In the if and the else, we can just move it outside.

Stylistically I often try to avoid shifting the error path from its
statement, but it is OK either way..

However, part 2 removes the need for error handling here entirely, so
let's leave it.

> > @@ -2668,15 +2671,23 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> >  		}
> >  
> >  		arm_smmu_make_cdtable_ste(&target, master);
> > +		arm_smmu_install_ste_for_dev(master, &target);
> >  		break;
> >  	case ARM_SMMU_DOMAIN_S2:
> >  		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
> > +		arm_smmu_install_ste_for_dev(master, &target);
> > +		if (master->cd_table.cdtab)
> > +			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
> > +						      NULL);
> >  		break;
> >  	case ARM_SMMU_DOMAIN_BYPASS:
> >  		arm_smmu_make_bypass_ste(&target);
> > +		arm_smmu_install_ste_for_dev(master, &target);
> > +		if (master->cd_table.cdtab)
> > +			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
> > +						      NULL);
> >  		break;
> >  	}
> This invalidates the CD while the STE is in bypass/S2, which is a new behavior
> for the driver, 

Yes

> I don’t see anything from the user manual about this, so I
> believe that is fine.

Nor do I. Nor can I see any reason why HW would care. We also
invalidate ASID's and VMID's after their tables have been removed from
the STE/CD too.

There are other options here if this is found out to be a trouble but
they are convoluted enough to not do them without a concrete reason.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 09/17] iommu/arm-smmu-v3: Put writing the context descriptor in the right order
@ 2024-02-13 17:50       ` Jason Gunthorpe
  0 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-13 17:50 UTC (permalink / raw)
  To: Mostafa Saleh
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On Tue, Feb 13, 2024 at 03:42:53PM +0000, Mostafa Saleh wrote:
> > @@ -2659,6 +2651,17 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> >  				master->domain = NULL;
> >  				goto out_list_del;
> >  			}
> > +		} else {
> > +			/*
> > +			 * arm_smmu_write_ctx_desc() relies on the entry being
> > +			 * invalid to work, clear any existing entry.
> > +			 */
> > +			ret = arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
> > +						      NULL);
> > +			if (ret) {
> > +				master->domain = NULL;
> > +				goto out_list_del;
> > +			}
> 
> Instead of having duplicate
>            if (ret) {
>                master->domain = NULL;
>                goto out_list_del;
>            }
> 
> In the if and the else, we can just move it outside.

Stylistically I often try to avoid shifting the error path from its
statement, but it is OK either way..

However, part 2 removes the need for error handling here entirely, so
let's leave it.

> > @@ -2668,15 +2671,23 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
> >  		}
> >  
> >  		arm_smmu_make_cdtable_ste(&target, master);
> > +		arm_smmu_install_ste_for_dev(master, &target);
> >  		break;
> >  	case ARM_SMMU_DOMAIN_S2:
> >  		arm_smmu_make_s2_domain_ste(&target, master, smmu_domain);
> > +		arm_smmu_install_ste_for_dev(master, &target);
> > +		if (master->cd_table.cdtab)
> > +			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
> > +						      NULL);
> >  		break;
> >  	case ARM_SMMU_DOMAIN_BYPASS:
> >  		arm_smmu_make_bypass_ste(&target);
> > +		arm_smmu_install_ste_for_dev(master, &target);
> > +		if (master->cd_table.cdtab)
> > +			arm_smmu_write_ctx_desc(master, IOMMU_NO_PASID,
> > +						      NULL);
> >  		break;
> >  	}
> This invalidates the CD while the STE is in bypass/S2, which is a new behavior
> for the driver, 

Yes

> I don’t see anything from the user manual about this, so I
> believe that is fine.

Nor do I. Nor can I see any reason why HW would care. We also
invalidate ASID's and VMID's after their tables have been removed from
the STE/CD too.

There are other options here if this is found out to be a trouble but
they are convoluted enough to not do them without a concrete reason.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-06 15:12   ` Jason Gunthorpe
@ 2024-02-15 13:49     ` Will Deacon
  -1 siblings, 0 replies; 112+ messages in thread
From: Will Deacon @ 2024-02-15 13:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Hi Jason,

On Tue, Feb 06, 2024 at 11:12:38AM -0400, Jason Gunthorpe wrote:
> As the comment in arm_smmu_write_strtab_ent() explains, this routine has
> been limited to only work correctly in certain scenarios that the caller
> must ensure. Generally the caller must put the STE into ABORT or BYPASS
> before attempting to program it to something else.

This is looking pretty good now, but I have a few comments inline.

>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 330 ++++++++++++++++----
>  1 file changed, 263 insertions(+), 67 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 0ffb1cf17e0b2e..f0b915567cbcdc 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -48,6 +48,21 @@ enum arm_smmu_msi_index {
>  	ARM_SMMU_MAX_MSIS,
>  };
>  
> +struct arm_smmu_entry_writer_ops;
> +struct arm_smmu_entry_writer {
> +	const struct arm_smmu_entry_writer_ops *ops;
> +	struct arm_smmu_master *master;
> +};
> +
> +struct arm_smmu_entry_writer_ops {
> +	unsigned int num_entry_qwords;
> +	__le64 v_bit;
> +	void (*get_used)(const __le64 *entry, __le64 *used);
> +	void (*sync)(struct arm_smmu_entry_writer *writer);
> +};

Can we avoid the indirection for now, please? I'm sure we'll want it later
when you extend this to CDs, but for the initial support it just makes it
more difficult to follow the flow. Should be a trivial thing to drop, I
hope.

> +static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
>  {
> +	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
> +
> +	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
> +	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
> +		return;
> +
> +	/*
> +	 * See 13.5 Summary of attribute/permission configuration fields for the
> +	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
> +	 * and S2 only.
> +	 */
> +	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
> +	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
> +	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
> +	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
> +		     STRTAB_STE_1_S1DSS_BYPASS))
> +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);

Huh, SHCFG is really getting in the way here, isn't it? I think it also
means we don't have a "hitless" transition from stage-2 translation ->
bypass. I'm inclined to leave it set to "use incoming" all the time; the
only difference I can see is if you have stage-2 translation and a
master emitting outer-shareable transactions, in which case they'd now
be outer-shareable instead of inner-shareable, which I think is harmless.

Additionally, it looks like there's an existing buglet here in that we
shouldn't set SHCFG if SMMU_IDR1.ATTR_TYPES_OVR == 0.

> +
> +	used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
> +	switch (cfg) {
> +	case STRTAB_STE_0_CFG_ABORT:
> +	case STRTAB_STE_0_CFG_BYPASS:
> +		break;
> +	case STRTAB_STE_0_CFG_S1_TRANS:
> +		used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
> +					    STRTAB_STE_0_S1CTXPTR_MASK |
> +					    STRTAB_STE_0_S1CDMAX);
> +		used_bits[1] |=
> +			cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
> +				    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
> +				    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW);
> +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
> +		used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
> +		break;
> +	case STRTAB_STE_0_CFG_S2_TRANS:
> +		used_bits[1] |=
> +			cpu_to_le64(STRTAB_STE_1_EATS);
> +		used_bits[2] |=
> +			cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
> +				    STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
> +				    STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
> +		used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
> +		break;

With SHCFG fixed, can we go a step further with this and simply identify
the live qwords directly, rather than on a field-by-field basis? I think
we should be able to do the same "hitless" transitions you want with the
coarser granularity.

Will

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
@ 2024-02-15 13:49     ` Will Deacon
  0 siblings, 0 replies; 112+ messages in thread
From: Will Deacon @ 2024-02-15 13:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

Hi Jason,

On Tue, Feb 06, 2024 at 11:12:38AM -0400, Jason Gunthorpe wrote:
> As the comment in arm_smmu_write_strtab_ent() explains, this routine has
> been limited to only work correctly in certain scenarios that the caller
> must ensure. Generally the caller must put the STE into ABORT or BYPASS
> before attempting to program it to something else.

This is looking pretty good now, but I have a few comments inline.

>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 330 ++++++++++++++++----
>  1 file changed, 263 insertions(+), 67 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index 0ffb1cf17e0b2e..f0b915567cbcdc 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -48,6 +48,21 @@ enum arm_smmu_msi_index {
>  	ARM_SMMU_MAX_MSIS,
>  };
>  
> +struct arm_smmu_entry_writer_ops;
> +struct arm_smmu_entry_writer {
> +	const struct arm_smmu_entry_writer_ops *ops;
> +	struct arm_smmu_master *master;
> +};
> +
> +struct arm_smmu_entry_writer_ops {
> +	unsigned int num_entry_qwords;
> +	__le64 v_bit;
> +	void (*get_used)(const __le64 *entry, __le64 *used);
> +	void (*sync)(struct arm_smmu_entry_writer *writer);
> +};

Can we avoid the indirection for now, please? I'm sure we'll want it later
when you extend this to CDs, but for the initial support it just makes it
more difficult to follow the flow. Should be a trivial thing to drop, I
hope.

> +static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
>  {
> +	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
> +
> +	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
> +	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
> +		return;
> +
> +	/*
> +	 * See 13.5 Summary of attribute/permission configuration fields for the
> +	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
> +	 * and S2 only.
> +	 */
> +	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
> +	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
> +	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
> +	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
> +		     STRTAB_STE_1_S1DSS_BYPASS))
> +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);

Huh, SHCFG is really getting in the way here, isn't it? I think it also
means we don't have a "hitless" transition from stage-2 translation ->
bypass. I'm inclined to leave it set to "use incoming" all the time; the
only difference I can see is if you have stage-2 translation and a
master emitting outer-shareable transactions, in which case they'd now
be outer-shareable instead of inner-shareable, which I think is harmless.

Additionally, it looks like there's an existing buglet here in that we
shouldn't set SHCFG if SMMU_IDR1.ATTR_TYPES_OVR == 0.

> +
> +	used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
> +	switch (cfg) {
> +	case STRTAB_STE_0_CFG_ABORT:
> +	case STRTAB_STE_0_CFG_BYPASS:
> +		break;
> +	case STRTAB_STE_0_CFG_S1_TRANS:
> +		used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
> +					    STRTAB_STE_0_S1CTXPTR_MASK |
> +					    STRTAB_STE_0_S1CDMAX);
> +		used_bits[1] |=
> +			cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
> +				    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
> +				    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW);
> +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
> +		used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
> +		break;
> +	case STRTAB_STE_0_CFG_S2_TRANS:
> +		used_bits[1] |=
> +			cpu_to_le64(STRTAB_STE_1_EATS);
> +		used_bits[2] |=
> +			cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
> +				    STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
> +				    STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
> +		used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
> +		break;

With SHCFG fixed, can we go a step further with this and simply identify
the live qwords directly, rather than on a field-by-field basis? I think
we should be able to do the same "hitless" transitions you want with the
coarser granularity.

Will


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-15 13:49     ` Will Deacon
@ 2024-02-15 16:01       ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-15 16:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Thu, Feb 15, 2024 at 01:49:53PM +0000, Will Deacon wrote:
> Hi Jason,
> 
> On Tue, Feb 06, 2024 at 11:12:38AM -0400, Jason Gunthorpe wrote:
> > As the comment in arm_smmu_write_strtab_ent() explains, this routine has
> > been limited to only work correctly in certain scenarios that the caller
> > must ensure. Generally the caller must put the STE into ABORT or BYPASS
> > before attempting to program it to something else.
> 
> This is looking pretty good now, but I have a few comments inline.

Ok

> > @@ -48,6 +48,21 @@ enum arm_smmu_msi_index {
> >  	ARM_SMMU_MAX_MSIS,
> >  };
> >  
> > +struct arm_smmu_entry_writer_ops;
> > +struct arm_smmu_entry_writer {
> > +	const struct arm_smmu_entry_writer_ops *ops;
> > +	struct arm_smmu_master *master;
> > +};
> > +
> > +struct arm_smmu_entry_writer_ops {
> > +	unsigned int num_entry_qwords;
> > +	__le64 v_bit;
> > +	void (*get_used)(const __le64 *entry, __le64 *used);
> > +	void (*sync)(struct arm_smmu_entry_writer *writer);
> > +};
> 
> Can we avoid the indirection for now, please? I'm sure we'll want it later
> when you extend this to CDs, but for the initial support it just makes it
> more difficult to follow the flow. Should be a trivial thing to drop, I
> hope.

We can.

> > +static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> >  {
> > +	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
> > +
> > +	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
> > +	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > +		return;
> > +
> > +	/*
> > +	 * See 13.5 Summary of attribute/permission configuration fields for the
> > +	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
> > +	 * and S2 only.
> > +	 */
> > +	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
> > +	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
> > +	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
> > +	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
> > +		     STRTAB_STE_1_S1DSS_BYPASS))
> > +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
> 
> Huh, SHCFG is really getting in the way here, isn't it? 

I wouldn't say that.. It is just a complicated bit of the spec. One of
the things we recently did was to audit all the cache settings and, at
least, we then realized that SHCFG was being subtly used by S2 as
well..

Not sure if that was intentional or if it was just missed from the
spec that the S2 uses the value too.

From that perspective I view this layout of the used bits as valuable. It
forces the kind of reflection and rigor that I think is helpful. The
fact that we found a thing to improve on by inspection is proof of its
worth to me.

> I think it also means we don't have a "hitless" transition from
> stage-2 translation -> bypass.

Hmm, I didn't notice that. The kunit passed:

[    0.511483] 1..1
[    0.511510]     KTAP version 1
[    0.511551]     # Subtest: arm-smmu-v3-kunit-test
[    0.511592]     # module: arm_smmu_v3_test
[    0.511594]     1..10
[    0.511910]     ok 1 arm_smmu_v3_write_ste_test_bypass_to_abort
[    0.512110]     ok 2 arm_smmu_v3_write_ste_test_abort_to_bypass
[    0.512386]     ok 3 arm_smmu_v3_write_ste_test_cdtable_to_abort
[    0.512631]     ok 4 arm_smmu_v3_write_ste_test_abort_to_cdtable
[    0.512874]     ok 5 arm_smmu_v3_write_ste_test_cdtable_to_bypass
[    0.513075]     ok 6 arm_smmu_v3_write_ste_test_bypass_to_cdtable
[    0.513275]     ok 7 arm_smmu_v3_write_ste_test_cdtable_s1dss_change
[    0.513466]     ok 8 arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass
[    0.513672]     ok 9 arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass
[    0.514148]     ok 10 arm_smmu_v3_write_ste_test_non_hitless

Which I see is because it did not test the S2 case...

> I'm inclined to leave it set to "use incoming" all the time; the
> only difference I can see is if you have stage-2 translation and a
> master emitting outer-shareable transactions, in which case they'd now
> be outer-shareable instead of inner-shareable, which I think is harmless.

Broadly, it makes sense to me that the iommu would try to have
a consistent translation - that bypass and S2 use different
cacheability doesn't seem great. But isn't the current S2 value of 0
"non-shareable"?

> Additionally, it looks like there's an existing buglet here in that we
> shouldn't set SHCFG if SMMU_IDR1.ATTR_TYPES_OVR == 0.

Ah because the spec says RES0.. I'll add these two into the pile of
random stuff in part 3
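
As a rough sketch of the guard, assuming we record the IDR1 bit in the feature
flags at probe time and pass the smmu down to the make function (the
ARM_SMMU_FEAT_ATTR_TYPES_OVR name below is purely illustrative, not necessarily
what will land):

static void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu,
				     struct arm_smmu_ste *target)
{
	memset(target, 0, sizeof(*target));
	target->data[0] = cpu_to_le64(
		STRTAB_STE_0_V |
		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS));

	/* SHCFG is RES0 when the HW does not implement attribute overrides */
	if (smmu->features & ARM_SMMU_FEAT_ATTR_TYPES_OVR)
		target->data[1] = cpu_to_le64(FIELD_PREP(
			STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
}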

> > +	used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
> > +	switch (cfg) {
> > +	case STRTAB_STE_0_CFG_ABORT:
> > +	case STRTAB_STE_0_CFG_BYPASS:
> > +		break;
> > +	case STRTAB_STE_0_CFG_S1_TRANS:
> > +		used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
> > +					    STRTAB_STE_0_S1CTXPTR_MASK |
> > +					    STRTAB_STE_0_S1CDMAX);
> > +		used_bits[1] |=
> > +			cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
> > +				    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
> > +				    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW);
> > +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
> > +		used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
> > +		break;
> > +	case STRTAB_STE_0_CFG_S2_TRANS:
> > +		used_bits[1] |=
> > +			cpu_to_le64(STRTAB_STE_1_EATS);
> > +		used_bits[2] |=
> > +			cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
> > +				    STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
> > +				    STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
> > +		used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
> > +		break;
> 
> With SHCFG fixed, can we go a step further with this and simply identify
> the live qwords directly, rather than on a field-by-field basis? I think
> we should be able to do the same "hitless" transitions you want with the
> coarser granularity.

Not naively, Michael's excellent unit test shows it.. My understanding
of your idea was roughly thus:

void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
{
	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));

	used_bits[0] = U64_MAX;
	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
		return;

	/*
	 * See 13.5 Summary of attribute/permission configuration fields for the
	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
	 * and S2 only.
	 */
	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
		     STRTAB_STE_1_S1DSS_BYPASS))
		used_bits[1] |= U64_MAX;

	used_bits[0] |= U64_MAX;
	switch (cfg) {
	case STRTAB_STE_0_CFG_ABORT:
	case STRTAB_STE_0_CFG_BYPASS:
		break;
	case STRTAB_STE_0_CFG_S1_TRANS:
		used_bits[0] |= U64_MAX;
		used_bits[1] |= U64_MAX;
		used_bits[2] |= U64_MAX;
		break;
	case STRTAB_STE_0_CFG_NESTED:
		used_bits[0] |= U64_MAX;
		used_bits[1] |= U64_MAX;
		fallthrough;
	case STRTAB_STE_0_CFG_S2_TRANS:
		used_bits[1] |= U64_MAX;
		used_bits[2] |= U64_MAX;
		used_bits[3] |= U64_MAX;
		break;

	default:
		memset(used_bits, 0xFF, sizeof(struct arm_smmu_ste));
		WARN_ON(true);
	}
}

And the failures:

[    0.500676]     ok 1 arm_smmu_v3_write_ste_test_bypass_to_abort
[    0.500818]     ok 2 arm_smmu_v3_write_ste_test_abort_to_bypass
[    0.501014]     ok 3 arm_smmu_v3_write_ste_test_cdtable_to_abort
[    0.501197]     ok 4 arm_smmu_v3_write_ste_test_abort_to_cdtable
[    0.501340]     # arm_smmu_v3_write_ste_test_cdtable_to_bypass: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:128
[    0.501340]     Expected test_writer.invalid_entry_written == !hitless, but
[    0.501340]         test_writer.invalid_entry_written == 1 (0x1)
[    0.501340]         !hitless == 0 (0x0)
[    0.501489]     not ok 5 arm_smmu_v3_write_ste_test_cdtable_to_bypass
[    0.501787]     # arm_smmu_v3_write_ste_test_bypass_to_cdtable: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:128
[    0.501787]     Expected test_writer.invalid_entry_written == !hitless, but
[    0.501787]         test_writer.invalid_entry_written == 1 (0x1)
[    0.501787]         !hitless == 0 (0x0)
[    0.501931]     not ok 6 arm_smmu_v3_write_ste_test_bypass_to_cdtable
[    0.502274]     ok 7 arm_smmu_v3_write_ste_test_cdtable_s1dss_change
[    0.502397]     # arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:128
[    0.502397]     Expected test_writer.invalid_entry_written == !hitless, but
[    0.502397]         test_writer.invalid_entry_written == 1 (0x1)
[    0.502397]         !hitless == 0 (0x0)
[    0.502473]     # arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:129
[    0.502473]     Expected test_writer.num_syncs == num_syncs_expected, but
[    0.502473]         test_writer.num_syncs == 3 (0x3)
[    0.502473]         num_syncs_expected == 2 (0x2)
[    0.502784]     not ok 8 arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass
[    0.503073]     # arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:128
[    0.503073]     Expected test_writer.invalid_entry_written == !hitless, but
[    0.503073]         test_writer.invalid_entry_written == 1 (0x1)
[    0.503073]         !hitless == 0 (0x0)
[    0.503176]     # arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:129
[    0.503176]     Expected test_writer.num_syncs == num_syncs_expected, but
[    0.503176]         test_writer.num_syncs == 3 (0x3)
[    0.503176]         num_syncs_expected == 2 (0x2)
[    0.503464]     not ok 9 arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass
[    0.503807]     ok 10 arm_smmu_v3_write_ste_test_non_hitless

BYPASS -> S1 requires changing overlapping bits in qword 1. The
programming sequence would look like this:

start qw[1] = SHCFG_INCOMING
      qw[1] = SHCFG_INCOMING | S1DSS
      qw[0] = S1 mode
      qw[1] = S1DSS

The two states are sharing qw[1] and BYPASS ignores all of it except
SHCFG_INCOMING. Since bypass would have its qw[1] marked as used due
to the SHCFG there is no way to express that it is not looking at the
other bits.

We'd have to start doing really hacky things like removing SHCFG as a
used field entirely - but I think if you do that you break the entire
logic of the design and also go back to having programming that only
works if STEs are constructed in certain ways.

Thanks,
Jason


^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
@ 2024-02-15 16:01       ` Jason Gunthorpe
  0 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-15 16:01 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Thu, Feb 15, 2024 at 01:49:53PM +0000, Will Deacon wrote:
> Hi Jason,
> 
> On Tue, Feb 06, 2024 at 11:12:38AM -0400, Jason Gunthorpe wrote:
> > As the comment in arm_smmu_write_strtab_ent() explains, this routine has
> > been limited to only work correctly in certain scenarios that the caller
> > must ensure. Generally the caller must put the STE into ABORT or BYPASS
> > before attempting to program it to something else.
> 
> This is looking pretty good now, but I have a few comments inline.

Ok

> > @@ -48,6 +48,21 @@ enum arm_smmu_msi_index {
> >  	ARM_SMMU_MAX_MSIS,
> >  };
> >  
> > +struct arm_smmu_entry_writer_ops;
> > +struct arm_smmu_entry_writer {
> > +	const struct arm_smmu_entry_writer_ops *ops;
> > +	struct arm_smmu_master *master;
> > +};
> > +
> > +struct arm_smmu_entry_writer_ops {
> > +	unsigned int num_entry_qwords;
> > +	__le64 v_bit;
> > +	void (*get_used)(const __le64 *entry, __le64 *used);
> > +	void (*sync)(struct arm_smmu_entry_writer *writer);
> > +};
> 
> Can we avoid the indirection for now, please? I'm sure we'll want it later
> when you extend this to CDs, but for the initial support it just makes it
> more difficult to follow the flow. Should be a trivial thing to drop, I
> hope.

We can.

> > +static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> >  {
> > +	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
> > +
> > +	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
> > +	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > +		return;
> > +
> > +	/*
> > +	 * See 13.5 Summary of attribute/permission configuration fields for the
> > +	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
> > +	 * and S2 only.
> > +	 */
> > +	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
> > +	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
> > +	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
> > +	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
> > +		     STRTAB_STE_1_S1DSS_BYPASS))
> > +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
> 
> Huh, SHCFG is really getting in the way here, isn't it? 

I wouldn't say that.. It is just a complicated bit of the spec. One of
the things we recently did was to audit all the cache settings and, at
least, we then realized that SHCFG was being subtly used by S2 as
well..

Not sure if that was intentional or if it was just missed from the
spec that the S2 uses the value too.

From that perspective I view this layout of the used bits as valuable. It
forces the kind of reflection and rigor that I think is helpful. The
fact that we found a thing to improve on by inspection is proof of its
worth to me.

> I think it also means we don't have a "hitless" transition from
> stage-2 translation -> bypass.

Hmm, I didn't notice that. The kunit passed:

[    0.511483] 1..1
[    0.511510]     KTAP version 1
[    0.511551]     # Subtest: arm-smmu-v3-kunit-test
[    0.511592]     # module: arm_smmu_v3_test
[    0.511594]     1..10
[    0.511910]     ok 1 arm_smmu_v3_write_ste_test_bypass_to_abort
[    0.512110]     ok 2 arm_smmu_v3_write_ste_test_abort_to_bypass
[    0.512386]     ok 3 arm_smmu_v3_write_ste_test_cdtable_to_abort
[    0.512631]     ok 4 arm_smmu_v3_write_ste_test_abort_to_cdtable
[    0.512874]     ok 5 arm_smmu_v3_write_ste_test_cdtable_to_bypass
[    0.513075]     ok 6 arm_smmu_v3_write_ste_test_bypass_to_cdtable
[    0.513275]     ok 7 arm_smmu_v3_write_ste_test_cdtable_s1dss_change
[    0.513466]     ok 8 arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass
[    0.513672]     ok 9 arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass
[    0.514148]     ok 10 arm_smmu_v3_write_ste_test_non_hitless

Which I see is because it did not test the S2 case...

> I'm inclined to leave it set to "use incoming" all the time; the
> only difference I can see is if you have stage-2 translation and a
> master emitting outer-shareable transactions, in which case they'd now
> be outer-shareable instead of inner-shareable, which I think is harmless.

Broadly, it makes sense to me that the iommu would try to have
a consistent translation - that bypass and S2 use different
cacheability doesn't seem great. But isn't the current S2 value of 0
"non-shareable"?

> Additionally, it looks like there's an existing buglet here in that we
> shouldn't set SHCFG if SMMU_IDR1.ATTR_TYPES_OVR == 0.

Ah because the spec says RES0.. I'll add these two into the pile of
random stuff in part 3

> > +	used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
> > +	switch (cfg) {
> > +	case STRTAB_STE_0_CFG_ABORT:
> > +	case STRTAB_STE_0_CFG_BYPASS:
> > +		break;
> > +	case STRTAB_STE_0_CFG_S1_TRANS:
> > +		used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
> > +					    STRTAB_STE_0_S1CTXPTR_MASK |
> > +					    STRTAB_STE_0_S1CDMAX);
> > +		used_bits[1] |=
> > +			cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
> > +				    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
> > +				    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW);
> > +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
> > +		used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
> > +		break;
> > +	case STRTAB_STE_0_CFG_S2_TRANS:
> > +		used_bits[1] |=
> > +			cpu_to_le64(STRTAB_STE_1_EATS);
> > +		used_bits[2] |=
> > +			cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
> > +				    STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
> > +				    STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
> > +		used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
> > +		break;
> 
> With SHCFG fixed, can we go a step further with this and simply identify
> the live qwords directly, rather than on a field-by-field basis? I think
> we should be able to do the same "hitless" transitions you want with the
> coarser granularity.

Not naively, Michael's excellent unit test shows it.. My understanding
of your idea was roughly thus:

void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
{
	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));

	used_bits[0] = U64_MAX;
	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
		return;

	/*
	 * See 13.5 Summary of attribute/permission configuration fields for the
	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
	 * and S2 only.
	 */
	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
		     STRTAB_STE_1_S1DSS_BYPASS))
		used_bits[1] |= U64_MAX;

	used_bits[0] |= U64_MAX;
	switch (cfg) {
	case STRTAB_STE_0_CFG_ABORT:
	case STRTAB_STE_0_CFG_BYPASS:
		break;
	case STRTAB_STE_0_CFG_S1_TRANS:
		used_bits[0] |= U64_MAX;
		used_bits[1] |= U64_MAX;
		used_bits[2] |= U64_MAX;
		break;
	case STRTAB_STE_0_CFG_NESTED:
		used_bits[0] |= U64_MAX;
		used_bits[1] |= U64_MAX;
		fallthrough;
	case STRTAB_STE_0_CFG_S2_TRANS:
		used_bits[1] |= U64_MAX;
		used_bits[2] |= U64_MAX;
		used_bits[3] |= U64_MAX;
		break;

	default:
		memset(used_bits, 0xFF, sizeof(struct arm_smmu_ste));
		WARN_ON(true);
	}
}

And the failures:

[    0.500676]     ok 1 arm_smmu_v3_write_ste_test_bypass_to_abort
[    0.500818]     ok 2 arm_smmu_v3_write_ste_test_abort_to_bypass
[    0.501014]     ok 3 arm_smmu_v3_write_ste_test_cdtable_to_abort
[    0.501197]     ok 4 arm_smmu_v3_write_ste_test_abort_to_cdtable
[    0.501340]     # arm_smmu_v3_write_ste_test_cdtable_to_bypass: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:128
[    0.501340]     Expected test_writer.invalid_entry_written == !hitless, but
[    0.501340]         test_writer.invalid_entry_written == 1 (0x1)
[    0.501340]         !hitless == 0 (0x0)
[    0.501489]     not ok 5 arm_smmu_v3_write_ste_test_cdtable_to_bypass
[    0.501787]     # arm_smmu_v3_write_ste_test_bypass_to_cdtable: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:128
[    0.501787]     Expected test_writer.invalid_entry_written == !hitless, but
[    0.501787]         test_writer.invalid_entry_written == 1 (0x1)
[    0.501787]         !hitless == 0 (0x0)
[    0.501931]     not ok 6 arm_smmu_v3_write_ste_test_bypass_to_cdtable
[    0.502274]     ok 7 arm_smmu_v3_write_ste_test_cdtable_s1dss_change
[    0.502397]     # arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:128
[    0.502397]     Expected test_writer.invalid_entry_written == !hitless, but
[    0.502397]         test_writer.invalid_entry_written == 1 (0x1)
[    0.502397]         !hitless == 0 (0x0)
[    0.502473]     # arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:129
[    0.502473]     Expected test_writer.num_syncs == num_syncs_expected, but
[    0.502473]         test_writer.num_syncs == 3 (0x3)
[    0.502473]         num_syncs_expected == 2 (0x2)
[    0.502784]     not ok 8 arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass
[    0.503073]     # arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:128
[    0.503073]     Expected test_writer.invalid_entry_written == !hitless, but
[    0.503073]         test_writer.invalid_entry_written == 1 (0x1)
[    0.503073]         !hitless == 0 (0x0)
[    0.503176]     # arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass: EXPECTATION FAILED at drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c:129
[    0.503176]     Expected test_writer.num_syncs == num_syncs_expected, but
[    0.503176]         test_writer.num_syncs == 3 (0x3)
[    0.503176]         num_syncs_expected == 2 (0x2)
[    0.503464]     not ok 9 arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass
[    0.503807]     ok 10 arm_smmu_v3_write_ste_test_non_hitless

BYPASS -> S1 requires changing overlapping bits in qword 1. The
programming sequence would look like this:

start qw[1] = SHCFG_INCOMING
      qw[1] = SHCFG_INCOMING | S1DSS
      qw[0] = S1 mode
      qw[1] = S1DSS

The two states are sharing qw[1] and BYPASS ignores all of it except
SHCFG_INCOMING. Since bypass would have its qw[1] marked as used due
to the SHCFG there is no way to express that it is not looking at the
other bits.

We'd have to start doing really hacky things like removing SHCFG as a
used field entirely - but I think if you do that you break the entire
logic of the design and also go back to having programming that only
works if STEs are constructed in certain ways.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 02/17] iommu/arm-smmu-v3: Consolidate the STE generation for abort/bypass
  2024-02-06 15:12   ` Jason Gunthorpe
@ 2024-02-15 17:27     ` Robin Murphy
  -1 siblings, 0 replies; 112+ messages in thread
From: Robin Murphy @ 2024-02-15 17:27 UTC (permalink / raw)
  To: Jason Gunthorpe, iommu, Joerg Roedel, linux-arm-kernel, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On 06/02/2024 3:12 pm, Jason Gunthorpe wrote:
> This allows writing the flow of arm_smmu_write_strtab_ent() around abort
> and bypass domains more naturally.
> 
> Note that the core code no longer supplies NULL domains, though there is
> still a flow in the driver that ends up in arm_smmu_write_strtab_ent() with
> NULL. A later patch will remove it.
> 
> Remove the duplicate calculation of the STE in arm_smmu_init_bypass_stes()
> and remove the force parameter. arm_smmu_rmr_install_bypass_ste() can now
> simply invoke arm_smmu_make_bypass_ste() directly.
> 
> Reviewed-by: Michael Shavit <mshavit@google.com>
> Reviewed-by: Nicolin Chen <nicolinc@nvidia.com>
> Reviewed-by: Mostafa Saleh <smostafa@google.com>
> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
> Tested-by: Moritz Fischer <moritzf@google.com>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> ---
>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 97 ++++++++++++---------
>   1 file changed, 55 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index f0b915567cbcdc..6123e5ad95822c 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -1498,6 +1498,24 @@ static void arm_smmu_write_ste(struct arm_smmu_master *master, u32 sid,
>   	}
>   }
>   
> +static void arm_smmu_make_abort_ste(struct arm_smmu_ste *target)
> +{
> +	memset(target, 0, sizeof(*target));
> +	target->data[0] = cpu_to_le64(
> +		STRTAB_STE_0_V |
> +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT));
> +}
> +
> +static void arm_smmu_make_bypass_ste(struct arm_smmu_ste *target)
> +{
> +	memset(target, 0, sizeof(*target));
> +	target->data[0] = cpu_to_le64(
> +		STRTAB_STE_0_V |
> +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS));
> +	target->data[1] = cpu_to_le64(
> +		FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
> +}
> +
>   static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>   				      struct arm_smmu_ste *dst)
>   {
> @@ -1508,37 +1526,31 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>   	struct arm_smmu_domain *smmu_domain = master->domain;
>   	struct arm_smmu_ste target = {};
>   
> -	if (smmu_domain) {
> -		switch (smmu_domain->stage) {
> -		case ARM_SMMU_DOMAIN_S1:
> -			cd_table = &master->cd_table;
> -			break;
> -		case ARM_SMMU_DOMAIN_S2:
> -			s2_cfg = &smmu_domain->s2_cfg;
> -			break;
> -		default:
> -			break;
> -		}
> +	if (!smmu_domain) {
> +		if (disable_bypass)
> +			arm_smmu_make_abort_ste(&target);
> +		else
> +			arm_smmu_make_bypass_ste(&target);
> +		arm_smmu_write_ste(master, sid, dst, &target);
> +		return;
> +	}
> +
> +	switch (smmu_domain->stage) {
> +	case ARM_SMMU_DOMAIN_S1:
> +		cd_table = &master->cd_table;
> +		break;
> +	case ARM_SMMU_DOMAIN_S2:
> +		s2_cfg = &smmu_domain->s2_cfg;
> +		break;
> +	case ARM_SMMU_DOMAIN_BYPASS:
> +		arm_smmu_make_bypass_ste(&target);
> +		arm_smmu_write_ste(master, sid, dst, &target);
> +		return;
>   	}
>   
>   	/* Nuke the existing STE_0 value, as we're going to rewrite it */
>   	val = STRTAB_STE_0_V;
>   
> -	/* Bypass/fault */
> -	if (!smmu_domain || !(cd_table || s2_cfg)) {
> -		if (!smmu_domain && disable_bypass)
> -			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
> -		else
> -			val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
> -
> -		target.data[0] = cpu_to_le64(val);
> -		target.data[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG,
> -						STRTAB_STE_1_SHCFG_INCOMING));
> -		target.data[2] = 0; /* Nuke the VMID */
> -		arm_smmu_write_ste(master, sid, dst, &target);
> -		return;
> -	}
> -
>   	if (cd_table) {
>   		u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ?
>   			STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1;
> @@ -1583,22 +1595,20 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
>   	arm_smmu_write_ste(master, sid, dst, &target);
>   }
>   
> +/*
> + * This can safely directly manipulate the STE memory without a sync sequence
> + * because the STE table has not been installed in the SMMU yet.
> + */
>   static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab,

This name is long out-of-date - if we're refreshing this area, please 
rename to something relevant to what it actually does, e.g. 
s/bypass/initial/.

Although frankly I also think that at this point we should just get rid 
of the disable_bypass parameter altogether - it's been almost entirely 
meaningless since default domain support was added, and any tenuous 
cases for wanting initial STEs to be bypass should probably be using RMRs 
now anyway.

Thanks,
Robin.

> -				      unsigned int nent, bool force)
> +				      unsigned int nent)
>   {
>   	unsigned int i;
> -	u64 val = STRTAB_STE_0_V;
> -
> -	if (disable_bypass && !force)
> -		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT);
> -	else
> -		val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS);
>   
>   	for (i = 0; i < nent; ++i) {
> -		strtab->data[0] = cpu_to_le64(val);
> -		strtab->data[1] = cpu_to_le64(FIELD_PREP(
> -			STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
> -		strtab->data[2] = 0;
> +		if (disable_bypass)
> +			arm_smmu_make_abort_ste(strtab);
> +		else
> +			arm_smmu_make_bypass_ste(strtab);
>   		strtab++;
>   	}
>   }
> @@ -1626,7 +1636,7 @@ static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid)
>   		return -ENOMEM;
>   	}
>   
> -	arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT, false);
> +	arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT);
>   	arm_smmu_write_strtab_l1_desc(strtab, desc);
>   	return 0;
>   }
> @@ -3245,7 +3255,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>   	reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits);
>   	cfg->strtab_base_cfg = reg;
>   
> -	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents, false);
> +	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents);
>   	return 0;
>   }
>   
> @@ -3956,7 +3966,6 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu)
>   	iort_get_rmr_sids(dev_fwnode(smmu->dev), &rmr_list);
>   
>   	list_for_each_entry(e, &rmr_list, list) {
> -		struct arm_smmu_ste *step;
>   		struct iommu_iort_rmr_data *rmr;
>   		int ret, i;
>   
> @@ -3969,8 +3978,12 @@ static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu)
>   				continue;
>   			}
>   
> -			step = arm_smmu_get_step_for_sid(smmu, rmr->sids[i]);
> -			arm_smmu_init_bypass_stes(step, 1, true);
> +			/*
> +			 * STE table is not programmed to HW, see
> +			 * arm_smmu_init_bypass_stes()
> +			 */
> +			arm_smmu_make_bypass_ste(
> +				arm_smmu_get_step_for_sid(smmu, rmr->sids[i]));
>   		}
>   	}
>   

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-15 16:01       ` Jason Gunthorpe
@ 2024-02-15 18:42         ` Robin Murphy
  -1 siblings, 0 replies; 112+ messages in thread
From: Robin Murphy @ 2024-02-15 18:42 UTC (permalink / raw)
  To: Jason Gunthorpe, Will Deacon
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On 15/02/2024 4:01 pm, Jason Gunthorpe wrote:
> On Thu, Feb 15, 2024 at 01:49:53PM +0000, Will Deacon wrote:
>> Hi Jason,
>>
>> On Tue, Feb 06, 2024 at 11:12:38AM -0400, Jason Gunthorpe wrote:
>>> As the comment in arm_smmu_write_strtab_ent() explains, this routine has
>>> been limited to only work correctly in certain scenarios that the caller
>>> must ensure. Generally the caller must put the STE into ABORT or BYPASS
>>> before attempting to program it to something else.
>>
>> This is looking pretty good now, but I have a few comments inline.
> 
> Ok
> 
>>> @@ -48,6 +48,21 @@ enum arm_smmu_msi_index {
>>>   	ARM_SMMU_MAX_MSIS,
>>>   };
>>>   
>>> +struct arm_smmu_entry_writer_ops;
>>> +struct arm_smmu_entry_writer {
>>> +	const struct arm_smmu_entry_writer_ops *ops;
>>> +	struct arm_smmu_master *master;
>>> +};
>>> +
>>> +struct arm_smmu_entry_writer_ops {
>>> +	unsigned int num_entry_qwords;
>>> +	__le64 v_bit;
>>> +	void (*get_used)(const __le64 *entry, __le64 *used);
>>> +	void (*sync)(struct arm_smmu_entry_writer *writer);
>>> +};
>>
>> Can we avoid the indirection for now, please? I'm sure we'll want it later
>> when you extend this to CDs, but for the initial support it just makes it
>> more difficult to follow the flow. Should be a trivial thing to drop, I
>> hope.
> 
> We can.

Ack, the abstraction is really hard to follow, and much of that seems 
entirely self-inflicted in the amount of recalculation of information 
which was in context in a previous step but then thrown away. And as 
best I can tell it will still end up doing more CFGIs than needed.

Keeping a single monolithic check-and-update function will be *so* much 
easier to understand and maintain. As far as CDs go, anything we might 
reasonably want to change in a live CD is all in the first word so I 
don't see any value in attempting to generalise further on that side of 
things. Maybe arm_smmu_write_ctx_desc() could stand to be a bit 
prettier, but honestly I don't think it's too bad as-is.

>>> +static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
>>>   {
>>> +	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
>>> +
>>> +	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
>>> +	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
>>> +		return;
>>> +
>>> +	/*
>>> +	 * See 13.5 Summary of attribute/permission configuration fields for the
>>> +	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
>>> +	 * and S2 only.
>>> +	 */
>>> +	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
>>> +	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
>>> +	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
>>> +	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
>>> +		     STRTAB_STE_1_S1DSS_BYPASS))
>>> +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
>>
>> Huh, SHCFG is really getting in the way here, isn't it?
> 
> I wouldn't say that.. It is just a complicated bit of the spec. One of
> the things we recently did was to audit all the cache settings and, at
> least, we then realized that SHCFG was being subtly used by S2 as
> well..

Yeah, that really shouldn't be subtle; incoming attributes are replaced 
by S1 translation, thus they are relevant to not-S1 configs.

I think it's likely to be significantly more straightforward to give up 
on the switch statement and jump straight into the more architectural 
paradigm at this level, e.g.

	// Stage 1
	if (cfg & BIT(0)) {
		...
	} else {
		...
	}
	// Stage 2
	if (cfg & BIT(1)) {
		...
	} else {
		...
	}
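
For the SHCFG piece specifically that might end up looking something
like the below (untested sketch, not a drop-in replacement for the
hunk above, and the nested case would want its own audit):

	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));

	/* Stage 1: a CD normally supplies the attributes... */
	if (cfg & BIT(0)) {
		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_S1DSS);
		/* ...unless S1DSS says non-substream traffic bypasses */
		if (FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
		    STRTAB_STE_1_S1DSS_BYPASS)
			used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
	} else if (cfg != STRTAB_STE_0_CFG_ABORT) {
		/* Bypass or stage 2 only: incoming attributes are used */
		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
	}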

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 03/17] iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste()
  2024-02-13 15:37     ` Mostafa Saleh
@ 2024-02-15 19:01       ` Robin Murphy
  -1 siblings, 0 replies; 112+ messages in thread
From: Robin Murphy @ 2024-02-15 19:01 UTC (permalink / raw)
  To: Mostafa Saleh, Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Will Deacon, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Zhangfei Gao

On 13/02/2024 3:37 pm, Mostafa Saleh wrote:
> Hi Jason,
> 
> On Tue, Feb 06, 2024 at 11:12:40AM -0400, Jason Gunthorpe wrote:
>> Logically arm_smmu_init_strtab() is the function that allocates and
>> populates the stream table with the initial value of the STEs. After this
>> function returns the stream table should be fully ready.
>>
>> arm_smmu_rmr_install_bypass_ste() adjusts the initial stream table to force
>> any SIDs that the FW says have IOMMU_RESV_DIRECT to use bypass. This
>> ensures there is no disruption to the identity mapping during boot.
>>
>> Put arm_smmu_rmr_install_bypass_ste() into arm_smmu_init_strtab(), it
>> already executes immediately after arm_smmu_init_strtab().
>>
>> No functional change intended.
> 
> I think arm_smmu_init_strtab is quite low level to abstract FW configuration in it.
> For example in KVM[1] we'd re-use a big part of this driver and rely on similar
> low-level functions. But no strong opinion.

Right, the fact that RMR handling is currently based on bypass STEs is 
an implementation detail; if we ever get round to doing the strict 
version with full-on temporary pagetables, that would obviously not 
belong in init_strtab, thus I would prefer to leave the "handle RMRs" 
step in its appropriate place in the higher-level flow regardless of how 
it happens to be named and implemented today.

Thanks,
Robin.

> 
> [1] https://lore.kernel.org/kvmarm/20230201125328.2186498-1-jean-philippe@linaro.org/
> 
> 
> Thanks,
> Mostafa
> 
>> Tested-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>> Tested-by: Nicolin Chen <nicolinc@nvidia.com>
>> Tested-by: Moritz Fischer <moritzf@google.com>
>> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
>> ---
>>   drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 8 +++++---
>>   1 file changed, 5 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> index 6123e5ad95822c..2ab36dcf7c61f5 100644
>> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
>> @@ -101,6 +101,8 @@ static struct arm_smmu_option_prop arm_smmu_options[] = {
>>   	{ 0, NULL},
>>   };
>>   
>> +static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu);
>> +
>>   static void parse_driver_options(struct arm_smmu_device *smmu)
>>   {
>>   	int i = 0;
>> @@ -3256,6 +3258,7 @@ static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu)
>>   	cfg->strtab_base_cfg = reg;
>>   
>>   	arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents);
>> +
>>   	return 0;
>>   }
>>   
>> @@ -3279,6 +3282,8 @@ static int arm_smmu_init_strtab(struct arm_smmu_device *smmu)
>>   
>>   	ida_init(&smmu->vmid_map);
>>   
>> +	/* Check for RMRs and install bypass STEs if any */
>> +	arm_smmu_rmr_install_bypass_ste(smmu);
>>   	return 0;
>>   }
>>   
>> @@ -4073,9 +4078,6 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>   	/* Record our private device structure */
>>   	platform_set_drvdata(pdev, smmu);
>>   
>> -	/* Check for RMRs and install bypass STEs if any */
>> -	arm_smmu_rmr_install_bypass_ste(smmu);
>> -
>>   	/* Reset the device */
>>   	ret = arm_smmu_device_reset(smmu, bypass);
>>   	if (ret)
>> -- 
>> 2.43.0
>>

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-15 18:42         ` Robin Murphy
@ 2024-02-15 20:11           ` Robin Murphy
  -1 siblings, 0 replies; 112+ messages in thread
From: Robin Murphy @ 2024-02-15 20:11 UTC (permalink / raw)
  To: Jason Gunthorpe, Will Deacon
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On 2024-02-15 6:42 pm, Robin Murphy wrote:
[...]
>>>> +static void arm_smmu_get_ste_used(const __le64 *ent, __le64 
>>>> *used_bits)
>>>>   {
>>>> +    unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, 
>>>> le64_to_cpu(ent[0]));
>>>> +
>>>> +    used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
>>>> +    if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
>>>> +        return;
>>>> +
>>>> +    /*
>>>> +     * See 13.5 Summary of attribute/permission configuration 
>>>> fields for the
>>>> +     * SHCFG behavior. It is only used for BYPASS, including S1DSS 
>>>> BYPASS,
>>>> +     * and S2 only.
>>>> +     */
>>>> +    if (cfg == STRTAB_STE_0_CFG_BYPASS ||
>>>> +        cfg == STRTAB_STE_0_CFG_S2_TRANS ||
>>>> +        (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
>>>> +         FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
>>>> +             STRTAB_STE_1_S1DSS_BYPASS))
>>>> +        used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
>>>
>>> Huh, SHCFG is really getting in the way here, isn't it?
>>
>> I wouldn't say that.. It is just a complicated bit of the spec. One of
>> the things we recently did was to audit all the cache settings and, at
>> least, we then realized that SHCFG was being subtly used by S2 as
>> well..
> 
> Yeah, that really shouldn't be subtle; incoming attributes are replaced 
> by S1 translation, thus they are relevant to not-S1 configs.

That said, in this specific case I don't understand why we're worrying 
about SHCFG here at all - we're never going to make use of any value 
other than "use incoming" because we can't rely on it being implemented 
in the first place, and even if it is, we really don't want to start 
getting into the forced-coherency notion that the DMA layer can't 
understand and devicetree can't describe.

We're still unconditionally setting the "use incoming" value for MTCFG, 
ALLOCCFG, PRIVCFG and INSTCFG without checking them, so there's no logic 
in pretending SHCFG is any different from its peers simply because its 
encoding is slightly less convenient. If the micro-optimisation of not 
setting it when we know it's going to be ignored anyway starts getting 
in the way, just drop that.
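
(Concretely, "just drop that" would mean every arm_smmu_make_*_ste()
variant doing what the bypass one already does, e.g.

	target->data[1] |= cpu_to_le64(
		FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));

after which SHCFG never changes across transitions and stops mattering
for the hitless logic - sketch only, of course.)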

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-15 18:42         ` Robin Murphy
@ 2024-02-15 21:17           ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-15 21:17 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Will Deacon, iommu, Joerg Roedel, linux-arm-kernel, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Thu, Feb 15, 2024 at 06:42:37PM +0000, Robin Murphy wrote:

> > > > @@ -48,6 +48,21 @@ enum arm_smmu_msi_index {
> > > >   	ARM_SMMU_MAX_MSIS,
> > > >   };
> > > > +struct arm_smmu_entry_writer_ops;
> > > > +struct arm_smmu_entry_writer {
> > > > +	const struct arm_smmu_entry_writer_ops *ops;
> > > > +	struct arm_smmu_master *master;
> > > > +};
> > > > +
> > > > +struct arm_smmu_entry_writer_ops {
> > > > +	unsigned int num_entry_qwords;
> > > > +	__le64 v_bit;
> > > > +	void (*get_used)(const __le64 *entry, __le64 *used);
> > > > +	void (*sync)(struct arm_smmu_entry_writer *writer);
> > > > +};
> > > 
> > > Can we avoid the indirection for now, please? I'm sure we'll want it later
> > > when you extend this to CDs, but for the initial support it just makes it
> > > more difficult to follow the flow. Should be a trivial thing to drop, I
> > > hope.
> > 
> > We can.
> 
> Ack, the abstraction is really hard to follow, and much of that
> seems entirely self-inflicted in the amount of recalculating
> information which was in-context in a previous step but then thrown
> away.

I'm not sure I understand this - can you be more specific? I don't
know what we are throwing away that you see.

> And as best I can tell I think it will still end up doing more CFGIs
> than needed.

I think we've minimized the number of steps and Michael did check it,
even pushed tests for the popular scenarios into the kunit. He found a
case where it was not optimal and it was improved.

Mostafa asked about extra syncs, and you can read my reply explaining
why. We both agreed the syncs are necessary.

The only extra thing I know of is the zeroing of fields. Perhaps we
don't have to do this, but I think we should. Operating with the STE
in a known state seems like the conservative choice.

Regardless, if you have a case in mind where there are extra steps,
let's try it in the kunit and check.

This is not a performance path, so I wouldn't invest too much in this
question.

> Keeping a single monolithic check-and-update function will be *so* much
> easier to understand and maintain. 

The ops are used by the kunit test suite and I think the kunit is
valuable.

Further, I've been looking at the AMD driver; it has the same problem
to solve for its DTE and can use this same solution. Intel has > 128
bit structures too. I have already drafted an exploration of using
this algorithm in AMD.

I can see a future where we move this to shared core code, in which
case the driver only provides the used and sync operations, which I
think is a low driver burden for solving such a tricky shared problem.
There is some more shared complexity on x86, which needs to use 128
bit stores if the CPU supports those instructions.

IOW this approach is nice and valuable outside ARM. I would like to
move in a direction where we simply use this shared code for all
multi-qword HW descriptors. We've certainly invested enough in
building it and none of the three drivers have anything better.

> As far as CDs go, anything we might reasonably want to change in a
> live CD is all in the first word so I don't see any value in

Changing from one S1 domain to another requires updating two qwords in
the CD, and that requires the V=0 flow that the current
arm_smmu_write_ctx_desc() doesn't do. It is not that
arm_smmu_write_ctx_desc() needs to be prettier, it needs more
functionality.
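
For reference, the V=0 flow in question is roughly the below - a
sketch only, with sync() standing in for the CFGI + CMD_SYNC step and
cd/target being the live and the newly computed CD:

	cd[0] &= ~cpu_to_le64(CTXDESC_CD_0_V);	/* make the CD invalid */
	sync();
	cd[1] = target[1];		/* other qwords can now change freely */
	sync();
	cd[0] = target[0];		/* publish the new qword 0, V back to 1 */
	sync();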

> > > > +static void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> > > >   {
> > > > +	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
> > > > +
> > > > +	used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
> > > > +	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > > > +		return;
> > > > +
> > > > +	/*
> > > > +	 * See 13.5 Summary of attribute/permission configuration fields for the
> > > > +	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
> > > > +	 * and S2 only.
> > > > +	 */
> > > > +	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
> > > > +	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
> > > > +	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
> > > > +	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
> > > > +		     STRTAB_STE_1_S1DSS_BYPASS))
> > > > +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
> > > > > > Huh, SHCFG is really getting in the way here, isn't it?
> > 
> > I wouldn't say that.. It is just a complicated bit of the spec. One of
> > the things we recently did was to audit all the cache settings and, at
> > least, we then realized that SHCFG was being subtly used by S2 as
> > well..
> 
> Yeah, that really shouldn't be subtle; incoming attributes are replaced by
> S1 translation, thus they are relevant to not-S1 configs.

That is a really nice way to summarize the spec! But my remark was
more about the code, where it isn't so obvious what value it was
intended to have for SHCFG in the S2 case.

This doesn't really change anything about this patch; we'd still have
the above hunk to accurately reflect the SHCFG usage, and we'd still
set SHCFG to 0 in S1 cases where it isn't used by HW, just like today.

> I think it's likely to be significantly more straightforward to give up on
> the switch statement and jump straight into the more architectural paradigm
> at this level, e.g.

I've thought about that; I can make an effort to do this. The later
nesting change would probably look nicer in this style.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 03/17] iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste()
  2024-02-15 19:01       ` Robin Murphy
@ 2024-02-15 21:18         ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-15 21:18 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Mostafa Saleh, iommu, Joerg Roedel, linux-arm-kernel,
	Will Deacon, Lu Baolu, Jean-Philippe Brucker, Joerg Roedel,
	Moritz Fischer, Moritz Fischer, Michael Shavit, Nicolin Chen,
	patches, Shameer Kolothum, Zhangfei Gao

On Thu, Feb 15, 2024 at 07:01:59PM +0000, Robin Murphy wrote:
> On 13/02/2024 3:37 pm, Mostafa Saleh wrote:
> > Hi Jason,
> > 
> > On Tue, Feb 06, 2024 at 11:12:40AM -0400, Jason Gunthorpe wrote:
> > > Logically arm_smmu_init_strtab() is the function that allocates and
> > > populates the stream table with the initial value of the STEs. After this
> > > function returns the stream table should be fully ready.
> > > 
> > > arm_smmu_rmr_install_bypass_ste() adjusts the initial stream table to force
> > > any SIDs that the FW says have IOMMU_RESV_DIRECT to use bypass. This
> > > ensures there is no disruption to the identity mapping during boot.
> > > 
> > > Put arm_smmu_rmr_install_bypass_ste() into arm_smmu_init_strtab(), it
> > > already executes immediately after arm_smmu_init_strtab().
> > > 
> > > No functional change intended.
> > 
> > I think arm_smmu_init_strtab is quite low level to abstract FW configuration in it.
> > For example in KVM[1] we'd re-use a big part of this driver and rely on similar
> > low-level functions. But no strong opinion.
> 
> Right, the fact that RMR handling is currently based on bypass STEs is an
> implementation detail; if we ever get round to doing the strict version with
> full-on temporary pagetables, that would obviously not belong in
> init_strtab, thus I would prefer to leave the "handle RMRs" step in its
> appropriate place in the higher-level flow regardless of how it happens to
> be named and implemented today.

I will drop this patch

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-15 20:11           ` Robin Murphy
@ 2024-02-16 16:28             ` Will Deacon
  -1 siblings, 0 replies; 112+ messages in thread
From: Will Deacon @ 2024-02-16 16:28 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jason Gunthorpe, iommu, Joerg Roedel, linux-arm-kernel, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Thu, Feb 15, 2024 at 08:11:38PM +0000, Robin Murphy wrote:
> On 2024-02-15 6:42 pm, Robin Murphy wrote:
> [...]
> > > > > +static void arm_smmu_get_ste_used(const __le64 *ent, __le64
> > > > > *used_bits)
> > > > >   {
> > > > > +    unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG,
> > > > > le64_to_cpu(ent[0]));
> > > > > +
> > > > > +    used_bits[0] = cpu_to_le64(STRTAB_STE_0_V);
> > > > > +    if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
> > > > > +        return;
> > > > > +
> > > > > +    /*
> > > > > +     * See 13.5 Summary of attribute/permission
> > > > > configuration fields for the
> > > > > +     * SHCFG behavior. It is only used for BYPASS,
> > > > > including S1DSS BYPASS,
> > > > > +     * and S2 only.
> > > > > +     */
> > > > > +    if (cfg == STRTAB_STE_0_CFG_BYPASS ||
> > > > > +        cfg == STRTAB_STE_0_CFG_S2_TRANS ||
> > > > > +        (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
> > > > > +         FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
> > > > > +             STRTAB_STE_1_S1DSS_BYPASS))
> > > > > +        used_bits[1] |= cpu_to_le64(STRTAB_STE_1_SHCFG);
> > > > 
> > > > Huh, SHCFG is really getting in the way here, isn't it?
> > > 
> > > I wouldn't say that.. It is just a complicated bit of the spec. One of
> > > the things we recently did was to audit all the cache settings and, at
> > > least, we then realized that SHCFG was being subtly used by S2 as
> > > well..
> > 
> > Yeah, that really shouldn't be subtle; incoming attributes are replaced
> > by S1 translation, thus they are relevant to not-S1 configs.
> 
> That said, in this specific case I don't understand why we're worrying about
> SHCFG here at all - we're never going to make use of any value other than
> "use incoming" because we can't rely on it being implemented in the first
> place, and even if it is, we really don't want to start getting into the
> forced-coherency notion that the DMA layer can'#t understand and devicetree
> can't describe.

Yup, that's exactly what I'm thinking. We currently set it to NSH when
translation is enabled, so that the stage-2 shareability is effectively
an override. However, the device is either coherent, or it isn't, and so
we should just leave this always set to "use incoming" in my opinion,
which means we no longer need to care about qword 1 for the bypass case.

Will

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 04/17] iommu/arm-smmu-v3: Move the STE generation for S1 and S2 domains into functions
  2024-02-06 15:12   ` Jason Gunthorpe
@ 2024-02-16 17:12     ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-16 17:12 UTC (permalink / raw)
  To: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Will Deacon
  Cc: Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Tue, Feb 06, 2024 at 11:12:41AM -0400, Jason Gunthorpe wrote:
> +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> +					struct arm_smmu_master *master,
> +					struct arm_smmu_domain *smmu_domain)
> +{
> +	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
> +
> +	memset(target, 0, sizeof(*target));
> +	target->data[0] = cpu_to_le64(
> +		STRTAB_STE_0_V |
> +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
> +
> +	target->data[1] = cpu_to_le64(
> +		FIELD_PREP(STRTAB_STE_1_EATS,
> +			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
> +		FIELD_PREP(STRTAB_STE_1_SHCFG,
> +			   STRTAB_STE_1_SHCFG_NON_SHARABLE));

Just so we are on the same page.. The above NON_SHARABLE is a mistake
here since v1.

It is hard to follow arm_smmu_write_strtab_ent() so we all missed that
the S2 ends up re-using the qword[1] that was installed by the
bypass/abort STE that has to be in place prior to installing the S2.

Only the S1 path sets SHCFG to 0 because the HW doesn't use it due to
the current driver not using S1DSS.
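
For reference, a sketch of how the S2 qword 1 above would look with that
fixed, assuming we keep programming SHCFG explicitly rather than dropping
it from the value:

	target->data[1] = cpu_to_le64(
		FIELD_PREP(STRTAB_STE_1_EATS,
			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
		/* 0b01 "use incoming" instead of 0b00 non-shareable */
		FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));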

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 04/17] iommu/arm-smmu-v3: Move the STE generation for S1 and S2 domains into functions
  2024-02-16 17:12     ` Jason Gunthorpe
@ 2024-02-16 17:39       ` Will Deacon
  -1 siblings, 0 replies; 112+ messages in thread
From: Will Deacon @ 2024-02-16 17:39 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Fri, Feb 16, 2024 at 01:12:17PM -0400, Jason Gunthorpe wrote:
> On Tue, Feb 06, 2024 at 11:12:41AM -0400, Jason Gunthorpe wrote:
> > +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> > +					struct arm_smmu_master *master,
> > +					struct arm_smmu_domain *smmu_domain)
> > +{
> > +	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
> > +
> > +	memset(target, 0, sizeof(*target));
> > +	target->data[0] = cpu_to_le64(
> > +		STRTAB_STE_0_V |
> > +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
> > +
> > +	target->data[1] = cpu_to_le64(
> > +		FIELD_PREP(STRTAB_STE_1_EATS,
> > +			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
> > +		FIELD_PREP(STRTAB_STE_1_SHCFG,
> > +			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
> 
> Just so we are on the same page.. The above NON_SHARABLE is a mistake
> here since v1.
> 
> It is hard to follow arm_smmu_write_strtab_ent() so we all missed that
> the S2 ends up re-using the qword[1] that was installed by the
> bypass/abort STE that has to be in place prior to installing the S2.

Ah! I thought you were inheriting the existing behaviour, but yeah, it's
a straight-up bug which I think just makes life a little more difficult
than it needs to be. If we can keep SHCFG as "use incoming" in all
configurations, then I do think we can move to a per-qword rather than a
per-field approach, as mentioned in the other part of the thread. I'll
try to make some time next week to play with it.

Will

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 04/17] iommu/arm-smmu-v3: Move the STE generation for S1 and S2 domains into functions
  2024-02-16 17:39       ` Will Deacon
@ 2024-02-16 17:58         ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-16 17:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Fri, Feb 16, 2024 at 05:39:22PM +0000, Will Deacon wrote:
> On Fri, Feb 16, 2024 at 01:12:17PM -0400, Jason Gunthorpe wrote:
> > On Tue, Feb 06, 2024 at 11:12:41AM -0400, Jason Gunthorpe wrote:
> > > +static void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target,
> > > +					struct arm_smmu_master *master,
> > > +					struct arm_smmu_domain *smmu_domain)
> > > +{
> > > +	struct arm_smmu_s2_cfg *s2_cfg = &smmu_domain->s2_cfg;
> > > +
> > > +	memset(target, 0, sizeof(*target));
> > > +	target->data[0] = cpu_to_le64(
> > > +		STRTAB_STE_0_V |
> > > +		FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS));
> > > +
> > > +	target->data[1] = cpu_to_le64(
> > > +		FIELD_PREP(STRTAB_STE_1_EATS,
> > > +			   master->ats_enabled ? STRTAB_STE_1_EATS_TRANS : 0) |
> > > +		FIELD_PREP(STRTAB_STE_1_SHCFG,
> > > +			   STRTAB_STE_1_SHCFG_NON_SHARABLE));
> > 
> > Just so we are on the same page.. The above NON_SHARABLE is a mistake
> > here since v1.
> > 
> > It is hard to follow arm_smmu_write_strtab_ent() so we all missed that
> > the S2 ends up re-using the qword[1] that was installed by the
> > bypass/abort STE that has to be in place prior to installing the S2.
> 
> Ah! I thought you were inheriting the existing behaviour, but yeah,

Yeah, so did I..

> it's a straight-up bug which I think just makes life a little more
> difficult than it needs to be. If we can keep SHCFG as "use
> incoming" in all configurations, then I do think we can move to a
> per-qword rather than a per-field approach, as mentioned in the
> other part of the thread. I'll try to make some time next week to
> play with it.

I'm sure you can make per-qword work. I think it will be worse code
though because doing so will have to compromise some of the
underpinning logical principles:

 - Used bits reflect actual HW behavior and flow from the
   spec's IGNORED/etc language
 - Make STE functions set the bits that the HW uses and no extra bits
 - The make STE functions create a complete STE

I already tried the naive version where none of the above are
compromised and it does not work. Someone else may have an idea. IMHO
this is really not a valuable avenue to use all of our limited time
on.

I'm getting the existing remarks typed in, and it is already turning into
some work, but if you feel strongly please come next week with exactly
what you will accept, and I will ensure whatever it is gets done if you
will commit to merging it. Even if we have to toss out Michael's version too.

I can always bring back this version for some future shared code
thing.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-15 16:01       ` Jason Gunthorpe
@ 2024-02-21 13:49         ` Will Deacon
  -1 siblings, 0 replies; 112+ messages in thread
From: Will Deacon @ 2024-02-21 13:49 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Thu, Feb 15, 2024 at 12:01:35PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 15, 2024 at 01:49:53PM +0000, Will Deacon wrote:
> > On Tue, Feb 06, 2024 at 11:12:38AM -0400, Jason Gunthorpe wrote:
> > > @@ -48,6 +48,21 @@ enum arm_smmu_msi_index {
> > >  	ARM_SMMU_MAX_MSIS,
> > >  };
> > >  
> > > +struct arm_smmu_entry_writer_ops;
> > > +struct arm_smmu_entry_writer {
> > > +	const struct arm_smmu_entry_writer_ops *ops;
> > > +	struct arm_smmu_master *master;
> > > +};
> > > +
> > > +struct arm_smmu_entry_writer_ops {
> > > +	unsigned int num_entry_qwords;
> > > +	__le64 v_bit;
> > > +	void (*get_used)(const __le64 *entry, __le64 *used);
> > > +	void (*sync)(struct arm_smmu_entry_writer *writer);
> > > +};
> > 
> > Can we avoid the indirection for now, please? I'm sure we'll want it later
> > when you extend this to CDs, but for the initial support it just makes it
> > more difficult to follow the flow. Should be a trivial thing to drop, I
> > hope.
> 
> We can.

Thanks.

> > I think it also means we don't have a "hitless" transition from
> > stage-2 translation -> bypass.
> 
> Hmm, I didn't notice that. The kunit passed:
> 
> [    0.511483] 1..1
> [    0.511510]     KTAP version 1
> [    0.511551]     # Subtest: arm-smmu-v3-kunit-test
> [    0.511592]     # module: arm_smmu_v3_test
> [    0.511594]     1..10
> [    0.511910]     ok 1 arm_smmu_v3_write_ste_test_bypass_to_abort
> [    0.512110]     ok 2 arm_smmu_v3_write_ste_test_abort_to_bypass
> [    0.512386]     ok 3 arm_smmu_v3_write_ste_test_cdtable_to_abort
> [    0.512631]     ok 4 arm_smmu_v3_write_ste_test_abort_to_cdtable
> [    0.512874]     ok 5 arm_smmu_v3_write_ste_test_cdtable_to_bypass
> [    0.513075]     ok 6 arm_smmu_v3_write_ste_test_bypass_to_cdtable
> [    0.513275]     ok 7 arm_smmu_v3_write_ste_test_cdtable_s1dss_change
> [    0.513466]     ok 8 arm_smmu_v3_write_ste_test_s1dssbypass_to_stebypass
> [    0.513672]     ok 9 arm_smmu_v3_write_ste_test_stebypass_to_s1dssbypass
> [    0.514148]     ok 10 arm_smmu_v3_write_ste_test_non_hitless
> 
> Which I see is because it did not test the S2 case...

Oops!

> > Additionally, it looks like there's an existing buglet here in that we
> > shouldn't set SHCFG if SMMU_IDR1.ATTR_TYPES_OVR == 0.
> 
> Ah because the spec says RES0.. I'll add these two into the pile of
> random stuff in part 3

I don't think this needs to wait until part 3, but it also doesn't need to
be part of your series. I'll make a note that we can improve this.

> > > +	used_bits[0] |= cpu_to_le64(STRTAB_STE_0_CFG);
> > > +	switch (cfg) {
> > > +	case STRTAB_STE_0_CFG_ABORT:
> > > +	case STRTAB_STE_0_CFG_BYPASS:
> > > +		break;
> > > +	case STRTAB_STE_0_CFG_S1_TRANS:
> > > +		used_bits[0] |= cpu_to_le64(STRTAB_STE_0_S1FMT |
> > > +					    STRTAB_STE_0_S1CTXPTR_MASK |
> > > +					    STRTAB_STE_0_S1CDMAX);
> > > +		used_bits[1] |=
> > > +			cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR |
> > > +				    STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH |
> > > +				    STRTAB_STE_1_S1STALLD | STRTAB_STE_1_STRW);
> > > +		used_bits[1] |= cpu_to_le64(STRTAB_STE_1_EATS);
> > > +		used_bits[2] |= cpu_to_le64(STRTAB_STE_2_S2VMID);
> > > +		break;
> > > +	case STRTAB_STE_0_CFG_S2_TRANS:
> > > +		used_bits[1] |=
> > > +			cpu_to_le64(STRTAB_STE_1_EATS);
> > > +		used_bits[2] |=
> > > +			cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR |
> > > +				    STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI |
> > > +				    STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2R);
> > > +		used_bits[3] |= cpu_to_le64(STRTAB_STE_3_S2TTB_MASK);
> > > +		break;
> > 
> > With SHCFG fixed, can we go a step further with this and simply identify
> > the live qwords directly, rather than on a field-by-field basis? I think
> > we should be able to do the same "hitless" transitions you want with the
> > coarser granularity.
> 
> Not naively, Michael's excellent unit test shows it.. My understanding
> of your idea was roughly thus:
> 
> void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits)
> {
> 	unsigned int cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]));
> 
> 	used_bits[0] = U64_MAX;
> 	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
> 		return;
> 
> 	/*
> 	 * See 13.5 Summary of attribute/permission configuration fields for the
> 	 * SHCFG behavior. It is only used for BYPASS, including S1DSS BYPASS,
> 	 * and S2 only.
> 	 */
> 	if (cfg == STRTAB_STE_0_CFG_BYPASS ||
> 	    cfg == STRTAB_STE_0_CFG_S2_TRANS ||
> 	    (cfg == STRTAB_STE_0_CFG_S1_TRANS &&
> 	     FIELD_GET(STRTAB_STE_1_S1DSS, le64_to_cpu(ent[1])) ==
> 		     STRTAB_STE_1_S1DSS_BYPASS))
> 		used_bits[1] |= U64_MAX;
> 
> 	used_bits[0] |= U64_MAX;
> 	switch (cfg) {
> 	case STRTAB_STE_0_CFG_ABORT:
> 	case STRTAB_STE_0_CFG_BYPASS:
> 		break;
> 	case STRTAB_STE_0_CFG_S1_TRANS:
> 		used_bits[0] |= U64_MAX;
> 		used_bits[1] |= U64_MAX;
> 		used_bits[2] |= U64_MAX;
> 		break;
> 	case STRTAB_STE_0_CFG_NESTED:
> 		used_bits[0] |= U64_MAX;
> 		used_bits[1] |= U64_MAX;
> 		fallthrough;
> 	case STRTAB_STE_0_CFG_S2_TRANS:
> 		used_bits[1] |= U64_MAX;
> 		used_bits[2] |= U64_MAX;
> 		used_bits[3] |= U64_MAX;
> 		break;

Very roughly, yes, although I'd go further and just return a bitmap of
used qwords instead of tracking these bits. Basically, we could have some
#defines saying which qwords are used by which configs, and then we can
simplify the algorithm while retaining the ability to reject updates
to qwords which we're not expecting.
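
A rough sketch of that shape, with made-up names purely for illustration
(not tested code):

#define STE_USED_QWORDS_ABORT	BIT(0)
#define STE_USED_QWORDS_BYPASS	(BIT(0) | BIT(1))	/* qword 1 only for SHCFG */
#define STE_USED_QWORDS_S1	(BIT(0) | BIT(1) | BIT(2))
#define STE_USED_QWORDS_S2	(BIT(0) | BIT(1) | BIT(2) | BIT(3))

static u8 arm_smmu_get_ste_used_qwords(const __le64 *ent)
{
	if (!(ent[0] & cpu_to_le64(STRTAB_STE_0_V)))
		return BIT(0);

	switch (FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(ent[0]))) {
	case STRTAB_STE_0_CFG_ABORT:
		return STE_USED_QWORDS_ABORT;
	case STRTAB_STE_0_CFG_BYPASS:
		return STE_USED_QWORDS_BYPASS;
	case STRTAB_STE_0_CFG_S1_TRANS:
		return STE_USED_QWORDS_S1;
	case STRTAB_STE_0_CFG_S2_TRANS:
	default:
		return STE_USED_QWORDS_S2;
	}
}

The writer would then diff whole qwords and only accept in-place updates
to qwords covered by the masks.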

> And the failures:

[...]

> BYPASS -> S1 requires changing overlapping bits in qword 1. The
> programming sequence would look like this:
> 
> start qw[1] = SHCFG_INCOMING
>       qw[1] = SHCFG_INCOMING | S1DSS
>       qw[0] = S1 mode
>       qw[1] = S1DSS
> 
> The two states are sharing qw[1] and BYPASS ignores all of it except
> SHCFG_INCOMING. Since bypass would have its qw[1] marked as used due
> to the SHCFG there is no way to express that it is not looking at the
> other bits.
> 
> We'd have to really start doing really hacky things like remove the
> SHCFG as a used field entirely - but I think if you do that you break
> the entire logic of the design and also go backwards to having
> programming that only works if STEs are constructed in certain ways.

I would actually like to remove SHCFG as a used field. If the encoding
was less whacky (i.e. if 0b00 always meant "use incoming"), then it would
be easy, but it shouldn't be too hard to work around that.

Then BYPASS doesn't need to worry about qword 1 at all.
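
One possible shape for that workaround (hypothetical helper, sketch only):
have the STE writer force the field to "use incoming" on every entry it
programs, so the value never changes between configurations and get_used()
can stop reporting it:

static void arm_smmu_ste_force_shcfg_incoming(struct arm_smmu_ste *target)
{
	/* Overwrite whatever the make function left in SHCFG */
	target->data[1] &= ~cpu_to_le64(STRTAB_STE_1_SHCFG);
	target->data[1] |= cpu_to_le64(
		FIELD_PREP(STRTAB_STE_1_SHCFG, STRTAB_STE_1_SHCFG_INCOMING));
}

(With the caveat from earlier in the thread that the field is RES0 when
SMMU_IDR1.ATTR_TYPES_OVR is not implemented, so this would have to be
conditional on that.)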

Will

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-21 13:49         ` Will Deacon
@ 2024-02-21 14:08           ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-21 14:08 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Wed, Feb 21, 2024 at 01:49:23PM +0000, Will Deacon wrote:

> Very roughly, yes, although I'd go further and just return a bitmap of
> used qwords instead of tracking these bits. Basically, we could have some
> #defines saying which qwords are used by which configs, 

I don't think this will work well for CD's EPD0 case..

static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
{
	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
		return;
	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));

	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
	if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
		used_bits[0] &= ~cpu_to_le64(
			CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
			CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
			CTXDESC_CD_0_TCR_SH0);
		used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
	}
}

> and then we can
> simplify the algorithm while retaining the ability to reject updates
> to qwords which we're not expecting.

It is not much simplification. arm_smmu_entry_qword_diff() gets a bit
shorter (not that it is complex anyhow) and other stuff gets worse.

> > We'd have to really start doing really hacky things like remove the
> > SHCFG as a used field entirely - but I think if you do that you break
> > the entire logic of the design and also go backwards to having
> > programming that only works if STEs are constructed in certain ways.
> 
> I would actually like to remove SHCFG as a used field. If the encoding
> was less whacky (i.e. if 0b00 always meant "use incoming"), then it would
> be easy, but it shouldn't be too hard to work around that.

But why?

You throw away the entire logic of the design, go back to subtly
coupling the two parts, and *for what*? Exactly what are we trying to
achieve in return? You haven't explained why we are still discussing
this after 7 months. It really isn't worthwhile.

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-21 14:08           ` Jason Gunthorpe
@ 2024-02-21 16:19             ` Michael Shavit
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Shavit @ 2024-02-21 16:19 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Will Deacon, iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Nicolin Chen, patches, Shameer Kolothum,
	Mostafa Saleh, Zhangfei Gao

On Wed, Feb 21, 2024 at 10:08 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Wed, Feb 21, 2024 at 01:49:23PM +0000, Will Deacon wrote:
>
> > Very roughly, yes, although I'd go further and just return a bitmap of
> > used qwords instead of tracking these bits. Basically, we could have some
> > #defines saying which qwords are used by which configs,
>
> I don't think this will work well for CD's EPD0 case..
>
> static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> {
>         used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
>         if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
>                 return;
>         memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
>
>         /* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
>         if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
>                 used_bits[0] &= ~cpu_to_le64(
>                         CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
>                         CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
>                         CTXDESC_CD_0_TCR_SH0);
>                 used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
>         }
> }
>
> > and then we can
> > simplify the algorithm while retaining the ability to reject updates
> > to qwords which we're not expecting.
>
> It is not much simplification. arm_smmu_entry_qword_diff() gets a bit
> shorter (not that it is complex anyhow) and other stuff gets worse.

I think the simplification here is in the first if branch of
arm_smmu_write_ste. With Will's proposal, we only perform a hitless
update if there's a single used qword that needs updating. There's no
longer a case where we first set unused bits in qwords whose other
bits are in use. I'd argue that setting unused bits of a qword was
very clever and removing the logic does conceptually simplify things,
although yes it's not much fewer lines of code. I also don't think
this throws away the entire logic of the current design, the idea of
counting the number of qwords that differ and writing qwords that are
unused first is still there.
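
As a sketch of that rule, reusing the hypothetical per-qword mask helper
from earlier in the thread (names made up, not tested):

static bool ste_update_is_hitless(const struct arm_smmu_ste *cur,
				  const struct arm_smmu_ste *target)
{
	/* Qwords the HW is actively reading in both the old and new config */
	u8 used = arm_smmu_get_ste_used_qwords(cur->data) &
		  arm_smmu_get_ste_used_qwords(target->data);
	unsigned int i, diff = 0;

	for (i = 0; i != ARRAY_SIZE(cur->data); i++)
		if ((used & BIT(i)) && cur->data[i] != target->data[i])
			diff++;

	/* Hitless only if at most one in-use qword needs to change */
	return diff <= 1;
}

The writer would still set target-only qwords before flipping the critical
qword and clean up cur-only qwords afterwards; anything needing more than
one critical change falls back to the break-before-make V=0 path.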

But, it does mean that hitless updates are only possible under
narrower circumstances... We now have to figure out whether there are
transitions where this is problematic, whereas previously we could assume
that we'd always get the best behavior possible. Both in the present
(i.e. this SHCFG discussion and the EPD0 case) and in the future if new
parts of the configs start getting used. IMO not having to think about
this is a meaningful advantage of the current solution.

> > > We'd have to really start doing really hacky things like remove the
> > > SHCFG as a used field entirely - but I think if you do that you break
> > > the entire logic of the design and also go backwards to having
> > > programming that only works if STEs are constructed in certain ways.
> >
> > I would actually like to remove SHCFG as a used field. If the encoding
> > was less whacky (i.e. if 0b00 always meant "use incoming"), then it would
> > be easy, but it shouldn't be too hard to work around that.
>

What do you mean by removing SHCFG as a used field? Are we changing
the driver so that it only ever sets SHCFG to a single possible value?
Or are we talking about fudging things and pretending it's not used
when it is and might have different values?

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-21 16:19             ` Michael Shavit
@ 2024-02-21 16:52               ` Michael Shavit
  -1 siblings, 0 replies; 112+ messages in thread
From: Michael Shavit @ 2024-02-21 16:52 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Will Deacon, iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Nicolin Chen, patches, Shameer Kolothum,
	Mostafa Saleh, Zhangfei Gao

On Thu, Feb 22, 2024 at 12:19 AM Michael Shavit <mshavit@google.com> wrote:
>
> On Wed, Feb 21, 2024 at 10:08 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
> >
> > On Wed, Feb 21, 2024 at 01:49:23PM +0000, Will Deacon wrote:
> >
> > > Very roughly, yes, although I'd go further and just return a bitmap of
> > > used qwords instead of tracking these bits. Basically, we could have some
> > > #defines saying which qwords are used by which configs,
> >
> > I don't think this will work well for CD's EPD0 case..
> >
> > static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> > {
> >         used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
> >         if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> >                 return;
> >         memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
> >
> >         /* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
> >         if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
> >                 used_bits[0] &= ~cpu_to_le64(
> >                         CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
> >                         CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
> >                         CTXDESC_CD_0_TCR_SH0);
> >                 used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
> >         }
> > }
> >
> > > and then we can
> > > simplify the algorithm while retaining the ability to reject updates
> > > to qwords which we're not expecting.
> >
> > It is not much simplification. arm_smmu_entry_qword_diff() gets a bit
> > shorter (not that it is complex anyhow) and other stuff gets worse.
>
> I think the simplification here is in the first if branch of
> arm_smmu_write_ste. With Will's proposal, we only perform a hitless
> update if there's a single used qword that needs updating. There's no
> longer a case where we first set unused bits in qwords whose other
> bits are in use. I'd argue that setting unused bits of a q word was
> very clever and removing the logic does conceptually simplify things,
> although yes it's not much fewer lines of code. I also don't think
> this throws away the entire logic of the current design, the idea of
> counting the number of qwords that differ and writing qwords that are
> unused first is still there.
>
> But, it does mean that hitless updates are only possible under
> narrower circumstances...We now have to figure out if there are
> transitions where this is problematic where we could previously assume
> that we'd always get the best behavior possible. Both in the present
> (i.e this SHCFG discussion and EPD0 case) and in the future if new
> parts of the configs start getting used. IMO not having to think about
> this is a meaningful advantage of the current solution.

To be more explicit, I hope we can keep the current solution. The
tests we added mitigate the extra complexity, while there's no
certainty that the 1-bit-per-qword proposal will always be
satisfactory in the future (nor have we even reached consensus that it
is satisfactory in the present with the part 2 CD series)

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-21 16:19             ` Michael Shavit
@ 2024-02-21 17:06               ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-21 17:06 UTC (permalink / raw)
  To: Michael Shavit
  Cc: Will Deacon, iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy,
	Lu Baolu, Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Nicolin Chen, patches, Shameer Kolothum,
	Mostafa Saleh, Zhangfei Gao

On Thu, Feb 22, 2024 at 12:19:06AM +0800, Michael Shavit wrote:
> I think the simplification here is in the first if branch of
> arm_smmu_write_ste. With Will's proposal, we only perform a hitless
> update if there's a single used qword that needs updating.

The normal cases like BYPASS -> S1 still require updating QW[1,2]
before updating QW[0], and the reverse as well. That still needs the
three entry_set()'s to process the same way. 

From what I can see if we did 1 bit per qw:

 - get_used becomes harder to explain but shorter (we ignore the used
   qw 1 for bypass/abort)
 - arm_smmu_entry_qword_diff becomes a bit simpler, less bitwise logic,
   no unused_update
 - arm_smmu_write_entry() has the same logic but unused_update is
   replaced by target
 - We have to hack something to make SHCFG=1 - change the make
   functions or have arm_smmu_write_ste() force SHCFG=1
 - We have to write a separate programming logic for CD -
   always do V=0/1 for normal updates, and a special EPD0 flow.

All doable, but I don't see the benefit in aggregate..

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 02/17] iommu/arm-smmu-v3: Consolidate the STE generation for abort/bypass
  2024-02-15 17:27     ` Robin Murphy
@ 2024-02-22 17:40       ` Will Deacon
  -1 siblings, 0 replies; 112+ messages in thread
From: Will Deacon @ 2024-02-22 17:40 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Jason Gunthorpe, iommu, Joerg Roedel, linux-arm-kernel, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Thu, Feb 15, 2024 at 05:27:09PM +0000, Robin Murphy wrote:
> On 06/02/2024 3:12 pm, Jason Gunthorpe wrote:
> > +/*
> > + * This can safely directly manipulate the STE memory without a sync sequence
> > + * because the STE table has not been installed in the SMMU yet.
> > + */
> >   static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab,
> 
> This name is long out-of-date - if we're refreshing this area, please rename
> to something relevant to what it actually does, e.g. s/bypass/initial/.
> 
> Although frankly I also think that at this point we should just get rid of
> the disable_bypass parameter altogether - it's been almost entirely
> meaningless since default domain support was added, and any tenuous cases
> for wanting initial STEs to be bypass should probably be using RMRs now
> anyway.

We probably can't drop it for SMMUv2, but I'd be more than happy to do so
on SMMUv3. I think one of the reasons for keeping it in the new driver was
that the Arm Fast Model used to need it for legacy virtio devices that were
downstream of the SMMU but couldn't advertise F_ACCESS_PLATFORM. Was that
fixed?

Will

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-21 14:08           ` Jason Gunthorpe
@ 2024-02-22 17:43             ` Will Deacon
  -1 siblings, 0 replies; 112+ messages in thread
From: Will Deacon @ 2024-02-22 17:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Wed, Feb 21, 2024 at 10:08:18AM -0400, Jason Gunthorpe wrote:
> On Wed, Feb 21, 2024 at 01:49:23PM +0000, Will Deacon wrote:
> 
> > Very roughly, yes, although I'd go further and just return a bitmap of
> > used qwords instead of tracking these bits. Basically, we could have some
> > #defines saying which qwords are used by which configs, 
> 
> I don't think this will work well for CD's EPD0 case..
> 
> static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> {
> 	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
> 	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> 		return;
> 	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
> 
> 	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
> 	if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
> 		used_bits[0] &= ~cpu_to_le64(
> 			CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
> 			CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
> 			CTXDESC_CD_0_TCR_SH0);
> 		used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
> 	}
> }

Please can you explain more about the issue here? I know what EPDx are,
but I'm not understanding why they're problematic. This presumably
involves a hitless transition to/from an aborting CD?

> > and then we can
> > simplify the algorithm while retaining the ability to reject updates
> > to qwords which we're not expecting.
> 
> It is not much simplification. arm_smmu_entry_qword_diff() gets a bit
> shorter (not that it is complex anyhow) and other stuff gets worse.
> 
> > > We'd have to really start doing really hacky things like remove the
> > > SHCFG as a used field entirely - but I think if you do that you break
> > > the entire logic of the design and also go backwards to having
> > > programming that only works if STEs are constructed in certain ways.
> > 
> > I would actually like to remove SHCFG as a used field. If the encoding
> > was less whacky (i.e. if 0b00 always meant "use incoming"), then it would
> > be easy, but it shouldn't be too hard to work around that.
> 
> But why?
> 
> You throw away the entire logic of the design, go back to subtly
> coupling the two parts, and *for what*? Exactly what are we trying to
> achieve in return? You haven't explained why we are still discussing
> this after 7 months. It really isn't worthwhile.

I'm just trying to avoid introducing dynamic behaviours to the driver
which aren't actually used, and per-qword tracking feels like an easier
way to maintain the hitless updates for the cases you care about. It's
really not about throwing away the entire logic of the design -- as I
said, I think this is looking pretty good. I'm also absolutely open to
being convinced that per-field makes more sense and per-qword is terrible,
so I'd really like to understand the EPD0 case more.

As an aside: is this per-field/per-qword discussion the only thing holding
up a v6? With the rest of the feedback addressed and a version of Michael's
selftest that exercises stage-2 translating domains, I'd like to think
we could get it queued up soon.

Cheers,

Will

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-22 17:43             ` Will Deacon
@ 2024-02-23 15:18               ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-23 15:18 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Thu, Feb 22, 2024 at 05:43:46PM +0000, Will Deacon wrote:
> On Wed, Feb 21, 2024 at 10:08:18AM -0400, Jason Gunthorpe wrote:
> > On Wed, Feb 21, 2024 at 01:49:23PM +0000, Will Deacon wrote:
> > 
> > > Very roughly, yes, although I'd go further and just return a bitmap of
> > > used qwords instead of tracking these bits. Basically, we could have some
> > > #defines saying which qwords are used by which configs, 
> > 
> > I don't think this will work well for CD's EPD0 case..
> > 
> > static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> > {
> > 	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
> > 	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> > 		return;
> > 	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
> > 
> > 	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
> > 	if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
> > 		used_bits[0] &= ~cpu_to_le64(
> > 			CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
> > 			CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
> > 			CTXDESC_CD_0_TCR_SH0);
> > 		used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
> > 	}
> > }
> 
> Please can you explain more about the issue here? I know what EPDx are,
> but I'm not understanding why they're problematic. This presumably
> involves a hitless transition to/from an aborting CD?

When a process using SVA exits uncleanly the MM is released so the
SMMU HW must stop chasing the page table pointers since all that
memory will be freed.

However, in an unclean exit we can't control the order of shutdown so
something like uacce or RDMA may not have quieted the DMA device yet.

So there is a period during shutdown where the mm has been released
and the device is doing DMA; the desire is that the DMA continues to be
handled as PRI and the SW returns failure for all PRI requests.

Specifically we do not want to trigger any dmesg log events during
this condition.

Jean-Philippe came up with this solution where we hitlessly use EPD0
in release to allow the mm to release the page table while continuing
to use the PRI flow.

So it is going from a "SVA domain with a page table" to a "SVA domain
without a page table but EPD0 set", hitlessly.
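
As a rough sketch of that release-path target (made-up constants
standing in for the CTXDESC_CD_* definitions, not the actual SVA
mm-release code), the "quiet" CD is just the existing CD with EPD0
set and the now-unused TTB0/TCR0 fields cleared:

#include <stdint.h>
#include <string.h>

#define FAKE_CD_QWORDS		8
#define FAKE_CD_0_TCR_EPD0	(1ULL << 14)
#define FAKE_CD_0_TCR0_FIELDS	0x3fffULL	/* T0SZ/TG0/IR0/OR0/SH0 */
#define FAKE_CD_1_TTB0_MASK	0x0000fffffffffff0ULL

/*
 * Illustrative only: build the "page table released" CD from the live
 * SVA CD. The writer can then install it hitlessly because, per the
 * get_used logic above, the cleared fields are IGNORED once EPD0 is
 * observed.
 */
static void make_sva_release_cd(uint64_t *target, const uint64_t *live)
{
	memcpy(target, live, FAKE_CD_QWORDS * sizeof(*target));

	target[0] |= FAKE_CD_0_TCR_EPD0;	/* disable TTB0 walks */
	target[0] &= ~FAKE_CD_0_TCR0_FIELDS;	/* ignored with EPD0 set */
	target[1] &= ~FAKE_CD_1_TTB0_MASK;	/* drop the page table pointer */
}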


> I'm just trying to avoid introducing dynamic behaviours to the driver
> which aren't actually used, and per-qword tracking feels like an easier
> way to maintain the hitless updates for the cases you care about. It's
> really not about throwing away the entire logic of the design -- as I
> said, I think this is looking pretty good. I'm also absolutely open to
> being convinced that per-field makes more sense and per-qword is terrible,
> so I'd really like to understand the EPD0 case more.

It is not that one makes more sense and the other is terrible; it is
more that we have to make some trade-offs. I outlined what I think
would be needed to make per-qw work in the other email:

 - get_used becomes harder to explain but shorter (we ignore the used
   qw 1 for bypass/abort)
 - arm_smmu_entry_qword_diff becomes a bit simpler, less bitwise logic,
   no unused_update
 - arm_smmu_write_entry() has the same logic but unused_update is
   replaced by target
 - We have to hack something to make SHCFG=1 - change the make
   functions or have arm_smmu_write_ste() force SHCFG=1.
 - We have to write separate programming logic for CD -
   always do V=0/1 for normal updates, and a special EPD0 flow.
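
For concreteness, the per-qword variant of the used callback might
look roughly like this (illustrative only, made-up names and
constants; note qw1 is deliberately reported unused for bypass/abort
even though SHCFG lives there, which is the "harder to explain" part):

#include <stdint.h>

#define FAKE_STE_0_V		(1ULL << 0)
#define FAKE_STE_0_CFG		(0x7ULL << 1)
#define FAKE_STE_0_CFG_ABORT	(0x0ULL << 1)
#define FAKE_STE_0_CFG_BYPASS	(0x4ULL << 1)
#define FAKE_STE_0_CFG_S1	(0x5ULL << 1)

/* Returns a bitmap with bit i set if qword i is considered used. */
static unsigned int fake_ste_used_qwords(const uint64_t *ent)
{
	if (!(ent[0] & FAKE_STE_0_V))
		return 1 << 0;			/* only V matters */

	switch (ent[0] & FAKE_STE_0_CFG) {
	case FAKE_STE_0_CFG_ABORT:
	case FAKE_STE_0_CFG_BYPASS:
		return 1 << 0;			/* pretend qw1/SHCFG is unused */
	case FAKE_STE_0_CFG_S1:
		return (1 << 0) | (1 << 1);	/* CD table ptr + S1 attributes */
	default:
		return 0xff;			/* be conservative: all qwords */
	}
}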

I think it is worse overall because none of those trade-offs really
make the code clearer, and I dislike the idea of open coding
CD. Especially now that we have a test suite that requires the ops
anyhow.

It is a minor decision; trust Michael and me to make this choice. We
both agree now and have spent a lot of time studying this.

> As an aside: is this per-field/per-qword discussion the only thing holding
> up a v6?

As far as I know, yes. I have not typed in all the feedback yet, but I
hope to get that done today. I will try to post it by Monday so we can
see what it looks like with Robin's suggestion but without per-qw.

> With the rest of the feedback addressed and a version of Michael's
> selftest that exercises stage-2 translating domains, I'd like to
> think we could get it queued up soon.

I would really like this; we have so many more patches to work on. You
probably saw the HTTU stuff was reposted again, we have a clean full
BTM enablement on the list for the first time, nesting patches, and
more. Including this, I'm tracking a work list of about 100-150
patches for SMMUv3 over the next little while.

This is not unique to SMMUv3, AMD is on part 6 of work for their
driver, and Intel has been pushing ~10-20 patches/cycle pretty
reliably. iommufd has opened the door to actually solving a lot of the
stuck problems and everyone is rushing to complete their previously
stalled HW enablement. I have to review and help design all of this
work too! :)

BTW Michael's self test won't be in part 1 because it needs the ops to
be restored in order to work (now done in part 2), and has a few other
more minor dependencies on part 2 and 3.

Thanks,
Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 02/17] iommu/arm-smmu-v3: Consolidate the STE generation for abort/bypass
  2024-02-15 17:27     ` Robin Murphy
@ 2024-02-23 18:53       ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-23 18:53 UTC (permalink / raw)
  To: Robin Murphy
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Will Deacon, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Thu, Feb 15, 2024 at 05:27:09PM +0000, Robin Murphy wrote:
> > @@ -1583,22 +1595,20 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> >   	arm_smmu_write_ste(master, sid, dst, &target);
> >   }
> > +/*
> > + * This can safely directly manipulate the STE memory without a sync sequence
> > + * because the STE table has not been installed in the SMMU yet.
> > + */
> >   static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab,
> 
> This name is long out-of-date - if we're refreshing this area, please rename
> to something relevant to what it actually does, e.g. s/bypass/initial/.

Done

> Although frankly I also think that at this point we should just get rid of
> the disable_bypass parameter altogether - it's been almost entirely
> meaningless since default domain support was added, and any tenuous cases
> for wanting initial STEs to be bypass should probably be using RMRs now
> anyway.

I can write the patch for this if you and Will agree

Thanks,
Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 02/17] iommu/arm-smmu-v3: Consolidate the STE generation for abort/bypass
  2024-02-23 18:53       ` Jason Gunthorpe
@ 2024-02-27 10:50         ` Will Deacon
  -1 siblings, 0 replies; 112+ messages in thread
From: Will Deacon @ 2024-02-27 10:50 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Robin Murphy, iommu, Joerg Roedel, linux-arm-kernel, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Fri, Feb 23, 2024 at 02:53:58PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 15, 2024 at 05:27:09PM +0000, Robin Murphy wrote:
> > > @@ -1583,22 +1595,20 @@ static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid,
> > >   	arm_smmu_write_ste(master, sid, dst, &target);
> > >   }
> > > +/*
> > > + * This can safely directly manipulate the STE memory without a sync sequence
> > > + * because the STE table has not been installed in the SMMU yet.
> > > + */
> > >   static void arm_smmu_init_bypass_stes(struct arm_smmu_ste *strtab,
> > 
> > This name is long out-of-date - if we're refreshing this area, please rename
> > to something relevant to what it actually does, e.g. s/bypass/initial/.
> 
> Done
> 
> > Although frankly I also think that at this point we should just get rid of
> > the disable_bypass parameter altogether - it's been almost entirely
> > meaningless since default domain support was added, and any tenuous cases
> > for wanting initial STEs to be bypass should probably be using RMRs now
> > anyway.
> 
> I can write the patch for this if you and Will agree

Yes, please! I'll be glad to see the back of that option. Since you just
posted v6 of your "part 1", feel free to send a patch which applies on top
of that (rather than having to rebase the whole shebang).

Will

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-23 15:18               ` Jason Gunthorpe
@ 2024-02-27 12:43                 ` Will Deacon
  -1 siblings, 0 replies; 112+ messages in thread
From: Will Deacon @ 2024-02-27 12:43 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Fri, Feb 23, 2024 at 11:18:41AM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 22, 2024 at 05:43:46PM +0000, Will Deacon wrote:
> > On Wed, Feb 21, 2024 at 10:08:18AM -0400, Jason Gunthorpe wrote:
> > > On Wed, Feb 21, 2024 at 01:49:23PM +0000, Will Deacon wrote:
> > > 
> > > > Very roughly, yes, although I'd go further and just return a bitmap of
> > > > used qwords instead of tracking these bits. Basically, we could have some
> > > > #defines saying which qwords are used by which configs, 
> > > 
> > > I don't think this will work well for CD's EPD0 case..
> > > 
> > > static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> > > {
> > > 	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
> > > 	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> > > 		return;
> > > 	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
> > > 
> > > 	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
> > > 	if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
> > > 		used_bits[0] &= ~cpu_to_le64(
> > > 			CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
> > > 			CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
> > > 			CTXDESC_CD_0_TCR_SH0);
> > > 		used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
> > > 	}
> > > }
> > 
> > Please can you explain more about the issue here? I know what EPDx are,
> > but I'm not understanding why they're problematic. This presumably
> > involves a hitless transition to/from an aborting CD?
> 
> When a process using SVA exits uncleanly the MM is released so the
> SMMU HW must stop chasing the page table pointers since all that
> memory will be freed.
> 
> However, in an unclean exit we can't control the order of shutdown so
> something like uacce or RDMA may not have quieted the DMA device yet.
> 
> So there is a period during shutdown where the mm has been released
> and the device is doing DMA, the desire is that the DMA continue to be
> handled as a PRI and the SW will return failure for all PRI requests.
> 
> Specifically we do not want to trigger any dmesg log events during
> this condition.

Curious, but why is it problematic to log events? As you say, it's an
"unclean" exit, so it doesn't seem that unreasonable to me.

> Jean-Philippe came up with this solution where we hitlessly use EPD0
> in release to allow the mm to release the page table while continuing
> to use the PRI flow.
>
> So it is going from a "SVA domain with a page table" to a "SVA domain
> without a page table but EPD0 set", hitlessly.

Ok, and so the reason this adds complexity is that the set of used
bits/qwords changes based on something other than the cfg? I think it's
a pretty weak argument for field vs qwords, but it's a good
counter-example to my naive approach of per-config masks, so thanks.

> BTW Michael's self test won't be in part 1 because it needs the ops to
> be restored in order to work (now done in part 2), and has a few other
> more minor dependencies on part 2 and 3.

That's a pity, but fair enough.

Will

^ permalink raw reply	[flat|nested] 112+ messages in thread

* Re: [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers
  2024-02-27 12:43                 ` Will Deacon
@ 2024-02-29 13:57                   ` Jason Gunthorpe
  -1 siblings, 0 replies; 112+ messages in thread
From: Jason Gunthorpe @ 2024-02-29 13:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: iommu, Joerg Roedel, linux-arm-kernel, Robin Murphy, Lu Baolu,
	Jean-Philippe Brucker, Joerg Roedel, Moritz Fischer,
	Moritz Fischer, Michael Shavit, Nicolin Chen, patches,
	Shameer Kolothum, Mostafa Saleh, Zhangfei Gao

On Tue, Feb 27, 2024 at 12:43:18PM +0000, Will Deacon wrote:
> On Fri, Feb 23, 2024 at 11:18:41AM -0400, Jason Gunthorpe wrote:
> > On Thu, Feb 22, 2024 at 05:43:46PM +0000, Will Deacon wrote:
> > > On Wed, Feb 21, 2024 at 10:08:18AM -0400, Jason Gunthorpe wrote:
> > > > On Wed, Feb 21, 2024 at 01:49:23PM +0000, Will Deacon wrote:
> > > > 
> > > > > Very roughly, yes, although I'd go further and just return a bitmap of
> > > > > used qwords instead of tracking these bits. Basically, we could have some
> > > > > #defines saying which qwords are used by which configs, 
> > > > 
> > > > I don't think this will work well for CD's EPD0 case..
> > > > 
> > > > static void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits)
> > > > {
> > > > 	used_bits[0] = cpu_to_le64(CTXDESC_CD_0_V);
> > > > 	if (!(ent[0] & cpu_to_le64(CTXDESC_CD_0_V)))
> > > > 		return;
> > > > 	memset(used_bits, 0xFF, sizeof(struct arm_smmu_cd));
> > > > 
> > > > 	/* EPD0 means T0SZ/TG0/IR0/OR0/SH0/TTB0 are IGNORED */
> > > > 	if (ent[0] & cpu_to_le64(CTXDESC_CD_0_TCR_EPD0)) {
> > > > 		used_bits[0] &= ~cpu_to_le64(
> > > > 			CTXDESC_CD_0_TCR_T0SZ | CTXDESC_CD_0_TCR_TG0 |
> > > > 			CTXDESC_CD_0_TCR_IRGN0 | CTXDESC_CD_0_TCR_ORGN0 |
> > > > 			CTXDESC_CD_0_TCR_SH0);
> > > > 		used_bits[1] &= ~cpu_to_le64(CTXDESC_CD_1_TTB0_MASK);
> > > > 	}
> > > > }
> > > 
> > > Please can you explain more about the issue here? I know what EPDx are,
> > > but I'm not understanding why they're problematic. This presumably
> > > involves a hitless transition to/from an aborting CD?
> > 
> > When a process using SVA exits uncleanly the MM is released so the
> > SMMU HW must stop chasing the page table pointers since all that
> > memory will be freed.
> > 
> > However, in an unclean exit we can't control the order of shutdown so
> > something like uacce or RDMA may not have quieted the DMA device yet.
> > 
> > So there is a period during shutdown where the mm has been released
> > and the device is doing DMA, the desire is that the DMA continue to be
> > handled as a PRI and the SW will return failure for all PRI requests.
> > 
> > Specifically we do not want to trigger any dmesg log events during
> > this condition.
> 
> Curious, but why is it problematic to log events? As you say, it's an
> "unclean" exit, so it doesn't seem that unreasonable to me.

Well, I would defer to Jean-Philippe, but I can understand the
logic. If a user ctrl-c's their application, it is not nice to get
some dmesg logs from that.

I recall he felt strongly about this; we had some discussion about it
related to the mmu notifiers back when the iommu drivers were all
updated to the new notifier API I built...

> > Jean-Philippe came up with this solution where we hitlessly use EPD0
> > in release to allow the mm to release the page table while continuing
> > to use the PRI flow.
> >
> > So it is going from a "SVA domain with a page table" to a "SVA domain
> > without a page table but EPD0 set", hitlessly.
> 
> Ok, and so the reason this adds complexity is because the set of used
> bits/qwords changes based on something other than the cfg? 

There is no cfg for CD entries? I think it is the same issue as SHCFG,
qw1 of CD is not neatly split and qw0/1 are both changing for the EPD0
case - we also zero the unused TCR/TTB.

> I think it's a pretty weak argument for field vs qwords, but it's a
> good counter-example to my naive approach of per-config masks, so
> thanks.

We could do EPD0 just by editing in place; it would be easy to code,
but the point of this design was to never edit a descriptor in place.

Jason

^ permalink raw reply	[flat|nested] 112+ messages in thread

end of thread, other threads:[~2024-02-29 13:57 UTC | newest]

Thread overview: 112+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-02-06 15:12 [PATCH v5 00/17] Update SMMUv3 to the modern iommu API (part 1/3) Jason Gunthorpe
2024-02-06 15:12 ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 01/17] iommu/arm-smmu-v3: Make STE programming independent of the callers Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-15 13:49   ` Will Deacon
2024-02-15 13:49     ` Will Deacon
2024-02-15 16:01     ` Jason Gunthorpe
2024-02-15 16:01       ` Jason Gunthorpe
2024-02-15 18:42       ` Robin Murphy
2024-02-15 18:42         ` Robin Murphy
2024-02-15 20:11         ` Robin Murphy
2024-02-15 20:11           ` Robin Murphy
2024-02-16 16:28           ` Will Deacon
2024-02-16 16:28             ` Will Deacon
2024-02-15 21:17         ` Jason Gunthorpe
2024-02-15 21:17           ` Jason Gunthorpe
2024-02-21 13:49       ` Will Deacon
2024-02-21 13:49         ` Will Deacon
2024-02-21 14:08         ` Jason Gunthorpe
2024-02-21 14:08           ` Jason Gunthorpe
2024-02-21 16:19           ` Michael Shavit
2024-02-21 16:19             ` Michael Shavit
2024-02-21 16:52             ` Michael Shavit
2024-02-21 16:52               ` Michael Shavit
2024-02-21 17:06             ` Jason Gunthorpe
2024-02-21 17:06               ` Jason Gunthorpe
2024-02-22 17:43           ` Will Deacon
2024-02-22 17:43             ` Will Deacon
2024-02-23 15:18             ` Jason Gunthorpe
2024-02-23 15:18               ` Jason Gunthorpe
2024-02-27 12:43               ` Will Deacon
2024-02-27 12:43                 ` Will Deacon
2024-02-29 13:57                 ` Jason Gunthorpe
2024-02-29 13:57                   ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 02/17] iommu/arm-smmu-v3: Consolidate the STE generation for abort/bypass Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-15 17:27   ` Robin Murphy
2024-02-15 17:27     ` Robin Murphy
2024-02-22 17:40     ` Will Deacon
2024-02-22 17:40       ` Will Deacon
2024-02-23 18:53     ` Jason Gunthorpe
2024-02-23 18:53       ` Jason Gunthorpe
2024-02-27 10:50       ` Will Deacon
2024-02-27 10:50         ` Will Deacon
2024-02-06 15:12 ` [PATCH v5 03/17] iommu/arm-smmu-v3: Move arm_smmu_rmr_install_bypass_ste() Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-13 15:37   ` Mostafa Saleh
2024-02-13 15:37     ` Mostafa Saleh
2024-02-13 16:16     ` Jason Gunthorpe
2024-02-13 16:16       ` Jason Gunthorpe
2024-02-13 16:46       ` Mostafa Saleh
2024-02-13 16:46         ` Mostafa Saleh
2024-02-15 19:01     ` Robin Murphy
2024-02-15 19:01       ` Robin Murphy
2024-02-15 21:18       ` Jason Gunthorpe
2024-02-15 21:18         ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 04/17] iommu/arm-smmu-v3: Move the STE generation for S1 and S2 domains into functions Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-16 17:12   ` Jason Gunthorpe
2024-02-16 17:12     ` Jason Gunthorpe
2024-02-16 17:39     ` Will Deacon
2024-02-16 17:39       ` Will Deacon
2024-02-16 17:58       ` Jason Gunthorpe
2024-02-16 17:58         ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 05/17] iommu/arm-smmu-v3: Build the whole STE in arm_smmu_make_s2_domain_ste() Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 06/17] iommu/arm-smmu-v3: Hold arm_smmu_asid_lock during all of attach_dev Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-13 15:38   ` Mostafa Saleh
2024-02-13 15:38     ` Mostafa Saleh
2024-02-13 16:18     ` Jason Gunthorpe
2024-02-13 16:18       ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 07/17] iommu/arm-smmu-v3: Compute the STE only once for each master Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 08/17] iommu/arm-smmu-v3: Do not change the STE twice during arm_smmu_attach_dev() Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-13 15:40   ` Mostafa Saleh
2024-02-13 15:40     ` Mostafa Saleh
2024-02-13 16:26     ` Jason Gunthorpe
2024-02-13 16:26       ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 09/17] iommu/arm-smmu-v3: Put writing the context descriptor in the right order Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-13 15:42   ` Mostafa Saleh
2024-02-13 15:42     ` Mostafa Saleh
2024-02-13 17:50     ` Jason Gunthorpe
2024-02-13 17:50       ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 10/17] iommu/arm-smmu-v3: Pass smmu_domain to arm_enable/disable_ats() Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-13 15:43   ` Mostafa Saleh
2024-02-13 15:43     ` Mostafa Saleh
2024-02-06 15:12 ` [PATCH v5 11/17] iommu/arm-smmu-v3: Remove arm_smmu_master->domain Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-13 15:45   ` Mostafa Saleh
2024-02-13 15:45     ` Mostafa Saleh
2024-02-13 16:37     ` Jason Gunthorpe
2024-02-13 16:37       ` Jason Gunthorpe
2024-02-13 17:00       ` Mostafa Saleh
2024-02-13 17:00         ` Mostafa Saleh
2024-02-06 15:12 ` [PATCH v5 12/17] iommu/arm-smmu-v3: Check that the RID domain is S1 in SVA Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 13/17] iommu/arm-smmu-v3: Add a global static IDENTITY domain Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 14/17] iommu/arm-smmu-v3: Add a global static BLOCKED domain Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 15/17] iommu/arm-smmu-v3: Use the identity/blocked domain during release Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 16/17] iommu/arm-smmu-v3: Pass arm_smmu_domain and arm_smmu_device to finalize Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-06 15:12 ` [PATCH v5 17/17] iommu/arm-smmu-v3: Convert to domain_alloc_paging() Jason Gunthorpe
2024-02-06 15:12   ` Jason Gunthorpe
2024-02-07  5:27 ` [PATCH v5 00/17] Update SMMUv3 to the modern iommu API (part 1/3) Nicolin Chen
2024-02-07  5:27   ` Nicolin Chen
