* [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation
@ 2020-07-20 15:40 Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 01/13] iommu/arm-smmu: Pass io-pgtable config to implementation specific function Jordan Crouse
                   ` (12 more replies)
  0 siblings, 13 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: devicetree, David Airlie, Akhil P Oommen, dri-devel, Eric Anholt,
	AngeloGioacchino Del Regno, Will Deacon, Emil Velikov,
	Jonathan Marek, iommu, Andy Gross, Brian Masney, Wambui Karuga,
	Sharat Masetty, Rob Herring, John Stultz, Sean Paul, Ben Dooks,
	linux-arm-kernel, Robin Murphy, linux-kernel, Daniel Vetter,
	Shawn Guo, freedreno

(reworded the summary to reflect ongoing changes in the code)

This series adds an Adreno SMMU implementation to arm-smmu to allow GPU hardware
pagetable switching.

The Adreno GPU has built-in capabilities to switch the TTBR0 pagetable at
runtime so that each individual instance or application can have its own
pagetable.  To take advantage of the HW capabilities, the SMMU hardware must
meet certain requirements.

This series adds support for an Adreno specific arm-smmu implementation. The new
implementation 1) ensures that the GPU domain is always assigned context bank 0,
2) enables split pagetable support (TTBR1) so that the instance specific
pagetable can be swapped while the global memory remains in place and 3) shares
the current pagetable configuration with the GPU driver to allow it to create
its own io-pgtable instances.
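
As a rough illustration of item 2, the sketch below models how a split address
space picks a translation table: addresses whose upper bits are all zero go
through TTBR0 and addresses whose upper bits are all ones go through TTBR1.
demo_select_ttbr() and its single ias parameter are assumptions made for this
note and are not part of the series; real hardware sizes the two halves
independently.

```c
#include <stdint.h>

/*
 * Hypothetical userspace model, not code from this series.
 * Returns 0 if TTBR0 would translate iova, 1 if TTBR1 would,
 * and -1 if the address falls in the hole between the two ranges.
 * Assumes 0 < ias < 64 and that both halves cover the same size.
 */
static int demo_select_ttbr(uint64_t iova, unsigned int ias)
{
	uint64_t upper = ~UINT64_C(0) << ias;

	if ((iova & upper) == 0)
		return 0;	/* per-instance range, can be swapped */
	if ((iova & upper) == upper)
		return 1;	/* global range, stays in place */

	return -1;
}
```

With ias set to 48, an address such as 0x1000 selects TTBR0 and
0xffff000000001000 selects TTBR1, which is why the global mappings can stay put
while the TTBR0 table is swapped.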

The series then adds the drm/msm code to enable these features. On targets that
support it, the driver allocates new pagetables using the io-pgtable
configuration shared by the arm-smmu driver and swaps them in at runtime.

This version of the series merges the previous patchset(s) [1] and [2]
with the following improvements:

  - arm-smmu: add implementation hook to allocate context banks
  - arm-smmu: Match the GPU domain by stream ID instead of compatible string
  - arm-smmu: Make DOMAIN_ATTR_PGTABLE_CFG bi-directional. The leaf driver
    queries the configuration to create a pagetable and then sends the newly
    created configuration back to the smmu-driver to enable TTBR0
  - drm/msm: Add context reference counting for submissions
  - drm/msm: Use dummy functions to skip TLB operations on per-instance
    pagetables

[1] https://lists.linuxfoundation.org/pipermail/iommu/2020-June/045653.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2020-June/045659.html


Jordan Crouse (13):
  iommu/arm-smmu: Pass io-pgtable config to implementation specific
    function
  iommu/arm-smmu: Add support for split pagetables
  iommu/arm-smmu: Add implementation hooks to configure contexts
  iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU
  iommu: Add a domain attribute to get/set a pagetable configuration
  iommu/arm-smmu-qcom: Get and set the pagetable config for split
    pagetables
  dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
  drm/msm: Add a context pointer to the submitqueue
  drm/msm: Set the global virtual address range from the IOMMU domain
  drm/msm: Add support to create a local pagetable
  drm/msm: Add support for private address space instances
  drm/msm/a6xx: Add support for per-instance pagetables
  arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU

 .../devicetree/bindings/iommu/arm,smmu.yaml   |   4 +
 arch/arm64/boot/dts/qcom/sdm845.dtsi          |   2 +-
 drivers/gpu/drm/msm/adreno/a5xx_gpu.c         |  12 +-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c         |  58 ++++-
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h         |   1 +
 drivers/gpu/drm/msm/adreno/adreno_gpu.c       |  18 +-
 drivers/gpu/drm/msm/adreno/adreno_gpu.h       |   3 +-
 drivers/gpu/drm/msm/msm_drv.c                 |  16 +-
 drivers/gpu/drm/msm/msm_drv.h                 |  13 ++
 drivers/gpu/drm/msm/msm_gem.h                 |   1 +
 drivers/gpu/drm/msm/msm_gem_submit.c          |   8 +-
 drivers/gpu/drm/msm/msm_gem_vma.c             |   9 +
 drivers/gpu/drm/msm/msm_gpu.c                 |  26 ++-
 drivers/gpu/drm/msm/msm_gpu.h                 |  12 +-
 drivers/gpu/drm/msm/msm_gpummu.c              |   2 +-
 drivers/gpu/drm/msm/msm_iommu.c               | 198 +++++++++++++++++-
 drivers/gpu/drm/msm/msm_mmu.h                 |  16 +-
 drivers/gpu/drm/msm/msm_ringbuffer.h          |   1 +
 drivers/gpu/drm/msm/msm_submitqueue.c         |   8 +-
 drivers/iommu/arm-smmu-impl.c                 |   6 +-
 drivers/iommu/arm-smmu-qcom.c                 | 130 +++++++++++-
 drivers/iommu/arm-smmu.c                      | 108 +++++-----
 drivers/iommu/arm-smmu.h                      |  65 +++++-
 include/linux/iommu.h                         |   1 +
 24 files changed, 619 insertions(+), 99 deletions(-)

-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

* [PATCH v10 01/13] iommu/arm-smmu: Pass io-pgtable config to implementation specific function
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 02/13] iommu/arm-smmu: Add support for split pagetables Jordan Crouse
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: linux-kernel, freedreno, Robin Murphy, iommu, Will Deacon,
	linux-arm-kernel

Construct the io-pgtable config before calling the implementation specific
init_context function and pass it in, so the implementation specific function
has a chance to change it before the io-pgtable is created.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
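
The ordering described above can be sketched in plain C as follows. This is a
minimal userspace model, assuming made-up demo_* types in place of
io_pgtable_cfg and arm_smmu_impl; it only shows that a hook called after the
config is built, and before the table is created, can influence the result.

```c
#include <stddef.h>

/* Hypothetical stand-ins for io_pgtable_cfg and arm_smmu_impl. */
struct demo_pgtbl_cfg {
	unsigned long quirks;
	unsigned int ias;
};

struct demo_impl {
	int (*init_context)(struct demo_pgtbl_cfg *cfg);
};

#define DEMO_QUIRK_TTBR1	(1UL << 0)	/* made-up value */

/* Example hook: adjust the config before the table is created. */
static int demo_init_context(struct demo_pgtbl_cfg *cfg)
{
	cfg->quirks |= DEMO_QUIRK_TTBR1;
	return 0;
}

/*
 * Build the config first, then give the hook a chance to change it.
 * Copying to *out stands in for creating the io-pgtable from cfg.
 */
static int demo_init_domain(const struct demo_impl *impl, unsigned int ias,
			    struct demo_pgtbl_cfg *out)
{
	struct demo_pgtbl_cfg cfg = { .quirks = 0, .ias = ias };
	int ret;

	if (impl && impl->init_context) {
		ret = impl->init_context(&cfg);
		if (ret)
			return ret;
	}

	*out = cfg;
	return 0;
}
```

Calling demo_init_domain() with an impl that sets init_context yields a config
carrying the quirk; calling it with a NULL impl leaves the config untouched.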

 drivers/iommu/arm-smmu-impl.c |  3 ++-
 drivers/iommu/arm-smmu.c      | 11 ++++++-----
 drivers/iommu/arm-smmu.h      |  3 ++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index c75b9d957b70..a20e426d81ac 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -68,7 +68,8 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
 	return 0;
 }
 
-static int cavium_init_context(struct arm_smmu_domain *smmu_domain)
+static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg)
 {
 	struct cavium_smmu *cs = container_of(smmu_domain->smmu,
 					      struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 243bc4cb2705..0e2c65ee9e5a 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -797,11 +797,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		cfg->asid = cfg->cbndx;
 
 	smmu_domain->smmu = smmu;
-	if (smmu->impl && smmu->impl->init_context) {
-		ret = smmu->impl->init_context(smmu_domain);
-		if (ret)
-			goto out_unlock;
-	}
 
 	pgtbl_cfg = (struct io_pgtable_cfg) {
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
@@ -812,6 +807,12 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		.iommu_dev	= smmu->dev,
 	};
 
+	if (smmu->impl && smmu->impl->init_context) {
+		ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+		if (ret)
+			goto out_clear_smmu;
+	}
+
 	if (smmu_domain->non_strict)
 		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d172c024be61..38b041530a4f 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -383,7 +383,8 @@ struct arm_smmu_impl {
 			    u64 val);
 	int (*cfg_probe)(struct arm_smmu_device *smmu);
 	int (*reset)(struct arm_smmu_device *smmu);
-	int (*init_context)(struct arm_smmu_domain *smmu_domain);
+	int (*init_context)(struct arm_smmu_domain *smmu_domain,
+			struct io_pgtable_cfg *cfg);
 	void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 			 int status);
 	int (*def_domain_type)(struct device *dev);
-- 
2.25.1

* [PATCH v10 02/13] iommu/arm-smmu: Add support for split pagetables
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 01/13] iommu/arm-smmu: Pass io-pgtable config to implementation specific function Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 03/13] iommu/arm-smmu: Add implementation hooks to configure contexts Jordan Crouse
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: linux-kernel, freedreno, Robin Murphy, iommu, Will Deacon,
	linux-arm-kernel

Enable TTBR1 for a context bank if IO_PGTABLE_QUIRK_ARM_TTBR1 is selected
by the io-pgtable configuration.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
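
For readers who want to see the effect of the quirk on the TCR value, here is
a small userspace model of the computation. demo_lpae_tcr() is a hypothetical
name, the field values are treated as opaque numbers, and only the bit
positions are taken from the arm-smmu.h hunk in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the arm-smmu.h hunk in this patch. */
#define DEMO_TCR_EPD1	(UINT32_C(1) << 23)
#define DEMO_TCR_A1	(UINT32_C(1) << 22)
#define DEMO_TCR_EPD0	(UINT32_C(1) << 7)

/* Hypothetical model of arm_smmu_lpae_tcr(); field values are opaque. */
static uint32_t demo_lpae_tcr(uint32_t tg, uint32_t sh, uint32_t orgn,
			      uint32_t irgn, uint32_t tsz, bool ttbr1)
{
	uint32_t tcr = ((tg & 0x3) << 14) | ((sh & 0x3) << 12) |
		       ((orgn & 0x3) << 10) | ((irgn & 0x3) << 8) |
		       (tsz & 0x3f);

	if (ttbr1) {
		/* Move the fields into the TTBR1 half, walk only TTBR1 */
		tcr = (tcr << 16) & ~DEMO_TCR_A1;
		tcr |= DEMO_TCR_EPD0;
	} else {
		/* Keep the fields in the TTBR0 half, walk only TTBR0 */
		tcr |= DEMO_TCR_EPD1;
	}

	return tcr;
}
```

With tg=0, sh=3, orgn=1, irgn=1 and tsz=16, the model gives 0x00803510 for the
TTBR0 case and 0x35100080 for the TTBR1 case: the same fields, shifted up by
16 bits, with the opposite EPD bit set.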

 drivers/iommu/arm-smmu.c | 21 ++++++++++++++++-----
 drivers/iommu/arm-smmu.h | 25 +++++++++++++++++++------
 2 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 0e2c65ee9e5a..8798428a4c8d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -555,11 +555,15 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 			cb->ttbr[0] = pgtbl_cfg->arm_v7s_cfg.ttbr;
 			cb->ttbr[1] = 0;
 		} else {
-			cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
-			cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-						  cfg->asid);
+			cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
+				cfg->asid);
 			cb->ttbr[1] = FIELD_PREP(ARM_SMMU_TTBRn_ASID,
-						 cfg->asid);
+				cfg->asid);
+
+			if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1)
+				cb->ttbr[1] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+			else
+				cb->ttbr[0] |= pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
 		}
 	} else {
 		cb->ttbr[0] = pgtbl_cfg->arm_lpae_s2_cfg.vttbr;
@@ -824,7 +828,14 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 
 	/* Update the domain's page sizes to reflect the page table format */
 	domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap;
-	domain->geometry.aperture_end = (1UL << ias) - 1;
+
+	if (pgtbl_cfg.quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+		domain->geometry.aperture_start = ~0UL << ias;
+		domain->geometry.aperture_end = ~0UL;
+	} else {
+		domain->geometry.aperture_end = (1UL << ias) - 1;
+	}
+
 	domain->geometry.force_aperture = true;
 
 	/* Initialise the context bank with our page table cfg */
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 38b041530a4f..5f2de20e883b 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -168,10 +168,12 @@ enum arm_smmu_cbar_type {
 #define ARM_SMMU_CB_TCR			0x30
 #define ARM_SMMU_TCR_EAE		BIT(31)
 #define ARM_SMMU_TCR_EPD1		BIT(23)
+#define ARM_SMMU_TCR_A1			BIT(22)
 #define ARM_SMMU_TCR_TG0		GENMASK(15, 14)
 #define ARM_SMMU_TCR_SH0		GENMASK(13, 12)
 #define ARM_SMMU_TCR_ORGN0		GENMASK(11, 10)
 #define ARM_SMMU_TCR_IRGN0		GENMASK(9, 8)
+#define ARM_SMMU_TCR_EPD0		BIT(7)
 #define ARM_SMMU_TCR_T0SZ		GENMASK(5, 0)
 
 #define ARM_SMMU_VTCR_RES1		BIT(31)
@@ -347,12 +349,23 @@ struct arm_smmu_domain {
 
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
 {
-	return ARM_SMMU_TCR_EPD1 |
-	       FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
-	       FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
-	       FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
-	       FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
-	       FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+	u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
+		FIELD_PREP(ARM_SMMU_TCR_SH0, cfg->arm_lpae_s1_cfg.tcr.sh) |
+		FIELD_PREP(ARM_SMMU_TCR_ORGN0, cfg->arm_lpae_s1_cfg.tcr.orgn) |
+		FIELD_PREP(ARM_SMMU_TCR_IRGN0, cfg->arm_lpae_s1_cfg.tcr.irgn) |
+		FIELD_PREP(ARM_SMMU_TCR_T0SZ, cfg->arm_lpae_s1_cfg.tcr.tsz);
+
+	/*
+	 * When TTBR1 is selected shift the TCR fields by 16 bits and disable
+	 * translation in TTBR0
+	 */
+	if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_TTBR1) {
+		tcr = (tcr << 16) & ~ARM_SMMU_TCR_A1;
+		tcr |= ARM_SMMU_TCR_EPD0;
+	} else
+		tcr |= ARM_SMMU_TCR_EPD1;
+
+	return tcr;
 }
 
 static inline u32 arm_smmu_lpae_tcr2(struct io_pgtable_cfg *cfg)
-- 
2.25.1

* [PATCH v10 03/13] iommu/arm-smmu: Add implementation hooks to configure contexts
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 01/13] iommu/arm-smmu: Pass io-pgtable config to implementation specific function Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 02/13] iommu/arm-smmu: Add support for split pagetables Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 04/13] iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU Jordan Crouse
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: linux-kernel, freedreno, Robin Murphy, iommu, Will Deacon,
	linux-arm-kernel

Add a new hook to allow implementations to provide their own context
bank allocation scheme and update the existing init_context function to
take the device pointer.

These modifications will be used by the upcoming Adreno SMMU
implementation to identify the GPU device and properly configure it
for pagetable switching.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
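
The dispatch added here can be modelled in userspace as below. This is a
sketch under assumptions: struct demo_impl, demo_alloc_highest() and the
single-word, non-atomic bitmap are invented for illustration, whereas the
driver uses find_next_zero_bit() and test_and_set_bit() on smmu->context_map.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for arm_smmu_impl with only the new hook. */
struct demo_impl {
	int (*alloc_context_bank)(unsigned long *map, int start, int end);
};

/* Simplified, non-atomic version of __arm_smmu_alloc_bitmap(). */
static int demo_alloc_bitmap(unsigned long *map, int start, int end)
{
	int idx;

	for (idx = start; idx < end; idx++) {
		if (!(*map & (1UL << idx))) {
			*map |= 1UL << idx;
			return idx;
		}
	}

	return -ENOSPC;
}

/* Example hook that hands out the highest free bank instead. */
static int demo_alloc_highest(unsigned long *map, int start, int end)
{
	int idx;

	for (idx = end - 1; idx >= start; idx--) {
		if (!(*map & (1UL << idx))) {
			*map |= 1UL << idx;
			return idx;
		}
	}

	return -ENOSPC;
}

/* Use the implementation's allocator when present, else the default. */
static int demo_get_context_bank(const struct demo_impl *impl,
				 unsigned long *map, int start, int end)
{
	if (impl && impl->alloc_context_bank)
		return impl->alloc_context_bank(map, start, end);

	return demo_alloc_bitmap(map, start, end);
}
```

An implementation that leaves the hook NULL keeps the existing first-free
behaviour, so only implementations that need a custom policy are affected.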

 drivers/iommu/arm-smmu-impl.c |  2 +-
 drivers/iommu/arm-smmu.c      | 46 ++++++++++++-----------------------
 drivers/iommu/arm-smmu.h      | 28 ++++++++++++++++++++-
 3 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index a20e426d81ac..b71b14685cc9 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -69,7 +69,7 @@ static int cavium_cfg_probe(struct arm_smmu_device *smmu)
 }
 
 static int cavium_init_context(struct arm_smmu_domain *smmu_domain,
-		struct io_pgtable_cfg *pgtbl_cfg)
+		struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
 {
 	struct cavium_smmu *cs = container_of(smmu_domain->smmu,
 					      struct cavium_smmu, smmu);
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 8798428a4c8d..fff536a44faa 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -93,16 +93,6 @@ struct arm_smmu_cb {
 	struct arm_smmu_cfg		*cfg;
 };
 
-struct arm_smmu_master_cfg {
-	struct arm_smmu_device		*smmu;
-	s16				smendx[];
-};
-#define INVALID_SMENDX			-1
-#define cfg_smendx(cfg, fw, i) \
-	(i >= fw->num_ids ? INVALID_SMENDX : cfg->smendx[i])
-#define for_each_cfg_sme(cfg, fw, i, idx) \
-	for (i = 0; idx = cfg_smendx(cfg, fw, i), i < fw->num_ids; ++i)
-
 static bool using_legacy_binding, using_generic_binding;
 
 static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
@@ -237,19 +227,6 @@ static int arm_smmu_register_legacy_master(struct device *dev,
 }
 #endif /* CONFIG_ARM_SMMU_LEGACY_DT_BINDINGS */
 
-static int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
-{
-	int idx;
-
-	do {
-		idx = find_next_zero_bit(map, end, start);
-		if (idx == end)
-			return -ENOSPC;
-	} while (test_and_set_bit(idx, map));
-
-	return idx;
-}
-
 static void __arm_smmu_free_bitmap(unsigned long *map, int idx)
 {
 	clear_bit(idx, map);
@@ -668,7 +645,8 @@ static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
 }
 
 static int arm_smmu_init_domain_context(struct iommu_domain *domain,
-					struct arm_smmu_device *smmu)
+					struct arm_smmu_device *smmu,
+					struct device *dev)
 {
 	int irq, start, ret = 0;
 	unsigned long ias, oas;
@@ -782,10 +760,20 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		ret = -EINVAL;
 		goto out_unlock;
 	}
-	ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
+
+	smmu_domain->smmu = smmu;
+
+	if (smmu->impl && smmu->impl->alloc_context_bank)
+		ret = smmu->impl->alloc_context_bank(smmu_domain, dev,
+				start, smmu->num_context_banks);
+	else
+		ret = __arm_smmu_alloc_bitmap(smmu->context_map, start,
 				      smmu->num_context_banks);
-	if (ret < 0)
+
+	if (ret < 0) {
+		smmu_domain->smmu = NULL;
 		goto out_unlock;
+	}
 
 	cfg->cbndx = ret;
 	if (smmu->version < ARM_SMMU_V2) {
@@ -800,8 +788,6 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	else
 		cfg->asid = cfg->cbndx;
 
-	smmu_domain->smmu = smmu;
-
 	pgtbl_cfg = (struct io_pgtable_cfg) {
 		.pgsize_bitmap	= smmu->pgsize_bitmap,
 		.ias		= ias,
@@ -812,7 +798,7 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 	};
 
 	if (smmu->impl && smmu->impl->init_context) {
-		ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg);
+		ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg, dev);
 		if (ret)
 			goto out_clear_smmu;
 	}
@@ -1190,7 +1176,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 		return ret;
 
 	/* Ensure that the domain is finalised */
-	ret = arm_smmu_init_domain_context(domain, smmu);
+	ret = arm_smmu_init_domain_context(domain, smmu, dev);
 	if (ret < 0)
 		goto rpm_put;
 
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 5f2de20e883b..d10d745a0290 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -347,6 +347,11 @@ struct arm_smmu_domain {
 	struct iommu_domain		domain;
 };
 
+struct arm_smmu_master_cfg {
+	struct arm_smmu_device		*smmu;
+	s16				smendx[];
+};
+
 static inline u32 arm_smmu_lpae_tcr(struct io_pgtable_cfg *cfg)
 {
 	u32 tcr = FIELD_PREP(ARM_SMMU_TCR_TG0, cfg->arm_lpae_s1_cfg.tcr.tg) |
@@ -397,12 +402,33 @@ struct arm_smmu_impl {
 	int (*cfg_probe)(struct arm_smmu_device *smmu);
 	int (*reset)(struct arm_smmu_device *smmu);
 	int (*init_context)(struct arm_smmu_domain *smmu_domain,
-			struct io_pgtable_cfg *cfg);
+			struct io_pgtable_cfg *cfg, struct device *dev);
 	void (*tlb_sync)(struct arm_smmu_device *smmu, int page, int sync,
 			 int status);
 	int (*def_domain_type)(struct device *dev);
+	int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain,
+			struct device *dev, int start, int max);
 };
 
+#define INVALID_SMENDX			-1
+#define cfg_smendx(cfg, fw, i) \
+	(i >= fw->num_ids ? INVALID_SMENDX : cfg->smendx[i])
+#define for_each_cfg_sme(cfg, fw, i, idx) \
+	for (i = 0; idx = cfg_smendx(cfg, fw, i), i < fw->num_ids; ++i)
+
+static inline int __arm_smmu_alloc_bitmap(unsigned long *map, int start, int end)
+{
+	int idx;
+
+	do {
+		idx = find_next_zero_bit(map, end, start);
+		if (idx == end)
+			return -ENOSPC;
+	} while (test_and_set_bit(idx, map));
+
+	return idx;
+}
+
 static inline void __iomem *arm_smmu_page(struct arm_smmu_device *smmu, int n)
 {
 	return smmu->base + (n << smmu->pgshift);
-- 
2.25.1

* [PATCH v10 04/13] iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
                   ` (2 preceding siblings ...)
  2020-07-20 15:40 ` [PATCH v10 03/13] iommu/arm-smmu: Add implementation hooks to configure contexts Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-27  6:27   ` Bjorn Andersson
  2020-07-20 15:40 ` [PATCH v10 05/13] iommu: Add a domain attribute to get/set a pagetable configuration Jordan Crouse
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: linux-kernel, freedreno, Robin Murphy, iommu, Will Deacon,
	linux-arm-kernel

Add a special implementation for the SMMU attached to most Adreno GPU
targets, selected by the qcom,adreno-smmu compatible string.

The new Adreno SMMU implementation will enable split pagetables
(TTBR1) for the domain attached to the GPU device (SID 0) and
hard code it to context bank 0 so the GPU hardware can implement
per-instance pagetables.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
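
The allocation policy can be tried out with the userspace model below. It is
a sketch, not the driver code: the demo_* names are hypothetical, the stream
IDs are passed in as a plain array rather than read from the fwspec, and the
bitmap is a single non-atomic word.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_GPU_SID	0

/* The GPU is recognised by having stream ID 0 among its IDs. */
static bool demo_is_gpu_device(const uint16_t *sids, size_t num_sids)
{
	size_t i;

	for (i = 0; i < num_sids; i++) {
		if (sids[i] == DEMO_GPU_SID)
			return true;
	}

	return false;
}

/* Simplified, non-atomic first-free allocator. */
static int demo_alloc_bitmap(unsigned long *map, int start, int end)
{
	int idx;

	for (idx = start; idx < end; idx++) {
		if (!(*map & (1UL << idx))) {
			*map |= 1UL << idx;
			return idx;
		}
	}

	return -ENOSPC;
}

/* The GPU must get bank 0; every other device takes the first free bank. */
static int demo_adreno_alloc_context_bank(unsigned long *map, bool is_gpu,
					  int start, int count)
{
	if (is_gpu) {
		if (start > 0 || (*map & 1UL))
			return -ENOSPC;

		*map |= 1UL;
		return 0;
	}

	return demo_alloc_bitmap(map, start, count);
}
```

One property the model makes visible: if a non-GPU device allocates first with
start equal to 0, it takes bank 0 and a later GPU request fails with -ENOSPC,
so in this model the result depends on attach order.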

 drivers/iommu/arm-smmu-impl.c |  3 ++
 drivers/iommu/arm-smmu-qcom.c | 83 ++++++++++++++++++++++++++++++++++-
 drivers/iommu/arm-smmu.h      |  1 +
 3 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/arm-smmu-impl.c b/drivers/iommu/arm-smmu-impl.c
index b71b14685cc9..3bb1ef4e85f7 100644
--- a/drivers/iommu/arm-smmu-impl.c
+++ b/drivers/iommu/arm-smmu-impl.c
@@ -176,5 +176,8 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu)
 	    of_device_is_compatible(np, "qcom,sc7180-smmu-500"))
 		return qcom_smmu_impl_init(smmu);
 
+	if (of_device_is_compatible(smmu->dev->of_node, "qcom,adreno-smmu"))
+		return qcom_adreno_smmu_impl_init(smmu);
+
 	return smmu;
 }
diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index be4318044f96..b9a5c5369e86 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -12,6 +12,67 @@ struct qcom_smmu {
 	struct arm_smmu_device smmu;
 };
 
+#define QCOM_ADRENO_SMMU_GPU_SID 0
+
+static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
+{
+	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+	struct arm_smmu_master_cfg *cfg = dev_iommu_priv_get(dev);
+	int idx, i;
+
+	/*
+	 * The GPU will always use SID 0 so that is a handy way to uniquely
+	 * identify it and configure it for per-instance pagetables
+	 */
+	for_each_cfg_sme(cfg, fwspec, i, idx) {
+		u16 sid = FIELD_GET(ARM_SMMU_SMR_ID, fwspec->ids[i]);
+
+		if (sid == QCOM_ADRENO_SMMU_GPU_SID)
+			return true;
+	}
+
+	return false;
+}
+
+static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
+		struct device *dev, int start, int count)
+{
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+	/*
+	 * Assign context bank 0 to the GPU device so the GPU hardware can
+	 * switch pagetables
+	 */
+	if (qcom_adreno_smmu_is_gpu_device(dev)) {
+		if (start > 0 || test_bit(0, smmu->context_map))
+			return -ENOSPC;
+
+		set_bit(0, smmu->context_map);
+		return 0;
+	}
+
+	return __arm_smmu_alloc_bitmap(smmu->context_map, start, count);
+}
+
+static int qcom_adreno_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
+{
+	/* Only enable split pagetables for the GPU device (SID 0) */
+	if (!qcom_adreno_smmu_is_gpu_device(dev))
+		return 0;
+
+	/*
+	 * All targets that use the qcom,adreno-smmu compatible string *should*
+	 * be AARCH64 stage 1 but double check because the arm-smmu code assumes
+	 * that is the case when the TTBR1 quirk is enabled
+	 */
+	if ((smmu_domain->stage == ARM_SMMU_DOMAIN_S1) &&
+	    (smmu_domain->cfg.fmt == ARM_SMMU_CTX_FMT_AARCH64))
+		pgtbl_cfg->quirks |= IO_PGTABLE_QUIRK_ARM_TTBR1;
+
+	return 0;
+}
+
 static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 	{ .compatible = "qcom,adreno" },
 	{ .compatible = "qcom,mdp4" },
@@ -65,7 +126,15 @@ static const struct arm_smmu_impl qcom_smmu_impl = {
 	.reset = qcom_smmu500_reset,
 };
 
-struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
+	.init_context = qcom_adreno_smmu_init_context,
+	.def_domain_type = qcom_smmu_def_domain_type,
+	.reset = qcom_smmu500_reset,
+	.alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
+};
+
+static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
+		const struct arm_smmu_impl *impl)
 {
 	struct qcom_smmu *qsmmu;
 
@@ -75,8 +144,18 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
 
 	qsmmu->smmu = *smmu;
 
-	qsmmu->smmu.impl = &qcom_smmu_impl;
+	qsmmu->smmu.impl = impl;
 	devm_kfree(smmu->dev, smmu);
 
 	return &qsmmu->smmu;
 }
+
+struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+	return qcom_smmu_create(smmu, &qcom_smmu_impl);
+}
+
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu)
+{
+	return qcom_smmu_create(smmu, &qcom_adreno_smmu_impl);
+}
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index d10d745a0290..9f81c1fffe1e 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -491,6 +491,7 @@ static inline void arm_smmu_writeq(struct arm_smmu_device *smmu, int page,
 
 struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
+struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);
 
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
 
-- 
2.25.1

* [PATCH v10 05/13] iommu: Add a domain attribute to get/set a pagetable configuration
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
                   ` (3 preceding siblings ...)
  2020-07-20 15:40 ` [PATCH v10 04/13] iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 06/13] iommu/arm-smmu-qcom: Get and set the pagetable config for split pagetables Jordan Crouse
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm; +Cc: freedreno, Robin Murphy, iommu, Will Deacon, linux-kernel

Add domain attribute DOMAIN_ATTR_PGTABLE_CFG. This will be used by
arm-smmu to share the current pagetable configuration with the
leaf driver and to allow the leaf driver to set up a new pagetable
configuration under certain circumstances.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---

 include/linux/iommu.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 5657d4fef9f2..e8d59ad15611 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,7 @@ enum iommu_attr {
 	DOMAIN_ATTR_FSL_PAMUV1,
 	DOMAIN_ATTR_NESTING,	/* two stages of translation */
 	DOMAIN_ATTR_DMA_USE_FLUSH_QUEUE,
+	DOMAIN_ATTR_PGTABLE_CFG,
 	DOMAIN_ATTR_MAX,
 };
 
-- 
2.25.1

* [PATCH v10 06/13] iommu/arm-smmu-qcom: Get and set the pagetable config for split pagetables
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
                   ` (4 preceding siblings ...)
  2020-07-20 15:40 ` [PATCH v10 05/13] iommu: Add a domain attribute to get/set a pagetable configuration Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-26 17:03   ` [Freedreno] " Rob Clark
  2020-07-20 15:40 ` [PATCH v10 07/13] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU Jordan Crouse
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: linux-kernel, freedreno, Robin Murphy, iommu, Will Deacon,
	linux-arm-kernel

The Adreno GPU has the capability to manage its own pagetables and switch
them dynamically from the hardware. To do this the GPU uses TTBR1 for
"global" GPU memory and creates local pagetables for each context and
switches them dynamically with the GPU.

Use DOMAIN_ATTR_PGTABLE_CFG to get the current configuration for the
TTBR1 pagetable from the smmu driver so the leaf driver can create
compatible pagetables for use with TTBR0.

Because TTBR0 is disabled by default when TTBR1 is enabled, the GPU
driver can pass the configuration of one of the newly created pagetables
back through DOMAIN_ATTR_PGTABLE_CFG as a trigger to enable translation on
TTBR0.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
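
The enable/disable handshake described above behaves like a small state
machine, sketched here in userspace. The model is deliberately reduced: struct
demo_cb and demo_set_ttbr0() are hypothetical, a bare TTBR value stands in for
the io_pgtable_cfg, and the TCR field merge and ASID handling done by the real
function are left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_TCR_EPD1	(UINT32_C(1) << 23)	/* TTBR1 walks disabled */
#define DEMO_TCR_EPD0	(UINT32_C(1) << 7)	/* TTBR0 walks disabled */

/* Hypothetical, reduced view of a context bank. */
struct demo_cb {
	uint32_t tcr;
	uint64_t ttbr0;
};

/*
 * Pass a TTBR value to enable TTBR0, or NULL to disable it again.
 * Requests that would not change the state are rejected.
 */
static int demo_set_ttbr0(struct demo_cb *cb, const uint64_t *ttbr)
{
	/* Split pagetables must already be enabled on this bank */
	if (cb->tcr & DEMO_TCR_EPD1)
		return -EINVAL;

	if (!ttbr) {
		/* Already disabled */
		if (cb->tcr & DEMO_TCR_EPD0)
			return -EINVAL;

		cb->tcr |= DEMO_TCR_EPD0;
		cb->ttbr0 = 0;
		return 0;
	}

	/* Already enabled */
	if (!(cb->tcr & DEMO_TCR_EPD0))
		return -EINVAL;

	cb->tcr &= ~DEMO_TCR_EPD0;
	cb->ttbr0 = *ttbr;
	return 0;
}
```

Starting from a bank with only EPD0 set, the first call with a TTBR value
succeeds and a second one returns -EINVAL; passing NULL then returns the bank
to its TTBR1-only state.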

 drivers/iommu/arm-smmu-qcom.c | 47 +++++++++++++++++++++++++++++++++++
 drivers/iommu/arm-smmu.c      | 32 ++++++++++++++++++------
 drivers/iommu/arm-smmu.h      | 10 ++++++++
 3 files changed, 81 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
index b9a5c5369e86..9a0c64ca9cb6 100644
--- a/drivers/iommu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm-smmu-qcom.c
@@ -34,6 +34,52 @@ static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
 	return false;
 }
 
+/*
+ * Local implementation to configure TTBR0 with the specified pagetable config.
+ * The GPU driver will call this to enable TTBR0 when per-instance pagetables
+ * are active
+ */
+static int qcom_adreno_smmu_set_pgtable_cfg(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg)
+{
+	struct io_pgtable *pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
+
+	/* The domain must have split pagetables already enabled */
+	if (cb->tcr[0] & ARM_SMMU_TCR_EPD1)
+		return -EINVAL;
+
+	/* If the pagetable config is NULL, disable TTBR0 */
+	if (!pgtbl_cfg) {
+		/* Fail if TTBR0 is already disabled */
+		if ((cb->tcr[0] & ARM_SMMU_TCR_EPD0))
+			return -EINVAL;
+
+		/* Set TCR to the original configuration */
+		cb->tcr[0] = arm_smmu_lpae_tcr(&pgtable->cfg);
+		cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+	} else {
+		u32 tcr = cb->tcr[0];
+
+		/* FIXME: What sort of validation do we need to do here? */
+
+		/* Don't call this again if TTBR0 is already enabled */
+		if (!(cb->tcr[0] & ARM_SMMU_TCR_EPD0))
+			return -EINVAL;
+
+		tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
+		tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
+
+		cb->tcr[0] = tcr;
+		cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
+		cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
+	}
+
+	arm_smmu_write_context_bank(smmu_domain->smmu, cb->cfg->cbndx);
+	return 0;
+}
+
 static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
 		struct device *dev, int start, int count)
 {
@@ -131,6 +177,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
 	.def_domain_type = qcom_smmu_def_domain_type,
 	.reset = qcom_smmu500_reset,
 	.alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
+	.set_pgtable_cfg = qcom_adreno_smmu_set_pgtable_cfg,
 };
 
 static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index fff536a44faa..e1036ae54a8d 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -86,13 +86,6 @@ struct arm_smmu_smr {
 	bool				valid;
 };
 
-struct arm_smmu_cb {
-	u64				ttbr[2];
-	u32				tcr[2];
-	u32				mair[2];
-	struct arm_smmu_cfg		*cfg;
-};
-
 static bool using_legacy_binding, using_generic_binding;
 
 static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
@@ -558,7 +551,7 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
 	}
 }
 
-static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
+void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
 {
 	u32 reg;
 	bool stage1;
@@ -1515,6 +1508,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
 		case DOMAIN_ATTR_NESTING:
 			*(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
 			return 0;
+		case DOMAIN_ATTR_PGTABLE_CFG: {
+			struct io_pgtable *pgtable;
+			struct io_pgtable_cfg *dest = data;
+
+			if (!smmu_domain->pgtbl_ops)
+				return -ENODEV;
+
+			pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
+
+			memcpy(dest, &pgtable->cfg, sizeof(*dest));
+			return 0;
+		}
 		default:
 			return -ENODEV;
 		}
@@ -1555,6 +1560,17 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
 			else
 				smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
 			break;
+		case DOMAIN_ATTR_PGTABLE_CFG: {
+			struct arm_smmu_device *smmu = smmu_domain->smmu;
+
+			ret = -EPERM;
+
+			if (smmu)
+				if (smmu->impl && smmu->impl->set_pgtable_cfg)
+					ret = smmu->impl->set_pgtable_cfg(smmu_domain,
+						data);
+			}
+			break;
 		default:
 			ret = -ENODEV;
 		}
diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
index 9f81c1fffe1e..9325fc28d24a 100644
--- a/drivers/iommu/arm-smmu.h
+++ b/drivers/iommu/arm-smmu.h
@@ -328,6 +328,13 @@ struct arm_smmu_cfg {
 };
 #define ARM_SMMU_INVALID_IRPTNDX	0xff
 
+struct arm_smmu_cb {
+	u64				ttbr[2];
+	u32				tcr[2];
+	u32				mair[2];
+	struct arm_smmu_cfg		*cfg;
+};
+
 enum arm_smmu_domain_stage {
 	ARM_SMMU_DOMAIN_S1 = 0,
 	ARM_SMMU_DOMAIN_S2,
@@ -408,6 +415,8 @@ struct arm_smmu_impl {
 	int (*def_domain_type)(struct device *dev);
 	int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain,
 			struct device *dev, int start, int max);
+	int (*set_pgtable_cfg)(struct arm_smmu_domain *smmu_domain,
+			struct io_pgtable_cfg *cfg);
 };
 
 #define INVALID_SMENDX			-1
@@ -493,6 +502,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
 struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);
 
+void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
 int arm_mmu500_reset(struct arm_smmu_device *smmu);
 
 #endif /* _ARM_SMMU_H */
-- 
2.25.1

_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


* [PATCH v10 07/13] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
                   ` (5 preceding siblings ...)
  2020-07-20 15:40 ` [PATCH v10 06/13] iommu/arm-smmu-qcom: Get and set the pagetable config for split pagetables Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-26 16:55   ` Rob Clark
  2020-07-20 15:40 ` [PATCH v10 08/13] drm/msm: Add a context pointer to the submitqueue Jordan Crouse
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Rob Herring, linux-kernel, devicetree, freedreno, Robin Murphy,
	iommu, Rob Herring, Will Deacon, linux-arm-kernel

Every Qcom Adreno GPU has an embedded SMMU for its own use. These
devices depend on unique features such as split pagetables,
different stall/halt requirements and other settings. Add a
compatible string so that the arm-smmu implementation specific code
can identify them.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Reviewed-by: Rob Herring <robh@kernel.org>
---
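A minimal userspace sketch of the identification step described above,
assuming the compatible list has already been read out of the device tree
into an array of strings. The helper name compatible_list_has() is
hypothetical and is not the kernel's OF API:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if 'want' appears anywhere in the
 * compatible list, 0 otherwise. A device tree node lists its most
 * specific compatible string first and the generic fallback last.
 */
int compatible_list_has(const char *const *list, size_t count,
		const char *want)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (strcmp(list[i], want) == 0)
			return 1;
	}

	return 0;
}
```

With the binding added here, a GPU SMMU node would carry both
"qcom,adreno-smmu" and "qcom,smmu-v2", so a check for the first string
selects the Adreno path while other SMMUs fall through to the generic one.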

 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index d7ceb4c34423..e52a1b146c97 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -38,6 +38,10 @@ properties:
               - qcom,sc7180-smmu-500
               - qcom,sdm845-smmu-500
           - const: arm,mmu-500
+      - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
+        items:
+          - const: qcom,adreno-smmu
+          - const: qcom,smmu-v2
       - items:
           - const: arm,mmu-500
           - const: arm,smmu-v2
-- 
2.25.1


* [PATCH v10 08/13] drm/msm: Add a context pointer to the submitqueue
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
                   ` (6 preceding siblings ...)
  2020-07-20 15:40 ` [PATCH v10 07/13] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-26 17:09   ` Rob Clark
  2020-07-20 15:40 ` [PATCH v10 09/13] drm/msm: Set the global virtual address range from the IOMMU domain Jordan Crouse
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: David Airlie, Akhil P Oommen, dri-devel, Eric Anholt,
	AngeloGioacchino Del Regno, Will Deacon, Emil Velikov,
	Jonathan Marek, iommu, Ben Dooks, Wambui Karuga, Sharat Masetty,
	Brian Masney, Sean Paul, Robin Murphy, linux-kernel,
	Daniel Vetter, Shawn Guo, freedreno

Each submitqueue is attached to a context. Add a pointer to the
context to the submitqueue at create time and refcount it so
that it stays around through the life of the queue.

GPU submissions can access the active context via the submitqueue
instead of requiring it to be passed around from function to
function.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
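A minimal userspace sketch of the lifetime rule described above, assuming
a plain integer counter in place of the kernel's struct kref (so it is
not thread safe). The names ctx_create(), ctx_get(), ctx_put(),
queue_create() and queue_destroy() are hypothetical and only illustrate
how each queue pins its context:

```c
#include <stdlib.h>

/* Hypothetical stand-in for msm_file_private; a counter replaces kref. */
struct ctx {
	int refcount;
	int *destroyed;	/* set to 1 when the last reference is dropped */
};

/* Hypothetical stand-in for msm_gpu_submitqueue. */
struct queue {
	struct ctx *ctx;
};

struct ctx *ctx_create(int *destroyed)
{
	struct ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return NULL;

	ctx->refcount = 1;	/* reference held by the open file */
	ctx->destroyed = destroyed;
	return ctx;
}

void ctx_get(struct ctx *ctx)
{
	ctx->refcount++;
}

void ctx_put(struct ctx *ctx)
{
	if (--ctx->refcount == 0) {
		*ctx->destroyed = 1;
		free(ctx);
	}
}

/* Creating a queue takes a reference on the context it belongs to. */
struct queue *queue_create(struct ctx *ctx)
{
	struct queue *queue = calloc(1, sizeof(*queue));

	if (!queue)
		return NULL;

	ctx_get(ctx);
	queue->ctx = ctx;
	return queue;
}

/* Destroying a queue drops the reference taken at create time. */
void queue_destroy(struct queue *queue)
{
	ctx_put(queue->ctx);
	free(queue);
}
```

In this sketch, dropping the file's reference leaves the context alive
until the last queue is destroyed, which is the property a submission
relies on when it reads submit->queue->ctx.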

 drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 12 +++++-------
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  5 ++---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c |  5 ++---
 drivers/gpu/drm/msm/adreno/adreno_gpu.h |  3 +--
 drivers/gpu/drm/msm/msm_drv.c           |  3 ++-
 drivers/gpu/drm/msm/msm_drv.h           |  8 ++++++++
 drivers/gpu/drm/msm/msm_gem.h           |  1 +
 drivers/gpu/drm/msm/msm_gem_submit.c    |  8 ++++----
 drivers/gpu/drm/msm/msm_gpu.c           |  9 ++++-----
 drivers/gpu/drm/msm/msm_gpu.h           |  7 +++----
 drivers/gpu/drm/msm/msm_submitqueue.c   |  8 +++++++-
 11 files changed, 39 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
index 9e63a190642c..eff2439ea57b 100644
--- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
@@ -43,8 +43,7 @@ static void a5xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
 		gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
 }
 
-static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-	struct msm_file_private *ctx)
+static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
 	struct msm_drm_private *priv = gpu->dev->dev_private;
 	struct msm_ringbuffer *ring = submit->ring;
@@ -57,7 +56,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit
 		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-			if (priv->lastctx == ctx)
+			if (priv->lastctx == submit->queue->ctx)
 				break;
 			/* fall-thru */
 		case MSM_SUBMIT_CMD_BUF:
@@ -103,8 +102,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit
 	msm_gpu_retire(gpu);
 }
 
-static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-	struct msm_file_private *ctx)
+static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
@@ -114,7 +112,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 
 	if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) {
 		priv->lastctx = NULL;
-		a5xx_submit_in_rb(gpu, submit, ctx);
+		a5xx_submit_in_rb(gpu, submit);
 		return;
 	}
 
@@ -148,7 +146,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-			if (priv->lastctx == ctx)
+			if (priv->lastctx == submit->queue->ctx)
 				break;
 			/* fall-thru */
 		case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index c5a3e4d4c007..5eabb0109577 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -81,8 +81,7 @@ static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
 	OUT_RING(ring, upper_32_bits(iova));
 }
 
-static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-	struct msm_file_private *ctx)
+static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
 	unsigned int index = submit->seqno % MSM_GPU_SUBMIT_STATS_COUNT;
 	struct msm_drm_private *priv = gpu->dev->dev_private;
@@ -115,7 +114,7 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 		case MSM_SUBMIT_CMD_IB_TARGET_BUF:
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
-			if (priv->lastctx == ctx)
+			if (priv->lastctx == submit->queue->ctx)
 				break;
 			/* fall-thru */
 		case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index e23641a5ec84..b38a8126541a 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -457,8 +457,7 @@ void adreno_recover(struct msm_gpu *gpu)
 	}
 }
 
-void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-		struct msm_file_private *ctx)
+void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
 	struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
 	struct msm_drm_private *priv = gpu->dev->dev_private;
@@ -472,7 +471,7 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 			break;
 		case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
 			/* ignore if there has not been a ctx switch: */
-			if (priv->lastctx == ctx)
+			if (priv->lastctx == submit->queue->ctx)
 				break;
 			/* fall-thru */
 		case MSM_SUBMIT_CMD_BUF:
diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
index 99bb468f5f24..0ae8b373c428 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
@@ -267,8 +267,7 @@ struct drm_gem_object *adreno_fw_create_bo(struct msm_gpu *gpu,
 		const struct firmware *fw, u64 *iova);
 int adreno_hw_init(struct msm_gpu *gpu);
 void adreno_recover(struct msm_gpu *gpu);
-void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-		struct msm_file_private *ctx);
+void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit);
 void adreno_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 bool adreno_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index ff379ed40469..556198d4ba5f 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -600,6 +600,7 @@ static int context_init(struct drm_device *dev, struct drm_file *file)
 	if (!ctx)
 		return -ENOMEM;
 
+	kref_init(&ctx->ref);
 	msm_submitqueue_init(dev, ctx);
 
 	ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
@@ -621,7 +622,7 @@ static int msm_open(struct drm_device *dev, struct drm_file *file)
 static void context_close(struct msm_file_private *ctx)
 {
 	msm_submitqueue_close(ctx);
-	kfree(ctx);
+	kref_put(&ctx->ref, msm_file_private_destroy);
 }
 
 static void msm_postclose(struct drm_device *dev, struct drm_file *file)
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index 2687f7a42c15..ab5f77261816 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -57,6 +57,7 @@ struct msm_file_private {
 	struct list_head submitqueues;
 	int queueid;
 	struct msm_gem_address_space *aspace;
+	struct kref ref;
 };
 
 enum msm_mdp_plane_property {
@@ -429,6 +430,13 @@ void msm_submitqueue_close(struct msm_file_private *ctx);
 
 void msm_submitqueue_destroy(struct kref *kref);
 
+static inline void msm_file_private_destroy(struct kref *kref)
+{
+	struct msm_file_private *ctx = container_of(kref,
+		struct msm_file_private, ref);
+
+	kfree(ctx);
+}
 
 #define DBG(fmt, ...) DRM_DEBUG_DRIVER(fmt"\n", ##__VA_ARGS__)
 #define VERB(fmt, ...) if (0) DRM_DEBUG_DRIVER(fmt"\n", ##__VA_ARGS__)
diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
index 972490b14ba5..9c573c4269cb 100644
--- a/drivers/gpu/drm/msm/msm_gem.h
+++ b/drivers/gpu/drm/msm/msm_gem.h
@@ -142,6 +142,7 @@ struct msm_gem_submit {
 	bool valid;         /* true if no cmdstream patching needed */
 	bool in_rb;         /* "sudo" mode, copy cmds into RB */
 	struct msm_ringbuffer *ring;
+	struct msm_file_private *ctx;
 	unsigned int nr_cmds;
 	unsigned int nr_bos;
 	u32 ident;	   /* A "identifier" for the submit for logging */
diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
index 8cb9aa15ff90..aa5c60a7132d 100644
--- a/drivers/gpu/drm/msm/msm_gem_submit.c
+++ b/drivers/gpu/drm/msm/msm_gem_submit.c
@@ -27,7 +27,7 @@
 #define BO_PINNED   0x2000
 
 static struct msm_gem_submit *submit_create(struct drm_device *dev,
-		struct msm_gpu *gpu, struct msm_gem_address_space *aspace,
+		struct msm_gpu *gpu,
 		struct msm_gpu_submitqueue *queue, uint32_t nr_bos,
 		uint32_t nr_cmds)
 {
@@ -43,7 +43,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
 		return NULL;
 
 	submit->dev = dev;
-	submit->aspace = aspace;
+	submit->aspace = queue->ctx->aspace;
 	submit->gpu = gpu;
 	submit->fence = NULL;
 	submit->cmd = (void *)&submit->bos[nr_bos];
@@ -677,7 +677,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 		}
 	}
 
-	submit = submit_create(dev, gpu, ctx->aspace, queue, args->nr_bos,
+	submit = submit_create(dev, gpu, queue, args->nr_bos,
 		args->nr_cmds);
 	if (!submit) {
 		ret = -ENOMEM;
@@ -785,7 +785,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
 		}
 	}
 
-	msm_gpu_submit(gpu, submit, ctx);
+	msm_gpu_submit(gpu, submit);
 
 	args->fence = submit->fence->seqno;
 
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index d5645472b25d..a1f3da6550e5 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -520,7 +520,7 @@ static void recover_worker(struct work_struct *work)
 			struct msm_ringbuffer *ring = gpu->rb[i];
 
 			list_for_each_entry(submit, &ring->submits, node)
-				gpu->funcs->submit(gpu, submit, NULL);
+				gpu->funcs->submit(gpu, submit);
 		}
 	}
 
@@ -747,8 +747,7 @@ void msm_gpu_retire(struct msm_gpu *gpu)
 }
 
 /* add bo's to gpu's ring, and kick gpu: */
-void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-		struct msm_file_private *ctx)
+void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
 	struct drm_device *dev = gpu->dev;
 	struct msm_drm_private *priv = dev->dev_private;
@@ -788,8 +787,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
 			msm_gem_move_to_active(&msm_obj->base, gpu, false, submit->fence);
 	}
 
-	gpu->funcs->submit(gpu, submit, ctx);
-	priv->lastctx = ctx;
+	gpu->funcs->submit(gpu, submit);
+	priv->lastctx = submit->queue->ctx;
 
 	hangcheck_timer_reset(gpu);
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 0db117a7339b..d496d488222c 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -44,8 +44,7 @@ struct msm_gpu_funcs {
 	int (*hw_init)(struct msm_gpu *gpu);
 	int (*pm_suspend)(struct msm_gpu *gpu);
 	int (*pm_resume)(struct msm_gpu *gpu);
-	void (*submit)(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-			struct msm_file_private *ctx);
+	void (*submit)(struct msm_gpu *gpu, struct msm_gem_submit *submit);
 	void (*flush)(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
 	irqreturn_t (*irq)(struct msm_gpu *irq);
 	struct msm_ringbuffer *(*active_ring)(struct msm_gpu *gpu);
@@ -181,6 +180,7 @@ struct msm_gpu_submitqueue {
 	u32 flags;
 	u32 prio;
 	int faults;
+	struct msm_file_private *ctx;
 	struct list_head node;
 	struct kref ref;
 };
@@ -280,8 +280,7 @@ int msm_gpu_perfcntr_sample(struct msm_gpu *gpu, uint32_t *activetime,
 		uint32_t *totaltime, uint32_t ncntrs, uint32_t *cntrs);
 
 void msm_gpu_retire(struct msm_gpu *gpu);
-void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
-		struct msm_file_private *ctx);
+void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit);
 
 int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c
index a1d94be7883a..10f557225a3e 100644
--- a/drivers/gpu/drm/msm/msm_submitqueue.c
+++ b/drivers/gpu/drm/msm/msm_submitqueue.c
@@ -49,8 +49,10 @@ void msm_submitqueue_close(struct msm_file_private *ctx)
 	 * No lock needed in close and there won't
 	 * be any more user ioctls coming our way
 	 */
-	list_for_each_entry_safe(entry, tmp, &ctx->submitqueues, node)
+	list_for_each_entry_safe(entry, tmp, &ctx->submitqueues, node) {
+		kref_put(&ctx->ref, msm_file_private_destroy);
 		msm_submitqueue_put(entry);
+	}
 }
 
 int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
@@ -81,6 +83,9 @@ int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
 
 	write_lock(&ctx->queuelock);
 
+	kref_get(&ctx->ref);
+
+	queue->ctx = ctx;
 	queue->id = ctx->queueid++;
 
 	if (id)
@@ -177,6 +182,7 @@ int msm_submitqueue_remove(struct msm_file_private *ctx, u32 id)
 			list_del(&entry->node);
 			write_unlock(&ctx->queuelock);
 
+			kref_put(&ctx->ref, msm_file_private_destroy);
 			msm_submitqueue_put(entry);
 			return 0;
 		}
-- 
2.25.1


* [PATCH v10 09/13] drm/msm: Set the global virtual address range from the IOMMU domain
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
                   ` (7 preceding siblings ...)
  2020-07-20 15:40 ` [PATCH v10 08/13] drm/msm: Add a context pointer to the submitqueue Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 10/13] drm/msm: Add support to create a local pagetable Jordan Crouse
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Sean Paul, Jonathan Marek, David Airlie, freedreno, Robin Murphy,
	dri-devel, Eric Anholt, iommu, John Stultz, Daniel Vetter,
	Will Deacon, linux-kernel, Brian Masney

Use the aperture settings from the IOMMU domain to set up the virtual
address range for the GPU. This allows us to transparently deal with
IOMMU side features (like split pagetables).

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
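A minimal userspace sketch of the two calculations this patch performs,
assuming 64-bit addresses with bit 48 as the sign bit as in the diff
below. The function names are hypothetical, and the kernel macros
(max_t, BIT_ULL, GENMASK_ULL) are replaced with plain C:

```c
#include <stdint.h>

#define SZ_16M 0x01000000ULL

/* Replicate bit 48 into bits 63:49, as the map/unmap paths do. */
uint64_t sign_extend_iova(uint64_t iova)
{
	if (iova & (1ULL << 48))
		iova |= ~0ULL << 49;

	return iova;
}

/* Start of the GPU range: the aperture start, but never below 16M. */
uint64_t gpu_va_start(uint64_t aperture_start)
{
	return aperture_start > SZ_16M ? aperture_start : SZ_16M;
}

/* Size of the GPU range, counting aperture_end as the last valid byte. */
uint64_t gpu_va_size(uint64_t aperture_start, uint64_t aperture_end)
{
	return aperture_end - gpu_va_start(aperture_start) + 1;
}
```

For an aperture of 0 to 0xffffffff this yields a start of SZ_16M and a
size of 0xff000000, which covers the same range as the hardcoded values
the patch removes.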

 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 13 +++++++++++--
 drivers/gpu/drm/msm/msm_iommu.c         |  7 +++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index b38a8126541a..f9e3badf2fca 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -192,9 +192,18 @@ adreno_iommu_create_address_space(struct msm_gpu *gpu,
 	struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
 	struct msm_mmu *mmu = msm_iommu_new(&pdev->dev, iommu);
 	struct msm_gem_address_space *aspace;
+	u64 start, size;
 
-	aspace = msm_gem_address_space_create(mmu, "gpu", SZ_16M,
-		0xffffffff - SZ_16M);
+	/*
+	 * Use the aperture start or SZ_16M, whichever is greater. This will
+	 * ensure that we align with the allocated pagetable range while still
+	 * allowing room in the lower 32 bits for GMEM and whatnot
+	 */
+	start = max_t(u64, SZ_16M, iommu->geometry.aperture_start);
+	size = iommu->geometry.aperture_end - start + 1;
+
+	aspace = msm_gem_address_space_create(mmu, "gpu",
+		start & GENMASK_ULL(48, 0), size);
 
 	if (IS_ERR(aspace) && !IS_ERR(mmu))
 		mmu->funcs->destroy(mmu);
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 3a381a9674c9..1b6635504069 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -36,6 +36,10 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 	size_t ret;
 
+	/* The arm-smmu driver expects the addresses to be sign extended */
+	if (iova & BIT_ULL(48))
+		iova |= GENMASK_ULL(63, 49);
+
 	ret = iommu_map_sg(iommu->domain, iova, sgt->sgl, sgt->nents, prot);
 	WARN_ON(!ret);
 
@@ -46,6 +50,9 @@ static int msm_iommu_unmap(struct msm_mmu *mmu, uint64_t iova, size_t len)
 {
 	struct msm_iommu *iommu = to_msm_iommu(mmu);
 
+	if (iova & BIT_ULL(48))
+		iova |= GENMASK_ULL(63, 49);
+
 	iommu_unmap(iommu->domain, iova, len);
 
 	return 0;
-- 
2.25.1


* [PATCH v10 10/13] drm/msm: Add support to create a local pagetable
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
                   ` (8 preceding siblings ...)
  2020-07-20 15:40 ` [PATCH v10 09/13] drm/msm: Set the global virtual address range from the IOMMU domain Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 11/13] drm/msm: Add support for private address space instances Jordan Crouse
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Sean Paul, David Airlie, freedreno, Robin Murphy, dri-devel,
	iommu, Daniel Vetter, Will Deacon, linux-kernel

Add support to create an io-pgtable for use by targets that support
per-instance pagetables. In order to support per-instance pagetables the
GPU SMMU device needs to have the qcom,adreno-smmu compatible string and
split pagetables enabled.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---
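A minimal userspace sketch of two bookkeeping rules in this patch: the
ASID counter that cycles through a fixed range, and the pagetable count
that decides when TTBR0 is switched on and off. It assumes plain
integers in place of the static counter and atomic_t used in the diff
below, so it is not thread safe. All names here are hypothetical:

```c
#define FIRST_ASID 16

/* Hand out the current ASID and advance, wrapping back to FIRST_ASID. */
int alloc_asid(int *next_asid)
{
	int asid = *next_asid;

	*next_asid = (*next_asid + 1) % 255;
	if (*next_asid < FIRST_ASID)
		*next_asid = FIRST_ASID;

	return asid;
}

/* Hypothetical stand-in for the parent msm_iommu. */
struct parent {
	int pagetables;		/* number of live per-instance pagetables */
	int ttbr0_enabled;	/* models the state set via the domain attr */
};

/* The first pagetable created triggers the TTBR0 setup. */
void pagetable_attach(struct parent *p)
{
	if (++p->pagetables == 1)
		p->ttbr0_enabled = 1;
}

/* The last pagetable destroyed disables TTBR0 again. */
void pagetable_detach(struct parent *p)
{
	if (--p->pagetables == 0)
		p->ttbr0_enabled = 0;
}
```

In this sketch the ASIDs handed out run from 16 to 254 before repeating,
and nothing prevents two live pagetables from sharing a value once the
counter wraps.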

 drivers/gpu/drm/msm/msm_gpummu.c |   2 +-
 drivers/gpu/drm/msm/msm_iommu.c  | 191 ++++++++++++++++++++++++++++++-
 drivers/gpu/drm/msm/msm_mmu.h    |  16 ++-
 3 files changed, 206 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_gpummu.c b/drivers/gpu/drm/msm/msm_gpummu.c
index 310a31b05faa..aab121f4beb7 100644
--- a/drivers/gpu/drm/msm/msm_gpummu.c
+++ b/drivers/gpu/drm/msm/msm_gpummu.c
@@ -102,7 +102,7 @@ struct msm_mmu *msm_gpummu_new(struct device *dev, struct msm_gpu *gpu)
 	}
 
 	gpummu->gpu = gpu;
-	msm_mmu_init(&gpummu->base, dev, &funcs);
+	msm_mmu_init(&gpummu->base, dev, &funcs, MSM_MMU_GPUMMU);
 
 	return &gpummu->base;
 }
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 1b6635504069..8cf8c7f7a665 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -4,15 +4,202 @@
  * Author: Rob Clark <robdclark@gmail.com>
  */
 
+#include <linux/io-pgtable.h>
 #include "msm_drv.h"
 #include "msm_mmu.h"
 
 struct msm_iommu {
 	struct msm_mmu base;
 	struct iommu_domain *domain;
+	atomic_t pagetables;
 };
+
 #define to_msm_iommu(x) container_of(x, struct msm_iommu, base)
 
+struct msm_iommu_pagetable {
+	struct msm_mmu base;
+	struct msm_mmu *parent;
+	struct io_pgtable_ops *pgtbl_ops;
+	phys_addr_t ttbr;
+	u32 asid;
+};
+static struct msm_iommu_pagetable *to_pagetable(struct msm_mmu *mmu)
+{
+	return container_of(mmu, struct msm_iommu_pagetable, base);
+}
+
+static int msm_iommu_pagetable_unmap(struct msm_mmu *mmu, u64 iova,
+		size_t size)
+{
+	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+	size_t unmapped = 0, total = size;
+
+	/* Unmap the block one page at a time */
+	while (size) {
+		unmapped += ops->unmap(ops, iova, 4096, NULL);
+		iova += 4096;
+		size -= 4096;
+	}
+
+	iommu_flush_tlb_all(to_msm_iommu(pagetable->parent)->domain);
+
+	return (unmapped == total) ? 0 : -EINVAL;
+}
+
+static int msm_iommu_pagetable_map(struct msm_mmu *mmu, u64 iova,
+		struct sg_table *sgt, size_t len, int prot)
+{
+	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+	struct io_pgtable_ops *ops = pagetable->pgtbl_ops;
+	struct scatterlist *sg;
+	size_t mapped = 0;
+	u64 addr = iova;
+	unsigned int i;
+
+	for_each_sg(sgt->sgl, sg, sgt->nents, i) {
+		size_t size = sg->length;
+		phys_addr_t phys = sg_phys(sg);
+
+		/* Map the block one page at a time */
+		while (size) {
+			if (ops->map(ops, addr, phys, 4096, prot)) {
+				msm_iommu_pagetable_unmap(mmu, iova, mapped);
+				return -EINVAL;
+			}
+
+			phys += 4096;
+			addr += 4096;
+			size -= 4096;
+			mapped += 4096;
+		}
+	}
+
+	return 0;
+}
+
+static void msm_iommu_pagetable_destroy(struct msm_mmu *mmu)
+{
+	struct msm_iommu_pagetable *pagetable = to_pagetable(mmu);
+	struct msm_iommu *iommu = to_msm_iommu(pagetable->parent);
+
+	/*
+	 * If this is the last attached pagetable for the parent,
+	 * disable TTBR0 in the arm-smmu driver
+	 */
+	if (atomic_dec_return(&iommu->pagetables) == 0)
+		iommu_domain_set_attr(iommu->domain,
+			DOMAIN_ATTR_PGTABLE_CFG, NULL);
+
+	free_io_pgtable_ops(pagetable->pgtbl_ops);
+	kfree(pagetable);
+}
+
+int msm_iommu_pagetable_params(struct msm_mmu *mmu,
+		phys_addr_t *ttbr, int *asid)
+{
+	struct msm_iommu_pagetable *pagetable;
+
+	if (mmu->type != MSM_MMU_IOMMU_PAGETABLE)
+		return -EINVAL;
+
+	pagetable = to_pagetable(mmu);
+
+	if (ttbr)
+		*ttbr = pagetable->ttbr;
+
+	if (asid)
+		*asid = pagetable->asid;
+
+	return 0;
+}
+
+static const struct msm_mmu_funcs pagetable_funcs = {
+		.map = msm_iommu_pagetable_map,
+		.unmap = msm_iommu_pagetable_unmap,
+		.destroy = msm_iommu_pagetable_destroy,
+};
+
+static void msm_iommu_tlb_flush_all(void *cookie)
+{
+}
+
+static void msm_iommu_tlb_flush_walk(unsigned long iova, size_t size,
+		size_t granule, void *cookie)
+{
+}
+
+static void msm_iommu_tlb_add_page(struct iommu_iotlb_gather *gather,
+		unsigned long iova, size_t granule, void *cookie)
+{
+}
+
+static const struct iommu_flush_ops null_tlb_ops = {
+	.tlb_flush_all = msm_iommu_tlb_flush_all,
+	.tlb_flush_walk = msm_iommu_tlb_flush_walk,
+	.tlb_flush_leaf = msm_iommu_tlb_flush_walk,
+	.tlb_add_page = msm_iommu_tlb_add_page,
+};
+
+struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent)
+{
+	struct msm_iommu *iommu = to_msm_iommu(parent);
+	static int next_asid = 16;
+	struct msm_iommu_pagetable *pagetable;
+	struct io_pgtable_cfg cfg;
+	int ret;
+
+	/* Get the pagetable configuration from the domain */
+	ret = iommu_domain_get_attr(iommu->domain,
+		DOMAIN_ATTR_PGTABLE_CFG, &cfg);
+	if (ret)
+		return ERR_PTR(ret);
+
+	pagetable = kzalloc(sizeof(*pagetable), GFP_KERNEL);
+	if (!pagetable)
+		return ERR_PTR(-ENOMEM);
+
+	msm_mmu_init(&pagetable->base, parent->dev, &pagetable_funcs,
+		MSM_MMU_IOMMU_PAGETABLE);
+
+	/* The incoming cfg will have the TTBR1 quirk enabled */
+	cfg.quirks &= ~IO_PGTABLE_QUIRK_ARM_TTBR1;
+	cfg.tlb = &null_tlb_ops;
+
+	pagetable->pgtbl_ops = alloc_io_pgtable_ops(ARM_64_LPAE_S1,
+		&cfg, iommu->domain);
+
+	if (!pagetable->pgtbl_ops) {
+		kfree(pagetable);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	/*
+	 * If this is the first pagetable that we've allocated, send it back to
+	 * the arm-smmu driver as a trigger to set up TTBR0
+	 */
+	if (atomic_inc_return(&iommu->pagetables) == 1) {
+		ret = iommu_domain_set_attr(iommu->domain,
+			DOMAIN_ATTR_PGTABLE_CFG, &cfg);
+		if (ret) {
+			free_io_pgtable_ops(pagetable->pgtbl_ops);
+			kfree(pagetable);
+			return ERR_PTR(ret);
+		}
+	}
+
+	/* Needed later for TLB flush */
+	pagetable->parent = parent;
+	pagetable->ttbr = cfg.arm_lpae_s1_cfg.ttbr;
+
+	pagetable->asid = next_asid;
+	next_asid = (next_asid + 1) % 255;
+	if (next_asid < 16)
+		next_asid = 16;
+
+	return &pagetable->base;
+}
+
 static int msm_fault_handler(struct iommu_domain *domain, struct device *dev,
 		unsigned long iova, int flags, void *arg)
 {
@@ -85,9 +272,11 @@ struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain)
 		return ERR_PTR(-ENOMEM);
 
 	iommu->domain = domain;
-	msm_mmu_init(&iommu->base, dev, &funcs);
+	msm_mmu_init(&iommu->base, dev, &funcs, MSM_MMU_IOMMU);
 	iommu_set_fault_handler(domain, msm_fault_handler, iommu);
 
+	atomic_set(&iommu->pagetables, 0);
+
 	ret = iommu_attach_device(iommu->domain, dev);
 	if (ret) {
 		kfree(iommu);
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 3a534ee59bf6..61ade89d9e48 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -17,18 +17,26 @@ struct msm_mmu_funcs {
 	void (*destroy)(struct msm_mmu *mmu);
 };
 
+enum msm_mmu_type {
+	MSM_MMU_GPUMMU,
+	MSM_MMU_IOMMU,
+	MSM_MMU_IOMMU_PAGETABLE,
+};
+
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
 	int (*handler)(void *arg, unsigned long iova, int flags);
 	void *arg;
+	enum msm_mmu_type type;
 };
 
 static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
-		const struct msm_mmu_funcs *funcs)
+		const struct msm_mmu_funcs *funcs, enum msm_mmu_type type)
 {
 	mmu->dev = dev;
 	mmu->funcs = funcs;
+	mmu->type = type;
 }
 
 struct msm_mmu *msm_iommu_new(struct device *dev, struct iommu_domain *domain);
@@ -41,7 +49,13 @@ static inline void msm_mmu_set_fault_handler(struct msm_mmu *mmu, void *arg,
 	mmu->handler = handler;
 }
 
+struct msm_mmu *msm_iommu_pagetable_create(struct msm_mmu *parent);
+
 void msm_gpummu_params(struct msm_mmu *mmu, dma_addr_t *pt_base,
 		dma_addr_t *tran_error);
 
+
+int msm_iommu_pagetable_params(struct msm_mmu *mmu, phys_addr_t *ttbr,
+		int *asid);
+
 #endif /* __MSM_MMU_H__ */
-- 
2.25.1


* [PATCH v10 11/13] drm/msm: Add support for private address space instances
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
                   ` (9 preceding siblings ...)
  2020-07-20 15:40 ` [PATCH v10 10/13] drm/msm: Add support to create a local pagetable Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 12/13] drm/msm/a6xx: Add support for per-instance pagetables Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 13/13] arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU Jordan Crouse
  12 siblings, 0 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Sean Paul, David Airlie, freedreno, Robin Murphy, dri-devel,
	iommu, Daniel Vetter, Will Deacon, linux-kernel

Add support for allocating private address space instances. Targets that
support per-context pagetables should implement their own function to
allocate private address spaces.

The default will return a pointer to the global address space.
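
The fallback can be pictured with a small standalone sketch. This is only an
illustration under stated assumptions: struct aspace, struct gpu, aspace_get()
and aspace_put() are hypothetical stand-ins for the driver types and the kref
helpers, and a plain integer replaces the kref.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for msm_gem_address_space; an int replaces the kref */
struct aspace {
	int refcount;
};

struct gpu;

struct gpu_funcs {
	/* Optional hook; NULL when the target has no per-context pagetables */
	struct aspace *(*create_private_address_space)(struct gpu *gpu);
};

/* Hypothetical stand-in for msm_gpu */
struct gpu {
	const struct gpu_funcs *funcs;
	struct aspace *aspace;	/* the global address space */
};

/* Take a reference; NULL is passed through untouched */
static struct aspace *aspace_get(struct aspace *as)
{
	if (as)
		as->refcount++;
	return as;
}

/* Drop a reference and return the remaining count (0 for NULL) */
static int aspace_put(struct aspace *as)
{
	if (!as)
		return 0;
	assert(as->refcount > 0);
	return --as->refcount;
}

/* Use the target hook if there is one, otherwise share the global space */
static struct aspace *create_private_address_space(struct gpu *gpu)
{
	if (!gpu)
		return NULL;

	if (!gpu->funcs->create_private_address_space)
		return aspace_get(gpu->aspace);

	return gpu->funcs->create_private_address_space(gpu);
}
```

Because the fallback takes a reference on the global address space, the
per-file teardown path can drop its reference unconditionally, whether the
context got a private space or the shared one.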

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---

 drivers/gpu/drm/msm/msm_drv.c     | 13 +++++++------
 drivers/gpu/drm/msm/msm_drv.h     |  5 +++++
 drivers/gpu/drm/msm/msm_gem_vma.c |  9 +++++++++
 drivers/gpu/drm/msm/msm_gpu.c     | 17 +++++++++++++++++
 drivers/gpu/drm/msm/msm_gpu.h     |  5 +++++
 5 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
index 556198d4ba5f..c0328abea52d 100644
--- a/drivers/gpu/drm/msm/msm_drv.c
+++ b/drivers/gpu/drm/msm/msm_drv.c
@@ -603,7 +603,7 @@ static int context_init(struct drm_device *dev, struct drm_file *file)
 	kref_init(&ctx->ref);
 	msm_submitqueue_init(dev, ctx);
 
-	ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
+	ctx->aspace = msm_gpu_create_private_address_space(priv->gpu);
 	file->driver_priv = ctx;
 
 	return 0;
@@ -786,18 +786,19 @@ static int msm_ioctl_gem_cpu_fini(struct drm_device *dev, void *data,
 }
 
 static int msm_ioctl_gem_info_iova(struct drm_device *dev,
-		struct drm_gem_object *obj, uint64_t *iova)
+		struct drm_file *file, struct drm_gem_object *obj,
+		uint64_t *iova)
 {
-	struct msm_drm_private *priv = dev->dev_private;
+	struct msm_file_private *ctx = file->driver_priv;
 
-	if (!priv->gpu)
+	if (!ctx->aspace)
 		return -EINVAL;
 
 	/*
 	 * Don't pin the memory here - just get an address so that userspace can
 	 * be productive
 	 */
-	return msm_gem_get_iova(obj, priv->gpu->aspace, iova);
+	return msm_gem_get_iova(obj, ctx->aspace, iova);
 }
 
 static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
@@ -836,7 +837,7 @@ static int msm_ioctl_gem_info(struct drm_device *dev, void *data,
 		args->value = msm_gem_mmap_offset(obj);
 		break;
 	case MSM_INFO_GET_IOVA:
-		ret = msm_ioctl_gem_info_iova(dev, obj, &args->value);
+		ret = msm_ioctl_gem_info_iova(dev, file, obj, &args->value);
 		break;
 	case MSM_INFO_SET_NAME:
 		/* length check should leave room for terminating null: */
diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
index ab5f77261816..df400f9ec38c 100644
--- a/drivers/gpu/drm/msm/msm_drv.h
+++ b/drivers/gpu/drm/msm/msm_drv.h
@@ -250,6 +250,10 @@ int msm_gem_map_vma(struct msm_gem_address_space *aspace,
 void msm_gem_close_vma(struct msm_gem_address_space *aspace,
 		struct msm_gem_vma *vma);
 
+
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace);
+
 void msm_gem_address_space_put(struct msm_gem_address_space *aspace);
 
 struct msm_gem_address_space *
@@ -435,6 +439,7 @@ static inline void msm_file_private_destroy(struct kref *kref)
 	struct msm_file_private *ctx = container_of(kref,
 		struct msm_file_private, ref);
 
+	msm_gem_address_space_put(ctx->aspace);
 	kfree(ctx);
 }
 
diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c b/drivers/gpu/drm/msm/msm_gem_vma.c
index 5f6a11211b64..29cc1305cf37 100644
--- a/drivers/gpu/drm/msm/msm_gem_vma.c
+++ b/drivers/gpu/drm/msm/msm_gem_vma.c
@@ -27,6 +27,15 @@ void msm_gem_address_space_put(struct msm_gem_address_space *aspace)
 		kref_put(&aspace->kref, msm_gem_address_space_destroy);
 }
 
+struct msm_gem_address_space *
+msm_gem_address_space_get(struct msm_gem_address_space *aspace)
+{
+	if (!IS_ERR_OR_NULL(aspace))
+		kref_get(&aspace->kref);
+
+	return aspace;
+}
+
 /* Actually unmap memory for the vma */
 void msm_gem_purge_vma(struct msm_gem_address_space *aspace,
 		struct msm_gem_vma *vma)
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index a1f3da6550e5..aabbd7908ee5 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -823,6 +823,23 @@ static int get_clocks(struct platform_device *pdev, struct msm_gpu *gpu)
 	return 0;
 }
 
+/* Return a new address space for a msm_drm_private instance */
+struct msm_gem_address_space *
+msm_gpu_create_private_address_space(struct msm_gpu *gpu)
+{
+	if (!gpu)
+		return NULL;
+
+	/*
+	 * If the target doesn't support private address spaces then return
+	 * the global one
+	 */
+	if (!gpu->funcs->create_private_address_space)
+		return msm_gem_address_space_get(gpu->aspace);
+
+	return gpu->funcs->create_private_address_space(gpu);
+}
+
 int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
 		const char *name, struct msm_gpu_config *config)
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index d496d488222c..d298657b4730 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -64,6 +64,8 @@ struct msm_gpu_funcs {
 	void (*gpu_set_freq)(struct msm_gpu *gpu, struct dev_pm_opp *opp);
 	struct msm_gem_address_space *(*create_address_space)
 		(struct msm_gpu *gpu, struct platform_device *pdev);
+	struct msm_gem_address_space *(*create_private_address_space)
+		(struct msm_gpu *gpu);
 };
 
 struct msm_gpu {
@@ -286,6 +288,9 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 		struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
 		const char *name, struct msm_gpu_config *config);
 
+struct msm_gem_address_space *
+msm_gpu_create_private_address_space(struct msm_gpu *gpu);
+
 void msm_gpu_cleanup(struct msm_gpu *gpu);
 
 struct msm_gpu *adreno_load_gpu(struct drm_device *dev);
-- 
2.25.1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v10 12/13] drm/msm/a6xx: Add support for per-instance pagetables
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
                   ` (10 preceding siblings ...)
  2020-07-20 15:40 ` [PATCH v10 11/13] drm/msm: Add support for private address space instances Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  2020-07-20 15:40 ` [PATCH v10 13/13] arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU Jordan Crouse
  12 siblings, 0 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: Sean Paul, Jonathan Marek, David Airlie, freedreno,
	Sharat Masetty, Robin Murphy, dri-devel, Eric Anholt, iommu,
	Akhil P Oommen, Daniel Vetter, Will Deacon, linux-kernel

Add support for using per-instance pagetables if all the dependencies are
available.
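
For anyone reading the memstore value while debugging, here is a minimal
standalone sketch of how the two dwords written by a6xx_set_pagetable() in the
diff below combine into the 64-bit ttbr0 slot. It assumes a little-endian
layout, a TTBR0 address that fits in 48 bits and an ASID that fits in 16 bits;
the helper names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Combine TTBR0 and ASID the way the two CP_MEM_WRITE payload dwords do */
static uint64_t pack_ttbr0(uint64_t ttbr, uint32_t asid)
{
	uint32_t lo, hi;

	/* Assumption: the ASID occupies the top 16 bits of the high dword */
	assert(asid <= 0xffff);
	assert((ttbr >> 48) == 0);

	lo = (uint32_t)ttbr;				/* lower_32_bits(ttbr) */
	hi = (asid << 16) | (uint32_t)(ttbr >> 32);	/* ASID | upper_32_bits(ttbr) */

	return ((uint64_t)hi << 32) | lo;
}

/* Recover the pagetable base from a value read back from the memstore */
static uint64_t unpack_ttbr(uint64_t val)
{
	return val & 0x0000ffffffffffffULL;
}

/* Recover the ASID from a value read back from the memstore */
static uint32_t unpack_asid(uint64_t val)
{
	return (uint32_t)(val >> 48);
}
```

With these assumptions, a TTBR0 of 0x123456789000 and an ASID of 0x2a would
show up in the memstore as 0x002a123456789000.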

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 53 +++++++++++++++++++++++++++
 drivers/gpu/drm/msm/adreno/a6xx_gpu.h |  1 +
 drivers/gpu/drm/msm/msm_ringbuffer.h  |  1 +
 3 files changed, 55 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 5eabb0109577..57c6cdec7e9a 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -81,6 +81,41 @@ static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
 	OUT_RING(ring, upper_32_bits(iova));
 }
 
+static void a6xx_set_pagetable(struct a6xx_gpu *a6xx_gpu,
+		struct msm_ringbuffer *ring, struct msm_file_private *ctx)
+{
+	phys_addr_t ttbr;
+	u32 asid;
+	u64 memptr = rbmemptr(ring, ttbr0);
+
+	if (ctx == a6xx_gpu->cur_ctx)
+		return;
+
+	if (msm_iommu_pagetable_params(ctx->aspace->mmu, &ttbr, &asid))
+		return;
+
+	/* Execute the table update */
+	OUT_PKT7(ring, CP_SMMU_TABLE_UPDATE, 4);
+	OUT_RING(ring, CP_SMMU_TABLE_UPDATE_0_TTBR0_LO(lower_32_bits(ttbr)));
+	OUT_RING(ring,
+		CP_SMMU_TABLE_UPDATE_1_TTBR0_HI(upper_32_bits(ttbr)) |
+		CP_SMMU_TABLE_UPDATE_1_ASID(asid));
+	OUT_RING(ring, CP_SMMU_TABLE_UPDATE_2_CONTEXTIDR(0));
+	OUT_RING(ring, CP_SMMU_TABLE_UPDATE_3_CONTEXTBANK(0));
+
+	/*
+	 * Write the new TTBR0 to the memstore. This is good for debugging.
+	 */
+	OUT_PKT7(ring, CP_MEM_WRITE, 4);
+	OUT_RING(ring, CP_MEM_WRITE_0_ADDR_LO(lower_32_bits(memptr)));
+	OUT_RING(ring, CP_MEM_WRITE_1_ADDR_HI(upper_32_bits(memptr)));
+	OUT_RING(ring, lower_32_bits(ttbr));
+	OUT_RING(ring, (asid << 16) | upper_32_bits(ttbr));
+
+
+	a6xx_gpu->cur_ctx = ctx;
+}
+
 static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 {
 	unsigned int index = submit->seqno % MSM_GPU_SUBMIT_STATS_COUNT;
@@ -90,6 +125,8 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
 	struct msm_ringbuffer *ring = submit->ring;
 	unsigned int i;
 
+	a6xx_set_pagetable(a6xx_gpu, ring, submit->queue->ctx);
+
 	get_stats_counter(ring, REG_A6XX_RBBM_PERFCTR_CP_0_LO,
 		rbmemptr_stats(ring, index, cpcycles_start));
 
@@ -696,6 +733,8 @@ static int a6xx_hw_init(struct msm_gpu *gpu)
 	/* Always come up on rb 0 */
 	a6xx_gpu->cur_ring = gpu->rb[0];
 
+	a6xx_gpu->cur_ctx = NULL;
+
 	/* Enable the SQE_to start the CP engine */
 	gpu_write(gpu, REG_A6XX_CP_SQE_CNTL, 1);
 
@@ -1008,6 +1047,19 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
 	return (unsigned long)busy_time;
 }
 
+static struct msm_gem_address_space *
+a6xx_create_private_address_space(struct msm_gpu *gpu)
+{
+	struct msm_mmu *mmu;
+
+	mmu = msm_iommu_pagetable_create(gpu->aspace->mmu);
+	if (IS_ERR(mmu))
+		return msm_gem_address_space_get(gpu->aspace);
+
+	return msm_gem_address_space_create(mmu,
+		"gpu", 0x100000000ULL, 0x1ffffffffULL);
+}
+
 static const struct adreno_gpu_funcs funcs = {
 	.base = {
 		.get_param = adreno_get_param,
@@ -1031,6 +1083,7 @@ static const struct adreno_gpu_funcs funcs = {
 		.gpu_state_put = a6xx_gpu_state_put,
 #endif
 		.create_address_space = adreno_iommu_create_address_space,
+		.create_private_address_space = a6xx_create_private_address_space,
 	},
 	.get_timestamp = a6xx_get_timestamp,
 };
diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
index 03ba60d5b07f..da22d7549d9b 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.h
@@ -19,6 +19,7 @@ struct a6xx_gpu {
 	uint64_t sqe_iova;
 
 	struct msm_ringbuffer *cur_ring;
+	struct msm_file_private *cur_ctx;
 
 	struct a6xx_gmu gmu;
 };
diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.h b/drivers/gpu/drm/msm/msm_ringbuffer.h
index 7764373d0ed2..0987d6bf848c 100644
--- a/drivers/gpu/drm/msm/msm_ringbuffer.h
+++ b/drivers/gpu/drm/msm/msm_ringbuffer.h
@@ -31,6 +31,7 @@ struct msm_rbmemptrs {
 	volatile uint32_t fence;
 
 	volatile struct msm_gpu_submit_stats stats[MSM_GPU_SUBMIT_STATS_COUNT];
+	volatile u64 ttbr0;
 };
 
 struct msm_ringbuffer {
-- 
2.25.1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v10 13/13] arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU
  2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
                   ` (11 preceding siblings ...)
  2020-07-20 15:40 ` [PATCH v10 12/13] drm/msm/a6xx: Add support for per-instance pagetables Jordan Crouse
@ 2020-07-20 15:40 ` Jordan Crouse
  12 siblings, 0 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-20 15:40 UTC (permalink / raw)
  To: linux-arm-msm
  Cc: devicetree, freedreno, Robin Murphy, Rob Herring, iommu,
	Andy Gross, Will Deacon, linux-kernel

Set the qcom,adreno-smmu compatible string for the GPU SMMU to enable
split pagetables and per-instance pagetables for drm/msm.

Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
---

 arch/arm64/boot/dts/qcom/sdm845.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi
index 759cdd0b002b..d883144360aa 100644
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi
+++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi
@@ -3981,7 +3981,7 @@ opp-257000000 {
 		};
 
 		adreno_smmu: iommu@5040000 {
-			compatible = "qcom,sdm845-smmu-v2", "qcom,smmu-v2";
+			compatible = "qcom,adreno-smmu", "qcom,smmu-v2";
 			reg = <0 0x5040000 0 0x10000>;
 			#iommu-cells = <1>;
 			#global-interrupts = <2>;
-- 
2.25.1

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v10 07/13] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU
  2020-07-20 15:40 ` [PATCH v10 07/13] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU Jordan Crouse
@ 2020-07-26 16:55   ` Rob Clark
  0 siblings, 0 replies; 21+ messages in thread
From: Rob Clark @ 2020-07-26 16:55 UTC (permalink / raw)
  To: Jordan Crouse
  Cc: open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	Will Deacon, Rob Herring, linux-arm-msm,
	Linux Kernel Mailing List,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Rob Herring, freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Jul 20, 2020 at 8:41 AM Jordan Crouse <jcrouse@codeaurora.org> wrote:
>
> Every Qcom Adreno GPU has an embedded SMMU for its own use. These

minor detail: this is true for a3xx and later but not a2xx ;-)

> devices depend on unique features such as split pagetables,
> different stall/halt requirements and other settings. Identify them
> with a compatible string so that they can be identified in the
> arm-smmu implementation specific code.
>
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> Reviewed-by: Rob Herring <robh@kernel.org>
> ---
>
>  Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> index d7ceb4c34423..e52a1b146c97 100644
> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> @@ -38,6 +38,10 @@ properties:
>                - qcom,sc7180-smmu-500
>                - qcom,sdm845-smmu-500
>            - const: arm,mmu-500
> +      - description: Qcom Adreno GPUs implementing "arm,smmu-v2"
> +        items:
> +          - const: qcom,adreno-smmu
> +          - const: qcom,smmu-v2
>        - items:
>            - const: arm,mmu-500
>            - const: arm,smmu-v2
> --
> 2.25.1
>
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Freedreno] [PATCH v10 06/13] iommu/arm-smmu-qcom: Get and set the pagetable config for split pagetables
  2020-07-20 15:40 ` [PATCH v10 06/13] iommu/arm-smmu-qcom: Get and set the pagetable config for split pagetables Jordan Crouse
@ 2020-07-26 17:03   ` Rob Clark
  2020-07-27 15:03     ` Jordan Crouse
  0 siblings, 1 reply; 21+ messages in thread
From: Rob Clark @ 2020-07-26 17:03 UTC (permalink / raw)
  To: Jordan Crouse
  Cc: freedreno, Will Deacon, linux-arm-msm, Linux Kernel Mailing List,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Jul 20, 2020 at 8:41 AM Jordan Crouse <jcrouse@codeaurora.org> wrote:
>
> The Adreno GPU has the capability to manage its own pagetables and switch
> them dynamically from the hardware. To do this the GPU uses TTBR1 for
> "global" GPU memory and creates local pagetables for each context and
> switches them dynamically with the GPU.
>
> Use DOMAIN_ATTR_PGTABLE_CFG to get the current configuration for the
> TTBR1 pagetable from the smmu driver so the leaf driver can create
> compatible pagetables for use with TTBR0.
>
> Because TTBR0 is disabled by default when TTBR1 is enabled the GPU
> driver can pass the configuration of one of the newly created pagetables
> back through DOMAIN_ATTR_PGTABLE_CFG as a trigger to enable translation on
> TTBR0.
>
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>
>  drivers/iommu/arm-smmu-qcom.c | 47 +++++++++++++++++++++++++++++++++++
>  drivers/iommu/arm-smmu.c      | 32 ++++++++++++++++++------
>  drivers/iommu/arm-smmu.h      | 10 ++++++++
>  3 files changed, 81 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
> index b9a5c5369e86..9a0c64ca9cb6 100644
> --- a/drivers/iommu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm-smmu-qcom.c
> @@ -34,6 +34,52 @@ static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
>         return false;
>  }
>
> +/*
> + * Local implementation to configure TTBR0 with the specified pagetable config.
> + * The GPU driver will call this to enable TTBR0 when per-instance pagetables
> + * are active
> + */
> +static int qcom_adreno_smmu_set_pgtable_cfg(struct arm_smmu_domain *smmu_domain,
> +               struct io_pgtable_cfg *pgtbl_cfg)
> +{
> +       struct io_pgtable *pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> +       struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +       struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
> +
> +       /* The domain must have split pagetables already enabled */
> +       if (cb->tcr[0] & ARM_SMMU_TCR_EPD1)
> +               return -EINVAL;
> +
> +       /* If the pagetable config is NULL, disable TTBR0 */
> +       if (!pgtbl_cfg) {
> +               /* Do nothing if it is already disabled */
> +               if ((cb->tcr[0] & ARM_SMMU_TCR_EPD0))
> +                       return -EINVAL;
> +
> +               /* Set TCR to the original configuration */
> +               cb->tcr[0] = arm_smmu_lpae_tcr(&pgtable->cfg);
> +               cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
> +       } else {
> +               u32 tcr = cb->tcr[0];
> +
> +               /* FIXME: What sort of validation do we need to do here? */
> +
> +               /* Don't call this again if TTBR0 is already enabled */
> +               if (!(cb->tcr[0] & ARM_SMMU_TCR_EPD0))
> +                       return -EINVAL;
> +
> +               tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
> +               tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
> +
> +               cb->tcr[0] = tcr;
> +               cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> +               cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
> +       }
> +
> +       arm_smmu_write_context_bank(smmu_domain->smmu, cb->cfg->cbndx);
> +       return 0;
> +}
> +
>  static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
>                 struct device *dev, int start, int count)
>  {
> @@ -131,6 +177,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
>         .def_domain_type = qcom_smmu_def_domain_type,
>         .reset = qcom_smmu500_reset,
>         .alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
> +       .set_pgtable_cfg = qcom_adreno_smmu_set_pgtable_cfg,
>  };
>
>  static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index fff536a44faa..e1036ae54a8d 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -86,13 +86,6 @@ struct arm_smmu_smr {
>         bool                            valid;
>  };
>
> -struct arm_smmu_cb {
> -       u64                             ttbr[2];
> -       u32                             tcr[2];
> -       u32                             mair[2];
> -       struct arm_smmu_cfg             *cfg;
> -};
> -
>  static bool using_legacy_binding, using_generic_binding;
>
>  static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
> @@ -558,7 +551,7 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
>         }
>  }
>
> -static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
> +void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
>  {
>         u32 reg;
>         bool stage1;
> @@ -1515,6 +1508,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
>                 case DOMAIN_ATTR_NESTING:
>                         *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
>                         return 0;
> +               case DOMAIN_ATTR_PGTABLE_CFG: {
> +                       struct io_pgtable *pgtable;
> +                       struct io_pgtable_cfg *dest = data;
> +
> +                       if (!smmu_domain->pgtbl_ops)
> +                               return -ENODEV;
> +
> +                       pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> +
> +                       memcpy(dest, &pgtable->cfg, sizeof(*dest));
> +                       return 0;
> +               }

hmm, maybe it would make sense to have impl hooks for get/set_attr, so
we could handle DOMAIN_ATTR_PGTABLE_CFG inside the adreno_smmu_impl?

Having impl specific domain attrs would be useful for what I have in
mind to enable stall/resume support, so we can hook in devcoredump to
iova faults (which would be a huge improvement for debugability, right
now iova faults are somewhat harder to debug than needed).  My rough
idea was to add DOMAIN_ATTR_RESUME, which could be used with
set_attr() to (1) enable STALL and let drm/msm know whether the iommu
supports it, and (2) resume translation from wq context after
devcoredump snapshot is collected.
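
A rough standalone sketch of what such a hook could look like, purely as an
illustration: every name here (enum domain_attr, struct smmu_impl,
domain_set_attr(), adreno_set_attr(), ATTR_RESUME) is hypothetical and none of
it is existing arm-smmu API. The dispatcher gives the implementation the first
chance to handle an attribute and falls back to the generic handling when the
hook is absent or returns -ENODEV.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical attribute ids; ATTR_RESUME models the proposed attribute */
enum domain_attr { ATTR_NESTING, ATTR_PGTABLE_CFG, ATTR_RESUME };

struct smmu_domain;

/* Hypothetical implementation hook table */
struct smmu_impl {
	int (*domain_set_attr)(struct smmu_domain *d, enum domain_attr attr,
			void *data);
};

struct smmu_domain {
	const struct smmu_impl *impl;
	int nested;
	int stall_enabled;
};

/* Generic attributes handled by the common code */
static int generic_set_attr(struct smmu_domain *d, enum domain_attr attr,
		void *data)
{
	switch (attr) {
	case ATTR_NESTING:
		d->nested = *(int *)data;
		return 0;
	default:
		return -ENODEV;
	}
}

/* Let the implementation try first, then fall back to the generic code */
static int domain_set_attr(struct smmu_domain *d, enum domain_attr attr,
		void *data)
{
	int ret = -ENODEV;

	assert(d != NULL);

	if (d->impl && d->impl->domain_set_attr)
		ret = d->impl->domain_set_attr(d, attr, data);

	if (ret == -ENODEV)
		ret = generic_set_attr(d, attr, data);

	return ret;
}

/* Example implementation hook that only understands ATTR_RESUME */
static int adreno_set_attr(struct smmu_domain *d, enum domain_attr attr,
		void *data)
{
	(void)data;

	if (attr != ATTR_RESUME)
		return -ENODEV;

	d->stall_enabled = 1;
	return 0;
}
```

In this sketch a domain without the hook reports -ENODEV for ATTR_RESUME,
which is one way the caller could learn that stall/resume is not supported.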

BR,
-R

>                 default:
>                         return -ENODEV;
>                 }
> @@ -1555,6 +1560,17 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
>                         else
>                                 smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
>                         break;
> +               case DOMAIN_ATTR_PGTABLE_CFG: {
> +                       struct arm_smmu_device *smmu = smmu_domain->smmu;
> +
> +                       ret = -EPERM;
> +
> +                       if (smmu)
> +                               if (smmu->impl && smmu->impl->set_pgtable_cfg)
> +                                       ret = smmu->impl->set_pgtable_cfg(smmu_domain,
> +                                               data);
> +                       }
> +                       break;
>                 default:
>                         ret = -ENODEV;
>                 }
> diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> index 9f81c1fffe1e..9325fc28d24a 100644
> --- a/drivers/iommu/arm-smmu.h
> +++ b/drivers/iommu/arm-smmu.h
> @@ -328,6 +328,13 @@ struct arm_smmu_cfg {
>  };
>  #define ARM_SMMU_INVALID_IRPTNDX       0xff
>
> +struct arm_smmu_cb {
> +       u64                             ttbr[2];
> +       u32                             tcr[2];
> +       u32                             mair[2];
> +       struct arm_smmu_cfg             *cfg;
> +};
> +
>  enum arm_smmu_domain_stage {
>         ARM_SMMU_DOMAIN_S1 = 0,
>         ARM_SMMU_DOMAIN_S2,
> @@ -408,6 +415,8 @@ struct arm_smmu_impl {
>         int (*def_domain_type)(struct device *dev);
>         int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain,
>                         struct device *dev, int start, int max);
> +       int (*set_pgtable_cfg)(struct arm_smmu_domain *smmu_domain,
> +                       struct io_pgtable_cfg *cfg);
>  };
>
>  #define INVALID_SMENDX                 -1
> @@ -493,6 +502,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
>  struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
>  struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);
>
> +void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
>  int arm_mmu500_reset(struct arm_smmu_device *smmu);
>
>  #endif /* _ARM_SMMU_H */
> --
> 2.25.1
>
> _______________________________________________
> Freedreno mailing list
> Freedreno@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/freedreno

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v10 08/13] drm/msm: Add a context pointer to the submitqueue
  2020-07-20 15:40 ` [PATCH v10 08/13] drm/msm: Add a context pointer to the submitqueue Jordan Crouse
@ 2020-07-26 17:09   ` Rob Clark
  0 siblings, 0 replies; 21+ messages in thread
From: Rob Clark @ 2020-07-26 17:09 UTC (permalink / raw)
  To: Jordan Crouse
  Cc: David Airlie, Akhil P Oommen, dri-devel, Eric Anholt,
	AngeloGioacchino Del Regno, Will Deacon, Emil Velikov,
	Jonathan Marek, Ben Dooks, Wambui Karuga, linux-arm-msm,
	Sharat Masetty, Brian Masney, Sean Paul, freedreno,
	Linux Kernel Mailing List,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Daniel Vetter, Shawn Guo, Robin Murphy

On Mon, Jul 20, 2020 at 8:41 AM Jordan Crouse <jcrouse@codeaurora.org> wrote:
>
> Each submitqueue is attached to a context. Add a pointer to the
> context to the submitqueue at create time and refcount it so
> that it stays around through the life of the queue.
>
> GPU submissions can access the active context via the submitqueue
> instead of requiring it to be passed around from function to
> function.
>
> Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> ---
>
>  drivers/gpu/drm/msm/adreno/a5xx_gpu.c   | 12 +++++-------
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   |  5 ++---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c |  5 ++---
>  drivers/gpu/drm/msm/adreno/adreno_gpu.h |  3 +--
>  drivers/gpu/drm/msm/msm_drv.c           |  3 ++-
>  drivers/gpu/drm/msm/msm_drv.h           |  8 ++++++++
>  drivers/gpu/drm/msm/msm_gem.h           |  1 +
>  drivers/gpu/drm/msm/msm_gem_submit.c    |  8 ++++----
>  drivers/gpu/drm/msm/msm_gpu.c           |  9 ++++-----
>  drivers/gpu/drm/msm/msm_gpu.h           |  7 +++----
>  drivers/gpu/drm/msm/msm_submitqueue.c   |  8 +++++++-
>  11 files changed, 39 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> index 9e63a190642c..eff2439ea57b 100644
> --- a/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a5xx_gpu.c
> @@ -43,8 +43,7 @@ static void a5xx_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring)
>                 gpu_write(gpu, REG_A5XX_CP_RB_WPTR, wptr);
>  }
>
> -static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> -       struct msm_file_private *ctx)
> +static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  {
>         struct msm_drm_private *priv = gpu->dev->dev_private;
>         struct msm_ringbuffer *ring = submit->ring;
> @@ -57,7 +56,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit
>                 case MSM_SUBMIT_CMD_IB_TARGET_BUF:
>                         break;
>                 case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
> -                       if (priv->lastctx == ctx)
> +                       if (priv->lastctx == submit->queue->ctx)
>                                 break;
>                         /* fall-thru */
>                 case MSM_SUBMIT_CMD_BUF:
> @@ -103,8 +102,7 @@ static void a5xx_submit_in_rb(struct msm_gpu *gpu, struct msm_gem_submit *submit
>         msm_gpu_retire(gpu);
>  }
>
> -static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> -       struct msm_file_private *ctx)
> +static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  {
>         struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>         struct a5xx_gpu *a5xx_gpu = to_a5xx_gpu(adreno_gpu);
> @@ -114,7 +112,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
>
>         if (IS_ENABLED(CONFIG_DRM_MSM_GPU_SUDO) && submit->in_rb) {
>                 priv->lastctx = NULL;
> -               a5xx_submit_in_rb(gpu, submit, ctx);
> +               a5xx_submit_in_rb(gpu, submit);
>                 return;
>         }
>
> @@ -148,7 +146,7 @@ static void a5xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
>                 case MSM_SUBMIT_CMD_IB_TARGET_BUF:
>                         break;
>                 case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
> -                       if (priv->lastctx == ctx)
> +                       if (priv->lastctx == submit->queue->ctx)
>                                 break;
>                         /* fall-thru */
>                 case MSM_SUBMIT_CMD_BUF:
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index c5a3e4d4c007..5eabb0109577 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -81,8 +81,7 @@ static void get_stats_counter(struct msm_ringbuffer *ring, u32 counter,
>         OUT_RING(ring, upper_32_bits(iova));
>  }
>
> -static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> -       struct msm_file_private *ctx)
> +static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  {
>         unsigned int index = submit->seqno % MSM_GPU_SUBMIT_STATS_COUNT;
>         struct msm_drm_private *priv = gpu->dev->dev_private;
> @@ -115,7 +114,7 @@ static void a6xx_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
>                 case MSM_SUBMIT_CMD_IB_TARGET_BUF:
>                         break;
>                 case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
> -                       if (priv->lastctx == ctx)
> +                       if (priv->lastctx == submit->queue->ctx)
>                                 break;
>                         /* fall-thru */
>                 case MSM_SUBMIT_CMD_BUF:
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> index e23641a5ec84..b38a8126541a 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
> @@ -457,8 +457,7 @@ void adreno_recover(struct msm_gpu *gpu)
>         }
>  }
>
> -void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> -               struct msm_file_private *ctx)
> +void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  {
>         struct adreno_gpu *adreno_gpu = to_adreno_gpu(gpu);
>         struct msm_drm_private *priv = gpu->dev->dev_private;
> @@ -472,7 +471,7 @@ void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
>                         break;
>                 case MSM_SUBMIT_CMD_CTX_RESTORE_BUF:
>                         /* ignore if there has not been a ctx switch: */
> -                       if (priv->lastctx == ctx)
> +                       if (priv->lastctx == submit->queue->ctx)
>                                 break;
>                         /* fall-thru */
>                 case MSM_SUBMIT_CMD_BUF:
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.h b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> index 99bb468f5f24..0ae8b373c428 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> +++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.h
> @@ -267,8 +267,7 @@ struct drm_gem_object *adreno_fw_create_bo(struct msm_gpu *gpu,
>                 const struct firmware *fw, u64 *iova);
>  int adreno_hw_init(struct msm_gpu *gpu);
>  void adreno_recover(struct msm_gpu *gpu);
> -void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> -               struct msm_file_private *ctx);
> +void adreno_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit);
>  void adreno_flush(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
>  bool adreno_idle(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
>  #if defined(CONFIG_DEBUG_FS) || defined(CONFIG_DEV_COREDUMP)
> diff --git a/drivers/gpu/drm/msm/msm_drv.c b/drivers/gpu/drm/msm/msm_drv.c
> index ff379ed40469..556198d4ba5f 100644
> --- a/drivers/gpu/drm/msm/msm_drv.c
> +++ b/drivers/gpu/drm/msm/msm_drv.c
> @@ -600,6 +600,7 @@ static int context_init(struct drm_device *dev, struct drm_file *file)
>         if (!ctx)
>                 return -ENOMEM;
>
> +       kref_init(&ctx->ref);
>         msm_submitqueue_init(dev, ctx);
>
>         ctx->aspace = priv->gpu ? priv->gpu->aspace : NULL;
> @@ -621,7 +622,7 @@ static int msm_open(struct drm_device *dev, struct drm_file *file)
>  static void context_close(struct msm_file_private *ctx)
>  {
>         msm_submitqueue_close(ctx);
> -       kfree(ctx);
> +       kref_put(&ctx->ref, msm_file_private_destroy);
>  }
>
>  static void msm_postclose(struct drm_device *dev, struct drm_file *file)
> diff --git a/drivers/gpu/drm/msm/msm_drv.h b/drivers/gpu/drm/msm/msm_drv.h
> index 2687f7a42c15..ab5f77261816 100644
> --- a/drivers/gpu/drm/msm/msm_drv.h
> +++ b/drivers/gpu/drm/msm/msm_drv.h
> @@ -57,6 +57,7 @@ struct msm_file_private {
>         struct list_head submitqueues;
>         int queueid;
>         struct msm_gem_address_space *aspace;
> +       struct kref ref;
>  };
>
>  enum msm_mdp_plane_property {
> @@ -429,6 +430,13 @@ void msm_submitqueue_close(struct msm_file_private *ctx);
>
>  void msm_submitqueue_destroy(struct kref *kref);
>
> +static inline void msm_file_private_destroy(struct kref *kref)
> +{
> +       struct msm_file_private *ctx = container_of(kref,
> +               struct msm_file_private, ref);
> +
> +       kfree(ctx);
> +}
>
>  #define DBG(fmt, ...) DRM_DEBUG_DRIVER(fmt"\n", ##__VA_ARGS__)
>  #define VERB(fmt, ...) if (0) DRM_DEBUG_DRIVER(fmt"\n", ##__VA_ARGS__)
> diff --git a/drivers/gpu/drm/msm/msm_gem.h b/drivers/gpu/drm/msm/msm_gem.h
> index 972490b14ba5..9c573c4269cb 100644
> --- a/drivers/gpu/drm/msm/msm_gem.h
> +++ b/drivers/gpu/drm/msm/msm_gem.h
> @@ -142,6 +142,7 @@ struct msm_gem_submit {
>         bool valid;         /* true if no cmdstream patching needed */
>         bool in_rb;         /* "sudo" mode, copy cmds into RB */
>         struct msm_ringbuffer *ring;
> +       struct msm_file_private *ctx;
>         unsigned int nr_cmds;
>         unsigned int nr_bos;
>         u32 ident;         /* A "identifier" for the submit for logging */
> diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c b/drivers/gpu/drm/msm/msm_gem_submit.c
> index 8cb9aa15ff90..aa5c60a7132d 100644
> --- a/drivers/gpu/drm/msm/msm_gem_submit.c
> +++ b/drivers/gpu/drm/msm/msm_gem_submit.c
> @@ -27,7 +27,7 @@
>  #define BO_PINNED   0x2000
>
>  static struct msm_gem_submit *submit_create(struct drm_device *dev,
> -               struct msm_gpu *gpu, struct msm_gem_address_space *aspace,
> +               struct msm_gpu *gpu,
>                 struct msm_gpu_submitqueue *queue, uint32_t nr_bos,
>                 uint32_t nr_cmds)
>  {
> @@ -43,7 +43,7 @@ static struct msm_gem_submit *submit_create(struct drm_device *dev,
>                 return NULL;
>
>         submit->dev = dev;
> -       submit->aspace = aspace;
> +       submit->aspace = queue->ctx->aspace;
>         submit->gpu = gpu;
>         submit->fence = NULL;
>         submit->cmd = (void *)&submit->bos[nr_bos];
> @@ -677,7 +677,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
>                 }
>         }
>
> -       submit = submit_create(dev, gpu, ctx->aspace, queue, args->nr_bos,
> +       submit = submit_create(dev, gpu, queue, args->nr_bos,
>                 args->nr_cmds);
>         if (!submit) {
>                 ret = -ENOMEM;
> @@ -785,7 +785,7 @@ int msm_ioctl_gem_submit(struct drm_device *dev, void *data,
>                 }
>         }
>
> -       msm_gpu_submit(gpu, submit, ctx);
> +       msm_gpu_submit(gpu, submit);
>
>         args->fence = submit->fence->seqno;
>
> diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
> index d5645472b25d..a1f3da6550e5 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.c
> +++ b/drivers/gpu/drm/msm/msm_gpu.c
> @@ -520,7 +520,7 @@ static void recover_worker(struct work_struct *work)
>                         struct msm_ringbuffer *ring = gpu->rb[i];
>
>                         list_for_each_entry(submit, &ring->submits, node)
> -                               gpu->funcs->submit(gpu, submit, NULL);
> +                               gpu->funcs->submit(gpu, submit);
>                 }
>         }
>
> @@ -747,8 +747,7 @@ void msm_gpu_retire(struct msm_gpu *gpu)
>  }
>
>  /* add bo's to gpu's ring, and kick gpu: */
> -void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> -               struct msm_file_private *ctx)
> +void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit)
>  {
>         struct drm_device *dev = gpu->dev;
>         struct msm_drm_private *priv = dev->dev_private;
> @@ -788,8 +787,8 @@ void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
>                         msm_gem_move_to_active(&msm_obj->base, gpu, false, submit->fence);
>         }
>
> -       gpu->funcs->submit(gpu, submit, ctx);
> -       priv->lastctx = ctx;
> +       gpu->funcs->submit(gpu, submit);
> +       priv->lastctx = submit->queue->ctx;
>
>         hangcheck_timer_reset(gpu);
>  }
> diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
> index 0db117a7339b..d496d488222c 100644
> --- a/drivers/gpu/drm/msm/msm_gpu.h
> +++ b/drivers/gpu/drm/msm/msm_gpu.h
> @@ -44,8 +44,7 @@ struct msm_gpu_funcs {
>         int (*hw_init)(struct msm_gpu *gpu);
>         int (*pm_suspend)(struct msm_gpu *gpu);
>         int (*pm_resume)(struct msm_gpu *gpu);
> -       void (*submit)(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> -                       struct msm_file_private *ctx);
> +       void (*submit)(struct msm_gpu *gpu, struct msm_gem_submit *submit);
>         void (*flush)(struct msm_gpu *gpu, struct msm_ringbuffer *ring);
>         irqreturn_t (*irq)(struct msm_gpu *irq);
>         struct msm_ringbuffer *(*active_ring)(struct msm_gpu *gpu);
> @@ -181,6 +180,7 @@ struct msm_gpu_submitqueue {
>         u32 flags;
>         u32 prio;
>         int faults;
> +       struct msm_file_private *ctx;
>         struct list_head node;
>         struct kref ref;
>  };
> @@ -280,8 +280,7 @@ int msm_gpu_perfcntr_sample(struct msm_gpu *gpu, uint32_t *activetime,
>                 uint32_t *totaltime, uint32_t ncntrs, uint32_t *cntrs);
>
>  void msm_gpu_retire(struct msm_gpu *gpu);
> -void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit,
> -               struct msm_file_private *ctx);
> +void msm_gpu_submit(struct msm_gpu *gpu, struct msm_gem_submit *submit);
>
>  int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
>                 struct msm_gpu *gpu, const struct msm_gpu_funcs *funcs,
> diff --git a/drivers/gpu/drm/msm/msm_submitqueue.c b/drivers/gpu/drm/msm/msm_submitqueue.c
> index a1d94be7883a..10f557225a3e 100644
> --- a/drivers/gpu/drm/msm/msm_submitqueue.c
> +++ b/drivers/gpu/drm/msm/msm_submitqueue.c
> @@ -49,8 +49,10 @@ void msm_submitqueue_close(struct msm_file_private *ctx)
>          * No lock needed in close and there won't
>          * be any more user ioctls coming our way
>          */
> -       list_for_each_entry_safe(entry, tmp, &ctx->submitqueues, node)
> +       list_for_each_entry_safe(entry, tmp, &ctx->submitqueues, node) {
> +               kref_put(&ctx->ref, msm_file_private_destroy);
>                 msm_submitqueue_put(entry);
> +       }

It seems like the kref_put() should happen in the final unref of the
submitqueue (msm_submitqueue_destroy()), since the reference is taken
in msm_submitqueue_create()?  (If there is something I'm overlooking,
I guess that is a hint that there should be a comment here.)
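
To make the ordering concrete, here is a minimal userspace sketch of dropping
the context reference from the queue's release callback. It is only an
illustration: `struct ref`, `ref_get()`, `ref_put()`, `struct ctx` and
`struct queue` are hypothetical stand-ins for the kernel's kref and the msm
structures, not the driver code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the kernel's struct kref. */
struct ref { int count; };

static void ref_init(struct ref *r) { r->count = 1; }
static void ref_get(struct ref *r) { r->count++; }

/* Drop one reference; call release() on the last one and return 1. */
static int ref_put(struct ref *r, void (*release)(struct ref *))
{
	assert(r->count > 0);
	if (--r->count == 0) {
		release(r);
		return 1;
	}
	return 0;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct ctx { struct ref ref; };
struct queue { struct ref ref; struct ctx *ctx; };

static int ctx_freed;	/* counters used only to observe the ordering */
static int queue_freed;

static void ctx_destroy(struct ref *r)
{
	free(container_of(r, struct ctx, ref));
	ctx_freed++;
}

/* Final unref of the queue: release the ctx reference taken at create. */
static void queue_destroy(struct ref *r)
{
	struct queue *q = container_of(r, struct queue, ref);

	ref_put(&q->ctx->ref, ctx_destroy);
	free(q);
	queue_freed++;
}

static struct ctx *ctx_create(void)
{
	struct ctx *c = malloc(sizeof(*c));

	if (c)
		ref_init(&c->ref);
	return c;
}

/* Each queue pins its ctx for as long as the queue itself is alive. */
static struct queue *queue_create(struct ctx *ctx)
{
	struct queue *q = malloc(sizeof(*q));

	if (!q)
		return NULL;
	ref_init(&q->ref);
	ref_get(&ctx->ref);
	q->ctx = ctx;
	return q;
}
```

With this pairing the get in create and the put in destroy are symmetric, so a
queue that outlives the file's close (for example, one held by an in-flight
submit) keeps the ctx valid until the queue itself goes away.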

Otherwise, I like this.

BR,
-R

>  }
>
>  int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
> @@ -81,6 +83,9 @@ int msm_submitqueue_create(struct drm_device *drm, struct msm_file_private *ctx,
>
>         write_lock(&ctx->queuelock);
>
> +       kref_get(&ctx->ref);
> +
> +       queue->ctx = ctx;
>         queue->id = ctx->queueid++;
>
>         if (id)
> @@ -177,6 +182,7 @@ int msm_submitqueue_remove(struct msm_file_private *ctx, u32 id)
>                         list_del(&entry->node);
>                         write_unlock(&ctx->queuelock);
>
> +                       kref_put(&ctx->ref, msm_file_private_destroy);
>                         msm_submitqueue_put(entry);
>                         return 0;
>                 }
> --
> 2.25.1
>
_______________________________________________
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v10 04/13] iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU
  2020-07-20 15:40 ` [PATCH v10 04/13] iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU Jordan Crouse
@ 2020-07-27  6:27   ` Bjorn Andersson
  2020-07-27 14:57     ` Jordan Crouse
  0 siblings, 1 reply; 21+ messages in thread
From: Bjorn Andersson @ 2020-07-27  6:27 UTC (permalink / raw)
  To: Jordan Crouse
  Cc: Will Deacon, linux-arm-msm, Robin Murphy, linux-kernel, iommu,
	freedreno, linux-arm-kernel

On Mon 20 Jul 08:40 PDT 2020, Jordan Crouse wrote:
> diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
[..]
> +static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
> +		struct device *dev, int start, int count)
> +{
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +
> +	/*
> +	 * Assign context bank 0 to the GPU device so the GPU hardware can
> +	 * switch pagetables
> +	 */
> +	if (qcom_adreno_smmu_is_gpu_device(dev)) {
> +		if (start > 0 || test_bit(0, smmu->context_map))
> +			return -ENOSPC;
> +
> +		set_bit(0, smmu->context_map);
> +		return 0;
> +	}
> +
> +	return __arm_smmu_alloc_bitmap(smmu->context_map, start, count);

If we end up here before the GPU device shows up this is going to
steal the first context bank, causing the subsequent allocation for the
GPU to always fail.

As such I think it would be appropriate for you to adjust "start" to
never be 0 here. And I think it would be appropriate to write this
function as:

	if (gpu) {
		start = 0;
		count = 1;
	} else {
		if (start == 0)
			start = 1;
	}

	return __arm_smmu_alloc_bitmap(smmu->context_map, start, count);
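
As a standalone illustration of the above (a sketch, not the driver code):
`alloc_bitmap()` below is a hypothetical stand-in for
__arm_smmu_alloc_bitmap(), assumed to claim the first clear bit in
[start, end), and the GPU check is reduced to a boolean argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for __arm_smmu_alloc_bitmap(): claim the first
 * clear bit in [start, end) and return its index, or -ENOSPC if none.
 * Assumes end does not exceed the number of bits in an unsigned long.
 */
static int alloc_bitmap(unsigned long *map, int start, int end)
{
	for (int idx = start; idx < end; idx++) {
		if (!(*map & (1UL << idx))) {
			*map |= 1UL << idx;
			return idx;
		}
	}
	return -ENOSPC;
}

/* Bank 0 is reserved for the GPU; everyone else starts at bank 1. */
static int alloc_context_bank(unsigned long *map, bool is_gpu,
			      int start, int count)
{
	if (is_gpu) {
		start = 0;
		count = 1;	/* search window is bank 0 only */
	} else if (start == 0) {
		start = 1;	/* never hand bank 0 to a non-GPU device */
	}

	return alloc_bitmap(map, start, count);
}
```

In this sketch a non-GPU device that attaches first gets bank 1, and the GPU
still finds bank 0 free whenever it shows up; a second GPU request fails with
-ENOSPC.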

Regards,
Bjorn

* Re: [PATCH v10 04/13] iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU
  2020-07-27  6:27   ` Bjorn Andersson
@ 2020-07-27 14:57     ` Jordan Crouse
  0 siblings, 0 replies; 21+ messages in thread
From: Jordan Crouse @ 2020-07-27 14:57 UTC (permalink / raw)
  To: Bjorn Andersson
  Cc: Will Deacon, linux-arm-msm, Robin Murphy, linux-kernel, iommu,
	freedreno, linux-arm-kernel

On Sun, Jul 26, 2020 at 11:27:03PM -0700, Bjorn Andersson wrote:
> On Mon 20 Jul 08:40 PDT 2020, Jordan Crouse wrote:
> > diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
> [..]
> > +static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
> > +		struct device *dev, int start, int count)
> > +{
> > +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> > +
> > +	/*
> > +	 * Assign context bank 0 to the GPU device so the GPU hardware can
> > +	 * switch pagetables
> > +	 */
> > +	if (qcom_adreno_smmu_is_gpu_device(dev)) {
> > +		if (start > 0 || test_bit(0, smmu->context_map))
> > +			return -ENOSPC;
> > +
> > +		set_bit(0, smmu->context_map);
> > +		return 0;
> > +	}
> > +
> > +	return __arm_smmu_alloc_bitmap(smmu->context_map, start, count);
> 
> If we end up here before the GPU device shows up this is going to
> steal the first context bank, causing the subsequent allocation for the
> GPU to always fail.
> 
> As such I think it would be appropriate for you to adjust "start" to
> never be 0 here. And I think it would be appropriate to write this
> function as:
> 
> 	if (gpu) {
> 		start = 0;
> 		count = 1;
> 	} else {
> 		if (start == 0)
> 			start = 1;
> 	}
> 
> 	return __arm_smmu_alloc_bitmap(smmu->context_map, start, count);

Excellent suggestions.  Thanks.

Jordan

> Regards,
> Bjorn

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [Freedreno] [PATCH v10 06/13] iommu/arm-smmu-qcom: Get and set the pagetable config for split pagetables
  2020-07-26 17:03   ` [Freedreno] " Rob Clark
@ 2020-07-27 15:03     ` Jordan Crouse
  2020-07-27 17:17       ` Rob Clark
  0 siblings, 1 reply; 21+ messages in thread
From: Jordan Crouse @ 2020-07-27 15:03 UTC (permalink / raw)
  To: Rob Clark
  Cc: freedreno, Will Deacon, linux-arm-msm, Linux Kernel Mailing List,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Sun, Jul 26, 2020 at 10:03:07AM -0700, Rob Clark wrote:
> On Mon, Jul 20, 2020 at 8:41 AM Jordan Crouse <jcrouse@codeaurora.org> wrote:
> >
> > The Adreno GPU has the capability to manage its own pagetables and switch
> > them dynamically from the hardware. To do this the GPU uses TTBR1 for
> > "global" GPU memory and creates local pagetables for each context and
> > switches them dynamically with the GPU.
> >
> > Use DOMAIN_ATTR_PGTABLE_CFG to get the current configuration for the
> > TTBR1 pagetable from the smmu driver so the leaf driver can create
> > compatible pagetables for use with TTBR0.
> >
> > Because TTBR0 is disabled by default when TTBR1 is enabled the GPU
> > driver can pass the configuration of one of the newly created pagetables
> > back through DOMAIN_ATTR_PGTABLE_CFG as a trigger to enable translation on
> > TTBR0.
> >
> > Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> > ---
> >
> >  drivers/iommu/arm-smmu-qcom.c | 47 +++++++++++++++++++++++++++++++++++
> >  drivers/iommu/arm-smmu.c      | 32 ++++++++++++++++++------
> >  drivers/iommu/arm-smmu.h      | 10 ++++++++
> >  3 files changed, 81 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
> > index b9a5c5369e86..9a0c64ca9cb6 100644
> > --- a/drivers/iommu/arm-smmu-qcom.c
> > +++ b/drivers/iommu/arm-smmu-qcom.c
> > @@ -34,6 +34,52 @@ static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
> >         return false;
> >  }
> >
> > +/*
> > + * Local implementation to configure TTBR0 with the specified pagetable config.
> > + * The GPU driver will call this to enable TTBR0 when per-instance pagetables
> > + * are active
> > + */
> > +static int qcom_adreno_smmu_set_pgtable_cfg(struct arm_smmu_domain *smmu_domain,
> > +               struct io_pgtable_cfg *pgtbl_cfg)
> > +{
> > +       struct io_pgtable *pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> > +       struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> > +       struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
> > +
> > +       /* The domain must have split pagetables already enabled */
> > +       if (cb->tcr[0] & ARM_SMMU_TCR_EPD1)
> > +               return -EINVAL;
> > +
> > +       /* If the pagetable config is NULL, disable TTBR0 */
> > +       if (!pgtbl_cfg) {
> > +               /* Do nothing if it is already disabled */
> > +               if ((cb->tcr[0] & ARM_SMMU_TCR_EPD0))
> > +                       return -EINVAL;
> > +
> > +               /* Set TCR to the original configuration */
> > +               cb->tcr[0] = arm_smmu_lpae_tcr(&pgtable->cfg);
> > +               cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
> > +       } else {
> > +               u32 tcr = cb->tcr[0];
> > +
> > +               /* FIXME: What sort of validation do we need to do here? */
> > +
> > +               /* Don't call this again if TTBR0 is already enabled */
> > +               if (!(cb->tcr[0] & ARM_SMMU_TCR_EPD0))
> > +                       return -EINVAL;
> > +
> > +               tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
> > +               tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
> > +
> > +               cb->tcr[0] = tcr;
> > +               cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> > +               cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
> > +       }
> > +
> > +       arm_smmu_write_context_bank(smmu_domain->smmu, cb->cfg->cbndx);
> > +       return 0;
> > +}
> > +
> >  static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
> >                 struct device *dev, int start, int count)
> >  {
> > @@ -131,6 +177,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
> >         .def_domain_type = qcom_smmu_def_domain_type,
> >         .reset = qcom_smmu500_reset,
> >         .alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
> > +       .set_pgtable_cfg = qcom_adreno_smmu_set_pgtable_cfg,
> >  };
> >
> >  static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
> > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > index fff536a44faa..e1036ae54a8d 100644
> > --- a/drivers/iommu/arm-smmu.c
> > +++ b/drivers/iommu/arm-smmu.c
> > @@ -86,13 +86,6 @@ struct arm_smmu_smr {
> >         bool                            valid;
> >  };
> >
> > -struct arm_smmu_cb {
> > -       u64                             ttbr[2];
> > -       u32                             tcr[2];
> > -       u32                             mair[2];
> > -       struct arm_smmu_cfg             *cfg;
> > -};
> > -
> >  static bool using_legacy_binding, using_generic_binding;
> >
> >  static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
> > @@ -558,7 +551,7 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
> >         }
> >  }
> >
> > -static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
> > +void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
> >  {
> >         u32 reg;
> >         bool stage1;
> > @@ -1515,6 +1508,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> >                 case DOMAIN_ATTR_NESTING:
> >                         *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
> >                         return 0;
> > +               case DOMAIN_ATTR_PGTABLE_CFG: {
> > +                       struct io_pgtable *pgtable;
> > +                       struct io_pgtable_cfg *dest = data;
> > +
> > +                       if (!smmu_domain->pgtbl_ops)
> > +                               return -ENODEV;
> > +
> > +                       pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> > +
> > +                       memcpy(dest, &pgtable->cfg, sizeof(*dest));
> > +                       return 0;
> > +               }
> 
> hmm, maybe it would make sense to have impl hooks for get/set_attr, so
> we could handle DOMAIN_ATTR_PGTABLE_CFG inside the adreno_smmu_impl?
> 
> Having impl specific domain attrs would be useful for what I have in
> mind to enable stall/resume support, so we can hook in devcoredump to
> iova faults (which would be a huge improvement for debugability, right
> now iova faults are somewhat harder to debug than needed).  My rough
> idea was to add DOMAIN_ATTR_RESUME, which could be used with
> set_attr() to (1) enable STALL and let drm/msm know whether the iommu
> supports it, and (2) resume translation from wq context after
> devcoredump snapshot is collected.

Expanding on that, maybe a DOMAIN_ATTR_IMPL with struct { int subtype; void
*data } as the payload would let us add things without having to populate the
generic enum.  That would force us to export an arm-smmu header but at this
point it might not be such a bad thing.
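
Roughly what I have in mind, as a standalone sketch. Every name here
(`impl_attr`, `impl_ops`, the subtype values) is hypothetical and only meant
to show the shape of the payload and the dispatch, not a proposed API:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subtypes; these do not exist in any kernel header. */
enum impl_attr_subtype {
	IMPL_ATTR_PGTABLE_CFG,
	IMPL_ATTR_RESUME,
};

/* Payload for the single generic attribute: a subtype plus opaque data. */
struct impl_attr {
	int subtype;
	void *data;
};

/* Per-implementation hook, analogous to an entry in an impl ops table. */
struct impl_ops {
	int (*set_attr)(void *domain, int subtype, void *data);
};

/* Generic side: unpack the payload and forward it to the impl, if any. */
static int domain_set_impl_attr(void *domain, const struct impl_ops *impl,
				const struct impl_attr *attr)
{
	if (!attr)
		return -EINVAL;
	if (!impl || !impl->set_attr)
		return -ENODEV;

	return impl->set_attr(domain, attr->subtype, attr->data);
}

/* Example implementation: record the subtype in the (fake) domain. */
static int demo_set_attr(void *domain, int subtype, void *data)
{
	int *last_subtype = domain;

	(void)data;
	switch (subtype) {
	case IMPL_ATTR_PGTABLE_CFG:
	case IMPL_ATTR_RESUME:
		*last_subtype = subtype;
		return 0;
	default:
		return -ENODEV;
	}
}
```

The generic enum would then grow only one entry, and new implementation
specific attributes would be added as subtypes in the exported header.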

Jordan


> BR,
> -R
> 
> >                 default:
> >                         return -ENODEV;
> >                 }
> > @@ -1555,6 +1560,17 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
> >                         else
> >                                 smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> >                         break;
> > +               case DOMAIN_ATTR_PGTABLE_CFG: {
> > +                       struct arm_smmu_device *smmu = smmu_domain->smmu;
> > +
> > +                       ret = -EPERM;
> > +
> > +                       if (smmu)
> > +                               if (smmu->impl && smmu->impl->set_pgtable_cfg)
> > +                                       ret = smmu->impl->set_pgtable_cfg(smmu_domain,
> > +                                               data);
> > +                       }
> > +                       break;
> >                 default:
> >                         ret = -ENODEV;
> >                 }
> > diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> > index 9f81c1fffe1e..9325fc28d24a 100644
> > --- a/drivers/iommu/arm-smmu.h
> > +++ b/drivers/iommu/arm-smmu.h
> > @@ -328,6 +328,13 @@ struct arm_smmu_cfg {
> >  };
> >  #define ARM_SMMU_INVALID_IRPTNDX       0xff
> >
> > +struct arm_smmu_cb {
> > +       u64                             ttbr[2];
> > +       u32                             tcr[2];
> > +       u32                             mair[2];
> > +       struct arm_smmu_cfg             *cfg;
> > +};
> > +
> >  enum arm_smmu_domain_stage {
> >         ARM_SMMU_DOMAIN_S1 = 0,
> >         ARM_SMMU_DOMAIN_S2,
> > @@ -408,6 +415,8 @@ struct arm_smmu_impl {
> >         int (*def_domain_type)(struct device *dev);
> >         int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain,
> >                         struct device *dev, int start, int max);
> > +       int (*set_pgtable_cfg)(struct arm_smmu_domain *smmu_domain,
> > +                       struct io_pgtable_cfg *cfg);
> >  };
> >
> >  #define INVALID_SMENDX                 -1
> > @@ -493,6 +502,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
> >  struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
> >  struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);
> >
> > +void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
> >  int arm_mmu500_reset(struct arm_smmu_device *smmu);
> >
> >  #endif /* _ARM_SMMU_H */
> > --
> > 2.25.1
> >
> > _______________________________________________
> > Freedreno mailing list
> > Freedreno@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/freedreno

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

* Re: [Freedreno] [PATCH v10 06/13] iommu/arm-smmu-qcom: Get and set the pagetable config for split pagetables
  2020-07-27 15:03     ` Jordan Crouse
@ 2020-07-27 17:17       ` Rob Clark
  0 siblings, 0 replies; 21+ messages in thread
From: Rob Clark @ 2020-07-27 17:17 UTC (permalink / raw)
  To: Rob Clark, linux-arm-msm, Sai Prakash Ranjan,
	Linux Kernel Mailing List, freedreno, Joerg Roedel, Robin Murphy,
	Bjorn Andersson,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Will Deacon,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Jul 27, 2020 at 8:03 AM Jordan Crouse <jcrouse@codeaurora.org> wrote:
>
> On Sun, Jul 26, 2020 at 10:03:07AM -0700, Rob Clark wrote:
> > On Mon, Jul 20, 2020 at 8:41 AM Jordan Crouse <jcrouse@codeaurora.org> wrote:
> > >
> > > The Adreno GPU has the capability to manage its own pagetables and switch
> > > them dynamically from the hardware. To do this the GPU uses TTBR1 for
> > > "global" GPU memory and creates local pagetables for each context and
> > > switches them dynamically with the GPU.
> > >
> > > Use DOMAIN_ATTR_PGTABLE_CFG to get the current configuration for the
> > > TTBR1 pagetable from the smmu driver so the leaf driver can create
> > > compatible pagetables for use with TTBR0.
> > >
> > > Because TTBR0 is disabled by default when TTBR1 is enabled the GPU
> > > driver can pass the configuration of one of the newly created pagetables
> > > back through DOMAIN_ATTR_PGTABLE_CFG as a trigger to enable translation on
> > > TTBR0.
> > >
> > > Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
> > > ---
> > >
> > >  drivers/iommu/arm-smmu-qcom.c | 47 +++++++++++++++++++++++++++++++++++
> > >  drivers/iommu/arm-smmu.c      | 32 ++++++++++++++++++------
> > >  drivers/iommu/arm-smmu.h      | 10 ++++++++
> > >  3 files changed, 81 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/drivers/iommu/arm-smmu-qcom.c b/drivers/iommu/arm-smmu-qcom.c
> > > index b9a5c5369e86..9a0c64ca9cb6 100644
> > > --- a/drivers/iommu/arm-smmu-qcom.c
> > > +++ b/drivers/iommu/arm-smmu-qcom.c
> > > @@ -34,6 +34,52 @@ static bool qcom_adreno_smmu_is_gpu_device(struct device *dev)
> > >         return false;
> > >  }
> > >
> > > +/*
> > > + * Local implementation to configure TTBR0 with the specified pagetable config.
> > > + * The GPU driver will call this to enable TTBR0 when per-instance pagetables
> > > + * are active
> > > + */
> > > +static int qcom_adreno_smmu_set_pgtable_cfg(struct arm_smmu_domain *smmu_domain,
> > > +               struct io_pgtable_cfg *pgtbl_cfg)
> > > +{
> > > +       struct io_pgtable *pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> > > +       struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> > > +       struct arm_smmu_cb *cb = &smmu_domain->smmu->cbs[cfg->cbndx];
> > > +
> > > +       /* The domain must have split pagetables already enabled */
> > > +       if (cb->tcr[0] & ARM_SMMU_TCR_EPD1)
> > > +               return -EINVAL;
> > > +
> > > +       /* If the pagetable config is NULL, disable TTBR0 */
> > > +       if (!pgtbl_cfg) {
> > > +               /* Do nothing if it is already disabled */
> > > +               if ((cb->tcr[0] & ARM_SMMU_TCR_EPD0))
> > > +                       return -EINVAL;
> > > +
> > > +               /* Set TCR to the original configuration */
> > > +               cb->tcr[0] = arm_smmu_lpae_tcr(&pgtable->cfg);
> > > +               cb->ttbr[0] = FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
> > > +       } else {
> > > +               u32 tcr = cb->tcr[0];
> > > +
> > > +               /* FIXME: What sort of validation do we need to do here? */
> > > +
> > > +               /* Don't call this again if TTBR0 is already enabled */
> > > +               if (!(cb->tcr[0] & ARM_SMMU_TCR_EPD0))
> > > +                       return -EINVAL;
> > > +
> > > +               tcr |= arm_smmu_lpae_tcr(pgtbl_cfg);
> > > +               tcr &= ~(ARM_SMMU_TCR_EPD0 | ARM_SMMU_TCR_EPD1);
> > > +
> > > +               cb->tcr[0] = tcr;
> > > +               cb->ttbr[0] = pgtbl_cfg->arm_lpae_s1_cfg.ttbr;
> > > +               cb->ttbr[0] |= FIELD_PREP(ARM_SMMU_TTBRn_ASID, cb->cfg->asid);
> > > +       }
> > > +
> > > +       arm_smmu_write_context_bank(smmu_domain->smmu, cb->cfg->cbndx);
> > > +       return 0;
> > > +}
> > > +
> > >  static int qcom_adreno_smmu_alloc_context_bank(struct arm_smmu_domain *smmu_domain,
> > >                 struct device *dev, int start, int count)
> > >  {
> > > @@ -131,6 +177,7 @@ static const struct arm_smmu_impl qcom_adreno_smmu_impl = {
> > >         .def_domain_type = qcom_smmu_def_domain_type,
> > >         .reset = qcom_smmu500_reset,
> > >         .alloc_context_bank = qcom_adreno_smmu_alloc_context_bank,
> > > +       .set_pgtable_cfg = qcom_adreno_smmu_set_pgtable_cfg,
> > >  };
> > >
> > >  static struct arm_smmu_device *qcom_smmu_create(struct arm_smmu_device *smmu,
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > index fff536a44faa..e1036ae54a8d 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -86,13 +86,6 @@ struct arm_smmu_smr {
> > >         bool                            valid;
> > >  };
> > >
> > > -struct arm_smmu_cb {
> > > -       u64                             ttbr[2];
> > > -       u32                             tcr[2];
> > > -       u32                             mair[2];
> > > -       struct arm_smmu_cfg             *cfg;
> > > -};
> > > -
> > >  static bool using_legacy_binding, using_generic_binding;
> > >
> > >  static inline int arm_smmu_rpm_get(struct arm_smmu_device *smmu)
> > > @@ -558,7 +551,7 @@ static void arm_smmu_init_context_bank(struct arm_smmu_domain *smmu_domain,
> > >         }
> > >  }
> > >
> > > -static void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
> > > +void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx)
> > >  {
> > >         u32 reg;
> > >         bool stage1;
> > > @@ -1515,6 +1508,18 @@ static int arm_smmu_domain_get_attr(struct iommu_domain *domain,
> > >                 case DOMAIN_ATTR_NESTING:
> > >                         *(int *)data = (smmu_domain->stage == ARM_SMMU_DOMAIN_NESTED);
> > >                         return 0;
> > > +               case DOMAIN_ATTR_PGTABLE_CFG: {
> > > +                       struct io_pgtable *pgtable;
> > > +                       struct io_pgtable_cfg *dest = data;
> > > +
> > > +                       if (!smmu_domain->pgtbl_ops)
> > > +                               return -ENODEV;
> > > +
> > > +                       pgtable = io_pgtable_ops_to_pgtable(smmu_domain->pgtbl_ops);
> > > +
> > > +                       memcpy(dest, &pgtable->cfg, sizeof(*dest));
> > > +                       return 0;
> > > +               }
> >
> > hmm, maybe it would make sense to have impl hooks for get/set_attr, so
> > we could handle DOMAIN_ATTR_PGTABLE_CFG inside the adreno_smmu_impl?
> >
> > Having impl specific domain attrs would be useful for what I have in
> > mind to enable stall/resume support, so we can hook in devcoredump to
> > iova faults (which would be a huge improvement for debugability, right
> > now iova faults are somewhat harder to debug than needed).  My rough
> > idea was to add DOMAIN_ATTR_RESUME, which could be used with
> > set_attr() to (1) enable STALL and let drm/msm know whether the iommu
> > supports it, and (2) resume translation from wq context after
> > devcoredump snapshot is collected.
>
> Expanding on that, maybe a DOMAIN_ATTR_IMPL with struct { int subtype; void
> *data } as the payload would let us add things without having to populate the
> generic enum.  That would force us to export an arm-smmu header but at this
> point it might not be such a bad thing.

That feels a bit like overkill to me, I don't expect there to be that
many custom things.  But I'll defer to the iommu folks as to which
they prefer.

BR,
-R

> Jordan
>
>
> > BR,
> > -R
> >
> > >                 default:
> > >                         return -ENODEV;
> > >                 }
> > > @@ -1555,6 +1560,17 @@ static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
> > >                         else
> > >                                 smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> > >                         break;
> > > +               case DOMAIN_ATTR_PGTABLE_CFG: {
> > > +                       struct arm_smmu_device *smmu = smmu_domain->smmu;
> > > +
> > > +                       ret = -EPERM;
> > > +
> > > +                       if (smmu)
> > > +                               if (smmu->impl && smmu->impl->set_pgtable_cfg)
> > > +                                       ret = smmu->impl->set_pgtable_cfg(smmu_domain,
> > > +                                               data);
> > > +                       }
> > > +                       break;
> > >                 default:
> > >                         ret = -ENODEV;
> > >                 }
> > > diff --git a/drivers/iommu/arm-smmu.h b/drivers/iommu/arm-smmu.h
> > > index 9f81c1fffe1e..9325fc28d24a 100644
> > > --- a/drivers/iommu/arm-smmu.h
> > > +++ b/drivers/iommu/arm-smmu.h
> > > @@ -328,6 +328,13 @@ struct arm_smmu_cfg {
> > >  };
> > >  #define ARM_SMMU_INVALID_IRPTNDX       0xff
> > >
> > > +struct arm_smmu_cb {
> > > +       u64                             ttbr[2];
> > > +       u32                             tcr[2];
> > > +       u32                             mair[2];
> > > +       struct arm_smmu_cfg             *cfg;
> > > +};
> > > +
> > >  enum arm_smmu_domain_stage {
> > >         ARM_SMMU_DOMAIN_S1 = 0,
> > >         ARM_SMMU_DOMAIN_S2,
> > > @@ -408,6 +415,8 @@ struct arm_smmu_impl {
> > >         int (*def_domain_type)(struct device *dev);
> > >         int (*alloc_context_bank)(struct arm_smmu_domain *smmu_domain,
> > >                         struct device *dev, int start, int max);
> > > +       int (*set_pgtable_cfg)(struct arm_smmu_domain *smmu_domain,
> > > +                       struct io_pgtable_cfg *cfg);
> > >  };
> > >
> > >  #define INVALID_SMENDX                 -1
> > > @@ -493,6 +502,7 @@ struct arm_smmu_device *arm_smmu_impl_init(struct arm_smmu_device *smmu);
> > >  struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu);
> > >  struct arm_smmu_device *qcom_adreno_smmu_impl_init(struct arm_smmu_device *smmu);
> > >
> > > +void arm_smmu_write_context_bank(struct arm_smmu_device *smmu, int idx);
> > >  int arm_mmu500_reset(struct arm_smmu_device *smmu);
> > >
> > >  #endif /* _ARM_SMMU_H */
> > > --
> > > 2.25.1
> > >
>
> --
> The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project


Thread overview: 21+ messages
2020-07-20 15:40 [PATCH v10 00/13] iommu/arm-smmu: Add Adreno SMMU specific implementation Jordan Crouse
2020-07-20 15:40 ` [PATCH v10 01/13] iommu/arm-smmu: Pass io-pgtable config to implementation specific function Jordan Crouse
2020-07-20 15:40 ` [PATCH v10 02/13] iommu/arm-smmu: Add support for split pagetables Jordan Crouse
2020-07-20 15:40 ` [PATCH v10 03/13] iommu/arm-smmu: Add implementation hooks to configure contexts Jordan Crouse
2020-07-20 15:40 ` [PATCH v10 04/13] iommu/arm-smmu-qcom: Add implementation for the adreno GPU SMMU Jordan Crouse
2020-07-27  6:27   ` Bjorn Andersson
2020-07-27 14:57     ` Jordan Crouse
2020-07-20 15:40 ` [PATCH v10 05/13] iommu: Add a domain attribute to get/set a pagetable configuration Jordan Crouse
2020-07-20 15:40 ` [PATCH v10 06/13] iommu/arm-smmu-qcom: Get and set the pagetable config for split pagetables Jordan Crouse
2020-07-26 17:03   ` [Freedreno] " Rob Clark
2020-07-27 15:03     ` Jordan Crouse
2020-07-27 17:17       ` Rob Clark
2020-07-20 15:40 ` [PATCH v10 07/13] dt-bindings: arm-smmu: Add compatible string for Adreno GPU SMMU Jordan Crouse
2020-07-26 16:55   ` Rob Clark
2020-07-20 15:40 ` [PATCH v10 08/13] drm/msm: Add a context pointer to the submitqueue Jordan Crouse
2020-07-26 17:09   ` Rob Clark
2020-07-20 15:40 ` [PATCH v10 09/13] drm/msm: Set the global virtual address range from the IOMMU domain Jordan Crouse
2020-07-20 15:40 ` [PATCH v10 10/13] drm/msm: Add support to create a local pagetable Jordan Crouse
2020-07-20 15:40 ` [PATCH v10 11/13] drm/msm: Add support for private address space instances Jordan Crouse
2020-07-20 15:40 ` [PATCH v10 12/13] drm/msm/a6xx: Add support for per-instance pagetables Jordan Crouse
2020-07-20 15:40 ` [PATCH v10 13/13] arm: dts: qcom: sm845: Set the compatible string for the GPU SMMU Jordan Crouse
