linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
@ 2021-01-11 14:15 Sai Prakash Ranjan
  2021-01-11 14:15 ` [PATCH 1/3] iommu/io-pgtable: Rename last-level cache quirk to IO_PGTABLE_QUIRK_PTW_LLC Sai Prakash Ranjan
                   ` (4 more replies)
  0 siblings, 5 replies; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-01-11 14:15 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, Jordan Crouse,
	Rob Clark, Akhil P Oommen, isaacm
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel, Sai Prakash Ranjan

commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
removed the unused IOMMU_SYS_CACHE_ONLY prot flag, and along with it went
the memory type setting required for non-coherent masters to use the
system cache. Now that system cache support for the GPU has been added, we
need to set the right PTE attribute for GPU buffers to be system cached.
Without this, system cache lines are not allocated for the GPU.
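
As a rough illustration of what "the right PTE attribute" means here, the
user-space sketch below mimics how a stage-1 prot-to-PTE translation could
select a MAIR attribute index. The constant values are assumed to mirror
include/linux/iommu.h and drivers/iommu/io-pgtable-arm.c around the time of
this series, and sketch_prot_to_attr() is a hypothetical helper, not a
kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Prot flags: assumed to mirror include/linux/iommu.h. IOMMU_LLC is the
 * flag proposed by this series. */
#define IOMMU_READ   (1 << 0)
#define IOMMU_WRITE  (1 << 1)
#define IOMMU_CACHE  (1 << 2)
#define IOMMU_MMIO   (1 << 4)
#define IOMMU_LLC    (1 << 6)

/* MAIR attribute indices and PTE field position: assumed values. */
#define ARM_LPAE_PTE_ATTRINDX_SHIFT        2
#define ARM_LPAE_MAIR_ATTR_IDX_NC          0
#define ARM_LPAE_MAIR_ATTR_IDX_CACHE       1
#define ARM_LPAE_MAIR_ATTR_IDX_DEV         2
#define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE  3

/* Hypothetical helper: returns only the AttrIndx bits of a stage-1 PTE. */
uint64_t sketch_prot_to_attr(int prot)
{
	uint64_t idx = ARM_LPAE_MAIR_ATTR_IDX_NC;

	if (prot & IOMMU_MMIO)
		idx = ARM_LPAE_MAIR_ATTR_IDX_DEV;
	else if (prot & IOMMU_CACHE)
		idx = ARM_LPAE_MAIR_ATTR_IDX_CACHE;
	else if (prot & IOMMU_LLC)
		idx = ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE;

	assert(idx < 8);	/* AttrIndx is a 3-bit field */
	return idx << ARM_LPAE_PTE_ATTRINDX_SHIFT;
}
```

Note the else-if ordering: in this sketch, a mapping that sets both
IOMMU_CACHE and IOMMU_LLC ends up with the IOMMU_CACHE index.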

So the patches in this series introduce a new prot flag IOMMU_LLC,
rename IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
and make the GPU the user of this protection flag.
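
To make the GPU side concrete, here is a minimal user-space sketch of the
feature-gated mapping flag, under the assumption that the driver simply ORs
IOMMU_LLC into the prot value when the MMU advertises LLC support. The
struct sketch_mmu type and sketch_map_prot() helper are hypothetical
stand-ins, not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT(n)               (1u << (n))
#define IOMMU_READ           (1 << 0)
#define IOMMU_WRITE          (1 << 1)
#define IOMMU_LLC            (1 << 6)	/* flag proposed by this series */
#define MMU_FEATURE_USE_LLC  BIT(0)	/* feature bit from patch 3 */

/* Hypothetical, trimmed-down stand-in for struct msm_mmu. */
struct sketch_mmu {
	uint32_t features;
};

/* Hypothetical helper: compute the prot value handed to the IOMMU map call. */
int sketch_map_prot(const struct sketch_mmu *mmu, int prot)
{
	assert(mmu != NULL);

	/* Only request LLC allocation when the target advertises support. */
	if (mmu->features & MMU_FEATURE_USE_LLC)
		prot |= IOMMU_LLC;

	return prot;
}
```

Keeping the decision in one feature bit means targets without an LLC slice
see no change in the prot value they pass down.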

The series depends slightly on the following two patches posted earlier
and is based on the msm-next branch:
 * https://lore.kernel.org/patchwork/patch/1363008/
 * https://lore.kernel.org/patchwork/patch/1363010/

Sai Prakash Ranjan (3):
  iommu/io-pgtable: Rename last-level cache quirk to
    IO_PGTABLE_QUIRK_PTW_LLC
  iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
 drivers/gpu/drm/msm/msm_iommu.c         | 3 +++
 drivers/gpu/drm/msm/msm_mmu.h           | 4 ++++
 drivers/iommu/io-pgtable-arm.c          | 9 ++++++---
 include/linux/io-pgtable.h              | 6 +++---
 include/linux/iommu.h                   | 6 ++++++
 7 files changed, 26 insertions(+), 7 deletions(-)


base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/3] iommu/io-pgtable: Rename last-level cache quirk to IO_PGTABLE_QUIRK_PTW_LLC
  2021-01-11 14:15 [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
@ 2021-01-11 14:15 ` Sai Prakash Ranjan
  2021-01-11 14:15 ` [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag Sai Prakash Ranjan
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-01-11 14:15 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, Jordan Crouse,
	Rob Clark, Akhil P Oommen, isaacm
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel, Sai Prakash Ranjan

Rename the last-level cache quirk IO_PGTABLE_QUIRK_ARM_OUTER_WBWA, which
sets the required TCR attributes for a non-coherent page table walker,
to IO_PGTABLE_QUIRK_PTW_LLC so that the name is more generic and in sync
with the upcoming page protection flag IOMMU_LLC.

Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
 drivers/iommu/io-pgtable-arm.c          | 6 +++---
 include/linux/io-pgtable.h              | 6 +++---
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/msm/adreno/adreno_gpu.c b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
index 0f184c3dd9d9..82b5e4969195 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_gpu.c
@@ -190,7 +190,7 @@ void adreno_set_llc_attributes(struct iommu_domain *iommu)
 {
 	struct io_pgtable_domain_attr pgtbl_cfg;
 
-	pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_ARM_OUTER_WBWA;
+	pgtbl_cfg.quirks = IO_PGTABLE_QUIRK_PTW_LLC;
 	iommu_domain_set_attr(iommu, DOMAIN_ATTR_IO_PGTABLE_CFG, &pgtbl_cfg);
 }
 
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 7c9ea9d7874a..7439ee7fdcdb 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -762,7 +762,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 	if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
 			    IO_PGTABLE_QUIRK_NON_STRICT |
 			    IO_PGTABLE_QUIRK_ARM_TTBR1 |
-			    IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
+			    IO_PGTABLE_QUIRK_PTW_LLC))
 		return NULL;
 
 	data = arm_lpae_alloc_pgtable(cfg);
@@ -774,12 +774,12 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
 		tcr->sh = ARM_LPAE_TCR_SH_IS;
 		tcr->irgn = ARM_LPAE_TCR_RGN_WBWA;
 		tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
-		if (cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
+		if (cfg->quirks & IO_PGTABLE_QUIRK_PTW_LLC)
 			goto out_free_data;
 	} else {
 		tcr->sh = ARM_LPAE_TCR_SH_OS;
 		tcr->irgn = ARM_LPAE_TCR_RGN_NC;
-		if (!(cfg->quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
+		if (!(cfg->quirks & IO_PGTABLE_QUIRK_PTW_LLC))
 			tcr->orgn = ARM_LPAE_TCR_RGN_NC;
 		else
 			tcr->orgn = ARM_LPAE_TCR_RGN_WBWA;
diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index fb4d5a763e0c..6f996a817441 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -87,8 +87,8 @@ struct io_pgtable_cfg {
 	 * IO_PGTABLE_QUIRK_ARM_TTBR1: (ARM LPAE format) Configure the table
 	 *	for use in the upper half of a split address space.
 	 *
-	 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
-	 *	attributes set in the TCR for a non-coherent page-table walker.
+	 * IO_PGTABLE_QUIRK_PTW_LLC: Override the outer-cacheability attributes
+	 *	set in the TCR for a non-coherent page-table walker.
 	 */
 	#define IO_PGTABLE_QUIRK_ARM_NS		BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS	BIT(1)
@@ -96,7 +96,7 @@ struct io_pgtable_cfg {
 	#define IO_PGTABLE_QUIRK_ARM_MTK_EXT	BIT(3)
 	#define IO_PGTABLE_QUIRK_NON_STRICT	BIT(4)
 	#define IO_PGTABLE_QUIRK_ARM_TTBR1	BIT(5)
-	#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA	BIT(6)
+	#define IO_PGTABLE_QUIRK_PTW_LLC	BIT(6)
 	unsigned long			quirks;
 	unsigned long			pgsize_bitmap;
 	unsigned int			ias;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



* [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-01-11 14:15 [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
  2021-01-11 14:15 ` [PATCH 1/3] iommu/io-pgtable: Rename last-level cache quirk to IO_PGTABLE_QUIRK_PTW_LLC Sai Prakash Ranjan
@ 2021-01-11 14:15 ` Sai Prakash Ranjan
       [not found]   ` <20210129090516.GB3998@willie-the-truck>
  2021-01-11 14:15 ` [PATCH 3/3] drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers Sai Prakash Ranjan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-01-11 14:15 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, Jordan Crouse,
	Rob Clark, Akhil P Oommen, isaacm
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel, Sai Prakash Ranjan

Add a new page protection flag IOMMU_LLC which can be used
by non-coherent masters to set cacheable memory attributes
for an outer level of cache known as the last-level cache or
system cache. The initial user of this page protection flag
is the adreno GPU; it can later be used by other clients such
as video, where it can be used for per-buffer mappings.

Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/iommu/io-pgtable-arm.c | 3 +++
 include/linux/iommu.h          | 6 ++++++
 2 files changed, 9 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 7439ee7fdcdb..ebe653ef601b 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 		else if (prot & IOMMU_CACHE)
 			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+		else if (prot & IOMMU_LLC)
+			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
+				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 	}
 
 	if (prot & IOMMU_CACHE)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ffaa389ea128..1f82057df531 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -31,6 +31,12 @@
  * if the IOMMU page table format is equivalent.
  */
 #define IOMMU_PRIV	(1 << 5)
+/*
+ * Non-coherent masters can use this page protection flag to set cacheable
+ * memory attributes for only a transparent outer level of cache, also known as
+ * the last-level or system cache.
+ */
+#define IOMMU_LLC	(1 << 6)
 
 struct iommu_ops;
 struct iommu_group;
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



* [PATCH 3/3] drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers
  2021-01-11 14:15 [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
  2021-01-11 14:15 ` [PATCH 1/3] iommu/io-pgtable: Rename last-level cache quirk to IO_PGTABLE_QUIRK_PTW_LLC Sai Prakash Ranjan
  2021-01-11 14:15 ` [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag Sai Prakash Ranjan
@ 2021-01-11 14:15 ` Sai Prakash Ranjan
  2021-01-20  5:18 ` [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
  2021-07-28 14:00 ` Georgi Djakov
  4 siblings, 0 replies; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-01-11 14:15 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, Jordan Crouse,
	Rob Clark, Akhil P Oommen, isaacm
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel, Sai Prakash Ranjan

Use the newly introduced IOMMU_LLC page protection flag to map
GPU buffers. This makes sure that the proper stage-1 PTE
attributes are set for GPU buffers to use the system cache. This
also introduces the MMU_FEATURE_USE_LLC feature bit, which marks
GPUs that support the LLC. It is set during target-specific address
space creation; in this series it is set for A6XX GPUs.

Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 3 +++
 drivers/gpu/drm/msm/msm_iommu.c       | 3 +++
 drivers/gpu/drm/msm/msm_mmu.h         | 4 ++++
 3 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 3c7ad51732bb..23da21b6f0ff 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -1266,6 +1266,9 @@ a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
 		return ERR_CAST(mmu);
 	}
 
+	if (!IS_ERR_OR_NULL(a6xx_gpu->llc_slice))
+		mmu->features |= MMU_FEATURE_USE_LLC;
+
 	/*
 	 * Use the aperture start or SZ_16M, whichever is greater. This will
 	 * ensure that we align with the allocated pagetable range while still
diff --git a/drivers/gpu/drm/msm/msm_iommu.c b/drivers/gpu/drm/msm/msm_iommu.c
index 22ac7c692a81..a329f9836422 100644
--- a/drivers/gpu/drm/msm/msm_iommu.c
+++ b/drivers/gpu/drm/msm/msm_iommu.c
@@ -235,6 +235,9 @@ static int msm_iommu_map(struct msm_mmu *mmu, uint64_t iova,
 	if (iova & BIT_ULL(48))
 		iova |= GENMASK_ULL(63, 49);
 
+	if (mmu->features & MMU_FEATURE_USE_LLC)
+		prot |= IOMMU_LLC;
+
 	ret = iommu_map_sgtable(iommu->domain, iova, sgt, prot);
 	WARN_ON(!ret);
 
diff --git a/drivers/gpu/drm/msm/msm_mmu.h b/drivers/gpu/drm/msm/msm_mmu.h
index 61ade89d9e48..efcd1939c98e 100644
--- a/drivers/gpu/drm/msm/msm_mmu.h
+++ b/drivers/gpu/drm/msm/msm_mmu.h
@@ -23,12 +23,16 @@ enum msm_mmu_type {
 	MSM_MMU_IOMMU_PAGETABLE,
 };
 
+/* MMU features */
+#define MMU_FEATURE_USE_LLC	BIT(0)
+
 struct msm_mmu {
 	const struct msm_mmu_funcs *funcs;
 	struct device *dev;
 	int (*handler)(void *arg, unsigned long iova, int flags);
 	void *arg;
 	enum msm_mmu_type type;
+	u32 features;
 };
 
 static inline void msm_mmu_init(struct msm_mmu *mmu, struct device *dev,
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



* Re: [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-01-11 14:15 [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
                   ` (2 preceding siblings ...)
  2021-01-11 14:15 ` [PATCH 3/3] drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers Sai Prakash Ranjan
@ 2021-01-20  5:18 ` Sai Prakash Ranjan
  2021-07-28 14:00 ` Georgi Djakov
  4 siblings, 0 replies; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-01-20  5:18 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel, Jordan Crouse,
	Rob Clark, Akhil P Oommen, isaacm
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel

On 2021-01-11 19:45, Sai Prakash Ranjan wrote:
> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> the memory type setting required for the non-coherent masters to use
> system cache. Now that system cache support for GPU is added, we will
> need to set the right PTE attribute for GPU buffers to be sys cached.
> Without this, the system cache lines are not allocated for GPU.
> 
> So the patches in this series introduces a new prot flag IOMMU_LLC,
> renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> and makes GPU the user of this protection flag.
> 
> The series slightly depends on following 2 patches posted earlier and
> is based on msm-next branch:
>  * https://lore.kernel.org/patchwork/patch/1363008/
>  * https://lore.kernel.org/patchwork/patch/1363010/
> 
> Sai Prakash Ranjan (3):
>   iommu/io-pgtable: Rename last-level cache quirk to
>     IO_PGTABLE_QUIRK_PTW_LLC
>   iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
>   drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers
> 
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
>  drivers/gpu/drm/msm/msm_iommu.c         | 3 +++
>  drivers/gpu/drm/msm/msm_mmu.h           | 4 ++++
>  drivers/iommu/io-pgtable-arm.c          | 9 ++++++---
>  include/linux/io-pgtable.h              | 6 +++---
>  include/linux/iommu.h                   | 6 ++++++
>  7 files changed, 26 insertions(+), 7 deletions(-)
> 
> 
> base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709


Gentle Ping!

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
       [not found]     ` <5d23fce629323bcda71594010824aad0@codeaurora.org>
@ 2021-02-01 11:15       ` Will Deacon
  2021-02-01 16:20         ` Rob Clark
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2021-02-01 11:15 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Robin Murphy, Joerg Roedel, Jordan Crouse, Rob Clark,
	Akhil P Oommen, isaacm, iommu, linux-arm-kernel, linux-kernel,
	linux-arm-msm, freedreno, Kristian H Kristensen, Sean Paul,
	David Airlie, Daniel Vetter, dri-devel

On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> On 2021-01-29 14:35, Will Deacon wrote:
> > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > Add a new page protection flag IOMMU_LLC which can be used
> > > by non-coherent masters to set cacheable memory attributes
> > > for an outer level of cache called as last-level cache or
> > > system cache. Initial user of this page protection flag is
> > > the adreno gpu and then can later be used by other clients
> > > such as video where this can be used for per-buffer based
> > > mapping.
> > > 
> > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> > > ---
> > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > >  include/linux/iommu.h          | 6 ++++++
> > >  2 files changed, 9 insertions(+)
> > > 
> > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > > index 7439ee7fdcdb..ebe653ef601b 100644
> > > --- a/drivers/iommu/io-pgtable-arm.c
> > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > >  		else if (prot & IOMMU_CACHE)
> > >  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > >  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > +		else if (prot & IOMMU_LLC)
> > > +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > >  	}
> > > 
> > >  	if (prot & IOMMU_CACHE)
> > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > index ffaa389ea128..1f82057df531 100644
> > > --- a/include/linux/iommu.h
> > > +++ b/include/linux/iommu.h
> > > @@ -31,6 +31,12 @@
> > >   * if the IOMMU page table format is equivalent.
> > >   */
> > >  #define IOMMU_PRIV	(1 << 5)
> > > +/*
> > > + * Non-coherent masters can use this page protection flag to set cacheable
> > > + * memory attributes for only a transparent outer level of cache, also known as
> > > + * the last-level or system cache.
> > > + */
> > > +#define IOMMU_LLC	(1 << 6)
> > 
> > On reflection, I'm a bit worried about exposing this because I think it will
> > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
> > set up for this memory type). Now, we also have that issue for the PTW, but
> > since we always use cache maintenance (i.e. the streaming API) for
> > publishing the page-tables to a non-coherent walker, it works out. However,
> > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > allocation, then they're potentially in for a nasty surprise due to the
> > mismatched outer-cacheability attributes.
> > 
> 
> Can't we add the syscached memory type similar to what is done on android?

Maybe. How does the GPU driver map these things on the CPU side?

Will


* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-02-01 11:15       ` Will Deacon
@ 2021-02-01 16:20         ` Rob Clark
  2021-02-01 18:20           ` Jordan Crouse
  0 siblings, 1 reply; 36+ messages in thread
From: Rob Clark @ 2021-02-01 16:20 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, Robin Murphy, Joerg Roedel, Jordan Crouse,
	Akhil P Oommen, Isaac J. Manjarres,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Linux Kernel Mailing List, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel

On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
>
> On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > On 2021-01-29 14:35, Will Deacon wrote:
> > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > Add a new page protection flag IOMMU_LLC which can be used
> > > > by non-coherent masters to set cacheable memory attributes
> > > > for an outer level of cache called as last-level cache or
> > > > system cache. Initial user of this page protection flag is
> > > > the adreno gpu and then can later be used by other clients
> > > > such as video where this can be used for per-buffer based
> > > > mapping.
> > > >
> > > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> > > > ---
> > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > > >  include/linux/iommu.h          | 6 ++++++
> > > >  2 files changed, 9 insertions(+)
> > > >
> > > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > > > index 7439ee7fdcdb..ebe653ef601b 100644
> > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > > >           else if (prot & IOMMU_CACHE)
> > > >                   pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > > >                           << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > +         else if (prot & IOMMU_LLC)
> > > > +                 pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > > +                         << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > >   }
> > > >
> > > >   if (prot & IOMMU_CACHE)
> > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > > index ffaa389ea128..1f82057df531 100644
> > > > --- a/include/linux/iommu.h
> > > > +++ b/include/linux/iommu.h
> > > > @@ -31,6 +31,12 @@
> > > >   * if the IOMMU page table format is equivalent.
> > > >   */
> > > >  #define IOMMU_PRIV       (1 << 5)
> > > > +/*
> > > > + * Non-coherent masters can use this page protection flag to set cacheable
> > > > + * memory attributes for only a transparent outer level of cache, also known as
> > > > + * the last-level or system cache.
> > > > + */
> > > > +#define IOMMU_LLC        (1 << 6)
> > >
> > > On reflection, I'm a bit worried about exposing this because I think it will
> > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
> > > set up for this memory type). Now, we also have that issue for the PTW, but
> > > since we always use cache maintenance (i.e. the streaming API) for
> > > publishing the page-tables to a non-coherent walker, it works out. However,
> > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > allocation, then they're potentially in for a nasty surprise due to the
> > > mismatched outer-cacheability attributes.
> > >
> >
> > Can't we add the syscached memory type similar to what is done on android?
>
> Maybe. How does the GPU driver map these things on the CPU side?

Currently we use writecombine mappings for everything, although there
are some cases that we'd like to use cached (but have not merged
patches that would give userspace a way to flush/invalidate)

BR,
-R


* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-02-01 16:20         ` Rob Clark
@ 2021-02-01 18:20           ` Jordan Crouse
  2021-02-02  6:26             ` Sai Prakash Ranjan
  2021-02-02  6:28             ` Sai Prakash Ranjan
  0 siblings, 2 replies; 36+ messages in thread
From: Jordan Crouse @ 2021-02-01 18:20 UTC (permalink / raw)
  To: Rob Clark
  Cc: Will Deacon, Sai Prakash Ranjan, Robin Murphy, Joerg Roedel,
	Akhil P Oommen, Isaac J. Manjarres,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Linux Kernel Mailing List, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel

On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > > Add a new page protection flag IOMMU_LLC which can be used
> > > > > by non-coherent masters to set cacheable memory attributes
> > > > > for an outer level of cache called as last-level cache or
> > > > > system cache. Initial user of this page protection flag is
> > > > > the adreno gpu and then can later be used by other clients
> > > > > such as video where this can be used for per-buffer based
> > > > > mapping.
> > > > >
> > > > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> > > > > ---
> > > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > > > >  include/linux/iommu.h          | 6 ++++++
> > > > >  2 files changed, 9 insertions(+)
> > > > >
> > > > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> > > > > index 7439ee7fdcdb..ebe653ef601b 100644
> > > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > > > >           else if (prot & IOMMU_CACHE)
> > > > >                   pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > > > >                           << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > > +         else if (prot & IOMMU_LLC)
> > > > > +                 pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > > > +                         << ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > >   }
> > > > >
> > > > >   if (prot & IOMMU_CACHE)
> > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> > > > > index ffaa389ea128..1f82057df531 100644
> > > > > --- a/include/linux/iommu.h
> > > > > +++ b/include/linux/iommu.h
> > > > > @@ -31,6 +31,12 @@
> > > > >   * if the IOMMU page table format is equivalent.
> > > > >   */
> > > > >  #define IOMMU_PRIV       (1 << 5)
> > > > > +/*
> > > > > + * Non-coherent masters can use this page protection flag to set cacheable
> > > > > + * memory attributes for only a transparent outer level of cache, also known as
> > > > > + * the last-level or system cache.
> > > > > + */
> > > > > +#define IOMMU_LLC        (1 << 6)
> > > >
> > > > On reflection, I'm a bit worried about exposing this because I think it will
> > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
> > > > set up for this memory type). Now, we also have that issue for the PTW, but
> > > > since we always use cache maintenance (i.e. the streaming API) for
> > > > publishing the page-tables to a non-coherent walker, it works out. However,
> > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > > allocation, then they're potentially in for a nasty surprise due to the
> > > > mismatched outer-cacheability attributes.
> > > >
> > >
> > > Can't we add the syscached memory type similar to what is done on android?
> >
> > Maybe. How does the GPU driver map these things on the CPU side?
> 
> Currently we use writecombine mappings for everything, although there
> are some cases that we'd like to use cached (but have not merged
> patches that would give userspace a way to flush/invalidate)
> 
> BR,
> -R

LLC/system cache doesn't have a relationship with the CPU cache.  It's just a
little accelerator that sits on the connection from the GPU to DDR and caches
accesses. The hint that Sai is suggesting is used to mark the buffers as
'no-write-allocate' to prevent GPU write operations from being cached in the LLC
which a) isn't interesting and b) takes up cache space for read operations.

It's easiest to think of the LLC as a bonus accelerator that has no cost for
us to use outside of the unfortunate per buffer hint.

We do have to worry about the CPU cache w.r.t. I/O coherency (which is a
different hint) and in that case we have all of the concerns that Will identified.

Jordan
-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-02-01 18:20           ` Jordan Crouse
@ 2021-02-02  6:26             ` Sai Prakash Ranjan
  2021-02-03 21:46               ` Will Deacon
  2021-02-02  6:28             ` Sai Prakash Ranjan
  1 sibling, 1 reply; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-02-02  6:26 UTC (permalink / raw)
  To: Rob Clark
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Akhil P Oommen,
	Isaac J. Manjarres, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,,
	linux-arm-kernel, Linux Kernel Mailing List, linux-arm-msm,
	freedreno, Kristian H Kristensen, Sean Paul, David Airlie,
	Daniel Vetter, dri-devel

On 2021-02-01 23:50, Jordan Crouse wrote:
> On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
>> On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
>> >
>> > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
>> > > On 2021-01-29 14:35, Will Deacon wrote:
>> > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
>> > > > > Add a new page protection flag IOMMU_LLC which can be used
>> > > > > by non-coherent masters to set cacheable memory attributes
>> > > > > for an outer level of cache called as last-level cache or
>> > > > > system cache. Initial user of this page protection flag is
>> > > > > the adreno gpu and then can later be used by other clients
>> > > > > such as video where this can be used for per-buffer based
>> > > > > mapping.
>> > > > >
>> > > > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>> > > > > ---
>> > > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
>> > > > >  include/linux/iommu.h          | 6 ++++++
>> > > > >  2 files changed, 9 insertions(+)
>> > > > >
>> > > > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> > > > > index 7439ee7fdcdb..ebe653ef601b 100644
>> > > > > --- a/drivers/iommu/io-pgtable-arm.c
>> > > > > +++ b/drivers/iommu/io-pgtable-arm.c
>> > > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>> > > > >           else if (prot & IOMMU_CACHE)
>> > > > >                   pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>> > > > >                           << ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> > > > > +         else if (prot & IOMMU_LLC)
>> > > > > +                 pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
>> > > > > +                         << ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> > > > >   }
>> > > > >
>> > > > >   if (prot & IOMMU_CACHE)
>> > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> > > > > index ffaa389ea128..1f82057df531 100644
>> > > > > --- a/include/linux/iommu.h
>> > > > > +++ b/include/linux/iommu.h
>> > > > > @@ -31,6 +31,12 @@
>> > > > >   * if the IOMMU page table format is equivalent.
>> > > > >   */
>> > > > >  #define IOMMU_PRIV       (1 << 5)
>> > > > > +/*
>> > > > > + * Non-coherent masters can use this page protection flag to set cacheable
>> > > > > + * memory attributes for only a transparent outer level of cache, also known as
>> > > > > + * the last-level or system cache.
>> > > > > + */
>> > > > > +#define IOMMU_LLC        (1 << 6)
>> > > >
>> > > > On reflection, I'm a bit worried about exposing this because I think it will
>> > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
>> > > > set up for this memory type). Now, we also have that issue for the PTW, but
>> > > > since we always use cache maintenance (i.e. the streaming API) for
>> > > > publishing the page-tables to a non-coherent walker, it works out. However,
>> > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
>> > > > allocation, then they're potentially in for a nasty surprise due to the
>> > > > mismatched outer-cacheability attributes.
>> > > >
>> > >
>> > > Can't we add the syscached memory type similar to what is done on android?
>> >
>> > Maybe. How does the GPU driver map these things on the CPU side?
>> 
>> Currently we use writecombine mappings for everything, although there
>> are some cases that we'd like to use cached (but have not merged
>> patches that would give userspace a way to flush/invalidate)
>> 
>> BR,
>> -R
> 
> LLC/system cache doesn't have a relationship with the CPU cache.  It's just a
> little accelerator that sits on the connection from the GPU to DDR and caches
> accesses. The hint that Sai is suggesting is used to mark the buffers as
> 'no-write-allocate' to prevent GPU write operations from being cached in the LLC,
> which a) isn't interesting and b) takes up cache space for read operations.
> 
> It's easiest to think of the LLC as a bonus accelerator that has no cost for
> us to use outside of the unfortunate per-buffer hint.
> 
> We do have to worry about the CPU cache w.r.t. I/O coherency (which is a
> different hint) and in that case we have all of the concerns that Will identified.
> 

For the mismatched outer-cacheability attributes that Will mentioned, I was
referring to [1] in the Android kernel.

[1] https://android-review.googlesource.com/c/kernel/common/+/1549097/3
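
For anyone following along, here is a minimal userspace sketch of the
attribute-index selection in the hunk quoted above. It is not the kernel
function itself: sketch_stage1_attrindx() and sketch_selfcheck() are
hypothetical names, only the stage-1 branch is modelled, and the
ARM_LPAE_* and IOMMU_CACHE/IOMMU_MMIO values are assumed to match
io-pgtable-arm.c and include/linux/iommu.h, so please check them against
the tree. What it shows is that, because the new test is a trailing
"else if", IOMMU_LLC only takes effect when neither IOMMU_MMIO nor
IOMMU_CACHE is set.

```c
#include <assert.h>
#include <stdint.h>

/* Prot flags: IOMMU_LLC is the value proposed in this patch; the other
 * two are assumed to match include/linux/iommu.h. */
#define IOMMU_CACHE	(1 << 2)
#define IOMMU_MMIO	(1 << 4)
#define IOMMU_LLC	(1 << 6)

/* Assumed to mirror io-pgtable-arm.c; verify before relying on them. */
#define ARM_LPAE_PTE_ATTRINDX_SHIFT		2
#define ARM_LPAE_MAIR_ATTR_IDX_NC		0
#define ARM_LPAE_MAIR_ATTR_IDX_CACHE		1
#define ARM_LPAE_MAIR_ATTR_IDX_DEV		2
#define ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE	3

/* Hypothetical helper: AttrIndx bits a stage-1 PTE would get for prot. */
uint64_t sketch_stage1_attrindx(int prot)
{
	uint64_t idx = ARM_LPAE_MAIR_ATTR_IDX_NC;

	if (prot & IOMMU_MMIO)
		idx = ARM_LPAE_MAIR_ATTR_IDX_DEV;
	else if (prot & IOMMU_CACHE)
		idx = ARM_LPAE_MAIR_ATTR_IDX_CACHE;
	else if (prot & IOMMU_LLC)	/* the branch added by this patch */
		idx = ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE;

	return idx << ARM_LPAE_PTE_ATTRINDX_SHIFT;
}

/* Hypothetical self-check: IOMMU_CACHE and IOMMU_MMIO win over IOMMU_LLC. */
int sketch_selfcheck(void)
{
	assert(sketch_stage1_attrindx(IOMMU_LLC) ==
	       ((uint64_t)ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE <<
		ARM_LPAE_PTE_ATTRINDX_SHIFT));
	assert(sketch_stage1_attrindx(IOMMU_CACHE | IOMMU_LLC) ==
	       ((uint64_t)ARM_LPAE_MAIR_ATTR_IDX_CACHE <<
		ARM_LPAE_PTE_ATTRINDX_SHIFT));
	assert(sketch_stage1_attrindx(IOMMU_MMIO | IOMMU_LLC) ==
	       ((uint64_t)ARM_LPAE_MAIR_ATTR_IDX_DEV <<
		ARM_LPAE_PTE_ATTRINDX_SHIFT));
	return 0;
}
```

Calling sketch_selfcheck() from a small main() is enough to exercise it.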

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-02-01 18:20           ` Jordan Crouse
  2021-02-02  6:26             ` Sai Prakash Ranjan
@ 2021-02-02  6:28             ` Sai Prakash Ranjan
  1 sibling, 0 replies; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-02-02  6:28 UTC (permalink / raw)
  To: Rob Clark, Jordan Crouse, Will Deacon
  Cc: Robin Murphy, Joerg Roedel, Akhil P Oommen, Isaac J. Manjarres,
	list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Linux Kernel Mailing List, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel

On 2021-02-01 23:50, Jordan Crouse wrote:
> On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
>> On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
>> >
>> > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
>> > > On 2021-01-29 14:35, Will Deacon wrote:
>> > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
>> > > > > Add a new page protection flag IOMMU_LLC which can be used
>> > > > > by non-coherent masters to set cacheable memory attributes
>> > > > > for an outer level of cache called as last-level cache or
>> > > > > system cache. Initial user of this page protection flag is
>> > > > > the adreno gpu and then can later be used by other clients
>> > > > > such as video where this can be used for per-buffer based
>> > > > > mapping.
>> > > > >
>> > > > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>> > > > > ---
>> > > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
>> > > > >  include/linux/iommu.h          | 6 ++++++
>> > > > >  2 files changed, 9 insertions(+)
>> > > > >
>> > > > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> > > > > index 7439ee7fdcdb..ebe653ef601b 100644
>> > > > > --- a/drivers/iommu/io-pgtable-arm.c
>> > > > > +++ b/drivers/iommu/io-pgtable-arm.c
>> > > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>> > > > >           else if (prot & IOMMU_CACHE)
>> > > > >                   pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>> > > > >                           << ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> > > > > +         else if (prot & IOMMU_LLC)
>> > > > > +                 pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
>> > > > > +                         << ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> > > > >   }
>> > > > >
>> > > > >   if (prot & IOMMU_CACHE)
>> > > > > diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> > > > > index ffaa389ea128..1f82057df531 100644
>> > > > > --- a/include/linux/iommu.h
>> > > > > +++ b/include/linux/iommu.h
>> > > > > @@ -31,6 +31,12 @@
>> > > > >   * if the IOMMU page table format is equivalent.
>> > > > >   */
>> > > > >  #define IOMMU_PRIV       (1 << 5)
>> > > > > +/*
>> > > > > + * Non-coherent masters can use this page protection flag to set cacheable
>> > > > > + * memory attributes for only a transparent outer level of cache, also known as
>> > > > > + * the last-level or system cache.
>> > > > > + */
>> > > > > +#define IOMMU_LLC        (1 << 6)
>> > > >
>> > > > On reflection, I'm a bit worried about exposing this because I think it will
>> > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
>> > > > set up for this memory type). Now, we also have that issue for the PTW, but
>> > > > since we always use cache maintenance (i.e. the streaming API) for
>> > > > publishing the page-tables to a non-coherent walker, it works out. However,
>> > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
>> > > > allocation, then they're potentially in for a nasty surprise due to the
>> > > > mismatched outer-cacheability attributes.
>> > > >
>> > >
>> > > Can't we add the syscached memory type similar to what is done on android?
>> >
>> > Maybe. How does the GPU driver map these things on the CPU side?
>> 
>> Currently we use writecombine mappings for everything, although there
>> are some cases that we'd like to use cached (but have not merged
>> patches that would give userspace a way to flush/invalidate)
>> 
>> BR,
>> -R
> 
> LLC/system cache doesn't have a relationship with the CPU cache.  It's just a
> little accelerator that sits on the connection from the GPU to DDR and caches
> accesses. The hint that Sai is suggesting is used to mark the buffers as
> 'no-write-allocate' to prevent GPU write operations from being cached in the LLC,
> which a) isn't interesting and b) takes up cache space for read operations.
> 
> It's easiest to think of the LLC as a bonus accelerator that has no cost for
> us to use outside of the unfortunate per-buffer hint.
> 
> We do have to worry about the CPU cache w.r.t. I/O coherency (which is a
> different hint) and in that case we have all of the concerns that Will identified.
> 

For the mismatched outer-cacheability attributes that Will mentioned, I was
referring to [1] in the Android kernel.

[1] https://android-review.googlesource.com/c/kernel/common/+/1549097/3

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-02-02  6:26             ` Sai Prakash Ranjan
@ 2021-02-03 21:46               ` Will Deacon
  2021-02-03 22:14                 ` Rob Clark
  2021-02-05 12:08                 ` Sai Prakash Ranjan
  0 siblings, 2 replies; 36+ messages in thread
From: Will Deacon @ 2021-02-03 21:46 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Akhil P Oommen,
	Isaac J. Manjarres, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,,
	linux-arm-kernel, Linux Kernel Mailing List, linux-arm-msm,
	freedreno, Kristian H Kristensen, Sean Paul, David Airlie,
	Daniel Vetter, dri-devel

On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
> On 2021-02-01 23:50, Jordan Crouse wrote:
> > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
> > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > > > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > +#define IOMMU_LLC        (1 << 6)
> > > > > >
> > > > > > > On reflection, I'm a bit worried about exposing this because I think it will
> > > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
> > > > > > > set up for this memory type). Now, we also have that issue for the PTW, but
> > > > > > > since we always use cache maintenance (i.e. the streaming API) for
> > > > > > > publishing the page-tables to a non-coherent walker, it works out. However,
> > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > > > > > allocation, then they're potentially in for a nasty surprise due to the
> > > > > > > mismatched outer-cacheability attributes.
> > > > > >
> > > > >
> > > > > Can't we add the syscached memory type similar to what is done on android?
> > > >
> > > > Maybe. How does the GPU driver map these things on the CPU side?
> > > 
> > > Currently we use writecombine mappings for everything, although there
> > > are some cases that we'd like to use cached (but have not merged
> > > patches that would give userspace a way to flush/invalidate)
> > > 
> > 
> > > LLC/system cache doesn't have a relationship with the CPU cache.  It's just a
> > > little accelerator that sits on the connection from the GPU to DDR and caches
> > > accesses. The hint that Sai is suggesting is used to mark the buffers as
> > > 'no-write-allocate' to prevent GPU write operations from being cached in the LLC,
> > > which a) isn't interesting and b) takes up cache space for read operations.
> > > 
> > > It's easiest to think of the LLC as a bonus accelerator that has no cost for
> > > us to use outside of the unfortunate per-buffer hint.
> > > 
> > > We do have to worry about the CPU cache w.r.t. I/O coherency (which is a
> > > different hint) and in that case we have all of the concerns that Will identified.
> > 
> 
> For mismatched outer cacheability attributes which Will mentioned, I was
> referring to [1] in android kernel.

I've lost track of the conversation here :/

When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
into the CPU and with what attributes? Rob said "writecombine for
everything" -- does that mean ioremap_wc() / MEMREMAP_WC?

Finally, we need to be careful when we use the word "hint" as "allocation
hint" has a specific meaning in the architecture, and if we only mismatch on
those then we're actually ok. But I think IOMMU_LLC is more than just a
hint, since it actually drives eviction policy (i.e. it enables writeback).
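
To make the distinction concrete, here is a rough userspace sketch that
splits a MAIR attribute byte into cacheability policy and allocation
hints, following the Armv8 encoding for non-transient Normal memory
(0b11RW write-back, 0b10RW write-through, 0b0100 non-cacheable, with the
outer field in bits [7:4] and the inner field in bits [3:0]). The helper
names are made up for illustration. Of the byte values used, 0xf4 is what
I understand ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE to select and 0x44 is
Normal non-cacheable as used for writecombine on arm64; 0xe4 is a
hypothetical "outer write-back, no-write-allocate" encoding. Treat all
three as assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum policy { POLICY_NC, POLICY_WT, POLICY_WB, POLICY_OTHER };

/* Classify one 4-bit Normal-memory field of a MAIR attribute byte. */
enum policy nibble_policy(uint8_t nib)
{
	if (nib == 0x4)
		return POLICY_NC;	/* 0b0100: Non-cacheable */
	if ((nib & 0xc) == 0xc)
		return POLICY_WB;	/* 0b11RW: Write-Back, non-transient */
	if ((nib & 0xc) == 0x8)
		return POLICY_WT;	/* 0b10RW: Write-Through, non-transient */
	return POLICY_OTHER;		/* Device or transient: not modelled */
}

/*
 * True if two attribute bytes agree on inner and outer cacheability
 * policy, i.e. any difference is confined to the R/W allocation hints.
 * Encodings this sketch does not model are never reported as matching.
 */
bool same_cacheability(uint8_t a, uint8_t b)
{
	enum policy ao = nibble_policy(a >> 4), ai = nibble_policy(a & 0xf);
	enum policy bo = nibble_policy(b >> 4), bi = nibble_policy(b & 0xf);

	if (ao == POLICY_OTHER || ai == POLICY_OTHER ||
	    bo == POLICY_OTHER || bi == POLICY_OTHER)
		return false;
	return ao == bo && ai == bi;
}

/* Hypothetical self-check using the assumed byte values from the lead-in. */
int mair_selfcheck(void)
{
	/* Outer WB RWA vs outer WB read-allocate only: hints differ, ok. */
	assert(same_cacheability(0xf4, 0xe4));
	/* Normal-NC (CPU writecombine) vs outer WB: cacheability mismatch. */
	assert(!same_cacheability(0x44, 0xf4));
	return 0;
}
```

With those assumed values, a CPU mapping at 0x44 and a device mapping at
0xf4 differ in outer cacheability rather than only in allocation hints,
which is the case I am worried about.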

Sorry for the pedantry, but I just want to make sure we're all talking
about the same things!

Cheers,

Will

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-02-03 21:46               ` Will Deacon
@ 2021-02-03 22:14                 ` Rob Clark
  2021-03-16 17:04                   ` Rob Clark
  2021-02-05 12:08                 ` Sai Prakash Ranjan
  1 sibling, 1 reply; 36+ messages in thread
From: Rob Clark @ 2021-02-03 22:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, Robin Murphy, Joerg Roedel, Akhil P Oommen,
	Isaac J. Manjarres, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Linux Kernel Mailing List, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel

On Wed, Feb 3, 2021 at 1:46 PM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
> > On 2021-02-01 23:50, Jordan Crouse wrote:
> > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
> > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > > > > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > +#define IOMMU_LLC        (1 << 6)
> > > > > > >
> > > > > > > > On reflection, I'm a bit worried about exposing this because I think it will
> > > > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
> > > > > > > > set up for this memory type). Now, we also have that issue for the PTW, but
> > > > > > > > since we always use cache maintenance (i.e. the streaming API) for
> > > > > > > > publishing the page-tables to a non-coherent walker, it works out. However,
> > > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > > > > > > allocation, then they're potentially in for a nasty surprise due to the
> > > > > > > > mismatched outer-cacheability attributes.
> > > > > > >
> > > > > >
> > > > > > Can't we add the syscached memory type similar to what is done on android?
> > > > >
> > > > > Maybe. How does the GPU driver map these things on the CPU side?
> > > >
> > > > Currently we use writecombine mappings for everything, although there
> > > > are some cases that we'd like to use cached (but have not merged
> > > > patches that would give userspace a way to flush/invalidate)
> > > >
> > >
> > > > LLC/system cache doesn't have a relationship with the CPU cache.  It's just a
> > > > little accelerator that sits on the connection from the GPU to DDR and caches
> > > > accesses. The hint that Sai is suggesting is used to mark the buffers as
> > > > 'no-write-allocate' to prevent GPU write operations from being cached in the LLC,
> > > > which a) isn't interesting and b) takes up cache space for read operations.
> > > >
> > > > It's easiest to think of the LLC as a bonus accelerator that has no cost for
> > > > us to use outside of the unfortunate per-buffer hint.
> > > >
> > > > We do have to worry about the CPU cache w.r.t. I/O coherency (which is a
> > > > different hint) and in that case we have all of the concerns that Will identified.
> > >
> >
> > For mismatched outer cacheability attributes which Will mentioned, I was
> > referring to [1] in android kernel.
>
> I've lost track of the conversation here :/
>
> When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
> into the CPU and with what attributes? Rob said "writecombine for
> everything" -- does that mean ioremap_wc() / MEMREMAP_WC?

Currently userspace asks for everything WC, so pgprot_writecombine().

The kernel doesn't enforce this, but so far it provides no UAPI to do
anything useful with non-coherent cached mappings (although there is
interest in supporting this).

BR,
-R

> Finally, we need to be careful when we use the word "hint" as "allocation
> hint" has a specific meaning in the architecture, and if we only mismatch on
> those then we're actually ok. But I think IOMMU_LLC is more than just a
> hint, since it actually drives eviction policy (i.e. it enables writeback).
>
> Sorry for the pedantry, but I just want to make sure we're all talking
> about the same things!
>
> Cheers,
>
> Will

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-02-03 21:46               ` Will Deacon
  2021-02-03 22:14                 ` Rob Clark
@ 2021-02-05 12:08                 ` Sai Prakash Ranjan
  2021-03-09  6:40                   ` Sai Prakash Ranjan
  1 sibling, 1 reply; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-02-05 12:08 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Akhil P Oommen,
	Isaac J. Manjarres, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,,
	linux-arm-kernel, Linux Kernel Mailing List, linux-arm-msm,
	freedreno, Kristian H Kristensen, Sean Paul, David Airlie,
	Daniel Vetter, dri-devel

On 2021-02-04 03:16, Will Deacon wrote:
> On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
>> On 2021-02-01 23:50, Jordan Crouse wrote:
>> > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
>> > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
>> > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
>> > > > > On 2021-01-29 14:35, Will Deacon wrote:
>> > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
>> > > > > > > +#define IOMMU_LLC        (1 << 6)
>> > > > > >
>> > > > > > On reflection, I'm a bit worried about exposing this because I think it will
>> > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
>> > > > > > set up for this memory type). Now, we also have that issue for the PTW, but
>> > > > > > since we always use cache maintenance (i.e. the streaming API) for
>> > > > > > publishing the page-tables to a non-coherent walker, it works out. However,
>> > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
>> > > > > > allocation, then they're potentially in for a nasty surprise due to the
>> > > > > > mismatched outer-cacheability attributes.
>> > > > > >
>> > > > >
>> > > > > Can't we add the syscached memory type similar to what is done on android?
>> > > >
>> > > > Maybe. How does the GPU driver map these things on the CPU side?
>> > >
>> > > Currently we use writecombine mappings for everything, although there
>> > > are some cases that we'd like to use cached (but have not merged
>> > > patches that would give userspace a way to flush/invalidate)
>> > >
>> >
>> > LLC/system cache doesn't have a relationship with the CPU cache.  It's just a
>> > little accelerator that sits on the connection from the GPU to DDR and caches
>> > accesses. The hint that Sai is suggesting is used to mark the buffers as
>> > 'no-write-allocate' to prevent GPU write operations from being cached in the LLC,
>> > which a) isn't interesting and b) takes up cache space for read operations.
>> >
>> > It's easiest to think of the LLC as a bonus accelerator that has no cost for
>> > us to use outside of the unfortunate per-buffer hint.
>> >
>> > We do have to worry about the CPU cache w.r.t. I/O coherency (which is a
>> > different hint) and in that case we have all of the concerns that Will identified.
>> >
>> 
>> For mismatched outer cacheability attributes which Will mentioned, I was
>> referring to [1] in android kernel.
> 
> I've lost track of the conversation here :/
> 
> When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
> into the CPU and with what attributes? Rob said "writecombine for
> everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
> 

Rob answered this.

> Finally, we need to be careful when we use the word "hint" as "allocation
> hint" has a specific meaning in the architecture, and if we only mismatch on
> those then we're actually ok. But I think IOMMU_LLC is more than just a
> hint, since it actually drives eviction policy (i.e. it enables writeback).
> 
> Sorry for the pedantry, but I just want to make sure we're all talking
> about the same things!
> 

Sorry for the confusion, which was probably caused by my mention of
Android. NWA (no-write-allocate) is an allocation hint, which we can ignore
for now as it has not yet been introduced upstream.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-02-05 12:08                 ` Sai Prakash Ranjan
@ 2021-03-09  6:40                   ` Sai Prakash Ranjan
  2021-03-25 17:33                     ` Will Deacon
  0 siblings, 1 reply; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-03-09  6:40 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rob Clark, Robin Murphy, Joerg Roedel, Akhil P Oommen,
	Isaac J. Manjarres, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,,
	linux-arm-kernel, Linux Kernel Mailing List, linux-arm-msm,
	freedreno, Kristian H Kristensen, Sean Paul, David Airlie,
	Daniel Vetter, dri-devel

Hi,

On 2021-02-05 17:38, Sai Prakash Ranjan wrote:
> On 2021-02-04 03:16, Will Deacon wrote:
>> On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
>>> On 2021-02-01 23:50, Jordan Crouse wrote:
>>> > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
>>> > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
>>> > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
>>> > > > > On 2021-01-29 14:35, Will Deacon wrote:
>>> > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
>>> > > > > > > +#define IOMMU_LLC        (1 << 6)
>>> > > > > >
>>> > > > > > On reflection, I'm a bit worried about exposing this because I think it will
>>> > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
>>> > > > > > set up for this memory type). Now, we also have that issue for the PTW, but
>>> > > > > > since we always use cache maintenance (i.e. the streaming API) for
>>> > > > > > publishing the page-tables to a non-coherent walker, it works out. However,
>>> > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
>>> > > > > > allocation, then they're potentially in for a nasty surprise due to the
>>> > > > > > mismatched outer-cacheability attributes.
>>> > > > > >
>>> > > > >
>>> > > > > Can't we add the syscached memory type similar to what is done on android?
>>> > > >
>>> > > > Maybe. How does the GPU driver map these things on the CPU side?
>>> > >
>>> > > Currently we use writecombine mappings for everything, although there
>>> > > are some cases that we'd like to use cached (but have not merged
>>> > > patches that would give userspace a way to flush/invalidate)
>>> > >
>>> >
>>> > LLC/system cache doesn't have a relationship with the CPU cache.  It's just a
>>> > little accelerator that sits on the connection from the GPU to DDR and caches
>>> > accesses. The hint that Sai is suggesting is used to mark the buffers as
>>> > 'no-write-allocate' to prevent GPU write operations from being cached in the LLC,
>>> > which a) isn't interesting and b) takes up cache space for read operations.
>>> >
>>> > It's easiest to think of the LLC as a bonus accelerator that has no cost for
>>> > us to use outside of the unfortunate per-buffer hint.
>>> >
>>> > We do have to worry about the CPU cache w.r.t. I/O coherency (which is a
>>> > different hint) and in that case we have all of the concerns that Will identified.
>>> >
>>> 
>>> For mismatched outer cacheability attributes which Will mentioned, I was
>>> referring to [1] in android kernel.
>> 
>> I've lost track of the conversation here :/
>> 
>> When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
>> into the CPU and with what attributes? Rob said "writecombine for
>> everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
>> 
> 
> Rob answered this.
> 
>> Finally, we need to be careful when we use the word "hint" as "allocation
>> hint" has a specific meaning in the architecture, and if we only mismatch on
>> those then we're actually ok. But I think IOMMU_LLC is more than just a
>> hint, since it actually drives eviction policy (i.e. it enables writeback).
>> 
>> Sorry for the pedantry, but I just want to make sure we're all talking
>> about the same things!
>> 
> 
> Sorry for the confusion, which was probably caused by my mention of
> Android. NWA (no-write-allocate) is an allocation hint, which we can ignore
> for now as it has not yet been introduced upstream.
> 

Any chance of taking this forward? We do not want to miss out on the small
fps gain when the product gets released.

Thanks,
Sai
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-02-03 22:14                 ` Rob Clark
@ 2021-03-16 17:04                   ` Rob Clark
  2021-03-16 17:16                     ` Rob Clark
  0 siblings, 1 reply; 36+ messages in thread
From: Rob Clark @ 2021-03-16 17:04 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, Robin Murphy, Joerg Roedel, Akhil P Oommen,
	Isaac J. Manjarres, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Linux Kernel Mailing List, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel

On Wed, Feb 3, 2021 at 2:14 PM Rob Clark <robdclark@gmail.com> wrote:
>
> On Wed, Feb 3, 2021 at 1:46 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
> > > On 2021-02-01 23:50, Jordan Crouse wrote:
> > > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> > > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
> > > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > +#define IOMMU_LLC        (1 << 6)
> > > > > > > >
> > > > > > > > > On reflection, I'm a bit worried about exposing this because I think it will
> > > > > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
> > > > > > > > > set up for this memory type). Now, we also have that issue for the PTW, but
> > > > > > > > > since we always use cache maintenance (i.e. the streaming API) for
> > > > > > > > > publishing the page-tables to a non-coherent walker, it works out. However,
> > > > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > > > > > > > allocation, then they're potentially in for a nasty surprise due to the
> > > > > > > > > mismatched outer-cacheability attributes.
> > > > > > > >
> > > > > > >
> > > > > > > Can't we add the syscached memory type similar to what is done on android?
> > > > > >
> > > > > > Maybe. How does the GPU driver map these things on the CPU side?
> > > > >
> > > > > Currently we use writecombine mappings for everything, although there
> > > > > are some cases that we'd like to use cached (but have not merged
> > > > > patches that would give userspace a way to flush/invalidate)
> > > > >
> > > >
> > > > > LLC/system cache doesn't have a relationship with the CPU cache.  It's just a
> > > > > little accelerator that sits on the connection from the GPU to DDR and caches
> > > > > accesses. The hint that Sai is suggesting is used to mark the buffers as
> > > > > 'no-write-allocate' to prevent GPU write operations from being cached in the LLC,
> > > > > which a) isn't interesting and b) takes up cache space for read operations.
> > > > >
> > > > > It's easiest to think of the LLC as a bonus accelerator that has no cost for
> > > > > us to use outside of the unfortunate per-buffer hint.
> > > > >
> > > > > We do have to worry about the CPU cache w.r.t. I/O coherency (which is a
> > > > > different hint) and in that case we have all of the concerns that Will identified.
> > > >
> > >
> > > For mismatched outer cacheability attributes which Will mentioned, I was
> > > referring to [1] in android kernel.
> >
> > I've lost track of the conversation here :/
> >
> > When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
> > into the CPU and with what attributes? Rob said "writecombine for
> > everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
>
> Currently userspace asks for everything WC, so pgprot_writecombine()
>
> The kernel doesn't enforce this, but so far provides no UAPI to do
> anything useful with non-coherent cached mappings (although there is
> interest to support this)
>

btw, I'm looking at a benchmark (gl_driver2_off) where (after some
other in-flight optimizations land) we end up bottlenecked on writing
to WC cmdstream buffers.  I assume in the current state, WC goes all
the way to main memory rather than just to system cache?

BR,
-R

> BR,
> -R
>
> > Finally, we need to be careful when we use the word "hint" as "allocation
> > hint" has a specific meaning in the architecture, and if we only mismatch on
> > those then we're actually ok. But I think IOMMU_LLC is more than just a
> > hint, since it actually drives eviction policy (i.e. it enables writeback).
> >
> > Sorry for the pedantry, but I just want to make sure we're all talking
> > about the same things!
> >
> > Cheers,
> >
> > Will

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-03-16 17:04                   ` Rob Clark
@ 2021-03-16 17:16                     ` Rob Clark
  0 siblings, 0 replies; 36+ messages in thread
From: Rob Clark @ 2021-03-16 17:16 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, Robin Murphy, Joerg Roedel, Akhil P Oommen,
	Isaac J. Manjarres, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Linux Kernel Mailing List, linux-arm-msm, freedreno,
	Kristian H Kristensen, Sean Paul, David Airlie, Daniel Vetter,
	dri-devel

On Tue, Mar 16, 2021 at 10:04 AM Rob Clark <robdclark@gmail.com> wrote:
>
> On Wed, Feb 3, 2021 at 2:14 PM Rob Clark <robdclark@gmail.com> wrote:
> >
> > On Wed, Feb 3, 2021 at 1:46 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
> > > > On 2021-02-01 23:50, Jordan Crouse wrote:
> > > > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> > > > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
> > > > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > > +#define IOMMU_LLC        (1 << 6)
> > > > > > > > >
> > > > > > > > > > On reflection, I'm a bit worried about exposing this because I think it will
> > > > > > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a MAIR
> > > > > > > > > > set up for this memory type). Now, we also have that issue for the PTW, but
> > > > > > > > > > since we always use cache maintenance (i.e. the streaming API) for
> > > > > > > > > > publishing the page-tables to a non-coherent walker, it works out. However,
> > > > > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > > > > > > > > allocation, then they're potentially in for a nasty surprise due to the
> > > > > > > > > > mismatched outer-cacheability attributes.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Can't we add the syscached memory type similar to what is done on android?
> > > > > > >
> > > > > > > Maybe. How does the GPU driver map these things on the CPU side?
> > > > > >
> > > > > > Currently we use writecombine mappings for everything, although there
> > > > > > are some cases that we'd like to use cached (but have not merged
> > > > > > patches that would give userspace a way to flush/invalidate)
> > > > > >
> > > > >
> > > > > LLC/system cache doesn't have a relationship with the CPU cache.  Its
> > > > > just a
> > > > > little accelerator that sits on the connection from the GPU to DDR and
> > > > > caches
> > > > > accesses. The hint that Sai is suggesting is used to mark the buffers as
> > > > > 'no-write-allocate' to prevent GPU write operations from being cached in
> > > > > the LLC
> > > > > which a) isn't interesting and b) takes up cache space for read
> > > > > operations.
> > > > >
> > > > > > It's easiest to think of the LLC as a bonus accelerator that has no cost
> > > > > for
> > > > > us to use outside of the unfortunate per buffer hint.
> > > > >
> > > > > We do have to worry about the CPU cache w.r.t I/O coherency (which is a
> > > > > > different hint) and in that case we have all of the concerns that Will
> > > > > identified.
> > > > >
> > > >
> > > > For mismatched outer cacheability attributes which Will mentioned, I was
> > > > referring to [1] in android kernel.
> > >
> > > I've lost track of the conversation here :/
> > >
> > > When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also mapped
> > > into the CPU and with what attributes? Rob said "writecombine for
> > > everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
> >
> > Currently userspace asks for everything WC, so pgprot_writecombine()
> >
> > The kernel doesn't enforce this, but so far provides no UAPI to do
> > anything useful with non-coherent cached mappings (although there is
> > interest to support this)
> >
>
> btw, I'm looking at a benchmark (gl_driver2_off) where (after some
> other in-flight optimizations land) we end up bottlenecked on writing
> to WC cmdstream buffers.  I assume in the current state, WC goes all
> the way to main memory rather than just to system cache?
>

oh, I guess this (mentioned earlier in thread) is what I really want
for this benchmark:

https://android-review.googlesource.com/c/kernel/common/+/1549097/3

> BR,
> -R
>
> > BR,
> > -R
> >
> > > Finally, we need to be careful when we use the word "hint" as "allocation
> > > hint" has a specific meaning in the architecture, and if we only mismatch on
> > > those then we're actually ok. But I think IOMMU_LLC is more than just a
> > > hint, since it actually drives eviction policy (i.e. it enables writeback).
> > >
> > > Sorry for the pedantry, but I just want to make sure we're all talking
> > > about the same things!
> > >
> > > Cheers,
> > >
> > > Will

* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-03-09  6:40                   ` Sai Prakash Ranjan
@ 2021-03-25 17:33                     ` Will Deacon
  2021-06-30 10:07                       ` Sai Prakash Ranjan
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2021-03-25 17:33 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Isaac J. Manjarres, freedreno, David Airlie,
	Linux Kernel Mailing List, IOMMU DRIVERS,
	Joerg Roedel <joro@8bytes.org>,
	dri-devel, Akhil P Oommen, Sean Paul, Kristian H Kristensen,
	Daniel Vetter, linux-arm-msm, Robin Murphy, linux-arm-kernel

On Tue, Mar 09, 2021 at 12:10:44PM +0530, Sai Prakash Ranjan wrote:
> On 2021-02-05 17:38, Sai Prakash Ranjan wrote:
> > On 2021-02-04 03:16, Will Deacon wrote:
> > > On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
> > > > On 2021-02-01 23:50, Jordan Crouse wrote:
> > > > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
> > > > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
> > > > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > On 2021-01-29 14:35, Will Deacon wrote:
> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > > +#define IOMMU_LLC        (1 << 6)
> > > > > > > > >
> > > > > > > > > On reflection, I'm a bit worried about exposing this because I think it
> > > > > > > > > will
> > > > > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a
> > > > > > > > > MAIR
> > > > > > > > > set up for this memory type). Now, we also have that issue for the PTW,
> > > > > > > > > but
> > > > > > > > > since we always use cache maintenance (i.e. the streaming API) for
> > > > > > > > > > publishing the page-tables to a non-coherent walker, it works out.
> > > > > > > > > However,
> > > > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
> > > > > > > > > allocation, then they're potentially in for a nasty surprise due to the
> > > > > > > > > mismatched outer-cacheability attributes.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Can't we add the syscached memory type similar to what is done on android?
> > > > > > >
> > > > > > > Maybe. How does the GPU driver map these things on the CPU side?
> > > > > >
> > > > > > Currently we use writecombine mappings for everything, although there
> > > > > > are some cases that we'd like to use cached (but have not merged
> > > > > > patches that would give userspace a way to flush/invalidate)
> > > > > >
> > > > >
> > > > > > LLC/system cache doesn't have a relationship with the CPU cache.  It's
> > > > > just a
> > > > > little accelerator that sits on the connection from the GPU to DDR and
> > > > > caches
> > > > > accesses. The hint that Sai is suggesting is used to mark the buffers as
> > > > > 'no-write-allocate' to prevent GPU write operations from being cached in
> > > > > the LLC
> > > > > which a) isn't interesting and b) takes up cache space for read
> > > > > operations.
> > > > >
> > > > > > It's easiest to think of the LLC as a bonus accelerator that has no cost
> > > > > for
> > > > > us to use outside of the unfortunate per buffer hint.
> > > > >
> > > > > We do have to worry about the CPU cache w.r.t I/O coherency (which is a
> > > > > > different hint) and in that case we have all of the concerns that Will
> > > > > identified.
> > > > >
> > > > 
> > > > For mismatched outer cacheability attributes which Will
> > > > mentioned, I was
> > > > referring to [1] in android kernel.
> > > 
> > > I've lost track of the conversation here :/
> > > 
> > > When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also
> > > mapped
> > > into the CPU and with what attributes? Rob said "writecombine for
> > > everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
> > > 
> > 
> > Rob answered this.
> > 
> > > Finally, we need to be careful when we use the word "hint" as
> > > "allocation
> > > hint" has a specific meaning in the architecture, and if we only
> > > mismatch on
> > > those then we're actually ok. But I think IOMMU_LLC is more than
> > > just a
> > > hint, since it actually drives eviction policy (i.e. it enables
> > > writeback).
> > > 
> > > Sorry for the pedantry, but I just want to make sure we're all talking
> > > about the same things!
> > > 
> > 
> > Sorry for the confusion which probably was caused by my mentioning of
> > android, NWA(no write allocate) is an allocation hint which we can
> > ignore
> > for now as it is not introduced yet in upstream.
> > 
> 
> Any chance of taking this forward? We do not want to miss out on small fps
> gain when the product gets released.

Do we have a solution to the mismatched virtual alias?

Will
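
For reference, the mapping being discussed can be sketched in plain C. This
is a minimal userspace model and not the patch itself: every SKETCH_ name is
hypothetical, the attribute bytes and the index layout are assumptions
modelled on the MAIR-style encoding (outer attributes in bits [7:4], inner in
bits [3:0]), and the precedence of the flags is assumed as well. Only the
(1 << 6) value for the LLC flag comes from the quoted hunk.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Prot flags for this sketch. READ/WRITE/CACHE/MMIO follow the values used
 * in include/linux/iommu.h; SKETCH_IOMMU_LLC takes the (1 << 6) value from
 * the quoted patch hunk. All names are illustrative.
 */
#define SKETCH_IOMMU_READ  (1u << 0)
#define SKETCH_IOMMU_WRITE (1u << 1)
#define SKETCH_IOMMU_CACHE (1u << 2)
#define SKETCH_IOMMU_MMIO  (1u << 4)
#define SKETCH_IOMMU_LLC   (1u << 6)

/* Index into the attribute table (assumed layout). */
enum sketch_attr_idx {
	SKETCH_IDX_NC = 0,         /* inner and outer non-cacheable */
	SKETCH_IDX_CACHE = 1,      /* inner and outer write-back */
	SKETCH_IDX_DEV = 2,        /* device memory */
	SKETCH_IDX_INC_OCACHE = 3, /* inner non-cacheable, outer write-back */
	SKETCH_IDX_COUNT
};

/* MAIR-style attribute bytes (assumed values): outer [7:4], inner [3:0]. */
static const uint8_t sketch_mair[SKETCH_IDX_COUNT] = {
	[SKETCH_IDX_NC]         = 0x44,
	[SKETCH_IDX_CACHE]      = 0xff,
	[SKETCH_IDX_DEV]        = 0x04,
	[SKETCH_IDX_INC_OCACHE] = 0xf4,
};

/* Pick an attribute index from the prot flags; the ordering is an assumption. */
enum sketch_attr_idx sketch_prot_to_attr_idx(unsigned int prot)
{
	if (prot & SKETCH_IOMMU_MMIO)
		return SKETCH_IDX_DEV;
	if (prot & SKETCH_IOMMU_CACHE)
		return SKETCH_IDX_CACHE;
	if (prot & SKETCH_IOMMU_LLC)
		return SKETCH_IDX_INC_OCACHE;
	return SKETCH_IDX_NC;
}

/* Look up the attribute byte for an index. */
uint8_t sketch_attr_byte(enum sketch_attr_idx idx)
{
	assert((unsigned int)idx < SKETCH_IDX_COUNT);
	return sketch_mair[idx];
}

/* Split an attribute byte into its outer and inner nibbles. */
uint8_t sketch_outer(uint8_t attr) { return attr >> 4; }
uint8_t sketch_inner(uint8_t attr) { return attr & 0x0f; }
```

In this model a buffer mapped with SKETCH_IOMMU_LLC ends up with an outer
write-back type and an inner non-cacheable type, which is the combination the
CPU side has no matching memory type for.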

* Re: [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
  2021-03-25 17:33                     ` Will Deacon
@ 2021-06-30 10:07                       ` Sai Prakash Ranjan
  0 siblings, 0 replies; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-06-30 10:07 UTC (permalink / raw)
  To: Will Deacon
  Cc: Isaac J. Manjarres, freedreno, David Airlie,
	Linux Kernel Mailing List, IOMMU DRIVERS,
	Joerg Roedel <joro@8bytes.org>,
	dri-devel, Akhil P Oommen, Sean Paul, Kristian H Kristensen,
	Daniel Vetter, linux-arm-msm, Robin Murphy, linux-arm-kernel

Hi Will,

On 2021-03-25 23:03, Will Deacon wrote:
> On Tue, Mar 09, 2021 at 12:10:44PM +0530, Sai Prakash Ranjan wrote:
>> On 2021-02-05 17:38, Sai Prakash Ranjan wrote:
>> > On 2021-02-04 03:16, Will Deacon wrote:
>> > > On Tue, Feb 02, 2021 at 11:56:27AM +0530, Sai Prakash Ranjan wrote:
>> > > > On 2021-02-01 23:50, Jordan Crouse wrote:
>> > > > > On Mon, Feb 01, 2021 at 08:20:44AM -0800, Rob Clark wrote:
>> > > > > > On Mon, Feb 1, 2021 at 3:16 AM Will Deacon <will@kernel.org> wrote:
>> > > > > > > On Fri, Jan 29, 2021 at 03:12:59PM +0530, Sai Prakash Ranjan wrote:
>> > > > > > > > On 2021-01-29 14:35, Will Deacon wrote:
>> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:04PM +0530, Sai Prakash Ranjan wrote:
>> > > > > > > > > > +#define IOMMU_LLC        (1 << 6)
>> > > > > > > > >
>> > > > > > > > > On reflection, I'm a bit worried about exposing this because I think it
>> > > > > > > > > will
>> > > > > > > > > introduce a mismatched virtual alias with the CPU (we don't even have a
>> > > > > > > > > MAIR
>> > > > > > > > > set up for this memory type). Now, we also have that issue for the PTW,
>> > > > > > > > > but
>> > > > > > > > > since we always use cache maintenance (i.e. the streaming API) for
>> > > > > > > > > publishing the page-tables to a non-coherent walker, it works out.
>> > > > > > > > > However,
>> > > > > > > > > if somebody expects IOMMU_LLC to be coherent with a DMA API coherent
>> > > > > > > > > allocation, then they're potentially in for a nasty surprise due to the
>> > > > > > > > > mismatched outer-cacheability attributes.
>> > > > > > > > >
>> > > > > > > >
>> > > > > > > > Can't we add the syscached memory type similar to what is done on android?
>> > > > > > >
>> > > > > > > Maybe. How does the GPU driver map these things on the CPU side?
>> > > > > >
>> > > > > > Currently we use writecombine mappings for everything, although there
>> > > > > > are some cases that we'd like to use cached (but have not merged
>> > > > > > patches that would give userspace a way to flush/invalidate)
>> > > > > >
>> > > > >
>> > > > > LLC/system cache doesn't have a relationship with the CPU cache.  It's
>> > > > > just a
>> > > > > little accelerator that sits on the connection from the GPU to DDR and
>> > > > > caches
>> > > > > accesses. The hint that Sai is suggesting is used to mark the buffers as
>> > > > > 'no-write-allocate' to prevent GPU write operations from being cached in
>> > > > > the LLC
>> > > > > which a) isn't interesting and b) takes up cache space for read
>> > > > > operations.
>> > > > >
>> > > > > It's easiest to think of the LLC as a bonus accelerator that has no cost
>> > > > > for
>> > > > > us to use outside of the unfortunate per buffer hint.
>> > > > >
>> > > > > We do have to worry about the CPU cache w.r.t I/O coherency (which is a
>> > > > > different hint) and in that case we have all of the concerns that Will
>> > > > > identified.
>> > > > >
>> > > >
>> > > > For mismatched outer cacheability attributes which Will
>> > > > mentioned, I was
>> > > > referring to [1] in android kernel.
>> > >
>> > > I've lost track of the conversation here :/
>> > >
>> > > When the GPU has a buffer mapped with IOMMU_LLC, is the buffer also
>> > > mapped
>> > > into the CPU and with what attributes? Rob said "writecombine for
>> > > everything" -- does that mean ioremap_wc() / MEMREMAP_WC?
>> > >
>> >
>> > Rob answered this.
>> >
>> > > Finally, we need to be careful when we use the word "hint" as
>> > > "allocation
>> > > hint" has a specific meaning in the architecture, and if we only
>> > > mismatch on
>> > > those then we're actually ok. But I think IOMMU_LLC is more than
>> > > just a
>> > > hint, since it actually drives eviction policy (i.e. it enables
>> > > writeback).
>> > >
>> > > Sorry for the pedantry, but I just want to make sure we're all talking
>> > > about the same things!
>> > >
>> >
>> > Sorry for the confusion which probably was caused by my mentioning of
>> > android, NWA(no write allocate) is an allocation hint which we can
>> > ignore
>> > for now as it is not introduced yet in upstream.
>> >
>> 
>> Any chance of taking this forward? We do not want to miss out on small fps
>> gain when the product gets released.
> 
> Do we have a solution to the mismatched virtual alias?
> 

Sorry for the long delay on this thread.

Regarding the mismatched virtual alias question, wasn't this already
discussed at length when initial support for system cache [1] (which you
reverted) was added?

Excerpt from there,

"As seen in downstream kernels there are few non-coherent devices which
would not want to allocate in system cache, and therefore would want
Inner/Outer non-cached memory. So, we may want to either override the
attributes per-device, or as you suggested we may want to introduce
another memory type 'sys-cached' that can be added with its separate
infra."

As for DMA API usage, we do not have any upstream users (video will be
one if they decide to upstream that).

[1] 
https://patchwork.kernel.org/project/linux-arm-msm/patch/20180615105329.26800-1-vivek.gautam@codeaurora.org/

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

* Re: [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-01-11 14:15 [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
                   ` (3 preceding siblings ...)
  2021-01-20  5:18 ` [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
@ 2021-07-28 14:00 ` Georgi Djakov
  2021-07-29  4:38   ` Sai Prakash Ranjan
  4 siblings, 1 reply; 36+ messages in thread
From: Georgi Djakov @ 2021-07-28 14:00 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Jordan Crouse,
	Rob Clark, Akhil P Oommen, isaacm, iommu, linux-arm-kernel,
	linux-kernel, linux-arm-msm, freedreno, Kristian H Kristensen,
	Sean Paul, David Airlie, Daniel Vetter, dri-devel

On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> the memory type setting required for the non-coherent masters to use
> system cache. Now that system cache support for GPU is added, we will
> need to set the right PTE attribute for GPU buffers to be sys cached.
> Without this, the system cache lines are not allocated for GPU.
> 
> So the patches in this series introduces a new prot flag IOMMU_LLC,
> renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> and makes GPU the user of this protection flag.

Hi Sai,

Thank you for the patchset! Are you planning to refresh it, as it does
not apply anymore?

Thanks,
Georgi

> 
> The series slightly depends on following 2 patches posted earlier and
> is based on msm-next branch:
>  * https://lore.kernel.org/patchwork/patch/1363008/
>  * https://lore.kernel.org/patchwork/patch/1363010/
> 
> Sai Prakash Ranjan (3):
>   iommu/io-pgtable: Rename last-level cache quirk to
>     IO_PGTABLE_QUIRK_PTW_LLC
>   iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
>   drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers
> 
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
>  drivers/gpu/drm/msm/msm_iommu.c         | 3 +++
>  drivers/gpu/drm/msm/msm_mmu.h           | 4 ++++
>  drivers/iommu/io-pgtable-arm.c          | 9 ++++++---
>  include/linux/io-pgtable.h              | 6 +++---
>  include/linux/iommu.h                   | 6 ++++++
>  7 files changed, 26 insertions(+), 7 deletions(-)
> 
> 
> base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
> 

* Re: [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-07-28 14:00 ` Georgi Djakov
@ 2021-07-29  4:38   ` Sai Prakash Ranjan
  2021-08-02 10:55     ` Will Deacon
  0 siblings, 1 reply; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-07-29  4:38 UTC (permalink / raw)
  To: Georgi Djakov
  Cc: Will Deacon, Robin Murphy, Joerg Roedel, Jordan Crouse,
	Rob Clark, Akhil P Oommen, isaacm, iommu, linux-arm-kernel,
	linux-kernel, linux-arm-msm, freedreno, Kristian H Kristensen,
	Sean Paul, David Airlie, Daniel Vetter, dri-devel

Hi Georgi,

On 2021-07-28 19:30, Georgi Djakov wrote:
> On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
>> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> the memory type setting required for the non-coherent masters to use
>> system cache. Now that system cache support for GPU is added, we will
>> need to set the right PTE attribute for GPU buffers to be sys cached.
>> Without this, the system cache lines are not allocated for GPU.
>> 
>> So the patches in this series introduces a new prot flag IOMMU_LLC,
>> renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>> and makes GPU the user of this protection flag.
> 
> Hi Sai,
> 
> Thank you for the patchset! Are you planning to refresh it, as it does
> not apply anymore?
> 

I was waiting on Will's reply [1]. If there are no changes needed, then
I can repost the patch.

[1] 
https://lore.kernel.org/lkml/21239ba603d0bdc4e4c696588a905f88@codeaurora.org/

Thanks,
Sai

> 
>> 
>> The series slightly depends on following 2 patches posted earlier and
>> is based on msm-next branch:
>>  * https://lore.kernel.org/patchwork/patch/1363008/
>>  * https://lore.kernel.org/patchwork/patch/1363010/
>> 
>> Sai Prakash Ranjan (3):
>>   iommu/io-pgtable: Rename last-level cache quirk to
>>     IO_PGTABLE_QUIRK_PTW_LLC
>>   iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag
>>   drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers
>> 
>>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c   | 3 +++
>>  drivers/gpu/drm/msm/adreno/adreno_gpu.c | 2 +-
>>  drivers/gpu/drm/msm/msm_iommu.c         | 3 +++
>>  drivers/gpu/drm/msm/msm_mmu.h           | 4 ++++
>>  drivers/iommu/io-pgtable-arm.c          | 9 ++++++---
>>  include/linux/io-pgtable.h              | 6 +++---
>>  include/linux/iommu.h                   | 6 ++++++
>>  7 files changed, 26 insertions(+), 7 deletions(-)
>> 
>> 
>> base-commit: 00fd44a1a4700718d5d962432b55c09820f7e709
>> --
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
>> of Code Aurora Forum, hosted by The Linux Foundation
>> 

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

* Re: [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-07-29  4:38   ` Sai Prakash Ranjan
@ 2021-08-02 10:55     ` Will Deacon
  2021-08-02 15:08       ` [Freedreno] " Rob Clark
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2021-08-02 10:55 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Georgi Djakov, isaacm, David Airlie, Akhil P Oommen, iommu,
	linux-kernel, Sean Paul, Jordan Crouse, Kristian H Kristensen,
	dri-devel, Daniel Vetter, linux-arm-msm, freedreno, Robin Murphy,
	linux-arm-kernel

On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> On 2021-07-28 19:30, Georgi Djakov wrote:
> > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > the memory type setting required for the non-coherent masters to use
> > > system cache. Now that system cache support for GPU is added, we will
> > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > Without this, the system cache lines are not allocated for GPU.
> > > 
> > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > and makes GPU the user of this protection flag.
> > 
> > Thank you for the patchset! Are you planning to refresh it, as it does
> > not apply anymore?
> > 
> 
> I was waiting on Will's reply [1]. If there are no changes needed, then
> I can repost the patch.

I still think you need to handle the mismatched alias, no? You're adding
a new memory type to the SMMU which doesn't exist on the CPU side. That
can't be right.

Will

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-02 10:55     ` Will Deacon
@ 2021-08-02 15:08       ` Rob Clark
  2021-08-02 15:14         ` Will Deacon
  0 siblings, 1 reply; 36+ messages in thread
From: Rob Clark @ 2021-08-02 15:08 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, Georgi Djakov, Isaac J. Manjarres,
	David Airlie, Akhil P Oommen,
	IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > the memory type setting required for the non-coherent masters to use
> > > > system cache. Now that system cache support for GPU is added, we will
> > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > Without this, the system cache lines are not allocated for GPU.
> > > >
> > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > and makes GPU the user of this protection flag.
> > >
> > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > not apply anymore?
> > >
> >
> > I was waiting on Will's reply [1]. If there are no changes needed, then
> > I can repost the patch.
>
> I still think you need to handle the mismatched alias, no? You're adding
> a new memory type to the SMMU which doesn't exist on the CPU side. That
> can't be right.
>

Just curious, and maybe this is a dumb question, but what is your
concern about mismatched aliases?  I mean the cache hierarchy on the
GPU device side (anything beyond the LLC) is pretty different and
doesn't really care about the smmu pgtable attributes..

BR,
-R

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-02 15:08       ` [Freedreno] " Rob Clark
@ 2021-08-02 15:14         ` Will Deacon
  2021-08-03  1:36           ` Rob Clark
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2021-08-02 15:14 UTC (permalink / raw)
  To: Rob Clark
  Cc: Sai Prakash Ranjan, Georgi Djakov, Isaac J. Manjarres,
	David Airlie, Akhil P Oommen,
	IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > the memory type setting required for the non-coherent masters to use
> > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > Without this, the system cache lines are not allocated for GPU.
> > > > >
> > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > and makes GPU the user of this protection flag.
> > > >
> > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > not apply anymore?
> > > >
> > >
> > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > I can repost the patch.
> >
> > I still think you need to handle the mismatched alias, no? You're adding
> > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > can't be right.
> >
> 
> Just curious, and maybe this is a dumb question, but what is your
> concern about mismatched aliases?  I mean the cache hierarchy on the
> GPU device side (anything beyond the LLC) is pretty different and
> doesn't really care about the smmu pgtable attributes..

If the CPU accesses a shared buffer with different attributes to those which
the device is using then you fall into the "mismatched memory attributes"
part of the Arm architecture. It's reasonably unforgiving (you should go and
read it) and in some cases can apply to speculative accesses as well, but
the end result is typically loss of coherency.

Will
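
The concern above can be illustrated with a small model in C. This is only a
sketch under stated assumptions: all sketch_ names are hypothetical, memory
type and shareability are left out, and the two example attribute sets assume
that a write-combine CPU mapping is Normal Non-cacheable at both levels while
an IOMMU_LLC device mapping is inner Non-cacheable and outer Write-Back.
Allocation hints are deliberately ignored by the comparison, following the
earlier remark in the thread that a mismatch on hints alone is fine.

```c
#include <stdbool.h>

/* Cacheability of one cache level in this simplified model. */
enum sketch_cacheability {
	SKETCH_NON_CACHEABLE,
	SKETCH_WRITE_THROUGH,
	SKETCH_WRITE_BACK
};

/* Attributes of one mapping of a buffer (hypothetical structure). */
struct sketch_mem_attr {
	enum sketch_cacheability inner;
	enum sketch_cacheability outer;
	bool read_alloc;  /* allocation hint only */
	bool write_alloc; /* allocation hint only */
};

/* Assumed CPU view: write-combine, non-cacheable at both levels. */
const struct sketch_mem_attr sketch_cpu_wc = {
	SKETCH_NON_CACHEABLE, SKETCH_NON_CACHEABLE, false, false
};

/* Assumed device view with the LLC flag: outer write-back only. */
const struct sketch_mem_attr sketch_dev_llc = {
	SKETCH_NON_CACHEABLE, SKETCH_WRITE_BACK, true, true
};

/*
 * Two aliases are treated as mismatched when their inner or outer
 * cacheability differs. Allocation hints do not take part in the check.
 */
bool sketch_attrs_mismatched(const struct sketch_mem_attr *a,
			     const struct sketch_mem_attr *b)
{
	return a->inner != b->inner || a->outer != b->outer;
}
```

With these assumptions, sketch_attrs_mismatched(&sketch_cpu_wc,
&sketch_dev_llc) reports a mismatch because the outer cacheability differs,
while two mappings that differ only in their allocation hints do not.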

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-02 15:14         ` Will Deacon
@ 2021-08-03  1:36           ` Rob Clark
  2021-08-09 14:56             ` Will Deacon
  0 siblings, 1 reply; 36+ messages in thread
From: Rob Clark @ 2021-08-03  1:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, Georgi Djakov, Isaac J. Manjarres,
	David Airlie, Akhil P Oommen,
	IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > >
> > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > and makes GPU the user of this protection flag.
> > > > >
> > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > not apply anymore?
> > > > >
> > > >
> > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > I can repost the patch.
> > >
> > > I still think you need to handle the mismatched alias, no? You're adding
> > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > can't be right.
> > >
> >
> > Just curious, and maybe this is a dumb question, but what is your
> > concern about mismatched aliases?  I mean the cache hierarchy on the
> > GPU device side (anything beyond the LLC) is pretty different and
> > doesn't really care about the smmu pgtable attributes..
>
> If the CPU accesses a shared buffer with different attributes to those which
> the device is using then you fall into the "mismatched memory attributes"
> part of the Arm architecture. It's reasonably unforgiving (you should go and
> read it) and in some cases can apply to speculative accesses as well, but
> the end result is typically loss of coherency.

Ok, I might have a few other sections to read first to decipher the
terminology..

But my understanding of LLC is that it looks just like system memory
to the CPU and GPU (I think that would make it "the point of
coherence" between the GPU and CPU?)  If that is true, shouldn't it be
invisible from the point of view of different CPU mapping options?

BR,
-R

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-03  1:36           ` Rob Clark
@ 2021-08-09 14:56             ` Will Deacon
  2021-08-09 16:57               ` Rob Clark
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2021-08-09 14:56 UTC (permalink / raw)
  To: Rob Clark
  Cc: Sai Prakash Ranjan, Georgi Djakov, Isaac J. Manjarres,
	David Airlie, Akhil P Oommen,
	IOMMU DRIVERS <iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > > >
> > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > >
> > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > and makes GPU the user of this protection flag.
> > > > > >
> > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > > not apply anymore?
> > > > > >
> > > > >
> > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > > I can repost the patch.
> > > >
> > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > can't be right.
> > > >
> > >
> > > Just curious, and maybe this is a dumb question, but what is your
> > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > GPU device side (anything beyond the LLC) is pretty different and
> > > doesn't really care about the smmu pgtable attributes..
> >
> > If the CPU accesses a shared buffer with different attributes to those which
> > the device is using then you fall into the "mismatched memory attributes"
> > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > read it) and in some cases can apply to speculative accesses as well, but
> > the end result is typically loss of coherency.
> 
> Ok, I might have a few other sections to read first to decipher the
> terminology..
> 
> But my understanding of LLC is that it looks just like system memory
> to the CPU and GPU (I think that would make it "the point of
> coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> invisible from the point of view of different CPU mapping options?

You could certainly build a system where mismatched attributes don't cause
loss of coherence, but as it's not guaranteed by the architecture and the
changes proposed here affect APIs which are exposed across SoCs, then I
don't think it helps much.

Will

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-09 14:56             ` Will Deacon
@ 2021-08-09 16:57               ` Rob Clark
  2021-08-09 17:05                 ` Will Deacon
  0 siblings, 1 reply; 36+ messages in thread
From: Rob Clark @ 2021-08-09 16:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, Georgi Djakov, Isaac J. Manjarres,
	David Airlie, Akhil P Oommen,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > > >
> > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > and makes GPU the user of this protection flag.
> > > > > > >
> > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > > > not apply anymore?
> > > > > > >
> > > > > >
> > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > > > I can repost the patch.
> > > > >
> > > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > > can't be right.
> > > > >
> > > >
> > > > Just curious, and maybe this is a dumb question, but what is your
> > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > doesn't really care about the smmu pgtable attributes..
> > >
> > > If the CPU accesses a shared buffer with different attributes to those which
> > > the device is using then you fall into the "mismatched memory attributes"
> > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > > read it) and in some cases can apply to speculative accesses as well, but
> > > the end result is typically loss of coherency.
> >
> > Ok, I might have a few other sections to read first to decipher the
> > terminology..
> >
> > But my understanding of LLC is that it looks just like system memory
> > to the CPU and GPU (I think that would make it "the point of
> > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > invisible from the point of view of different CPU mapping options?
>
> You could certainly build a system where mismatched attributes don't cause
> loss of coherence, but as it's not guaranteed by the architecture and the
> changes proposed here affect APIs which are exposed across SoCs, then I
> don't think it helps much.
>

Hmm, the description of the new mapping flag is that it applies only
to transparent outer level cache:

+/*
+ * Non-coherent masters can use this page protection flag to set cacheable
+ * memory attributes for only a transparent outer level of cache, also known as
+ * the last-level or system cache.
+ */
+#define IOMMU_LLC      (1 << 6)

But I suppose we could call it instead IOMMU_QCOM_LLC or something
like that to make it more clear that it is not necessarily something
that would work with a different outer level cache implementation?
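
As a rough illustration of what the flag would do on the SMMU side, here is a
minimal standalone sketch, not the actual patch. It assumes the MAIR attribute
bytes and index layout used by io-pgtable-arm.c (NC, CACHE, DEV, INC_OCACHE at
indices 0 to 3) and the IOMMU_LLC value quoted above. Every sketch_* and
SKETCH_* name is hypothetical, and the order of the prot checks is a guess.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Prot flags. IOMMU_LLC is the value proposed in this series; the other
 * two are assumed to match include/linux/iommu.h.
 */
#define IOMMU_CACHE	(1 << 2)
#define IOMMU_MMIO	(1 << 4)
#define IOMMU_LLC	(1 << 6)

/* MAIR attribute indices, modelled on io-pgtable-arm.c (assumption). */
enum sketch_attr_idx {
	SKETCH_IDX_NC = 0,		/* inner and outer Non-cacheable */
	SKETCH_IDX_CACHE = 1,		/* inner and outer Write-Back */
	SKETCH_IDX_DEV = 2,		/* Device-nGnRE */
	SKETCH_IDX_INC_OCACHE = 3,	/* inner NC, outer Write-Back */
};

/* AttrIndx occupies bits [4:2] of a stage-1 LPAE descriptor. */
#define SKETCH_PTE_ATTRINDX_SHIFT	2

/* Build a MAIR value holding one 8-bit attribute per index. */
static uint64_t sketch_mair(void)
{
	return ((uint64_t)0x44 << (8 * SKETCH_IDX_NC)) |
	       ((uint64_t)0xff << (8 * SKETCH_IDX_CACHE)) |
	       ((uint64_t)0x04 << (8 * SKETCH_IDX_DEV)) |
	       ((uint64_t)0xf4 << (8 * SKETCH_IDX_INC_OCACHE));
}

/* Read back the attribute byte stored at a given MAIR index. */
static uint8_t sketch_mair_attr(uint64_t mair, unsigned int idx)
{
	assert(idx < 8);
	return (uint8_t)(mair >> (8 * idx));
}

/* Pick the AttrIndx field of a stage-1 PTE from the prot flags. */
static uint64_t sketch_prot_to_attrindx(int prot)
{
	unsigned int idx;

	if (prot & IOMMU_MMIO)
		idx = SKETCH_IDX_DEV;
	else if (prot & IOMMU_CACHE)
		idx = SKETCH_IDX_CACHE;
	else if (prot & IOMMU_LLC)
		idx = SKETCH_IDX_INC_OCACHE;
	else
		idx = SKETCH_IDX_NC;

	return (uint64_t)idx << SKETCH_PTE_ATTRINDX_SHIFT;
}
```

In this sketch IOMMU_LLC only takes effect when IOMMU_CACHE is not set, so a
fully cacheable mapping keeps its usual attributes.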

BR,
-R

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-09 16:57               ` Rob Clark
@ 2021-08-09 17:05                 ` Will Deacon
  2021-08-09 17:18                   ` Rob Clark
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2021-08-09 17:05 UTC (permalink / raw)
  To: Rob Clark
  Cc: Sai Prakash Ranjan, Georgi Djakov, Isaac J. Manjarres,
	David Airlie, Akhil P Oommen,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
> > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > > > >
> > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > >
> > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > > > > not apply anymore?
> > > > > > > >
> > > > > > >
> > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > > > > I can repost the patch.
> > > > > >
> > > > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > > > can't be right.
> > > > > >
> > > > >
> > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > doesn't really care about the smmu pgtable attributes..
> > > >
> > > > If the CPU accesses a shared buffer with different attributes to those which
> > > > the device is using then you fall into the "mismatched memory attributes"
> > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > > > read it) and in some cases can apply to speculative accesses as well, but
> > > > the end result is typically loss of coherency.
> > >
> > > Ok, I might have a few other sections to read first to decipher the
> > > terminology..
> > >
> > > But my understanding of LLC is that it looks just like system memory
> > > to the CPU and GPU (I think that would make it "the point of
> > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > invisible from the point of view of different CPU mapping options?
> >
> > You could certainly build a system where mismatched attributes don't cause
> > loss of coherence, but as it's not guaranteed by the architecture and the
> > changes proposed here affect APIs which are exposed across SoCs, then I
> > don't think it helps much.
> >
> 
> Hmm, the description of the new mapping flag is that it applies only
> to transparent outer level cache:
> 
> +/*
> + * Non-coherent masters can use this page protection flag to set cacheable
> + * memory attributes for only a transparent outer level of cache, also known as
> + * the last-level or system cache.
> + */
> +#define IOMMU_LLC      (1 << 6)
> 
> But I suppose we could call it instead IOMMU_QCOM_LLC or something
> like that to make it more clear that it is not necessarily something
> that would work with a different outer level cache implementation?

... or we could just deal with the problem so that other people can reuse
the code. I haven't really understood the reluctance to solve this properly.

Am I missing some reason this isn't solvable?

Will

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-09 17:05                 ` Will Deacon
@ 2021-08-09 17:18                   ` Rob Clark
  2021-08-09 17:40                     ` Will Deacon
  0 siblings, 1 reply; 36+ messages in thread
From: Rob Clark @ 2021-08-09 17:18 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, Georgi Djakov, Isaac J. Manjarres,
	David Airlie, Akhil P Oommen,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
>
> On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
> > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > > > > >
> > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > > >
> > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > > > > > not apply anymore?
> > > > > > > > >
> > > > > > > >
> > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > > > > > I can repost the patch.
> > > > > > >
> > > > > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > > > > can't be right.
> > > > > > >
> > > > > >
> > > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > > doesn't really care about the smmu pgtable attributes..
> > > > >
> > > > > If the CPU accesses a shared buffer with different attributes to those which
> > > > > the device is using then you fall into the "mismatched memory attributes"
> > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > > > > read it) and in some cases can apply to speculative accesses as well, but
> > > > > the end result is typically loss of coherency.
> > > >
> > > > Ok, I might have a few other sections to read first to decipher the
> > > > terminology..
> > > >
> > > > But my understanding of LLC is that it looks just like system memory
> > > > to the CPU and GPU (I think that would make it "the point of
> > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > > invisible from the point of view of different CPU mapping options?
> > >
> > > You could certainly build a system where mismatched attributes don't cause
> > > loss of coherence, but as it's not guaranteed by the architecture and the
> > > changes proposed here affect APIs which are exposed across SoCs, then I
> > > don't think it helps much.
> > >
> >
> > Hmm, the description of the new mapping flag is that it applies only
> > to transparent outer level cache:
> >
> > +/*
> > + * Non-coherent masters can use this page protection flag to set cacheable
> > + * memory attributes for only a transparent outer level of cache, also known as
> > + * the last-level or system cache.
> > + */
> > +#define IOMMU_LLC      (1 << 6)
> >
> > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > like that to make it more clear that it is not necessarily something
> > that would work with a different outer level cache implementation?
>
> ... or we could just deal with the problem so that other people can reuse
> the code. I haven't really understood the reluctance to solve this properly.
>
> Am I missing some reason this isn't solvable?
>

Oh, was there another way to solve it (other than forgoing setting
INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
corresponding setting on the MMU pgtables side of things?

BR,
-R

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-09 17:18                   ` Rob Clark
@ 2021-08-09 17:40                     ` Will Deacon
  2021-08-09 17:47                       ` Sai Prakash Ranjan
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2021-08-09 17:40 UTC (permalink / raw)
  To: Rob Clark
  Cc: Sai Prakash Ranjan, Georgi Djakov, Isaac J. Manjarres,
	David Airlie, Akhil P Oommen,
	list@263.net:IOMMU DRIVERS
	<iommu@lists.linux-foundation.org>,
	Joerg Roedel <joro@8bytes.org>,,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
> >
> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > > > > > >
> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > > > >
> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> > > > > > > > > > not apply anymore?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> > > > > > > > > I can repost the patch.
> > > > > > > >
> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > > > > > can't be right.
> > > > > > > >
> > > > > > >
> > > > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > > > doesn't really care about the smmu pgtable attributes..
> > > > > >
> > > > > > If the CPU accesses a shared buffer with different attributes to those which
> > > > > > the device is using then you fall into the "mismatched memory attributes"
> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > > > > > read it) and in some cases can apply to speculative accesses as well, but
> > > > > > the end result is typically loss of coherency.
> > > > >
> > > > > Ok, I might have a few other sections to read first to decipher the
> > > > > terminology..
> > > > >
> > > > > But my understanding of LLC is that it looks just like system memory
> > > > > to the CPU and GPU (I think that would make it "the point of
> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > > > invisible from the point of view of different CPU mapping options?
> > > >
> > > > You could certainly build a system where mismatched attributes don't cause
> > > > loss of coherence, but as it's not guaranteed by the architecture and the
> > > > changes proposed here affect APIs which are exposed across SoCs, then I
> > > > don't think it helps much.
> > > >
> > >
> > > Hmm, the description of the new mapping flag is that it applies only
> > > to transparent outer level cache:
> > >
> > > +/*
> > > + * Non-coherent masters can use this page protection flag to set cacheable
> > > + * memory attributes for only a transparent outer level of cache, also known as
> > > + * the last-level or system cache.
> > > + */
> > > +#define IOMMU_LLC      (1 << 6)
> > >
> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > > like that to make it more clear that it is not necessarily something
> > > that would work with a different outer level cache implementation?
> >
> > ... or we could just deal with the problem so that other people can reuse
> > the code. I haven't really understood the reluctance to solve this properly.
> >
> > Am I missing some reason this isn't solvable?
> 
> Oh, was there another way to solve it (other than forgoing setting
> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> corresponding setting on the MMU pgtables side of things?

Right -- we just need to program the CPU's MMU with the matching memory
attributes! It's a bit more fiddly if you're just using ioremap_wc()
though, as it's usually the DMA API which handles the attributes under the
hood.
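
To make the mismatch concrete, here is a small standalone sketch (not kernel
code) that compares the MAIR attribute byte behind a CPU mapping with the one
behind the SMMU mapping of the same buffer. It assumes a write-combine CPU
mapping is Normal Non-cacheable (0x44) and that the SMMU side uses inner
Non-cacheable, outer Write-Back (0xf4). The sketch_* names are hypothetical,
and only cacheability is compared, not memory type or shareability.

```c
#include <stdbool.h>
#include <stdint.h>

/* MAIR attribute bytes for Normal memory: bits [7:4] outer, bits [3:0] inner. */
#define SKETCH_ATTR_NORMAL_NC	0x44u	/* inner and outer Non-cacheable */
#define SKETCH_ATTR_NORMAL_WB	0xffu	/* inner and outer Write-Back, R/W allocate */
#define SKETCH_ATTR_INC_OWB	0xf4u	/* inner Non-cacheable, outer Write-Back */

/* Outer cacheability nibble of an attribute byte. */
static unsigned int sketch_outer(uint8_t attr)
{
	return attr >> 4;
}

/* Inner cacheability nibble of an attribute byte. */
static unsigned int sketch_inner(uint8_t attr)
{
	return attr & 0xf;
}

/*
 * True when the CPU alias and the device alias of one buffer disagree on
 * inner or outer cacheability, i.e. the aliases are mismatched.
 */
static bool sketch_attrs_mismatched(uint8_t cpu_attr, uint8_t dev_attr)
{
	return sketch_outer(cpu_attr) != sketch_outer(dev_attr) ||
	       sketch_inner(cpu_attr) != sketch_inner(dev_attr);
}
```

With these values, a write-combine CPU alias (0x44) against a 0xf4 device
mapping is reported as mismatched, while giving the CPU alias the same 0xf4
attribute is not.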

Anyway, sorry, I should've said that explicitly earlier on. We've done this
sort of thing in the Android tree so I assumed Sai knew what needed to be
done and then I didn't think to explain to you :(

Will

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-09 17:40                     ` Will Deacon
@ 2021-08-09 17:47                       ` Sai Prakash Ranjan
  2021-08-09 18:07                         ` Rob Clark
  2021-08-10  9:16                         ` Will Deacon
  0 siblings, 2 replies; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-08-09 17:47 UTC (permalink / raw)
  To: Will Deacon, Rob Clark
  Cc: Georgi Djakov, Isaac J. Manjarres, David Airlie, Akhil P Oommen,
	list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On 2021-08-09 23:10, Will Deacon wrote:
> On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
>> >
>> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
>> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
>> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
>> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
>> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
>> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
>> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
>> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
>> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
>> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
>> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
>> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
>> > > > > > > > > > >
>> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
>> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>> > > > > > > > > > > and makes GPU the user of this protection flag.
>> > > > > > > > > >
>> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
>> > > > > > > > > > not apply anymore?
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
>> > > > > > > > > I can repost the patch.
>> > > > > > > >
>> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
>> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
>> > > > > > > > can't be right.
>> > > > > > > >
>> > > > > > >
>> > > > > > > Just curious, and maybe this is a dumb question, but what is your
>> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
>> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
>> > > > > > > doesn't really care about the smmu pgtable attributes..
>> > > > > >
>> > > > > > If the CPU accesses a shared buffer with different attributes to those which
>> > > > > > the device is using then you fall into the "mismatched memory attributes"
>> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
>> > > > > > read it) and in some cases can apply to speculative accesses as well, but
>> > > > > > the end result is typically loss of coherency.
>> > > > >
>> > > > > Ok, I might have a few other sections to read first to decipher the
>> > > > > terminology..
>> > > > >
>> > > > > But my understanding of LLC is that it looks just like system memory
>> > > > > to the CPU and GPU (I think that would make it "the point of
>> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
>> > > > > invisible from the point of view of different CPU mapping options?
>> > > >
>> > > > You could certainly build a system where mismatched attributes don't cause
>> > > > loss of coherence, but as it's not guaranteed by the architecture and the
>> > > > changes proposed here affect APIs which are exposed across SoCs, then I
>> > > > don't think it helps much.
>> > > >
>> > >
>> > > Hmm, the description of the new mapping flag is that it applies only
>> > > to transparent outer level cache:
>> > >
>> > > +/*
>> > > + * Non-coherent masters can use this page protection flag to set cacheable
>> > > + * memory attributes for only a transparent outer level of cache, also known as
>> > > + * the last-level or system cache.
>> > > + */
>> > > +#define IOMMU_LLC      (1 << 6)
>> > >
>> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> > > like that to make it more clear that it is not necessarily something
>> > > that would work with a different outer level cache implementation?
>> >
>> > ... or we could just deal with the problem so that other people can reuse
>> > the code. I haven't really understood the reluctance to solve this properly.
>> >
>> > Am I missing some reason this isn't solvable?
>> 
>> Oh, was there another way to solve it (other than forgoing setting
>> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
>> corresponding setting on the MMU pgtables side of things?
> 
> Right -- we just need to program the CPU's MMU with the matching memory
> attributes! It's a bit more fiddly if you're just using ioremap_wc()
> though, as it's usually the DMA API which handles the attributes under the
> hood.
> 
> Anyway, sorry, I should've said that explicitly earlier on. We've done this
> sort of thing in the Android tree so I assumed Sai knew what needed to be
> done and then I didn't think to explain to you :(
> 

Right, I was aware of that, but even in the Android tree there is no user :)
I think we can't have a new memory type upstream without any user, the way
the Android tree does, right?

@Rob, I think you already tried adding a new MT and used pgprot_syscached()
in the GPU driver, but it was crashing?

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-09 17:47                       ` Sai Prakash Ranjan
@ 2021-08-09 18:07                         ` Rob Clark
  2021-08-09 18:10                           ` Sai Prakash Ranjan
  2021-08-10  9:16                         ` Will Deacon
  1 sibling, 1 reply; 36+ messages in thread
From: Rob Clark @ 2021-08-09 18:07 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Will Deacon, Georgi Djakov, Isaac J. Manjarres, David Airlie,
	Akhil P Oommen, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> On 2021-08-09 23:10, Will Deacon wrote:
> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
> >> >
> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> >> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
> >> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> >> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
> >> > > > > > > > > > >
> >> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> >> > > > > > > > > > > and makes GPU the user of this protection flag.
> >> > > > > > > > > >
> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> >> > > > > > > > > > not apply anymore?
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> >> > > > > > > > > I can repost the patch.
> >> > > > > > > >
> >> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> >> > > > > > > > can't be right.
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > Just curious, and maybe this is a dumb question, but what is your
> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
> >> > > > > > > doesn't really care about the smmu pgtable attributes..
> >> > > > > >
> >> > > > > > If the CPU accesses a shared buffer with different attributes to those which
> >> > > > > > the device is using then you fall into the "mismatched memory attributes"
> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> >> > > > > > read it) and in some cases can apply to speculative accesses as well, but
> >> > > > > > the end result is typically loss of coherency.
> >> > > > >
> >> > > > > Ok, I might have a few other sections to read first to decipher the
> >> > > > > terminology..
> >> > > > >
> >> > > > > But my understanding of LLC is that it looks just like system memory
> >> > > > > to the CPU and GPU (I think that would make it "the point of
> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> >> > > > > invisible from the point of view of different CPU mapping options?
> >> > > >
> >> > > > You could certainly build a system where mismatched attributes don't cause
> >> > > > loss of coherence, but as it's not guaranteed by the architecture and the
> >> > > > changes proposed here affect APIs which are exposed across SoCs, then I
> >> > > > don't think it helps much.
> >> > > >
> >> > >
> >> > > Hmm, the description of the new mapping flag is that it applies only
> >> > > to transparent outer level cache:
> >> > >
> >> > > +/*
> >> > > + * Non-coherent masters can use this page protection flag to set cacheable
> >> > > + * memory attributes for only a transparent outer level of cache, also known as
> >> > > + * the last-level or system cache.
> >> > > + */
> >> > > +#define IOMMU_LLC      (1 << 6)
> >> > >
> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> >> > > like that to make it more clear that it is not necessarily something
> >> > > that would work with a different outer level cache implementation?
> >> >
> >> > ... or we could just deal with the problem so that other people can reuse
> >> > the code. I haven't really understood the reluctance to solve this properly.
> >> >
> >> > Am I missing some reason this isn't solvable?
> >>
> >> Oh, was there another way to solve it (other than foregoing setting
> >> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> >> corresponding setting on the MMU pgtables side of things?
> >
> > Right -- we just need to program the CPU's MMU with the matching memory
> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
> > though, as it's usually the DMA API which handles the attributes under
> > the hood.
> >
> > Anyway, sorry, I should've said that explicitly earlier on. We've done
> > this sort of thing in the Android tree so I assumed Sai knew what needed
> > to be done and then I didn't think to explain to you :(
> >
>
> Right I was aware of that but even in the android tree there is no user :)
> I think we can't have a new memory type without any user right in upstream
> like android tree?
>
> @Rob, I think you already tried adding a new MT and used pgprot_syscached()
> in GPU driver but it was crashing?

Correct, but IIRC there were some differences in the code for memory
types compared to the android tree.. I couldn't figure out the
necessary patches to cherry-pick to get the android patch to apply
cleanly, so I tried re-implementing it without having much of a clue
about how that code works (which was probably the issue) ;-)

BR,
-R

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-09 18:07                         ` Rob Clark
@ 2021-08-09 18:10                           ` Sai Prakash Ranjan
  2021-08-09 18:30                             ` Rob Clark
  0 siblings, 1 reply; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-08-09 18:10 UTC (permalink / raw)
  To: Rob Clark
  Cc: Will Deacon, Georgi Djakov, Isaac J. Manjarres, David Airlie,
	Akhil P Oommen, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On 2021-08-09 23:37, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
>> 
>> On 2021-08-09 23:10, Will Deacon wrote:
>> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
>> >> >
>> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
>> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
>> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
>> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
>> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
>> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
>> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
>> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
>> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> >> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
>> >> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
>> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
>> >> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
>> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>> >> > > > > > > > > > > and makes GPU the user of this protection flag.
>> >> > > > > > > > > >
>> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
>> >> > > > > > > > > > not apply anymore?
>> >> > > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
>> >> > > > > > > > > I can repost the patch.
>> >> > > > > > > >
>> >> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
>> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
>> >> > > > > > > > can't be right.
>> >> > > > > > > >
>> >> > > > > > >
>> >> > > > > > > Just curious, and maybe this is a dumb question, but what is your
>> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
>> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
>> >> > > > > > > doesn't really care about the smmu pgtable attributes..
>> >> > > > > >
>> >> > > > > > If the CPU accesses a shared buffer with different attributes to those which
>> >> > > > > > the device is using then you fall into the "mismatched memory attributes"
>> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
>> >> > > > > > read it) and in some cases can apply to speculative accesses as well, but
>> >> > > > > > the end result is typically loss of coherency.
>> >> > > > >
>> >> > > > > Ok, I might have a few other sections to read first to decipher the
>> >> > > > > terminology..
>> >> > > > >
>> >> > > > > But my understanding of LLC is that it looks just like system memory
>> >> > > > > to the CPU and GPU (I think that would make it "the point of
>> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
>> >> > > > > invisible from the point of view of different CPU mapping options?
>> >> > > >
>> >> > > > You could certainly build a system where mismatched attributes don't cause
>> >> > > > loss of coherence, but as it's not guaranteed by the architecture and the
>> >> > > > changes proposed here affect APIs which are exposed across SoCs, then I
>> >> > > > don't think it helps much.
>> >> > > >
>> >> > >
>> >> > > Hmm, the description of the new mapping flag is that it applies only
>> >> > > to transparent outer level cache:
>> >> > >
>> >> > > +/*
>> >> > > + * Non-coherent masters can use this page protection flag to set cacheable
>> >> > > + * memory attributes for only a transparent outer level of cache, also known as
>> >> > > + * the last-level or system cache.
>> >> > > + */
>> >> > > +#define IOMMU_LLC      (1 << 6)
>> >> > >
>> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> >> > > like that to make it more clear that it is not necessarily something
>> >> > > that would work with a different outer level cache implementation?
>> >> >
>> >> > ... or we could just deal with the problem so that other people can reuse
>> >> > the code. I haven't really understood the reluctance to solve this properly.
>> >> >
>> >> > Am I missing some reason this isn't solvable?
>> >>
>> >> Oh, was there another way to solve it (other than foregoing setting
>> >> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
>> >> corresponding setting on the MMU pgtables side of things?
>> >
>> > Right -- we just need to program the CPU's MMU with the matching memory
>> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
>> > though, as it's usually the DMA API which handles the attributes under
>> > the hood.
>> >
>> > Anyway, sorry, I should've said that explicitly earlier on. We've done
>> > this sort of thing in the Android tree so I assumed Sai knew what needed
>> > to be done and then I didn't think to explain to you :(
>> >
>>
>> Right I was aware of that but even in the android tree there is no user :)
>> I think we can't have a new memory type without any user right in upstream
>> like android tree?
>>
>> @Rob, I think you already tried adding a new MT and used pgprot_syscached()
>> in GPU driver but it was crashing?
> 
> Correct, but IIRC there were some differences in the code for memory
> types compared to the android tree.. I couldn't figure out the
> necessary patches to cherry-pick to get the android patch to apply
> cleanly, so I tried re-implementing it without having much of a clue
> about how that code works (which was probably the issue) ;-)
> 

Hehe no, even I get the same crash after porting/modifying the required
patches from android ;) and I think crashes would be seen in android as
well, it's just that they don't have any user exercising that code.

Thing is I can't make head or tail of the GPU crash logs, maybe you know
how to decode those errors, if not I can start a thread with QC GPU team
and ask them to decode?

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-09 18:10                           ` Sai Prakash Ranjan
@ 2021-08-09 18:30                             ` Rob Clark
  2021-08-09 18:32                               ` Sai Prakash Ranjan
  0 siblings, 1 reply; 36+ messages in thread
From: Rob Clark @ 2021-08-09 18:30 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Will Deacon, Georgi Djakov, Isaac J. Manjarres, David Airlie,
	Akhil P Oommen, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,
	Linux Kernel Mailing List, Sean Paul, Jordan Crouse,
	Kristian H Kristensen, dri-devel, Daniel Vetter, linux-arm-msm,
	freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE

On Mon, Aug 9, 2021 at 11:11 AM Sai Prakash Ranjan
<saiprakash.ranjan@codeaurora.org> wrote:
>
> On 2021-08-09 23:37, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
> > <saiprakash.ranjan@codeaurora.org> wrote:
> >>
> >> On 2021-08-09 23:10, Will Deacon wrote:
> >> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> >> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
> >> >> >
> >> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> >> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
> >> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> >> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
> >> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> >> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
> >> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> >> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> >> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> >> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> >> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> >> >> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
> >> >> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
> >> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
> >> >> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
> >> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
> >> >> > > > > > > > > > > and makes GPU the user of this protection flag.
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
> >> >> > > > > > > > > > not apply anymore?
> >> >> > > > > > > > > >
> >> >> > > > > > > > >
> >> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
> >> >> > > > > > > > > I can repost the patch.
> >> >> > > > > > > >
> >> >> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
> >> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> >> >> > > > > > > > can't be right.
> >> >> > > > > > > >
> >> >> > > > > > >
> >> >> > > > > > > Just curious, and maybe this is a dumb question, but what is your
> >> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> >> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
> >> >> > > > > > > doesn't really care about the smmu pgtable attributes..
> >> >> > > > > >
> >> >> > > > > > If the CPU accesses a shared buffer with different attributes to those which
> >> >> > > > > > the device is using then you fall into the "mismatched memory attributes"
> >> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
> >> >> > > > > > read it) and in some cases can apply to speculative accesses as well, but
> >> >> > > > > > the end result is typically loss of coherency.
> >> >> > > > >
> >> >> > > > > Ok, I might have a few other sections to read first to decipher the
> >> >> > > > > terminology..
> >> >> > > > >
> >> >> > > > > But my understanding of LLC is that it looks just like system memory
> >> >> > > > > to the CPU and GPU (I think that would make it "the point of
> >> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> >> >> > > > > invisible from the point of view of different CPU mapping options?
> >> >> > > >
> >> >> > > > You could certainly build a system where mismatched attributes don't cause
> >> >> > > > loss of coherence, but as it's not guaranteed by the architecture and the
> >> >> > > > changes proposed here affect APIs which are exposed across SoCs, then I
> >> >> > > > don't think it helps much.
> >> >> > > >
> >> >> > >
> >> >> > > Hmm, the description of the new mapping flag is that it applies only
> >> >> > > to transparent outer level cache:
> >> >> > >
> >> >> > > +/*
> >> >> > > + * Non-coherent masters can use this page protection flag to set cacheable
> >> >> > > + * memory attributes for only a transparent outer level of cache, also known as
> >> >> > > + * the last-level or system cache.
> >> >> > > + */
> >> >> > > +#define IOMMU_LLC      (1 << 6)
> >> >> > >
> >> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> >> >> > > like that to make it more clear that it is not necessarily something
> >> >> > > that would work with a different outer level cache implementation?
> >> >> >
> >> >> > ... or we could just deal with the problem so that other people can reuse
> >> >> > the code. I haven't really understood the reluctance to solve this properly.
> >> >> >
> >> >> > Am I missing some reason this isn't solvable?
> >> >>
> >> >> Oh, was there another way to solve it (other than foregoing setting
> >> >> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> >> >> corresponding setting on the MMU pgtables side of things?
> >> >
> >> > Right -- we just need to program the CPU's MMU with the matching memory
> >> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
> >> > though, as it's usually the DMA API which handles the attributes under
> >> > the hood.
> >> >
> >> > Anyway, sorry, I should've said that explicitly earlier on. We've done
> >> > this sort of thing in the Android tree so I assumed Sai knew what needed
> >> > to be done and then I didn't think to explain to you :(
> >> >
> >>
> >> Right I was aware of that but even in the android tree there is no user :)
> >> I think we can't have a new memory type without any user right in upstream
> >> like android tree?
> >>
> >> @Rob, I think you already tried adding a new MT and used pgprot_syscached()
> >> in GPU driver but it was crashing?
> >
> > Correct, but IIRC there were some differences in the code for memory
> > types compared to the android tree.. I couldn't figure out the
> > necessary patches to cherry-pick to get the android patch to apply
> > cleanly, so I tried re-implementing it without having much of a clue
> > about how that code works (which was probably the issue) ;-)
> >
>
> Hehe no, even I get the same crash after porting/modifying the required
> patches from android ;) and I think crashes would be seen in android as
> well, it's just that they don't have any user exercising that code.
>
> Thing is I can't make head or tail of the GPU crash logs, maybe you know
> how to decode those errors, if not I can start a thread with QC GPU team
> and ask them to decode?
>

If you have a gpu devcore dump, I can take a look at it with
crashdec.. otherwise I can try to find the branch where I had that
patch backported.

I'm more familiar with using crashdec to figure out mesa bugs, but
maybe I could spot something where what the GPU is seeing disagrees
with what the CPU expects it to be seeing.

BR,
-R

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-09 18:30                             ` Rob Clark
@ 2021-08-09 18:32                               ` Sai Prakash Ranjan
  0 siblings, 0 replies; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-08-09 18:32 UTC (permalink / raw)
  To: Rob Clark
  Cc: Will Deacon, Georgi Djakov, Isaac J. Manjarres, David Airlie,
	Akhil P Oommen, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,
	Linux Kernel Mailing List, Sean Paul, Kristian H Kristensen,
	dri-devel, Daniel Vetter, linux-arm-msm, freedreno, Robin Murphy,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Jordan Crouse

On 2021-08-10 00:00, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 11:11 AM Sai Prakash Ranjan
> <saiprakash.ranjan@codeaurora.org> wrote:
>> 
>> On 2021-08-09 23:37, Rob Clark wrote:
>> > On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
>> > <saiprakash.ranjan@codeaurora.org> wrote:
>> >>
>> >> On 2021-08-09 23:10, Will Deacon wrote:
>> >> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> >> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
>> >> >> >
>> >> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> >> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon <will@kernel.org> wrote:
>> >> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
>> >> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon <will@kernel.org> wrote:
>> >> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
>> >> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon <will@kernel.org> wrote:
>> >> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
>> >> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
>> >> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
>> >> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> >> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> >> >> > > > > > > > > > > the memory type setting required for the non-coherent masters to use
>> >> >> > > > > > > > > > > system cache. Now that system cache support for GPU is added, we will
>> >> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys cached.
>> >> >> > > > > > > > > > > Without this, the system cache lines are not allocated for GPU.
>> >> >> > > > > > > > > > >
>> >> >> > > > > > > > > > > So the patches in this series introduces a new prot flag IOMMU_LLC,
>> >> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>> >> >> > > > > > > > > > > and makes GPU the user of this protection flag.
>> >> >> > > > > > > > > >
>> >> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, as it does
>> >> >> > > > > > > > > > not apply anymore?
>> >> >> > > > > > > > > >
>> >> >> > > > > > > > >
>> >> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes needed, then
>> >> >> > > > > > > > > I can repost the patch.
>> >> >> > > > > > > >
>> >> >> > > > > > > > I still think you need to handle the mismatched alias, no? You're adding
>> >> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
>> >> >> > > > > > > > can't be right.
>> >> >> > > > > > > >
>> >> >> > > > > > >
>> >> >> > > > > > > Just curious, and maybe this is a dumb question, but what is your
>> >> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
>> >> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
>> >> >> > > > > > > doesn't really care about the smmu pgtable attributes..
>> >> >> > > > > >
>> >> >> > > > > > If the CPU accesses a shared buffer with different attributes to those which
>> >> >> > > > > > the device is using then you fall into the "mismatched memory attributes"
>> >> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you should go and
>> >> >> > > > > > read it) and in some cases can apply to speculative accesses as well, but
>> >> >> > > > > > the end result is typically loss of coherency.
>> >> >> > > > >
>> >> >> > > > > Ok, I might have a few other sections to read first to decipher the
>> >> >> > > > > terminology..
>> >> >> > > > >
>> >> >> > > > > But my understanding of LLC is that it looks just like system memory
>> >> >> > > > > to the CPU and GPU (I think that would make it "the point of
>> >> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
>> >> >> > > > > invisible from the point of view of different CPU mapping options?
>> >> >> > > >
>> >> >> > > > You could certainly build a system where mismatched attributes don't cause
>> >> >> > > > loss of coherence, but as it's not guaranteed by the architecture and the
>> >> >> > > > changes proposed here affect APIs which are exposed across SoCs, then I
>> >> >> > > > don't think it helps much.
>> >> >> > > >
>> >> >> > >
>> >> >> > > Hmm, the description of the new mapping flag is that it applies only
>> >> >> > > to transparent outer level cache:
>> >> >> > >
>> >> >> > > +/*
>> >> >> > > + * Non-coherent masters can use this page protection flag to set cacheable
>> >> >> > > + * memory attributes for only a transparent outer level of cache, also known as
>> >> >> > > + * the last-level or system cache.
>> >> >> > > + */
>> >> >> > > +#define IOMMU_LLC      (1 << 6)
>> >> >> > >
>> >> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> >> >> > > like that to make it more clear that it is not necessarily something
>> >> >> > > that would work with a different outer level cache implementation?
>> >> >> >
>> >> >> > ... or we could just deal with the problem so that other people can reuse
>> >> >> > the code. I haven't really understood the reluctance to solve this properly.
>> >> >> >
>> >> >> > Am I missing some reason this isn't solvable?
>> >> >>
>> >> >> Oh, was there another way to solve it (other than foregoing setting
>> >> >> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
>> >> >> corresponding setting on the MMU pgtables side of things?
>> >> >
>> >> > Right -- we just need to program the CPU's MMU with the matching memory
>> >> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
>> >> > though, as it's usually the DMA API which handles the attributes under
>> >> > the hood.
>> >> >
>> >> > Anyway, sorry, I should've said that explicitly earlier on. We've done
>> >> > this sort of thing in the Android tree so I assumed Sai knew what needed
>> >> > to be done and then I didn't think to explain to you :(
>> >> >
>> >>
>> >> Right I was aware of that but even in the android tree there is no user :)
>> >> I think we can't have a new memory type without any user right in upstream
>> >> like android tree?
>> >>
>> >> @Rob, I think you already tried adding a new MT and used pgprot_syscached()
>> >> in GPU driver but it was crashing?
>> >
>> > Correct, but IIRC there were some differences in the code for memory
>> > types compared to the android tree.. I couldn't figure out the
>> > necessary patches to cherry-pick to get the android patch to apply
>> > cleanly, so I tried re-implementing it without having much of a clue
>> > about how that code works (which was probably the issue) ;-)
>> >
>> 
>> Hehe no, even I get the same crash after porting/modifying the required
>> patches from android ;) and I think crashes would be seen in android as
>> well, it's just that they don't have any user exercising that code.
>>
>> Thing is I can't make head or tail of the GPU crash logs, maybe you know
>> how to decode those errors, if not I can start a thread with QC GPU team
>> and ask them to decode?
>> 
> 
> If you have a gpu devcore dump, I can take a look at it with
> crashdec.. otherwise I can try to find the branch where I had that
> patch backported.
> 
> I'm more familiar with using crashdec to figure out mesa bugs, but
> maybe I could spot something where what the GPU is seeing disagrees
> with what the CPU expects it to be seeing.
> 

Sure, I will get a devcoredump tomorrow and attach it to the bug; currently
I don't have it handy.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-09 17:47                       ` Sai Prakash Ranjan
  2021-08-09 18:07                         ` Rob Clark
@ 2021-08-10  9:16                         ` Will Deacon
  2021-08-10  9:54                           ` Sai Prakash Ranjan
  1 sibling, 1 reply; 36+ messages in thread
From: Will Deacon @ 2021-08-10  9:16 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Rob Clark, Isaac J. Manjarres, freedreno, Jordan Crouse,
	David Airlie, linux-arm-msm, Akhil P Oommen, dri-devel,
	Linux Kernel Mailing List, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,
	Kristian H Kristensen, Daniel Vetter, Sean Paul,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Robin Murphy

On Mon, Aug 09, 2021 at 11:17:40PM +0530, Sai Prakash Ranjan wrote:
> On 2021-08-09 23:10, Will Deacon wrote:
> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> > > On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
> > > > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > > > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > > > > like that to make it more clear that it is not necessarily something
> > > > > that would work with a different outer level cache implementation?
> > > >
> > > > ... or we could just deal with the problem so that other people can reuse
> > > > the code. I haven't really understood the reluctance to solve this properly.
> > > >
> > > > Am I missing some reason this isn't solvable?
> > > 
> > > Oh, was there another way to solve it (other than foregoing setting
> > > INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> > > corresponding setting on the MMU pgtables side of things?
> > 
> > Right -- we just need to program the CPU's MMU with the matching memory
> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
> > though, as it's usually the DMA API which handles the attributes under
> > the hood.
> >
> > Anyway, sorry, I should've said that explicitly earlier on. We've done
> > this sort of thing in the Android tree so I assumed Sai knew what needed
> > to be done and then I didn't think to explain to you :(
> > 
> 
> Right I was aware of that but even in the android tree there is no user :)

I'm assuming there are vendor modules using it there, otherwise we wouldn't
have been asked to put it in. Since you work at Qualcomm, maybe you could
talk to your colleagues (Isaac and Patrick) directly?

> I think we can't have a new memory type without any user right in upstream
> like android tree?

Correct. But I don't think we should be adding IOMMU_* anything upstream
if we don't have a user.

Will

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache
  2021-08-10  9:16                         ` Will Deacon
@ 2021-08-10  9:54                           ` Sai Prakash Ranjan
  0 siblings, 0 replies; 36+ messages in thread
From: Sai Prakash Ranjan @ 2021-08-10  9:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Rob Clark, Isaac J. Manjarres, freedreno, Jordan Crouse,
	David Airlie, linux-arm-msm, Akhil P Oommen, dri-devel,
	Linux Kernel Mailing List, list@263.net:IOMMU DRIVERS ,
	Joerg Roedel <joro@8bytes.org>,
	Kristian H Kristensen, Daniel Vetter, Sean Paul,
	moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE,
	Robin Murphy

On 2021-08-10 14:46, Will Deacon wrote:
> On Mon, Aug 09, 2021 at 11:17:40PM +0530, Sai Prakash Ranjan wrote:
>> On 2021-08-09 23:10, Will Deacon wrote:
>> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> > > On Mon, Aug 9, 2021 at 10:05 AM Will Deacon <will@kernel.org> wrote:
>> > > > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> > > > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> > > > > like that to make it more clear that it is not necessarily something
>> > > > > that would work with a different outer level cache implementation?
>> > > >
>> > > > ... or we could just deal with the problem so that other people can reuse
>> > > > the code. I haven't really understood the reluctance to solve this properly.
>> > > >
>> > > > Am I missing some reason this isn't solvable?
>> > >
>> > > Oh, was there another way to solve it (other than foregoing setting
>> > > INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
>> > > corresponding setting on the MMU pgtables side of things?
>> >
>> > Right -- we just need to program the CPU's MMU with the matching memory
>> > attributes! It's a bit more fiddly if you're just using ioremap_wc()
>> > though, as it's usually the DMA API which handles the attributes under
>> > the hood.
>> >
>> > Anyway, sorry, I should've said that explicitly earlier on. We've done this
>> > sort of thing in the Android tree so I assumed Sai knew what needed to be
>> > done and then I didn't think to explain to you :(
>> >
>> 
>> Right I was aware of that but even in the android tree there is no user :)
> 
> I'm assuming there are vendor modules using it there, otherwise we wouldn't
> have been asked to put it in. Since you work at Qualcomm, maybe you could
> talk to your colleagues (Isaac and Patrick) directly?
> 

Right, I will check with them regarding the vendor modules in Android.

>> I think we can't have a new memory type without any user right in upstream
>> like android tree?
> 
> Correct. But I don't think we should be adding IOMMU_* anything upstream
> if we don't have a user.
> 

Agreed. Once we have the fix for the GPU crash, I can continue working on
using this properly.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


end of thread

Thread overview: 36+ messages
2021-01-11 14:15 [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
2021-01-11 14:15 ` [PATCH 1/3] iommu/io-pgtable: Rename last-level cache quirk to IO_PGTABLE_QUIRK_PTW_LLC Sai Prakash Ranjan
2021-01-11 14:15 ` [PATCH 2/3] iommu/io-pgtable-arm: Add IOMMU_LLC page protection flag Sai Prakash Ranjan
     [not found]   ` <20210129090516.GB3998@willie-the-truck>
     [not found]     ` <5d23fce629323bcda71594010824aad0@codeaurora.org>
2021-02-01 11:15       ` Will Deacon
2021-02-01 16:20         ` Rob Clark
2021-02-01 18:20           ` Jordan Crouse
2021-02-02  6:26             ` Sai Prakash Ranjan
2021-02-03 21:46               ` Will Deacon
2021-02-03 22:14                 ` Rob Clark
2021-03-16 17:04                   ` Rob Clark
2021-03-16 17:16                     ` Rob Clark
2021-02-05 12:08                 ` Sai Prakash Ranjan
2021-03-09  6:40                   ` Sai Prakash Ranjan
2021-03-25 17:33                     ` Will Deacon
2021-06-30 10:07                       ` Sai Prakash Ranjan
2021-02-02  6:28             ` Sai Prakash Ranjan
2021-01-11 14:15 ` [PATCH 3/3] drm/msm: Use IOMMU_LLC page protection flag to map gpu buffers Sai Prakash Ranjan
2021-01-20  5:18 ` [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
2021-07-28 14:00 ` Georgi Djakov
2021-07-29  4:38   ` Sai Prakash Ranjan
2021-08-02 10:55     ` Will Deacon
2021-08-02 15:08       ` [Freedreno] " Rob Clark
2021-08-02 15:14         ` Will Deacon
2021-08-03  1:36           ` Rob Clark
2021-08-09 14:56             ` Will Deacon
2021-08-09 16:57               ` Rob Clark
2021-08-09 17:05                 ` Will Deacon
2021-08-09 17:18                   ` Rob Clark
2021-08-09 17:40                     ` Will Deacon
2021-08-09 17:47                       ` Sai Prakash Ranjan
2021-08-09 18:07                         ` Rob Clark
2021-08-09 18:10                           ` Sai Prakash Ranjan
2021-08-09 18:30                             ` Rob Clark
2021-08-09 18:32                               ` Sai Prakash Ranjan
2021-08-10  9:16                         ` Will Deacon
2021-08-10  9:54                           ` Sai Prakash Ranjan
