* [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
@ 2020-12-24  6:40 Sai Prakash Ranjan
  2021-01-06 11:56 ` Will Deacon
  0 siblings, 1 reply; 10+ messages in thread
From: Sai Prakash Ranjan @ 2020-12-24  6:40 UTC (permalink / raw)
  To: Will Deacon, Robin Murphy, Joerg Roedel
  Cc: iommu, linux-arm-kernel, linux-kernel, linux-arm-msm,
	Jordan Crouse, Rob Clark, Akhil P Oommen, Sai Prakash Ranjan

commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
the memory type setting required for the non-coherent masters to use
system cache. Now that system cache support for GPU is added, we will
need to mark the memory as normal sys-cached for GPU to use system cache.
Without this, the system cache lines are not allocated for GPU. We use
the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection
flag as the flag cannot be exposed via DMA api because of no in-tree
users.

Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
---
 drivers/iommu/io-pgtable-arm.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 7c9ea9d7874a..3fb7de8304a2 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
 		else if (prot & IOMMU_CACHE)
 			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
 				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
+		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
+			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
+				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
 	}
 
 	if (prot & IOMMU_CACHE)
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



* Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
  2020-12-24  6:40 [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
@ 2021-01-06 11:56 ` Will Deacon
  2021-01-07  6:35   ` Sai Prakash Ranjan
  2021-01-07 16:57   ` isaacm
  0 siblings, 2 replies; 10+ messages in thread
From: Will Deacon @ 2021-01-06 11:56 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Robin Murphy, Joerg Roedel, iommu, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jordan Crouse, Rob Clark,
	Akhil P Oommen

On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> the memory type setting required for the non-coherent masters to use
> system cache. Now that system cache support for GPU is added, we will
> need to mark the memory as normal sys-cached for GPU to use system cache.
> Without this, the system cache lines are not allocated for GPU. We use
> the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection
> flag as the flag cannot be exposed via DMA api because of no in-tree
> users.
> 
> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> ---
>  drivers/iommu/io-pgtable-arm.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 7c9ea9d7874a..3fb7de8304a2 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>  		else if (prot & IOMMU_CACHE)
>  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
> +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>  	}

drivers/iommu/io-pgtable.c currently documents this quirk as applying only
to the page-table walker. Given that we only have one user at the moment,
I think it's ok to change that, but please update the comment.

We also need to decide on whether we want to allow the quirk to be passed
if the coherency of the page-table walker differs from the DMA device, since
we have these combinations:

	Coherent walker?	IOMMU_CACHE	IO_PGTABLE_QUIRK_ARM_OUTER_WBWA
0:	N			0		0
1:	N			0		1
2:	N			1		0
3:	N			1		1
4:	Y			0		0
5:	Y			0		1
6:	Y			1		0
7:	Y			1		1

Some of them are obviously bogus, such as (7), but I don't know what to
do about cases such as (3) and (5).
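
For reference, the eight rows can be run through a small stand-alone model of
the attribute selection with the patch applied. This is only a sketch: the bit
positions and enum values are placeholders picked for the example rather than
copied from the kernel headers, the IOMMU_MMIO check is assumed to be the
branch preceding the quoted hunk, and attr_idx_for(), attr_idx_name() and
print_combinations() are made-up helper names. Walker coherency never enters
the selection, so rows that differ only in that column come out identical.

```c
#include <stdio.h>

/* Placeholder bit values, chosen for this example only. */
#define IOMMU_CACHE                     (1u << 2)
#define IOMMU_MMIO                      (1u << 4)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA (1ul << 6)

/* Stand-ins for the ARM_LPAE_MAIR_ATTR_IDX_* indices. */
enum attr_idx {
	ATTR_IDX_NC,         /* inner and outer non-cacheable */
	ATTR_IDX_CACHE,      /* inner and outer write-back */
	ATTR_IDX_DEV,        /* device memory */
	ATTR_IDX_INC_OCACHE, /* inner non-cacheable, outer cacheable */
};

static const char *attr_idx_name(enum attr_idx idx)
{
	switch (idx) {
	case ATTR_IDX_NC:         return "NC";
	case ATTR_IDX_CACHE:      return "CACHE";
	case ATTR_IDX_DEV:        return "DEV";
	case ATTR_IDX_INC_OCACHE: return "INC_OCACHE";
	}
	return "?";
}

/* Mirrors the if / else-if chain of the stage-1 branch with the patch applied. */
static enum attr_idx attr_idx_for(unsigned int prot, unsigned long quirks)
{
	if (prot & IOMMU_MMIO)
		return ATTR_IDX_DEV;
	if (prot & IOMMU_CACHE)
		return ATTR_IDX_CACHE;
	if (quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
		return ATTR_IDX_INC_OCACHE;
	return ATTR_IDX_NC;
}

/* Walks the rows in the same order as the table above. */
static void print_combinations(void)
{
	int row = 0;

	for (int walker = 0; walker <= 1; walker++)
		for (int cache = 0; cache <= 1; cache++)
			for (int quirk = 0; quirk <= 1; quirk++) {
				unsigned int prot = cache ? IOMMU_CACHE : 0;
				unsigned long quirks =
					quirk ? IO_PGTABLE_QUIRK_ARM_OUTER_WBWA : 0;

				printf("%d: walker=%c cache=%d quirk=%d -> %s\n",
				       row++, walker ? 'Y' : 'N', cache, quirk,
				       attr_idx_name(attr_idx_for(prot, quirks)));
			}
}
```

Calling print_combinations() shows that the quirk only changes the result in
rows where IOMMU_CACHE is clear, because the IOMMU_CACHE check comes first.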

Will


* Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
  2021-01-06 11:56 ` Will Deacon
@ 2021-01-07  6:35   ` Sai Prakash Ranjan
  2021-01-07 16:57   ` isaacm
  1 sibling, 0 replies; 10+ messages in thread
From: Sai Prakash Ranjan @ 2021-01-07  6:35 UTC (permalink / raw)
  To: Will Deacon
  Cc: Robin Murphy, Joerg Roedel, iommu, linux-arm-kernel,
	linux-kernel, linux-arm-msm, Jordan Crouse, Rob Clark,
	Akhil P Oommen

Hi Will,

On 2021-01-06 17:26, Will Deacon wrote:
> On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
>> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> the memory type setting required for the non-coherent masters to use
>> system cache. Now that system cache support for GPU is added, we will
>> need to mark the memory as normal sys-cached for GPU to use system 
>> cache.
>> Without this, the system cache lines are not allocated for GPU. We use
>> the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection
>> flag as the flag cannot be exposed via DMA api because of no in-tree
>> users.
>> 
>> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>> ---
>>  drivers/iommu/io-pgtable-arm.c | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/drivers/iommu/io-pgtable-arm.c 
>> b/drivers/iommu/io-pgtable-arm.c
>> index 7c9ea9d7874a..3fb7de8304a2 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
>> arm_lpae_io_pgtable *data,
>>  		else if (prot & IOMMU_CACHE)
>>  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
>> +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
>> +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>>  	}
> 
> drivers/iommu/io-pgtable.c currently documents this quirk as applying 
> only
> to the page-table walker. Given that we only have one user at the 
> moment,
> I think it's ok to change that, but please update the comment.
> 

Sure, how about this change in comment:

          * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
-        *      attributes set in the TCR for a non-coherent page-table walker.
+        *      attributes set in the TCR for a non-coherent page-table walker
+        *      and also to set the correct cacheability attributes to use an
+        *      outer level of cache for non-coherent masters.

> We also need to decide on whether we want to allow the quirk to be 
> passed
> if the coherency of the page-table walker differs from the DMA device, 
> since
> we have these combinations:
> 
> 	Coherent walker?	IOMMU_CACHE	IO_PGTABLE_QUIRK_ARM_OUTER_WBWA
> 0:	N			0		0
> 1:	N			0		1
> 2:	N			1		0
> 3:	N			1		1
> 4:	Y			0		0
> 5:	Y			0		1
> 6:	Y			1		0
> 7:	Y			1		1
> 
> Some of them are obviously bogus, such as (7), but I don't know what to
> do about cases such as (3) and (5).
> 

I thought this was already decided when the IOMMU_SYS_CACHE_ONLY prot flag
was added in this same location [1]. dma-coherent masters can use the normal
cached memory type to use the system cache, and non dma-coherent masters
that want to use the system cache should use the normal sys-cached memory
type with this quirk.
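
To make "normal sys-cached" concrete: in the Armv8 MAIR encoding, each
attribute byte holds the outer attributes in bits [7:4] and the inner
attributes in bits [3:0]. The sketch below composes such bytes. The macro and
function names are invented for the example, and whether the resulting values
match what the driver programs should be checked against io-pgtable-arm.c.

```c
#include <stdint.h>

/* Normal-memory nibble encodings from the Armv8 MAIR_ELx format. */
#define MAIR_NORMAL_NC     0x4u /* 0b0100: non-cacheable */
#define MAIR_NORMAL_WB_RWA 0xfu /* 0b1111: write-back, read/write-allocate */

/* Build one attribute byte: outer in the high nibble, inner in the low one. */
static inline uint8_t mair_attr(uint8_t outer, uint8_t inner)
{
	return (uint8_t)(((outer & 0xfu) << 4) | (inner & 0xfu));
}

/* Place an attribute byte into slot idx (0..7) of a 64-bit MAIR value. */
static inline uint64_t mair_set(uint64_t mair, unsigned int idx, uint8_t attr)
{
	return mair | ((uint64_t)attr << (idx * 8));
}
```

With these helpers, mair_attr(MAIR_NORMAL_WB_RWA, MAIR_NORMAL_NC) gives the
"inner non-cacheable, outer write-back" type that a non-coherent master would
need in order to allocate in an outer (system) cache, while passing
MAIR_NORMAL_WB_RWA for both nibbles gives the fully cacheable type.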

[1] https://lore.kernel.org/linux-arm-msm/20190516093020.18028-1-vivek.gautam@codeaurora.org/

Thanks,
Sai



* Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
  2021-01-06 11:56 ` Will Deacon
  2021-01-07  6:35   ` Sai Prakash Ranjan
@ 2021-01-07 16:57   ` isaacm
  2021-01-08  5:47     ` Sai Prakash Ranjan
  1 sibling, 1 reply; 10+ messages in thread
From: isaacm @ 2021-01-07 16:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, Rob Clark, Jordan Crouse, linux-arm-msm,
	Joerg Roedel, linux-kernel, Akhil P Oommen, iommu, Robin Murphy,
	linux-arm-kernel

On 2021-01-06 03:56, Will Deacon wrote:
> On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
>> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> the memory type setting required for the non-coherent masters to use
>> system cache. Now that system cache support for GPU is added, we will
>> need to mark the memory as normal sys-cached for GPU to use system 
>> cache.
>> Without this, the system cache lines are not allocated for GPU. We use
>> the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection
>> flag as the flag cannot be exposed via DMA api because of no in-tree
>> users.
>> 
>> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>> ---
>>  drivers/iommu/io-pgtable-arm.c | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/drivers/iommu/io-pgtable-arm.c 
>> b/drivers/iommu/io-pgtable-arm.c
>> index 7c9ea9d7874a..3fb7de8304a2 100644
>> --- a/drivers/iommu/io-pgtable-arm.c
>> +++ b/drivers/iommu/io-pgtable-arm.c
>> @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
>> arm_lpae_io_pgtable *data,
>>  		else if (prot & IOMMU_CACHE)
>>  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
>> +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
>> +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>>  	}
> 
While this approach of enabling the system cache globally for both page
tables and other buffers works for the GPU use case, it isn't ideal for
other clients that use the system cache. For example, video clients only
want to cache a subset of their buffers in the system cache, because of the
limit on how much of the system cache they can use. So it would be ideal to
have a way of requesting system cache use on a per-buffer basis.
Additionally, our video clients use the DMA layer, and since caching in the
system cache needs to be a per-buffer attribute, it seems like we would need
a DMA attribute to express it.

Thanks,
Isaac

> drivers/iommu/io-pgtable.c currently documents this quirk as applying 
> only
> to the page-table walker. Given that we only have one user at the 
> moment,
> I think it's ok to change that, but please update the comment.
> 
> We also need to decide on whether we want to allow the quirk to be 
> passed
> if the coherency of the page-table walker differs from the DMA device, 
> since
> we have these combinations:
> 
> 	Coherent walker?	IOMMU_CACHE	IO_PGTABLE_QUIRK_ARM_OUTER_WBWA
> 0:	N			0		0
> 1:	N			0		1
> 2:	N			1		0
> 3:	N			1		1
> 4:	Y			0		0
> 5:	Y			0		1
> 6:	Y			1		0
> 7:	Y			1		1
> 
> Some of them are obviously bogus, such as (7), but I don't know what to
> do about cases such as (3) and (5).
> 
> Will
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
  2021-01-07 16:57   ` isaacm
@ 2021-01-08  5:47     ` Sai Prakash Ranjan
  2021-01-08 18:09       ` isaacm
  2021-01-08 18:18       ` Will Deacon
  0 siblings, 2 replies; 10+ messages in thread
From: Sai Prakash Ranjan @ 2021-01-08  5:47 UTC (permalink / raw)
  To: isaacm
  Cc: Will Deacon, Rob Clark, Jordan Crouse, linux-arm-msm,
	Joerg Roedel, linux-kernel, Akhil P Oommen, iommu, Robin Murphy,
	linux-arm-kernel

On 2021-01-07 22:27, isaacm@codeaurora.org wrote:
> On 2021-01-06 03:56, Will Deacon wrote:
>> On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
>>> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY 
>>> flag")
>>> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>>> the memory type setting required for the non-coherent masters to use
>>> system cache. Now that system cache support for GPU is added, we will
>>> need to mark the memory as normal sys-cached for GPU to use system 
>>> cache.
>>> Without this, the system cache lines are not allocated for GPU. We 
>>> use
>>> the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page 
>>> protection
>>> flag as the flag cannot be exposed via DMA api because of no in-tree
>>> users.
>>> 
>>> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>>> ---
>>>  drivers/iommu/io-pgtable-arm.c | 3 +++
>>>  1 file changed, 3 insertions(+)
>>> 
>>> diff --git a/drivers/iommu/io-pgtable-arm.c 
>>> b/drivers/iommu/io-pgtable-arm.c
>>> index 7c9ea9d7874a..3fb7de8304a2 100644
>>> --- a/drivers/iommu/io-pgtable-arm.c
>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>> @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct 
>>> arm_lpae_io_pgtable *data,
>>>  		else if (prot & IOMMU_CACHE)
>>>  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>>>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>>> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
>>> +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
>>> +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>>>  	}
>> 
> While this approach of enabling system cache globally for both page
> tables and other buffers
> works for the GPU usecase, this isn't ideal for other clients that use
> system cache. For example,
> video clients only want to cache a subset of their buffers in the
> system cache, due to the sizing constraint
> imposed by how much of the system cache they can use. So, it would be
> ideal to have
> a way of expressing the desire to use the system cache on a per-buffer
> basis. Additionally,
> our video clients use the DMA layer, and since the requirement is for
> caching in the system cache
> to be a per buffer attribute, it seems like we would have to have a
> DMA attribute to express
> this on a per-buffer basis.
> 

I did bring this up initially [1]. Also, where is this video client
upstream? AFAIK, the only system cache user upstream is the GPU.
We cannot add a DMA attribute unless there is an upstream user,
as per [2]. So when support for such a client is added, wouldn't
((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG)
work?

[1] https://lore.kernel.org/dri-devel/ecfda7ca80f6d7b4ff3d89b8758f4dc9@codeaurora.org/
[2] https://lore.kernel.org/linux-iommu/20191026053026.GA14545@lst.de/T/

Thanks,
Sai



* Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
  2021-01-08  5:47     ` Sai Prakash Ranjan
@ 2021-01-08 18:09       ` isaacm
  2021-01-11  4:38         ` Sai Prakash Ranjan
  2021-01-08 18:18       ` Will Deacon
  1 sibling, 1 reply; 10+ messages in thread
From: isaacm @ 2021-01-08 18:09 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: Will Deacon, Rob Clark, Jordan Crouse, linux-arm-msm,
	Joerg Roedel, linux-kernel, Akhil P Oommen, iommu, Robin Murphy,
	linux-arm-kernel

On 2021-01-07 21:47, Sai Prakash Ranjan wrote:
> On 2021-01-07 22:27, isaacm@codeaurora.org wrote:
>> On 2021-01-06 03:56, Will Deacon wrote:
>>> On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
>>>> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY 
>>>> flag")
>>>> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>>>> the memory type setting required for the non-coherent masters to use
>>>> system cache. Now that system cache support for GPU is added, we 
>>>> will
>>>> need to mark the memory as normal sys-cached for GPU to use system 
>>>> cache.
>>>> Without this, the system cache lines are not allocated for GPU. We 
>>>> use
>>>> the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page 
>>>> protection
>>>> flag as the flag cannot be exposed via DMA api because of no in-tree
>>>> users.
>>>> 
>>>> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>>>> ---
>>>>  drivers/iommu/io-pgtable-arm.c | 3 +++
>>>>  1 file changed, 3 insertions(+)
>>>> 
>>>> diff --git a/drivers/iommu/io-pgtable-arm.c 
>>>> b/drivers/iommu/io-pgtable-arm.c
>>>> index 7c9ea9d7874a..3fb7de8304a2 100644
>>>> --- a/drivers/iommu/io-pgtable-arm.c
>>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>>> @@ -415,6 +415,9 @@ static arm_lpae_iopte 
>>>> arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>>>>  		else if (prot & IOMMU_CACHE)
>>>>  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>>>>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>>>> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
>>>> +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
>>>> +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>>>>  	}
>>> 
>> While this approach of enabling system cache globally for both page
>> tables and other buffers
>> works for the GPU usecase, this isn't ideal for other clients that use
>> system cache. For example,
>> video clients only want to cache a subset of their buffers in the
>> system cache, due to the sizing constraint
>> imposed by how much of the system cache they can use. So, it would be
>> ideal to have
>> a way of expressing the desire to use the system cache on a per-buffer
>> basis. Additionally,
>> our video clients use the DMA layer, and since the requirement is for
>> caching in the system cache
>> to be a per buffer attribute, it seems like we would have to have a
>> DMA attribute to express
>> this on a per-buffer basis.
>> 
> 
> I did bring this up initially [1], also where is this video client
> in upstream? AFAIK, only system cache user in upstream is GPU.
> We cannot add any DMA attribute unless there is any user upstream

Right, there wouldn't be an upstream user, which would be problematic,
but I was thinking of arranging things so that when video or any of our
other clients that use this attribute on a per-buffer basis upstream their
code, it's not too much of a stretch to add the support.

> as per [2], so when the support for such a client is added, wouldn't
> ((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG)
> work?

I don't think that will work, because we currently have clients that use
the system cache as follows:
- cache only page tables in the system cache
- cache only data buffers in the system cache
- cache both page tables and all buffers in the system cache
- cache both page tables and some buffers in the system cache

The approach you're suggesting doesn't allow for the last case: caching the
page tables in the system cache involves setting
IO_PGTABLE_QUIRK_ARM_OUTER_WBWA, so we would lose the flexibility to cache
only some data buffers in the system cache.

Ideally, the page table quirk would drive the settings for the TCR and the
prot flag would drive the PTE for the mapping, just as the page-table walker
can be dma-coherent while buffers are mapped as cacheable based on
IOMMU_CACHE. Thoughts?
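
The difference between the two schemes can be sketched with a toy model. The
names here are hypothetical: QUIRK_OUTER_WBWA stands in for
IO_PGTABLE_QUIRK_ARM_OUTER_WBWA, PROT_SYS_CACHE for the "PROT_FLAG"
placeholder above, and the bit values are arbitrary. It only models the
decision "is this buffer mapped as system-cached?", nothing else.

```c
#include <stdbool.h>

#define QUIRK_OUTER_WBWA (1u << 0) /* stand-in for the page-table quirk */
#define PROT_SYS_CACHE   (1u << 0) /* hypothetical per-buffer prot flag */

/* Combined scheme: (quirk || PROT_FLAG) decides the buffer attribute. */
static bool buf_sys_cached_combined(unsigned int quirks, unsigned int prot)
{
	return (quirks & QUIRK_OUTER_WBWA) || (prot & PROT_SYS_CACHE);
}

/*
 * Split scheme: the quirk only affects the walker (TCR), so the buffer
 * attribute depends on the prot flag alone.
 */
static bool buf_sys_cached_split(unsigned int quirks, unsigned int prot)
{
	(void)quirks; /* deliberately ignored for data mappings */
	return (prot & PROT_SYS_CACHE) != 0;
}
```

For a domain with the quirk set and a buffer mapped without the prot flag,
the combined check returns true and the split check returns false. That is
the "page tables and some buffers" case: only the split scheme can leave an
individual buffer out of the system cache once the quirk is set.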

Thanks,
Isaac
> 
> [1]
> https://lore.kernel.org/dri-devel/ecfda7ca80f6d7b4ff3d89b8758f4dc9@codeaurora.org/
> [2] 
> https://lore.kernel.org/linux-iommu/20191026053026.GA14545@lst.de/T/
> 
> Thanks,
> Sai


* Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
  2021-01-08  5:47     ` Sai Prakash Ranjan
  2021-01-08 18:09       ` isaacm
@ 2021-01-08 18:18       ` Will Deacon
  2021-01-08 19:50         ` isaacm
  2021-01-11  4:56         ` Sai Prakash Ranjan
  1 sibling, 2 replies; 10+ messages in thread
From: Will Deacon @ 2021-01-08 18:18 UTC (permalink / raw)
  To: Sai Prakash Ranjan
  Cc: isaacm, Rob Clark, Jordan Crouse, linux-arm-msm, Joerg Roedel,
	linux-kernel, Akhil P Oommen, iommu, Robin Murphy,
	linux-arm-kernel

On Fri, Jan 08, 2021 at 11:17:25AM +0530, Sai Prakash Ranjan wrote:
> On 2021-01-07 22:27, isaacm@codeaurora.org wrote:
> > On 2021-01-06 03:56, Will Deacon wrote:
> > > On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
> > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY
> > > > flag")
> > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
> > > > the memory type setting required for the non-coherent masters to use
> > > > system cache. Now that system cache support for GPU is added, we will
> > > > need to mark the memory as normal sys-cached for GPU to use
> > > > system cache.
> > > > Without this, the system cache lines are not allocated for GPU.
> > > > We use
> > > > the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page
> > > > protection
> > > > flag as the flag cannot be exposed via DMA api because of no in-tree
> > > > users.
> > > > 
> > > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> > > > ---
> > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/drivers/iommu/io-pgtable-arm.c
> > > > b/drivers/iommu/io-pgtable-arm.c
> > > > index 7c9ea9d7874a..3fb7de8304a2 100644
> > > > --- a/drivers/iommu/io-pgtable-arm.c
> > > > +++ b/drivers/iommu/io-pgtable-arm.c
> > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte
> > > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
> > > >  		else if (prot & IOMMU_CACHE)
> > > >  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
> > > >  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > > +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
> > > > +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
> > > > +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
> > > >  	}
> > > 
> > While this approach of enabling system cache globally for both page
> > tables and other buffers
> > works for the GPU usecase, this isn't ideal for other clients that use
> > system cache. For example,
> > video clients only want to cache a subset of their buffers in the
> > system cache, due to the sizing constraint
> > imposed by how much of the system cache they can use. So, it would be
> > ideal to have
> > a way of expressing the desire to use the system cache on a per-buffer
> > basis. Additionally,
> > our video clients use the DMA layer, and since the requirement is for
> > caching in the system cache
> > to be a per buffer attribute, it seems like we would have to have a
> > DMA attribute to express
> > this on a per-buffer basis.
> > 
> 
> I did bring this up initially [1], also where is this video client
> in upstream? AFAIK, only system cache user in upstream is GPU.
> We cannot add any DMA attribute unless there is any user upstream
> as per [2], so when the support for such a client is added, wouldn't
> ((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG)
> work?

Hmm, I think this is another case where we need to separate out the
page-table walker attributes from the access attributes. Currently,
IO_PGTABLE_QUIRK_ARM_OUTER_WBWA applies _only_ to the page-table walker
and I don't think it makes any sense for that to be per-buffer (how would
you even manage that?). However, if we want to extend this to data accesses
and we know that there are valid use-cases where this should be per-buffer,
then shoe-horning it in with the walker quirk does not feel like the best
thing to do.

As a starting point, we could:

  1. Rename IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
  2. Add a new prot flag IOMMU_LLC
  3. Have the GPU pass the new prot for its buffer mappings

Does that work? One thing I'm not sure about is whether IOMMU_CACHE should
imply IOMMU_LLC, or whether there is a use-case for inner-cacheable, outer
non-cacheable mappings for a coherent device. Have you ever seen that sort
of thing before?
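
As a rough illustration of the three steps, here is a stand-alone model with
the walker quirk and the data-access prot flag kept apart. Everything in it is
an assumption made for the sketch: IOMMU_LLC and IO_PGTABLE_QUIRK_PTW_LLC are
only names proposed in this mail, the bit values are arbitrary, and
walker_uses_llc() and data_attr_idx() are made-up helpers.

```c
#include <stdbool.h>

/* Placeholder bit values; IOMMU_LLC and the renamed quirk are proposals only. */
#define IOMMU_CACHE              (1u << 2)
#define IOMMU_MMIO               (1u << 4)
#define IOMMU_LLC                (1u << 6)
#define IO_PGTABLE_QUIRK_PTW_LLC (1ul << 6)

/* Stand-ins for the ARM_LPAE_MAIR_ATTR_IDX_* indices. */
enum attr_idx {
	ATTR_IDX_NC,
	ATTR_IDX_CACHE,
	ATTR_IDX_DEV,
	ATTR_IDX_INC_OCACHE,
};

/* Step 1: the quirk describes the page-table walker and nothing else. */
static bool walker_uses_llc(unsigned long quirks)
{
	return (quirks & IO_PGTABLE_QUIRK_PTW_LLC) != 0;
}

/* Steps 2 and 3: data mappings are driven purely by the prot flags. */
static enum attr_idx data_attr_idx(unsigned int prot)
{
	if (prot & IOMMU_MMIO)
		return ATTR_IDX_DEV;
	if (prot & IOMMU_CACHE)
		return ATTR_IDX_CACHE; /* open question: should this imply LLC? */
	if (prot & IOMMU_LLC)
		return ATTR_IDX_INC_OCACHE;
	return ATTR_IDX_NC;
}
```

In this model a driver can set the walker quirk and still choose, mapping by
mapping, whether to pass IOMMU_LLC, which covers the per-buffer case raised
earlier in the thread.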

Will


* Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
  2021-01-08 18:18       ` Will Deacon
@ 2021-01-08 19:50         ` isaacm
  2021-01-11  4:56         ` Sai Prakash Ranjan
  1 sibling, 0 replies; 10+ messages in thread
From: isaacm @ 2021-01-08 19:50 UTC (permalink / raw)
  To: Will Deacon
  Cc: Sai Prakash Ranjan, linux-arm-msm, Joerg Roedel, Jordan Crouse,
	iommu, linux-kernel, Rob Clark, Akhil P Oommen, Robin Murphy,
	linux-arm-kernel

On 2021-01-08 10:18, Will Deacon wrote:
> On Fri, Jan 08, 2021 at 11:17:25AM +0530, Sai Prakash Ranjan wrote:
>> On 2021-01-07 22:27, isaacm@codeaurora.org wrote:
>> > On 2021-01-06 03:56, Will Deacon wrote:
>> > > On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
>> > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY
>> > > > flag")
>> > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> > > > the memory type setting required for the non-coherent masters to use
>> > > > system cache. Now that system cache support for GPU is added, we will
>> > > > need to mark the memory as normal sys-cached for GPU to use
>> > > > system cache.
>> > > > Without this, the system cache lines are not allocated for GPU.
>> > > > We use
>> > > > the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page
>> > > > protection
>> > > > flag as the flag cannot be exposed via DMA api because of no in-tree
>> > > > users.
>> > > >
>> > > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>> > > > ---
>> > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
>> > > >  1 file changed, 3 insertions(+)
>> > > >
>> > > > diff --git a/drivers/iommu/io-pgtable-arm.c
>> > > > b/drivers/iommu/io-pgtable-arm.c
>> > > > index 7c9ea9d7874a..3fb7de8304a2 100644
>> > > > --- a/drivers/iommu/io-pgtable-arm.c
>> > > > +++ b/drivers/iommu/io-pgtable-arm.c
>> > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte
>> > > > arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>> > > >  		else if (prot & IOMMU_CACHE)
>> > > >  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>> > > >  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> > > > +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
>> > > > +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
>> > > > +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> > > >  	}
>> > >
>> > While this approach of enabling system cache globally for both page
>> > tables and other buffers
>> > works for the GPU usecase, this isn't ideal for other clients that use
>> > system cache. For example,
>> > video clients only want to cache a subset of their buffers in the
>> > system cache, due to the sizing constraint
>> > imposed by how much of the system cache they can use. So, it would be
>> > ideal to have
>> > a way of expressing the desire to use the system cache on a per-buffer
>> > basis. Additionally,
>> > our video clients use the DMA layer, and since the requirement is for
>> > caching in the system cache
>> > to be a per buffer attribute, it seems like we would have to have a
>> > DMA attribute to express
>> > this on a per-buffer basis.
>> >
>> 
>> I did bring this up initially [1], also where is this video client
>> in upstream? AFAIK, only system cache user in upstream is GPU.
>> We cannot add any DMA attribute unless there is any user upstream
>> as per [2], so when the support for such a client is added, wouldn't
>> ((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || 
>> PROT_FLAG)
>> work?
> 
> Hmm, I think this is another case where we need to separate out the
> page-table walker attributes from the access attributes. Currently,
> IO_PGTABLE_QUIRK_ARM_OUTER_WBWA applies _only_ to the page-table walker
> and I don't think it makes any sense for that to be per-buffer (how 
> would
> you even manage that?). However, if we want to extend this to data 
> accesses
> and we know that there are valid use-cases where this should be 
> per-buffer,
> then shoe-horning it in with the walker quirk does not feel like the 
> best
> thing to do.

Right, I agree that this seems like something that merits the same level of
separation that already exists for coherency between the page-table walker
attributes and the data buffer attributes (i.e. page-table walker coherency
does not imply data buffer coherency; that is driven through IOMMU_CACHE).
> 
> As a starting point, we could:
> 
>   1. Rename IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>   2. Add a new prot flag IOMMU_LLC
>   3. Have the GPU pass the new prot for its buffer mappings
> 
> Does that work? One thing I'm not sure about is whether IOMMU_CACHE 
> should

Yes, that should work, as that'll leave the door open for a DMA attribute
that can be wired up to IOMMU_LLC.

> imply IOMMU_LLC, or whether there is a use-case for inner-cacheable, 
> outer
> non-cacheable mappings for a coherent device. Have you ever seen that 
> sort

I'm not aware of such a use case, but I believe that a coherent device will
have its buffers cached in the system cache anyway, as well as in the CPU
caches.

--Isaac
> of thing before?
> 
> Will
> 


* Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
  2021-01-08 18:09       ` isaacm
@ 2021-01-11  4:38         ` Sai Prakash Ranjan
  0 siblings, 0 replies; 10+ messages in thread
From: Sai Prakash Ranjan @ 2021-01-11  4:38 UTC (permalink / raw)
  To: isaacm
  Cc: Will Deacon, Rob Clark, Jordan Crouse, linux-arm-msm,
	Joerg Roedel, linux-kernel, Akhil P Oommen, iommu, Robin Murphy,
	linux-arm-kernel

On 2021-01-08 23:39, isaacm@codeaurora.org wrote:
> On 2021-01-07 21:47, Sai Prakash Ranjan wrote:
>> On 2021-01-07 22:27, isaacm@codeaurora.org wrote:
>>> On 2021-01-06 03:56, Will Deacon wrote:
>>>> On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
>>>>> commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY 
>>>>> flag")
>>>>> removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it 
>>>>> went
>>>>> the memory type setting required for the non-coherent masters to 
>>>>> use
>>>>> system cache. Now that system cache support for GPU is added, we 
>>>>> will
>>>>> need to mark the memory as normal sys-cached for GPU to use system 
>>>>> cache.
>>>>> Without this, the system cache lines are not allocated for GPU. We 
>>>>> use
>>>>> the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page 
>>>>> protection
>>>>> flag as the flag cannot be exposed via DMA api because of no 
>>>>> in-tree
>>>>> users.
>>>>> 
>>>>> Signed-off-by: Sai Prakash Ranjan 
>>>>> <saiprakash.ranjan@codeaurora.org>
>>>>> ---
>>>>>  drivers/iommu/io-pgtable-arm.c | 3 +++
>>>>>  1 file changed, 3 insertions(+)
>>>>> 
>>>>> diff --git a/drivers/iommu/io-pgtable-arm.c 
>>>>> b/drivers/iommu/io-pgtable-arm.c
>>>>> index 7c9ea9d7874a..3fb7de8304a2 100644
>>>>> --- a/drivers/iommu/io-pgtable-arm.c
>>>>> +++ b/drivers/iommu/io-pgtable-arm.c
>>>>> @@ -415,6 +415,9 @@ static arm_lpae_iopte 
>>>>> arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>>>>>  		else if (prot & IOMMU_CACHE)
>>>>>  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>>>>>  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>>>>> +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
>>>>> +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
>>>>> +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>>>>>  	}
>>>> 
>>> While this approach of enabling system cache globally for both page
>>> tables and other buffers works for the GPU usecase, this isn't ideal
>>> for other clients that use system cache. For example, video clients
>>> only want to cache a subset of their buffers in the system cache, due
>>> to the sizing constraint imposed by how much of the system cache they
>>> can use. So, it would be ideal to have a way of expressing the desire
>>> to use the system cache on a per-buffer basis. Additionally, our video
>>> clients use the DMA layer, and since the requirement is for caching in
>>> the system cache to be a per-buffer attribute, it seems like we would
>>> have to have a DMA attribute to express this on a per-buffer basis.
>>> 
>> 
>> I did bring this up initially [1], also where is this video client
>> in upstream? AFAIK, only system cache user in upstream is GPU.
>> We cannot add any DMA attribute unless there is any user upstream
> Right, there wouldn't be an upstream user, which would be problematic,
> but I was thinking of having it so that when video or any of our other
> clients that use this attribute on a per buffer basis upstreams their
> code, it's not too much of a stretch to add the support.

Agreed.

>> as per [2], so when the support for such a client is added, wouldn't
>> ((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG)
>> work?
> I don't think that will work, because we currently have clients who use
> the system cache as follows:
> - cache only page tables in the system cache
> - cache only data buffers in the system cache
> - cache both page tables and all buffers in the system cache
> - cache both page tables and some buffers in the system cache
> 
> The approach you're suggesting doesn't allow for the last case, as caching
> the page tables in the system cache involves setting
> IO_PGTABLE_QUIRK_ARM_OUTER_WBWA, so we will end up losing the flexibility
> to cache some data buffers in the system cache.
> 

Ah yes, you are right; I believe Jordan mentioned the same [1].

[1] 
https://lore.kernel.org/lkml/20200709161352.GC21059@jcrouse1-lnx.qualcomm.com/
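
To illustrate the fourth case, here is a minimal sketch (standalone C,
not kernel code) comparing the combined check suggested above with a
check that looks only at a per-buffer flag. QUIRK_OUTER_WBWA and
PROT_SYS_CACHE are stand-in constants with made-up values;
PROT_SYS_CACHE is hypothetical and only models the PROT_FLAG
placeholder from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in bit values for illustration only; not the kernel's definitions. */
#define QUIRK_OUTER_WBWA  (1u << 0)  /* models IO_PGTABLE_QUIRK_ARM_OUTER_WBWA */
#define PROT_SYS_CACHE    (1u << 8)  /* hypothetical per-buffer prot flag */

/* Combined check: the quirk OR the prot flag selects the sys-cached type. */
static bool buffer_sys_cached_combined(unsigned int quirks, unsigned int prot)
{
	return (quirks & QUIRK_OUTER_WBWA) || (prot & PROT_SYS_CACHE);
}

/* Split check: only the per-buffer prot flag selects the sys-cached type. */
static bool buffer_sys_cached_split(unsigned int quirks, unsigned int prot)
{
	(void)quirks;	/* the quirk is left to the page-table walker */
	return (prot & PROT_SYS_CACHE) != 0;
}

/* Count how many of n buffers end up system-cached under a given policy. */
static size_t count_sys_cached(bool (*policy)(unsigned int, unsigned int),
			       unsigned int quirks,
			       const unsigned int *prots, size_t n)
{
	size_t hits = 0;

	assert(policy != NULL && prots != NULL);
	for (size_t i = 0; i < n; i++)
		if (policy(quirks, prots[i]))
			hits++;
	return hits;
}
```

With the quirk set and three buffers of which only one carries the
flag, the combined policy marks all three as system-cached, while the
split policy marks only the flagged one. That is the "page tables and
some buffers" case that the combined check cannot express.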

> Ideally, the page table quirk would drive the settings for the TCR, and
> the prot flag drives the PTE for the mapping, as is done with the page
> table walker being dma-coherent, while buffers are mapped as cacheable
> based on IOMMU_CACHE. Thoughts?
> 

Right, mixing the two is not correct. Will's suggestion for a new prot
flag sounds good to me, I will work on that.

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


* Re: [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache
  2021-01-08 18:18       ` Will Deacon
  2021-01-08 19:50         ` isaacm
@ 2021-01-11  4:56         ` Sai Prakash Ranjan
  1 sibling, 0 replies; 10+ messages in thread
From: Sai Prakash Ranjan @ 2021-01-11  4:56 UTC (permalink / raw)
  To: Will Deacon
  Cc: isaacm, Rob Clark, Jordan Crouse, linux-arm-msm, Joerg Roedel,
	linux-kernel, Akhil P Oommen, iommu, Robin Murphy,
	linux-arm-kernel

On 2021-01-08 23:48, Will Deacon wrote:
> On Fri, Jan 08, 2021 at 11:17:25AM +0530, Sai Prakash Ranjan wrote:
>> On 2021-01-07 22:27, isaacm@codeaurora.org wrote:
>> > On 2021-01-06 03:56, Will Deacon wrote:
>> > > On Thu, Dec 24, 2020 at 12:10:07PM +0530, Sai Prakash Ranjan wrote:
>> > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY flag")
>> > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it went
>> > > > the memory type setting required for the non-coherent masters to use
>> > > > system cache. Now that system cache support for GPU is added, we will
>> > > > need to mark the memory as normal sys-cached for GPU to use system cache.
>> > > > Without this, the system cache lines are not allocated for GPU. We use
>> > > > the IO_PGTABLE_QUIRK_ARM_OUTER_WBWA quirk instead of a page protection
>> > > > flag as the flag cannot be exposed via DMA api because of no in-tree
>> > > > users.
>> > > >
>> > > > Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
>> > > > ---
>> > > >  drivers/iommu/io-pgtable-arm.c | 3 +++
>> > > >  1 file changed, 3 insertions(+)
>> > > >
>> > > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
>> > > > index 7c9ea9d7874a..3fb7de8304a2 100644
>> > > > --- a/drivers/iommu/io-pgtable-arm.c
>> > > > +++ b/drivers/iommu/io-pgtable-arm.c
>> > > > @@ -415,6 +415,9 @@ static arm_lpae_iopte arm_lpae_prot_to_pte(struct arm_lpae_io_pgtable *data,
>> > > >  		else if (prot & IOMMU_CACHE)
>> > > >  			pte |= (ARM_LPAE_MAIR_ATTR_IDX_CACHE
>> > > >  				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> > > > +		else if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)
>> > > > +			pte |= (ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE
>> > > > +				<< ARM_LPAE_PTE_ATTRINDX_SHIFT);
>> > > >  	}
>> > >
>> > While this approach of enabling system cache globally for both page
>> > tables and other buffers works for the GPU usecase, this isn't ideal
>> > for other clients that use system cache. For example, video clients
>> > only want to cache a subset of their buffers in the system cache, due
>> > to the sizing constraint imposed by how much of the system cache they
>> > can use. So, it would be ideal to have a way of expressing the desire
>> > to use the system cache on a per-buffer basis. Additionally, our video
>> > clients use the DMA layer, and since the requirement is for caching in
>> > the system cache to be a per-buffer attribute, it seems like we would
>> > have to have a DMA attribute to express this on a per-buffer basis.
>> >
>> 
>> I did bring this up initially [1], also where is this video client
>> in upstream? AFAIK, only system cache user in upstream is GPU.
>> We cannot add any DMA attribute unless there is any user upstream
>> as per [2], so when the support for such a client is added, wouldn't
>> ((data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_OUTER_WBWA) || PROT_FLAG)
>> work?
> 
> Hmm, I think this is another case where we need to separate out the
> page-table walker attributes from the access attributes. Currently,
> IO_PGTABLE_QUIRK_ARM_OUTER_WBWA applies _only_ to the page-table walker
> and I don't think it makes any sense for that to be per-buffer (how would
> you even manage that?). However, if we want to extend this to data accesses
> and we know that there are valid use-cases where this should be per-buffer,
> then shoe-horning it in with the walker quirk does not feel like the best
> thing to do.
> 
> As a starting point, we could:
> 
>   1. Rename IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to IO_PGTABLE_QUIRK_PTW_LLC
>   2. Add a new prot flag IOMMU_LLC
>   3. Have the GPU pass the new prot for its buffer mappings
> 

This looks good to me; I will work on this and post something soon.
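
A rough sketch of how the three steps could fit together, written as
standalone C rather than as a kernel patch. IO_PGTABLE_QUIRK_PTW_LLC
and IOMMU_LLC are the names proposed above; their bit values here, and
the helper names ptw_uses_llc() and prot_to_attrindx(), are invented
for illustration. The attribute index numbers and the shift are assumed
to mirror the driver's ARM_LPAE_MAIR_ATTR_IDX_* and
ARM_LPAE_PTE_ATTRINDX_SHIFT definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values are made up for this sketch; only the names come from the thread. */
#define IO_PGTABLE_QUIRK_PTW_LLC  (1u << 0)  /* step 1: renamed walker quirk */
#define IOMMU_CACHE               (1u << 2)
#define IOMMU_MMIO                (1u << 4)
#define IOMMU_LLC                 (1u << 6)  /* step 2: proposed prot flag */

/* Assumed to mirror ARM_LPAE_MAIR_ATTR_IDX_* and ARM_LPAE_PTE_ATTRINDX_SHIFT. */
enum {
	ATTR_IDX_NC = 0,
	ATTR_IDX_CACHE = 1,
	ATTR_IDX_DEV = 2,
	ATTR_IDX_INC_OCACHE = 3,
};
#define PTE_ATTRINDX_SHIFT 2

/* The quirk alone decides whether page-table walks allocate in the LLC. */
static bool ptw_uses_llc(unsigned long quirks)
{
	return (quirks & IO_PGTABLE_QUIRK_PTW_LLC) != 0;
}

/* The prot flags alone decide the memory type of a data mapping. */
static uint64_t prot_to_attrindx(unsigned int prot)
{
	unsigned int idx = ATTR_IDX_NC;

	/* Reject bits this sketch does not model. */
	assert((prot & ~(IOMMU_CACHE | IOMMU_MMIO | IOMMU_LLC)) == 0);

	if (prot & IOMMU_MMIO)
		idx = ATTR_IDX_DEV;
	else if (prot & IOMMU_CACHE)
		idx = ATTR_IDX_CACHE;
	else if (prot & IOMMU_LLC)
		idx = ATTR_IDX_INC_OCACHE;

	return (uint64_t)idx << PTE_ATTRINDX_SHIFT;
}
```

Step 3 then amounts to the GPU passing IOMMU_LLC in the prot argument
for the buffers it wants in the system cache. The sketch lets
IOMMU_CACHE take precedence over IOMMU_LLC, following the branch order
of the original patch; that ordering is a choice made here, not
something settled in this thread.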

> Does that work? One thing I'm not sure about is whether IOMMU_CACHE should
> imply IOMMU_LLC, or whether there is a use-case for inner-cacheable, outer
> non-cacheable mappings for a coherent device. Have you ever seen that sort
> of thing before?
> 

As Isaac mentioned, I don't think there is such a use case.
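
For reference, the attribute combinations in question can be written
out as MAIR bytes. In the Armv8 MAIR encoding for Normal memory, bits
[7:4] describe the outer attributes and bits [3:0] the inner ones;
0b0100 is Non-cacheable and 0b1111 is Write-Back Read/Write-Allocate.
The macro and helper names below are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Armv8 MAIR nibble encodings for Normal memory. */
#define MAIR_NIBBLE_NC     0x4u  /* Non-cacheable */
#define MAIR_NIBBLE_WBRWA  0xfu  /* Write-Back, Read/Write-Allocate */

/* Build one MAIR attribute byte: outer in bits [7:4], inner in bits [3:0]. */
static uint8_t mair_attr(uint8_t outer, uint8_t inner)
{
	assert(outer <= 0xf && inner <= 0xf);
	return (uint8_t)((outer << 4) | inner);
}

/* True when the outer nibble is Write-Back Read/Write-Allocate. */
static bool outer_is_wbrwa(uint8_t attr)
{
	return (attr >> 4) == MAIR_NIBBLE_WBRWA;
}

/* True when the inner nibble is Write-Back Read/Write-Allocate. */
static bool inner_is_wbrwa(uint8_t attr)
{
	return (attr & 0xf) == MAIR_NIBBLE_WBRWA;
}
```

Under this encoding, inner non-cacheable with outer write-back is 0xf4
(assumed to be the value behind ARM_LPAE_MAIR_ATTR_IDX_INC_OCACHE),
fully cacheable is 0xff, and the inner-cacheable, outer non-cacheable
combination asked about above would be 0x4f.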

Thanks,
Sai

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


end of thread, other threads:[~2021-01-11  4:57 UTC | newest]

Thread overview: 10+ messages
2020-12-24  6:40 [PATCH] iommu/io-pgtable-arm: Allow non-coherent masters to use system cache Sai Prakash Ranjan
2021-01-06 11:56 ` Will Deacon
2021-01-07  6:35   ` Sai Prakash Ranjan
2021-01-07 16:57   ` isaacm
2021-01-08  5:47     ` Sai Prakash Ranjan
2021-01-08 18:09       ` isaacm
2021-01-11  4:38         ` Sai Prakash Ranjan
2021-01-08 18:18       ` Will Deacon
2021-01-08 19:50         ` isaacm
2021-01-11  4:56         ` Sai Prakash Ranjan
