All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm/amdgpu: Use uncached ioremap() for LoongArch
@ 2023-03-05  5:21 Huacai Chen
  2023-03-05 16:40 ` Christian König
  0 siblings, 1 reply; 5+ messages in thread
From: Huacai Chen @ 2023-03-05  5:21 UTC (permalink / raw)
  To: Alex Deucher, Christian König, David Airlie
  Cc: Xuefeng Li, dri-devel, amd-gfx, Huacai Chen

LoongArch maintains cache coherency in hardware, but its WUC attribute
(Weak-ordered UnCached, which is similar to WC) is out of the scope of
cache coherency machanism. This means WUC can only used for write-only
memory regions. So use uncached ioremap() for LoongArch in the amdgpu
driver.

Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c5ef7f7bdc15..c6888a58819a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1750,8 +1750,13 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 
 	else
 #endif
+#ifdef CONFIG_LOONGARCH
+		adev->mman.aper_base_kaddr = ioremap(adev->gmc.aper_base,
+				adev->gmc.visible_vram_size);
+#else
 		adev->mman.aper_base_kaddr = ioremap_wc(adev->gmc.aper_base,
 				adev->gmc.visible_vram_size);
+#endif
 #endif
 
 	/*
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm/amdgpu: Use uncached ioremap() for LoongArch
  2023-03-05  5:21 [PATCH] drm/amdgpu: Use uncached ioremap() for LoongArch Huacai Chen
@ 2023-03-05 16:40 ` Christian König
  2023-03-06  2:49   ` Huacai Chen
  0 siblings, 1 reply; 5+ messages in thread
From: Christian König @ 2023-03-05 16:40 UTC (permalink / raw)
  To: Huacai Chen, Alex Deucher, David Airlie; +Cc: Xuefeng Li, dri-devel, amd-gfx

Am 05.03.23 um 06:21 schrieb Huacai Chen:
> LoongArch maintains cache coherency in hardware, but its WUC attribute
> (Weak-ordered UnCached, which is similar to WC) is out of the scope of
> cache coherency machanism. This means WUC can only used for write-only
> memory regions. So use uncached ioremap() for LoongArch in the amdgpu
> driver.

Well NAK. This is leaking platform dependencies into the driver code.

When you have a limitation that ioremap_wc() can't guarantee read/write 
ordering then that's pretty clearly a platform bug and you would need to 
apply this workaround to all drivers using ioremap_wc() which isn't 
really feasible.

The x86 cache dependencies is because the GPU can also be part of the 
CPU in which case PCIe is not used to access the aperture base.

Regards,
Christian.

>
> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index c5ef7f7bdc15..c6888a58819a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -1750,8 +1750,13 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
>   
>   	else
>   #endif
> +#ifdef CONFIG_LOONGARCH
> +		adev->mman.aper_base_kaddr = ioremap(adev->gmc.aper_base,
> +				adev->gmc.visible_vram_size);
> +#else
>   		adev->mman.aper_base_kaddr = ioremap_wc(adev->gmc.aper_base,
>   				adev->gmc.visible_vram_size);
> +#endif
>   #endif
>   
>   	/*


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm/amdgpu: Use uncached ioremap() for LoongArch
  2023-03-05 16:40 ` Christian König
@ 2023-03-06  2:49   ` Huacai Chen
  2023-03-06  3:01     ` WANG Xuerui
  0 siblings, 1 reply; 5+ messages in thread
From: Huacai Chen @ 2023-03-06  2:49 UTC (permalink / raw)
  To: Christian König
  Cc: David Airlie, amd-gfx, dri-devel, Alex Deucher, WANG Xuerui,
	Xuefeng Li, Huacai Chen

Hi, Christian,

On Mon, Mar 6, 2023 at 12:40 AM Christian König
<christian.koenig@amd.com> wrote:
>
> Am 05.03.23 um 06:21 schrieb Huacai Chen:
> > LoongArch maintains cache coherency in hardware, but its WUC attribute
> > (Weak-ordered UnCached, which is similar to WC) is out of the scope of
> > cache coherency machanism. This means WUC can only used for write-only
> > memory regions. So use uncached ioremap() for LoongArch in the amdgpu
> > driver.
>
> Well NAK. This is leaking platform dependencies into the driver code.
Then is it acceptable to let ioremap() depend on drm_arch_can_wc_memory()?

Huacai
>
> When you have a limitation that ioremap_wc() can't guarantee read/write
> ordering then that's pretty clearly a platform bug and you would need to
> apply this workaround to all drivers using ioremap_wc() which isn't
> really feasible.
>
> The x86 cache dependencies is because the GPU can also be part of the
> CPU in which case PCIe is not used to access the aperture base.
>
> Regards,
> Christian.
>
> >
> > Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 +++++
> >   1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > index c5ef7f7bdc15..c6888a58819a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> > @@ -1750,8 +1750,13 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
> >
> >       else
> >   #endif
> > +#ifdef CONFIG_LOONGARCH
> > +             adev->mman.aper_base_kaddr = ioremap(adev->gmc.aper_base,
> > +                             adev->gmc.visible_vram_size);
> > +#else
> >               adev->mman.aper_base_kaddr = ioremap_wc(adev->gmc.aper_base,
> >                               adev->gmc.visible_vram_size);
> > +#endif
> >   #endif
> >
> >       /*
>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm/amdgpu: Use uncached ioremap() for LoongArch
  2023-03-06  2:49   ` Huacai Chen
@ 2023-03-06  3:01     ` WANG Xuerui
  2023-03-06  7:07       ` Christian König
  0 siblings, 1 reply; 5+ messages in thread
From: WANG Xuerui @ 2023-03-06  3:01 UTC (permalink / raw)
  To: Huacai Chen, Christian König
  Cc: David Airlie, amd-gfx, dri-devel, Alex Deucher, Xuefeng Li, Huacai Chen

On 2023/3/6 10:49, Huacai Chen wrote:
> Hi, Christian,
> 
> On Mon, Mar 6, 2023 at 12:40 AM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> Am 05.03.23 um 06:21 schrieb Huacai Chen:
>>> LoongArch maintains cache coherency in hardware, but its WUC attribute
>>> (Weak-ordered UnCached, which is similar to WC) is out of the scope of
>>> cache coherency machanism. This means WUC can only used for write-only
>>> memory regions. So use uncached ioremap() for LoongArch in the amdgpu
>>> driver.
>>
>> Well NAK. This is leaking platform dependencies into the driver code.
> Then is it acceptable to let ioremap() depend on drm_arch_can_wc_memory()?

Note: he's likely meaning "is it acceptable to use 
drm_arch_can_wc_memory() to decide between ioremap() and ioremap_wc()".

Although I doubt it's acceptable to you (driver) folks either, because 
while drm_arch_can_wc_memory() does isolate platform details from driver 
proper, it's still papering over platform PCIe violation in VRAM domain. 
Still better than having platform defines though.

Also making use of drm_arch_can_wc_memory might fix this fdo issue [1] 
on aarch64 too (where I replied earlier). It seems people simply can't 
stop inventing such micro-architectures sadly...

[1]: https://gitlab.freedesktop.org/drm/amd/-/issues/2313

>>
>> When you have a limitation that ioremap_wc() can't guarantee read/write
>> ordering then that's pretty clearly a platform bug and you would need to
>> apply this workaround to all drivers using ioremap_wc() which isn't
>> really feasible.
>>

I agree in this case perhaps all of ioremap_wc() usages would have to 
degrade into ioremap() for correctness on such platforms. In which case 
amdgpu wouldn't have to be individually called out / touched anyway. 
Whether this is easily doable/upstreamable is another question though...

-- 
WANG "xen0n" Xuerui

Linux/LoongArch mailing list: https://lore.kernel.org/loongarch/


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm/amdgpu: Use uncached ioremap() for LoongArch
  2023-03-06  3:01     ` WANG Xuerui
@ 2023-03-06  7:07       ` Christian König
  0 siblings, 0 replies; 5+ messages in thread
From: Christian König @ 2023-03-06  7:07 UTC (permalink / raw)
  To: WANG Xuerui, Huacai Chen
  Cc: David Airlie, amd-gfx, dri-devel, Alex Deucher, Xuefeng Li, Huacai Chen

Am 06.03.23 um 04:01 schrieb WANG Xuerui:
> On 2023/3/6 10:49, Huacai Chen wrote:
>> Hi, Christian,
>>
>> On Mon, Mar 6, 2023 at 12:40 AM Christian König
>> <christian.koenig@amd.com> wrote:
>>>
>>> Am 05.03.23 um 06:21 schrieb Huacai Chen:
>>>> LoongArch maintains cache coherency in hardware, but its WUC attribute
>>>> (Weak-ordered UnCached, which is similar to WC) is out of the scope of
>>>> cache coherency machanism. This means WUC can only used for write-only
>>>> memory regions. So use uncached ioremap() for LoongArch in the amdgpu
>>>> driver.
>>>
>>> Well NAK. This is leaking platform dependencies into the driver code.
>> Then is it acceptable to let ioremap() depend on 
>> drm_arch_can_wc_memory()?
>
> Note: he's likely meaning "is it acceptable to use 
> drm_arch_can_wc_memory() to decide between ioremap() and ioremap_wc()".
>
> Although I doubt it's acceptable to you (driver) folks either, because 
> while drm_arch_can_wc_memory() does isolate platform details from 
> driver proper, it's still papering over platform PCIe violation in 
> VRAM domain. Still better than having platform defines though.

Well agree on the PCIe violations, but drm_arch_can_wc_memory() is just 
for a completely different use case.

drm_arch_can_wc_memory() checks if system memory can be accessed write 
combined as well (which is a PCIe extension) or needs to be accessed 
with caching enabled (which is a core PCIe requirement).

So completely different topic and no using this here is not acceptable 
either.

The key point is that when WUC only works with write only mappings you 
*can't* use that to implement ioremap_wc().

> Also making use of drm_arch_can_wc_memory might fix this fdo issue [1] 
> on aarch64 too (where I replied earlier). It seems people simply can't 
> stop inventing such micro-architectures sadly...

I don't think that will help for this bug. WC on iomem is known to work 
correctly on aarch64 and well tested. What doesn't work is using WC on 
system memory.

And in the case of aarch64 it's not a core issue with the platform, but 
rather that some hw implements get it right and some get it wrong.

I already had an in deep discussion with ARM folks about that and it 
seems that some hw implementations think they can combine the core IP 
with some cheap PCIe root complex and it just magically works.

Regards,
Christian.

>
> [1]: https://gitlab.freedesktop.org/drm/amd/-/issues/2313
>
>>>
>>> When you have a limitation that ioremap_wc() can't guarantee read/write
>>> ordering then that's pretty clearly a platform bug and you would 
>>> need to
>>> apply this workaround to all drivers using ioremap_wc() which isn't
>>> really feasible.
>>>
>
> I agree in this case perhaps all of ioremap_wc() usages would have to 
> degrade into ioremap() for correctness on such platforms. In which 
> case amdgpu wouldn't have to be individually called out / touched 
> anyway. Whether this is easily doable/upstreamable is another question 
> though...
>


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-03-06  8:23 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-03-05  5:21 [PATCH] drm/amdgpu: Use uncached ioremap() for LoongArch Huacai Chen
2023-03-05 16:40 ` Christian König
2023-03-06  2:49   ` Huacai Chen
2023-03-06  3:01     ` WANG Xuerui
2023-03-06  7:07       ` Christian König

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.