All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] drm: return false in drm_arch_can_wc_memory() for ARM
@ 2018-12-20 14:56 Alex Deucher
       [not found] ` <20181220145657.304-1-alexander.deucher-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Alex Deucher @ 2018-12-20 14:56 UTC (permalink / raw)
  To: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Alex Deucher

I'm not familiar enough with ARM to know if write combining
is actually an architectural limitation or if it's an issue
with the PCIe IPs used on various platforms, but so far
everyone that has tried to run radeon hardware on
ARM has had to disable it.  So let's just make it official.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
---
 include/drm/drm_cache.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
index bfe1639df02d..691b4c4b0587 100644
--- a/include/drm/drm_cache.h
+++ b/include/drm/drm_cache.h
@@ -47,6 +47,8 @@ static inline bool drm_arch_can_wc_memory(void)
 	return false;
 #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
 	return false;
+#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
+	return false;
 #else
 	return true;
 #endif
-- 
2.13.6

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm: return false in drm_arch_can_wc_memory() for ARM
       [not found] ` <20181220145657.304-1-alexander.deucher-5C7GfCeVMHo@public.gmane.org>
@ 2018-12-20 15:36   ` Daniel Vetter
       [not found]     ` <20181220153619.GP21184-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Daniel Vetter @ 2018-12-20 15:36 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Alex Deucher, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
> I'm not familiar enough with ARM to know if write combining
> is actually an architectural limitation or if it's an issue
> with the PCIe IPs used on various platforms, but so far
> everyone that has tried to run radeon hardware on
> ARM has had to disable it.  So let's just make it official.

wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
mappings and stuff, so you need to allocate your wc memory from special
pools. So probably best to just disable it until we figure this out.
 
> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
>  include/drm/drm_cache.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
> index bfe1639df02d..691b4c4b0587 100644
> --- a/include/drm/drm_cache.h
> +++ b/include/drm/drm_cache.h
> @@ -47,6 +47,8 @@ static inline bool drm_arch_can_wc_memory(void)
>  	return false;
>  #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
>  	return false;
> +#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> +	return false;
>  #else
>  	return true;
>  #endif
> -- 
> 2.13.6
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm: return false in drm_arch_can_wc_memory() for ARM
       [not found]     ` <20181220153619.GP21184-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
@ 2018-12-21 14:16       ` Liviu Dudau
       [not found]         ` <20181221141644.GB22341-A/Nd4k6kWRHZROr8t4l/smS4ubULX0JqMm0uRHvK7Nw@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Liviu Dudau @ 2018-12-21 14:16 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Alex Deucher, Alex Deucher,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

On Thu, Dec 20, 2018 at 04:36:19PM +0100, Daniel Vetter wrote:
> On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
> > I'm not familiar enough with ARM to know if write combining
> > is actually an architectural limitation or if it's an issue
> > with the PCIe IPs used on various platforms, but so far
> > everyone that has tried to run radeon hardware on
> > ARM has had to disable it.  So let's just make it official.
> 
> wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
> mappings and stuff, so you need to allocate your wc memory from special
> pools. So probably best to just disable it until we figure this out.

I believe both of you are conflating different issues under the wrong
name. Write combining happens all the time with Arm, the ARMv8
architecture is a weakly-ordered model of memory so hardware is allowed
to re-order or combine memory access as they seem fit.

A while ago I did run an AMD GPU card on my Juno dev board and it worked
(for a very limited definition of worked, I've only validated the fact
that I could get an fbcon and could run un-accelerated X11). So I would
be interested if Alex could share some of the scenarios where people are
seeing failures.

As for aliasing, yeah, having multiple aliases to the same piece of
memory is a bad thing. The problem arises when devices on the PCI bus
have memory allocated as device memory (which on Arm is non-cacheable
and non-reorderable), but the PCI bus effectively acts as a write-combiner
which changes the order of transactions. Therefore, for devices that
have local memory associated with them (i.e. more than just register
accesses) one should allocate memory in the first place that is
Device-GRE (gathering, reordering and early-access). Otherwise, problems
will surface that are not visible on x86 as that is a strongly ordered
architecture.

>  
> > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> 
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Given that this API is only used by AMD I'm OK for now with the change,
but I think in general it is misleading and we should work towards
fixing radeon and amd drivers.

Best regards,
Liviu

> 
> > ---
> >  include/drm/drm_cache.h | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
> > index bfe1639df02d..691b4c4b0587 100644
> > --- a/include/drm/drm_cache.h
> > +++ b/include/drm/drm_cache.h
> > @@ -47,6 +47,8 @@ static inline bool drm_arch_can_wc_memory(void)
> >  	return false;
> >  #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
> >  	return false;
> > +#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> > +	return false;
> >  #else
> >  	return true;
> >  #endif
> > -- 
> > 2.13.6
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

-- 
====================
| I would like to |
| fix the world,  |
| but they're not |
| giving me the   |
 \ source code!  /
  ---------------
    ¯\_(ツ)_/¯
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm: return false in drm_arch_can_wc_memory() for ARM
       [not found]         ` <20181221141644.GB22341-A/Nd4k6kWRHZROr8t4l/smS4ubULX0JqMm0uRHvK7Nw@public.gmane.org>
@ 2018-12-21 14:48           ` Alex Deucher
       [not found]             ` <CADnq5_P9VZMhC_4j3F4hqHdD6Ey0=totPuNhF3hYBzkbgHsPpg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 5+ messages in thread
From: Alex Deucher @ 2018-12-21 14:48 UTC (permalink / raw)
  To: Liviu Dudau
  Cc: Alex Deucher, amd-gfx list, Maling list - DRI developers, Daniel Vetter

On Fri, Dec 21, 2018 at 9:16 AM Liviu Dudau <Liviu.Dudau@arm.com> wrote:
>
> On Thu, Dec 20, 2018 at 04:36:19PM +0100, Daniel Vetter wrote:
> > On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
> > > I'm not familiar enough with ARM to know if write combining
> > > is actually an architectural limitation or if it's an issue
> > > with the PCIe IPs used on various platforms, but so far
> > > everyone that has tried to run radeon hardware on
> > > ARM has had to disable it.  So let's just make it official.
> >
> > wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
> > mappings and stuff, so you need to allocate your wc memory from special
> > pools. So probably best to just disable it until we figure this out.
>
> I believe both of you are conflating different issues under the wrong
> name. Write combining happens all the time with Arm, the ARMv8
> architecture is a weakly-ordered model of memory so hardware is allowed
> to re-order or combine memory access as they seem fit.
>
> A while ago I did run an AMD GPU card on my Juno dev board and it worked
> (for a very limited definition of worked, I've only validated the fact
> that I could get an fbcon and could run un-accelerated X11). So I would
> be interested if Alex could share some of the scenarios where people are
> seeing failures.

Here's an example:
https://bugs.freedesktop.org/show_bug.cgi?id=108625
But there are probably 5 or 6 other cases where people have emailed me
or our team directly with issues on ARM resolved by disabling WC.
Generally the driver seems to load ok, but then hangs as soon as you
try and use acceleration from userspace or we end up with page
flipping timeouts.  Not really sure what the issue is.  Michel
suggested maybe ARM has a cacheable kernel mapping of all "normal"
system memory, and having
both that mapping and another non-cacheable mapping of the same page
can result in bad behaviour.

>
> As for aliasing, yeah, having multiple aliases to the same piece of
> memory is a bad thing. The problem arises when devices on the PCI bus
> have memory allocated as device memory (which on Arm is non-cacheable
> and non-reorderable), but the PCI bus effectively acts as a write-combiner
> which changes the order of transactions. Therefore, for devices that
> have local memory associated with them (i.e. more than just register
> accesses) one should allocate memory in the first place that is
> Device-GRE (gathering, reordering and early-access). Otherwise, problems
> will surface that are not visible on x86 as that is a strongly ordered
> architecture.

PCI framebuffer BARs are mapped on the CPU with WC.  We also use
uncached WC mappings for system memory in cases where it's not likely
we will be doing any CPU reads.  When accessing system memory, the GPU
can either do a CPU cache snooped transaction or a non-snooped
transaction.  The non-snooped transaction has lower latency and better
throughput since it doesn't have to snoop the CPU cache.

>
> >
> > > Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
> >
> > Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> Given that this API is only used by AMD I'm OK for now with the change,
> but I think in general it is misleading and we should work towards
> fixing radeon and amd drivers.

Alternatively, we could just disable WC in the amdgpu driver on ARM.
I'm not sure to what extent other drivers are using WC in general or
have been tested on ARM.

Alex

>
> Best regards,
> Liviu
>
> >
> > > ---
> > >  include/drm/drm_cache.h | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> > > diff --git a/include/drm/drm_cache.h b/include/drm/drm_cache.h
> > > index bfe1639df02d..691b4c4b0587 100644
> > > --- a/include/drm/drm_cache.h
> > > +++ b/include/drm/drm_cache.h
> > > @@ -47,6 +47,8 @@ static inline bool drm_arch_can_wc_memory(void)
> > >     return false;
> > >  #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
> > >     return false;
> > > +#elif defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> > > +   return false;
> > >  #else
> > >     return true;
> > >  #endif
> > > --
> > > 2.13.6
> > >
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> ====================
> | I would like to |
> | fix the world,  |
> | but they're not |
> | giving me the   |
>  \ source code!  /
>   ---------------
>     ¯\_(ツ)_/¯
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] drm: return false in drm_arch_can_wc_memory() for ARM
       [not found]             ` <CADnq5_P9VZMhC_4j3F4hqHdD6Ey0=totPuNhF3hYBzkbgHsPpg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2018-12-21 16:39               ` Eric Anholt
  0 siblings, 0 replies; 5+ messages in thread
From: Eric Anholt @ 2018-12-21 16:39 UTC (permalink / raw)
  To: Alex Deucher, Liviu Dudau
  Cc: Alex Deucher, Maling list - DRI developers, amd-gfx list


[-- Attachment #1.1: Type: text/plain, Size: 3922 bytes --]

Alex Deucher <alexdeucher-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> writes:

> On Fri, Dec 21, 2018 at 9:16 AM Liviu Dudau <Liviu.Dudau-5wv7dgnIgG8@public.gmane.org> wrote:
>>
>> On Thu, Dec 20, 2018 at 04:36:19PM +0100, Daniel Vetter wrote:
>> > On Thu, Dec 20, 2018 at 09:56:57AM -0500, Alex Deucher wrote:
>> > > I'm not familiar enough with ARM to know if write combining
>> > > is actually an architectural limitation or if it's an issue
>> > > with the PCIe IPs used on various platforms, but so far
>> > > everyone that has tried to run radeon hardware on
>> > > ARM has had to disable it.  So let's just make it official.
>> >
>> > wc on arm is Really Complicated (tm) afaiui. There's issues with aliasing
>> > mappings and stuff, so you need to allocate your wc memory from special
>> > pools. So probably best to just disable it until we figure this out.
>>
>> I believe both of you are conflating different issues under the wrong
>> name. Write combining happens all the time with Arm, the ARMv8
>> architecture is a weakly-ordered model of memory so hardware is allowed
>> to re-order or combine memory access as they seem fit.
>>
>> A while ago I did run an AMD GPU card on my Juno dev board and it worked
>> (for a very limited definition of worked, I've only validated the fact
>> that I could get an fbcon and could run un-accelerated X11). So I would
>> be interested if Alex could share some of the scenarios where people are
>> seeing failures.
>
> Here's an example:
> https://bugs.freedesktop.org/show_bug.cgi?id=108625
> But there are probably 5 or 6 other cases where people have emailed me
> or our team directly with issues on ARM resolved by disabling WC.
> Generally the driver seems to load ok, but then hangs as soon as you
> try and use acceleration from userspace or we end up with page
> flipping timeouts.  Not really sure what the issue is.  Michel
> suggested maybe ARM has a cacheable kernel mapping of all "normal"
> system memory, and having
> both that mapping and another non-cacheable mapping of the same page
> can result in bad behaviour.
>
>>
>> As for aliasing, yeah, having multiple aliases to the same piece of
>> memory is a bad thing. The problem arises when devices on the PCI bus
>> have memory allocated as device memory (which on Arm is non-cacheable
>> and non-reorderable), but the PCI bus effectively acts as a write-combiner
>> which changes the order of transactions. Therefore, for devices that
>> have local memory associated with them (i.e. more than just register
>> accesses) one should allocate memory in the first place that is
>> Device-GRE (gathering, reordering and early-access). Otherwise, problems
>> will surface that are not visible on x86 as that is a strongly ordered
>> architecture.
>
> PCI framebuffer BARs are mapped on the CPU with WC.  We also use
> uncached WC mappings for system memory in cases where it's not likely
> we will be doing any CPU reads.  When accessing system memory, the GPU
> can either do a CPU cache snooped transaction or a non-snooped
> transaction.  The non-snooped transaction has lower latency and better
> throughput since it doesn't have to snoop the CPU cache.
>
>>
>> >
>> > > Signed-off-by: Alex Deucher <alexander.deucher-5C7GfCeVMHo@public.gmane.org>
>> >
>> > Reviewed-by: Daniel Vetter <daniel.vetter-/w4YWyX8dFk@public.gmane.org>
>>
>> Given that this API is only used by AMD I'm OK for now with the change,
>> but I think in general it is misleading and we should work towards
>> fixing radeon and amd drivers.
>
> Alternatively, we could just disable WC in the amdgpu driver on ARM.
> I'm not sure to what extent other drivers are using WC in general or
> have been tested on ARM.

FWIW, I use WC mappings of BOs on V3D (shmem) and VC4 (cma).  V3D is
totally stable.  VC4 I've heard reports of stability issues long-term
but I don't think it's related.  I don't do any cached mappings of my
BOs, though.

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-12-21 16:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-20 14:56 [PATCH] drm: return false in drm_arch_can_wc_memory() for ARM Alex Deucher
     [not found] ` <20181220145657.304-1-alexander.deucher-5C7GfCeVMHo@public.gmane.org>
2018-12-20 15:36   ` Daniel Vetter
     [not found]     ` <20181220153619.GP21184-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2018-12-21 14:16       ` Liviu Dudau
     [not found]         ` <20181221141644.GB22341-A/Nd4k6kWRHZROr8t4l/smS4ubULX0JqMm0uRHvK7Nw@public.gmane.org>
2018-12-21 14:48           ` Alex Deucher
     [not found]             ` <CADnq5_P9VZMhC_4j3F4hqHdD6Ey0=totPuNhF3hYBzkbgHsPpg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2018-12-21 16:39               ` Eric Anholt

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.