linux-kernel.vger.kernel.org archive mirror
* [PATCH] drm/etnaviv: fix external abort seen on GC600 rev 0x19
@ 2020-08-21 18:17 Christian Gmeiner
  2020-08-23 14:27 ` Ing. Josua Mayer
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Gmeiner @ 2020-08-21 18:17 UTC
  To: linux-kernel
  Cc: josua.mayer, Christian Gmeiner, stable, Lucas Stach,
	Russell King, David Airlie, Daniel Vetter, etnaviv, dri-devel

It looks like this GPU core triggers an abort when
reading VIVS_HI_CHIP_PRODUCT_ID and/or VIVS_HI_CHIP_CUSTOMER_ID.

I looked at different versions of Vivante's kernel driver and did
not find anything about this issue or which feature flag can be
used. So go the simplest route and do not read these two registers
on the affected GPU core.

Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reported-by: Josua Mayer <josua.mayer@jm0.eu>
Fixes: 815e45bbd4d3 ("drm/etnaviv: determine product, customer and eco id")
Cc: stable@vger.kernel.org
---
 drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
index d5a4cd85a0f6..d3906688c2b3 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
@@ -337,10 +337,17 @@ static void etnaviv_hw_identify(struct etnaviv_gpu *gpu)
 
 		gpu->identity.model = gpu_read(gpu, VIVS_HI_CHIP_MODEL);
 		gpu->identity.revision = gpu_read(gpu, VIVS_HI_CHIP_REV);
-		gpu->identity.product_id = gpu_read(gpu, VIVS_HI_CHIP_PRODUCT_ID);
-		gpu->identity.customer_id = gpu_read(gpu, VIVS_HI_CHIP_CUSTOMER_ID);
 		gpu->identity.eco_id = gpu_read(gpu, VIVS_HI_CHIP_ECO_ID);
 
+		/*
+		 * Reading these two registers on GC600 rev 0x19 results in an
+		 * unhandled fault: external abort on non-linefetch
+		 */
+		if (!etnaviv_is_model_rev(gpu, GC600, 0x19)) {
+			gpu->identity.product_id = gpu_read(gpu, VIVS_HI_CHIP_PRODUCT_ID);
+			gpu->identity.customer_id = gpu_read(gpu, VIVS_HI_CHIP_CUSTOMER_ID);
+		}
+
 		/*
 		 * !!!! HACK ALERT !!!!
 		 * Because people change device IDs without letting software
-- 
2.26.2



* Re: [PATCH] drm/etnaviv: fix external abort seen on GC600 rev 0x19
  2020-08-21 18:17 [PATCH] drm/etnaviv: fix external abort seen on GC600 rev 0x19 Christian Gmeiner
@ 2020-08-23 14:27 ` Ing. Josua Mayer
  2020-08-23 19:10   ` Christian Gmeiner
  0 siblings, 1 reply; 5+ messages in thread
From: Ing. Josua Mayer @ 2020-08-23 14:27 UTC
  To: Christian Gmeiner, linux-kernel
  Cc: stable, Lucas Stach, Russell King, David Airlie, Daniel Vetter,
	etnaviv, dri-devel

Hi Christian,

I have formally tested the patch with 5.7.10 - and it doesn't resolve
the issue - sadly :(

From my testing, the reads on
VIVS_HI_CHIP_PRODUCT_ID
VIVS_HI_CHIP_ECO_ID
need to be conditional - while
VIVS_HI_CHIP_CUSTOMER_ID
seems to be okay.
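
To make the finding above concrete, here is a minimal, self-contained
sketch of how the identify path could gate the two faulting reads. It is
not the driver code: struct fake_gpu, fake_read(), identify() and the
register enum are hypothetical stand-ins, the model value 0x600 for GC600
is an assumption, and the simulated fault only models the behaviour
reported in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VIVS_HI_CHIP_* register offsets. */
enum reg {
	REG_MODEL,
	REG_REV,
	REG_PRODUCT_ID,
	REG_CUSTOMER_ID,
	REG_ECO_ID,
	REG_COUNT
};

struct identity {
	uint32_t model, revision, product_id, customer_id, eco_id;
};

/* A simulated GPU: register file plus a counter of faulting reads. */
struct fake_gpu {
	uint32_t regs[REG_COUNT];
	int faults;
	struct identity identity;
};

/*
 * Model of the reported behaviour: on GC600 rev 0x19 a read of
 * PRODUCT_ID or ECO_ID would abort. Here it is only counted.
 * 0x600 as the GC600 model value is an assumption.
 */
uint32_t fake_read(struct fake_gpu *gpu, enum reg r)
{
	int affected = gpu->regs[REG_MODEL] == 0x600 &&
		       gpu->regs[REG_REV] == 0x19;

	if (affected && (r == REG_PRODUCT_ID || r == REG_ECO_ID)) {
		gpu->faults++;
		return 0;
	}
	return gpu->regs[r];
}

/* Read the identity, skipping the two registers on the affected core. */
void identify(struct fake_gpu *gpu)
{
	gpu->identity.model = fake_read(gpu, REG_MODEL);
	gpu->identity.revision = fake_read(gpu, REG_REV);
	gpu->identity.customer_id = fake_read(gpu, REG_CUSTOMER_ID);

	if (!(gpu->identity.model == 0x600 &&
	      gpu->identity.revision == 0x19)) {
		gpu->identity.product_id = fake_read(gpu, REG_PRODUCT_ID);
		gpu->identity.eco_id = fake_read(gpu, REG_ECO_ID);
	}
}
```

With a simulated GC600 rev 0x19, identify() never touches the two
faulting registers and leaves product_id and eco_id at zero, while any
other core is read in full.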

br
Josua Mayer

On 21.08.20 at 20:17, Christian Gmeiner wrote:
> It looks like this GPU core triggers an abort when
> reading VIVS_HI_CHIP_PRODUCT_ID and/or VIVS_HI_CHIP_CUSTOMER_ID.
> 
> I looked at different versions of Vivante's kernel driver and did
> not find anything about this issue or which feature flag can be
> used. So go the simplest route and do not read these two registers
> on the affected GPU core.
> 
> Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
> Reported-by: Josua Mayer <josua.mayer@jm0.eu>
> Fixes: 815e45bbd4d3 ("drm/etnaviv: determine product, customer and eco id")
> Cc: stable@vger.kernel.org
> ---
>  drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> index d5a4cd85a0f6..d3906688c2b3 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c
> @@ -337,10 +337,17 @@ static void etnaviv_hw_identify(struct etnaviv_gpu *gpu)
>  
>  		gpu->identity.model = gpu_read(gpu, VIVS_HI_CHIP_MODEL);
>  		gpu->identity.revision = gpu_read(gpu, VIVS_HI_CHIP_REV);
> -		gpu->identity.product_id = gpu_read(gpu, VIVS_HI_CHIP_PRODUCT_ID);
> -		gpu->identity.customer_id = gpu_read(gpu, VIVS_HI_CHIP_CUSTOMER_ID);
>  		gpu->identity.eco_id = gpu_read(gpu, VIVS_HI_CHIP_ECO_ID);
>  
> +		/*
> +		 * Reading these two registers on GC600 rev 0x19 results in an
> +		 * unhandled fault: external abort on non-linefetch
> +		 */
> +		if (!etnaviv_is_model_rev(gpu, GC600, 0x19)) {
> +			gpu->identity.product_id = gpu_read(gpu, VIVS_HI_CHIP_PRODUCT_ID);
> +			gpu->identity.customer_id = gpu_read(gpu, VIVS_HI_CHIP_CUSTOMER_ID);
> +		}
> +
>  		/*
>  		 * !!!! HACK ALERT !!!!
>  		 * Because people change device IDs without letting software
> 


* Re: [PATCH] drm/etnaviv: fix external abort seen on GC600 rev 0x19
  2020-08-23 14:27 ` Ing. Josua Mayer
@ 2020-08-23 19:10   ` Christian Gmeiner
  2020-08-23 19:19     ` Russell King - ARM Linux admin
  0 siblings, 1 reply; 5+ messages in thread
From: Christian Gmeiner @ 2020-08-23 19:10 UTC
  To: Ing. Josua Mayer
  Cc: LKML, stable, Lucas Stach, Russell King, David Airlie,
	Daniel Vetter, The etnaviv authors, DRI mailing list

Hi

> I have formally tested the patch with 5.7.10 - and it doesn't resolve
> the issue - sadly :(
>
> From my testing, the reads on
> VIVS_HI_CHIP_PRODUCT_ID
> VIVS_HI_CHIP_ECO_ID
> need to be conditional - while
> VIVS_HI_CHIP_CUSTOMER_ID
> seems to be okay.
>

Uhh.. okay.. just send a V2 - thanks for testing :)

-- 
greets
--
Christian Gmeiner, MSc

https://christian-gmeiner.info/privacypolicy


* Re: [PATCH] drm/etnaviv: fix external abort seen on GC600 rev 0x19
  2020-08-23 19:10   ` Christian Gmeiner
@ 2020-08-23 19:19     ` Russell King - ARM Linux admin
  2020-08-24 11:04       ` Lucas Stach
  0 siblings, 1 reply; 5+ messages in thread
From: Russell King - ARM Linux admin @ 2020-08-23 19:19 UTC
  To: Christian Gmeiner
  Cc: Ing. Josua Mayer, LKML, stable, Lucas Stach, David Airlie,
	Daniel Vetter, The etnaviv authors, DRI mailing list

On Sun, Aug 23, 2020 at 09:10:25PM +0200, Christian Gmeiner wrote:
> Hi
> 
> > I have formally tested the patch with 5.7.10 - and it doesn't resolve
> > the issue - sadly :(
> >
> > From my testing, the reads on
> > VIVS_HI_CHIP_PRODUCT_ID
> > VIVS_HI_CHIP_ECO_ID
> > need to be conditional - while
> > VIVS_HI_CHIP_CUSTOMER_ID
> > seems to be okay.
> >
> 
> Uhh.. okay.. just send a V2 - thanks for testing :)

There is also something else going on with the GC600 - 5.4 worked fine,
5.8 doesn't - my 2D Xorg driver gets stuck waiting on a BO after just
a couple of minutes.  Looking in debugfs, there's a whole load of BOs
that are listed as "active", yet the GPU is idle:

   00020000: A  0 ( 7) 00000000 00000000 8294400
   00010000: I  0 ( 1) 00000000 00000000 4096
   00010000: I  0 ( 1) 00000000 00000000 4096
   00010000: I  0 ( 1) 00000000 00000000 327680
   00010000: A  0 ( 7) 00000000 00000000 8388608
   00010000: I  0 ( 1) 00000000 00000000 8388608
   00010000: I  0 ( 1) 00000000 00000000 8388608
   00010000: A  0 ( 7) 00000000 00000000 8388608
   00010000: A  0 ( 3) 00000000 00000000 8388608
   00010000: A  0 ( 4) 00000000 00000000 8388608
   00010000: A  0 ( 3) 00000000 00000000 8388608
   00010000: A  0 ( 3) 00000000 00000000 8388608
   00010000: A  0 ( 3) 00000000 00000000 8388608
....
   00010000: A  0 ( 3) 00000000 00000000 8388608
Total 38 objects, 293842944 bytes

My guess is there's something up with the way a job completes that's
causing the BOs not to be marked inactive.  I haven't yet been able
to debug any further.
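
A listing like the one above can be tallied mechanically when comparing
kernels. The following is a rough sketch only: it assumes the second
column is the BO state, with A meaning active and I meaning inactive
(inferred from the description above, not checked against the driver),
and struct bo_tally and tally_bo_states() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct bo_tally {
	int active;
	int inactive;
};

/*
 * Count BOs by state in a debugfs-style listing.
 * Assumption: each BO line looks like "<hex flags>: <A|I> ...".
 * Lines that do not match (blank lines, "....", "Total ...") are skipped.
 */
struct bo_tally tally_bo_states(const char *listing)
{
	struct bo_tally t = { 0, 0 };
	const char *line = listing;

	while (*line) {
		char buf[128];
		unsigned int flags;
		char state;
		size_t len = strcspn(line, "\n");
		size_t n = len < sizeof(buf) - 1 ? len : sizeof(buf) - 1;

		/* Copy one line so sscanf cannot run on into the next one. */
		memcpy(buf, line, n);
		buf[n] = '\0';

		if (sscanf(buf, "%x: %c", &flags, &state) == 2) {
			if (state == 'A')
				t.active++;
			else if (state == 'I')
				t.inactive++;
		}

		line += len;
		if (*line == '\n')
			line++;
	}
	return t;
}
```

Feeding it the contents of the listing gives the number of BOs still
marked active, which can be compared against an idle GPU to spot the
stuck state described above.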

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!


* Re: [PATCH] drm/etnaviv: fix external abort seen on GC600 rev 0x19
  2020-08-23 19:19     ` Russell King - ARM Linux admin
@ 2020-08-24 11:04       ` Lucas Stach
  0 siblings, 0 replies; 5+ messages in thread
From: Lucas Stach @ 2020-08-24 11:04 UTC
  To: Russell King - ARM Linux admin, Christian Gmeiner
  Cc: Ing. Josua Mayer, LKML, stable, David Airlie, Daniel Vetter,
	The etnaviv authors, DRI mailing list

Hi Russell,

On Sunday, 23.08.2020 at 20:19 +0100, Russell King - ARM Linux admin wrote:
> On Sun, Aug 23, 2020 at 09:10:25PM +0200, Christian Gmeiner wrote:
> > Hi
> > 
> > > I have formally tested the patch with 5.7.10 - and it doesn't resolve
> > > the issue - sadly :(
> > > 
> > > From my testing, the reads on
> > > VIVS_HI_CHIP_PRODUCT_ID
> > > VIVS_HI_CHIP_ECO_ID
> > > need to be conditional - while
> > > VIVS_HI_CHIP_CUSTOMER_ID
> > > seems to be okay.
> > > 
> > 
> > Uhh.. okay.. just send a V2 - thanks for testing :)
> 
> There is also something else going on with the GC600 - 5.4 worked fine,
> 5.8 doesn't - my 2D Xorg driver gets stuck waiting on a BO after just
> a couple of minutes.  Looking in debugfs, there's a whole load of BOs
> that are listed as "active", yet the GPU is idle:
> 
>    00020000: A  0 ( 7) 00000000 00000000 8294400
>    00010000: I  0 ( 1) 00000000 00000000 4096
>    00010000: I  0 ( 1) 00000000 00000000 4096
>    00010000: I  0 ( 1) 00000000 00000000 327680
>    00010000: A  0 ( 7) 00000000 00000000 8388608
>    00010000: I  0 ( 1) 00000000 00000000 8388608
>    00010000: I  0 ( 1) 00000000 00000000 8388608
>    00010000: A  0 ( 7) 00000000 00000000 8388608
>    00010000: A  0 ( 3) 00000000 00000000 8388608
>    00010000: A  0 ( 4) 00000000 00000000 8388608
>    00010000: A  0 ( 3) 00000000 00000000 8388608
>    00010000: A  0 ( 3) 00000000 00000000 8388608
>    00010000: A  0 ( 3) 00000000 00000000 8388608
> ....
>    00010000: A  0 ( 3) 00000000 00000000 8388608
> Total 38 objects, 293842944 bytes
> 
> My guess is there's something up with the way a job completes that's
> causing the BOs not to be marked inactive.  I haven't yet been able
> to debug any further.

The patch I just sent out should fix this issue. The DRM scheduler is
doing some funny business which breaks our job done signalling if the
GPU timeout has been hit, even if our timeout handler is just extending
the timeout as the GPU is still working normally.

Regards,
Lucas


