* AMDGPU breaks suspend after kernel 5.0
@ 2019-07-30 13:34 Paul Gover
From: Paul Gover @ 2019-07-30 13:34 UTC (permalink / raw)
  To: Likun.Gao-5C7GfCeVMHo; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

Hi Likun,

Sorry if you don't want emails like this.  I added info. to
https://bugs.freedesktop.org/show_bug.cgi?id=110258
but people on Gentoo forums said email would be better.

Git bisect led me to you:
---------------
106c7d6148e5aadd394e6701f7e498df49b869d1 is the first bad commit
commit 106c7d6148e5aadd394e6701f7e498df49b869d1
Author: Likun Gao <Likun.Gao@amd.com>
Date:   Thu Nov 8 20:19:54 2018 +0800

    drm/amdgpu: abstract the function of enter/exit safe mode for RLC
    
    Abstract the function of amdgpu_gfx_rlc_enter/exit_safe_mode and some part of rlc_init to improve the reusability of RLC.
    
    Signed-off-by: Likun Gao <Likun.Gao@amd.com>
    Acked-by: Christian König <christian.koenig@amd.com>
    Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

:040000 040000 8f3b365496f3bbd380a62032f20642ace51c8fef e14ec968011019e3f601df3f15682bb9ae0bafc6 M      drivers
---------------------
The symptoms: when resuming after pm-suspend, the screen is blank or corrupt,
the keyboard is dead, and syslog shows
--------------------
kernel: [   81.096666] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=51, emitted seq=52
kernel: [   81.096671] [drm] IP block:gfx_v8_0 is hung!
kernel: [   81.096734] [drm] GPU recovery disabled.
---------------------
or similar.  The problem occurs with every kernel from 5.0 up to and including
5.3-rc2.  My laptop is:

HP 15-bw0xx
cpu:AMD A9-9420 RADEON R5, 5 COMPUTE CORES 2C+3G
with integrated graphics:
Stoney [Radeon R2/R3/R4/R5 Graphics] [1002:98E4]

There are several similar reports on the web, most or all for Stoney hardware,
but that might be a coincidence, as laptop users are more concerned about
suspend and there are a lot of laptops with similar integrated-graphics
motherboards.

I'm running Gentoo with a custom kernel; the most relevant bits of the config are:
CONFIG_DRM_AMDGPU=y
# CONFIG_DRM_AMDGPU_SI is not set
# CONFIG_DRM_AMDGPU_CIK is not set
# CONFIG_DRM_AMDGPU_USERPTR is not set

If you tell me how, I'm willing to try to collect traces etc.

Paul Gover


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


* RE: AMDGPU breaks suspend after kernel 5.0
@ 2019-07-31 10:21 ` Gao, Likun
From: Gao, Likun @ 2019-07-31 10:21 UTC (permalink / raw)
  To: Paul Gover; +Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW

[-- Attachment #1: Type: text/plain, Size: 2814 bytes --]

Hi Paul,

Sorry for the late response. Could you try applying the attached patch and share the results and related logs with me?
Also, have you tried reverting this commit to see whether that fixes the problem?
Thanks.

Regards,
Likun


[-- Attachment #2: 0001-drm-amdgpu-debug-for-gfx-v8-Stoney-pm-suspend.patch --]
[-- Type: application/octet-stream, Size: 1492 bytes --]

From f6ad633767b4178fef2040711417e71e93f412b8 Mon Sep 17 00:00:00 2001
From: Likun Gao <Likun.Gao@amd.com>
Date: Wed, 31 Jul 2019 13:52:07 +0800
Subject: [PATCH] drm/amdgpu: debug for gfx v8 Stoney pm-suspend

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index c8793e6..5981ff5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -36,6 +36,8 @@
  */
 void amdgpu_gfx_rlc_enter_safe_mode(struct amdgpu_device *adev)
 {
+	printk("gfxv8 safe mode state is %d to enter\n",
+	       adev->gfx.rlc.in_safe_mode);
 	if (adev->gfx.rlc.in_safe_mode)
 		return;
 
@@ -60,6 +62,8 @@ void amdgpu_gfx_rlc_enter_safe_mode(struct amdgpu_device *adev)
  */
 void amdgpu_gfx_rlc_exit_safe_mode(struct amdgpu_device *adev)
 {
+	printk("gfxv8 safe mode state is %d to exit\n",
+	       adev->gfx.rlc.in_safe_mode);
 	if (!(adev->gfx.rlc.in_safe_mode))
 		return;
 
@@ -145,7 +149,7 @@ int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev)
 	dst_ptr = adev->gfx.rlc.cs_ptr;
 	adev->gfx.rlc.funcs->get_csb_buffer(adev, dst_ptr);
 	amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
-	amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
+	//amdgpu_bo_unpin(adev->gfx.rlc.clear_state_obj);
 	amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
 
 	return 0;
-- 
2.7.4


