Regression in amdgpu driver / kernel v5.8.6

* Regression in amdgpu driver / kernel v5.8.6
@ 2020-09-28  9:10 Jean Delvare
  2020-09-28 21:44 ` Alex Deucher
  0 siblings, 1 reply; 5+ messages in thread
From: Jean Delvare @ 2020-09-28  9:10 UTC (permalink / raw)
  To: Navid Emamdoost; +Cc: Alex Deucher, Sasha Levin, amd-gfx

Hi all,

I have recently experienced a regression in stable kernel series 5.8.
The problem is that my display will randomly turn to black after just a
few seconds of inactivity. I have to switch to text console then back
to X to recover.

I bisected it down to:

commit b86657e328b601a5b98f8c4c21d108d356dbceee
Author: Navid Emamdoost <navid.emamdoost@gmail.com>
Date:   Sun Jun 14 02:09:44 2020 -0500

    drm/amdgpu: fix ref count leak in amdgpu_display_crtc_set_config
    
    [ Upstream commit e008fa6fb41544b63973a529b704ef342f47cc65 ]
    
Reverting this commit on top of 5.8.10 makes the problem go away. My
hardware setup:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 550 640SP / RX 560/560X] [1002:67ff] (rev ff)
01:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Baffin HDMI/DP Audio [Radeon RX 550 640SP / RX 560/560X] [1002:aae0]

This is a Sapphire Pulse Radeon RX550 card, with two Lenovo P27h-10
displays connected (one over DP, one over HDMI). I'm using option dc=0
otherwise the multi-screen setup doesn't work.

Looking at the patch, and at the logic of function
amdgpu_display_crtc_set_config() in general, I suspect that the middle
chunk of the patch is incorrect. Calling pm_runtime_put_autosuspend()
if pm_runtime_get_sync() failed is, albeit surprising, correct due to
how that function is implemented. Calling it right after
"adev->have_disp_power_ref = true" however looks wrong. The comment
right before that line pretty much implies that we *want* to keep the
reference. 

So I think we want to apply a partial revert like the following:

From: Jean Delvare <jdelvare@suse.de>
Subject: drm/amdgpu: restore proper ref count in amdgpu_display_crtc_set_config

A recent attempt to fix a ref count leak in
amdgpu_display_crtc_set_config() turned out to be doing too much and
"fixed" an intended decrease as if it were a leak. Undo that part to
restore the proper balance. This is the very nature of this function
to increase or decrease the power reference count depending on the
situation.

Consequences of this bug is that the power reference would
eventually get down to 0 while the display was still in use,
resulting in that display switching off unexpectedly.

Signed-off-by: Jean Delvare <jdelvare@suse.de>
Fixes: e008fa6fb415 ("drm/amdgpu: fix ref count leak in amdgpu_display_crtc_set_config")
Cc: stable@vger.kernel.org
Cc: Navid Emamdoost <navid.emamdoost@gmail.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- linux.orig/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c	2020-09-28 10:54:12.634245251 +0200
+++ linux/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c	2020-09-28 10:55:40.569906840 +0200
@@ -297,7 +297,7 @@ int amdgpu_display_crtc_set_config(struc
 	   take the current one */
 	if (active && !adev->have_disp_power_ref) {
 		adev->have_disp_power_ref = true;
-		goto out;
+		return ret;
 	}
 	/* if we have no active crtcs, then drop the power ref
 	   we got before */


-- 
Jean Delvare
SUSE L3 Support
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 5+ messages in thread