All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry
@ 2023-05-16 23:34 Mario Limonciello
  2023-05-16 23:34 ` [PATCH v2 1/3] drm/amd: Flush any delayed gfxoff on suspend entry Mario Limonciello
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Mario Limonciello @ 2023-05-16 23:34 UTC (permalink / raw)
  To: amd-gfx
  Cc: anson.tsao, Tim Huang, Juan.Martinez, Mario Limonciello, richard.gong

DCN 3.1.4 s2idle entry will hang
occasionally on s2idle entry, but only if running Wayland and only
when using `systemctl suspend`, not `echo mem | tee /sys/power/state`.

This happens because using `systemctl suspend` will cause the screen
to lock right before writing mem into /sys/power/state.

This causes a delayed GFXOFF entry to be scheduled right before s2idle
entry.  If the workqueue doesn't get processed before the RLC is turned
off the system is hung. Even if the workqueue *does* get processed, there
is a race between the APU microcontrollers and driver for whether GFX
is actually powered off when RLC is turned off.

To avoid this issue, flush the workqueue on s2idle entry and ensure that
GFX is really in GFXOFF before any sensitive register accesses occur.

Mario Limonciello (3):
  drm/amd: Flush any delayed gfxoff on suspend entry
  drm/amd: Poll for GFX core to be off
  drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 18 ++++++++++++++++
 drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  4 ++--
 4 files changed, 46 insertions(+), 2 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 1/3] drm/amd: Flush any delayed gfxoff on suspend entry
  2023-05-16 23:34 [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry Mario Limonciello
@ 2023-05-16 23:34 ` Mario Limonciello
  2023-05-16 23:34 ` [PATCH v2 2/3] drm/amd: Poll for GFX core to be off Mario Limonciello
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Mario Limonciello @ 2023-05-16 23:34 UTC (permalink / raw)
  To: amd-gfx
  Cc: anson.tsao, Tim Huang, Juan.Martinez, Mario Limonciello, richard.gong

DCN 3.1.4 is reported to hang on s2idle entry if graphics activity
is happening during entry.  This is because GFXOFF was scheduled as
delayed but RLC gets disabled in s2idle entry sequence which will
hang GFX IP if not already in GFXOFF.

To help this problem, flush any delayed work for GFXOFF early in
s2idle entry sequence to ensure that it's off when RLC is changed.

Cc: stable@vger.kernel.org # 6.1+
Suggested-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index a9d9bbe8586b..059139f1f973 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4316,6 +4316,7 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
 		drm_fb_helper_set_suspend_unlocked(adev_to_drm(adev)->fb_helper, true);
 
 	cancel_delayed_work_sync(&adev->delayed_init_work);
+	flush_delayed_work(&adev->gfx.gfx_off_delay_work);
 
 	amdgpu_ras_suspend(adev);
 
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 2/3] drm/amd: Poll for GFX core to be off
  2023-05-16 23:34 [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry Mario Limonciello
  2023-05-16 23:34 ` [PATCH v2 1/3] drm/amd: Flush any delayed gfxoff on suspend entry Mario Limonciello
@ 2023-05-16 23:34 ` Mario Limonciello
  2023-05-16 23:34 ` [PATCH v2 3/3] drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11 Mario Limonciello
  2023-05-17  4:43 ` [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry Lazar, Lijo
  3 siblings, 0 replies; 11+ messages in thread
From: Mario Limonciello @ 2023-05-16 23:34 UTC (permalink / raw)
  To: amd-gfx
  Cc: anson.tsao, Tim Huang, Juan.Martinez, Mario Limonciello, richard.gong

If GFXOFF was flushed during suspend entry it may take some time
for GFX core to be powered down.  Ensure that it's powered off
before continuing any operations that may try to utilize related
IP. This avoids hangs from stopping RLC as well as problems with
fence interrupts timing out during s2idle entry and exit.

Cc: stable@vger.kernel.org # 6.1
Suggested-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v1->v2:
 * Only poll in the s0ix case not all GFXOFF cases
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 24 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 18 ++++++++++++++++
 drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
 3 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 059139f1f973..59d5fc65276c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3063,6 +3063,26 @@ static void amdgpu_device_delay_enable_gfx_off(struct work_struct *work)
 		adev->gfx.gfx_off_state = true;
 }
 
+static int amdgpu_device_ensure_gfx_off(struct amdgpu_device *adev)
+{
+	int i, r;
+
+	if (!adev->in_s0ix)
+		return 0;
+
+	for (i = adev->num_ip_blocks - 1; i >= 0; i--) {
+		if (adev->ip_blocks[i].version->type != AMD_IP_BLOCK_TYPE_GFX)
+			continue;
+		if (!adev->ip_blocks[i].version->funcs->wait_for_off)
+			continue;
+		r = adev->ip_blocks[i].version->funcs->wait_for_off((void *)adev);
+		if (r)
+			return r;
+	}
+
+	return 0;
+}
+
 /**
  * amdgpu_device_ip_suspend_phase1 - run suspend for hardware IPs (phase 1)
  *
@@ -4318,6 +4338,10 @@ int amdgpu_device_suspend(struct drm_device *dev, bool fbcon)
 	cancel_delayed_work_sync(&adev->delayed_init_work);
 	flush_delayed_work(&adev->gfx.gfx_off_delay_work);
 
+	r = amdgpu_device_ensure_gfx_off(adev);
+	if (r)
+		return r;
+
 	amdgpu_ras_suspend(adev);
 
 	amdgpu_device_ip_suspend_phase1(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 4b7224de879e..dcbdb2641086 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -4434,6 +4434,23 @@ static int gfx_v11_0_wait_for_idle(void *handle)
 	return -ETIMEDOUT;
 }
 
+
+static int gfx_v11_0_wait_for_off(void *handle)
+{
+	struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+	u32 tmp;
+	int i;
+
+	for (i = 0; i < adev->usec_timeout; i++) {
+		tmp = RREG32_SOC15(GC, 0, regGFX_IMU_MSG_FLAGS);
+		if (!(tmp & 0x06))
+			return 0;
+		udelay(1);
+	}
+	dev_dbg(adev->dev, "GFX IMU is %x\n", tmp);
+	return -ETIMEDOUT;
+}
+
 static int gfx_v11_0_soft_reset(void *handle)
 {
 	u32 grbm_soft_reset = 0;
@@ -6109,6 +6126,7 @@ static const struct amd_ip_funcs gfx_v11_0_ip_funcs = {
 	.resume = gfx_v11_0_resume,
 	.is_idle = gfx_v11_0_is_idle,
 	.wait_for_idle = gfx_v11_0_wait_for_idle,
+	.wait_for_off = gfx_v11_0_wait_for_off,
 	.soft_reset = gfx_v11_0_soft_reset,
 	.check_soft_reset = gfx_v11_0_check_soft_reset,
 	.post_soft_reset = gfx_v11_0_post_soft_reset,
diff --git a/drivers/gpu/drm/amd/include/amd_shared.h b/drivers/gpu/drm/amd/include/amd_shared.h
index f175e65b853a..ce2e2b6fd6ff 100644
--- a/drivers/gpu/drm/amd/include/amd_shared.h
+++ b/drivers/gpu/drm/amd/include/amd_shared.h
@@ -298,6 +298,7 @@ struct amd_ip_funcs {
 	int (*resume)(void *handle);
 	bool (*is_idle)(void *handle);
 	int (*wait_for_idle)(void *handle);
+	int (*wait_for_off)(void *handle);
 	bool (*check_soft_reset)(void *handle);
 	int (*pre_soft_reset)(void *handle);
 	int (*soft_reset)(void *handle);
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH v2 3/3] drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11
  2023-05-16 23:34 [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry Mario Limonciello
  2023-05-16 23:34 ` [PATCH v2 1/3] drm/amd: Flush any delayed gfxoff on suspend entry Mario Limonciello
  2023-05-16 23:34 ` [PATCH v2 2/3] drm/amd: Poll for GFX core to be off Mario Limonciello
@ 2023-05-16 23:34 ` Mario Limonciello
  2023-05-17  4:43 ` [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry Lazar, Lijo
  3 siblings, 0 replies; 11+ messages in thread
From: Mario Limonciello @ 2023-05-16 23:34 UTC (permalink / raw)
  To: amd-gfx
  Cc: Tim Huang, richard.gong, Juan.Martinez, Mario Limonciello,
	Alexander Deucher, anson.tsao

RLC suspend in s0ix is unncessary as the SMU and IMU jointly
manages graphics power state.

Suggested-by: Alexander Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
---
v1->v2:
 * Skip RLC all the time instead of adding safety to it
---
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index 4dea79a0c5b5..f8510ce1c4f7 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -1549,9 +1549,9 @@ static int smu_disable_dpms(struct smu_context *smu)
 
 	/*
 	 * For SMU 13.0.4/11, PMFW will handle the features disablement properly
-	 * for gpu reset case. Driver involvement is unnecessary.
+	 * for gpu reset and s0ix case. Driver involvement is unnecessary.
 	 */
-	if (amdgpu_in_reset(adev)) {
+	if (amdgpu_in_reset(adev) || adev->in_s0ix) {
 		switch (adev->ip_versions[MP1_HWIP][0]) {
 		case IP_VERSION(13, 0, 4):
 		case IP_VERSION(13, 0, 11):
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry
  2023-05-16 23:34 [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry Mario Limonciello
                   ` (2 preceding siblings ...)
  2023-05-16 23:34 ` [PATCH v2 3/3] drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11 Mario Limonciello
@ 2023-05-17  4:43 ` Lazar, Lijo
  2023-05-17  4:55   ` Limonciello, Mario
  3 siblings, 1 reply; 11+ messages in thread
From: Lazar, Lijo @ 2023-05-17  4:43 UTC (permalink / raw)
  To: Mario Limonciello, amd-gfx
  Cc: anson.tsao, Juan.Martinez, richard.gong, Tim Huang



On 5/17/2023 5:04 AM, Mario Limonciello wrote:
> DCN 3.1.4 s2idle entry will hang
> occasionally on s2idle entry, but only if running Wayland and only
> when using `systemctl suspend`, not `echo mem | tee /sys/power/state`.
> 
> This happens because using `systemctl suspend` will cause the screen
> to lock right before writing mem into /sys/power/state.
> 

A couple of things on this since this mentions systemctl suspend -

1) If in s2idle, it's supposed to immediately signal and not schedule 
delayed work

3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete gfxoff 
allow signal during suspend without delay

2) RLC is never stopped on GFX 10 or greater.

Wondering if the code hides something else because of the timing.

Thanks,
Lijo

> This causes a delayed GFXOFF entry to be scheduled right before s2idle
> entry.  If the workqueue doesn't get processed before the RLC is turned
> off the system is hung. Even if the workqueue *does* get processed, there
> is a race between the APU microcontrollers and driver for whether GFX
> is actually powered off when RLC is turned off.
> 
> To avoid this issue, flush the workqueue on s2idle entry and ensure that
> GFX is really in GFXOFF before any sensitive register accesses occur.
> 
> Mario Limonciello (3):
>    drm/amd: Flush any delayed gfxoff on suspend entry
>    drm/amd: Poll for GFX core to be off
>    drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11
> 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 ++++++++++++++++++++++
>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 18 ++++++++++++++++
>   drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  4 ++--
>   4 files changed, 46 insertions(+), 2 deletions(-)
> 

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry
  2023-05-17  4:43 ` [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry Lazar, Lijo
@ 2023-05-17  4:55   ` Limonciello, Mario
  2023-05-17  5:07     ` Lazar, Lijo
  0 siblings, 1 reply; 11+ messages in thread
From: Limonciello, Mario @ 2023-05-17  4:55 UTC (permalink / raw)
  To: Lazar, Lijo, Mario Limonciello, amd-gfx
  Cc: anson.tsao, Juan.Martinez, richard.gong, Tim Huang

[-- Attachment #1: Type: text/plain, Size: 3214 bytes --]


On 5/16/2023 11:43 PM, Lazar, Lijo wrote:
>
>
> On 5/17/2023 5:04 AM, Mario Limonciello wrote:
>> DCN 3.1.4 s2idle entry will hang
>> occasionally on s2idle entry, but only if running Wayland and only
>> when using `systemctl suspend`, not `echo mem | tee /sys/power/state`.
>>
>> This happens because using `systemctl suspend` will cause the screen
>> to lock right before writing mem into /sys/power/state.
>>
>
> A couple of things on this since this mentions systemctl suspend -
>
> 1) If in s2idle, it's supposed to immediately signal and not schedule 
> delayed work
>
> 3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete gfxoff 
> allow signal during suspend without delay

It looks like dead code to me now actually.

amdgpu_device_set_pg_state() skips GFX, so gfxoff control never gets 
called as part of suspend path.

>
> 2) RLC is never stopped on GFX 10 or greater.
>
System was hanging before this series.

Patch 3 "alone" matches this behavior as described above to skip RLC 
suspend but two problems happen:

1) GFXOFF workqueue doesn't get flushed and so driver's request for 
GFXOFF can happen at wrong time.

2) If suspend entry happens before GFXOFF is really asserted lots of 
errors on resume. IE:

[   63.095227] [drm] Fence fallback timer expired on ring sdma0
[   63.098360] [drm] ring gfx_32772.1.1 was added
[   63.099439] [drm] ring compute_32772.2.2 was added
[   63.100460] [drm] ring sdma_32772.3.3 was added
[   63.100504] [drm] ring gfx_32772.1.1 test pass
[   63.607166] [drm] Fence fallback timer expired on ring gfx_32772.1.1
[   63.607234] [drm] ring gfx_32772.1.1 ib test pass
[   63.608964] [drm] ring compute_32772.2.2 test pass
[   64.119173] [drm] Fence fallback timer expired on ring compute_32772.2.2
[   64.119219] [drm] ring compute_32772.2.2 ib test pass
[   64.121364] [drm] ring sdma_32772.3.3 test pass
[   64.631422] [drm] Fence fallback timer expired on ring sdma_32772.3.3
[   64.631465] [drm] ring sdma_32772.3.3 ib test pass
[   65.143184] [drm] Fence fallback timer expired on ring sdma0

> Wondering if the code hides something else because of the timing.
> Thanks,
> Lijo
>
>> This causes a delayed GFXOFF entry to be scheduled right before s2idle
>> entry.  If the workqueue doesn't get processed before the RLC is turned
>> off the system is hung. Even if the workqueue *does* get processed, 
>> there
>> is a race between the APU microcontrollers and driver for whether GFX
>> is actually powered off when RLC is turned off.
>>
>> To avoid this issue, flush the workqueue on s2idle entry and ensure that
>> GFX is really in GFXOFF before any sensitive register accesses occur.
>>
>> Mario Limonciello (3):
>>    drm/amd: Flush any delayed gfxoff on suspend entry
>>    drm/amd: Poll for GFX core to be off
>>    drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11
>>
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 ++++++++++++++++++++++
>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 18 ++++++++++++++++
>>   drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
>>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  4 ++--
>>   4 files changed, 46 insertions(+), 2 deletions(-)
>>

[-- Attachment #2: Type: text/html, Size: 5313 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry
  2023-05-17  4:55   ` Limonciello, Mario
@ 2023-05-17  5:07     ` Lazar, Lijo
  2023-05-17  5:16       ` Limonciello, Mario
  0 siblings, 1 reply; 11+ messages in thread
From: Lazar, Lijo @ 2023-05-17  5:07 UTC (permalink / raw)
  To: Limonciello, Mario, Mario Limonciello, amd-gfx
  Cc: anson.tsao, Juan.Martinez, richard.gong, Tim Huang



On 5/17/2023 10:25 AM, Limonciello, Mario wrote:
> 
> On 5/16/2023 11:43 PM, Lazar, Lijo wrote:
>>
>>
>> On 5/17/2023 5:04 AM, Mario Limonciello wrote:
>>> DCN 3.1.4 s2idle entry will hang
>>> occasionally on s2idle entry, but only if running Wayland and only
>>> when using `systemctl suspend`, not `echo mem | tee /sys/power/state`.
>>>
>>> This happens because using `systemctl suspend` will cause the screen
>>> to lock right before writing mem into /sys/power/state.
>>>
>>
>> A couple of things on this since this mentions systemctl suspend -
>>
>> 1) If in s2idle, it's supposed to immediately signal and not schedule 
>> delayed work
>>
>> 3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete gfxoff 
>> allow signal during suspend without delay
> 
> It looks like dead code to me now actually.
> 
> amdgpu_device_set_pg_state() skips GFX, so gfxoff control never gets 
> called as part of suspend path.
> 

Ok, that means schedule happened sometime before. Can we remove this 
code also as there is a flush anyway with patch 1? Also, is there a need 
to call GFXOFF forcefully on S0ix suspend (any chance that gfxoff is not 
scheduled)?

>>
>> 2) RLC is never stopped on GFX 10 or greater.
>>
> System was hanging before this series.
> 
> Patch 3 "alone" matches this behavior as described above to skip RLC 
> suspend but two problems happen:
> 
> 1) GFXOFF workqueue doesn't get flushed and so driver's request for 
> GFXOFF can happen at wrong time.
> 
> 2) If suspend entry happens before GFXOFF is really asserted lots of 
> errors on resume. IE:
> 

Is patch 3 really required?  Does it make any difference?

Thanks,
Lijo

> [   63.095227] [drm] Fence fallback timer expired on ring sdma0
> [   63.098360] [drm] ring gfx_32772.1.1 was added
> [   63.099439] [drm] ring compute_32772.2.2 was added
> [   63.100460] [drm] ring sdma_32772.3.3 was added
> [   63.100504] [drm] ring gfx_32772.1.1 test pass
> [   63.607166] [drm] Fence fallback timer expired on ring gfx_32772.1.1
> [   63.607234] [drm] ring gfx_32772.1.1 ib test pass
> [   63.608964] [drm] ring compute_32772.2.2 test pass
> [   64.119173] [drm] Fence fallback timer expired on ring compute_32772.2.2
> [   64.119219] [drm] ring compute_32772.2.2 ib test pass
> [   64.121364] [drm] ring sdma_32772.3.3 test pass
> [   64.631422] [drm] Fence fallback timer expired on ring sdma_32772.3.3
> [   64.631465] [drm] ring sdma_32772.3.3 ib test pass
> [   65.143184] [drm] Fence fallback timer expired on ring sdma0
> 
>> Wondering if the code hides something else because of the timing.
>> Thanks,
>> Lijo
>>
>>> This causes a delayed GFXOFF entry to be scheduled right before s2idle
>>> entry.  If the workqueue doesn't get processed before the RLC is turned
>>> off the system is hung. Even if the workqueue *does* get processed, 
>>> there
>>> is a race between the APU microcontrollers and driver for whether GFX
>>> is actually powered off when RLC is turned off.
>>>
>>> To avoid this issue, flush the workqueue on s2idle entry and ensure that
>>> GFX is really in GFXOFF before any sensitive register accesses occur.
>>>
>>> Mario Limonciello (3):
>>>    drm/amd: Flush any delayed gfxoff on suspend entry
>>>    drm/amd: Poll for GFX core to be off
>>>    drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11
>>>
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 ++++++++++++++++++++++
>>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 18 ++++++++++++++++
>>>   drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
>>>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  4 ++--
>>>   4 files changed, 46 insertions(+), 2 deletions(-)
>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry
  2023-05-17  5:07     ` Lazar, Lijo
@ 2023-05-17  5:16       ` Limonciello, Mario
  2023-05-17  5:26         ` Lazar, Lijo
  0 siblings, 1 reply; 11+ messages in thread
From: Limonciello, Mario @ 2023-05-17  5:16 UTC (permalink / raw)
  To: Lazar, Lijo, Mario Limonciello, amd-gfx
  Cc: anson.tsao, Juan.Martinez, richard.gong, Tim Huang


On 5/17/2023 12:07 AM, Lazar, Lijo wrote:
>
>
> On 5/17/2023 10:25 AM, Limonciello, Mario wrote:
>>
>> On 5/16/2023 11:43 PM, Lazar, Lijo wrote:
>>>
>>>
>>> On 5/17/2023 5:04 AM, Mario Limonciello wrote:
>>>> DCN 3.1.4 s2idle entry will hang
>>>> occasionally on s2idle entry, but only if running Wayland and only
>>>> when using `systemctl suspend`, not `echo mem | tee /sys/power/state`.
>>>>
>>>> This happens because using `systemctl suspend` will cause the screen
>>>> to lock right before writing mem into /sys/power/state.
>>>>
>>>
>>> A couple of things on this since this mentions systemctl suspend -
>>>
>>> 1) If in s2idle, it's supposed to immediately signal and not 
>>> schedule delayed work
>>>
>>> 3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete gfxoff 
>>> allow signal during suspend without delay
>>
>> It looks like dead code to me now actually.
>>
>> amdgpu_device_set_pg_state() skips GFX, so gfxoff control never gets 
>> called as part of suspend path.
>>
>
> Ok, that means schedule happened sometime before. 
To come up with these patches I had a test kernel with extra prints that 
showed the function call orders.

With systemctl suspend there is a call to 
gfx_v11_0_get_gpu_clock_counter() from userspace IOCTL that triggers all 
this behavior.  Here is the function calls with the patched kernel:

[   32.720456] amdgpu 0000:c2:00.0: amdgpu: set GFX off state to 
enabled, count:1
[   32.720457] amdgpu 0000:c2:00.0: amdgpu: broke gfx_off_mutex for 
gfx_v11_0_get_gpu_clock_counter+0xa8/0xf0 [amdgpu], 
adev->gfx.gfx_off_state is 0
[   32.760475] PM: suspend entry (s2idle)
[   32.768996] Filesystems sync: 0.008 seconds
[   32.769310] Freezing user space processes
[   32.776527] Freezing user space processes completed (elapsed 0.007 
seconds)
[   32.776530] OOM killer disabled.
[   32.776531] Freezing remaining freezable tasks
[   32.777528] Freezing remaining freezable tasks completed (elapsed 
0.000 seconds)
[   32.777531] printk: Suspending console(s) (use no_console_suspend to 
debug)
[   32.817853] amdgpu 0000:c2:00.0: amdgpu: Delayed work to enable gfxoff
[   32.817857] amdgpu 0000:c2:00.0: amdgpu: 
amdgpu_dpm_set_powergating_by_smu by 
amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
[   32.818142] amdgpu 0000:c2:00.0: amdgpu: broke pm.mutex for 
amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
[   32.852099] amdgpu 0000:c2:00.0: amdgpu: smu_suspend: suspend called
[   32.852101] amdgpu 0000:c2:00.0: amdgpu: smu_disable_dpms: called

Without patch 1 the delayed work doesn't get called on entry ever.

> Can we remove this code also as there is a flush anyway with patch 1?

Sure.  Do you think it should go into patch 1 or on it's own?

> Also, is there a need to call GFXOFF forcefully on S0ix suspend (any 
> chance that gfxoff is not scheduled)?
>
If using "echo mem | sudo tee /sys/power/state" I've confirmed that it's 
already in GFXOFF.  I don't think this case should happen.
>>> 2) RLC is never stopped on GFX 10 or greater.
>>>
>> System was hanging before this series.
>>
>> Patch 3 "alone" matches this behavior as described above to skip RLC 
>> suspend but two problems happen:
>>
>> 1) GFXOFF workqueue doesn't get flushed and so driver's request for 
>> GFXOFF can happen at wrong time.
>>
>> 2) If suspend entry happens before GFXOFF is really asserted lots of 
>> errors on resume. IE:
>>
>
> Is patch 3 really required?  Does it make any difference?
>
No; patch 3 isn't really required with patches 1 and 2.

> Thanks,
> Lijo
>
>> [   63.095227] [drm] Fence fallback timer expired on ring sdma0
>> [   63.098360] [drm] ring gfx_32772.1.1 was added
>> [   63.099439] [drm] ring compute_32772.2.2 was added
>> [   63.100460] [drm] ring sdma_32772.3.3 was added
>> [   63.100504] [drm] ring gfx_32772.1.1 test pass
>> [   63.607166] [drm] Fence fallback timer expired on ring gfx_32772.1.1
>> [   63.607234] [drm] ring gfx_32772.1.1 ib test pass
>> [   63.608964] [drm] ring compute_32772.2.2 test pass
>> [   64.119173] [drm] Fence fallback timer expired on ring 
>> compute_32772.2.2
>> [   64.119219] [drm] ring compute_32772.2.2 ib test pass
>> [   64.121364] [drm] ring sdma_32772.3.3 test pass
>> [   64.631422] [drm] Fence fallback timer expired on ring sdma_32772.3.3
>> [   64.631465] [drm] ring sdma_32772.3.3 ib test pass
>> [   65.143184] [drm] Fence fallback timer expired on ring sdma0
>>
>>> Wondering if the code hides something else because of the timing.
>>> Thanks,
>>> Lijo
>>>
>>>> This causes a delayed GFXOFF entry to be scheduled right before s2idle
>>>> entry.  If the workqueue doesn't get processed before the RLC is 
>>>> turned
>>>> off the system is hung. Even if the workqueue *does* get processed, 
>>>> there
>>>> is a race between the APU microcontrollers and driver for whether GFX
>>>> is actually powered off when RLC is turned off.
>>>>
>>>> To avoid this issue, flush the workqueue on s2idle entry and ensure 
>>>> that
>>>> GFX is really in GFXOFF before any sensitive register accesses occur.
>>>>
>>>> Mario Limonciello (3):
>>>>    drm/amd: Flush any delayed gfxoff on suspend entry
>>>>    drm/amd: Poll for GFX core to be off
>>>>    drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11
>>>>
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 
>>>> ++++++++++++++++++++++
>>>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 18 ++++++++++++++++
>>>>   drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
>>>>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  4 ++--
>>>>   4 files changed, 46 insertions(+), 2 deletions(-)
>>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry
  2023-05-17  5:16       ` Limonciello, Mario
@ 2023-05-17  5:26         ` Lazar, Lijo
  2023-05-17  5:35           ` Limonciello, Mario
  0 siblings, 1 reply; 11+ messages in thread
From: Lazar, Lijo @ 2023-05-17  5:26 UTC (permalink / raw)
  To: Limonciello, Mario, Mario Limonciello, amd-gfx
  Cc: anson.tsao, Juan.Martinez, richard.gong, Tim Huang



On 5/17/2023 10:46 AM, Limonciello, Mario wrote:
> 
> On 5/17/2023 12:07 AM, Lazar, Lijo wrote:
>>
>>
>> On 5/17/2023 10:25 AM, Limonciello, Mario wrote:
>>>
>>> On 5/16/2023 11:43 PM, Lazar, Lijo wrote:
>>>>
>>>>
>>>> On 5/17/2023 5:04 AM, Mario Limonciello wrote:
>>>>> DCN 3.1.4 s2idle entry will hang
>>>>> occasionally on s2idle entry, but only if running Wayland and only
>>>>> when using `systemctl suspend`, not `echo mem | tee /sys/power/state`.
>>>>>
>>>>> This happens because using `systemctl suspend` will cause the screen
>>>>> to lock right before writing mem into /sys/power/state.
>>>>>
>>>>
>>>> A couple of things on this since this mentions systemctl suspend -
>>>>
>>>> 1) If in s2idle, it's supposed to immediately signal and not 
>>>> schedule delayed work
>>>>
>>>> 3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete gfxoff 
>>>> allow signal during suspend without delay
>>>
>>> It looks like dead code to me now actually.
>>>
>>> amdgpu_device_set_pg_state() skips GFX, so gfxoff control never gets 
>>> called as part of suspend path.
>>>
>>
>> Ok, that means schedule happened sometime before. 
> To come up with these patches I had a test kernel with extra prints that 
> showed the function call orders.
> 
> With systemctl suspend there is a call to 
> gfx_v11_0_get_gpu_clock_counter() from userspace IOCTL that triggers all 
> this behavior. 

I think we replaced this with golden timestamp value which doesn't 
require GFX register access.

  Here is the function calls with the patched kernel:
> 
> [   32.720456] amdgpu 0000:c2:00.0: amdgpu: set GFX off state to 
> enabled, count:1
> [   32.720457] amdgpu 0000:c2:00.0: amdgpu: broke gfx_off_mutex for 
> gfx_v11_0_get_gpu_clock_counter+0xa8/0xf0 [amdgpu], 
> adev->gfx.gfx_off_state is 0
> [   32.760475] PM: suspend entry (s2idle)
> [   32.768996] Filesystems sync: 0.008 seconds
> [   32.769310] Freezing user space processes
> [   32.776527] Freezing user space processes completed (elapsed 0.007 
> seconds)
> [   32.776530] OOM killer disabled.
> [   32.776531] Freezing remaining freezable tasks
> [   32.777528] Freezing remaining freezable tasks completed (elapsed 
> 0.000 seconds)
> [   32.777531] printk: Suspending console(s) (use no_console_suspend to 
> debug)
> [   32.817853] amdgpu 0000:c2:00.0: amdgpu: Delayed work to enable gfxoff
> [   32.817857] amdgpu 0000:c2:00.0: amdgpu: 
> amdgpu_dpm_set_powergating_by_smu by 
> amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
> [   32.818142] amdgpu 0000:c2:00.0: amdgpu: broke pm.mutex for 
> amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
> [   32.852099] amdgpu 0000:c2:00.0: amdgpu: smu_suspend: suspend called
> [   32.852101] amdgpu 0000:c2:00.0: amdgpu: smu_disable_dpms: called
> 
> Without patch 1 the delayed work doesn't get called on entry ever.
> 
>> Can we remove this code also as there is a flush anyway with patch 1?
> 
> Sure.  Do you think it should go into patch 1 or on it's own?
> 

Preferably in patch 1 itself as it explains why it was removed.

>> Also, is there a need to call GFXOFF forcefully on S0ix suspend (any 
>> chance that gfxoff is not scheduled)?
>>
> If using "echo mem | sudo tee /sys/power/state" I've confirmed that it's 
> already in GFXOFF.  I don't think this case should happen.
>>>> 2) RLC is never stopped on GFX 10 or greater.
>>>>
>>> System was hanging before this series.
>>>
>>> Patch 3 "alone" matches this behavior as described above to skip RLC 
>>> suspend but two problems happen:
>>>
>>> 1) GFXOFF workqueue doesn't get flushed and so driver's request for 
>>> GFXOFF can happen at wrong time.
>>>
>>> 2) If suspend entry happens before GFXOFF is really asserted lots of 
>>> errors on resume. IE:
>>>
>>
>> Is patch 3 really required?  Does it make any difference?
>>
> No; patch 3 isn't really required with patches 1 and 2.
> 

My preference is to drop patch 3 and not to have an additional place of 
in_s0ix check.

Thanks,
Lijo

>> Thanks,
>> Lijo
>>
>>> [   63.095227] [drm] Fence fallback timer expired on ring sdma0
>>> [   63.098360] [drm] ring gfx_32772.1.1 was added
>>> [   63.099439] [drm] ring compute_32772.2.2 was added
>>> [   63.100460] [drm] ring sdma_32772.3.3 was added
>>> [   63.100504] [drm] ring gfx_32772.1.1 test pass
>>> [   63.607166] [drm] Fence fallback timer expired on ring gfx_32772.1.1
>>> [   63.607234] [drm] ring gfx_32772.1.1 ib test pass
>>> [   63.608964] [drm] ring compute_32772.2.2 test pass
>>> [   64.119173] [drm] Fence fallback timer expired on ring 
>>> compute_32772.2.2
>>> [   64.119219] [drm] ring compute_32772.2.2 ib test pass
>>> [   64.121364] [drm] ring sdma_32772.3.3 test pass
>>> [   64.631422] [drm] Fence fallback timer expired on ring sdma_32772.3.3
>>> [   64.631465] [drm] ring sdma_32772.3.3 ib test pass
>>> [   65.143184] [drm] Fence fallback timer expired on ring sdma0
>>>
>>>> Wondering if the code hides something else because of the timing.
>>>> Thanks,
>>>> Lijo
>>>>
>>>>> This causes a delayed GFXOFF entry to be scheduled right before s2idle
>>>>> entry.  If the workqueue doesn't get processed before the RLC is 
>>>>> turned
>>>>> off the system is hung. Even if the workqueue *does* get processed, 
>>>>> there
>>>>> is a race between the APU microcontrollers and driver for whether GFX
>>>>> is actually powered off when RLC is turned off.
>>>>>
>>>>> To avoid this issue, flush the workqueue on s2idle entry and ensure 
>>>>> that
>>>>> GFX is really in GFXOFF before any sensitive register accesses occur.
>>>>>
>>>>> Mario Limonciello (3):
>>>>>    drm/amd: Flush any delayed gfxoff on suspend entry
>>>>>    drm/amd: Poll for GFX core to be off
>>>>>    drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11
>>>>>
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 
>>>>> ++++++++++++++++++++++
>>>>>   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c     | 18 ++++++++++++++++
>>>>>   drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
>>>>>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  4 ++--
>>>>>   4 files changed, 46 insertions(+), 2 deletions(-)
>>>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry
  2023-05-17  5:26         ` Lazar, Lijo
@ 2023-05-17  5:35           ` Limonciello, Mario
  2023-05-17  6:23             ` Limonciello, Mario
  0 siblings, 1 reply; 11+ messages in thread
From: Limonciello, Mario @ 2023-05-17  5:35 UTC (permalink / raw)
  To: Lazar, Lijo, Mario Limonciello, amd-gfx
  Cc: anson.tsao, Juan.Martinez, richard.gong, Tim Huang


On 5/17/2023 12:26 AM, Lazar, Lijo wrote:
>
>
> On 5/17/2023 10:46 AM, Limonciello, Mario wrote:
>>
>> On 5/17/2023 12:07 AM, Lazar, Lijo wrote:
>>>
>>>
>>> On 5/17/2023 10:25 AM, Limonciello, Mario wrote:
>>>>
>>>> On 5/16/2023 11:43 PM, Lazar, Lijo wrote:
>>>>>
>>>>>
>>>>> On 5/17/2023 5:04 AM, Mario Limonciello wrote:
>>>>>> DCN 3.1.4 s2idle entry will hang
>>>>>> occasionally on s2idle entry, but only if running Wayland and only
>>>>>> when using `systemctl suspend`, not `echo mem | tee 
>>>>>> /sys/power/state`.
>>>>>>
>>>>>> This happens because using `systemctl suspend` will cause the screen
>>>>>> to lock right before writing mem into /sys/power/state.
>>>>>>
>>>>>
>>>>> A couple of things on this since this mentions systemctl suspend -
>>>>>
>>>>> 1) If in s2idle, it's supposed to immediately signal and not 
>>>>> schedule delayed work
>>>>>
>>>>> 3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete 
>>>>> gfxoff allow signal during suspend without delay
>>>>
>>>> It looks like dead code to me now actually.
>>>>
>>>> amdgpu_device_set_pg_state() skips GFX, so gfxoff control never 
>>>> gets called as part of suspend path.
>>>>
>>>
>>> Ok, that means schedule happened sometime before. 
>> To come up with these patches I had a test kernel with extra prints 
>> that showed the function call orders.
>>
>> With systemctl suspend there is a call to 
>> gfx_v11_0_get_gpu_clock_counter() from userspace IOCTL that triggers 
>> all this behavior. 
>
> I think we replaced this with golden timestamp value which doesn't 
> require GFX register access.

Ah yes; through

5591a051b86b ("drm/amdgpu: refine get gpu clock counter method")

This wasn't part of the kernel this was originally reported on.

I suspect this would significantly decrease the likelihood of it 
occurring.  I'll confirm it.
I do think that patches 1/2 still make sense though because gfxoff can 
be triggered other ways too.

>
>  Here is the function calls with the patched kernel:
>>
>> [   32.720456] amdgpu 0000:c2:00.0: amdgpu: set GFX off state to 
>> enabled, count:1
>> [   32.720457] amdgpu 0000:c2:00.0: amdgpu: broke gfx_off_mutex for 
>> gfx_v11_0_get_gpu_clock_counter+0xa8/0xf0 [amdgpu], 
>> adev->gfx.gfx_off_state is 0
>> [   32.760475] PM: suspend entry (s2idle)
>> [   32.768996] Filesystems sync: 0.008 seconds
>> [   32.769310] Freezing user space processes
>> [   32.776527] Freezing user space processes completed (elapsed 0.007 
>> seconds)
>> [   32.776530] OOM killer disabled.
>> [   32.776531] Freezing remaining freezable tasks
>> [   32.777528] Freezing remaining freezable tasks completed (elapsed 
>> 0.000 seconds)
>> [   32.777531] printk: Suspending console(s) (use no_console_suspend 
>> to debug)
>> [   32.817853] amdgpu 0000:c2:00.0: amdgpu: Delayed work to enable 
>> gfxoff
>> [   32.817857] amdgpu 0000:c2:00.0: amdgpu: 
>> amdgpu_dpm_set_powergating_by_smu by 
>> amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
>> [   32.818142] amdgpu 0000:c2:00.0: amdgpu: broke pm.mutex for 
>> amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
>> [   32.852099] amdgpu 0000:c2:00.0: amdgpu: smu_suspend: suspend called
>> [   32.852101] amdgpu 0000:c2:00.0: amdgpu: smu_disable_dpms: called
>>
>> Without patch 1 the delayed work doesn't get called on entry ever.
>>
>>> Can we remove this code also as there is a flush anyway with patch 1?
>>
>> Sure.  Do you think it should go into patch 1 or on it's own?
>>
>
> Preferably in patch 1 itself as it explains why it was removed.
OK.
>
>>> Also, is there a need to call GFXOFF forcefully on S0ix suspend (any 
>>> chance that gfxoff is not scheduled)?
>>>
>> If using "echo mem | sudo tee /sys/power/state" I've confirmed that 
>> it's already in GFXOFF.  I don't think this case should happen.
>>>>> 2) RLC is never stopped on GFX 10 or greater.
>>>>>
>>>> System was hanging before this series.
>>>>
>>>> Patch 3 "alone" matches this behavior as described above to skip 
>>>> RLC suspend but two problems happen:
>>>>
>>>> 1) GFXOFF workqueue doesn't get flushed and so driver's request for 
>>>> GFXOFF can happen at wrong time.
>>>>
>>>> 2) If suspend entry happens before GFXOFF is really asserted lots 
>>>> of errors on resume. IE:
>>>>
>>>
>>> Is patch 3 really required?  Does it make any difference?
>>>
>> No; patch 3 isn't really required with patches 1 and 2.
>>
>
> My preference is to drop patch 3 and not to have an additional place 
> of in_s0ix check.
OK.
>
> Thanks,
> Lijo
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry
  2023-05-17  5:35           ` Limonciello, Mario
@ 2023-05-17  6:23             ` Limonciello, Mario
  0 siblings, 0 replies; 11+ messages in thread
From: Limonciello, Mario @ 2023-05-17  6:23 UTC (permalink / raw)
  To: Alex Deucher, Lazar, Lijo
  Cc: anson.tsao, Juan.Martinez, richard.gong, amd-gfx, Tim Huang


>> I think we replaced this with golden timestamp value which doesn't 
>> require GFX register access.
>
> Ah yes; through
>
> 5591a051b86b ("drm/amdgpu: refine get gpu clock counter method")
>
> This wasn't part of the kernel this was originally reported on.
>
> I suspect this would significantly decrease the likelihood of it 
> occurring. I'll confirm it.
> I do think that patches 1/2 still make sense though because gfxoff can 
> be triggered other ways too.
>
Confirmed that by adding:

5591a051b86b ("drm/amdgpu: refine get gpu clock counter method")
and
ea27ee2bea6b ("drm/amdgpu/gfx11: update gpu_clock_counter logic")
the original issue goes away.

I will still refine my patches and send a v3 up though as GFXOFF can be 
triggered other ways by userspace and we should avoid this bug.

@Alex:

Can you please queue up ea27ee2bea6b for this week's fixes and include 
the tags:

Cc: stable@vger.kernel.org # 6.1.y: 5591a051b86b: drm/amdgpu: refine get 
gpu clock counter method
Cc: stable@vger.kernel.org # 6.2.y: 5591a051b86b: drm/amdgpu: refine get 
gpu clock counter method
Cc: stable@vger.kernel.org # 6.3.y: 5591a051b86b: drm/amdgpu: refine get 
gpu clock counter method

>>  Here is the function calls with the patched kernel:
>>>
>>> [   32.720456] amdgpu 0000:c2:00.0: amdgpu: set GFX off state to 
>>> enabled, count:1
>>> [   32.720457] amdgpu 0000:c2:00.0: amdgpu: broke gfx_off_mutex for 
>>> gfx_v11_0_get_gpu_clock_counter+0xa8/0xf0 [amdgpu], 
>>> adev->gfx.gfx_off_state is 0
>>> [   32.760475] PM: suspend entry (s2idle)
>>> [   32.768996] Filesystems sync: 0.008 seconds
>>> [   32.769310] Freezing user space processes
>>> [   32.776527] Freezing user space processes completed (elapsed 
>>> 0.007 seconds)
>>> [   32.776530] OOM killer disabled.
>>> [   32.776531] Freezing remaining freezable tasks
>>> [   32.777528] Freezing remaining freezable tasks completed (elapsed 
>>> 0.000 seconds)
>>> [   32.777531] printk: Suspending console(s) (use no_console_suspend 
>>> to debug)
>>> [   32.817853] amdgpu 0000:c2:00.0: amdgpu: Delayed work to enable 
>>> gfxoff
>>> [   32.817857] amdgpu 0000:c2:00.0: amdgpu: 
>>> amdgpu_dpm_set_powergating_by_smu by 
>>> amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
>>> [   32.818142] amdgpu 0000:c2:00.0: amdgpu: broke pm.mutex for 
>>> amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
>>> [   32.852099] amdgpu 0000:c2:00.0: amdgpu: smu_suspend: suspend called
>>> [   32.852101] amdgpu 0000:c2:00.0: amdgpu: smu_disable_dpms: called
>>>
>>> Without patch 1 the delayed work doesn't get called on entry ever.
>>>
>>>> Can we remove this code also as there is a flush anyway with patch 1?
>>>
>>> Sure.  Do you think it should go into patch 1 or on it's own?
>>>
>>
>> Preferably in patch 1 itself as it explains why it was removed.
> OK.
>>
>>>> Also, is there a need to call GFXOFF forcefully on S0ix suspend 
>>>> (any chance that gfxoff is not scheduled)?
>>>>
>>> If using "echo mem | sudo tee /sys/power/state" I've confirmed that 
>>> it's already in GFXOFF.  I don't think this case should happen.
>>>>>> 2) RLC is never stopped on GFX 10 or greater.
>>>>>>
>>>>> System was hanging before this series.
>>>>>
>>>>> Patch 3 "alone" matches this behavior as described above to skip 
>>>>> RLC suspend but two problems happen:
>>>>>
>>>>> 1) GFXOFF workqueue doesn't get flushed and so driver's request 
>>>>> for GFXOFF can happen at wrong time.
>>>>>
>>>>> 2) If suspend entry happens before GFXOFF is really asserted lots 
>>>>> of errors on resume. IE:
>>>>>
>>>>
>>>> Is patch 3 really required?  Does it make any difference?
>>>>
>>> No; patch 3 isn't really required with patches 1 and 2.
>>>
>>
>> My preference is to drop patch 3 and not to have an additional place 
>> of in_s0ix check.
> OK.
>>
>> Thanks,
>> Lijo
>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-05-17  6:24 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-16 23:34 [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry Mario Limonciello
2023-05-16 23:34 ` [PATCH v2 1/3] drm/amd: Flush any delayed gfxoff on suspend entry Mario Limonciello
2023-05-16 23:34 ` [PATCH v2 2/3] drm/amd: Poll for GFX core to be off Mario Limonciello
2023-05-16 23:34 ` [PATCH v2 3/3] drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11 Mario Limonciello
2023-05-17  4:43 ` [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry Lazar, Lijo
2023-05-17  4:55   ` Limonciello, Mario
2023-05-17  5:07     ` Lazar, Lijo
2023-05-17  5:16       ` Limonciello, Mario
2023-05-17  5:26         ` Lazar, Lijo
2023-05-17  5:35           ` Limonciello, Mario
2023-05-17  6:23             ` Limonciello, Mario

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.