stable.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Thorsten Leemhuis <regressions@leemhuis.info>
To: Christian Casteyde <casteyde.christian@free.fr>,
	Kai-Heng Feng <kai.heng.feng@canonical.com>
Cc: stable@vger.kernel.org, regressions@lists.linux.dev,
	alexander deucher <alexander.deucher@amd.com>,
	gregkh@linuxfoundation.org,
	Mario Limonciello <mario.limonciello@amd.com>
Subject: Re: [REGRESSION] Laptop with Ryzen 4600H fails to resume video since 5.17.4 (works 5.17.3)
Date: Wed, 25 May 2022 09:29:19 +0200	[thread overview]
Message-ID: <ff1191fa-1cc9-24ab-4c7d-b3d41c442670@leemhuis.info> (raw)
In-Reply-To: <2175827.vFx2qVVIhK@geek500.localdomain>

On 24.05.22 22:54, Christian Casteyde wrote:
> Conclusion from gitlab discussion

Christian, Mario, thx for the updates. I'll remove the issue from the
list of tracked regressions then:

#regzbot invalid: tricky situation with other problems; the issue thus
not really qualifies as regression

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.


> There are three different problems at stake here:
> 
> 1. First suspend failure after boot is due to TAD driver.
> When it is loaded, first suspend deep fails.
> This is the root cause of the following sequence.
> TAD seems to be not completely functionnal on my laptop, sysfs returns IO 
> errors (missing handlers in dmesg).
> After the first failure, every "deep" attempt work.
> 
> 2.  When suspend deep fails, elogind (used by Slackware 15) falls back to 
> s2idle.
> This behaviour is documented in logind.conf man page, and normaly can be 
> configured (side note: I didn't manage to do so, it ignores my configuration 
> file).
> 
> 3. When suspend to s2idle, the GPU fails to suspend and need a reset.
> This reset must be done in pm_suspend (not totally ok if reset at resume).
> However, the laptop indeed goes into s2idle.
> In this state, the power button awakes it: this part is handled by the BIOS 
> and not the distribution (which shut downs if not suspended).
> This is what doesn't work anymore un 5.17.4 and 5.18.
> 
> The root cause is therefore the ACPI TAD preventing the first deep suspend to 
> complete, then elogind asking for a s2idle in fallback, then s2idle leaving 
> the APU in inconsistent state, that can only be fixed by a reset in 
> pm_suspend, and not pm_suspend_noirq or pm_suspend_late.
> 
> I will open a separate bug for the ACPI TAD problem. For now I will run 
> without this driver, as deep suspend works fine in this case and s2idle is 
> therefore useless for me.
> 
> CC
> 
> 
> Le lundi 23 mai 2022, 19:03:27 CEST Christian Casteyde a écrit :
>> I've opened the gitlab entry for this discussion:
>> https://gitlab.freedesktop.org/drm/amd/-/issues/2023
>>
>> I confirm I 'm not receiving mails anymore from the mailing list but I'll
>> follow gitlab.
>>
>> In reply to the patch proposed by Mario:
>> https://patchwork.freedesktop.org/patch/486836/
>> With this patch applied on vanilla 5.18 kernel:
>> - suspend still fails;
>> - after suspend attempt, the screen comes back with only the cursor;
>> - switching to a console let me get the following dmesg file.
>>
>> CC
>>
>> Le lundi 23 mai 2022, 15:02:53 CEST Christian Casteyde a écrit :
>>> Hello
>>>
>>> I've checked with 5.18 the problem is still there.
>>> Interestingly, I tried to revert the commit but it was rejected because of
>>>
>>> the change in the test from:
>>>         if (!adev->in_s0ix)
>>>
>>> to:
>>>       if (amdgpu_acpi_should_gpu_reset(adev))
>>>
>>> in amdgpu_pmops_suspend.
>>>
>>> I fixed the rejection, keeping shoud_gpu_reset, but it still fails.
>>> Then I changed to restore test of in_s0ix as it was in 5.17, and it works.
>>> I tried with a call to amd_gpu_asic_reset without testing at all in_s0ix,
>>> it works.
>>>
>>> Therefore, my APU wants a reset in amdgpu_pmops_suspend.
>>>
>>> By curiosity, I tried to do the reset in amdgpu_pmops_suspend_noirq as was
>>> intended in 5.18 original code, commenting out the test of
>>> amdgpu_acpi_should_gpu_reset(adev) (since this APU wants a reset).
>>> This does not work, I got the Fence timeout errors or freezes.
>>>
>>> If I leave  noirq function unchanged (original 5.18 code), and just add a
>>> reset in suspend() as was done in 5.17, it works.
>>>
>>> Therefore, my GPU does NOT want to be reset in noirq, the reset must be in
>>> suspend.
>>>
>>> In other words, I modified amdgpu_pmops_suspend (partial revert) like this
>>> and this works on my laptop:
>>>
>>> static int amdgpu_pmops_suspend(struct device *dev)
>>> {
>>>
>>> 	struct drm_device *drm_dev = dev_get_drvdata(dev);
>>> 	struct amdgpu_device *adev = drm_to_adev(drm_dev);
>>>
>>> +	int r;
>>>
>>> 	if (amdgpu_acpi_is_s0ix_active(adev))
>>> 	
>>> 		adev->in_s0ix = true;
>>> 	
>>> 	else
>>> 	
>>> 		adev->in_s3 = true;
>>>
>>> -	return amdgpu_device_suspend(drm_dev, true);
>>> +	r = amdgpu_device_suspend(drm_dev, true);
>>> +	if (r)
>>> +		return r;
>>> +	if (!adev->in_s0ix)
>>> +		return amdgpu_asic_reset(adev);
>>>
>>> 	return 0;
>>>
>>> }
>>>
>>> static int amdgpu_pmops_suspend_noirq(struct device *dev)
>>> {
>>>
>>> 	struct drm_device *drm_dev = dev_get_drvdata(dev);
>>> 	struct amdgpu_device *adev = drm_to_adev(drm_dev);
>>> 	
>>> 	if (amdgpu_acpi_should_gpu_reset(adev))
>>> 	
>>> 		return amdgpu_asic_reset(adev);
>>> 	
>>> 	return 0;
>>>
>>> }
>>>
>>> I don't know if other APU want a reset, in the same context, and how to
>>> differentiate all the cases, so I cannot go further, but I can test
>>> patches
>>> if needed.
>>>
>>> CC
>>>
>>> Le mercredi 18 mai 2022, 08:37:27 CEST Thorsten Leemhuis a écrit :
>>>> On 18.05.22 07:54, Kai-Heng Feng wrote:
>>>>> On Wed, May 18, 2022 at 1:52 PM Thorsten Leemhuis
>>>>>
>>>>> <regressions@leemhuis.info> wrote:
>>>>>> On 17.05.22 19:37, casteyde.christian@free.fr wrote:
>>>>>>> I've tryied to revert the offending commit on 5.18-rc7 (887f75cfd0da
>>>>>>> ("drm/amdgpu: Ensure HDA function is suspended before ASIC reset"),
>>>>>>> and
>>>>>>> the problem disappears so it's really this commit that breaks.
>>>>>>
>>>>>> In that case I'll update the regzbot status to make sure it's visible
>>>>>> as
>>>>>> regression introduced in the 5.18 cycle:
>>>>>>
>>>>>> #regzbot introduced: 887f75cfd0da
>>>>>>
>>>>>> BTW: obviously would be nice to get this fixed before 5.18 is
>>>>>> released
>>>>>> (which might already happen on Sunday), especially as the culprit
>>>>>> apparently was already backported to stable, but I guess that won't
>>>>>> be
>>>>>> easy...
>>>>>>
>>>>>> Which made me wondering: is reverting the culprit temporarily in
>>>>>> mainline (and reapplying it later with a fix) a option here?
>>>>>
>>>>> It's too soon to call it's the culprit.
>>>>
>>>> Well, sure, the root-cause might be somewhere else. But from the point
>>>> of kernel regressions (and tracking them) it's the culprit, as that's
>>>> the change that triggers the misbehavior. And that's how Linus
>>>> approaches these things as well when it comes to reverting to fix
>>>> regressions -- and he even might...
>>>>
>>>>> The suspend on the system
>>>>> doesn't work properly at the first place.
>>>>
>>>> ...ignore things like this, as long as a revert is unlikely to cause
>>>> more damage than good.
>>>>
>>>> Ciao. Thorsten
>>>>
>>>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker'
>>>>>> hat)
>>>>>>
>>>>>> P.S.: As the Linux kernel's regression tracker I deal with a lot of
>>>>>> reports and sometimes miss something important when writing mails
>>>>>> like
>>>>>> this. If that's the case here, don't hesitate to tell me in a public
>>>>>> reply, it's in everyone's interest to set the public record straight.
> 
> 
> 
> 
> 

  reply	other threads:[~2022-05-25  7:29 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-14 14:41 [REGRESSION] Laptop with Ryzen 4600H fails to resume video since 5.17.4 (works 5.17.3) Christian Casteyde
2022-05-14 15:12 ` Thorsten Leemhuis
2022-05-14 17:34   ` Christian Casteyde
2022-05-16  2:47     ` Kai-Heng Feng
2022-05-16  3:06       ` Mario Limonciello
2022-05-16 17:23       ` Christian Casteyde
2022-05-17  2:03         ` Kai-Heng Feng
2022-05-17  6:36           ` Christian Casteyde
2022-05-17  6:58             ` Kai-Heng Feng
2022-05-17 17:37               ` casteyde.christian
2022-05-18  5:52                 ` Thorsten Leemhuis
2022-05-18  5:54                   ` Kai-Heng Feng
2022-05-18  6:37                     ` Thorsten Leemhuis
2022-05-18 20:15                       ` Limonciello, Mario
2022-05-23 13:02                       ` Christian Casteyde
2022-05-23 14:00                         ` Limonciello, Mario
2022-05-23 17:03                         ` Christian Casteyde
2022-05-23 22:01                           ` Limonciello, Mario
2022-05-24 20:54                           ` Christian Casteyde
2022-05-25  7:29                             ` Thorsten Leemhuis [this message]
2022-05-17 17:38               ` casteyde.christian
2022-05-17 18:13                 ` Limonciello, Mario
2022-05-18 11:02                   ` casteyde.christian
2022-05-19 17:31                     ` Limonciello, Mario
2022-05-19 18:07                       ` Limonciello, Mario
2022-05-18  2:08                 ` Kai-Heng Feng
2022-05-18  2:19                   ` Mario Limonciello
2022-05-18  7:15                   ` Christian Casteyde
2022-05-18 10:53                     ` casteyde.christian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ff1191fa-1cc9-24ab-4c7d-b3d41c442670@leemhuis.info \
    --to=regressions@leemhuis.info \
    --cc=alexander.deucher@amd.com \
    --cc=casteyde.christian@free.fr \
    --cc=gregkh@linuxfoundation.org \
    --cc=kai.heng.feng@canonical.com \
    --cc=mario.limonciello@amd.com \
    --cc=regressions@lists.linux.dev \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).