regressions.lists.linux.dev archive mirror
* Re: 6.1.22: Resume from hibernate fails; bisected
       [not found] <b52bfd11-0d90-739b-be3e-058e246478f7@mailbox.org>
@ 2023-04-06 13:30 ` Linux regression tracking (Thorsten Leemhuis)
  2023-04-06 15:39   ` Rainer Fiebig
  0 siblings, 1 reply; 6+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-04-06 13:30 UTC (permalink / raw)
  To: Rainer Fiebig, stable, tim.huang, Alex Deucher,
	Linux kernel regressions list

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 06.04.23 14:06, Rainer Fiebig wrote:
> Hi! Since kernel 6.1.22 starting a resume from hibernate by hitting a
> key on the keyboard fails. However, if the PC was switched off and on
> again (or reset), the resume is OK. The APU is a Ryzen 5600G.
> 
> Bisecting between 6.1.21/22 turned up this:
> 
> 
> Author: Tim Huang <tim.huang@amd.com>
> Date:   Thu Mar 9 16:27:51 2023 +0800
> 
>     drm/amdgpu: skip ASIC reset for APUs when go to S4
> 
>     commit b589626674de94d977e81c99bf7905872b991197 upstream.
> 
>     For GC IP v11.0.4/11, PSP TMR need to be reserved
>     for ASIC mode2 reset. But for S4, when psp suspend,
>     it will destroy the TMR that fails the ASIC reset.
> [...]
> 
> 
> Reverting the commit solves the problem.
> Thanks.

Please try 6.1.23 and report back, because from the thread
https://lore.kernel.org/all/20230330160740.1dbff94b@schienar/
it sounds a lot like "drm/amdgpu: allow more APUs to do mode2 reset when
go to S4" might be fixing this, which went into 6.1.23.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.



* Re: 6.1.22: Resume from hibernate fails; bisected
  2023-04-06 13:30 ` 6.1.22: Resume from hibernate fails; bisected Linux regression tracking (Thorsten Leemhuis)
@ 2023-04-06 15:39   ` Rainer Fiebig
  2023-04-06 20:09     ` Greg KH
  0 siblings, 1 reply; 6+ messages in thread
From: Rainer Fiebig @ 2023-04-06 15:39 UTC (permalink / raw)
  To: Linux regressions mailing list, stable, tim.huang, Alex Deucher

On 06.04.23 at 15:30, Linux regression tracking (Thorsten Leemhuis) wrote:
> [CCing the regression list, as it should be in the loop for regressions:
> https://docs.kernel.org/admin-guide/reporting-regressions.html]
> 
> On 06.04.23 14:06, Rainer Fiebig wrote:
>> Hi! Since kernel 6.1.22 starting a resume from hibernate by hitting a
>> key on the keyboard fails. However, if the PC was switched off and on
>> again (or reset), the resume is OK. The APU is a Ryzen 5600G.
>>
>> Bisecting between 6.1.21/22 turned up this:
>>
>>
>> Author: Tim Huang <tim.huang@amd.com>
>> Date:   Thu Mar 9 16:27:51 2023 +0800
>>
>>     drm/amdgpu: skip ASIC reset for APUs when go to S4
>>
>>     commit b589626674de94d977e81c99bf7905872b991197 upstream.
>>
>>     For GC IP v11.0.4/11, PSP TMR need to be reserved
>>     for ASIC mode2 reset. But for S4, when psp suspend,
>>     it will destroy the TMR that fails the ASIC reset.
>> [...]
>>
>>
>> Reverting the commit solves the problem.
>> Thanks.
> 
> Please try 6.1.23 and report back, because from the thread
> https://lore.kernel.org/all/20230330160740.1dbff94b@schienar/
> it sounds a lot like "drm/amdgpu: allow more APUs to do mode2 reset when
> go to S4" might be fixing this, which went into 6.1.23.
Yes, 6.1.23 seems OK so far.

I think, however, that rc-kernels and LTS-kernels are different matters.
With a bleeding edge kernel, problems are to be expected.  But an
LTS-kernel is chosen for stability.  And this is the second time within
just a few weeks that I've been bitten by a time-consuming hibernate-bug
caused by a backport of a commit in amdgpu.

So I'm asking the devs to either test their patches more thoroughly or
to be a bit more conservative with what they recommend for backporting
to LTS-kernels.  Thanks.


Rainer Fiebig


> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
> 



* Re: 6.1.22: Resume from hibernate fails; bisected
  2023-04-06 15:39   ` Rainer Fiebig
@ 2023-04-06 20:09     ` Greg KH
       [not found]       ` <BY5PR12MB3873E2729AAA7D0FBB657611F6969@BY5PR12MB3873.namprd12.prod.outlook.com>
  2023-04-07 11:56       ` Rainer Fiebig
  0 siblings, 2 replies; 6+ messages in thread
From: Greg KH @ 2023-04-06 20:09 UTC (permalink / raw)
  To: Rainer Fiebig
  Cc: Linux regressions mailing list, stable, tim.huang, Alex Deucher

On Thu, Apr 06, 2023 at 05:39:07PM +0200, Rainer Fiebig wrote:
> On 06.04.23 at 15:30, Linux regression tracking (Thorsten Leemhuis) wrote:
> > [CCing the regression list, as it should be in the loop for regressions:
> > https://docs.kernel.org/admin-guide/reporting-regressions.html]
> > 
> > On 06.04.23 14:06, Rainer Fiebig wrote:
> >> Hi! Since kernel 6.1.22 starting a resume from hibernate by hitting a
> >> key on the keyboard fails. However, if the PC was switched off and on
> >> again (or reset), the resume is OK. The APU is a Ryzen 5600G.
> >>
> >> Bisecting between 6.1.21/22 turned up this:
> >>
> >>
> >> Author: Tim Huang <tim.huang@amd.com>
> >> Date:   Thu Mar 9 16:27:51 2023 +0800
> >>
> >>     drm/amdgpu: skip ASIC reset for APUs when go to S4
> >>
> >>     commit b589626674de94d977e81c99bf7905872b991197 upstream.
> >>
> >>     For GC IP v11.0.4/11, PSP TMR need to be reserved
> >>     for ASIC mode2 reset. But for S4, when psp suspend,
> >>     it will destroy the TMR that fails the ASIC reset.
> >> [...]
> >>
> >>
> >> Reverting the commit solves the problem.
> >> Thanks.
> > 
> > Please try 6.1.23 and report back, because from the thread
> > https://lore.kernel.org/all/20230330160740.1dbff94b@schienar/
> > it sounds a lot like "drm/amdgpu: allow more APUs to do mode2 reset when
> > go to S4" might be fixing this, which went into 6.1.23.
> Yes, 6.1.23 seems OK so far.
> 
> I think, however, that rc-kernels and LTS-kernels are different matters.
> With a bleeding edge kernel, problems are to be expected.  But an
> LTS-kernel is chosen for stability.  And this is the second time within
> just a few weeks that I've been bitten by a time-consuming hibernate-bug
> caused by a backport of a commit in amdgpu.
> 
> So I'm asking the devs to either test their patches more thoroughly or
> to be a bit more conservative with what they recommend for backporting
> to LTS-kernels.  Thanks.

Please feel free to suggest better ways to have automated tests for
stuff like this, or to help provide testing for the -rc LTS/stable
kernel releases.

We can't do this alone :)

thanks,

greg k-h


* Re: 6.1.22: Resume from hibernate fails; bisected
       [not found]       ` <BY5PR12MB3873E2729AAA7D0FBB657611F6969@BY5PR12MB3873.namprd12.prod.outlook.com>
@ 2023-04-07 10:01         ` Rainer Fiebig
  0 siblings, 0 replies; 6+ messages in thread
From: Rainer Fiebig @ 2023-04-07 10:01 UTC (permalink / raw)
  To: Huang, Tim
  Cc: Linux regressions mailing list, stable, Deucher, Alexander, Greg KH

On 07.04.23 at 05:40, Huang, Tim wrote:
> On Thu, Apr 06, 2023 at 05:39:07PM +0200, Rainer Fiebig wrote:
>> On 06.04.23 at 15:30, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> [CCing the regression list, as it should be in the loop for regressions:
>>> https://docs.kernel.org/admin-guide/reporting-regressions.html]
>>>
>>> On 06.04.23 14:06, Rainer Fiebig wrote:
>>>> Hi! Since kernel 6.1.22 starting a resume from hibernate by hitting a
>>>> key on the keyboard fails. However, if the PC was switched off and on
>>>> again (or reset), the resume is OK. The APU is a Ryzen 5600G.
>>>>
>>>> Bisecting between 6.1.21/22 turned up this:
>>>>
>>>>
>>>> Author: Tim Huang <tim.huang@amd.com>
>>>> Date:   Thu Mar 9 16:27:51 2023 +0800
>>>>
>>>>     drm/amdgpu: skip ASIC reset for APUs when go to S4
>>>>
>>>>     commit b589626674de94d977e81c99bf7905872b991197 upstream.
>>>>
>>>>     For GC IP v11.0.4/11, PSP TMR need to be reserved
>>>>     for ASIC mode2 reset. But for S4, when psp suspend,
>>>>     it will destroy the TMR that fails the ASIC reset.
>>>> [...]
>>>>
>>>>
>>>> Reverting the commit solves the problem.
>>>> Thanks.
>>>
>>> Please try 6.1.23 and report back, because from the thread
>>> https://lore.kernel.org/all/20230330160740.1dbff94b@schienar/
>>> it sounds a lot like "drm/amdgpu: allow more APUs to do mode2 reset when
>>> go to S4" might be fixing this, which went into 6.1.23.
>> Yes, 6.1.23 seems OK so far.
>>
> 
> 
> The patch "drm/amdgpu: allow more APUs to do mode2 reset when go to S4" is meant to fix this hibernate regression.
> 
> Sorry to have troubled you.
No problem, please don't take it personally. It wasn't a big deal and I
was just a bit grumpy yesterday.

Thanks for the info and have a good day!

Rainer




* Re: 6.1.22: Resume from hibernate fails; bisected
  2023-04-06 20:09     ` Greg KH
       [not found]       ` <BY5PR12MB3873E2729AAA7D0FBB657611F6969@BY5PR12MB3873.namprd12.prod.outlook.com>
@ 2023-04-07 11:56       ` Rainer Fiebig
  2023-04-07 12:14         ` Greg KH
  1 sibling, 1 reply; 6+ messages in thread
From: Rainer Fiebig @ 2023-04-07 11:56 UTC (permalink / raw)
  To: Greg KH; +Cc: Linux regressions mailing list, stable, tim.huang, Alex Deucher

On 06.04.23 at 22:09, Greg KH wrote:
> On Thu, Apr 06, 2023 at 05:39:07PM +0200, Rainer Fiebig wrote:
>> On 06.04.23 at 15:30, Linux regression tracking (Thorsten Leemhuis) wrote:
>>> [CCing the regression list, as it should be in the loop for regressions:
>>> https://docs.kernel.org/admin-guide/reporting-regressions.html]
>>>
>>> On 06.04.23 14:06, Rainer Fiebig wrote:
>>>> Hi! Since kernel 6.1.22 starting a resume from hibernate by hitting a
>>>> key on the keyboard fails. However, if the PC was switched off and on
>>>> again (or reset), the resume is OK. The APU is a Ryzen 5600G.
>>>>
>>>> Bisecting between 6.1.21/22 turned up this:
>>>>
>>>>
>>>> Author: Tim Huang <tim.huang@amd.com>
>>>> Date:   Thu Mar 9 16:27:51 2023 +0800
>>>>
>>>>     drm/amdgpu: skip ASIC reset for APUs when go to S4
>>>>
>>>>     commit b589626674de94d977e81c99bf7905872b991197 upstream.
>>>>
>>>>     For GC IP v11.0.4/11, PSP TMR need to be reserved
>>>>     for ASIC mode2 reset. But for S4, when psp suspend,
>>>>     it will destroy the TMR that fails the ASIC reset.
>>>> [...]
>>>>
>>>>
>>>> Reverting the commit solves the problem.
>>>> Thanks.
>>>
>>> Please try 6.1.23 and report back, because from the thread
>>> https://lore.kernel.org/all/20230330160740.1dbff94b@schienar/
>>> it sounds a lot like "drm/amdgpu: allow more APUs to do mode2 reset when
>>> go to S4" might be fixing this, which went into 6.1.23.
>> Yes, 6.1.23 seems OK so far.
>>
>> I think, however, that rc-kernels and LTS-kernels are different matters.
>> With a bleeding edge kernel, problems are to be expected.  But an
>> LTS-kernel is chosen for stability.  And this is the second time within
>> just a few weeks that I've been bitten by a time-consuming hibernate-bug
>> caused by a backport of a commit in amdgpu.
>>
>> So I'm asking the devs to either test their patches more thoroughly or
>> to be a bit more conservative with what they recommend for backporting
>> to LTS-kernels.  Thanks.
> 
> Please feel free to suggest better ways to have automated tests for
> stuff like this, or to help provide testing for the -rc LTS/stable
> kernel releases.
Well, I'm afraid I can't offer a panacea or the ultimate automated
quality assurance system.  But for the two cases that I've encountered
lately, a simple hibernate/resume would have shown that there's a
problem.  After all, that's how I and other users noticed it.
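
To make that concrete, here is a minimal sketch of what such a
hibernate/resume smoke test might look like. It assumes the rtcwake tool
from util-linux is installed, that the script runs as root, and that a
resume device (swap) is configured. The function names are made up for
illustration and are not part of any existing test suite. Note also that
an RTC alarm wake-up may not exercise exactly the same path as waking the
machine with a key press.

```python
import subprocess


def rtcwake_hibernate_cmd(wake_after_s: int) -> list[str]:
    """Build an rtcwake call that enters S4 ("disk") and arms the RTC alarm."""
    if wake_after_s <= 0:
        raise ValueError("wake_after_s must be positive")
    return ["rtcwake", "-m", "disk", "-s", str(wake_after_s)]


def run_cycles(cycles: int, wake_after_s: int = 60, runner=subprocess.run) -> int:
    """Run hibernate/resume cycles and return how many completed.

    Stops at the first cycle whose command exits non-zero. A resume that
    hangs never returns here, so an external watchdog (hypothetical, not
    shown) would have to catch that case.
    """
    completed = 0
    for _ in range(cycles):
        # rtcwake blocks until the machine has resumed (or failed to suspend).
        result = runner(rtcwake_hibernate_cmd(wake_after_s), check=False)
        if result.returncode != 0:
            break
        completed += 1
    return completed
```

Calling run_cycles(3) on a test box would attempt three cycles in a row.
The runner parameter exists only so the loop can be exercised without
actually hibernating the machine.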

So I think the primary line of defence against regressions remains the
developer himself who should try hard to imagine what ramifications his
patch might have and test it accordingly.  But I'm aware of the fact
that we are all only human.

Another idea might be to give patches that introduce new features or
only minimal improvements ample time to mature in the latest stable
kernel before backporting them to LTS-kernels, say three or four
point-releases.  Or to only backport fixes for bugs or security issues.

> 
> We can't do this alone :)
Right.  For now I can't commit to testing release-candidates for lack
of time.  But I try to bisect and report problems as soon as
possible so that they can be resolved quickly.

To avoid a false impression: kernelwise - and including amdgpu - I'm
rather happy with the current state of affairs.  Thanks to all!


Rainer Fiebig


* Re: 6.1.22: Resume from hibernate fails; bisected
  2023-04-07 11:56       ` Rainer Fiebig
@ 2023-04-07 12:14         ` Greg KH
  0 siblings, 0 replies; 6+ messages in thread
From: Greg KH @ 2023-04-07 12:14 UTC (permalink / raw)
  To: Rainer Fiebig
  Cc: Linux regressions mailing list, stable, tim.huang, Alex Deucher

On Fri, Apr 07, 2023 at 01:56:49PM +0200, Rainer Fiebig wrote:
> On 06.04.23 at 22:09, Greg KH wrote:
> > On Thu, Apr 06, 2023 at 05:39:07PM +0200, Rainer Fiebig wrote:
> >> On 06.04.23 at 15:30, Linux regression tracking (Thorsten Leemhuis) wrote:
> >>> [CCing the regression list, as it should be in the loop for regressions:
> >>> https://docs.kernel.org/admin-guide/reporting-regressions.html]
> >>>
> >>> On 06.04.23 14:06, Rainer Fiebig wrote:
> >>>> Hi! Since kernel 6.1.22 starting a resume from hibernate by hitting a
> >>>> key on the keyboard fails. However, if the PC was switched off and on
> >>>> again (or reset), the resume is OK. The APU is a Ryzen 5600G.
> >>>>
> >>>> Bisecting between 6.1.21/22 turned up this:
> >>>>
> >>>>
> >>>> Author: Tim Huang <tim.huang@amd.com>
> >>>> Date:   Thu Mar 9 16:27:51 2023 +0800
> >>>>
> >>>>     drm/amdgpu: skip ASIC reset for APUs when go to S4
> >>>>
> >>>>     commit b589626674de94d977e81c99bf7905872b991197 upstream.
> >>>>
> >>>>     For GC IP v11.0.4/11, PSP TMR need to be reserved
> >>>>     for ASIC mode2 reset. But for S4, when psp suspend,
> >>>>     it will destroy the TMR that fails the ASIC reset.
> >>>> [...]
> >>>>
> >>>>
> >>>> Reverting the commit solves the problem.
> >>>> Thanks.
> >>>
> >>> Please try 6.1.23 and report back, because from the thread
> >>> https://lore.kernel.org/all/20230330160740.1dbff94b@schienar/
> >>> it sounds a lot like "drm/amdgpu: allow more APUs to do mode2 reset when
> >>> go to S4" might be fixing this, which went into 6.1.23.
> >> Yes, 6.1.23 seems OK so far.
> >>
> >> I think, however, that rc-kernels and LTS-kernels are different matters.
> >> With a bleeding edge kernel, problems are to be expected.  But an
> >> LTS-kernel is chosen for stability.  And this is the second time within
> >> just a few weeks that I've been bitten by a time-consuming hibernate-bug
> >> caused by a backport of a commit in amdgpu.
> >>
> >> So I'm asking the devs to either test their patches more thoroughly or
> >> to be a bit more conservative with what they recommend for backporting
> >> to LTS-kernels.  Thanks.
> > 
> > Please feel free to suggest better ways to have automated tests for
> > stuff like this, or to help provide testing for the -rc LTS/stable
> > kernel releases.
> Well, I'm afraid I can't offer a panacea or the ultimate automated
> quality assurance system.  But for the two cases that I've encountered
> lately, a simple hibernate/resume would have shown that there's a
> problem.  After all, that's how I and other users noticed it.
> 
> So I think the primary line of defence against regressions remains the
> developer himself who should try hard to imagine what ramifications his
> patch might have and test it accordingly.  But I'm aware of the fact
> that we are all only human.
> 
> Another idea might be to give patches that introduce new features or
> only minimal improvements ample time to mature in the latest stable
> kernel before backporting them to LTS-kernels, say three or four
> point-releases.  Or to only backport fixes for bugs or security issues.

This is a long-running discussion.  How do you determine that a "bug
fix" should not be backported now?  For example, this bugfix that caused
a problem was a reported fix for something else, and it had passed the
automated testing system that the DRM developers have.  So why wait on
it?

There are always going to be slip-ups, and fixes needed for fixes, as we
can't test all hardware configurations or use cases.  All we can do is
react quickly to fix problems when they are reported.

And in this case, it was fixed _before_ you reported it, which to be
honest, is pretty fast :)

thanks,

greg k-h



Thread overview: 6+ messages
     [not found] <b52bfd11-0d90-739b-be3e-058e246478f7@mailbox.org>
2023-04-06 13:30 ` 6.1.22: Resume from hibernate fails; bisected Linux regression tracking (Thorsten Leemhuis)
2023-04-06 15:39   ` Rainer Fiebig
2023-04-06 20:09     ` Greg KH
     [not found]       ` <BY5PR12MB3873E2729AAA7D0FBB657611F6969@BY5PR12MB3873.namprd12.prod.outlook.com>
2023-04-07 10:01         ` Rainer Fiebig
2023-04-07 11:56       ` Rainer Fiebig
2023-04-07 12:14         ` Greg KH
