regressions.lists.linux.dev archive mirror
* Re: [Bug 217321] New: Intel platforms can't sleep deeper than PC3 during long idle
       [not found] <20230411204229.GA4168208@bhelgaas>
@ 2023-04-12 12:24 ` Linux regression tracking #adding (Thorsten Leemhuis)
  2023-04-12 12:30   ` Thorsten Leemhuis
  2023-05-04 15:23 ` Bjorn Helgaas
  1 sibling, 1 reply; 7+ messages in thread
From: Linux regression tracking #adding (Thorsten Leemhuis) @ 2023-04-12 12:24 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci; +Cc: Vidya Sagar, Linux kernel regressions list

A quick note before the usual boilerplate:

Bjorn, you asked KobaKo some questions, but didn't CC him -- and the
comment apparently did not make it to the bugzilla ticket. Something
wrong there? I wish I could CC him, but due to bugzilla's "never show
your email address to logged out users" policies I can't. I added a
comment to the ticket pointing him to your mail.

[TLDR for the rest of the mail: adding this report to the regression
tracking]

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

On 11.04.23 22:42, Bjorn Helgaas wrote:
> On Tue, Apr 11, 2023 at 08:32:04AM +0000, bugzilla-daemon@kernel.org wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=217321
>> ... 
>>         Regression: No
>>
>> [Symptom]
>> Intel CPU can't sleep deeper than PC3 during long idle
>> ~~~
>> Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
>> 15.08   75.02   0.00    0.00    0.00    0.00    0.00
>> 15.09   75.02   0.00    0.00    0.00    0.00    0.00
>> ^CPkg%pc2       Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
>> 15.38   68.97   0.00    0.00    0.00    0.00    0.00
>> 15.38   68.96   0.00    0.00    0.00    0.00    0.00
>> ~~~
>> [How to Reproduce]
>> 1. run turbostat to monitor
>> 2. leave machine idle
>> 3. turbostat shows the CPU only goes into pc2~pc3.
>>
>> [Misc]
>> The culprit is this commit:
>> a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
>> suspend/resume"")
>>
>> If a7152be79b62 is reverted, the issue is gone.
> 
> Relevant commits:
> 
>   4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
>   a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"")
> 
> 4ff116d0d5fd appeared in v6.1-rc1.  Prior to 4ff116d0d5fd, ASPM L1 PM
> Substates configuration was not preserved across suspend/resume, so
> the system *worked* after resume, but used more power than expected.
> 
> But 4ff116d0d5fd caused resume to fail completely on some systems, so
> a7152be79b62 reverted it.  With 4ff116d0d5fd reverted, ASPM L1 PM
> Substates configuration is likely not preserved across suspend/resume.
> a7152be79b62 appeared in v6.2-rc8 and was backported to the v6.1
> stable series starting with v6.1.12.
> 
> KobaKo, you don't mention any suspend/resume in this bug report, but
> neither patch should make any difference unless suspend/resume is
> involved.  Does the platform sleep as expected *before* suspend, but
> fail to sleep after resume?
> 
> Or maybe some individual device was suspended via runtime power
> management, and that device lost its L1 PM Substates config?  I don't
> know if there's a way to disable runtime PM easily.
> 
> The lspci output attached to the bugzilla was not collected as root,
> so it lacks the ASPM-related information.  Can you do this again with
> "sudo lspci -vv"?
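
For anyone collecting that data: a minimal Python sketch of how the
L1SubCtl1 line from "sudo lspci -vv" could be checked per device. The
sample text, the device names and the helper name l1ss_enabled are made
up for illustration, and the layout of the lspci output is assumed to
match the excerpt below; this is not taken from the bugzilla attachment.

```python
# Hypothetical excerpt shaped like `sudo lspci -vv` output; the devices and
# flag values are invented for illustration.
SAMPLE = """\
01:00.0 VGA compatible controller: Example GPU
\tCapabilities: [250 v1] L1 PM Substates
\t\tL1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
\t\tL1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
02:00.0 Non-Volatile memory controller: Example NVMe
\t\tL1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
"""


def l1ss_enabled(lspci_text):
    """Map each device address to the L1 substates enabled in L1SubCtl1."""
    enabled = {}
    device = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            device = line.split()[0]  # an unindented line starts a new device
        elif device and line.strip().startswith("L1SubCtl1:"):
            flags = line.split(":", 1)[1].split()
            # A trailing "+" marks an enabled substate, "-" a disabled one.
            enabled[device] = {f[:-1] for f in flags if f.endswith("+")}
    return enabled


for device, states in l1ss_enabled(SAMPLE).items():
    print(device, sorted(states) or "no L1 substates enabled")
```

A device that advertises the substates in L1SubCap but shows them all
disabled in L1SubCtl1 would be a candidate for having lost its config.
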
#regzbot introduced: a7152be79b62
https://bugzilla.kernel.org/show_bug.cgi?id=217321
#regzbot title: PCI/ASPM: Intel system does not sleep deeper than PC3
(caused by a revert applied to fix another regression)
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.


* Re: [Bug 217321] New: Intel platforms can't sleep deeper than PC3 during long idle
  2023-04-12 12:24 ` [Bug 217321] New: Intel platforms can't sleep deeper than PC3 during long idle Linux regression tracking #adding (Thorsten Leemhuis)
@ 2023-04-12 12:30   ` Thorsten Leemhuis
  0 siblings, 0 replies; 7+ messages in thread
From: Thorsten Leemhuis @ 2023-04-12 12:30 UTC (permalink / raw)
  To: Bjorn Helgaas, linux-pci; +Cc: Vidya Sagar, Linux kernel regressions list

On 12.04.23 14:24, Linux regression tracking #adding (Thorsten Leemhuis)
wrote:
> 
> Bjorn, you asked KobaKo some questions, but didn't CC him -- and the
> comment apparently did not make it to the bugzilla ticket. Something
> wrong there? I wish I could CC him, but due to bugzilla's "never show
> your email address to logged out users" policies I can't. I added a
> comment to the ticket pointing him to your mail.

Hah, stupid me, I assume you BCCed him.

/me needs more tea

Ciao, Thorsten


* Re: [Bug 217321] New: Intel platforms can't sleep deeper than PC3 during long idle
       [not found] <20230411204229.GA4168208@bhelgaas>
  2023-04-12 12:24 ` [Bug 217321] New: Intel platforms can't sleep deeper than PC3 during long idle Linux regression tracking #adding (Thorsten Leemhuis)
@ 2023-05-04 15:23 ` Bjorn Helgaas
  2023-05-05  6:56   ` Koba Ko
  1 sibling, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2023-05-04 15:23 UTC (permalink / raw)
  To: linux-pci
  Cc: Vidya Sagar, Koba Ko, Ajay Agarwal, Tasev Nikola, Mark Enriquez,
	Thomas Witt, regressions

[+cc Koba, Ajay, Tasev, Mark, Thomas, regressions list]

On Tue, Apr 11, 2023 at 03:42:29PM -0500, Bjorn Helgaas wrote:
> On Tue, Apr 11, 2023 at 08:32:04AM +0000, bugzilla-daemon@kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=217321
> > ... 
> >         Regression: No
> > 
> > [Symptom]
> > Intel CPU can't sleep deeper than PC3 during long idle
> > ~~~
> > Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
> > 15.08   75.02   0.00    0.00    0.00    0.00    0.00
> > 15.09   75.02   0.00    0.00    0.00    0.00    0.00
> > ^CPkg%pc2       Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
> > 15.38   68.97   0.00    0.00    0.00    0.00    0.00
> > 15.38   68.96   0.00    0.00    0.00    0.00    0.00
> > ~~~
> > [How to Reproduce]
> > 1. run turbostat to monitor
> > 2. leave machine idle
> > 3. turbostat shows the CPU only goes into pc2~pc3.
> >
> > [Misc]
> > The culprit is this commit:
> > a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
> > suspend/resume"")
> >
> > If a7152be79b62 is reverted, the issue is gone.
> 
> Relevant commits:
> 
>   4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
>   a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"")
> 
> 4ff116d0d5fd appeared in v6.1-rc1.  Prior to 4ff116d0d5fd, ASPM L1 PM
> Substates configuration was not preserved across suspend/resume, so
> the system *worked* after resume, but used more power than expected.
> 
> But 4ff116d0d5fd caused resume to fail completely on some systems, so
> a7152be79b62 reverted it.  With 4ff116d0d5fd reverted, ASPM L1 PM
> Substates configuration is likely not preserved across suspend/resume.
> a7152be79b62 appeared in v6.2-rc8 and was backported to the v6.1
> stable series starting with v6.1.12.
> 
> KobaKo, you don't mention any suspend/resume in this bug report, but
> neither patch should make any difference unless suspend/resume is
> involved.  Does the platform sleep as expected *before* suspend, but
> fail to sleep after resume?
>
> Or maybe some individual device was suspended via runtime power
> management, and that device lost its L1 PM Substates config?  I don't
> know if there's a way to disable runtime PM easily.

Koba, per your bugzilla update, the issue happens even without
suspend/resume.  And we don't know whether some particular device is
responsible.
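
One way to rule runtime PM in or out per device is the sysfs
power/control attribute: writing "on" keeps a device active, "auto"
lets runtime PM manage it. A minimal Python sketch, assuming the usual
/sys/bus/pci/devices layout; the helper name disable_runtime_pm is
hypothetical, and on a real system this needs root:

```python
from pathlib import Path


def disable_runtime_pm(root="/sys/bus/pci/devices"):
    """Write "on" to each device's power/control; return the devices changed."""
    changed = []
    for dev in sorted(Path(root).iterdir()):
        control = dev / "power" / "control"
        # Skip entries without the attribute and devices already pinned active.
        if control.is_file() and control.read_text().strip() != "on":
            control.write_text("on\n")
            changed.append(dev.name)
    return changed
```

If the package still stays at PC3 with every device pinned to "on",
runtime suspend of an individual device is unlikely to be the trigger.
Writing "auto" back restores the previous behaviour for devices that
were using it.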

But if we save/restore L1SS state, we can sleep deeper than PC3.  If
we don't preserve L1SS state, we can't.
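
To compare the two cases without reading the numbers by eye, a small
Python sketch that reports the deepest package C-state with non-zero
residency. It assumes turbostat output restricted to the package
C-state columns, as in the report; the sample is copied from the
report, and the helper name deepest_pkg_cstate is hypothetical.

```python
import re

# Sample rows copied from the bug report (turbostat package C-state columns).
SAMPLE = """\
Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
15.08   75.02   0.00    0.00    0.00    0.00    0.00
15.09   75.02   0.00    0.00    0.00    0.00    0.00
"""

# Matches "Pkg%pc2" ... "Pk%pc10", also when a stray "^C" precedes the name.
COLUMN_RE = re.compile(r"Pkg?%pc(\d+)$")


def deepest_pkg_cstate(text):
    """Return the deepest package C-state with non-zero residency, or None."""
    states = []      # C-state number for each column of the current header
    deepest = None
    for line in text.splitlines():
        fields = line.split()
        matches = [COLUMN_RE.search(f) for f in fields]
        if fields and all(matches):
            states = [int(m.group(1)) for m in matches]  # header row
            continue
        if not states or len(fields) != len(states):
            continue  # not a data row for the current header
        # Data rows are assumed to be purely numeric residency percentages.
        for state, value in zip(states, fields):
            if float(value) > 0 and (deepest is None or state > deepest):
                deepest = state
    return deepest


print(deepest_pkg_cstate(SAMPLE))
```

For the rows in the report this gives 3; with L1SS state preserved one
would expect a larger number.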

We definitely want to preserve the L1SS state, but we can't simply
apply 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume") again because it caused its own regressions [1,2,3]

So somebody needs to figure out what was wrong with 4ff116d0d5fd, fix
it, verify that it doesn't cause the issues reported by Tasev, Thomas,
and Mark, and then we can apply it.

Bjorn

[1] https://git.kernel.org/linus/a7152be79b62
[2] https://bugzilla.kernel.org/show_bug.cgi?id=216782
[3] https://bugzilla.kernel.org/show_bug.cgi?id=216877


* Re: [Bug 217321] New: Intel platforms can't sleep deeper than PC3 during long idle
  2023-05-04 15:23 ` Bjorn Helgaas
@ 2023-05-05  6:56   ` Koba Ko
  2023-05-22 11:45     ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 7+ messages in thread
From: Koba Ko @ 2023-05-05  6:56 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-pci, Vidya Sagar, Ajay Agarwal, Tasev Nikola,
	Mark Enriquez, Thomas Witt, regressions

On Thu, May 4, 2023 at 5:23 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>
> [+cc Koba, Ajay, Tasev, Mark, Thomas, regressions list]
>
> On Tue, Apr 11, 2023 at 03:42:29PM -0500, Bjorn Helgaas wrote:
> > On Tue, Apr 11, 2023 at 08:32:04AM +0000, bugzilla-daemon@kernel.org wrote:
> > > https://bugzilla.kernel.org/show_bug.cgi?id=217321
> > > ...
> > >         Regression: No
> > >
> > > [Symptom]
> > > Intel CPU can't sleep deeper than PC3 during long idle
> > > ~~~
> > > Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
> > > 15.08   75.02   0.00    0.00    0.00    0.00    0.00
> > > 15.09   75.02   0.00    0.00    0.00    0.00    0.00
> > > ^CPkg%pc2       Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
> > > 15.38   68.97   0.00    0.00    0.00    0.00    0.00
> > > 15.38   68.96   0.00    0.00    0.00    0.00    0.00
> > > ~~~
> > > [How to Reproduce]
> > > 1. run turbostat to monitor
> > > 2. leave machine idle
> > > 3. turbostat shows the CPU only goes into pc2~pc3.
> > >
> > > [Misc]
> > > The culprit is this commit:
> > > a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
> > > suspend/resume"")
> > >
> > > If a7152be79b62 is reverted, the issue is gone.
> >
> > Relevant commits:
> >
> >   4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
> >   a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"")
> >
> > 4ff116d0d5fd appeared in v6.1-rc1.  Prior to 4ff116d0d5fd, ASPM L1 PM
> > Substates configuration was not preserved across suspend/resume, so
> > the system *worked* after resume, but used more power than expected.
> >
> > But 4ff116d0d5fd caused resume to fail completely on some systems, so
> > a7152be79b62 reverted it.  With 4ff116d0d5fd reverted, ASPM L1 PM
> > Substates configuration is likely not preserved across suspend/resume.
> > a7152be79b62 appeared in v6.2-rc8 and was backported to the v6.1
> > stable series starting with v6.1.12.
> >
> > KobaKo, you don't mention any suspend/resume in this bug report, but
> > neither patch should make any difference unless suspend/resume is
> > involved.  Does the platform sleep as expected *before* suspend, but
> > fail to sleep after resume?
> >
> > Or maybe some individual device was suspended via runtime power
> > management, and that device lost its L1 PM Substates config?  I don't
> > know if there's a way to disable runtime PM easily.
>
> Koba, per your bugzilla update, the issue happens even without
> suspend/resume.  And we don't know whether some particular device is
> responsible.
>
> But if we save/restore L1SS state, we can sleep deeper than PC3.  If
> we don't preserve L1SS state, we can't.
>
> We definitely want to preserve the L1SS state, but we can't simply
> apply 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
> suspend/resume") again because it caused its own regressions [1,2,3]
>
> So somebody needs to figure out what was wrong with 4ff116d0d5fd, fix
> it, verify that it doesn't cause the issues reported by Tasev, Thomas,
> and Mark, and then we can apply it.
>
> Bjorn

Hello, I discussed this with Kai-Heng, and he mentioned that the GPU may
not be powered off. In that case the GPU needs L1SS to enter power saving.

I will investigate further in this direction.

>
> [1] https://git.kernel.org/linus/a7152be79b62
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=216782
> [3] https://bugzilla.kernel.org/show_bug.cgi?id=216877


* Re: [Bug 217321] New: Intel platforms can't sleep deeper than PC3 during long idle
  2023-05-05  6:56   ` Koba Ko
@ 2023-05-22 11:45     ` Linux regression tracking (Thorsten Leemhuis)
  2023-05-23 21:49       ` Bjorn Helgaas
  0 siblings, 1 reply; 7+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-05-22 11:45 UTC (permalink / raw)
  To: Koba Ko, Bjorn Helgaas
  Cc: linux-pci, Vidya Sagar, Ajay Agarwal, Tasev Nikola,
	Mark Enriquez, Thomas Witt, regressions

On 05.05.23 08:56, Koba Ko wrote:
> On Thu, May 4, 2023 at 5:23 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>> [+cc Koba, Ajay, Tasev, Mark, Thomas, regressions list]
>> On Tue, Apr 11, 2023 at 03:42:29PM -0500, Bjorn Helgaas wrote:
>>> On Tue, Apr 11, 2023 at 08:32:04AM +0000, bugzilla-daemon@kernel.org wrote:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217321
>>>> ...
>>>>         Regression: No
>>>>
>>>> [Symptom]
>>>> Intel CPU can't sleep deeper than PC3 during long idle
>>>> ~~~
>>>> Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
>>>> 15.08   75.02   0.00    0.00    0.00    0.00    0.00
>>>> 15.09   75.02   0.00    0.00    0.00    0.00    0.00
>>>> ^CPkg%pc2       Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
>>>> 15.38   68.97   0.00    0.00    0.00    0.00    0.00
>>>> 15.38   68.96   0.00    0.00    0.00    0.00    0.00
>>>> ~~~
>>>> [How to Reproduce]
>>>> 1. run turbostat to monitor
>>>> 2. leave machine idle
>>>> 3. turbostat shows the CPU only goes into pc2~pc3.
>>>>
>>>> [Misc]
>>>> The culprit is this commit:
>>>> a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
>>>> suspend/resume"")
>>>>
>>>> If a7152be79b62 is reverted, the issue is gone.
>>>
>>> Relevant commits:
>>>
>>>   4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
>>>   a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"")
>>>
>>> 4ff116d0d5fd appeared in v6.1-rc1.  Prior to 4ff116d0d5fd, ASPM L1 PM
>>> Substates configuration was not preserved across suspend/resume, so
>>> the system *worked* after resume, but used more power than expected.
>>>
>>> But 4ff116d0d5fd caused resume to fail completely on some systems, so
>>> a7152be79b62 reverted it.  With 4ff116d0d5fd reverted, ASPM L1 PM
>>> Substates configuration is likely not preserved across suspend/resume.
>>> a7152be79b62 appeared in v6.2-rc8 and was backported to the v6.1
>>> stable series starting with v6.1.12.
>>>
>>> KobaKo, you don't mention any suspend/resume in this bug report, but
>>> neither patch should make any difference unless suspend/resume is
>>> involved.  Does the platform sleep as expected *before* suspend, but
>>> fail to sleep after resume?
>>>
>>> Or maybe some individual device was suspended via runtime power
>>> management, and that device lost its L1 PM Substates config?  I don't
>>> know if there's a way to disable runtime PM easily.
>>
>> Koba, per your bugzilla update, the issue happens even without
>> suspend/resume.  And we don't know whether some particular device is
>> responsible.
>>
>> But if we save/restore L1SS state, we can sleep deeper than PC3.  If
>> we don't preserve L1SS state, we can't.
>>
>> We definitely want to preserve the L1SS state, but we can't simply
>> apply 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
>> suspend/resume") again because it caused its own regressions [1,2,3]
>>
>> So somebody needs to figure out what was wrong with 4ff116d0d5fd, fix
>> it, verify that it doesn't cause the issues reported by Tasev, Thomas,
>> and Mark, and then we can apply it.
> 
> Hello, I discussed this with Kai-Heng, and he mentioned that the GPU may
> not be powered off. In that case the GPU needs L1SS to enter power saving.
>
> I will investigate further in this direction.

Did anything come out of this?

FWIW, I'm considering dropping this from the list of tracked regressions.
Yes, this is a regression, but it's caused by a fix for another (worse)
regression -- so there is nothing we can do for now anyway (and Koba
seems motivated already to look properly into all of this). Or does
anyone consider this to be a problem?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke


* Re: [Bug 217321] New: Intel platforms can't sleep deeper than PC3 during long idle
  2023-05-22 11:45     ` Linux regression tracking (Thorsten Leemhuis)
@ 2023-05-23 21:49       ` Bjorn Helgaas
  2023-05-24  4:15         ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 1 reply; 7+ messages in thread
From: Bjorn Helgaas @ 2023-05-23 21:49 UTC (permalink / raw)
  To: Linux regressions mailing list
  Cc: Koba Ko, linux-pci, Vidya Sagar, Ajay Agarwal, Tasev Nikola,
	Mark Enriquez, Thomas Witt

On Mon, May 22, 2023 at 01:45:55PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
> On 05.05.23 08:56, Koba Ko wrote:
> > On Thu, May 4, 2023 at 5:23 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> >> [+cc Koba, Ajay, Tasev, Mark, Thomas, regressions list]
> >> On Tue, Apr 11, 2023 at 03:42:29PM -0500, Bjorn Helgaas wrote:
> >>> On Tue, Apr 11, 2023 at 08:32:04AM +0000, bugzilla-daemon@kernel.org wrote:
> >>>> https://bugzilla.kernel.org/show_bug.cgi?id=217321
> >>>> ...
> >>>>         Regression: No
> >>>>
> >>>> [Symptom]
> >>>> Intel CPU can't sleep deeper than PC3 during long idle
> >>>> ~~~
> >>>> Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
> >>>> 15.08   75.02   0.00    0.00    0.00    0.00    0.00
> >>>> 15.09   75.02   0.00    0.00    0.00    0.00    0.00
> >>>> ^CPkg%pc2       Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
> >>>> 15.38   68.97   0.00    0.00    0.00    0.00    0.00
> >>>> 15.38   68.96   0.00    0.00    0.00    0.00    0.00
> >>>> ~~~
> >>>> [How to Reproduce]
> >>>> 1. run turbostat to monitor
> >>>> 2. leave machine idle
> >>>> 3. turbostat shows the CPU only goes into pc2~pc3.
> >>>>
> >>>> [Misc]
> >>>> The culprit is this commit:
> >>>> a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
> >>>> suspend/resume"")
> >>>>
> >>>> If a7152be79b62 is reverted, the issue is gone.
> >>>
> >>> Relevant commits:
> >>>
> >>>   4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
> >>>   a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"")
> >>>
> >>> 4ff116d0d5fd appeared in v6.1-rc1.  Prior to 4ff116d0d5fd, ASPM L1 PM
> >>> Substates configuration was not preserved across suspend/resume, so
> >>> the system *worked* after resume, but used more power than expected.
> >>>
> >>> But 4ff116d0d5fd caused resume to fail completely on some systems, so
> >>> a7152be79b62 reverted it.  With 4ff116d0d5fd reverted, ASPM L1 PM
> >>> Substates configuration is likely not preserved across suspend/resume.
> >>> a7152be79b62 appeared in v6.2-rc8 and was backported to the v6.1
> >>> stable series starting with v6.1.12.
> >>>
> >>> KobaKo, you don't mention any suspend/resume in this bug report, but
> >>> neither patch should make any difference unless suspend/resume is
> >>> involved.  Does the platform sleep as expected *before* suspend, but
> >>> fail to sleep after resume?
> >>>
> >>> Or maybe some individual device was suspended via runtime power
> >>> management, and that device lost its L1 PM Substates config?  I don't
> >>> know if there's a way to disable runtime PM easily.
> >>
> >> Koba, per your bugzilla update, the issue happens even without
> >> suspend/resume.  And we don't know whether some particular device is
> >> responsible.
> >>
> >> But if we save/restore L1SS state, we can sleep deeper than PC3.  If
> >> we don't preserve L1SS state, we can't.
> >>
> >> We definitely want to preserve the L1SS state, but we can't simply
> >> apply 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
> >> suspend/resume") again because it caused its own regressions [1,2,3]
> >>
> >> So somebody needs to figure out what was wrong with 4ff116d0d5fd, fix
> >> it, verify that it doesn't cause the issues reported by Tasev, Thomas,
> >> and Mark, and then we can apply it.
> > 
> > Hello, I discussed this with Kai-Heng, and he mentioned that the GPU may
> > not be powered off. In that case the GPU needs L1SS to enter power saving.
> >
> > I will investigate further in this direction.
> 
> Did anything come out of this?
>
> FWIW, I'm considering dropping this from the list of tracked regressions.
> Yes, this is a regression, but it's caused by a fix for another (worse)
> regression -- so there is nothing we can do for now anyway (and Koba
> seems motivated already to look properly into all of this). Or does
> anyone consider this to be a problem?

I would drop this from the regression list.

Yes, bz 217321 is a bug, and yes, 4ff116d0d5fd is a partial fix for
it, but 4ff116d0d5fd causes worse problems (it breaks resume from
suspend) than just living with bz 217321, which is a "mere" power
consumption issue.

I updated bz 217321 to drop the "regression" label there.

Bjorn


* Re: [Bug 217321] New: Intel platforms can't sleep deeper than PC3 during long idle
  2023-05-23 21:49       ` Bjorn Helgaas
@ 2023-05-24  4:15         ` Linux regression tracking #update (Thorsten Leemhuis)
  0 siblings, 0 replies; 7+ messages in thread
From: Linux regression tracking #update (Thorsten Leemhuis) @ 2023-05-24  4:15 UTC (permalink / raw)
  To: Bjorn Helgaas, Linux regressions mailing list
  Cc: Koba Ko, linux-pci, Vidya Sagar, Ajay Agarwal, Tasev Nikola,
	Mark Enriquez, Thomas Witt

On 23.05.23 23:49, Bjorn Helgaas wrote:
> On Mon, May 22, 2023 at 01:45:55PM +0200, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 05.05.23 08:56, Koba Ko wrote:
>>> On Thu, May 4, 2023 at 5:23 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
>>>> [+cc Koba, Ajay, Tasev, Mark, Thomas, regressions list]
>>>> On Tue, Apr 11, 2023 at 03:42:29PM -0500, Bjorn Helgaas wrote:
>>>>> On Tue, Apr 11, 2023 at 08:32:04AM +0000, bugzilla-daemon@kernel.org wrote:
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=217321
>>>>>> ...
>>>>>>         Regression: No
>>>>>>
>>>>>> [Symptom]
>>>>>> Intel CPU can't sleep deeper than PC3 during long idle
>>>>>> ~~~
>>>>>> Pkg%pc2 Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
>>>>>> 15.08   75.02   0.00    0.00    0.00    0.00    0.00
>>>>>> 15.09   75.02   0.00    0.00    0.00    0.00    0.00
>>>>>> ^CPkg%pc2       Pkg%pc3 Pkg%pc6 Pkg%pc7 Pkg%pc8 Pkg%pc9 Pk%pc10
>>>>>> 15.38   68.97   0.00    0.00    0.00    0.00    0.00
>>>>>> 15.38   68.96   0.00    0.00    0.00    0.00    0.00
>>>>>> ~~~
>>>>>> [How to Reproduce]
>>>>>> 1. run turbostat to monitor
>>>>>> 2. leave machine idle
>>>>>> 3. turbostat shows the CPU only goes into pc2~pc3.
>>>>>>
>>>>>> [Misc]
>>>>>> The culprit is this commit:
>>>>>> a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
>>>>>> suspend/resume"")
>>>>>>
>>>>>> If a7152be79b62 is reverted, the issue is gone.
>>>>>
>>>>> Relevant commits:
>>>>>
>>>>>   4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for suspend/resume")
>>>>>   a7152be79b62 ("Revert "PCI/ASPM: Save L1 PM Substates Capability for suspend/resume"")
>>>>>
>>>>> 4ff116d0d5fd appeared in v6.1-rc1.  Prior to 4ff116d0d5fd, ASPM L1 PM
>>>>> Substates configuration was not preserved across suspend/resume, so
>>>>> the system *worked* after resume, but used more power than expected.
>>>>>
>>>>> But 4ff116d0d5fd caused resume to fail completely on some systems, so
>>>>> a7152be79b62 reverted it.  With 4ff116d0d5fd reverted, ASPM L1 PM
>>>>> Substates configuration is likely not preserved across suspend/resume.
>>>>> a7152be79b62 appeared in v6.2-rc8 and was backported to the v6.1
>>>>> stable series starting with v6.1.12.
>>>>>
>>>>> KobaKo, you don't mention any suspend/resume in this bug report, but
>>>>> neither patch should make any difference unless suspend/resume is
>>>>> involved.  Does the platform sleep as expected *before* suspend, but
>>>>> fail to sleep after resume?
>>>>>
>>>>> Or maybe some individual device was suspended via runtime power
>>>>> management, and that device lost its L1 PM Substates config?  I don't
>>>>> know if there's a way to disable runtime PM easily.
>>>>
>>>> Koba, per your bugzilla update, the issue happens even without
>>>> suspend/resume.  And we don't know whether some particular device is
>>>> responsible.
>>>>
>>>> But if we save/restore L1SS state, we can sleep deeper than PC3.  If
>>>> we don't preserve L1SS state, we can't.
>>>>
>>>> We definitely want to preserve the L1SS state, but we can't simply
>>>> apply 4ff116d0d5fd ("PCI/ASPM: Save L1 PM Substates Capability for
>>>> suspend/resume") again because it caused its own regressions [1,2,3]
>>>>
>>>> So somebody needs to figure out what was wrong with 4ff116d0d5fd, fix
>>>> it, verify that it doesn't cause the issues reported by Tasev, Thomas,
>>>> and Mark, and then we can apply it.
>>>
>>> Hello, I discussed this with Kai-Heng, and he mentioned that the GPU may
>>> not be powered off. In that case the GPU needs L1SS to enter power saving.
>>>
>>> I will investigate further in this direction.
>>
>> Did anything come out of this?
>>
>> FWIW, I'm considering dropping this from the list of tracked regressions.
>> Yes, this is a regression, but it's caused by a fix for another (worse)
>> regression -- so there is nothing we can do for now anyway (and Koba
>> seems motivated already to look properly into all of this). Or does
>> anyone consider this to be a problem?
> 
> I would drop this from the regression list.
> 
> Yes, bz 217321 is a bug, and yes, 4ff116d0d5fd is a partial fix for
> it, but 4ff116d0d5fd causes worse problems (it breaks resume from
> suspend) than just living with bz 217321, which is a "mere" power
> consumption issue.

Thx for confirming and putting it in better words.

#regzbot inconclusive: can't be solved for now, as this is a regression
caused by a fix for a regression (see list/bz for details)

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

