linux-kernel.vger.kernel.org archive mirror
* problem in changing from active to passive mode
@ 2021-10-24 13:02 Julia Lawall
  2021-10-24 22:44 ` Doug Smythies
  0 siblings, 1 reply; 19+ messages in thread
From: Julia Lawall @ 2021-10-24 13:02 UTC
  To: Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	linux-pm
  Cc: linux-kernel

Hello,

I have an Intel 6130 and an Intel 5218.  These machines have HWP.  They
are configured to boot with active mode and performance as the power
governor.  Since the following commit:

commit a365ab6b9dfbaf8fb4fb4cd5d8a4c55dc4fb8b1c (HEAD, refs/bisect/bad)
Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Date:   Mon Dec 14 21:09:26 2020 +0100

    cpufreq: intel_pstate: Implement the ->adjust_perf() callback

If I change the mode from active to passive, I have the impression that the
machine is no longer able to raise the core frequencies above the minimum.
Changing the mode back to active has no effect.  This persists if I reboot
to another kernel.

Here are some runs that illustrate the problem.  I have tested the
benchmark many times, and apart from this issue its performance is stable.

Intel 6130:

root@yeti-2:/tmp# java -jar dacapo-9.12-MR1-bach.jar avrora -n 3
===== DaCapo 9.12-MR1 avrora starting warmup 1 =====
===== DaCapo 9.12-MR1 avrora completed warmup 1 in 3420 msec =====
===== DaCapo 9.12-MR1 avrora starting warmup 2 =====
===== DaCapo 9.12-MR1 avrora completed warmup 2 in 2536 msec =====
===== DaCapo 9.12-MR1 avrora starting =====
===== DaCapo 9.12-MR1 avrora PASSED in 2502 msec =====
root@yeti-2:/tmp# echo passive | tee /sys/devices/system/cpu/intel_pstate/status
passive
root@yeti-2:/tmp#
root@yeti-2:/tmp# echo active | tee /sys/devices/system/cpu/intel_pstate/status
active
root@yeti-2:/tmp# java -jar dacapo-9.12-MR1-bach.jar avrora -n 3
===== DaCapo 9.12-MR1 avrora starting warmup 1 =====
===== DaCapo 9.12-MR1 avrora completed warmup 1 in 7561 msec =====
===== DaCapo 9.12-MR1 avrora starting warmup 2 =====
===== DaCapo 9.12-MR1 avrora completed warmup 2 in 6528 msec =====
===== DaCapo 9.12-MR1 avrora starting =====
===== DaCapo 9.12-MR1 avrora PASSED in 7796 msec =====

-------------------------------------------------------------------------

Intel 5218:

root@troll-2:/tmp# java -jar dacapo-9.12-MR1-bach.jar avrora -n 3
===== DaCapo 9.12-MR1 avrora starting warmup 1 =====
===== DaCapo 9.12-MR1 avrora completed warmup 1 in 2265 msec =====
===== DaCapo 9.12-MR1 avrora starting warmup 2 =====
===== DaCapo 9.12-MR1 avrora completed warmup 2 in 2033 msec =====
===== DaCapo 9.12-MR1 avrora starting =====
===== DaCapo 9.12-MR1 avrora PASSED in 2068 msec =====
root@troll-2:/tmp# echo passive | tee /sys/devices/system/cpu/intel_pstate/status
passive
root@troll-2:/tmp# echo active | tee /sys/devices/system/cpu/intel_pstate/status
active
root@troll-2:/tmp# java -jar dacapo-9.12-MR1-bach.jar avrora -n 3
===== DaCapo 9.12-MR1 avrora starting warmup 1 =====
===== DaCapo 9.12-MR1 avrora completed warmup 1 in 4363 msec =====
===== DaCapo 9.12-MR1 avrora starting warmup 2 =====
===== DaCapo 9.12-MR1 avrora completed warmup 2 in 4486 msec =====
===== DaCapo 9.12-MR1 avrora starting =====
===== DaCapo 9.12-MR1 avrora PASSED in 3417 msec =====

-------------------------------------------------------------------------

thanks,
julia

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: problem in changing from active to passive mode
  2021-10-24 13:02 problem in changing from active to passive mode Julia Lawall
@ 2021-10-24 22:44 ` Doug Smythies
  2021-10-25  5:17   ` Julia Lawall
                     ` (2 more replies)
  0 siblings, 3 replies; 19+ messages in thread
From: Doug Smythies @ 2021-10-24 22:44 UTC
  To: Julia Lawall
  Cc: Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Linux PM list, Linux Kernel Mailing List, dsmythies

On Sun, Oct 24, 2021 at 6:03 AM Julia Lawall <julia.lawall@inria.fr> wrote:
>
> Hello,

Hi,

>
> I have an Intel 6130 and an Intel 5218.  These machines have HWP.  They
> are configured to boot with active mode and performance as the power
> governor.  Since the following commit:
>
> commit a365ab6b9dfbaf8fb4fb4cd5d8a4c55dc4fb8b1c (HEAD, refs/bisect/bad)
> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Date:   Mon Dec 14 21:09:26 2020 +0100
>
>     cpufreq: intel_pstate: Implement the ->adjust_perf() callback
>
> If I change the mode from active to passive, I have the impression that the
> machine is no longer able to raise the core frequencies above the minimum.
> Changing the mode back to active has no effect.  This persists if I reboot
> to another kernel.
>
> Here are some runs that illustrate the problem.  I have tested the
> benchmark many times, and apart from this issue its performance is stable.

Could you also list the CPU frequency scaling governor being used in your
tests. I know you mentioned the performance governor above, but it
changes between active/passive/active transitions.

Example from my test computer:

Note 1: It is only for brevity in this e-mail that I list just one CPU.
Obviously, I looked at all CPUs when doing this.

Note 2: The test example and conditions have been cherry-picked
for dramatic effect.

$ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_driver
intel_pstate
$ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/intel_pstate/status
active
$ ./ping-pong-many 100000 500 10
1418.0660 usecs/loop. (less is better)

$ echo passive | sudo tee /sys/devices/system/cpu/intel_pstate/status
passive
$ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_driver
intel_cpufreq
$ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
schedutil
$ cat /sys/devices/system/cpu/intel_pstate/status
passive
$ ./ping-pong-many 100000 500 10
5053.6355 usecs/loop.

$ echo active | sudo tee /sys/devices/system/cpu/intel_pstate/status
active
$ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_driver
intel_pstate
$ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
powersave
$ cat /sys/devices/system/cpu/intel_pstate/status
active
$ ./ping-pong-many 100000 500 10
2253.5833 usecs/loop.

... Doug

* Re: problem in changing from active to passive mode
  2021-10-24 22:44 ` Doug Smythies
@ 2021-10-25  5:17   ` Julia Lawall
  2021-10-25 20:49   ` Julia Lawall
  2021-10-26 15:13   ` Julia Lawall
  2 siblings, 0 replies; 19+ messages in thread
From: Julia Lawall @ 2021-10-25  5:17 UTC
  To: Doug Smythies
  Cc: Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Linux PM list, Linux Kernel Mailing List



On Sun, 24 Oct 2021, Doug Smythies wrote:

> On Sun, Oct 24, 2021 at 6:03 AM Julia Lawall <julia.lawall@inria.fr> wrote:
> >
> > Hello,
>
> Hi,
>
> >
> > I have an Intel 6130 and an Intel 5218.  These machines have HWP.  They
> > are configured to boot with active mode and performance as the power
> > governor.  Since the following commit:
> >
> > commit a365ab6b9dfbaf8fb4fb4cd5d8a4c55dc4fb8b1c (HEAD, refs/bisect/bad)
> > Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> > Date:   Mon Dec 14 21:09:26 2020 +0100
> >
> >     cpufreq: intel_pstate: Implement the ->adjust_perf() callback
> >
> > If I change the mode from active to passive, I have the impression that the
> > machine is no longer able to raise the core frequencies above the minimum.
> > Changing the mode back to active has no effect.  This persists if I reboot
> > to another kernel.
> >
> > Here are some runs that illustrate the problem.  I have tested the
> > benchmark many times, and apart from this issue its performance is stable.
>
> Could you also list the CPU frequency scaling governor being used in your
> tests. I know you mentioned the performance governor above, but it
> changes between active/passive/active transitions.

Performance.  I only booted and then changed to passive and then changed
back.

I originally saw the problem when changing from active-performance to
passive-schedutil.  But seeing the problem doesn't require changing the
governor to schedutil.

>
> Example from my test computer:
>
> Note 1: It is only for brevity of this e-mail that I only list for one CPU.
> Obviously, I looked at all CPUs when doing this.
>
> Note 2: The test example and conditions have been cherry picked
> for dramatic effect.
>
> $ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_driver
> intel_pstate
> $ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
> performance
> $ cat /sys/devices/system/cpu/intel_pstate/status
> active
> $ ./ping-pong-many 100000 500 10
> 1418.0660 usecs/loop. (less is better)
>
> $ echo passive | sudo tee /sys/devices/system/cpu/intel_pstate/status
> passive

So converting to passive sends you directly to schedutil?  I didn't check
that; I have always changed to passive and then explicitly changed to
schedutil.

> $ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_driver
> intel_cpufreq
> $ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
> schedutil
> $ cat /sys/devices/system/cpu/intel_pstate/status
> passive
> $ ./ping-pong-many 100000 500 10
> 5053.6355 usecs/loop.
>
> $ echo active | sudo tee /sys/devices/system/cpu/intel_pstate/status
> active
> $ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_driver
> intel_pstate
> $ cat /sys/devices/system/cpu/cpu6/cpufreq/scaling_governor
> powersave
> $ cat /sys/devices/system/cpu/intel_pstate/status
> active
> $ ./ping-pong-many 100000 500 10
> 2253.5833 usecs/loop.

So now you are twice as slow, but I don't know how much this benchmark
varies.  I suspect that on my machine I would get the 5000 number. I also
traced the frequencies and they were at the lowest point (1GHz) almost all
of the time.

I'll redo my tests and collect all of this information.

thanks,
julia

* Re: problem in changing from active to passive mode
  2021-10-24 22:44 ` Doug Smythies
  2021-10-25  5:17   ` Julia Lawall
@ 2021-10-25 20:49   ` Julia Lawall
  2021-10-26 15:13   ` Julia Lawall
  2 siblings, 0 replies; 19+ messages in thread
From: Julia Lawall @ 2021-10-25 20:49 UTC
  To: Doug Smythies
  Cc: Julia Lawall, Srinivas Pandruvada, Len Brown, Rafael J. Wysocki,
	Viresh Kumar, Linux PM list, Linux Kernel Mailing List

Thanks for the feedback.  I see that if I change the mode from active to
passive and back to active, I end up in active powersave, not active
performance.  Changing the governor back to performance does restore the
original performance.

Still, I have the impression that the performance with passive/schedutil
is excessively bad because the frequency is excessively low.  But my
machines are not available at the moment, so I will have to try again
tomorrow to see what exactly is going on.

julia

* Re: problem in changing from active to passive mode
  2021-10-24 22:44 ` Doug Smythies
  2021-10-25  5:17   ` Julia Lawall
  2021-10-25 20:49   ` Julia Lawall
@ 2021-10-26 15:13   ` Julia Lawall
  2021-10-27 15:10     ` Doug Smythies
  2 siblings, 1 reply; 19+ messages in thread
From: Julia Lawall @ 2021-10-26 15:13 UTC
  To: Doug Smythies
  Cc: Julia Lawall, Srinivas Pandruvada, Len Brown, Rafael J. Wysocki,
	Viresh Kumar, Linux PM list, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1699 bytes --]

The problem is illustrated by the attached graphs.  The graphs on the
odd-numbered pages show the frequency of each core, measured at every clock
tick.  At each measurement there is a small bar representing 4ms of the
color associated with the frequency.  The percentages shown are thus not
entirely accurate, because the frequency could change within those 4ms and
we would not observe that.
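
A minimal sketch of the sampling arithmetic described above, assuming each tick sample records one core's frequency and stands for a full 4 ms slice. The function names and the sample values are hypothetical; this is not the tool that produced the graphs, only an illustration of why a change inside one slice goes unseen.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: one sample per clock tick, each standing for 4 ms. */
#define TICK_MS 4u

/* Estimated time (ms) spent at freq_mhz: every matching sample counts
 * as a full tick, so frequency changes inside a tick are invisible. */
static unsigned time_at_freq_ms(const unsigned *samples_mhz, size_t n,
                                unsigned freq_mhz)
{
    unsigned ms = 0;
    for (size_t i = 0; i < n; i++)
        if (samples_mhz[i] == freq_mhz)
            ms += TICK_MS;
    return ms;
}

/* Share of sampled time at freq_mhz, as an integer percentage
 * (rounded down). */
static unsigned pct_at_freq(const unsigned *samples_mhz, size_t n,
                            unsigned freq_mhz)
{
    assert(n > 0);
    return (unsigned)(100u * time_at_freq_ms(samples_mhz, n, freq_mhz) /
                      (TICK_MS * n));
}

/* Hypothetical samples (MHz) for one core over four ticks. */
static const unsigned demo_samples_mhz[] = { 1000, 1000, 1000, 2800 };
```

With these hypothetical samples, pct_at_freq(demo_samples_mhz, 4, 1000) reports 75%, even if the core briefly ran faster inside one of those 4 ms slices.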

The first graph, 5.9schedutil_yeti, is the normal behavior of schedutil
running.  The application mostly uses the second highest turbo mode, which
is the appropriate one given that there are around 5 active cores most of
the time.  I traced power:cpu_frequency, which is the event that occurs
when the OS requests a change of frequency.  This happens around 5400
times.

The second graph, 5.15-schedutil_yeti, is the latest version of Linus's
tree.  The cores are almost always at the lowest frequency.  There are no
occurrences of the power:cpu_frequency event.

The third graph, 5.9schedutil_after_yeti, is what happens when I reboot
into 5.9 after having changed to passive mode in 5.15.  The number of
power:cpu_frequency events drops to around 1100.  The proper turbo mode is
actually used sometimes, but much less than in the first graph.  More than
half of the time, an active core is at the lowest frequency.

This application (avrora from the DaCapo benchmarks) continually stops
and starts, with both the running and the stopped periods being very
short.  This may discourage the hardware from raising the frequency of its
own volition.  I also tried
a simple spin loop (for(;;);) with the 5.15 rc version, and it does go to
the highest frequency as one would expect.  But there are again no
power:cpu_frequency events.

julia

[-- Attachment #2: Type: application/pdf, Size: 182726 bytes --]

* Re: problem in changing from active to passive mode
  2021-10-26 15:13   ` Julia Lawall
@ 2021-10-27 15:10     ` Doug Smythies
  2021-10-27 15:16       ` Julia Lawall
  2021-10-28 17:10       ` Julia Lawall
  0 siblings, 2 replies; 19+ messages in thread
From: Doug Smythies @ 2021-10-27 15:10 UTC
  To: Julia Lawall
  Cc: Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Linux PM list, Linux Kernel Mailing List, dsmythies

On Tue, Oct 26, 2021 at 8:13 AM Julia Lawall <julia.lawall@inria.fr> wrote:
>
> The problem is illustrated by the attached graphs.  These graphs on the
> odd numbered pages show the frequency of each core measured at every clock
> tick.  At each measurement there is a small bar representing 4ms of the
> color associated with the frequency.  The percentages shown are thus not
> entirely accurate, because the frequency could change within those 4ms and
> we would not observe that.
>
> The first graph, 5.9schedutil_yeti, is the normal behavior of schedutil
> running.  The application mostly uses the second highest turbo mode, which
> is the appropriate one given that there are around 5 active cores most of
> the time.  I traced power:cpu_frequency, which is the event that occurs
> when the OS requests a change of frequency.  This happens around 5400
> times.
>
> The second graph, 5.15-schedutil_yeti, is the latest version of Linus's
> tree.  The cores are almost always at the lowest frequency.  There are no
> occurrences of the power:cpu_frequency event.
>
> The third graph, 5.9schedutil_after_yeti, is what happens when I reboot
> into 5.9 after having changed to passive mode in 5.15.  The number of
> power:cpu_frequency drops to around 1100.  The proper turbo mode is
> actually used sometimes, but much less than in the first graph.  More than
> half of the time, an active core is at the lowest frequency.
>
> This application (avrora from the DaCapo benchmarks) is continually
> stopping and starting, both for very short intervals.  This may discourage
> the hardware from raising the frequency of its own volition.

Agreed. This type of workflow has long been known to be a challenge
for various CPU frequency scaling governors. It comes up every so
often on the linux-pm email list. Basically, the schedutil CPU frequency
scaling governor becomes somewhat indecisive under these conditions.
However, if for some reason it gets kicked up to max CPU frequency,
then often it will stay there (depending on details of the workflow,
it stays up for my workflows).

Around the time of the commit you referenced in your earlier
email, it was recognised that proposed changes were adding
a bit of a downward bias to the hwp-passive-schedutil case for
some of these difficult workflows [1].

I booted an old 5.9, HWP enabled, passive, schedutil.
I got the following for my ping-pong test type workflow,
(which is not the best example):

Run 1: 6234 uSecs/loop
Run 2: 2813 uSecs/loop
Run 3: 2721 uSecs/loop
Run 4: 2813 uSecs/loop
Run 5: 11303 uSecs/loop
Run 6: 13803 uSecs/loop
Run 7: 2809 uSecs/loop
Run 8: 2796 uSecs/loop
Run 9: 2760 uSecs/loop
Run 10: 2691 uSecs/loop
Run 11: 9288 uSecs/loop
Run 12: 4275 uSecs/loop

Then the same with kernel 5.15-rc5
(I am a couple of weeks behind).

Run 1: 13618 uSecs/loop
Run 2: 13901 uSecs/loop
Run 3: 8929 uSecs/loop
Run 4: 12189 uSecs/loop
Run 5: 10338 uSecs/loop
Run 6: 12846 uSecs/loop
Run 7: 5418 uSecs/loop
Run 8: 7692 uSecs/loop
Run 9: 11531 uSecs/loop
Run 10: 9763 uSecs/loop
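
One way to summarize the two sets of runs above is with the mean and the median; the median is less affected by the occasional very slow runs (for example runs 5, 6 and 11 on 5.9). A small C sketch using only the numbers listed above; the helper names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* uSecs/loop values copied from the two lists above. */
static const int runs_5_9[] = { 6234, 2813, 2721, 2813, 11303, 13803,
                                2809, 2796, 2760, 2691, 9288, 4275 };
static const int runs_5_15_rc5[] = { 13618, 13901, 8929, 12189, 10338,
                                     12846, 5418, 7692, 11531, 9763 };

/* qsort comparator for ints, written to avoid overflow. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

static double mean(const int *v, size_t n)
{
    double sum = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += v[i];
    return sum / n;
}

/* Median of at most 64 values; sorts a copy so the input is unchanged. */
static double median(const int *v, size_t n)
{
    int tmp[64];
    assert(n > 0 && n <= 64);
    memcpy(tmp, v, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_int);
    return (n % 2) ? tmp[n / 2] : (tmp[n / 2 - 1] + tmp[n / 2]) / 2.0;
}
```

On these numbers the median is 2813 uSecs/loop for 5.9 and 10934.5 uSecs/loop for 5.15-rc5, while the 5.9 mean (about 5359) is pulled up by its few slow runs.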

Now, for your graph 3, are you saying this pseudo
code of the process is repeatable?:

Power up the system, booting kernel 5.9
switch to passive/schedutil.
wait X minutes for system to settle
do benchmark, result ~13 seconds
re-boot to kernel 5.15-RC
switch to passive/schedutil.
wait X minutes for system to settle
do benchmark, result ~40 seconds
re-boot to kernel 5.9
switch to passive/schedutil.
wait X minutes for system to settle
do benchmark, result ~28 seconds

... Doug

>  I also tried
> a simple spin loop (for(;;);) with the 5.15 rc version, and it does go to
> the highest frequency as one would expect.  But there are again no
> power:cpu_frequency events.
>
> julia

[1] https://www.spinics.net/lists/kernel/msg3775304.html

* Re: problem in changing from active to passive mode
  2021-10-27 15:10     ` Doug Smythies
@ 2021-10-27 15:16       ` Julia Lawall
  2021-10-28 17:10       ` Julia Lawall
  1 sibling, 0 replies; 19+ messages in thread
From: Julia Lawall @ 2021-10-27 15:16 UTC
  To: Doug Smythies
  Cc: Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Linux PM list, Linux Kernel Mailing List



On Wed, 27 Oct 2021, Doug Smythies wrote:

> On Tue, Oct 26, 2021 at 8:13 AM Julia Lawall <julia.lawall@inria.fr> wrote:
> >
> > The problem is illustrated by the attached graphs.  These graphs on the
> > odd numbered pages show the frequency of each core measured at every clock
> > tick.  At each measurement there is a small bar representing 4ms of the
> > color associated with the frequency.  The percentages shown are thus not
> > entirely accurate, because the frequency could change within those 4ms and
> > we would not observe that.
> >
> > The first graph, 5.9schedutil_yeti, is the normal behavior of schedutil
> > running.  The application mostly uses the second highest turbo mode, which
> > is the appropriate one given that there are around 5 active cores most of
> > the time.  I traced power:cpu_frequency, which is the event that occurs
> > when the OS requests a change of frequency.  This happens around 5400
> > times.
> >
> > The second graph, 5.15-schedutil_yeti, is the latest version of Linus's
> > tree.  The cores are almost always at the lowest frequency.  There are no
> > occurrences of the power:cpu_frequency event.
> >
> > The third graph, 5.9schedutil_after_yeti, is what happens when I reboot
> > into 5.9 after having changed to passive mode in 5.15.  The number of
> > power:cpu_frequency drops to around 1100.  The proper turbo mode is
> > actually used sometimes, but much less than in the first graph.  More than
> > half of the time, an active core is at the lowest frequency.
> >
> > This application (avrora from the DaCapo benchmarks) is continually
> > stopping and starting, both for very short intervals.  This may discourage
> > the hardware from raising the frequency of its own volition.
>
> Agreed. This type of workflow has long been known to be a challenge
> for various CPU frequency scaling governors. It comes up every so
> often on the linux-pm email list. Basically, the schedutil CPU frequency
> scaling governor becomes somewhat indecisive under these conditions.
> However, if for some reason it gets kicked up to max CPU frequency,
> then often it will stay there (depending on details of the workflow,
> it stays up for my workflows).
>
> Around the time of the commit you referenced in your earlier
> email, it was recognised that proposed changes were adding
> a bit of a downward bias to the hwp-passive-schedutil case for
> some of these difficult workflows [1].
>
> I booted an old 5.9, HWP enabled, passive, schedutil.
> I got the following for my ping-pong test type workflow,
> (which is not the best example):
>
> Run 1: 6234 uSecs/loop
> Run 2: 2813 uSecs/loop
> Run 3: 2721 uSecs/loop
> Run 4: 2813 uSecs/loop
> Run 5: 11303 uSecs/loop
> Run 6: 13803 uSecs/loop
> Run 7: 2809 uSecs/loop
> Run 8: 2796 uSecs/loop
> Run 9: 2760 uSecs/loop
> Run 10: 2691 uSecs/loop
> Run 11: 9288 uSecs/loop
> Run 12: 4275 uSecs/loop
>
> Then the same with kernel 5.15-rc5
> (I am a couple of weeks behind).
>
> Run 1: 13618 uSecs/loop
> Run 2: 13901 uSecs/loop
> Run 3: 8929 uSecs/loop
> Run 4: 12189 uSecs/loop
> Run 5: 10338 uSecs/loop
> Run 6: 12846 uSecs/loop
> Run 7: 5418 uSecs/loop
> Run 8: 7692 uSecs/loop
> Run 9: 11531 uSecs/loop
> Run 10: 9763 uSecs/loop
>
> Now, for your graph 3, are you saying this pseudo
> code of the process is repeatable?:
>
> Power up the system, booting kernel 5.9
> switch to passive/schedutil.
> wait X minutes for system to settle
> do benchmark, result ~13 seconds
> re-boot to kernel 5.15-RC
> switch to passive/schedutil.
> wait X minutes for system to settle
> do benchmark, result ~40 seconds
> re-boot to kernel 5.9
> switch to passive/schedutil.
> wait X minutes for system to settle
> do benchmark, result ~28 seconds

Yes, exactly.

I have been looking into why with 5.15-RC there are no requests from
schedutil.  I'm not yet sure I understand everything.  But I do notice
that the function cpufreq_this_cpu_can_update returns false around 2/3 of
the time.  This comes from the following code returning 0:

cpumask_test_cpu(smp_processor_id(), policy->cpus)

It seems that the mask policy->cpus always contains only one core, which
might or might not be the running one.  I don't know if this is the
intended behavior.
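
To make the effect of a single-CPU policy->cpus concrete, here is a toy model of only the cpumask_test_cpu() part of the check quoted above. The struct, the 64-bit mask, the function name and the choice of CPU 6 are hypothetical simplifications, not kernel APIs, and any other conditions the real cpufreq_this_cpu_can_update() may evaluate are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a policy's CPU set: bit n set means CPU n is covered. */
struct toy_policy {
    unsigned long long cpus;
};

/* Models only cpumask_test_cpu(smp_processor_id(), policy->cpus):
 * the update is allowed only if the CPU running the hook is in the mask. */
static bool toy_this_cpu_can_update(const struct toy_policy *policy,
                                    int this_cpu)
{
    assert(this_cpu >= 0 && this_cpu < 64);
    return (policy->cpus >> this_cpu) & 1ULL;
}

/* A per-CPU policy like the one observed: only (hypothetical) CPU 6. */
static const struct toy_policy cpu6_policy = { .cpus = 1ULL << 6 };
```

With a policy covering only CPU 6, an update attempted while running on CPU 6 passes the check and one attempted on any other CPU is rejected, which would be consistent with the check failing much of the time if the hook often runs on a different CPU.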

julia

* Re: problem in changing from active to passive mode
  2021-10-27 15:10     ` Doug Smythies
  2021-10-27 15:16       ` Julia Lawall
@ 2021-10-28 17:10       ` Julia Lawall
  2021-10-28 17:29         ` Rafael J. Wysocki
  1 sibling, 1 reply; 19+ messages in thread
From: Julia Lawall @ 2021-10-28 17:10 UTC
  To: Doug Smythies
  Cc: Srinivas Pandruvada, Len Brown, Rafael J. Wysocki, Viresh Kumar,
	Linux PM list, Linux Kernel Mailing List

> Now, for your graph 3, are you saying this pseudo
> code of the process is repeatable?:
>
> Power up the system, booting kernel 5.9
> switch to passive/schedutil.
> wait X minutes for system to settle
> do benchmark, result ~13 seconds
> re-boot to kernel 5.15-RC
> switch to passive/schedutil.
> wait X minutes for system to settle
> do benchmark, result ~40 seconds
> re-boot to kernel 5.9
> switch to passive/schedutil.
> wait X minutes for system to settle
> do benchmark, result ~28 seconds

In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
register is 0 and in the second boot (after booting 5.15 and entering
passive mode) it is 10.  I don't know though if this is a bug or a
feature...
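
For reference, a sketch of how the fields of a raw IA32_HWP_REQUEST value can be split apart, assuming the bit layout documented in the Intel SDM (minimum performance in bits 7:0, maximum in 15:8, desired in 23:16, EPP in 31:24). The SDM describes a desired value of 0 as leaving the choice to hardware autonomous selection and a nonzero value as an explicit performance hint, so a stale desired value of 10 could plausibly keep the cores low. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed IA32_HWP_REQUEST layout (Intel SDM): bits 7:0 minimum,
 * 15:8 maximum, 23:16 desired, 31:24 energy/performance preference. */
struct hwp_req_fields {
    unsigned min_perf, max_perf, desired_perf, epp;
};

static struct hwp_req_fields hwp_req_decode(uint64_t v)
{
    struct hwp_req_fields f;
    f.min_perf     = (unsigned)(v & 0xff);
    f.max_perf     = (unsigned)((v >> 8) & 0xff);
    f.desired_perf = (unsigned)((v >> 16) & 0xff);
    f.epp          = (unsigned)((v >> 24) & 0xff);
    return f;
}

static uint64_t hwp_req_encode(struct hwp_req_fields f)
{
    assert(f.min_perf <= 0xff && f.max_perf <= 0xff &&
           f.desired_perf <= 0xff && f.epp <= 0xff);
    return (uint64_t)f.min_perf | (uint64_t)f.max_perf << 8 |
           (uint64_t)f.desired_perf << 16 | (uint64_t)f.epp << 24;
}

/* Desired == 0: hardware picks the operating point autonomously;
 * nonzero: an explicit performance hint from software. */
static int hwp_req_is_autonomous(uint64_t v)
{
    return hwp_req_decode(v).desired_perf == 0;
}
```

Decoding a hypothetical value 0x800A2010 gives min 16, max 32, desired 10 and EPP 128, so hwp_req_is_autonomous() returns 0 for it.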

julia

* Re: problem in changing from active to passive mode
  2021-10-28 17:10       ` Julia Lawall
@ 2021-10-28 17:29         ` Rafael J. Wysocki
  2021-10-28 17:57           ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2021-10-28 17:29 UTC
  To: Julia Lawall
  Cc: Doug Smythies, Srinivas Pandruvada, Len Brown, Rafael J. Wysocki,
	Viresh Kumar, Linux PM list, Linux Kernel Mailing List

On Thu, Oct 28, 2021 at 7:10 PM Julia Lawall <julia.lawall@inria.fr> wrote:
>
> > Now, for your graph 3, are you saying this pseudo
> > code of the process is repeatable?:
> >
> > Power up the system, booting kernel 5.9
> > switch to passive/schedutil.
> > wait X minutes for system to settle
> > do benchmark, result ~13 seconds
> > re-boot to kernel 5.15-RC
> > switch to passive/schedutil.
> > wait X minutes for system to settle
> > do benchmark, result ~40 seconds
> > re-boot to kernel 5.9
> > switch to passive/schedutil.
> > wait X minutes for system to settle
> > do benchmark, result ~28 seconds
>
> In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
> register is 0 and in the second boot (after booting 5.15 and entering
> passive mode) it is 10.  I don't know though if this is a bug or a
> feature...

It looks like a bug.

I think that the desired value is not cleared on driver exit, which
should happen.  Let me see if I can do a quick patch for that.

* Re: problem in changing from active to passive mode
  2021-10-28 17:29         ` Rafael J. Wysocki
@ 2021-10-28 17:57           ` Rafael J. Wysocki
  2021-10-28 18:16             ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2021-10-28 17:57 UTC
  To: Julia Lawall
  Cc: Doug Smythies, Srinivas Pandruvada, Len Brown, Rafael J. Wysocki,
	Viresh Kumar, Linux PM list, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1178 bytes --]

On Thu, Oct 28, 2021 at 7:29 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 28, 2021 at 7:10 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> >
> > > Now, for your graph 3, are you saying this pseudo
> > > code of the process is repeatable?:
> > >
> > > Power up the system, booting kernel 5.9
> > > switch to passive/schedutil.
> > > wait X minutes for system to settle
> > > do benchmark, result ~13 seconds
> > > re-boot to kernel 5.15-RC
> > > switch to passive/schedutil.
> > > wait X minutes for system to settle
> > > do benchmark, result ~40 seconds
> > > re-boot to kernel 5.9
> > > switch to passive/schedutil.
> > > wait X minutes for system to settle
> > > do benchmark, result ~28 seconds
> >
> > In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
> > register is 0 and in the second boot (after booting 5.15 and entering
> > passive mode) it is 10.  I don't know though if this is a bug or a
> > feature...
>
> It looks like a bug.
>
> I think that the desired value is not cleared on driver exit which
> should happen.  Let me see if I can do a quick patch for that.

Please check the behavior with the attached patch applied.

[-- Attachment #2: intel_pstate-clear-desired-on-offline.patch --]
[-- Type: text/x-patch, Size: 611 bytes --]

---
 drivers/cpufreq/intel_pstate.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1015,6 +1015,9 @@ static void intel_pstate_hwp_offline(str
 	value |= HWP_MAX_PERF(min_perf);
 	value |= HWP_MIN_PERF(min_perf);
 
+	/* Clear DESIRED_PERF */
+	value &= ~HWP_DESIRED_PERF(~0L);
+
 	/* Set EPP to min */
 	if (boot_cpu_has(X86_FEATURE_HWP_EPP))
 		value |= HWP_ENERGY_PERF_PREFERENCE(HWP_EPP_POWERSAVE);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: problem in changing from active to passive mode
  2021-10-28 17:57           ` Rafael J. Wysocki
@ 2021-10-28 18:16             ` Rafael J. Wysocki
  2021-10-28 18:43               ` Rafael J. Wysocki
  2021-10-28 19:13               ` Julia Lawall
  0 siblings, 2 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2021-10-28 18:16 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Doug Smythies, Srinivas Pandruvada, Len Brown, Rafael J. Wysocki,
	Viresh Kumar, Linux PM list, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1518 bytes --]

On Thu, Oct 28, 2021 at 7:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 28, 2021 at 7:29 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Oct 28, 2021 at 7:10 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > >
> > > > [cut]
> > >
> > > In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
> > > register is 0 and in the second boot (after booting 5.15 and entering
> > > passive mode) it is 10.  I don't know though if this is a bug or a
> > > feature...
> >
> > It looks like a bug.
> >
> > I think that the desired value is not cleared on driver exit which
> > should happen.  Let me see if I can do a quick patch for that.
>
> Please check the behavior with the attached patch applied.

Well, actually, the previous one won't do anything, because the
desired perf field is already cleared in this function before writing
the MSR, so please try the one attached to this message instead.

[-- Attachment #2: intel_pstate-clear-desired-on-offline.patch --]
[-- Type: text/x-patch, Size: 762 bytes --]

---
 drivers/cpufreq/intel_pstate.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1005,9 +1005,12 @@ static void intel_pstate_hwp_offline(str
 		 */
 		value &= ~GENMASK_ULL(31, 24);
 		value |= HWP_ENERGY_PERF_PREFERENCE(cpu->epp_cached);
-		WRITE_ONCE(cpu->hwp_req_cached, value);
 	}
 
+	/* Clear the desired perf field in the cached HWP request value. */
+	value &= ~HWP_DESIRED_PERF(~0L);
+	WRITE_ONCE(cpu->hwp_req_cached, value);
+
 	value &= ~GENMASK_ULL(31, 0);
 	min_perf = HWP_LOWEST_PERF(READ_ONCE(cpu->hwp_cap_cached));
 

* Re: problem in changing from active to passive mode
  2021-10-28 18:16             ` Rafael J. Wysocki
@ 2021-10-28 18:43               ` Rafael J. Wysocki
  2021-10-28 19:13               ` Julia Lawall
  1 sibling, 0 replies; 19+ messages in thread
From: Rafael J. Wysocki @ 2021-10-28 18:43 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Doug Smythies, Srinivas Pandruvada, Len Brown, Rafael J. Wysocki,
	Viresh Kumar, Linux PM list, Linux Kernel Mailing List

On Thu, Oct 28, 2021 at 8:16 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
>
> On Thu, Oct 28, 2021 at 7:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Oct 28, 2021 at 7:29 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 28, 2021 at 7:10 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > > >
> > > > > [cut]
> > > >
> > > > In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
> > > > register is 0 and in the second boot (after booting 5.15 and entering
> > > > passive mode) it is 10.  I don't know though if this is a bug or a
> > > > feature...

I think I didn't understand you correctly, sorry about that.

In 5.15-rc (starting in 5.11-rc) the desired perf field in HWP_REQUEST
is used in the passive mode, so that is expected.

However, it may not be reset to 0 when going back from the passive to
the active mode.

> > > It looks like a bug.
> > >
> > > I think that the desired value is not cleared on driver exit which
> > > should happen.  Let me see if I can do a quick patch for that.
> >
> > Please check the behavior with the attached patch applied.
>
> Well, actually, the previous one won't do anything, because the
> desired perf field is already cleared in this function before writing
> the MSR, so please try the one attached to this message instead.

So with the last patch applied, can you please check if you get
desired=0 with 5.15-rc when switching driver modes from passive to
active?  FWIW, this works for me here.

In any case, the desired perf value in HWP_REQUEST is expected to be
reset to 0 on system restart.

* Re: problem in changing from active to passive mode
  2021-10-28 18:16             ` Rafael J. Wysocki
  2021-10-28 18:43               ` Rafael J. Wysocki
@ 2021-10-28 19:13               ` Julia Lawall
  2021-10-28 19:21                 ` Rafael J. Wysocki
  1 sibling, 1 reply; 19+ messages in thread
From: Julia Lawall @ 2021-10-28 19:13 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Julia Lawall, Doug Smythies, Srinivas Pandruvada, Len Brown,
	Rafael J. Wysocki, Viresh Kumar, Linux PM list,
	Linux Kernel Mailing List



On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:

> On Thu, Oct 28, 2021 at 7:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Thu, Oct 28, 2021 at 7:29 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 28, 2021 at 7:10 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > > >
> > > > > [cut]
> > > >
> > > > In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
> > > > register is 0 and in the second boot (after booting 5.15 and entering
> > > > passive mode) it is 10.  I don't know though if this is a bug or a
> > > > feature...
> > >
> > > It looks like a bug.
> > >
> > > I think that the desired value is not cleared on driver exit which
> > > should happen.  Let me see if I can do a quick patch for that.
> >
> > Please check the behavior with the attached patch applied.
>
> Well, actually, the previous one won't do anything, because the
> desired perf field is already cleared in this function before writing
> the MSR, so please try the one attached to this message instead.
>

Turbostat still shows 10:

cpu0: MSR_HWP_CAPABILITIES: 0x070a1525 (high 37 guar 21 eff 10 low 7)
cpu0: MSR_HWP_REQUEST: 0x000a2525 (min 37 max 37 des 10 epp 0x0 window 0x0 pkg 0x0)
cpu0: MSR_HWP_REQUEST_PKG: 0x8000ff00 (min 0 max 255 des 0 epp 0x80 window 0x0)
cpu0: MSR_HWP_STATUS: 0x00000004 (No-Guaranteed_Perf_Change, No-Excursion_Min)
cpu1: MSR_PM_ENABLE: 0x00000001 (HWP)
cpu1: MSR_HWP_CAPABILITIES: 0x070a1525 (high 37 guar 21 eff 10 low 7)
cpu1: MSR_HWP_REQUEST: 0x000a2525 (min 37 max 37 des 10 epp 0x0 window 0x0 pkg 0x0)
cpu1: MSR_HWP_REQUEST_PKG: 0x8000ff00 (min 0 max 255 des 0 epp 0x80 window 0x0)
cpu1: MSR_HWP_STATUS: 0x00000004 (No-Guaranteed_Perf_Change, No-Excursion_Min)
cpu2: MSR_PM_ENABLE: 0x00000001 (HWP)
cpu2: MSR_HWP_CAPABILITIES: 0x070a1525 (high 37 guar 21 eff 10 low 7)
cpu2: MSR_HWP_REQUEST: 0x000a2525 (min 37 max 37 des 10 epp 0x0 window 0x0 pkg 0x0)
cpu2: MSR_HWP_REQUEST_PKG: 0x8000ff00 (min 0 max 255 des 0 epp 0x80 window 0x0)
cpu2: MSR_HWP_STATUS: 0x00000004 (No-Guaranteed_Perf_Change, No-Excursion_Min)
cpu3: MSR_PM_ENABLE: 0x00000001 (HWP)
cpu3: MSR_HWP_CAPABILITIES: 0x070a1525 (high 37 guar 21 eff 10 low 7)
cpu3: MSR_HWP_REQUEST: 0x000a2525 (min 37 max 37 des 10 epp 0x0 window 0x0 pkg 0x0)
cpu3: MSR_HWP_REQUEST_PKG: 0x8000ff00 (min 0 max 255 des 0 epp 0x80 window 0x0)
cpu3: MSR_HWP_STATUS: 0x00000004 (No-Guaranteed_Perf_Change, No-Excursion_Min)

julia

* Re: problem in changing from active to passive mode
  2021-10-28 19:13               ` Julia Lawall
@ 2021-10-28 19:21                 ` Rafael J. Wysocki
  2021-10-28 19:25                   ` Julia Lawall
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2021-10-28 19:21 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Rafael J. Wysocki, Doug Smythies, Srinivas Pandruvada, Len Brown,
	Rafael J. Wysocki, Viresh Kumar, Linux PM list,
	Linux Kernel Mailing List

On Thu, Oct 28, 2021 at 9:13 PM Julia Lawall <julia.lawall@inria.fr> wrote:
>
>
>
> On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:
>
> > On Thu, Oct 28, 2021 at 7:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > > On Thu, Oct 28, 2021 at 7:29 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Thu, Oct 28, 2021 at 7:10 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > > > >
> > > > > > [cut]
> > > > >
> > > > > In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
> > > > > register is 0 and in the second boot (after booting 5.15 and entering
> > > > > passive mode) it is 10.  I don't know though if this is a bug or a
> > > > > feature...
> > > >
> > > > It looks like a bug.
> > > >
> > > > I think that the desired value is not cleared on driver exit which
> > > > should happen.  Let me see if I can do a quick patch for that.
> > >
> > > Please check the behavior with the attached patch applied.
> >
> > Well, actually, the previous one won't do anything, because the
> > desired perf field is already cleared in this function before writing
> > the MSR, so please try the one attached to this message instead.
> >
>
> Turbostat still shows 10:
>
> [cut]

Hmmm.

Is this also the case if you go from "passive" to "active" on 5.15-rc
w/ the patch applied?

* Re: problem in changing from active to passive mode
  2021-10-28 19:21                 ` Rafael J. Wysocki
@ 2021-10-28 19:25                   ` Julia Lawall
  2021-10-28 19:48                     ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Julia Lawall @ 2021-10-28 19:25 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Julia Lawall, Doug Smythies, Srinivas Pandruvada, Len Brown,
	Rafael J. Wysocki, Viresh Kumar, Linux PM list,
	Linux Kernel Mailing List



On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:

> On Thu, Oct 28, 2021 at 9:13 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> >
> >
> >
> > On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:
> >
> > > On Thu, Oct 28, 2021 at 7:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > >
> > > > On Thu, Oct 28, 2021 at 7:29 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > On Thu, Oct 28, 2021 at 7:10 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > > > > >
> > > > > > > [cut]
> > > > > >
> > > > > > In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
> > > > > > register is 0 and in the second boot (after booting 5.15 and entering
> > > > > > passive mode) it is 10.  I don't know though if this is a bug or a
> > > > > > feature...
> > > > >
> > > > > It looks like a bug.
> > > > >
> > > > > I think that the desired value is not cleared on driver exit which
> > > > > should happen.  Let me see if I can do a quick patch for that.
> > > >
> > > > Please check the behavior with the attached patch applied.
> > >
> > > Well, actually, the previous one won't do anything, because the
> > > desired perf field is already cleared in this function before writing
> > > the MSR, so please try the one attached to this message instead.
> > >
> >
> > Turbostat still shows 10:
> >
> > [cut]
>
> Hmmm.
>
> Is this also the case if you go from "passive" to "active" on 5.15-rc
> w/ the patch applied?

Sorry, I was wrong.  If I am in 5.15 and go from passive to active, the
des field indeed returns to 0.  If I use kexec to reboot from 5.15
passive into 5.9, then the des field remains 10.

julia

* Re: problem in changing from active to passive mode
  2021-10-28 19:25                   ` Julia Lawall
@ 2021-10-28 19:48                     ` Rafael J. Wysocki
  2021-10-28 20:18                       ` Julia Lawall
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2021-10-28 19:48 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Rafael J. Wysocki, Doug Smythies, Srinivas Pandruvada, Len Brown,
	Rafael J. Wysocki, Viresh Kumar, Linux PM list,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 4278 bytes --]

On Thu, Oct 28, 2021 at 9:25 PM Julia Lawall <julia.lawall@inria.fr> wrote:
>
>
>
> On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:
>
> > On Thu, Oct 28, 2021 at 9:13 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > >
> > >
> > >
> > > On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:
> > >
> > > > On Thu, Oct 28, 2021 at 7:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > >
> > > > > On Thu, Oct 28, 2021 at 7:29 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Oct 28, 2021 at 7:10 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > > > > > >
> > > > > > > > [cut]
> > > > > > >
> > > > > > > In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
> > > > > > > register is 0 and in the second boot (after booting 5.15 and entering
> > > > > > > passive mode) it is 10.  I don't know though if this is a bug or a
> > > > > > > feature...
> > > > > >
> > > > > > It looks like a bug.
> > > > > >
> > > > > > I think that the desired value is not cleared on driver exit which
> > > > > > should happen.  Let me see if I can do a quick patch for that.
> > > > >
> > > > > Please check the behavior with the attached patch applied.
> > > >
> > > > Well, actually, the previous one won't do anything, because the
> > > > desired perf field is already cleared in this function before writing
> > > > the MSR, so please try the one attached to this message instead.
> > > >
> > >
> > > Turbostat still shows 10:
> > >
> > > [cut]
> >
> > Hmmm.
> >
> > Is this also the case if you go from "passive" to "active" on 5.15-rc
> > w/ the patch applied?
>
> Sorry, I was wrong.  If I am in 5.15 and go from passive to active, the
> des field indeed returns to 0.  If I use kexec

Well, this means that the cpufreq driver cleanup is not carried out in
the kexec path and the old desired value remains in the register.

> to reboot from 5.15 passive into 5.9, then the des field remains 10.

It looks like desired perf needs to be cleared explicitly in the active mode.

Attached is a patch to do that, but please note that the 5.9 will need
to be patched too to address this issue.

[-- Attachment #2: intel_pstate-clear-desired-in-active.patch --]
[-- Type: text/x-patch, Size: 517 bytes --]

---
 drivers/cpufreq/intel_pstate.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -946,6 +946,8 @@ static void intel_pstate_hwp_set(unsigne
 	value &= ~HWP_MAX_PERF(~0L);
 	value |= HWP_MAX_PERF(max);
 
+	value &= ~HWP_DESIRED_PERF(~0L);
+
 	if (cpu_data->epp_policy == cpu_data->policy)
 		goto skip_epp;
 

* Re: problem in changing from active to passive mode
  2021-10-28 19:48                     ` Rafael J. Wysocki
@ 2021-10-28 20:18                       ` Julia Lawall
  2021-10-29 15:39                         ` Rafael J. Wysocki
  0 siblings, 1 reply; 19+ messages in thread
From: Julia Lawall @ 2021-10-28 20:18 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Doug Smythies, Srinivas Pandruvada, Len Brown, Rafael J. Wysocki,
	Viresh Kumar, Linux PM list, Linux Kernel Mailing List



On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:

> On Thu, Oct 28, 2021 at 9:25 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> >
> >
> >
> > On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:
> >
> > > On Thu, Oct 28, 2021 at 9:13 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > > >
> > > >
> > > >
> > > > On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:
> > > >
> > > > > On Thu, Oct 28, 2021 at 7:57 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Oct 28, 2021 at 7:29 PM Rafael J. Wysocki <rafael@kernel.org> wrote:
> > > > > > >
> > > > > > > On Thu, Oct 28, 2021 at 7:10 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> > > > > > > >
> > > > > > > > > [cut]
> > > > > > > >
> > > > > > > > In the first boot of 5.9, the des (desired?) field of the HWP_REQUEST
> > > > > > > > register is 0 and in the second boot (after booting 5.15 and entering
> > > > > > > > passive mode) it is 10.  I don't know though if this is a bug or a
> > > > > > > > feature...
> > > > > > >
> > > > > > > It looks like a bug.
> > > > > > >
> > > > > > > I think that the desired value is not cleared on driver exit which
> > > > > > > should happen.  Let me see if I can do a quick patch for that.
> > > > > >
> > > > > > Please check the behavior with the attached patch applied.
> > > > >
> > > > > Well, actually, the previous one won't do anything, because the
> > > > > desired perf field is already cleared in this function before writing
> > > > > the MSR, so please try the one attached to this message instead.
> > > > >
> > > >
> > > > Turbostat still shows 10:
> > > >
> > > > [cut]
> > >
> > > Hmmm.
> > >
> > > Is this also the case if you go from "passive" to "active" on 5.15-rc
> > > w/ the patch applied?
> >
> > Sorry, I was wrong.  If I am in 5.15 and go from passive to active, the
> > des field indeed returns to 0.  If I use kexec
>
> Well, this means that the cpufreq driver cleanup is not carried out in
> the kexec path and the old desired value remains in the register.
>
> > to reboot from 5.15 passive into 5.9, then the des field remains 10.
>
> It looks like desired perf needs to be cleared explicitly in the active mode.
>
> Attached is a patch to do that, but please note that the 5.9 will need
> to be patched too to address this issue.

I'm not completely clear on what the new patch is doing and how I should
test it.  If I stay in 5.15, the original patch worked for clearing des
when going from passive to active.

julia

* Re: problem in changing from active to passive mode
  2021-10-28 20:18                       ` Julia Lawall
@ 2021-10-29 15:39                         ` Rafael J. Wysocki
  2021-10-29 20:29                           ` Julia Lawall
  0 siblings, 1 reply; 19+ messages in thread
From: Rafael J. Wysocki @ 2021-10-29 15:39 UTC (permalink / raw)
  To: Julia Lawall
  Cc: Rafael J. Wysocki, Doug Smythies, Srinivas Pandruvada, Len Brown,
	Rafael J. Wysocki, Viresh Kumar, Linux PM list,
	Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 1295 bytes --]

On Thu, Oct 28, 2021 at 10:18 PM Julia Lawall <julia.lawall@inria.fr> wrote:
>
>
>
> On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:
>
> > On Thu, Oct 28, 2021 at 9:25 PM Julia Lawall <julia.lawall@inria.fr> wrote:

[cut]

> > Attached is a patch to do that, but please note that the 5.9 will need
> > to be patched too to address this issue.
>
> I'm not completely clear on what the new patch is doing and how I should
> test it.  If I stay in 5.15, the original patch worked for clearing des
> when going from passive to active.

Sorry for the confusion.

If applied to 5.15-rc alone, the last patch would cause des to be
cleared when switching from passive to active and if applied to both
5.15-rc and 5.9, it would fix the kexec issue as well.

Never mind, though.

The patch attached to this message should cause des to be cleared when
switching from passive to active (because it is based on the previous
patch doing that) and it should prevent nonzero des from being leaked
via the HWP_REQUEST MSR to the new kernel started via kexec.  With
this patch applied to 5.15-rc des should be 0 when switching from
passive to active and it should also be 0 after starting another
kernel via kexec while intel_pstate is running in the passive mode.

Can you please verify that it works as expected?

[-- Attachment #2: intel_pstate-clear-desired-on-offline-and-suspend.patch --]
[-- Type: text/x-patch, Size: 1931 bytes --]

---
 drivers/cpufreq/intel_pstate.c |   28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1005,9 +1005,12 @@ static void intel_pstate_hwp_offline(str
 		 */
 		value &= ~GENMASK_ULL(31, 24);
 		value |= HWP_ENERGY_PERF_PREFERENCE(cpu->epp_cached);
-		WRITE_ONCE(cpu->hwp_req_cached, value);
 	}
 
+	/* Clear the desired perf field in the cached HWP request value. */
+	value &= ~HWP_DESIRED_PERF(~0L);
+	WRITE_ONCE(cpu->hwp_req_cached, value);
+
 	value &= ~GENMASK_ULL(31, 0);
 	min_perf = HWP_LOWEST_PERF(READ_ONCE(cpu->hwp_cap_cached));
 
@@ -3002,6 +3005,27 @@ static int intel_cpufreq_cpu_exit(struct
 	return intel_pstate_cpu_exit(policy);
 }
 
+static int intel_cpufreq_suspend(struct cpufreq_policy *policy)
+{
+	intel_pstate_suspend(policy);
+
+	if (hwp_active) {
+		struct cpudata *cpu = all_cpu_data[policy->cpu];
+		u64 value = READ_ONCE(cpu->hwp_req_cached);
+
+		/*
+		 * Clear the desired perf field in MSR_HWP_REQUEST in case
+		 * intel_cpufreq_adjust_perf() is in use and the last value
+		 * written by it may not be suitable.
+		 */
+		value &= ~HWP_DESIRED_PERF(~0L);
+		wrmsrl_on_cpu(cpu->cpu, MSR_HWP_REQUEST, value);
+		WRITE_ONCE(cpu->hwp_req_cached, value);
+	}
+
+	return 0;
+}
+
 static struct cpufreq_driver intel_cpufreq = {
 	.flags		= CPUFREQ_CONST_LOOPS,
 	.verify		= intel_cpufreq_verify_policy,
@@ -3011,7 +3035,7 @@ static struct cpufreq_driver intel_cpufr
 	.exit		= intel_cpufreq_cpu_exit,
 	.offline	= intel_cpufreq_cpu_offline,
 	.online		= intel_pstate_cpu_online,
-	.suspend	= intel_pstate_suspend,
+	.suspend	= intel_cpufreq_suspend,
 	.resume		= intel_pstate_resume,
 	.update_limits	= intel_pstate_update_limits,
 	.name		= "intel_cpufreq",


* Re: problem in changing from active to passive mode
  2021-10-29 15:39                         ` Rafael J. Wysocki
@ 2021-10-29 20:29                           ` Julia Lawall
  0 siblings, 0 replies; 19+ messages in thread
From: Julia Lawall @ 2021-10-29 20:29 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Doug Smythies, Srinivas Pandruvada, Len Brown, Rafael J. Wysocki,
	Viresh Kumar, Linux PM list, Linux Kernel Mailing List



On Fri, 29 Oct 2021, Rafael J. Wysocki wrote:

> On Thu, Oct 28, 2021 at 10:18 PM Julia Lawall <julia.lawall@inria.fr> wrote:
> >
> >
> >
> > On Thu, 28 Oct 2021, Rafael J. Wysocki wrote:
> >
> > > On Thu, Oct 28, 2021 at 9:25 PM Julia Lawall <julia.lawall@inria.fr> wrote:
>
> [cut]
>
> > > Attached is a patch to do that, but please note that the 5.9 will need
> > > to be patched too to address this issue.
> >
> > I'm not completely clear on what the new patch is doing and how I should
> > test it.  If I stay in 5.15, the original patch worked for clearing des
> > when going from passive to active.
>
> Sorry for the confusion.
>
> If applied to 5.15-rc alone, the last patch would cause des to be
> cleared when switching from passive to active and if applied to both
> 5.15-rc and 5.9, it would fix the kexec issue as well.
>
> Never mind, though.
>
> The patch attached to this message should cause des to be cleared when
> switching from passive to active (because it is based on the previous
> patch doing that) and it should prevent nonzero des from being leaked
> via the HWP_REQUEST MSR to the new kernel started via kexec.  With
> this patch applied to 5.15-rc des should be 0 when switching from
> passive to active and it should also be 0 after starting another
> kernel via kexec while intel_pstate is running in the passive mode.
>
> Can you please verify that it works as expected?

I booted 5.15-rc6 in active mode, then changed to passive, making the des
field nonzero, and then changed back to active, making it 0 again.  I then
changed to passive again and kexeced 5.9.  The des field was again 0.

So it looks fine.

thanks,
julia


end of thread

Thread overview: 19+ messages
2021-10-24 13:02 problem in changing from active to passive mode Julia Lawall
2021-10-24 22:44 ` Doug Smythies
2021-10-25  5:17   ` Julia Lawall
2021-10-25 20:49   ` Julia Lawall
2021-10-26 15:13   ` Julia Lawall
2021-10-27 15:10     ` Doug Smythies
2021-10-27 15:16       ` Julia Lawall
2021-10-28 17:10       ` Julia Lawall
2021-10-28 17:29         ` Rafael J. Wysocki
2021-10-28 17:57           ` Rafael J. Wysocki
2021-10-28 18:16             ` Rafael J. Wysocki
2021-10-28 18:43               ` Rafael J. Wysocki
2021-10-28 19:13               ` Julia Lawall
2021-10-28 19:21                 ` Rafael J. Wysocki
2021-10-28 19:25                   ` Julia Lawall
2021-10-28 19:48                     ` Rafael J. Wysocki
2021-10-28 20:18                       ` Julia Lawall
2021-10-29 15:39                         ` Rafael J. Wysocki
2021-10-29 20:29                           ` Julia Lawall
