[v3,1/3] cpufreq: ondemand: Change the calculation of target frequency

Message ID 51AF60D5.3080605@semaphore.gr
State New, archived

Commit Message

Stratos Karafotis June 5, 2013, 4:01 p.m. UTC
Ondemand calculates load in terms of frequency and increases it only
if the load_freq is greater than up_threshold multiplied by current
or average frequency. This seems to produce oscillations of frequency
between min and max because, for example, a relatively small load can
easily saturate minimum frequency and lead the CPU to max. Then, the
CPU will decrease back to min due to a small load_freq.

This patch changes the calculation method of load and target frequency
considering 2 points:
- Load computation should be independent from current or average
measured frequency. For example, an absolute load of 80% at 100MHz is
not necessarily equivalent to 8% at 1000MHz in the next sampling
interval.
- Target frequency should be increased to any value of frequency table
proportional to absolute load, instead of only to the max. Thus:

Target frequency = C * load

where C = policy->cpuinfo.max_freq / 100
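
As an editorial illustration of the formula above (a minimal userspace sketch, not code from the patch), the proportional target can be computed as follows. The function name `target_freq_khz` and the use of kHz are assumptions made for this example; only the relation C = max_freq / 100 comes from the message.

```c
#include <assert.h>

/*
 * Target frequency = C * load, where C = max_freq / 100.
 * max_freq_khz is the highest frequency in kHz and load is a
 * percentage in the range 0..100. Hypothetical helper for illustration.
 */
static unsigned int target_freq_khz(unsigned int max_freq_khz,
                                    unsigned int load)
{
    unsigned int c = max_freq_khz / 100;

    assert(load <= 100);
    return c * load;
}
```

With a nominal 3.40GHz maximum (3400000 kHz), a 50% load maps to 1700000 kHz under this formula, and a 100% load maps to the maximum itself.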

Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
increase of ~1.5% in performance. cpufreq_stats (time_in_state) shows
that middle frequencies are used more with this patch. Highest
and lowest frequencies were used less by ~9%.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
---
 drivers/cpufreq/cpufreq_governor.c | 10 +---------
 drivers/cpufreq/cpufreq_governor.h |  1 -
 drivers/cpufreq/cpufreq_ondemand.c | 39 +++++++-------------------------------
 3 files changed, 8 insertions(+), 42 deletions(-)

Comments

Borislav Petkov June 5, 2013, 4:17 p.m. UTC | #1
On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> Ondemand calculates load in terms of frequency and increases it only
> if the load_freq is greater than up_threshold multiplied by current
> or average frequency. This seems to produce oscillations of frequency
> between min and max because, for example, a relatively small load can
> easily saturate minimum frequency and lead the CPU to max. Then, the
> CPU will decrease back to min due to a small load_freq.

Right, and I think this is how we want it, no?

The thing is, the faster you finish your work, the faster you can become
idle and save power.

If you switch frequencies in a staircase-like manner, you're going to
take longer to finish, in certain cases, and burn more power while doing
so.

Btw, racing to idle is also a good example for why you want boosting:
you want to go max out the core but stay within power limits so that you
can finish sooner.

> This patch changes the calculation method of load and target frequency
> considering 2 points:
> - Load computation should be independent from current or average
> measured frequency. For example an absolute load 80% at 100MHz is not
> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
> - Target frequency should be increased to any value of frequency table
> proportional to absolute load, instead to only the max. Thus:
> 
> Target frequency = C * load
> 
> where C = policy->cpuinfo.max_freq / 100
> 
> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
> that middle frequencies are used more, with this patch. Highest
> and lowest frequencies were used less by ~9%

I read this as "the workload takes longer to complete" which means
higher power consumption and longer execution times which means less
time spent in idle. And I don't think we want that.

Yes, no?

Thanks.
David C Niemi June 5, 2013, 4:58 p.m. UTC | #2
When you are doing a locally-originated truly CPU-bound task, "race to
idle" does make some sense.  But I can think of a couple of caveats.

1) If you care about power consumption, you want to avoid
super-power-hungry turbo states, as you get less done per watt-hour
than in some of the middle states.

2) CPU usage that is related to I/O (network, disk, video) doesn't
necessarily let you go to idle sooner if at all.  In this case if you
want to minimize power consumption you may want to use middle states a
lot.  But if you care more about responsiveness or latency than power
consumption, you might want to go to a high state anyway; that is why
we have tunables -- so we can configure based on the actual priorities
for the machine.

DCN

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Stratos Karafotis June 5, 2013, 5:13 p.m. UTC | #3
Hi Borislav,

On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
>> Ondemand calculates load in terms of frequency and increases it only
>> if the load_freq is greater than up_threshold multiplied by current
>> or average frequency. This seems to produce oscillations of frequency
>> between min and max because, for example, a relatively small load can
>> easily saturate minimum frequency and lead the CPU to max. Then, the
>> CPU will decrease back to min due to a small load_freq.
>
> Right, and I think this is how we want it, no?
>
> The thing is, the faster you finish your work, the faster you can become
> idle and save power.

This is exactly the goal of this patch: to use the middle frequencies
more efficiently so that the work finishes faster.

> If you switch frequencies in a staircase-like manner, you're going to
> take longer to finish, in certain cases, and burn more power while doing
> so.

This is not true with this patch. It switches to middle frequencies
when the load < up_threshold.
Currently, ondemand does not increase the frequency in that case: the
CPU runs at the lowest frequency until the load is greater than
up_threshold.

> Btw, racing to idle is also a good example for why you want boosting:
> you want to go max out the core but stay within power limits so that you
> can finish sooner.
>
>> This patch changes the calculation method of load and target frequency
>> considering 2 points:
>> - Load computation should be independent from current or average
>> measured frequency. For example an absolute load 80% at 100MHz is not
>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
>> - Target frequency should be increased to any value of frequency table
>> proportional to absolute load, instead to only the max. Thus:
>>
>> Target frequency = C * load
>>
>> where C = policy->cpuinfo.max_freq / 100
>>
>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
>> that middle frequencies are used more, with this patch. Highest
>> and lowest frequencies were used less by ~9%
>
> I read this as "the workload takes longer to complete" which means
> higher power consumption and longer execution times which means less
> time spent in idle. And I don't think we want that.
>
> Yes, no?

In my opinion, no.
Running the benchmark mentioned in changelog shows shorter execution
time by ~1.5%

Thanks,
Stratos
Rafael J. Wysocki June 5, 2013, 8:35 p.m. UTC | #4
On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
> Hi Borislav,
> 
> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> > On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> >> Ondemand calculates load in terms of frequency and increases it only
> >> if the load_freq is greater than up_threshold multiplied by current
> >> or average frequency. This seems to produce oscillations of frequency
> >> between min and max because, for example, a relatively small load can
> >> easily saturate minimum frequency and lead the CPU to max. Then, the
> >> CPU will decrease back to min due to a small load_freq.
> >
> > Right, and I think this is how we want it, no?
> >
> > The thing is, the faster you finish your work, the faster you can become
> > idle and save power.
> 
> This is exactly the goal of this patch. To use more efficiently middle
> frequencies to finish faster the work.
> 
> > If you switch frequencies in a staircase-like manner, you're going to
> > take longer to finish, in certain cases, and burn more power while doing
> > so.
> 
> This is not true with this patch. It switches to middle frequencies
> when the load < up_threshold.
> Now, ondemand does not increase freq. CPU runs in lowest freq till the
> load is greater than up_threshold.
> 
> > Btw, racing to idle is also a good example for why you want boosting:
> > you want to go max out the core but stay within power limits so that you
> > can finish sooner.
> >
> >> This patch changes the calculation method of load and target frequency
> >> considering 2 points:
> >> - Load computation should be independent from current or average
> >> measured frequency. For example an absolute load 80% at 100MHz is not
> >> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
> >> - Target frequency should be increased to any value of frequency table
> >> proportional to absolute load, instead to only the max. Thus:
> >>
> >> Target frequency = C * load
> >>
> >> where C = policy->cpuinfo.max_freq / 100
> >>
> >> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
> >> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> >> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
> >> that middle frequencies are used more, with this patch. Highest
> >> and lowest frequencies were used less by ~9%

Can you also use powertop to measure the percentage of time spent in idle
states for the same workload with and without your patchset?  Also, it would
be good to measure the total energy consumption somehow ...

Thanks,
Rafael
Borislav Petkov June 6, 2013, 9:55 a.m. UTC | #5
Please do not top-post.

On Wed, Jun 05, 2013 at 12:58:33PM -0400, David C Niemi wrote:
> When you are doing a locally-originated truly CPU-bound task, "race to
> idle" does make some sense. But I can think of a couple of caveats.
>
> 1) If you care about power consumption, you want to avoid
> super-power-hungry turbo states, as you get less done per watt-hour
> than in some of the middle states.
>
> 2) CPU usage that is related to I/O (network, disk, video) doesn't
> necessarily let you go to idle sooner if at all. In this case if you
> want to minimize power consumption you may want to use middle states a
> lot. But if you care more about responsiveness or latency than power
> consumption, you might want to go to a high state anyway; that is why
> we have tunables -- so we can configure based on the actual priorities
> for the machine.

No, users don't always know about tunables - this should Just Work(tm).

The correct "fix" for this whole deal is coupling cpufreq with
the scheduler, as it has been said so many times before. You need
"something" which can tell you whether raising the freq. is worth it or
not (i.e. the process is waiting on IO or is executing instructions).

Btw, recent AMD CPUs have something called frequency feedback interface
which can tell you how much performance you would get if you would raise
the frequency to the next P-state.

I don't know though how reliable this heuristic is, and, besides,
we need this addressed for all hw out there, which means, a sw-only
solution would be the way to go.
Viresh Kumar June 6, 2013, 9:57 a.m. UTC | #6
On 6 June 2013 15:25, Borislav Petkov <bp@suse.de> wrote:
> The correct "fix" for this whole deal is coupling cpufreq with
> the scheduler, as it has been said so many times before. You need
> "something" which can tell you whether raising the freq. is worth it or
> not (i.e. the process is waiting on IO or is executing instructions).

Linaro has got a blueprint in this direction but doesn't have any
proof of concept or RFC patches to share. But that will happen soon.
Borislav Petkov June 6, 2013, 10:01 a.m. UTC | #7
On Wed, Jun 05, 2013 at 10:35:05PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
> > Hi Borislav,
> > 
> > On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> > > On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> > >> Ondemand calculates load in terms of frequency and increases it only
> > >> if the load_freq is greater than up_threshold multiplied by current
> > >> or average frequency. This seems to produce oscillations of frequency
> > >> between min and max because, for example, a relatively small load can
> > >> easily saturate minimum frequency and lead the CPU to max. Then, the
> > >> CPU will decrease back to min due to a small load_freq.
> > >
> > > Right, and I think this is how we want it, no?
> > >
> > > The thing is, the faster you finish your work, the faster you can become
> > > idle and save power.
> > 
> > This is exactly the goal of this patch. To use more efficiently middle
> > frequencies to finish faster the work.

Hold on, you say above "easily saturate minimum frequency and lead the
CPU to max". I read this as we jump straight to max P-state where we
even boost.

"CPU to max" finishes the work faster than middle frequencies, if you're
CPU-bound.

> > > If you switch frequencies in a staircase-like manner, you're going to
> > > take longer to finish, in certain cases, and burn more power while doing
> > > so.
> > 
> > This is not true with this patch. It switches to middle frequencies
> > when the load < up_threshold.

This is worth investigating wrt heightened power consumption, as Rafael
suggested.
Viresh Kumar June 6, 2013, 10:10 a.m. UTC | #8
On 6 June 2013 15:31, Borislav Petkov <bp@suse.de> wrote:

> Hold on, you say above "easily saturate minimum frequency and lead the
> CPU to max". I read this as we jump straight to max P-state where we
> even boost.

Probably he meant: "At the lowest frequencies, a small load on the
system may look like a huge one. For example, a 20-30% load at max freq
can be a 95% load at min freq. So we jump to max freq even for this load
and return pretty quickly, as this load doesn't sustain for long. On top
of that, we wait for the load to go over up_threshold before increasing
freq."
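
To make the arithmetic behind this concrete, here is a small editorial sketch (not from the thread's patch) of how the same amount of work shows up as a different busy percentage at a lower frequency. It assumes the work scales linearly with frequency, which real workloads only approximate, and `load_seen_at` is a hypothetical name.

```c
#include <assert.h>

/*
 * Busy percentage the same work would show at cur_khz, given its busy
 * percentage at max_khz. Assumes cycles needed are independent of the
 * frequency (pure CPU-bound work); the result is capped at 100.
 */
static unsigned int load_seen_at(unsigned int load_at_max,
                                 unsigned int max_khz,
                                 unsigned int cur_khz)
{
    unsigned long long scaled;

    assert(load_at_max <= 100);
    assert(cur_khz > 0 && cur_khz <= max_khz);

    scaled = (unsigned long long)load_at_max * max_khz / cur_khz;
    return scaled > 100 ? 100 : (unsigned int)scaled;
}
```

Using the 100MHz and 1000MHz figures from the changelog, an 8% load at 1000MHz appears as 80% at 100MHz under this assumption, and anything from 10% upward at 1000MHz already saturates 100MHz.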

> "CPU to max" finishes the work faster than middle frequencies, if you're
> CPU-bound.

He isn't removing this feature at all.

Current code is:

if (load > up_threshold)
   goto maxfreq.
else
   don't increase freq, maybe decrease it in steps

What he is doing is:

if (load > up_threshold)
   goto maxfreq.
else
   increase/decrease freq based on current load.

So, if up_threshold is 95 and load remains < 95, his patch will
give a significant improvement, both power- and performance-wise.

Otherwise, it shouldn't make either of them worse.
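
The second pseudocode variant above can be sketched in plain C as follows. This is an editorial userspace illustration under stated assumptions, not the patch itself: the frequency table is hypothetical, `next_freq_khz` and `pick_at_or_above` are made-up names, and choosing the lowest table entry at or above the target is an assumption about how the proportional value would be mapped onto the table.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frequency table in kHz, ascending; not from the thread. */
static const unsigned int freq_table[] = {
    1600000, 2000000, 2400000, 2800000, 3400000
};
#define NR_FREQS (sizeof(freq_table) / sizeof(freq_table[0]))

/* Lowest table entry at or above target_khz, else the highest entry. */
static unsigned int pick_at_or_above(unsigned int target_khz)
{
    for (size_t i = 0; i < NR_FREQS; i++)
        if (freq_table[i] >= target_khz)
            return freq_table[i];
    return freq_table[NR_FREQS - 1];
}

/*
 * load and up_threshold are percentages (0..100).
 * Above the threshold: go straight to max, as before.
 * Otherwise: pick a frequency proportional to the load.
 */
static unsigned int next_freq_khz(unsigned int load,
                                  unsigned int up_threshold)
{
    unsigned int max_khz = freq_table[NR_FREQS - 1];

    assert(load <= 100);
    if (load > up_threshold)
        return max_khz;
    return pick_at_or_above(load * max_khz / 100);
}
```

With up_threshold at 95 and this table, a 96% load still selects the maximum, while a 50% load selects 2000000 kHz instead of leaving the frequency where it was.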
Borislav Petkov June 6, 2013, 12:10 p.m. UTC | #9
On Thu, Jun 06, 2013 at 03:40:13PM +0530, Viresh Kumar wrote:
> his patch will give significant improvement both power & performance wise.

Yes, and I'd like to see the paperwork on that. Numbers, and on a couple
of platforms/vendors if possible, please.

Thanks.
David C Niemi June 6, 2013, 1:50 p.m. UTC | #10
On 06/06/13 05:55, Borislav Petkov wrote:
> Please do not top-post.
>
> On Wed, Jun 05, 2013 at 12:58:33PM -0400, David C Niemi wrote:
>> When you are doing a locally-originated truly CPU-bound task, "race to
>> idle" does make some sense. But I can think of a couple of caveats.
>>
>> 1) If you care about power consumption, you want to avoid
>> super-power-hungry turbo states, as you get less done per watt-hour
>> than in some of the middle states.
>>
>> 2) CPU usage that is related to I/O (network, disk, video) doesn't
>> necessarily let you go to idle sooner if at all. In this case if you
>> want to minimize power consumption you may want to use middle states a
>> lot. But if you care more about responsiveness or latency than power
>> consumption, you might want to go to a high state anyway; that is why
>> we have tunables -- so we can configure based on the actual priorities
>> for the machine.
> No, users don't always know about tunables - this should Just Work(tm).
It should "Just Work" for the common mass-market case.  Tunables are
not for the average end-user -- they are for either the userland part
of the operating system to manage, or for people like me who have
specific requirements to meet on thoroughly managed machines.  Without
tunables you will be lumping servers, desktops, laptops, and embedded
devices together and they simply do not have the same high-level
priorities.
>
> The correct "fix" for this whole deal is coupling cpufreq with
> the scheduler, as it has been said so many times before. You need
> "something" which can tell you whether raising the freq. is worth it or
> not (i.e. the process is waiting on IO or is executing instructions).
I'll grant you this in the case of regular userland processes that have
medium to large chunks of work to do.  For handling huge amounts of
I/O, you have different needs -- think about cases where you need to
peg many of your cores at once just handling I/O.  That has to work
well too.  That's not saying the scheduler can't help, but the governor
needs to know about all CPU consumed, including doing I/O and in all
parts of the kernel.

Another part of this picture is the p-state governor.  That is even
more scheduler-relevant than the c-state governor.
...
DCN
Stratos Karafotis June 6, 2013, 4:46 p.m. UTC | #11
On 06/06/2013 03:10 PM, Borislav Petkov wrote:
> On Thu, Jun 06, 2013 at 03:40:13PM +0530, Viresh Kumar wrote:
>> his patch will give significant improvement both power & performance wise.
> 
> Yes, and I'd like to see the paperwork on that. Numbers, and on a couple
> of platforms/vendors if possible, please.
> 
> Thanks.
> 

On 06/06/2013 04:15 PM, Borislav Petkov wrote:
> Please do not top-post.
> 
> On Thu, Jun 06, 2013 at 03:54:20PM +0300, Stratos Karafotis wrote:
>> I will try to provide the requested info (although, I'm not sure how
>> to measure total energy :) )
> 
> tools/power/x86/turbostat looks like a good tool. It can show, a.o.,
> power consumption in Watts on modern Intels and other interesting stuff.
> 
> HTH.
> 

Apologies for top-posting. I was able to send email only from my phone.

Thanks for your hint about turbostat.

As you most probably understood, I'm an individual amateur kernel developer.
I could provide some numbers from x86 architecture as Rafael suggested.
But unfortunately, I don't have access to more sources/infrastructure.
So, I will not be able to provide numbers from different platform(s).

I've already provided some benchmarks from x86 (3.10-rc3) and also
tested the patch in 3.4.47 kernel (ARM, Nexus 4 phone, ~1000 installations)
and in 3.0.80 kernel (ARM, Samsung Galaxy S phone, ~1500 installations).

Kindly let me know if "couple of platforms/vendors" is a show stopper
for this patch series. If yes, please ignore this patch and accept
my apologies for wasting your time. I am just trying to contribute
to this project (I believe there is space here for amateur developers).

Many thanks to Rafael, who helped and guided me.
Thanks to Viresh for his helpful comments and his acknowledgment for
the patch.

Best Regards,
Stratos
Borislav Petkov June 6, 2013, 5:11 p.m. UTC | #12
On Thu, Jun 06, 2013 at 07:46:17PM +0300, Stratos Karafotis wrote:
> Apologies for top-posting. I was able to send email only from my phone.
> 
> Thanks for you hint about turbostat.
> 
> As you most probably understood, I'm individual amateur kernel developer.
> I could provide some numbers from x86 architecture as Rafael suggested.
> But unfortunately, I don't have access to more sources/infrastructure.
> So, I will not be able to provide numbers from different platform(s).
> 
> I've already provided some benchmarks from x86 (3.10-rc3) and also
> tested the patch in 3.4.47 kernel (ARM, Nexus 4 phone, ~1000 installations)
> and in 3.0.80 kernel (ARM, Samsung Galaxy S phone, ~1500 installations).
> 
> Kindly let me know if "couple of platforms/vendors" is a show stopper
> for this patch series. If yes, please ignore this patch and accept
> my apologies for wasting your time. I am just trying to contribute
> on this project (I believe there is space here for amateur developers).

I'm in no way discouraging you from contributing to the kernel - on the
contrary: you should continue doing that.

I'm just trying to make sure that a change like that doesn't hurt
existing systems, thus the request to test on a couple of platforms. If
you don't have other platforms, that's fine, we'll find them somewhere. :-)

I'm hoping you can understand my perspective too, though - how would you feel
if a patch shows improvement on my box but slows down yours - you won't
be very happy with it, right? That's why we generally want to test such
power/performance tweaks on a wider range of machines.

But you said you have an i7-3770 CPU on which, I think, turbostat should
be able to show you what the power consumption looks like.

And if so, you could measure that consumption once with, and once
without your patch. This will give us initial numbers, at least.

How does that sound?
Stratos Karafotis June 6, 2013, 5:32 p.m. UTC | #13
On 06/06/2013 08:11 PM, Borislav Petkov wrote:
> On Thu, Jun 06, 2013 at 07:46:17PM +0300, Stratos Karafotis wrote:
>> Apologies for top-posting. I was able to send email only from my phone.
>>
>> Thanks for you hint about turbostat.
>>
>> As you most probably understood, I'm individual amateur kernel developer.
>> I could provide some numbers from x86 architecture as Rafael suggested.
>> But unfortunately, I don't have access to more sources/infrastructure.
>> So, I will not be able to provide numbers from different platform(s).
>>
>> I've already provided some benchmarks from x86 (3.10-rc3) and also
>> tested the patch in 3.4.47 kernel (ARM, Nexus 4 phone, ~1000 installations)
>> and in 3.0.80 kernel (ARM, Samsung Galaxy S phone, ~1500 installations).
>>
>> Kindly let me know if "couple of platforms/vendors" is a show stopper
>> for this patch series. If yes, please ignore this patch and accept
>> my apologies for wasting your time. I am just trying to contribute
>> on this project (I believe there is space here for amateur developers).
>
> I'm in no way discouraging you in contributing to the kernel - on the
> opposite: you should continue doing that.

I will try! :)

> I'm just trying to make sure that a change like that doesn't hurt
> existing systems, thus the request to test on a couple of platforms. If
> you don't have other platforms, that's fine, we'll find them somewhere. :-)
>
> I'm hoping you can understand my aspect too, though - how would you feel
> if a patch shows improvement on my box but slows down yours - you won't
> be very happy with it, right? That's why we generally want to test such
> power/performance tweaks on a wider range of machines.

I totally understand your perspective and I think you are absolutely
right. I just wanted to make clear that I am not able to provide numbers
for other platforms due to lack of hardware.

> But you said you have a i7-3770 CPU on which, I think, turbostat should
> be able to show you how the power consumption looks like.
>
> And if so, you could measure that consumption once with, and once
> without your patch. This will give us initial numbers, at least.
>
> How does that sound?
>

That sounds perfect! I will provide numbers for i7 soon.

Thanks for your comments!
Stratos

Stratos Karafotis June 7, 2013, 7:14 p.m. UTC | #14
On 06/05/2013 11:35 PM, Rafael J. Wysocki wrote:
> On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
>> Hi Borislav,
>>
>> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
>>> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
>>>> Ondemand calculates load in terms of frequency and increases it only
>>>> if the load_freq is greater than up_threshold multiplied by current
>>>> or average frequency. This seems to produce oscillations of frequency
>>>> between min and max because, for example, a relatively small load can
>>>> easily saturate minimum frequency and lead the CPU to max. Then, the
>>>> CPU will decrease back to min due to a small load_freq.
>>>
>>> Right, and I think this is how we want it, no?
>>>
>>> The thing is, the faster you finish your work, the faster you can become
>>> idle and save power.
>>
>> This is exactly the goal of this patch. To use more efficiently middle
>> frequencies to finish faster the work.
>>
>>> If you switch frequencies in a staircase-like manner, you're going to
>>> take longer to finish, in certain cases, and burn more power while doing
>>> so.
>>
>> This is not true with this patch. It switches to middle frequencies
>> when the load < up_threshold.
>> Now, ondemand does not increase freq. CPU runs in lowest freq till the
>> load is greater than up_threshold.
>>
>>> Btw, racing to idle is also a good example for why you want boosting:
>>> you want to go max out the core but stay within power limits so that you
>>> can finish sooner.
>>>
>>>> This patch changes the calculation method of load and target frequency
>>>> considering 2 points:
>>>> - Load computation should be independent from current or average
>>>> measured frequency. For example an absolute load 80% at 100MHz is not
>>>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
>>>> - Target frequency should be increased to any value of frequency table
>>>> proportional to absolute load, instead to only the max. Thus:
>>>>
>>>> Target frequency = C * load
>>>>
>>>> where C = policy->cpuinfo.max_freq / 100
>>>>
>>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
>>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>>>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
>>>> that middle frequencies are used more, with this patch. Highest
>>>> and lowest frequencies were used less by ~9%
> 
> Can you also use powertop to measure the percentage of time spent in idle
> states for the same workload with and without your patchset?  Also, it would
> be good to measure the total energy consumption somehow ...
> 
> Thanks,
> Rafael

Hi Rafael,

I repeated the tests, this time also collecting powertop results.
Measurement steps with and without this patch:
1) Reboot the system
2) Run the Phoronix benchmark of Linux Kernel Compilation 3.1 test twice
   without taking measurements
3) Wait a few minutes
4) Run Phoronix and powertop for 100 secs and take measurements.

I will try to repeat the test and take measurements with turbostat as
Borislav suggested.


Thanks,
Stratos

------------------------------------------------------------------
Test WITHOUT this patch:

Phoronix Test Suite v4.6.0

    Installed: pts/build-linux-kernel-1.3.0

System Information

Hardware:
Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R

Software:
OS: Fedora 18, Kernel: 3.10.0-rc3v+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080

    Would you like to save these test results (Y/n): n


Timed Linux Kernel Compilation 3.1:
    pts/build-linux-kernel-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 2 Minutes
        Running Pre-Test Script @ 21:41:19
        Started Run 1 @ 21:41:30
        Running Interim Test Script @ 21:41:44
        Started Run 2 @ 21:41:47
        Running Interim Test Script @ 21:42:02
        Started Run 3 @ 21:42:05
        Running Interim Test Script @ 21:42:15  [Std. Dev: 19.28%]
        Started Run 4 @ 21:42:19
        Running Interim Test Script @ 21:42:29  [Std. Dev: 18.72%]
        Started Run 5 @ 21:42:32
        Running Interim Test Script @ 21:42:42  [Std. Dev: 17.84%]
        Started Run 6 @ 21:42:46  [Std. Dev: 16.91%]
        Running Post-Test Script @ 21:42:55

    Test Results:
        11.073544979095
        14.059958934784
        9.6814110279083
        9.6158590316772
        9.5762379169464
        9.5944919586182

    Average: 10.60 Seconds

Powertop results:
http://www.semaphore.gr/results/powertop_without.html


---------------------------------------------------------------------
Test WITH this patch:

Phoronix Test Suite v4.6.0

    Installed: pts/build-linux-kernel-1.3.0

System Information

Hardware:
Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R

Software:
OS: Fedora 18, Kernel: 3.10.0-rc3+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080

    Would you like to save these test results (Y/n): n


Timed Linux Kernel Compilation 3.1:
    pts/build-linux-kernel-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 2 Minutes
        Running Pre-Test Script @ 21:28:05
        Started Run 1 @ 21:28:17
        Running Interim Test Script @ 21:28:30
        Started Run 2 @ 21:28:34
        Running Interim Test Script @ 21:28:44
        Started Run 3 @ 21:28:47
        Running Interim Test Script @ 21:28:58  [Std. Dev: 4.81%]
        Started Run 4 @ 21:29:02
        Running Interim Test Script @ 21:29:12  [Std. Dev: 6.05%]
        Started Run 5 @ 21:29:15
        Running Interim Test Script @ 21:29:25  [Std. Dev: 6.13%]
        Started Run 6 @ 21:29:28  [Std. Dev: 6.02%]
        Running Post-Test Script @ 21:29:38

    Test Results:
        10.442322015762
        10.038927078247
        11.044027090073
        9.5781810283661
        9.5812470912933
        9.5545389652252

    Average: 10.04 Seconds

Powertop results:
http://www.semaphore.gr/results/powertop_with.html
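
As a quick cross-check of the two averages above, the per-run times can be
averaged and compared directly. The sketch below is only illustrative: the
helper names mean_seconds() and reduction_pct() are made up for this example,
and the two arrays simply copy the six run times that Phoronix reported for
each kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Run times in seconds, copied from the Phoronix output above. */
static const double without_patch[] = {
	11.073544979095, 14.059958934784, 9.6814110279083,
	9.6158590316772, 9.5762379169464, 9.5944919586182,
};
static const double with_patch[] = {
	10.442322015762, 10.038927078247, 11.044027090073,
	9.5781810283661, 9.5812470912933, 9.5545389652252,
};

/* Arithmetic mean of n samples (hypothetical helper). */
static double mean_seconds(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Reduction of 'after' relative to 'before', in percent (hypothetical helper). */
static double reduction_pct(double before, double after)
{
	assert(before > 0.0);
	return 100.0 * (before - after) / before;
}
```

mean_seconds(without_patch, 6) and mean_seconds(with_patch, 6) reproduce the
reported 10.60 and 10.04 seconds, and reduction_pct() of the two means comes
out at roughly 5%. Given the large spread in the unpatched run (Std. Dev
16.91%), that figure is indicative only.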
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Rafael J. Wysocki June 7, 2013, 8:57 p.m. UTC | #15
On Friday, June 07, 2013 10:14:34 PM Stratos Karafotis wrote:
> On 06/05/2013 11:35 PM, Rafael J. Wysocki wrote:
> > On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
> >> Hi Borislav,
> >>
> >> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> >>> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> >>>> Ondemand calculates load in terms of frequency and increases it only
> >>>> if the load_freq is greater than up_threshold multiplied by current
> >>>> or average frequency. This seems to produce oscillations of frequency
> >>>> between min and max because, for example, a relatively small load can
> >>>> easily saturate minimum frequency and lead the CPU to max. Then, the
> >>>> CPU will decrease back to min due to a small load_freq.
> >>>
> >>> Right, and I think this is how we want it, no?
> >>>
> >>> The thing is, the faster you finish your work, the faster you can become
> >>> idle and save power.
> >>
> >> This is exactly the goal of this patch. To use more efficiently middle
> >> frequencies to finish faster the work.
> >>
> >>> If you switch frequencies in a staircase-like manner, you're going to
> >>> take longer to finish, in certain cases, and burn more power while doing
> >>> so.
> >>
> >> This is not true with this patch. It switches to middle frequencies
> >> when the load < up_threshold.
> >> Now, ondemand does not increase freq. CPU runs in lowest freq till the
> >> load is greater than up_threshold.
> >>
> >>> Btw, racing to idle is also a good example for why you want boosting:
> >>> you want to go max out the core but stay within power limits so that you
> >>> can finish sooner.
> >>>
> >>>> This patch changes the calculation method of load and target frequency
> >>>> considering 2 points:
> >>>> - Load computation should be independent from current or average
> >>>> measured frequency. For example an absolute load 80% at 100MHz is not
> >>>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
> >>>> - Target frequency should be increased to any value of frequency table
> >>>> proportional to absolute load, instead to only the max. Thus:
> >>>>
> >>>> Target frequency = C * load
> >>>>
> >>>> where C = policy->cpuinfo.max_freq / 100
> >>>>
> >>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
> >>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> >>>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
> >>>> that middle frequencies are used more, with this patch. Highest
> >>>> and lowest frequencies were used less by ~9%
> > 
> > Can you also use powertop to measure the percentage of time spent in idle
> > states for the same workload with and without your patchset?  Also, it would
> > be good to measure the total energy consumption somehow ...
> > 
> > Thanks,
> > Rafael
> 
> Hi Rafael,
> 
> I repeated the tests, this time also collecting powertop results.
> Measurement steps, both with and without this patch:
> 1) Reboot the system
> 2) Run the Phoronix Linux Kernel Compilation 3.1 benchmark twice
>    without taking measurements
> 3) Wait a few minutes
> 4) Run Phoronix and powertop for 100 seconds and take the measurement.

Well, while this is not conclusive, it definitely looks very promising. :-)

We're seeing a measurable performance improvement with the patchset applied *and*
more time spent in idle states at the same time.  I'd be very surprised if the
energy consumption measurements did not confirm that the patchset allowed us to
reduce it.

If my computations are correct (somebody please check), the cores spent about
20% more time in idle on the average with the patchset applied and in addition
to that the cc6 residency was greater by about 2% on the average with respect
to the kernel without the patchset.

We need to verify if there are gains (or at least no regressions) with other
workloads, but since this *also* reduces code complexity quite a bit, I'm
seriously considering taking it.

> I will try to repeat the test and take measurements with turbostat as
> Borislav suggested.

Please do!

Thanks,
Rafael


Stratos Karafotis June 8, 2013, 9:56 a.m. UTC | #16
On 06/07/2013 11:57 PM, Rafael J. Wysocki wrote:
> On Friday, June 07, 2013 10:14:34 PM Stratos Karafotis wrote:
>> On 06/05/2013 11:35 PM, Rafael J. Wysocki wrote:
>>> On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
>>>> Hi Borislav,
>>>>
>>>> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
>>>>> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
>>>>>> Ondemand calculates load in terms of frequency and increases it only
>>>>>> if the load_freq is greater than up_threshold multiplied by current
>>>>>> or average frequency. This seems to produce oscillations of frequency
>>>>>> between min and max because, for example, a relatively small load can
>>>>>> easily saturate minimum frequency and lead the CPU to max. Then, the
>>>>>> CPU will decrease back to min due to a small load_freq.
>>>>>
>>>>> Right, and I think this is how we want it, no?
>>>>>
>>>>> The thing is, the faster you finish your work, the faster you can become
>>>>> idle and save power.
>>>>
>>>> This is exactly the goal of this patch. To use more efficiently middle
>>>> frequencies to finish faster the work.
>>>>
>>>>> If you switch frequencies in a staircase-like manner, you're going to
>>>>> take longer to finish, in certain cases, and burn more power while doing
>>>>> so.
>>>>
>>>> This is not true with this patch. It switches to middle frequencies
>>>> when the load < up_threshold.
>>>> Now, ondemand does not increase freq. CPU runs in lowest freq till the
>>>> load is greater than up_threshold.
>>>>
>>>>> Btw, racing to idle is also a good example for why you want boosting:
>>>>> you want to go max out the core but stay within power limits so that you
>>>>> can finish sooner.
>>>>>
>>>>>> This patch changes the calculation method of load and target frequency
>>>>>> considering 2 points:
>>>>>> - Load computation should be independent from current or average
>>>>>> measured frequency. For example an absolute load 80% at 100MHz is not
>>>>>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
>>>>>> - Target frequency should be increased to any value of frequency table
>>>>>> proportional to absolute load, instead to only the max. Thus:
>>>>>>
>>>>>> Target frequency = C * load
>>>>>>
>>>>>> where C = policy->cpuinfo.max_freq / 100
>>>>>>
>>>>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
>>>>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>>>>>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
>>>>>> that middle frequencies are used more, with this patch. Highest
>>>>>> and lowest frequencies were used less by ~9%
>>>
>>> Can you also use powertop to measure the percentage of time spent in idle
>>> states for the same workload with and without your patchset?  Also, it would
>>> be good to measure the total energy consumption somehow ...
>>>
>>> Thanks,
>>> Rafael
>>
>> Hi Rafael,
>>
>> I repeated the tests, this time also collecting powertop results.
>> Measurement steps, both with and without this patch:
>> 1) Reboot the system
>> 2) Run the Phoronix Linux Kernel Compilation 3.1 benchmark twice
>>    without taking measurements
>> 3) Wait a few minutes
>> 4) Run Phoronix and powertop for 100 seconds and take the measurement.
> 
> Well, while this is not conclusive, it definitely looks very promising. :-)
> 
> We're seeing a measurable performance improvement with the patchset applied *and*
> more time spent in idle states at the same time.  I'd be very surprised if the
> energy consumption measurements did not confirm that the patchset allowed us to
> reduce it.
> 
> If my computations are correct (somebody please check), the cores spent about
> 20% more time in idle on the average with the patchset applied and in addition
> to that the cc6 residency was greater by about 2% on the average with respect
> to the kernel without the patchset.
> 
> We need to verify if there are gains (or at least no regressions) with other
> workloads, but since this *also* reduces code complexity quite a bit, I'm
> seriously considering taking it.
> 
>> I will try to repeat the test and take measurements with turbostat as
>> Borislav suggested.
> 
> Please do!
> 
> Thanks,
> Rafael
> 

Hi,

I repeated the tests, this time collecting results from turbostat.
Measurement steps, both with and without this patch:
1) Reboot the system
2) Run the Phoronix Linux Kernel Compilation 3.1 benchmark twice
   without taking measurements
3) Wait a few minutes
4) Run Phoronix and turbostat (-i 100) and take the measurement


Thanks,
Stratos

------------------------------------------------------------------
Test WITHOUT this patch:

Phoronix Test Suite v4.6.0

    Installed: pts/build-linux-kernel-1.3.0

System Information

Hardware:
Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R

Software:
OS: Fedora 18, Kernel: 3.10.0-rc3v+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080

    Would you like to save these test results (Y/n): n


Timed Linux Kernel Compilation 3.1:
    pts/build-linux-kernel-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 2 Minutes
        Running Pre-Test Script @ 12:38:35
        Started Run 1 @ 12:38:46
        Running Interim Test Script @ 12:38:59
        Started Run 2 @ 12:39:03
        Running Interim Test Script @ 12:39:14
        Started Run 3 @ 12:39:18
        Running Interim Test Script @ 12:39:27  [Std. Dev: 8.57%]
        Started Run 4 @ 12:39:31
        Running Interim Test Script @ 12:39:41  [Std. Dev: 8.56%]
        Started Run 5 @ 12:39:44
        Running Interim Test Script @ 12:39:54  [Std. Dev: 8.05%]
        Started Run 6 @ 12:39:58  [Std. Dev: 7.57%]
        Running Post-Test Script @ 12:40:07

    Test Results:
        10.280334949493
        11.148964166641
        9.3881862163544
        9.3307340145111
        9.3948450088501
        9.3976459503174

    Average: 9.82 Seconds

cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
         38.86 3.57 3.39   0  10.07   2.98  48.09   0.00   44   44   0.00   0.00   0.00   0.00  26.23  20.28  0.00
  0   0  33.32 3.65 3.39   0  19.88   3.26  43.54   0.00   44   44   0.00   0.00   0.00   0.00  26.23  20.28  0.00
  0   4  48.87 3.52 3.39   0   4.32
  1   1  35.58 3.67 3.39   0  12.93   3.28  48.21   0.00   39
  1   5  42.12 3.51 3.39   0   6.39
  2   2  33.42 3.66 3.39   0  13.11   2.78  50.69   0.00   34
  2   6  40.83 3.43 3.39   0   5.70
  3   3  35.97 3.68 3.39   0  11.51   2.61  49.92   0.00   39
  3   7  40.75 3.49 3.39   0   6.73


---------------------------------------------------------------------
Test WITH this patch:

Phoronix Test Suite v4.6.0

    Installed: pts/build-linux-kernel-1.3.0

System Information

Hardware:
Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R

Software:
OS: Fedora 18, Kernel: 3.10.0-rc3+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080

    Would you like to save these test results (Y/n): n


Timed Linux Kernel Compilation 3.1:
    pts/build-linux-kernel-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 2 Minutes
        Running Pre-Test Script @ 12:28:03
        Started Run 1 @ 12:28:15
        Running Interim Test Script @ 12:28:28
        Started Run 2 @ 12:28:31
        Running Interim Test Script @ 12:28:41
        Started Run 3 @ 12:28:47
        Running Interim Test Script @ 12:28:56  [Std. Dev: 5.03%]
        Started Run 4 @ 12:29:00
        Running Interim Test Script @ 12:29:09  [Std. Dev: 4.37%]
        Started Run 5 @ 12:29:13
        Running Interim Test Script @ 12:29:22  [Std. Dev: 3.79%]
        Started Run 6 @ 12:29:26  [Std. Dev: 3.49%]
        Running Post-Test Script @ 12:29:35

    Test Results:
        10.134061098099
        9.3411478996277
        9.2629590034485
        9.3126730918884
        9.4799311161041
        9.3236708641052

    Average: 9.48 Seconds

cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
         38.61 3.59 3.39   0   9.64   3.04  48.71   0.00   43   43   0.00   0.00   0.00   0.00  26.30  20.35  0.00
  0   0  34.73 3.67 3.39   0  13.33   3.02  48.93   0.00   43   43   0.00   0.00   0.00   0.00  26.30  20.35  0.00
  0   4  41.86 3.52 3.39   0   6.19
  1   1  33.48 3.66 3.39   0  12.53   4.00  49.99   0.00   40
  1   5  40.62 3.52 3.39   0   5.39
  2   2  34.41 3.66 3.39   0  18.06   2.98  44.55   0.00   35
  2   6  48.26 3.58 3.39   0   4.22
  3   3  35.79 3.69 3.39   0  10.70   2.16  51.36   0.00   40
  3   7  39.77 3.50 3.39   0   6.71



Rafael J. Wysocki June 8, 2013, 11:18 a.m. UTC | #17
On Saturday, June 08, 2013 12:56:00 PM Stratos Karafotis wrote:
> On 06/07/2013 11:57 PM, Rafael J. Wysocki wrote:
> > On Friday, June 07, 2013 10:14:34 PM Stratos Karafotis wrote:
> >> On 06/05/2013 11:35 PM, Rafael J. Wysocki wrote:
> >>> On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
> >>>> Hi Borislav,
> >>>>
> >>>> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> >>>>> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> >>>>>> Ondemand calculates load in terms of frequency and increases it only
> >>>>>> if the load_freq is greater than up_threshold multiplied by current
> >>>>>> or average frequency. This seems to produce oscillations of frequency
> >>>>>> between min and max because, for example, a relatively small load can
> >>>>>> easily saturate minimum frequency and lead the CPU to max. Then, the
> >>>>>> CPU will decrease back to min due to a small load_freq.
> >>>>>
> >>>>> Right, and I think this is how we want it, no?
> >>>>>
> >>>>> The thing is, the faster you finish your work, the faster you can become
> >>>>> idle and save power.
> >>>>
> >>>> This is exactly the goal of this patch. To use more efficiently middle
> >>>> frequencies to finish faster the work.
> >>>>
> >>>>> If you switch frequencies in a staircase-like manner, you're going to
> >>>>> take longer to finish, in certain cases, and burn more power while doing
> >>>>> so.
> >>>>
> >>>> This is not true with this patch. It switches to middle frequencies
> >>>> when the load < up_threshold.
> >>>> Now, ondemand does not increase freq. CPU runs in lowest freq till the
> >>>> load is greater than up_threshold.
> >>>>
> >>>>> Btw, racing to idle is also a good example for why you want boosting:
> >>>>> you want to go max out the core but stay within power limits so that you
> >>>>> can finish sooner.
> >>>>>
> >>>>>> This patch changes the calculation method of load and target frequency
> >>>>>> considering 2 points:
> >>>>>> - Load computation should be independent from current or average
> >>>>>> measured frequency. For example an absolute load 80% at 100MHz is not
> >>>>>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
> >>>>>> - Target frequency should be increased to any value of frequency table
> >>>>>> proportional to absolute load, instead to only the max. Thus:
> >>>>>>
> >>>>>> Target frequency = C * load
> >>>>>>
> >>>>>> where C = policy->cpuinfo.max_freq / 100
> >>>>>>
> >>>>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
> >>>>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> >>>>>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
> >>>>>> that middle frequencies are used more, with this patch. Highest
> >>>>>> and lowest frequencies were used less by ~9%
> >>>
> >>> Can you also use powertop to measure the percentage of time spent in idle
> >>> states for the same workload with and without your patchset?  Also, it would
> >>> be good to measure the total energy consumption somehow ...
> >>>
> >>> Thanks,
> >>> Rafael
> >>
> >> Hi Rafael,
> >>
> >> I repeated the tests, this time also collecting powertop results.
> >> Measurement steps, both with and without this patch:
> >> 1) Reboot the system
> >> 2) Run the Phoronix Linux Kernel Compilation 3.1 benchmark twice
> >>    without taking measurements
> >> 3) Wait a few minutes
> >> 4) Run Phoronix and powertop for 100 seconds and take the measurement.
> > 
> > Well, while this is not conclusive, it definitely looks very promising. :-)
> > 
> > We're seeing a measurable performance improvement with the patchset applied *and*
> > more time spent in idle states at the same time.  I'd be very surprised if the
> > energy consumption measurements did not confirm that the patchset allowed us to
> > reduce it.
> > 
> > If my computations are correct (somebody please check), the cores spent about
> > 20% more time in idle on the average with the patchset applied and in addition
> > to that the cc6 residency was greater by about 2% on the average with respect
> > to the kernel without the patchset.
> > 
> > We need to verify if there are gains (or at least no regressions) with other
> > workloads, but since this *also* reduces code complexity quite a bit, I'm
> > seriously considering taking it.
> > 
> >> I will try to repeat the test and take measurements with turbostat as
> >> Borislav suggested.
> > 
> > Please do!
> > 
> > Thanks,
> > Rafael
> > 
> 
> Hi,
> 
> I repeated the tests, this time collecting results from turbostat.
> Measurement steps, both with and without this patch:
> 1) Reboot the system
> 2) Run the Phoronix Linux Kernel Compilation 3.1 benchmark twice
>    without taking measurements
> 3) Wait a few minutes
> 4) Run Phoronix and turbostat (-i 100) and take the measurement

You need to do something like

# ./turbostat <command invoking the phoronix suite>

Did you do that?

Rafael

Patch

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index a849b2d..47c8077 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -54,7 +54,7 @@  void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 
 	policy = cdbs->cur_policy;
 
-	/* Get Absolute Load (in terms of freq for ondemand gov) */
+	/* Get Absolute Load */
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_common_info *j_cdbs;
 		u64 cur_wall_time, cur_idle_time;
@@ -105,14 +105,6 @@  void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 
 		load = 100 * (wall_time - idle_time) / wall_time;
 
-		if (dbs_data->cdata->governor == GOV_ONDEMAND) {
-			int freq_avg = __cpufreq_driver_getavg(policy, j);
-			if (freq_avg <= 0)
-				freq_avg = policy->cur;
-
-			load *= freq_avg;
-		}
-
 		if (load > max_load)
 			max_load = load;
 	}
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index e7bbf76..c305cad 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -169,7 +169,6 @@  struct od_dbs_tuners {
 	unsigned int sampling_rate;
 	unsigned int sampling_down_factor;
 	unsigned int up_threshold;
-	unsigned int adj_up_threshold;
 	unsigned int powersave_bias;
 	unsigned int io_is_busy;
 };
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 4b9bb5d..62e67a9 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -29,11 +29,9 @@ 
 #include "cpufreq_governor.h"
 
 /* On-demand governor macros */
-#define DEF_FREQUENCY_DOWN_DIFFERENTIAL		(10)
 #define DEF_FREQUENCY_UP_THRESHOLD		(80)
 #define DEF_SAMPLING_DOWN_FACTOR		(1)
 #define MAX_SAMPLING_DOWN_FACTOR		(100000)
-#define MICRO_FREQUENCY_DOWN_DIFFERENTIAL	(3)
 #define MICRO_FREQUENCY_UP_THRESHOLD		(95)
 #define MICRO_FREQUENCY_MIN_SAMPLE_RATE		(10000)
 #define MIN_FREQUENCY_UP_THRESHOLD		(11)
@@ -159,14 +157,10 @@  static void dbs_freq_increase(struct cpufreq_policy *p, unsigned int freq)
 
 /*
  * Every sampling_rate, we check, if current idle time is less than 20%
- * (default), then we try to increase frequency. Every sampling_rate, we look
- * for the lowest frequency which can sustain the load while keeping idle time
- * over 30%. If such a frequency exist, we try to decrease to this frequency.
- *
- * Any frequency increase takes it to the maximum frequency. Frequency reduction
- * happens at minimum steps of 5% (default) of current frequency
+ * (default), then we try to increase frequency. Else, we adjust the frequency
+ * proportional to load.
  */
-static void od_check_cpu(int cpu, unsigned int load_freq)
+static void od_check_cpu(int cpu, unsigned int load)
 {
 	struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 	struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy;
@@ -176,29 +170,17 @@  static void od_check_cpu(int cpu, unsigned int load_freq)
 	dbs_info->freq_lo = 0;
 
 	/* Check for frequency increase */
-	if (load_freq > od_tuners->up_threshold * policy->cur) {
+	if (load > od_tuners->up_threshold) {
 		/* If switching to max speed, apply sampling_down_factor */
 		if (policy->cur < policy->max)
 			dbs_info->rate_mult =
 				od_tuners->sampling_down_factor;
 		dbs_freq_increase(policy, policy->max);
 		return;
-	}
-
-	/* Check for frequency decrease */
-	/* if we cannot reduce the frequency anymore, break out early */
-	if (policy->cur == policy->min)
-		return;
-
-	/*
-	 * The optimal frequency is the frequency that is the lowest that can
-	 * support the current CPU usage without triggering the up policy. To be
-	 * safe, we focus 10 points under the threshold.
-	 */
-	if (load_freq < od_tuners->adj_up_threshold
-			* policy->cur) {
+	} else {
+		/* Calculate the next frequency proportional to load */
 		unsigned int freq_next;
-		freq_next = load_freq / od_tuners->adj_up_threshold;
+		freq_next = load * policy->cpuinfo.max_freq / 100;
 
 		/* No longer fully busy, reset rate_mult */
 		dbs_info->rate_mult = 1;
@@ -372,9 +354,6 @@  static ssize_t store_up_threshold(struct dbs_data *dbs_data, const char *buf,
 			input < MIN_FREQUENCY_UP_THRESHOLD) {
 		return -EINVAL;
 	}
-	/* Calculate the new adj_up_threshold */
-	od_tuners->adj_up_threshold += input;
-	od_tuners->adj_up_threshold -= od_tuners->up_threshold;
 
 	od_tuners->up_threshold = input;
 	return count;
@@ -523,8 +502,6 @@  static int od_init(struct dbs_data *dbs_data)
 	if (idle_time != -1ULL) {
 		/* Idle micro accounting is supported. Use finer thresholds */
 		tuners->up_threshold = MICRO_FREQUENCY_UP_THRESHOLD;
-		tuners->adj_up_threshold = MICRO_FREQUENCY_UP_THRESHOLD -
-			MICRO_FREQUENCY_DOWN_DIFFERENTIAL;
 		/*
 		 * In nohz/micro accounting case we set the minimum frequency
 		 * not depending on HZ, but fixed (very low). The deferred
@@ -533,8 +510,6 @@  static int od_init(struct dbs_data *dbs_data)
 		dbs_data->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE;
 	} else {
 		tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
-		tuners->adj_up_threshold = DEF_FREQUENCY_UP_THRESHOLD -
-			DEF_FREQUENCY_DOWN_DIFFERENTIAL;
 
 		/* For correct statistics, we need 10 ticks for each measure */
 		dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
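
For illustration only, the decision made by the patched od_check_cpu() can be
mimicked in user space. This is a minimal sketch under stated assumptions, not
the kernel code: od_next_freq() and its parameter list are hypothetical, the
clamp to the policy limits stands in for the remainder of the function (which
the diff does not show), and the 64-bit intermediate is a precaution of this
sketch, whereas the patch computes in unsigned int. Frequencies are in kHz and
load is an absolute percentage.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not a kernel API: pick the next frequency the way
 * the patched od_check_cpu() does, given an absolute load in percent.
 */
static unsigned int od_next_freq(unsigned int load, unsigned int up_threshold,
				 unsigned int policy_min,
				 unsigned int policy_max,
				 unsigned int cpuinfo_max_freq)
{
	uint64_t freq_next;

	assert(load <= 100);

	/* Above the threshold the governor still jumps straight to max. */
	if (load > up_threshold)
		return policy_max;

	/* Target frequency = C * load, where C = cpuinfo.max_freq / 100. */
	freq_next = (uint64_t)load * cpuinfo_max_freq / 100;

	/* Assumption of this sketch: keep the result within policy limits. */
	if (freq_next < policy_min)
		freq_next = policy_min;
	if (freq_next > policy_max)
		freq_next = policy_max;

	return (unsigned int)freq_next;
}
```

With up_threshold = 95, a 1600000..3400000 kHz policy and cpuinfo.max_freq =
3400000 kHz, a load of 50 maps to 1700000 kHz, while a load of 96 goes
directly to 3400000 kHz.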