* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
@ 2013-06-08 12:34 ` Stratos Karafotis
  0 siblings, 0 replies; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-08 12:34 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

I also ran the test the way you mentioned, but I chose to run turbostat for 100 sec, as I did with powertop.
The benchmark actually lasts about 96 secs.

I think that we use almost the same energy over 100 sec to run the same load a little bit faster. I think this also means a reduction in power consumption.

I will also send the results from running the test as you said.

Thanks again,
Stratos

"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

>On Saturday, June 08, 2013 12:56:00 PM Stratos Karafotis wrote:
>> On 06/07/2013 11:57 PM, Rafael J. Wysocki wrote:
>> > On Friday, June 07, 2013 10:14:34 PM Stratos Karafotis wrote:
>> >> On 06/05/2013 11:35 PM, Rafael J. Wysocki wrote:
>> >>> On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
>> >>>> Hi Borislav,
>> >>>>
>> >>>> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
>> >>>>> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
>> >>>>>> Ondemand calculates load in terms of frequency and increases it only
>> >>>>>> if the load_freq is greater than up_threshold multiplied by current
>> >>>>>> or average frequency. This seems to produce oscillations of frequency
>> >>>>>> between min and max because, for example, a relatively small load can
>> >>>>>> easily saturate minimum frequency and lead the CPU to max. Then, the
>> >>>>>> CPU will decrease back to min due to a small load_freq.
>> >>>>>
>> >>>>> Right, and I think this is how we want it, no?
>> >>>>>
>> >>>>> The thing is, the faster you finish your work, the faster you can become
>> >>>>> idle and save power.
>> >>>>
>> >>>> This is exactly the goal of this patch. To use more efficiently middle
>> >>>> frequencies to finish faster the work.
>> >>>>
>> >>>>> If you switch frequencies in a staircase-like manner, you're going to
>> >>>>> take longer to finish, in certain cases, and burn more power while doing
>> >>>>> so.
>> >>>>
>> >>>> This is not true with this patch. It switches to middle frequencies
>> >>>> when the load < up_threshold.
>> >>>> Now, ondemand does not increase freq. CPU runs in lowest freq till the
>> >>>> load is greater than up_threshold.
>> >>>>
>> >>>>> Btw, racing to idle is also a good example for why you want boosting:
>> >>>>> you want to go max out the core but stay within power limits so that you
>> >>>>> can finish sooner.
>> >>>>>
>> >>>>>> This patch changes the calculation method of load and target frequency
>> >>>>>> considering 2 points:
>> >>>>>> - Load computation should be independent from current or average
>> >>>>>> measured frequency. For example an absolute load 80% at 100MHz is not
>> >>>>>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
>> >>>>>> - Target frequency should be increased to any value of frequency table
>> >>>>>> proportional to absolute load, instead to only the max. Thus:
>> >>>>>>
>> >>>>>> Target frequency = C * load
>> >>>>>>
>> >>>>>> where C = policy->cpuinfo.max_freq / 100
>> >>>>>>
>> >>>>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
>> >>>>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>> >>>>>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
>> >>>>>> that middle frequencies are used more, with this patch. Highest
>> >>>>>> and lowest frequencies were used less by ~9%
>> >>>
>> >>> Can you also use powertop to measure the percentage of time spent in idle
>> >>> states for the same workload with and without your patchset?  Also, it would
>> >>> be good to measure the total energy consumption somehow ...
>> >>>
>> >>> Thanks,
>> >>> Rafael
>> >>
>> >> Hi Rafael,
>> >>
>> >> I repeated the tests extracting also powertop results.
>> >> Measurement steps with and without this patch:
>> >> 1) Reboot system
>> >> 2) Running twice Phoronix benchmark of Linux Kernel Compilation 3.1 test
>> >>     without taking measurement
>> >> 3) Wait few minutes
>> >> 4) Run Phoronix and powertop for 100secs and take measurement.
>> > 
>> > Well, while this is not conclusive, it definitely looks very promising. :-)
>> > 
>> > We're seeing measurable performance improvement with the patchset applied *and*
>> > more time spent in idle states both at the same time.  I'd be very surprised if
>> > the energy consumption measurements did not confirm that the patchset allowed
>> > us to reduce it.
>> > 
>> > If my computations are correct (somebody please check), the cores spent about
>> > 20% more time in idle on the average with the patchset applied and in addition
>> > to that the cc6 residency was greater by about 2% on the average with respect
>> > to the kernel without the patchset.
>> > 
>> > We need to verify if there are gains (or at least no regressions) with other
>> > workloads, but since this *also* reduces code complexity quite a bit, I'm
>> > seriously considering taking it.
>> > 
>> >> I will try to repeat the test and take measurements with turbostat as
>> >> Borislav suggested.
>> > 
>> > Please do!
>> > 
>> > Thanks,
>> > Rafael
>> > 
>> 
>> Hi,
>> 
>> I repeated the tests extracting results from turbostat.
>> Measurement steps with and without this patch:
>> 1) Reboot system
>> 2) Running twice Phoronix benchmark of Linux Kernel Compilation 3.1 test
>>    without taking measurement
>> 3) Wait few minutes
>> 4) Run Phoronix and turbostat (-i 100) and take measurement
>
>You need to do something like
>
># ./turbostat <command invoking the phoronix suite>
>
>Did you do that?
>
>Rafael
>
>
>-- 
>I speak only for myself.
>Rafael J. Wysocki, Intel Open Source Technology Center.
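
For illustration, the rule quoted above (Target frequency = C * load, with
C = policy->cpuinfo.max_freq / 100) can be sketched in Python roughly as
follows. This is not the patch itself: the frequency table is hypothetical and
rounding up to the next table entry is an assumption made only for this
example.

```python
# Sketch of the proportional rule: target = C * load, C = max_freq / 100.
# Frequencies are in kHz. TABLE and the rounding policy are assumptions.

def target_frequency(load_percent: int, max_freq_khz: int) -> int:
    """Raw target frequency for an absolute load given in percent."""
    return load_percent * max_freq_khz // 100


def pick_from_table(target_khz: int, table_khz: list[int]) -> int:
    """Assumed policy: lowest table entry at or above the target."""
    candidates = [f for f in sorted(table_khz) if f >= target_khz]
    return candidates[0] if candidates else max(table_khz)


# Hypothetical frequency table (kHz), not taken from the tested hardware.
TABLE = [1_600_000, 2_000_000, 2_400_000, 2_800_000, 3_400_000]

if __name__ == "__main__":
    for load in (10, 50, 65, 90):
        raw = target_frequency(load, max(TABLE))
        print(load, raw, pick_from_table(raw, TABLE))
```

With a rule like this, a 50% load maps to a middle table entry instead of
leaving the CPU at the minimum until up_threshold is crossed.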

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-08 12:34 ` Stratos Karafotis
  (?)
@ 2013-06-08 14:05 ` Rafael J. Wysocki
  2013-06-08 20:31   ` Stratos Karafotis
  -1 siblings, 1 reply; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-08 14:05 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Saturday, June 08, 2013 03:34:29 PM Stratos Karafotis wrote:
> I also did the test with the way you mentioned. But I thought to run turbostat for 100 sec as I did with powertop.

Ah, OK.

> Actually benchmark lasts about 96 secs.
> 
> I think that we use almost the same energy for 100 sec to run the same load a little bit faster. I think this means also a reduce to power consumption.
> 
> I will also send the results running the test as you said.

Cool, thanks!

Rafael


> "Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> 
> >On Saturday, June 08, 2013 12:56:00 PM Stratos Karafotis wrote:
> >> On 06/07/2013 11:57 PM, Rafael J. Wysocki wrote:
> >> > On Friday, June 07, 2013 10:14:34 PM Stratos Karafotis wrote:
> >> >> On 06/05/2013 11:35 PM, Rafael J. Wysocki wrote:
> >> >>> On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
> >> >>>> Hi Borislav,
> >> >>>>
> >> >>>> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> >> >>>>> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> >> >>>>>> Ondemand calculates load in terms of frequency and increases it only
> >> >>>>>> if the load_freq is greater than up_threshold multiplied by current
> >> >>>>>> or average frequency. This seems to produce oscillations of frequency
> >> >>>>>> between min and max because, for example, a relatively small load can
> >> >>>>>> easily saturate minimum frequency and lead the CPU to max. Then, the
> >> >>>>>> CPU will decrease back to min due to a small load_freq.
> >> >>>>>
> >> >>>>> Right, and I think this is how we want it, no?
> >> >>>>>
> >> >>>>> The thing is, the faster you finish your work, the faster you can become
> >> >>>>> idle and save power.
> >> >>>>
> >> >>>> This is exactly the goal of this patch. To use more efficiently middle
> >> >>>> frequencies to finish faster the work.
> >> >>>>
> >> >>>>> If you switch frequencies in a staircase-like manner, you're going to
> >> >>>>> take longer to finish, in certain cases, and burn more power while doing
> >> >>>>> so.
> >> >>>>
> >> >>>> This is not true with this patch. It switches to middle frequencies
> >> >>>> when the load < up_threshold.
> >> >>>> Now, ondemand does not increase freq. CPU runs in lowest freq till the
> >> >>>> load is greater than up_threshold.
> >> >>>>
> >> >>>>> Btw, racing to idle is also a good example for why you want boosting:
> >> >>>>> you want to go max out the core but stay within power limits so that you
> >> >>>>> can finish sooner.
> >> >>>>>
> >> >>>>>> This patch changes the calculation method of load and target frequency
> >> >>>>>> considering 2 points:
> >> >>>>>> - Load computation should be independent from current or average
> >> >>>>>> measured frequency. For example an absolute load 80% at 100MHz is not
> >> >>>>>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
> >> >>>>>> - Target frequency should be increased to any value of frequency table
> >> >>>>>> proportional to absolute load, instead to only the max. Thus:
> >> >>>>>>
> >> >>>>>> Target frequency = C * load
> >> >>>>>>
> >> >>>>>> where C = policy->cpuinfo.max_freq / 100
> >> >>>>>>
> >> >>>>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
> >> >>>>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> >> >>>>>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
> >> >>>>>> that middle frequencies are used more, with this patch. Highest
> >> >>>>>> and lowest frequencies were used less by ~9%
> >> >>>
> >> >>> Can you also use powertop to measure the percentage of time spent in idle
> >> >>> states for the same workload with and without your patchset?  Also, it would
> >> >>> be good to measure the total energy consumption somehow ...
> >> >>>
> >> >>> Thanks,
> >> >>> Rafael
> >> >>
> >> >> Hi Rafael,
> >> >>
> >> >> I repeated the tests extracting also powertop results.
> >> >> Measurement steps with and without this patch:
> >> >> 1) Reboot system
> >> >> 2) Running twice Phoronix benchmark of Linux Kernel Compilation 3.1 test
> >> >>     without taking measurement
> >> >> 3) Wait few minutes
> >> >> 4) Run Phoronix and powertop for 100secs and take measurement.
> >> > 
> >> > Well, while this is not conclusive, it definitely looks very promising. :-)
> >> > 
> >> > We're seeing measurable performance improvement with the patchset applied *and*
> >> > more time spent in idle states both at the same time.  I'd be very surprised if
> >> > the energy consumption measurements did not confirm that the patchset allowed
> >> > us to reduce it.
> >> > 
> >> > If my computations are correct (somebody please check), the cores spent about
> >> > 20% more time in idle on the average with the patchset applied and in addition
> >> > to that the cc6 residency was greater by about 2% on the average with respect
> >> > to the kernel without the patchset.
> >> > 
> >> > We need to verify if there are gains (or at least no regressions) with other
> >> > workloads, but since this *also* reduces code complexity quite a bit, I'm
> >> > seriously considering taking it.
> >> > 
> >> >> I will try to repeat the test and take measurements with turbostat as
> >> >> Borislav suggested.
> >> > 
> >> > Please do!
> >> > 
> >> > Thanks,
> >> > Rafael
> >> > 
> >> 
> >> Hi,
> >> 
> >> I repeated the tests extracting results from turbostat.
> >> Measurement steps with and without this patch:
> >> 1) Reboot system
> >> 2) Running twice Phoronix benchmark of Linux Kernel Compilation 3.1 test
> >>    without taking measurement
> >> 3) Wait few minutes
> >> 4) Run Phoronix and turbostat (-i 100) and take measurement
> >
> >You need to do something like
> >
> ># ./turbostat <command invoking the phoronix suite>
> >
> >Did you do that?
> >
> >Rafael

-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-08 14:05 ` Rafael J. Wysocki
@ 2013-06-08 20:31   ` Stratos Karafotis
  2013-06-08 22:18     ` Rafael J. Wysocki
  0 siblings, 1 reply; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-08 20:31 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On 06/08/2013 05:05 PM, Rafael J. Wysocki wrote:
> On Saturday, June 08, 2013 03:34:29 PM Stratos Karafotis wrote:
>> I also did the test with the way you mentioned. But I thought to run turbostat for 100 sec as I did with powertop.
> 
> Ah, OK.
> 
>> Actually benchmark lasts about 96 secs.
>>
>> I think that we use almost the same energy for 100 sec to run the same load a little bit faster. I think this means also a reduce to power consumption.
>>
>> I will also send the results running the test as you said.
> 
> Cool, thanks!

More results, from running:
./turbostat phoronix-test-suite benchmark pts/build-linux-kernel

Measurement steps with and without this patch:
1) Reboot system
2) Run the command above twice without taking measurements
3) Wait a few minutes
4) Run the command and take the measurement

Thanks,
Stratos

--------------------------------------------------------------
Test WITHOUT this patch:

Phoronix Test Suite v4.6.0

    Installed: pts/build-linux-kernel-1.3.0

System Information

Hardware:
Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R

Software:
OS: Fedora 18, Kernel: 3.10.0-rc3v+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080

    Would you like to save these test results (Y/n): 

Timed Linux Kernel Compilation 3.1:
    pts/build-linux-kernel-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 2 Minutes
        Running Pre-Test Script @ 22:59:35
        Started Run 1 @ 22:59:46
        Running Interim Test Script @ 23:00:00
        Started Run 2 @ 23:00:04
        Running Interim Test Script @ 23:00:13
        Started Run 3 @ 23:00:17
        Running Interim Test Script @ 23:00:26  [Std. Dev: 10.04%]
        Started Run 4 @ 23:00:30
        Running Interim Test Script @ 23:00:39  [Std. Dev: 8.98%]
        Started Run 5 @ 23:00:43
        Running Interim Test Script @ 23:00:53  [Std. Dev: 7.80%]
        Started Run 6 @ 23:00:56  [Std. Dev: 7.21%]
        Running Post-Test Script @ 23:01:06

    Test Results:
        11.121481895447
        9.3301539421082
        9.4521908760071
        9.3115320205688
        9.720575094223
        9.396096944809

    Average: 9.72 Seconds

cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
         40.96 3.57 3.39   0   9.83   3.36  45.85   0.00   46   46   0.00   0.00   0.00   0.00  27.25  21.27  0.00
  0   0  37.65 3.67 3.39   0  20.53   3.18  38.64   0.00   46   46   0.00   0.00   0.00   0.00  27.25  21.27  0.00
  0   4  52.10 3.54 3.39   0   6.08
  1   1  35.21 3.66 3.39   0  11.45   3.80  49.54   0.00   41
  1   5  41.99 3.45 3.39   0   4.66
  2   2  35.46 3.66 3.39   0  10.97   3.60  49.97   0.00   38
  2   6  41.90 3.48 3.39   0   4.53
  3   3  39.44 3.69 3.39   0  12.46   2.86  45.24   0.00   41
  3   7  43.90 3.45 3.39   0   7.99
94.876210 sec


---------------------------------------------------------------------
Test WITH this patch:

Phoronix Test Suite v4.6.0

    Installed: pts/build-linux-kernel-1.3.0

System Information

Hardware:
Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R

Software:
OS: Fedora 18, Kernel: 3.10.0-rc3+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080

    Would you like to save these test results (Y/n): 

Timed Linux Kernel Compilation 3.1:
    pts/build-linux-kernel-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 2 Minutes
        Running Pre-Test Script @ 22:48:20
        Started Run 1 @ 22:48:30
        Running Interim Test Script @ 22:48:44
        Started Run 2 @ 22:48:47
        Running Interim Test Script @ 22:48:56
        Started Run 3 @ 22:49:00
        Running Interim Test Script @ 22:49:10  [Std. Dev: 4.68%]
        Started Run 4 @ 22:49:13
        Running Interim Test Script @ 22:49:23  [Std. Dev: 4.72%]
        Started Run 5 @ 22:49:26
        Running Interim Test Script @ 22:49:35  [Std. Dev: 4.25%]
        Started Run 6 @ 22:49:39  [Std. Dev: 3.98%]
        Running Post-Test Script @ 22:49:48

    Test Results:
        10.205597162247
        9.2953701019287
        9.8262219429016
        9.2547709941864
        9.4089620113373
        9.3398430347443

    Average: 9.56 Seconds

cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
         41.50 3.59 3.39   0   9.76   3.10  45.64   0.00   46   46   0.00   0.00   0.00   0.00  27.66  21.69  0.00
  0   0  35.70 3.66 3.39   0  13.02   3.74  47.55   0.00   46   46   0.00   0.00   0.00   0.00  27.66  21.69  0.00
  0   4  44.02 3.49 3.39   0   4.69
  1   1  37.20 3.67 3.39   0  12.29   2.90  47.62   0.00   39
  1   5  44.49 3.54 3.39   0   4.99
  2   2  35.62 3.66 3.39   0  20.04   2.53  41.81   0.00   40
  2   6  52.39 3.55 3.39   0   3.27
  3   3  37.65 3.67 3.39   0  13.53   3.24  45.58   0.00   40
  3   7  44.94 3.55 3.39   0   6.25
92.544695 sec
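
As a cross-check of the two Phoronix summaries above, the per-run times can be
averaged and compared with a short Python sketch. The assumption here is that
the reported "Std. Dev" is the sample standard deviation relative to the mean;
that reading reproduces the final percentages printed above, but it is not
taken from the Phoronix documentation.

```python
from statistics import mean, stdev

# Per-run build times in seconds, copied from the two result lists above.
WITHOUT_PATCH = [11.121481895447, 9.3301539421082, 9.4521908760071,
                 9.3115320205688, 9.720575094223, 9.396096944809]
WITH_PATCH = [10.205597162247, 9.2953701019287, 9.8262219429016,
              9.2547709941864, 9.4089620113373, 9.3398430347443]


def summarize(runs):
    """Return (mean seconds, sample std. dev. as a percentage of the mean)."""
    avg = mean(runs)
    return avg, 100.0 * stdev(runs) / avg


def speedup_percent(before, after):
    """Relative reduction in mean build time, in percent."""
    return 100.0 * (mean(before) - mean(after)) / mean(before)


if __name__ == "__main__":
    for label, runs in (("without", WITHOUT_PATCH), ("with", WITH_PATCH)):
        avg, rel_sd = summarize(runs)
        print(f"{label:8s} avg={avg:.2f} s  std.dev={rel_sd:.2f}%")
    print(f"speedup: {speedup_percent(WITHOUT_PATCH, WITH_PATCH):.2f}%")
```

Note that the first run is the slowest in both lists, so it dominates the
spread in each case.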



* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-08 20:31   ` Stratos Karafotis
@ 2013-06-08 22:18     ` Rafael J. Wysocki
  2013-06-09 16:26       ` Borislav Petkov
  0 siblings, 1 reply; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-08 22:18 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Saturday, June 08, 2013 11:31:37 PM Stratos Karafotis wrote:
> On 06/08/2013 05:05 PM, Rafael J. Wysocki wrote:
> > On Saturday, June 08, 2013 03:34:29 PM Stratos Karafotis wrote:
> >> I also did the test with the way you mentioned. But I thought to run turbostat for 100 sec as I did with powertop.
> > 
> > Ah, OK.
> > 
> >> Actually benchmark lasts about 96 secs.
> >>
> >> I think that we use almost the same energy for 100 sec to run the same load a little bit faster. I think this means also a reduce to power consumption.
> >>
> >> I will also send the results running the test as you said.
> > 
> > Cool, thanks!
> 
> More results running:
> ./turbostat phoronix-test-suite benchmark pts/build-linux-kernel
> 
> Measurement steps with and without this patch:
> 1) Reboot system
> 2) Run twice the command above without taking measurement
> 3) Wait few minutes
> 4) Run the command and take measurement
> 
> Thanks,
> Stratos
> 
> --------------------------------------------------------------
> Test WITHOUT this patch:
> 
> Phoronix Test Suite v4.6.0
> 
>     Installed: pts/build-linux-kernel-1.3.0
> 
> System Information
> 
> Hardware:
> Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R
> 
> Software:
> OS: Fedora 18, Kernel: 3.10.0-rc3v+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080
> 
>     Would you like to save these test results (Y/n): 
> 
> Timed Linux Kernel Compilation 3.1:
>     pts/build-linux-kernel-1.3.0
>     Test 1 of 1
>     Estimated Trial Run Count:    3
>     Estimated Time To Completion: 2 Minutes
>         Running Pre-Test Script @ 22:59:35
>         Started Run 1 @ 22:59:46
>         Running Interim Test Script @ 23:00:00
>         Started Run 2 @ 23:00:04
>         Running Interim Test Script @ 23:00:13
>         Started Run 3 @ 23:00:17
>         Running Interim Test Script @ 23:00:26  [Std. Dev: 10.04%]
>         Started Run 4 @ 23:00:30
>         Running Interim Test Script @ 23:00:39  [Std. Dev: 8.98%]
>         Started Run 5 @ 23:00:43
>         Running Interim Test Script @ 23:00:53  [Std. Dev: 7.80%]
>         Started Run 6 @ 23:00:56  [Std. Dev: 7.21%]
>         Running Post-Test Script @ 23:01:06
> 
>     Test Results:
>         11.121481895447
>         9.3301539421082
>         9.4521908760071
>         9.3115320205688
>         9.720575094223
>         9.396096944809
> 
>     Average: 9.72 Seconds
> 
> cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
>          40.96 3.57 3.39   0   9.83   3.36  45.85   0.00   46   46   0.00   0.00   0.00   0.00  27.25  21.27  0.00
>   0   0  37.65 3.67 3.39   0  20.53   3.18  38.64   0.00   46   46   0.00   0.00   0.00   0.00  27.25  21.27  0.00
>   0   4  52.10 3.54 3.39   0   6.08
>   1   1  35.21 3.66 3.39   0  11.45   3.80  49.54   0.00   41
>   1   5  41.99 3.45 3.39   0   4.66
>   2   2  35.46 3.66 3.39   0  10.97   3.60  49.97   0.00   38
>   2   6  41.90 3.48 3.39   0   4.53
>   3   3  39.44 3.69 3.39   0  12.46   2.86  45.24   0.00   41
>   3   7  43.90 3.45 3.39   0   7.99
> 94.876210 sec
> 
> 
> ---------------------------------------------------------------------
> Test WITH this patch:
> 
> Phoronix Test Suite v4.6.0
> 
>     Installed: pts/build-linux-kernel-1.3.0
> 
> System Information
> 
> Hardware:
> Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R
> 
> Software:
> OS: Fedora 18, Kernel: 3.10.0-rc3+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080
> 
>     Would you like to save these test results (Y/n): 
> 
> Timed Linux Kernel Compilation 3.1:
>     pts/build-linux-kernel-1.3.0
>     Test 1 of 1
>     Estimated Trial Run Count:    3
>     Estimated Time To Completion: 2 Minutes
>         Running Pre-Test Script @ 22:48:20
>         Started Run 1 @ 22:48:30
>         Running Interim Test Script @ 22:48:44
>         Started Run 2 @ 22:48:47
>         Running Interim Test Script @ 22:48:56
>         Started Run 3 @ 22:49:00
>         Running Interim Test Script @ 22:49:10  [Std. Dev: 4.68%]
>         Started Run 4 @ 22:49:13
>         Running Interim Test Script @ 22:49:23  [Std. Dev: 4.72%]
>         Started Run 5 @ 22:49:26
>         Running Interim Test Script @ 22:49:35  [Std. Dev: 4.25%]
>         Started Run 6 @ 22:49:39  [Std. Dev: 3.98%]
>         Running Post-Test Script @ 22:49:48
> 
>     Test Results:
>         10.205597162247
>         9.2953701019287
>         9.8262219429016
>         9.2547709941864
>         9.4089620113373
>         9.3398430347443
> 
>     Average: 9.56 Seconds
> 
> cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
>          41.50 3.59 3.39   0   9.76   3.10  45.64   0.00   46   46   0.00   0.00   0.00   0.00  27.66  21.69  0.00
>   0   0  35.70 3.66 3.39   0  13.02   3.74  47.55   0.00   46   46   0.00   0.00   0.00   0.00  27.66  21.69  0.00
>   0   4  44.02 3.49 3.39   0   4.69
>   1   1  37.20 3.67 3.39   0  12.29   2.90  47.62   0.00   39
>   1   5  44.49 3.54 3.39   0   4.99
>   2   2  35.62 3.66 3.39   0  20.04   2.53  41.81   0.00   40
>   2   6  52.39 3.55 3.39   0   3.27
>   3   3  37.65 3.67 3.39   0  13.53   3.24  45.58   0.00   40
>   3   7  44.94 3.55 3.39   0   6.25
> 92.544695 sec

OK

The average power drawn by the package is slightly higher with the patchset
applied (27.66 W vs 27.25 W), but since the time needed to complete the
workload with the patchset applied was shorter by about 2.3 sec, the total
energy used was lower with the patchset applied (by about 25.7 J if I'm not
mistaken, or 1% relative).  This means that, unless a power limit falls between
27.25 W and 27.66 W, it's better to use the kernel with the patchset applied
for that particular workload from both the performance and the energy usage
perspective.
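
The arithmetic behind that estimate can be reproduced with a small Python
sketch. It assumes that the Pkg_W value in the turbostat summary row is the
average package power over the whole run and that the last line of each
turbostat report is the run duration; the function name is made up for the
example.

```python
def energy_joules(avg_power_w: float, duration_s: float) -> float:
    """Energy = average power * time (W * s = J)."""
    return avg_power_w * duration_s


# Values copied from the two turbostat summaries quoted above.
without_j = energy_joules(27.25, 94.876210)
with_j = energy_joules(27.66, 92.544695)

saved_j = without_j - with_j                # absolute saving in joules
saved_pct = 100.0 * saved_j / without_j     # saving relative to the baseline

if __name__ == "__main__":
    print(f"without patchset: {without_j:.1f} J")
    print(f"with patchset:    {with_j:.1f} J")
    print(f"saved:            {saved_j:.1f} J ({saved_pct:.2f}%)")
```

With the unrounded durations the saving comes out at roughly 25.6 J, which is
in line with the estimate of about 25.7 J and 1% above.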

Good, hopefully that's going to be confirmed on other systems and/or with other
workloads. :-)

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-08 22:18     ` Rafael J. Wysocki
@ 2013-06-09 16:26       ` Borislav Petkov
  2013-06-09 18:08         ` Stratos Karafotis
  0 siblings, 1 reply; 48+ messages in thread
From: Borislav Petkov @ 2013-06-09 16:26 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stratos Karafotis, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Sun, Jun 09, 2013 at 12:18:09AM +0200, Rafael J. Wysocki wrote:
> The average power drawn by the package is slightly higher with the
> patchset applied (27.66 W vs 27.25 W), but since the time needed to
> complete the workload with the patchset applied was shorter by about
> 2.3 sec, the total energy used was less with the patchset (by about
> 25.7 J if I'm not mistaken, or 1% relative). This means that in the
> absence of a power limit between 27.25 W and 27.66 W it's better to
> use the kernel with the patchset applied for that particular workload
> from the performance and energy usage perspective.
>
> Good, hopefully that's going to be confirmed on other systems and/or
> with other workloads. :-)

Yep, I see similar results on my AMD F15h.

So there's a register which tells you what the current power
consumption in Watts is and support for it is integrated in lm_sensors.
I did one read per second, for the duration of the kernel build (10-r5 +
tip), with and without the patch, and averaged out the results:

without
=======

1. 158 samples, avg Watts: 116.915
2. 158 samples, avg Watts: 116.855
3. 158 samples, avg Watts: 116.737
4. 158 samples, avg Watts: 116.792

=> 116.82475 avg Watts.

with
====

1. 157 samples, avg Watts: 116.496
2. 156 samples, avg Watts: 117.535
3. 156 samples, avg Watts: 118.174
4. 157 samples, avg Watts: 117.95

=> 117.53875 avg Watts.

So there's a slight rise in the average power consumption but the
sample count drops by 1 or 2, which is consistent with the observed
kernel build speedup of 1 or 2 seconds.
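
A rough cross-check of the numbers above: with one reading per second, each
sample stands for about one second at the sampled power, so samples x average
Watts approximates the energy per build in Joules.  The sketch below is
illustrative only; the 1 s interval is taken from the description above and
the names are made up for this example.

```python
# Rough energy estimate from the lm_sensors readings quoted above.
# Assumption: one sample per second, so energy (J) ~= samples * avg Watts.
without = [(158, 116.915), (158, 116.855), (158, 116.737), (158, 116.792)]
with_patch = [(157, 116.496), (156, 117.535), (156, 118.174), (157, 117.95)]

def mean_power(runs):
    """Average of the per-run average power, in Watts."""
    return sum(watts for _, watts in runs) / len(runs)

def mean_energy(runs, interval_s=1.0):
    """Average estimated energy per run, in Joules."""
    return sum(n * interval_s * watts for n, watts in runs) / len(runs)

p_without, p_with = mean_power(without), mean_power(with_patch)
e_without, e_with = mean_energy(without), mean_energy(with_patch)

print(f"avg power:  {p_without:.5f} W without, {p_with:.5f} W with")
print(f"avg energy: {e_without:.1f} J without, {e_with:.1f} J with")
print(f"energy change with patch: {(e_with - e_without) / e_without * 100:+.2f}%")
```

Under that assumption the patched kernel draws slightly more power on average
but ends up using a little less energy per build, because the build finishes
a second or two sooner.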

perf doesn't show any significant difference with and without the patch
but those are single runs only.

without
=======

 Performance counter stats for 'make -j9':

    1167856.647713 task-clock                #    7.272 CPUs utilized
         1,071,177 context-switches          #    0.917 K/sec
            52,844 cpu-migrations            #    0.045 K/sec
        43,600,721 page-faults               #    0.037 M/sec
 4,712,068,048,465 cycles                    #    4.035 GHz
 1,181,730,064,794 stalled-cycles-frontend   #   25.08% frontend cycles idle
   243,576,229,438 stalled-cycles-backend    #    5.17% backend  cycles idle
 2,966,369,010,209 instructions              #    0.63  insns per cycle
                                             #    0.40  stalled cycles per insn
   651,136,706,156 branches                  #  557.548 M/sec
    34,582,447,788 branch-misses             #    5.31% of all branches

     160.599796045 seconds time elapsed

with
====

 Performance counter stats for 'make -j9':

    1169278.095561 task-clock                #    7.271 CPUs utilized
         1,076,528 context-switches          #    0.921 K/sec
            53,284 cpu-migrations            #    0.046 K/sec
        43,598,610 page-faults               #    0.037 M/sec
 4,721,747,687,668 cycles                    #    4.038 GHz
 1,182,301,583,422 stalled-cycles-frontend   #   25.04% frontend cycles idle
   248,675,448,161 stalled-cycles-backend    #    5.27% backend  cycles idle
 2,967,419,684,598 instructions              #    0.63  insns per cycle
                                             #    0.40  stalled cycles per insn
   651,527,448,140 branches                  #  557.205 M/sec
    34,560,656,638 branch-misses             #    5.30% of all branches

     160.811815170 seconds time elapsed
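
For reference, the derived columns in the two summaries above can be
recomputed from the raw counters.  This is a sketch based on my understanding
of how 'perf stat' derives them (task-clock is in msec); the dictionary keys
are made up for illustration.

```python
# Recompute the derived columns of the two 'perf stat' summaries above
# from the raw counters (values copied from the output).
runs = {
    "without": dict(task_clock_ms=1167856.647713, cycles=4_712_068_048_465,
                    instructions=2_966_369_010_209, elapsed_s=160.599796045),
    "with":    dict(task_clock_ms=1169278.095561, cycles=4_721_747_687_668,
                    instructions=2_967_419_684_598, elapsed_s=160.811815170),
}

def derived(r):
    busy_s = r["task_clock_ms"] / 1000.0           # CPU time summed over all CPUs
    return {
        "cpus_utilized": busy_s / r["elapsed_s"],  # task-clock / wall-clock
        "ghz": r["cycles"] / busy_s / 1e9,         # cycles per second of CPU time
        "ipc": r["instructions"] / r["cycles"],    # instructions per cycle
    }

stats = {name: derived(r) for name, r in runs.items()}
for name, d in stats.items():
    print(f"{name:8s} {d['cpus_utilized']:.3f} CPUs  {d['ghz']:.3f} GHz  {d['ipc']:.2f} IPC")

delta = runs["with"]["elapsed_s"] / runs["without"]["elapsed_s"] - 1.0
print(f"elapsed time change with patch: {delta * 100:+.2f}%")
```

The elapsed-time difference between these two single runs is on the order of
0.1%, which is why they don't show anything significant either way.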


-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-09 16:26       ` Borislav Petkov
@ 2013-06-09 18:08         ` Stratos Karafotis
  2013-06-09 20:58           ` Rafael J. Wysocki
  0 siblings, 1 reply; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-09 18:08 UTC (permalink / raw)
  To: Borislav Petkov, Rafael J. Wysocki
  Cc: Viresh Kumar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-pm, cpufreq, linux-kernel

On 06/09/2013 07:26 PM, Borislav Petkov wrote:
> On Sun, Jun 09, 2013 at 12:18:09AM +0200, Rafael J. Wysocki wrote:
>> The average power drawn by the package is slightly higher with the
>> patchset applied (27.66 W vs 27.25 W), but since the time needed to
>> complete the workload with the patchset applied was shorter by about
>> 2.3 sec, the total energy used was less with the patchset (by about
>> 25.7 J if I'm not mistaken, or 1% relative). This means that in the
>> absence of a power limit between 27.25 W and 27.66 W it's better to
>> use the kernel with the patchset applied for that particular workload
>> from the performance and energy usage perspective.
>>
>> Good, hopefully that's going to be confirmed on other systems and/or
>> with other workloads. :-)
> 
> Yep, I see similar results on my AMD F15h.
> 
> So there's a register which tells you what the current power
> consumption in Watts is and support for it is integrated in lm_sensors.
> I did one read per second, for the duration of the kernel build (10-r5 +
> tip), with and without the patch, and averaged out the results:
> 
> without
> =======
> 
> 1. 158 samples, avg Watts: 116.915
> 2. 158 samples, avg Watts: 116.855
> 3. 158 samples, avg Watts: 116.737
> 4. 158 samples, avg Watts: 116.792
> 
> => 116.82475 avg Watts.
> 
> with
> ====
> 
> 1. 157 samples, avg Watts: 116.496
> 2. 156 samples, avg Watts: 117.535
> 3. 156 samples, avg Watts: 118.174
> 4. 157 samples, avg Watts: 117.95
> 
> => 117.53875 avg Watts.
> 
> So there's a slight rise in the average power consumption but the
> sample count drops by 1 or 2, which is consistent with the observed
> kernel build speedup of 1 or 2 seconds.
> 
> perf doesn't show any significant difference with and without the patch
> but those are single runs only.
> 
> without
> =======
> 
>   Performance counter stats for 'make -j9':
> 
>      1167856.647713 task-clock                #    7.272 CPUs utilized
>           1,071,177 context-switches          #    0.917 K/sec
>              52,844 cpu-migrations            #    0.045 K/sec
>          43,600,721 page-faults               #    0.037 M/sec
>   4,712,068,048,465 cycles                    #    4.035 GHz
>   1,181,730,064,794 stalled-cycles-frontend   #   25.08% frontend cycles idle
>     243,576,229,438 stalled-cycles-backend    #    5.17% backend  cycles idle
>   2,966,369,010,209 instructions              #    0.63  insns per cycle
>                                               #    0.40  stalled cycles per insn
>     651,136,706,156 branches                  #  557.548 M/sec
>      34,582,447,788 branch-misses             #    5.31% of all branches
> 
>       160.599796045 seconds time elapsed
> 
> with
> ====
> 
>   Performance counter stats for 'make -j9':
> 
>      1169278.095561 task-clock                #    7.271 CPUs utilized
>           1,076,528 context-switches          #    0.921 K/sec
>              53,284 cpu-migrations            #    0.046 K/sec
>          43,598,610 page-faults               #    0.037 M/sec
>   4,721,747,687,668 cycles                    #    4.038 GHz
>   1,182,301,583,422 stalled-cycles-frontend   #   25.04% frontend cycles idle
>     248,675,448,161 stalled-cycles-backend    #    5.27% backend  cycles idle
>   2,967,419,684,598 instructions              #    0.63  insns per cycle
>                                               #    0.40  stalled cycles per insn
>     651,527,448,140 branches                  #  557.205 M/sec
>      34,560,656,638 branch-misses             #    5.30% of all branches
> 
>       160.811815170 seconds time elapsed

Hi,

Boris, thanks so much for your tests!

Rafael, thanks for your analysis!

I did some additional tests to see how the CPU behaves at its low and high limits.

I used the Phoronix Java SciMark 2.0 test (FFT, Monte Carlo, etc.) to check the patch under
really heavy loads. The results were almost identical with and without this patch.
This is the expected behavior because I believe the load is greater than up_threshold
most of the time in these cases.

With this patch:
Duration: 120.568521 sec
Pkg_W: 20.97

Without this patch
Duration: 120.606813 sec
Pkg_W: 21.11


I also used a small program to check the CPU under very small loads with a duration
comparable to the sampling rate (10000 us in my config).
The program runs a tight 'for' loop with a duration of ~ (2 x sampling_rate).
After this it sleeps for 5000 us.
I repeat the above 100 times and then the program sleeps for 1 sec.
The above procedure repeats 15 times.
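
The load pattern described above can be sketched roughly as follows.  This is
not the actual test program (which is not included in this message), only an
illustration: the function names, the `iterations` parameter and the choice
to time just the busy part are assumptions.  The busy loop does a fixed
amount of work, so its duration depends on the frequency the governor picks.

```python
import time

def busy_work(iterations):
    """Fixed amount of work; how long it takes depends on the CPU frequency."""
    total = 0
    for i in range(iterations):
        total += i
    return total

def run_benchmark(iterations, sleep_us=5000, bursts=100, pause_s=1.0, rounds=15):
    """Bursty load: `bursts` x (busy loop + short sleep), then a pause, `rounds` times.

    Returns the average busy-loop time of each round, in microseconds.
    """
    averages = []
    for _ in range(rounds):
        busy_s = 0.0
        for _ in range(bursts):
            start = time.perf_counter()
            busy_work(iterations)
            busy_s += time.perf_counter() - start
            time.sleep(sleep_us / 1e6)     # short idle gap between bursts
        averages.append(busy_s / bursts * 1e6)
        time.sleep(pause_s)                # long idle gap between rounds
    return averages
```

To mirror the description, `iterations` would have to be calibrated so that
one busy loop takes about 2 x sampling_rate (about 20000 us) on the machine
under test; the other defaults follow the numbers given above.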

Results show that there is a slowdown (~4%) WITH this patch.
However, less energy is used WITH this patch (25.23 J, ~3.3%).

Thanks,
Stratos


WITHOUT patch:
----------------
Starting benchmark
run 0
Avg time: 21907 us
run 1
Avg time: 21792 us
run 2
Avg time: 21827 us
run 3
Avg time: 21831 us
run 4
Avg time: 21828 us
run 5
Avg time: 21838 us
run 6
Avg time: 21819 us
run 7
Avg time: 21836 us
run 8
Avg time: 21761 us
run 9
Avg time: 21586 us
run 10
Avg time: 20366 us
run 11
Avg time: 21732 us
run 12
Avg time: 20225 us
run 13
Avg time: 21818 us
run 14
Avg time: 21812 us
Elapsed time: 55004.660000 msec
cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
          8.34 3.30 3.39   0   8.78   0.48  82.41   0.00   43   43   0.00   0.00   0.00   0.00  13.87   8.15  0.00
  0   0   0.28 3.10 3.39   0   0.95   0.26  98.51   0.00   43   43   0.00   0.00   0.00   0.00  13.87   8.15  0.00
  0   4   0.54 2.97 3.39   0   0.69
  1   1   0.18 2.15 3.39   0  59.11   0.03  40.67   0.00   39
  1   5  58.86 3.26 3.39   0   0.43
  2   2   3.20 3.82 3.39   0   0.28   0.03  96.50   0.00   36
  2   6   0.13 2.40 3.39   0   3.34
  3   3   0.47 3.04 3.39   0   4.01   1.58  93.94   0.00   39
  3   7   3.04 3.73 3.39   0   1.45
55.027201 sec


WITH patch
----------
Starting benchmark
run 0
Avg time: 23198 us
run 1
Avg time: 23100 us
run 2
Avg time: 23068 us
run 3
Avg time: 23101 us
run 4
Avg time: 23075 us
run 5
Avg time: 23173 us
run 6
Avg time: 23151 us
run 7
Avg time: 23123 us
run 8
Avg time: 23112 us
run 9
Avg time: 23157 us
run 10
Avg time: 23107 us
run 11
Avg time: 23146 us
run 12
Avg time: 23067 us
run 13
Avg time: 23189 us
run 14
Avg time: 23053 us
Elapsed time: 57288.522000 msec
cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
          7.69 3.03 3.39   0   7.86   0.56  83.89   0.00   44   44   0.00   0.00   0.00   0.00  12.88   7.17  0.00
  0   0  60.24 3.05 3.39   0   0.34   0.02  39.40   0.00   44   44   0.00   0.00   0.00   0.00  12.88   7.17  0.00
  0   4   0.11 1.84 3.39   0  60.47
  1   1   0.22 2.15 3.39   0   0.61   0.04  99.13   0.00   37
  1   5   0.50 2.53 3.39   0   0.33
  2   2   0.12 2.12 3.39   0   0.29   0.11  99.48   0.00   34
  2   6   0.05 2.26 3.39   0   0.36
  3   3   0.31 2.66 3.39   0   0.08   2.08  97.53   0.00   38
  3   7   0.03 1.96 3.39   0   0.37
57.290084 sec
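
For anyone who wants to redo the arithmetic: energy is taken as average
package power (Pkg_W) times duration, using the figures reported in this
message for both workloads.  This is only a sketch; the durations used for
the small program are the turbostat "sec" lines, and the names are made up
for illustration.

```python
# Energy = average package power (Pkg_W) * duration, using the figures
# reported in this message for both workloads.
runs = {
    "SciMark 2.0":    {"without": (21.11, 120.606813), "with": (20.97, 120.568521)},
    "microbenchmark": {"without": (13.87, 55.027201),  "with": (12.88, 57.290084)},
}

def energy_j(pkg_w, seconds):
    """Energy in Joules for a run at average power pkg_w lasting `seconds`."""
    return pkg_w * seconds

results = {}
for name, r in runs.items():
    e_without = energy_j(*r["without"])
    e_with = energy_j(*r["with"])
    time_change = r["with"][1] / r["without"][1] - 1.0
    energy_change = e_with / e_without - 1.0
    results[name] = (time_change, energy_change, e_without - e_with)
    print(f"{name:15s} time {time_change * 100:+.2f}%  "
          f"energy {energy_change * 100:+.2f}%  saved {e_without - e_with:.1f} J")
```

With these inputs the small program comes out about 4% slower and about 3.3%
cheaper in energy with the patch, while SciMark is within 1% on both counts.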


* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-09 18:08         ` Stratos Karafotis
@ 2013-06-09 20:58           ` Rafael J. Wysocki
  2013-06-09 21:14             ` Borislav Petkov
  2013-06-10 21:57             ` Stratos Karafotis
  0 siblings, 2 replies; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-09 20:58 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Sunday, June 09, 2013 09:08:23 PM Stratos Karafotis wrote:
> On 06/09/2013 07:26 PM, Borislav Petkov wrote:
> > On Sun, Jun 09, 2013 at 12:18:09AM +0200, Rafael J. Wysocki wrote:
> >> The average power drawn by the package is slightly higher with the
> >> patchset applied (27.66 W vs 27.25 W), but since the time needed to
> >> complete the workload with the patchset applied was shorter by about
> >> 2.3 sec, the total energy used was less with the patchset (by about
> >> 25.7 J if I'm not mistaken, or 1% relative). This means that in the
> >> absence of a power limit between 27.25 W and 27.66 W it's better to
> >> use the kernel with the patchset applied for that particular workload
> >> from the performance and energy usage perspective.
> >>
> >> Good, hopefully that's going to be confirmed on other systems and/or
> >> with other workloads. :-)
> > 
> > Yep, I see similar results on my AMD F15h.
> > 
> > So there's a register which tells you what the current power
> > consumption in Watts is and support for it is integrated in lm_sensors.
> > I did one read per second, for the duration of the kernel build (10-r5 +
> > tip), with and without the patch, and averaged out the results:
> > 
> > without
> > =======
> > 
> > 1. 158 samples, avg Watts: 116.915
> > 2. 158 samples, avg Watts: 116.855
> > 3. 158 samples, avg Watts: 116.737
> > 4. 158 samples, avg Watts: 116.792
> > 
> > => 116.82475 avg Watts.
> > 
> > with
> > ====
> > 
> > 1. 157 samples, avg Watts: 116.496
> > 2. 156 samples, avg Watts: 117.535
> > 3. 156 samples, avg Watts: 118.174
> > 4. 157 samples, avg Watts: 117.95
> > 
> > => 117.53875 avg Watts.
> > 
> > So there's a slight rise in the average power consumption but the
> > sample count drops by 1 or 2, which is consistent with the observed
> > kernel build speedup of 1 or 2 seconds.
> > 
> > perf doesn't show any significant difference with and without the patch
> > but those are single runs only.
> > 
> > without
> > =======
> > 
> >   Performance counter stats for 'make -j9':
> > 
> >      1167856.647713 task-clock                #    7.272 CPUs utilized
> >           1,071,177 context-switches          #    0.917 K/sec
> >              52,844 cpu-migrations            #    0.045 K/sec
> >          43,600,721 page-faults               #    0.037 M/sec
> >   4,712,068,048,465 cycles                    #    4.035 GHz
> >   1,181,730,064,794 stalled-cycles-frontend   #   25.08% frontend cycles idle
> >     243,576,229,438 stalled-cycles-backend    #    5.17% backend  cycles idle
> >   2,966,369,010,209 instructions              #    0.63  insns per cycle
> >                                               #    0.40  stalled cycles per insn
> >     651,136,706,156 branches                  #  557.548 M/sec
> >      34,582,447,788 branch-misses             #    5.31% of all branches
> > 
> >       160.599796045 seconds time elapsed
> > 
> > with
> > ====
> > 
> >   Performance counter stats for 'make -j9':
> > 
> >      1169278.095561 task-clock                #    7.271 CPUs utilized
> >           1,076,528 context-switches          #    0.921 K/sec
> >              53,284 cpu-migrations            #    0.046 K/sec
> >          43,598,610 page-faults               #    0.037 M/sec
> >   4,721,747,687,668 cycles                    #    4.038 GHz
> >   1,182,301,583,422 stalled-cycles-frontend   #   25.04% frontend cycles idle
> >     248,675,448,161 stalled-cycles-backend    #    5.27% backend  cycles idle
> >   2,967,419,684,598 instructions              #    0.63  insns per cycle
> >                                               #    0.40  stalled cycles per insn
> >     651,527,448,140 branches                  #  557.205 M/sec
> >      34,560,656,638 branch-misses             #    5.30% of all branches
> > 
> >       160.811815170 seconds time elapsed
> 
> Hi,
> 
> Boris, thanks so much for your tests!
> 
> Rafael, thanks for your analysis!
> 
> I did some additional tests to see how the CPU behaves at its low and high limits.
> 
> I used the Phoronix Java SciMark 2.0 test (FFT, Monte Carlo, etc.) to check the patch under
> really heavy loads. The results were almost identical with and without this patch.
> This is the expected behavior because I believe the load is greater than up_threshold
> most of the time in these cases.
> 
> With this patch:
> Duration: 120.568521 sec
> Pkg_W: 20.97
> 
> Without this patch
> Duration: 120.606813 sec
> Pkg_W: 21.11

The kernel with the patch applied still uses slightly less energy, however.

> I also used a small program to check the CPU under very small loads with a duration
> comparable to the sampling rate (10000 us in my config).
> The program runs a tight 'for' loop with a duration of ~ (2 x sampling_rate).
> After this it sleeps for 5000 us.
> I repeat the above 100 times and then the program sleeps for 1 sec.
> The above procedure repeats 15 times.
> 
> Results show that there is a slowdown (~4%) WITH this patch.
> However, less energy is used WITH this patch (25.23 J, ~3.3%).

Well, this means that your changes may hurt performance if the load comes and
goes in spikes, which is not so good.  The fact that they cause less energy to
be used at the same time kind of balances that, though.  [After all, we're
talking about the ondemand governor which should be used if the user wants to
sacrifice some performance for energy savings.]

It would be interesting to see if the picture changes for different time
intervals in your test program (e.g. loop duration that is not a multiple of
sampling_rate and sleep times different from 5000 us) to rule out any random
coincidences.

Can you possibly prepare a graph showing both the execution time and energy
consumption for several different loop durations in your program (let's keep
the 5000 us sleep for now), including multiples of sampling_rate as well as
some other durations?

Thanks,
Rafael


> WITHOUT patch:
> ----------------
> Starting benchmark
> run 0
> Avg time: 21907 us
> run 1
> Avg time: 21792 us
> run 2
> Avg time: 21827 us
> run 3
> Avg time: 21831 us
> run 4
> Avg time: 21828 us
> run 5
> Avg time: 21838 us
> run 6
> Avg time: 21819 us
> run 7
> Avg time: 21836 us
> run 8
> Avg time: 21761 us
> run 9
> Avg time: 21586 us
> run 10
> Avg time: 20366 us
> run 11
> Avg time: 21732 us
> run 12
> Avg time: 20225 us
> run 13
> Avg time: 21818 us
> run 14
> Avg time: 21812 us
> Elapsed time: 55004.660000 msec
> cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
>           8.34 3.30 3.39   0   8.78   0.48  82.41   0.00   43   43   0.00   0.00   0.00   0.00  13.87   8.15  0.00
>   0   0   0.28 3.10 3.39   0   0.95   0.26  98.51   0.00   43   43   0.00   0.00   0.00   0.00  13.87   8.15  0.00
>   0   4   0.54 2.97 3.39   0   0.69
>   1   1   0.18 2.15 3.39   0  59.11   0.03  40.67   0.00   39
>   1   5  58.86 3.26 3.39   0   0.43
>   2   2   3.20 3.82 3.39   0   0.28   0.03  96.50   0.00   36
>   2   6   0.13 2.40 3.39   0   3.34
>   3   3   0.47 3.04 3.39   0   4.01   1.58  93.94   0.00   39
>   3   7   3.04 3.73 3.39   0   1.45
> 55.027201 sec
> 
> 
> WITH patch
> ----------
> Starting benchmark
> run 0
> Avg time: 23198 us
> run 1
> Avg time: 23100 us
> run 2
> Avg time: 23068 us
> run 3
> Avg time: 23101 us
> run 4
> Avg time: 23075 us
> run 5
> Avg time: 23173 us
> run 6
> Avg time: 23151 us
> run 7
> Avg time: 23123 us
> run 8
> Avg time: 23112 us
> run 9
> Avg time: 23157 us
> run 10
> Avg time: 23107 us
> run 11
> Avg time: 23146 us
> run 12
> Avg time: 23067 us
> run 13
> Avg time: 23189 us
> run 14
> Avg time: 23053 us
> Elapsed time: 57288.522000 msec
> cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
>           7.69 3.03 3.39   0   7.86   0.56  83.89   0.00   44   44   0.00   0.00   0.00   0.00  12.88   7.17  0.00
>   0   0  60.24 3.05 3.39   0   0.34   0.02  39.40   0.00   44   44   0.00   0.00   0.00   0.00  12.88   7.17  0.00
>   0   4   0.11 1.84 3.39   0  60.47
>   1   1   0.22 2.15 3.39   0   0.61   0.04  99.13   0.00   37
>   1   5   0.50 2.53 3.39   0   0.33
>   2   2   0.12 2.12 3.39   0   0.29   0.11  99.48   0.00   34
>   2   6   0.05 2.26 3.39   0   0.36
>   3   3   0.31 2.66 3.39   0   0.08   2.08  97.53   0.00   38
>   3   7   0.03 1.96 3.39   0   0.37
> 57.290084 sec
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-09 20:58           ` Rafael J. Wysocki
@ 2013-06-09 21:14             ` Borislav Petkov
  2013-06-09 22:11               ` Rafael J. Wysocki
  2013-06-10 21:57             ` Stratos Karafotis
  1 sibling, 1 reply; 48+ messages in thread
From: Borislav Petkov @ 2013-06-09 21:14 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stratos Karafotis, Borislav Petkov, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On Sun, Jun 09, 2013 at 10:58:51PM +0200, Rafael J. Wysocki wrote:
> Can you possibly prepare a graph showing both the execution time
> and energy consumption for several different loop durations in your
> program (let's keep the 5000 us sleep for now), including multiples of
> sampling_rate as well as some other durations?

Judging by the time one of the cores spent in C0, this small program
is single-threaded and is a microbenchmark. And you know how optimizing
against a microbenchmark doesn't really make a lot of sense.

I wonder if lmbench or aim9 or whatever would make more sense to try
here...

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--


* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-09 21:14             ` Borislav Petkov
@ 2013-06-09 22:11               ` Rafael J. Wysocki
  2015-02-23 16:42                 ` nitin
  0 siblings, 1 reply; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-09 22:11 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stratos Karafotis, Borislav Petkov, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On Sunday, June 09, 2013 11:14:49 PM Borislav Petkov wrote:
> On Sun, Jun 09, 2013 at 10:58:51PM +0200, Rafael J. Wysocki wrote:
> > Can you possibly prepare a graph showing both the execution time
> > and energy consumption for several different loop durations in your
> > program (let's keep the 5000 us sleep for now), including multiples of
> > sampling_rate as well as some other durations?
> 
> Judging by the time one of the cores spent in C0, this small program
> is single-threaded and is a microbenchmark.

Yes, it is single-threaded, but that can be easily addressed by running
multiple copies of it in parallel. :-)

And yes, it is a microbenchmark, ->

> And you know how optimizing against a microbenchmark doesn't really make
> a lot of sense.

-> but this is more about finding possible issues than about optimizing.

I'm regarding this change as a substantial code simplification in the first
place, both in terms of conceptual complexity and the actual code size, so I'd
like to know what is *likely* to be affected by it (be it a microbenchmark or
whatever).

IOW, try to play devil's advocate and find something that gets worse after
applying these changes.  If we can't find anything like that, there won't be
any reason not to apply them.

> I wonder if lmbench or aim9 or whatever would make more sense to try here...

I think we'll need to try them too.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-09 20:58           ` Rafael J. Wysocki
  2013-06-09 21:14             ` Borislav Petkov
@ 2013-06-10 21:57             ` Stratos Karafotis
  2013-06-10 23:24               ` Rafael J. Wysocki
  1 sibling, 1 reply; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-10 21:57 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On 06/09/2013 11:58 PM, Rafael J. Wysocki wrote:
> Well, this means that your changes may hurt performance if the load comes and
> goes in spikes, which is not so good.  The fact that they cause less energy to
> be used at the same time kind of balances that, though.  [After all, we're
> talking about the ondemand governor which should be used if the user wants to
> sacrifice some performance for energy savings.]
> 
> It would be interesting to see if the picture changes for different time
> intervals in your test program (e.g. loop duration that is not a multiple of
> sampling_rate and sleep times different from 5000 us) to rule out any random
> coincidences.
> 
> Can you possibly prepare a graph showing both the execution time and energy
> consumption for several different loop durations in your program (let's keep
> the 5000 us sleep for now), including multiples of sampling_rate as well as
> some other durations?

Hi,

I tested different loop durations with my program from 1,000us to 1,000,000us.
The logic is almost the same as in the previous test:

1) Run a 'for' loop for a period T (~ 1000-1000000us)
2) sleep for 5000us
3) Repeat steps 1-2, 50 times.
4) sleep for 1s
5) Repeat 1-4, 5 times.

The results:
https://docs.google.com/spreadsheet/ccc?key=0AnMfNYUV1k0ddE13ZUtYdGs2dUVRdG00bVRVT3JScWc&usp=sharing

Sheet1 (ProcessX1) includes the results from the test program running
as a single copy. The second one (ProcessX4) includes the results from
running 4 copies of the test program in parallel (using a bash script that
waits for the end of execution).
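
The "start several copies and wait for all of them" step can be sketched as
below.  The runs in the spreadsheet used a bash script; this is only an
equivalent illustration, with a short sleep standing in for the test program
(its command line is not given here) and a made-up helper name.

```python
import subprocess
import sys
import time

def run_copies(cmd, copies=4):
    """Start `copies` instances of cmd at once and wait for all of them.

    Returns (wall-clock seconds, list of exit codes).
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    codes = [p.wait() for p in procs]
    return time.perf_counter() - start, codes

# Stand-in workload: each copy just sleeps briefly. Replace the command
# with the real test program when reproducing the measurement.
elapsed, codes = run_copies([sys.executable, "-c", "import time; time.sleep(0.1)"])
print(f"4 copies finished in {elapsed:.3f} s, exit codes {codes}")
```

The wall-clock time returned is the total execution time of the slowest copy,
which is the quantity that matters when comparing with and without the patch.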

Graphs show the difference(%) in total execution time and total energy without
and with the patch.
Negative values mean that the test *with* the patch had better performance or
used less energy.

The test shows that for loop durations below the sampling rate (10000us in my
config), ondemand with this patch behaves better (both in performance and
consumption).  However, in this test, for loads with
10000us < duration <= 200000us ondemand behaves better without the patch.

Thanks,
Stratos


* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-10 21:57             ` Stratos Karafotis
@ 2013-06-10 23:24               ` Rafael J. Wysocki
  2013-06-13 21:22                 ` Stratos Karafotis
  0 siblings, 1 reply; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-10 23:24 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Tuesday, June 11, 2013 12:57:26 AM Stratos Karafotis wrote:
> On 06/09/2013 11:58 PM, Rafael J. Wysocki wrote:
> > Well, this means that your changes may hurt performance if the load comes and
> > goes in spikes, which is not so good.  The fact that they cause less energy to
> > be used at the same time kind of balances that, though.  [After all, we're
> > talking about the ondemand governor which should be used if the user wants to
> > sacrifice some performance for energy savings.]
> > 
> > It would be interesting to see if the picture changes for different time
> > intervals in your test program (e.g. loop duration that is not a multiple of
> > sampling_rate and sleep times different from 5000 us) to rule out any random
> > coincidences.
> > 
> > Can you possibly prepare a graph showing both the execution time and energy
> > consumption for several different loop durations in your program (let's keep
> > the 5000 us sleep for now), including multiples of sampling_rate as well as
> > some other durations?
> 
> Hi,
> 
> I tested different loop durations with my program from 1,000us to 1,000,000us.
> The logic is almost the same as in the previous test:
> 
> 1) Run a 'for' loop for a period T (~ 1000-1000000us)
> 2) sleep for 5000us
> 3) Repeat steps 1-2, 50 times.
> 4) sleep for 1s
> 5) Repeat 1-4, 5 times.
> 
> The results:
> https://docs.google.com/spreadsheet/ccc?key=0AnMfNYUV1k0ddE13ZUtYdGs2dUVRdG00bVRVT3JScWc&usp=sharing
> 
> Sheet1 (ProcessX1) includes the results from the test program running
> as a single copy. The second one (ProcessX4) includes the results from
> running 4 copies of the test program in parallel (using a bash script that
> waits for the end of execution).
> 
> Graphs show the difference(%) in total execution time and total energy without
> and with the patch.
> Negative values mean that the test *with* the patch had better performance or
> used less energy.
> 
> The test shows that for loop durations below the sampling rate (10000us in my
> config), ondemand with this patch behaves better (both in performance and
> consumption).  However, in this test, for loads with
> 10000us < duration <= 200000us ondemand behaves better without the patch.

Thanks for these results!

Well, I'd say that this doesn't look rosy any more, so the jury is still out.

We need more testing with different workloads and on different hardware.  I'll
try to arrange something to that end.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.


* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-10 23:24               ` Rafael J. Wysocki
@ 2013-06-13 21:22                 ` Stratos Karafotis
  2013-06-13 21:40                   ` Borislav Petkov
  0 siblings, 1 reply; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-13 21:22 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

Hi Rafael,

On 06/11/2013 02:24 AM, Rafael J. Wysocki wrote:
> On Tuesday, June 11, 2013 12:57:26 AM Stratos Karafotis wrote:
>> On 06/09/2013 11:58 PM, Rafael J. Wysocki wrote:
>>> Well, this means that your changes may hurt performance if the load comes and
>>> goes in spikes, which is not so good.  The fact that they cause less energy to
>>> be used at the same time kind of balances that, though.  [After all, we're
>>> talking about the ondemand governor which should be used if the user wants to
>>> sacrifice some performance for energy savings.]
>>>
>>> It would be interesting to see if the picture changes for different time
>>> intervals in your test program (e.g. loop duration that is not a multiple of
>>> sampling_rate and sleep times different from 5000 us) to rule out any random
>>> coincidences.
>>>
>>> Can you possibly prepare a graph showing both the execution time and energy
>>> consumption for several different loop durations in your program (let's keep
>>> the 5000 us sleep for now), including multiples of sampling_rate as well as
>>> some other durations?
>>
>> Hi,
>>
>> I tested different loop durations with my program from 1,000us to 1,000,000us.
>> The logic is almost the same as in the previous test:
>>
>> 1) Run a 'for' loop for a period T (~ 1000-1000000us)
>> 2) sleep for 5000us
>> 3) Repeat steps 1-2, 50 times.
>> 4) sleep for 1s
>> 5) Repeat 1-4, 5 times.
>>
>> The results:
>> https://docs.google.com/spreadsheet/ccc?key=0AnMfNYUV1k0ddE13ZUtYdGs2dUVRdG00bVRVT3JScWc&usp=sharing
>>
>> Sheet1 (ProcessX1) includes the results from the test program running
>> as a single copy. The second one (ProcessX4) includes the results from
>> running 4 copies of the test program in parallel (using a bash script that
>> waits for the end of execution).
>>
>> Graphs show the difference(%) in total execution time and total energy without
>> and with the patch.
>> Negative values mean that the test *with* the patch had better performance or
>> used less energy.
>>
>> The test shows that for loop durations below the sampling rate (10000us in
>> my config), ondemand with this patch behaves better (both in performance and
>> energy consumption).
>> However, in this test, for loads with 10000us < duration <= 200000us ondemand
>> behaves better without the patch.
> 
> Thanks for these results!
> 
> Well, I'd say that this doesn't look rosy any more, so the jury is still out.
> 
> We need more testing with different workloads and on different hardware.  I'll
> try to arrange something to that end.

Please let me share some more test results, using the aim9 benchmark suite:
https://docs.google.com/spreadsheet/ccc?key=0AnMfNYUV1k0ddDdGdlJyUHpqT2xGY1lBOEt2UEVnNlE&usp=sharing

Each test ran for 10 seconds.
Total execution time with and without the patch was almost identical, which is
expected since the tests in aim9 run for a fixed period.
Energy consumption during the test run increased by 0.43% with the patch.
Performance increased by 1.25% (on average) with the patch.

Thanks,
Stratos
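
For readers who want to reproduce the load pattern quoted above (steps 1-5),
here is a minimal sketch. It is not the original test program, which was not
posted in this thread; the function names and the deadline-based busy loop are
assumptions made for illustration, and Python is used only for brevity.

```python
import time


def busy_wait(duration_us):
    """Spin on the CPU for roughly duration_us microseconds (step 1)."""
    deadline = time.perf_counter() + duration_us / 1e6
    iterations = 0
    while time.perf_counter() < deadline:
        iterations += 1
    return iterations


def run_load_pattern(busy_us, sleep_us=5000, inner_repeats=50,
                     outer_sleep_s=1.0, outer_repeats=5):
    """Run the busy/sleep pattern; return elapsed wall time in seconds."""
    start = time.perf_counter()
    for _ in range(outer_repeats):          # step 5: repeat steps 1-4
        for _ in range(inner_repeats):      # step 3: repeat steps 1-2
            busy_wait(busy_us)              # step 1: busy loop for period T
            time.sleep(sleep_us / 1e6)      # step 2: sleep (5000us default)
        time.sleep(outer_sleep_s)           # step 4: sleep (1s default)
    return time.perf_counter() - start
```

Calling run_load_pattern(T) for several values of T between 1000 and 1000000
(in microseconds) would cover the range of loop durations mentioned above.
Energy would still have to be measured externally, e.g. with turbostat.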

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-13 21:22                 ` Stratos Karafotis
@ 2013-06-13 21:40                   ` Borislav Petkov
  2013-06-13 22:04                     ` Stratos Karafotis
  2013-06-13 22:15                     ` Rafael J. Wysocki
  0 siblings, 2 replies; 48+ messages in thread
From: Borislav Petkov @ 2013-06-13 21:40 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Rafael J. Wysocki, Borislav Petkov, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On Fri, Jun 14, 2013 at 12:22:18AM +0300, Stratos Karafotis wrote:
> Please let me share some more test results, using the aim9 benchmark suite:
> https://docs.google.com/spreadsheet/ccc?key=0AnMfNYUV1k0ddDdGdlJyUHpqT2xGY1lBOEt2UEVnNlE&usp=sharing
>
> Each test ran for 10 seconds.
> Total execution time with and without the patch was almost identical, which is
> expected since the tests in aim9 run for a fixed period.
> Energy consumption during the test run increased by 0.43% with the patch.
> Performance increased by 1.25% (on average) with the patch.

Not bad. However, exec_test and fork_test are kinda unexpected with such
a high improvement percentage. Happen to have an explanation?

FWIW, if we don't find any serious perf/power regressions with
this patch, I'd say it is worth applying even solely for the code
simplification it brings.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-13 21:40                   ` Borislav Petkov
@ 2013-06-13 22:04                     ` Stratos Karafotis
  2013-06-13 22:38                       ` Borislav Petkov
  2013-06-13 22:15                     ` Rafael J. Wysocki
  1 sibling, 1 reply; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-13 22:04 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rafael J. Wysocki, Borislav Petkov, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On 06/14/2013 12:40 AM, Borislav Petkov wrote:
> On Fri, Jun 14, 2013 at 12:22:18AM +0300, Stratos Karafotis wrote:
>> Please let me share some more test results, using the aim9 benchmark suite:
>> https://docs.google.com/spreadsheet/ccc?key=0AnMfNYUV1k0ddDdGdlJyUHpqT2xGY1lBOEt2UEVnNlE&usp=sharing
>>
>> Each test ran for 10 seconds.
>> Total execution time with and without the patch was almost identical, which is
>> expected since the tests in aim9 run for a fixed period.
>> Energy consumption during the test run increased by 0.43% with the patch.
>> Performance increased by 1.25% (on average) with the patch.
> 
> Not bad. However, exec_test and fork_test are kinda unexpected with such
> a high improvement percentage. Happen to have an explanation?
> 
> FWIW, if we don't find any serious perf/power regressions with
> this patch, I'd say it is worth applying even solely for the code
> simplification it brings.
> 

Although I'm not sure what causes the unexpected improvement, I can confirm it
(I ran the test again). There is also a significant improvement in
Directory searches (+5.79%), Disk Copies (+1.19%), shell scripts
(1.20%, 1.51%, 2.38%) and tcp/udp tests (3.62%, 1.41%).

I believe that ondemand performs better with this patch under
medium loads. Maybe these operations produce small to medium loads (lower
than up_threshold) and push the CPU to medium frequencies. Without the
patch, the CPU stays longer at the min frequency.

Thanks,
Stratos


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-13 21:40                   ` Borislav Petkov
  2013-06-13 22:04                     ` Stratos Karafotis
@ 2013-06-13 22:15                     ` Rafael J. Wysocki
  2013-06-13 22:37                         ` Borislav Petkov
  1 sibling, 1 reply; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-13 22:15 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stratos Karafotis, Borislav Petkov, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On Thursday, June 13, 2013 11:40:08 PM Borislav Petkov wrote:
> On Fri, Jun 14, 2013 at 12:22:18AM +0300, Stratos Karafotis wrote:
> > Please let me share some more test results, using the aim9 benchmark suite:
> > https://docs.google.com/spreadsheet/ccc?key=0AnMfNYUV1k0ddDdGdlJyUHpqT2xGY1lBOEt2UEVnNlE&usp=sharing
> >
> > Each test ran for 10 seconds.
> > Total execution time with and without the patch was almost identical, which is
> > expected since the tests in aim9 run for a fixed period.
> > Energy consumption during the test run increased by 0.43% with the patch.
> > Performance increased by 1.25% (on average) with the patch.
> 
> Not bad. However, exec_test and fork_test are kinda unexpected with such
> a high improvement percentage. Happen to have an explanation?
> 
> FWIW, if we don't find any serious perf/power regressions with
> this patch, I'd say it is worth applying even solely for the code
> simplification it brings.

May I take this as an ACK? ;-)

Well, that's my opinion too, actually.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-13 22:15                     ` Rafael J. Wysocki
@ 2013-06-13 22:37                         ` Borislav Petkov
  0 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2013-06-13 22:37 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stratos Karafotis, Borislav Petkov, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On Fri, Jun 14, 2013 at 12:15:36AM +0200, Rafael J. Wysocki wrote:
> On Thursday, June 13, 2013 11:40:08 PM Borislav Petkov wrote:

[ … ]

> > Not bad. However, exec_test and fork_test are kinda unexpected with such
> > a high improvement percentage. Happen to have an explanation?
> > 
> > FWIW, if we don't find any serious perf/power regressions with
> > this patch, I'd say it is worth applying even solely for the code
> > simplification it brings.
> 
> May I take this as an ACK? ;-)
> 
> Well, that's my opinion too, actually.

I know - you told me and I like that aspect :-). And from the test
results so far, the code simplification is maybe the most persuasive
one. The slight improvements in perf/power are then the cherry on top.

Although, I'm not sure we're exhaustive with the benchmarks and we
should maybe run a couple more. Although, judging by the results,
generally no serious outliers should be expected (except exec_test and
fork_test funsies above), which are actually positive outliers.

Judging by the code change, the only worry we should have, AFAIU, is
any rise in power consumption due to spending longer periods in the
intermediary P-states now and not going straight to the lowest P-state.
But this is compensated for by the improvement in runtime of the workloads.

Hmm, I dunno - I'm just thinking out loud here...

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--
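
As a rough way to weigh the two aim9 numbers quoted in this thread against
each other: since each aim9 test runs for a fixed period, the work done per
unit of energy scales as the performance ratio divided by the energy ratio.
The sketch below assumes the reported figures (+1.25% performance on average,
+0.43% energy) can be treated as if they applied to a single run, which is a
simplification; the function name is hypothetical.

```python
def efficiency_change_pct(perf_change_pct, energy_change_pct):
    """Percent change in work done per unit of energy.

    Assumes a fixed-duration test, so 'performance' is work completed
    during the run and 'energy' is the energy consumed over the same run.
    """
    perf_ratio = 1.0 + perf_change_pct / 100.0
    energy_ratio = 1.0 + energy_change_pct / 100.0
    return (perf_ratio / energy_ratio - 1.0) * 100.0


# Figures reported for aim9 with the patch applied.
change = efficiency_change_pct(1.25, 0.43)
print(f"work per unit energy: {change:+.2f}%")  # prints +0.82%
```

Under those assumptions the patch yields slightly more work per unit of
energy (a bit under 1%), so the small rise in energy does not by itself
indicate a regression in efficiency.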

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-13 22:04                     ` Stratos Karafotis
@ 2013-06-13 22:38                       ` Borislav Petkov
  0 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2013-06-13 22:38 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Rafael J. Wysocki, Borislav Petkov, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On Fri, Jun 14, 2013 at 01:04:25AM +0300, Stratos Karafotis wrote:
> I believe that ondemand performs better with this patch under
> medium loads. Maybe these operations produce small to medium loads
> (lower than up_threshold) and push the CPU to medium frequencies.
> Without the patch, the CPU stays longer at the min frequency.

Yep, this is my impression too.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-14 12:46                           ` Rafael J. Wysocki
  (?)
@ 2013-06-14 12:44                           ` Borislav Petkov
  2013-06-14 12:55                             ` Rafael J. Wysocki
  -1 siblings, 1 reply; 48+ messages in thread
From: Borislav Petkov @ 2013-06-14 12:44 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Stratos Karafotis, Borislav Petkov, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On Fri, Jun 14, 2013 at 02:46:38PM +0200, Rafael J. Wysocki wrote:
> OK, so here's a deal. After 3.10-rc1 goes out, I'll put this into
> linux-next

Yeah, you mean 3.11-rc1 here...

> for 3.12, so that people have a few more weeks to complain. If they
> don't, it'll go into 3.12.

but yep, sounds like a deal.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-13 22:37                         ` Borislav Petkov
@ 2013-06-14 12:46                           ` Rafael J. Wysocki
  -1 siblings, 0 replies; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-14 12:46 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stratos Karafotis, Borislav Petkov, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On Friday, June 14, 2013 12:37:41 AM Borislav Petkov wrote:
> On Fri, Jun 14, 2013 at 12:15:36AM +0200, Rafael J. Wysocki wrote:
> > On Thursday, June 13, 2013 11:40:08 PM Borislav Petkov wrote:
> 
> [ … ]
> 
> > > Not bad. However, exec_test and fork_test are kinda unexpected with such
> > > a high improvement percentage. Happen to have an explanation?
> > > 
> > > FWIW, if we don't find any serious perf/power regressions with
> > > this patch, I'd say it is worth applying even solely for the code
> > > simplification it brings.
> > 
> > May I take this as an ACK? ;-)
> > 
> > Well, that's my opinion too, actually.
> 
> I know - you told me and I like that aspect :-). And from the test
> results so far, the code simplification is maybe the most persuasive
> one. The slight improvements in perf/power are then the cherry on top.
> 
> Although, I'm not sure we're exhaustive with the benchmarks and we
> should maybe run a couple more. Although, judging by the results,
> generally no serious outliers should be expected (except exec_test and
> fork_test funsies above), which are actually positive outliers.
> 
> Judging by the code change, the only worry we should have, AFAIU, is
> any rise in power consumption due to spending longer periods in the
> intermediary P-states now and not going straight to the lowest P-state.
> But this is compensated for by the improvement in runtime of the workloads.
> 
> Hmm, I dunno - I'm just thinking out loud here...

OK, so here's a deal.  After 3.10-rc1 goes out, I'll put this into linux-next
for 3.12, so that people have a few more weeks to complain.  If they don't,
it'll go into 3.12.

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-14 12:44                           ` Borislav Petkov
@ 2013-06-14 12:55                             ` Rafael J. Wysocki
  2013-06-14 15:53                               ` Stratos Karafotis
  0 siblings, 1 reply; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-14 12:55 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stratos Karafotis, Borislav Petkov, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On Friday, June 14, 2013 02:44:01 PM Borislav Petkov wrote:
> On Fri, Jun 14, 2013 at 02:46:38PM +0200, Rafael J. Wysocki wrote:
> > OK, so here's a deal. After 3.10-rc1 goes out, I'll put this into
> > linux-next
> 
> Yeah, you mean 3.11-rc1 here...

Sure, sorry for the confusion.

> > for 3.12, so that people have a few more weeks to complain. If they
> > don't, it'll go into 3.12.
> 
> but yep, sounds like a deal.

Cool, thanks!


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-14 12:55                             ` Rafael J. Wysocki
@ 2013-06-14 15:53                               ` Stratos Karafotis
  0 siblings, 0 replies; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-14 15:53 UTC (permalink / raw)
  To: Rafael J. Wysocki, Borislav Petkov, Borislav Petkov, Viresh Kumar
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

Hi,

On 06/14/2013 03:55 PM, Rafael J. Wysocki wrote:
> On Friday, June 14, 2013 02:44:01 PM Borislav Petkov wrote:
>> On Fri, Jun 14, 2013 at 02:46:38PM +0200, Rafael J. Wysocki wrote:
>>> OK, so here's a deal. After 3.10-rc1 goes out, I'll put this into
>>> linux-next
>>
>> Yeah, you mean 3.11-rc1 here...
> 
> Sure, sorry for the confusion.
> 
>>> for 3.12, so that people have a few more weeks to complain. If they
>>> don't, it'll go into 3.12.
>>
>> but yep, sounds like a deal.
> 
> Cool, thanks!


Great news! :-)

Thank you all, for your help and for your valuable time!

Regards,
Stratos


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-09 22:11               ` Rafael J. Wysocki
@ 2015-02-23 16:42                 ` nitin
  0 siblings, 0 replies; 48+ messages in thread
From: nitin @ 2015-02-23 16:42 UTC (permalink / raw)
  To: linux-kernel

Rafael J. Wysocki <rjw <at> sisk.pl> writes:

> 
> On Sunday, June 09, 2013 11:14:49 PM Borislav Petkov wrote:
> > On Sun, Jun 09, 2013 at 10:58:51PM +0200, Rafael J. Wysocki wrote:
> > > Can you possibly prepare a graph showing both the execution time
> > > and energy consumption for several different loop durations in your
> > > program (let's keep the 5000 us sleep for now), including multiples of
> > > sampling_rate as well as some other durations?
> >
> > Judging by the times in C0 one of the cores spent, this small program
> > is single-threaded and is a microbenchmark.
>
> Yes, it is single-threaded, but that can be easily addressed by running
> multiple copies of it in parallel.
>
> And yes, it is a microbenchmark, ->
>
> > And you know how optimizing against a microbenchmark doesn't really make
> > a lot of sense.
>
> -> but this is more about finding possible issues than about optimizing.
>
> I'm regarding this change as a substantial code simplification in the first
> place, both in terms of conceptual complexity and the actual code size, so I'd
> like to know what is *likely* to be affected by it (be it a microbenchmark or
> whatever).
>
> IOW, try to play a devil's advocate and find something that gets worse after
> applying these changes.  If we can't find anything like that, there won't be
> any reason not to apply them.
>
> > I wonder if lmbench or aim9 or whatever would make more sense to try here...
>
> I think we'll need to try them too.
>
> Thanks,
> Rafael
>

Hi, I am working on integrating the cpufreq interactive governor with the
scheduler. We would like to verify our results using the aim9 benchmark on
Linux v3.10.28 with Android v4.4.4. Is there a patch available for
porting the aim9 benchmark tools to Android?
Right now I am unable to compile it using the gcc provided with the Android
NDK toolchain (arm-linux-androideabi-4.8).

Thanks,
Nitin



^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-08  9:56           ` Stratos Karafotis
@ 2013-06-08 11:18             ` Rafael J. Wysocki
  0 siblings, 0 replies; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-08 11:18 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Saturday, June 08, 2013 12:56:00 PM Stratos Karafotis wrote:
> On 06/07/2013 11:57 PM, Rafael J. Wysocki wrote:
> > On Friday, June 07, 2013 10:14:34 PM Stratos Karafotis wrote:
> >> On 06/05/2013 11:35 PM, Rafael J. Wysocki wrote:
> >>> On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
> >>>> Hi Borislav,
> >>>>
> >>>> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> >>>>> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> >>>>>> Ondemand calculates load in terms of frequency and increases it only
> >>>>>> if the load_freq is greater than up_threshold multiplied by current
> >>>>>> or average frequency. This seems to produce oscillations of frequency
> >>>>>> between min and max because, for example, a relatively small load can
> >>>>>> easily saturate minimum frequency and lead the CPU to max. Then, the
> >>>>>> CPU will decrease back to min due to a small load_freq.
> >>>>>
> >>>>> Right, and I think this is how we want it, no?
> >>>>>
> >>>>> The thing is, the faster you finish your work, the faster you can become
> >>>>> idle and save power.
> >>>>
> >>>> This is exactly the goal of this patch. To use more efficiently middle
> >>>> frequencies to finish faster the work.
> >>>>
> >>>>> If you switch frequencies in a staircase-like manner, you're going to
> >>>>> take longer to finish, in certain cases, and burn more power while doing
> >>>>> so.
> >>>>
> >>>> This is not true with this patch. It switches to middle frequencies
> >>>> when the load < up_threshold.
> >>>> Now, ondemand does not increase freq. CPU runs in lowest freq till the
> >>>> load is greater than up_threshold.
> >>>>
> >>>>> Btw, racing to idle is also a good example for why you want boosting:
> >>>>> you want to go max out the core but stay within power limits so that you
> >>>>> can finish sooner.
> >>>>>
> >>>>>> This patch changes the calculation method of load and target frequency
> >>>>>> considering 2 points:
> >>>>>> - Load computation should be independent from current or average
> >>>>>> measured frequency. For example an absolute load 80% at 100MHz is not
> >>>>>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
> >>>>>> - Target frequency should be increased to any value of frequency table
> >>>>>> proportional to absolute load, instead to only the max. Thus:
> >>>>>>
> >>>>>> Target frequency = C * load
> >>>>>>
> >>>>>> where C = policy->cpuinfo.max_freq / 100
> >>>>>>
> >>>>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
> >>>>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> >>>>>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
> >>>>>> that middle frequencies are used more, with this patch. Highest
> >>>>>> and lowest frequencies were used less by ~9%
> >>>
> >>> Can you also use powertop to measure the percentage of time spent in idle
> >>> states for the same workload with and without your patchset?  Also, it would
> >>> be good to measure the total energy consumption somehow ...
> >>>
> >>> Thanks,
> >>> Rafael
> >>
> >> Hi Rafael,
> >>
> >> I repeated the tests extracting also powertop results.
> >> Measurement steps with and without this patch:
> >> 1) Reboot system
> >> 2) Running twice Phoronix benchmark of Linux Kernel Compilation 3.1 test
> >>     without taking measurement
> >> 3) Wait few minutes
> >> 4) Run Phoronix and powertop for 100secs and take measurement.
> > 
> > Well, while this is not conclusive, it definitely looks very promising. :-)
> > 
> > We're seeing measurable performance improvement with the patchset applied *and*
> > more time spent in idle states both at the same time.  I'd be very surprised if
> > the energy consumption measurements did not confirm that the patchset allowed
> > us to reduce it.
> > 
> > If my computations are correct (somebody please check), the cores spent about
> > 20% more time in idle on the average with the patchset applied and in addition
> > to that the cc6 residency was greater by about 2% on the average with respect
> > to the kernel without the patchset.
> > 
> > We need to verify if there are gains (or at least no regressions) with other
> > workloads, but since this *also* reduces code complexity quite a bit, I'm
> > seriously considering taking it.
> > 
> >> I will try to repeat the test and take measurements with turbostat as
> >> Borislav suggested.
> > 
> > Please do!
> > 
> > Thanks,
> > Rafael
> > 
> 
> Hi,
> 
> I repeated the tests, extracting results from turbostat.
> Measurement steps with and without this patch:
> 1) Reboot system
> 2) Run the Phoronix benchmark of Linux Kernel Compilation 3.1 test twice
>    without taking measurements
> 3) Wait a few minutes
> 4) Run Phoronix and turbostat (-i 100) and take measurements

You need to do something like

# ./turbostat <command invoking the phoronix suite>

Did you do that?

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
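
The target-frequency rule from the patch description quoted above (Target
frequency = C * load, with C = policy->cpuinfo.max_freq / 100) can be
illustrated with a small sketch. This is not the kernel code: the frequency
table, the up_threshold default of 80, jumping to max above up_threshold, and
rounding up to the next table entry are all assumptions made for illustration.

```python
import bisect


def target_frequency(load_pct, freq_table_khz, up_threshold=80):
    """Pick a frequency (kHz) proportional to the absolute load (0-100).

    The highest table entry stands in for policy->cpuinfo.max_freq.
    """
    table = sorted(freq_table_khz)
    max_freq = table[-1]
    if load_pct > up_threshold:
        # Assumed: loads above up_threshold still go straight to max.
        return max_freq
    # Target frequency = C * load, where C = max_freq / 100.
    target = max_freq * load_pct // 100
    # Assumed: choose the lowest table frequency at or above the target.
    idx = bisect.bisect_left(table, target)
    return table[min(idx, len(table) - 1)]


# Hypothetical frequency table, in kHz.
TABLE = [1600000, 2000000, 2400000, 2800000, 3400000]

for load in (10, 50, 80, 90):
    print(load, target_frequency(load, TABLE))
```

With this hypothetical table, a 50% load maps to a target of 1700000 kHz and
therefore to the 2000000 kHz entry, i.e. a middle frequency rather than min
or max.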

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-07 20:57         ` Rafael J. Wysocki
@ 2013-06-08  9:56           ` Stratos Karafotis
  2013-06-08 11:18             ` Rafael J. Wysocki
  0 siblings, 1 reply; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-08  9:56 UTC (permalink / raw)
  To: Rafael J. Wysocki, Borislav Petkov
  Cc: Viresh Kumar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-pm, cpufreq, linux-kernel

On 06/07/2013 11:57 PM, Rafael J. Wysocki wrote:
> On Friday, June 07, 2013 10:14:34 PM Stratos Karafotis wrote:
>> On 06/05/2013 11:35 PM, Rafael J. Wysocki wrote:
>>> On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
>>>> Hi Borislav,
>>>>
>>>> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
>>>>> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
>>>>>> Ondemand calculates load in terms of frequency and increases it only
>>>>>> if the load_freq is greater than up_threshold multiplied by current
>>>>>> or average frequency. This seems to produce oscillations of frequency
>>>>>> between min and max because, for example, a relatively small load can
>>>>>> easily saturate minimum frequency and lead the CPU to max. Then, the
>>>>>> CPU will decrease back to min due to a small load_freq.
>>>>>
>>>>> Right, and I think this is how we want it, no?
>>>>>
>>>>> The thing is, the faster you finish your work, the faster you can become
>>>>> idle and save power.
>>>>
>>>> This is exactly the goal of this patch. To use more efficiently middle
>>>> frequencies to finish faster the work.
>>>>
>>>>> If you switch frequencies in a staircase-like manner, you're going to
>>>>> take longer to finish, in certain cases, and burn more power while doing
>>>>> so.
>>>>
>>>> This is not true with this patch. It switches to middle frequencies
>>>> when the load < up_threshold.
>>>> Now, ondemand does not increase freq. CPU runs in lowest freq till the
>>>> load is greater than up_threshold.
>>>>
>>>>> Btw, racing to idle is also a good example for why you want boosting:
>>>>> you want to go max out the core but stay within power limits so that you
>>>>> can finish sooner.
>>>>>
>>>>>> This patch changes the calculation method of load and target frequency
>>>>>> considering 2 points:
>>>>>> - Load computation should be independent from current or average
>>>>>> measured frequency. For example an absolute load 80% at 100MHz is not
>>>>>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
>>>>>> - Target frequency should be increased to any value of frequency table
>>>>>> proportional to absolute load, instead to only the max. Thus:
>>>>>>
>>>>>> Target frequency = C * load
>>>>>>
>>>>>> where C = policy->cpuinfo.max_freq / 100
>>>>>>
>>>>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
>>>>>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>>>>>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
>>>>>> that middle frequencies are used more, with this patch. Highest
>>>>>> and lowest frequencies were used less by ~9%
>>>
>>> Can you also use powertop to measure the percentage of time spent in idle
>>> states for the same workload with and without your patchset?  Also, it would
>>> be good to measure the total energy consumption somehow ...
>>>
>>> Thanks,
>>> Rafael
>>
>> Hi Rafael,
>>
>> I repeated the tests extracting also powertop results.
>> Measurement steps with and without this patch:
>> 1) Reboot system
>> 2) Running twice Phoronix benchmark of Linux Kernel Compilation 3.1 test
>>     without taking measurement
>> 3) Wait few minutes
>> 4) Run Phoronix and powertop for 100secs and take measurement.
> 
> Well, while this is not conclusive, it definitely looks very promising. :-)
> 
> We're seeing measurable performance improvement with the patchset applied *and*
> more time spent in idle states both at the same time.  I'd be very surprised if
> the energy consumption measurements did not confirm that the patchset allowed
> us to reduce it.
> 
> If my computations are correct (somebody please check), the cores spent about
> 20% more time in idle on the average with the patchset applied and in addition
> to that the cc6 residency was greater by about 2% on the average with respect
> to the kernel without the patchset.
> 
> We need to verify if there are gains (or at least no regressions) with other
> workloads, but since this *also* reduces code complexity quite a bit, I'm
> seriously considering taking it.
> 
>> I will try to repeat the test and take measurements with turbostat as
>> Borislav suggested.
> 
> Please do!
> 
> Thanks,
> Rafael
> 

Hi,

I repeated the tests, extracting results from turbostat.
Measurement steps with and without this patch:
1) Reboot system
2) Run the Phoronix benchmark of Linux Kernel Compilation 3.1 test twice
   without taking measurements
3) Wait a few minutes
4) Run Phoronix and turbostat (-i 100) and take measurements


Thanks,
Stratos

------------------------------------------------------------------
Test WITHOUT this patch:

Phoronix Test Suite v4.6.0

    Installed: pts/build-linux-kernel-1.3.0

System Information

Hardware:
Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R

Software:
OS: Fedora 18, Kernel: 3.10.0-rc3v+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080

    Would you like to save these test results (Y/n): n


Timed Linux Kernel Compilation 3.1:
    pts/build-linux-kernel-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 2 Minutes
        Running Pre-Test Script @ 12:38:35
        Started Run 1 @ 12:38:46
        Running Interim Test Script @ 12:38:59
        Started Run 2 @ 12:39:03
        Running Interim Test Script @ 12:39:14
        Started Run 3 @ 12:39:18
        Running Interim Test Script @ 12:39:27  [Std. Dev: 8.57%]
        Started Run 4 @ 12:39:31
        Running Interim Test Script @ 12:39:41  [Std. Dev: 8.56%]
        Started Run 5 @ 12:39:44
        Running Interim Test Script @ 12:39:54  [Std. Dev: 8.05%]
        Started Run 6 @ 12:39:58  [Std. Dev: 7.57%]
        Running Post-Test Script @ 12:40:07

    Test Results:
        10.280334949493
        11.148964166641
        9.3881862163544
        9.3307340145111
        9.3948450088501
        9.3976459503174

    Average: 9.82 Seconds

cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
         38.86 3.57 3.39   0  10.07   2.98  48.09   0.00   44   44   0.00   0.00   0.00   0.00  26.23  20.28  0.00
  0   0  33.32 3.65 3.39   0  19.88   3.26  43.54   0.00   44   44   0.00   0.00   0.00   0.00  26.23  20.28  0.00
  0   4  48.87 3.52 3.39   0   4.32
  1   1  35.58 3.67 3.39   0  12.93   3.28  48.21   0.00   39
  1   5  42.12 3.51 3.39   0   6.39
  2   2  33.42 3.66 3.39   0  13.11   2.78  50.69   0.00   34
  2   6  40.83 3.43 3.39   0   5.70
  3   3  35.97 3.68 3.39   0  11.51   2.61  49.92   0.00   39
  3   7  40.75 3.49 3.39   0   6.73


---------------------------------------------------------------------
Test WITH this patch:

Phoronix Test Suite v4.6.0

    Installed: pts/build-linux-kernel-1.3.0

System Information

Hardware:
Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R

Software:
OS: Fedora 18, Kernel: 3.10.0-rc3+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080

    Would you like to save these test results (Y/n): n


Timed Linux Kernel Compilation 3.1:
    pts/build-linux-kernel-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 2 Minutes
        Running Pre-Test Script @ 12:28:03
        Started Run 1 @ 12:28:15
        Running Interim Test Script @ 12:28:28
        Started Run 2 @ 12:28:31
        Running Interim Test Script @ 12:28:41
        Started Run 3 @ 12:28:47
        Running Interim Test Script @ 12:28:56  [Std. Dev: 5.03%]
        Started Run 4 @ 12:29:00
        Running Interim Test Script @ 12:29:09  [Std. Dev: 4.37%]
        Started Run 5 @ 12:29:13
        Running Interim Test Script @ 12:29:22  [Std. Dev: 3.79%]
        Started Run 6 @ 12:29:26  [Std. Dev: 3.49%]
        Running Post-Test Script @ 12:29:35

    Test Results:
        10.134061098099
        9.3411478996277
        9.2629590034485
        9.3126730918884
        9.4799311161041
        9.3236708641052

    Average: 9.48 Seconds

cor CPU    %c0  GHz  TSC SMI    %c1    %c3    %c6    %c7 CTMP PTMP   %pc2   %pc3   %pc6   %pc7  Pkg_W  Cor_W GFX_W
         38.61 3.59 3.39   0   9.64   3.04  48.71   0.00   43   43   0.00   0.00   0.00   0.00  26.30  20.35  0.00
  0   0  34.73 3.67 3.39   0  13.33   3.02  48.93   0.00   43   43   0.00   0.00   0.00   0.00  26.30  20.35  0.00
  0   4  41.86 3.52 3.39   0   6.19
  1   1  33.48 3.66 3.39   0  12.53   4.00  49.99   0.00   40
  1   5  40.62 3.52 3.39   0   5.39
  2   2  34.41 3.66 3.39   0  18.06   2.98  44.55   0.00   35
  2   6  48.26 3.58 3.39   0   4.22
  3   3  35.79 3.69 3.39   0  10.70   2.16  51.36   0.00   40
  3   7  39.77 3.50 3.39   0   6.71
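
As a rough cross-check of the numbers above, here is a minimal Python sketch that recomputes the average run times and compares package energy over the sampling window. It assumes that the single 100 s turbostat sample (-i 100) covers the whole benchmark and that Pkg_W is the average package power over that window; the helper name `percent_change` is only illustrative.

```python
from statistics import mean

# Run times (seconds) copied from the two Phoronix result lists above.
without_patch = [10.280334949493, 11.148964166641, 9.3881862163544,
                 9.3307340145111, 9.3948450088501, 9.3976459503174]
with_patch = [10.134061098099, 9.3411478996277, 9.2629590034485,
              9.3126730918884, 9.4799311161041, 9.3236708641052]

# Average package power (Pkg_W) from the turbostat summary rows above.
pkg_w_without = 26.23
pkg_w_with = 26.30

INTERVAL_S = 100  # turbostat sampling interval (-i 100)


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


avg_without = mean(without_patch)
avg_with = mean(with_patch)

# Energy over the window = average power * window length (joules).
# Assumption: Pkg_W is the mean package power for the whole interval.
energy_without = pkg_w_without * INTERVAL_S
energy_with = pkg_w_with * INTERVAL_S

print(f"avg run time: {avg_without:.2f} s -> {avg_with:.2f} s "
      f"({percent_change(avg_without, avg_with):+.2f}%)")
print(f"package energy over {INTERVAL_S} s: {energy_without:.0f} J -> "
      f"{energy_with:.0f} J "
      f"({percent_change(energy_without, energy_with):+.2f}%)")
```

Under these assumptions the package energy over the window differs by well under 1%, while the average run time drops by a few percent. A single sample per configuration is not enough to call the energy difference significant either way.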




^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-07 19:14       ` Stratos Karafotis
@ 2013-06-07 20:57         ` Rafael J. Wysocki
  2013-06-08  9:56           ` Stratos Karafotis
  0 siblings, 1 reply; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-07 20:57 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Friday, June 07, 2013 10:14:34 PM Stratos Karafotis wrote:
> On 06/05/2013 11:35 PM, Rafael J. Wysocki wrote:
> > On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
> >> Hi Borislav,
> >>
> >> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> >>> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> >>>> Ondemand calculates load in terms of frequency and increases it only
> >>>> if the load_freq is greater than up_threshold multiplied by current
> >>>> or average frequency. This seems to produce oscillations of frequency
> >>>> between min and max because, for example, a relatively small load can
> >>>> easily saturate minimum frequency and lead the CPU to max. Then, the
> >>>> CPU will decrease back to min due to a small load_freq.
> >>>
> >>> Right, and I think this is how we want it, no?
> >>>
> >>> The thing is, the faster you finish your work, the faster you can become
> >>> idle and save power.
> >>
> >> This is exactly the goal of this patch: to use middle frequencies
> >> more efficiently to finish the work faster.
> >>
> >>> If you switch frequencies in a staircase-like manner, you're going to
> >>> take longer to finish, in certain cases, and burn more power while doing
> >>> so.
> >>
> >> This is not true with this patch. It switches to middle frequencies
> >> when the load < up_threshold.
> >> Currently, ondemand does not increase the frequency in that case. The CPU
> >> runs at the lowest frequency until the load is greater than up_threshold.
> >>
> >>> Btw, racing to idle is also a good example for why you want boosting:
> >>> you want to go max out the core but stay within power limits so that you
> >>> can finish sooner.
> >>>
> >>>> This patch changes the calculation method of load and target frequency
> >>>> considering 2 points:
> >>>> - Load computation should be independent of the current or average
> >>>> measured frequency. For example, an absolute load of 80% at 100MHz is not
> >>>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
> >>>> - Target frequency should be increased to any value in the frequency table
> >>>> proportional to the absolute load, instead of only to the max. Thus:
> >>>>
> >>>> Target frequency = C * load
> >>>>
> >>>> where C = policy->cpuinfo.max_freq / 100
> >>>>
> >>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on a quad-core 1500MHz Krait.
> >>>> The Phoronix Linux Kernel Compilation 3.1 benchmark shows an
> >>>> increase of ~1.5% in performance. cpufreq_stats (time_in_state) shows
> >>>> that middle frequencies are used more with this patch. The highest
> >>>> and lowest frequencies were used less by ~9%.
> > 
> > Can you also use powertop to measure the percentage of time spent in idle
> > states for the same workload with and without your patchset?  Also, it would
> > be good to measure the total energy consumption somehow ...
> > 
> > Thanks,
> > Rafael
> 
> Hi Rafael,
> 
> I repeated the tests, also extracting powertop results.
> Measurement steps with and without this patch:
> 1) Reboot the system
> 2) Run the Phoronix Linux Kernel Compilation 3.1 benchmark twice
>    without taking measurements
> 3) Wait a few minutes
> 4) Run Phoronix and powertop for 100 secs and take the measurement.

Well, while this is not conclusive, it definitely looks very promising. :-)

We're seeing measurable performance improvement with the patchset applied *and*
more time spent in idle states both at the same time.  I'd be very surprised if
the energy consumption measurements did not confirm that the patchset allowed
us to reduce it.

If my computations are correct (somebody please check), the cores spent about
20% more time in idle on the average with the patchset applied and in addition
to that the cc6 residency was greater by about 2% on the average with respect
to the kernel without the patchset.

We need to verify if there are gains (or at least no regressions) with other
workloads, but since this *also* reduces code complexity quite a bit, I'm
seriously considering taking it.

> I will try to repeat the test and take measurements with turbostat as
> Borislav suggested.

Please do!

Thanks,
Rafael


> ------------------------------------------------------------------
> Test WITHOUT this patch:
> 
> Phoronix Test Suite v4.6.0
> 
>     Installed: pts/build-linux-kernel-1.3.0
> 
> System Information
> 
> Hardware:
> Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R
> 
> Software:
> OS: Fedora 18, Kernel: 3.10.0-rc3v+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080
> 
>     Would you like to save these test results (Y/n): n
> 
> 
> Timed Linux Kernel Compilation 3.1:
>     pts/build-linux-kernel-1.3.0
>     Test 1 of 1
>     Estimated Trial Run Count:    3
>     Estimated Time To Completion: 2 Minutes
>         Running Pre-Test Script @ 21:41:19
>         Started Run 1 @ 21:41:30
>         Running Interim Test Script @ 21:41:44
>         Started Run 2 @ 21:41:47
>         Running Interim Test Script @ 21:42:02
>         Started Run 3 @ 21:42:05
>         Running Interim Test Script @ 21:42:15  [Std. Dev: 19.28%]
>         Started Run 4 @ 21:42:19
>         Running Interim Test Script @ 21:42:29  [Std. Dev: 18.72%]
>         Started Run 5 @ 21:42:32
>         Running Interim Test Script @ 21:42:42  [Std. Dev: 17.84%]
>         Started Run 6 @ 21:42:46  [Std. Dev: 16.91%]
>         Running Post-Test Script @ 21:42:55
> 
>     Test Results:
>         11.073544979095
>         14.059958934784
>         9.6814110279083
>         9.6158590316772
>         9.5762379169464
>         9.5944919586182
> 
>     Average: 10.60 Seconds
> 
> Powertop results:
> http://www.semaphore.gr/results/powertop_without.html
> 
> 
> ---------------------------------------------------------------------
> Test WITH this patch:
> 
> Phoronix Test Suite v4.6.0
> 
>     Installed: pts/build-linux-kernel-1.3.0
> 
> System Information
> 
> Hardware:
> Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R
> 
> Software:
> OS: Fedora 18, Kernel: 3.10.0-rc3+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080
> 
>     Would you like to save these test results (Y/n): n
> 
> 
> Timed Linux Kernel Compilation 3.1:
>     pts/build-linux-kernel-1.3.0
>     Test 1 of 1
>     Estimated Trial Run Count:    3
>     Estimated Time To Completion: 2 Minutes
>         Running Pre-Test Script @ 21:28:05
>         Started Run 1 @ 21:28:17
>         Running Interim Test Script @ 21:28:30
>         Started Run 2 @ 21:28:34
>         Running Interim Test Script @ 21:28:44
>         Started Run 3 @ 21:28:47
>         Running Interim Test Script @ 21:28:58  [Std. Dev: 4.81%]
>         Started Run 4 @ 21:29:02
>         Running Interim Test Script @ 21:29:12  [Std. Dev: 6.05%]
>         Started Run 5 @ 21:29:15
>         Running Interim Test Script @ 21:29:25  [Std. Dev: 6.13%]
>         Started Run 6 @ 21:29:28  [Std. Dev: 6.02%]
>         Running Post-Test Script @ 21:29:38
> 
>     Test Results:
>         10.442322015762
>         10.038927078247
>         11.044027090073
>         9.5781810283661
>         9.5812470912933
>         9.5545389652252
> 
>     Average: 10.04 Seconds
> 
> Powertop results:
> http://www.semaphore.gr/results/powertop_with.html
-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-05 20:35     ` Rafael J. Wysocki
  2013-06-06 10:01       ` Borislav Petkov
@ 2013-06-07 19:14       ` Stratos Karafotis
  2013-06-07 20:57         ` Rafael J. Wysocki
  1 sibling, 1 reply; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-07 19:14 UTC (permalink / raw)
  To: Rafael J. Wysocki, Borislav Petkov
  Cc: Viresh Kumar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-pm, cpufreq, linux-kernel

On 06/05/2013 11:35 PM, Rafael J. Wysocki wrote:
> On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
>> Hi Borislav,
>>
>> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
>>> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
>>>> Ondemand calculates load in terms of frequency and increases it only
>>>> if the load_freq is greater than up_threshold multiplied by current
>>>> or average frequency. This seems to produce oscillations of frequency
>>>> between min and max because, for example, a relatively small load can
>>>> easily saturate minimum frequency and lead the CPU to max. Then, the
>>>> CPU will decrease back to min due to a small load_freq.
>>>
>>> Right, and I think this is how we want it, no?
>>>
>>> The thing is, the faster you finish your work, the faster you can become
>>> idle and save power.
>>
>> This is exactly the goal of this patch: to use middle frequencies
>> more efficiently to finish the work faster.
>>
>>> If you switch frequencies in a staircase-like manner, you're going to
>>> take longer to finish, in certain cases, and burn more power while doing
>>> so.
>>
>> This is not true with this patch. It switches to middle frequencies
>> when the load < up_threshold.
>> Currently, ondemand does not increase the frequency in that case. The CPU
>> runs at the lowest frequency until the load is greater than up_threshold.
>>
>>> Btw, racing to idle is also a good example for why you want boosting:
>>> you want to go max out the core but stay within power limits so that you
>>> can finish sooner.
>>>
>>>> This patch changes the calculation method of load and target frequency
>>>> considering 2 points:
>>>> - Load computation should be independent of the current or average
>>>> measured frequency. For example, an absolute load of 80% at 100MHz is not
>>>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
>>>> - Target frequency should be increased to any value in the frequency table
>>>> proportional to the absolute load, instead of only to the max. Thus:
>>>>
>>>> Target frequency = C * load
>>>>
>>>> where C = policy->cpuinfo.max_freq / 100
>>>>
>>>> Tested on Intel i7-3770 CPU @ 3.40GHz and on a quad-core 1500MHz Krait.
>>>> The Phoronix Linux Kernel Compilation 3.1 benchmark shows an
>>>> increase of ~1.5% in performance. cpufreq_stats (time_in_state) shows
>>>> that middle frequencies are used more with this patch. The highest
>>>> and lowest frequencies were used less by ~9%.
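
The formula quoted above (Target frequency = C * load, with C = policy->cpuinfo.max_freq / 100) can be sketched in a few lines of Python. This is only an illustration and not the patch code: the frequency table below is hypothetical, and rounding the target up to the lowest table entry at or above it is an assumed selection rule.

```python
from bisect import bisect_left

# Hypothetical frequency table in kHz, sorted ascending (not from the thread).
FREQ_TABLE_KHZ = [1600000, 2000000, 2400000, 2800000, 3200000, 3400000]


def target_freq_khz(load_pct, max_freq_khz):
    """Target frequency = C * load, with C = max_freq / 100."""
    return max_freq_khz * load_pct // 100


def pick_table_freq(target_khz, table=FREQ_TABLE_KHZ):
    """Lowest table entry at or above the target (assumed rounding rule)."""
    idx = bisect_left(table, target_khz)
    # Clamp to the highest entry if the target exceeds the table.
    return table[min(idx, len(table) - 1)]


for load in (10, 50, 60, 100):
    target = target_freq_khz(load, FREQ_TABLE_KHZ[-1])
    print(load, target, pick_table_freq(target))
```

With this table, a 60% load maps to a 2040000 kHz target and selects the 2400000 kHz entry, so intermediate loads land on intermediate frequencies rather than on min or max only.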
> 
> Can you also use powertop to measure the percentage of time spent in idle
> states for the same workload with and without your patchset?  Also, it would
> be good to measure the total energy consumption somehow ...
> 
> Thanks,
> Rafael

Hi Rafael,

I repeated the tests, also extracting powertop results.
Measurement steps with and without this patch:
1) Reboot the system
2) Run the Phoronix Linux Kernel Compilation 3.1 benchmark twice
   without taking measurements
3) Wait a few minutes
4) Run Phoronix and powertop for 100 secs and take the measurement.

I will try to repeat the test and take measurements with turbostat as
Borislav suggested.


Thanks,
Stratos

------------------------------------------------------------------
Test WITHOUT this patch:

Phoronix Test Suite v4.6.0

    Installed: pts/build-linux-kernel-1.3.0

System Information

Hardware:
Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R

Software:
OS: Fedora 18, Kernel: 3.10.0-rc3v+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080

    Would you like to save these test results (Y/n): n


Timed Linux Kernel Compilation 3.1:
    pts/build-linux-kernel-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 2 Minutes
        Running Pre-Test Script @ 21:41:19
        Started Run 1 @ 21:41:30
        Running Interim Test Script @ 21:41:44
        Started Run 2 @ 21:41:47
        Running Interim Test Script @ 21:42:02
        Started Run 3 @ 21:42:05
        Running Interim Test Script @ 21:42:15  [Std. Dev: 19.28%]
        Started Run 4 @ 21:42:19
        Running Interim Test Script @ 21:42:29  [Std. Dev: 18.72%]
        Started Run 5 @ 21:42:32
        Running Interim Test Script @ 21:42:42  [Std. Dev: 17.84%]
        Started Run 6 @ 21:42:46  [Std. Dev: 16.91%]
        Running Post-Test Script @ 21:42:55

    Test Results:
        11.073544979095
        14.059958934784
        9.6814110279083
        9.6158590316772
        9.5762379169464
        9.5944919586182

    Average: 10.60 Seconds

Powertop results:
http://www.semaphore.gr/results/powertop_without.html


---------------------------------------------------------------------
Test WITH this patch:

Phoronix Test Suite v4.6.0

    Installed: pts/build-linux-kernel-1.3.0

System Information

Hardware:
Processor: Intel Core i7-3770 @ 3.40GHz (8 Cores), Motherboard: ASUS CM6870, Chipset: Intel Xeon E3-1200 v2/3rd, Memory: 2 x 4096 MB DDR3-1600MHz HY64C1C1624ZY, Disk: 1000GB Seagate ST1000DM003-9YN1, Graphics: NVIDIA GeForce GT 640 3072MB, Audio: Realtek ALC892, Monitor: S23B350, Network: Realtek RTL8111/8168 + Ralink RT3090 Wireless 802.11n 1T/1R

Software:
OS: Fedora 18, Kernel: 3.10.0-rc3+ (x86_64), Desktop: KDE 4.10.3, Display Server: X Server 1.13.3, Display Driver: nouveau 1.0.7, File-System: ext4, Screen Resolution: 1920x1080

    Would you like to save these test results (Y/n): n


Timed Linux Kernel Compilation 3.1:
    pts/build-linux-kernel-1.3.0
    Test 1 of 1
    Estimated Trial Run Count:    3
    Estimated Time To Completion: 2 Minutes
        Running Pre-Test Script @ 21:28:05
        Started Run 1 @ 21:28:17
        Running Interim Test Script @ 21:28:30
        Started Run 2 @ 21:28:34
        Running Interim Test Script @ 21:28:44
        Started Run 3 @ 21:28:47
        Running Interim Test Script @ 21:28:58  [Std. Dev: 4.81%]
        Started Run 4 @ 21:29:02
        Running Interim Test Script @ 21:29:12  [Std. Dev: 6.05%]
        Started Run 5 @ 21:29:15
        Running Interim Test Script @ 21:29:25  [Std. Dev: 6.13%]
        Started Run 6 @ 21:29:28  [Std. Dev: 6.02%]
        Running Post-Test Script @ 21:29:38

    Test Results:
        10.442322015762
        10.038927078247
        11.044027090073
        9.5781810283661
        9.5812470912933
        9.5545389652252

    Average: 10.04 Seconds

Powertop results:
http://www.semaphore.gr/results/powertop_with.html

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-06 17:11               ` Borislav Petkov
@ 2013-06-06 17:32                 ` Stratos Karafotis
  0 siblings, 0 replies; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-06 17:32 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Viresh Kumar, Rafael J. Wysocki, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On 06/06/2013 08:11 PM, Borislav Petkov wrote:
> On Thu, Jun 06, 2013 at 07:46:17PM +0300, Stratos Karafotis wrote:
>> Apologies for top-posting. I was able to send email only from my phone.
>>
>> Thanks for your hint about turbostat.
>>
>> As you have most probably understood, I'm an individual amateur kernel developer.
>> I could provide some numbers from the x86 architecture, as Rafael suggested.
>> But unfortunately, I don't have access to more resources/infrastructure.
>> So, I will not be able to provide numbers from different platform(s).
>>
>> I've already provided some benchmarks from x86 (3.10-rc3) and also
>> tested the patch in 3.4.47 kernel (ARM, Nexus 4 phone, ~1000 installations)
>> and in 3.0.80 kernel (ARM, Samsung Galaxy S phone, ~1500 installations).
>>
>> Kindly let me know if "couple of platforms/vendors" is a show stopper
>> for this patch series. If yes, please ignore this patch and accept
>> my apologies for wasting your time. I am just trying to contribute
>> to this project (I believe there is room here for amateur developers).
>
> I'm in no way discouraging you from contributing to the kernel - on the
> contrary: you should continue doing that.

I will try! :)

> I'm just trying to make sure that a change like that doesn't hurt
> existing systems, thus the request to test on a couple of platforms. If
> you don't have other platforms, that's fine, we'll find them somewhere. :-)
>
> I'm hoping you can understand my perspective too, though - how would you feel
> if a patch shows improvement on my box but slows down yours - you won't
> be very happy with it, right? That's why we generally want to test such
> power/performance tweaks on a wider range of machines.

I totally understand your point of view and I think you are absolutely
right. I just wanted to state that I am not able to provide numbers
for other platforms due to lack of hardware.

> But you said you have an i7-3770 CPU on which, I think, turbostat should
> be able to show you what the power consumption looks like.
>
> And if so, you could measure that consumption once with, and once
> without your patch. This will give us initial numbers, at least.
>
> How does that sound?
>

That sounds perfect! I will provide numbers for i7 soon.

Thanks for your comments!
Stratos


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-06 16:46             ` Stratos Karafotis
@ 2013-06-06 17:11               ` Borislav Petkov
  2013-06-06 17:32                 ` Stratos Karafotis
  0 siblings, 1 reply; 48+ messages in thread
From: Borislav Petkov @ 2013-06-06 17:11 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Viresh Kumar, Rafael J. Wysocki, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Thu, Jun 06, 2013 at 07:46:17PM +0300, Stratos Karafotis wrote:
> Apologies for top-posting. I was able to send email only from my phone.
> 
> Thanks for your hint about turbostat.
> 
> As you have most probably understood, I'm an individual amateur kernel developer.
> I could provide some numbers from the x86 architecture, as Rafael suggested.
> But unfortunately, I don't have access to more resources/infrastructure.
> So, I will not be able to provide numbers from different platform(s).
> 
> I've already provided some benchmarks from x86 (3.10-rc3) and also
> tested the patch in 3.4.47 kernel (ARM, Nexus 4 phone, ~1000 installations)
> and in 3.0.80 kernel (ARM, Samsung Galaxy S phone, ~1500 installations).
> 
> Kindly let me know if "couple of platforms/vendors" is a show stopper
> for this patch series. If yes, please ignore this patch and accept
> my apologies for wasting your time. I am just trying to contribute
> to this project (I believe there is room here for amateur developers).

I'm in no way discouraging you from contributing to the kernel - on the
contrary: you should continue doing that.

I'm just trying to make sure that a change like that doesn't hurt
existing systems, thus the request to test on a couple of platforms. If
you don't have other platforms, that's fine, we'll find them somewhere. :-)

I'm hoping you can understand my perspective too, though - how would you feel
if a patch shows improvement on my box but slows down yours - you won't
be very happy with it, right? That's why we generally want to test such
power/performance tweaks on a wider range of machines.

But you said you have an i7-3770 CPU on which, I think, turbostat should
be able to show you what the power consumption looks like.

And if so, you could measure that consumption once with, and once
without your patch. This will give us initial numbers, at least.

How does that sound?

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-06 12:10           ` Borislav Petkov
@ 2013-06-06 16:46             ` Stratos Karafotis
  2013-06-06 17:11               ` Borislav Petkov
  0 siblings, 1 reply; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-06 16:46 UTC (permalink / raw)
  To: Borislav Petkov, Viresh Kumar, Rafael J. Wysocki
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On 06/06/2013 03:10 PM, Borislav Petkov wrote:
> On Thu, Jun 06, 2013 at 03:40:13PM +0530, Viresh Kumar wrote:
>> his patch will give significant improvement both power & performance wise.
> 
> Yes, and I'd like to see the paperwork on that. Numbers, and on a couple
> of platforms/vendors if possible, please.
> 
> Thanks.
> 

On 06/06/2013 04:15 PM, Borislav Petkov wrote:
> Please do not top-post.
> 
> On Thu, Jun 06, 2013 at 03:54:20PM +0300, Stratos Karafotis wrote:
>> I will try to provide the requested info (although, I'm not sure how
>> to measure total energy :) )
> 
> tools/power/x86/turbostat looks like a good tool. It can show, a.o.,
> power consumption in Watts on modern Intels and other interesting stuff.
> 
> HTH.
> 

Apologies for top-posting. I was able to send email only from my phone.

Thanks for your hint about turbostat.

As you have most probably understood, I'm an individual amateur kernel developer.
I could provide some numbers from the x86 architecture, as Rafael suggested.
But unfortunately, I don't have access to more resources/infrastructure.
So, I will not be able to provide numbers from different platform(s).

I've already provided some benchmarks from x86 (3.10-rc3) and also
tested the patch in 3.4.47 kernel (ARM, Nexus 4 phone, ~1000 installations)
and in 3.0.80 kernel (ARM, Samsung Galaxy S phone, ~1500 installations).

Kindly let me know if "couple of platforms/vendors" is a show stopper
for this patch series. If yes, please ignore this patch and accept
my apologies for wasting your time. I am just trying to contribute
to this project (I believe there is room here for amateur developers).

Many thanks to Rafael, who helped and guided me.
Thanks to Viresh for his helpful comments and his acknowledgment of
the patch.

Best Regards,
Stratos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-06  9:55     ` Borislav Petkov
  2013-06-06  9:57       ` Viresh Kumar
@ 2013-06-06 13:50       ` David C Niemi
  1 sibling, 0 replies; 48+ messages in thread
From: David C Niemi @ 2013-06-06 13:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stratos Karafotis, Rafael J. Wysocki, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

On 06/06/13 05:55, Borislav Petkov wrote:
> Please do not top-post.
>
> On Wed, Jun 05, 2013 at 12:58:33PM -0400, David C Niemi wrote:
>> When you are doing a locally-originated truly CPU-bound task, "race to
>> idle" does make some sense. But I can think of a couple of caveats.
>>
>> 1) If you care about power consumption, you want to avoid
>> super-power-hungry turbo states, as you get less done per watt-hour
>> than in some of the middle states.
>>
>> 2) CPU usage that is related to I/O (network, disk, video) doesn't
>> necessarily let you go to idle sooner if at all. In this case if you
>> want to minimize power consumption you may want to use middle states a
>> lot. But if you care more about responsiveness or latency than power
>> consumption, you might want to go to a high state anyway; that is why
>> we have tunables -- so we can configure based on the actual priorities
>> for the machine.
> No, users don't always know about tunables - this should Just Work(tm).
It should "Just Work" for the common mass-market case.  Tunables are not for the average end-user -- they are for either the userland part of the operating system to manage, or for people like me who have specific requirements to meet on thoroughly managed machines.  Without tunables you will be lumping servers, desktops, laptops, and embedded devices together and they simply do not have the same high-level priorities.
>
> The correct "fix" for this whole deal is coupling cpufreq with
> the scheduler, as it has been said so many times before. You need
> "something" which can tell you whether raising the freq. is worth it or
> not (i.e. the process is waiting on IO or is executing instructions).
I'll grant you this in the case of regular userland processes that have medium to large chunks of work to do.  For handling huge amounts of I/O, you have different needs -- think about cases where you need to peg many of your cores at once just handling I/O.  That has to work well too.  That's not saying the scheduler can't help, but the governor needs to know about all CPU consumed, including doing I/O and in all parts of the kernel.

Another part of this picture is the p-state governor.  That is even more scheduler-relevant than the c-state governor.
...
DCN

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-06 12:54 ` Stratos Karafotis
  (?)
@ 2013-06-06 13:15 ` Borislav Petkov
  -1 siblings, 0 replies; 48+ messages in thread
From: Borislav Petkov @ 2013-06-06 13:15 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Rafael J. Wysocki, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

Please do not top-post.

On Thu, Jun 06, 2013 at 03:54:20PM +0300, Stratos Karafotis wrote:
> I will try to provide the requested info (although, I'm not sure how
> to measure total energy :) )

tools/power/x86/turbostat looks like a good tool. It can show, among
other things, power consumption in Watts on modern Intel CPUs.
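
For what it's worth, if a tool reports average power per sampling interval,
total energy can be approximated by summing power times interval length. A
minimal sketch in C follows; the function name, the fixed-interval assumption
and the idea of feeding it a column of Watts readings are illustrative
assumptions, not something turbostat provides itself:

```c
#include <stddef.h>

/*
 * Approximate total energy in joules from power samples.
 * watts[i] is the average power (W) over the i-th interval and every
 * interval is assumed to last interval_s seconds (hypothetical inputs).
 */
double total_energy_joules(const double *watts, size_t n, double interval_s)
{
	double joules = 0.0;
	size_t i;

	for (i = 0; i < n; i++)
		joules += watts[i] * interval_s;	/* E = P * t per interval */

	return joules;
}
```

For example, samples of 10 W, 20 W and 30 W taken over 5 second intervals add
up to 300 J. Comparing that total for the same workload with and without the
patchset gives a rough energy figure.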

HTH.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
@ 2013-06-06 12:56 ` Stratos Karafotis
  0 siblings, 0 replies; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-06 12:56 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Borislav Petkov, Rafael J. Wysocki, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

Thanks, Viresh. I don't think I could have explained this any better.
Thanks also for the acknowledgment!

Stratos

Viresh Kumar <viresh.kumar@linaro.org> wrote:

>On 6 June 2013 15:31, Borislav Petkov <bp@suse.de> wrote:
>
>> Hold on, you say above "easily saturate minimum frequency and lead the
>> CPU to max". I read this as we jump straight to max P-state where we
>> even boost.
>
>Probably he meant: "At lowest levels of frequencies, a small load on system
>may look like a huge one. like: 20-30% load on max freq can be 95% load
>on min freq. And so we jump to max freq even for this load and return back
>pretty quickly as this load doesn't sustain for longer. over that we wait for
>load to go over up_threshold to increase freq."
>
>> "CPU to max" finishes the work faster than middle frequencies, if you're
>> CPU-bound.
>
>He isn't removing this feature at all.
>
>Current code is:
>
>if (load > up_threshold)
>   goto maxfreq.
>else
>   don't increase freq, maybe decrease it in steps
>
>What he is doing is:
>
>if (load > up_threshold)
>   goto maxfreq.
>else
>   increase/decrease freq based on current load.
>
>So, if up_threshold is 95 and load remains < 95, his patch will
>give significant improvement both power & performance wise.
>
>Else, it shouldn't decrease it.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
@ 2013-06-06 12:54 ` Stratos Karafotis
  0 siblings, 0 replies; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-06 12:54 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

Hi Rafael,

I will try to provide the requested info (although, I'm not sure how to measure total energy :) )

Thanks,
Stratos

"Rafael J. Wysocki" <rjw@sisk.pl> wrote:

>On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
>> Hi Borislav,
>> 
>> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
>> > On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
>> >> Ondemand calculates load in terms of frequency and increases it only
>> >> if the load_freq is greater than up_threshold multiplied by current
>> >> or average frequency. This seems to produce oscillations of frequency
>> >> between min and max because, for example, a relatively small load can
>> >> easily saturate minimum frequency and lead the CPU to max. Then, the
>> >> CPU will decrease back to min due to a small load_freq.
>> >
>> > Right, and I think this is how we want it, no?
>> >
>> > The thing is, the faster you finish your work, the faster you can become
>> > idle and save power.
>> 
>> This is exactly the goal of this patch. To use more efficiently middle
>> frequencies to finish faster the work.
>> 
>> > If you switch frequencies in a staircase-like manner, you're going to
>> > take longer to finish, in certain cases, and burn more power while doing
>> > so.
>> 
>> This is not true with this patch. It switches to middle frequencies
>> when the load < up_threshold.
>> Now, ondemand does not increase freq. CPU runs in lowest freq till the
>> load is greater than up_threshold.
>> 
>> > Btw, racing to idle is also a good example for why you want boosting:
>> > you want to go max out the core but stay within power limits so that you
>> > can finish sooner.
>> >
>> >> This patch changes the calculation method of load and target frequency
>> >> considering 2 points:
>> >> - Load computation should be independent from current or average
>> >> measured frequency. For example an absolute load 80% at 100MHz is not
>> >> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
>> >> - Target frequency should be increased to any value of frequency table
>> >> proportional to absolute load, instead to only the max. Thus:
>> >>
>> >> Target frequency = C * load
>> >>
>> >> where C = policy->cpuinfo.max_freq / 100
>> >>
>> >> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
>> >> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>> >> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
>> >> that middle frequencies are used more, with this patch. Highest
>> >> and lowest frequencies were used less by ~9%
>
>Can you also use powertop to measure the percentage of time spent in idle
>states for the same workload with and without your patchset?  Also, it would
>be good to measure the total energy consumption somehow ...
>
>Thanks,
>Rafael
>
>
>-- 
>I speak only for myself.
>Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-06 10:10         ` Viresh Kumar
@ 2013-06-06 12:10           ` Borislav Petkov
  2013-06-06 16:46             ` Stratos Karafotis
  0 siblings, 1 reply; 48+ messages in thread
From: Borislav Petkov @ 2013-06-06 12:10 UTC (permalink / raw)
  To: Viresh Kumar
  Cc: Rafael J. Wysocki, Stratos Karafotis, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Thu, Jun 06, 2013 at 03:40:13PM +0530, Viresh Kumar wrote:
> his patch will give significant improvement both power & performance wise.

Yes, and I'd like to see the paperwork on that. Numbers, and on a couple
of platforms/vendors if possible, please.

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-06 10:01       ` Borislav Petkov
@ 2013-06-06 10:10         ` Viresh Kumar
  2013-06-06 12:10           ` Borislav Petkov
  0 siblings, 1 reply; 48+ messages in thread
From: Viresh Kumar @ 2013-06-06 10:10 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rafael J. Wysocki, Stratos Karafotis, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On 6 June 2013 15:31, Borislav Petkov <bp@suse.de> wrote:

> Hold on, you say above "easily saturate minimum frequency and lead the
> CPU to max". I read this as we jump straight to max P-state where we
> even boost.

He probably meant: "At the lowest frequencies, a small load on the system
may look like a huge one. For example, a 20-30% load at max freq can be a 95%
load at min freq. So we jump to max freq even for this load and come back
pretty quickly, as the load isn't sustained for long. On top of that, we wait
for the load to go over up_threshold before increasing freq."

> "CPU to max" finishes the work faster than middle frequencies, if you're
> CPU-bound.

He isn't removing this feature at all.

Current code is:

if (load > up_threshold)
   goto maxfreq.
else
   don't increase freq, maybe decrease it in steps

What he is doing is:

if (load > up_threshold)
   goto maxfreq.
else
   increase/decrease freq based on current load.

So, if up_threshold is 95 and load remains < 95, his patch will
give significant improvement both power & performance wise.

Otherwise, it shouldn't make either of them worse.
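
The two variants above can be sketched in C roughly as follows. This is only
an illustration of the decision logic as described in this mail, not the
actual governor code: the function names, the fixed UP_THRESHOLD and the clamp
to min_freq are assumptions, and the stepwise decrease in the current code is
left out.

```c
/* Hypothetical fixed threshold, matching the value used in the example. */
#define UP_THRESHOLD 95

/*
 * Current behaviour as described above: jump to max above the threshold,
 * otherwise never raise the frequency (stepwise decrease omitted).
 * load is a percentage (0..100), frequencies are in kHz.
 */
unsigned int next_freq_current(unsigned int load, unsigned int cur_freq,
			       unsigned int max_freq)
{
	if (load > UP_THRESHOLD)
		return max_freq;

	return cur_freq;
}

/*
 * Patched behaviour as described above: same jump to max, otherwise pick a
 * frequency proportional to load (C * load with C = max_freq / 100).
 * Clamping to min_freq is an assumption of this sketch.
 */
unsigned int next_freq_patched(unsigned int load, unsigned int min_freq,
			       unsigned int max_freq)
{
	unsigned int target;

	if (load > UP_THRESHOLD)
		return max_freq;

	target = max_freq / 100 * load;

	return target < min_freq ? min_freq : target;
}
```

With max_freq = 3400000 kHz and a load of 50, the first variant keeps the
current frequency, while the second asks for 1700000 kHz.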

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-05 20:35     ` Rafael J. Wysocki
@ 2013-06-06 10:01       ` Borislav Petkov
  2013-06-06 10:10         ` Viresh Kumar
  2013-06-07 19:14       ` Stratos Karafotis
  1 sibling, 1 reply; 48+ messages in thread
From: Borislav Petkov @ 2013-06-06 10:01 UTC (permalink / raw)
  To: Rafael J. Wysocki, Stratos Karafotis
  Cc: Viresh Kumar, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	linux-pm, cpufreq, linux-kernel

On Wed, Jun 05, 2013 at 10:35:05PM +0200, Rafael J. Wysocki wrote:
> On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
> > Hi Borislav,
> > 
> > On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> > > On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> > >> Ondemand calculates load in terms of frequency and increases it only
> > >> if the load_freq is greater than up_threshold multiplied by current
> > >> or average frequency. This seems to produce oscillations of frequency
> > >> between min and max because, for example, a relatively small load can
> > >> easily saturate minimum frequency and lead the CPU to max. Then, the
> > >> CPU will decrease back to min due to a small load_freq.
> > >
> > > Right, and I think this is how we want it, no?
> > >
> > > The thing is, the faster you finish your work, the faster you can become
> > > idle and save power.
> > 
> > This is exactly the goal of this patch. To use more efficiently middle
> > frequencies to finish faster the work.

Hold on, you say above "easily saturate minimum frequency and lead the
CPU to max". I read this as we jump straight to max P-state where we
even boost.

"CPU to max" finishes the work faster than middle frequencies, if you're
CPU-bound.

> > > If you switch frequencies in a staircase-like manner, you're going to
> > > take longer to finish, in certain cases, and burn more power while doing
> > > so.
> > 
> > This is not true with this patch. It switches to middle frequencies
> > when the load < up_threshold.

This is worth investigating wrt heightened power consumption, as Rafael
suggested.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-06  9:55     ` Borislav Petkov
@ 2013-06-06  9:57       ` Viresh Kumar
  2013-06-06 13:50       ` David C Niemi
  1 sibling, 0 replies; 48+ messages in thread
From: Viresh Kumar @ 2013-06-06  9:57 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: David C Niemi, Stratos Karafotis, Rafael J. Wysocki,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel, Lists linaro-kernel

On 6 June 2013 15:25, Borislav Petkov <bp@suse.de> wrote:
> The correct "fix" for this whole deal is coupling cpufreq with
> the scheduler, as it has been said so many times before. You need
> "something" which can tell you whether raising the freq. is worth it or
> not (i.e. the process is waiting on IO or is executing instructions).

Linaro has got a blueprint in this direction but doesn't have any
proof of concept or RFC patches to share. But that will happen soon.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-05 16:58   ` David C Niemi
@ 2013-06-06  9:55     ` Borislav Petkov
  2013-06-06  9:57       ` Viresh Kumar
  2013-06-06 13:50       ` David C Niemi
  0 siblings, 2 replies; 48+ messages in thread
From: Borislav Petkov @ 2013-06-06  9:55 UTC (permalink / raw)
  To: David C Niemi
  Cc: Stratos Karafotis, Rafael J. Wysocki, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

Please do not top-post.

On Wed, Jun 05, 2013 at 12:58:33PM -0400, David C Niemi wrote:
> When you are doing a locally-originated truly CPU-bound task, "race to
> idle" does make some sense. But I can think of a couple of caveats.
>
> 1) If you care about power consumption, you want to avoid
> super-power-hungry turbo states, as you get less done per watt-hour
> than in some of the middle states.
>
> 2) CPU usage that is related to I/O (network, disk, video) doesn't
> necessarily let you go to idle sooner if at all. In this case if you
> want to minimize power consumption you may want to use middle states a
> lot. But if you care more about responsiveness or latency than power
> consumption, you might want to go to a high state anyway; that is why
> we have tunables -- so we can configure based on the actual priorities
> for the machine.

No, users don't always know about tunables - this should Just Work(tm).

The correct "fix" for this whole deal is coupling cpufreq with
the scheduler, as it has been said so many times before. You need
"something" which can tell you whether raising the freq. is worth it or
not (i.e. the process is waiting on IO or is executing instructions).

Btw, recent AMD CPUs have something called frequency feedback interface
which can tell you how much performance you would get if you would raise
the frequency to the next P-state.

I don't know though how reliable this heuristic is, and, besides,
we need this addressed for all hw out there, which means, a sw-only
solution would be the way to go.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-05 17:13   ` Stratos Karafotis
@ 2013-06-05 20:35     ` Rafael J. Wysocki
  2013-06-06 10:01       ` Borislav Petkov
  2013-06-07 19:14       ` Stratos Karafotis
  0 siblings, 2 replies; 48+ messages in thread
From: Rafael J. Wysocki @ 2013-06-05 20:35 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Borislav Petkov, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Wednesday, June 05, 2013 08:13:26 PM Stratos Karafotis wrote:
> Hi Borislav,
> 
> On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> > On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> >> Ondemand calculates load in terms of frequency and increases it only
> >> if the load_freq is greater than up_threshold multiplied by current
> >> or average frequency. This seems to produce oscillations of frequency
> >> between min and max because, for example, a relatively small load can
> >> easily saturate minimum frequency and lead the CPU to max. Then, the
> >> CPU will decrease back to min due to a small load_freq.
> >
> > Right, and I think this is how we want it, no?
> >
> > The thing is, the faster you finish your work, the faster you can become
> > idle and save power.
> 
> This is exactly the goal of this patch. To use more efficiently middle
> frequencies to finish faster the work.
> 
> > If you switch frequencies in a staircase-like manner, you're going to
> > take longer to finish, in certain cases, and burn more power while doing
> > so.
> 
> This is not true with this patch. It switches to middle frequencies
> when the load < up_threshold.
> Now, ondemand does not increase freq. CPU runs in lowest freq till the
> load is greater than up_threshold.
> 
> > Btw, racing to idle is also a good example for why you want boosting:
> > you want to go max out the core but stay within power limits so that you
> > can finish sooner.
> >
> >> This patch changes the calculation method of load and target frequency
> >> considering 2 points:
> >> - Load computation should be independent from current or average
> >> measured frequency. For example an absolute load 80% at 100MHz is not
> >> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
> >> - Target frequency should be increased to any value of frequency table
> >> proportional to absolute load, instead to only the max. Thus:
> >>
> >> Target frequency = C * load
> >>
> >> where C = policy->cpuinfo.max_freq / 100
> >>
> >> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
> >> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> >> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
> >> that middle frequencies are used more, with this patch. Highest
> >> and lowest frequencies were used less by ~9%

Can you also use powertop to measure the percentage of time spent in idle
states for the same workload with and without your patchset?  Also, it would
be good to measure the total energy consumption somehow ...

Thanks,
Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-05 16:17 ` Borislav Petkov
  2013-06-05 16:58   ` David C Niemi
@ 2013-06-05 17:13   ` Stratos Karafotis
  2013-06-05 20:35     ` Rafael J. Wysocki
  1 sibling, 1 reply; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-05 17:13 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Rafael J. Wysocki, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

Hi Borislav,

On 06/05/2013 07:17 PM, Borislav Petkov wrote:
> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
>> Ondemand calculates load in terms of frequency and increases it only
>> if the load_freq is greater than up_threshold multiplied by current
>> or average frequency. This seems to produce oscillations of frequency
>> between min and max because, for example, a relatively small load can
>> easily saturate minimum frequency and lead the CPU to max. Then, the
>> CPU will decrease back to min due to a small load_freq.
>
> Right, and I think this is how we want it, no?
>
> The thing is, the faster you finish your work, the faster you can become
> idle and save power.

This is exactly the goal of this patch: to use middle frequencies more
efficiently so that the work finishes faster.

> If you switch frequencies in a staircase-like manner, you're going to
> take longer to finish, in certain cases, and burn more power while doing
> so.

This is not true with this patch. It switches to middle frequencies
when the load < up_threshold.
Currently, ondemand does not increase the frequency in that case. The CPU
runs at the lowest frequency until the load is greater than up_threshold.

> Btw, racing to idle is also a good example for why you want boosting:
> you want to go max out the core but stay within power limits so that you
> can finish sooner.
>
>> This patch changes the calculation method of load and target frequency
>> considering 2 points:
>> - Load computation should be independent from current or average
>> measured frequency. For example an absolute load 80% at 100MHz is not
>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
>> - Target frequency should be increased to any value of frequency table
>> proportional to absolute load, instead to only the max. Thus:
>>
>> Target frequency = C * load
>>
>> where C = policy->cpuinfo.max_freq / 100
>>
>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
>> that middle frequencies are used more, with this patch. Highest
>> and lowest frequencies were used less by ~9%
>
> I read this as "the workload takes longer to complete" which means
> higher power consumption and longer execution times which means less
> time spent in idle. And I don't think we want that.
>
> Yes, no?

In my opinion, no.
The benchmark mentioned in the changelog shows a ~1.5% shorter
execution time.

Thanks,
Stratos

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-05 16:17 ` Borislav Petkov
@ 2013-06-05 16:58   ` David C Niemi
  2013-06-06  9:55     ` Borislav Petkov
  2013-06-05 17:13   ` Stratos Karafotis
  1 sibling, 1 reply; 48+ messages in thread
From: David C Niemi @ 2013-06-05 16:58 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Stratos Karafotis, Rafael J. Wysocki, Viresh Kumar,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, linux-pm, cpufreq,
	linux-kernel

When you are doing a locally-originated truly CPU-bound task, "race to idle" does make some sense.  But I can think of a couple of caveats.

1) If you care about power consumption, you want to avoid super-power-hungry turbo states, as you get less done per watt-hour than in some of the middle states.

2) CPU usage that is related to I/O (network, disk, video) doesn't necessarily let you go to idle sooner if at all.  In this case if you want to minimize power consumption you may want to use middle states a lot.  But if you care more about responsiveness or latency than power consumption, you might want to go to a high state anyway; that is why we have tunables -- so we can configure based on the actual priorities for the machine.

DCN

On 06/05/13 12:17, Borislav Petkov wrote:
> On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
>> Ondemand calculates load in terms of frequency and increases it only
>> if the load_freq is greater than up_threshold multiplied by current
>> or average frequency. This seems to produce oscillations of frequency
>> between min and max because, for example, a relatively small load can
>> easily saturate minimum frequency and lead the CPU to max. Then, the
>> CPU will decrease back to min due to a small load_freq.
> Right, and I think this is how we want it, no?
>
> The thing is, the faster you finish your work, the faster you can become
> idle and save power.
>
> If you switch frequencies in a staircase-like manner, you're going to
> take longer to finish, in certain cases, and burn more power while doing
> so.
>
> Btw, racing to idle is also a good example for why you want boosting:
> you want to go max out the core but stay within power limits so that you
> can finish sooner.
>
>> This patch changes the calculation method of load and target frequency
>> considering 2 points:
>> - Load computation should be independent from current or average
>> measured frequency. For example an absolute load 80% at 100MHz is not
>> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
>> - Target frequency should be increased to any value of frequency table
>> proportional to absolute load, instead to only the max. Thus:
>>
>> Target frequency = C * load
>>
>> where C = policy->cpuinfo.max_freq / 100
>>
>> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
>> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
>> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
>> that middle frequencies are used more, with this patch. Highest
>> and lowest frequencies were used less by ~9%
> I read this as "the workload takes longer to complete" which means
> higher power consumption and longer execution times which means less
> time spent in idle. And I don't think we want that.
>
> Yes, no?
>
> Thanks.
>


^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
  2013-06-05 16:01 Stratos Karafotis
@ 2013-06-05 16:17 ` Borislav Petkov
  2013-06-05 16:58   ` David C Niemi
  2013-06-05 17:13   ` Stratos Karafotis
  0 siblings, 2 replies; 48+ messages in thread
From: Borislav Petkov @ 2013-06-05 16:17 UTC (permalink / raw)
  To: Stratos Karafotis
  Cc: Rafael J. Wysocki, Viresh Kumar, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, linux-pm, cpufreq, linux-kernel

On Wed, Jun 05, 2013 at 07:01:25PM +0300, Stratos Karafotis wrote:
> Ondemand calculates load in terms of frequency and increases it only
> if the load_freq is greater than up_threshold multiplied by current
> or average frequency. This seems to produce oscillations of frequency
> between min and max because, for example, a relatively small load can
> easily saturate minimum frequency and lead the CPU to max. Then, the
> CPU will decrease back to min due to a small load_freq.

Right, and I think this is how we want it, no?

The thing is, the faster you finish your work, the faster you can become
idle and save power.

If you switch frequencies in a staircase-like manner, you're going to
take longer to finish, in certain cases, and burn more power while doing
so.

Btw, racing to idle is also a good example for why you want boosting:
you want to go max out the core but stay within power limits so that you
can finish sooner.

> This patch changes the calculation method of load and target frequency
> considering 2 points:
> - Load computation should be independent from current or average
> measured frequency. For example an absolute load 80% at 100MHz is not
> necessarily equivalent to 8% at 1000MHz in the next sampling interval.
> - Target frequency should be increased to any value of frequency table
> proportional to absolute load, instead to only the max. Thus:
> 
> Target frequency = C * load
> 
> where C = policy->cpuinfo.max_freq / 100
> 
> Tested on Intel i7-3770 CPU @ 3.40GHz and on Quad core 1500MHz Krait.
> Phoronix benchmark of Linux Kernel Compilation 3.1 test shows an
> increase ~1.5% in performance. cpufreq_stats (time_in_state) shows
> that middle frequencies are used more, with this patch. Highest
> and lowest frequencies were used less by ~9%

I read this as "the workload takes longer to complete" which means
higher power consumption and longer execution times which means less
time spent in idle. And I don't think we want that.

Yes, no?

Thanks.

-- 
Regards/Gruss,
    Boris.

Sent from a fat crate under my desk. Formatting is fine.
--

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3 1/3] cpufreq: ondemand: Change the calculation of target frequency
@ 2013-06-05 16:01 Stratos Karafotis
  2013-06-05 16:17 ` Borislav Petkov
  0 siblings, 1 reply; 48+ messages in thread
From: Stratos Karafotis @ 2013-06-05 16:01 UTC (permalink / raw)
  To: Rafael J. Wysocki, Viresh Kumar
  Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Borislav Petkov,
	linux-pm, cpufreq, linux-kernel

Ondemand calculates load in terms of frequency and increases the
frequency only if load_freq is greater than up_threshold multiplied by
the current or average frequency. This seems to produce frequency
oscillations between min and max because, for example, a relatively
small load can easily saturate the minimum frequency and drive the CPU
to max. Then the CPU drops back to min due to a small load_freq.

This patch changes how load and target frequency are calculated,
based on two points:
- Load computation should be independent of the current or average
measured frequency. For example, an absolute load of 80% at 100MHz is
not necessarily equivalent to 8% at 1000MHz in the next sampling
interval.
- Target frequency should be increased to any value in the frequency
table, proportional to absolute load, instead of only to the max. Thus:

Target frequency = C * load

where C = policy->cpuinfo.max_freq / 100
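
As a rough illustration of the formula above, here is a minimal userspace
sketch in C. The helper name target_freq_khz() is hypothetical and not
part of the patch; load is assumed to be a percentage (0 to 100) and
frequencies are in kHz, the unit cpufreq uses.

```c
/*
 * Hypothetical helper mirroring the formula
 *   freq_next = load * policy->cpuinfo.max_freq / 100
 * Multiplying before dividing keeps integer precision; for load <= 100
 * and max_freq_khz below roughly 42 GHz the product fits in 32 bits.
 */
static unsigned int target_freq_khz(unsigned int load,
				    unsigned int max_freq_khz)
{
	return load * max_freq_khz / 100;
}
```

With max_freq_khz = 3400000, a load of 50 maps to 1700000 kHz and a load
of 25 to 850000 kHz. In the patch itself, loads above up_threshold bypass
this formula and go straight to policy->max.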

Tested on an Intel i7-3770 CPU @ 3.40GHz and on a quad-core 1500MHz
Krait. The Phoronix benchmark of the Linux Kernel Compilation 3.1 test
shows a ~1.5% increase in performance. cpufreq_stats (time_in_state)
shows that middle frequencies are used more with this patch. The
highest and lowest frequencies were used ~9% less.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
---
 drivers/cpufreq/cpufreq_governor.c | 10 +---------
 drivers/cpufreq/cpufreq_governor.h |  1 -
 drivers/cpufreq/cpufreq_ondemand.c | 39 +++++++-------------------------------
 3 files changed, 8 insertions(+), 42 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c
index a849b2d..47c8077 100644
--- a/drivers/cpufreq/cpufreq_governor.c
+++ b/drivers/cpufreq/cpufreq_governor.c
@@ -54,7 +54,7 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 
 	policy = cdbs->cur_policy;
 
-	/* Get Absolute Load (in terms of freq for ondemand gov) */
+	/* Get Absolute Load */
 	for_each_cpu(j, policy->cpus) {
 		struct cpu_dbs_common_info *j_cdbs;
 		u64 cur_wall_time, cur_idle_time;
@@ -105,14 +105,6 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu)
 
 		load = 100 * (wall_time - idle_time) / wall_time;
 
-		if (dbs_data->cdata->governor == GOV_ONDEMAND) {
-			int freq_avg = __cpufreq_driver_getavg(policy, j);
-			if (freq_avg <= 0)
-				freq_avg = policy->cur;
-
-			load *= freq_avg;
-		}
-
 		if (load > max_load)
 			max_load = load;
 	}
diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h
index e7bbf76..c305cad 100644
--- a/drivers/cpufreq/cpufreq_governor.h
+++ b/drivers/cpufreq/cpufreq_governor.h
@@ -169,7 +169,6 @@ struct od_dbs_tuners {
 	unsigned int sampling_rate;
 	unsigned int sampling_down_factor;
 	unsigned int up_threshold;
-	unsigned int adj_up_threshold;
 	unsigned int powersave_bias;
 	unsigned int io_is_busy;
 };
diff --git a/drivers/cpufreq/cpufreq_ondemand.c b/drivers/cpufreq/cpufreq_ondemand.c
index 4b9bb5d..62e67a9 100644
--- a/drivers/cpufreq/cpufreq_ondemand.c
+++ b/drivers/cpufreq/cpufreq_ondemand.c
@@ -29,11 +29,9 @@
 #include "cpufreq_governor.h"
 
 /* On-demand governor macros */
-#define DEF_FREQUENCY_DOWN_DIFFERENTIAL		(10)
 #define DEF_FREQUENCY_UP_THRESHOLD		(80)
 #define DEF_SAMPLING_DOWN_FACTOR		(1)
 #define MAX_SAMPLING_DOWN_FACTOR		(100000)
-#define MICRO_FREQUENCY_DOWN_DIFFERENTIAL	(3)
 #define MICRO_FREQUENCY_UP_THRESHOLD		(95)
 #define MICRO_FREQUENCY_MIN_SAMPLE_RATE		(10000)
 #define MIN_FREQUENCY_UP_THRESHOLD		(11)
@@ -159,14 +157,10 @@ static void dbs_freq_increase(struct cpufreq_policy *p, unsigned int freq)
 
 /*
  * Every sampling_rate, we check, if current idle time is less than 20%
- * (default), then we try to increase frequency. Every sampling_rate, we look
- * for the lowest frequency which can sustain the load while keeping idle time
- * over 30%. If such a frequency exist, we try to decrease to this frequency.
- *
- * Any frequency increase takes it to the maximum frequency. Frequency reduction
- * happens at minimum steps of 5% (default) of current frequency
+ * (default), then we try to increase frequency. Else, we adjust the frequency
+ * proportional to load.
  */
-static void od_check_cpu(int cpu, unsigned int load_freq)
+static void od_check_cpu(int cpu, unsigned int load)
 {
 	struct od_cpu_dbs_info_s *dbs_info = &per_cpu(od_cpu_dbs_info, cpu);
 	struct cpufreq_policy *policy = dbs_info->cdbs.cur_policy;
@@ -176,29 +170,17 @@ static void od_check_cpu(int cpu, unsigned int load_freq)
 	dbs_info->freq_lo = 0;
 
 	/* Check for frequency increase */
-	if (load_freq > od_tuners->up_threshold * policy->cur) {
+	if (load > od_tuners->up_threshold) {
 		/* If switching to max speed, apply sampling_down_factor */
 		if (policy->cur < policy->max)
 			dbs_info->rate_mult =
 				od_tuners->sampling_down_factor;
 		dbs_freq_increase(policy, policy->max);
 		return;
-	}
-
-	/* Check for frequency decrease */
-	/* if we cannot reduce the frequency anymore, break out early */
-	if (policy->cur == policy->min)
-		return;
-
-	/*
-	 * The optimal frequency is the frequency that is the lowest that can
-	 * support the current CPU usage without triggering the up policy. To be
-	 * safe, we focus 10 points under the threshold.
-	 */
-	if (load_freq < od_tuners->adj_up_threshold
-			* policy->cur) {
+	} else {
+		/* Calculate the next frequency proportional to load */
 		unsigned int freq_next;
-		freq_next = load_freq / od_tuners->adj_up_threshold;
+		freq_next = load * policy->cpuinfo.max_freq / 100;
 
 		/* No longer fully busy, reset rate_mult */
 		dbs_info->rate_mult = 1;
@@ -372,9 +354,6 @@ static ssize_t store_up_threshold(struct dbs_data *dbs_data, const char *buf,
 			input < MIN_FREQUENCY_UP_THRESHOLD) {
 		return -EINVAL;
 	}
-	/* Calculate the new adj_up_threshold */
-	od_tuners->adj_up_threshold += input;
-	od_tuners->adj_up_threshold -= od_tuners->up_threshold;
 
 	od_tuners->up_threshold = input;
 	return count;
@@ -523,8 +502,6 @@ static int od_init(struct dbs_data *dbs_data)
 	if (idle_time != -1ULL) {
 		/* Idle micro accounting is supported. Use finer thresholds */
 		tuners->up_threshold = MICRO_FREQUENCY_UP_THRESHOLD;
-		tuners->adj_up_threshold = MICRO_FREQUENCY_UP_THRESHOLD -
-			MICRO_FREQUENCY_DOWN_DIFFERENTIAL;
 		/*
 		 * In nohz/micro accounting case we set the minimum frequency
 		 * not depending on HZ, but fixed (very low). The deferred
@@ -533,8 +510,6 @@ static int od_init(struct dbs_data *dbs_data)
 		dbs_data->min_sampling_rate = MICRO_FREQUENCY_MIN_SAMPLE_RATE;
 	} else {
 		tuners->up_threshold = DEF_FREQUENCY_UP_THRESHOLD;
-		tuners->adj_up_threshold = DEF_FREQUENCY_UP_THRESHOLD -
-			DEF_FREQUENCY_DOWN_DIFFERENTIAL;
 
 		/* For correct statistics, we need 10 ticks for each measure */
 		dbs_data->min_sampling_rate = MIN_SAMPLING_RATE_RATIO *
-- 
1.8.1.4


