From: Andreas Herrmann <aherrmann@suse.com>
To: Viresh Kumar <viresh.kumar@linaro.org>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	linux-pm@vger.kernel.org, linux-kernel@vger.kernel.org,
	Stratos Karafotis <stratosk@semaphore.gr>,
	Thomas Renninger <trenn@suse.com>
Subject: Re: [PATCH 1/1] cpufreq: pcc-cpufreq: Re-introduce deadband effect to reduce number of frequency changes
Date: Tue, 13 Sep 2016 12:53:35 +0200
Message-ID: <20160913105335.GA25602@suselix>
In-Reply-To: <20160907050201.GK27345@vireshk-i7>

On Wed, Sep 07, 2016 at 10:32:01AM +0530, Viresh Kumar wrote:
> On 01-09-16, 15:21, Andreas Herrmann wrote:
> > On Mon, Aug 29, 2016 at 11:31:53AM +0530, Viresh Kumar wrote:
> 
> > > I am _really_ worried about such hacks in drivers to negate the
> > > effect of a patch that was actually good.
> > 
> > > Did you try to increase the sampling period of the ondemand governor
> > > to see if that helps without this patch?
> > 
> > With an older kernel I've modified transition_latency of the driver
> > which in turn is used to calculate the sampling rate.
> 
> Naah, that isn't what I was looking for, sorry :(
> 
> To explain it a bit more, this is what the patch did.
> 
> Suppose your platform supports frequencies F1 (lowest), F2, F3, F4,
> F5, F6 and F7 (highest). Based on its sampling rate and the system
> load, the cpufreq governor (ondemand) tries to change the frequency
> of the underlying hardware by selecting one of those.
> 
> Before the original patch came in, F2 and F3 were never getting
> selected and the system was stuck in F1 for a long time, which
> decreased performance for that period, as we really should have
> switched to a higher frequency.
> 
> With the new patch we switch to a frequency proportional to the
> current load.
> 
> The requests made by the cpufreq governor wouldn't have changed at
> all with that, but the number of frequency changes at the hardware
> level may actually change, since in the earlier case we might have
> returned very quickly whenever the target frequency evaluated to F1.
> 
> That may lead to switching to frequencies F2 and F3, which wasn't
> happening earlier.
> 
> I don't think the result of that should be too bad. We should have
> performed better almost everywhere.
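
Just to confirm my understanding of what commit 6393d6a changed: in
code terms the old and new load-to-frequency mappings boil down to
roughly the following. This is a simplified sketch using the frequency
range of my test system (1200000..2800000 kHz), not the literal
ondemand code:

    /* Old mapping: proportional to max_f only. Everything below
     * min_f/max_f (~43% load here) falls below min_f and gets clamped
     * to min_f -- the "deadband" where the machine sits at the lowest
     * frequency. */
    static unsigned int map_load_old(unsigned int load,   /* 0..100 */
                                     unsigned int min_f,  /* 1200000 kHz */
                                     unsigned int max_f)  /* 2800000 kHz */
    {
            unsigned int freq = load * max_f / 100;

            return freq < min_f ? min_f : freq;
    }

    /* New mapping: scale the load across the full min_f..max_f range,
     * so even small loads map to a target slightly above min_f. */
    static unsigned int map_load_new(unsigned int load,
                                     unsigned int min_f,
                                     unsigned int max_f)
    {
            return min_f + load * (max_f - min_f) / 100;
    }

If that reading is correct, the new mapping turns many small load
fluctuations that previously all collapsed onto the hardware minimum
into distinct target frequencies, which fits the increased number of
transitions we see with pcc-cpufreq.
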
> 
> transition_latency is used only to initialize sampling rate, but that
> may get modified from userspace later on. Please tell us the value
> read from sysfs file sampling_rate present in:
> 
> /sys/devices/system/cpu/cpufreq/policy0/ondemand/

/sys/devices/system/cpu/cpufreq/ondemand contains the following values:

ignore_nice_load           : 0
io_is_busy                 : 1
min_sampling_rate          : 10000
powersave_bias             : 0
up_threshold               : 95
sampling_rate              : 10000
sampling_down_factor       : 1

FYI, for each CPU the directories /sys/devices/system/cpu/cpufreq/policy* look like this:

scaling_cur_freq           : 1200000
cpuinfo_max_freq           : 2800000
cpuinfo_transition_latency : 0
cpuinfo_min_freq           : 1200000
cpuinfo_cur_freq           : 1204000
affected_cpus              : 0
scaling_governor           : ondemand
scaling_driver             : pcc-cpufreq
scaling_max_freq           : 2800000
related_cpus               : 0
scaling_setspeed           : <unsupported>
scaling_available_governors: ondemand performance 
scaling_min_freq           : 1200000

(the values shown are for policy0, i.e. CPU0)
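
Note that cpuinfo_transition_latency is 0 here. If I read
cpufreq_dbs_governor_init() in drivers/cpufreq/cpufreq_governor.c
correctly, the default sampling rate is derived from the transition
latency roughly as follows (a simplified sketch from memory, not the
verbatim kernel code), which would explain why sampling_rate simply
ends up at min_sampling_rate = 10000:

    #define LATENCY_MULTIPLIER 1000

    /* transition_latency_ns comes from the driver (cpuinfo_transition_latency),
     * min_sampling_rate_us is the governor's lower bound (10000 us here). */
    static unsigned int default_sampling_rate(unsigned int transition_latency_ns,
                                              unsigned int min_sampling_rate_us)
    {
            /* latency is reported in ns; convert to us first */
            unsigned int latency_us = transition_latency_ns / 1000;

            if (latency_us == 0)
                    latency_us = 1;

            /* 0 ns latency -> max(10000, 1000) = 10000 us, as seen above.
             * A 300000 ns "nominal latency" would give 300000 us (300 ms),
             * and 45000 ns would give 45000 us (45 ms). */
            return latency_us * LATENCY_MULTIPLIER > min_sampling_rate_us ?
                    latency_us * LATENCY_MULTIPLIER : min_sampling_rate_us;
    }

If that reading is right, the transition_latency values (in ns)
mentioned in this thread map numerically onto sampling_rate values in
us, so varying sampling_rate directly (as in the tables below) covers
the same ground.
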

> And try playing with that variable a bit to see if you can make things
> better or not.

Done that. Below are the elapsed times with the ondemand governor and
different values for the sampling rate, for each of 4.8-rc5 vanilla,
4.8-rc5 with my patch, and 4.8-rc5 with commit 6393d6a reverted.

4.8-rc5 (vanilla)
                                             sampling_rate (usec)
           10000            20000            30000            45000            60000            90000
jobs  seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU
2     283.47   203.00  273.79   204.00  276.08   204.00  276.99   204.00  281.80   203.60  291.00   203.00 
4     146.92   399.20  140.98   399.80  141.55   399.80  140.91   400.00  142.91   400.00  146.49   399.80 
8      81.66   770.00   77.44   769.20   76.81   769.20   73.74   769.20   73.17   767.20   72.75   767.00 
16     54.31  1466.20   49.58  1458.40   48.37  1456.60   44.29  1447.80   42.89  1436.40   41.12  1416.80
32     36.71  2720.60   30.70  2672.80   29.22  2578.80   26.34  2563.20   25.57  2506.80   24.58  2457.80
64     13.99  3291.20   13.63  3295.40   13.55  3309.20   13.30  3354.60   13.66  3264.40   13.55  3288.60
120    14.84  3114.20   14.10  3215.00   14.57  3115.20   14.11  3209.80   14.17  3198.60   14.09  3220.80
------------------------------
4.8-rc5 with deadband re-introduced for pcc-cpufreq only (my patch)
                                             sampling_rate (usec)
           10000            20000            30000            45000            60000            90000
jobs  seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU
2     270.88   192.00  270.87   192.80  270.65   192.00  274.08   192.00  277.97   192.80  289.85   193.00 
4     140.23   379.20  138.94   379.80  138.20   378.80  138.99   379.20  140.81   379.60  145.83   380.40 
8      76.15   738.20   73.86   733.60   71.58   734.20   70.43   730.40   70.32   730.60   71.14   728.20 
16     47.39  1414.80   44.20  1395.60   43.02  1384.60   41.09  1368.00   40.37  1362.80   39.56  1351.20
32     32.78  2659.80   29.05  2531.80   27.18  2482.40   25.36  2471.60   24.55  2441.00   24.18  2365.80
64     13.34  3213.20   13.29  3204.60   13.38  3158.60   13.30  3162.40   13.57  3089.60   13.48  3125.20
120    13.89  3103.00   14.00  3047.40   13.87  3068.40   13.83  3077.80   13.93  3054.80   14.10  3033.60
------------------------------
4.8-rc5 with commit 6393d6a reverted
                                             sampling_rate (usec)
           10000            20000            30000            45000            60000            90000
jobs  seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU   seconds  % CPU
2     250.94   205.00  256.26   205.00  261.29   205.00  268.92   205.00  275.60   204.80  288.64   204.00 
4     127.20   401.20  129.63   401.80  131.80   402.00  135.41   401.20  138.88   401.20  144.96   401.00 
8      64.24   767.60   64.18   769.60   64.49   770.20   65.69   766.00   66.38   768.60   68.64   767.40 
16     37.02  1438.00   36.09  1432.20   36.10  1429.40   36.10  1424.60   36.61  1408.80   36.87  1405.60
32     24.73  2618.80   23.28  2533.80   23.00  2513.40   22.45  2526.00   22.92  2463.00   23.12  2422.20
64     12.62  3550.60   12.94  3439.60   12.95  3425.60   12.94  3421.80   13.29  3336.80   13.65  3257.80
120    13.30  3408.80   13.43  3346.40   13.44  3347.80   13.55  3322.60   13.76  3277.20   14.19  3192.40

> > I started with the value returned as "nominal latency" for PCC.  This
> > was 300000 ns on the test system and made things worse.  I've tested
> > other values as well until I found a local optimum at 45000 ns, but
> > performance was lower in comparison to when I applied my hack.
> 
> Can you try to use the kernel tracer (ftrace) and see how the
> frequencies are getting changed and how often?

I am currently looking into this ...

> We need to understand the problem better, as I am not 100% sure what's
> going on right now.

... to gather related data.

> > > Also, it is important to understand why the performance is going
> > > down, while the original commit should have made it better.
> > 
> > My understanding is that the original commit was tested with certain
> > combinations of hardware and cpufreq-drivers and the claim was that
> > for those (two?) tested combinations performance increased and power
> > consumption was lower. So I am not so sure what to expect from all
> > other cpufreq-driver/hardware combinations.
> 
> It was in principle the right thing to do IMO. And I don't think any
> other hardware should get affected badly. At most, the tuning needs
> to be made a bit better.
> 
> > > Is it only about more transitions ?
> > 
> > I think this is the main issue.
> 
> Then it can be controlled with sampling rate from userspace.
> 
> > In an older kernel version I activated/added debug output in
> > __cpufreq_driver_target(). Of course this creates a huge amount of
> > messages. But with the original patch reverted it was like:
> > 
> > [   29.489677] cpufreq: target for CPU 0: 1760000 kHz (1200000 kHz), relation 2, requested 1760000 kHz
> > [   29.570364] cpufreq: target for CPU 0: 1216000 kHz (1760000 kHz), relation 2, requested 1216000 kHz
> > [   29.571055] cpufreq: target for CPU 1: 1200000 kHz (1148000 kHz), relation 0, requested 1200000 kHz
> > [   29.571483] cpufreq: target for CPU 1: 1200000 kHz (1200000 kHz), relation 2, requested 1200000 kHz
> > [   29.572042] cpufreq: target for CPU 2: 1200000 kHz (1064000 kHz), relation 0, requested 1200000 kHz
> > [   29.572503] cpufreq: target for CPU 2: 1200000 kHz (1200000 kHz), relation 2, requested 1200000 kHz
> 
> Your platform is a bit special as it uses the ->target() callback and
> not ->target_index(), so it can pretty much switch to any frequency.
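
For reference (the two callback flavours as I understand them from
include/linux/cpufreq.h; signatures quoted from memory, comments mine):

    /* Table-based drivers implement ->target_index() and can only jump
     * to one of the discrete entries of their frequency table. */
    int (*target_index)(struct cpufreq_policy *policy, unsigned int index);

    /* pcc-cpufreq implements ->target() instead: it receives the raw
     * target frequency plus a relation and can therefore request almost
     * any value between scaling_min_freq and scaling_max_freq -- which is
     * why the proportional mapping produces so many distinct hardware
     * frequencies here. */
    int (*target)(struct cpufreq_policy *policy, unsigned int target_freq,
                  unsigned int relation);
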
> 
> Can you please read the values of all the sysfs files present in the
> governor directory? That would be helpful. Maybe we can play with
> some more files, like up_threshold, to see what the results are.

For sysfs values see above. I've not yet played with up_threshold but
plan to do this as well.

> > a lot of stuff, but the system could still handle it and booted to the
> > prompt.
> > 
> > With the original patch applied the system was really flooded and
> > eventually became unresponsive:
> > 
> > ** 459 printk messages dropped ** [   29.838689] cpufreq: target for CPU 43: 1408000 kHz (2384000 kHz), relation 2, requested 1408000 kHz
> > ** 480 printk messages dropped ** [   29.993849] cpufreq: target for CPU 54: 1200000 kHz (1248000 kHz), relation 2, requested 1200000 kHz
> > ** 413 printk messages dropped ** [   30.113921] cpufreq: target for CPU 59: 2064000 kHz (1248000 kHz), relation 2, requested 2064000 kHz
> > ** 437 printk messages dropped ** [   30.245846] cpufreq: target for CPU 21: 1296000 kHz (1296000 kHz), relation 2, requested 1296000 kHz
> > ** 435 printk messages dropped ** [   30.397748] cpufreq: target for CPU 13: 1280000 kHz (2640000 kHz), relation 2, requested 1280000 kHz
> > ** 480 printk messages dropped ** [   30.541846] cpufreq: target for CPU 58: 2112000 kHz (1632000 kHz), relation 2, requested 2112000 kHz
> 
> This looks even more dangerous :)
> 
> -- 
> viresh


Thanks,

Andreas
