regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit")
@ 2016-06-17  8:30 Jisheng Zhang
  2016-06-17  8:40 ` Jisheng Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Jisheng Zhang @ 2016-06-17  8:30 UTC (permalink / raw)
  To: rjw, viresh.kumar, linux-pm, linux-kernel

Dear all,

I found one regression: In an idle system, wakeups/s (reported by powertop)
is increased a lot, e.g on a intel snb 4 core platform, the wakeup event
number is increased from 8 wakeups/s to 24 wakeup/s. bisect points to
this commit. I could send detailed bisect log if it's wanted.

Thanks,
Jisheng

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit")
  2016-06-17  8:30 regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit") Jisheng Zhang
@ 2016-06-17  8:40 ` Jisheng Zhang
  2016-06-17 13:09   ` Rafael J. Wysocki
  0 siblings, 1 reply; 17+ messages in thread
From: Jisheng Zhang @ 2016-06-17  8:40 UTC (permalink / raw)
  To: rjw, viresh.kumar, linux-pm, linux-kernel

Dear all,

On Fri, 17 Jun 2016 16:30:23 +0800 Jisheng Zhang wrote:

> Dear all,
> 
> I found one regression: In an idle system, wakeups/s (reported by powertop)
> is increased a lot, e.g on a intel snb 4 core platform, the wakeup event
> number is increased from 8 wakeups/s to 24 wakeup/s. bisect points to
> this commit. I could send detailed bisect log if it's wanted.
> 

more information maybe useful: after the commit, the top two wakeup source
are

        Process        [rcu_sched]

        Timer          tick_sched_timer

thanks

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit")
  2016-06-17  8:40 ` Jisheng Zhang
@ 2016-06-17 13:09   ` Rafael J. Wysocki
  2016-06-17 13:16     ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2016-06-17 13:09 UTC (permalink / raw)
  To: Jisheng Zhang, Paul McKenney, Peter Zijlstra
  Cc: Rafael J. Wysocki, Viresh Kumar, linux-pm, Linux Kernel Mailing List

On Fri, Jun 17, 2016 at 10:40 AM, Jisheng Zhang <jszhang@marvell.com> wrote:
> Dear all,
>
> On Fri, 17 Jun 2016 16:30:23 +0800 Jisheng Zhang wrote:
>
>> Dear all,
>>
>> I found one regression: In an idle system, wakeups/s (reported by powertop)
>> is increased a lot, e.g on a intel snb 4 core platform, the wakeup event
>> number is increased from 8 wakeups/s to 24 wakeup/s. bisect points to
>> this commit. I could send detailed bisect log if it's wanted.
>>
>
> more information maybe useful: after the commit, the top two wakeup source
> are
>
>         Process        [rcu_sched]
>
>         Timer          tick_sched_timer

And what was there before the commit?

Granted, I'm not seeing this on my systems.

Paul, Peter, any ideas about what may be going on here?

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit")
  2016-06-17 13:09   ` Rafael J. Wysocki
@ 2016-06-17 13:16     ` Paul E. McKenney
  2016-06-17 14:03       ` Peter Zijlstra
  0 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2016-06-17 13:16 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Jisheng Zhang, Peter Zijlstra, Rafael J. Wysocki, Viresh Kumar,
	linux-pm, Linux Kernel Mailing List

On Fri, Jun 17, 2016 at 03:09:36PM +0200, Rafael J. Wysocki wrote:
> On Fri, Jun 17, 2016 at 10:40 AM, Jisheng Zhang <jszhang@marvell.com> wrote:
> > Dear all,
> >
> > On Fri, 17 Jun 2016 16:30:23 +0800 Jisheng Zhang wrote:
> >
> >> Dear all,
> >>
> >> I found one regression: In an idle system, wakeups/s (reported by powertop)
> >> is increased a lot, e.g on a intel snb 4 core platform, the wakeup event
> >> number is increased from 8 wakeups/s to 24 wakeup/s. bisect points to
> >> this commit. I could send detailed bisect log if it's wanted.
> >>
> >
> > more information maybe useful: after the commit, the top two wakeup source
> > are
> >
> >         Process        [rcu_sched]
> >
> >         Timer          tick_sched_timer
> 
> And what was there before the commit?
> 
> Granted, I'm not seeing this on my systems.
> 
> Paul, Peter, any ideas about what may be going on here?

Looks to me like this commit moved some code from synchronize_rcu() to
synchronize_sched().  Assuming that this is a CONFIG_PREEMPT=y system,
might there have been a decrease in the wakeups from the rcu_preempt
kthread?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit")
  2016-06-17 13:16     ` Paul E. McKenney
@ 2016-06-17 14:03       ` Peter Zijlstra
  2016-06-17 15:32         ` regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early") Jisheng Zhang
  2016-06-17 15:32         ` regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit") Rafael J. Wysocki
  0 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2016-06-17 14:03 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Rafael J. Wysocki, Jisheng Zhang, Rafael J. Wysocki,
	Viresh Kumar, linux-pm, Linux Kernel Mailing List

On Fri, Jun 17, 2016 at 06:16:51AM -0700, Paul E. McKenney wrote:
> > Paul, Peter, any ideas about what may be going on here?
> 
> Looks to me like this commit moved some code from synchronize_rcu() to
> synchronize_sched().  Assuming that this is a CONFIG_PREEMPT=y system,
> might there have been a decrease in the wakeups from the rcu_preempt
> kthread?

The 'funny' thing is though; those synchronize thingies are only reached
when we change cpufreq policy, so things like:

  for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done

Something which is hardly possible when idle. Weird.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-17 14:03       ` Peter Zijlstra
@ 2016-06-17 15:32         ` Jisheng Zhang
  2016-06-17 15:40           ` Rafael J. Wysocki
  2016-06-23 16:07           ` Doug Smythies
  2016-06-17 15:32         ` regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit") Rafael J. Wysocki
  1 sibling, 2 replies; 17+ messages in thread
From: Jisheng Zhang @ 2016-06-17 15:32 UTC (permalink / raw)
  To: Peter Zijlstra, Paul E. McKenney
  Cc: Rafael J. Wysocki, Rafael J. Wysocki, Viresh Kumar, linux-pm,
	Linux Kernel Mailing List

Hi all,

First of all, sorry for top post, only webmail is available now.

Second, sorry again for report incorrect commit, I were too tired this morning so I remember the wrong commit.  The regression is caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early"), so I update the email title.

here is the bisect log:

# good: [9735a22799b9214d17d3c231fe377fc852f042e9] Linux 4.6-rc2
git bisect good 9735a22799b9214d17d3c231fe377fc852f042e9

# bad: [bf16200689118d19de1b8d2a3c314fc21f5dc7bb] Linux 4.6-rc3
git bisect bad bf16200689118d19de1b8d2a3c314fc21f5dc7bb

# good: [839a3f765728cdca0057a12e2dc0bf669ac1c22e] Merge branch 'for-linus-4.6' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
git bisect good 839a3f765728cdca0057a12e2dc0bf669ac1c22e

# bad: [63b106a87dd84283e21aa2ce476732633eaab11d] Merge tag 'md/4.6-rc2-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
git bisect bad 63b106a87dd84283e21aa2ce476732633eaab11d

# good: [30d237a6c2e9be1bb816fe8e787b88fd7aad833b] Merge tag 'mac80211-for-davem-2016-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
git bisect good 30d237a6c2e9be1bb816fe8e787b88fd7aad833b

# bad: [fa81e66ec8648f62e96e95e53db2ea95a4b57b26] Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'acpi-cppc'
git bisect bad fa81e66ec8648f62e96e95e53db2ea95a4b57b26

# good: [08820546e4c30c84d0a1f1a49df055e1719c07ea] intel_idle: Propagate hot plug errors.
git bisect good 08820546e4c30c84d0a1f1a49df055e1719c07ea

# bad: [b318556479cc923970a79d6c2311138581c0db83] cpufreq: dt: Drop stale comment
git bisect bad b318556479cc923970a79d6c2311138581c0db83

# bad: [febce40febcff3ccdb33f63456ffc4cfc61640c8] intel_pstate: Avoid extra invocation of intel_pstate_sample()
git bisect bad febce40febcff3ccdb33f63456ffc4cfc61640c8

# bad: [bb6ab52f2befe1fb29ac198f27d8a6aadf510f81] intel_pstate: Do not set utilization update hook too early
git bisect bad bb6ab52f2befe1fb29ac198f27d8a6aadf510f81

# first bad commit: [bb6ab52f2befe1fb29ac198f27d8a6aadf510f81] intel_pstate: Do not set utilization update hook too early


________________________________________
From: Peter Zijlstra [peterz@infradead.org]
Sent: Friday, June 17, 2016 22:03
To: Paul E. McKenney
Cc: Rafael J. Wysocki; Jisheng Zhang; Rafael J. Wysocki; Viresh Kumar; linux-pm@vger.kernel.org; Linux Kernel Mailing List
Subject: Re: regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit")

On Fri, Jun 17, 2016 at 06:16:51AM -0700, Paul E. McKenney wrote:
> > Paul, Peter, any ideas about what may be going on here?
>
> Looks to me like this commit moved some code from synchronize_rcu() to
> synchronize_sched().  Assuming that this is a CONFIG_PREEMPT=y system,
> might there have been a decrease in the wakeups from the rcu_preempt
> kthread?

The 'funny' thing is though; those synchronize thingies are only reached
when we change cpufreq policy, so things like:

  for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done

Something which is hardly possible when idle. Weird.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit")
  2016-06-17 14:03       ` Peter Zijlstra
  2016-06-17 15:32         ` regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early") Jisheng Zhang
@ 2016-06-17 15:32         ` Rafael J. Wysocki
  1 sibling, 0 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2016-06-17 15:32 UTC (permalink / raw)
  To: Peter Zijlstra, Jisheng Zhang
  Cc: Paul E. McKenney, Rafael J. Wysocki, Rafael J. Wysocki,
	Viresh Kumar, linux-pm, Linux Kernel Mailing List

On Fri, Jun 17, 2016 at 4:03 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jun 17, 2016 at 06:16:51AM -0700, Paul E. McKenney wrote:
>> > Paul, Peter, any ideas about what may be going on here?
>>
>> Looks to me like this commit moved some code from synchronize_rcu() to
>> synchronize_sched().  Assuming that this is a CONFIG_PREEMPT=y system,
>> might there have been a decrease in the wakeups from the rcu_preempt
>> kthread?
>
> The 'funny' thing is though; those synchronize thingies are only reached
> when we change cpufreq policy, so things like:
>
>   for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i ; done

Or if you change min or max for a policy and the driver is intel_pstate.

> Something which is hardly possible when idle. Weird.

Well, exactly.

Jisheng, do you use intel_pstate or acpi-cpufreq as the scaling driver?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-17 15:32         ` regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early") Jisheng Zhang
@ 2016-06-17 15:40           ` Rafael J. Wysocki
  2016-06-17 15:42             ` Rafael J. Wysocki
  2016-06-23 16:07           ` Doug Smythies
  1 sibling, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2016-06-17 15:40 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Peter Zijlstra, Paul E. McKenney, Rafael J. Wysocki,
	Rafael J. Wysocki, Viresh Kumar, linux-pm,
	Linux Kernel Mailing List

On Fri, Jun 17, 2016 at 5:32 PM, Jisheng Zhang <jszhang@marvell.com> wrote:
> Hi all,
>
> First of all, sorry for top post, only webmail is available now.
>
> Second, sorry again for report incorrect commit, I were too tired this morning so I remember the wrong commit.  The regression is caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early"), so I update the email title.

OK, that makes much more sense. :-)

And

4578ee7e1def intel_pstate: Avoid unnecessary synchronize_sched()
during initialization

is not sufficient I suppose?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-17 15:40           ` Rafael J. Wysocki
@ 2016-06-17 15:42             ` Rafael J. Wysocki
  2016-06-17 15:53               ` Jisheng Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Rafael J. Wysocki @ 2016-06-17 15:42 UTC (permalink / raw)
  To: Jisheng Zhang
  Cc: Peter Zijlstra, Paul E. McKenney, Rafael J. Wysocki,
	Viresh Kumar, linux-pm, Linux Kernel Mailing List

On Fri, Jun 17, 2016 at 5:40 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Fri, Jun 17, 2016 at 5:32 PM, Jisheng Zhang <jszhang@marvell.com> wrote:
>> Hi all,
>>
>> First of all, sorry for top post, only webmail is available now.
>>
>> Second, sorry again for report incorrect commit, I were too tired this morning so I remember the wrong commit.  The regression is caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early"), so I update the email title.
>
> OK, that makes much more sense. :-)
>
> And
>
> 4578ee7e1def intel_pstate: Avoid unnecessary synchronize_sched()
> during initialization
>
> is not sufficient I suppose?

I mean, it is not sufficient to reduce the number of wakeups again?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-17 15:42             ` Rafael J. Wysocki
@ 2016-06-17 15:53               ` Jisheng Zhang
  2016-06-17 16:09                 ` Jisheng Zhang
  0 siblings, 1 reply; 17+ messages in thread
From: Jisheng Zhang @ 2016-06-17 15:53 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Paul E. McKenney, Rafael J. Wysocki,
	Viresh Kumar, linux-pm, Linux Kernel Mailing List

Dear Rafael,

I used intel_pstate.

I just tested v4.7-rc3, which should include commit 4578ee7e1def intel_pstate: Avoid
unnecessary synchronize_sched() during initialization, I can still get the same 
wakeups, so it's not sufficient.

Another clue maybe helpful, I found these top2 wakeup/s is consistent, e.g
rcu_sched always gives about 10 wakeups/s, and tick_sched_timer gave
5.6-5.9 wakeups/s, 

            144.3 µs/s      10.0        Process        [rcu_sched]

            193.9 µs/s       5.7        Timer          tick_sched_timer

Thanks,
Jisheng

________________________________________
From:  Rafael J. Wysocki 
Sent: Friday, June 17, 2016 23:42
To: Jisheng Zhang
Cc: Peter Zijlstra; Paul E. McKenney; Rafael J. Wysocki; Viresh Kumar; linux-pm@vger.kernel.org; Linux Kernel Mailing List
Subject: Re: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")

On Fri, Jun 17, 2016 at 5:40 PM, Rafael J. Wysocki  wrote:
> On Fri, Jun 17, 2016 at 5:32 PM, Jisheng Zhang  wrote:
>> Hi all,
>>
>> First of all, sorry for top post, only webmail is available now.
>>
>> Second, sorry again for report incorrect commit, I were too tired this morning so I remember the wrong commit.  The regression is caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early"), so I update the email title.
>
> OK, that makes much more sense. :-)
>
> And
>
> 4578ee7e1def intel_pstate: Avoid unnecessary synchronize_sched()
> during initialization
>
> is not sufficient I suppose?

I mean, it is not sufficient to reduce the number of wakeups again?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-17 15:53               ` Jisheng Zhang
@ 2016-06-17 16:09                 ` Jisheng Zhang
  2016-06-25  0:14                   ` Rafael J. Wysocki
  0 siblings, 1 reply; 17+ messages in thread
From: Jisheng Zhang @ 2016-06-17 16:09 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Peter Zijlstra, Paul E. McKenney, Rafael J. Wysocki,
	Viresh Kumar, linux-pm, Linux Kernel Mailing List

Dear all,

If using acpi-cpufreq instead, v4.6, v4.6-rc3, v4.7-rc3 can't reproduce the issue. It seems
only intel_pstate is impacted.

Thanks,
Jisheng
________________________________________
From: Jisheng Zhang
Sent: Friday, June 17, 2016 23:53
To: Rafael J. Wysocki
Cc: Peter Zijlstra; Paul E. McKenney; Rafael J. Wysocki; Viresh Kumar; linux-pm@vger.kernel.org; Linux Kernel Mailing List
Subject: RE: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")

Dear Rafael,

I used intel_pstate.

I just tested v4.7-rc3, which should include commit 4578ee7e1def intel_pstate: Avoid
unnecessary synchronize_sched() during initialization, I can still get the same
wakeups, so it's not sufficient.

Another clue maybe helpful, I found these top2 wakeup/s is consistent, e.g
rcu_sched always gives about 10 wakeups/s, and tick_sched_timer gave
5.6-5.9 wakeups/s,

            144.3 µs/s      10.0        Process        [rcu_sched]

            193.9 µs/s       5.7        Timer          tick_sched_timer

Thanks,
Jisheng

________________________________________
From:  Rafael J. Wysocki
Sent: Friday, June 17, 2016 23:42
To: Jisheng Zhang
Cc: Peter Zijlstra; Paul E. McKenney; Rafael J. Wysocki; Viresh Kumar; linux-pm@vger.kernel.org; Linux Kernel Mailing List
Subject: Re: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")

On Fri, Jun 17, 2016 at 5:40 PM, Rafael J. Wysocki  wrote:
> On Fri, Jun 17, 2016 at 5:32 PM, Jisheng Zhang  wrote:
>> Hi all,
>>
>> First of all, sorry for top post, only webmail is available now.
>>
>> Second, sorry again for report incorrect commit, I were too tired this morning so I remember the wrong commit.  The regression is caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early"), so I update the email title.
>
> OK, that makes much more sense. :-)
>
> And
>
> 4578ee7e1def intel_pstate: Avoid unnecessary synchronize_sched()
> during initialization
>
> is not sufficient I suppose?

I mean, it is not sufficient to reduce the number of wakeups again?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-17 15:32         ` regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early") Jisheng Zhang
  2016-06-17 15:40           ` Rafael J. Wysocki
@ 2016-06-23 16:07           ` Doug Smythies
  2016-06-25  0:03             ` Rafael J. Wysocki
  1 sibling, 1 reply; 17+ messages in thread
From: Doug Smythies @ 2016-06-23 16:07 UTC (permalink / raw)
  To: 'Jisheng Zhang', 'Rafael J. Wysocki',
	'Rafael J. Wysocki'
  Cc: 'Viresh Kumar',
	linux-pm, 'Linux Kernel Mailing List',
	'Peter Zijlstra', 'Paul E. McKenney'

On 2016.06.17 08:32 Jisheng Zhang wrote:

> Second, sorry again for report incorrect commit, 
> I were too tired this morning so I remember the wrong commit.
> The regression is caused by bb6ab52f2bef ("intel_pstate: Do not
> set utilization update hook too early"), so I update the email title.

Summary:

Isn't running powertop itself a significant contributor here?

When running powertop, the system is no longer "idle".
Powertop has a significant effect on a system that achieves such a quite "idle state".
If one uses a kernel that includes the later commit:
"intel_pstate: Avoid extra invocation of intel_pstate_sample()"
then there are many skipped samples when running powertop.

To some extent, some increase in wakeups per second, is what was supposed to
happen with the recent changes, and is what solved the very long standing issue
of failing to raise the CPU frequency when high enough periodic loads just so
happened to be idle on jiffy boundaries. Admittedly, I am somewhat confusing
wakeups per second with number of passes through the intel_pstate driver
per second here.

Details:

Note 1: On my system (server, no GUI), I could only get down to an "idle" of around
20 wakeups / second, after turning off Samba, mysql, apache, cron.

Using kernel 4.7-rc3 (and then later 4.7-rc4), I observed significant impact when running powertop.
(I also isolated this to the same commit, bb6ab52f2bef)
As a function of sample time:
Sample time (seconds)	Events/second
300				~20
200				~20
100				~22
 50				~29
 30				~33
 20				~44
  3				~155

Instead of using powertop, if I observe timer stats manually I get ~20 wakeups / second.
If I use an older kernel (4.4 ish) I get ~20 wakeups per second.

If I leave mysql running, chosen because while idling it has a fairly high number
of events per second, no increase in wakeups per second is observed before or after the commit
in question here (it is always around 25 Events / second).

Using a kernel that includes the commit
"intel_pstate: Avoid extra invocation of intel_pstate_sample()"
and running powertop with a sample interval of 5 seconds, just to increase
the number of events per second, then intel_pstate samples are skipped at
a rate of about 200 per minute, and some trace data makes no sense at all. Besides the
skipped samples, sometimes the new pstate is set to maximum for no apparent reason.
It seems that powertop messes with the minimum frequency, sometimes setting it
to the maximum turbo value (presumably at a rate of about 100 per minute,
causing one skip when setting it high and one skip when it is restored) Example:

The following script was run while running powertop:

#!/bin/dash

echo "watch minimum frequency. Doug Smythies 2016.06.23"
echo "I think powertop messes with the minimum frequency. Try to catch it."
echo "If I use watch, with -g, it exits but doesn't leave the display showing the numbers."
echo
  cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq > watch_min
while [ 1 ];
do
  echo "." >> watch_min
# Don't hog too much CPU, even though it makes it harder to get a hit
  sleep 0.2
  cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq >> watch_min
done

In the resulting file several occurrences similar to the below are observed:

.
1600000
3800000   <<<<<<<<<< ????
1600000
1600000
1600000
1600000
1600000
1600000
.
1600000
1600000
1600000
1600000
1600000
1600000
1600000
1600000
.

On my, otherwise idle, system package power goes up by about 1 watt while running powertop
(with a sample time of 5 seconds) (from ~3.9 to ~4.9 watts).

> here is the bisect log:
>
> # good: [9735a22799b9214d17d3c231fe377fc852f042e9] Linux 4.6-rc2
> git bisect good 9735a22799b9214d17d3c231fe377fc852f042e9
> # bad: [bf16200689118d19de1b8d2a3c314fc21f5dc7bb] Linux 4.6-rc3
> git bisect bad bf16200689118d19de1b8d2a3c314fc21f5dc7bb
> # good: [839a3f765728cdca0057a12e2dc0bf669ac1c22e] Merge branch 'for-linus-4.6' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
> git bisect good 839a3f765728cdca0057a12e2dc0bf669ac1c22e
> # bad: [63b106a87dd84283e21aa2ce476732633eaab11d] Merge tag 'md/4.6-rc2-fix' of
git://git.kernel.org/pub/scm/linux/kernel/git/shli/md
> git bisect bad 63b106a87dd84283e21aa2ce476732633eaab11d
> # good: [30d237a6c2e9be1bb816fe8e787b88fd7aad833b] Merge tag 'mac80211-for-davem-2016-04-06' of
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
> git bisect good 30d237a6c2e9be1bb816fe8e787b88fd7aad833b
> # bad: [fa81e66ec8648f62e96e95e53db2ea95a4b57b26] Merge branches 'pm-cpufreq', 'pm-cpuidle' and 'acpi-cppc'
> git bisect bad fa81e66ec8648f62e96e95e53db2ea95a4b57b26
> # good: [08820546e4c30c84d0a1f1a49df055e1719c07ea] intel_idle: Propagate hot plug errors.
> git bisect good 08820546e4c30c84d0a1f1a49df055e1719c07ea
> # bad: [b318556479cc923970a79d6c2311138581c0db83] cpufreq: dt: Drop stale comment
> git bisect bad b318556479cc923970a79d6c2311138581c0db83
> # bad: [febce40febcff3ccdb33f63456ffc4cfc61640c8] intel_pstate: Avoid extra invocation of intel_pstate_sample()
> git bisect bad febce40febcff3ccdb33f63456ffc4cfc61640c8
> # bad: [bb6ab52f2befe1fb29ac198f27d8a6aadf510f81] intel_pstate: Do not set utilization update hook too early
> git bisect bad bb6ab52f2befe1fb29ac198f27d8a6aadf510f81
> # first bad commit: [bb6ab52f2befe1fb29ac198f27d8a6aadf510f81] intel_pstate: Do not set utilization update hook too early

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-23 16:07           ` Doug Smythies
@ 2016-06-25  0:03             ` Rafael J. Wysocki
  0 siblings, 0 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2016-06-25  0:03 UTC (permalink / raw)
  To: Doug Smythies
  Cc: Jisheng Zhang, Rafael J. Wysocki, Rafael J. Wysocki,
	Viresh Kumar, linux-pm, Linux Kernel Mailing List,
	Peter Zijlstra, Paul E. McKenney

On Thu, Jun 23, 2016 at 6:07 PM, Doug Smythies <dsmythies@telus.net> wrote:
> On 2016.06.17 08:32 Jisheng Zhang wrote:
>
>> Second, sorry again for report incorrect commit,
>> I were too tired this morning so I remember the wrong commit.
>> The regression is caused by bb6ab52f2bef ("intel_pstate: Do not
>> set utilization update hook too early"), so I update the email title.
>
> Summary:
>
> Isn't running powertop itself a significant contributor here?

It shouldn't be, because the overhead in question comes from
->setpolicy and powertop itself should not cause it to be invoked that
often.

Thanks,
Rafael

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-17 16:09                 ` Jisheng Zhang
@ 2016-06-25  0:14                   ` Rafael J. Wysocki
  2016-06-25 15:09                     ` Pandruvada, Srinivas
                                       ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Rafael J. Wysocki @ 2016-06-25  0:14 UTC (permalink / raw)
  To: Jisheng Zhang, Viresh Kumar
  Cc: Rafael J. Wysocki, linux-pm, Linux Kernel Mailing List,
	Srinivas Pandruvada

On Friday, June 17, 2016 04:09:33 PM Jisheng Zhang wrote:
> Dear all,
> 
> If using acpi-cpufreq instead, v4.6, v4.6-rc3, v4.7-rc3 can't reproduce the issue. It seems
> only intel_pstate is impacted.

Which is quite obvious, since the commit your bisection led to was
intel_pstate-specific. :-)

If the issue is what I'm thinking it is, the patch below should help, so
can you please test it?

Thanks,
Rafael

---
From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Subject: [PATCH] intel_pstate: Do not clear utilization update hooks on policy changes

intel_pstate_set_policy() is invoked by the cpufreq core during
driver initialization, on changes of policy attributes (minimim and
maximum frequency, for example) via sysfs and via CPU notifications
from the platform firmware.  On some platforms the latter may occur
relatively often.

Commit bb6ab52f2bef (intel_pstate: Do not set utilization update hook
too early) made intel_pstate_set_policy() clear the CPU's utilization
update hook before updating the policy attributes for it (and set the
hook again after doind that), but that involves invoking
synchronize_sched() and adds overhead to the CPU notifications
mentioned above and to the sched-RCU handling in general.

That extra overhead is arguably not necessary, because updating
policy attributes when the CPU's utilization update hook is active
should not lead to any adverse effects, so drop the clearing of
the hook from intel_pstate_set_policy() and make it check if
the hook has been set already when attempting to set it.

Fixes: bb6ab52f2bef (intel_pstate: Do not set utilization update hook too early)
Reported-by: Jisheng Zhang <jszhang@marvell.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
---
 drivers/cpufreq/intel_pstate.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Index: linux-pm/drivers/cpufreq/intel_pstate.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/intel_pstate.c
+++ linux-pm/drivers/cpufreq/intel_pstate.c
@@ -1440,6 +1440,9 @@ static void intel_pstate_set_update_util
 {
 	struct cpudata *cpu = all_cpu_data[cpu_num];

+	if (cpu->update_util_set)
+		return;
+
 	/* Prevent intel_pstate_update_util() from using stale data. */
 	cpu->sample.time = 0;
 	cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
@@ -1480,8 +1483,6 @@ static int intel_pstate_set_policy(struc
 	if (!policy->cpuinfo.max_freq)
 		return -ENODEV;

-	intel_pstate_clear_update_util_hook(policy->cpu);
-
 	pr_debug("set_policy cpuinfo.max %u policy->max %u\n",
 		 policy->cpuinfo.max_freq, policy->max);

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-25  0:14                   ` Rafael J. Wysocki
@ 2016-06-25 15:09                     ` Pandruvada, Srinivas
  2016-06-26  0:27                     ` Doug Smythies
  2016-06-27  6:08                     ` Jisheng Zhang
  2 siblings, 0 replies; 17+ messages in thread
From: Pandruvada, Srinivas @ 2016-06-25 15:09 UTC (permalink / raw)
  To: jszhang, viresh.kumar, rjw; +Cc: linux-kernel, linux-pm, rafael

On Sat, 2016-06-25 at 02:14 +0200, Rafael J. Wysocki wrote:
> On Friday, June 17, 2016 04:09:33 PM Jisheng Zhang wrote:
> > Dear all,
> > 
> > If using acpi-cpufreq instead, v4.6, v4.6-rc3, v4.7-rc3 can't
> > reproduce the issue. It seems
> > only intel_pstate is impacted.
> 
> Which is quite obvious, since the commit your bisection led to was
> intel_pstate-specific. :-)
> 
We should also check why the set_policy callback is getting called
quite often. May be some thermal zone is tripping quite often.

echo 'file thermal_core.c +p' > /sys/kernel/debug/dynamic_debug/control

may give us some clue.

Thanks,
Srinivas

> If the issue is what I'm thinking it is, the patch below should help,
> so
> can you please test it?
> 
> Thanks,
> Rafael
> 
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] intel_pstate: Do not clear utilization update hooks
> on policy changes
> 
> intel_pstate_set_policy() is invoked by the cpufreq core during
> driver initialization, on changes of policy attributes (minimim and
> maximum frequency, for example) via sysfs and via CPU notifications
> from the platform firmware.  On some platforms the latter may occur
> relatively often.
> 
> Commit bb6ab52f2bef (intel_pstate: Do not set utilization update hook
> too early) made intel_pstate_set_policy() clear the CPU's utilization
> update hook before updating the policy attributes for it (and set the
> hook again after doind that), but that involves invoking
> synchronize_sched() and adds overhead to the CPU notifications
> mentioned above and to the sched-RCU handling in general.
> 
> That extra overhead is arguably not necessary, because updating
> policy attributes when the CPU's utilization update hook is active
> should not lead to any adverse effects, so drop the clearing of
> the hook from intel_pstate_set_policy() and make it check if
> the hook has been set already when attempting to set it.
> 
> Fixes: bb6ab52f2bef (intel_pstate: Do not set utilization update hook
> too early)
> Reported-by: Jisheng Zhang <jszhang@marvell.com>
> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/intel_pstate.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -1440,6 +1440,9 @@ static void intel_pstate_set_update_util
>  {
>  	struct cpudata *cpu = all_cpu_data[cpu_num];
>  
> +	if (cpu->update_util_set)
> +		return;
> +
>  	/* Prevent intel_pstate_update_util() from using stale data.
> */
>  	cpu->sample.time = 0;
>  	cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
> @@ -1480,8 +1483,6 @@ static int intel_pstate_set_policy(struc
>  	if (!policy->cpuinfo.max_freq)
>  		return -ENODEV;
>  
> -	intel_pstate_clear_update_util_hook(policy->cpu);
> -
>  	pr_debug("set_policy cpuinfo.max %u policy->max %u\n",
>  		 policy->cpuinfo.max_freq, policy->max);
>  
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-25  0:14                   ` Rafael J. Wysocki
  2016-06-25 15:09                     ` Pandruvada, Srinivas
@ 2016-06-26  0:27                     ` Doug Smythies
  2016-06-27  6:08                     ` Jisheng Zhang
  2 siblings, 0 replies; 17+ messages in thread
From: Doug Smythies @ 2016-06-26  0:27 UTC (permalink / raw)
  To: 'Rafael J. Wysocki', 'Jisheng Zhang',
	'Srinivas Pandruvada'
  Cc: 'Rafael J. Wysocki',
	linux-pm, 'Linux Kernel Mailing List',
	'Viresh Kumar'

On 2016.06.24 16:09 Rafael J. Wysocki wrote:
> On Friday, June 17, 2016 04:09:33 PM Jisheng Zhang wrote:
>> Dear all,
>> 
>> If using acpi-cpufreq instead, v4.6, v4.6-rc3, v4.7-rc3 can't reproduce the issue. It seems
>> only intel_pstate is impacted.
>
> Which is quite obvious, since the commit your bisection led to was
> intel_pstate-specific. :-)
>
> If the issue is what I'm thinking it is, the patch below should help, so
> can you please test it?

Rafael, while you asked Jisheng to test, I tested it also, since I was
already setup for testing this stuff. Summary: works great, see further below
for details.

On 2016.06.25 08:10 Srinivas Pandruvada wrote:
> We should also check why the set_policy callback is getting called
> quite often. May be some thermal zone is tripping quite often.
>
> echo 'file thermal_core.c +p' > /sys/kernel/debug/dynamic_debug/control
>
> may give us some clue.

Srinivas, This part has me baffled, particularly with the new test data
(see further below). Note that my test sever never suffers from thermal events,
It can run flat out on all CPU's forever.

Details (some old test data repeated so as to provide context with the new data):

Powertop Wakeups/Second as a function of sample time:
Sample time	Kernel 4.7-rc4	+rjw patch
(seconds)	Wakeups/second	Wakeups/second
300			~20			~14
200			~20			~15
100			~22			~16
 50			~29			~14
 30			~33			~15
 20			~44			~15
  5						~17 (noisy)
  3			~155			~20 (noisy)

Manual timer stats method:
Kernel 4.7-rc4: ~20 Events/Second
Kernel 4.7-rc4+rjw patch: ~20 Events / Second
Kernel 4.4.0-24 (Ubuntu version numbering method): ~20 Events / Second

Note (to self): Do the timer stats method over a long period (say 300 seconds)
so as to reduce the localized influence from running the script itself.

Skipped samples in the intel_pstate driver while running powertop at 5 second sample time:
Kernel 4.7-rc4: ~200 / minute, or ~3.3 per second.
Kernel 4.7-rc4+rjw patch: 0.
Kernel 4.4.0-24: 0.
Other tests were done with Kernel 4.7-rc4 such as compile the kernel and a bunch of Phoronix tests,
and some skipped samples were observed (8 over a 3 hour trace session), but nothing like when running powertop.

Check for messing with the minimum frequency while running powertop at 5 second sample time:
Command: watch -n 0.3 -g cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
Kernel 4.7-rc4: Never takes more than a few second to get a hit.
Kernel 4.7-rc4+rjw patch: Ran for over 2 hours without a hit.
Kernel 4.4.0-24: Can not recall how long I ran the test for. No hit.
Srinivas, this is what has me baffled. If it wasn't powertop itself messing
with the minimum CPU clock frequency and setting it to maximum, then what was it?

On an otherwise "idle" system,
how many times does the intel-pstate driver run per unit time?
Kernel 4.7-rc4: (14341 + 0 skipped) times / 1000 seconds.
Kernel 4.7-rc4 + powertop --time=5: (38148 + 3075 skipped) times / 1000 seconds.
Kernel 4.7-rc4+rjw: (11947 + 0 skipped) times / 1000 seconds.
Kernel 4.7-rc4+rjw + powertop --time=5: (22657 + 0 skipped) times / 1000 seconds.
Kernel 4.4.0-24: (5725 + 0 skipped) times / 1000 seconds.
Kernel 4.4.0-24 + powertop --time=5: (16656 + 0 skipped) times / 1000 seconds.

Important note: It is a good thing that the number of driver passes per unit time increased
with the recent changes. The driver was not running often enough before,
often hitting the watchdog limits. Isn't it energy and performance that
matters (see next test)?

Energy consumption on an otherwise "idle" system (package power):
Kernel 4.7-rc4: 3.84 Watts
Kernel 4.7-rc4 + powertop --time=5: sometimes 4.92 watts, sometimes 6.2 watts (not sure why)
Kernel 4.7-rc4+rjw: 3.88 Watts
Kernel 4.7-rc4+rjw + powertop --time=5: 4.5 watts (did observe a 6.3 watts one)
Kernel 4.4.0-24: 3.92 Watts.
Kernel 4.4.0-24 + powertop --time=5: did not test.
While there are variations in the results, the 2 to 3% savings in idle energy seems
somewhat consistent (referring to between Kernel 4.4 and 4.7-rc4+rjw patch, without powertop).

... Doug

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early")
  2016-06-25  0:14                   ` Rafael J. Wysocki
  2016-06-25 15:09                     ` Pandruvada, Srinivas
  2016-06-26  0:27                     ` Doug Smythies
@ 2016-06-27  6:08                     ` Jisheng Zhang
  2 siblings, 0 replies; 17+ messages in thread
From: Jisheng Zhang @ 2016-06-27  6:08 UTC (permalink / raw)
  To: Rafael J. Wysocki
  Cc: Viresh Kumar, Rafael J. Wysocki, linux-pm,
	Linux Kernel Mailing List, Srinivas Pandruvada

Dear Rafael,

On Sat, 25 Jun 2016 02:14:28 +0200 "Rafael J. Wysocki" wrote:

> On Friday, June 17, 2016 04:09:33 PM Jisheng Zhang wrote:
> > Dear all,
> > 
> > If using acpi-cpufreq instead, v4.6, v4.6-rc3, v4.7-rc3 can't reproduce the issue. It seems
> > only intel_pstate is impacted.  
> 
> Which is quite obvious, since the commit your bisection led to was
> intel_pstate-specific. :-)
> 
> If the issue is what I'm thinking it is, the patch below should help, so
> can you please test it?
> 
> Thanks,
> Rafael
> 
> ---
> From: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Subject: [PATCH] intel_pstate: Do not clear utilization update hooks on policy changes
> 
> intel_pstate_set_policy() is invoked by the cpufreq core during
> driver initialization, on changes of policy attributes (minimim and
> maximum frequency, for example) via sysfs and via CPU notifications
> from the platform firmware.  On some platforms the latter may occur
> relatively often.
> 
> Commit bb6ab52f2bef (intel_pstate: Do not set utilization update hook
> too early) made intel_pstate_set_policy() clear the CPU's utilization
> update hook before updating the policy attributes for it (and set the
> hook again after doind that), but that involves invoking
> synchronize_sched() and adds overhead to the CPU notifications
> mentioned above and to the sched-RCU handling in general.
> 
> That extra overhead is arguably not necessary, because updating
> policy attributes when the CPU's utilization update hook is active
> should not lead to any adverse effects, so drop the clearing of
> the hook from intel_pstate_set_policy() and make it check if
> the hook has been set already when attempting to set it.

This patch works! 

on a 32 snb cores server:

80 wakeups/s w/o the patch

7 wakeups/s w/ the patch


on a 4 snb cores laptop:

22 wakeups/s w/o the patch

6 wakeups/s w/ the patch

Thank you so much for fixing it ;)

> 
> Fixes: bb6ab52f2bef (intel_pstate: Do not set utilization update hook too early)
> Reported-by: Jisheng Zhang <jszhang@marvell.com>

So feel free to add

Tested-by: Jisheng Zhang <jszhang@marvell.com>

> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> ---
>  drivers/cpufreq/intel_pstate.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Index: linux-pm/drivers/cpufreq/intel_pstate.c
> ===================================================================
> --- linux-pm.orig/drivers/cpufreq/intel_pstate.c
> +++ linux-pm/drivers/cpufreq/intel_pstate.c
> @@ -1440,6 +1440,9 @@ static void intel_pstate_set_update_util
>  {
>  	struct cpudata *cpu = all_cpu_data[cpu_num];
>  
> +	if (cpu->update_util_set)
> +		return;
> +
>  	/* Prevent intel_pstate_update_util() from using stale data. */
>  	cpu->sample.time = 0;
>  	cpufreq_add_update_util_hook(cpu_num, &cpu->update_util,
> @@ -1480,8 +1483,6 @@ static int intel_pstate_set_policy(struc
>  	if (!policy->cpuinfo.max_freq)
>  		return -ENODEV;
>  
> -	intel_pstate_clear_update_util_hook(policy->cpu);
> -
>  	pr_debug("set_policy cpuinfo.max %u policy->max %u\n",
>  		 policy->cpuinfo.max_freq, policy->max);
>  
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2016-06-27  6:13 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-06-17  8:30 regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit") Jisheng Zhang
2016-06-17  8:40 ` Jisheng Zhang
2016-06-17 13:09   ` Rafael J. Wysocki
2016-06-17 13:16     ` Paul E. McKenney
2016-06-17 14:03       ` Peter Zijlstra
2016-06-17 15:32         ` regression caused by bb6ab52f2bef ("intel_pstate: Do not set utilization update hook too early") Jisheng Zhang
2016-06-17 15:40           ` Rafael J. Wysocki
2016-06-17 15:42             ` Rafael J. Wysocki
2016-06-17 15:53               ` Jisheng Zhang
2016-06-17 16:09                 ` Jisheng Zhang
2016-06-25  0:14                   ` Rafael J. Wysocki
2016-06-25 15:09                     ` Pandruvada, Srinivas
2016-06-26  0:27                     ` Doug Smythies
2016-06-27  6:08                     ` Jisheng Zhang
2016-06-23 16:07           ` Doug Smythies
2016-06-25  0:03             ` Rafael J. Wysocki
2016-06-17 15:32         ` regression caused by 08f511fd41c3 ("cpufreq: Reduce cpufreq_update_util() overhead a bit") Rafael J. Wysocki

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).