Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel

linux-kernel.vger.kernel.org archive mirror

* Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
@ 2016-06-16 16:38 Jirka Hladky
  2016-06-16 17:22 ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Jirka Hladky @ 2016-06-16 16:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Peter Zijlstra, Kamil Kolakowski

Hello,

we see a 30-40% performance drop for the SPECjbb2005 and SPECjvm2008
benchmarks starting from the 4.7.0-0.rc0 kernel, compared to the 4.6
kernel.

We have tested kernels 4.7.0-0.rc1 and 4.7.0-0.rc3, and they are
affected as well.

We have observed the drop on a variety of x86_64 servers with
different configurations (different CPU models, RAM sizes, Hyper
Threading both ON and OFF, and different NUMA configurations with 2
and 4 NUMA nodes).

Linpack and Stream benchmarks do not show any performance drop.

The performance drop increases with a higher number of threads. The
maximum number of threads in each benchmark is the same as the number
of CPUs.

We have opened a BZ to track the progress:
https://bugzilla.kernel.org/show_bug.cgi?id=120481

You can find more details along with graphs and tables there.

Do you have any hints as to which commit we should try to revert?

Thanks a lot!
Jirka

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-16 16:38 Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel Jirka Hladky
@ 2016-06-16 17:22 ` Peter Zijlstra
  2016-06-16 23:04   ` Jirka Hladky
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-16 17:22 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Thu, Jun 16, 2016 at 06:38:50PM +0200, Jirka Hladky wrote:
> Hello,
> 
> we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008

Blergh, of course I don't have those.. :/

> benchmarks starting from 4.7.0-0.rc0 kernel compared to 4.6 kernel.
> 
> We have tested kernels 4.7.0-0.rc1 and 4.7.0-0.rc3 and these are as
> well affected.
> 
> We have observed the drop on variety of different x86_64 servers with
> different configuration (different CPU models, RAM sizes, both with
> Hyper Threading ON and OFF, different NUMA configurations (2 and 4
> NUMA nodes)

What kind of config and userspace setup? Do you run this cruft in a
cgroup of sorts?

If so, does it change anything if you run it in the root cgroup?

> Linpack and Stream benchmarks do not show any performance drop.
> 
> The performance drop increases with higher number of threads. The
> maximum number of threads in each benchmark is the same as number of
> CPUs.
> 
> We have opened a BZ to track the progress:
> https://bugzilla.kernel.org/show_bug.cgi?id=120481
> 
> You can find more details along with graphs and tables there.
> 
> Do you have any hints which commit should we try to reverse?

There were only 66 commits or so, and I think we can rule out the
hotplug changes, which should reduce it even further.

You could see what the parent of this one does:

  2159197d6677 sched/core: Enable increased load resolution on 64-bit kernels

If not that, maybe the parent of:

  c58d25f371f5 sched/fair: Move record_wakee()

After that I suppose you'll have to go bisect.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-16 17:22 ` Peter Zijlstra
@ 2016-06-16 23:04   ` Jirka Hladky
  2016-06-21 13:17     ` Jirka Hladky
                       ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Jirka Hladky @ 2016-06-16 23:04 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

> > we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008
> Blergh, of course I don't have those.. :/

SPECjvm2008 is publicly available.
https://www.spec.org/download.html

We will prepare a reproducer and attach it to the BZ.

> What kind of config and userspace setup? Do you run this cruft in a
> cgroup of sorts?

 No, we don't do any special setup except to control the number of threads.

Thanks for the hints about which commits are most likely the root
cause. We will try to find the commit which has caused it.

Jirka



On Thu, Jun 16, 2016 at 7:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jun 16, 2016 at 06:38:50PM +0200, Jirka Hladky wrote:
>> Hello,
>>
>> we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008
>
> Blergh, of course I don't have those.. :/
>
>> benchmarks starting from 4.7.0-0.rc0 kernel compared to 4.6 kernel.
>>
>> We have tested kernels 4.7.0-0.rc1 and 4.7.0-0.rc3 and these are as
>> well affected.
>>
>> We have observed the drop on variety of different x86_64 servers with
>> different configuration (different CPU models, RAM sizes, both with
>> Hyper Threading ON and OFF, different NUMA configurations (2 and 4
>> NUMA nodes)
>
> What kind of config and userspace setup? Do you run this cruft in a
> cgroup of sorts?
>
> If so, does it change anything if you run it in the root cgroup?
>
>> Linpack and Stream benchmarks do not show any performance drop.
>>
>> The performance drop increases with higher number of threads. The
>> maximum number of threads in each benchmark is the same as number of
>> CPUs.
>>
>> We have opened a BZ to track the progress:
>> https://bugzilla.kernel.org/show_bug.cgi?id=120481
>>
>> You can find more details along with graphs and tables there.
>>
>> Do you have any hints which commit should we try to reverse?
>
> There were only 66 commits or so, and I think we can rule out the
> hotplug changes, which should reduce it even further.
>
> You could see what the parent of this one does:
>
>   2159197d6677 sched/core: Enable increased load resolution on 64-bit kernels
>
> If not that, maybe the parent of:
>
>   c58d25f371f5 sched/fair: Move record_wakee()
>
> After that I suppose you'll have to go bisect.
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-16 23:04   ` Jirka Hladky
@ 2016-06-21 13:17     ` Jirka Hladky
  2016-06-22  7:16     ` Peter Zijlstra
  2016-06-23 18:33     ` Peter Zijlstra
  2 siblings, 0 replies; 34+ messages in thread
From: Jirka Hladky @ 2016-06-21 13:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

Hi Peter,

I have an update for this performance issue. I have tested several
kernels, and I'm now at the parent of

  2159197d6677 sched/core: Enable increased load resolution on 64-bit kernels

and I still see the performance regression for multithreaded workloads.

There are only 27 commits remaining between v4.6 (last known to be OK)
and current HEAD (6ecdd74962f246dfe8750b7bea481a1c0816315d):
6ecdd74962f246dfe8750b7bea481a1c0816315d    sched/fair: Generalize the
load/util averages resolution definition

See below [0].

Any hint which commit should I try now?

Thanks a lot!
Jirka

[0]
$ git log --pretty=oneline v4.6..HEAD kernel/sched
6ecdd74962f246dfe8750b7bea481a1c0816315d sched/fair: Generalize the load/util averages resolution definition
2159197d66770ec01f75c93fb11dc66df81fd45b sched/core: Enable increased load resolution on 64-bit kernels
e7904a28f5331c21d17af638cb477c83662e3cb6 locking/lockdep, sched/core: Implement a better lock pinning scheme
eb58075149b7f0300ff19142e6245fe75db2a081 sched/core: Introduce 'struct rq_flags'
3e71a462dd483ce508a723356b293731e7d788ea sched/core: Move task_rq_lock() out of line
64b7aad5798478ffff52e110878ccaae4c3aaa34 Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes
f98db6013c557c216da5038d9c52045be55cd039 sched/core: Add switch_mm_irqs_off() and use it in the scheduler
594dd290cf5403a9a5818619dfff42d8e8e0518e sched/cpufreq: Optimize cpufreq update kicker to avoid update multiple times
fec148c000d0f9ac21679601722811eb60b4cc52 sched/deadline: Fix a bug in dl_overflow()
9fd81dd5ce0b12341c9f83346f8d32ac68bd3841 sched/fair: Optimize !CONFIG_NO_HZ_COMMON CPU load updates
1f41906a6fda1114debd3898668bd7ab6470ee41 sched/fair: Correctly handle nohz ticks CPU load accounting
cee1afce3053e7aa0793fbd5f2e845fa2cef9e33 sched/fair: Gather CPU load functions under a more conventional namespace
a2c6c91f98247fef0fe75216d607812485aeb0df sched/fair: Call cpufreq hook in additional paths
41e0d37f7ac81297c07ba311e4ad39465b8c8295 sched/fair: Do not call cpufreq hook unless util changed
21e96f88776deead303ecd30a17d1d7c2a1776e3 sched/fair: Move cpufreq hook to update_cfs_rq_load_avg()
1f621e028baf391f6684003e32e009bc934b750f sched/fair: Fix asym packing to select correct CPU
bd92883051a0228cc34996b8e766111ba10c9aac sched/cpuacct: Check for NULL when using task_pt_regs()
2c923e94cd9c6acff3b22f0ae29cfe65e2658b40 sched/clock: Make local_clock()/cpu_clock() inline
c78b17e28cc2c2df74264afc408bdc6aaf3fbcc8 sched/clock: Remove pointless test in cpu_clock/local_clock
fb90a6e93c0684ab2629a42462400603aa829b9c sched/debug: Don't dump sched debug info in SysRq-W
2b8c41daba327c633228169e8bd8ec067ab443f8 sched/fair: Initiate a new task's util avg to a bounded value
1c3de5e19fc96206dd086e634129d08e5f7b1000 sched/fair: Update comments after a variable rename
47252cfbac03644ee4a3adfa50c77896aa94f2bb sched/core: Add preempt checks in preempt_schedule() code
bfdb198ccd99472c5bded689699eb30dd06316bb sched/numa: Remove unnecessary NUMA dequeue update from non-SMP kernels
d02c071183e1c01a76811c878c8a52322201f81f sched/fair: Reset nr_balance_failed after active balancing
d740037fac7052e49450f6fa1454f1144a103b55 sched/cpuacct: Split usage accounting into user_usage and sys_usage
5ca3726af7f66a8cc71ce4414cfeb86deb784491 sched/cpuacct: Show all possible CPUs in cpuacct output

On Fri, Jun 17, 2016 at 1:04 AM, Jirka Hladky <jhladky@redhat.com> wrote:
>> > we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008
>> Blergh, of course I don't have those.. :/
>
> SPECjvm2008 is publicly available.
> https://www.spec.org/download.html
>
> We will prepare a reproducer and attach it to the BZ.
>
>> What kind of config and userspace setup? Do you run this cruft in a
>> cgroup of sorts?
>
>  No, we don't do any special setup except to control the number of threads.
>
> Thanks for the hints which commits are most likely the root cause for
> this. We will try to find the commit which has caused it.
>
> Jirka
>
>
>
> On Thu, Jun 16, 2016 at 7:22 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Thu, Jun 16, 2016 at 06:38:50PM +0200, Jirka Hladky wrote:
>>> Hello,
>>>
>>> we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008
>>
>> Blergh, of course I don't have those.. :/
>>
>>> benchmarks starting from 4.7.0-0.rc0 kernel compared to 4.6 kernel.
>>>
>>> We have tested kernels 4.7.0-0.rc1 and 4.7.0-0.rc3 and these are as
>>> well affected.
>>>
>>> We have observed the drop on variety of different x86_64 servers with
>>> different configuration (different CPU models, RAM sizes, both with
>>> Hyper Threading ON and OFF, different NUMA configurations (2 and 4
>>> NUMA nodes)
>>
>> What kind of config and userspace setup? Do you run this cruft in a
>> cgroup of sorts?
>>
>> If so, does it change anything if you run it in the root cgroup?
>>
>>> Linpack and Stream benchmarks do not show any performance drop.
>>>
>>> The performance drop increases with higher number of threads. The
>>> maximum number of threads in each benchmark is the same as number of
>>> CPUs.
>>>
>>> We have opened a BZ to track the progress:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=120481
>>>
>>> You can find more details along with graphs and tables there.
>>>
>>> Do you have any hints which commit should we try to reverse?
>>
>> There were only 66 commits or so, and I think we can rule out the
>> hotplug changes, which should reduce it even further.
>>
>> You could see what the parent of this one does:
>>
>>   2159197d6677 sched/core: Enable increased load resolution on 64-bit kernels
>>
>> If not that, maybe the parent of:
>>
>>   c58d25f371f5 sched/fair: Move record_wakee()
>>
>> After that I suppose you'll have to go bisect.
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-16 23:04   ` Jirka Hladky
  2016-06-21 13:17     ` Jirka Hladky
@ 2016-06-22  7:16     ` Peter Zijlstra
  2016-06-22  7:49       ` Peter Zijlstra
  2016-06-22  8:20       ` Jirka Hladky
  2016-06-23 18:33     ` Peter Zijlstra
  2 siblings, 2 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-22  7:16 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Fri, Jun 17, 2016 at 01:04:23AM +0200, Jirka Hladky wrote:
> > > we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008
> > Blergh, of course I don't have those.. :/
> 
> SPECjvm2008 is publicly available.
> https://www.spec.org/download.html

Urgh, I _so_ hate java.

Why does it have to pop up windows split between my screens, total fail.

In any case, I run it like:

   java -jar SPECjvm2008.jar --benchmarkThreads 40

because I have 40 cpus (2 sockets * 10 cores/socket * 2 threads/core).
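
[Editorial sketch, not part of the original message.] The thread count
above is simply the number of logical CPUs. A minimal Python sketch of
that arithmetic and of assembling the same command line follows. The
helper name spec_jvm_command is hypothetical, and the only option used
is the --benchmarkThreads flag quoted in the message; nothing is
actually executed.

```python
import os


def spec_jvm_command(threads=None, jar="SPECjvm2008.jar"):
    """Return the argv list for the benchmark run quoted in the message.

    If `threads` is None, fall back to the logical CPU count of the
    machine this script runs on (an assumption of this sketch).
    """
    if threads is None:
        threads = os.cpu_count() or 1
    return ["java", "-jar", jar, "--benchmarkThreads", str(threads)]


# Topology described in the message:
# 2 sockets * 10 cores/socket * 2 threads/core.
sockets, cores_per_socket, threads_per_core = 2, 10, 2
logical_cpus = sockets * cores_per_socket * threads_per_core

# Print the command line only; launching the JVM is left to the reader.
print(" ".join(spec_jvm_command(logical_cpus)))
```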

It seems to produce numbers, but then ends with a splat:

Error while creating report: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper
java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper
        at java.awt.Toolkit.loadAssistiveTechnologies(Toolkit.java:807)
        at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:886)
        at sun.swing.SwingUtilities2.getSystemMnemonicKeyMask(SwingUtilities2.java:2020)
        at javax.swing.plaf.basic.BasicLookAndFeel.initComponentDefaults(BasicLookAndFeel.java:1158)
        at javax.swing.plaf.metal.MetalLookAndFeel.initComponentDefaults(MetalLookAndFeel.java:431)
        at javax.swing.plaf.basic.BasicLookAndFeel.getDefaults(BasicLookAndFeel.java:148)
        at javax.swing.plaf.metal.MetalLookAndFeel.getDefaults(MetalLookAndFeel.java:1577)
        at javax.swing.UIManager.setLookAndFeel(UIManager.java:539)
        at javax.swing.UIManager.setLookAndFeel(UIManager.java:579)
        at javax.swing.UIManager.initializeDefaultLAF(UIManager.java:1349)
        at javax.swing.UIManager.initialize(UIManager.java:1459)
        at javax.swing.UIManager.maybeInitialize(UIManager.java:1426)
        at javax.swing.UIManager.getDefaults(UIManager.java:659)
        at javax.swing.UIManager.getColor(UIManager.java:701)
        at org.jfree.chart.JFreeChart.<clinit>(JFreeChart.java:246)
        at org.jfree.chart.ChartFactory.createXYLineChart(ChartFactory.java:1478)
        at spec.reporter.BenchmarkChart.<init>(BenchmarkChart.java:47)
        at spec.reporter.ReportGenerator.handleBenchmarkResult(ReportGenerator.java:141)
        at spec.reporter.ReportGenerator.handleBenchmarksResults(ReportGenerator.java:105)
        at spec.reporter.ReportGenerator.<init>(ReportGenerator.java:87)
        at spec.reporter.ReportGenerator.main2(ReportGenerator.java:750)
        at spec.reporter.Reporter.main2(Reporter.java:51)
        at spec.harness.Launch.createReport(Launch.java:307)
        at spec.harness.Launch.runBenchmarkSuite(Launch.java:250)
        at spec.harness.Launch.main(Launch.java:452)

WTF a benchmark needs that crap is beyond me, but whatever, I have
numbers.

I'll try and reproduce.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22  7:16     ` Peter Zijlstra
@ 2016-06-22  7:49       ` Peter Zijlstra
  2016-06-22  7:54         ` Peter Zijlstra
  2016-06-22  8:20       ` Jirka Hladky
  1 sibling, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-22  7:49 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Wed, Jun 22, 2016 at 09:16:01AM +0200, Peter Zijlstra wrote:
> WTF a benchmark needs that crap is beyond me, but whatever, I have
> numbers.

Oh, shaft me harder, its XML shite :/ How is a sane person ever going to
get numbers out.

I'm >.< close to giving up on this site and declaring the thing
-EDONTCARE.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22  7:49       ` Peter Zijlstra
@ 2016-06-22  7:54         ` Peter Zijlstra
  2016-06-22  9:52           ` Jirka Hladky
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-22  7:54 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Wed, Jun 22, 2016 at 09:49:41AM +0200, Peter Zijlstra wrote:
> On Wed, Jun 22, 2016 at 09:16:01AM +0200, Peter Zijlstra wrote:
> > WTF a benchmark needs that crap is beyond me, but whatever, I have
> > numbers.
> 
> Oh, shaft me harder, its XML shite :/ How is a sane person ever going to
> get numbers out.
> 
> I'm >.< close to giving up on this site and declaring the thing
> -EDONTCARE.

OK, done.. have a look at this:


/tmp/SPECjvm2008/compiler.compiler/compiler/src/share/classes/javax/lang/model/element/Name.java:54: cannot access java.lang.CharSequence
bad class file: spec.benchmarks.compiler.SpecFileManager$CachedFileObject@1c06fce6
bad constant pool tag: 18 at 10
Please remove or make sure it appears in the correct subdirectory of the classpath.
public interface Name extends CharSequence {
                              ^
ERROR: compiler exit code: 1

Warmup (120s) begins: Wed Jun 22 09:45:33 CEST 2016
/tmp/SPECjvm2008/compiler.compiler/compiler/src/share/classes/javax/lang/model/element/Name.java:54: cannot access java.lang.CharSequence
bad class file: spec.benchmarks.compiler.SpecFileManager$CachedFileObject@1c06fce6
bad constant pool tag: 18 at 10
Please remove or make sure it appears in the correct subdirectory of the classpath.
public interface Name extends CharSequence {
                              ^
/tmp/SPECjvm2008/compiler.compiler/compiler/src/share/classes/javax/lang/model/element/Name.java:54: cannot access java.lang.CharSequence
bad class file: spec.benchmarks.compiler.SpecFileManager$CachedFileObject@1c06fce6
bad constant pool tag: 18 at 10
Please remove or make sure it appears in the correct subdirectory of the classpath.



Clearly this stuff just isn't made to be used.


/me goes do something useful.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22  7:54         ` Peter Zijlstra
@ 2016-06-22  9:52           ` Jirka Hladky
  2016-06-22 11:12             ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Jirka Hladky @ 2016-06-22  9:52 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

Hi Peter,

the performance regression has been caused by this commit

=================================================
commit 6ecdd74962f246dfe8750b7bea481a1c0816315d
Author: Yuyang Du <yuyang.du@intel.com>
Date:   Tue Apr 5 12:12:26 2016 +0800

    sched/fair: Generalize the load/util averages resolution definition
=================================================

Could you please have a look?

Thanks a lot!
Jirka


On Wed, Jun 22, 2016 at 9:54 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jun 22, 2016 at 09:49:41AM +0200, Peter Zijlstra wrote:
>> On Wed, Jun 22, 2016 at 09:16:01AM +0200, Peter Zijlstra wrote:
>> > WTF a benchmark needs that crap is beyond me, but whatever, I have
>> > numbers.
>>
>> Oh, shaft me harder, its XML shite :/ How is a sane person ever going to
>> get numbers out.
>>
>> I'm >.< close to giving up on this site and declaring the thing
>> -EDONTCARE.
>
> OK, done.. have a look at this:
>
>
> /tmp/SPECjvm2008/compiler.compiler/compiler/src/share/classes/javax/lang/model/element/Name.java:54: cannot access java.lang.CharSequence
> bad class file: spec.benchmarks.compiler.SpecFileManager$CachedFileObject@1c06fce6
> bad constant pool tag: 18 at 10
> Please remove or make sure it appears in the correct subdirectory of the classpath.
> public interface Name extends CharSequence {
>                               ^
> ERROR: compiler exit code: 1
>
> Warmup (120s) begins: Wed Jun 22 09:45:33 CEST 2016
> /tmp/SPECjvm2008/compiler.compiler/compiler/src/share/classes/javax/lang/model/element/Name.java:54: cannot access java.lang.CharSequence
> bad class file: spec.benchmarks.compiler.SpecFileManager$CachedFileObject@1c06fce6
> bad constant pool tag: 18 at 10
> Please remove or make sure it appears in the correct subdirectory of the classpath.
> public interface Name extends CharSequence {
>                               ^
> /tmp/SPECjvm2008/compiler.compiler/compiler/src/share/classes/javax/lang/model/element/Name.java:54: cannot access java.lang.CharSequence
> bad class file: spec.benchmarks.compiler.SpecFileManager$CachedFileObject@1c06fce6
> bad constant pool tag: 18 at 10
> Please remove or make sure it appears in the correct subdirectory of the classpath.
>
>
>
> Clearly this stuff just isn't made to be used.
>
>
> /me goes do something useful.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22  9:52           ` Jirka Hladky
@ 2016-06-22 11:12             ` Peter Zijlstra
  2016-06-22 12:37               ` Jirka Hladky
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-22 11:12 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Wed, Jun 22, 2016 at 11:52:45AM +0200, Jirka Hladky wrote:
> Hi Peter,
> 
> the performance regression has been caused by this commit
> 
> =================================================
> commit 6ecdd74962f246dfe8750b7bea481a1c0816315d
> Author: Yuyang Du <yuyang.du@intel.com>
> Date:   Tue Apr 5 12:12:26 2016 +0800
> 
>     sched/fair: Generalize the load/util averages resolution definition
> =================================================
> 
> Could you please have a look?

That patch looks like a NO-OP to me.

In any case, the good news is that I can run the benchmark, the bad news
is that the patch you fingered doesn't appear to be it.


v4.6.0:
./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.transform: 2007.18 ops/m
./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.validation: 2999.44 ops/m

tip/master:
./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.transform: 1283.14 ops/m
./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.validation: 2008.62 ops/m

patch^1
./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.transform: 1196.18 ops/m
./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.validation: 2055.11 ops/m

patch^1 + patch
./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.transform: 1294.59 ops/m
./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.validation: 2140.02 ops/m
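
[Editorial sketch, not part of the original message.] To read the
scores above as relative drops against the 4.6.0 baseline, the
following Python sketch parses the "Score on ..." lines and computes
the percentage difference. The score values are copied from the grep
output above; the names RESULTS, parse_scores and relative_drop are
hypothetical.

```python
import re

# Score lines copied from the grep output above, keyed by kernel build.
RESULTS = {
    "4.6.0": [
        "Score on xml.transform: 2007.18 ops/m",
        "Score on xml.validation: 2999.44 ops/m",
    ],
    "4.7.0-rc4-00345-gf6e78bb": [
        "Score on xml.transform: 1283.14 ops/m",
        "Score on xml.validation: 2008.62 ops/m",
    ],
    "4.6.0-rc5-00034-g2159197": [
        "Score on xml.transform: 1196.18 ops/m",
        "Score on xml.validation: 2055.11 ops/m",
    ],
    "4.6.0-rc5-00034-g2159197-dirty": [
        "Score on xml.transform: 1294.59 ops/m",
        "Score on xml.validation: 2140.02 ops/m",
    ],
}

SCORE_RE = re.compile(r"Score on (?P<name>[\w.]+): (?P<score>[\d.]+) ops/m")


def parse_scores(lines):
    """Map benchmark name -> score (ops/m) for every matching line."""
    scores = {}
    for line in lines:
        match = SCORE_RE.search(line)
        if match:
            scores[match.group("name")] = float(match.group("score"))
    return scores


def relative_drop(baseline, value):
    """Percentage by which `value` falls below `baseline`."""
    return (baseline - value) / baseline * 100.0


baseline = parse_scores(RESULTS["4.6.0"])
for kernel, lines in RESULTS.items():
    if kernel == "4.6.0":
        continue
    for name, score in parse_scores(lines).items():
        drop = relative_drop(baseline[name], score)
        print(f"{kernel:32s} {name:16s} {drop:5.1f}% below 4.6.0")
```

All three non-baseline builds land roughly 29-40% below 4.6.0, which is
in line with the 30-40% drop reported at the start of the thread.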

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22 11:12             ` Peter Zijlstra
@ 2016-06-22 12:37               ` Jirka Hladky
  2016-06-22 12:46                 ` Jirka Hladky
  0 siblings, 1 reply; 34+ messages in thread
From: Jirka Hladky @ 2016-06-22 12:37 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

Hi Peter,

crap - I did the bisecting manually (not using git bisect) and I
have probably made a mistake.

Commits (git checkout <commit>) for which I got BAD results:

2159197d66770ec01f75c93fb11dc66df81fd45b
6ecdd74962f246dfe8750b7bea481a1c0816315d

Commits (git checkout <commit>) for which I got GOOD results:
21e96f88776deead303ecd30a17d1d7c2a1776e3
64b7aad5798478ffff52e110878ccaae4c3aaa34
e7904a28f5331c21d17af638cb477c83662e3cb6

I will try to use git bisect now.

Jirka
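
[Editorial sketch, not part of the original message.] The GOOD/BAD
results listed above can be checked for consistency by laying them out
in commit order. The Python sketch below assumes the history is linear
in the order printed by the earlier git log output (a simplification,
since 64b7aad is a merge) and lists only the commits relevant here,
with other commits omitted. The names ORDER, RESULTS and boundary are
hypothetical.

```python
# Oldest to newest, taken from the earlier `git log` output (reversed).
# Only the tested commits and the two untested ones between them are
# listed; the linear ordering is an assumption of this sketch.
ORDER = ["21e96f8", "64b7aad", "3e71a46", "eb58075",
         "e7904a2", "2159197", "6ecdd74"]

# Manual test results reported in the message above.
RESULTS = {
    "2159197": "bad",
    "6ecdd74": "bad",
    "21e96f8": "good",
    "64b7aad": "good",
    "e7904a2": "good",
}


def boundary(order, results):
    """Return (consistent, last_good, first_bad) for the tested commits.

    The results are consistent if no 'good' commit follows a 'bad' one.
    """
    tested = [(c, results[c]) for c in order if c in results]
    labels = [label for _, label in tested]
    first_bad_idx = labels.index("bad") if "bad" in labels else len(labels)
    consistent = all(label == "bad" for label in labels[first_bad_idx:])
    last_good = tested[first_bad_idx - 1][0] if first_bad_idx > 0 else None
    first_bad = tested[first_bad_idx][0] if first_bad_idx < len(tested) else None
    return consistent, last_good, first_bad


print(boundary(ORDER, RESULTS))
```

Under these assumptions the results are consistent, with the last good
commit being e7904a2 and the first bad one 2159197.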

On Wed, Jun 22, 2016 at 1:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jun 22, 2016 at 11:52:45AM +0200, Jirka Hladky wrote:
>> Hi Peter,
>>
>> the performance regression has been caused by this commit
>>
>> =================================================
>> commit 6ecdd74962f246dfe8750b7bea481a1c0816315d
>> Author: Yuyang Du <yuyang.du@intel.com>
>> Date:   Tue Apr 5 12:12:26 2016 +0800
>>
>>     sched/fair: Generalize the load/util averages resolution definition
>> =================================================
>>
>> Could you please have a look?
>
> That patch looks like a NO-OP to me.
>
> In any case, the good news it that I can run the benchmark, the bad news
> is that the patch you fingered doesn't appear to be it.
>
>
> v4.60:
> ./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.transform: 2007.18 ops/m
> ./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.validation: 2999.44 ops/m
>
> tip/master:
> ./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.transform: 1283.14 ops/m
> ./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.validation: 2008.62 ops/m
>
> patch^1
> ./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.transform: 1196.18 ops/m
> ./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.validation: 2055.11 ops/m
>
> patch^1 + patch
> ./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.transform: 1294.59 ops/m
> ./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.validation: 2140.02 ops/m
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22 12:37               ` Jirka Hladky
@ 2016-06-22 12:46                 ` Jirka Hladky
  2016-06-22 14:41                   ` Jirka Hladky
  0 siblings, 1 reply; 34+ messages in thread
From: Jirka Hladky @ 2016-06-22 12:46 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

OK, I have reviewed my results once again:

This commit is fine:
64b7aad - Ingo Molnar, 7 weeks ago : Merge branch 'sched/urgent' into
sched/core, to pick up fixes before applying new changes

This version already has a problem:
2159197 - Peter Zijlstra, 8 weeks ago : sched/core: Enable increased
load resolution on 64-bit kernels

git bisect start
git bisect good 64b7aad
git bisect bad 2159197
Bisecting: 1 revision left to test after this (roughly 1 step)
[eb58075149b7f0300ff19142e6245fe75db2a081] sched/core: Introduce
'struct rq_flags'

I should have results pretty soon.

Jirka
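
[Editorial sketch, not part of the original message.] The "1 revision
left to test after this" line follows from how bisection halves the
candidate range. The Python sketch below is a simplified model over a
linear list of commits; real git bisect works on the commit graph. The
function first_bad is hypothetical, and the test outcome passed to it
(only 2159197 is bad) is an assumption for illustration, not a result
reported in this message.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    `commits` is ordered oldest to newest; commits[0] is known good and
    commits[-1] is known bad. Returns (culprit, commits_tested).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    tested = []
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested.append(commits[mid])
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi], tested


# Range from the message: good 64b7aad, bad 2159197, with the commits
# in between taken from the earlier git log output (oldest first).
commits = ["64b7aad", "3e71a46", "eb58075", "e7904a2", "2159197"]

# Assumed outcome for illustration only.
culprit, tested = first_bad(commits, lambda c: c == "2159197")
print(culprit, tested)
```

In this model the first commit tested is eb58075, matching the commit
git bisect selected above, and one more test remains after it.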


On Wed, Jun 22, 2016 at 2:37 PM, Jirka Hladky <jhladky@redhat.com> wrote:
> Hi Peter,
>
> crap - I have done bisecting manually (not using git bisect) and I
> have probably done some mistake.
>
> Commits (git checkout <commit>) for which I got BAD results:
>
> 2159197d66770ec01f75c93fb11dc66df81fd45b
> 6ecdd74962f246dfe8750b7bea481a1c0816315d
>
> Commits (git checkout <commit>) for which I got GOOD results:
> 21e96f88776deead303ecd30a17d1d7c2a1776e3
> 64b7aad5798478ffff52e110878ccaae4c3aaa34
> e7904a28f5331c21d17af638cb477c83662e3cb6
>
> I will try to use git bisect now.
> 
> Jirka
>
> On Wed, Jun 22, 2016 at 1:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Wed, Jun 22, 2016 at 11:52:45AM +0200, Jirka Hladky wrote:
>>> Hi Peter,
>>>
>>> the performance regression has been caused by this commit
>>>
>>> =================================================
>>> commit 6ecdd74962f246dfe8750b7bea481a1c0816315d
>>> Author: Yuyang Du <yuyang.du@intel.com>
>>> Date:   Tue Apr 5 12:12:26 2016 +0800
>>>
>>>     sched/fair: Generalize the load/util averages resolution definition
>>> =================================================
>>>
>>> Could you please have a look?
>>
>> That patch looks like a NO-OP to me.
>>
>> In any case, the good news it that I can run the benchmark, the bad news
>> is that the patch you fingered doesn't appear to be it.
>>
>>
>> v4.60:
>> ./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.transform: 2007.18 ops/m
>> ./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.validation: 2999.44 ops/m
>>
>> tip/master:
>> ./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.transform: 1283.14 ops/m
>> ./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.validation: 2008.62 ops/m
>>
>> patch^1
>> ./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.transform: 1196.18 ops/m
>> ./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.validation: 2055.11 ops/m
>>
>> patch^1 + patch
>> ./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.transform: 1294.59 ops/m
>> ./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.validation: 2140.02 ops/m
>>
>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22 12:46                 ` Jirka Hladky
@ 2016-06-22 14:41                   ` Jirka Hladky
  2016-06-22 20:59                     ` Peter Zijlstra
  0 siblings, 1 reply; 34+ messages in thread
From: Jirka Hladky @ 2016-06-22 14:41 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

Hi Peter,

the kernel I got from bisecting does not work - I'm getting a kernel
panic during boot.

In any case, the regression was introduced between
git bisect good 64b7aad
git bisect bad 2159197

This commit is good:
64b7aad - Ingo Molnar, 7 weeks ago : Merge branch 'sched/urgent' into
sched/core, to pick up fixes before applying new changes

This commit is bad:
2159197 - Peter Zijlstra, 8 weeks ago : sched/core: Enable increased
load resolution on 64-bit kernels

Could you please have a look?

Thanks a lot!
Jirka


On Wed, Jun 22, 2016 at 2:46 PM, Jirka Hladky <jhladky@redhat.com> wrote:
> OK, I have reviewed my results once again:
>
> This commit is fine:
> 64b7aad - Ingo Molnar, 7 weeks ago : Merge branch 'sched/urgent' into
> sched/core, to pick up fixes before applying new changes
>
> This version has already a problem:
> 2159197 - Peter Zijlstra, 8 weeks ago : sched/core: Enable increased
> load resolution on 64-bit kernels
>
> git bisect start
> git bisect good 64b7aad
> git bisect bad 2159197
> Bisecting: 1 revision left to test after this (roughly 1 step)
> [eb58075149b7f0300ff19142e6245fe75db2a081] sched/core: Introduce
> 'struct rq_flags'
>
> I should have results pretty soon.
>
> Jirka
>
>
> On Wed, Jun 22, 2016 at 2:37 PM, Jirka Hladky <jhladky@redhat.com> wrote:
>> Hi Peter,
>>
>> crap - I have done bisecting manually (not using git bisect) and I
>> have probably done some mistake.
>>
>> Commits (git checkout <commit>) for which I got BAD results:
>>
>> 2159197d66770ec01f75c93fb11dc66df81fd45b
>> 6ecdd74962f246dfe8750b7bea481a1c0816315d
>>
>> Commits (git checkout <commit>) for which I got GOOD results:
>> 21e96f88776deead303ecd30a17d1d7c2a1776e3
>> 64b7aad5798478ffff52e110878ccaae4c3aaa34
>> e7904a28f5331c21d17af638cb477c83662e3cb6
>>
>> I will try to use git bisect now.
>> 
>> Jirka
>>
>> On Wed, Jun 22, 2016 at 1:12 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>>> On Wed, Jun 22, 2016 at 11:52:45AM +0200, Jirka Hladky wrote:
>>>> Hi Peter,
>>>>
>>>> the performance regression has been caused by this commit
>>>>
>>>> =================================================
>>>> commit 6ecdd74962f246dfe8750b7bea481a1c0816315d
>>>> Author: Yuyang Du <yuyang.du@intel.com>
>>>> Date:   Tue Apr 5 12:12:26 2016 +0800
>>>>
>>>>     sched/fair: Generalize the load/util averages resolution definition
>>>> =================================================
>>>>
>>>> Could you please have a look?
>>>
>>> That patch looks like a NO-OP to me.
>>>
>>> In any case, the good news is that I can run the benchmark, the bad news
>>> is that the patch you fingered doesn't appear to be it.
>>>
>>>
>>> v4.6.0:
>>> ./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.transform: 2007.18 ops/m
>>> ./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.validation: 2999.44 ops/m
>>>
>>> tip/master:
>>> ./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.transform: 1283.14 ops/m
>>> ./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.validation: 2008.62 ops/m
>>>
>>> patch^1
>>> ./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.transform: 1196.18 ops/m
>>> ./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.validation: 2055.11 ops/m
>>>
>>> patch^1 + patch
>>> ./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.transform: 1294.59 ops/m
>>> ./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.validation: 2140.02 ops/m
>>>
>>>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22 14:41                   ` Jirka Hladky
@ 2016-06-22 20:59                     ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-22 20:59 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Wed, Jun 22, 2016 at 04:41:06PM +0200, Jirka Hladky wrote:
> This commit is bad:
> 2159197 - Peter Zijlstra, 8 weeks ago : sched/core: Enable increased
> load resolution on 64-bit kernels
> 
> Could you please have a look?

Yes, that is indeed the culprit.

The below 'revert' makes it go fast again. I'll try and figure out
what's wrong tomorrow.

---

diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bf6fea9..e7e312b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -55,7 +55,7 @@ static inline void cpu_load_update_active(struct rq *this_rq) { }
  * Really only required when CONFIG_FAIR_GROUP_SCHED is also set, but to
  * increase coverage and consistency always enable it on 64bit platforms.
  */
-#ifdef CONFIG_64BIT
+#if 0 // def CONFIG_64BIT
 # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
 # define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
 # define scale_load_down(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)
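
A rough userspace sketch of what the macros in this hunk do, for readers
without the tree at hand. It assumes SCHED_FIXEDPOINT_SHIFT is 10 and that
the fallback branch selected by the '#if 0' leaves weights unscaled; neither
is visible in the hunk itself. The sketch_* names are made up for
illustration and are not kernel identifiers.

```c
/* Assumption: the fixed-point shift is 10, i.e. a factor of 1024. */
#define SKETCH_FIXEDPOINT_SHIFT 10

/* CONFIG_64BIT branch from the diff: weights carry 10 extra bits. */
static inline long sketch_scale_load(long w)
{
	return w << SKETCH_FIXEDPOINT_SHIFT;
}

/* Inverse of the above: drop the extra bits again. */
static inline long sketch_scale_load_down(long w)
{
	return w >> SKETCH_FIXEDPOINT_SHIFT;
}

/* Assumed fallback branch (what "#if 0" selects): no extra resolution. */
static inline long sketch_scale_load_lowres(long w)
{
	return w;
}
```

With these definitions a weight of 1024 becomes 1048576 in the
high-resolution branch and scales back down to 1024; in the assumed
fallback branch it stays 1024 throughout.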

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22  7:16     ` Peter Zijlstra
  2016-06-22  7:49       ` Peter Zijlstra
@ 2016-06-22  8:20       ` Jirka Hladky
  1 sibling, 0 replies; 34+ messages in thread
From: Jirka Hladky @ 2016-06-22  8:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

[-- Attachment #1: Type: text/plain, Size: 4156 bytes --]

Hi Peter,

please find the reproducer script attached. My command to reproduce the bug is:

./run-specjvm.sh --benchmarkThreads 32 --iterations 1 --iterationTime 180 \
    --warmuptime 90 xml.transform xml.validation

I run just xml benchmarks to speed up the runtime.

Please check
 https://bugzilla.kernel.org/show_bug.cgi?id=120481#c9
for some details how to run the benchmark.

The benchmark needs a window manager to be installed in order to create
graphs. However, you can run the script from an ssh terminal. I don't know
exactly why that is, but I know that Python's matplotlib library has the
same requirement.

last known good commit: e7904a28f5331c21d17af638cb477c83662e3cb6
first known bad commit: 6ecdd74962f246dfe8750b7bea481a1c0816315d

Last two commits to be checked:

 git log --pretty=oneline \
    e7904a28f5331c21d17af638cb477c83662e3cb6..6ecdd74962f246dfe8750b7bea481a1c0816315d
6ecdd74962f246dfe8750b7bea481a1c0816315d sched/fair: Generalize the
load/util averages resolution definition
2159197d66770ec01f75c93fb11dc66df81fd45b sched/core: Enable increased
load resolution on 64-bit kernels

I use the following command to review the results produced by the reproduce.sh script.

find ./ -name "*log" | xargs grep -H Score | grep xml.validation |
grep "[0-9]\{4\}[.][0-9]\{2\} ops/m"

Jirka

On Wed, Jun 22, 2016 at 9:16 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jun 17, 2016 at 01:04:23AM +0200, Jirka Hladky wrote:
>> > > we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008
>> > Blergh, of course I don't have those.. :/
>>
>> SPECjvm2008 is publicly available.
>> https://www.spec.org/download.html
>
> Urgh, I _so_ hate java.
>
> Why does it have to pop up windows split between my screens, total fail.
>
> In any case, I run it like:
>
>    java -jar SPECjvm2008.jar --benchmarkThreads 40
>
> because I have 40 cpus (2 sockets * 10 cores/socket * 2 threads/core).
>
> It seems to produce numbers, but then ends with a splat:
>
> Error while creating report: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper
> java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper
>         at java.awt.Toolkit.loadAssistiveTechnologies(Toolkit.java:807)
>         at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:886)
>         at sun.swing.SwingUtilities2.getSystemMnemonicKeyMask(SwingUtilities2.java:2020)
>         at javax.swing.plaf.basic.BasicLookAndFeel.initComponentDefaults(BasicLookAndFeel.java:1158)
>         at javax.swing.plaf.metal.MetalLookAndFeel.initComponentDefaults(MetalLookAndFeel.java:431)
>         at javax.swing.plaf.basic.BasicLookAndFeel.getDefaults(BasicLookAndFeel.java:148)
>         at javax.swing.plaf.metal.MetalLookAndFeel.getDefaults(MetalLookAndFeel.java:1577)
>         at javax.swing.UIManager.setLookAndFeel(UIManager.java:539)
>         at javax.swing.UIManager.setLookAndFeel(UIManager.java:579)
>         at javax.swing.UIManager.initializeDefaultLAF(UIManager.java:1349)
>         at javax.swing.UIManager.initialize(UIManager.java:1459)
>         at javax.swing.UIManager.maybeInitialize(UIManager.java:1426)
>         at javax.swing.UIManager.getDefaults(UIManager.java:659)
>         at javax.swing.UIManager.getColor(UIManager.java:701)
>         at org.jfree.chart.JFreeChart.<clinit>(JFreeChart.java:246)
>         at org.jfree.chart.ChartFactory.createXYLineChart(ChartFactory.java:1478)
>         at spec.reporter.BenchmarkChart.<init>(BenchmarkChart.java:47)
>         at spec.reporter.ReportGenerator.handleBenchmarkResult(ReportGenerator.java:141)
>         at spec.reporter.ReportGenerator.handleBenchmarksResults(ReportGenerator.java:105)
>         at spec.reporter.ReportGenerator.<init>(ReportGenerator.java:87)
>         at spec.reporter.ReportGenerator.main2(ReportGenerator.java:750)
>         at spec.reporter.Reporter.main2(Reporter.java:51)
>         at spec.harness.Launch.createReport(Launch.java:307)
>         at spec.harness.Launch.runBenchmarkSuite(Launch.java:250)
>         at spec.harness.Launch.main(Launch.java:452)
>
> WTF a benchmark needs that crap is beyond me, but whatever, I have
> numbers.
>
> I'll try and reproduce.

[-- Attachment #2: reproduce.sh --]
[-- Type: application/x-sh, Size: 562 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-16 23:04   ` Jirka Hladky
  2016-06-21 13:17     ` Jirka Hladky
  2016-06-22  7:16     ` Peter Zijlstra
@ 2016-06-23 18:33     ` Peter Zijlstra
  2016-06-23 18:43       ` Peter Zijlstra
  2 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-23 18:33 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Fri, Jun 17, 2016 at 01:04:23AM +0200, Jirka Hladky wrote:

> > What kind of config and userspace setup? Do you run this cruft in a
> > cgroup of sorts?
> 
>  No, we don't do any special setup except to control the number of threads.

OK, so I'm fairly certain you _do_ run in a cgroup, because it's made
almost impossible not to these days.

Run:

	grep java /proc/sched_debug

while the thing is running. That'll show you the actual cgroup the stuff
is running in.

This modern Linux stuff stinks loads. And even Debian seems infected to
the point of almost being useless :-(

The _only_ reason I could reproduce was because I recently did an
upgrade of Debian Testing and I hadn't noticed just how messed up things
had become.

When I run it in the root cgroup (I had to kill cgmanager and reboot)
the numbers are just fine.

In any case, now I gotta go look at the cgroup code...

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-23 18:33     ` Peter Zijlstra
@ 2016-06-23 18:43       ` Peter Zijlstra
  2016-06-24  7:44         ` Jirka Hladky
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-23 18:43 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Thu, Jun 23, 2016 at 08:33:18PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 17, 2016 at 01:04:23AM +0200, Jirka Hladky wrote:
> 
> > > What kind of config and userspace setup? Do you run this cruft in a
> > > cgroup of sorts?
> > 
> >  No, we don't do any special setup except to control the number of threads.
> 
> OK, so I'm fairly certain you _do_ run in a cgroup, because it's made
> almost impossible not to these days.
> 
> Run:
> 
> 	grep java /proc/sched_debug
> 
> while the thing is running. That'll show you the actual cgroup the stuff
> is running in.

That'll end up looking something like:

root@ivb-ep:/usr/src/linux-2.6# grep java /proc/sched_debug
            java  2714     18270.634925        89   120         0.000000         1.490023         0.000000 0 0 /user.slice/user-0.slice/session-2.scope
            java  2666     18643.629673         2   120         0.000000         0.063129         0.000000 0 0 /user.slice/user-0.slice/session-2.scope
            java  2676     18655.652878         3   120         0.000000         0.077127         0.000000 0 0 /user.slice/user-0.slice/session-2.scope
            java  2680     18655.683384         3   120         0.000000         0.082993         0.000000 0 0 /user.slice/user-0.slice/session-2.scope

which shows a 3 deep hierarchy. Clearly these people haven't the
faintest clue about the cost of what they're doing. This stuff ain't
free.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-23 18:43       ` Peter Zijlstra
@ 2016-06-24  7:44         ` Jirka Hladky
  2016-06-24  8:08           ` Peter Zijlstra
  2016-06-24 12:02           ` Peter Zijlstra
  0 siblings, 2 replies; 34+ messages in thread
From: Jirka Hladky @ 2016-06-24  7:44 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

Hi Peter,

thanks a lot for looking into it!

I have tried to disable autogroups

sysctl -w kernel.sched_autogroup_enabled=0

and I can confirm that performance is then back at the same level as in the 4.6 kernel.

I have double checked default settings and

kernel.sched_autogroup_enabled

is by default ON in both the 4.6 and 4.7 kernels.

Jirka

On Thu, Jun 23, 2016 at 8:43 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, Jun 23, 2016 at 08:33:18PM +0200, Peter Zijlstra wrote:
>> On Fri, Jun 17, 2016 at 01:04:23AM +0200, Jirka Hladky wrote:
>>
>> > > What kind of config and userspace setup? Do you run this cruft in a
>> > > cgroup of sorts?
>> >
>> >  No, we don't do any special setup except to control the number of threads.
>>
>> OK, so I'm fairly certain you _do_ run in a cgroup, because it's made
>> almost impossible not to these days.
>>
>> Run:
>>
>>       grep java /proc/sched_debug
>>
>> while the thing is running. That'll show you the actual cgroup the stuff
>> is running in.
>
> That'll end up looking something like:
>
> root@ivb-ep:/usr/src/linux-2.6# grep java /proc/sched_debug
>             java  2714     18270.634925        89   120         0.000000         1.490023         0.000000 0 0 /user.slice/user-0.slice/session-2.scope
>             java  2666     18643.629673         2   120         0.000000         0.063129         0.000000 0 0 /user.slice/user-0.slice/session-2.scope
>             java  2676     18655.652878         3   120         0.000000         0.077127         0.000000 0 0 /user.slice/user-0.slice/session-2.scope
>             java  2680     18655.683384         3   120         0.000000         0.082993         0.000000 0 0 /user.slice/user-0.slice/session-2.scope
>
> which shows a 3 deep hierarchy. Clearly these people haven't the
> faintest clue about the cost of what they're doing. This stuff ain't
> free.
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24  7:44         ` Jirka Hladky
@ 2016-06-24  8:08           ` Peter Zijlstra
  2016-06-24  8:20             ` Jirka Hladky
  2016-06-24 12:02           ` Peter Zijlstra
  1 sibling, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24  8:08 UTC (permalink / raw)
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote:
> I have double checked default settings and
> 
> kernel.sched_autogroup_enabled
> 
> is by default ON in both the 4.6 and 4.7 kernels.

Yeah, if you enable that CONFIG it's enabled by default. In any case, I'll
go trawl through the cgroup code now. I spent yesterday looking at the
'wrong' part of things.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24  8:08           ` Peter Zijlstra
@ 2016-06-24  8:20             ` Jirka Hladky
  0 siblings, 0 replies; 34+ messages in thread
From: Jirka Hladky @ 2016-06-24  8:20 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

I had a look and

CONFIG_SCHED_AUTOGROUP=y

is used in both RHEL6 and RHEL7.  We compile the upstream kernels with
a config derived from the RHEL7 config file.

Jirka

On Fri, Jun 24, 2016 at 10:08 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote:
>> I have double checked default settings and
>>
>> kernel.sched_autogroup_enabled
>>
>> is by default ON in both the 4.6 and 4.7 kernels.
>
> Yeah, if you enable that CONFIG it's enabled by default. In any case, I'll
> go trawl through the cgroup code now. I spent yesterday looking at the
> 'wrong' part of things.
>
>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24  7:44         ` Jirka Hladky
  2016-06-24  8:08           ` Peter Zijlstra
@ 2016-06-24 12:02           ` Peter Zijlstra
  2016-06-24 12:09             ` Jirka Hladky
  2016-06-24 12:44             ` Vincent Guittot
  1 sibling, 2 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 12:02 UTC (permalink / raw)
  To: Jirka Hladky
  Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen,
	Yuyang Du, Dietmar Eggemann, Vincent Guittot, umgwanakikbuti,
	bsegall, pjt, matt

On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote:
> Hi Peter,
> 
> thanks a lot for looking into it!
> 
> I have tried to disable autogroups
> 
> sysctl -w kernel.sched_autogroup_enabled=0
> 
> and I can confirm that performance is then back at the same level as in the 4.6 kernel.

So unless the heat has made me do really silly things, the below seems
to cure things. Could you please verify?


---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 22d64b3f5876..d4f6fb2f3057 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
 	 */
 	tg_weight = atomic_long_read(&tg->load_avg);
 	tg_weight -= cfs_rq->tg_load_avg_contrib;
-	tg_weight += cfs_rq->load.weight;
+	tg_weight += cfs_rq->avg.load_avg;
 
 	return tg_weight;
 }
@@ -2494,7 +2494,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 	long tg_weight, load, shares;
 
 	tg_weight = calc_tg_weight(tg, cfs_rq);
-	load = cfs_rq->load.weight;
+	load = cfs_rq->avg.load_avg;
 
 	shares = (tg->shares * load);
 	if (tg_weight)

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 12:02           ` Peter Zijlstra
@ 2016-06-24 12:09             ` Jirka Hladky
  2016-06-24 12:30               ` Peter Zijlstra
  2016-06-24 12:35               ` Jirka Hladky
  2016-06-24 12:44             ` Vincent Guittot
  1 sibling, 2 replies; 34+ messages in thread
From: Jirka Hladky @ 2016-06-24 12:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen,
	Yuyang Du, Dietmar Eggemann, Vincent Guittot, Mike Galbraith,
	bsegall, pjt, matt

Thank you Peter!

Should I apply it to v4.7-rc4 ?

Jirka

On Fri, Jun 24, 2016 at 2:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote:
>> Hi Peter,
>>
>> thanks a lot for looking into it!
>>
>> I have tried to disable autogroups
>>
>> sysctl -w kernel.sched_autogroup_enabled=0
>>
>> and I can confirm that performance is then back at the same level as in the 4.6 kernel.
>
> So unless the heat has made me do really silly things, the below seems
> to cure things. Could you please verify?
>
>
> ---
>  kernel/sched/fair.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 22d64b3f5876..d4f6fb2f3057 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>          */
>         tg_weight = atomic_long_read(&tg->load_avg);
>         tg_weight -= cfs_rq->tg_load_avg_contrib;
> -       tg_weight += cfs_rq->load.weight;
> +       tg_weight += cfs_rq->avg.load_avg;
>
>         return tg_weight;
>  }
> @@ -2494,7 +2494,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>         long tg_weight, load, shares;
>
>         tg_weight = calc_tg_weight(tg, cfs_rq);
> -       load = cfs_rq->load.weight;
> +       load = cfs_rq->avg.load_avg;
>
>         shares = (tg->shares * load);
>         if (tg_weight)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 12:09             ` Jirka Hladky
@ 2016-06-24 12:30               ` Peter Zijlstra
  2016-06-24 12:35               ` Jirka Hladky
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 12:30 UTC (permalink / raw)
  To: Jirka Hladky
  Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen,
	Yuyang Du, Dietmar Eggemann, Vincent Guittot, Mike Galbraith,
	bsegall, pjt, matt

On Fri, Jun 24, 2016 at 02:09:30PM +0200, Jirka Hladky wrote:
> Thank you Peter!
> 
> Should I apply it to v4.7-rc4 ?

It does indeed apply to v4.7-rc4, although I only tested it against
tip/master.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 12:09             ` Jirka Hladky
  2016-06-24 12:30               ` Peter Zijlstra
@ 2016-06-24 12:35               ` Jirka Hladky
  1 sibling, 0 replies; 34+ messages in thread
From: Jirka Hladky @ 2016-06-24 12:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen,
	Yuyang Du, Dietmar Eggemann, Vincent Guittot, Mike Galbraith,
	bsegall, pjt, matt

OK, I have applied it to v4.7-rc4 via git am.

Compiling kernel, should have the results soon.

Jirka

On Fri, Jun 24, 2016 at 2:09 PM, Jirka Hladky <jhladky@redhat.com> wrote:
> Thank you Peter!
>
> Should I apply it to v4.7-rc4 ?
>
> Jirka
>
> On Fri, Jun 24, 2016 at 2:02 PM, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote:
>>> Hi Peter,
>>>
>>> thanks a lot for looking into it!
>>>
>>> I have tried to disable autogroups
>>>
>>> sysctl -w kernel.sched_autogroup_enabled=0
>>>
>>> and I can confirm that performance is then back at the same level as in the 4.6 kernel.
>>
>> So unless the heat has made me do really silly things, the below seems
>> to cure things. Could you please verify?
>>
>>
>> ---
>>  kernel/sched/fair.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 22d64b3f5876..d4f6fb2f3057 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>>          */
>>         tg_weight = atomic_long_read(&tg->load_avg);
>>         tg_weight -= cfs_rq->tg_load_avg_contrib;
>> -       tg_weight += cfs_rq->load.weight;
>> +       tg_weight += cfs_rq->avg.load_avg;
>>
>>         return tg_weight;
>>  }
>> @@ -2494,7 +2494,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>>         long tg_weight, load, shares;
>>
>>         tg_weight = calc_tg_weight(tg, cfs_rq);
>> -       load = cfs_rq->load.weight;
>> +       load = cfs_rq->avg.load_avg;
>>
>>         shares = (tg->shares * load);
>>         if (tg_weight)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 12:02           ` Peter Zijlstra
  2016-06-24 12:09             ` Jirka Hladky
@ 2016-06-24 12:44             ` Vincent Guittot
  2016-06-24 13:08               ` Jirka Hladky
                                 ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Vincent Guittot @ 2016-06-24 12:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

Hi Peter,

On 24 June 2016 at 14:02, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote:
>> Hi Peter,
>>
>> thanks a lot for looking into it!
>>
>> I have tried to disable autogroups
>>
>> sysctl -w kernel.sched_autogroup_enabled=0
>>
>> and I can confirm that performance is then back at the same level as in the 4.6 kernel.
>
> So unless the heat has made me do really silly things, the below seems
> to cure things. Could you please verify?
>
>
> ---
>  kernel/sched/fair.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 22d64b3f5876..d4f6fb2f3057 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>          */
>         tg_weight = atomic_long_read(&tg->load_avg);
>         tg_weight -= cfs_rq->tg_load_avg_contrib;
> -       tg_weight += cfs_rq->load.weight;
> +       tg_weight += cfs_rq->avg.load_avg;

IIUC, you are reverting
commit  fde7d22e01aa (sched/fair: Fix overly small weight for
interactive group entities)

I have one question regarding the use of cfs_rq->avg.load_avg:
cfs_rq->tg_load_avg_contrib is a sample of cfs_rq->avg.load_avg, so
I'm curious to understand why you use cfs_rq->avg.load_avg instead of
keeping cfs_rq->tg_load_avg_contrib. Do you think that the sampling is
not accurate enough to prevent a significant difference between the two
when we use tg->load_avg?


>
>         return tg_weight;
>  }
> @@ -2494,7 +2494,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>         long tg_weight, load, shares;
>
>         tg_weight = calc_tg_weight(tg, cfs_rq);
> -       load = cfs_rq->load.weight;
> +       load = cfs_rq->avg.load_avg;
>
>         shares = (tg->shares * load);
>         if (tg_weight)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 12:44             ` Vincent Guittot
@ 2016-06-24 13:08               ` Jirka Hladky
  2016-06-24 13:09               ` Peter Zijlstra
  2016-06-24 13:42               ` Peter Zijlstra
  2 siblings, 0 replies; 34+ messages in thread
From: Jirka Hladky @ 2016-06-24 13:08 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

Hi Peter,

the proposed patch has fixed the performance issue. I have applied the
patch to v4.7-rc4

Jirka

On Fri, Jun 24, 2016 at 2:44 PM, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> Hi Peter,
>
> On 24 June 2016 at 14:02, Peter Zijlstra <peterz@infradead.org> wrote:
>> On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote:
>>> Hi Peter,
>>>
>>> thanks a lot for looking into it!
>>>
>>> I have tried to disable autogroups
>>>
>>> sysctl -w kernel.sched_autogroup_enabled=0
>>>
>>> and I can confirm that performance is then back at the same level as in the 4.6 kernel.
>>
>> So unless the heat has made me do really silly things, the below seems
>> to cure things. Could you please verify?
>>
>>
>> ---
>>  kernel/sched/fair.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 22d64b3f5876..d4f6fb2f3057 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>>          */
>>         tg_weight = atomic_long_read(&tg->load_avg);
>>         tg_weight -= cfs_rq->tg_load_avg_contrib;
>> -       tg_weight += cfs_rq->load.weight;
>> +       tg_weight += cfs_rq->avg.load_avg;
>
> IIUC, you are reverting
> commit  fde7d22e01aa (sched/fair: Fix overly small weight for
> interactive group entities)
>
> I have one question regarding the use of cfs_rq->avg.load_avg
> cfs_rq->tg_load_avg_contrib is the sampling of cfs_rq->avg.load_avg so
> I'm curious to understand why you use cfs_rq->avg.load_avg instead of
> keeping cfs_rq->tg_load_avg_contrib. Do you think that the sampling is
> not accurate enough to prevent any significant difference between both
> when we use tg->load_avg ?
>
>
>>
>>         return tg_weight;
>>  }
>> @@ -2494,7 +2494,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
>>         long tg_weight, load, shares;
>>
>>         tg_weight = calc_tg_weight(tg, cfs_rq);
>> -       load = cfs_rq->load.weight;
>> +       load = cfs_rq->avg.load_avg;
>>
>>         shares = (tg->shares * load);
>>         if (tg_weight)

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 12:44             ` Vincent Guittot
  2016-06-24 13:08               ` Jirka Hladky
@ 2016-06-24 13:09               ` Peter Zijlstra
  2016-06-24 13:23                 ` Vincent Guittot
  2016-06-24 13:42               ` Peter Zijlstra
  2 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 13:09 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote:

> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 22d64b3f5876..d4f6fb2f3057 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
> >          */
> >         tg_weight = atomic_long_read(&tg->load_avg);
> >         tg_weight -= cfs_rq->tg_load_avg_contrib;
> > -       tg_weight += cfs_rq->load.weight;
> > +       tg_weight += cfs_rq->avg.load_avg;
> 
> IIUC, you are reverting
> commit  fde7d22e01aa (sched/fair: Fix overly small weight for
> interactive group entities)

Ah!, I hadn't yet done a git-blame on this. Right you are, we should
have put a comment there.

So the problem here is that since commit:

  2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")

load.weight and avg.load_avg are in different metrics. Which completely
wrecked things.

The obvious alternative is using:

	scale_load_down(cfs_rq->load.weight);

Let me go run that through the benchmark.
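
To make the unit mismatch concrete, here is a simplified userspace sketch
of the shares arithmetic from the two hunks quoted above. It follows only
the lines visible in the diff; the struct and function names are
hypothetical stand-ins, the factor of 1024 between the two units is an
assumption, and the numbers in the note below are invented for
illustration.

```c
/* Hypothetical stand-ins for the task_group fields used in the diff. */
struct tg_sketch {
	long long shares;   /* tg->shares */
	long long load_avg; /* tg->load_avg: sum of all per-CPU contributions */
};

/* Hypothetical stand-ins for the per-CPU cfs_rq fields used in the diff. */
struct cfs_rq_sketch {
	long long tg_load_avg_contrib; /* this CPU's part of tg->load_avg */
	long long local_load;          /* the term the patch changes */
};

/*
 * Mirrors the visible arithmetic: swap this CPU's stale contribution for
 * its current load, then hand out shares in proportion to that load.
 */
static inline long long sketch_calc_shares(const struct tg_sketch *tg,
					   const struct cfs_rq_sketch *rq)
{
	long long tg_weight = tg->load_avg - rq->tg_load_avg_contrib
			      + rq->local_load;
	long long shares = tg->shares * rq->local_load;

	if (tg_weight)
		shares /= tg_weight;

	return shares;
}
```

Take a group with shares of 1048576 spread evenly over two CPUs (group
load_avg 2048, contribution 1024 each). A local term of 1024, in the same
unit as load_avg, yields 524288, i.e. half. Feeding in 1048576 instead, as
an unscaled load.weight would be under the assumed factor, makes the local
term swamp the remote one and the result lands just below the full
1048576.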

> I have one question regarding the use of cfs_rq->avg.load_avg
> cfs_rq->tg_load_avg_contrib is the sampling of cfs_rq->avg.load_avg so
> I'm curious to understand why you use cfs_rq->avg.load_avg instead of
> keeping cfs_rq->tg_load_avg_contrib. Do you think that the sampling is
> not accurate enough to prevent any significant difference between both
> when we use tg->load_avg ?

I'm not entirely sure I understand your question; is it to the existence
of calc_tg_weight()? That is, why use calc_tg_weight() and not use
tg->load_avg as is?

It seemed like a simple and cheap way to increase accuracy, nothing more
behind it until the commit you referred to.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 13:09               ` Peter Zijlstra
@ 2016-06-24 13:23                 ` Vincent Guittot
  2016-06-24 13:33                   ` Peter Zijlstra
  2016-06-24 13:45                   ` Peter Zijlstra
  0 siblings, 2 replies; 34+ messages in thread
From: Vincent Guittot @ 2016-06-24 13:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

On 24 June 2016 at 15:09, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote:
>
>> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> > index 22d64b3f5876..d4f6fb2f3057 100644
>> > --- a/kernel/sched/fair.c
>> > +++ b/kernel/sched/fair.c
>> > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>> >          */
>> >         tg_weight = atomic_long_read(&tg->load_avg);
>> >         tg_weight -= cfs_rq->tg_load_avg_contrib;
>> > -       tg_weight += cfs_rq->load.weight;
>> > +       tg_weight += cfs_rq->avg.load_avg;
>>
>> IIUC, you are reverting
>> commit  fde7d22e01aa (sched/fair: Fix overly small weight for
>> interactive group entities)
>
> Ah!, I hadn't yet done a git-blame on this. Right you are, we should
> have put a comment there.
>
> So the problem here is that since commit:
>
>   2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
>
> load.weight and avg.load_avg are in different metrics. Which completely
> wrecked things.
>
> The obvious alternative is using:
>
>         scale_load_down(cfs_rq->load.weight);
>
> Let me go run that through the benchmark.

Yes, that looks like a good alternative.

>
>> I have one question regarding the use of cfs_rq->avg.load_avg.
>> cfs_rq->tg_load_avg_contrib is a sample of cfs_rq->avg.load_avg, so
>> I'm curious to understand why you use cfs_rq->avg.load_avg instead of
>> keeping cfs_rq->tg_load_avg_contrib. Do you think that the sampling is
>> not accurate enough to prevent a significant difference between the two
>> when we use tg->load_avg?
>
> I'm not entirely sure I understand your question; is it to the existence
> of calc_tg_weight()? That is, why use calc_tg_weight() and not use
> tg->load_avg as is?

Yes

>
> It seemed like a simple and cheap way to increase accuracy, nothing more
> behind it until the commit you referred to.

Thanks for the clarification.
I thought that the difference should always be smaller than 1/64th of
cfs_rq->avg.load_avg, thanks to update_tg_load_avg().
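
A minimal sketch of the 1/64th threshold mentioned above, written as a
userspace C model rather than the kernel code itself. The struct and
function names ending in _sketch are hypothetical stand-ins, and the
assumption is that a cfs_rq folds its load average into the group sum
only once it has drifted by more than 1/64th of its last contribution:

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct task_group: only the summed load. */
struct tg_sketch {
	long load_avg;
};

/* Hypothetical stand-in for struct cfs_rq: only the fields used here. */
struct cfs_rq_sketch {
	long load_avg;            /* models cfs_rq->avg.load_avg */
	long tg_load_avg_contrib; /* last value folded into tg->load_avg */
	struct tg_sketch *tg;
};

/*
 * Fold the cfs_rq load into the group sum only when it drifted by more
 * than 1/64th of the last contribution (or when forced).
 * Returns 1 if the group sum was updated, 0 if it was left stale.
 */
int update_tg_load_avg_sketch(struct cfs_rq_sketch *cfs_rq, int force)
{
	long delta = cfs_rq->load_avg - cfs_rq->tg_load_avg_contrib;

	if (force || labs(delta) > cfs_rq->tg_load_avg_contrib / 64) {
		cfs_rq->tg->load_avg += delta;
		cfs_rq->tg_load_avg_contrib = cfs_rq->load_avg;
		return 1;
	}
	return 0;
}
```

With a last contribution of 6400, a drift of 50 stays below the
threshold (6400 / 64 = 100) and leaves the group sum stale, while a
drift of 200 is propagated.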

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 13:23                 ` Vincent Guittot
@ 2016-06-24 13:33                   ` Peter Zijlstra
  2016-06-24 13:45                   ` Peter Zijlstra
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 13:33 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

On Fri, Jun 24, 2016 at 03:23:37PM +0200, Vincent Guittot wrote:
> On 24 June 2016 at 15:09, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote:
> >
> >> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> > index 22d64b3f5876..d4f6fb2f3057 100644
> >> > --- a/kernel/sched/fair.c
> >> > +++ b/kernel/sched/fair.c
> >> > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
> >> >          */
> >> >         tg_weight = atomic_long_read(&tg->load_avg);
> >> >         tg_weight -= cfs_rq->tg_load_avg_contrib;
> >> > -       tg_weight += cfs_rq->load.weight;
> >> > +       tg_weight += cfs_rq->avg.load_avg;
> >>
> >> IIUC, you are reverting
> >> commit  fde7d22e01aa (sched/fair: Fix overly small weight for
> >> interactive group entities)
> >
> > Ah!, I hadn't yet done a git-blame on this. Right you are, we should
> > have put a comment there.
> >
> > So the problem here is that since commit:
> >
> >   2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")
> >
> > load.weight and avg.load_avg are in different metrics. Which completely
> > wrecked things.
> >
> > The obvious alternative is using:
> >
> >         scale_load_down(cfs_rq->load.weight);
> >
> > Let me go run that through the benchmark.
> 
> > Yes, that looks like a good alternative.

Does indeed also work. Let me go write a Changelog and try and magic it
into sched/urgent.
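
To illustrate the unit mismatch discussed above, here is a small
userspace sketch. It assumes that on 64-bit kernels load.weight carries
10 extra bits of resolution (SCHED_FIXEDPOINT_SHIFT = 10) while
avg.load_avg does not; the macros are local copies made for this
example, and calc_tg_weight_sketch() with its sample values is
hypothetical:

```c
/* Local copies of the scaling helpers, assumed to match 64-bit kernels. */
#define SCHED_FIXEDPOINT_SHIFT	10
#define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
#define scale_load_down(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)

/*
 * Same arithmetic as the quoted calc_tg_weight() hunk: take the group
 * sum, drop this cfs_rq's stale contribution, add its current weight.
 * The caller decides in which unit local_weight is expressed.
 */
long calc_tg_weight_sketch(long tg_load_avg, long tg_load_avg_contrib,
			   long local_weight)
{
	long tg_weight = tg_load_avg;

	tg_weight -= tg_load_avg_contrib;
	tg_weight += local_weight;

	return tg_weight;
}
```

For a group spread over four equally loaded cfs_rqs (group sum 4096,
contribution 1024 each), adding the high-resolution weight
scale_load(1024L) = 1048576 yields 1051648, so the local queue
dominates the sum. Adding scale_load_down() of that weight yields 4096,
which is in the same unit as the rest of the sum.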

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 13:23                 ` Vincent Guittot
  2016-06-24 13:33                   ` Peter Zijlstra
@ 2016-06-24 13:45                   ` Peter Zijlstra
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 13:45 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

On Fri, Jun 24, 2016 at 03:23:37PM +0200, Vincent Guittot wrote:
> > It seemed like a simple and cheap way to increase accuracy, nothing more
> > behind it until the commit you referred to.
> 
> Thanks for the clarification.
> I thought that the difference should always be smaller than 1/64th of
> the cfs_rq->avg.load_avg thanks to update_tg_load_avg

Right, another reason I just remembered is that it ensures:

	tg_weight >= cfs_rq_weight

Because if this is the only task in the entire group and the cfs_rq load
increased (but did not exceed the 1/64th threshold), you would get the
group weight being smaller than the entity weight, which would be weird.
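
A small sketch of that invariant, assuming a group with a single cfs_rq
whose load rose by less than the propagation threshold. The function
names and the sample numbers are hypothetical and only model the
arithmetic:

```c
/* Group weight if tg->load_avg were used as is (possibly stale). */
long tg_weight_raw(long tg_load_avg)
{
	return tg_load_avg;
}

/*
 * Group weight with this cfs_rq's stale contribution swapped for its
 * current weight, as calc_tg_weight() does.
 */
long tg_weight_corrected(long tg_load_avg, long tg_load_avg_contrib,
			 long cfs_rq_weight)
{
	return tg_load_avg - tg_load_avg_contrib + cfs_rq_weight;
}
```

With a group sum and contribution of 6400 and a current cfs_rq weight
of 6450 (a drift of 50, below 6400 / 64 = 100), the raw group weight is
6400 and falls below the cfs_rq weight, while the corrected value is
6450 and keeps tg_weight >= cfs_rq_weight.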

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 12:44             ` Vincent Guittot
  2016-06-24 13:08               ` Jirka Hladky
  2016-06-24 13:09               ` Peter Zijlstra
@ 2016-06-24 13:42               ` Peter Zijlstra
  2016-06-24 15:54                 ` Peter Zijlstra
  2 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 13:42 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote:
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
> >          */
> >         tg_weight = atomic_long_read(&tg->load_avg);
> >         tg_weight -= cfs_rq->tg_load_avg_contrib;
> > -       tg_weight += cfs_rq->load.weight;
> > +       tg_weight += cfs_rq->avg.load_avg;
> 
> IIUC, you are reverting
> commit  fde7d22e01aa (sched/fair: Fix overly small weight for
> interactive group entities)

Hurm.. looking at that commit again, that seems to wreck
effective_load(), since that doesn't compensate.

Maybe I'll remove calc_tg_weight and open code its slightly different
usages in the two sites.

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 13:42               ` Peter Zijlstra
@ 2016-06-24 15:54                 ` Peter Zijlstra
  2016-06-24 22:13                   ` Jirka Hladky
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 15:54 UTC (permalink / raw)
  To: Vincent Guittot
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

On Fri, Jun 24, 2016 at 03:42:26PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote:
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
> > >          */
> > >         tg_weight = atomic_long_read(&tg->load_avg);
> > >         tg_weight -= cfs_rq->tg_load_avg_contrib;
> > > -       tg_weight += cfs_rq->load.weight;
> > > +       tg_weight += cfs_rq->avg.load_avg;
> > 
> > IIUC, you are reverting
> > commit  fde7d22e01aa (sched/fair: Fix overly small weight for
> > interactive group entities)
> 
> Hurm.. looking at that commit again, that seems to wreck
> effective_load(), since that doesn't compensate.
> 
> Maybe I'll remove calc_tg_weight and open code its slightly different
> usages in the two sites.

OK, sorry for not actually posting, but I need to run. Please find the
two patches in:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 15:54                 ` Peter Zijlstra
@ 2016-06-24 22:13                   ` Jirka Hladky
  0 siblings, 0 replies; 34+ messages in thread
From: Jirka Hladky @ 2016-06-24 22:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Vincent Guittot, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

Hi Peter,

I have compiled your version of the Linux kernel and run the SPECjvm2008
tests. The results are fine; performance is at the level of the 4.6 kernel.

$ git rev-parse HEAD
02548776ded1185e6e16ad0a475481e982741ee9

Jirka




On Fri, Jun 24, 2016 at 5:54 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jun 24, 2016 at 03:42:26PM +0200, Peter Zijlstra wrote:
>> On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote:
>> > > --- a/kernel/sched/fair.c
>> > > +++ b/kernel/sched/fair.c
>> > > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>> > >          */
>> > >         tg_weight = atomic_long_read(&tg->load_avg);
>> > >         tg_weight -= cfs_rq->tg_load_avg_contrib;
>> > > -       tg_weight += cfs_rq->load.weight;
>> > > +       tg_weight += cfs_rq->avg.load_avg;
>> >
>> > IIUC, you are reverting
>> > commit  fde7d22e01aa (sched/fair: Fix overly small weight for
>> > interactive group entities)
>>
>> Hurm.. looking at that commit again, that seems to wreck
>> effective_load(), since that doesn't compensate.
>>
>> Maybe I'll remove calc_tg_weight and open code its slightly different
>> usages in the two sites.
>
> OK, sorry for not actually posting, but I need to run. Please find the
> two patches in:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
@ 2016-06-22  7:37 Branimir Maksimovic
  2016-06-22  8:25 ` Jirka Hladky
  0 siblings, 1 reply; 34+ messages in thread
From: Branimir Maksimovic @ 2016-06-22  7:37 UTC (permalink / raw)
  To: linux-kernel

Could it be related to this?

https://www.phoronix.com/scan.php?page=news_item&px=P-State-Possible-4.6-Regression


On Thu, 16 Jun 2016 18:40:01 +0200
Jirka Hladky <jhladky@redhat.com> wrote:

 > Hello,
 >
 > we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008
 > benchmarks starting from 4.7.0-0.rc0 kernel compared to 4.6 kernel.
 >
 > We have tested kernels 4.7.0-0.rc1 and 4.7.0-0.rc3 and these are as
 > well affected.
 >
 > We have observed the drop on variety of different x86_64 servers with
 > different configuration (different CPU models, RAM sizes, both with
 > Hyper Threading ON and OFF, different NUMA configurations (2 and 4
 > NUMA nodes)
 >
 > Linpack and Stream benchmarks do not show any performance drop.
 >
 > The performance drop increases with higher number of threads. The
 > maximum number of threads in each benchmark is the same as number of
 > CPUs.
 >
 > We have opened a BZ to track the progress:
 > https://bugzilla.kernel.org/show_bug.cgi?id=120481
 >
 > You can find more details along with graphs and tables there.
 >
 > Do you have any hints which commit should we try to reverse?
 >
 > Thanks a lot!
 > Jirka

* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22  7:37 Branimir Maksimovic
@ 2016-06-22  8:25 ` Jirka Hladky
  0 siblings, 0 replies; 34+ messages in thread
From: Jirka Hladky @ 2016-06-22  8:25 UTC (permalink / raw)
  To: Branimir Maksimovic; +Cc: linux-kernel

Hi Branimir,

I don't think it's related. The regression was introduced by one of
these two commits:

$ git log --pretty=oneline e7904a28f5331c21d17af638cb477c83662e3cb6..6ecdd74962f246dfe8750b7bea481a1c0816315d
6ecdd74962f246dfe8750b7bea481a1c0816315d sched/fair: Generalize the load/util averages resolution definition
2159197d66770ec01f75c93fb11dc66df81fd45b sched/core: Enable increased load resolution on 64-bit kernels

Please see
https://bugzilla.kernel.org/show_bug.cgi?id=120481
for the details.

Jirka




On Wed, Jun 22, 2016 at 9:37 AM, Branimir Maksimovic
<branimir.maksimovic@gmail.com> wrote:
> Could it be related to this:
>
> https://www.phoronix.com/scan.php?page=news_item&px=P-State-Possible-4.6-Regression
>
>
> On Thu, 16 Jun 2016 18:40:01 +0200
> Jirka Hladky <jhladky@redhat.com> wrote:
>
>> Hello,
>>
>> we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008
>> benchmarks starting from 4.7.0-0.rc0 kernel compared to 4.6 kernel.
>>
>> We have tested kernels 4.7.0-0.rc1 and 4.7.0-0.rc3 and these are as
>> well affected.
>>
>> We have observed the drop on variety of different x86_64 servers with
>> different configuration (different CPU models, RAM sizes, both with
>> Hyper Threading ON and OFF, different NUMA configurations (2 and 4
>> NUMA nodes)
>>
>> Linpack and Stream benchmarks do not show any performance drop.
>>
>> The performance drop increases with higher number of threads. The
>> maximum number of threads in each benchmark is the same as number of
>> CPUs.
>>
>> We have opened a BZ to track the progress:
>> https://bugzilla.kernel.org/show_bug.cgi?id=120481
>>
>> You can find more details along with graphs and tables there.
>>
>> Do you have any hints which commit should we try to reverse?
>>
>> Thanks a lot!
>> Jirka
>
