* Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel @ 2016-06-16 16:38 Jirka Hladky 2016-06-16 17:22 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: Jirka Hladky @ 2016-06-16 16:38 UTC (permalink / raw) To: linux-kernel; +Cc: Ingo Molnar, Peter Zijlstra, Kamil Kolakowski Hello, we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks starting from 4.7.0-0.rc0 kernel compared to 4.6 kernel. We have tested kernels 4.7.0-0.rc1 and 4.7.0-0.rc3 and these are as well affected. We have observed the drop on variety of different x86_64 servers with different configuration (different CPU models, RAM sizes, both with Hyper Threading ON and OFF, different NUMA configurations (2 and 4 NUMA nodes) Linpack and Stream benchmarks do not show any performance drop. The performance drop increases with higher number of threads. The maximum number of threads in each benchmark is the same as number of CPUs. We have opened a BZ to track the progress: https://bugzilla.kernel.org/show_bug.cgi?id=120481 You can find more details along with graphs and tables there. Do you have any hints which commit should we try to reverse? Thanks a lot! Jirka ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-16 16:38 Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel Jirka Hladky @ 2016-06-16 17:22 ` Peter Zijlstra 2016-06-16 23:04 ` Jirka Hladky 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2016-06-16 17:22 UTC (permalink / raw) To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski On Thu, Jun 16, 2016 at 06:38:50PM +0200, Jirka Hladky wrote: > Hello, > > we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008 Blergh, of course I don't have those.. :/ > benchmarks starting from 4.7.0-0.rc0 kernel compared to 4.6 kernel. > > We have tested kernels 4.7.0-0.rc1 and 4.7.0-0.rc3 and these are as > well affected. > > We have observed the drop on variety of different x86_64 servers with > different configuration (different CPU models, RAM sizes, both with > Hyper Threading ON and OFF, different NUMA configurations (2 and 4 > NUMA nodes) What kind of config and userspace setup? Do you run this cruft in a cgroup of sorts? If so, does it change anything if you run it in the root cgroup? > Linpack and Stream benchmarks do not show any performance drop. > > The performance drop increases with higher number of threads. The > maximum number of threads in each benchmark is the same as number of > CPUs. > > We have opened a BZ to track the progress: > https://bugzilla.kernel.org/show_bug.cgi?id=120481 > > You can find more details along with graphs and tables there. > > Do you have any hints which commit should we try to reverse? There were only 66 commits or so, and I think we can rule out the hotplug changes, which should reduce it even further. 
You could see what the parent of this one does: 2159197d6677 sched/core: Enable increased load resolution on 64-bit kernels If not that, maybe the parent of: c58d25f371f5 sched/fair: Move record_wakee() After that I suppose you'll have to go bisect. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-16 17:22 ` Peter Zijlstra @ 2016-06-16 23:04 ` Jirka Hladky 2016-06-21 13:17 ` Jirka Hladky ` (2 more replies) 0 siblings, 3 replies; 34+ messages in thread From: Jirka Hladky @ 2016-06-16 23:04 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski > > we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008 > Blergh, of course I don't have those.. :/ SPECjvm2008 is publicly available. https://www.spec.org/download.html We will prepare a reproducer and attach it to the BZ. > What kind of config and userspace setup? Do you run this cruft in a > cgroup of sorts? No, we don't do any special setup except to control the number of threads. Thanks for the hints which commits are most likely the root cause for this. We will try to find the commit which has caused it. Jirka ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-16 23:04 ` Jirka Hladky @ 2016-06-21 13:17 ` Jirka Hladky 2016-06-22 7:16 ` Peter Zijlstra 2016-06-23 18:33 ` Peter Zijlstra 2 siblings, 0 replies; 34+ messages in thread From: Jirka Hladky @ 2016-06-21 13:17 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski Hi Peter, I have an update for this performance issue. I have tested several kernels, I'm now at the parent of 2159197d6677 sched/core: Enable increased load resolution on 64-bit kernels and I still see the performance regression for multithreaded workloads. There are only 27 commits remaining between v4.6 (last known to be OK) and current HEAD (6ecdd74962f246dfe8750b7bea481a1c0816315d) 6ecdd74962f246dfe8750b7bea481a1c0816315d sched/fair: Generalize the load/util averages resolution definition See below [0]. Any hint which commit should I try now? Thanks a lot! 
Jirka [0] $ git log --pretty=oneline v4.6..HEAD kernel/sched 6ecdd74962f246dfe8750b7bea481a1c0816315d sched/fair: Generalize the load/util averages resolution definition 2159197d66770ec01f75c93fb11dc66df81fd45b sched/core: Enable increased load resolution on 64-bit kernels e7904a28f5331c21d17af638cb477c83662e3cb6 locking/lockdep, sched/core: Implement a better lock pinning scheme eb58075149b7f0300ff19142e6245fe75db2a081 sched/core: Introduce 'struct rq_flags' 3e71a462dd483ce508a723356b293731e7d788ea sched/core: Move task_rq_lock() out of line 64b7aad5798478ffff52e110878ccaae4c3aaa34 Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes f98db6013c557c216da5038d9c52045be55cd039 sched/core: Add switch_mm_irqs_off() and use it in the scheduler 594dd290cf5403a9a5818619dfff42d8e8e0518e sched/cpufreq: Optimize cpufreq update kicker to avoid update multiple times fec148c000d0f9ac21679601722811eb60b4cc52 sched/deadline: Fix a bug in dl_overflow() 9fd81dd5ce0b12341c9f83346f8d32ac68bd3841 sched/fair: Optimize !CONFIG_NO_HZ_COMMON CPU load updates 1f41906a6fda1114debd3898668bd7ab6470ee41 sched/fair: Correctly handle nohz ticks CPU load accounting cee1afce3053e7aa0793fbd5f2e845fa2cef9e33 sched/fair: Gather CPU load functions under a more conventional namespace a2c6c91f98247fef0fe75216d607812485aeb0df sched/fair: Call cpufreq hook in additional paths 41e0d37f7ac81297c07ba311e4ad39465b8c8295 sched/fair: Do not call cpufreq hook unless util changed 21e96f88776deead303ecd30a17d1d7c2a1776e3 sched/fair: Move cpufreq hook to update_cfs_rq_load_avg() 1f621e028baf391f6684003e32e009bc934b750f sched/fair: Fix asym packing to select correct CPU bd92883051a0228cc34996b8e766111ba10c9aac sched/cpuacct: Check for NULL when using task_pt_regs() 2c923e94cd9c6acff3b22f0ae29cfe65e2658b40 sched/clock: Make local_clock()/cpu_clock() inline c78b17e28cc2c2df74264afc408bdc6aaf3fbcc8 sched/clock: Remove pointless test in cpu_clock/local_clock 
fb90a6e93c0684ab2629a42462400603aa829b9c sched/debug: Don't dump sched debug info in SysRq-W 2b8c41daba327c633228169e8bd8ec067ab443f8 sched/fair: Initiate a new task's util avg to a bounded value 1c3de5e19fc96206dd086e634129d08e5f7b1000 sched/fair: Update comments after a variable rename 47252cfbac03644ee4a3adfa50c77896aa94f2bb sched/core: Add preempt checks in preempt_schedule() code bfdb198ccd99472c5bded689699eb30dd06316bb sched/numa: Remove unnecessary NUMA dequeue update from non-SMP kernels d02c071183e1c01a76811c878c8a52322201f81f sched/fair: Reset nr_balance_failed after active balancing d740037fac7052e49450f6fa1454f1144a103b55 sched/cpuacct: Split usage accounting into user_usage and sys_usage 5ca3726af7f66a8cc71ce4414cfeb86deb784491 sched/cpuacct: Show all possible CPUs in cpuacct output ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-16 23:04 ` Jirka Hladky 2016-06-21 13:17 ` Jirka Hladky @ 2016-06-22 7:16 ` Peter Zijlstra 2016-06-22 7:49 ` Peter Zijlstra 2016-06-22 8:20 ` Jirka Hladky 2016-06-23 18:33 ` Peter Zijlstra 2 siblings, 2 replies; 34+ messages in thread From: Peter Zijlstra @ 2016-06-22 7:16 UTC (permalink / raw) To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski On Fri, Jun 17, 2016 at 01:04:23AM +0200, Jirka Hladky wrote: > > > we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008 > > Blergh, of course I don't have those.. :/ > > SPECjvm2008 is publicly available. > https://www.spec.org/download.html Urgh, I _so_ hate java. Why does it have to pop up windows split between my screens, total fail. In any case, I run it like: java -jar SPECjvm2008.jar --benchmarkThreads 40 because I have 40 cpus (2 sockets * 10 cores/socket * 2 threads/core). It seems to produce numbers, but then ends with a splat: Error while creating report: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper at java.awt.Toolkit.loadAssistiveTechnologies(Toolkit.java:807) at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:886) at sun.swing.SwingUtilities2.getSystemMnemonicKeyMask(SwingUtilities2.java:2020) at javax.swing.plaf.basic.BasicLookAndFeel.initComponentDefaults(BasicLookAndFeel.java:1158) at javax.swing.plaf.metal.MetalLookAndFeel.initComponentDefaults(MetalLookAndFeel.java:431) at javax.swing.plaf.basic.BasicLookAndFeel.getDefaults(BasicLookAndFeel.java:148) at javax.swing.plaf.metal.MetalLookAndFeel.getDefaults(MetalLookAndFeel.java:1577) at javax.swing.UIManager.setLookAndFeel(UIManager.java:539) at javax.swing.UIManager.setLookAndFeel(UIManager.java:579) at javax.swing.UIManager.initializeDefaultLAF(UIManager.java:1349) at 
javax.swing.UIManager.initialize(UIManager.java:1459) at javax.swing.UIManager.maybeInitialize(UIManager.java:1426) at javax.swing.UIManager.getDefaults(UIManager.java:659) at javax.swing.UIManager.getColor(UIManager.java:701) at org.jfree.chart.JFreeChart.<clinit>(JFreeChart.java:246) at org.jfree.chart.ChartFactory.createXYLineChart(ChartFactory.java:1478) at spec.reporter.BenchmarkChart.<init>(BenchmarkChart.java:47) at spec.reporter.ReportGenerator.handleBenchmarkResult(ReportGenerator.java:141) at spec.reporter.ReportGenerator.handleBenchmarksResults(ReportGenerator.java:105) at spec.reporter.ReportGenerator.<init>(ReportGenerator.java:87) at spec.reporter.ReportGenerator.main2(ReportGenerator.java:750) at spec.reporter.Reporter.main2(Reporter.java:51) at spec.harness.Launch.createReport(Launch.java:307) at spec.harness.Launch.runBenchmarkSuite(Launch.java:250) at spec.harness.Launch.main(Launch.java:452) WTF a benchmark needs that crap is beyond me, but whatever, I have numbers. I'll try and reproduce. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-22 7:16 ` Peter Zijlstra @ 2016-06-22 7:49 ` Peter Zijlstra 2016-06-22 7:54 ` Peter Zijlstra 2016-06-22 8:20 ` Jirka Hladky 1 sibling, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2016-06-22 7:49 UTC (permalink / raw) To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski On Wed, Jun 22, 2016 at 09:16:01AM +0200, Peter Zijlstra wrote: > WTF a benchmark needs that crap is beyond me, but whatever, I have > numbers. Oh, shaft me harder, its XML shite :/ How is a sane person ever going to get numbers out. I'm >.< close to giving up on this site and declaring the thing -EDONTCARE. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-22 7:49 ` Peter Zijlstra @ 2016-06-22 7:54 ` Peter Zijlstra 2016-06-22 9:52 ` Jirka Hladky 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2016-06-22 7:54 UTC (permalink / raw) To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski On Wed, Jun 22, 2016 at 09:49:41AM +0200, Peter Zijlstra wrote: > On Wed, Jun 22, 2016 at 09:16:01AM +0200, Peter Zijlstra wrote: > > WTF a benchmark needs that crap is beyond me, but whatever, I have > > numbers. > > Oh, shaft me harder, its XML shite :/ How is a sane person ever going to > get numbers out. > > I'm >.< close to giving up on this site and declaring the thing > -EDONTCARE. OK, done.. have a look at this: /tmp/SPECjvm2008/compiler.compiler/compiler/src/share/classes/javax/lang/model/element/Name.java:54: cannot access java.lang.CharSequence bad class file: spec.benchmarks.compiler.SpecFileManager$CachedFileObject@1c06fce6 bad constant pool tag: 18 at 10 Please remove or make sure it appears in the correct subdirectory of the classpath. public interface Name extends CharSequence { ^ ERROR: compiler exit code: 1 Warmup (120s) begins: Wed Jun 22 09:45:33 CEST 2016 /tmp/SPECjvm2008/compiler.compiler/compiler/src/share/classes/javax/lang/model/element/Name.java:54: cannot access java.lang.CharSequence bad class file: spec.benchmarks.compiler.SpecFileManager$CachedFileObject@1c06fce6 bad constant pool tag: 18 at 10 Please remove or make sure it appears in the correct subdirectory of the classpath. public interface Name extends CharSequence { ^ /tmp/SPECjvm2008/compiler.compiler/compiler/src/share/classes/javax/lang/model/element/Name.java:54: cannot access java.lang.CharSequence bad class file: spec.benchmarks.compiler.SpecFileManager$CachedFileObject@1c06fce6 bad constant pool tag: 18 at 10 Please remove or make sure it appears in the correct subdirectory of the classpath. 
Clearly this stuff just isn't made to be used. /me goes do something useful. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-22 7:54 ` Peter Zijlstra @ 2016-06-22 9:52 ` Jirka Hladky 2016-06-22 11:12 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: Jirka Hladky @ 2016-06-22 9:52 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski Hi Peter, the performance regression has been caused by this commit ================================================= commit 6ecdd74962f246dfe8750b7bea481a1c0816315d Author: Yuyang Du <yuyang.du@intel.com> Date: Tue Apr 5 12:12:26 2016 +0800 sched/fair: Generalize the load/util averages resolution definition ================================================= Could you please have a look? Thanks a lot! Jirka ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-22 9:52 ` Jirka Hladky @ 2016-06-22 11:12 ` Peter Zijlstra 2016-06-22 12:37 ` Jirka Hladky 0 siblings, 1 reply; 34+ messages in thread From: Peter Zijlstra @ 2016-06-22 11:12 UTC (permalink / raw) To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski On Wed, Jun 22, 2016 at 11:52:45AM +0200, Jirka Hladky wrote: > Hi Peter, > > the performance regression has been caused by this commit > > ================================================= > commit 6ecdd74962f246dfe8750b7bea481a1c0816315d > Author: Yuyang Du <yuyang.du@intel.com> > Date: Tue Apr 5 12:12:26 2016 +0800 > > sched/fair: Generalize the load/util averages resolution definition > ================================================= > > Could you please have a look? That patch looks like a NO-OP to me. In any case, the good news is that I can run the benchmark, the bad news is that the patch you fingered doesn't appear to be it. v4.6.0: ./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.transform: 2007.18 ops/m ./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.validation: 2999.44 ops/m tip/master: ./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.transform: 1283.14 ops/m ./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.validation: 2008.62 ops/m patch^1 ./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.transform: 1196.18 ops/m ./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.validation: 2055.11 ops/m patch^1 + patch ./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.transform: 1294.59 ops/m ./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.validation: 2140.02 ops/m ^ permalink raw reply [flat|nested] 34+ messages in thread
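To put the four result sets above on a common scale, the scores can be parsed and expressed as a drop relative to the 4.6.0 run. The following is only a sketch: the result lines are copied from the message above, while the regular expression, the function names and the choice of the `4.6.0` directory as baseline are assumptions made for illustration.

```python
import re

# Result lines copied from the message above ("file:Score on ..." format).
RAW = """\
./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.transform: 2007.18 ops/m
./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.validation: 2999.44 ops/m
./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.transform: 1283.14 ops/m
./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.validation: 2008.62 ops/m
./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.transform: 1196.18 ops/m
./4.6.0-rc5-00034-g2159197/2016-Jun-22_12h38m50s.log:Score on xml.validation: 2055.11 ops/m
./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.transform: 1294.59 ops/m
./4.6.0-rc5-00034-g2159197-dirty/2016-Jun-22_12h55m43s.log:Score on xml.validation: 2140.02 ops/m
"""

# Hypothetical pattern: kernel directory, log file, benchmark name, score.
LINE_RE = re.compile(
    r"^\./(?P<kernel>[^/]+)/[^:]+:Score on (?P<bench>[^:\s]+): "
    r"(?P<score>[0-9.]+) ops/m$"
)


def parse_scores(text):
    """Return {kernel: {benchmark: score}} for every matching line."""
    scores = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            per_kernel = scores.setdefault(match["kernel"], {})
            per_kernel[match["bench"]] = float(match["score"])
    return scores


def relative_drop(baseline, value):
    """Drop relative to the baseline in percent (positive means slower)."""
    return (baseline - value) / baseline * 100.0


scores = parse_scores(RAW)
baseline = scores["4.6.0"]
for kernel, results in scores.items():
    if kernel == "4.6.0":
        continue
    for bench, value in results.items():
        drop = relative_drop(baseline[bench], value)
        print(f"{kernel:32s} {bench:15s} {drop:5.1f}% below 4.6.0")
```

With these numbers every non-baseline run comes out roughly 29% to 40% below the 4.6.0 scores, which is in line with the 30-40% drop reported at the start of the thread.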
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-22 11:12 ` Peter Zijlstra @ 2016-06-22 12:37 ` Jirka Hladky 2016-06-22 12:46 ` Jirka Hladky 0 siblings, 1 reply; 34+ messages in thread From: Jirka Hladky @ 2016-06-22 12:37 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski Hi Peter, crap - I have done bisecting manually (not using git bisect) and I have probably done some mistake. Commits (git checkout <commit>) for which I got BAD results: 2159197d66770ec01f75c93fb11dc66df81fd45b 6ecdd74962f246dfe8750b7bea481a1c0816315d Commits (git checkout <commit>) for which I got GOOD results: 21e96f88776deead303ecd30a17d1d7c2a1776e3 64b7aad5798478ffff52e110878ccaae4c3aaa34 e7904a28f5331c21d17af638cb477c83662e3cb6 I will try to use git bisect now. Jirka ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-22 12:37 ` Jirka Hladky @ 2016-06-22 12:46 ` Jirka Hladky 2016-06-22 14:41 ` Jirka Hladky 0 siblings, 1 reply; 34+ messages in thread From: Jirka Hladky @ 2016-06-22 12:46 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski OK, I have reviewed my results once again: This commit is fine: 64b7aad - Ingo Molnar, 7 weeks ago : Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes This version has already a problem: 2159197 - Peter Zijlstra, 8 weeks ago : sched/core: Enable increased load resolution on 64-bit kernels git bisect start git bisect good 64b7aad git bisect bad 2159197 Bisecting: 1 revision left to test after this (roughly 1 step) [eb58075149b7f0300ff19142e6245fe75db2a081] sched/core: Introduce 'struct rq_flags' I should have results pretty soon. Jirka ^ permalink raw reply [flat|nested] 34+ messages in thread
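The narrowing that git bisect performs between a known-good and a known-bad commit can be sketched as a plain binary search. This is an illustration of the idea and not git's actual algorithm, which works on the full commit graph. The five abbreviated hashes are the kernel/sched commits between the two endpoints as they appear in the `git log` listing posted earlier in this thread; the function name and the set of "bad" outcomes are hypothetical, and the sketch assumes there is exactly one good-to-bad transition.

```python
def find_first_bad(commits, is_bad):
    """Binary-search a linear history for the first bad commit.

    commits is ordered oldest to newest; commits[0] must be known good and
    commits[-1] known bad. Assumes a single good-to-bad transition.
    Returns (first_bad_commit, commits_tested_along_the_way).
    """
    good, bad = 0, len(commits) - 1
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(commits[mid])
        if is_bad(commits[mid]):
            bad = mid   # regression is at mid or earlier
        else:
            good = mid  # regression was introduced after mid
    return commits[bad], tested


# Oldest first, taken from the git log listing in this thread.
history = ["64b7aad", "3e71a46", "eb58075", "e7904a2", "2159197"]

# Hypothetical outcomes: only 2159197 shows the regression. e7904a2 was
# reported as good in this thread; treating 3e71a46 and eb58075 as good is
# an assumption made only for this sketch.
measured_bad = {"2159197"}

culprit, tested = find_first_bad(history, lambda c: c in measured_bad)
print("tested:", tested)
print("first bad commit:", culprit)
```

Under these assumptions the first midpoint is eb58075, the same commit git bisect proposed above, and the search ends at 2159197 after two test builds.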
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-22 12:46 ` Jirka Hladky @ 2016-06-22 14:41 ` Jirka Hladky 2016-06-22 20:59 ` Peter Zijlstra 0 siblings, 1 reply; 34+ messages in thread From: Jirka Hladky @ 2016-06-22 14:41 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski Hi Peter, the kernel I got with bisecting does not work - I'm getting kernel panic during the boot. In any case, the regression was introduced between git bisect good 64b7aad git bisect bad 2159197 This commit is good: 64b7aad - Ingo Molnar, 7 weeks ago : Merge branch 'sched/urgent' into sched/core, to pick up fixes before applying new changes This commit is bad: 2159197 - Peter Zijlstra, 8 weeks ago : sched/core: Enable increased load resolution on 64-bit kernels Could you please have a look? Thanks a lot! Jirka ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-22 14:41 ` Jirka Hladky
@ 2016-06-22 20:59 ` Peter Zijlstra
  0 siblings, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-22 20:59 UTC
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Wed, Jun 22, 2016 at 04:41:06PM +0200, Jirka Hladky wrote:
> This commit is bad:
> 2159197 - Peter Zijlstra, 8 weeks ago : sched/core: Enable increased
> load resolution on 64-bit kernels
>
> Could you please have a look?

Yes, that is indeed the culprit.

The below 'revert' makes it go fast again. I'll try and figure out what's
wrong tomorrow.

---
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bf6fea9..e7e312b 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -55,7 +55,7 @@ static inline void cpu_load_update_active(struct rq *this_rq) { }
  * Really only required when CONFIG_FAIR_GROUP_SCHED is also set, but to
  * increase coverage and consistency always enable it on 64bit platforms.
  */
-#ifdef CONFIG_64BIT
+#if 0 // def CONFIG_64BIT
 # define NICE_0_LOAD_SHIFT	(SCHED_FIXEDPOINT_SHIFT + SCHED_FIXEDPOINT_SHIFT)
 # define scale_load(w)		((w) << SCHED_FIXEDPOINT_SHIFT)
 # define scale_load_down(w)	((w) >> SCHED_FIXEDPOINT_SHIFT)
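
The two macros in the hunk above are plain bit shifts, so their effect on a weight is easy to check by hand. The following is a minimal userspace sketch in Python, not kernel code. It assumes SCHED_FIXEDPOINT_SHIFT is 10 and that a nice-0 task has weight 1024; neither value is stated in this message.

```python
# Userspace model of scale_load()/scale_load_down() from the hunk above.
# Assumptions (not stated in the message): SCHED_FIXEDPOINT_SHIFT == 10
# and the nice-0 task weight is 1024.
SCHED_FIXEDPOINT_SHIFT = 10
NICE_0_WEIGHT = 1024


def scale_load(w: int) -> int:
    """Convert a weight to the increased-resolution representation."""
    return w << SCHED_FIXEDPOINT_SHIFT


def scale_load_down(w: int) -> int:
    """Convert an increased-resolution weight back (truncating)."""
    return w >> SCHED_FIXEDPOINT_SHIFT


hi_res = scale_load(NICE_0_WEIGHT)
print("nice-0 weight, low resolution :", NICE_0_WEIGHT)
print("nice-0 weight, high resolution:", hi_res)
print("after scale_load_down()       :", scale_load_down(hi_res))
```

Under these assumptions the same nice-0 task is represented by two numbers that differ by a factor of 1024, depending on whether the increased resolution is compiled in. Any code that combines a scaled value with an unscaled one without converting first is off by that factor.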
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-22 7:16 ` Peter Zijlstra 2016-06-22 7:49 ` Peter Zijlstra @ 2016-06-22 8:20 ` Jirka Hladky 1 sibling, 0 replies; 34+ messages in thread From: Jirka Hladky @ 2016-06-22 8:20 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski [-- Attachment #1: Type: text/plain, Size: 4156 bytes --] Hi Peter, please find the reproducer script attached. My command to reproduce the bug is: ./run-specjvm.sh --benchmarkThreads 32 --iterations 1 --iterationTime 180 --warmuptime 90 xml.transform xml.validation I run just xml benchmarks to speed up the runtime. Please check https://bugzilla.kernel.org/show_bug.cgi?id=120481#c9 for some details how to run the benchmark. The benchmark needs Window manager to be installed to create graphs. However, you can run the script from ssh terminal. I don't know exactly why is that but I know that Python's matplot library has the same requirements. last known good commit: e7904a28f5331c21d17af638cb477c83662e3cb6 first known bad commit: 6ecdd74962f246dfe8750b7bea481a1c0816315d Last two commits to be checked: git log --pretty=oneline e7904a28f5331c21d17af638cb477c83662e3cb6..6ecdd74962f246dfe8750b7bea481a1c0816315d 6ecdd74962f246dfe8750b7bea481a1c0816315d sched/fair: Generalize the load/util averages resolution definition 2159197d66770ec01f75c93fb11dc66df81fd45b sched/core: Enable increased load resolution on 64-bit kernels I use following command to review the results produced by reproduce.sh script. find ./ -name "*log" | xargs grep -H Score | grep xml.validation | grep "[0-9]\{4\}[.][0-9]\{2\} ops/m" Jirka On Wed, Jun 22, 2016 at 9:16 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Jun 17, 2016 at 01:04:23AM +0200, Jirka Hladky wrote: >> > > we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008 >> > Blergh, of course I don't have those.. 
:/ >> >> SPECjvm2008 is publicly available. >> https://www.spec.org/download.html > > Urgh, I _so_ hate java. > > Why does it have to pop up windows split between my screens, total fail. > > In any case, I run it like: > > java -jar SPECjvm2008.jar --benchmarkThreads 40 > > because I have 40 cpus (2 sockets * 10 cores/socket * 2 threads/core). > > It seems to produce numbers, but then ends with a splat: > > Error while creating report: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper > java.awt.AWTError: Assistive Technology not found: org.GNOME.Accessibility.AtkWrapper > at java.awt.Toolkit.loadAssistiveTechnologies(Toolkit.java:807) > at java.awt.Toolkit.getDefaultToolkit(Toolkit.java:886) > at sun.swing.SwingUtilities2.getSystemMnemonicKeyMask(SwingUtilities2.java:2020) > at javax.swing.plaf.basic.BasicLookAndFeel.initComponentDefaults(BasicLookAndFeel.java:1158) > at javax.swing.plaf.metal.MetalLookAndFeel.initComponentDefaults(MetalLookAndFeel.java:431) > at javax.swing.plaf.basic.BasicLookAndFeel.getDefaults(BasicLookAndFeel.java:148) > at javax.swing.plaf.metal.MetalLookAndFeel.getDefaults(MetalLookAndFeel.java:1577) > at javax.swing.UIManager.setLookAndFeel(UIManager.java:539) > at javax.swing.UIManager.setLookAndFeel(UIManager.java:579) > at javax.swing.UIManager.initializeDefaultLAF(UIManager.java:1349) > at javax.swing.UIManager.initialize(UIManager.java:1459) > at javax.swing.UIManager.maybeInitialize(UIManager.java:1426) > at javax.swing.UIManager.getDefaults(UIManager.java:659) > at javax.swing.UIManager.getColor(UIManager.java:701) > at org.jfree.chart.JFreeChart.<clinit>(JFreeChart.java:246) > at org.jfree.chart.ChartFactory.createXYLineChart(ChartFactory.java:1478) > at spec.reporter.BenchmarkChart.<init>(BenchmarkChart.java:47) > at spec.reporter.ReportGenerator.handleBenchmarkResult(ReportGenerator.java:141) > at spec.reporter.ReportGenerator.handleBenchmarksResults(ReportGenerator.java:105) > at 
spec.reporter.ReportGenerator.<init>(ReportGenerator.java:87) > at spec.reporter.ReportGenerator.main2(ReportGenerator.java:750) > at spec.reporter.Reporter.main2(Reporter.java:51) > at spec.harness.Launch.createReport(Launch.java:307) > at spec.harness.Launch.runBenchmarkSuite(Launch.java:250) > at spec.harness.Launch.main(Launch.java:452) > > WTF a benchmark needs that crap is beyond me, but whatever, I have > numbers. > > I'll try and reproduce. [-- Attachment #2: reproduce.sh --] [-- Type: application/x-sh, Size: 562 bytes --] ^ permalink raw reply [flat|nested] 34+ messages in thread
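
The find/grep pipeline quoted in this message pulls "Score on ..." lines out of the SPECjvm2008 logs. As a rough sketch of the same step plus the comparison between two kernels, the Python below parses such lines and reports the relative slowdown. The function names are hypothetical, and the sample lines are the xml.transform and xml.validation scores for 4.6.0 and tip/master posted elsewhere in this thread.

```python
import re

# Matches lines such as
#   ./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.transform: 2007.18 ops/m
# This is looser than the grep in the message, which only keeps
# scores of the form NNNN.NN.
SCORE_RE = re.compile(
    r"Score on (?P<name>[\w.]+): (?P<score>\d+(?:\.\d+)?) ops/m"
)


def parse_scores(lines):
    """Return {benchmark name: score in ops/m} for every matching line."""
    scores = {}
    for line in lines:
        match = SCORE_RE.search(line)
        if match:
            scores[match.group("name")] = float(match.group("score"))
    return scores


def drop_percent(baseline: float, candidate: float) -> float:
    """Relative slowdown of candidate versus baseline, in percent."""
    return 100.0 * (baseline - candidate) / baseline


good = parse_scores([
    "./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.transform: 2007.18 ops/m",
    "./4.6.0/2016-Jun-22_11h11m07s.log:Score on xml.validation: 2999.44 ops/m",
])
bad = parse_scores([
    "./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.transform: 1283.14 ops/m",
    "./4.7.0-rc4-00345-gf6e78bb/2016-Jun-22_11h30m27s.log:Score on xml.validation: 2008.62 ops/m",
])

for name in sorted(good):
    print(f"{name}: {drop_percent(good[name], bad[name]):.1f}% slower")
```

Both benchmarks land inside the 30-40% range given in the subject line.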
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-16 23:04 ` Jirka Hladky
  2016-06-21 13:17 ` Jirka Hladky
  2016-06-22  7:16 ` Peter Zijlstra
@ 2016-06-23 18:33 ` Peter Zijlstra
  2016-06-23 18:43 ` Peter Zijlstra
  2 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-23 18:33 UTC
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Fri, Jun 17, 2016 at 01:04:23AM +0200, Jirka Hladky wrote:
> > What kind of config and userspace setup? Do you run this cruft in a
> > cgroup of sorts?
>
> No, we don't do any special setup except to control the number of threads.

OK, so I'm fairly certain you _do_ run in a cgroup, because it's made
almost impossible not to these days.

Run:

	grep java /proc/sched_debug

while the thing is running. That'll show you the actual cgroup the stuff
is running in.

This modern Linux stuff stinks loads. And even Debian seems infected to
the point of almost being useless :-(

The _only_ reason I could reproduce was because I recently did an
upgrade of Debian Testing and I hadn't noticed just how messed up things
had become.

When I run it in the root cgroup (I had to kill cgmanager and reboot)
the numbers are just fine.

In any case, now I gotta go look at the cgroup code...
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-23 18:33 ` Peter Zijlstra
@ 2016-06-23 18:43 ` Peter Zijlstra
  2016-06-24  7:44 ` Jirka Hladky
  0 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-23 18:43 UTC
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Thu, Jun 23, 2016 at 08:33:18PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 17, 2016 at 01:04:23AM +0200, Jirka Hladky wrote:
>
> > > What kind of config and userspace setup? Do you run this cruft in a
> > > cgroup of sorts?
> >
> > No, we don't do any special setup except to control the number of threads.
>
> OK, so I'm fairly certain you _do_ run in a cgroup, because its made
> almost impossible not to these days.
>
> Run:
>
> 	grep java /proc/sched_debug
>
> while the thing is running. That'll show you the actual cgroup the stuff
> is running in.

That'll end up looking something like:

root@ivb-ep:/usr/src/linux-2.6# grep java /proc/sched_debug
  java  2714  18270.634925  89  120  0.000000  1.490023  0.000000  0  0  /user.slice/user-0.slice/session-2.scope
  java  2666  18643.629673   2  120  0.000000  0.063129  0.000000  0  0  /user.slice/user-0.slice/session-2.scope
  java  2676  18655.652878   3  120  0.000000  0.077127  0.000000  0  0  /user.slice/user-0.slice/session-2.scope
  java  2680  18655.683384   3  120  0.000000  0.082993  0.000000  0  0  /user.slice/user-0.slice/session-2.scope

which shows a 3 deep hierarchy.

Clearly these people haven't the faintest clue about the cost of what
they're doing. This stuff ain't free.
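
The "3 deep" figure can be read off the last column of each task line. A small Python sketch of that reading follows. It assumes the cgroup path is the last whitespace-separated field, as in the output quoted above; whether that column is present at all may depend on the kernel configuration. Both function names are hypothetical.

```python
def cgroup_path(sched_debug_line: str) -> str:
    """Return the last whitespace-separated field of a task line.

    Assumption: the cgroup path is the final column, as in the
    output quoted in this thread.
    """
    return sched_debug_line.split()[-1]


def cgroup_depth(path: str) -> int:
    """Count path components; the root cgroup "/" has depth 0."""
    return len([part for part in path.split("/") if part])


sample = (
    "java 2714 18270.634925 89 120 0.000000 1.490023 0.000000 0 0 "
    "/user.slice/user-0.slice/session-2.scope"
)
path = cgroup_path(sample)
print(path, "-> depth", cgroup_depth(path))
```

A task running in the root cgroup would report a depth of 0 with this helper, which is the case where the numbers were reported to be fine.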
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-23 18:43 ` Peter Zijlstra @ 2016-06-24 7:44 ` Jirka Hladky 2016-06-24 8:08 ` Peter Zijlstra 2016-06-24 12:02 ` Peter Zijlstra 0 siblings, 2 replies; 34+ messages in thread From: Jirka Hladky @ 2016-06-24 7:44 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski Hi Peter, thanks a lot for looking into it! I have tried to disable autogroups sysctl -w kernel.sched_autogroup_enabled=0 and I can confirm that performance is then back at level as in 4.6 kernel. I have double checked default settings and kernel.sched_autogroup_enabled is by default ON both in 4.6 and 4.7 kernel. Jirka On Thu, Jun 23, 2016 at 8:43 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Thu, Jun 23, 2016 at 08:33:18PM +0200, Peter Zijlstra wrote: >> On Fri, Jun 17, 2016 at 01:04:23AM +0200, Jirka Hladky wrote: >> >> > > What kind of config and userspace setup? Do you run this cruft in a >> > > cgroup of sorts? >> > >> > No, we don't do any special setup except to control the number of threads. >> >> OK, so I'm fairly certain you _do_ run in a cgroup, because its made >> almost impossible not to these days. >> >> Run: >> >> grep java /proc/sched_debug >> >> while the thing is running. That'll show you the actual cgroup the stuff >> is running in. > > That'll end up looking something like: > > root@ivb-ep:/usr/src/linux-2.6# grep java /proc/sched_debug > java 2714 18270.634925 89 120 0.000000 1.490023 0.000000 0 0 /user.slice/user-0.slice/session-2.scope > java 2666 18643.629673 2 120 0.000000 0.063129 0.000000 0 0 /user.slice/user-0.slice/session-2.scope > java 2676 18655.652878 3 120 0.000000 0.077127 0.000000 0 0 /user.slice/user-0.slice/session-2.scope > java 2680 18655.683384 3 120 0.000000 0.082993 0.000000 0 0 /user.slice/user-0.slice/session-2.scope > > which shows a 3 deep hierarchy. 
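
The sysctl used in this message is also exposed as a file under /proc/sys, so a test harness can record the autogroup setting next to each benchmark result. A minimal Python sketch follows; the helper name is hypothetical, and the file is expected to be missing on kernels built without CONFIG_SCHED_AUTOGROUP.

```python
from pathlib import Path

# File behind "sysctl kernel.sched_autogroup_enabled".
AUTOGROUP_SYSCTL = Path("/proc/sys/kernel/sched_autogroup_enabled")


def autogroup_enabled(path: Path = AUTOGROUP_SYSCTL):
    """Return True or False, or None if the sysctl file cannot be read."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None


print("sched_autogroup_enabled:", autogroup_enabled())
```

Logging this value with every run makes it easier to tell later whether two sets of scores were taken under the same group scheduling setup.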
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24  7:44 ` Jirka Hladky
@ 2016-06-24  8:08 ` Peter Zijlstra
  2016-06-24  8:20 ` Jirka Hladky
  2016-06-24 12:02 ` Peter Zijlstra
  1 sibling, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 8:08 UTC
  To: Jirka Hladky; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski

On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote:
> I have double checked default settings and
>
> kernel.sched_autogroup_enabled
>
> is by default ON both in 4.6 and 4.7 kernel.

Yeah, if you enable that CONFIG it's enabled by default. In any case, I'll
go trawl through the cgroup code now. I spent yesterday looking at the
'wrong' part of things.
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-24 8:08 ` Peter Zijlstra @ 2016-06-24 8:20 ` Jirka Hladky 0 siblings, 0 replies; 34+ messages in thread From: Jirka Hladky @ 2016-06-24 8:20 UTC (permalink / raw) To: Peter Zijlstra; +Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski I had a look and CONFIG_SCHED_AUTOGROUP=y is used both in RHEL6 and RHEL7. We compile the upstream kernels with config derived from RHEL7 config file. Jirka On Fri, Jun 24, 2016 at 10:08 AM, Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote: >> I have double checked default settings and >> >> kernel.sched_autogroup_enabled >> >> is by default ON both in 4.6 and 4.7 kernel. > > Yeah, if you enable that CONFIG its default enabled. In any case, I'll > go trawl through the cgroup code now. I spend yesterday looking at the > 'wrong' part things. > > ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24  7:44 ` Jirka Hladky
  2016-06-24  8:08 ` Peter Zijlstra
@ 2016-06-24 12:02 ` Peter Zijlstra
  2016-06-24 12:09 ` Jirka Hladky
  2016-06-24 12:44 ` Vincent Guittot
  1 sibling, 2 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 12:02 UTC
  To: Jirka Hladky
  Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen,
	Yuyang Du, Dietmar Eggemann, Vincent Guittot, umgwanakikbuti,
	bsegall, pjt, matt

On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote:
> Hi Peter,
>
> thanks a lot for looking into it!
>
> I have tried to disable autogroups
>
> sysctl -w kernel.sched_autogroup_enabled=0
>
> and I can confirm that performance is then back at level as in 4.6 kernel.

So unless the heat has made me do really silly things, the below seems
to cure things. Could you please verify?

---
 kernel/sched/fair.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 22d64b3f5876..d4f6fb2f3057 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
 	 */
 	tg_weight = atomic_long_read(&tg->load_avg);
 	tg_weight -= cfs_rq->tg_load_avg_contrib;
-	tg_weight += cfs_rq->load.weight;
+	tg_weight += cfs_rq->avg.load_avg;
 
 	return tg_weight;
 }
@@ -2494,7 +2494,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg)
 	long tg_weight, load, shares;
 
 	tg_weight = calc_tg_weight(tg, cfs_rq);
-	load = cfs_rq->load.weight;
+	load = cfs_rq->avg.load_avg;
 
 	shares = (tg->shares * load);
 	if (tg_weight)
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-24 12:02 ` Peter Zijlstra @ 2016-06-24 12:09 ` Jirka Hladky 2016-06-24 12:30 ` Peter Zijlstra 2016-06-24 12:35 ` Jirka Hladky 2016-06-24 12:44 ` Vincent Guittot 1 sibling, 2 replies; 34+ messages in thread From: Jirka Hladky @ 2016-06-24 12:09 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Vincent Guittot, Mike Galbraith, bsegall, pjt, matt Thank you Peter! Should I apply it to v4.7-rc4 ? Jirka On Fri, Jun 24, 2016 at 2:02 PM, Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote: >> Hi Peter, >> >> thanks a lot for looking into it! >> >> I have tried to disable autogroups >> >> sysctl -w kernel.sched_autogroup_enabled=0 >> >> and I can confirm that performance is then back at level as in 4.6 kernel. > > So unless the heat has made me do really silly things, the below seems > to cure things. Could you please verify? > > > --- > kernel/sched/fair.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 22d64b3f5876..d4f6fb2f3057 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq) > */ > tg_weight = atomic_long_read(&tg->load_avg); > tg_weight -= cfs_rq->tg_load_avg_contrib; > - tg_weight += cfs_rq->load.weight; > + tg_weight += cfs_rq->avg.load_avg; > > return tg_weight; > } > @@ -2494,7 +2494,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg) > long tg_weight, load, shares; > > tg_weight = calc_tg_weight(tg, cfs_rq); > - load = cfs_rq->load.weight; > + load = cfs_rq->avg.load_avg; > > shares = (tg->shares * load); > if (tg_weight) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-24 12:09 ` Jirka Hladky @ 2016-06-24 12:30 ` Peter Zijlstra 2016-06-24 12:35 ` Jirka Hladky 1 sibling, 0 replies; 34+ messages in thread From: Peter Zijlstra @ 2016-06-24 12:30 UTC (permalink / raw) To: Jirka Hladky Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Vincent Guittot, Mike Galbraith, bsegall, pjt, matt On Fri, Jun 24, 2016 at 02:09:30PM +0200, Jirka Hladky wrote: > Thank you Peter! > > Should I apply it to v4.7-rc4 ? It does indeed apply to v4.7-rc4, although I only tested it against tip/master. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-24 12:09 ` Jirka Hladky 2016-06-24 12:30 ` Peter Zijlstra @ 2016-06-24 12:35 ` Jirka Hladky 1 sibling, 0 replies; 34+ messages in thread From: Jirka Hladky @ 2016-06-24 12:35 UTC (permalink / raw) To: Peter Zijlstra Cc: linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Vincent Guittot, Mike Galbraith, bsegall, pjt, matt OK, I have applied to v4.7-rc4 via git am Compiling kernel, should have the results soon. Jirka On Fri, Jun 24, 2016 at 2:09 PM, Jirka Hladky <jhladky@redhat.com> wrote: > Thank you Peter! > > Should I apply it to v4.7-rc4 ? > > Jirka > > On Fri, Jun 24, 2016 at 2:02 PM, Peter Zijlstra <peterz@infradead.org> wrote: >> On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote: >>> Hi Peter, >>> >>> thanks a lot for looking into it! >>> >>> I have tried to disable autogroups >>> >>> sysctl -w kernel.sched_autogroup_enabled=0 >>> >>> and I can confirm that performance is then back at level as in 4.6 kernel. >> >> So unless the heat has made me do really silly things, the below seems >> to cure things. Could you please verify? 
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-24 12:02 ` Peter Zijlstra 2016-06-24 12:09 ` Jirka Hladky @ 2016-06-24 12:44 ` Vincent Guittot 2016-06-24 13:08 ` Jirka Hladky ` (2 more replies) 1 sibling, 3 replies; 34+ messages in thread From: Vincent Guittot @ 2016-06-24 12:44 UTC (permalink / raw) To: Peter Zijlstra Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith, Benjamin Segall, Paul Turner, Matt Fleming Hi Peter, On 24 June 2016 at 14:02, Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote: >> Hi Peter, >> >> thanks a lot for looking into it! >> >> I have tried to disable autogroups >> >> sysctl -w kernel.sched_autogroup_enabled=0 >> >> and I can confirm that performance is then back at level as in 4.6 kernel. > > So unless the heat has made me do really silly things, the below seems > to cure things. Could you please verify? > > > --- > kernel/sched/fair.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 22d64b3f5876..d4f6fb2f3057 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq) > */ > tg_weight = atomic_long_read(&tg->load_avg); > tg_weight -= cfs_rq->tg_load_avg_contrib; > - tg_weight += cfs_rq->load.weight; > + tg_weight += cfs_rq->avg.load_avg; IIUC, you are reverting commit fde7d22e01aa (sched/fair: Fix overly small weight for interactive group entities) I have one question regarding the use of cfs_rq->avg.load_avg cfs_rq->tg_load_avg_contrib is the sampling of cfs_rq->avg.load_avg so I'm curious to understand why you use cfs_rq->avg.load_avg instead of keeping cfs_rq->tg_load_avg_contrib. 
Do you think that the sampling is not accurate enough to prevent any significant difference between both when we use tg->load_avg ? > > return tg_weight; > } > @@ -2494,7 +2494,7 @@ static long calc_cfs_shares(struct cfs_rq *cfs_rq, struct task_group *tg) > long tg_weight, load, shares; > > tg_weight = calc_tg_weight(tg, cfs_rq); > - load = cfs_rq->load.weight; > + load = cfs_rq->avg.load_avg; > > shares = (tg->shares * load); > if (tg_weight) ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-24 12:44 ` Vincent Guittot @ 2016-06-24 13:08 ` Jirka Hladky 2016-06-24 13:09 ` Peter Zijlstra 2016-06-24 13:42 ` Peter Zijlstra 2 siblings, 0 replies; 34+ messages in thread From: Jirka Hladky @ 2016-06-24 13:08 UTC (permalink / raw) To: Vincent Guittot Cc: Peter Zijlstra, linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith, Benjamin Segall, Paul Turner, Matt Fleming Hi Peter, the proposed patch has fixed the performance issue. I have applied the patch to v4.7-rc4 Jirka On Fri, Jun 24, 2016 at 2:44 PM, Vincent Guittot <vincent.guittot@linaro.org> wrote: > Hi Peter, > > On 24 June 2016 at 14:02, Peter Zijlstra <peterz@infradead.org> wrote: >> On Fri, Jun 24, 2016 at 09:44:41AM +0200, Jirka Hladky wrote: >>> Hi Peter, >>> >>> thanks a lot for looking into it! >>> >>> I have tried to disable autogroups >>> >>> sysctl -w kernel.sched_autogroup_enabled=0 >>> >>> and I can confirm that performance is then back at level as in 4.6 kernel. >> >> So unless the heat has made me do really silly things, the below seems >> to cure things. Could you please verify? 
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 12:44 ` Vincent Guittot
  2016-06-24 13:08 ` Jirka Hladky
@ 2016-06-24 13:09 ` Peter Zijlstra
  2016-06-24 13:23 ` Vincent Guittot
  2016-06-24 13:42 ` Peter Zijlstra
  2 siblings, 1 reply; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 13:09 UTC
  To: Vincent Guittot
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote:
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 22d64b3f5876..d4f6fb2f3057 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
> >          */
> >         tg_weight = atomic_long_read(&tg->load_avg);
> >         tg_weight -= cfs_rq->tg_load_avg_contrib;
> > -       tg_weight += cfs_rq->load.weight;
> > +       tg_weight += cfs_rq->avg.load_avg;
>
> IIUC, you are reverting
> commit fde7d22e01aa (sched/fair: Fix overly small weight for
> interactive group entities)

Ah!, I hadn't yet done a git-blame on this. Right you are, we should
have put a comment there.

So the problem here is that since commit:

  2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels")

load.weight and avg.load_avg are in different metrics. Which completely
wrecked things.

The obvious alternative is using:

	scale_load_down(cfs_rq->load.weight);

Let me go run that through the benchmark.

> I have one question regarding the use of cfs_rq->avg.load_avg
> cfs_rq->tg_load_avg_contrib is the sampling of cfs_rq->avg.load_avg so
> I'm curious to understand why you use cfs_rq->avg.load_avg instead of
> keeping cfs_rq->tg_load_avg_contrib. Do you think that the sampling is
> not accurate enough to prevent any significant difference between both
> when we use tg->load_avg ?

I'm not entirely sure I understand your question; is it to the existence
of calc_tg_weight()? That is, why use calc_tg_weight() and not use
tg->load_avg as is?

It seemed like a simple and cheap way to increase accuracy, nothing more
behind it until the commit you referred to.
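
To see how "different metrics" distorts the result, here is a simplified Python model of the share computation from the patch context, not the kernel implementation. All numbers are hypothetical: a task group with one always-running nice-0 task on each of two CPUs, a per-CPU load average of 1024, and a shift of 10 for the increased resolution. The final division by tg_weight and the omission of any clamping are assumptions, since the quoted hunk is cut off right after `if (tg_weight)`. The value of tg_shares is arbitrary; only the resulting fraction matters.

```python
SCHED_FIXEDPOINT_SHIFT = 10  # assumed shift for the increased resolution


def scale_load_down(w: int) -> int:
    """Model of scale_load_down() with increased resolution enabled."""
    return w >> SCHED_FIXEDPOINT_SHIFT


def calc_cfs_shares(tg_shares: int, tg_load_avg: int,
                    contrib: int, local_load: int) -> int:
    """Simplified model: this CPU's portion of the group's shares.

    tg_weight is the group-wide load average with this CPU's sampled
    contribution replaced by its current local load.
    """
    tg_weight = tg_load_avg - contrib + local_load
    shares = tg_shares * local_load
    if tg_weight:
        shares //= tg_weight  # assumed; the quoted hunk ends before this
    return shares


# Hypothetical scenario: two CPUs, one busy nice-0 task of the group on each.
TG_SHARES = 1024                 # arbitrary units, only the ratio matters
per_cpu_load_avg = 1024          # load average metric (low resolution)
tg_load_avg = 2 * per_cpu_load_avg
load_weight = 1024 << SCHED_FIXEDPOINT_SHIFT  # load.weight, high resolution

mixed = calc_cfs_shares(TG_SHARES, tg_load_avg, per_cpu_load_avg,
                        load_weight)
fixed = calc_cfs_shares(TG_SHARES, tg_load_avg, per_cpu_load_avg,
                        scale_load_down(load_weight))

print("mixed metrics     :", mixed, "of", TG_SHARES)
print("consistent metrics:", fixed, "of", TG_SHARES)
```

With mixed metrics the local load dwarfs the rest of the sum, so each CPU computes almost the full group share for itself instead of half. Scaling load.weight down first, as suggested above, restores the even split in this model.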
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-24 13:09 ` Peter Zijlstra @ 2016-06-24 13:23 ` Vincent Guittot 2016-06-24 13:33 ` Peter Zijlstra 2016-06-24 13:45 ` Peter Zijlstra 0 siblings, 2 replies; 34+ messages in thread From: Vincent Guittot @ 2016-06-24 13:23 UTC (permalink / raw) To: Peter Zijlstra Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith, Benjamin Segall, Paul Turner, Matt Fleming On 24 June 2016 at 15:09, Peter Zijlstra <peterz@infradead.org> wrote: > On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote: > >> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> > index 22d64b3f5876..d4f6fb2f3057 100644 >> > --- a/kernel/sched/fair.c >> > +++ b/kernel/sched/fair.c >> > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq) >> > */ >> > tg_weight = atomic_long_read(&tg->load_avg); >> > tg_weight -= cfs_rq->tg_load_avg_contrib; >> > - tg_weight += cfs_rq->load.weight; >> > + tg_weight += cfs_rq->avg.load_avg; >> >> IIUC, you are reverting >> commit fde7d22e01aa (sched/fair: Fix overly small weight for >> interactive group entities) > > Ah!, I hadn't yet done a git-blame on this. Right you are, we should > have put a comment there. > > So the problem here is that since commit: > > 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels") > > load.weight and avg.load_avg are in different metrics. Which completely > wrecked things. > > The obvious alternative is using: > > scale_load_down(cfs_rq->load.weight); > > Let me go run that through the benchmark. 
Yes, looks to be good alternative > >> I have one question regarding the use of cfs_rq->avg.load_avg >> cfs_rq->tg_load_avg_contrib is the sampling of cfs_rq->avg.load_avg so >> I'm curious to understand why you use cfs_rq->avg.load_avg instead of >> keeping cfs_rq->tg_load_avg_contrib. Do you think that the sampling is >> not accurate enough to prevent any significant difference between both >> when we use tg->load_avg ? > > I'm not entirely sure I understand your question; is it to the existence > of calc_tg_weight()? That is, why use calc_tg_weight() and not use > tg->load_avg as is? Yes > > It seemed like a simple and cheap way to increase accuracy, nothing more > behind it until the commit you referred to. Thanks for the clarification. I thought that the difference should always be smaller than 1/64th of the cfs_rq->avg.load_avg thanks to update_tg_load_avg ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel 2016-06-24 13:23 ` Vincent Guittot @ 2016-06-24 13:33 ` Peter Zijlstra 2016-06-24 13:45 ` Peter Zijlstra 1 sibling, 0 replies; 34+ messages in thread From: Peter Zijlstra @ 2016-06-24 13:33 UTC (permalink / raw) To: Vincent Guittot Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski, Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith, Benjamin Segall, Paul Turner, Matt Fleming On Fri, Jun 24, 2016 at 03:23:37PM +0200, Vincent Guittot wrote: > On 24 June 2016 at 15:09, Peter Zijlstra <peterz@infradead.org> wrote: > > On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote: > > > >> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > >> > index 22d64b3f5876..d4f6fb2f3057 100644 > >> > --- a/kernel/sched/fair.c > >> > +++ b/kernel/sched/fair.c > >> > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq) > >> > */ > >> > tg_weight = atomic_long_read(&tg->load_avg); > >> > tg_weight -= cfs_rq->tg_load_avg_contrib; > >> > - tg_weight += cfs_rq->load.weight; > >> > + tg_weight += cfs_rq->avg.load_avg; > >> > >> IIUC, you are reverting > >> commit fde7d22e01aa (sched/fair: Fix overly small weight for > >> interactive group entities) > > > > Ah!, I hadn't yet done a git-blame on this. Right you are, we should > > have put a comment there. > > > > So the problem here is that since commit: > > > > 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels") > > > > load.weight and avg.load_avg are in different metrics. Which completely > > wrecked things. > > > > The obvious alternative is using: > > > > scale_load_down(cfs_rq->load.weight); > > > > Let me go run that through the benchmark. > > Yes, looks to be good alternative Does indeed also work. Let me go write a Changelog and try and magic it into sched/urgent. 
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
  2016-06-24 13:23 ` Vincent Guittot
  2016-06-24 13:33 ` Peter Zijlstra
@ 2016-06-24 13:45 ` Peter Zijlstra
  1 sibling, 0 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-06-24 13:45 UTC
  To: Vincent Guittot
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

On Fri, Jun 24, 2016 at 03:23:37PM +0200, Vincent Guittot wrote:
> > It seemed like a simple and cheap way to increase accuracy, nothing more
> > behind it until the commit you referred to.
>
> Thanks for the clarification.
> I thought that the difference should always be smaller than 1/64th of
> the cfs_rq->avg.load_avg thanks to update_tg_load_avg

Right, another reason I just remembered is that it ensures:

	tg_weight >= cfs_rq_weight

Because if this is the only task in the entire group and cfs_rq
increased (but did not exceed the 1/64th threshold) you get the group
weight being smaller than the entity weight, which would be weird.
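
The invariant can be checked with a few numbers. The Python sketch below uses hypothetical values for a group whose only load sits on one runqueue: the last sampled contribution is 1000, and the local load has since grown to 1010, which is below the 1/64th threshold, so the group-wide sum has not been refreshed. The function name mirrors calc_tg_weight() from the patch, but this is only a model of the three lines quoted there.

```python
def calc_tg_weight(tg_load_avg: int, contrib: int, local_load: int) -> int:
    """Group weight with the stale local contribution swapped out."""
    return tg_load_avg - contrib + local_load


# Hypothetical values: a single runqueue carries the whole group.
contrib = 1000          # last value propagated to the group sum
tg_load_avg = contrib   # group sum, still holding the stale sample
local_load = 1010       # current local load

# The change stays under the 1/64th threshold, so no refresh happened.
below_threshold = (local_load - contrib) * 64 < contrib

stale = tg_load_avg                                   # use the sum as is
substituted = calc_tg_weight(tg_load_avg, contrib, local_load)

print("below 1/64th threshold:", below_threshold)
print("stale tg_weight       :", stale, "vs local load", local_load)
print("substituted tg_weight :", substituted, "vs local load", local_load)
```

Using the stale sum directly gives a group weight of 1000 against a local load of 1010, which is the "weird" case described above. With the substitution the group weight can never fall below the local load, as long as the group sum is at least as large as this runqueue's own contribution.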
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
From: Peter Zijlstra @ 2016-06-24 13:42 UTC
  To: Vincent Guittot
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote:
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
> >          */
> >         tg_weight = atomic_long_read(&tg->load_avg);
> >         tg_weight -= cfs_rq->tg_load_avg_contrib;
> > -       tg_weight += cfs_rq->load.weight;
> > +       tg_weight += cfs_rq->avg.load_avg;
>
> IIUC, you are reverting
> commit fde7d22e01aa (sched/fair: Fix overly small weight for
> interactive group entities)

Hurm.. looking at that commit again, that seems to wreck
effective_load(), since that doesn't compensate.

Maybe I'll remove calc_tg_weight and open code its slightly different
usages in the two sites.
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
From: Peter Zijlstra @ 2016-06-24 15:54 UTC
  To: Vincent Guittot
  Cc: Jirka Hladky, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

On Fri, Jun 24, 2016 at 03:42:26PM +0200, Peter Zijlstra wrote:
> On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote:
> > > --- a/kernel/sched/fair.c
> > > +++ b/kernel/sched/fair.c
> > > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
> > >          */
> > >         tg_weight = atomic_long_read(&tg->load_avg);
> > >         tg_weight -= cfs_rq->tg_load_avg_contrib;
> > > -       tg_weight += cfs_rq->load.weight;
> > > +       tg_weight += cfs_rq->avg.load_avg;
> >
> > IIUC, you are reverting
> > commit fde7d22e01aa (sched/fair: Fix overly small weight for
> > interactive group entities)
>
> Hurm.. looking at that commit again, that seems to wreck
> effective_load(), since that doesn't compensate.
>
> Maybe I'll remove calc_tg_weight and open code its slightly different
> usages in the two sites.

OK, sorry for not actually posting, but I need to run. Please find the
two patches in:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
From: Jirka Hladky @ 2016-06-24 22:13 UTC
  To: Peter Zijlstra
  Cc: Vincent Guittot, linux-kernel, Ingo Molnar, Kamil Kolakowski,
	Morten Rasmussen, Yuyang Du, Dietmar Eggemann, Mike Galbraith,
	Benjamin Segall, Paul Turner, Matt Fleming

Hi Peter,

I have compiled your version of linux kernel and run the SPECjvm2008
tests. Results are fine, performance is at the level of 4.6 kernel.

$ git rev-parse HEAD
02548776ded1185e6e16ad0a475481e982741ee9

Jirka

On Fri, Jun 24, 2016 at 5:54 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jun 24, 2016 at 03:42:26PM +0200, Peter Zijlstra wrote:
>> On Fri, Jun 24, 2016 at 02:44:07PM +0200, Vincent Guittot wrote:
>> > > --- a/kernel/sched/fair.c
>> > > +++ b/kernel/sched/fair.c
>> > > @@ -2484,7 +2484,7 @@ static inline long calc_tg_weight(struct task_group *tg, struct cfs_rq *cfs_rq)
>> > >          */
>> > >         tg_weight = atomic_long_read(&tg->load_avg);
>> > >         tg_weight -= cfs_rq->tg_load_avg_contrib;
>> > > -       tg_weight += cfs_rq->load.weight;
>> > > +       tg_weight += cfs_rq->avg.load_avg;
>> >
>> > IIUC, you are reverting
>> > commit fde7d22e01aa (sched/fair: Fix overly small weight for
>> > interactive group entities)
>>
>> Hurm.. looking at that commit again, that seems to wreck
>> effective_load(), since that doesn't compensate.
>>
>> Maybe I'll remove calc_tg_weight and open code its slightly different
>> usages in the two sites.
>
> OK, sorry for not actually posting, but I need to run. Please find the
> two patches in:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
From: Branimir Maksimovic @ 2016-06-22 7:37 UTC
  To: linux-kernel

Could it be related to this:

https://www.phoronix.com/scan.php?page=news_item&px=P-State-Possible-4.6-Regression

On Thu, 16 Jun 2016 18:40:01 +0200
Jirka Hladky <jhladky@redhat.com> wrote:

> Hello,
>
> we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008
> benchmarks starting from 4.7.0-0.rc0 kernel compared to 4.6 kernel.
>
> We have tested kernels 4.7.0-0.rc1 and 4.7.0-0.rc3 and these are as
> well affected.
>
> We have observed the drop on variety of different x86_64 servers with
> different configuration (different CPU models, RAM sizes, both with
> Hyper Threading ON and OFF, different NUMA configurations (2 and 4
> NUMA nodes)
>
> Linpack and Stream benchmarks do not show any performance drop.
>
> The performance drop increases with higher number of threads. The
> maximum number of threads in each benchmark is the same as number of
> CPUs.
>
> We have opened a BZ to track the progress:
> https://bugzilla.kernel.org/show_bug.cgi?id=120481
>
> You can find more details along with graphs and tables there.
>
> Do you have any hints which commit should we try to reverse?
>
> Thanks a lot!
> Jirka
* Re: Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel
From: Jirka Hladky @ 2016-06-22 8:25 UTC
  To: Branimir Maksimovic
  Cc: linux-kernel

Hi Branimir,

I don't think that it's related. The regression has happened in one of
these two commits:

$ git log --pretty=oneline e7904a28f5331c21d17af638cb477c83662e3cb6..6ecdd74962f246dfe8750b7bea481a1c0816315d
6ecdd74962f246dfe8750b7bea481a1c0816315d sched/fair: Generalize the load/util averages resolution definition
2159197d66770ec01f75c93fb11dc66df81fd45b sched/core: Enable increased load resolution on 64-bit kernels

Please see
https://bugzilla.kernel.org/show_bug.cgi?id=120481
for the details.

Jirka

On Wed, Jun 22, 2016 at 9:37 AM, Branimir Maksimovic <branimir.maksimovic@gmail.com> wrote:
> Could it be related to this:
>
> https://www.phoronix.com/scan.php?page=news_item&px=P-State-Possible-4.6-Regression
>
>
> On Thu, 16 Jun 2016 18:40:01 +0200
> Jirka Hladky <jhladky@redhat.com> wrote:
>
>> Hello,
>>
>> we see performance drop 30-40% for SPECjbb2005 and SPECjvm2008
>> benchmarks starting from 4.7.0-0.rc0 kernel compared to 4.6 kernel.
>>
>> We have tested kernels 4.7.0-0.rc1 and 4.7.0-0.rc3 and these are as
>> well affected.
>>
>> We have observed the drop on variety of different x86_64 servers with
>> different configuration (different CPU models, RAM sizes, both with
>> Hyper Threading ON and OFF, different NUMA configurations (2 and 4
>> NUMA nodes)
>>
>> Linpack and Stream benchmarks do not show any performance drop.
>>
>> The performance drop increases with higher number of threads. The
>> maximum number of threads in each benchmark is the same as number of
>> CPUs.
>>
>> We have opened a BZ to track the progress:
>> https://bugzilla.kernel.org/show_bug.cgi?id=120481
>>
>> You can find more details along with graphs and tables there.
>>
>> Do you have any hints which commit should we try to reverse?
>>
>> Thanks a lot!
>> Jirka
>
end of thread, other threads:[~2016-06-24 22:13 UTC | newest]

Thread overview: 34+ messages
2016-06-16 16:38 Kernel 4.7rc3 - Performance drop 30-40% for SPECjbb2005 and SPECjvm2008 benchmarks against 4.6 kernel Jirka Hladky
2016-06-16 17:22 ` Peter Zijlstra
2016-06-16 23:04 ` Jirka Hladky
2016-06-21 13:17 ` Jirka Hladky
2016-06-22  7:16 ` Peter Zijlstra
2016-06-22  7:49 ` Peter Zijlstra
2016-06-22  7:54 ` Peter Zijlstra
2016-06-22  9:52 ` Jirka Hladky
2016-06-22 11:12 ` Peter Zijlstra
2016-06-22 12:37 ` Jirka Hladky
2016-06-22 12:46 ` Jirka Hladky
2016-06-22 14:41 ` Jirka Hladky
2016-06-22 20:59 ` Peter Zijlstra
2016-06-22  8:20 ` Jirka Hladky
2016-06-23 18:33 ` Peter Zijlstra
2016-06-23 18:43 ` Peter Zijlstra
2016-06-24  7:44 ` Jirka Hladky
2016-06-24  8:08 ` Peter Zijlstra
2016-06-24  8:20 ` Jirka Hladky
2016-06-24 12:02 ` Peter Zijlstra
2016-06-24 12:09 ` Jirka Hladky
2016-06-24 12:30 ` Peter Zijlstra
2016-06-24 12:35 ` Jirka Hladky
2016-06-24 12:44 ` Vincent Guittot
2016-06-24 13:08 ` Jirka Hladky
2016-06-24 13:09 ` Peter Zijlstra
2016-06-24 13:23 ` Vincent Guittot
2016-06-24 13:33 ` Peter Zijlstra
2016-06-24 13:45 ` Peter Zijlstra
2016-06-24 13:42 ` Peter Zijlstra
2016-06-24 15:54 ` Peter Zijlstra
2016-06-24 22:13 ` Jirka Hladky
2016-06-22  7:37 Branimir Maksimovic
2016-06-22  8:25 ` Jirka Hladky