Hi, Mel,

On Mon, 2022-04-18 at 22:18 +0800, kernel test robot wrote:
> (please be noted we reported "[sched/fair] 2cfb7a1b03: fsmark.files_per_sec
> -26.2% regression" at
> https://lore.kernel.org/all/20220303153108.GC14527@xsang-OptiPlex-9020/
> when this is still on branch:
> commit: 2cfb7a1b031b0e816af7a6ee0c6ab83b0acdf05a ("sched/fair: Improve consistency of allowed NUMA balance calculations")
> https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
>
> now we noticed the similar performance changes as well as some others are
> still existing on mainline, so report this again for information)
>
>
> Greeting,
>
> FYI, we noticed a -19.8% regression of stress-ng.fstat.ops_per_sec due to commit:
>
>
> commit: 2cfb7a1b031b0e816af7a6ee0c6ab83b0acdf05a ("sched/fair: Improve consistency of allowed NUMA balance calculations")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> in testcase: stress-ng
> on test machine: 96 threads 2 sockets Ice Lake with 256G memory
> with following parameters:
>
> 	nr_threads: 10%
> 	disk: 1HDD
> 	testtime: 60s
> 	fs: f2fs
> 	class: filesystem
> 	test: fstat
> 	cpufreq_governor: performance
> 	ucode: 0xb000280
>
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+-------------------------------------------------------------------------------------+
> | testcase: change | phoronix-test-suite: phoronix-test-suite.neatbench.CPU.fps 12.5% improvement |
> | test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory |
> | test parameters  | cpufreq_governor=performance |
> |                  | option_a=CPU |
> |                  | test=neatbench-1.0.4 |
> |                  | ucode=0x500320a |
> +------------------+-------------------------------------------------------------------------------------+
> | testcase: change | phoronix-test-suite: phoronix-test-suite.neatbench.All.fps 15.2% improvement |
> | test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory |
> | test parameters  | cpufreq_governor=performance |
> |                  | option_a=All (CPU + GPU) |
> |                  | test=neatbench-1.0.4 |
> |                  | ucode=0x500320a |
> +------------------+-------------------------------------------------------------------------------------+
> | testcase: change | fsmark: fsmark.files_per_sec -9.9% regression |
> | test machine     | 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> | test parameters  | cpufreq_governor=performance |
> |                  | disk=1BRD_48G |
> |                  | filesize=4M |
> |                  | fs=f2fs |
> |                  | iterations=1x |
> |                  | nr_threads=64t |
> |                  | sync_method=fsyncBeforeClose |
> |                  | test_size=24G |
> |                  | ucode=0x500320a |
> +------------------+-------------------------------------------------------------------------------------+
> | testcase: change | fsmark: fsmark.files_per_sec -26.2% regression |
> | test machine     | 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> | test parameters  | cpufreq_governor=performance |
> |                  | disk=1BRD_48G |
> |                  | filesize=4M |
> |                  | fs=ext4 |
> |                  | iterations=1x |
> |                  | nr_threads=64t |
> |                  | sync_method=NoSync |
> |                  | test_size=24G |
> |                  | ucode=0x500320a |
> +------------------+-------------------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.fstat.ops_per_sec -20.1% regression |
> | test machine     | 96 threads 2 sockets Ice Lake with 256G memory |
> | test parameters  | class=filesystem |
> |                  | cpufreq_governor=performance |
> |                  | disk=1HDD |
> |                  | fs=xfs |
> |                  | nr_threads=10% |
> |                  | test=fstat |
> |                  | testtime=60s |
> |                  | ucode=0xb000280 |
> +------------------+-------------------------------------------------------------------------------------+
> | testcase: change | fsmark: fsmark.files_per_sec -16.3% regression |
> | test machine     | 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> | test parameters  | cpufreq_governor=performance |
> |                  | disk=1BRD_48G |
> |                  | filesize=4M |
> |                  | fs=ext4 |
> |                  | iterations=1x |
> |                  | nr_threads=64t |
> |                  | sync_method=fsyncBeforeClose |
> |                  | test_size=24G |
> |                  | ucode=0x500320a |
> +------------------+-------------------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.fstat.ops_per_sec -20.2% regression |
> | test machine     | 96 threads 2 sockets Ice Lake with 256G memory |
> | test parameters  | class=filesystem |
> |                  | cpufreq_governor=performance |
> |                  | disk=1HDD |
> |                  | fs=xfs |
> |                  | nr_threads=10% |
> |                  | test=fstat |
> |                  | testtime=60s |
> |                  | ucode=0xb000280 |
> +------------------+-------------------------------------------------------------------------------------+
>

When I worked on the following regression report,

https://lore.kernel.org/lkml/87tuc7fp9k.fsf@yhuang6-desk2.ccr.corp.intel.com/

I found that stress-ng throughput regresses if the tasks are distributed
more evenly among NUMA nodes. So for this regression, I re-tested with
mpstat per-node statistics. The results are as follows,

            mpstat.node.0.user%  mpstat.node.1.user%  mpstat.node.0.sys%  mpstat.node.1.sys%
889c5d60fb                 3.04                 2.65                30.0                25.8
2cfb7a1b03                 2.39                 2.28                31.2                28.9

It can be seen that the tasks are better balanced with commit 2cfb7a1b03:
the sys% gap between the two nodes shrinks from 4.2 to 2.3 points. So I
think the regression isn't a real problem.

Best Regards,
Huang, Ying
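
For reference, the balance claim based on the mpstat figures above can be checked with a short script. This is only a minimal sketch in Python: it assumes that user% + sys% is a reasonable proxy for per-node load, which is an assumption of this example and not something the measurements state. The names `samples` and `node_gap` are hypothetical.

```python
# mpstat per-node figures copied from the table above (percent).
# Tuples are (node 0, node 1).
samples = {
    "889c5d60fb": {"user": (3.04, 2.65), "sys": (30.0, 25.8)},
    "2cfb7a1b03": {"user": (2.39, 2.28), "sys": (31.2, 28.9)},
}


def node_gap(sample):
    """Return |node0 - node1| of (user% + sys%).

    Summing user% and sys% is an assumed proxy for node load.
    """
    node0 = sample["user"][0] + sample["sys"][0]
    node1 = sample["user"][1] + sample["sys"][1]
    return abs(node0 - node1)


gaps = {commit: node_gap(s) for commit, s in samples.items()}
for commit, gap in gaps.items():
    print(f"{commit}: node gap = {gap:.2f} percentage points")
```

Under that assumption, the gap between the nodes narrows from about 4.59 points on 889c5d60fb to about 2.41 points on 2cfb7a1b03, which is consistent with the tasks being spread more evenly after the commit.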