Hi, Mel,

On Wed, 2022-04-20 at 14:24 +0800, ying.huang@intel.com wrote:
> Hi, Mel,
>
> On Mon, 2022-04-18 at 22:18 +0800, kernel test robot wrote:
> > (please be noted we reported "[sched/fair] 2cfb7a1b03: fsmark.files_per_sec
> > -26.2% regression" at
> > https://lore.kernel.org/all/20220303153108.GC14527@xsang-OptiPlex-9020/
> > when this is still on branch:
> > commit: 2cfb7a1b031b0e816af7a6ee0c6ab83b0acdf05a ("sched/fair: Improve consistency of allowed NUMA balance calculations")
> > https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git sched/core
> >
> > now we noticed the similar performance changes as well as some others are
> > still existing on mainline, so report this again for information)
> >
> >
> > Greeting,
> >
> > FYI, we noticed a -19.8% regression of stress-ng.fstat.ops_per_sec due to commit:
> >
> >
> > commit: 2cfb7a1b031b0e816af7a6ee0c6ab83b0acdf05a ("sched/fair: Improve consistency of allowed NUMA balance calculations")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: stress-ng
> > on test machine: 96 threads 2 sockets Ice Lake with 256G memory
> > with following parameters:
> >
> >     nr_threads: 10%
> >     disk: 1HDD
> >     testtime: 60s
> >     fs: f2fs
> >     class: filesystem
> >     test: fstat
> >     cpufreq_governor: performance
> >     ucode: 0xb000280
> >
> >
> > In addition to that, the commit also has significant impact on the following tests:
> >
> > +------------------+-------------------------------------------------------------------------------------+
> > | testcase: change | phoronix-test-suite: phoronix-test-suite.neatbench.CPU.fps 12.5% improvement        |
> > | test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory      |
> > | test parameters  | cpufreq_governor=performance                                                        |
> > |                  | option_a=CPU                                                                        |
> > |                  | test=neatbench-1.0.4                                                                |
> > |                  | ucode=0x500320a                                                                     |
> > +------------------+-------------------------------------------------------------------------------------+
> > | testcase: change | phoronix-test-suite: phoronix-test-suite.neatbench.All.fps 15.2% improvement        |
> > | test machine     | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 512G memory      |
> > | test parameters  | cpufreq_governor=performance                                                        |
> > |                  | option_a=All (CPU + GPU)                                                            |
> > |                  | test=neatbench-1.0.4                                                                |
> > |                  | ucode=0x500320a                                                                     |
> > +------------------+-------------------------------------------------------------------------------------+
> > | testcase: change | fsmark: fsmark.files_per_sec -9.9% regression                                       |
> > | test machine     | 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> > | test parameters  | cpufreq_governor=performance                                                        |
> > |                  | disk=1BRD_48G                                                                       |
> > |                  | filesize=4M                                                                         |
> > |                  | fs=f2fs                                                                             |
> > |                  | iterations=1x                                                                       |
> > |                  | nr_threads=64t                                                                      |
> > |                  | sync_method=fsyncBeforeClose                                                        |
> > |                  | test_size=24G                                                                       |
> > |                  | ucode=0x500320a                                                                     |
> > +------------------+-------------------------------------------------------------------------------------+
> > | testcase: change | fsmark: fsmark.files_per_sec -26.2% regression                                      |
> > | test machine     | 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> > | test parameters  | cpufreq_governor=performance                                                        |
> > |                  | disk=1BRD_48G                                                                       |
> > |                  | filesize=4M                                                                         |
> > |                  | fs=ext4                                                                             |
> > |                  | iterations=1x                                                                       |
> > |                  | nr_threads=64t                                                                      |
> > |                  | sync_method=NoSync                                                                  |
> > |                  | test_size=24G                                                                       |
> > |                  | ucode=0x500320a                                                                     |
> > +------------------+-------------------------------------------------------------------------------------+
> > | testcase: change | stress-ng: stress-ng.fstat.ops_per_sec -20.1% regression                            |
> > | test machine     | 96 threads 2 sockets Ice Lake with 256G memory                                      |
> > | test parameters  | class=filesystem                                                                    |
> > |                  | cpufreq_governor=performance                                                        |
> > |                  | disk=1HDD                                                                           |
> > |                  | fs=xfs                                                                              |
> > |                  | nr_threads=10%                                                                      |
> > |                  | test=fstat                                                                          |
> > |                  | testtime=60s                                                                        |
> > |                  | ucode=0xb000280                                                                     |
> > +------------------+-------------------------------------------------------------------------------------+
> > | testcase: change | fsmark: fsmark.files_per_sec -16.3% regression                                      |
> > | test machine     | 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory |
> > | test parameters  | cpufreq_governor=performance                                                        |
> > |                  | disk=1BRD_48G                                                                       |
> > |                  | filesize=4M                                                                         |
> > |                  | fs=ext4                                                                             |
> > |                  | iterations=1x                                                                       |
> > |                  | nr_threads=64t                                                                      |
> > |                  | sync_method=fsyncBeforeClose                                                        |
> > |                  | test_size=24G                                                                       |
> > |                  | ucode=0x500320a                                                                     |
> > +------------------+-------------------------------------------------------------------------------------+
> > | testcase: change | stress-ng: stress-ng.fstat.ops_per_sec -20.2% regression                            |
> > | test machine     | 96 threads 2 sockets Ice Lake with 256G memory                                      |
> > | test parameters  | class=filesystem                                                                    |
> > |                  | cpufreq_governor=performance                                                        |
> > |                  | disk=1HDD                                                                           |
> > |                  | fs=xfs                                                                              |
> > |                  | nr_threads=10%                                                                      |
> > |                  | test=fstat                                                                          |
> > |                  | testtime=60s                                                                        |
> > |                  | ucode=0xb000280                                                                     |
> > +------------------+-------------------------------------------------------------------------------------+
> >
>
> When I worked on the following regression report,
>
> https://lore.kernel.org/lkml/87tuc7fp9k.fsf@yhuang6-desk2.ccr.corp.intel.com/
>
> I found stress-ng throughput will regress if the tasks are distributed
> more evenly among NUMA nodes.  So for this regression, I re-tested with
> mpstat per node statistics.  The results are as follows,
>
>             mpstat.node.0.user%  mpstat.node.1.user%  mpstat.node.0.sys%  mpstat.node.1.sys%
> 889c5d60fb                 3.04                 2.65                30.0                25.8
> 2cfb7a1b03                 2.39                 2.28                31.2                28.9
>
> It can be found that the tasks are better balanced with the commit
> 2cfb7a1b03.  So I think the regression isn't a real problem.

I don't think stress-ng is a good workload for evaluating the effect of
this patch.  Could you tell me which workloads would be appropriate?

Best Regards,
Huang, Ying