Hi, Michal

On 02/07, Michal Hocko wrote:
[snip]
>Could you retest with a single NUMA node? I am not familiar with the
>benchmark enough to judge it was set up properly for a NUMA machine.

I've retested the commit with a single NUMA node via
"numactl -m 0 fs_mark xxx", and it did help recover the performance.

Here is the comparison:

commit/compiler/cpufreq_governor/disk/filesize/fs/iterations/kconfig/md/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase:
  5e56dfbd837421b7fa3c6c06018c6701e2704917/gcc-6/performance/3HDD/4M/btrfs/1/x86_64-rhel-7.2/RAID5/64/debian-x86_64-2016-08-31.cgz/NoSync/ivb44/130G/fsmark

(with a single NUMA node)       (2 NUMA nodes)
--------------------------------------------------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
         %stddev     %change         %stddev
             \          |                \
     57.60 ±  0%     -11.1%      51.20 ±  0%  fsmark.files_per_sec
    607.84 ±  0%      +9.0%     662.24 ±  1%  fsmark.time.elapsed_time
    607.84 ±  0%      +9.0%     662.24 ±  1%  fsmark.time.elapsed_time.max
     14317 ±  6%     -12.2%      12568 ±  7%  fsmark.time.involuntary_context_switches
      1864 ±  0%      +0.5%       1873 ±  0%  fsmark.time.maximum_resident_set_size
     12425 ±  0%     +23.3%      15320 ±  3%  fsmark.time.minor_page_faults
     33.00 ±  3%     -33.9%      21.80 ±  1%  fsmark.time.percent_of_cpu_this_job_got
    203.49 ±  3%     -28.1%     146.31 ±  1%  fsmark.time.system_time
    605701 ±  0%      +3.6%     627486 ±  0%  fsmark.time.voluntary_context_switches
    307106 ±  2%     +20.2%     368992 ±  9%  interrupts.CAL:Function_call_interrupts
    183040 ±  0%     +23.2%     225559 ±  3%  softirqs.BLOCK
     12203 ± 57%    +236.4%      41056 ±103%  softirqs.NET_RX
    186118 ±  0%     +21.9%     226922 ±  2%  softirqs.TASKLET
     14317 ±  6%     -12.2%      12568 ±  7%  time.involuntary_context_switches
     12425 ±  0%     +23.3%      15320 ±  3%  time.minor_page_faults
     33.00 ±  3%     -33.9%      21.80 ±  1%  time.percent_of_cpu_this_job_got
    203.49 ±  3%     -28.1%     146.31 ±  1%  time.system_time
      3.47 ±  3%     -13.0%       3.02 ±  1%  turbostat.%Busy
     99.60 ±  1%      -9.6%      90.00 ±  1%  turbostat.Avg_MHz
     78.69 ±  1%      +1.7%      80.01 ±  0%  turbostat.CorWatt
      3.56 ± 61%     -91.7%       0.30 ± 76%  turbostat.Pkg%pc2
    207790 ±  0%      -8.2%     190654 ±  1%  vmstat.io.bo
  30667691 ±  0%     +65.9%   50890669 ±  1%  vmstat.memory.cache
  34549892 ±  0%     -58.4%   14378939 ±  4%  vmstat.memory.free
      6768 ±  0%      -1.3%       6681 ±  1%  vmstat.system.cs
 1.089e+10 ±  2%     +13.4%  1.236e+10 ±  3%  cpuidle.C1E-IVT.time
  11475304 ±  2%     +13.4%   13007849 ±  3%  cpuidle.C1E-IVT.usage
   2.7e+09 ±  6%     +13.2%  3.057e+09 ±  3%  cpuidle.C3-IVT.time
   2954294 ±  6%     +14.3%    3375966 ±  3%  cpuidle.C3-IVT.usage
  96963295 ± 14%     +17.5%  1.139e+08 ± 12%  cpuidle.POLL.time
      8761 ±  7%     +17.6%      10299 ±  9%  cpuidle.POLL.usage
  30454483 ±  0%     +66.4%   50666102 ±  1%  meminfo.Cached

Do you see what's happening? Or is there anything we can do to improve
the fsmark benchmark setup to make it more reasonable?

Thanks,
Xiaolong

>--
>Michal Hocko
>SUSE Labs