Hi Rik,

FYI, we noticed the following changes on commit
0132c3e1777ceabc24c7d209b7cbe78c28c03c09 ("sched/numa: Examine a task move when examining a task swap")

Test case: brickland3/vm-scalability/300s-1T-anon-cow-seq

1c5d3eb3759013b  0132c3e1777ceabc24c7d209b
---------------  -------------------------
  36496955 ± 0%      -6.4%   34158456 ± 0%  TOTAL vm-scalability.throughput
      0.82 ±24%    +239.4%       2.79 ±10%  TOTAL turbostat.%pc2
  3.83e+08 ± 4%     -55.8%  1.693e+08 ±10%  TOTAL cpuidle.C1-IVT-4S.time
    103637 ± 3%     +32.8%     137617 ± 1%  TOTAL cpuidle.C6-IVT-4S.usage
     14451 ±12%     +28.1%      18508 ± 4%  TOTAL proc-vmstat.numa_pages_migrated
     14451 ±12%     +28.1%      18508 ± 4%  TOTAL proc-vmstat.pgmigrate_success
  4.13e+09 ± 1%     +18.2%  4.882e+09 ± 0%  TOTAL cpuidle.C6-IVT-4S.time
      5858 ± 6%     -16.1%       4915 ± 1%  TOTAL proc-vmstat.nr_mapped
       786 ± 7%     +14.1%        897 ± 6%  TOTAL numa-vmstat.node0.nr_alloc_batch
      9.62 ± 2%     -13.2%       8.35 ± 2%  TOTAL turbostat.%c1
     14.64 ± 3%     +16.1%      17.00 ± 1%  TOTAL turbostat.%c6
    123533 ± 5%      -9.3%     112004 ± 0%  TOTAL proc-vmstat.pgactivate
     21982 ± 4%     -11.0%      19567 ± 1%  TOTAL meminfo.Mapped
      5037 ± 4%      -8.1%       4630 ± 4%  TOTAL numa-meminfo.node0.Mapped
     45737 ±10%     -28.6%      32671 ± 9%  TOTAL time.voluntary_context_switches
      3124 ± 7%     -15.6%       2637 ± 2%  TOTAL vmstat.system.cs
     10226 ± 0%      +7.4%      10978 ± 0%  TOTAL time.system_time
       154 ± 0%      +7.1%        165 ± 0%  TOTAL time.elapsed_time
    153975 ± 3%      -6.2%     144429 ± 0%  TOTAL vmstat.system.in
       168 ± 0%      -3.2%        163 ± 0%  TOTAL turbostat.RAM_W
       374 ± 0%      -1.6%        368 ± 0%  TOTAL turbostat.Cor_W
       458 ± 0%      -1.5%        451 ± 0%  TOTAL turbostat.Pkg_W
      9092 ± 0%      -1.5%       8953 ± 0%  TOTAL time.percent_of_cpu_this_job_got
     75.72 ± 0%      -1.5%      74.61 ± 0%  TOTAL turbostat.%c0

The test box configuration is

brickland3: Brickland Ivy Bridge-EX
Memory: 512G

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                120
On-line CPU(s) list:   0-119
Thread(s) per core:    2
Core(s) per socket:    15
Socket(s):             4
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              7
CPU MHz:               3192.875
BogoMIPS:              5593.49
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              38400K
NUMA node0 CPU(s):     0-14,60-74
NUMA node1 CPU(s):     15-29,75-89
NUMA node2 CPU(s):     30-44,90-104
NUMA node3 CPU(s):     45-59,105-119

[ASCII plot: time.system_time, y-axis 9000 to 11500; bisect-good samples lie roughly between 9000 and 10000, bisect-bad samples near 11000]

[ASCII plot: time.elapsed_time, y-axis 145 to 170; bisect-good samples lie roughly between 145 and 155, bisect-bad samples between 165 and 170]

[ASCII plot: vm-scalability.throughput, y-axis 3.3e+07 to 4e+07; bisect-good samples lie roughly between 3.6e+07 and 4e+07, bisect-bad samples near 3.4e+07]

[ASCII plot: turbostat.RAM_W, y-axis 162 to 173; bisect-good samples lie roughly between 168 and 172, bisect-bad samples between 162 and 164]

	[*] bisect-good sample
	[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Fengguang