Greetings,

FYI, we noticed a -43.3% regression of fio.read_iops due to commit:


commit: a0ac629ebe7b3d248cb93807782a00d9142fdb98 ("x86/copy_mc: Introduce copy_mc_generic()")
url: https://github.com/0day-ci/linux/commits/Dan-Williams/Renovate-memcpy_mcsafe-with-copy_mc_to_-user-kernel/20200802-014046


in testcase: fio-basic
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
with the following parameters:

        disk: 2pmem
        fs: xfs
        mount_option: dax
        runtime: 200s
        nr_task: 50%
        time_based: tb
        rw: read
        bs: 2M
        ioengine: libaio
        test_size: 200G
        cpufreq_governor: performance
        ucode: 0x5002f01

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio

In addition to that, the commit also has a significant impact on the following tests:

+------------------+----------------------------------------------------------------------+
| testcase: change | fio-basic: fio.read_iops -55.6% regression                           |
| test machine     | 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory |
| test parameters  | bs=2M                                                                |
|                  | cpufreq_governor=performance                                         |
|                  | disk=2pmem                                                           |
|                  | fs=xfs                                                               |
|                  | ioengine=sync                                                        |
|                  | mount_option=dax                                                     |
|                  | nr_task=50%                                                          |
|                  | runtime=200s                                                         |
|                  | rw=read                                                              |
|                  | test_size=200G                                                       |
|                  | time_based=tb                                                        |
|                  | ucode=0x5002f01                                                      |
+------------------+----------------------------------------------------------------------+


If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based/ucode:
  2M/gcc-9/performance/2pmem/xfs/libaio/x86_64-rhel-8.3/dax/50%/debian-10.4-x86_64-20200603.cgz/200s/read/lkp-csl-2sp6/200G/fio-basic/tb/0x5002f01

commit:
  7476b91d4d ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()")
  a0ac629ebe ("x86/copy_mc: Introduce copy_mc_generic()")

7476b91d4db369d8 a0ac629ebe7b3d248cb93807782
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     97.22            -96.0         1.19 ± 21%  fio.latency_100ms%
      0.14             -0.1         0.05        fio.latency_10ms%
      0.27 ± 13%       -0.1         0.14        fio.latency_20ms%
      0.04 ±  6%       -0.0         0.03 ± 12%  fio.latency_20us%
      1.00 ± 28%      +96.6        97.57        fio.latency_250ms%
      0.05             -0.0         0.05        fio.latency_4ms%
      0.02 ± 48%       +0.3         0.31 ± 15%  fio.latency_500ms%
      1.25 ± 47%       -0.6         0.63 ± 11%  fio.latency_50ms%
      0.01 ±  9%       +0.0         0.02 ± 24%  fio.latency_50us%
     44292           -43.3%        25124        fio.read_bw_MBps
  67895296           +76.8%    1.201e+08        fio.read_clat_90%_us
  68681728           +76.7%    1.214e+08        fio.read_clat_95%_us
  98304000 ± 19%     +80.3%    1.772e+08 ±  4%  fio.read_clat_99%_us
  66674508           +76.2%    1.175e+08        fio.read_clat_mean_us
   9950116 ± 12%     +80.3%     17935634        fio.read_clat_stddev
     22146           -43.3%        12562        fio.read_iops
   2152824           +76.8%      3805428        fio.read_slat_mean_us
    291719 ± 14%     +86.6%       544324        fio.read_slat_stddev
     12923            -2.5%        12594        fio.time.involuntary_context_switches
     77.65 ±  3%     -39.1%        47.29        fio.time.user_time
   4429275           -43.3%      2512537        fio.workload
      0.14 ±  3%       +0.0         0.16 ±  4%  mpstat.cpu.all.soft%
      0.47 ±  3%       -0.2         0.31        mpstat.cpu.all.usr%
     53185 ± 91%    +121.2%       117642 ± 40%  numa-vmstat.node0.numa_other
    122640 ± 39%     -52.6%        58092 ± 81%  numa-vmstat.node1.numa_other
     60096            +1.5%        61021        proc-vmstat.nr_slab_unreclaimable
     20103 ±  5%     -17.9%        16495 ± 12%  proc-vmstat.pgactivate
     49.00            -2.0%        48.00        vmstat.cpu.id
      1612            -1.6%         1587        vmstat.system.cs
      2713 ±  4%      +8.0%         2931 ±  4%  slabinfo.PING.active_objs
      2713 ±  4%      +8.0%         2931 ±  4%  slabinfo.PING.num_objs
      1164 ±  9%     +16.8%         1360 ±  6%  slabinfo.task_group.active_objs
      1164 ±  9%     +16.8%         1360 ±  6%  slabinfo.task_group.num_objs
    379.25 ± 85%    +279.7%         1439 ± 75%  sched_debug.cfs_rq:/.exec_clock.min
     29948 ±  5%     -15.5%        25309 ±  5%  sched_debug.cfs_rq:/.exec_clock.stddev
     21606 ±  7%     +25.1%        27034 ±  7%  sched_debug.cfs_rq:/.min_vruntime.min
     33321 ±  6%     -16.5%        27820 ±  6%  sched_debug.cfs_rq:/.min_vruntime.stddev
     13783 ±109%    +184.1%        39158 ± 20%  sched_debug.cfs_rq:/.spread0.avg
    -38497           -76.6%        -9012        sched_debug.cfs_rq:/.spread0.min
     33321 ±  6%     -16.5%        27820 ±  6%  sched_debug.cfs_rq:/.spread0.stddev
     12.22 ± 10%     +27.9%        15.62 ±  3%  sched_debug.cpu.clock.stddev
      3716 ±173%    -100.0%         1.50 ± 57%  softirqs.CPU10.NET_RX
     17411 ± 36%     -41.8%        10126 ± 19%  softirqs.CPU24.SCHED
      9179 ± 67%     +87.1%        17173 ± 23%  softirqs.CPU35.SCHED
      9611 ± 34%     -58.9%         3951 ± 10%  softirqs.CPU48.SCHED
     17177 ± 30%     -42.6%         9864 ± 37%  softirqs.CPU69.SCHED
     86644 ± 29%     -22.3%        67339 ±  5%  softirqs.CPU76.TIMER
      6339 ± 66%    +115.9%        13686 ± 31%  softirqs.CPU78.SCHED
     10156 ± 64%     +91.8%        19477 ± 25%  softirqs.CPU81.SCHED
      1239 ±172%    -100.0%         0.00        interrupts.62:PCI-MSI.31981595-edge.i40e-eth0-TxRx-26
     47482            +5.4%        50055 ±  4%  interrupts.CAL:Function_call_interrupts
    209.00 ± 23%     -50.4%       103.75 ±  8%  interrupts.CPU0.RES:Rescheduling_interrupts
    146.25 ± 16%     -27.4%       106.25 ± 16%  interrupts.CPU15.RES:Rescheduling_interrupts
    168.75 ± 81%     -64.6%        59.75 ± 33%  interrupts.CPU15.TLB:TLB_shootdowns
      7321 ±  5%     -52.7%         3461 ± 39%  interrupts.CPU20.NMI:Non-maskable_interrupts
      7321 ±  5%     -52.7%         3461 ± 39%  interrupts.CPU20.PMI:Performance_monitoring_interrupts
      6665 ± 14%     -61.2%         2586 ± 26%  interrupts.CPU21.NMI:Non-maskable_interrupts
      6665 ± 14%     -61.2%         2586 ± 26%  interrupts.CPU21.PMI:Performance_monitoring_interrupts
     64.50 ± 23%     +41.9%        91.50 ± 22%  interrupts.CPU21.TLB:TLB_shootdowns
    100.00 ± 41%     +66.0%       166.00 ±  9%  interrupts.CPU24.RES:Rescheduling_interrupts
      1238 ±173%    -100.0%         0.00        interrupts.CPU26.62:PCI-MSI.31981595-edge.i40e-eth0-TxRx-26
    438.25 ±  4%     +16.1%       509.00 ± 18%  interrupts.CPU28.CAL:Function_call_interrupts
    145.50 ± 20%     -34.4%        95.50 ± 25%  interrupts.CPU35.RES:Rescheduling_interrupts
      7134 ± 11%     -28.3%         5118 ± 19%  interrupts.CPU41.NMI:Non-maskable_interrupts
      7134 ± 11%     -28.3%         5118 ± 19%  interrupts.CPU41.PMI:Performance_monitoring_interrupts
    107.75 ± 34%     -47.3%        56.75 ± 40%  interrupts.CPU93.RES:Rescheduling_interrupts
     63.18 ± 12%      -26.1        37.12 ± 15%  perf-profile.calltrace.cycles-pp.copy_mc_fragile.copy_mc_to_user.copyout_mc._copy_mc_to_iter.dax_iomap_actor
      0.00             +3.7         3.72 ± 52%  perf-profile.calltrace.cycles-pp.copy_mc_generic.copy_mc_to_user.copyout_mc._copy_mc_to_iter.dax_iomap_actor
      0.00            +37.8        37.83 ± 12%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.copy_mc_generic.copy_mc_to_user.copyout_mc._copy_mc_to_iter
     63.34 ± 12%      -26.2        37.14 ± 15%  perf-profile.children.cycles-pp.copy_mc_fragile
      2.41 ±112%       -2.2         0.25 ±108%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      2.26 ±109%       -2.0         0.29 ± 89%  perf-profile.children.cycles-pp.asm_call_on_stack
      2.15 ±112%       -1.9         0.23 ±110%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      2.12 ±113%       -1.9         0.23 ±110%  perf-profile.children.cycles-pp.hrtimer_interrupt
      1.68 ±114%       -1.5         0.17 ±119%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      1.48 ±123%       -1.3         0.15 ±121%  perf-profile.children.cycles-pp.tick_sched_timer
      1.34 ±120%       -1.2         0.14 ±122%  perf-profile.children.cycles-pp.tick_sched_handle
      1.28 ±119%       -1.1         0.14 ±122%  perf-profile.children.cycles-pp.update_process_times
      0.70 ±107%       -0.6         0.10 ±120%  perf-profile.children.cycles-pp.scheduler_tick
      2.65 ±106%      +16.5        19.13 ± 12%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.00            +22.6        22.58 ±  7%  perf-profile.children.cycles-pp.copy_mc_generic
     62.52 ± 12%      -25.5        37.00 ± 15%  perf-profile.self.cycles-pp.copy_mc_fragile
      0.00            +22.4        22.41 ±  6%  perf-profile.self.cycles-pp.copy_mc_generic
     42.43           +68.7%        71.58        perf-stat.i.MPKI
 5.949e+09           -42.2%     3.44e+09        perf-stat.i.branch-instructions
      0.07             +0.0         0.10 ±  5%  perf-stat.i.branch-miss-rate%
   3554006 ±  2%      -7.5%      3286479 ±  3%  perf-stat.i.branch-misses
     95.02             -2.4        92.63        perf-stat.i.cache-miss-rate%
 1.444e+09            -5.2%    1.369e+09        perf-stat.i.cache-misses
 1.513e+09            -2.8%    1.471e+09        perf-stat.i.cache-references
      3.81           +72.5%         6.58        perf-stat.i.cpi
    102.49            +4.5%       107.13        perf-stat.i.cycles-between-cache-misses
      0.00 ±  4%       +0.0         0.00 ± 41%  perf-stat.i.dTLB-load-miss-rate%
  6.03e+09           -42.0%    3.495e+09        perf-stat.i.dTLB-loads
      0.00 ±  5%       +0.0         0.00 ±  7%  perf-stat.i.dTLB-store-miss-rate%
 5.909e+09           -42.5%      3.4e+09        perf-stat.i.dTLB-stores
     47.00             +1.4        48.45        perf-stat.i.iTLB-load-miss-rate%
   2270674           -11.0%      2021114        perf-stat.i.iTLB-load-misses
   2563127           -16.0%      2151931        perf-stat.i.iTLB-loads
 3.548e+10           -42.4%    2.044e+10        perf-stat.i.instructions
     15634           -35.2%        10127        perf-stat.i.instructions-per-iTLB-miss
      0.26           -41.6%         0.15        perf-stat.i.ipc
    207.77           -37.5%       129.85        perf-stat.i.metric.M/sec
  78061415 ± 13%     +98.0%    1.546e+08 ± 20%  perf-stat.i.node-load-misses
  85582855 ± 11%     +58.1%    1.353e+08 ± 20%  perf-stat.i.node-loads
 3.817e+08            -2.8%    3.709e+08        perf-stat.i.node-stores
     42.66           +68.7%        71.96        perf-stat.overall.MPKI
      0.06             +0.0         0.09 ±  3%  perf-stat.overall.branch-miss-rate%
     95.45             -2.4        93.07        perf-stat.overall.cache-miss-rate%
      3.81           +73.0%         6.59        perf-stat.overall.cpi
     93.55            +5.2%        98.41        perf-stat.overall.cycles-between-cache-misses
      0.00 ±  5%       +0.0         0.00 ± 13%  perf-stat.overall.dTLB-load-miss-rate%
      0.00 ±  5%       +0.0         0.00 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
     46.98             +1.5        48.43        perf-stat.overall.iTLB-load-miss-rate%
     15639           -35.2%        10127        perf-stat.overall.instructions-per-iTLB-miss
      0.26           -42.2%         0.15        perf-stat.overall.ipc
   1605743            +1.5%      1630326        perf-stat.overall.path-length
 5.919e+09           -42.2%    3.422e+09        perf-stat.ps.branch-instructions
   3519866 ±  2%      -7.8%      3245208 ±  3%  perf-stat.ps.branch-misses
 1.437e+09            -5.2%    1.362e+09        perf-stat.ps.cache-misses
 1.506e+09            -2.8%    1.463e+09        perf-stat.ps.cache-references
      1552            -1.4%         1530        perf-stat.ps.context-switches
     6e+09           -42.1%    3.477e+09        perf-stat.ps.dTLB-loads
  5.88e+09           -42.5%    3.382e+09        perf-stat.ps.dTLB-stores
   2257568           -11.0%      2008542        perf-stat.ps.iTLB-load-misses
   2547705           -16.1%      2138603        perf-stat.ps.iTLB-loads
  3.53e+10           -42.4%    2.034e+10        perf-stat.ps.instructions
  77685715 ± 13%     +97.9%    1.538e+08 ± 20%  perf-stat.ps.node-load-misses
  85143339 ± 11%     +58.1%    1.346e+08 ± 20%  perf-stat.ps.node-loads
 3.797e+08            -2.8%     3.69e+08        perf-stat.ps.node-stores
 7.112e+12           -42.4%    4.096e+12        perf-stat.total.instructions

[ASCII per-sample plots omitted. Panels: fio.read_bw_MBps, fio.read_iops, fio.read_clat_mean_us, fio.read_clat_90__us, fio.read_clat_95__us, fio.read_slat_mean_us, fio.latency_10ms_, fio.latency_20ms_, fio.latency_100ms_, fio.latency_250ms_, fio.workload]

[*] bisect-good sample
[O] bisect-bad sample

***************************************************************************************************
lkp-csl-2sp6: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 256G memory
=========================================================================================
bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/mount_option/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase/time_based/ucode:
  2M/gcc-9/performance/2pmem/xfs/sync/x86_64-rhel-8.3/dax/50%/debian-10.4-x86_64-20200603.cgz/200s/read/lkp-csl-2sp6/200G/fio-basic/tb/0x5002f01

commit:
  7476b91d4d ("x86, powerpc: Rename memcpy_mcsafe() to copy_mc_to_{user, kernel}()")
  a0ac629ebe ("x86/copy_mc: Introduce copy_mc_generic()")

7476b91d4db369d8 a0ac629ebe7b3d248cb93807782
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      0.61 ± 15%       -0.4         0.22 ± 94%  fio.latency_1000us%
      0.01 ± 11%       +1.3         1.27 ± 25%  fio.latency_10ms%
     96.06            -95.5         0.60 ± 80%  fio.latency_2ms%
      1.27 ± 33%      +96.2        97.48        fio.latency_4ms%
      1.29 ± 55%       -1.2         0.05 ± 54%  fio.latency_500us%
     75143           -55.6%        33381        fio.read_bw_MBps
   1372160          +118.5%      2998272        fio.read_clat_90%_us
   1409024          +116.9%      3055616        fio.read_clat_95%_us
   2142208 ± 19%    +120.3%      4718592 ± 17%  fio.read_clat_99%_us
   1272849          +125.4%      2869293        fio.read_clat_mean_us
    228201 ± 15%    +103.6%       464620 ± 14%  fio.read_clat_stddev
     37571           -55.6%        16690        fio.read_iops
     69.28 ±  2%     -40.3%        41.38 ±  3%  fio.time.user_time
   7514438           -55.6%      3338252        fio.workload
      0.11 ±  3%       +0.0         0.14 ±  5%  mpstat.cpu.all.soft%
      0.43 ±  3%       -0.1         0.28 ±  2%  mpstat.cpu.all.usr%
    115069            -2.3%       112454        proc-vmstat.nr_shmem
     20846 ±  6%     -27.8%        15052 ±  3%  proc-vmstat.pgactivate
    967.50 ± 27%     -50.0%       483.75 ± 78%  slabinfo.xfs_buf_item.active_objs
    967.50 ± 27%     -50.0%       483.75 ± 78%  slabinfo.xfs_buf_item.num_objs
    100.00            -2.0%        98.00        vmstat.io.bo
      1672            -3.3%         1616        vmstat.system.cs
 9.059e+09 ±  6%     -32.3%    6.131e+09 ± 54%  cpuidle.C1E.time
  19004364 ±  3%     -22.4%     14741281 ± 34%  cpuidle.C1E.usage
 4.034e+08 ±133%    +713.0%     3.28e+09 ±100%  cpuidle.C6.time
    570211 ±122%    +571.6%      3829822 ± 86%  cpuidle.C6.usage
     61.80 ±  9%      -17.6        44.19        perf-profile.calltrace.cycles-pp.copy_mc_fragile.copy_mc_to_user.copyout_mc._copy_mc_to_iter.dax_iomap_actor
      0.00             +7.8         7.81 ±  6%  perf-profile.calltrace.cycles-pp.copy_mc_generic.copy_mc_to_user.copyout_mc._copy_mc_to_iter.dax_iomap_actor
      0.00            +29.2        29.21 ±  5%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.copy_mc_generic.copy_mc_to_user.copyout_mc._copy_mc_to_iter
     61.92 ±  9%      -17.7        44.25        perf-profile.children.cycles-pp.copy_mc_fragile
      3.47 ±132%      +11.7        15.21 ±  5%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.00            +22.3        22.32        perf-profile.children.cycles-pp.copy_mc_generic
     61.16 ±  9%      -17.4        43.78        perf-profile.self.cycles-pp.copy_mc_fragile
      0.00            +22.1        22.09        perf-profile.self.cycles-pp.copy_mc_generic
    212.00 ± 38%    +288.6%       823.90 ± 67%  sched_debug.cfs_rq:/.exec_clock.min
     34013 ±  3%     -17.1%        28181 ±  2%  sched_debug.cfs_rq:/.exec_clock.stddev
     36118 ±  5%     -15.0%        30710 ±  2%  sched_debug.cfs_rq:/.min_vruntime.stddev
     36118 ±  5%     -15.0%        30707 ±  2%  sched_debug.cfs_rq:/.spread0.stddev
      9.52 ± 11%     +33.8%        12.73 ±  9%  sched_debug.cpu.clock.stddev
     17832 ± 13%     -47.5%         9368 ± 17%  sched_debug.cpu.sched_count.max
      2475 ±  9%     -34.4%         1624 ±  8%  sched_debug.cpu.sched_count.stddev
      8858 ± 13%     -48.3%         4577 ± 18%  sched_debug.cpu.sched_goidle.max
      1260 ±  9%     -33.2%       841.68 ±  8%  sched_debug.cpu.sched_goidle.stddev
      8285 ± 16%     -32.1%         5622 ±  7%  sched_debug.cpu.ttwu_count.max
      1169 ±  9%     -24.9%       878.40 ±  4%  sched_debug.cpu.ttwu_count.stddev
     26587 ±  8%     -21.9%        20773 ± 22%  softirqs.CPU1.SCHED
     19906 ± 37%     -55.7%         8824 ± 96%  softirqs.CPU10.SCHED
     21997 ± 34%     -82.2%         3910 ± 55%  softirqs.CPU20.SCHED
      5126 ± 70%    +166.6%        13666 ± 15%  softirqs.CPU30.SCHED
      5567 ± 56%    +165.3%        14772 ± 29%  softirqs.CPU31.SCHED
     10027 ± 35%    +101.3%        20182 ± 18%  softirqs.CPU33.SCHED
      4868 ± 50%    +112.6%        10349 ± 14%  softirqs.CPU44.SCHED
      6304 ± 60%    +154.5%        16043 ± 22%  softirqs.CPU46.SCHED
      4127 ± 76%    +198.6%        12326 ± 32%  softirqs.CPU49.SCHED
      6313 ± 62%     +98.5%        12530 ± 19%  softirqs.CPU51.SCHED
      8249 ± 58%    +148.7%        20515 ± 31%  softirqs.CPU57.SCHED
      6971 ±109%    +268.6%        25698 ±  8%  softirqs.CPU68.SCHED
     25116 ± 15%     -32.4%        16974 ± 12%  softirqs.CPU78.SCHED
     24757 ± 12%     -36.8%        15657 ± 27%  softirqs.CPU79.SCHED
     20231 ± 14%     -45.5%        11024 ± 24%  softirqs.CPU81.SCHED
     21830 ± 23%     -55.4%         9733 ± 67%  softirqs.CPU9.SCHED
     24043 ± 16%     -39.9%        14449 ± 23%  softirqs.CPU94.SCHED
     42.31           +68.3%        71.22        perf-stat.i.MPKI
 9.958e+09           -54.7%    4.511e+09        perf-stat.i.branch-instructions
      0.05 ±  2%       +0.0         0.08 ±  4%  perf-stat.i.branch-miss-rate%
   3682118 ±  2%      -8.2%      3381534        perf-stat.i.branch-misses
     67.34            +10.4        77.74        perf-stat.i.cache-miss-rate%
 1.709e+09           -12.2%    1.501e+09        perf-stat.i.cache-misses
 2.531e+09           -24.0%    1.923e+09        perf-stat.i.cache-references
      1639            -4.1%         1571        perf-stat.i.context-switches
      2.25          +121.4%         4.98        perf-stat.i.cpi
     99.03            -1.8%        97.24        perf-stat.i.cpu-migrations
     85.60           +14.2%        97.78        perf-stat.i.cycles-between-cache-misses
      0.00 ± 18%       +0.0         0.00 ± 44%  perf-stat.i.dTLB-load-miss-rate%
 9.996e+09           -54.5%    4.549e+09        perf-stat.i.dTLB-loads
      0.00 ±  7%       +0.0         0.00 ±  6%  perf-stat.i.dTLB-store-miss-rate%
 9.904e+09           -54.9%    4.466e+09        perf-stat.i.dTLB-stores
     44.79             +4.2        48.99        perf-stat.i.iTLB-load-miss-rate%
   2535885           -13.8%      2185118        perf-stat.i.iTLB-load-misses
   3134177           -27.3%      2278467        perf-stat.i.iTLB-loads
 5.952e+10           -54.9%    2.687e+10        perf-stat.i.instructions
     23480           -47.6%        12304        perf-stat.i.instructions-per-iTLB-miss
      0.45           -54.6%         0.20        perf-stat.i.ipc
    342.39           -51.0%       167.90        perf-stat.i.metric.M/sec
 1.165e+08 ± 30%     +72.8%    2.013e+08 ±  9%  perf-stat.i.node-load-misses
 1.257e+08 ± 26%     +41.1%    1.773e+08 ±  9%  perf-stat.i.node-loads
  2.42e+08           +19.6%    2.895e+08        perf-stat.i.node-stores
     42.53           +68.3%        71.58        perf-stat.overall.MPKI
      0.04 ±  2%       +0.0         0.07        perf-stat.overall.branch-miss-rate%
     67.52            +10.5        78.07        perf-stat.overall.cache-miss-rate%
      2.24          +122.4%         4.99        perf-stat.overall.cpi
     78.17           +14.3%        89.34        perf-stat.overall.cycles-between-cache-misses
      0.00 ± 25%       +0.0         0.00 ± 12%  perf-stat.overall.dTLB-load-miss-rate%
      0.00 ± 13%       +0.0         0.00 ± 10%  perf-stat.overall.dTLB-store-miss-rate%
     44.72             +4.2        48.96        perf-stat.overall.iTLB-load-miss-rate%
     23499           -47.6%        12306        perf-stat.overall.instructions-per-iTLB-miss
      0.45           -55.0%         0.20        perf-stat.overall.ipc
   1587395            +1.5%      1611895        perf-stat.overall.path-length
 9.912e+09           -54.7%    4.489e+09        perf-stat.ps.branch-instructions
   3650903 ±  2%      -8.4%      3345674        perf-stat.ps.branch-misses
 1.701e+09           -12.2%    1.494e+09        perf-stat.ps.cache-misses
  2.52e+09           -24.1%    1.914e+09        perf-stat.ps.cache-references
      1616            -3.7%         1556        perf-stat.ps.context-switches
  9.95e+09           -54.5%    4.526e+09        perf-stat.ps.dTLB-loads
 9.859e+09           -54.9%    4.445e+09        perf-stat.ps.dTLB-stores
   2521574           -13.8%      2172342        perf-stat.ps.iTLB-load-misses
   3116655           -27.3%      2264894        perf-stat.ps.iTLB-loads
 5.925e+10           -54.9%    2.673e+10        perf-stat.ps.instructions
 1.159e+08 ± 30%     +72.8%    2.003e+08 ±  9%  perf-stat.ps.node-load-misses
  1.25e+08 ± 26%     +41.1%    1.764e+08 ±  9%  perf-stat.ps.node-loads
 2.407e+08           +19.6%    2.878e+08        perf-stat.ps.node-stores
 1.193e+13           -54.9%    5.381e+12        perf-stat.total.instructions
      0.00       +2.7e+105%         2689 ±171%  interrupts.115:PCI-MSI.31981648-edge.i40e-eth0-TxRx-79
     62.75 ± 27%     +51.8%        95.25 ± 21%  interrupts.CPU1.RES:Rescheduling_interrupts
      6530 ± 17%     -44.1%         3647 ± 35%  interrupts.CPU17.NMI:Non-maskable_interrupts
      6530 ± 17%     -44.1%         3647 ± 35%  interrupts.CPU17.PMI:Performance_monitoring_interrupts
     62.00 ± 74%    +187.9%       178.50 ±  5%  interrupts.CPU20.RES:Rescheduling_interrupts
    365.00 ± 78%     -76.0%        87.50 ± 53%  interrupts.CPU25.TLB:TLB_shootdowns
    170.50 ± 15%     -26.8%       124.75 ± 10%  interrupts.CPU30.RES:Rescheduling_interrupts
      7605           -43.3%         4316 ± 32%  interrupts.CPU31.NMI:Non-maskable_interrupts
      7605           -43.3%         4316 ± 32%  interrupts.CPU31.PMI:Performance_monitoring_interrupts
    169.00 ± 12%     -37.1%       106.25 ± 23%  interrupts.CPU31.RES:Rescheduling_interrupts
      7145 ± 11%     -33.0%         4786 ± 18%  interrupts.CPU36.NMI:Non-maskable_interrupts
      7145 ± 11%     -33.0%         4786 ± 18%  interrupts.CPU36.PMI:Performance_monitoring_interrupts
    136.50 ± 27%     -44.7%        75.50 ± 60%  interrupts.CPU39.TLB:TLB_shootdowns
    149.25 ± 24%     -24.6%       112.50 ± 30%  interrupts.CPU4.RES:Rescheduling_interrupts
      7599           -46.6%         4061 ± 35%  interrupts.CPU41.NMI:Non-maskable_interrupts
      7599           -46.6%         4061 ± 35%  interrupts.CPU41.PMI:Performance_monitoring_interrupts
      6661 ± 24%     -52.1%         3191 ± 51%  interrupts.CPU44.NMI:Non-maskable_interrupts
      6661 ± 24%     -52.1%         3191 ± 51%  interrupts.CPU44.PMI:Performance_monitoring_interrupts
      7622           -43.5%         4307 ± 33%  interrupts.CPU46.NMI:Non-maskable_interrupts
      7622           -43.5%         4307 ± 33%  interrupts.CPU46.PMI:Performance_monitoring_interrupts
      7613           -43.1%         4331 ± 31%  interrupts.CPU47.NMI:Non-maskable_interrupts
      7613           -43.1%         4331 ± 31%  interrupts.CPU47.PMI:Performance_monitoring_interrupts
      5823 ± 32%     -36.4%         3703 ± 34%  interrupts.CPU5.NMI:Non-maskable_interrupts
      5823 ± 32%     -36.4%         3703 ± 34%  interrupts.CPU5.PMI:Performance_monitoring_interrupts
     89.25 ± 48%     -61.1%        34.75 ± 31%  interrupts.CPU53.TLB:TLB_shootdowns
      5698 ± 33%     -42.5%         3277 ± 49%  interrupts.CPU55.NMI:Non-maskable_interrupts
      5698 ± 33%     -42.5%         3277 ± 49%  interrupts.CPU55.PMI:Performance_monitoring_interrupts
    172.00 ± 14%     -35.2%       111.50 ± 41%  interrupts.CPU56.RES:Rescheduling_interrupts
     64.00 ± 42%     -39.5%        38.75 ± 29%  interrupts.CPU56.TLB:TLB_shootdowns
    156.00 ± 17%     -36.2%        99.50 ± 21%  interrupts.CPU57.RES:Rescheduling_interrupts
    146.25 ± 28%     -48.9%        74.75 ± 67%  interrupts.CPU58.RES:Rescheduling_interrupts
      7627           -47.0%         4043 ± 31%  interrupts.CPU62.NMI:Non-maskable_interrupts
      7627           -47.0%         4043 ± 31%  interrupts.CPU62.PMI:Performance_monitoring_interrupts
    174.75 ± 12%     -29.9%       122.50 ± 30%  interrupts.CPU62.RES:Rescheduling_interrupts
     76.00 ± 29%     -48.4%        39.25 ± 29%  interrupts.CPU62.TLB:TLB_shootdowns
      7159 ± 11%     -50.2%         3564 ± 32%  interrupts.CPU63.NMI:Non-maskable_interrupts
      7159 ± 11%     -50.2%         3564 ± 32%  interrupts.CPU63.PMI:Performance_monitoring_interrupts
      7628           -62.9%         2831        interrupts.CPU66.NMI:Non-maskable_interrupts
      7628           -62.9%         2831        interrupts.CPU66.PMI:Performance_monitoring_interrupts
    174.50 ± 10%     -36.4%       111.00 ± 50%  interrupts.CPU66.RES:Rescheduling_interrupts
      4370 ± 18%     -34.7%         2853        interrupts.CPU69.NMI:Non-maskable_interrupts
      4370 ± 18%     -34.7%         2853        interrupts.CPU69.PMI:Performance_monitoring_interrupts
      6885 ± 18%     -45.8%         3731 ± 28%  interrupts.CPU74.NMI:Non-maskable_interrupts
      6885 ± 18%     -45.8%         3731 ± 28%  interrupts.CPU74.PMI:Performance_monitoring_interrupts
      5900 ± 18%     -57.5%         2510 ± 24%  interrupts.CPU77.NMI:Non-maskable_interrupts
      5900 ± 18%     -57.5%         2510 ± 24%  interrupts.CPU77.PMI:Performance_monitoring_interrupts
     62.00 ± 41%     +58.9%        98.50 ± 14%  interrupts.CPU78.RES:Rescheduling_interrupts
      0.00       +2.7e+105%         2689 ±171%  interrupts.CPU79.115:PCI-MSI.31981648-edge.i40e-eth0-TxRx-79
     49.75 ± 47%    +119.6%       109.25 ± 28%  interrupts.CPU79.RES:Rescheduling_interrupts
     61.50 ± 54%    +115.0%       132.25 ± 31%  interrupts.CPU8.RES:Rescheduling_interrupts
      5871 ± 19%     -38.8%         3594 ± 32%  interrupts.CPU80.NMI:Non-maskable_interrupts
      5871 ± 19%     -38.8%         3594 ± 32%  interrupts.CPU80.PMI:Performance_monitoring_interrupts
     60.50 ± 19%    +120.2%       133.25 ± 14%  interrupts.CPU81.RES:Rescheduling_interrupts
     36.00 ± 79%    +179.2%       100.50 ± 35%  interrupts.CPU86.RES:Rescheduling_interrupts
      6322 ± 21%     -60.6%         2490 ± 25%  interrupts.CPU88.NMI:Non-maskable_interrupts
      6322 ± 21%     -60.6%         2490 ± 25%  interrupts.CPU88.PMI:Performance_monitoring_interrupts
     32.50 ± 40%    +150.0%        81.25 ± 41%  interrupts.CPU92.RES:Rescheduling_interrupts
    124.00 ± 11%     -22.8%        95.75 ±  4%  interrupts.IWI:IRQ_work_interrupts
    538989 ±  8%     -28.0%       387910 ±  2%  interrupts.NMI:Non-maskable_interrupts
    538989 ±  8%     -28.0%       387910 ±  2%  interrupts.PMI:Performance_monitoring_interrupts



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen
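
P.S. For anyone cross-checking the numbers: the headline %change figures can be recomputed from the base and patched values in the two comparison tables. The snippet below is only a minimal sketch. The helper name `pct_change` and the row labels are made up for illustration, and it assumes %change is the plain relative difference (patched - base) / base, which matches the reported values for these rows.

```python
# Sketch: recompute the %change column for a few rows copied from the
# two comparison tables (libaio run and sync run).
# pct_change is a hypothetical helper, not part of lkp-tests.

def pct_change(base: float, patched: float) -> float:
    """Relative change of the patched value versus the base value, in percent."""
    return (patched - base) / base * 100.0

# (base commit 7476b91d4d, patched commit a0ac629ebe)
rows = {
    "fio.read_iops (libaio)": (22146, 12562),
    "fio.read_bw_MBps (libaio)": (44292, 25124),
    "fio.read_iops (sync)": (37571, 16690),
    "fio.read_bw_MBps (sync)": (75143, 33381),
}

for name, (base, patched) in rows.items():
    print(f"{name:28s} {pct_change(base, patched):+.1f}%")
```

Rows whose change column has no percent sign (the latency buckets, the miss rates, the perf-profile entries) are shown as absolute differences in percentage points, so this helper does not apply to them.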