Greetings,

FYI, we noticed an 865% improvement of reaim.jobs_per_min due to commit:

commit: cb6268f05df684e00607762fd8ad95d515e2407f ("ipc: optimize semget/shmget/msgget for lots of keys")
url: https://github.com/0day-ci/linux/commits/Guillaume-Knispel/ipc-optimize-semget-shmget-msgget-for-lots-of-keys/20170731-170031

in testcase: reaim
on test machine: 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory
with the following parameters:

        runtime: 300s
        nr_task: 5000
        test: shared_memory
        cpufreq_governor: performance

test-description: REAIM is an updated and improved version of AIM 7 benchmark.
test-url: https://sourceforge.net/projects/re-aim-7/

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/01org/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

testcase/path_params/tbox_group/run: reaim/300s-5000-shared_memory-performance/lkp-hsw-ep5

fd2b2c57ec2020ae  cb6268f05df684e00607762fd8
----------------  --------------------------
    634033 ±  4%      865%    6120365        reaim.jobs_per_min
       126 ±  4%      865%       1224        reaim.jobs_per_min_child
    672755 ±  5%      831%    6263184        reaim.max_jobs_per_min
     34.14 ±  3%       11%      37.82        reaim.std_dev_percent
     65.40             -6%      61.66        reaim.jti
     13.53             -7%      12.60        reaim.child_utime
     47.48 ±  4%      -90%       4.90        reaim.parent_time
      1981 ±  4%      -90%        204        reaim.child_systime
     12.19            -90%       1.24        reaim.std_dev_time
   5160570 ±  7%      517%   31838188        reaim.time.minor_page_faults
     88.17 ±  7%      474%     505.69        reaim.time.user_time
    125994 ± 11%      244%     433632        reaim.time.involuntary_context_switches
   4053795 ±  7%       79%    7256949        reaim.time.voluntary_context_switches
      3974            -28%       2864        reaim.time.percent_of_cpu_this_job_got
     12855 ±  4%      -36%       8249        reaim.time.system_time
     27741 ±  3%       86%      51490        vmstat.system.cs
    214773 ±  5%       44%     308264        interrupts.CAL:Function_call_interrupts

       fail:runs  %reproduction    fail:runs
           |             |             |
           :4           25%           1:4     stderr.create_shared_memory():can't_create_shared_memory,pausing

         0          6e+05     551946 ±104%  latency_stats.avg.call_rwsem_down_write_failed.shmctl_down.SyS_shmctl.do_syscall_64.return_from_SYSCALL_64
    240707 ±  7%   -2e+05      81933        latency_stats.avg.call_rwsem_down_write_failed.ipcget.SyS_shmget.entry_SYSCALL_64_fastpath
    282714 ±  4%   -2e+05      76327        latency_stats.avg.call_rwsem_down_write_failed.shm_close.remove_vma.do_munmap.SyS_shmdt.entry_SYSCALL_64_fastpath
    313951 ±  5%   -2e+05      78957        latency_stats.avg.call_rwsem_down_write_failed.do_shmat.SyS_shmat.entry_SYSCALL_64_fastpath
    341015 ±  4%   -3e+05      78091        latency_stats.avg.call_rwsem_down_write_failed.shmctl_down.SyS_shmctl.entry_SYSCALL_64_fastpath
         0          6e+05     551946 ±104%  latency_stats.max.call_rwsem_down_write_failed.shmctl_down.SyS_shmctl.do_syscall_64.return_from_SYSCALL_64
  21599230 ±  3%   -2e+07    3153822 ±  6%  latency_stats.max.call_rwsem_down_write_failed.shmctl_down.SyS_shmctl.entry_SYSCALL_64_fastpath
  21608679 ±  3%   -2e+07    3152519 ±  6%  latency_stats.max.call_rwsem_down_write_failed.ipcget.SyS_shmget.entry_SYSCALL_64_fastpath
  21612440 ±  3%   -2e+07    3153028 ±  6%  latency_stats.max.call_rwsem_down_write_failed.do_shmat.SyS_shmat.entry_SYSCALL_64_fastpath
  21613940 ±  3%   -2e+07    3154107 ±  6%  latency_stats.max.call_rwsem_down_write_failed.shm_close.remove_vma.do_munmap.SyS_shmdt.entry_SYSCALL_64_fastpath
  21615866 ±  3%   -2e+07    3154900 ±  6%  latency_stats.max.max
 3.835e+10 ±  4%    9e+10  1.254e+11 ±  7%  latency_stats.sum.io_schedule.__lock_page.do_wp_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
 1.672e+09 ± 22%    5e+09  6.765e+09 ± 86%  latency_stats.sum.call_rwsem_down_write_failed.ipcget.SyS_semget.entry_SYSCALL_64_fastpath
   2757771 ± 40%    2e+08  2.425e+08 ± 26%  latency_stats.sum.io_schedule.wait_on_page_bit.__migration_entry_wait.migration_entry_wait.do_swap_page.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
         0          6e+05     551946 ±104%  latency_stats.sum.call_rwsem_down_write_failed.shmctl_down.SyS_shmctl.do_syscall_64.return_from_SYSCALL_64
     24449 ± 67%    2e+05     200503 ±  8%  latency_stats.sum.io_schedule.__lock_page_or_retry.filemap_fault.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
     40790 ± 10%    2e+05     191446 ± 11%  latency_stats.sum.ep_poll.SyS_epoll_wait.do_syscall_64.return_from_SYSCALL_64
     27684 ± 19%    1e+05     172543 ± 13%  latency_stats.sum.devkmsg_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
     52252 ±  4%    1e+05     189551 ±  5%  latency_stats.sum.wait_woken.inotify_read.__vfs_read.vfs_read.SyS_read.entry_SYSCALL_64_fastpath
      6530 ± 95%    9e+04      91966 ± 21%  latency_stats.sum.io_schedule.__lock_page_killable.__lock_page_or_retry.filemap_fault.__do_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      5976 ± 20%    3e+04      34861 ±  4%  latency_stats.sum.pipe_wait.pipe_write.__vfs_write.vfs_write.SyS_write.entry_SYSCALL_64_fastpath
 2.659e+11 ±  3%   -2e+11  8.371e+10        latency_stats.sum.call_rwsem_down_write_failed.do_shmat.SyS_shmat.entry_SYSCALL_64_fastpath
   5864993 ±  7%     454%   32495687        perf-stat.page-faults
   5864993 ±  7%     454%   32495687        perf-stat.minor-faults
      0.11 ±  3%     422%       0.60        perf-stat.branch-miss-rate%
 1.501e+08 ±  7%     356%  6.847e+08        perf-stat.node-store-misses
 4.444e+11 ±  6%     316%  1.849e+12        perf-stat.dTLB-stores
 1.077e+08 ±  5%     314%   4.46e+08        perf-stat.node-loads
 1.067e+09 ±  7%     287%  4.133e+09        perf-stat.cache-misses
 6.282e+08 ±  7%     275%  2.355e+09        perf-stat.node-load-misses
 1.757e+08 ±  7%     261%  6.339e+08        perf-stat.node-stores
  4.56e+09 ±  6%     232%  1.516e+10        perf-stat.branch-misses
      3.20           212%       9.98        perf-stat.cache-miss-rate%
  71703040 ±  6%     163%  1.884e+08        perf-stat.dTLB-store-misses
 2.831e+08 ± 11%     111%  5.964e+08        perf-stat.iTLB-loads
   9086070 ±  7%      74%   15819624        perf-stat.context-switches
   1340441 ±  8%      72%    2308738        perf-stat.cpu-migrations
 2.164e+09 ±  7%      32%  2.847e+09        perf-stat.iTLB-load-misses
 3.337e+10 ±  5%      24%   4.14e+10        perf-stat.cache-references
     46.07            13%      51.93        perf-stat.node-store-miss-rate%
      0.66             9%       0.72        perf-stat.ipc
     85.35                     84.08        perf-stat.node-load-miss-rate%
     88.46            -7%      82.67        perf-stat.iTLB-load-miss-rate%
      1.51            -8%       1.39        perf-stat.cpi
 5.134e+12 ±  4%     -28%  3.698e+12        perf-stat.dTLB-loads
 1.972e+13 ±  4%     -32%  1.342e+13        perf-stat.instructions
 3.976e+12 ±  4%     -36%  2.533e+12        perf-stat.branch-instructions
      0.02 ±  6%     -37%       0.01        perf-stat.dTLB-store-miss-rate%
 2.977e+13 ±  4%     -37%  1.865e+13        perf-stat.cpu-cycles
      9132 ±  3%     -48%       4715        perf-stat.instructions-per-iTLB-miss
      0.06 ± 27%     -68%       0.02        perf-stat.dTLB-load-miss-rate%
 3.271e+09 ± 27%     -77%  7.473e+08        perf-stat.dTLB-load-misses

[ASCII plots not reproduced. Approximate sample levels read from the plot axes:]

        reaim.parent_time:        [*] about 45 to 50,       [O] about 5
        reaim.child_systime:      [*] about 1800 to 2200,   [O] about 200
        reaim.jobs_per_min:       [*] below 1e+06,          [O] about 6e+06 to 7e+06
        reaim.jobs_per_min_child: [*] below 200,            [O] about 1200 to 1400

        [*] bisect-good sample
        [O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Xiaolong
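
For readers cross-checking the comparison table: the %change column appears to be the relative difference between the mean values measured on the two kernels. The sketch below assumes that formula (it is not taken from the lkp-tests source) and compares it against three rows of the report. The helper name `pct_change` is hypothetical.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent (assumed formula)."""
    return (new - base) / base * 100.0


# (parent-commit mean, patched-commit mean, %change as printed in the report)
rows = {
    "reaim.jobs_per_min": (634033, 6120365, 865),
    "reaim.parent_time": (47.48, 4.90, -90),
    "vmstat.system.cs": (27741, 51490, 86),
}

for name, (base, new, reported) in rows.items():
    computed = pct_change(base, new)
    print(f"{name}: computed {computed:.1f}%, reported {reported}%")
```

If the assumption holds, each computed value rounds to the figure printed in the report. Rows whose change is shown in scientific notation (the latency_stats entries) appear to give an absolute delta instead of a percentage, so this check does not apply to them.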