Greetings,

FYI, we noticed a -6.1% regression of netperf.Throughput_Mbps due to commit:


commit: a337531b942bd8a03e7052444d7e36972aac2d92 ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git master

in testcase: netperf
on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G memory
with the following parameters:

        ip: ipv4
        runtime: 900s
        nr_threads: 200%
        cluster: cs-localhost
        test: TCP_STREAM
        ucode: 0x7000013
        cpufreq_governor: performance

test-description: Netperf is a benchmark that can be used to measure various aspects of networking performance.
test-url: http://www.netperf.org/netperf/



Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # the job file is attached to this email
        bin/lkp run     job.yaml

=========================================================================================
cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/tbox_group/test/testcase/ucode:
  cs-localhost/gcc-7/performance/ipv4/x86_64-rhel-7.2/200%/debian-x86_64-2018-04-03.cgz/900s/lkp-bdw-de1/TCP_STREAM/netperf/0x7000013

commit:
  3ff6cde846 ("hns3: Another build fix.")
  a337531b94 ("tcp: up initial rmem to 128KB and SYN rwin to around 64KB")

3ff6cde846857d45 a337531b942bd8a03e7052444d
---------------- --------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :4           50%           2:4     dmesg.WARNING:at#for_ip_interrupt_entry/0x
         %stddev     %change         %stddev
             \          |                \
      2497            -6.1%       2345        netperf.Throughput_Mbps
     79924            -6.1%      75061        netperf.Throughput_total_Mbps
    186513           +11.3%     207590        netperf.time.involuntary_context_switches
 5.488e+08            -6.1%  5.154e+08        netperf.workload
      1172 ± 34%     -37.6%     731.75 ±  5%  cpuidle.C1E.usage
      1137 ± 34%     -40.0%     682.25 ±  8%  turbostat.C1E
      2775 ± 11%     +17.5%       3261 ±  9%  sched_debug.cpu.nr_switches.stddev
      0.01 ± 17%     +28.2%       0.01 ± 10%  sched_debug.rt_rq:/.rt_time.avg
      0.14 ± 17%     +28.2%       0.18 ± 10%  sched_debug.rt_rq:/.rt_time.max
      0.03 ± 17%     +28.2%       0.04 ± 10%  sched_debug.rt_rq:/.rt_time.stddev
     66336            +0.9%      66948        proc-vmstat.nr_anon_pages
 2.755e+08            -6.1%  2.588e+08        proc-vmstat.numa_hit
 2.755e+08            -6.1%  2.588e+08        proc-vmstat.numa_local
 2.197e+09            -6.1%  2.064e+09        proc-vmstat.pgalloc_normal
 2.197e+09            -6.1%  2.064e+09        proc-vmstat.pgfree
 5.903e+11            -7.9%  5.438e+11        perf-stat.branch-instructions
      2.68            -0.0        2.64        perf-stat.branch-miss-rate%
 1.582e+10            -9.2%  1.436e+10        perf-stat.branch-misses
  6.26e+11            -4.7%  5.964e+11        perf-stat.cache-misses
  6.26e+11            -4.7%  5.964e+11        perf-stat.cache-references
     11.69            +8.6%      12.69        perf-stat.cpi
    123723            +2.1%     126291        perf-stat.cpu-migrations
      0.09 ±  2%      +0.0        0.09        perf-stat.dTLB-load-miss-rate%
 1.475e+12            -7.1%   1.37e+12        perf-stat.dTLB-loads
 1.094e+12            -6.9%  1.018e+12        perf-stat.dTLB-stores
 2.912e+08 ±  5%     -13.0%  2.533e+08        perf-stat.iTLB-loads
 3.019e+12            -7.9%  2.781e+12        perf-stat.instructions
      0.09            -7.9%       0.08        perf-stat.ipc
      5500            -1.9%       5394        perf-stat.path-length
      0.53 ±  2%      -0.2        0.38 ± 57%  perf-profile.calltrace.cycles-pp.ip_output.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames
      0.63 ±  2%      -0.1        0.58 ±  4%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
      0.73 ±  3%      +0.1        0.78 ±  2%  perf-profile.calltrace.cycles-pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv
      0.96            +0.1        1.03        perf-profile.calltrace.cycles-pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_local_deliver_finish
     98.02            +0.1       98.13        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     97.88            +0.1       98.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.70 ±  3%      -0.1        0.64 ±  4%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.26 ±  5%      -0.0        0.21 ±  6%  perf-profile.children.cycles-pp._raw_spin_lock_bh
      0.28 ±  5%      -0.0        0.24 ±  6%  perf-profile.children.cycles-pp.lock_sock_nested
      0.46 ±  4%      -0.0        0.43 ±  2%  perf-profile.children.cycles-pp.nf_hook_slow
      0.21 ±  8%      -0.0        0.18 ±  5%  perf-profile.children.cycles-pp.tcp_rcv_space_adjust
      0.08 ±  5%      -0.0        0.06        perf-profile.children.cycles-pp.entry_SYSCALL_64_stage2
      0.08 ±  6%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.ip_finish_output
      0.17 ±  6%      +0.0        0.20 ±  5%  perf-profile.children.cycles-pp.tcp_event_new_data_sent
      0.24 ±  4%      +0.0        0.27 ±  2%  perf-profile.children.cycles-pp.mod_timer
      0.15 ±  2%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.__might_sleep
      0.80 ±  3%      +0.0        0.84 ±  2%  perf-profile.children.cycles-pp.tcp_clean_rtx_queue
      0.30 ±  3%      +0.1        0.36 ±  4%  perf-profile.children.cycles-pp.__might_fault
      1.61 ±  4%      +0.1        1.69        perf-profile.children.cycles-pp.__release_sock
      1.06 ±  2%      +0.1        1.14        perf-profile.children.cycles-pp.tcp_ack
     98.24            +0.1       98.36        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     98.09            +0.1       98.23        perf-profile.children.cycles-pp.do_syscall_64
     70.28            +0.6       70.86        perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      1.56            -0.1        1.48 ±  3%  perf-profile.self.cycles-pp.copy_page_to_iter
      0.70 ±  3%      -0.1        0.64 ±  4%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      1.37 ±  2%      -0.1        1.32 ±  2%  perf-profile.self.cycles-pp.__free_pages_ok
      0.55 ±  3%      -0.0        0.50 ±  3%  perf-profile.self.cycles-pp.__alloc_skb
      0.44 ±  3%      -0.0        0.40 ±  5%  perf-profile.self.cycles-pp.tcp_recvmsg
      0.16 ±  9%      -0.0        0.14 ±  5%  perf-profile.self.cycles-pp.sock_has_perm
      0.08 ±  6%      -0.0        0.06        perf-profile.self.cycles-pp.entry_SYSCALL_64_stage2
      0.10 ±  4%      +0.0        0.12 ±  6%  perf-profile.self.cycles-pp.tcp_clean_rtx_queue
      0.14 ±  6%      +0.0        0.17 ±  4%  perf-profile.self.cycles-pp.__might_sleep
     69.25            +0.5       69.77        perf-profile.self.cycles-pp.copy_user_enhanced_fast_string



[plot: netperf.Throughput_Mbps per sample, y-axis 0 to 3000]

[plot: netperf.Throughput_total_Mbps per sample, y-axis 0 to 90000]

[plot: netperf.workload per sample, y-axis 0 to 6e+08]

[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen
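A quick arithmetic cross-check of the headline rows may help when reading the comparison table. For count metrics, the %change column is the relative change (head - base) / base; for percentage metrics such as perf-stat.branch-miss-rate% and the perf-profile rows, the rows themselves show an absolute difference instead (70.28 to 70.86 is listed as +0.6, not +0.8%). The sketch below assumes that nr_threads: 200% means twice the CPU count, i.e. 32 netperf instances on this 16-thread machine. The helper names pct_change and expected_total are hypothetical and are not part of lkp-tests.

```python
"""Cross-check of the headline netperf rows (sketch; numbers copied from the table)."""


def pct_change(base: float, head: float) -> float:
    """Relative change in percent, (head - base) / base * 100."""
    return (head - base) / base * 100.0


def expected_total(per_instance_mbps: float, nr_cpus: int, nr_threads_pct: int) -> float:
    """Aggregate throughput if every netperf instance reports the same rate.

    Assumption: nr_threads is a percentage of the CPU count, so
    200% on a 16-thread machine means 32 concurrent instances.
    """
    instances = nr_cpus * nr_threads_pct // 100
    return per_instance_mbps * instances


# (base 3ff6cde846, head a337531b94) pairs from the comparison table
ROWS = {
    "netperf.Throughput_Mbps": (2497, 2345),
    "netperf.Throughput_total_Mbps": (79924, 75061),
    "netperf.time.involuntary_context_switches": (186513, 207590),
}

if __name__ == "__main__":
    # Recompute the %change column for the count metrics above.
    for name, (base, head) in ROWS.items():
        print(f"{pct_change(base, head):+6.1f}%  {name}")

    # Compare 32 x per-instance throughput with the reported total.
    per_base, per_head = ROWS["netperf.Throughput_Mbps"]
    tot_base, tot_head = ROWS["netperf.Throughput_total_Mbps"]
    for label, per, total in (("base", per_base, tot_base), ("head", per_head, tot_head)):
        est = expected_total(per, nr_cpus=16, nr_threads_pct=200)
        print(f"{label}: 32 x {per} = {est:.0f} Mbps (reported total: {total})")
```

On these rows the estimate lands within about 0.03% of the reported totals (32 x 2497 = 79904 vs 79924, 32 x 2345 = 75040 vs 75061), which is consistent with the per-instance and total throughput both moving by the same -6.1%.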