Greetings,

FYI, we noticed an 18.2% improvement of fxmark.ssd_f2fs_DRBL_1_directio.works/sec due to commit:

commit: 90c91dfb86d0ff545bd329d3ddd72c147e2ae198 ("perf/core: Fix endless multiplex timer")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: fxmark
on test machine: 192 threads Intel(R) Xeon(R) CPU @ 2.20GHz with 192G memory
with the following parameters:

	disk: 1SSD
	media: ssd
	test: DRBL
	fstype: f2fs
	directio: directio
	cpufreq_governor: performance
	ucode: 0x400002c

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/directio/disk/fstype/kconfig/media/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/directio/1SSD/f2fs/x86_64-rhel-7.6/ssd/debian-x86_64-20191114.cgz/lkp-csl-2ap1/DRBL/fxmark/0x400002c

commit:
  d8a7386897 ("x86/optprobe: Fix OPTPROBE vs UACCESS")
  90c91dfb86 ("perf/core: Fix endless multiplex timer")

d8a738689794c42c 90c91dfb86d0ff545bd329d3ddd
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
           :4           50%           2:4     dmesg.WARNING:at#for_ip_swapgs_restore_regs_and_return_to_usermode/0x
           :4           50%           2:4     dmesg.WARNING:stack_recursion
          0:4            1%           0:4     perf-profile.children.cycles-pp.error_entry
         %stddev     %change         %stddev
             \          |                \
     16.79           +27.6%      21.41        fxmark.ssd_f2fs_DRBL_1_directio.iowait_sec
     18.45 ±  2%     -24.0%      14.02 ± 22%  fxmark.ssd_f2fs_DRBL_1_directio.sys_util
      2.17 ±  2%     -24.9%       1.63 ± 14%  fxmark.ssd_f2fs_DRBL_1_directio.user_util
    736006           +18.2%     869786        fxmark.ssd_f2fs_DRBL_1_directio.works
     24533           +18.2%      28992        fxmark.ssd_f2fs_DRBL_1_directio.works/sec
  26010490 ±  2%      +6.0%   27559732        fxmark.time.file_system_inputs
   3219151 ±  2%      +6.0%    3413339        fxmark.time.voluntary_context_switches
  75000192           -10.2%   67344189        cpuidle.POLL.time
      7.24 ± 67%     -68.8%       2.26 ± 18%  iostat.nvme0n1.await.max
      7.27 ± 67%     -68.7%       2.28 ± 17%  iostat.nvme0n1.w_await.max
    196991 ± 50%     -74.1%      50950 ± 66%  numa-numastat.node3.local_node
    220900 ± 41%     -62.6%      82622 ± 41%  numa-numastat.node3.numa_hit
      7151 ±  5%      -7.1%       6640 ±  2%  slabinfo.anon_vma.active_objs
      3230 ±  3%      -8.5%       2954 ±  4%  slabinfo.files_cache.num_objs
   2518633           -32.3%    1706174 ±  2%  proc-vmstat.pgalloc_normal
      1684            +4.7%       1763 ±  4%  proc-vmstat.pgdeactivate
   2505517           -31.1%    1726169 ±  3%  proc-vmstat.pgfree
      1586 ±  3%     -96.2%      60.50 ± 66%  proc-vmstat.thp_fault_alloc
     71788 ± 12%     -47.7%      37550 ± 67%  sched_debug.cfs_rq:/.load.min
     72.38 ± 12%     -46.6%      38.67 ± 64%  sched_debug.cfs_rq:/.load_avg.min
     10.17 ±  7%     -29.9%       7.12 ± 50%  sched_debug.cfs_rq:/.nr_spread_over.min
    355.54 ±  9%     -14.7%     303.12 ±  7%  sched_debug.cfs_rq:/.runnable_load_avg.max
     29913 ± 70%     -74.8%       7537 ± 71%  numa-vmstat.node1.nr_active_anon
     29915 ± 70%     -74.8%       7538 ± 71%  numa-vmstat.node1.nr_anon_pages
     29913 ± 70%     -74.8%       7537 ± 71%  numa-vmstat.node1.nr_zone_active_anon
    385.50 ± 68%     -60.0%     154.25 ± 37%  numa-vmstat.node3.nr_page_table_pages
     11469 ± 11%     -12.1%      10080 ±  8%  numa-vmstat.node3.nr_slab_unreclaimable
    119643 ± 70%     -74.8%      30148 ± 71%  numa-meminfo.node1.Active
    119643 ± 70%     -74.8%      30148 ± 71%  numa-meminfo.node1.Active(anon)
     86988 ± 71%     -87.1%      11205 ±118%  numa-meminfo.node1.AnonHugePages
    119650 ± 70%     -74.8%      30154 ± 71%  numa-meminfo.node1.AnonPages
      1547 ± 67%     -60.0%     618.50 ± 37%  numa-meminfo.node3.PageTables
     45893 ± 11%     -12.2%      40313 ±  8%  numa-meminfo.node3.SUnreclaim
    261.00 ± 61%     -66.0%      88.75 ± 42%  interrupts.CPU100.31:PCI-MSI.524289-edge.eth0-TxRx-0
      6.25 ± 17%    +416.0%      32.25 ±128%  interrupts.CPU131.RES:Rescheduling_interrupts
      5.25 ± 49%    +709.5%      42.50 ±120%  interrupts.CPU166.RES:Rescheduling_interrupts
    109.75 ± 10%     -64.0%      39.50 ±105%  interrupts.CPU177.NMI:Non-maskable_interrupts
    109.75 ± 10%     -64.0%      39.50 ±105%  interrupts.CPU177.PMI:Performance_monitoring_interrupts
    109.75 ± 11%     -58.5%      45.50 ±100%  interrupts.CPU178.NMI:Non-maskable_interrupts
    109.75 ± 11%     -58.5%      45.50 ±100%  interrupts.CPU178.PMI:Performance_monitoring_interrupts
     86.00 ± 17%    +284.9%     331.00 ± 61%  interrupts.CPU2.NMI:Non-maskable_interrupts
     86.00 ± 17%    +284.9%     331.00 ± 61%  interrupts.CPU2.PMI:Performance_monitoring_interrupts
    105.25 ± 28%    +290.0%     410.50 ± 68%  interrupts.CPU3.NMI:Non-maskable_interrupts
    105.25 ± 28%    +290.0%     410.50 ± 68%  interrupts.CPU3.PMI:Performance_monitoring_interrupts
    403.75 ± 11%    +262.6%       1464 ±104%  interrupts.CPU3.RES:Rescheduling_interrupts
    142.00 ± 31%    +146.1%     349.50 ± 58%  interrupts.CPU4.NMI:Non-maskable_interrupts
    142.00 ± 31%    +146.1%     349.50 ± 58%  interrupts.CPU4.PMI:Performance_monitoring_interrupts
     76.00 ± 39%    +338.5%     333.25 ± 90%  interrupts.CPU5.NMI:Non-maskable_interrupts
     76.00 ± 39%    +338.5%     333.25 ± 90%  interrupts.CPU5.PMI:Performance_monitoring_interrupts
    131.50 ± 24%    +100.8%     264.00 ± 49%  interrupts.CPU6.RES:Rescheduling_interrupts
      1124 ±172%    -100.0%       0.25 ±173%  interrupts.CPU95.TLB:TLB_shootdowns
      4474 ±  8%     +61.9%       7245 ± 37%  interrupts.NMI:Non-maskable_interrupts
      4474 ±  8%     +61.9%       7245 ± 37%  interrupts.PMI:Performance_monitoring_interrupts
     37.99 ± 10%      -18.0      20.03 ± 61%  perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
     33.38 ± 10%      -16.4      16.98 ± 62%  perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle
     18.63 ±  7%      -11.0       7.66 ± 57%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
     11.82 ±  5%       -7.9       3.94 ± 53%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state
      3.81 ±  9%       -3.1       0.70 ± 69%  perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
      3.13 ± 16%       -1.8       1.33 ± 75%  perf-profile.calltrace.cycles-pp.serial8250_console_putchar.uart_console_write.serial8250_console_write.console_unlock.vprintk_emit
      3.27 ± 18%       -1.7       1.54 ± 64%  perf-profile.calltrace.cycles-pp.uart_console_write.serial8250_console_write.console_unlock.vprintk_emit.printk
      3.13 ± 16%       -1.7       1.46 ± 57%  perf-profile.calltrace.cycles-pp.wait_for_xmitr.serial8250_console_putchar.uart_console_write.serial8250_console_write.console_unlock
      1.72 ± 33%       -1.0       0.67 ± 74%  perf-profile.calltrace.cycles-pp.io_serial_in.wait_for_xmitr.serial8250_console_putchar.uart_console_write.serial8250_console_write
      1.12 ± 11%       -0.7       0.44 ±101%  perf-profile.calltrace.cycles-pp.lapic_next_deadline.clockevents_program_event.hrtimer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
      1.55 ± 19%       -0.7       0.89 ± 58%  perf-profile.calltrace.cycles-pp.irq_enter.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter
      0.97 ± 13%       -0.6       0.38 ±100%  perf-profile.calltrace.cycles-pp.native_write_msr.lapic_next_deadline.clockevents_program_event.hrtimer_interrupt.smp_apic_timer_interrupt
     36.02 ± 10%      -16.9      19.15 ± 58%  perf-profile.children.cycles-pp.apic_timer_interrupt
     33.51 ± 10%      -16.1      17.36 ± 59%  perf-profile.children.cycles-pp.smp_apic_timer_interrupt
     18.75 ±  7%      -10.6       8.20 ± 52%  perf-profile.children.cycles-pp.hrtimer_interrupt
     11.98 ±  5%       -7.5       4.45 ± 43%  perf-profile.children.cycles-pp.__hrtimer_run_queues
     66.76 ±  4%       -7.2      59.58 ± 11%  perf-profile.children.cycles-pp.cpuidle_enter_state
      4.08 ±  9%       -3.3       0.83 ± 50%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      2.83 ±  7%       -1.8       1.02 ± 36%  perf-profile.children.cycles-pp.native_write_msr
      3.52 ± 17%       -1.5       2.00 ± 50%  perf-profile.children.cycles-pp.printk
      3.52 ± 17%       -1.5       2.00 ± 50%  perf-profile.children.cycles-pp.vprintk_emit
      1.79 ± 15%       -1.5       0.34 ± 67%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
      3.51 ± 18%       -1.4       2.14 ± 38%  perf-profile.children.cycles-pp.console_unlock
      3.38 ± 17%       -1.3       2.04 ± 39%  perf-profile.children.cycles-pp.serial8250_console_write
      3.27 ± 19%       -1.3       1.96 ± 39%  perf-profile.children.cycles-pp.uart_console_write
      3.24 ± 16%       -1.3       1.95 ± 41%  perf-profile.children.cycles-pp.wait_for_xmitr
      3.13 ± 17%       -1.3       1.86 ± 41%  perf-profile.children.cycles-pp.serial8250_console_putchar
      1.33 ± 22%       -1.2       0.11 ± 69%  perf-profile.children.cycles-pp.enqueue_hrtimer
      1.25 ± 23%       -1.2       0.10 ± 76%  perf-profile.children.cycles-pp.timerqueue_add
      1.00 ± 29%       -0.9       0.11 ± 74%  perf-profile.children.cycles-pp.__remove_hrtimer
      0.88 ± 29%       -0.8       0.07 ±112%  perf-profile.children.cycles-pp.timerqueue_del
      0.65 ± 30%       -0.6       0.04 ±110%  perf-profile.children.cycles-pp.rb_erase
      1.23 ± 26%       -0.6       0.65 ± 22%  perf-profile.children.cycles-pp._raw_spin_lock
      1.17 ±  8%       -0.5       0.65 ± 46%  perf-profile.children.cycles-pp.lapic_next_deadline
      1.44 ± 10%       -0.5       0.94 ± 11%  perf-profile.children.cycles-pp.read_tsc
      1.56 ± 18%       -0.5       1.07 ± 38%  perf-profile.children.cycles-pp.irq_enter
      1.03 ± 12%       -0.5       0.54 ± 43%  perf-profile.children.cycles-pp.delay_tsc
      0.60 ± 22%       -0.4       0.19 ± 64%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      1.04 ± 23%       -0.4       0.64 ± 39%  perf-profile.children.cycles-pp.page_fault
      0.95 ± 23%       -0.4       0.57 ± 40%  perf-profile.children.cycles-pp.do_page_fault
      1.36 ± 12%       -0.4       1.01 ± 11%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.61 ± 10%       -0.3       0.27 ± 79%  perf-profile.children.cycles-pp.timekeeping_max_deferment
      0.68 ± 27%       -0.3       0.36 ± 51%  perf-profile.children.cycles-pp.tick_check_oneshot_broadcast_this_cpu
      0.83 ± 26%       -0.3       0.52 ± 41%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.84 ± 27%       -0.3       0.53 ± 41%  perf-profile.children.cycles-pp.handle_mm_fault
      0.24 ± 43%       -0.2       0.07 ±100%  perf-profile.children.cycles-pp.mmap_region
      0.27 ± 19%       -0.2       0.09 ± 30%  perf-profile.children.cycles-pp.setlocale
      0.19 ± 26%       -0.1       0.04 ±113%  perf-profile.children.cycles-pp.pipe_read
      0.33 ± 14%       -0.1       0.19 ± 38%  perf-profile.children.cycles-pp.newidle_balance
      0.17 ± 16%       -0.1       0.05 ±116%  perf-profile.children.cycles-pp.rb_next
      0.16 ± 20%       -0.1       0.07 ± 61%  perf-profile.children.cycles-pp.update_blocked_averages
      0.12 ± 32%       -0.1       0.05 ±106%  perf-profile.children.cycles-pp.fbcon_putcs
      0.09 ± 17%       -0.1       0.03 ±100%  perf-profile.children.cycles-pp.trigger_load_balance
      0.11 ± 28%       -0.1       0.05 ±106%  perf-profile.children.cycles-pp.bit_putcs
      0.01 ±173%       +0.1       0.13 ± 59%  perf-profile.children.cycles-pp.__slab_free
      0.04 ±102%       +0.4       0.40 ± 80%  perf-profile.children.cycles-pp.update_load_avg
      0.09 ± 61%       +0.7       0.77 ± 85%  perf-profile.children.cycles-pp.schedule_idle
      0.09 ± 64%       +9.5       9.54 ±113%  perf-profile.children.cycles-pp.poll_idle
      2.81 ±  7%       -1.8       1.02 ± 36%  perf-profile.self.cycles-pp.native_write_msr
      0.69 ± 30%       -0.6       0.06 ±116%  perf-profile.self.cycles-pp.timerqueue_add
      1.23 ± 26%       -0.6       0.63 ± 22%  perf-profile.self.cycles-pp._raw_spin_lock
      0.63 ± 29%       -0.6       0.04 ±106%  perf-profile.self.cycles-pp.rb_erase
      1.41 ± 10%       -0.5       0.89 ± 12%  perf-profile.self.cycles-pp.read_tsc
      1.03 ± 12%       -0.5       0.54 ± 43%  perf-profile.self.cycles-pp.delay_tsc
      0.58 ± 25%       -0.4       0.19 ± 64%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.55 ± 23%       -0.4       0.16 ± 61%  perf-profile.self.cycles-pp.perf_mux_hrtimer_handler
      0.61 ± 10%       -0.4       0.24 ± 89%  perf-profile.self.cycles-pp.timekeeping_max_deferment
      1.36 ± 12%       -0.4       1.01 ± 11%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.68 ± 27%       -0.3       0.36 ± 51%  perf-profile.self.cycles-pp.tick_check_oneshot_broadcast_this_cpu
      0.31 ± 15%       -0.2       0.15 ± 73%  perf-profile.self.cycles-pp.__hrtimer_run_queues
      0.18 ± 43%       -0.1       0.05 ±110%  perf-profile.self.cycles-pp.clockevents_program_event
      0.14 ± 26%       -0.1       0.05 ±114%  perf-profile.self.cycles-pp.rb_next
      0.11 ± 30%       -0.1       0.03 ±100%  perf-profile.self.cycles-pp.__remove_hrtimer
      0.11 ± 13%       -0.1       0.04 ±103%  perf-profile.self.cycles-pp.__note_gp_changes
      0.01 ±173%       +0.1       0.09 ± 16%  perf-profile.self.cycles-pp.tick_nohz_get_sleep_length
      0.08 ± 73%       +0.1       0.16 ±  7%  perf-profile.self.cycles-pp.find_next_bit
      0.00             +0.1       0.08 ± 28%  perf-profile.self.cycles-pp.cpuidle_enter
      0.01 ±173%       +0.1       0.13 ± 59%  perf-profile.self.cycles-pp.__slab_free
      0.08 ± 61%       +8.7       8.77 ±113%  perf-profile.self.cycles-pp.poll_idle

[ASCII plot: fxmark.ssd_f2fs_DRBL_1_directio.works, y-axis 720000 to 900000]

[ASCII plot: fxmark.ssd_f2fs_DRBL_1_directio.works_sec, y-axis 24000 to 30000]

[ASCII plot: fxmark.ssd_f2fs_DRBL_1_directio.iowait_sec, y-axis 16 to 22]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Rong Chen