linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [linus:master] [mm]  8d59d2214c:  vm-scalability.throughput -36.6% regression
@ 2024-01-22  8:39 kernel test robot
  2024-01-22 21:39 ` Yosry Ahmed
  0 siblings, 1 reply; 6+ messages in thread
From: kernel test robot @ 2024-01-22  8:39 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
	Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
	Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
	Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
	linux-mm, ying.huang, feng.tang, fengwei.yin, oliver.sang



hi, Yosry Ahmed,

per your suggestion in
https://lore.kernel.org/all/CAJD7tkameJBrJQxRj+ibKL6-yd-i0wyoyv2cgZdh3ZepA1p7wA@mail.gmail.com/
"I think it would be useful to know if there are
regressions/improvements in other microbenchmarks, at least to
investigate whether they represent real regressions."

we still report below two regressions to you just FYI what we observed in our
microbenchmark tests.
(we still captured will-it-scale::fallocate regression but ignore here per
your commit message)


Hello,

kernel test robot noticed a -36.6% regression of vm-scalability.throughput on:


commit: 8d59d2214c2362e7a9d185d80b613e632581af7b ("mm: memcg: make stats flushing threshold per-memcg")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: vm-scalability
test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
parameters:

	runtime: 300s
	size: 1T
	test: lru-shm
	cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/

In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -32.3% regression                                     |
| test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                                   |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | mode=process                                                                                       |
|                  | nr_task=50%                                                                                        |
|                  | test=tlb_flush2                                                                                    |
+------------------+----------------------------------------------------------------------------------------------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202401221624.cb53a8ca-oliver.sang@intel.com


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240122/202401221624.cb53a8ca-oliver.sang@intel.com

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/1T/lkp-cpl-4sp2/lru-shm/vm-scalability

commit: 
  e0bf1dc859 ("mm: memcg: move vmstats structs definition above flushing code")
  8d59d2214c ("mm: memcg: make stats flushing threshold per-memcg")

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      0.01           +86.7%       0.02        vm-scalability.free_time
    946447           -37.8%     588327        vm-scalability.median
 2.131e+08           -36.6%  1.351e+08        vm-scalability.throughput
    284.74            +6.3%     302.62        vm-scalability.time.elapsed_time
    284.74            +6.3%     302.62        vm-scalability.time.elapsed_time.max
     30485           +14.8%      34987        vm-scalability.time.involuntary_context_switches
      1893           +43.6%       2718        vm-scalability.time.percent_of_cpu_this_job_got
      3855           +67.7%       6467        vm-scalability.time.system_time
      1537           +14.5%       1760        vm-scalability.time.user_time
    120009            -5.6%     113290        vm-scalability.time.voluntary_context_switches
      6.46            +3.5        9.95        mpstat.cpu.all.sys%
     21.22           +38.8%      29.46        vmstat.procs.r
      0.01 ± 20%   +1887.0%       0.18 ±203%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 28%     +63.3%       0.01 ± 29%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
    113624 ±  5%     +14.0%     129566 ±  3%  meminfo.Active
    113476 ±  5%     +14.0%     129417 ±  3%  meminfo.Active(anon)
   3987746           +46.0%    5821636        meminfo.Mapped
     16345           +14.6%      18729        meminfo.PageTables
    474.17 ±  3%     -88.9%      52.50 ±125%  perf-c2c.DRAM.local
    483.17 ±  5%     -79.3%      99.83 ± 70%  perf-c2c.DRAM.remote
      1045 ±  5%     -71.9%     294.00 ± 63%  perf-c2c.HITM.local
    119.50 ± 10%     -78.8%      25.33 ± 20%  perf-c2c.HITM.remote
    392.33           +35.4%     531.17        turbostat.Avg_MHz
     10.35            +3.7       14.00        turbostat.Busy%
     90.56            -3.7       86.86        turbostat.C1%
      0.28 ±  5%     -31.5%       0.19        turbostat.IPC
    481.33            +2.5%     493.38        turbostat.PkgWatt
    999019 ±  3%     +44.4%    1442651 ±  2%  numa-meminfo.node0.Mapped
   1005687 ±  4%     +44.1%    1449402 ±  3%  numa-meminfo.node1.Mapped
      3689 ±  3%     +21.7%       4490 ±  7%  numa-meminfo.node1.PageTables
    980589 ±  2%     +42.3%    1395777 ±  2%  numa-meminfo.node2.Mapped
     96484 ±  5%     +22.0%     117715 ±  4%  numa-meminfo.node3.Active
     96430 ±  5%     +22.1%     117694 ±  4%  numa-meminfo.node3.Active(anon)
    991367 ±  3%     +42.7%    1414337 ±  4%  numa-meminfo.node3.Mapped
    251219 ±  3%     +44.8%     363745 ±  2%  numa-vmstat.node0.nr_mapped
    253252 ±  2%     +44.6%     366087 ±  3%  numa-vmstat.node1.nr_mapped
    927.67 ±  3%     +21.9%       1130 ±  7%  numa-vmstat.node1.nr_page_table_pages
    248171 ±  2%     +42.5%     353541 ±  4%  numa-vmstat.node2.nr_mapped
     24188 ±  5%     +21.6%      29410 ±  4%  numa-vmstat.node3.nr_active_anon
    245825 ±  2%     +45.5%     357622 ±  3%  numa-vmstat.node3.nr_mapped
      1038 ± 11%     +17.8%       1224 ±  6%  numa-vmstat.node3.nr_page_table_pages
     24188 ±  5%     +21.6%      29410 ±  4%  numa-vmstat.node3.nr_zone_active_anon
     28376 ±  5%     +14.0%      32338 ±  3%  proc-vmstat.nr_active_anon
    993504           +46.6%    1456136        proc-vmstat.nr_mapped
      4060           +15.5%       4691        proc-vmstat.nr_page_table_pages
     28376 ±  5%     +14.0%      32338 ±  3%  proc-vmstat.nr_zone_active_anon
 1.066e+09            -2.0%  1.045e+09        proc-vmstat.numa_hit
 1.065e+09            -2.0%  1.044e+09        proc-vmstat.numa_local
      5659            +5.6%       5978        proc-vmstat.unevictable_pgs_culled
  34604288            +3.7%   35898496        proc-vmstat.unevictable_pgs_scanned
   1223376 ± 14%    +119.1%    2680582 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.avg
   1673909 ± 14%     +97.6%    3308254 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.max
    810795 ± 15%    +145.8%    1993289 ±  9%  sched_debug.cfs_rq:/.avg_vruntime.min
    156233 ±  8%     +55.1%     242331 ±  6%  sched_debug.cfs_rq:/.avg_vruntime.stddev
   1223376 ± 14%    +119.1%    2680582 ±  9%  sched_debug.cfs_rq:/.min_vruntime.avg
   1673909 ± 14%     +97.6%    3308254 ±  8%  sched_debug.cfs_rq:/.min_vruntime.max
    810795 ± 15%    +145.8%    1993289 ±  9%  sched_debug.cfs_rq:/.min_vruntime.min
    156233 ±  8%     +55.1%     242331 ±  6%  sched_debug.cfs_rq:/.min_vruntime.stddev
    126445 ±  3%     -11.0%     112493 ±  4%  sched_debug.cpu.avg_idle.stddev
      1447 ± 15%     +32.0%       1910 ±  9%  sched_debug.cpu.nr_switches.min
      0.71           +13.4%       0.80        perf-stat.i.MPKI
 2.343e+10            -7.9%  2.157e+10        perf-stat.i.branch-instructions
      0.36            -0.0        0.35        perf-stat.i.branch-miss-rate%
  30833194            -7.3%   28584190        perf-stat.i.branch-misses
     26.04            -1.4       24.66        perf-stat.i.cache-miss-rate%
  51345490 ±  3%     +40.7%   72258633 ±  3%  perf-stat.i.cache-misses
 1.616e+08 ±  6%     +58.6%  2.562e+08 ±  6%  perf-stat.i.cache-references
      1.29            +9.4%       1.42        perf-stat.i.cpi
 8.394e+10           +33.7%  1.122e+11        perf-stat.i.cpu-cycles
    505.77            -2.6%     492.52        perf-stat.i.cpu-migrations
      0.03            +0.0        0.03 ±  2%  perf-stat.i.dTLB-load-miss-rate%
 2.335e+10            -7.4%  2.162e+10        perf-stat.i.dTLB-loads
      0.03            +0.0        0.03        perf-stat.i.dTLB-store-miss-rate%
   3948344            -8.0%    3633633        perf-stat.i.dTLB-store-misses
 6.549e+09            -7.0%   6.09e+09        perf-stat.i.dTLB-stores
  17546602           -22.8%   13551001        perf-stat.i.iTLB-load-misses
   2552560            -2.6%    2485876        perf-stat.i.iTLB-loads
 8.367e+10            -7.5%  7.737e+10        perf-stat.i.instructions
      4706            +7.7%       5070        perf-stat.i.instructions-per-iTLB-miss
      0.81           -12.0%       0.72        perf-stat.i.ipc
      1.59 ±  3%     -22.3%       1.23 ±  4%  perf-stat.i.major-faults
      0.37           +34.2%       0.49        perf-stat.i.metric.GHz
    233.98            -6.9%     217.90        perf-stat.i.metric.M/sec
   3619177            -9.5%    3276556        perf-stat.i.minor-faults
     74.28            +4.8       79.04        perf-stat.i.node-load-miss-rate%
   2898733 ±  4%     +49.0%    4320557        perf-stat.i.node-load-misses
   1928237 ±  4%     -11.9%    1698426        perf-stat.i.node-loads
  13383344 ±  2%      +4.7%   14013398 ±  3%  perf-stat.i.node-stores
   3619179            -9.5%    3276558        perf-stat.i.page-faults
      0.61 ±  3%     +52.5%       0.94 ±  3%  perf-stat.overall.MPKI
     31.95 ±  2%      -3.6       28.34 ±  3%  perf-stat.overall.cache-miss-rate%
      1.00           +45.0%       1.45        perf-stat.overall.cpi
      0.07            +0.0        0.08 ±  4%  perf-stat.overall.dTLB-load-miss-rate%
     87.62            -2.6       85.05        perf-stat.overall.iTLB-load-miss-rate%
      4778           +20.2%       5745        perf-stat.overall.instructions-per-iTLB-miss
      1.00           -31.0%       0.69        perf-stat.overall.ipc
     59.75 ±  3%     +11.8       71.59        perf-stat.overall.node-load-miss-rate%
      5145            +1.8%       5239        perf-stat.overall.path-length
 2.405e+10            -6.3%  2.252e+10        perf-stat.ps.branch-instructions
  31203502            -6.4%   29219514        perf-stat.ps.branch-misses
  52696784 ±  3%     +43.4%   75547948 ±  3%  perf-stat.ps.cache-misses
 1.652e+08 ±  6%     +61.7%  2.672e+08 ±  7%  perf-stat.ps.cache-references
 8.584e+10           +36.3%   1.17e+11        perf-stat.ps.cpu-cycles
    506.29            -2.0%     496.05        perf-stat.ps.cpu-migrations
 2.395e+10            -5.9%  2.254e+10        perf-stat.ps.dTLB-loads
   4059043            -6.2%    3806002        perf-stat.ps.dTLB-store-misses
 6.688e+09            -5.7%  6.308e+09        perf-stat.ps.dTLB-stores
  17944396           -21.8%   14028927        perf-stat.ps.iTLB-load-misses
   2534093            -2.7%    2465233        perf-stat.ps.iTLB-loads
 8.575e+10            -6.0%  8.059e+10        perf-stat.ps.instructions
      1.60 ±  3%     -23.2%       1.23 ±  4%  perf-stat.ps.major-faults
   3726053            -7.7%    3439511        perf-stat.ps.minor-faults
   2942507 ±  4%     +52.0%    4472428        perf-stat.ps.node-load-misses
   1980077 ±  4%     -10.4%    1774633        perf-stat.ps.node-loads
  13780660 ±  2%      +6.8%   14716100 ±  3%  perf-stat.ps.node-stores
   3726055            -7.7%    3439513        perf-stat.ps.page-faults
     37.11            -6.7       30.40 ±  6%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
     21.14            -3.8       17.36 ±  7%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     21.05            -3.8       17.29 ±  7%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     21.05            -3.8       17.29 ±  7%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     21.05            -3.8       17.29 ±  7%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     21.00            -3.8       17.25 ±  7%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     20.70            -3.7       17.00 ±  7%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     20.69            -3.7       16.99 ±  7%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     20.64            -3.7       16.95 ±  7%  perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      9.51 ±  3%      -1.9        7.57 ±  2%  perf-profile.calltrace.cycles-pp.do_rw_once
      4.54            -1.4        3.19        perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
      2.83            -0.9        1.96        perf-profile.calltrace.cycles-pp.next_uptodate_folio.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
      3.90            -0.6        3.34 ±  5%  perf-profile.calltrace.cycles-pp.clear_page_erms.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
      4.44 ±  6%      -0.5        3.98 ±  3%  perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      1.17 ±  3%      -0.4        0.73 ±  6%  perf-profile.calltrace.cycles-pp.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
      1.42 ±  2%      -0.4        0.99 ±  2%  perf-profile.calltrace.cycles-pp.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
      1.32 ±  2%      -0.4        0.91        perf-profile.calltrace.cycles-pp.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      1.19 ±  2%      -0.4        0.82        perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      0.96 ±  2%      -0.3        0.65 ±  2%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio
      0.98 ±  2%      -0.3        0.68 ±  4%  perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.do_access
      1.21            +0.5        1.69 ±  5%  perf-profile.calltrace.cycles-pp.__munmap
      1.21            +0.5        1.69 ±  5%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.21            +0.5        1.69 ±  5%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.21            +0.5        1.69 ±  5%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.21            +0.5        1.69 ±  5%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      1.21            +0.5        1.69 ±  5%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.21            +0.5        1.69 ±  5%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      1.20            +0.5        1.68 ±  5%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.20            +0.5        1.68 ±  5%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      1.20            +0.5        1.68 ±  5%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
      1.20            +0.5        1.69 ±  5%  perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      1.18            +0.5        1.67 ±  6%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
      0.84 ±  2%      +0.6        1.43 ±  5%  perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      0.58 ±  3%      +0.6        1.18 ±  5%  perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.00            +0.8        0.79 ±  4%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
      0.00            +1.0        1.02 ±  5%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio
      0.00            +1.1        1.08 ±  4%  perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
      0.00            +1.5        1.46 ±  5%  perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      3.29 ±  3%      +1.9        5.19        perf-profile.calltrace.cycles-pp.finish_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
      3.02 ±  4%      +2.0        5.00        perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_read_fault.do_fault.__handle_mm_fault
      2.84 ±  4%      +2.0        4.86        perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault.do_fault
      2.73 ±  4%      +2.0        4.77        perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault
      1.48 ±  4%      +2.1        3.56 ±  2%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.folio_add_file_rmap_range.set_pte_range.finish_fault
      0.57 ±  4%      +2.8        3.35 ±  2%  perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      1.96 ±  5%      +2.9        4.86 ±  2%  perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      3.65 ±  2%      +3.1        6.77 ±  2%  perf-profile.calltrace.cycles-pp.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
      0.80 ±  4%      +3.1        3.92 ±  3%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      2.68 ±  3%      +3.4        6.08 ±  2%  perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      7.71 ±  6%      +3.9       11.66 ±  2%  perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
     67.18            +6.3       73.46 ±  3%  perf-profile.calltrace.cycles-pp.do_access
      1.46 ±  9%      +7.1        8.57 ± 16%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio
      1.50 ±  9%      +7.1        8.61 ± 16%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      1.38 ± 10%      +7.1        8.51 ± 16%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru
     51.46            +7.6       59.08 ±  3%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
      2.98 ±  5%      +7.7       10.66 ± 14%  perf-profile.calltrace.cycles-pp.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
      2.84 ±  6%      +7.7       10.56 ± 14%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
     34.18            +8.5       42.68 ±  4%  perf-profile.calltrace.cycles-pp.__do_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
     34.14            +8.5       42.64 ±  4%  perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_read_fault.do_fault.__handle_mm_fault
     33.95            +8.6       42.51 ±  4%  perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault.do_fault
     42.88            +8.8       51.70 ±  4%  perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     42.34            +9.0       51.30 ±  4%  perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     42.29            +9.0       51.28 ±  4%  perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     45.07            +9.6       54.62 ±  4%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
     44.95            +9.6       54.53 ±  4%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
     43.72            +9.9       53.64 ±  4%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
     17.28 ±  2%     +13.8       31.05 ±  6%  perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
     21.14            -3.8       17.36 ±  7%  perf-profile.children.cycles-pp.cpu_startup_entry
     21.14            -3.8       17.36 ±  7%  perf-profile.children.cycles-pp.do_idle
     21.14            -3.8       17.36 ±  7%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     21.09            -3.8       17.33 ±  7%  perf-profile.children.cycles-pp.cpuidle_idle_call
     21.05            -3.8       17.29 ±  7%  perf-profile.children.cycles-pp.start_secondary
     20.79            -3.7       17.07 ±  7%  perf-profile.children.cycles-pp.cpuidle_enter
     20.78            -3.7       17.07 ±  7%  perf-profile.children.cycles-pp.cpuidle_enter_state
     20.72            -3.7       17.02 ±  7%  perf-profile.children.cycles-pp.acpi_idle_enter
     20.71            -3.7       17.01 ±  7%  perf-profile.children.cycles-pp.acpi_safe_halt
     20.79            -3.6       17.19 ±  6%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     11.52            -3.1        8.42        perf-profile.children.cycles-pp.do_rw_once
      4.62            -1.4        3.24        perf-profile.children.cycles-pp.filemap_map_pages
      2.89            -0.9        2.00        perf-profile.children.cycles-pp.next_uptodate_folio
      3.98            -0.6        3.39 ±  5%  perf-profile.children.cycles-pp.clear_page_erms
      4.46 ±  6%      -0.5        3.99 ±  3%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      1.18 ±  4%      -0.4        0.74 ±  6%  perf-profile.children.cycles-pp.shmem_inode_acct_blocks
      1.44 ±  2%      -0.4        1.00 ±  2%  perf-profile.children.cycles-pp.shmem_alloc_folio
      1.40            -0.4        0.99        perf-profile.children.cycles-pp.alloc_pages_mpol
      1.27            -0.4        0.90        perf-profile.children.cycles-pp.__alloc_pages
      1.01 ±  2%      -0.3        0.68        perf-profile.children.cycles-pp.get_page_from_freelist
      1.02 ±  2%      -0.3        0.70 ±  4%  perf-profile.children.cycles-pp.sync_regs
      0.77 ±  2%      -0.3        0.51        perf-profile.children.cycles-pp.rmqueue
      0.81 ±  2%      -0.2        0.60        perf-profile.children.cycles-pp.__perf_sw_event
      0.53 ±  3%      -0.2        0.34 ±  2%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      0.68 ±  2%      -0.2        0.50 ±  5%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.65 ±  6%      -0.2        0.47 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock
      0.47 ±  3%      -0.2        0.29 ±  2%  perf-profile.children.cycles-pp.rmqueue_bulk
      0.65 ±  2%      -0.2        0.49        perf-profile.children.cycles-pp.___perf_sw_event
      0.64 ±  4%      -0.1        0.49 ±  5%  perf-profile.children.cycles-pp.xas_load
      0.54            -0.1        0.39 ±  4%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.49 ±  2%      -0.1        0.35 ±  3%  perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.54 ±  5%      -0.1        0.40 ±  2%  perf-profile.children.cycles-pp.xas_find
      0.39 ±  4%      -0.1        0.28 ±  3%  perf-profile.children.cycles-pp.__pte_offset_map_lock
      0.39 ±  3%      -0.1        0.29 ±  3%  perf-profile.children.cycles-pp.xas_descend
      0.32 ±  4%      -0.1        0.22 ±  8%  perf-profile.children.cycles-pp.__dquot_alloc_space
      0.30 ±  3%      -0.1        0.22 ±  3%  perf-profile.children.cycles-pp.mas_walk
      0.20 ± 13%      -0.1        0.13 ±  5%  perf-profile.children.cycles-pp.shmem_recalc_inode
      0.26 ±  2%      -0.1        0.19 ±  3%  perf-profile.children.cycles-pp.filemap_get_entry
      0.18 ±  5%      -0.1        0.12 ±  5%  perf-profile.children.cycles-pp.xas_find_conflict
      0.28 ±  4%      -0.1        0.22 ±  8%  perf-profile.children.cycles-pp.__x64_sys_execve
      0.28 ±  4%      -0.1        0.22 ±  8%  perf-profile.children.cycles-pp.do_execveat_common
      0.28 ±  4%      -0.1        0.22 ±  8%  perf-profile.children.cycles-pp.execve
      0.29 ±  3%      -0.1        0.24 ±  8%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.16 ±  5%      -0.1        0.11 ±  8%  perf-profile.children.cycles-pp.error_entry
      0.14 ±  5%      -0.0        0.09 ±  8%  perf-profile.children.cycles-pp.__percpu_counter_limited_add
      0.15 ±  5%      -0.0        0.10 ± 10%  perf-profile.children.cycles-pp.inode_add_bytes
      0.07 ±  6%      -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.__folio_throttle_swaprate
      0.10            -0.0        0.06 ± 13%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      0.18 ±  7%      -0.0        0.14 ± 13%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.16 ±  5%      -0.0        0.12        perf-profile.children.cycles-pp.handle_pte_fault
      0.17 ±  7%      -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.xas_start
      0.14 ±  6%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.__pte_offset_map
      0.07 ±  5%      -0.0        0.03 ± 70%  perf-profile.children.cycles-pp.policy_nodemask
      0.16 ±  4%      -0.0        0.13 ± 12%  perf-profile.children.cycles-pp.folio_mark_accessed
      0.19 ±  4%      -0.0        0.16 ±  8%  perf-profile.children.cycles-pp.bprm_execve
      0.11 ±  9%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.down_read_trylock
      0.16 ±  6%      -0.0        0.13 ±  5%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.11 ±  6%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.up_read
      0.15 ±  7%      -0.0        0.12 ± 13%  perf-profile.children.cycles-pp.folio_unlock
      0.10 ±  4%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp.__libc_fork
      0.07 ±  6%      -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.ksys_read
      0.10 ±  3%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.kernel_clone
      0.09 ±  5%      -0.0        0.06 ±  7%  perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      0.09 ±  5%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.__x64_sys_openat
      0.08 ±  8%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.do_filp_open
      0.08 ±  8%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.path_openat
      0.07            -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.vfs_read
      0.09 ±  4%      -0.0        0.06 ±  7%  perf-profile.children.cycles-pp.__do_sys_clone
      0.10 ±  6%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.pte_offset_map_nolock
      0.08 ±  8%      -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.do_sys_openat2
      0.07 ±  5%      -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.copy_process
      0.16 ±  5%      -0.0        0.14 ±  6%  perf-profile.children.cycles-pp.exec_binprm
      0.10 ±  6%      -0.0        0.08 ±  7%  perf-profile.children.cycles-pp.__vm_enough_memory
      0.16 ±  4%      -0.0        0.14 ±  6%  perf-profile.children.cycles-pp.search_binary_handler
      0.08            -0.0        0.06 ±  9%  perf-profile.children.cycles-pp.__irqentry_text_end
      0.09 ±  5%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp._compound_head
      0.15 ±  5%      -0.0        0.13 ±  7%  perf-profile.children.cycles-pp.xas_create
      0.15 ±  4%      -0.0        0.14 ±  8%  perf-profile.children.cycles-pp.load_elf_binary
      0.12 ±  4%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      0.05 ±  8%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.propagate_protected_usage
      0.25 ±  2%      +0.0        0.30 ±  4%  perf-profile.children.cycles-pp.page_counter_try_charge
      0.02 ±141%      +0.0        0.06 ±  7%  perf-profile.children.cycles-pp.mod_objcg_state
      0.00            +0.1        0.07 ± 14%  perf-profile.children.cycles-pp.tlb_finish_mmu
      1.25            +0.5        1.72 ±  5%  perf-profile.children.cycles-pp.unmap_vmas
      1.24            +0.5        1.71 ±  5%  perf-profile.children.cycles-pp.zap_pte_range
      1.24            +0.5        1.71 ±  5%  perf-profile.children.cycles-pp.unmap_page_range
      1.24            +0.5        1.71 ±  5%  perf-profile.children.cycles-pp.zap_pmd_range
      1.21            +0.5        1.69 ±  5%  perf-profile.children.cycles-pp.__munmap
      1.22            +0.5        1.71 ±  5%  perf-profile.children.cycles-pp.__vm_munmap
      1.21            +0.5        1.70 ±  5%  perf-profile.children.cycles-pp.__x64_sys_munmap
      1.25            +0.5        1.74 ±  5%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      1.25            +0.5        1.74 ±  5%  perf-profile.children.cycles-pp.do_vmi_munmap
      1.22            +0.5        1.72 ±  5%  perf-profile.children.cycles-pp.unmap_region
      0.85 ±  2%      +0.6        1.44 ±  5%  perf-profile.children.cycles-pp.lru_add_fn
      0.60 ±  3%      +0.6        1.20 ±  4%  perf-profile.children.cycles-pp.page_remove_rmap
      3.30 ±  3%      +1.9        5.20        perf-profile.children.cycles-pp.finish_fault
      3.04 ±  4%      +2.0        5.01        perf-profile.children.cycles-pp.set_pte_range
      2.85 ±  4%      +2.0        4.87        perf-profile.children.cycles-pp.folio_add_file_rmap_range
      1.97 ±  5%      +2.9        4.88 ±  2%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      3.69 ±  2%      +3.1        6.80 ±  2%  perf-profile.children.cycles-pp.shmem_add_to_page_cache
      7.74 ±  6%      +3.9       11.69 ±  2%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      0.80 ±  4%      +4.0        4.85 ±  3%  perf-profile.children.cycles-pp.__count_memcg_events
      6.12 ±  3%      +6.1       12.18        perf-profile.children.cycles-pp.__mod_lruvec_page_state
      2.99 ±  3%      +6.6        9.56 ±  2%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
     61.44            +6.7       68.11 ±  3%  perf-profile.children.cycles-pp.do_access
      1.58 ±  9%      +7.1        8.72 ± 16%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.45 ±  9%      +7.2        8.63 ± 16%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      1.53 ±  9%      +7.2        8.72 ± 16%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      2.98 ±  5%      +7.7       10.67 ± 14%  perf-profile.children.cycles-pp.folio_add_lru
      2.86 ±  6%      +7.8       10.63 ± 14%  perf-profile.children.cycles-pp.folio_batch_move_lru
     49.12            +8.3       57.47 ±  3%  perf-profile.children.cycles-pp.asm_exc_page_fault
     34.19            +8.5       42.68 ±  4%  perf-profile.children.cycles-pp.__do_fault
     34.15            +8.5       42.65 ±  4%  perf-profile.children.cycles-pp.shmem_fault
     33.99            +8.6       42.54 ±  4%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
     43.06            +8.8       51.84 ±  4%  perf-profile.children.cycles-pp.__handle_mm_fault
     42.43            +8.9       51.37 ±  4%  perf-profile.children.cycles-pp.do_fault
     42.38            +9.0       51.34 ±  4%  perf-profile.children.cycles-pp.do_read_fault
     45.26            +9.5       54.78 ±  4%  perf-profile.children.cycles-pp.exc_page_fault
     45.15            +9.5       54.69 ±  4%  perf-profile.children.cycles-pp.do_user_addr_fault
     43.91            +9.9       53.80 ±  4%  perf-profile.children.cycles-pp.handle_mm_fault
     17.31 ±  2%     +13.8       31.07 ±  5%  perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
     12.24            -4.5        7.76 ±  3%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
     17.96            -3.3       14.66 ±  4%  perf-profile.self.cycles-pp.acpi_safe_halt
     10.95            -3.2        7.74        perf-profile.self.cycles-pp.do_rw_once
      5.96            -1.4        4.58 ±  2%  perf-profile.self.cycles-pp.do_access
      2.40            -0.8        1.64        perf-profile.self.cycles-pp.next_uptodate_folio
      3.92            -0.6        3.36 ±  5%  perf-profile.self.cycles-pp.clear_page_erms
      4.40 ±  6%      -0.5        3.95 ±  3%  perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      1.52 ±  2%      -0.4        1.10 ±  2%  perf-profile.self.cycles-pp.filemap_map_pages
      1.02 ±  2%      -0.3        0.70 ±  4%  perf-profile.self.cycles-pp.sync_regs
      0.50 ±  7%      -0.2        0.27 ±  5%  perf-profile.self.cycles-pp.shmem_inode_acct_blocks
      0.63 ±  5%      -0.2        0.46 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock
      0.42 ±  2%      -0.1        0.27 ±  2%  perf-profile.self.cycles-pp.rmqueue_bulk
      0.52            -0.1        0.38 ±  4%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.56 ±  2%      -0.1        0.42        perf-profile.self.cycles-pp.___perf_sw_event
      0.31 ±  3%      -0.1        0.20 ±  2%  perf-profile.self.cycles-pp.shmem_add_to_page_cache
      0.38 ±  4%      -0.1        0.28        perf-profile.self.cycles-pp.__handle_mm_fault
      0.36 ±  4%      -0.1        0.26 ±  2%  perf-profile.self.cycles-pp.xas_descend
      0.30 ±  2%      -0.1        0.22 ±  2%  perf-profile.self.cycles-pp.mas_walk
      0.33 ±  3%      -0.1        0.26 ± 10%  perf-profile.self.cycles-pp.lru_add_fn
      0.20 ±  3%      -0.1        0.14 ±  5%  perf-profile.self.cycles-pp.asm_exc_page_fault
      0.21 ±  5%      -0.1        0.15 ±  6%  perf-profile.self.cycles-pp.get_page_from_freelist
      0.26 ±  9%      -0.1        0.20 ± 15%  perf-profile.self.cycles-pp.xas_store
      0.16 ±  7%      -0.1        0.11 ±  6%  perf-profile.self.cycles-pp.__perf_sw_event
      0.18 ±  2%      -0.1        0.13 ±  5%  perf-profile.self.cycles-pp.__alloc_pages
      0.22 ±  4%      -0.1        0.17 ±  4%  perf-profile.self.cycles-pp.handle_mm_fault
      0.20 ±  8%      -0.1        0.14 ±  5%  perf-profile.self.cycles-pp.xas_find
      0.15 ±  6%      -0.0        0.10 ±  7%  perf-profile.self.cycles-pp.error_entry
      0.17 ±  2%      -0.0        0.12 ±  6%  perf-profile.self.cycles-pp.__dquot_alloc_space
      0.17 ±  6%      -0.0        0.13 ± 10%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.22 ±  4%      -0.0        0.18 ±  9%  perf-profile.self.cycles-pp.xas_load
      0.23 ±  4%      -0.0        0.19 ± 10%  perf-profile.self.cycles-pp.zap_pte_range
      0.12 ±  7%      -0.0        0.08 ± 10%  perf-profile.self.cycles-pp.__percpu_counter_limited_add
      0.14 ±  3%      -0.0        0.09 ±  7%  perf-profile.self.cycles-pp.rmqueue
      0.15 ±  6%      -0.0        0.10 ±  9%  perf-profile.self.cycles-pp.__mod_lruvec_state
      0.15 ±  2%      -0.0        0.11 ±  6%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.12 ±  7%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.folio_add_lru
      0.16 ±  7%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.xas_start
      0.06 ±  7%      -0.0        0.02 ± 99%  perf-profile.self.cycles-pp.finish_fault
      0.16 ±  4%      -0.0        0.12 ± 12%  perf-profile.self.cycles-pp.folio_mark_accessed
      0.11 ±  8%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.13 ±  6%      -0.0        0.09 ±  4%  perf-profile.self.cycles-pp.__pte_offset_map
      0.11 ±  9%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.down_read_trylock
      0.12 ±  3%      -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.do_read_fault
      0.09 ±  5%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
      0.16 ±  4%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.11 ±  6%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.shmem_alloc_and_add_folio
      0.08 ±  8%      -0.0        0.05        perf-profile.self.cycles-pp.xas_find_conflict
      0.12 ±  4%      -0.0        0.09 ±  7%  perf-profile.self.cycles-pp.folio_add_file_rmap_range
      0.10 ±  6%      -0.0        0.07 ±  6%  perf-profile.self.cycles-pp.up_read
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.09 ±  4%      -0.0        0.07 ±  7%  perf-profile.self.cycles-pp.exc_page_fault
      0.13 ±  6%      -0.0        0.10 ±  9%  perf-profile.self.cycles-pp.page_remove_rmap
      0.08            -0.0        0.06 ±  8%  perf-profile.self.cycles-pp.__irqentry_text_end
      0.19 ±  5%      -0.0        0.17 ±  5%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.09 ±  6%      -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.set_pte_range
      0.07 ±  5%      -0.0        0.05 ±  7%  perf-profile.self.cycles-pp._compound_head
      0.08            -0.0        0.06 ±  9%  perf-profile.self.cycles-pp.lock_vma_under_rcu
      0.05 ±  8%      +0.0        0.08 ±  8%  perf-profile.self.cycles-pp.propagate_protected_usage
      2.93 ±  4%      +0.4        3.35 ±  3%  perf-profile.self.cycles-pp.__mod_lruvec_page_state
      0.77 ±  7%      +1.5        2.23 ±  3%  perf-profile.self.cycles-pp.__mem_cgroup_charge
      0.75 ±  4%      +4.0        4.80 ±  3%  perf-profile.self.cycles-pp.__count_memcg_events
      2.83 ±  3%      +6.6        9.40 ±  2%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      1.45 ±  9%      +7.2        8.63 ± 16%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath


***************************************************************************************************
lkp-skl-fpga01: 104 threads 2 sockets (Skylake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale

commit: 
  e0bf1dc859 ("mm: memcg: move vmstats structs definition above flushing code")
  8d59d2214c ("mm: memcg: make stats flushing threshold per-memcg")

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      4.05            -1.2        2.81        mpstat.cpu.all.usr%
    193.83 ±  6%     +69.3%     328.17 ±  8%  perf-c2c.DRAM.local
      1216 ±  8%     +27.1%       1546 ±  6%  perf-c2c.DRAM.remote
    150.33 ± 13%     -40.0%      90.17 ± 13%  perf-c2c.HITM.remote
      0.04           -25.0%       0.03        turbostat.IPC
    316.16            -1.5%     311.47        turbostat.PkgWatt
     30.54            +4.9%      32.04        turbostat.RAMWatt
   2132437           -32.3%    1444430        will-it-scale.52.processes
     41008           -32.3%      27776        will-it-scale.per_process_ops
   2132437           -32.3%    1444430        will-it-scale.workload
 3.113e+08 ±  3%     -31.7%  2.125e+08 ±  4%  numa-numastat.node0.local_node
 3.114e+08 ±  3%     -31.7%  2.126e+08 ±  4%  numa-numastat.node0.numa_hit
 3.322e+08 ±  2%     -32.5%  2.243e+08 ±  3%  numa-numastat.node1.local_node
 3.323e+08 ±  2%     -32.5%  2.243e+08 ±  3%  numa-numastat.node1.numa_hit
 3.114e+08 ±  3%     -31.7%  2.126e+08 ±  4%  numa-vmstat.node0.numa_hit
 3.113e+08 ±  3%     -31.7%  2.125e+08 ±  4%  numa-vmstat.node0.numa_local
 3.323e+08 ±  2%     -32.5%  2.243e+08 ±  3%  numa-vmstat.node1.numa_hit
 3.322e+08 ±  2%     -32.5%  2.243e+08 ±  3%  numa-vmstat.node1.numa_local
      0.00 ± 19%     -61.1%       0.00 ± 31%  perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
    217.07 ± 11%     -46.4%     116.39 ± 23%  perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
    218.50 ±  6%     +19.1%     260.33 ±  4%  perf-sched.wait_and_delay.count.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
    217.06 ± 11%     -46.4%     116.38 ± 23%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
 6.436e+08           -32.1%  4.369e+08        proc-vmstat.numa_hit
 6.435e+08           -32.1%  4.368e+08        proc-vmstat.numa_local
 6.432e+08           -32.1%  4.368e+08        proc-vmstat.pgalloc_normal
 1.286e+09           -32.1%  8.726e+08        proc-vmstat.pgfault
 6.432e+08           -32.1%  4.367e+08        proc-vmstat.pgfree
    170696 ±  8%      +3.4%     176515 ±  8%  sched_debug.cpu.clock.avg
    170703 ±  8%      +3.4%     176522 ±  8%  sched_debug.cpu.clock.max
    170689 ±  8%      +3.4%     176508 ±  8%  sched_debug.cpu.clock.min
    169431 ±  8%      +3.4%     175248 ±  8%  sched_debug.cpu.clock_task.avg
    169630 ±  8%      +3.4%     175429 ±  8%  sched_debug.cpu.clock_task.max
    162542 ±  8%      +3.5%     168260 ±  8%  sched_debug.cpu.clock_task.min
    170690 ±  8%      +3.4%     176508 ±  8%  sched_debug.cpu_clk
    170117 ±  8%      +3.4%     175938 ±  8%  sched_debug.ktime
    171259 ±  8%      +3.4%     177078 ±  8%  sched_debug.sched_clk
      4.06           +80.8%       7.34        perf-stat.i.MPKI
 4.066e+09           -23.3%   3.12e+09        perf-stat.i.branch-instructions
      0.57            -0.0        0.55        perf-stat.i.branch-miss-rate%
  23478297           -25.0%   17605102        perf-stat.i.branch-misses
     17.25            +7.0       24.27        perf-stat.i.cache-miss-rate%
  82715093 ±  2%     +35.9%  1.124e+08        perf-stat.i.cache-misses
 4.795e+08 ±  2%      -3.4%   4.63e+08        perf-stat.i.cache-references
      7.14           +32.9%       9.49        perf-stat.i.cpi
    134.85            -1.2%     133.29        perf-stat.i.cpu-migrations
      1760 ±  2%     -26.5%       1294        perf-stat.i.cycles-between-cache-misses
      0.26            -0.0        0.24        perf-stat.i.dTLB-load-miss-rate%
  13461491           -31.7%    9190211        perf-stat.i.dTLB-load-misses
 5.141e+09           -24.1%  3.902e+09        perf-stat.i.dTLB-loads
      0.45            -0.0        0.44        perf-stat.i.dTLB-store-miss-rate%
  12934403           -32.2%    8773143        perf-stat.i.dTLB-store-misses
 2.841e+09           -29.9%  1.992e+09        perf-stat.i.dTLB-stores
     14.76            +1.4       16.18 ±  4%  perf-stat.i.iTLB-load-miss-rate%
   7454399 ±  2%     -22.7%    5760387 ±  4%  perf-stat.i.iTLB-load-misses
  43026423           -30.6%   29840650        perf-stat.i.iTLB-loads
 2.042e+10           -24.7%  1.538e+10        perf-stat.i.instructions
      0.14           -24.6%       0.11        perf-stat.i.ipc
    815.65           -20.2%     651.03        perf-stat.i.metric.K/sec
    120.43           -24.3%      91.11        perf-stat.i.metric.M/sec
   4264808           -32.2%    2892980        perf-stat.i.minor-faults
  11007315 ±  2%     +39.7%   15375516        perf-stat.i.node-load-misses
   1459152 ±  6%     +45.1%    2116827 ±  5%  perf-stat.i.node-loads
   7872989 ±  2%     -26.2%    5812458        perf-stat.i.node-store-misses
   4264808           -32.2%    2892980        perf-stat.i.page-faults
      4.05           +80.4%       7.31        perf-stat.overall.MPKI
      0.58            -0.0        0.57        perf-stat.overall.branch-miss-rate%
     17.25            +7.0       24.27        perf-stat.overall.cache-miss-rate%
      7.13           +32.7%       9.46        perf-stat.overall.cpi
      1759 ±  2%     -26.5%       1294        perf-stat.overall.cycles-between-cache-misses
      0.26            -0.0        0.23        perf-stat.overall.dTLB-load-miss-rate%
      0.45            -0.0        0.44        perf-stat.overall.dTLB-store-miss-rate%
     14.77            +1.4       16.18 ±  4%  perf-stat.overall.iTLB-load-miss-rate%
      0.14           -24.7%       0.11        perf-stat.overall.ipc
   2882666           +11.2%    3206246        perf-stat.overall.path-length
 4.052e+09           -23.3%   3.11e+09        perf-stat.ps.branch-instructions
  23421504           -25.0%   17574476        perf-stat.ps.branch-misses
  82419384 ±  2%     +35.9%   1.12e+08        perf-stat.ps.cache-misses
 4.778e+08 ±  2%      -3.4%  4.614e+08        perf-stat.ps.cache-references
    134.44            -1.1%     132.98        perf-stat.ps.cpu-migrations
  13415064           -31.7%    9160067        perf-stat.ps.dTLB-load-misses
 5.124e+09           -24.1%   3.89e+09        perf-stat.ps.dTLB-loads
  12889609           -32.2%    8744145        perf-stat.ps.dTLB-store-misses
 2.831e+09           -29.9%  1.986e+09        perf-stat.ps.dTLB-stores
   7428050 ±  2%     -22.7%    5741276 ±  4%  perf-stat.ps.iTLB-load-misses
  42877049           -30.6%   29741122        perf-stat.ps.iTLB-loads
 2.035e+10           -24.7%  1.533e+10        perf-stat.ps.instructions
   4250034           -32.2%    2883410        perf-stat.ps.minor-faults
  10968228 ±  2%     +39.7%   15322266        perf-stat.ps.node-load-misses
   1454274 ±  6%     +45.1%    2109746 ±  5%  perf-stat.ps.node-loads
   7845298 ±  2%     -26.2%    5792864        perf-stat.ps.node-store-misses
   4250034           -32.2%    2883410        perf-stat.ps.page-faults
 6.147e+12           -24.7%  4.631e+12        perf-stat.total.instructions
     26.77            -1.8       24.93 ±  3%  perf-profile.calltrace.cycles-pp.intel_idle_ibrs.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     26.75            -1.8       24.92 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     26.75            -1.8       24.92 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.84            -1.8       25.00 ±  3%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     26.75            -1.8       24.92 ±  2%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.75            -1.8       24.92 ±  2%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.75            -1.8       24.92 ±  2%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     27.05            -1.8       25.29 ±  3%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     13.02 ±  2%      -1.4       11.60 ±  4%  perf-profile.calltrace.cycles-pp.testcase
      5.54 ±  5%      -1.0        4.52 ±  3%  perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
      1.37 ±  2%      -0.9        0.51 ± 58%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__madvise
     10.38 ±  3%      -0.8        9.54 ±  2%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
      2.38 ±  2%      -0.8        1.63 ±  3%  perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush
      4.02 ±  3%      -0.7        3.32 ±  3%  perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
      1.92 ±  4%      -0.4        1.49 ±  2%  perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.testcase
      1.36 ±  2%      -0.4        0.99        perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
      1.30 ± 10%      -0.4        0.94 ±  6%  perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      1.50 ± 11%      -0.3        1.19 ±  5%  perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
      1.13 ±  3%      -0.3        0.83        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
      0.71 ±  3%      -0.3        0.43 ± 44%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__madvise
      1.02 ±  3%      -0.3        0.75        perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
      0.97 ±  3%      -0.3        0.72        perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
      0.77 ±  2%      -0.2        0.58 ±  2%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.71 ±  2%      -0.1        0.60 ±  3%  perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages
      1.20            +0.1        1.34        perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
      1.10 ±  2%      +0.2        1.28        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
      1.04 ±  2%      +0.2        1.24        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
      0.83            +0.2        1.07 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
      0.81 ±  2%      +0.3        1.08        perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
      0.88 ± 10%      +0.3        1.16 ±  4%  perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      0.71 ±  2%      +0.3        1.00        perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
      0.76 ±  3%      +0.3        1.09 ±  2%  perf-profile.calltrace.cycles-pp.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.73 ±  3%      +0.3        1.07 ±  2%  perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      0.00            +0.6        0.55 ±  2%  perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_commit_charge.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault
      6.60 ±  4%      +0.6        7.18 ±  3%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
      6.54 ±  4%      +0.6        7.13 ±  3%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.00            +0.7        0.74 ±  3%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
      0.00            +0.8        0.79 ±  2%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
      0.00            +0.8        0.79 ±  3%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.lru_add_fn.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
      0.00            +0.8        0.80 ±  3%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault
      5.80 ±  5%      +0.8        6.60 ±  3%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.00            +0.8        0.82        perf-profile.calltrace.cycles-pp.__count_memcg_events.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush
      0.69 ±  4%      +0.9        1.59 ±  2%  perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     30.43            +1.1       31.57        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
     29.22            +1.5       30.69        perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
     29.05            +1.5       30.56        perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
     22.56 ±  2%      +2.3       24.87        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
     22.36 ±  2%      +2.3       24.70        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
     22.11 ±  2%      +2.4       24.55        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush
     22.70            +2.6       25.35        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
     22.38            +2.7       25.08        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
     24.10            +2.7       26.82        perf-profile.calltrace.cycles-pp.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
     24.09            +2.7       26.82        perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise
     24.07            +2.7       26.79        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior
     22.14            +2.8       24.93        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
     59.76            +2.9       62.64        perf-profile.calltrace.cycles-pp.__madvise
     57.63            +3.5       61.10        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
     57.27            +3.6       60.85        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     56.41            +3.8       60.20        perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     56.37            +3.8       60.17        perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     55.94            +3.9       59.88        perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
     55.85            +4.0       59.82        perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
     26.75            -1.8       24.92 ±  2%  perf-profile.children.cycles-pp.start_secondary
     26.98            -1.8       25.22 ±  3%  perf-profile.children.cycles-pp.intel_idle_ibrs
     27.05            -1.8       25.29 ±  3%  perf-profile.children.cycles-pp.cpu_startup_entry
     27.05            -1.8       25.29 ±  3%  perf-profile.children.cycles-pp.do_idle
     27.05            -1.8       25.29 ±  3%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     27.05            -1.8       25.29 ±  3%  perf-profile.children.cycles-pp.cpuidle_enter
     27.05            -1.8       25.29 ±  3%  perf-profile.children.cycles-pp.cpuidle_enter_state
     27.05            -1.8       25.29 ±  3%  perf-profile.children.cycles-pp.cpuidle_idle_call
     13.66 ±  2%      -1.3       12.38        perf-profile.children.cycles-pp.testcase
      5.55 ±  5%      -1.0        4.52 ±  3%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
      2.39 ±  2%      -0.8        1.63 ±  3%  perf-profile.children.cycles-pp.page_counter_uncharge
      4.03 ±  3%      -0.7        3.32 ±  3%  perf-profile.children.cycles-pp.uncharge_batch
      1.96 ±  4%      -0.4        1.52 ±  2%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      1.30            -0.4        0.94 ±  2%  perf-profile.children.cycles-pp.error_entry
      1.36 ±  2%      -0.4        0.99        perf-profile.children.cycles-pp.__irqentry_text_end
      1.30 ± 10%      -0.4        0.94 ±  6%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      1.51 ± 11%      -0.3        1.19 ±  5%  perf-profile.children.cycles-pp.uncharge_folio
      1.14 ±  3%      -0.3        0.84        perf-profile.children.cycles-pp.flush_tlb_mm_range
      1.02 ±  3%      -0.3        0.75        perf-profile.children.cycles-pp.flush_tlb_func
      0.98 ±  3%      -0.3        0.72        perf-profile.children.cycles-pp.native_flush_tlb_one_user
      0.73 ±  2%      -0.2        0.52 ±  2%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.69 ±  2%      -0.2        0.50 ±  2%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.79 ±  2%      -0.2        0.60 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.51 ±  2%      -0.1        0.38 ±  2%  perf-profile.children.cycles-pp.sync_regs
      0.41 ±  3%      -0.1        0.29 ±  3%  perf-profile.children.cycles-pp.__perf_sw_event
      0.44 ±  2%      -0.1        0.32 ±  2%  perf-profile.children.cycles-pp.vma_alloc_folio
      0.72 ±  2%      -0.1        0.61 ±  3%  perf-profile.children.cycles-pp.propagate_protected_usage
      0.39            -0.1        0.28 ±  2%  perf-profile.children.cycles-pp.alloc_pages_mpol
      0.35 ±  3%      -0.1        0.25 ±  3%  perf-profile.children.cycles-pp.__alloc_pages
      0.34 ±  2%      -0.1        0.24 ±  4%  perf-profile.children.cycles-pp.___perf_sw_event
      0.30 ±  3%      -0.1        0.21 ±  5%  perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.32 ±  2%      -0.1        0.24        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.12 ±  4%      -0.1        0.03 ± 70%  perf-profile.children.cycles-pp.down_read
      0.25 ±  3%      -0.1        0.18 ±  4%  perf-profile.children.cycles-pp.mas_walk
      0.25 ±  3%      -0.1        0.18 ±  2%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.17 ±  4%      -0.1        0.11 ±  3%  perf-profile.children.cycles-pp.__pte_offset_map_lock
      0.14 ±  3%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.clear_page_erms
      0.17 ±  2%      -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.find_vma_prev
      0.13 ±  2%      -0.0        0.09        perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.11 ±  4%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.__cond_resched
      0.13 ±  2%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.06 ±  7%      -0.0        0.03 ± 70%  perf-profile.children.cycles-pp.unmap_vmas
      0.11 ±  3%      -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.free_unref_page_list
      0.06            -0.0        0.03 ± 70%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.09 ±  7%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.free_swap_cache
      0.09 ±  7%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.__munmap
      0.09 ±  8%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp._raw_spin_lock
      0.09 ±  5%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.handle_pte_fault
      0.08 ±  8%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.do_vmi_munmap
      0.08 ±  4%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.07 ±  6%      -0.0        0.05 ±  8%  perf-profile.children.cycles-pp.rmqueue
      0.07 ±  9%      -0.0        0.05 ±  7%  perf-profile.children.cycles-pp.unmap_region
      0.08 ±  8%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.__vm_munmap
      0.08 ±  8%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.__x64_sys_munmap
      0.08 ±  8%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.08 ±  5%      -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.try_charge_memcg
      1.27            +0.1        1.40        perf-profile.children.cycles-pp.unmap_page_range
      1.17            +0.2        1.32        perf-profile.children.cycles-pp.zap_pmd_range
      1.12            +0.2        1.29        perf-profile.children.cycles-pp.zap_pte_range
      0.84            +0.2        1.07 ±  2%  perf-profile.children.cycles-pp.lru_add_fn
      0.81 ±  2%      +0.3        1.08        perf-profile.children.cycles-pp.page_remove_rmap
      0.89 ± 10%      +0.3        1.16 ±  4%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      0.77 ±  3%      +0.3        1.09 ±  2%  perf-profile.children.cycles-pp.folio_add_new_anon_rmap
      6.62 ±  4%      +0.6        7.19 ±  3%  perf-profile.children.cycles-pp.exc_page_fault
      6.56 ±  4%      +0.6        7.14 ±  3%  perf-profile.children.cycles-pp.do_user_addr_fault
      1.44 ±  2%      +0.6        2.08 ±  2%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
      5.80 ±  5%      +0.8        6.61 ±  3%  perf-profile.children.cycles-pp.handle_mm_fault
     30.44            +1.1       31.58        perf-profile.children.cycles-pp.tlb_finish_mmu
     29.23            +1.5       30.69        perf-profile.children.cycles-pp.tlb_batch_pages_flush
     29.19            +1.5       30.66        perf-profile.children.cycles-pp.release_pages
      1.63 ±  5%      +1.5        3.13 ±  2%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      1.32 ±  4%      +1.6        2.97 ±  2%  perf-profile.children.cycles-pp.__count_memcg_events
     24.12            +2.7       26.84        perf-profile.children.cycles-pp.lru_add_drain
     24.12            +2.7       26.84        perf-profile.children.cycles-pp.lru_add_drain_cpu
     24.09            +2.7       26.81        perf-profile.children.cycles-pp.folio_batch_move_lru
     59.80            +2.9       62.68        perf-profile.children.cycles-pp.__madvise
     57.82            +3.4       61.26        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     57.44            +3.5       60.99        perf-profile.children.cycles-pp.do_syscall_64
     56.41            +3.8       60.20        perf-profile.children.cycles-pp.__x64_sys_madvise
     56.37            +3.8       60.17        perf-profile.children.cycles-pp.do_madvise
     55.94            +3.9       59.88        perf-profile.children.cycles-pp.madvise_vma_behavior
     55.85            +4.0       59.82        perf-profile.children.cycles-pp.zap_page_range_single
     45.26            +5.0       50.23        perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     44.75            +5.0       49.80        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     44.26            +5.2       49.50        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     26.98            -1.8       25.22 ±  3%  perf-profile.self.cycles-pp.intel_idle_ibrs
      1.67 ±  3%      -0.6        1.02 ±  3%  perf-profile.self.cycles-pp.page_counter_uncharge
      1.92 ±  5%      -0.4        1.49 ±  2%  perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
      1.47 ±  2%      -0.4        1.06 ±  2%  perf-profile.self.cycles-pp.testcase
      1.36 ±  2%      -0.4        0.99        perf-profile.self.cycles-pp.__irqentry_text_end
      1.30            -0.4        0.94        perf-profile.self.cycles-pp.error_entry
      1.30 ± 10%      -0.4        0.94 ±  6%  perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      1.18 ±  8%      -0.3        0.86 ±  6%  perf-profile.self.cycles-pp.uncharge_batch
      1.50 ± 11%      -0.3        1.19 ±  5%  perf-profile.self.cycles-pp.uncharge_folio
      0.98 ±  3%      -0.3        0.72        perf-profile.self.cycles-pp.native_flush_tlb_one_user
      0.71 ±  2%      -0.2        0.51 ±  2%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.69 ±  2%      -0.2        0.50 ±  2%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.50 ±  4%      -0.2        0.30 ±  5%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.75 ±  2%      -0.2        0.56 ±  2%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.51 ±  2%      -0.1        0.38 ±  2%  perf-profile.self.cycles-pp.sync_regs
      0.35 ±  3%      -0.1        0.23 ±  2%  perf-profile.self.cycles-pp.folio_batch_move_lru
      0.36 ±  5%      -0.1        0.24 ±  2%  perf-profile.self.cycles-pp.lru_add_fn
      0.39 ±  2%      -0.1        0.27 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.72 ±  2%      -0.1        0.61 ±  3%  perf-profile.self.cycles-pp.propagate_protected_usage
      0.45            -0.1        0.34 ±  2%  perf-profile.self.cycles-pp.release_pages
      0.54 ±  4%      -0.1        0.45 ±  4%  perf-profile.self.cycles-pp.__mod_lruvec_page_state
      0.30 ±  2%      -0.1        0.21 ±  3%  perf-profile.self.cycles-pp.___perf_sw_event
      0.52 ±  5%      -0.1        0.43 ±  5%  perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave
      0.28 ±  3%      -0.1        0.21        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.25 ±  3%      -0.1        0.18 ±  4%  perf-profile.self.cycles-pp.mas_walk
      0.24 ±  2%      -0.1        0.17 ±  4%  perf-profile.self.cycles-pp.__handle_mm_fault
      0.16 ±  4%      -0.1        0.10 ±  9%  perf-profile.self.cycles-pp.zap_pte_range
      0.14 ±  4%      -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.clear_page_erms
      0.08 ±  6%      -0.0        0.03 ± 70%  perf-profile.self.cycles-pp.__cond_resched
      0.13            -0.0        0.09        perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.14 ±  5%      -0.0        0.11 ±  3%  perf-profile.self.cycles-pp.handle_mm_fault
      0.11 ±  3%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.08 ±  6%      -0.0        0.04 ± 44%  perf-profile.self.cycles-pp.__perf_sw_event
      0.07 ± 10%      -0.0        0.04 ± 44%  perf-profile.self.cycles-pp.tlb_finish_mmu
      0.09 ±  7%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.free_swap_cache
      0.08 ±  7%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.lock_vma_under_rcu
      0.09 ±  8%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp._raw_spin_lock
      0.07 ±  7%      -0.0        0.04 ± 44%  perf-profile.self.cycles-pp.asm_exc_page_fault
      0.10 ±  3%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.page_remove_rmap
      0.08 ±  6%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.flush_tlb_mm_range
      0.08 ±  6%      -0.0        0.06 ±  9%  perf-profile.self.cycles-pp.do_anonymous_page
      0.08 ±  7%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.unmap_page_range
      0.08 ±  5%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp.__alloc_pages
      0.08 ±  6%      -0.0        0.06 ±  8%  perf-profile.self.cycles-pp.do_madvise
      0.07 ± 10%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.up_read
      1.58 ±  6%      +1.5        3.09 ±  2%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      1.27 ±  5%      +1.7        2.93 ±  2%  perf-profile.self.cycles-pp.__count_memcg_events
     44.25            +5.2       49.50        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
  2024-01-22  8:39 [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression kernel test robot
@ 2024-01-22 21:39 ` Yosry Ahmed
  2024-01-23  7:21   ` Oliver Sang
  0 siblings, 1 reply; 6+ messages in thread
From: Yosry Ahmed @ 2024-01-22 21:39 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
	Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
	Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
	Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
	linux-mm, ying.huang, feng.tang, fengwei.yin

[-- Attachment #1: Type: text/plain, Size: 3423 bytes --]

On Mon, Jan 22, 2024 at 12:39 AM kernel test robot
<oliver.sang@intel.com> wrote:
>
>
>
> hi, Yosry Ahmed,
>
> per your suggestion in
> https://lore.kernel.org/all/CAJD7tkameJBrJQxRj+ibKL6-yd-i0wyoyv2cgZdh3ZepA1p7wA@mail.gmail.com/
> "I think it would be useful to know if there are
> regressions/improvements in other microbenchmarks, at least to
> investigate whether they represent real regressions."
>
> we still report below two regressions to you just FYI what we observed in our
> microbenchmark tests.
> (we still captured will-it-scale::fallocate regression but ignore here per
> your commit message)
>
>
> Hello,
>
> kernel test robot noticed a -36.6% regression of vm-scalability.throughput on:
>
>
> commit: 8d59d2214c2362e7a9d185d80b613e632581af7b ("mm: memcg: make stats flushing threshold per-memcg")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: vm-scalability
> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> parameters:
>
>         runtime: 300s
>         size: 1T
>         test: lru-shm
>         cpufreq_governor: performance
>
> test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
>
> In addition to that, the commit also has significant impact on the following tests:
>
> +------------------+----------------------------------------------------------------------------------------------------+
> | testcase: change | will-it-scale: will-it-scale.per_process_ops -32.3% regression                                     |
> | test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                                   |
> | test parameters  | cpufreq_governor=performance                                                                       |
> |                  | mode=process                                                                                       |
> |                  | nr_task=50%                                                                                        |
> |                  | test=tlb_flush2                                                                                    |
> +------------------+----------------------------------------------------------------------------------------------------+
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202401221624.cb53a8ca-oliver.sang@intel.com

Thanks for reporting this. We have had these patches running on O(10K)
machines in our production for a while now, and there haven't been any
complaints (at least not yet). OTOH, we do see significant CPU savings
on reading memcg stats.

That being said, I think we can improve the performance here by
caching pointers to the parent_memcg->vmstats_percpu and
memcg->vmstats in struct memcg_vmstat_percpu. This should
significantly reduce the memory fetches in the loop in
memcg_rstat_updated().

Oliver, would you be able to test if the attached patch helps? It's
based on 8d59d2214c236.

[..]

[-- Attachment #2: 0001-mm-memcg-optimize-parent-iteration-in-memcg_rstat_up.patch --]
[-- Type: application/octet-stream, Size: 4006 bytes --]

From 8d04c38137c71d1577a8576fb75db07f3bf92491 Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 22 Jan 2024 21:35:29 +0000
Subject: [PATCH] mm: memcg: optimize parent iteration in memcg_rstat_updated()

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/memcontrol.c | 45 ++++++++++++++++++++++++++++-----------------
 1 file changed, 28 insertions(+), 17 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c5aa0c2cb68b2..b5ec4a8413215 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -634,6 +634,10 @@ struct memcg_vmstats_percpu {
 
 	/* Stats updates since the last flush */
 	unsigned int		stats_updates;
+
+	/* Cached pointers for fast updates in memcg_rstat_updated() */
+	struct memcg_vmstats_percpu	*parent;
+	struct memcg_vmstats		*vmstats;
 };
 
 struct memcg_vmstats {
@@ -698,36 +702,34 @@ static void memcg_stats_unlock(void)
 }
 
 
-static bool memcg_should_flush_stats(struct mem_cgroup *memcg)
+static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
 {
-	return atomic64_read(&memcg->vmstats->stats_updates) >
+	return atomic64_read(&vmstats->stats_updates) >
 		MEMCG_CHARGE_BATCH * num_online_cpus();
 }
 
 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
+	struct memcg_vmstats_percpu *statc;
 	int cpu = smp_processor_id();
-	unsigned int x;
 
 	if (!val)
 		return;
 
 	cgroup_rstat_updated(memcg->css.cgroup, cpu);
-
-	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
-		x = __this_cpu_add_return(memcg->vmstats_percpu->stats_updates,
-					  abs(val));
-
-		if (x < MEMCG_CHARGE_BATCH)
+	statc = this_cpu_ptr(memcg->vmstats_percpu);
+	for (; statc; statc = statc->parent) {
+		statc->stats_updates += abs(val);
+		if (statc->stats_updates < MEMCG_CHARGE_BATCH)
 			continue;
 
 		/*
 		 * If @memcg is already flush-able, increasing stats_updates is
 		 * redundant. Avoid the overhead of the atomic update.
 		 */
-		if (!memcg_should_flush_stats(memcg))
-			atomic64_add(x, &memcg->vmstats->stats_updates);
-		__this_cpu_write(memcg->vmstats_percpu->stats_updates, 0);
+		if (!memcg_vmstats_needs_flush(statc->vmstats))
+			atomic64_add(x, &statc->vmstats->stats_updates);
+		statc->stats_updates = 0;
 	}
 }
 
@@ -751,7 +753,7 @@ static void do_flush_stats(void)
 
 void mem_cgroup_flush_stats(void)
 {
-	if (memcg_should_flush_stats(root_mem_cgroup))
+	if (memcg_vmstats_needs_flush(root_mem_cgroup->vmstats))
 		do_flush_stats();
 }
 
@@ -765,7 +767,7 @@ void mem_cgroup_flush_stats_ratelimited(void)
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
 	/*
-	 * Deliberately ignore memcg_should_flush_stats() here so that flushing
+	 * Deliberately ignore memcg_vmstats_needs_flush() here so that flushing
 	 * in latency-sensitive paths is as cheap as possible.
 	 */
 	do_flush_stats();
@@ -5453,10 +5455,11 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
 	__mem_cgroup_free(memcg);
 }
 
-static struct mem_cgroup *mem_cgroup_alloc(void)
+static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
 {
+	struct memcg_vmstats_percpu *statc, *pstatc;
 	struct mem_cgroup *memcg;
-	int node;
+	int node, cpu;
 	int __maybe_unused i;
 	long error = -ENOMEM;
 
@@ -5480,6 +5483,14 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (!memcg->vmstats_percpu)
 		goto fail;
 
+	for_each_possible_cpu(cpu) {
+		if (parent)
+			pstatc = per_cpu_ptr(parent->vmstats_percpu, cpu);
+		statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
+		statc->parent = parent ? pstatc : NULL;
+		statc->vmstats = memcg->vmstats;
+	}
+
 	for_each_node(node)
 		if (alloc_mem_cgroup_per_node_info(memcg, node))
 			goto fail;
@@ -5525,7 +5536,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	struct mem_cgroup *memcg, *old_memcg;
 
 	old_memcg = set_active_memcg(parent);
-	memcg = mem_cgroup_alloc();
+	memcg = mem_cgroup_alloc(parent);
 	set_active_memcg(old_memcg);
 	if (IS_ERR(memcg))
 		return ERR_CAST(memcg);
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
  2024-01-22 21:39 ` Yosry Ahmed
@ 2024-01-23  7:21   ` Oliver Sang
  2024-01-23  7:42     ` Yosry Ahmed
  0 siblings, 1 reply; 6+ messages in thread
From: Oliver Sang @ 2024-01-23  7:21 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
	Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
	Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
	Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
	linux-mm, ying.huang, feng.tang, fengwei.yin, oliver.sang

hi, Yosry Ahmed,

On Mon, Jan 22, 2024 at 01:39:19PM -0800, Yosry Ahmed wrote:
> On Mon, Jan 22, 2024 at 12:39 AM kernel test robot
> <oliver.sang@intel.com> wrote:
> >
> >
> >
> > hi, Yosry Ahmed,
> >
> > per your suggestion in
> > https://lore.kernel.org/all/CAJD7tkameJBrJQxRj+ibKL6-yd-i0wyoyv2cgZdh3ZepA1p7wA@mail.gmail.com/
> > "I think it would be useful to know if there are
> > regressions/improvements in other microbenchmarks, at least to
> > investigate whether they represent real regressions."
> >
> > we still report below two regressions to you just FYI what we observed in our
> > microbenchmark tests.
> > (we still captured will-it-scale::fallocate regression but ignore here per
> > your commit message)
> >
> >
> > Hello,
> >
> > kernel test robot noticed a -36.6% regression of vm-scalability.throughput on:
> >
> >
> > commit: 8d59d2214c2362e7a9d185d80b613e632581af7b ("mm: memcg: make stats flushing threshold per-memcg")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > testcase: vm-scalability
> > test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
> > parameters:
> >
> >         runtime: 300s
> >         size: 1T
> >         test: lru-shm
> >         cpufreq_governor: performance
> >
> > test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
> > test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
> >
> > In addition to that, the commit also has significant impact on the following tests:
> >
> > +------------------+----------------------------------------------------------------------------------------------------+
> > | testcase: change | will-it-scale: will-it-scale.per_process_ops -32.3% regression                                     |
> > | test machine     | 104 threads 2 sockets (Skylake) with 192G memory                                                   |
> > | test parameters  | cpufreq_governor=performance                                                                       |
> > |                  | mode=process                                                                                       |
> > |                  | nr_task=50%                                                                                        |
> > |                  | test=tlb_flush2                                                                                    |
> > +------------------+----------------------------------------------------------------------------------------------------+
> >
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <oliver.sang@intel.com>
> > | Closes: https://lore.kernel.org/oe-lkp/202401221624.cb53a8ca-oliver.sang@intel.com
> 
> Thanks for reporting this. We have had these patches running on O(10K)
> machines in our production for a while now, and there haven't been any
> complaints (at least not yet). OTOH, we do see significant CPU savings
> on reading memcg stats.
> 
> That being said, I think we can improve the performance here by
> caching pointers to the parent_memcg->vmstats_percpu and
> memcg->vmstats in struct memcg_vmstat_percpu. This should
> significantly reduce the memory fetches in the loop in
> memcg_rstat_updated().
> 
> Oliver, would you be able to test if the attached patch helps? It's
> based on 8d59d2214c236.

the patch failed to compile:

build_errors:
  - "mm/memcontrol.c:731:38: error: 'x' undeclared (first use in this function)"


> 
> [..]



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
  2024-01-23  7:21   ` Oliver Sang
@ 2024-01-23  7:42     ` Yosry Ahmed
  2024-01-24  8:26       ` Oliver Sang
  0 siblings, 1 reply; 6+ messages in thread
From: Yosry Ahmed @ 2024-01-23  7:42 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
	Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
	Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
	Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
	linux-mm, ying.huang, feng.tang, fengwei.yin

[-- Attachment #1: Type: text/plain, Size: 379 bytes --]

> > Oliver, would you be able to test if the attached patch helps? It's
> > based on 8d59d2214c236.
>
> the patch failed to compile:
>
> build_errors:
>   - "mm/memcontrol.c:731:38: error: 'x' undeclared (first use in this function)"

Apologizes, apparently I sent the patch with some pending diff in my
tree that I hadn't committed. Please find a fixed patch attached.

Thanks.

[-- Attachment #2: 0001-mm-memcg-optimize-parent-iteration-in-memcg_rstat_up.patch --]
[-- Type: application/octet-stream, Size: 4036 bytes --]

From 1b00b4e0bbc215fcebb9d3d45e5d63135b7b7e89 Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 22 Jan 2024 21:35:29 +0000
Subject: [PATCH] mm: memcg: optimize parent iteration in memcg_rstat_updated()

Signed-off-by: Yosry Ahmed <yosryahmed@google.com>
---
 mm/memcontrol.c | 46 +++++++++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 17 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c5aa0c2cb68b2..d6a9d6dad2f00 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -634,6 +634,10 @@ struct memcg_vmstats_percpu {
 
 	/* Stats updates since the last flush */
 	unsigned int		stats_updates;
+
+	/* Cached pointers for fast updates in memcg_rstat_updated() */
+	struct memcg_vmstats_percpu	*parent;
+	struct memcg_vmstats		*vmstats;
 };
 
 struct memcg_vmstats {
@@ -698,36 +702,35 @@ static void memcg_stats_unlock(void)
 }
 
 
-static bool memcg_should_flush_stats(struct mem_cgroup *memcg)
+static bool memcg_vmstats_needs_flush(struct memcg_vmstats *vmstats)
 {
-	return atomic64_read(&memcg->vmstats->stats_updates) >
+	return atomic64_read(&vmstats->stats_updates) >
 		MEMCG_CHARGE_BATCH * num_online_cpus();
 }
 
 static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
 {
+	struct memcg_vmstats_percpu *statc;
 	int cpu = smp_processor_id();
-	unsigned int x;
 
 	if (!val)
 		return;
 
 	cgroup_rstat_updated(memcg->css.cgroup, cpu);
-
-	for (; memcg; memcg = parent_mem_cgroup(memcg)) {
-		x = __this_cpu_add_return(memcg->vmstats_percpu->stats_updates,
-					  abs(val));
-
-		if (x < MEMCG_CHARGE_BATCH)
+	statc = this_cpu_ptr(memcg->vmstats_percpu);
+	for (; statc; statc = statc->parent) {
+		statc->stats_updates += abs(val);
+		if (statc->stats_updates < MEMCG_CHARGE_BATCH)
 			continue;
 
 		/*
 		 * If @memcg is already flush-able, increasing stats_updates is
 		 * redundant. Avoid the overhead of the atomic update.
 		 */
-		if (!memcg_should_flush_stats(memcg))
-			atomic64_add(x, &memcg->vmstats->stats_updates);
-		__this_cpu_write(memcg->vmstats_percpu->stats_updates, 0);
+		if (!memcg_vmstats_needs_flush(statc->vmstats))
+			atomic64_add(statc->stats_updates,
+				     &statc->vmstats->stats_updates);
+		statc->stats_updates = 0;
 	}
 }
 
@@ -751,7 +754,7 @@ static void do_flush_stats(void)
 
 void mem_cgroup_flush_stats(void)
 {
-	if (memcg_should_flush_stats(root_mem_cgroup))
+	if (memcg_vmstats_needs_flush(root_mem_cgroup->vmstats))
 		do_flush_stats();
 }
 
@@ -765,7 +768,7 @@ void mem_cgroup_flush_stats_ratelimited(void)
 static void flush_memcg_stats_dwork(struct work_struct *w)
 {
 	/*
-	 * Deliberately ignore memcg_should_flush_stats() here so that flushing
+	 * Deliberately ignore memcg_vmstats_needs_flush() here so that flushing
 	 * in latency-sensitive paths is as cheap as possible.
 	 */
 	do_flush_stats();
@@ -5453,10 +5456,11 @@ static void mem_cgroup_free(struct mem_cgroup *memcg)
 	__mem_cgroup_free(memcg);
 }
 
-static struct mem_cgroup *mem_cgroup_alloc(void)
+static struct mem_cgroup *mem_cgroup_alloc(struct mem_cgroup *parent)
 {
+	struct memcg_vmstats_percpu *statc, *pstatc;
 	struct mem_cgroup *memcg;
-	int node;
+	int node, cpu;
 	int __maybe_unused i;
 	long error = -ENOMEM;
 
@@ -5480,6 +5484,14 @@ static struct mem_cgroup *mem_cgroup_alloc(void)
 	if (!memcg->vmstats_percpu)
 		goto fail;
 
+	for_each_possible_cpu(cpu) {
+		if (parent)
+			pstatc = per_cpu_ptr(parent->vmstats_percpu, cpu);
+		statc = per_cpu_ptr(memcg->vmstats_percpu, cpu);
+		statc->parent = parent ? pstatc : NULL;
+		statc->vmstats = memcg->vmstats;
+	}
+
 	for_each_node(node)
 		if (alloc_mem_cgroup_per_node_info(memcg, node))
 			goto fail;
@@ -5525,7 +5537,7 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css)
 	struct mem_cgroup *memcg, *old_memcg;
 
 	old_memcg = set_active_memcg(parent);
-	memcg = mem_cgroup_alloc();
+	memcg = mem_cgroup_alloc(parent);
 	set_active_memcg(old_memcg);
 	if (IS_ERR(memcg))
 		return ERR_CAST(memcg);
-- 
2.43.0.429.g432eaa2c6b-goog


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
  2024-01-23  7:42     ` Yosry Ahmed
@ 2024-01-24  8:26       ` Oliver Sang
  2024-01-24  9:11         ` Yosry Ahmed
  0 siblings, 1 reply; 6+ messages in thread
From: Oliver Sang @ 2024-01-24  8:26 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
	Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
	Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
	Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
	linux-mm, ying.huang, feng.tang, fengwei.yin, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 3943 bytes --]

hi, Yosry Ahmed,

On Mon, Jan 22, 2024 at 11:42:04PM -0800, Yosry Ahmed wrote:
> > > Oliver, would you be able to test if the attached patch helps? It's
> > > based on 8d59d2214c236.
> >
> > the patch failed to compile:
> >
> > build_errors:
> >   - "mm/memcontrol.c:731:38: error: 'x' undeclared (first use in this function)"
> 
> Apologizes, apparently I sent the patch with some pending diff in my
> tree that I hadn't committed. Please find a fixed patch attached.

the regression disappears after applying the patch.

Tested-by: kernel test robot <oliver.sang@intel.com>

for 1st regression we reported (details is attached as vm-scalability):

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/1T/lkp-cpl-4sp2/lru-shm/vm-scalability

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    946447           -37.8%     588327            -1.1%     936279        vm-scalability.median
 2.131e+08           -36.6%  1.351e+08            -1.4%  2.102e+08        vm-scalability.throughput



for 2nd regression (details is attached as will-it-scale-tlb_flush2):

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   2132437           -32.3%    1444430            +0.9%    2151460        will-it-scale.52.processes
     41008           -32.3%      27776            +0.9%      41373        will-it-scale.per_process_ops



interesting thing is, it also helps on will-it-scale:fallocate tests which we
ignored in original report (details is attached as will-it-scale-fallocate1):

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/fallocate1/will-it-scale

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
   5426049           -33.8%    3590953            +3.3%    5605429        will-it-scale.224.processes
     24222           -33.8%      16030            +3.3%      25023        will-it-scale.per_process_ops

> 
> Thanks.



[-- Attachment #2: vm-scalability --]
[-- Type: text/plain, Size: 82648 bytes --]

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/1T/lkp-cpl-4sp2/lru-shm/vm-scalability

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      0.01           +86.7%       0.02            +3.7%       0.01        vm-scalability.free_time
    946447           -37.8%     588327            -1.1%     936279        vm-scalability.median
 2.131e+08           -36.6%  1.351e+08            -1.4%  2.102e+08        vm-scalability.throughput
    284.74            +6.3%     302.62            +1.5%     288.98        vm-scalability.time.elapsed_time
    284.74            +6.3%     302.62            +1.5%     288.98        vm-scalability.time.elapsed_time.max
     30485           +14.8%      34987            +0.1%      30514        vm-scalability.time.involuntary_context_switches
      1893           +43.6%       2718            -0.1%       1891        vm-scalability.time.percent_of_cpu_this_job_got
      3855           +67.7%       6467            +1.7%       3922        vm-scalability.time.system_time
      1537           +14.5%       1760            +0.5%       1545        vm-scalability.time.user_time
    120009            -5.6%     113290            -0.4%     119542        vm-scalability.time.voluntary_context_switches
      6.46            +3.5        9.95            +0.0        6.48        mpstat.cpu.all.sys%
     21.22           +38.8%      29.46            -0.3%      21.14        vmstat.procs.r
      9376            +0.5%       9422            +2.6%       9621        vmstat.system.cs
    233326            -0.6%     231877            -1.2%     230635        vmstat.system.in
    113624 ±  5%     +14.0%     129566 ±  3%     -11.8%     100234 ±  3%  meminfo.Active
    113476 ±  5%     +14.0%     129417 ±  3%     -11.8%     100070 ±  3%  meminfo.Active(anon)
   3987746           +46.0%    5821636            -0.8%    3954895        meminfo.Mapped
     16345           +14.6%      18729            -2.4%      15952 ±  2%  meminfo.PageTables
    474.17 ±  3%     -88.9%      52.50 ±125%     -98.2%       8.67 ± 31%  perf-c2c.DRAM.local
    483.17 ±  5%     -79.3%      99.83 ± 70%     -87.3%      61.17 ± 21%  perf-c2c.DRAM.remote
      1045 ±  5%     -71.9%     294.00 ± 63%     -80.7%     202.17 ±  6%  perf-c2c.HITM.local
    119.50 ± 10%     -78.8%      25.33 ± 20%     -79.8%      24.17 ± 28%  perf-c2c.HITM.remote
    392.33           +35.4%     531.17            -0.1%     392.00        turbostat.Avg_MHz
     10.35            +3.7       14.00            -0.0       10.34        turbostat.Busy%
     90.56            -3.7       86.86            +0.0       90.57        turbostat.C1%
      0.28 ±  5%     -31.5%       0.19            +0.0%       0.28 ±  2%  turbostat.IPC
    481.33            +2.5%     493.38            -0.3%     480.09        turbostat.PkgWatt
    999019 ±  3%     +44.4%    1442651 ±  2%      -1.4%     984731 ±  4%  numa-meminfo.node0.Mapped
   1005687 ±  4%     +44.1%    1449402 ±  3%      +2.3%    1029138        numa-meminfo.node1.Mapped
      3689 ±  3%     +21.7%       4490 ±  7%      +5.7%       3899 ±  7%  numa-meminfo.node1.PageTables
    980589 ±  2%     +42.3%    1395777 ±  2%      +1.6%     996328 ±  3%  numa-meminfo.node2.Mapped
     96484 ±  5%     +22.0%     117715 ±  4%      -9.0%      87779 ±  4%  numa-meminfo.node3.Active
     96430 ±  5%     +22.1%     117694 ±  4%      -9.0%      87737 ±  4%  numa-meminfo.node3.Active(anon)
    991367 ±  3%     +42.7%    1414337 ±  4%      -0.7%     984261 ±  2%  numa-meminfo.node3.Mapped
    251219 ±  3%     +44.8%     363745 ±  2%      -2.9%     244018 ±  5%  numa-vmstat.node0.nr_mapped
    253252 ±  2%     +44.6%     366087 ±  3%      +0.8%     255216        numa-vmstat.node1.nr_mapped
    927.67 ±  3%     +21.9%       1130 ±  7%      +4.4%     968.41 ±  7%  numa-vmstat.node1.nr_page_table_pages
    248171 ±  2%     +42.5%     353541 ±  4%      -0.7%     246429 ±  3%  numa-vmstat.node2.nr_mapped
     24188 ±  5%     +21.6%      29410 ±  4%      -9.2%      21963 ±  4%  numa-vmstat.node3.nr_active_anon
    245825 ±  2%     +45.5%     357622 ±  3%      -0.2%     245258 ±  3%  numa-vmstat.node3.nr_mapped
      1038 ± 11%     +17.8%       1224 ±  6%      -4.5%     992.13 ±  2%  numa-vmstat.node3.nr_page_table_pages
     24188 ±  5%     +21.6%      29410 ±  4%      -9.2%      21963 ±  4%  numa-vmstat.node3.nr_zone_active_anon
    284.74            +6.3%     302.62            +1.5%     288.98        time.elapsed_time
    284.74            +6.3%     302.62            +1.5%     288.98        time.elapsed_time.max
     30485           +14.8%      34987            +0.1%      30514        time.involuntary_context_switches
    448.67 ±  3%     -18.4%     366.00 ±  4%      -2.5%     437.67 ±  2%  time.major_page_faults
      1893           +43.6%       2718            -0.1%       1891        time.percent_of_cpu_this_job_got
      3855           +67.7%       6467            +1.7%       3922        time.system_time
      1537           +14.5%       1760            +0.5%       1545        time.user_time
    120009            -5.6%     113290            -0.4%     119542        time.voluntary_context_switches
     28376 ±  5%     +14.0%      32338 ±  3%     -11.8%      25021 ±  3%  proc-vmstat.nr_active_anon
    993504           +46.6%    1456136            -0.8%     985427        proc-vmstat.nr_mapped
      4060           +15.5%       4691            -2.1%       3977        proc-vmstat.nr_page_table_pages
     28376 ±  5%     +14.0%      32338 ±  3%     -11.8%      25021 ±  3%  proc-vmstat.nr_zone_active_anon
 1.066e+09            -2.0%  1.045e+09            -0.0%  1.066e+09        proc-vmstat.numa_hit
 1.065e+09            -2.0%  1.044e+09            +0.0%  1.065e+09        proc-vmstat.numa_local
     69848 ±  2%      -1.5%      68819 ±  2%     -13.1%      60717 ±  2%  proc-vmstat.pgactivate
      5659            +5.6%       5978            +1.0%       5713        proc-vmstat.unevictable_pgs_culled
  34604288            +3.7%   35898496            +1.1%   35001600        proc-vmstat.unevictable_pgs_scanned
      0.08 ±111%     +14.5%       0.09 ± 85%    +351.6%       0.36 ± 46%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
      0.01 ± 35%     +20.5%       0.01 ± 30%     -49.3%       0.01 ± 17%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.01 ± 20%   +1887.0%       0.18 ±203%     +61.1%       0.01 ± 51%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      6.73 ±221%     +10.1%       7.41 ±102%   +1086.8%      79.86 ±111%  perf-sched.sch_delay.max.ms.pipe_write.vfs_write.ksys_write.do_syscall_64
     20.95 ±118%     +66.6%      34.90 ± 45%    +372.6%      99.03 ± 24%  perf-sched.sch_delay.max.ms.schedule_hrtimeout_range_clock.do_select.core_sys_select.kern_select
     10.50 ±172%     -78.5%       2.26 ±205%    +824.8%      97.06 ± 88%  perf-sched.sch_delay.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      0.01 ± 28%     +63.3%       0.01 ± 29%     +30.0%       0.01 ± 24%  perf-sched.sch_delay.max.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
      4539 ±  6%      -0.0%       4538 ±  3%     -12.0%       3994 ±  2%  perf-sched.total_wait_and_delay.max.ms
      4539 ±  6%      -0.0%       4538 ±  3%     -12.0%       3994 ±  2%  perf-sched.total_wait_time.max.ms
    524.54 ± 91%      +0.1%     525.21 ± 91%     -91.3%      45.89 ±  2%  perf-sched.wait_and_delay.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
    524.53 ± 91%      +0.1%     524.88 ± 91%     -91.3%      45.88 ±  2%  perf-sched.wait_time.max.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
   1223376 ± 14%    +119.1%    2680582 ±  9%      +1.4%    1239953 ± 14%  sched_debug.cfs_rq:/.avg_vruntime.avg
   1673909 ± 14%     +97.6%    3308254 ±  8%      -0.6%    1663719 ± 12%  sched_debug.cfs_rq:/.avg_vruntime.max
    810795 ± 15%    +145.8%    1993289 ±  9%      +0.1%     811327 ± 18%  sched_debug.cfs_rq:/.avg_vruntime.min
    156233 ±  8%     +55.1%     242331 ±  6%      +6.4%     166243 ±  8%  sched_debug.cfs_rq:/.avg_vruntime.stddev
      0.17 ± 36%     +10.4%       0.19 ± 25%     -76.3%       0.04 ± 10%  sched_debug.cfs_rq:/.h_nr_running.avg
      0.28 ± 17%      -2.6%       0.27 ±  8%     -29.5%       0.20 ±  7%  sched_debug.cfs_rq:/.h_nr_running.stddev
   1223376 ± 14%    +119.1%    2680582 ±  9%      +1.4%    1239953 ± 14%  sched_debug.cfs_rq:/.min_vruntime.avg
   1673909 ± 14%     +97.6%    3308254 ±  8%      -0.6%    1663719 ± 12%  sched_debug.cfs_rq:/.min_vruntime.max
    810795 ± 15%    +145.8%    1993289 ±  9%      +0.1%     811327 ± 18%  sched_debug.cfs_rq:/.min_vruntime.min
    156233 ±  8%     +55.1%     242331 ±  6%      +6.4%     166243 ±  8%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.17 ± 36%     +10.6%       0.19 ± 25%     -76.5%       0.04 ± 10%  sched_debug.cfs_rq:/.nr_running.avg
      0.28 ± 16%      -2.0%       0.27 ±  8%     -29.8%       0.19 ±  8%  sched_debug.cfs_rq:/.nr_running.stddev
    247.44 ± 17%      -1.9%     242.83 ±  9%     -34.5%     162.07 ±  7%  sched_debug.cfs_rq:/.runnable_avg.stddev
    182.32 ± 33%      +8.6%     197.96 ± 25%     -71.3%      52.23 ± 11%  sched_debug.cfs_rq:/.util_avg.avg
    245.26 ± 17%      -1.9%     240.57 ± 10%     -35.1%     159.09 ±  6%  sched_debug.cfs_rq:/.util_avg.stddev
     31.39 ± 29%     +19.1%      37.39 ± 23%     -73.0%       8.47 ± 31%  sched_debug.cfs_rq:/.util_est_enqueued.avg
    126445 ±  3%     -11.0%     112493 ±  4%     -12.3%     110951 ±  8%  sched_debug.cpu.avg_idle.stddev
      5970 ± 46%     -24.7%       4497 ± 54%     -95.0%     300.50 ±  6%  sched_debug.cpu.curr->pid.avg
      7221 ± 31%     -15.2%       6125 ± 28%     -62.3%       2719 ±  9%  sched_debug.cpu.curr->pid.stddev
      0.25 ± 21%      +0.1%       0.25 ± 12%     -39.2%       0.15 ±  8%  sched_debug.cpu.nr_running.stddev
      1447 ± 15%     +32.0%       1910 ±  9%      +3.3%       1494 ± 14%  sched_debug.cpu.nr_switches.min
      0.71           +13.4%       0.80            -1.8%       0.69 ±  2%  perf-stat.i.MPKI
 2.343e+10            -7.9%  2.157e+10            -1.2%  2.315e+10        perf-stat.i.branch-instructions
      0.36            -0.0        0.35            -0.0        0.35        perf-stat.i.branch-miss-rate%
  30833194            -7.3%   28584190            -0.8%   30598705        perf-stat.i.branch-misses
     26.04            -1.4       24.66            -0.5       25.53 ±  2%  perf-stat.i.cache-miss-rate%
  51345490 ±  3%     +40.7%   72258633 ±  3%      +1.1%   51925265 ±  5%  perf-stat.i.cache-misses
 1.616e+08 ±  6%     +58.6%  2.562e+08 ±  6%      +4.9%  1.695e+08 ± 10%  perf-stat.i.cache-references
      9297            +0.6%       9355            +2.8%       9558        perf-stat.i.context-switches
      1.29            +9.4%       1.42            -0.9%       1.28        perf-stat.i.cpi
 8.394e+10           +33.7%  1.122e+11            -0.6%  8.344e+10        perf-stat.i.cpu-cycles
    505.77            -2.6%     492.52            -1.2%     499.66        perf-stat.i.cpu-migrations
      0.03            +0.0        0.03 ±  2%      -0.0        0.03 ±  2%  perf-stat.i.dTLB-load-miss-rate%
 2.335e+10            -7.4%  2.162e+10            -0.9%  2.315e+10        perf-stat.i.dTLB-loads
      0.03            +0.0        0.03            -0.0        0.03        perf-stat.i.dTLB-store-miss-rate%
   3948344            -8.0%    3633633            -2.0%    3867670        perf-stat.i.dTLB-store-misses
 6.549e+09            -7.0%   6.09e+09            -0.3%  6.528e+09        perf-stat.i.dTLB-stores
  17546602           -22.8%   13551001           -15.2%   14872025        perf-stat.i.iTLB-load-misses
   2552560            -2.6%    2485876            +0.1%    2555872        perf-stat.i.iTLB-loads
 8.367e+10            -7.5%  7.737e+10            -0.9%  8.288e+10        perf-stat.i.instructions
      4706            +7.7%       5070            +4.6%       4922        perf-stat.i.instructions-per-iTLB-miss
      0.81           -12.0%       0.72            +0.5%       0.82        perf-stat.i.ipc
      1.59 ±  3%     -22.3%       1.23 ±  4%      -4.5%       1.52 ±  3%  perf-stat.i.major-faults
      0.37           +34.2%       0.49            -0.4%       0.37        perf-stat.i.metric.GHz
    233.98            -6.9%     217.90            -0.7%     232.33        perf-stat.i.metric.M/sec
   3619177            -9.5%    3276556            -2.3%    3535780        perf-stat.i.minor-faults
     74.28            +4.8       79.04            +0.5       74.78        perf-stat.i.node-load-miss-rate%
   2898733 ±  4%     +49.0%    4320557            -3.5%    2796977 ±  6%  perf-stat.i.node-load-misses
   1928237 ±  4%     -11.9%    1698426            -0.4%    1920388 ±  6%  perf-stat.i.node-loads
  13383344 ±  2%      +4.7%   14013398 ±  3%      -0.3%   13338644 ±  3%  perf-stat.i.node-stores
   3619179            -9.5%    3276558            -2.3%    3535782        perf-stat.i.page-faults
      0.61 ±  3%     +52.5%       0.94 ±  3%      +2.1%       0.63 ±  5%  perf-stat.overall.MPKI
     31.95 ±  2%      -3.6       28.34 ±  3%      -1.0       30.92 ±  4%  perf-stat.overall.cache-miss-rate%
      1.00           +45.0%       1.45            +0.3%       1.00        perf-stat.overall.cpi
      0.07            +0.0        0.08 ±  4%      +0.0        0.07 ±  2%  perf-stat.overall.dTLB-load-miss-rate%
      0.06            -0.0        0.06            -0.0        0.06        perf-stat.overall.dTLB-store-miss-rate%
     87.62            -2.6       85.05            -1.9       85.75        perf-stat.overall.iTLB-load-miss-rate%
      4778           +20.2%       5745           +17.3%       5604        perf-stat.overall.instructions-per-iTLB-miss
      1.00           -31.0%       0.69            -0.3%       1.00        perf-stat.overall.ipc
     59.75 ±  3%     +11.8       71.59            -0.8       58.91 ±  5%  perf-stat.overall.node-load-miss-rate%
      5145            +1.8%       5239            +1.2%       5208        perf-stat.overall.path-length
 2.405e+10            -6.3%  2.252e+10            -0.3%  2.396e+10        perf-stat.ps.branch-instructions
  31203502            -6.4%   29219514            -0.3%   31124801        perf-stat.ps.branch-misses
  52696784 ±  3%     +43.4%   75547948 ±  3%      +1.9%   53714277 ±  5%  perf-stat.ps.cache-misses
 1.652e+08 ±  6%     +61.7%  2.672e+08 ±  7%      +5.7%  1.746e+08 ± 11%  perf-stat.ps.cache-references
      9279            +0.5%       9326            +2.6%       9525        perf-stat.ps.context-switches
 8.584e+10           +36.3%   1.17e+11            +0.1%  8.594e+10        perf-stat.ps.cpu-cycles
    506.29            -2.0%     496.05            -0.7%     502.50        perf-stat.ps.cpu-migrations
 2.395e+10            -5.9%  2.254e+10            -0.1%  2.393e+10        perf-stat.ps.dTLB-loads
   4059043            -6.2%    3806002            -1.1%    4012385        perf-stat.ps.dTLB-store-misses
 6.688e+09            -5.7%  6.308e+09            +0.4%  6.714e+09        perf-stat.ps.dTLB-stores
  17944396           -21.8%   14028927           -14.9%   15276611        perf-stat.ps.iTLB-load-misses
   2534093            -2.7%    2465233            +0.2%    2538321        perf-stat.ps.iTLB-loads
 8.575e+10            -6.0%  8.059e+10            -0.2%  8.561e+10        perf-stat.ps.instructions
      1.60 ±  3%     -23.2%       1.23 ±  4%      -3.9%       1.54 ±  2%  perf-stat.ps.major-faults
   3726053            -7.7%    3439511            -1.4%    3674617        perf-stat.ps.minor-faults
   2942507 ±  4%     +52.0%    4472428            -2.9%    2857607 ±  6%  perf-stat.ps.node-load-misses
   1980077 ±  4%     -10.4%    1774633            +0.5%    1989918 ±  6%  perf-stat.ps.node-loads
  13780660 ±  2%      +6.8%   14716100 ±  3%      +0.6%   13865246 ±  3%  perf-stat.ps.node-stores
   3726055            -7.7%    3439513            -1.4%    3674618        perf-stat.ps.page-faults
 2.447e+13            -0.2%  2.443e+13            +1.2%  2.477e+13        perf-stat.total.instructions
     37.11            -6.7       30.40 ±  6%      +3.0       40.09        perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
     21.14            -3.8       17.36 ±  7%      +3.1       24.20 ±  2%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     21.05            -3.8       17.29 ±  7%      +3.0       24.08 ±  2%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     21.05            -3.8       17.29 ±  7%      +3.0       24.09 ±  2%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     21.05            -3.8       17.29 ±  7%      +3.0       24.09 ±  2%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     21.00            -3.8       17.25 ±  7%      +3.0       24.02 ±  2%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     20.70            -3.7       17.00 ±  7%      +2.9       23.59        perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     20.69            -3.7       16.99 ±  7%      +2.9       23.57        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     20.64            -3.7       16.95 ±  7%      +2.8       23.48        perf-profile.calltrace.cycles-pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      9.51 ±  3%      -1.9        7.57 ±  2%      -1.1        8.44        perf-profile.calltrace.cycles-pp.do_rw_once
      4.54            -1.4        3.19            -0.5        4.08 ±  2%  perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
      2.83            -0.9        1.96            -0.3        2.55 ±  2%  perf-profile.calltrace.cycles-pp.next_uptodate_folio.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
      0.75 ±  2%      -0.6        0.17 ±141%      -0.1        0.68 ±  3%  perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio
      3.90            -0.6        3.34 ±  5%      -0.5        3.37 ±  2%  perf-profile.calltrace.cycles-pp.clear_page_erms.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
      4.44 ±  6%      -0.5        3.98 ±  3%      -0.8        3.68 ±  2%  perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      2.96 ±  4%      -0.4        2.52 ± 26%      +2.4        5.40 ±  4%  perf-profile.calltrace.cycles-pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
      1.17 ±  3%      -0.4        0.73 ±  6%      -0.2        1.01 ±  4%  perf-profile.calltrace.cycles-pp.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
      1.42 ±  2%      -0.4        0.99 ±  2%      -0.1        1.28 ±  2%  perf-profile.calltrace.cycles-pp.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
      1.32 ±  2%      -0.4        0.91            -0.1        1.19 ±  2%  perf-profile.calltrace.cycles-pp.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      0.61 ±  6%      -0.4        0.23 ±141%      +0.5        1.09 ±  3%  perf-profile.calltrace.cycles-pp.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues
      1.19 ±  2%      -0.4        0.82            -0.1        1.07 ±  3%  perf-profile.calltrace.cycles-pp.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      2.24 ±  6%      -0.3        1.91 ± 24%      +1.8        4.08 ±  4%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
      0.92 ± 12%      -0.3        0.60 ± 74%      +0.8        1.69 ±  7%  perf-profile.calltrace.cycles-pp.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
      0.96 ±  2%      -0.3        0.65 ±  2%      -0.1        0.86 ±  3%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.alloc_pages_mpol.shmem_alloc_folio.shmem_alloc_and_add_folio
      0.98 ±  2%      -0.3        0.68 ±  4%      -0.1        0.89 ±  2%  perf-profile.calltrace.cycles-pp.sync_regs.asm_exc_page_fault.do_access
      0.76 ±  7%      -0.3        0.49 ± 75%      +0.6        1.37 ±  4%  perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt
      0.77 ±  7%      -0.3        0.49 ± 75%      +0.6        1.38 ±  4%  perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_nohz_highres_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
      0.80 ± 17%      -0.2        0.57 ± 74%      +0.8        1.63 ± 12%  perf-profile.calltrace.cycles-pp.release_pages.__folio_batch_release.shmem_undo_range.shmem_evict_inode.evict
      1.99 ±  6%      -0.2        1.77 ± 21%      +1.5        3.44 ±  4%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork.ret_from_fork_asm
      1.99 ±  6%      -0.2        1.77 ± 21%      +1.5        3.44 ±  4%  perf-profile.calltrace.cycles-pp.ret_from_fork.ret_from_fork_asm
      1.99 ±  6%      -0.2        1.77 ± 21%      +1.5        3.44 ±  4%  perf-profile.calltrace.cycles-pp.ret_from_fork_asm
      1.95 ±  6%      -0.2        1.74 ± 21%      +1.4        3.37 ±  4%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.60 ±  7%      -0.2        1.38 ± 25%      +1.4        3.00 ±  3%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter
      1.60 ±  7%      -0.2        1.38 ± 25%      +1.4        2.99 ±  3%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
      1.96 ±  6%      -0.2        1.74 ± 21%      +1.4        3.39 ±  4%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1.92 ±  6%      -0.2        1.71 ± 21%      +1.4        3.32 ±  4%  perf-profile.calltrace.cycles-pp.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread.ret_from_fork
      1.92 ±  6%      -0.2        1.71 ± 21%      +1.4        3.32 ±  4%  perf-profile.calltrace.cycles-pp.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread.kthread
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.calltrace.cycles-pp.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.calltrace.cycles-pp.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.unlinkat
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.unlinkat
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.calltrace.cycles-pp.unlinkat
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.calltrace.cycles-pp.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.calltrace.cycles-pp.shmem_evict_inode.evict.do_unlinkat.__x64_sys_unlinkat.do_syscall_64
      2.17 ± 17%      -0.2        1.97 ± 30%      +2.2        4.38 ± 12%  perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_evict_inode.evict.do_unlinkat.__x64_sys_unlinkat
      1.81 ±  6%      -0.2        1.61 ± 22%      +1.4        3.18 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work.worker_thread
      1.81 ±  6%      -0.2        1.61 ± 22%      +1.4        3.18 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work.process_one_work
      1.81 ±  6%      -0.2        1.61 ± 22%      +1.4        3.18 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty.drm_fb_helper_damage_work
      1.76 ±  6%      -0.2        1.57 ± 22%      +1.3        3.10 ±  4%  perf-profile.calltrace.cycles-pp.memcpy_toio.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm
      1.79 ±  6%      -0.2        1.60 ± 22%      +1.4        3.15 ±  4%  perf-profile.calltrace.cycles-pp.drm_fb_memcpy.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail
      1.79 ±  6%      -0.2        1.60 ± 22%      +1.4        3.15 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit
      1.79 ±  6%      -0.2        1.60 ± 22%      +1.4        3.15 ±  4%  perf-profile.calltrace.cycles-pp.ast_primary_plane_helper_atomic_update.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail
      1.79 ±  6%      -0.2        1.60 ± 22%      +1.4        3.15 ±  4%  perf-profile.calltrace.cycles-pp.drm_atomic_helper_commit_planes.drm_atomic_helper_commit_tail_rpm.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit
      1.80 ±  6%      -0.2        1.60 ± 22%      +1.4        3.16 ±  4%  perf-profile.calltrace.cycles-pp.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb.drm_fbdev_generic_helper_fb_dirty
      1.80 ±  6%      -0.2        1.60 ± 22%      +1.4        3.16 ±  4%  perf-profile.calltrace.cycles-pp.ast_mode_config_helper_atomic_commit_tail.commit_tail.drm_atomic_helper_commit.drm_atomic_commit.drm_atomic_helper_dirtyfb
      1.36 ± 10%      -0.2        1.18 ± 28%      +1.2        2.56 ±  5%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      0.83 ± 17%      -0.2        0.67 ± 53%      +0.8        1.66 ± 12%  perf-profile.calltrace.cycles-pp.filemap_remove_folio.truncate_inode_folio.shmem_undo_range.shmem_evict_inode.evict
      0.81 ± 17%      -0.1        0.66 ± 53%      +0.8        1.65 ± 12%  perf-profile.calltrace.cycles-pp.__folio_batch_release.shmem_undo_range.shmem_evict_inode.evict.do_unlinkat
      1.99 ±  5%      -0.1        1.84 ± 22%      +1.4        3.40 ±  5%  perf-profile.calltrace.cycles-pp.write
      1.98 ±  5%      -0.1        1.84 ± 22%      +1.4        3.40 ±  5%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      1.98 ±  5%      -0.1        1.84 ± 22%      +1.4        3.40 ±  5%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.write
      1.98 ±  5%      -0.1        1.84 ± 22%      +1.4        3.40 ±  5%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      1.98 ±  5%      -0.1        1.84 ± 22%      +1.4        3.40 ±  5%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      1.96 ±  5%      -0.1        1.83 ± 22%      +1.4        3.37 ±  5%  perf-profile.calltrace.cycles-pp.console_flush_all.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write
      1.96 ±  5%      -0.1        1.83 ± 22%      +1.4        3.37 ±  5%  perf-profile.calltrace.cycles-pp.console_unlock.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write
      1.96 ±  5%      -0.1        1.83 ± 22%      +1.4        3.37 ±  5%  perf-profile.calltrace.cycles-pp.vprintk_emit.devkmsg_emit.devkmsg_write.vfs_write.ksys_write
      1.96 ±  5%      -0.1        1.83 ± 22%      +1.4        3.37 ±  5%  perf-profile.calltrace.cycles-pp.devkmsg_emit.devkmsg_write.vfs_write.ksys_write.do_syscall_64
      1.96 ±  5%      -0.1        1.83 ± 22%      +1.4        3.37 ±  5%  perf-profile.calltrace.cycles-pp.devkmsg_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.80 ±  5%      -0.1        1.69 ± 22%      +1.3        3.13 ±  5%  perf-profile.calltrace.cycles-pp.serial8250_console_write.console_flush_all.console_unlock.vprintk_emit.devkmsg_emit
      0.55 ± 47%      -0.1        0.45 ± 74%      +0.7        1.27 ± 12%  perf-profile.calltrace.cycles-pp.__filemap_remove_folio.filemap_remove_folio.truncate_inode_folio.shmem_undo_range.shmem_evict_inode
      0.98 ± 17%      -0.1        0.89 ± 30%      +1.0        1.95 ± 12%  perf-profile.calltrace.cycles-pp.truncate_inode_folio.shmem_undo_range.shmem_evict_inode.evict.do_unlinkat
      1.38 ±  5%      -0.1        1.31 ± 20%      +1.0        2.38 ±  6%  perf-profile.calltrace.cycles-pp.wait_for_lsr.serial8250_console_write.console_flush_all.console_unlock.vprintk_emit
      1.18 ±  5%      -0.1        1.12 ± 20%      +0.9        2.04 ±  6%  perf-profile.calltrace.cycles-pp.io_serial_in.wait_for_lsr.serial8250_console_write.console_flush_all.console_unlock
      0.00            +0.0        0.00            +0.6        0.59 ± 13%  perf-profile.calltrace.cycles-pp.ktime_get.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
      0.00            +0.0        0.00            +0.6        0.61 ± 11%  perf-profile.calltrace.cycles-pp.find_lock_entries.shmem_undo_range.shmem_evict_inode.evict.do_unlinkat
      0.00            +0.0        0.00            +0.7        0.66 ±  8%  perf-profile.calltrace.cycles-pp.perf_adjust_freq_unthr_context.perf_event_task_tick.scheduler_tick.update_process_times.tick_sched_handle
      0.00            +0.0        0.00            +0.7        0.66 ±  5%  perf-profile.calltrace.cycles-pp.__do_softirq.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt
      0.00            +0.0        0.00            +0.7        0.66 ±  8%  perf-profile.calltrace.cycles-pp.perf_event_task_tick.scheduler_tick.update_process_times.tick_sched_handle.tick_nohz_highres_handler
      0.00            +0.0        0.00            +0.8        0.76 ± 12%  perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.__folio_batch_release.shmem_undo_range.shmem_evict_inode
      0.00            +0.1        0.09 ±223%      +0.8        0.80 ± 12%  perf-profile.calltrace.cycles-pp.perf_mux_hrtimer_handler.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
      0.00            +0.1        0.09 ±223%      +0.8        0.82 ±  7%  perf-profile.calltrace.cycles-pp.irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.acpi_safe_halt.acpi_idle_enter
      1.21            +0.5        1.69 ±  5%      -0.2        1.04 ±  5%  perf-profile.calltrace.cycles-pp.__munmap
      1.20            +0.5        1.68 ±  5%      -0.2        1.03 ±  5%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap
      1.20            +0.5        1.68 ±  5%      -0.2        1.03 ±  5%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap
      1.20            +0.5        1.68 ±  5%      -0.2        1.03 ±  5%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.do_vmi_align_munmap
      1.21            +0.5        1.69 ±  5%      -0.2        1.04 ±  5%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.21            +0.5        1.69 ±  5%      -0.2        1.04 ±  5%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.21            +0.5        1.69 ±  5%      -0.2        1.04 ±  5%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
      1.21            +0.5        1.69 ±  5%      -0.2        1.04 ±  5%  perf-profile.calltrace.cycles-pp.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      1.21            +0.5        1.69 ±  5%      -0.2        1.04 ±  5%  perf-profile.calltrace.cycles-pp.do_vmi_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.21            +0.5        1.69 ±  5%      -0.2        1.04 ±  5%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
      1.20            +0.5        1.69 ±  5%      -0.2        1.04 ±  5%  perf-profile.calltrace.cycles-pp.unmap_region.do_vmi_align_munmap.do_vmi_munmap.__vm_munmap.__x64_sys_munmap
      1.18            +0.5        1.67 ±  6%      -0.2        1.01 ±  6%  perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
      0.84 ±  2%      +0.6        1.43 ±  5%      -0.1        0.78 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      0.58 ±  3%      +0.6        1.18 ±  5%      -0.2        0.36 ± 71%  perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas
      0.00            +0.8        0.79 ±  4%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
      0.00            +1.0        1.02 ±  5%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio
      0.00            +1.1        1.08 ±  4%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
      0.00            +1.5        1.46 ±  5%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      3.29 ±  3%      +1.9        5.19            -0.3        3.00 ±  2%  perf-profile.calltrace.cycles-pp.finish_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
      3.02 ±  4%      +2.0        5.00            -0.3        2.77 ±  2%  perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_read_fault.do_fault.__handle_mm_fault
      2.84 ±  4%      +2.0        4.86            -0.2        2.60 ±  2%  perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault.do_fault
      2.73 ±  4%      +2.0        4.77            -0.2        2.50 ±  2%  perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault
      1.48 ±  4%      +2.1        3.56 ±  2%      -0.1        1.36 ±  3%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.folio_add_file_rmap_range.set_pte_range.finish_fault
      0.57 ±  4%      +2.8        3.35 ±  2%      -0.2        0.36 ± 70%  perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      1.96 ±  5%      +2.9        4.86 ±  2%      -0.3        1.65 ±  3%  perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      3.65 ±  2%      +3.1        6.77 ±  2%      -0.4        3.29 ±  3%  perf-profile.calltrace.cycles-pp.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
      0.80 ±  4%      +3.1        3.92 ±  3%      -0.0        0.77 ±  4%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      2.68 ±  3%      +3.4        6.08 ±  2%      -0.2        2.48 ±  5%  perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
      7.71 ±  6%      +3.9       11.66 ±  2%      -1.3        6.46 ±  2%  perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
     67.18            +6.3       73.46 ±  3%      -8.0       59.16        perf-profile.calltrace.cycles-pp.do_access
      1.46 ±  9%      +7.1        8.57 ± 16%      -0.0        1.44 ± 10%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio
      1.50 ±  9%      +7.1        8.61 ± 16%      -0.0        1.48 ± 10%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      1.38 ± 10%      +7.1        8.51 ± 16%      -0.0        1.38 ± 11%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru
     51.46            +7.6       59.08 ±  3%      -5.7       45.73        perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
      2.98 ±  5%      +7.7       10.66 ± 14%      -0.1        2.84 ±  5%  perf-profile.calltrace.cycles-pp.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault
      2.84 ±  6%      +7.7       10.56 ± 14%      -0.1        2.72 ±  5%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault
     34.18            +8.5       42.68 ±  4%      -4.1       30.12        perf-profile.calltrace.cycles-pp.__do_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
     34.14            +8.5       42.64 ±  4%      -4.1       30.09        perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_read_fault.do_fault.__handle_mm_fault
     33.95            +8.6       42.51 ±  4%      -4.0       29.91        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault.do_fault
     42.88            +8.8       51.70 ±  4%      -4.9       38.00        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     42.34            +9.0       51.30 ±  4%      -4.8       37.50        perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     42.29            +9.0       51.28 ±  4%      -4.8       37.46        perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     45.07            +9.6       54.62 ±  4%      -5.1       39.97        perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
     44.95            +9.6       54.53 ±  4%      -5.1       39.85        perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
     43.72            +9.9       53.64 ±  4%      -5.0       38.76        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
     17.28 ±  2%     +13.8       31.05 ±  6%      -2.1       15.20 ±  2%  perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
     21.14            -3.8       17.36 ±  7%      +3.1       24.20 ±  2%  perf-profile.children.cycles-pp.cpu_startup_entry
     21.14            -3.8       17.36 ±  7%      +3.1       24.20 ±  2%  perf-profile.children.cycles-pp.do_idle
     21.14            -3.8       17.36 ±  7%      +3.1       24.20 ±  2%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     21.09            -3.8       17.33 ±  7%      +3.0       24.13 ±  2%  perf-profile.children.cycles-pp.cpuidle_idle_call
     21.05            -3.8       17.29 ±  7%      +3.0       24.09 ±  2%  perf-profile.children.cycles-pp.start_secondary
     20.79            -3.7       17.07 ±  7%      +2.9       23.69        perf-profile.children.cycles-pp.cpuidle_enter
     20.78            -3.7       17.07 ±  7%      +2.9       23.67        perf-profile.children.cycles-pp.cpuidle_enter_state
     20.71            -3.7       17.01 ±  7%      +2.9       23.57        perf-profile.children.cycles-pp.acpi_safe_halt
     20.72            -3.7       17.02 ±  7%      +2.9       23.59        perf-profile.children.cycles-pp.acpi_idle_enter
     20.79            -3.6       17.19 ±  6%      +2.6       23.34        perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     11.52            -3.1        8.42            -1.2       10.31 ±  2%  perf-profile.children.cycles-pp.do_rw_once
      4.62            -1.4        3.24            -0.5        4.16 ±  2%  perf-profile.children.cycles-pp.filemap_map_pages
      2.89            -0.9        2.00            -0.3        2.61 ±  3%  perf-profile.children.cycles-pp.next_uptodate_folio
      3.98            -0.6        3.39 ±  5%      -0.5        3.43 ±  2%  perf-profile.children.cycles-pp.clear_page_erms
      4.46 ±  6%      -0.5        3.99 ±  3%      -0.8        3.70 ±  2%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      1.18 ±  4%      -0.4        0.74 ±  6%      -0.2        1.02 ±  4%  perf-profile.children.cycles-pp.shmem_inode_acct_blocks
      1.44 ±  2%      -0.4        1.00 ±  2%      -0.1        1.29 ±  2%  perf-profile.children.cycles-pp.shmem_alloc_folio
      1.40            -0.4        0.99            -0.1        1.26 ±  2%  perf-profile.children.cycles-pp.alloc_pages_mpol
      6.86            -0.4        6.47 ±  5%      -0.8        6.07 ±  3%  perf-profile.children.cycles-pp.native_irq_return_iret
      1.27            -0.4        0.90            -0.1        1.14 ±  3%  perf-profile.children.cycles-pp.__alloc_pages
      3.06 ±  4%      -0.4        2.68 ± 17%      +1.9        4.91 ±  3%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      1.01 ±  2%      -0.3        0.68            -0.1        0.89 ±  3%  perf-profile.children.cycles-pp.get_page_from_freelist
      1.02 ±  2%      -0.3        0.70 ±  4%      -0.1        0.93 ±  2%  perf-profile.children.cycles-pp.sync_regs
      2.34 ±  5%      -0.3        2.09 ± 15%      +1.3        3.69 ±  3%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.77 ±  2%      -0.3        0.51            -0.1        0.70 ±  3%  perf-profile.children.cycles-pp.rmqueue
      2.34 ±  5%      -0.3        2.08 ± 15%      +1.3        3.68 ±  3%  perf-profile.children.cycles-pp.hrtimer_interrupt
      2.00 ±  6%      -0.2        1.78 ± 21%      +1.5        3.45 ±  4%  perf-profile.children.cycles-pp.ret_from_fork
      2.00 ±  6%      -0.2        1.78 ± 21%      +1.5        3.45 ±  4%  perf-profile.children.cycles-pp.ret_from_fork_asm
      1.99 ±  6%      -0.2        1.77 ± 21%      +1.5        3.44 ±  4%  perf-profile.children.cycles-pp.kthread
      1.95 ±  6%      -0.2        1.74 ± 21%      +1.4        3.37 ±  4%  perf-profile.children.cycles-pp.process_one_work
      0.81 ±  2%      -0.2        0.60            -0.1        0.72 ±  4%  perf-profile.children.cycles-pp.__perf_sw_event
      2.04 ±  7%      -0.2        1.83 ± 17%      +1.2        3.21 ±  5%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      1.96 ±  6%      -0.2        1.74 ± 21%      +1.4        3.39 ±  4%  perf-profile.children.cycles-pp.worker_thread
      1.92 ±  6%      -0.2        1.71 ± 21%      +1.4        3.32 ±  4%  perf-profile.children.cycles-pp.drm_fb_helper_damage_work
      1.92 ±  6%      -0.2        1.71 ± 21%      +1.4        3.32 ±  4%  perf-profile.children.cycles-pp.drm_fbdev_generic_helper_fb_dirty
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.children.cycles-pp.__x64_sys_unlinkat
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.children.cycles-pp.do_unlinkat
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.children.cycles-pp.evict
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.children.cycles-pp.unlinkat
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.children.cycles-pp.shmem_evict_inode
      2.18 ± 17%      -0.2        1.97 ± 30%      +2.2        4.39 ± 12%  perf-profile.children.cycles-pp.shmem_undo_range
      1.81 ±  6%      -0.2        1.61 ± 22%      +1.4        3.18 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_dirtyfb
      1.81 ±  6%      -0.2        1.61 ± 22%      +1.4        3.18 ±  4%  perf-profile.children.cycles-pp.drm_atomic_commit
      1.81 ±  6%      -0.2        1.61 ± 22%      +1.4        3.18 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit
      0.53 ±  3%      -0.2        0.34 ±  2%      -0.1        0.47 ±  4%  perf-profile.children.cycles-pp.__rmqueue_pcplist
      1.79 ±  6%      -0.2        1.60 ± 22%      +1.4        3.15 ±  4%  perf-profile.children.cycles-pp.drm_fb_memcpy
      1.79 ±  6%      -0.2        1.60 ± 22%      +1.4        3.15 ±  4%  perf-profile.children.cycles-pp.memcpy_toio
      1.79 ±  6%      -0.2        1.60 ± 22%      +1.4        3.15 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_tail_rpm
      1.79 ±  6%      -0.2        1.60 ± 22%      +1.4        3.15 ±  4%  perf-profile.children.cycles-pp.ast_primary_plane_helper_atomic_update
      1.79 ±  6%      -0.2        1.60 ± 22%      +1.4        3.15 ±  4%  perf-profile.children.cycles-pp.drm_atomic_helper_commit_planes
      1.80 ±  6%      -0.2        1.60 ± 22%      +1.4        3.16 ±  4%  perf-profile.children.cycles-pp.commit_tail
      1.80 ±  6%      -0.2        1.60 ± 22%      +1.4        3.16 ±  4%  perf-profile.children.cycles-pp.ast_mode_config_helper_atomic_commit_tail
      1.50 ±  9%      -0.2        1.31 ± 18%      +0.7        2.23 ±  6%  perf-profile.children.cycles-pp.tick_nohz_highres_handler
      0.68 ±  2%      -0.2        0.50 ±  5%      +0.0        0.69 ±  3%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.65 ±  6%      -0.2        0.47 ±  2%      +0.0        0.69 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock
      0.47 ±  3%      -0.2        0.29 ±  2%      -0.1        0.42 ±  3%  perf-profile.children.cycles-pp.rmqueue_bulk
      1.32 ±  5%      -0.2        1.16 ± 15%      +0.6        1.88 ±  3%  perf-profile.children.cycles-pp.update_process_times
      0.65 ±  2%      -0.2        0.49            -0.1        0.58 ±  4%  perf-profile.children.cycles-pp.___perf_sw_event
      1.32 ±  5%      -0.2        1.16 ± 15%      +0.6        1.89 ±  3%  perf-profile.children.cycles-pp.tick_sched_handle
      0.64 ±  4%      -0.1        0.49 ±  5%      +0.1        0.71 ±  2%  perf-profile.children.cycles-pp.xas_load
      0.54            -0.1        0.39 ±  4%      -0.0        0.53 ±  3%  perf-profile.children.cycles-pp.__mod_node_page_state
      2.00 ±  4%      -0.1        1.86 ± 21%      +1.4        3.40 ±  5%  perf-profile.children.cycles-pp.vprintk_emit
      1.99 ±  5%      -0.1        1.85 ± 21%      +1.4        3.40 ±  5%  perf-profile.children.cycles-pp.write
      0.49 ±  2%      -0.1        0.35 ±  3%      -0.0        0.46 ±  4%  perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.54 ±  5%      -0.1        0.40 ±  2%      +0.0        0.56 ±  2%  perf-profile.children.cycles-pp.xas_find
      1.96 ±  5%      -0.1        1.83 ± 22%      +1.4        3.37 ±  5%  perf-profile.children.cycles-pp.devkmsg_emit
      1.96 ±  5%      -0.1        1.83 ± 22%      +1.4        3.37 ±  5%  perf-profile.children.cycles-pp.devkmsg_write
      1.98 ±  4%      -0.1        1.85 ± 21%      +1.4        3.40 ±  5%  perf-profile.children.cycles-pp.console_flush_all
      1.98 ±  4%      -0.1        1.85 ± 21%      +1.4        3.40 ±  5%  perf-profile.children.cycles-pp.console_unlock
      2.00 ±  4%      -0.1        1.88 ± 22%      +1.4        3.42 ±  5%  perf-profile.children.cycles-pp.ksys_write
      2.00 ±  4%      -0.1        1.88 ± 22%      +1.4        3.42 ±  5%  perf-profile.children.cycles-pp.vfs_write
      0.39 ±  4%      -0.1        0.28 ±  3%      -0.0        0.34 ±  6%  perf-profile.children.cycles-pp.__pte_offset_map_lock
      1.07 ±  5%      -0.1        0.96 ± 14%      +0.4        1.51 ±  2%  perf-profile.children.cycles-pp.scheduler_tick
      1.82 ±  5%      -0.1        1.71 ± 21%      +1.3        3.16 ±  6%  perf-profile.children.cycles-pp.serial8250_console_write
      0.39 ±  3%      -0.1        0.29 ±  3%      +0.0        0.42 ±  4%  perf-profile.children.cycles-pp.xas_descend
      0.32 ±  4%      -0.1        0.22 ±  8%      -0.0        0.27 ±  5%  perf-profile.children.cycles-pp.__dquot_alloc_space
      0.56 ±  8%      -0.1        0.47 ± 15%      +0.3        0.82 ±  7%  perf-profile.children.cycles-pp.xas_store
      1.64 ±  4%      -0.1        1.55 ± 21%      +1.2        2.85 ±  6%  perf-profile.children.cycles-pp.wait_for_lsr
      0.98 ± 17%      -0.1        0.89 ± 30%      +1.0        1.95 ± 12%  perf-profile.children.cycles-pp.truncate_inode_folio
      0.52 ±  7%      -0.1        0.43 ± 29%      +0.4        0.95 ±  5%  perf-profile.children.cycles-pp.irq_exit_rcu
      1.39 ±  5%      -0.1        1.31 ± 22%      +1.0        2.42 ±  6%  perf-profile.children.cycles-pp.io_serial_in
      0.30 ±  3%      -0.1        0.22 ±  3%      -0.0        0.28 ±  3%  perf-profile.children.cycles-pp.mas_walk
      1.06 ± 13%      -0.1        0.98 ± 22%      +0.8        1.87 ± 10%  perf-profile.children.cycles-pp.release_pages
      0.84 ± 17%      -0.1        0.76 ± 29%      +0.8        1.68 ± 12%  perf-profile.children.cycles-pp.filemap_remove_folio
      0.20 ± 13%      -0.1        0.13 ±  5%      -0.0        0.16 ± 13%  perf-profile.children.cycles-pp.shmem_recalc_inode
      0.44 ±  4%      -0.1        0.36 ± 28%      +0.4        0.79 ±  4%  perf-profile.children.cycles-pp.__do_softirq
      0.81 ± 17%      -0.1        0.74 ± 30%      +0.8        1.65 ± 12%  perf-profile.children.cycles-pp.__folio_batch_release
      0.26 ±  2%      -0.1        0.19 ±  3%      -0.0        0.23 ±  3%  perf-profile.children.cycles-pp.filemap_get_entry
      0.18 ±  5%      -0.1        0.12 ±  5%      -0.0        0.15 ±  6%  perf-profile.children.cycles-pp.xas_find_conflict
      0.55 ±  4%      -0.1        0.49 ± 14%      +0.3        0.82 ±  6%  perf-profile.children.cycles-pp.perf_adjust_freq_unthr_context
      0.55 ±  4%      -0.1        0.49 ± 14%      +0.3        0.82 ±  6%  perf-profile.children.cycles-pp.perf_event_task_tick
      0.28 ±  4%      -0.1        0.22 ±  8%      +0.0        0.28 ± 12%  perf-profile.children.cycles-pp.execve
      0.28 ±  4%      -0.1        0.22 ±  8%      +0.0        0.28 ± 12%  perf-profile.children.cycles-pp.__x64_sys_execve
      0.28 ±  4%      -0.1        0.22 ±  8%      +0.0        0.28 ± 12%  perf-profile.children.cycles-pp.do_execveat_common
      0.29 ±  3%      -0.1        0.24 ±  8%      -0.0        0.26 ± 11%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.63 ± 17%      -0.1        0.58 ± 30%      +0.6        1.27 ± 12%  perf-profile.children.cycles-pp.__filemap_remove_folio
      0.16 ±  5%      -0.1        0.11 ±  8%      -0.0        0.16 ±  3%  perf-profile.children.cycles-pp.error_entry
      0.25 ±  7%      -0.0        0.20 ± 22%      +0.1        0.37 ±  9%  perf-profile.children.cycles-pp._raw_spin_trylock
      0.15 ±  5%      -0.0        0.10 ± 10%      -0.0        0.12 ±  8%  perf-profile.children.cycles-pp.inode_add_bytes
      0.14 ±  5%      -0.0        0.09 ±  8%      -0.0        0.12 ± 13%  perf-profile.children.cycles-pp.__percpu_counter_limited_add
      0.07 ±  6%      -0.0        0.02 ± 99%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.__folio_throttle_swaprate
      0.40 ±  4%      -0.0        0.36 ± 13%      +0.2        0.60 ±  8%  perf-profile.children.cycles-pp.__intel_pmu_enable_all
      0.06 ±  7%      -0.0        0.02 ±142%      +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.read
      0.39 ± 17%      -0.0        0.34 ± 30%      +0.4        0.77 ± 12%  perf-profile.children.cycles-pp.free_unref_page_list
      0.10            -0.0        0.06 ± 13%      -0.0        0.10 ± 11%  perf-profile.children.cycles-pp.security_vm_enough_memory_mm
      0.18 ±  7%      -0.0        0.14 ± 13%      +0.1        0.24 ± 11%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.16 ±  5%      -0.0        0.12            -0.0        0.15 ±  4%  perf-profile.children.cycles-pp.handle_pte_fault
      0.17 ±  7%      -0.0        0.12 ±  4%      +0.0        0.17 ±  4%  perf-profile.children.cycles-pp.xas_start
      0.14 ±  6%      -0.0        0.10 ±  3%      -0.0        0.12 ±  6%  perf-profile.children.cycles-pp.__pte_offset_map
      0.26 ±  8%      -0.0        0.22 ± 33%      +0.2        0.48 ±  4%  perf-profile.children.cycles-pp.rebalance_domains
      0.31 ± 16%      -0.0        0.27 ± 28%      +0.3        0.62 ± 11%  perf-profile.children.cycles-pp.find_lock_entries
      0.07 ±  5%      -0.0        0.03 ± 70%      -0.0        0.06 ±  7%  perf-profile.children.cycles-pp.policy_nodemask
      0.06 ±  9%      -0.0        0.02 ±141%      +0.0        0.09 ± 13%  perf-profile.children.cycles-pp.lapic_next_deadline
      0.16 ±  4%      -0.0        0.13 ± 12%      -0.0        0.13 ±  4%  perf-profile.children.cycles-pp.folio_mark_accessed
      0.19 ±  4%      -0.0        0.16 ±  8%      -0.0        0.19 ± 10%  perf-profile.children.cycles-pp.bprm_execve
      0.11 ±  9%      -0.0        0.08 ±  6%      -0.0        0.10 ± 12%  perf-profile.children.cycles-pp.down_read_trylock
      0.16 ±  6%      -0.0        0.13 ±  5%      -0.0        0.14 ±  3%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.11 ±  6%      -0.0        0.08 ±  6%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.up_read
      0.15 ±  7%      -0.0        0.12 ± 13%      +0.0        0.20 ±  8%  perf-profile.children.cycles-pp.folio_unlock
      0.10 ±  4%      -0.0        0.07 ±  6%      +0.0        0.11 ± 13%  perf-profile.children.cycles-pp.__libc_fork
      0.07 ±  6%      -0.0        0.04 ± 45%      +0.0        0.10 ±  7%  perf-profile.children.cycles-pp.ksys_read
      0.10 ±  3%      -0.0        0.07 ± 11%      +0.0        0.10 ± 11%  perf-profile.children.cycles-pp.kernel_clone
      0.09 ±  5%      -0.0        0.06 ±  7%      +0.0        0.10 ±  7%  perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      0.07 ± 15%      -0.0        0.05 ± 72%      +0.0        0.11 ± 12%  perf-profile.children.cycles-pp.rcu_pending
      0.14 ± 17%      -0.0        0.11 ± 32%      +0.1        0.26 ± 11%  perf-profile.children.cycles-pp.xas_clear_mark
      0.25 ± 17%      -0.0        0.22 ± 30%      +0.2        0.50 ± 12%  perf-profile.children.cycles-pp.free_unref_page_commit
      0.17 ±  3%      -0.0        0.14 ± 23%      +0.1        0.29 ±  6%  perf-profile.children.cycles-pp.load_balance
      0.08 ±  8%      -0.0        0.06 ± 11%      -0.0        0.08 ± 12%  perf-profile.children.cycles-pp.path_openat
      0.09 ±  5%      -0.0        0.06 ± 11%      +0.0        0.09 ± 12%  perf-profile.children.cycles-pp.__x64_sys_openat
      0.08 ±  8%      -0.0        0.06 ± 11%      +0.0        0.08 ± 16%  perf-profile.children.cycles-pp.do_filp_open
      0.07            -0.0        0.04 ± 45%      +0.0        0.10 ±  5%  perf-profile.children.cycles-pp.vfs_read
      0.10 ±  6%      -0.0        0.08 ±  6%      -0.0        0.09 ±  5%  perf-profile.children.cycles-pp.pte_offset_map_nolock
      0.09 ±  4%      -0.0        0.06 ±  7%      +0.0        0.09 ±  9%  perf-profile.children.cycles-pp.__do_sys_clone
      0.14 ± 10%      -0.0        0.11 ± 15%      +0.1        0.22 ± 14%  perf-profile.children.cycles-pp.perf_rotate_context
      0.13 ± 18%      -0.0        0.10 ± 33%      +0.1        0.23 ± 13%  perf-profile.children.cycles-pp.irqtime_account_irq
      0.08 ±  8%      -0.0        0.06 ± 11%      +0.0        0.09 ± 12%  perf-profile.children.cycles-pp.do_sys_openat2
      0.07 ±  5%      -0.0        0.04 ± 45%      +0.0        0.07 ± 10%  perf-profile.children.cycles-pp.copy_process
      0.07 ± 17%      -0.0        0.04 ± 75%      +0.1        0.13 ± 11%  perf-profile.children.cycles-pp.filemap_free_folio
      0.10 ± 20%      -0.0        0.08 ± 29%      +0.1        0.18 ± 15%  perf-profile.children.cycles-pp.xas_init_marks
      0.13 ±  5%      -0.0        0.11 ± 23%      +0.1        0.22 ±  8%  perf-profile.children.cycles-pp.update_sd_lb_stats
      0.16 ±  5%      -0.0        0.14 ±  6%      +0.0        0.16 ± 10%  perf-profile.children.cycles-pp.exec_binprm
      0.16 ±  4%      -0.0        0.13 ± 21%      +0.1        0.23 ±  8%  perf-profile.children.cycles-pp.fbcon_redraw
      0.16 ±  4%      -0.0        0.13 ± 21%      +0.1        0.23 ±  8%  perf-profile.children.cycles-pp.con_scroll
      0.16 ±  4%      -0.0        0.13 ± 21%      +0.1        0.23 ±  8%  perf-profile.children.cycles-pp.fbcon_scroll
      0.16 ±  4%      -0.0        0.13 ± 21%      +0.1        0.23 ±  8%  perf-profile.children.cycles-pp.lf
      0.10 ±  6%      -0.0        0.08 ±  7%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.__vm_enough_memory
      0.16 ±  4%      -0.0        0.14 ±  6%      +0.0        0.16 ± 10%  perf-profile.children.cycles-pp.search_binary_handler
      0.09 ±  5%      -0.0        0.07 ±  7%      -0.0        0.07 ±  6%  perf-profile.children.cycles-pp._compound_head
      0.08            -0.0        0.06 ±  9%      -0.0        0.07 ±  9%  perf-profile.children.cycles-pp.__irqentry_text_end
      0.09 ±  4%      -0.0        0.07 ± 14%      +0.0        0.12 ± 11%  perf-profile.children.cycles-pp.__schedule
      0.15 ±  5%      -0.0        0.13 ±  7%      -0.0        0.14 ±  8%  perf-profile.children.cycles-pp.xas_create
      0.06 ± 11%      -0.0        0.04 ± 75%      +0.0        0.09 ± 14%  perf-profile.children.cycles-pp.trigger_load_balance
      0.15 ±  5%      -0.0        0.13 ± 20%      +0.1        0.21 ±  8%  perf-profile.children.cycles-pp.bit_putcs
      0.16 ±  3%      -0.0        0.14 ± 20%      +0.1        0.24 ±  8%  perf-profile.children.cycles-pp.vt_console_print
      0.24 ±  6%      -0.0        0.22 ± 34%      +0.2        0.44 ±  5%  perf-profile.children.cycles-pp.wait_for_xmitr
      0.14 ±  3%      -0.0        0.12 ± 24%      +0.1        0.23 ±  8%  perf-profile.children.cycles-pp.find_busiest_group
      0.15 ±  4%      -0.0        0.14 ±  8%      +0.0        0.16 ±  9%  perf-profile.children.cycles-pp.load_elf_binary
      0.09 ± 11%      -0.0        0.07 ± 26%      +0.0        0.13 ± 11%  perf-profile.children.cycles-pp.rcu_sched_clock_irq
      0.15 ±  4%      -0.0        0.13 ± 21%      +0.1        0.22 ±  8%  perf-profile.children.cycles-pp.fbcon_putcs
      0.06 ±  8%      -0.0        0.04 ± 72%      +0.0        0.09 ±  6%  perf-profile.children.cycles-pp.update_rq_clock_task
      0.12 ±  4%      -0.0        0.10 ± 22%      +0.1        0.18 ±  9%  perf-profile.children.cycles-pp.update_sg_lb_stats
      0.18 ± 19%      -0.0        0.17 ± 29%      +0.2        0.36 ± 11%  perf-profile.children.cycles-pp.free_pcppages_bulk
      0.21 ±  7%      -0.0        0.19 ±  5%      +0.0        0.25 ±  6%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.18 ±  9%      -0.0        0.16 ± 19%      +0.1        0.30 ±  5%  perf-profile.children.cycles-pp.io_serial_out
      0.58 ± 36%      -0.0        0.56 ± 22%      +0.4        1.00 ± 20%  perf-profile.children.cycles-pp.ktime_get
      0.12 ±  5%      -0.0        0.10 ± 20%      +0.1        0.18 ±  9%  perf-profile.children.cycles-pp.fast_imageblit
      0.12 ±  5%      -0.0        0.10 ± 20%      +0.1        0.18 ±  9%  perf-profile.children.cycles-pp.sys_imageblit
      0.06 ± 17%      -0.0        0.05 ± 74%      +0.1        0.13 ± 17%  perf-profile.children.cycles-pp.truncate_cleanup_folio
      0.08 ± 20%      -0.0        0.06 ± 50%      +0.1        0.16 ±  8%  perf-profile.children.cycles-pp.free_unref_page_prepare
      0.07 ±  5%      -0.0        0.06 ± 11%      +0.0        0.10 ± 13%  perf-profile.children.cycles-pp.schedule
      0.08 ± 16%      -0.0        0.07 ± 16%      +0.1        0.13 ± 23%  perf-profile.children.cycles-pp.ktime_get_update_offsets_now
      0.12 ±  5%      -0.0        0.10 ± 23%      +0.1        0.18 ±  9%  perf-profile.children.cycles-pp.drm_fbdev_generic_defio_imageblit
      0.16 ± 19%      -0.0        0.14 ± 30%      +0.1        0.30 ± 11%  perf-profile.children.cycles-pp.__free_one_page
      0.12 ±  4%      -0.0        0.10 ±  3%      -0.0        0.10 ±  7%  perf-profile.children.cycles-pp.kmem_cache_alloc_lru
      0.48 ± 12%      -0.0        0.46 ± 15%      +0.4        0.88 ± 11%  perf-profile.children.cycles-pp.perf_mux_hrtimer_handler
      6.18 ±  6%      -0.0        6.17 ± 17%      +3.5        9.70 ±  5%  perf-profile.children.cycles-pp.do_syscall_64
      0.08 ±  5%      -0.0        0.07 ± 21%      +0.1        0.15 ± 12%  perf-profile.children.cycles-pp.rcu_core
      0.11 ± 11%      -0.0        0.10 ± 15%      +0.0        0.14 ±  5%  perf-profile.children.cycles-pp.memcpy_orig
      0.25 ±  6%      -0.0        0.24 ± 22%      +0.2        0.43 ± 13%  perf-profile.children.cycles-pp.delay_tsc
      0.06 ±  7%      -0.0        0.06 ±  9%      +0.0        0.09 ±  9%  perf-profile.children.cycles-pp.__mod_zone_page_state
      6.18 ±  6%      -0.0        6.18 ± 17%      +3.5        9.70 ±  5%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.02 ± 99%      -0.0        0.02 ±141%      +0.0        0.06 ±  6%  perf-profile.children.cycles-pp.update_irq_load_avg
      0.02 ± 99%      -0.0        0.02 ±142%      +0.1        0.09 ± 12%  perf-profile.children.cycles-pp.run_rebalance_domains
      0.02 ± 99%      -0.0        0.02 ±142%      +0.1        0.09 ± 12%  perf-profile.children.cycles-pp.update_blocked_averages
      0.03 ±100%      -0.0        0.02 ±141%      +0.1        0.09 ± 18%  perf-profile.children.cycles-pp.irq_enter_rcu
      0.07 ± 11%      -0.0        0.06 ± 19%      +0.1        0.13 ± 17%  perf-profile.children.cycles-pp.rcu_do_batch
      0.03 ± 70%      -0.0        0.03 ±100%      +0.1        0.09 ± 17%  perf-profile.children.cycles-pp.uncharge_folio
      0.08 ± 17%      -0.0        0.08 ± 27%      +0.1        0.16 ± 14%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
      0.01 ±223%      +0.0        0.01 ±223%      +0.1        0.06 ± 16%  perf-profile.children.cycles-pp.uncharge_batch
      0.00            +0.0        0.00            +0.1        0.05 ±  8%  perf-profile.children.cycles-pp.sched_clock
      0.00            +0.0        0.00            +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.native_sched_clock
      0.00            +0.0        0.00            +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.sched_clock_cpu
      0.00            +0.0        0.00            +0.1        0.06 ± 19%  perf-profile.children.cycles-pp.__slab_free
      0.00            +0.0        0.00            +0.1        0.06 ± 11%  perf-profile.children.cycles-pp.read_tsc
      0.00            +0.0        0.00            +0.1        0.07 ± 10%  perf-profile.children.cycles-pp.get_pfnblock_flags_mask
      0.20 ± 18%      +0.0        0.21 ± 30%      +0.2        0.44 ± 13%  perf-profile.children.cycles-pp.filemap_unaccount_folio
      0.01 ±223%      +0.0        0.02 ±141%      +0.1        0.06 ± 16%  perf-profile.children.cycles-pp.kmem_cache_free
      0.05 ±  8%      +0.0        0.08 ±  8%      -0.0        0.03 ±100%  perf-profile.children.cycles-pp.propagate_protected_usage
      0.54 ±  2%      +0.0        0.57 ±  4%      -0.1        0.45 ±  5%  perf-profile.children.cycles-pp.try_charge_memcg
      0.25 ±  2%      +0.0        0.30 ±  4%      -0.0        0.22 ±  9%  perf-profile.children.cycles-pp.page_counter_try_charge
      0.02 ±141%      +0.0        0.06 ±  7%      +0.0        0.06 ±  9%  perf-profile.children.cycles-pp.mod_objcg_state
      0.00            +0.1        0.07 ± 14%      +0.0        0.01 ±223%  perf-profile.children.cycles-pp.tlb_finish_mmu
      1.25            +0.5        1.72 ±  5%      -0.2        1.09 ±  5%  perf-profile.children.cycles-pp.unmap_vmas
      1.24            +0.5        1.71 ±  5%      -0.2        1.08 ±  5%  perf-profile.children.cycles-pp.zap_pte_range
      1.24            +0.5        1.71 ±  5%      -0.2        1.08 ±  5%  perf-profile.children.cycles-pp.unmap_page_range
      1.24            +0.5        1.71 ±  5%      -0.2        1.08 ±  5%  perf-profile.children.cycles-pp.zap_pmd_range
      1.21            +0.5        1.69 ±  5%      -0.2        1.04 ±  5%  perf-profile.children.cycles-pp.__munmap
      1.22            +0.5        1.71 ±  5%      -0.2        1.06 ±  5%  perf-profile.children.cycles-pp.__vm_munmap
      1.21            +0.5        1.70 ±  5%      -0.2        1.05 ±  5%  perf-profile.children.cycles-pp.__x64_sys_munmap
      1.25            +0.5        1.74 ±  5%      -0.2        1.08 ±  4%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      1.25            +0.5        1.74 ±  5%      -0.2        1.09 ±  5%  perf-profile.children.cycles-pp.do_vmi_munmap
      1.22            +0.5        1.72 ±  5%      -0.2        1.06 ±  6%  perf-profile.children.cycles-pp.unmap_region
      0.85 ±  2%      +0.6        1.44 ±  5%      -0.1        0.79 ±  2%  perf-profile.children.cycles-pp.lru_add_fn
      0.60 ±  3%      +0.6        1.20 ±  4%      -0.1        0.54 ±  7%  perf-profile.children.cycles-pp.page_remove_rmap
      3.30 ±  3%      +1.9        5.20            -0.3        3.02        perf-profile.children.cycles-pp.finish_fault
      3.04 ±  4%      +2.0        5.01            -0.2        2.79 ±  2%  perf-profile.children.cycles-pp.set_pte_range
      2.85 ±  4%      +2.0        4.87            -0.2        2.61 ±  2%  perf-profile.children.cycles-pp.folio_add_file_rmap_range
      1.97 ±  5%      +2.9        4.88 ±  2%      -0.3        1.66 ±  2%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      3.69 ±  2%      +3.1        6.80 ±  2%      -0.4        3.32 ±  3%  perf-profile.children.cycles-pp.shmem_add_to_page_cache
      7.74 ±  6%      +3.9       11.69 ±  2%      -1.3        6.48 ±  2%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      0.80 ±  4%      +4.0        4.85 ±  3%      -0.1        0.74 ±  5%  perf-profile.children.cycles-pp.__count_memcg_events
      6.12 ±  3%      +6.1       12.18            -0.3        5.85 ±  4%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
      2.99 ±  3%      +6.6        9.56 ±  2%      +0.0        3.03 ±  2%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
     61.44            +6.7       68.11 ±  3%      -7.1       54.29        perf-profile.children.cycles-pp.do_access
      1.58 ±  9%      +7.1        8.72 ± 16%      +0.0        1.59 ±  9%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      1.45 ±  9%      +7.2        8.63 ± 16%      -0.0        1.45 ± 10%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      1.53 ±  9%      +7.2        8.72 ± 16%      -0.0        1.51 ± 10%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      2.98 ±  5%      +7.7       10.67 ± 14%      -0.1        2.84 ±  5%  perf-profile.children.cycles-pp.folio_add_lru
      2.86 ±  6%      +7.8       10.63 ± 14%      -0.1        2.74 ±  5%  perf-profile.children.cycles-pp.folio_batch_move_lru
     49.12            +8.3       57.47 ±  3%      -5.5       43.65        perf-profile.children.cycles-pp.asm_exc_page_fault
     34.19            +8.5       42.68 ±  4%      -4.1       30.13        perf-profile.children.cycles-pp.__do_fault
     34.15            +8.5       42.65 ±  4%      -4.1       30.09        perf-profile.children.cycles-pp.shmem_fault
     33.99            +8.6       42.54 ±  4%      -4.0       29.95        perf-profile.children.cycles-pp.shmem_get_folio_gfp
     43.06            +8.8       51.84 ±  4%      -4.9       38.17        perf-profile.children.cycles-pp.__handle_mm_fault
     42.43            +8.9       51.37 ±  4%      -4.8       37.59        perf-profile.children.cycles-pp.do_fault
     42.38            +9.0       51.34 ±  4%      -4.8       37.55        perf-profile.children.cycles-pp.do_read_fault
     45.26            +9.5       54.78 ±  4%      -5.1       40.16        perf-profile.children.cycles-pp.exc_page_fault
     45.15            +9.5       54.69 ±  4%      -5.1       40.05        perf-profile.children.cycles-pp.do_user_addr_fault
     43.91            +9.9       53.80 ±  4%      -5.0       38.95        perf-profile.children.cycles-pp.handle_mm_fault
     17.31 ±  2%     +13.8       31.07 ±  5%      -2.1       15.22 ±  2%  perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
     12.24            -4.5        7.76 ±  3%      -1.4       10.87 ±  2%  perf-profile.self.cycles-pp.shmem_get_folio_gfp
     17.96            -3.3       14.66 ±  4%      +0.6       18.55        perf-profile.self.cycles-pp.acpi_safe_halt
     10.95            -3.2        7.74            -1.1        9.82 ±  2%  perf-profile.self.cycles-pp.do_rw_once
      5.96            -1.4        4.58 ±  2%      -0.7        5.29        perf-profile.self.cycles-pp.do_access
      2.40            -0.8        1.64            -0.2        2.16 ±  3%  perf-profile.self.cycles-pp.next_uptodate_folio
      3.92            -0.6        3.36 ±  5%      -0.5        3.39 ±  2%  perf-profile.self.cycles-pp.clear_page_erms
      4.40 ±  6%      -0.5        3.95 ±  3%      -0.8        3.65 ±  2%  perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      1.52 ±  2%      -0.4        1.10 ±  2%      -0.2        1.36 ±  2%  perf-profile.self.cycles-pp.filemap_map_pages
      6.86            -0.4        6.47 ±  5%      -0.8        6.07 ±  3%  perf-profile.self.cycles-pp.native_irq_return_iret
      1.02 ±  2%      -0.3        0.70 ±  4%      -0.1        0.92 ±  2%  perf-profile.self.cycles-pp.sync_regs
      0.50 ±  7%      -0.2        0.27 ±  5%      -0.1        0.41 ±  9%  perf-profile.self.cycles-pp.shmem_inode_acct_blocks
      1.78 ±  6%      -0.2        1.58 ± 22%      +1.4        3.14 ±  4%  perf-profile.self.cycles-pp.memcpy_toio
      0.63 ±  5%      -0.2        0.46 ±  3%      +0.0        0.67 ±  5%  perf-profile.self.cycles-pp._raw_spin_lock
      0.42 ±  2%      -0.1        0.27 ±  2%      -0.0        0.37 ±  4%  perf-profile.self.cycles-pp.rmqueue_bulk
      0.52            -0.1        0.38 ±  4%      -0.0        0.50 ±  4%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.56 ±  2%      -0.1        0.42            -0.1        0.50 ±  4%  perf-profile.self.cycles-pp.___perf_sw_event
      0.31 ±  3%      -0.1        0.20 ±  2%      -0.1        0.24 ±  4%  perf-profile.self.cycles-pp.shmem_add_to_page_cache
      0.38 ±  4%      -0.1        0.28            -0.0        0.35 ±  4%  perf-profile.self.cycles-pp.__handle_mm_fault
      0.36 ±  4%      -0.1        0.26 ±  2%      +0.0        0.39 ±  4%  perf-profile.self.cycles-pp.xas_descend
      0.30 ±  2%      -0.1        0.22 ±  2%      -0.0        0.27 ±  4%  perf-profile.self.cycles-pp.mas_walk
      1.39 ±  5%      -0.1        1.31 ± 22%      +1.0        2.42 ±  6%  perf-profile.self.cycles-pp.io_serial_in
      0.33 ±  3%      -0.1        0.26 ± 10%      -0.0        0.29 ±  5%  perf-profile.self.cycles-pp.lru_add_fn
      0.44 ±  9%      -0.1        0.38 ± 17%      +0.2        0.65 ±  6%  perf-profile.self.cycles-pp.release_pages
      0.20 ±  3%      -0.1        0.14 ±  5%      -0.0        0.18 ±  5%  perf-profile.self.cycles-pp.asm_exc_page_fault
      0.21 ±  5%      -0.1        0.15 ±  6%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.get_page_from_freelist
      0.26 ±  9%      -0.1        0.20 ± 15%      +0.1        0.34 ±  8%  perf-profile.self.cycles-pp.xas_store
      0.16 ±  7%      -0.1        0.11 ±  6%      -0.0        0.14 ±  7%  perf-profile.self.cycles-pp.__perf_sw_event
      0.18 ±  2%      -0.1        0.13 ±  5%      -0.0        0.16 ±  6%  perf-profile.self.cycles-pp.__alloc_pages
      0.22 ±  4%      -0.1        0.17 ±  4%      -0.0        0.22 ±  3%  perf-profile.self.cycles-pp.handle_mm_fault
      0.20 ±  8%      -0.1        0.14 ±  5%      +0.0        0.22 ±  5%  perf-profile.self.cycles-pp.xas_find
      0.15 ±  6%      -0.0        0.10 ±  7%      -0.0        0.15 ±  3%  perf-profile.self.cycles-pp.error_entry
      0.25 ±  8%      -0.0        0.20 ± 24%      +0.1        0.37 ± 10%  perf-profile.self.cycles-pp._raw_spin_trylock
      0.40 ±  4%      -0.0        0.36 ± 13%      +0.2        0.60 ±  8%  perf-profile.self.cycles-pp.__intel_pmu_enable_all
      0.17 ±  2%      -0.0        0.12 ±  6%      -0.0        0.14 ±  3%  perf-profile.self.cycles-pp.__dquot_alloc_space
      0.17 ±  6%      -0.0        0.13 ± 10%      +0.0        0.22 ± 10%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.22 ±  4%      -0.0        0.18 ±  9%      +0.0        0.25 ±  3%  perf-profile.self.cycles-pp.xas_load
      0.23 ±  4%      -0.0        0.19 ± 10%      -0.0        0.20 ±  5%  perf-profile.self.cycles-pp.zap_pte_range
      0.12 ±  7%      -0.0        0.08 ± 10%      -0.0        0.11 ± 15%  perf-profile.self.cycles-pp.__percpu_counter_limited_add
      0.14 ±  3%      -0.0        0.09 ±  7%      -0.0        0.13 ± 10%  perf-profile.self.cycles-pp.rmqueue
      0.15 ±  2%      -0.0        0.11 ±  6%      -0.0        0.13 ±  7%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.12 ±  7%      -0.0        0.08 ±  5%      -0.0        0.11 ±  4%  perf-profile.self.cycles-pp.folio_add_lru
      0.15 ±  6%      -0.0        0.10 ±  9%      +0.0        0.16 ±  5%  perf-profile.self.cycles-pp.__mod_lruvec_state
      0.16 ±  7%      -0.0        0.12 ±  4%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.xas_start
      0.30 ± 10%      -0.0        0.26 ± 29%      +0.2        0.52 ± 10%  perf-profile.self.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.06 ±  7%      -0.0        0.02 ± 99%      -0.0        0.06        perf-profile.self.cycles-pp.finish_fault
      0.16 ±  4%      -0.0        0.12 ± 12%      -0.0        0.13 ±  7%  perf-profile.self.cycles-pp.folio_mark_accessed
      0.11 ±  8%      -0.0        0.08 ±  6%      -0.0        0.10 ± 14%  perf-profile.self.cycles-pp.__pte_offset_map_lock
      0.13 ±  6%      -0.0        0.09 ±  4%      -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.__pte_offset_map
      0.06 ±  9%      -0.0        0.02 ±141%      +0.0        0.09 ± 13%  perf-profile.self.cycles-pp.lapic_next_deadline
      0.11 ±  9%      -0.0        0.08 ±  6%      -0.0        0.10 ± 12%  perf-profile.self.cycles-pp.down_read_trylock
      0.12 ±  3%      -0.0        0.09 ±  5%      -0.0        0.11 ±  6%  perf-profile.self.cycles-pp.do_read_fault
      0.14 ±  8%      -0.0        0.12 ± 14%      +0.0        0.19 ±  9%  perf-profile.self.cycles-pp.folio_unlock
      0.16 ±  4%      -0.0        0.12 ±  4%      -0.0        0.14 ±  4%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.09 ±  5%      -0.0        0.06 ±  7%      +0.0        0.10 ±  9%  perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
      0.11 ±  6%      -0.0        0.08 ±  5%      +0.0        0.13 ±  6%  perf-profile.self.cycles-pp.shmem_alloc_and_add_folio
      0.06 ± 14%      -0.0        0.03 ±101%      +0.1        0.12 ± 15%  perf-profile.self.cycles-pp.free_unref_page_commit
      0.08 ±  8%      -0.0        0.05            -0.0        0.06 ±  8%  perf-profile.self.cycles-pp.xas_find_conflict
      0.10 ±  6%      -0.0        0.07 ±  6%      -0.0        0.08 ±  5%  perf-profile.self.cycles-pp.up_read
      0.12 ±  4%      -0.0        0.09 ±  7%      -0.0        0.11 ±  8%  perf-profile.self.cycles-pp.folio_add_file_rmap_range
      0.25 ± 15%      -0.0        0.22 ± 29%      +0.2        0.49 ± 12%  perf-profile.self.cycles-pp.find_lock_entries
      0.12 ±  4%      -0.0        0.10 ±  6%      +0.0        0.14 ±  5%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.13 ±  6%      -0.0        0.10 ±  9%      -0.0        0.11 ± 10%  perf-profile.self.cycles-pp.page_remove_rmap
      0.09 ±  4%      -0.0        0.07 ±  7%      -0.0        0.09 ±  5%  perf-profile.self.cycles-pp.exc_page_fault
      0.22 ±  6%      -0.0        0.19 ± 17%      +0.1        0.34 ±  5%  perf-profile.self.cycles-pp.perf_adjust_freq_unthr_context
      0.08            -0.0        0.06 ±  8%      -0.0        0.07 ± 11%  perf-profile.self.cycles-pp.__irqentry_text_end
      0.13 ± 18%      -0.0        0.10 ± 31%      +0.1        0.24 ± 12%  perf-profile.self.cycles-pp.xas_clear_mark
      0.19 ±  5%      -0.0        0.17 ±  5%      +0.0        0.21 ±  6%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.09 ±  6%      -0.0        0.07 ±  5%      -0.0        0.09 ± 12%  perf-profile.self.cycles-pp.set_pte_range
      0.05 ± 46%      -0.0        0.03 ±102%      +0.1        0.11 ± 15%  perf-profile.self.cycles-pp.free_unref_page_list
      0.07 ±  5%      -0.0        0.05 ±  7%      -0.0        0.06 ±  6%  perf-profile.self.cycles-pp._compound_head
      0.08            -0.0        0.06 ±  9%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.lock_vma_under_rcu
      0.06 ± 17%      -0.0        0.04 ± 75%      +0.1        0.13 ± 13%  perf-profile.self.cycles-pp.filemap_free_folio
      0.06 ±  9%      -0.0        0.04 ± 73%      +0.0        0.09 ± 15%  perf-profile.self.cycles-pp.trigger_load_balance
      0.10 ± 23%      -0.0        0.09 ± 35%      +0.1        0.20 ± 16%  perf-profile.self.cycles-pp.irqtime_account_irq
      0.06 ±  9%      -0.0        0.04 ± 45%      +0.0        0.08 ±  8%  perf-profile.self.cycles-pp.__mod_zone_page_state
      0.08 ± 22%      -0.0        0.06 ± 51%      +0.1        0.14 ± 14%  perf-profile.self.cycles-pp.filemap_remove_folio
      0.55 ± 38%      -0.0        0.53 ± 24%      +0.4        0.95 ± 21%  perf-profile.self.cycles-pp.ktime_get
      0.28            -0.0        0.27 ±  6%      -0.1        0.23 ±  6%  perf-profile.self.cycles-pp.try_charge_memcg
      0.18 ±  9%      -0.0        0.16 ± 19%      +0.1        0.30 ±  5%  perf-profile.self.cycles-pp.io_serial_out
      0.12 ±  5%      -0.0        0.10 ± 20%      +0.1        0.18 ±  9%  perf-profile.self.cycles-pp.fast_imageblit
      0.09 ±  5%      -0.0        0.07 ± 21%      +0.1        0.14 ±  9%  perf-profile.self.cycles-pp.update_sg_lb_stats
      0.06 ± 16%      -0.0        0.05 ± 74%      +0.1        0.12 ± 15%  perf-profile.self.cycles-pp.truncate_cleanup_folio
      0.14 ± 19%      -0.0        0.13 ± 32%      +0.1        0.28 ± 11%  perf-profile.self.cycles-pp.__free_one_page
      0.25 ±  6%      -0.0        0.24 ± 22%      +0.2        0.43 ± 13%  perf-profile.self.cycles-pp.delay_tsc
      0.07 ± 16%      -0.0        0.06 ± 17%      +0.0        0.12 ± 25%  perf-profile.self.cycles-pp.ktime_get_update_offsets_now
      0.03 ±102%      -0.0        0.02 ±142%      +0.1        0.09 ± 13%  perf-profile.self.cycles-pp.__filemap_remove_folio
      0.02 ± 99%      -0.0        0.02 ±141%      +0.0        0.06 ±  6%  perf-profile.self.cycles-pp.update_irq_load_avg
      0.02 ± 99%      -0.0        0.02 ±141%      +0.0        0.07 ±  8%  perf-profile.self.cycles-pp.menu_select
      0.02 ±142%      -0.0        0.02 ±141%      +0.1        0.08 ± 13%  perf-profile.self.cycles-pp.free_unref_page_prepare
      0.00            +0.0        0.00            +0.1        0.06 ±  9%  perf-profile.self.cycles-pp.native_sched_clock
      0.00            +0.0        0.00            +0.1        0.06 ± 19%  perf-profile.self.cycles-pp.__slab_free
      0.00            +0.0        0.00            +0.1        0.06 ± 11%  perf-profile.self.cycles-pp.read_tsc
      0.00            +0.0        0.00            +0.1        0.07 ± 10%  perf-profile.self.cycles-pp.get_pfnblock_flags_mask
      0.02 ±141%      +0.0        0.02 ±142%      +0.1        0.08 ± 16%  perf-profile.self.cycles-pp.uncharge_folio
      0.05 ±  8%      +0.0        0.08 ±  8%      -0.0        0.03 ±100%  perf-profile.self.cycles-pp.propagate_protected_usage
      1.31 ±  6%      +0.1        1.43 ±  2%      -0.2        1.07 ±  3%  perf-profile.self.cycles-pp.mem_cgroup_commit_charge
      2.93 ±  4%      +0.4        3.35 ±  3%      -0.2        2.71 ±  6%  perf-profile.self.cycles-pp.__mod_lruvec_page_state
      0.77 ±  7%      +1.5        2.23 ±  3%      -0.1        0.67 ±  3%  perf-profile.self.cycles-pp.__mem_cgroup_charge
      0.75 ±  4%      +4.0        4.80 ±  3%      -0.1        0.70 ±  5%  perf-profile.self.cycles-pp.__count_memcg_events
      2.83 ±  3%      +6.6        9.40 ±  2%      +0.0        2.84 ±  2%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      1.45 ±  9%      +7.2        8.63 ± 16%      -0.0        1.45 ± 10%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath





[-- Attachment #3: will-it-scale-tlb_flush2 --]
[-- Type: text/plain, Size: 38804 bytes --]


=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/tlb_flush2/will-it-scale

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
      4.05            -1.2        2.81            +0.0        4.06        mpstat.cpu.all.usr%
    118438 ± 14%     -27.8%      85543 ± 57%     -47.1%      62659 ± 72%  numa-meminfo.node0.AnonHugePages
    193.83 ±  6%     +69.3%     328.17 ±  8%      +0.5%     194.83 ±  7%  perf-c2c.DRAM.local
      1216 ±  8%     +27.1%       1546 ±  6%      +8.2%       1316 ±  8%  perf-c2c.DRAM.remote
    150.33 ± 13%     -40.0%      90.17 ± 13%     +10.9%     166.67 ±  8%  perf-c2c.HITM.remote
      0.04           -25.0%       0.03            +0.0%       0.04        turbostat.IPC
    316.16            -1.5%     311.47            -0.3%     315.25        turbostat.PkgWatt
     30.54            +4.9%      32.04            -0.5%      30.38        turbostat.RAMWatt
   2132437           -32.3%    1444430            +0.9%    2151460        will-it-scale.52.processes
     41008           -32.3%      27776            +0.9%      41373        will-it-scale.per_process_ops
   2132437           -32.3%    1444430            +0.9%    2151460        will-it-scale.workload
 3.113e+08 ±  3%     -31.7%  2.125e+08 ±  4%      +2.1%   3.18e+08 ±  2%  numa-numastat.node0.local_node
 3.114e+08 ±  3%     -31.7%  2.126e+08 ±  4%      +2.1%   3.18e+08 ±  2%  numa-numastat.node0.numa_hit
 3.322e+08 ±  2%     -32.5%  2.243e+08 ±  3%      -0.3%  3.312e+08 ±  3%  numa-numastat.node1.local_node
 3.323e+08 ±  2%     -32.5%  2.243e+08 ±  3%      -0.3%  3.312e+08 ±  3%  numa-numastat.node1.numa_hit
 3.114e+08 ±  3%     -31.7%  2.126e+08 ±  4%      +2.1%   3.18e+08 ±  2%  numa-vmstat.node0.numa_hit
 3.113e+08 ±  3%     -31.7%  2.125e+08 ±  4%      +2.1%   3.18e+08 ±  2%  numa-vmstat.node0.numa_local
 3.323e+08 ±  2%     -32.5%  2.243e+08 ±  3%      -0.3%  3.312e+08 ±  3%  numa-vmstat.node1.numa_hit
 3.322e+08 ±  2%     -32.5%  2.243e+08 ±  3%      -0.3%  3.312e+08 ±  3%  numa-vmstat.node1.numa_local
      0.00 ± 19%     -61.1%       0.00 ± 31%     +16.7%       0.00 ± 14%  perf-sched.sch_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
    217.07 ± 11%     -46.4%     116.39 ± 23%      -1.8%     213.18 ±  8%  perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
    218.50 ±  6%     +19.1%     260.33 ±  4%      +7.2%     234.17 ±  5%  perf-sched.wait_and_delay.count.__cond_resched.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
    217.06 ± 11%     -46.4%     116.38 ± 23%      -1.8%     213.18 ±  8%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.do_group_exit.__x64_sys_exit_group.do_syscall_64
      7758 ± 24%     +15.6%       8968 ± 43%    +113.6%      16574 ± 24%  proc-vmstat.numa_hint_faults_local
 6.436e+08           -32.1%  4.369e+08            +0.9%  6.493e+08        proc-vmstat.numa_hit
 6.435e+08           -32.1%  4.368e+08            +0.9%  6.492e+08        proc-vmstat.numa_local
 6.432e+08           -32.1%  4.368e+08            +0.9%  6.489e+08        proc-vmstat.pgalloc_normal
 1.286e+09           -32.1%  8.726e+08            +0.9%  1.297e+09        proc-vmstat.pgfault
 6.432e+08           -32.1%  4.367e+08            +0.9%  6.488e+08        proc-vmstat.pgfree
    170696 ±  8%      +3.4%     176515 ±  8%      +3.2%     176206 ±  8%  sched_debug.cpu.clock.avg
    170703 ±  8%      +3.4%     176522 ±  8%      +3.2%     176212 ±  8%  sched_debug.cpu.clock.max
    170689 ±  8%      +3.4%     176508 ±  8%      +3.2%     176198 ±  8%  sched_debug.cpu.clock.min
    169431 ±  8%      +3.4%     175248 ±  8%      +3.2%     174916 ±  8%  sched_debug.cpu.clock_task.avg
    169630 ±  8%      +3.4%     175429 ±  8%      +3.2%     175098 ±  8%  sched_debug.cpu.clock_task.max
    162542 ±  8%      +3.5%     168260 ±  8%      +3.4%     168099 ±  9%  sched_debug.cpu.clock_task.min
    170690 ±  8%      +3.4%     176508 ±  8%      +3.2%     176199 ±  8%  sched_debug.cpu_clk
    170117 ±  8%      +3.4%     175938 ±  8%      +3.2%     175626 ±  8%  sched_debug.ktime
    171259 ±  8%      +3.4%     177078 ±  8%      +3.2%     176768 ±  8%  sched_debug.sched_clk
      4.06           +80.8%       7.34            -4.8%       3.86        perf-stat.i.MPKI
 4.066e+09           -23.3%   3.12e+09            +3.5%  4.207e+09        perf-stat.i.branch-instructions
      0.57            -0.0        0.55            -0.0        0.57        perf-stat.i.branch-miss-rate%
  23478297           -25.0%   17605102            +3.3%   24242314        perf-stat.i.branch-misses
     17.25            +7.0       24.27            +0.7       17.95        perf-stat.i.cache-miss-rate%
  82715093 ±  2%     +35.9%  1.124e+08            -1.8%   81201463        perf-stat.i.cache-misses
 4.795e+08 ±  2%      -3.4%   4.63e+08            -5.6%  4.525e+08        perf-stat.i.cache-references
      7.14           +32.9%       9.49            -3.0%       6.92        perf-stat.i.cpi
    134.85            -1.2%     133.29            -0.2%     134.53        perf-stat.i.cpu-migrations
      1760 ±  2%     -26.5%       1294            +1.8%       1792        perf-stat.i.cycles-between-cache-misses
      0.26            -0.0        0.24            -0.0        0.25        perf-stat.i.dTLB-load-miss-rate%
  13461491           -31.7%    9190211            +0.9%   13582086        perf-stat.i.dTLB-load-misses
 5.141e+09           -24.1%  3.902e+09            +3.6%  5.327e+09        perf-stat.i.dTLB-loads
      0.45            -0.0        0.44            -0.0        0.45        perf-stat.i.dTLB-store-miss-rate%
  12934403           -32.2%    8773143            +0.9%   13056838        perf-stat.i.dTLB-store-misses
 2.841e+09           -29.9%  1.992e+09            +2.7%  2.917e+09        perf-stat.i.dTLB-stores
     14.76            +1.4       16.18 ±  4%      +2.2       16.92        perf-stat.i.iTLB-load-miss-rate%
   7454399 ±  2%     -22.7%    5760387 ±  4%     +16.4%    8674584        perf-stat.i.iTLB-load-misses
  43026423           -30.6%   29840650            -1.0%   42585377        perf-stat.i.iTLB-loads
 2.042e+10           -24.7%  1.538e+10            +3.1%  2.104e+10        perf-stat.i.instructions
      2745            -2.5%       2677 ±  4%     -11.4%       2432        perf-stat.i.instructions-per-iTLB-miss
      0.14           -24.6%       0.11            +3.1%       0.14        perf-stat.i.ipc
    815.65           -20.2%     651.03            -1.1%     807.03        perf-stat.i.metric.K/sec
    120.43           -24.3%      91.11            +3.0%     124.05        perf-stat.i.metric.M/sec
   4264808           -32.2%    2892980            +0.9%    4302236        perf-stat.i.minor-faults
  11007315 ±  2%     +39.7%   15375516            -2.9%   10691798 ±  2%  perf-stat.i.node-load-misses
   1459152 ±  6%     +45.1%    2116827 ±  5%      -5.0%    1386160 ±  5%  perf-stat.i.node-loads
   7872989 ±  2%     -26.2%    5812458            -3.4%    7608281 ±  2%  perf-stat.i.node-store-misses
   4264808           -32.2%    2892980            +0.9%    4302236        perf-stat.i.page-faults
      4.05           +80.4%       7.31            -4.8%       3.86        perf-stat.overall.MPKI
      0.58            -0.0        0.57            -0.0        0.58        perf-stat.overall.branch-miss-rate%
     17.25            +7.0       24.27            +0.7       17.95        perf-stat.overall.cache-miss-rate%
      7.13           +32.7%       9.46            -3.0%       6.91        perf-stat.overall.cpi
      1759 ±  2%     -26.5%       1294            +1.8%       1792        perf-stat.overall.cycles-between-cache-misses
      0.26            -0.0        0.23            -0.0        0.25        perf-stat.overall.dTLB-load-miss-rate%
      0.45            -0.0        0.44            -0.0        0.45        perf-stat.overall.dTLB-store-miss-rate%
     14.77            +1.4       16.18 ±  4%      +2.2       16.92        perf-stat.overall.iTLB-load-miss-rate%
      2739            -2.4%       2674 ±  4%     -11.4%       2426        perf-stat.overall.instructions-per-iTLB-miss
      0.14           -24.7%       0.11            +3.1%       0.14        perf-stat.overall.ipc
   2882666           +11.2%    3206246            +2.1%    2944234        perf-stat.overall.path-length
 4.052e+09           -23.3%   3.11e+09            +3.5%  4.193e+09        perf-stat.ps.branch-instructions
  23421504           -25.0%   17574476            +3.2%   24179002        perf-stat.ps.branch-misses
  82419384 ±  2%     +35.9%   1.12e+08            -1.8%   80913267        perf-stat.ps.cache-misses
 4.778e+08 ±  2%      -3.4%  4.614e+08            -5.6%  4.509e+08        perf-stat.ps.cache-references
    134.44            -1.1%     132.98            -0.2%     134.17        perf-stat.ps.cpu-migrations
  13415064           -31.7%    9160067            +0.9%   13535797        perf-stat.ps.dTLB-load-misses
 5.124e+09           -24.1%   3.89e+09            +3.6%   5.31e+09        perf-stat.ps.dTLB-loads
  12889609           -32.2%    8744145            +1.0%   13012111        perf-stat.ps.dTLB-store-misses
 2.831e+09           -29.9%  1.986e+09            +2.7%  2.907e+09        perf-stat.ps.dTLB-stores
   7428050 ±  2%     -22.7%    5741276 ±  4%     +16.4%    8644862        perf-stat.ps.iTLB-load-misses
  42877049           -30.6%   29741122            -1.0%   42438686        perf-stat.ps.iTLB-loads
 2.035e+10           -24.7%  1.533e+10            +3.1%  2.097e+10        perf-stat.ps.instructions
   4250034           -32.2%    2883410            +0.9%    4287486        perf-stat.ps.minor-faults
  10968228 ±  2%     +39.7%   15322266            -2.9%   10654062 ±  2%  perf-stat.ps.node-load-misses
   1454274 ±  6%     +45.1%    2109746 ±  5%      -5.0%    1381519 ±  5%  perf-stat.ps.node-loads
   7845298 ±  2%     -26.2%    5792864            -3.4%    7581789 ±  2%  perf-stat.ps.node-store-misses
   4250034           -32.2%    2883410            +0.9%    4287486        perf-stat.ps.page-faults
 6.147e+12           -24.7%  4.631e+12            +3.0%  6.334e+12        perf-stat.total.instructions
     26.77            -1.8       24.93 ±  3%      +0.5       27.32 ±  5%  perf-profile.calltrace.cycles-pp.intel_idle_ibrs.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     26.75            -1.8       24.92 ±  2%      +0.4       27.17 ±  5%  perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     26.75            -1.8       24.92 ±  2%      +0.4       27.18 ±  5%  perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.84            -1.8       25.00 ±  3%      +0.6       27.39 ±  5%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     26.75            -1.8       24.92 ±  2%      +0.4       27.18 ±  5%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.75            -1.8       24.92 ±  2%      +0.4       27.18 ±  5%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.75            -1.8       24.92 ±  2%      +0.4       27.18 ±  5%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     27.05            -1.8       25.29 ±  3%      +0.4       27.48 ±  5%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     13.02 ±  2%      -1.4       11.60 ±  4%      -0.4       12.62 ±  2%  perf-profile.calltrace.cycles-pp.testcase
      5.54 ±  5%      -1.0        4.52 ±  3%      -0.5        5.06 ±  2%  perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
      1.37 ±  2%      -0.9        0.51 ± 58%      +0.0        1.38 ±  3%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.__madvise
     10.38 ±  3%      -0.8        9.54 ±  2%      -0.4        9.97 ±  2%  perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase
      2.38 ±  2%      -0.8        1.63 ±  3%      -0.1        2.29 ±  4%  perf-profile.calltrace.cycles-pp.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush
      4.02 ±  3%      -0.7        3.32 ±  3%      -0.3        3.76 ±  2%  perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
      1.92 ±  4%      -0.4        1.49 ±  2%      -0.0        1.88 ±  2%  perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.testcase
      1.36 ±  2%      -0.4        0.99            -0.0        1.36 ±  3%  perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
      1.30 ± 10%      -0.4        0.94 ±  6%      -0.1        1.16 ±  5%  perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      1.50 ± 11%      -0.3        1.19 ±  5%      -0.2        1.29 ±  8%  perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
      1.13 ±  3%      -0.3        0.83            -0.0        1.13 ±  2%  perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
      0.71 ±  3%      -0.3        0.43 ± 44%      +0.0        0.71 ±  3%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.__madvise
      1.02 ±  3%      -0.3        0.75            -0.0        1.01 ±  2%  perf-profile.calltrace.cycles-pp.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
      0.97 ±  3%      -0.3        0.72            -0.0        0.96 ±  2%  perf-profile.calltrace.cycles-pp.native_flush_tlb_one_user.flush_tlb_func.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single
      0.77 ±  2%      -0.2        0.58 ±  2%      -0.0        0.75 ±  3%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
      0.71 ±  2%      -0.1        0.60 ±  3%      -0.0        0.69 ±  4%  perf-profile.calltrace.cycles-pp.propagate_protected_usage.page_counter_uncharge.uncharge_batch.__mem_cgroup_uncharge_list.release_pages
      1.20            +0.1        1.34            -0.1        1.12        perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
      1.10 ±  2%      +0.2        1.28            -0.1        1.03        perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise
      1.04 ±  2%      +0.2        1.24            -0.1        0.98        perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior
      0.83            +0.2        1.07 ±  2%      -0.0        0.82        perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
      0.81 ±  2%      +0.3        1.08            -0.0        0.76 ±  2%  perf-profile.calltrace.cycles-pp.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single
      0.88 ± 10%      +0.3        1.16 ±  4%      -0.1        0.77 ±  5%  perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      0.71 ±  2%      +0.3        1.00            -0.0        0.68        perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range.unmap_page_range
      0.76 ±  3%      +0.3        1.09 ±  2%      -0.0        0.75 ±  2%  perf-profile.calltrace.cycles-pp.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      0.73 ±  3%      +0.3        1.07 ±  2%      -0.0        0.72 ±  2%  perf-profile.calltrace.cycles-pp.__mod_lruvec_page_state.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault.handle_mm_fault
      0.00            +0.6        0.55 ±  2%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_commit_charge.__mem_cgroup_charge.do_anonymous_page.__handle_mm_fault
      6.60 ±  4%      +0.6        7.18 ±  3%      -0.4        6.22 ±  2%  perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase
      6.54 ±  4%      +0.6        7.13 ±  3%      -0.4        6.17 ±  2%  perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.00            +0.7        0.74 ±  3%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
      0.00            +0.8        0.79 ±  2%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.page_remove_rmap.zap_pte_range.zap_pmd_range
      0.00            +0.8        0.79 ±  3%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.lru_add_fn.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
      0.00            +0.8        0.80 ±  3%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.folio_add_new_anon_rmap.do_anonymous_page.__handle_mm_fault
      5.80 ±  5%      +0.8        6.60 ±  3%      -0.4        5.41 ±  2%  perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase
      0.00            +0.8        0.82            +0.0        0.00        perf-profile.calltrace.cycles-pp.__count_memcg_events.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.tlb_batch_pages_flush
      0.69 ±  4%      +0.9        1.59 ±  2%      -0.0        0.66 ±  3%  perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     30.43            +1.1       31.57            -0.3       30.08        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
     29.22            +1.5       30.69            -0.3       28.88        perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise
     29.05            +1.5       30.56            -0.4       28.69        perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior
     22.56 ±  2%      +2.3       24.87            +0.1       22.70 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.zap_page_range_single
     22.36 ±  2%      +2.3       24.70            +0.1       22.51 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
     22.11 ±  2%      +2.4       24.55            +0.2       22.27 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush
     22.70            +2.6       25.35            +0.4       23.12 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single
     22.38            +2.7       25.08            +0.4       22.80 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
     24.10            +2.7       26.82            +0.4       24.51 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise
     24.09            +2.7       26.82            +0.4       24.51 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior.do_madvise
     24.07            +2.7       26.79            +0.4       24.48 ±  2%  perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.zap_page_range_single.madvise_vma_behavior
     22.14            +2.8       24.93            +0.4       22.56 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
     59.76            +2.9       62.64            -0.0       59.73 ±  2%  perf-profile.calltrace.cycles-pp.__madvise
     57.63            +3.5       61.10            -0.0       57.59 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise
     57.27            +3.6       60.85            -0.0       57.24 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     56.41            +3.8       60.20            -0.0       56.39 ±  2%  perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     56.37            +3.8       60.17            -0.0       56.34 ±  2%  perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise
     55.94            +3.9       59.88            -0.0       55.92 ±  2%  perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe
     55.85            +4.0       59.82            -0.0       55.83 ±  2%  perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64
     26.75            -1.8       24.92 ±  2%      +0.4       27.18 ±  5%  perf-profile.children.cycles-pp.start_secondary
     26.98            -1.8       25.22 ±  3%      +0.4       27.40 ±  5%  perf-profile.children.cycles-pp.intel_idle_ibrs
     27.05            -1.8       25.29 ±  3%      +0.4       27.48 ±  5%  perf-profile.children.cycles-pp.cpu_startup_entry
     27.05            -1.8       25.29 ±  3%      +0.4       27.48 ±  5%  perf-profile.children.cycles-pp.do_idle
     27.05            -1.8       25.29 ±  3%      +0.4       27.48 ±  5%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     27.05            -1.8       25.29 ±  3%      +0.4       27.48 ±  5%  perf-profile.children.cycles-pp.cpuidle_enter
     27.05            -1.8       25.29 ±  3%      +0.4       27.48 ±  5%  perf-profile.children.cycles-pp.cpuidle_enter_state
     27.05            -1.8       25.29 ±  3%      +0.4       27.48 ±  5%  perf-profile.children.cycles-pp.cpuidle_idle_call
     13.66 ±  2%      -1.3       12.38            -0.4       13.26 ±  2%  perf-profile.children.cycles-pp.testcase
      5.55 ±  5%      -1.0        4.52 ±  3%      -0.5        5.06 ±  2%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
      2.39 ±  2%      -0.8        1.63 ±  3%      -0.1        2.29 ±  4%  perf-profile.children.cycles-pp.page_counter_uncharge
      4.03 ±  3%      -0.7        3.32 ±  3%      -0.3        3.76 ±  2%  perf-profile.children.cycles-pp.uncharge_batch
      1.96 ±  4%      -0.4        1.52 ±  2%      -0.0        1.91 ±  2%  perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
      1.30            -0.4        0.94 ±  2%      +0.0        1.32 ±  2%  perf-profile.children.cycles-pp.error_entry
      1.36 ±  2%      -0.4        0.99            -0.0        1.36 ±  3%  perf-profile.children.cycles-pp.__irqentry_text_end
      1.30 ± 10%      -0.4        0.94 ±  6%      -0.1        1.16 ±  5%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      1.51 ± 11%      -0.3        1.19 ±  5%      -0.2        1.29 ±  8%  perf-profile.children.cycles-pp.uncharge_folio
      1.14 ±  3%      -0.3        0.84            -0.0        1.14 ±  2%  perf-profile.children.cycles-pp.flush_tlb_mm_range
      1.02 ±  3%      -0.3        0.75            -0.0        1.02 ±  2%  perf-profile.children.cycles-pp.flush_tlb_func
      0.98 ±  3%      -0.3        0.72            -0.0        0.96 ±  2%  perf-profile.children.cycles-pp.native_flush_tlb_one_user
      0.73 ±  2%      -0.2        0.52 ±  2%      -0.0        0.72 ±  3%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.69 ±  2%      -0.2        0.50 ±  2%      +0.0        0.70 ±  4%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.79 ±  2%      -0.2        0.60 ±  2%      -0.0        0.77 ±  3%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.51 ±  2%      -0.1        0.38 ±  2%      +0.0        0.52 ±  4%  perf-profile.children.cycles-pp.sync_regs
      0.41 ±  3%      -0.1        0.29 ±  3%      -0.0        0.41 ±  3%  perf-profile.children.cycles-pp.__perf_sw_event
      0.44 ±  2%      -0.1        0.32 ±  2%      +0.0        0.44 ±  4%  perf-profile.children.cycles-pp.vma_alloc_folio
      0.72 ±  2%      -0.1        0.61 ±  3%      -0.0        0.71 ±  3%  perf-profile.children.cycles-pp.propagate_protected_usage
      0.39            -0.1        0.28 ±  2%      -0.0        0.39 ±  4%  perf-profile.children.cycles-pp.alloc_pages_mpol
      0.35 ±  3%      -0.1        0.25 ±  3%      -0.0        0.35 ±  4%  perf-profile.children.cycles-pp.__alloc_pages
      0.34 ±  2%      -0.1        0.24 ±  4%      +0.0        0.34 ±  4%  perf-profile.children.cycles-pp.___perf_sw_event
      0.30 ±  3%      -0.1        0.21 ±  5%      -0.0        0.30        perf-profile.children.cycles-pp.lock_vma_under_rcu
      0.32 ±  2%      -0.1        0.24            -0.0        0.32 ±  4%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.12 ±  4%      -0.1        0.03 ± 70%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.down_read
      0.25 ±  3%      -0.1        0.18 ±  4%      +0.0        0.26        perf-profile.children.cycles-pp.mas_walk
      0.25 ±  3%      -0.1        0.18 ±  2%      +0.0        0.25 ±  3%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.17 ±  4%      -0.1        0.11 ±  3%      -0.0        0.17 ±  6%  perf-profile.children.cycles-pp.__pte_offset_map_lock
      0.14 ±  3%      -0.0        0.10 ±  3%      +0.0        0.14 ±  7%  perf-profile.children.cycles-pp.clear_page_erms
      0.17 ±  2%      -0.0        0.12 ±  3%      +0.0        0.17 ±  4%  perf-profile.children.cycles-pp.find_vma_prev
      0.13 ±  2%      -0.0        0.09            -0.0        0.12 ±  4%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      0.11 ±  4%      -0.0        0.07 ± 10%      -0.0        0.10 ±  3%  perf-profile.children.cycles-pp.__cond_resched
      0.13 ±  2%      -0.0        0.10 ±  7%      +0.0        0.15 ±  5%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.06 ±  7%      -0.0        0.03 ± 70%      +0.0        0.07 ±  7%  perf-profile.children.cycles-pp.unmap_vmas
      0.11 ±  3%      -0.0        0.08 ±  6%      +0.0        0.11 ±  6%  perf-profile.children.cycles-pp.free_unref_page_list
      0.06            -0.0        0.03 ± 70%      -0.0        0.06 ±  6%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.09 ±  7%      -0.0        0.06 ±  6%      +0.0        0.10 ±  5%  perf-profile.children.cycles-pp.free_swap_cache
      0.09 ±  7%      -0.0        0.07 ±  7%      +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.__munmap
      0.09 ±  8%      -0.0        0.06 ±  6%      +0.0        0.09 ±  8%  perf-profile.children.cycles-pp._raw_spin_lock
      0.09 ±  5%      -0.0        0.06 ±  6%      +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.handle_pte_fault
      0.08 ±  8%      -0.0        0.06 ±  6%      +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.do_vmi_munmap
      0.07 ±  6%      -0.0        0.05 ±  8%      +0.0        0.07 ±  6%  perf-profile.children.cycles-pp.rmqueue
      0.08 ±  4%      -0.0        0.06 ±  6%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.07 ±  9%      -0.0        0.05 ±  7%      +0.0        0.07 ±  6%  perf-profile.children.cycles-pp.unmap_region
      0.08 ±  8%      -0.0        0.06 ±  6%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.08 ±  8%      -0.0        0.06 ±  6%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.__vm_munmap
      0.08 ±  8%      -0.0        0.06 ±  6%      +0.0        0.08 ±  5%  perf-profile.children.cycles-pp.__x64_sys_munmap
      0.08 ±  5%      -0.0        0.07 ±  7%      +0.0        0.09 ±  8%  perf-profile.children.cycles-pp.try_charge_memcg
      1.27            +0.1        1.40            -0.1        1.19        perf-profile.children.cycles-pp.unmap_page_range
      1.17            +0.2        1.32            -0.1        1.10        perf-profile.children.cycles-pp.zap_pmd_range
      1.12            +0.2        1.29            -0.1        1.05        perf-profile.children.cycles-pp.zap_pte_range
      0.84            +0.2        1.07 ±  2%      -0.0        0.82        perf-profile.children.cycles-pp.lru_add_fn
      0.81 ±  2%      +0.3        1.08            -0.0        0.76 ±  2%  perf-profile.children.cycles-pp.page_remove_rmap
      0.89 ± 10%      +0.3        1.16 ±  4%      -0.1        0.78 ±  6%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      0.77 ±  3%      +0.3        1.09 ±  2%      -0.0        0.75        perf-profile.children.cycles-pp.folio_add_new_anon_rmap
      6.62 ±  4%      +0.6        7.19 ±  3%      -0.4        6.24 ±  2%  perf-profile.children.cycles-pp.exc_page_fault
      6.56 ±  4%      +0.6        7.14 ±  3%      -0.4        6.18 ±  2%  perf-profile.children.cycles-pp.do_user_addr_fault
      1.44 ±  2%      +0.6        2.08 ±  2%      -0.0        1.40        perf-profile.children.cycles-pp.__mod_lruvec_page_state
      5.80 ±  5%      +0.8        6.61 ±  3%      -0.4        5.43 ±  2%  perf-profile.children.cycles-pp.handle_mm_fault
     30.44            +1.1       31.58            -0.3       30.09        perf-profile.children.cycles-pp.tlb_finish_mmu
     29.23            +1.5       30.69            -0.3       28.88        perf-profile.children.cycles-pp.tlb_batch_pages_flush
     29.19            +1.5       30.66            -0.4       28.84        perf-profile.children.cycles-pp.release_pages
      1.63 ±  5%      +1.5        3.13 ±  2%      -0.1        1.56 ±  3%  perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
      1.32 ±  4%      +1.6        2.97 ±  2%      -0.1        1.26 ±  2%  perf-profile.children.cycles-pp.__count_memcg_events
     24.12            +2.7       26.84            +0.4       24.54 ±  2%  perf-profile.children.cycles-pp.lru_add_drain
     24.12            +2.7       26.84            +0.4       24.53 ±  2%  perf-profile.children.cycles-pp.lru_add_drain_cpu
     24.09            +2.7       26.81            +0.4       24.50 ±  2%  perf-profile.children.cycles-pp.folio_batch_move_lru
     59.80            +2.9       62.68            -0.0       59.78 ±  2%  perf-profile.children.cycles-pp.__madvise
     57.82            +3.4       61.26            -0.0       57.78 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     57.44            +3.5       60.99            -0.0       57.40 ±  2%  perf-profile.children.cycles-pp.do_syscall_64
     56.41            +3.8       60.20            -0.0       56.39 ±  2%  perf-profile.children.cycles-pp.__x64_sys_madvise
     56.37            +3.8       60.17            -0.0       56.35 ±  2%  perf-profile.children.cycles-pp.do_madvise
     55.94            +3.9       59.88            -0.0       55.92 ±  2%  perf-profile.children.cycles-pp.madvise_vma_behavior
     55.85            +4.0       59.82            -0.0       55.84 ±  2%  perf-profile.children.cycles-pp.zap_page_range_single
     45.26            +5.0       50.23            +0.6       45.82 ±  2%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     44.75            +5.0       49.80            +0.6       45.32 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     44.26            +5.2       49.50            +0.6       44.84 ±  2%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     26.98            -1.8       25.22 ±  3%      +0.4       27.40 ±  5%  perf-profile.self.cycles-pp.intel_idle_ibrs
      1.67 ±  3%      -0.6        1.02 ±  3%      -0.1        1.59 ±  5%  perf-profile.self.cycles-pp.page_counter_uncharge
      1.92 ±  5%      -0.4        1.49 ±  2%      -0.1        1.87 ±  2%  perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
      1.47 ±  2%      -0.4        1.06 ±  2%      +0.0        1.48 ±  4%  perf-profile.self.cycles-pp.testcase
      1.36 ±  2%      -0.4        0.99            -0.0        1.36 ±  3%  perf-profile.self.cycles-pp.__irqentry_text_end
      1.30            -0.4        0.94            +0.0        1.32 ±  2%  perf-profile.self.cycles-pp.error_entry
      1.30 ± 10%      -0.4        0.94 ±  6%      -0.1        1.16 ±  5%  perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      1.18 ±  8%      -0.3        0.86 ±  6%      -0.2        1.02 ±  6%  perf-profile.self.cycles-pp.uncharge_batch
      1.50 ± 11%      -0.3        1.19 ±  5%      -0.2        1.28 ±  8%  perf-profile.self.cycles-pp.uncharge_folio
      0.98 ±  3%      -0.3        0.72            -0.0        0.96 ±  2%  perf-profile.self.cycles-pp.native_flush_tlb_one_user
      0.71 ±  2%      -0.2        0.51 ±  2%      -0.0        0.70 ±  3%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.69 ±  2%      -0.2        0.50 ±  2%      +0.0        0.70 ±  3%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.50 ±  4%      -0.2        0.30 ±  5%      -0.0        0.49 ±  3%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.75 ±  2%      -0.2        0.56 ±  2%      -0.0        0.73 ±  2%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.51 ±  2%      -0.1        0.38 ±  2%      +0.0        0.52 ±  4%  perf-profile.self.cycles-pp.sync_regs
      0.35 ±  3%      -0.1        0.23 ±  2%      -0.0        0.35 ±  5%  perf-profile.self.cycles-pp.folio_batch_move_lru
      0.36 ±  5%      -0.1        0.24 ±  2%      +0.0        0.36 ±  3%  perf-profile.self.cycles-pp.lru_add_fn
      0.39 ±  2%      -0.1        0.27 ±  2%      -0.0        0.39 ±  4%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.72 ±  2%      -0.1        0.61 ±  3%      -0.0        0.70 ±  3%  perf-profile.self.cycles-pp.propagate_protected_usage
      0.45            -0.1        0.34 ±  2%      -0.0        0.44 ±  2%  perf-profile.self.cycles-pp.release_pages
      0.54 ±  4%      -0.1        0.45 ±  4%      -0.0        0.53 ±  5%  perf-profile.self.cycles-pp.__mod_lruvec_page_state
      0.30 ±  2%      -0.1        0.21 ±  3%      +0.0        0.30 ±  3%  perf-profile.self.cycles-pp.___perf_sw_event
      0.52 ±  5%      -0.1        0.43 ±  5%      -0.0        0.50 ±  4%  perf-profile.self.cycles-pp.folio_lruvec_lock_irqsave
      0.28 ±  3%      -0.1        0.21            -0.0        0.28 ±  3%  perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.25 ±  3%      -0.1        0.18 ±  4%      +0.0        0.25        perf-profile.self.cycles-pp.mas_walk
      0.24 ±  2%      -0.1        0.17 ±  4%      +0.0        0.24 ±  2%  perf-profile.self.cycles-pp.__handle_mm_fault
      0.16 ±  4%      -0.1        0.10 ±  9%      +0.0        0.16 ±  3%  perf-profile.self.cycles-pp.zap_pte_range
      0.14 ±  4%      -0.0        0.10 ±  4%      +0.0        0.14 ±  7%  perf-profile.self.cycles-pp.clear_page_erms
      0.08 ±  6%      -0.0        0.03 ± 70%      -0.0        0.07        perf-profile.self.cycles-pp.__cond_resched
      0.13            -0.0        0.09            -0.0        0.12 ±  4%  perf-profile.self.cycles-pp.percpu_counter_add_batch
      0.14 ±  5%      -0.0        0.11 ±  3%      +0.0        0.15 ±  6%  perf-profile.self.cycles-pp.handle_mm_fault
      0.11 ±  3%      -0.0        0.08 ±  6%      +0.0        0.11 ±  8%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.08 ±  6%      -0.0        0.04 ± 44%      -0.0        0.07 ±  6%  perf-profile.self.cycles-pp.__perf_sw_event
      0.07 ± 10%      -0.0        0.04 ± 44%      -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.tlb_finish_mmu
      0.08 ±  7%      -0.0        0.05 ±  8%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.lock_vma_under_rcu
      0.09 ±  7%      -0.0        0.06 ±  6%      +0.0        0.09 ±  5%  perf-profile.self.cycles-pp.free_swap_cache
      0.09 ±  8%      -0.0        0.06 ±  6%      -0.0        0.08 ±  8%  perf-profile.self.cycles-pp._raw_spin_lock
      0.07 ±  7%      -0.0        0.04 ± 44%      +0.0        0.07 ±  5%  perf-profile.self.cycles-pp.asm_exc_page_fault
      0.10 ±  3%      -0.0        0.08 ±  6%      -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.page_remove_rmap
      0.08 ±  6%      -0.0        0.05 ±  8%      -0.0        0.07 ±  6%  perf-profile.self.cycles-pp.flush_tlb_mm_range
      0.08 ±  7%      -0.0        0.06 ±  6%      -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.unmap_page_range
      0.08 ±  6%      -0.0        0.06 ±  9%      +0.0        0.08 ±  7%  perf-profile.self.cycles-pp.do_anonymous_page
      0.08 ±  5%      -0.0        0.06 ±  6%      +0.0        0.08 ±  8%  perf-profile.self.cycles-pp.__alloc_pages
      0.08 ±  6%      -0.0        0.06 ±  8%      -0.0        0.08 ±  6%  perf-profile.self.cycles-pp.do_madvise
      0.07 ± 10%      -0.0        0.05 ±  8%      +0.0        0.08 ±  8%  perf-profile.self.cycles-pp.up_read
      1.58 ±  6%      +1.5        3.09 ±  2%      -0.1        1.51 ±  3%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      1.27 ±  5%      +1.7        2.93 ±  2%      -0.0        1.22 ±  2%  perf-profile.self.cycles-pp.__count_memcg_events
     44.25            +5.2       49.50            +0.6       44.84 ±  2%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath


[-- Attachment #4: will-it-scale-fallocate1 --]
[-- Type: text/plain, Size: 33789 bytes --]

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/fallocate1/will-it-scale

commit:
  e0bf1dc859fdd mm: memcg: move vmstats structs definition above flushing code
  8d59d2214c236 mm: memcg: make stats flushing threshold per-memcg
  0cba55e237ba6 mm: memcg: optimize parent iteration in memcg_rstat_updated()

e0bf1dc859fdd08e 8d59d2214c2362e7a9d185d80b6 0cba55e237ba61489c0a29f7d27
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    763.24            -1.2%     754.22            -2.0%     748.22        turbostat.PkgWatt
      4560 ±  2%     -19.4%       3673            +7.1%       4882 ±  2%  vmstat.system.cs
      0.03 ±  3%      -0.0        0.02            -0.0        0.03 ±  2%  mpstat.cpu.all.soft%
      0.13 ±  2%      -0.0        0.10 ±  2%      -0.0        0.13        mpstat.cpu.all.usr%
    293.00 ± 10%     +54.7%     453.17 ± 12%      +7.6%     315.33 ±  3%  perf-c2c.DRAM.local
      3720 ±  3%     +41.1%       5251 ±  3%      +0.8%       3752 ±  3%  perf-c2c.DRAM.remote
    325.67 ±  5%     -21.5%     255.50 ±  7%      +2.5%     333.83 ±  3%  perf-c2c.HITM.remote
   5426049           -33.8%    3590953            +3.3%    5605429        will-it-scale.224.processes
     24222           -33.8%      16030            +3.3%      25023        will-it-scale.per_process_ops
   5426049           -33.8%    3590953            +3.3%    5605429        will-it-scale.workload
    148965 ±  9%      +4.5%     155664 ± 20%     -14.2%     127883 ± 10%  numa-meminfo.node0.Slab
     41751 ± 62%     -31.7%      28502 ±122%     -66.6%      13962 ±108%  numa-meminfo.node1.Active
     41727 ± 62%     -31.7%      28502 ±122%     -66.6%      13948 ±108%  numa-meminfo.node1.Active(anon)
     69062 ± 38%     -19.5%      55596 ± 63%     -39.1%      42090 ± 35%  numa-meminfo.node1.Shmem
    355193 ±  3%     +16.1%     412516 ±  4%      -3.3%     343648 ±  4%  sched_debug.cfs_rq:/.avg_vruntime.stddev
    355191 ±  3%     +16.1%     412513 ±  4%      -3.2%     343648 ±  4%  sched_debug.cfs_rq:/.min_vruntime.stddev
     89.04 ±  9%     +15.9%     103.22 ±  9%      +3.2%      91.93 ± 11%  sched_debug.cfs_rq:/.runnable_avg.stddev
      4289           -13.9%       3693            +4.9%       4498        sched_debug.cpu.nr_switches.avg
      2259 ±  3%     -25.1%       1693 ±  2%      +5.7%       2388 ±  5%  sched_debug.cpu.nr_switches.min
     44536            -5.9%      41918            +1.5%      45191        proc-vmstat.nr_slab_reclaimable
 3.257e+09           -33.9%  2.153e+09            +3.3%  3.366e+09        proc-vmstat.numa_hit
 3.256e+09           -33.9%  2.152e+09            +3.3%  3.365e+09        proc-vmstat.numa_local
     10269 ± 45%     +87.3%      19237 ± 14%     +83.8%      18876 ± 39%  proc-vmstat.numa_pages_migrated
 3.257e+09           -33.9%  2.153e+09            +3.3%  3.365e+09        proc-vmstat.pgalloc_normal
 3.257e+09           -33.9%  2.153e+09            +3.3%  3.365e+09        proc-vmstat.pgfree
     10269 ± 45%     +87.3%      19237 ± 14%     +83.8%      18876 ± 39%  proc-vmstat.pgmigrate_success
 7.906e+08 ±  4%     -32.9%  5.303e+08 ±  2%      +3.5%  8.181e+08 ±  4%  numa-numastat.node0.local_node
 7.909e+08 ±  4%     -32.9%  5.305e+08 ±  2%      +3.5%  8.184e+08 ±  4%  numa-numastat.node0.numa_hit
 8.069e+08 ±  3%     -33.6%  5.361e+08 ±  2%      +6.0%  8.552e+08 ±  2%  numa-numastat.node1.local_node
 8.072e+08 ±  3%     -33.6%  5.363e+08 ±  2%      +6.0%  8.556e+08 ±  2%  numa-numastat.node1.numa_hit
    101456           -21.4%      79695 ± 38%     -33.4%      67613 ± 38%  numa-numastat.node1.other_node
 8.276e+08           -34.1%  5.457e+08 ±  2%      +2.8%  8.508e+08        numa-numastat.node2.local_node
 8.278e+08           -34.1%  5.459e+08 ±  2%      +2.8%  8.511e+08        numa-numastat.node2.numa_hit
  8.31e+08           -35.0%  5.403e+08            +1.1%  8.406e+08 ±  3%  numa-numastat.node3.local_node
 8.314e+08           -35.0%  5.404e+08            +1.2%  8.409e+08 ±  3%  numa-numastat.node3.numa_hit
 7.909e+08 ±  4%     -32.9%  5.305e+08 ±  2%      +3.5%  8.184e+08 ±  4%  numa-vmstat.node0.numa_hit
 7.906e+08 ±  4%     -32.9%  5.303e+08 ±  2%      +3.5%  8.181e+08 ±  4%  numa-vmstat.node0.numa_local
     10428 ± 62%     -31.6%       7130 ±122%     -66.6%       3486 ±108%  numa-vmstat.node1.nr_active_anon
     17331 ± 38%     -19.0%      14042 ± 63%     -37.6%      10816 ± 33%  numa-vmstat.node1.nr_shmem
     10428 ± 62%     -31.6%       7130 ±122%     -66.6%       3486 ±108%  numa-vmstat.node1.nr_zone_active_anon
 8.072e+08 ±  3%     -33.6%  5.363e+08 ±  2%      +6.0%  8.556e+08 ±  2%  numa-vmstat.node1.numa_hit
 8.069e+08 ±  3%     -33.6%  5.361e+08 ±  2%      +6.0%  8.552e+08 ±  2%  numa-vmstat.node1.numa_local
    101455           -21.4%      79693 ± 38%     -33.4%      67613 ± 38%  numa-vmstat.node1.numa_other
 8.278e+08           -34.1%  5.459e+08 ±  2%      +2.8%  8.511e+08        numa-vmstat.node2.numa_hit
 8.276e+08           -34.1%  5.457e+08 ±  2%      +2.8%  8.508e+08        numa-vmstat.node2.numa_local
 8.314e+08           -35.0%  5.404e+08            +1.2%  8.409e+08 ±  3%  numa-vmstat.node3.numa_hit
  8.31e+08           -35.0%  5.403e+08            +1.1%  8.406e+08 ±  3%  numa-vmstat.node3.numa_local
      0.10 ±  8%    +135.1%       0.24 ± 10%     +32.0%       0.13 ± 23%  perf-sched.sch_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.04 ± 11%     +42.4%       0.06 ± 16%     +13.6%       0.04 ± 20%  perf-sched.sch_delay.avg.ms.pipe_read.vfs_read.ksys_read.do_syscall_64
      0.06 ± 33%    +112.1%       0.14 ± 25%     -32.0%       0.04 ± 47%  perf-sched.sch_delay.avg.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      0.06 ± 46%    +447.4%       0.31 ± 92%      +8.2%       0.06 ± 38%  perf-sched.sch_delay.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.09 ± 33%     +82.8%       0.16 ± 25%     -21.4%       0.07 ± 53%  perf-sched.sch_delay.max.ms.syslog_print.do_syslog.kmsg_read.vfs_read
      0.03 ±  6%     +32.9%       0.04 ±  7%      -6.1%       0.03 ± 11%  perf-sched.total_sch_delay.average.ms
    139.63 ±  4%     +21.7%     169.99 ±  3%      -9.8%     125.97 ±  3%  perf-sched.total_wait_and_delay.average.ms
     31780 ±  8%     -19.0%      25751 ± 14%      -5.8%      29937 ± 14%  perf-sched.total_wait_and_delay.count.ms
    139.60 ±  4%     +21.7%     169.95 ±  3%      -9.8%     125.94 ±  3%  perf-sched.total_wait_time.average.ms
      0.18 ±  6%     +19.2%       0.22 ± 21%     -14.5%       0.16 ± 11%  perf-sched.wait_and_delay.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
      3.52 ±  5%     +13.4%       3.99 ±  2%      -0.3%       3.51 ±  4%  perf-sched.wait_and_delay.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      0.45 ±223%    +821.8%       4.15 ±  9%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    305.95 ±  7%     +44.4%     441.73 ±  4%     -14.7%     260.96 ±  4%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      6913 ±  6%     -16.5%       5771 ± 13%      -0.4%       6884 ± 13%  perf-sched.wait_and_delay.count.__cond_resched.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64
      1974 ± 11%     -42.6%       1132 ± 16%     -16.3%       1651 ± 17%  perf-sched.wait_and_delay.count.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
      1602 ±  7%      +2.7%       1646 ± 13%     -14.3%       1373 ± 12%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
      9474 ± 11%     -33.5%       6303 ± 13%      +0.5%       9524 ± 15%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      2.19 ±223%    +770.9%      19.04 ± 63%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
      1233 ± 30%    +163.1%       3245 ± 26%      +0.9%       1243 ± 30%  perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.18 ±  6%     +19.2%       0.22 ± 21%     -14.5%       0.16 ± 11%  perf-sched.wait_time.avg.ms.__cond_resched.shmem_inode_acct_blocks.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
    104.96 ± 11%     +50.8%     158.31 ± 14%     -22.0%      81.88 ±  6%  perf-sched.wait_time.avg.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ±186%    +985.0%       0.18 ± 33%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.unmap_vmas.exit_mmap.__mmput.exit_mm
      3.41 ±  5%      +9.8%       3.75 ±  3%      -1.2%       3.37 ±  4%  perf-sched.wait_time.avg.ms.do_wait.kernel_wait4.__do_sys_wait4.do_syscall_64
      2.38 ±  6%     +65.7%       3.95 ±  9%      +1.4%       2.42 ± 10%  perf-sched.wait_time.avg.ms.schedule_timeout.__wait_for_common.wait_for_completion_state.kernel_clone
    305.93 ±  7%     +44.4%     441.71 ±  4%     -14.7%     260.94 ±  4%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.07 ± 12%     +59.5%       0.11 ± 27%     +28.5%       0.09 ± 44%  perf-sched.wait_time.avg.ms.wait_for_partner.fifo_open.do_dentry_open.do_open
    361.49 ± 10%    +163.6%     952.71 ± 24%     -10.2%     324.59 ± 12%  perf-sched.wait_time.max.ms.__cond_resched.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      0.02 ±186%   +1370.0%       0.24 ± 20%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.unmap_vmas.exit_mmap.__mmput.exit_mm
      1233 ± 30%    +163.1%       3245 ± 26%      +0.9%       1243 ± 30%  perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
      1.51           +70.8%       2.58            -5.4%       1.43 ±  2%  perf-stat.i.MPKI
 1.364e+10           -19.8%  1.094e+10            +6.4%  1.451e+10        perf-stat.i.branch-instructions
      0.29            -0.0        0.25            -0.0        0.27        perf-stat.i.branch-miss-rate%
  39037567           -29.6%   27478165            +1.8%   39751409        perf-stat.i.branch-misses
     26.72            +7.0       33.67            +0.7       27.42        perf-stat.i.cache-miss-rate%
  97210743           +34.0%  1.302e+08 ±  2%      +0.3%   97544414 ±  2%  perf-stat.i.cache-misses
 3.641e+08            +6.3%  3.868e+08 ±  2%      -2.2%  3.559e+08 ±  2%  perf-stat.i.cache-references
      4452 ±  2%     -20.0%       3561            +7.5%       4784 ±  2%  perf-stat.i.context-switches
     13.13           +27.5%      16.74            -5.7%      12.38        perf-stat.i.cpi
    270.31            -1.6%     265.85            +0.4%     271.43        perf-stat.i.cpu-migrations
      8711           -25.3%       6504            -0.3%       8685 ±  2%  perf-stat.i.cycles-between-cache-misses
  1.66e+10           -21.1%   1.31e+10            +6.8%  1.774e+10        perf-stat.i.dTLB-loads
 7.758e+09           -31.1%  5.343e+09            +6.3%  8.251e+09        perf-stat.i.dTLB-stores
  12549015           -38.5%    7719822            +0.5%   12615766        perf-stat.i.iTLB-load-misses
 6.454e+10           -21.6%   5.06e+10            +6.1%  6.846e+10        perf-stat.i.instructions
      5208           +29.5%       6745            +5.4%       5491        perf-stat.i.instructions-per-iTLB-miss
      0.08           -21.6%       0.06            +6.1%       0.08        perf-stat.i.ipc
      0.36 ±  6%     -24.6%       0.27 ± 25%     -23.7%       0.27 ± 25%  perf-stat.i.major-faults
     86.32 ±  2%     +27.6%     110.14            +2.1%      88.10        perf-stat.i.metric.K/sec
    171.24           -22.4%     132.88            +6.5%     182.34        perf-stat.i.metric.M/sec
  14793159           +36.3%   20167559            +0.9%   14924992        perf-stat.i.node-load-misses
   1101912 ±  7%     +48.5%    1636628 ±  4%      +2.6%    1130608 ±  9%  perf-stat.i.node-loads
   3340101 ±  2%     -19.8%    2679120            +6.9%    3571816        perf-stat.i.node-store-misses
     84773 ±  5%     -20.6%      67339 ±  6%      +2.0%      86484 ±  5%  perf-stat.i.node-stores
      1.51           +70.8%       2.57            -5.4%       1.42 ±  2%  perf-stat.overall.MPKI
      0.29            -0.0        0.25            -0.0        0.27        perf-stat.overall.branch-miss-rate%
     26.69            +6.9       33.63            +0.7       27.39        perf-stat.overall.cache-miss-rate%
     13.12           +27.5%      16.73            -5.7%      12.37        perf-stat.overall.cpi
      8709           -25.3%       6503 ±  2%      -0.3%       8682 ±  2%  perf-stat.overall.cycles-between-cache-misses
      5146           +27.5%       6563            +5.5%       5430        perf-stat.overall.instructions-per-iTLB-miss
      0.08           -21.6%       0.06            +6.1%       0.08        perf-stat.overall.ipc
   3581676           +18.4%    4239733            +2.6%    3673713        perf-stat.overall.path-length
 1.359e+10           -19.8%  1.091e+10            +6.4%  1.446e+10        perf-stat.ps.branch-instructions
  38876130           -29.7%   27341584            +1.8%   39577054        perf-stat.ps.branch-misses
  96879835           +34.0%  1.298e+08 ±  2%      +0.3%   97215764 ±  2%  perf-stat.ps.cache-misses
  3.63e+08            +6.3%  3.859e+08 ±  2%      -2.2%  3.549e+08 ±  2%  perf-stat.ps.cache-references
      4434 ±  2%     -20.0%       3547            +7.4%       4764 ±  2%  perf-stat.ps.context-switches
    268.37            -1.9%     263.37            +0.3%     269.14        perf-stat.ps.cpu-migrations
 1.655e+10           -21.1%  1.305e+10            +6.8%  1.768e+10        perf-stat.ps.dTLB-loads
 7.733e+09           -31.1%  5.325e+09            +6.3%  8.223e+09        perf-stat.ps.dTLB-stores
  12499097           -38.5%    7684522            +0.5%   12563331        perf-stat.ps.iTLB-load-misses
 6.433e+10           -21.6%  5.044e+10            +6.1%  6.823e+10        perf-stat.ps.instructions
      0.34 ±  6%     -25.9%       0.25 ± 25%     -23.8%       0.26 ± 25%  perf-stat.ps.major-faults
  14743590           +36.3%   20098836            +0.9%   14874764        perf-stat.ps.node-load-misses
   1098750 ±  7%     +48.7%    1633532 ±  4%      +2.7%    1128235 ±  9%  perf-stat.ps.node-loads
   3328886 ±  2%     -19.8%    2670192            +6.9%    3559593        perf-stat.ps.node-store-misses
     84559 ±  5%     -20.6%      67163 ±  6%      +1.9%      86147 ±  5%  perf-stat.ps.node-stores
 1.943e+13           -21.7%  1.522e+13            +6.0%  2.059e+13        perf-stat.total.instructions
      9.91 ± 10%      -3.8        6.10 ±  4%      -1.4        8.53 ± 11%  perf-profile.calltrace.cycles-pp.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate
      4.47 ± 10%      -2.3        2.19 ±  4%      -0.6        3.84 ± 11%  perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
     58.11            -2.1       56.01            -1.1       57.01        perf-profile.calltrace.cycles-pp.fallocate64
     58.02            -2.1       55.95            -1.1       56.91        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fallocate64
     58.00            -2.1       55.94            -1.1       56.90        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
     57.96            -2.1       55.91            -1.1       56.85        perf-profile.calltrace.cycles-pp.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
     57.92            -2.0       55.89            -1.1       56.82        perf-profile.calltrace.cycles-pp.vfs_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe.fallocate64
     57.82            -2.0       55.83            -1.1       56.72        perf-profile.calltrace.cycles-pp.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64.entry_SYSCALL_64_after_hwframe
     57.47            -1.8       55.62            -1.1       56.40        perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate.do_syscall_64
     57.30            -1.8       55.53            -1.1       56.22        perf-profile.calltrace.cycles-pp.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate.__x64_sys_fallocate
      2.17 ±  4%      -1.0        1.14 ±  3%      -0.1        2.06 ±  4%  perf-profile.calltrace.cycles-pp.__mem_cgroup_uncharge_list.release_pages.__folio_batch_release.shmem_undo_range.shmem_setattr
      3.08 ±  9%      -0.9        2.19 ±  4%      -0.4        2.64 ± 11%  perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
      1.29 ±  6%      -0.7        0.54 ±  4%      -0.1        1.16 ±  6%  perf-profile.calltrace.cycles-pp.uncharge_folio.__mem_cgroup_uncharge_list.release_pages.__folio_batch_release.shmem_undo_range
      0.88 ±  2%      -0.3        0.59 ±  2%      +0.0        0.90 ±  2%  perf-profile.calltrace.cycles-pp.uncharge_batch.__mem_cgroup_uncharge_list.release_pages.__folio_batch_release.shmem_undo_range
      1.66            -0.0        1.63 ±  3%      +0.0        1.69        perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.__folio_batch_release.shmem_undo_range.shmem_setattr.notify_change
      1.64            -0.0        1.62 ±  3%      +0.0        1.68        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
      1.66            -0.0        1.63 ±  3%      +0.0        1.69        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.__folio_batch_release.shmem_undo_range.shmem_setattr
      0.80            +0.1        0.86 ±  2%      -0.0        0.78 ±  2%  perf-profile.calltrace.cycles-pp.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      0.61 ±  2%      +0.1        0.74 ±  2%      -0.0        0.58 ±  3%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.lru_add_fn.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio
      1.65 ±  2%      +0.2        1.85            +0.0        1.67 ±  2%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.filemap_unaccount_folio.__filemap_remove_folio.filemap_remove_folio
      1.44 ±  3%      +0.4        1.79 ±  3%      -0.1        1.34 ±  4%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__mod_lruvec_page_state.shmem_add_to_page_cache.shmem_alloc_and_add_folio.shmem_get_folio_gfp
      0.08 ±223%      +0.5        0.60 ±  2%      +0.1        0.17 ±141%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.release_pages.__folio_batch_release.shmem_undo_range.shmem_setattr
      0.00            +0.9        0.86 ±  4%      +0.0        0.00        perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_commit_charge.__mem_cgroup_charge.shmem_alloc_and_add_folio.shmem_get_folio_gfp
     41.70            +2.1       43.82 ±  2%      +1.1       42.82 ±  2%  perf-profile.calltrace.cycles-pp.ftruncate64
     41.68            +2.1       43.81 ±  2%      +1.1       42.80 ±  2%  perf-profile.calltrace.cycles-pp.do_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     41.68            +2.1       43.81 ±  2%      +1.1       42.80 ±  2%  perf-profile.calltrace.cycles-pp.do_truncate.do_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     41.68            +2.1       43.80 ±  2%      +1.1       42.80 ±  2%  perf-profile.calltrace.cycles-pp.notify_change.do_truncate.do_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe
     41.69            +2.1       43.82 ±  2%      +1.1       42.81 ±  2%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     41.69            +2.1       43.82 ±  2%      +1.1       42.81 ±  2%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ftruncate64
     41.67            +2.1       43.80 ±  2%      +1.1       42.80 ±  2%  perf-profile.calltrace.cycles-pp.shmem_setattr.notify_change.do_truncate.do_sys_ftruncate.do_syscall_64
     41.67            +2.1       43.80 ±  2%      +1.1       42.79 ±  2%  perf-profile.calltrace.cycles-pp.shmem_undo_range.shmem_setattr.notify_change.do_truncate.do_sys_ftruncate
     38.67            +2.3       40.97 ±  2%      +1.0       39.68 ±  2%  perf-profile.calltrace.cycles-pp.__folio_batch_release.shmem_undo_range.shmem_setattr.notify_change.do_truncate
     36.98            +2.3       39.32 ±  2%      +1.0       37.96 ±  2%  perf-profile.calltrace.cycles-pp.release_pages.__folio_batch_release.shmem_undo_range.shmem_setattr.notify_change
     44.10            +2.4       46.47 ±  2%      +0.4       44.48        perf-profile.calltrace.cycles-pp.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate.vfs_fallocate
     44.04            +2.4       46.42 ±  2%      +0.4       44.42        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp.shmem_fallocate
     42.89            +2.4       45.32 ±  2%      +0.4       43.29        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio.shmem_get_folio_gfp
     42.87            +2.4       45.31 ±  2%      +0.4       43.27        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru.shmem_alloc_and_add_folio
     42.84            +2.4       45.29 ±  2%      +0.4       43.24        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.folio_add_lru
     33.96            +3.4       37.31 ±  2%      +1.1       35.02 ±  2%  perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.__folio_batch_release.shmem_undo_range.shmem_setattr
     33.94            +3.4       37.30 ±  2%      +1.1       35.00 ±  2%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.__folio_batch_release.shmem_undo_range
     33.92            +3.4       37.28 ±  2%      +1.1       34.98 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.__folio_batch_release
      9.93 ± 10%      -3.8        6.10 ±  4%      -1.4        8.54 ± 11%  perf-profile.children.cycles-pp.__mem_cgroup_charge
      4.48 ± 10%      -2.3        2.20 ±  4%      -0.6        3.84 ± 11%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
     58.14            -2.1       56.03            -1.1       57.04        perf-profile.children.cycles-pp.fallocate64
     57.96            -2.1       55.91            -1.1       56.85        perf-profile.children.cycles-pp.__x64_sys_fallocate
     57.92            -2.0       55.89            -1.1       56.82        perf-profile.children.cycles-pp.vfs_fallocate
     57.82            -2.0       55.83            -1.1       56.73        perf-profile.children.cycles-pp.shmem_fallocate
     57.53            -1.8       55.69            -1.1       56.44        perf-profile.children.cycles-pp.shmem_get_folio_gfp
     57.36            -1.8       55.60            -1.1       56.27        perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
      2.18 ±  4%      -1.0        1.14 ±  3%      -0.1        2.07 ±  4%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge_list
      3.09 ±  9%      -0.9        2.19 ±  4%      -0.4        2.64 ± 11%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      1.29 ±  6%      -0.7        0.54 ±  4%      -0.1        1.16 ±  6%  perf-profile.children.cycles-pp.uncharge_folio
      0.88 ±  2%      -0.3        0.59 ±  2%      +0.0        0.90 ±  2%  perf-profile.children.cycles-pp.uncharge_batch
      0.36            -0.1        0.22 ±  2%      +0.0        0.36        perf-profile.children.cycles-pp.shmem_alloc_folio
      0.36 ±  2%      -0.1        0.23 ±  2%      +0.0        0.37        perf-profile.children.cycles-pp.xas_store
      0.32 ±  2%      -0.1        0.20 ±  2%      +0.0        0.32        perf-profile.children.cycles-pp.alloc_pages_mpol
      0.27 ±  2%      -0.1        0.16 ±  4%      -0.0        0.27 ±  2%  perf-profile.children.cycles-pp.shmem_inode_acct_blocks
      0.27            -0.1        0.17 ±  3%      +0.0        0.28        perf-profile.children.cycles-pp.__alloc_pages
      0.37 ±  4%      -0.1        0.29            +0.0        0.40        perf-profile.children.cycles-pp.page_counter_uncharge
      0.18            -0.1        0.11 ±  4%      +0.0        0.18 ±  3%  perf-profile.children.cycles-pp.get_page_from_freelist
      0.16 ±  3%      -0.1        0.09            +0.0        0.16 ±  4%  perf-profile.children.cycles-pp.xas_load
      0.18 ±  2%      -0.1        0.12 ±  4%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.14 ±  2%      -0.1        0.09            +0.0        0.15 ±  3%  perf-profile.children.cycles-pp._raw_spin_lock
      0.18 ±  3%      -0.1        0.13 ±  4%      +0.0        0.19 ±  3%  perf-profile.children.cycles-pp.try_charge_memcg
      0.09 ± 10%      -0.0        0.04 ± 73%      +0.0        0.09 ± 12%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.12            -0.0        0.07            +0.0        0.12        perf-profile.children.cycles-pp.__dquot_alloc_space
      0.11            -0.0        0.06 ±  6%      +0.0        0.11        perf-profile.children.cycles-pp.filemap_get_entry
      0.13 ±  2%      -0.0        0.09 ±  4%      +0.0        0.14        perf-profile.children.cycles-pp.__mod_node_page_state
      0.10 ±  3%      -0.0        0.06 ±  9%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.xas_descend
      0.12            -0.0        0.08            +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.free_unref_page_list
      0.11 ±  3%      -0.0        0.07            +0.0        0.12 ±  4%  perf-profile.children.cycles-pp.rmqueue
      0.10 ± 35%      -0.0        0.06            -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.cgroup_rstat_updated
      0.10            -0.0        0.06 ±  7%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.xas_clear_mark
      0.18            -0.0        0.14 ±  3%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.find_lock_entries
      0.16 ±  4%      -0.0        0.13 ±  2%      +0.0        0.17 ±  6%  perf-profile.children.cycles-pp.propagate_protected_usage
      0.10            -0.0        0.07 ±  5%      +0.0        0.10 ±  4%  perf-profile.children.cycles-pp.truncate_cleanup_folio
      0.08 ±  4%      -0.0        0.05 ±  7%      +0.0        0.08        perf-profile.children.cycles-pp.xas_init_marks
      0.09 ±  4%      -0.0        0.06 ±  7%      +0.0        0.09 ±  5%  perf-profile.children.cycles-pp.page_counter_try_charge
      0.18 ±  2%      -0.0        0.16            -0.0        0.18        perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
      0.14 ±  2%      -0.0        0.13 ±  2%      -0.0        0.14        perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
      0.14 ±  2%      -0.0        0.13 ±  2%      -0.0        0.14        perf-profile.children.cycles-pp.hrtimer_interrupt
      0.09            -0.0        0.08            +0.0        0.09        perf-profile.children.cycles-pp.tick_sched_handle
      0.00            +0.0        0.00            +0.1        0.06 ±  6%  perf-profile.children.cycles-pp.mem_cgroup_update_lru_size
      0.82            +0.1        0.87            -0.0        0.79 ±  2%  perf-profile.children.cycles-pp.lru_add_fn
     99.81            +0.1       99.89            +0.0       99.81        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     99.79            +0.1       99.88            +0.0       99.80        perf-profile.children.cycles-pp.do_syscall_64
      0.51            +0.5        0.98 ±  3%      +0.0        0.53 ±  5%  perf-profile.children.cycles-pp.__count_memcg_events
      4.21            +0.8        5.01            -0.1        4.12        perf-profile.children.cycles-pp.__mod_memcg_lruvec_state
     41.68            +2.1       43.81 ±  2%      +1.1       42.80 ±  2%  perf-profile.children.cycles-pp.do_sys_ftruncate
     41.70            +2.1       43.82 ±  2%      +1.1       42.82 ±  2%  perf-profile.children.cycles-pp.ftruncate64
     41.68            +2.1       43.81 ±  2%      +1.1       42.80 ±  2%  perf-profile.children.cycles-pp.do_truncate
     41.68            +2.1       43.80 ±  2%      +1.1       42.80 ±  2%  perf-profile.children.cycles-pp.notify_change
     41.67            +2.1       43.80 ±  2%      +1.1       42.80 ±  2%  perf-profile.children.cycles-pp.shmem_setattr
     41.67            +2.1       43.81 ±  2%      +1.1       42.80 ±  2%  perf-profile.children.cycles-pp.shmem_undo_range
     38.67            +2.3       40.98 ±  2%      +1.0       39.68 ±  2%  perf-profile.children.cycles-pp.__folio_batch_release
     37.07            +2.3       39.39 ±  2%      +1.0       38.05 ±  2%  perf-profile.children.cycles-pp.release_pages
     45.77            +2.4       48.14            +0.4       46.17        perf-profile.children.cycles-pp.folio_batch_move_lru
     44.14            +2.4       46.52 ±  2%      +0.4       44.51        perf-profile.children.cycles-pp.folio_add_lru
     78.55            +5.8       84.34            +1.5       80.04        perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
     78.52            +5.8       84.31            +1.5       80.00        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     78.48            +5.8       84.29            +1.5       79.96        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      4.47 ± 10%      -2.3        2.19 ±  4%      -0.6        3.83 ± 11%  perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      2.67 ± 11%      -1.3        1.32 ±  4%      -0.5        2.22 ± 12%  perf-profile.self.cycles-pp.mem_cgroup_commit_charge
      1.28 ±  6%      -0.7        0.54 ±  4%      -0.1        1.16 ±  6%  perf-profile.self.cycles-pp.uncharge_folio
      2.18 ± 11%      -0.6        1.58 ±  5%      -0.3        1.86 ± 12%  perf-profile.self.cycles-pp.__mem_cgroup_charge
      1.16 ±  8%      -0.3        0.84 ± 10%      +0.1        1.24 ± 14%  perf-profile.self.cycles-pp.__mod_lruvec_page_state
      0.38 ±  7%      -0.2        0.18 ±  5%      -0.0        0.35 ±  6%  perf-profile.self.cycles-pp.uncharge_batch
      0.24 ±  4%      -0.1        0.16 ±  2%      +0.0        0.24        perf-profile.self.cycles-pp.folio_batch_move_lru
      0.18 ±  3%      -0.1        0.11 ±  3%      -0.0        0.18        perf-profile.self.cycles-pp.xas_store
      0.19            -0.1        0.12 ±  3%      -0.0        0.18 ±  2%  perf-profile.self.cycles-pp.release_pages
      0.14 ±  5%      -0.1        0.08 ±  5%      -0.0        0.14 ±  2%  perf-profile.self.cycles-pp.lru_add_fn
      0.23 ±  5%      -0.1        0.17 ±  2%      +0.0        0.25 ±  2%  perf-profile.self.cycles-pp.page_counter_uncharge
      0.11            -0.1        0.06            -0.0        0.10 ±  4%  perf-profile.self.cycles-pp.shmem_fallocate
      0.13            -0.0        0.08 ±  5%      +0.0        0.13 ±  2%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.14 ±  3%      -0.0        0.09            +0.0        0.14        perf-profile.self.cycles-pp._raw_spin_lock
      0.09 ±  5%      -0.0        0.05 ±  7%      +0.0        0.09 ±  5%  perf-profile.self.cycles-pp.xas_descend
      0.10 ±  3%      -0.0        0.06            +0.0        0.10 ±  4%  perf-profile.self.cycles-pp.shmem_add_to_page_cache
      0.09 ± 37%      -0.0        0.05 ±  7%      -0.0        0.07 ±  5%  perf-profile.self.cycles-pp.cgroup_rstat_updated
      0.16 ±  4%      -0.0        0.13 ±  2%      +0.0        0.17 ±  6%  perf-profile.self.cycles-pp.propagate_protected_usage
      0.09            -0.0        0.06            +0.0        0.09 ±  5%  perf-profile.self.cycles-pp.xas_clear_mark
      0.09 ±  5%      -0.0        0.06 ±  7%      +0.0        0.09 ±  5%  perf-profile.self.cycles-pp.try_charge_memcg
      0.15            -0.0        0.12 ±  3%      +0.0        0.15 ±  3%  perf-profile.self.cycles-pp.find_lock_entries
      0.07 ±  7%      -0.0        0.05            +0.0        0.07 ±  6%  perf-profile.self.cycles-pp.page_counter_try_charge
      0.00            +0.0        0.00            +0.1        0.06 ±  8%  perf-profile.self.cycles-pp.mem_cgroup_update_lru_size
      0.50 ±  3%      +0.5        0.97 ±  3%      +0.0        0.52 ±  5%  perf-profile.self.cycles-pp.__count_memcg_events
      4.14 ±  2%      +0.8        4.97            -0.1        4.06        perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
     78.48            +5.8       84.29            +1.5       79.96        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression
  2024-01-24  8:26       ` Oliver Sang
@ 2024-01-24  9:11         ` Yosry Ahmed
  0 siblings, 0 replies; 6+ messages in thread
From: Yosry Ahmed @ 2024-01-24  9:11 UTC (permalink / raw)
  To: Oliver Sang
  Cc: oe-lkp, lkp, linux-kernel, Andrew Morton, Johannes Weiner,
	Domenico Cerasuolo, Shakeel Butt, Chris Li, Greg Thelen,
	Ivan Babrou, Michal Hocko, Michal Koutny, Muchun Song,
	Roman Gushchin, Tejun Heo, Waiman Long, Wei Xu, cgroups,
	linux-mm, ying.huang, feng.tang, fengwei.yin

On Wed, Jan 24, 2024 at 12:26 AM Oliver Sang <oliver.sang@intel.com> wrote:
>
> hi, Yosry Ahmed,
>
> On Mon, Jan 22, 2024 at 11:42:04PM -0800, Yosry Ahmed wrote:
> > > > Oliver, would you be able to test if the attached patch helps? It's
> > > > based on 8d59d2214c236.
> > >
> > > the patch failed to compile:
> > >
> > > build_errors:
> > >   - "mm/memcontrol.c:731:38: error: 'x' undeclared (first use in this function)"
> >
> > Apologizes, apparently I sent the patch with some pending diff in my
> > tree that I hadn't committed. Please find a fixed patch attached.
>
> the regression disappears after applying the patch.
>
> Tested-by: kernel test robot <oliver.sang@intel.com>

Awesome! Thanks for testing. I will formalize the patch and send it
out for review.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-01-24  9:12 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-22  8:39 [linus:master] [mm] 8d59d2214c: vm-scalability.throughput -36.6% regression kernel test robot
2024-01-22 21:39 ` Yosry Ahmed
2024-01-23  7:21   ` Oliver Sang
2024-01-23  7:42     ` Yosry Ahmed
2024-01-24  8:26       ` Oliver Sang
2024-01-24  9:11         ` Yosry Ahmed

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).