From: kernel test robot <oliver.sang@intel.com>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: <oe-lkp@lists.linux.dev>, <lkp@intel.com>, <linux-mm@kvack.org>,
<ying.huang@intel.com>, <feng.tang@intel.com>,
<fengwei.yin@intel.com>,
Andrew Morton <akpm@linux-foundation.org>,
"Matthew Wilcox (Oracle)" <willy@infradead.org>,
Suren Baghdasaryan <surenb@google.com>, <oliver.sang@intel.com>
Subject: Re: [PATCH v2 5/6] mm: Handle read faults under the VMA lock
Date: Fri, 20 Oct 2023 17:55:02 +0800
Message-ID: <202310201715.3f52109d-oliver.sang@intel.com>
In-Reply-To: <20231006195318.4087158-6-willy@infradead.org>
Hello,
kernel test robot noticed a 46.0% improvement of vm-scalability.throughput on:
commit: 39fbbca087dd149cdb82f08e7b92d62395c21ecf ("[PATCH v2 5/6] mm: Handle read faults under the VMA lock")
url: https://github.com/intel-lab-lkp/linux/commits/Matthew-Wilcox-Oracle/mm-Make-lock_folio_maybe_drop_mmap-VMA-lock-aware/20231007-035513
base: v6.6-rc4
patch link: https://lore.kernel.org/all/20231006195318.4087158-6-willy@infradead.org/
patch subject: [PATCH v2 5/6] mm: Handle read faults under the VMA lock
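For readers without the series at hand: before this patch, a read fault arriving under the per-VMA read lock fell back to the mmap_lock before doing any work; with it, the common cases (fault-around and pages already in the page cache) complete entirely under the VMA lock. Below is a minimal C sketch of that shape, inferred from the patch subject and the profile data rather than taken from the literal mm/memory.c diff; fault_handler_is_vma_lock_safe() is a hypothetical placeholder for the real gating check.

/* Sketch only; not the actual patch. */
static vm_fault_t do_read_fault_sketch(struct vm_fault *vmf)
{
	vm_fault_t ret = 0;

	/*
	 * Fault-around only maps already-uptodate folios via
	 * ->map_pages(), so it needs no mmap_lock.
	 */
	if (should_fault_around(vmf)) {
		ret = do_fault_around(vmf);
		if (ret)
			return ret;
	}

	/*
	 * Fall back to the mmap_lock only when a ->fault() handler must
	 * run and is not known to be safe under the VMA read lock.
	 * fault_handler_is_vma_lock_safe() is a hypothetical stand-in.
	 */
	if ((vmf->flags & FAULT_FLAG_VMA_LOCK) &&
	    !fault_handler_is_vma_lock_safe(vmf->vma)) {
		vma_end_read(vmf->vma);
		return VM_FAULT_RETRY;
	}

	return __do_fault(vmf);	/* then lock, map and unlock the folio */
}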
testcase: vm-scalability
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Platinum 8260L CPU @ 2.40GHz (Cascade Lake) with 128G memory
parameters:
runtime: 300s
size: 2T
test: shm-pread-seq-mt
cpufreq_governor: performance
test-description: The motivation behind this suite is to exercise functions and regions of the mm/ subsystem of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/
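As a mental model of the workload (an analogue inferred from the test name and the fault-heavy profile below, not the actual vm-scalability script): shm-pread-seq-mt amounts to many threads sequentially touching a large shmem-backed mapping, so almost every page access is a minor read fault.

#define _GNU_SOURCE		/* for memfd_create() */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

#define NTHREADS   8
#define PER_THREAD (64UL << 20)	/* 64 MiB/thread; the real run uses size=2T */

static char *map;

static void *reader(void *arg)
{
	long id = (long)arg;
	const char *p = map + id * PER_THREAD;
	volatile char sink = 0;

	/* First touch of each page is a minor read fault into shmem. */
	for (size_t off = 0; off < PER_THREAD; off += 4096)
		sink += p[off];
	return NULL;
}

int main(void)
{
	size_t len = (size_t)NTHREADS * PER_THREAD;
	int fd = memfd_create("shm-pread-seq", 0);	/* shmem-backed file */
	pthread_t t[NTHREADS];

	if (fd < 0 || ftruncate(fd, (off_t)len) < 0) {
		perror("memfd_create/ftruncate");
		return 1;
	}
	map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	for (long i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, reader, (void *)i);
	for (long i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	return 0;
}

Build as, say, shm_read.c with gcc -O2 -pthread. Scaled up to the 2T/96-thread configuration above, throughput is dominated by how cheaply those faults are serviced.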
Details are as follows:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231020/202310201715.3f52109d-oliver.sang@intel.com
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/size/tbox_group/test/testcase:
gcc-12/performance/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/300s/2T/lkp-csl-2sp3/shm-pread-seq-mt/vm-scalability
commit:
90e99527c7 ("mm: Handle COW faults under the VMA lock")
39fbbca087 ("mm: Handle read faults under the VMA lock")
90e99527c746cd9e 39fbbca087dd149cdb82f08e7b9
---------------- ---------------------------
     parent ±%stddev      %change      patched ±%stddev    metric
34.69 ± 23% +72.5% 59.82 ± 2% vm-scalability.free_time
173385 +45.6% 252524 vm-scalability.median
16599151 +46.0% 24242352 vm-scalability.throughput
390.45 +6.9% 417.32 vm-scalability.time.elapsed_time
390.45 +6.9% 417.32 vm-scalability.time.elapsed_time.max
45781 ± 2% +16.3% 53251 ± 2% vm-scalability.time.involuntary_context_switches
4.213e+09 +50.1% 6.325e+09 vm-scalability.time.maximum_resident_set_size
5.316e+08 +47.3% 7.83e+08 vm-scalability.time.minor_page_faults
6400 -8.0% 5890 vm-scalability.time.percent_of_cpu_this_job_got
21673 -10.2% 19455 vm-scalability.time.system_time
3319 +54.4% 5126 vm-scalability.time.user_time
2.321e+08 ± 2% +27.2% 2.953e+08 ± 5% vm-scalability.time.voluntary_context_switches
5.004e+09 +42.2% 7.116e+09 vm-scalability.workload
13110 +24.0% 16254 uptime.idle
1.16e+10 +24.5% 1.444e+10 cpuidle..time
2.648e+08 ± 3% +16.3% 3.079e+08 ± 5% cpuidle..usage
22.86 +6.3 29.17 mpstat.cpu.all.idle%
8.29 ± 5% -1.2 7.13 ± 7% mpstat.cpu.all.iowait%
58.63 -9.2 49.38 mpstat.cpu.all.sys%
9.05 +4.0 13.09 mpstat.cpu.all.usr%
8721571 ± 5% +44.8% 12630342 ± 2% numa-numastat.node0.local_node
8773210 ± 5% +44.8% 12706884 ± 2% numa-numastat.node0.numa_hit
7793725 ± 5% +51.3% 11793573 numa-numastat.node1.local_node
7842342 ± 5% +50.7% 11816543 numa-numastat.node1.numa_hit
23.17 +26.8% 29.37 vmstat.cpu.id
31295414 +50.9% 47211341 vmstat.memory.cache
95303378 -18.8% 77355720 vmstat.memory.free
1176885 ± 2% +19.2% 1402891 ± 3% vmstat.system.cs
194658 +5.4% 205149 ± 2% vmstat.system.in
9920198 ± 10% -48.9% 5071533 ± 15% turbostat.C1
0.51 ± 12% -0.3 0.21 ± 12% turbostat.C1%
1831098 ± 15% -72.0% 512888 ± 19% turbostat.C1E
0.14 ± 13% -0.1 0.06 ± 11% turbostat.C1E%
8736699 +36.3% 11905646 turbostat.C6
22.74 +6.3 29.02 turbostat.C6%
17.82 +25.5% 22.37 turbostat.CPU%c1
5.36 +28.2% 6.87 turbostat.CPU%c6
0.07 +42.9% 0.10 turbostat.IPC
77317703 +12.3% 86804635 ± 3% turbostat.IRQ
2.443e+08 ± 3% +18.9% 2.904e+08 ± 6% turbostat.POLL
4.80 +30.2% 6.24 turbostat.Pkg%pc2
266.73 -1.3% 263.33 turbostat.PkgWatt
0.00 -25.0% 0.00 perf-sched.sch_delay.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.06 ± 11% -21.8% 0.04 ± 9% perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
26.45 ± 9% -16.0% 22.21 ± 6% perf-sched.sch_delay.max.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.00 -25.0% 0.00 perf-sched.total_sch_delay.average.ms
106.37 ±167% -79.1% 22.21 ± 6% perf-sched.total_sch_delay.max.ms
0.46 ± 2% -16.0% 0.39 ± 5% perf-sched.total_wait_and_delay.average.ms
2202457 ± 2% +26.1% 2776824 ± 3% perf-sched.total_wait_and_delay.count.ms
0.45 ± 2% -15.9% 0.38 ± 5% perf-sched.total_wait_time.average.ms
0.02 ± 2% -19.8% 0.01 ± 2% perf-sched.wait_and_delay.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
494.65 ± 4% +10.6% 546.88 ± 3% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
2196122 ± 2% +26.1% 2770017 ± 3% perf-sched.wait_and_delay.count.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.01 ± 3% -19.5% 0.01 perf-sched.wait_time.avg.ms.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
494.63 ± 4% +10.6% 546.87 ± 3% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
0.22 ± 42% -68.8% 0.07 ±125% perf-sched.wait_time.max.ms.__cond_resched.stop_one_cpu.migrate_task_to.task_numa_migrate.isra
11445425 +82.1% 20837223 meminfo.Active
11444642 +82.1% 20836443 meminfo.Active(anon)
31218122 +51.0% 47138293 meminfo.Cached
30006048 +53.7% 46116816 meminfo.Committed_AS
17425032 +37.4% 23950392 meminfo.Inactive
17423257 +37.5% 23948613 meminfo.Inactive(anon)
164910 +21.8% 200913 meminfo.KReclaimable
26336530 +57.6% 41514589 meminfo.Mapped
94668993 -19.0% 76693589 meminfo.MemAvailable
95202238 -18.9% 77208832 meminfo.MemFree
36610737 +49.1% 54604143 meminfo.Memused
4072810 +50.1% 6114589 meminfo.PageTables
164910 +21.8% 200913 meminfo.SReclaimable
28535318 +55.8% 44455489 meminfo.Shmem
367289 +10.1% 404373 meminfo.Slab
37978157 +50.2% 57055526 meminfo.max_used_kB
2860756 +82.1% 5208445 proc-vmstat.nr_active_anon
2361286 -19.0% 1912151 proc-vmstat.nr_dirty_background_threshold
4728345 -19.0% 3828978 proc-vmstat.nr_dirty_threshold
7804148 +51.0% 11783823 proc-vmstat.nr_file_pages
23801109 -18.9% 19303173 proc-vmstat.nr_free_pages
4355690 +37.5% 5986921 proc-vmstat.nr_inactive_anon
6583645 +57.6% 10377790 proc-vmstat.nr_mapped
1018109 +50.1% 1528565 proc-vmstat.nr_page_table_pages
7133183 +55.8% 11112858 proc-vmstat.nr_shmem
41226 +21.8% 50226 proc-vmstat.nr_slab_reclaimable
2860756 +82.1% 5208445 proc-vmstat.nr_zone_active_anon
4355690 +37.5% 5986921 proc-vmstat.nr_zone_inactive_anon
112051 +3.8% 116273 proc-vmstat.numa_hint_faults
16618553 +47.6% 24525492 proc-vmstat.numa_hit
16518296 +47.9% 24425975 proc-vmstat.numa_local
11052273 +49.9% 16566743 proc-vmstat.pgactivate
16757533 +47.2% 24672644 proc-vmstat.pgalloc_normal
5.329e+08 +47.2% 7.844e+08 proc-vmstat.pgfault
16101786 +48.3% 23877738 proc-vmstat.pgfree
3302784 +6.0% 3500288 proc-vmstat.unevictable_pgs_scanned
6101287 ± 7% +81.3% 11062634 ± 3% numa-meminfo.node0.Active
6101026 ± 7% +81.3% 11062389 ± 3% numa-meminfo.node0.Active(anon)
17217355 ± 5% +46.3% 25196100 ± 3% numa-meminfo.node0.FilePages
9363213 ± 7% +31.9% 12347562 ± 2% numa-meminfo.node0.Inactive
9362621 ± 7% +31.9% 12347130 ± 2% numa-meminfo.node0.Inactive(anon)
14211196 ± 7% +51.2% 21487599 numa-meminfo.node0.Mapped
45879058 ± 2% -19.6% 36888633 ± 2% numa-meminfo.node0.MemFree
19925073 ± 5% +45.1% 28915498 ± 3% numa-meminfo.node0.MemUsed
2032891 +50.5% 3060344 numa-meminfo.node0.PageTables
15318197 ± 6% +52.0% 23276446 ± 2% numa-meminfo.node0.Shmem
5342463 ± 7% +82.9% 9769639 ± 4% numa-meminfo.node1.Active
5341941 ± 7% +82.9% 9769104 ± 4% numa-meminfo.node1.Active(anon)
13998966 ± 8% +56.6% 21919509 ± 3% numa-meminfo.node1.FilePages
8060699 ± 7% +43.7% 11584190 ± 2% numa-meminfo.node1.Inactive
8059515 ± 7% +43.7% 11582844 ± 2% numa-meminfo.node1.Inactive(anon)
12125745 ± 7% +65.0% 20005342 numa-meminfo.node1.Mapped
49326340 ± 2% -18.2% 40347902 ± 2% numa-meminfo.node1.MemFree
16682503 ± 7% +53.8% 25660941 ± 3% numa-meminfo.node1.MemUsed
2039529 +49.6% 3051247 numa-meminfo.node1.PageTables
13214266 ± 7% +60.1% 21155303 ± 2% numa-meminfo.node1.Shmem
156378 ± 13% +21.1% 189316 ± 9% numa-meminfo.node1.Slab
1525784 ± 7% +81.4% 2767183 ± 3% numa-vmstat.node0.nr_active_anon
4304756 ± 5% +46.4% 6302189 ± 3% numa-vmstat.node0.nr_file_pages
11469263 ± 2% -19.6% 9218468 ± 2% numa-vmstat.node0.nr_free_pages
2340569 ± 7% +32.0% 3088383 ± 2% numa-vmstat.node0.nr_inactive_anon
3553304 ± 7% +51.3% 5375214 numa-vmstat.node0.nr_mapped
508315 +50.6% 765564 numa-vmstat.node0.nr_page_table_pages
3829966 ± 6% +52.0% 5822276 ± 2% numa-vmstat.node0.nr_shmem
1525783 ± 7% +81.4% 2767184 ± 3% numa-vmstat.node0.nr_zone_active_anon
2340569 ± 7% +32.0% 3088382 ± 2% numa-vmstat.node0.nr_zone_inactive_anon
8773341 ± 5% +44.8% 12707017 ± 2% numa-vmstat.node0.numa_hit
8721702 ± 5% +44.8% 12630474 ± 2% numa-vmstat.node0.numa_local
1335910 ± 7% +82.9% 2443778 ± 4% numa-vmstat.node1.nr_active_anon
3500040 ± 8% +56.7% 5482887 ± 3% numa-vmstat.node1.nr_file_pages
12331163 ± 2% -18.2% 10083422 ± 2% numa-vmstat.node1.nr_free_pages
2014795 ± 7% +43.8% 2897243 ± 2% numa-vmstat.node1.nr_inactive_anon
3031806 ± 7% +65.1% 5004449 numa-vmstat.node1.nr_mapped
510000 +49.7% 763297 numa-vmstat.node1.nr_page_table_pages
3303865 ± 7% +60.2% 5291835 ± 2% numa-vmstat.node1.nr_shmem
1335910 ± 7% +82.9% 2443778 ± 4% numa-vmstat.node1.nr_zone_active_anon
2014795 ± 7% +43.8% 2897242 ± 2% numa-vmstat.node1.nr_zone_inactive_anon
7842425 ± 5% +50.7% 11816530 numa-vmstat.node1.numa_hit
7793808 ± 5% +51.3% 11793555 numa-vmstat.node1.numa_local
9505083 +21.3% 11532590 ± 3% sched_debug.cfs_rq:/.avg_vruntime.avg
9551715 +21.4% 11595502 ± 3% sched_debug.cfs_rq:/.avg_vruntime.max
9426050 +21.4% 11443528 ± 3% sched_debug.cfs_rq:/.avg_vruntime.min
19249 ± 4% +28.3% 24698 ± 10% sched_debug.cfs_rq:/.avg_vruntime.stddev
0.79 -30.7% 0.55 ± 8% sched_debug.cfs_rq:/.h_nr_running.avg
12458 ± 12% +70.8% 21277 ± 22% sched_debug.cfs_rq:/.load.avg
13767 ± 95% +311.7% 56677 ± 29% sched_debug.cfs_rq:/.load.stddev
9505083 +21.3% 11532590 ± 3% sched_debug.cfs_rq:/.min_vruntime.avg
9551715 +21.4% 11595502 ± 3% sched_debug.cfs_rq:/.min_vruntime.max
9426050 +21.4% 11443528 ± 3% sched_debug.cfs_rq:/.min_vruntime.min
19249 ± 4% +28.3% 24698 ± 10% sched_debug.cfs_rq:/.min_vruntime.stddev
0.78 -30.7% 0.54 ± 8% sched_debug.cfs_rq:/.nr_running.avg
170.67 -21.4% 134.10 ± 6% sched_debug.cfs_rq:/.removed.load_avg.max
708.55 -32.2% 480.43 ± 7% sched_debug.cfs_rq:/.runnable_avg.avg
1510 ± 3% -12.5% 1320 ± 4% sched_debug.cfs_rq:/.runnable_avg.max
219.68 ± 7% -12.7% 191.74 ± 5% sched_debug.cfs_rq:/.runnable_avg.stddev
707.51 -32.3% 479.05 ± 7% sched_debug.cfs_rq:/.util_avg.avg
1506 ± 3% -12.6% 1317 ± 4% sched_debug.cfs_rq:/.util_avg.max
219.64 ± 7% -13.0% 191.15 ± 5% sched_debug.cfs_rq:/.util_avg.stddev
564.18 ± 2% -32.4% 381.24 ± 8% sched_debug.cfs_rq:/.util_est_enqueued.avg
1168 ± 7% -14.8% 995.94 ± 7% sched_debug.cfs_rq:/.util_est_enqueued.max
235.45 ± 5% -21.4% 185.13 ± 7% sched_debug.cfs_rq:/.util_est_enqueued.stddev
149234 ± 5% +192.0% 435707 ± 10% sched_debug.cpu.avg_idle.avg
404765 ± 17% +47.3% 596259 ± 15% sched_debug.cpu.avg_idle.max
5455 ± 4% +3302.8% 185624 ± 34% sched_debug.cpu.avg_idle.min
201990 +24.9% 252309 ± 5% sched_debug.cpu.clock.avg
201997 +24.9% 252315 ± 5% sched_debug.cpu.clock.max
201983 +24.9% 252303 ± 5% sched_debug.cpu.clock.min
3.80 ± 2% -10.1% 3.42 ± 3% sched_debug.cpu.clock.stddev
200296 +24.8% 249952 ± 5% sched_debug.cpu.clock_task.avg
200541 +24.8% 250280 ± 5% sched_debug.cpu.clock_task.max
194086 +25.5% 243582 ± 5% sched_debug.cpu.clock_task.min
4069 -32.7% 2739 ± 8% sched_debug.cpu.curr->pid.avg
8703 +15.2% 10027 ± 3% sched_debug.cpu.curr->pid.max
0.00 ± 6% -27.2% 0.00 ± 5% sched_debug.cpu.next_balance.stddev
0.78 -32.7% 0.52 ± 8% sched_debug.cpu.nr_running.avg
0.33 ± 6% -13.9% 0.29 ± 5% sched_debug.cpu.nr_running.stddev
2372181 ± 2% +57.6% 3737590 ± 8% sched_debug.cpu.nr_switches.avg
2448893 ± 2% +58.5% 3880813 ± 8% sched_debug.cpu.nr_switches.max
2290032 ± 2% +55.9% 3570559 ± 8% sched_debug.cpu.nr_switches.min
36185 ± 10% +74.8% 63244 ± 8% sched_debug.cpu.nr_switches.stddev
0.10 ± 19% +138.0% 0.23 ± 19% sched_debug.cpu.nr_uninterruptible.avg
201984 +24.9% 252304 ± 5% sched_debug.cpu_clk
201415 +25.0% 251735 ± 5% sched_debug.ktime
202543 +24.8% 252867 ± 5% sched_debug.sched_clk
3.84 ± 2% -14.1% 3.30 ± 2% perf-stat.i.MPKI
1.679e+10 +30.1% 2.186e+10 perf-stat.i.branch-instructions
0.54 ± 2% -0.1 0.45 perf-stat.i.branch-miss-rate%
75872684 -2.6% 73927540 perf-stat.i.branch-misses
31.85 -1.1 30.75 perf-stat.i.cache-miss-rate%
1184992 ± 2% +19.1% 1411069 ± 3% perf-stat.i.context-switches
3.49 -29.3% 2.47 perf-stat.i.cpi
2.265e+11 -8.1% 2.081e+11 perf-stat.i.cpu-cycles
950.46 ± 3% -11.6% 840.03 ± 2% perf-stat.i.cycles-between-cache-misses
9514714 ± 12% +27.3% 12109471 ± 10% perf-stat.i.dTLB-load-misses
1.556e+10 +29.9% 2.022e+10 perf-stat.i.dTLB-loads
1575276 ± 5% +35.8% 2138868 ± 5% perf-stat.i.dTLB-store-misses
3.396e+09 +21.6% 4.129e+09 perf-stat.i.dTLB-stores
79.97 +2.8 82.74 perf-stat.i.iTLB-load-miss-rate%
4265612 +8.4% 4624960 ± 2% perf-stat.i.iTLB-load-misses
712599 ± 8% -38.4% 438645 ± 7% perf-stat.i.iTLB-loads
5.59e+10 +27.7% 7.137e+10 perf-stat.i.instructions
12120 +11.6% 13525 ± 2% perf-stat.i.instructions-per-iTLB-miss
0.35 +32.7% 0.46 perf-stat.i.ipc
0.04 ± 38% +119.0% 0.08 ± 33% perf-stat.i.major-faults
2.36 -8.1% 2.17 perf-stat.i.metric.GHz
863.69 +7.5% 928.37 perf-stat.i.metric.K/sec
378.76 +28.8% 487.87 perf-stat.i.metric.M/sec
1359089 +37.9% 1874285 perf-stat.i.minor-faults
84.30 -2.8 81.50 perf-stat.i.node-load-miss-rate%
89.54 -2.5 87.09 perf-stat.i.node-store-miss-rate%
1359089 +37.9% 1874285 perf-stat.i.page-faults
3.65 ± 3% -22.5% 2.82 ± 4% perf-stat.overall.MPKI
0.45 -0.1 0.34 perf-stat.overall.branch-miss-rate%
32.64 -1.7 30.98 perf-stat.overall.cache-miss-rate%
4.05 -28.0% 2.92 perf-stat.overall.cpi
1113 ± 3% -7.1% 1034 ± 3% perf-stat.overall.cycles-between-cache-misses
0.05 ± 5% +0.0 0.05 ± 5% perf-stat.overall.dTLB-store-miss-rate%
85.73 +5.6 91.37 perf-stat.overall.iTLB-load-miss-rate%
13110 ± 2% +17.8% 15440 ± 2% perf-stat.overall.instructions-per-iTLB-miss
0.25 +39.0% 0.34 perf-stat.overall.ipc
4378 -4.2% 4195 perf-stat.overall.path-length
1.679e+10 +30.2% 2.186e+10 perf-stat.ps.branch-instructions
75862675 -2.6% 73920168 perf-stat.ps.branch-misses
1184994 ± 2% +19.1% 1411192 ± 3% perf-stat.ps.context-switches
2.265e+11 -8.1% 2.082e+11 perf-stat.ps.cpu-cycles
9518014 ± 12% +27.3% 12118863 ± 10% perf-stat.ps.dTLB-load-misses
1.556e+10 +29.9% 2.022e+10 perf-stat.ps.dTLB-loads
1575414 ± 5% +35.8% 2139373 ± 5% perf-stat.ps.dTLB-store-misses
3.396e+09 +21.6% 4.129e+09 perf-stat.ps.dTLB-stores
4265139 +8.4% 4625090 ± 2% perf-stat.ps.iTLB-load-misses
711002 ± 8% -38.5% 437258 ± 7% perf-stat.ps.iTLB-loads
5.59e+10 +27.7% 7.137e+10 perf-stat.ps.instructions
0.04 ± 37% +118.9% 0.08 ± 33% perf-stat.ps.major-faults
1359186 +37.9% 1874615 perf-stat.ps.minor-faults
1359186 +37.9% 1874615 perf-stat.ps.page-faults
2.191e+13 +36.3% 2.986e+13 perf-stat.total.instructions
74.66 -6.7 67.93 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.do_access
74.61 -6.7 67.89 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
53.18 -6.3 46.88 perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
35.54 -6.1 29.43 perf-profile.calltrace.cycles-pp.next_uptodate_folio.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
76.49 -5.4 71.07 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.do_access
79.82 -3.9 75.89 perf-profile.calltrace.cycles-pp.do_access
70.02 -3.8 66.23 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
70.39 -3.7 66.70 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.do_access
68.31 -2.8 65.51 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
68.29 -2.8 65.50 perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
0.65 ± 7% -0.3 0.37 ± 71% perf-profile.calltrace.cycles-pp.finish_task_switch.__schedule.schedule.io_schedule.folio_wait_bit_common
1.94 ± 6% -0.2 1.71 ± 6% perf-profile.calltrace.cycles-pp.__schedule.schedule.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp
1.96 ± 6% -0.2 1.74 ± 6% perf-profile.calltrace.cycles-pp.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault
1.95 ± 6% -0.2 1.74 ± 6% perf-profile.calltrace.cycles-pp.schedule.io_schedule.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
0.86 +0.1 1.00 ± 2% perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.filemap_map_pages.do_read_fault.do_fault
0.56 +0.2 0.72 ± 4% perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
1.16 ± 3% +0.2 1.33 ± 2% perf-profile.calltrace.cycles-pp.set_pte_range.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault
0.71 ± 2% +0.2 0.92 ± 3% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary
0.78 +0.2 1.02 ± 4% perf-profile.calltrace.cycles-pp.flush_smp_call_function_queue.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
0.44 ± 44% +0.3 0.73 ± 3% perf-profile.calltrace.cycles-pp.set_pte_range.finish_fault.do_read_fault.do_fault.__handle_mm_fault
0.89 ± 9% +0.3 1.24 ± 8% perf-profile.calltrace.cycles-pp.finish_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
1.23 +0.4 1.59 perf-profile.calltrace.cycles-pp.irqentry_exit_to_user_mode.asm_exc_page_fault.do_access
0.18 ±141% +0.4 0.57 ± 5% perf-profile.calltrace.cycles-pp.try_to_wake_up.wake_page_function.__wake_up_common.folio_wake_bit.filemap_map_pages
1.50 +0.6 2.05 perf-profile.calltrace.cycles-pp.filemap_get_entry.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
0.00 +0.6 0.56 ± 4% perf-profile.calltrace.cycles-pp.wake_page_function.__wake_up_common.folio_wake_bit.do_read_fault.do_fault
0.09 ±223% +0.6 0.69 ± 4% perf-profile.calltrace.cycles-pp.__wake_up_common.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault
0.00 +0.6 0.60 perf-profile.calltrace.cycles-pp.folio_add_file_rmap_range.set_pte_range.finish_fault.do_read_fault.do_fault
2.98 ± 3% +0.7 3.66 ± 2% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault
3.39 ± 3% +0.8 4.21 perf-profile.calltrace.cycles-pp._raw_spin_lock_irq.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault
7.48 +0.9 8.41 perf-profile.calltrace.cycles-pp.folio_wait_bit_common.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault
2.25 ± 6% +1.0 3.30 ± 3% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_wake_bit.do_read_fault.do_fault
2.44 ± 5% +1.1 3.56 ± 2% perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault
3.11 ± 4% +1.4 4.52 perf-profile.calltrace.cycles-pp.folio_wake_bit.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
10.14 +1.9 12.06 perf-profile.calltrace.cycles-pp.shmem_get_folio_gfp.shmem_fault.__do_fault.do_read_fault.do_fault
10.26 +2.0 12.25 perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_read_fault.do_fault.__handle_mm_fault
10.29 +2.0 12.29 perf-profile.calltrace.cycles-pp.__do_fault.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault
9.69 +5.5 15.21 ± 2% perf-profile.calltrace.cycles-pp.do_rw_once
74.66 -6.7 67.94 perf-profile.children.cycles-pp.exc_page_fault
74.62 -6.7 67.90 perf-profile.children.cycles-pp.do_user_addr_fault
53.19 -6.3 46.89 perf-profile.children.cycles-pp.filemap_map_pages
35.56 -6.1 29.44 perf-profile.children.cycles-pp.next_uptodate_folio
76.51 -6.0 70.48 perf-profile.children.cycles-pp.asm_exc_page_fault
70.02 -3.8 66.24 perf-profile.children.cycles-pp.__handle_mm_fault
70.40 -3.7 66.71 perf-profile.children.cycles-pp.handle_mm_fault
81.33 -3.5 77.78 perf-profile.children.cycles-pp.do_access
68.32 -2.8 65.52 perf-profile.children.cycles-pp.do_fault
68.30 -2.8 65.50 perf-profile.children.cycles-pp.do_read_fault
2.07 ± 7% -2.0 0.12 ± 6% perf-profile.children.cycles-pp.down_read_trylock
1.28 ± 4% -1.1 0.16 ± 4% perf-profile.children.cycles-pp.up_read
0.65 ± 12% -0.4 0.28 ± 15% perf-profile.children.cycles-pp.intel_idle_irq
1.96 ± 6% -0.2 1.74 ± 6% perf-profile.children.cycles-pp.schedule
1.96 ± 6% -0.2 1.74 ± 6% perf-profile.children.cycles-pp.io_schedule
0.36 ± 7% -0.2 0.15 ± 3% perf-profile.children.cycles-pp.mtree_range_walk
0.30 ± 8% -0.2 0.13 ± 14% perf-profile.children.cycles-pp.mm_cid_get
0.12 ± 12% -0.1 0.03 ±100% perf-profile.children.cycles-pp.update_sg_lb_stats
0.16 ± 9% -0.1 0.07 ± 15% perf-profile.children.cycles-pp.load_balance
0.14 ± 10% -0.1 0.05 ± 46% perf-profile.children.cycles-pp.update_sd_lb_stats
0.20 ± 10% -0.1 0.11 ± 8% perf-profile.children.cycles-pp.newidle_balance
0.14 ± 10% -0.1 0.06 ± 17% perf-profile.children.cycles-pp.find_busiest_group
0.33 ± 6% -0.0 0.28 ± 5% perf-profile.children.cycles-pp.pick_next_task_fair
0.05 +0.0 0.06 perf-profile.children.cycles-pp.nohz_run_idle_balance
0.06 +0.0 0.08 ± 6% perf-profile.children.cycles-pp.__update_load_avg_se
0.04 ± 44% +0.0 0.06 perf-profile.children.cycles-pp.reweight_entity
0.09 ± 7% +0.0 0.11 ± 4% perf-profile.children.cycles-pp.xas_descend
0.08 ± 5% +0.0 0.10 ± 4% perf-profile.children.cycles-pp.update_curr
0.09 ± 7% +0.0 0.11 ± 3% perf-profile.children.cycles-pp.prepare_task_switch
0.10 ± 4% +0.0 0.12 ± 3% perf-profile.children.cycles-pp.call_function_single_prep_ipi
0.08 ± 4% +0.0 0.10 ± 5% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.04 ± 44% +0.0 0.06 ± 7% perf-profile.children.cycles-pp.sched_clock
0.13 ± 7% +0.0 0.16 ± 4% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.08 ± 6% +0.0 0.10 ± 3% perf-profile.children.cycles-pp.set_next_entity
0.16 ± 4% +0.0 0.19 ± 3% perf-profile.children.cycles-pp.__switch_to
0.09 ± 4% +0.0 0.12 ± 4% perf-profile.children.cycles-pp.llist_reverse_order
0.04 ± 44% +0.0 0.07 ± 5% perf-profile.children.cycles-pp.place_entity
0.14 ± 3% +0.0 0.16 ± 3% perf-profile.children.cycles-pp.llist_add_batch
0.09 ± 5% +0.0 0.12 ± 6% perf-profile.children.cycles-pp.available_idle_cpu
0.15 ± 4% +0.0 0.18 ± 4% perf-profile.children.cycles-pp.sysvec_call_function_single
0.08 ± 5% +0.0 0.12 ± 6% perf-profile.children.cycles-pp.wake_affine
0.08 +0.0 0.11 perf-profile.children.cycles-pp.__list_del_entry_valid_or_report
0.11 ± 4% +0.0 0.14 ± 3% perf-profile.children.cycles-pp.update_rq_clock_task
0.11 ± 4% +0.0 0.14 ± 4% perf-profile.children.cycles-pp.__switch_to_asm
0.04 ± 44% +0.0 0.07 ± 6% perf-profile.children.cycles-pp.folio_add_lru
0.06 ± 7% +0.0 0.10 ± 6% perf-profile.children.cycles-pp.shmem_add_to_page_cache
0.18 ± 5% +0.0 0.22 ± 4% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.02 ±141% +0.0 0.06 ± 6% perf-profile.children.cycles-pp.tick_nohz_idle_exit
0.12 ± 3% +0.0 0.17 ± 5% perf-profile.children.cycles-pp.select_task_rq_fair
0.13 ± 3% +0.0 0.18 ± 6% perf-profile.children.cycles-pp.select_task_rq
0.23 ± 3% +0.1 0.29 ± 3% perf-profile.children.cycles-pp.__smp_call_single_queue
0.20 ± 3% +0.1 0.26 ± 3% perf-profile.children.cycles-pp.update_load_avg
0.01 ±223% +0.1 0.07 ± 18% perf-profile.children.cycles-pp.shmem_alloc_and_acct_folio
0.26 ± 2% +0.1 0.34 ± 3% perf-profile.children.cycles-pp.dequeue_entity
0.29 ± 3% +0.1 0.37 ± 4% perf-profile.children.cycles-pp.dequeue_task_fair
0.17 ± 3% +0.1 0.26 ± 2% perf-profile.children.cycles-pp.sync_regs
0.34 ± 2% +0.1 0.42 ± 4% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.28 ± 3% +0.1 0.37 ± 4% perf-profile.children.cycles-pp.enqueue_entity
0.28 ± 3% +0.1 0.38 ± 6% perf-profile.children.cycles-pp.__perf_sw_event
0.32 ± 2% +0.1 0.42 ± 5% perf-profile.children.cycles-pp.___perf_sw_event
0.34 ± 3% +0.1 0.44 ± 4% perf-profile.children.cycles-pp.enqueue_task_fair
0.36 ± 2% +0.1 0.46 ± 3% perf-profile.children.cycles-pp.activate_task
0.24 ± 2% +0.1 0.35 perf-profile.children.cycles-pp.native_irq_return_iret
0.30 ± 6% +0.1 0.42 ± 10% perf-profile.children.cycles-pp.xas_load
0.31 +0.1 0.43 ± 3% perf-profile.children.cycles-pp.folio_unlock
0.44 ± 2% +0.1 0.56 ± 4% perf-profile.children.cycles-pp.ttwu_do_activate
0.40 ± 6% +0.2 0.56 ± 5% perf-profile.children.cycles-pp._compound_head
1.52 +0.2 1.68 ± 4% perf-profile.children.cycles-pp.wake_page_function
0.68 ± 3% +0.2 0.86 ± 4% perf-profile.children.cycles-pp.try_to_wake_up
0.66 ± 2% +0.2 0.84 ± 3% perf-profile.children.cycles-pp.sched_ttwu_pending
0.85 ± 2% +0.2 1.09 ± 3% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.79 ± 2% +0.2 1.03 ± 4% perf-profile.children.cycles-pp.flush_smp_call_function_queue
1.83 +0.3 2.08 ± 4% perf-profile.children.cycles-pp.__wake_up_common
1.29 +0.3 1.60 perf-profile.children.cycles-pp.folio_add_file_rmap_range
0.89 ± 9% +0.4 1.24 ± 8% perf-profile.children.cycles-pp.finish_fault
1.24 +0.4 1.60 perf-profile.children.cycles-pp.irqentry_exit_to_user_mode
1.68 ± 3% +0.4 2.06 ± 2% perf-profile.children.cycles-pp.set_pte_range
1.50 +0.6 2.06 perf-profile.children.cycles-pp.filemap_get_entry
3.42 ± 3% +0.8 4.24 perf-profile.children.cycles-pp._raw_spin_lock_irq
7.48 +0.9 8.41 perf-profile.children.cycles-pp.folio_wait_bit_common
9.67 ± 4% +1.4 11.07 ± 2% perf-profile.children.cycles-pp._raw_spin_lock_irqsave
12.08 ± 3% +1.8 13.84 perf-profile.children.cycles-pp.folio_wake_bit
10.15 +1.9 12.07 perf-profile.children.cycles-pp.shmem_get_folio_gfp
11.80 ± 4% +1.9 13.74 ± 2% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
10.26 +2.0 12.25 perf-profile.children.cycles-pp.shmem_fault
10.29 +2.0 12.29 perf-profile.children.cycles-pp.__do_fault
8.59 +5.3 13.94 ± 2% perf-profile.children.cycles-pp.do_rw_once
35.10 -6.1 28.98 ± 2% perf-profile.self.cycles-pp.next_uptodate_folio
2.06 ± 7% -1.9 0.11 ± 4% perf-profile.self.cycles-pp.down_read_trylock
1.28 ± 4% -1.1 0.16 ± 3% perf-profile.self.cycles-pp.up_read
1.66 ± 6% -1.0 0.68 ± 3% perf-profile.self.cycles-pp.__handle_mm_fault
7.20 -0.7 6.55 perf-profile.self.cycles-pp.filemap_map_pages
0.64 ± 12% -0.4 0.28 ± 15% perf-profile.self.cycles-pp.intel_idle_irq
0.36 ± 7% -0.2 0.15 perf-profile.self.cycles-pp.mtree_range_walk
0.30 ± 8% -0.2 0.13 ± 14% perf-profile.self.cycles-pp.mm_cid_get
0.71 ± 8% -0.1 0.59 ± 7% perf-profile.self.cycles-pp.__schedule
0.05 ± 8% +0.0 0.06 ± 7% perf-profile.self.cycles-pp.ttwu_do_activate
0.08 ± 5% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.do_idle
0.06 ± 6% +0.0 0.08 ± 6% perf-profile.self.cycles-pp.enqueue_task_fair
0.05 ± 8% +0.0 0.07 ± 8% perf-profile.self.cycles-pp.__update_load_avg_se
0.09 ± 5% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.xas_descend
0.04 ± 44% +0.0 0.06 perf-profile.self.cycles-pp.reweight_entity
0.05 ± 7% +0.0 0.07 ± 9% perf-profile.self.cycles-pp.set_pte_range
0.08 ± 6% +0.0 0.10 ± 5% perf-profile.self.cycles-pp.update_load_avg
0.10 ± 4% +0.0 0.12 ± 3% perf-profile.self.cycles-pp.call_function_single_prep_ipi
0.07 ± 5% +0.0 0.09 ± 5% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.08 ± 6% +0.0 0.10 ± 6% perf-profile.self.cycles-pp.flush_smp_call_function_queue
0.10 ± 4% +0.0 0.13 ± 2% perf-profile.self.cycles-pp.__flush_smp_call_function_queue
0.16 ± 4% +0.0 0.19 ± 3% perf-profile.self.cycles-pp.__switch_to
0.14 ± 3% +0.0 0.16 ± 3% perf-profile.self.cycles-pp.llist_add_batch
0.09 ± 5% +0.0 0.12 ± 6% perf-profile.self.cycles-pp.available_idle_cpu
0.08 ± 5% +0.0 0.12 ± 6% perf-profile.self.cycles-pp.enqueue_entity
0.08 ± 5% +0.0 0.12 ± 4% perf-profile.self.cycles-pp.llist_reverse_order
0.10 ± 4% +0.0 0.13 ± 3% perf-profile.self.cycles-pp.update_rq_clock_task
0.08 +0.0 0.11 perf-profile.self.cycles-pp.__list_del_entry_valid_or_report
0.11 ± 4% +0.0 0.14 ± 4% perf-profile.self.cycles-pp.__switch_to_asm
0.09 ± 5% +0.0 0.12 ± 8% perf-profile.self.cycles-pp.ttwu_queue_wakelist
0.12 ± 4% +0.0 0.16 ± 6% perf-profile.self.cycles-pp.xas_load
0.00 +0.1 0.05 perf-profile.self.cycles-pp.sched_ttwu_pending
0.00 +0.1 0.06 perf-profile.self.cycles-pp.asm_exc_page_fault
0.11 ± 4% +0.1 0.18 ± 4% perf-profile.self.cycles-pp.shmem_fault
0.17 ± 3% +0.1 0.26 ± 2% perf-profile.self.cycles-pp.sync_regs
0.31 ± 2% +0.1 0.40 ± 5% perf-profile.self.cycles-pp.___perf_sw_event
0.31 ± 2% +0.1 0.40 ± 3% perf-profile.self.cycles-pp.__wake_up_common
0.24 ± 2% +0.1 0.35 perf-profile.self.cycles-pp.native_irq_return_iret
0.31 +0.1 0.43 ± 3% perf-profile.self.cycles-pp.folio_unlock
0.44 ± 3% +0.1 0.57 ± 2% perf-profile.self.cycles-pp._raw_spin_lock_irq
0.68 ± 3% +0.1 0.83 ± 2% perf-profile.self.cycles-pp.folio_wake_bit
0.85 +0.2 1.00 ± 3% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.40 ± 5% +0.2 0.56 ± 5% perf-profile.self.cycles-pp._compound_head
1.29 +0.3 1.59 perf-profile.self.cycles-pp.folio_add_file_rmap_range
0.99 +0.3 1.30 ± 2% perf-profile.self.cycles-pp.shmem_get_folio_gfp
2.08 +0.3 2.39 ± 2% perf-profile.self.cycles-pp.folio_wait_bit_common
1.18 +0.4 1.55 perf-profile.self.cycles-pp.irqentry_exit_to_user_mode
1.43 +0.5 1.90 perf-profile.self.cycles-pp.filemap_get_entry
3.93 +1.9 5.85 perf-profile.self.cycles-pp.do_access
11.80 ± 4% +1.9 13.74 ± 2% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
6.55 +4.5 11.08 ± 2% perf-profile.self.cycles-pp.do_rw_once
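Reading the profile: the mmap_lock rwsem operations in the fault path (down_read_trylock/up_read) collapse from roughly 2%/1.3% of self time to noise, and the benchmark's own do_access/do_rw_once absorb the freed cycles. A quick way to spot-check the same effect on a reproduction machine, using plain perf rather than the LKP tooling (shm_read being the sketch above):

	gcc -O2 -pthread -o shm_read shm_read.c
	perf record -g -- ./shm_read
	perf report --stdio --sort symbol | grep -E 'down_read_trylock|up_read|filemap_map_pages'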
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki