diff for duplicates of <9FE19350E8A7EE45B64D8D63D368C8966B834B67@SHSMSX101.ccr.corp.intel.com>
-
-Some regressions and improvements were found by LKP-tools (Linux Kernel Performance) on the V9 patch series,
-tested on an Intel 4-socket Skylake platform.
-
-The regression results are sorted by the metric will-it-scale.per_thread_ops.
-Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch series)
-Commit id:
- base commit: d55f34411b1b126429a823d06c3124c16283231f
- head commit: 0355322b3577eeab7669066df42c550a56801110
-Benchmark suite: will-it-scale
-Download link:
-https://github.com/antonblanchard/will-it-scale/tree/master/tests
-Metrics:
- will-it-scale.per_process_ops=processes/nr_cpu
- will-it-scale.per_thread_ops=threads/nr_cpu
-test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)
-THP: enable / disable
-nr_task: 100%
-
-1. Regressions:
-a) THP enabled:
-testcase base change head metric
-page_fault3/ enable THP 10092 -17.5% 8323 will-it-scale.per_thread_ops
-page_fault2/ enable THP 8300 -17.2% 6869 will-it-scale.per_thread_ops
-brk1/ enable THP 957.67 -7.6% 885 will-it-scale.per_thread_ops
-page_fault3/ enable THP 172821 -5.3% 163692 will-it-scale.per_process_ops
-signal1/ enable THP 9125 -3.2% 8834 will-it-scale.per_process_ops
-
-b) THP disabled:
-testcase base change head metric
-page_fault3/ disable THP 10107 -19.1% 8180 will-it-scale.per_thread_ops
-page_fault2/ disable THP 8432 -17.8% 6931 will-it-scale.per_thread_ops
-context_switch1/ disable THP 215389 -6.8% 200776 will-it-scale.per_thread_ops
-brk1/ disable THP 939.67 -6.6% 877.33 will-it-scale.per_thread_ops
-page_fault3/ disable THP 173145 -4.7% 165064 will-it-scale.per_process_ops
-signal1/ disable THP 9162 -3.9% 8802 will-it-scale.per_process_ops
-
-2. Improvements:
-a) THP enabled:
-testcase base change head metric
-malloc1/ enable THP 66.33 +469.8% 383.67 will-it-scale.per_thread_ops
-writeseek3/ enable THP 2531 +4.5% 2646 will-it-scale.per_thread_ops
-signal1/ enable THP 989.33 +2.8% 1016 will-it-scale.per_thread_ops
-
-b) THP disabled:
-testcase base change head metric
-malloc1/ disable THP 90.33 +417.3% 467.33 will-it-scale.per_thread_ops
-read2/ disable THP 58934 +39.2% 82060 will-it-scale.per_thread_ops
-page_fault1/ disable THP 8607 +36.4% 11736 will-it-scale.per_thread_ops
-read1/ disable THP 314063 +12.7% 353934 will-it-scale.per_thread_ops
-writeseek3/ disable THP 2452 +12.5% 2759 will-it-scale.per_thread_ops
-signal1/ disable THP 971.33 +5.5% 1024 will-it-scale.per_thread_ops
-
-Notes: for the values in the "change" column above, a higher value means that the testcase result
-on the head commit is better than on the base commit for this benchmark.
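
As a quick illustration (not part of the LKP report), the values in the "change" column can be recomputed from the base and head numbers; the LKP tooling may round some rows slightly differently:

```python
def percent_change(base, head):
    """Relative change of the head-commit result versus the base commit."""
    return (head - base) / base * 100.0

# page_fault3 per_thread_ops with THP enabled: base 10092, head 8323
print("%.1f%%" % percent_change(10092, 8323))  # -17.5%
```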
-
-
-Best regards
-Haiyan Song
-
-________________________________________
-From: owner-linux-mm@kvack.org [owner-linux-mm@kvack.org] on behalf of Laurent Dufour [ldufour@linux.vnet.ibm.com]
-Sent: Thursday, May 17, 2018 7:06 PM
-To: akpm@linux-foundation.org; mhocko@kernel.org; peterz@infradead.org; kirill@shutemov.name; ak@linux.intel.com; dave@stgolabs.net; jack@suse.cz; Matthew Wilcox; khandual@linux.vnet.ibm.com; aneesh.kumar@linux.vnet.ibm.com; benh@kernel.crashing.org; mpe@ellerman.id.au; paulus@samba.org; Thomas Gleixner; Ingo Molnar; hpa@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.work@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi
-Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; haren@linux.vnet.ibm.com; npiggin@gmail.com; bsingharora@gmail.com; paulmck@linux.vnet.ibm.com; Tim Chen; linuxppc-dev@lists.ozlabs.org; x86@kernel.org
-Subject: [PATCH v11 00/26] Speculative page faults
-
-This is a port on kernel 4.17 of the work done by Peter Zijlstra to handle
-page faults without holding the mm semaphore [1].
-
-The idea is to try to handle user space page faults without holding the
-mmap_sem. This should allow better concurrency for massively threaded
-processes, since the page fault handler will no longer wait for other
-threads' memory layout changes to complete, assuming those changes are made
-in another part of the process's memory space. This type of page fault is
-named a speculative page fault. If the speculative page fault fails because
-concurrency is detected or because the underlying PMD or PTE tables are not
-yet allocated, its processing is aborted and a classic page fault is tried
-instead.
-
-The speculative page fault (SPF) handler has to look up the VMA matching
-the fault address without holding the mmap_sem. This is done by introducing
-a rwlock which protects access to the mm_rb tree. Previously this was done
-using SRCU, but that introduced a lot of scheduling work to process the VMA
-freeing operations, which hurt performance by 20% as reported by Kemi Wang
-[2]. Using a rwlock to protect access to the mm_rb tree limits the locking
-contention to these operations, which are expected to be O(log n). In
-addition, to ensure that the VMA is not freed behind our back, a reference
-count is added and two services (get_vma() and put_vma()) are introduced to
-handle it. Once a VMA is fetched from the RB tree using get_vma(), it must
-later be released using put_vma(). With this scheme, the overhead previously
-seen with the will-it-scale benchmark is no longer visible.
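
The get_vma()/put_vma() pairing described above can be sketched in userspace. This is an illustrative model only, not the kernel implementation; the class and field names are invented for the example, and only the names get/put mirror the services introduced by the series:

```python
import threading

class Vma:
    """Toy VMA with a reference count, mimicking the get_vma()/put_vma() scheme."""
    def __init__(self, start, end):
        self.start, self.end = start, end
        self._refcount = 1            # initial reference held by the mm_rb tree
        self._lock = threading.Lock()
        self.freed = False

    def get(self):
        # get_vma(): take a reference while the tree lock pins the VMA
        with self._lock:
            self._refcount += 1
        return self

    def put(self):
        # put_vma(): drop a reference; the last one frees the VMA
        with self._lock:
            self._refcount -= 1
            if self._refcount == 0:
                self.freed = True     # stands in for actually freeing the VMA

vma = Vma(0x1000, 0x2000)
handle = vma.get()          # fault handler takes a reference
vma.put()                   # the tree drops its reference (VMA removed from mm_rb)
assert not handle.freed     # still pinned by the fault handler
handle.put()
assert vma.freed            # last reference gone, VMA is freed
```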
-
-The VMA attributes checked during speculative page fault processing have to
-be protected against parallel changes. This is done using a per-VMA
-sequence lock, which allows the speculative page fault handler to quickly
-check for parallel changes in progress and to abort the speculative page
-fault in that case.
-
-Once the VMA has been found, the speculative page fault handler checks the
-VMA's attributes to verify whether the page fault can be handled. The VMA
-is protected through a sequence lock which allows fast detection of
-concurrent VMA changes. If such a change is detected, the speculative page
-fault is aborted and a *classic* page fault is tried instead. VMA sequence
-locking is added around modifications of the VMA attributes which are
-checked during the page fault.
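
The sequence-count check described above follows the usual read/retry pattern. A minimal single-process sketch of that pattern (the kernel uses seqcount_t; the names here are illustrative):

```python
class SeqCount:
    """Writer bumps the counter around updates; readers retry on change."""
    def __init__(self):
        self.seq = 0

    def write_begin(self):
        self.seq += 1        # odd value: a write is in progress

    def write_end(self):
        self.seq += 1        # even again: write complete

    def read_begin(self):
        return self.seq

    def read_retry(self, start):
        # Retry if a write was in progress at read_begin() or has happened since.
        return bool(start & 1) or self.seq != start

sc = SeqCount()
start = sc.read_begin()
assert not sc.read_retry(start)    # no concurrent change: speculative path may proceed
sc.write_begin(); sc.write_end()   # a protected VMA attribute was modified
assert sc.read_retry(start)        # stale read detected: fall back to a classic fault
```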
-
-When the PTE is fetched, the VMA is checked to see whether it has changed.
-Once the page table is locked, the VMA is known to be valid: any other
-change touching this PTE would need to take the page table lock, so no
-parallel change is possible at this point.
-
-The locking of the PTE is done with interrupts disabled; this allows
-checking the PMD to ensure that there is no ongoing collapse operation.
-Since khugepaged first sets the PMD to pmd_none and then waits for the
-other CPUs to have caught the IPI, if the PMD is valid at the time the PTE
-is locked, we have the guarantee that the collapse operation will have to
-wait on the PTE lock to move forward. This allows the SPF handler to map
-the PTE safely. If the PMD value is different from the one recorded at the
-beginning of the SPF operation, the classic page fault handler is called to
-handle the fault while holding the mmap_sem. As the PTE lock is taken with
-interrupts disabled, it is taken using spin_trylock() to avoid a deadlock
-when handling a page fault while a TLB invalidation is requested by another
-CPU holding the PTE lock.
-
-In pseudo code, this could be seen as:
- speculative_page_fault()
- {
- vma = get_vma()
- check vma sequence count
- check vma's support
- disable interrupt
- check pgd,p4d,...,pte
- save pmd and pte in vmf
- save vma sequence counter in vmf
- enable interrupt
- check vma sequence count
- handle_pte_fault(vma)
- ..
- page = alloc_page()
- pte_map_lock()
- disable interrupt
- abort if sequence counter has changed
- abort if pmd or pte has changed
- pte map and lock
- enable interrupt
- if abort
- free page
- abort
- ...
- }
-
- arch_fault_handler()
- {
- if (speculative_page_fault(&vma))
- goto done
- again:
- lock(mmap_sem)
- vma = find_vma();
- handle_pte_fault(vma);
- if retry
- unlock(mmap_sem)
- goto again;
- done:
- handle fault error
- }
-
-Support for THP is not done because, when checking the PMD, we could be
-confused by an in-progress collapse operation performed by khugepaged. The
-issue is that pmd_none() could be true either because the PMD is not yet
-populated or because the underlying PTEs are about to be collapsed. So we
-cannot safely allocate a PMD when pmd_none() is true.
-
-This series adds a new software performance event named 'speculative-faults'
-or 'spf'. It counts the number of page faults successfully handled
-speculatively. When recording 'faults,spf' events, 'faults' counts the
-total number of page fault events while 'spf' counts only the faults
-processed speculatively.
-
-Some trace events are introduced by this series. They make it possible to
-identify why page faults were not processed speculatively. They do not take
-into account faults generated by a single-threaded process, which are
-processed directly while holding the mmap_sem. These trace events are
-grouped in a system named 'pagefault':
- - pagefault:spf_vma_changed : the VMA has been changed behind our back
- - pagefault:spf_vma_noanon : the vma->anon_vma field was not yet set
- - pagefault:spf_vma_notsup : the VMA's type is not supported
- - pagefault:spf_vma_access : the VMA's access rights are not respected
- - pagefault:spf_pmd_changed : the upper PMD pointer has changed behind
-   our back
-
-To record all the related events, the easiest way is to run perf with the
-following arguments:
-$ perf stat -e 'faults,spf,pagefault:*' <command>
-
-There is also a dedicated vmstat counter showing the number of page faults
-successfully handled speculatively. It can be seen this way:
-$ grep speculative_pgfault /proc/vmstat
-
-This series builds on top of v4.16-mmotm-2018-04-13-17-28 and is functional
-on x86, PowerPC and arm64.
-
----------------------
-Real Workload results
-
-As mentioned in a previous email, we did unofficial runs using a "popular
-in-memory multithreaded database product" on a 176-core SMT8 Power system,
-which showed a 30% improvement in the number of transactions processed per
-second. This run was done on the v6 series, but the changes introduced in
-this new version should not impact the performance boost seen.
-
-Here are the perf data captured during 2 of these runs on top of the v8
-series:
- vanilla spf
-faults 89.418 101.364 +13%
-spf n/a 97.989
-
-With the SPF kernel, most of the page faults were processed speculatively.
-
-Ganesh Mahendran backported the series on top of a 4.9 kernel and gave it a
-try on an Android device. He reported that the application launch time was
-improved on average by 6%, and for large applications (~100 threads) by
-20%.
-
-Here are the launch times Ganesh measured on Android 8.0 on top of a Qcom
-MSM845 (8 cores) with 6GB of RAM (lower is better):
-
-Application 4.9 4.9+spf delta
-com.tencent.mm 416 389 -7%
-com.eg.android.AlipayGphone 1135 986 -13%
-com.tencent.mtt 455 454 0%
-com.qqgame.hlddz 1497 1409 -6%
-com.autonavi.minimap 711 701 -1%
-com.tencent.tmgp.sgame 788 748 -5%
-com.immomo.momo 501 487 -3%
-com.tencent.peng 2145 2112 -2%
-com.smile.gifmaker 491 461 -6%
-com.baidu.BaiduMap 479 366 -23%
-com.taobao.taobao 1341 1198 -11%
-com.baidu.searchbox 333 314 -6%
-com.tencent.mobileqq 394 384 -3%
-com.sina.weibo 907 906 0%
-com.youku.phone 816 731 -11%
-com.happyelements.AndroidAnimal.qq 763 717 -6%
-com.UCMobile 415 411 -1%
-com.tencent.tmgp.ak 1464 1431 -2%
-com.tencent.qqmusic 336 329 -2%
-com.sankuai.meituan 1661 1302 -22%
-com.netease.cloudmusic 1193 1200 1%
-air.tv.douyu.android 4257 4152 -2%
-
-------------------
-Benchmarks results
-
-Base kernel is v4.17.0-rc4-mm1
-SPF is BASE + this series
-
-Kernbench:
-----------
-Here are the results on a 16-CPU x86 guest using kernbench on a 4.15
-kernel (the kernel is built 5 times):
-
-Average Half load -j 8
- Run (std deviation)
- BASE SPF
-Elapsed Time 1448.65 (5.72312) 1455.84 (4.84951) 0.50%
-User Time 10135.4 (30.3699) 10148.8 (31.1252) 0.13%
-System Time 900.47 (2.81131) 923.28 (7.52779) 2.53%
-Percent CPU 761.4 (1.14018) 760.2 (0.447214) -0.16%
-Context Switches 85380 (3419.52) 84748 (1904.44) -0.74%
-Sleeps 105064 (1240.96) 105074 (337.612) 0.01%
-
-Average Optimal load -j 16
- Run (std deviation)
- BASE SPF
-Elapsed Time 920.528 (10.1212) 927.404 (8.91789) 0.75%
-User Time 11064.8 (981.142) 11085 (990.897) 0.18%
-System Time 979.904 (84.0615) 1001.14 (82.5523) 2.17%
-Percent CPU 1089.5 (345.894) 1086.1 (343.545) -0.31%
-Context Switches 159488 (78156.4) 158223 (77472.1) -0.79%
-Sleeps 110566 (5877.49) 110388 (5617.75) -0.16%
-
-
-During a run on the SPF kernel, perf events were captured:
- Performance counter stats for '../kernbench -M':
- 526743764 faults
- 210 spf
- 3 pagefault:spf_vma_changed
- 0 pagefault:spf_vma_noanon
- 2278 pagefault:spf_vma_notsup
- 0 pagefault:spf_vma_access
- 0 pagefault:spf_pmd_changed
-
-Very few speculative page faults were recorded, as most of the processes
-involved are single-threaded (it seems that on this architecture some
-threads were created during the kernel build process).
-
-Here are the kernbench results on an 80-CPU Power8 system:
-
-Average Half load -j 40
- Run (std deviation)
- BASE SPF
-Elapsed Time 117.152 (0.774642) 117.166 (0.476057) 0.01%
-User Time 4478.52 (24.7688) 4479.76 (9.08555) 0.03%
-System Time 131.104 (0.720056) 134.04 (0.708414) 2.24%
-Percent CPU 3934 (19.7104) 3937.2 (19.0184) 0.08%
-Context Switches 92125.4 (576.787) 92581.6 (198.622) 0.50%
-Sleeps 317923 (652.499) 318469 (1255.59) 0.17%
-
-Average Optimal load -j 80
- Run (std deviation)
- BASE SPF
-Elapsed Time 107.73 (0.632416) 107.31 (0.584936) -0.39%
-User Time 5869.86 (1466.72) 5871.71 (1467.27) 0.03%
-System Time 153.728 (23.8573) 157.153 (24.3704) 2.23%
-Percent CPU 5418.6 (1565.17) 5436.7 (1580.91) 0.33%
-Context Switches 223861 (138865) 225032 (139632) 0.52%
-Sleeps 330529 (13495.1) 332001 (14746.2) 0.45%
-
-During a run on the SPF kernel, perf events were captured:
- Performance counter stats for '../kernbench -M':
- 116730856 faults
- 0 spf
- 3 pagefault:spf_vma_changed
- 0 pagefault:spf_vma_noanon
- 476 pagefault:spf_vma_notsup
- 0 pagefault:spf_vma_access
- 0 pagefault:spf_pmd_changed
-
-Most of the processes involved are single-threaded, so SPF is not
-activated, but there is no impact on performance.
-
-Ebizzy:
--------
-The test counts the number of records per second it can manage; the
-higher, the better. I ran it as 'ebizzy -mTt <nrcpus>'. To get consistent
-results, I repeated the test 100 times and measured the average.
-
- BASE SPF delta
-16 CPUs x86 VM 742.57 1490.24 100.69%
-80 CPUs P8 node 13105.4 24174.23 84.46%
-
-Here are the performance counters read during a run on a 16-CPU x86 VM:
- Performance counter stats for './ebizzy -mTt 16':
- 1706379 faults
- 1674599 spf
- 30588 pagefault:spf_vma_changed
- 0 pagefault:spf_vma_noanon
- 363 pagefault:spf_vma_notsup
- 0 pagefault:spf_vma_access
- 0 pagefault:spf_pmd_changed
-
-And here are the ones captured during a run on an 80-CPU Power node:
- Performance counter stats for './ebizzy -mTt 80':
- 1874773 faults
- 1461153 spf
- 413293 pagefault:spf_vma_changed
- 0 pagefault:spf_vma_noanon
- 200 pagefault:spf_vma_notsup
- 0 pagefault:spf_vma_access
- 0 pagefault:spf_pmd_changed
-
-In ebizzy's case, most of the page faults were handled speculatively,
-leading to the ebizzy performance boost.
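
From the counters above, the fraction of faults completed via the speculative path can be computed directly (a simple sanity check on the reported numbers):

```python
def spf_ratio(spf, faults):
    """Percentage of page faults that completed via the speculative path."""
    return spf / faults * 100.0

# 16-CPU x86 VM run: 1674599 spf out of 1706379 faults
print("%.1f%%" % spf_ratio(1674599, 1706379))   # 98.1%
# 80-CPU Power node run: 1461153 spf out of 1874773 faults
print("%.1f%%" % spf_ratio(1461153, 1874773))   # 77.9%
```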
-
-------------------
-Changes since v10 (https://lkml.org/lkml/2018/4/17/572):
- - Accounted for all review feedback from Punit Agrawal, Ganesh Mahendran
-   and Minchan Kim, hopefully.
- - Removed an unneeded check on CONFIG_SPECULATIVE_PAGE_FAULT in
-   __do_page_fault().
- - Loop in pte_spinlock() and pte_map_lock() when the PTE try-lock fails,
-   instead of aborting the speculative page fault handling. Dropped the
-   now useless trace event pagefault:spf_pte_lock.
- - No longer try to reuse the fetched VMA during speculative page fault
-   handling when retrying is needed. This added a lot of complexity, and
-   additional tests did not show a significant performance improvement.
- - Converted IS_ENABLED(CONFIG_NUMA) back to #ifdef due to a build error.
-
-[1] http://linux-kernel.2935.n7.nabble.com/RFC-PATCH-0-6-Another-go-at-speculative-page-faults-tt965642.html#none
-[2] https://patchwork.kernel.org/patch/9999687/
-
-
-Laurent Dufour (20):
- mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT
- x86/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
- powerpc/mm: set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
- mm: introduce pte_spinlock for FAULT_FLAG_SPECULATIVE
- mm: make pte_unmap_same compatible with SPF
- mm: introduce INIT_VMA()
- mm: protect VMA modifications using VMA sequence count
- mm: protect mremap() against SPF handler
- mm: protect SPF handler against anon_vma changes
- mm: cache some VMA fields in the vm_fault structure
- mm/migrate: Pass vm_fault pointer to migrate_misplaced_page()
- mm: introduce __lru_cache_add_active_or_unevictable
- mm: introduce __vm_normal_page()
- mm: introduce __page_add_new_anon_rmap()
- mm: protect mm_rb tree with a rwlock
- mm: adding speculative page fault failure trace events
- perf: add a speculative page fault sw event
- perf tools: add support for the SPF perf event
- mm: add speculative page fault vmstats
- powerpc/mm: add speculative page fault
-
-Mahendran Ganesh (2):
- arm64/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
- arm64/mm: add speculative page fault
-
-Peter Zijlstra (4):
- mm: prepare for FAULT_FLAG_SPECULATIVE
- mm: VMA sequence count
- mm: provide speculative fault infrastructure
- x86/mm: add speculative pagefault handling
-
- arch/arm64/Kconfig | 1 +
- arch/arm64/mm/fault.c | 12 +
- arch/powerpc/Kconfig | 1 +
- arch/powerpc/mm/fault.c | 16 +
- arch/x86/Kconfig | 1 +
- arch/x86/mm/fault.c | 27 +-
- fs/exec.c | 2 +-
- fs/proc/task_mmu.c | 5 +-
- fs/userfaultfd.c | 17 +-
- include/linux/hugetlb_inline.h | 2 +-
- include/linux/migrate.h | 4 +-
- include/linux/mm.h | 136 +++++++-
- include/linux/mm_types.h | 7 +
- include/linux/pagemap.h | 4 +-
- include/linux/rmap.h | 12 +-
- include/linux/swap.h | 10 +-
- include/linux/vm_event_item.h | 3 +
- include/trace/events/pagefault.h | 80 +++++
- include/uapi/linux/perf_event.h | 1 +
- kernel/fork.c | 5 +-
- mm/Kconfig | 22 ++
- mm/huge_memory.c | 6 +-
- mm/hugetlb.c | 2 +
- mm/init-mm.c | 3 +
- mm/internal.h | 20 ++
- mm/khugepaged.c | 5 +
- mm/madvise.c | 6 +-
- mm/memory.c | 612 +++++++++++++++++++++++++++++-----
- mm/mempolicy.c | 51 ++-
- mm/migrate.c | 6 +-
- mm/mlock.c | 13 +-
- mm/mmap.c | 229 ++++++++++---
- mm/mprotect.c | 4 +-
- mm/mremap.c | 13 +
- mm/nommu.c | 2 +-
- mm/rmap.c | 5 +-
- mm/swap.c | 6 +-
- mm/swap_state.c | 8 +-
- mm/vmstat.c | 5 +-
- tools/include/uapi/linux/perf_event.h | 1 +
- tools/perf/util/evsel.c | 1 +
- tools/perf/util/parse-events.c | 4 +
- tools/perf/util/parse-events.l | 1 +
- tools/perf/util/python.c | 1 +
- 44 files changed, 1161 insertions(+), 211 deletions(-)
- create mode 100644 include/trace/events/pagefault.h
-
---
-2.7.4
\ No newline at end of file
+=0A=
+Some regression and improvements is found by LKP-tools(linux kernel perform=
+ance) on V9 patch series=0A=
+tested on Intel 4s Skylake platform.=0A=
+=0A=
+The regression result is sorted by the metric will-it-scale.per_thread_ops.=
+=0A=
+Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch se=
+ries)=0A=
+Commit id:=0A=
+ base commit: d55f34411b1b126429a823d06c3124c16283231f=0A=
+ head commit: 0355322b3577eeab7669066df42c550a56801110=0A=
+Benchmark suite: will-it-scale=0A=
+Download link:=0A=
+https://github.com/antonblanchard/will-it-scale/tree/master/tests=0A=
+Metrics:=0A=
+ will-it-scale.per_process_ops=3Dprocesses/nr_cpu=0A=
+ will-it-scale.per_thread_ops=3Dthreads/nr_cpu=0A=
+test box: lkp-skl-4sp1(nr_cpu=3D192,memory=3D768G)=0A=
+THP: enable / disable=0A=
+nr_task: 100%=0A=
+=0A=
+1. Regressions:=0A=
+a) THP enabled:=0A=
+testcase base change head =
+metric=0A=
+page_fault3/ enable THP 10092 -17.5% 8323 =
+will-it-scale.per_thread_ops=0A=
+page_fault2/ enable THP 8300 -17.2% 6869 =
+will-it-scale.per_thread_ops=0A=
+brk1/ enable THP 957.67 -7.6% 885 =
+will-it-scale.per_thread_ops=0A=
+page_fault3/ enable THP 172821 -5.3% 163692 =
+will-it-scale.per_process_ops=0A=
+signal1/ enable THP 9125 -3.2% 8834 =
+will-it-scale.per_process_ops=0A=
+=0A=
+b) THP disabled:=0A=
+testcase base change head =
+metric=0A=
+page_fault3/ disable THP 10107 -19.1% 8180 =
+will-it-scale.per_thread_ops=0A=
+page_fault2/ disable THP 8432 -17.8% 6931 =
+will-it-scale.per_thread_ops=0A=
+context_switch1/ disable THP 215389 -6.8% 200776 =
+will-it-scale.per_thread_ops=0A=
+brk1/ disable THP 939.67 -6.6% 877.33 =
+will-it-scale.per_thread_ops=0A=
+page_fault3/ disable THP 173145 -4.7% 165064 =
+will-it-scale.per_process_ops=0A=
+signal1/ disable THP 9162 -3.9% 8802 =
+will-it-scale.per_process_ops=0A=
+=0A=
+2. Improvements:=0A=
+a) THP enabled:=0A=
+testcase base change head =
+metric=0A=
+malloc1/ enable THP 66.33 +469.8% 383.67 =
+will-it-scale.per_thread_ops=0A=
+writeseek3/ enable THP 2531 +4.5% 2646 =
+will-it-scale.per_thread_ops=0A=
+signal1/ enable THP 989.33 +2.8% 1016 =
+will-it-scale.per_thread_ops=0A=
+=0A=
+b) THP disabled:=0A=
+testcase base change head =
+metric=0A=
+malloc1/ disable THP 90.33 +417.3% 467.33 =
+will-it-scale.per_thread_ops=0A=
+read2/ disable THP 58934 +39.2% 82060 =
+will-it-scale.per_thread_ops=0A=
+page_fault1/ disable THP 8607 +36.4% 11736 =
+will-it-scale.per_thread_ops=0A=
+read1/ disable THP 314063 +12.7% 353934 =
+will-it-scale.per_thread_ops=0A=
+writeseek3/ disable THP 2452 +12.5% 2759 =
+will-it-scale.per_thread_ops=0A=
+signal1/ disable THP 971.33 +5.5% 1024 =
+will-it-scale.per_thread_ops=0A=
+=0A=
+Notes: for above values in column "change", the higher value means that the=
+ related testcase result=0A=
+on head commit is better than that on base commit for this benchmark.=0A=
+=0A=
+=0A=
+Best regards=0A=
+Haiyan Song=0A=
+=0A=
+________________________________________=0A=
+From: owner-linux-mm@kvack.org [owner-linux-mm@kvack.org] on behalf of Laur=
+ent Dufour [ldufour@linux.vnet.ibm.com]=0A=
+Sent: Thursday, May 17, 2018 7:06 PM=0A=
+To: akpm@linux-foundation.org; mhocko@kernel.org; peterz@infradead.org; kir=
+ill@shutemov.name; ak@linux.intel.com; dave@stgolabs.net; jack@suse.cz; Mat=
+thew Wilcox; khandual@linux.vnet.ibm.com; aneesh.kumar@linux.vnet.ibm.com; =
+benh@kernel.crashing.org; mpe@ellerman.id.au; paulus@samba.org; Thomas Glei=
+xner; Ingo Molnar; hpa@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.s=
+enozhatsky.work@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi=
+; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan K=
+im; Punit Agrawal; vinayak menon; Yang Shi=0A=
+Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; haren@linux.vnet.ibm.=
+com; npiggin@gmail.com; bsingharora@gmail.com; paulmck@linux.vnet.ibm.com; =
+Tim Chen; linuxppc-dev@lists.ozlabs.org; x86@kernel.org=0A=
+Subject: [PATCH v11 00/26] Speculative page faults=0A=
+=0A=
+This is a port on kernel 4.17 of the work done by Peter Zijlstra to handle=
+=0A=
+page fault without holding the mm semaphore [1].=0A=
+=0A=
+The idea is to try to handle user space page faults without holding the=0A=
+mmap_sem. This should allow better concurrency for massively threaded=0A=
+process since the page fault handler will not wait for other threads memory=
+=0A=
+layout change to be done, assuming that this change is done in another part=
+=0A=
+of the process's memory space. This type page fault is named speculative=0A=
+page fault. If the speculative page fault fails because of a concurrency is=
+=0A=
+detected or because underlying PMD or PTE tables are not yet allocating, it=
+=0A=
+is failing its processing and a classic page fault is then tried.=0A=
+=0A=
+The speculative page fault (SPF) has to look for the VMA matching the fault=
+=0A=
+address without holding the mmap_sem, this is done by introducing a rwlock=
+=0A=
+which protects the access to the mm_rb tree. Previously this was done using=
+=0A=
+SRCU but it was introducing a lot of scheduling to process the VMA's=0A=
+freeing operation which was hitting the performance by 20% as reported by=
+=0A=
+Kemi Wang [2]. Using a rwlock to protect access to the mm_rb tree is=0A=
+limiting the locking contention to these operations which are expected to=
+=0A=
+be in a O(log n) order. In addition to ensure that the VMA is not freed in=
+=0A=
+our back a reference count is added and 2 services (get_vma() and=0A=
+put_vma()) are introduced to handle the reference count. Once a VMA is=0A=
+fetched from the RB tree using get_vma(), it must be later freed using=0A=
+put_vma(). I can't see anymore the overhead I got while will-it-scale=0A=
+benchmark anymore.=0A=
+=0A=
+The VMA's attributes checked during the speculative page fault processing=
+=0A=
+have to be protected against parallel changes. This is done by using a per=
+=0A=
+VMA sequence lock. This sequence lock allows the speculative page fault=0A=
+handler to fast check for parallel changes in progress and to abort the=0A=
+speculative page fault in that case.=0A=
+=0A=
+Once the VMA has been found, the speculative page fault handler would check=
+=0A=
+for the VMA's attributes to verify that the page fault has to be handled=0A=
+correctly or not. Thus, the VMA is protected through a sequence lock which=
+=0A=
+allows fast detection of concurrent VMA changes. If such a change is=0A=
+detected, the speculative page fault is aborted and a *classic* page fault=
+=0A=
+is tried. VMA sequence lockings are added when VMA attributes which are=0A=
+checked during the page fault are modified.=0A=
+=0A=
+When the PTE is fetched, the VMA is checked to see if it has been changed,=
+=0A=
+so once the page table is locked, the VMA is valid, so any other changes=0A=
+leading to touching this PTE will need to lock the page table, so no=0A=
+parallel change is possible at this time.=0A=
+=0A=
+The locking of the PTE is done with interrupts disabled, this allows=0A=
+checking for the PMD to ensure that there is not an ongoing collapsing=0A=
+operation. Since khugepaged is firstly set the PMD to pmd_none and then is=
+=0A=
+waiting for the other CPU to have caught the IPI interrupt, if the pmd is=
+=0A=
+valid at the time the PTE is locked, we have the guarantee that the=0A=
+collapsing operation will have to wait on the PTE lock to move forward.=0A=
+This allows the SPF handler to map the PTE safely. If the PMD value is=0A=
+different from the one recorded at the beginning of the SPF operation, the=
+=0A=
+classic page fault handler will be called to handle the operation while=0A=
+holding the mmap_sem. As the PTE lock is done with the interrupts disabled,=
+=0A=
+the lock is done using spin_trylock() to avoid dead lock when handling a=0A=
+page fault while a TLB invalidate is requested by another CPU holding the=
+=0A=
+PTE.=0A=
+=0A=
+In pseudo code, this could be seen as:=0A=
+ speculative_page_fault()=0A=
+ {=0A=
+ vma =3D get_vma()=0A=
+ check vma sequence count=0A=
+ check vma's support=0A=
+ disable interrupt=0A=
+ check pgd,p4d,...,pte=0A=
+ save pmd and pte in vmf=0A=
+ save vma sequence counter in vmf=0A=
+ enable interrupt=0A=
+ check vma sequence count=0A=
+ handle_pte_fault(vma)=0A=
+ ..=0A=
+ page =3D alloc_page()=0A=
+ pte_map_lock()=0A=
+ disable interrupt=0A=
+ abort if sequence counter has changed=
+=0A=
+ abort if pmd or pte has changed=0A=
+ pte map and lock=0A=
+ enable interrupt=0A=
+ if abort=0A=
+ free page=0A=
+ abort=0A=
+ ...=0A=
+ }=0A=
+=0A=
+ arch_fault_handler()=0A=
+ {=0A=
+ if (speculative_page_fault(&vma))=0A=
+ goto done=0A=
+ again:=0A=
+ lock(mmap_sem)=0A=
+ vma =3D find_vma();=0A=
+ handle_pte_fault(vma);=0A=
+ if retry=0A=
+ unlock(mmap_sem)=0A=
+ goto again;=0A=
+ done:=0A=
+ handle fault error=0A=
+ }=0A=
+=0A=
+Support for THP is not done because when checking for the PMD, we can be=0A=
+confused by an in progress collapsing operation done by khugepaged. The=0A=
+issue is that pmd_none() could be true either if the PMD is not already=0A=
+populated or if the underlying PTE are in the way to be collapsed. So we=0A=
+cannot safely allocate a PMD if pmd_none() is true.=0A=
+=0A=
+This series add a new software performance event named 'speculative-faults'=
+=0A=
+or 'spf'. It counts the number of successful page fault event handled=0A=
+speculatively. When recording 'faults,spf' events, the faults one is=0A=
+counting the total number of page fault events while 'spf' is only counting=
+=0A=
+the part of the faults processed speculatively.=0A=
+=0A=
+There are some trace events introduced by this series. They allow=0A=
+identifying why the page faults were not processed speculatively. This=0A=
+doesn't take in account the faults generated by a monothreaded process=0A=
+which directly processed while holding the mmap_sem. This trace events are=
+=0A=
+grouped in a system named 'pagefault', they are:=0A=
+ - pagefault:spf_vma_changed : if the VMA has been changed in our back=0A=
+ - pagefault:spf_vma_noanon : the vma->anon_vma field was not yet set.=0A=
+ - pagefault:spf_vma_notsup : the VMA's type is not supported=0A=
+ - pagefault:spf_vma_access : the VMA's access right are not respected=0A=
+ - pagefault:spf_pmd_changed : the upper PMD pointer has changed in our=0A=
+ back.=0A=
+=0A=
+To record all the related events, the easier is to run perf with the=0A=
+following arguments :=0A=
+$ perf stat -e 'faults,spf,pagefault:*' <command>=0A=
+=0A=
+There is also a dedicated vmstat counter showing the number of successful=
+=0A=
+page fault handled speculatively. I can be seen this way:=0A=
+$ grep speculative_pgfault /proc/vmstat=0A=
+=0A=
+This series builds on top of v4.16-mmotm-2018-04-13-17-28 and is functional=
+=0A=
+on x86, PowerPC and arm64.=0A=
+=0A=
+---------------------=0A=
+Real Workload results=0A=
+=0A=
+As mentioned in previous email, we did non official runs using a "popular=
+=0A=
+in memory multithreaded database product" on 176 cores SMT8 Power system=0A=
+which showed a 30% improvements in the number of transaction processed per=
+=0A=
+second. This run has been done on the v6 series, but changes introduced in=
+=0A=
+this new version should not impact the performance boost seen.=0A=
+=0A=
+Here are the perf data captured during 2 of these runs on top of the v8=0A=
+series:=0A=
+ vanilla spf=0A=
+faults 89.418 101.364 +13%=0A=
+spf n/a 97.989=0A=
+=0A=
+With the SPF kernel, most of the page fault were processed in a speculative=
+=0A=
+way.=0A=
+=0A=
+Ganesh Mahendran had backported the series on top of a 4.9 kernel and gave=
+=0A=
+it a try on an android device. He reported that the application launch time=
+=0A=
+was improved in average by 6%, and for large applications (~100 threads) by=
+=0A=
+20%.=0A=
+=0A=
+Here are the launch time Ganesh mesured on Android 8.0 on top of a Qcom=0A=
+MSM845 (8 cores) with 6GB (the less is better):=0A=
+=0A=
+Application                          4.9    4.9+spf   delta
+com.tencent.mm                       416       389     -7%
+com.eg.android.AlipayGphone         1135       986    -13%
+com.tencent.mtt                      455       454      0%
+com.qqgame.hlddz                    1497      1409     -6%
+com.autonavi.minimap                 711       701     -1%
+com.tencent.tmgp.sgame               788       748     -5%
+com.immomo.momo                      501       487     -3%
+com.tencent.peng                    2145      2112     -2%
+com.smile.gifmaker                   491       461     -6%
+com.baidu.BaiduMap                   479       366    -23%
+com.taobao.taobao                   1341      1198    -11%
+com.baidu.searchbox                  333       314     -6%
+com.tencent.mobileqq                 394       384     -3%
+com.sina.weibo                       907       906      0%
+com.youku.phone                      816       731    -11%
+com.happyelements.AndroidAnimal.qq   763       717     -6%
+com.UCMobile                         415       411     -1%
+com.tencent.tmgp.ak                 1464      1431     -2%
+com.tencent.qqmusic                  336       329     -2%
+com.sankuai.meituan                 1661      1302    -22%
+com.netease.cloudmusic              1193      1200     +1%
+air.tv.douyu.android                4257      4152     -2%
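The delta column is just the relative change between the two runs; it can be recomputed for any row with a short awk one-liner. The numbers below are taken from the com.tencent.mm row; note the original table rounds to whole percents, so its figures may differ slightly from this output.

```shell
# Recompute the launch-time delta for one row of the table above:
# delta = (spf - base) / base * 100, in percent.
base=416   # com.tencent.mm, 4.9 kernel
spf=389    # com.tencent.mm, 4.9+spf kernel
awk -v b="$base" -v s="$spf" \
    'BEGIN { printf "%.1f%%\n", (s - b) / b * 100 }'
# prints -6.5%
```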
+
+------------------
+Benchmarks results
+
+Base kernel is v4.17.0-rc4-mm1
+SPF is BASE + this series
+
+Kernbench:
+----------
+Here are the results on a 16 CPU x86 guest using kernbench on a 4.15
+kernel (the kernel is built 5 times):
+
+Average Half load -j 8
+                      Run (std deviation)
+                 BASE                   SPF
+Elapsed Time     1448.65 (5.72312)      1455.84 (4.84951)     0.50%
+User Time        10135.4 (30.3699)      10148.8 (31.1252)     0.13%
+System Time      900.47  (2.81131)      923.28  (7.52779)     2.53%
+Percent CPU      761.4   (1.14018)      760.2   (0.447214)   -0.16%
+Context Switches 85380   (3419.52)      84748   (1904.44)    -0.74%
+Sleeps           105064  (1240.96)      105074  (337.612)     0.01%
+
+Average Optimal load -j 16
+                      Run (std deviation)
+                 BASE                   SPF
+Elapsed Time     920.528 (10.1212)      927.404 (8.91789)     0.75%
+User Time        11064.8 (981.142)      11085   (990.897)     0.18%
+System Time      979.904 (84.0615)      1001.14 (82.5523)     2.17%
+Percent CPU      1089.5  (345.894)      1086.1  (343.545)    -0.31%
+Context Switches 159488  (78156.4)      158223  (77472.1)    -0.79%
+Sleeps           110566  (5877.49)      110388  (5617.75)    -0.16%
+
+
+During a run on the SPF kernel, perf events were captured:
+ Performance counter stats for '../kernbench -M':
+         526743764      faults
+               210      spf
+                 3      pagefault:spf_vma_changed
+                 0      pagefault:spf_vma_noanon
+              2278      pagefault:spf_vma_notsup
+                 0      pagefault:spf_vma_access
+                 0      pagefault:spf_pmd_changed
+
+Very few speculative page faults were recorded, as most of the processes
+involved are monothreaded (it seems that on this architecture some threads
+were created during the kernel build process).
+
+Here are the kernbench results on an 80 CPU Power8 system:
+
+Average Half load -j 40
+                      Run (std deviation)
+                 BASE                   SPF
+Elapsed Time     117.152 (0.774642)     117.166 (0.476057)    0.01%
+User Time        4478.52 (24.7688)      4479.76 (9.08555)     0.03%
+System Time      131.104 (0.720056)     134.04  (0.708414)    2.24%
+Percent CPU      3934    (19.7104)      3937.2  (19.0184)     0.08%
+Context Switches 92125.4 (576.787)      92581.6 (198.622)     0.50%
+Sleeps           317923  (652.499)      318469  (1255.59)     0.17%
+
+Average Optimal load -j 80
+                      Run (std deviation)
+                 BASE                   SPF
+Elapsed Time     107.73  (0.632416)     107.31  (0.584936)   -0.39%
+User Time        5869.86 (1466.72)      5871.71 (1467.27)     0.03%
+System Time      153.728 (23.8573)      157.153 (24.3704)     2.23%
+Percent CPU      5418.6  (1565.17)      5436.7  (1580.91)     0.33%
+Context Switches 223861  (138865)       225032  (139632)      0.52%
+Sleeps           330529  (13495.1)      332001  (14746.2)     0.45%
+
+During a run on the SPF kernel, perf events were captured:
+ Performance counter stats for '../kernbench -M':
+         116730856      faults
+                 0      spf
+                 3      pagefault:spf_vma_changed
+                 0      pagefault:spf_vma_noanon
+               476      pagefault:spf_vma_notsup
+                 0      pagefault:spf_vma_access
+                 0      pagefault:spf_pmd_changed
+
+Most of the processes involved are monothreaded, so SPF is not activated,
+but there is no impact on the performance.
+
+Ebizzy:
+-------
+The test counts the number of records it can manage per second; higher is
+better. I ran it as 'ebizzy -mTt <nrcpus>'. To get consistent results, I
+repeated the test 100 times and measured the average number of records
+processed per second.
+
+                    BASE        SPF      delta
+16 CPUs x86 VM    742.57    1490.24    100.69%
+80 CPUs P8 node  13105.4   24174.23     84.46%
+
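The 100-run averaging described above can be sketched as a small shell pipeline. The exact ebizzy output parsing is an assumption, so three stand-in per-run values are used here to keep the sketch self-contained:

```shell
# Average the records/s reported over repeated benchmark runs.
# In the real setup each value would come from one 'ebizzy -mTt <nrcpus>'
# run; the three numbers below are stand-ins for per-run results.
results="742 1490 901"
printf '%s\n' $results |
    awk '{ sum += $1; n++ } END { printf "%.2f\n", sum / n }'
# prints 1044.33
```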
+Here are the performance counters read during a run on a 16 CPU x86 VM:
+ Performance counter stats for './ebizzy -mTt 16':
+           1706379      faults
+           1674599      spf
+             30588      pagefault:spf_vma_changed
+                 0      pagefault:spf_vma_noanon
+               363      pagefault:spf_vma_notsup
+                 0      pagefault:spf_vma_access
+                 0      pagefault:spf_pmd_changed
+
+And the ones captured during a run on an 80 CPU Power node:
+ Performance counter stats for './ebizzy -mTt 80':
+           1874773      faults
+           1461153      spf
+            413293      pagefault:spf_vma_changed
+                 0      pagefault:spf_vma_noanon
+               200      pagefault:spf_vma_notsup
+                 0      pagefault:spf_vma_access
+                 0      pagefault:spf_pmd_changed
+
+In ebizzy's case most of the page faults were handled in a speculative way,
+leading to the ebizzy performance boost.
+
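The counters above make that fraction easy to check; for the 16 CPU x86 VM run, dividing the spf count by the total fault count gives:

```shell
# Share of page faults handled speculatively in the 16 CPU x86 VM run:
# spf events divided by total fault events, from the counters above.
awk 'BEGIN { printf "%.1f%%\n", 1674599 / 1706379 * 100 }'
# prints 98.1%
```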
+------------------
+Changes since v10 (https://lkml.org/lkml/2018/4/17/572):
+ - Accounted for all review feedback from Punit Agrawal, Ganesh Mahendran
+   and Minchan Kim, hopefully.
+ - Remove an unneeded check on CONFIG_SPECULATIVE_PAGE_FAULT in
+   __do_page_fault().
+ - Loop in pte_spinlock() and pte_map_lock() when the pte try lock fails,
+   instead of aborting the speculative page fault handling. Drop the now
+   useless trace event pagefault:spf_pte_lock.
+ - No longer try to reuse the fetched VMA during the speculative page fault
+   handling when retrying is needed. This added a lot of complexity and
+   additional tests didn't show a significant performance improvement.
+ - Convert IS_ENABLED(CONFIG_NUMA) back to #ifdef due to a build error.
+
+[1] http://linux-kernel.2935.n7.nabble.com/RFC-PATCH-0-6-Another-go-at-speculative-page-faults-tt965642.html#none
+[2] https://patchwork.kernel.org/patch/9999687/
+
+
+Laurent Dufour (20):
+ mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT
+ x86/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
+ powerpc/mm: set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
+ mm: introduce pte_spinlock for FAULT_FLAG_SPECULATIVE
+ mm: make pte_unmap_same compatible with SPF
+ mm: introduce INIT_VMA()
+ mm: protect VMA modifications using VMA sequence count
+ mm: protect mremap() against SPF hanlder
+ mm: protect SPF handler against anon_vma changes
+ mm: cache some VMA fields in the vm_fault structure
+ mm/migrate: Pass vm_fault pointer to migrate_misplaced_page()
+ mm: introduce __lru_cache_add_active_or_unevictable
+ mm: introduce __vm_normal_page()
+ mm: introduce __page_add_new_anon_rmap()
+ mm: protect mm_rb tree with a rwlock
+ mm: adding speculative page fault failure trace events
+ perf: add a speculative page fault sw event
+ perf tools: add support for the SPF perf event
+ mm: add speculative page fault vmstats
+ powerpc/mm: add speculative page fault
+
+Mahendran Ganesh (2):
+ arm64/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
+ arm64/mm: add speculative page fault
+
+Peter Zijlstra (4):
+ mm: prepare for FAULT_FLAG_SPECULATIVE
+ mm: VMA sequence count
+ mm: provide speculative fault infrastructure
+ x86/mm: add speculative pagefault handling
+
+ arch/arm64/Kconfig                    |   1 +
+ arch/arm64/mm/fault.c                 |  12 +
+ arch/powerpc/Kconfig                  |   1 +
+ arch/powerpc/mm/fault.c               |  16 +
+ arch/x86/Kconfig                      |   1 +
+ arch/x86/mm/fault.c                   |  27 +-
+ fs/exec.c                             |   2 +-
+ fs/proc/task_mmu.c                    |   5 +-
+ fs/userfaultfd.c                      |  17 +-
+ include/linux/hugetlb_inline.h        |   2 +-
+ include/linux/migrate.h               |   4 +-
+ include/linux/mm.h                    | 136 +++++++-
+ include/linux/mm_types.h              |   7 +
+ include/linux/pagemap.h               |   4 +-
+ include/linux/rmap.h                  |  12 +-
+ include/linux/swap.h                  |  10 +-
+ include/linux/vm_event_item.h         |   3 +
+ include/trace/events/pagefault.h      |  80 +++++
+ include/uapi/linux/perf_event.h       |   1 +
+ kernel/fork.c                         |   5 +-
+ mm/Kconfig                            |  22 ++
+ mm/huge_memory.c                      |   6 +-
+ mm/hugetlb.c                          |   2 +
+ mm/init-mm.c                          |   3 +
+ mm/internal.h                         |  20 ++
+ mm/khugepaged.c                       |   5 +
+ mm/madvise.c                          |   6 +-
+ mm/memory.c                           | 612 +++++++++++++++++++++++++++++-----
+ mm/mempolicy.c                        |  51 ++-
+ mm/migrate.c                          |   6 +-
+ mm/mlock.c                            |  13 +-
+ mm/mmap.c                             | 229 ++++++++++---
+ mm/mprotect.c                         |   4 +-
+ mm/mremap.c                           |  13 +
+ mm/nommu.c                            |   2 +-
+ mm/rmap.c                             |   5 +-
+ mm/swap.c                             |   6 +-
+ mm/swap_state.c                       |   8 +-
+ mm/vmstat.c                           |   5 +-
+ tools/include/uapi/linux/perf_event.h |   1 +
+ tools/perf/util/evsel.c               |   1 +
+ tools/perf/util/parse-events.c        |   4 +
+ tools/perf/util/parse-events.l        |   1 +
+ tools/perf/util/python.c              |   1 +
+ 44 files changed, 1161 insertions(+), 211 deletions(-)
+ create mode 100644 include/trace/events/pagefault.h
+
+--
+2.7.4
+
\ No newline at end of file
diff --git a/a/content_digest b/N1/content_digest
index 6725bb9..310254b 100644
--- a/a/content_digest
+++ b/N1/content_digest
@@ -62,476 +62,557 @@
"b\0"
]
[
- "\n",
- "Some regression and improvements is found by LKP-tools(linux kernel performance) on V9 patch series\n",
- "tested on Intel 4s Skylake platform.\n",
- "\n",
- "The regression result is sorted by the metric will-it-scale.per_thread_ops.\n",
- "Branch: Laurent-Dufour/Speculative-page-faults/20180316-151833 (V9 patch series)\n",
- "Commit id:\n",
- " base commit: d55f34411b1b126429a823d06c3124c16283231f\n",
- " head commit: 0355322b3577eeab7669066df42c550a56801110\n",
- "Benchmark suite: will-it-scale\n",
- "Download link:\n",
- "https://github.com/antonblanchard/will-it-scale/tree/master/tests\n",
- "Metrics:\n",
- " will-it-scale.per_process_ops=processes/nr_cpu\n",
- " will-it-scale.per_thread_ops=threads/nr_cpu\n",
- "test box: lkp-skl-4sp1(nr_cpu=192,memory=768G)\n",
- "THP: enable / disable\n",
- "nr_task: 100%\n",
- "\n",
- "1. Regressions:\n",
- "a) THP enabled:\n",
- "testcase base change head metric\n",
- "page_fault3/ enable THP 10092 -17.5% 8323 will-it-scale.per_thread_ops\n",
- "page_fault2/ enable THP 8300 -17.2% 6869 will-it-scale.per_thread_ops\n",
- "brk1/ enable THP 957.67 -7.6% 885 will-it-scale.per_thread_ops\n",
- "page_fault3/ enable THP 172821 -5.3% 163692 will-it-scale.per_process_ops\n",
- "signal1/ enable THP 9125 -3.2% 8834 will-it-scale.per_process_ops\n",
- "\n",
- "b) THP disabled:\n",
- "testcase base change head metric\n",
- "page_fault3/ disable THP 10107 -19.1% 8180 will-it-scale.per_thread_ops\n",
- "page_fault2/ disable THP 8432 -17.8% 6931 will-it-scale.per_thread_ops\n",
- "context_switch1/ disable THP 215389 -6.8% 200776 will-it-scale.per_thread_ops\n",
- "brk1/ disable THP 939.67 -6.6% 877.33 will-it-scale.per_thread_ops\n",
- "page_fault3/ disable THP 173145 -4.7% 165064 will-it-scale.per_process_ops\n",
- "signal1/ disable THP 9162 -3.9% 8802 will-it-scale.per_process_ops\n",
- "\n",
- "2. Improvements:\n",
- "a) THP enabled:\n",
- "testcase base change head metric\n",
- "malloc1/ enable THP 66.33 +469.8% 383.67 will-it-scale.per_thread_ops\n",
- "writeseek3/ enable THP 2531 +4.5% 2646 will-it-scale.per_thread_ops\n",
- "signal1/ enable THP 989.33 +2.8% 1016 will-it-scale.per_thread_ops\n",
- "\n",
- "b) THP disabled:\n",
- "testcase base change head metric\n",
- "malloc1/ disable THP 90.33 +417.3% 467.33 will-it-scale.per_thread_ops\n",
- "read2/ disable THP 58934 +39.2% 82060 will-it-scale.per_thread_ops\n",
- "page_fault1/ disable THP 8607 +36.4% 11736 will-it-scale.per_thread_ops\n",
- "read1/ disable THP 314063 +12.7% 353934 will-it-scale.per_thread_ops\n",
- "writeseek3/ disable THP 2452 +12.5% 2759 will-it-scale.per_thread_ops\n",
- "signal1/ disable THP 971.33 +5.5% 1024 will-it-scale.per_thread_ops\n",
- "\n",
- "Notes: for above values in column \"change\", the higher value means that the related testcase result\n",
- "on head commit is better than that on base commit for this benchmark.\n",
- "\n",
- "\n",
- "Best regards\n",
- "Haiyan Song\n",
- "\n",
- "________________________________________\n",
- "From: owner-linux-mm\@kvack.org [owner-linux-mm\@kvack.org] on behalf of Laurent Dufour [ldufour\@linux.vnet.ibm.com]\n",
- "Sent: Thursday, May 17, 2018 7:06 PM\n",
- "To: akpm\@linux-foundation.org; mhocko\@kernel.org; peterz\@infradead.org; kirill\@shutemov.name; ak\@linux.intel.com; dave\@stgolabs.net; jack\@suse.cz; Matthew Wilcox; khandual\@linux.vnet.ibm.com; aneesh.kumar\@linux.vnet.ibm.com; benh\@kernel.crashing.org; mpe\@ellerman.id.au; paulus\@samba.org; Thomas Gleixner; Ingo Molnar; hpa\@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.work\@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi\n",
- "Cc: linux-kernel\@vger.kernel.org; linux-mm\@kvack.org; haren\@linux.vnet.ibm.com; npiggin\@gmail.com; bsingharora\@gmail.com; paulmck\@linux.vnet.ibm.com; Tim Chen; linuxppc-dev\@lists.ozlabs.org; x86\@kernel.org\n",
- "Subject: [PATCH v11 00/26] Speculative page faults\n",
- "\n",
- "This is a port on kernel 4.17 of the work done by Peter Zijlstra to handle\n",
- "page fault without holding the mm semaphore [1].\n",
- "\n",
- "The idea is to try to handle user space page faults without holding the\n",
- "mmap_sem. This should allow better concurrency for massively threaded\n",
- "process since the page fault handler will not wait for other threads memory\n",
- "layout change to be done, assuming that this change is done in another part\n",
- "of the process's memory space. This type page fault is named speculative\n",
- "page fault. If the speculative page fault fails because of a concurrency is\n",
- "detected or because underlying PMD or PTE tables are not yet allocating, it\n",
- "is failing its processing and a classic page fault is then tried.\n",
- "\n",
- "The speculative page fault (SPF) has to look for the VMA matching the fault\n",
- "address without holding the mmap_sem, this is done by introducing a rwlock\n",
- "which protects the access to the mm_rb tree. Previously this was done using\n",
- "SRCU but it was introducing a lot of scheduling to process the VMA's\n",
- "freeing operation which was hitting the performance by 20% as reported by\n",
- "Kemi Wang [2]. Using a rwlock to protect access to the mm_rb tree is\n",
- "limiting the locking contention to these operations which are expected to\n",
- "be in a O(log n) order. In addition to ensure that the VMA is not freed in\n",
- "our back a reference count is added and 2 services (get_vma() and\n",
- "put_vma()) are introduced to handle the reference count. Once a VMA is\n",
- "fetched from the RB tree using get_vma(), it must be later freed using\n",
- "put_vma(). I can't see anymore the overhead I got while will-it-scale\n",
- "benchmark anymore.\n",
- "\n",
- "The VMA's attributes checked during the speculative page fault processing\n",
- "have to be protected against parallel changes. This is done by using a per\n",
- "VMA sequence lock. This sequence lock allows the speculative page fault\n",
- "handler to fast check for parallel changes in progress and to abort the\n",
- "speculative page fault in that case.\n",
- "\n",
- "Once the VMA has been found, the speculative page fault handler would check\n",
- "for the VMA's attributes to verify that the page fault has to be handled\n",
- "correctly or not. Thus, the VMA is protected through a sequence lock which\n",
- "allows fast detection of concurrent VMA changes. If such a change is\n",
- "detected, the speculative page fault is aborted and a *classic* page fault\n",
- "is tried. VMA sequence lockings are added when VMA attributes which are\n",
- "checked during the page fault are modified.\n",
- "\n",
- "When the PTE is fetched, the VMA is checked to see if it has been changed,\n",
- "so once the page table is locked, the VMA is valid, so any other changes\n",
- "leading to touching this PTE will need to lock the page table, so no\n",
- "parallel change is possible at this time.\n",
- "\n",
- "The locking of the PTE is done with interrupts disabled, this allows\n",
- "checking for the PMD to ensure that there is not an ongoing collapsing\n",
- "operation. Since khugepaged is firstly set the PMD to pmd_none and then is\n",
- "waiting for the other CPU to have caught the IPI interrupt, if the pmd is\n",
- "valid at the time the PTE is locked, we have the guarantee that the\n",
- "collapsing operation will have to wait on the PTE lock to move forward.\n",
- "This allows the SPF handler to map the PTE safely. If the PMD value is\n",
- "different from the one recorded at the beginning of the SPF operation, the\n",
- "classic page fault handler will be called to handle the operation while\n",
- "holding the mmap_sem. As the PTE lock is done with the interrupts disabled,\n",
- "the lock is done using spin_trylock() to avoid dead lock when handling a\n",
- "page fault while a TLB invalidate is requested by another CPU holding the\n",
- "PTE.\n",
- "\n",
- "In pseudo code, this could be seen as:\n",
- " speculative_page_fault()\n",
- " {\n",
- " vma = get_vma()\n",
- " check vma sequence count\n",
- " check vma's support\n",
- " disable interrupt\n",
- " check pgd,p4d,...,pte\n",
- " save pmd and pte in vmf\n",
- " save vma sequence counter in vmf\n",
- " enable interrupt\n",
- " check vma sequence count\n",
- " handle_pte_fault(vma)\n",
- " ..\n",
- " page = alloc_page()\n",
- " pte_map_lock()\n",
- " disable interrupt\n",
- " abort if sequence counter has changed\n",
- " abort if pmd or pte has changed\n",
- " pte map and lock\n",
- " enable interrupt\n",
- " if abort\n",
- " free page\n",
- " abort\n",
- " ...\n",
- " }\n",
- "\n",
- " arch_fault_handler()\n",
- " {\n",
- " if (speculative_page_fault(&vma))\n",
- " goto done\n",
- " again:\n",
- " lock(mmap_sem)\n",
- " vma = find_vma();\n",
- " handle_pte_fault(vma);\n",
- " if retry\n",
- " unlock(mmap_sem)\n",
- " goto again;\n",
- " done:\n",
- " handle fault error\n",
- " }\n",
- "\n",
- "Support for THP is not done because when checking for the PMD, we can be\n",
- "confused by an in progress collapsing operation done by khugepaged. The\n",
- "issue is that pmd_none() could be true either if the PMD is not already\n",
- "populated or if the underlying PTE are in the way to be collapsed. So we\n",
- "cannot safely allocate a PMD if pmd_none() is true.\n",
- "\n",
- "This series add a new software performance event named 'speculative-faults'\n",
- "or 'spf'. It counts the number of successful page fault event handled\n",
- "speculatively. When recording 'faults,spf' events, the faults one is\n",
- "counting the total number of page fault events while 'spf' is only counting\n",
- "the part of the faults processed speculatively.\n",
- "\n",
- "There are some trace events introduced by this series. They allow\n",
- "identifying why the page faults were not processed speculatively. This\n",
- "doesn't take in account the faults generated by a monothreaded process\n",
- "which directly processed while holding the mmap_sem. This trace events are\n",
- "grouped in a system named 'pagefault', they are:\n",
- " - pagefault:spf_vma_changed : if the VMA has been changed in our back\n",
- " - pagefault:spf_vma_noanon : the vma->anon_vma field was not yet set.\n",
- " - pagefault:spf_vma_notsup : the VMA's type is not supported\n",
- " - pagefault:spf_vma_access : the VMA's access right are not respected\n",
- " - pagefault:spf_pmd_changed : the upper PMD pointer has changed in our\n",
- " back.\n",
- "\n",
- "To record all the related events, the easier is to run perf with the\n",
- "following arguments :\n",
- "\$ perf stat -e 'faults,spf,pagefault:*' <command>\n",
- "\n",
- "There is also a dedicated vmstat counter showing the number of successful\n",
- "page fault handled speculatively. I can be seen this way:\n",
- "\$ grep speculative_pgfault /proc/vmstat\n",
- "\n",
- "This series builds on top of v4.16-mmotm-2018-04-13-17-28 and is functional\n",
- "on x86, PowerPC and arm64.\n",
- "\n",
- "---------------------\n",
- "Real Workload results\n",
- "\n",
- "As mentioned in previous email, we did non official runs using a \"popular\n",
- "in memory multithreaded database product\" on 176 cores SMT8 Power system\n",
- "which showed a 30% improvements in the number of transaction processed per\n",
- "second. This run has been done on the v6 series, but changes introduced in\n",
- "this new version should not impact the performance boost seen.\n",
- "\n",
- "Here are the perf data captured during 2 of these runs on top of the v8\n",
- "series:\n",
- " vanilla spf\n",
- "faults 89.418 101.364 +13%\n",
- "spf n/a 97.989\n",
- "\n",
- "With the SPF kernel, most of the page fault were processed in a speculative\n",
- "way.\n",
- "\n",
- "Ganesh Mahendran had backported the series on top of a 4.9 kernel and gave\n",
- "it a try on an android device. He reported that the application launch time\n",
- "was improved in average by 6%, and for large applications (~100 threads) by\n",
- "20%.\n",
- "\n",
- "Here are the launch time Ganesh mesured on Android 8.0 on top of a Qcom\n",
- "MSM845 (8 cores) with 6GB (the less is better):\n",
- "\n",
- "Application 4.9 4.9+spf delta\n",
- "com.tencent.mm 416 389 -7%\n",
- "com.eg.android.AlipayGphone 1135 986 -13%\n",
- "com.tencent.mtt 455 454 0%\n",
- "com.qqgame.hlddz 1497 1409 -6%\n",
- "com.autonavi.minimap 711 701 -1%\n",
- "com.tencent.tmgp.sgame 788 748 -5%\n",
- "com.immomo.momo 501 487 -3%\n",
- "com.tencent.peng 2145 2112 -2%\n",
- "com.smile.gifmaker 491 461 -6%\n",
- "com.baidu.BaiduMap 479 366 -23%\n",
- "com.taobao.taobao 1341 1198 -11%\n",
- "com.baidu.searchbox 333 314 -6%\n",
- "com.tencent.mobileqq 394 384 -3%\n",
- "com.sina.weibo 907 906 0%\n",
- "com.youku.phone 816 731 -11%\n",
- "com.happyelements.AndroidAnimal.qq 763 717 -6%\n",
- "com.UCMobile 415 411 -1%\n",
- "com.tencent.tmgp.ak 1464 1431 -2%\n",
- "com.tencent.qqmusic 336 329 -2%\n",
- "com.sankuai.meituan 1661 1302 -22%\n",
- "com.netease.cloudmusic 1193 1200 1%\n",
- "air.tv.douyu.android 4257 4152 -2%\n",
- "\n",
- "------------------\n",
- "Benchmarks results\n",
- "\n",
- "Base kernel is v4.17.0-rc4-mm1\n",
- "SPF is BASE + this series\n",
- "\n",
- "Kernbench:\n",
- "----------\n",
- "Here are the results on a 16 CPUs X86 guest using kernbench on a 4.15\n",
- "kernel (kernel is build 5 times):\n",
- "\n",
- "Average Half load -j 8\n",
- " Run (std deviation)\n",
- " BASE SPF\n",
- "Elapsed Time 1448.65 (5.72312) 1455.84 (4.84951) 0.50%\n",
- "User Time 10135.4 (30.3699) 10148.8 (31.1252) 0.13%\n",
- "System Time 900.47 (2.81131) 923.28 (7.52779) 2.53%\n",
- "Percent CPU 761.4 (1.14018) 760.2 (0.447214) -0.16%\n",
- "Context Switches 85380 (3419.52) 84748 (1904.44) -0.74%\n",
- "Sleeps 105064 (1240.96) 105074 (337.612) 0.01%\n",
- "\n",
- "Average Optimal load -j 16\n",
- " Run (std deviation)\n",
- " BASE SPF\n",
- "Elapsed Time 920.528 (10.1212) 927.404 (8.91789) 0.75%\n",
- "User Time 11064.8 (981.142) 11085 (990.897) 0.18%\n",
- "System Time 979.904 (84.0615) 1001.14 (82.5523) 2.17%\n",
- "Percent CPU 1089.5 (345.894) 1086.1 (343.545) -0.31%\n",
- "Context Switches 159488 (78156.4) 158223 (77472.1) -0.79%\n",
- "Sleeps 110566 (5877.49) 110388 (5617.75) -0.16%\n",
- "\n",
- "\n",
- "During a run on the SPF, perf events were captured:\n",
- " Performance counter stats for '../kernbench -M':\n",
- " 526743764 faults\n",
- " 210 spf\n",
- " 3 pagefault:spf_vma_changed\n",
- " 0 pagefault:spf_vma_noanon\n",
- " 2278 pagefault:spf_vma_notsup\n",
- " 0 pagefault:spf_vma_access\n",
- " 0 pagefault:spf_pmd_changed\n",
- "\n",
- "Very few speculative page faults were recorded as most of the processes\n",
- "involved are monothreaded (sounds that on this architecture some threads\n",
- "were created during the kernel build processing).\n",
- "\n",
- "Here are the kerbench results on a 80 CPUs Power8 system:\n",
- "\n",
- "Average Half load -j 40\n",
- " Run (std deviation)\n",
- " BASE SPF\n",
- "Elapsed Time 117.152 (0.774642) 117.166 (0.476057) 0.01%\n",
- "User Time 4478.52 (24.7688) 4479.76 (9.08555) 0.03%\n",
- "System Time 131.104 (0.720056) 134.04 (0.708414) 2.24%\n",
- "Percent CPU 3934 (19.7104) 3937.2 (19.0184) 0.08%\n",
- "Context Switches 92125.4 (576.787) 92581.6 (198.622) 0.50%\n",
- "Sleeps 317923 (652.499) 318469 (1255.59) 0.17%\n",
- "\n",
- "Average Optimal load -j 80\n",
- " Run (std deviation)\n",
- " BASE SPF\n",
- "Elapsed Time 107.73 (0.632416) 107.31 (0.584936) -0.39%\n",
- "User Time 5869.86 (1466.72) 5871.71 (1467.27) 0.03%\n",
- "System Time 153.728 (23.8573) 157.153 (24.3704) 2.23%\n",
- "Percent CPU 5418.6 (1565.17) 5436.7 (1580.91) 0.33%\n",
- "Context Switches 223861 (138865) 225032 (139632) 0.52%\n",
- "Sleeps 330529 (13495.1) 332001 (14746.2) 0.45%\n",
- "\n",
- "During a run on the SPF, perf events were captured:\n",
- " Performance counter stats for '../kernbench -M':\n",
- " 116730856 faults\n",
- " 0 spf\n",
- " 3 pagefault:spf_vma_changed\n",
- " 0 pagefault:spf_vma_noanon\n",
- " 476 pagefault:spf_vma_notsup\n",
- " 0 pagefault:spf_vma_access\n",
- " 0 pagefault:spf_pmd_changed\n",
- "\n",
- "Most of the processes involved are monothreaded so SPF is not activated but\n",
- "there is no impact on the performance.\n",
- "\n",
- "Ebizzy:\n",
- "-------\n",
- "The test is counting the number of records per second it can manage, the\n",
- "higher is the best. I run it like this 'ebizzy -mTt <nrcpus>'. To get\n",
- "consistent result I repeated the test 100 times and measure the average\n",
- "result. The number is the record processes per second, the higher is the\n",
- "best.\n",
- "\n",
- " BASE SPF delta\n",
- "16 CPUs x86 VM 742.57 1490.24 100.69%\n",
- "80 CPUs P8 node 13105.4 24174.23 84.46%\n",
- "\n",
- "Here are the performance counter read during a run on a 16 CPUs x86 VM:\n",
- " Performance counter stats for './ebizzy -mTt 16':\n",
- " 1706379 faults\n",
- " 1674599 spf\n",
- " 30588 pagefault:spf_vma_changed\n",
- " 0 pagefault:spf_vma_noanon\n",
- " 363 pagefault:spf_vma_notsup\n",
- " 0 pagefault:spf_vma_access\n",
- " 0 pagefault:spf_pmd_changed\n",
- "\n",
- "And the ones captured during a run on a 80 CPUs Power node:\n",
- " Performance counter stats for './ebizzy -mTt 80':\n",
- " 1874773 faults\n",
- " 1461153 spf\n",
- " 413293 pagefault:spf_vma_changed\n",
- " 0 pagefault:spf_vma_noanon\n",
- " 200 pagefault:spf_vma_notsup\n",
- " 0 pagefault:spf_vma_access\n",
- " 0 pagefault:spf_pmd_changed\n",
- "\n",
- "In ebizzy's case most of the page fault were handled in a speculative way,\n",
- "leading the ebizzy performance boost.\n",
- "\n",
- "------------------\n",
- "Changes since v10 (https://lkml.org/lkml/2018/4/17/572):\n",
- " - Accounted for all review feedbacks from Punit Agrawal, Ganesh Mahendran\n",
- " and Minchan Kim, hopefully.\n",
- " - Remove unneeded check on CONFIG_SPECULATIVE_PAGE_FAULT in\n",
- " __do_page_fault().\n",
- " - Loop in pte_spinlock() and pte_map_lock() when pte try lock fails\n",
- " instead\n",
- " of aborting the speculative page fault handling. Dropping the now\n",
- "useless\n",
- " trace event pagefault:spf_pte_lock.\n",
- " - No more try to reuse the fetched VMA during the speculative page fault\n",
- " handling when retrying is needed. This adds a lot of complexity and\n",
- " additional tests done didn't show a significant performance improvement.\n",
- " - Convert IS_ENABLED(CONFIG_NUMA) back to #ifdef due to build error.\n",
- "\n",
- "[1] http://linux-kernel.2935.n7.nabble.com/RFC-PATCH-0-6-Another-go-at-speculative-page-faults-tt965642.html#none\n",
- "[2] https://patchwork.kernel.org/patch/9999687/\n",
- "\n",
- "\n",
- "Laurent Dufour (20):\n",
- " mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT\n",
- " x86/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT\n",
- " powerpc/mm: set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT\n",
- " mm: introduce pte_spinlock for FAULT_FLAG_SPECULATIVE\n",
- " mm: make pte_unmap_same compatible with SPF\n",
- " mm: introduce INIT_VMA()\n",
- " mm: protect VMA modifications using VMA sequence count\n",
- " mm: protect mremap() against SPF hanlder\n",
- " mm: protect SPF handler against anon_vma changes\n",
- " mm: cache some VMA fields in the vm_fault structure\n",
- " mm/migrate: Pass vm_fault pointer to migrate_misplaced_page()\n",
- " mm: introduce __lru_cache_add_active_or_unevictable\n",
- " mm: introduce __vm_normal_page()\n",
- " mm: introduce __page_add_new_anon_rmap()\n",
- " mm: protect mm_rb tree with a rwlock\n",
- " mm: adding speculative page fault failure trace events\n",
- " perf: add a speculative page fault sw event\n",
- " perf tools: add support for the SPF perf event\n",
- " mm: add speculative page fault vmstats\n",
- " powerpc/mm: add speculative page fault\n",
- "\n",
- "Mahendran Ganesh (2):\n",
- " arm64/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT\n",
- " arm64/mm: add speculative page fault\n",
- "\n",
- "Peter Zijlstra (4):\n",
- " mm: prepare for FAULT_FLAG_SPECULATIVE\n",
- " mm: VMA sequence count\n",
- " mm: provide speculative fault infrastructure\n",
- " x86/mm: add speculative pagefault handling\n",
- "\n",
- " arch/arm64/Kconfig | 1 +\n",
- " arch/arm64/mm/fault.c | 12 +\n",
- " arch/powerpc/Kconfig | 1 +\n",
- " arch/powerpc/mm/fault.c | 16 +\n",
- " arch/x86/Kconfig | 1 +\n",
- " arch/x86/mm/fault.c | 27 +-\n",
- " fs/exec.c | 2 +-\n",
- " fs/proc/task_mmu.c | 5 +-\n",
- " fs/userfaultfd.c | 17 +-\n",
- " include/linux/hugetlb_inline.h | 2 +-\n",
- " include/linux/migrate.h | 4 +-\n",
- " include/linux/mm.h | 136 +++++++-\n",
- " include/linux/mm_types.h | 7 +\n",
- " include/linux/pagemap.h | 4 +-\n",
- " include/linux/rmap.h | 12 +-\n",
- " include/linux/swap.h | 10 +-\n",
- " include/linux/vm_event_item.h | 3 +\n",
- " include/trace/events/pagefault.h | 80 +++++\n",
- " include/uapi/linux/perf_event.h | 1 +\n",
- " kernel/fork.c | 5 +-\n",
- " mm/Kconfig | 22 ++\n",
- " mm/huge_memory.c | 6 +-\n",
- " mm/hugetlb.c | 2 +\n",
- " mm/init-mm.c | 3 +\n",
- " mm/internal.h | 20 ++\n",
- " mm/khugepaged.c | 5 +\n",
- " mm/madvise.c | 6 +-\n",
- " mm/memory.c | 612 +++++++++++++++++++++++++++++-----\n",
- " mm/mempolicy.c | 51 ++-\n",
- " mm/migrate.c | 6 +-\n",
- " mm/mlock.c | 13 +-\n",
signal1/ enable THP           989.33      +2.8%       1016        will-it-scale.per_thread_ops

b) THP disabled:
testcase                      base        change      head        metric
malloc1/ disable THP          90.33       +417.3%     467.33      will-it-scale.per_thread_ops
read2/ disable THP            58934       +39.2%      82060       will-it-scale.per_thread_ops
page_fault1/ disable THP      8607        +36.4%      11736       will-it-scale.per_thread_ops
read1/ disable THP            314063      +12.7%      353934      will-it-scale.per_thread_ops
writeseek3/ disable THP       2452        +12.5%      2759        will-it-scale.per_thread_ops
signal1/ disable THP          971.33      +5.5%       1024        will-it-scale.per_thread_ops

Notes: for the above values in the "change" column, a higher value means that
the related testcase result on the head commit is better than that on the
base commit for this benchmark.


Best regards
Haiyan Song

________________________________________
From: owner-linux-mm@kvack.org [owner-linux-mm@kvack.org] on behalf of Laurent Dufour [ldufour@linux.vnet.ibm.com]
Sent: Thursday, May 17, 2018 7:06 PM
To: akpm@linux-foundation.org; mhocko@kernel.org; peterz@infradead.org; kirill@shutemov.name; ak@linux.intel.com; dave@stgolabs.net; jack@suse.cz; Matthew Wilcox; khandual@linux.vnet.ibm.com; aneesh.kumar@linux.vnet.ibm.com; benh@kernel.crashing.org; mpe@ellerman.id.au; paulus@samba.org; Thomas Gleixner; Ingo Molnar; hpa@zytor.com; Will Deacon; Sergey Senozhatsky; sergey.senozhatsky.work@gmail.com; Andrea Arcangeli; Alexei Starovoitov; Wang, Kemi; Daniel Jordan; David Rientjes; Jerome Glisse; Ganesh Mahendran; Minchan Kim; Punit Agrawal; vinayak menon; Yang Shi
Cc: linux-kernel@vger.kernel.org; linux-mm@kvack.org; haren@linux.vnet.ibm.com; npiggin@gmail.com; bsingharora@gmail.com; paulmck@linux.vnet.ibm.com; Tim Chen; linuxppc-dev@lists.ozlabs.org; x86@kernel.org
Subject: [PATCH v11 00/26] Speculative page faults

This is a port on kernel 4.17 of the work done by Peter Zijlstra to handle
page faults without holding the mm semaphore [1].

The idea is to try to handle user space page faults without holding the
mmap_sem. This should allow better concurrency for massively threaded
processes, since the page fault handler will not wait for other threads'
memory layout changes to be done, assuming that the change is done in
another part of the process's memory space. This type of page fault is
named a speculative page fault. If the speculative page fault fails
because a concurrent change is detected or because the underlying PMD or
PTE tables are not yet allocated, the speculative handling is abandoned
and a classic page fault is tried instead.

The speculative page fault (SPF) has to look up the VMA matching the fault
address without holding the mmap_sem; this is done by introducing an
rwlock which protects access to the mm_rb tree. Previously this was done
using SRCU, but it introduced a lot of scheduling to process the VMA
freeing operations, which hurt performance by 20% as reported by
Kemi Wang [2]. Using an rwlock to protect access to the mm_rb tree limits
the locking contention to these operations, which are expected to be
O(log n). In addition, to ensure that the VMA is not freed behind our
back, a reference count is added and two services (get_vma() and
put_vma()) are introduced to handle it. Once a VMA is fetched from the RB
tree using get_vma(), it must later be released using put_vma(). With
this scheme, the overhead previously seen with the will-it-scale
benchmark is no longer visible.
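The get_vma()/put_vma() pairing described above can be sketched in
userspace C; struct vma_ref and its freed flag are illustrative stand-ins,
not the kernel implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative stand-in for a VMA: only the reference count matters here. */
struct vma_ref {
    atomic_int refcount;  /* the lookup structure's reference plus readers */
    bool freed;           /* set when the final put would free the object */
};

/* Taken with the tree lock held in the real scheme: bump the count so the
 * VMA cannot be freed while the speculative handler inspects it. */
static void get_vma(struct vma_ref *vma)
{
    atomic_fetch_add_explicit(&vma->refcount, 1, memory_order_relaxed);
}

/* Drop one reference; whoever drops the last one is responsible for
 * freeing the VMA. */
static void put_vma(struct vma_ref *vma)
{
    if (atomic_fetch_sub_explicit(&vma->refcount, 1, memory_order_acq_rel) == 1)
        vma->freed = true;  /* a real implementation would free the VMA here */
}
```

Because the last put frees, a speculative reader holding a reference keeps
the VMA alive even after it has been unlinked from the tree.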

The VMA's attributes checked during the speculative page fault processing
have to be protected against parallel changes. This is done by using a
per-VMA sequence lock. This sequence lock allows the speculative page
fault handler to quickly check for parallel changes in progress and to
abort the speculative page fault in that case.
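The sequence-count check can be sketched in userspace C; struct vma_sketch
and its fields are made-up names for illustration, not the kernel's
seqcount API:

```c
#include <assert.h>
#include <stdatomic.h>

/* Writers bump the counter to an odd value while modifying; readers
 * snapshot the counter, read, and abort if it was odd or has changed. */
struct vma_sketch {
    atomic_uint seq;  /* even: stable, odd: write in progress */
    unsigned long vm_start, vm_end;
};

static void vma_write_begin(struct vma_sketch *v)
{
    atomic_fetch_add_explicit(&v->seq, 1, memory_order_acquire); /* -> odd */
}

static void vma_write_end(struct vma_sketch *v)
{
    atomic_fetch_add_explicit(&v->seq, 1, memory_order_release); /* -> even */
}

/* Returns 0 on a consistent snapshot, -1 if the caller must abort,
 * mirroring how SPF falls back to a classic page fault. */
static int vma_read_snapshot(struct vma_sketch *v,
                             unsigned long *start, unsigned long *end)
{
    unsigned s = atomic_load_explicit(&v->seq, memory_order_acquire);
    if (s & 1)
        return -1;  /* writer active: abort */
    *start = v->vm_start;
    *end = v->vm_end;
    if (atomic_load_explicit(&v->seq, memory_order_acquire) != s)
        return -1;  /* changed under us: abort */
    return 0;
}
```

A reader that sees an odd counter, or a counter that changed across its
reads, aborts exactly as the SPF handler aborts to the classic path.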

Once the VMA has been found, the speculative page fault handler checks the
VMA's attributes to verify whether the page fault can be handled this way.
Thus, the VMA is protected through a sequence lock which allows fast
detection of concurrent VMA changes. If such a change is detected, the
speculative page fault is aborted and a *classic* page fault is tried
instead. VMA sequence locking is added wherever VMA attributes which are
checked during the page fault are modified.

When the PTE is fetched, the VMA is checked to see if it has been changed,
so once the page table is locked, the VMA is known to be valid. Any other
change touching this PTE would need to lock the page table, so no parallel
change is possible at this time.

The locking of the PTE is done with interrupts disabled; this allows
checking the PMD to ensure that there is no ongoing collapse operation.
Since khugepaged first sets the PMD to pmd_none and then waits for the
other CPUs to have caught the IPI, if the PMD is valid at the time the PTE
is locked, we have the guarantee that the collapse operation will have to
wait on the PTE lock to move forward. This allows the SPF handler to map
the PTE safely. If the PMD value differs from the one recorded at the
beginning of the SPF operation, the classic page fault handler is called
to handle the fault while holding the mmap_sem. As the PTE is locked with
interrupts disabled, the locking is done using spin_trylock() to avoid
deadlock when handling a page fault while a TLB invalidation is requested
by another CPU holding the PTE lock.
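The trylock-and-fall-back pattern can be sketched with POSIX mutexes as an
analogy (pte_map_trylock() and handle_fault_speculative() are hypothetical
names; the kernel uses spin_trylock() on the PTE lock):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Rather than blocking on the lock, which could deadlock against a holder
 * that is itself waiting on us, attempt the lock and report failure so the
 * caller can abort speculation and retry via the slow, classic path. */
static bool pte_map_trylock(pthread_mutex_t *ptl)
{
    return pthread_mutex_trylock(ptl) == 0;  /* true: lock acquired */
}

/* Returns true if the "fault" was handled under the lock, false if the
 * caller must fall back to the classic path. */
static bool handle_fault_speculative(pthread_mutex_t *ptl, int *pte, int value)
{
    if (!pte_map_trylock(ptl))
        return false;  /* contention detected: abort speculation */
    *pte = value;      /* stand-in for mapping the PTE */
    pthread_mutex_unlock(ptl);
    return true;
}
```

The failure path never waits, so a CPU that holds the lock while waiting on
us cannot wedge the fault handler.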

In pseudo code, this could be seen as:
    speculative_page_fault()
    {
            vma = get_vma()
            check vma sequence count
            check vma's support
            disable interrupt
                  check pgd,p4d,...,pte
                  save pmd and pte in vmf
                  save vma sequence counter in vmf
            enable interrupt
            check vma sequence count
            handle_pte_fault(vma)
                  ..
                  page = alloc_page()
                  pte_map_lock()
                          disable interrupt
                                  abort if sequence counter has changed
                                  abort if pmd or pte has changed
                          pte map and lock
                          enable interrupt
                  if abort
                        free page
                        abort
                  ...
    }

    arch_fault_handler()
    {
            if (speculative_page_fault(&vma))
                  goto done
    again:
            lock(mmap_sem)
            vma = find_vma();
            handle_pte_fault(vma);
            if retry
               unlock(mmap_sem)
               goto again;
    done:
            handle fault error
    }

Support for THP is not done because when checking for the PMD, we can be
confused by an in-progress collapse operation done by khugepaged. The
issue is that pmd_none() could be true either if the PMD is not already
populated or if the underlying PTEs are about to be collapsed. So we
cannot safely allocate a PMD if pmd_none() is true.

This series adds a new software performance event named
'speculative-faults' or 'spf'. It counts the number of page fault events
handled successfully in a speculative way. When recording 'faults,spf'
events, the 'faults' event counts the total number of page fault events
while 'spf' counts only the part of the faults processed speculatively.

There are some trace events introduced by this series. They allow
identifying why the page faults were not processed speculatively. This
doesn't take into account the faults generated by a monothreaded process,
which are directly processed while holding the mmap_sem. These trace
events are grouped in a system named 'pagefault'; they are:
 - pagefault:spf_vma_changed : the VMA has been changed behind our back
 - pagefault:spf_vma_noanon  : the vma->anon_vma field was not yet set
 - pagefault:spf_vma_notsup  : the VMA's type is not supported
 - pagefault:spf_vma_access  : the VMA's access rights are not respected
 - pagefault:spf_pmd_changed : the upper PMD pointer has changed behind
   our back

To record all the related events, the easiest is to run perf with the
following arguments:
$ perf stat -e 'faults,spf,pagefault:*' <command>

There is also a dedicated vmstat counter showing the number of page faults
handled successfully in a speculative way. It can be seen this way:
$ grep speculative_pgfault /proc/vmstat

This series builds on top of v4.16-mmotm-2018-04-13-17-28 and is functional
on x86, PowerPC and arm64.

---------------------
Real Workload results

As mentioned in a previous email, we did unofficial runs using a "popular
in-memory multithreaded database product" on a 176-core SMT8 Power system
which showed a 30% improvement in the number of transactions processed per
second. This run has been done on the v6 series, but changes introduced in
this new version should not impact the performance boost seen.

Here are the perf data captured during 2 of these runs on top of the v8
series:
                vanilla         spf
faults          89.418          101.364         +13%
spf                n/a           97.989

With the SPF kernel, most of the page faults were processed in a
speculative way.

Ganesh Mahendran had backported the series on top of a 4.9 kernel and gave
it a try on an Android device. He reported that the application launch
time was improved on average by 6%, and for large applications
(~100 threads) by 20%.

Here are the launch times Ganesh measured on Android 8.0 on top of a Qcom
MSM845 (8 cores) with 6GB of memory (lower is better):

Application                             4.9     4.9+spf  delta
com.tencent.mm                          416     389      -7%
com.eg.android.AlipayGphone             1135    986      -13%
com.tencent.mtt                         455     454      0%
com.qqgame.hlddz                        1497    1409     -6%
com.autonavi.minimap                    711     701      -1%
com.tencent.tmgp.sgame                  788     748      -5%
com.immomo.momo                         501     487      -3%
com.tencent.peng                        2145    2112     -2%
com.smile.gifmaker                      491     461      -6%
com.baidu.BaiduMap                      479     366      -23%
com.taobao.taobao                       1341    1198     -11%
com.baidu.searchbox                     333     314      -6%
com.tencent.mobileqq                    394     384      -3%
com.sina.weibo                          907     906      0%
com.youku.phone                         816     731      -11%
com.happyelements.AndroidAnimal.qq      763     717      -6%
com.UCMobile                            415     411      -1%
com.tencent.tmgp.ak                     1464    1431     -2%
com.tencent.qqmusic                     336     329      -2%
com.sankuai.meituan                     1661    1302     -22%
com.netease.cloudmusic                  1193    1200     1%
air.tv.douyu.android                    4257    4152     -2%

------------------
Benchmarks results

Base kernel is v4.17.0-rc4-mm1
SPF is BASE + this series

Kernbench:
----------
Here are the results on a 16-CPU X86 guest using kernbench on a 4.15
kernel (the kernel is built 5 times):

Average Half load -j 8
                     Run    (std deviation)
                     BASE                   SPF
Elapsed Time         1448.65 (5.72312)      1455.84 (4.84951)       0.50%
User Time            10135.4 (30.3699)      10148.8 (31.1252)       0.13%
System Time          900.47  (2.81131)      923.28  (7.52779)       2.53%
Percent CPU          761.4   (1.14018)      760.2   (0.447214)     -0.16%
Context Switches     85380   (3419.52)      84748   (1904.44)      -0.74%
Sleeps               105064  (1240.96)      105074  (337.612)       0.01%

Average Optimal load -j 16
                     Run    (std deviation)
                     BASE                   SPF
Elapsed Time         920.528 (10.1212)      927.404 (8.91789)       0.75%
User Time            11064.8 (981.142)      11085   (990.897)       0.18%
System Time          979.904 (84.0615)      1001.14 (82.5523)       2.17%
Percent CPU          1089.5  (345.894)      1086.1  (343.545)      -0.31%
Context Switches     159488  (78156.4)      158223  (77472.1)      -0.79%
Sleeps               110566  (5877.49)      110388  (5617.75)      -0.16%

During a run on the SPF, perf events were captured:
 Performance counter stats for '../kernbench -M':
         526743764      faults
               210      spf
                 3      pagefault:spf_vma_changed
                 0      pagefault:spf_vma_noanon
              2278      pagefault:spf_vma_notsup
                 0      pagefault:spf_vma_access
                 0      pagefault:spf_pmd_changed

Very few speculative page faults were recorded as most of the processes
involved are monothreaded (it seems that on this architecture some threads
were created during the kernel build processing).

Here are the kernbench results on an 80-CPU Power8 system:

Average Half load -j 40
                     Run    (std deviation)
                     BASE                   SPF
Elapsed Time         117.152 (0.774642)     117.166 (0.476057)      0.01%
User Time            4478.52 (24.7688)      4479.76 (9.08555)       0.03%
System Time          131.104 (0.720056)     134.04  (0.708414)      2.24%
Percent CPU          3934    (19.7104)      3937.2  (19.0184)       0.08%
Context Switches     92125.4 (576.787)      92581.6 (198.622)       0.50%
Sleeps               317923  (652.499)      318469  (1255.59)       0.17%

Average Optimal load -j 80
                     Run    (std deviation)
                     BASE                   SPF
Elapsed Time         107.73  (0.632416)     107.31  (0.584936)     -0.39%
User Time            5869.86 (1466.72)      5871.71 (1467.27)       0.03%
System Time          153.728 (23.8573)      157.153 (24.3704)       2.23%
Percent CPU          5418.6  (1565.17)      5436.7  (1580.91)       0.33%
Context Switches     223861  (138865)       225032  (139632)        0.52%
Sleeps               330529  (13495.1)      332001  (14746.2)       0.45%

During a run on the SPF, perf events were captured:
 Performance counter stats for '../kernbench -M':
         116730856      faults
                 0      spf
                 3      pagefault:spf_vma_changed
                 0      pagefault:spf_vma_noanon
               476      pagefault:spf_vma_notsup
                 0      pagefault:spf_vma_access
                 0      pagefault:spf_pmd_changed

Most of the processes involved are monothreaded, so SPF is not activated,
but there is no impact on the performance.

Ebizzy:
-------
The test counts the number of records per second it can manage; the higher
the better. I ran it like this: 'ebizzy -mTt <nrcpus>'. To get consistent
results I repeated the test 100 times and measured the average.

                    BASE        SPF         delta
16 CPUs x86 VM      742.57      1490.24     100.69%
80 CPUs P8 node     13105.4     24174.23    84.46%

Here are the performance counters read during a run on a 16-CPU x86 VM:
 Performance counter stats for './ebizzy -mTt 16':
           1706379      faults
           1674599      spf
             30588      pagefault:spf_vma_changed
                 0      pagefault:spf_vma_noanon
               363      pagefault:spf_vma_notsup
                 0      pagefault:spf_vma_access
                 0      pagefault:spf_pmd_changed

And the ones captured during a run on an 80-CPU Power node:
 Performance counter stats for './ebizzy -mTt 80':
           1874773      faults
           1461153      spf
            413293      pagefault:spf_vma_changed
                 0      pagefault:spf_vma_noanon
               200      pagefault:spf_vma_notsup
                 0      pagefault:spf_vma_access
                 0      pagefault:spf_pmd_changed

In ebizzy's case most of the page faults were handled in a speculative
way, leading to the ebizzy performance boost.

------------------
Changes since v10 (https://lkml.org/lkml/2018/4/17/572):
 - Accounted for all review feedback from Punit Agrawal, Ganesh Mahendran
   and Minchan Kim, hopefully.
 - Removed the unneeded check on CONFIG_SPECULATIVE_PAGE_FAULT in
   __do_page_fault().
 - Loop in pte_spinlock() and pte_map_lock() when the PTE try-lock fails
   instead of aborting the speculative page fault handling. Dropped the
   now useless trace event pagefault:spf_pte_lock.
 - No longer try to reuse the fetched VMA during the speculative page
   fault handling when retrying is needed. This adds a lot of complexity
   and additional tests done didn't show a significant performance
   improvement.
 - Converted IS_ENABLED(CONFIG_NUMA) back to #ifdef due to a build error.

[1] http://linux-kernel.2935.n7.nabble.com/RFC-PATCH-0-6-Another-go-at-speculative-page-faults-tt965642.html#none
[2] https://patchwork.kernel.org/patch/9999687/


Laurent Dufour (20):
  mm: introduce CONFIG_SPECULATIVE_PAGE_FAULT
  x86/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
  powerpc/mm: set ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
  mm: introduce pte_spinlock for FAULT_FLAG_SPECULATIVE
  mm: make pte_unmap_same compatible with SPF
  mm: introduce INIT_VMA()
  mm: protect VMA modifications using VMA sequence count
  mm: protect mremap() against SPF hanlder
  mm: protect SPF handler against anon_vma changes
  mm: cache some VMA fields in the vm_fault structure
  mm/migrate: Pass vm_fault pointer to migrate_misplaced_page()
  mm: introduce __lru_cache_add_active_or_unevictable
  mm: introduce __vm_normal_page()
  mm: introduce __page_add_new_anon_rmap()
  mm: protect mm_rb tree with a rwlock
  mm: adding speculative page fault failure trace events
  perf: add a speculative page fault sw event
  perf tools: add support for the SPF perf event
  mm: add speculative page fault vmstats
  powerpc/mm: add speculative page fault

Mahendran Ganesh (2):
  arm64/mm: define ARCH_SUPPORTS_SPECULATIVE_PAGE_FAULT
  arm64/mm: add speculative page fault

Peter Zijlstra (4):
  mm: prepare for FAULT_FLAG_SPECULATIVE
  mm: VMA sequence count
  mm: provide speculative fault infrastructure
  x86/mm: add speculative pagefault handling

 arch/arm64/Kconfig                    |   1 +
 arch/arm64/mm/fault.c                 |  12 +
 arch/powerpc/Kconfig                  |   1 +
 arch/powerpc/mm/fault.c               |  16 +
 arch/x86/Kconfig                      |   1 +
 arch/x86/mm/fault.c                   |  27 +-
 fs/exec.c                             |   2 +-
 fs/proc/task_mmu.c                    |   5 +-
 fs/userfaultfd.c                      |  17 +-
 include/linux/hugetlb_inline.h        |   2 +-
 include/linux/migrate.h               |   4 +-
 include/linux/mm.h                    | 136 +++++++-
 include/linux/mm_types.h              |   7 +
 include/linux/pagemap.h               |   4 +-
 include/linux/rmap.h                  |  12 +-
 include/linux/swap.h                  |  10 +-
 include/linux/vm_event_item.h         |   3 +
 include/trace/events/pagefault.h      |  80 +++++
 include/uapi/linux/perf_event.h       |   1 +
 kernel/fork.c                         |   5 +-
 mm/Kconfig                            |  22 ++
 mm/huge_memory.c                      |   6 +-
 mm/hugetlb.c                          |   2 +
 mm/init-mm.c                          |   3 +
 mm/internal.h                         |  20 ++
 mm/khugepaged.c                       |   5 +
 mm/madvise.c                          |   6 +-
 mm/memory.c                           | 612 +++++++++++++++++++++++++++++-----
 mm/mempolicy.c                        |  51 ++-
 mm/migrate.c                          |   6 +-
 mm/mlock.c                            |  13 +-
 mm/mmap.c                             | 229 ++++++++++---
 mm/mprotect.c                         |   4 +-
 mm/mremap.c                           |  13 +
 mm/nommu.c                            |   2 +-
 mm/rmap.c                             |   5 +-
 mm/swap.c                             |   6 +-
 mm/swap_state.c                       |   8 +-
 mm/vmstat.c                           |   5 +-
 tools/include/uapi/linux/perf_event.h |   1 +
 tools/perf/util/evsel.c               |   1 +
 tools/perf/util/parse-events.c        |   4 +
 tools/perf/util/parse-events.l        |   1 +
 tools/perf/util/python.c              |   1 +
 44 files changed, 1161 insertions(+), 211 deletions(-)
 create mode 100644 include/trace/events/pagefault.h

--
2.7.4