* [RFC][PATCH 0/6] Another go at speculative page faults
From: Peter Zijlstra @ 2014-10-20 21:56 UTC
  To: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm, Peter Zijlstra

Hi,

I figured I'd give my 2010 speculative fault series another spin:

  https://lkml.org/lkml/2010/1/4/257

Since then I think many of the outstanding issues have changed sufficiently to
warrant another go. In particular, Al Viro's delayed fput work seems to have
made it entirely 'normal' to delay fput(), Lai Jiangshan's SRCU rewrite
provided us with call_srcu(), and my preemptible mmu_gather removed the TLB
flushes from under the PTL.

The code needs way more attention, but it builds a kernel and runs the
micro-benchmark, so I figured I'd post it before sinking more time into it.

I realize the micro-benchmark is about the best case for this series and not
very realistic otherwise, but I think it does show the potential benefit of
the approach.

(patches go against .18-rc1+)

---

Using Kamezawa's multi-fault micro-bench from: https://lkml.org/lkml/2010/1/6/28

My Ivy Bridge EP (2 sockets x 10 cores x 2 SMT threads) shows a ~58%
improvement in page-fault throughput:

PRE:

root@ivb-ep:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 20

 Performance counter stats for './multi-fault 20' (5 runs):

       149,441,555      page-faults                  ( +-  1.25% )
     2,153,651,828      cache-misses                 ( +-  1.09% )

      60.003082014 seconds time elapsed              ( +-  0.00% )

POST:

root@ivb-ep:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 20

 Performance counter stats for './multi-fault 20' (5 runs):

       236,442,626      page-faults                  ( +-  0.08% )
     2,796,353,939      cache-misses                 ( +-  1.01% )

      60.002792431 seconds time elapsed              ( +-  0.00% )


My Ivy Bridge EX (4 sockets x 15 cores x 2 SMT threads) shows a ~78%
improvement in page-fault throughput:

PRE:

root@ivb-ex:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 60

 Performance counter stats for './multi-fault 60' (5 runs):

       105,789,078      page-faults                 ( +-  2.24% )
     1,314,072,090      cache-misses                ( +-  1.17% )

      60.009243533 seconds time elapsed             ( +-  0.00% )

POST:

root@ivb-ex:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 60

 Performance counter stats for './multi-fault 60' (5 runs):

       187,751,767      page-faults                 ( +-  2.24% )
     1,792,758,664      cache-misses                ( +-  2.30% )

      60.011611579 seconds time elapsed             ( +-  0.00% )

(I've not yet looked at why the EX sucks chunks compared to the EP box; I
 suspect we contend on other locks, but it could be anything.)
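
For reference, a minimal sketch of what a multi-fault style micro-benchmark
does (my reconstruction of the usual shape, not Kamezawa's actual code; the
4k page size and thread setup are assumptions): each of N threads keeps
writing to its own anonymous page and then zaps it with
madvise(MADV_DONTNEED), so every iteration takes a fresh page fault, and the
page-faults count over the 60 second run is the throughput compared above.

	#include <pthread.h>
	#include <stdlib.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define PAGE_SZ		4096	/* assumed; use sysconf(_SC_PAGESIZE) to be exact */
	#define RUNTIME		60	/* seconds, matches the perf runs above */

	static volatile int stop;

	static void *worker(void *arg)
	{
		char *page = arg;

		while (!stop) {
			page[0] = 1;				/* fault in an anon page */
			madvise(page, PAGE_SZ, MADV_DONTNEED);	/* zap it so the next write faults again */
		}
		return NULL;
	}

	int main(int argc, char **argv)
	{
		int i, nthreads = argc > 1 ? atoi(argv[1]) : 1;
		pthread_t tid[nthreads];

		for (i = 0; i < nthreads; i++) {
			void *page = mmap(NULL, PAGE_SZ, PROT_READ | PROT_WRITE,
					  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
			pthread_create(&tid[i], NULL, worker, page);
		}
		sleep(RUNTIME);
		stop = 1;
		for (i = 0; i < nthreads; i++)
			pthread_join(tid[i], NULL);
		return 0;
	}

Something like "gcc -O2 -pthread multi-fault.c -o multi-fault" builds it, and
the perf invocations above then measure its fault rate.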

---

 arch/x86/mm/fault.c      |  35 ++-
 include/linux/mm.h       |  19 +-
 include/linux/mm_types.h |   5 +
 kernel/fork.c            |   1 +
 mm/init-mm.c             |   1 +
 mm/internal.h            |  18 ++
 mm/memory.c              | 672 ++++++++++++++++++++++++++++-------------------
 mm/mmap.c                | 101 +++++--
 8 files changed, 544 insertions(+), 308 deletions(-)



* [RFC][PATCH 1/6] mm: Dont assume page-table invariance during faults
From: Peter Zijlstra @ 2014-10-20 21:56 UTC
  To: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm, Peter Zijlstra

[-- Attachment #1: peterz-mm-kill-pte-pointer.patch --]
[-- Type: text/plain, Size: 6757 bytes --]

One of the side effects of speculating on faults (without holding
mmap_sem) is that we can race with free_pgtables() and therefore we
cannot assume the page-tables will stick around.

Remove the reliance on the pte pointer.
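
Concretely, the convention after this patch (sketched from the diff below):
__handle_mm_fault() snapshots the pte value once, drops the mapping right
away, and passes the value down; any handler that needs the PTL then re-maps
and revalidates with pte_offset_map_lock() + pte_same().

	/* in __handle_mm_fault(): */
	pte = pte_offset_map(pmd, address);
	entry = ACCESS_ONCE(*pte);	/* snapshot the value ...              */
	pte_unmap(pte);			/* ... and stop relying on the pointer */

	return handle_pte_fault(mm, vma, address, entry, pmd, flags);

	/* in the handlers, e.g. handle_pte_fault(): */
	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
	if (unlikely(!pte_same(*pte, entry)))
		goto unlock;		/* raced; someone else handled it */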

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 mm/memory.c |   76 ++++++++++++++++--------------------------------------------
 1 file changed, 21 insertions(+), 55 deletions(-)

--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1933,31 +1933,6 @@ int apply_to_page_range(struct mm_struct
 }
 EXPORT_SYMBOL_GPL(apply_to_page_range);
 
-/*
- * handle_pte_fault chooses page fault handler according to an entry
- * which was read non-atomically.  Before making any commitment, on
- * those architectures or configurations (e.g. i386 with PAE) which
- * might give a mix of unmatched parts, do_swap_page and do_nonlinear_fault
- * must check under lock before unmapping the pte and proceeding
- * (but do_wp_page is only called after already making such a check;
- * and do_anonymous_page can safely check later on).
- */
-static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
-				pte_t *page_table, pte_t orig_pte)
-{
-	int same = 1;
-#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
-	if (sizeof(pte_t) > sizeof(unsigned long)) {
-		spinlock_t *ptl = pte_lockptr(mm, pmd);
-		spin_lock(ptl);
-		same = pte_same(*page_table, orig_pte);
-		spin_unlock(ptl);
-	}
-#endif
-	pte_unmap(page_table);
-	return same;
-}
-
 static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
 {
 	debug_dma_assert_idle(src);
@@ -2407,21 +2382,18 @@ EXPORT_SYMBOL(unmap_mapping_range);
  * as does filemap_fault().
  */
 static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
+		unsigned long address, pmd_t *pmd,
 		unsigned int flags, pte_t orig_pte)
 {
 	spinlock_t *ptl;
 	struct page *page, *swapcache;
 	struct mem_cgroup *memcg;
 	swp_entry_t entry;
-	pte_t pte;
+	pte_t *page_table, pte;
 	int locked;
 	int exclusive = 0;
 	int ret = 0;
 
-	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
-		goto out;
-
 	entry = pte_to_swp_entry(orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
@@ -2624,15 +2596,13 @@ static inline int check_stack_guard_page
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
 static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
+		unsigned long address, pmd_t *pmd,
 		unsigned int flags)
 {
 	struct mem_cgroup *memcg;
 	struct page *page;
 	spinlock_t *ptl;
-	pte_t entry;
-
-	pte_unmap(page_table);
+	pte_t entry, *page_table;
 
 	/* Check if we need to add a guard page to the stack */
 	if (check_stack_guard_page(vma, address) < 0)
@@ -3031,13 +3001,12 @@ static int do_shared_fault(struct mm_str
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
 static int do_linear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
+		unsigned long address, pmd_t *pmd,
 		unsigned int flags, pte_t orig_pte)
 {
 	pgoff_t pgoff = (((address & PAGE_MASK)
 			- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
 
-	pte_unmap(page_table);
 	if (!(flags & FAULT_FLAG_WRITE))
 		return do_read_fault(mm, vma, address, pmd, pgoff, flags,
 				orig_pte);
@@ -3059,16 +3028,13 @@ static int do_linear_fault(struct mm_str
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
 static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
+		unsigned long address, pmd_t *pmd,
 		unsigned int flags, pte_t orig_pte)
 {
 	pgoff_t pgoff;
 
 	flags |= FAULT_FLAG_NONLINEAR;
 
-	if (!pte_unmap_same(mm, pmd, page_table, orig_pte))
-		return 0;
-
 	if (unlikely(!(vma->vm_flags & VM_NONLINEAR))) {
 		/*
 		 * Page table corrupted: show pte and kill process.
@@ -3103,7 +3069,7 @@ static int numa_migrate_prep(struct page
 }
 
 static int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		   unsigned long addr, pte_t pte, pte_t *ptep, pmd_t *pmd)
+		   unsigned long addr, pte_t pte, pmd_t *pmd)
 {
 	struct page *page = NULL;
 	spinlock_t *ptl;
@@ -3112,6 +3078,7 @@ static int do_numa_page(struct mm_struct
 	int target_nid;
 	bool migrated = false;
 	int flags = 0;
+	pte_t *ptep;
 
 	/*
 	* The "pte" at this point cannot be used safely without
@@ -3122,8 +3089,7 @@ static int do_numa_page(struct mm_struct
 	* the _PAGE_NUMA bit and it is not really expected that there
 	* would be concurrent hardware modifications to the PTE.
 	*/
-	ptl = pte_lockptr(mm, pmd);
-	spin_lock(ptl);
+	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
 	if (unlikely(!pte_same(*ptep, pte))) {
 		pte_unmap_unlock(ptep, ptl);
 		goto out;
@@ -3195,34 +3161,32 @@ static int do_numa_page(struct mm_struct
  */
 static int handle_pte_fault(struct mm_struct *mm,
 		     struct vm_area_struct *vma, unsigned long address,
-		     pte_t *pte, pmd_t *pmd, unsigned int flags)
+		     pte_t entry, pmd_t *pmd, unsigned int flags)
 {
-	pte_t entry;
 	spinlock_t *ptl;
+	pte_t *pte;
 
-	entry = ACCESS_ONCE(*pte);
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
 			if (vma->vm_ops) {
 				if (likely(vma->vm_ops->fault))
 					return do_linear_fault(mm, vma, address,
-						pte, pmd, flags, entry);
+						pmd, flags, entry);
 			}
 			return do_anonymous_page(mm, vma, address,
-						 pte, pmd, flags);
+						 pmd, flags);
 		}
 		if (pte_file(entry))
 			return do_nonlinear_fault(mm, vma, address,
-					pte, pmd, flags, entry);
+					pmd, flags, entry);
 		return do_swap_page(mm, vma, address,
-					pte, pmd, flags, entry);
+					pmd, flags, entry);
 	}
 
 	if (pte_numa(entry))
-		return do_numa_page(mm, vma, address, entry, pte, pmd);
+		return do_numa_page(mm, vma, address, entry, pmd);
 
-	ptl = pte_lockptr(mm, pmd);
-	spin_lock(ptl);
+	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	if (unlikely(!pte_same(*pte, entry)))
 		goto unlock;
 	if (flags & FAULT_FLAG_WRITE) {
@@ -3261,7 +3225,7 @@ static int __handle_mm_fault(struct mm_s
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
-	pte_t *pte;
+	pte_t *pte, entry;
 
 	if (unlikely(is_vm_hugetlb_page(vma)))
 		return hugetlb_fault(mm, vma, address, flags);
@@ -3331,8 +3295,10 @@ static int __handle_mm_fault(struct mm_s
 	 * safe to run pte_offset_map().
 	 */
 	pte = pte_offset_map(pmd, address);
+	entry = ACCESS_ONCE(*pte);
+	pte_unmap(pte);
 
-	return handle_pte_fault(mm, vma, address, pte, pmd, flags);
+	return handle_pte_fault(mm, vma, address, entry, pmd, flags);
 }
 
 /*




* [RFC][PATCH 2/6] mm: Prepare for FAULT_FLAG_SPECULATIVE
From: Peter Zijlstra @ 2014-10-20 21:56 UTC
  To: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm, Peter Zijlstra

[-- Attachment #1: peterz-mm-pte_map_lock.patch --]
[-- Type: text/plain, Size: 36383 bytes --]

When speculating faults (without holding mmap_sem) we need to validate
that the vma against which we loaded pages is still valid when we're
ready to install the new PTE.

Therefore, replace the pte_offset_map_lock() calls that (re)take the PTL with
pte_map_lock(), which can fail if the VMA has changed since we started the
fault.
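
The resulting pattern at each such site looks like this (sketched from the
diff below; note that in this patch pte_map_lock() still always succeeds, the
failure path is only expected to become reachable once FAULT_FLAG_SPECULATIVE
faults are wired up later in the series):

	if (!pte_map_lock(fe))
		return VM_FAULT_RETRY;	/* the VMA changed under us, redo the fault */

	if (unlikely(!pte_same(*fe->pte, fe->entry)))
		goto unlock;		/* the PTE changed, someone else handled it */

	/* ... install or update the PTE ... */
	pte_unmap_unlock(fe->pte, fe->ptl);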

Instead of passing around the endless list of function arguments,
replace the lot with a single structure so we can change context
without endless function signature changes.

XXX: split this patch into two parts: the first introducing fault_env and the
second doing the pte_map_lock bit.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/mm.h |   17 -
 mm/memory.c        |  522 +++++++++++++++++++++++++++++------------------------
 2 files changed, 297 insertions(+), 242 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -187,14 +187,15 @@ extern unsigned int kobjsize(const void
  */
 extern pgprot_t protection_map[16];
 
-#define FAULT_FLAG_WRITE	0x01	/* Fault was a write access */
-#define FAULT_FLAG_NONLINEAR	0x02	/* Fault was via a nonlinear mapping */
-#define FAULT_FLAG_MKWRITE	0x04	/* Fault was mkwrite of existing pte */
-#define FAULT_FLAG_ALLOW_RETRY	0x08	/* Retry fault if blocking */
-#define FAULT_FLAG_RETRY_NOWAIT	0x10	/* Don't drop mmap_sem and wait when retrying */
-#define FAULT_FLAG_KILLABLE	0x20	/* The fault task is in SIGKILL killable region */
-#define FAULT_FLAG_TRIED	0x40	/* second try */
-#define FAULT_FLAG_USER		0x80	/* The fault originated in userspace */
+#define FAULT_FLAG_WRITE	0x001	/* Fault was a write access */
+#define FAULT_FLAG_NONLINEAR	0x002	/* Fault was via a nonlinear mapping */
+#define FAULT_FLAG_MKWRITE	0x004	/* Fault was mkwrite of existing pte */
+#define FAULT_FLAG_ALLOW_RETRY	0x008	/* Retry fault if blocking */
+#define FAULT_FLAG_RETRY_NOWAIT	0x010	/* Don't drop mmap_sem and wait when retrying */
+#define FAULT_FLAG_KILLABLE	0x020	/* The fault task is in SIGKILL killable region */
+#define FAULT_FLAG_TRIED	0x040	/* second try */
+#define FAULT_FLAG_USER		0x080	/* The fault originated in userspace */
+#define FAULT_FLAG_SPECULATIVE	0x100	/* Speculative fault, not holding mmap_sem */
 
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1993,6 +1993,23 @@ static int do_page_mkwrite(struct vm_are
 	return ret;
 }
 
+struct fault_env {
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	unsigned long address;
+	pmd_t *pmd;
+	pte_t *pte;
+	pte_t entry;
+	spinlock_t *ptl;
+	unsigned int flags;
+};
+
+static bool pte_map_lock(struct fault_env *fe)
+{
+	fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);
+	return true;
+}
+
 /*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
@@ -2011,9 +2028,7 @@ static int do_page_mkwrite(struct vm_are
  * but allow concurrent faults), with pte both mapped and locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
-static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
-		spinlock_t *ptl, pte_t orig_pte)
+static int do_wp_page(struct fault_env *fe)
 	__releases(ptl)
 {
 	struct page *old_page, *new_page = NULL;
@@ -2025,7 +2040,7 @@ static int do_wp_page(struct mm_struct *
 	unsigned long mmun_end = 0;	/* For mmu_notifiers */
 	struct mem_cgroup *memcg;
 
-	old_page = vm_normal_page(vma, address, orig_pte);
+	old_page = vm_normal_page(fe->vma, fe->address, fe->entry);
 	if (!old_page) {
 		/*
 		 * VM_MIXEDMAP !pfn_valid() case
@@ -2034,7 +2049,7 @@ static int do_wp_page(struct mm_struct *
 		 * Just mark the pages writable as we can't do any dirty
 		 * accounting on raw pfn maps.
 		 */
-		if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
+		if ((fe->vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 				     (VM_WRITE|VM_SHARED))
 			goto reuse;
 		goto gotten;
@@ -2047,14 +2062,20 @@ static int do_wp_page(struct mm_struct *
 	if (PageAnon(old_page) && !PageKsm(old_page)) {
 		if (!trylock_page(old_page)) {
 			page_cache_get(old_page);
-			pte_unmap_unlock(page_table, ptl);
+			pte_unmap_unlock(fe->pte, fe->ptl);
 			lock_page(old_page);
-			page_table = pte_offset_map_lock(mm, pmd, address,
-							 &ptl);
-			if (!pte_same(*page_table, orig_pte)) {
+
+			if (!pte_map_lock(fe)) {
+				unlock_page(old_page);
+				ret |= VM_FAULT_RETRY;
+				goto err;
+			}
+
+			if (!pte_same(*fe->pte, fe->entry)) {
 				unlock_page(old_page);
 				goto unlock;
 			}
+
 			page_cache_release(old_page);
 		}
 		if (reuse_swap_page(old_page)) {
@@ -2063,37 +2084,44 @@ static int do_wp_page(struct mm_struct *
 			 * the rmap code will not search our parent or siblings.
 			 * Protected against the rmap code by the page lock.
 			 */
-			page_move_anon_rmap(old_page, vma, address);
+			page_move_anon_rmap(old_page, fe->vma, fe->address);
 			unlock_page(old_page);
 			goto reuse;
 		}
 		unlock_page(old_page);
-	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
+	} else if (unlikely((fe->vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
 		/*
 		 * Only catch write-faults on shared writable pages,
 		 * read-only shared pages can get COWed by
 		 * get_user_pages(.write=1, .force=1).
 		 */
-		if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
+		if (fe->vma->vm_ops && fe->vma->vm_ops->page_mkwrite) {
 			int tmp;
+
 			page_cache_get(old_page);
-			pte_unmap_unlock(page_table, ptl);
-			tmp = do_page_mkwrite(vma, old_page, address);
+			pte_unmap_unlock(fe->pte, fe->ptl);
+			tmp = do_page_mkwrite(fe->vma, old_page, fe->address);
 			if (unlikely(!tmp || (tmp &
 					(VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
 				page_cache_release(old_page);
 				return tmp;
 			}
+
 			/*
 			 * Since we dropped the lock we need to revalidate
 			 * the PTE as someone else may have changed it.  If
 			 * they did, we just return, as we can count on the
 			 * MMU to tell us if they didn't also make it writable.
 			 */
-			page_table = pte_offset_map_lock(mm, pmd, address,
-							 &ptl);
-			if (!pte_same(*page_table, orig_pte)) {
+
+			if (!pte_map_lock(fe)) {
+				unlock_page(old_page);
+				ret |= VM_FAULT_RETRY;
+				goto err;
+			}
+
+			if (!pte_same(*fe->pte, fe->entry)) {
 				unlock_page(old_page);
 				goto unlock;
 			}
@@ -2112,12 +2140,12 @@ static int do_wp_page(struct mm_struct *
 		if (old_page)
 			page_cpupid_xchg_last(old_page, (1 << LAST_CPUPID_SHIFT) - 1);
 
-		flush_cache_page(vma, address, pte_pfn(orig_pte));
-		entry = pte_mkyoung(orig_pte);
-		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-		if (ptep_set_access_flags(vma, address, page_table, entry,1))
-			update_mmu_cache(vma, address, page_table);
-		pte_unmap_unlock(page_table, ptl);
+		flush_cache_page(fe->vma, fe->address, pte_pfn(fe->entry));
+		entry = pte_mkyoung(fe->entry);
+		entry = maybe_mkwrite(pte_mkdirty(entry), fe->vma);
+		if (ptep_set_access_flags(fe->vma, fe->address, fe->pte, entry, 1))
+			update_mmu_cache(fe->vma, fe->address, fe->pte);
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		ret |= VM_FAULT_WRITE;
 
 		if (!dirty_page)
@@ -2135,8 +2163,8 @@ static int do_wp_page(struct mm_struct *
 			wait_on_page_locked(dirty_page);
 			set_page_dirty_balance(dirty_page);
 			/* file_update_time outside page_lock */
-			if (vma->vm_file)
-				file_update_time(vma->vm_file);
+			if (fe->vma->vm_file)
+				file_update_time(fe->vma->vm_file);
 		}
 		put_page(dirty_page);
 		if (page_mkwrite) {
@@ -2145,7 +2173,7 @@ static int do_wp_page(struct mm_struct *
 			set_page_dirty(dirty_page);
 			unlock_page(dirty_page);
 			page_cache_release(dirty_page);
-			if (mapping)	{
+			if (mapping) {
 				/*
 				 * Some device drivers do not set page.mapping
 				 * but still dirty their pages
@@ -2162,62 +2190,68 @@ static int do_wp_page(struct mm_struct *
 	 */
 	page_cache_get(old_page);
 gotten:
-	pte_unmap_unlock(page_table, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 
-	if (unlikely(anon_vma_prepare(vma)))
+	if (unlikely(anon_vma_prepare(fe->vma)))
 		goto oom;
 
-	if (is_zero_pfn(pte_pfn(orig_pte))) {
-		new_page = alloc_zeroed_user_highpage_movable(vma, address);
+	if (is_zero_pfn(pte_pfn(fe->entry))) {
+		new_page = alloc_zeroed_user_highpage_movable(fe->vma, fe->address);
 		if (!new_page)
 			goto oom;
 	} else {
-		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, fe->vma, fe->address);
 		if (!new_page)
 			goto oom;
-		cow_user_page(new_page, old_page, address, vma);
+		cow_user_page(new_page, old_page, fe->address, fe->vma);
 	}
 	__SetPageUptodate(new_page);
 
-	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
+	if (mem_cgroup_try_charge(new_page, fe->mm, GFP_KERNEL, &memcg))
 		goto oom_free_new;
 
-	mmun_start  = address & PAGE_MASK;
+	mmun_start  = fe->address & PAGE_MASK;
 	mmun_end    = mmun_start + PAGE_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(fe->mm, mmun_start, mmun_end);
 
 	/*
 	 * Re-check the pte - we dropped the lock
 	 */
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (likely(pte_same(*page_table, orig_pte))) {
+	if (!pte_map_lock(fe)) {
+		mem_cgroup_cancel_charge(new_page, memcg);
+		page_cache_release(new_page);
+		ret |= VM_FAULT_RETRY;
+		goto err;
+	}
+
+	if (likely(pte_same(*fe->pte, fe->entry))) {
 		if (old_page) {
 			if (!PageAnon(old_page)) {
-				dec_mm_counter_fast(mm, MM_FILEPAGES);
-				inc_mm_counter_fast(mm, MM_ANONPAGES);
+				dec_mm_counter_fast(fe->mm, MM_FILEPAGES);
+				inc_mm_counter_fast(fe->mm, MM_ANONPAGES);
 			}
 		} else
-			inc_mm_counter_fast(mm, MM_ANONPAGES);
-		flush_cache_page(vma, address, pte_pfn(orig_pte));
-		entry = mk_pte(new_page, vma->vm_page_prot);
-		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+			inc_mm_counter_fast(fe->mm, MM_ANONPAGES);
+		flush_cache_page(fe->vma, fe->address, pte_pfn(fe->entry));
+		entry = mk_pte(new_page, fe->vma->vm_page_prot);
+		entry = maybe_mkwrite(pte_mkdirty(entry), fe->vma);
 		/*
 		 * Clear the pte entry and flush it first, before updating the
 		 * pte with the new entry. This will avoid a race condition
 		 * seen in the presence of one thread doing SMC and another
 		 * thread doing COW.
 		 */
-		ptep_clear_flush(vma, address, page_table);
-		page_add_new_anon_rmap(new_page, vma, address);
+		ptep_clear_flush(fe->vma, fe->address, fe->pte);
+		page_add_new_anon_rmap(new_page, fe->vma, fe->address);
 		mem_cgroup_commit_charge(new_page, memcg, false);
-		lru_cache_add_active_or_unevictable(new_page, vma);
+		lru_cache_add_active_or_unevictable(new_page, fe->vma);
 		/*
 		 * We call the notify macro here because, when using secondary
 		 * mmu page tables (such as kvm shadow page tables), we want the
 		 * new page to be mapped directly into the secondary page table.
 		 */
-		set_pte_at_notify(mm, address, page_table, entry);
-		update_mmu_cache(vma, address, page_table);
+		set_pte_at_notify(fe->mm, fe->address, fe->pte, entry);
+		update_mmu_cache(fe->vma, fe->address, fe->pte);
 		if (old_page) {
 			/*
 			 * Only after switching the pte to the new page may
@@ -2253,15 +2287,16 @@ static int do_wp_page(struct mm_struct *
 	if (new_page)
 		page_cache_release(new_page);
 unlock:
-	pte_unmap_unlock(page_table, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 	if (mmun_end > mmun_start)
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+		mmu_notifier_invalidate_range_end(fe->mm, mmun_start, mmun_end);
+err:
 	if (old_page) {
 		/*
 		 * Don't let another task, with possibly unlocked vma,
 		 * keep the mlocked page.
 		 */
-		if ((ret & VM_FAULT_WRITE) && (vma->vm_flags & VM_LOCKED)) {
+		if ((ret & VM_FAULT_WRITE) && (fe->vma->vm_flags & VM_LOCKED)) {
 			lock_page(old_page);	/* LRU manipulation */
 			munlock_vma_page(old_page);
 			unlock_page(old_page);
@@ -2269,6 +2304,7 @@ static int do_wp_page(struct mm_struct *
 		page_cache_release(old_page);
 	}
 	return ret;
+
 oom_free_new:
 	page_cache_release(new_page);
 oom:
@@ -2381,27 +2417,24 @@ EXPORT_SYMBOL(unmap_mapping_range);
  * We return with the mmap_sem locked or unlocked in the same cases
  * as does filemap_fault().
  */
-static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		unsigned int flags, pte_t orig_pte)
+static int do_swap_page(struct fault_env *fe)
 {
-	spinlock_t *ptl;
 	struct page *page, *swapcache;
 	struct mem_cgroup *memcg;
 	swp_entry_t entry;
-	pte_t *page_table, pte;
+	pte_t pte;
 	int locked;
 	int exclusive = 0;
 	int ret = 0;
 
-	entry = pte_to_swp_entry(orig_pte);
+	entry = pte_to_swp_entry(fe->entry);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
-			migration_entry_wait(mm, pmd, address);
+			migration_entry_wait(fe->mm, fe->pmd, fe->address);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
 		} else {
-			print_bad_pte(vma, address, orig_pte, NULL);
+			print_bad_pte(fe->vma, fe->address, fe->entry, NULL);
 			ret = VM_FAULT_SIGBUS;
 		}
 		goto out;
@@ -2410,14 +2443,16 @@ static int do_swap_page(struct mm_struct
 	page = lookup_swap_cache(entry);
 	if (!page) {
 		page = swapin_readahead(entry,
-					GFP_HIGHUSER_MOVABLE, vma, address);
+				GFP_HIGHUSER_MOVABLE, fe->vma, fe->address);
 		if (!page) {
 			/*
 			 * Back out if somebody else faulted in this pte
 			 * while we released the pte lock.
 			 */
-			page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-			if (likely(pte_same(*page_table, orig_pte)))
+			if (!pte_map_lock(fe))
+				return VM_FAULT_RETRY;
+
+			if (likely(pte_same(*fe->pte, fe->entry)))
 				ret = VM_FAULT_OOM;
 			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 			goto unlock;
@@ -2426,7 +2461,7 @@ static int do_swap_page(struct mm_struct
 		/* Had to read the page from swap area: Major fault */
 		ret = VM_FAULT_MAJOR;
 		count_vm_event(PGMAJFAULT);
-		mem_cgroup_count_vm_event(mm, PGMAJFAULT);
+		mem_cgroup_count_vm_event(fe->mm, PGMAJFAULT);
 	} else if (PageHWPoison(page)) {
 		/*
 		 * hwpoisoned dirty swapcache pages are kept for killing
@@ -2439,7 +2474,7 @@ static int do_swap_page(struct mm_struct
 	}
 
 	swapcache = page;
-	locked = lock_page_or_retry(page, mm, flags);
+	locked = lock_page_or_retry(page, fe->mm, fe->flags);
 
 	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 	if (!locked) {
@@ -2456,14 +2491,14 @@ static int do_swap_page(struct mm_struct
 	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
 		goto out_page;
 
-	page = ksm_might_need_to_copy(page, vma, address);
+	page = ksm_might_need_to_copy(page, fe->vma, fe->address);
 	if (unlikely(!page)) {
 		ret = VM_FAULT_OOM;
 		page = swapcache;
 		goto out_page;
 	}
 
-	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg)) {
+	if (mem_cgroup_try_charge(page, fe->mm, GFP_KERNEL, &memcg)) {
 		ret = VM_FAULT_OOM;
 		goto out_page;
 	}
@@ -2471,8 +2506,12 @@ static int do_swap_page(struct mm_struct
 	/*
 	 * Back out if somebody else already faulted in this pte.
 	 */
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (unlikely(!pte_same(*page_table, orig_pte)))
+	if (!pte_map_lock(fe)) {
+		ret = VM_FAULT_RETRY;
+		goto out_charge;
+	}
+
+	if (unlikely(!pte_same(*fe->pte, fe->entry)))
 		goto out_nomap;
 
 	if (unlikely(!PageUptodate(page))) {
@@ -2490,30 +2529,30 @@ static int do_swap_page(struct mm_struct
 	 * must be called after the swap_free(), or it will never succeed.
 	 */
 
-	inc_mm_counter_fast(mm, MM_ANONPAGES);
-	dec_mm_counter_fast(mm, MM_SWAPENTS);
-	pte = mk_pte(page, vma->vm_page_prot);
-	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
-		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
-		flags &= ~FAULT_FLAG_WRITE;
+	inc_mm_counter_fast(fe->mm, MM_ANONPAGES);
+	dec_mm_counter_fast(fe->mm, MM_SWAPENTS);
+	pte = mk_pte(page, fe->vma->vm_page_prot);
+	if ((fe->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
+		pte = maybe_mkwrite(pte_mkdirty(pte), fe->vma);
+		fe->flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
 		exclusive = 1;
 	}
-	flush_icache_page(vma, page);
-	if (pte_swp_soft_dirty(orig_pte))
+	flush_icache_page(fe->vma, page);
+	if (pte_swp_soft_dirty(fe->entry))
 		pte = pte_mksoft_dirty(pte);
-	set_pte_at(mm, address, page_table, pte);
+	set_pte_at(fe->mm, fe->address, fe->pte, pte);
 	if (page == swapcache) {
-		do_page_add_anon_rmap(page, vma, address, exclusive);
+		do_page_add_anon_rmap(page, fe->vma, fe->address, exclusive);
 		mem_cgroup_commit_charge(page, memcg, true);
 	} else { /* ksm created a completely new copy */
-		page_add_new_anon_rmap(page, vma, address);
+		page_add_new_anon_rmap(page, fe->vma, fe->address);
 		mem_cgroup_commit_charge(page, memcg, false);
-		lru_cache_add_active_or_unevictable(page, vma);
+		lru_cache_add_active_or_unevictable(page, fe->vma);
 	}
 
 	swap_free(entry);
-	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
+	if (vm_swap_full() || (fe->vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 		try_to_free_swap(page);
 	unlock_page(page);
 	if (page != swapcache) {
@@ -2529,22 +2568,23 @@ static int do_swap_page(struct mm_struct
 		page_cache_release(swapcache);
 	}
 
-	if (flags & FAULT_FLAG_WRITE) {
-		ret |= do_wp_page(mm, vma, address, page_table, pmd, ptl, pte);
+	if (fe->flags & FAULT_FLAG_WRITE) {
+		ret |= do_wp_page(fe);
 		if (ret & VM_FAULT_ERROR)
 			ret &= VM_FAULT_ERROR;
 		goto out;
 	}
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, address, page_table);
+	update_mmu_cache(fe->vma, fe->address, fe->pte);
 unlock:
-	pte_unmap_unlock(page_table, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 out:
 	return ret;
 out_nomap:
+	pte_unmap_unlock(fe->pte, fe->ptl);
+out_charge:
 	mem_cgroup_cancel_charge(page, memcg);
-	pte_unmap_unlock(page_table, ptl);
 out_page:
 	unlock_page(page);
 out_release:
@@ -2595,33 +2635,34 @@ static inline int check_stack_guard_page
  * but allow concurrent faults), and pte mapped but not yet locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
-static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		unsigned int flags)
+static int do_anonymous_page(struct fault_env *fe)
 {
 	struct mem_cgroup *memcg;
 	struct page *page;
-	spinlock_t *ptl;
-	pte_t entry, *page_table;
+	pte_t entry;
 
 	/* Check if we need to add a guard page to the stack */
-	if (check_stack_guard_page(vma, address) < 0)
+	if (check_stack_guard_page(fe->vma, fe->address) < 0)
 		return VM_FAULT_SIGBUS;
 
 	/* Use the zero-page for reads */
-	if (!(flags & FAULT_FLAG_WRITE)) {
-		entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
-						vma->vm_page_prot));
-		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-		if (!pte_none(*page_table))
+	if (!(fe->flags & FAULT_FLAG_WRITE)) {
+		entry = pte_mkspecial(pfn_pte(my_zero_pfn(fe->address),
+					fe->vma->vm_page_prot));
+
+		if (!pte_map_lock(fe))
+			return VM_FAULT_RETRY;
+
+		if (!pte_none(*fe->pte))
 			goto unlock;
+
 		goto setpte;
 	}
 
 	/* Allocate our own private page. */
-	if (unlikely(anon_vma_prepare(vma)))
+	if (unlikely(anon_vma_prepare(fe->vma)))
 		goto oom;
-	page = alloc_zeroed_user_highpage_movable(vma, address);
+	page = alloc_zeroed_user_highpage_movable(fe->vma, fe->address);
 	if (!page)
 		goto oom;
 	/*
@@ -2631,28 +2672,33 @@ static int do_anonymous_page(struct mm_s
 	 */
 	__SetPageUptodate(page);
 
-	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
+	if (mem_cgroup_try_charge(page, fe->mm, GFP_KERNEL, &memcg))
 		goto oom_free_page;
 
-	entry = mk_pte(page, vma->vm_page_prot);
-	if (vma->vm_flags & VM_WRITE)
+	entry = mk_pte(page, fe->vma->vm_page_prot);
+	if (fe->vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
 
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (!pte_none(*page_table))
+	if (!pte_map_lock(fe)) {
+		mem_cgroup_cancel_charge(page, memcg);
+		page_cache_release(page);
+		return VM_FAULT_RETRY;
+	}
+
+	if (!pte_none(*fe->pte))
 		goto release;
 
-	inc_mm_counter_fast(mm, MM_ANONPAGES);
-	page_add_new_anon_rmap(page, vma, address);
+	inc_mm_counter_fast(fe->mm, MM_ANONPAGES);
+	page_add_new_anon_rmap(page, fe->vma, fe->address);
 	mem_cgroup_commit_charge(page, memcg, false);
-	lru_cache_add_active_or_unevictable(page, vma);
+	lru_cache_add_active_or_unevictable(page, fe->vma);
 setpte:
-	set_pte_at(mm, address, page_table, entry);
+	set_pte_at(fe->mm, fe->address, fe->pte, entry);
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, address, page_table);
+	update_mmu_cache(fe->vma, fe->address, fe->pte);
 unlock:
-	pte_unmap_unlock(page_table, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 	return 0;
 release:
 	mem_cgroup_cancel_charge(page, memcg);
@@ -2688,7 +2734,7 @@ static int __do_fault(struct vm_area_str
 		if (ret & VM_FAULT_LOCKED)
 			unlock_page(vmf.page);
 		page_cache_release(vmf.page);
-		return VM_FAULT_HWPOISON;
+		return ret | VM_FAULT_HWPOISON;
 	}
 
 	if (unlikely(!(ret & VM_FAULT_LOCKED)))
@@ -2846,13 +2892,9 @@ static void do_fault_around(struct vm_ar
 	vma->vm_ops->map_pages(vma, &vmf);
 }
 
-static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
+static int do_read_fault(struct fault_env *fe, pgoff_t pgoff)
 {
 	struct page *fault_page;
-	spinlock_t *ptl;
-	pte_t *pte;
 	int ret = 0;
 
 	/*
@@ -2860,73 +2902,86 @@ static int do_read_fault(struct mm_struc
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages && !(flags & FAULT_FLAG_NONLINEAR) &&
+	if (fe->vma->vm_ops->map_pages && !(fe->flags & FAULT_FLAG_NONLINEAR) &&
 	    fault_around_bytes >> PAGE_SHIFT > 1) {
-		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-		do_fault_around(vma, address, pte, pgoff, flags);
-		if (!pte_same(*pte, orig_pte))
+
+		if (!pte_map_lock(fe))
+			return VM_FAULT_RETRY;
+
+		do_fault_around(fe->vma, fe->address, fe->pte, pgoff, fe->flags);
+		if (!pte_same(*fe->pte, fe->entry))
 			goto unlock_out;
-		pte_unmap_unlock(pte, ptl);
+
+		pte_unmap_unlock(fe->pte, fe->ptl);
 	}
 
-	ret = __do_fault(vma, address, pgoff, flags, &fault_page);
+	ret = __do_fault(fe->vma, fe->address, pgoff, fe->flags, &fault_page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
-	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (unlikely(!pte_same(*pte, orig_pte))) {
-		pte_unmap_unlock(pte, ptl);
+	if (!pte_map_lock(fe)) {
+		unlock_page(fault_page);
+		page_cache_release(fault_page);
+		return VM_FAULT_RETRY;
+	}
+
+	if (unlikely(!pte_same(*fe->pte, fe->entry))) {
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		unlock_page(fault_page);
 		page_cache_release(fault_page);
 		return ret;
 	}
-	do_set_pte(vma, address, fault_page, pte, false, false);
+
+	do_set_pte(fe->vma, fe->address, fault_page, fe->pte, false, false);
 	unlock_page(fault_page);
 unlock_out:
-	pte_unmap_unlock(pte, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 	return ret;
 }
 
-static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
+static int do_cow_fault(struct fault_env *fe, pgoff_t pgoff)
 {
 	struct page *fault_page, *new_page;
 	struct mem_cgroup *memcg;
-	spinlock_t *ptl;
-	pte_t *pte;
 	int ret;
 
-	if (unlikely(anon_vma_prepare(vma)))
+	if (unlikely(anon_vma_prepare(fe->vma)))
 		return VM_FAULT_OOM;
 
-	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, fe->vma, fe->address);
 	if (!new_page)
 		return VM_FAULT_OOM;
 
-	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg)) {
+	if (mem_cgroup_try_charge(new_page, fe->mm, GFP_KERNEL, &memcg)) {
 		page_cache_release(new_page);
 		return VM_FAULT_OOM;
 	}
 
-	ret = __do_fault(vma, address, pgoff, flags, &fault_page);
+	ret = __do_fault(fe->vma, fe->address, pgoff, fe->flags, &fault_page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
 
-	copy_user_highpage(new_page, fault_page, address, vma);
+	copy_user_highpage(new_page, fault_page, fe->address, fe->vma);
 	__SetPageUptodate(new_page);
 
-	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (unlikely(!pte_same(*pte, orig_pte))) {
-		pte_unmap_unlock(pte, ptl);
+	if (!pte_map_lock(fe)) {
+		unlock_page(fault_page);
+		page_cache_release(fault_page);
+		ret |= VM_FAULT_RETRY;
+		goto uncharge_out;
+	}
+
+	if (unlikely(!pte_same(*fe->pte, fe->entry))) {
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		unlock_page(fault_page);
 		page_cache_release(fault_page);
 		goto uncharge_out;
 	}
-	do_set_pte(vma, address, new_page, pte, true, true);
+
+	do_set_pte(fe->vma, fe->address, new_page, fe->pte, true, true);
 	mem_cgroup_commit_charge(new_page, memcg, false);
-	lru_cache_add_active_or_unevictable(new_page, vma);
-	pte_unmap_unlock(pte, ptl);
+	lru_cache_add_active_or_unevictable(new_page, fe->vma);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 	unlock_page(fault_page);
 	page_cache_release(fault_page);
 	return ret;
@@ -2936,18 +2991,14 @@ static int do_cow_fault(struct mm_struct
 	return ret;
 }
 
-static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
+static int do_shared_fault(struct fault_env *fe, pgoff_t pgoff)
 {
 	struct page *fault_page;
 	struct address_space *mapping;
-	spinlock_t *ptl;
-	pte_t *pte;
 	int dirtied = 0;
 	int ret, tmp;
 
-	ret = __do_fault(vma, address, pgoff, flags, &fault_page);
+	ret = __do_fault(fe->vma, fe->address, pgoff, fe->flags, &fault_page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
@@ -2955,31 +3006,35 @@ static int do_shared_fault(struct mm_str
 	 * Check if the backing address space wants to know that the page is
 	 * about to become writable
 	 */
-	if (vma->vm_ops->page_mkwrite) {
+	if (fe->vma->vm_ops->page_mkwrite) {
 		unlock_page(fault_page);
-		tmp = do_page_mkwrite(vma, fault_page, address);
-		if (unlikely(!tmp ||
-				(tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
+		tmp = do_page_mkwrite(fe->vma, fault_page, fe->address);
+		if (unlikely(!tmp || (tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
 			page_cache_release(fault_page);
 			return tmp;
 		}
 	}
 
-	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (unlikely(!pte_same(*pte, orig_pte))) {
-		pte_unmap_unlock(pte, ptl);
+	if (!pte_map_lock(fe)) {
+		unlock_page(fault_page);
+		page_cache_release(fault_page);
+		return ret | VM_FAULT_RETRY;
+	}
+
+	if (unlikely(!pte_same(*fe->pte, fe->entry))) {
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		unlock_page(fault_page);
 		page_cache_release(fault_page);
 		return ret;
 	}
-	do_set_pte(vma, address, fault_page, pte, true, false);
-	pte_unmap_unlock(pte, ptl);
+	do_set_pte(fe->vma, fe->address, fault_page, fe->pte, true, false);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 
 	if (set_page_dirty(fault_page))
 		dirtied = 1;
 	mapping = fault_page->mapping;
 	unlock_page(fault_page);
-	if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
+	if ((dirtied || fe->vma->vm_ops->page_mkwrite) && mapping) {
 		/*
 		 * Some device drivers do not set page.mapping but still
 		 * dirty their pages
@@ -2988,8 +3043,8 @@ static int do_shared_fault(struct mm_str
 	}
 
 	/* file_update_time outside page_lock */
-	if (vma->vm_file && !vma->vm_ops->page_mkwrite)
-		file_update_time(vma->vm_file);
+	if (fe->vma->vm_file && !fe->vma->vm_ops->page_mkwrite)
+		file_update_time(fe->vma->vm_file);
 
 	return ret;
 }
@@ -3000,20 +3055,16 @@ static int do_shared_fault(struct mm_str
  * The mmap_sem may have been released depending on flags and our
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
-static int do_linear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		unsigned int flags, pte_t orig_pte)
-{
-	pgoff_t pgoff = (((address & PAGE_MASK)
-			- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
-
-	if (!(flags & FAULT_FLAG_WRITE))
-		return do_read_fault(mm, vma, address, pmd, pgoff, flags,
-				orig_pte);
-	if (!(vma->vm_flags & VM_SHARED))
-		return do_cow_fault(mm, vma, address, pmd, pgoff, flags,
-				orig_pte);
-	return do_shared_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
+static int do_linear_fault(struct fault_env *fe)
+{
+	pgoff_t pgoff = (((fe->address & PAGE_MASK) -
+			 fe->vma->vm_start) >> PAGE_SHIFT) + fe->vma->vm_pgoff;
+
+	if (!(fe->flags & FAULT_FLAG_WRITE))
+		return do_read_fault(fe, pgoff);
+	if (!(fe->vma->vm_flags & VM_SHARED))
+		return do_cow_fault(fe, pgoff);
+	return do_shared_fault(fe, pgoff);
 }
 
 /*
@@ -3027,30 +3078,26 @@ static int do_linear_fault(struct mm_str
  * The mmap_sem may have been released depending on flags and our
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
-static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		unsigned int flags, pte_t orig_pte)
+static int do_nonlinear_fault(struct fault_env *fe)
 {
 	pgoff_t pgoff;
 
-	flags |= FAULT_FLAG_NONLINEAR;
+	fe->flags |= FAULT_FLAG_NONLINEAR;
 
-	if (unlikely(!(vma->vm_flags & VM_NONLINEAR))) {
+	if (unlikely(!(fe->vma->vm_flags & VM_NONLINEAR))) {
 		/*
 		 * Page table corrupted: show pte and kill process.
 		 */
-		print_bad_pte(vma, address, orig_pte, NULL);
+		print_bad_pte(fe->vma, fe->address, fe->entry, NULL);
 		return VM_FAULT_SIGBUS;
 	}
 
-	pgoff = pte_to_pgoff(orig_pte);
-	if (!(flags & FAULT_FLAG_WRITE))
-		return do_read_fault(mm, vma, address, pmd, pgoff, flags,
-				orig_pte);
-	if (!(vma->vm_flags & VM_SHARED))
-		return do_cow_fault(mm, vma, address, pmd, pgoff, flags,
-				orig_pte);
-	return do_shared_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
+	pgoff = pte_to_pgoff(fe->entry);
+	if (!(fe->flags & FAULT_FLAG_WRITE))
+		return do_read_fault(fe, pgoff);
+	if (!(fe->vma->vm_flags & VM_SHARED))
+		return do_cow_fault(fe, pgoff);
+	return do_shared_fault(fe, pgoff);
 }
 
 static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
@@ -3068,17 +3115,16 @@ static int numa_migrate_prep(struct page
 	return mpol_misplaced(page, vma, addr);
 }
 
-static int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		   unsigned long addr, pte_t pte, pmd_t *pmd)
+static int do_numa_page(struct fault_env *fe)
 {
 	struct page *page = NULL;
-	spinlock_t *ptl;
 	int page_nid = -1;
 	int last_cpupid;
 	int target_nid;
 	bool migrated = false;
 	int flags = 0;
-	pte_t *ptep;
+	int ret = 0;
+	pte_t entry;
 
 	/*
 	* The "pte" at this point cannot be used safely without
@@ -3089,19 +3135,23 @@ static int do_numa_page(struct mm_struct
 	* the _PAGE_NUMA bit and it is not really expected that there
 	* would be concurrent hardware modifications to the PTE.
 	*/
-	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
-	if (unlikely(!pte_same(*ptep, pte))) {
-		pte_unmap_unlock(ptep, ptl);
+	if (!pte_map_lock(fe)) {
+		ret |= VM_FAULT_RETRY;
+		goto out;
+	}
+
+	if (unlikely(!pte_same(*fe->pte, fe->entry))) {
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		goto out;
 	}
 
-	pte = pte_mknonnuma(pte);
-	set_pte_at(mm, addr, ptep, pte);
-	update_mmu_cache(vma, addr, ptep);
+	entry = pte_mknonnuma(fe->entry);
+	set_pte_at(fe->mm, fe->address, fe->pte, entry);
+	update_mmu_cache(fe->vma, fe->address, fe->pte);
 
-	page = vm_normal_page(vma, addr, pte);
+	page = vm_normal_page(fe->vma, fe->address, entry);
 	if (!page) {
-		pte_unmap_unlock(ptep, ptl);
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		return 0;
 	}
 	BUG_ON(is_zero_pfn(page_to_pfn(page)));
@@ -3111,27 +3161,28 @@ static int do_numa_page(struct mm_struct
 	 * in general, RO pages shouldn't hurt as much anyway since
 	 * they can be in shared cache state.
 	 */
-	if (!pte_write(pte))
+	if (!pte_write(entry))
 		flags |= TNF_NO_GROUP;
 
 	/*
 	 * Flag if the page is shared between multiple address spaces. This
 	 * is later used when determining whether to group tasks together
 	 */
-	if (page_mapcount(page) > 1 && (vma->vm_flags & VM_SHARED))
+	if (page_mapcount(page) > 1 && (fe->vma->vm_flags & VM_SHARED))
 		flags |= TNF_SHARED;
 
 	last_cpupid = page_cpupid_last(page);
 	page_nid = page_to_nid(page);
-	target_nid = numa_migrate_prep(page, vma, addr, page_nid, &flags);
-	pte_unmap_unlock(ptep, ptl);
+	target_nid = numa_migrate_prep(page, fe->vma, fe->address, page_nid, &flags);
+	pte_unmap_unlock(fe->pte, fe->ptl);
+
 	if (target_nid == -1) {
 		put_page(page);
 		goto out;
 	}
 
 	/* Migrate to the requested node */
-	migrated = migrate_misplaced_page(page, vma, target_nid);
+	migrated = migrate_misplaced_page(page, fe->vma, target_nid);
 	if (migrated) {
 		page_nid = target_nid;
 		flags |= TNF_MIGRATED;
@@ -3159,45 +3210,38 @@ static int do_numa_page(struct mm_struct
  * The mmap_sem may have been released depending on flags and our
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
-static int handle_pte_fault(struct mm_struct *mm,
-		     struct vm_area_struct *vma, unsigned long address,
-		     pte_t entry, pmd_t *pmd, unsigned int flags)
+static int handle_pte_fault(struct fault_env *fe)
 {
-	spinlock_t *ptl;
-	pte_t *pte;
+	pte_t entry = fe->entry;
 
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
-			if (vma->vm_ops) {
-				if (likely(vma->vm_ops->fault))
-					return do_linear_fault(mm, vma, address,
-						pmd, flags, entry);
+			if (fe->vma->vm_ops) {
+				if (likely(fe->vma->vm_ops->fault))
+					return do_linear_fault(fe);
 			}
-			return do_anonymous_page(mm, vma, address,
-						 pmd, flags);
+			return do_anonymous_page(fe);
 		}
 		if (pte_file(entry))
-			return do_nonlinear_fault(mm, vma, address,
-					pmd, flags, entry);
-		return do_swap_page(mm, vma, address,
-					pmd, flags, entry);
+			return do_nonlinear_fault(fe);
+		return do_swap_page(fe);
 	}
 
 	if (pte_numa(entry))
-		return do_numa_page(mm, vma, address, entry, pmd);
-
-	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (unlikely(!pte_same(*pte, entry)))
+		return do_numa_page(fe);
+	if (!pte_map_lock(fe))
+		return VM_FAULT_RETRY;
+	if (unlikely(!pte_same(*fe->pte, entry)))
 		goto unlock;
-	if (flags & FAULT_FLAG_WRITE) {
+	if (fe->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
-			return do_wp_page(mm, vma, address,
-					pte, pmd, ptl, entry);
+			return do_wp_page(fe);
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
-	if (ptep_set_access_flags(vma, address, pte, entry, flags & FAULT_FLAG_WRITE)) {
-		update_mmu_cache(vma, address, pte);
+	if (ptep_set_access_flags(fe->vma, fe->address, fe->pte,
+				entry, fe->flags & FAULT_FLAG_WRITE)) {
+		update_mmu_cache(fe->vma, fe->address, fe->pte);
 	} else {
 		/*
 		 * This is needed only for protection faults but the arch code
@@ -3205,11 +3249,11 @@ static int handle_pte_fault(struct mm_st
 		 * This still avoids useless tlb flushes for .text page faults
 		 * with threads.
 		 */
-		if (flags & FAULT_FLAG_WRITE)
-			flush_tlb_fix_spurious_fault(vma, address);
+		if (fe->flags & FAULT_FLAG_WRITE)
+			flush_tlb_fix_spurious_fault(fe->vma, fe->address);
 	}
 unlock:
-	pte_unmap_unlock(pte, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 	return 0;
 }
 
@@ -3222,6 +3266,7 @@ static int handle_pte_fault(struct mm_st
 static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			     unsigned long address, unsigned int flags)
 {
+	struct fault_env fe;
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
@@ -3298,7 +3343,16 @@ static int __handle_mm_fault(struct mm_s
 	entry = ACCESS_ONCE(*pte);
 	pte_unmap(pte);
 
-	return handle_pte_fault(mm, vma, address, entry, pmd, flags);
+	fe = (struct fault_env) {
+		.mm = mm,
+		.vma = vma,
+		.address = address,
+		.entry = entry,
+		.pmd = pmd,
+		.flags = flags,
+	};
+
+	return handle_pte_fault(&fe);
 }
 
 /*



^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC][PATCH 2/6] mm: Prepare for FAULT_FLAG_SPECULATIVE
@ 2014-10-20 21:56   ` Peter Zijlstra
  0 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-20 21:56 UTC (permalink / raw)
  To: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm, Peter Zijlstra

[-- Attachment #1: peterz-mm-pte_map_lock.patch --]
[-- Type: text/plain, Size: 36608 bytes --]

When speculating faults (without holding mmap_sem) we need to validate
that the vma against which we loaded pages is still valid when we're
ready to install the new PTE.

Therefore, replace the pte_offset_map_lock() calls that (re)take the
PTL with pte_map_lock(), which can fail if we find the VMA has changed
since we started the fault.

Instead of passing around an ever-growing list of function arguments,
replace the lot with a single structure so we can change context
without endless function signature changes.
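
For illustration only (not part of the patch), the shape most converted
helpers end up with is roughly the following; pte_map_lock() cannot fail
yet, so the VM_FAULT_RETRY leg only becomes reachable once
FAULT_FLAG_SPECULATIVE handling lands later in the series. The function
name is made up for the example:

static int example_converted_fault(struct fault_env *fe)
{
	/* (Re)take the PTL; with FAULT_FLAG_SPECULATIVE this may fail. */
	if (!pte_map_lock(fe))
		return VM_FAULT_RETRY;

	/* Somebody else changed the PTE while it was unlocked; back out. */
	if (unlikely(!pte_same(*fe->pte, fe->entry))) {
		pte_unmap_unlock(fe->pte, fe->ptl);
		return 0;
	}

	/* ... install or update the PTE through fe->pte ... */

	pte_unmap_unlock(fe->pte, fe->ptl);
	return 0;
}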

XXX: split this patch into two parts, the first which introduces
fault_env and the second doing the pte_map_lock bit.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/mm.h |   17 -
 mm/memory.c        |  522 +++++++++++++++++++++++++++++------------------------
 2 files changed, 297 insertions(+), 242 deletions(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -187,14 +187,15 @@ extern unsigned int kobjsize(const void
  */
 extern pgprot_t protection_map[16];
 
-#define FAULT_FLAG_WRITE	0x01	/* Fault was a write access */
-#define FAULT_FLAG_NONLINEAR	0x02	/* Fault was via a nonlinear mapping */
-#define FAULT_FLAG_MKWRITE	0x04	/* Fault was mkwrite of existing pte */
-#define FAULT_FLAG_ALLOW_RETRY	0x08	/* Retry fault if blocking */
-#define FAULT_FLAG_RETRY_NOWAIT	0x10	/* Don't drop mmap_sem and wait when retrying */
-#define FAULT_FLAG_KILLABLE	0x20	/* The fault task is in SIGKILL killable region */
-#define FAULT_FLAG_TRIED	0x40	/* second try */
-#define FAULT_FLAG_USER		0x80	/* The fault originated in userspace */
+#define FAULT_FLAG_WRITE	0x001	/* Fault was a write access */
+#define FAULT_FLAG_NONLINEAR	0x002	/* Fault was via a nonlinear mapping */
+#define FAULT_FLAG_MKWRITE	0x004	/* Fault was mkwrite of existing pte */
+#define FAULT_FLAG_ALLOW_RETRY	0x008	/* Retry fault if blocking */
+#define FAULT_FLAG_RETRY_NOWAIT	0x010	/* Don't drop mmap_sem and wait when retrying */
+#define FAULT_FLAG_KILLABLE	0x020	/* The fault task is in SIGKILL killable region */
+#define FAULT_FLAG_TRIED	0x040	/* second try */
+#define FAULT_FLAG_USER		0x080	/* The fault originated in userspace */
+#define FAULT_FLAG_SPECULATIVE	0x100	/* Speculative fault, not holding mmap_sem */
 
 /*
  * vm_fault is filled by the the pagefault handler and passed to the vma's
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1993,6 +1993,23 @@ static int do_page_mkwrite(struct vm_are
 	return ret;
 }
 
+struct fault_env {
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	unsigned long address;
+	pmd_t *pmd;
+	pte_t *pte;
+	pte_t entry;
+	spinlock_t *ptl;
+	unsigned int flags;
+};
+
+static bool pte_map_lock(struct fault_env *fe)
+{
+	fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);
+	return true;
+}
+
 /*
  * This routine handles present pages, when users try to write
  * to a shared page. It is done by copying the page to a new address
@@ -2011,9 +2028,7 @@ static int do_page_mkwrite(struct vm_are
  * but allow concurrent faults), with pte both mapped and locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
-static int do_wp_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pte_t *page_table, pmd_t *pmd,
-		spinlock_t *ptl, pte_t orig_pte)
+static int do_wp_page(struct fault_env *fe)
 	__releases(ptl)
 {
 	struct page *old_page, *new_page = NULL;
@@ -2025,7 +2040,7 @@ static int do_wp_page(struct mm_struct *
 	unsigned long mmun_end = 0;	/* For mmu_notifiers */
 	struct mem_cgroup *memcg;
 
-	old_page = vm_normal_page(vma, address, orig_pte);
+	old_page = vm_normal_page(fe->vma, fe->address, fe->entry);
 	if (!old_page) {
 		/*
 		 * VM_MIXEDMAP !pfn_valid() case
@@ -2034,7 +2049,7 @@ static int do_wp_page(struct mm_struct *
 		 * Just mark the pages writable as we can't do any dirty
 		 * accounting on raw pfn maps.
 		 */
-		if ((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
+		if ((fe->vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 				     (VM_WRITE|VM_SHARED))
 			goto reuse;
 		goto gotten;
@@ -2047,14 +2062,20 @@ static int do_wp_page(struct mm_struct *
 	if (PageAnon(old_page) && !PageKsm(old_page)) {
 		if (!trylock_page(old_page)) {
 			page_cache_get(old_page);
-			pte_unmap_unlock(page_table, ptl);
+			pte_unmap_unlock(fe->pte, fe->ptl);
 			lock_page(old_page);
-			page_table = pte_offset_map_lock(mm, pmd, address,
-							 &ptl);
-			if (!pte_same(*page_table, orig_pte)) {
+
+			if (!pte_map_lock(fe)) {
+				unlock_page(old_page);
+				ret |= VM_FAULT_RETRY;
+				goto err;
+			}
+
+			if (!pte_same(*fe->pte, fe->entry)) {
 				unlock_page(old_page);
 				goto unlock;
 			}
+
 			page_cache_release(old_page);
 		}
 		if (reuse_swap_page(old_page)) {
@@ -2063,37 +2084,44 @@ static int do_wp_page(struct mm_struct *
 			 * the rmap code will not search our parent or siblings.
 			 * Protected against the rmap code by the page lock.
 			 */
-			page_move_anon_rmap(old_page, vma, address);
+			page_move_anon_rmap(old_page, fe->vma, fe->address);
 			unlock_page(old_page);
 			goto reuse;
 		}
 		unlock_page(old_page);
-	} else if (unlikely((vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
+	} else if (unlikely((fe->vma->vm_flags & (VM_WRITE|VM_SHARED)) ==
 					(VM_WRITE|VM_SHARED))) {
 		/*
 		 * Only catch write-faults on shared writable pages,
 		 * read-only shared pages can get COWed by
 		 * get_user_pages(.write=1, .force=1).
 		 */
-		if (vma->vm_ops && vma->vm_ops->page_mkwrite) {
+		if (fe->vma->vm_ops && fe->vma->vm_ops->page_mkwrite) {
 			int tmp;
+
 			page_cache_get(old_page);
-			pte_unmap_unlock(page_table, ptl);
-			tmp = do_page_mkwrite(vma, old_page, address);
+			pte_unmap_unlock(fe->pte, fe->ptl);
+			tmp = do_page_mkwrite(fe->vma, old_page, fe->address);
 			if (unlikely(!tmp || (tmp &
 					(VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
 				page_cache_release(old_page);
 				return tmp;
 			}
+
 			/*
 			 * Since we dropped the lock we need to revalidate
 			 * the PTE as someone else may have changed it.  If
 			 * they did, we just return, as we can count on the
 			 * MMU to tell us if they didn't also make it writable.
 			 */
-			page_table = pte_offset_map_lock(mm, pmd, address,
-							 &ptl);
-			if (!pte_same(*page_table, orig_pte)) {
+
+			if (!pte_map_lock(fe)) {
+				unlock_page(old_page);
+				ret |= VM_FAULT_RETRY;
+				goto err;
+			}
+
+			if (!pte_same(*fe->pte, fe->entry)) {
 				unlock_page(old_page);
 				goto unlock;
 			}
@@ -2112,12 +2140,12 @@ static int do_wp_page(struct mm_struct *
 		if (old_page)
 			page_cpupid_xchg_last(old_page, (1 << LAST_CPUPID_SHIFT) - 1);
 
-		flush_cache_page(vma, address, pte_pfn(orig_pte));
-		entry = pte_mkyoung(orig_pte);
-		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
-		if (ptep_set_access_flags(vma, address, page_table, entry,1))
-			update_mmu_cache(vma, address, page_table);
-		pte_unmap_unlock(page_table, ptl);
+		flush_cache_page(fe->vma, fe->address, pte_pfn(fe->entry));
+		entry = pte_mkyoung(fe->entry);
+		entry = maybe_mkwrite(pte_mkdirty(entry), fe->vma);
+		if (ptep_set_access_flags(fe->vma, fe->address, fe->pte, entry, 1))
+			update_mmu_cache(fe->vma, fe->address, fe->pte);
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		ret |= VM_FAULT_WRITE;
 
 		if (!dirty_page)
@@ -2135,8 +2163,8 @@ static int do_wp_page(struct mm_struct *
 			wait_on_page_locked(dirty_page);
 			set_page_dirty_balance(dirty_page);
 			/* file_update_time outside page_lock */
-			if (vma->vm_file)
-				file_update_time(vma->vm_file);
+			if (fe->vma->vm_file)
+				file_update_time(fe->vma->vm_file);
 		}
 		put_page(dirty_page);
 		if (page_mkwrite) {
@@ -2145,7 +2173,7 @@ static int do_wp_page(struct mm_struct *
 			set_page_dirty(dirty_page);
 			unlock_page(dirty_page);
 			page_cache_release(dirty_page);
-			if (mapping)	{
+			if (mapping) {
 				/*
 				 * Some device drivers do not set page.mapping
 				 * but still dirty their pages
@@ -2162,62 +2190,68 @@ static int do_wp_page(struct mm_struct *
 	 */
 	page_cache_get(old_page);
 gotten:
-	pte_unmap_unlock(page_table, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 
-	if (unlikely(anon_vma_prepare(vma)))
+	if (unlikely(anon_vma_prepare(fe->vma)))
 		goto oom;
 
-	if (is_zero_pfn(pte_pfn(orig_pte))) {
-		new_page = alloc_zeroed_user_highpage_movable(vma, address);
+	if (is_zero_pfn(pte_pfn(fe->entry))) {
+		new_page = alloc_zeroed_user_highpage_movable(fe->vma, fe->address);
 		if (!new_page)
 			goto oom;
 	} else {
-		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, fe->vma, fe->address);
 		if (!new_page)
 			goto oom;
-		cow_user_page(new_page, old_page, address, vma);
+		cow_user_page(new_page, old_page, fe->address, fe->vma);
 	}
 	__SetPageUptodate(new_page);
 
-	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg))
+	if (mem_cgroup_try_charge(new_page, fe->mm, GFP_KERNEL, &memcg))
 		goto oom_free_new;
 
-	mmun_start  = address & PAGE_MASK;
+	mmun_start  = fe->address & PAGE_MASK;
 	mmun_end    = mmun_start + PAGE_SIZE;
-	mmu_notifier_invalidate_range_start(mm, mmun_start, mmun_end);
+	mmu_notifier_invalidate_range_start(fe->mm, mmun_start, mmun_end);
 
 	/*
 	 * Re-check the pte - we dropped the lock
 	 */
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (likely(pte_same(*page_table, orig_pte))) {
+	if (!pte_map_lock(fe)) {
+		mem_cgroup_cancel_charge(new_page, memcg);
+		page_cache_release(new_page);
+		ret |= VM_FAULT_RETRY;
+		goto err;
+	}
+
+	if (likely(pte_same(*fe->pte, fe->entry))) {
 		if (old_page) {
 			if (!PageAnon(old_page)) {
-				dec_mm_counter_fast(mm, MM_FILEPAGES);
-				inc_mm_counter_fast(mm, MM_ANONPAGES);
+				dec_mm_counter_fast(fe->mm, MM_FILEPAGES);
+				inc_mm_counter_fast(fe->mm, MM_ANONPAGES);
 			}
 		} else
-			inc_mm_counter_fast(mm, MM_ANONPAGES);
-		flush_cache_page(vma, address, pte_pfn(orig_pte));
-		entry = mk_pte(new_page, vma->vm_page_prot);
-		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+			inc_mm_counter_fast(fe->mm, MM_ANONPAGES);
+		flush_cache_page(fe->vma, fe->address, pte_pfn(fe->entry));
+		entry = mk_pte(new_page, fe->vma->vm_page_prot);
+		entry = maybe_mkwrite(pte_mkdirty(entry), fe->vma);
 		/*
 		 * Clear the pte entry and flush it first, before updating the
 		 * pte with the new entry. This will avoid a race condition
 		 * seen in the presence of one thread doing SMC and another
 		 * thread doing COW.
 		 */
-		ptep_clear_flush(vma, address, page_table);
-		page_add_new_anon_rmap(new_page, vma, address);
+		ptep_clear_flush(fe->vma, fe->address, fe->pte);
+		page_add_new_anon_rmap(new_page, fe->vma, fe->address);
 		mem_cgroup_commit_charge(new_page, memcg, false);
-		lru_cache_add_active_or_unevictable(new_page, vma);
+		lru_cache_add_active_or_unevictable(new_page, fe->vma);
 		/*
 		 * We call the notify macro here because, when using secondary
 		 * mmu page tables (such as kvm shadow page tables), we want the
 		 * new page to be mapped directly into the secondary page table.
 		 */
-		set_pte_at_notify(mm, address, page_table, entry);
-		update_mmu_cache(vma, address, page_table);
+		set_pte_at_notify(fe->mm, fe->address, fe->pte, entry);
+		update_mmu_cache(fe->vma, fe->address, fe->pte);
 		if (old_page) {
 			/*
 			 * Only after switching the pte to the new page may
@@ -2253,15 +2287,16 @@ static int do_wp_page(struct mm_struct *
 	if (new_page)
 		page_cache_release(new_page);
 unlock:
-	pte_unmap_unlock(page_table, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 	if (mmun_end > mmun_start)
-		mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end);
+		mmu_notifier_invalidate_range_end(fe->mm, mmun_start, mmun_end);
+err:
 	if (old_page) {
 		/*
 		 * Don't let another task, with possibly unlocked vma,
 		 * keep the mlocked page.
 		 */
-		if ((ret & VM_FAULT_WRITE) && (vma->vm_flags & VM_LOCKED)) {
+		if ((ret & VM_FAULT_WRITE) && (fe->vma->vm_flags & VM_LOCKED)) {
 			lock_page(old_page);	/* LRU manipulation */
 			munlock_vma_page(old_page);
 			unlock_page(old_page);
@@ -2269,6 +2304,7 @@ static int do_wp_page(struct mm_struct *
 		page_cache_release(old_page);
 	}
 	return ret;
+
 oom_free_new:
 	page_cache_release(new_page);
 oom:
@@ -2381,27 +2417,24 @@ EXPORT_SYMBOL(unmap_mapping_range);
  * We return with the mmap_sem locked or unlocked in the same cases
  * as does filemap_fault().
  */
-static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		unsigned int flags, pte_t orig_pte)
+static int do_swap_page(struct fault_env *fe)
 {
-	spinlock_t *ptl;
 	struct page *page, *swapcache;
 	struct mem_cgroup *memcg;
 	swp_entry_t entry;
-	pte_t *page_table, pte;
+	pte_t pte;
 	int locked;
 	int exclusive = 0;
 	int ret = 0;
 
-	entry = pte_to_swp_entry(orig_pte);
+	entry = pte_to_swp_entry(fe->entry);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
-			migration_entry_wait(mm, pmd, address);
+			migration_entry_wait(fe->mm, fe->pmd, fe->address);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
 		} else {
-			print_bad_pte(vma, address, orig_pte, NULL);
+			print_bad_pte(fe->vma, fe->address, fe->entry, NULL);
 			ret = VM_FAULT_SIGBUS;
 		}
 		goto out;
@@ -2410,14 +2443,16 @@ static int do_swap_page(struct mm_struct
 	page = lookup_swap_cache(entry);
 	if (!page) {
 		page = swapin_readahead(entry,
-					GFP_HIGHUSER_MOVABLE, vma, address);
+				GFP_HIGHUSER_MOVABLE, fe->vma, fe->address);
 		if (!page) {
 			/*
 			 * Back out if somebody else faulted in this pte
 			 * while we released the pte lock.
 			 */
-			page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-			if (likely(pte_same(*page_table, orig_pte)))
+			if (!pte_map_lock(fe))
+				return VM_FAULT_RETRY;
+
+			if (likely(pte_same(*fe->pte, fe->entry)))
 				ret = VM_FAULT_OOM;
 			delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 			goto unlock;
@@ -2426,7 +2461,7 @@ static int do_swap_page(struct mm_struct
 		/* Had to read the page from swap area: Major fault */
 		ret = VM_FAULT_MAJOR;
 		count_vm_event(PGMAJFAULT);
-		mem_cgroup_count_vm_event(mm, PGMAJFAULT);
+		mem_cgroup_count_vm_event(fe->mm, PGMAJFAULT);
 	} else if (PageHWPoison(page)) {
 		/*
 		 * hwpoisoned dirty swapcache pages are kept for killing
@@ -2439,7 +2474,7 @@ static int do_swap_page(struct mm_struct
 	}
 
 	swapcache = page;
-	locked = lock_page_or_retry(page, mm, flags);
+	locked = lock_page_or_retry(page, fe->mm, fe->flags);
 
 	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
 	if (!locked) {
@@ -2456,14 +2491,14 @@ static int do_swap_page(struct mm_struct
 	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
 		goto out_page;
 
-	page = ksm_might_need_to_copy(page, vma, address);
+	page = ksm_might_need_to_copy(page, fe->vma, fe->address);
 	if (unlikely(!page)) {
 		ret = VM_FAULT_OOM;
 		page = swapcache;
 		goto out_page;
 	}
 
-	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg)) {
+	if (mem_cgroup_try_charge(page, fe->mm, GFP_KERNEL, &memcg)) {
 		ret = VM_FAULT_OOM;
 		goto out_page;
 	}
@@ -2471,8 +2506,12 @@ static int do_swap_page(struct mm_struct
 	/*
 	 * Back out if somebody else already faulted in this pte.
 	 */
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (unlikely(!pte_same(*page_table, orig_pte)))
+	if (!pte_map_lock(fe)) {
+		ret = VM_FAULT_RETRY;
+		goto out_charge;
+	}
+
+	if (unlikely(!pte_same(*fe->pte, fe->entry)))
 		goto out_nomap;
 
 	if (unlikely(!PageUptodate(page))) {
@@ -2490,30 +2529,30 @@ static int do_swap_page(struct mm_struct
 	 * must be called after the swap_free(), or it will never succeed.
 	 */
 
-	inc_mm_counter_fast(mm, MM_ANONPAGES);
-	dec_mm_counter_fast(mm, MM_SWAPENTS);
-	pte = mk_pte(page, vma->vm_page_prot);
-	if ((flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
-		pte = maybe_mkwrite(pte_mkdirty(pte), vma);
-		flags &= ~FAULT_FLAG_WRITE;
+	inc_mm_counter_fast(fe->mm, MM_ANONPAGES);
+	dec_mm_counter_fast(fe->mm, MM_SWAPENTS);
+	pte = mk_pte(page, fe->vma->vm_page_prot);
+	if ((fe->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page)) {
+		pte = maybe_mkwrite(pte_mkdirty(pte), fe->vma);
+		fe->flags &= ~FAULT_FLAG_WRITE;
 		ret |= VM_FAULT_WRITE;
 		exclusive = 1;
 	}
-	flush_icache_page(vma, page);
-	if (pte_swp_soft_dirty(orig_pte))
+	flush_icache_page(fe->vma, page);
+	if (pte_swp_soft_dirty(fe->entry))
 		pte = pte_mksoft_dirty(pte);
-	set_pte_at(mm, address, page_table, pte);
+	set_pte_at(fe->mm, fe->address, fe->pte, pte);
 	if (page == swapcache) {
-		do_page_add_anon_rmap(page, vma, address, exclusive);
+		do_page_add_anon_rmap(page, fe->vma, fe->address, exclusive);
 		mem_cgroup_commit_charge(page, memcg, true);
 	} else { /* ksm created a completely new copy */
-		page_add_new_anon_rmap(page, vma, address);
+		page_add_new_anon_rmap(page, fe->vma, fe->address);
 		mem_cgroup_commit_charge(page, memcg, false);
-		lru_cache_add_active_or_unevictable(page, vma);
+		lru_cache_add_active_or_unevictable(page, fe->vma);
 	}
 
 	swap_free(entry);
-	if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
+	if (vm_swap_full() || (fe->vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 		try_to_free_swap(page);
 	unlock_page(page);
 	if (page != swapcache) {
@@ -2529,22 +2568,23 @@ static int do_swap_page(struct mm_struct
 		page_cache_release(swapcache);
 	}
 
-	if (flags & FAULT_FLAG_WRITE) {
-		ret |= do_wp_page(mm, vma, address, page_table, pmd, ptl, pte);
+	if (fe->flags & FAULT_FLAG_WRITE) {
+		ret |= do_wp_page(fe);
 		if (ret & VM_FAULT_ERROR)
 			ret &= VM_FAULT_ERROR;
 		goto out;
 	}
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, address, page_table);
+	update_mmu_cache(fe->vma, fe->address, fe->pte);
 unlock:
-	pte_unmap_unlock(page_table, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 out:
 	return ret;
 out_nomap:
+	pte_unmap_unlock(fe->pte, fe->ptl);
+out_charge:
 	mem_cgroup_cancel_charge(page, memcg);
-	pte_unmap_unlock(page_table, ptl);
 out_page:
 	unlock_page(page);
 out_release:
@@ -2595,33 +2635,34 @@ static inline int check_stack_guard_page
  * but allow concurrent faults), and pte mapped but not yet locked.
  * We return with mmap_sem still held, but pte unmapped and unlocked.
  */
-static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		unsigned int flags)
+static int do_anonymous_page(struct fault_env *fe)
 {
 	struct mem_cgroup *memcg;
 	struct page *page;
-	spinlock_t *ptl;
-	pte_t entry, *page_table;
+	pte_t entry;
 
 	/* Check if we need to add a guard page to the stack */
-	if (check_stack_guard_page(vma, address) < 0)
+	if (check_stack_guard_page(fe->vma, fe->address) < 0)
 		return VM_FAULT_SIGBUS;
 
 	/* Use the zero-page for reads */
-	if (!(flags & FAULT_FLAG_WRITE)) {
-		entry = pte_mkspecial(pfn_pte(my_zero_pfn(address),
-						vma->vm_page_prot));
-		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-		if (!pte_none(*page_table))
+	if (!(fe->flags & FAULT_FLAG_WRITE)) {
+		entry = pte_mkspecial(pfn_pte(my_zero_pfn(fe->address),
+					fe->vma->vm_page_prot));
+
+		if (!pte_map_lock(fe))
+			return VM_FAULT_RETRY;
+
+		if (!pte_none(*fe->pte))
 			goto unlock;
+
 		goto setpte;
 	}
 
 	/* Allocate our own private page. */
-	if (unlikely(anon_vma_prepare(vma)))
+	if (unlikely(anon_vma_prepare(fe->vma)))
 		goto oom;
-	page = alloc_zeroed_user_highpage_movable(vma, address);
+	page = alloc_zeroed_user_highpage_movable(fe->vma, fe->address);
 	if (!page)
 		goto oom;
 	/*
@@ -2631,28 +2672,33 @@ static int do_anonymous_page(struct mm_s
 	 */
 	__SetPageUptodate(page);
 
-	if (mem_cgroup_try_charge(page, mm, GFP_KERNEL, &memcg))
+	if (mem_cgroup_try_charge(page, fe->mm, GFP_KERNEL, &memcg))
 		goto oom_free_page;
 
-	entry = mk_pte(page, vma->vm_page_prot);
-	if (vma->vm_flags & VM_WRITE)
+	entry = mk_pte(page, fe->vma->vm_page_prot);
+	if (fe->vma->vm_flags & VM_WRITE)
 		entry = pte_mkwrite(pte_mkdirty(entry));
 
-	page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (!pte_none(*page_table))
+	if (!pte_map_lock(fe)) {
+		mem_cgroup_cancel_charge(page, memcg);
+		page_cache_release(page);
+		return VM_FAULT_RETRY;
+	}
+
+	if (!pte_none(*fe->pte))
 		goto release;
 
-	inc_mm_counter_fast(mm, MM_ANONPAGES);
-	page_add_new_anon_rmap(page, vma, address);
+	inc_mm_counter_fast(fe->mm, MM_ANONPAGES);
+	page_add_new_anon_rmap(page, fe->vma, fe->address);
 	mem_cgroup_commit_charge(page, memcg, false);
-	lru_cache_add_active_or_unevictable(page, vma);
+	lru_cache_add_active_or_unevictable(page, fe->vma);
 setpte:
-	set_pte_at(mm, address, page_table, entry);
+	set_pte_at(fe->mm, fe->address, fe->pte, entry);
 
 	/* No need to invalidate - it was non-present before */
-	update_mmu_cache(vma, address, page_table);
+	update_mmu_cache(fe->vma, fe->address, fe->pte);
 unlock:
-	pte_unmap_unlock(page_table, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 	return 0;
 release:
 	mem_cgroup_cancel_charge(page, memcg);
@@ -2688,7 +2734,7 @@ static int __do_fault(struct vm_area_str
 		if (ret & VM_FAULT_LOCKED)
 			unlock_page(vmf.page);
 		page_cache_release(vmf.page);
-		return VM_FAULT_HWPOISON;
+		return ret | VM_FAULT_HWPOISON;
 	}
 
 	if (unlikely(!(ret & VM_FAULT_LOCKED)))
@@ -2846,13 +2892,9 @@ static void do_fault_around(struct vm_ar
 	vma->vm_ops->map_pages(vma, &vmf);
 }
 
-static int do_read_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
+static int do_read_fault(struct fault_env *fe, pgoff_t pgoff)
 {
 	struct page *fault_page;
-	spinlock_t *ptl;
-	pte_t *pte;
 	int ret = 0;
 
 	/*
@@ -2860,73 +2902,86 @@ static int do_read_fault(struct mm_struc
 	 * if page by the offset is not ready to be mapped (cold cache or
 	 * something).
 	 */
-	if (vma->vm_ops->map_pages && !(flags & FAULT_FLAG_NONLINEAR) &&
+	if (fe->vma->vm_ops->map_pages && !(fe->flags & FAULT_FLAG_NONLINEAR) &&
 	    fault_around_bytes >> PAGE_SHIFT > 1) {
-		pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-		do_fault_around(vma, address, pte, pgoff, flags);
-		if (!pte_same(*pte, orig_pte))
+
+		if (!pte_map_lock(fe))
+			return VM_FAULT_RETRY;
+
+		do_fault_around(fe->vma, fe->address, fe->pte, pgoff, fe->flags);
+		if (!pte_same(*fe->pte, fe->entry))
 			goto unlock_out;
-		pte_unmap_unlock(pte, ptl);
+
+		pte_unmap_unlock(fe->pte, fe->ptl);
 	}
 
-	ret = __do_fault(vma, address, pgoff, flags, &fault_page);
+	ret = __do_fault(fe->vma, fe->address, pgoff, fe->flags, &fault_page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
-	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (unlikely(!pte_same(*pte, orig_pte))) {
-		pte_unmap_unlock(pte, ptl);
+	if (!pte_map_lock(fe)) {
+		unlock_page(fault_page);
+		page_cache_release(fault_page);
+		return VM_FAULT_RETRY;
+	}
+
+	if (unlikely(!pte_same(*fe->pte, fe->entry))) {
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		unlock_page(fault_page);
 		page_cache_release(fault_page);
 		return ret;
 	}
-	do_set_pte(vma, address, fault_page, pte, false, false);
+
+	do_set_pte(fe->vma, fe->address, fault_page, fe->pte, false, false);
 	unlock_page(fault_page);
 unlock_out:
-	pte_unmap_unlock(pte, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 	return ret;
 }
 
-static int do_cow_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
+static int do_cow_fault(struct fault_env *fe, pgoff_t pgoff)
 {
 	struct page *fault_page, *new_page;
 	struct mem_cgroup *memcg;
-	spinlock_t *ptl;
-	pte_t *pte;
 	int ret;
 
-	if (unlikely(anon_vma_prepare(vma)))
+	if (unlikely(anon_vma_prepare(fe->vma)))
 		return VM_FAULT_OOM;
 
-	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
+	new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, fe->vma, fe->address);
 	if (!new_page)
 		return VM_FAULT_OOM;
 
-	if (mem_cgroup_try_charge(new_page, mm, GFP_KERNEL, &memcg)) {
+	if (mem_cgroup_try_charge(new_page, fe->mm, GFP_KERNEL, &memcg)) {
 		page_cache_release(new_page);
 		return VM_FAULT_OOM;
 	}
 
-	ret = __do_fault(vma, address, pgoff, flags, &fault_page);
+	ret = __do_fault(fe->vma, fe->address, pgoff, fe->flags, &fault_page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		goto uncharge_out;
 
-	copy_user_highpage(new_page, fault_page, address, vma);
+	copy_user_highpage(new_page, fault_page, fe->address, fe->vma);
 	__SetPageUptodate(new_page);
 
-	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (unlikely(!pte_same(*pte, orig_pte))) {
-		pte_unmap_unlock(pte, ptl);
+	if (!pte_map_lock(fe)) {
+		unlock_page(fault_page);
+		page_cache_release(fault_page);
+		ret |= VM_FAULT_RETRY;
+		goto uncharge_out;
+	}
+
+	if (unlikely(!pte_same(*fe->pte, fe->entry))) {
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		unlock_page(fault_page);
 		page_cache_release(fault_page);
 		goto uncharge_out;
 	}
-	do_set_pte(vma, address, new_page, pte, true, true);
+
+	do_set_pte(fe->vma, fe->address, new_page, fe->pte, true, true);
 	mem_cgroup_commit_charge(new_page, memcg, false);
-	lru_cache_add_active_or_unevictable(new_page, vma);
-	pte_unmap_unlock(pte, ptl);
+	lru_cache_add_active_or_unevictable(new_page, fe->vma);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 	unlock_page(fault_page);
 	page_cache_release(fault_page);
 	return ret;
@@ -2936,18 +2991,14 @@ static int do_cow_fault(struct mm_struct
 	return ret;
 }
 
-static int do_shared_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
+static int do_shared_fault(struct fault_env *fe, pgoff_t pgoff)
 {
 	struct page *fault_page;
 	struct address_space *mapping;
-	spinlock_t *ptl;
-	pte_t *pte;
 	int dirtied = 0;
 	int ret, tmp;
 
-	ret = __do_fault(vma, address, pgoff, flags, &fault_page);
+	ret = __do_fault(fe->vma, fe->address, pgoff, fe->flags, &fault_page);
 	if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE | VM_FAULT_RETRY)))
 		return ret;
 
@@ -2955,31 +3006,35 @@ static int do_shared_fault(struct mm_str
 	 * Check if the backing address space wants to know that the page is
 	 * about to become writable
 	 */
-	if (vma->vm_ops->page_mkwrite) {
+	if (fe->vma->vm_ops->page_mkwrite) {
 		unlock_page(fault_page);
-		tmp = do_page_mkwrite(vma, fault_page, address);
-		if (unlikely(!tmp ||
-				(tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
+		tmp = do_page_mkwrite(fe->vma, fault_page, fe->address);
+		if (unlikely(!tmp || (tmp & (VM_FAULT_ERROR | VM_FAULT_NOPAGE)))) {
 			page_cache_release(fault_page);
 			return tmp;
 		}
 	}
 
-	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (unlikely(!pte_same(*pte, orig_pte))) {
-		pte_unmap_unlock(pte, ptl);
+	if (!pte_map_lock(fe)) {
+		unlock_page(fault_page);
+		page_cache_release(fault_page);
+		return ret | VM_FAULT_RETRY;
+	}
+
+	if (unlikely(!pte_same(*fe->pte, fe->entry))) {
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		unlock_page(fault_page);
 		page_cache_release(fault_page);
 		return ret;
 	}
-	do_set_pte(vma, address, fault_page, pte, true, false);
-	pte_unmap_unlock(pte, ptl);
+	do_set_pte(fe->vma, fe->address, fault_page, fe->pte, true, false);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 
 	if (set_page_dirty(fault_page))
 		dirtied = 1;
 	mapping = fault_page->mapping;
 	unlock_page(fault_page);
-	if ((dirtied || vma->vm_ops->page_mkwrite) && mapping) {
+	if ((dirtied || fe->vma->vm_ops->page_mkwrite) && mapping) {
 		/*
 		 * Some device drivers do not set page.mapping but still
 		 * dirty their pages
@@ -2988,8 +3043,8 @@ static int do_shared_fault(struct mm_str
 	}
 
 	/* file_update_time outside page_lock */
-	if (vma->vm_file && !vma->vm_ops->page_mkwrite)
-		file_update_time(vma->vm_file);
+	if (fe->vma->vm_file && !fe->vma->vm_ops->page_mkwrite)
+		file_update_time(fe->vma->vm_file);
 
 	return ret;
 }
@@ -3000,20 +3055,16 @@ static int do_shared_fault(struct mm_str
  * The mmap_sem may have been released depending on flags and our
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
-static int do_linear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		unsigned int flags, pte_t orig_pte)
-{
-	pgoff_t pgoff = (((address & PAGE_MASK)
-			- vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
-
-	if (!(flags & FAULT_FLAG_WRITE))
-		return do_read_fault(mm, vma, address, pmd, pgoff, flags,
-				orig_pte);
-	if (!(vma->vm_flags & VM_SHARED))
-		return do_cow_fault(mm, vma, address, pmd, pgoff, flags,
-				orig_pte);
-	return do_shared_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
+static int do_linear_fault(struct fault_env *fe)
+{
+	pgoff_t pgoff = (((fe->address & PAGE_MASK) -
+			 fe->vma->vm_start) >> PAGE_SHIFT) + fe->vma->vm_pgoff;
+
+	if (!(fe->flags & FAULT_FLAG_WRITE))
+		return do_read_fault(fe, pgoff);
+	if (!(fe->vma->vm_flags & VM_SHARED))
+		return do_cow_fault(fe, pgoff);
+	return do_shared_fault(fe, pgoff);
 }
 
 /*
@@ -3027,30 +3078,26 @@ static int do_linear_fault(struct mm_str
  * The mmap_sem may have been released depending on flags and our
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
-static int do_nonlinear_fault(struct mm_struct *mm, struct vm_area_struct *vma,
-		unsigned long address, pmd_t *pmd,
-		unsigned int flags, pte_t orig_pte)
+static int do_nonlinear_fault(struct fault_env *fe)
 {
 	pgoff_t pgoff;
 
-	flags |= FAULT_FLAG_NONLINEAR;
+	fe->flags |= FAULT_FLAG_NONLINEAR;
 
-	if (unlikely(!(vma->vm_flags & VM_NONLINEAR))) {
+	if (unlikely(!(fe->vma->vm_flags & VM_NONLINEAR))) {
 		/*
 		 * Page table corrupted: show pte and kill process.
 		 */
-		print_bad_pte(vma, address, orig_pte, NULL);
+		print_bad_pte(fe->vma, fe->address, fe->entry, NULL);
 		return VM_FAULT_SIGBUS;
 	}
 
-	pgoff = pte_to_pgoff(orig_pte);
-	if (!(flags & FAULT_FLAG_WRITE))
-		return do_read_fault(mm, vma, address, pmd, pgoff, flags,
-				orig_pte);
-	if (!(vma->vm_flags & VM_SHARED))
-		return do_cow_fault(mm, vma, address, pmd, pgoff, flags,
-				orig_pte);
-	return do_shared_fault(mm, vma, address, pmd, pgoff, flags, orig_pte);
+	pgoff = pte_to_pgoff(fe->entry);
+	if (!(fe->flags & FAULT_FLAG_WRITE))
+		return do_read_fault(fe, pgoff);
+	if (!(fe->vma->vm_flags & VM_SHARED))
+		return do_cow_fault(fe, pgoff);
+	return do_shared_fault(fe, pgoff);
 }
 
 static int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
@@ -3068,17 +3115,16 @@ static int numa_migrate_prep(struct page
 	return mpol_misplaced(page, vma, addr);
 }
 
-static int do_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
-		   unsigned long addr, pte_t pte, pmd_t *pmd)
+static int do_numa_page(struct fault_env *fe)
 {
 	struct page *page = NULL;
-	spinlock_t *ptl;
 	int page_nid = -1;
 	int last_cpupid;
 	int target_nid;
 	bool migrated = false;
 	int flags = 0;
-	pte_t *ptep;
+	int ret = 0;
+	pte_t entry;
 
 	/*
 	* The "pte" at this point cannot be used safely without
@@ -3089,19 +3135,23 @@ static int do_numa_page(struct mm_struct
 	* the _PAGE_NUMA bit and it is not really expected that there
 	* would be concurrent hardware modifications to the PTE.
 	*/
-	ptep = pte_offset_map_lock(mm, pmd, addr, &ptl);
-	if (unlikely(!pte_same(*ptep, pte))) {
-		pte_unmap_unlock(ptep, ptl);
+	if (!pte_map_lock(fe)) {
+		ret |= VM_FAULT_RETRY;
+		goto out;
+	}
+
+	if (unlikely(!pte_same(*fe->pte, fe->entry))) {
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		goto out;
 	}
 
-	pte = pte_mknonnuma(pte);
-	set_pte_at(mm, addr, ptep, pte);
-	update_mmu_cache(vma, addr, ptep);
+	entry = pte_mknonnuma(fe->entry);
+	set_pte_at(fe->mm, fe->address, fe->pte, entry);
+	update_mmu_cache(fe->vma, fe->address, fe->pte);
 
-	page = vm_normal_page(vma, addr, pte);
+	page = vm_normal_page(fe->vma, fe->address, entry);
 	if (!page) {
-		pte_unmap_unlock(ptep, ptl);
+		pte_unmap_unlock(fe->pte, fe->ptl);
 		return 0;
 	}
 	BUG_ON(is_zero_pfn(page_to_pfn(page)));
@@ -3111,27 +3161,28 @@ static int do_numa_page(struct mm_struct
 	 * in general, RO pages shouldn't hurt as much anyway since
 	 * they can be in shared cache state.
 	 */
-	if (!pte_write(pte))
+	if (!pte_write(entry))
 		flags |= TNF_NO_GROUP;
 
 	/*
 	 * Flag if the page is shared between multiple address spaces. This
 	 * is later used when determining whether to group tasks together
 	 */
-	if (page_mapcount(page) > 1 && (vma->vm_flags & VM_SHARED))
+	if (page_mapcount(page) > 1 && (fe->vma->vm_flags & VM_SHARED))
 		flags |= TNF_SHARED;
 
 	last_cpupid = page_cpupid_last(page);
 	page_nid = page_to_nid(page);
-	target_nid = numa_migrate_prep(page, vma, addr, page_nid, &flags);
-	pte_unmap_unlock(ptep, ptl);
+	target_nid = numa_migrate_prep(page, fe->vma, fe->address, page_nid, &flags);
+	pte_unmap_unlock(fe->pte, fe->ptl);
+
 	if (target_nid == -1) {
 		put_page(page);
 		goto out;
 	}
 
 	/* Migrate to the requested node */
-	migrated = migrate_misplaced_page(page, vma, target_nid);
+	migrated = migrate_misplaced_page(page, fe->vma, target_nid);
 	if (migrated) {
 		page_nid = target_nid;
 		flags |= TNF_MIGRATED;
@@ -3159,45 +3210,38 @@ static int do_numa_page(struct mm_struct
  * The mmap_sem may have been released depending on flags and our
  * return value.  See filemap_fault() and __lock_page_or_retry().
  */
-static int handle_pte_fault(struct mm_struct *mm,
-		     struct vm_area_struct *vma, unsigned long address,
-		     pte_t entry, pmd_t *pmd, unsigned int flags)
+static int handle_pte_fault(struct fault_env *fe)
 {
-	spinlock_t *ptl;
-	pte_t *pte;
+	pte_t entry = fe->entry;
 
 	if (!pte_present(entry)) {
 		if (pte_none(entry)) {
-			if (vma->vm_ops) {
-				if (likely(vma->vm_ops->fault))
-					return do_linear_fault(mm, vma, address,
-						pmd, flags, entry);
+			if (fe->vma->vm_ops) {
+				if (likely(fe->vma->vm_ops->fault))
+					return do_linear_fault(fe);
 			}
-			return do_anonymous_page(mm, vma, address,
-						 pmd, flags);
+			return do_anonymous_page(fe);
 		}
 		if (pte_file(entry))
-			return do_nonlinear_fault(mm, vma, address,
-					pmd, flags, entry);
-		return do_swap_page(mm, vma, address,
-					pmd, flags, entry);
+			return do_nonlinear_fault(fe);
+		return do_swap_page(fe);
 	}
 
 	if (pte_numa(entry))
-		return do_numa_page(mm, vma, address, entry, pmd);
-
-	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
-	if (unlikely(!pte_same(*pte, entry)))
+		return do_numa_page(fe);
+	if (!pte_map_lock(fe))
+		return VM_FAULT_RETRY;
+	if (unlikely(!pte_same(*fe->pte, entry)))
 		goto unlock;
-	if (flags & FAULT_FLAG_WRITE) {
+	if (fe->flags & FAULT_FLAG_WRITE) {
 		if (!pte_write(entry))
-			return do_wp_page(mm, vma, address,
-					pte, pmd, ptl, entry);
+			return do_wp_page(fe);
 		entry = pte_mkdirty(entry);
 	}
 	entry = pte_mkyoung(entry);
-	if (ptep_set_access_flags(vma, address, pte, entry, flags & FAULT_FLAG_WRITE)) {
-		update_mmu_cache(vma, address, pte);
+	if (ptep_set_access_flags(fe->vma, fe->address, fe->pte,
+				entry, fe->flags & FAULT_FLAG_WRITE)) {
+		update_mmu_cache(fe->vma, fe->address, fe->pte);
 	} else {
 		/*
 		 * This is needed only for protection faults but the arch code
@@ -3205,11 +3249,11 @@ static int handle_pte_fault(struct mm_st
 		 * This still avoids useless tlb flushes for .text page faults
 		 * with threads.
 		 */
-		if (flags & FAULT_FLAG_WRITE)
-			flush_tlb_fix_spurious_fault(vma, address);
+		if (fe->flags & FAULT_FLAG_WRITE)
+			flush_tlb_fix_spurious_fault(fe->vma, fe->address);
 	}
 unlock:
-	pte_unmap_unlock(pte, ptl);
+	pte_unmap_unlock(fe->pte, fe->ptl);
 	return 0;
 }
 
@@ -3222,6 +3266,7 @@ static int handle_pte_fault(struct mm_st
 static int __handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			     unsigned long address, unsigned int flags)
 {
+	struct fault_env fe;
 	pgd_t *pgd;
 	pud_t *pud;
 	pmd_t *pmd;
@@ -3298,7 +3343,16 @@ static int __handle_mm_fault(struct mm_s
 	entry = ACCESS_ONCE(*pte);
 	pte_unmap(pte);
 
-	return handle_pte_fault(mm, vma, address, entry, pmd, flags);
+	fe = (struct fault_env) {
+		.mm = mm,
+		.vma = vma,
+		.address = address,
+		.entry = entry,
+		.pmd = pmd,
+		.flags = flags,
+	};
+
+	return handle_pte_fault(&fe);
 }
 
 /*



^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC][PATCH 3/6] mm: VMA sequence count
  2014-10-20 21:56 ` Peter Zijlstra
@ 2014-10-20 21:56   ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-20 21:56 UTC (permalink / raw)
  To: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm, Peter Zijlstra

[-- Attachment #1: peterz-mm-vma-seq.patch --]
[-- Type: text/plain, Size: 2889 bytes --]

Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
counts such that we can easily test whether a VMA has changed.

The unmap_page_range() one allows us to make assumptions about
page-tables; when we find the seqcount hasn't changed we can assume
page-tables are still valid.

The flip side is that we cannot distinguish between a vma_adjust() and
an unmap_page_range() -- with the former we could have re-checked the
vma bounds against the address.
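
Purely as illustration -- the read side is not part of this patch -- a
lockless user of the new count would follow the usual seqcount pattern.
The function name below is invented for the example:

static int example_speculative_walk(struct vm_area_struct *vma)
{
	unsigned int seq;

	/* read_seqcount_begin() spins while a writer holds the count odd */
	seq = read_seqcount_begin(&vma->vm_sequence);

	/* ... speculatively walk the page tables covering this vma ... */

	/* bail if vma_adjust() or unmap_page_range() ran in the meantime */
	if (read_seqcount_retry(&vma->vm_sequence, seq))
		return VM_FAULT_RETRY;

	return 0;
}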

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/mm_types.h |    2 ++
 mm/memory.c              |    2 ++
 mm/mmap.c                |   13 +++++++++++++
 3 files changed, 17 insertions(+)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -13,6 +13,7 @@
 #include <linux/page-debug-flags.h>
 #include <linux/uprobes.h>
 #include <linux/page-flags-layout.h>
+#include <linux/seqlock.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 
@@ -308,6 +309,7 @@ struct vm_area_struct {
 #ifdef CONFIG_NUMA
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
+	seqcount_t vm_sequence;
 };
 
 struct core_thread {
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1293,6 +1293,7 @@ static void unmap_page_range(struct mmu_
 		details = NULL;
 
 	BUG_ON(addr >= end);
+	write_seqcount_begin(&vma->vm_sequence);
 	tlb_start_vma(tlb, vma);
 	pgd = pgd_offset(vma->vm_mm, addr);
 	do {
@@ -1302,6 +1303,7 @@ static void unmap_page_range(struct mmu_
 		next = zap_pud_range(tlb, vma, pgd, addr, next, details);
 	} while (pgd++, addr = next, addr != end);
 	tlb_end_vma(tlb, vma);
+	write_seqcount_end(&vma->vm_sequence);
 }
 
 
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -596,6 +596,8 @@ void __vma_link_rb(struct mm_struct *mm,
 	else
 		mm->highest_vm_end = vma->vm_end;
 
+	seqcount_init(&vma->vm_sequence);
+
 	/*
 	 * vma->vm_prev wasn't known when we followed the rbtree to find the
 	 * correct insertion point for that vma. As a result, we could not
@@ -715,6 +717,10 @@ int vma_adjust(struct vm_area_struct *vm
 	long adjust_next = 0;
 	int remove_next = 0;
 
+	write_seqcount_begin(&vma->vm_sequence);
+	if (next)
+		write_seqcount_begin_nested(&next->vm_sequence, SINGLE_DEPTH_NESTING);
+
 	if (next && !insert) {
 		struct vm_area_struct *exporter = NULL;
 
@@ -880,7 +886,10 @@ again:			remove_next = 1 + (end > next->
 		 * we must remove another next too. It would clutter
 		 * up the code too much to do both in one go.
 		 */
+		write_seqcount_end(&next->vm_sequence);
 		next = vma->vm_next;
+		write_seqcount_begin_nested(&next->vm_sequence, SINGLE_DEPTH_NESTING);
+
 		if (remove_next == 2)
 			goto again;
 		else if (next)
@@ -891,6 +900,10 @@ again:			remove_next = 1 + (end > next->
 	if (insert && file)
 		uprobe_mmap(insert);
 
+	if (next)
+		write_seqcount_end(&next->vm_sequence);
+	write_seqcount_end(&vma->vm_sequence);
+
 	validate_mm(mm);
 
 	return 0;



^ permalink raw reply	[flat|nested] 88+ messages in thread


* [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-20 21:56 ` Peter Zijlstra
@ 2014-10-20 21:56   ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-20 21:56 UTC (permalink / raw)
  To: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm, Peter Zijlstra

[-- Attachment #1: peterz-mm-srcu-vma.patch --]
[-- Type: text/plain, Size: 9054 bytes --]

Manage the VMAs with SRCU such that we can do a lockless VMA lookup.

We put the fput(vma->vm_file) in the SRCU callback; this keeps files
valid during speculative faults, which is possible due to the delayed
fput work by Al Viro -- do we need srcu_barrier() in unmount
someplace?

We guard the mm_rb tree with a seqlock (XXX could be a seqcount but
we'd have to disable preemption around the write side in order to make
the retry loop in __read_seqcount_begin() work) such that we can know
whether the rb tree walk was correct. We cannot trust the result of a
lockless tree walk in the face of concurrent tree rotations; we can,
however, trust that such walks terminate -- tree rotations guarantee
the end result is a tree again after all.

Furthermore, we rely on the WMB implied by the
write_seqlock/count_begin() to separate the VMA initialization and the
publishing stores, analogous to the RELEASE in rcu_assign_pointer().
We also rely on the RMB from read_seqretry() to separate the vma load
from further loads like the smp_read_barrier_depends() in regular
RCU.

We must not touch the vmacache while doing SRCU lookups as that is not
properly serialized against changes. We update gap information after
publishing the VMA, but A) we don't use that and B) the seqlock
read side would fix that anyhow.

We clear vma->vm_rb for nodes removed from the vma tree such that we
can easily detect such 'dead' nodes; we rely on the WMB from
write_sequnlock() to separate the tree removal from clearing the node.

Provide find_vma_srcu() which wraps the required magic.

XXX: mmap()/munmap() heavy workloads might suffer from the global lock
in call_srcu() -- this is fixable with a 'better' SRCU implementation.
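
As a rough usage sketch (not part of this patch; the actual consumer is
the speculative fault handler added later in the series), a lockless
lookup would combine the SRCU read lock, find_vma_srcu() and a
vm_sequence snapshot along these lines. The function name and the exact
revalidation points are invented for the example:

static int example_speculative_lookup(struct mm_struct *mm, unsigned long address)
{
	struct vm_area_struct *vma;
	unsigned int seq;
	int idx, ret = VM_FAULT_RETRY;

	idx = srcu_read_lock(&vma_srcu);	/* keeps the vma and vma->vm_file alive */

	vma = find_vma_srcu(mm, address);
	if (!vma || address < vma->vm_start)
		goto out;

	seq = ACCESS_ONCE(vma->vm_sequence.sequence);
	if (seq & 1)		/* a modification is in flight */
		goto out;

	/* ... speculatively handle the fault against this vma ... */

	if (vma_is_dead(vma, seq))	/* unlinked or modified since the snapshot */
		goto out;

	ret = 0;
out:
	srcu_read_unlock(&vma_srcu, idx);
	return ret;
}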

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/mm_types.h |    3 +
 kernel/fork.c            |    1 
 mm/init-mm.c             |    1 
 mm/internal.h            |   18 +++++++++
 mm/mmap.c                |   88 ++++++++++++++++++++++++++++++++++++-----------
 5 files changed, 91 insertions(+), 20 deletions(-)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -14,6 +14,7 @@
 #include <linux/uprobes.h>
 #include <linux/page-flags-layout.h>
 #include <linux/seqlock.h>
+#include <linux/srcu.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 
@@ -310,6 +311,7 @@ struct vm_area_struct {
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
 	seqcount_t vm_sequence;
+	struct rcu_head vm_rcu_head;
 };
 
 struct core_thread {
@@ -347,6 +349,7 @@ struct kioctx_table;
 struct mm_struct {
 	struct vm_area_struct *mmap;		/* list of VMAs */
 	struct rb_root mm_rb;
+	seqlock_t mm_seq;
 	u32 vmacache_seqnum;                   /* per-thread vmacache */
 #ifdef CONFIG_MMU
 	unsigned long (*get_unmapped_area) (struct file *filp,
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -553,6 +553,7 @@ static struct mm_struct *mm_init(struct
 	mm->mmap = NULL;
 	mm->mm_rb = RB_ROOT;
 	mm->vmacache_seqnum = 0;
+	seqlock_init(&mm->mm_seq);
 	atomic_set(&mm->mm_users, 1);
 	atomic_set(&mm->mm_count, 1);
 	init_rwsem(&mm->mmap_sem);
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -15,6 +15,7 @@
 
 struct mm_struct init_mm = {
 	.mm_rb		= RB_ROOT,
+	.mm_seq		= __SEQLOCK_UNLOCKED(init_mm.mm_seq),
 	.pgd		= swapper_pg_dir,
 	.mm_users	= ATOMIC_INIT(2),
 	.mm_count	= ATOMIC_INIT(1),
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -14,6 +14,24 @@
 #include <linux/fs.h>
 #include <linux/mm.h>
 
+extern struct srcu_struct vma_srcu;
+
+extern struct vm_area_struct *find_vma_srcu(struct mm_struct *mm, unsigned long addr);
+
+static inline bool vma_is_dead(struct vm_area_struct *vma, unsigned int sequence)
+{
+	int ret = RB_EMPTY_NODE(&vma->vm_rb);
+	unsigned seq = ACCESS_ONCE(vma->vm_sequence.sequence);
+
+	/*
+	 * Matches both the wmb in write_seqlock_{begin,end}() and
+	 * the wmb in vma_rb_erase().
+	 */
+	smp_rmb();
+
+	return ret || seq != sequence;
+}
+
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -247,6 +247,23 @@ void unlink_file_vma(struct vm_area_stru
 	}
 }
 
+DEFINE_SRCU(vma_srcu);
+
+static void __free_vma(struct rcu_head *head)
+{
+	struct vm_area_struct *vma =
+		container_of(head, struct vm_area_struct, vm_rcu_head);
+
+	if (vma->vm_file)
+		fput(vma->vm_file);
+	kmem_cache_free(vm_area_cachep, vma);
+}
+
+static void free_vma(struct vm_area_struct *vma)
+{
+	call_srcu(&vma_srcu, &vma->vm_rcu_head, __free_vma);
+}
+
 /*
  * Close a vm structure and free it, returning the next.
  */
@@ -257,10 +274,8 @@ static struct vm_area_struct *remove_vma
 	might_sleep();
 	if (vma->vm_ops && vma->vm_ops->close)
 		vma->vm_ops->close(vma);
-	if (vma->vm_file)
-		fput(vma->vm_file);
 	mpol_put(vma_policy(vma));
-	kmem_cache_free(vm_area_cachep, vma);
+	free_vma(vma);
 	return next;
 }
 
@@ -468,17 +483,19 @@ static void vma_gap_update(struct vm_are
 	vma_gap_callbacks_propagate(&vma->vm_rb, NULL);
 }
 
-static inline void vma_rb_insert(struct vm_area_struct *vma,
-				 struct rb_root *root)
+static inline void vma_rb_insert(struct vm_area_struct *vma, struct mm_struct *mm)
 {
+	struct rb_root *root = &mm->mm_rb;
+
 	/* All rb_subtree_gap values must be consistent prior to insertion */
 	validate_mm_rb(root, NULL);
 
 	rb_insert_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
 }
 
-static void vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root)
+static void vma_rb_erase(struct vm_area_struct *vma, struct mm_struct *mm)
 {
+	struct rb_root *root = &mm->mm_rb;
 	/*
 	 * All rb_subtree_gap values must be consistent prior to erase,
 	 * with the possible exception of the vma being erased.
@@ -490,7 +507,15 @@ static void vma_rb_erase(struct vm_area_
 	 * so make sure we instantiate it only once with our desired
 	 * augmented rbtree callbacks.
 	 */
+	write_seqlock(&mm->mm_seq);
 	rb_erase_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
+	write_sequnlock(&mm->mm_seq); /* wmb */
+
+	/*
+	 * Ensure the removal is complete before clearing the node.
+	 * Matched by vma_is_dead()/handle_speculative_fault().
+	 */
+	RB_CLEAR_NODE(&vma->vm_rb);
 }
 
 /*
@@ -607,10 +632,12 @@ void __vma_link_rb(struct mm_struct *mm,
 	 * immediately update the gap to the correct value. Finally we
 	 * rebalance the rbtree after all augmented values have been set.
 	 */
+	write_seqlock(&mm->mm_seq);
 	rb_link_node(&vma->vm_rb, rb_parent, rb_link);
 	vma->rb_subtree_gap = 0;
 	vma_gap_update(vma);
-	vma_rb_insert(vma, &mm->mm_rb);
+	vma_rb_insert(vma, mm);
+	write_sequnlock(&mm->mm_seq);
 }
 
 static void __vma_link_file(struct vm_area_struct *vma)
@@ -687,7 +714,7 @@ __vma_unlink(struct mm_struct *mm, struc
 {
 	struct vm_area_struct *next;
 
-	vma_rb_erase(vma, &mm->mm_rb);
+	vma_rb_erase(vma, mm);
 	prev->vm_next = next = vma->vm_next;
 	if (next)
 		next->vm_prev = prev;
@@ -872,15 +899,13 @@ again:			remove_next = 1 + (end > next->
 	}
 
 	if (remove_next) {
-		if (file) {
+		if (file)
 			uprobe_munmap(next, next->vm_start, next->vm_end);
-			fput(file);
-		}
 		if (next->anon_vma)
 			anon_vma_merge(vma, next);
 		mm->map_count--;
 		mpol_put(vma_policy(next));
-		kmem_cache_free(vm_area_cachep, next);
+		free_vma(next);
 		/*
 		 * In mprotect's case 6 (see comments on vma_merge),
 		 * we must remove another next too. It would clutter
@@ -2027,16 +2052,11 @@ get_unmapped_area(struct file *file, uns
 EXPORT_SYMBOL(get_unmapped_area);
 
 /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
-struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
+static struct vm_area_struct *__find_vma(struct mm_struct *mm, unsigned long addr)
 {
 	struct rb_node *rb_node;
 	struct vm_area_struct *vma;
 
-	/* Check the cache first. */
-	vma = vmacache_find(mm, addr);
-	if (likely(vma))
-		return vma;
-
 	rb_node = mm->mm_rb.rb_node;
 	vma = NULL;
 
@@ -2054,13 +2074,41 @@ struct vm_area_struct *find_vma(struct m
 			rb_node = rb_node->rb_right;
 	}
 
+	return vma;
+}
+
+struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
+{
+	struct vm_area_struct *vma;
+
+	/* Check the cache first. */
+	vma = vmacache_find(mm, addr);
+	if (likely(vma))
+		return vma;
+
+	vma = __find_vma(mm, addr);
 	if (vma)
 		vmacache_update(addr, vma);
+
 	return vma;
 }
-
 EXPORT_SYMBOL(find_vma);
 
+struct vm_area_struct *find_vma_srcu(struct mm_struct *mm, unsigned long addr)
+{
+	struct vm_area_struct *vma;
+	unsigned int seq;
+
+	WARN_ON_ONCE(!srcu_read_lock_held(&vma_srcu));
+
+	do {
+		seq = read_seqbegin(&mm->mm_seq);
+		vma = __find_vma(mm, addr);
+	} while (read_seqretry(&mm->mm_seq, seq));
+
+	return vma;
+}
+
 /*
  * Same as find_vma, but also return a pointer to the previous VMA in *pprev.
  */
@@ -2415,7 +2463,7 @@ detach_vmas_to_be_unmapped(struct mm_str
 	insertion_point = (prev ? &prev->vm_next : &mm->mmap);
 	vma->vm_prev = NULL;
 	do {
-		vma_rb_erase(vma, &mm->mm_rb);
+		vma_rb_erase(vma, mm);
 		mm->map_count--;
 		tail_vma = vma;
 		vma = vma->vm_next;



^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC][PATCH 4/6] SRCU free VMAs
@ 2014-10-20 21:56   ` Peter Zijlstra
  0 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-20 21:56 UTC (permalink / raw)
  To: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm, Peter Zijlstra

[-- Attachment #1: peterz-mm-srcu-vma.patch --]
[-- Type: text/plain, Size: 9279 bytes --]

Manage the VMAs with SRCU such that we can do a lockless VMA lookup.

We put the fput(vma->vm_file) in the SRCU callback; this keeps files
valid during speculative faults, which is possible due to the delayed
fput work by Al Viro -- do we need srcu_barrier() in unmount
someplace?

We guard the mm_rb tree with a seqlock (XXX could be a seqcount but
we'd have to disable preemption around the write side in order to make
the retry loop in __read_seqcount_begin() work) such that we can know
whether the rb tree walk was correct. We cannot trust the result of a
lockless tree walk in the face of concurrent tree rotations; we can,
however, trust that such walks terminate -- tree rotations guarantee
the end result is a tree again after all.

Furthermore, we rely on the WMB implied by the
write_seqlock/count_begin() to separate the VMA initialization and the
publishing stores, analogous to the RELEASE in rcu_assign_pointer().
We also rely on the RMB from read_seqretry() to separate the vma load
from further loads like the smp_read_barrier_depends() in regular
RCU.

We must not touch the vmacache while doing SRCU lookups as that is not
properly serialized against changes. We update gap information after
publishing the VMA, but A) we don't use that and B) the seqlock
read side would fix that anyhow.

We clear vma->vm_rb for nodes removed from the vma tree such that we
can easily detect such 'dead' nodes, we rely on the WMB from
write_sequnlock() to separate the tree removal and clearing the node.

Provide find_vma_srcu() which wraps the required magic.

XXX: mmap()/munmap() heavy workloads might suffer from the global lock
in call_srcu() -- this is fixable with a 'better' SRCU implementation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/mm_types.h |    3 +
 kernel/fork.c            |    1 
 mm/init-mm.c             |    1 
 mm/internal.h            |   18 +++++++++
 mm/mmap.c                |   88 ++++++++++++++++++++++++++++++++++++-----------
 5 files changed, 91 insertions(+), 20 deletions(-)

--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -14,6 +14,7 @@
 #include <linux/uprobes.h>
 #include <linux/page-flags-layout.h>
 #include <linux/seqlock.h>
+#include <linux/srcu.h>
 #include <asm/page.h>
 #include <asm/mmu.h>
 
@@ -310,6 +311,7 @@ struct vm_area_struct {
 	struct mempolicy *vm_policy;	/* NUMA policy for the VMA */
 #endif
 	seqcount_t vm_sequence;
+	struct rcu_head vm_rcu_head;
 };
 
 struct core_thread {
@@ -347,6 +349,7 @@ struct kioctx_table;
 struct mm_struct {
 	struct vm_area_struct *mmap;		/* list of VMAs */
 	struct rb_root mm_rb;
+	seqlock_t mm_seq;
 	u32 vmacache_seqnum;                   /* per-thread vmacache */
 #ifdef CONFIG_MMU
 	unsigned long (*get_unmapped_area) (struct file *filp,
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -553,6 +553,7 @@ static struct mm_struct *mm_init(struct
 	mm->mmap = NULL;
 	mm->mm_rb = RB_ROOT;
 	mm->vmacache_seqnum = 0;
+	seqlock_init(&mm->mm_seq);
 	atomic_set(&mm->mm_users, 1);
 	atomic_set(&mm->mm_count, 1);
 	init_rwsem(&mm->mmap_sem);
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -15,6 +15,7 @@
 
 struct mm_struct init_mm = {
 	.mm_rb		= RB_ROOT,
+	.mm_seq		= __SEQLOCK_UNLOCKED(init_mm.mm_seq),
 	.pgd		= swapper_pg_dir,
 	.mm_users	= ATOMIC_INIT(2),
 	.mm_count	= ATOMIC_INIT(1),
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -14,6 +14,24 @@
 #include <linux/fs.h>
 #include <linux/mm.h>
 
+extern struct srcu_struct vma_srcu;
+
+extern struct vm_area_struct *find_vma_srcu(struct mm_struct *mm, unsigned long addr);
+
+static inline bool vma_is_dead(struct vm_area_struct *vma, unsigned int sequence)
+{
+	int ret = RB_EMPTY_NODE(&vma->vm_rb);
+	unsigned seq = ACCESS_ONCE(vma->vm_sequence.sequence);
+
+	/*
+	 * Matches both the wmb in write_seqlock_{begin,end}() and
+	 * the wmb in vma_rb_erase().
+	 */
+	smp_rmb();
+
+	return ret || seq != sequence;
+}
+
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -247,6 +247,23 @@ void unlink_file_vma(struct vm_area_stru
 	}
 }
 
+DEFINE_SRCU(vma_srcu);
+
+static void __free_vma(struct rcu_head *head)
+{
+	struct vm_area_struct *vma =
+		container_of(head, struct vm_area_struct, vm_rcu_head);
+
+	if (vma->vm_file)
+		fput(vma->vm_file);
+	kmem_cache_free(vm_area_cachep, vma);
+}
+
+static void free_vma(struct vm_area_struct *vma)
+{
+	call_srcu(&vma_srcu, &vma->vm_rcu_head, __free_vma);
+}
+
 /*
  * Close a vm structure and free it, returning the next.
  */
@@ -257,10 +274,8 @@ static struct vm_area_struct *remove_vma
 	might_sleep();
 	if (vma->vm_ops && vma->vm_ops->close)
 		vma->vm_ops->close(vma);
-	if (vma->vm_file)
-		fput(vma->vm_file);
 	mpol_put(vma_policy(vma));
-	kmem_cache_free(vm_area_cachep, vma);
+	free_vma(vma);
 	return next;
 }
 
@@ -468,17 +483,19 @@ static void vma_gap_update(struct vm_are
 	vma_gap_callbacks_propagate(&vma->vm_rb, NULL);
 }
 
-static inline void vma_rb_insert(struct vm_area_struct *vma,
-				 struct rb_root *root)
+static inline void vma_rb_insert(struct vm_area_struct *vma, struct mm_struct *mm)
 {
+	struct rb_root *root = &mm->mm_rb;
+
 	/* All rb_subtree_gap values must be consistent prior to insertion */
 	validate_mm_rb(root, NULL);
 
 	rb_insert_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
 }
 
-static void vma_rb_erase(struct vm_area_struct *vma, struct rb_root *root)
+static void vma_rb_erase(struct vm_area_struct *vma, struct mm_struct *mm)
 {
+	struct rb_root *root = &mm->mm_rb;
 	/*
 	 * All rb_subtree_gap values must be consistent prior to erase,
 	 * with the possible exception of the vma being erased.
@@ -490,7 +507,15 @@ static void vma_rb_erase(struct vm_area_
 	 * so make sure we instantiate it only once with our desired
 	 * augmented rbtree callbacks.
 	 */
+	write_seqlock(&mm->mm_seq);
 	rb_erase_augmented(&vma->vm_rb, root, &vma_gap_callbacks);
+	write_sequnlock(&mm->mm_seq); /* wmb */
+
+	/*
+	 * Ensure the removal is complete before clearing the node.
+	 * Matched by vma_is_dead()/handle_speculative_fault().
+	 */
+	RB_CLEAR_NODE(&vma->vm_rb);
 }
 
 /*
@@ -607,10 +632,12 @@ void __vma_link_rb(struct mm_struct *mm,
 	 * immediately update the gap to the correct value. Finally we
 	 * rebalance the rbtree after all augmented values have been set.
 	 */
+	write_seqlock(&mm->mm_seq);
 	rb_link_node(&vma->vm_rb, rb_parent, rb_link);
 	vma->rb_subtree_gap = 0;
 	vma_gap_update(vma);
-	vma_rb_insert(vma, &mm->mm_rb);
+	vma_rb_insert(vma, mm);
+	write_sequnlock(&mm->mm_seq);
 }
 
 static void __vma_link_file(struct vm_area_struct *vma)
@@ -687,7 +714,7 @@ __vma_unlink(struct mm_struct *mm, struc
 {
 	struct vm_area_struct *next;
 
-	vma_rb_erase(vma, &mm->mm_rb);
+	vma_rb_erase(vma, mm);
 	prev->vm_next = next = vma->vm_next;
 	if (next)
 		next->vm_prev = prev;
@@ -872,15 +899,13 @@ again:			remove_next = 1 + (end > next->
 	}
 
 	if (remove_next) {
-		if (file) {
+		if (file)
 			uprobe_munmap(next, next->vm_start, next->vm_end);
-			fput(file);
-		}
 		if (next->anon_vma)
 			anon_vma_merge(vma, next);
 		mm->map_count--;
 		mpol_put(vma_policy(next));
-		kmem_cache_free(vm_area_cachep, next);
+		free_vma(next);
 		/*
 		 * In mprotect's case 6 (see comments on vma_merge),
 		 * we must remove another next too. It would clutter
@@ -2027,16 +2052,11 @@ get_unmapped_area(struct file *file, uns
 EXPORT_SYMBOL(get_unmapped_area);
 
 /* Look up the first VMA which satisfies  addr < vm_end,  NULL if none. */
-struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
+static struct vm_area_struct *__find_vma(struct mm_struct *mm, unsigned long addr)
 {
 	struct rb_node *rb_node;
 	struct vm_area_struct *vma;
 
-	/* Check the cache first. */
-	vma = vmacache_find(mm, addr);
-	if (likely(vma))
-		return vma;
-
 	rb_node = mm->mm_rb.rb_node;
 	vma = NULL;
 
@@ -2054,13 +2074,41 @@ struct vm_area_struct *find_vma(struct m
 			rb_node = rb_node->rb_right;
 	}
 
+	return vma;
+}
+
+struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
+{
+	struct vm_area_struct *vma;
+
+	/* Check the cache first. */
+	vma = vmacache_find(mm, addr);
+	if (likely(vma))
+		return vma;
+
+	vma = __find_vma(mm, addr);
 	if (vma)
 		vmacache_update(addr, vma);
+
 	return vma;
 }
-
 EXPORT_SYMBOL(find_vma);
 
+struct vm_area_struct *find_vma_srcu(struct mm_struct *mm, unsigned long addr)
+{
+	struct vm_area_struct *vma;
+	unsigned int seq;
+
+	WARN_ON_ONCE(!srcu_read_lock_held(&vma_srcu));
+
+	do {
+		seq = read_seqbegin(&mm->mm_seq);
+		vma = __find_vma(mm, addr);
+	} while (read_seqretry(&mm->mm_seq, seq));
+
+	return vma;
+}
+
 /*
  * Same as find_vma, but also return a pointer to the previous VMA in *pprev.
  */
@@ -2415,7 +2463,7 @@ detach_vmas_to_be_unmapped(struct mm_str
 	insertion_point = (prev ? &prev->vm_next : &mm->mmap);
 	vma->vm_prev = NULL;
 	do {
-		vma_rb_erase(vma, &mm->mm_rb);
+		vma_rb_erase(vma, mm);
 		mm->map_count--;
 		tail_vma = vma;
 		vma = vma->vm_next;


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC][PATCH 5/6] mm: Provide speculative fault infrastructure
  2014-10-20 21:56 ` Peter Zijlstra
@ 2014-10-20 21:56   ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-20 21:56 UTC (permalink / raw)
  To: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm, Peter Zijlstra

[-- Attachment #1: peterz-mm-speculative-fault.patch --]
[-- Type: text/plain, Size: 5044 bytes --]

Provide infrastructure to do a speculative fault (not holding
mmap_sem).

The not holding of mmap_sem means we can race against VMA
change/removal and page-table destruction. We use the SRCU VMA freeing
to keep the VMA around. We use the VMA seqcount to detect change
(including unmapping / page-table deletion) and we use gup_fast() style
page-table walking to deal with page-table races.

Once we've obtained the page and are ready to update the PTE, we
validate that the state we started the fault with is still valid; if
not, we fail the fault with VM_FAULT_RETRY, otherwise we update the
PTE and we're done.
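
For reference, the crux of that validation as done by pte_map_lock()
below, in sketch form:

	local_irq_disable();	/* pins the page-tables, gup_fast() style */
	if (vma_is_dead(fe->vma, fe->sequence))
		goto out;	/* VMA changed under us, fail with VM_FAULT_RETRY */
	fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);
	if (vma_is_dead(fe->vma, fe->sequence)) {
		pte_unmap_unlock(fe->pte, fe->ptl);
		goto out;	/* re-check under the PTL failed */
	}
	/* a concurrent zap_pte_range() now blocks on the PTL, so we're safe */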

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 include/linux/mm.h |    2 
 mm/memory.c        |  118 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 119 insertions(+), 1 deletion(-)

--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1162,6 +1162,8 @@ int generic_error_remove_page(struct add
 int invalidate_inode_page(struct page *page);
 
 #ifdef CONFIG_MMU
+extern int handle_speculative_fault(struct mm_struct *mm,
+			unsigned long address, unsigned int flags);
 extern int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long address, unsigned int flags);
 extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2004,12 +2004,40 @@ struct fault_env {
 	pte_t entry;
 	spinlock_t *ptl;
 	unsigned int flags;
+	unsigned int sequence;
 };
 
 static bool pte_map_lock(struct fault_env *fe)
 {
+	bool ret = false;
+
+	if (!(fe->flags & FAULT_FLAG_SPECULATIVE)) {
+		fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);
+		return true;
+	}
+
+	/*
+	 * The first vma_is_dead() guarantees the page-tables are still valid,
+	 * having IRQs disabled ensures they stay around, hence the second
+	 * vma_is_dead() to make sure they are still valid once we've got the
+	 * lock. After that a concurrent zap_pte_range() will block on the PTL
+	 * and thus we're safe.
+	 */
+	local_irq_disable();
+	if (vma_is_dead(fe->vma, fe->sequence))
+		goto out;
+
 	fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);
-	return true;
+
+	if (vma_is_dead(fe->vma, fe->sequence)) {
+		pte_unmap_unlock(fe->pte, fe->ptl);
+		goto out;
+	}
+
+	ret = true;
+out:
+	local_irq_enable();
+	return ret;
 }
 
 /*
@@ -2432,6 +2460,7 @@ static int do_swap_page(struct fault_env
 	entry = pte_to_swp_entry(fe->entry);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
+			/* XXX fe->pmd might be dead */
 			migration_entry_wait(fe->mm, fe->pmd, fe->address);
 		} else if (is_hwpoison_entry(entry)) {
 			ret = VM_FAULT_HWPOISON;
@@ -3357,6 +3386,93 @@ static int __handle_mm_fault(struct mm_s
 	return handle_pte_fault(&fe);
 }
 
+int handle_speculative_fault(struct mm_struct *mm, unsigned long address, unsigned int flags)
+{
+	struct fault_env fe = {
+		.mm = mm,
+		.address = address,
+		.flags = flags | FAULT_FLAG_SPECULATIVE,
+	};
+	pgd_t *pgd;
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	int dead, seq, idx, ret = VM_FAULT_RETRY;
+	struct vm_area_struct *vma;
+
+	idx = srcu_read_lock(&vma_srcu);
+	vma = find_vma_srcu(mm, address);
+	if (!vma)
+		goto unlock;
+
+	/*
+	 * Validate the VMA found by the lockless lookup.
+	 */
+	dead = RB_EMPTY_NODE(&vma->vm_rb);
+	seq = raw_read_seqcount(&vma->vm_sequence); /* rmb <-> seqlock,vma_rb_erase() */
+	if ((seq & 1) || dead) /* XXX wait for !&1 instead? */
+		goto unlock;
+
+	if (address < vma->vm_start || vma->vm_end <= address)
+		goto unlock;
+
+	/*
+	 * We need to re-validate the VMA after checking the bounds, otherwise
+	 * we might have a false positive on the bounds.
+	 */
+	if (read_seqcount_retry(&vma->vm_sequence, seq))
+		goto unlock;
+
+	/*
+	 * Do a speculative lookup of the PTE entry.
+	 */
+	local_irq_disable();
+	pgd = pgd_offset(mm, address);
+	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
+		goto out_walk;
+
+	pud = pud_offset(pgd, address);
+	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
+		goto out_walk;
+
+	pmd = pmd_offset(pud, address);
+	if (pmd_none(*pmd) || unlikely(pmd_bad(*pmd)))
+		goto out_walk;
+
+	/*
+	 * The above does not allocate/instantiate page-tables because doing so
+	 * would lead to the possibility of instantiating page-tables after
+	 * free_pgtables() -- and consequently leaking them.
+	 *
+	 * The result is that we take at least one !speculative fault per PMD
+	 * in order to instantiate it.
+	 *
+	 * XXX try and fix that.. should be possible somehow.
+	 */
+
+	if (pmd_huge(*pmd)) /* XXX no huge support */
+		goto out_walk;
+
+	fe.vma = vma;
+	fe.pmd = pmd;
+	fe.sequence = seq;
+
+	pte = pte_offset_map(pmd, address);
+	fe.entry = ACCESS_ONCE(pte); /* XXX gup_get_pte() */
+	pte_unmap(pte);
+	local_irq_enable();
+
+	ret = handle_pte_fault(&fe);
+
+unlock:
+	srcu_read_unlock(&vma_srcu, idx);
+	return ret;
+
+out_walk:
+	local_irq_enable();
+	goto unlock;
+}
+
 /*
  * By the time we get here, we already hold the mm semaphore
  *



^ permalink raw reply	[flat|nested] 88+ messages in thread

* [RFC][PATCH 6/6] mm,x86: Add speculative pagefault handling
  2014-10-20 21:56 ` Peter Zijlstra
@ 2014-10-20 21:56   ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-20 21:56 UTC (permalink / raw)
  To: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm, Peter Zijlstra

[-- Attachment #1: peterz-mm-x86.patch --]
[-- Type: text/plain, Size: 3071 bytes --]

Try a speculative fault before acquiring mmap_sem; if it returns
VM_FAULT_RETRY, fall back to acquiring mmap_sem and doing the
traditional fault.
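
In sketch form the __do_page_fault() fast path becomes (full hunk
below):

	if (error_code & PF_USER) {
		fault = handle_speculative_fault(mm, address,
					flags & ~FAULT_FLAG_ALLOW_RETRY);
		if (fault & VM_FAULT_RETRY)
			goto retry;	/* take mmap_sem, do the regular fault */
		goto done;
	}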

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/x86/mm/fault.c |   35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -844,11 +844,8 @@ do_sigbus(struct pt_regs *regs, unsigned
 	  unsigned int fault)
 {
 	struct task_struct *tsk = current;
-	struct mm_struct *mm = tsk->mm;
 	int code = BUS_ADRERR;
 
-	up_read(&mm->mmap_sem);
-
 	/* Kernel mode? Handle exceptions or die: */
 	if (!(error_code & PF_USER)) {
 		no_context(regs, error_code, address, SIGBUS, BUS_ADRERR);
@@ -879,7 +876,6 @@ mm_fault_error(struct pt_regs *regs, uns
 	       unsigned long address, unsigned int fault)
 {
 	if (fatal_signal_pending(current) && !(error_code & PF_USER)) {
-		up_read(&current->mm->mmap_sem);
 		no_context(regs, error_code, address, 0, 0);
 		return;
 	}
@@ -887,14 +883,11 @@ mm_fault_error(struct pt_regs *regs, uns
 	if (fault & VM_FAULT_OOM) {
 		/* Kernel mode? Handle exceptions or die: */
 		if (!(error_code & PF_USER)) {
-			up_read(&current->mm->mmap_sem);
 			no_context(regs, error_code, address,
 				   SIGSEGV, SEGV_MAPERR);
 			return;
 		}
 
-		up_read(&current->mm->mmap_sem);
-
 		/*
 		 * We ran out of memory, call the OOM killer, and return the
 		 * userspace (which will retry the fault, or kill us if we got
@@ -1141,6 +1134,16 @@ __do_page_fault(struct pt_regs *regs, un
 	if (error_code & PF_WRITE)
 		flags |= FAULT_FLAG_WRITE;
 
+	if (error_code & PF_USER) {
+		fault = handle_speculative_fault(mm, address,
+					flags & ~FAULT_FLAG_ALLOW_RETRY);
+
+		if (fault & VM_FAULT_RETRY)
+			goto retry;
+
+		goto done;
+	}
+
 	/*
 	 * When running in the kernel we expect faults to occur only to
 	 * addresses in user space.  All other faults represent errors in
@@ -1225,9 +1228,15 @@ __do_page_fault(struct pt_regs *regs, un
 	 * signal first. We do not need to release the mmap_sem because it
 	 * would already be released in __lock_page_or_retry in mm/filemap.c.
 	 */
-	if (unlikely((fault & VM_FAULT_RETRY) && fatal_signal_pending(current)))
-		return;
+	if (unlikely(fault & VM_FAULT_RETRY)) {
+		if (fatal_signal_pending(current))
+			return;
+
+		goto done;
+	}
 
+	up_read(&mm->mmap_sem);
+done:
 	if (unlikely(fault & VM_FAULT_ERROR)) {
 		mm_fault_error(regs, error_code, address, fault);
 		return;
@@ -1249,8 +1258,10 @@ __do_page_fault(struct pt_regs *regs, un
 				      regs, address);
 		}
 		if (fault & VM_FAULT_RETRY) {
-			/* Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk
-			 * of starvation. */
+			/*
+			 * Clear FAULT_FLAG_ALLOW_RETRY to avoid any risk of
+			 * starvation.
+			 */
 			flags &= ~FAULT_FLAG_ALLOW_RETRY;
 			flags |= FAULT_FLAG_TRIED;
 			goto retry;
@@ -1258,8 +1269,6 @@ __do_page_fault(struct pt_regs *regs, un
 	}
 
 	check_v8086_mode(regs, address, tsk);
-
-	up_read(&mm->mmap_sem);
 }
 NOKPROBE_SYMBOL(__do_page_fault);
 



^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-20 21:56   ` Peter Zijlstra
@ 2014-10-20 23:41     ` Linus Torvalds
  -1 siblings, 0 replies; 88+ messages in thread
From: Linus Torvalds @ 2014-10-20 23:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul McKenney, Thomas Gleixner, Andrew Morton, Rik van Riel,
	Mel Gorman, Oleg Nesterov, Ingo Molnar, Minchan Kim,
	KAMEZAWA Hiroyuki, Al Viro, Lai Jiangshan, Davidlohr Bueso,
	Linux Kernel Mailing List, linux-mm

On Mon, Oct 20, 2014 at 2:56 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> Manage the VMAs with SRCU such that we can do a lockless VMA lookup.

Can you explain why srcu, and not plain regular rcu?

Especially as you then *note* some of the problems srcu can have.
Making it regular rcu would also seem to make it possible to make the
seqlock be just a seqcount, no?

                  Linus

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-20 21:56 ` Peter Zijlstra
@ 2014-10-21  0:07   ` Andy Lutomirski
  -1 siblings, 0 replies; 88+ messages in thread
From: Andy Lutomirski @ 2014-10-21  0:07 UTC (permalink / raw)
  To: Peter Zijlstra, torvalds, paulmck, tglx, akpm, riel, mgorman,
	oleg, mingo, minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm

On 10/20/2014 02:56 PM, Peter Zijlstra wrote:
> Hi,
> 
> I figured I'd give my 2010 speculative fault series another spin:
> 
>   https://lkml.org/lkml/2010/1/4/257
> 
> Since then I think many of the outstanding issues have changed sufficiently to
> warrant another go. In particular Al Viro's delayed fput seems to have made it
> entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> under the PTL.
> 
> The code needs way more attention but builds a kernel and runs the
> micro-benchmark so I figured I'd post it before sinking more time into it.
> 
> I realize the micro-bench is about as good as it gets for this series and not
> very realistic otherwise, but I think it does show the potential benefit the
> approach has.

Does this mean that an entire fault can complete without ever taking
mmap_sem at all?  If so, that's a *huge* win.

I'm a bit concerned about drivers that assume that the vma is unchanged
during .fault processing.  In particular, is there a race between .close
and .fault?  Would it make sense to add a per-vma rw lock and hold it
during vma modification and .fault calls?

--Andy

> 
> (patches go against .18-rc1+)
> 
> ---
> 
> Using Kamezawa's multi-fault micro-bench from: https://lkml.org/lkml/2010/1/6/28
> 
> My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput:
> 
> PRE:
> 
> root@ivb-ep:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 20
> 
>  Performance counter stats for './multi-fault 20' (5 runs):
> 
>        149,441,555      page-faults                  ( +-  1.25% )
>      2,153,651,828      cache-misses                 ( +-  1.09% )
> 
>       60.003082014 seconds time elapsed              ( +-  0.00% )
> 
> POST:
> 
> root@ivb-ep:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 20
> 
>  Performance counter stats for './multi-fault 20' (5 runs):
> 
>        236,442,626      page-faults                  ( +-  0.08% )
>      2,796,353,939      cache-misses                 ( +-  1.01% )
> 
>       60.002792431 seconds time elapsed              ( +-  0.00% )
> 
> 
> My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput:
> 
> PRE:
> 
> root@ivb-ex:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 60
> 
>  Performance counter stats for './multi-fault 60' (5 runs):
> 
>        105,789,078      page-faults                 ( +-  2.24% )
>      1,314,072,090      cache-misses                ( +-  1.17% )
> 
>       60.009243533 seconds time elapsed             ( +-  0.00% )
> 
> POST:
> 
> root@ivb-ex:~# perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault 60
> 
>  Performance counter stats for './multi-fault 60' (5 runs):
> 
>        187,751,767      page-faults                 ( +-  2.24% )
>      1,792,758,664      cache-misses                ( +-  2.30% )
> 
>       60.011611579 seconds time elapsed             ( +-  0.00% )
> 
> (I've not yet looked at why the EX sucks chunks compared to the EP box, I
>  suspect we contend on other locks, but it could be anything.)
> 
> ---
> 
>  arch/x86/mm/fault.c      |  35 ++-
>  include/linux/mm.h       |  19 +-
>  include/linux/mm_types.h |   5 +
>  kernel/fork.c            |   1 +
>  mm/init-mm.c             |   1 +
>  mm/internal.h            |  18 ++
>  mm/memory.c              | 672 ++++++++++++++++++++++++++++-------------------
>  mm/mmap.c                | 101 +++++--
>  8 files changed, 544 insertions(+), 308 deletions(-)
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-20 23:41     ` Linus Torvalds
@ 2014-10-21  8:07       ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-21  8:07 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Thomas Gleixner, Andrew Morton, Rik van Riel,
	Mel Gorman, Oleg Nesterov, Ingo Molnar, Minchan Kim,
	KAMEZAWA Hiroyuki, Al Viro, Lai Jiangshan, Davidlohr Bueso,
	Linux Kernel Mailing List, linux-mm

On Mon, Oct 20, 2014 at 04:41:45PM -0700, Linus Torvalds wrote:
> On Mon, Oct 20, 2014 at 2:56 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Manage the VMAs with SRCU such that we can do a lockless VMA lookup.
> 
> Can you explain why srcu, and not plain regular rcu?
> 
> Especially as you then *note* some of the problems srcu can have.
> Making it regular rcu would also seem to make it possible to make the
> seqlock be just a seqcount, no?

Because we need to hold onto the RCU read side lock across the entire
fault, which can involve IO and all kinds of other blocking ops.
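
To make that concrete, the read side (cf. handle_speculative_fault() in
patch 5/6) brackets the whole fault, roughly:

	idx = srcu_read_lock(&vma_srcu);
	vma = find_vma_srcu(mm, address);
	...
	ret = handle_pte_fault(&fe);	/* may block: swap-in, page-cache IO, .. */
	srcu_read_unlock(&vma_srcu, idx);

rcu_read_lock() doesn't allow sleeping in that window; SRCU does.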

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-21  0:07   ` Andy Lutomirski
@ 2014-10-21  8:11     ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-21  8:11 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Mon, Oct 20, 2014 at 05:07:02PM -0700, Andy Lutomirski wrote:
> On 10/20/2014 02:56 PM, Peter Zijlstra wrote:
> > Hi,
> > 
> > I figured I'd give my 2010 speculative fault series another spin:
> > 
> >   https://lkml.org/lkml/2010/1/4/257
> > 
> > Since then I think many of the outstanding issues have changed sufficiently to
> > warrant another go. In particular Al Viro's delayed fput seems to have made it
> > entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> > with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> > under the PTL.
> > 
> > The code needs way more attention but builds a kernel and runs the
> > micro-benchmark so I figured I'd post it before sinking more time into it.
> > 
> > I realize the micro-bench is about as good as it gets for this series and not
> > very realistic otherwise, but I think it does show the potential benefit the
> > approach has.
> 
> Does this mean that an entire fault can complete without ever taking
> mmap_sem at all?  If so, that's a *huge* win.

Yep.

> I'm a bit concerned about drivers that assume that the vma is unchanged
> during .fault processing.  In particular, is there a race between .close
> and .fault?  Would it make sense to add a per-vma rw lock and hold it
> during vma modification and .fault calls?

VMA granularity contention would be about as bad as mmap_sem for many
workloads. But yes, that is one of the things we need to look at; I was
_hoping_ that holding the file open would sort out most of these
problems, but I'm sure there's plenty of 'interesting' cruft left.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-20 23:41     ` Linus Torvalds
@ 2014-10-21  8:22       ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-21  8:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Paul McKenney, Thomas Gleixner, Andrew Morton, Rik van Riel,
	Mel Gorman, Oleg Nesterov, Ingo Molnar, Minchan Kim,
	KAMEZAWA Hiroyuki, Al Viro, Lai Jiangshan, Davidlohr Bueso,
	Linux Kernel Mailing List, linux-mm

On Mon, Oct 20, 2014 at 04:41:45PM -0700, Linus Torvalds wrote:
> On Mon, Oct 20, 2014 at 2:56 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > Manage the VMAs with SRCU such that we can do a lockless VMA lookup.
> 
> Can you explain why srcu, and not plain regular rcu?
> 
> Especially as you then *note* some of the problems srcu can have.
> Making it regular rcu would also seem to make it possible to make the
> seqlock be just a seqcount, no?

Ah, the reason I did the seqlock is that the read side will spin-wait
for &1 to go away. If the write side is preemptible, that's horrid. I
used seqlock because that takes a lock (and thus disables preemption) on
the write side, but I could equally have done:

	preempt_disable();
	write_seqcount_begin();

	...

	write_seqcount_end();
	preempt_enable();

The lock is indeed superfluous, since we're already fully serialized by
mmap_sem in this path.

Using regular RCU isn't sufficient, because of PREEMPT_RCU.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 5/6] mm: Provide speculative fault infrastructure
  2014-10-20 21:56   ` Peter Zijlstra
@ 2014-10-21  8:35     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 88+ messages in thread
From: Kirill A. Shutemov @ 2014-10-21  8:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Mon, Oct 20, 2014 at 11:56:38PM +0200, Peter Zijlstra wrote:
> Provide infrastructure to do a speculative fault (not holding
> mmap_sem).
> 
> The not holding of mmap_sem means we can race against VMA
> change/removal and page-table destruction. We use the SRCU VMA freeing
> to keep the VMA around. We use the VMA seqcount to detect change
> (including umapping / page-table deletion) and we use gup_fast() style
> page-table walking to deal with page-table races.
> 
> Once we've obtained the page and are ready to update the PTE, we
> validate if the state we started the fault with is still valid, if
> not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the
> PTE and we're done.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/mm.h |    2 
>  mm/memory.c        |  118 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 119 insertions(+), 1 deletion(-)
> 
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1162,6 +1162,8 @@ int generic_error_remove_page(struct add
>  int invalidate_inode_page(struct page *page);
>  
>  #ifdef CONFIG_MMU
> +extern int handle_speculative_fault(struct mm_struct *mm,
> +			unsigned long address, unsigned int flags);
>  extern int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  			unsigned long address, unsigned int flags);
>  extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2004,12 +2004,40 @@ struct fault_env {
>  	pte_t entry;
>  	spinlock_t *ptl;
>  	unsigned int flags;
> +	unsigned int sequence;
>  };
>  
>  static bool pte_map_lock(struct fault_env *fe)
>  {
> +	bool ret = false;
> +
> +	if (!(fe->flags & FAULT_FLAG_SPECULATIVE)) {
> +		fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);
> +		return true;
> +	}
> +
> +	/*
> +	 * The first vma_is_dead() guarantees the page-tables are still valid,
> +	 * having IRQs disabled ensures they stay around, hence the second
> +	 * vma_is_dead() to make sure they are still valid once we've got the
> +	 * lock. After that a concurrent zap_pte_range() will block on the PTL
> +	 * and thus we're safe.
> +	 */
> +	local_irq_disable();
> +	if (vma_is_dead(fe->vma, fe->sequence))
> +		goto out;
> +
>  	fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);
> -	return true;
> +
> +	if (vma_is_dead(fe->vma, fe->sequence)) {
> +		pte_unmap_unlock(fe->pte, fe->ptl);
> +		goto out;
> +	}
> +
> +	ret = true;
> +out:
> +	local_irq_enable();
> +	return ret;
>  }
>  
>  /*
> @@ -2432,6 +2460,7 @@ static int do_swap_page(struct fault_env
>  	entry = pte_to_swp_entry(fe->entry);
>  	if (unlikely(non_swap_entry(entry))) {
>  		if (is_migration_entry(entry)) {
> +			/* XXX fe->pmd might be dead */
>  			migration_entry_wait(fe->mm, fe->pmd, fe->address);
>  		} else if (is_hwpoison_entry(entry)) {
>  			ret = VM_FAULT_HWPOISON;
> @@ -3357,6 +3386,93 @@ static int __handle_mm_fault(struct mm_s
>  	return handle_pte_fault(&fe);
>  }
>  
> +int handle_speculative_fault(struct mm_struct *mm, unsigned long address, unsigned int flags)
> +{
> +	struct fault_env fe = {
> +		.mm = mm,
> +		.address = address,
> +		.flags = flags | FAULT_FLAG_SPECULATIVE,
> +	};
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	int dead, seq, idx, ret = VM_FAULT_RETRY;
> +	struct vm_area_struct *vma;
> +
> +	idx = srcu_read_lock(&vma_srcu);
> +	vma = find_vma_srcu(mm, address);
> +	if (!vma)
> +		goto unlock;
> +
> +	/*
> +	 * Validate the VMA found by the lockless lookup.
> +	 */
> +	dead = RB_EMPTY_NODE(&vma->vm_rb);
> +	seq = raw_read_seqcount(&vma->vm_sequence); /* rmb <-> seqlock,vma_rb_erase() */
> +	if ((seq & 1) || dead) /* XXX wait for !&1 instead? */
> +		goto unlock;
> +
> +	if (address < vma->vm_start || vma->vm_end <= address)
> +		goto unlock;
> +
> +	/*
> +	 * We need to re-validate the VMA after checking the bounds, otherwise
> +	 * we might have a false positive on the bounds.
> +	 */
> +	if (read_seqcount_retry(&vma->vm_sequence, seq))
> +		goto unlock;
> +
> +	/*
> +	 * Do a speculative lookup of the PTE entry.
> +	 */
> +	local_irq_disable();
> +	pgd = pgd_offset(mm, address);
> +	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
> +		goto out_walk;
> +
> +	pud = pud_offset(pgd, address);
> +	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
> +		goto out_walk;

pud_huge() too. Or filter out VM_HUGETLB altogether.
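
I.e. mirror the pmd_huge() check already in the walk; something like
this (untested sketch):

	pud = pud_offset(pgd, address);
	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
		goto out_walk;

	if (pud_huge(*pud))	/* XXX no gigantic page support either */
		goto out_walk;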

BTW, what keeps the mm_struct around? It seems we don't take a reference
during the page fault.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 5/6] mm: Provide speculative fault infrastructure
@ 2014-10-21  8:35     ` Kirill A. Shutemov
  0 siblings, 0 replies; 88+ messages in thread
From: Kirill A. Shutemov @ 2014-10-21  8:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Mon, Oct 20, 2014 at 11:56:38PM +0200, Peter Zijlstra wrote:
> Provide infrastructure to do a speculative fault (not holding
> mmap_sem).
> 
> The not holding of mmap_sem means we can race against VMA
> change/removal and page-table destruction. We use the SRCU VMA freeing
> to keep the VMA around. We use the VMA seqcount to detect change
> (including umapping / page-table deletion) and we use gup_fast() style
> page-table walking to deal with page-table races.
> 
> Once we've obtained the page and are ready to update the PTE, we
> validate if the state we started the fault with is still valid, if
> not, we'll fail the fault with VM_FAULT_RETRY, otherwise we update the
> PTE and we're done.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  include/linux/mm.h |    2 
>  mm/memory.c        |  118 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 119 insertions(+), 1 deletion(-)
> 
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1162,6 +1162,8 @@ int generic_error_remove_page(struct add
>  int invalidate_inode_page(struct page *page);
>  
>  #ifdef CONFIG_MMU
> +extern int handle_speculative_fault(struct mm_struct *mm,
> +			unsigned long address, unsigned int flags);
>  extern int handle_mm_fault(struct mm_struct *mm, struct vm_area_struct *vma,
>  			unsigned long address, unsigned int flags);
>  extern int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2004,12 +2004,40 @@ struct fault_env {
>  	pte_t entry;
>  	spinlock_t *ptl;
>  	unsigned int flags;
> +	unsigned int sequence;
>  };
>  
>  static bool pte_map_lock(struct fault_env *fe)
>  {
> +	bool ret = false;
> +
> +	if (!(fe->flags & FAULT_FLAG_SPECULATIVE)) {
> +		fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);
> +		return true;
> +	}
> +
> +	/*
> +	 * The first vma_is_dead() guarantees the page-tables are still valid,
> +	 * having IRQs disabled ensures they stay around, hence the second
> +	 * vma_is_dead() to make sure they are still valid once we've got the
> +	 * lock. After that a concurrent zap_pte_range() will block on the PTL
> +	 * and thus we're safe.
> +	 */
> +	local_irq_disable();
> +	if (vma_is_dead(fe->vma, fe->sequence))
> +		goto out;
> +
>  	fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);
> -	return true;
> +
> +	if (vma_is_dead(fe->vma, fe->sequence)) {
> +		pte_unmap_unlock(fe->pte, fe->ptl);
> +		goto out;
> +	}
> +
> +	ret = true;
> +out:
> +	local_irq_enable();
> +	return ret;
>  }
>  
>  /*
> @@ -2432,6 +2460,7 @@ static int do_swap_page(struct fault_env
>  	entry = pte_to_swp_entry(fe->entry);
>  	if (unlikely(non_swap_entry(entry))) {
>  		if (is_migration_entry(entry)) {
> +			/* XXX fe->pmd might be dead */
>  			migration_entry_wait(fe->mm, fe->pmd, fe->address);
>  		} else if (is_hwpoison_entry(entry)) {
>  			ret = VM_FAULT_HWPOISON;
> @@ -3357,6 +3386,93 @@ static int __handle_mm_fault(struct mm_s
>  	return handle_pte_fault(&fe);
>  }
>  
> +int handle_speculative_fault(struct mm_struct *mm, unsigned long address, unsigned int flags)
> +{
> +	struct fault_env fe = {
> +		.mm = mm,
> +		.address = address,
> +		.flags = flags | FAULT_FLAG_SPECULATIVE,
> +	};
> +	pgd_t *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte;
> +	int dead, seq, idx, ret = VM_FAULT_RETRY;
> +	struct vm_area_struct *vma;
> +
> +	idx = srcu_read_lock(&vma_srcu);
> +	vma = find_vma_srcu(mm, address);
> +	if (!vma)
> +		goto unlock;
> +
> +	/*
> +	 * Validate the VMA found by the lockless lookup.
> +	 */
> +	dead = RB_EMPTY_NODE(&vma->vm_rb);
> +	seq = raw_read_seqcount(&vma->vm_sequence); /* rmb <-> seqlock,vma_rb_erase() */
> +	if ((seq & 1) || dead) /* XXX wait for !&1 instead? */
> +		goto unlock;
> +
> +	if (address < vma->vm_start || vma->vm_end <= address)
> +		goto unlock;
> +
> +	/*
> +	 * We need to re-validate the VMA after checking the bounds, otherwise
> +	 * we might have a false positive on the bounds.
> +	 */
> +	if (read_seqcount_retry(&vma->vm_sequence, seq))
> +		goto unlock;
> +
> +	/*
> +	 * Do a speculative lookup of the PTE entry.
> +	 */
> +	local_irq_disable();
> +	pgd = pgd_offset(mm, address);
> +	if (pgd_none(*pgd) || unlikely(pgd_bad(*pgd)))
> +		goto out_walk;
> +
> +	pud = pud_offset(pgd, address);
> +	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
> +		goto out_walk;

pud_huge() too. Or filter out VM_HUGETLB altogether.
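
(Illustrative only, not part of the posted series: the kind of extra filtering
being asked for, slotted into the walk quoted above, using the generic
pud_huge()/pmd_trans_huge()/pmd_huge() helpers.)

	pud = pud_offset(pgd, address);
	if (pud_none(*pud) || unlikely(pud_bad(*pud)))
		goto out_walk;
	if (pud_huge(*pud))		/* gigantic hugetlbfs page */
		goto out_walk;		/* punt to the locked slow path */

	pmd = pmd_offset(pud, address);
	if (pmd_none(*pmd) || pmd_trans_huge(*pmd) || pmd_huge(*pmd))
		goto out_walk;		/* THP / hugetlbfs not handled speculatively */
	if (unlikely(pmd_bad(*pmd)))
		goto out_walk;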

BTW, what keeps mm_struct around? It seems we don't take a reference during
the page fault.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 5/6] mm: Provide speculative fault infrastructure
  2014-10-21  8:35     ` Kirill A. Shutemov
@ 2014-10-21 10:41       ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-21 10:41 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Tue, Oct 21, 2014 at 11:35:48AM +0300, Kirill A. Shutemov wrote:
> pud_huge() too. Or filter out VM_HUGETLB altogether.

Oh right, giga pages, all this newfangled stuff ;-) But yes, I suppose
we can exclude hugetlbfs; we should arguably make the THP muck work,
though.

> BTW, what keeps mm_struct around? It seems we don't take a reference during
> the page fault.

Last I checked, tasks hold a reference on their own mm, and seeing as this
all runs in task context, the mm should be pretty safe.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-20 21:56 ` Peter Zijlstra
@ 2014-10-21 16:23   ` Ingo Molnar
  -1 siblings, 0 replies; 88+ messages in thread
From: Ingo Molnar @ 2014-10-21 16:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm


* Peter Zijlstra <peterz@infradead.org> wrote:

> My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput:
> 
> PRE:
>        149,441,555      page-faults                  ( +-  1.25% )
>
> POST:
>        236,442,626      page-faults                  ( +-  0.08% )

> My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput:
> 
> PRE:
>        105,789,078      page-faults                 ( +-  2.24% )
>
> POST:
>        187,751,767      page-faults                 ( +-  2.24% )

I guess the 'PRE' and 'POST' numbers should be flipped around?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-21 16:23   ` Ingo Molnar
@ 2014-10-21 17:09     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 88+ messages in thread
From: Kirill A. Shutemov @ 2014-10-21 17:09 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, torvalds, paulmck, tglx, akpm, riel, mgorman,
	oleg, mingo, minchan, kamezawa.hiroyu, viro, laijs, dave,
	linux-kernel, linux-mm

On Tue, Oct 21, 2014 at 06:23:40PM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput:
> > 
> > PRE:
> >        149,441,555      page-faults                  ( +-  1.25% )
> >
> > POST:
> >        236,442,626      page-faults                  ( +-  0.08% )
> 
> > My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput:
> > 
> > PRE:
> >        105,789,078      page-faults                 ( +-  2.24% )
> >
> > POST:
> >        187,751,767      page-faults                 ( +-  2.24% )
> 
> I guess the 'PRE' and 'POST' numbers should be flipped around?

I think it's faults per second.

It would be interesting to see if the patchset affects the non-contended
case, like a single-threaded workload.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-21 16:23   ` Ingo Molnar
@ 2014-10-21 17:25     ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-21 17:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Tue, Oct 21, 2014 at 06:23:40PM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput:
> > 
> > PRE:
> >        149,441,555      page-faults                  ( +-  1.25% )
> >
> > POST:
> >        236,442,626      page-faults                  ( +-  0.08% )
> 
> > My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput:
> > 
> > PRE:
> >        105,789,078      page-faults                 ( +-  2.24% )
> >
> > POST:
> >        187,751,767      page-faults                 ( +-  2.24% )
> 
> I guess the 'PRE' and 'POST' numbers should be flipped around?

Nope, it's the number of page-faults serviced in a fixed amount of time
(60 seconds), therefore higher is better.
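
(Sanity check from the numbers quoted above: 236,442,626 / 60s ~ 3.9M faults/s
versus 149,441,555 / 60s ~ 2.5M faults/s on the EP, i.e. roughly 58% more;
likewise 187,751,767 vs 105,789,078 on the EX is roughly 78% more.)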

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-21 17:09     ` Kirill A. Shutemov
@ 2014-10-21 17:56       ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-21 17:56 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Ingo Molnar, torvalds, paulmck, tglx, akpm, riel, mgorman, oleg,
	mingo, minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Tue, Oct 21, 2014 at 08:09:48PM +0300, Kirill A. Shutemov wrote:
> It would be interesting to see if the patchset affects the non-contended
> case, like a single-threaded workload.

It does, and not in a good way; I'll have to look at that... :/

 Performance counter stats for './multi-fault 1' (5 runs):

        73,860,251      page-faults                                                   ( +-  0.28% )
            40,914      cache-misses                                                  ( +- 41.26% )

      60.001484913 seconds time elapsed                                          ( +-  0.00% )


 Performance counter stats for './multi-fault 1' (5 runs):

        70,700,838      page-faults                                                   ( +-  0.03% )
            31,466      cache-misses                                                  ( +-  8.62% )

      60.001753906 seconds time elapsed                                          ( +-  0.00% )

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 5/6] mm: Provide speculative fault infrastructure
  2014-10-20 21:56   ` Peter Zijlstra
@ 2014-10-21 19:00     ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-21 19:00 UTC (permalink / raw)
  To: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave
  Cc: linux-kernel, linux-mm

On Mon, Oct 20, 2014 at 11:56:38PM +0200, Peter Zijlstra wrote:
>  static bool pte_map_lock(struct fault_env *fe)
>  {
> +	bool ret = false;
> +
> +	if (!(fe->flags & FAULT_FLAG_SPECULATIVE)) {
> +		fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);
> +		return true;
> +	}
> +
> +	/*
> +	 * The first vma_is_dead() guarantees the page-tables are still valid,
> +	 * having IRQs disabled ensures they stay around, hence the second
> +	 * vma_is_dead() to make sure they are still valid once we've got the
> +	 * lock. After that a concurrent zap_pte_range() will block on the PTL
> +	 * and thus we're safe.
> +	 */
> +	local_irq_disable();
> +	if (vma_is_dead(fe->vma, fe->sequence))
> +		goto out;
> +
>  	fe->pte = pte_offset_map_lock(fe->mm, fe->pmd, fe->address, &fe->ptl);

Yeah, so this deadlocks just fine; I found we still do TLB flushes while
holding the PTL. Bugger that, the alternative is to either force everybody
to RCU-free page-tables or put back the ugly code :/
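
Spelled out, the lockup is the classic IPI-versus-interrupts-off one
(assuming an IPI-based TLB flush implementation):

		CPU0					CPU1
  [ takes PTL for the same pte page ]
						pte_map_lock()
						  local_irq_disable()
						  [ spins on that PTL ]
  [ does a TLB flush while holding the PTL ]
    [ sends IPI, waits for all CPUs to ack ]
    [ CPU1 has IRQs off and never acks ]
  -> neither side can make progress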

A well..

> +
> +	if (vma_is_dead(fe->vma, fe->sequence)) {
> +		pte_unmap_unlock(fe->pte, fe->ptl);
> +		goto out;
> +	}
> +
> +	ret = true;
> +out:
> +	local_irq_enable();
> +	return ret;
>  }

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-20 21:56 ` Peter Zijlstra
@ 2014-10-22  7:34   ` Davidlohr Bueso
  -1 siblings, 0 replies; 88+ messages in thread
From: Davidlohr Bueso @ 2014-10-22  7:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, linux-kernel, linux-mm

On Mon, 2014-10-20 at 23:56 +0200, Peter Zijlstra wrote:
> Hi,
> 
> I figured I'd give my 2010 speculative fault series another spin:
> 
>   https://lkml.org/lkml/2010/1/4/257
> 
> Since then I think many of the outstanding issues have changed sufficiently to
> warrant another go. In particular Al Viro's delayed fput seems to have made it
> entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> under the PTL.
> 
> The code needs way more attention but builds a kernel and runs the
> micro-benchmark so I figured I'd post it before sinking more time into it.
> 
> I realize the micro-bench is about as good as it gets for this series and not
> very realistic otherwise, but I think it does show the potential benefit the
> approach has.
> 
> (patches go against .18-rc1+)

I think patch 2/6 is borken:

error: patch failed: mm/memory.c:2025
error: mm/memory.c: patch does not apply

and related, as you mention, I would very much welcome having the
introduction of 'struct fault_env' as a separate cleanup patch. May I
suggest renaming it to fault_cxt?

Thanks,
Davidlohr


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 3/6] mm: VMA sequence count
  2014-10-20 21:56   ` Peter Zijlstra
@ 2014-10-22 11:26     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 88+ messages in thread
From: Kirill A. Shutemov @ 2014-10-22 11:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Mon, Oct 20, 2014 at 11:56:36PM +0200, Peter Zijlstra wrote:
> Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
> counts such that we can easily test if a VMA is changed.
> 
> The unmap_page_range() one allows us to make assumptions about
> page-tables; when we find the seqcount hasn't changed we can assume
> page-tables are still valid.
> 
> The flip side is that we cannot distinguish between a vma_adjust() and
> the unmap_page_range() -- where with the former we could have
> re-checked the vma bounds against the address.

You only took care of changing the size of a VMA, or unmapping it. What about
other aspects of the VMA? How would you handle a race with mprotect(2)?

		CPU0						CPU1
 mprotect()
   mprotect_fixup()
     vma_merge()
       [ maybe update vm_sequence ]
    						[ page fault kicks in ]
						  do_anonymous_page()
						    entry = mk_pte(page, fe->vma->vm_page_prot);
     vma_set_page_prot(vma)
       [ update vma->vm_page_prot ]
     change_protection()
						    pte_map_lock()
						      [ vm_sequence is ok ]
						    set_pte_at(entry) // With old vm_page_prot!!!

This can end up a security issue.

This particular case can be fixed pretty easily: we should move the
vm_page_prot reference under the PTL and make sure that we walk over
virtual addresses in the same (direct) order everywhere (this seems to be true).
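
A minimal sketch of the first half of that (illustrative only, using the
fault_env / pte_map_lock() names from the posted patches; the ordering
requirement above is the other half):

	if (!pte_map_lock(fe))
		return VM_FAULT_RETRY;

	/* vm_page_prot is only read under the PTL, after revalidation */
	entry = mk_pte(page, fe->vma->vm_page_prot);
	if (fe->vma->vm_flags & VM_WRITE)
		entry = pte_mkwrite(pte_mkdirty(entry));

	set_pte_at(fe->mm, fe->address, fe->pte, entry);
	pte_unmap_unlock(fe->pte, fe->ptl);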

But who knows what else we're missing?

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-22  7:34   ` Davidlohr Bueso
@ 2014-10-22 11:29     ` Kirill A. Shutemov
  -1 siblings, 0 replies; 88+ messages in thread
From: Kirill A. Shutemov @ 2014-10-22 11:29 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, torvalds, paulmck, tglx, akpm, riel, mgorman,
	oleg, mingo, minchan, kamezawa.hiroyu, viro, laijs, linux-kernel,
	linux-mm

On Wed, Oct 22, 2014 at 12:34:49AM -0700, Davidlohr Bueso wrote:
> On Mon, 2014-10-20 at 23:56 +0200, Peter Zijlstra wrote:
> > Hi,
> > 
> > I figured I'd give my 2010 speculative fault series another spin:
> > 
> >   https://lkml.org/lkml/2010/1/4/257
> > 
> > Since then I think many of the outstanding issues have changed sufficiently to
> > warrant another go. In particular Al Viro's delayed fput seems to have made it
> > entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> > with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> > under the PTL.
> > 
> > The code needs way more attention but builds a kernel and runs the
> > micro-benchmark so I figured I'd post it before sinking more time into it.
> > 
> > I realize the micro-bench is about as good as it gets for this series and not
> > very realistic otherwise, but I think it does show the potential benefit the
> > approach has.
> > 
> > (patches go against .18-rc1+)
> 
> I think patch 2/6 is borken:
> 
> error: patch failed: mm/memory.c:2025
> error: mm/memory.c: patch does not apply
> 
> and related, as you mention, I would very much welcome having the
> introduction of 'struct fault_env' as a separate cleanup patch. May I
> suggest renaming it to fault_cxt?

What about extending 'struct vm_fault' and using it earlier, on the stack?
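
(A rough sketch of that alternative; the fields are merged from the existing
3.18-era 'struct vm_fault' and the fault_env introduced in this series, so the
exact layout is hypothetical. The point is a single on-stack structure handed
from the arch fault code down to the ->fault/->map_pages handlers.)

	struct vm_fault {
		struct vm_area_struct *vma;	/* filled in by the caller */
		struct mm_struct *mm;
		unsigned long address;		/* faulting virtual address */
		unsigned int flags;		/* FAULT_FLAG_xxx */
		unsigned int sequence;		/* vma->vm_sequence snapshot */
		pgoff_t pgoff;
		struct page *page;		/* returned by ->fault */
		pmd_t *pmd;
		pte_t *pte;
		spinlock_t *ptl;
		pte_t entry;
	};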

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 3/6] mm: VMA sequence count
  2014-10-22 11:26     ` Kirill A. Shutemov
@ 2014-10-22 11:39       ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-22 11:39 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Wed, Oct 22, 2014 at 02:26:57PM +0300, Kirill A. Shutemov wrote:
> On Mon, Oct 20, 2014 at 11:56:36PM +0200, Peter Zijlstra wrote:
> > Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
> > counts such that we can easily test if a VMA is changed.
> > 
> > The unmap_page_range() one allows us to make assumptions about
> > page-tables; when we find the seqcount hasn't changed we can assume
> > page-tables are still valid.
> > 
> > The flip side is that we cannot distinguish between a vma_adjust() and
> > the unmap_page_range() -- where with the former we could have
> > re-checked the vma bounds against the address.
> 
> You only took care of changing the size of a VMA, or unmapping it. What about
> other aspects of the VMA? How would you handle a race with mprotect(2)?
> 
> 		CPU0						CPU1
>  mprotect()
>    mprotect_fixup()
>      vma_merge()
>        [ maybe update vm_sequence ]
>     						[ page fault kicks in ]
> 						  do_anonymous_page()
> 						    entry = mk_pte(page, fe->vma->vm_page_prot);
>      vma_set_page_prot(vma)
>        [ update vma->vm_page_prot ]
>      change_protection()
> 						    pte_map_lock()
> 						      [ vm_sequence is ok ]
> 						    set_pte_at(entry) // With old vm_page_prot!!!
> 

This won't happen; this is serialized by the PTL, and the fault
validates that the PTE is the 'same' it started out with after acquiring
the PTL.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-22 11:29     ` Kirill A. Shutemov
@ 2014-10-22 11:45       ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-22 11:45 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: Davidlohr Bueso, torvalds, paulmck, tglx, akpm, riel, mgorman,
	oleg, mingo, minchan, kamezawa.hiroyu, viro, laijs, linux-kernel,
	linux-mm

On Wed, Oct 22, 2014 at 02:29:25PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 22, 2014 at 12:34:49AM -0700, Davidlohr Bueso wrote:
> > On Mon, 2014-10-20 at 23:56 +0200, Peter Zijlstra wrote:
> > > Hi,
> > > 
> > > I figured I'd give my 2010 speculative fault series another spin:
> > > 
> > >   https://lkml.org/lkml/2010/1/4/257
> > > 
> > > Since then I think many of the outstanding issues have changed sufficiently to
> > > warrant another go. In particular Al Viro's delayed fput seems to have made it
> > > entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> > > with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> > > under the PTL.
> > > 
> > > The code needs way more attention but builds a kernel and runs the
> > > micro-benchmark so I figured I'd post it before sinking more time into it.
> > > 
> > > I realize the micro-bench is about as good as it gets for this series and not
> > > very realistic otherwise, but I think it does show the potential benefit the
> > > approach has.
> > > 
> > > (patches go against .18-rc1+)
> > 
> > I think patch 2/6 is borken:
> > 
> > error: patch failed: mm/memory.c:2025
> > error: mm/memory.c: patch does not apply
> > 
> > and related, as you mention, I would very much welcome having the
> > introduction of 'struct fault_env' as a separate cleanup patch. May I
> > suggest renaming it to fault_cxt?
> 
> What about extending 'struct vm_fault' and using it earlier, on the stack?

I'm not sure we should mix the environment for vm_ops::fault, which
acquires the page, and the fault path, which deals with changing the
PTE. Ideally we should not expose the page-table information to file
ops; it's a layering violation if nothing else, and drivers should not have
access to the page tables.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 3/6] mm: VMA sequence count
  2014-10-22 11:39       ` Peter Zijlstra
@ 2014-10-22 11:53         ` Kirill A. Shutemov
  -1 siblings, 0 replies; 88+ messages in thread
From: Kirill A. Shutemov @ 2014-10-22 11:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Wed, Oct 22, 2014 at 01:39:51PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 22, 2014 at 02:26:57PM +0300, Kirill A. Shutemov wrote:
> > On Mon, Oct 20, 2014 at 11:56:36PM +0200, Peter Zijlstra wrote:
> > > Wrap the VMA modifications (vma_adjust/unmap_page_range) with sequence
> > > counts such that we can easily test if a VMA is changed.
> > > 
> > > The unmap_page_range() one allows us to make assumptions about
> > > page-tables; when we find the seqcount hasn't changed we can assume
> > > page-tables are still valid.
> > > 
> > > The flip side is that we cannot distinguish between a vma_adjust() and
> > > the unmap_page_range() -- where with the former we could have
> > > re-checked the vma bounds against the address.
> > 
> > You only took care about changing size of VMA or unmap. What about other
> > aspects of VMA. How would you care about race with mprotect(2)?
> > 
> > 		CPU0						CPU1
> >  mprotect()
> >    mprotect_fixup()
> >      vma_merge()
> >        [ maybe update vm_sequence ]
> >     						[ page fault kicks in ]
> > 						  do_anonymous_page()
> > 						    entry = mk_pte(page, fe->vma->vm_page_prot);
> >      vma_set_page_prot(vma)
> >        [ update vma->vm_page_prot ]
> >      change_protection()
> > 						    pte_map_lock()
> > 						      [ vm_sequence is ok ]
> > 						    set_pte_at(entry) // With old vm_page_prot!!!
> > 
> 
> This won't happen; this is serialized by the PTL, and the fault
> validates that the PTE is the 'same' it started out with after acquiring
> the PTL.

Em, no. In this case change_protection() will not touch the pte, since
it's pte_none() and the pte_same() check will pass just fine.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-22 11:45       ` Peter Zijlstra
@ 2014-10-22 11:55         ` Kirill A. Shutemov
  -1 siblings, 0 replies; 88+ messages in thread
From: Kirill A. Shutemov @ 2014-10-22 11:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Davidlohr Bueso, torvalds, paulmck, tglx, akpm, riel, mgorman,
	oleg, mingo, minchan, kamezawa.hiroyu, viro, laijs, linux-kernel,
	linux-mm

On Wed, Oct 22, 2014 at 01:45:58PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 22, 2014 at 02:29:25PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Oct 22, 2014 at 12:34:49AM -0700, Davidlohr Bueso wrote:
> > > On Mon, 2014-10-20 at 23:56 +0200, Peter Zijlstra wrote:
> > > > Hi,
> > > > 
> > > > I figured I'd give my 2010 speculative fault series another spin:
> > > > 
> > > >   https://lkml.org/lkml/2010/1/4/257
> > > > 
> > > > Since then I think many of the outstanding issues have changed sufficiently to
> > > > warrant another go. In particular Al Viro's delayed fput seems to have made it
> > > > entirely 'normal' to delay fput(). Lai Jiangshan's SRCU rewrite provided us
> > > > with call_srcu() and my preemptible mmu_gather removed the TLB flushes from
> > > > under the PTL.
> > > > 
> > > > The code needs way more attention but builds a kernel and runs the
> > > > micro-benchmark so I figured I'd post it before sinking more time into it.
> > > > 
> > > > I realize the micro-bench is about as good as it gets for this series and not
> > > > very realistic otherwise, but I think it does show the potential benefit the
> > > > approach has.
> > > > 
> > > > (patches go against .18-rc1+)
> > > 
> > > I think patch 2/6 is borken:
> > > 
> > > error: patch failed: mm/memory.c:2025
> > > error: mm/memory.c: patch does not apply
> > > 
> > > and related, as you mention, I would very much welcome having the
> > > introduction of 'struct faut_env' as a separate cleanup patch. May I
> > > suggest renaming it to fault_cxt?
> > 
> > What about extending 'struct vm_fault' and using it earlier, on the stack?
> 
> I'm not sure we should mix the environment for vm_ops::fault, which
> acquires the page, and the fault path, which deals with changing the
> PTE. Ideally we should not expose the page-table information to file
> ops; it's a layering violation if nothing else, and drivers should not have
> access to the page tables.

We already have this for ->map_pages() :-P
I have asked whether it's considered a layering violation, and it seems
nobody cares...

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 3/6] mm: VMA sequence count
  2014-10-22 11:53         ` Kirill A. Shutemov
@ 2014-10-22 12:15           ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-22 12:15 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Wed, Oct 22, 2014 at 02:53:04PM +0300, Kirill A. Shutemov wrote:
> Em, no. In this case change_protection() will not touch the pte, since
> it's pte_none() and the pte_same() check will pass just fine.

Oh, that's what you meant. Yes that's a problem, yes vm_page_prot
needs wrapping too.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-21 17:25     ` Peter Zijlstra
@ 2014-10-22 12:35       ` Ingo Molnar
  -1 siblings, 0 replies; 88+ messages in thread
From: Ingo Molnar @ 2014-10-22 12:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Tue, Oct 21, 2014 at 06:23:40PM +0200, Ingo Molnar wrote:
> > 
> > * Peter Zijlstra <peterz@infradead.org> wrote:
> > 
> > > My Ivy Bridge EP (2*10*2) has a ~58% improvement in pagefault throughput:
> > > 
> > > PRE:
> > >        149,441,555      page-faults                  ( +-  1.25% )
> > >
> > > POST:
> > >        236,442,626      page-faults                  ( +-  0.08% )
> > 
> > > My Ivy Bridge EX (4*15*2) has a ~78% improvement in pagefault throughput:
> > > 
> > > PRE:
> > >        105,789,078      page-faults                 ( +-  2.24% )
> > >
> > > POST:
> > >        187,751,767      page-faults                 ( +-  2.24% )
> > 
> > I guess the 'PRE' and 'POST' numbers should be flipped around?
> 
> Nope, it's the number of page-faults serviced in a fixed amount of time
> (60 seconds), therefore higher is better.

Ah, okay!

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 3/6] mm: VMA sequence count
  2014-10-22 12:15           ` Peter Zijlstra
@ 2014-10-22 13:44             ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-22 13:44 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Wed, Oct 22, 2014 at 02:15:54PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 22, 2014 at 02:53:04PM +0300, Kirill A. Shutemov wrote:
> > Em, no. In this case change_protection() will not touch the pte, since
> > it's pte_none() and the pte_same() check will pass just fine.
> 
> Oh, that's what you meant. Yes that's a problem, yes vm_page_prot
> needs wrapping too.

Maybe also vm_policy. Is there anything else that can change while a VMA
lives?

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-20 21:56   ` Peter Zijlstra
@ 2014-10-23 10:14     ` Lai Jiangshan
  -1 siblings, 0 replies; 88+ messages in thread
From: Lai Jiangshan @ 2014-10-23 10:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, dave, linux-kernel, linux-mm


>  
> +struct vm_area_struct *find_vma_srcu(struct mm_struct *mm, unsigned long addr)
> +{
> +	struct vm_area_struct *vma;
> +	unsigned int seq;
> +
> +	WARN_ON_ONCE(!srcu_read_lock_held(&vma_srcu));
> +
> +	do {
> +		seq = read_seqbegin(&mm->mm_seq);
> +		vma = __find_vma(mm, addr);

Will __find_vma() loop forever due to the rotations in the RB-tree?

> +	} while (read_seqretry(&mm->mm_seq, seq));
> +
> +	return vma;
> +}

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-21 17:56       ` Peter Zijlstra
@ 2014-10-23 10:40         ` Lai Jiangshan
  -1 siblings, 0 replies; 88+ messages in thread
From: Lai Jiangshan @ 2014-10-23 10:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Kirill A. Shutemov, Ingo Molnar, torvalds, paulmck, tglx, akpm,
	riel, mgorman, oleg, mingo, minchan, kamezawa.hiroyu, viro, dave,
	linux-kernel, linux-mm

On 10/22/2014 01:56 AM, Peter Zijlstra wrote:
> On Tue, Oct 21, 2014 at 08:09:48PM +0300, Kirill A. Shutemov wrote:
>> It would be interesting to see if the patchset affects the non-contended case.
>> Like a one-threaded workload.
> 
> It does, and not in a good way, I'll have to look at that... :/

Maybe find_vma_srcu() is to blame: it doesn't take advantage of
vmacache_find() and so causes more cache-misses.

Is it hard to use the vmacache in find_vma_srcu()?

> 
>  Performance counter stats for './multi-fault 1' (5 runs):
> 
>         73,860,251      page-faults                                                   ( +-  0.28% )
>             40,914      cache-misses                                                  ( +- 41.26% )
> 
>       60.001484913 seconds time elapsed                                          ( +-  0.00% )
> 
> 
>  Performance counter stats for './multi-fault 1' (5 runs):
> 
>         70,700,838      page-faults                                                   ( +-  0.03% )
>             31,466      cache-misses                                                  ( +-  8.62% )
> 
>       60.001753906 seconds time elapsed                                          ( +-  0.00% )
> .
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-23 10:14     ` Lai Jiangshan
@ 2014-10-23 11:03       ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-23 11:03 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, dave, linux-kernel, linux-mm

On Thu, Oct 23, 2014 at 06:14:45PM +0800, Lai Jiangshan wrote:
> 
> >  
> > +struct vm_area_struct *find_vma_srcu(struct mm_struct *mm, unsigned long addr)
> > +{
> > +	struct vm_area_struct *vma;
> > +	unsigned int seq;
> > +
> > +	WARN_ON_ONCE(!srcu_read_lock_held(&vma_srcu));
> > +
> > +	do {
> > +		seq = read_seqbegin(&mm->mm_seq);
> > +		vma = __find_vma(mm, addr);
> 
> will __find_vma() loop forever due to the rotations in the RBtree?

No, a rotation takes a tree and generates a tree, furthermore the
rotation has a fairly strict fwd progress guarantee seeing how its now
done with preemption disabled.

Therefore, even if we're in a node that's being rotated up, we can only
'loop' for as long as it takes for the new pointer stores to become
visible on our CPU.

Thus we have a tree descent termination guarantee.
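
Concretely, the write side assumed by the above is roughly the below (a
sketch only, the helper name is made up; not the literal patch):

/* Sketch: tree modification bracketed by the mm_seq write side. */
static void vma_rb_remove_sketch(struct mm_struct *mm, struct vm_area_struct *vma)
{
        write_seqlock(&mm->mm_seq);             /* takes a spinlock: preemption off */
        rb_erase(&vma->vm_rb, &mm->mm_rb);      /* rotations happen in here */
        write_sequnlock(&mm->mm_seq);           /* readers retry via read_seqretry() */
}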

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-23 10:40         ` Lai Jiangshan
@ 2014-10-23 11:04           ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-23 11:04 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Kirill A. Shutemov, Ingo Molnar, torvalds, paulmck, tglx, akpm,
	riel, mgorman, oleg, mingo, minchan, kamezawa.hiroyu, viro, dave,
	linux-kernel, linux-mm

On Thu, Oct 23, 2014 at 06:40:05PM +0800, Lai Jiangshan wrote:
> On 10/22/2014 01:56 AM, Peter Zijlstra wrote:
> > On Tue, Oct 21, 2014 at 08:09:48PM +0300, Kirill A. Shutemov wrote:
> >> It would be interesting to see if the patchset affects the non-contended case.
> >> Like a one-threaded workload.
> > 
> > It does, and not in a good way, I'll have to look at that... :/
> 
> Maybe find_vma_srcu() is to blame: it doesn't take advantage of
> vmacache_find() and so causes more cache-misses.

It's what I thought initially. I tried doing perf record with and
without, but then I ran into perf diff not quite working for me and I've
yet to find time to kick that thing into shape.

> Is it hard to use the vmacache in find_vma_srcu()?

I've not had time to look at it.
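
A rough sketch of what that could look like is below; whether a
vmacache hit can be trusted at all under SRCU (an entry may point at a
vma that has since been unlinked and is merely waiting out its grace
period) is exactly the part that needs looking at, so treat this as
illustration only:

struct vm_area_struct *find_vma_srcu(struct mm_struct *mm, unsigned long addr)
{
        struct vm_area_struct *vma;
        unsigned int seq;

        WARN_ON_ONCE(!srcu_read_lock_held(&vma_srcu));

        do {
                seq = read_seqbegin(&mm->mm_seq);
                vma = vmacache_find(mm, addr);          /* per-thread cache */
                if (!vma || vma->vm_start > addr || vma->vm_end <= addr)
                        vma = __find_vma(mm, addr);     /* fall back to the rbtree */
        } while (read_seqretry(&mm->mm_seq, seq));

        return vma;
}

Updating the cache from this path is deliberately left out above.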

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 3/6] mm: VMA sequence count
  2014-10-22 13:44             ` Peter Zijlstra
@ 2014-10-23 12:36               ` Kirill A. Shutemov
  -1 siblings, 0 replies; 88+ messages in thread
From: Kirill A. Shutemov @ 2014-10-23 12:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Wed, Oct 22, 2014 at 03:44:16PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 22, 2014 at 02:15:54PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 22, 2014 at 02:53:04PM +0300, Kirill A. Shutemov wrote:
> > > Em, no. In this case change_protection() will not touch the pte, since
> > > it's pte_none() and the pte_same() check will pass just fine.
> > 
> > Oh, that's what you meant. Yes that's a problem, yes vm_page_prot
> > needs wrapping too.
> 
> Maybe also vm_policy, is there anything else that can change while a vma
> lives?

 - vm_flags, obviously;
 - shared, anon_vma and anon_vma_chain (at least on the first write fault
   to private mapping);
 - vm_pgoff (mremap(2) ?);
 - vm_private_data -- it's all over drivers. Potential nightmare, but
   seems not in use for anon mappings.

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 3/6] mm: VMA sequence count
  2014-10-23 12:36               ` Kirill A. Shutemov
@ 2014-10-23 14:22                 ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-23 14:22 UTC (permalink / raw)
  To: Kirill A. Shutemov
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Thu, Oct 23, 2014 at 03:36:16PM +0300, Kirill A. Shutemov wrote:
> On Wed, Oct 22, 2014 at 03:44:16PM +0200, Peter Zijlstra wrote:
> > On Wed, Oct 22, 2014 at 02:15:54PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 22, 2014 at 02:53:04PM +0300, Kirill A. Shutemov wrote:
> > > > Em, no. In this case change_protection() will not touch the pte, since
> > > > it's pte_none() and the pte_same() check will pass just fine.
> > > 
> > > Oh, that's what you meant. Yes that's a problem, yes vm_page_prot
> > > needs wrapping too.
> > 
> > Maybe also vm_policy, is there anything else that can change while a vma
> > lives?
> 
>  - vm_flags, obviously;

Do those ever change? The only thing that jumps out is the VM_LOCKED
thing and that should not really matter one way or the other, but sure
can do.

>  - shared, anon_vma and anon_vma_chain (at least on the first write fault
>    to private mapping);
>  - vm_pgoff (mremap(2) ?);

Right you are. Never thought about that one.

>  - vm_private_data -- it's all over drivers. Potential nightmare, but
>    seems not in use for anon mappings.

Yeah, we need to either audit drivers or otherwise exclude stuff from
speculative faults; Andy already noted that drivers might not expect
.fault after .close or whatnot.

In any case, yes I'll go include them.
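
FWIW, the read side this implies looks roughly like the below; the
spf_vma_snap/spf_vma_sample names and the vm_sequence field are made up
for illustration, not the actual patch:

/* Sketch: sample the fields the speculative fault consumes, consistently. */
struct spf_vma_snap {
        unsigned long   vm_flags;
        unsigned long   vm_pgoff;
        pgprot_t        vm_page_prot;
        unsigned int    seq;
};

static void spf_vma_sample(struct vm_area_struct *vma, struct spf_vma_snap *s)
{
        do {
                s->seq          = read_seqcount_begin(&vma->vm_sequence);
                s->vm_flags     = vma->vm_flags;
                s->vm_pgoff     = vma->vm_pgoff;
                s->vm_page_prot = vma->vm_page_prot;
        } while (read_seqcount_retry(&vma->vm_sequence, s->seq));
}

The fault path then re-checks read_seqcount_retry(&vma->vm_sequence,
s->seq) before committing, and bails to the mmap_sem path if anything
(mprotect, mremap, mlock, mbind, ...) changed in between.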

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 3/6] mm: VMA sequence count
  2014-10-23 14:22                 ` Peter Zijlstra
@ 2014-10-23 15:05                   ` Kirill A. Shutemov
  -1 siblings, 0 replies; 88+ messages in thread
From: Kirill A. Shutemov @ 2014-10-23 15:05 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, laijs, dave, linux-kernel,
	linux-mm

On Thu, Oct 23, 2014 at 04:22:24PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 23, 2014 at 03:36:16PM +0300, Kirill A. Shutemov wrote:
> > On Wed, Oct 22, 2014 at 03:44:16PM +0200, Peter Zijlstra wrote:
> > > On Wed, Oct 22, 2014 at 02:15:54PM +0200, Peter Zijlstra wrote:
> > > > On Wed, Oct 22, 2014 at 02:53:04PM +0300, Kirill A. Shutemov wrote:
> > > > > Em, no. In this case change_protection() will not touch the pte, since
> > > > > it's pte_none() and the pte_same() check will pass just fine.
> > > > 
> > > > Oh, that's what you meant. Yes that's a problem, yes vm_page_prot
> > > > needs wrapping too.
> > > 
> > > Maybe also vm_policy, is there anything else that can change while a vma
> > > lives?
> > 
> >  - vm_flags, obviously;
> 
> Do those ever change?

The flags which can change (probably incomplete):

 - prot-related: VM_READ, VM_WRITE, VM_EXEC -- mprotect();
 - VM_LOCKED - mlock();
 - VM_SEQ_READ, VM_RAND_READ, VM_DONTCOPY, VM_DONTDUMP, VM_HUGEPAGE,
   VM_NOHUGEPAGE, VM_MERGEABLE -- madvise();
 - VM_SOFTDIRTY -- through procfs;
 
> The only thing that jumps out is the VM_LOCKED thing and that should not
> really matter one way or the other, but sure can do.

I would not be that sure about VM_LOCKED. Consider munlock() vs. write
fault race.

static int do_wp_page(struct fault_env *fe)
        __releases(ptl)
{
...
err:
	if (old_page) {
		/*
		 * Don't let another task, with possibly unlocked vma,
		 * keep the mlocked page.
		 */
		if ((ret & VM_FAULT_WRITE) && (fe->vma->vm_flags & VM_LOCKED)) {
			lock_page(old_page);	/* LRU manipulation */
			munlock_vma_page(old_page);
			unlock_page(old_page);
		}
		page_cache_release(old_page);
	}
	return ret;
...
}

The page can leak out mlocked, iiuc.

Some other flags can be problematic too.

> In any case, yes I'll go include them.

I hope it will not hurt single-threaded workloads even more. :-/

-- 
 Kirill A. Shutemov

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-23 11:03       ` Peter Zijlstra
@ 2014-10-24  3:33         ` Lai Jiangshan
  -1 siblings, 0 replies; 88+ messages in thread
From: Lai Jiangshan @ 2014-10-24  3:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, dave, linux-kernel, linux-mm

On 10/23/2014 07:03 PM, Peter Zijlstra wrote:
> On Thu, Oct 23, 2014 at 06:14:45PM +0800, Lai Jiangshan wrote:
>>
>>>  
>>> +struct vm_area_struct *find_vma_srcu(struct mm_struct *mm, unsigned long addr)
>>> +{
>>> +	struct vm_area_struct *vma;
>>> +	unsigned int seq;
>>> +
>>> +	WARN_ON_ONCE(!srcu_read_lock_held(&vma_srcu));
>>> +
>>> +	do {
>>> +		seq = read_seqbegin(&mm->mm_seq);
>>> +		vma = __find_vma(mm, addr);
>>
>> will __find_vma() loop forever due to the rotations in the RBtree?
> 
> No, a rotation takes a tree and generates a tree, furthermore the
> rotation has a fairly strict fwd progress guarantee seeing how its now
> done with preemption disabled.

I don't get the magic.

__find_vma is visiting vma_a,
vma_a is rotated to near the top due to multiple updates to the mm.
__find_vma is visiting down to near the bottom, vma_b.
now vma_b is rotated up to near the top again.
__find_vma is visiting down to near the bottom, vma_c.
now vma_c is rotated up to near the top again.

...




> 
> Therefore, even if we're in a node that's being rotated up, we can only
> 'loop' for as long as it takes for the new pointer stores to become
> visible on our CPU.
> 
> Thus we have a tree descent termination guarantee.
> .
> 


^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-24  3:33         ` Lai Jiangshan
@ 2014-10-24  7:26           ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-24  7:26 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: torvalds, paulmck, tglx, akpm, riel, mgorman, oleg, mingo,
	minchan, kamezawa.hiroyu, viro, dave, linux-kernel, linux-mm

On Fri, Oct 24, 2014 at 11:33:58AM +0800, Lai Jiangshan wrote:
> On 10/23/2014 07:03 PM, Peter Zijlstra wrote:
> > On Thu, Oct 23, 2014 at 06:14:45PM +0800, Lai Jiangshan wrote:
> >>
> >>>  
> >>> +struct vm_area_struct *find_vma_srcu(struct mm_struct *mm, unsigned long addr)
> >>> +{
> >>> +	struct vm_area_struct *vma;
> >>> +	unsigned int seq;
> >>> +
> >>> +	WARN_ON_ONCE(!srcu_read_lock_held(&vma_srcu));
> >>> +
> >>> +	do {
> >>> +		seq = read_seqbegin(&mm->mm_seq);
> >>> +		vma = __find_vma(mm, addr);
> >>
> >> will __find_vma() loop forever due to the rotations in the RBtree?
> > 
> > No, a rotation takes a tree and generates a tree, furthermore the
> > rotation has a fairly strict fwd progress guarantee seeing how its now
> > done with preemption disabled.
> 
> I don't get the magic.
> 
> __find_vma is visiting vma_a,
> vma_a is rotated to near the top due to multiple updates to the mm.
> __find_vma is visiting down to near the bottom, vma_b.
> now vma_b is rotated up to near the top again.
> __find_vma is visiting down to near the bottom, vma_c.
> now vma_c is rotated up to near the top again.
> 
> ...

Why would there be that many rotations? Is this a scenario where someone
is endlessly changing the tree?

If you stop updating the tree, the traversal will finish.

This is no different to the reader starvation already present with
seqlocks.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-23 11:04           ` Peter Zijlstra
@ 2014-10-24  7:54             ` Ingo Molnar
  -1 siblings, 0 replies; 88+ messages in thread
From: Ingo Molnar @ 2014-10-24  7:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Lai Jiangshan, Kirill A. Shutemov, torvalds, paulmck, tglx, akpm,
	riel, mgorman, oleg, mingo, minchan, kamezawa.hiroyu, viro, dave,
	linux-kernel, linux-mm


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Oct 23, 2014 at 06:40:05PM +0800, Lai Jiangshan wrote:
> > On 10/22/2014 01:56 AM, Peter Zijlstra wrote:
> > > On Tue, Oct 21, 2014 at 08:09:48PM +0300, Kirill A. Shutemov wrote:
> > >> It would be interesting to see if the patchset affects the non-contended case.
> > >> Like a one-threaded workload.
> > > 
> > > It does, and not in a good way, I'll have to look at that... :/
> > 
> > Maybe find_vma_srcu() is to blame: it doesn't take advantage of
> > vmacache_find() and so causes more cache-misses.
> 
> It's what I thought initially. I tried doing perf record with and
> without, but then I ran into perf diff not quite working for me and I've
> yet to find time to kick that thing into shape.

Might be the 'perf diff' regression fixed by this:

  9ab1f50876db perf diff: Add missing hists__init() call at tool start

I just pushed it out into tip:master.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-24  7:54             ` Ingo Molnar
@ 2014-10-24 13:14               ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-24 13:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Lai Jiangshan, Kirill A. Shutemov, torvalds, paulmck, tglx, akpm,
	riel, mgorman, oleg, mingo, minchan, kamezawa.hiroyu, viro, dave,
	linux-kernel, linux-mm

On Fri, Oct 24, 2014 at 09:54:23AM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > On Thu, Oct 23, 2014 at 06:40:05PM +0800, Lai Jiangshan wrote:
> > > On 10/22/2014 01:56 AM, Peter Zijlstra wrote:
> > > > On Tue, Oct 21, 2014 at 08:09:48PM +0300, Kirill A. Shutemov wrote:
> > >> It would be interesting to see if the patchset affects the non-contended case.
> > > >> Like a one-threaded workload.
> > > > 
> > > > It does, and not in a good way, I'll have to look at that... :/
> > > 
> > > Maybe find_vma_srcu() is to blame: it doesn't take advantage of
> > > vmacache_find() and so causes more cache-misses.
> > 
> > It's what I thought initially. I tried doing perf record with and
> > without, but then I ran into perf diff not quite working for me and I've
> > yet to find time to kick that thing into shape.
> 
> Might be the 'perf diff' regression fixed by this:
> 
>   9ab1f50876db perf diff: Add missing hists__init() call at tool start
> 
> I just pushed it out into tip:master.

I was on tip/master, so unlikely to be that as I was likely already
having it.

perf-report was affected too, for some reason my CONFIG_DEBUG_INFO=y
vmlinux wasn't showing symbols (and I double checked that KASLR crap was
disabled, so that wasn't confusing stuff either).

When I forced perf-report to use kallsyms it works, however perf-diff
doesn't have that option.

So there are two issues there: 1) perf-report failing to generate useful
output and 2) perf-diff lacking options to force it to behave.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-21  8:07       ` Peter Zijlstra
@ 2014-10-24 15:16         ` Christoph Lameter
  -1 siblings, 0 replies; 88+ messages in thread
From: Christoph Lameter @ 2014-10-24 15:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Paul McKenney, Thomas Gleixner, Andrew Morton,
	Rik van Riel, Mel Gorman, Oleg Nesterov, Ingo Molnar,
	Minchan Kim, KAMEZAWA Hiroyuki, Al Viro, Lai Jiangshan,
	Davidlohr Bueso, Linux Kernel Mailing List, linux-mm

On Tue, 21 Oct 2014, Peter Zijlstra wrote:

> On Mon, Oct 20, 2014 at 04:41:45PM -0700, Linus Torvalds wrote:
> > On Mon, Oct 20, 2014 at 2:56 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > Manage the VMAs with SRCU such that we can do a lockless VMA lookup.
> >
> > Can you explain why srcu, and not plain regular rcu?
> >
> > Especially as you then *note* some of the problems srcu can have.
> > Making it regular rcu would also seem to make it possible to make the
> > seqlock be just a seqcount, no?
>
> Because we need to hold onto the RCU read side lock across the entire
> fault, which can involve IO and all kinds of other blocking ops.

Hmmm... One optimization to do before we get into these changes is to
allow dropping mmap_sem before we go to sleep or do I/O, and then
reevaluate when the I/O etc. is complete. This is probably the longest
hold on mmap_sem that is also frequent. Then it may be easier to use
standard RCU later.






^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-24 15:16         ` Christoph Lameter
@ 2014-10-24 15:51           ` Peter Zijlstra
  -1 siblings, 0 replies; 88+ messages in thread
From: Peter Zijlstra @ 2014-10-24 15:51 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Linus Torvalds, Paul McKenney, Thomas Gleixner, Andrew Morton,
	Rik van Riel, Mel Gorman, Oleg Nesterov, Ingo Molnar,
	Minchan Kim, KAMEZAWA Hiroyuki, Al Viro, Lai Jiangshan,
	Davidlohr Bueso, Linux Kernel Mailing List, linux-mm

On Fri, Oct 24, 2014 at 10:16:24AM -0500, Christoph Lameter wrote:

> Hmmm... One optimization to do before we get into these changes is to
> allow dropping mmap_sem before we go to sleep or do I/O, and then
> reevaluate when the I/O etc. is complete. This is probably the longest
> hold on mmap_sem that is also frequent. Then it may be easier to use
> standard RCU later.

The hold time isn't relevant, in fact breaking up the mmap_sem such that
we require multiple acquisitions will just increase the cacheline
bouncing.

Also I think it makes more sense to continue an entire fault operation,
including blocking, if at all possible. Every retry will just waste more
time.

Also, there is a lot of possible blocking: there's lock_page,
page_mkwrite() -- which ends up calling into the dirty throttle, etc. We
could not possibly retry on all of that; the error paths involved would
be horrible, for one.

That said, there's a fair bit of code that does allow the retry, and I
think most fault paths actually do the retry on IO.
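
FWIW, the existing retry machinery looks roughly like the below from
the arch side (a sketch, heavily simplified from the x86 fault handler;
access checks and error handling elided, the function name is made up):

/* Sketch of the FAULT_FLAG_ALLOW_RETRY / VM_FAULT_RETRY pattern. */
static void fault_retry_sketch(struct mm_struct *mm, unsigned long address)
{
        struct vm_area_struct *vma;
        unsigned int flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE;
        int fault;

        down_read(&mm->mmap_sem);
retry:
        vma = find_vma(mm, address);
        fault = handle_mm_fault(mm, vma, address, flags);
        if ((fault & VM_FAULT_RETRY) && (flags & FAULT_FLAG_ALLOW_RETRY)) {
                /* the fault path dropped mmap_sem for us before sleeping */
                flags &= ~FAULT_FLAG_ALLOW_RETRY;
                flags |= FAULT_FLAG_TRIED;
                down_read(&mm->mmap_sem);
                goto retry;
        }
        up_read(&mm->mmap_sem);
}

So the lock does get dropped across the actual IO wait today; the cost
is that the retry re-takes mmap_sem and re-walks everything.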

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 4/6] SRCU free VMAs
  2014-10-24 15:51           ` Peter Zijlstra
@ 2014-10-24 17:08             ` Christoph Lameter
  -1 siblings, 0 replies; 88+ messages in thread
From: Christoph Lameter @ 2014-10-24 17:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Paul McKenney, Thomas Gleixner, Andrew Morton,
	Rik van Riel, Mel Gorman, Oleg Nesterov, Ingo Molnar,
	Minchan Kim, KAMEZAWA Hiroyuki, Al Viro, Lai Jiangshan,
	Davidlohr Bueso, Linux Kernel Mailing List, linux-mm

On Fri, 24 Oct 2014, Peter Zijlstra wrote:

> The hold time isn't relevant, in fact breaking up the mmap_sem such that
> we require multiple acquisitions will just increase the cacheline
> bouncing.

Well, this won't be happening anymore once you RCUify the stuff. If you
go to sleep then it's best to release mmap_sem, and then the bouncing
won't matter.

Dropping mmap_sem there will also expose you to the races you will see
later anyway when you RCUify the code paths. That way those can be dealt
with beforehand.

> Also I think it makes more sense to continue an entire fault operation,
> including blocking, if at all possible. Every retry will just waste more
> time.

OK, then don't retry. Just drop mmap_sem before going to sleep. When you
come back, evaluate the situation and, if we can proceed, do so;
otherwise retry.

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [RFC][PATCH 0/6] Another go at speculative page faults
  2014-10-24 13:14               ` Peter Zijlstra
@ 2014-10-28  5:32                 ` Namhyung Kim
  -1 siblings, 0 replies; 88+ messages in thread
From: Namhyung Kim @ 2014-10-28  5:32 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Lai Jiangshan, Kirill A. Shutemov, torvalds,
	paulmck, tglx, akpm, riel, mgorman, oleg, mingo, minchan,
	kamezawa.hiroyu, viro, dave, linux-kernel, linux-mm

Hi Peter,

On Fri, 24 Oct 2014 15:14:40 +0200, Peter Zijlstra wrote:
> On Fri, Oct 24, 2014 at 09:54:23AM +0200, Ingo Molnar wrote:
>> 
>> * Peter Zijlstra <peterz@infradead.org> wrote:
>> > It's what I thought initially. I tried doing perf record with and
>> > without, but then I ran into perf diff not quite working for me and I've
>> > yet to find time to kick that thing into shape.
>> 
>> Might be the 'perf diff' regression fixed by this:
>> 
>>   9ab1f50876db perf diff: Add missing hists__init() call at tool start
>> 
>> I just pushed it out into tip:master.
>
> I was on tip/master, so unlikely to be that as I was likely already
> having it.
>
> perf-report was affected too, for some reason my CONFIG_DEBUG_INFO=y
> vmlinux wasn't showing symbols (and I double checked that KASLR crap was
> disabled, so that wasn't confusing stuff either).
>
> When I forced perf-report to use kallsyms it works, however perf-diff
> doesn't have that option.
>
> So there are two issues there: 1) perf-report failing to generate useful
> output and 2) perf-diff lacking options to force it to behave.

Did perf-report fail to show any (kernel) symbols, or did it show the
wrong symbols?  Maybe it's related to this:

https://lkml.org/lkml/2014/9/22/78

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 88+ messages in thread

end of thread, other threads:[~2014-10-28  5:32 UTC | newest]

Thread overview: 88+ messages
2014-10-20 21:56 [RFC][PATCH 0/6] Another go at speculative page faults Peter Zijlstra
2014-10-20 21:56 ` Peter Zijlstra
2014-10-20 21:56 ` [RFC][PATCH 1/6] mm: Dont assume page-table invariance during faults Peter Zijlstra
2014-10-20 21:56   ` Peter Zijlstra
2014-10-20 21:56 ` [RFC][PATCH 2/6] mm: Prepare for FAULT_FLAG_SPECULATIVE Peter Zijlstra
2014-10-20 21:56   ` Peter Zijlstra
2014-10-20 21:56 ` [RFC][PATCH 3/6] mm: VMA sequence count Peter Zijlstra
2014-10-20 21:56   ` Peter Zijlstra
2014-10-22 11:26   ` Kirill A. Shutemov
2014-10-22 11:26     ` Kirill A. Shutemov
2014-10-22 11:39     ` Peter Zijlstra
2014-10-22 11:39       ` Peter Zijlstra
2014-10-22 11:53       ` Kirill A. Shutemov
2014-10-22 11:53         ` Kirill A. Shutemov
2014-10-22 12:15         ` Peter Zijlstra
2014-10-22 12:15           ` Peter Zijlstra
2014-10-22 13:44           ` Peter Zijlstra
2014-10-22 13:44             ` Peter Zijlstra
2014-10-23 12:36             ` Kirill A. Shutemov
2014-10-23 12:36               ` Kirill A. Shutemov
2014-10-23 14:22               ` Peter Zijlstra
2014-10-23 14:22                 ` Peter Zijlstra
2014-10-23 15:05                 ` Kirill A. Shutemov
2014-10-23 15:05                   ` Kirill A. Shutemov
2014-10-20 21:56 ` [RFC][PATCH 4/6] SRCU free VMAs Peter Zijlstra
2014-10-20 21:56   ` Peter Zijlstra
2014-10-20 23:41   ` Linus Torvalds
2014-10-20 23:41     ` Linus Torvalds
2014-10-21  8:07     ` Peter Zijlstra
2014-10-21  8:07       ` Peter Zijlstra
2014-10-24 15:16       ` Christoph Lameter
2014-10-24 15:16         ` Christoph Lameter
2014-10-24 15:51         ` Peter Zijlstra
2014-10-24 15:51           ` Peter Zijlstra
2014-10-24 17:08           ` Christoph Lameter
2014-10-24 17:08             ` Christoph Lameter
2014-10-21  8:22     ` Peter Zijlstra
2014-10-21  8:22       ` Peter Zijlstra
2014-10-23 10:14   ` Lai Jiangshan
2014-10-23 10:14     ` Lai Jiangshan
2014-10-23 11:03     ` Peter Zijlstra
2014-10-23 11:03       ` Peter Zijlstra
2014-10-24  3:33       ` Lai Jiangshan
2014-10-24  3:33         ` Lai Jiangshan
2014-10-24  7:26         ` Peter Zijlstra
2014-10-24  7:26           ` Peter Zijlstra
2014-10-20 21:56 ` [RFC][PATCH 5/6] mm: Provide speculative fault infrastructure Peter Zijlstra
2014-10-20 21:56   ` Peter Zijlstra
2014-10-21  8:35   ` Kirill A. Shutemov
2014-10-21  8:35     ` Kirill A. Shutemov
2014-10-21 10:41     ` Peter Zijlstra
2014-10-21 10:41       ` Peter Zijlstra
2014-10-21 19:00   ` Peter Zijlstra
2014-10-21 19:00     ` Peter Zijlstra
2014-10-20 21:56 ` [RFC][PATCH 6/6] mm,x86: Add speculative pagefault handling Peter Zijlstra
2014-10-20 21:56   ` Peter Zijlstra
2014-10-21  0:07 ` [RFC][PATCH 0/6] Another go at speculative page faults Andy Lutomirski
2014-10-21  0:07   ` Andy Lutomirski
2014-10-21  8:11   ` Peter Zijlstra
2014-10-21  8:11     ` Peter Zijlstra
2014-10-21 16:23 ` Ingo Molnar
2014-10-21 16:23   ` Ingo Molnar
2014-10-21 17:09   ` Kirill A. Shutemov
2014-10-21 17:09     ` Kirill A. Shutemov
2014-10-21 17:56     ` Peter Zijlstra
2014-10-21 17:56       ` Peter Zijlstra
2014-10-23 10:40       ` Lai Jiangshan
2014-10-23 10:40         ` Lai Jiangshan
2014-10-23 11:04         ` Peter Zijlstra
2014-10-23 11:04           ` Peter Zijlstra
2014-10-24  7:54           ` Ingo Molnar
2014-10-24  7:54             ` Ingo Molnar
2014-10-24 13:14             ` Peter Zijlstra
2014-10-24 13:14               ` Peter Zijlstra
2014-10-28  5:32               ` Namhyung Kim
2014-10-28  5:32                 ` Namhyung Kim
2014-10-21 17:25   ` Peter Zijlstra
2014-10-21 17:25     ` Peter Zijlstra
2014-10-22 12:35     ` Ingo Molnar
2014-10-22 12:35       ` Ingo Molnar
2014-10-22  7:34 ` Davidlohr Bueso
2014-10-22  7:34   ` Davidlohr Bueso
2014-10-22 11:29   ` Kirill A. Shutemov
2014-10-22 11:29     ` Kirill A. Shutemov
2014-10-22 11:45     ` Peter Zijlstra
2014-10-22 11:45       ` Peter Zijlstra
2014-10-22 11:55       ` Kirill A. Shutemov
2014-10-22 11:55         ` Kirill A. Shutemov
