On Tue, 5 Jan 2010 20:20:56 -0800 (PST)
Linus Torvalds wrote:

> On Wed, 6 Jan 2010, KAMEZAWA Hiroyuki wrote:
> > >
> > > Of course, your other load with MADV_DONTNEED seems to be horrible, and
> > > has some nasty spinlock issues, but that looks like a separate deal (I
> > > assume that load is just very hard on the pgtable lock).
> >
> > It's zone->lock, I guess. My test program avoids the pgtable lock problem.
>
> Yeah, I should have looked more at your callchain. That's nasty. Much
> worse than the per-mm lock. I thought the page buffering would avoid the
> zone lock becoming a huge problem, but clearly not in this case.
>

For my mental peace, I rewrote the test program as

	while (1) {
		touch memory			<- all threads fault in the range
		barrier
		madvise(MADV_DONTNEED)		<- whole range, by cpu 0 only
		barrier
	}

i.e. madvise() is serialized. Then, zone->lock contention disappears and
I see no big difference between the XADD rwsem and my tricky patch. I
think I got a reasonable result, and fixing rwsem is the sane way.
(A runnable sketch of this loop is appended below, after the profiles.)

The next target will be clear_page()? hehe.
What catches my eye is the cost of memcg... (>_<

Thank you all,
-Kame
==
[XADD rwsem]

[root@bluextal memory]# /root/bin/perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault-all 8

 Performance counter stats for './multi-fault-all 8' (5 runs):

       33029186  page-faults                ( +-   0.146% )
      348698659  cache-misses               ( +-   0.149% )

   60.002876268  seconds time elapsed   ( +-   0.001% )

# Samples: 815596419603
#
# Overhead          Command             Shared Object  Symbol
# ........  ...............  ........................  ......
#
    41.51%  multi-fault-all  [kernel]                  [k] clear_page_c
     9.08%  multi-fault-all  [kernel]                  [k] down_read_trylock
     6.23%  multi-fault-all  [kernel]                  [k] up_read
     6.17%  multi-fault-all  [kernel]                  [k] __mem_cgroup_try_charg
     4.76%  multi-fault-all  [kernel]                  [k] handle_mm_fault
     3.77%  multi-fault-all  [kernel]                  [k] __mem_cgroup_commit_ch
     3.62%  multi-fault-all  [kernel]                  [k] __rmqueue
     2.30%  multi-fault-all  [kernel]                  [k] _raw_spin_lock
     2.30%  multi-fault-all  [kernel]                  [k] page_fault
     2.12%  multi-fault-all  [kernel]                  [k] mem_cgroup_charge_comm
     2.05%  multi-fault-all  [kernel]                  [k] bad_range
     1.78%  multi-fault-all  [kernel]                  [k] _raw_spin_lock_irq
     1.53%  multi-fault-all  [kernel]                  [k] lookup_page_cgroup
     1.44%  multi-fault-all  [kernel]                  [k] __mem_cgroup_uncharge_
     1.41%  multi-fault-all  ./multi-fault-all         [.] worker
     1.30%  multi-fault-all  [kernel]                  [k] get_page_from_freelist
     1.06%  multi-fault-all  [kernel]                  [k] page_remove_rmap

[async page fault]

[root@bluextal memory]# /root/bin/perf stat -e page-faults,cache-misses --repeat 5 ./multi-fault-all 8

 Performance counter stats for './multi-fault-all 8' (5 runs):

       33345089  page-faults                ( +-   0.555% )
      357660074  cache-misses               ( +-   1.438% )

   60.003711279  seconds time elapsed   ( +-   0.002% )

    40.94%  multi-fault-all  [kernel]                  [k] clear_page_c
     6.96%  multi-fault-all  [kernel]                  [k] vma_put
     6.82%  multi-fault-all  [kernel]                  [k] page_add_new_anon_rmap
     5.86%  multi-fault-all  [kernel]                  [k] __mem_cgroup_try_charg
     4.40%  multi-fault-all  [kernel]                  [k] __rmqueue
     4.14%  multi-fault-all  [kernel]                  [k] find_vma_speculative
     3.97%  multi-fault-all  [kernel]                  [k] handle_mm_fault
     3.52%  multi-fault-all  [kernel]                  [k] _raw_spin_lock
     3.46%  multi-fault-all  [kernel]                  [k] __mem_cgroup_commit_ch
     2.23%  multi-fault-all  [kernel]                  [k] bad_range
     2.16%  multi-fault-all  [kernel]                  [k] mem_cgroup_charge_comm
     1.96%  multi-fault-all  [kernel]                  [k] _raw_spin_lock_irq
     1.75%  multi-fault-all  [kernel]                  [k] mem_cgroup_add_lru_lis
     1.73%  multi-fault-all  [kernel]                  [k] page_fault
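
==
For reference, here is a minimal user-space sketch of the serialized-madvise
loop described above. This is not the actual multi-fault-all source (which is
not reproduced in this message); NR_THREADS, RANGE_SIZE, and the 60-second
cutoff are illustrative guesses chosen to match "./multi-fault-all 8" and the
~60s runs in the profiles.

/*
 * Sketch of the serialized-madvise test loop (illustrative, not the
 * original multi-fault-all).  All threads fault in their slice of a
 * shared anonymous mapping, meet at a barrier, thread 0 alone drops
 * the whole range with madvise(MADV_DONTNEED), and everyone meets at
 * a second barrier before the next round.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define NR_THREADS	8			/* guess: matches "./multi-fault-all 8" */
#define RANGE_SIZE	(64UL * 1024 * 1024)	/* guess: size of the test range */
#define RUN_SECONDS	60			/* the profiles above ran for ~60s */

static char *range;
static pthread_barrier_t barrier;
static time_t deadline;
static volatile int stop;	/* written by thread 0 between barriers */

static void *worker(void *arg)
{
	long id = (long)arg;
	size_t slice = RANGE_SIZE / NR_THREADS;
	char *base = range + id * slice;
	long pagesize = sysconf(_SC_PAGESIZE);
	size_t off;

	while (!stop) {
		/* touch memory: one write per page -> one page fault each */
		for (off = 0; off < slice; off += pagesize)
			base[off] = 1;

		pthread_barrier_wait(&barrier);

		/* madvise(MADV_DONTNEED), whole range, thread 0 only */
		if (id == 0) {
			if (madvise(range, RANGE_SIZE, MADV_DONTNEED) < 0)
				perror("madvise");
			if (time(NULL) >= deadline)
				stop = 1;	/* all threads see this after the barrier */
		}

		pthread_barrier_wait(&barrier);
	}
	return NULL;
}

int main(void)
{
	pthread_t tid[NR_THREADS];
	long i;

	range = mmap(NULL, RANGE_SIZE, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (range == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	deadline = time(NULL) + RUN_SECONDS;
	pthread_barrier_init(&barrier, NULL, NR_THREADS);
	for (i = 0; i < NR_THREADS; i++)
		pthread_create(&tid[i], NULL, worker, (void *)i);
	for (i = 0; i < NR_THREADS; i++)
		pthread_join(tid[i], NULL);
	return 0;
}

Because only thread 0 checks the clock, and it does so between the two
barriers, every thread observes the same value of "stop" after the second
barrier and all of them leave the loop in the same round.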
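==
Since the comparison above is "XADD rwsem" vs. the speculative-VMA patch, a
rough sketch of what the XADD variant means may help: down_read()/up_read()
become a single atomic add/sub (an XADD instruction on x86) on the fast path,
with a slow path taken only when the count indicates a writer. The shape below
is written in C11 atomics rather than the actual x86 assembly, and the count
layout and slow-path stubs are simplified stand-ins, not the kernel's real
bias arithmetic.

/*
 * Very rough shape of an XADD-style rwsem fast path.  Simplified
 * stand-in, not the kernel implementation.
 */
#include <stdatomic.h>

struct sketch_rwsem {
	/* >= 0: readers only; < 0: a writer holds or waits for the lock */
	atomic_long count;
	/* the real rwsem also carries a wait list and its spinlock */
};

/* stub: sleep until the writer is gone */
static void down_read_slowpath(struct sketch_rwsem *sem) { (void)sem; }
/* stub: wake the next waiter */
static void rwsem_wake(struct sketch_rwsem *sem) { (void)sem; }

static void down_read(struct sketch_rwsem *sem)
{
	/* one XADD: optimistically take a reader reference */
	if (atomic_fetch_add_explicit(&sem->count, 1, memory_order_acquire) < 0)
		down_read_slowpath(sem);	/* a writer is in the way */
}

static void up_read(struct sketch_rwsem *sem)
{
	/* one XADD: drop the reader reference */
	if (atomic_fetch_sub_explicit(&sem->count, 1, memory_order_release) < 0)
		rwsem_wake(sem);	/* a writer is queued; let the slow path decide */
}

Roughly speaking, what remains as the down_read_trylock/up_read cost in the
first profile (about 15% combined) is this per-fault atomic traffic on the
mmap_sem cacheline.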