* 2.6.9-rc1: page_referenced_one() CPU consumption
@ 2004-09-10 10:51 Nikita Danilov
2004-09-10 12:14 ` Hugh Dickins
0 siblings, 1 reply; 6+ messages in thread
From: Nikita Danilov @ 2004-09-10 10:51 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Hugh Dickins
Hello,
in 2.6.9-rc1 page_referenced_one() is among the top CPU consumers (which
wasn't the case for 2.6.8-rc2) in the host kernel when running a heavily
loaded UML. readprofile -b shows that the time is spent in
spin_lock(&mm->page_table_lock), so, I reckon, the recent "rmaplock: kill
page_map_lock" changes are probably not completely unrelated.
Without any deep investigation, one possible scenario is that multiple
threads are doing (as part of direct reclaim):

refill_inactive_zone()
  page_referenced()
    page_referenced_file()              /* (1) mapping->i_mmap_lock doesn't
                                           serialize them */
      page_referenced_one()
        spin_lock(&mm->page_table_lock) /* (2) everybody is
                                           serialized here */

(1) and (2) will be true if we have one huge address space with a lot of
VMAs, which seems to be exactly what UML does:
$ wc /proc/<UML-host-pid>/maps
4134 28931 561916
This didn't happen before, because page_referenced_one() used to
try-lock.
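[For readers outside mm/: the behavioural difference can be sketched in
user space with pthreads. This is purely illustrative - pthread mutexes are
not kernel spinlocks, and the function names below are made up, not kernel
APIs:]

```c
/* User-space sketch of the difference described above: with a blocking
 * lock, every scanning thread serializes on the one mm->page_table_lock;
 * with a trylock, a scanner that loses the race gives up and moves on. */
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t page_table_lock = PTHREAD_MUTEX_INITIALIZER;

/* 2.6.9-rc1 behaviour: always wait for the lock. */
static int referenced_blocking(void)
{
	pthread_mutex_lock(&page_table_lock);	/* all scanners pile up here */
	int referenced = 1;			/* pretend we found a young pte */
	pthread_mutex_unlock(&page_table_lock);
	return referenced;
}

/* Pre-2.6.9 behaviour: give up if someone else holds the lock. */
static int referenced_trylock(void)
{
	if (pthread_mutex_trylock(&page_table_lock) != 0)
		return 0;			/* contended: report "not referenced" */
	int referenced = 1;
	pthread_mutex_unlock(&page_table_lock);
	return referenced;
}
```

[With the trylock variant, a contended scanner reports "not referenced"
and continues instead of spinning - which is what made the lock disappear
from the old profiles.]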
Nikita.
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
2004-09-10 10:51 2.6.9-rc1: page_referenced_one() CPU consumption Nikita Danilov
@ 2004-09-10 12:14 ` Hugh Dickins
2004-09-10 12:21 ` Hugh Dickins
0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2004-09-10 12:14 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Linux Kernel Mailing List
On Fri, 10 Sep 2004, Nikita Danilov wrote:
>
> in 2.6.9-rc1 page_referenced_one() is among the top CPU consumers (which
> wasn't the case for 2.6.8-rc2) in the host kernel when running a heavily
> loaded UML. readprofile -b shows that the time is spent in
> spin_lock(&mm->page_table_lock), so, I reckon, the recent "rmaplock: kill
> page_map_lock" changes are probably not completely unrelated.
>
> Without any deep investigation, one possible scenario is that multiple
> threads are doing (as part of direct reclaim),
>
> refill_inactive_zone()
>   page_referenced()
>     page_referenced_file()              /* (1) mapping->i_mmap_lock doesn't
>                                            serialize them */
>       page_referenced_one()
>         spin_lock(&mm->page_table_lock) /* (2) everybody is
>                                            serialized here */
>
> (1) and (2) will be true if we have one huge address space with a lot of
> VMAs, which seems to be exactly what UML does:
>
> $ wc /proc/<UML-host-pid>/maps
> 4134 28931 561916
>
> This didn't happen before, because page_referenced_one() used to
> try-lock.
I'd be very surprised if you're wrong.
I remarked on that in the ChangeLog comment: "Though I suppose
it's possible that we'll find that vmscan makes better progress with
trylocks than spinning - we're free to choose trylocks again if so."
I'm quite content to go back to a trylock in page_referenced_one - and
in try_to_unmap_one? But yours is the first report of an issue there,
so I'm inclined to wait for more reports (which should come flooding in
now you mention it!), and input from those with a better grasp than I
of how vmscan pans out in practice (Andrew, Nick, Con spring to mind).
Hugh
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
2004-09-10 12:14 ` Hugh Dickins
@ 2004-09-10 12:21 ` Hugh Dickins
2004-09-11 1:01 ` Nick Piggin
0 siblings, 1 reply; 6+ messages in thread
From: Hugh Dickins @ 2004-09-10 12:21 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Linux Kernel Mailing List
On Fri, 10 Sep 2004, Hugh Dickins wrote:
>
> I'm quite content to go back to a trylock in page_referenced_one - and
> in try_to_unmap_one? But yours is the first report of an issue there,
> so I'm inclined to wait for more reports (which should come flooding in
> now you mention it!), and input from those with a better grasp than I
> of how vmscan pans out in practice (Andrew, Nick, Con spring to mind).
Just want to add, that there'd be little point in changing that back
to a trylock, if vmscan ends up cycling hopelessly around a larger
loop - though if the larger loop is more preemptible, that's a plus.
Hugh
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
2004-09-10 12:21 ` Hugh Dickins
@ 2004-09-11 1:01 ` Nick Piggin
2004-09-12 15:53 ` Nikita Danilov
0 siblings, 1 reply; 6+ messages in thread
From: Nick Piggin @ 2004-09-11 1:01 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Nikita Danilov, Linux Kernel Mailing List
Hugh Dickins wrote:
> On Fri, 10 Sep 2004, Hugh Dickins wrote:
>
>>I'm quite content to go back to a trylock in page_referenced_one - and
>>in try_to_unmap_one? But yours is the first report of an issue there,
>>so I'm inclined to wait for more reports (which should come flooding in
>>now you mention it!), and input from those with a better grasp than I
>>of how vmscan pans out in practice (Andrew, Nick, Con spring to mind).
>
>
> Just want to add, that there'd be little point in changing that back
> to a trylock, if vmscan ends up cycling hopelessly around a larger
> loop - though if the larger loop is more preemptible, that's a plus.
>
Yeah - I'm not sure why a trylock would perform better. If it is just
one big address space, and memory needs to be freed, presumably the
scanner will just choose a different page, and try the lock again.
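[A single-threaded toy model of that point (plain C, names made up - not
kernel code): if every page in the zone maps into the same mm, a scanner
that fails the trylock just skips the page and has to come back for it on
a later pass, so the total lock work is not reduced:]

```c
/* Toy reclaim scan: each page needs the (contended) lock once; a failed
 * trylock means "skip and revisit later", not "work avoided". */
#include <assert.h>
#include <stdbool.h>

#define NPAGES 8

/* Pretend the lock is busy on the first attempt for each page. */
static bool trylock_succeeds(int attempt)
{
	return attempt > 0;
}

/* Scan until every page is processed, skipping on trylock failure. */
static int scan_with_trylock(void)
{
	bool done[NPAGES] = { false };
	int attempts_per_page[NPAGES] = { 0 };
	int total_attempts = 0, remaining = NPAGES;

	while (remaining > 0) {
		for (int i = 0; i < NPAGES; i++) {
			if (done[i])
				continue;
			total_attempts++;
			if (trylock_succeeds(attempts_per_page[i]++)) {
				done[i] = true;	/* "page_referenced_one" ran */
				remaining--;
			}	/* else: skip, revisit on the next pass */
		}
	}
	return total_attempts;	/* 2 * NPAGES here: nothing was saved */
}
```

[The scanner still ends up paying for every page; the trylock only changes
*where* it waits, not how much scanning it does.]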
Feel like doing a few more quick tests Nikita? ;)
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
2004-09-11 1:01 ` Nick Piggin
@ 2004-09-12 15:53 ` Nikita Danilov
2004-09-13 4:53 ` Nick Piggin
0 siblings, 1 reply; 6+ messages in thread
From: Nikita Danilov @ 2004-09-12 15:53 UTC (permalink / raw)
To: Nick Piggin; +Cc: Hugh Dickins, Linux Kernel Mailing List
Nick Piggin writes:
> Hugh Dickins wrote:
> > On Fri, 10 Sep 2004, Hugh Dickins wrote:
> >
> >>I'm quite content to go back to a trylock in page_referenced_one - and
> >>in try_to_unmap_one? But yours is the first report of an issue there,
> >>so I'm inclined to wait for more reports (which should come flooding in
> >>now you mention it!), and input from those with a better grasp than I
> >>of how vmscan pans out in practice (Andrew, Nick, Con spring to mind).
> >
> >
> > Just want to add, that there'd be little point in changing that back
> > to a trylock, if vmscan ends up cycling hopelessly around a larger
> > loop - though if the larger loop is more preemptible, that's a plus.
> >
>
> Yeah - I'm not sure why a trylock would perform better. If it is just
> one big address space, and memory needs to be freed, presumably the
> scanner will just choose a different page, and try the lock again.
>
> Feel like doing a few more quick tests Nikita? ;)
Ok, here are my highly unscientific results.
Work-load: copying a 1G _byte_ file from the XNU lustre client to the UML
lustre server running in the 2.6.9-rc1 host.
Top CPU consumers according to readprofile (columns: total ticks, function
name, normalized load, i.e. ticks divided by function length):
2.6.9-rc1 vanilla:
3312 prio_tree_parent 41.4000
3483 ide_do_request 3.9806
4899 page_referenced_file 25.1231
6138 __copy_from_user_ll 78.6923
7461 get_offset_pmtmr 54.0652
7492 __copy_to_user_ll 96.0513
8042 finish_task_switch 76.5905
9657 prio_tree_next 76.0394
10083 wait_task_stopped 11.7517
11080 sigprocmask 53.0144
13345 prio_tree_right 65.4167
14956 vma_prio_tree_next 173.9070
15838 prio_tree_left 92.6199
27049 eligible_child 124.0780
28533 try_to_unmap_one 64.8477
33810 system_call 768.4091
49865 __preempt_spin_lock 547.9670
56045 do_wait 47.5360
109964 page_referenced_one 388.5654
1529155 mwait_idle 19604.5513
2011318 total 0.7514
2.6.9-rc1 with patch (below) applied:
2999 prio_tree_parent 37.4875
3012 ide_outbsync 301.2000
3272 ide_do_request 3.7394
4365 page_referenced_file 22.3846
6031 __copy_to_user_ll 77.3205
6296 __copy_from_user_ll 80.7179
7563 get_offset_pmtmr 54.8043
7698 finish_task_switch 73.3143
9133 prio_tree_next 71.9134
9817 wait_task_stopped 11.4417
13242 prio_tree_right 64.9118
13620 vma_prio_tree_next 158.3721
15736 prio_tree_left 92.0234
17768 sigprocmask 85.0144
26260 try_to_unmap_one 59.9543
27096 eligible_child 124.2936
28141 system_call 639.5682
41002 __preempt_spin_lock 450.5714
58325 do_wait 49.4699
101512 page_referenced_one 347.6438
1648567 mwait_idle 21135.4744
2107521 total 0.7874
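[A quick arithmetic sanity check on the two profiles above - numbers
copied from the tables; comparing shares of total ticks rather than raw
ticks, since the totals differ between runs:]

```c
/* Percentage share of total profiled ticks that a function accounts for. */
#include <assert.h>

static double tick_share(double func_ticks, double total_ticks)
{
	return 100.0 * func_ticks / total_ticks;
}
```

[page_referenced_one goes from roughly 5.5% of total ticks (109964 of
2011318, vanilla) to roughly 4.8% (101512 of 2107521, patched) - a shift
small enough to plausibly be run-to-run noise, consistent with the
conclusion below.]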
Patch:
----------------------------------------------------------------------
===== mm/rmap.c 1.77 vs edited =====
--- 1.77/mm/rmap.c 2004-08-24 13:08:39 +04:00
+++ edited/mm/rmap.c 2004-09-12 19:05:26 +04:00
@@ -268,7 +268,8 @@
 	if (address == -EFAULT)
 		goto out;
 
-	spin_lock(&mm->page_table_lock);
+	if (!spin_trylock(&mm->page_table_lock))
+		goto out;
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
----------------------------------------------------------------------
I ran the tests a few times, and the difference between the patched and
un-patched kernels is within noise, so you are right: the try-lock does
not help.
But now I have a new great idea instead. :)
I think page_referenced() should transfer dirtiness to the struct page
as it scans the pte's. Basically, the earlier we mark a page dirty, the
better file-system write-back performs, because the page has more chances
to be bulk-written by ->writepages(). This is better than my previous
patches to this end (which used a separate function to transfer dirtiness
from the pte's to the page), because
- locking overhead is avoided
- it's simpler.
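[A toy user-space model of the idea (plain C; the toy_* types below are
made-up stand-ins for struct page and pte_t, not kernel interfaces): while
the reference scan visits each pte anyway, it moves any pte dirty bit it
sees onto the page, so writeback notices the page sooner:]

```c
/* Toy page_referenced_one() that also transfers pte dirtiness to the
 * page as a side effect of the reference scan. */
#include <assert.h>
#include <stdbool.h>

struct toy_pte  { bool young; bool dirty; };
struct toy_page { bool dirty; };

/* Returns whether the pte was recently referenced; as a side effect,
 * transfers pte dirtiness to the page, as proposed above. */
static bool toy_page_referenced_one(struct toy_page *page,
				    struct toy_pte *pte)
{
	if (pte->dirty) {
		page->dirty = true;	/* mark early; writeback sees it sooner */
		pte->dirty = false;	/* dirtiness now lives on the page */
	}
	bool young = pte->young;
	pte->young = false;		/* clear the accessed bit, as the scan does */
	return young;
}
```

[After one scan the page carries the dirty bit and no extra lock round-trip
was needed, since the scan already held the page-table lock.]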
Nick, are you still in business of benchmarking random VM patches? :-)
Nikita.
* Re: 2.6.9-rc1: page_referenced_one() CPU consumption
2004-09-12 15:53 ` Nikita Danilov
@ 2004-09-13 4:53 ` Nick Piggin
0 siblings, 0 replies; 6+ messages in thread
From: Nick Piggin @ 2004-09-13 4:53 UTC (permalink / raw)
To: Nikita Danilov; +Cc: Hugh Dickins, Linux Kernel Mailing List
Nikita Danilov wrote:
> I ran the tests a few times, and the difference between the patched and
> un-patched kernels is within noise, so you are right: the try-lock does
> not help.
>
Well I'm glad - because I much prefer the spin_lock over the trylock :)
> But now I have a new great idea instead. :)
>
> I think page_referenced() should transfer dirtiness to the struct page
> as it scans the pte's. Basically, the earlier we mark a page dirty, the
> better file-system write-back performs, because the page has more chances
> to be bulk-written by ->writepages(). This is better than my previous
> patches to this end (which used a separate function to transfer dirtiness
> from the pte's to the page), because
>
> - locking overhead is avoided
>
> - it's simpler.
>
> Nick, are you still in business of benchmarking random VM patches? :-)
>
Yeah I am, and I do have that patch sitting around. It can *really*
help for writeout via mapped memory (obviously doesn't help write()).
I think Andrew's response was that it can theoretically cause writeout
for workloads that don't want it, so I should come up with at least
one real-world improvement!
Thread overview: 6+ messages
2004-09-10 10:51 2.6.9-rc1: page_referenced_one() CPU consumption Nikita Danilov
2004-09-10 12:14 ` Hugh Dickins
2004-09-10 12:21 ` Hugh Dickins
2004-09-11 1:01 ` Nick Piggin
2004-09-12 15:53 ` Nikita Danilov
2004-09-13 4:53 ` Nick Piggin