* [PATCH] fix obj vma sorting
From: Hugh Dickins @ 2003-04-08 18:16 UTC
To: Andrew Morton; +Cc: Dave McCracken, linux-kernel
Fix several points in objrmap's vma sorting:
1. It was adding all vmas, even private ones, to i_mmap_shared.
2. It was not quite sorting: list_add_tail is needed in all cases.
3. If vm_pgoff is changed on a file vma (as in vma_merge and split_vma)
we must unlink vma from list and relink while holding i_shared_sem:
move_vma_start to do this (holds page_table_lock too, as vma_merge
did and split_vma did not: I think nothing needs that, rip it out
if you like, but my guess was that you'd prefer the extra safety).
Sorry, no, this doesn't magically make it all a hundred times faster.
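Point 2 above can be illustrated with a small userspace sketch. It assumes a simplified circular list (struct node and all helper names here are hypothetical stand-ins, not the kernel's list.h): the search stops at the first entry whose key is >= the new key, so the new entry has to go in front of that entry in every case, which is what list_add_tail(new, pos) does. Inserting after the stopping point, as the old code did, can leave the list out of order.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for a list sorted by vm_pgoff; not the kernel's list.h. */
struct node {
	struct node *prev, *next;
	unsigned long pgoff;	/* sort key; unused in the list head */
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

/* Link new immediately before pos, like list_add_tail(new, pos). */
static void insert_before(struct node *new, struct node *pos)
{
	assert(new != NULL && pos != NULL);
	new->prev = pos->prev;
	new->next = pos;
	pos->prev->next = new;
	pos->prev = new;
}

/* Link new immediately after pos, like list_add(new, pos). */
static void insert_after(struct node *new, struct node *pos)
{
	insert_before(new, pos->next);
}

/* Walk to the first entry with key >= pgoff, or back round to the head. */
static struct node *find_slot(struct node *head, unsigned long pgoff)
{
	struct node *pos;

	for (pos = head->next; pos != head; pos = pos->next)
		if (pos->pgoff >= pgoff)
			break;
	return pos;
}

/* Fixed behaviour: always insert in front of the stopping point. */
static void sorted_insert(struct node *head, struct node *new)
{
	insert_before(new, find_slot(head, new->pgoff));
}

/* Old behaviour: insert after the stopping point unless it is the head. */
static void old_insert(struct node *head, struct node *new)
{
	struct node *pos = find_slot(head, new->pgoff);

	if (pos == head)
		insert_before(new, pos);
	else
		insert_after(new, pos);
}

/* Return 1 if the keys are in non-decreasing order. */
static int is_sorted(const struct node *head)
{
	const struct node *pos;

	for (pos = head->next; pos->next != head; pos = pos->next)
		if (pos->pgoff > pos->next->pgoff)
			return 0;
	return 1;
}
```

Inserting keys 0, 20, 10 in that order with sorted_insert gives 0, 10, 20, while old_insert gives 0, 20, 10. The sketch covers only the ordering point; it does not model the choice between i_mmap and i_mmap_shared, or any locking.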
--- 2.5.67-mm1/mm/mmap.c Tue Apr 8 14:02:06 2003
+++ linux/mm/mmap.c Tue Apr 8 18:06:07 2003
@@ -321,16 +321,13 @@
 		else
 			vmhead = &mapping->i_mmap;
 
-		list_for_each(vmlist, &mapping->i_mmap_shared) {
+		list_for_each(vmlist, vmhead) {
 			struct vm_area_struct *vmtemp;
 			vmtemp = list_entry(vmlist, struct vm_area_struct, shared);
 			if (vmtemp->vm_pgoff >= vma->vm_pgoff)
 				break;
 		}
-		if (vmlist == vmhead)
-			list_add_tail(&vma->shared, vmlist);
-		else
-			list_add(&vma->shared, vmlist);
+		list_add_tail(&vma->shared, vmlist);
 	}
 }
 
@@ -366,6 +363,28 @@
 	validate_mm(mm);
 }
 
+static void move_vma_start(struct vm_area_struct *vma, unsigned long addr)
+{
+	spinlock_t *lock = &vma->vm_mm->page_table_lock;
+	struct inode *inode = NULL;
+
+	if (vma->vm_file) {
+		inode = vma->vm_file->f_dentry->d_inode;
+		down(&inode->i_mapping->i_shared_sem);
+	}
+	spin_lock(lock);
+	if (inode)
+		__remove_shared_vm_struct(vma, inode);
+	/* If no vm_file, perhaps we should always keep vm_pgoff at 0?? */
+	vma->vm_pgoff += (long)(addr - vma->vm_start) >> PAGE_SHIFT;
+	vma->vm_start = addr;
+	if (inode) {
+		__vma_link_file(vma);
+		up(&inode->i_mapping->i_shared_sem);
+	}
+	spin_unlock(lock);
+}
+
 /*
  * Return true if we can merge this (vm_flags,file,vm_pgoff,size)
  * in front of (at a lower virtual address and file offset than) the vma.
@@ -422,8 +441,6 @@
 			unsigned long end, unsigned long vm_flags,
 			struct file *file, unsigned long pgoff)
 {
-	spinlock_t * lock = &mm->page_table_lock;
-
 	if (!prev) {
 		prev = rb_entry(rb_parent, struct vm_area_struct, vm_rb);
 		goto merge_next;
@@ -435,6 +452,7 @@
 	if (prev->vm_end == addr &&
 			can_vma_merge_after(prev, vm_flags, file, pgoff)) {
 		struct vm_area_struct *next;
+		spinlock_t *lock = &mm->page_table_lock;
 		struct inode *inode = file ? file->f_dentry->d_inode : NULL;
 		int need_up = 0;
 
@@ -480,10 +498,7 @@
 				pgoff, (end - addr) >> PAGE_SHIFT))
 			return 0;
 		if (end == prev->vm_start) {
-			spin_lock(lock);
-			prev->vm_start = addr;
-			prev->vm_pgoff -= (end - addr) >> PAGE_SHIFT;
-			spin_unlock(lock);
+			move_vma_start(prev, addr);
 			return 1;
 		}
 	}
@@ -1203,8 +1218,7 @@
 
 	if (new_below) {
 		new->vm_end = addr;
-		vma->vm_start = addr;
-		vma->vm_pgoff += ((addr - new->vm_start) >> PAGE_SHIFT);
+		move_vma_start(vma, addr);
 	} else {
 		vma->vm_end = addr;
 		new->vm_start = addr;
* Re: [PATCH] fix obj vma sorting
From: Martin J. Bligh @ 2003-04-09 17:07 UTC
To: Hugh Dickins, Andrew Morton; +Cc: Dave McCracken, linux-kernel
Hmmm. Something somewhere went wrong. Some semaphore blew up
somewhere ... I'm not convinced that your patch is causing the
problem; I only suspected it because vma_link seems to have gone
up rather a lot in the profile. I'm working on getting some better
data on what actually happened, but I'm posting this now in case
someone is feeling psychic.
The main thing I changed here (66-mjb2 -> 67-mjb0.2) was to pick up
Andrew's rmap speedups, and drop the objrmap code I had for the stuff
he had. *However*, what he had worked fine. I also picked up your
sorting patch here Hugh ... this bit worries me:
+static void move_vma_start(struct vm_area_struct *vma, unsigned long addr)
+{
+	spinlock_t *lock = &vma->vm_mm->page_table_lock;
+	struct inode *inode = NULL;
+
+	if (vma->vm_file) {
+		inode = vma->vm_file->f_dentry->d_inode;
+		down(&inode->i_mapping->i_shared_sem);
+	}
+	spin_lock(lock);
+	if (inode)
+		__remove_shared_vm_struct(vma, inode);
+	/* If no vm_file, perhaps we should always keep vm_pgoff at 0?? */
+	vma->vm_pgoff += (long)(addr - vma->vm_start) >> PAGE_SHIFT;
+	vma->vm_start = addr;
+	if (inode) {
+		__vma_link_file(vma);
+		up(&inode->i_mapping->i_shared_sem);
+	}
+	spin_unlock(lock);
+}
M.
DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.
Results are shown as percentages of the first set displayed

SDET 128  (see disclaimer)
                           Throughput    Std. Dev
    2.5.66                     100.0%        0.2%
    2.5.67                      97.7%        5.1%
    2.5.66-mm2                 176.1%        0.6%
    2.5.67-mm1                 176.7%        0.2%
    2.5.66-mjb2                181.8%        0.0%
    2.5.67-mjb0.2              141.1%        0.1%
diffprofile {2.5.66-mjb2,2.5.67-mjb0.2}/sdetbench/128/profile
(these are at 100 Hz).
12913 38.8% default_idle
12472 20.2% total
3085 912.7% __down
1026 385.7% schedule
946 666.2% __wake_up
904 0.0% __d_lookup
626 0.0% move_vma_start
452 6457.1% __vma_link
159 40.9% remove_shared_vm_struct
84 36.4% do_no_page
69 22.5% copy_mm
65 125.0% vma_link
37 528.6% default_wake_function
31 9.0% do_wp_page
29 290.0% rb_insert_color
19 95.0% try_to_wake_up
18 450.0% __vma_link_rb
17 6.0% clear_page_tables
15 20.3% handle_mm_fault
14 140.0% find_vma_prepare
14 700.0% __rb_rotate_left
13 46.4% exit_mmap
11 110.0% kunmap_atomic
10 24.4% do_mmap_pgoff
...
-102 -58.6% __read_lock_failed
-124 -43.5% path_release
-126 -17.7% __copy_to_user_ll
-168 -20.1% release_pages
-189 -21.1% page_add_rmap
-223 -33.2% path_lookup
-241 -15.3% zap_pte_range
-247 -18.3% page_remove_rmap
-310 -46.5% follow_mount
-405 -70.7% .text.lock.dcache
-425 -76.6% .text.lock.namei
-551 -49.5% atomic_dec_and_lock
-628 -71.2% .text.lock.dec_and_lock
-1148 -98.5% d_lookup
diffprofile {2.5.67-mm1,2.5.67-mjb0.2}/sdetbench/128/profile
110028 31.3% default_idle
92085 14.2% total
31265 1054.5% __down
10473 428.0% schedule
9351 611.6% __wake_up
6260 0.0% move_vma_start
4200 1076.9% __vma_link
1328 32.0% remove_shared_vm_struct
831 1695.9% find_trylock_page
567 17.8% copy_mm
428 57.7% vma_link
380 633.3% default_wake_function
294 306.2% rb_insert_color
182 87.5% try_to_wake_up
177 411.6% __vma_link_rb
158 62.7% exit_mmap
150 0.0% rcu_do_batch
135 540.0% __rb_rotate_left
...
-196 -31.3% block_invalidatepage
-202 -39.5% ext2_new_block
-204 -54.5% .text.lock.inode
-208 -33.1% task_mem
-213 -6.6% clear_page_tables
-213 -55.6% d_lookup
-218 -44.7% select_parent
-228 -21.2% kmap_high
-235 -46.5% read_block_bitmap
-241 -47.2% d_path
-244 -57.5% complete
-261 -100.0% group_release_blocks
-263 -31.6% proc_root_link
-264 -17.9% number
-295 -38.6% strnlen_user
-296 -26.8% task_vsize
-320 -49.2% generic_file_aio_write_nolock
-331 -75.1% call_rcu
-334 -23.3% __fput
-336 -56.4% may_open
-339 -35.3% dput
-343 -51.0% __find_get_block_slow
-348 -40.6% d_instantiate
-354 -65.1% __alloc_pages
-371 -41.6% prune_dcache
-377 -100.0% group_reserve_blocks
-380 -22.6% release_task
-398 -55.4% generic_fillattr
-398 -31.4% exit_notify
-420 -51.9% unmap_vmas
-424 -35.5% file_kill
-427 -72.7% read_inode_bitmap
-435 -41.2% proc_check_root
-459 -16.3% free_pages_and_swap_cache
-480 -14.3% do_anonymous_page
-517 -43.9% ext2_new_inode
-519 -72.2% ext2_get_group_desc
-527 -35.9% fd_install
-537 -28.8% d_alloc
-559 -30.4% __find_get_block
-574 -49.7% __mark_inode_dirty
-575 -38.0% .text.lock.highmem
-580 -44.6% .text.lock.attr
-598 -24.6% file_move
-603 -23.4% copy_process
-628 -27.4% filemap_nopage
-633 -42.1% __set_page_dirty_buffers
-634 -24.0% proc_pid_stat
-636 -42.2% .text.lock.base
-705 -28.5% link_path_walk
-716 -60.9% flush_signal_handlers
-758 -11.5% __copy_to_user_ll
-780 -52.0% .text.lock.file_table
-781 -36.3% free_hot_cold_page
-834 -91.2% update_atime
-906 -42.2% buffered_rmqueue
-916 -56.0% __read_lock_failed
-920 -34.3% kmem_cache_free
-993 -38.1% path_release
-1002 -71.0% __brelse
-1106 -13.6% page_add_rmap
-1256 -39.7% pte_alloc_one
-1303 -29.3% do_no_page
-1365 -83.5% grab_block
-1522 -80.4% current_kernel_time
-1819 -21.4% release_pages
-1902 -14.2% copy_page_range
-2149 -32.4% path_lookup
-2150 -62.3% .text.lock.namei
-2464 -21.4% __d_lookup
-2486 -26.1% find_get_page
-2499 -41.2% follow_mount
-3174 -22.3% page_remove_rmap
-3217 -65.7% .text.lock.dcache
-4119 -42.3% atomic_dec_and_lock
-4359 -24.6% zap_pte_range
-4551 -100.0% .text.lock.filemap
-4665 -64.7% .text.lock.dec_and_lock
* Re: [PATCH] fix obj vma sorting
From: Hugh Dickins @ 2003-04-09 18:24 UTC
To: Martin J. Bligh; +Cc: Andrew Morton, Dave McCracken, linux-kernel
On Wed, 9 Apr 2003, Martin J. Bligh wrote:
> Hmmm. Something somewhere went wrong. Some semaphore blew up
> somewhere ... I'm not convinced that your patch is causing the
> problem; I only suspected it because vma_link seems to have gone
> up rather a lot in the profile. I'm working on getting some better
> data on what actually happened, but I'm posting this now in case
> someone is feeling psychic.
>
> The main thing I changed here (66-mjb2 -> 67-mjb0.2) was to pick up
> Andrew's rmap speedups, and drop the objrmap code I had for the stuff
I haven't examined it, but I'm guessing 66-mjb2 did not have Dave's
vma sorting in at all? Its linear search would certainly raise the
time spent in __vma_link (notable in your diffprofile), which would
increase the pressure on i_shared_sem.
(Whether it's a worthwhile optimization remains to be seen: like
rmap generally, it speeds up page_referenced and try_to_unmap at
the expense of the fast path. One improvement would be for fork
to just slot dst vma in next to src vma instead of linear search.)
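The fork shortcut suggested here could look roughly like the sketch below. It assumes a simplified circular list keyed by vm_pgoff; struct vma_node and link_next_to are hypothetical names, not kernel code. Because a forked copy maps the same file offset as its source, linking it directly beside the source keeps the list sorted without walking from the head.

```c
#include <assert.h>

/* Userspace stand-in for a vma on a pgoff-sorted circular list. */
struct vma_node {
	struct vma_node *prev, *next;
	unsigned long pgoff;
};

/* Link dst directly after src: constant time, no search from the head. */
static void link_next_to(struct vma_node *dst, struct vma_node *src)
{
	/* A forked copy maps the same offset, so adjacency preserves order. */
	assert(dst->pgoff == src->pgoff);
	dst->prev = src;
	dst->next = src->next;
	src->next->prev = dst;
	src->next = dst;
}
```

Locking is left out of the sketch; in the real code the insertion would presumably still happen under i_shared_sem.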
I don't think my fix to the sort order could have slowed it down
further (though once there are stray entries out of order, it may
be hard to predict how things will work out). But without it
page_referenced and try_to_unmap sometimes couldn't quite find
all the mappings they were looking for.
> he had. *However*, what he had worked fine. I also picked up your
> sorting patch here Hugh ... this bit worries me:
>
> +static void move_vma_start(struct vm_area_struct *vma, unsigned long addr)
It does use i_shared_sem where it wasn't used before, yes, but it's
only called by one case of vma_merge and one case of split_vma:
unless your tests are doing a lot of vma splitting (e.g. mprotecting
ranges which break up vmas), I wouldn't expect it to figure highly.
I can see it's there in the plus part of your diffprofile, but I'm
too inexperienced at reading these things, without the original
profiles, to tell whether it's being used a surprising amount.
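The two callers mentioned here move vm_start in opposite directions: vma_merge lowers it and split_vma raises it. A minimal sketch of the offset arithmetic follows, assuming 4 KiB pages (PAGE_SHIFT of 12) and a hypothetical struct range in place of the vma. The cast to long is what lets one expression serve both cases. Note that right-shifting a negative signed value is implementation-defined in ISO C; the sketch relies on the arithmetic shift that gcc provides.

```c
#include <assert.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two vma fields that matter here. */
struct range {
	unsigned long start;	/* first virtual address, page aligned */
	unsigned long pgoff;	/* file offset of start, in pages */
};

/* Move the start of the range to addr and keep pgoff in step with it. */
static void move_start(struct range *r, unsigned long addr)
{
	/* Both addresses are expected to be page aligned. */
	assert((addr & ((1UL << PAGE_SHIFT) - 1)) == 0);
	/* Signed delta: negative when the start moves down, positive when up. */
	r->pgoff += (long)(addr - r->start) >> PAGE_SHIFT;
	r->start = addr;
}
```

Starting from start 0x40002000 and pgoff 10, moving the start down to 0x40000000 gives pgoff 8, and then moving it up to 0x40003000 gives pgoff 11.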
When you say "*However*, what he had worked fine", are you saying
you profiled before adding in my patch on top? The diffprofile of
the before and after my patch should in that case illuminate.
Hugh
* Re: [PATCH] fix obj vma sorting
From: Martin J. Bligh @ 2003-04-09 18:33 UTC
To: Hugh Dickins; +Cc: Andrew Morton, Dave McCracken, linux-kernel
>> Hmmm. Something somewhere went wrong. Some semaphore blew up
>> somewhere ... I'm not convinced that your patch is causing the
>> problem; I only suspected it because vma_link seems to have gone
>> up rather a lot in the profile. I'm working on getting some better
>> data on what actually happened, but I'm posting this now in case
>> someone is feeling psychic.
>>
>> The main thing I changed here (66-mjb2 -> 67-mjb0.2) was to pick up
>> Andrew's rmap speedups, and drop the objrmap code I had for the stuff
>
> I haven't examined it, but I'm guessing 66-mjb2 did not have Dave's
> vma sorting in at all? Its linear search would certainly raise the
> time spent in __vma_link (notable in your diffprofile), which would
> increase the pressure on i_shared_sem.
No it didn't ... but I think 67-mm1 did.
> (Whether it's a worthwhile optimization remains to be seen: like
> rmap generally, it speeds up page_referenced and try_to_unmap at
> the expense of the fast path. One improvement would be for fork
> to just slot dst vma in next to src vma instead of linear search.)
>
> I don't think my fix to the sort order could have slowed it down
> further (though once there are stray entries out of order, it may
> be hard to predict how things will work out). But without it
> page_referenced and try_to_unmap sometimes couldn't quite find
> all the mappings they were looking for.
It is that fix ... I just backed that one patch off and recompared:
DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.
Results are shown as percentages of the first set displayed

SDET 32  (see disclaimer)
                           Throughput    Std. Dev
    2.5.67                     100.0%        0.3%
    2.5.67-mjb0.2              151.7%        0.5%
    2.5.67-mjb0.2-nosort       207.1%        0.0%

SDET 64  (see disclaimer)
                           Throughput    Std. Dev
    2.5.67                     100.0%        0.4%
    2.5.67-mjb0.2              147.0%        0.5%
    2.5.67-mjb0.2-nosort       201.5%        0.2%

SDET 128  (see disclaimer)
                           Throughput    Std. Dev
    2.5.67                     100.0%        5.1%
    2.5.67-mjb0.2              144.5%        0.1%
    2.5.67-mjb0.2-nosort       188.6%        0.3%
I think it's that sem, which seems to be heavily contended.
Quite possibly for glibc's address_space or something.
(Even though it says "-nosort", it's just your sort fix I
backed out ... otherwise it's what was in -mm.)
>> he had. *However*, what he had worked fine. I also picked up your
>> sorting patch here Hugh ... this bit worries me:
>>
>> +static void move_vma_start(struct vm_area_struct *vma, unsigned long addr)
>
> It does use i_shared_sem where it wasn't used before, yes, but it's
> only called by one case of vma_merge and one case of split_vma:
> unless your tests are doing a lot of vma splitting (e.g. mprotecting
> ranges which break up vmas), I wouldn't expect it to figure highly.
> I can see it's there in the plus part of your diffprofile, but I'm
> too inexperienced at reading these things, without the original
> profiles, to tell whether it's being used a surprising amount.
Here's the diffprofile for just your patch ... where it's positive,
that's the increase in the number of ticks by applying your patch.
Where it's negative, that's the decrease. The %age is the change from
the first to the second profile:
larry:/var/bench/results# diffprofile 2.5.67-mjb0.2{-nosort,}/sdetbench/64/profile
7148 24.9% total
6482 37.7% default_idle
1466 842.5% __down
442 566.7% __wake_up
435 378.3% schedule
251 0.0% move_vma_start
149 876.5% __vma_link
72 40.2% remove_shared_vm_struct
46 35.1% copy_mm
20 60.6% vma_link
12 300.0% default_wake_function
11 137.5% rb_insert_color
...
-20 -37.0% number
-20 -12.6% do_anonymous_page
-21 -36.8% fd_install
-23 -27.7% __find_get_block
-24 -55.8% flush_signal_handlers
-27 -45.0% __set_page_dirty_buffers
-28 -26.7% kmem_cache_free
-28 -7.5% find_get_page
-29 -34.1% buffered_rmqueue
-32 -34.8% path_release
-33 -32.0% file_move
-35 -60.3% __read_lock_failed
-35 -43.8% .text.lock.highmem
-37 -59.7% .text.lock.namei
-37 -29.1% pte_alloc_one
-40 -10.3% page_add_rmap
-41 -41.4% free_hot_cold_page
-44 -60.3% .text.lock.file_table
-54 -18.4% __copy_to_user_ll
-58 -43.0% follow_mount
-62 -29.0% path_lookup
-85 -20.9% __d_lookup
-86 -20.4% release_pages
-99 -68.8% .text.lock.dcache
-100 -15.4% page_remove_rmap
-106 -36.6% atomic_dec_and_lock
-126 -16.8% zap_pte_range
-141 -66.8% .text.lock.dec_and_lock
Note the massive increase in down() (and some of the vma ops).
The things that are cheaper are probably just because of less
contention, I guess.
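The two columns of the diffprofile above can be reproduced from the raw tick counts in the attached profiles. The sketch below is an assumption about the arithmetic, not the diffprofile tool itself; struct diff and diff_ticks are hypothetical names. The listing prints 0.0% for move_vma_start, which has no ticks in the first profile, so the sketch mirrors that.

```c
#include <assert.h>

/* One diffprofile-style row: tick delta and change relative to "before". */
struct diff {
	long delta;
	double percent;
};

static struct diff diff_ticks(long before, long after)
{
	struct diff d;

	assert(before >= 0 && after >= 0);
	d.delta = after - before;
	/* No baseline ticks: report 0.0 instead of dividing by zero. */
	d.percent = before ? 100.0 * d.delta / before : 0.0;
	return d;
}
```

For __down, 174 ticks without the patch and 1640 with it give a delta of 1466 and roughly 842.5%; for total, 28757 and 35905 give 7148 and roughly 24.9%, matching the rows above.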
> When you say "*However*, what he had worked fine", are you saying
> you profiled before adding in my patch on top? The diffprofile of
> the before and after my patch should in that case illuminate.
Well, I hadn't ... but I should have done, and I have now ;-)
I'll attach the two raw profiles for you as well. profile.with
is with your patch, profile.without is without ... I was looking
at SDET 64, since it showed the most dramatic difference.
M.
[-- Attachment #2: profile.with --]
[-- Type: application/octet-stream, Size: 9977 bytes --]
35905 total
23653 default_idle
1640 __down
622 zap_pte_range
580 copy_page_range
551 page_remove_rmap
550 schedule
520 __wake_up
349 page_add_rmap
347 find_get_page
335 release_pages
321 __d_lookup
251 remove_shared_vm_struct
251 move_vma_start
239 __copy_to_user_ll
184 atomic_dec_and_lock
177 copy_mm
166 __vma_link
163 do_wp_page
152 path_lookup
150 do_no_page
139 do_anonymous_page
136 clear_page_tables
121 free_pages_and_swap_cache
116 do_page_fault
90 pte_alloc_one
89 filemap_nopage
88 copy_process
77 kmem_cache_free
77 follow_mount
70 file_move
70 .text.lock.dec_and_lock
65 link_path_walk
62 release_task
60 path_release
60 __find_get_block
58 proc_pid_stat
58 free_hot_cold_page
56 buffered_rmqueue
54 find_trylock_page
53 vma_link
52 page_address
51 grab_block
51 d_alloc
51 __block_prepare_write
45 system_call
45 .text.lock.highmem
45 .text.lock.dcache
43 fput
43 exit_notify
42 __fput
41 kmap_atomic
41 __copy_user_intel
41 __copy_from_user_ll
39 handle_mm_fault
38 kmap_high
38 file_kill
36 find_vma
36 fd_install
34 number
34 alloc_inode
33 __set_page_dirty_buffers
31 fget
31 ext2_new_inode
29 new_inode
29 dnotify_parent
29 .text.lock.file_table
28 kmalloc
25 .text.lock.namei
24 set_page_address
24 ext2_update_inode
24 dentry_open
23 task_vsize
23 exit_mmap
23 dput
23 do_generic_mapping_read
23 deny_write_access
23 __read_lock_failed
22 vsnprintf
22 real_lookup
22 do_mmap_pgoff
22 block_invalidatepage
21 d_instantiate
21 __mark_inode_dirty
20 radix_tree_lookup
20 current_kernel_time
19 rb_insert_color
19 flush_signal_handlers
18 unmap_vmas
18 strnlen_user
18 pte_alloc_map
18 file_ra_state_init
18 ext2_new_block
18 do_page_cache_readahead
17 __generic_file_aio_read
16 prune_dcache
16 ext2_free_blocks
16 default_wake_function
15 read_block_bitmap
15 proc_pid_status
15 generic_file_aio_write_nolock
15 generic_delete_inode
15 exec_mmap
15 __insert_inode_hash
14 task_mem
14 select_parent
14 render_sigset_t
14 get_pid_list
14 do_lookup
14 __brelse
13 igrab
13 find_vma_prepare
13 filp_close
12 vfs_read
12 sys_brk
12 prep_new_page
12 find_group_other
12 del_timer_sync
12 d_delete
12 __pagevec_lru_add_active
12 __find_get_block_slow
11 unlock_page
11 try_to_wake_up
11 truncate_inode_pages
11 split_vma
11 proc_check_root
11 may_open
11 kunmap_high
11 kunmap_atomic
11 generic_fillattr
11 find_get_pages
11 ext2_find_entry
11 copy_files
11 .text.lock.attr
10 proc_root_link
10 proc_fd_link
10 open_namei
10 mark_page_accessed
10 inode_change_ok
10 ext2_get_inode
10 ext2_get_block
10 dup_task_struct
10 copy_strings
9 wake_up_forked_process
9 sys_wait4
9 strncpy_from_user
9 read_inode_bitmap
9 flush_tlb_mm
9 ext2_preread_inode
9 ext2_get_group_desc
9 ext2_discard_prealloc
9 ext2_add_link
9 create_buffers
9 __vma_link_rb
9 .text.lock.base
8 get_unused_fd
8 complete
8 __alloc_pages
7 vm_enough_memory
7 vfs_unlink
7 try_to_free_buffers
7 flush_old_exec
7 ext2_truncate
7 ext2_reserve_inode
7 do_sigaction
7 d_lookup
7 __pte_chain_free
7 .text.lock.inode
6 vm_acct_memory
6 sigprocmask
6 page_cache_readahead
6 get_write_access
6 get_empty_filp
6 generic_file_write
6 do_exit
6 dnotify_flush
6 __block_commit_write
5 vma_merge
5 sys_read
5 sys_close
5 pte_chain_alloc
5 old_mmap
5 lru_cache_add_active
5 get_wchan
5 get_signal_to_deliver
5 generic_file_mmap
5 flush_tlb_page
5 file_read_actor
5 ext2_block_to_path
5 call_rcu
4 vfs_write
4 vfs_getattr
4 update_atime
4 sys_open
4 schedule_tail
4 proc_pid_readlink
4 proc_delete_inode
4 pipe_write
4 pid_fd_revalidate
4 load_elf_binary
4 kmem_cache_alloc
4 generic_file_open
4 ext2_inode_by_name
4 ext2_commit_chunk
4 do_munmap
4 d_path
4 cp_new_stat64
4 build_mmap_rb
4 bad_range
4 __rb_rotate_left
4 __rb_erase_color
4 __pagevec_lru_add
4 __lookup
3 wait_task_zombie
3 vfs_readdir
3 vfs_permission
3 unmap_vma
3 set_cpus_allowed
3 search_binary_handler
3 sched_best_cpu
3 proc_info_read
3 pid_revalidate
3 lookup_mnt
3 iput
3 inode_update_time
3 getname
3 generic_file_read
3 find_lock_page
3 ext2_release_inode
3 ext2_readdir
3 ext2_get_page
3 ext2_free_inode
3 ext2_free_branches
3 eventpoll_release
3 eligible_child
3 do_brk
3 clear_user
3 alloc_pidmap
3 __set_page_dirty_nobuffers
3 __rb_rotate_right
3 __iget
2 wait_for_completion
2 unmap_region
2 unmap_page_range
2 sys_unlink
2 sys_ioctl
2 setup_arg_pages
2 set_fs_pwd
2 sem_exit
2 remove_wait_queue
2 rcu_do_batch
2 put_filp
2 put_files_struct
2 profile_exit_mmap
2 proc_root_lookup
2 proc_pid_lookup
2 proc_lookup
2 proc_base_lookup
2 prepare_binprm
2 pipe_read
2 pid_base_iput
2 pgd_free
2 pgd_ctor
2 permission
2 page_waitqueue
2 page_cache_readaround
2 mm_alloc
2 migration_thread
2 mark_buffer_dirty
2 lru_add_drain
2 lookup_hash
2 locks_remove_posix
2 load_elf_interp
2 kstat_read_proc
2 kill_fasync
2 kfree_percpu
2 inode_times_differ
2 generic_file_llseek
2 generic_commit_write
2 ext2_lookup
2 ext2_delete_entry
2 drop_buffers
2 do_execve
2 create_empty_buffers
2 copy_thread
2 copy_namespace
2 bad_get_user
2 __free_pages
2 __d_path
2 __clear_page_buffers
2 __bread
2 .text.lock.sem
1 zap_pmd_range
1 vfs_stat
1 vfs_rmdir
1 test_clear_page_dirty
1 task_dumpable
1 syscall_exit
1 sys_write
1 sys_rt_sigprocmask
1 sys_rt_sigaction
1 sys_newuname
1 sys_lstat64
1 sys_llseek
1 sys_execve
1 set_bh_page
1 rwsem_wake
1 ret_from_intr
1 remove_from_page_cache
1 read_cache_page
1 rb_erase
1 radix_tree_preload
1 radix_tree_insert
1 radix_tree_delete
1 profile_exit_task
1 proc_pid_make_inode
1 proc_lookupfd
1 pipe_wait
1 pgd_alloc
1 open_exec
1 notify_change
1 mprotect_fixup
1 mm_release
1 lru_cache_add
1 is_bad_inode
1 invalidate_vcache
1 insert_vm_struct
1 inode_setattr
1 inode_init_once
1 inode_has_buffers
1 get_unmapped_area
1 get_jiffies_64
1 generic_file_write_nolock
1 find_vma_prev
1 find_task_by_pid
1 find_group_orlov
1 filp_open
1 filldir64
1 ext2_statfs
1 ext2_prepare_write
1 ext2_group_sparse
1 ext2_count_free_inodes
1 ext2_alloc_block
1 exit_itimers
1 elf_map
1 do_fork
1 detach_vmas_to_be_unmapped
1 d_rehash
1 d_invalidate
1 d_free
1 count_open_files
1 change_protection
1 can_vma_merge_after
1 cache_grow
1 balance_dirty_pages_ratelimited
1 add_to_page_cache
1 __user_walk
1 __up
1 __pagevec_free
1 __getblk
1 __get_page_state
1 __get_free_pages
1 __copy_user_zeroing_intel
1 .text.lock.dnotify
0 write_profile
0 wake_up_buffer
0 wait_on_page_bit
0 vsscanf
0 vfs_mkdir
0 vfs_lstat
0 vfs_fstat
0 vfs_follow_link
0 vfs_create
0 unmap_vma_list
0 unmap_underlying_metadata
0 unlock_buffer
0 unix_create1
0 unix_create
0 try_to_release_page
0 truncate_complete_page
0 task_nice
0 syscall_call
0 sys_vhangup
0 sys_vfork
0 sys_utime
0 sys_time
0 sys_sysctl
0 sys_statfs
0 sys_sigreturn
0 sys_readlink
0 sys_munmap
0 sys_mprotect
0 sys_getpid
0 sys_fstat64
0 sys_dup2
0 sys_chown
0 sys_chmod
0 sys_chdir
0 sys_access
0 sprintf
0 sock_map_fd
0 sock_init_data
0 smp_call_function
0 skip_atoi
0 sk_alloc
0 si_swapinfo
0 setup_sigcontext
0 setup_frame
0 setattr_mask
0 set_binfmt
0 sched_migrate_task
0 save_i387_fxsave
0 save_i387
0 rwsem_down_write_failed
0 rwsem_down_read_failed
0 resume_userspace
0 restore_sigcontext
0 restore_fpu
0 restore_all
0 remove_suid
0 release_x86_irqs
0 recalc_bh_state
0 radix_tree_gang_lookup
0 radix_tree_extend
0 put_unused_fd
0 pty_unthrottle
0 pte_alloc_kernel
0 proc_root_readdir
0 proc_read_inode
0 proc_file_read
0 proc_file_lseek
0 proc_destroy_inode
0 proc_alloc_inode
0 prepare_to_copy
0 posix_block_lock
0 pipe_write_release
0 pipe_write_fasync
0 pipe_release
0 pipe_read_fasync
0 pipe_ioctl
0 pid_delete_dentry
0 parse_table
0 pagevec_lookup
0 page_slot
0 open_private_file
0 nr_free_pages
0 nr_blockdev_pages
0 nobh_prepare_write
0 next_thread
0 mmput
0 mm_init
0 lookup_one_len
0 lookup_create
0 lookup_chrfops
0 locks_remove_flock
0 lock_rename
0 kunmap
0 kmap_atomic_to_page
0 kfree_skbmem
0 kfree
0 kernel_read
0 is_subdir
0 invalidate_inode_buffers
0 invalidate_bh_lru
0 inode_sub_bytes
0 inode_add_bytes
0 init_new_context
0 init_dev
0 in_group_p
0 iget_locked
0 hash_vcache
0 handle_signal
0 handle_ra_miss
0 getrusage
0 get_zone_counts
0 get_zeroed_page
0 get_vmalloc_info
0 get_pipe_inode
0 get_offset_tsc
0 get_new_inode_fast
0 get_chrfops
0 generic_forget_inode
0 generic_drop_inode
0 fs_may_remount_ro
0 free_task_struct
0 free_pgtables
0 free_pages
0 free_buffer_head
0 follow_down
0 flush_tlb_others
0 flush_thread
0 flush_all_zero_pkmaps
0 finish_wait
0 find_busiest_node
0 filp_ctor
0 fillonedir
0 fcntl_dirnotify
0 fasync_helper
0 ext2_unlink
0 ext2_setattr
0 ext2_set_link
0 ext2_set_inode_flags
0 ext2_rmdir
0 ext2_release_file
0 ext2_put_inode
0 ext2_make_empty
0 ext2_last_byte
0 ext2_ioctl
0 ext2_get_branch
0 ext2_find_near
0 ext2_empty_dir
0 ext2_destroy_inode
0 ext2_delete_inode
0 ext2_create
0 ext2_check_page
0 ext2_bg_num_gdb
0 ext2_alloc_inode
0 ext2_alloc_branch
0 expand_stack
0 expand_files
0 expand_fd_array
0 error_code
0 end_page_writeback
0 down_tty_sem
0 do_sync_write
0 do_signal
0 do_pipe
0 do_mpage_readpage
0 do_gettimeofday
0 do_fcntl
0 destroy_context
0 de_put
0 d_validate
0 d_unhash
0 d_callback
0 d_alloc_root
0 create_elf_tables
0 cpu_sched_info
0 cpu_idle
0 count
0 copy_strings_kernel
0 copy_semundo
0 compute_creds
0 clear_inode
0 chrdev_open
0 chown_common
0 check_tty_count
0 cap_bprm_set_security
0 cap_bprm_compute_creds
0 can_share_swap_page
0 cached_lookup
0 bounce_copy_vec
0 block_truncate_page
0 block_prepare_write
0 block_commit_write
0 bh_waitq_head
0 bh_lru_install
0 bad_page
0 alloc_buffer_head
0 add_wait_queue
0 add_to_page_cache_lru
0 __set_page_buffers
0 __remove_from_page_cache
0 __posix_lock_file
0 __pmd_alloc
0 __pagevec_release
0 __mmdrop
0 __get_user_4
0 __down_failed
0 __cond_resched
0 __block_write_full_page
0 .text.lock.vcache
0 .text.lock.tty_io
0 .text.lock.sysctl
0 .text.lock.root
0 .text.lock.page_writeback
0 .text.lock.mmap
0 .text.lock.ioctl
0 .text.lock.ialloc
0 .text.lock.fs_writeback
0 .text.lock.fork
0 .text.lock.exec
0 .text.lock.char_dev
0 .text.lock.balloc
0 .text.lock.array
[-- Attachment #3: profile.without --]
[-- Type: application/octet-stream, Size: 10188 bytes --]
28757 total
17171 default_idle
748 zap_pte_range
651 page_remove_rmap
572 copy_page_range
421 release_pages
406 __d_lookup
389 page_add_rmap
375 find_get_page
293 __copy_to_user_ll
290 atomic_dec_and_lock
214 path_lookup
211 .text.lock.dec_and_lock
179 remove_shared_vm_struct
174 __down
163 do_no_page
161 do_wp_page
159 do_anonymous_page
144 .text.lock.dcache
138 free_pages_and_swap_cache
135 follow_mount
131 copy_mm
130 clear_page_tables
127 pte_alloc_one
115 schedule
109 do_page_fault
105 kmem_cache_free
103 file_move
101 filemap_nopage
99 free_hot_cold_page
92 path_release
92 copy_process
85 buffered_rmqueue
83 __find_get_block
80 .text.lock.highmem
78 __wake_up
77 proc_pid_stat
74 link_path_walk
73 .text.lock.file_table
70 d_alloc
66 release_task
66 find_trylock_page
62 .text.lock.namei
61 __block_prepare_write
60 __set_page_dirty_buffers
59 page_address
59 grab_block
58 __read_lock_failed
57 fd_install
56 kmap_high
55 exit_notify
55 __fput
54 number
54 file_kill
47 system_call
44 kmap_atomic
43 flush_signal_handlers
43 fget
42 fput
41 ext2_new_inode
40 handle_mm_fault
40 dnotify_parent
40 __copy_from_user_ll
39 alloc_inode
37 unmap_vmas
37 __copy_user_intel
36 new_inode
35 kmalloc
35 d_instantiate
35 block_invalidatepage
33 vma_link
33 ext2_update_inode
31 prune_dcache
31 find_vma
31 do_generic_mapping_read
31 __mark_inode_dirty
31 __insert_inode_hash
30 dput
30 do_page_cache_readahead
30 __brelse
29 strnlen_user
27 radix_tree_lookup
27 ext2_new_block
26 set_page_address
26 read_block_bitmap
26 generic_file_aio_write_nolock
26 dentry_open
26 .text.lock.attr
25 task_vsize
25 real_lookup
25 ext2_free_blocks
24 vsnprintf
24 do_mmap_pgoff
24 .text.lock.base
23 deny_write_access
21 __generic_file_aio_read
21 __find_get_block_slow
21 .text.lock.inode
20 proc_check_root
20 get_pid_list
20 find_get_pages
19 file_ra_state_init
19 ext2_find_entry
19 copy_files
18 truncate_inode_pages
18 task_mem
18 render_sigset_t
18 proc_root_link
18 proc_pid_status
18 exit_mmap
17 vfs_read
17 pte_alloc_map
17 kunmap_high
17 ext2_discard_prealloc
17 __vma_link
17 __pagevec_lru_add_active
16 igrab
16 generic_delete_inode
16 complete
15 select_parent
15 inode_change_ok
15 generic_fillattr
15 find_group_other
15 ext2_get_group_desc
15 d_delete
14 prep_new_page
14 may_open
14 mark_page_accessed
14 get_unused_fd
14 filp_close
14 ext2_preread_inode
14 exec_mmap
14 dup_task_struct
14 do_lookup
13 ext2_get_inode
13 current_kernel_time
12 sys_wait4
12 ext2_get_block
12 create_buffers
11 strncpy_from_user
11 ext2_add_link
11 del_timer_sync
11 d_path
10 vfs_unlink
10 rcu_do_batch
10 kunmap_atomic
10 get_write_access
10 do_sigaction
10 __alloc_pages
9 unlock_page
9 try_to_free_buffers
9 sys_brk
9 read_inode_bitmap
9 open_namei
9 flush_tlb_mm
9 ext2_reserve_inode
9 ext2_inode_by_name
9 d_lookup
9 copy_strings
9 __iget
8 vm_enough_memory
8 try_to_wake_up
8 truncate_complete_page
8 split_vma
8 sched_best_cpu
8 rb_insert_color
8 proc_fd_link
8 ext2_block_to_path
8 do_exit
8 dnotify_flush
7 vfs_write
7 vfs_getattr
7 load_elf_binary
7 get_wchan
7 get_empty_filp
7 flush_old_exec
7 file_read_actor
7 ext2_truncate
7 __block_commit_write
6 wake_up_forked_process
6 sigprocmask
6 set_cpus_allowed
6 get_signal_to_deliver
6 generic_file_write
6 find_vma_prepare
6 find_lock_page
6 ext2_release_inode
6 ext2_readdir
6 ext2_free_inode
6 ext2_commit_chunk
6 bad_range
6 bad_get_user
6 __lookup
5 wait_task_zombie
5 vma_merge
5 vm_acct_memory
5 vfs_permission
5 sys_close
5 setup_arg_pages
5 schedule_tail
5 profile_exit_mmap
5 pid_revalidate
5 page_cache_readahead
5 lookup_mnt
5 iput
5 ext2_delete_entry
5 d_rehash
5 cp_new_stat64
5 call_rcu
5 __pte_chain_free
5 __clear_page_buffers
4 wait_for_completion
4 update_atime
4 search_binary_handler
4 pte_chain_alloc
4 proc_pid_lookup
4 proc_delete_inode
4 prepare_binprm
4 pid_fd_revalidate
4 page_waitqueue
4 mm_alloc
4 lru_cache_add_active
4 kstat_read_proc
4 kmem_cache_alloc
4 inode_has_buffers
4 generic_file_read
4 generic_file_open
4 flush_tlb_page
4 ext2_free_branches
4 do_munmap
4 default_wake_function
4 alloc_pidmap
4 add_to_page_cache
4 __set_page_dirty_nobuffers
4 __pagevec_lru_add
3 vfs_readdir
3 unmap_vma
3 sys_read
3 sys_open
3 proc_root_lookup
3 pipe_write
3 migration_thread
3 mark_buffer_dirty
3 locks_remove_posix
3 inode_times_differ
3 inode_setattr
3 generic_file_mmap
3 ext2_get_page
3 ext2_count_free_inodes
3 eventpoll_release
3 do_execve
3 do_brk
3 create_empty_buffers
3 clear_user
3 __free_pages
3 __copy_user_zeroing_intel
3 .text.lock.sem
2 zap_pmd_range
2 unmap_region
2 unlock_buffer
2 sys_unlink
2 sys_newuname
2 sys_ioctl
2 sys_execve
2 set_fs_pwd
2 sem_exit
2 ret_from_intr
2 remove_wait_queue
2 radix_tree_insert
2 radix_tree_delete
2 put_filp
2 put_files_struct
2 proc_pid_readlink
2 proc_pid_make_inode
2 proc_info_read
2 proc_base_lookup
2 pipe_read
2 pid_base_iput
2 pgd_free
2 pgd_ctor
2 pgd_alloc
2 permission
2 open_exec
2 old_mmap
2 load_elf_interp
2 kill_fasync
2 invalidate_vcache
2 getname
2 generic_file_write_nolock
2 generic_file_llseek
2 generic_commit_write
2 find_vma_prev
2 find_busiest_node
2 filldir64
2 eligible_child
2 drop_buffers
2 do_fork
2 d_invalidate
2 cap_bprm_compute_creds
2 cache_grow
2 __vma_link_rb
2 __rb_rotate_left
2 __rb_erase_color
2 __get_page_state
2 .text.lock.root
2 .text.lock.ialloc
2 .text.lock.fs_writeback
1 vfs_rmdir
1 vfs_follow_link
1 vfs_create
1 unmap_page_range
1 task_dumpable
1 syscall_exit
1 sys_write
1 sys_rt_sigprocmask
1 sys_rt_sigaction
1 sys_dup2
1 rwsem_wake
1 remove_suid
1 remove_from_page_cache
1 profile_exit_task
1 proc_lookupfd
1 proc_lookup
1 pipe_wait
1 pid_delete_dentry
1 page_cache_readaround
1 notify_change
1 next_thread
1 mmput
1 mm_release
1 mm_init
1 lru_cache_add
1 lru_add_drain
1 lookup_hash
1 inode_update_time
1 inode_sub_bytes
1 inode_init_once
1 get_zone_counts
1 get_unmapped_area
1 get_offset_tsc
1 follow_down
1 flush_thread
1 find_task_by_pid
1 find_group_orlov
1 filp_open
1 ext2_statfs
1 ext2_rmdir
1 ext2_lookup
1 ext2_get_branch
1 ext2_check_page
1 ext2_alloc_branch
1 ext2_alloc_block
1 exit_itimers
1 error_code
1 elf_map
1 do_sync_write
1 d_alloc_root
1 create_elf_tables
1 copy_thread
1 copy_namespace
1 chown_common
1 change_protection
1 can_share_swap_page
1 cached_lookup
1 block_prepare_write
1 bh_lru_install
1 balance_dirty_pages_ratelimited
1 alloc_buffer_head
1 __user_walk
1 __set_page_buffers
1 __remove_from_page_cache
1 __pagevec_free
1 __mmdrop
1 __getblk
1 __get_free_pages
1 __d_path
1 .text.lock.ioctl
1 .text.lock.dnotify
0 write_profile
0 write_inode
0 wake_up_process
0 wake_up_buffer
0 wait_on_page_bit
0 vsscanf
0 vsprintf
0 vmtruncate
0 vfs_stat
0 vfs_rename
0 vfs_mkdir
0 vfs_lstat
0 vfs_fstat
0 up_tty_sem
0 unix_sock_destructor
0 unix_mkname
0 unix_create1
0 tty_drivers_read_proc
0 test_clear_page_dirty
0 task_prio
0 task_nice
0 sysctl_string
0 syscall_call
0 sys_vhangup
0 sys_utime
0 sys_time
0 sys_sysctl
0 sys_setpgid
0 sys_rmdir
0 sys_readlink
0 sys_mprotect
0 sys_mkdir
0 sys_lstat64
0 sys_llseek
0 sys_gettimeofday
0 sys_getrlimit
0 sys_getdents64
0 sys_getcwd
0 sys_fstat64
0 sys_fcntl64
0 sys_epoll_create
0 sys_chmod
0 sys_access
0 sync_supers
0 supplemental_group_member
0 sprintf
0 sock_map_fd
0 smp_call_function
0 sk_alloc
0 si_swapinfo
0 setup_frame
0 setattr_mask
0 set_brk
0 set_bh_page
0 send_IPI_mask_sequence
0 sched_migrate_task
0 sched_balance_exec
0 save_i387
0 rwsem_down_read_failed
0 resume_userspace
0 restore_fpu
0 restore_all
0 release_x86_irqs
0 release_thread
0 register_reboot_notifier
0 recalc_bh_state
0 read_zero
0 read_cache_page
0 rcu_process_callbacks
0 rb_erase
0 radix_tree_preload
0 radix_tree_gang_lookup
0 put_unused_fd
0 put_dirty_page
0 pty_open
0 pte_alloc_kernel
0 proc_pid_readdir
0 proc_permission
0 proc_get_inode
0 proc_file_read
0 proc_file_lseek
0 proc_destroy_inode
0 proc_alloc_inode
0 prepare_to_wait_exclusive
0 posix_block_lock
0 pipe_write_fasync
0 pipe_release
0 pipe_read_release
0 pagevec_lookup
0 page_slot
0 padzero
0 nr_running
0 nr_iowait
0 nr_free_pages
0 nr_context_switches
0 nr_blockdev_pages
0 mprotect_fixup
0 math_state_restore
0 lookup_chrfops
0 locks_remove_flock
0 locks_insert_lock
0 lock_rename
0 locate_fd
0 kunmap
0 ksoftirqd
0 kmap_atomic_to_page
0 kmap
0 kfree_percpu
0 kfree
0 kernel_read
0 is_bad_inode
0 invalidate_inode_buffers
0 invalidate_bh_lru
0 insert_vm_struct
0 inode_needs_sync
0 inode_add_bytes
0 init_fpu
0 init_dev
0 in_group_p
0 hash_vcache
0 handle_signal
0 handle_ra_miss
0 grab_cache_page_nowait
0 get_vmalloc_info
0 get_pipe_inode
0 get_new_inode_fast
0 get_chrfops
0 generic_forget_inode
0 generic_file_readv
0 generic_file_aio_read
0 generic_drop_inode
0 fs_may_remount_ro
0 free_task_struct
0 free_pgtables
0 free_pages
0 free_buffer_head
0 flush_all_zero_pkmaps
0 finish_wait
0 find_or_create_page
0 fcntl_dirnotify
0 fasync_helper
0 ext2_unlink
0 ext2_setattr
0 ext2_set_link
0 ext2_set_inode_flags
0 ext2_rename
0 ext2_release_file
0 ext2_put_inode
0 ext2_mknod
0 ext2_make_empty
0 ext2_ioctl
0 ext2_group_sparse
0 ext2_follow_link
0 ext2_find_near
0 ext2_empty_dir
0 ext2_create
0 ext2_count_free_blocks
0 ext2_count_dirs
0 ext2_bg_has_super
0 ext2_alloc_inode
0 expand_stack
0 expand_fd_array
0 exit_aio
0 eventpoll_init_file
0 do_truncate
0 do_softirq
0 do_signal
0 do_proc_readlink
0 do_pipe
0 do_gettimeofday
0 do_file_page
0 device_not_available
0 detach_vmas_to_be_unmapped
0 destroy_inode
0 de_put
0 d_validate
0 d_unhash
0 d_move
0 d_free
0 cpu_sched_info
0 count_open_files
0 count
0 copy_semundo
0 convert_fxsr_to_user
0 compute_creds
0 clear_inode
0 check_tty_count
0 check_ttfb_buffer
0 cap_bprm_set_security
0 can_vma_merge_after
0 build_mmap_rb
0 block_truncate_page
0 block_commit_write
0 bad_page
0 background_writeout
0 add_wait_queue
0 activate_page
0 __up
0 __rb_rotate_right
0 __put_task_struct
0 __put_ioctx
0 __posix_lock_file
0 __get_user_1
0 __filemap_copy_from_user_iovec
0 __down_failed_interruptible
0 __bread
0 .text.lock.tty_io
0 .text.lock.sysctl
0 .text.lock.sys_i386
0 .text.lock.rcupdate
0 .text.lock.open
0 .text.lock.mmap
0 .text.lock.locks
0 .text.lock.fork
0 .text.lock.char_dev
0 .text.lock.buffer
0 .text.lock.balloc
* Re: [PATCH] fix obj vma sorting
2003-04-09 18:33 ` Martin J. Bligh
@ 2003-04-09 19:20 ` Hugh Dickins
2003-04-09 20:11 ` William Lee Irwin III
2003-04-10 13:52 ` Hugh Dickins
0 siblings, 2 replies; 13+ messages in thread
From: Hugh Dickins @ 2003-04-09 19:20 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Dave McCracken, linux-kernel
On Wed, 9 Apr 2003, Martin J. Bligh wrote:
> >> Hmmm. Something somewhere went wrong. Some semaphore blew up
> >> somewhere ... I'm not convinced that this is your patch
> >> causing the problem, I just thought that since vma_link seems
> >> to have gone up rather in the profile. I'm playing with getting
> >> some better data on what actually happened, but in case someone
> >> is feeling psychic.
> >>
> >> The main thing I changed here (66-mjb2 -> 67-mjb0.2) was to pick up
> >> Andrew's rmap speedups, and drop the objrmap code I had for the stuff
> >
> > I haven't examined it, but I'm guessing 66-mjb2 did not have Dave's
> > vma sorting in at all? Its linear search would certainly raise the
> > time spent in __vma_link (notable in your diffprofile), which would
> > increase the pressure on i_shared_sem.
>
> No it didn't ... but I think 67-mm1 did.
>
> > (Whether it's a worthwhile optimization remains to be seen: like
> > rmap generally, it speeds up page_referenced and try_to_unmap at
> > the expense of the fast path. One improvement would be for fork
> > to just slot dst vma in next to src vma instead of linear search.)
Ignore that last parenthetical sentence: I just took a look at copy_mm,
noticing it up in your diffprofile, and it does already slot new vma
in next to old vma without linear search.
> > I don't think my fix to the sort order could have slowed it down
> > further (though once there are stray entries out of order, it may
> > be hard to predict how things will work out). But without it
> > page_referenced and try_to_unmap sometimes couldn't quite find
> > all the mappings they were looking for.
>
> It is that fix ... I just backed that one patch off and recompared:
Thanks. Yes, seems conclusive, but I'm puzzled.
I hope a fresh pair of eyes can work it out for us.
> DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
> trademarks of the Standard Performance Evaluation Corporation. This
> benchmarking was performed for research purposes only, and the run results
> are non-compliant and not-comparable with any published results.
>
> Results are shown as percentages of the first set displayed
>
> SDET 32  (see disclaimer)
>                              Throughput   Std. Dev
>     2.5.67                       100.0%       0.3%
>     2.5.67-mjb0.2                151.7%       0.5%
>     2.5.67-mjb0.2-nosort         207.1%       0.0%
>
> SDET 64  (see disclaimer)
>                              Throughput   Std. Dev
>     2.5.67                       100.0%       0.4%
>     2.5.67-mjb0.2                147.0%       0.5%
>     2.5.67-mjb0.2-nosort         201.5%       0.2%
>
> SDET 128  (see disclaimer)
>                              Throughput   Std. Dev
>     2.5.67                       100.0%       5.1%
>     2.5.67-mjb0.2                144.5%       0.1%
>     2.5.67-mjb0.2-nosort         188.6%       0.3%
>
>
> I think it's that sem, which seems to be heavily contended.
> Quite possibly for glibc's address_space or something.
> (even though it says "-nosort", it's just your sort fix I
> backed out ... otherwise it's what was in -mm).
Certainly your idea of glibc's address_space is plausible: I can
well imagine (sorry, can't try right now) that it patches the mmap
of some jump tables, doing mprotect and split and merge. But
split_vma and vma_merge didn't show all that high before. Of
course, the inline __vma_link_file in move_vma_start will push
it quite high, but I still don't see why __down soars that high.
> >> he had. *However*, what he had worked fine. I also picked up your
> >> sorting patch here Hugh ... this bit worries me:
> >>
> >> +static void move_vma_start(struct vm_area_struct *vma, unsigned long addr)
> >
> > It does use i_shared_sem where it wasn't used before, yes, but it's
> > only called by one case of vma_merge and one case of split_vma:
> > unless your tests are doing a lot of vma splitting (e.g. mprotecting
> > ranges which break up vmas), I wouldn't expect it to figure highly.
> > I can see it's there in the plus part of your diffprofile, but I'm
> > too inexperienced at reading these things, without the original
> > profiles, to tell whether it's being used a surprising amount.
>
> Here's the diffprofile for just your patch ... where it's positive,
> that's the increase in the number of ticks by applying your patch.
> Where it's negative, that's the decrease. The %age is the change from
> the first to the second profile:
>
> larry:/var/bench/results# diffprofile 2.5.67-mjb0.2{-nosort,}/sdetbench/64/profile
> 7148 24.9% total
> 6482 37.7% default_idle
> 1466 842.5% __down
> 442 566.7% __wake_up
> 435 378.3% schedule
> 251 0.0% move_vma_start
> 149 876.5% __vma_link
> 72 40.2% remove_shared_vm_struct
> 46 35.1% copy_mm
> 20 60.6% vma_link
> 12 300.0% default_wake_function
> 11 137.5% rb_insert_color
> ...
> -20 -37.0% number
> -20 -12.6% do_anonymous_page
> -21 -36.8% fd_install
> -23 -27.7% __find_get_block
> -24 -55.8% flush_signal_handlers
> -27 -45.0% __set_page_dirty_buffers
> -28 -26.7% kmem_cache_free
> -28 -7.5% find_get_page
> -29 -34.1% buffered_rmqueue
> -32 -34.8% path_release
> -33 -32.0% file_move
> -35 -60.3% __read_lock_failed
> -35 -43.8% .text.lock.highmem
> -37 -59.7% .text.lock.namei
> -37 -29.1% pte_alloc_one
> -40 -10.3% page_add_rmap
> -41 -41.4% free_hot_cold_page
> -44 -60.3% .text.lock.file_table
> -54 -18.4% __copy_to_user_ll
> -58 -43.0% follow_mount
> -62 -29.0% path_lookup
> -85 -20.9% __d_lookup
> -86 -20.4% release_pages
> -99 -68.8% .text.lock.dcache
> -100 -15.4% page_remove_rmap
> -106 -36.6% atomic_dec_and_lock
> -126 -16.8% zap_pte_range
> -141 -66.8% .text.lock.dec_and_lock
>
> Note the massive increase in down() (and some of the vma ops).
> The things that are cheaper are probably just because of less
> contention, I guess.
>
> > When you say "*However*, what he had worked fine", are you saying
> > you profiled before adding in my patch on top? The diffprofile of
> > the before and after my patch should in that case illuminate.
>
> Well, I hadn't ... but I should have done, and I have now ;-)
>
> I'll attach the two raw profiles for you as well. profile.with
> is with your patch, profile.without is without ... I was looking
> at SDET 64, since it showed the most dramatic difference.
Thanks for all the info, I'm sorry, I must rush away now.
I'll try another think later, but hope someone can do better.
Hugh
* Re: [PATCH] fix obj vma sorting
2003-04-09 19:20 ` Hugh Dickins
@ 2003-04-09 20:11 ` William Lee Irwin III
2003-04-10 13:52 ` Hugh Dickins
1 sibling, 0 replies; 13+ messages in thread
From: William Lee Irwin III @ 2003-04-09 20:11 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Martin J. Bligh, Andrew Morton, Dave McCracken, linux-kernel
On Wed, Apr 09, 2003 at 08:20:28PM +0100, Hugh Dickins wrote:
> Thanks. Yes, seems conclusive, but I'm puzzled.
> I hope a fresh pair of eyes can work it out for us.
They're pounding ->i_shared_sem, which you already knew.
Here's what I see as far as number of processes mapping what. It seems
to indicate large scale sharing occurs for a number of objects, which
could very well lead to mutual interference for several objects.
It seems to indicate more than glibc is involved, and that there's some
shm involved with large vma count files on "normal" systems as well.
-- wli
how many processes were mapping a given file
(i.e. remove dups in /proc/$PID/maps)
---------------------------------------------
/lib/libc-2.2.5.so 151
/lib/ld-2.2.5.so 151
/lib/libnsl-2.2.5.so 110
/lib/libnss_compat-2.2.5.so 107
/lib/libdl-2.2.5.so 85
/lib/libm-2.2.5.so 70
/lib/libncurses.so.5.2 65
/usr/X11R6/lib/libX11.so.6.2 44
/usr/X11R6/lib/libSM.so.6.0 43
/usr/X11R6/lib/libICE.so.6.3 43
/lib/libcap.so.1.10 39
/usr/lib/zsh/4.0.4/zsh/zle.so 35
/usr/lib/zsh/4.0.4/zsh/rlimits.so 35
/usr/lib/zsh/4.0.4/zsh/complete.so 35
/usr/lib/zsh/4.0.4/zsh/compctl.so 35
/usr/X11R6/lib/libXpm.so.4.11 35
/bin/zsh4 35
/lib/libnss_files-2.2.5.so 32
/usr/lib/libz.so.1.1.4 31
/usr/X11R6/lib/libXext.so.6.4 24
/lib/libcrypt-2.2.5.so 22
/lib/libresolv-2.2.5.so 21
/lib/libnss_dns-2.2.5.so 21
How many vma's total mapped a given file:
-----------------------------------------
/lib/libc-2.2.5.so 302
/lib/ld-2.2.5.so 302
/lib/libnsl-2.2.5.so 220
/SYSV00000000 220
/lib/libnss_compat-2.2.5.so 214
/lib/libdl-2.2.5.so 170
/lib/libm-2.2.5.so 140
/lib/libncurses.so.5.2 130
/usr/X11R6/lib/libX11.so.6.2 88
/usr/X11R6/lib/libSM.so.6.0 86
/usr/X11R6/lib/libICE.so.6.3 86
/lib/libcap.so.1.10 78
/usr/lib/zsh/4.0.4/zsh/zle.so 70
/usr/X11R6/lib/libXpm.so.4.11 70
/bin/zsh4 70
/lib/libnss_files-2.2.5.so 64
/usr/lib/libz.so.1.1.4 62
/usr/X11R6/lib/libXext.so.6.4 48
/lib/libcrypt-2.2.5.so 44
/lib/libresolv-2.2.5.so 42
/lib/libnss_dns-2.2.5.so 42
/usr/X11R6/lib/libXt.so.6.0 40
/usr/X11R6/bin/wterm 40
* Re: [PATCH] fix obj vma sorting
2003-04-09 19:20 ` Hugh Dickins
2003-04-09 20:11 ` William Lee Irwin III
@ 2003-04-10 13:52 ` Hugh Dickins
2003-04-10 14:29 ` Martin J. Bligh
1 sibling, 1 reply; 13+ messages in thread
From: Hugh Dickins @ 2003-04-10 13:52 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Dave McCracken, linux-kernel
On Wed, 9 Apr 2003, Hugh Dickins wrote:
> On Wed, 9 Apr 2003, Martin J. Bligh wrote:
> >
> > Here's the diffprofile for just your patch ... where it's positive,
> > that's the increase in the number of ticks by applying your patch.
> > Where it's negative, that's the decrease. The %age is the change from
> > the first to the second profile:
> >
> > larry:/var/bench/results# diffprofile 2.5.67-mjb0.2{-nosort,}/sdetbench/64/profile
> > 7148 24.9% total
> > 6482 37.7% default_idle
> > 1466 842.5% __down
> > 442 566.7% __wake_up
> > 435 378.3% schedule
> > 251 0.0% move_vma_start
> > 149 876.5% __vma_link
> > 72 40.2% remove_shared_vm_struct
> > 46 35.1% copy_mm
> > 20 60.6% vma_link
> >
> > Note the massive increase in down() (and some of the vma ops).
>
> Thanks for all the info, I'm sorry, I must rush away now.
> I'll try another think later, but hope someone can do better.
I've not reproduced this in testing myself (I don't have SDET);
but the conclusion I've come to is that the length of your vma lists
(for one or probably more files) was such that they were already
dangerously extending the hold of i_shared_sem with Dave's linear-
search-to-sort patch, and my additional downs in move_vma_start
then just pushed it over the edge into a thrash of collisions.
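The cost described here comes from walking the list while i_shared_sem is held. As a rough user-space sketch of that sorted insertion (not the kernel code itself): the list helpers below only mimic the kernel's list_head, and struct vma, vma_of, link_sorted and is_sorted are hypothetical names introduced for illustration. link_sorted returns how many entries it stepped past, which stands in for the time the semaphore would be held.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel's list_head helpers (user-space sketch). */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }

/* Insert 'entry' immediately before 'pos'; with pos == head this appends. */
static void list_add_tail(struct list_head *entry, struct list_head *pos)
{
    entry->prev = pos->prev;
    entry->next = pos;
    pos->prev->next = entry;
    pos->prev = entry;
}

/* Hypothetical cut-down vma: only the fields the sort looks at. */
struct vma { unsigned long vm_pgoff; struct list_head shared; };

#define vma_of(ptr) \
    ((struct vma *)((char *)(ptr) - offsetof(struct vma, shared)))

/*
 * Link vma in ascending vm_pgoff order and return the number of entries
 * stepped past. In the kernel this walk would run under i_shared_sem.
 */
static unsigned long link_sorted(struct list_head *head, struct vma *vma)
{
    struct list_head *pos;
    unsigned long steps = 0;

    for (pos = head->next; pos != head; pos = pos->next) {
        if (vma_of(pos)->vm_pgoff >= vma->vm_pgoff)
            break;
        steps++;
    }
    /* Inserting before 'pos' keeps the order whether or not we hit the end. */
    list_add_tail(&vma->shared, pos);
    return steps;
}

/* Return 1 if the list is in non-decreasing vm_pgoff order, else 0. */
static int is_sorted(struct list_head *head)
{
    struct list_head *pos;

    for (pos = head->next; pos != head && pos->next != head; pos = pos->next)
        if (vma_of(pos)->vm_pgoff > vma_of(pos->next)->vm_pgoff)
            return 0;
    return 1;
}
```

The step count grows with the number of entries ahead of the insertion point, so a file mapped by hundreds of vmas makes every link, and every relink after a vm_pgoff change, proportionally longer.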
Clearly I was wrong to suppose that move_vma_start would scarcely be
called: even in my testing it showed up ~50% higher than __vma_link,
the other user of __vma_link_file. But we cannot avoid i_shared_sem
there (can probably avoid page_table_lock and I did try doing without
that, just in case my up before spin_unlock had some hideous effect,
but apparently not).
I believe you've done the right thing in 2.5.67-mjb1: chucked out
both my patch and the vma list sorting: it's just too expensive on
the fast path, and you've shown that vividly.
Hugh
* Re: [PATCH] fix obj vma sorting
2003-04-10 13:52 ` Hugh Dickins
@ 2003-04-10 14:29 ` Martin J. Bligh
2003-04-10 14:39 ` Hugh Dickins
2003-04-10 14:50 ` Dave McCracken
0 siblings, 2 replies; 13+ messages in thread
From: Martin J. Bligh @ 2003-04-10 14:29 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Andrew Morton, Dave McCracken, linux-kernel
>> > Here's the diffprofile for just your patch ... where it's positive,
>> > that's the increase in the number of ticks by applying your patch.
>> > Where it's negative, that's the decrease. The %age is the change from
>> > the first to the second profile:
>> >
>> > larry:/var/bench/results# diffprofile 2.5.67-mjb0.2{-nosort,}/sdetbench/64/profile
>> > 7148 24.9% total
>> > 6482 37.7% default_idle
>> > 1466 842.5% __down
>> > 442 566.7% __wake_up
>> > 435 378.3% schedule
>> > 251 0.0% move_vma_start
>> > 149 876.5% __vma_link
>> > 72 40.2% remove_shared_vm_struct
>> > 46 35.1% copy_mm
>> > 20 60.6% vma_link
>> >
>> > Note the massive increase in down() (and some of the vma ops).
>>
>> Thanks for all the info, I'm sorry, I must rush away now.
>> I'll try another think later, but hope someone can do better.
>
> I've not reproduced this in testing myself (I don't have SDET);
> but the conclusion I've come to is that the length of your vma lists
> (for one or probably more files) was such that they were already
> dangerously extending the hold of i_shared_sem with Dave's linear-
> search-to-sort patch, and my additional downs in move_vma_start
> then just pushed it over the edge into a thrash of collisions.
>
> Clearly I was wrong to suppose that move_vma_start would scarcely be
> called: even in my testing it showed up ~50% higher than __vma_link,
> the other user of __vma_link_file. But we cannot avoid i_shared_sem
> there (can probably avoid page_table_lock and I did try doing without
> that, just in case my up before spin_unlock had some hideous effect,
> but apparently not).
Yeah, sorry ... I guess someone should have published the phone conversation
we had yesterday ... </me pokes Dave in the eye>
We came to the conclusion that we should be adding the semaphore to the
current code even, as list_add_tail isn't atomic to a doubly linked list (unless
maybe you can do some fancy-pants compare and exchange thing after setting
up the prev pointer of the new element already). Which is probably going
to suck performance-wise, but I'd prefer correctness. From there we can
make a better judgment, but it sounds like it's going to contend horribly
on those busy semaphores.
cat /proc/*/maps | nawk '{print $6}' | sort | uniq -c
reveals that we have 600 or so mappings to libc and ld splattered around,
which seems fairly low load ... SDET is doing bunches of shell scripts,
which probably generates the high operations on top of that.
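For reference, the per-file tally that the pipeline above produces can be sketched in C as well. This is only an illustration under assumptions: struct file_count, MAX_FILES and tally are hypothetical names, and unlike the awk version it skips anonymous mappings (lines with no sixth field) instead of counting them under an empty name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILES 64            /* arbitrary table size for the sketch */

struct file_count {
    char path[256];             /* sixth field of a maps line */
    int vmas;                   /* number of lines (vmas) naming that file */
};

/*
 * Tally one line of /proc/PID/maps into tab[0..*n).
 * Returns 1 if the line named a file and was counted, 0 otherwise
 * (anonymous mapping, or table full).
 */
static int tally(struct file_count *tab, int *n, const char *line)
{
    char path[256];
    int i;

    /* Skip range, perms, offset, dev, inode; keep the pathname field. */
    if (sscanf(line, "%*s %*s %*s %*s %*s %255s", path) != 1)
        return 0;

    for (i = 0; i < *n; i++) {
        if (strcmp(tab[i].path, path) == 0) {
            tab[i].vmas++;
            return 1;
        }
    }
    if (*n == MAX_FILES)
        return 0;
    strcpy(tab[*n].path, path);
    tab[*n].vmas = 1;
    (*n)++;
    return 1;
}
```

Feeding it every line of every /proc/PID/maps gives the vma count per file; counting each file at most once per process would instead give the number of processes mapping it.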
I think the "list of lists" thing will help this, but unless we do
something like RCU here, I don't see how we can do much to this data
structure without death-by-semaphore contention.
> I believe you've done the right thing in 2.5.67-mjb1: chucked out
> both my patch and the vma list sorting: it's just too expensive on
> the fast path, and you've shown that vividly.
Yeah, I was being grumpy and threw it all out ;-) Needs more
thought before we decide what to do with this stuff.
M.
* Re: [PATCH] fix obj vma sorting
2003-04-10 14:29 ` Martin J. Bligh
@ 2003-04-10 14:39 ` Hugh Dickins
2003-04-10 14:50 ` Dave McCracken
1 sibling, 0 replies; 13+ messages in thread
From: Hugh Dickins @ 2003-04-10 14:39 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Andrew Morton, Dave McCracken, linux-kernel
On Thu, 10 Apr 2003, Martin J. Bligh wrote:
>
> Yeah, sorry ... I guess someone should have published the phone conversation
> we had yesterday ... </me pokes Dave in the eye>
No problem: I left you all hanging.
> We came to the conclusion that we should be adding the semaphore to the
> current code even, as list_add_tail isn't atomic to a doubly linked list
Sure you can't list_add_tail without the semaphore: where is it missed?
Hugh
* Re: [PATCH] fix obj vma sorting
2003-04-10 14:29 ` Martin J. Bligh
2003-04-10 14:39 ` Hugh Dickins
@ 2003-04-10 14:50 ` Dave McCracken
2003-04-10 14:57 ` Martin J. Bligh
1 sibling, 1 reply; 13+ messages in thread
From: Dave McCracken @ 2003-04-10 14:50 UTC (permalink / raw)
To: Martin J. Bligh, Hugh Dickins; +Cc: Andrew Morton, linux-kernel
--On Thursday, April 10, 2003 07:29:03 -0700 "Martin J. Bligh"
<mbligh@aracnet.com> wrote:
> Yeah, sorry ... I guess someone should have published the phone
> conversation we had yesterday ... </me pokes Dave in the eye>
>
> We came to the conclusion that we should be adding the semaphore to the
> current code even, as list_add_tail isn't atomic to a doubly linked list
> (unless maybe you can do some fancy-pants compare and exchange thing
> after setting up the prev pointer of the new element already). Which is
> probably going to suck performance-wise, but I'd prefer correctness. From
> there we can make a better judgment, but it sounds like it's going to
> contend horribly on those busy semaphores.
I didn't publish the conversation because I realized that the semaphore is
taken outside the function, so it is held. It's what I called you back to
tell you.
I'm guessing the contention we're seeing with Hugh's fix is because of the
way ld.so works. It maps the entire library, then does an mprotect to
change the idata section from shared to private. It does this for every
mapped library after every exec.
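The splitting effect of an mprotect on part of a file mapping can be observed from user space. The following is a minimal sketch assuming a Linux system with /proc mounted; it is not ld.so's actual sequence, and count_vmas and split_demo are hypothetical helpers. It maps three pages of a temporary file, changes the protection of the middle page, and counts the entries in /proc/self/maps that fall inside the mapped range before and after.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Count /proc/self/maps entries lying wholly within [lo, hi); -1 on error. */
static int count_vmas(unsigned long lo, unsigned long hi)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[512];
    unsigned long start, end;
    int n = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f))
        if (sscanf(line, "%lx-%lx", &start, &end) == 2 &&
            start >= lo && end <= hi)
            n++;
    fclose(f);
    return n;
}

/*
 * Map three pages of a temporary file privately, then mprotect only the
 * middle page. Stores the vma counts seen before and after the mprotect.
 * Returns 0 on success, -1 if any step fails.
 */
static int split_demo(int *before, int *after)
{
    long psz = sysconf(_SC_PAGESIZE);
    size_t len;
    FILE *tmp = tmpfile();
    char *p;

    if (!tmp || psz <= 0)
        return -1;
    len = 3 * (size_t)psz;
    if (ftruncate(fileno(tmp), (off_t)len) != 0)
        return -1;

    p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fileno(tmp), 0);
    if (p == MAP_FAILED)
        return -1;
    *before = count_vmas((unsigned long)p, (unsigned long)p + len);

    /* Changing protection on a sub-range forces the kernel to split the vma. */
    if (mprotect(p + psz, (size_t)psz, PROT_READ | PROT_WRITE) != 0)
        return -1;
    *after = count_vmas((unsigned long)p, (unsigned long)p + len);

    munmap(p, len);
    fclose(tmp);
    return 0;
}
```

On a typical Linux system one would expect a single entry before the mprotect and three after it. With the patch under discussion, a split that moves a vma's start goes through move_vma_start and so takes i_shared_sem for the file.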
Dave
======================================================================
Dave McCracken IBM Linux Base Kernel Team 1-512-838-3059
dmccr@us.ibm.com T/L 678-3059
* Re: [PATCH] fix obj vma sorting
2003-04-10 14:50 ` Dave McCracken
@ 2003-04-10 14:57 ` Martin J. Bligh
2003-04-10 15:21 ` Hugh Dickins
0 siblings, 1 reply; 13+ messages in thread
From: Martin J. Bligh @ 2003-04-10 14:57 UTC (permalink / raw)
To: Dave McCracken, Hugh Dickins; +Cc: Andrew Morton, linux-kernel
>> Yeah, sorry ... I guess someone should have published the phone
>> conversation we had yesterday ... </me pokes Dave in the eye>
>>
>> We came to the conclusion that we should be adding the semaphore to the
>> current code even, as list_add_tail isn't atomic to a doubly linked list
>> (unless maybe you can do some fancy-pants compare and exchange thing
>> after setting up the prev pointer of the new element already). Which is
>> probably going to suck performance-wise, but I'd prefer correctness. From
>> there we can make a better judgment, but it sounds like it's going to
>> contend horribly on those busy semaphores.
>
> I didn't publish the conversation because I realized that the semaphore is
> taken outside the function, so it is held. It's what I called you back to
> tell you.
Oh yeah. I guess I should poke myself in the eye instead ;-)
So it's OK the way it is.
> I'm guessing the contention we're seeing with Hugh's fix is because of the
> way ld.so works. It maps the entire library, then does an mprotect to
> change the idata section from shared to private. It does this for every
> mapped library after every exec.
Eeek. There's no way we can set this up to do it as two separate VMAs
initially, is there?
M.
* Re: [PATCH] fix obj vma sorting
2003-04-10 14:57 ` Martin J. Bligh
@ 2003-04-10 15:21 ` Hugh Dickins
2003-04-10 15:24 ` Martin J. Bligh
0 siblings, 1 reply; 13+ messages in thread
From: Hugh Dickins @ 2003-04-10 15:21 UTC (permalink / raw)
To: Martin J. Bligh; +Cc: Dave McCracken, Andrew Morton, linux-kernel
On Thu, 10 Apr 2003, Martin J. Bligh wrote:
>
> Eeek. There's no way we can set this up to do it as two separate VMAs
> initially, is there?
What if we could? It's already shown the VMA sorting is (liable to be)
too slow. Changing that most common case won't change the fact.
Hugh
* Re: [PATCH] fix obj vma sorting
2003-04-10 15:21 ` Hugh Dickins
@ 2003-04-10 15:24 ` Martin J. Bligh
0 siblings, 0 replies; 13+ messages in thread
From: Martin J. Bligh @ 2003-04-10 15:24 UTC (permalink / raw)
To: Hugh Dickins; +Cc: Dave McCracken, Andrew Morton, linux-kernel
>> Eeek. There's no way we can set this up to do it as two separate VMAs
>> initially, is there?
>
> What if we could? It's already shown the VMA sorting is (liable to be)
> too slow. Changing that most common case won't change the fact.
Well, it'd thrash it substantially less, I guess. However, you're probably
right ... need a design change instead of tweaking. Doubling the number
of tasks would probably just take us back to where we were before ... need
something more radical.
M.
Thread overview: 13+ messages
2003-04-08 18:16 [PATCH] fix obj vma sorting Hugh Dickins
2003-04-09 17:07 ` Martin J. Bligh
2003-04-09 18:24 ` Hugh Dickins
2003-04-09 18:33 ` Martin J. Bligh
2003-04-09 19:20 ` Hugh Dickins
2003-04-09 20:11 ` William Lee Irwin III
2003-04-10 13:52 ` Hugh Dickins
2003-04-10 14:29 ` Martin J. Bligh
2003-04-10 14:39 ` Hugh Dickins
2003-04-10 14:50 ` Dave McCracken
2003-04-10 14:57 ` Martin J. Bligh
2003-04-10 15:21 ` Hugh Dickins
2003-04-10 15:24 ` Martin J. Bligh