linux-kernel.vger.kernel.org archive mirror
* [PATCH] fix obj vma sorting
@ 2003-04-08 18:16 Hugh Dickins
  2003-04-09 17:07 ` Martin J. Bligh
  0 siblings, 1 reply; 13+ messages in thread
From: Hugh Dickins @ 2003-04-08 18:16 UTC
  To: Andrew Morton; +Cc: Dave McCracken, linux-kernel

Fix several points in objrmap's vma sorting:

1. It was adding all vmas, even private ones, to i_mmap_shared.
2. It was not quite sorting: list_add_tail is needed in all cases.
3. If vm_pgoff is changed on a file vma (as in vma_merge and split_vma)
   we must unlink vma from list and relink while holding i_shared_sem:
   move_vma_start to do this (holds page_table_lock too, as vma_merge
   did and split_vma did not: I think nothing needs that, rip it out
   if you like, but my guess was that you'd prefer the extra safety).

Sorry, no, this doesn't magically make it all a hundred times faster.

--- 2.5.67-mm1/mm/mmap.c	Tue Apr  8 14:02:06 2003
+++ linux/mm/mmap.c	Tue Apr  8 18:06:07 2003
@@ -321,16 +321,13 @@
 		else
 			vmhead = &mapping->i_mmap;
 
-		list_for_each(vmlist, &mapping->i_mmap_shared) {
+		list_for_each(vmlist, vmhead) {
 			struct vm_area_struct *vmtemp;
 			vmtemp = list_entry(vmlist, struct vm_area_struct, shared);
 			if (vmtemp->vm_pgoff >= vma->vm_pgoff)
 				break;
 		}
-		if (vmlist == vmhead)
-			list_add_tail(&vma->shared, vmlist);
-		else
-			list_add(&vma->shared, vmlist);
+		list_add_tail(&vma->shared, vmlist);
 	}
 }
 
@@ -366,6 +363,28 @@
 	validate_mm(mm);
 }
 
+static void move_vma_start(struct vm_area_struct *vma, unsigned long addr)
+{
+	spinlock_t *lock = &vma->vm_mm->page_table_lock;
+	struct inode *inode = NULL;
+	
+	if (vma->vm_file) {
+		inode = vma->vm_file->f_dentry->d_inode;
+		down(&inode->i_mapping->i_shared_sem);
+	}
+	spin_lock(lock);
+	if (inode)
+		__remove_shared_vm_struct(vma, inode);
+	/* If no vm_file, perhaps we should always keep vm_pgoff at 0?? */
+	vma->vm_pgoff += (long)(addr - vma->vm_start) >> PAGE_SHIFT;
+	vma->vm_start = addr;
+	if (inode) {
+		__vma_link_file(vma);
+		up(&inode->i_mapping->i_shared_sem);
+	}
+	spin_unlock(lock);
+}
+
 /*
  * Return true if we can merge this (vm_flags,file,vm_pgoff,size)
  * in front of (at a lower virtual address and file offset than) the vma.
@@ -422,8 +441,6 @@
 			unsigned long end, unsigned long vm_flags,
 			struct file *file, unsigned long pgoff)
 {
-	spinlock_t * lock = &mm->page_table_lock;
-
 	if (!prev) {
 		prev = rb_entry(rb_parent, struct vm_area_struct, vm_rb);
 		goto merge_next;
@@ -435,6 +452,7 @@
 	if (prev->vm_end == addr &&
 			can_vma_merge_after(prev, vm_flags, file, pgoff)) {
 		struct vm_area_struct *next;
+		spinlock_t *lock = &mm->page_table_lock;
 		struct inode *inode = file ? file->f_dentry->d_inode : NULL;
 		int need_up = 0;
 
@@ -480,10 +498,7 @@
 				pgoff, (end - addr) >> PAGE_SHIFT))
 			return 0;
 		if (end == prev->vm_start) {
-			spin_lock(lock);
-			prev->vm_start = addr;
-			prev->vm_pgoff -= (end - addr) >> PAGE_SHIFT;
-			spin_unlock(lock);
+			move_vma_start(prev, addr);
 			return 1;
 		}
 	}
@@ -1203,8 +1218,7 @@
 
 	if (new_below) {
 		new->vm_end = addr;
-		vma->vm_start = addr;
-		vma->vm_pgoff += ((addr - new->vm_start) >> PAGE_SHIFT);
+		move_vma_start(vma, addr);
 	} else {
 		vma->vm_end = addr;
 		new->vm_start = addr;
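
Point 2 of the description (list_add_tail is needed in all cases) can be illustrated outside the kernel. The following is only a minimal userspace sketch, not the mm/mmap.c code: `struct list_node`, `struct fake_vma`, `insert_before` and `sorted_insert` are hypothetical stand-ins for `struct list_head`, the vma, and the list helpers. The search stops either at the first entry whose key is not smaller than the new one or at the list head; inserting immediately before that position keeps the list sorted in both cases, so no special case is required.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's circular struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical cut-down vma: just the sort key and the list linkage. */
struct fake_vma {
	unsigned long pgoff;
	struct list_node shared;
};

#define vma_of(node) \
	((struct fake_vma *)((char *)(node) - offsetof(struct fake_vma, shared)))

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Link 'node' immediately before 'pos', as list_add_tail(node, pos) does. */
static void insert_before(struct list_node *node, struct list_node *pos)
{
	node->prev = pos->prev;
	node->next = pos;
	pos->prev->next = node;
	pos->prev = node;
}

/* Keep the list ordered by ascending pgoff. */
static void sorted_insert(struct list_node *head, struct fake_vma *vma)
{
	struct list_node *pos;

	assert(head && vma);
	for (pos = head->next; pos != head; pos = pos->next)
		if (vma_of(pos)->pgoff >= vma->pgoff)
			break;
	/*
	 * pos is now the first entry with a key >= ours, or the head
	 * itself if we ran off the end. Inserting before pos is correct
	 * either way; inserting after it would misplace the new entry.
	 */
	insert_before(&vma->shared, pos);
}

/* Return 1 if the keys are in non-decreasing order, 0 otherwise. */
static int is_sorted(struct list_node *head)
{
	struct list_node *pos;

	for (pos = head->next; pos != head && pos->next != head; pos = pos->next)
		if (vma_of(pos)->pgoff > vma_of(pos->next)->pgoff)
			return 0;
	return 1;
}
```

Calling `sorted_insert` with keys in any order should leave `is_sorted` returning 1; the sketch takes no locks, whereas the real list is protected by i_shared_sem.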



* Re: [PATCH] fix obj vma sorting
  2003-04-08 18:16 [PATCH] fix obj vma sorting Hugh Dickins
@ 2003-04-09 17:07 ` Martin J. Bligh
  2003-04-09 18:24   ` Hugh Dickins
  0 siblings, 1 reply; 13+ messages in thread
From: Martin J. Bligh @ 2003-04-09 17:07 UTC
  To: Hugh Dickins, Andrew Morton; +Cc: Dave McCracken, linux-kernel

Hmmm. Something somewhere went wrong. Some semaphore blew up
somewhere ... I'm not convinced that your patch is causing the
problem; I only suspect it because vma_link seems to have gone up
quite a bit in the profile. I'm working on getting some better data
on what actually happened, but I'm posting this in case someone
is feeling psychic.

The main thing I changed here (66-mjb2 -> 67-mjb0.2) was to pick up 
Andrew's rmap speedups, and drop the objrmap code I had for the stuff 
he had. *However*, what he had worked fine. I also picked up your 
sorting patch here Hugh ... this bit worries me:

+static void move_vma_start(struct vm_area_struct *vma, unsigned long addr)
+{
+	spinlock_t *lock = &vma->vm_mm->page_table_lock;
+	struct inode *inode = NULL;
+	
+	if (vma->vm_file) {
+		inode = vma->vm_file->f_dentry->d_inode;
+		down(&inode->i_mapping->i_shared_sem);
+	}
+	spin_lock(lock);
+	if (inode)
+		__remove_shared_vm_struct(vma, inode);
+	/* If no vm_file, perhaps we should always keep vm_pgoff at 0?? */
+	vma->vm_pgoff += (long)(addr - vma->vm_start) >> PAGE_SHIFT;
+	vma->vm_start = addr;
+	if (inode) {
+		__vma_link_file(vma);
+		up(&inode->i_mapping->i_shared_sem);
+	}
+	spin_unlock(lock);
+}

M.

DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This 
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.

Results are shown as percentages of the first set displayed

SDET 128  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.66       100.0%         0.2%
                   2.5.67        97.7%         5.1%
               2.5.66-mm2       176.1%         0.6%
               2.5.67-mm1       176.7%         0.2%
              2.5.66-mjb2       181.8%         0.0%
            2.5.67-mjb0.2       141.1%         0.1%


diffprofile {2.5.66-mjb2,2.5.67-mjb0.2}/sdetbench/128/profile
(these are at 100 Hz).

     12913    38.8% default_idle
     12472    20.2% total
      3085   912.7% __down
      1026   385.7% schedule
       946   666.2% __wake_up
       904     0.0% __d_lookup
       626     0.0% move_vma_start
       452  6457.1% __vma_link
       159    40.9% remove_shared_vm_struct
        84    36.4% do_no_page
        69    22.5% copy_mm
        65   125.0% vma_link
        37   528.6% default_wake_function
        31     9.0% do_wp_page
        29   290.0% rb_insert_color
        19    95.0% try_to_wake_up
        18   450.0% __vma_link_rb
        17     6.0% clear_page_tables
        15    20.3% handle_mm_fault
        14   140.0% find_vma_prepare
        14   700.0% __rb_rotate_left
        13    46.4% exit_mmap
        11   110.0% kunmap_atomic
        10    24.4% do_mmap_pgoff
...
      -102   -58.6% __read_lock_failed
      -124   -43.5% path_release
      -126   -17.7% __copy_to_user_ll
      -168   -20.1% release_pages
      -189   -21.1% page_add_rmap
      -223   -33.2% path_lookup
      -241   -15.3% zap_pte_range
      -247   -18.3% page_remove_rmap
      -310   -46.5% follow_mount
      -405   -70.7% .text.lock.dcache
      -425   -76.6% .text.lock.namei
      -551   -49.5% atomic_dec_and_lock
      -628   -71.2% .text.lock.dec_and_lock
     -1148   -98.5% d_lookup

diffprofile {2.5.67-mm1,2.5.67-mjb0.2}/sdetbench/128/profile


    110028    31.3% default_idle
     92085    14.2% total
     31265  1054.5% __down
     10473   428.0% schedule
      9351   611.6% __wake_up
      6260     0.0% move_vma_start
      4200  1076.9% __vma_link
      1328    32.0% remove_shared_vm_struct
       831  1695.9% find_trylock_page
       567    17.8% copy_mm
       428    57.7% vma_link
       380   633.3% default_wake_function
       294   306.2% rb_insert_color
       182    87.5% try_to_wake_up
       177   411.6% __vma_link_rb
       158    62.7% exit_mmap
       150     0.0% rcu_do_batch
       135   540.0% __rb_rotate_left
...
      -196   -31.3% block_invalidatepage
      -202   -39.5% ext2_new_block
      -204   -54.5% .text.lock.inode
      -208   -33.1% task_mem
      -213    -6.6% clear_page_tables
      -213   -55.6% d_lookup
      -218   -44.7% select_parent
      -228   -21.2% kmap_high
      -235   -46.5% read_block_bitmap
      -241   -47.2% d_path
      -244   -57.5% complete
      -261  -100.0% group_release_blocks
      -263   -31.6% proc_root_link
      -264   -17.9% number
      -295   -38.6% strnlen_user
      -296   -26.8% task_vsize
      -320   -49.2% generic_file_aio_write_nolock
      -331   -75.1% call_rcu
      -334   -23.3% __fput
      -336   -56.4% may_open
      -339   -35.3% dput
      -343   -51.0% __find_get_block_slow
      -348   -40.6% d_instantiate
      -354   -65.1% __alloc_pages
      -371   -41.6% prune_dcache
      -377  -100.0% group_reserve_blocks
      -380   -22.6% release_task
      -398   -55.4% generic_fillattr
      -398   -31.4% exit_notify
      -420   -51.9% unmap_vmas
      -424   -35.5% file_kill
      -427   -72.7% read_inode_bitmap
      -435   -41.2% proc_check_root
      -459   -16.3% free_pages_and_swap_cache
      -480   -14.3% do_anonymous_page
      -517   -43.9% ext2_new_inode
      -519   -72.2% ext2_get_group_desc
      -527   -35.9% fd_install
      -537   -28.8% d_alloc
      -559   -30.4% __find_get_block
      -574   -49.7% __mark_inode_dirty
      -575   -38.0% .text.lock.highmem
      -580   -44.6% .text.lock.attr
      -598   -24.6% file_move
      -603   -23.4% copy_process
      -628   -27.4% filemap_nopage
      -633   -42.1% __set_page_dirty_buffers
      -634   -24.0% proc_pid_stat
      -636   -42.2% .text.lock.base
      -705   -28.5% link_path_walk
      -716   -60.9% flush_signal_handlers
      -758   -11.5% __copy_to_user_ll
      -780   -52.0% .text.lock.file_table
      -781   -36.3% free_hot_cold_page
      -834   -91.2% update_atime
      -906   -42.2% buffered_rmqueue
      -916   -56.0% __read_lock_failed
      -920   -34.3% kmem_cache_free
      -993   -38.1% path_release
     -1002   -71.0% __brelse
     -1106   -13.6% page_add_rmap
     -1256   -39.7% pte_alloc_one
     -1303   -29.3% do_no_page
     -1365   -83.5% grab_block
     -1522   -80.4% current_kernel_time
     -1819   -21.4% release_pages
     -1902   -14.2% copy_page_range
     -2149   -32.4% path_lookup
     -2150   -62.3% .text.lock.namei
     -2464   -21.4% __d_lookup
     -2486   -26.1% find_get_page
     -2499   -41.2% follow_mount
     -3174   -22.3% page_remove_rmap
     -3217   -65.7% .text.lock.dcache
     -4119   -42.3% atomic_dec_and_lock
     -4359   -24.6% zap_pte_range
     -4551  -100.0% .text.lock.filemap
     -4665   -64.7% .text.lock.dec_and_lock



* Re: [PATCH] fix obj vma sorting
  2003-04-09 17:07 ` Martin J. Bligh
@ 2003-04-09 18:24   ` Hugh Dickins
  2003-04-09 18:33     ` Martin J. Bligh
  0 siblings, 1 reply; 13+ messages in thread
From: Hugh Dickins @ 2003-04-09 18:24 UTC
  To: Martin J. Bligh; +Cc: Andrew Morton, Dave McCracken, linux-kernel

On Wed, 9 Apr 2003, Martin J. Bligh wrote:

> Hmmm. Something somewhere went wrong. Some semaphore blew up
> somewhere ... I'm not convinced that your patch is causing the
> problem; I only suspect it because vma_link seems to have gone up
> quite a bit in the profile. I'm working on getting some better data
> on what actually happened, but I'm posting this in case someone
> is feeling psychic.
> 
> The main thing I changed here (66-mjb2 -> 67-mjb0.2) was to pick up 
> Andrew's rmap speedups, and drop the objrmap code I had for the stuff 

I haven't examined it, but I'm guessing 66-mjb2 did not have Dave's
vma sorting in at all?  Its linear search would certainly raise the
time spent in __vma_link (notable in your diffprofile), which would
increase the pressure on i_shared_sem.

(Whether it's a worthwhile optimization remains to be seen: like
rmap generally, it speeds up page_referenced and try_to_unmap at
the expense of the fast path.  One improvement would be for fork
to just slot dst vma in next to src vma instead of linear search.)
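
The fork improvement suggested in the parenthesis above could look roughly like the sketch below. It is a userspace illustration under stated assumptions, not kernel code: `struct list_node`, `struct fake_vma`, `sorted_insert` and `insert_next_to` are hypothetical names, and the only property relied on is that the copied vma has the same vm_pgoff as its source. Because the keys are equal, linking the copy directly after the source keeps the list sorted without walking it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's circular struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

/* Hypothetical cut-down vma: just the sort key and the list linkage. */
struct fake_vma {
	unsigned long pgoff;
	struct list_node shared;
};

#define vma_of(node) \
	((struct fake_vma *)((char *)(node) - offsetof(struct fake_vma, shared)))

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

/* Link 'node' immediately before 'pos'. */
static void insert_before(struct list_node *node, struct list_node *pos)
{
	node->prev = pos->prev;
	node->next = pos;
	pos->prev->next = node;
	pos->prev = node;
}

/*
 * Linear-search insertion by ascending pgoff.
 * Returns how many existing entries had to be examined.
 */
static unsigned long sorted_insert(struct list_node *head, struct fake_vma *vma)
{
	struct list_node *pos;
	unsigned long examined = 0;

	for (pos = head->next; pos != head; pos = pos->next) {
		examined++;
		if (vma_of(pos)->pgoff >= vma->pgoff)
			break;
	}
	insert_before(&vma->shared, pos);
	return examined;
}

/*
 * Constant-time alternative for a copy of an already-linked entry:
 * the keys are equal, so the slot right after src is a sorted position.
 */
static void insert_next_to(struct fake_vma *src, struct fake_vma *dst)
{
	assert(src->pgoff == dst->pgoff);
	insert_before(&dst->shared, src->shared.next);
}
```

With 99 entries already on the list, a `sorted_insert` at the tail examines all 99 of them, while `insert_next_to` touches only the source entry and its neighbour. Whether that saving matters in practice depends on how long the per-file lists get, which the thread does not measure.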

I don't think my fix to the sort order could have slowed it down
further (though once there are stray entries out of order, it may
be hard to predict how things will work out).  But without it
page_referenced and try_to_unmap sometimes couldn't quite find
all the mappings they were looking for.

> he had. *However*, what he had worked fine. I also picked up your 
> sorting patch here Hugh ... this bit worries me:
> 
> +static void move_vma_start(struct vm_area_struct *vma, unsigned long addr)

It does use i_shared_sem where it wasn't used before, yes, but it's
only called by one case of vma_merge and one case of split_vma:
unless your tests are doing a lot of vma splitting (e.g. mprotecting
ranges which break up vmas), I wouldn't expect it to figure highly.
I can see it's there in the plus part of your diffprofile, but I'm
too inexperienced at reading these things, without the original
profiles, to tell whether it's being used a surprising amount.

When you say "*However*, what he had worked fine", are you saying
you profiled before adding in my patch on top?  The diffprofile of
the before and after my patch should in that case illuminate.

Hugh



* Re: [PATCH] fix obj vma sorting
  2003-04-09 18:24   ` Hugh Dickins
@ 2003-04-09 18:33     ` Martin J. Bligh
  2003-04-09 19:20       ` Hugh Dickins
  0 siblings, 1 reply; 13+ messages in thread
From: Martin J. Bligh @ 2003-04-09 18:33 UTC
  To: Hugh Dickins; +Cc: Andrew Morton, Dave McCracken, linux-kernel


>> Hmmm. Something somewhere went wrong. Some semaphore blew up
>> somewhere ... I'm not convinced that your patch is causing the
>> problem; I only suspect it because vma_link seems to have gone up
>> quite a bit in the profile. I'm working on getting some better data
>> on what actually happened, but I'm posting this in case someone
>> is feeling psychic.
>> 
>> The main thing I changed here (66-mjb2 -> 67-mjb0.2) was to pick up 
>> Andrew's rmap speedups, and drop the objrmap code I had for the stuff 
> 
> I haven't examined it, but I'm guessing 66-mjb2 did not have Dave's
> vma sorting in at all?  Its linear search would certainly raise the
> time spent in __vma_link (notable in your diffprofile), which would
> increase the pressure on i_shared_sem.

No it didn't ... but I think 67-mm1 did.
 
> (Whether it's a worthwhile optimization remains to be seen: like
> rmap generally, it speeds up page_referenced and try_to_unmap at
> the expense of the fast path.  One improvement would be for fork
> to just slot dst vma in next to src vma instead of linear search.)
> 
> I don't think my fix to the sort order could have slowed it down
> further (though once there are stray entries out of order, it may
> be hard to predict how things will work out).  But without it
> page_referenced and try_to_unmap sometimes couldn't quite find
> all the mappings they were looking for.

It is that fix ... I just backed that one patch off and recompared:

DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This 
benchmarking was performed for research purposes only, and the run results
are non-compliant and not-comparable with any published results.

Results are shown as percentages of the first set displayed

SDET 32  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.67       100.0%         0.3%
            2.5.67-mjb0.2       151.7%         0.5%
     2.5.67-mjb0.2-nosort       207.1%         0.0%

SDET 64  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.67       100.0%         0.4%
            2.5.67-mjb0.2       147.0%         0.5%
     2.5.67-mjb0.2-nosort       201.5%         0.2%

SDET 128  (see disclaimer)
                           Throughput    Std. Dev
                   2.5.67       100.0%         5.1%
            2.5.67-mjb0.2       144.5%         0.1%
     2.5.67-mjb0.2-nosort       188.6%         0.3%


I think it's that sem (i_shared_sem), which seems to be heavily
contended, quite possibly on glibc's address_space or something.
(Even though it says "-nosort", it's just your sort fix I
backed out ... otherwise it's what was in -mm.)

>> he had. *However*, what he had worked fine. I also picked up your 
>> sorting patch here Hugh ... this bit worries me:
>> 
>> +static void move_vma_start(struct vm_area_struct *vma, unsigned long addr)
> 
> It does use i_shared_sem where it wasn't used before, yes, but it's
> only called by one case of vma_merge and one case of split_vma:
> unless your tests are doing a lot of vma splitting (e.g. mprotecting
> ranges which break up vmas), I wouldn't expect it to figure highly.
> I can see it's there in the plus part of your diffprofile, but I'm
> too inexperienced at reading these things, without the original
> profiles, to tell whether it's being used a surprising amount.

Here's the diffprofile for just your patch. A positive count is the
increase in the number of ticks from applying your patch; a negative
count is the decrease. The percentage is the change from the first
profile to the second, relative to the first:

larry:/var/bench/results# diffprofile 2.5.67-mjb0.2{-nosort,}/sdetbench/64/profile
      7148    24.9% total
      6482    37.7% default_idle
      1466   842.5% __down
       442   566.7% __wake_up
       435   378.3% schedule
       251     0.0% move_vma_start
       149   876.5% __vma_link
        72    40.2% remove_shared_vm_struct
        46    35.1% copy_mm
        20    60.6% vma_link
        12   300.0% default_wake_function
        11   137.5% rb_insert_color
...
       -20   -37.0% number
       -20   -12.6% do_anonymous_page
       -21   -36.8% fd_install
       -23   -27.7% __find_get_block
       -24   -55.8% flush_signal_handlers
       -27   -45.0% __set_page_dirty_buffers
       -28   -26.7% kmem_cache_free
       -28    -7.5% find_get_page
       -29   -34.1% buffered_rmqueue
       -32   -34.8% path_release
       -33   -32.0% file_move
       -35   -60.3% __read_lock_failed
       -35   -43.8% .text.lock.highmem
       -37   -59.7% .text.lock.namei
       -37   -29.1% pte_alloc_one
       -40   -10.3% page_add_rmap
       -41   -41.4% free_hot_cold_page
       -44   -60.3% .text.lock.file_table
       -54   -18.4% __copy_to_user_ll
       -58   -43.0% follow_mount
       -62   -29.0% path_lookup
       -85   -20.9% __d_lookup
       -86   -20.4% release_pages
       -99   -68.8% .text.lock.dcache
      -100   -15.4% page_remove_rmap
      -106   -36.6% atomic_dec_and_lock
      -126   -16.8% zap_pte_range
      -141   -66.8% .text.lock.dec_and_lock

Note the massive increase in down() (and some of the vma ops).
The things that are cheaper are probably just because of less
contention, I guess.
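
For readers unfamiliar with the format, the arithmetic behind each diffprofile line can be sketched as follows. This is not the actual diffprofile tool, only a small illustration consistent with the numbers shown in this thread; `struct diff_entry`, `diff_ticks` and `format_diff` are hypothetical names. The one assumption drawn from the output above is that a symbol with zero ticks in the first profile is reported as 0.0%, as move_vma_start is.

```c
#include <assert.h>
#include <stdio.h>

/* One line of a profile diff: absolute change and change relative to 'before'. */
struct diff_entry {
	long delta;
	double percent;
};

/* Compare the tick counts of one symbol in two profiles. */
static struct diff_entry diff_ticks(long before, long after)
{
	struct diff_entry d;

	assert(before >= 0 && after >= 0);
	d.delta = after - before;
	/* A symbol absent from the first profile has no base: report 0.0%. */
	d.percent = before ? 100.0 * (double)d.delta / (double)before : 0.0;
	return d;
}

/* Render one line in the column layout used in the listings above. */
static int format_diff(char *buf, size_t len, const char *symbol,
		       long before, long after)
{
	struct diff_entry d = diff_ticks(before, after);

	return snprintf(buf, len, "%10ld %7.1f%% %s",
			d.delta, d.percent, symbol);
}
```

As a check against the attached SDET 64 profiles: __down has 174 ticks in profile.without and 1640 in profile.with, giving a delta of 1466 and 1466/174 = 842.5%, which matches the __down line in the diffprofile above.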

> When you say "*However*, what he had worked fine", are you saying
> you profiled before adding in my patch on top?  The diffprofile of
> the before and after my patch should in that case illuminate.

Well, I hadn't ... but I should have done, and I have now ;-)

I'll attach the two raw profiles for you as well. profile.with
is with your patch, profile.without is without ... I was looking
at SDET 64, since it showed the most dramatic difference.

M.

[-- Attachment #2: profile.with --]
[-- Type: application/octet-stream, Size: 9977 bytes --]

35905 total
23653 default_idle
1640 __down
622 zap_pte_range
580 copy_page_range
551 page_remove_rmap
550 schedule
520 __wake_up
349 page_add_rmap
347 find_get_page
335 release_pages
321 __d_lookup
251 remove_shared_vm_struct
251 move_vma_start
239 __copy_to_user_ll
184 atomic_dec_and_lock
177 copy_mm
166 __vma_link
163 do_wp_page
152 path_lookup
150 do_no_page
139 do_anonymous_page
136 clear_page_tables
121 free_pages_and_swap_cache
116 do_page_fault
90 pte_alloc_one
89 filemap_nopage
88 copy_process
77 kmem_cache_free
77 follow_mount
70 file_move
70 .text.lock.dec_and_lock
65 link_path_walk
62 release_task
60 path_release
60 __find_get_block
58 proc_pid_stat
58 free_hot_cold_page
56 buffered_rmqueue
54 find_trylock_page
53 vma_link
52 page_address
51 grab_block
51 d_alloc
51 __block_prepare_write
45 system_call
45 .text.lock.highmem
45 .text.lock.dcache
43 fput
43 exit_notify
42 __fput
41 kmap_atomic
41 __copy_user_intel
41 __copy_from_user_ll
39 handle_mm_fault
38 kmap_high
38 file_kill
36 find_vma
36 fd_install
34 number
34 alloc_inode
33 __set_page_dirty_buffers
31 fget
31 ext2_new_inode
29 new_inode
29 dnotify_parent
29 .text.lock.file_table
28 kmalloc
25 .text.lock.namei
24 set_page_address
24 ext2_update_inode
24 dentry_open
23 task_vsize
23 exit_mmap
23 dput
23 do_generic_mapping_read
23 deny_write_access
23 __read_lock_failed
22 vsnprintf
22 real_lookup
22 do_mmap_pgoff
22 block_invalidatepage
21 d_instantiate
21 __mark_inode_dirty
20 radix_tree_lookup
20 current_kernel_time
19 rb_insert_color
19 flush_signal_handlers
18 unmap_vmas
18 strnlen_user
18 pte_alloc_map
18 file_ra_state_init
18 ext2_new_block
18 do_page_cache_readahead
17 __generic_file_aio_read
16 prune_dcache
16 ext2_free_blocks
16 default_wake_function
15 read_block_bitmap
15 proc_pid_status
15 generic_file_aio_write_nolock
15 generic_delete_inode
15 exec_mmap
15 __insert_inode_hash
14 task_mem
14 select_parent
14 render_sigset_t
14 get_pid_list
14 do_lookup
14 __brelse
13 igrab
13 find_vma_prepare
13 filp_close
12 vfs_read
12 sys_brk
12 prep_new_page
12 find_group_other
12 del_timer_sync
12 d_delete
12 __pagevec_lru_add_active
12 __find_get_block_slow
11 unlock_page
11 try_to_wake_up
11 truncate_inode_pages
11 split_vma
11 proc_check_root
11 may_open
11 kunmap_high
11 kunmap_atomic
11 generic_fillattr
11 find_get_pages
11 ext2_find_entry
11 copy_files
11 .text.lock.attr
10 proc_root_link
10 proc_fd_link
10 open_namei
10 mark_page_accessed
10 inode_change_ok
10 ext2_get_inode
10 ext2_get_block
10 dup_task_struct
10 copy_strings
9 wake_up_forked_process
9 sys_wait4
9 strncpy_from_user
9 read_inode_bitmap
9 flush_tlb_mm
9 ext2_preread_inode
9 ext2_get_group_desc
9 ext2_discard_prealloc
9 ext2_add_link
9 create_buffers
9 __vma_link_rb
9 .text.lock.base
8 get_unused_fd
8 complete
8 __alloc_pages
7 vm_enough_memory
7 vfs_unlink
7 try_to_free_buffers
7 flush_old_exec
7 ext2_truncate
7 ext2_reserve_inode
7 do_sigaction
7 d_lookup
7 __pte_chain_free
7 .text.lock.inode
6 vm_acct_memory
6 sigprocmask
6 page_cache_readahead
6 get_write_access
6 get_empty_filp
6 generic_file_write
6 do_exit
6 dnotify_flush
6 __block_commit_write
5 vma_merge
5 sys_read
5 sys_close
5 pte_chain_alloc
5 old_mmap
5 lru_cache_add_active
5 get_wchan
5 get_signal_to_deliver
5 generic_file_mmap
5 flush_tlb_page
5 file_read_actor
5 ext2_block_to_path
5 call_rcu
4 vfs_write
4 vfs_getattr
4 update_atime
4 sys_open
4 schedule_tail
4 proc_pid_readlink
4 proc_delete_inode
4 pipe_write
4 pid_fd_revalidate
4 load_elf_binary
4 kmem_cache_alloc
4 generic_file_open
4 ext2_inode_by_name
4 ext2_commit_chunk
4 do_munmap
4 d_path
4 cp_new_stat64
4 build_mmap_rb
4 bad_range
4 __rb_rotate_left
4 __rb_erase_color
4 __pagevec_lru_add
4 __lookup
3 wait_task_zombie
3 vfs_readdir
3 vfs_permission
3 unmap_vma
3 set_cpus_allowed
3 search_binary_handler
3 sched_best_cpu
3 proc_info_read
3 pid_revalidate
3 lookup_mnt
3 iput
3 inode_update_time
3 getname
3 generic_file_read
3 find_lock_page
3 ext2_release_inode
3 ext2_readdir
3 ext2_get_page
3 ext2_free_inode
3 ext2_free_branches
3 eventpoll_release
3 eligible_child
3 do_brk
3 clear_user
3 alloc_pidmap
3 __set_page_dirty_nobuffers
3 __rb_rotate_right
3 __iget
2 wait_for_completion
2 unmap_region
2 unmap_page_range
2 sys_unlink
2 sys_ioctl
2 setup_arg_pages
2 set_fs_pwd
2 sem_exit
2 remove_wait_queue
2 rcu_do_batch
2 put_filp
2 put_files_struct
2 profile_exit_mmap
2 proc_root_lookup
2 proc_pid_lookup
2 proc_lookup
2 proc_base_lookup
2 prepare_binprm
2 pipe_read
2 pid_base_iput
2 pgd_free
2 pgd_ctor
2 permission
2 page_waitqueue
2 page_cache_readaround
2 mm_alloc
2 migration_thread
2 mark_buffer_dirty
2 lru_add_drain
2 lookup_hash
2 locks_remove_posix
2 load_elf_interp
2 kstat_read_proc
2 kill_fasync
2 kfree_percpu
2 inode_times_differ
2 generic_file_llseek
2 generic_commit_write
2 ext2_lookup
2 ext2_delete_entry
2 drop_buffers
2 do_execve
2 create_empty_buffers
2 copy_thread
2 copy_namespace
2 bad_get_user
2 __free_pages
2 __d_path
2 __clear_page_buffers
2 __bread
2 .text.lock.sem
1 zap_pmd_range
1 vfs_stat
1 vfs_rmdir
1 test_clear_page_dirty
1 task_dumpable
1 syscall_exit
1 sys_write
1 sys_rt_sigprocmask
1 sys_rt_sigaction
1 sys_newuname
1 sys_lstat64
1 sys_llseek
1 sys_execve
1 set_bh_page
1 rwsem_wake
1 ret_from_intr
1 remove_from_page_cache
1 read_cache_page
1 rb_erase
1 radix_tree_preload
1 radix_tree_insert
1 radix_tree_delete
1 profile_exit_task
1 proc_pid_make_inode
1 proc_lookupfd
1 pipe_wait
1 pgd_alloc
1 open_exec
1 notify_change
1 mprotect_fixup
1 mm_release
1 lru_cache_add
1 is_bad_inode
1 invalidate_vcache
1 insert_vm_struct
1 inode_setattr
1 inode_init_once
1 inode_has_buffers
1 get_unmapped_area
1 get_jiffies_64
1 generic_file_write_nolock
1 find_vma_prev
1 find_task_by_pid
1 find_group_orlov
1 filp_open
1 filldir64
1 ext2_statfs
1 ext2_prepare_write
1 ext2_group_sparse
1 ext2_count_free_inodes
1 ext2_alloc_block
1 exit_itimers
1 elf_map
1 do_fork
1 detach_vmas_to_be_unmapped
1 d_rehash
1 d_invalidate
1 d_free
1 count_open_files
1 change_protection
1 can_vma_merge_after
1 cache_grow
1 balance_dirty_pages_ratelimited
1 add_to_page_cache
1 __user_walk
1 __up
1 __pagevec_free
1 __getblk
1 __get_page_state
1 __get_free_pages
1 __copy_user_zeroing_intel
1 .text.lock.dnotify
0 write_profile
0 wake_up_buffer
0 wait_on_page_bit
0 vsscanf
0 vfs_mkdir
0 vfs_lstat
0 vfs_fstat
0 vfs_follow_link
0 vfs_create
0 unmap_vma_list
0 unmap_underlying_metadata
0 unlock_buffer
0 unix_create1
0 unix_create
0 try_to_release_page
0 truncate_complete_page
0 task_nice
0 syscall_call
0 sys_vhangup
0 sys_vfork
0 sys_utime
0 sys_time
0 sys_sysctl
0 sys_statfs
0 sys_sigreturn
0 sys_readlink
0 sys_munmap
0 sys_mprotect
0 sys_getpid
0 sys_fstat64
0 sys_dup2
0 sys_chown
0 sys_chmod
0 sys_chdir
0 sys_access
0 sprintf
0 sock_map_fd
0 sock_init_data
0 smp_call_function
0 skip_atoi
0 sk_alloc
0 si_swapinfo
0 setup_sigcontext
0 setup_frame
0 setattr_mask
0 set_binfmt
0 sched_migrate_task
0 save_i387_fxsave
0 save_i387
0 rwsem_down_write_failed
0 rwsem_down_read_failed
0 resume_userspace
0 restore_sigcontext
0 restore_fpu
0 restore_all
0 remove_suid
0 release_x86_irqs
0 recalc_bh_state
0 radix_tree_gang_lookup
0 radix_tree_extend
0 put_unused_fd
0 pty_unthrottle
0 pte_alloc_kernel
0 proc_root_readdir
0 proc_read_inode
0 proc_file_read
0 proc_file_lseek
0 proc_destroy_inode
0 proc_alloc_inode
0 prepare_to_copy
0 posix_block_lock
0 pipe_write_release
0 pipe_write_fasync
0 pipe_release
0 pipe_read_fasync
0 pipe_ioctl
0 pid_delete_dentry
0 parse_table
0 pagevec_lookup
0 page_slot
0 open_private_file
0 nr_free_pages
0 nr_blockdev_pages
0 nobh_prepare_write
0 next_thread
0 mmput
0 mm_init
0 lookup_one_len
0 lookup_create
0 lookup_chrfops
0 locks_remove_flock
0 lock_rename
0 kunmap
0 kmap_atomic_to_page
0 kfree_skbmem
0 kfree
0 kernel_read
0 is_subdir
0 invalidate_inode_buffers
0 invalidate_bh_lru
0 inode_sub_bytes
0 inode_add_bytes
0 init_new_context
0 init_dev
0 in_group_p
0 iget_locked
0 hash_vcache
0 handle_signal
0 handle_ra_miss
0 getrusage
0 get_zone_counts
0 get_zeroed_page
0 get_vmalloc_info
0 get_pipe_inode
0 get_offset_tsc
0 get_new_inode_fast
0 get_chrfops
0 generic_forget_inode
0 generic_drop_inode
0 fs_may_remount_ro
0 free_task_struct
0 free_pgtables
0 free_pages
0 free_buffer_head
0 follow_down
0 flush_tlb_others
0 flush_thread
0 flush_all_zero_pkmaps
0 finish_wait
0 find_busiest_node
0 filp_ctor
0 fillonedir
0 fcntl_dirnotify
0 fasync_helper
0 ext2_unlink
0 ext2_setattr
0 ext2_set_link
0 ext2_set_inode_flags
0 ext2_rmdir
0 ext2_release_file
0 ext2_put_inode
0 ext2_make_empty
0 ext2_last_byte
0 ext2_ioctl
0 ext2_get_branch
0 ext2_find_near
0 ext2_empty_dir
0 ext2_destroy_inode
0 ext2_delete_inode
0 ext2_create
0 ext2_check_page
0 ext2_bg_num_gdb
0 ext2_alloc_inode
0 ext2_alloc_branch
0 expand_stack
0 expand_files
0 expand_fd_array
0 error_code
0 end_page_writeback
0 down_tty_sem
0 do_sync_write
0 do_signal
0 do_pipe
0 do_mpage_readpage
0 do_gettimeofday
0 do_fcntl
0 destroy_context
0 de_put
0 d_validate
0 d_unhash
0 d_callback
0 d_alloc_root
0 create_elf_tables
0 cpu_sched_info
0 cpu_idle
0 count
0 copy_strings_kernel
0 copy_semundo
0 compute_creds
0 clear_inode
0 chrdev_open
0 chown_common
0 check_tty_count
0 cap_bprm_set_security
0 cap_bprm_compute_creds
0 can_share_swap_page
0 cached_lookup
0 bounce_copy_vec
0 block_truncate_page
0 block_prepare_write
0 block_commit_write
0 bh_waitq_head
0 bh_lru_install
0 bad_page
0 alloc_buffer_head
0 add_wait_queue
0 add_to_page_cache_lru
0 __set_page_buffers
0 __remove_from_page_cache
0 __posix_lock_file
0 __pmd_alloc
0 __pagevec_release
0 __mmdrop
0 __get_user_4
0 __down_failed
0 __cond_resched
0 __block_write_full_page
0 .text.lock.vcache
0 .text.lock.tty_io
0 .text.lock.sysctl
0 .text.lock.root
0 .text.lock.page_writeback
0 .text.lock.mmap
0 .text.lock.ioctl
0 .text.lock.ialloc
0 .text.lock.fs_writeback
0 .text.lock.fork
0 .text.lock.exec
0 .text.lock.char_dev
0 .text.lock.balloc
0 .text.lock.array

[-- Attachment #3: profile.without --]
[-- Type: application/octet-stream, Size: 10188 bytes --]

28757 total
17171 default_idle
748 zap_pte_range
651 page_remove_rmap
572 copy_page_range
421 release_pages
406 __d_lookup
389 page_add_rmap
375 find_get_page
293 __copy_to_user_ll
290 atomic_dec_and_lock
214 path_lookup
211 .text.lock.dec_and_lock
179 remove_shared_vm_struct
174 __down
163 do_no_page
161 do_wp_page
159 do_anonymous_page
144 .text.lock.dcache
138 free_pages_and_swap_cache
135 follow_mount
131 copy_mm
130 clear_page_tables
127 pte_alloc_one
115 schedule
109 do_page_fault
105 kmem_cache_free
103 file_move
101 filemap_nopage
99 free_hot_cold_page
92 path_release
92 copy_process
85 buffered_rmqueue
83 __find_get_block
80 .text.lock.highmem
78 __wake_up
77 proc_pid_stat
74 link_path_walk
73 .text.lock.file_table
70 d_alloc
66 release_task
66 find_trylock_page
62 .text.lock.namei
61 __block_prepare_write
60 __set_page_dirty_buffers
59 page_address
59 grab_block
58 __read_lock_failed
57 fd_install
56 kmap_high
55 exit_notify
55 __fput
54 number
54 file_kill
47 system_call
44 kmap_atomic
43 flush_signal_handlers
43 fget
42 fput
41 ext2_new_inode
40 handle_mm_fault
40 dnotify_parent
40 __copy_from_user_ll
39 alloc_inode
37 unmap_vmas
37 __copy_user_intel
36 new_inode
35 kmalloc
35 d_instantiate
35 block_invalidatepage
33 vma_link
33 ext2_update_inode
31 prune_dcache
31 find_vma
31 do_generic_mapping_read
31 __mark_inode_dirty
31 __insert_inode_hash
30 dput
30 do_page_cache_readahead
30 __brelse
29 strnlen_user
27 radix_tree_lookup
27 ext2_new_block
26 set_page_address
26 read_block_bitmap
26 generic_file_aio_write_nolock
26 dentry_open
26 .text.lock.attr
25 task_vsize
25 real_lookup
25 ext2_free_blocks
24 vsnprintf
24 do_mmap_pgoff
24 .text.lock.base
23 deny_write_access
21 __generic_file_aio_read
21 __find_get_block_slow
21 .text.lock.inode
20 proc_check_root
20 get_pid_list
20 find_get_pages
19 file_ra_state_init
19 ext2_find_entry
19 copy_files
18 truncate_inode_pages
18 task_mem
18 render_sigset_t
18 proc_root_link
18 proc_pid_status
18 exit_mmap
17 vfs_read
17 pte_alloc_map
17 kunmap_high
17 ext2_discard_prealloc
17 __vma_link
17 __pagevec_lru_add_active
16 igrab
16 generic_delete_inode
16 complete
15 select_parent
15 inode_change_ok
15 generic_fillattr
15 find_group_other
15 ext2_get_group_desc
15 d_delete
14 prep_new_page
14 may_open
14 mark_page_accessed
14 get_unused_fd
14 filp_close
14 ext2_preread_inode
14 exec_mmap
14 dup_task_struct
14 do_lookup
13 ext2_get_inode
13 current_kernel_time
12 sys_wait4
12 ext2_get_block
12 create_buffers
11 strncpy_from_user
11 ext2_add_link
11 del_timer_sync
11 d_path
10 vfs_unlink
10 rcu_do_batch
10 kunmap_atomic
10 get_write_access
10 do_sigaction
10 __alloc_pages
9 unlock_page
9 try_to_free_buffers
9 sys_brk
9 read_inode_bitmap
9 open_namei
9 flush_tlb_mm
9 ext2_reserve_inode
9 ext2_inode_by_name
9 d_lookup
9 copy_strings
9 __iget
8 vm_enough_memory
8 try_to_wake_up
8 truncate_complete_page
8 split_vma
8 sched_best_cpu
8 rb_insert_color
8 proc_fd_link
8 ext2_block_to_path
8 do_exit
8 dnotify_flush
7 vfs_write
7 vfs_getattr
7 load_elf_binary
7 get_wchan
7 get_empty_filp
7 flush_old_exec
7 file_read_actor
7 ext2_truncate
7 __block_commit_write
6 wake_up_forked_process
6 sigprocmask
6 set_cpus_allowed
6 get_signal_to_deliver
6 generic_file_write
6 find_vma_prepare
6 find_lock_page
6 ext2_release_inode
6 ext2_readdir
6 ext2_free_inode
6 ext2_commit_chunk
6 bad_range
6 bad_get_user
6 __lookup
5 wait_task_zombie
5 vma_merge
5 vm_acct_memory
5 vfs_permission
5 sys_close
5 setup_arg_pages
5 schedule_tail
5 profile_exit_mmap
5 pid_revalidate
5 page_cache_readahead
5 lookup_mnt
5 iput
5 ext2_delete_entry
5 d_rehash
5 cp_new_stat64
5 call_rcu
5 __pte_chain_free
5 __clear_page_buffers
4 wait_for_completion
4 update_atime
4 search_binary_handler
4 pte_chain_alloc
4 proc_pid_lookup
4 proc_delete_inode
4 prepare_binprm
4 pid_fd_revalidate
4 page_waitqueue
4 mm_alloc
4 lru_cache_add_active
4 kstat_read_proc
4 kmem_cache_alloc
4 inode_has_buffers
4 generic_file_read
4 generic_file_open
4 flush_tlb_page
4 ext2_free_branches
4 do_munmap
4 default_wake_function
4 alloc_pidmap
4 add_to_page_cache
4 __set_page_dirty_nobuffers
4 __pagevec_lru_add
3 vfs_readdir
3 unmap_vma
3 sys_read
3 sys_open
3 proc_root_lookup
3 pipe_write
3 migration_thread
3 mark_buffer_dirty
3 locks_remove_posix
3 inode_times_differ
3 inode_setattr
3 generic_file_mmap
3 ext2_get_page
3 ext2_count_free_inodes
3 eventpoll_release
3 do_execve
3 do_brk
3 create_empty_buffers
3 clear_user
3 __free_pages
3 __copy_user_zeroing_intel
3 .text.lock.sem
2 zap_pmd_range
2 unmap_region
2 unlock_buffer
2 sys_unlink
2 sys_newuname
2 sys_ioctl
2 sys_execve
2 set_fs_pwd
2 sem_exit
2 ret_from_intr
2 remove_wait_queue
2 radix_tree_insert
2 radix_tree_delete
2 put_filp
2 put_files_struct
2 proc_pid_readlink
2 proc_pid_make_inode
2 proc_info_read
2 proc_base_lookup
2 pipe_read
2 pid_base_iput
2 pgd_free
2 pgd_ctor
2 pgd_alloc
2 permission
2 open_exec
2 old_mmap
2 load_elf_interp
2 kill_fasync
2 invalidate_vcache
2 getname
2 generic_file_write_nolock
2 generic_file_llseek
2 generic_commit_write
2 find_vma_prev
2 find_busiest_node
2 filldir64
2 eligible_child
2 drop_buffers
2 do_fork
2 d_invalidate
2 cap_bprm_compute_creds
2 cache_grow
2 __vma_link_rb
2 __rb_rotate_left
2 __rb_erase_color
2 __get_page_state
2 .text.lock.root
2 .text.lock.ialloc
2 .text.lock.fs_writeback
1 vfs_rmdir
1 vfs_follow_link
1 vfs_create
1 unmap_page_range
1 task_dumpable
1 syscall_exit
1 sys_write
1 sys_rt_sigprocmask
1 sys_rt_sigaction
1 sys_dup2
1 rwsem_wake
1 remove_suid
1 remove_from_page_cache
1 profile_exit_task
1 proc_lookupfd
1 proc_lookup
1 pipe_wait
1 pid_delete_dentry
1 page_cache_readaround
1 notify_change
1 next_thread
1 mmput
1 mm_release
1 mm_init
1 lru_cache_add
1 lru_add_drain
1 lookup_hash
1 inode_update_time
1 inode_sub_bytes
1 inode_init_once
1 get_zone_counts
1 get_unmapped_area
1 get_offset_tsc
1 follow_down
1 flush_thread
1 find_task_by_pid
1 find_group_orlov
1 filp_open
1 ext2_statfs
1 ext2_rmdir
1 ext2_lookup
1 ext2_get_branch
1 ext2_check_page
1 ext2_alloc_branch
1 ext2_alloc_block
1 exit_itimers
1 error_code
1 elf_map
1 do_sync_write
1 d_alloc_root
1 create_elf_tables
1 copy_thread
1 copy_namespace
1 chown_common
1 change_protection
1 can_share_swap_page
1 cached_lookup
1 block_prepare_write
1 bh_lru_install
1 balance_dirty_pages_ratelimited
1 alloc_buffer_head
1 __user_walk
1 __set_page_buffers
1 __remove_from_page_cache
1 __pagevec_free
1 __mmdrop
1 __getblk
1 __get_free_pages
1 __d_path
1 .text.lock.ioctl
1 .text.lock.dnotify
0 write_profile
0 write_inode
0 wake_up_process
0 wake_up_buffer
0 wait_on_page_bit
0 vsscanf
0 vsprintf
0 vmtruncate
0 vfs_stat
0 vfs_rename
0 vfs_mkdir
0 vfs_lstat
0 vfs_fstat
0 up_tty_sem
0 unix_sock_destructor
0 unix_mkname
0 unix_create1
0 tty_drivers_read_proc
0 test_clear_page_dirty
0 task_prio
0 task_nice
0 sysctl_string
0 syscall_call
0 sys_vhangup
0 sys_utime
0 sys_time
0 sys_sysctl
0 sys_setpgid
0 sys_rmdir
0 sys_readlink
0 sys_mprotect
0 sys_mkdir
0 sys_lstat64
0 sys_llseek
0 sys_gettimeofday
0 sys_getrlimit
0 sys_getdents64
0 sys_getcwd
0 sys_fstat64
0 sys_fcntl64
0 sys_epoll_create
0 sys_chmod
0 sys_access
0 sync_supers
0 supplemental_group_member
0 sprintf
0 sock_map_fd
0 smp_call_function
0 sk_alloc
0 si_swapinfo
0 setup_frame
0 setattr_mask
0 set_brk
0 set_bh_page
0 send_IPI_mask_sequence
0 sched_migrate_task
0 sched_balance_exec
0 save_i387
0 rwsem_down_read_failed
0 resume_userspace
0 restore_fpu
0 restore_all
0 release_x86_irqs
0 release_thread
0 register_reboot_notifier
0 recalc_bh_state
0 read_zero
0 read_cache_page
0 rcu_process_callbacks
0 rb_erase
0 radix_tree_preload
0 radix_tree_gang_lookup
0 put_unused_fd
0 put_dirty_page
0 pty_open
0 pte_alloc_kernel
0 proc_pid_readdir
0 proc_permission
0 proc_get_inode
0 proc_file_read
0 proc_file_lseek
0 proc_destroy_inode
0 proc_alloc_inode
0 prepare_to_wait_exclusive
0 posix_block_lock
0 pipe_write_fasync
0 pipe_release
0 pipe_read_release
0 pagevec_lookup
0 page_slot
0 padzero
0 nr_running
0 nr_iowait
0 nr_free_pages
0 nr_context_switches
0 nr_blockdev_pages
0 mprotect_fixup
0 math_state_restore
0 lookup_chrfops
0 locks_remove_flock
0 locks_insert_lock
0 lock_rename
0 locate_fd
0 kunmap
0 ksoftirqd
0 kmap_atomic_to_page
0 kmap
0 kfree_percpu
0 kfree
0 kernel_read
0 is_bad_inode
0 invalidate_inode_buffers
0 invalidate_bh_lru
0 insert_vm_struct
0 inode_needs_sync
0 inode_add_bytes
0 init_fpu
0 init_dev
0 in_group_p
0 hash_vcache
0 handle_signal
0 handle_ra_miss
0 grab_cache_page_nowait
0 get_vmalloc_info
0 get_pipe_inode
0 get_new_inode_fast
0 get_chrfops
0 generic_forget_inode
0 generic_file_readv
0 generic_file_aio_read
0 generic_drop_inode
0 fs_may_remount_ro
0 free_task_struct
0 free_pgtables
0 free_pages
0 free_buffer_head
0 flush_all_zero_pkmaps
0 finish_wait
0 find_or_create_page
0 fcntl_dirnotify
0 fasync_helper
0 ext2_unlink
0 ext2_setattr
0 ext2_set_link
0 ext2_set_inode_flags
0 ext2_rename
0 ext2_release_file
0 ext2_put_inode
0 ext2_mknod
0 ext2_make_empty
0 ext2_ioctl
0 ext2_group_sparse
0 ext2_follow_link
0 ext2_find_near
0 ext2_empty_dir
0 ext2_create
0 ext2_count_free_blocks
0 ext2_count_dirs
0 ext2_bg_has_super
0 ext2_alloc_inode
0 expand_stack
0 expand_fd_array
0 exit_aio
0 eventpoll_init_file
0 do_truncate
0 do_softirq
0 do_signal
0 do_proc_readlink
0 do_pipe
0 do_gettimeofday
0 do_file_page
0 device_not_available
0 detach_vmas_to_be_unmapped
0 destroy_inode
0 de_put
0 d_validate
0 d_unhash
0 d_move
0 d_free
0 cpu_sched_info
0 count_open_files
0 count
0 copy_semundo
0 convert_fxsr_to_user
0 compute_creds
0 clear_inode
0 check_tty_count
0 check_ttfb_buffer
0 cap_bprm_set_security
0 can_vma_merge_after
0 build_mmap_rb
0 block_truncate_page
0 block_commit_write
0 bad_page
0 background_writeout
0 add_wait_queue
0 activate_page
0 __up
0 __rb_rotate_right
0 __put_task_struct
0 __put_ioctx
0 __posix_lock_file
0 __get_user_1
0 __filemap_copy_from_user_iovec
0 __down_failed_interruptible
0 __bread
0 .text.lock.tty_io
0 .text.lock.sysctl
0 .text.lock.sys_i386
0 .text.lock.rcupdate
0 .text.lock.open
0 .text.lock.mmap
0 .text.lock.locks
0 .text.lock.fork
0 .text.lock.char_dev
0 .text.lock.buffer
0 .text.lock.balloc


* Re: [PATCH] fix obj vma sorting
  2003-04-09 18:33     ` Martin J. Bligh
@ 2003-04-09 19:20       ` Hugh Dickins
  2003-04-09 20:11         ` William Lee Irwin III
  2003-04-10 13:52         ` Hugh Dickins
  0 siblings, 2 replies; 13+ messages in thread
From: Hugh Dickins @ 2003-04-09 19:20 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, Dave McCracken, linux-kernel

On Wed, 9 Apr 2003, Martin J. Bligh wrote:
> >> Hmmm. Something somewhere went wrong. Some semaphore blew up
> >> somewhere ... I'm not convinced that this is your patch
> >> causing the problem, I just thought that since vma_link seems
> >> to have gone up rather in the profile. I'm playing with getting
> >> some better data on what actually happened, but in case someone
> >> is feeling psychic. 
> >> 
> >> The main thing I changed here (66-mjb2 -> 67-mjb0.2) was to pick up 
> >> Andrew's rmap speedups, and drop the objrmap code I had for the stuff 
> > 
> > I haven't examined it, but I'm guessing 66-mjb2 did not have Dave's
> > vma sorting in at all?  Its linear search would certainly raise the
> > time spent in __vma_link (notable in your diffprofile), which would
> > increase the pressure on i_shared_sem.
> 
> No it didn't ... but I think 67-mm1 did.
>  
> > (Whether it's a worthwhile optimization remains to be seen: like
> > rmap generally, it speeds up page_referenced and try_to_unmap at
> > the expense of the fast path.  One improvement would be for fork
> > to just slot dst vma in next to src vma instead of linear search.)

Ignore that last parenthetical sentence: I just took a look at copy_mm,
noticing it up in your diffprofile, and it does already slot new vma
in next to old vma without linear search.

> > I don't think my fix to the sort order could have slowed it down
> > further (though once there are stray entries out of order, it may
> > be hard to predict how things will work out).  But without it
> > page_referenced and try_to_unmap sometimes couldn't quite find
> > all the mappings they were looking for.
> 
> It is that fix ... I just backed that one patch off and recompared:

Thanks.  Yes, seems conclusive, but I'm puzzled.
I hope a fresh pair of eyes can work it out for us.

> DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
> trademarks of the Standard Performance Evaluation Corporation. This 
> benchmarking was performed for research purposes only, and the run results
> are non-compliant and not-comparable with any published results.
> 
> Results are shown as percentages of the first set displayed
> 
> SDET 32  (see disclaimer)
>                            Throughput    Std. Dev
>                    2.5.67       100.0%         0.3%
>             2.5.67-mjb0.2       151.7%         0.5%
>      2.5.67-mjb0.2-nosort       207.1%         0.0%
> 
> SDET 64  (see disclaimer)
>                            Throughput    Std. Dev
>                    2.5.67       100.0%         0.4%
>             2.5.67-mjb0.2       147.0%         0.5%
>      2.5.67-mjb0.2-nosort       201.5%         0.2%
> 
> SDET 128  (see disclaimer)
>                            Throughput    Std. Dev
>                    2.5.67       100.0%         5.1%
>             2.5.67-mjb0.2       144.5%         0.1%
>      2.5.67-mjb0.2-nosort       188.6%         0.3%
> 
> 
> I think it's that sem, which seems to be heavily contended.
> Quite possibly for glibc's address_space or something.
> (even though it says "-nosort", it's just your sort fix I
> backed out ... otherwise it's what was in -mm).

Certainly your idea of glibc's address_space is plausible: I can
well imagine (sorry, can't try right now) that it patches the mmap
of some jump tables, doing mprotect and split and merge.  But
split_vma and vma_merge didn't show all that high before.  Of
course, the inline __vma_link_file in move_vma_start will push
it quite high, but I still don't see why __down soars that high.

> >> he had. *However*, what he had worked fine. I also picked up your 
> >> sorting patch here Hugh ... this bit worries me:
> >> 
> >> +static void move_vma_start(struct vm_area_struct *vma, unsigned long addr)
> > 
> > It does use i_shared_sem where it wasn't used before, yes, but it's
> > only called by one case of vma_merge and one case of split_vma:
> > unless your tests are doing a lot of vma splitting (e.g. mprotecting
> > ranges which break up vmas), I wouldn't expect it to figure highly.
> > I can see it's there in the plus part of your diffprofile, but I'm
> > too inexperienced at reading these things, without the original
> > profiles, to tell whether it's being used a surprising amount.
> 
> Here's the diffprofile for just your patch ... where it's positive,
> that's the increase in the number of ticks by applying your patch.
> Where it's negative, that's the decrease. The %age is the change from
> the first to the second profile:
> 
> larry:/var/bench/results# diffprofile 2.5.67-mjb0.2{-nosort,}/sdetbench/64/profile
>       7148    24.9% total
>       6482    37.7% default_idle
>       1466   842.5% __down
>        442   566.7% __wake_up
>        435   378.3% schedule
>        251     0.0% move_vma_start
>        149   876.5% __vma_link
>         72    40.2% remove_shared_vm_struct
>         46    35.1% copy_mm
>         20    60.6% vma_link
>         12   300.0% default_wake_function
>         11   137.5% rb_insert_color
> ...
>        -20   -37.0% number
>        -20   -12.6% do_anonymous_page
>        -21   -36.8% fd_install
>        -23   -27.7% __find_get_block
>        -24   -55.8% flush_signal_handlers
>        -27   -45.0% __set_page_dirty_buffers
>        -28   -26.7% kmem_cache_free
>        -28    -7.5% find_get_page
>        -29   -34.1% buffered_rmqueue
>        -32   -34.8% path_release
>        -33   -32.0% file_move
>        -35   -60.3% __read_lock_failed
>        -35   -43.8% .text.lock.highmem
>        -37   -59.7% .text.lock.namei
>        -37   -29.1% pte_alloc_one
>        -40   -10.3% page_add_rmap
>        -41   -41.4% free_hot_cold_page
>        -44   -60.3% .text.lock.file_table
>        -54   -18.4% __copy_to_user_ll
>        -58   -43.0% follow_mount
>        -62   -29.0% path_lookup
>        -85   -20.9% __d_lookup
>        -86   -20.4% release_pages
>        -99   -68.8% .text.lock.dcache
>       -100   -15.4% page_remove_rmap
>       -106   -36.6% atomic_dec_and_lock
>       -126   -16.8% zap_pte_range
>       -141   -66.8% .text.lock.dec_and_lock
> 
> Note the massive increase in down() (and some of the vma ops).
> The things that are cheaper are probably just because of less
> contention, I guess.
> 
> > When you say "*However*, what he had worked fine", are you saying
> > you profiled before adding in my patch on top?  The diffprofile of
> > the before and after my patch should in that case illuminate.
> 
> Well, I hadn't ... but I should have done, and I have now ;-)
> 
> I'll attach the two raw profiles for you as well. profile.with
> is with your patch, profile.without is without ... I was looking
> at SDET 64, since it showed the most dramatic difference.

Thanks for all the info, I'm sorry, I must rush away now.
I'll try another think later, but hope someone can do better.

Hugh
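
A note on reading the diffprofile rows quoted above: each row looks like the tick count of the second profile minus that of the first, followed by that difference as a percentage of the first profile's count. The sketch below only illustrates that arithmetic; it is not the diffprofile script. The names diff_ticks and struct diffline are hypothetical, and printing 0.0% for a symbol with no ticks in the first profile is an assumption inferred from the move_vma_start row.

```c
#include <assert.h>

/* One diffprofile-style row: absolute change and change relative to "before". */
struct diffline {
    long   delta;   /* ticks in second profile minus ticks in first */
    double pct;     /* delta as a percentage of the first profile's count */
};

/* Compare the tick counts of one symbol in two profiles.
 * Assumption: a symbol absent from the first profile reports 0.0%,
 * since there is no baseline to divide by. */
struct diffline diff_ticks(long before, long after)
{
    struct diffline d;

    assert(before >= 0 && after >= 0);   /* tick counts cannot be negative */
    d.delta = after - before;
    d.pct = before ? 100.0 * (double)d.delta / (double)before : 0.0;
    return d;
}
```

The raw listing earlier in the thread shows 17 ticks for __vma_link, which is consistent with it being the first (-nosort) profile: diff_ticks(17, 166) gives the +149 and roughly 876.5% shown in the table. move_vma_start goes from 0 to 251 ticks, so its 0.0% carries no information about how large the increase is.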



* Re: [PATCH] fix obj vma sorting
  2003-04-09 19:20       ` Hugh Dickins
@ 2003-04-09 20:11         ` William Lee Irwin III
  2003-04-10 13:52         ` Hugh Dickins
  1 sibling, 0 replies; 13+ messages in thread
From: William Lee Irwin III @ 2003-04-09 20:11 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Martin J. Bligh, Andrew Morton, Dave McCracken, linux-kernel

On Wed, Apr 09, 2003 at 08:20:28PM +0100, Hugh Dickins wrote:
> Thanks.  Yes, seems conclusive, but I'm puzzled.
> I hope a fresh pair of eyes can work it out for us.

They're pounding ->i_shared_sem, which you already knew.

Here's what I see as far as number of processes mapping what. It seems
to indicate large scale sharing occurs for a number of objects, which
could very well lead to mutual interference for several objects.

It seems to indicate more than glibc is involved, and that there's some
shm involved with large vma count files on "normal" systems as well.

-- wli

how many processes were mapping a given file
	(i.e. remove dups in /proc/$PID/maps)
---------------------------------------------
/lib/libc-2.2.5.so                                          151
/lib/ld-2.2.5.so                                            151
/lib/libnsl-2.2.5.so                                        110
/lib/libnss_compat-2.2.5.so                                 107
/lib/libdl-2.2.5.so                                         85
/lib/libm-2.2.5.so                                          70
/lib/libncurses.so.5.2                                      65
/usr/X11R6/lib/libX11.so.6.2                                44
/usr/X11R6/lib/libSM.so.6.0                                 43
/usr/X11R6/lib/libICE.so.6.3                                43
/lib/libcap.so.1.10                                         39
/usr/lib/zsh/4.0.4/zsh/zle.so                               35
/usr/lib/zsh/4.0.4/zsh/rlimits.so                           35
/usr/lib/zsh/4.0.4/zsh/complete.so                          35
/usr/lib/zsh/4.0.4/zsh/compctl.so                           35
/usr/X11R6/lib/libXpm.so.4.11                               35
/bin/zsh4                                                   35
/lib/libnss_files-2.2.5.so                                  32
/usr/lib/libz.so.1.1.4                                      31
/usr/X11R6/lib/libXext.so.6.4                               24
/lib/libcrypt-2.2.5.so                                      22
/lib/libresolv-2.2.5.so                                     21
/lib/libnss_dns-2.2.5.so                                    21


How many vma's total mapped a given file:
-----------------------------------------
/lib/libc-2.2.5.so            302
/lib/ld-2.2.5.so              302
/lib/libnsl-2.2.5.so          220
/SYSV00000000                 220
/lib/libnss_compat-2.2.5.so   214
/lib/libdl-2.2.5.so           170
/lib/libm-2.2.5.so            140
/lib/libncurses.so.5.2        130
/usr/X11R6/lib/libX11.so.6.2  88
/usr/X11R6/lib/libSM.so.6.0   86
/usr/X11R6/lib/libICE.so.6.3  86
/lib/libcap.so.1.10           78
/usr/lib/zsh/4.0.4/zsh/zle.so 70
/usr/X11R6/lib/libXpm.so.4.11 70
/bin/zsh4                     70
/lib/libnss_files-2.2.5.so    64
/usr/lib/libz.so.1.1.4        62
/usr/X11R6/lib/libXext.so.6.4 48
/lib/libcrypt-2.2.5.so        44
/lib/libresolv-2.2.5.so       42
/lib/libnss_dns-2.2.5.so      42
/usr/X11R6/lib/libXt.so.6.0   40
/usr/X11R6/bin/wterm          40
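
The two tables above can be produced from the same pass over /proc/$PID/maps: count every line naming a file for the vma total, and count each (process, file) pair once for the process total. The following is a minimal sketch of that tally, not the script wli used. All names (struct tally, tally_get, tally_line) are hypothetical, the fixed-size table is an arbitrary simplification, and path names containing spaces are not handled.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FILES 256
#define MAX_PATH  128

struct tally {
    char path[MAX_PATH];
    int  vmas;      /* maps lines naming this file, over all processes */
    int  procs;     /* distinct processes naming this file */
    int  last_pid;  /* pid most recently counted, to skip duplicates */
};

struct tally table[MAX_FILES];
int ntally;

/* Find the entry for "path", creating it if needed; NULL if the table is full. */
struct tally *tally_get(const char *path)
{
    for (int i = 0; i < ntally; i++)
        if (strcmp(table[i].path, path) == 0)
            return &table[i];
    if (ntally == MAX_FILES)
        return NULL;
    struct tally *t = &table[ntally++];
    strncpy(t->path, path, MAX_PATH - 1);
    t->path[MAX_PATH - 1] = '\0';
    t->vmas = t->procs = 0;
    t->last_pid = -1;
    return t;
}

/* Account for one line of /proc/<pid>/maps.  All lines of one pid must be
 * fed consecutively, otherwise the process count is inflated.
 * Returns 1 if the line named a file, 0 otherwise. */
int tally_line(int pid, const char *line)
{
    char path[MAX_PATH];

    assert(line != NULL);
    /* Fields: address range, perms, offset, dev, inode, pathname. */
    if (sscanf(line, "%*s %*s %*s %*s %*s %127s", path) != 1 || path[0] != '/')
        return 0;               /* anonymous mapping: no sixth field */
    struct tally *t = tally_get(path);
    if (!t)
        return 0;
    t->vmas++;
    if (t->last_pid != pid) {   /* first line for this file from this pid */
        t->procs++;
        t->last_pid = pid;
    }
    return 1;
}
```

In the tables above the vma total for each library is exactly twice its process count (302 against 151 for libc, 220 against 110 for libnsl), so each process contributes two vmas per library, presumably one for text and one for data.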


* Re: [PATCH] fix obj vma sorting
  2003-04-09 19:20       ` Hugh Dickins
  2003-04-09 20:11         ` William Lee Irwin III
@ 2003-04-10 13:52         ` Hugh Dickins
  2003-04-10 14:29           ` Martin J. Bligh
  1 sibling, 1 reply; 13+ messages in thread
From: Hugh Dickins @ 2003-04-10 13:52 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, Dave McCracken, linux-kernel

On Wed, 9 Apr 2003, Hugh Dickins wrote:
> On Wed, 9 Apr 2003, Martin J. Bligh wrote:
> > 
> > Here's the diffprofile for just your patch ... where it's positive,
> > that's the increase in the number of ticks by applying your patch.
> > Where it's negative, that's the decrease. The %age is the change from
> > the first to the second profile:
> > 
> > larry:/var/bench/results# diffprofile 2.5.67-mjb0.2{-nosort,}/sdetbench/64/profile
> >       7148    24.9% total
> >       6482    37.7% default_idle
> >       1466   842.5% __down
> >        442   566.7% __wake_up
> >        435   378.3% schedule
> >        251     0.0% move_vma_start
> >        149   876.5% __vma_link
> >         72    40.2% remove_shared_vm_struct
> >         46    35.1% copy_mm
> >         20    60.6% vma_link
> > 
> > Note the massive increase in down() (and some of the vma ops).
> 
> Thanks for all the info, I'm sorry, I must rush away now.
> I'll try another think later, but hope someone can do better.

I've not reproduced this in testing myself (I don't have SDET);
but the conclusion I've come to is that the length of your vma lists
(for one or probably more files) was such that they were already
dangerously extending the hold of i_shared_sem with Dave's linear-
search-to-sort patch, and my additional downs in move_vma_start
then just pushed it over the edge into a thrash of collisions.

Clearly I was wrong to suppose that move_vma_start would scarcely be
called: even in my testing it showed up ~50% higher than __vma_link,
the other user of __vma_link_file.  But we cannot avoid i_shared_sem
there (can probably avoid page_table_lock and I did try doing without
that, just in case my up before spin_unlock had some hideous effect,
but apparently not).

I believe you've done the right thing in 2.5.67-mjb1: chucked out
both my patch and the vma list sorting: it's just too expensive on
the fast path, and you've shown that vividly.

Hugh



* Re: [PATCH] fix obj vma sorting
  2003-04-10 13:52         ` Hugh Dickins
@ 2003-04-10 14:29           ` Martin J. Bligh
  2003-04-10 14:39             ` Hugh Dickins
  2003-04-10 14:50             ` Dave McCracken
  0 siblings, 2 replies; 13+ messages in thread
From: Martin J. Bligh @ 2003-04-10 14:29 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Dave McCracken, linux-kernel

>> > Here's the diffprofile for just your patch ... where it's positive,
>> > that's the increase in the number of ticks by applying your patch.
>> > Where it's negative, that's the decrease. The %age is the change from
>> > the first to the second profile:
>> > 
>> > larry:/var/bench/results# diffprofile 2.5.67-mjb0.2{-nosort,}/sdetbench/64/profile
>> >       7148    24.9% total
>> >       6482    37.7% default_idle
>> >       1466   842.5% __down
>> >        442   566.7% __wake_up
>> >        435   378.3% schedule
>> >        251     0.0% move_vma_start
>> >        149   876.5% __vma_link
>> >         72    40.2% remove_shared_vm_struct
>> >         46    35.1% copy_mm
>> >         20    60.6% vma_link
>> > 
>> > Note the massive increase in down() (and some of the vma ops).
>> 
>> Thanks for all the info, I'm sorry, I must rush away now.
>> I'll try another think later, but hope someone can do better.
> 
> I've not reproduced this in testing myself (I don't have SDET);
> but the conclusion I've come to is that the length of your vma lists
> (for one or probably more files) was such that they were already
> dangerously extending the hold of i_shared_sem with Dave's linear-
> search-to-sort patch, and my additional downs in move_vma_start
> then just pushed it over the edge into a thrash of collisions.
> 
> Clearly I was wrong to suppose that move_vma_start would scarcely be
> called: even in my testing it showed up ~50% higher than __vma_link,
> the other user of __vma_link_file.  But we cannot avoid i_shared_sem
> there (can probably avoid page_table_lock and I did try doing without
> that, just in case my up before spin_unlock had some hideous effect,
> but apparently not).

Yeah, sorry ... I guess someone should have published the phone conversation
we had yesterday ... </me pokes Dave in the eye>

We came to the conclusion that we should be adding the semaphore even to
the current code, as list_add_tail isn't atomic on a doubly linked list
(unless maybe you can do some fancy-pants compare and exchange thing after
setting up the prev pointer of the new element already). Which is probably
going to suck performance-wise, but I'd prefer correctness. From there we
can make a better judgment, but it sounds like it's going to contend
horribly on those busy semaphores.

cat /proc/*/maps | nawk '{print $6}' | sort | uniq -c
reveals that we have 600 or so mappings to libc and ld splattered around,
which seems fairly low load ... SDET is doing bunches of shell scripts,
which probably generates the high rate of operations on top of that.

I think the "list of lists" thing will help this, but unless we do 
something like RCU here, I don't see how we can do much to this data
structure without death-by-semaphore contention.

> I believe you've done the right thing in 2.5.67-mjb1: chucked out
> both my patch and the vma list sorting: it's just too expensive on
> the fast path, and you've shown that vividly.

Yeah, I was being grumpy and threw it all out ;-) Needs more
thought before we decide what to do with this stuff.

M.
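
To make the list_add_tail point concrete, here is a userspace sketch of the sorted insert that the patch at the top of this thread performs on the i_mmap lists. It is an illustration under assumptions, not the kernel code: struct vma is a stand-in carrying only vm_pgoff and the list link, the helper names vma_sorted_insert and vma_pgoff_at are hypothetical, and there is no locking at all, so it models only what happens while the caller already holds i_shared_sem.

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* Stand-in for vm_area_struct: only the fields the sort looks at. */
struct vma {
    unsigned long    vm_pgoff;
    struct list_head shared;
};

void list_init(struct list_head *head)
{
    head->next = head->prev = head;
}

/* Insert "node" just before "pos".  These are four separate stores, so a
 * concurrent walker could see a half-updated list: callers need a lock. */
void list_add_tail(struct list_head *node, struct list_head *pos)
{
    struct list_head *prev = pos->prev;

    node->next = pos;
    node->prev = prev;
    prev->next = node;
    pos->prev = node;
}

/* Keep the list ordered by vm_pgoff: stop at the first entry whose offset
 * is not smaller and insert in front of it.  If the walk runs off the end,
 * pos is the head itself, and inserting before the head appends. */
void vma_sorted_insert(struct list_head *head, struct vma *vma)
{
    struct list_head *pos;

    assert(head != NULL && vma != NULL);
    for (pos = head->next; pos != head; pos = pos->next) {
        struct vma *tmp = (struct vma *)((char *)pos - offsetof(struct vma, shared));
        if (tmp->vm_pgoff >= vma->vm_pgoff)
            break;
    }
    list_add_tail(&vma->shared, pos);
}

/* Return the vm_pgoff of the n-th entry (0-based), for checking the order. */
unsigned long vma_pgoff_at(struct list_head *head, int n)
{
    struct list_head *pos = head->next;

    while (n-- > 0)
        pos = pos->next;
    return ((struct vma *)((char *)pos - offsetof(struct vma, shared)))->vm_pgoff;
}
```

The walk is linear in the number of vmas mapping the file, and all of it happens with the semaphore held. That is the cost being discussed: with several hundred vmas on the libc and ld lists, every insert holds the lock for a long walk.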



* Re: [PATCH] fix obj vma sorting
  2003-04-10 14:29           ` Martin J. Bligh
@ 2003-04-10 14:39             ` Hugh Dickins
  2003-04-10 14:50             ` Dave McCracken
  1 sibling, 0 replies; 13+ messages in thread
From: Hugh Dickins @ 2003-04-10 14:39 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, Dave McCracken, linux-kernel

On Thu, 10 Apr 2003, Martin J. Bligh wrote:
> 
> Yeah, sorry ... I guess someone should have published the phone conversation
> we had yesterday ... </me pokes Dave in the eye>

No problem: I left you all hanging.

> We came to the conclusion that we should be adding the semaphore even to the
> current code, as list_add_tail isn't atomic on a doubly linked list

Sure you can't list_add_tail without the semaphore: where is it missed?

Hugh



* Re: [PATCH] fix obj vma sorting
  2003-04-10 14:29           ` Martin J. Bligh
  2003-04-10 14:39             ` Hugh Dickins
@ 2003-04-10 14:50             ` Dave McCracken
  2003-04-10 14:57               ` Martin J. Bligh
  1 sibling, 1 reply; 13+ messages in thread
From: Dave McCracken @ 2003-04-10 14:50 UTC (permalink / raw)
  To: Martin J. Bligh, Hugh Dickins; +Cc: Andrew Morton, linux-kernel


--On Thursday, April 10, 2003 07:29:03 -0700 "Martin J. Bligh"
<mbligh@aracnet.com> wrote:

> Yeah, sorry ... I guess someone should have published the phone
> conversation we had yesterday ... </me pokes Dave in the eye>
> 
> We came to the conclusion that we should be adding the semaphore even to
> the current code, as list_add_tail isn't atomic on a doubly linked list
> (unless maybe you can do some fancy-pants compare and exchange thing
> after setting up the prev pointer of the new element already). Which is
> probably going to suck performance-wise, but I'd prefer correctness. From
> there we can make a better judgment, but it sounds like it's going to
> contend horribly on those busy semaphores.

I didn't publish the conversation because I realized that the semaphore is
taken outside the function, so it is held.  It's what I called you back to
tell you.

I'm guessing the contention we're seeing with Hugh's fix is because of the
way ld.so works.  It maps the entire library, then does an mprotect to
change the idata section from shared to private.  It does this for every
mapped library after every exec.

Dave

======================================================================
Dave McCracken          IBM Linux Base Kernel Team      1-512-838-3059
dmccr@us.ibm.com                                        T/L   678-3059
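
The pattern Dave describes, one mapping of a file followed by an mprotect on part of it, can be sketched from userspace as below. This is only an illustration of the access pattern, not what ld.so actually executes: the function name map_then_mprotect is hypothetical, a three-page temporary file stands in for the library, and the protections chosen are arbitrary. What the sketch relies on is that changing protection on an interior range leaves neighbouring ranges with different permissions, so the kernel has to split the single vma.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map three pages of a temporary file in one go, then change the
 * protection of the middle page only.  Returns 0 on success, -1 on error. */
int map_then_mprotect(void)
{
    long psz = sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    int fd, rc;
    char *p;

    assert(psz > 0);
    if (!f)
        return -1;
    fd = fileno(f);
    if (ftruncate(fd, 3 * psz) != 0) {
        fclose(f);
        return -1;
    }

    /* One file-backed vma covering all three pages. */
    p = mmap(NULL, 3 * psz, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        fclose(f);
        return -1;
    }

    /* The middle page now differs from its neighbours, so the mapping
     * can no longer be described by a single vma. */
    rc = mprotect(p + psz, psz, PROT_READ | PROT_WRITE);

    munmap(p, 3 * psz);
    fclose(f);
    return rc;
}
```

Per the patch description at the top of the thread, a split that changes vm_pgoff on a file vma has to unlink and relink that vma under i_shared_sem, so a sequence like this repeated for every library after every exec would add traffic on the semaphores of the most widely shared files.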



* Re: [PATCH] fix obj vma sorting
  2003-04-10 14:50             ` Dave McCracken
@ 2003-04-10 14:57               ` Martin J. Bligh
  2003-04-10 15:21                 ` Hugh Dickins
  0 siblings, 1 reply; 13+ messages in thread
From: Martin J. Bligh @ 2003-04-10 14:57 UTC (permalink / raw)
  To: Dave McCracken, Hugh Dickins; +Cc: Andrew Morton, linux-kernel

>> Yeah, sorry ... I guess someone should have published the phone
>> conversation we had yesterday ... </me pokes Dave in the eye>
>> 
>> We came to the conclusion that we should be adding the semaphore even to
>> the current code, as list_add_tail isn't atomic on a doubly linked list
>> (unless maybe you can do some fancy-pants compare and exchange thing
>> after setting up the prev pointer of the new element already). Which is
>> probably going to suck performance-wise, but I'd prefer correctness. From
>> there we can make a better judgment, but it sounds like it's going to
>> contend horribly on those busy semaphores.
> 
> I didn't publish the conversation because I realized that the semaphore is
> taken outside the function, so it is held.  It's what I called you back to
> tell you.

Oh yeah. I guess I should poke myself in the eye instead ;-)
So it's OK the way it is.
 
> I'm guessing the contention we're seeing with Hugh's fix is because of the
> way ld.so works.  It maps the entire library, then does an mprotect to
> change the idata section from shared to private.  It does this for every
> mapped library after every exec.

Eeek. There's no way we can set this up to do it as two separate VMAs
initially, is there?

M.



* Re: [PATCH] fix obj vma sorting
  2003-04-10 14:57               ` Martin J. Bligh
@ 2003-04-10 15:21                 ` Hugh Dickins
  2003-04-10 15:24                   ` Martin J. Bligh
  0 siblings, 1 reply; 13+ messages in thread
From: Hugh Dickins @ 2003-04-10 15:21 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Dave McCracken, Andrew Morton, linux-kernel

On Thu, 10 Apr 2003, Martin J. Bligh wrote:
> 
> Eeek. There's no way we can set this up to do it as two separate VMAs
> initially, is there?

What if we could?  It's already shown the VMA sorting is (liable to be)
too slow.  Changing that most common case won't change the fact.

Hugh



* Re: [PATCH] fix obj vma sorting
  2003-04-10 15:21                 ` Hugh Dickins
@ 2003-04-10 15:24                   ` Martin J. Bligh
  0 siblings, 0 replies; 13+ messages in thread
From: Martin J. Bligh @ 2003-04-10 15:24 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave McCracken, Andrew Morton, linux-kernel

>> Eeek. There's no way we can set this up to do it as two separate VMAs
>> initially, is there?
> 
> What if we could?  It's already shown the VMA sorting is (liable to be)
> too slow.  Changing that most common case won't change the fact.

Well, it'd thrash it substantially less, I guess. However, you're probably
right ... need a design change instead of tweaking. Doubling the number
of tasks would probably just take us back to where we were before ... need
something more radical.

M.



Thread overview: 13+ messages
2003-04-08 18:16 [PATCH] fix obj vma sorting Hugh Dickins
2003-04-09 17:07 ` Martin J. Bligh
2003-04-09 18:24   ` Hugh Dickins
2003-04-09 18:33     ` Martin J. Bligh
2003-04-09 19:20       ` Hugh Dickins
2003-04-09 20:11         ` William Lee Irwin III
2003-04-10 13:52         ` Hugh Dickins
2003-04-10 14:29           ` Martin J. Bligh
2003-04-10 14:39             ` Hugh Dickins
2003-04-10 14:50             ` Dave McCracken
2003-04-10 14:57               ` Martin J. Bligh
2003-04-10 15:21                 ` Hugh Dickins
2003-04-10 15:24                   ` Martin J. Bligh
