linux-kernel.vger.kernel.org archive mirror
* Re: [patch] Inline vm_acct_memory
  2003-05-28 11:05 [patch] Inline vm_acct_memory Ravikiran G Thirumalai
@ 2003-05-28 11:04 ` Andrew Morton
  2003-05-28 15:45 ` Hugh Dickins
  1 sibling, 0 replies; 7+ messages in thread
From: Andrew Morton @ 2003-05-28 11:04 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: linux-kernel

Ravikiran G Thirumalai <kiran@in.ibm.com> wrote:
>
> I found that inlining vm_acct_memory speeds up vm_enough_memory.  
>  Since vm_acct_memory is only called by vm_enough_memory,
>  theoretically inlining should help, and my tests proved so -- there was
>  an improvement of about 10% in profile ticks (vm_enough_memory) on a
>  4 processor PIII Xeon running kernbench.

OK.  But with just a single call site we may as well implement vm_acct_memory()
inside mm/mmap.c.  Could you please make that change?

Thanks.


* [patch] Inline vm_acct_memory
@ 2003-05-28 11:05 Ravikiran G Thirumalai
  2003-05-28 11:04 ` Andrew Morton
  2003-05-28 15:45 ` Hugh Dickins
  0 siblings, 2 replies; 7+ messages in thread
From: Ravikiran G Thirumalai @ 2003-05-28 11:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Hi Andrew,
I found that inlining vm_acct_memory speeds up vm_enough_memory.  
Since vm_acct_memory is only called by vm_enough_memory,
theoretically inlining should help, and my tests proved so -- there was
an improvement of about 10% in profile ticks (vm_enough_memory) on a
4 processor PIII Xeon running kernbench.  Here is the patch.  Tested on 2.5.70.

Thanks,
Kiran


D:
D: Patch inlines use of vm_acct_memory
D:

diff -ruN -X dontdiff linux-2.5.69/include/linux/mman.h linux-2.5.69-vm_acct_inline/include/linux/mman.h
--- linux-2.5.69/include/linux/mman.h	Mon May  5 05:23:02 2003
+++ linux-2.5.69-vm_acct_inline/include/linux/mman.h	Tue May 20 12:18:34 2003
@@ -2,6 +2,8 @@
 #define _LINUX_MMAN_H
 
 #include <linux/config.h>
+#include <linux/smp.h>
+#include <linux/percpu.h>
 
 #include <asm/atomic.h>
 #include <asm/mman.h>
@@ -13,7 +15,27 @@
 extern atomic_t vm_committed_space;
 
 #ifdef CONFIG_SMP
-extern void vm_acct_memory(long pages);
+DECLARE_PER_CPU(long, committed_space);
+
+/*
+ * We tolerate a little inaccuracy to avoid ping-ponging the counter between
+ * CPUs
+ */
+#define ACCT_THRESHOLD	max(16, NR_CPUS * 2)
+
+static inline void vm_acct_memory(long pages)
+{
+	long *local;
+
+	preempt_disable();
+	local = &__get_cpu_var(committed_space);
+	*local += pages;
+	if (*local > ACCT_THRESHOLD || *local < -ACCT_THRESHOLD) {
+		atomic_add(*local, &vm_committed_space);
+		*local = 0;
+	}
+	preempt_enable();
+}
 #else
 static inline void vm_acct_memory(long pages)
 {
diff -ruN -X dontdiff linux-2.5.69/mm/mmap.c linux-2.5.69-vm_acct_inline/mm/mmap.c
--- linux-2.5.69/mm/mmap.c	Mon May  5 05:23:32 2003
+++ linux-2.5.69-vm_acct_inline/mm/mmap.c	Tue May 20 12:05:36 2003
@@ -53,6 +53,10 @@
 int sysctl_overcommit_ratio = 50;	/* default is 50% */
 atomic_t vm_committed_space = ATOMIC_INIT(0);
 
+#ifdef CONFIG_SMP
+DEFINE_PER_CPU(long, committed_space) = 0;
+#endif
+
 /*
  * Check that a process has enough memory to allocate a new virtual
  * mapping. 1 means there is enough memory for the allocation to
diff -ruN -X dontdiff linux-2.5.69/mm/swap.c linux-2.5.69-vm_acct_inline/mm/swap.c
--- linux-2.5.69/mm/swap.c	Mon May  5 05:23:13 2003
+++ linux-2.5.69-vm_acct_inline/mm/swap.c	Tue May 20 11:50:59 2003
@@ -349,30 +349,6 @@
 }
 
 
-#ifdef CONFIG_SMP
-/*
- * We tolerate a little inaccuracy to avoid ping-ponging the counter between
- * CPUs
- */
-#define ACCT_THRESHOLD	max(16, NR_CPUS * 2)
-
-static DEFINE_PER_CPU(long, committed_space) = 0;
-
-void vm_acct_memory(long pages)
-{
-	long *local;
-
-	preempt_disable();
-	local = &__get_cpu_var(committed_space);
-	*local += pages;
-	if (*local > ACCT_THRESHOLD || *local < -ACCT_THRESHOLD) {
-		atomic_add(*local, &vm_committed_space);
-		*local = 0;
-	}
-	preempt_enable();
-}
-#endif
-
 
 /*
  * Perform any setup for the swap system


* Re: [patch] Inline vm_acct_memory
  2003-05-28 11:05 [patch] Inline vm_acct_memory Ravikiran G Thirumalai
  2003-05-28 11:04 ` Andrew Morton
@ 2003-05-28 15:45 ` Hugh Dickins
  2003-05-29  2:23   ` Andrew Morton
  1 sibling, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2003-05-28 15:45 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: Andrew Morton, linux-kernel

On Wed, 28 May 2003, Ravikiran G Thirumalai wrote:
> I found that inlining vm_acct_memory speeds up vm_enough_memory.  
> Since vm_acct_memory is only called by vm_enough_memory,

No, linux/mman.h declares

static inline void vm_unacct_memory(long pages)
{
	vm_acct_memory(-pages);
}

and I count 18 callsites for vm_unacct_memory.

I'm no judge of what's worth inlining, but Andrew is widely known
and feared as The Scourge of Inliners, so I'd advise you to hide...

Hugh



* Re: [patch] Inline vm_acct_memory
  2003-05-28 15:45 ` Hugh Dickins
@ 2003-05-29  2:23   ` Andrew Morton
  2003-05-29  4:43     ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 7+ messages in thread
From: Andrew Morton @ 2003-05-29  2:23 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: kiran, linux-kernel

Hugh Dickins <hugh@veritas.com> wrote:
>
> On Wed, 28 May 2003, Ravikiran G Thirumalai wrote:
> > I found that inlining vm_acct_memory speeds up vm_enough_memory.  
> > Since vm_acct_memory is only called by vm_enough_memory,
> 
> No, linux/mman.h declares
> 
> static inline void vm_unacct_memory(long pages)
> {
> 	vm_acct_memory(-pages);
> }
> 
> and I count 18 callsites for vm_unacct_memory.
> 
> I'm no judge of what's worth inlining, but Andrew is widely known
> and feared as The Scourge of Inliners, so I'd advise you to hide...

Maybe I'm wrong.  kernbench is not some silly tight-loop microbenchmark. 
It is a "real" workload: gcc.

The thing about pagetable setup and teardown is that it tends to be called
in big lumps: for a while the process is establishing thousands of pages
and then later it is tearing down thousands.  So the cache-thrashing impact
of having those eighteen instances sprinkled around the place is less
than it would be if they were all being called randomly.  If you can believe
such waffle.

Kiran, do you still have the raw data available?  profiles and runtimes?


* Re: [patch] Inline vm_acct_memory
  2003-05-29  2:23   ` Andrew Morton
@ 2003-05-29  4:43     ` Ravikiran G Thirumalai
  2003-05-29  5:59       ` Hugh Dickins
  0 siblings, 1 reply; 7+ messages in thread
From: Ravikiran G Thirumalai @ 2003-05-29  4:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Hugh Dickins, linux-kernel

On Wed, May 28, 2003 at 07:23:30PM -0700, Andrew Morton wrote:
>... 
> Maybe I'm wrong.  kernbench is not some silly tight-loop microbenchmark. 
> It is a "real" workload: gcc.
> 
> The thing about pagetable setup and teardown is that it tends to be called
> in big lumps: for a while the process is establishing thousands of pages
> and then later it is tearing down thousands.  So the cache-thrashing impact
> of having those eighteen instances sprinkled around the place is less
> than it would be if they were all being called randomly.  If you can believe
> such waffle.
> 
> Kiran, do you still have the raw data available?  profiles and runtimes?

Yeah, I kinda overlooked vm_unacct_memory 'til I wondered why I had put
the inlines in the header file earlier :).  But I do have the profiles,
and checking all calls to vm_unacct_memory, I find no regression at the
significant callsites.  I can provide the detailed profile if required,
but here is the summary for 4 runs of kernbench -- the numbers against
each routine are profile ticks (oprofile), one column per run.


Vanilla
-------
vm_enough_memory 786 778 780 735
vm_acct_memory	870 952 884 898
-----------------------------------
tot of above    1656 1730 1664 1633

do_mmap_pgoff 	3559 3673 3669 3807
expand_stack	27 34 33 42
unmap_region	236 267 280 280
do_brk		594 596 615 615
exit_mmap	51 52 44 52	
mprotect_fixup	47 55 55 57
do_mremap	0 2 1 1
shmem_notify_change 0 0 0 0
shmem_notify_change 0 0 0 0	
shmem_delete_inode  0 0 0 0
shmem_file_write 0 0 0 0	
shmem_symlink  0 0 0 0
shmem_file_setup 0 0 0 0
sys_swapoff	0 0 0 0
poll_idle	101553 108687 89281 86251

Inline
------
vm_enough_memory 1539 1488 1488 1472
do_mmap_pgoff 	3510 3523 3436 3475
expand_stack	27 33 32 27
unmap_region	295 340 311 349
do_brk		641 583 659 640
exit_mmap	33 52 44 42
mprotect_fixup	54 65 73 64
do_mremap	2 0 0 0
shmem_notify_change 0 0 0 0
shmem_notify_change 0 0 0 0	
shmem_delete_inode  0 0 0 0
shmem_file_write 0 0 0 0	
shmem_symlink  0 0 0 0
shmem_file_setup 0 0 0 0
sys_swapoff	0 0 0 0
poll_idle	98733 85659 104994 103096


Thanks,
Kiran


* Re: [patch] Inline vm_acct_memory
  2003-05-29  4:43     ` Ravikiran G Thirumalai
@ 2003-05-29  5:59       ` Hugh Dickins
  2003-05-29  9:52         ` Ravikiran G Thirumalai
  0 siblings, 1 reply; 7+ messages in thread
From: Hugh Dickins @ 2003-05-29  5:59 UTC (permalink / raw)
  To: Ravikiran G Thirumalai; +Cc: Andrew Morton, linux-kernel

On Thu, 29 May 2003, Ravikiran G Thirumalai wrote:
> 
> Yeah, I kinda overlooked vm_unacct_memory 'til I wondered why I had put
> the inlines in the header file earlier :).  But I do have the profiles,
> and checking all calls to vm_unacct_memory, I find no regression at the
> significant callsites.  I can provide the detailed profile if required,
> but here is the summary for 4 runs of kernbench -- the numbers against
> each routine are profile ticks (oprofile), one column per run.
> 
> Vanilla
> -------
> vm_enough_memory 786 778 780 735
> vm_acct_memory	870 952 884 898
> -----------------------------------
> tot of above    1656 1730 1664 1633
> 
> do_mmap_pgoff 	3559 3673 3669 3807
> expand_stack	27 34 33 42
> unmap_region	236 267 280 280
> do_brk		594 596 615 615
> exit_mmap	51 52 44 52	
> mprotect_fixup	47 55 55 57
> do_mremap	0 2 1 1
> poll_idle	101553 108687 89281 86251
> 
> Inline
> ------
> vm_enough_memory 1539 1488 1488 1472
> do_mmap_pgoff 	3510 3523 3436 3475
> expand_stack	27 33 32 27
> unmap_region	295 340 311 349
> do_brk		641 583 659 640
> exit_mmap	33 52 44 42
> mprotect_fixup	54 65 73 64
> do_mremap	2 0 0 0
> poll_idle	98733 85659 104994 103096

There does look to be a regression in unmap_region.

Though I'd be reluctant to assign much general significance
to any of these numbers (suspect it might all come out quite
differently on another machine, another config, another run).

But the probable best course is just to inline vm_acct_memory
as you did, but also uninline vm_unacct_memory: placing the
static inline vm_acct_memory and then vm_unacct_memory just
before vm_enough_memory in mm/mmap.c.
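
Untested sketch of the shape I mean, for the CONFIG_SMP case only
(mman.h keeps its UP inlines as now, and for SMP would just declare
extern void vm_unacct_memory(long pages); the body of vm_enough_memory
is elided below):

/* mm/mmap.c */
#ifdef CONFIG_SMP
/*
 * We tolerate a little inaccuracy to avoid ping-ponging the counter
 * between CPUs.
 */
#define ACCT_THRESHOLD	max(16, NR_CPUS * 2)

static DEFINE_PER_CPU(long, committed_space) = 0;

/* only direct caller is vm_enough_memory() below, so the inline stays local */
static inline void vm_acct_memory(long pages)
{
	long *local;

	preempt_disable();
	local = &__get_cpu_var(committed_space);
	*local += pages;
	if (*local > ACCT_THRESHOLD || *local < -ACCT_THRESHOLD) {
		atomic_add(*local, &vm_committed_space);
		*local = 0;
	}
	preempt_enable();
}

/* ~18 callers, so one out-of-line copy keeps the icache footprint down */
void vm_unacct_memory(long pages)
{
	vm_acct_memory(-pages);
}
#endif	/* CONFIG_SMP */

/* ... vm_enough_memory(), the direct caller of vm_acct_memory(),
   follows here unchanged ... */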

Hugh




* Re: [patch] Inline vm_acct_memory
  2003-05-29  5:59       ` Hugh Dickins
@ 2003-05-29  9:52         ` Ravikiran G Thirumalai
  0 siblings, 0 replies; 7+ messages in thread
From: Ravikiran G Thirumalai @ 2003-05-29  9:52 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, linux-kernel

On Thu, May 29, 2003 at 06:59:47AM +0100, Hugh Dickins wrote:
> On Thu, 29 May 2003, Ravikiran G Thirumalai wrote:
> > ...
> > Vanilla
> > -------
> > vm_enough_memory 786 778 780 735
> > vm_acct_memory	870 952 884 898
> > -----------------------------------
> > tot of above    1656 1730 1664 1633
> > 
> > do_mmap_pgoff 	3559 3673 3669 3807
> > expand_stack	27 34 33 42
> > unmap_region	236 267 280 280
> > do_brk		594 596 615 615
> > exit_mmap	51 52 44 52	
> > mprotect_fixup	47 55 55 57
> > do_mremap	0 2 1 1
> > poll_idle	101553 108687 89281 86251
> > 
> > Inline
> > ------
> > vm_enough_memory 1539 1488 1488 1472
> > do_mmap_pgoff 	3510 3523 3436 3475
> > expand_stack	27 33 32 27
> > unmap_region	295 340 311 349
> > do_brk		641 583 659 640
> > exit_mmap	33 52 44 42
> > mprotect_fixup	54 65 73 64
> > do_mremap	2 0 0 0
> > poll_idle	98733 85659 104994 103096
> 
> There does look to be a regression in unmap_region.

Yeah, but there is an improvement of greater magnitude, about 190 ticks,
in do_mmap_pgoff.  There is a regression of about 60 ticks in
unmap_region (taking the standard deviation into consideration).  The
change in ticks for do_brk is not statistically significant.
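
Working the means out from the four runs above (rounded):

  do_mmap_pgoff:   vanilla (3559+3673+3669+3807)/4 = 3677
                   inline  (3510+3523+3436+3475)/4 = 3486  => ~190 ticks better
  unmap_region:    vanilla (236+267+280+280)/4 = 266
                   inline  (295+340+311+349)/4 = 324       => ~60 ticks worse
  vm_enough_memory + vm_acct_memory:
                   vanilla ~1671 combined vs ~1497 inlined => the ~10%
                   improvement quoted at the start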

> 
> Though I'd be reluctant to assign much general significance
> to any of these numbers (suspect it might all come out quite
> differently on another machine, another config, another run).
>

I agree with this for vm_unacct_memory, since it is called from many
places.  Depending on the workload, the numbers could come out differently
(if all the callsites are exercised with comparable frequency).
However, if the workload stresses one particular caller much more than
the others, then maybe inlining helps, as in this case where
do_mmap_pgoff is stressed more?

> But the probable best course is just to inline vm_acct_memory
> as you did, but also uninline vm_unacct_memory: placing the
> static inline vm_acct_memory and then vm_unacct_memory just
> before vm_enough_memory in mm/mmap.c.

Sounds reasonable.  I'll give this a run and see how the profiles look.

Thanks,
Kiran

