* [PATCH V3] Fix Committed_AS underflow
@ 2009-04-20  9:09 Eric B Munson
  2009-04-20  9:18   ` KOSAKI Motohiro
  2009-04-20 16:15   ` Dave Hansen
  0 siblings, 2 replies; 12+ messages in thread
From: Eric B Munson @ 2009-04-20  9:09 UTC (permalink / raw)
  To: kosaki.motohiro; +Cc: dave, linux-mm, linux-kernel, mel, cl, Eric B Munson

This patch makes a small change to an earlier patch by Kosaki Motohiro.
The threshold calculation was changed to avoid the overhead of calculating
the number of online cpus each time the threshold is needed.

As reported by Dave Hansen, the Committed_AS field can underflow in certain
situations:

>         # while true; do cat /proc/meminfo  | grep _AS; sleep 1; done | uniq -c
>               1 Committed_AS: 18446744073709323392 kB
>              11 Committed_AS: 18446744073709455488 kB
>               6 Committed_AS:    35136 kB
>               5 Committed_AS: 18446744073709454400 kB
>               7 Committed_AS:    35904 kB
>               3 Committed_AS: 18446744073709453248 kB
>               2 Committed_AS:    34752 kB
>               9 Committed_AS: 18446744073709453248 kB
>               8 Committed_AS:    34752 kB
>               3 Committed_AS: 18446744073709320960 kB
>               7 Committed_AS: 18446744073709454080 kB
>               3 Committed_AS: 18446744073709320960 kB
>               5 Committed_AS: 18446744073709454080 kB
>               6 Committed_AS: 18446744073709320960 kB

This happens because NR_CPUS can be greater than 1000 and meminfo_proc_show()
does not check for underflow.

This patch makes two changes:

1. Change NR_CPUS to min(64, NR_CPUS)
   This will limit the amount of possible skew on kernels compiled for very
   large SMP machines.  64 is an arbitrary number selected to limit the worst
   of the skew without using more cache lines.  min(64, NR_CPUS) is used
   instead of nr_online_cpus() because nr_online_cpus() requires a shared
   cache line and a call to hweight to make the calculation.  The runtime
   overhead of keeping that counter accurate has shown up in profiles, and
   it's possible that nr_online_cpus() would show up as well.

2. Add an underflow check to meminfo_proc_show()
   Most fields in /proc/meminfo have an underflow check; Committed_AS should
   as well.  This adds the check.

Reported-by: Dave Hansen <dave@linux.vnet.ibm.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Eric B Munson <ebmunson@us.ibm.com>

---
 fs/proc/meminfo.c |    4 +++-
 mm/swap.c         |    5 +++--
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 74ea974..facb9fb 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -22,7 +22,7 @@ void __attribute__((weak)) arch_report_meminfo(struct seq_file *m)
 static int meminfo_proc_show(struct seq_file *m, void *v)
 {
 	struct sysinfo i;
-	unsigned long committed;
+	long committed;
 	unsigned long allowed;
 	struct vmalloc_info vmi;
 	long cached;
@@ -36,6 +36,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 	si_meminfo(&i);
 	si_swapinfo(&i);
 	committed = atomic_long_read(&vm_committed_space);
+	if (committed < 0)
+		committed = 0;
 	allowed = ((totalram_pages - hugetlb_total_pages())
 		* sysctl_overcommit_ratio / 100) + total_swap_pages;
 
diff --git a/mm/swap.c b/mm/swap.c
index bede23c..f9a179f 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -494,9 +494,10 @@ EXPORT_SYMBOL(pagevec_lookup_tag);
 #ifdef CONFIG_SMP
 /*
  * We tolerate a little inaccuracy to avoid ping-ponging the counter between
- * CPUs
+ * CPUs.  64 is an arbitrary constant chosen to limit skew in calculating
+ * Committed_AS without using more cache lines.
  */
-#define ACCT_THRESHOLD	max(16, NR_CPUS * 2)
+#define ACCT_THRESHOLD	max_t(long, 16, min(64, NR_CPUS) * 2)
 
 static DEFINE_PER_CPU(long, committed_space);
 
-- 
1.6.1.2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <dont@kvack.org>


* Re: [PATCH V3] Fix Committed_AS underflow
  2009-04-20  9:09 [PATCH V3] Fix Committed_AS underflow Eric B Munson
@ 2009-04-20  9:18   ` KOSAKI Motohiro
  2009-04-20 16:15   ` Dave Hansen
  1 sibling, 0 replies; 12+ messages in thread
From: KOSAKI Motohiro @ 2009-04-20  9:18 UTC (permalink / raw)
  To: Eric B Munson; +Cc: kosaki.motohiro, dave, linux-mm, linux-kernel, mel, cl

> This patch makes a small change to an earlier patch by Kosaki Motohiro.
> The threshold calculation was changed to avoid the overhead of calculating
> the number of online cpus each time the threshold is needed.

IIRC, Mel has a patch that kills the num_online_cpus() calculation cost,
but I can accept your patch because Mel's patch hasn't been merged into
mainline yet.

Thanks :)





* Re: [PATCH V3] Fix Committed_AS underflow
  2009-04-20  9:09 [PATCH V3] Fix Committed_AS underflow Eric B Munson
@ 2009-04-20 16:15   ` Dave Hansen
  2009-04-20 16:15   ` Dave Hansen
  1 sibling, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2009-04-20 16:15 UTC (permalink / raw)
  To: Eric B Munson; +Cc: kosaki.motohiro, linux-mm, linux-kernel, mel, cl

On Mon, 2009-04-20 at 10:09 +0100, Eric B Munson wrote:
> 1. Change NR_CPUS to min(64, NR_CPUS)
>    This will limit the amount of possible skew on kernels compiled for very
>    large SMP machines.  64 is an arbitrary number selected to limit the worst
>    of the skew without using more cache lines.  min(64, NR_CPUS) is used
>    instead of nr_online_cpus() because nr_online_cpus() requires a shared
>    cache line and a call to hweight to make the calculation.  Its runtime
>    overhead and keeping this counter accurate showed up in profiles and it's
>    possible that nr_online_cpus() would also show.



-- Dave




* Re: [PATCH V3] Fix Committed_AS underflow
  2009-04-20 16:15   ` Dave Hansen
@ 2009-04-20 19:49     ` Dave Hansen
  -1 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2009-04-20 19:49 UTC (permalink / raw)
  To: Eric B Munson; +Cc: kosaki.motohiro, linux-mm, linux-kernel, mel, cl

On Mon, 2009-04-20 at 09:15 -0700, Dave Hansen wrote:
> On Mon, 2009-04-20 at 10:09 +0100, Eric B Munson wrote:
> > 1. Change NR_CPUS to min(64, NR_CPUS)
> >    This will limit the amount of possible skew on kernels compiled for very
> >    large SMP machines.  64 is an arbitrary number selected to limit the worst
> >    of the skew without using more cache lines.  min(64, NR_CPUS) is used
> >    instead of nr_online_cpus() because nr_online_cpus() requires a shared
> >    cache line and a call to hweight to make the calculation.  Its runtime
> >    overhead and keeping this counter accurate showed up in profiles and it's
> >    possible that nr_online_cpus() would also show.

Wow, that empty reply was really informative, wasn't it? :)

My worry with this min(64, NR_CPUS) approach is that you effectively
ensure that you're going to be doing a lot more cacheline bouncing, but
it isn't quite as explicit.

Now, every time there's a mapping (or set of them) created or destroyed
that nets greater than 64 pages, you've got to go get a r/w cacheline to
a possibly highly contended atomic.  With a number this low, you're
almost guaranteed to hit it at fork() and exec().  Could you
double-check that this doesn't hurt any of the fork() AIM tests?

Another thought is that, instead of trying to fix this up in meminfo, we
could do this in a way that is guaranteed to never skew the global
counter negative: we always keep the *percpu* skew negative.  This
should be the same as what's in the kernel now:

void vm_acct_memory(long pages)
{
	long *local;
	long local_min = -ACCT_THRESHOLD;
	long local_max = ACCT_THRESHOLD;
	long local_goal = 0;

	preempt_disable();
	local = &__get_cpu_var(committed_space);
	*local += pages;
	if (*local > local_max || *local < local_min) {
		atomic_long_add(*local - local_goal, &vm_committed_space);
		*local = local_goal;
	}
	preempt_enable();
}

But now consider if we changed the local_* variables a bit:

	long local_min = -(ACCT_THRESHOLD*2);
	long local_max = 0;
	long local_goal = -ACCT_THRESHOLD;

We'll get some possibly *large* numbers in meminfo, but it will at least
never underflow.

-- Dave




* Re: [PATCH V3] Fix Committed_AS underflow
  2009-04-20 19:49     ` Dave Hansen
@ 2009-04-21  1:41       ` KOSAKI Motohiro
  -1 siblings, 0 replies; 12+ messages in thread
From: KOSAKI Motohiro @ 2009-04-21  1:41 UTC (permalink / raw)
  To: Dave Hansen
  Cc: kosaki.motohiro, Eric B Munson, linux-mm, linux-kernel, mel, cl

> void vm_acct_memory(long pages)
> {
>         long *local;
> 	long local_min = -ACCT_THRESHOLD;
> 	long local_max = ACCT_THRESHOLD;
> 	long local_goal = 0;
> 
>         preempt_disable();
>         local = &__get_cpu_var(committed_space);
>         *local += pages;
>         if (*local > local_max || *local < local_min) {
>                 atomic_long_add(*local - local_goal, &vm_committed_space);
>                 *local = local_goal;
>         }
>         preempt_enable();
> }
> 
> But now consider if we changed the local_* variables a bit:
> 
> 	long local_min = -(ACCT_THRESHOLD*2);
> 	long local_max = 0;
> 	long local_goal = -ACCT_THRESHOLD;
> 
> We'll get some possibly *large* numbers in meminfo, but it will at least
> never underflow.

If *local == -(ACCT_THRESHOLD*2), then
  *local - local_goal = -(ACCT_THRESHOLD*2) + ACCT_THRESHOLD = -ACCT_THRESHOLD

so we still pass a negative value to atomic_long_add().
IOW, vm_committed_space can still be negative.

Am I missing anything?






* Re: [PATCH V3] Fix Committed_AS underflow
  2009-04-20 19:49     ` Dave Hansen
@ 2009-04-23 16:31     ` Eric B Munson
  2009-04-23 20:38         ` Dave Hansen
  -1 siblings, 1 reply; 12+ messages in thread
From: Eric B Munson @ 2009-04-23 16:31 UTC (permalink / raw)
  To: Dave Hansen; +Cc: kosaki.motohiro, linux-mm, linux-kernel, mel, cl


On Mon, 20 Apr 2009, Dave Hansen wrote:

> On Mon, 2009-04-20 at 09:15 -0700, Dave Hansen wrote:
> > On Mon, 2009-04-20 at 10:09 +0100, Eric B Munson wrote:
> > > 1. Change NR_CPUS to min(64, NR_CPUS)
> > >    This will limit the amount of possible skew on kernels compiled for very
> > >    large SMP machines.  64 is an arbitrary number selected to limit the worst
> > >    of the skew without using more cache lines.  min(64, NR_CPUS) is used
> > >    instead of nr_online_cpus() because nr_online_cpus() requires a shared
> > >    cache line and a call to hweight to make the calculation.  Its runtime
> > >    overhead and keeping this counter accurate showed up in profiles and it's
> > >    possible that nr_online_cpus() would also show.
> 
> Wow, that empty reply was really informative, wasn't it? :)
> 
> My worry with this min(64, NR_CPUS) approach is that you effectively
> ensure that you're going to be doing a lot more cacheline bouncing, but
> it isn't quite as explicit.

Unfortunately this is a choice we have to make: do we want to avoid cache
line bouncing in fork-heavy workloads that map more than 64 pages, or avoid
bad information being used for overcommit decisions?

> 
> Now, every time there's a mapping (or set of them) created or destroyed
> that nets greater than 64 pages, you've got to go get a r/w cacheline to
> a possibly highly contended atomic.  With a number this low, you're
> almost guaranteed to hit it at fork() and exec().  Could you
> double-check that this doesn't hurt any of the fork() AIM tests?

It is unlikely that the aim9 benchmarks would show whether this patch is a
problem, because they fork in a tight loop from a process that is not
necessarily big enough to hit ACCT_THRESHOLD, likely all on a single CPU.
To show any problems here we would need a fork-heavy workload with many
threads on many CPUs.

> 
> Another thought is that, instead of trying to fix this up in meminfo, we
> could do this in a way that is guaranteed to never skew the global
> counter negative: we always keep the *percpu* skew negative.  This
> should be the same as what's in the kernel now:
> 
> void vm_acct_memory(long pages)
> {
>         long *local;
> 	long local_min = -ACCT_THRESHOLD;
> 	long local_max = ACCT_THRESHOLD;
> 	long local_goal = 0;
> 
>         preempt_disable();
>         local = &__get_cpu_var(committed_space);
>         *local += pages;
>         if (*local > local_max || *local < local_min) {
>                 atomic_long_add(*local - local_goal, &vm_committed_space);
>                 *local = local_goal;
>         }
>         preempt_enable();
> }
> 
> But now consider if we changed the local_* variables a bit:
> 
> 	long local_min = -(ACCT_THRESHOLD*2);
> 	long local_max = 0;
> 	long local_goal = -ACCT_THRESHOLD;
> 
> We'll get some possibly *large* numbers in meminfo, but it will at least
> never underflow.
> 
> -- Dave
> 

-- 
Eric B Munson
IBM Linux Technology Center
ebmunson@us.ibm.com




* Re: [PATCH V3] Fix Committed_AS underflow
  2009-04-23 16:31     ` Eric B Munson
@ 2009-04-23 20:38         ` Dave Hansen
  0 siblings, 0 replies; 12+ messages in thread
From: Dave Hansen @ 2009-04-23 20:38 UTC (permalink / raw)
  To: Eric B Munson; +Cc: kosaki.motohiro, linux-mm, linux-kernel, mel, cl

On Thu, 2009-04-23 at 17:31 +0100, Eric B Munson wrote:
> On Mon, 20 Apr 2009, Dave Hansen wrote:
> 
> > On Mon, 2009-04-20 at 09:15 -0700, Dave Hansen wrote:
> > > On Mon, 2009-04-20 at 10:09 +0100, Eric B Munson wrote:
> > > > 1. Change NR_CPUS to min(64, NR_CPUS)
> > > >    This will limit the amount of possible skew on kernels compiled for very
> > > >    large SMP machines.  64 is an arbitrary number selected to limit the worst
> > > >    of the skew without using more cache lines.  min(64, NR_CPUS) is used
> > > >    instead of nr_online_cpus() because nr_online_cpus() requires a shared
> > > >    cache line and a call to hweight to make the calculation.  Its runtime
> > > >    overhead and keeping this counter accurate showed up in profiles and it's
> > > >    possible that nr_online_cpus() would also show.
> > 
> > Wow, that empty reply was really informative, wasn't it? :)
> > 
> > My worry with this min(64, NR_CPUS) approach is that you effectively
> > ensure that you're going to be doing a lot more cacheline bouncing, but
> > it isn't quite as explicit.
> 
> Unfortunately this is a choice we have to make, do we want to avoid cache
> line bouncing of fork-heavy workloads using more than 64 pages or bad
> information being used for overcommit decisions?

On SLES11, a new bash shell has ~9MB of mapped space.  A cat(1) has
~6.7MB.  ACCT_THRESHOLD is 64*2 pages, which with 64k pages is 8MB.  So
you're basically *guaranteed* to overflow the per-cpu counter and touch
the global one every time a shell command is run.  I'd say 8MB is a bit
on the low side.

Note that even with your suggested patch, we can still have 8GB of skew
on a 1024-way machine, since the total skew can still reach
num_online_cpus()*8MB.  This is better than what we have now, certainly.

> > Now, every time there's a mapping (or set of them) created or destroyed
> > that nets greater than 64 pages, you've got to go get a r/w cacheline to
> > a possibly highly contended atomic.  With a number this low, you're
> > almost guaranteed to hit it at fork() and exec().  Could you
> > double-check that this doesn't hurt any of the fork() AIM tests?
> 
> It is unlikely that the aim9 benchmarks would show if this patch was a
> problem because it forks in a tight loop and in a process that is not
> necessarily beig enough to hit ACCT_THRESHOLD, likely on a single CPU.
> In order to show any problems here we need a fork heavy workload with
> many threads on many CPUs.

That's true.  I'd suspect that aim is somewhere around the size of
'cat'.  It's a best-case scenario.  But, if you see degradation on the
best-case workload, we'll certainly see issues on worse workloads.

Also, I used fork as an obvious example here.  This issue will present
itself any time there is a mapping or set of mappings with a net change
in size of 8MB.

There's another alternative.  We could temporarily add statistics to
count how many times we go over ACCT_THRESHOLD to see just how bad the
situation is.  For instance, 64 causes 342.43 times as many touches of
the global counter as 8192 (or whatever it is).  You could also make
ACCT_THRESHOLD a tunable, at least for the duration of this testing.

There are also ways you can bias the global counter in controlled ways.
You could ensure, for instance, that a local counter never carries a
positive bias, so the global counter can only ever read high.  That
would guarantee no underflows, although it would still be possible to
see vm_committed_space in a temporary *inflated* state.

-- Dave



