linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* cpusets: randomize node rotor used in cpuset_mem_spread_node()
@ 2011-04-13 14:35 Michal Hocko
  2011-04-14  2:19 ` KOSAKI Motohiro
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2011-04-13 14:35 UTC (permalink / raw)
  To: LKML
  Cc: Jack Steiner, Lee Schermerhorn, Christoph Lameter, Pekka Enberg,
	Paul Menage, Robin Holt, Andrew Morton, Linus Torvalds

Hi,
I have revisited and fixed the following patch. I am keeping Jack's and
Lee's s-o-b and all others are CCed because I am not exactly sure who
worked on the patch originally.
Can we consider it for re-inclusion?
---
>From 4a2cfbdcfeae9d49ae9ff3486150bec2e4c6107e Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Wed, 13 Apr 2011 16:29:36 +0200
Subject: [PATCH] cpusets: randomize node rotor used in cpuset_mem_spread_node()

[This patch has already been accepted as 0ac0c0d but later reverted
(35926ff) because it itroduced arch specific __node_random which was
defined only for x86 code so it broke other archs.
This is a followup without any arch specific code. Other than that there
are no functional changes.]

Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems).  Part of the reason is that
the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
node 0 for newly created tasks.

This patch changes the rotor to be initialized to a random node number of
the cpuset.

[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
[mhocko@suse.cz: Make it arch independent]
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/bitmap.h   |    1 +
 include/linux/nodemask.h |    7 +++++++
 kernel/fork.c            |    4 ++++
 lib/bitmap.c             |    2 +-
 mm/mempolicy.c           |   16 ++++++++++++++++
 5 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index daf8c48..6fb2720 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -141,6 +141,7 @@ extern int bitmap_find_free_region(unsigned long *bitmap, int bits, int order);
 extern void bitmap_release_region(unsigned long *bitmap, int pos, int order);
 extern int bitmap_allocate_region(unsigned long *bitmap, int pos, int order);
 extern void bitmap_copy_le(void *dst, const unsigned long *src, int nbits);
+extern int bitmap_ord_to_pos(const unsigned long *bitmap, int n, int bits);
 
 #define BITMAP_LAST_WORD_MASK(nbits)					\
 (									\
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index dba35e4..66ccda9 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -66,6 +66,8 @@
  * int num_online_nodes()		Number of online Nodes
  * int num_possible_nodes()		Number of all possible Nodes
  *
+ * int node_random(mask)		Random node with set bit in mask
+ *
  * int node_online(node)		Is some node online?
  * int node_possible(node)		Is some node possible?
  *
@@ -430,6 +432,9 @@ static inline void node_set_offline(int nid)
 	node_clear_state(nid, N_ONLINE);
 	nr_online_nodes = num_node_state(N_ONLINE);
 }
+
+extern int node_random(const nodemask_t *maskp);
+
 #else
 
 static inline int node_state(int node, enum node_states state)
@@ -460,6 +465,8 @@ static inline int num_node_state(enum node_states state)
 
 #define node_set_online(node)	   node_set_state((node), N_ONLINE)
 #define node_set_offline(node)	   node_clear_state((node), N_ONLINE)
+
+static inline int node_random(const nodemask_t *mask) { return 0; }
 #endif
 
 #define node_online_map 	node_states[N_ONLINE]
diff --git a/kernel/fork.c b/kernel/fork.c
index e7548de..f59c686 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1119,6 +1119,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
  	}
 	mpol_fix_fork_child_flag(p);
 #endif
+#ifdef CONFIG_CPUSETS
+	p->cpuset_mem_spread_rotor = node_random(&p->mems_allowed);
+	p->cpuset_slab_spread_rotor = node_random(&p->mems_allowed);
+#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
 #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 91e0ccf..a4b5217 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -671,7 +671,7 @@ static int bitmap_pos_to_ord(const unsigned long *buf, int pos, int bits)
  *
  * The bit positions 0 through @bits are valid positions in @buf.
  */
-static int bitmap_ord_to_pos(const unsigned long *buf, int ord, int bits)
+int bitmap_ord_to_pos(const unsigned long *buf, int ord, int bits)
 {
 	int pos = 0;
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 959a8b8..8e57a72 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -93,6 +93,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/uaccess.h>
+#include <linux/random.h>
 
 #include "internal.h"
 
@@ -1649,6 +1650,21 @@ static inline unsigned interleave_nid(struct mempolicy *pol,
 		return interleave_nodes(pol);
 }
 
+/*
+ * Return the bit number of a random bit set in the nodemask.
+ * (returns -1 if nodemask is empty)
+ */
+int node_random(const nodemask_t *maskp)
+{
+	int w, bit = -1;
+
+	w = nodes_weight(*maskp);
+	if (w)
+		bit = bitmap_ord_to_pos(maskp->bits,
+			get_random_int() % w, MAX_NUMNODES);
+	return bit;
+}
+
 #ifdef CONFIG_HUGETLBFS
 /*
  * huge_zonelist(@vma, @addr, @gfp_flags, @mpol)
-- 
1.7.4.1

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-04-13 14:35 cpusets: randomize node rotor used in cpuset_mem_spread_node() Michal Hocko
@ 2011-04-14  2:19 ` KOSAKI Motohiro
  2011-04-14  6:51   ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: KOSAKI Motohiro @ 2011-04-14  2:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: kosaki.motohiro, LKML, Jack Steiner, Lee Schermerhorn,
	Christoph Lameter, Pekka Enberg, Paul Menage, Robin Holt,
	Andrew Morton, Linus Torvalds

> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 959a8b8..8e57a72 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -93,6 +93,7 @@
>  
>  #include <asm/tlbflush.h>
>  #include <asm/uaccess.h>
> +#include <linux/random.h>
>  
>  #include "internal.h"
>  
> @@ -1649,6 +1650,21 @@ static inline unsigned interleave_nid(struct mempolicy *pol,
>  		return interleave_nodes(pol);
>  }
>  
> +/*
> + * Return the bit number of a random bit set in the nodemask.
> + * (returns -1 if nodemask is empty)
> + */
> +int node_random(const nodemask_t *maskp)
> +{
> +	int w, bit = -1;
> +
> +	w = nodes_weight(*maskp);
> +	if (w)
> +		bit = bitmap_ord_to_pos(maskp->bits,
> +			get_random_int() % w, MAX_NUMNODES);
> +	return bit;
> +}
> +
>  #ifdef CONFIG_HUGETLBFS
>  /*
>   * huge_zonelist(@vma, @addr, @gfp_flags, @mpol)

mempolicy.c is no best place for putting generic nodemask utility function.
but unforunately we have no alternative. Gack.
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-04-14  2:19 ` KOSAKI Motohiro
@ 2011-04-14  6:51   ` Michal Hocko
  2011-04-14  7:01     ` KOSAKI Motohiro
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2011-04-14  6:51 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Jack Steiner, Lee Schermerhorn, Christoph Lameter,
	Pekka Enberg, Paul Menage, Robin Holt, Andrew Morton,
	Linus Torvalds

On Thu 14-04-11 11:19:24, KOSAKI Motohiro wrote:
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index 959a8b8..8e57a72 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -93,6 +93,7 @@
> >  
> >  #include <asm/tlbflush.h>
> >  #include <asm/uaccess.h>
> > +#include <linux/random.h>
> >  
> >  #include "internal.h"
> >  
> > @@ -1649,6 +1650,21 @@ static inline unsigned interleave_nid(struct mempolicy *pol,
> >  		return interleave_nodes(pol);
> >  }
> >  
> > +/*
> > + * Return the bit number of a random bit set in the nodemask.
> > + * (returns -1 if nodemask is empty)
> > + */
> > +int node_random(const nodemask_t *maskp)
> > +{
> > +	int w, bit = -1;
> > +
> > +	w = nodes_weight(*maskp);
> > +	if (w)
> > +		bit = bitmap_ord_to_pos(maskp->bits,
> > +			get_random_int() % w, MAX_NUMNODES);
> > +	return bit;
> > +}
> > +
> >  #ifdef CONFIG_HUGETLBFS
> >  /*
> >   * huge_zonelist(@vma, @addr, @gfp_flags, @mpol)
>
> mempolicy.c is no best place for putting generic nodemask utility function.
> but unforunately we have no alternative. Gack.

Agreed. I was thinking about adding a new mm/numa.c but then I concluded
that it would be overkill for the single function.

> 	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Thanks!
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-04-14  6:51   ` Michal Hocko
@ 2011-04-14  7:01     ` KOSAKI Motohiro
  2011-04-15  7:18       ` KOSAKI Motohiro
  0 siblings, 1 reply; 23+ messages in thread
From: KOSAKI Motohiro @ 2011-04-14  7:01 UTC (permalink / raw)
  To: Michal Hocko
  Cc: kosaki.motohiro, LKML, Jack Steiner, Lee Schermerhorn,
	Christoph Lameter, Pekka Enberg, Paul Menage, Robin Holt,
	Andrew Morton, Linus Torvalds

> > mempolicy.c is no best place for putting generic nodemask utility function.
> > but unforunately we have no alternative. Gack.
> 
> Agreed. I was thinking about adding a new mm/numa.c but then I concluded
> that it would be overkill for the single function.

Agreed. Your choice is best in practical. I think. ;-)




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-04-14  7:01     ` KOSAKI Motohiro
@ 2011-04-15  7:18       ` KOSAKI Motohiro
  2011-04-15  8:20         ` [PATCH v2] " Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: KOSAKI Motohiro @ 2011-04-15  7:18 UTC (permalink / raw)
  To: Michal Hocko
  Cc: kosaki.motohiro, LKML, Jack Steiner, Lee Schermerhorn,
	Christoph Lameter, Pekka Enberg, Paul Menage, Robin Holt,
	Andrew Morton, Linus Torvalds

Oops.
I should have look into !mempolicy part too.
I'm sorry.

> diff --git a/kernel/fork.c b/kernel/fork.c
> index e7548de..f59c686 100644
> --- a/kernel/fork.c
> +++ b/kernel/fork.c
> @@ -1119,6 +1119,10 @@ static struct task_struct *copy_process(unsigned long clone_flags,
>   	}
>  	mpol_fix_fork_child_flag(p);
>  #endif
> +#ifdef CONFIG_CPUSETS
> +	p->cpuset_mem_spread_rotor = node_random(&p->mems_allowed);
> +	p->cpuset_slab_spread_rotor = node_random(&p->mems_allowed);
> +#endif

Michal, I think this should be

#ifdef CONFIG_CPUSETS
	if (cpuset_do_page_mem_spread())
		p->cpuset_mem_spread_rotor = node_random(&p->mems_allowed);
	if (cpuset_do_slab_mem_spread())
		p->cpuset_slab_spread_rotor = node_random(&p->mems_allowed);
#endif

because 99.999% people don't use cpuset's spread mem/slab feature and
get_random_int() isn't zero cost.

What do you think?




^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-04-15  7:18       ` KOSAKI Motohiro
@ 2011-04-15  8:20         ` Michal Hocko
  2011-04-15  8:29           ` KOSAKI Motohiro
                             ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Michal Hocko @ 2011-04-15  8:20 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Jack Steiner, Lee Schermerhorn, Christoph Lameter,
	Pekka Enberg, Paul Menage, Robin Holt, Andrew Morton,
	Linus Torvalds, linux-mm

[I just realized that I forgot to CC mm mailing list]

On Fri 15-04-11 16:18:45, KOSAKI Motohiro wrote:
> Oops.
> I should have look into !mempolicy part too.
> I'm sorry.
> 
[...]
> Michal, I think this should be
> 
> #ifdef CONFIG_CPUSETS
> 	if (cpuset_do_page_mem_spread())
> 		p->cpuset_mem_spread_rotor = node_random(&p->mems_allowed);
> 	if (cpuset_do_slab_mem_spread())
> 		p->cpuset_slab_spread_rotor = node_random(&p->mems_allowed);
> #endif
> 
> because 99.999% people don't use cpuset's spread mem/slab feature and
> get_random_int() isn't zero cost.
> 
> What do you think?

You are right. I was thinking about lazy approach and initialize those
values when they are used for the first time. What about the patch
below?

Change from v1:
- initialize cpuset_{mem,slab}_spread_rotor lazily

---
>From 46da525789a41cb10096aa09c5bc0e7dcc3ce6db Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.cz>
Date: Wed, 13 Apr 2011 16:29:36 +0200
Subject: [PATCH] cpusets: randomize node rotor used in cpuset_mem_spread_node()

[This patch has already been accepted as 0ac0c0d but later reverted
(35926ff) because it itroduced arch specific __node_random which was
defined only for x86 code so it broke other archs.
This is a followup without any arch specific code. Other than that there
are no functional changes.]

Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems).  Part of the reason is that
the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
node 0 for newly created tasks.

This patch changes the rotor to be initialized to a random node number of
the cpuset. We are initializating it lazily in cpuset_mem_spread_node
resp. cpuset_slab_spread_node.

[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
[mhocko@suse.cz: Make it arch independent + lazy initialization]
Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Paul Menage <menage@google.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
---
 include/linux/bitmap.h   |    1 +
 include/linux/nodemask.h |    7 +++++++
 kernel/cpuset.c          |    8 ++++++++
 kernel/fork.c            |    5 +++++
 lib/bitmap.c             |    2 +-
 mm/mempolicy.c           |   16 ++++++++++++++++
 6 files changed, 38 insertions(+), 1 deletions(-)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index daf8c48..6fb2720 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -141,6 +141,7 @@ extern int bitmap_find_free_region(unsigned long *bitmap, int bits, int order);
 extern void bitmap_release_region(unsigned long *bitmap, int pos, int order);
 extern int bitmap_allocate_region(unsigned long *bitmap, int pos, int order);
 extern void bitmap_copy_le(void *dst, const unsigned long *src, int nbits);
+extern int bitmap_ord_to_pos(const unsigned long *bitmap, int n, int bits);
 
 #define BITMAP_LAST_WORD_MASK(nbits)					\
 (									\
diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
index dba35e4..66ccda9 100644
--- a/include/linux/nodemask.h
+++ b/include/linux/nodemask.h
@@ -66,6 +66,8 @@
  * int num_online_nodes()		Number of online Nodes
  * int num_possible_nodes()		Number of all possible Nodes
  *
+ * int node_random(mask)		Random node with set bit in mask
+ *
  * int node_online(node)		Is some node online?
  * int node_possible(node)		Is some node possible?
  *
@@ -430,6 +432,9 @@ static inline void node_set_offline(int nid)
 	node_clear_state(nid, N_ONLINE);
 	nr_online_nodes = num_node_state(N_ONLINE);
 }
+
+extern int node_random(const nodemask_t *maskp);
+
 #else
 
 static inline int node_state(int node, enum node_states state)
@@ -460,6 +465,8 @@ static inline int num_node_state(enum node_states state)
 
 #define node_set_online(node)	   node_set_state((node), N_ONLINE)
 #define node_set_offline(node)	   node_clear_state((node), N_ONLINE)
+
+static inline int node_random(const nodemask_t *mask) { return 0; }
 #endif
 
 #define node_online_map 	node_states[N_ONLINE]
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 33eee16..5376478 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -2465,11 +2465,19 @@ static int cpuset_spread_node(int *rotor)
 
 int cpuset_mem_spread_node(void)
 {
+	if (current->cpuset_mem_spread_rotor == -1)
+		current->cpuset_mem_spread_rotor =
+			node_random(&current->mems_allowed);
+
 	return cpuset_spread_node(&current->cpuset_mem_spread_rotor);
 }
 
 int cpuset_slab_spread_node(void)
 {
+	if (current->cpuset_slab_spread_rotor == -1)
+		current->cpuset_slab_spread_rotor
+			= node_random(&current->mems_allowed);
+
 	return cpuset_spread_node(&current->cpuset_slab_spread_rotor);
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index e7548de..0883cff 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1119,6 +1119,11 @@ static struct task_struct *copy_process(unsigned long clone_flags,
  	}
 	mpol_fix_fork_child_flag(p);
 #endif
+#ifdef CONFIG_CPUSETS
+	/* Will be initialized lazily when first used */
+	p->cpuset_mem_spread_rotor = -1;
+	p->cpuset_slab_spread_rotor = -1;
+#endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
 #ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 91e0ccf..a4b5217 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -671,7 +671,7 @@ static int bitmap_pos_to_ord(const unsigned long *buf, int pos, int bits)
  *
  * The bit positions 0 through @bits are valid positions in @buf.
  */
-static int bitmap_ord_to_pos(const unsigned long *buf, int ord, int bits)
+int bitmap_ord_to_pos(const unsigned long *buf, int ord, int bits)
 {
 	int pos = 0;
 
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 959a8b8..8e57a72 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -93,6 +93,7 @@
 
 #include <asm/tlbflush.h>
 #include <asm/uaccess.h>
+#include <linux/random.h>
 
 #include "internal.h"
 
@@ -1649,6 +1650,21 @@ static inline unsigned interleave_nid(struct mempolicy *pol,
 		return interleave_nodes(pol);
 }
 
+/*
+ * Return the bit number of a random bit set in the nodemask.
+ * (returns -1 if nodemask is empty)
+ */
+int node_random(const nodemask_t *maskp)
+{
+	int w, bit = -1;
+
+	w = nodes_weight(*maskp);
+	if (w)
+		bit = bitmap_ord_to_pos(maskp->bits,
+			get_random_int() % w, MAX_NUMNODES);
+	return bit;
+}
+
 #ifdef CONFIG_HUGETLBFS
 /*
  * huge_zonelist(@vma, @addr, @gfp_flags, @mpol)
-- 
1.7.4.1


-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-04-15  8:20         ` [PATCH v2] " Michal Hocko
@ 2011-04-15  8:29           ` KOSAKI Motohiro
  2011-04-15  8:31             ` Michal Hocko
  2011-04-15 23:42           ` David Rientjes
  2011-05-26 22:33           ` [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node() Andrew Morton
  2 siblings, 1 reply; 23+ messages in thread
From: KOSAKI Motohiro @ 2011-04-15  8:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: kosaki.motohiro, LKML, Jack Steiner, Lee Schermerhorn,
	Christoph Lameter, Pekka Enberg, Paul Menage, Robin Holt,
	Andrew Morton, Linus Torvalds, linux-mm

> [I just realized that I forgot to CC mm mailing list]
> 
> On Fri 15-04-11 16:18:45, KOSAKI Motohiro wrote:
> > Oops.
> > I should have look into !mempolicy part too.
> > I'm sorry.
> > 
> [...]
> > Michal, I think this should be
> > 
> > #ifdef CONFIG_CPUSETS
> > 	if (cpuset_do_page_mem_spread())
> > 		p->cpuset_mem_spread_rotor = node_random(&p->mems_allowed);
> > 	if (cpuset_do_slab_mem_spread())
> > 		p->cpuset_slab_spread_rotor = node_random(&p->mems_allowed);
> > #endif
> > 
> > because 99.999% people don't use cpuset's spread mem/slab feature and
> > get_random_int() isn't zero cost.
> > 
> > What do you think?
> 
> You are right. I was thinking about lazy approach and initialize those
> values when they are used for the first time. What about the patch
> below?
> 
> Change from v1:
> - initialize cpuset_{mem,slab}_spread_rotor lazily

Yeah! This is much much better than mine. Thank you!
	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-04-15  8:29           ` KOSAKI Motohiro
@ 2011-04-15  8:31             ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2011-04-15  8:31 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: LKML, Jack Steiner, Lee Schermerhorn, Christoph Lameter,
	Pekka Enberg, Paul Menage, Robin Holt, Andrew Morton,
	Linus Torvalds, linux-mm

On Fri 15-04-11 17:29:00, KOSAKI Motohiro wrote:
[...]
> > Change from v1:
> > - initialize cpuset_{mem,slab}_spread_rotor lazily
> 
> Yeah! This is much much better than mine. Thank you!
> 	Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Thank you for the careful review.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-04-15  8:20         ` [PATCH v2] " Michal Hocko
  2011-04-15  8:29           ` KOSAKI Motohiro
@ 2011-04-15 23:42           ` David Rientjes
  2011-04-18  8:42             ` [PATCH incremental] cpusets: initialize spread rotor lazily Michal Hocko
  2011-05-26 22:33           ` [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node() Andrew Morton
  2 siblings, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-04-15 23:42 UTC (permalink / raw)
  To: Michal Hocko
  Cc: KOSAKI Motohiro, LKML, Jack Steiner, Lee Schermerhorn,
	Christoph Lameter, Pekka Enberg, Paul Menage, Robin Holt,
	Andrew Morton, Linus Torvalds, linux-mm

On Fri, 15 Apr 2011, Michal Hocko wrote:

> You are right. I was thinking about lazy approach and initialize those
> values when they are used for the first time. What about the patch
> below?
> 
> Change from v1:
> - initialize cpuset_{mem,slab}_spread_rotor lazily
> 

The difference between this v2 patch and what is already in the -mm tree 
(http://userweb.kernel.org/~akpm/mmotm/broken-out/cpusets-randomize-node-rotor-used-in-cpuset_mem_spread_node.patch) 
is the lazy initialization by adding cpuset_{mem,slab}_spread_node()?

It'd probably be better to just make an incremental patch on top of 
mmotm-2011-04-14-15-08 with a new changelog and then propose with with 
your list of reviewed-by lines.

Andrew could easily drop the earlier version and merge this v2, but I'm 
asking for selfish reasons: please use NUMA_NO_NODE instead of -1.

Thanks!

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH incremental] cpusets: initialize spread rotor lazily
  2011-04-15 23:42           ` David Rientjes
@ 2011-04-18  8:42             ` Michal Hocko
  2011-04-18 20:19               ` David Rientjes
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2011-04-18  8:42 UTC (permalink / raw)
  To: Andrew Morton, David Rientjes
  Cc: KOSAKI Motohiro, LKML, Jack Steiner, Lee Schermerhorn,
	Christoph Lameter, Pekka Enberg, Paul Menage, Robin Holt,
	Linus Torvalds, linux-mm

On Fri 15-04-11 16:42:13, David Rientjes wrote:
> On Fri, 15 Apr 2011, Michal Hocko wrote:
> 
> > You are right. I was thinking about lazy approach and initialize those
> > values when they are used for the first time. What about the patch
> > below?
> > 
> > Change from v1:
> > - initialize cpuset_{mem,slab}_spread_rotor lazily
> > 
> 
> The difference between this v2 patch and what is already in the -mm tree 
> (http://userweb.kernel.org/~akpm/mmotm/broken-out/cpusets-randomize-node-rotor-used-in-cpuset_mem_spread_node.patch) 
> is the lazy initialization by adding cpuset_{mem,slab}_spread_node()?

Yes.
 
> It'd probably be better to just make an incremental patch on top of 
> mmotm-2011-04-14-15-08 with a new changelog and then propose with with 
> your list of reviewed-by lines.

Sure, no problems. Maybe it will be easier for Andrew as well.

> 
> Andrew could easily drop the earlier version and merge this v2, but I'm 
> asking for selfish reasons:

Just out of curiosity. What is the reason? Don't want to wait for new mmotm?

> please use NUMA_NO_NODE instead of -1.

Good idea. I have updated the patch.

Changes from v2:
 - use NUMA_NO_NODE rather than hardcoded -1
 - make the patch incremental to the original one because that one is in
   -mm tree already.
Changes from v1:
 - initialize cpuset_{mem,slab}_spread_rotor lazily}

[Here is the follow-up patch based on top of
http://userweb.kernel.org/~akpm/mmotm/broken-out/cpusets-randomize-node-rotor-used-in-cpuset_mem_spread_node.patch]
---
From: Michal Hocko <mhocko@suse.cz>
Subject: cpusets: initialize spread mem/slab rotor lazily

Kosaki Motohiro raised a concern that copy_process is hot path and we do
not want to initialize cpuset_{mem,slab}_spread_rotor if they are not
used most of the time.

I think that we should rather intialize it lazily when rotors are used
for the first time.
This will also catch the case when we set up spread mem/slab later.

Also do not use -1 for unitialized nodes and rather use NUMA_NO_NODE
instead.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 cpuset.c |    8 ++++++++
 fork.c   |    4 ++--
 2 files changed, 10 insertions(+), 2 deletions(-)
Index: linus_tree/kernel/cpuset.c
===================================================================
--- linus_tree.orig/kernel/cpuset.c	2011-04-18 10:33:15.000000000 +0200
+++ linus_tree/kernel/cpuset.c	2011-04-18 10:33:56.000000000 +0200
@@ -2460,11 +2460,19 @@ static int cpuset_spread_node(int *rotor
 
 int cpuset_mem_spread_node(void)
 {
+	if (current->cpuset_mem_spread_rotor == NUMA_NO_NODE)
+		current->cpuset_mem_spread_rotor =
+			node_random(&current->mems_allowed);
+
 	return cpuset_spread_node(&current->cpuset_mem_spread_rotor);
 }
 
 int cpuset_slab_spread_node(void)
 {
+	if (current->cpuset_slab_spread_rotor == NUMA_NO_NODE)
+		current->cpuset_slab_spread_rotor
+			= node_random(&current->mems_allowed);
+
 	return cpuset_spread_node(&current->cpuset_slab_spread_rotor);
 }
 
Index: linus_tree/kernel/fork.c
===================================================================
--- linus_tree.orig/kernel/fork.c	2011-04-18 10:33:15.000000000 +0200
+++ linus_tree/kernel/fork.c	2011-04-18 10:33:56.000000000 +0200
@@ -1126,8 +1126,8 @@ static struct task_struct *copy_process(
 	mpol_fix_fork_child_flag(p);
 #endif
 #ifdef CONFIG_CPUSETS
-	p->cpuset_mem_spread_rotor = node_random(&p->mems_allowed);
-	p->cpuset_slab_spread_rotor = node_random(&p->mems_allowed);
+	p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
+	p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH incremental] cpusets: initialize spread rotor lazily
  2011-04-18  8:42             ` [PATCH incremental] cpusets: initialize spread rotor lazily Michal Hocko
@ 2011-04-18 20:19               ` David Rientjes
  2011-04-18 21:29                 ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-04-18 20:19 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KOSAKI Motohiro, LKML, Jack Steiner,
	Lee Schermerhorn, Christoph Lameter, Pekka Enberg, Paul Menage,
	Robin Holt, Linus Torvalds, linux-mm

On Mon, 18 Apr 2011, Michal Hocko wrote:

> > It'd probably be better to just make an incremental patch on top of 
> > mmotm-2011-04-14-15-08 with a new changelog and then propose with with 
> > your list of reviewed-by lines.
> 
> Sure, no problems. Maybe it will be easier for Andrew as well.
> 
> > Andrew could easily drop the earlier version and merge this v2, but I'm 
> > asking for selfish reasons:
> 
> Just out of curiosity. What is the reason? Don't want to wait for new mmotm?
> 

Because lazy initialization is another feature on top of the existing 
patch so it should be done incrementally instead of proposing an entirely 
new patch which is already mostly in -mm.

> > please use NUMA_NO_NODE instead of -1.
> 
> Good idea. I have updated the patch.
> 

Thanks.

> Changes from v2:
>  - use NUMA_NO_NODE rather than hardcoded -1
>  - make the patch incremental to the original one because that one is in
>    -mm tree already.
> Changes from v1:
>  - initialize cpuset_{mem,slab}_spread_rotor lazily}
> 
> [Here is the follow-up patch based on top of
> http://userweb.kernel.org/~akpm/mmotm/broken-out/cpusets-randomize-node-rotor-used-in-cpuset_mem_spread_node.patch]
> ---
> From: Michal Hocko <mhocko@suse.cz>
> Subject: cpusets: initialize spread mem/slab rotor lazily
> 
> Kosaki Motohiro raised a concern that copy_process is hot path and we do
> not want to initialize cpuset_{mem,slab}_spread_rotor if they are not
> used most of the time.
> 
> I think that we should rather intialize it lazily when rotors are used
> for the first time.
> This will also catch the case when we set up spread mem/slab later.
> 
> Also do not use -1 for unitialized nodes and rather use NUMA_NO_NODE
> instead.
> 

Don't need to refer to a previous version that used -1 since it will never 
be committed and nobody will know what you're talking about in the git 
log.

> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
> ---
>  cpuset.c |    8 ++++++++
>  fork.c   |    4 ++--
>  2 files changed, 10 insertions(+), 2 deletions(-)
> Index: linus_tree/kernel/cpuset.c
> ===================================================================
> --- linus_tree.orig/kernel/cpuset.c	2011-04-18 10:33:15.000000000 +0200
> +++ linus_tree/kernel/cpuset.c	2011-04-18 10:33:56.000000000 +0200
> @@ -2460,11 +2460,19 @@ static int cpuset_spread_node(int *rotor
>  
>  int cpuset_mem_spread_node(void)
>  {
> +	if (current->cpuset_mem_spread_rotor == NUMA_NO_NODE)
> +		current->cpuset_mem_spread_rotor =
> +			node_random(&current->mems_allowed);
> +
>  	return cpuset_spread_node(&current->cpuset_mem_spread_rotor);
>  }
>  
>  int cpuset_slab_spread_node(void)
>  {
> +	if (current->cpuset_slab_spread_rotor == NUMA_NO_NODE)
> +		current->cpuset_slab_spread_rotor
> +			= node_random(&current->mems_allowed);
> +

So one function has the `=' on the line with the assignment (preferred) 
and the other has it on the new value?

>  	return cpuset_spread_node(&current->cpuset_slab_spread_rotor);
>  }
>  
> Index: linus_tree/kernel/fork.c
> ===================================================================
> --- linus_tree.orig/kernel/fork.c	2011-04-18 10:33:15.000000000 +0200
> +++ linus_tree/kernel/fork.c	2011-04-18 10:33:56.000000000 +0200
> @@ -1126,8 +1126,8 @@ static struct task_struct *copy_process(
>  	mpol_fix_fork_child_flag(p);
>  #endif
>  #ifdef CONFIG_CPUSETS
> -	p->cpuset_mem_spread_rotor = node_random(&p->mems_allowed);
> -	p->cpuset_slab_spread_rotor = node_random(&p->mems_allowed);
> +	p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
> +	p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
>  #endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  	p->irq_events = 0;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH incremental] cpusets: initialize spread rotor lazily
  2011-04-18 20:19               ` David Rientjes
@ 2011-04-18 21:29                 ` Michal Hocko
  2011-04-19  1:15                   ` David Rientjes
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2011-04-18 21:29 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, KOSAKI Motohiro, LKML, Jack Steiner,
	Lee Schermerhorn, Christoph Lameter, Pekka Enberg, Paul Menage,
	Robin Holt, Linus Torvalds, linux-mm

On Mon 18-04-11 13:19:09, David Rientjes wrote:
> On Mon, 18 Apr 2011, Michal Hocko wrote:
[...]
> > > Andrew could easily drop the earlier version and merge this v2, but I'm 
> > > asking for selfish reasons:
> > 
> > Just out of curiosity. What is the reason? Don't want to wait for new mmotm?
> > 
> 
> Because lazy initialization is another feature on top of the existing 
> patch so it should be done incrementally instead of proposing an entirely 
> new patch which is already mostly in -mm.

ok, makes sense

> > 
> > [Here is the follow-up patch based on top of
> > http://userweb.kernel.org/~akpm/mmotm/broken-out/cpusets-randomize-node-rotor-used-in-cpuset_mem_spread_node.patch]
> > ---
> > From: Michal Hocko <mhocko@suse.cz>
> > Subject: cpusets: initialize spread mem/slab rotor lazily
> > 
[...]
> > Also do not use -1 for unitialized nodes and rather use NUMA_NO_NODE
> > instead.
> > 
> 
> Don't need to refer to a previous version that used -1 since it will never 
> be committed and nobody will know what you're talking about in the git 
> log.

removed

[...]
> >  int cpuset_mem_spread_node(void)
> >  {
> > +	if (current->cpuset_mem_spread_rotor == NUMA_NO_NODE)
> > +		current->cpuset_mem_spread_rotor =
> > +			node_random(&current->mems_allowed);
> > +
> >  	return cpuset_spread_node(&current->cpuset_mem_spread_rotor);
> >  }
> >  
> >  int cpuset_slab_spread_node(void)
> >  {
> > +	if (current->cpuset_slab_spread_rotor == NUMA_NO_NODE)
> > +		current->cpuset_slab_spread_rotor
> > +			= node_random(&current->mems_allowed);
> > +
> 
> So one function has the `=' on the line with the assignment (preferred) 
> and the other has it on the new value?

fixed

Thanks! Updated patch bellow.

Changes from v3:
 - code style fix
Changes from v2:
 - use NUMA_NO_NODE rather than hardcoded -1
 - make the patch incremental to the original one because that one is in
   -mm tree already.
Changes from v1:
 - initialize cpuset_{mem,slab}_spread_rotor lazily}
---
From: Michal Hocko <mhocko@suse.cz>
Subject: cpusets: initialize spread mem/slab rotor lazily

Kosaki Motohiro raised a concern that copy_process is hot path and we do
not want to initialize cpuset_{mem,slab}_spread_rotor if they are not
used most of the time.

I think that we should rather initialize it lazily when rotors are used
for the first time.
This will also catch the case when we set up spread mem/slab later.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Index: linus_tree/kernel/cpuset.c
===================================================================
--- linus_tree.orig/kernel/cpuset.c	2011-04-18 10:33:15.000000000 +0200
+++ linus_tree/kernel/cpuset.c	2011-04-18 23:24:02.000000000 +0200
@@ -2460,11 +2460,19 @@ static int cpuset_spread_node(int *rotor
 
 int cpuset_mem_spread_node(void)
 {
+	if (current->cpuset_mem_spread_rotor == NUMA_NO_NODE)
+		current->cpuset_mem_spread_rotor =
+			node_random(&current->mems_allowed);
+
 	return cpuset_spread_node(&current->cpuset_mem_spread_rotor);
 }
 
 int cpuset_slab_spread_node(void)
 {
+	if (current->cpuset_slab_spread_rotor == NUMA_NO_NODE)
+		current->cpuset_slab_spread_rotor =
+			node_random(&current->mems_allowed);
+
 	return cpuset_spread_node(&current->cpuset_slab_spread_rotor);
 }
 
Index: linus_tree/kernel/fork.c
===================================================================
--- linus_tree.orig/kernel/fork.c	2011-04-18 10:33:15.000000000 +0200
+++ linus_tree/kernel/fork.c	2011-04-18 10:33:56.000000000 +0200
@@ -1126,8 +1126,8 @@ static struct task_struct *copy_process(
 	mpol_fix_fork_child_flag(p);
 #endif
 #ifdef CONFIG_CPUSETS
-	p->cpuset_mem_spread_rotor = node_random(&p->mems_allowed);
-	p->cpuset_slab_spread_rotor = node_random(&p->mems_allowed);
+	p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
+	p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
 	p->irq_events = 0;
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH incremental] cpusets: initialize spread rotor lazily
  2011-04-18 21:29                 ` Michal Hocko
@ 2011-04-19  1:15                   ` David Rientjes
  0 siblings, 0 replies; 23+ messages in thread
From: David Rientjes @ 2011-04-19  1:15 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KOSAKI Motohiro, LKML, Jack Steiner,
	Lee Schermerhorn, Christoph Lameter, Pekka Enberg, Paul Menage,
	Robin Holt, Linus Torvalds, linux-mm

On Mon, 18 Apr 2011, Michal Hocko wrote:

> From: Michal Hocko <mhocko@suse.cz>
> Subject: cpusets: initialize spread mem/slab rotor lazily
> 
> Kosaki Motohiro raised a concern that copy_process is hot path and we do
> not want to initialize cpuset_{mem,slab}_spread_rotor if they are not
> used most of the time.
> 
> I think that we should rather initialize it lazily when rotors are used
> for the first time.
> This will also catch the case when we set up spread mem/slab later.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

Acked-by: David Rientjes <rientjes@google.com>

Thanks Michal!

> Index: linus_tree/kernel/cpuset.c
> ===================================================================
> --- linus_tree.orig/kernel/cpuset.c	2011-04-18 10:33:15.000000000 +0200
> +++ linus_tree/kernel/cpuset.c	2011-04-18 23:24:02.000000000 +0200
> @@ -2460,11 +2460,19 @@ static int cpuset_spread_node(int *rotor
>  
>  int cpuset_mem_spread_node(void)
>  {
> +	if (current->cpuset_mem_spread_rotor == NUMA_NO_NODE)
> +		current->cpuset_mem_spread_rotor =
> +			node_random(&current->mems_allowed);
> +
>  	return cpuset_spread_node(&current->cpuset_mem_spread_rotor);
>  }
>  
>  int cpuset_slab_spread_node(void)
>  {
> +	if (current->cpuset_slab_spread_rotor == NUMA_NO_NODE)
> +		current->cpuset_slab_spread_rotor =
> +			node_random(&current->mems_allowed);
> +
>  	return cpuset_spread_node(&current->cpuset_slab_spread_rotor);
>  }
>  
> Index: linus_tree/kernel/fork.c
> ===================================================================
> --- linus_tree.orig/kernel/fork.c	2011-04-18 10:33:15.000000000 +0200
> +++ linus_tree/kernel/fork.c	2011-04-18 10:33:56.000000000 +0200
> @@ -1126,8 +1126,8 @@ static struct task_struct *copy_process(
>  	mpol_fix_fork_child_flag(p);
>  #endif
>  #ifdef CONFIG_CPUSETS
> -	p->cpuset_mem_spread_rotor = node_random(&p->mems_allowed);
> -	p->cpuset_slab_spread_rotor = node_random(&p->mems_allowed);
> +	p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
> +	p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
>  #endif
>  #ifdef CONFIG_TRACE_IRQFLAGS
>  	p->irq_events = 0;

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-04-15  8:20         ` [PATCH v2] " Michal Hocko
  2011-04-15  8:29           ` KOSAKI Motohiro
  2011-04-15 23:42           ` David Rientjes
@ 2011-05-26 22:33           ` Andrew Morton
  2011-05-27 12:47             ` Michal Hocko
  2 siblings, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2011-05-26 22:33 UTC (permalink / raw)
  To: Michal Hocko
  Cc: KOSAKI Motohiro, LKML, Jack Steiner, Lee Schermerhorn,
	Christoph Lameter, Pekka Enberg, Paul Menage, Robin Holt,
	Linus Torvalds, linux-mm

On Fri, 15 Apr 2011 10:20:51 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> Some workloads that create a large number of small files tend to assign
> too many pages to node 0 (multi-node systems).  Part of the reason is that
> the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
> node 0 for newly created tasks.
> 
> This patch changes the rotor to be initialized to a random node number of
> the cpuset. We are initializating it lazily in cpuset_mem_spread_node
> resp. cpuset_slab_spread_node.
> 
>
> ...
>
> --- a/kernel/cpuset.c
> +++ b/kernel/cpuset.c
> @@ -2465,11 +2465,19 @@ static int cpuset_spread_node(int *rotor)
>  
>  int cpuset_mem_spread_node(void)
>  {
> +	if (current->cpuset_mem_spread_rotor == -1)
> +		current->cpuset_mem_spread_rotor =
> +			node_random(&current->mems_allowed);
> +
>  	return cpuset_spread_node(&current->cpuset_mem_spread_rotor);
>  }
>  
>  int cpuset_slab_spread_node(void)
>  {
> +	if (current->cpuset_slab_spread_rotor == -1)
> +		current->cpuset_slab_spread_rotor
> +			= node_random(&current->mems_allowed);
> +
>  	return cpuset_spread_node(&current->cpuset_slab_spread_rotor);
>  }
>  

alpha allmodconfig:

kernel/built-in.o: In function `cpuset_slab_spread_node':
(.text+0x67360): undefined reference to `node_random'
kernel/built-in.o: In function `cpuset_slab_spread_node':
(.text+0x67368): undefined reference to `node_random'
kernel/built-in.o: In function `cpuset_mem_spread_node':
(.text+0x673b8): undefined reference to `node_random'
kernel/built-in.o: In function `cpuset_mem_spread_node':
(.text+0x673c0): undefined reference to `node_random'

because it has CONFIG_NUMA=n, CONFIG_NODES_SHIFT=7.

We use "#if MAX_NUMNODES > 1" in nodemask.h, but we use CONFIG_NUMA
when deciding to build mempolicy.o.  That's a bit odd - why didn't
nodemask.h use CONFIG_NUMA?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-05-26 22:33           ` [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node() Andrew Morton
@ 2011-05-27 12:47             ` Michal Hocko
  2011-05-27 19:07               ` David Rientjes
  2011-05-27 21:20               ` Andrew Morton
  0 siblings, 2 replies; 23+ messages in thread
From: Michal Hocko @ 2011-05-27 12:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, LKML, Jack Steiner, Lee Schermerhorn,
	Christoph Lameter, Pekka Enberg, Paul Menage, Robin Holt,
	Linus Torvalds, linux-mm

On Thu 26-05-11 15:33:19, Andrew Morton wrote:
> On Fri, 15 Apr 2011 10:20:51 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > Some workloads that create a large number of small files tend to assign
> > too many pages to node 0 (multi-node systems).  Part of the reason is that
> > the rotor (in cpuset_mem_spread_node()) used to assign nodes starts at
> > node 0 for newly created tasks.
> > 
> > This patch changes the rotor to be initialized to a random node number of
> > the cpuset. We are initializating it lazily in cpuset_mem_spread_node
> > resp. cpuset_slab_spread_node.
> > 
> >
> > ...
> >
> > --- a/kernel/cpuset.c
> > +++ b/kernel/cpuset.c
> > @@ -2465,11 +2465,19 @@ static int cpuset_spread_node(int *rotor)
> >  
> >  int cpuset_mem_spread_node(void)
> >  {
> > +	if (current->cpuset_mem_spread_rotor == -1)
> > +		current->cpuset_mem_spread_rotor =
> > +			node_random(&current->mems_allowed);
> > +
> >  	return cpuset_spread_node(&current->cpuset_mem_spread_rotor);
> >  }
> >  
> >  int cpuset_slab_spread_node(void)
> >  {
> > +	if (current->cpuset_slab_spread_rotor == -1)
> > +		current->cpuset_slab_spread_rotor
> > +			= node_random(&current->mems_allowed);
> > +
> >  	return cpuset_spread_node(&current->cpuset_slab_spread_rotor);
> >  }
> >  
> 
> alpha allmodconfig:
> 
> kernel/built-in.o: In function `cpuset_slab_spread_node':
> (.text+0x67360): undefined reference to `node_random'
> kernel/built-in.o: In function `cpuset_slab_spread_node':
> (.text+0x67368): undefined reference to `node_random'
> kernel/built-in.o: In function `cpuset_mem_spread_node':
> (.text+0x673b8): undefined reference to `node_random'
> kernel/built-in.o: In function `cpuset_mem_spread_node':
> (.text+0x673c0): undefined reference to `node_random'
> 
> because it has CONFIG_NUMA=n, CONFIG_NODES_SHIFT=7.

non-NUMA with MAX_NUMA_NODES? Hmm, really weird and looks like a numa
misuse.

> We use "#if MAX_NUMNODES > 1" in nodemask.h, but we use CONFIG_NUMA
> when deciding to build mempolicy.o.  That's a bit odd - why didn't
> nodemask.h use CONFIG_NUMA?

We have this since the kernel git age. I guess this is just for
optimizations where some functions can be NOOP when there is only one
node.

I know that this is ugly but what if we just define node_random in the
header?
---

Define node_random directly in the mempolicy header

Alpha allows a strange configuration CONFIG_NUMA=n and CONFIG_NODES_SHIFT=7
which means that mempolicy.c is not compiled and linked while we still have
MAX_NUMNODES>1 which means that node_random is not defined.

Let's move node_random definition into the header. We will be consistent with
other node_* functions.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Index: linus_tree/include/linux/nodemask.h
===================================================================
--- linus_tree.orig/include/linux/nodemask.h	2011-05-27 14:15:52.000000000 +0200
+++ linus_tree/include/linux/nodemask.h	2011-05-27 14:36:30.000000000 +0200
@@ -433,7 +433,21 @@ static inline void node_set_offline(int
 	nr_online_nodes = num_node_state(N_ONLINE);
 }
 
-extern int node_random(const nodemask_t *maskp);
+unsigned int get_random_int(void );
+/*
+ * Return the bit number of a random bit set in the nodemask.
+ * (returns -1 if nodemask is empty)
+ */
+static inline int node_random(const nodemask_t *maskp)
+{
+	int w, bit = -1;
+
+	w = nodes_weight(*maskp);
+	if (w)
+		bit = bitmap_ord_to_pos(maskp->bits,
+			get_random_int() % w, MAX_NUMNODES);
+	return bit;
+}
 
 #else
 
Index: linus_tree/mm/mempolicy.c
===================================================================
--- linus_tree.orig/mm/mempolicy.c	2011-05-27 14:16:05.000000000 +0200
+++ linus_tree/mm/mempolicy.c	2011-05-27 14:16:34.000000000 +0200
@@ -1650,21 +1650,6 @@ static inline unsigned interleave_nid(st
 		return interleave_nodes(pol);
 }
 
-/*
- * Return the bit number of a random bit set in the nodemask.
- * (returns -1 if nodemask is empty)
- */
-int node_random(const nodemask_t *maskp)
-{
-	int w, bit = -1;
-
-	w = nodes_weight(*maskp);
-	if (w)
-		bit = bitmap_ord_to_pos(maskp->bits,
-			get_random_int() % w, MAX_NUMNODES);
-	return bit;
-}
-
 #ifdef CONFIG_HUGETLBFS
 /*
  * huge_zonelist(@vma, @addr, @gfp_flags, @mpol)
-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-05-27 12:47             ` Michal Hocko
@ 2011-05-27 19:07               ` David Rientjes
  2011-05-27 23:17                 ` Michal Hocko
  2011-05-27 21:20               ` Andrew Morton
  1 sibling, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-05-27 19:07 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KOSAKI Motohiro, LKML, Jack Steiner,
	Lee Schermerhorn, Christoph Lameter, Pekka Enberg, Paul Menage,
	Robin Holt, Linus Torvalds, linux-mm

On Fri, 27 May 2011, Michal Hocko wrote:

> > alpha allmodconfig:
> > 
> > kernel/built-in.o: In function `cpuset_slab_spread_node':
> > (.text+0x67360): undefined reference to `node_random'
> > kernel/built-in.o: In function `cpuset_slab_spread_node':
> > (.text+0x67368): undefined reference to `node_random'
> > kernel/built-in.o: In function `cpuset_mem_spread_node':
> > (.text+0x673b8): undefined reference to `node_random'
> > kernel/built-in.o: In function `cpuset_mem_spread_node':
> > (.text+0x673c0): undefined reference to `node_random'
> > 
> > because it has CONFIG_NUMA=n, CONFIG_NODES_SHIFT=7.
> 
> non-NUMA with MAX_NUMA_NODES? Hmm, really weird and looks like a numa
> misuse.
> 

CONFIG_NODES_SHIFT is used for UMA machines that are using DISCONTIGMEM 
usually because they have very large holes; such machines don't need 
things like mempolicies but do need the data structures that abstract 
ranges of memory in the physical address space.  This build breakage 
probably isn't restricted to only alpha, you could probably see it with at 
least ia64 and mips as well.

> Define node_random directly in the mempolicy header
> 
> Alpha allows a strange configuration CONFIG_NUMA=n and CONFIG_NODES_SHIFT=7
> which means that mempolicy.c is not compiled and linked while we still have
> MAX_NUMNODES>1 which means that node_random is not defined.
> 

It's not just alpha, and it's not entirely strange.

> Let's move node_random definition into the header. We will be consistent with
> other node_* functions.
> 
> Signed-off-by: Michal Hocko <mhocko@suse.cz>
> Index: linus_tree/include/linux/nodemask.h
> ===================================================================
> --- linus_tree.orig/include/linux/nodemask.h	2011-05-27 14:15:52.000000000 +0200
> +++ linus_tree/include/linux/nodemask.h	2011-05-27 14:36:30.000000000 +0200
> @@ -433,7 +433,21 @@ static inline void node_set_offline(int
>  	nr_online_nodes = num_node_state(N_ONLINE);
>  }
>  
> -extern int node_random(const nodemask_t *maskp);
> +unsigned int get_random_int(void );

Spurious space.

> +/*
> + * Return the bit number of a random bit set in the nodemask.
> + * (returns -1 if nodemask is empty)
> + */
> +static inline int node_random(const nodemask_t *maskp)
> +{
> +	int w, bit = -1;
> +
> +	w = nodes_weight(*maskp);
> +	if (w)
> +		bit = bitmap_ord_to_pos(maskp->bits,
> +			get_random_int() % w, MAX_NUMNODES);
> +	return bit;
> +}
>  
>  #else
>  

Probably should have a no-op definition when MAX_NUMNODES == 1 that just 
returns 0?

> Index: linus_tree/mm/mempolicy.c
> ===================================================================
> --- linus_tree.orig/mm/mempolicy.c	2011-05-27 14:16:05.000000000 +0200
> +++ linus_tree/mm/mempolicy.c	2011-05-27 14:16:34.000000000 +0200
> @@ -1650,21 +1650,6 @@ static inline unsigned interleave_nid(st
>  		return interleave_nodes(pol);
>  }
>  
> -/*
> - * Return the bit number of a random bit set in the nodemask.
> - * (returns -1 if nodemask is empty)
> - */
> -int node_random(const nodemask_t *maskp)
> -{
> -	int w, bit = -1;
> -
> -	w = nodes_weight(*maskp);
> -	if (w)
> -		bit = bitmap_ord_to_pos(maskp->bits,
> -			get_random_int() % w, MAX_NUMNODES);
> -	return bit;
> -}
> -
>  #ifdef CONFIG_HUGETLBFS
>  /*
>   * huge_zonelist(@vma, @addr, @gfp_flags, @mpol)

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-05-27 12:47             ` Michal Hocko
  2011-05-27 19:07               ` David Rientjes
@ 2011-05-27 21:20               ` Andrew Morton
  2011-05-27 23:17                 ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2011-05-27 21:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: KOSAKI Motohiro, LKML, Jack Steiner, Lee Schermerhorn,
	Christoph Lameter, Pekka Enberg, Paul Menage, Robin Holt,
	Linus Torvalds, linux-mm

On Fri, 27 May 2011 14:47:05 +0200
Michal Hocko <mhocko@suse.cz> wrote:

> > We use "#if MAX_NUMNODES > 1" in nodemask.h, but we use CONFIG_NUMA
> > when deciding to build mempolicy.o.  That's a bit odd - why didn't
> > nodemask.h use CONFIG_NUMA?
> 
> We have this since the kernel git age. I guess this is just for
> optimizations where some functions can be NOOP when there is only one
> node.
> 
> I know that this is ugly but what if we just define node_random in the
> header?

I think I prefer this:

--- a/include/linux/nodemask.h~cpusets-randomize-node-rotor-used-in-cpuset_mem_spread_node-fix-2
+++ a/include/linux/nodemask.h
@@ -433,8 +433,6 @@ static inline void node_set_offline(int 
 	nr_online_nodes = num_node_state(N_ONLINE);
 }
 
-extern int node_random(const nodemask_t *maskp);
-
 #else
 
 static inline int node_state(int node, enum node_states state)
@@ -466,7 +464,15 @@ static inline int num_node_state(enum no
 #define node_set_online(node)	   node_set_state((node), N_ONLINE)
 #define node_set_offline(node)	   node_clear_state((node), N_ONLINE)
 
-static inline int node_random(const nodemask_t *mask) { return 0; }
+#endif
+
+#if defined(CONFIG_NUMA) && (MAX_NUMNODES > 1)
+extern int node_random(const nodemask_t *maskp);
+#else
+static inline int node_random(const nodemask_t *mask)
+{
+	return 0;
+}
 #endif
 
 #define node_online_map 	node_states[N_ONLINE]


It's beyond weird that we implement node_random() if
defined(CONFIG_NUMA) && (MAX_NUMNODES > 1), and only use it if
defined(CONFIG_CPUSETS).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-05-27 21:20               ` Andrew Morton
@ 2011-05-27 23:17                 ` Michal Hocko
  2011-05-27 23:30                   ` David Rientjes
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2011-05-27 23:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: KOSAKI Motohiro, LKML, Jack Steiner, Lee Schermerhorn,
	Christoph Lameter, Pekka Enberg, Paul Menage, Robin Holt,
	Linus Torvalds, linux-mm

On Fri 27-05-11 14:20:51, Andrew Morton wrote:
> On Fri, 27 May 2011 14:47:05 +0200
> Michal Hocko <mhocko@suse.cz> wrote:
> 
> > > We use "#if MAX_NUMNODES > 1" in nodemask.h, but we use CONFIG_NUMA
> > > when deciding to build mempolicy.o.  That's a bit odd - why didn't
> > > nodemask.h use CONFIG_NUMA?
> > 
> > We have this since the kernel git age. I guess this is just for
> > optimizations where some functions can be NOOP when there is only one
> > node.
> > 
> > I know that this is ugly but what if we just define node_random in the
> > header?
> 
> I think I prefer this:
> 
> --- a/include/linux/nodemask.h~cpusets-randomize-node-rotor-used-in-cpuset_mem_spread_node-fix-2
> +++ a/include/linux/nodemask.h
> @@ -433,8 +433,6 @@ static inline void node_set_offline(int 
>  	nr_online_nodes = num_node_state(N_ONLINE);
>  }
>  
> -extern int node_random(const nodemask_t *maskp);
> -
>  #else
>  
>  static inline int node_state(int node, enum node_states state)
> @@ -466,7 +464,15 @@ static inline int num_node_state(enum no
>  #define node_set_online(node)	   node_set_state((node), N_ONLINE)
>  #define node_set_offline(node)	   node_clear_state((node), N_ONLINE)
>  
> -static inline int node_random(const nodemask_t *mask) { return 0; }
> +#endif
> +
> +#if defined(CONFIG_NUMA) && (MAX_NUMNODES > 1)
> +extern int node_random(const nodemask_t *maskp);
> +#else
> +static inline int node_random(const nodemask_t *mask)
> +{
> +	return 0;
> +}
>  #endif

I have to admit that I quite don't understand concept of several nodes
with UMA archs but do we really want to provide the sane node all the
time?

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-05-27 19:07               ` David Rientjes
@ 2011-05-27 23:17                 ` Michal Hocko
  2011-05-27 23:27                   ` David Rientjes
  0 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2011-05-27 23:17 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, KOSAKI Motohiro, LKML, Jack Steiner,
	Lee Schermerhorn, Christoph Lameter, Pekka Enberg, Paul Menage,
	Robin Holt, Linus Torvalds, linux-mm

On Fri 27-05-11 12:07:30, David Rientjes wrote:
> On Fri, 27 May 2011, Michal Hocko wrote:
> 
> > > alpha allmodconfig:
> > > 
> > > kernel/built-in.o: In function `cpuset_slab_spread_node':
> > > (.text+0x67360): undefined reference to `node_random'
> > > kernel/built-in.o: In function `cpuset_slab_spread_node':
> > > (.text+0x67368): undefined reference to `node_random'
> > > kernel/built-in.o: In function `cpuset_mem_spread_node':
> > > (.text+0x673b8): undefined reference to `node_random'
> > > kernel/built-in.o: In function `cpuset_mem_spread_node':
> > > (.text+0x673c0): undefined reference to `node_random'
> > > 
> > > because it has CONFIG_NUMA=n, CONFIG_NODES_SHIFT=7.
> > 
> > non-NUMA with MAX_NUMA_NODES? Hmm, really weird and looks like a numa
> > misuse.
> > 
> 
> CONFIG_NODES_SHIFT is used for UMA machines that are using DISCONTIGMEM 
> usually because they have very large holes; such machines don't need 
> things like mempolicies but do need the data structures that abstract 
> ranges of memory in the physical address space.  This build breakage 
> probably isn't restricted to only alpha, you could probably see it with at 
> least ia64 and mips as well.

Hmmm. I just find strange that some UMA arch uses functions like
{first,next}_online_node.

[...]
> > +/*
> > + * Return the bit number of a random bit set in the nodemask.
> > + * (returns -1 if nodemask is empty)
> > + */
> > +static inline int node_random(const nodemask_t *maskp)
> > +{
> > +	int w, bit = -1;
> > +
> > +	w = nodes_weight(*maskp);
> > +	if (w)
> > +		bit = bitmap_ord_to_pos(maskp->bits,
> > +			get_random_int() % w, MAX_NUMNODES);
> > +	return bit;
> > +}
> >  
> >  #else
> >  
> 
> Probably should have a no-op definition when MAX_NUMNODES == 1 that just 
> returns 0?

There is one, which has been added by the patch which introduced the
function. This is just an incremental patch for the compilation fix.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-05-27 23:17                 ` Michal Hocko
@ 2011-05-27 23:27                   ` David Rientjes
  2011-05-27 23:40                     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-05-27 23:27 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KOSAKI Motohiro, LKML, Jack Steiner,
	Lee Schermerhorn, Christoph Lameter, Pekka Enberg, Paul Menage,
	Robin Holt, Linus Torvalds, linux-mm

On Sat, 28 May 2011, Michal Hocko wrote:

> > CONFIG_NODES_SHIFT is used for UMA machines that are using DISCONTIGMEM 
> > usually because they have very large holes; such machines don't need 
> > things like mempolicies but do need the data structures that abstract 
> > ranges of memory in the physical address space.  This build breakage 
> > probably isn't restricted to only alpha, you could probably see it with at 
> > least ia64 and mips as well.
> 
> Hmmm. I just find strange that some UMA arch uses functions like
> {first,next}_online_node.
> 

They shouldn't, but they do use NUMA data structures like pg_data_t for 
DISCONTIGMEM.  The MAX_NUMNODES > 1 optimization in nodemask.h is to 
prevent doing things like node_weight() on a nodemask when we know that 
only one bit will ever be set, otherwise we could make it conditional on 
CONFIG_NEED_MULTIPLE_NODES.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-05-27 23:17                 ` Michal Hocko
@ 2011-05-27 23:30                   ` David Rientjes
  2011-05-27 23:38                     ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: David Rientjes @ 2011-05-27 23:30 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, KOSAKI Motohiro, LKML, Jack Steiner,
	Lee Schermerhorn, Christoph Lameter, Pekka Enberg, Paul Menage,
	Robin Holt, Linus Torvalds, linux-mm

On Sat, 28 May 2011, Michal Hocko wrote:

> > --- a/include/linux/nodemask.h~cpusets-randomize-node-rotor-used-in-cpuset_mem_spread_node-fix-2
> > +++ a/include/linux/nodemask.h
> > @@ -433,8 +433,6 @@ static inline void node_set_offline(int 
> >  	nr_online_nodes = num_node_state(N_ONLINE);
> >  }
> >  
> > -extern int node_random(const nodemask_t *maskp);
> > -
> >  #else
> >  
> >  static inline int node_state(int node, enum node_states state)
> > @@ -466,7 +464,15 @@ static inline int num_node_state(enum no
> >  #define node_set_online(node)	   node_set_state((node), N_ONLINE)
> >  #define node_set_offline(node)	   node_clear_state((node), N_ONLINE)
> >  
> > -static inline int node_random(const nodemask_t *mask) { return 0; }
> > +#endif
> > +
> > +#if defined(CONFIG_NUMA) && (MAX_NUMNODES > 1)
> > +extern int node_random(const nodemask_t *maskp);
> > +#else
> > +static inline int node_random(const nodemask_t *mask)
> > +{
> > +	return 0;
> > +}
> >  #endif
> 
> I have to admit that I quite don't understand concept of several nodes
> with UMA archs but do we really want to provide the sane node all the
> time?
> 

They aren't nodes on UMA machines, they are memory regions for 
DISCONTIGMEM which are separated by large holes in the address space.  
These archs will never sanely use node_random(), so it doesn't really 
matter except for CONFIG_NUMA where MAX_NUMNODES > 1, since they won't be 
selecting random memory regions.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-05-27 23:30                   ` David Rientjes
@ 2011-05-27 23:38                     ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2011-05-27 23:38 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, KOSAKI Motohiro, LKML, Jack Steiner,
	Lee Schermerhorn, Christoph Lameter, Pekka Enberg, Paul Menage,
	Robin Holt, Linus Torvalds, linux-mm

On Fri 27-05-11 16:30:33, David Rientjes wrote:
> On Sat, 28 May 2011, Michal Hocko wrote:
> 
> > > --- a/include/linux/nodemask.h~cpusets-randomize-node-rotor-used-in-cpuset_mem_spread_node-fix-2
> > > +++ a/include/linux/nodemask.h
> > > @@ -433,8 +433,6 @@ static inline void node_set_offline(int 
> > >  	nr_online_nodes = num_node_state(N_ONLINE);
> > >  }
> > >  
> > > -extern int node_random(const nodemask_t *maskp);
> > > -
> > >  #else
> > >  
> > >  static inline int node_state(int node, enum node_states state)
> > > @@ -466,7 +464,15 @@ static inline int num_node_state(enum no
> > >  #define node_set_online(node)	   node_set_state((node), N_ONLINE)
> > >  #define node_set_offline(node)	   node_clear_state((node), N_ONLINE)
> > >  
> > > -static inline int node_random(const nodemask_t *mask) { return 0; }
> > > +#endif
> > > +
> > > +#if defined(CONFIG_NUMA) && (MAX_NUMNODES > 1)
> > > +extern int node_random(const nodemask_t *maskp);
> > > +#else
> > > +static inline int node_random(const nodemask_t *mask)
> > > +{
> > > +	return 0;
> > > +}
> > >  #endif
> > 
> > I have to admit that I quite don't understand concept of several nodes
> > with UMA archs but do we really want to provide the sane node all the
> > time?
> > 
> 
> They aren't nodes on UMA machines, they are memory regions for 
> DISCONTIGMEM which are separated by large holes in the address space.  
> These archs will never sanely use node_random(), so it doesn't really 
> matter except for CONFIG_NUMA where MAX_NUMNODES > 1, since they won't be 
> selecting random memory regions.

OK, I see. Then it makes much more sense. Thanks for clarification.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node()
  2011-05-27 23:27                   ` David Rientjes
@ 2011-05-27 23:40                     ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2011-05-27 23:40 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, KOSAKI Motohiro, LKML, Jack Steiner,
	Lee Schermerhorn, Christoph Lameter, Pekka Enberg, Paul Menage,
	Robin Holt, Linus Torvalds, linux-mm

On Fri 27-05-11 16:27:34, David Rientjes wrote:
> On Sat, 28 May 2011, Michal Hocko wrote:
> 
> > > CONFIG_NODES_SHIFT is used for UMA machines that are using DISCONTIGMEM 
> > > usually because they have very large holes; such machines don't need 
> > > things like mempolicies but do need the data structures that abstract 
> > > ranges of memory in the physical address space.  This build breakage 
> > > probably isn't restricted to only alpha, you could probably see it with at 
> > > least ia64 and mips as well.
> > 
> > Hmmm. I just find strange that some UMA arch uses functions like
> > {first,next}_online_node.
> > 
> 
> They shouldn't, but they do use NUMA data structures like pg_data_t for 
> DISCONTIGMEM.  The MAX_NUMNODES > 1 optimization in nodemask.h is to 
> prevent doing things like node_weight() on a nodemask when we know that 
> only one bit will ever be set, otherwise we could make it conditional on 
> CONFIG_NEED_MULTIPLE_NODES.

Thanks for the explanation.

-- 
Michal Hocko
SUSE Labs
SUSE LINUX s.r.o.
Lihovarska 1060/12
190 00 Praha 9    
Czech Republic

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2011-05-27 23:40 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-04-13 14:35 cpusets: randomize node rotor used in cpuset_mem_spread_node() Michal Hocko
2011-04-14  2:19 ` KOSAKI Motohiro
2011-04-14  6:51   ` Michal Hocko
2011-04-14  7:01     ` KOSAKI Motohiro
2011-04-15  7:18       ` KOSAKI Motohiro
2011-04-15  8:20         ` [PATCH v2] " Michal Hocko
2011-04-15  8:29           ` KOSAKI Motohiro
2011-04-15  8:31             ` Michal Hocko
2011-04-15 23:42           ` David Rientjes
2011-04-18  8:42             ` [PATCH incremental] cpusets: initialize spread rotor lazily Michal Hocko
2011-04-18 20:19               ` David Rientjes
2011-04-18 21:29                 ` Michal Hocko
2011-04-19  1:15                   ` David Rientjes
2011-05-26 22:33           ` [PATCH v2] cpusets: randomize node rotor used in cpuset_mem_spread_node() Andrew Morton
2011-05-27 12:47             ` Michal Hocko
2011-05-27 19:07               ` David Rientjes
2011-05-27 23:17                 ` Michal Hocko
2011-05-27 23:27                   ` David Rientjes
2011-05-27 23:40                     ` Michal Hocko
2011-05-27 21:20               ` Andrew Morton
2011-05-27 23:17                 ` Michal Hocko
2011-05-27 23:30                   ` David Rientjes
2011-05-27 23:38                     ` Michal Hocko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).