All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-21 16:10 ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 16:10 UTC (permalink / raw)
  To: Nick Piggin, Pekka Enberg, Christoph Lameter
  Cc: heiko.carstens, sachinp, linux-kernel, linux-mm, Mel Gorman,
	Tejun Heo, Benjamin Herrenschmidt

Currently SLQB is not allowed to be configured on PPC and S390 machines as
CPUs can belong to memoryless nodes. SLQB does not deal with this very well
and crashes reliably.

These patches fix the problem on PPC64 and it appears to be fairly stable.
At least, basic actions that were previously silently halting the machine
complete successfully. There might still be per-cpu problems as Sachin
reported the stability problems on this machine did not depend on SLQB.

Patch 1 notes that the per-node hack in SLQB only works if every node in
	the system has a CPU of the same ID. If this is not the case,
	the per-node areas are not necessarily allocated. This fix only
	applies to ppc64. It's possible that s390 needs a similar hack. The
	alternative is to statically allocate the per-node structures but
	this is both sub-optimal in terms of performance and memory usage.

Patch 2 notes that on memoryless configurations, memory is always freed
	remotely but always allocates locally and falls back to the page
	allocator on failure. This effectively is a memory leak. This patch
	records in kmem_cache_cpu what node it considers local to be either
	the real local node or the closest node available

Patch 3 allows SLQB to be configured on PPC again. It's not enabled on
	S390 because I can't test for sure on a suitable configuration there.

This is not ready for merging just yet.

It needs signed-off from the powerpc side because it's now allocating more
memory potentially (Ben?). An alternative to this patch is in V1 that
statically declares the per-node structures but this is potentially
sub-optimal but from a performance and memory utilisation perspective.

>From an SLQB side, how does patch 2 now look from a potential list-corruption
point of view (Christoph, Nick, Pekka?). Certainly this version seems a
lot more sensible than the patch in V1 because the per-cpu list is now
always being used for pages from the closest node.

It would also be nice if the S390 guys could retest as well with SLQB to see
if special action with respect to per-cpu areas is still needed.

 arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
 include/linux/slqb_def.h       |    3 +++
 init/Kconfig                   |    2 +-
 mm/slqb.c                      |   23 +++++++++++++++++------
 4 files changed, 41 insertions(+), 7 deletions(-)


^ permalink raw reply	[flat|nested] 70+ messages in thread

* [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-21 16:10 ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 16:10 UTC (permalink / raw)
  To: Nick Piggin, Pekka Enberg, Christoph Lameter
  Cc: heiko.carstens, sachinp, linux-kernel, linux-mm, Mel Gorman,
	Tejun Heo, Benjamin Herrenschmidt

Currently SLQB is not allowed to be configured on PPC and S390 machines as
CPUs can belong to memoryless nodes. SLQB does not deal with this very well
and crashes reliably.

These patches fix the problem on PPC64 and it appears to be fairly stable.
At least, basic actions that were previously silently halting the machine
complete successfully. There might still be per-cpu problems as Sachin
reported the stability problems on this machine did not depend on SLQB.

Patch 1 notes that the per-node hack in SLQB only works if every node in
	the system has a CPU of the same ID. If this is not the case,
	the per-node areas are not necessarily allocated. This fix only
	applies to ppc64. It's possible that s390 needs a similar hack. The
	alternative is to statically allocate the per-node structures but
	this is both sub-optimal in terms of performance and memory usage.

Patch 2 notes that on memoryless configurations, memory is always freed
	remotely but always allocates locally and falls back to the page
	allocator on failure. This effectively is a memory leak. This patch
	records in kmem_cache_cpu what node it considers local to be either
	the real local node or the closest node available

Patch 3 allows SLQB to be configured on PPC again. It's not enabled on
	S390 because I can't test for sure on a suitable configuration there.

This is not ready for merging just yet.

It needs signed-off from the powerpc side because it's now allocating more
memory potentially (Ben?). An alternative to this patch is in V1 that
statically declares the per-node structures but this is potentially
sub-optimal but from a performance and memory utilisation perspective.

>From an SLQB side, how does patch 2 now look from a potential list-corruption
point of view (Christoph, Nick, Pekka?). Certainly this version seems a
lot more sensible than the patch in V1 because the per-cpu list is now
always being used for pages from the closest node.

It would also be nice if the S390 guys could retest as well with SLQB to see
if special action with respect to per-cpu areas is still needed.

 arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
 include/linux/slqb_def.h       |    3 +++
 init/Kconfig                   |    2 +-
 mm/slqb.c                      |   23 +++++++++++++++++------
 4 files changed, 41 insertions(+), 7 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
  2009-09-21 16:10 ` Mel Gorman
@ 2009-09-21 16:10   ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 16:10 UTC (permalink / raw)
  To: Nick Piggin, Pekka Enberg, Christoph Lameter
  Cc: heiko.carstens, sachinp, linux-kernel, linux-mm, Mel Gorman,
	Tejun Heo, Benjamin Herrenschmidt

SLQB uses DEFINE_PER_CPU to define per-node areas. An implicit
assumption is made that all valid node IDs will have matching valid CPU
ids. In memoryless configurations, it is possible to have a node ID with
no CPU having the same ID. When this happens, a per-cpu are is not
created and the value of paca[cpu].data_offset is some random value.
This is later deferenced and the system crashes after accessing some
invalid address.

This patch hacks powerpc to allocate per-cpu areas for node IDs that
have no corresponding CPU id. This gets around the immediate problem but
it should be discussed if there is a requirement for a DEFINE_PER_NODE
and how it should be implemented.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 1f68160..a5f52d4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
 		paca[i].data_offset = ptr - __per_cpu_start;
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
 	}
+#ifdef CONFIG_SLQB
+	/* 
+	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
+	 * assumes that ever node ID will have a CPU of that ID to match.
+	 * On systems with memoryless nodes, this may not hold true. Hence,
+	 * we take a second pass initialising a "per-cpu" area for node-ids
+	 * that SLQB can use
+	 */
+	for_each_node_state(i, N_NORMAL_MEMORY) {
+
+		/* Skip node IDs that a valid CPU id exists for */
+		if (paca[i].data_offset)
+			continue;
+
+		ptr = alloc_bootmem_pages_node(NODE_DATA(cpu_to_node(i)), size);
+
+		paca[i].data_offset = ptr - __per_cpu_start;
+		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+	}
+#endif /* CONFIG_SLQB */
 }
 #endif
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
@ 2009-09-21 16:10   ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 16:10 UTC (permalink / raw)
  To: Nick Piggin, Pekka Enberg, Christoph Lameter
  Cc: heiko.carstens, sachinp, linux-kernel, linux-mm, Mel Gorman,
	Tejun Heo, Benjamin Herrenschmidt

SLQB uses DEFINE_PER_CPU to define per-node areas. An implicit
assumption is made that all valid node IDs will have matching valid CPU
ids. In memoryless configurations, it is possible to have a node ID with
no CPU having the same ID. When this happens, a per-cpu are is not
created and the value of paca[cpu].data_offset is some random value.
This is later deferenced and the system crashes after accessing some
invalid address.

This patch hacks powerpc to allocate per-cpu areas for node IDs that
have no corresponding CPU id. This gets around the immediate problem but
it should be discussed if there is a requirement for a DEFINE_PER_NODE
and how it should be implemented.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 1f68160..a5f52d4 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
 		paca[i].data_offset = ptr - __per_cpu_start;
 		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
 	}
+#ifdef CONFIG_SLQB
+	/* 
+	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
+	 * assumes that ever node ID will have a CPU of that ID to match.
+	 * On systems with memoryless nodes, this may not hold true. Hence,
+	 * we take a second pass initialising a "per-cpu" area for node-ids
+	 * that SLQB can use
+	 */
+	for_each_node_state(i, N_NORMAL_MEMORY) {
+
+		/* Skip node IDs that a valid CPU id exists for */
+		if (paca[i].data_offset)
+			continue;
+
+		ptr = alloc_bootmem_pages_node(NODE_DATA(cpu_to_node(i)), size);
+
+		paca[i].data_offset = ptr - __per_cpu_start;
+		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
+	}
+#endif /* CONFIG_SLQB */
 }
 #endif
 
-- 
1.6.3.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 2/3] slqb: Record what node is local to a kmem_cache_cpu
  2009-09-21 16:10 ` Mel Gorman
@ 2009-09-21 16:10   ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 16:10 UTC (permalink / raw)
  To: Nick Piggin, Pekka Enberg, Christoph Lameter
  Cc: heiko.carstens, sachinp, linux-kernel, linux-mm, Mel Gorman,
	Tejun Heo, Benjamin Herrenschmidt

When freeing a page, SLQB checks if the page belongs to the local node.
If it is not, it is considered a remote free. On the allocation side, it
always checks the local lists and if they are empty, the page allocator
is called. On memoryless configurations, this is effectively a memory
leak and the machine quickly kills itself in an OOM storm.

This patch records what node ID is closest to a CPU and considers that to be
the local node. As the management structure for the CPU is always allocated
from the closest node, the node the CPU structure resides on is considered
"local".

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/slqb_def.h |    3 +++
 mm/slqb.c                |   23 +++++++++++++++++------
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/slqb_def.h b/include/linux/slqb_def.h
index 1243dda..2ccbe7e 100644
--- a/include/linux/slqb_def.h
+++ b/include/linux/slqb_def.h
@@ -101,6 +101,9 @@ struct kmem_cache_cpu {
 	struct kmem_cache_list	list;		/* List for node-local slabs */
 	unsigned int		colour_next;	/* Next colour offset to use */
 
+	/* local_nid will be numa_node_id() except when memoryless */
+	unsigned int		local_nid;
+
 #ifdef CONFIG_SMP
 	/*
 	 * rlist is a list of objects that don't fit on list.freelist (ie.
diff --git a/mm/slqb.c b/mm/slqb.c
index 4ca85e2..1846480 100644
--- a/mm/slqb.c
+++ b/mm/slqb.c
@@ -1375,7 +1375,7 @@ static noinline void *__slab_alloc_page(struct kmem_cache *s,
 	if (unlikely(!page))
 		return page;
 
-	if (!NUMA_BUILD || likely(slqb_page_to_nid(page) == numa_node_id())) {
+	if (!NUMA_BUILD || likely(slqb_page_to_nid(page) == c->local_nid)) {
 		struct kmem_cache_cpu *c;
 		int cpu = smp_processor_id();
 
@@ -1501,15 +1501,16 @@ static __always_inline void *__slab_alloc(struct kmem_cache *s,
 	struct kmem_cache_cpu *c;
 	struct kmem_cache_list *l;
 
+	c = get_cpu_slab(s, smp_processor_id());
+	VM_BUG_ON(!c);
+
 #ifdef CONFIG_NUMA
-	if (unlikely(node != -1) && unlikely(node != numa_node_id())) {
+	if (unlikely(node != -1) && unlikely(node != c->local_nid)) {
 try_remote:
 		return __remote_slab_alloc(s, gfpflags, node);
 	}
 #endif
 
-	c = get_cpu_slab(s, smp_processor_id());
-	VM_BUG_ON(!c);
 	l = &c->list;
 	object = __cache_list_get_object(s, l);
 	if (unlikely(!object)) {
@@ -1518,7 +1519,7 @@ try_remote:
 			object = __slab_alloc_page(s, gfpflags, node);
 #ifdef CONFIG_NUMA
 			if (unlikely(!object)) {
-				node = numa_node_id();
+				node = c->local_nid;
 				goto try_remote;
 			}
 #endif
@@ -1733,7 +1734,7 @@ static __always_inline void __slab_free(struct kmem_cache *s,
 	slqb_stat_inc(l, FREE);
 
 	if (!NUMA_BUILD || !slab_numa(s) ||
-			likely(slqb_page_to_nid(page) == numa_node_id())) {
+			likely(slqb_page_to_nid(page) == c->local_nid)) {
 		/*
 		 * Freeing fastpath. Collects all local-node objects, not
 		 * just those allocated from our per-CPU list. This allows
@@ -1928,6 +1929,16 @@ static void init_kmem_cache_cpu(struct kmem_cache *s,
 	c->rlist.tail		= NULL;
 	c->remote_cache_list	= NULL;
 #endif
+
+	/*
+	 * Determine what the local node to this CPU is. Ordinarily
+	 * this would be cpu_to_node() but for memoryless nodes, that
+	 * is not the best value. Instead, we take the numa node that
+	 * kmem_cache_cpu is allocated from as being the best guess
+	 * as being local because it'll match what the page allocator
+	 * thinks is the most local
+	 */
+	c->local_nid = page_to_nid(virt_to_page((unsigned long)c & PAGE_MASK));
 }
 
 #ifdef CONFIG_NUMA
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 2/3] slqb: Record what node is local to a kmem_cache_cpu
@ 2009-09-21 16:10   ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 16:10 UTC (permalink / raw)
  To: Nick Piggin, Pekka Enberg, Christoph Lameter
  Cc: heiko.carstens, sachinp, linux-kernel, linux-mm, Mel Gorman,
	Tejun Heo, Benjamin Herrenschmidt

When freeing a page, SLQB checks if the page belongs to the local node.
If it is not, it is considered a remote free. On the allocation side, it
always checks the local lists and if they are empty, the page allocator
is called. On memoryless configurations, this is effectively a memory
leak and the machine quickly kills itself in an OOM storm.

This patch records what node ID is closest to a CPU and considers that to be
the local node. As the management structure for the CPU is always allocated
from the closest node, the node the CPU structure resides on is considered
"local".

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 include/linux/slqb_def.h |    3 +++
 mm/slqb.c                |   23 +++++++++++++++++------
 2 files changed, 20 insertions(+), 6 deletions(-)

diff --git a/include/linux/slqb_def.h b/include/linux/slqb_def.h
index 1243dda..2ccbe7e 100644
--- a/include/linux/slqb_def.h
+++ b/include/linux/slqb_def.h
@@ -101,6 +101,9 @@ struct kmem_cache_cpu {
 	struct kmem_cache_list	list;		/* List for node-local slabs */
 	unsigned int		colour_next;	/* Next colour offset to use */
 
+	/* local_nid will be numa_node_id() except when memoryless */
+	unsigned int		local_nid;
+
 #ifdef CONFIG_SMP
 	/*
 	 * rlist is a list of objects that don't fit on list.freelist (ie.
diff --git a/mm/slqb.c b/mm/slqb.c
index 4ca85e2..1846480 100644
--- a/mm/slqb.c
+++ b/mm/slqb.c
@@ -1375,7 +1375,7 @@ static noinline void *__slab_alloc_page(struct kmem_cache *s,
 	if (unlikely(!page))
 		return page;
 
-	if (!NUMA_BUILD || likely(slqb_page_to_nid(page) == numa_node_id())) {
+	if (!NUMA_BUILD || likely(slqb_page_to_nid(page) == c->local_nid)) {
 		struct kmem_cache_cpu *c;
 		int cpu = smp_processor_id();
 
@@ -1501,15 +1501,16 @@ static __always_inline void *__slab_alloc(struct kmem_cache *s,
 	struct kmem_cache_cpu *c;
 	struct kmem_cache_list *l;
 
+	c = get_cpu_slab(s, smp_processor_id());
+	VM_BUG_ON(!c);
+
 #ifdef CONFIG_NUMA
-	if (unlikely(node != -1) && unlikely(node != numa_node_id())) {
+	if (unlikely(node != -1) && unlikely(node != c->local_nid)) {
 try_remote:
 		return __remote_slab_alloc(s, gfpflags, node);
 	}
 #endif
 
-	c = get_cpu_slab(s, smp_processor_id());
-	VM_BUG_ON(!c);
 	l = &c->list;
 	object = __cache_list_get_object(s, l);
 	if (unlikely(!object)) {
@@ -1518,7 +1519,7 @@ try_remote:
 			object = __slab_alloc_page(s, gfpflags, node);
 #ifdef CONFIG_NUMA
 			if (unlikely(!object)) {
-				node = numa_node_id();
+				node = c->local_nid;
 				goto try_remote;
 			}
 #endif
@@ -1733,7 +1734,7 @@ static __always_inline void __slab_free(struct kmem_cache *s,
 	slqb_stat_inc(l, FREE);
 
 	if (!NUMA_BUILD || !slab_numa(s) ||
-			likely(slqb_page_to_nid(page) == numa_node_id())) {
+			likely(slqb_page_to_nid(page) == c->local_nid)) {
 		/*
 		 * Freeing fastpath. Collects all local-node objects, not
 		 * just those allocated from our per-CPU list. This allows
@@ -1928,6 +1929,16 @@ static void init_kmem_cache_cpu(struct kmem_cache *s,
 	c->rlist.tail		= NULL;
 	c->remote_cache_list	= NULL;
 #endif
+
+	/*
+	 * Determine what the local node to this CPU is. Ordinarily
+	 * this would be cpu_to_node() but for memoryless nodes, that
+	 * is not the best value. Instead, we take the numa node that
+	 * kmem_cache_cpu is allocated from as being the best guess
+	 * as being local because it'll match what the page allocator
+	 * thinks is the most local
+	 */
+	c->local_nid = page_to_nid(virt_to_page((unsigned long)c & PAGE_MASK));
 }
 
 #ifdef CONFIG_NUMA
-- 
1.6.3.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 3/3] slqb: Allow SLQB to be used on PPC
  2009-09-21 16:10 ` Mel Gorman
@ 2009-09-21 16:10   ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 16:10 UTC (permalink / raw)
  To: Nick Piggin, Pekka Enberg, Christoph Lameter
  Cc: heiko.carstens, sachinp, linux-kernel, linux-mm, Mel Gorman,
	Tejun Heo, Benjamin Herrenschmidt

SLQB was disabled on PPC as it would stab itself in the face when running
on machines with CPUs on memoryless nodes. As those configurations should
now work, allow SLQB to be configured again on PPC.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 init/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index adc10ab..8f55fde 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1033,7 +1033,7 @@ config SLUB
 
 config SLQB
 	bool "SLQB (Queued allocator)"
-	depends on !PPC && !S390
+	depends on !S390
 	help
 	  SLQB is a proposed new slab allocator.
 
-- 
1.6.3.3


^ permalink raw reply related	[flat|nested] 70+ messages in thread

* [PATCH 3/3] slqb: Allow SLQB to be used on PPC
@ 2009-09-21 16:10   ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 16:10 UTC (permalink / raw)
  To: Nick Piggin, Pekka Enberg, Christoph Lameter
  Cc: heiko.carstens, sachinp, linux-kernel, linux-mm, Mel Gorman,
	Tejun Heo, Benjamin Herrenschmidt

SLQB was disabled on PPC as it would stab itself in the face when running
on machines with CPUs on memoryless nodes. As those configurations should
now work, allow SLQB to be configured again on PPC.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 init/Kconfig |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index adc10ab..8f55fde 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1033,7 +1033,7 @@ config SLUB
 
 config SLQB
 	bool "SLQB (Queued allocator)"
-	depends on !PPC && !S390
+	depends on !S390
 	help
 	  SLQB is a proposed new slab allocator.
 
-- 
1.6.3.3

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
  2009-09-21 16:10   ` Mel Gorman
@ 2009-09-21 17:17     ` Daniel Walker
  -1 siblings, 0 replies; 70+ messages in thread
From: Daniel Walker @ 2009-09-21 17:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> SLQB uses DEFINE_PER_CPU to define per-node areas. An implicit
> assumption is made that all valid node IDs will have matching valid CPU
> ids. In memoryless configurations, it is possible to have a node ID with
> no CPU having the same ID. When this happens, a per-cpu are is not
> created and the value of paca[cpu].data_offset is some random value.
> This is later deferenced and the system crashes after accessing some
> invalid address.
> 
> This patch hacks powerpc to allocate per-cpu areas for node IDs that
> have no corresponding CPU id. This gets around the immediate problem but
> it should be discussed if there is a requirement for a DEFINE_PER_NODE
> and how it should be implemented.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
>  1 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 1f68160..a5f52d4 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
>  		paca[i].data_offset = ptr - __per_cpu_start;
>  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
>  	}
> +#ifdef CONFIG_SLQB
> +	/* 
> +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> +	 * assumes that ever node ID will have a CPU of that ID to match.
> +	 * On systems with memoryless nodes, this may not hold true. Hence,
> +	 * we take a second pass initialising a "per-cpu" area for node-ids
> +	 * that SLQB can use
> +	 */

Very trivial, but there's a little trailing whitespace in the first line
of the comment (checkpatch warns on it.) You also spelled initializing
wrong.

Daniel


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
@ 2009-09-21 17:17     ` Daniel Walker
  0 siblings, 0 replies; 70+ messages in thread
From: Daniel Walker @ 2009-09-21 17:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> SLQB uses DEFINE_PER_CPU to define per-node areas. An implicit
> assumption is made that all valid node IDs will have matching valid CPU
> ids. In memoryless configurations, it is possible to have a node ID with
> no CPU having the same ID. When this happens, a per-cpu are is not
> created and the value of paca[cpu].data_offset is some random value.
> This is later deferenced and the system crashes after accessing some
> invalid address.
> 
> This patch hacks powerpc to allocate per-cpu areas for node IDs that
> have no corresponding CPU id. This gets around the immediate problem but
> it should be discussed if there is a requirement for a DEFINE_PER_NODE
> and how it should be implemented.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
>  1 files changed, 20 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 1f68160..a5f52d4 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
>  		paca[i].data_offset = ptr - __per_cpu_start;
>  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
>  	}
> +#ifdef CONFIG_SLQB
> +	/* 
> +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> +	 * assumes that ever node ID will have a CPU of that ID to match.
> +	 * On systems with memoryless nodes, this may not hold true. Hence,
> +	 * we take a second pass initialising a "per-cpu" area for node-ids
> +	 * that SLQB can use
> +	 */

Very trivial, but there's a little trailing whitespace in the first line
of the comment (checkpatch warns on it.) You also spelled initializing
wrong.

Daniel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
  2009-09-21 17:17     ` Daniel Walker
@ 2009-09-21 17:24       ` Randy Dunlap
  -1 siblings, 0 replies; 70+ messages in thread
From: Randy Dunlap @ 2009-09-21 17:24 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Mel Gorman, Nick Piggin, Pekka Enberg, Christoph Lameter,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Mon, 21 Sep 2009 10:17:52 -0700 Daniel Walker wrote:

> On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> > SLQB uses DEFINE_PER_CPU to define per-node areas. An implicit
> > assumption is made that all valid node IDs will have matching valid CPU
> > ids. In memoryless configurations, it is possible to have a node ID with
> > no CPU having the same ID. When this happens, a per-cpu are is not
> > created and the value of paca[cpu].data_offset is some random value.
> > This is later deferenced and the system crashes after accessing some
> > invalid address.
> > 
> > This patch hacks powerpc to allocate per-cpu areas for node IDs that
> > have no corresponding CPU id. This gets around the immediate problem but
> > it should be discussed if there is a requirement for a DEFINE_PER_NODE
> > and how it should be implemented.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
> >  1 files changed, 20 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index 1f68160..a5f52d4 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
> >  		paca[i].data_offset = ptr - __per_cpu_start;
> >  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> >  	}
> > +#ifdef CONFIG_SLQB
> > +	/* 
> > +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> > +	 * assumes that ever node ID will have a CPU of that ID to match.
> > +	 * On systems with memoryless nodes, this may not hold true. Hence,
> > +	 * we take a second pass initialising a "per-cpu" area for node-ids
> > +	 * that SLQB can use
> > +	 */
> 
> Very trivial, but there's a little trailing whitespace in the first line
> of the comment (checkpatch warns on it.) You also spelled initializing
> wrong.

re: spelling.  Not really.  Think internationally.

---
~Randy
LPC 2009, Sept. 23-25, Portland, Oregon
http://linuxplumbersconf.org/2009/

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
@ 2009-09-21 17:24       ` Randy Dunlap
  0 siblings, 0 replies; 70+ messages in thread
From: Randy Dunlap @ 2009-09-21 17:24 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Mel Gorman, Nick Piggin, Pekka Enberg, Christoph Lameter,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Mon, 21 Sep 2009 10:17:52 -0700 Daniel Walker wrote:

> On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> > SLQB uses DEFINE_PER_CPU to define per-node areas. An implicit
> > assumption is made that all valid node IDs will have matching valid CPU
> > ids. In memoryless configurations, it is possible to have a node ID with
> > no CPU having the same ID. When this happens, a per-cpu are is not
> > created and the value of paca[cpu].data_offset is some random value.
> > This is later deferenced and the system crashes after accessing some
> > invalid address.
> > 
> > This patch hacks powerpc to allocate per-cpu areas for node IDs that
> > have no corresponding CPU id. This gets around the immediate problem but
> > it should be discussed if there is a requirement for a DEFINE_PER_NODE
> > and how it should be implemented.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
> >  1 files changed, 20 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index 1f68160..a5f52d4 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
> >  		paca[i].data_offset = ptr - __per_cpu_start;
> >  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> >  	}
> > +#ifdef CONFIG_SLQB
> > +	/* 
> > +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> > +	 * assumes that ever node ID will have a CPU of that ID to match.
> > +	 * On systems with memoryless nodes, this may not hold true. Hence,
> > +	 * we take a second pass initialising a "per-cpu" area for node-ids
> > +	 * that SLQB can use
> > +	 */
> 
> Very trivial, but there's a little trailing whitespace in the first line
> of the comment (checkpatch warns on it.) You also spelled initializing
> wrong.

re: spelling.  Not really.  Think internationally.

---
~Randy
LPC 2009, Sept. 23-25, Portland, Oregon
http://linuxplumbersconf.org/2009/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
  2009-09-21 17:24       ` Randy Dunlap
@ 2009-09-21 17:29         ` Daniel Walker
  -1 siblings, 0 replies; 70+ messages in thread
From: Daniel Walker @ 2009-09-21 17:29 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Mel Gorman, Nick Piggin, Pekka Enberg, Christoph Lameter,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Mon, 2009-09-21 at 10:24 -0700, Randy Dunlap wrote:
> On Mon, 21 Sep 2009 10:17:52 -0700 Daniel Walker wrote:
> 
> > On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> > > SLQB uses DEFINE_PER_CPU to define per-node areas. An implicit
> > > assumption is made that all valid node IDs will have matching valid CPU
> > > ids. In memoryless configurations, it is possible to have a node ID with
> > > no CPU having the same ID. When this happens, a per-cpu are is not
> > > created and the value of paca[cpu].data_offset is some random value.
> > > This is later deferenced and the system crashes after accessing some
> > > invalid address.
> > > 
> > > This patch hacks powerpc to allocate per-cpu areas for node IDs that
> > > have no corresponding CPU id. This gets around the immediate problem but
> > > it should be discussed if there is a requirement for a DEFINE_PER_NODE
> > > and how it should be implemented.
> > > 
> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > > ---
> > >  arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
> > >  1 files changed, 20 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > > index 1f68160..a5f52d4 100644
> > > --- a/arch/powerpc/kernel/setup_64.c
> > > +++ b/arch/powerpc/kernel/setup_64.c
> > > @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
> > >  		paca[i].data_offset = ptr - __per_cpu_start;
> > >  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> > >  	}
> > > +#ifdef CONFIG_SLQB
> > > +	/* 
> > > +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> > > +	 * assumes that ever node ID will have a CPU of that ID to match.
> > > +	 * On systems with memoryless nodes, this may not hold true. Hence,
> > > +	 * we take a second pass initialising a "per-cpu" area for node-ids
> > > +	 * that SLQB can use
> > > +	 */
> > 
> > Very trivial, but there's a little trailing whitespace in the first line
> > of the comment (checkpatch warns on it.) You also spelled initializing
> > wrong.
> 
> re: spelling.  Not really.  Think internationally.

Yeah, I realized that after I sent it .. So misspelled in the American
sense I guess.

Daniel


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
@ 2009-09-21 17:29         ` Daniel Walker
  0 siblings, 0 replies; 70+ messages in thread
From: Daniel Walker @ 2009-09-21 17:29 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Mel Gorman, Nick Piggin, Pekka Enberg, Christoph Lameter,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Mon, 2009-09-21 at 10:24 -0700, Randy Dunlap wrote:
> On Mon, 21 Sep 2009 10:17:52 -0700 Daniel Walker wrote:
> 
> > On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> > > SLQB uses DEFINE_PER_CPU to define per-node areas. An implicit
> > > assumption is made that all valid node IDs will have matching valid CPU
> > > ids. In memoryless configurations, it is possible to have a node ID with
> > > no CPU having the same ID. When this happens, a per-cpu are is not
> > > created and the value of paca[cpu].data_offset is some random value.
> > > This is later deferenced and the system crashes after accessing some
> > > invalid address.
> > > 
> > > This patch hacks powerpc to allocate per-cpu areas for node IDs that
> > > have no corresponding CPU id. This gets around the immediate problem but
> > > it should be discussed if there is a requirement for a DEFINE_PER_NODE
> > > and how it should be implemented.
> > > 
> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > > ---
> > >  arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
> > >  1 files changed, 20 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > > index 1f68160..a5f52d4 100644
> > > --- a/arch/powerpc/kernel/setup_64.c
> > > +++ b/arch/powerpc/kernel/setup_64.c
> > > @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
> > >  		paca[i].data_offset = ptr - __per_cpu_start;
> > >  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> > >  	}
> > > +#ifdef CONFIG_SLQB
> > > +	/* 
> > > +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> > > +	 * assumes that ever node ID will have a CPU of that ID to match.
> > > +	 * On systems with memoryless nodes, this may not hold true. Hence,
> > > +	 * we take a second pass initialising a "per-cpu" area for node-ids
> > > +	 * that SLQB can use
> > > +	 */
> > 
> > Very trivial, but there's a little trailing whitespace in the first line
> > of the comment (checkpatch warns on it.) You also spelled initializing
> > wrong.
> 
> re: spelling.  Not really.  Think internationally.

Yeah, I realized that after I sent it .. So misspelled in the American
sense I guess.

Daniel

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
  2009-09-21 17:17     ` Daniel Walker
@ 2009-09-21 17:42       ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 17:42 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 10:17:52AM -0700, Daniel Walker wrote:
> On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> > SLQB uses DEFINE_PER_CPU to define per-node areas. An implicit
> > assumption is made that all valid node IDs will have matching valid CPU
> > ids. In memoryless configurations, it is possible to have a node ID with
> > no CPU having the same ID. When this happens, a per-cpu are is not
> > created and the value of paca[cpu].data_offset is some random value.
> > This is later deferenced and the system crashes after accessing some
> > invalid address.
> > 
> > This patch hacks powerpc to allocate per-cpu areas for node IDs that
> > have no corresponding CPU id. This gets around the immediate problem but
> > it should be discussed if there is a requirement for a DEFINE_PER_NODE
> > and how it should be implemented.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
> >  1 files changed, 20 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index 1f68160..a5f52d4 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
> >  		paca[i].data_offset = ptr - __per_cpu_start;
> >  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> >  	}
> > +#ifdef CONFIG_SLQB
> > +	/* 
> > +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> > +	 * assumes that ever node ID will have a CPU of that ID to match.
> > +	 * On systems with memoryless nodes, this may not hold true. Hence,
> > +	 * we take a second pass initialising a "per-cpu" area for node-ids
> > +	 * that SLQB can use
> > +	 */
> 
> Very trivial, but there's a little trailing whitespace in the first line
> of the comment (checkpatch warns on it.)

D'oh, will clean it up in the next revision if the substance of the
patch is agreed upon.

> You also spelled initializing wrong.

It's spelt correctly for British English.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
@ 2009-09-21 17:42       ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 17:42 UTC (permalink / raw)
  To: Daniel Walker
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 10:17:52AM -0700, Daniel Walker wrote:
> On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> > SLQB uses DEFINE_PER_CPU to define per-node areas. An implicit
> > assumption is made that all valid node IDs will have matching valid CPU
> > ids. In memoryless configurations, it is possible to have a node ID with
> > no CPU having the same ID. When this happens, a per-cpu are is not
> > created and the value of paca[cpu].data_offset is some random value.
> > This is later deferenced and the system crashes after accessing some
> > invalid address.
> > 
> > This patch hacks powerpc to allocate per-cpu areas for node IDs that
> > have no corresponding CPU id. This gets around the immediate problem but
> > it should be discussed if there is a requirement for a DEFINE_PER_NODE
> > and how it should be implemented.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  arch/powerpc/kernel/setup_64.c |   20 ++++++++++++++++++++
> >  1 files changed, 20 insertions(+), 0 deletions(-)
> > 
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index 1f68160..a5f52d4 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
> >  		paca[i].data_offset = ptr - __per_cpu_start;
> >  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> >  	}
> > +#ifdef CONFIG_SLQB
> > +	/* 
> > +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> > +	 * assumes that ever node ID will have a CPU of that ID to match.
> > +	 * On systems with memoryless nodes, this may not hold true. Hence,
> > +	 * we take a second pass initialising a "per-cpu" area for node-ids
> > +	 * that SLQB can use
> > +	 */
> 
> Very trivial, but there's a little trailing whitespace in the first line
> of the comment (checkpatch warns on it.)

D'oh, will clean it up in the next revision if the substance of the
patch is agreed upon.

> You also spelled initializing wrong.

It's spelt correctly for British English.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-21 16:10 ` Mel Gorman
@ 2009-09-21 17:46   ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 17:46 UTC (permalink / raw)
  To: Nick Piggin, Pekka Enberg, Christoph Lameter
  Cc: heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 05:10:23PM +0100, Mel Gorman wrote:
> Currently SLQB is not allowed to be configured on PPC and S390 machines as
> CPUs can belong to memoryless nodes. SLQB does not deal with this very well
> and crashes reliably.
> 
> These patches fix the problem on PPC64 and it appears to be fairly stable.
> At least, basic actions that were previously silently halting the machine
> complete successfully.

I spoke too soon. Stress tests result in application failure, nothing to
dmesg even with the patches applied so it looks like patch 2 is still the
wrong way to fix the OOM-kill storm.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-21 17:46   ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 17:46 UTC (permalink / raw)
  To: Nick Piggin, Pekka Enberg, Christoph Lameter
  Cc: heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 05:10:23PM +0100, Mel Gorman wrote:
> Currently SLQB is not allowed to be configured on PPC and S390 machines as
> CPUs can belong to memoryless nodes. SLQB does not deal with this very well
> and crashes reliably.
> 
> These patches fix the problem on PPC64 and it appears to be fairly stable.
> At least, basic actions that were previously silently halting the machine
> complete successfully.

I spoke too soon. Stress tests result in application failure, nothing to
dmesg even with the patches applied so it looks like patch 2 is still the
wrong way to fix the OOM-kill storm.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-21 17:46   ` Mel Gorman
@ 2009-09-21 17:54     ` Christoph Lameter
  -1 siblings, 0 replies; 70+ messages in thread
From: Christoph Lameter @ 2009-09-21 17:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo, Benjamin Herrenschmidt

Lets just keep SLQB back until the basic issues with memoryless nodes are
resolved. There does not seem to be an easy way to deal with this. Some
thought needs to go into how memoryless node handling relates to per cpu
lists and locking. List handling issues need to be addressed before SLQB.
can work reliably. The same issues can surface on x86 platforms with weird
NUMA memory setups.

Or just allow SLQB for !NUMA configurations and merge it now.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-21 17:54     ` Christoph Lameter
  0 siblings, 0 replies; 70+ messages in thread
From: Christoph Lameter @ 2009-09-21 17:54 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo, Benjamin Herrenschmidt

Lets just keep SLQB back until the basic issues with memoryless nodes are
resolved. There does not seem to be an easy way to deal with this. Some
thought needs to go into how memoryless node handling relates to per cpu
lists and locking. List handling issues need to be addressed before SLQB.
can work reliably. The same issues can surface on x86 platforms with weird
NUMA memory setups.

Or just allow SLQB for !NUMA configurations and merge it now.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-21 17:54     ` Christoph Lameter
@ 2009-09-21 18:05       ` Pekka Enberg
  -1 siblings, 0 replies; 70+ messages in thread
From: Pekka Enberg @ 2009-09-21 18:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Nick Piggin, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 8:54 PM, Christoph Lameter
<cl@linux-foundation.org> wrote:
> Lets just keep SLQB back until the basic issues with memoryless nodes are
> resolved. There does not seem to be an easy way to deal with this. Some
> thought needs to go into how memoryless node handling relates to per cpu
> lists and locking. List handling issues need to be addressed before SLQB.
> can work reliably. The same issues can surface on x86 platforms with weird
> NUMA memory setups.
>
> Or just allow SLQB for !NUMA configurations and merge it now.

I'm holding on to it until the issues are resolved.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-21 18:05       ` Pekka Enberg
  0 siblings, 0 replies; 70+ messages in thread
From: Pekka Enberg @ 2009-09-21 18:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Nick Piggin, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 8:54 PM, Christoph Lameter
<cl@linux-foundation.org> wrote:
> Lets just keep SLQB back until the basic issues with memoryless nodes are
> resolved. There does not seem to be an easy way to deal with this. Some
> thought needs to go into how memoryless node handling relates to per cpu
> lists and locking. List handling issues need to be addressed before SLQB.
> can work reliably. The same issues can surface on x86 platforms with weird
> NUMA memory setups.
>
> Or just allow SLQB for !NUMA configurations and merge it now.

I'm holding on to it until the issues are resolved.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-21 17:54     ` Christoph Lameter
@ 2009-09-21 18:07       ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 18:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Pekka Enberg, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
> Lets just keep SLQB back until the basic issues with memoryless nodes are
> resolved.

It's not even super-clear that the memoryless nodes issues are entirely
related to SLQB. Sachin for example says that there was a stall issue
with memoryless nodes that could be triggered without SLQB. Sachin, is
that still accurate?

If so, it's possible that SLQB somehow exasperates the problem in some
unknown fashion.

> There does not seem to be an easy way to deal with this. Some
> thought needs to go into how memoryless node handling relates to per cpu
> lists and locking. List handling issues need to be addressed before SLQB.
> can work reliably. The same issues can surface on x86 platforms with weird
> NUMA memory setups.
> 

Can you spot if there is something fundamentally wrong with patch 2? I.e. what
is wrong with treating the closest node as local instead of only the
closest node?

> Or just allow SLQB for !NUMA configurations and merge it now.
> 

Forcing SLQB !NUMA will not rattle out any existing list issues
unfortunately :(.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-21 18:07       ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-21 18:07 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Pekka Enberg, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
> Lets just keep SLQB back until the basic issues with memoryless nodes are
> resolved.

It's not even super-clear that the memoryless nodes issues are entirely
related to SLQB. Sachin for example says that there was a stall issue
with memoryless nodes that could be triggered without SLQB. Sachin, is
that still accurate?

If so, it's possible that SLQB somehow exasperates the problem in some
unknown fashion.

> There does not seem to be an easy way to deal with this. Some
> thought needs to go into how memoryless node handling relates to per cpu
> lists and locking. List handling issues need to be addressed before SLQB.
> can work reliably. The same issues can surface on x86 platforms with weird
> NUMA memory setups.
> 

Can you spot if there is something fundamentally wrong with patch 2? I.e. what
is wrong with treating the closest node as local instead of only the
closest node?

> Or just allow SLQB for !NUMA configurations and merge it now.
> 

Forcing SLQB !NUMA will not rattle out any existing list issues
unfortunately :(.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-21 18:07       ` Mel Gorman
@ 2009-09-21 18:17         ` Christoph Lameter
  -1 siblings, 0 replies; 70+ messages in thread
From: Christoph Lameter @ 2009-09-21 18:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Mon, 21 Sep 2009, Mel Gorman wrote:
> Can you spot if there is something fundamentally wrong with patch 2? I.e. what
> is wrong with treating the closest node as local instead of only the
> closest node?

Depends on the way locking is done for percpu queues (likely lockless).
A misidentification of the numa locality of an object may result in locks
not being taken that should have been taken.

> > Or just allow SLQB for !NUMA configurations and merge it now.
> >
>
> Forcing SLQB !NUMA will not rattle out any existing list issues
> unfortunately :(.

But it will make SLQB work right in permitted configurations. The NUMA
issues can then be fixed later upstream.



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-21 18:17         ` Christoph Lameter
  0 siblings, 0 replies; 70+ messages in thread
From: Christoph Lameter @ 2009-09-21 18:17 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Mon, 21 Sep 2009, Mel Gorman wrote:
> Can you spot if there is something fundamentally wrong with patch 2? I.e. what
> is wrong with treating the closest node as local instead of only the
> closest node?

Depends on the way locking is done for percpu queues (likely lockless).
A misidentification of the numa locality of an object may result in locks
not being taken that should have been taken.

> > Or just allow SLQB for !NUMA configurations and merge it now.
> >
>
> Forcing SLQB !NUMA will not rattle out any existing list issues
> unfortunately :(.

But it will make SLQB work right in permitted configurations. The NUMA
issues can then be fixed later upstream.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-21 16:10 ` Mel Gorman
@ 2009-09-22  0:00   ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 70+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-22  0:00 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Tejun Heo

On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> 
> It needs signed-off from the powerpc side because it's now allocating
> more
> memory potentially (Ben?). An alternative to this patch is in V1 that
> statically declares the per-node structures but this is potentially
> sub-optimal but from a performance and memory utilisation perspective.

So if I understand correctly, we have a problem with both cpu-less and
memory-less nodes. Interesting setups :-)

I have no strong objection on the allocating of the per-cpu data for
the cpu-less nodes. However, I wonder if we should do that a bit more
nicely, maybe with some kind of "adjusted" cpu_possible_mask() (could be
something like cpu_node_valid_mask or similar) to be used by percpu.

Mostly because it would be nice to have built-in debug features in
per-cpu and in that case, it would need some way to know a valid
number from an invalid one). Either that or just keep track of the
mask of cpus that had percpu data allocated to them

Cheers,
Ben.



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22  0:00   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 70+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-22  0:00 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Tejun Heo

On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> 
> It needs signed-off from the powerpc side because it's now allocating
> more
> memory potentially (Ben?). An alternative to this patch is in V1 that
> statically declares the per-node structures but this is potentially
> sub-optimal but from a performance and memory utilisation perspective.

So if I understand correctly, we have a problem with both cpu-less and
memory-less nodes. Interesting setups :-)

I have no strong objection on the allocating of the per-cpu data for
the cpu-less nodes. However, I wonder if we should do that a bit more
nicely, maybe with some kind of "adjusted" cpu_possible_mask() (could be
something like cpu_node_valid_mask or similar) to be used by percpu.

Mostly because it would be nice to have built-in debug features in
per-cpu and in that case, it would need some way to know a valid
number from an invalid one). Either that or just keep track of the
mask of cpus that had percpu data allocated to them

Cheers,
Ben.


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
  2009-09-21 16:10   ` Mel Gorman
@ 2009-09-22  0:01     ` Tejun Heo
  -1 siblings, 0 replies; 70+ messages in thread
From: Tejun Heo @ 2009-09-22  0:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Benjamin Herrenschmidt

Hello,

Mel Gorman wrote:
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 1f68160..a5f52d4 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
>  		paca[i].data_offset = ptr - __per_cpu_start;
>  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
>  	}
> +#ifdef CONFIG_SLQB
> +	/* 
> +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> +	 * assumes that ever node ID will have a CPU of that ID to match.
> +	 * On systems with memoryless nodes, this may not hold true. Hence,
> +	 * we take a second pass initialising a "per-cpu" area for node-ids
> +	 * that SLQB can use
> +	 */
> +	for_each_node_state(i, N_NORMAL_MEMORY) {
> +
> +		/* Skip node IDs that a valid CPU id exists for */
> +		if (paca[i].data_offset)
> +			continue;
> +
> +		ptr = alloc_bootmem_pages_node(NODE_DATA(cpu_to_node(i)), size);
> +
> +		paca[i].data_offset = ptr - __per_cpu_start;
> +		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> +	}
> +#endif /* CONFIG_SLQB */
>  }
>  #endif

Eh... I don't know.  This seems too hacky to me.  Why not just
allocate pointer array of MAX_NUMNODES and allocate per-node memory
there?  This will be slightly more expensive but I doubt it will be
noticeable.  The only extra overhead is the cachline footprint for the
extra array.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
@ 2009-09-22  0:01     ` Tejun Heo
  0 siblings, 0 replies; 70+ messages in thread
From: Tejun Heo @ 2009-09-22  0:01 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Benjamin Herrenschmidt

Hello,

Mel Gorman wrote:
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index 1f68160..a5f52d4 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
>  		paca[i].data_offset = ptr - __per_cpu_start;
>  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
>  	}
> +#ifdef CONFIG_SLQB
> +	/* 
> +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> +	 * assumes that ever node ID will have a CPU of that ID to match.
> +	 * On systems with memoryless nodes, this may not hold true. Hence,
> +	 * we take a second pass initialising a "per-cpu" area for node-ids
> +	 * that SLQB can use
> +	 */
> +	for_each_node_state(i, N_NORMAL_MEMORY) {
> +
> +		/* Skip node IDs that a valid CPU id exists for */
> +		if (paca[i].data_offset)
> +			continue;
> +
> +		ptr = alloc_bootmem_pages_node(NODE_DATA(cpu_to_node(i)), size);
> +
> +		paca[i].data_offset = ptr - __per_cpu_start;
> +		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> +	}
> +#endif /* CONFIG_SLQB */
>  }
>  #endif

Eh... I don't know.  This seems too hacky to me.  Why not just
allocate pointer array of MAX_NUMNODES and allocate per-node memory
there?  This will be slightly more expensive but I doubt it will be
noticeable.  The only extra overhead is the cachline footprint for the
extra array.

Thanks.

-- 
tejun

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22  0:00   ` Benjamin Herrenschmidt
@ 2009-09-22  0:19     ` David Rientjes
  -1 siblings, 0 replies; 70+ messages in thread
From: David Rientjes @ 2009-09-22  0:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Mel Gorman, Nick Piggin, Pekka Enberg, Christoph Lameter,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Lee Schermerhorn

On Tue, 22 Sep 2009, Benjamin Herrenschmidt wrote:

> So if I understand correctly, we have a problem with both cpu-less and
> memory-less nodes. Interesting setups :-)
> 

I agree with Christoph that we need to resolve the larger kernel issue of 
memoryless nodes in the kernel and the result of that work will most 
likely become the basis from which the slqb fixes originate.

I disagree that we need kernel support for memoryless nodes on x86 and 
probably on all architectures period.  "NUMA nodes" will always contain 
memory by definition and I think hijacking the node abstraction away from 
representing anything but memory affinity is wrong in the interest of a 
long-term maintainable kernel and will continue to cause issues such as 
this in other subsystems.

I do understand the asymmetries of these machines, including the ppc that 
is triggering this particular hang with slqb.  But I believe the support 
can be implemented in a different way: I would offer an alternative 
representation based entirely on node distances.  This would isolate each 
region of memory that has varying affinity to cpus, pci busses, etc., into 
nodes and then report a distance, whether local or remote, to other nodes 
much in the way the ACPI specification does with proximity domains.

Using node distances instead of memoryless nodes would still be able to 
represent all asymmetric machines that currently benefit from the support 
by binding devices to memory regions to which they have the closest 
affinity and then reporting relative distances to other nodes via 
node_distance().

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22  0:19     ` David Rientjes
  0 siblings, 0 replies; 70+ messages in thread
From: David Rientjes @ 2009-09-22  0:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Mel Gorman, Nick Piggin, Pekka Enberg, Christoph Lameter,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Lee Schermerhorn

On Tue, 22 Sep 2009, Benjamin Herrenschmidt wrote:

> So if I understand correctly, we have a problem with both cpu-less and
> memory-less nodes. Interesting setups :-)
> 

I agree with Christoph that we need to resolve the larger kernel issue of 
memoryless nodes in the kernel and the result of that work will most 
likely become the basis from which the slqb fixes originate.

I disagree that we need kernel support for memoryless nodes on x86 and 
probably on all architectures period.  "NUMA nodes" will always contain 
memory by definition and I think hijacking the node abstraction away from 
representing anything but memory affinity is wrong in the interest of a 
long-term maintainable kernel and will continue to cause issues such as 
this in other subsystems.

I do understand the asymmetries of these machines, including the ppc that 
is triggering this particular hang with slqb.  But I believe the support 
can be implemented in a different way: I would offer an alternative 
representation based entirely on node distances.  This would isolate each 
region of memory that has varying affinity to cpus, pci busses, etc., into 
nodes and then report a distance, whether local or remote, to other nodes 
much in the way the ACPI specification does with proximity domains.

Using node distances instead of memoryless nodes would still be able to 
represent all asymmetric machines that currently benefit from the support 
by binding devices to memory regions to which they have the closest 
affinity and then reporting relative distances to other nodes via 
node_distance().

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-21 18:07       ` Mel Gorman
@ 2009-09-22  5:03         ` Sachin Sant
  -1 siblings, 0 replies; 70+ messages in thread
From: Sachin Sant @ 2009-09-22  5:03 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Lameter, Nick Piggin, Pekka Enberg, heiko.carstens,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

Mel Gorman wrote:
> On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
>   
>> Lets just keep SLQB back until the basic issues with memoryless nodes are
>> resolved.
>>     
>
> It's not even super-clear that the memoryless nodes issues are entirely
> related to SLQB. Sachin for example says that there was a stall issue
> with memoryless nodes that could be triggered without SLQB. Sachin, is
> that still accurate?
>   
I think there are two different problems that we are dealing with.

First one is the SLQB not working on a ppc64 box which seems to be specific
to only one machine and i haven't seen that on other power boxes.The patches
that you have posted seems to allow the box to boot, but eventually it hits
the stall issue(related to percpu dynamic allocator not working on ppc64),
which is the second problem we are dealing with.

The stall issue seems to be much more critical as it is affecting almost
all of the power boxes that i have tested with (4 in all).
This issue is seen with Linus tree as well and was first seen with
2.6.31-git5 (0cb583fd..) 

The stall issue was reported here:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html

Thanks
-Sachin


> If so, it's possible that SLQB somehow exasperates the problem in some
> unknown fashion.
>
>   
>> There does not seem to be an easy way to deal with this. Some
>> thought needs to go into how memoryless node handling relates to per cpu
>> lists and locking. List handling issues need to be addressed before SLQB.
>> can work reliably. The same issues can surface on x86 platforms with weird
>> NUMA memory setups.
>>
>>     
>
> Can you spot if there is something fundamentally wrong with patch 2? I.e. what
> is wrong with treating the closest node as local instead of only the
> closest node?
>
>   
>> Or just allow SLQB for !NUMA configurations and merge it now.
>>
>>     
>
> Forcing SLQB !NUMA will not rattle out any existing list issues
> unfortunately :(.
>
>   


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22  5:03         ` Sachin Sant
  0 siblings, 0 replies; 70+ messages in thread
From: Sachin Sant @ 2009-09-22  5:03 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Lameter, Nick Piggin, Pekka Enberg, heiko.carstens,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

Mel Gorman wrote:
> On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
>   
>> Lets just keep SLQB back until the basic issues with memoryless nodes are
>> resolved.
>>     
>
> It's not even super-clear that the memoryless nodes issues are entirely
> related to SLQB. Sachin for example says that there was a stall issue
> with memoryless nodes that could be triggered without SLQB. Sachin, is
> that still accurate?
>   
I think there are two different problems that we are dealing with.

First one is the SLQB not working on a ppc64 box which seems to be specific
to only one machine and i haven't seen that on other power boxes.The patches
that you have posted seems to allow the box to boot, but eventually it hits
the stall issue(related to percpu dynamic allocator not working on ppc64),
which is the second problem we are dealing with.

The stall issue seems to be much more critical as it is affecting almost
all of the power boxes that i have tested with (4 in all).
This issue is seen with Linus tree as well and was first seen with
2.6.31-git5 (0cb583fd..) 

The stall issue was reported here:
http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html

Thanks
-Sachin


> If so, it's possible that SLQB somehow exasperates the problem in some
> unknown fashion.
>
>   
>> There does not seem to be an easy way to deal with this. Some
>> thought needs to go into how memoryless node handling relates to per cpu
>> lists and locking. List handling issues need to be addressed before SLQB.
>> can work reliably. The same issues can surface on x86 platforms with weird
>> NUMA memory setups.
>>
>>     
>
> Can you spot if there is something fundamentally wrong with patch 2? I.e. what
> is wrong with treating the closest node as local instead of only the
> closest node?
>
>   
>> Or just allow SLQB for !NUMA configurations and merge it now.
>>
>>     
>
> Forcing SLQB !NUMA will not rattle out any existing list issues
> unfortunately :(.
>
>   


-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22  0:19     ` David Rientjes
@ 2009-09-22  6:30       ` Christoph Lameter
  -1 siblings, 0 replies; 70+ messages in thread
From: Christoph Lameter @ 2009-09-22  6:30 UTC (permalink / raw)
  To: David Rientjes
  Cc: Benjamin Herrenschmidt, Mel Gorman, Nick Piggin, Pekka Enberg,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Lee Schermerhorn

On Mon, 21 Sep 2009, David Rientjes wrote:

> I disagree that we need kernel support for memoryless nodes on x86 and
> probably on all architectures period.  "NUMA nodes" will always contain
> memory by definition and I think hijacking the node abstraction away from
> representing anything but memory affinity is wrong in the interest of a
> long-term maintainable kernel and will continue to cause issues such as
> this in other subsystems.

Amen. Sadly my past opinions on this did not seem convincing enough.

> I do understand the asymmetries of these machines, including the ppc that
> is triggering this particular hang with slqb.  But I believe the support
> can be implemented in a different way: I would offer an alternative
> representation based entirely on node distances.  This would isolate each
> region of memory that has varying affinity to cpus, pci busses, etc., into
> nodes and then report a distance, whether local or remote, to other nodes
> much in the way the ACPI specification does with proximity domains.

Good idea.

> Using node distances instead of memoryless nodes would still be able to
> represent all asymmetric machines that currently benefit from the support
> by binding devices to memory regions to which they have the closest
> affinity and then reporting relative distances to other nodes via
> node_distance().

How would you deal with a memoryless node that has lets say 4 processors
and some I/O devices? Now the memory policy is round robin and there are 4
nodes at the same distance with 4G memory each. Does one of the nodes now
become priviledged under your plan? How do you equally use memory from all
these nodes?



^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22  6:30       ` Christoph Lameter
  0 siblings, 0 replies; 70+ messages in thread
From: Christoph Lameter @ 2009-09-22  6:30 UTC (permalink / raw)
  To: David Rientjes
  Cc: Benjamin Herrenschmidt, Mel Gorman, Nick Piggin, Pekka Enberg,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Lee Schermerhorn

On Mon, 21 Sep 2009, David Rientjes wrote:

> I disagree that we need kernel support for memoryless nodes on x86 and
> probably on all architectures period.  "NUMA nodes" will always contain
> memory by definition and I think hijacking the node abstraction away from
> representing anything but memory affinity is wrong in the interest of a
> long-term maintainable kernel and will continue to cause issues such as
> this in other subsystems.

Amen. Sadly my past opinions on this did not seem convincing enough.

> I do understand the asymmetries of these machines, including the ppc that
> is triggering this particular hang with slqb.  But I believe the support
> can be implemented in a different way: I would offer an alternative
> representation based entirely on node distances.  This would isolate each
> region of memory that has varying affinity to cpus, pci busses, etc., into
> nodes and then report a distance, whether local or remote, to other nodes
> much in the way the ACPI specification does with proximity domains.

Good idea.

> Using node distances instead of memoryless nodes would still be able to
> represent all asymmetric machines that currently benefit from the support
> by binding devices to memory regions to which they have the closest
> affinity and then reporting relative distances to other nodes via
> node_distance().

How would you deal with a memoryless node that has lets say 4 processors
and some I/O devices? Now the memory policy is round robin and there are 4
nodes at the same distance with 4G memory each. Does one of the nodes now
become priviledged under your plan? How do you equally use memory from all
these nodes?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22  6:30       ` Christoph Lameter
@ 2009-09-22  7:59         ` David Rientjes
  -1 siblings, 0 replies; 70+ messages in thread
From: David Rientjes @ 2009-09-22  7:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Benjamin Herrenschmidt, Mel Gorman, Nick Piggin, Pekka Enberg,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Lee Schermerhorn

On Tue, 22 Sep 2009, Christoph Lameter wrote:

> How would you deal with a memoryless node that has lets say 4 processors
> and some I/O devices? Now the memory policy is round robin and there are 4
> nodes at the same distance with 4G memory each. Does one of the nodes now
> become priviledged under your plan? How do you equally use memory from all
> these nodes?
> 

If the distance between the memoryless node with the cpus/devices and all 
4G nodes is the same, then this is UMA and no abstraction is necessary: 
there's no reason to support interleaving of memory allocations amongst 
four different regions of memory if there's no difference in latencies to 
those regions.

It is possible, however, to have a system configured in such a way that 
representing all devices, including memory, at a single level of 
abstraction isn't possible.  An example is a four cpu system where cpus 
0-1 have local distance to all memory and cpus 2-3 have remote distance.

A solution would be to abstract everything into "system localities" like 
the ACPI specification does.  These localities in my plan are slightly 
different, though: they are limited to only a single class of device.

A locality is simply an aggregate of a particular type of device; a device 
is bound to a locality if it shares the same proximity as all other 
devices in that locality to all other localities.  In other words, the  
previous example would have two cpu localities: one with cpus 0-1 and one 
with cpus 2-3.  If cpu 0 had a different proximity than cpu 1 to a pci 
bus, however, there would be three cpu localities.

The equivalent of proximity domains then describes the distance between 
all localities; these distances need not be one-way, it is possible for 
distance in one direction to be different from the opposite direction, 
just as ACPI pxm's allow.

A "node" in this plan is simply a system locality consisting of memory.

For subsystems such as slab allocators, all we require is cpu_to_node() 
tables which would map cpu localities to nodes and describe them in terms 
of local or remote distance (or whatever the SLIT says, if provided).  All 
present day information can still be represented in this model, we've just 
added additional layers of abstraction internally.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22  7:59         ` David Rientjes
  0 siblings, 0 replies; 70+ messages in thread
From: David Rientjes @ 2009-09-22  7:59 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Benjamin Herrenschmidt, Mel Gorman, Nick Piggin, Pekka Enberg,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Lee Schermerhorn

On Tue, 22 Sep 2009, Christoph Lameter wrote:

> How would you deal with a memoryless node that has lets say 4 processors
> and some I/O devices? Now the memory policy is round robin and there are 4
> nodes at the same distance with 4G memory each. Does one of the nodes now
> become priviledged under your plan? How do you equally use memory from all
> these nodes?
> 

If the distance between the memoryless node with the cpus/devices and all 
4G nodes is the same, then this is UMA and no abstraction is necessary: 
there's no reason to support interleaving of memory allocations amongst 
four different regions of memory if there's no difference in latencies to 
those regions.

It is possible, however, to have a system configured in such a way that 
representing all devices, including memory, at a single level of 
abstraction isn't possible.  An example is a four cpu system where cpus 
0-1 have local distance to all memory and cpus 2-3 have remote distance.

A solution would be to abstract everything into "system localities" like 
the ACPI specification does.  These localities in my plan are slightly 
different, though: they are limited to only a single class of device.

A locality is simply an aggregate of a particular type of device; a device 
is bound to a locality if it shares the same proximity as all other 
devices in that locality to all other localities.  In other words, the  
previous example would have two cpu localities: one with cpus 0-1 and one 
with cpus 2-3.  If cpu 0 had a different proximity than cpu 1 to a pci 
bus, however, there would be three cpu localities.

The equivalent of proximity domains then describes the distance between 
all localities; these distances need not be one-way, it is possible for 
distance in one direction to be different from the opposite direction, 
just as ACPI pxm's allow.

A "node" in this plan is simply a system locality consisting of memory.

For subsystems such as slab allocators, all we require is cpu_to_node() 
tables which would map cpu localities to nodes and describe them in terms 
of local or remote distance (or whatever the SLIT says, if provided).  All 
present day information can still be represented in this model, we've just 
added additional layers of abstraction internally.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22  7:59         ` David Rientjes
@ 2009-09-22  8:11           ` Benjamin Herrenschmidt
  -1 siblings, 0 replies; 70+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-22  8:11 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Mel Gorman, Nick Piggin, Pekka Enberg,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Lee Schermerhorn

On Tue, 2009-09-22 at 00:59 -0700, David Rientjes wrote:
> The equivalent of proximity domains then describes the distance between 
> all localities; these distances need not be one-way, it is possible for 
> distance in one direction to be different from the opposite direction, 
> just as ACPI pxm's allow.
> 
> A "node" in this plan is simply a system locality consisting of memory.
> 
> For subsystems such as slab allocators, all we require is cpu_to_node() 
> tables which would map cpu localities to nodes and describe them in terms 
> of local or remote distance (or whatever the SLIT says, if provided).  All 
> present day information can still be represented in this model, we've just 
> added additional layers of abstraction internally.

While I like the idea of NUMA nodes being strictly memory and everything
else being expressed by distances, we'll have to clean up quite a few
corners with skeletons in various states of decompositions waiting for
us there.

For example, we have code here or there that (ab)uses the NUMA node
information to link devices with their iommu, that sort of thing. IE, a
hard dependency which isn't really related to a concept of distance to
any memory.

At least on powerpc, nowadays, I can pretty much make everything
fallback to some representation in the device-tree though, thus it
shouldn't be -that- hard to fix I suppose.

Cheers,
Ben.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22  8:11           ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 70+ messages in thread
From: Benjamin Herrenschmidt @ 2009-09-22  8:11 UTC (permalink / raw)
  To: David Rientjes
  Cc: Christoph Lameter, Mel Gorman, Nick Piggin, Pekka Enberg,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Lee Schermerhorn

On Tue, 2009-09-22 at 00:59 -0700, David Rientjes wrote:
> The equivalent of proximity domains then describes the distance between 
> all localities; these distances need not be one-way, it is possible for 
> distance in one direction to be different from the opposite direction, 
> just as ACPI pxm's allow.
> 
> A "node" in this plan is simply a system locality consisting of memory.
> 
> For subsystems such as slab allocators, all we require is cpu_to_node() 
> tables which would map cpu localities to nodes and describe them in terms 
> of local or remote distance (or whatever the SLIT says, if provided).  All 
> present day information can still be represented in this model, we've just 
> added additional layers of abstraction internally.

While I like the idea of NUMA nodes being strictly memory and everything
else being expressed by distances, we'll have to clean up quite a few
corners with skeletons in various states of decompositions waiting for
us there.

For example, we have code here or there that (ab)uses the NUMA node
information to link devices with their iommu, that sort of thing. IE, a
hard dependency which isn't really related to a concept of distance to
any memory.

At least on powerpc, nowadays, I can pretty much make everything
fallback to some representation in the device-tree though, thus it
shouldn't be -that- hard to fix I suppose.

Cheers,
Ben.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22  8:11           ` Benjamin Herrenschmidt
@ 2009-09-22  8:44             ` David Rientjes
  -1 siblings, 0 replies; 70+ messages in thread
From: David Rientjes @ 2009-09-22  8:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Lameter, Mel Gorman, Nick Piggin, Pekka Enberg,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Lee Schermerhorn

On Tue, 22 Sep 2009, Benjamin Herrenschmidt wrote:

> While I like the idea of NUMA nodes being strictly memory and everything
> else being expressed by distances, we'll have to clean up quite a few
> corners with skeletons in various states of decompositions waiting for
> us there.
> 

Agreed, it's invasive.

> For example, we have code here or there that (ab)uses the NUMA node
> information to link devices with their iommu, that sort of thing. IE, a
> hard dependency which isn't really related to a concept of distance to
> any memory.
> 

ACPI's slit uses a distance of 0xff to specify that one locality is 
unreachable from another.  We could easily adopt that convention.

> At least on powerpc, nowadays, I can pretty much make everything
> fallback to some representation in the device-tree though, thus it
> shouldn't be -that- hard to fix I suppose.
> 

Cool, that's encouraging.

I really think that this type of abstraction would make things simpler in 
the long term.  For example, I just finished fixing a bug in tip where 
cpumask_of_pcibus() wasn't returning cpu_all_mask for busses without any 
affinity on x86.  This was a consequence of cpumask_of_pcibus() being 
forced to rely on pcibus_to_node() since there is no other abstraction 
available.  For busses without affinity to any specific cpus, the 
implementation had relied on returning the mapping's default node of -1 to 
represent all cpus.  That type of complexity could easily be avoided if 
the bus was isolated into its own locality and the mapping to all cpu 
localities was of local distance.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22  8:44             ` David Rientjes
  0 siblings, 0 replies; 70+ messages in thread
From: David Rientjes @ 2009-09-22  8:44 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Christoph Lameter, Mel Gorman, Nick Piggin, Pekka Enberg,
	heiko.carstens, sachinp, linux-kernel, linux-mm, Tejun Heo,
	Lee Schermerhorn

On Tue, 22 Sep 2009, Benjamin Herrenschmidt wrote:

> While I like the idea of NUMA nodes being strictly memory and everything
> else being expressed by distances, we'll have to clean up quite a few
> corners with skeletons in various states of decompositions waiting for
> us there.
> 

Agreed, it's invasive.

> For example, we have code here or there that (ab)uses the NUMA node
> information to link devices with their iommu, that sort of thing. IE, a
> hard dependency which isn't really related to a concept of distance to
> any memory.
> 

ACPI's slit uses a distance of 0xff to specify that one locality is 
unreachable from another.  We could easily adopt that convention.

> At least on powerpc, nowadays, I can pretty much make everything
> fallback to some representation in the device-tree though, thus it
> shouldn't be -that- hard to fix I suppose.
> 

Cool, that's encouraging.

I really think that this type of abstraction would make things simpler in 
the long term.  For example, I just finished fixing a bug in tip where 
cpumask_of_pcibus() wasn't returning cpu_all_mask for busses without any 
affinity on x86.  This was a consequence of cpumask_of_pcibus() being 
forced to rely on pcibus_to_node() since there is no other abstraction 
available.  For busses without affinity to any specific cpus, the 
implementation had relied on returning the mapping's default node of -1 to 
represent all cpus.  That type of complexity could easily be avoided if 
the bus was isolated into its own locality and the mapping to all cpu 
localities was of local distance.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/3] slqb: Allow SLQB to be used on PPC
  2009-09-21 16:10   ` Mel Gorman
@ 2009-09-22  9:30     ` Heiko Carstens
  -1 siblings, 0 replies; 70+ messages in thread
From: Heiko Carstens @ 2009-09-22  9:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, sachinp,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 05:10:26PM +0100, Mel Gorman wrote:
> SLQB was disabled on PPC as it would stab itself in the face when running
> on machines with CPUs on memoryless nodes. As those configurations should
> now work, allow SLQB to be configured again on PPC.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  init/Kconfig |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index adc10ab..8f55fde 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1033,7 +1033,7 @@ config SLUB
> 
>  config SLQB
>  	bool "SLQB (Queued allocator)"
> -	depends on !PPC && !S390
> +	depends on !S390

You can remove S390 from the list independently from this patch set.
As already mentioned SLQB works again on s390 and whatever caused the
bug I reported a few weeks back is gone.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/3] slqb: Allow SLQB to be used on PPC
@ 2009-09-22  9:30     ` Heiko Carstens
  0 siblings, 0 replies; 70+ messages in thread
From: Heiko Carstens @ 2009-09-22  9:30 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, sachinp,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 05:10:26PM +0100, Mel Gorman wrote:
> SLQB was disabled on PPC as it would stab itself in the face when running
> on machines with CPUs on memoryless nodes. As those configurations should
> now work, allow SLQB to be configured again on PPC.
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
>  init/Kconfig |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/init/Kconfig b/init/Kconfig
> index adc10ab..8f55fde 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -1033,7 +1033,7 @@ config SLUB
> 
>  config SLQB
>  	bool "SLQB (Queued allocator)"
> -	depends on !PPC && !S390
> +	depends on !S390

You can remove S390 from the list independently from this patch set.
As already mentioned SLQB works again on s390 and whatever caused the
bug I reported a few weeks back is gone.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
  2009-09-22  0:01     ` Tejun Heo
@ 2009-09-22  9:32       ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22  9:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 09:01:55AM +0900, Tejun Heo wrote:
> Mel Gorman wrote:
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index 1f68160..a5f52d4 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
> >  		paca[i].data_offset = ptr - __per_cpu_start;
> >  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> >  	}
> > +#ifdef CONFIG_SLQB
> > +	/* 
> > +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> > +	 * assumes that ever node ID will have a CPU of that ID to match.
> > +	 * On systems with memoryless nodes, this may not hold true. Hence,
> > +	 * we take a second pass initialising a "per-cpu" area for node-ids
> > +	 * that SLQB can use
> > +	 */
> > +	for_each_node_state(i, N_NORMAL_MEMORY) {
> > +
> > +		/* Skip node IDs that a valid CPU id exists for */
> > +		if (paca[i].data_offset)
> > +			continue;
> > +
> > +		ptr = alloc_bootmem_pages_node(NODE_DATA(cpu_to_node(i)), size);
> > +
> > +		paca[i].data_offset = ptr - __per_cpu_start;
> > +		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> > +	}
> > +#endif /* CONFIG_SLQB */
> >  }
> >  #endif
> 
> Eh... I don't know.  This seems too hacky to me. 

I've come around to this opinion as well. There are probably too many
other architectures and corners where this is gotten wrong and a more
fundamental fix is needed for SLQB to be able to use this hack.

> Why not just
> allocate pointer array of MAX_NUMNODES and allocate per-node memory
> there? 

That's what patch 1 from V1 did. I'll make it Patch 1 for V3.

> This will be slightly more expensive but I doubt it will be
> noticeable.  The only extra overhead is the cachline footprint for the
> extra array.
> 

I'll compare the vmlinux's to quantify the exact penalty but basically,
I don't think it can be avoided at this point.

Thanks.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas
@ 2009-09-22  9:32       ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22  9:32 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 09:01:55AM +0900, Tejun Heo wrote:
> Mel Gorman wrote:
> > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> > index 1f68160..a5f52d4 100644
> > --- a/arch/powerpc/kernel/setup_64.c
> > +++ b/arch/powerpc/kernel/setup_64.c
> > @@ -588,6 +588,26 @@ void __init setup_per_cpu_areas(void)
> >  		paca[i].data_offset = ptr - __per_cpu_start;
> >  		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> >  	}
> > +#ifdef CONFIG_SLQB
> > +	/* 
> > +	 * SLQB abuses DEFINE_PER_CPU to setup a per-node area. This trick
> > +	 * assumes that ever node ID will have a CPU of that ID to match.
> > +	 * On systems with memoryless nodes, this may not hold true. Hence,
> > +	 * we take a second pass initialising a "per-cpu" area for node-ids
> > +	 * that SLQB can use
> > +	 */
> > +	for_each_node_state(i, N_NORMAL_MEMORY) {
> > +
> > +		/* Skip node IDs that a valid CPU id exists for */
> > +		if (paca[i].data_offset)
> > +			continue;
> > +
> > +		ptr = alloc_bootmem_pages_node(NODE_DATA(cpu_to_node(i)), size);
> > +
> > +		paca[i].data_offset = ptr - __per_cpu_start;
> > +		memcpy(ptr, __per_cpu_start, __per_cpu_end - __per_cpu_start);
> > +	}
> > +#endif /* CONFIG_SLQB */
> >  }
> >  #endif
> 
> Eh... I don't know.  This seems too hacky to me. 

I've come around to this opinion as well. There are probably too many
other architectures and corners where this is gotten wrong and a more
fundamental fix is needed for SLQB to be able to use this hack.

> Why not just
> allocate pointer array of MAX_NUMNODES and allocate per-node memory
> there? 

That's what patch 1 from V1 did. I'll make it Patch 1 for V3.

> This will be slightly more expensive but I doubt it will be
> noticeable.  The only extra overhead is the cachline footprint for the
> extra array.
> 

I'll compare the vmlinux's to quantify the exact penalty but basically,
I don't think it can be avoided at this point.

Thanks.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/3] slqb: Allow SLQB to be used on PPC
  2009-09-22  9:30     ` Heiko Carstens
@ 2009-09-22  9:32       ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22  9:32 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, sachinp,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 11:30:23AM +0200, Heiko Carstens wrote:
> On Mon, Sep 21, 2009 at 05:10:26PM +0100, Mel Gorman wrote:
> > SLQB was disabled on PPC as it would stab itself in the face when running
> > on machines with CPUs on memoryless nodes. As those configurations should
> > now work, allow SLQB to be configured again on PPC.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  init/Kconfig |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/init/Kconfig b/init/Kconfig
> > index adc10ab..8f55fde 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1033,7 +1033,7 @@ config SLUB
> > 
> >  config SLQB
> >  	bool "SLQB (Queued allocator)"
> > -	depends on !PPC && !S390
> > +	depends on !S390
> 
> You can remove S390 from the list independently from this patch set.
> As already mentioned SLQB works again on s390 and whatever caused the
> bug I reported a few weeks back is gone.
> 

Nice one. Thanks.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [PATCH 3/3] slqb: Allow SLQB to be used on PPC
@ 2009-09-22  9:32       ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22  9:32 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, sachinp,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 11:30:23AM +0200, Heiko Carstens wrote:
> On Mon, Sep 21, 2009 at 05:10:26PM +0100, Mel Gorman wrote:
> > SLQB was disabled on PPC as it would stab itself in the face when running
> > on machines with CPUs on memoryless nodes. As those configurations should
> > now work, allow SLQB to be configured again on PPC.
> > 
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> > ---
> >  init/Kconfig |    2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> > 
> > diff --git a/init/Kconfig b/init/Kconfig
> > index adc10ab..8f55fde 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -1033,7 +1033,7 @@ config SLUB
> > 
> >  config SLQB
> >  	bool "SLQB (Queued allocator)"
> > -	depends on !PPC && !S390
> > +	depends on !S390
> 
> You can remove S390 from the list independently from this patch set.
> As already mentioned SLQB works again on s390 and whatever caused the
> bug I reported a few weeks back is gone.
> 

Nice one. Thanks.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-21 18:17         ` Christoph Lameter
@ 2009-09-22 10:05           ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 10:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Pekka Enberg, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 02:17:40PM -0400, Christoph Lameter wrote:
> On Mon, 21 Sep 2009, Mel Gorman wrote:
> > Can you spot if there is something fundamentally wrong with patch 2? I.e. what
> > is wrong with treating the closest node as local instead of only the
> > closest node?
> 
> Depends on the way locking is done for percpu queues (likely lockless).
> A misidentification of the numa locality of an object may result in locks
> not being taken that should have been taken.
> 

Ok, I'll continue looking from that perspective and see what comes out.
I've spotted a few possible anomolies which I'll stick into a separate
patch.

> > > Or just allow SLQB for !NUMA configurations and merge it now.
> > >
> >
> > Forcing SLQB !NUMA will not rattle out any existing list issues
> > unfortunately :(.
> 
> But it will make SLQB work right in permitted configurations. The NUMA
> issues can then be fixed later upstream.
> 

I'm going to punt the decision on this one to Pekka or Nick. My feeling
is leave it enabled for NUMA so it can be identified if it gets fixed
for some other reason - e.g. the stalls are due to a per-cpu problem as
stated by Sachin and SLQB happens to exasperate the problem.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 10:05           ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 10:05 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nick Piggin, Pekka Enberg, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Mon, Sep 21, 2009 at 02:17:40PM -0400, Christoph Lameter wrote:
> On Mon, 21 Sep 2009, Mel Gorman wrote:
> > Can you spot if there is something fundamentally wrong with patch 2? I.e. what
> > is wrong with treating the closest node as local instead of only the
> > closest node?
> 
> Depends on the way locking is done for percpu queues (likely lockless).
> A misidentification of the numa locality of an object may result in locks
> not being taken that should have been taken.
> 

Ok, I'll continue looking from that perspective and see what comes out.
I've spotted a few possible anomolies which I'll stick into a separate
patch.

> > > Or just allow SLQB for !NUMA configurations and merge it now.
> > >
> >
> > Forcing SLQB !NUMA will not rattle out any existing list issues
> > unfortunately :(.
> 
> But it will make SLQB work right in permitted configurations. The NUMA
> issues can then be fixed later upstream.
> 

I'm going to punt the decision on this one to Pekka or Nick. My feeling
is leave it enabled for NUMA so it can be identified if it gets fixed
for some other reason - e.g. the stalls are due to a per-cpu problem as
stated by Sachin and SLQB happens to exasperate the problem.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22  5:03         ` Sachin Sant
@ 2009-09-22 10:07           ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 10:07 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Christoph Lameter, Nick Piggin, Pekka Enberg, heiko.carstens,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 10:33:11AM +0530, Sachin Sant wrote:
> Mel Gorman wrote:
>> On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
>>   
>>> Lets just keep SLQB back until the basic issues with memoryless nodes are
>>> resolved.
>>>     
>>
>> It's not even super-clear that the memoryless nodes issues are entirely
>> related to SLQB. Sachin for example says that there was a stall issue
>> with memoryless nodes that could be triggered without SLQB. Sachin, is
>> that still accurate?
>>   
>
> I think there are two different problems that we are dealing with.
>

Agreed.

> First one is the SLQB not working on a ppc64 box which seems to be specific
> to only one machine and i haven't seen that on other power boxes.The patches
> that you have posted seems to allow the box to boot, but eventually it hits
> the stall issue(related to percpu dynamic allocator not working on ppc64),
> which is the second problem we are dealing with.
>
> The stall issue seems to be much more critical as it is affecting almost
> all of the power boxes that i have tested with (4 in all).
> This issue is seen with Linus tree as well and was first seen with
> 2.6.31-git5 (0cb583fd..) 
>
> The stall issue was reported here:
> http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html
>

Because of this, I'd like to try and get SLQB as far as it can at the
moment and then treat this problem from scratch. I'd rather not treat
multiple different bugs as if they were one problem when they probably
are not.

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 10:07           ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 10:07 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Christoph Lameter, Nick Piggin, Pekka Enberg, heiko.carstens,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 10:33:11AM +0530, Sachin Sant wrote:
> Mel Gorman wrote:
>> On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
>>   
>>> Lets just keep SLQB back until the basic issues with memoryless nodes are
>>> resolved.
>>>     
>>
>> It's not even super-clear that the memoryless nodes issues are entirely
>> related to SLQB. Sachin for example says that there was a stall issue
>> with memoryless nodes that could be triggered without SLQB. Sachin, is
>> that still accurate?
>>   
>
> I think there are two different problems that we are dealing with.
>

Agreed.

> First one is the SLQB not working on a ppc64 box which seems to be specific
> to only one machine and i haven't seen that on other power boxes.The patches
> that you have posted seems to allow the box to boot, but eventually it hits
> the stall issue(related to percpu dynamic allocator not working on ppc64),
> which is the second problem we are dealing with.
>
> The stall issue seems to be much more critical as it is affecting almost
> all of the power boxes that i have tested with (4 in all).
> This issue is seen with Linus tree as well and was first seen with
> 2.6.31-git5 (0cb583fd..) 
>
> The stall issue was reported here:
> http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html
>

Because of this, I'd like to try and get SLQB as far as it can at the
moment and then treat this problem from scratch. I'd rather not treat
multiple different bugs as if they were one problem when they probably
are not.

Thanks

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22 10:05           ` Mel Gorman
@ 2009-09-22 10:21             ` Pekka Enberg
  -1 siblings, 0 replies; 70+ messages in thread
From: Pekka Enberg @ 2009-09-22 10:21 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Lameter, Nick Piggin, heiko.carstens, sachinp,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

Hi Mel,

On Tue, 2009-09-22 at 11:05 +0100, Mel Gorman wrote:
> I'm going to punt the decision on this one to Pekka or Nick. My feeling
> is leave it enabled for NUMA so it can be identified if it gets fixed
> for some other reason - e.g. the stalls are due to a per-cpu problem as
> stated by Sachin and SLQB happens to exasperate the problem.

Can I have a tested patch that uses MAX_NUMNODES to allocate the
structs, please? We can convert SLQB over to per-cpu allocator if the
memoryless node issue is resolved.

			Pekka


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 10:21             ` Pekka Enberg
  0 siblings, 0 replies; 70+ messages in thread
From: Pekka Enberg @ 2009-09-22 10:21 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Lameter, Nick Piggin, heiko.carstens, sachinp,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

Hi Mel,

On Tue, 2009-09-22 at 11:05 +0100, Mel Gorman wrote:
> I'm going to punt the decision on this one to Pekka or Nick. My feeling
> is leave it enabled for NUMA so it can be identified if it gets fixed
> for some other reason - e.g. the stalls are due to a per-cpu problem as
> stated by Sachin and SLQB happens to exasperate the problem.

Can I have a tested patch that uses MAX_NUMNODES to allocate the
structs, please? We can convert SLQB over to per-cpu allocator if the
memoryless node issue is resolved.

			Pekka

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22 10:21             ` Pekka Enberg
@ 2009-09-22 10:24               ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 10:24 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Nick Piggin, heiko.carstens, sachinp,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 01:21:15PM +0300, Pekka Enberg wrote:
> Hi Mel,
> 
> On Tue, 2009-09-22 at 11:05 +0100, Mel Gorman wrote:
> > I'm going to punt the decision on this one to Pekka or Nick. My feeling
> > is leave it enabled for NUMA so it can be identified if it gets fixed
> > for some other reason - e.g. the stalls are due to a per-cpu problem as
> > stated by Sachin and SLQB happens to exasperate the problem.
> 
> Can I have a tested patch that uses MAX_NUMNODES to allocate the
> structs, please? We can convert SLQB over to per-cpu allocator if the
> memoryless node issue is resolved.
> 

Patch set in the process of testing. Should have something in a few
hours.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 10:24               ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 10:24 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Nick Piggin, heiko.carstens, sachinp,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 01:21:15PM +0300, Pekka Enberg wrote:
> Hi Mel,
> 
> On Tue, 2009-09-22 at 11:05 +0100, Mel Gorman wrote:
> > I'm going to punt the decision on this one to Pekka or Nick. My feeling
> > is leave it enabled for NUMA so it can be identified if it gets fixed
> > for some other reason - e.g. the stalls are due to a per-cpu problem as
> > stated by Sachin and SLQB happens to exasperate the problem.
> 
> Can I have a tested patch that uses MAX_NUMNODES to allocate the
> structs, please? We can convert SLQB over to per-cpu allocator if the
> memoryless node issue is resolved.
> 

Patch set in the process of testing. Should have something in a few
hours.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22  5:03         ` Sachin Sant
@ 2009-09-22 12:55           ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 12:55 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Christoph Lameter, Nick Piggin, Pekka Enberg, heiko.carstens,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 10:33:11AM +0530, Sachin Sant wrote:
> Mel Gorman wrote:
>> On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
>>   
>>> Lets just keep SLQB back until the basic issues with memoryless nodes are
>>> resolved.
>>>     
>>
>> It's not even super-clear that the memoryless nodes issues are entirely
>> related to SLQB. Sachin for example says that there was a stall issue
>> with memoryless nodes that could be triggered without SLQB. Sachin, is
>> that still accurate?
>>   
> I think there are two different problems that we are dealing with.
>
> First one is the SLQB not working on a ppc64 box which seems to be specific
> to only one machine and i haven't seen that on other power boxes.The patches
> that you have posted seems to allow the box to boot, but eventually it hits
> the stall issue(related to percpu dynamic allocator not working on ppc64),
> which is the second problem we are dealing with.
>

Ok, I've sent out V3 of this. It's only a partial fix but it's about as
far as it can be brought until the other difficulties are resolved.

> The stall issue seems to be much more critical as it is affecting almost
> all of the power boxes that i have tested with (4 in all).
> This issue is seen with Linus tree as well and was first seen with
> 2.6.31-git5 (0cb583fd..) 
>
> The stall issue was reported here:
> http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html
>

Can you bisect this please?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 12:55           ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 12:55 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Christoph Lameter, Nick Piggin, Pekka Enberg, heiko.carstens,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 10:33:11AM +0530, Sachin Sant wrote:
> Mel Gorman wrote:
>> On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
>>   
>>> Lets just keep SLQB back until the basic issues with memoryless nodes are
>>> resolved.
>>>     
>>
>> It's not even super-clear that the memoryless nodes issues are entirely
>> related to SLQB. Sachin for example says that there was a stall issue
>> with memoryless nodes that could be triggered without SLQB. Sachin, is
>> that still accurate?
>>   
> I think there are two different problems that we are dealing with.
>
> First one is the SLQB not working on a ppc64 box which seems to be specific
> to only one machine and i haven't seen that on other power boxes.The patches
> that you have posted seems to allow the box to boot, but eventually it hits
> the stall issue(related to percpu dynamic allocator not working on ppc64),
> which is the second problem we are dealing with.
>

Ok, I've sent out V3 of this. It's only a partial fix but it's about as
far as it can be brought until the other difficulties are resolved.

> The stall issue seems to be much more critical as it is affecting almost
> all of the power boxes that i have tested with (4 in all).
> This issue is seen with Linus tree as well and was first seen with
> 2.6.31-git5 (0cb583fd..) 
>
> The stall issue was reported here:
> http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html
>

Can you bisect this please?

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22 12:55           ` Mel Gorman
@ 2009-09-22 13:05             ` Sachin Sant
  -1 siblings, 0 replies; 70+ messages in thread
From: Sachin Sant @ 2009-09-22 13:05 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Lameter, Nick Piggin, Pekka Enberg, heiko.carstens,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

Mel Gorman wrote:
> On Tue, Sep 22, 2009 at 10:33:11AM +0530, Sachin Sant wrote:
>   
>> Mel Gorman wrote:
>>     
>>> On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
>>>   
>>>       
>>>> Lets just keep SLQB back until the basic issues with memoryless nodes are
>>>> resolved.
>>>>     
>>>>         
>>> It's not even super-clear that the memoryless nodes issues are entirely
>>> related to SLQB. Sachin for example says that there was a stall issue
>>> with memoryless nodes that could be triggered without SLQB. Sachin, is
>>> that still accurate?
>>>   
>>>       
>> I think there are two different problems that we are dealing with.
>>
>> First one is the SLQB not working on a ppc64 box which seems to be specific
>> to only one machine and i haven't seen that on other power boxes.The patches
>> that you have posted seems to allow the box to boot, but eventually it hits
>> the stall issue(related to percpu dynamic allocator not working on ppc64),
>> which is the second problem we are dealing with.
>>
>>     
>
> Ok, I've sent out V3 of this. It's only a partial fix but it's about as
> far as it can be brought until the other difficulties are resolved.
>   
Thanks Mel.

>   
>> The stall issue seems to be much more critical as it is affecting almost
>> all of the power boxes that i have tested with (4 in all).
>> This issue is seen with Linus tree as well and was first seen with
>> 2.6.31-git5 (0cb583fd..) 
>>
>> The stall issue was reported here:
>> http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html
>>
>>     
>
> Can you bisect this please?
>   
The problem seems to have been introduced with
commit ada3fa15057205b7d3f727bba5cd26b5912e350f.

Specifically this patch : 
powerpc64: convert to dynamic percpu allocator

If i revert this patch i am able to boot latest git
on a powerpc box.

Thanks
-Sachin

-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 13:05             ` Sachin Sant
  0 siblings, 0 replies; 70+ messages in thread
From: Sachin Sant @ 2009-09-22 13:05 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Lameter, Nick Piggin, Pekka Enberg, heiko.carstens,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

Mel Gorman wrote:
> On Tue, Sep 22, 2009 at 10:33:11AM +0530, Sachin Sant wrote:
>   
>> Mel Gorman wrote:
>>     
>>> On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
>>>   
>>>       
>>>> Lets just keep SLQB back until the basic issues with memoryless nodes are
>>>> resolved.
>>>>     
>>>>         
>>> It's not even super-clear that the memoryless nodes issues are entirely
>>> related to SLQB. Sachin for example says that there was a stall issue
>>> with memoryless nodes that could be triggered without SLQB. Sachin, is
>>> that still accurate?
>>>   
>>>       
>> I think there are two different problems that we are dealing with.
>>
>> First one is the SLQB not working on a ppc64 box which seems to be specific
>> to only one machine and i haven't seen that on other power boxes.The patches
>> that you have posted seems to allow the box to boot, but eventually it hits
>> the stall issue(related to percpu dynamic allocator not working on ppc64),
>> which is the second problem we are dealing with.
>>
>>     
>
> Ok, I've sent out V3 of this. It's only a partial fix but it's about as
> far as it can be brought until the other difficulties are resolved.
>   
Thanks Mel.

>   
>> The stall issue seems to be much more critical as it is affecting almost
>> all of the power boxes that i have tested with (4 in all).
>> This issue is seen with Linus tree as well and was first seen with
>> 2.6.31-git5 (0cb583fd..) 
>>
>> The stall issue was reported here:
>> http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html
>>
>>     
>
> Can you bisect this please?
>   
The problem seems to have been introduced with
commit ada3fa15057205b7d3f727bba5cd26b5912e350f.

Specifically this patch : 
powerpc64: convert to dynamic percpu allocator

If i revert this patch i am able to boot latest git
on a powerpc box.

Thanks
-Sachin

-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22 13:05             ` Sachin Sant
@ 2009-09-22 13:20               ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 13:20 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Christoph Lameter, Nick Piggin, Pekka Enberg, heiko.carstens,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 06:35:05PM +0530, Sachin Sant wrote:
> Mel Gorman wrote:
>> On Tue, Sep 22, 2009 at 10:33:11AM +0530, Sachin Sant wrote:
>>   
>>> Mel Gorman wrote:
>>>     
>>>> On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
>>>>         
>>>>> Lets just keep SLQB back until the basic issues with memoryless nodes are
>>>>> resolved.
>>>>>             
>>>> It's not even super-clear that the memoryless nodes issues are entirely
>>>> related to SLQB. Sachin for example says that there was a stall issue
>>>> with memoryless nodes that could be triggered without SLQB. Sachin, is
>>>> that still accurate?
>>>>         
>>> I think there are two different problems that we are dealing with.
>>>
>>> First one is the SLQB not working on a ppc64 box which seems to be specific
>>> to only one machine and i haven't seen that on other power boxes.The patches
>>> that you have posted seems to allow the box to boot, but eventually it hits
>>> the stall issue(related to percpu dynamic allocator not working on ppc64),
>>> which is the second problem we are dealing with.
>>>
>>>     
>>
>> Ok, I've sent out V3 of this. It's only a partial fix but it's about as
>> far as it can be brought until the other difficulties are resolved.
>>   
> Thanks Mel.
>
>>   
>>> The stall issue seems to be much more critical as it is affecting almost
>>> all of the power boxes that i have tested with (4 in all).
>>> This issue is seen with Linus tree as well and was first seen with
>>> 2.6.31-git5 (0cb583fd..) 
>>>
>>> The stall issue was reported here:
>>> http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html
>>>
>>>     
>>
>> Can you bisect this please?
>>   
> The problem seems to have been introduced with
> commit ada3fa15057205b7d3f727bba5cd26b5912e350f.
>
> Specifically this patch : powerpc64: convert to dynamic percpu allocator
>

That is commit c2a7e818019f20a5cf7fb26a6eb59e212e6c0cd8.

> If i revert this patch i am able to boot latest git
> on a powerpc box.
>

Ok, this is then independent of the corruption I was seeing which I now
believe is based entirely within SLQBs handling of remote nodes. I was based
on 2.6.31 with SLQB applied on top. This patch was applied in a time after
the baseline I was working from.

Tejun, have you looked at that stall problem? Sorry if you have already,
I haven't dug around for related threads.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 13:20               ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 13:20 UTC (permalink / raw)
  To: Sachin Sant
  Cc: Christoph Lameter, Nick Piggin, Pekka Enberg, heiko.carstens,
	linux-kernel, linux-mm, Tejun Heo, Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 06:35:05PM +0530, Sachin Sant wrote:
> Mel Gorman wrote:
>> On Tue, Sep 22, 2009 at 10:33:11AM +0530, Sachin Sant wrote:
>>   
>>> Mel Gorman wrote:
>>>     
>>>> On Mon, Sep 21, 2009 at 01:54:12PM -0400, Christoph Lameter wrote:
>>>>         
>>>>> Lets just keep SLQB back until the basic issues with memoryless nodes are
>>>>> resolved.
>>>>>             
>>>> It's not even super-clear that the memoryless nodes issues are entirely
>>>> related to SLQB. Sachin for example says that there was a stall issue
>>>> with memoryless nodes that could be triggered without SLQB. Sachin, is
>>>> that still accurate?
>>>>         
>>> I think there are two different problems that we are dealing with.
>>>
>>> First one is the SLQB not working on a ppc64 box which seems to be specific
>>> to only one machine and i haven't seen that on other power boxes.The patches
>>> that you have posted seems to allow the box to boot, but eventually it hits
>>> the stall issue(related to percpu dynamic allocator not working on ppc64),
>>> which is the second problem we are dealing with.
>>>
>>>     
>>
>> Ok, I've sent out V3 of this. It's only a partial fix but it's about as
>> far as it can be brought until the other difficulties are resolved.
>>   
> Thanks Mel.
>
>>   
>>> The stall issue seems to be much more critical as it is affecting almost
>>> all of the power boxes that i have tested with (4 in all).
>>> This issue is seen with Linus tree as well and was first seen with
>>> 2.6.31-git5 (0cb583fd..) 
>>>
>>> The stall issue was reported here:
>>> http://lists.ozlabs.org/pipermail/linuxppc-dev/2009-September/075791.html
>>>
>>>     
>>
>> Can you bisect this please?
>>   
> The problem seems to have been introduced with
> commit ada3fa15057205b7d3f727bba5cd26b5912e350f.
>
> Specifically this patch : powerpc64: convert to dynamic percpu allocator
>

That is commit c2a7e818019f20a5cf7fb26a6eb59e212e6c0cd8.

> If i revert this patch i am able to boot latest git
> on a powerpc box.
>

Ok, this is then independent of the corruption I was seeing which I now
believe is based entirely within SLQBs handling of remote nodes. I was based
on 2.6.31 with SLQB applied on top. This patch was applied in a time after
the baseline I was working from.

Tejun, have you looked at that stall problem? Sorry if you have already,
I haven't dug around for related threads.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
       [not found]               ` <363172900909220629j2f5174cbo9fe027354948d37@mail.gmail.com>
@ 2009-09-22 13:38                   ` Mel Gorman
  2009-09-22 23:07                   ` Christoph Lameter
  1 sibling, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 13:38 UTC (permalink / raw)
  To: ????
  Cc: Sachin Sant, Christoph Lameter, Nick Piggin, Pekka Enberg,
	heiko.carstens, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 09:29:44PM +0800, ???? wrote:
> Dear all,
>        I want ask question about kernel-customization. Our product is
> embedded system, such as ADSL Modem(home gateway).  and we use linux
> 2.6.22.15 version. Now config for linux kernel will build kernel size is
> 800KB. How can I config kernel config to reduce kernel size, I want to get
> smaller size  like 500KB.
> But out product is network device,so some network protocol of kernel can not
> remove. Below is our config, all of you can give me suggestion, Thanks very
> much!
> 

*blinks*

This is not a sensible thread to ask the question on.

I haven't looked at your config to see how it might be stripped down but
I would suggest using scripts/bloat-o-meter to start getting a breakdown
per-subsystem that is making up the bulk of your kernel image.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 13:38                   ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 13:38 UTC (permalink / raw)
  To: ????
  Cc: Sachin Sant, Christoph Lameter, Nick Piggin, Pekka Enberg,
	heiko.carstens, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Tue, Sep 22, 2009 at 09:29:44PM +0800, ???? wrote:
> Dear all,
>        I want ask question about kernel-customization. Our product is
> embedded system, such as ADSL Modem(home gateway).  and we use linux
> 2.6.22.15 version. Now config for linux kernel will build kernel size is
> 800KB. How can I config kernel config to reduce kernel size, I want to get
> smaller size  like 500KB.
> But out product is network device,so some network protocol of kernel can not
> remove. Below is our config, all of you can give me suggestion, Thanks very
> much!
> 

*blinks*

This is not a sensible thread to ask the question on.

I haven't looked at your config to see how it might be stripped down but
I would suggest using scripts/bloat-o-meter to start getting a breakdown
per-subsystem that is making up the bulk of your kernel image.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22  0:00   ` Benjamin Herrenschmidt
@ 2009-09-22 15:26     ` Mel Gorman
  -1 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 15:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Tejun Heo

On Tue, Sep 22, 2009 at 10:00:03AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> > 
> > It needs signed-off from the powerpc side because it's now allocating
> > more
> > memory potentially (Ben?). An alternative to this patch is in V1 that
> > statically declares the per-node structures but this is potentially
> > sub-optimal but from a performance and memory utilisation perspective.
> 
> So if I understand correctly, we have a problem with both cpu-less and
> memory-less nodes. Interesting setups :-)
> 

memoryless anyway because of the per-node trick. I'm not aware of
cpuless problems but I suspect cpuless nodes exist for more than ppc64.

> I have no strong objection on the allocating of the per-cpu data for
> the cpu-less nodes. However, I wonder if we should do that a bit more
> nicely, maybe with some kind of "adjusted" cpu_possible_mask() (could be
> something like cpu_node_valid_mask or similar) to be used by percpu.
> 

That would be reasonable if per-node data becomes more popular although
with only SLQB depending on per-node data, it's hard to justify unless
SLQB *really* benefits from per-node.

Lumping the per-cpu allocator to cover per-cpu and per-node feels a bit
confusing. Maybe it would have been easier if there simply were never
memoryless nodes and cpus were always mapped to their closest, instead of
their local, node. There likely would be a few corner cases though and memory
hotplug would add to the mess. I haven't been able to decide on a sensible
way forward that doesn't involve a number of invasive changes.

> Mostly because it would be nice to have built-in debug features in
> per-cpu and in that case, it would need some way to know a valid
> number from an invalid one). Either that or just keep track of the
> mask of cpus that had percpu data allocated to them
> 

The former would be nice, the latter would be a requirement if per-node data
was pursued.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 15:26     ` Mel Gorman
  0 siblings, 0 replies; 70+ messages in thread
From: Mel Gorman @ 2009-09-22 15:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Nick Piggin, Pekka Enberg, Christoph Lameter, heiko.carstens,
	sachinp, linux-kernel, linux-mm, Tejun Heo

On Tue, Sep 22, 2009 at 10:00:03AM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2009-09-21 at 17:10 +0100, Mel Gorman wrote:
> > 
> > It needs signed-off from the powerpc side because it's now allocating
> > more
> > memory potentially (Ben?). An alternative to this patch is in V1 that
> > statically declares the per-node structures but this is potentially
> > sub-optimal but from a performance and memory utilisation perspective.
> 
> So if I understand correctly, we have a problem with both cpu-less and
> memory-less nodes. Interesting setups :-)
> 

memoryless anyway because of the per-node trick. I'm not aware of
cpuless problems but I suspect cpuless nodes exist for more than ppc64.

> I have no strong objection on the allocating of the per-cpu data for
> the cpu-less nodes. However, I wonder if we should do that a bit more
> nicely, maybe with some kind of "adjusted" cpu_possible_mask() (could be
> something like cpu_node_valid_mask or similar) to be used by percpu.
> 

That would be reasonable if per-node data becomes more popular although
with only SLQB depending on per-node data, it's hard to justify unless
SLQB *really* benefits from per-node.

Lumping the per-cpu allocator to cover per-cpu and per-node feels a bit
confusing. Maybe it would have been easier if there simply were never
memoryless nodes and cpus were always mapped to their closest, instead of
their local, node. There likely would be a few corner cases though and memory
hotplug would add to the mess. I haven't been able to decide on a sensible
way forward that doesn't involve a number of invasive changes.

> Mostly because it would be nice to have built-in debug features in
> per-cpu and in that case, it would need some way to know a valid
> number from an invalid one). Either that or just keep track of the
> mask of cpus that had percpu data allocated to them
> 

The former would be nice, the latter would be a requirement if per-node data
was pursued.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
  2009-09-22 15:26     ` Mel Gorman
@ 2009-09-22 17:31       ` David Rientjes
  -1 siblings, 0 replies; 70+ messages in thread
From: David Rientjes @ 2009-09-22 17:31 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Benjamin Herrenschmidt, Nick Piggin, Pekka Enberg,
	Christoph Lameter, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo

On Tue, 22 Sep 2009, Mel Gorman wrote:

> Lumping the per-cpu allocator to cover per-cpu and per-node feels a bit
> confusing. Maybe it would have been easier if there simply were never
> memoryless nodes and cpus were always mapped to their closest, instead of
> their local, node. There likely would be a few corner cases though and memory
> hotplug would add to the mess. I haven't been able to decide on a sensible
> way forward that doesn't involve a number of invasive changes.
> 

I strongly agree with removing memoryless node support from the kernel, 
but I don't think we should substitute it with a multiple cpu binding 
approach because it doesn't export the true physical topology of the 
system.

If we treat all cpus equally with respect to a region of memory when one 
happens to be more remote, userspace can no longer use sched_setaffinity() 
for latency-sensitive apps nor can it correctly interleave with other 
nodes.  Reduced to its simplest form, a machine now with a single node 
because memoryless nodes have been obsoleted would incorrectly report that 
all cpus are true siblings.

While memoryless nodes are inconvenient for the implementation, they do 
have the benefit of being able to represent the actual physical topology 
when binding cpus to their nearest node, even though it may not be local, 
would not.

It's important to accurately represent the physical topology, and that can 
be done with the device class abstraction model as I described in 
http://lkml.org/lkml/2009/9/22/97 of which a node is only a locality of a 
particular type.

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 17:31       ` David Rientjes
  0 siblings, 0 replies; 70+ messages in thread
From: David Rientjes @ 2009-09-22 17:31 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Benjamin Herrenschmidt, Nick Piggin, Pekka Enberg,
	Christoph Lameter, heiko.carstens, sachinp, linux-kernel,
	linux-mm, Tejun Heo

On Tue, 22 Sep 2009, Mel Gorman wrote:

> Lumping the per-cpu allocator to cover per-cpu and per-node feels a bit
> confusing. Maybe it would have been easier if there simply were never
> memoryless nodes and cpus were always mapped to their closest, instead of
> their local, node. There likely would be a few corner cases though and memory
> hotplug would add to the mess. I haven't been able to decide on a sensible
> way forward that doesn't involve a number of invasive changes.
> 

I strongly agree with removing memoryless node support from the kernel, 
but I don't think we should substitute it with a multiple cpu binding 
approach because it doesn't export the true physical topology of the 
system.

If we treat all cpus equally with respect to a region of memory when one 
happens to be more remote, userspace can no longer use sched_setaffinity() 
for latency-sensitive apps nor can it correctly interleave with other 
nodes.  Reduced to its simplest form, a machine now with a single node 
because memoryless nodes have been obsoleted would incorrectly report that 
all cpus are true siblings.

While memoryless nodes are inconvenient for the implementation, they do 
have the benefit of being able to represent the actual physical topology 
when binding cpus to their nearest node, even though it may not be local, 
would not.

It's important to accurately represent the physical topology, and that can 
be done with the device class abstraction model as I described in 
http://lkml.org/lkml/2009/9/22/97 of which a node is only a locality of a 
particular type.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
       [not found]               ` <363172900909220629j2f5174cbo9fe027354948d37@mail.gmail.com>
@ 2009-09-22 23:07                   ` Christoph Lameter
  2009-09-22 23:07                   ` Christoph Lameter
  1 sibling, 0 replies; 70+ messages in thread
From: Christoph Lameter @ 2009-09-22 23:07 UTC (permalink / raw)
  To: 飞颜
  Cc: Mel Gorman, Sachin Sant, Nick Piggin, Pekka Enberg,
	heiko.carstens, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Tue, 22 Sep 2009, ?? wrote:

> 800KB. How can I config kernel config to reduce kernel size, I want to get
> smaller size  like 500KB.


500kb? That may be a tough call.

> *CONFIG_NETFILTER=y*

Can you drop this one?

Is CONFIG_EMBEDDED set? Maybe I skipped it.


^ permalink raw reply	[flat|nested] 70+ messages in thread

* Re: [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2
@ 2009-09-22 23:07                   ` Christoph Lameter
  0 siblings, 0 replies; 70+ messages in thread
From: Christoph Lameter @ 2009-09-22 23:07 UTC (permalink / raw)
  To: 飞颜
  Cc: Mel Gorman, Sachin Sant, Nick Piggin, Pekka Enberg,
	heiko.carstens, linux-kernel, linux-mm, Tejun Heo,
	Benjamin Herrenschmidt

On Tue, 22 Sep 2009, ?? wrote:

> 800KB. How can I config kernel config to reduce kernel size, I want to get
> smaller size  like 500KB.


500kb? That may be a tough call.

> *CONFIG_NETFILTER=y*

Can you drop this one?

Is CONFIG_EMBEDDED set? Maybe I skipped it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 70+ messages in thread

end of thread, other threads:[~2009-09-22 23:20 UTC | newest]

Thread overview: 70+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-09-21 16:10 [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2 Mel Gorman
2009-09-21 16:10 ` Mel Gorman
2009-09-21 16:10 ` [PATCH 1/3] powerpc: Allocate per-cpu areas for node IDs for SLQB to use as per-node areas Mel Gorman
2009-09-21 16:10   ` Mel Gorman
2009-09-21 17:17   ` Daniel Walker
2009-09-21 17:17     ` Daniel Walker
2009-09-21 17:24     ` Randy Dunlap
2009-09-21 17:24       ` Randy Dunlap
2009-09-21 17:29       ` Daniel Walker
2009-09-21 17:29         ` Daniel Walker
2009-09-21 17:42     ` Mel Gorman
2009-09-21 17:42       ` Mel Gorman
2009-09-22  0:01   ` Tejun Heo
2009-09-22  0:01     ` Tejun Heo
2009-09-22  9:32     ` Mel Gorman
2009-09-22  9:32       ` Mel Gorman
2009-09-21 16:10 ` [PATCH 2/3] slqb: Record what node is local to a kmem_cache_cpu Mel Gorman
2009-09-21 16:10   ` Mel Gorman
2009-09-21 16:10 ` [PATCH 3/3] slqb: Allow SLQB to be used on PPC Mel Gorman
2009-09-21 16:10   ` Mel Gorman
2009-09-22  9:30   ` Heiko Carstens
2009-09-22  9:30     ` Heiko Carstens
2009-09-22  9:32     ` Mel Gorman
2009-09-22  9:32       ` Mel Gorman
2009-09-21 17:46 ` [RFC PATCH 0/3] Fix SLQB on memoryless configurations V2 Mel Gorman
2009-09-21 17:46   ` Mel Gorman
2009-09-21 17:54   ` Christoph Lameter
2009-09-21 17:54     ` Christoph Lameter
2009-09-21 18:05     ` Pekka Enberg
2009-09-21 18:05       ` Pekka Enberg
2009-09-21 18:07     ` Mel Gorman
2009-09-21 18:07       ` Mel Gorman
2009-09-21 18:17       ` Christoph Lameter
2009-09-21 18:17         ` Christoph Lameter
2009-09-22 10:05         ` Mel Gorman
2009-09-22 10:05           ` Mel Gorman
2009-09-22 10:21           ` Pekka Enberg
2009-09-22 10:21             ` Pekka Enberg
2009-09-22 10:24             ` Mel Gorman
2009-09-22 10:24               ` Mel Gorman
2009-09-22  5:03       ` Sachin Sant
2009-09-22  5:03         ` Sachin Sant
2009-09-22 10:07         ` Mel Gorman
2009-09-22 10:07           ` Mel Gorman
2009-09-22 12:55         ` Mel Gorman
2009-09-22 12:55           ` Mel Gorman
2009-09-22 13:05           ` Sachin Sant
2009-09-22 13:05             ` Sachin Sant
2009-09-22 13:20             ` Mel Gorman
2009-09-22 13:20               ` Mel Gorman
     [not found]               ` <363172900909220629j2f5174cbo9fe027354948d37@mail.gmail.com>
2009-09-22 13:38                 ` Mel Gorman
2009-09-22 13:38                   ` Mel Gorman
2009-09-22 23:07                 ` Christoph Lameter
2009-09-22 23:07                   ` Christoph Lameter
2009-09-22  0:00 ` Benjamin Herrenschmidt
2009-09-22  0:00   ` Benjamin Herrenschmidt
2009-09-22  0:19   ` David Rientjes
2009-09-22  0:19     ` David Rientjes
2009-09-22  6:30     ` Christoph Lameter
2009-09-22  6:30       ` Christoph Lameter
2009-09-22  7:59       ` David Rientjes
2009-09-22  7:59         ` David Rientjes
2009-09-22  8:11         ` Benjamin Herrenschmidt
2009-09-22  8:11           ` Benjamin Herrenschmidt
2009-09-22  8:44           ` David Rientjes
2009-09-22  8:44             ` David Rientjes
2009-09-22 15:26   ` Mel Gorman
2009-09-22 15:26     ` Mel Gorman
2009-09-22 17:31     ` David Rientjes
2009-09-22 17:31       ` David Rientjes

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.