linux-mm.kvack.org archive mirror
* [PATCH v2 0/4] Fix kmalloc_node on offline nodes
@ 2020-03-18  7:28 Srikar Dronamraju
  2020-03-18  7:28 ` [PATCH v2 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Srikar Dronamraju @ 2020-03-18  7:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka, Bharata B Rao,
	Nathan Lynch

Changelog v1 -> v2:
- Handled comments from Vlastimil Babka and Bharata B Rao
- Changes only in patch 2 and 4.

Sachin recently reported that linux-next was no longer bootable on a few
powerpc systems.
https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/

# numactl -H
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10
#

Sachin bisected the problem to Commit a75056fc1e7c ("mm/memcontrol.c: allocate
shrinker_map on appropriate NUMA node")

The root cause analysis showed that mm/slub and powerpc/numa had some shortcomings
with respect to offline nodes.

This patch series is on top of patches posted at
https://lore.kernel.org/linuxppc-dev/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#u

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>

Srikar Dronamraju (4):
  mm: Check for node_online in node_present_pages
  mm/slub: Use mem_node to allocate a new slab
  mm: Implement reset_numa_mem
  powerpc/numa: Set fallback nodes for offline nodes

 arch/powerpc/include/asm/topology.h | 16 ++++++++++++++++
 arch/powerpc/kernel/smp.c           |  1 +
 include/asm-generic/topology.h      |  3 +++
 include/linux/mmzone.h              |  6 ++++--
 include/linux/topology.h            |  7 +++++++
 mm/slub.c                           |  9 +++++----
 6 files changed, 36 insertions(+), 6 deletions(-)

-- 
2.18.1




* [PATCH v2 1/4] mm: Check for node_online in node_present_pages
  2020-03-18  7:28 [PATCH v2 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
@ 2020-03-18  7:28 ` Srikar Dronamraju
  2020-03-18 10:02   ` Michal Hocko
  2020-03-18  7:28 ` [PATCH v2 2/4] mm/slub: Use mem_node to allocate a new slab Srikar Dronamraju
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 17+ messages in thread
From: Srikar Dronamraju @ 2020-03-18  7:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka, Bharata B Rao,
	Nathan Lynch

Calling kmalloc_node() on a possible node that is not yet online can
lead to a panic. Currently node_present_pages() doesn't verify that the
node is online before accessing its pgdat. However, the pgdat struct may
not be available, resulting in a crash.

NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
LR [c0000000003d5b94] __slab_alloc+0x34/0x60
Call Trace:
[c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
[c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
[c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
[c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
[c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
[c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
[c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
[c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
[c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
[c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
[c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
[c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68

Fix this by verifying the node is online before accessing the pgdat
structure. Fix the same for node_spanned_pages() too.
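
The guard can be illustrated with a small user-space sketch. This is not
kernel code: node_data[], node_online_map[], MAX_NUMNODES = 4 and the
online_node() helper are stand-ins assumed for illustration only. The two
accessor macros have the same shape as the ones in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_NUMNODES 4	/* hypothetical small limit for this sketch */

struct pglist_data {
	unsigned long node_present_pages;
	unsigned long node_spanned_pages;
};

/* Stand-ins for the per-node data and the online map; NULL means no pgdat. */
static struct pglist_data *node_data[MAX_NUMNODES];
static bool node_online_map[MAX_NUMNODES];

#define NODE_DATA(nid)		(node_data[(nid)])
#define node_online(nid)	(node_online_map[(nid)])

/* Guarded accessors: an offline node reports 0 pages, pgdat is not touched. */
#define node_present_pages(nid)		\
	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
#define node_spanned_pages(nid)		\
	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)

/* Hypothetical helper: attach a pgdat and mark the node online. */
static void online_node(int nid, struct pglist_data *pgdat)
{
	assert(nid >= 0 && nid < MAX_NUMNODES);
	node_data[nid] = pgdat;
	node_online_map[nid] = true;
}
```

With only node 1 onlined, node_present_pages(0) evaluates to 0 instead of
dereferencing a NULL NODE_DATA(0).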

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/mmzone.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f3f264826423..88078a3b95e5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -756,8 +756,10 @@ typedef struct pglist_data {
 	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
 } pg_data_t;
 
-#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
-#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
+#define node_present_pages(nid)		\
+	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
+#define node_spanned_pages(nid)		\
+	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
 #define pgdat_page_nr(pgdat, pagenr)	((pgdat)->node_mem_map + (pagenr))
 #else
-- 
2.18.1




* [PATCH v2 2/4] mm/slub: Use mem_node to allocate a new slab
  2020-03-18  7:28 [PATCH v2 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
  2020-03-18  7:28 ` [PATCH v2 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
@ 2020-03-18  7:28 ` Srikar Dronamraju
  2020-03-18  7:28 ` [PATCH v2 3/4] mm: Implement reset_numa_mem Srikar Dronamraju
  2020-03-18  7:28 ` [PATCH v2 4/4] powerpc/numa: Set fallback nodes for offline nodes Srikar Dronamraju
  3 siblings, 0 replies; 17+ messages in thread
From: Srikar Dronamraju @ 2020-03-18  7:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka, Bharata B Rao,
	Nathan Lynch

Currently, while allocating a slab for an offline node, we use its
associated node_numa_mem to search for a partial slab. If we don't find
a partial slab, we try allocating a slab from the offline node using
__alloc_pages_node. However, this is bound to fail.

NIP [c00000000039a300] __alloc_pages_nodemask+0x130/0x3b0
LR [c00000000039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0
Call Trace:
[c0000008b36837f0] [c00000000039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 (unreliable)
[c0000008b3683870] [c0000000003d1ff8] new_slab+0x128/0xcf0
[c0000008b3683950] [c0000000003d6060] ___slab_alloc+0x410/0x820
[c0000008b3683a40] [c0000000003d64a4] __slab_alloc+0x34/0x60
[c0000008b3683a70] [c0000000003d78b0] __kmalloc_node+0x110/0x490
[c0000008b3683af0] [c000000000343a08] kvmalloc_node+0x58/0x110
[c0000008b3683b30] [c0000000003ffd44] mem_cgroup_css_online+0x104/0x270
[c0000008b3683b90] [c000000000234e08] online_css+0x48/0xd0
[c0000008b3683bc0] [c00000000023dedc] cgroup_apply_control_enable+0x2ec/0x4d0
[c0000008b3683ca0] [c0000000002416f8] cgroup_mkdir+0x228/0x5f0
[c0000008b3683d10] [c000000000520360] kernfs_iop_mkdir+0x90/0xf0
[c0000008b3683d50] [c00000000043e400] vfs_mkdir+0x110/0x230
[c0000008b3683da0] [c000000000441ee0] do_mkdirat+0xb0/0x1a0
[c0000008b3683e20] [c00000000000b278] system_call+0x5c/0x68

Mitigate this by allocating the new slab from the node_numa_mem.
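
The node selection can be sketched in user space as follows. The arrays
present_pages[] and node_numa_mem[], the limit MAX_NUMNODES = 4 and the
function name slab_alloc_node() are assumptions made for this sketch; only
the condition mirrors the hunk in new_slab_objects() below.

```c
#include <assert.h>

#define NUMA_NO_NODE	(-1)
#define MAX_NUMNODES	4	/* hypothetical small limit for this sketch */

/* Stand-ins: pages present per node and the per-node fallback table. */
static unsigned long present_pages[MAX_NUMNODES];
static int node_numa_mem[MAX_NUMNODES];

static int node_to_mem_node(int node)
{
	return node_numa_mem[node];
}

/*
 * Pick the node a new slab should come from: keep the requested node
 * unless it has no present pages, in which case use its fallback node.
 */
static int slab_alloc_node(int node)
{
	if (node == NUMA_NO_NODE)
		return node;

	assert(node >= 0 && node < MAX_NUMNODES);
	if (!present_pages[node])
		node = node_to_mem_node(node);

	return node;
}
```

On a layout like the one reported (node 0 empty, node 1 populated, fallback
of node 0 set to node 1), a request for node 0 is redirected to node 1.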

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v1 -> v2:
- Handled comments from Vlastimil Babka
	- Now node gets set to node_numa_mem in new_slab_objects.

 mm/slub.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1c55bf7892bf..2dc603a84290 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2475,6 +2475,9 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
 	if (freelist)
 		return freelist;
 
+	if (node != NUMA_NO_NODE && !node_present_pages(node))
+		node = node_to_mem_node(node);
+
 	page = new_slab(s, flags, node);
 	if (page) {
 		c = raw_cpu_ptr(s->cpu_slab);
@@ -2569,12 +2572,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 redo:
 
 	if (unlikely(!node_match(page, node))) {
-		int searchnode = node;
-
 		if (node != NUMA_NO_NODE && !node_present_pages(node))
-			searchnode = node_to_mem_node(node);
+			node = node_to_mem_node(node);
 
-		if (unlikely(!node_match(page, searchnode))) {
+		if (unlikely(!node_match(page, node))) {
 			stat(s, ALLOC_NODE_MISMATCH);
 			deactivate_slab(s, page, c->freelist, c);
 			goto new_slab;
-- 
2.18.1




* [PATCH v2 3/4] mm: Implement reset_numa_mem
  2020-03-18  7:28 [PATCH v2 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
  2020-03-18  7:28 ` [PATCH v2 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
  2020-03-18  7:28 ` [PATCH v2 2/4] mm/slub: Use mem_node to allocate a new slab Srikar Dronamraju
@ 2020-03-18  7:28 ` Srikar Dronamraju
  2020-03-18 19:20   ` Christopher Lameter
  2020-03-18  7:28 ` [PATCH v2 4/4] powerpc/numa: Set fallback nodes for offline nodes Srikar Dronamraju
  3 siblings, 1 reply; 17+ messages in thread
From: Srikar Dronamraju @ 2020-03-18  7:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka, Bharata B Rao,
	Nathan Lynch

For memoryless or offline nodes, node_numa_mem refers to an N_MEMORY
fallback node. Currently the kernel has an API, set_numa_mem, that sets
node_numa_mem for a memoryless node. However, this API cannot be used for
offline nodes. Hence all offline nodes will have their node_numa_mem set
to 0. However, systems can themselves have node 0 offline, i.e.
memoryless and cpuless at this time. In such cases,
node_to_mem_node() fails to provide an N_MEMORY fallback node.

Mitigate this by having a new API that sets the default node_numa_mem for
offline nodes to be first_memory_node.
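
A user-space sketch of the problem and the reset follows. node_has_memory[]
stands in for the N_MEMORY node state, and first_memory_node() is written
here as a plain function over that array; both are assumptions for
illustration, as is MAX_NUMNODES = 4.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NUMNODES 4	/* hypothetical small limit for this sketch */

static bool node_has_memory[MAX_NUMNODES];	/* stand-in for N_MEMORY */
static int node_numa_mem[MAX_NUMNODES];		/* zeroed: default fallback is node 0 */

/* Lowest-numbered node with memory, or -1 if no node has any. */
static int first_memory_node(void)
{
	for (int nid = 0; nid < MAX_NUMNODES; nid++)
		if (node_has_memory[nid])
			return nid;
	return -1;
}

static int node_to_mem_node(int node)
{
	assert(node >= 0 && node < MAX_NUMNODES);
	return node_numa_mem[node];
}

/* Point the fallback of @node at the first node that really has memory. */
static void reset_numa_mem(int node)
{
	assert(node >= 0 && node < MAX_NUMNODES);
	node_numa_mem[node] = first_memory_node();
}
```

If only node 1 has memory, node_to_mem_node(0) initially returns 0, which is
itself memoryless; after reset_numa_mem(0) it returns 1.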

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/asm-generic/topology.h | 3 +++
 include/linux/topology.h       | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/include/asm-generic/topology.h b/include/asm-generic/topology.h
index 238873739550..e803ee7850e6 100644
--- a/include/asm-generic/topology.h
+++ b/include/asm-generic/topology.h
@@ -68,6 +68,9 @@
 #ifndef set_numa_mem
 #define set_numa_mem(node)
 #endif
+#ifndef reset_numa_mem
+#define reset_numa_mem(node)
+#endif
 #ifndef set_cpu_numa_mem
 #define set_cpu_numa_mem(cpu, node)
 #endif
diff --git a/include/linux/topology.h b/include/linux/topology.h
index eb2fe6edd73c..bebda80038bf 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -147,6 +147,13 @@ static inline int node_to_mem_node(int node)
 }
 #endif
 
+#ifndef reset_numa_mem
+static inline void reset_numa_mem(int node)
+{
+	_node_numa_mem_[node] = first_memory_node;
+}
+#endif
+
 #ifndef numa_mem_id
 /* Returns the number of the nearest Node with memory */
 static inline int numa_mem_id(void)
-- 
2.18.1




* [PATCH v2 4/4] powerpc/numa: Set fallback nodes for offline nodes
  2020-03-18  7:28 [PATCH v2 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
                   ` (2 preceding siblings ...)
  2020-03-18  7:28 ` [PATCH v2 3/4] mm: Implement reset_numa_mem Srikar Dronamraju
@ 2020-03-18  7:28 ` Srikar Dronamraju
  2020-03-18 14:28   ` kbuild test robot
  2020-03-18 18:56   ` kbuild test robot
  3 siblings, 2 replies; 17+ messages in thread
From: Srikar Dronamraju @ 2020-03-18  7:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka, Bharata B Rao,
	Nathan Lynch

Currently fallback nodes for offline nodes aren't set. Hence node 0
ends up being the default fallback node. However, node 0 might be offline.

Fix this by explicitly setting the fallback node. Ensure first_memory_node
is set before the kernel explicitly sets the fallback node.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
Changelog v1 -> v2:
- Handled comments from Bharata B Rao
	- Don't use dump_numa_cpu_topology to set fallback nodes

 arch/powerpc/include/asm/topology.h | 16 ++++++++++++++++
 arch/powerpc/kernel/smp.c           |  1 +
 2 files changed, 17 insertions(+)

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 2db7ba789720..baa89364197c 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -62,6 +62,21 @@ static inline int early_cpu_to_node(int cpu)
 	 */
 	return (nid < 0) ? 0 : nid;
 }
+
+static inline void update_default_numa_mem(void)
+{
+	unsigned int node;
+
+	for_each_node(node) {
+		/*
+		 * For all possible but not yet online nodes, ensure their
+		 * node_numa_mem is set correctly so that kmalloc_node works
+		 * for such nodes.
+		 */
+		if (!node_online(node))
+			reset_numa_mem(node);
+	}
+}
 #else
 
 static inline int early_cpu_to_node(int cpu) { return 0; }
@@ -90,6 +105,7 @@ static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
 	return 0;
 }
 
+static inline void update_default_numa_mem(void) {}
 #endif /* CONFIG_NUMA */
 
 #if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 37c12e3bab9e..d23faa70ea2d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1383,6 +1383,7 @@ void __init smp_cpus_done(unsigned int max_cpus)
 	if (smp_ops && smp_ops->bringup_done)
 		smp_ops->bringup_done();
 
+	update_default_numa_mem();
 	dump_numa_cpu_topology();
 
 #ifdef CONFIG_SCHED_SMT
-- 
2.18.1




* Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages
  2020-03-18  7:28 ` [PATCH v2 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
@ 2020-03-18 10:02   ` Michal Hocko
  2020-03-18 11:02     ` Srikar Dronamraju
  2020-03-18 11:53     ` Vlastimil Babka
  0 siblings, 2 replies; 17+ messages in thread
From: Michal Hocko @ 2020-03-18 10:02 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Christopher Lameter, linuxppc-dev, Joonsoo Kim,
	Kirill Tkhai, Vlastimil Babka, Bharata B Rao, Nathan Lynch

On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote:
> Calling a kmalloc_node on a possible node which is not yet onlined can
> lead to panic. Currently node_present_pages() doesn't verify the node is
> online before accessing the pgdat for the node. However pgdat struct may
> not be available resulting in a crash.
> 
> NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
> LR [c0000000003d5b94] __slab_alloc+0x34/0x60
> Call Trace:
> [c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
> [c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
> [c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
> [c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
> [c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
> [c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
> [c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
> [c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
> [c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
> [c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
> [c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
> [c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68
> 
> Fix this by verifying the node is online before accessing the pgdat
> structure. Fix the same for node_spanned_pages() too.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Christopher Lameter <cl@linux.com>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> Cc: Bharata B Rao <bharata@linux.ibm.com>
> Cc: Nathan Lynch <nathanl@linux.ibm.com>
> 
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  include/linux/mmzone.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f3f264826423..88078a3b95e5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -756,8 +756,10 @@ typedef struct pglist_data {
>  	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
>  } pg_data_t;
>  
> -#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
> -#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
> +#define node_present_pages(nid)		\
> +	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
> +#define node_spanned_pages(nid)		\
> +	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)

I believe this is a wrong approach. We really do not want to special
case all the places which require NODE_DATA. Can we please go and
allocate pgdat for all possible nodes?

The current state of memoryless hacks, with subtle bugs popping up here
and there, just proves that we should have done that from the very
beginning, IMHO.
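
A rough user-space sketch of that idea, assuming a hypothetical
node_possible_map[] and calloc() in place of the early boot allocator
(neither is how the kernel actually does it), could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MAX_NUMNODES 4	/* hypothetical small limit for this sketch */

struct pglist_data {
	int node_id;
	unsigned long node_present_pages;
};

static struct pglist_data *node_data[MAX_NUMNODES];
static bool node_possible_map[MAX_NUMNODES];

/*
 * Give every possible node a zeroed pgdat, whether or not it has memory.
 * Returns 0 on success and -1 if an allocation fails.
 */
static int alloc_pgdat_for_possible_nodes(void)
{
	for (int nid = 0; nid < MAX_NUMNODES; nid++) {
		if (!node_possible_map[nid] || node_data[nid])
			continue;
		node_data[nid] = calloc(1, sizeof(*node_data[nid]));
		if (!node_data[nid])
			return -1;
		node_data[nid]->node_id = nid;
	}
	return 0;
}

/* No special casing needed: every possible node has a valid pgdat. */
static unsigned long present_pages_of(int nid)
{
	assert(nid >= 0 && nid < MAX_NUMNODES && node_data[nid]);
	return node_data[nid]->node_present_pages;
}
```

A memoryless possible node then simply reports zero present pages through
the unguarded accessor.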

>  #ifdef CONFIG_FLAT_NODE_MEM_MAP
>  #define pgdat_page_nr(pgdat, pagenr)	((pgdat)->node_mem_map + (pagenr))
>  #else
> -- 
> 2.18.1

-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages
  2020-03-18 10:02   ` Michal Hocko
@ 2020-03-18 11:02     ` Srikar Dronamraju
  2020-03-18 11:14       ` Michal Hocko
  2020-03-18 11:53     ` Vlastimil Babka
  1 sibling, 1 reply; 17+ messages in thread
From: Srikar Dronamraju @ 2020-03-18 11:02 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Christopher Lameter, linuxppc-dev, Joonsoo Kim,
	Kirill Tkhai, Vlastimil Babka, Bharata B Rao, Nathan Lynch

* Michal Hocko <mhocko@suse.com> [2020-03-18 11:02:56]:

> On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote:
> > Calling a kmalloc_node on a possible node which is not yet onlined can
> > lead to panic. Currently node_present_pages() doesn't verify the node is
> > online before accessing the pgdat for the node. However pgdat struct may
> > not be available resulting in a crash.
> >
> > NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
> > LR [c0000000003d5b94] __slab_alloc+0x34/0x60
> > Call Trace:
> > [c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
> > [c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
> > [c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
> > [c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
> > [c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
> > [c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
> > [c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
> > [c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
> > [c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
> > [c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
> > [c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
> > [c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68
> >
> > Fix this by verifying the node is online before accessing the pgdat
> > structure. Fix the same for node_spanned_pages() too.
> >
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: linux-mm@kvack.org
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Michael Ellerman <mpe@ellerman.id.au>
> > Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > Cc: Michal Hocko <mhocko@kernel.org>
> > Cc: Christopher Lameter <cl@linux.com>
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > Cc: Bharata B Rao <bharata@linux.ibm.com>
> > Cc: Nathan Lynch <nathanl@linux.ibm.com>
> >
> > Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> >  include/linux/mmzone.h | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index f3f264826423..88078a3b95e5 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -756,8 +756,10 @@ typedef struct pglist_data {
> >  	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
> >  } pg_data_t;
> >
> > -#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
> > -#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
> > +#define node_present_pages(nid)		\
> > +	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
> > +#define node_spanned_pages(nid)		\
> > +	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
>
> I believe this is a wrong approach. We really do not want to special
> case all the places which require NODE_DATA. Can we please go and
> allocate pgdat for all possible nodes?
>

I can do that, but the question I had was whether we should make this change
just for powerpc or for other archs as well.

NODE_DATA initialization always seems to be in arch-specific code.

The other archs that are affected seem to be mips, sh and sparc.
These archs seem to assume that NODE_DATA has to be local only.

For example on sparc / arch/sparc/mm/init_64.c in allocate_node_data function.

	NODE_DATA(nid) = memblock_alloc_node(sizeof(struct pglist_data),
					     SMP_CACHE_BYTES, nid);
	if (!NODE_DATA(nid)) {
		prom_printf("Cannot allocate pglist_data for nid[%d]\n", nid);
		prom_halt();
	}

	NODE_DATA(nid)->node_id = nid;

So even if I make changes to allocate NODE_DATA from fallback node, I may not
be able to test them.

So please let me know your thoughts around the same.

> The current state of memoryless hacks, with subtle bugs popping up here
> and there, just proves that we should have done that from the very
> beginning, IMHO.
>
> >  #ifdef CONFIG_FLAT_NODE_MEM_MAP
> >  #define pgdat_page_nr(pgdat, pagenr)	((pgdat)->node_mem_map + (pagenr))
> >  #else
> > --
> > 2.18.1
>
> --
> Michal Hocko
> SUSE Labs
>

--
Thanks and Regards
Srikar Dronamraju




* Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages
  2020-03-18 11:02     ` Srikar Dronamraju
@ 2020-03-18 11:14       ` Michal Hocko
  0 siblings, 0 replies; 17+ messages in thread
From: Michal Hocko @ 2020-03-18 11:14 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Christopher Lameter, linuxppc-dev, Joonsoo Kim,
	Kirill Tkhai, Vlastimil Babka, Bharata B Rao, Nathan Lynch

On Wed 18-03-20 16:32:15, Srikar Dronamraju wrote:
> * Michal Hocko <mhocko@suse.com> [2020-03-18 11:02:56]:
> 
> > On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote:
[...]
> > > -#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
> > > -#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
> > > +#define node_present_pages(nid)		\
> > > +	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
> > > +#define node_spanned_pages(nid)		\
> > > +	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
> >
> > I believe this is a wrong approach. We really do not want to special
> > case all the places which require NODE_DATA. Can we please go and
> > allocate pgdat for all possible nodes?
> >
> 
> I can do that but the question I had was should we make this change just for
> Powerpc or should the change be for other archs.

No, we shouldn't, really. If NODE_DATA is non-null for all possible
nodes then this shouldn't be really necessary and arch specific.

> NODE_DATA initialization always seems to be in arch specific code.
> 
> The other archs that are affected seem to be mips, sh and sparc.
> These archs seem to assume that NODE_DATA has to be local only.

Which is all good and fine for nodes that hold some memory. If those
architectures support memoryless nodes at all then I do not see any
problem with having a remote pgdat.

> For example on sparc / arch/sparc/mm/init_64.c in allocate_node_data function.
> 
>   NODE_DATA(nid) = memblock_alloc_node(sizeof(struct pglist_data),
>                                              SMP_CACHE_BYTES, nid);
>         if (!NODE_DATA(nid)) {
>                 prom_printf("Cannot allocate pglist_data for nid[%d]\n", nid);
>                 prom_halt();
>         }
> 
>         NODE_DATA(nid)->node_id = nid;

This code is not about memoryless nodes, is it? It looks more like
panic-like handling of an allocation failure, because there is not enough
memory to hold pgdat. This also strongly suggests that this platform
doesn't really expect memoryless nodes in the early init path.

> So even if I make changes to allocate NODE_DATA from fallback node, I may not
> be able to test them.

Please try to focus on the architecture you can test for. From the
existing reports I have seen, this looks mostly to be a problem for x86
and ppc.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages
  2020-03-18 10:02   ` Michal Hocko
  2020-03-18 11:02     ` Srikar Dronamraju
@ 2020-03-18 11:53     ` Vlastimil Babka
  2020-03-18 12:52       ` Michal Hocko
  2020-03-19  0:32       ` Michael Ellerman
  1 sibling, 2 replies; 17+ messages in thread
From: Vlastimil Babka @ 2020-03-18 11:53 UTC (permalink / raw)
  To: Michal Hocko, Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Christopher Lameter, linuxppc-dev, Joonsoo Kim,
	Kirill Tkhai, Bharata B Rao, Nathan Lynch

On 3/18/20 11:02 AM, Michal Hocko wrote:
> On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote:
>> Calling a kmalloc_node on a possible node which is not yet onlined can
>> lead to panic. Currently node_present_pages() doesn't verify the node is
>> online before accessing the pgdat for the node. However pgdat struct may
>> not be available resulting in a crash.
>> 
>> NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
>> LR [c0000000003d5b94] __slab_alloc+0x34/0x60
>> Call Trace:
>> [c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
>> [c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
>> [c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
>> [c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
>> [c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
>> [c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
>> [c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
>> [c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
>> [c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
>> [c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
>> [c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
>> [c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68
>> 
>> Fix this by verifying the node is online before accessing the pgdat
>> structure. Fix the same for node_spanned_pages() too.
>> 
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: linux-mm@kvack.org
>> Cc: Mel Gorman <mgorman@suse.de>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
>> Cc: Michal Hocko <mhocko@kernel.org>
>> Cc: Christopher Lameter <cl@linux.com>
>> Cc: linuxppc-dev@lists.ozlabs.org
>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>> Cc: Bharata B Rao <bharata@linux.ibm.com>
>> Cc: Nathan Lynch <nathanl@linux.ibm.com>
>> 
>> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>> ---
>>  include/linux/mmzone.h | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index f3f264826423..88078a3b95e5 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -756,8 +756,10 @@ typedef struct pglist_data {
>>  	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
>>  } pg_data_t;
>>  
>> -#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
>> -#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
>> +#define node_present_pages(nid)		\
>> +	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
>> +#define node_spanned_pages(nid)		\
>> +	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
> 
> I believe this is a wrong approach. We really do not want to special
> case all the places which require NODE_DATA. Can we please go and
> allocate pgdat for all possible nodes?
> 
> The current state of memoryless hacks, with subtle bugs popping up here and
> there, just proves that we should have done that from the very beginning,
> IMHO.

Yes. So here's an alternative proposal for fixing the current situation in SLUB,
before the long-term solution of having all possible nodes provide valid pgdat
with zonelists:

- fix SLUB with the hunk at the end of this mail - the point is to use NUMA_NO_NODE
  as fallback instead of node_to_mem_node()
- this removes all uses of node_to_mem_node (luckily it's just SLUB),
  kill it completely instead of trying to fix it up
- patch 1/4 is not needed with the fix
- perhaps many of your other patches are also not needed
- once we get the long-term solution, some of the !node_online() checks can be removed

----8<----
diff --git a/mm/slub.c b/mm/slub.c
index 17dc00e33115..1d4f2d7a0080 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	unsigned int order = oo_order(oo);
 
-	if (node == NUMA_NO_NODE)
+	if (node == NUMA_NO_NODE || !node_online(node))
 		page = alloc_pages(flags, order);
 	else
 		page = __alloc_pages_node(node, flags, order);
@@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 
 	if (node == NUMA_NO_NODE)
 		searchnode = numa_mem_id();
-	else if (!node_present_pages(node))
-		searchnode = node_to_mem_node(node);
 
 	object = get_partial_node(s, get_node(s, searchnode), c, flags);
 	if (object || node != NUMA_NO_NODE)
@@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 redo:
 
 	if (unlikely(!node_match(page, node))) {
-		int searchnode = node;
-
-		if (node != NUMA_NO_NODE && !node_present_pages(node))
-			searchnode = node_to_mem_node(node);
-
-		if (unlikely(!node_match(page, searchnode))) {
+		/*
+		 * node_match() false implies node != NUMA_NO_NODE
+		 * but if the node is not online and has no pages, just
+		 * ignore the constraint
+		 */
+		if ((!node_online(node) || !node_present_pages(node))) {
+			node = NUMA_NO_NODE;
+			goto redo;
+		} else {
 			stat(s, ALLOC_NODE_MISMATCH);
 			deactivate_slab(s, page, c->freelist, c);
 			goto new_slab;
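
The node-constraint check in the ___slab_alloc() hunk above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: struct fake_node, MAX_NODES and effective_node() are hypothetical names invented for illustration and are not kernel APIs; only NUMA_NO_NODE and the "offline or no present pages" condition come from the hunk itself.

```c
#include <assert.h>
#include <stdbool.h>

#define NUMA_NO_NODE	(-1)
#define MAX_NODES	4	/* hypothetical limit, for this sketch only */

/* Hypothetical stand-in for per-node state; not the kernel's pg_data_t. */
struct fake_node {
	bool online;
	unsigned long present_pages;
};

/*
 * Model of the check in the hunk: when the requested node is offline or
 * has no present pages, drop the node constraint instead of looking up
 * a fallback node.
 */
static int effective_node(const struct fake_node *nodes, int node)
{
	assert(node >= NUMA_NO_NODE && node < MAX_NODES);

	if (node == NUMA_NO_NODE)
		return NUMA_NO_NODE;

	/* Test "online" first so an offline node's page count is never read. */
	if (!nodes[node].online || !nodes[node].present_pages)
		return NUMA_NO_NODE;

	return node;
}
```

The ordering of the two tests matters in the real code for the same reason as in the sketch: node_present_pages() dereferences the pgdat, which may not exist for an offline node.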




* Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages
  2020-03-18 11:53     ` Vlastimil Babka
@ 2020-03-18 12:52       ` Michal Hocko
  2020-03-19  0:32       ` Michael Ellerman
  1 sibling, 0 replies; 17+ messages in thread
From: Michal Hocko @ 2020-03-18 12:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Srikar Dronamraju, Andrew Morton, linux-mm, Mel Gorman,
	Michael Ellerman, Sachin Sant, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Bharata B Rao, Nathan Lynch

On Wed 18-03-20 12:53:32, Vlastimil Babka wrote:
[...]
> Yes. So here's an alternative proposal for fixing the current situation in SLUB,
> before the long-term solution of having all possible nodes provide valid pgdat
> with zonelists:
> 
> - fix SLUB with the hunk at the end of this mail - the point is to use NUMA_NO_NODE
>   as fallback instead of node_to_mem_node()

I am not familiar enough with SLUB to review this.

> - this removes all uses of node_to_mem_node (luckily it's just SLUB),
>   kill it completely instead of trying to fix it up

Sounds like a good plan to me. The code shouldn't really care.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2 4/4] powerpc/numa: Set fallback nodes for offline nodes
  2020-03-18  7:28 ` [PATCH v2 4/4] powerpc/numa: Set fallback nodes for offline nodes Srikar Dronamraju
@ 2020-03-18 14:28   ` kbuild test robot
  2020-03-18 18:56   ` kbuild test robot
  1 sibling, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2020-03-18 14:28 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: kbuild-all, Andrew Morton, Linux Memory Management List

[-- Attachment #1: Type: text/plain, Size: 2854 bytes --]

Hi Srikar,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on powerpc/next]
[also build test WARNING on next-20200317]
[cannot apply to linus/master asm-generic/master mpe/next v5.6-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Srikar-Dronamraju/Fix-kmalloc_node-on-offline-nodes/20200318-180303
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-motionpro_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=9.2.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from include/linux/topology.h:36,
                    from include/linux/gfp.h:9,
                    from include/linux/mm.h:10,
                    from include/linux/bvec.h:13,
                    from include/linux/blk_types.h:10,
                    from include/linux/iomap.h:7,
                    from fs/iomap/trace.c:5:
   arch/powerpc/include/asm/topology.h: In function 'update_default_numa_mem':
>> arch/powerpc/include/asm/topology.h:108:1: warning: no return statement in function returning non-void [-Wreturn-type]
     108 | static inline int update_default_numa_mem(void) {}
         | ^~~~~~
--
   In file included from include/linux/topology.h:36,
                    from include/linux/gfp.h:9,
                    from include/linux/slab.h:15,
                    from include/linux/crypto.h:19,
                    from include/crypto/hash.h:11,
                    from include/linux/uio.h:10,
                    from include/linux/socket.h:8,
                    from include/linux/compat.h:15,
                    from arch/powerpc/kernel/asm-offsets.c:14:
   arch/powerpc/include/asm/topology.h: In function 'update_default_numa_mem':
>> arch/powerpc/include/asm/topology.h:108:1: warning: no return statement in function returning non-void [-Wreturn-type]
     108 | static inline int update_default_numa_mem(void) {}
         | ^~~~~~
   29 real  8 user  16 sys  85.32% cpu 	make prepare

vim +108 arch/powerpc/include/asm/topology.h

   107	
 > 108	static inline int update_default_numa_mem(void) {}
   109	#endif /* CONFIG_NUMA */
   110	
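
One possible way to silence the -Wreturn-type warning above is to have the !CONFIG_NUMA stub return a value. The sketch below assumes that 0 is an acceptable "nothing to do" result; the intended meaning of the return value is not stated in the report, so that choice is an assumption, and changing the function to return void would be an equally plausible fix.

```c
/*
 * Sketch of the !CONFIG_NUMA stub with an explicit return.
 * Returning 0 is an assumption made for this example only.
 */
static inline int update_default_numa_mem(void)
{
	return 0;
}
```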

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 15143 bytes --]


* Re: [PATCH v2 4/4] powerpc/numa: Set fallback nodes for offline nodes
  2020-03-18  7:28 ` [PATCH v2 4/4] powerpc/numa: Set fallback nodes for offline nodes Srikar Dronamraju
  2020-03-18 14:28   ` kbuild test robot
@ 2020-03-18 18:56   ` kbuild test robot
  1 sibling, 0 replies; 17+ messages in thread
From: kbuild test robot @ 2020-03-18 18:56 UTC (permalink / raw)
  To: Srikar Dronamraju; +Cc: kbuild-all, Andrew Morton, Linux Memory Management List

[-- Attachment #1: Type: text/plain, Size: 4457 bytes --]

Hi Srikar,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on next-20200317]
[cannot apply to linus/master asm-generic/master mpe/next v5.6-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Srikar-Dronamraju/Fix-kmalloc_node-on-offline-nodes/20200318-180303
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-defconfig (attached as .config)
compiler: powerpc64-linux-gcc (GCC) 9.2.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        GCC_VERSION=9.2.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add the following tag
Reported-by: kbuild test robot <lkp@intel.com>

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/topology.h:36,
                    from include/linux/gfp.h:9,
                    from include/linux/slab.h:15,
                    from include/linux/crypto.h:19,
                    from include/crypto/hash.h:11,
                    from include/linux/uio.h:10,
                    from include/linux/socket.h:8,
                    from include/linux/compat.h:15,
                    from arch/powerpc/kernel/asm-offsets.c:14:
   arch/powerpc/include/asm/topology.h: In function 'update_default_numa_mem':
>> arch/powerpc/include/asm/topology.h:77:4: error: implicit declaration of function 'reset_numa_mem' [-Werror=implicit-function-declaration]
      77 |    reset_numa_mem(node);
         |    ^~~~~~~~~~~~~~
   arch/powerpc/include/asm/topology.h:79:1: warning: no return statement in function returning non-void [-Wreturn-type]
      79 | }
         | ^
   In file included from include/linux/gfp.h:9,
                    from include/linux/slab.h:15,
                    from include/linux/crypto.h:19,
                    from include/crypto/hash.h:11,
                    from include/linux/uio.h:10,
                    from include/linux/socket.h:8,
                    from include/linux/compat.h:15,
                    from arch/powerpc/kernel/asm-offsets.c:14:
   include/linux/topology.h: At top level:
>> include/linux/topology.h:151:20: warning: conflicting types for 'reset_numa_mem'
     151 | static inline void reset_numa_mem(int node)
         |                    ^~~~~~~~~~~~~~
>> include/linux/topology.h:151:20: error: static declaration of 'reset_numa_mem' follows non-static declaration
   In file included from include/linux/topology.h:36,
                    from include/linux/gfp.h:9,
                    from include/linux/slab.h:15,
                    from include/linux/crypto.h:19,
                    from include/crypto/hash.h:11,
                    from include/linux/uio.h:10,
                    from include/linux/socket.h:8,
                    from include/linux/compat.h:15,
                    from arch/powerpc/kernel/asm-offsets.c:14:
   arch/powerpc/include/asm/topology.h:77:4: note: previous implicit declaration of 'reset_numa_mem' was here
      77 |    reset_numa_mem(node);
         |    ^~~~~~~~~~~~~~
   cc1: some warnings being treated as errors
   make[2]: *** [scripts/Makefile.build:101: arch/powerpc/kernel/asm-offsets.s] Error 1
   make[2]: Target '__build' not remade because of errors.
   make[1]: *** [Makefile:1112: prepare0] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:179: sub-make] Error 2
   256 real  58 user  115 sys  67.93% cpu 	make prepare

vim +/reset_numa_mem +77 arch/powerpc/include/asm/topology.h

    65	
    66	static inline int update_default_numa_mem(void)
    67	{
    68		unsigned int node;
    69	
    70		for_each_node(node) {
    71			/*
    72			 * For all possible but not yet online nodes, ensure their
    73			 * node_numa_mem is set correctly so that kmalloc_node works
    74			 * for such nodes.
    75			 */
    76			if (!node_online(node))
  > 77				reset_numa_mem(node);
    78		}
    79	}
    80	#else
    81	
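
The two diagnostics above share one cause: reset_numa_mem() is called before any declaration of it is visible, and update_default_numa_mem() never returns a value. A minimal userspace sketch of the corrected shape follows. It is not the patch itself: the arrays node_is_online and node_numa_mem, the MAX_NODES limit, the body of reset_numa_mem() (pick the first online node) and the returned count are all assumptions made for illustration.

```c
#include <stdbool.h>

#define MAX_NODES	4	/* hypothetical limit, for this sketch only */

static bool node_is_online[MAX_NODES];	/* hypothetical per-node state */
static int node_numa_mem[MAX_NODES];	/* hypothetical fallback table */

/* Declared before its first use, unlike in the failing build. */
static inline void reset_numa_mem(int node);

static inline int update_default_numa_mem(void)
{
	int node, updated = 0;

	for (node = 0; node < MAX_NODES; node++) {
		/* Only possible-but-offline nodes need a fallback. */
		if (!node_is_online[node]) {
			reset_numa_mem(node);
			updated++;
		}
	}
	/* An explicit return satisfies -Wreturn-type. */
	return updated;
}

/* Assumed behaviour: fall back to the first online node, if any. */
static inline void reset_numa_mem(int node)
{
	int i;

	for (i = 0; i < MAX_NODES; i++) {
		if (node_is_online[i]) {
			node_numa_mem[node] = i;
			return;
		}
	}
}
```

In the kernel headers the same effect would come from making the declaration in include/linux/topology.h visible before arch/powerpc/include/asm/topology.h uses it, or from moving the caller out of the arch header.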

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 25779 bytes --]


* Re: [PATCH v2 3/4] mm: Implement reset_numa_mem
  2020-03-18  7:28 ` [PATCH v2 3/4] mm: Implement reset_numa_mem Srikar Dronamraju
@ 2020-03-18 19:20   ` Christopher Lameter
  2020-03-19  7:44     ` Michal Hocko
  0 siblings, 1 reply; 17+ messages in thread
From: Christopher Lameter @ 2020-03-18 19:20 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, linuxppc-dev, Joonsoo Kim,
	Kirill Tkhai, Vlastimil Babka, Bharata B Rao, Nathan Lynch

On Wed, 18 Mar 2020, Srikar Dronamraju wrote:

> For memoryless or offline nodes, node_numa_mem refers to a N_MEMORY
> fallback node. Currently the kernel has an API, set_numa_mem, that sets
> node_numa_mem for memoryless node. However this API cannot be used for
> offline nodes. Hence all offline nodes will have their node_numa_mem set
> to 0. However systems can themselves have node 0 as offline i.e

That is a significant change to the basic assumptions for memory less
nodes. Node 0 needed to have memory and processors. Not sure what else
may break.





* Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages
  2020-03-18 11:53     ` Vlastimil Babka
  2020-03-18 12:52       ` Michal Hocko
@ 2020-03-19  0:32       ` Michael Ellerman
  2020-03-19  1:11         ` Michael Ellerman
  2020-03-19  9:38         ` Vlastimil Babka
  1 sibling, 2 replies; 17+ messages in thread
From: Michael Ellerman @ 2020-03-19  0:32 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko, Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Sachin Sant,
	Christopher Lameter, linuxppc-dev, Joonsoo Kim, Kirill Tkhai,
	Bharata B Rao, Nathan Lynch

Vlastimil Babka <vbabka@suse.cz> writes:
> On 3/18/20 11:02 AM, Michal Hocko wrote:
>> On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote:
>>> Calling a kmalloc_node on a possible node which is not yet onlined can
>>> lead to panic. Currently node_present_pages() doesn't verify the node is
>>> online before accessing the pgdat for the node. However pgdat struct may
>>> not be available resulting in a crash.
>>> 
>>> NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
>>> LR [c0000000003d5b94] __slab_alloc+0x34/0x60
>>> Call Trace:
>>> [c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
>>> [c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
>>> [c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
>>> [c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
>>> [c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
>>> [c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
>>> [c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
>>> [c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
>>> [c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
>>> [c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
>>> [c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
>>> [c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68
>>> 
>>> Fix this by verifying the node is online before accessing the pgdat
>>> structure. Fix the same for node_spanned_pages() too.
>>> 
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Cc: linux-mm@kvack.org
>>> Cc: Mel Gorman <mgorman@suse.de>
>>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>>> Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>> Cc: Michal Hocko <mhocko@kernel.org>
>>> Cc: Christopher Lameter <cl@linux.com>
>>> Cc: linuxppc-dev@lists.ozlabs.org
>>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>>> Cc: Bharata B Rao <bharata@linux.ibm.com>
>>> Cc: Nathan Lynch <nathanl@linux.ibm.com>
>>> 
>>> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>>> ---
>>>  include/linux/mmzone.h | 6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>> 
>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>> index f3f264826423..88078a3b95e5 100644
>>> --- a/include/linux/mmzone.h
>>> +++ b/include/linux/mmzone.h
>>> @@ -756,8 +756,10 @@ typedef struct pglist_data {
>>>  	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
>>>  } pg_data_t;
>>>  
>>> -#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
>>> -#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
>>> +#define node_present_pages(nid)		\
>>> +	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
>>> +#define node_spanned_pages(nid)		\
>>> +	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
>> 
>> I believe this is a wrong approach. We really do not want to special
>> case all the places which require NODE_DATA. Can we please go and
>> allocate pgdat for all possible nodes?
>> 
>> The current state of memoryless hacks, with subtle bugs popping up here and
>> there, just proves that we should have done that from the very beginning,
>> IMHO.
>
> Yes. So here's an alternative proposal for fixing the current situation in SLUB,
> before the long-term solution of having all possible nodes provide valid pgdat
> with zonelists:
>
> - fix SLUB with the hunk at the end of this mail - the point is to use NUMA_NO_NODE
>   as fallback instead of node_to_mem_node()
> - this removes all uses of node_to_mem_node (luckily it's just SLUB),
>   kill it completely instead of trying to fix it up
> - patch 1/4 is not needed with the fix
> - perhaps many of your other patches are also not needed
> - once we get the long-term solution, some of the !node_online() checks can be removed

Seems like a nice solution to me :)

> ----8<----
> diff --git a/mm/slub.c b/mm/slub.c
> index 17dc00e33115..1d4f2d7a0080 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>  	struct page *page;
>  	unsigned int order = oo_order(oo);
>  
> -	if (node == NUMA_NO_NODE)
> +	if (node == NUMA_NO_NODE || !node_online(node))

Why don't we need the node_present_pages() check here?

>  		page = alloc_pages(flags, order);
>  	else
>  		page = __alloc_pages_node(node, flags, order);
> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>  
>  	if (node == NUMA_NO_NODE)
>  		searchnode = numa_mem_id();
> -	else if (!node_present_pages(node))
> -		searchnode = node_to_mem_node(node);
>  
>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
>  	if (object || node != NUMA_NO_NODE)
> @@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  redo:
>  
>  	if (unlikely(!node_match(page, node))) {
> -		int searchnode = node;
> -
> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
> -			searchnode = node_to_mem_node(node);
> -
> -		if (unlikely(!node_match(page, searchnode))) {
> +		/*
> +		 * node_match() false implies node != NUMA_NO_NODE
> +		 * but if the node is not online and has no pages, just
                                                 ^
                                                 this should be 'or' ?

> +		 * ignore the constraint
> +		 */
> +		if ((!node_online(node) || !node_present_pages(node))) {
> +			node = NUMA_NO_NODE;
> +			goto redo;
> +		} else {
>  			stat(s, ALLOC_NODE_MISMATCH);
>  			deactivate_slab(s, page, c->freelist, c);
>  			goto new_slab;

cheers



* Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages
  2020-03-19  0:32       ` Michael Ellerman
@ 2020-03-19  1:11         ` Michael Ellerman
  2020-03-19  9:38         ` Vlastimil Babka
  1 sibling, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2020-03-19  1:11 UTC (permalink / raw)
  To: Vlastimil Babka, Michal Hocko, Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Sachin Sant,
	Christopher Lameter, linuxppc-dev, Joonsoo Kim, Kirill Tkhai,
	Bharata B Rao, Nathan Lynch

Michael Ellerman <mpe@ellerman.id.au> writes:
> Vlastimil Babka <vbabka@suse.cz> writes:
>> On 3/18/20 11:02 AM, Michal Hocko wrote:
>>> On Wed 18-03-20 12:58:07, Srikar Dronamraju wrote:
>>>> Calling a kmalloc_node on a possible node which is not yet onlined can
>>>> lead to panic. Currently node_present_pages() doesn't verify the node is
>>>> online before accessing the pgdat for the node. However pgdat struct may
>>>> not be available resulting in a crash.
>>>> 
>>>> NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
>>>> LR [c0000000003d5b94] __slab_alloc+0x34/0x60
>>>> Call Trace:
>>>> [c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
>>>> [c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
>>>> [c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
>>>> [c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
>>>> [c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
>>>> [c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
>>>> [c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
>>>> [c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
>>>> [c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
>>>> [c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
>>>> [c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
>>>> [c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68
>>>> 
>>>> Fix this by verifying the node is online before accessing the pgdat
>>>> structure. Fix the same for node_spanned_pages() too.
>>>> 
>>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>>> Cc: linux-mm@kvack.org
>>>> Cc: Mel Gorman <mgorman@suse.de>
>>>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>>>> Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>>> Cc: Michal Hocko <mhocko@kernel.org>
>>>> Cc: Christopher Lameter <cl@linux.com>
>>>> Cc: linuxppc-dev@lists.ozlabs.org
>>>> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
>>>> Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
>>>> Cc: Vlastimil Babka <vbabka@suse.cz>
>>>> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>>>> Cc: Bharata B Rao <bharata@linux.ibm.com>
>>>> Cc: Nathan Lynch <nathanl@linux.ibm.com>
>>>> 
>>>> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>>> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>>>> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
>>>> ---
>>>>  include/linux/mmzone.h | 6 ++++--
>>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>> 
>>>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>>>> index f3f264826423..88078a3b95e5 100644
>>>> --- a/include/linux/mmzone.h
>>>> +++ b/include/linux/mmzone.h
>>>> @@ -756,8 +756,10 @@ typedef struct pglist_data {
>>>>  	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
>>>>  } pg_data_t;
>>>>  
>>>> -#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
>>>> -#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
>>>> +#define node_present_pages(nid)		\
>>>> +	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
>>>> +#define node_spanned_pages(nid)		\
>>>> +	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
>>> 
>>> I believe this is a wrong approach. We really do not want to special
>>> case all the places which require NODE_DATA. Can we please go and
>>> allocate pgdat for all possible nodes?
>>> 
>>> The current state of memoryless hacks, with subtle bugs popping up here and
>>> there, just proves that we should have done that from the very beginning,
>>> IMHO.
>>
>> Yes. So here's an alternative proposal for fixing the current situation in SLUB,
>> before the long-term solution of having all possible nodes provide valid pgdat
>> with zonelists:
>>
>> - fix SLUB with the hunk at the end of this mail - the point is to use NUMA_NO_NODE
>>   as fallback instead of node_to_mem_node()
>> - this removes all uses of node_to_mem_node (luckily it's just SLUB),
>>   kill it completely instead of trying to fix it up
>> - patch 1/4 is not needed with the fix
>> - perhaps many of your other patches are also not needed
>> - once we get the long-term solution, some of the !node_online() checks can be removed
>
> Seems like a nice solution to me :)
>
>> ----8<----
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 17dc00e33115..1d4f2d7a0080 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>>  	struct page *page;
>>  	unsigned int order = oo_order(oo);
>>  
>> -	if (node == NUMA_NO_NODE)
>> +	if (node == NUMA_NO_NODE || !node_online(node))
>
> Why don't we need the node_present_pages() check here?
>
>>  		page = alloc_pages(flags, order);
>>  	else
>>  		page = __alloc_pages_node(node, flags, order);
>> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>>  
>>  	if (node == NUMA_NO_NODE)
>>  		searchnode = numa_mem_id();
>> -	else if (!node_present_pages(node))
>> -		searchnode = node_to_mem_node(node);
>>  
>>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
>>  	if (object || node != NUMA_NO_NODE)
>> @@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  redo:
>>  
>>  	if (unlikely(!node_match(page, node))) {
>> -		int searchnode = node;
>> -
>> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
>> -			searchnode = node_to_mem_node(node);
>> -
>> -		if (unlikely(!node_match(page, searchnode))) {
>> +		/*
>> +		 * node_match() false implies node != NUMA_NO_NODE
>> +		 * but if the node is not online and has no pages, just
>                                                  ^
>                                                  this should be 'or' ?

Sorry I see you've already fixed this in the version you posted.

cheers



* Re: [PATCH v2 3/4] mm: Implement reset_numa_mem
  2020-03-18 19:20   ` Christopher Lameter
@ 2020-03-19  7:44     ` Michal Hocko
  0 siblings, 0 replies; 17+ messages in thread
From: Michal Hocko @ 2020-03-19  7:44 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Srikar Dronamraju, Andrew Morton, linux-mm, Mel Gorman,
	Michael Ellerman, Sachin Sant, linuxppc-dev, Joonsoo Kim,
	Kirill Tkhai, Vlastimil Babka, Bharata B Rao, Nathan Lynch

On Wed 18-03-20 19:20:41, Christopher Lameter wrote:
> On Wed, 18 Mar 2020, Srikar Dronamraju wrote:
> 
> > For memoryless or offline nodes, node_numa_mem refers to a N_MEMORY
> > fallback node. Currently the kernel has an API, set_numa_mem, that sets
> > node_numa_mem for memoryless node. However this API cannot be used for
> > offline nodes. Hence all offline nodes will have their node_numa_mem set
> > to 0. However systems can themselves have node 0 as offline i.e
> 
> That is a significant change to the basic assumptions for memory less
> nodes. Node 0 needed to have memory and processors. Not sure what else
> may break.

This assumption is simply incorrect. There are many examples, but just
one off the top of my head: 3e8589963773 ("memcg: make it work on sparse
non-0-node systems"). We simply have to forget that some nodes are
special.
-- 
Michal Hocko
SUSE Labs



* Re: [PATCH v2 1/4] mm: Check for node_online in node_present_pages
  2020-03-19  0:32       ` Michael Ellerman
  2020-03-19  1:11         ` Michael Ellerman
@ 2020-03-19  9:38         ` Vlastimil Babka
  1 sibling, 0 replies; 17+ messages in thread
From: Vlastimil Babka @ 2020-03-19  9:38 UTC (permalink / raw)
  To: Michael Ellerman, Michal Hocko, Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Sachin Sant,
	Christopher Lameter, linuxppc-dev, Joonsoo Kim, Kirill Tkhai,
	Bharata B Rao, Nathan Lynch

On 3/19/20 1:32 AM, Michael Ellerman wrote:
> Seems like a nice solution to me

Thanks :)

>> ----8<----
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 17dc00e33115..1d4f2d7a0080 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>>  	struct page *page;
>>  	unsigned int order = oo_order(oo);
>>  
>> -	if (node == NUMA_NO_NODE)
>> +	if (node == NUMA_NO_NODE || !node_online(node))
> 
> Why don't we need the node_present_pages() check here?

Page allocator is fine with a node without present pages, as long as there's a
zonelist, which online nodes must have (ideally all possible nodes should have,
and then we can remove this).

SLUB, on the other hand, doesn't allocate cache per-cpu structures for nodes
without present pages (understandably); that's why the other place includes the
node_present_pages() check.
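
The difference between the two checks can be put as two separate predicates. The sketch below is a userspace illustration only; struct fake_node and both function names are hypothetical and do not exist in the kernel, and the conditions simply restate the explanation above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for per-node state; not the kernel's pg_data_t. */
struct fake_node {
	bool online;
	unsigned long present_pages;
};

/*
 * Page allocator side: a zonelist is all that is needed, and online
 * nodes must have one, so being online is sufficient.
 */
static bool page_allocator_can_target(const struct fake_node *n)
{
	return n->online;
}

/*
 * SLUB side: the cache structures for a node are only set up when the
 * node has present pages, so both conditions are required.
 */
static bool slub_can_target(const struct fake_node *n)
{
	return n->online && n->present_pages > 0;
}
```

An online but memoryless node passes the first predicate and fails the second, which is why alloc_slab_page() checks only node_online() while ___slab_alloc() also checks node_present_pages().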

Thanks



end of thread, other threads:[~2020-03-19  9:38 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-18  7:28 [PATCH v2 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
2020-03-18  7:28 ` [PATCH v2 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
2020-03-18 10:02   ` Michal Hocko
2020-03-18 11:02     ` Srikar Dronamraju
2020-03-18 11:14       ` Michal Hocko
2020-03-18 11:53     ` Vlastimil Babka
2020-03-18 12:52       ` Michal Hocko
2020-03-19  0:32       ` Michael Ellerman
2020-03-19  1:11         ` Michael Ellerman
2020-03-19  9:38         ` Vlastimil Babka
2020-03-18  7:28 ` [PATCH v2 2/4] mm/slub: Use mem_node to allocate a new slab Srikar Dronamraju
2020-03-18  7:28 ` [PATCH v2 3/4] mm: Implement reset_numa_mem Srikar Dronamraju
2020-03-18 19:20   ` Christopher Lameter
2020-03-19  7:44     ` Michal Hocko
2020-03-18  7:28 ` [PATCH v2 4/4] powerpc/numa: Set fallback nodes for offline nodes Srikar Dronamraju
2020-03-18 14:28   ` kbuild test robot
2020-03-18 18:56   ` kbuild test robot
