linux-mm.kvack.org archive mirror
* [PATCH 0/4] Fix kmalloc_node on offline nodes
       [not found] <3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com>
@ 2020-03-17 13:17 ` Srikar Dronamraju
  2020-03-17 13:17   ` [PATCH 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
                     ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-17 13:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka, Bharata B Rao

Sachin recently reported that linux-next was no longer bootable on a few
systems:
https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/

# numactl -H
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10
#

Sachin bisected the problem to commit a75056fc1e7c ("mm/memcontrol.c: allocate
shrinker_map on appropriate NUMA node").

The root cause analysis showed that mm/slub and powerpc/numa had some shortcomings
with respect to offline nodes.
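
For context, a minimal sketch of the failing pattern (a hypothetical
caller, shown only for illustration; the actual trigger was the
shrinker_map allocation done via kvmalloc_node() in
mem_cgroup_css_online(), as the traces in patches 1/4 and 2/4 show):

	#include <linux/slab.h>

	/*
	 * Sketch: an allocation explicitly targeted at a possible but
	 * still-offline node (node 0 on the system above). Without this
	 * series, SLUB dereferences the missing pgdat of the offline
	 * node and the kernel panics.
	 */
	static void *alloc_on_offline_node(void)
	{
		return kmalloc_node(64, GFP_KERNEL, 0 /* offline node 0 */);
	}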

This patch series is on top of patches posted at
https://lore.kernel.org/linuxppc-dev/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#u

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>

Srikar Dronamraju (4):
  mm: Check for node_online in node_present_pages
  mm/slub: Use mem_node to allocate a new slab
  mm: Implement reset_numa_mem
  powerpc/numa: Set fallback nodes for offline nodes

 arch/powerpc/mm/numa.c         | 11 ++++++++++-
 include/asm-generic/topology.h |  3 +++
 include/linux/mmzone.h         |  6 ++++--
 include/linux/topology.h       |  7 +++++++
 mm/slub.c                      | 19 ++++++++-----------
 5 files changed, 32 insertions(+), 14 deletions(-)

--
2.18.1




* [PATCH 1/4] mm: Check for node_online in node_present_pages
  2020-03-17 13:17 ` [PATCH 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
@ 2020-03-17 13:17   ` Srikar Dronamraju
  2020-03-17 13:37     ` Srikar Dronamraju
  2020-03-17 13:17   ` [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab Srikar Dronamraju
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-17 13:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka, Bharata B Rao

Calling kmalloc_node() on a possible node that is not yet online can
lead to a panic. Currently node_present_pages() doesn't verify that the
node is online before accessing its pgdat. For an offline node the pgdat
structure may not be available, resulting in a crash.

NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
LR [c0000000003d5b94] __slab_alloc+0x34/0x60
Call Trace:
[c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
[c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
[c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
[c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
[c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
[c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
[c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
[c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
[c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
[c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
[c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
[c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68

Fix this by verifying that the node is online before accessing the pgdat
structure. Do the same for node_spanned_pages().
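
For reference, a minimal sketch of the semantics a caller can now rely
on (a hypothetical helper; the in-tree users are the SLUB paths touched
by patch 2/4):

	#include <linux/mmzone.h>
	#include <linux/topology.h>

	/*
	 * Sketch: with the fix, node_present_pages() on an offline nid
	 * safely evaluates to 0 instead of dereferencing a missing
	 * pgdat, so callers can fall back to a nearby memory node.
	 */
	static int pick_alloc_node(int nid)
	{
		if (!node_present_pages(nid))
			return node_to_mem_node(nid);
		return nid;
	}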

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/linux/mmzone.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index f3f264826423..88078a3b95e5 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -756,8 +756,10 @@ typedef struct pglist_data {
 	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
 } pg_data_t;
 
-#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
-#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
+#define node_present_pages(nid)		\
+	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
+#define node_spanned_pages(nid)		\
+	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
 #define pgdat_page_nr(pgdat, pagenr)	((pgdat)->node_mem_map + (pagenr))
 #else
-- 
2.18.1




* [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
  2020-03-17 13:17 ` [PATCH 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
  2020-03-17 13:17   ` [PATCH 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
@ 2020-03-17 13:17   ` Srikar Dronamraju
  2020-03-17 13:34     ` Vlastimil Babka
  2020-03-17 13:17   ` [PATCH 3/4] mm: Implement reset_numa_mem Srikar Dronamraju
  2020-03-17 13:17   ` [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes Srikar Dronamraju
  3 siblings, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-17 13:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka, Bharata B Rao

Currently while allocating a slab for an offline node, we use its
associated node_numa_mem to search for a partial slab. If we don't find
a partial slab, we try allocating a slab from the offline node using
__alloc_pages_node(). However this is bound to fail.

NIP [c00000000039a300] __alloc_pages_nodemask+0x130/0x3b0
LR [c00000000039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0
Call Trace:
[c0000008b36837f0] [c00000000039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 (unreliable)
[c0000008b3683870] [c0000000003d1ff8] new_slab+0x128/0xcf0
[c0000008b3683950] [c0000000003d6060] ___slab_alloc+0x410/0x820
[c0000008b3683a40] [c0000000003d64a4] __slab_alloc+0x34/0x60
[c0000008b3683a70] [c0000000003d78b0] __kmalloc_node+0x110/0x490
[c0000008b3683af0] [c000000000343a08] kvmalloc_node+0x58/0x110
[c0000008b3683b30] [c0000000003ffd44] mem_cgroup_css_online+0x104/0x270
[c0000008b3683b90] [c000000000234e08] online_css+0x48/0xd0
[c0000008b3683bc0] [c00000000023dedc] cgroup_apply_control_enable+0x2ec/0x4d0
[c0000008b3683ca0] [c0000000002416f8] cgroup_mkdir+0x228/0x5f0
[c0000008b3683d10] [c000000000520360] kernfs_iop_mkdir+0x90/0xf0
[c0000008b3683d50] [c00000000043e400] vfs_mkdir+0x110/0x230
[c0000008b3683da0] [c000000000441ee0] do_mkdirat+0xb0/0x1a0
[c0000008b3683e20] [c00000000000b278] system_call+0x5c/0x68

Mitigate this by allocating the new slab from the node_numa_mem.
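
In other words, the intended flow after this patch (a sketch, assuming
node_numa_mem has been set up for offline nodes by patches 3/4 and 4/4):

	kmalloc_node(size, flags, offline_nid)
	  -> ___slab_alloc()      /* remaps the node on node mismatch */
	    -> new_slab_objects() /* remaps NUMA_NO_NODE / offline nid */
	      -> get_partial()    /* sees the node_numa_mem node */
	      -> new_slab()       /* allocates pages from that node too */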

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 mm/slub.c | 19 ++++++++-----------
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 1c55bf7892bf..fdf7f38f96e6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 		struct kmem_cache_cpu *c)
 {
 	void *object;
-	int searchnode = node;
 
-	if (node == NUMA_NO_NODE)
-		searchnode = numa_mem_id();
-	else if (!node_present_pages(node))
-		searchnode = node_to_mem_node(node);
-
-	object = get_partial_node(s, get_node(s, searchnode), c, flags);
+	object = get_partial_node(s, get_node(s, node), c, flags);
 	if (object || node != NUMA_NO_NODE)
 		return object;
 
@@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
 
 	WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
 
+	if (node == NUMA_NO_NODE)
+		node = numa_mem_id();
+	else if (!node_present_pages(node))
+		node = node_to_mem_node(node);
+
 	freelist = get_partial(s, flags, node, c);
 
 	if (freelist)
@@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 redo:
 
 	if (unlikely(!node_match(page, node))) {
-		int searchnode = node;
-
 		if (node != NUMA_NO_NODE && !node_present_pages(node))
-			searchnode = node_to_mem_node(node);
+			node = node_to_mem_node(node);
 
-		if (unlikely(!node_match(page, searchnode))) {
+		if (unlikely(!node_match(page, node))) {
 			stat(s, ALLOC_NODE_MISMATCH);
 			deactivate_slab(s, page, c->freelist, c);
 			goto new_slab;
-- 
2.18.1




* [PATCH 3/4] mm: Implement reset_numa_mem
  2020-03-17 13:17 ` [PATCH 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
  2020-03-17 13:17   ` [PATCH 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
  2020-03-17 13:17   ` [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab Srikar Dronamraju
@ 2020-03-17 13:17   ` Srikar Dronamraju
  2020-03-17 13:17   ` [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes Srikar Dronamraju
  3 siblings, 0 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-17 13:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka, Bharata B Rao

For memoryless or offline nodes, node_numa_mem refers to an N_MEMORY
fallback node. Currently the kernel has an API, set_numa_mem(), that sets
node_numa_mem for a memoryless node. However this API cannot be used for
offline nodes, so all offline nodes end up with their node_numa_mem set
to 0. A system can itself have node 0 offline, i.e. memoryless and
cpuless, at this time. In such cases, node_to_mem_node() fails to
provide an N_MEMORY fallback node.

Mitigate this with a new API, reset_numa_mem(), that sets the default
node_numa_mem for offline nodes to first_memory_node.
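
The intended use is during early topology setup (see patch 4/4); a
minimal sketch:

	for_each_node(node)
		if (!node_online(node))
			reset_numa_mem(node);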

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 include/asm-generic/topology.h | 3 +++
 include/linux/topology.h       | 7 +++++++
 2 files changed, 10 insertions(+)

diff --git a/include/asm-generic/topology.h b/include/asm-generic/topology.h
index 238873739550..e803ee7850e6 100644
--- a/include/asm-generic/topology.h
+++ b/include/asm-generic/topology.h
@@ -68,6 +68,9 @@
 #ifndef set_numa_mem
 #define set_numa_mem(node)
 #endif
+#ifndef reset_numa_mem
+#define reset_numa_mem(node)
+#endif
 #ifndef set_cpu_numa_mem
 #define set_cpu_numa_mem(cpu, node)
 #endif
diff --git a/include/linux/topology.h b/include/linux/topology.h
index eb2fe6edd73c..bebda80038bf 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -147,6 +147,13 @@ static inline int node_to_mem_node(int node)
 }
 #endif
 
+#ifndef reset_numa_mem
+static inline void reset_numa_mem(int node)
+{
+	_node_numa_mem_[node] = first_memory_node;
+}
+#endif
+
 #ifndef numa_mem_id
 /* Returns the number of the nearest Node with memory */
 static inline int numa_mem_id(void)
-- 
2.18.1




* [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes
  2020-03-17 13:17 ` [PATCH 0/4] Fix kmalloc_node on offline nodes Srikar Dronamraju
                     ` (2 preceding siblings ...)
  2020-03-17 13:17   ` [PATCH 3/4] mm: Implement reset_numa_mem Srikar Dronamraju
@ 2020-03-17 13:17   ` Srikar Dronamraju
  2020-03-17 14:22     ` Bharata B Rao
  3 siblings, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-17 13:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Srikar Dronamraju, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka, Bharata B Rao

Currently fallback nodes for offline nodes aren't set, so node 0 ends up
being the default fallback node. However node 0 might itself be offline.

Fix this by explicitly setting the fallback node. Ensure first_memory_node
is set before the kernel explicitly sets the fallback node.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Sachin Sant <sachinp@linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Bharata B Rao <bharata@linux.ibm.com>

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/numa.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index 281531340230..6e97ab6575cb 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -827,7 +827,16 @@ void __init dump_numa_cpu_topology(void)
 	if (!numa_enabled)
 		return;
 
-	for_each_online_node(node) {
+	for_each_node(node) {
+		/*
+		 * For all possible but not yet online nodes, ensure their
+		 * node_numa_mem is set correctly so that kmalloc_node works
+		 * for such nodes.
+		 */
+		if (!node_online(node)) {
+			reset_numa_mem(node);
+			continue;
+		}
 		pr_info("Node %d CPUs:", node);
 
 		count = 0;
-- 
2.18.1




* Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
  2020-03-17 13:17   ` [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab Srikar Dronamraju
@ 2020-03-17 13:34     ` Vlastimil Babka
  2020-03-17 13:45       ` Srikar Dronamraju
  2020-03-17 16:41       ` Srikar Dronamraju
  0 siblings, 2 replies; 15+ messages in thread
From: Vlastimil Babka @ 2020-03-17 13:34 UTC (permalink / raw)
  To: Srikar Dronamraju, Andrew Morton
  Cc: linux-mm, Mel Gorman, Michael Ellerman, Sachin Sant,
	Michal Hocko, Christopher Lameter, linuxppc-dev, Joonsoo Kim,
	Kirill Tkhai, Bharata B Rao

On 3/17/20 2:17 PM, Srikar Dronamraju wrote:
> Currently while allocating a slab for an offline node, we use its
> associated node_numa_mem to search for a partial slab. If we don't find
> a partial slab, we try allocating a slab from the offline node using
> __alloc_pages_node(). However this is bound to fail.
> 
> NIP [c00000000039a300] __alloc_pages_nodemask+0x130/0x3b0
> LR [c00000000039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0
> Call Trace:
> [c0000008b36837f0] [c00000000039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 (unreliable)
> [c0000008b3683870] [c0000000003d1ff8] new_slab+0x128/0xcf0
> [c0000008b3683950] [c0000000003d6060] ___slab_alloc+0x410/0x820
> [c0000008b3683a40] [c0000000003d64a4] __slab_alloc+0x34/0x60
> [c0000008b3683a70] [c0000000003d78b0] __kmalloc_node+0x110/0x490
> [c0000008b3683af0] [c000000000343a08] kvmalloc_node+0x58/0x110
> [c0000008b3683b30] [c0000000003ffd44] mem_cgroup_css_online+0x104/0x270
> [c0000008b3683b90] [c000000000234e08] online_css+0x48/0xd0
> [c0000008b3683bc0] [c00000000023dedc] cgroup_apply_control_enable+0x2ec/0x4d0
> [c0000008b3683ca0] [c0000000002416f8] cgroup_mkdir+0x228/0x5f0
> [c0000008b3683d10] [c000000000520360] kernfs_iop_mkdir+0x90/0xf0
> [c0000008b3683d50] [c00000000043e400] vfs_mkdir+0x110/0x230
> [c0000008b3683da0] [c000000000441ee0] do_mkdirat+0xb0/0x1a0
> [c0000008b3683e20] [c00000000000b278] system_call+0x5c/0x68
> 
> Mitigate this by allocating the new slab from the node_numa_mem.

Are you sure this is really needed and the other 3 patches are not enough for
the current SLUB code to work as needed? It seems you are changing the semantics
here...

> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>  		struct kmem_cache_cpu *c)
>  {
>  	void *object;
> -	int searchnode = node;
>  
> -	if (node == NUMA_NO_NODE)
> -		searchnode = numa_mem_id();
> -	else if (!node_present_pages(node))
> -		searchnode = node_to_mem_node(node);
> -
> -	object = get_partial_node(s, get_node(s, searchnode), c, flags);
> +	object = get_partial_node(s, get_node(s, node), c, flags);
>  	if (object || node != NUMA_NO_NODE)
>  		return object;
>
>       return get_any_partial(s, flags, c);

I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the
hunk below), thus the get_any_partial() call becomes dead code?
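
For reference, a reconstruction of get_partial() after this patch (a
sketch assembled from the hunks quoted here, not the literal source):

	static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
				 struct kmem_cache_cpu *c)
	{
		void *object;

		object = get_partial_node(s, get_node(s, node), c, flags);
		/*
		 * node was already resolved by new_slab_objects(), so
		 * node != NUMA_NO_NODE always holds here...
		 */
		if (object || node != NUMA_NO_NODE)
			return object;

		/* ...making this fallback unreachable. */
		return get_any_partial(s, flags, c);
	}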

> @@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
>  
>  	WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
>  
> +	if (node == NUMA_NO_NODE)
> +		node = numa_mem_id();
> +	else if (!node_present_pages(node))
> +		node = node_to_mem_node(node);
> +
>  	freelist = get_partial(s, flags, node, c);
>  
>  	if (freelist)
> @@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  redo:
>  
>  	if (unlikely(!node_match(page, node))) {
> -		int searchnode = node;
> -
>  		if (node != NUMA_NO_NODE && !node_present_pages(node))
> -			searchnode = node_to_mem_node(node);
> +			node = node_to_mem_node(node);
>  
> -		if (unlikely(!node_match(page, searchnode))) {
> +		if (unlikely(!node_match(page, node))) {
>  			stat(s, ALLOC_NODE_MISMATCH);
>  			deactivate_slab(s, page, c->freelist, c);
>  			goto new_slab;
> 




* Re: [PATCH 1/4] mm: Check for node_online in node_present_pages
  2020-03-17 13:17   ` [PATCH 1/4] mm: Check for node_online in node_present_pages Srikar Dronamraju
@ 2020-03-17 13:37     ` Srikar Dronamraju
  0 siblings, 0 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-17 13:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Mel Gorman, Michael Ellerman, Sachin Sant,
	Michal Hocko, Christopher Lameter, linuxppc-dev, Joonsoo Kim,
	Kirill Tkhai, Vlastimil Babka, Bharata B Rao

* Srikar Dronamraju <srikar@linux.vnet.ibm.com> [2020-03-17 18:47:50]:

> 
> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> ---
>  include/linux/mmzone.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index f3f264826423..88078a3b95e5 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -756,8 +756,10 @@ typedef struct pglist_data {
>  	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
>  } pg_data_t;
> 
> -#define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
> -#define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
> +#define node_present_pages(nid)		\
> +	(node_online(nid) ? NODE_DATA(nid)->node_present_pages : 0)
> +#define node_spanned_pages(nid)		\
> +	(node_online(nid) ? NODE_DATA(nid)->node_spanned_pages : 0)
>  #ifdef CONFIG_FLAT_NODE_MEM_MAP
>  #define pgdat_page_nr(pgdat, pagenr)	((pgdat)->node_mem_map + (pagenr))
>  #else

If we indeed start having pgdat/NODE_DATA even for offline nodes, as Michal
Hocko suggested, then we may not need this particular patch.

-- 
Thanks and Regards
Srikar Dronamraju




* Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
  2020-03-17 13:34     ` Vlastimil Babka
@ 2020-03-17 13:45       ` Srikar Dronamraju
  2020-03-17 13:53         ` Vlastimil Babka
  2020-03-17 16:41       ` Srikar Dronamraju
  1 sibling, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-17 13:45 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Bharata B Rao

* Vlastimil Babka <vbabka@suse.cz> [2020-03-17 14:34:25]:

> On 3/17/20 2:17 PM, Srikar Dronamraju wrote:
> > Currently while allocating a slab for an offline node, we use its
> > associated node_numa_mem to search for a partial slab. If we don't find
> > a partial slab, we try allocating a slab from the offline node using
> > __alloc_pages_node(). However this is bound to fail.
> > 
> > NIP [c00000000039a300] __alloc_pages_nodemask+0x130/0x3b0
> > LR [c00000000039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0
> > Call Trace:
> > [c0000008b36837f0] [c00000000039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 (unreliable)
> > [c0000008b3683870] [c0000000003d1ff8] new_slab+0x128/0xcf0
> > [c0000008b3683950] [c0000000003d6060] ___slab_alloc+0x410/0x820
> > [c0000008b3683a40] [c0000000003d64a4] __slab_alloc+0x34/0x60
> > [c0000008b3683a70] [c0000000003d78b0] __kmalloc_node+0x110/0x490
> > [c0000008b3683af0] [c000000000343a08] kvmalloc_node+0x58/0x110
> > [c0000008b3683b30] [c0000000003ffd44] mem_cgroup_css_online+0x104/0x270
> > [c0000008b3683b90] [c000000000234e08] online_css+0x48/0xd0
> > [c0000008b3683bc0] [c00000000023dedc] cgroup_apply_control_enable+0x2ec/0x4d0
> > [c0000008b3683ca0] [c0000000002416f8] cgroup_mkdir+0x228/0x5f0
> > [c0000008b3683d10] [c000000000520360] kernfs_iop_mkdir+0x90/0xf0
> > [c0000008b3683d50] [c00000000043e400] vfs_mkdir+0x110/0x230
> > [c0000008b3683da0] [c000000000441ee0] do_mkdirat+0xb0/0x1a0
> > [c0000008b3683e20] [c00000000000b278] system_call+0x5c/0x68
> > 
> > Mitigate this by allocating the new slab from the node_numa_mem.
> 
> Are you sure this is really needed and the other 3 patches are not enough for
> the current SLUB code to work as needed? It seems you are changing the semantics
> here...
> 

The other 3 patches are not enough because we don't carry the searchnode
when the actual alloc_pages_node() call happens.

With only the 3 patches, we see the above panic; its signature is slightly
different from the one Sachin first reported, which I have carried in the
1st patch.

> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> >  		struct kmem_cache_cpu *c)
> >  {
> >  	void *object;
> > -	int searchnode = node;
> >  
> > -	if (node == NUMA_NO_NODE)
> > -		searchnode = numa_mem_id();
> > -	else if (!node_present_pages(node))
> > -		searchnode = node_to_mem_node(node);
> > -
> > -	object = get_partial_node(s, get_node(s, searchnode), c, flags);
> > +	object = get_partial_node(s, get_node(s, node), c, flags);
> >  	if (object || node != NUMA_NO_NODE)
> >  		return object;
> >
> >       return get_any_partial(s, flags, c);
> 
> I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the
> hunk below), thus the get_any_partial() call becomes dead code?
> 
> > @@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
> >  
> >  	WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
> >  
> > +	if (node == NUMA_NO_NODE)
> > +		node = numa_mem_id();
> > +	else if (!node_present_pages(node))
> > +		node = node_to_mem_node(node);
> > +
> >  	freelist = get_partial(s, flags, node, c);
> >  
> >  	if (freelist)
> > @@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >  redo:
> >  
> >  	if (unlikely(!node_match(page, node))) {
> > -		int searchnode = node;
> > -
> >  		if (node != NUMA_NO_NODE && !node_present_pages(node))
> > -			searchnode = node_to_mem_node(node);
> > +			node = node_to_mem_node(node);
> >  
> > -		if (unlikely(!node_match(page, searchnode))) {
> > +		if (unlikely(!node_match(page, node))) {
> >  			stat(s, ALLOC_NODE_MISMATCH);
> >  			deactivate_slab(s, page, c->freelist, c);
> >  			goto new_slab;
> > 
> 

-- 
Thanks and Regards
Srikar Dronamraju




* Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
  2020-03-17 13:45       ` Srikar Dronamraju
@ 2020-03-17 13:53         ` Vlastimil Babka
  2020-03-17 14:51           ` Srikar Dronamraju
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2020-03-17 13:53 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Bharata B Rao

On 3/17/20 2:45 PM, Srikar Dronamraju wrote:
> * Vlastimil Babka <vbabka@suse.cz> [2020-03-17 14:34:25]:
> 
>> On 3/17/20 2:17 PM, Srikar Dronamraju wrote:
>> > Currently while allocating a slab for an offline node, we use its
>> > associated node_numa_mem to search for a partial slab. If we don't find
>> > a partial slab, we try allocating a slab from the offline node using
>> > __alloc_pages_node(). However this is bound to fail.
>> > 
>> > NIP [c00000000039a300] __alloc_pages_nodemask+0x130/0x3b0
>> > LR [c00000000039a3c4] __alloc_pages_nodemask+0x1f4/0x3b0
>> > Call Trace:
>> > [c0000008b36837f0] [c00000000039a3b4] __alloc_pages_nodemask+0x1e4/0x3b0 (unreliable)
>> > [c0000008b3683870] [c0000000003d1ff8] new_slab+0x128/0xcf0
>> > [c0000008b3683950] [c0000000003d6060] ___slab_alloc+0x410/0x820
>> > [c0000008b3683a40] [c0000000003d64a4] __slab_alloc+0x34/0x60
>> > [c0000008b3683a70] [c0000000003d78b0] __kmalloc_node+0x110/0x490
>> > [c0000008b3683af0] [c000000000343a08] kvmalloc_node+0x58/0x110
>> > [c0000008b3683b30] [c0000000003ffd44] mem_cgroup_css_online+0x104/0x270
>> > [c0000008b3683b90] [c000000000234e08] online_css+0x48/0xd0
>> > [c0000008b3683bc0] [c00000000023dedc] cgroup_apply_control_enable+0x2ec/0x4d0
>> > [c0000008b3683ca0] [c0000000002416f8] cgroup_mkdir+0x228/0x5f0
>> > [c0000008b3683d10] [c000000000520360] kernfs_iop_mkdir+0x90/0xf0
>> > [c0000008b3683d50] [c00000000043e400] vfs_mkdir+0x110/0x230
>> > [c0000008b3683da0] [c000000000441ee0] do_mkdirat+0xb0/0x1a0
>> > [c0000008b3683e20] [c00000000000b278] system_call+0x5c/0x68
>> > 
>> > Mitigate this by allocating the new slab from the node_numa_mem.
>> 
>> Are you sure this is really needed and the other 3 patches are not enough for
>> the current SLUB code to work as needed? It seems you are changing the semantics
>> here...
>> 
> 
> The other 3 patches are not enough because we don't carry the searchnode
> when the actual alloc_pages_node() call happens.
> 
> With only the 3 patches, we see the above panic; its signature is slightly
> different from the one Sachin first reported, which I have carried in the
> 1st patch.

Ah, I see. So that's the missing pgdat after your series [1] right?

That sounds like an argument for Michal's suggestions that pgdats exist and have
correctly populated zonelists for all possible nodes.
node_to_mem_node() could be just a shortcut for the first zone's node in the
zonelist, so that fallback follows the topology.

[1]
https://lore.kernel.org/linuxppc-dev/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#m76e5b4c4084380b1d4b193d5aa0359b987f2290e
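
(A sketch of that idea, assuming zonelists are populated for all
possible nodes; a hypothetical implementation, not what is in the tree:)

	static inline int node_to_mem_node(int node)
	{
		/*
		 * The first zone on the node's fallback zonelist belongs
		 * to the nearest node with memory.
		 */
		struct zoneref *z;

		z = first_zones_zonelist(node_zonelist(node, GFP_KERNEL),
					 ZONE_NORMAL, NULL);
		return zonelist_node_idx(z);
	}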

>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>> >  		struct kmem_cache_cpu *c)
>> >  {
>> >  	void *object;
>> > -	int searchnode = node;
>> >  
>> > -	if (node == NUMA_NO_NODE)
>> > -		searchnode = numa_mem_id();
>> > -	else if (!node_present_pages(node))
>> > -		searchnode = node_to_mem_node(node);
>> > -
>> > -	object = get_partial_node(s, get_node(s, searchnode), c, flags);
>> > +	object = get_partial_node(s, get_node(s, node), c, flags);
>> >  	if (object || node != NUMA_NO_NODE)
>> >  		return object;
>> >
>> >       return get_any_partial(s, flags, c);
>> 
>> I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the
>> hunk below), thus the get_any_partial() call becomes dead code?
>> 
>> > @@ -2470,6 +2464,11 @@ static inline void *new_slab_objects(struct kmem_cache *s, gfp_t flags,
>> >  
>> >  	WARN_ON_ONCE(s->ctor && (flags & __GFP_ZERO));
>> >  
>> > +	if (node == NUMA_NO_NODE)
>> > +		node = numa_mem_id();
>> > +	else if (!node_present_pages(node))
>> > +		node = node_to_mem_node(node);
>> > +
>> >  	freelist = get_partial(s, flags, node, c);
>> >  
>> >  	if (freelist)
>> > @@ -2569,12 +2568,10 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>> >  redo:
>> >  
>> >  	if (unlikely(!node_match(page, node))) {
>> > -		int searchnode = node;
>> > -
>> >  		if (node != NUMA_NO_NODE && !node_present_pages(node))
>> > -			searchnode = node_to_mem_node(node);
>> > +			node = node_to_mem_node(node);
>> >  
>> > -		if (unlikely(!node_match(page, searchnode))) {
>> > +		if (unlikely(!node_match(page, node))) {
>> >  			stat(s, ALLOC_NODE_MISMATCH);
>> >  			deactivate_slab(s, page, c->freelist, c);
>> >  			goto new_slab;
>> > 
>> 
> 




* Re: [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes
  2020-03-17 13:17   ` [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes Srikar Dronamraju
@ 2020-03-17 14:22     ` Bharata B Rao
  2020-03-17 14:29       ` Srikar Dronamraju
  0 siblings, 1 reply; 15+ messages in thread
From: Bharata B Rao @ 2020-03-17 14:22 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka

This patchset can also fix a related problem that I reported earlier at
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-March/206076.html
with an additional change, suggested by Srikar as shown below:

On Tue, Mar 17, 2020 at 06:47:53PM +0530, Srikar Dronamraju wrote:
> Currently fallback nodes for offline nodes aren't set, so node 0 ends up
> being the default fallback node. However node 0 might itself be offline.
> 
> Fix this by explicitly setting the fallback node. Ensure first_memory_node
> is set before the kernel explicitly sets the fallback node.
> 
>  arch/powerpc/mm/numa.c | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index 281531340230..6e97ab6575cb 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -827,7 +827,16 @@ void __init dump_numa_cpu_topology(void)
>  	if (!numa_enabled)
>  		return;
>  
> -	for_each_online_node(node) {
> +	for_each_node(node) {
> +		/*
> +		 * For all possible but not yet online nodes, ensure their
> +		 * node_numa_mem is set correctly so that kmalloc_node works
> +		 * for such nodes.
> +		 */
> +		if (!node_online(node)) {

Change the above line to the one below:

+               if (!node_state(node, N_MEMORY)) {
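
With that change, the loop would read (a sketch):

	for_each_node(node) {
		/*
		 * For all nodes without memory (offline, or online but
		 * memoryless), ensure node_numa_mem is set so that
		 * kmalloc_node() works for such nodes.
		 */
		if (!node_state(node, N_MEMORY)) {
			reset_numa_mem(node);
			continue;
		}
		/* ... rest of dump_numa_cpu_topology() unchanged ... */
	}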

Regards,
Bharata.




* Re: [PATCH 4/4] powerpc/numa: Set fallback nodes for offline nodes
  2020-03-17 14:22     ` Bharata B Rao
@ 2020-03-17 14:29       ` Srikar Dronamraju
  0 siblings, 0 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-17 14:29 UTC (permalink / raw)
  To: Bharata B Rao
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Vlastimil Babka

* Bharata B Rao <bharata@linux.ibm.com> [2020-03-17 19:52:32]:

Thanks Bharata.

> This patchset can also fix a related problem that I reported earlier at
> https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-March/206076.html
> with an additional change, suggested by Srikar as shown below:
> 
> >  
> > -	for_each_online_node(node) {
> > +	for_each_node(node) {
> > +		/*
> > +		 * For all possible but not yet online nodes, ensure their
> > +		 * node_numa_mem is set correctly so that kmalloc_node works
> > +		 * for such nodes.
> > +		 */
> > +		if (!node_online(node)) {
> 
> Change the above line to the one below:
> 
> +               if (!node_state(node, N_MEMORY)) {
> 

Just to clarify, this is needed if we don't have
http://lore.kernel.org/lkml/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#u


-- 
Thanks and Regards
Srikar Dronamraju




* Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
  2020-03-17 13:53         ` Vlastimil Babka
@ 2020-03-17 14:51           ` Srikar Dronamraju
  2020-03-17 15:29             ` Vlastimil Babka
  0 siblings, 1 reply; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-17 14:51 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Bharata B Rao

* Vlastimil Babka <vbabka@suse.cz> [2020-03-17 14:53:26]:

> >> > 
> >> > Mitigate this by allocating the new slab from the node_numa_mem.
> >> 
> >> Are you sure this is really needed and the other 3 patches are not enough for
> >> the current SLUB code to work as needed? It seems you are changing the semantics
> >> here...
> >> 
> > 
> > The other 3 patches are not enough because we don't carry the searchnode
> > when the actual alloc_pages_node() call happens.
> > 
> > With only the 3 patches, we see the above panic; its signature is slightly
> > different from the one Sachin first reported, which I have carried in the
> > 1st patch.
> 
> Ah, I see. So that's the missing pgdat after your series [1] right?

Yes, the pgdat would be missing after my cpuless, memoryless node patchset.
However...
> 
> That sounds like an argument for Michal's suggestions that pgdats exist and have
> correctly populated zonelists for all possible nodes.

Only the first patch in this series would be affected by the pgdat existing or
not. Even if the pgdat existed, NODE_DATA(nid)->node_present_pages
would be 0. Right? So it would look at node_to_mem_node(). And since node 0 is
cpuless it would return 0. So we would end up passing this node 0 (which is
memoryless/cpuless) to alloc_pages_node(). Please note I am setting
node_numa_mem only for offline nodes. However we could change this to set it
for all offline and memoryless nodes.

> node_to_mem_node() could be just a shortcut for the first zone's node in the
> zonelist, so that fallback follows the topology.
> 
> [1]
> https://lore.kernel.org/linuxppc-dev/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#m76e5b4c4084380b1d4b193d5aa0359b987f2290e
> 



-- 
Thanks and Regards
Srikar Dronamraju




* Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
  2020-03-17 14:51           ` Srikar Dronamraju
@ 2020-03-17 15:29             ` Vlastimil Babka
  2020-03-18  7:29               ` Srikar Dronamraju
  0 siblings, 1 reply; 15+ messages in thread
From: Vlastimil Babka @ 2020-03-17 15:29 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Bharata B Rao

On 3/17/20 3:51 PM, Srikar Dronamraju wrote:
> * Vlastimil Babka <vbabka@suse.cz> [2020-03-17 14:53:26]:
> 
>> >> > 
>> >> > Mitigate this by allocating the new slab from the node_numa_mem.
>> >> 
>> >> Are you sure this is really needed and the other 3 patches are not enough for
>> >> the current SLUB code to work as needed? It seems you are changing the semantics
>> >> here...
>> >> 
>> > 
>> > The other 3 patches are not enough because we don't carry the searchnode
>> > when the actual alloc_pages_node() call happens.
>> > 
>> > With only the 3 patches, we see the above panic; its signature is slightly
>> > different from the one Sachin first reported, which I have carried in the
>> > 1st patch.
>> 
>> Ah, I see. So that's the missing pgdat after your series [1] right?
> 
> Yes, the pgdat would be missing after my cpuless, memoryless node patchset.
> However...
>> 
>> That sounds like an argument for Michal's suggestions that pgdats exist and have
>> correctly populated zonelists for all possible nodes.
> 
> Only the first patch in this series would be affected by the pgdat existing or
> not. Even if the pgdat existed, NODE_DATA(nid)->node_present_pages
> would be 0. Right? So it would look at node_to_mem_node(). And since node 0 is
> cpuless it would return 0.

I thought the point was to return 1 for node 0.

> So we would end up passing this node 0 (which is memoryless/cpuless) to
> alloc_pages_node(). Please note I am setting node_numa_mem only
> for offline nodes. However we could change this to set it for all offline and
> memoryless nodes.

That would indeed make sense.

But I guess that alloc_pages() would still crash, as the result of
node_to_mem_node() is not passed down to alloc_pages() without this patch. In
__alloc_pages_node() we currently have "The node must be valid and online", so
offline nodes don't have zonelists. Either they get them, or we indeed need
something like this patch. But in order not to make get_any_partial() dead code,
the final replacement of an invalid node with a valid one should be done in
alloc_slab_page(), I guess?
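
(A sketch of that suggestion, hypothetical and untested, based on the
current alloc_slab_page() in mm/slub.c:)

	static inline struct page *alloc_slab_page(struct kmem_cache *s,
			gfp_t flags, int node, struct kmem_cache_order_objects oo)
	{
		struct page *page;
		unsigned int order = oo_order(oo);

		if (node == NUMA_NO_NODE)
			page = alloc_pages(flags, order);
		else {
			/*
			 * Do the invalid-node replacement only here, at the
			 * last moment, so that get_partial() and
			 * get_any_partial() keep their current semantics.
			 */
			if (!node_online(node))
				node = node_to_mem_node(node);
			page = __alloc_pages_node(node, flags, order);
		}

		return page;
	}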

>> node_to_mem_node() could be just a shortcut for the first zone's node in the
>> zonelist, so that fallback follows the topology.
>> 
>> [1]
>> https://lore.kernel.org/linuxppc-dev/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#m76e5b4c4084380b1d4b193d5aa0359b987f2290e
>> 
> 
> 
> 




* Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
  2020-03-17 13:34     ` Vlastimil Babka
  2020-03-17 13:45       ` Srikar Dronamraju
@ 2020-03-17 16:41       ` Srikar Dronamraju
  1 sibling, 0 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-17 16:41 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Bharata B Rao

* Vlastimil Babka <vbabka@suse.cz> [2020-03-17 14:34:25]:

> 
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -1970,14 +1970,8 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> >  		struct kmem_cache_cpu *c)
> >  {
> >  	void *object;
> > -	int searchnode = node;
> >  
> > -	if (node == NUMA_NO_NODE)
> > -		searchnode = numa_mem_id();
> > -	else if (!node_present_pages(node))
> > -		searchnode = node_to_mem_node(node);
> > -
> > -	object = get_partial_node(s, get_node(s, searchnode), c, flags);
> > +	object = get_partial_node(s, get_node(s, node), c, flags);
> >  	if (object || node != NUMA_NO_NODE)
> >  		return object;
> >
> >       return get_any_partial(s, flags, c);
> 
> I.e. here in this if(), now node will never equal NUMA_NO_NODE (thanks to the
> hunk below), thus the get_any_partial() call becomes dead code?

Very true. 

Would it be okay if we removed the node != NUMA_NO_NODE check?

  	if (object || node != NUMA_NO_NODE)
		return object;

would then become

  	if (object)
		return object;

-- 
Thanks and Regards
Srikar Dronamraju




* Re: [PATCH 2/4] mm/slub: Use mem_node to allocate a new slab
  2020-03-17 15:29             ` Vlastimil Babka
@ 2020-03-18  7:29               ` Srikar Dronamraju
  0 siblings, 0 replies; 15+ messages in thread
From: Srikar Dronamraju @ 2020-03-18  7:29 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, linux-mm, Mel Gorman, Michael Ellerman,
	Sachin Sant, Michal Hocko, Christopher Lameter, linuxppc-dev,
	Joonsoo Kim, Kirill Tkhai, Bharata B Rao

* Vlastimil Babka <vbabka@suse.cz> [2020-03-17 16:29:21]:

> > So we would end up passing this node 0 (which is memoryless/cpuless) to
> > alloc_pages_node(). Please note I am setting node_numa_mem only
> > for offline nodes. However we could change this to set it for all offline and
> > memoryless nodes.
> 
> That would indeed make sense.
> 
> But I guess that alloc_pages() would still crash, as the result of
> node_to_mem_node() is not passed down to alloc_pages() without this patch. In
> __alloc_pages_node() we currently have "The node must be valid and online", so
> offline nodes don't have zonelists. Either they get them, or we indeed need
> something like this patch. But in order not to make get_any_partial() dead code,
> the final replacement of an invalid node with a valid one should be done in
> alloc_slab_page(), I guess?
> 

I am posting v2 with this change.

> >> node_to_mem_node() could be just a shortcut for the first zone's node in the
> >> zonelist, so that fallback follows the topology.

-- 
Thanks and Regards
Srikar Dronamraju



