* [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
@ 2020-03-18 14:42 ` Vlastimil Babka
  0 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2020-03-18 14:42 UTC (permalink / raw)
  To: linux-mm
  Cc: Vlastimil Babka, Sachin Sant, Bharata B Rao, Srikar Dronamraju,
	Mel Gorman, Michael Ellerman, Michal Hocko, Christopher Lameter,
	linuxppc-dev, Joonsoo Kim, Pekka Enberg, David Rientjes,
	Kirill Tkhai, Nathan Lynch

Sachin reports [1] a crash in SLUB __slab_alloc():

BUG: Kernel NULL pointer dereference on read at 0x000073b0
Faulting instruction address: 0xc0000000003d55f4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
NIP:  c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
REGS: c0000008b37836d0 TRAP: 0300   Not tainted  (5.6.0-rc2-next-20200218-autotest)
MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004844  XER: 00000000
CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
LR [c0000000003d5b94] __slab_alloc+0x34/0x60
Call Trace:
[c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
[c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
[c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
[c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
[c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
[c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
[c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
[c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
[c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
[c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
[c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
[c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68

This is a PowerPC platform with following NUMA topology:

available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

possible numa nodes: 0-31

This only happens with the mmotm patch "mm/memcontrol.c: allocate shrinker_map on
appropriate NUMA node" [2], which effectively calls kmalloc_node() for each
possible node. SLUB, however, only allocates kmem_cache_node on online nodes
with present memory, and since commit a561ce00b09e ("slub: fall back to
node_to_mem_node() node if allocating on memoryless node") relies on
node_to_mem_node() to return such a valid node for the other nodes. That
assumption does not hold in this configuration, where the _node_numa_mem_
array is not initialized for nodes 0 and 2-31; it thus contains zeroes, and
get_partial() ends up accessing a non-allocated kmem_cache_node.

A related issue was reported by Bharata [3], where a similar PowerPC
configuration, but without patch [2], ends up allocating large numbers of
pages in the kmalloc-1k and kmalloc-512 caches. This seems to have the same
underlying cause, node_to_mem_node() not behaving as expected, and might
also lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL.

This patch should fix both issues by no longer relying on node_to_mem_node()
and instead simply falling back to NUMA_NO_NODE when kmalloc_node(node) is
attempted for a node that is not online or has no present pages. Also, in case
alloc_slab_page() is reached with a non-online node, fall back as well, until
there is a guarantee that all possible nodes have valid NODE_DATA with a
proper zonelist for fallback.

[1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
[2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
[3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reported-by: Bharata B Rao <bharata@linux.ibm.com>
Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
---
Hi, this is my alternative solution to the SLUB issues, as opposed to the
series [1]. Could Sachin and Bharata please test whether it fixes the issues?
1) on plain mainline (Bharata) or linux-next (Sachin)
2) the same, but with "[PATCH 0/3] Offline memoryless cpuless node 0" [2]
   applied, as I assume that series was not motivated by the SLUB issues and
   will be kept?

Thanks!

[1] https://lore.kernel.org/linux-mm/20200318072810.9735-1-srikar@linux.vnet.ibm.com/
[2] https://lore.kernel.org/linux-mm/20200311110237.5731-1-srikar@linux.vnet.ibm.com/

 mm/slub.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 17dc00e33115..4d798cacdae1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	unsigned int order = oo_order(oo);
 
-	if (node == NUMA_NO_NODE)
+	if (node == NUMA_NO_NODE || !node_online(node))
 		page = alloc_pages(flags, order);
 	else
 		page = __alloc_pages_node(node, flags, order);
@@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 
 	if (node == NUMA_NO_NODE)
 		searchnode = numa_mem_id();
-	else if (!node_present_pages(node))
-		searchnode = node_to_mem_node(node);
 
 	object = get_partial_node(s, get_node(s, searchnode), c, flags);
 	if (object || node != NUMA_NO_NODE)
@@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 redo:
 
 	if (unlikely(!node_match(page, node))) {
-		int searchnode = node;
-
-		if (node != NUMA_NO_NODE && !node_present_pages(node))
-			searchnode = node_to_mem_node(node);
-
-		if (unlikely(!node_match(page, searchnode))) {
+		/*
+		 * node_match() false implies node != NUMA_NO_NODE
+		 * but if the node is not online or has no pages, just
+		 * ignore the constraint
+		 */
+		if ((!node_online(node) || !node_present_pages(node))) {
+			node = NUMA_NO_NODE;
+			goto redo;
+		} else {
 			stat(s, ALLOC_NODE_MISMATCH);
 			deactivate_slab(s, page, c->freelist, c);
 			goto new_slab;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread


* [RFC 2/2] Revert "topology: add support for node_to_mem_node() to determine the fallback node"
  2020-03-18 14:42 ` Vlastimil Babka
  (?)
@ 2020-03-18 14:42 ` Vlastimil Babka
  -1 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2020-03-18 14:42 UTC (permalink / raw)
  To: linux-mm
  Cc: Vlastimil Babka, Joonsoo Kim, Bharata B Rao, Christopher Lameter,
	David Rientjes, Kirill Tkhai, Mel Gorman, Michael Ellerman,
	Michal Hocko, Nathan Lynch, Pekka Enberg, Sachin Sant,
	Srikar Dronamraju

This reverts commit ad2c8144418c6a81cefe65379fd47bbe8344cef2.

The function node_to_mem_node() was introduced for use in SLUB, but it turned
out to be unreliable, and a simpler solution exists than fixing it up. Thus
the previous commit "mm, slub: prevent kmalloc_node crashes and memory leaks"
removed its only user, and we can revert the commit that introduced it.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
---
 include/linux/topology.h | 17 -----------------
 mm/page_alloc.c          |  1 -
 2 files changed, 18 deletions(-)

diff --git a/include/linux/topology.h b/include/linux/topology.h
index eb2fe6edd73c..608fa4aadf0e 100644
--- a/include/linux/topology.h
+++ b/include/linux/topology.h
@@ -130,20 +130,11 @@ static inline int numa_node_id(void)
  * Use the accessor functions set_numa_mem(), numa_mem_id() and cpu_to_mem().
  */
 DECLARE_PER_CPU(int, _numa_mem_);
-extern int _node_numa_mem_[MAX_NUMNODES];
 
 #ifndef set_numa_mem
 static inline void set_numa_mem(int node)
 {
 	this_cpu_write(_numa_mem_, node);
-	_node_numa_mem_[numa_node_id()] = node;
-}
-#endif
-
-#ifndef node_to_mem_node
-static inline int node_to_mem_node(int node)
-{
-	return _node_numa_mem_[node];
 }
 #endif
 
@@ -166,7 +157,6 @@ static inline int cpu_to_mem(int cpu)
 static inline void set_cpu_numa_mem(int cpu, int node)
 {
 	per_cpu(_numa_mem_, cpu) = node;
-	_node_numa_mem_[cpu_to_node(cpu)] = node;
 }
 #endif
 
@@ -180,13 +170,6 @@ static inline int numa_mem_id(void)
 }
 #endif
 
-#ifndef node_to_mem_node
-static inline int node_to_mem_node(int node)
-{
-	return node;
-}
-#endif
-
 #ifndef cpu_to_mem
 static inline int cpu_to_mem(int cpu)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3c4eb750a199..6e7e9c1d6caa 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -95,7 +95,6 @@ DEFINE_STATIC_KEY_TRUE(vm_numa_stat_key);
  */
 DEFINE_PER_CPU(int, _numa_mem_);		/* Kernel "local memory" node */
 EXPORT_PER_CPU_SYMBOL(_numa_mem_);
-int _node_numa_mem_[MAX_NUMNODES];
 #endif
 
 /* work_structs for global per-cpu drains */
-- 
2.25.1




* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-18 14:42 ` Vlastimil Babka
@ 2020-03-18 16:06   ` Bharata B Rao
  -1 siblings, 0 replies; 35+ messages in thread
From: Bharata B Rao @ 2020-03-18 16:06 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Sachin Sant, Srikar Dronamraju, Mel Gorman,
	Michael Ellerman, Michal Hocko, Christopher Lameter,
	linuxppc-dev, Joonsoo Kim, Pekka Enberg, David Rientjes,
	Kirill Tkhai, Nathan Lynch

On Wed, Mar 18, 2020 at 03:42:19PM +0100, Vlastimil Babka wrote:
> This is a PowerPC platform with following NUMA topology:
> 
> available: 2 nodes (0-1)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
> node 1 size: 35247 MB
> node 1 free: 30907 MB
> node distances:
> node   0   1
>   0:  10  40
>   1:  40  10
> 
> possible numa nodes: 0-31
> 
> A related issue was reported by Bharata [3], where a similar PowerPC
> configuration, but without patch [2], ends up allocating large numbers of
> pages in the kmalloc-1k and kmalloc-512 caches. This seems to have the same
> underlying cause, node_to_mem_node() not behaving as expected, and might
> also lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL.

This patch doesn't fix the issue of kmalloc caches consuming more
memory for the above mentioned topology. Also, CONFIG_SLUB_CPU_PARTIAL is set
here and I have not observed an infinite loop so far.

Or, are you expecting your fix to work on top of Srikar's other patchset
https://lore.kernel.org/linuxppc-dev/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#u ?

With the above patchset, no fix is required to address the increased memory
consumption of kmalloc caches, because that patchset prevents such a
topology from occurring, thereby making it impossible for the problem
to surface (or at least impossible for the specific topology that I
mentioned).

> diff --git a/mm/slub.c b/mm/slub.c
> index 17dc00e33115..4d798cacdae1 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>  	struct page *page;
>  	unsigned int order = oo_order(oo);
>  
> -	if (node == NUMA_NO_NODE)
> +	if (node == NUMA_NO_NODE || !node_online(node))
>  		page = alloc_pages(flags, order);
>  	else
>  		page = __alloc_pages_node(node, flags, order);
> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>  
>  	if (node == NUMA_NO_NODE)
>  		searchnode = numa_mem_id();
> -	else if (!node_present_pages(node))
> -		searchnode = node_to_mem_node(node);

We still come here with memory-less node=0 (and not NUMA_NO_NODE), fail to
find a partial slab, go back and allocate a new one, thereby continuously
increasing the number of newly allocated slabs.

>  
>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
>  	if (object || node != NUMA_NO_NODE)
> @@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  redo:
>  
>  	if (unlikely(!node_match(page, node))) {
> -		int searchnode = node;
> -
> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
> -			searchnode = node_to_mem_node(node);
> -
> -		if (unlikely(!node_match(page, searchnode))) {
> +		/*
> +		 * node_match() false implies node != NUMA_NO_NODE
> +		 * but if the node is not online or has no pages, just
> +		 * ignore the constraint
> +		 */
> +		if ((!node_online(node) || !node_present_pages(node))) {
> +			node = NUMA_NO_NODE;
> +			goto redo;

Many calls for allocating a slab object from memory-less node 0 in my case
don't even hit the above check, because they get short-circuited by the
goto new_slab label which is present a few lines above. Hence I don't see
any reduction in the amount of slab memory with this fix.

Regards,
Bharata.





* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-18 16:06   ` Bharata B Rao
@ 2020-03-18 16:10     ` Vlastimil Babka
  -1 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2020-03-18 16:10 UTC (permalink / raw)
  To: bharata
  Cc: linux-mm, Sachin Sant, Srikar Dronamraju, Mel Gorman,
	Michael Ellerman, Michal Hocko, Christopher Lameter,
	linuxppc-dev, Joonsoo Kim, Pekka Enberg, David Rientjes,
	Kirill Tkhai, Nathan Lynch

On 3/18/20 5:06 PM, Bharata B Rao wrote:
> On Wed, Mar 18, 2020 at 03:42:19PM +0100, Vlastimil Babka wrote:
>> This is a PowerPC platform with following NUMA topology:
>> 
>> available: 2 nodes (0-1)
>> node 0 cpus:
>> node 0 size: 0 MB
>> node 0 free: 0 MB
>> node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
>> node 1 size: 35247 MB
>> node 1 free: 30907 MB
>> node distances:
>> node   0   1
>>   0:  10  40
>>   1:  40  10
>> 
>> possible numa nodes: 0-31
>> 
>> A related issue was reported by Bharata [3], where a similar PowerPC
>> configuration, but without patch [2], ends up allocating large numbers of
>> pages in the kmalloc-1k and kmalloc-512 caches. This seems to have the same
>> underlying cause, node_to_mem_node() not behaving as expected, and might
>> also lead to an infinite loop with CONFIG_SLUB_CPU_PARTIAL.
> 
> This patch doesn't fix the issue of kmalloc caches consuming more
> memory for the above mentioned topology. Also, CONFIG_SLUB_CPU_PARTIAL is set
> here and I have not observed an infinite loop so far.

OK that means something is wrong with my analysis.

> Or, are you expecting your fix to work on top of Srikar's other patchset
> https://lore.kernel.org/linuxppc-dev/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#u ?

No, I hoped it would work on mainline.

> With the above patchset, no fix is required to address the increased memory
> consumption of kmalloc caches, because that patchset prevents such a
> topology from occurring, thereby making it impossible for the problem
> to surface (or at least impossible for the specific topology that I
> mentioned).

Right, I hope to fix it nevertheless.

>> diff --git a/mm/slub.c b/mm/slub.c
>> index 17dc00e33115..4d798cacdae1 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>>  	struct page *page;
>>  	unsigned int order = oo_order(oo);
>>  
>> -	if (node == NUMA_NO_NODE)
>> +	if (node == NUMA_NO_NODE || !node_online(node))
>>  		page = alloc_pages(flags, order);
>>  	else
>>  		page = __alloc_pages_node(node, flags, order);
>> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>>  
>>  	if (node == NUMA_NO_NODE)
>>  		searchnode = numa_mem_id();
>> -	else if (!node_present_pages(node))
>> -		searchnode = node_to_mem_node(node);
> 
> We still come here with memory-less node=0 (and not NUMA_NO_NODE), fail to
> find a partial slab, go back and allocate a new one, thereby continuously
> increasing the number of newly allocated slabs.
> 
>>  
>>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
>>  	if (object || node != NUMA_NO_NODE)
>> @@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  redo:
>>  
>>  	if (unlikely(!node_match(page, node))) {
>> -		int searchnode = node;
>> -
>> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
>> -			searchnode = node_to_mem_node(node);
>> -
>> -		if (unlikely(!node_match(page, searchnode))) {
>> +		/*
>> +		 * node_match() false implies node != NUMA_NO_NODE
>> +		 * but if the node is not online or has no pages, just
>> +		 * ignore the constraint
>> +		 */
>> +		if ((!node_online(node) || !node_present_pages(node))) {
>> +			node = NUMA_NO_NODE;
>> +			goto redo;
> 
> Many calls for allocating a slab object from memory-less node 0 in my case
> don't even hit the above check, because they get short-circuited by the
> goto new_slab label which is present a few lines above. Hence I don't see
> any reduction in the amount of slab memory with this fix.

Thanks a lot for the info, I will try again :)

> Regards,
> Bharata.
> 




* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
@ 2020-03-18 16:10     ` Vlastimil Babka
  0 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2020-03-18 16:10 UTC (permalink / raw)
  To: bharata
  Cc: Sachin Sant, Nathan Lynch, Srikar Dronamraju, linuxppc-dev,
	Michal Hocko, Pekka Enberg, linux-mm, Kirill Tkhai,
	David Rientjes, Christopher Lameter, Mel Gorman, Joonsoo Kim

On 3/18/20 5:06 PM, Bharata B Rao wrote:
> On Wed, Mar 18, 2020 at 03:42:19PM +0100, Vlastimil Babka wrote:
>> This is a PowerPC platform with following NUMA topology:
>> 
>> available: 2 nodes (0-1)
>> node 0 cpus:
>> node 0 size: 0 MB
>> node 0 free: 0 MB
>> node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
>> node 1 size: 35247 MB
>> node 1 free: 30907 MB
>> node distances:
>> node   0   1
>>   0:  10  40
>>   1:  40  10
>> 
>> possible numa nodes: 0-31
>> 
>> A related issue was reported by Bharata [3] where a similar PowerPC
>> configuration, but without patch [2] ends up allocating large amounts of pages
>> by kmalloc-1k kmalloc-512. This seems to have the same underlying issue with
>> node_to_mem_node() not behaving as expected, and might probably also lead
>> to an infinite loop with CONFIG_SLUB_CPU_PARTIAL.
> 
> This patch doesn't fix the issue of kmalloc caches consuming more
> memory for the above mentioned topology. Also CONFIG_SLUB_CPU_PARTIAL is set
> here and I have not observed infinite loop till now.

OK that means something is wrong with my analysis.

> Or, are you expecting your fix to work on top of Srikar's other patchset
> https://lore.kernel.org/linuxppc-dev/20200311110237.5731-1-srikar@linux.vnet.ibm.com/t/#u ?

No, I hoped it would work on mainline.

> With the above patchset, no fix is required to address increased memory
> consumption of kmalloc caches because this patchset prevents such
> topology from occuring thereby making it impossible for the problem
> to surface (or at least impossible for the specific topology that I
> mentioned)

Right, I hope to fix it nevertheless.

>> diff --git a/mm/slub.c b/mm/slub.c
>> index 17dc00e33115..4d798cacdae1 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
>>  	struct page *page;
>>  	unsigned int order = oo_order(oo);
>>  
>> -	if (node == NUMA_NO_NODE)
>> +	if (node == NUMA_NO_NODE || !node_online(node))
>>  		page = alloc_pages(flags, order);
>>  	else
>>  		page = __alloc_pages_node(node, flags, order);
>> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>>  
>>  	if (node == NUMA_NO_NODE)
>>  		searchnode = numa_mem_id();
>> -	else if (!node_present_pages(node))
>> -		searchnode = node_to_mem_node(node);
> 
> We still come here with memory-less node=0 (and not NUMA_NO_NODE), fail to
> find a partial slab, go back and allocate a new one, thereby continuously
> increasing the number of newly allocated slabs.
> 
>>  
>>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
>>  	if (object || node != NUMA_NO_NODE)
>> @@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  redo:
>>  
>>  	if (unlikely(!node_match(page, node))) {
>> -		int searchnode = node;
>> -
>> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
>> -			searchnode = node_to_mem_node(node);
>> -
>> -		if (unlikely(!node_match(page, searchnode))) {
>> +		/*
>> +		 * node_match() false implies node != NUMA_NO_NODE
>> +		 * but if the node is not online or has no pages, just
>> +		 * ignore the constraint
>> +		 */
>> +		if ((!node_online(node) || !node_present_pages(node))) {
>> +			node = NUMA_NO_NODE;
>> +			goto redo;
> 
> Many calls that allocate a slab object from memory-less node 0 in my case
> don't even hit the above check, because they get short-circuited by the
> goto new_slab label that is present a few lines above. Hence I don't see
> any reduction in the amount of slab memory with this fix.

Thanks a lot for the info, I will try again :)

> Regards,
> Bharata.
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-18 16:06   ` Bharata B Rao
@ 2020-03-18 16:51     ` Vlastimil Babka
  -1 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2020-03-18 16:51 UTC (permalink / raw)
  To: bharata
  Cc: linux-mm, Sachin Sant, Srikar Dronamraju, Mel Gorman,
	Michael Ellerman, Michal Hocko, Christopher Lameter,
	linuxppc-dev, Joonsoo Kim, Pekka Enberg, David Rientjes,
	Kirill Tkhai, Nathan Lynch


On 3/18/20 5:06 PM, Bharata B Rao wrote:
>> @@ -2568,12 +2566,15 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  redo:
>>  
>>  	if (unlikely(!node_match(page, node))) {
>> -		int searchnode = node;
>> -
>> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
>> -			searchnode = node_to_mem_node(node);
>> -
>> -		if (unlikely(!node_match(page, searchnode))) {
>> +		/*
>> +		 * node_match() false implies node != NUMA_NO_NODE
>> +		 * but if the node is not online or has no pages, just
>> +		 * ignore the constraint
>> +		 */
>> +		if ((!node_online(node) || !node_present_pages(node))) {
>> +			node = NUMA_NO_NODE;
>> +			goto redo;
> 
> Many calls that allocate a slab object from memory-less node 0 in my case
> don't even hit the above check, because they get short-circuited by the
> goto new_slab label that is present a few lines above. Hence I don't see
> any reduction in the amount of slab memory with this fix.
> 
> Regards,
> Bharata.
 
OK, how about this version? It's somewhat ugly, but the important thing is that
the fast-path case (c->page exists) is unaffected, and another common case
(c->page is NULL but node is NUMA_NO_NODE) costs just one extra check - one that
is impossible to avoid at some point anyway.

----8<----
From d1675363c2ddc3758e5c0d0f435ca46818219194 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Wed, 18 Mar 2020 14:47:33 +0100
Subject: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks

Sachin reports [1] a crash in SLUB __slab_alloc():

BUG: Kernel NULL pointer dereference on read at 0x000073b0
Faulting instruction address: 0xc0000000003d55f4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
NIP:  c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
REGS: c0000008b37836d0 TRAP: 0300   Not tainted  (5.6.0-rc2-next-20200218-autotest)
MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004844  XER: 00000000
CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
LR [c0000000003d5b94] __slab_alloc+0x34/0x60
Call Trace:
[c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
[c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
[c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
[c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
[c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
[c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
[c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
[c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
[c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
[c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
[c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
[c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68

This is a PowerPC platform with the following NUMA topology:

available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

possible numa nodes: 0-31

This only happens with the mmotm patch "mm/memcontrol.c: allocate shrinker_map on
appropriate NUMA node" [2], which effectively calls kmalloc_node() for each
possible node. SLUB, however, only allocates kmem_cache_node on online nodes
with present memory, and relies on node_to_mem_node() to return a valid node
for all other nodes since commit a561ce00b09e ("slub: fall back to
node_to_mem_node() node if allocating on memoryless node"). That assumption does
not hold in this configuration, where the _node_numa_mem_ array is not
initialized for nodes 0 and 2-31: it thus contains zeroes, and get_partial()
ends up accessing a non-allocated kmem_cache_node.

A related issue was reported by Bharata [3], where a similar PowerPC
configuration, but without patch [2], ends up allocating large amounts of pages
in the kmalloc-1k and kmalloc-512 caches. This seems to have the same underlying
issue with node_to_mem_node() not behaving as expected, and may also lead
to an infinite loop with CONFIG_SLUB_CPU_PARTIAL.

This patch should fix both issues by no longer relying on node_to_mem_node()
and instead simply falling back to NUMA_NO_NODE whenever kmalloc_node(node) is
attempted for a node that is not online or has no pages. Also, in case
alloc_slab_page() is reached with a non-online node, fall back as well, until
we have a guarantee that all possible nodes have a valid NODE_DATA with a proper
zonelist for fallback.

[1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@linux.vnet.ibm.com/
[2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtuozzo.com/
[3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.cz/

Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reported-by: Bharata B Rao <bharata@linux.ibm.com>
Debugged-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Christopher Lameter <cl@linux.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
---
 mm/slub.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 17dc00e33115..60352f80eb33 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1511,7 +1511,7 @@ static inline struct page *alloc_slab_page(struct kmem_cache *s,
 	struct page *page;
 	unsigned int order = oo_order(oo);
 
-	if (node == NUMA_NO_NODE)
+	if (node == NUMA_NO_NODE || !node_online(node))
 		page = alloc_pages(flags, order);
 	else
 		page = __alloc_pages_node(node, flags, order);
@@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 
 	if (node == NUMA_NO_NODE)
 		searchnode = numa_mem_id();
-	else if (!node_present_pages(node))
-		searchnode = node_to_mem_node(node);
 
 	object = get_partial_node(s, get_node(s, searchnode), c, flags);
 	if (object || node != NUMA_NO_NODE)
@@ -2563,17 +2561,27 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	struct page *page;
 
 	page = c->page;
-	if (!page)
+	if (!page) {
+		/*
+		 * if the node is not online or has no pages, just
+		 * ignore the constraint
+		 */
+		if (unlikely(node != NUMA_NO_NODE &&
+		    (!node_online(node) || !node_present_pages(node))))
+			node = NUMA_NO_NODE;
 		goto new_slab;
+	}
 redo:
 
 	if (unlikely(!node_match(page, node))) {
-		int searchnode = node;
-
-		if (node != NUMA_NO_NODE && !node_present_pages(node))
-			searchnode = node_to_mem_node(node);
-
-		if (unlikely(!node_match(page, searchnode))) {
+		/*
+		 * same as above but node_match() being false already
+		 * implies node != NUMA_NO_NODE
+		 */
+		if ((!node_online(node) || !node_present_pages(node))) {
+			node = NUMA_NO_NODE;
+			goto redo;
+		} else {
 			stat(s, ALLOC_NODE_MISMATCH);
 			deactivate_slab(s, page, c->freelist, c);
 			goto new_slab;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-18 16:51     ` Vlastimil Babka
@ 2020-03-19  8:52       ` Sachin Sant
  -1 siblings, 0 replies; 35+ messages in thread
From: Sachin Sant @ 2020-03-19  8:52 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: bharata, Nathan Lynch, Srikar Dronamraju, linuxppc-dev,
	Michal Hocko, Pekka Enberg, linux-mm, Kirill Tkhai,
	David Rientjes, Christopher Lameter, Mel Gorman, Joonsoo Kim


> OK, how about this version? It's somewhat ugly, but the important thing is that
> the fast-path case (c->page exists) is unaffected, and another common case
> (c->page is NULL but node is NUMA_NO_NODE) costs just one extra check - one that
> is impossible to avoid at some point anyway.
> 

I attempted the suggested tests.

Test 1: March 18 linux-next + Patch 1 [1] + Patch 2 [2]

Machine boots fine. numactl o/p after boot:

# numactl -H
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 35631 MB
node 1 free: 32724 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10
#

Test 2: Code base as used in Test 1 + 3-patch series from Srikar [3]

Machine boots fine. numactl o/p after boot:

# numactl -H
available: 1 nodes (1)
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 35631 MB
node 1 free: 32711 MB
node distances:
node   1
  1:  10
#

Thanks!
-Sachin


[1] https://lore.kernel.org/linux-mm/e060ad43-ff4e-0e59-2e64-ce8a4916ec70@suse.cz/T/#mb8d0a358dc5c5fb9661fa45047072a92c510faf2
[2] https://lore.kernel.org/linux-mm/e060ad43-ff4e-0e59-2e64-ce8a4916ec70@suse.cz/T/#mce342e54a95ce2ee3043901e52d70ee2fd94c516
[3] https://lore.kernel.org/linux-mm/20200311110237.5731-1-srikar@linux.vnet.ibm.com/


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-19  8:52       ` Sachin Sant
@ 2020-03-19 13:23         ` Vlastimil Babka
  -1 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2020-03-19 13:23 UTC (permalink / raw)
  To: Sachin Sant
  Cc: bharata, Nathan Lynch, Srikar Dronamraju, linuxppc-dev,
	Michal Hocko, Pekka Enberg, linux-mm, Kirill Tkhai,
	David Rientjes, Christopher Lameter, Mel Gorman, Joonsoo Kim,
	Michael Ellerman

On 3/19/20 9:52 AM, Sachin Sant wrote:
> 
>> OK, how about this version? It's somewhat ugly, but the important thing is that
>> the fast-path case (c->page exists) is unaffected, and another common case
>> (c->page is NULL but node is NUMA_NO_NODE) costs just one extra check - one that
>> is impossible to avoid at some point anyway.
>> 
> 
> I attempted the suggested tests.
> 
> Test 1: March 18 linux-next + Patch 1 [1] + Patch  2 [2]
> 
> Machine boots fine. numactl o/p after boot:

Great, thanks! Can I add your Tested-by: then?

If Bharata confirms his scenario with updated patch, I will send it properly.

> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
> node 1 size: 35631 MB
> node 1 free: 32724 MB
> node distances:
> node   0   1
>   0:  10  40
>   1:  40  10
> #
> 
> Test 2: Code base as used in Test 1 + 3 patch series from Srikar [3]
> 
> Machine boots fine. numactl o/p after boot:
> 
> # numactl -H
> available: 1 nodes (1)
> node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
> node 1 size: 35631 MB
> node 1 free: 32711 MB
> node distances:
> node   1
>   1:  10
> #
> 
> Thanks!
> -Sachin
> 
> 
> [1] https://lore.kernel.org/linux-mm/e060ad43-ff4e-0e59-2e64-ce8a4916ec70@suse.cz/T/#mb8d0a358dc5c5fb9661fa45047072a92c510faf2
> [2] https://lore.kernel.org/linux-mm/e060ad43-ff4e-0e59-2e64-ce8a4916ec70@suse.cz/T/#mce342e54a95ce2ee3043901e52d70ee2fd94c516
> [3] https://lore.kernel.org/linux-mm/20200311110237.5731-1-srikar@linux.vnet.ibm.com/
> 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-19 13:23         ` Vlastimil Babka
@ 2020-03-19 13:26           ` Sachin Sant
  -1 siblings, 0 replies; 35+ messages in thread
From: Sachin Sant @ 2020-03-19 13:26 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: bharata, Nathan Lynch, Srikar Dronamraju, linuxppc-dev,
	Michal Hocko, Pekka Enberg, linux-mm, Kirill Tkhai,
	David Rientjes, Christopher Lameter, Mel Gorman, Joonsoo Kim,
	Michael Ellerman



> On 19-Mar-2020, at 6:53 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> 
> On 3/19/20 9:52 AM, Sachin Sant wrote:
>> 
>>> OK, how about this version? It's somewhat ugly, but the important thing is that
>>> the fast-path case (c->page exists) is unaffected, and another common case
>>> (c->page is NULL but node is NUMA_NO_NODE) costs just one extra check - one that
>>> is impossible to avoid at some point anyway.
>>> 
>> 
>> I attempted the suggested tests.
>> 
>> Test 1: March 18 linux-next + Patch 1 [1] + Patch  2 [2]
>> 
>> Machine boots fine. numactl o/p after boot:
> 
> Great, thanks! Can I add your Tested-by: then?

Sure.
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>

Thank you for the fix.

Thanks
-Sachin



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-19 13:26           ` Sachin Sant
@ 2020-03-19 13:47             ` Vlastimil Babka
  -1 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2020-03-19 13:47 UTC (permalink / raw)
  To: Sachin Sant
  Cc: bharata, Nathan Lynch, Srikar Dronamraju, linuxppc-dev,
	Michal Hocko, Pekka Enberg, linux-mm, Kirill Tkhai,
	David Rientjes, Christopher Lameter, Mel Gorman, Joonsoo Kim,
	Michael Ellerman

On 3/19/20 2:26 PM, Sachin Sant wrote:
> 
> 
>> On 19-Mar-2020, at 6:53 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
>> 
>> On 3/19/20 9:52 AM, Sachin Sant wrote:
>>> 
>>>> OK, how about this version? It's somewhat ugly, but the important thing is that
>>>> the fast-path case (c->page exists) is unaffected, and another common case
>>>> (c->page is NULL but node is NUMA_NO_NODE) costs just one extra check - one that
>>>> is impossible to avoid at some point anyway.
>>>> 
>>> 
>>> I attempted the suggested tests.
>>> 
>>> Test 1: March 18 linux-next + Patch 1 [1] + Patch  2 [2]
>>> 
>>> Machine boots fine. numactl o/p after boot:
>> 
>> Great, thanks! Can I add your Tested-by: then?
> 
> Sure.
> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
> 
> Thank you for the fix.

Thanks! Sorry to bother you, but in the end I decided to make a further change,
so I would appreciate verification that it still works as intended.
The point is to use node_state(N_NORMAL_MEMORY) instead of node_present_pages(),
as that is really what SLUB uses to decide whether to allocate the
kmem_cache_node, so we should match it properly given the opportunity.
I have also removed the node_online() check in alloc_slab_page() again, as that
path really shouldn't be reachable with an offline node - everything is taken
care of in ___slab_alloc(), or callers use NUMA_NO_NODE.

----8<----
diff --git a/mm/slub.c b/mm/slub.c
index 17dc00e33115..7113b1f9cd77 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 
 	if (node == NUMA_NO_NODE)
 		searchnode = numa_mem_id();
-	else if (!node_present_pages(node))
-		searchnode = node_to_mem_node(node);
 
 	object = get_partial_node(s, get_node(s, searchnode), c, flags);
 	if (object || node != NUMA_NO_NODE)
@@ -2563,17 +2561,27 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 	struct page *page;
 
 	page = c->page;
-	if (!page)
+	if (!page) {
+		/*
+		 * if the node is not online or has no normal memory, just
+		 * ignore the node constraint
+		 */
+		if (unlikely(node != NUMA_NO_NODE &&
+			     !node_state(node, N_NORMAL_MEMORY)))
+			node = NUMA_NO_NODE;
 		goto new_slab;
+	}
 redo:
 
 	if (unlikely(!node_match(page, node))) {
-		int searchnode = node;
-
-		if (node != NUMA_NO_NODE && !node_present_pages(node))
-			searchnode = node_to_mem_node(node);
-
-		if (unlikely(!node_match(page, searchnode))) {
+		/*
+		 * same as above but node_match() being false already
+		 * implies node != NUMA_NO_NODE
+		 */
+		if (!node_state(node, N_NORMAL_MEMORY)) {
+			node = NUMA_NO_NODE;
+			goto redo;
+		} else {
 			stat(s, ALLOC_NODE_MISMATCH);
 			deactivate_slab(s, page, c->freelist, c);
 			goto new_slab;
-- 
2.25.1



^ permalink raw reply related	[flat|nested] 35+ messages in thread


* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-19 13:47             ` Vlastimil Babka
@ 2020-03-19 14:05               ` Srikar Dronamraju
  -1 siblings, 0 replies; 35+ messages in thread
From: Srikar Dronamraju @ 2020-03-19 14:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Sachin Sant, bharata, Nathan Lynch, linuxppc-dev, Michal Hocko,
	Pekka Enberg, linux-mm, Kirill Tkhai, David Rientjes,
	Christopher Lameter, Mel Gorman, Joonsoo Kim, Michael Ellerman

* Vlastimil Babka <vbabka@suse.cz> [2020-03-19 14:47:58]:

> ----8<----
> diff --git a/mm/slub.c b/mm/slub.c
> index 17dc00e33115..7113b1f9cd77 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> 
>  	if (node == NUMA_NO_NODE)
>  		searchnode = numa_mem_id();
> -	else if (!node_present_pages(node))
> -		searchnode = node_to_mem_node(node);
> 
>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);

Are we okay with passing a node to get_partial_node with !NUMA_NO_NODE and
!N_MEMORY including possible nodes?

>  	if (object || node != NUMA_NO_NODE)

-- 
Thanks and Regards
Srikar Dronamraju



^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-19 14:05               ` Srikar Dronamraju
@ 2020-03-19 14:10                 ` Vlastimil Babka
  -1 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2020-03-19 14:10 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Sachin Sant, bharata, Nathan Lynch, linuxppc-dev, Michal Hocko,
	Pekka Enberg, linux-mm, Kirill Tkhai, David Rientjes,
	Christopher Lameter, Mel Gorman, Joonsoo Kim, Michael Ellerman

On 3/19/20 3:05 PM, Srikar Dronamraju wrote:
> * Vlastimil Babka <vbabka@suse.cz> [2020-03-19 14:47:58]:
> 
>> ----8<----
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 17dc00e33115..7113b1f9cd77 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>> 
>>  	if (node == NUMA_NO_NODE)
>>  		searchnode = numa_mem_id();
>> -	else if (!node_present_pages(node))
>> -		searchnode = node_to_mem_node(node);
>> 
>>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
> 
> Are we okay with passing a node to get_partial_node with !NUMA_NO_NODE and
> !N_MEMORY including possible nodes?

No, but AFAICS, such node values are already handled in ___slab_alloc, and
cannot reach get_partial(). If you see something I missed, please do tell.

>>  	if (object || node != NUMA_NO_NODE)
> 



^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-19 13:47             ` Vlastimil Babka
@ 2020-03-19 14:59               ` Sachin Sant
  -1 siblings, 0 replies; 35+ messages in thread
From: Sachin Sant @ 2020-03-19 14:59 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: bharata, Nathan Lynch, Srikar Dronamraju, linuxppc-dev,
	Michal Hocko, Pekka Enberg, linux-mm, Kirill Tkhai,
	David Rientjes, Christopher Lameter, Mel Gorman, Joonsoo Kim,
	Michael Ellerman

>>> Great, thanks! Can I add your Tested-by: then?
>> 
>> Sure.
>> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
>> 
>> Thank you for the fix.
> 
> Thanks! Sorry to bother, but in the end I decided to do further change so I
> would appreciate verification if it still works as intended.

Works as expected. I am able to boot the machine without any issues.

Thanks
-Sachin



^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-19 13:47             ` Vlastimil Babka
@ 2020-03-20  3:42               ` Bharata B Rao
  -1 siblings, 0 replies; 35+ messages in thread
From: Bharata B Rao @ 2020-03-20  3:42 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Sachin Sant, Nathan Lynch, Srikar Dronamraju, linuxppc-dev,
	Michal Hocko, Pekka Enberg, linux-mm, Kirill Tkhai,
	David Rientjes, Christopher Lameter, Mel Gorman, Joonsoo Kim,
	Michael Ellerman

On Thu, Mar 19, 2020 at 02:47:58PM +0100, Vlastimil Babka wrote:
> diff --git a/mm/slub.c b/mm/slub.c
> index 17dc00e33115..7113b1f9cd77 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>  
>  	if (node == NUMA_NO_NODE)
>  		searchnode = numa_mem_id();
> -	else if (!node_present_pages(node))
> -		searchnode = node_to_mem_node(node);
>  
>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
>  	if (object || node != NUMA_NO_NODE)
> @@ -2563,17 +2561,27 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	struct page *page;
>  
>  	page = c->page;
> -	if (!page)
> +	if (!page) {
> +		/*
> +		 * if the node is not online or has no normal memory, just
> +		 * ignore the node constraint
> +		 */
> +		if (unlikely(node != NUMA_NO_NODE &&
> +			     !node_state(node, N_NORMAL_MEMORY)))
> +			node = NUMA_NO_NODE;
>  		goto new_slab;
> +	}
>  redo:
>  
>  	if (unlikely(!node_match(page, node))) {
> -		int searchnode = node;
> -
> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
> -			searchnode = node_to_mem_node(node);
> -
> -		if (unlikely(!node_match(page, searchnode))) {
> +		/*
> +		 * same as above but node_match() being false already
> +		 * implies node != NUMA_NO_NODE
> +		 */
> +		if (!node_state(node, N_NORMAL_MEMORY)) {
> +			node = NUMA_NO_NODE;
> +			goto redo;
> +		} else {
>  			stat(s, ALLOC_NODE_MISMATCH);
>  			deactivate_slab(s, page, c->freelist, c);
>  			goto new_slab;

This fixes the problem I reported at
https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/

Regards,
Bharata.



^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-19 14:10                 ` Vlastimil Babka
@ 2020-03-20  7:46                   ` Srikar Dronamraju
  -1 siblings, 0 replies; 35+ messages in thread
From: Srikar Dronamraju @ 2020-03-20  7:46 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Sachin Sant, bharata, Nathan Lynch, linuxppc-dev, Michal Hocko,
	Pekka Enberg, linux-mm, Kirill Tkhai, David Rientjes,
	Christopher Lameter, Mel Gorman, Joonsoo Kim, Michael Ellerman

* Vlastimil Babka <vbabka@suse.cz> [2020-03-19 15:10:19]:

> On 3/19/20 3:05 PM, Srikar Dronamraju wrote:
> > * Vlastimil Babka <vbabka@suse.cz> [2020-03-19 14:47:58]:
> > 
> >> ----8<----
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index 17dc00e33115..7113b1f9cd77 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> >> 
> >>  	if (node == NUMA_NO_NODE)
> >>  		searchnode = numa_mem_id();
> >> -	else if (!node_present_pages(node))
> >> -		searchnode = node_to_mem_node(node);
> >> 
> >>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
> > 
> > Are we okay with passing a node to get_partial_node with !NUMA_NO_NODE and
> > !N_MEMORY including possible nodes?
> 
> No, but AFAICS, such node values are already handled in ___slab_alloc, and
> cannot reach get_partial(). If you see something I missed, please do tell.
> 

Ah I probably got confused with your previous version where
alloc_slab_page() was modified. I see no problems with this version.

Sorry for the noise.

A question just for my better understanding,
How worse would it be to set node to numa_mem_id() instead of NUMA_NODE_ID
when the current node is !N_NORMAL_MEMORY?

> >>  	if (object || node != NUMA_NO_NODE)
> > 
> 

-- 
Thanks and Regards
Srikar Dronamraju



^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-20  3:42               ` Bharata B Rao
@ 2020-03-20  8:37                 ` Vlastimil Babka
  -1 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2020-03-20  8:37 UTC (permalink / raw)
  To: bharata
  Cc: Sachin Sant, Nathan Lynch, Srikar Dronamraju, linuxppc-dev,
	Michal Hocko, Pekka Enberg, linux-mm, Kirill Tkhai,
	David Rientjes, Christopher Lameter, Mel Gorman, Joonsoo Kim,
	Michael Ellerman

On 3/20/20 4:42 AM, Bharata B Rao wrote:
> On Thu, Mar 19, 2020 at 02:47:58PM +0100, Vlastimil Babka wrote:
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 17dc00e33115..7113b1f9cd77 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
>>  
>>  	if (node == NUMA_NO_NODE)
>>  		searchnode = numa_mem_id();
>> -	else if (!node_present_pages(node))
>> -		searchnode = node_to_mem_node(node);
>>  
>>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
>>  	if (object || node != NUMA_NO_NODE)
>> @@ -2563,17 +2561,27 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  	struct page *page;
>>  
>>  	page = c->page;
>> -	if (!page)
>> +	if (!page) {
>> +		/*
>> +		 * if the node is not online or has no normal memory, just
>> +		 * ignore the node constraint
>> +		 */
>> +		if (unlikely(node != NUMA_NO_NODE &&
>> +			     !node_state(node, N_NORMAL_MEMORY)))
>> +			node = NUMA_NO_NODE;
>>  		goto new_slab;
>> +	}
>>  redo:
>>  
>>  	if (unlikely(!node_match(page, node))) {
>> -		int searchnode = node;
>> -
>> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
>> -			searchnode = node_to_mem_node(node);
>> -
>> -		if (unlikely(!node_match(page, searchnode))) {
>> +		/*
>> +		 * same as above but node_match() being false already
>> +		 * implies node != NUMA_NO_NODE
>> +		 */
>> +		if (!node_state(node, N_NORMAL_MEMORY)) {
>> +			node = NUMA_NO_NODE;
>> +			goto redo;
>> +		} else {
>>  			stat(s, ALLOC_NODE_MISMATCH);
>>  			deactivate_slab(s, page, c->freelist, c);
>>  			goto new_slab;
> 
> This fixes the problem I reported at
> https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/

Thanks, I hope it means I can make it Reported-and-tested-by: you


^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-20  7:46                   ` Srikar Dronamraju
@ 2020-03-20  8:43                     ` Vlastimil Babka
  -1 siblings, 0 replies; 35+ messages in thread
From: Vlastimil Babka @ 2020-03-20  8:43 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Sachin Sant, bharata, Nathan Lynch, linuxppc-dev, Michal Hocko,
	Pekka Enberg, linux-mm, Kirill Tkhai, David Rientjes,
	Christopher Lameter, Mel Gorman, Joonsoo Kim, Michael Ellerman

On 3/20/20 8:46 AM, Srikar Dronamraju wrote:
> * Vlastimil Babka <vbabka@suse.cz> [2020-03-19 15:10:19]:
> 
>> On 3/19/20 3:05 PM, Srikar Dronamraju wrote:
>> > * Vlastimil Babka <vbabka@suse.cz> [2020-03-19 14:47:58]:
>> > 
>> 
>> No, but AFAICS, such node values are already handled in ___slab_alloc, and
>> cannot reach get_partial(). If you see something I missed, please do tell.
>> 
> 
> Ah I probably got confused with your previous version where
> alloc_slab_page() was modified. I see no problems with this version.

Thanks!

> Sorry for the noise.

No problem.

> A question just for my better understanding,
> How worse would it be to set node to numa_mem_id() instead of NUMA_NODE_ID
> when the current node is !N_NORMAL_MEMORY?

(I'm assuming you mean s/NUMA_NODE_ID/NUMA_NO_NODE/)

Well, numa_mem_id() should work too, but it would make the allocation
constrained to the node of current cpu, with all the consequences (deactivating
percpu slab if it was from a different node etc).

There's no reason why this cpu's node should be the closest node to the one that
was originally requested (but has no memory), so it's IMO pointless or even
suboptimal to constrain to it. This can be revisited in case we get guaranteed
existence of node data with zonelists for all possible nodes, but for now
NUMA_NO_NODE seems the most reasonable fix to me.

>> >>  	if (object || node != NUMA_NO_NODE)
>> > 
>> 
> 



^ permalink raw reply	[flat|nested] 35+ messages in thread


* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-20  8:37                 ` Vlastimil Babka
@ 2020-03-20  8:44                   ` Bharata B Rao
  -1 siblings, 0 replies; 35+ messages in thread
From: Bharata B Rao @ 2020-03-20  8:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Sachin Sant, Nathan Lynch, Srikar Dronamraju, linuxppc-dev,
	Michal Hocko, Pekka Enberg, linux-mm, Kirill Tkhai,
	David Rientjes, Christopher Lameter, Mel Gorman, Joonsoo Kim,
	Michael Ellerman, puvichakravarthy

On Fri, Mar 20, 2020 at 09:37:18AM +0100, Vlastimil Babka wrote:
> On 3/20/20 4:42 AM, Bharata B Rao wrote:
> > On Thu, Mar 19, 2020 at 02:47:58PM +0100, Vlastimil Babka wrote:
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index 17dc00e33115..7113b1f9cd77 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> >>  
> >>  	if (node == NUMA_NO_NODE)
> >>  		searchnode = numa_mem_id();
> >> -	else if (!node_present_pages(node))
> >> -		searchnode = node_to_mem_node(node);
> >>  
> >>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
> >>  	if (object || node != NUMA_NO_NODE)
> >> @@ -2563,17 +2561,27 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >>  	struct page *page;
> >>  
> >>  	page = c->page;
> >> -	if (!page)
> >> +	if (!page) {
> >> +		/*
> >> +		 * if the node is not online or has no normal memory, just
> >> +		 * ignore the node constraint
> >> +		 */
> >> +		if (unlikely(node != NUMA_NO_NODE &&
> >> +			     !node_state(node, N_NORMAL_MEMORY)))
> >> +			node = NUMA_NO_NODE;
> >>  		goto new_slab;
> >> +	}
> >>  redo:
> >>  
> >>  	if (unlikely(!node_match(page, node))) {
> >> -		int searchnode = node;
> >> -
> >> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
> >> -			searchnode = node_to_mem_node(node);
> >> -
> >> -		if (unlikely(!node_match(page, searchnode))) {
> >> +		/*
> >> +		 * same as above but node_match() being false already
> >> +		 * implies node != NUMA_NO_NODE
> >> +		 */
> >> +		if (!node_state(node, N_NORMAL_MEMORY)) {
> >> +			node = NUMA_NO_NODE;
> >> +			goto redo;
> >> +		} else {
> >>  			stat(s, ALLOC_NODE_MISMATCH);
> >>  			deactivate_slab(s, page, c->freelist, c);
> >>  			goto new_slab;
> > 
> > This fixes the problem I reported at
> > https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
> 
> Thanks, I hope it means I can make it Reported-and-tested-by: you

It was reported first by PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com>.
You can add my Tested-by.

Regards,
Bharata.



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
@ 2020-03-20  8:44                   ` Bharata B Rao
  0 siblings, 0 replies; 35+ messages in thread
From: Bharata B Rao @ 2020-03-20  8:44 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Sachin Sant, Nathan Lynch, Srikar Dronamraju, puvichakravarthy,
	Mel Gorman, Michal Hocko, Pekka Enberg, linux-mm, Kirill Tkhai,
	David Rientjes, Christopher Lameter, linuxppc-dev, Joonsoo Kim

On Fri, Mar 20, 2020 at 09:37:18AM +0100, Vlastimil Babka wrote:
> On 3/20/20 4:42 AM, Bharata B Rao wrote:
> > On Thu, Mar 19, 2020 at 02:47:58PM +0100, Vlastimil Babka wrote:
> >> diff --git a/mm/slub.c b/mm/slub.c
> >> index 17dc00e33115..7113b1f9cd77 100644
> >> --- a/mm/slub.c
> >> +++ b/mm/slub.c
> >> @@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
> >>  
> >>  	if (node == NUMA_NO_NODE)
> >>  		searchnode = numa_mem_id();
> >> -	else if (!node_present_pages(node))
> >> -		searchnode = node_to_mem_node(node);
> >>  
> >>  	object = get_partial_node(s, get_node(s, searchnode), c, flags);
> >>  	if (object || node != NUMA_NO_NODE)
> >> @@ -2563,17 +2561,27 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >>  	struct page *page;
> >>  
> >>  	page = c->page;
> >> -	if (!page)
> >> +	if (!page) {
> >> +		/*
> >> +		 * if the node is not online or has no normal memory, just
> >> +		 * ignore the node constraint
> >> +		 */
> >> +		if (unlikely(node != NUMA_NO_NODE &&
> >> +			     !node_state(node, N_NORMAL_MEMORY)))
> >> +			node = NUMA_NO_NODE;
> >>  		goto new_slab;
> >> +	}
> >>  redo:
> >>  
> >>  	if (unlikely(!node_match(page, node))) {
> >> -		int searchnode = node;
> >> -
> >> -		if (node != NUMA_NO_NODE && !node_present_pages(node))
> >> -			searchnode = node_to_mem_node(node);
> >> -
> >> -		if (unlikely(!node_match(page, searchnode))) {
> >> +		/*
> >> +		 * same as above but node_match() being false already
> >> +		 * implies node != NUMA_NO_NODE
> >> +		 */
> >> +		if (!node_state(node, N_NORMAL_MEMORY)) {
> >> +			node = NUMA_NO_NODE;
> >> +			goto redo;
> >> +		} else {
> >>  			stat(s, ALLOC_NODE_MISMATCH);
> >>  			deactivate_slab(s, page, c->freelist, c);
> >>  			goto new_slab;
> > 
> > This fixes the problem I reported at
> > https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
> 
> Thanks, I hope it means I can make it Reported-and-tested-by: you

It was reported first by PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy@in.ibm.com>
You can add my tested-by.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks
  2020-03-20  8:43                     ` Vlastimil Babka
@ 2020-03-20 10:10                       ` Srikar Dronamraju
  -1 siblings, 0 replies; 35+ messages in thread
From: Srikar Dronamraju @ 2020-03-20 10:10 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Sachin Sant, bharata, Nathan Lynch, linuxppc-dev, Michal Hocko,
	Pekka Enberg, linux-mm, Kirill Tkhai, David Rientjes,
	Christopher Lameter, Mel Gorman, Joonsoo Kim, Michael Ellerman

* Vlastimil Babka <vbabka@suse.cz> [2020-03-20 09:43:11]:

> On 3/20/20 8:46 AM, Srikar Dronamraju wrote:
> > * Vlastimil Babka <vbabka@suse.cz> [2020-03-19 15:10:19]:
> > 
> >> On 3/19/20 3:05 PM, Srikar Dronamraju wrote:
> >> > * Vlastimil Babka <vbabka@suse.cz> [2020-03-19 14:47:58]:
> >> > 
> >> 
> >> No, but AFAICS, such node values are already handled in ___slab_alloc, and
> >> cannot reach get_partial(). If you see something I missed, please do tell.
> >> 
> > 
> > Ah I probably got confused with your previous version where
> > alloc_slab_page() was modified. I see no problems with this version.
> 
> Thanks!
> 
> > Sorry for the noise.
> 
> No problem.
> 
> > A question just for my better understanding,
> > How much worse would it be to set node to numa_mem_id() instead of NUMA_NODE_ID
> > when the current node is !N_NORMAL_MEMORY?
> 

Yes,

> (I'm assuming you mean s/NUMA_NODE_ID/NUMA_NO_NODE/)
> 
> Well, numa_mem_id() should work too, but it would make the allocation
> constrained to the node of current cpu, with all the consequences (deactivating
> percpu slab if it was from a different node etc).
> 
> There's no reason why this cpu's node should be the closest node to the one that
> was originally requested (but has no memory), so it's IMO pointless or even
> suboptimal to constrain to it. This can be revisited in case we get guaranteed
> existence of node data with zonelists for all possible nodes, but for now
> NUMA_NO_NODE seems the most reasonable fix to me.
> 

Okay.


-- 
Thanks and Regards
Srikar Dronamraju



^ permalink raw reply	[flat|nested] 35+ messages in thread


end of thread, other threads:[~2020-03-20 10:12 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-18 14:42 [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks Vlastimil Babka
2020-03-18 14:42 ` Vlastimil Babka
2020-03-18 14:42 ` [RFC 2/2] Revert "topology: add support for node_to_mem_node() to determine the fallback node" Vlastimil Babka
2020-03-18 16:06 ` [RFC 1/2] mm, slub: prevent kmalloc_node crashes and memory leaks Bharata B Rao
2020-03-18 16:06   ` Bharata B Rao
2020-03-18 16:10   ` Vlastimil Babka
2020-03-18 16:10     ` Vlastimil Babka
2020-03-18 16:51   ` Vlastimil Babka
2020-03-18 16:51     ` Vlastimil Babka
2020-03-19  8:52     ` Sachin Sant
2020-03-19  8:52       ` Sachin Sant
2020-03-19 13:23       ` Vlastimil Babka
2020-03-19 13:23         ` Vlastimil Babka
2020-03-19 13:26         ` Sachin Sant
2020-03-19 13:26           ` Sachin Sant
2020-03-19 13:47           ` Vlastimil Babka
2020-03-19 13:47             ` Vlastimil Babka
2020-03-19 14:05             ` Srikar Dronamraju
2020-03-19 14:05               ` Srikar Dronamraju
2020-03-19 14:10               ` Vlastimil Babka
2020-03-19 14:10                 ` Vlastimil Babka
2020-03-20  7:46                 ` Srikar Dronamraju
2020-03-20  7:46                   ` Srikar Dronamraju
2020-03-20  8:43                   ` Vlastimil Babka
2020-03-20  8:43                     ` Vlastimil Babka
2020-03-20 10:10                     ` Srikar Dronamraju
2020-03-20 10:10                       ` Srikar Dronamraju
2020-03-19 14:59             ` Sachin Sant
2020-03-19 14:59               ` Sachin Sant
2020-03-20  3:42             ` Bharata B Rao
2020-03-20  3:42               ` Bharata B Rao
2020-03-20  8:37               ` Vlastimil Babka
2020-03-20  8:37                 ` Vlastimil Babka
2020-03-20  8:44                 ` Bharata B Rao
2020-03-20  8:44                   ` Bharata B Rao
