[PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs

* [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
@ 2014-12-01  4:28 ` Paul Mackerras
  0 siblings, 0 replies; 21+ messages in thread
From: Paul Mackerras @ 2014-12-01  4:28 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linuxppc-dev, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton

The bounds check for nodeid in ____cache_alloc_node gives false
positives on machines where the node IDs are not contiguous, leading
to a panic at boot time.  For example, on a POWER8 machine the node
IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
returns 4, so when ____cache_alloc_node is called with nodeid = 16 the
VM_BUG_ON triggers, like this:

kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
Oops: Exception in kernel mode, sig: 5 [#1]
SMP NR_CPUS=1024 NUMA PowerNV
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
task: c0000000013ba230 ti: c000000001494000 task.ti: c000000001494000
NIP: c000000000264f6c LR: c000000000264f5c CTR: 0000000000000000
REGS: c0000000014979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
MSR: 9000000002021032 <SF,HV,VEC,ME,IR,DR,RI>  CR: 28000448  XER: 20000000
CFAR: c00000000047e978 SOFTE: 0
GPR00: c000000000264f5c c000000001497c20 c000000001499d48 0000000000000004
GPR04: 0000000000000100 0000000000000010 0000000000000068 ffffffffffffffff
GPR08: 0000000000000000 0000000000000001 00000000082d0000 c000000000cca5a8
GPR12: 0000000048000448 c00000000fda0000 000001003bd44ff0 0000000010020578
GPR16: 000001003bd44ff8 000001003bd45000 0000000000000001 0000000000000000
GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000010
GPR24: c000000ffe000080 c000000000c824ec 0000000000000068 c000000ffe000080
GPR28: 0000000000000010 c000000ffe000080 0000000000000010 0000000000000000
NIP [c000000000264f6c] .____cache_alloc_node+0x6c/0x270
LR [c000000000264f5c] .____cache_alloc_node+0x5c/0x270
Call Trace:
[c000000001497c20] [c000000000264f5c] .____cache_alloc_node+0x5c/0x270 (unreliable)
[c000000001497cf0] [c00000000026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
[c000000001497dc0] [c000000000c824ec] .init_list+0x3c/0x128
[c000000001497e50] [c000000000c827b4] .kmem_cache_init+0x1dc/0x258
[c000000001497ef0] [c000000000c54090] .start_kernel+0x2a0/0x568
[c000000001497f90] [c000000000008c6c] start_here_common+0x20/0xa8
Instruction dump:
7c7d1b78 7c962378 4bda4e91 60000000 3c620004 38800100 386370d8 48219959
60000000 7f83e000 7d301026 5529effe <0b090000> 393c0010 79291f24 7d3d4a14

To fix this, we instead compare the nodeid with MAX_NUMNODES, and
additionally make sure it isn't negative (since nodeid is an int).
The check is there mainly to protect the array dereference in the
get_node() call in the next line, and the array being dereferenced is
of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
example if the node is off-line), the BUG_ON in the next line will
catch that.
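A minimal user-space sketch (not kernel code) of the argument above. It evaluates the old and new conditions for the sparse POWER8 node IDs 0, 1, 16 and 17, then models the "bounds check, then get_node(), then NULL check" order. MAX_NUMNODES = 256, the node_stub type, and every sim_* / *_check_fires helper are hypothetical stand-ins chosen for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value: the kernel derives MAX_NUMNODES from CONFIG_NODES_SHIFT. */
#define MAX_NUMNODES 256

/* Sparse online node IDs, as on the POWER8 machine in the report. */
static const int online_ids[] = { 0, 1, 16, 17 };
#define NR_ONLINE ((int)(sizeof(online_ids) / sizeof(online_ids[0])))

/* Hypothetical stand-in for the per-node slab bookkeeping. */
struct node_stub {
	int id;
};
static struct node_stub stubs[NR_ONLINE];

/* Stand-in for the per-cache node array: MAX_NUMNODES slots, NULL if offline. */
static struct node_stub *node_ptrs[MAX_NUMNODES];

/* Populate only the slots whose node IDs are online. */
void setup_nodes(void)
{
	for (int i = 0; i < NR_ONLINE; i++) {
		stubs[i].id = online_ids[i];
		node_ptrs[online_ids[i]] = &stubs[i];
	}
}

/* Stand-in for num_online_nodes(): a count of nodes, not the highest ID. */
int sim_num_online_nodes(void)
{
	return NR_ONLINE;
}

/* Pre-fix condition: implicitly assumes node IDs are contiguous from 0. */
bool old_check_fires(int nodeid)
{
	return nodeid > sim_num_online_nodes();
}

/* Post-fix condition: only guards the index into the MAX_NUMNODES array. */
bool new_check_fires(int nodeid)
{
	return nodeid < 0 || nodeid >= MAX_NUMNODES;
}

/*
 * Bounds check, then lookup. A NULL result is left to the caller,
 * playing the role of the BUG_ON(!n) after get_node().
 */
struct node_stub *sim_lookup(int nodeid)
{
	assert(!new_check_fires(nodeid)); /* role of the VM_BUG_ON */
	return node_ptrs[nodeid];         /* role of get_node(); NULL if offline */
}
```

After setup_nodes(), old_check_fires(16) is true even though node 16 is online, while new_check_fires(16) is false and sim_lookup(16) returns its entry. sim_lookup(5) passes the bounds check but returns NULL: that in-range-but-offline case is the one left to the second check.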

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
v2: include the oops message in the patch description

 mm/slab.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index eb2b2ea..f34e053 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3076,7 +3076,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
 	void *obj;
 	int x;
 
-	VM_BUG_ON(nodeid > num_online_nodes());
+	VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
 	n = get_node(cachep, nodeid);
 	BUG_ON(!n);
 
-- 
2.1.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
  2014-12-01  4:28 ` Paul Mackerras
@ 2014-12-01  4:58   ` Yasuaki Ishimatsu
  -1 siblings, 0 replies; 21+ messages in thread
From: Yasuaki Ishimatsu @ 2014-12-01  4:58 UTC (permalink / raw)
  To: Paul Mackerras, linux-mm
  Cc: linux-kernel, linuxppc-dev, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton

(2014/12/01 13:28), Paul Mackerras wrote:
> The bounds check for nodeid in ____cache_alloc_node gives false
> positives on machines where the node IDs are not contiguous, leading
> to a panic at boot time.  For example, on a POWER8 machine the node
> IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
> returns 4, so when ____cache_alloc_node is called with nodeid = 16 the
> VM_BUG_ON triggers, like this:
>
> kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=1024 NUMA PowerNV
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
> task: c0000000013ba230 ti: c000000001494000 task.ti: c000000001494000
> NIP: c000000000264f6c LR: c000000000264f5c CTR: 0000000000000000
> REGS: c0000000014979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
> MSR: 9000000002021032 <SF,HV,VEC,ME,IR,DR,RI>  CR: 28000448  XER: 20000000
> CFAR: c00000000047e978 SOFTE: 0
> GPR00: c000000000264f5c c000000001497c20 c000000001499d48 0000000000000004
> GPR04: 0000000000000100 0000000000000010 0000000000000068 ffffffffffffffff
> GPR08: 0000000000000000 0000000000000001 00000000082d0000 c000000000cca5a8
> GPR12: 0000000048000448 c00000000fda0000 000001003bd44ff0 0000000010020578
> GPR16: 000001003bd44ff8 000001003bd45000 0000000000000001 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000010
> GPR24: c000000ffe000080 c000000000c824ec 0000000000000068 c000000ffe000080
> GPR28: 0000000000000010 c000000ffe000080 0000000000000010 0000000000000000
> NIP [c000000000264f6c] .____cache_alloc_node+0x6c/0x270
> LR [c000000000264f5c] .____cache_alloc_node+0x5c/0x270
> Call Trace:
> [c000000001497c20] [c000000000264f5c] .____cache_alloc_node+0x5c/0x270 (unreliable)
> [c000000001497cf0] [c00000000026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
> [c000000001497dc0] [c000000000c824ec] .init_list+0x3c/0x128
> [c000000001497e50] [c000000000c827b4] .kmem_cache_init+0x1dc/0x258
> [c000000001497ef0] [c000000000c54090] .start_kernel+0x2a0/0x568
> [c000000001497f90] [c000000000008c6c] start_here_common+0x20/0xa8
> Instruction dump:
> 7c7d1b78 7c962378 4bda4e91 60000000 3c620004 38800100 386370d8 48219959
> 60000000 7f83e000 7d301026 5529effe <0b090000> 393c0010 79291f24 7d3d4a14
>
> To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> additionally make sure it isn't negative (since nodeid is an int).
> The check is there mainly to protect the array dereference in the
> get_node() call in the next line, and the array being dereferenced is
> of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
> example if the node is off-line), the BUG_ON in the next line will
> catch that.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---

Looks good to me.

Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>

If you need to backport it to the -stable kernels, please read
Documentation/stable_kernel_rules.txt.

Thanks,
Yasuaki Ishimatsu

> v2: include the oops message in the patch description
>
>   mm/slab.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index eb2b2ea..f34e053 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3076,7 +3076,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
>   	void *obj;
>   	int x;
>
> -	VM_BUG_ON(nodeid > num_online_nodes());
> +	VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
>   	n = get_node(cachep, nodeid);
>   	BUG_ON(!n);
>
>



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
  2014-12-01  4:28 ` Paul Mackerras
@ 2014-12-01  5:02   ` Michael Ellerman
  -1 siblings, 0 replies; 21+ messages in thread
From: Michael Ellerman @ 2014-12-01  5:02 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linux-mm, Andrew Morton, linux-kernel, Pekka Enberg,
	linuxppc-dev, David Rientjes, Christoph Lameter, Joonsoo Kim

On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote:
> The bounds check for nodeid in ____cache_alloc_node gives false
> positives on machines where the node IDs are not contiguous, leading
> to a panic at boot time.  For example, on a POWER8 machine the node
> IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
> returns 4, so when ____cache_alloc_node is called with nodeid = 16 the
> VM_BUG_ON triggers, like this:
...
> 
> To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> additionally make sure it isn't negative (since nodeid is an int).
> The check is there mainly to protect the array dereference in the
> get_node() call in the next line, and the array being dereferenced is
> of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
> example if the node is off-line), the BUG_ON in the next line will
> catch that.

When did this break? How come we only just noticed?

Also needs:

Cc: stable@vger.kernel.org

cheers




^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
  2014-12-01  5:02   ` Michael Ellerman
@ 2014-12-01  5:24     ` Paul Mackerras
  -1 siblings, 0 replies; 21+ messages in thread
From: Paul Mackerras @ 2014-12-01  5:24 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: linux-mm, Andrew Morton, linux-kernel, Pekka Enberg,
	linuxppc-dev, David Rientjes, Christoph Lameter, Joonsoo Kim

On Mon, Dec 01, 2014 at 04:02:14PM +1100, Michael Ellerman wrote:
> On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote:
> > The bounds check for nodeid in ____cache_alloc_node gives false
> > positives on machines where the node IDs are not contiguous, leading
> > to a panic at boot time.  For example, on a POWER8 machine the node
> > IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
> > returns 4, so when ____cache_alloc_node is called with nodeid = 16 the
> > VM_BUG_ON triggers, like this:
> ...
> > 
> > To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> > additionally make sure it isn't negative (since nodeid is an int).
> > The check is there mainly to protect the array dereference in the
> > get_node() call in the next line, and the array being dereferenced is
> > of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
> > example if the node is off-line), the BUG_ON in the next line will
> > catch that.
> 
> When did this break? How come we only just noticed?

Commit 14e50c6a9bc2, which went into 3.10-rc1.

You'll only notice if you have CONFIG_SLAB=y and CONFIG_DEBUG_VM=y
and you're running on a machine with discontiguous node IDs.
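A user-space sketch of why the bug could stay hidden. VM_BUG_ON only turns into a real BUG_ON in CONFIG_DEBUG_VM builds, and ____cache_alloc_node lives in mm/slab.c, so a CONFIG_SLUB kernel never reaches this check at all. The SIM_* macros, the DEBUG_VM symbol, and the bug_hits counter are hypothetical stand-ins, loosely modelled on (not copied from) the kernel's VM_BUG_ON definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counter standing in for "the kernel hit BUG()". */
static int bug_hits;

/* Simplified BUG_ON: records a hit instead of oopsing the machine. */
#define SIM_BUG_ON(cond) do { if (cond) bug_hits++; } while (0)

/*
 * Simplified VM_BUG_ON, modelled on VM_BUG_ON depending on CONFIG_DEBUG_VM.
 * Build with -DDEBUG_VM to enable it; otherwise the condition is only
 * type-checked inside sizeof and is never evaluated.
 */
#ifdef DEBUG_VM
#define SIM_VM_BUG_ON(cond) SIM_BUG_ON(cond)
#else
#define SIM_VM_BUG_ON(cond) ((void)sizeof(!!(cond)))
#endif

/* Reports which variant of SIM_VM_BUG_ON this build uses. */
bool debug_vm_enabled(void)
{
#ifdef DEBUG_VM
	return true;
#else
	return false;
#endif
}

/* The pre-fix check, with the node count of 4 from the POWER8 report. */
void old_bounds_check(int nodeid)
{
	const int nr_online = 4;

	SIM_VM_BUG_ON(nodeid > nr_online);
}

/* Node 16 is valid, yet the old check "BUGs" only in DEBUG_VM builds. */
void demo_power8_boot(void)
{
	bug_hits = 0;
	old_bounds_check(16);
	assert(bug_hits == (debug_vm_enabled() ? 1 : 0));
}
```

Compiled without -DDEBUG_VM, old_bounds_check(16) leaves bug_hits at 0; with -DDEBUG_VM the same call records a hit, which corresponds to the boot-time oops only showing up on debug kernels.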

> Also needs:
> 
> Cc: stable@vger.kernel.org

It does.  I remembered that a minute after I sent the patch.

Paul.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
@ 2014-12-01  8:52       ` Michael Ellerman
  0 siblings, 0 replies; 21+ messages in thread
From: Michael Ellerman @ 2014-12-01  8:52 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linux-mm, Andrew Morton, linux-kernel, Pekka Enberg,
	linuxppc-dev, David Rientjes, Christoph Lameter, Joonsoo Kim

On Mon, 2014-12-01 at 16:24 +1100, Paul Mackerras wrote:
> On Mon, Dec 01, 2014 at 04:02:14PM +1100, Michael Ellerman wrote:
> > On Mon, 2014-12-01 at 15:28 +1100, Paul Mackerras wrote:
> > > The bounds check for nodeid in ____cache_alloc_node gives false
> > > positives on machines where the node IDs are not contiguous, leading
> > > to a panic at boot time.  For example, on a POWER8 machine the node
> > > IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
> > > returns 4, so when ____cache_alloc_node is called with nodeid = 16 the
> > > VM_BUG_ON triggers, like this:
> > ...
> > > 
> > > To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> > > additionally make sure it isn't negative (since nodeid is an int).
> > > The check is there mainly to protect the array dereference in the
> > > get_node() call in the next line, and the array being dereferenced is
> > > of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
> > > example if the node is off-line), the BUG_ON in the next line will
> > > catch that.
> > 
> > When did this break? How come we only just noticed?
> 
> Commit 14e50c6a9bc2, which went into 3.10-rc1.

OK. So a Fixes tag is nice:

Fixes: 14e50c6a9bc2 ("mm: slab: Verify the nodeid passed to ____cache_alloc_node")

> You'll only notice if you have CONFIG_SLAB=y and CONFIG_DEBUG_VM=y
> and you're running on a machine with discontiguous node IDs.

Right. And we have SLUB=y for all the defconfigs that are likely to hit that.

> > Also needs:
> > 
> > Cc: stable@vger.kernel.org
> 
> It does.  I remembered that a minute after I sent the patch.

OK. Hopefully one of the slab maintainers will be happy to add it for us when
they merge this?

cheers




* Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
@ 2014-12-01  9:22   ` Pekka Enberg
  0 siblings, 0 replies; 21+ messages in thread
From: Pekka Enberg @ 2014-12-01  9:22 UTC (permalink / raw)
  To: Paul Mackerras, linux-mm
  Cc: linux-kernel, linuxppc-dev, Christoph Lameter, Pekka Enberg,
	David Rientjes, Joonsoo Kim, Andrew Morton

On 12/1/14 6:28 AM, Paul Mackerras wrote:
> ---
> v2: include the oops message in the patch description
>
>   mm/slab.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index eb2b2ea..f34e053 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3076,7 +3076,7 @@ static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
>   	void *obj;
>   	int x;
>   
> -	VM_BUG_ON(nodeid > num_online_nodes());
> +	VM_BUG_ON(nodeid < 0 || nodeid >= MAX_NUMNODES);
>   	n = get_node(cachep, nodeid);
>   	BUG_ON(!n);

Reviewed-by: Pekka Enberg <penberg@kernel.org>
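
The following userspace sketch makes the difference between the two conditions in the hunk concrete. It runs both conditions against the node IDs quoted in the patch description (0, 1, 16 and 17). It assumes MAX_NUMNODES is a compile-time constant; in the kernel it is derived from the configuration. The value 256 used for SKETCH_MAX_NUMNODES, and all function names, are illustrative only.

```c
#include <stdbool.h>

/* Illustrative value only; the kernel's MAX_NUMNODES depends on the config. */
#define SKETCH_MAX_NUMNODES 256

/* Online node IDs quoted for a typical POWER8 machine. */
static const int power8_nodes[] = { 0, 1, 16, 17 };
#define SKETCH_NR_ONLINE ((int)(sizeof(power8_nodes) / sizeof(power8_nodes[0])))

/* Removed line: compares an ID against a *count* of online nodes. */
static bool old_check_fires(int nodeid, int nr_online)
{
	return nodeid > nr_online;
}

/* Added line: compares an ID against the bounds of a MAX_NUMNODES array. */
static bool new_check_fires(int nodeid)
{
	return nodeid < 0 || nodeid >= SKETCH_MAX_NUMNODES;
}

/* How many genuinely online node IDs does each check reject? */
static int count_old_false_positives(void)
{
	int hits = 0;

	for (int i = 0; i < SKETCH_NR_ONLINE; i++)
		if (old_check_fires(power8_nodes[i], SKETCH_NR_ONLINE))
			hits++;
	return hits;
}

static int count_new_false_positives(void)
{
	int hits = 0;

	for (int i = 0; i < SKETCH_NR_ONLINE; i++)
		if (new_check_fires(power8_nodes[i]))
			hits++;
	return hits;
}
```

Here count_old_false_positives() returns 2 (nodes 16 and 17), and count_new_false_positives() returns 0. As a side effect of using `>`, the old test also let nodeid == num_online_nodes() through (old_check_fires(4, 4) is false). The new test uses `>=` against the array size.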


* Re: [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs
@ 2014-12-01 21:06   ` David Rientjes
  0 siblings, 0 replies; 21+ messages in thread
From: David Rientjes @ 2014-12-01 21:06 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: linux-mm, linux-kernel, linuxppc-dev, Christoph Lameter,
	Pekka Enberg, Joonsoo Kim, Andrew Morton

On Mon, 1 Dec 2014, Paul Mackerras wrote:

> The bounds check for nodeid in ____cache_alloc_node gives false
> positives on machines where the node IDs are not contiguous, leading
> to a panic at boot time.  For example, on a POWER8 machine the node
> IDs are typically 0, 1, 16 and 17.  This means that num_online_nodes()
> returns 4, so when ____cache_alloc_node is called with nodeid = 16 the
> VM_BUG_ON triggers, like this:
> 
> kernel BUG at /home/paulus/kernel/kvm/mm/slab.c:3079!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=1024 NUMA PowerNV
> Modules linked in:
> CPU: 0 PID: 0 Comm: swapper Not tainted 3.18.0-rc5-kvm+ #17
> task: c0000000013ba230 ti: c000000001494000 task.ti: c000000001494000
> NIP: c000000000264f6c LR: c000000000264f5c CTR: 0000000000000000
> REGS: c0000000014979a0 TRAP: 0700   Not tainted  (3.18.0-rc5-kvm+)
> MSR: 9000000002021032 <SF,HV,VEC,ME,IR,DR,RI>  CR: 28000448  XER: 20000000
> CFAR: c00000000047e978 SOFTE: 0
> GPR00: c000000000264f5c c000000001497c20 c000000001499d48 0000000000000004
> GPR04: 0000000000000100 0000000000000010 0000000000000068 ffffffffffffffff
> GPR08: 0000000000000000 0000000000000001 00000000082d0000 c000000000cca5a8
> GPR12: 0000000048000448 c00000000fda0000 000001003bd44ff0 0000000010020578
> GPR16: 000001003bd44ff8 000001003bd45000 0000000000000001 0000000000000000
> GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000010
> GPR24: c000000ffe000080 c000000000c824ec 0000000000000068 c000000ffe000080
> GPR28: 0000000000000010 c000000ffe000080 0000000000000010 0000000000000000
> NIP [c000000000264f6c] .____cache_alloc_node+0x6c/0x270
> LR [c000000000264f5c] .____cache_alloc_node+0x5c/0x270
> Call Trace:
> [c000000001497c20] [c000000000264f5c] .____cache_alloc_node+0x5c/0x270 (unreliable)
> [c000000001497cf0] [c00000000026552c] .kmem_cache_alloc_node_trace+0xdc/0x360
> [c000000001497dc0] [c000000000c824ec] .init_list+0x3c/0x128
> [c000000001497e50] [c000000000c827b4] .kmem_cache_init+0x1dc/0x258
> [c000000001497ef0] [c000000000c54090] .start_kernel+0x2a0/0x568
> [c000000001497f90] [c000000000008c6c] start_here_common+0x20/0xa8
> Instruction dump:
> 7c7d1b78 7c962378 4bda4e91 60000000 3c620004 38800100 386370d8 48219959
> 60000000 7f83e000 7d301026 5529effe <0b090000> 393c0010 79291f24 7d3d4a14
> 
> To fix this, we instead compare the nodeid with MAX_NUMNODES, and
> additionally make sure it isn't negative (since nodeid is an int).
> The check is there mainly to protect the array dereference in the
> get_node() call in the next line, and the array being dereferenced is
> of size MAX_NUMNODES.  If the nodeid is in range but invalid (for
> example if the node is off-line), the BUG_ON in the next line will
> catch that.
> 
> Signed-off-by: Paul Mackerras <paulus@samba.org>

Acked-by: David Rientjes <rientjes@google.com>
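
The description relies on two-stage validation. First, a range check keeps the array index in get_node() in bounds. Second, BUG_ON(!n) catches an in-range ID that has no per-node data, for example an offline node. The userspace sketch below models both stages. The structures only loosely model the per-node pointer array that get_node() indexes, and all names are hypothetical. SKETCH_MAX_NUMNODES = 256 is an illustrative value, and results are returned as an enum instead of calling BUG().

```c
#include <stddef.h>

/* Illustrative value only; the kernel's MAX_NUMNODES depends on the config. */
#define SKETCH_MAX_NUMNODES 256

/* Hypothetical stand-in for the per-node slab bookkeeping. */
struct sketch_node {
	int id;
};

/*
 * Hypothetical stand-in for a cache holding one pointer per possible
 * node; entries for nodes with no per-node data stay NULL.
 */
struct sketch_cache {
	struct sketch_node *node[SKETCH_MAX_NUMNODES];
};

enum sketch_lookup {
	SKETCH_OK,           /* valid, populated node */
	SKETCH_OUT_OF_RANGE, /* would trip the VM_BUG_ON */
	SKETCH_NO_NODE,      /* would trip the BUG_ON(!n) */
};

static enum sketch_lookup sketch_get_node(const struct sketch_cache *c,
					  int nodeid, struct sketch_node **out)
{
	/* Stage 1: keep the array index in bounds (the fixed VM_BUG_ON). */
	if (nodeid < 0 || nodeid >= SKETCH_MAX_NUMNODES)
		return SKETCH_OUT_OF_RANGE;

	/* Stage 2: in range, but no per-node data (the BUG_ON(!n)). */
	if (c->node[nodeid] == NULL)
		return SKETCH_NO_NODE;

	*out = c->node[nodeid];
	return SKETCH_OK;
}
```

For example, with only node[16] populated, ID 16 yields SKETCH_OK, ID 2 yields SKETCH_NO_NODE, and -1 or 256 yield SKETCH_OUT_OF_RANGE.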

Thread overview:
2014-12-01  4:28 [PATCH v2] slab: Fix nodeid bounds check for non-contiguous node IDs Paul Mackerras
2014-12-01  4:58 ` Yasuaki Ishimatsu
2014-12-01  5:02 ` Michael Ellerman
2014-12-01  5:24   ` Paul Mackerras
2014-12-01  8:52     ` Michael Ellerman
2014-12-01  9:22 ` Pekka Enberg
2014-12-01 21:06 ` David Rientjes
