* FIX [1/2] slub: Do not dereference NULL pointer in node_match
[not found] <20130123214514.370647954@linux.com>
@ 2013-01-23 21:45 ` Christoph Lameter
2013-01-23 21:45 ` Christoph Lameter
1 sibling, 0 replies; 16+ messages in thread
From: Christoph Lameter @ 2013-01-23 21:45 UTC (permalink / raw)
To: Pekka Enberg
Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm,
David Rientjes, elezegarcia
The variables accessed in slab_alloc are volatile and therefore
the page pointer passed to node_match can be NULL. The processing
of data in slab_alloc is tentative until either the cmpxchg
succeeds or the __slab_alloc slowpath is invoked. Both are
able to perform the same allocation from the freelist.
Check for the NULL pointer in node_match.
A false positive will lead to a retry of the loop in __slab_alloc.
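The short-circuit behavior of the fix can be sketched in a standalone userspace model (hypothetical stand-in types, not the kernel's real struct page; the kernel version is additionally wrapped in #ifdef CONFIG_NUMA):

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)

struct page { int nid; };                 /* toy stand-in for struct page */

static int page_to_nid(const struct page *page)
{
        return page->nid;                 /* would fault if page were NULL */
}

static int node_match(struct page *page, int node)
{
        /* !page is evaluated first and || short-circuits, so
         * page_to_nid() can never be reached with a NULL pointer. */
        if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
                return 0;
        return 1;
}
```

Returning 0 for a NULL page is the "false positive" mentioned above: slab_alloc treats it like a node mismatch and falls back to __slab_alloc.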
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c 2013-01-18 08:47:29.198954250 -0600
+++ linux/mm/slub.c 2013-01-18 08:47:40.579126371 -0600
@@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache
static inline int node_match(struct page *page, int node)
{
#ifdef CONFIG_NUMA
- if (node != NUMA_NO_NODE && page_to_nid(page) != node)
+ if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
return 0;
#endif
return 1;
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org
^ permalink raw reply [flat|nested] 16+ messages in thread
* FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor.
[not found] <20130123214514.370647954@linux.com>
@ 2013-01-23 21:45 ` Christoph Lameter
2013-01-23 21:45 ` Christoph Lameter
1 sibling, 0 replies; 16+ messages in thread
From: Christoph Lameter @ 2013-01-23 21:45 UTC (permalink / raw)
To: Pekka Enberg
Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm,
David Rientjes, elezegarcia
As Steven Rostedt has pointed out: Rescheduling could occur on a different processor
after the determination of the per cpu pointer and before the tid is retrieved.
This could result in allocation from the wrong node in slab_alloc.
The effect is much more severe in slab_free() where we could free to the freelist
of the wrong page.
The window for something like that occurring is pretty small but it is possible.
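The window can be made concrete with a deterministic userspace model (hypothetical and single-threaded; "migration" is simulated by flipping an index rather than by real preemption):

```c
struct kmem_cache_cpu { unsigned long tid; };    /* toy percpu slab data */

static struct kmem_cache_cpu cpu_slab[2] = { { .tid = 10 }, { .tid = 20 } };
static int current_cpu;                          /* the running processor */

/* Buggy order: the percpu pointer is resolved, then the task "migrates"
 * before tid is read, so tid comes from a different cpu's area. */
static unsigned long read_tid_buggy(struct kmem_cache_cpu **cp)
{
        *cp = &cpu_slab[current_cpu];
        current_cpu ^= 1;                        /* simulated reschedule */
        return cpu_slab[current_cpu].tid;        /* wrong cpu's tid */
}

/* Fixed order: no migration can occur between resolving the pointer and
 * reading tid (what preempt_disable()/preempt_enable() guarantee below). */
static unsigned long read_tid_fixed(struct kmem_cache_cpu **cp)
{
        *cp = &cpu_slab[current_cpu];
        return (*cp)->tid;                       /* same cpu area */
}
```

In the buggy order the tid no longer belongs to the percpu area the fastpath goes on to use, which is what permits a free to the freelist of the wrong page.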
Signed-off-by: Christoph Lameter <cl@linux.com>
Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c 2013-01-23 15:06:39.805154107 -0600
+++ linux/mm/slub.c 2013-01-23 15:24:47.656868067 -0600
@@ -2331,13 +2331,18 @@ static __always_inline void *slab_alloc_
s = memcg_kmem_get_cache(s, gfpflags);
redo:
-
/*
* Must read kmem_cache cpu data via this cpu ptr. Preemption is
* enabled. We may switch back and forth between cpus while
* reading from one cpu area. That does not matter as long
* as we end up on the original cpu again when doing the cmpxchg.
+ *
+ * Preemption is disabled for the retrieval of the tid because that
+ * must occur from the current processor. We cannot allow rescheduling
+ * on a different processor between the determination of the pointer
+ * and the retrieval of the tid.
*/
+ preempt_disable();
c = __this_cpu_ptr(s->cpu_slab);
/*
@@ -2347,7 +2352,7 @@ redo:
* linked list in between.
*/
tid = c->tid;
- barrier();
+ preempt_enable();
object = c->freelist;
page = c->page;
@@ -2594,10 +2599,11 @@ redo:
* data is retrieved via this pointer. If we are on the same cpu
* during the cmpxchg then the free will succedd.
*/
+ preempt_disable();
c = __this_cpu_ptr(s->cpu_slab);
tid = c->tid;
- barrier();
+ preempt_enable();
if (likely(page == c->page)) {
set_freepointer(s, object, c->freelist);
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
2013-01-23 21:45 ` Christoph Lameter
@ 2013-01-24 0:53 ` Simon Jeons
-1 siblings, 0 replies; 16+ messages in thread
From: Simon Jeons @ 2013-01-24 0:53 UTC (permalink / raw)
To: Christoph Lameter
Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT,
Clark Williams, John Kacur, Luis Claudio R. Goncalves,
Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes,
elezegarcia
On Wed, 2013-01-23 at 21:45 +0000, Christoph Lameter wrote:
> The variables accessed in slab_alloc are volatile and therefore
> the page pointer passed to node_match can be NULL. The processing
> of data in slab_alloc is tentative until either the cmpxchg
> succeeds or the __slab_alloc slowpath is invoked. Both are
> able to perform the same allocation from the freelist.
>
> Check for the NULL pointer in node_match.
>
> A false positive will lead to a retry of the loop in __slab_alloc.
Hi Christoph,
Since page_to_nid(NULL) will trigger a bug, how can we run into
__slab_alloc?
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c 2013-01-18 08:47:29.198954250 -0600
> +++ linux/mm/slub.c 2013-01-18 08:47:40.579126371 -0600
> @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache
> static inline int node_match(struct page *page, int node)
> {
> #ifdef CONFIG_NUMA
> - if (node != NUMA_NO_NODE && page_to_nid(page) != node)
> + if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
> return 0;
> #endif
> return 1;
>
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
2013-01-24 0:53 ` Simon Jeons
(?)
@ 2013-01-24 15:14 ` Christoph Lameter
2013-01-25 8:11 ` Simon Jeons
-1 siblings, 1 reply; 16+ messages in thread
From: Christoph Lameter @ 2013-01-24 15:14 UTC (permalink / raw)
To: Simon Jeons
Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT,
Clark Williams, John Kacur, Luis Claudio R. Goncalves,
Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes,
elezegarcia
On Wed, 23 Jan 2013, Simon Jeons wrote:
> On Wed, 2013-01-23 at 21:45 +0000, Christoph Lameter wrote:
> > The variables accessed in slab_alloc are volatile and therefore
> > the page pointer passed to node_match can be NULL. The processing
> > of data in slab_alloc is tentative until either the cmpxchg
> > succeeds or the __slab_alloc slowpath is invoked. Both are
> > able to perform the same allocation from the freelist.
> >
> > Check for the NULL pointer in node_match.
> >
> > A false positive will lead to a retry of the loop in __slab_alloc.
>
> Hi Christoph,
>
> Since page_to_nid(NULL) will trigger bug, then how can run into
> __slab_alloc?
page = NULL
->
node_match(NULL, xx) = 0
->
call into __slab_alloc.
__slab_alloc() will check for !c->page which requires the assignment of a
new per cpu slab page.
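That control flow can be sketched as a hypothetical userspace model (node_match is the fixed version from patch 1/2, rewritten in positive form):

```c
#include <stddef.h>

#define NUMA_NO_NODE (-1)

struct page { int nid; };

static int node_match(struct page *page, int node)
{
        /* Equivalent (by De Morgan) to the patched condition. */
        return page && (node == NUMA_NO_NODE || page->nid == node);
}

enum path { FASTPATH, SLOWPATH };

/* With page == NULL, node_match() fails and the allocation takes the
 * slowpath, where __slab_alloc() notices !c->page and installs a new
 * percpu slab page before retrying. */
static enum path slab_alloc_path(struct page *page, int node)
{
        if (!node_match(page, node))
                return SLOWPATH;        /* __slab_alloc() */
        return FASTPATH;
}
```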
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
2013-01-24 15:14 ` Christoph Lameter
@ 2013-01-25 8:11 ` Simon Jeons
0 siblings, 0 replies; 16+ messages in thread
From: Simon Jeons @ 2013-01-25 8:11 UTC (permalink / raw)
To: Christoph Lameter
Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT,
Clark Williams, John Kacur, Luis Claudio R. Goncalves,
Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes,
elezegarcia
On Thu, 2013-01-24 at 15:14 +0000, Christoph Lameter wrote:
> On Wed, 23 Jan 2013, Simon Jeons wrote:
>
> > On Wed, 2013-01-23 at 21:45 +0000, Christoph Lameter wrote:
> > > The variables accessed in slab_alloc are volatile and therefore
> > > the page pointer passed to node_match can be NULL. The processing
> > > of data in slab_alloc is tentative until either the cmpxchg
> > > succeeds or the __slab_alloc slowpath is invoked. Both are
> > > able to perform the same allocation from the freelist.
> > >
> > > Check for the NULL pointer in node_match.
> > >
> > > A false positive will lead to a retry of the loop in __slab_alloc.
> >
> > Hi Christoph,
> >
> > Since page_to_nid(NULL) will trigger bug, then how can run into
> > __slab_alloc?
>
> page = NULL
>
> ->
>
> node_match(NULL, xx) = 0
>
> ->
>
> call into __slab_alloc.
>
> __slab_alloc() will check for !c->page which requires the assignment of a
> new per cpu slab page.
>
But there is a dereference in the page_to_nid path, in page_to_section():
return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
2013-01-25 8:11 ` Simon Jeons
(?)
@ 2013-01-25 14:53 ` Christoph Lameter
-1 siblings, 0 replies; 16+ messages in thread
From: Christoph Lameter @ 2013-01-25 14:53 UTC (permalink / raw)
To: Simon Jeons
Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT,
Clark Williams, John Kacur, Luis Claudio R. Goncalves,
Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes,
elezegarcia
On Fri, 25 Jan 2013, Simon Jeons wrote:
> >
> > node_match(NULL, xx) = 0
> >
> > ->
> >
> > call into __slab_alloc.
> >
> > __slab_alloc() will check for !c->page which requires the assignment of a
> > new per cpu slab page.
> >
>
> But there are dereference in page_to_nid path, function page_to_section:
> return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;
node_match() checks for NULL and will not invoke page_to_nid for a NULL
pointer.
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
2013-01-23 21:45 ` Christoph Lameter
@ 2013-02-01 10:23 ` Pekka Enberg
-1 siblings, 0 replies; 16+ messages in thread
From: Pekka Enberg @ 2013-02-01 10:23 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm,
David Rientjes, elezegarcia
On Wed, Jan 23, 2013 at 11:45 PM, Christoph Lameter <cl@linux.com> wrote:
> The variables accessed in slab_alloc are volatile and therefore
> the page pointer passed to node_match can be NULL. The processing
> of data in slab_alloc is tentative until either the cmpxchg
> succeeds or the __slab_alloc slowpath is invoked. Both are
> able to perform the same allocation from the freelist.
>
> Check for the NULL pointer in node_match.
>
> A false positive will lead to a retry of the loop in __slab_alloc.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
Steven, how did you trigger the problem - i.e. is this an -rt-only
problem? Does the patch work for you?
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c 2013-01-18 08:47:29.198954250 -0600
> +++ linux/mm/slub.c 2013-01-18 08:47:40.579126371 -0600
> @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache
> static inline int node_match(struct page *page, int node)
> {
> #ifdef CONFIG_NUMA
> - if (node != NUMA_NO_NODE && page_to_nid(page) != node)
> + if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
> return 0;
> #endif
> return 1;
>
* Re: FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor.
2013-01-23 21:45 ` Christoph Lameter
@ 2013-02-01 10:24 ` Pekka Enberg
-1 siblings, 0 replies; 16+ messages in thread
From: Pekka Enberg @ 2013-02-01 10:24 UTC (permalink / raw)
To: Christoph Lameter
Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur,
Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm,
David Rientjes, elezegarcia
On Wed, Jan 23, 2013 at 11:45 PM, Christoph Lameter <cl@linux.com> wrote:
> As Steven Rostedt has pointed out: Rescheduling could occur on a different processor
> after the determination of the per cpu pointer and before the tid is retrieved.
> This could result in allocation from the wrong node in slab_alloc.
>
> The effect is much more severe in slab_free() where we could free to the freelist
> of the wrong page.
>
> The window for something like that occurring is pretty small but it is possible.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>
Okay, makes sense. Has anyone triggered this in practice? Do we want
to tag this for -stable?
>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c 2013-01-23 15:06:39.805154107 -0600
> +++ linux/mm/slub.c 2013-01-23 15:24:47.656868067 -0600
> @@ -2331,13 +2331,18 @@ static __always_inline void *slab_alloc_
>
> s = memcg_kmem_get_cache(s, gfpflags);
> redo:
> -
> /*
> * Must read kmem_cache cpu data via this cpu ptr. Preemption is
> * enabled. We may switch back and forth between cpus while
> * reading from one cpu area. That does not matter as long
> * as we end up on the original cpu again when doing the cmpxchg.
> + *
> + * Preemption is disabled for the retrieval of the tid because that
> + * must occur from the current processor. We cannot allow rescheduling
> + * on a different processor between the determination of the pointer
> + * and the retrieval of the tid.
> */
> + preempt_disable();
> c = __this_cpu_ptr(s->cpu_slab);
>
> /*
> @@ -2347,7 +2352,7 @@ redo:
> * linked list in between.
> */
> tid = c->tid;
> - barrier();
> + preempt_enable();
>
> object = c->freelist;
> page = c->page;
> @@ -2594,10 +2599,11 @@ redo:
> * data is retrieved via this pointer. If we are on the same cpu
> * during the cmpxchg then the free will succedd.
> */
> + preempt_disable();
> c = __this_cpu_ptr(s->cpu_slab);
>
> tid = c->tid;
> - barrier();
> + preempt_enable();
>
> if (likely(page == c->page)) {
> set_freepointer(s, object, c->freelist);
>
* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
2013-02-01 10:23 ` Pekka Enberg
@ 2013-02-01 11:51 ` Steven Rostedt
-1 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2013-02-01 11:51 UTC (permalink / raw)
To: Pekka Enberg
Cc: Christoph Lameter, Thomas Gleixner, RT, Clark Williams,
John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim,
Glauber Costa, linux-mm, David Rientjes, elezegarcia
On Fri, 2013-02-01 at 12:23 +0200, Pekka Enberg wrote:
> On Wed, Jan 23, 2013 at 11:45 PM, Christoph Lameter <cl@linux.com> wrote:
> > The variables accessed in slab_alloc are volatile and therefore
> > the page pointer passed to node_match can be NULL. The processing
> > of data in slab_alloc is tentative until either the cmpxchg
> > succeeds or the __slab_alloc slowpath is invoked. Both are
> > able to perform the same allocation from the freelist.
> >
> > Check for the NULL pointer in node_match.
> >
> > A false positive will lead to a retry of the loop in __slab_alloc.
> >
> > Signed-off-by: Christoph Lameter <cl@linux.com>
>
> Steven, how did you trigger the problem - i.e. is this -rt only
> problem? Does the patch work for you?
I haven't tested Christoph's version yet. I've only tested my own. But
I'll take his and run them through tests as well. This bug is not easy
to hit.
It is not an -rt-only bug, and yes, it probably should go to stable. The
race window is extremely small, but -rt creates scenarios that mainline
may only hit on machines with 1000 CPU cores. Because of its preemptive
nature, -rt is much more susceptible to race conditions than mainline.
But these are real bugs for mainline too: there it may only trigger once
a year, whereas in -rt it will trigger once a week.
-- Steve
>
> > Index: linux/mm/slub.c
> > ===================================================================
> > --- linux.orig/mm/slub.c 2013-01-18 08:47:29.198954250 -0600
> > +++ linux/mm/slub.c 2013-01-18 08:47:40.579126371 -0600
> > @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache
> > static inline int node_match(struct page *page, int node)
> > {
> > #ifdef CONFIG_NUMA
> > - if (node != NUMA_NO_NODE && page_to_nid(page) != node)
> > + if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
> > return 0;
> > #endif
> > return 1;
> >
end of thread, other threads:[~2013-02-01 11:51 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20130123214514.370647954@linux.com>
2013-01-23 21:45 ` FIX [1/2] slub: Do not dereference NULL pointer in node_match Christoph Lameter
2013-01-23 21:45 ` Christoph Lameter
2013-01-24 0:53 ` Simon Jeons
2013-01-24 0:53 ` Simon Jeons
2013-01-24 15:14 ` Christoph Lameter
2013-01-25 8:11 ` Simon Jeons
2013-01-25 8:11 ` Simon Jeons
2013-01-25 14:53 ` Christoph Lameter
2013-02-01 10:23 ` Pekka Enberg
2013-02-01 10:23 ` Pekka Enberg
2013-02-01 11:51 ` Steven Rostedt
2013-02-01 11:51 ` Steven Rostedt
2013-01-23 21:45 ` FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor Christoph Lameter
2013-01-23 21:45 ` Christoph Lameter
2013-02-01 10:24 ` Pekka Enberg
2013-02-01 10:24 ` Pekka Enberg