* FIX [1/2] slub: Do not dereference NULL pointer in node_match
       [not found] <20130123214514.370647954@linux.com>
@ 2013-01-23 21:45   ` Christoph Lameter
  2013-01-23 21:45   ` Christoph Lameter
  1 sibling, 0 replies; 16+ messages in thread
From: Christoph Lameter @ 2013-01-23 21:45 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur,
	Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm,
	David Rientjes, elezegarcia

The variables accessed in slab_alloc are volatile and therefore
the page pointer passed to node_match can be NULL. The processing
of data in slab_alloc is tentative until either the cmpxchg
succeeds or the __slab_alloc slowpath is invoked. Both are
able to perform the same allocation from the freelist.

Check for the NULL pointer in node_match.

A false positive will lead to a retry of the loop in __slab_alloc.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2013-01-18 08:47:29.198954250 -0600
+++ linux/mm/slub.c	2013-01-18 08:47:40.579126371 -0600
@@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache
 static inline int node_match(struct page *page, int node)
 {
 #ifdef CONFIG_NUMA
-	if (node != NUMA_NO_NODE && page_to_nid(page) != node)
+	if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
 		return 0;
 #endif
 	return 1;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 16+ messages in thread

* FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor.
       [not found] <20130123214514.370647954@linux.com>
@ 2013-01-23 21:45   ` Christoph Lameter
  2013-01-23 21:45   ` Christoph Lameter
  1 sibling, 0 replies; 16+ messages in thread
From: Christoph Lameter @ 2013-01-23 21:45 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur,
	Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm,
	David Rientjes, elezegarcia

As Steven Rostedt has pointed out: rescheduling could occur on a different processor
after the determination of the per-cpu pointer and before the tid is retrieved.
This could result in allocation from the wrong node in slab_alloc.

The effect is much more severe in slab_free() where we could free to the freelist
of the wrong page.

The window for something like that occurring is pretty small but it is possible.

Signed-off-by: Christoph Lameter <cl@linux.com>

Index: linux/mm/slub.c
===================================================================
--- linux.orig/mm/slub.c	2013-01-23 15:06:39.805154107 -0600
+++ linux/mm/slub.c	2013-01-23 15:24:47.656868067 -0600
@@ -2331,13 +2331,18 @@ static __always_inline void *slab_alloc_
 
 	s = memcg_kmem_get_cache(s, gfpflags);
 redo:
-
 	/*
 	 * Must read kmem_cache cpu data via this cpu ptr. Preemption is
 	 * enabled. We may switch back and forth between cpus while
 	 * reading from one cpu area. That does not matter as long
 	 * as we end up on the original cpu again when doing the cmpxchg.
+	 *
+	 * Preemption is disabled for the retrieval of the tid because that
+	 * must occur from the current processor. We cannot allow rescheduling
+	 * on a different processor between the determination of the pointer
+	 * and the retrieval of the tid.
 	 */
+	preempt_disable();
 	c = __this_cpu_ptr(s->cpu_slab);
 
 	/*
@@ -2347,7 +2352,7 @@ redo:
 	 * linked list in between.
 	 */
 	tid = c->tid;
-	barrier();
+	preempt_enable();
 
 	object = c->freelist;
 	page = c->page;
@@ -2594,10 +2599,11 @@ redo:
 	 * data is retrieved via this pointer. If we are on the same cpu
 	 * during the cmpxchg then the free will succedd.
 	 */
+	preempt_disable();
 	c = __this_cpu_ptr(s->cpu_slab);
 
 	tid = c->tid;
-	barrier();
+	preempt_enable();
 
 	if (likely(page == c->page)) {
 		set_freepointer(s, object, c->freelist);


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
  2013-01-23 21:45   ` Christoph Lameter
@ 2013-01-24  0:53     ` Simon Jeons
  -1 siblings, 0 replies; 16+ messages in thread
From: Simon Jeons @ 2013-01-24  0:53 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT,
	Clark Williams, John Kacur, Luis Claudio R. Goncalves,
	Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes,
	elezegarcia

On Wed, 2013-01-23 at 21:45 +0000, Christoph Lameter wrote:
> The variables accessed in slab_alloc are volatile and therefore
> the page pointer passed to node_match can be NULL. The processing
> of data in slab_alloc is tentative until either the cmpxchg
> succeeds or the __slab_alloc slowpath is invoked. Both are
> able to perform the same allocation from the freelist.
> 
> Check for the NULL pointer in node_match.
> 
> A false positive will lead to a retry of the loop in __slab_alloc.

Hi Christoph,

Since page_to_nid(NULL) will trigger a bug, how can we get into
__slab_alloc?

> 
> Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c	2013-01-18 08:47:29.198954250 -0600
> +++ linux/mm/slub.c	2013-01-18 08:47:40.579126371 -0600
> @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache
>  static inline int node_match(struct page *page, int node)
>  {
>  #ifdef CONFIG_NUMA
> -	if (node != NUMA_NO_NODE && page_to_nid(page) != node)
> +	if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
>  		return 0;
>  #endif
>  	return 1;
> 



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
  2013-01-24  0:53     ` Simon Jeons
  (?)
@ 2013-01-24 15:14     ` Christoph Lameter
  2013-01-25  8:11         ` Simon Jeons
  -1 siblings, 1 reply; 16+ messages in thread
From: Christoph Lameter @ 2013-01-24 15:14 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT,
	Clark Williams, John Kacur, Luis Claudio R. Goncalves,
	Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes,
	elezegarcia

On Wed, 23 Jan 2013, Simon Jeons wrote:

> On Wed, 2013-01-23 at 21:45 +0000, Christoph Lameter wrote:
> > The variables accessed in slab_alloc are volatile and therefore
> > the page pointer passed to node_match can be NULL. The processing
> > of data in slab_alloc is tentative until either the cmpxchg
> > succeeds or the __slab_alloc slowpath is invoked. Both are
> > able to perform the same allocation from the freelist.
> >
> > Check for the NULL pointer in node_match.
> >
> > A false positive will lead to a retry of the loop in __slab_alloc.
>
> Hi Christoph,
>
> Since page_to_nid(NULL) will trigger a bug, how can we get into
> __slab_alloc?

page = NULL

	 ->

node_match(NULL, xx) = 0

 	->

call into __slab_alloc.

__slab_alloc() will check for !c->page which requires the assignment of a
new per cpu slab page.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
  2013-01-24 15:14     ` Christoph Lameter
@ 2013-01-25  8:11         ` Simon Jeons
  0 siblings, 0 replies; 16+ messages in thread
From: Simon Jeons @ 2013-01-25  8:11 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT,
	Clark Williams, John Kacur, Luis Claudio R. Goncalves,
	Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes,
	elezegarcia

On Thu, 2013-01-24 at 15:14 +0000, Christoph Lameter wrote:
> On Wed, 23 Jan 2013, Simon Jeons wrote:
> 
> > On Wed, 2013-01-23 at 21:45 +0000, Christoph Lameter wrote:
> > > The variables accessed in slab_alloc are volatile and therefore
> > > the page pointer passed to node_match can be NULL. The processing
> > > of data in slab_alloc is tentative until either the cmpxchg
> > > succeeds or the __slab_alloc slowpath is invoked. Both are
> > > able to perform the same allocation from the freelist.
> > >
> > > Check for the NULL pointer in node_match.
> > >
> > > A false positive will lead to a retry of the loop in __slab_alloc.
> >
> > Hi Christoph,
> >
> > Since page_to_nid(NULL) will trigger a bug, how can we get into
> > __slab_alloc?
> 
> page = NULL
> 
> 	 ->
> 
> node_match(NULL, xx) = 0
> 
>  	->
> 
> call into __slab_alloc.
> 
> __slab_alloc() will check for !c->page which requires the assignment of a
> new per cpu slab page.
> 

But there is a dereference in the page_to_nid path, in page_to_section:
return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
  2013-01-25  8:11         ` Simon Jeons
  (?)
@ 2013-01-25 14:53         ` Christoph Lameter
  -1 siblings, 0 replies; 16+ messages in thread
From: Christoph Lameter @ 2013-01-25 14:53 UTC (permalink / raw)
  To: Simon Jeons
  Cc: Pekka Enberg, Steven Rostedt, Thomas Gleixner, RT,
	Clark Williams, John Kacur, Luis Claudio R. Goncalves,
	Joonsoo Kim, Glauber Costa, linux-mm, David Rientjes,
	elezegarcia

On Fri, 25 Jan 2013, Simon Jeons wrote:

> >
> > node_match(NULL, xx) = 0
> >
> >  	->
> >
> > call into __slab_alloc.
> >
> > __slab_alloc() will check for !c->page which requires the assignment of a
> > new per cpu slab page.
> >
>
> But there is a dereference in the page_to_nid path, in page_to_section:
> return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK;

node_match() checks for NULL and will not invoke page_to_nid for a NULL
pointer.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
  2013-01-23 21:45   ` Christoph Lameter
@ 2013-02-01 10:23     ` Pekka Enberg
  -1 siblings, 0 replies; 16+ messages in thread
From: Pekka Enberg @ 2013-02-01 10:23 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur,
	Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm,
	David Rientjes, elezegarcia

On Wed, Jan 23, 2013 at 11:45 PM, Christoph Lameter <cl@linux.com> wrote:
> The variables accessed in slab_alloc are volatile and therefore
> the page pointer passed to node_match can be NULL. The processing
> of data in slab_alloc is tentative until either the cmpxchg
> succeeds or the __slab_alloc slowpath is invoked. Both are
> able to perform the same allocation from the freelist.
>
> Check for the NULL pointer in node_match.
>
> A false positive will lead to a retry of the loop in __slab_alloc.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>

Steven, how did you trigger the problem - i.e. is this an -rt-only
problem? Does the patch work for you?

> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c        2013-01-18 08:47:29.198954250 -0600
> +++ linux/mm/slub.c     2013-01-18 08:47:40.579126371 -0600
> @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache
>  static inline int node_match(struct page *page, int node)
>  {
>  #ifdef CONFIG_NUMA
> -       if (node != NUMA_NO_NODE && page_to_nid(page) != node)
> +       if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
>                 return 0;
>  #endif
>         return 1;
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor.
  2013-01-23 21:45   ` Christoph Lameter
@ 2013-02-01 10:24     ` Pekka Enberg
  -1 siblings, 0 replies; 16+ messages in thread
From: Pekka Enberg @ 2013-02-01 10:24 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Steven Rostedt, Thomas Gleixner, RT, Clark Williams, John Kacur,
	Luis Claudio R. Goncalves, Joonsoo Kim, Glauber Costa, linux-mm,
	David Rientjes, elezegarcia

On Wed, Jan 23, 2013 at 11:45 PM, Christoph Lameter <cl@linux.com> wrote:
> As Steven Rostedt has pointed out: rescheduling could occur on a different processor
> after the determination of the per cpu pointer and before the tid is retrieved.
> This could result in allocation from the wrong node in slab_alloc.
>
> The effect is much more severe in slab_free() where we could free to the freelist
> of the wrong page.
>
> The window for something like that occurring is pretty small but it is possible.
>
> Signed-off-by: Christoph Lameter <cl@linux.com>

Okay, makes sense. Has anyone triggered this in practice? Do we want
to tag this for -stable?

>
> Index: linux/mm/slub.c
> ===================================================================
> --- linux.orig/mm/slub.c        2013-01-23 15:06:39.805154107 -0600
> +++ linux/mm/slub.c     2013-01-23 15:24:47.656868067 -0600
> @@ -2331,13 +2331,18 @@ static __always_inline void *slab_alloc_
>
>         s = memcg_kmem_get_cache(s, gfpflags);
>  redo:
> -
>         /*
>          * Must read kmem_cache cpu data via this cpu ptr. Preemption is
>          * enabled. We may switch back and forth between cpus while
>          * reading from one cpu area. That does not matter as long
>          * as we end up on the original cpu again when doing the cmpxchg.
> +        *
> +        * Preemption is disabled for the retrieval of the tid because that
> +        * must occur from the current processor. We cannot allow rescheduling
> +        * on a different processor between the determination of the pointer
> +        * and the retrieval of the tid.
>          */
> +       preempt_disable();
>         c = __this_cpu_ptr(s->cpu_slab);
>
>         /*
> @@ -2347,7 +2352,7 @@ redo:
>          * linked list in between.
>          */
>         tid = c->tid;
> -       barrier();
> +       preempt_enable();
>
>         object = c->freelist;
>         page = c->page;
> @@ -2594,10 +2599,11 @@ redo:
>          * data is retrieved via this pointer. If we are on the same cpu
>          * during the cmpxchg then the free will succedd.
>          */
> +       preempt_disable();
>         c = __this_cpu_ptr(s->cpu_slab);
>
>         tid = c->tid;
> -       barrier();
> +       preempt_enable();
>
>         if (likely(page == c->page)) {
>                 set_freepointer(s, object, c->freelist);
>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
  2013-02-01 10:23     ` Pekka Enberg
@ 2013-02-01 11:51       ` Steven Rostedt
  -1 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2013-02-01 11:51 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Thomas Gleixner, RT, Clark Williams,
	John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim,
	Glauber Costa, linux-mm, David Rientjes, elezegarcia

On Fri, 2013-02-01 at 12:23 +0200, Pekka Enberg wrote:
> On Wed, Jan 23, 2013 at 11:45 PM, Christoph Lameter <cl@linux.com> wrote:
> > The variables accessed in slab_alloc are volatile and therefore
> > the page pointer passed to node_match can be NULL. The processing
> > of data in slab_alloc is tentative until either the cmpxchg
> > succeeds or the __slab_alloc slowpath is invoked. Both are
> > able to perform the same allocation from the freelist.
> >
> > Check for the NULL pointer in node_match.
> >
> > A false positive will lead to a retry of the loop in __slab_alloc.
> >
> > Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Steven, how did you trigger the problem - i.e. is this an -rt-only
> problem? Does the patch work for you?

I haven't tested Christoph's version yet. I've only tested my own. But
I'll take his and run them through tests as well. This bug is not easy
to hit.

It is not an -rt-only bug, and yes, it probably should go to stable. The
race window is extremely small, but -rt creates scenarios that mainline
might otherwise only hit on machines with 1000 CPU cores. Because of its
preemptive nature, -rt is much more susceptible to race conditions than
mainline. But these are real bugs for mainline too; mainline may trigger
the race once a year, where -rt will trigger it once a week.

-- Steve

> 
> > Index: linux/mm/slub.c
> > ===================================================================
> > --- linux.orig/mm/slub.c        2013-01-18 08:47:29.198954250 -0600
> > +++ linux/mm/slub.c     2013-01-18 08:47:40.579126371 -0600
> > @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache
> >  static inline int node_match(struct page *page, int node)
> >  {
> >  #ifdef CONFIG_NUMA
> > -       if (node != NUMA_NO_NODE && page_to_nid(page) != node)
> > +       if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
> >                 return 0;
> >  #endif
> >         return 1;
> >
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-mm' in
> > the body to majordomo@kvack.org.  For more info on Linux MM,
> > see: http://www.linux-mm.org/ .
> > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: FIX [1/2] slub: Do not dereference NULL pointer in node_match
@ 2013-02-01 11:51       ` Steven Rostedt
  0 siblings, 0 replies; 16+ messages in thread
From: Steven Rostedt @ 2013-02-01 11:51 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Thomas Gleixner, RT, Clark Williams,
	John Kacur, Luis Claudio R. Goncalves, Joonsoo Kim,
	Glauber Costa, linux-mm, David Rientjes, elezegarcia

On Fri, 2013-02-01 at 12:23 +0200, Pekka Enberg wrote:
> On Wed, Jan 23, 2013 at 11:45 PM, Christoph Lameter <cl@linux.com> wrote:
> > The variables accessed in slab_alloc are volatile and therefore
> > the page pointer passed to node_match can be NULL. The processing
> > of data in slab_alloc is tentative until either the cmpxchg
> > succeeds or the __slab_alloc slowpath is invoked. Both are
> > able to perform the same allocation from the freelist.
> >
> > Check for the NULL pointer in node_match.
> >
> > A false positive will lead to a retry of the loop in __slab_alloc.
> >
> > Signed-off-by: Christoph Lameter <cl@linux.com>
> 
> Steven, how did you trigger the problem - i.e. is this -rt only
> problem? Does the patch work for you?

I haven't tested Christoph's version yet. I've only tested my own. But
I'll take his and run them through tests as well. This bug is not easy
to hit.

It is not a -rt only bug, and yes it probably should go to stable. The
race is extremely small, but -rt creates scenarios that may only be hit
by 1000 CPU core machines. Because of the preemptive nature of -rt, -rt
is much more susceptible to race conditions than mainline. But these are
real bugs for mainline too. It may only trigger once a year in mainline,
whereas in -rt it will trigger once a week.

-- Steve

> 
> > Index: linux/mm/slub.c
> > ===================================================================
> > --- linux.orig/mm/slub.c        2013-01-18 08:47:29.198954250 -0600
> > +++ linux/mm/slub.c     2013-01-18 08:47:40.579126371 -0600
> > @@ -2041,7 +2041,7 @@ static void flush_all(struct kmem_cache
> >  static inline int node_match(struct page *page, int node)
> >  {
> >  #ifdef CONFIG_NUMA
> > -       if (node != NUMA_NO_NODE && page_to_nid(page) != node)
> > +       if (!page || (node != NUMA_NO_NODE && page_to_nid(page) != node))
> >                 return 0;
> >  #endif
> >         return 1;
> >



^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-02-01 11:51 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20130123214514.370647954@linux.com>
2013-01-23 21:45 ` FIX [1/2] slub: Do not dereference NULL pointer in node_match Christoph Lameter
2013-01-23 21:45   ` Christoph Lameter
2013-01-24  0:53   ` Simon Jeons
2013-01-24  0:53     ` Simon Jeons
2013-01-24 15:14     ` Christoph Lameter
2013-01-25  8:11       ` Simon Jeons
2013-01-25  8:11         ` Simon Jeons
2013-01-25 14:53         ` Christoph Lameter
2013-02-01 10:23   ` Pekka Enberg
2013-02-01 10:23     ` Pekka Enberg
2013-02-01 11:51     ` Steven Rostedt
2013-02-01 11:51       ` Steven Rostedt
2013-01-23 21:45 ` FIX [2/2] slub: tid must be retrieved from the percpu area of the current processor Christoph Lameter
2013-01-23 21:45   ` Christoph Lameter
2013-02-01 10:24   ` Pekka Enberg
2013-02-01 10:24     ` Pekka Enberg
