* [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup
@ 2015-11-05 15:37 Jesper Dangaard Brouer
  2015-11-05 15:37 ` [PATCH V2 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
                   ` (2 more replies)
  0 siblings, 3 replies; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-05 15:37 UTC (permalink / raw)
  To: linux-mm
  Cc: vdavydov, Joonsoo Kim, Andrew Morton, Christoph Lameter,
	Jesper Dangaard Brouer

I have fixed some bugs in the kmem cgroup interaction with the SLUB
bulk API and compiled the kernel with CONFIG_MEMCG_KMEM=y, but I don't
know how to set up kmem cgroups for slab, thus this is mostly untested.

I would appreciate it if anyone could give me a simple setup script...

---

Jesper Dangaard Brouer (2):
      slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
      slub: add missing kmem cgroup support to kmem_cache_free_bulk


 mm/slub.c |   17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

--

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org


* [PATCH V2 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-05 15:37 [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup Jesper Dangaard Brouer
@ 2015-11-05 15:37 ` Jesper Dangaard Brouer
  2015-11-05 16:18   ` Vladimir Davydov
  2015-11-05 15:38 ` [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
  2015-11-05 16:10 ` [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup Vladimir Davydov
  2 siblings, 1 reply; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-05 15:37 UTC (permalink / raw)
  To: linux-mm
  Cc: vdavydov, Joonsoo Kim, Andrew Morton, Christoph Lameter,
	Jesper Dangaard Brouer

The call slab_pre_alloc_hook() interacts with kmem cgroups (memcg) and
must not be called several times inside the bulk alloc for-loop,
due to the call to memcg_kmem_get_cache().

This would result in hitting the VM_BUG_ON in __memcg_kmem_get_cache.

To match the number of "put" calls, the call to memcg_kmem_put_cache()
is moved out of slab_post_alloc_hook().
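The resulting pattern is exactly one memcg "get" per bulk call, balanced by
exactly one "put", instead of one pair per object. A minimal userspace model
of that pairing (the names only mirror the kernel's hooks; the bodies are
stand-ins, not the real implementation):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a kmem_cache carrying a memcg reference count. */
struct cache { int memcg_refs; };

/* Models slab_pre_alloc_hook() -> memcg_kmem_get_cache(). */
static struct cache *pre_alloc_hook(struct cache *s)
{
	s->memcg_refs++;
	return s;
}

/* Models memcg_kmem_put_cache(). */
static void put_cache(struct cache *s)
{
	s->memcg_refs--;
}

/* Bulk alloc: one "get" before the loop, one "put" on the exit path. */
static int bulk_alloc(struct cache *s, size_t size, void **p)
{
	size_t i;

	s = pre_alloc_hook(s);
	if (!s)
		return 0;
	for (i = 0; i < size; i++)
		p[i] = (void *)s;	/* placeholder object */
	put_cache(s);
	return 1;
}
```

Calling the hook inside the loop, as the old code did, would increment the
reference once per object while only one put ever ran, which is the imbalance
the VM_BUG_ON catches.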

Reported-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 mm/slub.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 9be12ffae9fc..8e9e9b2ee6f3 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1298,7 +1298,6 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
 	flags &= gfp_allowed_mask;
 	kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
 	kmemleak_alloc_recursive(object, s->object_size, 1, s->flags, flags);
-	memcg_kmem_put_cache(s);
 	kasan_slab_alloc(s, object);
 }
 
@@ -2557,6 +2556,7 @@ redo:
 		memset(object, 0, s->object_size);
 
 	slab_post_alloc_hook(s, gfpflags, object);
+	memcg_kmem_put_cache(s);
 
 	return object;
 }
@@ -2906,6 +2906,11 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	struct kmem_cache_cpu *c;
 	int i;
 
+	/* memcg and kmem_cache debug support */
+	s = slab_pre_alloc_hook(s, flags);
+	if (unlikely(!s))
+		return false;
+
 	/*
 	 * Drain objects in the per cpu slab, while disabling local
 	 * IRQs, which protects against PREEMPT and interrupts
@@ -2931,11 +2936,6 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			continue; /* goto for-loop */
 		}
 
-		/* kmem_cache debug support */
-		s = slab_pre_alloc_hook(s, flags);
-		if (unlikely(!s))
-			goto error;
-
 		c->freelist = get_freepointer(s, object);
 		p[i] = object;
 
@@ -2953,9 +2953,11 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			memset(p[j], 0, s->object_size);
 	}
 
+	memcg_kmem_put_cache(s);
 	return true;
 
 error:
+	memcg_kmem_put_cache(s);
 	__kmem_cache_free_bulk(s, i, p);
 	local_irq_enable();
 	return false;


* [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk
  2015-11-05 15:37 [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup Jesper Dangaard Brouer
  2015-11-05 15:37 ` [PATCH V2 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
@ 2015-11-05 15:38 ` Jesper Dangaard Brouer
  2015-11-05 16:25   ` Vladimir Davydov
  2015-11-05 16:10 ` [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup Vladimir Davydov
  2 siblings, 1 reply; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-05 15:38 UTC (permalink / raw)
  To: linux-mm
  Cc: vdavydov, Joonsoo Kim, Andrew Morton, Christoph Lameter,
	Jesper Dangaard Brouer

The initial implementation missed kmem cgroup support in the
kmem_cache_free_bulk() call; add it.

If CONFIG_MEMCG_KMEM is not enabled, the compiler should
be smart enough not to add any asm code.
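The added cache_from_obj() call matters because, with memcg enabled, the
cache handle a caller frees through may be the root cache while the object
actually lives in a per-cgroup child cache; the kernel derives the true owner
from the object's slab page. A toy userspace model of that lookup (here each
object simply records its owner, which is an assumption made for the sketch):

```c
#include <assert.h>
#include <stddef.h>

/* A cache whose "root" points to itself when it is a root cache. */
struct cache { struct cache *root; };

/* Toy object header recording the owning (possibly per-memcg) cache. */
struct obj { struct cache *owner; };

/* Models cache_from_obj(): return the object's real cache, after a
 * sanity check that the caller passed that cache or its root. */
static struct cache *cache_from_obj(struct cache *s, struct obj *o)
{
	assert(o->owner == s || o->owner->root == s);
	return o->owner;
}
```

Freeing through the root cache still resolves to the per-memcg child, which
is what keeps the memcg accounting correct on the free path.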

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
V2: Fixes according to input from:
 Vladimir Davydov <vdavydov@virtuozzo.com>
 and Joonsoo Kim <iamjoonsoo.kim@lge.com>

 mm/slub.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/slub.c b/mm/slub.c
index 8e9e9b2ee6f3..bc64514ad1bb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2890,6 +2890,9 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 	do {
 		struct detached_freelist df;
 
+		/* Support for memcg */
+		s = cache_from_obj(s, p[size - 1]);
+
 		size = build_detached_freelist(s, size, p, &df);
 		if (unlikely(!df.page))
 			continue;


* Re: [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup
  2015-11-05 15:37 [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup Jesper Dangaard Brouer
  2015-11-05 15:37 ` [PATCH V2 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
  2015-11-05 15:38 ` [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
@ 2015-11-05 16:10 ` Vladimir Davydov
  2015-11-09 18:16   ` [PATCH V3 " Jesper Dangaard Brouer
  2015-11-13 10:57   ` [PATCH V4 0/2] SLUB bulk API interactions with kmem cgroup Jesper Dangaard Brouer
  2 siblings, 2 replies; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-05 16:10 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Thu, Nov 05, 2015 at 04:37:39PM +0100, Jesper Dangaard Brouer wrote:
> I have fixed some bugs in the kmem cgroup interaction with the SLUB
> bulk API and compiled the kernel with CONFIG_MEMCG_KMEM=y, but I don't
> know how to set up kmem cgroups for slab, thus this is mostly untested.
> 
> I would appreciate it if anyone could give me a simple setup script...

# create a memcg
mkdir /sys/fs/cgroup/memory/test

# enable kmem acct *before* putting any tasks in it
echo -1 > /sys/fs/cgroup/memory/test/memory.kmem.limit_in_bytes

# put a task in the cgroup
echo $$ > /sys/fs/cgroup/memory/test/cgroup.procs

# do what you want to do here

# you can check if kmem acct really works by looking at
cat /sys/fs/cgroup/memory/test/memory.kmem.slabinfo
# it should not be empty


* Re: [PATCH V2 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-05 15:37 ` [PATCH V2 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
@ 2015-11-05 16:18   ` Vladimir Davydov
  2015-11-07 16:40     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-05 16:18 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Thu, Nov 05, 2015 at 04:37:51PM +0100, Jesper Dangaard Brouer wrote:
...
> @@ -1298,7 +1298,6 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
>  	flags &= gfp_allowed_mask;
>  	kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
>  	kmemleak_alloc_recursive(object, s->object_size, 1, s->flags, flags);
> -	memcg_kmem_put_cache(s);
>  	kasan_slab_alloc(s, object);
>  }
>  
> @@ -2557,6 +2556,7 @@ redo:
>  		memset(object, 0, s->object_size);
>  
>  	slab_post_alloc_hook(s, gfpflags, object);
> +	memcg_kmem_put_cache(s);

Asymmetric - not good IMO. What about passing array of allocated objects
to slab_post_alloc_hook? Then we could leave memcg_kmem_put_cache where
it is now. I.e here we'd have

	slab_post_alloc_hook(s, gfpflags, &object, 1);

while in kmem_cache_alloc_bulk it'd look like

	slab_post_alloc_hook(s, flags, p, size);

right before return.

>  
>  	return object;
>  }
> @@ -2906,6 +2906,11 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>  	struct kmem_cache_cpu *c;
>  	int i;
>  
> +	/* memcg and kmem_cache debug support */
> +	s = slab_pre_alloc_hook(s, flags);
> +	if (unlikely(!s))
> +		return false;
> +
>  	/*
>  	 * Drain objects in the per cpu slab, while disabling local
>  	 * IRQs, which protects against PREEMPT and interrupts
> @@ -2931,11 +2936,6 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>  			continue; /* goto for-loop */
>  		}
>  
> -		/* kmem_cache debug support */
> -		s = slab_pre_alloc_hook(s, flags);
> -		if (unlikely(!s))
> -			goto error;
> -
>  		c->freelist = get_freepointer(s, object);
>  		p[i] = object;
>  
> @@ -2953,9 +2953,11 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>  			memset(p[j], 0, s->object_size);
>  	}
>  
> +	memcg_kmem_put_cache(s);
>  	return true;
>  
>  error:
> +	memcg_kmem_put_cache(s);

It drops a reference to the cache so better to call it after you are
done with the cache, i.e. right before 'return false'.

Thanks,
Vladimir

>  	__kmem_cache_free_bulk(s, i, p);
>  	local_irq_enable();
>  	return false;
> 


* Re: [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk
  2015-11-05 15:38 ` [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
@ 2015-11-05 16:25   ` Vladimir Davydov
  2015-11-07 16:53     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-05 16:25 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Thu, Nov 05, 2015 at 04:38:06PM +0100, Jesper Dangaard Brouer wrote:
> The initial implementation missed kmem cgroup support in the
> kmem_cache_free_bulk() call; add it.
> 
> If CONFIG_MEMCG_KMEM is not enabled, the compiler should
> be smart enough not to add any asm code.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> 
> ---
> V2: Fixes according to input from:
>  Vladimir Davydov <vdavydov@virtuozzo.com>
>  and Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
>  mm/slub.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 8e9e9b2ee6f3..bc64514ad1bb 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2890,6 +2890,9 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
>  	do {
>  		struct detached_freelist df;
>  
> +		/* Support for memcg */
> +		s = cache_from_obj(s, p[size - 1]);
> +

AFAIU all objects in the array should come from the same cache (should
they?), so it should be enough to call this only once:

diff --git a/mm/slub.c b/mm/slub.c
index 438ebf8bbab1..a6c3c058ce7c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2887,6 +2887,7 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
 	if (WARN_ON(!size))
 		return;
 
+	s = cache_from_obj(s, *p);
 	do {
 		struct detached_freelist df;

Thanks,
Vladimir

>  		size = build_detached_freelist(s, size, p, &df);
>  		if (unlikely(!df.page))
>  			continue;
> 


* Re: [PATCH V2 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-05 16:18   ` Vladimir Davydov
@ 2015-11-07 16:40     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-07 16:40 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter, brouer

On Thu, 5 Nov 2015 19:18:05 +0300
Vladimir Davydov <vdavydov@virtuozzo.com> wrote:

> On Thu, Nov 05, 2015 at 04:37:51PM +0100, Jesper Dangaard Brouer wrote:
> ...
> > @@ -1298,7 +1298,6 @@ static inline void slab_post_alloc_hook(struct kmem_cache *s,
> >  	flags &= gfp_allowed_mask;
> >  	kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
> >  	kmemleak_alloc_recursive(object, s->object_size, 1, s->flags, flags);
> > -	memcg_kmem_put_cache(s);
> >  	kasan_slab_alloc(s, object);
> >  }
> >  
> > @@ -2557,6 +2556,7 @@ redo:
> >  		memset(object, 0, s->object_size);
> >  
> >  	slab_post_alloc_hook(s, gfpflags, object);
> > +	memcg_kmem_put_cache(s);
> 
> Asymmetric - not good IMO. What about passing array of allocated objects
> to slab_post_alloc_hook? Then we could leave memcg_kmem_put_cache where
> it is now. I.e here we'd have
> 
> 	slab_post_alloc_hook(s, gfpflags, &object, 1);
> 
> while in kmem_cache_alloc_bulk it'd look like
> 
> 	slab_post_alloc_hook(s, flags, p, size);
> 
> right before return.

In theory a good idea, but we just have to make sure that the compiler
can "see" that it can remove the loop if the CONFIG feature is turned
off, and that const propagation works for the "1" element case.

I'll verify this tomorrow or Monday (busy at a conf yesterday goo.gl/rRTdNL)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk
  2015-11-05 16:25   ` Vladimir Davydov
@ 2015-11-07 16:53     ` Jesper Dangaard Brouer
  2015-11-07 20:25       ` Vladimir Davydov
  0 siblings, 1 reply; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-07 16:53 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter, brouer

On Thu, 5 Nov 2015 19:25:14 +0300
Vladimir Davydov <vdavydov@virtuozzo.com> wrote:

> On Thu, Nov 05, 2015 at 04:38:06PM +0100, Jesper Dangaard Brouer wrote:
> > The initial implementation missed kmem cgroup support in the
> > kmem_cache_free_bulk() call; add it.
> > 
> > If CONFIG_MEMCG_KMEM is not enabled, the compiler should
> > be smart enough not to add any asm code.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > 
> > ---
> > V2: Fixes according to input from:
> >  Vladimir Davydov <vdavydov@virtuozzo.com>
> >  and Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > 
> >  mm/slub.c |    3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 8e9e9b2ee6f3..bc64514ad1bb 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2890,6 +2890,9 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> >  	do {
> >  		struct detached_freelist df;
> >  
> > +		/* Support for memcg */
> > +		s = cache_from_obj(s, p[size - 1]);
> > +
> 
> AFAIU all objects in the array should come from the same cache (should
> they?), so it should be enough to call this only once:

Can we be sure all objects in the array come from the same cache?

Imagine my use case:
 1. application sends a packet, allocating an SKB (from a slab)
 2. packet is TX'ed to the NIC via DMA
 3. TX DMA completion cleans up 256 packets and kmem frees the SKBs

I don't know enough about mem cgroups... but I can imagine two
applications belonging to different mem cgroups sending packets out the
same NIC and later getting their SKBs (pkt-metadata structs) freed
during the same TX completion (TX softirq) cycle, as a bulk free.

With my limited mem cgroups knowledge, it looks like memcg works at the
slab-page level?  And what I'm doing in this code is to group objects
together belonging to the same slab-page.
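The grouping described here works roughly like build_detached_freelist():
take the array's last object, pull every other object sharing its slab page
into a detached list, and compact the remainder for the next pass. A
userspace sketch of one such pass (slab pages are modeled as plain integer
tags, which is an assumption; the kernel looks up the real page):

```c
#include <assert.h>
#include <stddef.h>

/* Model: an object's "slab page" is just an integer tag it carries. */
static int page_of(void *obj) { return *(int *)obj; }

/* One pass: take the last object's page, move every object on that
 * page into dst, compact the remaining objects in place, and return
 * how many remain for the next pass. */
static size_t detach_same_page(size_t size, void **p, void **dst,
			       size_t *ndst)
{
	int page = page_of(p[size - 1]);
	size_t i, kept = 0;

	*ndst = 0;
	for (i = 0; i < size; i++) {
		if (page_of(p[i]) == page)
			dst[(*ndst)++] = p[i];
		else
			p[kept++] = p[i];
	}
	return kept;
}
```

Because objects from different memcg caches necessarily sit on different slab
pages, each detached group ends up homogeneous even for a mixed input array.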

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk
  2015-11-07 16:53     ` Jesper Dangaard Brouer
@ 2015-11-07 20:25       ` Vladimir Davydov
  2015-11-07 20:55         ` Christoph Lameter
  2015-11-09 16:39         ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-07 20:25 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Sat, Nov 07, 2015 at 05:53:38PM +0100, Jesper Dangaard Brouer wrote:
> On Thu, 5 Nov 2015 19:25:14 +0300
> Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> 
> > On Thu, Nov 05, 2015 at 04:38:06PM +0100, Jesper Dangaard Brouer wrote:
> > > The initial implementation missed kmem cgroup support in the
> > > kmem_cache_free_bulk() call; add it.
> > > 
> > > If CONFIG_MEMCG_KMEM is not enabled, the compiler should
> > > be smart enough not to add any asm code.
> > > 
> > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > > 
> > > ---
> > > V2: Fixes according to input from:
> > >  Vladimir Davydov <vdavydov@virtuozzo.com>
> > >  and Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > 
> > >  mm/slub.c |    3 +++
> > >  1 file changed, 3 insertions(+)
> > > 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 8e9e9b2ee6f3..bc64514ad1bb 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -2890,6 +2890,9 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> > >  	do {
> > >  		struct detached_freelist df;
> > >  
> > > +		/* Support for memcg */
> > > +		s = cache_from_obj(s, p[size - 1]);
> > > +
> > 
> > AFAIU all objects in the array should come from the same cache (should
> > they?), so it should be enough to call this only once:
> 
> Can we be sure all objects in the array come from the same cache?
> 
> Imagine my use case:
>  1. application sends a packet, allocating an SKB (from a slab)
>  2. packet is TX'ed to the NIC via DMA
>  3. TX DMA completion cleans up 256 packets and kmem frees the SKBs
> 
> I don't know enough about mem cgroups... but I can imagine two
> applications belonging to different mem cgroups sending packets out the
> same NIC and later getting their SKBs (pkt-metadata structs) freed
> during the same TX completion (TX softirq) cycle, as a bulk free.

Hmm, I thought that a bunch of objects allocated using
kmem_cache_alloc_bulk must be freed using kmem_cache_free_bulk. If it
does not hold, i.e. if one can allocate an array of objects one by one
using kmem_cache_alloc and then batch-free them using
kmem_cache_free_bulk, then my proposal is irrelevant.

> 
> With my limited mem cgroups knowledge, it looks like memcg works at
> the slab-page level?

Yes, a memcg has its private copy of each global kmem cache it attempted
to use, which implies that all objects on the same slab-page must belong
to the same memcg.

> And what I'm doing in this code is to group objects together
> belonging to the same slab-page.

Yeah, after inspecting build_detached_freelist more closely, I see your
patch is correct. Feel free to add

Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Thanks,
Vladimir


* Re: [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk
  2015-11-07 20:25       ` Vladimir Davydov
@ 2015-11-07 20:55         ` Christoph Lameter
  2015-11-09 16:39         ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 33+ messages in thread
From: Christoph Lameter @ 2015-11-07 20:55 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Jesper Dangaard Brouer, linux-mm, Joonsoo Kim, Andrew Morton

On Sat, 7 Nov 2015, Vladimir Davydov wrote:

> Hmm, I thought that a bunch of objects allocated using
> kmem_cache_alloc_bulk must be freed using kmem_cache_free_bulk. If it
> does not hold, i.e. if one can allocate an array of objects one by one
> using kmem_cache_alloc and then batch-free them using
> kmem_cache_free_bulk, then my proposal is irrelevant.

Nope they can be allocated and freed in multiple ways.

> > With my limited mem cgroups knowledge, it looks like memcg works at
> > the slab-page level?
>
> Yes, a memcg has its private copy of each global kmem cache it attempted
> to use, which implies that all objects on the same slab-page must belong
> to the same memcg.

Every memcg duplicates all slab caches and thus these are separate caches.
Bulk freeing to a mixture of cgroup caches does not work.


* Re: [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk
  2015-11-07 20:25       ` Vladimir Davydov
  2015-11-07 20:55         ` Christoph Lameter
@ 2015-11-09 16:39         ` Jesper Dangaard Brouer
  2015-11-09 18:38           ` Vladimir Davydov
  1 sibling, 1 reply; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-09 16:39 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter, brouer


On Sat, 7 Nov 2015 23:25:48 +0300 Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> On Sat, Nov 07, 2015 at 05:53:38PM +0100, Jesper Dangaard Brouer wrote:
> > On Thu, 5 Nov 2015 19:25:14 +0300 Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> > 
> > > On Thu, Nov 05, 2015 at 04:38:06PM +0100, Jesper Dangaard Brouer wrote:
> > > > The initial implementation missed kmem cgroup support in the
> > > > kmem_cache_free_bulk() call; add it.
> > > > 
> > > > If CONFIG_MEMCG_KMEM is not enabled, the compiler should
> > > > be smart enough not to add any asm code.
> > > > 
> > > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > > > 
> > > > ---
> > > > V2: Fixes according to input from:
> > > >  Vladimir Davydov <vdavydov@virtuozzo.com>
> > > >  and Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > > 
[...]
> > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > index 8e9e9b2ee6f3..bc64514ad1bb 100644
> > > > --- a/mm/slub.c
> > > > +++ b/mm/slub.c
> > > > @@ -2890,6 +2890,9 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> > > >  	do {
> > > >  		struct detached_freelist df;
> > > >  
> > > > +		/* Support for memcg */
> > > > +		s = cache_from_obj(s, p[size - 1]);
> > > > +
[...]
> 
> Yeah, after inspecting build_detached_freelist more closely, I see your
> patch is correct.

Actually, my patch is not correct... after spending most of my day
debugging V3 of patch 1/2, I've just realized this patch is the culprit.

We cannot overwrite the original "s", as the second time around the
loop, "s" will be a memcg slab cache.  And then slab_equal_or_root()
cannot find the "root_cache" (s->memcg_params.root_cache).
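The failure can be modeled in a few lines: once "s" has been overwritten
with a memcg child cache, a later object belonging to a different child no
longer passes the equal-or-root check. A hypothetical userspace sketch
(caches here just carry a root pointer; the names mirror the kernel's but
the bodies are stand-ins):

```c
#include <assert.h>
#include <stddef.h>

struct cache { struct cache *root; };	/* root == itself for root caches */
struct obj { struct cache *owner; };

/* Models slab_equal_or_root(): the cache a caller frees through must
 * be the object's own cache or that cache's root. */
static int equal_or_root(struct obj *o, struct cache *s)
{
	return o->owner == s || o->owner->root == s;
}
```

Keeping the original root cache satisfies the check for every object, while
feeding one child cache back in as "s" makes the next object from another
child fail it.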

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* [PATCH V3 0/2] SLUB bulk API interactions with kmem cgroup
  2015-11-05 16:10 ` [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup Vladimir Davydov
@ 2015-11-09 18:16   ` Jesper Dangaard Brouer
  2015-11-09 18:17     ` [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
  2015-11-09 18:17     ` [PATCH V3 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
  2015-11-13 10:57   ` [PATCH V4 0/2] SLUB bulk API interactions with kmem cgroup Jesper Dangaard Brouer
  1 sibling, 2 replies; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-09 18:16 UTC (permalink / raw)
  To: linux-mm
  Cc: vdavydov, Joonsoo Kim, Andrew Morton, Christoph Lameter,
	Jesper Dangaard Brouer

Added correct support for kmem cgroup interaction with the SLUB bulk API.

I've compiled the kernel with CONFIG_MEMCG_KMEM=y, and have tested the
kernel with the setup provided by Vladimir Davydov, and with my
network stack use-case patchset applied, to actually exercise the API.

Patch01: I've verified the loop in slab_post_alloc_hook() gets removed
 by the compiler (when no debug options defined). This was actually
 tricky due to kernel gcc compile options, and I wrote a small program
 to figure this out [1].

Patch02: The "try_crash" mode of the test module slab_bulk_test03 [2]
 has been modified, as after this change we no longer handle error
 cases, like passing NULL pointers in the to-free array, when
 CONFIG_MEMCG_KMEM is enabled.

[1] https://github.com/netoptimizer/network-testing/blob/master/src/compiler_test01.c
[2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test03.c

---

Jesper Dangaard Brouer (2):
      slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
      slub: add missing kmem cgroup support to kmem_cache_free_bulk


 mm/slub.c |   41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)

--


* [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-09 18:16   ` [PATCH V3 " Jesper Dangaard Brouer
@ 2015-11-09 18:17     ` Jesper Dangaard Brouer
  2015-11-09 19:13       ` Vladimir Davydov
  2015-11-09 18:17     ` [PATCH V3 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
  1 sibling, 1 reply; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-09 18:17 UTC (permalink / raw)
  To: linux-mm
  Cc: vdavydov, Joonsoo Kim, Andrew Morton, Christoph Lameter,
	Jesper Dangaard Brouer

The call slab_pre_alloc_hook() interacts with kmem cgroups (memcg) and
must not be called several times inside the bulk alloc for-loop,
due to the call to memcg_kmem_get_cache().

This would result in hitting the VM_BUG_ON in __memcg_kmem_get_cache.

As suggested by Vladimir Davydov, change slab_post_alloc_hook()
to handle an array of objects.

A subtle detail: the loop iterator "i" in slab_post_alloc_hook()
must have the same type (size_t) as the size argument.  This helps the
compiler more easily realize that it can remove the loop, when all
debug statements inside the loop evaluate to nothing.
 Note, this is only an issue because the kernel is compiled with the
GCC option -fno-strict-overflow.
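The iterator-type point reduces to a small sketch: per the commit message,
with -fno-strict-overflow the compiler can no longer exploit signed-overflow
assumptions, so an `int i` compared against a `size_t size` (with the
implicit conversion that entails) can keep an empty loop from being removed,
whereas a `size_t` iterator matches the bound's type exactly. A minimal
model to experiment with via `gcc -O2 -fno-strict-overflow -S` (the hook
body evaluating to nothing models the !debug kernel config; the codegen
claim should be verified on your own compiler):

```c
#include <assert.h>
#include <stddef.h>

/* Models the per-object debug hooks compiling to nothing. */
static inline void post_alloc_hook_one(void *object)
{
	(void)object;
}

/* With a size_t iterator the empty loop is trivially removable; try
 * switching "size_t i" to "int i" and compare the assembly output. */
static size_t post_alloc_hook(size_t size, void **p)
{
	size_t i;

	for (i = 0; i < size; i++)
		post_alloc_hook_one(p[i]);
	return size;	/* returned only so callers can sanity-check */
}
```

Behavior is identical either way; only the generated code differs, which is
why the author resorted to inspecting the assembly rather than testing.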

Reported-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
V3:
 Addressed input and used suggestions from Vladimir

 mm/slub.c |   35 ++++++++++++++++++++++-------------
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 9be12ffae9fc..8e6b929d06d6 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1292,14 +1292,21 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
 	return memcg_kmem_get_cache(s, flags);
 }
 
-static inline void slab_post_alloc_hook(struct kmem_cache *s,
-					gfp_t flags, void *object)
+static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
+					size_t size, void **p)
 {
+	size_t i;
+
 	flags &= gfp_allowed_mask;
-	kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
-	kmemleak_alloc_recursive(object, s->object_size, 1, s->flags, flags);
+	for (i = 0; i < size; i++) {
+		void *object = p[i];
+
+		kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
+		kmemleak_alloc_recursive(object, s->object_size, 1,
+					 s->flags, flags);
+		kasan_slab_alloc(s, object);
+	}
 	memcg_kmem_put_cache(s);
-	kasan_slab_alloc(s, object);
 }
 
 static inline void slab_free_hook(struct kmem_cache *s, void *x)
@@ -2556,7 +2563,7 @@ redo:
 	if (unlikely(gfpflags & __GFP_ZERO) && object)
 		memset(object, 0, s->object_size);
 
-	slab_post_alloc_hook(s, gfpflags, object);
+	slab_post_alloc_hook(s, gfpflags, 1, &object);
 
 	return object;
 }
@@ -2906,6 +2913,11 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	struct kmem_cache_cpu *c;
 	int i;
 
+	/* memcg and kmem_cache debug support */
+	s = slab_pre_alloc_hook(s, flags);
+	if (unlikely(!s))
+		return false;
+
 	/*
 	 * Drain objects in the per cpu slab, while disabling local
 	 * IRQs, which protects against PREEMPT and interrupts
@@ -2931,16 +2943,9 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			continue; /* goto for-loop */
 		}
 
-		/* kmem_cache debug support */
-		s = slab_pre_alloc_hook(s, flags);
-		if (unlikely(!s))
-			goto error;
-
 		c->freelist = get_freepointer(s, object);
 		p[i] = object;
 
-		/* kmem_cache debug support */
-		slab_post_alloc_hook(s, flags, object);
 	}
 	c->tid = next_tid(c->tid);
 	local_irq_enable();
@@ -2953,11 +2958,15 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			memset(p[j], 0, s->object_size);
 	}
 
+	/* memcg and kmem_cache debug support */
+	slab_post_alloc_hook(s, flags, size, p);
+
 	return true;
 
 error:
 	__kmem_cache_free_bulk(s, i, p);
 	local_irq_enable();
+	memcg_kmem_put_cache(s);
 	return false;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk);


* [PATCH V3 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk
  2015-11-09 18:16   ` [PATCH V3 " Jesper Dangaard Brouer
  2015-11-09 18:17     ` [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
@ 2015-11-09 18:17     ` Jesper Dangaard Brouer
  2015-11-09 18:56       ` Vladimir Davydov
  1 sibling, 1 reply; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-09 18:17 UTC (permalink / raw)
  To: linux-mm
  Cc: vdavydov, Joonsoo Kim, Andrew Morton, Christoph Lameter,
	Jesper Dangaard Brouer

The initial implementation missed kmem cgroup support in the
kmem_cache_free_bulk() call; add it.

If CONFIG_MEMCG_KMEM is not enabled, the compiler should be smart
enough not to emit any extra code.

Incoming bulk free objects can belong to different kmem cgroups, and
the object free call can happen at a later point outside the memcg
context.  Thus, we need to keep the original kmem_cache, to correctly
verify whether a memcg object matches against its "root_cache"
(s->memcg_params.root_cache).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
V3:
 Learned more about memcg, actually tested it and fixed a bug

V2: Fixes according to input from:
 Vladimir Davydov <vdavydov@virtuozzo.com>
 and Joonsoo Kim <iamjoonsoo.kim@lge.com>

 mm/slub.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8e6b929d06d6..e3fa85278706 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2889,13 +2889,17 @@ static int build_detached_freelist(struct kmem_cache *s, size_t size,
 
 
 /* Note that interrupts must be enabled when calling this function. */
-void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
+void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
 {
 	if (WARN_ON(!size))
 		return;
 
 	do {
 		struct detached_freelist df;
+		struct kmem_cache *s;
+
+		/* Support for memcg */
+		s = cache_from_obj(orig_s, p[size - 1]);
 
 		size = build_detached_freelist(s, size, p, &df);
 		if (unlikely(!df.page))


* Re: [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk
  2015-11-09 16:39         ` Jesper Dangaard Brouer
@ 2015-11-09 18:38           ` Vladimir Davydov
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-09 18:38 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Mon, Nov 09, 2015 at 05:39:10PM +0100, Jesper Dangaard Brouer wrote:
> 
> On Sat, 7 Nov 2015 23:25:48 +0300 Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> > On Sat, Nov 07, 2015 at 05:53:38PM +0100, Jesper Dangaard Brouer wrote:
> > > On Thu, 5 Nov 2015 19:25:14 +0300 Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> > > 
> > > > On Thu, Nov 05, 2015 at 04:38:06PM +0100, Jesper Dangaard Brouer wrote:
> > > > > Initial implementation missed support for kmem cgroup support
> > > > > in kmem_cache_free_bulk() call, add this.
> > > > > 
> > > > > If CONFIG_MEMCG_KMEM is not enabled, the compiler should
> > > > > be smart enough to not add any asm code.
> > > > > 
> > > > > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > > > > 
> > > > > ---
> > > > > V2: Fixes according to input from:
> > > > >  Vladimir Davydov <vdavydov@virtuozzo.com>
> > > > >  and Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > > > > 
> [...]
> > > > > diff --git a/mm/slub.c b/mm/slub.c
> > > > > index 8e9e9b2ee6f3..bc64514ad1bb 100644
> > > > > --- a/mm/slub.c
> > > > > +++ b/mm/slub.c
> > > > > @@ -2890,6 +2890,9 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> > > > >  	do {
> > > > >  		struct detached_freelist df;
> > > > >  
> > > > > +		/* Support for memcg */
> > > > > +		s = cache_from_obj(s, p[size - 1]);
> > > > > +
> [...]
> > 
> > Yeah, after inspecting build_detached_freelist more closely, I see your
> > patch is correct.
> 
> Actually, my patch is not correct... after spending most of my day
> debugging V3 of patch 1/2, I've just realized this patch is the culprit.
> 
> We cannot overwrite the original "s", as the second time around the
> loop, "s" will be a memcg slab cache.  And then slab_equal_or_root()
> cannot find the "root_cache" (s->memcg_params.root_cache).

Yeah, you're right. Shame that I missed that.

Thanks,
Vladimir


* Re: [PATCH V3 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk
  2015-11-09 18:17     ` [PATCH V3 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
@ 2015-11-09 18:56       ` Vladimir Davydov
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-09 18:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Mon, Nov 09, 2015 at 07:17:50PM +0100, Jesper Dangaard Brouer wrote:
> The initial implementation missed kmem cgroup support in the
> kmem_cache_free_bulk() call; add it.
> 
> If CONFIG_MEMCG_KMEM is not enabled, the compiler should be smart
> enough not to emit any extra code.
> 
> Incoming bulk free objects can belong to different kmem cgroups, and
> the object free call can happen at a later point outside the memcg
> context.  Thus, we need to keep the original kmem_cache, to correctly
> verify whether a memcg object matches against its "root_cache"
> (s->memcg_params.root_cache).
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>


* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-09 18:17     ` [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
@ 2015-11-09 19:13       ` Vladimir Davydov
  2015-11-09 20:25         ` Jesper Dangaard Brouer
  2015-11-09 22:04         ` Christoph Lameter
  0 siblings, 2 replies; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-09 19:13 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Mon, Nov 09, 2015 at 07:17:31PM +0100, Jesper Dangaard Brouer wrote:
...
> @@ -2556,7 +2563,7 @@ redo:
>  	if (unlikely(gfpflags & __GFP_ZERO) && object)
>  		memset(object, 0, s->object_size);
>  
> -	slab_post_alloc_hook(s, gfpflags, object);
> +	slab_post_alloc_hook(s, gfpflags, 1, object);

I think it must be &object

BTW why is object defined as void **? I suspect we can safely drop one
star.

>  
>  	return object;
>  }
...
> @@ -2953,11 +2958,15 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>  			memset(p[j], 0, s->object_size);
>  	}
>  
> +	/* memcg and kmem_cache debug support */
> +	slab_post_alloc_hook(s, flags, size, p);
> +
>  	return true;
>  
>  error:
>  	__kmem_cache_free_bulk(s, i, p);
>  	local_irq_enable();
> +	memcg_kmem_put_cache(s);

I wouldn't tear memcg_kmem_put_cache away from slab_post_alloc_hook. If we
add something else to slab_post_alloc_hook (e.g. we might want to call
tracing functions from there), we'll have to modify this error path as
well, which is easy to miss.

What about calling

	slab_post_alloc_hook(s, flags, 0, NULL);

here?

Thanks,
Vladimir

>  	return false;
>  }
>  EXPORT_SYMBOL(kmem_cache_alloc_bulk);
> 


* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-09 19:13       ` Vladimir Davydov
@ 2015-11-09 20:25         ` Jesper Dangaard Brouer
  2015-11-10  8:46           ` Vladimir Davydov
  2015-11-09 22:04         ` Christoph Lameter
  1 sibling, 1 reply; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-09 20:25 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter, brouer

On Mon, 9 Nov 2015 22:13:35 +0300
Vladimir Davydov <vdavydov@virtuozzo.com> wrote:

> On Mon, Nov 09, 2015 at 07:17:31PM +0100, Jesper Dangaard Brouer wrote:
> ...
> > @@ -2556,7 +2563,7 @@ redo:
> >  	if (unlikely(gfpflags & __GFP_ZERO) && object)
> >  		memset(object, 0, s->object_size);
> >  
> > -	slab_post_alloc_hook(s, gfpflags, object);
> > +	slab_post_alloc_hook(s, gfpflags, 1, object);
> 
> I think it must be &object

The object is already a void ** type.

> BTW why is object defined as void **? I suspect we can safely drop one
> star.

Maybe Christoph can explain this?


> >  
> >  	return object;
> >  }
> ...
> > @@ -2953,11 +2958,15 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> >  			memset(p[j], 0, s->object_size);
> >  	}
> >  
> > +	/* memcg and kmem_cache debug support */
> > +	slab_post_alloc_hook(s, flags, size, p);
> > +
> >  	return true;
> >  
> >  error:
> >  	__kmem_cache_free_bulk(s, i, p);
> >  	local_irq_enable();
> > +	memcg_kmem_put_cache(s);
> 
> I wouldn't tear memcg_kmem_put_cache from slab_post_alloc_hook. If we
> add something else to slab_post_alloc_hook (e.g. we might want to call
> tracing functions from there), we'll have to modify this error path
> either, which is easy to miss.
> 
> What about calling
> 
> 	slab_post_alloc_hook(s, flags, 0, NULL);
> 
> here?

Maybe the correct behavior here, to adhere to all the debugging options,
is to call:

error:
 local_irq_enable();
 slab_post_alloc_hook(s, flags, i, p);
 __kmem_cache_free_bulk(s, i, p);
 return false;

 
> >  	return false;
> >  }
> >  EXPORT_SYMBOL(kmem_cache_alloc_bulk);
> > 


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-09 19:13       ` Vladimir Davydov
  2015-11-09 20:25         ` Jesper Dangaard Brouer
@ 2015-11-09 22:04         ` Christoph Lameter
  2015-11-10  8:30           ` Vladimir Davydov
  1 sibling, 1 reply; 33+ messages in thread
From: Christoph Lameter @ 2015-11-09 22:04 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Jesper Dangaard Brouer, linux-mm, Joonsoo Kim, Andrew Morton

On Mon, 9 Nov 2015, Vladimir Davydov wrote:

> I think it must be &object
>
> BTW why is object defined as void **? I suspect we can safely drop one
> star.

See get_freepointer()

static inline void *get_freepointer(struct kmem_cache *s, void *object)
{
        return *(void **)(object + s->offset);
}

The object at some point has a freepointer and ** allows the use of the
s->offset field to get to it.



* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-09 22:04         ` Christoph Lameter
@ 2015-11-10  8:30           ` Vladimir Davydov
  2015-11-10 15:23             ` Christoph Lameter
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-10  8:30 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Jesper Dangaard Brouer, linux-mm, Joonsoo Kim, Andrew Morton

On Mon, Nov 09, 2015 at 04:04:51PM -0600, Christoph Lameter wrote:
> On Mon, 9 Nov 2015, Vladimir Davydov wrote:
> 
> > I think it must be &object
> >
> > BTW why is object defined as void **? I suspect we can safely drop one
> > star.
> 
> See get_freepointer()
> 
> static inline void *get_freepointer(struct kmem_cache *s, void *object)
> {
>         return *(void **)(object + s->offset);
> }

In this function object has type (void *)

> 
> The object at some point has a freepointer and ** allows the use of the
> s->offset field to get to it.

But it doesn't mean we have to define it as (void **) in
slab_alloc_node. Actually, the fact that object is of type (void **) is
never used in slab_alloc_node, and all functions called by it accept
(void *) for object, not (void **). Dropping one star there doesn't
break anything and looks less confusing IMO.

Thanks,
Vladimir


* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-09 20:25         ` Jesper Dangaard Brouer
@ 2015-11-10  8:46           ` Vladimir Davydov
  2015-11-10 15:55             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-10  8:46 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Mon, Nov 09, 2015 at 09:25:22PM +0100, Jesper Dangaard Brouer wrote:
> On Mon, 9 Nov 2015 22:13:35 +0300
> Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> 
> > On Mon, Nov 09, 2015 at 07:17:31PM +0100, Jesper Dangaard Brouer wrote:
> > ...
> > > @@ -2556,7 +2563,7 @@ redo:
> > >  	if (unlikely(gfpflags & __GFP_ZERO) && object)
> > >  		memset(object, 0, s->object_size);
> > >  
> > > -	slab_post_alloc_hook(s, gfpflags, object);
> > > +	slab_post_alloc_hook(s, gfpflags, 1, object);
> > 
> > I think it must be &object
> 
> The object is already a void ** type.

Let's forget about types for a second. object contains the address of the
newly allocated object, while slab_post_alloc_hook expects an array of
addresses of objects. Simple test: suppose an allocation fails. Then
object equals 0. Passing 0 to slab_post_alloc_hook as @p and 1 as @size
will result in a NULL ptr dereference.

> 
> > BTW why is object defined as void **? I suspect we can safely drop one
> > star.
> 
> Maybe Christoph can explain this?
> 
> 
> > >  
> > >  	return object;
> > >  }
> > ...
> > > @@ -2953,11 +2958,15 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> > >  			memset(p[j], 0, s->object_size);
> > >  	}
> > >  
> > > +	/* memcg and kmem_cache debug support */
> > > +	slab_post_alloc_hook(s, flags, size, p);
> > > +
> > >  	return true;
> > >  
> > >  error:
> > >  	__kmem_cache_free_bulk(s, i, p);
> > >  	local_irq_enable();
> > > +	memcg_kmem_put_cache(s);
> > 
> > I wouldn't tear memcg_kmem_put_cache from slab_post_alloc_hook. If we
> > add something else to slab_post_alloc_hook (e.g. we might want to call
> > tracing functions from there), we'll have to modify this error path
> > either, which is easy to miss.
> > 
> > What about calling
> > 
> > 	slab_post_alloc_hook(s, flags, 0, NULL);
> > 
> > here?
> 
> Maybe the correct behavior here, to adhere to all the debugging options,
> is to call:
> 
> error:
>  local_irq_enable();
>  slab_post_alloc_hook(s, flags, i, p);
>  __kmem_cache_free_bulk(s, i, p);
>  return false;

Yeah, I think you're right, because __kmem_cache_free_bulk calls
slab_free_hook, which is supposed to undo slab_post_alloc_hook, so we
must call the latter for allocated objects.

Thanks,
Vladimir

> 
>  
> > >  	return false;
> > >  }
> > >  EXPORT_SYMBOL(kmem_cache_alloc_bulk);
> > > 


* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-10  8:30           ` Vladimir Davydov
@ 2015-11-10 15:23             ` Christoph Lameter
  0 siblings, 0 replies; 33+ messages in thread
From: Christoph Lameter @ 2015-11-10 15:23 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: Jesper Dangaard Brouer, linux-mm, Joonsoo Kim, Andrew Morton

On Tue, 10 Nov 2015, Vladimir Davydov wrote:

> But it doesn't mean we have to define it as (void **) in
> slab_alloc_node. Actually, the fact that object is of type (void **) is
> never used in slab_alloc_node, and all functions called by it accept
> (void *) for object, not (void **). Dropping one star there doesn't
> break anything and looks less confusing IMO.

I just tried to point out that this is a historical artifact. Can be
removed.


* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-10  8:46           ` Vladimir Davydov
@ 2015-11-10 15:55             ` Jesper Dangaard Brouer
  2015-11-10 16:17               ` Christoph Lameter
  2015-11-10 18:32               ` Vladimir Davydov
  0 siblings, 2 replies; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-10 15:55 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter, brouer

On Tue, 10 Nov 2015 11:46:33 +0300
Vladimir Davydov <vdavydov@virtuozzo.com> wrote:

> On Mon, Nov 09, 2015 at 09:25:22PM +0100, Jesper Dangaard Brouer wrote:
> > On Mon, 9 Nov 2015 22:13:35 +0300
> > Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> > 
> > > On Mon, Nov 09, 2015 at 07:17:31PM +0100, Jesper Dangaard Brouer wrote:
> > > ...
> > > > @@ -2556,7 +2563,7 @@ redo:
> > > >  	if (unlikely(gfpflags & __GFP_ZERO) && object)
> > > >  		memset(object, 0, s->object_size);
> > > >  
> > > > -	slab_post_alloc_hook(s, gfpflags, object);
> > > > +	slab_post_alloc_hook(s, gfpflags, 1, object);
> > > 
> > > I think it must be &object
> > 
> > The object is already a void ** type.
> 
> Let's forget about types for a second. object contains an address to the
> newly allocated object, while slab_post_alloc_hook expects an array of
> addresses to objects. Simple test. Suppose an allocation failed. Then
> object equals 0. Passing 0 to slab_post_alloc_hook as @p and 1 as @size
> will result in NULL ptr dereference.

Argh, that is not good :-(
I tested memory exhaustion and NULL ptr deref does happen in this case.

 BUG: unable to handle kernel NULL pointer dereference at           (null)
 IP: [<ffffffff8113dea2>] kmem_cache_alloc+0x92/0x1d0

(gdb) list *(kmem_cache_alloc)+0x92
0xffffffff8113dea2 is in kmem_cache_alloc (mm/slub.c:1302).
1297	{
1298		size_t i;
1299	
1300		flags &= gfp_allowed_mask;
1301		for (i = 0; i < size; i++) {
1302			void *object = p[i];
1303	
1304			kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
1305			kmemleak_alloc_recursive(object, s->object_size, 1,
1306						 s->flags, flags);
(gdb) quit

I changed:

diff --git a/mm/slub.c b/mm/slub.c
index 2eab115e18c5..c5a62fd02321 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2484,7 +2484,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 static __always_inline void *slab_alloc_node(struct kmem_cache *s,
                gfp_t gfpflags, int node, unsigned long addr)
 {
-       void **object;
+       void *object;
        struct kmem_cache_cpu *c;
        struct page *page;
        unsigned long tid;
@@ -2563,7 +2563,7 @@ redo:
        if (unlikely(gfpflags & __GFP_ZERO) && object)
                memset(object, 0, s->object_size);
 
-       slab_post_alloc_hook(s, gfpflags, 1, object);
+       slab_post_alloc_hook(s, gfpflags, 1, &object);
 
        return object;
 }

But then the kernel cannot correctly boot?!?! (It dies in
x86_perf_event_update+0x15.)  What did I miss???

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-10 15:55             ` Jesper Dangaard Brouer
@ 2015-11-10 16:17               ` Christoph Lameter
  2015-11-10 18:32               ` Vladimir Davydov
  1 sibling, 0 replies; 33+ messages in thread
From: Christoph Lameter @ 2015-11-10 16:17 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Vladimir Davydov, linux-mm, Joonsoo Kim, Andrew Morton

On Tue, 10 Nov 2015, Jesper Dangaard Brouer wrote:

> @@ -2563,7 +2563,7 @@ redo:
>         if (unlikely(gfpflags & __GFP_ZERO) && object)
>                 memset(object, 0, s->object_size);
>
> -       slab_post_alloc_hook(s, gfpflags, 1, object);
> +       slab_post_alloc_hook(s, gfpflags, 1, &object);
>
>         return object;
>  }
>
> But then the kernel cannot correctly boot?!?! (It dies in
> x86_perf_event_update+0x15.)  What did I miss???

Dont make the above change. object is a pointer to the object. It does not
matter if that is ** or *. Dont take the address.



* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-10 15:55             ` Jesper Dangaard Brouer
  2015-11-10 16:17               ` Christoph Lameter
@ 2015-11-10 18:32               ` Vladimir Davydov
  2015-11-11 15:28                 ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-10 18:32 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Tue, Nov 10, 2015 at 04:55:34PM +0100, Jesper Dangaard Brouer wrote:
> On Tue, 10 Nov 2015 11:46:33 +0300
> Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> 
> > On Mon, Nov 09, 2015 at 09:25:22PM +0100, Jesper Dangaard Brouer wrote:
> > > On Mon, 9 Nov 2015 22:13:35 +0300
> > > Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> > > 
> > > > On Mon, Nov 09, 2015 at 07:17:31PM +0100, Jesper Dangaard Brouer wrote:
> > > > ...
> > > > > @@ -2556,7 +2563,7 @@ redo:
> > > > >  	if (unlikely(gfpflags & __GFP_ZERO) && object)
> > > > >  		memset(object, 0, s->object_size);
> > > > >  
> > > > > -	slab_post_alloc_hook(s, gfpflags, object);
> > > > > +	slab_post_alloc_hook(s, gfpflags, 1, object);
> > > > 
> > > > I think it must be &object
> > > 
> > > The object is already a void ** type.
> > 
> > Let's forget about types for a second. object contains an address to the
> > newly allocated object, while slab_post_alloc_hook expects an array of
> > addresses to objects. Simple test. Suppose an allocation failed. Then
> > object equals 0. Passing 0 to slab_post_alloc_hook as @p and 1 as @size
> > will result in NULL ptr dereference.
> 
> Argh, that is not good :-(
> I tested memory exhaustion and NULL ptr deref does happen in this case.
> 
>  BUG: unable to handle kernel NULL pointer dereference at           (null)
>  IP: [<ffffffff8113dea2>] kmem_cache_alloc+0x92/0x1d0
> 
> (gdb) list *(kmem_cache_alloc)+0x92
> 0xffffffff8113dea2 is in kmem_cache_alloc (mm/slub.c:1302).
> 1297	{
> 1298		size_t i;
> 1299	
> 1300		flags &= gfp_allowed_mask;
> 1301		for (i = 0; i < size; i++) {
> 1302			void *object = p[i];
> 1303	
> 1304			kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
> 1305			kmemleak_alloc_recursive(object, s->object_size, 1,
> 1306						 s->flags, flags);
> (gdb) quit
> 
> I changed:
> 
> diff --git a/mm/slub.c b/mm/slub.c
> index 2eab115e18c5..c5a62fd02321 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2484,7 +2484,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  static __always_inline void *slab_alloc_node(struct kmem_cache *s,
>                 gfp_t gfpflags, int node, unsigned long addr)
>  {
> -       void **object;
> +       void *object;
>         struct kmem_cache_cpu *c;
>         struct page *page;
>         unsigned long tid;
> @@ -2563,7 +2563,7 @@ redo:
>         if (unlikely(gfpflags & __GFP_ZERO) && object)
>                 memset(object, 0, s->object_size);
>  
> -       slab_post_alloc_hook(s, gfpflags, 1, object);
> +       slab_post_alloc_hook(s, gfpflags, 1, &object);
>  
>         return object;
>  }
> 
> But then the kernel cannot correctly boot?!?! (It dies in
> x86_perf_event_update+0x15.)  What did I miss???

Weird... I applied all your patches including the one above to
v4.3-rc6-mmotm-2015-10-21-14-41 and everything boots and works just fine
both inside a VM and on my x86 host. Are you sure the problem is caused
by your patches? Perhaps you updated the source tree in the meantime.

Could you try to just remove the star from @object definition, w/o
applying your patches? May be, there is something in slab_alloc_node
that implicitly relies on the @object type (e.g. this_cpu_cmpxchg_double
macro)...

Thanks,
Vladimir


* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-10 18:32               ` Vladimir Davydov
@ 2015-11-11 15:28                 ` Jesper Dangaard Brouer
  2015-11-11 18:30                   ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-11 15:28 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter, brouer

On Tue, 10 Nov 2015 21:32:46 +0300
Vladimir Davydov <vdavydov@virtuozzo.com> wrote:

> On Tue, Nov 10, 2015 at 04:55:34PM +0100, Jesper Dangaard Brouer wrote:
> > On Tue, 10 Nov 2015 11:46:33 +0300
> > Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> > 
> > > On Mon, Nov 09, 2015 at 09:25:22PM +0100, Jesper Dangaard Brouer wrote:
> > > > On Mon, 9 Nov 2015 22:13:35 +0300
> > > > Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> > > > 
> > > > > On Mon, Nov 09, 2015 at 07:17:31PM +0100, Jesper Dangaard Brouer wrote:
> > > > > ...
> > > > > > @@ -2556,7 +2563,7 @@ redo:
> > > > > >  	if (unlikely(gfpflags & __GFP_ZERO) && object)
> > > > > >  		memset(object, 0, s->object_size);
> > > > > >  
> > > > > > -	slab_post_alloc_hook(s, gfpflags, object);
> > > > > > +	slab_post_alloc_hook(s, gfpflags, 1, object);
> > > > > 
> > > > > I think it must be &object
> > > > 
> > > > The object is already a void ** type.
> > > 
> > > Let's forget about types for a second. object contains an address to the
> > > newly allocated object, while slab_post_alloc_hook expects an array of
> > > addresses to objects. Simple test. Suppose an allocation failed. Then
> > > object equals 0. Passing 0 to slab_post_alloc_hook as @p and 1 as @size
> > > will result in NULL ptr dereference.
> > 
> > Argh, that is not good :-(
> > I tested memory exhaustion and NULL ptr deref does happen in this case.
> > 
> >  BUG: unable to handle kernel NULL pointer dereference at           (null)
> >  IP: [<ffffffff8113dea2>] kmem_cache_alloc+0x92/0x1d0
> > 
> > (gdb) list *(kmem_cache_alloc)+0x92
> > 0xffffffff8113dea2 is in kmem_cache_alloc (mm/slub.c:1302).
> > 1297	{
> > 1298		size_t i;
> > 1299	
> > 1300		flags &= gfp_allowed_mask;
> > 1301		for (i = 0; i < size; i++) {
> > 1302			void *object = p[i];
> > 1303	
> > 1304			kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
> > 1305			kmemleak_alloc_recursive(object, s->object_size, 1,
> > 1306						 s->flags, flags);
> > (gdb) quit
> > 
> > I changed:
> > 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 2eab115e18c5..c5a62fd02321 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2484,7 +2484,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> >  static __always_inline void *slab_alloc_node(struct kmem_cache *s,
> >                 gfp_t gfpflags, int node, unsigned long addr)
> >  {
> > -       void **object;
> > +       void *object;
> >         struct kmem_cache_cpu *c;
> >         struct page *page;
> >         unsigned long tid;
> > @@ -2563,7 +2563,7 @@ redo:
> >         if (unlikely(gfpflags & __GFP_ZERO) && object)
> >                 memset(object, 0, s->object_size);
> >  
> > -       slab_post_alloc_hook(s, gfpflags, 1, object);
> > +       slab_post_alloc_hook(s, gfpflags, 1, &object);
> >  
> >         return object;
> >  }
> > 
> > But then the kernel cannot correctly boot?!?! (It dies in
> > x86_perf_event_update+0x15.)  What did I miss???
> 
> Weird... I applied all your patches including the one above to
> v4.3-rc6-mmotm-2015-10-21-14-41 and everything boots and works just fine
> both inside a VM and on my x86 host. Are you sure the problem is caused
> by your patches? Perhaps you updated the source tree in the meantime.

I didn't rebase, but I likely _should_ rebase my patchset.  It could be
something other than my patch; I will investigate further.

When you tested it, did you make sure the compiler didn't "remove" the
code inside the for loop?

To put some code inside the for loop, I have enabled both
CONFIG_KMEMCHECK and CONFIG_DEBUG_KMEMLEAK, plus CONFIG_SLUB_DEBUG_ON=y
(but it seems SLUB_DEBUG gets somewhat removed when these get enabled,
didn't check the details).


> Could you try to just remove the star from @object definition, w/o
> applying your patches? May be, there is something in slab_alloc_node
> that implicitly relies on the @object type (e.g. this_cpu_cmpxchg_double
> macro)...

Removing the star from @object definition works (without the other change).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-11 15:28                 ` Jesper Dangaard Brouer
@ 2015-11-11 18:30                   ` Jesper Dangaard Brouer
  2015-11-11 18:56                     ` Vladimir Davydov
  0 siblings, 1 reply; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-11 18:30 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter, brouer

On Wed, 11 Nov 2015 16:28:20 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Tue, 10 Nov 2015 21:32:46 +0300
> Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> 
> > On Tue, Nov 10, 2015 at 04:55:34PM +0100, Jesper Dangaard Brouer wrote:
> > > On Tue, 10 Nov 2015 11:46:33 +0300
> > > Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> > > 
> > > > On Mon, Nov 09, 2015 at 09:25:22PM +0100, Jesper Dangaard Brouer wrote:
> > > > > On Mon, 9 Nov 2015 22:13:35 +0300
> > > > > Vladimir Davydov <vdavydov@virtuozzo.com> wrote:
> > > > > 
> > > > > > On Mon, Nov 09, 2015 at 07:17:31PM +0100, Jesper Dangaard Brouer wrote:
> > > > > > ...
> > > > > > > @@ -2556,7 +2563,7 @@ redo:
> > > > > > >  	if (unlikely(gfpflags & __GFP_ZERO) && object)
> > > > > > >  		memset(object, 0, s->object_size);
> > > > > > >  
> > > > > > > -	slab_post_alloc_hook(s, gfpflags, object);
> > > > > > > +	slab_post_alloc_hook(s, gfpflags, 1, object);
> > > > > > 
> > > > > > I think it must be &object
> > > > > 
> > > > > The object is already a void ** type.
> > > > 
> > > > Let's forget about types for a second. object contains an address to the
> > > > newly allocated object, while slab_post_alloc_hook expects an array of
> > > > addresses to objects. Simple test. Suppose an allocation failed. Then
> > > > object equals 0. Passing 0 to slab_post_alloc_hook as @p and 1 as @size
> > > > will result in NULL ptr dereference.
> > > 
> > > Argh, that is not good :-(
> > > I tested memory exhaustion and NULL ptr deref does happen in this case.
> > > 
> > >  BUG: unable to handle kernel NULL pointer dereference at           (null)
> > >  IP: [<ffffffff8113dea2>] kmem_cache_alloc+0x92/0x1d0
> > > 
> > > (gdb) list *(kmem_cache_alloc)+0x92
> > > 0xffffffff8113dea2 is in kmem_cache_alloc (mm/slub.c:1302).
> > > 1297	{
> > > 1298		size_t i;
> > > 1299	
> > > 1300		flags &= gfp_allowed_mask;
> > > 1301		for (i = 0; i < size; i++) {
> > > 1302			void *object = p[i];
> > > 1303	
> > > 1304			kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
> > > 1305			kmemleak_alloc_recursive(object, s->object_size, 1,
> > > 1306						 s->flags, flags);
> > > (gdb) quit
> > > 
> > > I changed:
> > > 
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 2eab115e18c5..c5a62fd02321 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -2484,7 +2484,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
> > >  static __always_inline void *slab_alloc_node(struct kmem_cache *s,
> > >                 gfp_t gfpflags, int node, unsigned long addr)
> > >  {
> > > -       void **object;
> > > +       void *object;
> > >         struct kmem_cache_cpu *c;
> > >         struct page *page;
> > >         unsigned long tid;
> > > @@ -2563,7 +2563,7 @@ redo:
> > >         if (unlikely(gfpflags & __GFP_ZERO) && object)
> > >                 memset(object, 0, s->object_size);
> > >  
> > > -       slab_post_alloc_hook(s, gfpflags, 1, object);
> > > +       slab_post_alloc_hook(s, gfpflags, 1, &object);
> > >  
> > >         return object;
> > >  }
> > > 
> > > But then the kernel cannot correctly boot?!?! (It dies in
> > > x86_perf_event_update+0x15.)  What did I miss???
> > 
> > Weird... I applied all your patches including the one above to
> > v4.3-rc6-mmotm-2015-10-21-14-41 and everything boots and works just fine
> > both inside a VM and on my x86 host. Are you sure the problem is caused
> > by your patches? Perhaps you updated the source tree in the meantime.
> 
> I didn't rebase, but I likely _should_ rebase my patchset.  It could be
> something other than my patch; I will investigate further.
>
> When you tested it, did you make sure the compiler didn't "remove" the
> code inside the for loop?
> 
> To put some code inside the for loop, I have enabled both
> CONFIG_KMEMCHECK and CONFIG_DEBUG_KMEMLEAK, plus CONFIG_SLUB_DEBUG_ON=y
> (but it seems SLUB_DEBUG gets partially disabled when these are
> enabled; I didn't check the details).

Okay, there is nothing wrong with this change (it is actually more correct).

The problem was related to CONFIG_KMEMCHECK.  It was causing the system
to not boot (I have not looked into why yet; I don't have full console
output, but I can see it complains during PCI and ACPI init and then
dies in x86_perf_event_update+0x15, so it could be system/HW specific).

I'm now running with CONFIG_DEBUG_KMEMLEAK, and am running tests with
memory exhaustion.  And it works, e.g. when the alloc fails and @object
becomes NULL.
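
The failure mode discussed above can be sketched in isolation (a
hypothetical userspace model with made-up names, not the kernel code):
a hook that takes an array of object pointers must be handed &object,
because on allocation failure @object is NULL, and treating the NULL
object itself as the array dereferences NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of slab_post_alloc_hook(): it expects an ARRAY
 * of object pointers, so individual entries may be NULL without
 * crashing. */
static size_t hook_count_objects(size_t size, void **p)
{
	size_t i, n = 0;

	for (i = 0; i < size; i++)
		if (p[i])	/* p itself is valid; p[i] may be NULL */
			n++;
	return n;
}
```

Passing &object is safe even when object == NULL, since &object is a
valid one-element array.  Passing object directly, as the buggy call
did, makes p == NULL, so reading p[0] is the NULL pointer dereference
seen in the oops above.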

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-11 18:30                   ` Jesper Dangaard Brouer
@ 2015-11-11 18:56                     ` Vladimir Davydov
  2015-11-12 14:27                       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-11 18:56 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Wed, Nov 11, 2015 at 07:30:59PM +0100, Jesper Dangaard Brouer wrote:
...
> The problem was related to CONFIG_KMEMCHECK.  It was causing the system
> to not boot (I have not looked into why yet; I don't have full console
> output, but I can see it complains during PCI and ACPI init and then
> dies in x86_perf_event_update+0x15, so it could be system/HW specific).

AFAIK kmemcheck is rarely used nowadays, because kasan does practically
the same and does it better, so failures are expected.

> 
> I'm now running with CONFIG_DEBUG_KMEMLEAK, and am running tests with

kmemleak must be OK. Personally I use it quite often.

> memory exhaustion.  And it works, e.g. when the alloc fails and @object
> becomes NULL.

Cool.

Thanks,
Vladimir


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-11 18:56                     ` Vladimir Davydov
@ 2015-11-12 14:27                       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-12 14:27 UTC (permalink / raw)
  To: Vladimir Davydov
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter, brouer

On Wed, 11 Nov 2015 21:56:48 +0300
Vladimir Davydov <vdavydov@virtuozzo.com> wrote:

> On Wed, Nov 11, 2015 at 07:30:59PM +0100, Jesper Dangaard Brouer wrote:
> ...
> > The problem was related to CONFIG_KMEMCHECK.  It was causing the system
> > to not boot (I have not looked into why yet; I don't have full console
> > output, but I can see it complains during PCI and ACPI init and then
> > dies in x86_perf_event_update+0x15, so it could be system/HW specific).
> 
> AFAIK kmemcheck is rarely used nowadays, because kasan does practically
> the same and does it better, so failures are expected.

For the record, I've looked at the early console output from the crash
of the CONFIG_KMEMCHECK-enabled kernel (general protection fault: 077b)
in x86_perf_event_update+0x15.

It seems kmemcheck gets called while in NMI context, when
perf_event_nmi_handler gets activated, and that is likely the cause of
the issue.

I don't know kmemcheck well, but judging from the WARN_ON_ONCE(in_nmi())
in kmemcheck_fault (which does get hit), it has some issue with NMI.

Before crashing, the following warning appears:
 WARNING: CPU: 0 PID: 1 at arch/x86/mm/kmemcheck/kmemcheck.c:640 kmemcheck_fault+0x91/0xa0()

I'm going to compile without CONFIG_KMEMCHECK and just assume it is
broken (that is, I'm not going to fix this kmemcheck issue).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer

[   24.537699] WARNING: CPU: 0 PID: 1 at arch/x86/mm/kmemcheck/kmemcheck.c:640 kmemcheck_fault+0x91/0xa0()
[   24.537699] Modules linked in:
[   24.537700] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.3.0-rc6-net-next-MM-rx-hacks+ #595
[   24.537701] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015
[   24.537702]  ffffffff8180f3c8 ffff88041dc05740 ffffffff812e39cf 0000000000000000
[   24.537702]  ffff88041dc05778 ffffffff81056ae8 ffff88041dc05828 ffff88041d0401c4
[   24.537703]  0000000000000000 0000000000000000 ffff880419eb0000 ffff88041dc05788
[   24.537703] Call Trace:
[   24.537706]  <NMI>  [<ffffffff812e39cf>] dump_stack+0x44/0x55
[   24.537708]  [<ffffffff81056ae8>] warn_slowpath_common+0x78/0xb0
[   24.537708]  [<ffffffff81056bd5>] warn_slowpath_null+0x15/0x20
[   24.537709]  [<ffffffff8104adc1>] kmemcheck_fault+0x91/0xa0
[   24.537711]  [<ffffffff81043b45>] __do_page_fault+0x2a5/0x3f0
[   24.537712]  [<ffffffff812ede6a>] ? number.isra.14+0x2aa/0x2d0
[   24.537713]  [<ffffffff81043c9c>] do_page_fault+0xc/0x10
[   24.537714]  [<ffffffff81607522>] page_fault+0x22/0x30
[   24.537717]  [<ffffffff813a34d0>] ? lf+0x80/0x80
[   24.537718]  [<ffffffff813a356e>] ? vt_console_print+0x9e/0x3b0
[   24.537719]  [<ffffffff813a3511>] ? vt_console_print+0x41/0x3b0
[   24.537720]  [<ffffffff810914e3>] ? print_prefix+0xd3/0x1c0
[   24.537722]  [<ffffffff8109140a>] call_console_drivers.constprop.27+0xaa/0xb0
[   24.537723]  [<ffffffff81091fdc>] console_unlock+0x2ec/0x510
[   24.537724]  [<ffffffff81092aac>] vprintk_emit+0x2bc/0x510
[   24.537724]  [<ffffffff8104adc1>] ? kmemcheck_fault+0x91/0xa0
[   24.537725]  [<ffffffff81092e44>] vprintk_default+0x24/0x40
[   24.537727]  [<ffffffff810ed65a>] printk+0x43/0x4b
[   24.537728]  [<ffffffff8104adc1>] ? kmemcheck_fault+0x91/0xa0
[   24.537728]  [<ffffffff81056a98>] warn_slowpath_common+0x28/0xb0
[   24.537729]  [<ffffffff81056bd5>] warn_slowpath_null+0x15/0x20
[   24.537730]  [<ffffffff8104adc1>] kmemcheck_fault+0x91/0xa0
[   24.537731]  [<ffffffff81043b45>] __do_page_fault+0x2a5/0x3f0
[   24.537732]  [<ffffffff81043c9c>] do_page_fault+0xc/0x10
[   24.537732]  [<ffffffff81607522>] page_fault+0x22/0x30
[   24.537734]  [<ffffffff81014eae>] ? x86_perf_event_update+0xe/0x70
[   24.537735]  [<ffffffff8101cfcd>] ? intel_pmu_save_and_restart+0xd/0x50
[   24.537736]  [<ffffffff8101d165>] intel_pmu_handle_irq+0x155/0x3e0
[   24.537738]  [<ffffffff81014d36>] perf_event_nmi_handler+0x26/0x40
[   24.537739]  [<ffffffff810064d0>] nmi_handle+0x60/0xb0
[   24.537740]  [<ffffffff812e5262>] ? ida_pre_get+0x62/0xd0
[   24.537741]  [<ffffffff8100695b>] default_do_nmi+0x3b/0xf0
[   24.537741]  [<ffffffff81006ae6>] do_nmi+0xd6/0x120
[   24.537742]  [<ffffffff81607827>] end_repeat_nmi+0x1a/0x1e
[   24.537743]  [<ffffffff812e5262>] ? ida_pre_get+0x62/0xd0
[   24.537743]  [<ffffffff8104aa35>] ? kmemcheck_hide+0x25/0x140
[   24.537744]  [<ffffffff8104aa35>] ? kmemcheck_hide+0x25/0x140
[   24.537745]  [<ffffffff8104aa35>] ? kmemcheck_hide+0x25/0x140
[   24.537746]  <<EOE>>  <#DB>  [<ffffffff812e5262>] ? ida_pre_get+0x62/0xd0
[   24.537747]  [<ffffffff8104adef>] kmemcheck_trap+0x1f/0x30
[   24.537748]  [<ffffffff81004149>] do_debug+0x69/0x1a0
[   24.537749]  [<ffffffff8160740f>] debug+0x2f/0x60
[   24.537750]  [<ffffffff812e5262>] ? ida_pre_get+0x62/0xd0
[   24.537751]  [<ffffffff812f0ab9>] ? memset_erms+0x9/0x10
[   24.537753]  <<EOE>>  [<ffffffff8113de04>] ? kmem_cache_alloc+0x184/0x1d0
[   24.537754]  [<ffffffff812e5262>] ida_pre_get+0x62/0xd0
[   24.537754]  [<ffffffff812e5312>] ida_simple_get+0x42/0xe0
[   24.537755]  [<ffffffff8113dd31>] ? kmem_cache_alloc+0xb1/0x1d0
[   24.537757]  [<ffffffff811bcb1a>] __kernfs_new_node+0x5a/0xc0
[   24.537758]  [<ffffffff811bda91>] kernfs_new_node+0x21/0x40
[   24.537759]  [<ffffffff811bf217>] __kernfs_create_file+0x27/0x90
[   24.537761]  [<ffffffff811bfa04>] sysfs_add_file_mode_ns+0x94/0x180
[   24.537762]  [<ffffffff811bfb15>] sysfs_create_file_ns+0x25/0x30
[   24.537764]  [<ffffffff813d5a9d>] device_create_file+0x3d/0x90
[   24.537766]  [<ffffffff81350419>] acpi_device_setup_files+0x10e/0x209
[   24.537766]  [<ffffffff813530b2>] acpi_device_add+0x230/0x28f
[   24.537767]  [<ffffffff813532a4>] ? acpi_free_pnp_ids+0x4b/0x4b
[   24.537768]  [<ffffffff81353c8c>] acpi_add_single_object+0x4cc/0x525
[   24.537769]  [<ffffffff8134f421>] ? acpi_evaluate_integer+0x2f/0x4e
[   24.537769]  [<ffffffff8134ef76>] ? acpi_os_signal_semaphore+0x27/0x33
[   24.537770]  [<ffffffff81353da3>] acpi_bus_check_add+0xbe/0x16d
[   24.537772]  [<ffffffff8136e6dd>] acpi_ns_walk_namespace+0xdc/0x18e
[   24.537772]  [<ffffffff81353ce5>] ? acpi_add_single_object+0x525/0x525
[   24.537773]  [<ffffffff81353ce5>] ? acpi_add_single_object+0x525/0x525
[   24.537773]  [<ffffffff8136ebaf>] acpi_walk_namespace+0x97/0xcb
[   24.537774]  [<ffffffff81a1b0d7>] ? acpi_sleep_init+0xbb/0xbb
[   24.537775]  [<ffffffff81354097>] acpi_bus_scan+0x43/0x62
[   24.537775]  [<ffffffff81a1b514>] acpi_scan_init+0x5b/0x189
[   24.537776]  [<ffffffff81a1b333>] acpi_init+0x25c/0x274
[   24.537777]  [<ffffffff810002e6>] do_one_initcall+0xa6/0x1c0
[   24.537778]  [<ffffffff8106ff4a>] ? parse_args+0x26a/0x490
[   24.537780]  [<ffffffff819eb04c>] kernel_init_freeable+0x16a/0x1f5
[   24.537781]  [<ffffffff815fb030>] ? rest_init+0x80/0x80
[   24.537782]  [<ffffffff815fb039>] kernel_init+0x9/0xd0
[   24.537783]  [<ffffffff8160616f>] ret_from_fork+0x3f/0x70
[   24.537784]  [<ffffffff815fb030>] ? rest_init+0x80/0x80
[   24.537785] ---[ end trace ef72f67e8e798002 ]---


[   25.485937] general protection fault: 077b [#1] SMP 
[   25.491290] Modules linked in:
[   25.494618] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.3.0-rc6-net-next-MM-rx-hacks+ #595
[   25.504659] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z97 Extreme4, BIOS P2.10 05/12/2015
[   25.515062] task: ffff880419eb0000 ti: ffff880419e5c000 task.ti: ffff880419e5c000
[   25.523040] RIP: 0010:[<ffffffff81014eb5>]  [<ffffffff81014eb5>] x86_perf_event_update+0x15/0x70
[   25.532417] RSP: 0000:ffff88041dc05c18  EFLAGS: 00010283
[   25.538041] RAX: 0000000000000021 RBX: ffff880419975400 RCX: 0000000000000021
[   25.545556] RDX: 00000000ffffffff RSI: 0000000000000040 RDI: ffff880419975400
[   25.553065] RBP: ffff88041dc05c30 R08: ffff88041dc17f60 R09: 0000000000000010
[   25.560579] R10: ffff880419e5fb40 R11: 000000005254535f R12: ffff88041dc0b160
[   25.568088] R13: ffff88041dc0af60 R14: ffff880419975400 R15: 0000000000000021
[   25.575601] FS:  0000000000000000(0000) GS:ffff88041dc00000(0000) knlGS:0000000000000000
[   25.584207] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   25.590278] CR2: ffff88041d0401fc CR3: 00000000018f3000 CR4: 00000000001406f0
[   25.597790] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   25.605303] DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
[   25.612814] Stack:
[   25.615019]  ffff88041dc05c30 ffffffff8101cfcd 0000000000000001 ffff88041dc05e40
[   25.623072]  ffffffff8101d165 0000000000000000 0000000000000000 0000000000000000
[   25.631122]  0000000000000000 ffff88041dc0b8b0 0000006400000000 ffff88041dc05ef8
[   25.639171] Call Trace:
[   25.641825]  <NMI> 
[   25.643846]  [<ffffffff8101cfcd>] ? intel_pmu_save_and_restart+0xd/0x50
[   25.651135]  [<ffffffff8101d165>] intel_pmu_handle_irq+0x155/0x3e0
[   25.657658]  [<ffffffff81014d36>] perf_event_nmi_handler+0x26/0x40
[   25.664183]  [<ffffffff810064d0>] nmi_handle+0x60/0xb0
[   25.669628]  [<ffffffff812e5262>] ? ida_pre_get+0x62/0xd0
[   25.675338]  [<ffffffff8100695b>] default_do_nmi+0x3b/0xf0
[   25.681138]  [<ffffffff81006ae6>] do_nmi+0xd6/0x120
[   25.686313]  [<ffffffff81607827>] end_repeat_nmi+0x1a/0x1e
[   25.692115]  [<ffffffff812e5262>] ? ida_pre_get+0x62/0xd0
[   25.697830]  [<ffffffff8104aa35>] ? kmemcheck_hide+0x25/0x140
[   25.703902]  [<ffffffff8104aa35>] ? kmemcheck_hide+0x25/0x140
[   25.709974]  [<ffffffff8104aa35>] ? kmemcheck_hide+0x25/0x140
[   25.716047]  <<EOE>> 
[   25.718249]  <#DB> [   25.720584]  [<ffffffff812e5262>] ? ida_pre_get+0x62/0xd0
[   25.726404]  [<ffffffff8104adef>] kmemcheck_trap+0x1f/0x30
[   25.732203]  [<ffffffff81004149>] do_debug+0x69/0x1a0
[   25.737560]  [<ffffffff8160740f>] debug+0x2f/0x60
[   25.742553]  [<ffffffff812e5262>] ? ida_pre_get+0x62/0xd0
[   25.748265]  [<ffffffff812f0ab9>] ? memset_erms+0x9/0x10
[   25.753884]  <<EOE>> 
[   25.756086]  [<ffffffff8113de04>] ? kmem_cache_alloc+0x184/0x1d0
[   25.762740]  [<ffffffff812e5262>] ida_pre_get+0x62/0xd0
[   25.768272]  [<ffffffff812e5312>] ida_simple_get+0x42/0xe0
[   25.774079]  [<ffffffff8113dd31>] ? kmem_cache_alloc+0xb1/0x1d0
[   25.780332]  [<ffffffff811bcb1a>] __kernfs_new_node+0x5a/0xc0
[   25.786408]  [<ffffffff811bda91>] kernfs_new_node+0x21/0x40
[   25.792303]  [<ffffffff811bf217>] __kernfs_create_file+0x27/0x90
[   25.798645]  [<ffffffff811bfa04>] sysfs_add_file_mode_ns+0x94/0x180
[   25.805255]  [<ffffffff811bfb15>] sysfs_create_file_ns+0x25/0x30
[   25.811596]  [<ffffffff813d5a9d>] device_create_file+0x3d/0x90
[   25.817760]  [<ffffffff81350419>] acpi_device_setup_files+0x10e/0x209
[   25.824549]  [<ffffffff813530b2>] acpi_device_add+0x230/0x28f
[   25.830622]  [<ffffffff813532a4>] ? acpi_free_pnp_ids+0x4b/0x4b
[   25.836876]  [<ffffffff81353c8c>] acpi_add_single_object+0x4cc/0x525
[   25.843581]  [<ffffffff8134f421>] ? acpi_evaluate_integer+0x2f/0x4e
[   25.850197]  [<ffffffff8134ef76>] ? acpi_os_signal_semaphore+0x27/0x33
[   25.857078]  [<ffffffff81353da3>] acpi_bus_check_add+0xbe/0x16d
[   25.863326]  [<ffffffff8136e6dd>] acpi_ns_walk_namespace+0xdc/0x18e
[   25.869942]  [<ffffffff81353ce5>] ? acpi_add_single_object+0x525/0x525
[   25.876822]  [<ffffffff81353ce5>] ? acpi_add_single_object+0x525/0x525
[   25.883702]  [<ffffffff8136ebaf>] acpi_walk_namespace+0x97/0xcb
[   25.889952]  [<ffffffff81a1b0d7>] ? acpi_sleep_init+0xbb/0xbb
[   25.896027]  [<ffffffff81354097>] acpi_bus_scan+0x43/0x62
[   25.901741]  [<ffffffff81a1b514>] acpi_scan_init+0x5b/0x189
[   25.907630]  [<ffffffff81a1b333>] acpi_init+0x25c/0x274
[   25.913164]  [<ffffffff810002e6>] do_one_initcall+0xa6/0x1c0
[   25.919150]  [<ffffffff8106ff4a>] ? parse_args+0x26a/0x490
[   25.924956]  [<ffffffff819eb04c>] kernel_init_freeable+0x16a/0x1f5
[   25.931482]  [<ffffffff815fb030>] ? rest_init+0x80/0x80
[   25.937015]  [<ffffffff815fb039>] kernel_init+0x9/0xd0
[   25.942462]  [<ffffffff8160616f>] ret_from_fork+0x3f/0x70
[   25.948173]  [<ffffffff815fb030>] ? rest_init+0x80/0x80
[   25.953702] Code: 8b 12 00 48 89 df e8 9b 8b 12 00 eb e5 66 0f 1f 84 00 00 00 00 00 41 b9 40 00 00 00 44 2b 0d a 
[   25.976301] RIP  [<ffffffff81014eb5>] x86_perf_event_update+0x15/0x70
[   25.983135]  RSP <ffff88041dc05c18>
[   25.986872] ---[ end trace ef72f67e8e798004 ]---
[   25.991775] Kernel panic - not syncing: Fatal exception in interrupt
[   25.998480] ---[ end Kernel panic - not syncing: Fatal exception in interrupt




^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH V4 0/2] SLUB bulk API interactions with kmem cgroup
  2015-11-05 16:10 ` [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup Vladimir Davydov
  2015-11-09 18:16   ` [PATCH V3 " Jesper Dangaard Brouer
@ 2015-11-13 10:57   ` Jesper Dangaard Brouer
  2015-11-13 10:57     ` [PATCH V4 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
  2015-11-13 10:57     ` [PATCH V4 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
  1 sibling, 2 replies; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-13 10:57 UTC (permalink / raw)
  To: linux-mm
  Cc: vdavydov, Joonsoo Kim, Andrew Morton, Christoph Lameter,
	Jesper Dangaard Brouer

Added correct support for kmem cgroup interaction with the SLUB bulk API.

Kernel tested with CONFIG_MEMCG_KMEM=y and the memcg setup[4] provided
by Vladimir Davydov, and with my network stack use-case patchset
applied, to actually exercise the API.

Patch01: I've verified the loop in slab_post_alloc_hook() gets removed
 by the compiler (when no debug options are defined). This was actually
 tricky due to kernel gcc compile options, and I wrote a small program
 to figure this out [1].
 Also tested memory exhaustion[3] to verify the error: (label) code path.
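
As a rough illustration of the property being checked (hypothetical
stand-in names; the real check is the standalone program in [1]): with
all debug hooks compiled out the loop body is empty, and giving the
iterator the same unsigned type (size_t) as the bound lets GCC delete
the loop entirely, even under -fno-strict-overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for slab_post_alloc_hook() with the
 * kmemcheck/kmemleak/kasan calls compiled out: the body does nothing.
 * Because "i" is size_t, matching the type of "size", GCC can prove
 * the trip count and remove the loop.  With a signed "int i" compared
 * against a size_t bound, -fno-strict-overflow forces the compiler to
 * be more conservative and the loop may survive. */
static inline void post_alloc_hook_stub(size_t size, void **p)
{
	size_t i;

	for (i = 0; i < size; i++)
		(void)p[i];	/* debug calls evaluate to nothing here */
}
```

Compiling this with `gcc -O2 -fno-strict-overflow -S` and inspecting
the assembly (or using objdump, as done for the real kernel build)
shows whether the loop was removed.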

Patch02: If CONFIG_MEMCG_KMEM is enabled, we no longer handle error
 cases like passing NULL pointers in the array to free.  The
 "try_crash" mode test of module slab_bulk_test03 [2] has been adjusted.

Kernel config wise, ran with combinations of:
 CONFIG_DEBUG_KMEMLEAK, CONFIG_KASAN and CONFIG_MEMCG_KMEM
 (Explicitly disabled/avoided CONFIG_KMEMCHECK)

[1] https://github.com/netoptimizer/network-testing/blob/master/src/compiler_test01.c
[2] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test03.c
[3] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test04_exhaust_mem.c
[4] http://thread.gmane.org/gmane.linux.kernel.mm/140860/focus=140865

---

Jesper Dangaard Brouer (2):
      slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
      slub: add missing kmem cgroup support to kmem_cache_free_bulk


 mm/slub.c |   46 +++++++++++++++++++++++++++-------------------
 1 file changed, 27 insertions(+), 19 deletions(-)

--


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH V4 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-13 10:57   ` [PATCH V4 0/2] SLUB bulk API interactions with kmem cgroup Jesper Dangaard Brouer
@ 2015-11-13 10:57     ` Jesper Dangaard Brouer
  2015-11-14 11:04       ` Vladimir Davydov
  2015-11-13 10:57     ` [PATCH V4 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
  1 sibling, 1 reply; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-13 10:57 UTC (permalink / raw)
  To: linux-mm
  Cc: vdavydov, Joonsoo Kim, Andrew Morton, Christoph Lameter,
	Jesper Dangaard Brouer

The call slab_pre_alloc_hook() interacts with kmemcg and must not be
called several times inside the bulk alloc for-loop, due to the call
to memcg_kmem_get_cache().

This would result in hitting the VM_BUG_ON in __memcg_kmem_get_cache.

As suggested by Vladimir Davydov, change slab_post_alloc_hook()
to be able to handle an array of objects.

A subtle detail: the loop iterator "i" in slab_post_alloc_hook()
must have the same type (size_t) as the size argument.  This makes it
easier for the compiler to realize that it can remove the loop, when
all debug statements inside the loop evaluate to nothing.
 Note, this is only an issue because the kernel is compiled with the
GCC option -fno-strict-overflow.

In slab_alloc_node() the compiler inlines and optimizes the invocation
of slab_post_alloc_hook(s, flags, 1, &object) by removing the loop and
accessing object directly.

Reported-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
V4:
 - Adjust slab_alloc_node() so @object has the correct type.
 Did an objdump to verify the compiler optimized out the loop and the
 &-address of @object.
 Wrote a tool/module that exhausts memory, for error path testing:
 [1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test04_exhaust_mem.c

V3:
 Addressed and incorporated suggestions from Vladimir

 mm/slub.c |   40 ++++++++++++++++++++++------------------
 1 file changed, 22 insertions(+), 18 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 9be12ffae9fc..efcddd223369 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1292,14 +1292,21 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s,
 	return memcg_kmem_get_cache(s, flags);
 }
 
-static inline void slab_post_alloc_hook(struct kmem_cache *s,
-					gfp_t flags, void *object)
+static inline void slab_post_alloc_hook(struct kmem_cache *s, gfp_t flags,
+					size_t size, void **p)
 {
+	size_t i;
+
 	flags &= gfp_allowed_mask;
-	kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
-	kmemleak_alloc_recursive(object, s->object_size, 1, s->flags, flags);
+	for (i = 0; i < size; i++) {
+		void *object = p[i];
+
+		kmemcheck_slab_alloc(s, flags, object, slab_ksize(s));
+		kmemleak_alloc_recursive(object, s->object_size, 1,
+					 s->flags, flags);
+		kasan_slab_alloc(s, object);
+	}
 	memcg_kmem_put_cache(s);
-	kasan_slab_alloc(s, object);
 }
 
 static inline void slab_free_hook(struct kmem_cache *s, void *x)
@@ -2477,7 +2484,7 @@ static void *__slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
 static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 		gfp_t gfpflags, int node, unsigned long addr)
 {
-	void **object;
+	void *object;
 	struct kmem_cache_cpu *c;
 	struct page *page;
 	unsigned long tid;
@@ -2556,7 +2563,7 @@ redo:
 	if (unlikely(gfpflags & __GFP_ZERO) && object)
 		memset(object, 0, s->object_size);
 
-	slab_post_alloc_hook(s, gfpflags, object);
+	slab_post_alloc_hook(s, gfpflags, 1, &object);
 
 	return object;
 }
@@ -2906,6 +2913,10 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	struct kmem_cache_cpu *c;
 	int i;
 
+	/* memcg and kmem_cache debug support */
+	s = slab_pre_alloc_hook(s, flags);
+	if (unlikely(!s))
+		return false;
 	/*
 	 * Drain objects in the per cpu slab, while disabling local
 	 * IRQs, which protects against PREEMPT and interrupts
@@ -2930,17 +2941,8 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			c = this_cpu_ptr(s->cpu_slab);
 			continue; /* goto for-loop */
 		}
-
-		/* kmem_cache debug support */
-		s = slab_pre_alloc_hook(s, flags);
-		if (unlikely(!s))
-			goto error;
-
 		c->freelist = get_freepointer(s, object);
 		p[i] = object;
-
-		/* kmem_cache debug support */
-		slab_post_alloc_hook(s, flags, object);
 	}
 	c->tid = next_tid(c->tid);
 	local_irq_enable();
@@ -2953,11 +2955,13 @@ bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 			memset(p[j], 0, s->object_size);
 	}
 
+	/* memcg and kmem_cache debug support */
+	slab_post_alloc_hook(s, flags, size, p);
 	return true;
-
 error:
-	__kmem_cache_free_bulk(s, i, p);
 	local_irq_enable();
+	slab_post_alloc_hook(s, flags, i, p);
+	__kmem_cache_free_bulk(s, i, p);
 	return false;
 }
 EXPORT_SYMBOL(kmem_cache_alloc_bulk);


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH V4 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk
  2015-11-13 10:57   ` [PATCH V4 0/2] SLUB bulk API interactions with kmem cgroup Jesper Dangaard Brouer
  2015-11-13 10:57     ` [PATCH V4 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
@ 2015-11-13 10:57     ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 33+ messages in thread
From: Jesper Dangaard Brouer @ 2015-11-13 10:57 UTC (permalink / raw)
  To: linux-mm
  Cc: vdavydov, Joonsoo Kim, Andrew Morton, Christoph Lameter,
	Jesper Dangaard Brouer

The initial implementation missed kmem cgroup support in the
kmem_cache_free_bulk() call; add it.

If CONFIG_MEMCG_KMEM is not enabled, the compiler should be smart
enough not to add any asm code.

Incoming bulk free objects can belong to different kmem cgroups, and
the object free call can happen at a later point outside the memcg
context.  Thus, we need to keep the original kmem_cache, to correctly
verify whether a memcg object matches against its "root_cache"
(s->memcg_params.root_cache).
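
A simplified userspace model of that per-object lookup (hypothetical
type and function names, not the kernel's cache_from_obj() code):
memcg child caches share one root cache, so each object in the array
must be resolved to its own cache via the root relationship.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: child caches point at a shared root cache, and
 * a root cache is its own root. */
struct cache_model {
	struct cache_model *root_cache;
};

struct obj_model {
	struct cache_model *cache;	/* cache the object came from */
};

/* Model of the per-object lookup: given the cache the caller passed in
 * and an object, return the object's own (possibly memcg child) cache
 * when it matches the passed-in cache or shares its root. */
static struct cache_model *cache_from_obj_model(struct cache_model *orig_s,
						struct obj_model *x)
{
	struct cache_model *cachep = x->cache;

	if (cachep == orig_s || cachep->root_cache == orig_s)
		return cachep;
	return orig_s;		/* mismatch; the real code would warn */
}
```

This is why kmem_cache_free_bulk() must keep @orig_s and redo the
lookup for each detached freelist: two objects in @p can resolve to
two different child caches of the same root.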

Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

---
V3:
 Learned more about memcg, actually tested it and fixed a bug

V2: Fixes according to input from:
 Vladimir Davydov <vdavydov@virtuozzo.com>
 and Joonsoo Kim <iamjoonsoo.kim@lge.com>

 mm/slub.c |    6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/mm/slub.c b/mm/slub.c
index efcddd223369..d52ac8a2b147 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2889,13 +2889,17 @@ static int build_detached_freelist(struct kmem_cache *s, size_t size,
 
 
 /* Note that interrupts must be enabled when calling this function. */
-void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
+void kmem_cache_free_bulk(struct kmem_cache *orig_s, size_t size, void **p)
 {
 	if (WARN_ON(!size))
 		return;
 
 	do {
 		struct detached_freelist df;
+		struct kmem_cache *s;
+
+		/* Support for memcg */
+		s = cache_from_obj(orig_s, p[size - 1]);
 
 		size = build_detached_freelist(s, size, p, &df);
 		if (unlikely(!df.page))


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH V4 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk
  2015-11-13 10:57     ` [PATCH V4 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
@ 2015-11-14 11:04       ` Vladimir Davydov
  0 siblings, 0 replies; 33+ messages in thread
From: Vladimir Davydov @ 2015-11-14 11:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: linux-mm, Joonsoo Kim, Andrew Morton, Christoph Lameter

On Fri, Nov 13, 2015 at 11:57:40AM +0100, Jesper Dangaard Brouer wrote:
> The call slab_pre_alloc_hook() interacts with kmemcg and must not be
> called several times inside the bulk alloc for-loop, due to the call
> to memcg_kmem_get_cache().
> 
> This would result in hitting the VM_BUG_ON in __memcg_kmem_get_cache.
> 
> As suggested by Vladimir Davydov, change slab_post_alloc_hook()
> to be able to handle an array of objects.
> 
> A subtle detail: the loop iterator "i" in slab_post_alloc_hook()
> must have the same type (size_t) as the size argument.  This makes it
> easier for the compiler to realize that it can remove the loop, when
> all debug statements inside the loop evaluate to nothing.
>  Note, this is only an issue because the kernel is compiled with the
> GCC option -fno-strict-overflow.
> 
> In slab_alloc_node() the compiler inlines and optimizes the invocation
> of slab_post_alloc_hook(s, flags, 1, &object) by removing the loop and
> accessing object directly.
> 
> Reported-by: Vladimir Davydov <vdavydov@virtuozzo.com>
> Suggested-by: Vladimir Davydov <vdavydov@virtuozzo.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Reviewed-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Thanks!


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2015-11-14 11:04 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-05 15:37 [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup Jesper Dangaard Brouer
2015-11-05 15:37 ` [PATCH V2 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
2015-11-05 16:18   ` Vladimir Davydov
2015-11-07 16:40     ` Jesper Dangaard Brouer
2015-11-05 15:38 ` [PATCH V2 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
2015-11-05 16:25   ` Vladimir Davydov
2015-11-07 16:53     ` Jesper Dangaard Brouer
2015-11-07 20:25       ` Vladimir Davydov
2015-11-07 20:55         ` Christoph Lameter
2015-11-09 16:39         ` Jesper Dangaard Brouer
2015-11-09 18:38           ` Vladimir Davydov
2015-11-05 16:10 ` [PATCH V2 0/2] SLUB bulk API interactions with kmem cgroup Vladimir Davydov
2015-11-09 18:16   ` [PATCH V3 " Jesper Dangaard Brouer
2015-11-09 18:17     ` [PATCH V3 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
2015-11-09 19:13       ` Vladimir Davydov
2015-11-09 20:25         ` Jesper Dangaard Brouer
2015-11-10  8:46           ` Vladimir Davydov
2015-11-10 15:55             ` Jesper Dangaard Brouer
2015-11-10 16:17               ` Christoph Lameter
2015-11-10 18:32               ` Vladimir Davydov
2015-11-11 15:28                 ` Jesper Dangaard Brouer
2015-11-11 18:30                   ` Jesper Dangaard Brouer
2015-11-11 18:56                     ` Vladimir Davydov
2015-11-12 14:27                       ` Jesper Dangaard Brouer
2015-11-09 22:04         ` Christoph Lameter
2015-11-10  8:30           ` Vladimir Davydov
2015-11-10 15:23             ` Christoph Lameter
2015-11-09 18:17     ` [PATCH V3 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
2015-11-09 18:56       ` Vladimir Davydov
2015-11-13 10:57   ` [PATCH V4 0/2] SLUB bulk API interactions with kmem cgroup Jesper Dangaard Brouer
2015-11-13 10:57     ` [PATCH V4 1/2] slub: fix kmem cgroup bug in kmem_cache_alloc_bulk Jesper Dangaard Brouer
2015-11-14 11:04       ` Vladimir Davydov
2015-11-13 10:57     ` [PATCH V4 2/2] slub: add missing kmem cgroup support to kmem_cache_free_bulk Jesper Dangaard Brouer
