* [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
@ 2011-07-20 12:16 ` Konstantin Khlebnikov
  0 siblings, 0 replies; 67+ messages in thread
From: Konstantin Khlebnikov @ 2011-07-20 12:16 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Pekka Enberg, linux-mm, Christoph Lameter, linux-kernel, Matt Mackall

The order of sizeof(struct kmem_cache) can be bigger than PAGE_ALLOC_COSTLY_ORDER,
so there is a good chance that the allocation will fail.
With __GFP_REPEAT the buddy allocator will reclaim/compact memory more aggressively.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/slab.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index d96e223..53bddc8 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2304,7 +2304,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 		gfp = GFP_NOWAIT;
 
 	/* Get cache's description obj. */
-	cachep = kmem_cache_zalloc(&cache_cache, gfp);
+	cachep = kmem_cache_zalloc(&cache_cache, gfp | __GFP_REPEAT);
 	if (!cachep)
 		goto oops;
 


^ permalink raw reply related	[flat|nested] 67+ messages in thread
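
For reference (not part of the thread): a minimal sketch of the size/order relationship the changelog refers to. The helper below is hypothetical; only get_order(), PAGE_ALLOC_COSTLY_ORDER, struct kmem_cache and the gfp flags are real kernel symbols of that era.

#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/slab.h>

/*
 * Illustration only: the cache descriptor is sized like struct kmem_cache,
 * so its page order grows with NR_CPUS/MAX_NUMNODES.  Once the order
 * exceeds PAGE_ALLOC_COSTLY_ORDER (3), the page allocator gives up much
 * sooner unless __GFP_REPEAT asks it to keep reclaiming/compacting.
 */
static gfp_t cache_create_gfp(gfp_t gfp)
{
	unsigned int order = get_order(sizeof(struct kmem_cache));

	if (order > PAGE_ALLOC_COSTLY_ORDER)
		gfp |= __GFP_REPEAT;	/* the patch above sets this unconditionally */
	return gfp;
}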

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 12:16 ` Konstantin Khlebnikov
@ 2011-07-20 13:14   ` Pekka Enberg
  -1 siblings, 0 replies; 67+ messages in thread
From: Pekka Enberg @ 2011-07-20 13:14 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel,
	Matt Mackall, mgorman

On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
> Order of sizeof(struct kmem_cache) can be bigger than PAGE_ALLOC_COSTLY_ORDER,
> thus there is a good chance of unsuccessful allocation.
> With __GFP_REPEAT buddy-allocator will reclaim/compact memory more aggressively.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> ---
> mm/slab.c |    2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index d96e223..53bddc8 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2304,7 +2304,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
> 		gfp = GFP_NOWAIT;
>
> 	/* Get cache's description obj. */
> -	cachep = kmem_cache_zalloc(&cache_cache, gfp);
> +	cachep = kmem_cache_zalloc(&cache_cache, gfp | __GFP_REPEAT);
> 	if (!cachep)
> 		goto oops;

The changelog isn't that convincing, really. This is kmem_cache_create() 
so I'm surprised we'd ever get NULL here in practice. Does this fix some 
problem you're seeing? If this is really an issue, I'd blame the page 
allocator as GFP_KERNEL should just work.

 			Pekka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:14   ` Pekka Enberg
@ 2011-07-20 13:28     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 67+ messages in thread
From: Konstantin Khlebnikov @ 2011-07-20 13:28 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel,
	Matt Mackall, mgorman

Pekka Enberg wrote:
> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>> Order of sizeof(struct kmem_cache) can be bigger than PAGE_ALLOC_COSTLY_ORDER,
>> thus there is a good chance of unsuccessful allocation.
>> With __GFP_REPEAT buddy-allocator will reclaim/compact memory more aggressively.
>>
>> Signed-off-by: Konstantin Khlebnikov<khlebnikov@openvz.org>
>> ---
>> mm/slab.c |    2 +-
>> 1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/slab.c b/mm/slab.c
>> index d96e223..53bddc8 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -2304,7 +2304,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
>> 		gfp = GFP_NOWAIT;
>>
>> 	/* Get cache's description obj. */
>> -	cachep = kmem_cache_zalloc(&cache_cache, gfp);
>> +	cachep = kmem_cache_zalloc(&cache_cache, gfp | __GFP_REPEAT);
>> 	if (!cachep)
>> 		goto oops;
>
> The changelog isn't that convincing, really. This is kmem_cache_create()
> so I'm surprised we'd ever get NULL here in practice. Does this fix some
> problem you're seeing? If this is really an issue, I'd blame the page
> allocator as GFP_KERNEL should just work.

nf_conntrack creates a separate slab cache for each net namespace.
This patch of course does not eliminate the chance of failure, but it makes failure much less likely.

struct kmem_cache for slub is more compact; it uses percpu pointers instead of a dumb NR_CPUS-sized array.
It is probably better to fix this side...

>
>   			Pekka


^ permalink raw reply	[flat|nested] 67+ messages in thread
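
For reference (not part of the thread): roughly how nf_conntrack sets up its per-namespace cache in kernels of that era, paraphrased from nf_conntrack_core.c; the exact field names and error handling may differ slightly.

/* Paraphrased sketch of the per-netns cache creation. */
static int nf_conntrack_init_net_sketch(struct net *net)
{
	/* one slab cache per net namespace, hence a fresh kmem_cache_create() per container */
	net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%p", net);
	if (!net->ct.slabname)
		return -ENOMEM;

	net->ct.nf_conntrack_cachep = kmem_cache_create(net->ct.slabname,
							sizeof(struct nf_conn), 0,
							SLAB_DESTROY_BY_RCU, NULL);
	if (!net->ct.nf_conntrack_cachep) {
		printk(KERN_ERR "Unable to create nf_conn slab cache\n");
		kfree(net->ct.slabname);
		return -ENOMEM;
	}
	return 0;
}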

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:28     ` Konstantin Khlebnikov
@ 2011-07-20 13:42       ` Pekka Enberg
  -1 siblings, 0 replies; 67+ messages in thread
From: Pekka Enberg @ 2011-07-20 13:42 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel,
	Matt Mackall, mgorman

On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>> The changelog isn't that convincing, really. This is kmem_cache_create()
>> so I'm surprised we'd ever get NULL here in practice. Does this fix some
>> problem you're seeing? If this is really an issue, I'd blame the page
>> allocator as GFP_KERNEL should just work.
>
> nf_conntrack creates separate slab-cache for each net-namespace,
> this patch of course not eliminates the chance of failure, but makes it more 
> acceptable.

I'm still surprised you are seeing failures. mm/slab.c hasn't changed 
significantly in a long time. Why hasn't anyone reported this before? I'd 
still be inclined to shift the blame to the page allocator... Mel, 
Christoph?

On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
> struct kmem_size for slub is more compact, it uses pecpu-pointers instead of 
> dumb NR_CPUS-size array.
> probably better to fix this side...

So how big is 'struct kmem_cache' for your configuration anyway? Fixing 
the per-cpu data structures would be nice but I'm guessing it'll be 
slightly painful for mm/slab.c.

 			Pekka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:14   ` Pekka Enberg
@ 2011-07-20 13:43     ` Mel Gorman
  -1 siblings, 0 replies; 67+ messages in thread
From: Mel Gorman @ 2011-07-20 13:43 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Konstantin Khlebnikov, Andrew Morton, linux-mm,
	Christoph Lameter, linux-kernel, Matt Mackall

On Wed, Jul 20, 2011 at 04:14:51PM +0300, Pekka Enberg wrote:
> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
> >Order of sizeof(struct kmem_cache) can be bigger than PAGE_ALLOC_COSTLY_ORDER,
> >thus there is a good chance of unsuccessful allocation.
> >With __GFP_REPEAT buddy-allocator will reclaim/compact memory more aggressively.
> >
> >Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> >---
> >mm/slab.c |    2 +-
> >1 files changed, 1 insertions(+), 1 deletions(-)
> >
> >diff --git a/mm/slab.c b/mm/slab.c
> >index d96e223..53bddc8 100644
> >--- a/mm/slab.c
> >+++ b/mm/slab.c
> >@@ -2304,7 +2304,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
> >		gfp = GFP_NOWAIT;
> >
> >	/* Get cache's description obj. */
> >-	cachep = kmem_cache_zalloc(&cache_cache, gfp);
> >+	cachep = kmem_cache_zalloc(&cache_cache, gfp | __GFP_REPEAT);
> >	if (!cachep)
> >		goto oops;
> 
> The changelog isn't that convincing, really. This is
> kmem_cache_create() so I'm surprised we'd ever get NULL here in
> practice. Does this fix some problem you're seeing? If this is
> really an issue, I'd blame the page allocator as GFP_KERNEL should
> just work.
> 

Besides, is allocating from cache_cache really a
PAGE_ALLOC_COSTLY_ORDER allocation? On my laptop at least, it's an
order-2 allocation supporting up to 512 CPUs and 512 nodes.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread
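
For reference (not part of the thread): the rough arithmetic behind the order-2 figure above, assuming 64-bit pointers and 4 KiB pages; the exact layout adds a few hundred bytes of fixed fields.

/*
 *   array[NR_CPUS]          : 512 * 8 = 4096 bytes
 *   nodelists[MAX_NUMNODES] : 512 * 8 = 4096 bytes
 *   fixed fields            : a few hundred bytes
 *   total                   : ~8.5 KiB -> get_order() == 2, a 16 KiB allocation
 */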

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:42       ` Pekka Enberg
@ 2011-07-20 13:50         ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 67+ messages in thread
From: Konstantin Khlebnikov @ 2011-07-20 13:50 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel,
	Matt Mackall, mgorman

Pekka Enberg wrote:
> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>>> The changelog isn't that convincing, really. This is kmem_cache_create()
>>> so I'm surprised we'd ever get NULL here in practice. Does this fix some
>>> problem you're seeing? If this is really an issue, I'd blame the page
>>> allocator as GFP_KERNEL should just work.
>>
>> nf_conntrack creates separate slab-cache for each net-namespace,
>> this patch of course not eliminates the chance of failure, but makes it more
>> acceptable.
>
> I'm still surprised you are seeing failures. mm/slab.c hasn't changed
> significantly in a long time. Why hasn't anyone reported this before? I'd
> still be inclined to shift the blame to the page allocator... Mel,
> Christoph?
>
> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>> struct kmem_size for slub is more compact, it uses pecpu-pointers instead of
>> dumb NR_CPUS-size array.
>> probably better to fix this side...
>
> So how big is 'struct kmem_cache' for your configuration anyway? Fixing
> the per-cpu data structures would be nice but I'm guessing it'll be
> slightly painful for mm/slab.c.

With NR_CPUS=4096 and MAX_NUMNODES=512 it's over 9k!
So it requires an order-4 page, while PAGE_ALLOC_COSTLY_ORDER is 3.

>
>   			Pekka


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:50         ` Konstantin Khlebnikov
@ 2011-07-20 13:53           ` Pekka Enberg
  -1 siblings, 0 replies; 67+ messages in thread
From: Pekka Enberg @ 2011-07-20 13:53 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel,
	Matt Mackall, mgorman

On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>>>> The changelog isn't that convincing, really. This is kmem_cache_create()
>>>> so I'm surprised we'd ever get NULL here in practice. Does this fix some
>>>> problem you're seeing? If this is really an issue, I'd blame the page
>>>> allocator as GFP_KERNEL should just work.
>>> 
>>> nf_conntrack creates separate slab-cache for each net-namespace,
>>> this patch of course not eliminates the chance of failure, but makes it 
>>> more
>>> acceptable.
>> 
>> I'm still surprised you are seeing failures. mm/slab.c hasn't changed
>> significantly in a long time. Why hasn't anyone reported this before? I'd
>> still be inclined to shift the blame to the page allocator... Mel,
>> Christoph?
>> 
>> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>>> struct kmem_size for slub is more compact, it uses pecpu-pointers instead 
>>> of
>>> dumb NR_CPUS-size array.
>>> probably better to fix this side...
>> 
>> So how big is 'struct kmem_cache' for your configuration anyway? Fixing
>> the per-cpu data structures would be nice but I'm guessing it'll be
>> slightly painful for mm/slab.c.
>
> With NR_CPUS=4096 and MAX_NUMNODES=512 its over 9k!
> so it require order-4 page, meanwhile PAGE_ALLOC_COSTLY_ORDER is 3

That's somewhat sad. I suppose I can just merge your patch unless other 
people object to it. I'd like a v2 with a better changelog, though.

 			Pekka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:42       ` Pekka Enberg
@ 2011-07-20 13:54         ` Christoph Lameter
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Lameter @ 2011-07-20 13:54 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel,
	Matt Mackall, mgorman

On Wed, 20 Jul 2011, Pekka Enberg wrote:

> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
> > > The changelog isn't that convincing, really. This is kmem_cache_create()
> > > so I'm surprised we'd ever get NULL here in practice. Does this fix some
> > > problem you're seeing? If this is really an issue, I'd blame the page
> > > allocator as GFP_KERNEL should just work.
> >
> > nf_conntrack creates separate slab-cache for each net-namespace,
> > this patch of course not eliminates the chance of failure, but makes it more
> > acceptable.
>
> I'm still surprised you are seeing failures. mm/slab.c hasn't changed
> significantly in a long time. Why hasn't anyone reported this before? I'd
> still be inclined to shift the blame to the page allocator... Mel, Christoph?

There was a lot of recent fiddling with the reclaim logic. Maybe some of
those changes caused the problem?

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:43     ` Mel Gorman
@ 2011-07-20 13:56       ` Christoph Lameter
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Lameter @ 2011-07-20 13:56 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm,
	linux-kernel, Matt Mackall

On Wed, 20 Jul 2011, Mel Gorman wrote:

> > The changelog isn't that convincing, really. This is
> > kmem_cache_create() so I'm surprised we'd ever get NULL here in
> > practice. Does this fix some problem you're seeing? If this is
> > really an issue, I'd blame the page allocator as GFP_KERNEL should
> > just work.
> >
>
> Besides, is allocating from cache_cache really a
> PAGE_ALLOC_COSTLY_ORDER allocation? On my laptop at least, it's an
> order-2 allocation which is supporting up to 512 CPUs and 512 nodes.

Slab's kmem_cache is configured with an array sized by NR_CPUS, which is the
maximum number of cpus supported. Some distros support 4096 cpus in order to
accommodate SGI machines. That array will then have a size of 4096 * 8 = 32k.


^ permalink raw reply	[flat|nested] 67+ messages in thread
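
For reference (not part of the thread): an abridged paraphrase of the SLAB-era descriptor being discussed; most tunable, layout and debug fields are elided, and the struct is renamed to make clear it is a sketch rather than the real definition.

struct array_cache;	/* per-cpu object stash */
struct kmem_list3;	/* per-node partial/full/free lists */

struct kmem_cache_sketch {
	/* touched on every alloc/free: one slot per possible cpu, sized at compile time */
	struct array_cache *array[NR_CPUS];
	/* ... batchcount, limit, buffer_size, flags, name, ... elided ... */
	/*
	 * Kept last on purpose: cache_cache.buffer_size is trimmed to
	 * offsetof(..., nodelists) + nr_node_ids * sizeof(pointer), so only
	 * the part of this array that can actually be used gets allocated.
	 */
	struct kmem_list3 *nodelists[MAX_NUMNODES];
};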

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:50         ` Konstantin Khlebnikov
@ 2011-07-20 13:59           ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 67+ messages in thread
From: Konstantin Khlebnikov @ 2011-07-20 13:59 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel,
	Matt Mackall, mgorman

Konstantin Khlebnikov wrote:
> Pekka Enberg wrote:
>> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>>>> The changelog isn't that convincing, really. This is kmem_cache_create()
>>>> so I'm surprised we'd ever get NULL here in practice. Does this fix some
>>>> problem you're seeing? If this is really an issue, I'd blame the page
>>>> allocator as GFP_KERNEL should just work.
>>>
>>> nf_conntrack creates separate slab-cache for each net-namespace,
>>> this patch of course not eliminates the chance of failure, but makes it more
>>> acceptable.
>>
>> I'm still surprised you are seeing failures. mm/slab.c hasn't changed
>> significantly in a long time. Why hasn't anyone reported this before? I'd
>> still be inclined to shift the blame to the page allocator... Mel,
>> Christoph?
>>
>> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>>> struct kmem_size for slub is more compact, it uses pecpu-pointers instead of
>>> dumb NR_CPUS-size array.
>>> probably better to fix this side...
>>
>> So how big is 'struct kmem_cache' for your configuration anyway? Fixing
>> the per-cpu data structures would be nice but I'm guessing it'll be
>> slightly painful for mm/slab.c.
>
> With NR_CPUS=4096 and MAX_NUMNODES=512 its over 9k!
> so it require order-4 page, meanwhile PAGE_ALLOC_COSTLY_ORDER is 3

Sorry, it is 0x9070 bytes: 36+ KB, 9+ pages.

>
>>
>>    			Pekka
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/


^ permalink raw reply	[flat|nested] 67+ messages in thread
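
For reference (not part of the thread): the corrected figure decomposes neatly for that configuration, assuming 64-bit pointers and 4 KiB pages.

/*
 *   array[NR_CPUS]          : 4096 * 8 = 32768 bytes
 *   nodelists[MAX_NUMNODES] :  512 * 8 =  4096 bytes
 *   fixed fields            :              112 bytes
 *   total                   :            36976 bytes = 0x9070
 *   get_order(0x9070)       : 4, a 64 KiB allocation, above PAGE_ALLOC_COSTLY_ORDER (3)
 */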

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:56       ` Christoph Lameter
@ 2011-07-20 14:08         ` Eric Dumazet
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric Dumazet @ 2011-07-20 14:08 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Wednesday 20 July 2011 at 08:56 -0500, Christoph Lameter wrote:
> On Wed, 20 Jul 2011, Mel Gorman wrote:
> 
> > > The changelog isn't that convincing, really. This is
> > > kmem_cache_create() so I'm surprised we'd ever get NULL here in
> > > practice. Does this fix some problem you're seeing? If this is
> > > really an issue, I'd blame the page allocator as GFP_KERNEL should
> > > just work.
> > >
> >
> > Besides, is allocating from cache_cache really a
> > PAGE_ALLOC_COSTLY_ORDER allocation? On my laptop at least, it's an
> > order-2 allocation which is supporting up to 512 CPUs and 512 nodes.
> 
> Slab's kmem_cache is configured with an array of NR_CPUS which is the
> maximum nr of cpus supported. Some distros support 4096 cpus in order to
> accomodate SGI machines. That array then will have the size of 4096 * 8 =
> 32k

We currently support a dynamic scheme for the possible nodes:

cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) + 
	nr_node_ids * sizeof(struct kmem_list3 *);

We could use a similar trick to make the real size depend on both
nr_node_ids and nr_cpu_ids.

(struct kmem_cache)->array would become a pointer.




^ permalink raw reply	[flat|nested] 67+ messages in thread
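
For reference (not part of the thread): one possible shape of the suggestion above. This is purely an editorial sketch under the assumption that the per-cpu slots would be sized by nr_cpu_ids at create time; it is not code from the thread or from mainline.

/* Hypothetical sketch only: what "->array becomes a pointer" could look like. */
struct kmem_cache_dynamic {
	struct array_cache **array;	/* nr_cpu_ids entries, allocated at create time */
	/* ... other fields as before ... */
	struct kmem_list3 *nodelists[];	/* trimmed to nr_node_ids entries, as today */
};

/* the descriptor itself then scales with nr_node_ids instead of NR_CPUS: */
static size_t cache_descriptor_size(void)
{
	return offsetof(struct kmem_cache_dynamic, nodelists) +
	       nr_node_ids * sizeof(struct kmem_list3 *);
}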

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:54         ` Christoph Lameter
@ 2011-07-20 14:20           ` Mel Gorman
  -1 siblings, 0 replies; 67+ messages in thread
From: Mel Gorman @ 2011-07-20 14:20 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm,
	linux-kernel, Matt Mackall

On Wed, Jul 20, 2011 at 08:54:10AM -0500, Christoph Lameter wrote:
> On Wed, 20 Jul 2011, Pekka Enberg wrote:
> 
> > On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
> > > > The changelog isn't that convincing, really. This is kmem_cache_create()
> > > > so I'm surprised we'd ever get NULL here in practice. Does this fix some
> > > > problem you're seeing? If this is really an issue, I'd blame the page
> > > > allocator as GFP_KERNEL should just work.
> > >
> > > nf_conntrack creates separate slab-cache for each net-namespace,
> > > this patch of course not eliminates the chance of failure, but makes it more
> > > acceptable.
> >
> > I'm still surprised you are seeing failures. mm/slab.c hasn't changed
> > significantly in a long time. Why hasn't anyone reported this before? I'd
> > still be inclined to shift the blame to the page allocator... Mel, Christoph?
> 
> There was a lot of recent fiddling with the reclaim logic. Maybe some of
> those changes caused the problem?
> 

It's more likely that creating new slabs while under memory pressure
significant enough to fail an order-4 allocation is a situation that is
rarely tested. 

What kernel version did this failure occur on? What was the system doing
at the time of failure? Can the page allocation failure message be
posted?

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 14:20           ` Mel Gorman
@ 2011-07-20 14:32             ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 67+ messages in thread
From: Konstantin Khlebnikov @ 2011-07-20 14:32 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Christoph Lameter, Pekka Enberg, Andrew Morton, linux-mm,
	linux-kernel, Matt Mackall

Mel Gorman wrote:
> On Wed, Jul 20, 2011 at 08:54:10AM -0500, Christoph Lameter wrote:
>> On Wed, 20 Jul 2011, Pekka Enberg wrote:
>>
>>> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>>>>> The changelog isn't that convincing, really. This is kmem_cache_create()
>>>>> so I'm surprised we'd ever get NULL here in practice. Does this fix some
>>>>> problem you're seeing? If this is really an issue, I'd blame the page
>>>>> allocator as GFP_KERNEL should just work.
>>>>
>>>> nf_conntrack creates separate slab-cache for each net-namespace,
>>>> this patch of course not eliminates the chance of failure, but makes it more
>>>> acceptable.
>>>
>>> I'm still surprised you are seeing failures. mm/slab.c hasn't changed
>>> significantly in a long time. Why hasn't anyone reported this before? I'd
>>> still be inclined to shift the blame to the page allocator... Mel, Christoph?
>>
>> There was a lot of recent fiddling with the reclaim logic. Maybe some of
>> those changes caused the problem?
>>
>
> It's more likely that creating new slabs while under memory pressure
> significant enough to fail an order-4 allocation is a situation that is
> rarely tested.
>
> What kernel version did this failure occur on? What was the system doing
> at the time of failure? Can the page allocation failure message be
> posted?
>

I caught this on our rhel6-openvz kernel, and yes, it is very patchy,
but I don't see any reason why this cannot be reproduced on a mainline kernel.

There were about ten containers with random stuff; the node was already doing intensive swapout but was still alive.
In this situation, starting new containers sometimes (1 in 1000) fails due to kmem_cache_create() failures in nf_conntrack.
There are no other messages except:
Unable to create nf_conn slab cache
and some
nf_conntrack: falling back to vmalloc.
(it tries to allocate a huge hash table and does so via vmalloc if kmalloc fails)

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 14:32             ` Konstantin Khlebnikov
@ 2011-07-20 14:40               ` Eric Dumazet
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric Dumazet @ 2011-07-20 14:40 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Mel Gorman, Christoph Lameter, Pekka Enberg, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Wednesday 20 July 2011 at 18:32 +0400, Konstantin Khlebnikov wrote:

> I catch this on our rhel6-openvz kernel, and yes it very patchy,
> but I don't see any reasons why this cannot be reproduced on mainline kernel.
> 
> there was abount ten containers with random stuff, node already do intensive swapout but still alive,
> in this situation starting new containers sometimes (1 per 1000) fails due to kmem_cache_create failures in nf_conntrack,
> there no other messages except:
> Unable to create nf_conn slab cache
> and some
> nf_conntrack: falling back to vmalloc.
> (it try allocates huge hash table and do it via vmalloc if kmalloc fails)


Does this kernel contain commit 6d4831c2 ?
(vfs: avoid large kmalloc()s for the fdtable)




^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 14:40               ` Eric Dumazet
@ 2011-07-20 14:47                 ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 67+ messages in thread
From: Konstantin Khlebnikov @ 2011-07-20 14:47 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mel Gorman, Christoph Lameter, Pekka Enberg, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

Eric Dumazet wrote:
> On Wednesday 20 July 2011 at 18:32 +0400, Konstantin Khlebnikov wrote:
>
>> I catch this on our rhel6-openvz kernel, and yes it very patchy,
>> but I don't see any reasons why this cannot be reproduced on mainline kernel.
>>
>> there was abount ten containers with random stuff, node already do intensive swapout but still alive,
>> in this situation starting new containers sometimes (1 per 1000) fails due to kmem_cache_create failures in nf_conntrack,
>> there no other messages except:
>> Unable to create nf_conn slab cache
>> and some
>> nf_conntrack: falling back to vmalloc.
>> (it try allocates huge hash table and do it via vmalloc if kmalloc fails)
>
>
> Does this kernel contain commit 6d4831c2 ?
> (vfs: avoid large kmalloc()s for the fdtable)
>

Yes, but not exactly; in our kernel it looks like:

static inline void * alloc_fdmem(unsigned int size)
{
	if (size <= PAGE_SIZE)
		return kmalloc(size, GFP_KERNEL);
	else
		return vmalloc(size);
}

and it looks like this change came from ancient times =)

^ permalink raw reply	[flat|nested] 67+ messages in thread
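
For reference (not part of the thread): mainline commit 6d4831c2 mentioned above kept kmalloc() for anything up to the costly page order (with warnings suppressed) before falling back to vmalloc(). The version below is paraphrased from memory and may differ in detail from the actual commit.

static void *alloc_fdmem(unsigned int size)
{
	/*
	 * Very large allocations can stress page reclaim, so only try
	 * kmalloc() while the size stays within a non-costly page order,
	 * and fall back to vmalloc() otherwise.
	 */
	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
		void *data = kmalloc(size, GFP_KERNEL | __GFP_NOWARN);
		if (data != NULL)
			return data;
	}
	return vmalloc(size);
}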

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 14:08         ` Eric Dumazet
@ 2011-07-20 14:52           ` Christoph Lameter
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Lameter @ 2011-07-20 14:52 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Wed, 20 Jul 2011, Eric Dumazet wrote:

> > Slab's kmem_cache is configured with an array of NR_CPUS which is the
> > maximum nr of cpus supported. Some distros support 4096 cpus in order to
> > accomodate SGI machines. That array then will have the size of 4096 * 8 =
> > 32k
>
> We currently support a dynamic schem for the possible nodes :
>
> cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
> 	nr_node_ids * sizeof(struct kmem_list3 *);
>
> We could have a similar trick to make the real size both depends on
> nr_node_ids and nr_cpu_ids.
>
> (struct kmem_cache)->array would become a pointer.

We should be making it a per cpu pointer like slub then. I looked at what
it would take to do so a couple of months ago but it was quite invasive.

The other solution is to use slub instead.


^ permalink raw reply	[flat|nested] 67+ messages in thread
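
For reference (not part of the thread): the slub layout being referred to, abridged from the slub_def.h of that era; all other fields are elided and the struct is renamed to mark it as a sketch.

struct kmem_cache_cpu;		/* slub's per-cpu freelist/slab state */

struct kmem_cache_slub_sketch {
	/* a single percpu pointer instead of an NR_CPUS-sized array */
	struct kmem_cache_cpu __percpu *cpu_slab;
	/* ... flags, object sizes, order, name, per-node data, ... elided ... */
};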

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 14:52           ` Christoph Lameter
@ 2011-07-20 15:09             ` Eric Dumazet
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric Dumazet @ 2011-07-20 15:09 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Wednesday 20 July 2011 at 09:52 -0500, Christoph Lameter wrote:
> On Wed, 20 Jul 2011, Eric Dumazet wrote:
> 
> > > Slab's kmem_cache is configured with an array of NR_CPUS which is the
> > > maximum nr of cpus supported. Some distros support 4096 cpus in order to
> > > accomodate SGI machines. That array then will have the size of 4096 * 8 =
> > > 32k
> >
> > We currently support a dynamic schem for the possible nodes :
> >
> > cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
> > 	nr_node_ids * sizeof(struct kmem_list3 *);
> >
> > We could have a similar trick to make the real size both depends on
> > nr_node_ids and nr_cpu_ids.
> >
> > (struct kmem_cache)->array would become a pointer.
> 
> We should be making it a per cpu pointer like slub then. I looked at what
> it would take to do so a couple of month ago but it was quite invasive.
> 

Let's try this first patch, simple enough: no need to set up percpu data
for a one-time-use structure...

[PATCH] slab: remove one NR_CPUS dependency

Reduce high order allocations in do_tune_cpucache() for some setups.
(NR_CPUS=4096 -> we need 64KB)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Christoph Lameter <cl@linux.com>
---
 mm/slab.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index d96e223..862bd12 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3933,7 +3933,7 @@ fail:
 
 struct ccupdate_struct {
 	struct kmem_cache *cachep;
-	struct array_cache *new[NR_CPUS];
+	struct array_cache *new[0];
 };
 
 static void do_ccupdate_local(void *info)
@@ -3955,7 +3955,8 @@ static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
 	struct ccupdate_struct *new;
 	int i;
 
-	new = kzalloc(sizeof(*new), gfp);
+	new = kzalloc(sizeof(*new) + nr_cpu_ids * sizeof(struct array_cache *),
+		      gfp);
 	if (!new)
 		return -ENOMEM;
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread
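
The patch above replaces a fixed NR_CPUS-sized array with a trailing array
sized at allocation time. A minimal user-space sketch of that pattern, with
illustrative names rather than the kernel's (the kernel spells the member as
a zero-length array, new[0], the older GNU form of the same idea):

#include <stdio.h>
#include <stdlib.h>

struct ccupdate_like {
	void *cachep;		/* stands in for struct kmem_cache *     */
	void *percpu_ptrs[];	/* flexible array member, sized at alloc */
};

static struct ccupdate_like *ccupdate_alloc(unsigned int nr_cpus)
{
	/* One zeroed allocation sized for exactly nr_cpus pointers,
	 * rather than a compile-time NR_CPUS worth of them. */
	return calloc(1, sizeof(struct ccupdate_like) +
			 nr_cpus * sizeof(void *));
}

int main(void)
{
	struct ccupdate_like *new = ccupdate_alloc(8);

	if (!new)
		return 1;
	free(new);
	return 0;
}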

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 15:09             ` Eric Dumazet
@ 2011-07-20 15:34               ` Christoph Lameter
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Lameter @ 2011-07-20 15:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Wed, 20 Jul 2011, Eric Dumazet wrote:

> [PATCH] slab: remove one NR_CPUS dependency

Ok simple enough.

Acked-by: Christoph Lameter <cl@linux.com>


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 13:53           ` Pekka Enberg
@ 2011-07-20 15:36             ` Christoph Lameter
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Lameter @ 2011-07-20 15:36 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel,
	Matt Mackall, mgorman

On Wed, 20 Jul 2011, Pekka Enberg wrote:

> That's somewhat sad. I suppose I can just merge your patch unless other people
> object to it. I'd like a v2 with better changelog though.

Seems to be the simplest solution. Fix the changelog, then add

Acked-by: Christoph Lameter <cl@linux.com>


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 15:34               ` Christoph Lameter
@ 2011-07-20 15:56                 ` Eric Dumazet
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric Dumazet @ 2011-07-20 15:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

Le mercredi 20 juillet 2011 à 10:34 -0500, Christoph Lameter a écrit :
> On Wed, 20 Jul 2011, Eric Dumazet wrote:
> 
> > [PATCH] slab: remove one NR_CPUS dependency
> 
> Ok simple enough.
> 
> Acked-by: Christoph Lameter <cl@linux.com>
> 

Thanks Christoph

Here is the second patch, also simple and working for me (tested on
x86_64, NR_CPUS=4096, on my 2x4x2 machine).

Eventually, we could avoid the extra 'array' pointer if NR_CPUS is known
to be a small value (<= 16, for example).

Note that adding ____cacheline_aligned_in_smp on nodelists[] actually
helps performance, as all following fields are read-only after kmem_cache
setup.

[PATCH] slab: shrinks sizeof(struct kmem_cache)

Reduce high order allocations for some setups.
(NR_CPUS=4096 -> we need 64KB per kmem_cache struct)


Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Christoph Lameter <cl@linux.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---
 include/linux/slab_def.h |    4 ++--
 mm/slab.c                |   10 ++++++----
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 83203ae..abedd8e 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -51,7 +51,7 @@
 
 struct kmem_cache {
 /* 1) per-cpu data, touched during every alloc/free */
-	struct array_cache *array[NR_CPUS];
+	struct array_cache **array;
 /* 2) Cache tunables. Protected by cache_chain_mutex */
 	unsigned int batchcount;
 	unsigned int limit;
@@ -118,7 +118,7 @@ struct kmem_cache {
 	 * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
 	 * is statically defined, so we reserve the max number of nodes.
 	 */
-	struct kmem_list3 *nodelists[MAX_NUMNODES];
+	struct kmem_list3 *nodelists[MAX_NUMNODES] ____cacheline_aligned_in_smp;
 	/*
 	 * Do not add fields after nodelists[]
 	 */
diff --git a/mm/slab.c b/mm/slab.c
index d96e223..f951015 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -574,7 +574,9 @@ static struct arraycache_init initarray_generic =
     { {0, BOOT_CPUCACHE_ENTRIES, 1, 0} };
 
 /* internal cache of cache description objs */
+static struct array_cache *array_cache_cache[NR_CPUS];
 static struct kmem_cache cache_cache = {
+	.array = array_cache_cache,
 	.batchcount = 1,
 	.limit = BOOT_CPUCACHE_ENTRIES,
 	.shared = 1,
@@ -1492,11 +1494,10 @@ void __init kmem_cache_init(void)
 	cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE + node];
 
 	/*
-	 * struct kmem_cache size depends on nr_node_ids, which
-	 * can be less than MAX_NUMNODES.
+	 * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
 	 */
-	cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
-				 nr_node_ids * sizeof(struct kmem_list3 *);
+	cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists[nr_node_ids]) +
+				 nr_cpu_ids * sizeof(struct array_cache *);
 #if DEBUG
 	cache_cache.obj_size = cache_cache.buffer_size;
 #endif
@@ -2308,6 +2309,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 	if (!cachep)
 		goto oops;
 
+	cachep->array = (struct array_cache **)&cachep->nodelists[nr_node_ids];
 #if DEBUG
 	cachep->obj_size = size;
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread
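
On the ____cacheline_aligned_in_smp remark above, a hedged user-space
illustration of what the annotation does (assuming 64-byte cache lines; the
real macro in <linux/cache.h> expands to a similar alignment attribute on
SMP builds):

#include <stdio.h>
#include <stddef.h>

/* Stand-in for the kernel's ____cacheline_aligned_in_smp. */
#define CACHELINE_ALIGNED __attribute__((aligned(64)))

struct example_cache {
	unsigned int batchcount;	/* fields placed before the annotation */
	unsigned int limit;
	/* The annotated member is pushed to its own cache line boundary,
	 * separating it from whatever precedes it in the structure. */
	void *nodelists[4] CACHELINE_ALIGNED;
};

int main(void)
{
	printf("nodelists offset = %zu, struct size = %zu\n",
	       offsetof(struct example_cache, nodelists),
	       sizeof(struct example_cache));
	return 0;
}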

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 15:56                 ` Eric Dumazet
@ 2011-07-20 16:17                   ` Christoph Lameter
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Lameter @ 2011-07-20 16:17 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Wed, 20 Jul 2011, Eric Dumazet wrote:

> Note that adding ____cacheline_aligned_in_smp on nodelists[] actually
> helps performance, as all following fields are readonly after kmem_cache
> setup.

Well, but that is not addressing the same issue. Could you separate that
out?

The other question that follows from this is then: Does that
alignment compensate for the loss of performance due to the additional
lookup in hot code paths and the additional cacheline reference required?

The per node pointers are lower priority in terms of performance than the
per cpu pointers. I'd rather have the per node pointers requiring an
additional lookup. Less impact on hot code paths.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 16:17                   ` Christoph Lameter
@ 2011-07-20 16:31                     ` Eric Dumazet
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric Dumazet @ 2011-07-20 16:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

Le mercredi 20 juillet 2011 à 11:17 -0500, Christoph Lameter a écrit :
> On Wed, 20 Jul 2011, Eric Dumazet wrote:
> 
> > Note that adding ____cacheline_aligned_in_smp on nodelists[] actually
> > helps performance, as all following fields are readonly after kmem_cache
> > setup.
> 
> Well but that is not addresssing the same issue. Could you separate that
> out?
> 

I would like this patch not to be a performance regression. I know some
people really want fast SLAB/SLUB ;)

> The other question that follows from this is then: Does that
> alignment compensate for the loss of performance due to the additional
> lookup in hot code paths and the additional cacheline reference required?
> 

In fact the resulting code is smaller, because most fields are now at
offsets below 127 (x86 assembly code can use shorter instructions).

Before patch :
# size mm/slab.o
   text	   data	    bss	    dec	    hex	filename
  22605	 361665	     32	 384302	  5dd2e	mm/slab.o

After patch :
# size mm/slab.o
   text	   data	    bss	    dec	    hex	filename
  22347	 328929	  32800	 384076	  5dc4c	mm/slab.o

> The per node pointers are lower priority in terms of performance than the
> per cpu pointers. I'd rather have the per node pointers requiring an
> additional lookup. Less impact on hot code paths.
> 

Sure. I'll post a V2 to have CPU array before NODE array.



^ permalink raw reply	[flat|nested] 67+ messages in thread
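
A small illustration of the encoding effect described above (assumption:
x86-64, where a memory operand whose displacement fits in a signed byte can
be encoded with a 1-byte disp8 instead of a 4-byte disp32):

#include <stdio.h>
#include <stddef.h>

struct far_field  { char pad[256]; int hot; };	/* hot is beyond disp8 range */
struct near_field { int hot; char pad[256]; };	/* hot is at offset 0        */

/* Build with -O2 and compare the disassembly: the load in read_far() needs
 * a 4-byte displacement, the one in read_near() does not. */
int read_far(struct far_field *p)   { return p->hot; }
int read_near(struct near_field *p) { return p->hot; }

int main(void)
{
	printf("offset of hot: far=%zu near=%zu\n",
	       offsetof(struct far_field, hot),
	       offsetof(struct near_field, hot));
	return 0;
}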

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 16:31                     ` Eric Dumazet
@ 2011-07-20 17:04                       ` Eric Dumazet
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric Dumazet @ 2011-07-20 17:04 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

Le mercredi 20 juillet 2011 à 18:31 +0200, Eric Dumazet a écrit :

> In fact resulting code is smaller, because most fields are now with <
> 127 offset (x86 assembly code can use shorter instructions)
> 
> Before patch :
> # size mm/slab.o
>    text	   data	    bss	    dec	    hex	filename
>   22605	 361665	     32	 384302	  5dd2e	mm/slab.o
> 
> After patch :
> # size mm/slab.o
>    text	   data	    bss	    dec	    hex	filename
>   22347	 328929	  32800	 384076	  5dc4c	mm/slab.o
> 
> > The per node pointers are lower priority in terms of performance than the
> > per cpu pointers. I'd rather have the per node pointers requiring an
> > additional lookup. Less impact on hot code paths.
> > 
> 
> Sure. I'll post a V2 to have CPU array before NODE array.
> 

Here it is. My dev machine still runs after booting with this patch.
No performance impact seen yet.

Thanks

[PATCH v2] slab: shrinks sizeof(struct kmem_cache)

Reduce high order allocations for some setups.
(NR_CPUS=4096 -> we need 64KB per kmem_cache struct)

We now allocate exactly the needed size (using nr_cpu_ids and nr_node_ids).

This also makes the code a bit smaller on x86_64, since some field offsets
are now below the 127 limit:

Before patch :
# size mm/slab.o
   text    data     bss     dec     hex filename
  22605  361665      32  384302   5dd2e mm/slab.o

After patch :
# size mm/slab.o
   text    data     bss     dec     hex filename
  22349	 353473	   8224	 384046	  5dc2e	mm/slab.o

Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Pekka Enberg <penberg@kernel.org>
CC: Christoph Lameter <cl@linux.com>
CC: Andrew Morton <akpm@linux-foundation.org>
---
v2: move the CPU array before NODE array.

 include/linux/slab_def.h |   26 +++++++++++++-------------
 mm/slab.c                |   10 ++++++----
 2 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 83203ae..6e51c4d 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -50,21 +50,19 @@
  */
 
 struct kmem_cache {
-/* 1) per-cpu data, touched during every alloc/free */
-	struct array_cache *array[NR_CPUS];
-/* 2) Cache tunables. Protected by cache_chain_mutex */
+/* 1) Cache tunables. Protected by cache_chain_mutex */
 	unsigned int batchcount;
 	unsigned int limit;
 	unsigned int shared;
 
 	unsigned int buffer_size;
 	u32 reciprocal_buffer_size;
-/* 3) touched by every alloc & free from the backend */
+/* 2) touched by every alloc & free from the backend */
 
 	unsigned int flags;		/* constant flags */
 	unsigned int num;		/* # of objs per slab */
 
-/* 4) cache_grow/shrink */
+/* 3) cache_grow/shrink */
 	/* order of pgs per slab (2^n) */
 	unsigned int gfporder;
 
@@ -80,11 +78,11 @@ struct kmem_cache {
 	/* constructor func */
 	void (*ctor)(void *obj);
 
-/* 5) cache creation/removal */
+/* 4) cache creation/removal */
 	const char *name;
 	struct list_head next;
 
-/* 6) statistics */
+/* 5) statistics */
 #ifdef CONFIG_DEBUG_SLAB
 	unsigned long num_active;
 	unsigned long num_allocations;
@@ -111,16 +109,18 @@ struct kmem_cache {
 	int obj_size;
 #endif /* CONFIG_DEBUG_SLAB */
 
+/* 6) per-cpu/per-node data, touched during every alloc/free */
 	/*
-	 * We put nodelists[] at the end of kmem_cache, because we want to size
-	 * this array to nr_node_ids slots instead of MAX_NUMNODES
+	 * We put array[] at the end of kmem_cache, because we want to size
+	 * this array to nr_cpu_ids slots instead of NR_CPUS
 	 * (see kmem_cache_init())
-	 * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache
-	 * is statically defined, so we reserve the max number of nodes.
+	 * We still use [NR_CPUS] and not [1] or [0] because cache_cache
+	 * is statically defined, so we reserve the max number of cpus.
 	 */
-	struct kmem_list3 *nodelists[MAX_NUMNODES];
+	struct kmem_list3 **nodelists;
+	struct array_cache *array[NR_CPUS];
 	/*
-	 * Do not add fields after nodelists[]
+	 * Do not add fields after array[]
 	 */
 };
 
diff --git a/mm/slab.c b/mm/slab.c
index d96e223..30181fb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -574,7 +574,9 @@ static struct arraycache_init initarray_generic =
     { {0, BOOT_CPUCACHE_ENTRIES, 1, 0} };
 
 /* internal cache of cache description objs */
+static struct kmem_list3 *cache_cache_nodelists[MAX_NUMNODES];
 static struct kmem_cache cache_cache = {
+	.nodelists = cache_cache_nodelists,
 	.batchcount = 1,
 	.limit = BOOT_CPUCACHE_ENTRIES,
 	.shared = 1,
@@ -1492,11 +1494,10 @@ void __init kmem_cache_init(void)
 	cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE + node];
 
 	/*
-	 * struct kmem_cache size depends on nr_node_ids, which
-	 * can be less than MAX_NUMNODES.
+	 * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids
 	 */
-	cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
-				 nr_node_ids * sizeof(struct kmem_list3 *);
+	cache_cache.buffer_size = offsetof(struct kmem_cache, array[nr_cpu_ids]) +
+				  nr_node_ids * sizeof(struct kmem_list3 *);
 #if DEBUG
 	cache_cache.obj_size = cache_cache.buffer_size;
 #endif
@@ -2308,6 +2309,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 	if (!cachep)
 		goto oops;
 
+	cachep->nodelists = (struct kmem_list3 **)&cachep->array[nr_cpu_ids];
 #if DEBUG
 	cachep->obj_size = size;
 



^ permalink raw reply related	[flat|nested] 67+ messages in thread
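
Restating the core trick of the v2 patch as a small user-space analogue
(hypothetical names; the size is computed portably here, whereas the kernel
uses the offsetof(struct kmem_cache, array[nr_cpu_ids]) spelling, a GCC
extension that evaluates to the same value):

#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>

struct fake_cache {
	unsigned int batchcount;	/* ... read-mostly fields ...                */
	void **nodelists;		/* will point into the tail of this block    */
	void *array[64];		/* compile-time maximum, like array[NR_CPUS] */
};

static struct fake_cache *fake_cache_create(unsigned int nr_cpu_ids,
					    unsigned int nr_node_ids)
{
	/* Allocate only up to array[nr_cpu_ids], then append nr_node_ids
	 * node pointers and point nodelists at them, as kmem_cache_init()
	 * and kmem_cache_create() do after the patch. */
	size_t size = offsetof(struct fake_cache, array) +
		      nr_cpu_ids * sizeof(void *) +
		      nr_node_ids * sizeof(void *);
	struct fake_cache *c = calloc(1, size);

	if (c)
		c->nodelists = (void **)&c->array[nr_cpu_ids];
	return c;
}

int main(void)
{
	struct fake_cache *c = fake_cache_create(8, 2);

	if (!c)
		return 1;
	printf("trimmed object: %zu bytes; full struct would need %zu\n",
	       offsetof(struct fake_cache, array) + (8 + 2) * sizeof(void *),
	       sizeof(struct fake_cache) + 2 * sizeof(void *));
	free(c);
	return 0;
}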

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 17:04                       ` Eric Dumazet
@ 2011-07-20 17:13                         ` Christoph Lameter
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Lameter @ 2011-07-20 17:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Wed, 20 Jul 2011, Eric Dumazet wrote:

> [PATCH v2] slab: shrinks sizeof(struct kmem_cache)

This will solve the issue for small nr_cpu_ids but those with 4k cpus will
still have the issue.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 17:13                         ` Christoph Lameter
@ 2011-07-20 17:28                           ` Pekka Enberg
  -1 siblings, 0 replies; 67+ messages in thread
From: Pekka Enberg @ 2011-07-20 17:28 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Mel Gorman, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Wed, 20 Jul 2011, Eric Dumazet wrote:
>> [PATCH v2] slab: shrinks sizeof(struct kmem_cache)

On Wed, 20 Jul 2011, Christoph Lameter wrote:
> This will solve the issue for small nr_cpu_ids but those with 4k cpus will
> still have the issue.
>
> Acked-by: Christoph Lameter <cl@linux.com>

Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin 
though?

 			Pekka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 17:28                           ` Pekka Enberg
@ 2011-07-20 17:37                             ` Christoph Lameter
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Lameter @ 2011-07-20 17:37 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Eric Dumazet, Mel Gorman, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Wed, 20 Jul 2011, Pekka Enberg wrote:

> On Wed, 20 Jul 2011, Eric Dumazet wrote:
> > > [PATCH v2] slab: shrinks sizeof(struct kmem_cache)
>
> On Wed, 20 Jul 2011, Christoph Lameter wrote:
> > This will solve the issue for small nr_cpu_ids but those with 4k cpus will
> > still have the issue.
> >
> > Acked-by: Christoph Lameter <cl@linux.com>
>
> Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin
> though?

Those with 4k cpus will be thankful I guess.


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 17:37                             ` Christoph Lameter
@ 2011-07-20 17:41                               ` Pekka Enberg
  -1 siblings, 0 replies; 67+ messages in thread
From: Pekka Enberg @ 2011-07-20 17:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Eric Dumazet, Mel Gorman, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Wed, 20 Jul 2011, Pekka Enberg wrote:
>> On Wed, 20 Jul 2011, Eric Dumazet wrote:
>>>> [PATCH v2] slab: shrinks sizeof(struct kmem_cache)
>>
>> On Wed, 20 Jul 2011, Christoph Lameter wrote:
>>> This will solve the issue for small nr_cpu_ids but those with 4k cpus will
>>> still have the issue.
>>>
>>> Acked-by: Christoph Lameter <cl@linux.com>
>>
>> Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin
>> though?

On Wed, 20 Jul 2011, Christoph Lameter wrote:
> Those with 4k cpus will be thankful I guess.

OTOH, I'm slightly worried that it might mask a real problem 
with GFP_KERNEL not being aggressive enough. Mel?

 			Pekka

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 17:41                               ` Pekka Enberg
@ 2011-07-20 18:07                                 ` Matt Mackall
  -1 siblings, 0 replies; 67+ messages in thread
From: Matt Mackall @ 2011-07-20 18:07 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Eric Dumazet, Mel Gorman,
	Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel

On Wed, 2011-07-20 at 20:41 +0300, Pekka Enberg wrote:
> On Wed, 20 Jul 2011, Pekka Enberg wrote:
> >> On Wed, 20 Jul 2011, Eric Dumazet wrote:
> >>>> [PATCH v2] slab: shrinks sizeof(struct kmem_cache)
> >>
> >> On Wed, 20 Jul 2011, Christoph Lameter wrote:
> >>> This will solve the issue for small nr_cpu_ids but those with 4k cpus will
> >>> still have the issue.
> >>>
> >>> Acked-by: Christoph Lameter <cl@linux.com>
> >>
> >> Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin
> >> though?
> 
> On Wed, 20 Jul 2011, Christoph Lameter wrote:
> > Those with 4k cpus will be thankful I guess.
> 
> OTOH, I'm slightly worried that it might mask a real problem 
> with GFP_KERNEL not being aggressive enough. Mel?

I think that's already been demonstrated here, yes. It's just waiting
for another obscure workload to trigger it.

-- 
Mathematics is the supreme nostalgia of our time.



^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 17:41                               ` Pekka Enberg
@ 2011-07-20 19:09                                 ` Mel Gorman
  -1 siblings, 0 replies; 67+ messages in thread
From: Mel Gorman @ 2011-07-20 19:09 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Eric Dumazet, Konstantin Khlebnikov,
	Andrew Morton, linux-mm, linux-kernel, Matt Mackall

On Wed, Jul 20, 2011 at 08:41:12PM +0300, Pekka Enberg wrote:
> On Wed, 20 Jul 2011, Pekka Enberg wrote:
> >>On Wed, 20 Jul 2011, Eric Dumazet wrote:
> >>>>[PATCH v2] slab: shrinks sizeof(struct kmem_cache)
> >>
> >>On Wed, 20 Jul 2011, Christoph Lameter wrote:
> >>>This will solve the issue for small nr_cpu_ids but those with 4k cpus will
> >>>still have the issue.
> >>>
> >>>Acked-by: Christoph Lameter <cl@linux.com>
> >>
> >>Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin
> >>though?
> 
> On Wed, 20 Jul 2011, Christoph Lameter wrote:
> >Those with 4k cpus will be thankful I guess.
> 
> OTOH, I'm slightly worried that it might mask a real problem with
> GFP_KERNEL not being aggressive enough. Mel?
> 

The reproduction case was while memory was under heavy pressure
(swapout was active), and even then only 1 in 1000 containers was
failing to create due to an order-4 allocation failure. I'm not
convinced we need to increase how aggressive the allocator is for
PAGE_ALLOC_COSTLY_ORDER in general based on this.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 18:07                                 ` Matt Mackall
@ 2011-07-21  7:18                                   ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 67+ messages in thread
From: Konstantin Khlebnikov @ 2011-07-21  7:18 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Pekka Enberg, Christoph Lameter, Eric Dumazet, Mel Gorman,
	Andrew Morton, linux-mm, linux-kernel

Matt Mackall wrote:
> On Wed, 2011-07-20 at 20:41 +0300, Pekka Enberg wrote:
>> On Wed, 20 Jul 2011, Pekka Enberg wrote:
>>>> On Wed, 20 Jul 2011, Eric Dumazet wrote:
>>>>>> [PATCH v2] slab: shrinks sizeof(struct kmem_cache)
>>>>
>>>> On Wed, 20 Jul 2011, Christoph Lameter wrote:
>>>>> This will solve the issue for small nr_cpu_ids but those with 4k cpus will
>>>>> still have the issue.
>>>>>
>>>>> Acked-by: Christoph Lameter<cl@linux.com>
>>>>
>>>> Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin
>>>> though?
>>
>> On Wed, 20 Jul 2011, Christoph Lameter wrote:
>>> Those with 4k cpus will be thankful I guess.
>>
>> OTOH, I'm slightly worried that it might mask a real problem
>> with GFP_KERNEL not being aggressive enough. Mel?
>
> I think that's already been demonstrated here, yes. It's just waiting
> for another obscure workload to trigger it.
>

I agree, __GFP_REPEAT doesn't fix the problem completely, so some more detailed investigation is required.
My case is not a completely fair one, because my kernel has a slightly different memory controller, and it has bugs for sure.

^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 14:52           ` Christoph Lameter
@ 2011-07-21  8:43             ` Eric Dumazet
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric Dumazet @ 2011-07-21  8:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

Le mercredi 20 juillet 2011 à 09:52 -0500, Christoph Lameter a écrit :

> We should be making it a per cpu pointer like slub then. I looked at what
> it would take to do so a couple of month ago but it was quite invasive.
> 

I took a look at this too, but using percpu data would consume more
memory, because the percpu allocator allocates memory blobs for all possible
cpus, while the current code handles cpu online/offline nicely.




^ permalink raw reply	[flat|nested] 67+ messages in thread
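
For context, a kernel-style sketch of the two approaches being weighed (not
a drop-in patch; it assumes the 3.0-era slab internals and omits the
trailing entries[] sizing that the real alloc_arraycache() performs):

/* (a) slub-style: the percpu allocator reserves a chunk for every
 *     *possible* cpu up front, whether or not it ever comes online. */
static int example_percpu_setup(struct kmem_cache *cachep)
{
	struct array_cache __percpu *ac = alloc_percpu(struct array_cache);

	if (!ac)
		return -ENOMEM;
	cachep->cpu_cache = ac;	/* hypothetical field; struct kmem_cache has
				 * no such member in slab today */
	return 0;
}

/* (b) what slab does today: allocate an array_cache only when a cpu comes
 *     online, from that cpu's node, and free it on hot-unplug, so
 *     offline-but-possible cpus cost nothing. */
static int example_cpu_online(struct kmem_cache *cachep, int cpu)
{
	struct array_cache *ac;

	ac = kmalloc_node(sizeof(*ac), GFP_KERNEL, cpu_to_node(cpu));
	if (!ac)
		return -ENOMEM;
	cachep->array[cpu] = ac;
	return 0;
}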

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-21  8:43             ` Eric Dumazet
  (?)
@ 2011-07-21 15:27             ` Christoph Lameter
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Lameter @ 2011-07-21 15:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall


On Thu, 21 Jul 2011, Eric Dumazet wrote:

> Le mercredi 20 juillet 2011 à 09:52 -0500, Christoph Lameter a écrit :
>
> > We should be making it a per cpu pointer like slub then. I looked at what
> > it would take to do so a couple of month ago but it was quite invasive.
> >
>
> I took a look at this too, but using percpu data would consume more
> memory, because percpu allocator allocates memory blobs for all possible
> cpus, while current code handles online/offline cpu nicely.

The number of possible cpus is determined at boot. If the BIOS provides
the right information about which cpus are present, and if no hot-swapping
of cpus is possible, then per-cpu areas are allocated only for the cpus
that are actually functioning. Certainly no desktop BIOS will indicate
that 4096 cpus are possible.



^ permalink raw reply	[flat|nested] 67+ messages in thread
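
A small illustration of the quantities involved (kernel-style sketch;
NR_CPUS is the compile-time ceiling, while nr_cpu_ids and the possible mask
are trimmed from firmware tables during boot):

#include <linux/kernel.h>
#include <linux/cpumask.h>

static void report_cpu_counts(void)
{
	/* On a distro kernel built with NR_CPUS=4096, a desktop machine
	 * will typically report a single-digit nr_cpu_ids here. */
	pr_info("NR_CPUS=%d nr_cpu_ids=%d possible=%u online=%u\n",
		NR_CPUS, nr_cpu_ids, num_possible_cpus(), num_online_cpus());
}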

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 15:09             ` Eric Dumazet
@ 2011-07-31 11:41               ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 67+ messages in thread
From: Konstantin Khlebnikov @ 2011-07-31 11:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Lameter, Mel Gorman, Pekka Enberg, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

It seems someone forgot this patch;
the second one, "slab: shrink sizeof(struct kmem_cache)", is already in mainline.

Eric Dumazet wrote:
> Le mercredi 20 juillet 2011 à 09:52 -0500, Christoph Lameter a écrit :
>> On Wed, 20 Jul 2011, Eric Dumazet wrote:
>>
>>>> Slab's kmem_cache is configured with an array of NR_CPUS which is the
>>>> maximum nr of cpus supported. Some distros support 4096 cpus in order to
>>>> accomodate SGI machines. That array then will have the size of 4096 * 8 =
>>>> 32k
>>>
>>> We currently support a dynamic schem for the possible nodes :
>>>
>>> cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
>>> 	nr_node_ids * sizeof(struct kmem_list3 *);
>>>
>>> We could have a similar trick to make the real size both depends on
>>> nr_node_ids and nr_cpu_ids.
>>>
>>> (struct kmem_cache)->array would become a pointer.
>>
>> We should be making it a per cpu pointer like slub then. I looked at what
>> it would take to do so a couple of month ago but it was quite invasive.
>>
>
> Lets try this first patch, simple enough : No need to setup percpu data
> for a one time use structure...
>
> [PATCH] slab: remove one NR_CPUS dependency
>
> Reduce high order allocations in do_tune_cpucache() for some setups.
> (NR_CPUS=4096 ->  we need 64KB)
>
> Signed-off-by: Eric Dumazet<eric.dumazet@gmail.com>
> CC: Pekka Enberg<penberg@kernel.org>
> CC: Christoph Lameter<cl@linux.com>
> ---
>   mm/slab.c |    5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index d96e223..862bd12 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3933,7 +3933,7 @@ fail:
>
>   struct ccupdate_struct {
>   	struct kmem_cache *cachep;
> -	struct array_cache *new[NR_CPUS];
> +	struct array_cache *new[0];
>   };
>
>   static void do_ccupdate_local(void *info)
> @@ -3955,7 +3955,8 @@ static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
>   	struct ccupdate_struct *new;
>   	int i;
>
> -	new = kzalloc(sizeof(*new), gfp);
> +	new = kzalloc(sizeof(*new) + nr_cpu_ids * sizeof(struct array_cache *),
> +		      gfp);
>   	if (!new)
>   		return -ENOMEM;
>
>
>


^ permalink raw reply	[flat|nested] 67+ messages in thread

* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-31 11:41               ` Konstantin Khlebnikov
@ 2011-07-31 11:44                 ` Pekka Enberg
  -1 siblings, 0 replies; 67+ messages in thread
From: Pekka Enberg @ 2011-07-31 11:44 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Eric Dumazet, Christoph Lameter, Mel Gorman, Andrew Morton,
	linux-mm, linux-kernel, Matt Mackall

On Sun, Jul 31, 2011 at 2:41 PM, Konstantin Khlebnikov
<khlebnikov@parallels.com> wrote:
> It seems someone forgot this patch,

[snip]

>> [PATCH] slab: remove one NR_CPUS dependency
>>
>> Reduce high order allocations in do_tune_cpucache() for some setups.
>> (NR_CPUS=4096 ->  we need 64KB)

It's queued in slab/urgent branch and will be sent to Linus tonight.

^ permalink raw reply	[flat|nested] 67+ messages in thread

end of thread, other threads:[~2011-07-31 11:45 UTC | newest]

Thread overview: 67+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-20 12:16 [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT Konstantin Khlebnikov
2011-07-20 12:16 ` Konstantin Khlebnikov
2011-07-20 13:14 ` Pekka Enberg
2011-07-20 13:14   ` Pekka Enberg
2011-07-20 13:28   ` Konstantin Khlebnikov
2011-07-20 13:28     ` Konstantin Khlebnikov
2011-07-20 13:42     ` Pekka Enberg
2011-07-20 13:42       ` Pekka Enberg
2011-07-20 13:50       ` Konstantin Khlebnikov
2011-07-20 13:50         ` Konstantin Khlebnikov
2011-07-20 13:53         ` Pekka Enberg
2011-07-20 13:53           ` Pekka Enberg
2011-07-20 15:36           ` Christoph Lameter
2011-07-20 15:36             ` Christoph Lameter
2011-07-20 13:59         ` Konstantin Khlebnikov
2011-07-20 13:59           ` Konstantin Khlebnikov
2011-07-20 13:54       ` Christoph Lameter
2011-07-20 13:54         ` Christoph Lameter
2011-07-20 14:20         ` Mel Gorman
2011-07-20 14:20           ` Mel Gorman
2011-07-20 14:32           ` Konstantin Khlebnikov
2011-07-20 14:32             ` Konstantin Khlebnikov
2011-07-20 14:40             ` Eric Dumazet
2011-07-20 14:40               ` Eric Dumazet
2011-07-20 14:47               ` Konstantin Khlebnikov
2011-07-20 14:47                 ` Konstantin Khlebnikov
2011-07-20 13:43   ` Mel Gorman
2011-07-20 13:43     ` Mel Gorman
2011-07-20 13:56     ` Christoph Lameter
2011-07-20 13:56       ` Christoph Lameter
2011-07-20 14:08       ` Eric Dumazet
2011-07-20 14:08         ` Eric Dumazet
2011-07-20 14:52         ` Christoph Lameter
2011-07-20 14:52           ` Christoph Lameter
2011-07-20 15:09           ` Eric Dumazet
2011-07-20 15:09             ` Eric Dumazet
2011-07-20 15:34             ` Christoph Lameter
2011-07-20 15:34               ` Christoph Lameter
2011-07-20 15:56               ` Eric Dumazet
2011-07-20 15:56                 ` Eric Dumazet
2011-07-20 16:17                 ` Christoph Lameter
2011-07-20 16:17                   ` Christoph Lameter
2011-07-20 16:31                   ` Eric Dumazet
2011-07-20 16:31                     ` Eric Dumazet
2011-07-20 17:04                     ` Eric Dumazet
2011-07-20 17:04                       ` Eric Dumazet
2011-07-20 17:13                       ` Christoph Lameter
2011-07-20 17:13                         ` Christoph Lameter
2011-07-20 17:28                         ` Pekka Enberg
2011-07-20 17:28                           ` Pekka Enberg
2011-07-20 17:37                           ` Christoph Lameter
2011-07-20 17:37                             ` Christoph Lameter
2011-07-20 17:41                             ` Pekka Enberg
2011-07-20 17:41                               ` Pekka Enberg
2011-07-20 18:07                               ` Matt Mackall
2011-07-20 18:07                                 ` Matt Mackall
2011-07-21  7:18                                 ` Konstantin Khlebnikov
2011-07-21  7:18                                   ` Konstantin Khlebnikov
2011-07-20 19:09                               ` Mel Gorman
2011-07-20 19:09                                 ` Mel Gorman
2011-07-31 11:41             ` Konstantin Khlebnikov
2011-07-31 11:41               ` Konstantin Khlebnikov
2011-07-31 11:44               ` Pekka Enberg
2011-07-31 11:44                 ` Pekka Enberg
2011-07-21  8:43           ` Eric Dumazet
2011-07-21  8:43             ` Eric Dumazet
2011-07-21 15:27             ` Christoph Lameter
