[PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
From: Konstantin Khlebnikov <khlebnikov@openvz.org>
Date: 2011-07-20 12:16
To: Andrew Morton
Cc: Pekka Enberg, linux-mm, Christoph Lameter, linux-kernel, Matt Mackall

Order of sizeof(struct kmem_cache) can be bigger than PAGE_ALLOC_COSTLY_ORDER,
thus there is a good chance of unsuccessful allocation.
With __GFP_REPEAT buddy-allocator will reclaim/compact memory more aggressively.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
---
 mm/slab.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index d96e223..53bddc8 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2304,7 +2304,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 		gfp = GFP_NOWAIT;
 
 	/* Get cache's description obj. */
-	cachep = kmem_cache_zalloc(&cache_cache, gfp);
+	cachep = kmem_cache_zalloc(&cache_cache, gfp | __GFP_REPEAT);
 	if (!cachep)
 		goto oops;
Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
From: Pekka Enberg
Date: 2011-07-20 13:14
To: Konstantin Khlebnikov
Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel, Matt Mackall, mgorman

On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
> Order of sizeof(struct kmem_cache) can be bigger than PAGE_ALLOC_COSTLY_ORDER,
> thus there is a good chance of unsuccessful allocation.
> With __GFP_REPEAT buddy-allocator will reclaim/compact memory more aggressively.
>
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
> ---
>  mm/slab.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index d96e223..53bddc8 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -2304,7 +2304,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
>  		gfp = GFP_NOWAIT;
>
>  	/* Get cache's description obj. */
> -	cachep = kmem_cache_zalloc(&cache_cache, gfp);
> +	cachep = kmem_cache_zalloc(&cache_cache, gfp | __GFP_REPEAT);
>  	if (!cachep)
>  		goto oops;

The changelog isn't that convincing, really. This is kmem_cache_create()
so I'm surprised we'd ever get NULL here in practice. Does this fix some
problem you're seeing? If this is really an issue, I'd blame the page
allocator as GFP_KERNEL should just work.

			Pekka
Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
From: Konstantin Khlebnikov
Date: 2011-07-20 13:28
To: Pekka Enberg
Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel, Matt Mackall, mgorman

Pekka Enberg wrote:
> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>> Order of sizeof(struct kmem_cache) can be bigger than PAGE_ALLOC_COSTLY_ORDER,
>> thus there is a good chance of unsuccessful allocation.
>> With __GFP_REPEAT buddy-allocator will reclaim/compact memory more aggressively.
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
>> ---
>>  mm/slab.c |    2 +-
>>  1 files changed, 1 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/slab.c b/mm/slab.c
>> index d96e223..53bddc8 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -2304,7 +2304,7 @@ kmem_cache_create (const char *name, size_t size, size_t align,
>>  		gfp = GFP_NOWAIT;
>>
>>  	/* Get cache's description obj. */
>> -	cachep = kmem_cache_zalloc(&cache_cache, gfp);
>> +	cachep = kmem_cache_zalloc(&cache_cache, gfp | __GFP_REPEAT);
>>  	if (!cachep)
>>  		goto oops;
>
> The changelog isn't that convincing, really. This is kmem_cache_create()
> so I'm surprised we'd ever get NULL here in practice. Does this fix some
> problem you're seeing? If this is really an issue, I'd blame the page
> allocator as GFP_KERNEL should just work.

nf_conntrack creates a separate slab cache for each net namespace. This
patch of course does not eliminate the chance of failure, but it makes
failure much less likely.

struct kmem_cache for slub is more compact: it uses percpu pointers
instead of a dumb NR_CPUS-sized array. It is probably better to fix that
side...
Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
From: Pekka Enberg
Date: 2011-07-20 13:42
To: Konstantin Khlebnikov
Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel, Matt Mackall, mgorman

On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>> The changelog isn't that convincing, really. This is kmem_cache_create()
>> so I'm surprised we'd ever get NULL here in practice. Does this fix some
>> problem you're seeing? If this is really an issue, I'd blame the page
>> allocator as GFP_KERNEL should just work.
>
> nf_conntrack creates a separate slab cache for each net namespace. This
> patch of course does not eliminate the chance of failure, but it makes
> failure much less likely.

I'm still surprised you are seeing failures. mm/slab.c hasn't changed
significantly in a long time. Why hasn't anyone reported this before? I'd
still be inclined to shift the blame to the page allocator... Mel,
Christoph?

On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
> struct kmem_cache for slub is more compact: it uses percpu pointers
> instead of a dumb NR_CPUS-sized array. It is probably better to fix that
> side...

So how big is 'struct kmem_cache' for your configuration anyway? Fixing
the per-cpu data structures would be nice but I'm guessing it'll be
slightly painful for mm/slab.c.

			Pekka
Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
From: Konstantin Khlebnikov
Date: 2011-07-20 13:50
To: Pekka Enberg
Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel, Matt Mackall, mgorman

Pekka Enberg wrote:
> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>>> The changelog isn't that convincing, really. This is kmem_cache_create()
>>> so I'm surprised we'd ever get NULL here in practice. Does this fix some
>>> problem you're seeing? If this is really an issue, I'd blame the page
>>> allocator as GFP_KERNEL should just work.
>>
>> nf_conntrack creates a separate slab cache for each net namespace. This
>> patch of course does not eliminate the chance of failure, but it makes
>> failure much less likely.
>
> I'm still surprised you are seeing failures. mm/slab.c hasn't changed
> significantly in a long time. Why hasn't anyone reported this before? I'd
> still be inclined to shift the blame to the page allocator... Mel,
> Christoph?
>
>> struct kmem_cache for slub is more compact: it uses percpu pointers
>> instead of a dumb NR_CPUS-sized array. It is probably better to fix that
>> side...
>
> So how big is 'struct kmem_cache' for your configuration anyway? Fixing
> the per-cpu data structures would be nice but I'm guessing it'll be
> slightly painful for mm/slab.c.

With NR_CPUS=4096 and MAX_NUMNODES=512 it is over 9k! So it requires an
order-4 page, while PAGE_ALLOC_COSTLY_ORDER is 3.
Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
From: Pekka Enberg
Date: 2011-07-20 13:53
To: Konstantin Khlebnikov
Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel, Matt Mackall, mgorman

On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>>>> The changelog isn't that convincing, really. This is kmem_cache_create()
>>>> so I'm surprised we'd ever get NULL here in practice. Does this fix some
>>>> problem you're seeing? If this is really an issue, I'd blame the page
>>>> allocator as GFP_KERNEL should just work.
>>>
>>> nf_conntrack creates a separate slab cache for each net namespace. This
>>> patch of course does not eliminate the chance of failure, but it makes
>>> failure much less likely.
>>
>> I'm still surprised you are seeing failures. mm/slab.c hasn't changed
>> significantly in a long time. Why hasn't anyone reported this before? I'd
>> still be inclined to shift the blame to the page allocator... Mel,
>> Christoph?
>>
>>> struct kmem_cache for slub is more compact: it uses percpu pointers
>>> instead of a dumb NR_CPUS-sized array. It is probably better to fix that
>>> side...
>>
>> So how big is 'struct kmem_cache' for your configuration anyway? Fixing
>> the per-cpu data structures would be nice but I'm guessing it'll be
>> slightly painful for mm/slab.c.
>
> With NR_CPUS=4096 and MAX_NUMNODES=512 it is over 9k! So it requires an
> order-4 page, while PAGE_ALLOC_COSTLY_ORDER is 3.

That's somewhat sad. I suppose I can just merge your patch unless other
people object to it. I'd like a v2 with a better changelog though.

			Pekka
Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
From: Christoph Lameter
Date: 2011-07-20 15:36
To: Pekka Enberg
Cc: Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall, mgorman

On Wed, 20 Jul 2011, Pekka Enberg wrote:
> That's somewhat sad. I suppose I can just merge your patch unless other
> people object to it. I'd like a v2 with a better changelog though.

Seems to be the simplest solution. Fix the changelog, then add:

Acked-by: Christoph Lameter <cl@linux.com>
Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
From: Konstantin Khlebnikov
Date: 2011-07-20 13:59
To: Pekka Enberg
Cc: Andrew Morton, linux-mm, Christoph Lameter, linux-kernel, Matt Mackall, mgorman

Konstantin Khlebnikov wrote:
> Pekka Enberg wrote:
>> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
>>>> The changelog isn't that convincing, really. This is kmem_cache_create()
>>>> so I'm surprised we'd ever get NULL here in practice. Does this fix some
>>>> problem you're seeing? If this is really an issue, I'd blame the page
>>>> allocator as GFP_KERNEL should just work.
>>>
>>> nf_conntrack creates a separate slab cache for each net namespace. This
>>> patch of course does not eliminate the chance of failure, but it makes
>>> failure much less likely.
>>
>> I'm still surprised you are seeing failures. mm/slab.c hasn't changed
>> significantly in a long time. Why hasn't anyone reported this before? I'd
>> still be inclined to shift the blame to the page allocator... Mel,
>> Christoph?
>>
>>> struct kmem_cache for slub is more compact: it uses percpu pointers
>>> instead of a dumb NR_CPUS-sized array. It is probably better to fix that
>>> side...
>>
>> So how big is 'struct kmem_cache' for your configuration anyway? Fixing
>> the per-cpu data structures would be nice but I'm guessing it'll be
>> slightly painful for mm/slab.c.
>
> With NR_CPUS=4096 and MAX_NUMNODES=512 it is over 9k! So it requires an
> order-4 page, while PAGE_ALLOC_COSTLY_ORDER is 3.

Sorry, it is actually 0x9070 bytes: 36+ KB, i.e. 9+ pages.
Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
From: Christoph Lameter
Date: 2011-07-20 13:54
To: Pekka Enberg
Cc: Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall, mgorman

On Wed, 20 Jul 2011, Pekka Enberg wrote:
> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote:
> > > The changelog isn't that convincing, really. This is kmem_cache_create()
> > > so I'm surprised we'd ever get NULL here in practice. Does this fix some
> > > problem you're seeing? If this is really an issue, I'd blame the page
> > > allocator as GFP_KERNEL should just work.
> >
> > nf_conntrack creates a separate slab cache for each net namespace. This
> > patch of course does not eliminate the chance of failure, but it makes
> > failure much less likely.
>
> I'm still surprised you are seeing failures. mm/slab.c hasn't changed
> significantly in a long time. Why hasn't anyone reported this before? I'd
> still be inclined to shift the blame to the page allocator... Mel, Christoph?

There was a lot of recent fiddling with the reclaim logic. Maybe some of
those changes caused the problem?
Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
From: Mel Gorman
Date: 2011-07-20 14:20
To: Christoph Lameter
Cc: Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall

On Wed, Jul 20, 2011 at 08:54:10AM -0500, Christoph Lameter wrote:
> On Wed, 20 Jul 2011, Pekka Enberg wrote:
> > I'm still surprised you are seeing failures. mm/slab.c hasn't changed
> > significantly in a long time. Why hasn't anyone reported this before? I'd
> > still be inclined to shift the blame to the page allocator... Mel, Christoph?
>
> There was a lot of recent fiddling with the reclaim logic. Maybe some of
> those changes caused the problem?

It's more likely that creating new slabs while under memory pressure
significant enough to fail an order-4 allocation is a situation that is
rarely tested.

What kernel version did this failure occur on? What was the system doing
at the time of failure? Can the page allocation failure message be posted?

-- 
Mel Gorman
SUSE Labs
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 14:20 ` Mel Gorman @ 2011-07-20 14:32 ` Konstantin Khlebnikov -1 siblings, 0 replies; 67+ messages in thread From: Konstantin Khlebnikov @ 2011-07-20 14:32 UTC (permalink / raw) To: Mel Gorman Cc: Christoph Lameter, Pekka Enberg, Andrew Morton, linux-mm, linux-kernel, Matt Mackall Mel Gorman wrote: > On Wed, Jul 20, 2011 at 08:54:10AM -0500, Christoph Lameter wrote: >> On Wed, 20 Jul 2011, Pekka Enberg wrote: >> >>> On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote: >>>>> The changelog isn't that convincing, really. This is kmem_cache_create() >>>>> so I'm surprised we'd ever get NULL here in practice. Does this fix some >>>>> problem you're seeing? If this is really an issue, I'd blame the page >>>>> allocator as GFP_KERNEL should just work. >>>> >>>> nf_conntrack creates separate slab-cache for each net-namespace, >>>> this patch of course not eliminates the chance of failure, but makes it more >>>> acceptable. >>> >>> I'm still surprised you are seeing failures. mm/slab.c hasn't changed >>> significantly in a long time. Why hasn't anyone reported this before? I'd >>> still be inclined to shift the blame to the page allocator... Mel, Christoph? >> >> There was a lot of recent fiddling with the reclaim logic. Maybe some of >> those changes caused the problem? >> > > It's more likely that creating new slabs while under memory pressure > significant enough to fail an order-4 allocation is a situation that is > rarely tested. > > What kernel version did this failure occur on? What was the system doing > at the time of failure? Can the page allocation failure message be > posted? > I caught this on our rhel6-openvz kernel, and yes, it is very patchy, but I don't see any reason why this couldn't be reproduced on a mainline kernel.
There were about ten containers with random stuff; the node was already doing intensive swapout but was still alive. In this situation, starting new containers sometimes (1 per 1000) failed due to kmem_cache_create() failures in nf_conntrack. There were no other messages except: Unable to create nf_conn slab cache and some nf_conntrack: falling back to vmalloc. (it tries to allocate a huge hash table and does so via vmalloc if kmalloc fails) ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 14:32 ` Konstantin Khlebnikov @ 2011-07-20 14:40 ` Eric Dumazet -1 siblings, 0 replies; 67+ messages in thread From: Eric Dumazet @ 2011-07-20 14:40 UTC (permalink / raw) To: Konstantin Khlebnikov Cc: Mel Gorman, Christoph Lameter, Pekka Enberg, Andrew Morton, linux-mm, linux-kernel, Matt Mackall Le mercredi 20 juillet 2011 à 18:32 +0400, Konstantin Khlebnikov a écrit : > I catch this on our rhel6-openvz kernel, and yes it very patchy, > but I don't see any reasons why this cannot be reproduced on mainline kernel. > > there was abount ten containers with random stuff, node already do intensive swapout but still alive, > in this situation starting new containers sometimes (1 per 1000) fails due to kmem_cache_create failures in nf_conntrack, > there no other messages except: > Unable to create nf_conn slab cache > and some > nf_conntrack: falling back to vmalloc. > (it try allocates huge hash table and do it via vmalloc if kmalloc fails) Does this kernel contain commit 6d4831c2 ? (vfs: avoid large kmalloc()s for the fdtable) ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 14:40 ` Eric Dumazet @ 2011-07-20 14:47 ` Konstantin Khlebnikov -1 siblings, 0 replies; 67+ messages in thread From: Konstantin Khlebnikov @ 2011-07-20 14:47 UTC (permalink / raw) To: Eric Dumazet Cc: Mel Gorman, Christoph Lameter, Pekka Enberg, Andrew Morton, linux-mm, linux-kernel, Matt Mackall Eric Dumazet wrote: > Le mercredi 20 juillet 2011 à 18:32 +0400, Konstantin Khlebnikov a > écrit : > >> I catch this on our rhel6-openvz kernel, and yes it very patchy, >> but I don't see any reasons why this cannot be reproduced on mainline kernel. >> >> there was abount ten containers with random stuff, node already do intensive swapout but still alive, >> in this situation starting new containers sometimes (1 per 1000) fails due to kmem_cache_create failures in nf_conntrack, >> there no other messages except: >> Unable to create nf_conn slab cache >> and some >> nf_conntrack: falling back to vmalloc. >> (it try allocates huge hash table and do it via vmalloc if kmalloc fails) > > > Does this kernel contain commit 6d4831c2 ? > (vfs: avoid large kmalloc()s for the fdtable) > Yes, but not exactly; in our kernel it looks like: static inline void * alloc_fdmem(unsigned int size) { if (size <= PAGE_SIZE) return kmalloc(size, GFP_KERNEL); else return vmalloc(size); } and it looks like this change came from ancient times =) ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 13:14 ` Pekka Enberg @ 2011-07-20 13:43 ` Mel Gorman -1 siblings, 0 replies; 67+ messages in thread From: Mel Gorman @ 2011-07-20 13:43 UTC (permalink / raw) To: Pekka Enberg Cc: Konstantin Khlebnikov, Andrew Morton, linux-mm, Christoph Lameter, linux-kernel, Matt Mackall On Wed, Jul 20, 2011 at 04:14:51PM +0300, Pekka Enberg wrote: > On Wed, 20 Jul 2011, Konstantin Khlebnikov wrote: > >Order of sizeof(struct kmem_cache) can be bigger than PAGE_ALLOC_COSTLY_ORDER, > >thus there is a good chance of unsuccessful allocation. > >With __GFP_REPEAT buddy-allocator will reclaim/compact memory more aggressively. > > > >Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> > >--- > >mm/slab.c | 2 +- > >1 files changed, 1 insertions(+), 1 deletions(-) > > > >diff --git a/mm/slab.c b/mm/slab.c > >index d96e223..53bddc8 100644 > >--- a/mm/slab.c > >+++ b/mm/slab.c > >@@ -2304,7 +2304,7 @@ kmem_cache_create (const char *name, size_t size, size_t align, > > gfp = GFP_NOWAIT; > > > > /* Get cache's description obj. */ > >- cachep = kmem_cache_zalloc(&cache_cache, gfp); > >+ cachep = kmem_cache_zalloc(&cache_cache, gfp | __GFP_REPEAT); > > if (!cachep) > > goto oops; > > The changelog isn't that convincing, really. This is > kmem_cache_create() so I'm surprised we'd ever get NULL here in > practice. Does this fix some problem you're seeing? If this is > really an issue, I'd blame the page allocator as GFP_KERNEL should > just work. > Besides, is allocating from cache_cache really a PAGE_ALLOC_COSTLY_ORDER allocation? On my laptop at least, it's an order-2 allocation which is supporting up to 512 CPUs and 512 nodes. -- Mel Gorman SUSE Labs ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 13:43 ` Mel Gorman @ 2011-07-20 13:56 ` Christoph Lameter -1 siblings, 0 replies; 67+ messages in thread From: Christoph Lameter @ 2011-07-20 13:56 UTC (permalink / raw) To: Mel Gorman Cc: Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall On Wed, 20 Jul 2011, Mel Gorman wrote: > > The changelog isn't that convincing, really. This is > > kmem_cache_create() so I'm surprised we'd ever get NULL here in > > practice. Does this fix some problem you're seeing? If this is > > really an issue, I'd blame the page allocator as GFP_KERNEL should > > just work. > > > > Besides, is allocating from cache_cache really a > PAGE_ALLOC_COSTLY_ORDER allocation? On my laptop at least, it's an > order-2 allocation which is supporting up to 512 CPUs and 512 nodes. Slab's kmem_cache is configured with an array of NR_CPUS, which is the maximum nr of cpus supported. Some distros support 4096 cpus in order to accommodate SGI machines. That array then will have the size of 4096 * 8 = 32k ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 13:56 ` Christoph Lameter @ 2011-07-20 14:08 ` Eric Dumazet -1 siblings, 0 replies; 67+ messages in thread From: Eric Dumazet @ 2011-07-20 14:08 UTC (permalink / raw) To: Christoph Lameter Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall Le mercredi 20 juillet 2011 à 08:56 -0500, Christoph Lameter a écrit : > On Wed, 20 Jul 2011, Mel Gorman wrote: > > > > The changelog isn't that convincing, really. This is > > > kmem_cache_create() so I'm surprised we'd ever get NULL here in > > > practice. Does this fix some problem you're seeing? If this is > > > really an issue, I'd blame the page allocator as GFP_KERNEL should > > > just work. > > > > > > > Besides, is allocating from cache_cache really a > > PAGE_ALLOC_COSTLY_ORDER allocation? On my laptop at least, it's an > > order-2 allocation which is supporting up to 512 CPUs and 512 nodes. > > Slab's kmem_cache is configured with an array of NR_CPUS which is the > maximum nr of cpus supported. Some distros support 4096 cpus in order to > accomodate SGI machines. That array then will have the size of 4096 * 8 = > 32k We currently support a dynamic scheme for the possible nodes: cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) + nr_node_ids * sizeof(struct kmem_list3 *); We could have a similar trick to make the real size depend on both nr_node_ids and nr_cpu_ids. (struct kmem_cache)->array would become a pointer. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 14:08 ` Eric Dumazet @ 2011-07-20 14:52 ` Christoph Lameter -1 siblings, 0 replies; 67+ messages in thread From: Christoph Lameter @ 2011-07-20 14:52 UTC (permalink / raw) To: Eric Dumazet Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall On Wed, 20 Jul 2011, Eric Dumazet wrote: > > Slab's kmem_cache is configured with an array of NR_CPUS which is the > > maximum nr of cpus supported. Some distros support 4096 cpus in order to > > accomodate SGI machines. That array then will have the size of 4096 * 8 = > > 32k > > We currently support a dynamic schem for the possible nodes : > > cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) + > nr_node_ids * sizeof(struct kmem_list3 *); > > We could have a similar trick to make the real size both depends on > nr_node_ids and nr_cpu_ids. > > (struct kmem_cache)->array would become a pointer. We should be making it a per-cpu pointer like slub then. I looked at what it would take to do so a couple of months ago, but it was quite invasive. The other solution is to use slub instead. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 14:52 ` Christoph Lameter @ 2011-07-20 15:09 ` Eric Dumazet -1 siblings, 0 replies; 67+ messages in thread From: Eric Dumazet @ 2011-07-20 15:09 UTC (permalink / raw) To: Christoph Lameter Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall Le mercredi 20 juillet 2011 à 09:52 -0500, Christoph Lameter a écrit : > On Wed, 20 Jul 2011, Eric Dumazet wrote: > > > > Slab's kmem_cache is configured with an array of NR_CPUS which is the > > > maximum nr of cpus supported. Some distros support 4096 cpus in order to > > > accomodate SGI machines. That array then will have the size of 4096 * 8 = > > > 32k > > > > We currently support a dynamic schem for the possible nodes : > > > > cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) + > > nr_node_ids * sizeof(struct kmem_list3 *); > > > > We could have a similar trick to make the real size both depends on > > nr_node_ids and nr_cpu_ids. > > > > (struct kmem_cache)->array would become a pointer. > > We should be making it a per cpu pointer like slub then. I looked at what > it would take to do so a couple of month ago but it was quite invasive. > Lets try this first patch, simple enough : No need to setup percpu data for a one time use structure... [PATCH] slab: remove one NR_CPUS dependency Reduce high order allocations in do_tune_cpucache() for some setups. 
(NR_CPUS=4096 -> we need 64KB) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Pekka Enberg <penberg@kernel.org> CC: Christoph Lameter <cl@linux.com> --- mm/slab.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/mm/slab.c b/mm/slab.c index d96e223..862bd12 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3933,7 +3933,7 @@ fail: struct ccupdate_struct { struct kmem_cache *cachep; - struct array_cache *new[NR_CPUS]; + struct array_cache *new[0]; }; static void do_ccupdate_local(void *info) @@ -3955,7 +3955,8 @@ static int do_tune_cpucache(struct kmem_cache *cachep, int limit, struct ccupdate_struct *new; int i; - new = kzalloc(sizeof(*new), gfp); + new = kzalloc(sizeof(*new) + nr_cpu_ids * sizeof(struct array_cache *), + gfp); if (!new) return -ENOMEM; ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 15:09 ` Eric Dumazet @ 2011-07-20 15:34 ` Christoph Lameter -1 siblings, 0 replies; 67+ messages in thread From: Christoph Lameter @ 2011-07-20 15:34 UTC (permalink / raw) To: Eric Dumazet Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall On Wed, 20 Jul 2011, Eric Dumazet wrote: > [PATCH] slab: remove one NR_CPUS dependency Ok simple enough. Acked-by: Christoph Lameter <cl@linux.com> ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 15:34 ` Christoph Lameter @ 2011-07-20 15:56 ` Eric Dumazet -1 siblings, 0 replies; 67+ messages in thread From: Eric Dumazet @ 2011-07-20 15:56 UTC (permalink / raw) To: Christoph Lameter Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall Le mercredi 20 juillet 2011 à 10:34 -0500, Christoph Lameter a écrit : > On Wed, 20 Jul 2011, Eric Dumazet wrote: > > > [PATCH] slab: remove one NR_CPUS dependency > > Ok simple enough. > > Acked-by: Christoph Lameter <cl@linux.com> > Thanks Christoph Here is the second patch, also simple and working for me (tested on x86_64, NR_CPUS=4096, on my 2x4x2 machine) Eventually, we could avoid the extra 'array' pointer if NR_CPUS is known to be a small value (<= 16 for example) Note that adding ____cacheline_aligned_in_smp on nodelists[] actually helps performance, as all following fields are readonly after kmem_cache setup. [PATCH] slab: shrinks sizeof(struct kmem_cache) Reduce high order allocations for some setups. (NR_CPUS=4096 -> we need 64KB per kmem_cache struct) Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Pekka Enberg <penberg@kernel.org> CC: Christoph Lameter <cl@linux.com> CC: Andrew Morton <akpm@linux-foundation.org> --- include/linux/slab_def.h | 4 ++-- mm/slab.c | 10 ++++++---- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h index 83203ae..abedd8e 100644 --- a/include/linux/slab_def.h +++ b/include/linux/slab_def.h @@ -51,7 +51,7 @@ struct kmem_cache { /* 1) per-cpu data, touched during every alloc/free */ - struct array_cache *array[NR_CPUS]; + struct array_cache **array; /* 2) Cache tunables. 
Protected by cache_chain_mutex */ unsigned int batchcount; unsigned int limit; @@ -118,7 +118,7 @@ struct kmem_cache { * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache * is statically defined, so we reserve the max number of nodes. */ - struct kmem_list3 *nodelists[MAX_NUMNODES]; + struct kmem_list3 *nodelists[MAX_NUMNODES] ____cacheline_aligned_in_smp; /* * Do not add fields after nodelists[] */ diff --git a/mm/slab.c b/mm/slab.c index d96e223..f951015 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -574,7 +574,9 @@ static struct arraycache_init initarray_generic = { {0, BOOT_CPUCACHE_ENTRIES, 1, 0} }; /* internal cache of cache description objs */ +static struct array_cache *array_cache_cache[NR_CPUS]; static struct kmem_cache cache_cache = { + .array = array_cache_cache, .batchcount = 1, .limit = BOOT_CPUCACHE_ENTRIES, .shared = 1, @@ -1492,11 +1494,10 @@ void __init kmem_cache_init(void) cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE + node]; /* - * struct kmem_cache size depends on nr_node_ids, which - * can be less than MAX_NUMNODES. + * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids */ - cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) + - nr_node_ids * sizeof(struct kmem_list3 *); + cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists[nr_node_ids]) + + nr_cpu_ids * sizeof(struct array_cache *); #if DEBUG cache_cache.obj_size = cache_cache.buffer_size; #endif @@ -2308,6 +2309,7 @@ kmem_cache_create (const char *name, size_t size, size_t align, if (!cachep) goto oops; + cachep->array = (struct array_cache **)&cachep->nodelists[nr_node_ids]; #if DEBUG cachep->obj_size = size; ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 15:56 ` Eric Dumazet @ 2011-07-20 16:17 ` Christoph Lameter -1 siblings, 0 replies; 67+ messages in thread From: Christoph Lameter @ 2011-07-20 16:17 UTC (permalink / raw) To: Eric Dumazet Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall On Wed, 20 Jul 2011, Eric Dumazet wrote: > Note that adding ____cacheline_aligned_in_smp on nodelists[] actually > helps performance, as all following fields are readonly after kmem_cache > setup. Well, but that is not addressing the same issue. Could you separate that out? The other question that follows from this is then: Does that alignment compensate for the loss of performance due to the additional lookup in hot code paths and the additional cacheline reference required? The per node pointers are lower priority in terms of performance than the per cpu pointers. I'd rather have the per node pointers requiring an additional lookup. Less impact on hot code paths. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 16:17 ` Christoph Lameter @ 2011-07-20 16:31 ` Eric Dumazet -1 siblings, 0 replies; 67+ messages in thread From: Eric Dumazet @ 2011-07-20 16:31 UTC (permalink / raw) To: Christoph Lameter Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall On Wednesday 20 July 2011 at 11:17 -0500, Christoph Lameter wrote: > On Wed, 20 Jul 2011, Eric Dumazet wrote: > > > Note that adding ____cacheline_aligned_in_smp on nodelists[] actually > > helps performance, as all following fields are readonly after kmem_cache > > setup. > > Well, but that is not addressing the same issue. Could you separate that > out? > I would like this patch not to be a performance regression. I know some people really want fast SLAB/SLUB ;) > The other question that follows from this is then: Does that > alignment compensate for the loss of performance due to the additional > lookup in hot code paths and the additional cacheline reference required? > In fact the resulting code is smaller, because most fields now have offsets below 127 (x86 assembly code can use shorter instructions) Before patch : # size mm/slab.o text data bss dec hex filename 22605 361665 32 384302 5dd2e mm/slab.o After patch : # size mm/slab.o text data bss dec hex filename 22347 328929 32800 384076 5dc4c mm/slab.o > The per node pointers are lower priority in terms of performance than the > per cpu pointers. I'd rather have the per node pointers requiring an > additional lookup. Less impact on hot code paths. > Sure. I'll post a V2 to have CPU array before NODE array. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 16:31 ` Eric Dumazet @ 2011-07-20 17:04 ` Eric Dumazet -1 siblings, 0 replies; 67+ messages in thread From: Eric Dumazet @ 2011-07-20 17:04 UTC (permalink / raw) To: Christoph Lameter Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall On Wednesday 20 July 2011 at 18:31 +0200, Eric Dumazet wrote: > In fact the resulting code is smaller, because most fields now have > offsets below 127 (x86 assembly code can use shorter instructions) > > Before patch : > # size mm/slab.o > text data bss dec hex filename > 22605 361665 32 384302 5dd2e mm/slab.o > > After patch : > # size mm/slab.o > text data bss dec hex filename > 22347 328929 32800 384076 5dc4c mm/slab.o > > > The per node pointers are lower priority in terms of performance than the > > per cpu pointers. I'd rather have the per node pointers requiring an > > additional lookup. Less impact on hot code paths. > > > > Sure. I'll post a V2 to have CPU array before NODE array. > Here it is. My dev machine still runs after booting with this patch. No performance impact seen yet. Thanks [PATCH v2] slab: shrinks sizeof(struct kmem_cache) Reduce high order allocations for some setups. (NR_CPUS=4096 -> we need 64KB per kmem_cache struct) We now allocate exactly the needed size (using nr_cpu_ids and nr_node_ids) This also makes the code a bit smaller on x86_64, since some field offsets are less than the 127 limit : Before patch : # size mm/slab.o text data bss dec hex filename 22605 361665 32 384302 5dd2e mm/slab.o After patch : # size mm/slab.o text data bss dec hex filename 22349 353473 8224 384046 5dc2e mm/slab.o Reported-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Pekka Enberg <penberg@kernel.org> CC: Christoph Lameter <cl@linux.com> CC: Andrew Morton <akpm@linux-foundation.org> --- v2: move the CPU array before NODE array. 
include/linux/slab_def.h | 26 +++++++++++++------------- mm/slab.c | 10 ++++++---- 2 files changed, 19 insertions(+), 17 deletions(-) diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h index 83203ae..6e51c4d 100644 --- a/include/linux/slab_def.h +++ b/include/linux/slab_def.h @@ -50,21 +50,19 @@ */ struct kmem_cache { -/* 1) per-cpu data, touched during every alloc/free */ - struct array_cache *array[NR_CPUS]; -/* 2) Cache tunables. Protected by cache_chain_mutex */ +/* 1) Cache tunables. Protected by cache_chain_mutex */ unsigned int batchcount; unsigned int limit; unsigned int shared; unsigned int buffer_size; u32 reciprocal_buffer_size; -/* 3) touched by every alloc & free from the backend */ +/* 2) touched by every alloc & free from the backend */ unsigned int flags; /* constant flags */ unsigned int num; /* # of objs per slab */ -/* 4) cache_grow/shrink */ +/* 3) cache_grow/shrink */ /* order of pgs per slab (2^n) */ unsigned int gfporder; @@ -80,11 +78,11 @@ struct kmem_cache { /* constructor func */ void (*ctor)(void *obj); -/* 5) cache creation/removal */ +/* 4) cache creation/removal */ const char *name; struct list_head next; -/* 6) statistics */ +/* 5) statistics */ #ifdef CONFIG_DEBUG_SLAB unsigned long num_active; unsigned long num_allocations; @@ -111,16 +109,18 @@ struct kmem_cache { int obj_size; #endif /* CONFIG_DEBUG_SLAB */ +/* 6) per-cpu/per-node data, touched during every alloc/free */ /* - * We put nodelists[] at the end of kmem_cache, because we want to size - * this array to nr_node_ids slots instead of MAX_NUMNODES + * We put array[] at the end of kmem_cache, because we want to size + * this array to nr_cpu_ids slots instead of NR_CPUS * (see kmem_cache_init()) - * We still use [MAX_NUMNODES] and not [1] or [0] because cache_cache - * is statically defined, so we reserve the max number of nodes. + * We still use [NR_CPUS] and not [1] or [0] because cache_cache + * is statically defined, so we reserve the max number of cpus. 
*/ - struct kmem_list3 *nodelists[MAX_NUMNODES]; + struct kmem_list3 **nodelists; + struct array_cache *array[NR_CPUS]; /* - * Do not add fields after nodelists[] + * Do not add fields after array[] */ }; diff --git a/mm/slab.c b/mm/slab.c index d96e223..30181fb 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -574,7 +574,9 @@ static struct arraycache_init initarray_generic = { {0, BOOT_CPUCACHE_ENTRIES, 1, 0} }; /* internal cache of cache description objs */ +static struct kmem_list3 *cache_cache_nodelists[MAX_NUMNODES]; static struct kmem_cache cache_cache = { + .nodelists = cache_cache_nodelists, .batchcount = 1, .limit = BOOT_CPUCACHE_ENTRIES, .shared = 1, @@ -1492,11 +1494,10 @@ void __init kmem_cache_init(void) cache_cache.nodelists[node] = &initkmem_list3[CACHE_CACHE + node]; /* - * struct kmem_cache size depends on nr_node_ids, which - * can be less than MAX_NUMNODES. + * struct kmem_cache size depends on nr_node_ids & nr_cpu_ids */ - cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) + - nr_node_ids * sizeof(struct kmem_list3 *); + cache_cache.buffer_size = offsetof(struct kmem_cache, array[nr_cpu_ids]) + + nr_node_ids * sizeof(struct kmem_list3 *); #if DEBUG cache_cache.obj_size = cache_cache.buffer_size; #endif @@ -2308,6 +2309,7 @@ kmem_cache_create (const char *name, size_t size, size_t align, if (!cachep) goto oops; + cachep->nodelists = (struct kmem_list3 **)&cachep->array[nr_cpu_ids]; #if DEBUG cachep->obj_size = size; ^ permalink raw reply related [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 17:04 ` Eric Dumazet @ 2011-07-20 17:13 ` Christoph Lameter -1 siblings, 0 replies; 67+ messages in thread From: Christoph Lameter @ 2011-07-20 17:13 UTC (permalink / raw) To: Eric Dumazet Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall On Wed, 20 Jul 2011, Eric Dumazet wrote: > [PATCH v2] slab: shrinks sizeof(struct kmem_cache) This will solve the issue for small nr_cpu_ids but those with 4k cpus will still have the issue. Acked-by: Christoph Lameter <cl@linux.com> ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 17:13 ` Christoph Lameter @ 2011-07-20 17:28 ` Pekka Enberg -1 siblings, 0 replies; 67+ messages in thread From: Pekka Enberg @ 2011-07-20 17:28 UTC (permalink / raw) To: Christoph Lameter Cc: Eric Dumazet, Mel Gorman, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall On Wed, 20 Jul 2011, Eric Dumazet wrote: >> [PATCH v2] slab: shrinks sizeof(struct kmem_cache) On Wed, 20 Jul 2011, Christoph Lameter wrote: > This will solve the issue for small nr_cpu_ids but those with 4k cpus will > still have the issue. > > Acked-by: Christoph Lameter <cl@linux.com> Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin though? Pekka ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 17:28 ` Pekka Enberg @ 2011-07-20 17:37 ` Christoph Lameter -1 siblings, 0 replies; 67+ messages in thread From: Christoph Lameter @ 2011-07-20 17:37 UTC (permalink / raw) To: Pekka Enberg Cc: Eric Dumazet, Mel Gorman, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall On Wed, 20 Jul 2011, Pekka Enberg wrote: > On Wed, 20 Jul 2011, Eric Dumazet wrote: > > > [PATCH v2] slab: shrinks sizeof(struct kmem_cache) > > On Wed, 20 Jul 2011, Christoph Lameter wrote: > > This will solve the issue for small nr_cpu_ids but those with 4k cpus will > > still have the issue. > > > > Acked-by: Christoph Lameter <cl@linux.com> > > Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin > though? Those with 4k cpus will be thankful I guess. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 17:37 ` Christoph Lameter @ 2011-07-20 17:41 ` Pekka Enberg -1 siblings, 0 replies; 67+ messages in thread From: Pekka Enberg @ 2011-07-20 17:41 UTC (permalink / raw) To: Christoph Lameter Cc: Eric Dumazet, Mel Gorman, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel, Matt Mackall On Wed, 20 Jul 2011, Pekka Enberg wrote: >> On Wed, 20 Jul 2011, Eric Dumazet wrote: >>>> [PATCH v2] slab: shrinks sizeof(struct kmem_cache) >> >> On Wed, 20 Jul 2011, Christoph Lameter wrote: >>> This will solve the issue for small nr_cpu_ids but those with 4k cpus will >>> still have the issue. >>> >>> Acked-by: Christoph Lameter <cl@linux.com> >> >> Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin >> though? On Wed, 20 Jul 2011, Christoph Lameter wrote: > Those with 4k cpus will be thankful I guess. OTOH, I'm slightly worried that it might mask a real problem with GFP_KERNEL not being aggressive enough. Mel? Pekka ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT 2011-07-20 17:41 ` Pekka Enberg @ 2011-07-20 18:07 ` Matt Mackall -1 siblings, 0 replies; 67+ messages in thread From: Matt Mackall @ 2011-07-20 18:07 UTC (permalink / raw) To: Pekka Enberg Cc: Christoph Lameter, Eric Dumazet, Mel Gorman, Konstantin Khlebnikov, Andrew Morton, linux-mm, linux-kernel On Wed, 2011-07-20 at 20:41 +0300, Pekka Enberg wrote: > On Wed, 20 Jul 2011, Pekka Enberg wrote: > >> On Wed, 20 Jul 2011, Eric Dumazet wrote: > >>>> [PATCH v2] slab: shrinks sizeof(struct kmem_cache) > >> > >> On Wed, 20 Jul 2011, Christoph Lameter wrote: > >>> This will solve the issue for small nr_cpu_ids but those with 4k cpus will > >>> still have the issue. > >>> > >>> Acked-by: Christoph Lameter <cl@linux.com> > >> > >> Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin > >> though? > > On Wed, 20 Jul 2011, Christoph Lameter wrote: > > Those with 4k cpus will be thankful I guess. > > OTOH, I'm slightly worried that it might mask a real problem > with GFP_KERNEL not being aggressive enough. Mel? I think that's already been demonstrated here, yes. It's just waiting for another obscure workload to trigger it. -- Mathematics is the supreme nostalgia of our time. ^ permalink raw reply [flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 18:07 ` Matt Mackall
@ 2011-07-21  7:18 ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 67+ messages in thread
From: Konstantin Khlebnikov @ 2011-07-21 7:18 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Pekka Enberg, Christoph Lameter, Eric Dumazet, Mel Gorman,
      Andrew Morton, linux-mm, linux-kernel

Matt Mackall wrote:
> On Wed, 2011-07-20 at 20:41 +0300, Pekka Enberg wrote:
>> On Wed, 20 Jul 2011, Pekka Enberg wrote:
>>>> On Wed, 20 Jul 2011, Eric Dumazet wrote:
>>>>>> [PATCH v2] slab: shrinks sizeof(struct kmem_cache)
>>>>
>>>> On Wed, 20 Jul 2011, Christoph Lameter wrote:
>>>>> This will solve the issue for small nr_cpu_ids but those with 4k cpus will
>>>>> still have the issue.
>>>>>
>>>>> Acked-by: Christoph Lameter <cl@linux.com>
>>>>
>>>> Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin
>>>> though?
>>
>> On Wed, 20 Jul 2011, Christoph Lameter wrote:
>>> Those with 4k cpus will be thankful I guess.
>>
>> OTOH, I'm slightly worried that it might mask a real problem
>> with GFP_KERNEL not being aggressive enough. Mel?
>
> I think that's already been demonstrated here, yes. It's just waiting
> for another obscure workload to trigger it.
>

I agree, __GFP_REPEAT doesn't fix the problem completely, so some more
detailed investigation is required. My case is not entirely honest,
because my kernel has a slightly different memory controller, and it
has bugs for sure.

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 17:41 ` Pekka Enberg
@ 2011-07-20 19:09 ` Mel Gorman
  -1 siblings, 0 replies; 67+ messages in thread
From: Mel Gorman @ 2011-07-20 19:09 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Christoph Lameter, Eric Dumazet, Konstantin Khlebnikov,
      Andrew Morton, linux-mm, linux-kernel, Matt Mackall

On Wed, Jul 20, 2011 at 08:41:12PM +0300, Pekka Enberg wrote:
> On Wed, 20 Jul 2011, Pekka Enberg wrote:
> >> On Wed, 20 Jul 2011, Eric Dumazet wrote:
> >>>> [PATCH v2] slab: shrinks sizeof(struct kmem_cache)
> >>
> >> On Wed, 20 Jul 2011, Christoph Lameter wrote:
> >>> This will solve the issue for small nr_cpu_ids but those with 4k cpus will
> >>> still have the issue.
> >>>
> >>> Acked-by: Christoph Lameter <cl@linux.com>
> >>
> >> Applied, thanks! Do we still want the __GFP_REPEAT patch from Konstantin
> >> though?
>
> On Wed, 20 Jul 2011, Christoph Lameter wrote:
> > Those with 4k cpus will be thankful I guess.
>
> OTOH, I'm slightly worried that it might mask a real problem with
> GFP_KERNEL not being aggressive enough. Mel?
>

The reproduction case was while memory was under heavy pressure (swapout
was active), and even then only 1 in 1000 containers failed to be created
due to an order-4 allocation failure. I'm not convinced we need to
increase how aggressive the allocator is for PAGE_ALLOC_COSTLY_ORDER in
general based on this.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 15:09 ` Eric Dumazet
@ 2011-07-31 11:41 ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 67+ messages in thread
From: Konstantin Khlebnikov @ 2011-07-31 11:41 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Lameter, Mel Gorman, Pekka Enberg, Andrew Morton,
      linux-mm, linux-kernel, Matt Mackall

It seems someone forgot this patch; the second one, "slab: shrink
sizeof(struct kmem_cache)", is already in mainline.

Eric Dumazet wrote:
> On Wednesday, 20 July 2011 at 09:52 -0500, Christoph Lameter wrote:
>> On Wed, 20 Jul 2011, Eric Dumazet wrote:
>>
>>>> Slab's kmem_cache is configured with an array of NR_CPUS, which is the
>>>> maximum nr of cpus supported. Some distros support 4096 cpus in order to
>>>> accommodate SGI machines. That array then will have the size of 4096 * 8 =
>>>> 32k
>>>
>>> We currently support a dynamic scheme for the possible nodes:
>>>
>>> cache_cache.buffer_size = offsetof(struct kmem_cache, nodelists) +
>>>				  nr_node_ids * sizeof(struct kmem_list3 *);
>>>
>>> We could use a similar trick to make the real size depend on both
>>> nr_node_ids and nr_cpu_ids.
>>>
>>> (struct kmem_cache)->array would become a pointer.
>>
>> We should be making it a per cpu pointer like slub then. I looked at what
>> it would take to do so a couple of months ago but it was quite invasive.
>>
>
> Let's try this first patch, simple enough: no need to set up percpu data
> for a one-time-use structure...
>
> [PATCH] slab: remove one NR_CPUS dependency
>
> Reduce high order allocations in do_tune_cpucache() for some setups.
> (NR_CPUS=4096 -> we need 64KB)
>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> CC: Pekka Enberg <penberg@kernel.org>
> CC: Christoph Lameter <cl@linux.com>
> ---
>  mm/slab.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index d96e223..862bd12 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3933,7 +3933,7 @@ fail:
>
>  struct ccupdate_struct {
>  	struct kmem_cache *cachep;
> -	struct array_cache *new[NR_CPUS];
> +	struct array_cache *new[0];
>  };
>
>  static void do_ccupdate_local(void *info)
> @@ -3955,7 +3955,8 @@ static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
>  	struct ccupdate_struct *new;
>  	int i;
>
> -	new = kzalloc(sizeof(*new), gfp);
> +	new = kzalloc(sizeof(*new) + nr_cpu_ids * sizeof(struct array_cache *),
> +		      gfp);
>  	if (!new)
>  		return -ENOMEM;
>

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-31 11:41 ` Konstantin Khlebnikov
@ 2011-07-31 11:44 ` Pekka Enberg
  -1 siblings, 0 replies; 67+ messages in thread
From: Pekka Enberg @ 2011-07-31 11:44 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Eric Dumazet, Christoph Lameter, Mel Gorman, Andrew Morton,
      linux-mm, linux-kernel, Matt Mackall

On Sun, Jul 31, 2011 at 2:41 PM, Konstantin Khlebnikov
<khlebnikov@parallels.com> wrote:
> It seems someone forgot this patch,

[snip]

>> [PATCH] slab: remove one NR_CPUS dependency
>>
>> Reduce high order allocations in do_tune_cpucache() for some setups.
>> (NR_CPUS=4096 -> we need 64KB)

It's queued in the slab/urgent branch and will be sent to Linus tonight.

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-20 14:52 ` Christoph Lameter
@ 2011-07-21  8:43 ` Eric Dumazet
  -1 siblings, 0 replies; 67+ messages in thread
From: Eric Dumazet @ 2011-07-21 8:43 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
      linux-mm, linux-kernel, Matt Mackall

On Wednesday, 20 July 2011 at 09:52 -0500, Christoph Lameter wrote:

> We should be making it a per cpu pointer like slub then. I looked at what
> it would take to do so a couple of months ago but it was quite invasive.
>

I took a look at this too, but using percpu data would consume more
memory, because the percpu allocator allocates memory blobs for all
possible cpus, while the current code handles online/offline cpus nicely.

^ permalink raw reply	[flat|nested] 67+ messages in thread
* Re: [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT
  2011-07-21  8:43 ` Eric Dumazet
@ 2011-07-21 15:27 ` Christoph Lameter
  -1 siblings, 0 replies; 67+ messages in thread
From: Christoph Lameter @ 2011-07-21 15:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Mel Gorman, Pekka Enberg, Konstantin Khlebnikov, Andrew Morton,
      linux-mm, linux-kernel, Matt Mackall

On Thu, 21 Jul 2011, Eric Dumazet wrote:

> On Wednesday, 20 July 2011 at 09:52 -0500, Christoph Lameter wrote:
>
>> We should be making it a per cpu pointer like slub then. I looked at what
>> it would take to do so a couple of months ago but it was quite invasive.
>
> I took a look at this too, but using percpu data would consume more
> memory, because the percpu allocator allocates memory blobs for all
> possible cpus, while the current code handles online/offline cpus nicely.

The number of possible cpus is determined on bootup. If the BIOS provides
the right information about which cpus are present, and if no hotswapping
of cpus is possible, then per cpu areas are allocated only for the
actually functioning cpus. Certainly no desktop BIOS will indicate that
4096 cpus are possible.

^ permalink raw reply	[flat|nested] 67+ messages in thread
end of thread, other threads:[~2011-07-31 11:45 UTC | newest]

Thread overview: 67+ messages
2011-07-20 12:16 [PATCH] mm-slab: allocate kmem_cache with __GFP_REPEAT Konstantin Khlebnikov
2011-07-20 13:14 ` Pekka Enberg
2011-07-20 13:28 ` Konstantin Khlebnikov
2011-07-20 13:42 ` Pekka Enberg
2011-07-20 13:50 ` Konstantin Khlebnikov
2011-07-20 13:53 ` Pekka Enberg
2011-07-20 15:36 ` Christoph Lameter
2011-07-20 13:59 ` Konstantin Khlebnikov
2011-07-20 13:54 ` Christoph Lameter
2011-07-20 14:20 ` Mel Gorman
2011-07-20 14:32 ` Konstantin Khlebnikov
2011-07-20 14:40 ` Eric Dumazet
2011-07-20 14:47 ` Konstantin Khlebnikov
2011-07-20 13:43 ` Mel Gorman
2011-07-20 13:56 ` Christoph Lameter
2011-07-20 14:08 ` Eric Dumazet
2011-07-20 14:52 ` Christoph Lameter
2011-07-20 15:09 ` Eric Dumazet
2011-07-20 15:34 ` Christoph Lameter
2011-07-20 15:56 ` Eric Dumazet
2011-07-20 16:17 ` Christoph Lameter
2011-07-20 16:31 ` Eric Dumazet
2011-07-20 17:04 ` Eric Dumazet
2011-07-20 17:13 ` Christoph Lameter
2011-07-20 17:28 ` Pekka Enberg
2011-07-20 17:37 ` Christoph Lameter
2011-07-20 17:41 ` Pekka Enberg
2011-07-20 18:07 ` Matt Mackall
2011-07-21  7:18 ` Konstantin Khlebnikov
2011-07-20 19:09 ` Mel Gorman
2011-07-31 11:41 ` Konstantin Khlebnikov
2011-07-31 11:44 ` Pekka Enberg
2011-07-21  8:43 ` Eric Dumazet
2011-07-21 15:27 ` Christoph Lameter