* [PATCH 0/4] cache-specific changes for memcg (preparation)
@ 2012-06-20 20:59 Glauber Costa
  2012-06-20 20:59 ` [PATCH 1/4] slab: rename gfpflags to allocflags Glauber Costa
                   ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Glauber Costa @ 2012-06-20 20:59 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-mm, Cristoph Lameter, David Rientjes

Pekka,

Please consider merging the following patches.
Patches 1-3 have been around for a while, especially 1 and 3.

Patch #4 is a bit newer and has received fewer reviews (reviews appreciated),
but I am using it myself without any problems.

Feel free to pick all of them, or just some of them, as you prefer.

Glauber Costa (4):
  slab: rename gfpflags to allocflags
  Wipe out CFLGS_OFF_SLAB from flags during initial slab creation
  slab: move FULL state transition to an initcall
  don't do __ClearPageSlab before freeing slab page.

 include/linux/slab_def.h |    2 +-
 mm/page_alloc.c          |    5 ++++-
 mm/slab.c                |   29 +++++++++++++++--------------
 mm/slob.c                |    1 -
 mm/slub.c                |    1 -
 5 files changed, 20 insertions(+), 18 deletions(-)

-- 
1.7.10.2


* [PATCH 1/4] slab: rename gfpflags to allocflags
  2012-06-20 20:59 [PATCH 0/4] cache-specific changes for memcg (preparation) Glauber Costa
@ 2012-06-20 20:59 ` Glauber Costa
  2012-06-21  7:53   ` David Rientjes
  2012-06-20 20:59 ` [PATCH 2/4] Wipe out CFLGS_OFF_SLAB from flags during initial slab creation Glauber Costa
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 16+ messages in thread
From: Glauber Costa @ 2012-06-20 20:59 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-mm, Cristoph Lameter, David Rientjes, Glauber Costa, Pekka Enberg

A name consistent with slub saves us an accessor function.
In both caches, this field represents the same thing. We would
like to use it from the mem_cgroup code.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
---
 include/linux/slab_def.h |    2 +-
 mm/slab.c                |   10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 1d93f27..0c634fa 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -39,7 +39,7 @@ struct kmem_cache {
 	unsigned int gfporder;
 
 	/* force GFP flags, e.g. GFP_DMA */
-	gfp_t gfpflags;
+	gfp_t allocflags;
 
 	size_t colour;			/* cache colouring range */
 	unsigned int colour_off;	/* colour offset */
diff --git a/mm/slab.c b/mm/slab.c
index dd607a8..bb79652 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1771,7 +1771,7 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 	flags |= __GFP_COMP;
 #endif
 
-	flags |= cachep->gfpflags;
+	flags |= cachep->allocflags;
 	if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
 		flags |= __GFP_RECLAIMABLE;
 
@@ -2482,9 +2482,9 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 	cachep->colour = left_over / cachep->colour_off;
 	cachep->slab_size = slab_size;
 	cachep->flags = flags;
-	cachep->gfpflags = 0;
+	cachep->allocflags = 0;
 	if (CONFIG_ZONE_DMA_FLAG && (flags & SLAB_CACHE_DMA))
-		cachep->gfpflags |= GFP_DMA;
+		cachep->allocflags |= GFP_DMA;
 	cachep->size = size;
 	cachep->reciprocal_buffer_size = reciprocal_value(size);
 
@@ -2831,9 +2831,9 @@ static void kmem_flagcheck(struct kmem_cache *cachep, gfp_t flags)
 {
 	if (CONFIG_ZONE_DMA_FLAG) {
 		if (flags & GFP_DMA)
-			BUG_ON(!(cachep->gfpflags & GFP_DMA));
+			BUG_ON(!(cachep->allocflags & GFP_DMA));
 		else
-			BUG_ON(cachep->gfpflags & GFP_DMA);
+			BUG_ON(cachep->allocflags & GFP_DMA);
 	}
 }
 
-- 
1.7.10.2


* [PATCH 2/4] Wipe out CFLGS_OFF_SLAB from flags during initial slab creation
  2012-06-20 20:59 [PATCH 0/4] cache-specific changes for memcg (preparation) Glauber Costa
  2012-06-20 20:59 ` [PATCH 1/4] slab: rename gfpflags to allocflags Glauber Costa
@ 2012-06-20 20:59 ` Glauber Costa
  2012-06-21  7:59   ` David Rientjes
  2012-06-20 20:59 ` [PATCH 3/4] slab: move FULL state transition to an initcall Glauber Costa
  2012-06-20 20:59 ` [PATCH 4/4] don't do __ClearPageSlab before freeing slab page Glauber Costa
  3 siblings, 1 reply; 16+ messages in thread
From: Glauber Costa @ 2012-06-20 20:59 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-mm, Cristoph Lameter, David Rientjes, Pekka Enberg

CFLGS_OFF_SLAB is not a valid flag to pass to cache creation.
If we are duplicating a cache - support added in a future patch -
we will rely on the flags stored in the cache itself. Those may
include CFLGS_OFF_SLAB.

So it is better to clear this flag at cache creation.

CC: Christoph Lameter <cl@linux.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: David Rientjes <rientjes@google.com>
---
 mm/slab.c |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/mm/slab.c b/mm/slab.c
index bb79652..0d1bd09 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2317,6 +2317,12 @@ kmem_cache_create (const char *name, size_t size, size_t align,
 		BUG_ON(flags & SLAB_POISON);
 #endif
 	/*
+	 * Passing this flag at creation time is invalid, but if we're
+	 * duplicating a slab, it may happen.
+	 */
+	flags &= ~CFLGS_OFF_SLAB;
+
+	/*
 	 * Always checks flags, a caller might be expecting debug support which
 	 * isn't available.
 	 */
-- 
1.7.10.2


* [PATCH 3/4] slab: move FULL state transition to an initcall
  2012-06-20 20:59 [PATCH 0/4] cache-specific changes for memcg (preparation) Glauber Costa
  2012-06-20 20:59 ` [PATCH 1/4] slab: rename gfpflags to allocflags Glauber Costa
  2012-06-20 20:59 ` [PATCH 2/4] Wipe out CFLGS_OFF_SLAB from flags during initial slab creation Glauber Costa
@ 2012-06-20 20:59 ` Glauber Costa
  2012-06-21  8:01   ` David Rientjes
  2012-06-20 20:59 ` [PATCH 4/4] don't do __ClearPageSlab before freeing slab page Glauber Costa
  3 siblings, 1 reply; 16+ messages in thread
From: Glauber Costa @ 2012-06-20 20:59 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-mm, Cristoph Lameter, David Rientjes, Glauber Costa, Pekka Enberg

During kmem_cache_init_late(), we transition to the LATE state,
and after some more work, to the FULL state, its last state.

This is quite different from slub, which only transitions to
its last state (previously SYSFS) in a (late)initcall, after a lot
more of the kernel is ready.

This means that in slab we have no way of taking actions that depend
on the initialization of other pieces of the kernel that are supposed
to start way after kmem_cache_init_late(), such as cgroups
initialization.

To achieve more consistency in this behavior, this patch leaves
g_cpucache_up at LATE at the end of kmem_cache_init_late(). In my
analysis, setup_cpu_cache() should be happy to test for >= LATE
instead of == FULL. It has also passed the tests I've run.

We then only mark the FULL state after the reap timers are in place,
meaning that no further setup is expected.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Acked-by: Christoph Lameter <cl@linux.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: David Rientjes <rientjes@google.com>
---
 mm/slab.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 0d1bd09..cb6da05 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1668,9 +1668,6 @@ void __init kmem_cache_init_late(void)
 			BUG();
 	mutex_unlock(&cache_chain_mutex);
 
-	/* Done! */
-	g_cpucache_up = FULL;
-
 	/*
 	 * Register a cpu startup notifier callback that initializes
 	 * cpu_cache_get for all new cpus
@@ -1700,6 +1697,9 @@ static int __init cpucache_init(void)
 	 */
 	for_each_online_cpu(cpu)
 		start_cpu_timer(cpu);
+
+	/* Done! */
+	g_cpucache_up = FULL;
 	return 0;
 }
 __initcall(cpucache_init);
@@ -2167,7 +2167,7 @@ static size_t calculate_slab_order(struct kmem_cache *cachep,
 
 static int __init_refok setup_cpu_cache(struct kmem_cache *cachep, gfp_t gfp)
 {
-	if (g_cpucache_up == FULL)
+	if (g_cpucache_up >= LATE)
 		return enable_cpucache(cachep, gfp);
 
 	if (g_cpucache_up == NONE) {
-- 
1.7.10.2


* [PATCH 4/4] don't do __ClearPageSlab before freeing slab page.
  2012-06-20 20:59 [PATCH 0/4] cache-specific changes for memcg (preparation) Glauber Costa
                   ` (2 preceding siblings ...)
  2012-06-20 20:59 ` [PATCH 3/4] slab: move FULL state transition to an initcall Glauber Costa
@ 2012-06-20 20:59 ` Glauber Costa
  2012-06-21  8:04   ` David Rientjes
  3 siblings, 1 reply; 16+ messages in thread
From: Glauber Costa @ 2012-06-20 20:59 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-mm, Cristoph Lameter, David Rientjes, Glauber Costa,
	Pekka Enberg, Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner,
	Suleiman Souhlal

This gives the page allocator the opportunity to determine that a
given page was previously a slab page, and to take action
accordingly.

If memcg kmem is present, this means that the page needs to be
unaccounted. The page allocator now has the responsibility to
clear that bit upon free_pages().

It is not uncommon for the page allocator to check page flags. The
mlock flag, for instance, is checked pervasively all over the place.
So I hope this is okay for slab as well.

Signed-off-by: Glauber Costa <glommer@parallels.com>
CC: Pekka Enberg <penberg@cs.helsinki.fi>
CC: Michal Hocko <mhocko@suse.cz>
CC: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Johannes Weiner <hannes@cmpxchg.org>
CC: Suleiman Souhlal <suleiman@google.com>
---
 mm/page_alloc.c |    5 ++++-
 mm/slab.c       |    5 -----
 mm/slob.c       |    1 -
 mm/slub.c       |    1 -
 4 files changed, 4 insertions(+), 8 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6092f33..fdec73e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -698,8 +698,10 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
 
 	if (PageAnon(page))
 		page->mapping = NULL;
-	for (i = 0; i < (1 << order); i++)
+	for (i = 0; i < (1 << order); i++) {
+		__ClearPageSlab(page + i);
 		bad += free_pages_check(page + i);
+	}
 	if (bad)
 		return false;
 
@@ -2561,6 +2563,7 @@ EXPORT_SYMBOL(get_zeroed_page);
 void __free_pages(struct page *page, unsigned int order)
 {
 	if (put_page_testzero(page)) {
+		__ClearPageSlab(page);
 		if (order == 0)
 			free_hot_cold_page(page, 0);
 		else
diff --git a/mm/slab.c b/mm/slab.c
index cb6da05..3e578fc 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1821,11 +1821,6 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
 	else
 		sub_zone_page_state(page_zone(page),
 				NR_SLAB_UNRECLAIMABLE, nr_freed);
-	while (i--) {
-		BUG_ON(!PageSlab(page));
-		__ClearPageSlab(page);
-		page++;
-	}
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += nr_freed;
 	free_pages((unsigned long)addr, cachep->gfporder);
diff --git a/mm/slob.c b/mm/slob.c
index 95d1c7d..48b9a79 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -359,7 +359,6 @@ static void slob_free(void *block, int size)
 		if (slob_page_free(sp))
 			clear_slob_page_free(sp);
 		spin_unlock_irqrestore(&slob_lock, flags);
-		__ClearPageSlab(sp);
 		reset_page_mapcount(sp);
 		slob_free_pages(b, 0);
 		return;
diff --git a/mm/slub.c b/mm/slub.c
index f96d8bc..b0ac04a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1413,7 +1413,6 @@ static void __free_slab(struct kmem_cache *s, struct page *page)
 		NR_SLAB_RECLAIMABLE : NR_SLAB_UNRECLAIMABLE,
 		-pages);
 
-	__ClearPageSlab(page);
 	reset_page_mapcount(page);
 	if (current->reclaim_state)
 		current->reclaim_state->reclaimed_slab += pages;
-- 
1.7.10.2


* Re: [PATCH 1/4] slab: rename gfpflags to allocflags
  2012-06-20 20:59 ` [PATCH 1/4] slab: rename gfpflags to allocflags Glauber Costa
@ 2012-06-21  7:53   ` David Rientjes
  0 siblings, 0 replies; 16+ messages in thread
From: David Rientjes @ 2012-06-21  7:53 UTC (permalink / raw)
  To: Glauber Costa; +Cc: Pekka Enberg, linux-mm, Cristoph Lameter, Pekka Enberg

On Thu, 21 Jun 2012, Glauber Costa wrote:

> A name consistent with slub saves us an accessor function.
> In both caches, this field represents the same thing. We would
> like to use it from the mem_cgroup code.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Acked-by: Christoph Lameter <cl@linux.com>
> CC: Pekka Enberg <penberg@cs.helsinki.fi>

Acked-by: David Rientjes <rientjes@google.com>


* Re: [PATCH 2/4] Wipe out CFLGS_OFF_SLAB from flags during initial slab creation
  2012-06-20 20:59 ` [PATCH 2/4] Wipe out CFLGS_OFF_SLAB from flags during initial slab creation Glauber Costa
@ 2012-06-21  7:59   ` David Rientjes
  0 siblings, 0 replies; 16+ messages in thread
From: David Rientjes @ 2012-06-21  7:59 UTC (permalink / raw)
  To: Glauber Costa; +Cc: Pekka Enberg, linux-mm, Cristoph Lameter, Pekka Enberg

On Thu, 21 Jun 2012, Glauber Costa wrote:

> CFLGS_OFF_SLAB is not a valid flag to pass to cache creation.
> If we are duplicating a cache - support added in a future patch -
> we will rely on the flags stored in the cache itself. Those may
> include CFLGS_OFF_SLAB.
> 
> So it is better to clear this flag at cache creation.
> 

I think this should be folded into the patch that allows cache
duplication; it doesn't make sense to apply right now because
CFLGS_OFF_SLAB is internal to mm/slab.c and nothing is going to be
passing it to kmem_cache_create() yet.

Also, do we care that cache_estimate() uses a formal parameter of type
int for flags when CFLGS_OFF_SLAB is unsigned long?


* Re: [PATCH 3/4] slab: move FULL state transition to an initcall
  2012-06-20 20:59 ` [PATCH 3/4] slab: move FULL state transition to an initcall Glauber Costa
@ 2012-06-21  8:01   ` David Rientjes
  2012-07-02 10:57     ` Pekka Enberg
  0 siblings, 1 reply; 16+ messages in thread
From: David Rientjes @ 2012-06-21  8:01 UTC (permalink / raw)
  To: Glauber Costa; +Cc: Pekka Enberg, linux-mm, Cristoph Lameter, Pekka Enberg

On Thu, 21 Jun 2012, Glauber Costa wrote:

> During kmem_cache_init_late(), we transition to the LATE state,
> and after some more work, to the FULL state, its last state.
> 
> This is quite different from slub, which only transitions to
> its last state (previously SYSFS) in a (late)initcall, after a lot
> more of the kernel is ready.
> 
> This means that in slab we have no way of taking actions that depend
> on the initialization of other pieces of the kernel that are supposed
> to start way after kmem_cache_init_late(), such as cgroups
> initialization.
> 
> To achieve more consistency in this behavior, this patch leaves
> g_cpucache_up at LATE at the end of kmem_cache_init_late(). In my
> analysis, setup_cpu_cache() should be happy to test for >= LATE
> instead of == FULL. It has also passed the tests I've run.
> 
> We then only mark the FULL state after the reap timers are in place,
> meaning that no further setup is expected.
> 
> Signed-off-by: Glauber Costa <glommer@parallels.com>
> Acked-by: Christoph Lameter <cl@linux.com>
> CC: Pekka Enberg <penberg@cs.helsinki.fi>
> CC: David Rientjes <rientjes@google.com>

Acked-by: David Rientjes <rientjes@google.com>

 [ Might want to fix your address book in your email client because 
   Christoph's name is misspelled in the cc list. ]


* Re: [PATCH 4/4] don't do __ClearPageSlab before freeing slab page.
  2012-06-20 20:59 ` [PATCH 4/4] don't do __ClearPageSlab before freeing slab page Glauber Costa
@ 2012-06-21  8:04   ` David Rientjes
  2012-06-21  8:13     ` Glauber Costa
  0 siblings, 1 reply; 16+ messages in thread
From: David Rientjes @ 2012-06-21  8:04 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Pekka Enberg, linux-mm, Cristoph Lameter, Pekka Enberg,
	Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner,
	Suleiman Souhlal

On Thu, 21 Jun 2012, Glauber Costa wrote:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6092f33..fdec73e 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -698,8 +698,10 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
>  
>  	if (PageAnon(page))
>  		page->mapping = NULL;
> -	for (i = 0; i < (1 << order); i++)
> +	for (i = 0; i < (1 << order); i++) {
> +		__ClearPageSlab(page + i);
>  		bad += free_pages_check(page + i);
> +	}
>  	if (bad)
>  		return false;
>  
> @@ -2561,6 +2563,7 @@ EXPORT_SYMBOL(get_zeroed_page);
>  void __free_pages(struct page *page, unsigned int order)
>  {
>  	if (put_page_testzero(page)) {
> +		__ClearPageSlab(page);
>  		if (order == 0)
>  			free_hot_cold_page(page, 0);
>  		else

These are called from a number of different places that have nothing to do 
with slab, so it's certainly out of place here.  Is there really no 
alternative way of doing this?

> diff --git a/mm/slab.c b/mm/slab.c
> index cb6da05..3e578fc 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1821,11 +1821,6 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
>  	else
>  		sub_zone_page_state(page_zone(page),
>  				NR_SLAB_UNRECLAIMABLE, nr_freed);
> -	while (i--) {
> -		BUG_ON(!PageSlab(page));
> -		__ClearPageSlab(page);
> -		page++;
> -	}
>  	if (current->reclaim_state)
>  		current->reclaim_state->reclaimed_slab += nr_freed;
>  	free_pages((unsigned long)addr, cachep->gfporder);

And we lose this validation in slab.

I'm hoping there's an alternative.


* Re: [PATCH 4/4] don't do __ClearPageSlab before freeing slab page.
  2012-06-21  8:04   ` David Rientjes
@ 2012-06-21  8:13     ` Glauber Costa
  2012-06-21 11:04       ` Kamezawa Hiroyuki
  2012-07-03 18:40       ` Christoph Lameter
  0 siblings, 2 replies; 16+ messages in thread
From: Glauber Costa @ 2012-06-21  8:13 UTC (permalink / raw)
  To: David Rientjes
  Cc: Pekka Enberg, linux-mm, Cristoph Lameter, Pekka Enberg,
	Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner,
	Suleiman Souhlal

On 06/21/2012 12:04 PM, David Rientjes wrote:
> On Thu, 21 Jun 2012, Glauber Costa wrote:
>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 6092f33..fdec73e 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -698,8 +698,10 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
>>
>>   	if (PageAnon(page))
>>   		page->mapping = NULL;
>> -	for (i = 0; i < (1 << order); i++)
>> +	for (i = 0; i < (1 << order); i++) {
>> +		__ClearPageSlab(page + i);
>>   		bad += free_pages_check(page + i);
>> +	}
>>   	if (bad)
>>   		return false;
>>
>> @@ -2561,6 +2563,7 @@ EXPORT_SYMBOL(get_zeroed_page);
>>   void __free_pages(struct page *page, unsigned int order)
>>   {
>>   	if (put_page_testzero(page)) {
>> +		__ClearPageSlab(page);
>>   		if (order == 0)
>>   			free_hot_cold_page(page, 0);
>>   		else
>
> These are called from a number of different places that have nothing to do
> with slab, so it's certainly out of place here.  Is there really no
> alternative way of doing this?

Well, if the requirement is that we must handle this from the page 
allocator, how else should I know if I must call the corresponding free 
functions ?

Also note that other bits are tested inside the page allocator as well, 
such as MLock.

I saw no other way, but if you have suggestions, I'd be open to trying 
them, of course.

>> diff --git a/mm/slab.c b/mm/slab.c
>> index cb6da05..3e578fc 100644
>> --- a/mm/slab.c
>> +++ b/mm/slab.c
>> @@ -1821,11 +1821,6 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
>>   	else
>>   		sub_zone_page_state(page_zone(page),
>>   				NR_SLAB_UNRECLAIMABLE, nr_freed);
>> -	while (i--) {
>> -		BUG_ON(!PageSlab(page));
>> -		__ClearPageSlab(page);
>> -		page++;
>> -	}
>>   	if (current->reclaim_state)
>>   		current->reclaim_state->reclaimed_slab += nr_freed;
>>   	free_pages((unsigned long)addr, cachep->gfporder);
>
> And we lose this validation in slab.
>
> I'm hoping there's an alternative.
The validation can stay, if you prefer. We still expect the page 
to be PageSlab at this point, so we can test for it and only clear it later.
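
Something like this (untested sketch, just to illustrate; this is the
loop in kmem_freepages(), keeping the check but letting the page
allocator do the actual clearing of PG_slab):

	while (i--) {
		/* sanity check stays; PG_slab is cleared later by the page allocator */
		BUG_ON(!PageSlab(page));
		page++;
	}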





* Re: [PATCH 4/4] don't do __ClearPageSlab before freeing slab page.
  2012-06-21  8:13     ` Glauber Costa
@ 2012-06-21 11:04       ` Kamezawa Hiroyuki
  2012-06-21 11:14         ` Glauber Costa
  2012-06-21 20:24         ` Glauber Costa
  2012-07-03 18:40       ` Christoph Lameter
  1 sibling, 2 replies; 16+ messages in thread
From: Kamezawa Hiroyuki @ 2012-06-21 11:04 UTC (permalink / raw)
  To: Glauber Costa
  Cc: David Rientjes, Pekka Enberg, linux-mm, Cristoph Lameter,
	Pekka Enberg, Michal Hocko, Johannes Weiner, Suleiman Souhlal

(2012/06/21 17:13), Glauber Costa wrote:
> On 06/21/2012 12:04 PM, David Rientjes wrote:
>> On Thu, 21 Jun 2012, Glauber Costa wrote:
>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 6092f33..fdec73e 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -698,8 +698,10 @@ static bool free_pages_prepare(struct page *page, unsigned int order)
>>>
>>> if (PageAnon(page))
>>> page->mapping = NULL;
>>> - for (i = 0; i < (1 << order); i++)
>>> + for (i = 0; i < (1 << order); i++) {
>>> + __ClearPageSlab(page + i);
>>> bad += free_pages_check(page + i);
>>> + }
>>> if (bad)
>>> return false;
>>>
>>> @@ -2561,6 +2563,7 @@ EXPORT_SYMBOL(get_zeroed_page);
>>> void __free_pages(struct page *page, unsigned int order)
>>> {
>>> if (put_page_testzero(page)) {
>>> + __ClearPageSlab(page);
>>> if (order == 0)
>>> free_hot_cold_page(page, 0);
>>> else
>>
>> These are called from a number of different places that has nothing to do
>> with slab so it's certainly out of place here. Is there really no
>> alternative way of doing this?
>
> Well, if the requirement is that we must handle this from the page allocator, how else should I know if I must call the corresponding free functions ?
>
> Also note that other bits are tested inside the page allocator as well, such as MLock.
>
> I saw no other way, but if you have suggestions, I'd be open to trying them, of course.
>

I'm sorry, I don't understand the logic well enough.

Why is checking in __free_pages() better than checking in the callers in slab.c/slub.c?

Thanks,
-Kame







* Re: [PATCH 4/4] don't do __ClearPageSlab before freeing slab page.
  2012-06-21 11:04       ` Kamezawa Hiroyuki
@ 2012-06-21 11:14         ` Glauber Costa
  2012-06-21 20:24         ` Glauber Costa
  1 sibling, 0 replies; 16+ messages in thread
From: Glauber Costa @ 2012-06-21 11:14 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: David Rientjes, Pekka Enberg, linux-mm, Cristoph Lameter,
	Pekka Enberg, Michal Hocko, Johannes Weiner, Suleiman Souhlal

On 06/21/2012 03:04 PM, Kamezawa Hiroyuki wrote:
> (2012/06/21 17:13), Glauber Costa wrote:
>> On 06/21/2012 12:04 PM, David Rientjes wrote:
>>> On Thu, 21 Jun 2012, Glauber Costa wrote:
>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 6092f33..fdec73e 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -698,8 +698,10 @@ static bool free_pages_prepare(struct page
>>>> *page, unsigned int order)
>>>>
>>>> if (PageAnon(page))
>>>> page->mapping = NULL;
>>>> - for (i = 0; i < (1 << order); i++)
>>>> + for (i = 0; i < (1 << order); i++) {
>>>> + __ClearPageSlab(page + i);
>>>> bad += free_pages_check(page + i);
>>>> + }
>>>> if (bad)
>>>> return false;
>>>>
>>>> @@ -2561,6 +2563,7 @@ EXPORT_SYMBOL(get_zeroed_page);
>>>> void __free_pages(struct page *page, unsigned int order)
>>>> {
>>>> if (put_page_testzero(page)) {
>>>> + __ClearPageSlab(page);
>>>> if (order == 0)
>>>> free_hot_cold_page(page, 0);
>>>> else
>>>
>>> These are called from a number of different places that has nothing
>>> to do
>>> with slab so it's certainly out of place here. Is there really no
>>> alternative way of doing this?
>>
>> Well, if the requirement is that we must handle this from the page
>> allocator, how else should I know if I must call the corresponding
>> free functions ?
>>
>> Also note that other bits are tested inside the page allocator as
>> well, such as MLock.
>>
>> I saw no other way, but if you have suggestions, I'd be open to trying
>> them, of course.
>>
>
> I'm sorry, I don't understand the logic well enough.
> 
> Why is checking in __free_pages() better than checking in the callers in
> slab.c/slub.c?
>

It's less intrusive, because it is independent of the cache being used 
and doesn't modify the cache code. That's one of the things I tried to 
achieve in this last patchset.

If the slab guys think this change in the slab is acceptable, and better 
than doing it in __free_pages, I'd be happy to change back to it. But 
then, I'd like to do the same for the allocation.


* Re: [PATCH 4/4] don't do __ClearPageSlab before freeing slab page.
  2012-06-21 11:04       ` Kamezawa Hiroyuki
  2012-06-21 11:14         ` Glauber Costa
@ 2012-06-21 20:24         ` Glauber Costa
  2012-07-03 18:42           ` Christoph Lameter
  1 sibling, 1 reply; 16+ messages in thread
From: Glauber Costa @ 2012-06-21 20:24 UTC (permalink / raw)
  To: Kamezawa Hiroyuki
  Cc: David Rientjes, Pekka Enberg, linux-mm, Cristoph Lameter,
	Pekka Enberg, Michal Hocko, Johannes Weiner, Suleiman Souhlal

On 06/21/2012 03:04 PM, Kamezawa Hiroyuki wrote:
> (2012/06/21 17:13), Glauber Costa wrote:
>> On 06/21/2012 12:04 PM, David Rientjes wrote:
>>> On Thu, 21 Jun 2012, Glauber Costa wrote:
>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 6092f33..fdec73e 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -698,8 +698,10 @@ static bool free_pages_prepare(struct page
>>>> *page, unsigned int order)
>>>>
>>>> if (PageAnon(page))
>>>> page->mapping = NULL;
>>>> - for (i = 0; i < (1 << order); i++)
>>>> + for (i = 0; i < (1 << order); i++) {
>>>> + __ClearPageSlab(page + i);
>>>> bad += free_pages_check(page + i);
>>>> + }
>>>> if (bad)
>>>> return false;
>>>>
>>>> @@ -2561,6 +2563,7 @@ EXPORT_SYMBOL(get_zeroed_page);
>>>> void __free_pages(struct page *page, unsigned int order)
>>>> {
>>>> if (put_page_testzero(page)) {
>>>> + __ClearPageSlab(page);
>>>> if (order == 0)
>>>> free_hot_cold_page(page, 0);
>>>> else
>>>
>>> These are called from a number of different places that has nothing
>>> to do
>>> with slab so it's certainly out of place here. Is there really no
>>> alternative way of doing this?
>>
>> Well, if the requirement is that we must handle this from the page
>> allocator, how else should I know if I must call the corresponding
>> free functions ?
>>
>> Also note that other bits are tested inside the page allocator as
>> well, such as MLock.
>>
>> I saw no other way, but if you have suggestions, I'd be open to trying
>> them, of course.
>>
>
> I'm sorry, I don't understand the logic well enough.
> 
> Why is checking in __free_pages() better than checking in the callers in
> slab.c/slub.c?
>
> Thanks,
> -Kame
>

How would the slab people feel, especially Christoph, about a simple 
change in the caches, replacing free_pages and alloc_pages with common 
functions that call the memcg counterparts when needed?

It could even be done in a header file, and the change in the slab would 
go only as far as changing names at the call sites. (And maybe 
standardizing, because we have a mixture of free_pages and __free_pages 
around.)

This would possibly render __GFP_SLABMEMC unnecessary, since we'd 
have stable call sites for memcg to derive its context from.
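
Just to illustrate what I mean (the helper names and the memcg hooks
below are made up, not from any posted patch), such wrappers could live
in a common header and look roughly like this:

	/* Untested sketch; helper and hook names are illustrative only. */
	static inline struct page *slab_alloc_pages(struct kmem_cache *s,
						    gfp_t flags, int order)
	{
		struct page *page = alloc_pages(flags, order);

		if (page)
			memcg_account_slab_page(s, page, order); /* hypothetical hook */
		return page;
	}

	static inline void slab_free_pages(struct kmem_cache *s,
					   struct page *page, int order)
	{
		memcg_unaccount_slab_page(s, page, order);	/* hypothetical hook */
		__free_pages(page, order);
	}

The slab allocators would then call these instead of alloc_pages() and
__free_pages() directly, which gives memcg a stable place to hook into.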



* Re: [PATCH 3/4] slab: move FULL state transition to an initcall
  2012-06-21  8:01   ` David Rientjes
@ 2012-07-02 10:57     ` Pekka Enberg
  0 siblings, 0 replies; 16+ messages in thread
From: Pekka Enberg @ 2012-07-02 10:57 UTC (permalink / raw)
  To: David Rientjes; +Cc: Glauber Costa, linux-mm, Cristoph Lameter, Pekka Enberg

On Thu, 21 Jun 2012, David Rientjes wrote:
> > During kmem_cache_init_late(), we transition to the LATE state,
> > and after some more work, to the FULL state, its last state.
> > 
> > This is quite different from slub, which only transitions to
> > its last state (previously SYSFS) in a (late)initcall, after a lot
> > more of the kernel is ready.
> > 
> > This means that in slab we have no way of taking actions that depend
> > on the initialization of other pieces of the kernel that are supposed
> > to start way after kmem_cache_init_late(), such as cgroups
> > initialization.
> > 
> > To achieve more consistency in this behavior, this patch leaves
> > g_cpucache_up at LATE at the end of kmem_cache_init_late(). In my
> > analysis, setup_cpu_cache() should be happy to test for >= LATE
> > instead of == FULL. It has also passed the tests I've run.
> > 
> > We then only mark the FULL state after the reap timers are in place,
> > meaning that no further setup is expected.
> > 
> > Signed-off-by: Glauber Costa <glommer@parallels.com>
> > Acked-by: Christoph Lameter <cl@linux.com>
> > CC: Pekka Enberg <penberg@cs.helsinki.fi>
> > CC: David Rientjes <rientjes@google.com>
> 
> Acked-by: David Rientjes <rientjes@google.com>
> 
>  [ Might want to fix your address book in your email client because 
>    Christoph's name is misspelled in the cc list. ]

Applied, thanks!


* Re: [PATCH 4/4] don't do __ClearPageSlab before freeing slab page.
  2012-06-21  8:13     ` Glauber Costa
  2012-06-21 11:04       ` Kamezawa Hiroyuki
@ 2012-07-03 18:40       ` Christoph Lameter
  1 sibling, 0 replies; 16+ messages in thread
From: Christoph Lameter @ 2012-07-03 18:40 UTC (permalink / raw)
  To: Glauber Costa
  Cc: David Rientjes, Pekka Enberg, linux-mm, Pekka Enberg,
	Michal Hocko, Kamezawa Hiroyuki, Johannes Weiner,
	Suleiman Souhlal

On Thu, 21 Jun 2012, Glauber Costa wrote:

> Well, if the requirement is that we must handle this from the page allocator,
> how else should I know if I must call the corresponding free functions ?

Is there such a requirement? I believe I was talking about a wrapper that
accounts for page allocator requests.

> Also note that other bits are tested inside the page allocator as well, such
> as MLock.

Yeah, we could do this, but doing so requires agreement with the people
involved in page allocator development.


* Re: [PATCH 4/4] don't do __ClearPageSlab before freeing slab page.
  2012-06-21 20:24         ` Glauber Costa
@ 2012-07-03 18:42           ` Christoph Lameter
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Lameter @ 2012-07-03 18:42 UTC (permalink / raw)
  To: Glauber Costa
  Cc: Kamezawa Hiroyuki, David Rientjes, Pekka Enberg, linux-mm,
	Pekka Enberg, Michal Hocko, Johannes Weiner, Suleiman Souhlal

On Fri, 22 Jun 2012, Glauber Costa wrote:

> How would the slab people feel, especially Christoph, about a simple
> change in the caches, replacing free_pages and alloc_pages with common
> functions that call the memcg counterparts when needed?

I believe that is one of the options that I proposed earlier.

> This would possibly render __GFP_SLABMEMC unnecessary, since we'd
> have stable call sites for memcg to derive its context from.

Right.

