linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/4] Small fixes for swap-over-network
@ 2012-09-04 17:24 Mel Gorman
  2012-09-04 17:24 ` [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab Mel Gorman
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Mel Gorman @ 2012-09-04 17:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Chuck Lever,
	Joonsoo Kim, Pekka, Enberg <penberg, David Rientjes,
	Mel Gorman

This series consists of 4 small patches posted by Joonsoo Kim and Chuck
Lever with some minor changes applied. The problems they fix are not
critical, but they should be addressed before 3.6 comes out. I've picked
the patches up and reposted them to make sure they do not get lost.

Ordinarily I would say that 1-3 should go through Pekka's slab tree and
the last patch through David Miller's linux-net tree but as they are fairly
minor maybe it would be easier if all 4 went through Andrew's tree at the
same time.

The patches have been tested against 3.6-rc4 and passed the swap-over-NFS
and swap-over-NBD tests.

 include/net/sock.h |    2 +-
 mm/slab.c          |    6 +++---
 mm/slub.c          |   15 ++++++++++-----
 3 files changed, 14 insertions(+), 9 deletions(-)

-- 
1.7.9.2


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab
  2012-09-04 17:24 [PATCH 0/4] Small fixes for swap-over-network Mel Gorman
@ 2012-09-04 17:24 ` Mel Gorman
  2012-09-06 17:57   ` JoonSoo Kim
  2012-09-04 17:24 ` [PATCH 2/4] slab: fix starting index for finding another object Mel Gorman
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2012-09-04 17:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Chuck Lever,
	Joonsoo Kim, Pekka, Enberg <penberg, David Rientjes,
	Mel Gorman

Right now, we call ClearSlabPfmemalloc() only for the first page of a slab
when we clear the SlabPfmemalloc flag. This is fine for most
swap-over-network use cases as it is expected that order-0 pages are in
use. Unfortunately it is possible that __ac_put_obj() checks
SlabPfmemalloc on a tail page and while this is harmless, it is sloppy.
This patch ensures that the head page is always used.

This problem was originally identified by Joonsoo Kim.
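
For illustration, a minimal sketch of the pattern this patch moves to;
the helper name is made up for the example:

  /* Sketch only: the Pfmemalloc state is tracked on the slab's head
   * page, so the check must resolve objp back to that page rather
   * than trusting whichever page the address happens to fall into. */
  static bool obj_on_pfmemalloc_slab(const void *objp)
  {
          struct page *page = virt_to_head_page(objp); /* not virt_to_page() */

          return PageSlabPfmemalloc(page);
  }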

[js1304@gmail.com: Original implementation and problem identification]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/slab.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 811af03..d34a903 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1000,7 +1000,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		l3 = cachep->nodelists[numa_mem_id()];
 		if (!list_empty(&l3->slabs_free) && force_refill) {
 			struct slab *slabp = virt_to_slab(objp);
-			ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
+			ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
 			clear_obj_pfmemalloc(&objp);
 			recheck_pfmemalloc_active(cachep, ac);
 			return objp;
@@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 {
 	if (unlikely(pfmemalloc_active)) {
 		/* Some pfmemalloc slabs exist, check if this is one */
-		struct page *page = virt_to_page(objp);
+		struct page *page = virt_to_head_page(objp);
 		if (PageSlabPfmemalloc(page))
 			set_obj_pfmemalloc(&objp);
 	}
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/4] slab: fix starting index for finding another object
  2012-09-04 17:24 [PATCH 0/4] Small fixes for swap-over-network Mel Gorman
  2012-09-04 17:24 ` [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab Mel Gorman
@ 2012-09-04 17:24 ` Mel Gorman
  2012-09-04 17:24 ` [PATCH 3/4] slub: consider pfmemalloc_match() in get_partial_node() Mel Gorman
  2012-09-04 17:24 ` [PATCH 4/4] Squelch compiler warning in sk_rmem_schedule() Mel Gorman
  3 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2012-09-04 17:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Chuck Lever,
	Joonsoo Kim, Pekka, Enberg <penberg, David Rientjes,
	Mel Gorman

From: Joonsoo Kim <js1304@gmail.com>

In the array cache, there is an object at index 0 as well, so the
search for a !PFMEMALLOC replacement must start at index 0, not 1.
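
A minimal sketch of the corrected scan, with the locals assumed from
the surrounding __ac_get_obj() context and the swap step paraphrased:

  /* entry[0] is a valid object too, so the search for a !PFMEMALLOC
   * replacement must start at index 0, not index 1. */
  for (i = 0; i < ac->avail; i++) {
          if (!is_obj_pfmemalloc(ac->entry[i])) {
                  objp = ac->entry[i];                 /* take the clean object */
                  ac->entry[i] = ac->entry[ac->avail]; /* keep the pfmemalloc one */
                  ac->entry[ac->avail] = objp;
                  break;
          }
  }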

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/slab.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/slab.c b/mm/slab.c
index d34a903..c685475 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -983,7 +983,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		}
 
 		/* The caller cannot use PFMEMALLOC objects, find another one */
-		for (i = 1; i < ac->avail; i++) {
+		for (i = 0; i < ac->avail; i++) {
 			/* If a !PFMEMALLOC object is found, swap them */
 			if (!is_obj_pfmemalloc(ac->entry[i])) {
 				objp = ac->entry[i];
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/4] slub: consider pfmemalloc_match() in get_partial_node()
  2012-09-04 17:24 [PATCH 0/4] Small fixes for swap-over-network Mel Gorman
  2012-09-04 17:24 ` [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab Mel Gorman
  2012-09-04 17:24 ` [PATCH 2/4] slab: fix starting index for finding another object Mel Gorman
@ 2012-09-04 17:24 ` Mel Gorman
  2012-09-04 17:24 ` [PATCH 4/4] Squelch compiler warning in sk_rmem_schedule() Mel Gorman
  3 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2012-09-04 17:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Chuck Lever,
	Joonsoo Kim, Pekka, Enberg <penberg, David Rientjes,
	Mel Gorman

From: Joonsoo Kim <js1304@gmail.com>

The function get_partial() is currently not checking pfmemalloc_match()
meaning that it is possible for pfmemalloc pages to leak to non-pfmemalloc
users. This is a problem in the following situation.  Assume that there is
a request from normal allocation and there are no objects in the per-cpu
cache and no node-partial slab.

In this case, slab_alloc() enters the slow path and new_slab_objects()
is called, which may return a PFMEMALLOC page. As the current user is not
allowed to access the PFMEMALLOC page, deactivate_slab() is called
([5091b74a: mm: slub: optimise the SLUB fast path to avoid pfmemalloc
checks]) and an object from the PFMEMALLOC page is still returned.

Next time, when we get another request from normal allocation, slab_alloc()
enters the slow path and calls new_slab_objects().  In new_slab_objects(),
we call get_partial() and get a partial slab which was just deactivated
but is a pfmemalloc page. We extract one object from it and re-deactivate.

"deactivate -> re-get in get_partial -> re-deactivate" occures repeatedly.

As a result, access to the PFMEMALLOC page is not properly restricted and
it can cause a performance degradation due to frequent deactivation.

This patch changes get_partial_node() to take pfmemalloc_match() into
account and prevents the "deactivate -> re-get in get_partial()" scenario.
Instead, new_slab() is called.
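
For reference, the predicate consulted here looks roughly like the
following in slub.c of this era (paraphrased, not part of the diff):

  static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags)
  {
          /* A slab taken from the pfmemalloc reserves may only serve
           * callers that are themselves allowed to use the reserves. */
          if (unlikely(PageSlabPfmemalloc(page)))
                  return gfp_pfmemalloc_allowed(gfpflags);

          return true;
  }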

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/slub.c |   15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 8f78e25..2fdd96f9e9 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1524,12 +1524,13 @@ static inline void *acquire_slab(struct kmem_cache *s,
 }
 
 static int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain);
+static inline bool pfmemalloc_match(struct page *page, gfp_t gfpflags);
 
 /*
  * Try to allocate a partial slab from a specific node.
  */
-static void *get_partial_node(struct kmem_cache *s,
-		struct kmem_cache_node *n, struct kmem_cache_cpu *c)
+static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n,
+				struct kmem_cache_cpu *c, gfp_t flags)
 {
 	struct page *page, *page2;
 	void *object = NULL;
@@ -1545,9 +1546,13 @@ static void *get_partial_node(struct kmem_cache *s,
 
 	spin_lock(&n->list_lock);
 	list_for_each_entry_safe(page, page2, &n->partial, lru) {
-		void *t = acquire_slab(s, n, page, object == NULL);
+		void *t;
 		int available;
 
+		if (!pfmemalloc_match(page, flags))
+			continue;
+
+		t = acquire_slab(s, n, page, object == NULL);
 		if (!t)
 			break;
 
@@ -1614,7 +1619,7 @@ static void *get_any_partial(struct kmem_cache *s, gfp_t flags,
 
 			if (n && cpuset_zone_allowed_hardwall(zone, flags) &&
 					n->nr_partial > s->min_partial) {
-				object = get_partial_node(s, n, c);
+				object = get_partial_node(s, n, c, flags);
 				if (object) {
 					/*
 					 * Return the object even if
@@ -1643,7 +1648,7 @@ static void *get_partial(struct kmem_cache *s, gfp_t flags, int node,
 	void *object;
 	int searchnode = (node == NUMA_NO_NODE) ? numa_node_id() : node;
 
-	object = get_partial_node(s, get_node(s, searchnode), c);
+	object = get_partial_node(s, get_node(s, searchnode), c, flags);
 	if (object || node != NUMA_NO_NODE)
 		return object;
 
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 4/4] Squelch compiler warning in sk_rmem_schedule()
  2012-09-04 17:24 [PATCH 0/4] Small fixes for swap-over-network Mel Gorman
                   ` (2 preceding siblings ...)
  2012-09-04 17:24 ` [PATCH 3/4] slub: consider pfmemalloc_match() in get_partial_node() Mel Gorman
@ 2012-09-04 17:24 ` Mel Gorman
  3 siblings, 0 replies; 9+ messages in thread
From: Mel Gorman @ 2012-09-04 17:24 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux-MM, Linux-Netdev, LKML, David Miller, Chuck Lever,
	Joonsoo Kim, Pekka, Enberg <penberg, David Rientjes,
	Mel Gorman

From: Chuck Lever <chuck.lever@oracle.com>

In file included from linux/include/linux/tcp.h:227:0,
                 from linux/include/linux/ipv6.h:221,
                 from linux/include/net/ipv6.h:16,
                 from linux/include/linux/sunrpc/clnt.h:26,
                 from linux/net/sunrpc/stats.c:22:
linux/include/net/sock.h: In function ‘sk_rmem_schedule’:
linux/nfs-2.6/include/net/sock.h:1339:13: warning: comparison between
  signed and unsigned integer expressions [-Wsign-compare]

Seen with gcc (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) using the
-Wextra option.

[c76562b6: netvm: prevent a stream-specific deadlock] accidentally replaced
the "size" parameter of sk_rmem_schedule() with an unsigned int. This
changes the semantics of the comparison in the return statement.

In sk_wmem_schedule we have syntactically the same comparison, but
"size" is a signed integer.  In addition, __sk_mem_schedule() takes
a signed integer for its "size" parameter, so there is an implicit
type conversion in sk_rmem_schedule() anyway.

Revert the "size" parameter back to a signed integer so that the
semantics of the expressions in both sk_[rw]mem_schedule() are
exactly the same.
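
The hazard is the usual arithmetic conversion: when a signed and an
unsigned int meet in a comparison, the signed operand is converted to
unsigned. A standalone userspace demo with made-up values (not kernel
code; sk_forward_alloc itself is a signed int):

  #include <stdio.h>

  int main(void)
  {
          int forward_alloc = -4096;  /* stand-in for sk->sk_forward_alloc */
          unsigned int usize = 100;   /* the accidental unsigned parameter */
          int ssize = 100;            /* the reverted, signed parameter */

          /* forward_alloc is converted to a huge unsigned value: prints 1 */
          printf("unsigned compare: %d\n", usize <= forward_alloc);
          /* both operands stay signed, the intended result: prints 0 */
          printf("signed compare:   %d\n", ssize <= forward_alloc);
          return 0;
  }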

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 include/net/sock.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 72132ae..adb7da2 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1332,7 +1332,7 @@ static inline bool sk_wmem_schedule(struct sock *sk, int size)
 }
 
 static inline bool
-sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, unsigned int size)
+sk_rmem_schedule(struct sock *sk, struct sk_buff *skb, int size)
 {
 	if (!sk_has_account(sk))
 		return true;
-- 
1.7.9.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab
  2012-09-04 17:24 ` [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab Mel Gorman
@ 2012-09-06 17:57   ` JoonSoo Kim
  2012-09-06 18:05     ` JoonSoo Kim
  0 siblings, 1 reply; 9+ messages in thread
From: JoonSoo Kim @ 2012-09-06 17:57 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, LKML, David Miller,
	Chuck Lever, Pekka, Enberg <penberg, David Rientjes,
	Christoph Lameter

Add "Cc" to "Christoph Lameter" <cl@linux.com>

2012/9/5 Mel Gorman <mgorman@suse.de>:
> Right now, we call ClearSlabPfmemalloc() only for the first page of a slab
> when we clear the SlabPfmemalloc flag. This is fine for most
> swap-over-network use cases as it is expected that order-0 pages are in
> use. Unfortunately it is possible that __ac_put_obj() checks
> SlabPfmemalloc on a tail page and while this is harmless, it is sloppy.
> This patch ensures that the head page is always used.
>
> This problem was originally identified by Joonsoo Kim.
>
> [js1304@gmail.com: Original implementation and problem identification]
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/slab.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 811af03..d34a903 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1000,7 +1000,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
>                 l3 = cachep->nodelists[numa_mem_id()];
>                 if (!list_empty(&l3->slabs_free) && force_refill) {
>                         struct slab *slabp = virt_to_slab(objp);
> -                       ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
> +                       ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
>                         clear_obj_pfmemalloc(&objp);
>                         recheck_pfmemalloc_active(cachep, ac);
>                         return objp;

We assume that slabp->s_mem's address is always in the head page, so
"virt_to_head_page" is not needed.

> @@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
>  {
>         if (unlikely(pfmemalloc_active)) {
>                 /* Some pfmemalloc slabs exist, check if this is one */
> -               struct page *page = virt_to_page(objp);
> +               struct page *page = virt_to_head_page(objp);
>                 if (PageSlabPfmemalloc(page))
>                         set_obj_pfmemalloc(&objp);
>         }
> --
> 1.7.9.2
>

If we always use the head page, the following suggestion looks better to me.
What do you think?

diff --git a/mm/slab.c b/mm/slab.c
index f8b0d53..ce70989 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 {
        if (unlikely(pfmemalloc_active)) {
                /* Some pfmemalloc slabs exist, check if this is one */
-               struct page *page = virt_to_page(objp);
+               struct page *page = virt_to_head_page(objp);
                if (PageSlabPfmemalloc(page))
                        set_obj_pfmemalloc(&objp);
        }
@@ -1921,10 +1921,9 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
                        NR_SLAB_UNRECLAIMABLE, nr_pages);
        for (i = 0; i < nr_pages; i++) {
                __SetPageSlab(page + i);
-
-               if (page->pfmemalloc)
-                       SetPageSlabPfmemalloc(page + i);
        }
+       if (page->pfmemalloc)
+               SetPageSlabPfmemalloc(page);

        if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
                kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);
@@ -1943,26 +1942,26 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
  */
 static void kmem_freepages(struct kmem_cache *cachep, void *addr)
 {
-       unsigned long i = (1 << cachep->gfporder);
+       int nr_pages = (1 << cachep->gfporder);
+       int i;
        struct page *page = virt_to_page(addr);
-       const unsigned long nr_freed = i;

        kmemcheck_free_shadow(page, cachep->gfporder);

        if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
                sub_zone_page_state(page_zone(page),
-                               NR_SLAB_RECLAIMABLE, nr_freed);
+                               NR_SLAB_RECLAIMABLE, nr_pages);
        else
                sub_zone_page_state(page_zone(page),
-                               NR_SLAB_UNRECLAIMABLE, nr_freed);
-       while (i--) {
-               BUG_ON(!PageSlab(page));
-               __ClearPageSlabPfmemalloc(page);
-               __ClearPageSlab(page);
-               page++;
+                               NR_SLAB_UNRECLAIMABLE, nr_pages);
+       for (i = 0; i < nr_pages; i++) {
+               BUG_ON(!PageSlab(page + i));
+               __ClearPageSlab(page + i);
        }
+       __ClearPageSlabPfmemalloc(page);
+
        if (current->reclaim_state)
-               current->reclaim_state->reclaimed_slab += nr_freed;
+               current->reclaim_state->reclaimed_slab += nr_pages;
        free_pages((unsigned long)addr, cachep->gfporder);
 }

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab
  2012-09-06 17:57   ` JoonSoo Kim
@ 2012-09-06 18:05     ` JoonSoo Kim
  2012-09-07 12:55       ` Mel Gorman
  0 siblings, 1 reply; 9+ messages in thread
From: JoonSoo Kim @ 2012-09-06 18:05 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, LKML, David Miller,
	Chuck Lever, Pekka Enberg, David Rientjes, Christoph Lameter

Correct Pekka's mail address and resend.
Sorry.

Add "Cc" to "Christoph Lameter" <cl@linux.com>

2012/9/5 Mel Gorman <mgorman@suse.de>:
> Right now, we call ClearSlabPfmemalloc() only for the first page of a slab
> when we clear the SlabPfmemalloc flag. This is fine for most
> swap-over-network use cases as it is expected that order-0 pages are in
> use. Unfortunately it is possible that __ac_put_obj() checks
> SlabPfmemalloc on a tail page and while this is harmless, it is sloppy.
> This patch ensures that the head page is always used.
>
> This problem was originally identified by Joonsoo Kim.
>
> [js1304@gmail.com: Original implementation and problem identification]
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/slab.c |    4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 811af03..d34a903 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1000,7 +1000,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
>                 l3 = cachep->nodelists[numa_mem_id()];
>                 if (!list_empty(&l3->slabs_free) && force_refill) {
>                         struct slab *slabp = virt_to_slab(objp);
> -                       ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
> +                       ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
>                         clear_obj_pfmemalloc(&objp);
>                         recheck_pfmemalloc_active(cachep, ac);
>                         return objp;

We assume that slabp->s_mem's address is always in the head page, so
"virt_to_head_page" is not needed.

> @@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
>  {
>         if (unlikely(pfmemalloc_active)) {
>                 /* Some pfmemalloc slabs exist, check if this is one */
> -               struct page *page = virt_to_page(objp);
> +               struct page *page = virt_to_head_page(objp);
>                 if (PageSlabPfmemalloc(page))
>                         set_obj_pfmemalloc(&objp);
>         }
> --
> 1.7.9.2
>

If we always use the head page, the following suggestion looks better to me.
What do you think?

diff --git a/mm/slab.c b/mm/slab.c
index f8b0d53..ce70989 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 {
        if (unlikely(pfmemalloc_active)) {
                /* Some pfmemalloc slabs exist, check if this is one */
-               struct page *page = virt_to_page(objp);
+               struct page *page = virt_to_head_page(objp);
                if (PageSlabPfmemalloc(page))
                        set_obj_pfmemalloc(&objp);
        }
@@ -1921,10 +1921,9 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
                        NR_SLAB_UNRECLAIMABLE, nr_pages);
        for (i = 0; i < nr_pages; i++) {
                __SetPageSlab(page + i);
-
-               if (page->pfmemalloc)
-                       SetPageSlabPfmemalloc(page + i);
        }
+       if (page->pfmemalloc)
+               SetPageSlabPfmemalloc(page);

        if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
                kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);
@@ -1943,26 +1942,26 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
  */
 static void kmem_freepages(struct kmem_cache *cachep, void *addr)
 {
-       unsigned long i = (1 << cachep->gfporder);
+       int nr_pages = (1 << cachep->gfporder);
+       int i;
        struct page *page = virt_to_page(addr);
-       const unsigned long nr_freed = i;

        kmemcheck_free_shadow(page, cachep->gfporder);

        if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
                sub_zone_page_state(page_zone(page),
-                               NR_SLAB_RECLAIMABLE, nr_freed);
+                               NR_SLAB_RECLAIMABLE, nr_pages);
        else
                sub_zone_page_state(page_zone(page),
-                               NR_SLAB_UNRECLAIMABLE, nr_freed);
-       while (i--) {
-               BUG_ON(!PageSlab(page));
-               __ClearPageSlabPfmemalloc(page);
-               __ClearPageSlab(page);
-               page++;
+                               NR_SLAB_UNRECLAIMABLE, nr_pages);
+       for (i = 0; i < nr_pages; i++) {
+               BUG_ON(!PageSlab(page + i));
+               __ClearPageSlab(page + i);
        }
+       __ClearPageSlabPfmemalloc(page);
+
        if (current->reclaim_state)
-               current->reclaim_state->reclaimed_slab += nr_freed;
+               current->reclaim_state->reclaimed_slab += nr_pages;
        free_pages((unsigned long)addr, cachep->gfporder);
 }

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab
  2012-09-06 18:05     ` JoonSoo Kim
@ 2012-09-07 12:55       ` Mel Gorman
  2012-09-07 21:10         ` JoonSoo Kim
  0 siblings, 1 reply; 9+ messages in thread
From: Mel Gorman @ 2012-09-07 12:55 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, LKML, David Miller,
	Chuck Lever, Pekka Enberg, David Rientjes, Christoph Lameter

On Fri, Sep 07, 2012 at 03:05:39AM +0900, JoonSoo Kim wrote:
> Correct Pekka's mail address and resend.
> Sorry.
> 
> Add "Cc" to "Christoph Lameter" <cl@linux.com>
> 
> 2012/9/5 Mel Gorman <mgorman@suse.de>:
> > Right now, we call ClearSlabPfmemalloc() only for the first page of a slab
> > when we clear the SlabPfmemalloc flag. This is fine for most
> > swap-over-network use cases as it is expected that order-0 pages are in
> > use. Unfortunately it is possible that __ac_put_obj() checks
> > SlabPfmemalloc on a tail page and while this is harmless, it is sloppy.
> > This patch ensures that the head page is always used.
> >
> > This problem was originally identified by Joonsoo Kim.
> >
> > [js1304@gmail.com: Original implementation and problem identification]
> > Signed-off-by: Mel Gorman <mgorman@suse.de>
> > ---
> >  mm/slab.c |    4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 811af03..d34a903 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -1000,7 +1000,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
> >                 l3 = cachep->nodelists[numa_mem_id()];
> >                 if (!list_empty(&l3->slabs_free) && force_refill) {
> >                         struct slab *slabp = virt_to_slab(objp);
> > -                       ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
> > +                       ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
> >                         clear_obj_pfmemalloc(&objp);
> >                         recheck_pfmemalloc_active(cachep, ac);
> >                         return objp;
> 
> We assume that slabp->s_mem's address is always in the head page, so
> "virt_to_head_page" is not needed.
> 

Fair point. I thought it would be more "obvious" later that we really
always intended to use the head page but it is unnecessary.

> > @@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
> >  {
> >         if (unlikely(pfmemalloc_active)) {
> >                 /* Some pfmemalloc slabs exist, check if this is one */
> > -               struct page *page = virt_to_page(objp);
> > +               struct page *page = virt_to_head_page(objp);
> >                 if (PageSlabPfmemalloc(page))
> >                         set_obj_pfmemalloc(&objp);
> >         }
> > --
> > 1.7.9.2
> >
> 
> If we always use the head page, the following suggestion looks better to me.
> What do you think?
> 
> diff --git a/mm/slab.c b/mm/slab.c
> index f8b0d53..ce70989 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
>  {
>         if (unlikely(pfmemalloc_active)) {
>                 /* Some pfmemalloc slabs exist, check if this is one */
> -               struct page *page = virt_to_page(objp);
> +               struct page *page = virt_to_head_page(objp);
>                 if (PageSlabPfmemalloc(page))
>                         set_obj_pfmemalloc(&objp);
>         }

ok.

> @@ -1921,10 +1921,9 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
>                         NR_SLAB_UNRECLAIMABLE, nr_pages);
>         for (i = 0; i < nr_pages; i++) {
>                 __SetPageSlab(page + i);
> -
> -               if (page->pfmemalloc)
> -                       SetPageSlabPfmemalloc(page + i);
>         }
> +       if (page->pfmemalloc)
> +               SetPageSlabPfmemalloc(page);
> 
>         if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
>                 kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);

ok.

> @@ -1943,26 +1942,26 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
>   */
>  static void kmem_freepages(struct kmem_cache *cachep, void *addr)
>  {
> -       unsigned long i = (1 << cachep->gfporder);
> +       int nr_pages = (1 << cachep->gfporder);
> +       int i;
>         struct page *page = virt_to_page(addr);
> -       const unsigned long nr_freed = i;
> 
>         kmemcheck_free_shadow(page, cachep->gfporder);
> 
>         if (cachep->flags & SLAB_RECLAIM_ACCOUNT)
>                 sub_zone_page_state(page_zone(page),
> -                               NR_SLAB_RECLAIMABLE, nr_freed);
> +                               NR_SLAB_RECLAIMABLE, nr_pages);
>         else
>                 sub_zone_page_state(page_zone(page),
> -                               NR_SLAB_UNRECLAIMABLE, nr_freed);
> -       while (i--) {
> -               BUG_ON(!PageSlab(page));
> -               __ClearPageSlabPfmemalloc(page);
> -               __ClearPageSlab(page);
> -               page++;
> +                               NR_SLAB_UNRECLAIMABLE, nr_pages);
> +       for (i = 0; i < nr_pages; i++) {
> +               BUG_ON(!PageSlab(page + i));
> +               __ClearPageSlab(page + i);
>         }
> +       __ClearPageSlabPfmemalloc(page);
> +
>         if (current->reclaim_state)
> -               current->reclaim_state->reclaimed_slab += nr_freed;
> +               current->reclaim_state->reclaimed_slab += nr_pages;
>         free_pages((unsigned long)addr, cachep->gfporder);
>  }

This churns code a lot more than is necessary. How about this as a
replacement patch?

---8<---
From: Joonsoo Kim <js1304@gmail.com>
Subject: [PATCH] slab: do ClearSlabPfmemalloc() for all pages of slab

Right now, we call ClearSlabPfmemalloc() only for the first page of a slab
when we clear the SlabPfmemalloc flag. This is fine for most
swap-over-network use cases as it is expected that order-0 pages are in
use. Unfortunately it is possible that __ac_put_obj() checks
SlabPfmemalloc on a tail page and while this is harmless, it is sloppy.
This patch ensures that the head page is always used.

[mgorman@suse.de: Easier implementation, changelog cleanup]
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/slab.c |   12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index 811af03..590d52a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 {
 	if (unlikely(pfmemalloc_active)) {
 		/* Some pfmemalloc slabs exist, check if this is one */
-		struct page *page = virt_to_page(objp);
+		struct page *page = virt_to_head_page(objp);
 		if (PageSlabPfmemalloc(page))
 			set_obj_pfmemalloc(&objp);
 	}
@@ -1919,12 +1919,10 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
 	else
 		add_zone_page_state(page_zone(page),
 			NR_SLAB_UNRECLAIMABLE, nr_pages);
-	for (i = 0; i < nr_pages; i++) {
+	for (i = 0; i < nr_pages; i++)
 		__SetPageSlab(page + i);
-
-		if (page->pfmemalloc)
-			SetPageSlabPfmemalloc(page + i);
-	}
+	if (page->pfmemalloc)
+		SetPageSlabPfmemalloc(page);
 
 	if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
 		kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);
@@ -1955,9 +1953,9 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
 	else
 		sub_zone_page_state(page_zone(page),
 				NR_SLAB_UNRECLAIMABLE, nr_freed);
+	__ClearPageSlabPfmemalloc(page);
 	while (i--) {
 		BUG_ON(!PageSlab(page));
-		__ClearPageSlabPfmemalloc(page);
 		__ClearPageSlab(page);
 		page++;
 	}

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab
  2012-09-07 12:55       ` Mel Gorman
@ 2012-09-07 21:10         ` JoonSoo Kim
  0 siblings, 0 replies; 9+ messages in thread
From: JoonSoo Kim @ 2012-09-07 21:10 UTC (permalink / raw)
  To: Mel Gorman
  Cc: Andrew Morton, Linux-MM, Linux-Netdev, LKML, David Miller,
	Chuck Lever, Pekka Enberg, David Rientjes, Christoph Lameter

2012/9/7 Mel Gorman <mgorman@suse.de>:
> This churns code a lot more than is necessary. How about this as a
> replacement patch?
>
> ---8<---
> From: Joonsoo Kim <js1304@gmail.com>
> Subject: [PATCH] slab: do ClearSlabPfmemalloc() for all pages of slab
>
> Right now, we call ClearSlabPfmemalloc() only for the first page of a slab
> when we clear the SlabPfmemalloc flag. This is fine for most
> swap-over-network use cases as it is expected that order-0 pages are in
> use. Unfortunately it is possible that __ac_put_obj() checks
> SlabPfmemalloc on a tail page and while this is harmless, it is sloppy.
> This patch ensures that the head page is always used.
>
> [mgorman@suse.de: Easier implementation, changelog cleanup]
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> ---
>  mm/slab.c |   12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 811af03..590d52a 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
>  {
>         if (unlikely(pfmemalloc_active)) {
>                 /* Some pfmemalloc slabs exist, check if this is one */
> -               struct page *page = virt_to_page(objp);
> +               struct page *page = virt_to_head_page(objp);
>                 if (PageSlabPfmemalloc(page))
>                         set_obj_pfmemalloc(&objp);
>         }
> @@ -1919,12 +1919,10 @@ static void *kmem_getpages(struct kmem_cache *cachep, gfp_t flags, int nodeid)
>         else
>                 add_zone_page_state(page_zone(page),
>                         NR_SLAB_UNRECLAIMABLE, nr_pages);
> -       for (i = 0; i < nr_pages; i++) {
> +       for (i = 0; i < nr_pages; i++)
>                 __SetPageSlab(page + i);
> -
> -               if (page->pfmemalloc)
> -                       SetPageSlabPfmemalloc(page + i);
> -       }
> +       if (page->pfmemalloc)
> +               SetPageSlabPfmemalloc(page);
>
>         if (kmemcheck_enabled && !(cachep->flags & SLAB_NOTRACK)) {
>                 kmemcheck_alloc_shadow(page, cachep->gfporder, flags, nodeid);
> @@ -1955,9 +1953,9 @@ static void kmem_freepages(struct kmem_cache *cachep, void *addr)
>         else
>                 sub_zone_page_state(page_zone(page),
>                                 NR_SLAB_UNRECLAIMABLE, nr_freed);
> +       __ClearPageSlabPfmemalloc(page);
>         while (i--) {
>                 BUG_ON(!PageSlab(page));
> -               __ClearPageSlabPfmemalloc(page);
>                 __ClearPageSlab(page);
>                 page++;
>         }

Okay.

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2012-09-07 21:10 UTC | newest]

Thread overview: 9+ messages
2012-09-04 17:24 [PATCH 0/4] Small fixes for swap-over-network Mel Gorman
2012-09-04 17:24 ` [PATCH 1/4] slab: do ClearSlabPfmemalloc() for all pages of slab Mel Gorman
2012-09-06 17:57   ` JoonSoo Kim
2012-09-06 18:05     ` JoonSoo Kim
2012-09-07 12:55       ` Mel Gorman
2012-09-07 21:10         ` JoonSoo Kim
2012-09-04 17:24 ` [PATCH 2/4] slab: fix starting index for finding another object Mel Gorman
2012-09-04 17:24 ` [PATCH 3/4] slub: consider pfmemalloc_match() in get_partial_node() Mel Gorman
2012-09-04 17:24 ` [PATCH 4/4] Squelch compiler warning in sk_rmem_schedule() Mel Gorman
