From: Mel Gorman <mgorman@suse.de>
To: David Rientjes <rientjes@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Linux-MM <linux-mm@kvack.org>,
	Linux-Netdev <netdev@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	David Miller <davem@davemloft.net>, Neil Brown <neilb@suse.de>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	Mike Christie <michaelc@cs.wisc.edu>,
	Eric B Munson <emunson@mgebm.net>
Subject: Re: [PATCH 02/16] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages
Date: Wed, 25 Apr 2012 16:05:56 +0100	[thread overview]
Message-ID: <20120425150556.GC15299@suse.de> (raw)
In-Reply-To: <alpine.DEB.2.00.1204231637390.17030@chino.kir.corp.google.com>

On Mon, Apr 23, 2012 at 04:51:02PM -0700, David Rientjes wrote:
> On Mon, 16 Apr 2012, Mel Gorman wrote:
> 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 280eabe..0fa2c72 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1463,6 +1463,7 @@ failed:
> >  #define ALLOC_HARDER		0x10 /* try to alloc harder */
> >  #define ALLOC_HIGH		0x20 /* __GFP_HIGH set */
> >  #define ALLOC_CPUSET		0x40 /* check for correct cpuset */
> > +#define ALLOC_PFMEMALLOC	0x80 /* Caller has PF_MEMALLOC set */
> >  
> >  #ifdef CONFIG_FAIL_PAGE_ALLOC
> >  
> > @@ -2208,16 +2209,22 @@ gfp_to_alloc_flags(gfp_t gfp_mask)
> >  	} else if (unlikely(rt_task(current)) && !in_interrupt())
> >  		alloc_flags |= ALLOC_HARDER;
> >  
> > -	if (likely(!(gfp_mask & __GFP_NOMEMALLOC))) {
> > -		if (!in_interrupt() &&
> > -		    ((current->flags & PF_MEMALLOC) ||
> > -		     unlikely(test_thread_flag(TIF_MEMDIE))))
> > +	if ((current->flags & PF_MEMALLOC) ||
> > +			unlikely(test_thread_flag(TIF_MEMDIE))) {
> > +		alloc_flags |= ALLOC_PFMEMALLOC;
> > +
> > +		if (likely(!(gfp_mask & __GFP_NOMEMALLOC)) && !in_interrupt())
> >  			alloc_flags |= ALLOC_NO_WATERMARKS;
> >  	}
> >  
> >  	return alloc_flags;
> >  }
> >  
> > +bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
> > +{
> > +	return !!(gfp_to_alloc_flags(gfp_mask) & ALLOC_PFMEMALLOC);
> > +}
> > +
> >  static inline struct page *
> >  __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> >  	struct zonelist *zonelist, enum zone_type high_zoneidx,
> > @@ -2407,8 +2414,16 @@ nopage:
> >  got_pg:
> >  	if (kmemcheck_enabled)
> >  		kmemcheck_pagealloc_alloc(page, order, gfp_mask);
> > -	return page;
> >  
> > +	/*
> > +	 * page->pfmemalloc is set when the caller had PFMEMALLOC set or has
> > +	 * been OOM killed. The expectation is that the caller is taking
> > +	 * steps that will free more memory. The caller should avoid the
> > +	 * page being used for !PFMEMALLOC purposes.
> > +	 */
> > +	page->pfmemalloc = !!(alloc_flags & ALLOC_PFMEMALLOC);
> > +
> > +	return page;
> >  }
> >  
> >  /*
> 
> I think this is slightly inconsistent if the page allocation succeeded 
> without needing ALLOC_NO_WATERMARKS, meaning that page was allocated above 
> the min watermark.  That's possible if the slowpath's first call to 
> get_page_from_freelist() succeeds without needing  __alloc_pages_high_priority(). 
> So perhaps we need to do something like
> 
> 	got_pg_memalloc:
> 		...
> 		page->pfmemalloc = !!(alloc_flags & ALLOC_PFMEMALLOC);
> 	got_pg:
> 		if (kmemcheck_enabled)
> 			kmemcheck_pagealloc_alloc(page, order, gfp_mask);
> 		return page;
> 
> and use got_pg_memalloc everywhere we currently use got_pg other than the 
> when it succeeds with ALLOC_NO_WATERMARKS.
> 

In this path, we are under memory pressure and at least below the low
watermark. kswapd is woken up and should start reclaiming pages, and the
calling process may enter direct reclaim. As we are paging out, and some
of that paging could be over the network, the pages have to be protected
until the paging is complete. The catch is that, in the context of this
patch, ALLOC_NO_WATERMARKS is not being set for all the cases we need, so
what the patch does at this point makes sense.
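
For clarity, this is roughly the watermark ordering I am talking about.
This is an illustrative sketch only, not code from any of the patches;
the helper name is made up, but the wmark accessors are the real ones:

	/* Illustrative only: classify a zone against its per-zone watermarks */
	static const char *zone_pressure(struct zone *zone)
	{
		unsigned long free = zone_page_state(zone, NR_FREE_PAGES);

		if (free < min_wmark_pages(zone))
			return "below min: pfmemalloc reserves in use";
		if (free < low_wmark_pages(zone))
			return "below low: kswapd woken, slowpath entered";
		if (free < high_wmark_pages(zone))
			return "below high: kswapd still reclaiming";
		return "ok";
	}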

That said, I see your point: the changelog is inconsistent with what
the code is doing. I updated the description to read "When this patch is
applied, pages allocated from below the low watermark are returned with
page->pfmemalloc set and it is up to the caller to determine how the page
should be protected."
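
To illustrate what "up to the caller" means in practice, the sort of
check a consumer like sl[au]b would make looks roughly like this. It is
a hand-rolled sketch, not lifted from the sl[au]b patch, and the helper
name is invented; gfp_pfmemalloc_allowed() is the helper added above:

	/*
	 * Sketch only: objects from a pfmemalloc page should only be handed
	 * out to callers that are themselves entitled to the reserves.
	 */
	static bool may_use_pfmemalloc_page(struct page *page, gfp_t gfp_mask)
	{
		if (!page->pfmemalloc)
			return true;	/* ordinary page, no restriction */

		/* reserve-backed page: only PFMEMALLOC-entitled callers */
		return gfp_pfmemalloc_allowed(gfp_mask);
	}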

You are also right that the series as a whole is actually protecting
pages below the low watermark and not below the min watermark as stated.
I'll add a new patch (patch 6 in the series) that only sets
page->pfmemalloc when ALLOC_NO_WATERMARKS is used, and test that.

One could argue that the patch "mm: Introduce __GFP_MEMALLOC to allow access
to emergency reserves" should be updated but I think a separate patch will
be easier to review and also easier to revert if regressions are reported.

Well spotted and thanks for reviewing.

This is not tested, but is it what you had in mind? It only sets
page->pfmemalloc if ALLOC_NO_WATERMARKS was necessary to satisfy the
allocation. To do that, ALLOC_NO_WATERMARKS has to be stripped from the
get_page_from_freelist() calls in __alloc_pages_direct_reclaim() and
friends, but if those attempts fail the allocation retries via the
rebalance label where __alloc_pages_high_priority() can still use the
reserves.

---8<---
mm: Only set page->pfmemalloc when ALLOC_NO_WATERMARKS was used

__alloc_pages_slowpath() is called when the number of free pages is below
the low watermark. If the caller is entitled to use ALLOC_NO_WATERMARKS
then the page is returned with page->pfmemalloc set. This protects more
pages than strictly necessary, as we only need to protect pages allocated
from below the min watermark (the pfmemalloc reserves).

This patch only sets page->pfmemalloc when ALLOC_NO_WATERMARKS was
required to allocate the page.

[rientjes@google.com: David noticed the problem during review]
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/page_alloc.c |   27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3738596..363ca90 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2043,8 +2043,8 @@ __alloc_pages_direct_compact(gfp_t gfp_mask, unsigned int order,
 
 		page = get_page_from_freelist(gfp_mask, nodemask,
 				order, zonelist, high_zoneidx,
-				alloc_flags, preferred_zone,
-				migratetype);
+				alloc_flags & ~ALLOC_NO_WATERMARKS,
+				preferred_zone, migratetype);
 		if (page) {
 			preferred_zone->compact_considered = 0;
 			preferred_zone->compact_defer_shift = 0;
@@ -2124,8 +2124,8 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 retry:
 	page = get_page_from_freelist(gfp_mask, nodemask, order,
 					zonelist, high_zoneidx,
-					alloc_flags, preferred_zone,
-					migratetype);
+					alloc_flags & ~ALLOC_NO_WATERMARKS,
+					preferred_zone, migratetype);
 
 	/*
 	 * If an allocation failed after direct reclaim, it could be because
@@ -2296,8 +2296,17 @@ rebalance:
 		page = __alloc_pages_high_priority(gfp_mask, order,
 				zonelist, high_zoneidx, nodemask,
 				preferred_zone, migratetype);
-		if (page)
+		if (page) {
+			/*
+			 * page->pfmemalloc is set when ALLOC_NO_WATERMARKS was
+			 * necessary to allocate the page. The expectation is
+			 * that the caller is taking steps that will free more
+			 * memory. The caller should avoid the page being used
+			 * for !PFMEMALLOC purposes.
+			 */
+			page->pfmemalloc = true;
 			goto got_pg;
+		}
 	}
 
 	/* Atomic allocations - we can't balance anything */
@@ -2414,14 +2423,6 @@ nopage:
 	warn_alloc_failed(gfp_mask, order, NULL);
 	return page;
 got_pg:
-	/*
-	 * page->pfmemalloc is set when the caller had PFMEMALLOC set, has
-	 * been OOM killed or specified __GFP_MEMALLOC. The expectation is
-	 * that the caller is taking steps that will free more memory. The
-	 * caller should avoid the page being used for !PFMEMALLOC purposes.
-	 */
-	page->pfmemalloc = !!(alloc_flags & ALLOC_NO_WATERMARKS);
-
 	if (kmemcheck_enabled)
 		kmemcheck_pagealloc_alloc(page, order, gfp_mask);
 

Thread overview: 28+ messages
2012-04-16 12:16 [PATCH 00/16] Swap-over-NBD without deadlocking V9 Mel Gorman
2012-04-16 12:16 ` [PATCH 01/16] mm: Serialize access to min_free_kbytes Mel Gorman
2012-04-23 23:50   ` David Rientjes
2012-04-16 12:16 ` [PATCH 02/16] mm: sl[au]b: Add knowledge of PFMEMALLOC reserve pages Mel Gorman
2012-04-23 23:51   ` David Rientjes
2012-04-25 15:05     ` Mel Gorman [this message]
2012-04-16 12:16 ` [PATCH 03/16] mm: slub: Optimise the SLUB fast path to avoid pfmemalloc checks Mel Gorman
2012-04-16 12:16 ` [PATCH 04/16] mm: Introduce __GFP_MEMALLOC to allow access to emergency reserves Mel Gorman
2012-04-16 12:16 ` [PATCH 05/16] mm: allow PF_MEMALLOC from softirq context Mel Gorman
2012-05-01 22:08   ` Andrew Morton
2012-05-02 16:24     ` Mel Gorman
2012-04-16 12:16 ` [PATCH 06/16] mm: Ignore mempolicies when using ALLOC_NO_WATERMARK Mel Gorman
2012-04-16 12:16 ` [PATCH 07/16] net: Introduce sk_allocation() to allow addition of GFP flags depending on the individual socket Mel Gorman
2012-04-16 12:16 ` [PATCH 08/16] netvm: Allow the use of __GFP_MEMALLOC by specific sockets Mel Gorman
2012-04-16 12:16 ` [PATCH 09/16] netvm: Allow skb allocation to use PFMEMALLOC reserves Mel Gorman
2012-04-16 12:16 ` [PATCH 10/16] netvm: Propagate page->pfmemalloc to skb Mel Gorman
2012-04-16 12:16 ` [PATCH 11/16] netvm: Propagate page->pfmemalloc from netdev_alloc_page " Mel Gorman
2012-04-16 12:16 ` [PATCH 12/16] netvm: Set PF_MEMALLOC as appropriate during SKB processing Mel Gorman
2012-04-16 12:17 ` [PATCH 13/16] mm: Micro-optimise slab to avoid a function call Mel Gorman
2012-04-16 12:17 ` [PATCH 14/16] nbd: Set SOCK_MEMALLOC for access to PFMEMALLOC reserves Mel Gorman
2012-04-16 12:17 ` [PATCH 15/16] mm: Throttle direct reclaimers if PF_MEMALLOC reserves are low and swap is backed by network storage Mel Gorman
2012-05-01 22:24   ` Andrew Morton
2012-05-02 16:24     ` Mel Gorman
2012-04-16 12:17 ` [PATCH 16/16] mm: Account for the number of times direct reclaimers get throttled Mel Gorman
2012-05-01 22:28 ` [PATCH 00/16] Swap-over-NBD without deadlocking V9 Andrew Morton
2012-05-03 15:00   ` Mel Gorman
2012-05-03 17:06     ` David Miller
2012-05-04 10:16       ` Mel Gorman
