From: Feng Tang <feng.tang@intel.com>
To: Michal Hocko <mhocko@suse.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Dave Hansen <dave.hansen@intel.com>,
	Ben Widawsky <ben.widawsky@intel.com>,
	Andi Kleen <ak@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>
Subject: Re: [PATCH v4 08/13] mm/mempolicy: Create a page allocator for policy
Date: Thu, 15 Apr 2021 16:17:17 +0800
Message-ID: <20210415081717.GC61572@shbuild999.sh.intel.com>
In-Reply-To: <YHbpQ2xpTVChY718@dhcp22.suse.cz>

On Wed, Apr 14, 2021 at 03:08:19PM +0200, Michal Hocko wrote:
> On Wed 17-03-21 11:40:05, Feng Tang wrote:
> > From: Ben Widawsky <ben.widawsky@intel.com>
> > 
> > Add a helper function which takes care of handling multiple preferred
> > nodes. It will be called by future patches that need to handle this,
> > specifically VMA-based and task-based page allocation. Huge pages don't
> > quite fit the same pattern because they use different underlying page
> > allocation functions. This consumes the previous interleave-policy-specific
> > allocation function to make a one-stop shop for policy-based allocation.
> > 
> > With this, MPOL_PREFERRED_MANY's semantics are more like MPOL_PREFERRED,
> > in that it will first try the preferred node/nodes, and fall back to all
> > other nodes when the first try fails. Thanks to Michal Hocko for
> > suggestions on this.
> > 
> > For now, only the interleave policy will be used, so there should be no
> > functional change yet. However, if bisection points to issues in the
> > next few commits, this patch is the likely culprit.
> 
> I am not sure this is helping much. Let's see in later patches, but I
> would keep them separate and rather create a dedicated function for the
> new policy allocation mode.

Thanks for the suggestion; we will rethink the implementation.

- Feng
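
For reference, below is a rough sketch of the dedicated helper we have in
mind. The function name and exact gfp handling are illustrative only, not
a final implementation; it just restates the two-pass logic of this patch
as a standalone function:

static struct page *alloc_pages_preferred_many(gfp_t gfp, unsigned int order,
					       struct mempolicy *pol, int nid)
{
	struct page *page;

	/*
	 * First pass: try only the preferred nodes, quietly and without
	 * direct reclaim, so that a failure here stays cheap.
	 */
	page = __alloc_pages_nodemask((gfp | __GFP_NOWARN) & ~__GFP_DIRECT_RECLAIM,
				      order, nid, &pol->nodes);
	if (!page)
		/* Second pass: fall back to all nodes with the caller's gfp */
		page = __alloc_pages_nodemask(gfp, order, nid, NULL);

	return page;
}

This would keep the interleave path untouched and make the fallback
behavior of the new policy self-contained.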

> > Similar functionality is offered via policy_node() and
> > policy_nodemask(). By themselves, however, neither can achieve this
> > fallback across sets of nodes.
> > 
> > [ Feng: for the first try, add the NOWARN flag and skip direct reclaim
> >   to speed up allocation in some cases ]
> > 
> > Link: https://lore.kernel.org/r/20200630212517.308045-9-ben.widawsky@intel.com
> > Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
> > Signed-off-by: Feng Tang <feng.tang@intel.com>
> > ---
> >  mm/mempolicy.c | 65 ++++++++++++++++++++++++++++++++++++++++++++++------------
> >  1 file changed, 52 insertions(+), 13 deletions(-)
> > 
> > diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> > index d945f29..d21105b 100644
> > --- a/mm/mempolicy.c
> > +++ b/mm/mempolicy.c
> > @@ -2187,22 +2187,60 @@ bool mempolicy_nodemask_intersects(struct task_struct *tsk,
> >  	return ret;
> >  }
> >  
> > -/* Allocate a page in interleaved policy.
> > -   Own path because it needs to do special accounting. */
> > -static struct page *alloc_page_interleave(gfp_t gfp, unsigned order,
> > -					unsigned nid)
> > +/* Handle page allocation for all but interleaved policies */
> > +static struct page *alloc_pages_policy(struct mempolicy *pol, gfp_t gfp,
> > +				       unsigned int order, int preferred_nid)
> >  {
> >  	struct page *page;
> > +	gfp_t gfp_mask = gfp;
> >  
> > -	page = __alloc_pages(gfp, order, nid);
> > -	/* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
> > -	if (!static_branch_likely(&vm_numa_stat_key))
> > +	if (pol->mode == MPOL_INTERLEAVE) {
> > +		page = __alloc_pages(gfp, order, preferred_nid);
> > +		/* skip NUMA_INTERLEAVE_HIT counter update if numa stats is disabled */
> > +		if (!static_branch_likely(&vm_numa_stat_key))
> > +			return page;
> > +		if (page && page_to_nid(page) == preferred_nid) {
> > +			preempt_disable();
> > +			__inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT);
> > +			preempt_enable();
> > +		}
> >  		return page;
> > -	if (page && page_to_nid(page) == nid) {
> > -		preempt_disable();
> > -		__inc_numa_state(page_zone(page), NUMA_INTERLEAVE_HIT);
> > -		preempt_enable();
> >  	}
> > +
> > +	VM_BUG_ON(preferred_nid != NUMA_NO_NODE);
> > +
> > +	preferred_nid = numa_node_id();
> > +
> > +	/*
> > +	 * There is a two-pass approach implemented here for
> > +	 * MPOL_PREFERRED_MANY. In the first pass we try the preferred nodes
> > +	 * but allow the allocation to fail. The table below explains how
> > +	 * this is achieved.
> > +	 *
> > +	 * | Policy                        | preferred nid | nodemask   |
> > +	 * |-------------------------------|---------------|------------|
> > +	 * | MPOL_DEFAULT                  | local         | NULL       |
> > +	 * | MPOL_PREFERRED                | best          | NULL       |
> > +	 * | MPOL_INTERLEAVE               | ERR           | ERR        |
> > +	 * | MPOL_BIND                     | local         | pol->nodes |
> > +	 * | MPOL_PREFERRED_MANY           | best          | pol->nodes |
> > +	 * | MPOL_PREFERRED_MANY (round 2) | local         | NULL       |
> > +	 * +-------------------------------+---------------+------------+
> > +	 */
> > +	if (pol->mode == MPOL_PREFERRED_MANY) {
> > +		gfp_mask |= __GFP_NOWARN;
> > +
> > +		/* Skip direct reclaim, as there will be a second try */
> > +		gfp_mask &= ~__GFP_DIRECT_RECLAIM;
> > +	}
> > +
> > +	page = __alloc_pages_nodemask(gfp_mask, order,
> > +				      policy_node(gfp, pol, preferred_nid),
> > +				      policy_nodemask(gfp, pol));
> > +
> > +	if (unlikely(!page && pol->mode == MPOL_PREFERRED_MANY))
> > +		page = __alloc_pages_nodemask(gfp, order, preferred_nid, NULL);
> > +
> >  	return page;
> >  }
> >  
> > @@ -2244,8 +2282,8 @@ alloc_pages_vma(gfp_t gfp, int order, struct vm_area_struct *vma,
> >  		unsigned nid;
> >  
> >  		nid = interleave_nid(pol, vma, addr, PAGE_SHIFT + order);
> > +		page = alloc_pages_policy(pol, gfp, order, nid);
> >  		mpol_cond_put(pol);
> > -		page = alloc_page_interleave(gfp, order, nid);
> >  		goto out;
> >  	}
> >  
> > @@ -2329,7 +2367,8 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order)
> >  	 * nor system default_policy
> >  	 */
> >  	if (pol->mode == MPOL_INTERLEAVE)
> > -		page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
> > +		page = alloc_pages_policy(pol, gfp, order,
> > +					  interleave_nodes(pol));
> >  	else
> >  		page = __alloc_pages_nodemask(gfp, order,
> >  				policy_node(gfp, pol, numa_node_id()),
> > -- 
> > 2.7.4
> 
> -- 
> Michal Hocko
> SUSE Labs
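
P.S. For anyone who wants to try the intended userspace-visible semantics,
a minimal test against a kernel with this series applied could look roughly
like the below. MPOL_PREFERRED_MANY and its numeric value are introduced by
this series and are not in any released headers yet, so treat this strictly
as a sketch:

#include <stdio.h>
#include <numaif.h>		/* set_mempolicy() from libnuma */

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY	5	/* proposed value; may still change */
#endif

int main(void)
{
	/* Prefer nodes 0 and 2; the kernel may fall back to other nodes */
	unsigned long nodemask = (1UL << 0) | (1UL << 2);

	if (set_mempolicy(MPOL_PREFERRED_MANY, &nodemask, sizeof(nodemask) * 8))
		perror("set_mempolicy");
	return 0;
}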
