Linux-api Archive on lore.kernel.org
From: Ben Widawsky <ben.widawsky@intel.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm <linux-mm@kvack.org>, Andi Kleen <ak@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Christoph Lameter <cl@linux.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	David Hildenbrand <david@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Kuppuswamy Sathyanarayanan 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Lee Schermerhorn <lee.schermerhorn@hp.com>,
	Li Xinhai <lixinhai.lxh@gmail.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Mina Almasry <almasrymina@google.com>, Tejun Heo <tj@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	linux-api@vger.kernel.org
Subject: Re: [PATCH 00/18] multiple preferred nodes
Date: Wed, 24 Jun 2020 12:37:33 -0700
Message-ID: <20200624193733.tqeligjd3pdvrsmi@intel.com> (raw)
In-Reply-To: <20200624183917.GW1320@dhcp22.suse.cz>

On 20-06-24 20:39:17, Michal Hocko wrote:
> On Wed 24-06-20 09:16:43, Ben Widawsky wrote:
> > On 20-06-24 09:52:16, Michal Hocko wrote:
> > > On Tue 23-06-20 09:12:11, Ben Widawsky wrote:
> > > > On 20-06-23 13:20:48, Michal Hocko wrote:
> > > [...]
> > > > > It would be also great to provide a high level semantic description
> > > > > here. I have very quickly glanced through patches and they are not
> > > > > really trivial to follow with many incremental steps so the higher level
> > > > > intention is lost easily.
> > > > > 
> > > > > Do I get it right that the default semantic is essentially
> > > > > 	- allocate page from the given nodemask (with __GFP_RETRY_MAYFAIL
> > > > > 	  semantic)
> > > > > 	- fallback to numa unrestricted allocation with the default
> > > > > 	  numa policy on the failure
> > > > > 
> > > > > Or are there any usecases to modify how hard to keep the preference over
> > > > > the fallback?
> > > > 
> > > > tl;dr is: yes, and no usecases.
> > > 
> > > OK, then I am wondering why the change has to be so involved. Except for
> > > syscall plumbing the only real change to the allocator path would be
> > > something like
> > > 
> > > static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
> > > {
> > > 	/* Lower zones don't get a nodemask applied for MPOL_BIND */
> > > 	if (unlikely(policy->mode == MPOL_BIND ||
> > > 		     policy->mode == MPOL_PREFERRED_MANY) &&
> > > 			apply_policy_zone(policy, gfp_zone(gfp)) &&
> > > 			cpuset_nodemask_valid_mems_allowed(&policy->v.nodes))
> > > 		return &policy->v.nodes;
> > > 
> > > 	return NULL;
> > > }
> > > 
> > > alloc_pages_current
> > > 
> > > 	if (pol->mode == MPOL_INTERLEAVE)
> > > 		page = alloc_page_interleave(gfp, order, interleave_nodes(pol));
> > > 	else {
> > > 		gfp_t gfp_attempt = gfp;
> > > 
> > > 		/*
> > > 		 * Make sure the first allocation attempt will try hard
> > > 		 * but eventually fail without OOM killer or other
> > > 		 * disruption before falling back to the full nodemask
> > > 		 */
> > > 		if (pol->mode == MPOL_PREFERRED_MANY)
> > > 			gfp_attempt |= __GFP_RETRY_MAYFAIL;
> > > 
> > > 		page = __alloc_pages_nodemask(gfp_attempt, order,
> > > 				policy_node(gfp, pol, numa_node_id()),
> > > 				policy_nodemask(gfp, pol));
> > > 		if (!page && pol->mode == MPOL_PREFERRED_MANY)
> > > 			page = __alloc_pages_nodemask(gfp, order,
> > > 				numa_node_id(), NULL);
> > > 	}
> > > 
> > > 	return page;
> > > 
> > > similar (well slightly more hairy) in alloc_pages_vma
> > > 
> > > Or do I miss something that really requires more involved approach like
> > > building custom zonelists and other larger changes to the allocator?
> > 
> > I think I'm missing how this allows selecting from multiple preferred nodes. In
> > this case when you try to get the page from the freelist, you'll get the
> > zonelist of the preferred node, and when you actually scan through on page
> > allocation, you have no way to filter out the non-preferred nodes. I think the
> > plumbing of multiple nodes has to go all the way through
> > __alloc_pages_nodemask(). But it's possible I've missed the point.
> 
> policy_nodemask() will provide the nodemask which will be used as a
> filter on the policy_node.

Ah, gotcha. Enabling independent masks seemed useful. Some bad decisions got me
to that point. UAPI cannot get independent masks, and callers of these functions
don't yet use them.

So let me ask before I actually type it up and find it's much, much simpler: is
there not some perceived benefit to keeping both masks independent?

