From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>
Subject: [PATCH 03/18] mm/page_alloc: start plumbing multi preferred node
Date: Fri, 19 Jun 2020 09:23:59 -0700
Message-ID: <20200619162414.1052234-4-ben.widawsky@intel.com>
In-Reply-To: <20200619162414.1052234-1-ben.widawsky@intel.com>

In preparation for supporting multiple preferred nodes, we need the
internals to switch from taking a nid to a nodemask.

As a comment in the code notes, __alloc_pages_nodemask() is the heart of
the page allocator. It takes a single preferred node from which to first
try to obtain a zonelist. This patch leaves that internal interface in
place, but changes the guts of the function to consider a list of
preferred nodes.

The local node is always most preferred. If the local node is restricted
by either the preference or the binding mask, then the closest node that
meets both the binding and preference criteria is used. If the
intersection of binding and preference is the empty set, then fall back
to the first node that meets the binding criteria.
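
In rough pseudo-C, that policy reduces to the following sketch
(illustrative only; effective_mask() is a hypothetical helper, and the
real logic lives in set_pref_bind_mask() and preferred_zonelist() in the
diff below):

	/* Illustrative sketch; not part of the diff below. */
	static void effective_mask(nodemask_t *out, const nodemask_t *pref,
				   const nodemask_t *bind)
	{
		nodes_and(*out, *pref, *bind);	/* meets both criteria */
		if (nodes_empty(*out))
			*out = *bind;		/* fall back to binding */
	}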

As of this patch, multiple preferred nodes aren't actually supported in
the way one might initially expect. As an example, suppose the preferred
nodes are 0 and 1. Node 0's fallback zonelist may have zones from nodes
ordered 0->2->1. If this code picks node 0's zonelist and all zones from
node 0 are full, you'd get a zone from node 2 instead of node 1. Since
multiple nodes aren't yet supported anyway, this is acceptable as a
preparatory patch.

v2:
Fixed memory hotplug handling (Ben)

Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
 mm/page_alloc.c | 125 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 119 insertions(+), 6 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48eb0f1410d4..280ca85dc4d8 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -129,6 +129,10 @@ nodemask_t node_states[NR_NODE_STATES] __read_mostly = {
 };
 EXPORT_SYMBOL(node_states);
 
+#ifdef CONFIG_NUMA
+static int find_next_best_node(int node, nodemask_t *used_node_mask);
+#endif
+
 atomic_long_t _totalram_pages __read_mostly;
 EXPORT_SYMBOL(_totalram_pages);
 unsigned long totalreserve_pages __read_mostly;
@@ -4759,13 +4763,118 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
 	return page;
 }
 
-static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
-		int preferred_nid, nodemask_t *nodemask,
-		struct alloc_context *ac, gfp_t *alloc_mask,
-		unsigned int *alloc_flags)
+#ifndef CONFIG_NUMA
+#define set_pref_bind_mask(out, pref, bind)                                    \
+	do {                                                                   \
+		(out)->bits[0] = 1UL;                                          \
+	} while (0)
+#else
+static void set_pref_bind_mask(nodemask_t *out, const nodemask_t *prefmask,
+			       const nodemask_t *bindmask)
+{
+	bool has_pref, has_bind;
+
+	has_pref = prefmask && !nodes_empty(*prefmask);
+	has_bind = bindmask && !nodes_empty(*bindmask);
+
+	if (has_pref && has_bind)
+		nodes_and(*out, *prefmask, *bindmask);
+	else if (has_pref && !has_bind)
+		*out = *prefmask;
+	else if (!has_pref && has_bind)
+		*out = *bindmask;
+	else if (!has_pref && !has_bind)
+		unreachable(); /* Handled by the caller */
+	else
+		unreachable();
+}
+#endif
+
+/*
+ * Find a zonelist from a preferred node. Below is a truth table covering the
+ * interesting combinations of two masks: a NULL mask, an empty mask, two masks
+ * with an intersection, and two masks with no intersection. If the local node
+ * is in the intersection, it is used; otherwise the first set node is used.
+ *
+ * +----------+----------+------------+
+ * | bindmask | prefmask |  zonelist  |
+ * +----------+----------+------------+
+ * | NULL/0   | NULL/0   | local node |
+ * | NULL/0   | 0x2      | 0x2        |
+ * | NULL/0   | 0x4      | 0x4        |
+ * | 0x2      | NULL/0   | 0x2        |
+ * | 0x2      | 0x2      | 0x2        |
+ * | 0x2      | 0x4      | local*     |
+ * | 0x4      | NULL/0   | 0x4        |
+ * | 0x4      | 0x2      | local*     |
+ * | 0x4      | 0x4      | 0x4        |
+ * +----------+----------+------------+
+ *
+ * NB: That zonelist will have *all* zones in the fallback case, and not all of
+ * those zones will belong to preferred nodes.
+ */
+static struct zonelist *preferred_zonelist(gfp_t gfp_mask,
+					   const nodemask_t *prefmask,
+					   const nodemask_t *bindmask)
+{
+	nodemask_t pref;
+	int nid, local_node = numa_mem_id();
+
+	/* Multiple preferred nodes not supported yet */
+	VM_BUG_ON(prefmask && nodes_weight(*prefmask) != 1);
+
+#define _isset(mask, node)                                                     \
+	(!(mask) || nodes_empty(*(mask)) ? 1 : node_isset(node, *(mask)))
+	/*
+	 * This will handle NULL masks, empty masks, and the case where the
+	 * local node matches all constraints. It does most of the magic here.
+	 */
+	if (_isset(prefmask, local_node) && _isset(bindmask, local_node))
+		return node_zonelist(local_node, gfp_mask);
+#undef _isset
+
+	VM_BUG_ON(!prefmask && !bindmask);
+
+	set_pref_bind_mask(&pref, prefmask, bindmask);
+
+	/*
+	 * The caller may ask for a preferred set that isn't available. One
+	 * such case is memory hotplug: the hotplug code tries to do some
+	 * allocations from the target node (which will be local to the new
+	 * node) before the pages are onlined (N_MEMORY).
+	 */
+	for_each_node_mask(nid, pref) {
+		if (!node_state(nid, N_MEMORY))
+			node_clear(nid, pref);
+	}
+
+	/*
+	 * If we couldn't manage to get anything reasonable, let later code
+	 * clean up our mess. The local node is the best approximation of
+	 * what is desired, so just use it.
+	 */
+	if (unlikely(nodes_empty(pref)))
+		return node_zonelist(local_node, gfp_mask);
+
+	/* Try to find the "closest" node in the list. */
+	nodes_complement(pref, pref);
+	nid = find_next_best_node(local_node, &pref);
+	if (nid != NUMA_NO_NODE)
+		return node_zonelist(nid, gfp_mask);
+
+	/*
+	 * find_next_best_node() must have found something since the node list
+	 * isn't empty; getting here means it changed and this wasn't updated.
+	 */
+	BUG();
+}
+
+static inline bool
+prepare_alloc_pages(gfp_t gfp_mask, unsigned int order, nodemask_t *prefmask,
+		    nodemask_t *nodemask, struct alloc_context *ac,
+		    gfp_t *alloc_mask, unsigned int *alloc_flags)
 {
 	ac->highest_zoneidx = gfp_zone(gfp_mask);
-	ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
 	ac->nodemask = nodemask;
 	ac->migratetype = gfp_migratetype(gfp_mask);
 
@@ -4777,6 +4886,8 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 			*alloc_flags |= ALLOC_CPUSET;
 	}
 
+	ac->zonelist = preferred_zonelist(gfp_mask, prefmask, ac->nodemask);
+
 	fs_reclaim_acquire(gfp_mask);
 	fs_reclaim_release(gfp_mask);
 
@@ -4817,6 +4928,7 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 	unsigned int alloc_flags = ALLOC_WMARK_LOW;
 	gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
 	struct alloc_context ac = { };
+	nodemask_t prefmask = nodemask_of_node(preferred_nid);
 
 	/*
 	 * There are several places where we assume that the order value is sane
@@ -4829,7 +4941,8 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
 
 	gfp_mask &= gfp_allowed_mask;
 	alloc_mask = gfp_mask;
-	if (!prepare_alloc_pages(gfp_mask, order, preferred_nid, nodemask, &ac, &alloc_mask, &alloc_flags))
+	if (!prepare_alloc_pages(gfp_mask, order, &prefmask, nodemask, &ac,
+				 &alloc_mask, &alloc_flags))
 		return NULL;
 
 	finalise_ac(gfp_mask, &ac);
-- 
2.27.0



Thread overview: 16+ messages
2020-06-19 16:23 [PATCH 00/18] multiple preferred nodes Ben Widawsky
2020-06-19 16:23 ` [PATCH 01/18] mm/mempolicy: Add comment for missing LOCAL Ben Widawsky
2020-06-19 16:23 ` [PATCH 02/18] mm/mempolicy: Use node_mem_id() instead of node_id() Ben Widawsky
2020-06-19 16:23 ` Ben Widawsky [this message]
2020-06-19 16:24 ` [PATCH 04/18] mm/page_alloc: add preferred pass to page allocation Ben Widawsky
2020-06-19 16:24 ` [PATCH 05/18] mm/mempolicy: convert single preferred_node to full nodemask Ben Widawsky
2020-06-19 16:24 ` [PATCH 06/18] mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes Ben Widawsky
2020-06-19 16:24 ` [PATCH 07/18] mm/mempolicy: allow preferred code to take a nodemask Ben Widawsky
2020-06-19 16:24 ` [PATCH 08/18] mm/mempolicy: refactor rebind code for PREFERRED_MANY Ben Widawsky
2020-06-19 16:24 ` [PATCH 09/18] mm: Finish handling MPOL_PREFERRED_MANY Ben Widawsky
2020-06-19 16:24 ` [PATCH 10/18] mm: clean up alloc_pages_vma (thp) Ben Widawsky
2020-06-19 16:24 ` [PATCH 11/18] mm: Extract THP hugepage allocation Ben Widawsky
2020-06-19 16:24 ` [PATCH 12/18] mm/mempolicy: Use __alloc_page_node for interleaved Ben Widawsky
2020-06-19 16:24 ` [PATCH 13/18] mm: kill __alloc_pages Ben Widawsky
2020-06-19 16:25 ` [PATCH 00/18] multiple preferred nodes Ben Widawsky
