[PATCH 14/18] mm/mempolicy: Introduce policy_preferred_nodes()

From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>
Cc: Ben Widawsky <ben.widawsky@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Li Xinhai <lixinhai.lxh@gmail.com>,
	Michal Hocko <mhocko@kernel.org>,
	Vlastimil Babka <vbabka@suse.cz>
Subject: [PATCH 14/18] mm/mempolicy: Introduce policy_preferred_nodes()
Date: Fri, 19 Jun 2020 09:24:21 -0700
Message-ID: <20200619162425.1052382-15-ben.widawsky@intel.com>
In-Reply-To: <20200619162425.1052382-1-ben.widawsky@intel.com>

The current code provides a policy_node() helper which, given a
caller-supplied node, gfp flags, and a policy, determines the preferred
node. Going forward it is desirable to have the same functionality for a
set of nodes rather than a single node. policy_node() is therefore
reimplemented in terms of the new, more generic policy_preferred_nodes().

I went back and forth as to whether this function should take a
caller-supplied set of preferred nodes and modify it. Something like:
	policy_preferred_nodes(gfp, *policy, *mask);

That idea was nice because it let the policy function build the mask to
be used. Ultimately, callers don't need that flexibility: they would
pass the mask directly to page allocation functions that accept NULL for
a preference mask. Having this function return NULL when the policy
dictates no preference mask is therefore the simpler interface.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Li Xinhai <lixinhai.lxh@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
---
 mm/mempolicy.c | 57 +++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 47 insertions(+), 10 deletions(-)
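
For readers who want to experiment with the NULL-return contract described
in the commit message, here is a minimal user-space sketch. It is not
kernel code: nodemask_model_t, struct policy_model, the MODE_* values, and
every *_model() function are hypothetical stand-ins invented for
illustration, and the gfp argument and the WARN_ON checks are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: bit n set means node n is present. */
typedef uint64_t nodemask_model_t;

/* Hypothetical stand-ins for the MPOL_* modes. */
enum mode_model {
	MODE_DEFAULT,
	MODE_PREFERRED,
	MODE_PREFERRED_MANY,
	MODE_BIND,
	MODE_INTERLEAVE,
};

struct policy_model {
	enum mode_model mode;
	int local;				/* models MPOL_F_LOCAL */
	nodemask_model_t preferred_nodes;
};

/* Number of nodes in the mask (models nodes_weight()). */
static int weight_model(nodemask_model_t mask)
{
	int n = 0;

	while (mask) {
		mask &= mask - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Lowest node in a non-empty mask (models first_node()). */
static int first_node_model(nodemask_model_t mask)
{
	int n = 0;

	assert(mask != 0);
	while (!(mask & 1)) {
		mask >>= 1;
		n++;
	}
	return n;
}

/* Return the preference mask, or NULL if the policy dictates none. */
static const nodemask_model_t *
preferred_nodes_model(const struct policy_model *pol)
{
	switch (pol->mode) {
	case MODE_PREFERRED:
		/* local, or malformed (must name exactly one node) */
		if (pol->local || weight_model(pol->preferred_nodes) != 1)
			return NULL;
		return &pol->preferred_nodes;
	case MODE_PREFERRED_MANY:
		if (weight_model(pol->preferred_nodes) == 0)
			return NULL;
		return &pol->preferred_nodes;
	default:
		return NULL;
	}
}

/* Single-node helper layered on top of the mask helper. */
static int policy_node_model(const struct policy_model *pol, int nd)
{
	const nodemask_model_t *pref = preferred_nodes_model(pol);

	return pref ? first_node_model(*pref) : nd;
}
```

In this model a preferred-many policy naming nodes 2 and 5 resolves to node
2, while a local or bind policy yields NULL from the mask helper and the
single-node helper hands back the caller's own node unchanged.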

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eb2520d68a04..3c48f299d344 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1946,24 +1946,61 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy)
 	return NULL;
 }
 
-/* Return the node id preferred by the given mempolicy, or the given id */
-static int policy_node(gfp_t gfp, struct mempolicy *policy,
-								int nd)
+/*
+ * Returns a nodemask to be used for preference if the given policy dictates.
+ * Otherwise, returns NULL and the caller should likely use
+ * nodemask_of_node(numa_mem_id());
+ */
+static nodemask_t *policy_preferred_nodes(gfp_t gfp, struct mempolicy *policy)
 {
-	if ((policy->mode == MPOL_PREFERRED ||
-	     policy->mode == MPOL_PREFERRED_MANY) &&
-	    !(policy->flags & MPOL_F_LOCAL)) {
-		nd = first_node(policy->v.preferred_nodes);
-	} else {
+	nodemask_t *pol_pref = &policy->v.preferred_nodes;
+
+	/*
+	 * Only MPOL_PREFERRED and MPOL_PREFERRED_MANY supply a preference
+	 * mask. For every other mode, and for a local or malformed preferred
+	 * policy, return NULL so that the caller falls back to its own choice
+	 * of node.
+	 */
+	switch (policy->mode) {
+	case MPOL_PREFERRED:
+		/* local, or buggy policy */
+		if (policy->flags & MPOL_F_LOCAL ||
+		    WARN_ON(nodes_weight(*pol_pref) != 1))
+			return NULL;
+		else
+			return pol_pref;
+		break;
+	case MPOL_PREFERRED_MANY:
+		if (WARN_ON(nodes_weight(*pol_pref) == 0))
+			return NULL;
+		else
+			return pol_pref;
+		break;
+	default:
+	case MPOL_INTERLEAVE:
+	case MPOL_BIND:
 		/*
 		 * __GFP_THISNODE shouldn't even be used with the bind policy
 		 * because we might easily break the expectation to stay on the
 		 * requested node and not break the policy.
 		 */
-		WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE));
+		WARN_ON_ONCE(policy->mode == MPOL_BIND && (gfp & __GFP_THISNODE));
+		break;
 	}
 
-	return nd;
+	return NULL;
+}
+
+/* Return the node id preferred by the given mempolicy, or the given id */
+static int policy_node(gfp_t gfp, struct mempolicy *policy, int nd)
+{
+	nodemask_t *tmp;
+
+	tmp = policy_preferred_nodes(gfp, policy);
+	if (tmp)
+		return first_node(*tmp);
+	else
+		return nd;
 }
 
 /* Do dynamic interleaving for a process */
-- 
2.27.0