All of lore.kernel.org
 help / color / mirror / Atom feed
From: Feng Tang <feng.tang@intel.com>
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	David Rientjes <rientjes@google.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Dave Hansen <dave.hansen@intel.com>,
	Ben Widawsky <ben.widawsky@intel.com>,
	Andi leen <ak@linux.intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Feng Tang <feng.tang@intel.com>
Subject: [PATCH v3 12/14] mm/mempolicy: Advertise new MPOL_PREFERRED_MANY
Date: Wed,  3 Mar 2021 18:20:56 +0800	[thread overview]
Message-ID: <1614766858-90344-13-git-send-email-feng.tang@intel.com> (raw)
In-Reply-To: <1614766858-90344-1-git-send-email-feng.tang@intel.com>

From: Ben Widawsky <ben.widawsky@intel.com>

Adds a new mode to the existing mempolicy modes, MPOL_PREFERRED_MANY.

MPOL_PREFERRED_MANY will be adequately documented in the internal
admin-guide with this patch. Eventually, the man pages for mbind(2),
get_mempolicy(2), set_mempolicy(2) and numactl(8) will also have text
about this mode.  Those shall contain the canonical reference.

NUMA systems continue to become more prevalent. New technologies like
PMEM make finer grain control over memory access patterns increasingly
desirable. MPOL_PREFERRED_MANY allows userspace to specify a set of
nodes that will be tried first when performing allocations. If those
allocations fail, all remaining nodes will be tried. It's a straight
forward API which solves many of the presumptive needs of system
administrators wanting to optimize workloads on such machines. The mode
will work either per VMA, or per thread.

Generally speaking, this is similar to the way MPOL_BIND works, except
the user will only get a SIGSEGV if all nodes in the system are unable
to satisfy the allocation request.

v3: fix a typo of checking policy (Feng)

Link: https://lore.kernel.org/r/20200630212517.308045-13-ben.widawsky@intel.com
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
---
 Documentation/admin-guide/mm/numa_memory_policy.rst | 16 ++++++++++++----
 include/uapi/linux/mempolicy.h                      |  6 +++---
 mm/hugetlb.c                                        |  4 ++--
 mm/mempolicy.c                                      | 14 ++++++--------
 4 files changed, 23 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
index 1ad020c..b69963a 100644
--- a/Documentation/admin-guide/mm/numa_memory_policy.rst
+++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
@@ -245,6 +245,14 @@ MPOL_INTERLEAVED
 	address range or file.  During system boot up, the temporary
 	interleaved system default policy works in this mode.
 
+MPOL_PREFERRED_MANY
+        This mode specifies that the allocation should be attempted from the
+        nodemask specified in the policy. If that allocation fails, the kernel
+        will search other nodes, in order of increasing distance from the first
+        set bit in the nodemask based on information provided by the platform
+        firmware. It is similar to MPOL_PREFERRED with the main exception that
+        is is an error to have an empty nodemask.
+
 NUMA memory policy supports the following optional mode flags:
 
 MPOL_F_STATIC_NODES
@@ -253,10 +261,10 @@ MPOL_F_STATIC_NODES
 	nodes changes after the memory policy has been defined.
 
 	Without this flag, any time a mempolicy is rebound because of a
-	change in the set of allowed nodes, the node (Preferred) or
-	nodemask (Bind, Interleave) is remapped to the new set of
-	allowed nodes.  This may result in nodes being used that were
-	previously undesired.
+        change in the set of allowed nodes, the preferred nodemask (Preferred
+        Many), preferred node (Preferred) or nodemask (Bind, Interleave) is
+        remapped to the new set of allowed nodes.  This may result in nodes
+        being used that were previously undesired.
 
 	With this flag, if the user-specified nodes overlap with the
 	nodes allowed by the task's cpuset, then the memory policy is
diff --git a/include/uapi/linux/mempolicy.h b/include/uapi/linux/mempolicy.h
index 3354774..ad3eee6 100644
--- a/include/uapi/linux/mempolicy.h
+++ b/include/uapi/linux/mempolicy.h
@@ -16,13 +16,13 @@
  */
 
 /* Policies */
-enum {
-	MPOL_DEFAULT,
+enum { MPOL_DEFAULT,
 	MPOL_PREFERRED,
 	MPOL_BIND,
 	MPOL_INTERLEAVE,
 	MPOL_LOCAL,
-	MPOL_MAX,	/* always last member of enum */
+	MPOL_PREFERRED_MANY,
+	MPOL_MAX, /* always last member of enum */
 };
 
 /* Flags for set_mempolicy */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c7c9ef3..60a0d57 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1131,7 +1131,7 @@ static struct page *dequeue_huge_page_vma(struct hstate *h,
 
 	gfp_mask = htlb_alloc_mask(h);
 	nid = huge_node(vma, address, gfp_mask, &mpol, &nodemask);
-	if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+	if (mpol->mode == MPOL_PREFERRED_MANY) {
 		page = dequeue_huge_page_nodemask(h,
 				gfp_mask | __GFP_RETRY_MAYFAIL | __GFP_NOWARN,
 				nid, nodemask);
@@ -1943,7 +1943,7 @@ struct page *alloc_buddy_huge_page_with_mpol(struct hstate *h,
 	nodemask_t *nodemask;
 
 	nid = huge_node(vma, addr, gfp_mask, &mpol, &nodemask);
-	if (mpol->mode != MPOL_BIND && nodemask) { /* AKA MPOL_PREFERRED_MANY */
+	if (mpol->mode == MPOL_PREFERRED_MANY) {
 		page = alloc_surplus_huge_page(h,
 				gfp_mask | __GFP_RETRY_MAYFAIL | __GFP_NOWARN,
 				nid, nodemask);
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f9b2167..1438d58 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -108,8 +108,6 @@
 
 #include "internal.h"
 
-#define MPOL_PREFERRED_MANY MPOL_MAX
-
 /* Internal flags */
 #define MPOL_MF_DISCONTIG_OK (MPOL_MF_INTERNAL << 0)	/* Skip checks for continuous vmas */
 #define MPOL_MF_INVERT (MPOL_MF_INTERNAL << 1)		/* Invert check for nodemask */
@@ -180,7 +178,7 @@ struct mempolicy *get_task_policy(struct task_struct *p)
 static const struct mempolicy_operations {
 	int (*create)(struct mempolicy *pol, const nodemask_t *nodes);
 	void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes);
-} mpol_ops[MPOL_MAX + 1];
+} mpol_ops[MPOL_MAX];
 
 static inline int mpol_store_user_nodemask(const struct mempolicy *pol)
 {
@@ -389,8 +387,8 @@ static void mpol_rebind_preferred_common(struct mempolicy *pol,
 }
 
 /* MPOL_PREFERRED_MANY allows multiple nodes to be set in 'nodes' */
-static void __maybe_unused mpol_rebind_preferred_many(struct mempolicy *pol,
-						      const nodemask_t *nodes)
+static void mpol_rebind_preferred_many(struct mempolicy *pol,
+				       const nodemask_t *nodes)
 {
 	mpol_rebind_preferred_common(pol, nodes, nodes);
 }
@@ -452,7 +450,7 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new)
 	mmap_write_unlock(mm);
 }
 
-static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = {
+static const struct mempolicy_operations mpol_ops[MPOL_MAX] = {
 	[MPOL_DEFAULT] = {
 		.rebind = mpol_rebind_default,
 	},
@@ -470,8 +468,8 @@ static const struct mempolicy_operations mpol_ops[MPOL_MAX + 1] = {
 	},
 	/* [MPOL_LOCAL] - see mpol_new() */
 	[MPOL_PREFERRED_MANY] = {
-		.create = NULL,
-		.rebind = NULL,
+		.create = mpol_new_preferred_many,
+		.rebind = mpol_rebind_preferred_many,
 	},
 };
 
-- 
2.7.4


  parent reply	other threads:[~2021-03-03 15:59 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-03 10:20 [PATCH v3 00/14] Introduced multi-preference mempolicy Feng Tang
2021-03-03 10:20 ` [PATCH v3 01/14] mm/mempolicy: Add comment for missing LOCAL Feng Tang
2021-03-10  6:27   ` Feng Tang
2021-03-03 10:20 ` [PATCH v3 02/14] mm/mempolicy: convert single preferred_node to full nodemask Feng Tang
2021-03-03 10:20 ` [PATCH v3 03/14] mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes Feng Tang
2021-03-03 10:20 ` [PATCH v3 04/14] mm/mempolicy: allow preferred code to take a nodemask Feng Tang
2021-03-03 10:20 ` [PATCH v3 05/14] mm/mempolicy: refactor rebind code for PREFERRED_MANY Feng Tang
2021-03-03 10:20 ` [PATCH v3 06/14] mm/mempolicy: kill v.preferred_nodes Feng Tang
2021-03-03 10:20 ` [PATCH v3 07/14] mm/mempolicy: handle MPOL_PREFERRED_MANY like BIND Feng Tang
2021-03-03 10:20 ` [PATCH v3 08/14] mm/mempolicy: Create a page allocator for policy Feng Tang
2021-03-03 10:20 ` [PATCH v3 09/14] mm/mempolicy: Thread allocation for many preferred Feng Tang
2021-03-03 10:20 ` [PATCH v3 10/14] mm/mempolicy: VMA " Feng Tang
2021-03-03 10:20 ` [PATCH v3 11/14] mm/mempolicy: huge-page " Feng Tang
2021-03-03 10:20 ` Feng Tang [this message]
2021-03-03 10:20 ` [PATCH v3 13/14] mem/mempolicy: unify mpol_new_preferred() and mpol_new_preferred_many() Feng Tang
2021-03-03 10:20 ` [PATCH v3 RFC 14/14] mm: speedup page alloc for MPOL_PREFERRED_MANY by adding a NO_SLOWPATH gfp bit Feng Tang
2021-03-03 11:39   ` Michal Hocko
2021-03-03 12:07     ` Feng Tang
2021-03-03 12:18       ` Feng Tang
2021-03-03 12:32         ` Michal Hocko
2021-03-03 13:18           ` Feng Tang
2021-03-03 13:46             ` Feng Tang
2021-03-03 13:59               ` Michal Hocko
2021-03-03 16:31                 ` Ben Widawsky
2021-03-03 16:48                   ` Dave Hansen
2021-03-10  5:19                     ` Feng Tang
2021-03-10  9:44                       ` Michal Hocko
2021-03-10 11:49                         ` Feng Tang
2021-03-03 17:14                   ` Michal Hocko
2021-03-03 17:22                     ` Ben Widawsky
2021-03-04  8:14                       ` Feng Tang
2021-03-04 12:59                         ` Michal Hocko
2021-03-05  2:21                           ` Feng Tang
2021-03-04 12:57                       ` Michal Hocko
2021-03-03 13:53             ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1614766858-90344-13-git-send-email-feng.tang@intel.com \
    --to=feng.tang@intel.com \
    --cc=aarcange@redhat.com \
    --cc=ak@linux.intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=ben.widawsky@intel.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=mike.kravetz@oracle.com \
    --cc=rdunlap@infradead.org \
    --cc=rientjes@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.