From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>
Subject: [PATCH 00/18] multiple preferred nodes
Date: Fri, 19 Jun 2020 09:23:56 -0700
Message-ID: <20200619162414.1052234-1-ben.widawsky@intel.com>
This patch series introduces the MPOL_PREFERRED_MANY mempolicy mode. The mode
can be used with either the set_mempolicy(2) or mbind(2) interfaces. Like
MPOL_PREFERRED, it allows an application to set a preference for the nodes
which will fulfil memory allocation requests. Like MPOL_BIND, it works over a
set of nodes.
Summary:
1-2: Random fixes I found along the way
3-4: Logic to handle many preferred nodes in page allocation
5-9: Plumbing to allow multiple preferred nodes in mempolicy
10-13: Teach page allocation APIs about nodemasks
14: Provide a helper to generate preferred nodemasks
15: Have page allocation callers generate preferred nodemasks
16-17: Flip the switch to have __alloc_pages_nodemask take preferred mask.
18: Expose the new uapi
Along with these patches are patches for libnuma, numactl, numademo, and memhog.
They still need some polish, but can be found here:
https://gitlab.com/bwidawsk/numactl/-/tree/prefer-many
It allows new usage: `numactl -P 0,3,4`
The goal of the new mode is to enable some use cases for tiered memory usage
models, which I've lovingly named:
1a. The Hare - The interconnect is fast enough to meet bandwidth and latency
requirements allowing preference to be given to all nodes with "fast" memory.
1b. The Indiscriminate Hare - An application knows it wants fast memory (or
perhaps slow memory), but doesn't care which node it runs on. The application
can prefer a set of nodes and then xpu bind to the local node (cpu, accelerator,
etc). This reverses how nodes are chosen today, where the kernel attempts to use
memory local to the CPU whenever possible. Instead, this will attempt to use the
accelerator local to the memory.
2. The Tortoise - The administrator (or the application itself) is aware it only
needs slow memory, and so can prefer that.
Much of this is almost achievable with the bind interface, but the bind
interface suffers from an inability to fall back to another set of nodes if
allocation fails on all nodes in the nodemask.
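To make the difference concrete, here is a small user-space toy model. It is
not kernel code and not part of this series; the names toy_alloc_page,
TOY_BIND and TOY_PREFERRED_MANY are made up for illustration, and picking the
lowest-numbered node with free pages is a simplifying assumption (the real
allocator walks zonelists and may reclaim before giving up).

```c
#include <assert.h>

#define TOY_MAX_NODES 8

/* Hypothetical stand-ins for the two policies being compared. */
enum toy_policy { TOY_BIND, TOY_PREFERRED_MANY };

/*
 * Satisfy a one-page request and return the node used, or -1 on failure.
 * free_pages[i] is the number of free pages on node i; bit i of mask is set
 * if node i is in the policy's nodemask.
 */
int toy_alloc_page(enum toy_policy policy, unsigned long mask,
		   unsigned int free_pages[TOY_MAX_NODES])
{
	int nid;

	/* The toy model only knows about TOY_MAX_NODES nodes. */
	assert((mask >> TOY_MAX_NODES) == 0);

	/* First pass: only nodes in the policy's nodemask are eligible. */
	for (nid = 0; nid < TOY_MAX_NODES; nid++) {
		if ((mask & (1UL << nid)) && free_pages[nid] > 0) {
			free_pages[nid]--;
			return nid;
		}
	}

	/* Bind has nowhere else to go once its nodemask is exhausted. */
	if (policy == TOY_BIND)
		return -1;

	/* Preferred-many: fall back to any node that still has memory. */
	for (nid = 0; nid < TOY_MAX_NODES; nid++) {
		if (free_pages[nid] > 0) {
			free_pages[nid]--;
			return nid;
		}
	}

	return -1;
}
```

With nodes 0 and 1 in the mask and free memory only on node 2, the toy bind
policy fails while the toy preferred-many policy lands on node 2.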
Like MPOL_BIND, a nodemask is given. This inherently removes ordering from the
preference.
> /* Set first two nodes as preferred in an 8 node system. */
> const unsigned long nodes = 0x3;
> set_mempolicy(MPOL_PREFERRED_MANY, &nodes, 8);
> /* Mimic interleave policy, but have fallback. */
> const unsigned long nodes = 0xaa;
> set_mempolicy(MPOL_PREFERRED_MANY, &nodes, 8);
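For anyone who wants to try this from userspace without the libnuma patches,
a rough sketch follows. The helpers nodemask_from_list and prefer_nodes are
hypothetical names, not part of this series or of libnuma. The numeric value
of MPOL_PREFERRED_MANY is not stated in this cover letter; 5 is the value
later mainline kernels use and should be treated as an assumption here. The
sketch goes through the raw syscall and only handles nodes that fit in one
unsigned long.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: value taken from later mainline uapi headers, not this letter. */
#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5
#endif

/* Number of node bits that fit in the single-word mask used below. */
#define NODEMASK_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Build a single-word nodemask from a list of node ids. */
unsigned long nodemask_from_list(const int *nodes, size_t count)
{
	unsigned long mask = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		assert(nodes[i] >= 0 && (size_t)nodes[i] < NODEMASK_BITS);
		mask |= 1UL << nodes[i];
	}

	return mask;
}

/*
 * Ask the kernel to prefer the given nodes for the calling thread.
 * Returns 0 on success, or -1 with errno set (for example EINVAL on a
 * kernel that does not know the mode).
 */
long prefer_nodes(const int *nodes, size_t count)
{
	unsigned long mask = nodemask_from_list(nodes, count);

	return syscall(SYS_set_mempolicy, MPOL_PREFERRED_MANY, &mask,
		       (unsigned long)NODEMASK_BITS);
}
```

Passing the list {0, 1} builds the 0x3 mask from the first snippet above, and
{1, 3, 5, 7} builds the 0xaa mask from the second.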
Some internal discussion took place around the interface. There are two
alternatives which we have discussed, plus one I stuck in:
1. Ordered list of nodes. Currently it's believed that the added complexity is
not needed for expected use cases.
2. A flag for bind to allow falling back to other nodes. This confuses the
notion of binding and is less flexible than the current solution.
3. Create flags or new modes that help with some ordering. This offers both a
friendlier API as well as a solution for more customized usage. It's unknown
if it's worth the complexity to support this. Here is sample code for how
this might work:
> // Default
> set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_PREFER_ORDER_SOCKET, NULL, 0);
> // which is the same as
> set_mempolicy(MPOL_DEFAULT, NULL, 0);
>
> // The Hare
> set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_PREFER_ORDER_TYPE, NULL, 0);
>
> // The Tortoise
> set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_PREFER_ORDER_TYPE_REV, NULL, 0);
>
> // Prefer the fast memory of the first two sockets
> set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_PREFER_ORDER_TYPE, -1, 2);
>
> // Prefer specific nodes for something wacky
> set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_PREFER_ORDER_TYPE_CUSTOM, 0x17c, 1024);
---
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Li Xinhai <lixinhai.lxh@gmail.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Ben Widawsky (14):
  mm/mempolicy: Add comment for missing LOCAL
  mm/mempolicy: Use node_mem_id() instead of node_id()
  mm/page_alloc: start plumbing multi preferred node
  mm/page_alloc: add preferred pass to page allocation
  mm: Finish handling MPOL_PREFERRED_MANY
  mm: clean up alloc_pages_vma (thp)
  mm: Extract THP hugepage allocation
  mm/mempolicy: Use __alloc_page_node for interleaved
  mm: kill __alloc_pages
  mm/mempolicy: Introduce policy_preferred_nodes()
  mm: convert callers of __alloc_pages_nodemask to pmask
  alloc_pages_nodemask: turn preferred nid into a nodemask
  mm: Use less stack for page allocations
  mm/mempolicy: Advertise new MPOL_PREFERRED_MANY
Dave Hansen (4):
  mm/mempolicy: convert single preferred_node to full nodemask
  mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes
  mm/mempolicy: allow preferred code to take a nodemask
  mm/mempolicy: refactor rebind code for PREFERRED_MANY
 .../admin-guide/mm/numa_memory_policy.rst |  22 +-
 include/linux/gfp.h                       |  19 +-
 include/linux/mempolicy.h                 |   4 +-
 include/linux/migrate.h                   |   4 +-
 include/linux/mmzone.h                    |   3 +
 include/uapi/linux/mempolicy.h            |   6 +-
 mm/hugetlb.c                              |  10 +-
 mm/internal.h                             |   1 +
 mm/mempolicy.c                            | 271 +++++++++++++-----
 mm/page_alloc.c                           | 179 +++++++++++-
 10 files changed, 403 insertions(+), 116 deletions(-)
--
2.27.0