From: Ben Widawsky <ben.widawsky@intel.com>
To: linux-mm <linux-mm@kvack.org>
Cc: Ben Widawsky <ben.widawsky@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Michal Hocko <mhocko@kernel.org>
Subject: [PATCH v2 RESEND 00/12] Introduced multi-preference mempolicy
Date: Fri, 30 Oct 2020 12:02:26 -0700
Message-ID: <20201030190238.306764-1-ben.widawsky@intel.com>

Significant changes since last send:
None. Just a rebase and conflict resolution.
Using get_maintainer.pl for --to and --cc this time.

Significant changes since v1:
* Dropped patch to replace numa_node_id in some places (mhocko)
* Dropped all the page allocation patches in favor of new mechanism to use
  fallbacks. (mhocko)
* Dropped the special snowflake preferred node algorithm (bwidawsk)
* If the preferred node fails, ALL nodes are rechecked instead of just the
  non-preferred nodes.

In v1, Andi Kleen brought up reusing MPOL_PREFERRED as the mode for the API.
There wasn't consensus around this, so I've left the existing API as it was. I'm
open to more feedback here, but my slight preference is to use a new API as it
ensures if people are using it, they are entirely aware of what they're doing
and not accidentally misusing the old interface. (In a similar way to how
MPOL_LOCAL was introduced).

In v1, Michal also brought up renaming this MPOL_PREFERRED_MASK. I'm equally
fine with that change, but I hadn't heard much emphatic support for one way or
another, so I've left that too.

v2 Summary:
1: Random fix I found along the way
2-5: Represent node preference as a mask internally
6-7: Treat many preferred like bind
8-11: Handle page allocation for the new policy
12: Enable the uapi

This patch series introduces the concept of the MPOL_PREFERRED_MANY mempolicy.
This mempolicy mode can be used with either the set_mempolicy(2) or mbind(2)
interfaces. Like the MPOL_PREFERRED interface, it allows an application to set a
preference for nodes which will fulfil memory allocation requests. Unlike the
MPOL_PREFERRED mode, it takes a set of nodes. Like the MPOL_BIND interface, it
works over a set of nodes. Unlike MPOL_BIND, it will not cause a SIGSEGV or
invoke the OOM killer if those preferred nodes are not available.

Along with these patches are patches for libnuma, numactl, numademo, and memhog.
They still need some polish, but can be found here:
https://gitlab.com/bwidawsk/numactl/-/tree/prefer-many
It allows new usage: `numactl -P 0,3,4`
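
As a rough illustration of how a node list such as `0,3,4` could map onto the
nodemask the new mode takes, here is a minimal sketch. parse_node_list() is a
hypothetical helper written for this example, not code from numactl or libnuma;
it only handles comma-separated decimal node IDs that fit in one unsigned long
and does not attempt ranges or any other syntax.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of numactl/libnuma): turn a list such as
 * "0,3,4" into a single-word nodemask. Returns 0 on success and -1 on
 * malformed input or on a node ID that does not fit in the word.
 */
int parse_node_list(const char *s, unsigned long *mask_out)
{
    const unsigned long bits = sizeof(unsigned long) * 8;
    unsigned long mask = 0;

    if (s == NULL || mask_out == NULL || *s == '\0')
        return -1;

    for (;;) {
        char *end;
        unsigned long node;

        errno = 0;
        node = strtoul(s, &end, 10);
        /* No digits consumed, overflow, or ID too large for one word. */
        if (end == s || errno != 0 || node >= bits)
            return -1;

        mask |= 1UL << node;

        if (*end == '\0')
            break;
        if (*end != ',')
            return -1;
        s = end + 1;
    }

    /* At least one node was parsed, so at least one bit is set. */
    assert(mask != 0);
    *mask_out = mask;
    return 0;
}
```

With this sketch, "0,3,4" yields a mask with bits 0, 3 and 4 set (0x19).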

The goal of the new mode is to enable some use-cases for tiered memory usage
models, which I've lovingly named:
1a. The Hare - The interconnect is fast enough to meet bandwidth and latency
    requirements allowing preference to be given to all nodes with "fast" memory.
1b. The Indiscriminate Hare - An application knows it wants fast memory (or
    perhaps slow memory), but doesn't care which node it runs on. The
    application can prefer a set of nodes and then xpu bind to the local node
    (cpu, accelerator, etc). This reverses how nodes are chosen today, where
    the kernel attempts to use memory local to the CPU whenever possible.
    Instead, this will attempt to use the accelerator local to the memory.
2. The Tortoise - The administrator (or the application itself) is aware it only
   needs slow memory, and so can prefer that.

Much of this is almost achievable with the bind interface, but the bind
interface suffers from an inability to fall back to another set of nodes if
binding fails to all nodes in the nodemask.
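
To make the difference concrete, here is a toy user-space model of the two
behaviours. It is only a sketch under simplifying assumptions, not the kernel
implementation: toy_pick_node(), the toy_policy enum and the free_pages array
are all invented for this example, and "lowest-numbered node with free pages"
stands in for the kernel's real node selection order.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the two policies being compared (names are made up). */
enum toy_policy { TOY_BIND, TOY_PREFERRED_MANY };

/*
 * Pick a node for one allocation. free_pages[n] is the number of free
 * pages on node n. Returns the chosen node, or -1 if allocation fails.
 */
int toy_pick_node(enum toy_policy policy, unsigned long mask,
                  const unsigned long *free_pages, size_t nr_nodes)
{
    const size_t bits = sizeof(unsigned long) * 8;
    size_t n;

    assert(free_pages != NULL);
    if (nr_nodes > bits)
        nr_nodes = bits; /* the toy mask is a single word */

    /* First pass: only nodes named in the mask. */
    for (n = 0; n < nr_nodes; n++)
        if (((mask >> n) & 1UL) && free_pages[n] > 0)
            return (int)n;

    /* Bind has nowhere else to go once the masked nodes are exhausted. */
    if (policy == TOY_BIND)
        return -1;

    /* Preferred-many: recheck ALL nodes rather than failing. */
    for (n = 0; n < nr_nodes; n++)
        if (free_pages[n] > 0)
            return (int)n;

    return -1;
}
```

With nodes 0 and 1 exhausted and mask 0x3, the bind-like policy fails while the
preferred-many-like policy falls back to node 2.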

Like MPOL_BIND a nodemask is given. Inherently this removes ordering from the
preference.

> /* Set first two nodes as preferred in an 8 node system. */
> const unsigned long nodes = 0x3;
> set_mempolicy(MPOL_PREFERRED_MANY, &nodes, 8);

> /* Mimic interleave policy, but have fallback. */
> const unsigned long nodes = 0xaa;
> set_mempolicy(MPOL_PREFERRED_MANY, &nodes, 8);
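
For readers who prefer not to hand-compute the hex masks used above, the
following sketch builds them from node IDs. nodemask_from_ids() is a
hypothetical helper for illustration only; it assumes the whole mask fits in a
single unsigned long, which holds for the 8 node examples here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: set one bit per node ID. IDs that do not fit in a
 * single unsigned long are silently skipped in this sketch.
 */
unsigned long nodemask_from_ids(const unsigned int *ids, size_t count)
{
    const unsigned int bits = (unsigned int)(sizeof(unsigned long) * 8);
    unsigned long mask = 0;
    size_t i;

    assert(ids != NULL || count == 0);

    for (i = 0; i < count; i++)
        if (ids[i] < bits)
            mask |= 1UL << ids[i];

    return mask;
}
```

Nodes {0, 1} give 0x3, and the odd nodes {1, 3, 5, 7} give 0xaa, matching the
two masks passed to set_mempolicy() above.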

Some internal discussion took place around the interface. There are two
alternatives which we have discussed, plus one I stuck in:
1. Ordered list of nodes. Currently it's believed that the added complexity is
   not needed for expected usecases.
2. A flag for bind to allow falling back to other nodes. This confuses the
   notion of binding and is less flexible than the current solution.
3. Create flags or new modes that help with some ordering. This offers both a
   friendlier API as well as a solution for more customized usage. It's unknown
   if it's worth the complexity to support this. Here is sample code for how
   this might work:

> // Prefer specific nodes for something wacky
> set_mempolicy(MPOL_PREFERRED_MANY, 0x17c, 1024);
>
> // Default
> set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_PREFER_ORDER_SOCKET, NULL, 0);
> // which is the same as
> set_mempolicy(MPOL_DEFAULT, NULL, 0);
>
> // The Hare
> set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_PREFER_ORDER_TYPE, NULL, 0);
>
> // The Tortoise
> set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_PREFER_ORDER_TYPE_REV, NULL, 0);
>
> // Prefer the fast memory of the first two sockets
> set_mempolicy(MPOL_PREFERRED_MANY | MPOL_F_PREFER_ORDER_TYPE, -1, 2);

Ben Widawsky (8):
  mm/mempolicy: Add comment for missing LOCAL
  mm/mempolicy: kill v.preferred_nodes
  mm/mempolicy: handle MPOL_PREFERRED_MANY like BIND
  mm/mempolicy: Create a page allocator for policy
  mm/mempolicy: Thread allocation for many preferred
  mm/mempolicy: VMA allocation for many preferred
  mm/mempolicy: huge-page allocation for many preferred
  mm/mempolicy: Advertise new MPOL_PREFERRED_MANY

Dave Hansen (4):
  mm/mempolicy: convert single preferred_node to full nodemask
  mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes
  mm/mempolicy: allow preferred code to take a nodemask
  mm/mempolicy: refactor rebind code for PREFERRED_MANY

 .../admin-guide/mm/numa_memory_policy.rst     |  22 +-
 include/linux/mempolicy.h                     |   6 +-
 include/uapi/linux/mempolicy.h                |   6 +-
 mm/hugetlb.c                                  |  20 +-
 mm/mempolicy.c                                | 271 ++++++++++++------
 5 files changed, 220 insertions(+), 105 deletions(-)

-- 
2.29.2



Thread overview: 15+ messages
2020-10-30 19:02 Ben Widawsky [this message]
2020-10-30 19:02 ` [PATCH 01/12] mm/mempolicy: Add comment for missing LOCAL Ben Widawsky
2020-10-30 19:02 ` [PATCH 02/12] mm/mempolicy: convert single preferred_node to full nodemask Ben Widawsky
2020-10-30 19:02 ` [PATCH 03/12] mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes Ben Widawsky
2020-10-30 19:02 ` [PATCH 04/12] mm/mempolicy: allow preferred code to take a nodemask Ben Widawsky
2021-01-03 13:34   ` [mm/mempolicy] 5ef9c2a53c: Kernel_panic-not_syncing:stack-protector:Kernel_stack_is_corrupted_in:mpol_new_preferred kernel test robot
2021-01-03 13:34     ` kernel test robot
2020-10-30 19:02 ` [PATCH 05/12] mm/mempolicy: refactor rebind code for PREFERRED_MANY Ben Widawsky
2020-10-30 19:02 ` [PATCH 06/12] mm/mempolicy: kill v.preferred_nodes Ben Widawsky
2020-10-30 19:02 ` [PATCH 07/12] mm/mempolicy: handle MPOL_PREFERRED_MANY like BIND Ben Widawsky
2020-10-30 19:02 ` [PATCH 08/12] mm/mempolicy: Create a page allocator for policy Ben Widawsky
2020-10-30 19:02 ` [PATCH 09/12] mm/mempolicy: Thread allocation for many preferred Ben Widawsky
2020-10-30 19:02 ` [PATCH 10/12] mm/mempolicy: VMA " Ben Widawsky
2020-10-30 19:02 ` [PATCH 11/12] mm/mempolicy: huge-page " Ben Widawsky
2020-10-30 19:02 ` [PATCH 12/12] mm/mempolicy: Advertise new MPOL_PREFERRED_MANY Ben Widawsky
