From: Michal Hocko <mhocko@kernel.org>
To: Ben Widawsky <ben.widawsky@intel.com>
Cc: linux-mm <linux-mm@kvack.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Lee Schermerhorn <lee.schermerhorn@hp.com>
Subject: Re: [PATCH 02/18] mm/mempolicy: Use node_mem_id() instead of node_id()
Date: Wed, 24 Jun 2020 10:25:59 +0200	[thread overview]
Message-ID: <20200624082559.GE1320@dhcp22.suse.cz> (raw)
In-Reply-To: <20200619162425.1052382-3-ben.widawsky@intel.com>

On Fri 19-06-20 09:24:09, Ben Widawsky wrote:
> Calling out some distinctions first as I understand it, and the
> reasoning of the patch:
> numa_node_id() - The node id for the currently running CPU.
> numa_mem_id() - The node id for the closest memory node.

Correct
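
For reference, a rough sketch of the distinction (modeled from memory on
include/linux/topology.h; the exact config guards and the per-CPU
variable name may differ between kernel versions):

  #ifdef CONFIG_HAVE_MEMORYLESS_NODES
  /* per-CPU cache of the nearest node that actually has memory */
  DECLARE_PER_CPU(int, _numa_mem_);
  static inline int numa_mem_id(void)
  {
          return __this_cpu_read(_numa_mem_);
  }
  #else
  /* without memoryless-node support the two are the same thing */
  static inline int numa_mem_id(void)
  {
          return numa_node_id();
  }
  #endif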

> The case where they are not the same is CONFIG_HAVE_MEMORYLESS_NODES.
> Only ia64 and powerpc support this option, so it is perhaps not a very
> interesting situation to most.

Other arches can have nodes without any memory as well. Just offline all
the managed memory via hotplug... (please note that such a node might
still have memory present! It is just not usable by the page allocator.)

> The question is, when you do want which?

The short answer is that you shouldn't really care. The fact that we do,
and that we have a distinct API for it, is IMHO a mistake of the past.

A slightly longer answer would be that the allocator should fall back to
a proper node (or nodes) even if you specify a memoryless node as the
preferred one. That is achieved by a proper zonelist for each existing
NUMA node. There are bugs here and there when some nodes do not get their
zonelists initialized, but fundamentally this should be a no-brainer.
There are also corner cases where an application might have been bound to
a node which went offline completely, which would be "fun".
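
To make the fallback concrete, a purely illustrative sketch of the
zonelist walk (not the exact get_page_from_freelist() code;
try_alloc_from() is a made-up placeholder for the per-zone attempt):

  struct zonelist *zonelist = node_zonelist(preferred_nid, gfp_mask);
  struct zoneref *z;
  struct zone *zone;

  /*
   * A memoryless node contributes no populated zones to its own
   * zonelist, so the walk starts directly at the zones of the
   * closest node that does have memory.
   */
  for_each_zone_zonelist_nodemask(zone, z, zonelist,
                                  gfp_zone(gfp_mask), nodemask) {
          page = try_alloc_from(zone, order, gfp_mask);
          if (page)
                  break;
  }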

> numa_node_id() is definitely
> what's desired if MPOL_PREFERRED, or MPOL_LOCAL were used, since the ABI
> states "This mode specifies "local allocation"; the memory is allocated
> on the node of the CPU that triggered the allocation (the "local
> node")."

In fact, from the allocator's point of view there is no real difference,
because there is obviously nothing to allocate from a node without
memory, so it would fall back to the next node/zones from the closest
node...

> It would be weird, though not impossible to set this policy on
> a CPU that has memoryless nodes.

Keep in mind that the memory might have gone away via hotplug.

> A more likely way to hit this is with
> interleaving. The current interfaces will return some equally weird
> thing, but at least it's symmetric. Therefore, in cases where the node
> is being queried for the currently running process, it probably makes
> sense to use numa_node_id(). For other cases, however, when the CPU is trying
> to obtain the "local" memory, numa_mem_id() already contains this and
> should be used instead.
> 
> This really should only affect configurations where
> CONFIG_HAVE_MEMORYLESS_NODES=y, and even on those machines it's quite
> possible the ultimate behavior would be identical.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
> Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>

Well, mempolicy.c uses numa_node_id in most cases and I do not see these
two being special. So if we want to change that it should be done
consistently. I even suspect that these changes are mostly no-ops because
the respective zonelists will do the right thing, but there might be land
mines here and there - e.g. if __GFP_THISNODE was used then somebody
might expect a failure rather than a misplaced allocation, because there
is no other fallback mechanism on a depleted NUMA node.
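
A hypothetical illustration of such a land mine (my own example, not
code from the patch):

  struct page *page;

  /*
   * Suppose the local node is memoryless. The first call fails because
   * __GFP_THISNODE forbids any fallback and the node has nothing to
   * give; after the conversion the second call quietly succeeds on the
   * closest node with memory instead.
   */
  page = alloc_pages_node(numa_node_id(),
                          GFP_KERNEL | __GFP_THISNODE, 0);
  page = alloc_pages_node(numa_mem_id(),
                          GFP_KERNEL | __GFP_THISNODE, 0);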

All that being said, I am not sure this is an actual improvement.

> ---
>  mm/mempolicy.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 36ee3267c25f..99e0f3f9c4a6 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -1991,7 +1991,7 @@ static unsigned offset_il_node(struct mempolicy *pol, unsigned long n)
>  	int nid;
>  
>  	if (!nnodes)
> -		return numa_node_id();
> +		return numa_mem_id();
>  	target = (unsigned int)n % nnodes;
>  	nid = first_node(pol->v.nodes);
>  	for (i = 0; i < target; i++)
> @@ -2049,7 +2049,7 @@ int huge_node(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags,
>  		nid = interleave_nid(*mpol, vma, addr,
>  					huge_page_shift(hstate_vma(vma)));
>  	} else {
> -		nid = policy_node(gfp_flags, *mpol, numa_node_id());
> +		nid = policy_node(gfp_flags, *mpol, numa_mem_id());
>  		if ((*mpol)->mode == MPOL_BIND)
>  			*nodemask = &(*mpol)->v.nodes;
>  	}
> -- 
> 2.27.0
> 

-- 
Michal Hocko
SUSE Labs


Thread overview: 44+ messages
2020-06-19 16:24 [PATCH 00/18] multiple preferred nodes Ben Widawsky
2020-06-19 16:24 ` [PATCH 01/18] mm/mempolicy: Add comment for missing LOCAL Ben Widawsky
2020-06-24  7:55   ` Michal Hocko
2020-06-19 16:24 ` [PATCH 02/18] mm/mempolicy: Use node_mem_id() instead of node_id() Ben Widawsky
2020-06-24  8:25   ` Michal Hocko [this message]
2020-06-24 16:48     ` Ben Widawsky
2020-06-26 12:30       ` Michal Hocko
2020-06-19 16:24 ` [PATCH 03/18] mm/page_alloc: start plumbing multi preferred node Ben Widawsky
2020-06-19 16:24 ` [PATCH 04/18] mm/page_alloc: add preferred pass to page allocation Ben Widawsky
2020-06-19 16:24 ` [PATCH 05/18] mm/mempolicy: convert single preferred_node to full nodemask Ben Widawsky
2020-06-19 16:24 ` [PATCH 06/18] mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes Ben Widawsky
2020-06-19 16:24 ` [PATCH 07/18] mm/mempolicy: allow preferred code to take a nodemask Ben Widawsky
2020-06-19 16:24 ` [PATCH 08/18] mm/mempolicy: refactor rebind code for PREFERRED_MANY Ben Widawsky
2020-06-19 16:24 ` [PATCH 09/18] mm: Finish handling MPOL_PREFERRED_MANY Ben Widawsky
2020-06-19 16:24 ` [PATCH 10/18] mm: clean up alloc_pages_vma (thp) Ben Widawsky
2020-06-19 16:24 ` [PATCH 11/18] mm: Extract THP hugepage allocation Ben Widawsky
2020-06-19 16:24 ` [PATCH 12/18] mm/mempolicy: Use __alloc_page_node for interleaved Ben Widawsky
2020-06-19 16:24 ` [PATCH 13/18] mm: kill __alloc_pages Ben Widawsky
2020-06-19 16:24 ` [PATCH 14/18] mm/mempolicy: Introduce policy_preferred_nodes() Ben Widawsky
2020-06-19 16:24 ` [PATCH 15/18] mm: convert callers of __alloc_pages_nodemask to pmask Ben Widawsky
2020-06-19 16:24 ` [PATCH 16/18] alloc_pages_nodemask: turn preferred nid into a nodemask Ben Widawsky
2020-06-19 16:24 ` [PATCH 17/18] mm: Use less stack for page allocations Ben Widawsky
2020-06-19 16:24 ` [PATCH 18/18] mm/mempolicy: Advertise new MPOL_PREFERRED_MANY Ben Widawsky
2020-06-22  7:09 ` [PATCH 00/18] multiple preferred nodes Michal Hocko
2020-06-23 11:20   ` Michal Hocko
2020-06-23 16:12     ` Ben Widawsky
2020-06-24  7:52       ` Michal Hocko
2020-06-24 16:16         ` Ben Widawsky
2020-06-24 18:39           ` Michal Hocko
2020-06-24 19:37             ` Ben Widawsky
2020-06-24 19:51               ` Michal Hocko
2020-06-24 20:01                 ` Ben Widawsky
2020-06-24 20:07                   ` Michal Hocko
2020-06-24 20:23                     ` Ben Widawsky
2020-06-24 20:42                       ` Michal Hocko
2020-06-24 20:55                         ` Ben Widawsky
2020-06-25  6:28                           ` Michal Hocko
2020-06-26 21:39         ` Ben Widawsky
2020-06-29 10:16           ` Michal Hocko
2020-06-22 20:54 ` Andi Kleen
2020-06-22 21:02   ` Ben Widawsky
2020-06-22 21:07   ` Dave Hansen
2020-06-22 22:02     ` Andi Kleen
