From: Mike Rapoport <rppt@linux.ibm.com>
To: Pingfan Liu <kernelfans@gmail.com>
Cc: x86@kernel.org, linux-mm@kvack.org,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mike Rapoport <rppt@linux.vnet.ibm.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@suse.de>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Andy Lutomirski <luto@kernel.org>,
	Andi Kleen <ak@linux.intel.com>, Petr Tesarik <ptesarik@suse.cz>,
	Michal Hocko <mhocko@suse.com>,
	Stephen Rothwell <sfr@canb.auug.org.au>,
	Jonathan Corbet <corbet@lwn.net>,
	Nicholas Piggin <npiggin@gmail.com>,
	Daniel Vacek <neelx@redhat.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 2/6] mm/memblock: make full utilization of numa info
Date: Tue, 26 Feb 2019 13:58:44 +0200	[thread overview]
Message-ID: <20190226115844.GG11981@rapoport-lnx> (raw)
In-Reply-To: <1551011649-30103-3-git-send-email-kernelfans@gmail.com>

On Sun, Feb 24, 2019 at 08:34:05PM +0800, Pingfan Liu wrote:
> There are NUMA machines with memory-less nodes. When allocating memory for
> a memory-less node, the memblock allocator falls back to 'Node 0' instead of
> using the nearest node. This hurts performance, especially for the per-cpu
> section. Fix this by building the full node fallback info for the memblock
> allocator, as is already done for the page allocator.

Is it really necessary to build full node fallback info for memblock and
then rebuild it again for the page allocator?

I think it should be possible to split parts of build_all_zonelists_init()
that do not touch per-cpu areas into a separate function and call that
function after topology detection. Then it would be possible to use
local_memory_node() when calling memblock.
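
To illustrate the kind of lookup meant here, below is a minimal userspace
sketch, not kernel code. It assumes a hypothetical distance table and a
hypothetical per-node "has memory" flag, and picks the closest node with
memory for a given node, which is roughly the answer a
local_memory_node()-style helper would hand to memblock. The names
nearest_memory_node(), node_dist, node_has_memory, NR_NODES and NO_NODE are
made up for the example.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_NODES 4	/* hypothetical node count for this sketch */
#define NO_NODE  (-1)	/* stands in for NUMA_NO_NODE */

/* Hypothetical SLIT-style distances: 10 means local, larger means farther. */
static const int node_dist[NR_NODES][NR_NODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 20, 30 },
	{ 30, 20, 10, 20 },
	{ 40, 30, 20, 10 },
};

/* Assumption for the example: nodes 1 and 3 are memory-less. */
static const bool node_has_memory[NR_NODES] = { true, false, true, false };

/*
 * Return the node with memory that is closest to @nid. A node that has
 * memory is its own answer because the local distance is the smallest.
 * Ties go to the lower node id. Returns NO_NODE if @nid is out of range
 * or no node has memory.
 */
static int nearest_memory_node(int nid)
{
	int best = NO_NODE, best_dist = INT_MAX, n;

	if (nid < 0 || nid >= NR_NODES)
		return NO_NODE;

	for (n = 0; n < NR_NODES; n++) {
		if (!node_has_memory[n])
			continue;
		if (node_dist[nid][n] < best_dist) {
			best_dist = node_dist[nid][n];
			best = n;
		}
	}
	return best;
}
```

With a single lookup like this, a caller could pass the resulting node id to
the existing memblock allocation functions instead of walking a separate
per-node fallback array.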
 
> Signed-off-by: Pingfan Liu <kernelfans@gmail.com>
> CC: Thomas Gleixner <tglx@linutronix.de>
> CC: Ingo Molnar <mingo@redhat.com>
> CC: Borislav Petkov <bp@alien8.de>
> CC: "H. Peter Anvin" <hpa@zytor.com>
> CC: Dave Hansen <dave.hansen@linux.intel.com>
> CC: Vlastimil Babka <vbabka@suse.cz>
> CC: Mike Rapoport <rppt@linux.vnet.ibm.com>
> CC: Andrew Morton <akpm@linux-foundation.org>
> CC: Mel Gorman <mgorman@suse.de>
> CC: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> CC: Andy Lutomirski <luto@kernel.org>
> CC: Andi Kleen <ak@linux.intel.com>
> CC: Petr Tesarik <ptesarik@suse.cz>
> CC: Michal Hocko <mhocko@suse.com>
> CC: Stephen Rothwell <sfr@canb.auug.org.au>
> CC: Jonathan Corbet <corbet@lwn.net>
> CC: Nicholas Piggin <npiggin@gmail.com>
> CC: Daniel Vacek <neelx@redhat.com>
> CC: linux-kernel@vger.kernel.org
> ---
>  include/linux/memblock.h |  3 +++
>  mm/memblock.c            | 68 ++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 66 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 64c41cf..ee999c5 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -342,6 +342,9 @@ void *memblock_alloc_try_nid_nopanic(phys_addr_t size, phys_addr_t align,
>  void *memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
>  			     phys_addr_t min_addr, phys_addr_t max_addr,
>  			     int nid);
> +extern int build_node_order(int *node_oder_array, int sz,
> +	int local_node, nodemask_t *used_mask);
> +void memblock_build_node_order(void);
> 
>  static inline void * __init memblock_alloc(phys_addr_t size,  phys_addr_t align)
>  {
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 022d4cb..cf78850 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1338,6 +1338,47 @@ phys_addr_t __init memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t ali
>  	return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
>  }
> 
> +static int **node_fallback __initdata;
> +
> +/*
> + * build_node_order() relies on cpumask_of_node(), hence arch should set up
> + * cpumask before calling this func.
> + */
> +void __init memblock_build_node_order(void)
> +{
> +	int nid, i;
> +	nodemask_t used_mask;
> +
> +	node_fallback = memblock_alloc(MAX_NUMNODES * sizeof(int *),
> +		sizeof(int *));
> +	for_each_online_node(nid) {
> +		node_fallback[nid] = memblock_alloc(
> +			num_online_nodes() * sizeof(int), sizeof(int));
> +		for (i = 0; i < num_online_nodes(); i++)
> +			node_fallback[nid][i] = NUMA_NO_NODE;
> +	}
> +
> +	for_each_online_node(nid) {
> +		nodes_clear(used_mask);
> +		node_set(nid, used_mask);
> +		build_node_order(node_fallback[nid], num_online_nodes(),
> +			nid, &used_mask);
> +	}
> +}
> +
> +static void __init memblock_free_node_order(void)
> +{
> +	int nid;
> +
> +	if (!node_fallback)
> +		return;
> +	for_each_online_node(nid)
> +		memblock_free(__pa(node_fallback[nid]),
> +			num_online_nodes() * sizeof(int));
> +	memblock_free(__pa(node_fallback), MAX_NUMNODES * sizeof(int *));
> +	node_fallback = NULL;
> +}
> +
>  /**
>   * memblock_alloc_internal - allocate boot memory block
>   * @size: size of memory block to be allocated in bytes
> @@ -1370,6 +1411,7 @@ static void * __init memblock_alloc_internal(
>  {
>  	phys_addr_t alloc;
>  	void *ptr;
> +	int node;
>  	enum memblock_flags flags = choose_memblock_flags();
> 
>  	if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
> @@ -1397,11 +1439,26 @@ static void * __init memblock_alloc_internal(
>  		goto done;
> 
>  	if (nid != NUMA_NO_NODE) {
> -		alloc = memblock_find_in_range_node(size, align, min_addr,
> -						    max_addr, NUMA_NO_NODE,
> -						    flags);
> -		if (alloc && !memblock_reserve(alloc, size))
> -			goto done;
> +		if (!node_fallback) {
> +			alloc = memblock_find_in_range_node(size, align,
> +					min_addr, max_addr,
> +					NUMA_NO_NODE, flags);
> +			if (alloc && !memblock_reserve(alloc, size))
> +				goto done;
> +		} else {
> +			int i;
> +			for (i = 0; i < num_online_nodes(); i++) {
> +				node = node_fallback[nid][i];
> +				/* fallback list has all memory nodes */
> +				if (node == NUMA_NO_NODE)
> +					break;
> +				alloc = memblock_find_in_range_node(size,
> +						align, min_addr, max_addr,
> +						node, flags);
> +				if (alloc && !memblock_reserve(alloc, size))
> +					goto done;
> +			}
> +		}
>  	}
> 
>  	if (min_addr) {
> @@ -1969,6 +2026,7 @@ unsigned long __init memblock_free_all(void)
> 
>  	reset_all_zones_managed_pages();
> 
> +	memblock_free_node_order();
>  	pages = free_low_memory_core_early();
>  	totalram_pages_add(pages);
> 
> -- 
> 2.7.4
> 

-- 
Sincerely yours,
Mike.

