linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] to support memblock near alloc and memoryless on arm64
@ 2016-10-25  2:59 Zhen Lei
  2016-10-25  2:59 ` [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc Zhen Lei
  2016-10-25  2:59 ` [PATCH 2/2] arm64/numa: support HAVE_MEMORYLESS_NODES Zhen Lei
  0 siblings, 2 replies; 11+ messages in thread
From: Zhen Lei @ 2016-10-25  2:59 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Andrew Morton, linux-mm
  Cc: Zefan Li, Xinwei Hu, Hanjun Guo, Zhen Lei

If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes actually
exist, the percpu variable areas and NUMA control blocks of those memoryless
nodes need to be allocated from the nearest available node to improve
performance.

In the beginning, I added a new function:
phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size, phys_addr_t align, int nid);

But it cannot replace memblock_virt_alloc_try_nid, because the latter can specify a min_addr,
which is usually set to __pa(MAX_DMA_ADDRESS) to prevent memory from being allocated from the
DMA area. Adding yet another function would be bad, because the code would be duplicated
between the two.
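
For comparison, here is a rough side-by-side of the two prototypes (the first one is only a
sketch paraphrased from mm/memblock.c, not a new proposal):

/* Existing allocator: callers can pass a min_addr, typically
 * __pa(MAX_DMA_ADDRESS), to keep boot-time allocations out of the DMA area. */
void * __init memblock_virt_alloc_try_nid(phys_addr_t size, phys_addr_t align,
					  phys_addr_t min_addr,
					  phys_addr_t max_addr, int nid);

/* Initially proposed helper: no way to express the min_addr constraint. */
phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size, phys_addr_t align, int nid);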

So instead, memblock_alloc_near_nid is called from the subfunctions of memblock_alloc_try_nid
and memblock_virt_alloc_try_nid, and a macro node_distance_ready distinguishes the two
situations:
1) By default, node_distance_ready evaluates to zero and memblock_*_try_nid work exactly as
   before.
2) Arch platforms set node_distance_ready to true once the NUMA node distances are ready
   (please refer to patch 2, and the sketch below); memblock_*_try_nid then allocate memory
   from the nearest node relative to the specified node.
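
For clarity, a minimal sketch of the arch-side opt-in; this simply mirrors what patch 2 below
does for arm64, so nothing here goes beyond that patch:

/* arch/arm64/include/asm/numa.h (sketch, see patch 2) */
extern int __initdata arch_node_distance_ready;
#define node_distance_ready()	arch_node_distance_ready

/* arch/arm64/mm/numa.c (sketch, see patch 2): flip the flag as soon as the
 * node distance table is populated, so that memblock_*_try_nid can start
 * picking the nearest node on behalf of memoryless nodes. */
arch_node_distance_ready = 1;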

Zhen Lei (2):
  mm/memblock: prepare a capability to support memblock near alloc
  arm64/numa: support HAVE_MEMORYLESS_NODES

 arch/arm64/Kconfig            |  4 +++
 arch/arm64/include/asm/numa.h |  3 ++
 arch/arm64/mm/numa.c          |  6 +++-
 mm/memblock.c                 | 76 ++++++++++++++++++++++++++++++++++++-------
 4 files changed, 77 insertions(+), 12 deletions(-)

-- 
2.5.0


* [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc
  2016-10-25  2:59 [PATCH 0/2] to support memblock near alloc and memoryless on arm64 Zhen Lei
@ 2016-10-25  2:59 ` Zhen Lei
  2016-10-25 13:23   ` Michal Hocko
  2016-10-25  2:59 ` [PATCH 2/2] arm64/numa: support HAVE_MEMORYLESS_NODES Zhen Lei
  1 sibling, 1 reply; 11+ messages in thread
From: Zhen Lei @ 2016-10-25  2:59 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Andrew Morton, linux-mm
  Cc: Zefan Li, Xinwei Hu, Hanjun Guo, Zhen Lei

If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes actually
exist, the percpu variable areas and NUMA control blocks of those memoryless
nodes need to be allocated from the nearest available node to improve
performance.

Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
specified nid first, if that allocation fails they fall straight back to
NUMA_NO_NODE. This means any node is possible on the second attempt.

To stay compatible with the old behaviour, I use a macro node_distance_ready
to control this. By default, node_distance_ready is not defined by any
platform and the above functions work exactly as before. Otherwise, they
try the nearest node first.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 mm/memblock.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 65 insertions(+), 11 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 7608bc3..556bbd2 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1213,9 +1213,71 @@ phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
 	return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
 }

+#ifndef node_distance_ready
+#define node_distance_ready()		0
+#endif
+
+static phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size,
+					phys_addr_t align, phys_addr_t start,
+					phys_addr_t end, int nid, ulong flags,
+					int alloc_func_type)
+{
+	int nnid, round = 0;
+	u64 pa;
+	DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
+
+	bitmap_zero(nodes_map, MAX_NUMNODES);
+
+again:
+	/*
+	 * There are total 4 cases:
+	 * <nid == NUMA_NO_NODE>
+	 *   1)2) node_distance_ready || !node_distance_ready
+	 *	Round 1, nnid = nid = NUMA_NO_NODE;
+	 * <nid != NUMA_NO_NODE>
+	 *   3) !node_distance_ready
+	 *	Round 1, nnid = nid;
+	 *    ::Round 2, currently only applicable for alloc_func_type = <0>
+	 *	Round 2, nnid = NUMA_NO_NODE;
+	 *   4) node_distance_ready
+	 *	Round 1, LOCAL_DISTANCE, nnid = nid;
+	 *	Round ?, nnid = nearest nid;
+	 */
+	if (!node_distance_ready() || (nid == NUMA_NO_NODE)) {
+		nnid = (++round == 1) ? nid : NUMA_NO_NODE;
+	} else {
+		int i, distance = INT_MAX;
+
+		for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
+			if (node_distance(nid, i) < distance) {
+				nnid = i;
+				distance = node_distance(nid, i);
+			}
+	}
+
+	switch (alloc_func_type) {
+	case 0:
+		pa = memblock_find_in_range_node(size, align, start, end, nnid, flags);
+		break;
+
+	case 1:
+	default:
+		pa = memblock_alloc_nid(size, align, nnid);
+		if (!node_distance_ready())
+			return pa;
+	}
+
+	if (!pa && (nnid != NUMA_NO_NODE)) {
+		bitmap_set(nodes_map, nnid, 1);
+		goto again;
+	}
+
+	return pa;
+}
+
 phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
 {
-	phys_addr_t res = memblock_alloc_nid(size, align, nid);
+	phys_addr_t res = memblock_alloc_near_nid(size, align, 0, 0, nid, 0, 1);

 	if (res)
 		return res;
@@ -1276,19 +1338,11 @@ static void * __init memblock_virt_alloc_internal(
 		max_addr = memblock.current_limit;

 again:
-	alloc = memblock_find_in_range_node(size, align, min_addr, max_addr,
-					    nid, flags);
+	alloc = memblock_alloc_near_nid(size, align, min_addr, max_addr,
+					    nid, flags, 0);
 	if (alloc)
 		goto done;

-	if (nid != NUMA_NO_NODE) {
-		alloc = memblock_find_in_range_node(size, align, min_addr,
-						    max_addr, NUMA_NO_NODE,
-						    flags);
-		if (alloc)
-			goto done;
-	}
-
 	if (min_addr) {
 		min_addr = 0;
 		goto again;
--
2.5.0


* [PATCH 2/2] arm64/numa: support HAVE_MEMORYLESS_NODES
  2016-10-25  2:59 [PATCH 0/2] to support memblock near alloc and memoryless on arm64 Zhen Lei
  2016-10-25  2:59 ` [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc Zhen Lei
@ 2016-10-25  2:59 ` Zhen Lei
  2016-10-26 18:36   ` Will Deacon
  1 sibling, 1 reply; 11+ messages in thread
From: Zhen Lei @ 2016-10-25  2:59 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Andrew Morton, linux-mm
  Cc: Zefan Li, Xinwei Hu, Hanjun Guo, Zhen Lei

Some NUMA nodes may have no memory. For example:
1) a node has no memory bank plugged in.
2) a node has no memory bank slots.

To ensure that the percpu variable areas and NUMA control blocks of
memoryless nodes are allocated from the nearest available node to improve
performance, define node_distance_ready and make it evaluate to true
immediately after the node distances have been initialized.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 arch/arm64/Kconfig            | 4 ++++
 arch/arm64/include/asm/numa.h | 3 +++
 arch/arm64/mm/numa.c          | 6 +++++-
 3 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 30398db..648dd13 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -609,6 +609,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
 	def_bool y
 	depends on NUMA

+config HAVE_MEMORYLESS_NODES
+	def_bool y
+	depends on NUMA
+
 source kernel/Kconfig.preempt
 source kernel/Kconfig.hz

diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
index 600887e..9d068bf 100644
--- a/arch/arm64/include/asm/numa.h
+++ b/arch/arm64/include/asm/numa.h
@@ -13,6 +13,9 @@
 int __node_distance(int from, int to);
 #define node_distance(a, b) __node_distance(a, b)

+extern int __initdata arch_node_distance_ready;
+#define node_distance_ready()	arch_node_distance_ready
+
 extern nodemask_t numa_nodes_parsed __initdata;

 /* Mappings between node number and cpus on that node. */
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 9a71d06..5db9765 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -36,6 +36,7 @@ static int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };
 static int numa_distance_cnt;
 static u8 *numa_distance;
 static bool numa_off;
+int __initdata arch_node_distance_ready;

 static __init int numa_parse_early_param(char *opt)
 {
@@ -395,9 +396,12 @@ static int __init numa_init(int (*init_func)(void))
 		return -EINVAL;
 	}

+	arch_node_distance_ready = 1;
 	ret = numa_register_nodes();
-	if (ret < 0)
+	if (ret < 0) {
+		arch_node_distance_ready = 0;
 		return ret;
+	}

 	setup_node_to_cpumask_map();

--
2.5.0


* Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc
  2016-10-25  2:59 ` [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc Zhen Lei
@ 2016-10-25 13:23   ` Michal Hocko
  2016-10-26  3:10     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2016-10-25 13:23 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Andrew Morton, linux-mm, Zefan Li, Xinwei Hu, Hanjun Guo

On Tue 25-10-16 10:59:17, Zhen Lei wrote:
> If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes actually
> exist, the percpu variable areas and NUMA control blocks of those memoryless
> nodes need to be allocated from the nearest available node to improve
> performance.
>
> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
> specified nid first, if that allocation fails they fall straight back to
> NUMA_NO_NODE. This means any node is possible on the second attempt.
>
> To stay compatible with the old behaviour, I use a macro node_distance_ready
> to control this. By default, node_distance_ready is not defined by any
> platform and the above functions work exactly as before. Otherwise, they
> try the nearest node first.

I am sorry but it is absolutely unclear to me _what_ is the motivation
of the patch. Is this a performance optimization, correctness issue or
something else? Could you please restate what is the problem, why do you
think it has to be fixed at memblock layer and describe what the actual
fix is please?

From a quick glance you are trying to bend over the memblock API for
something that should be handled on a different layer.

> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  mm/memblock.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 65 insertions(+), 11 deletions(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 7608bc3..556bbd2 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1213,9 +1213,71 @@ phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
>  	return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
>  }
> 
> +#ifndef node_distance_ready
> +#define node_distance_ready()		0
> +#endif
> +
> +static phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size,
> +					phys_addr_t align, phys_addr_t start,
> +					phys_addr_t end, int nid, ulong flags,
> +					int alloc_func_type)
> +{
> +	int nnid, round = 0;
> +	u64 pa;
> +	DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
> +
> +	bitmap_zero(nodes_map, MAX_NUMNODES);
> +
> +again:
> +	/*
> +	 * There are total 4 cases:
> +	 * <nid == NUMA_NO_NODE>
> +	 *   1)2) node_distance_ready || !node_distance_ready
> +	 *	Round 1, nnid = nid = NUMA_NO_NODE;
> +	 * <nid != NUMA_NO_NODE>
> +	 *   3) !node_distance_ready
> +	 *	Round 1, nnid = nid;
> +	 *    ::Round 2, currently only applicable for alloc_func_type = <0>
> +	 *	Round 2, nnid = NUMA_NO_NODE;
> +	 *   4) node_distance_ready
> +	 *	Round 1, LOCAL_DISTANCE, nnid = nid;
> +	 *	Round ?, nnid = nearest nid;
> +	 */
> +	if (!node_distance_ready() || (nid == NUMA_NO_NODE)) {
> +		nnid = (++round == 1) ? nid : NUMA_NO_NODE;
> +	} else {
> +		int i, distance = INT_MAX;
> +
> +		for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
> +			if (node_distance(nid, i) < distance) {
> +				nnid = i;
> +				distance = node_distance(nid, i);
> +			}
> +	}
> +
> +	switch (alloc_func_type) {
> +	case 0:
> +		pa = memblock_find_in_range_node(size, align, start, end, nnid, flags);
> +		break;
> +
> +	case 1:
> +	default:
> +		pa = memblock_alloc_nid(size, align, nnid);
> +		if (!node_distance_ready())
> +			return pa;
> +	}
> +
> +	if (!pa && (nnid != NUMA_NO_NODE)) {
> +		bitmap_set(nodes_map, nnid, 1);
> +		goto again;
> +	}
> +
> +	return pa;
> +}
> +
>  phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
>  {
> -	phys_addr_t res = memblock_alloc_nid(size, align, nid);
> +	phys_addr_t res = memblock_alloc_near_nid(size, align, 0, 0, nid, 0, 1);
> 
>  	if (res)
>  		return res;
> @@ -1276,19 +1338,11 @@ static void * __init memblock_virt_alloc_internal(
>  		max_addr = memblock.current_limit;
> 
>  again:
> -	alloc = memblock_find_in_range_node(size, align, min_addr, max_addr,
> -					    nid, flags);
> +	alloc = memblock_alloc_near_nid(size, align, min_addr, max_addr,
> +					    nid, flags, 0);
>  	if (alloc)
>  		goto done;
> 
> -	if (nid != NUMA_NO_NODE) {
> -		alloc = memblock_find_in_range_node(size, align, min_addr,
> -						    max_addr, NUMA_NO_NODE,
> -						    flags);
> -		if (alloc)
> -			goto done;
> -	}
> -
>  	if (min_addr) {
>  		min_addr = 0;
>  		goto again;
> --
> 2.5.0
> 

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc
  2016-10-25 13:23   ` Michal Hocko
@ 2016-10-26  3:10     ` Leizhen (ThunderTown)
  2016-10-26  9:31       ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Leizhen (ThunderTown) @ 2016-10-26  3:10 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Andrew Morton, linux-mm, Zefan Li, Xinwei Hu, Hanjun Guo



On 2016/10/25 21:23, Michal Hocko wrote:
> On Tue 25-10-16 10:59:17, Zhen Lei wrote:
>> If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes actually
>> exist, the percpu variable areas and NUMA control blocks of those memoryless
>> nodes need to be allocated from the nearest available node to improve
>> performance.
>>
>> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
>> specified nid first, if that allocation fails they fall straight back to
>> NUMA_NO_NODE. This means any node is possible on the second attempt.
>>
>> To stay compatible with the old behaviour, I use a macro node_distance_ready
>> to control this. By default, node_distance_ready is not defined by any
>> platform and the above functions work exactly as before. Otherwise, they
>> try the nearest node first.
> 
> I am sorry but it is absolutely unclear to me _what_ is the motivation
> of the patch. Is this a performance optimization, correctness issue or
> something else? Could you please restate what is the problem, why do you
> think it has to be fixed at memblock layer and describe what the actual
> fix is please?
This is a performance optimization. The problem arises if some memoryless NUMA nodes
actually exist. For example: there are 4 nodes in total, 0,1,2,3, node 1 has no memory,
and the node distances are as below:
                    ---------board-------
		    |                   |
                    |                   |
                 socket0             socket1
                   / \                 / \
                  /   \               /   \
               node0 node1         node2 node3
distance[1][0] is smaller than distance[1][2] and distance[1][3], so CPUs on node1 access
the memory of node0 faster than that of node2 or node3.

Linux defines a lot of percpu variables; each CPU has its own copy and most of the time
only accesses its own percpu area. In this example, we want the percpu area of the CPUs
on node1 to be allocated from node0, but without these patches that is not guaranteed.
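
As a tiny illustration of why the placement matters (the variable and function names here
are made up, purely for illustration):

#include <linux/percpu.h>

/* Each possible CPU gets its own copy of this variable inside the percpu
 * area carved out at boot; almost every access is to the local copy. */
static DEFINE_PER_CPU(unsigned long, pkt_count);	/* hypothetical counter */

static void count_packet(void)
{
	/*
	 * Touches only the calling CPU's copy. If that copy lives on a
	 * distant node (node2/node3 instead of node0 for a node1 CPU),
	 * every increment pays the extra hop.
	 */
	this_cpu_inc(pkt_count);
}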

If every node has its own memory, we can directly use the functions below to allocate
memory from the local node:
1. memblock_alloc_nid
2. memblock_alloc_try_nid
3. memblock_virt_alloc_try_nid_nopanic
4. memblock_virt_alloc_try_nid

So these patches are only needed for the NUMA memoryless scenario.

Another use case is the control block "extern pg_data_t *node_data[]".
Here is an example from the x86 NUMA code in arch/x86/mm/numa.c:
static void __init alloc_node_data(int nid)
{
	... ...
        /*
         * Allocate node data.  Try node-local memory and then any node.	//==>But the nearest node is the best
         * Never allocate in DMA zone.
         */
        nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
        if (!nd_pa) {
                nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
                                              MEMBLOCK_ALLOC_ACCESSIBLE);
                if (!nd_pa) {
                        pr_err("Cannot find %zu bytes in node %d\n",
                               nd_size, nid);
                        return;
                }
        }
        nd = __va(nd_pa);
        ... ...
        node_data[nid] = nd;
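
With the near-alloc fallback from patch 1, a call site like this could in principle keep
passing the real nid and drop its own retry. A sketch only, not something this series
changes:

	/*
	 * Sketch: with patch 1 applied, memblock_alloc_try_nid() itself walks
	 * outwards from the nearest node, so the explicit NUMA_NO_NODE retry
	 * above is no longer needed (note it panics instead of returning 0 if
	 * nothing can be found at all).
	 */
	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
	nd = __va(nd_pa);
	node_data[nid] = nd;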

> 
> From a quick glance you are trying to bend over the memblock API for
> something that should be handled on a different layer.
> 
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  mm/memblock.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++++---------
>>  1 file changed, 65 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/memblock.c b/mm/memblock.c
>> index 7608bc3..556bbd2 100644
>> --- a/mm/memblock.c
>> +++ b/mm/memblock.c
>> @@ -1213,9 +1213,71 @@ phys_addr_t __init memblock_alloc(phys_addr_t size, phys_addr_t align)
>>  	return memblock_alloc_base(size, align, MEMBLOCK_ALLOC_ACCESSIBLE);
>>  }
>>
>> +#ifndef node_distance_ready
>> +#define node_distance_ready()		0
>> +#endif
>> +
>> +static phys_addr_t __init memblock_alloc_near_nid(phys_addr_t size,
>> +					phys_addr_t align, phys_addr_t start,
>> +					phys_addr_t end, int nid, ulong flags,
>> +					int alloc_func_type)
>> +{
>> +	int nnid, round = 0;
>> +	u64 pa;
>> +	DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>> +
>> +	bitmap_zero(nodes_map, MAX_NUMNODES);
>> +
>> +again:
>> +	/*
>> +	 * There are total 4 cases:
>> +	 * <nid == NUMA_NO_NODE>
>> +	 *   1)2) node_distance_ready || !node_distance_ready
>> +	 *	Round 1, nnid = nid = NUMA_NO_NODE;
>> +	 * <nid != NUMA_NO_NODE>
>> +	 *   3) !node_distance_ready
>> +	 *	Round 1, nnid = nid;
>> +	 *    ::Round 2, currently only applicable for alloc_func_type = <0>
>> +	 *	Round 2, nnid = NUMA_NO_NODE;
>> +	 *   4) node_distance_ready
>> +	 *	Round 1, LOCAL_DISTANCE, nnid = nid;
>> +	 *	Round ?, nnid = nearest nid;
>> +	 */
>> +	if (!node_distance_ready() || (nid == NUMA_NO_NODE)) {
>> +		nnid = (++round == 1) ? nid : NUMA_NO_NODE;
>> +	} else {
>> +		int i, distance = INT_MAX;
>> +
>> +		for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
>> +			if (node_distance(nid, i) < distance) {
>> +				nnid = i;
>> +				distance = node_distance(nid, i);
>> +			}
>> +	}
>> +
>> +	switch (alloc_func_type) {
>> +	case 0:
>> +		pa = memblock_find_in_range_node(size, align, start, end, nnid, flags);
>> +		break;
>> +
>> +	case 1:
>> +	default:
>> +		pa = memblock_alloc_nid(size, align, nnid);
>> +		if (!node_distance_ready())
>> +			return pa;
>> +	}
>> +
>> +	if (!pa && (nnid != NUMA_NO_NODE)) {
>> +		bitmap_set(nodes_map, nnid, 1);
>> +		goto again;
>> +	}
>> +
>> +	return pa;
>> +}
>> +
>>  phys_addr_t __init memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid)
>>  {
>> -	phys_addr_t res = memblock_alloc_nid(size, align, nid);
>> +	phys_addr_t res = memblock_alloc_near_nid(size, align, 0, 0, nid, 0, 1);
>>
>>  	if (res)
>>  		return res;
>> @@ -1276,19 +1338,11 @@ static void * __init memblock_virt_alloc_internal(
>>  		max_addr = memblock.current_limit;
>>
>>  again:
>> -	alloc = memblock_find_in_range_node(size, align, min_addr, max_addr,
>> -					    nid, flags);
>> +	alloc = memblock_alloc_near_nid(size, align, min_addr, max_addr,
>> +					    nid, flags, 0);
>>  	if (alloc)
>>  		goto done;
>>
>> -	if (nid != NUMA_NO_NODE) {
>> -		alloc = memblock_find_in_range_node(size, align, min_addr,
>> -						    max_addr, NUMA_NO_NODE,
>> -						    flags);
>> -		if (alloc)
>> -			goto done;
>> -	}
>> -
>>  	if (min_addr) {
>>  		min_addr = 0;
>>  		goto again;
>> --
>> 2.5.0
>>
> 


* Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc
  2016-10-26  3:10     ` Leizhen (ThunderTown)
@ 2016-10-26  9:31       ` Michal Hocko
  2016-10-27  2:41         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2016-10-26  9:31 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Andrew Morton, linux-mm, Zefan Li, Xinwei Hu, Hanjun Guo

On Wed 26-10-16 11:10:44, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/10/25 21:23, Michal Hocko wrote:
> > On Tue 25-10-16 10:59:17, Zhen Lei wrote:
> >> If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes actually
> >> exist, the percpu variable areas and NUMA control blocks of those memoryless
> >> nodes need to be allocated from the nearest available node to improve
> >> performance.
> >>
> >> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
> >> specified nid first, if that allocation fails they fall straight back to
> >> NUMA_NO_NODE. This means any node is possible on the second attempt.
> >>
> >> To stay compatible with the old behaviour, I use a macro node_distance_ready
> >> to control this. By default, node_distance_ready is not defined by any
> >> platform and the above functions work exactly as before. Otherwise, they
> >> try the nearest node first.
> > 
> > I am sorry but it is absolutely unclear to me _what_ is the motivation
> > of the patch. Is this a performance optimization, correctness issue or
> > something else? Could you please restate what is the problem, why do you
> > think it has to be fixed at memblock layer and describe what the actual
> > fix is please?
>
> This is a performance optimization.

Do you have any numbers to back the improvements?

> The problem arises if some memoryless NUMA nodes
> actually exist. For example: there are 4 nodes in total, 0,1,2,3, node 1 has no memory,
> and the node distances are as below:
>                     ---------board-------
> 		    |                   |
>                     |                   |
>                  socket0             socket1
>                    / \                 / \
>                   /   \               /   \
>                node0 node1         node2 node3
> distance[1][0] is smaller than distance[1][2] and distance[1][3], so CPUs on node1 access
> the memory of node0 faster than that of node2 or node3.
> 
> Linux defines a lot of percpu variables; each CPU has its own copy and most of the time
> only accesses its own percpu area. In this example, we want the percpu area of the CPUs
> on node1 to be allocated from node0, but without these patches that is not guaranteed.

I am not familiar with the percpu allocator much so I might be
completely missing a point but why cannot this be solved in the percpu
allocator directly e.g. by using cpu_to_mem which should already be
memoryless aware.

Generating a new API while we have means to use an existing one sounds
just not right to me.
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 2/2] arm64/numa: support HAVE_MEMORYLESS_NODES
  2016-10-25  2:59 ` [PATCH 2/2] arm64/numa: support HAVE_MEMORYLESS_NODES Zhen Lei
@ 2016-10-26 18:36   ` Will Deacon
  2016-10-27  3:54     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 11+ messages in thread
From: Will Deacon @ 2016-10-26 18:36 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Andrew Morton,
	linux-mm, Zefan Li, Xinwei Hu, Hanjun Guo

On Tue, Oct 25, 2016 at 10:59:18AM +0800, Zhen Lei wrote:
> Some NUMA nodes may have no memory. For example:
> 1) a node has no memory bank plugged in.
> 2) a node has no memory bank slots.
>
> To ensure that the percpu variable areas and NUMA control blocks of
> memoryless nodes are allocated from the nearest available node to improve
> performance, define node_distance_ready and make it evaluate to true
> immediately after the node distances have been initialized.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  arch/arm64/Kconfig            | 4 ++++
>  arch/arm64/include/asm/numa.h | 3 +++
>  arch/arm64/mm/numa.c          | 6 +++++-
>  3 files changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 30398db..648dd13 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -609,6 +609,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>  	def_bool y
>  	depends on NUMA
> 
> +config HAVE_MEMORYLESS_NODES
> +	def_bool y
> +	depends on NUMA

Given that patch 1 and the associated node_distance_ready stuff is all
an unqualified performance optimisation, is there any merit in just
enabling HAVE_MEMORYLESS_NODES in Kconfig and then optimising things as
a separate series when you have numbers to back it up?

Will


* Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc
  2016-10-26  9:31       ` Michal Hocko
@ 2016-10-27  2:41         ` Leizhen (ThunderTown)
  2016-10-27  7:22           ` Michal Hocko
  0 siblings, 1 reply; 11+ messages in thread
From: Leizhen (ThunderTown) @ 2016-10-27  2:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Andrew Morton, linux-mm, Zefan Li, Xinwei Hu, Hanjun Guo



On 2016/10/26 17:31, Michal Hocko wrote:
> On Wed 26-10-16 11:10:44, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/10/25 21:23, Michal Hocko wrote:
>>> On Tue 25-10-16 10:59:17, Zhen Lei wrote:
>>>> If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes actually
>>>> exist, the percpu variable areas and NUMA control blocks of those memoryless
>>>> nodes need to be allocated from the nearest available node to improve
>>>> performance.
>>>>
>>>> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
>>>> specified nid first, if that allocation fails they fall straight back to
>>>> NUMA_NO_NODE. This means any node is possible on the second attempt.
>>>>
>>>> To stay compatible with the old behaviour, I use a macro node_distance_ready
>>>> to control this. By default, node_distance_ready is not defined by any
>>>> platform and the above functions work exactly as before. Otherwise, they
>>>> try the nearest node first.
>>>
>>> I am sorry but it is absolutely unclear to me _what_ is the motivation
>>> of the patch. Is this a performance optimization, correctness issue or
>>> something else? Could you please restate what is the problem, why do you
>>> think it has to be fixed at memblock layer and describe what the actual
>>> fix is please?
>>
>> This is a performance optimization.
> 
> Do you have any numbers to back the improvements?
I have not collected any performance data, but at least in theory it's beneficial and
harmless, apart from making the code look a bit ugly. It is harmless because all the
related functions are defined as __init, for example:
phys_addr_t __init memblock_alloc_try_nid(
void * __init memblock_virt_alloc_try_nid(

And it is beneficial because all the related memory (percpu variables and NODE_DATA) is
mostly accessed at run time.

> 
>> The problem arises if some memoryless NUMA nodes
>> actually exist. For example: there are 4 nodes in total, 0,1,2,3, node 1 has no memory,
>> and the node distances are as below:
>>                     ---------board-------
>> 		    |                   |
>>                     |                   |
>>                  socket0             socket1
>>                    / \                 / \
>>                   /   \               /   \
>>                node0 node1         node2 node3
>> distance[1][0] is smaller than distance[1][2] and distance[1][3], so CPUs on node1 access
>> the memory of node0 faster than that of node2 or node3.
>>
>> Linux defines a lot of percpu variables; each CPU has its own copy and most of the time
>> only accesses its own percpu area. In this example, we want the percpu area of the CPUs
>> on node1 to be allocated from node0, but without these patches that is not guaranteed.
> 
> I am not familiar with the percpu allocator much so I might be
> completely missig a point but why cannot this be solved in the percpu
> allocator directly e.g. by using cpu_to_mem which should already be
> memoryless aware.
My test results told me that it cannot:
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000011ffffffff]
[    0.000000] Could not find start_pfn for node 1
[    0.000000] Initmem setup node 1 [mem 0x0000000000000000-0x0000000000000000]
[    0.000000] Initmem setup node 2 [mem 0x0000001200000000-0x00000013ffffffff]
[    0.000000] Initmem setup node 3 [mem 0x0000001400000000-0x00000017ffffffff]


[   14.801895] NODE_DATA(0) = 0x11ffffe500
[   14.805749] NODE_DATA(1) = 0x11ffffca00	//(1), see below
[   14.809602] NODE_DATA(2) = 0x13ffffe500
[   14.813455] NODE_DATA(3) = 0x17fffe5480
[   14.817316] cpu 0 on node0: 11fff87638
[   14.821083] cpu 1 on node0: 11fff9c638
[   14.824850] cpu 2 on node0: 11fffb1638
[   14.828616] cpu 3 on node0: 11fffc6638
[   14.832383] cpu 4 on node1: 17fff8a638	//(2), see below
[   14.836149] cpu 5 on node1: 17fff9f638
[   14.839912] cpu 6 on node1: 17fffb4638
[   14.843677] cpu 7 on node1: 17fffc9638
[   14.847444] cpu 8 on node2: 13fffa4638
[   14.851210] cpu 9 on node2: 13fffb9638
[   14.854976] cpu10 on node2: 13fffce638
[   14.858742] cpu11 on node2: 13fffe3638
[   14.862510] cpu12 on node3: 17fff36638
[   14.866276] cpu13 on node3: 17fff4b638
[   14.870042] cpu14 on node3: 17fff60638
[   14.873809] cpu15 on node3: 17fff75638

(1) with memblock_alloc_try_nid plus these patches, memory was allocated from node0
(2) with the same implementation as x86 and PowerPC, memory was allocated from node3:
    	return  __alloc_bootmem_node(NODE_DATA(nid), size, align, __pa(MAX_DMA_ADDRESS));

I'm not sure how this behaves on x86 and PowerPC; here are my test cases. Is anybody who is
interested and has a test environment willing to help me run them?

/* Dump the physical address of each node's pg_data_t (NODE_DATA). */
static int tst_numa_002(void)
{
        int i;

        for (i = 0; i < nr_node_ids; i++)
                pr_info("NODE_DATA(%d) = 0x%llx\n", i, virt_to_phys(NODE_DATA(i)));

        return 0;
}

/* Allocate a small percpu area and dump where each CPU's copy landed. */
static int tst_numa_003(void)
{
        int cpu;
        void __percpu *p;

        p = __alloc_percpu(0x100, 1);

        for_each_possible_cpu(cpu)
                pr_info("cpu%2d on node%d: %llx\n", cpu, cpu_to_node(cpu), per_cpu_ptr_to_phys(per_cpu_ptr(p, cpu)));

        free_percpu(p);

        return 0;
}
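
For anyone who wants to reproduce this, a sketch of how the two functions can be wired up
(assuming a late initcall, so the percpu allocator and the NUMA setup are already finished;
a small test module would work just as well):

#include <linux/init.h>

/* Sketch only: run both dumps once everything is up. */
static int __init tst_numa_init(void)
{
	tst_numa_002();

	return tst_numa_003();
}
late_initcall(tst_numa_init);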

> 
> Generating a new API while we have means to use an existing one sounds
> just not right to me.
Yes, that is why I gave up on creating two new functions and chose this implementation instead.

> 


* Re: [PATCH 2/2] arm64/numa: support HAVE_MEMORYLESS_NODES
  2016-10-26 18:36   ` Will Deacon
@ 2016-10-27  3:54     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 11+ messages in thread
From: Leizhen (ThunderTown) @ 2016-10-27  3:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Andrew Morton,
	linux-mm, Zefan Li, Xinwei Hu, Hanjun Guo



On 2016/10/27 2:36, Will Deacon wrote:
> On Tue, Oct 25, 2016 at 10:59:18AM +0800, Zhen Lei wrote:
>> Some NUMA nodes may have no memory. For example:
>> 1) a node has no memory bank plugged in.
>> 2) a node has no memory bank slots.
>>
>> To ensure that the percpu variable areas and NUMA control blocks of
>> memoryless nodes are allocated from the nearest available node to improve
>> performance, define node_distance_ready and make it evaluate to true
>> immediately after the node distances have been initialized.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  arch/arm64/Kconfig            | 4 ++++
>>  arch/arm64/include/asm/numa.h | 3 +++
>>  arch/arm64/mm/numa.c          | 6 +++++-
>>  3 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 30398db..648dd13 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -609,6 +609,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>  	def_bool y
>>  	depends on NUMA
>>
>> +config HAVE_MEMORYLESS_NODES
>> +	def_bool y
>> +	depends on NUMA
> 
> Given that patch 1 and the associated node_distance_ready stuff is all
> an unqualified performance optimisation, is there any merit in just
> enabling HAVE_MEMORYLESS_NODES in Kconfig and then optimising things as
> a separate series when you have numbers to back it up?
HAVE_MEMORYLESS_NODES is also a performance optimisation for the memoryless scenario.
For example:
node0 is a memoryless node and node1 is the node nearest to node0.
When we want to allocate memory for node0, the memory manager would normally try node0 first
and then node1. But we already know that node0 has no memory, so we can tell the memory manager
to try node1 first. So HAVE_MEMORYLESS_NODES is used to skip the memoryless nodes rather than
try them.
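
For reference, what selecting HAVE_MEMORYLESS_NODES gives the core code is the per-CPU
"local memory node" accessors (cpu_to_mem() and friends). A rough sketch, assuming the
arch calls set_cpu_numa_mem() for each CPU during bring-up; the function name below is
made up:

#include <linux/gfp.h>
#include <linux/topology.h>

/*
 * Sketch: for a CPU sitting on the memoryless node0 in the example above,
 * cpu_to_node() still reports node0, while cpu_to_mem() reports node1, the
 * nearest node that actually has memory.
 */
static struct page *alloc_local_page(int cpu)
{
	int mem = cpu_to_mem(cpu);	/* node1 rather than node0 */

	return alloc_pages_node(mem, GFP_KERNEL, 0);
}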

So I think the title of this patch is misleading; I will rewrite it in V2.

Or do you mean I should separate it into a new patch?


> 
> Will
> 
> .
> 


* Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc
  2016-10-27  2:41         ` Leizhen (ThunderTown)
@ 2016-10-27  7:22           ` Michal Hocko
  2016-10-27  8:23             ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 11+ messages in thread
From: Michal Hocko @ 2016-10-27  7:22 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Andrew Morton, linux-mm, Zefan Li, Xinwei Hu, Hanjun Guo

On Thu 27-10-16 10:41:24, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/10/26 17:31, Michal Hocko wrote:
> > On Wed 26-10-16 11:10:44, Leizhen (ThunderTown) wrote:
> >>
> >>
> >> On 2016/10/25 21:23, Michal Hocko wrote:
> >>> On Tue 25-10-16 10:59:17, Zhen Lei wrote:
> >>>> If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes actually
> >>>> exist, the percpu variable areas and NUMA control blocks of those memoryless
> >>>> nodes need to be allocated from the nearest available node to improve
> >>>> performance.
> >>>>
> >>>> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
> >>>> specified nid first, if that allocation fails they fall straight back to
> >>>> NUMA_NO_NODE. This means any node is possible on the second attempt.
> >>>>
> >>>> To stay compatible with the old behaviour, I use a macro node_distance_ready
> >>>> to control this. By default, node_distance_ready is not defined by any
> >>>> platform and the above functions work exactly as before. Otherwise, they
> >>>> try the nearest node first.
> >>>
> >>> I am sorry but it is absolutely unclear to me _what_ is the motivation
> >>> of the patch. Is this a performance optimization, correctness issue or
> >>> something else? Could you please restate what is the problem, why do you
> >>> think it has to be fixed at memblock layer and describe what the actual
> >>> fix is please?
> >>
> >> This is a performance optimization.
> > 
> > Do you have any numbers to back the improvements?
>
> I have not collected any performance data, but at least in theory
> it's beneficial and harmless, apart from making the code look a bit
> ugly.

The whole memoryless area is cluttered with hacks because everybody just
adds pieces here and there to make his particular usecase work IMHO.
Adding more on top for performance reasons which are not even measured
to prove a clear win is a no-go. Please step back and try to think how this
could be done with an existing infrastructure we have (some cleanups
while doing that would be hugely appreciated) and if that is not
possible then explain why and why it is not feasible to fix that before
you start adding a new API.

Thanks!

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH 1/2] mm/memblock: prepare a capability to support memblock near alloc
  2016-10-27  7:22           ` Michal Hocko
@ 2016-10-27  8:23             ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 11+ messages in thread
From: Leizhen (ThunderTown) @ 2016-10-27  8:23 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Andrew Morton, linux-mm, Zefan Li, Xinwei Hu, Hanjun Guo



On 2016/10/27 15:22, Michal Hocko wrote:
> On Thu 27-10-16 10:41:24, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/10/26 17:31, Michal Hocko wrote:
>>> On Wed 26-10-16 11:10:44, Leizhen (ThunderTown) wrote:
>>>>
>>>>
>>>> On 2016/10/25 21:23, Michal Hocko wrote:
>>>>> On Tue 25-10-16 10:59:17, Zhen Lei wrote:
>>>>>> If HAVE_MEMORYLESS_NODES is selected and some memoryless NUMA nodes actually
>>>>>> exist, the percpu variable areas and NUMA control blocks of those memoryless
>>>>>> nodes need to be allocated from the nearest available node to improve
>>>>>> performance.
>>>>>>
>>>>>> Although memblock_alloc_try_nid and memblock_virt_alloc_try_nid try the
>>>>>> specified nid first, if that allocation fails they fall straight back to
>>>>>> NUMA_NO_NODE. This means any node is possible on the second attempt.
>>>>>>
>>>>>> To stay compatible with the old behaviour, I use a macro node_distance_ready
>>>>>> to control this. By default, node_distance_ready is not defined by any
>>>>>> platform and the above functions work exactly as before. Otherwise, they
>>>>>> try the nearest node first.
>>>>>
>>>>> I am sorry but it is absolutely unclear to me _what_ is the motivation
>>>>> of the patch. Is this a performance optimization, correctness issue or
>>>>> something else? Could you please restate what is the problem, why do you
>>>>> think it has to be fixed at memblock layer and describe what the actual
>>>>> fix is please?
>>>>
>>>> This is a performance optimization.
>>>
>>> Do you have any numbers to back the improvements?
>>
>> I have not collected any performance data, but at least in theory
>> it's beneficial and harmless, apart from making the code look a bit
>> ugly.
> 
> The whole memoryless area is cluttered with hacks because everybody just
> adds pieces here and there to make his particular usecase work IMHO.
> Adding more on top for performance reasons which are not even measured
OK, I will ask my colleagues for help to see whether some applications can be used to
measure this.

> to prove a clear win is a no-go. Please step back and try to think how this
> could be done with an existing infrastructure we have (some cleanups
OK, I will try to do it. But for some of the infrastructure I may be restricted to
theoretical analysis: I don't have the related test environment, so there is no way
to verify.


> while doing that would be hugely appreciated) and if that is not
> possible then explain why and why it is not feasible to fix that before
I think it will be feasible.

> you start adding a new API.
> 
> Thanks!
> 

