linux-kernel.vger.kernel.org archive mirror
* [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa
@ 2016-08-24  7:44 Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 01/14] of/numa: remove a duplicated pr_debug information Zhen Lei
                   ` (13 more replies)
  0 siblings, 14 replies; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

v6 -> v7:
Fix a bug in this patch series that occurred when "numa=off" was set in
bootargs. This modification only impacts patch 12.

Please refer to https://lkml.org/lkml/2016/8/23/249 for more details.

@@ -119,13 +115,13 @@ static void __init setup_node_to_cpumask_map(void)
  */
 void numa_store_cpu_info(unsigned int cpu)
 {
-	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
+	map_cpu_to_node(cpu, cpu_to_node_map[cpu]);
 }

 void __init early_map_cpu_to_node(unsigned int cpu, int nid)
 {
 	/* fallback to node 0 */
-	if (nid < 0 || nid >= MAX_NUMNODES)
+	if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
 		nid = 0;
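
For readers who want to try the v7 behaviour outside the kernel, here is a
minimal userspace sketch of the fallback rule in the hunk above. The small
MAX_NUMNODES and NR_CPUS values and the stored_node() helper are assumptions
made for the sketch, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NUMNODES 4	/* assumed small value for the sketch */
#define NR_CPUS      8	/* assumed small value for the sketch */

static bool numa_off;
static int cpu_to_node_map[NR_CPUS];

/*
 * Mirrors the post-v7 check: an out-of-range nid, or any nid while
 * NUMA is turned off, falls back to node 0 before it is stored.
 */
static void early_map_cpu_to_node(unsigned int cpu, int nid)
{
	assert(cpu < NR_CPUS);

	/* fallback to node 0 */
	if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
		nid = 0;

	cpu_to_node_map[cpu] = nid;
}

/* Hypothetical accessor, only here so the result can be inspected. */
static int stored_node(unsigned int cpu)
{
	assert(cpu < NR_CPUS);
	return cpu_to_node_map[cpu];
}
```

Because the fallback is applied when the mapping is stored, later readers of
cpu_to_node_map[] no longer need their own numa_off test.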

v5 -> v6:
Move the memblk nid check from arch/arm64/mm/numa.c into drivers/of/of_numa.c,
because this check is arch independent.

This modification only relates to patch 3, but it also affects the contents of
patches 7 and 8. The other patches are unchanged.

v4 -> v5:
This version has no code changes; it just adds "Acked-by: Rob Herring <robh@kernel.org>"
to patches 1, 2, 4, 6, 7, 13, 14. These patches rely on some ACPI NUMA patches
that were not upstreamed in 4.7 but were upstreamed in 4.8-rc1, so I am
resending my patches.

v3 -> v4:
1. Included three patches from Kefeng Wang (patches 6-8).
2. Added 6 new patches (9-15) to enhance NUMA on arm64.

v2 -> v3:
1. Adjusted patch 2 and patch 5 according to Matthias Brugger's advice, to make
   the patches read better. The final code is unchanged.

v1 -> v2:
1. Based on https://lkml.org/lkml/2016/5/24/679
2. Rewrote of_numa_parse_memory_nodes according to Rob Herring's advice, so that it is clearer.
3. Rewrote patch 5 because some scenarios were not considered before.

Kefeng Wang (3):
  of_numa: Use of_get_next_parent to simplify code
  of_numa: Use pr_fmt()
  arm64: numa: Use pr_fmt()

Zhen Lei (11):
  of/numa: remove a duplicated pr_debug information
  of/numa: fix a memory@ node can only contains one memory block
  arm64/numa: add nid check for memory block
  of/numa: remove a duplicated warning
  arm64/numa: avoid inconsistent information to be printed
  arm64/numa: support HAVE_SETUP_PER_CPU_AREA
  arm64/numa: define numa_distance as array to simplify code
  arm64/numa: support HAVE_MEMORYLESS_NODES
  arm64/numa: remove the limitation that cpu0 must bind to node0
  of/numa: remove the constraint on the distances of node pairs
  Documentation: remove the constraint on the distances of node pairs

 Documentation/devicetree/bindings/numa.txt |   1 -
 arch/arm64/Kconfig                         |  12 ++
 arch/arm64/include/asm/numa.h              |   1 -
 arch/arm64/kernel/smp.c                    |   1 +
 arch/arm64/mm/numa.c                       | 226 ++++++++++++++++-------------
 drivers/of/of_numa.c                       |  88 ++++++-----
 6 files changed, 179 insertions(+), 150 deletions(-)

--
2.5.0

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH v7 01/14] of/numa: remove a duplicated pr_debug information
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 02/14] of/numa: fix a memory@ node can only contains one memory block Zhen Lei
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

numa_add_memblk() already prints this information. The two messages are
not identical, but they are very similar.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Rob Herring <robh@kernel.org>
---
 drivers/of/of_numa.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
index ed5a097..fb71b4e 100644
--- a/drivers/of/of_numa.c
+++ b/drivers/of/of_numa.c
@@ -88,10 +88,6 @@ static int __init of_numa_parse_memory_nodes(void)
 			break;
 		}

-		pr_debug("NUMA:  base = %llx len = %llx, node = %u\n",
-			 rsrc.start, rsrc.end - rsrc.start + 1, nid);
-
-
 		r = numa_add_memblk(nid, rsrc.start, rsrc.end + 1);
 		if (r)
 			break;
--
2.5.0


* [PATCH v7 02/14] of/numa: fix a memory@ node can only contains one memory block
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 01/14] of/numa: remove a duplicated pr_debug information Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 03/14] arm64/numa: add nid check for " Zhen Lei
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

For a normal memory@ devicetree node, the reg property can contain more
than one memory block.

Because we don't know how many memory blocks it contains, we start at
index 0 and increment by 1 until an error is returned (the end).

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Rob Herring <robh@kernel.org>
---
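Not part of the commit: a minimal userspace sketch of the "try index 0, 1,
2, ... until an error" loop described in the message. The struct range and
struct mem_node types and the fake_* helpers are stand-ins invented for this
sketch; they only imitate the shape of of_address_to_resource() and
numa_add_memblk().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Inclusive [start, end] range, like struct resource. */
struct range { unsigned long long start, end; };

/* Hypothetical stand-in for a memory@ node with nr reg entries. */
struct mem_node { const struct range *reg; int nr; };

static int blocks_added;

/* Returns 0 and fills *out for a valid index, -EINVAL past the end. */
static int fake_address_to_resource(const struct mem_node *np, int index,
				    struct range *out)
{
	if (index < 0 || index >= np->nr)
		return -EINVAL;
	*out = np->reg[index];
	return 0;
}

/* Takes an exclusive end, like numa_add_memblk(nid, start, end). */
static int fake_add_memblk(unsigned long long start, unsigned long long end)
{
	if (end <= start)
		return -EINVAL;
	blocks_added++;
	return 0;
}

static int parse_memory_node(const struct mem_node *np)
{
	struct range r;
	int i, ret = 0;

	/* Stop at the first lookup failure (end of reg) or add failure. */
	for (i = 0; !ret && !fake_address_to_resource(np, i, &r); i++)
		ret = fake_add_memblk(r.start, r.end + 1);

	/* No entry at all, or a failed add, is reported as an error. */
	if (!i || ret)
		return ret ? ret : -EINVAL;

	return 0;
}
```

In this sketch a node with two reg entries adds two blocks, while a node
with an empty reg is rejected with -EINVAL because the loop never ran.
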
 drivers/of/of_numa.c | 29 ++++++++++-------------------
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
index fb71b4e..7b3fbdc 100644
--- a/drivers/of/of_numa.c
+++ b/drivers/of/of_numa.c
@@ -63,13 +63,9 @@ static int __init of_numa_parse_memory_nodes(void)
 	struct device_node *np = NULL;
 	struct resource rsrc;
 	u32 nid;
-	int r = 0;
-
-	for (;;) {
-		np = of_find_node_by_type(np, "memory");
-		if (!np)
-			break;
+	int i, r;

+	for_each_node_by_type(np, "memory") {
 		r = of_property_read_u32(np, "numa-node-id", &nid);
 		if (r == -EINVAL)
 			/*
@@ -78,23 +74,18 @@ static int __init of_numa_parse_memory_nodes(void)
 			 * "numa-node-id" property
 			 */
 			continue;
-		else if (r)
-			/* some other error */
-			break;

-		r = of_address_to_resource(np, 0, &rsrc);
-		if (r) {
-			pr_err("NUMA: bad reg property in memory node\n");
-			break;
-		}
+		for (i = 0; !r && !of_address_to_resource(np, i, &rsrc); i++)
+			r = numa_add_memblk(nid, rsrc.start, rsrc.end + 1);

-		r = numa_add_memblk(nid, rsrc.start, rsrc.end + 1);
-		if (r)
-			break;
+		if (!i || r) {
+			of_node_put(np);
+			pr_err("NUMA: bad property in memory node\n");
+			return r ? : -EINVAL;
+		}
 	}
-	of_node_put(np);

-	return r;
+	return 0;
 }

 static int __init of_numa_parse_distance_map_v1(struct device_node *map)
--
2.5.0


* [PATCH v7 03/14] arm64/numa: add nid check for memory block
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 01/14] of/numa: remove a duplicated pr_debug information Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 02/14] of/numa: fix a memory@ node can only contains one memory block Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-26 12:39   ` Will Deacon
  2016-08-24  7:44 ` [PATCH v7 04/14] of/numa: remove a duplicated warning Zhen Lei
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

Use the same tactic as for the cpu and numa-distance nodes.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 drivers/of/of_numa.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
index 7b3fbdc..afaeb9c 100644
--- a/drivers/of/of_numa.c
+++ b/drivers/of/of_numa.c
@@ -75,6 +75,11 @@ static int __init of_numa_parse_memory_nodes(void)
 			 */
 			continue;

+		if (nid >= MAX_NUMNODES) {
+			pr_warn("NUMA: Node id %u exceeds maximum value\n", nid);
+			return -EINVAL;
+		}
+
 		for (i = 0; !r && !of_address_to_resource(np, i, &rsrc); i++)
 			r = numa_add_memblk(nid, rsrc.start, rsrc.end + 1);

--
2.5.0


* [PATCH v7 04/14] of/numa: remove a duplicated warning
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (2 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 03/14] arm64/numa: add nid check for " Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed Zhen Lei
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

This warning has already been printed by of_numa_parse_cpu_nodes.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Rob Herring <robh@kernel.org>
---
 drivers/of/of_numa.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
index afaeb9c..8723f64 100644
--- a/drivers/of/of_numa.c
+++ b/drivers/of/of_numa.c
@@ -179,13 +179,8 @@ int of_node_to_nid(struct device_node *device)
 			np->name);
 	of_node_put(np);

-	if (!r) {
-		if (nid >= MAX_NUMNODES)
-			pr_warn("NUMA: Node id %u exceeds maximum value\n",
-				nid);
-		else
-			return nid;
-	}
+	if (!r)
+		return nid;

 	return NUMA_NO_NODE;
 }
--
2.5.0


* [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (3 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 04/14] of/numa: remove a duplicated warning Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-26 12:47   ` Will Deacon
  2016-08-24  7:44 ` [PATCH v7 06/14] of_numa: Use of_get_next_parent to simplify code Zhen Lei
                   ` (8 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

numa_init(of_numa_init) may return an error because of a NUMA configuration
error, so "No NUMA configuration found" is inaccurate in that case. Instead,
the specific configuration error should be printed immediately by the
branch that detects it.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 arch/arm64/mm/numa.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 5bb15ea..d97c6e2 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -335,8 +335,10 @@ static int __init numa_init(int (*init_func)(void))
 	if (ret < 0)
 		return ret;

-	if (nodes_empty(numa_nodes_parsed))
+	if (nodes_empty(numa_nodes_parsed)) {
+		pr_info("No NUMA configuration found\n");
 		return -EINVAL;
+	}

 	ret = numa_register_nodes();
 	if (ret < 0)
@@ -367,8 +369,6 @@ static int __init dummy_numa_init(void)

 	if (numa_off)
 		pr_info("NUMA disabled\n"); /* Forced off on command line. */
-	else
-		pr_info("No NUMA configuration found\n");
 	pr_info("NUMA: Faking a node at [mem %#018Lx-%#018Lx]\n",
 	       0LLU, PFN_PHYS(max_pfn) - 1);

--
2.5.0


* [PATCH v7 06/14] of_numa: Use of_get_next_parent to simplify code
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (4 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 07/14] of_numa: Use pr_fmt() Zhen Lei
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

From: Kefeng Wang <wangkefeng.wang@huawei.com>

Use of_get_next_parent() instead of open-coding it.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Rob Herring <robh@kernel.org>
---
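Not part of the commit: a toy model of why the three open-coded lines can
collapse into one call. It assumes only what the removed lines show, namely
that the helper takes a reference on the parent and drops the reference on
the node passed in. struct toy_node and the toy_* functions are invented for
this sketch and are not the OF API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted node; has_nid marks "numa-node-id" present. */
struct toy_node {
	struct toy_node *parent;
	int refcount;
	int has_nid;
};

static struct toy_node *toy_get(struct toy_node *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void toy_put(struct toy_node *n)
{
	if (n)
		n->refcount--;
}

/* Same three steps as the removed code: get parent, put child, advance. */
static struct toy_node *toy_get_next_parent(struct toy_node *n)
{
	struct toy_node *parent;

	if (!n)
		return NULL;
	parent = toy_get(n->parent);
	toy_put(n);
	return parent;
}

/* Walk upwards until a node carries the property; caller must toy_put(). */
static struct toy_node *toy_find_nid_holder(struct toy_node *start)
{
	struct toy_node *np = toy_get(start);

	while (np && !np->has_nid)
		np = toy_get_next_parent(np);

	return np;
}
```

After the walk, only the node that is returned holds an extra reference;
every node that was stepped over is back at its starting count.
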
 drivers/of/of_numa.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
index 8723f64..ed103e6 100644
--- a/drivers/of/of_numa.c
+++ b/drivers/of/of_numa.c
@@ -158,8 +158,6 @@ int of_node_to_nid(struct device_node *device)
 	np = of_node_get(device);

 	while (np) {
-		struct device_node *parent;
-
 		r = of_property_read_u32(np, "numa-node-id", &nid);
 		/*
 		 * -EINVAL indicates the property was not found, and
@@ -170,9 +168,7 @@ int of_node_to_nid(struct device_node *device)
 		if (r != -EINVAL)
 			break;

-		parent = of_get_parent(np);
-		of_node_put(np);
-		np = parent;
+		np = of_get_next_parent(np);
 	}
 	if (np && r)
 		pr_warn("NUMA: Invalid \"numa-node-id\" property in node %s\n",
--
2.5.0


* [PATCH v7 07/14] of_numa: Use pr_fmt()
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (5 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 06/14] of_numa: Use of_get_next_parent to simplify code Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 08/14] arm64: numa: " Zhen Lei
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

From: Kefeng Wang <wangkefeng.wang@huawei.com>

Use pr_fmt to prefix kernel output.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Rob Herring <robh@kernel.org>
---
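Not part of the commit: a small userspace sketch of how a pr_fmt() style
macro prefixes every message through string-literal concatenation. The
last_msg buffer and warn_bad_nid() are assumptions for the sketch, and
snprintf() stands in for the kernel's printk wrappers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Adjacent string literals are concatenated at compile time. */
#define pr_fmt(fmt) "OF: NUMA: " fmt

/* Hypothetical capture buffer so the formatted text can be checked. */
static char last_msg[128];

static void warn_bad_nid(unsigned int nid)
{
	/* The call site no longer spells out the "NUMA: " prefix itself. */
	snprintf(last_msg, sizeof(last_msg),
		 pr_fmt("Node id %u exceeds maximum value\n"), nid);
}
```

In the kernel the pr_*() macros apply pr_fmt() themselves, which is why the
define has to appear before the includes, as it does in the hunk below.
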
 drivers/of/of_numa.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
index ed103e6..1234b4a 100644
--- a/drivers/of/of_numa.c
+++ b/drivers/of/of_numa.c
@@ -16,6 +16,8 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */

+#define pr_fmt(fmt) "OF: NUMA: " fmt
+
 #include <linux/of.h>
 #include <linux/of_address.h>
 #include <linux/nodemask.h>
@@ -49,10 +51,9 @@ static void __init of_numa_parse_cpu_nodes(void)
 		if (r)
 			continue;

-		pr_debug("NUMA: CPU on %u\n", nid);
+		pr_debug("CPU on %u\n", nid);
 		if (nid >= MAX_NUMNODES)
-			pr_warn("NUMA: Node id %u exceeds maximum value\n",
-				nid);
+			pr_warn("Node id %u exceeds maximum value\n", nid);
 		else
 			node_set(nid, numa_nodes_parsed);
 	}
@@ -76,7 +77,7 @@ static int __init of_numa_parse_memory_nodes(void)
 			continue;

 		if (nid >= MAX_NUMNODES) {
-			pr_warn("NUMA: Node id %u exceeds maximum value\n", nid);
+			pr_warn("Node id %u exceeds maximum value\n", nid);
 			return -EINVAL;
 		}

@@ -85,7 +86,7 @@ static int __init of_numa_parse_memory_nodes(void)

 		if (!i || r) {
 			of_node_put(np);
-			pr_err("NUMA: bad property in memory node\n");
+			pr_err("bad property in memory node\n");
 			return r ? : -EINVAL;
 		}
 	}
@@ -99,17 +100,17 @@ static int __init of_numa_parse_distance_map_v1(struct device_node *map)
 	int entry_count;
 	int i;

-	pr_info("NUMA: parsing numa-distance-map-v1\n");
+	pr_info("parsing numa-distance-map-v1\n");

 	matrix = of_get_property(map, "distance-matrix", NULL);
 	if (!matrix) {
-		pr_err("NUMA: No distance-matrix property in distance-map\n");
+		pr_err("No distance-matrix property in distance-map\n");
 		return -EINVAL;
 	}

 	entry_count = of_property_count_u32_elems(map, "distance-matrix");
 	if (entry_count <= 0) {
-		pr_err("NUMA: Invalid distance-matrix\n");
+		pr_err("Invalid distance-matrix\n");
 		return -EINVAL;
 	}

@@ -124,7 +125,7 @@ static int __init of_numa_parse_distance_map_v1(struct device_node *map)
 		matrix++;

 		numa_set_distance(nodea, nodeb, distance);
-		pr_debug("NUMA:  distance[node%d -> node%d] = %d\n",
+		pr_debug("distance[node%d -> node%d] = %d\n",
 			 nodea, nodeb, distance);

 		/* Set default distance of node B->A same as A->B */
@@ -171,7 +172,7 @@ int of_node_to_nid(struct device_node *device)
 		np = of_get_next_parent(np);
 	}
 	if (np && r)
-		pr_warn("NUMA: Invalid \"numa-node-id\" property in node %s\n",
+		pr_warn("Invalid \"numa-node-id\" property in node %s\n",
 			np->name);
 	of_node_put(np);

--
2.5.0


* [PATCH v7 08/14] arm64: numa: Use pr_fmt()
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (6 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 07/14] of_numa: Use pr_fmt() Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-26 12:54   ` Will Deacon
  2016-08-24  7:44 ` [PATCH v7 09/14] arm64/numa: support HAVE_SETUP_PER_CPU_AREA Zhen Lei
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

From: Kefeng Wang <wangkefeng.wang@huawei.com>

Use pr_fmt to prefix kernel output, and remove the duplicated
"NUMA turned off" message.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 arch/arm64/mm/numa.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index d97c6e2..7b73808 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -17,6 +17,8 @@
  * along with this program.  If not, see <http://www.gnu.org/licenses/>.
  */

+#define pr_fmt(fmt) "numa: " fmt
+
 #include <linux/acpi.h>
 #include <linux/bootmem.h>
 #include <linux/memblock.h>
@@ -38,10 +40,9 @@ static __init int numa_parse_early_param(char *opt)
 {
 	if (!opt)
 		return -EINVAL;
-	if (!strncmp(opt, "off", 3)) {
-		pr_info("%s\n", "NUMA turned off");
+	if (!strncmp(opt, "off", 3))
 		numa_off = true;
-	}
+
 	return 0;
 }
 early_param("numa", numa_parse_early_param);
@@ -110,7 +111,7 @@ static void __init setup_node_to_cpumask_map(void)
 		set_cpu_numa_node(cpu, NUMA_NO_NODE);

 	/* cpumask_of_node() will now work */
-	pr_debug("NUMA: Node to cpumask map for %d nodes\n", nr_node_ids);
+	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
 }

 /*
@@ -145,13 +146,13 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)

 	ret = memblock_set_node(start, (end - start), &memblock.memory, nid);
 	if (ret < 0) {
-		pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
+		pr_err("memblock [0x%llx - 0x%llx] failed to add on node %d\n",
 			start, (end - 1), nid);
 		return ret;
 	}

 	node_set(nid, numa_nodes_parsed);
-	pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
+	pr_info("Adding memblock [0x%llx - 0x%llx] on node %d\n",
 			start, (end - 1), nid);
 	return ret;
 }
@@ -166,19 +167,18 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 	void *nd;
 	int tnid;

-	pr_info("NUMA: Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
-			nid, start_pfn << PAGE_SHIFT,
-			(end_pfn << PAGE_SHIFT) - 1);
+	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
+		nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);

 	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
 	nd = __va(nd_pa);

 	/* report and initialize */
-	pr_info("NUMA: NODE_DATA [mem %#010Lx-%#010Lx]\n",
+	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
 		nd_pa, nd_pa + nd_size - 1);
 	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
 	if (tnid != nid)
-		pr_info("NUMA: NODE_DATA(%d) on node %d\n", nid, tnid);
+		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);

 	node_data[nid] = nd;
 	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
@@ -235,8 +235,7 @@ static int __init numa_alloc_distance(void)
 			numa_distance[i * numa_distance_cnt + j] = i == j ?
 				LOCAL_DISTANCE : REMOTE_DISTANCE;

-	pr_debug("NUMA: Initialized distance table, cnt=%d\n",
-			numa_distance_cnt);
+	pr_debug("Initialized distance table, cnt=%d\n", numa_distance_cnt);

 	return 0;
 }
@@ -257,20 +256,20 @@ static int __init numa_alloc_distance(void)
 void __init numa_set_distance(int from, int to, int distance)
 {
 	if (!numa_distance) {
-		pr_warn_once("NUMA: Warning: distance table not allocated yet\n");
+		pr_warn_once("Warning: distance table not allocated yet\n");
 		return;
 	}

 	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
 			from < 0 || to < 0) {
-		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
+		pr_warn_once("Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
 			    from, to, distance);
 		return;
 	}

 	if ((u8)distance != distance ||
 	    (from == to && distance != LOCAL_DISTANCE)) {
-		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
+		pr_warn_once("Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
 			     from, to, distance);
 		return;
 	}
@@ -297,7 +296,7 @@ static int __init numa_register_nodes(void)
 	/* Check that valid nid is set to memblks */
 	for_each_memblock(memory, mblk)
 		if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES) {
-			pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
+			pr_warn("Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
 				mblk->nid, mblk->base,
 				mblk->base + mblk->size - 1);
 			return -EINVAL;
@@ -368,9 +367,10 @@ static int __init dummy_numa_init(void)
 	struct memblock_region *mblk;

 	if (numa_off)
-		pr_info("NUMA disabled\n"); /* Forced off on command line. */
-	pr_info("NUMA: Faking a node at [mem %#018Lx-%#018Lx]\n",
-	       0LLU, PFN_PHYS(max_pfn) - 1);
+		pr_warn("NUMA turned off by user\n"); /* Forced off on command line. */
+
+	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
+		0LLU, PFN_PHYS(max_pfn) - 1);

 	for_each_memblock(memory, mblk) {
 		ret = numa_add_memblk(0, mblk->base, mblk->base + mblk->size);
--
2.5.0


* [PATCH v7 09/14] arm64/numa: support HAVE_SETUP_PER_CPU_AREA
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (7 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 08/14] arm64: numa: " Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-26 13:28   ` Will Deacon
  2016-08-24  7:44 ` [PATCH v7 10/14] arm64/numa: define numa_distance as array to simplify code Zhen Lei
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

Allocate each percpu area from its local NUMA node. Without this
patch, all percpu areas are allocated from the node that cpu0 belongs
to.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
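Not part of the commit: a userspace sketch of the two small calculations
this patch relies on, the cpu-to-cpu distance callback and the per-cpu
offset arithmetic. The four-cpu layout, the toy_* names and the addresses
used with them are assumptions for illustration; LOCAL_DISTANCE and
REMOTE_DISTANCE use the generic kernel defaults of 10 and 20.

```c
#include <assert.h>

#define LOCAL_DISTANCE   10
#define REMOTE_DISTANCE  20
#define NR_CPUS          4	/* assumed small value for the sketch */

/* Hypothetical layout: cpu0/cpu1 on node 0, cpu2/cpu3 on node 1. */
static const int cpu_to_node_map[NR_CPUS] = { 0, 0, 1, 1 };

/* Same rule as pcpu_cpu_distance(): same node is local, else remote. */
static int toy_cpu_distance(unsigned int from, unsigned int to)
{
	assert(from < NR_CPUS && to < NR_CPUS);

	if (cpu_to_node_map[from] == cpu_to_node_map[to])
		return LOCAL_DISTANCE;
	return REMOTE_DISTANCE;
}

/*
 * offset[cpu] = (base - start) + unit_offsets[cpu], where base plays the
 * role of pcpu_base_addr and start the role of __per_cpu_start.
 */
static void toy_fill_offsets(unsigned long base, unsigned long start,
			     const unsigned long *unit_offsets,
			     unsigned long *out)
{
	unsigned long delta = base - start;
	unsigned int cpu;

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		out[cpu] = delta + unit_offsets[cpu];
}
```

The distance callback is what lets the first-chunk allocator group cpus of
the same node together, so that each group's unit can be allocated from
that node.
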
 arch/arm64/Kconfig   |  8 ++++++++
 arch/arm64/mm/numa.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index bc3f00f..2815af6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -603,6 +603,14 @@ config USE_PERCPU_NUMA_NODE_ID
 	def_bool y
 	depends on NUMA

+config HAVE_SETUP_PER_CPU_AREA
+	def_bool y
+	depends on NUMA
+
+config NEED_PER_CPU_EMBED_FIRST_CHUNK
+	def_bool y
+	depends on NUMA
+
 source kernel/Kconfig.preempt
 source kernel/Kconfig.hz

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 7b73808..5e44ad1 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -26,6 +26,7 @@
 #include <linux/of.h>

 #include <asm/acpi.h>
+#include <asm/sections.h>

 struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
 EXPORT_SYMBOL(node_data);
@@ -131,6 +132,60 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
 	cpu_to_node_map[cpu] = nid;
 }

+#ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
+unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
+EXPORT_SYMBOL(__per_cpu_offset);
+
+static int __init early_cpu_to_node(int cpu)
+{
+	return cpu_to_node_map[cpu];
+}
+
+static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
+{
+	if (early_cpu_to_node(from) == early_cpu_to_node(to))
+		return LOCAL_DISTANCE;
+	else
+		return REMOTE_DISTANCE;
+}
+
+static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size,
+				       size_t align)
+{
+	int nid = early_cpu_to_node(cpu);
+
+	return  memblock_virt_alloc_try_nid(size, align,
+			__pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, nid);
+}
+
+static void __init pcpu_fc_free(void *ptr, size_t size)
+{
+	memblock_free_early(__pa(ptr), size);
+}
+
+void __init setup_per_cpu_areas(void)
+{
+	unsigned long delta;
+	unsigned int cpu;
+	int rc;
+
+	/*
+	 * Always reserve area for module percpu variables.  That's
+	 * what the legacy allocator did.
+	 */
+	rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
+				    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
+				    pcpu_cpu_distance,
+				    pcpu_fc_alloc, pcpu_fc_free);
+	if (rc < 0)
+		panic("Failed to initialize percpu areas.");
+
+	delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
+	for_each_possible_cpu(cpu)
+		__per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
+}
+#endif
+
 /**
  * numa_add_memblk - Set node id to memblk
  * @nid: NUMA node ID of the new memblk
--
2.5.0


* [PATCH v7 10/14] arm64/numa: define numa_distance as array to simplify code
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (8 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 09/14] arm64/numa: support HAVE_SETUP_PER_CPU_AREA Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-26 15:29   ` Will Deacon
  2016-08-24  7:44 ` [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES Zhen Lei
                   ` (3 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

1. MAX_NUMNODES is based on CONFIG_NODES_SHIFT, and the default value of the
   latter is currently very small.
2. Even if the default value of MAX_NUMNODES is enlarged to 64, the size of
   numa_distance is only 4K, which is still acceptable when the Image runs
   on other processors.
3. It makes the function __node_distance faster than before.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
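Not part of the commit: a userspace sketch of the fixed-size table, mainly
to make the 4K figure concrete (64 * 64 entries of one byte each). The
set_distance() and node_distance() names are invented for the sketch, and
LOCAL_DISTANCE and REMOTE_DISTANCE use the generic defaults of 10 and 20.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NUMNODES     64	/* the enlarged value assumed in point 2 */
#define LOCAL_DISTANCE   10
#define REMOTE_DISTANCE  20

/* 64 * 64 * sizeof(uint8_t) = 4096 bytes, i.e. the 4K from point 2. */
static uint8_t numa_distance[MAX_NUMNODES][MAX_NUMNODES];

/* Returns 0 on success, -1 if the ids or the distance are rejected. */
static int set_distance(int from, int to, int distance)
{
	if (from < 0 || to < 0 || from >= MAX_NUMNODES || to >= MAX_NUMNODES)
		return -1;

	/* Must fit in a byte, and a node is always LOCAL_DISTANCE to itself. */
	if ((uint8_t)distance != distance ||
	    (from == to && distance != LOCAL_DISTANCE))
		return -1;

	numa_distance[from][to] = (uint8_t)distance;
	return 0;
}

/* Plain two-index lookup; no multiplication by a runtime count. */
static int node_distance(int from, int to)
{
	if (from < 0 || to < 0 || from >= MAX_NUMNODES || to >= MAX_NUMNODES)
		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;

	return numa_distance[from][to];
}
```

Note that entries which were never set read back as 0 in this sketch,
because a static array starts out zeroed.
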
 arch/arm64/include/asm/numa.h |  1 -
 arch/arm64/mm/numa.c          | 74 +++----------------------------------------
 2 files changed, 5 insertions(+), 70 deletions(-)

diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
index 600887e..9b6cc38 100644
--- a/arch/arm64/include/asm/numa.h
+++ b/arch/arm64/include/asm/numa.h
@@ -32,7 +32,6 @@ static inline const struct cpumask *cpumask_of_node(int node)
 void __init arm64_numa_init(void);
 int __init numa_add_memblk(int nodeid, u64 start, u64 end);
 void __init numa_set_distance(int from, int to, int distance);
-void __init numa_free_distance(void);
 void __init early_map_cpu_to_node(unsigned int cpu, int nid);
 void numa_store_cpu_info(unsigned int cpu);

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 5e44ad1..6853db7 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -33,8 +33,7 @@ EXPORT_SYMBOL(node_data);
 nodemask_t numa_nodes_parsed __initdata;
 static int cpu_to_node_map[NR_CPUS] = { [0 ... NR_CPUS-1] = NUMA_NO_NODE };

-static int numa_distance_cnt;
-static u8 *numa_distance;
+static u8 numa_distance[MAX_NUMNODES][MAX_NUMNODES];
 static bool numa_off;

 static __init int numa_parse_early_param(char *opt)
@@ -243,59 +242,6 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 }

 /**
- * numa_free_distance
- *
- * The current table is freed.
- */
-void __init numa_free_distance(void)
-{
-	size_t size;
-
-	if (!numa_distance)
-		return;
-
-	size = numa_distance_cnt * numa_distance_cnt *
-		sizeof(numa_distance[0]);
-
-	memblock_free(__pa(numa_distance), size);
-	numa_distance_cnt = 0;
-	numa_distance = NULL;
-}
-
-/**
- *
- * Create a new NUMA distance table.
- *
- */
-static int __init numa_alloc_distance(void)
-{
-	size_t size;
-	u64 phys;
-	int i, j;
-
-	size = nr_node_ids * nr_node_ids * sizeof(numa_distance[0]);
-	phys = memblock_find_in_range(0, PFN_PHYS(max_pfn),
-				      size, PAGE_SIZE);
-	if (WARN_ON(!phys))
-		return -ENOMEM;
-
-	memblock_reserve(phys, size);
-
-	numa_distance = __va(phys);
-	numa_distance_cnt = nr_node_ids;
-
-	/* fill with the default distances */
-	for (i = 0; i < numa_distance_cnt; i++)
-		for (j = 0; j < numa_distance_cnt; j++)
-			numa_distance[i * numa_distance_cnt + j] = i == j ?
-				LOCAL_DISTANCE : REMOTE_DISTANCE;
-
-	pr_debug("Initialized distance table, cnt=%d\n", numa_distance_cnt);
-
-	return 0;
-}
-
-/**
  * numa_set_distance - Set inter node NUMA distance from node to node.
  * @from: the 'from' node to set distance
  * @to: the 'to'  node to set distance
@@ -310,12 +256,7 @@ static int __init numa_alloc_distance(void)
  */
 void __init numa_set_distance(int from, int to, int distance)
 {
-	if (!numa_distance) {
-		pr_warn_once("Warning: distance table not allocated yet\n");
-		return;
-	}
-
-	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
+	if (from >= MAX_NUMNODES || to >= MAX_NUMNODES ||
 			from < 0 || to < 0) {
 		pr_warn_once("Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
 			    from, to, distance);
@@ -329,7 +270,7 @@ void __init numa_set_distance(int from, int to, int distance)
 		return;
 	}

-	numa_distance[from * numa_distance_cnt + to] = distance;
+	numa_distance[from][to] = distance;
 }

 /**
@@ -337,9 +278,9 @@ void __init numa_set_distance(int from, int to, int distance)
  */
 int __node_distance(int from, int to)
 {
-	if (from >= numa_distance_cnt || to >= numa_distance_cnt)
+	if (from >= MAX_NUMNODES || to >= MAX_NUMNODES)
 		return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
-	return numa_distance[from * numa_distance_cnt + to];
+	return numa_distance[from][to];
 }
 EXPORT_SYMBOL(__node_distance);

@@ -379,11 +320,6 @@ static int __init numa_init(int (*init_func)(void))
 	nodes_clear(numa_nodes_parsed);
 	nodes_clear(node_possible_map);
 	nodes_clear(node_online_map);
-	numa_free_distance();
-
-	ret = numa_alloc_distance();
-	if (ret < 0)
-		return ret;

 	ret = init_func();
 	if (ret < 0)
--
2.5.0


* [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (9 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 10/14] arm64/numa: define numa_distance as array to simplify code Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-26 15:43   ` Will Deacon
  2016-08-24  7:44 ` [PATCH v7 12/14] arm64/numa: remove the limitation that cpu0 must bind to node0 Zhen Lei
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

Some numa nodes may have no memory. For example:
1. cpu0 on node0
2. cpu1 on node1
3. device0 takes the same time to access the memory of node0 and node1.

So we cannot simply assign device0 to node0 or node1, but we can
define a node2 whose distances to node0 and node1 are the same.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 arch/arm64/Kconfig      |  4 ++++
 arch/arm64/kernel/smp.c |  1 +
 arch/arm64/mm/numa.c    | 43 +++++++++++++++++++++++++++++++++++++++++--
 3 files changed, 46 insertions(+), 2 deletions(-)
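
A user-space sketch of the nearest-node fallback that
alloc_node_data_from_nearest_node() performs for a memoryless node. The
names nearest_node_with_memory, dist_sketch, NR_NODES_SKETCH and NO_NODE
are hypothetical, and the has_memory[] array stands in for a memblock
allocation attempt on that node. Unlike the patch, which hits BUG_ON()
when no candidate is left, this sketch returns NO_NODE.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_NODES_SKETCH 4	/* hypothetical node count */
#define NO_NODE (-1)		/* stands in for NUMA_NO_NODE */

/* Hypothetical distance table; node2 is equally far from node0 and node1. */
static const unsigned char dist_sketch[NR_NODES_SKETCH][NR_NODES_SKETCH] = {
	{ 10, 20, 15, 30 },
	{ 20, 10, 15, 30 },
	{ 15, 15, 10, 40 },
	{ 30, 30, 40, 10 },
};

/*
 * Return the closest node to nid (excluding nid itself) that has memory,
 * or NO_NODE if every other node is memoryless. Nodes that fail are
 * marked as tried and the search restarts, as in the patch.
 */
static int nearest_node_with_memory(int nid, const bool *has_memory)
{
	bool tried[NR_NODES_SKETCH] = { false };

	tried[nid] = true;

	for (;;) {
		int i, best = NO_NODE, d = INT_MAX;

		for (i = 0; i < NR_NODES_SKETCH; i++)
			if (!tried[i] && dist_sketch[nid][i] < d) {
				best = i;
				d = dist_sketch[nid][i];
			}

		if (best == NO_NODE)
			return NO_NODE;		/* nothing left to try */
		if (has_memory[best])
			return best;		/* "allocation" succeeds */
		tried[best] = true;		/* skip it next round */
	}
}
```

With node2 memoryless, the search picks node0 first; if node0 had no
memory either, it would move on to node1 and then node3.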

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 2815af6..3a2b6ed 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -611,6 +611,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
 	def_bool y
 	depends on NUMA

+config HAVE_MEMORYLESS_NODES
+	def_bool y
+	depends on NUMA
+
 source kernel/Kconfig.preempt
 source kernel/Kconfig.hz

diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index d93d433..4879085 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -619,6 +619,7 @@ static void __init of_parse_and_init_cpus(void)
 			}

 			bootcpu_valid = true;
+			early_map_cpu_to_node(0, of_node_to_nid(dn));

 			/*
 			 * cpu_logical_map has already been
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 6853db7..114180f 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -129,6 +129,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
 		nid = 0;

 	cpu_to_node_map[cpu] = nid;
+
+	/*
+	 * We should set the numa node of cpu0 as soon as possible, because it
+	 * has already been set up online before. cpu_to_node(0) will soon be
+	 * called.
+	 */
+	if (!cpu)
+		set_cpu_numa_node(cpu, nid);
 }

 #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
@@ -211,6 +219,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
 	return ret;
 }

+static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
+{
+	int i, best_nid, distance;
+	u64 pa;
+	DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
+
+	bitmap_zero(nodes_map, MAX_NUMNODES);
+	bitmap_set(nodes_map, nid, 1);
+
+find_nearest_node:
+	best_nid = NUMA_NO_NODE;
+	distance = INT_MAX;
+
+	for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
+		if (numa_distance[nid][i] < distance) {
+			best_nid = i;
+			distance = numa_distance[nid][i];
+		}
+
+	pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid);
+	if (!pa) {
+		BUG_ON(best_nid == NUMA_NO_NODE);
+		bitmap_set(nodes_map, best_nid, 1);
+		goto find_nearest_node;
+	}
+
+	return pa;
+}
+
 /**
  * Initialize NODE_DATA for a node on the local memory
  */
@@ -224,7 +261,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
 		nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);

-	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
+	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+	if (!nd_pa)
+		nd_pa = alloc_node_data_from_nearest_node(nid, nd_size);
 	nd = __va(nd_pa);

 	/* report and initialize */
@@ -234,7 +273,7 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
 	if (tnid != nid)
 		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);

-	node_data[nid] = nd;
+	NODE_DATA(nid) = nd;
 	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
 	NODE_DATA(nid)->node_id = nid;
 	NODE_DATA(nid)->node_start_pfn = start_pfn;
--
2.5.0


* [PATCH v7 12/14] arm64/numa: remove the limitation that cpu0 must bind to node0
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (10 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-26 15:49   ` Will Deacon
  2016-08-24  7:44 ` [PATCH v7 13/14] of/numa: remove the constraint on the distances of node pairs Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 14/14] Documentation: " Zhen Lei
  13 siblings, 1 reply; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

1. Currently only cpu0 is set in cpu_possible_mask, and the percpu areas
   have not been initialized.
2. There is no reason to require that cpu0 belongs to node0.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
---
 arch/arm64/mm/numa.c | 12 ++----------
 1 file changed, 2 insertions(+), 10 deletions(-)
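
A user-space sketch of the node selection in early_map_cpu_to_node()
after this patch. The _sketch suffixed names and the small table sizes
are hypothetical; only the condition itself is taken from the diff below.
It shows that cpu0 may now land on a non-zero node, while "numa=off"
still forces every cpu back to node 0.

```c
#include <stdbool.h>

#define MAX_NUMNODES_SKETCH 4	/* hypothetical limits for the sketch */
#define NR_CPUS_SKETCH 4

static bool numa_off_sketch;	/* stands in for numa_off */
static int cpu_to_node_map_sketch[NR_CPUS_SKETCH];

static void early_map_cpu_to_node_sketch(unsigned int cpu, int nid)
{
	/* fall back to node 0 for invalid ids or when NUMA is disabled */
	if (nid < 0 || nid >= MAX_NUMNODES_SKETCH || numa_off_sketch)
		nid = 0;

	cpu_to_node_map_sketch[cpu] = nid;
}
```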

diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
index 114180f..07a1978 100644
--- a/arch/arm64/mm/numa.c
+++ b/arch/arm64/mm/numa.c
@@ -94,7 +94,6 @@ void numa_clear_node(unsigned int cpu)
  */
 static void __init setup_node_to_cpumask_map(void)
 {
-	unsigned int cpu;
 	int node;

 	/* setup nr_node_ids if not done yet */
@@ -107,9 +106,6 @@ static void __init setup_node_to_cpumask_map(void)
 		cpumask_clear(node_to_cpumask_map[node]);
 	}

-	for_each_possible_cpu(cpu)
-		set_cpu_numa_node(cpu, NUMA_NO_NODE);
-
 	/* cpumask_of_node() will now work */
 	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
 }
@@ -119,13 +115,13 @@ static void __init setup_node_to_cpumask_map(void)
  */
 void numa_store_cpu_info(unsigned int cpu)
 {
-	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
+	map_cpu_to_node(cpu, cpu_to_node_map[cpu]);
 }

 void __init early_map_cpu_to_node(unsigned int cpu, int nid)
 {
 	/* fallback to node 0 */
-	if (nid < 0 || nid >= MAX_NUMNODES)
+	if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
 		nid = 0;

 	cpu_to_node_map[cpu] = nid;
@@ -375,10 +371,6 @@ static int __init numa_init(int (*init_func)(void))

 	setup_node_to_cpumask_map();

-	/* init boot processor */
-	cpu_to_node_map[0] = 0;
-	map_cpu_to_node(0, 0);
-
 	return 0;
 }

--
2.5.0


* [PATCH v7 13/14] of/numa: remove the constraint on the distances of node pairs
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (11 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 12/14] arm64/numa: remove the limitation that cpu0 must bind to node0 Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-24  7:44 ` [PATCH v7 14/14] Documentation: " Zhen Lei
  13 siblings, 0 replies; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

At present, the distances must be equal in both directions for each node
pair. For example, the distance of node B->A must be the same as A->B.
But we really don't have to do this.

Fill in the default distances at the end, as below:
1. If both directions are specified, leave them unchanged.
2. If only one direction is specified, assign it to the other direction.
3. If neither direction is specified, both are assigned
   REMOTE_DISTANCE.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Rob Herring <robh@kernel.org>
---
 drivers/of/of_numa.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)
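
A user-space sketch of the three fill rules above, assuming that a
distance of 0 means "not specified" (as it would for a zero-initialized
table). The names dist_sketch, NR_NODES_SKETCH and
fill_default_distances_sketch are hypothetical; the constants mirror the
usual LOCAL_DISTANCE and REMOTE_DISTANCE values.

```c
#define NR_NODES_SKETCH 3	/* hypothetical node count */
#define LOCAL_DISTANCE_SKETCH 10
#define REMOTE_DISTANCE_SKETCH 20

/* 0 means "not specified by the distance map" in this sketch */
static int dist_sketch[NR_NODES_SKETCH][NR_NODES_SKETCH];

static void fill_default_distances_sketch(void)
{
	int i, j;

	for (i = 0; i < NR_NODES_SKETCH; i++)
		for (j = 0; j < NR_NODES_SKETCH; j++) {
			if (i == j)
				dist_sketch[i][j] = LOCAL_DISTANCE_SKETCH;
			else if (!dist_sketch[i][j])
				/* copy the reverse direction, else use the default */
				dist_sketch[i][j] = dist_sketch[j][i] ?
					dist_sketch[j][i] : REMOTE_DISTANCE_SKETCH;
		}
}
```

If 0->1 is 15 and 1->0 is 30, both stay as they are (rule 1); if only
2->0 is given, 0->2 receives the same value (rule 2); a pair with no
entry at all ends up with the remote default in both directions (rule 3).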

diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
index 1234b4a..d0cdf29 100644
--- a/drivers/of/of_numa.c
+++ b/drivers/of/of_numa.c
@@ -127,15 +127,25 @@ static int __init of_numa_parse_distance_map_v1(struct device_node *map)
 		numa_set_distance(nodea, nodeb, distance);
 		pr_debug("distance[node%d -> node%d] = %d\n",
 			 nodea, nodeb, distance);
-
-		/* Set default distance of node B->A same as A->B */
-		if (nodeb > nodea)
-			numa_set_distance(nodeb, nodea, distance);
 	}

 	return 0;
 }

+static void __init fill_default_distances(void)
+{
+	int i, j;
+
+	for (i = 0; i < nr_node_ids; i++)
+		for (j = 0; j < nr_node_ids; j++)
+			if (i == j)
+				numa_set_distance(i, j, LOCAL_DISTANCE);
+			else if (!node_distance(i, j))
+				numa_set_distance(i, j,
+				    node_distance(j, i) ? : REMOTE_DISTANCE);
+
+}
+
 static int __init of_numa_parse_distance_map(void)
 {
 	int ret = 0;
@@ -145,8 +155,10 @@ static int __init of_numa_parse_distance_map(void)
 				     "numa-distance-map-v1");
 	if (np)
 		ret = of_numa_parse_distance_map_v1(np);
-
 	of_node_put(np);
+
+	fill_default_distances();
+
 	return ret;
 }

--
2.5.0


* [PATCH v7 14/14] Documentation: remove the constraint on the distances of node pairs
  2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
                   ` (12 preceding siblings ...)
  2016-08-24  7:44 ` [PATCH v7 13/14] of/numa: remove the constraint on the distances of node pairs Zhen Lei
@ 2016-08-24  7:44 ` Zhen Lei
  2016-08-26 15:35   ` Will Deacon
  13 siblings, 1 reply; 36+ messages in thread
From: Zhen Lei @ 2016-08-24  7:44 UTC (permalink / raw)
  To: Catalin Marinas, Will Deacon, linux-arm-kernel, linux-kernel,
	Rob Herring, Frank Rowand, devicetree
  Cc: Zefan Li, Xinwei Hu, Tianhong Ding, Hanjun Guo, Zhen Lei

Update documentation. This limit is unnecessary.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/numa.txt | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt
index 21b3505..c0ea4a7 100644
--- a/Documentation/devicetree/bindings/numa.txt
+++ b/Documentation/devicetree/bindings/numa.txt
@@ -48,7 +48,6 @@ distance (memory latency) between all numa nodes.

   Note:
 	1. Each entry represents distance from first node to second node.
-	The distances are equal in either direction.
 	2. The distance from a node to self (local distance) is represented
 	with value 10 and all internode distance should be represented with
 	a value greater than 10.
--
2.5.0


* Re: [PATCH v7 03/14] arm64/numa: add nid check for memory block
  2016-08-24  7:44 ` [PATCH v7 03/14] arm64/numa: add nid check for " Zhen Lei
@ 2016-08-26 12:39   ` Will Deacon
  2016-08-27  8:02     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2016-08-26 12:39 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo

On Wed, Aug 24, 2016 at 03:44:42PM +0800, Zhen Lei wrote:
> Use the same tactic to cpu and numa-distance nodes.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  drivers/of/of_numa.c | 5 +++++
>  1 file changed, 5 insertions(+)

The subject has arm64/numa, but this is clearly core OF code and
requires an ack from Rob.

The commit message also doesn't make much sense to me.

> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
> index 7b3fbdc..afaeb9c 100644
> --- a/drivers/of/of_numa.c
> +++ b/drivers/of/of_numa.c
> @@ -75,6 +75,11 @@ static int __init of_numa_parse_memory_nodes(void)
>  			 */
>  			continue;
> 
> +		if (nid >= MAX_NUMNODES) {
> +			pr_warn("NUMA: Node id %u exceeds maximum value\n", nid);
> +			return -EINVAL;
> +		}

Do you really want to return from the function here? Shouldn't we at least
of_node_put(np), i.e. by using a break; ?
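
For illustration, a user-space sketch of the break-and-release pattern
being suggested. The fake_node type, the node_get()/node_put() helpers
and parse_memory_nodes_sketch() are hypothetical stand-ins for the device
tree iterator and of_node_put(); the point is only that the reference
held on the current node is dropped before the loop is left.

```c
#include <errno.h>

#define MAX_NUMNODES_SKETCH 4	/* hypothetical limit */

struct fake_node {
	unsigned int nid;
	int refcount;
};

static void node_get(struct fake_node *np) { np->refcount++; }
static void node_put(struct fake_node *np) { np->refcount--; }

static int parse_memory_nodes_sketch(struct fake_node *nodes, int count)
{
	int i, ret = 0;

	for (i = 0; i < count; i++) {
		struct fake_node *np = &nodes[i];

		node_get(np);		/* the iterator holds a reference */

		if (np->nid >= MAX_NUMNODES_SKETCH) {
			ret = -EINVAL;
			node_put(np);	/* release before leaving the loop */
			break;
		}

		/* ... the node's memory ranges would be registered here ... */

		node_put(np);		/* released when the iterator advances */
	}

	return ret;
}
```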

Will


* Re: [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed
  2016-08-24  7:44 ` [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed Zhen Lei
@ 2016-08-26 12:47   ` Will Deacon
  2016-08-27  8:54     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2016-08-26 12:47 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo

On Wed, Aug 24, 2016 at 03:44:44PM +0800, Zhen Lei wrote:
> numa_init(of_numa_init) may return an error because of a NUMA
> configuration error, so "No NUMA configuration found" is inaccurate. In
> fact, specific configuration error information should be printed
> immediately by the branch that detects it.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  arch/arm64/mm/numa.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 5bb15ea..d97c6e2 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -335,8 +335,10 @@ static int __init numa_init(int (*init_func)(void))
>  	if (ret < 0)
>  		return ret;
> 
> -	if (nodes_empty(numa_nodes_parsed))
> +	if (nodes_empty(numa_nodes_parsed)) {
> +		pr_info("No NUMA configuration found\n");
>  		return -EINVAL;

Hmm, but dummy_numa_init calls node_set(nid, numa_nodes_parsed) for a
completely artificial setup, created by adding all memblocks to node 0,
so this new message will be suppressed even though things really did go
wrong.

In that case, don't we want to print *something* (like we do today in
dummy_numa_init) but maybe not "No NUMA configuration found"? What
exactly do you find inaccurate about the current message?

Will


* Re: [PATCH v7 08/14] arm64: numa: Use pr_fmt()
  2016-08-24  7:44 ` [PATCH v7 08/14] arm64: numa: " Zhen Lei
@ 2016-08-26 12:54   ` Will Deacon
  2016-08-27  9:14     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2016-08-26 12:54 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo

On Wed, Aug 24, 2016 at 03:44:47PM +0800, Zhen Lei wrote:
> From: Kefeng Wang <wangkefeng.wang@huawei.com>
> 
> Use pr_fmt to prefix kernel output, and remove duplicated msg
> of NUMA turned off.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  arch/arm64/mm/numa.c | 40 ++++++++++++++++++++--------------------
>  1 file changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index d97c6e2..7b73808 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -17,6 +17,8 @@
>   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>   */
> 
> +#define pr_fmt(fmt) "numa: " fmt

Shouldn't this be uppercase for consistency with the existing code and
the code in places like drivers/of/of_numa.c?

>  #include <linux/acpi.h>
>  #include <linux/bootmem.h>
>  #include <linux/memblock.h>
> @@ -38,10 +40,9 @@ static __init int numa_parse_early_param(char *opt)
>  {
>  	if (!opt)
>  		return -EINVAL;
> -	if (!strncmp(opt, "off", 3)) {
> -		pr_info("%s\n", "NUMA turned off");
> +	if (!strncmp(opt, "off", 3))
>  		numa_off = true;
> -	}
> +
>  	return 0;
>  }
>  early_param("numa", numa_parse_early_param);
> @@ -110,7 +111,7 @@ static void __init setup_node_to_cpumask_map(void)
>  		set_cpu_numa_node(cpu, NUMA_NO_NODE);
> 
>  	/* cpumask_of_node() will now work */
> -	pr_debug("NUMA: Node to cpumask map for %d nodes\n", nr_node_ids);
> +	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>  }
> 
>  /*
> @@ -145,13 +146,13 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
> 
>  	ret = memblock_set_node(start, (end - start), &memblock.memory, nid);
>  	if (ret < 0) {
> -		pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
> +		pr_err("memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>  			start, (end - 1), nid);
>  		return ret;
>  	}
> 
>  	node_set(nid, numa_nodes_parsed);
> -	pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
> +	pr_info("Adding memblock [0x%llx - 0x%llx] on node %d\n",
>  			start, (end - 1), nid);
>  	return ret;
>  }
> @@ -166,19 +167,18 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>  	void *nd;
>  	int tnid;
> 
> -	pr_info("NUMA: Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> -			nid, start_pfn << PAGE_SHIFT,
> -			(end_pfn << PAGE_SHIFT) - 1);
> +	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
> +		nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);
> 
>  	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>  	nd = __va(nd_pa);
> 
>  	/* report and initialize */
> -	pr_info("NUMA: NODE_DATA [mem %#010Lx-%#010Lx]\n",
> +	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",

Why are you adding leading whitespace?

>  		nd_pa, nd_pa + nd_size - 1);
>  	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>  	if (tnid != nid)
> -		pr_info("NUMA: NODE_DATA(%d) on node %d\n", nid, tnid);
> +		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);


Same here.

>  	node_data[nid] = nd;
>  	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
> @@ -235,8 +235,7 @@ static int __init numa_alloc_distance(void)
>  			numa_distance[i * numa_distance_cnt + j] = i == j ?
>  				LOCAL_DISTANCE : REMOTE_DISTANCE;
> 
> -	pr_debug("NUMA: Initialized distance table, cnt=%d\n",
> -			numa_distance_cnt);
> +	pr_debug("Initialized distance table, cnt=%d\n", numa_distance_cnt);
> 
>  	return 0;
>  }
> @@ -257,20 +256,20 @@ static int __init numa_alloc_distance(void)
>  void __init numa_set_distance(int from, int to, int distance)
>  {
>  	if (!numa_distance) {
> -		pr_warn_once("NUMA: Warning: distance table not allocated yet\n");
> +		pr_warn_once("Warning: distance table not allocated yet\n");
>  		return;
>  	}
> 
>  	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>  			from < 0 || to < 0) {
> -		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
> +		pr_warn_once("Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>  			    from, to, distance);
>  		return;
>  	}
> 
>  	if ((u8)distance != distance ||
>  	    (from == to && distance != LOCAL_DISTANCE)) {
> -		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
> +		pr_warn_once("Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>  			     from, to, distance);
>  		return;
>  	}
> @@ -297,7 +296,7 @@ static int __init numa_register_nodes(void)
>  	/* Check that valid nid is set to memblks */
>  	for_each_memblock(memory, mblk)
>  		if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES) {
> -			pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
> +			pr_warn("Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
>  				mblk->nid, mblk->base,
>  				mblk->base + mblk->size - 1);
>  			return -EINVAL;
> @@ -368,9 +367,10 @@ static int __init dummy_numa_init(void)
>  	struct memblock_region *mblk;
> 
>  	if (numa_off)
> -		pr_info("NUMA disabled\n"); /* Forced off on command line. */
> -	pr_info("NUMA: Faking a node at [mem %#018Lx-%#018Lx]\n",
> -	       0LLU, PFN_PHYS(max_pfn) - 1);
> +		pr_warn("NUMA turned off by user\n"); /* Forced off on command line. */

Why are you changing the string? What's wrong with "NUMA turned off" like
we had before?

Will


* Re: [PATCH v7 09/14] arm64/numa: support HAVE_SETUP_PER_CPU_AREA
  2016-08-24  7:44 ` [PATCH v7 09/14] arm64/numa: support HAVE_SETUP_PER_CPU_AREA Zhen Lei
@ 2016-08-26 13:28   ` Will Deacon
  2016-08-27 10:06     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2016-08-26 13:28 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo

On Wed, Aug 24, 2016 at 03:44:48PM +0800, Zhen Lei wrote:
> Allocate each percpu area from its local numa node. Without this
> patch, all percpu areas will be allocated from the node which cpu0 belongs
> to.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  arch/arm64/Kconfig   |  8 ++++++++
>  arch/arm64/mm/numa.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 63 insertions(+)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index bc3f00f..2815af6 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -603,6 +603,14 @@ config USE_PERCPU_NUMA_NODE_ID
>  	def_bool y
>  	depends on NUMA
> 
> +config HAVE_SETUP_PER_CPU_AREA
> +	def_bool y
> +	depends on NUMA
> +
> +config NEED_PER_CPU_EMBED_FIRST_CHUNK
> +	def_bool y
> +	depends on NUMA

Why do we need this? Is it purely about using block mappings for the
pcpu area?

>  source kernel/Kconfig.preempt
>  source kernel/Kconfig.hz
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 7b73808..5e44ad1 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -26,6 +26,7 @@
>  #include <linux/of.h>
> 
>  #include <asm/acpi.h>
> +#include <asm/sections.h>
> 
>  struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>  EXPORT_SYMBOL(node_data);
> @@ -131,6 +132,60 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>  	cpu_to_node_map[cpu] = nid;
>  }
> 
> +#ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
> +unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
> +EXPORT_SYMBOL(__per_cpu_offset);
> +
> +static int __init early_cpu_to_node(int cpu)
> +{
> +	return cpu_to_node_map[cpu];
> +}
> +
> +static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
> +{
> +	if (early_cpu_to_node(from) == early_cpu_to_node(to))
> +		return LOCAL_DISTANCE;
> +	else
> +		return REMOTE_DISTANCE;
> +}

Is it too early to use __node_distance here?

> +static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size,
> +				       size_t align)
> +{
> +	int nid = early_cpu_to_node(cpu);
> +
> +	return  memblock_virt_alloc_try_nid(size, align,
> +			__pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, nid);
> +}
> +
> +static void __init pcpu_fc_free(void *ptr, size_t size)
> +{
> +	memblock_free_early(__pa(ptr), size);
> +}
> +
> +void __init setup_per_cpu_areas(void)
> +{
> +	unsigned long delta;
> +	unsigned int cpu;
> +	int rc;
> +
> +	/*
> +	 * Always reserve area for module percpu variables.  That's
> +	 * what the legacy allocator did.
> +	 */
> +	rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
> +				    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
> +				    pcpu_cpu_distance,
> +				    pcpu_fc_alloc, pcpu_fc_free);
> +	if (rc < 0)
> +		panic("Failed to initialize percpu areas.");
> +
> +	delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
> +	for_each_possible_cpu(cpu)
> +		__per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
> +}
> +#endif

It's a pity that this is practically identical to PowerPC. Ideally, there
would be definitions of this initialisation gunk in the core code that
could be reused across architectures.

Will


* Re: [PATCH v7 10/14] arm64/numa: define numa_distance as array to simplify code
  2016-08-24  7:44 ` [PATCH v7 10/14] arm64/numa: define numa_distance as array to simplify code Zhen Lei
@ 2016-08-26 15:29   ` Will Deacon
  2016-08-27 10:29     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2016-08-26 15:29 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo

On Wed, Aug 24, 2016 at 03:44:49PM +0800, Zhen Lei wrote:
> 1. MAX_NUMNODES is based on CONFIG_NODES_SHIFT, and the default value of
>    the latter is very small now.
> 2. Suppose the default value of MAX_NUMNODES is enlarged to 64; the
>    size of numa_distance is then 4K, which is still acceptable if the
>    Image runs on other processors.
> 3. It will make the function __node_distance quicker than before.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  arch/arm64/include/asm/numa.h |  1 -
>  arch/arm64/mm/numa.c          | 74 +++----------------------------------------
>  2 files changed, 5 insertions(+), 70 deletions(-)

I fail to see the advantages of this patch. Do you have some compelling
performance figures or something?
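
For reference, a sketch of the sizing arithmetic in the commit message,
assuming one byte per entry and a node shift of 6 (both assumptions, as
are all the _sketch names). It also contrasts the old flat index, which
multiplies by a runtime count, with the fixed-size array form, where the
row stride is a compile-time constant; whether that is measurably quicker
is exactly the open question.

```c
#include <stddef.h>
#include <stdint.h>

#define NODES_SHIFT_SKETCH 6	/* assumed value giving 64 nodes */
#define MAX_NUMNODES_SKETCH (1 << NODES_SHIFT_SKETCH)

/* Fixed-size table as in the patch: 64 * 64 one-byte entries. */
static uint8_t numa_distance_sketch[MAX_NUMNODES_SKETCH][MAX_NUMNODES_SKETCH];

static size_t distance_table_bytes(void)
{
	return sizeof(numa_distance_sketch);
}

/* Old scheme: flat table indexed with a count known only at runtime. */
static size_t flat_index(size_t from, size_t to, size_t cnt)
{
	return from * cnt + to;
}
```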

Will


* Re: [PATCH v7 14/14] Documentation: remove the constraint on the distances of node pairs
  2016-08-24  7:44 ` [PATCH v7 14/14] Documentation: " Zhen Lei
@ 2016-08-26 15:35   ` Will Deacon
  2016-08-27 10:44     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2016-08-26 15:35 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo

On Wed, Aug 24, 2016 at 03:44:53PM +0800, Zhen Lei wrote:
> Update documentation. This limit is unnecessary.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> Acked-by: Rob Herring <robh@kernel.org>
> ---
>  Documentation/devicetree/bindings/numa.txt | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt
> index 21b3505..c0ea4a7 100644
> --- a/Documentation/devicetree/bindings/numa.txt
> +++ b/Documentation/devicetree/bindings/numa.txt
> @@ -48,7 +48,6 @@ distance (memory latency) between all numa nodes.
> 
>    Note:
>  	1. Each entry represents distance from first node to second node.
> -	The distances are equal in either direction.

Hmm, so what happens now if firmware provides a description where both
distances (in either direction) are supplied, but are different?

Will


* Re: [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES
  2016-08-24  7:44 ` [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES Zhen Lei
@ 2016-08-26 15:43   ` Will Deacon
  2016-08-27 11:05     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2016-08-26 15:43 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo

On Wed, Aug 24, 2016 at 03:44:50PM +0800, Zhen Lei wrote:
> Some numa nodes may have no memory. For example:
> 1. cpu0 on node0
> 2. cpu1 on node1
> 3. device0 takes the same time to access the memory of node0 and node1.
> 
> So we cannot simply assign device0 to node0 or node1, but we can
> define a node2 whose distances to node0 and node1 are the same.
> 
> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  arch/arm64/Kconfig      |  4 ++++
>  arch/arm64/kernel/smp.c |  1 +
>  arch/arm64/mm/numa.c    | 43 +++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 46 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 2815af6..3a2b6ed 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -611,6 +611,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>  	def_bool y
>  	depends on NUMA
> 
> +config HAVE_MEMORYLESS_NODES
> +	def_bool y
> +	depends on NUMA
> +
>  source kernel/Kconfig.preempt
>  source kernel/Kconfig.hz
> 
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index d93d433..4879085 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -619,6 +619,7 @@ static void __init of_parse_and_init_cpus(void)
>  			}
> 
>  			bootcpu_valid = true;
> +			early_map_cpu_to_node(0, of_node_to_nid(dn));

This seems unrelated?

>  			/*
>  			 * cpu_logical_map has already been
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 6853db7..114180f 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -129,6 +129,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>  		nid = 0;
> 
>  	cpu_to_node_map[cpu] = nid;
> +
> +	/*
> +	 * We should set the numa node of cpu0 as soon as possible, because it
> +	 * has already been set up online before. cpu_to_node(0) will soon be
> +	 * called.
> +	 */
> +	if (!cpu)
> +		set_cpu_numa_node(cpu, nid);

Likewise.

>  }
> 
>  #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
> @@ -211,6 +219,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>  	return ret;
>  }
> 
> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
> +{
> +	int i, best_nid, distance;
> +	u64 pa;
> +	DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
> +
> +	bitmap_zero(nodes_map, MAX_NUMNODES);
> +	bitmap_set(nodes_map, nid, 1);
> +
> +find_nearest_node:
> +	best_nid = NUMA_NO_NODE;
> +	distance = INT_MAX;
> +
> +	for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
> +		if (numa_distance[nid][i] < distance) {
> +			best_nid = i;
> +			distance = numa_distance[nid][i];
> +		}
> +
> +	pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid);
> +	if (!pa) {
> +		BUG_ON(best_nid == NUMA_NO_NODE);
> +		bitmap_set(nodes_map, best_nid, 1);
> +		goto find_nearest_node;
> +	}
> +
> +	return pa;
> +}
> +
>  /**
>   * Initialize NODE_DATA for a node on the local memory
>   */
> @@ -224,7 +261,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>  	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>  		nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);
> 
> -	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
> +	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
> +	if (!nd_pa)
> +		nd_pa = alloc_node_data_from_nearest_node(nid, nd_size);

Why not add memblock_alloc_near_nid to the core code, and make it do
what you need there?

Will


* Re: [PATCH v7 12/14] arm64/numa: remove the limitation that cpu0 must bind to node0
  2016-08-24  7:44 ` [PATCH v7 12/14] arm64/numa: remove the limitation that cpu0 must bind to node0 Zhen Lei
@ 2016-08-26 15:49   ` Will Deacon
  2016-08-29  6:55     ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2016-08-26 15:49 UTC (permalink / raw)
  To: Zhen Lei
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo

On Wed, Aug 24, 2016 at 03:44:51PM +0800, Zhen Lei wrote:
> 1. Currently only cpu0 is set in cpu_possible_mask, and the percpu areas
>    have not been initialized.
> 2. There is no reason to require that cpu0 belongs to node0.

Whilst I suspect you're using enumerated lists in order to try to make
things clearer, I'm having a really hard time understanding the commit
messages you have in this series. It's actually much better if you
structure them as concise paragraphs explaining:

  - What is the problem that you're fixing?

  - How does that problem manifest?

  - How does the patch fix it?

As far as I can see, this patch just removes a bunch of code with no
explanation as to why it's not required or any problems caused by
keeping it around.

Will

> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> ---
>  arch/arm64/mm/numa.c | 12 ++----------
>  1 file changed, 2 insertions(+), 10 deletions(-)
> 
> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> index 114180f..07a1978 100644
> --- a/arch/arm64/mm/numa.c
> +++ b/arch/arm64/mm/numa.c
> @@ -94,7 +94,6 @@ void numa_clear_node(unsigned int cpu)
>   */
>  static void __init setup_node_to_cpumask_map(void)
>  {
> -	unsigned int cpu;
>  	int node;
> 
>  	/* setup nr_node_ids if not done yet */
> @@ -107,9 +106,6 @@ static void __init setup_node_to_cpumask_map(void)
>  		cpumask_clear(node_to_cpumask_map[node]);
>  	}
> 
> -	for_each_possible_cpu(cpu)
> -		set_cpu_numa_node(cpu, NUMA_NO_NODE);
> -
>  	/* cpumask_of_node() will now work */
>  	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>  }
> @@ -119,13 +115,13 @@ static void __init setup_node_to_cpumask_map(void)
>   */
>  void numa_store_cpu_info(unsigned int cpu)
>  {
> -	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
> +	map_cpu_to_node(cpu, cpu_to_node_map[cpu]);
>  }
> 
>  void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>  {
>  	/* fallback to node 0 */
> -	if (nid < 0 || nid >= MAX_NUMNODES)
> +	if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
>  		nid = 0;
> 
>  	cpu_to_node_map[cpu] = nid;
> @@ -375,10 +371,6 @@ static int __init numa_init(int (*init_func)(void))
> 
>  	setup_node_to_cpumask_map();
> 
> -	/* init boot processor */
> -	cpu_to_node_map[0] = 0;
> -	map_cpu_to_node(0, 0);
> -
>  	return 0;
>  }
> 
> --
> 2.5.0
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 03/14] arm64/numa: add nid check for memory block
  2016-08-26 12:39   ` Will Deacon
@ 2016-08-27  8:02     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-27  8:02 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/26 20:39, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:42PM +0800, Zhen Lei wrote:
>> Use the same tactic to cpu and numa-distance nodes.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  drivers/of/of_numa.c | 5 +++++
>>  1 file changed, 5 insertions(+)
> 
> The subject has arm64/numa, but this is clearly core OF code and
I originally added the check below in arch/arm64/mm/numa.c, until Hanjun Guo
told me that it should move into drivers/of/of_numa.c.

I forgot to update the subject.

> requires an ack from Rob.
> 
> The commit message also doesn't make much sense to me.
> 
>> diff --git a/drivers/of/of_numa.c b/drivers/of/of_numa.c
>> index 7b3fbdc..afaeb9c 100644
>> --- a/drivers/of/of_numa.c
>> +++ b/drivers/of/of_numa.c
>> @@ -75,6 +75,11 @@ static int __init of_numa_parse_memory_nodes(void)
>>  			 */
>>  			continue;
>>
>> +		if (nid >= MAX_NUMNODES) {
>> +			pr_warn("NUMA: Node id %u exceeds maximum value\n", nid);
>> +			return -EINVAL;
>> +		}
> 
> Do you really want to return from the function here? Shouldn't we at least
> of_node_put(np), i.e. by using a break; ?
Thanks for pointing out this mistake. I will change it to "r = -EINVAL" in the next version.
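A minimal user-space sketch of the point raised above: recording the error and breaking out of the loop lets a single release at the end of the function drop the reference, whereas an early return would leak it. The `demo_node` type, the `demo_node_get`/`demo_node_put` helpers and `DEMO_MAX_NUMNODES` are hypothetical stand-ins for illustration, not the kernel's device-tree API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_MAX_NUMNODES 4	/* illustrative limit, not the kernel's value */

/* Hypothetical reference-counted node, standing in for a device-tree node. */
struct demo_node {
	int refcount;
	unsigned int nid;
};

static struct demo_node *demo_node_get(struct demo_node *n)
{
	n->refcount++;
	return n;
}

static void demo_node_put(struct demo_node *n)
{
	if (n)
		n->refcount--;
}

/* Walk the nodes; on a bad nid, record the error and break, never return. */
int demo_parse_nodes(struct demo_node *nodes, size_t count)
{
	struct demo_node *np = NULL;
	int r = 0;
	size_t i;

	assert(nodes != NULL || count == 0);

	for (i = 0; i < count; i++) {
		demo_node_put(np);		/* drop the previous node's reference */
		np = demo_node_get(&nodes[i]);	/* take a reference on the current one */

		if (np->nid >= DEMO_MAX_NUMNODES) {
			r = -EINVAL;
			break;			/* fall through to the single put below */
		}
	}

	demo_node_put(np);			/* one release point for every exit path */
	return r;
}
```

With this shape, every path out of the loop passes through the final put, so the reference count returns to zero whether or not an invalid node id was seen.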

> 
> Will
> 
> .
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed
  2016-08-26 12:47   ` Will Deacon
@ 2016-08-27  8:54     ` Leizhen (ThunderTown)
  2016-08-30 17:51       ` Will Deacon
  0 siblings, 1 reply; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-27  8:54 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/26 20:47, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:44PM +0800, Zhen Lei wrote:
>> numa_init(of_numa_init) may returned error because of numa configuration
>> error. So "No NUMA configuration found" is inaccurate. In fact, specific
>> configuration error information should be immediately printed by the
>> testing branch.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  arch/arm64/mm/numa.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 5bb15ea..d97c6e2 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -335,8 +335,10 @@ static int __init numa_init(int (*init_func)(void))
>>  	if (ret < 0)
>>  		return ret;
>>
>> -	if (nodes_empty(numa_nodes_parsed))
>> +	if (nodes_empty(numa_nodes_parsed)) {
>> +		pr_info("No NUMA configuration found\n");
>>  		return -EINVAL;
> 
> Hmm, but dummy_numa_init calls node_set(nid, numa_nodes_parsed) for a
> completely artificial setup, created by adding all memblocks to node 0,
> so this new message will be suppressed even though things really did go
> wrong.
It will be printed by the former call: numa_init(of_numa_init).

> 
> In that case, don't we want to print *something* (like we do today in
> dummy_numa_init) but maybe not "No NUMA configuration found"? What
> exactly do you find inaccurate about the current message?
For example:
[    0.000000] NUMA: No distance-matrix property in distance-map
[    0.000000] No NUMA configuration found

So if of_numa_init or arm64_acpi_numa_init returns an error because a NUMA
configuration error was found, it is misleading to print "No NUMA ...".

> 
> Will
> 
> .
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 08/14] arm64: numa: Use pr_fmt()
  2016-08-26 12:54   ` Will Deacon
@ 2016-08-27  9:14     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-27  9:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/26 20:54, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:47PM +0800, Zhen Lei wrote:
>> From: Kefeng Wang <wangkefeng.wang@huawei.com>
>>
>> Use pr_fmt to prefix kernel output, and remove duplicated msg
>> of NUMA turned off.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>  arch/arm64/mm/numa.c | 40 ++++++++++++++++++++--------------------
>>  1 file changed, 20 insertions(+), 20 deletions(-)
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index d97c6e2..7b73808 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -17,6 +17,8 @@
>>   * along with this program.  If not, see <http://www.gnu.org/licenses/>.
>>   */
>>
>> +#define pr_fmt(fmt) "numa: " fmt
> 
> Shouldn't this be uppercase for consistency with the existing code and
> the code in places like drivers/of/of_numa.c?
OK, I will change it to "NUMA: ".

> 
>>  #include <linux/acpi.h>
>>  #include <linux/bootmem.h>
>>  #include <linux/memblock.h>
>> @@ -38,10 +40,9 @@ static __init int numa_parse_early_param(char *opt)
>>  {
>>  	if (!opt)
>>  		return -EINVAL;
>> -	if (!strncmp(opt, "off", 3)) {
>> -		pr_info("%s\n", "NUMA turned off");
>> +	if (!strncmp(opt, "off", 3))
>>  		numa_off = true;
>> -	}
>> +
>>  	return 0;
>>  }
>>  early_param("numa", numa_parse_early_param);
>> @@ -110,7 +111,7 @@ static void __init setup_node_to_cpumask_map(void)
>>  		set_cpu_numa_node(cpu, NUMA_NO_NODE);
>>
>>  	/* cpumask_of_node() will now work */
>> -	pr_debug("NUMA: Node to cpumask map for %d nodes\n", nr_node_ids);
>> +	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>>  }
>>
>>  /*
>> @@ -145,13 +146,13 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>>
>>  	ret = memblock_set_node(start, (end - start), &memblock.memory, nid);
>>  	if (ret < 0) {
>> -		pr_err("NUMA: memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>> +		pr_err("memblock [0x%llx - 0x%llx] failed to add on node %d\n",
>>  			start, (end - 1), nid);
>>  		return ret;
>>  	}
>>
>>  	node_set(nid, numa_nodes_parsed);
>> -	pr_info("NUMA: Adding memblock [0x%llx - 0x%llx] on node %d\n",
>> +	pr_info("Adding memblock [0x%llx - 0x%llx] on node %d\n",
>>  			start, (end - 1), nid);
>>  	return ret;
>>  }
>> @@ -166,19 +167,18 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>  	void *nd;
>>  	int tnid;
>>
>> -	pr_info("NUMA: Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> -			nid, start_pfn << PAGE_SHIFT,
>> -			(end_pfn << PAGE_SHIFT) - 1);
>> +	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>> +		nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);
>>
>>  	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>>  	nd = __va(nd_pa);
>>
>>  	/* report and initialize */
>> -	pr_info("NUMA: NODE_DATA [mem %#010Lx-%#010Lx]\n",
>> +	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
> 
> Why are you adding leading whitespace?
Kefeng Wang said it was just to make the final printed output look clearer.

I will remove the leading whitespace in v8.

> 
>>  		nd_pa, nd_pa + nd_size - 1);
>>  	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
>>  	if (tnid != nid)
>> -		pr_info("NUMA: NODE_DATA(%d) on node %d\n", nid, tnid);
>> +		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
> 
> 
> Same here.
> 
>>  	node_data[nid] = nd;
>>  	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
>> @@ -235,8 +235,7 @@ static int __init numa_alloc_distance(void)
>>  			numa_distance[i * numa_distance_cnt + j] = i == j ?
>>  				LOCAL_DISTANCE : REMOTE_DISTANCE;
>>
>> -	pr_debug("NUMA: Initialized distance table, cnt=%d\n",
>> -			numa_distance_cnt);
>> +	pr_debug("Initialized distance table, cnt=%d\n", numa_distance_cnt);
>>
>>  	return 0;
>>  }
>> @@ -257,20 +256,20 @@ static int __init numa_alloc_distance(void)
>>  void __init numa_set_distance(int from, int to, int distance)
>>  {
>>  	if (!numa_distance) {
>> -		pr_warn_once("NUMA: Warning: distance table not allocated yet\n");
>> +		pr_warn_once("Warning: distance table not allocated yet\n");
>>  		return;
>>  	}
>>
>>  	if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
>>  			from < 0 || to < 0) {
>> -		pr_warn_once("NUMA: Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>> +		pr_warn_once("Warning: node ids are out of bound, from=%d to=%d distance=%d\n",
>>  			    from, to, distance);
>>  		return;
>>  	}
>>
>>  	if ((u8)distance != distance ||
>>  	    (from == to && distance != LOCAL_DISTANCE)) {
>> -		pr_warn_once("NUMA: Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>> +		pr_warn_once("Warning: invalid distance parameter, from=%d to=%d distance=%d\n",
>>  			     from, to, distance);
>>  		return;
>>  	}
>> @@ -297,7 +296,7 @@ static int __init numa_register_nodes(void)
>>  	/* Check that valid nid is set to memblks */
>>  	for_each_memblock(memory, mblk)
>>  		if (mblk->nid == NUMA_NO_NODE || mblk->nid >= MAX_NUMNODES) {
>> -			pr_warn("NUMA: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
>> +			pr_warn("Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
>>  				mblk->nid, mblk->base,
>>  				mblk->base + mblk->size - 1);
>>  			return -EINVAL;
>> @@ -368,9 +367,10 @@ static int __init dummy_numa_init(void)
>>  	struct memblock_region *mblk;
>>
>>  	if (numa_off)
>> -		pr_info("NUMA disabled\n"); /* Forced off on command line. */
>> -	pr_info("NUMA: Faking a node at [mem %#018Lx-%#018Lx]\n",
>> -	       0LLU, PFN_PHYS(max_pfn) - 1);
>> +		pr_warn("NUMA turned off by user\n"); /* Forced off on command line. */
> 
> Why are you changing the string? What's wrong with "NUMA turned off" like
> we had before?
OK. I will keep it unchanged in v8.

> 
> Will
> 
> .
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 09/14] arm64/numa: support HAVE_SETUP_PER_CPU_AREA
  2016-08-26 13:28   ` Will Deacon
@ 2016-08-27 10:06     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-27 10:06 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/26 21:28, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:48PM +0800, Zhen Lei wrote:
>> To make each percpu area allocated from its local numa node. Without this
>> patch, all percpu areas will be allocated from the node which cpu0 belongs
>> to.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  arch/arm64/Kconfig   |  8 ++++++++
>>  arch/arm64/mm/numa.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 63 insertions(+)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index bc3f00f..2815af6 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -603,6 +603,14 @@ config USE_PERCPU_NUMA_NODE_ID
>>  	def_bool y
>>  	depends on NUMA
>>
>> +config HAVE_SETUP_PER_CPU_AREA
>> +	def_bool y
>> +	depends on NUMA
>> +
>> +config NEED_PER_CPU_EMBED_FIRST_CHUNK
>> +	def_bool y
>> +	depends on NUMA
> 
> Why do we need this? Is it purely about using block mappings for the
> pcpu area?
Without NEED_PER_CPU_EMBED_FIRST_CHUNK, a link error is reported:

#if defined(CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK) || \
	!defined(CONFIG_HAVE_SETUP_PER_CPU_AREA)
#define BUILD_EMBED_FIRST_CHUNK
#endif

#if defined(BUILD_EMBED_FIRST_CHUNK)
//pcpu_embed_first_chunk definition
#endif

Call chain: setup_per_cpu_areas --> pcpu_embed_first_chunk
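The dependency described above can be sketched as a standalone C file, assuming the two CONFIG_ macros are defined by hand here (in a kernel build they come from Kconfig). The `demo_` functions are hypothetical stand-ins, not the kernel's percpu code. Removing the definition of CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK while keeping CONFIG_HAVE_SETUP_PER_CPU_AREA leaves `demo_embed_first_chunk` undefined, so the build fails.

```c
#include <assert.h>

/* Hand-written stand-ins for Kconfig output (assumption for this sketch). */
#define CONFIG_HAVE_SETUP_PER_CPU_AREA 1
#define CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK 1

/* Same gating condition as quoted in the reply above. */
#if defined(CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK) || \
	!defined(CONFIG_HAVE_SETUP_PER_CPU_AREA)
#define BUILD_EMBED_FIRST_CHUNK
#endif

#if defined(BUILD_EMBED_FIRST_CHUNK)
/* Only compiled when the gate is open; returns 0 on success. */
static int demo_embed_first_chunk(void)
{
	return 0;
}
#endif

/* The arch hook calls the helper unconditionally, so it needs the gate open. */
int demo_setup_per_cpu_areas(void)
{
	int rc = demo_embed_first_chunk();

	assert(rc == 0);	/* the patch under review panics on failure instead */
	return rc;
}
```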


> 
>>  source kernel/Kconfig.preempt
>>  source kernel/Kconfig.hz
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 7b73808..5e44ad1 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -26,6 +26,7 @@
>>  #include <linux/of.h>
>>
>>  #include <asm/acpi.h>
>> +#include <asm/sections.h>
>>
>>  struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
>>  EXPORT_SYMBOL(node_data);
>> @@ -131,6 +132,60 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>  	cpu_to_node_map[cpu] = nid;
>>  }
>>
>> +#ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
>> +unsigned long __per_cpu_offset[NR_CPUS] __read_mostly;
>> +EXPORT_SYMBOL(__per_cpu_offset);
>> +
>> +static int __init early_cpu_to_node(int cpu)
>> +{
>> +	return cpu_to_node_map[cpu];
>> +}
>> +
>> +static int __init pcpu_cpu_distance(unsigned int from, unsigned int to)
>> +{
>> +	if (early_cpu_to_node(from) == early_cpu_to_node(to))
>> +		return LOCAL_DISTANCE;
>> +	else
>> +		return REMOTE_DISTANCE;
>> +}
> 
> Is it too early to use __node_distance here?
Good point, we can use node_distance directly, thanks.
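A rough sketch of what using the distance table directly could look like, under the assumption that the table is already populated when the percpu code runs. The CPU-to-node map, the table contents and all `demo_` names are invented for illustration; only the values 10 and 20 mirror the kernel's LOCAL_DISTANCE and REMOTE_DISTANCE defaults.

```c
#include <assert.h>

#define DEMO_NR_CPUS		4
#define DEMO_NR_NODES		2
#define DEMO_LOCAL_DISTANCE	10
#define DEMO_REMOTE_DISTANCE	20

/* Hypothetical early CPU-to-node map: cpu0/cpu1 on node 0, cpu2/cpu3 on node 1. */
static const int demo_cpu_to_node[DEMO_NR_CPUS] = { 0, 0, 1, 1 };

/* Hypothetical distance table; 35 is an arbitrary firmware-supplied value. */
static const int demo_distance[DEMO_NR_NODES][DEMO_NR_NODES] = {
	{ DEMO_LOCAL_DISTANCE, 35 },
	{ 35, DEMO_LOCAL_DISTANCE },
};

static int demo_node_distance(int from, int to)
{
	return demo_distance[from][to];
}

/* Binary variant, as in the patch: only distinguishes local from remote. */
int demo_cpu_distance_binary(unsigned int from, unsigned int to)
{
	assert(from < DEMO_NR_CPUS && to < DEMO_NR_CPUS);
	return demo_cpu_to_node[from] == demo_cpu_to_node[to] ?
		DEMO_LOCAL_DISTANCE : DEMO_REMOTE_DISTANCE;
}

/* Table variant: reports the distance the table actually holds. */
int demo_cpu_distance_table(unsigned int from, unsigned int to)
{
	assert(from < DEMO_NR_CPUS && to < DEMO_NR_CPUS);
	return demo_node_distance(demo_cpu_to_node[from], demo_cpu_to_node[to]);
}
```

The two variants agree for CPUs on the same node and differ only when the table holds something other than the default remote distance.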

> 
>> +static void * __init pcpu_fc_alloc(unsigned int cpu, size_t size,
>> +				       size_t align)
>> +{
>> +	int nid = early_cpu_to_node(cpu);
>> +
>> +	return  memblock_virt_alloc_try_nid(size, align,
>> +			__pa(MAX_DMA_ADDRESS), MEMBLOCK_ALLOC_ACCESSIBLE, nid);
>> +}
>> +
>> +static void __init pcpu_fc_free(void *ptr, size_t size)
>> +{
>> +	memblock_free_early(__pa(ptr), size);
>> +}
>> +
>> +void __init setup_per_cpu_areas(void)
>> +{
>> +	unsigned long delta;
>> +	unsigned int cpu;
>> +	int rc;
>> +
>> +	/*
>> +	 * Always reserve area for module percpu variables.  That's
>> +	 * what the legacy allocator did.
>> +	 */
>> +	rc = pcpu_embed_first_chunk(PERCPU_MODULE_RESERVE,
>> +				    PERCPU_DYNAMIC_RESERVE, PAGE_SIZE,
>> +				    pcpu_cpu_distance,
>> +				    pcpu_fc_alloc, pcpu_fc_free);
>> +	if (rc < 0)
>> +		panic("Failed to initialize percpu areas.");
>> +
>> +	delta = (unsigned long)pcpu_base_addr - (unsigned long)__per_cpu_start;
>> +	for_each_possible_cpu(cpu)
>> +		__per_cpu_offset[cpu] = delta + pcpu_unit_offsets[cpu];
>> +}
>> +#endif
> 
> It's a pity that this is practically identical to PowerPC. Ideally, there
> would be definitions of this initialisation gunk in the core code that
> could be reused across architectures.
But this differs from the other architectures, except PPC.

I originally wanted to put it into drivers/of/of_numa.c, but now that ACPI NUMA is
coming up, I don't know where it should go.

> 
> Will
> 
> .
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 10/14] arm64/numa: define numa_distance as array to simplify code
  2016-08-26 15:29   ` Will Deacon
@ 2016-08-27 10:29     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-27 10:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/26 23:29, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:49PM +0800, Zhen Lei wrote:
>> 1. MAX_NUMNODES is base on CONFIG_NODES_SHIFT, the default value of the
>>    latter is very small now.
>> 2. Suppose the default value of MAX_NUMNODES is enlarged to 64, so the
>>    size of numa_distance is 4K, it's still acceptable if run the Image
>>    on other processors.
>> 3. It will make function __node_distance quicker than before.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  arch/arm64/include/asm/numa.h |  1 -
>>  arch/arm64/mm/numa.c          | 74 +++----------------------------------------
>>  2 files changed, 5 insertions(+), 70 deletions(-)
> 
> I fail to see the advantages of this patch. Do you have some compelling
> performance figures or something?

We can only put numa_distance_cnt on one node, so CPUs on other nodes take longer to access it.
I have not yet measured how much this improves performance.

I will try to get some data next week.

> 
> Will
> 
> .
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 14/14] Documentation: remove the constraint on the distances of node pairs
  2016-08-26 15:35   ` Will Deacon
@ 2016-08-27 10:44     ` Leizhen (ThunderTown)
  2016-08-30 17:55       ` Will Deacon
  0 siblings, 1 reply; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-27 10:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/26 23:35, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:53PM +0800, Zhen Lei wrote:
>> Update documentation. This limit is unnecessary.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> Acked-by: Rob Herring <robh@kernel.org>
>> ---
>>  Documentation/devicetree/bindings/numa.txt | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt
>> index 21b3505..c0ea4a7 100644
>> --- a/Documentation/devicetree/bindings/numa.txt
>> +++ b/Documentation/devicetree/bindings/numa.txt
>> @@ -48,7 +48,6 @@ distance (memory latency) between all numa nodes.
>>
>>    Note:
>>  	1. Each entry represents distance from first node to second node.
>> -	The distances are equal in either direction.
> 
> Hmm, so what happens now if firmware provides a description where both
> distances (in either direction) are supplied, but are different?
I do not know of any hardware where the distances in the two directions differ, but:
1. Software has no need to require that the distances in the two directions be equal.
2. Consider the following software scenario:
   1) cpu0 and cpu1 belong to the same hardware node.
   2) cpu0 is a master control CPU; many tasks and interrupts are delivered to cpu0 first, so cpu0 is often busier than cpu1.
   3) We split cpu0 and cpu1 into two logical nodes: cpu0 belongs to node0 and cpu1 belongs to node1. Now we can make
      the distance from cpu0 to cpu1 larger than the distance from cpu1 to cpu0.
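The scenario above can be sketched with a small asymmetric table. This is only an illustration in user-space C; the values 30 and 20 and all `demo_` names are made up, and nothing here reflects how the kernel stores or validates distances.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NODES		2
#define DEMO_LOCAL_DISTANCE	10

/*
 * Row = source node, column = destination node.
 * node0 -> node1 is 30, node1 -> node0 is 20 (arbitrary example values).
 */
static const int demo_asym_distance[DEMO_NODES][DEMO_NODES] = {
	{ DEMO_LOCAL_DISTANCE, 30 },
	{ 20, DEMO_LOCAL_DISTANCE },
};

int demo_asym_node_distance(int from, int to)
{
	assert(from >= 0 && from < DEMO_NODES && to >= 0 && to < DEMO_NODES);
	return demo_asym_distance[from][to];
}

/* Returns true only if every pair has equal distances in both directions. */
bool demo_distances_symmetric(void)
{
	int i, j;

	for (i = 0; i < DEMO_NODES; i++)
		for (j = i + 1; j < DEMO_NODES; j++)
			if (demo_asym_distance[i][j] != demo_asym_distance[j][i])
				return false;
	return true;
}
```

A consumer that looks up distances by (from, to) works unchanged with such a table; only code that assumes symmetry, for example by filling in the reverse entry automatically, would need attention.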

> 
> Will
> 
> .
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES
  2016-08-26 15:43   ` Will Deacon
@ 2016-08-27 11:05     ` Leizhen (ThunderTown)
  2016-08-29  3:15       ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-27 11:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/26 23:43, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:50PM +0800, Zhen Lei wrote:
>> Some numa nodes may have no memory. For example:
>> 1. cpu0 on node0
>> 2. cpu1 on node1
>> 3. device0 access the memory from node0 and node1 take the same time.
>>
>> So, we can not simply classify device0 to node0 or node1, but we can
>> define a node2 which distances to node0 and node1 are the same.
>>
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  arch/arm64/Kconfig      |  4 ++++
>>  arch/arm64/kernel/smp.c |  1 +
>>  arch/arm64/mm/numa.c    | 43 +++++++++++++++++++++++++++++++++++++++++--
>>  3 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 2815af6..3a2b6ed 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -611,6 +611,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>  	def_bool y
>>  	depends on NUMA
>>
>> +config HAVE_MEMORYLESS_NODES
>> +	def_bool y
>> +	depends on NUMA
>> +
>>  source kernel/Kconfig.preempt
>>  source kernel/Kconfig.hz
>>
>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>> index d93d433..4879085 100644
>> --- a/arch/arm64/kernel/smp.c
>> +++ b/arch/arm64/kernel/smp.c
>> @@ -619,6 +619,7 @@ static void __init of_parse_and_init_cpus(void)
>>  			}
>>
>>  			bootcpu_valid = true;
>> +			early_map_cpu_to_node(0, of_node_to_nid(dn));
> 
> This seems unrelated?
I am about to finish work for today. Maybe I need to move it into patch 12.

> 
>>  			/*
>>  			 * cpu_logical_map has already been
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 6853db7..114180f 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -129,6 +129,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>  		nid = 0;
>>
>>  	cpu_to_node_map[cpu] = nid;
>> +
>> +	/*
>> +	 * We should set the numa node of cpu0 as soon as possible, because it
>> +	 * has already been set up online before. cpu_to_node(0) will soon be
>> +	 * called.
>> +	 */
>> +	if (!cpu)
>> +		set_cpu_numa_node(cpu, nid);
> 
> Likewise.
> 
>>  }
>>
>>  #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
>> @@ -211,6 +219,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>>  	return ret;
>>  }
>>
>> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
>> +{
>> +	int i, best_nid, distance;
>> +	u64 pa;
>> +	DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>> +
>> +	bitmap_zero(nodes_map, MAX_NUMNODES);
>> +	bitmap_set(nodes_map, nid, 1);
>> +
>> +find_nearest_node:
>> +	best_nid = NUMA_NO_NODE;
>> +	distance = INT_MAX;
>> +
>> +	for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
>> +		if (numa_distance[nid][i] < distance) {
>> +			best_nid = i;
>> +			distance = numa_distance[nid][i];
>> +		}
>> +
>> +	pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid);
>> +	if (!pa) {
>> +		BUG_ON(best_nid == NUMA_NO_NODE);
>> +		bitmap_set(nodes_map, best_nid, 1);
>> +		goto find_nearest_node;
>> +	}
>> +
>> +	return pa;
>> +}
>> +
>>  /**
>>   * Initialize NODE_DATA for a node on the local memory
>>   */
>> @@ -224,7 +261,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>  	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>>  		nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);
>>
>> -	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
>> +	if (!nd_pa)
>> +		nd_pa = alloc_node_data_from_nearest_node(nid, nd_size);
> 
> Why not add memblock_alloc_near_nid to the core code, and make it do
> what you need there?
I will think about it next week. But some architectures, such as x86 and IA64, have their own implementations.
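As a hedged illustration of what a generic "allocate near this node" helper might do, here is a user-space sketch that tries nodes in order of increasing distance. It is not the memblock API: `demo_try_alloc_on_node`, the per-node free counters and the distance table are all hypothetical, and unlike the patch it uses a bounded loop rather than a goto and includes the requested node itself as the first candidate.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_NODES	3
#define DEMO_NO_NODE	(-1)

/* Hypothetical distance table: row = requesting node, column = candidate. */
static const int demo_near_distance[DEMO_MAX_NODES][DEMO_MAX_NODES] = {
	{ 10, 20, 40 },
	{ 20, 10, 30 },
	{ 40, 30, 10 },
};

/* Hypothetical free bytes per node; nodes 1 and 2 are memoryless. */
static size_t demo_node_free[DEMO_MAX_NODES] = { 4096, 0, 0 };

/* Stand-in for a per-node allocator: succeeds if the node has enough room. */
static bool demo_try_alloc_on_node(int nid, size_t size)
{
	if (demo_node_free[nid] < size)
		return false;
	demo_node_free[nid] -= size;
	return true;
}

/*
 * Try the nearest untried node first, then the next nearest, and so on.
 * Returns the node that satisfied the request, or DEMO_NO_NODE.
 */
int demo_alloc_near_node(int nid, size_t size)
{
	bool tried[DEMO_MAX_NODES] = { false };
	int attempt;

	assert(nid >= 0 && nid < DEMO_MAX_NODES);

	for (attempt = 0; attempt < DEMO_MAX_NODES; attempt++) {
		int best = DEMO_NO_NODE;
		int best_dist = INT_MAX;
		int i;

		for (i = 0; i < DEMO_MAX_NODES; i++) {
			if (!tried[i] && demo_near_distance[nid][i] < best_dist) {
				best = i;
				best_dist = demo_near_distance[nid][i];
			}
		}

		/* At most 'attempt' nodes are marked, so an untried one exists. */
		tried[best] = true;
		if (demo_try_alloc_on_node(best, size))
			return best;
	}

	return DEMO_NO_NODE;	/* every node was tried and none had room */
}
```

Bounding the loop by the node count means exhaustion is reported to the caller instead of relying on a BUG_ON to stop the search.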

> 
> Will
> 
> .
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES
  2016-08-27 11:05     ` Leizhen (ThunderTown)
@ 2016-08-29  3:15       ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-29  3:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/27 19:05, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/8/26 23:43, Will Deacon wrote:
>> On Wed, Aug 24, 2016 at 03:44:50PM +0800, Zhen Lei wrote:
>>> Some numa nodes may have no memory. For example:
>>> 1. cpu0 on node0
>>> 2. cpu1 on node1
>>> 3. device0 access the memory from node0 and node1 take the same time.
>>>
>>> So, we can not simply classify device0 to node0 or node1, but we can
>>> define a node2 which distances to node0 and node1 are the same.
>>>
>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>> ---
>>>  arch/arm64/Kconfig      |  4 ++++
>>>  arch/arm64/kernel/smp.c |  1 +
>>>  arch/arm64/mm/numa.c    | 43 +++++++++++++++++++++++++++++++++++++++++--
>>>  3 files changed, 46 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index 2815af6..3a2b6ed 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -611,6 +611,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK
>>>  	def_bool y
>>>  	depends on NUMA
>>>
>>> +config HAVE_MEMORYLESS_NODES
>>> +	def_bool y
>>> +	depends on NUMA
>>> +
>>>  source kernel/Kconfig.preempt
>>>  source kernel/Kconfig.hz
>>>
>>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
>>> index d93d433..4879085 100644
>>> --- a/arch/arm64/kernel/smp.c
>>> +++ b/arch/arm64/kernel/smp.c
>>> @@ -619,6 +619,7 @@ static void __init of_parse_and_init_cpus(void)
>>>  			}
>>>
>>>  			bootcpu_valid = true;
>>> +			early_map_cpu_to_node(0, of_node_to_nid(dn));
>>
>> This seems unrelated?
> I will get off my work soon. Maybe I need put it into patch 12.
> 
>>
>>>  			/*
>>>  			 * cpu_logical_map has already been
>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>> index 6853db7..114180f 100644
>>> --- a/arch/arm64/mm/numa.c
>>> +++ b/arch/arm64/mm/numa.c
>>> @@ -129,6 +129,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>>  		nid = 0;
>>>
>>>  	cpu_to_node_map[cpu] = nid;
>>> +
>>> +	/*
>>> +	 * We should set the numa node of cpu0 as soon as possible, because it
>>> +	 * has already been set up online before. cpu_to_node(0) will soon be
>>> +	 * called.
>>> +	 */
>>> +	if (!cpu)
>>> +		set_cpu_numa_node(cpu, nid);
>>
>> Likewise.
>>
>>>  }
>>>
>>>  #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA
>>> @@ -211,6 +219,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end)
>>>  	return ret;
>>>  }
>>>
>>> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t size)
>>> +{
>>> +	int i, best_nid, distance;
>>> +	u64 pa;
>>> +	DECLARE_BITMAP(nodes_map, MAX_NUMNODES);
>>> +
>>> +	bitmap_zero(nodes_map, MAX_NUMNODES);
>>> +	bitmap_set(nodes_map, nid, 1);
>>> +
>>> +find_nearest_node:
>>> +	best_nid = NUMA_NO_NODE;
>>> +	distance = INT_MAX;
>>> +
>>> +	for_each_clear_bit(i, nodes_map, MAX_NUMNODES)
>>> +		if (numa_distance[nid][i] < distance) {
>>> +			best_nid = i;
>>> +			distance = numa_distance[nid][i];
>>> +		}
>>> +
>>> +	pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid);
>>> +	if (!pa) {
>>> +		BUG_ON(best_nid == NUMA_NO_NODE);
>>> +		bitmap_set(nodes_map, best_nid, 1);
>>> +		goto find_nearest_node;
>>> +	}
>>> +
>>> +	return pa;
>>> +}
>>> +
>>>  /**
>>>   * Initialize NODE_DATA for a node on the local memory
>>>   */
>>> @@ -224,7 +261,9 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>>>  	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
>>>  		nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1);
>>>
>>> -	nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
>>> +	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
>>> +	if (!nd_pa)
>>> +		nd_pa = alloc_node_data_from_nearest_node(nid, nd_size);
>>
>> Why not add memblock_alloc_near_nid to the core code, and make it do
>> what you need there?
> I'm thinking about it next week. But some ARCHs like X86/IA64 have their own implementation.

Do you mean calling only alloc_node_data_from_nearest_node directly? OK, that's fine. Thanks.

> 
>>
>> Will
>>
>> .
>>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH v7 12/14] arm64/numa: remove the limitation that cpu0 must bind to node0
  2016-08-26 15:49   ` Will Deacon
@ 2016-08-29  6:55     ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-29  6:55 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/26 23:49, Will Deacon wrote:
> On Wed, Aug 24, 2016 at 03:44:51PM +0800, Zhen Lei wrote:
>> 1. Currently only cpu0 set on cpu_possible_mask and percpu areas have not
>>    been initialized.
This description refers to the code below:
-	for_each_possible_cpu(cpu)
-		set_cpu_numa_node(cpu, NUMA_NO_NODE);

1. When the above code is executed, only the cpu0 bit is set in cpu_possible_mask,
   so only set_cpu_numa_node(0, NUMA_NO_NODE); is executed.
2. set_cpu_numa_node accesses the percpu variable numa_node, but setup_per_cpu_areas is
   only called later. Were it not for the first point, this would crash the kernel.
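The first point can be illustrated with a toy possible-CPU mask. This is a user-space sketch only; `demo_possible_mask`, `demo_visited` and the loop are hypothetical and do not use the kernel's cpumask helpers.

```c
#include <assert.h>

#define DEMO_NR_CPUS 8

/* Early in boot only the boot CPU (bit 0) is marked possible. */
static unsigned long demo_possible_mask = 1UL << 0;

/* Records which CPUs the loop body ran for. */
static int demo_visited[DEMO_NR_CPUS];

/* Runs the body once per possible CPU and returns how many times it ran. */
int demo_for_each_possible_cpu(void)
{
	int cpu, count = 0;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
		if (!(demo_possible_mask & (1UL << cpu)))
			continue;	/* not possible yet: body is skipped */
		demo_visited[cpu] = 1;
		count++;
	}

	assert(count <= DEMO_NR_CPUS);
	return count;
}
```

With only bit 0 set, the body runs exactly once, which is why the removed loop only ever touched cpu0 at that point in boot.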

I changed the title of this patch in v7; the original title was "remove some useless code".
I think I should split this into a separate patch.



>> 2. No reason to limit cpu0 must belongs to node0.
> 
> Whilst I suspect you're using enumerated lists in order to try to make
> things clearer, I'm having a really hard time understanding the commit
> messages you have in this series. It's actually much better if you
> structure them as concise paragraphs explaining:
> 
>   - What is the problem that you're fixing?
> 
>   - How does that problem manifest?
> 
>   - How does the patch fix it?
> 
> As far as I can see, this patch just removes a bunch of code with no
> explanation as to why it's not required or any problems caused by
> keeping it around.
> 
> Will
> 
>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>> ---
>>  arch/arm64/mm/numa.c | 12 ++----------
>>  1 file changed, 2 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>> index 114180f..07a1978 100644
>> --- a/arch/arm64/mm/numa.c
>> +++ b/arch/arm64/mm/numa.c
>> @@ -94,7 +94,6 @@ void numa_clear_node(unsigned int cpu)
>>   */
>>  static void __init setup_node_to_cpumask_map(void)
>>  {
>> -	unsigned int cpu;
>>  	int node;
>>
>>  	/* setup nr_node_ids if not done yet */
>> @@ -107,9 +106,6 @@ static void __init setup_node_to_cpumask_map(void)
>>  		cpumask_clear(node_to_cpumask_map[node]);
>>  	}
>>
>> -	for_each_possible_cpu(cpu)
>> -		set_cpu_numa_node(cpu, NUMA_NO_NODE);
>> -
>>  	/* cpumask_of_node() will now work */
>>  	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
>>  }
>> @@ -119,13 +115,13 @@ static void __init setup_node_to_cpumask_map(void)
>>   */
>>  void numa_store_cpu_info(unsigned int cpu)
>>  {
>> -	map_cpu_to_node(cpu, numa_off ? 0 : cpu_to_node_map[cpu]);
>> +	map_cpu_to_node(cpu, cpu_to_node_map[cpu]);
>>  }
>>
>>  void __init early_map_cpu_to_node(unsigned int cpu, int nid)
>>  {
>>  	/* fallback to node 0 */
>> -	if (nid < 0 || nid >= MAX_NUMNODES)
>> +	if (nid < 0 || nid >= MAX_NUMNODES || numa_off)
>>  		nid = 0;
Once the code below has been removed, we need this corresponding adjustment;
otherwise the kernel crashes if "numa=off" is set in bootargs.
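A small user-space sketch of the clamping behaviour in the hunk above, assuming a plain array in place of the kernel's cpu_to_node_map. All `demo_` names and the limits are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_NUMNODES	4
#define DEMO_NR_CPUS		4

/* Stand-in for the "numa=off" boot argument. */
static bool demo_numa_off;

static int demo_cpu_to_node_map[DEMO_NR_CPUS];

/* Stores the node for a CPU, falling back to node 0 as in the patched check. */
void demo_early_map_cpu_to_node(unsigned int cpu, int nid)
{
	assert(cpu < DEMO_NR_CPUS);

	/* fallback to node 0: invalid nid, or NUMA disabled by the user */
	if (nid < 0 || nid >= DEMO_MAX_NUMNODES || demo_numa_off)
		nid = 0;

	demo_cpu_to_node_map[cpu] = nid;
}

int demo_stored_node(unsigned int cpu)
{
	assert(cpu < DEMO_NR_CPUS);
	return demo_cpu_to_node_map[cpu];
}
```

Doing the clamp at store time means later readers of the map never see a non-zero node when NUMA is off, so they no longer need their own numa_off check.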

>>
>>  	cpu_to_node_map[cpu] = nid;
>> @@ -375,10 +371,6 @@ static int __init numa_init(int (*init_func)(void))
>>
>>  	setup_node_to_cpumask_map();
>>
>> -	/* init boot processor */
>> -	cpu_to_node_map[0] = 0;
>> -	map_cpu_to_node(0, 0);
This code forces cpu0 to belong to node0, but our current implementation doesn't
have this limitation.

>> -
>>  	return 0;
>>  }
>>
>> --
>> 2.5.0
>>
>>
> 
> .
> 


* Re: [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed
  2016-08-27  8:54     ` Leizhen (ThunderTown)
@ 2016-08-30 17:51       ` Will Deacon
  2016-08-31  2:29         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2016-08-30 17:51 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo

On Sat, Aug 27, 2016 at 04:54:56PM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/8/26 20:47, Will Deacon wrote:
> > On Wed, Aug 24, 2016 at 03:44:44PM +0800, Zhen Lei wrote:
> >> numa_init(of_numa_init) may returned error because of numa configuration
> >> error. So "No NUMA configuration found" is inaccurate. In fact, specific
> >> configuration error information should be immediately printed by the
> >> testing branch.
> >>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >> ---
> >>  arch/arm64/mm/numa.c | 6 +++---
> >>  1 file changed, 3 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
> >> index 5bb15ea..d97c6e2 100644
> >> --- a/arch/arm64/mm/numa.c
> >> +++ b/arch/arm64/mm/numa.c
> >> @@ -335,8 +335,10 @@ static int __init numa_init(int (*init_func)(void))
> >>  	if (ret < 0)
> >>  		return ret;
> >>
> >> -	if (nodes_empty(numa_nodes_parsed))
> >> +	if (nodes_empty(numa_nodes_parsed)) {
> >> +		pr_info("No NUMA configuration found\n");
> >>  		return -EINVAL;
> > 
> > Hmm, but dummy_numa_init calls node_set(nid, numa_nodes_parsed) for a
> > completely artificial setup, created by adding all memblocks to node 0,
> > so this new message will be suppressed even though things really did go
> > wrong.
> It will be printed by the former: numa_init(of_numa_init)

Does that print an error for every possible failure case? What about the
acpi path?

> > In that case, don't we want to print *something* (like we do today in
> > dummy_numa_init) but maybe not "No NUMA configuration found"? What
> > exactly do you find inaccurate about the current message?
> For example:
> [    0.000000] NUMA: No distance-matrix property in distance-map
> [    0.000000] No NUMA configuration found
> 
> So if of_numa_init or arm64_acpi_numa_init returned error, because of
> some numa configuration error had been found, it's no good to print "No
> NUMA ...".

Sure, I'm all for changing the message. I just think removing it is
probably unhelpful. Something like:

"NUMA: Failed to initialise from firmware"

might do the trick?

Will


* Re: [PATCH v7 14/14] Documentation: remove the constraint on the distances of node pairs
  2016-08-27 10:44     ` Leizhen (ThunderTown)
@ 2016-08-30 17:55       ` Will Deacon
  2016-08-31  2:46         ` Leizhen (ThunderTown)
  0 siblings, 1 reply; 36+ messages in thread
From: Will Deacon @ 2016-08-30 17:55 UTC (permalink / raw)
  To: Leizhen (ThunderTown)
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo

On Sat, Aug 27, 2016 at 06:44:39PM +0800, Leizhen (ThunderTown) wrote:
> 
> 
> On 2016/8/26 23:35, Will Deacon wrote:
> > On Wed, Aug 24, 2016 at 03:44:53PM +0800, Zhen Lei wrote:
> >> Update documentation. This limit is unneccessary.
> >>
> >> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
> >> Acked-by: Rob Herring <robh@kernel.org>
> >> ---
> >>  Documentation/devicetree/bindings/numa.txt | 1 -
> >>  1 file changed, 1 deletion(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt
> >> index 21b3505..c0ea4a7 100644
> >> --- a/Documentation/devicetree/bindings/numa.txt
> >> +++ b/Documentation/devicetree/bindings/numa.txt
> >> @@ -48,7 +48,6 @@ distance (memory latency) between all numa nodes.
> >>
> >>    Note:
> >>  	1. Each entry represents distance from first node to second node.
> >> -	The distances are equal in either direction.
> > 
> > Hmm, so what happens now if firmware provides a description where both
> > distances (in either direction) are supplied, but are different?
> I have not known any hardware that the distances of two direction are
> different yet

Then let's not add support for this just yet. When we have systems that
actually need it, we'll be in a much better position to assess the
suitability of any patches. At the moment, the whole thing is pretty
questionable and it adds needless complication to the code.

Will


* Re: [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed
  2016-08-30 17:51       ` Will Deacon
@ 2016-08-31  2:29         ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-31  2:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/31 1:51, Will Deacon wrote:
> On Sat, Aug 27, 2016 at 04:54:56PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/8/26 20:47, Will Deacon wrote:
>>> On Wed, Aug 24, 2016 at 03:44:44PM +0800, Zhen Lei wrote:
>>>> numa_init(of_numa_init) may returned error because of numa configuration
>>>> error. So "No NUMA configuration found" is inaccurate. In fact, specific
>>>> configuration error information should be immediately printed by the
>>>> testing branch.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>> ---
>>>>  arch/arm64/mm/numa.c | 6 +++---
>>>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
>>>> index 5bb15ea..d97c6e2 100644
>>>> --- a/arch/arm64/mm/numa.c
>>>> +++ b/arch/arm64/mm/numa.c
>>>> @@ -335,8 +335,10 @@ static int __init numa_init(int (*init_func)(void))
>>>>  	if (ret < 0)
>>>>  		return ret;
>>>>
>>>> -	if (nodes_empty(numa_nodes_parsed))
>>>> +	if (nodes_empty(numa_nodes_parsed)) {
>>>> +		pr_info("No NUMA configuration found\n");
>>>>  		return -EINVAL;
>>>
>>> Hmm, but dummy_numa_init calls node_set(nid, numa_nodes_parsed) for a
>>> completely artificial setup, created by adding all memblocks to node 0,
>>> so this new message will be suppressed even though things really did go
>>> wrong.
>> It will be printed by the former: numa_init(of_numa_init)
> 
> Does that print an error for every possible failure case? What about the
> acpi path?
I think the acpi path should print the error itself. My reasons are:
1. In numa_init and its sub-functions, every error path prints an error immediately, except arm64_acpi_numa_init.
2. If numa_init returns an error, we do not print the returned error code, so the user doesn't know what problem caused acpi numa to fail.


> 
>>> In that case, don't we want to print *something* (like we do today in
>>> dummy_numa_init) but maybe not "No NUMA configuration found"? What
>>> exactly do you find inaccurate about the current message?
>> For example:
>> [    0.000000] NUMA: No distance-matrix property in distance-map
>> [    0.000000] No NUMA configuration found
>>
>> So if of_numa_init or arm64_acpi_numa_init returned error, because of
>> some numa configuration error had been found, it's no good to print "No
>> NUMA ...".
> 
> Sure, I'm all for changing the message. I just think removing it is
> probably unhelpful. Something like:
> 
> "NUMA: Failed to initialise from firmware"
I think adding this to arm64_acpi_numa_init would be better; maybe we should also print 'ret':

int __init arm64_acpi_numa_init(void)
{
	int ret;

	ret = acpi_numa_init();
	if (ret) {
+		pr_info("Failed to initialise from firmware\n");
		return ret;
	}
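
The idea of also reporting 'ret' can be sketched as a small user-space model.
Everything here is an assumption made for illustration: model_numa_setup(),
failing_firmware_init() and dummy_init() are hypothetical names, the message
goes into a caller-supplied buffer instead of pr_info(), and the fallback to a
dummy setup is only a rough analogue of dummy_numa_init.

```c
#include <errno.h>
#include <stdio.h>

typedef int (*numa_init_fn)(void);

/* Hypothetical firmware path that reports a configuration error */
static int failing_firmware_init(void)
{
	return -EINVAL;
}

/* Hypothetical fallback that always succeeds */
static int dummy_init(void)
{
	return 0;
}

/*
 * Try the firmware path first. On failure, record a message that
 * includes the error code, then fall back to the dummy setup.
 */
static int model_numa_setup(numa_init_fn firmware_init, char *log, size_t len)
{
	int ret = firmware_init();

	if (len > 0)
		log[0] = '\0';

	if (ret) {
		snprintf(log, len,
			 "NUMA: Failed to initialise from firmware (%d)", ret);
		ret = dummy_init();
	}

	return ret;
}
```

Carrying the code in the message lets the user tell one firmware failure from
another, even when the failing path prints nothing more specific itself.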

> 
> might do the trick?
> 
> Will
> 
> .
> 


* Re: [PATCH v7 14/14] Documentation: remove the constraint on the distances of node pairs
  2016-08-30 17:55       ` Will Deacon
@ 2016-08-31  2:46         ` Leizhen (ThunderTown)
  0 siblings, 0 replies; 36+ messages in thread
From: Leizhen (ThunderTown) @ 2016-08-31  2:46 UTC (permalink / raw)
  To: Will Deacon
  Cc: Catalin Marinas, linux-arm-kernel, linux-kernel, Rob Herring,
	Frank Rowand, devicetree, Zefan Li, Xinwei Hu, Tianhong Ding,
	Hanjun Guo



On 2016/8/31 1:55, Will Deacon wrote:
> On Sat, Aug 27, 2016 at 06:44:39PM +0800, Leizhen (ThunderTown) wrote:
>>
>>
>> On 2016/8/26 23:35, Will Deacon wrote:
>>> On Wed, Aug 24, 2016 at 03:44:53PM +0800, Zhen Lei wrote:
>>>> Update documentation. This limit is unneccessary.
>>>>
>>>> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
>>>> Acked-by: Rob Herring <robh@kernel.org>
>>>> ---
>>>>  Documentation/devicetree/bindings/numa.txt | 1 -
>>>>  1 file changed, 1 deletion(-)
>>>>
>>>> diff --git a/Documentation/devicetree/bindings/numa.txt b/Documentation/devicetree/bindings/numa.txt
>>>> index 21b3505..c0ea4a7 100644
>>>> --- a/Documentation/devicetree/bindings/numa.txt
>>>> +++ b/Documentation/devicetree/bindings/numa.txt
>>>> @@ -48,7 +48,6 @@ distance (memory latency) between all numa nodes.
>>>>
>>>>    Note:
>>>>  	1. Each entry represents distance from first node to second node.
>>>> -	The distances are equal in either direction.
>>>
>>> Hmm, so what happens now if firmware provides a description where both
>>> distances (in either direction) are supplied, but are different?
>> I have not known any hardware that the distances of two direction are
>> different yet
> 
> Then let's not add support for this just yet. When we have systems that
> actually need it, we'll be in a much better position to assess the
> suitability of any patches. At the moment, the whole thing is pretty
> questionable and it adds needless complication to the code.
How about I change it to:
To simplify the configuration, the distance in the opposite direction is the same by default.
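
What "the same by default" could mean is sketched below as a user-space model.
It is not the of_numa parsing code: struct dist_entry, fill_distances() and
node_distance_tbl are hypothetical names, and MAX_NODES is an arbitrary value.
A direction that is not supplied takes the value of its opposite; a direction
that is supplied explicitly keeps its own value.

```c
#include <string.h>

#define MAX_NODES  4	/* assumed node count, for illustration only */
#define DIST_UNSET 0	/* marks a pair with no distance recorded yet */

struct dist_entry {	/* hypothetical (from, to, distance) triple */
	int from, to, distance;
};

static int node_distance_tbl[MAX_NODES][MAX_NODES];

static void fill_distances(const struct dist_entry *e, int count)
{
	int a, b, i;

	memset(node_distance_tbl, 0, sizeof(node_distance_tbl));

	/* pass 1: record every explicitly supplied direction */
	for (i = 0; i < count; i++) {
		if (e[i].from < 0 || e[i].from >= MAX_NODES ||
		    e[i].to < 0 || e[i].to >= MAX_NODES)
			continue;	/* ignore out-of-range node ids */
		node_distance_tbl[e[i].from][e[i].to] = e[i].distance;
	}

	/* pass 2: a missing direction defaults to its opposite */
	for (a = 0; a < MAX_NODES; a++)
		for (b = 0; b < MAX_NODES; b++)
			if (node_distance_tbl[a][b] == DIST_UNSET)
				node_distance_tbl[a][b] = node_distance_tbl[b][a];
}
```

In this model a description that gives only 0 -> 1 yields the same distance
for 1 -> 0, and one that gives both directions keeps both values as written.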

> 
> Will
> 
> .
> 


end of thread, other threads:[~2016-08-31  2:48 UTC | newest]

Thread overview: 36+ messages
2016-08-24  7:44 [PATCH v7 00/14] fix some type infos and bugs for arm64/of numa Zhen Lei
2016-08-24  7:44 ` [PATCH v7 01/14] of/numa: remove a duplicated pr_debug information Zhen Lei
2016-08-24  7:44 ` [PATCH v7 02/14] of/numa: fix a memory@ node can only contains one memory block Zhen Lei
2016-08-24  7:44 ` [PATCH v7 03/14] arm64/numa: add nid check for " Zhen Lei
2016-08-26 12:39   ` Will Deacon
2016-08-27  8:02     ` Leizhen (ThunderTown)
2016-08-24  7:44 ` [PATCH v7 04/14] of/numa: remove a duplicated warning Zhen Lei
2016-08-24  7:44 ` [PATCH v7 05/14] arm64/numa: avoid inconsistent information to be printed Zhen Lei
2016-08-26 12:47   ` Will Deacon
2016-08-27  8:54     ` Leizhen (ThunderTown)
2016-08-30 17:51       ` Will Deacon
2016-08-31  2:29         ` Leizhen (ThunderTown)
2016-08-24  7:44 ` [PATCH v7 06/14] of_numa: Use of_get_next_parent to simplify code Zhen Lei
2016-08-24  7:44 ` [PATCH v7 07/14] of_numa: Use pr_fmt() Zhen Lei
2016-08-24  7:44 ` [PATCH v7 08/14] arm64: numa: " Zhen Lei
2016-08-26 12:54   ` Will Deacon
2016-08-27  9:14     ` Leizhen (ThunderTown)
2016-08-24  7:44 ` [PATCH v7 09/14] arm64/numa: support HAVE_SETUP_PER_CPU_AREA Zhen Lei
2016-08-26 13:28   ` Will Deacon
2016-08-27 10:06     ` Leizhen (ThunderTown)
2016-08-24  7:44 ` [PATCH v7 10/14] arm64/numa: define numa_distance as array to simplify code Zhen Lei
2016-08-26 15:29   ` Will Deacon
2016-08-27 10:29     ` Leizhen (ThunderTown)
2016-08-24  7:44 ` [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES Zhen Lei
2016-08-26 15:43   ` Will Deacon
2016-08-27 11:05     ` Leizhen (ThunderTown)
2016-08-29  3:15       ` Leizhen (ThunderTown)
2016-08-24  7:44 ` [PATCH v7 12/14] arm64/numa: remove the limitation that cpu0 must bind to node0 Zhen Lei
2016-08-26 15:49   ` Will Deacon
2016-08-29  6:55     ` Leizhen (ThunderTown)
2016-08-24  7:44 ` [PATCH v7 13/14] of/numa: remove the constraint on the distances of node pairs Zhen Lei
2016-08-24  7:44 ` [PATCH v7 14/14] Documentation: " Zhen Lei
2016-08-26 15:35   ` Will Deacon
2016-08-27 10:44     ` Leizhen (ThunderTown)
2016-08-30 17:55       ` Will Deacon
2016-08-31  2:46         ` Leizhen (ThunderTown)
