* [RFC PATCH 0/4] arm64:numa: Add numa support for arm64 platforms.
@ 2014-09-25  9:03 ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-09-25  9:03 UTC (permalink / raw)
  To: catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

This is the initial patch set adding NUMA support for arm64-based platforms.
The patches were tested on Cavium's multi-node (2-node topology) simulator,
running all test cases in the numactl-2.0.9 package.

This patch set defines DT bindings that map memory nodes to NUMA
nodes. The CPU-to-node mapping is derived from the existing cpu-map
description defined in the topology binding.
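
As a rough illustration of the cpu-to-node derivation described above, the
sketch below (plain userspace C, not code from this series) looks up a node
id from a flattened cpu-map. struct cpu_map_entry and cpu_to_node_from_map()
are hypothetical names, and treating the top-level cluster index as the node
id is an assumption, not something this cover letter states.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of a flattened cpu-map: each entry records which
 * top-level clusterN node a logical CPU was listed under.
 */
struct cpu_map_entry {
	unsigned int cpu;	/* logical CPU number */
	unsigned int cluster;	/* index N of the enclosing clusterN node */
};

/*
 * Return the node id for a logical CPU, or -1 if the CPU is not in the map.
 * Assumption: the cluster index is used directly as the node id.
 */
static int cpu_to_node_from_map(const struct cpu_map_entry *map, size_t n,
				unsigned int cpu)
{
	size_t i;

	assert(map != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (map[i].cpu == cpu)
			return (int)map[i].cluster;
	}
	return -1;
}
```

With entries for CPU0-47 under cluster0 and CPU48-95 under cluster1, as in
the Thunder dts of this series, CPU 48 would resolve to node 1 under that
assumption.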

thanks,
Ganapat

Ganapatrao Kulkarni (4):
  arm64: defconfig: increase NR_CPUS range to 2-128
  arm/arm64:dt:numa: adding numa node mapping for memory nodes.
  arm64:thunder: Add initial dts for Cavium Thunder SoC in 2 Node topology.
  arm64:numa: adding numa support for arm64 platforms.

 Documentation/devicetree/bindings/arm/numa.txt |  60 ++
 arch/arm64/Kconfig                             |  37 +-
 arch/arm64/boot/dts/thunder-88xx-2n.dts        |  76 ++
 arch/arm64/boot/dts/thunder-88xx-2n.dtsi       | 990 +++++++++++++++++++++++++
 arch/arm64/include/asm/mmzone.h                |  32 +
 arch/arm64/include/asm/numa.h                  |  41 +
 arch/arm64/kernel/setup.c                      |   8 +
 arch/arm64/kernel/smp.c                        |   2 +
 arch/arm64/mm/Makefile                         |   1 +
 arch/arm64/mm/init.c                           |  33 +-
 arch/arm64/mm/numa.c                           | 471 ++++++++++++
 11 files changed, 1745 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
 create mode 100644 arch/arm64/boot/dts/thunder-88xx-2n.dts
 create mode 100644 arch/arm64/boot/dts/thunder-88xx-2n.dtsi
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/mm/numa.c

-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

* [RFC PATCH 1/4] arm64: defconfig: increase NR_CPUS range to 2-128
  2014-09-25  9:03 ` Ganapatrao Kulkarni
@ 2014-09-25  9:03     ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-09-25  9:03 UTC (permalink / raw)
  To: catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

Raise the maximum NR_CPUS limit from 64 to 128. This is needed for Cavium's
Thunder systems, which will have 96 cores in a multi-node configuration.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com>
---
 arch/arm64/Kconfig | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 4d42453..a409105 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -296,8 +296,8 @@ config SCHED_SMT
 	  places. If unsure say N here.
 
 config NR_CPUS
-	int "Maximum number of CPUs (2-64)"
-	range 2 64
+	int "Maximum number of CPUs (2-128)"
+	range 2 128
 	depends on SMP
 	# These have to remain sorted largest to smallest
 	default "64"
-- 
1.8.1.4

* [RFC PATCH 2/4] arm/arm64:dt:numa: adding numa node mapping for memory nodes.
  2014-09-25  9:03 ` Ganapatrao Kulkarni
@ 2014-09-25  9:03     ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-09-25  9:03 UTC (permalink / raw)
  To: catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

Add documentation for the DT binding that maps memory nodes to NUMA nodes.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com>
---
 Documentation/devicetree/bindings/arm/numa.txt | 60 ++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
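
To illustrate the binding, here is a minimal C sketch (userspace, not part
of this patch) of how a consumer might combine reg and nid to find the node
that owns a physical address. struct mem_range, cells_to_u64() and
phys_to_nid() are hypothetical names; the sketch assumes
#address-cells = <2> and #size-cells = <2>, as in the examples in the
binding document.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One reg entry of a memory node together with that node's nid. */
struct mem_range {
	uint64_t base;
	uint64_t size;
	uint32_t nid;
};

/* Combine two 32-bit DT cells (high, low) into one 64-bit value. */
static uint64_t cells_to_u64(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Return the nid of the range containing addr, or -1 if none matches. */
static int phys_to_nid(const struct mem_range *r, size_t n, uint64_t addr)
{
	size_t i;

	assert(r != NULL || n == 0);

	for (i = 0; i < n; i++) {
		/* Subtracting first avoids overflow in base + size. */
		if (addr >= r[i].base && addr - r[i].base < r[i].size)
			return (int)r[i].nid;
	}
	return -1;
}
```

For the values in Example 1, reg = <0x100 0x00000000 0x0 0x80000000>
describes a 2 GiB range starting at 0x10000000000, so an address in that
range would map to nid 1.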

diff --git a/Documentation/devicetree/bindings/arm/numa.txt b/Documentation/devicetree/bindings/arm/numa.txt
new file mode 100644
index 0000000..1cdc6d3
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/numa.txt
@@ -0,0 +1,60 @@
+========================================================
+ARM numa id binding description
+========================================================
+
+========================================================
+1 - Introduction
+========================================================
+
+The device node property nid (NUMA node id) can be added
+to a memory device node to map the range of memory addresses
+defined in its reg property. The nid property maps the memory
+range to a NUMA node id, which is used to identify local
+and remote pages on NUMA-aware systems.
+
+========================================================
+2 - nid property
+========================================================
+nid is a required property of the memory device node on
+NUMA-enabled platforms.
+
+|------------------------------------------------------|
+|Property Name  | Usage | Value Type | Definition      |
+|------------------------------------------------------|
+|  nid          |  R    |    <u32>   | Numa Node id    |
+|               |       |            | for this memory |
+|------------------------------------------------------|
+
+========================================================
+3 - Example memory nodes with numa information
+========================================================
+
+Example 1 (2 memory nodes, each mapped to a NUMA node):
+
+	memory@00000000 {
+		device_type = "memory";
+		reg = <0x0 0x00000000 0x0 0x80000000>;
+		nid = <0x0>;
+	};
+
+	memory@10000000000 {
+		device_type = "memory";
+		reg = <0x100 0x00000000 0x0 0x80000000>;
+		nid = <0x1>;
+	};
+
+Example 2 (multiple memory ranges in each memory node, mapped to a NUMA node):
+
+	memory@00000000 {
+		device_type = "memory";
+		reg = <0x0 0x00000000 0x0 0x80000000>,
+		      <0x1 0x00000000 0x0 0x80000000>;
+		nid = <0x0>;
+	};
+
+	memory@10000000000 {
+		device_type = "memory";
+		reg = <0x100 0x00000000 0x0 0x80000000>,
+		      <0x100 0x80000000 0x0 0x80000000>;
+		nid = <0x1>;
+	};
-- 
1.8.1.4

* [RFC PATCH 3/4] arm64:thunder: Add initial dts for Cavium Thunder SoC in 2 Node topology.
  2014-09-25  9:03 ` Ganapatrao Kulkarni
@ 2014-09-25  9:03     ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-09-25  9:03 UTC (permalink / raw)
  To: catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

Add a devicetree definition for Thunder's 2-node topology.
The cpu-map covers all 96 cores of the 2-node system.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com>
---
 arch/arm64/boot/dts/thunder-88xx-2n.dts  |  76 +++
 arch/arm64/boot/dts/thunder-88xx-2n.dtsi | 990 +++++++++++++++++++++++++++++++
 2 files changed, 1066 insertions(+)
 create mode 100644 arch/arm64/boot/dts/thunder-88xx-2n.dts
 create mode 100644 arch/arm64/boot/dts/thunder-88xx-2n.dtsi
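
For readers decoding the cpu reg values in the dtsi, the following C sketch
(userspace, not part of this patch) splits a two-cell reg into MPIDR_EL1
affinity fields. Per the ARM cpus binding, with #address-cells = <2> the
first cell carries MPIDR_EL1[39:32] (Aff3) and the second cell carries
MPIDR_EL1[23:0] (Aff2, Aff1, Aff0). struct mpidr_aff and decode_cpu_reg()
are hypothetical names. Whether Aff2 identifies the node on Thunder is an
assumption; the dtsi only shows reg values of the form 0x1000x starting at
CPU48.

```c
#include <assert.h>
#include <stdint.h>

/* Affinity fields of MPIDR_EL1 as encoded in a two-cell cpu reg. */
struct mpidr_aff {
	uint8_t aff0;
	uint8_t aff1;
	uint8_t aff2;
	uint8_t aff3;
};

/* Split reg = <cell_hi cell_lo> into its four 8-bit affinity levels. */
static struct mpidr_aff decode_cpu_reg(uint32_t cell_hi, uint32_t cell_lo)
{
	struct mpidr_aff a;

	/* The cpus binding requires all non-affinity bits to be zero. */
	assert((cell_lo & 0xff000000u) == 0);
	assert((cell_hi & 0xffffff00u) == 0);

	a.aff0 = (uint8_t)(cell_lo & 0xff);
	a.aff1 = (uint8_t)((cell_lo >> 8) & 0xff);
	a.aff2 = (uint8_t)((cell_lo >> 16) & 0xff);
	a.aff3 = (uint8_t)(cell_hi & 0xff);
	return a;
}
```

For example, reg = <0x0 0x20f> (CPU47) decodes to Aff1 = 2, Aff0 = 15, and
reg = <0x0 0x10000> (CPU48) decodes to Aff2 = 1 with Aff1 and Aff0 zero.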

diff --git a/arch/arm64/boot/dts/thunder-88xx-2n.dts b/arch/arm64/boot/dts/thunder-88xx-2n.dts
new file mode 100644
index 0000000..b2e70e1
--- /dev/null
+++ b/arch/arm64/boot/dts/thunder-88xx-2n.dts
@@ -0,0 +1,76 @@
+/*
+ * Cavium Thunder DTS file - Thunder board description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This library is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *     You should have received a copy of the GNU General Public
+ *     License along with this library; if not, write to the Free
+ *     Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ *     MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+
+/include/ "thunder-88xx-2n.dtsi"
+
+/ {
+	model = "Cavium ThunderX CN88XX board";
+	compatible = "cavium,thunder-88xx";
+
+	aliases {
+		serial0 = &uaa0;
+		serial1 = &uaa1;
+	};
+
+	memory@00000000 {
+		device_type = "memory";
+		reg = <0x0 0x00000000 0x0 0x80000000>,
+		      <0x1 0x00000000 0x0 0x80000000>;
+		nid = <0x0>;
+	};
+
+	memory@10000000000 {
+		device_type = "memory";
+		reg = <0x100 0x00000000 0x0 0x80000000>;
+		nid = <0x1>;
+	};
+
+};
diff --git a/arch/arm64/boot/dts/thunder-88xx-2n.dtsi b/arch/arm64/boot/dts/thunder-88xx-2n.dtsi
new file mode 100644
index 0000000..511c932
--- /dev/null
+++ b/arch/arm64/boot/dts/thunder-88xx-2n.dtsi
@@ -0,0 +1,990 @@
+/*
+ * Cavium Thunder DTS file - Thunder SoC description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This library is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *     You should have received a copy of the GNU General Public
+ *     License along with this library; if not, write to the Free
+ *     Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ *     MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/ {
+	compatible = "cavium,thunder-88xx";
+	interrupt-parent = <&gic0>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	psci {
+		compatible = "arm,psci-0.2";
+		method = "smc";
+	};
+
+	cpus {
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		cpu-map {
+			cluster0 {
+				core0 {
+					cpu = <&CPU0>;
+				};
+				core1 {
+					cpu = <&CPU1>;
+				};
+				core2 {
+					cpu = <&CPU2>;
+				};
+				core3 {
+					cpu = <&CPU3>;
+				};
+				core4 {
+					cpu = <&CPU4>;
+				};
+				core5 {
+					cpu = <&CPU5>;
+				};
+				core6 {
+					cpu = <&CPU6>;
+				};
+				core7 {
+					cpu = <&CPU7>;
+				};
+				core8 {
+					cpu = <&CPU8>;
+				};
+				core9 {
+					cpu = <&CPU9>;
+				};
+				core10 {
+					cpu = <&CPU10>;
+				};
+				core11 {
+					cpu = <&CPU11>;
+				};
+				core12 {
+					cpu = <&CPU12>;
+				};
+				core13 {
+					cpu = <&CPU13>;
+				};
+				core14 {
+					cpu = <&CPU14>;
+				};
+				core15 {
+					cpu = <&CPU15>;
+				};
+				core16 {
+					cpu = <&CPU16>;
+				};
+				core17 {
+					cpu = <&CPU17>;
+				};
+				core18 {
+					cpu = <&CPU18>;
+				};
+				core19 {
+					cpu = <&CPU19>;
+				};
+				core20 {
+					cpu = <&CPU20>;
+				};
+				core21 {
+					cpu = <&CPU21>;
+				};
+				core22 {
+					cpu = <&CPU22>;
+				};
+				core23 {
+					cpu = <&CPU23>;
+				};
+				core24 {
+					cpu = <&CPU24>;
+				};
+				core25 {
+					cpu = <&CPU25>;
+				};
+				core26 {
+					cpu = <&CPU26>;
+				};
+				core27 {
+					cpu = <&CPU27>;
+				};
+				core28 {
+					cpu = <&CPU28>;
+				};
+				core29 {
+					cpu = <&CPU29>;
+				};
+				core30 {
+					cpu = <&CPU30>;
+				};
+				core31 {
+					cpu = <&CPU31>;
+				};
+				core32 {
+					cpu = <&CPU32>;
+				};
+				core33 {
+					cpu = <&CPU33>;
+				};
+				core34 {
+					cpu = <&CPU34>;
+				};
+				core35 {
+					cpu = <&CPU35>;
+				};
+				core36 {
+					cpu = <&CPU36>;
+				};
+				core37 {
+					cpu = <&CPU37>;
+				};
+				core38 {
+					cpu = <&CPU38>;
+				};
+				core39 {
+					cpu = <&CPU39>;
+				};
+				core40 {
+					cpu = <&CPU40>;
+				};
+				core41 {
+					cpu = <&CPU41>;
+				};
+				core42 {
+					cpu = <&CPU42>;
+				};
+				core43 {
+					cpu = <&CPU43>;
+				};
+				core44 {
+					cpu = <&CPU44>;
+				};
+				core45 {
+					cpu = <&CPU45>;
+				};
+				core46 {
+					cpu = <&CPU46>;
+				};
+				core47 {
+					cpu = <&CPU47>;
+				};
+			};
+
+			cluster1 {
+				core0 {
+					cpu = <&CPU48>;
+				};
+				core1 {
+					cpu = <&CPU49>;
+				};
+				core2 {
+					cpu = <&CPU50>;
+				};
+				core3 {
+					cpu = <&CPU51>;
+				};
+				core4 {
+					cpu = <&CPU52>;
+				};
+				core5 {
+					cpu = <&CPU53>;
+				};
+				core6 {
+					cpu = <&CPU54>;
+				};
+				core7 {
+					cpu = <&CPU55>;
+				};
+				core8 {
+					cpu = <&CPU56>;
+				};
+				core9 {
+					cpu = <&CPU57>;
+				};
+				core10 {
+					cpu = <&CPU58>;
+				};
+				core11 {
+					cpu = <&CPU59>;
+				};
+				core12 {
+					cpu = <&CPU60>;
+				};
+				core13 {
+					cpu = <&CPU61>;
+				};
+				core14 {
+					cpu = <&CPU62>;
+				};
+				core15 {
+					cpu = <&CPU63>;
+				};
+				core16 {
+					cpu = <&CPU64>;
+				};
+				core17 {
+					cpu = <&CPU65>;
+				};
+				core18 {
+					cpu = <&CPU66>;
+				};
+				core19 {
+					cpu = <&CPU67>;
+				};
+				core20 {
+					cpu = <&CPU68>;
+				};
+				core21 {
+					cpu = <&CPU69>;
+				};
+				core22 {
+					cpu = <&CPU70>;
+				};
+				core23 {
+					cpu = <&CPU71>;
+				};
+				core24 {
+					cpu = <&CPU72>;
+				};
+				core25 {
+					cpu = <&CPU73>;
+				};
+				core26 {
+					cpu = <&CPU74>;
+				};
+				core27 {
+					cpu = <&CPU75>;
+				};
+				core28 {
+					cpu = <&CPU76>;
+				};
+				core29 {
+					cpu = <&CPU77>;
+				};
+				core30 {
+					cpu = <&CPU78>;
+				};
+				core31 {
+					cpu = <&CPU79>;
+				};
+				core32 {
+					cpu = <&CPU80>;
+				};
+				core33 {
+					cpu = <&CPU81>;
+				};
+				core34 {
+					cpu = <&CPU82>;
+				};
+				core35 {
+					cpu = <&CPU83>;
+				};
+				core36 {
+					cpu = <&CPU84>;
+				};
+				core37 {
+					cpu = <&CPU85>;
+				};
+				core38 {
+					cpu = <&CPU86>;
+				};
+				core39 {
+					cpu = <&CPU87>;
+				};
+				core40 {
+					cpu = <&CPU88>;
+				};
+				core41 {
+					cpu = <&CPU89>;
+				};
+				core42 {
+					cpu = <&CPU90>;
+				};
+				core43 {
+					cpu = <&CPU91>;
+				};
+				core44 {
+					cpu = <&CPU92>;
+				};
+				core45 {
+					cpu = <&CPU93>;
+				};
+				core46 {
+					cpu = <&CPU94>;
+				};
+				core47 {
+					cpu = <&CPU95>;
+				};
+			};
+		};
+
+		CPU0: cpu@000 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x000>;
+			enable-method = "psci";
+		};
+		CPU1: cpu@001 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x001>;
+			enable-method = "psci";
+		};
+		CPU2: cpu@002 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x002>;
+			enable-method = "psci";
+		};
+		CPU3: cpu@003 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x003>;
+			enable-method = "psci";
+		};
+		CPU4: cpu@004 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x004>;
+			enable-method = "psci";
+		};
+		CPU5: cpu@005 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x005>;
+			enable-method = "psci";
+		};
+		CPU6: cpu@006 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x006>;
+			enable-method = "psci";
+		};
+		CPU7: cpu@007 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x007>;
+			enable-method = "psci";
+		};
+		CPU8: cpu@008 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x008>;
+			enable-method = "psci";
+		};
+		CPU9: cpu@009 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x009>;
+			enable-method = "psci";
+		};
+		CPU10: cpu@00a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00a>;
+			enable-method = "psci";
+		};
+		CPU11: cpu@00b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00b>;
+			enable-method = "psci";
+		};
+		CPU12: cpu@00c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00c>;
+			enable-method = "psci";
+		};
+		CPU13: cpu@00d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00d>;
+			enable-method = "psci";
+		};
+		CPU14: cpu@00e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00e>;
+			enable-method = "psci";
+		};
+		CPU15: cpu@00f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00f>;
+			enable-method = "psci";
+		};
+		CPU16: cpu@100 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x100>;
+			enable-method = "psci";
+		};
+		CPU17: cpu@101 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x101>;
+			enable-method = "psci";
+		};
+		CPU18: cpu@102 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x102>;
+			enable-method = "psci";
+		};
+		CPU19: cpu@103 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x103>;
+			enable-method = "psci";
+		};
+		CPU20: cpu@104 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x104>;
+			enable-method = "psci";
+		};
+		CPU21: cpu@105 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x105>;
+			enable-method = "psci";
+		};
+		CPU22: cpu@106 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x106>;
+			enable-method = "psci";
+		};
+		CPU23: cpu@107 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x107>;
+			enable-method = "psci";
+		};
+		CPU24: cpu@108 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x108>;
+			enable-method = "psci";
+		};
+		CPU25: cpu@109 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x109>;
+			enable-method = "psci";
+		};
+		CPU26: cpu@10a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10a>;
+			enable-method = "psci";
+		};
+		CPU27: cpu@10b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10b>;
+			enable-method = "psci";
+		};
+		CPU28: cpu@10c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10c>;
+			enable-method = "psci";
+		};
+		CPU29: cpu@10d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10d>;
+			enable-method = "psci";
+		};
+		CPU30: cpu@10e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10e>;
+			enable-method = "psci";
+		};
+		CPU31: cpu@10f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10f>;
+			enable-method = "psci";
+		};
+		CPU32: cpu@200 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x200>;
+			enable-method = "psci";
+		};
+		CPU33: cpu@201 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x201>;
+			enable-method = "psci";
+		};
+		CPU34: cpu@202 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x202>;
+			enable-method = "psci";
+		};
+		CPU35: cpu@203 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x203>;
+			enable-method = "psci";
+		};
+		CPU36: cpu@204 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x204>;
+			enable-method = "psci";
+		};
+		CPU37: cpu@205 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x205>;
+			enable-method = "psci";
+		};
+		CPU38: cpu@206 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x206>;
+			enable-method = "psci";
+		};
+		CPU39: cpu@207 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x207>;
+			enable-method = "psci";
+		};
+		CPU40: cpu@208 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x208>;
+			enable-method = "psci";
+		};
+		CPU41: cpu@209 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x209>;
+			enable-method = "psci";
+		};
+		CPU42: cpu@20a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20a>;
+			enable-method = "psci";
+		};
+		CPU43: cpu@20b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20b>;
+			enable-method = "psci";
+		};
+		CPU44: cpu@20c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20c>;
+			enable-method = "psci";
+		};
+		CPU45: cpu@20d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20d>;
+			enable-method = "psci";
+		};
+		CPU46: cpu@20e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20e>;
+			enable-method = "psci";
+		};
+		CPU47: cpu@20f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20f>;
+			enable-method = "psci";
+		};
+		CPU48: cpu@10000 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10000>;
+			enable-method = "psci";
+		};
+		CPU49: cpu@10001 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10001>;
+			enable-method = "psci";
+		};
+		CPU50: cpu@10002 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10002>;
+			enable-method = "psci";
+		};
+		CPU51: cpu@10003 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10003>;
+			enable-method = "psci";
+		};
+		CPU52: cpu@10004 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10004>;
+			enable-method = "psci";
+		};
+		CPU53: cpu@10005 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10005>;
+			enable-method = "psci";
+		};
+		CPU54: cpu@10006 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10006>;
+			enable-method = "psci";
+		};
+		CPU55: cpu@10007 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10007>;
+			enable-method = "psci";
+		};
+		CPU56: cpu@10008 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10008>;
+			enable-method = "psci";
+		};
+		CPU57: cpu@10009 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10009>;
+			enable-method = "psci";
+		};
+		CPU58: cpu@1000a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000a>;
+			enable-method = "psci";
+		};
+		CPU59: cpu@1000b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000b>;
+			enable-method = "psci";
+		};
+		CPU60: cpu@1000c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000c>;
+			enable-method = "psci";
+		};
+		CPU61: cpu@1000d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000d>;
+			enable-method = "psci";
+		};
+		CPU62: cpu@1000e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000e>;
+			enable-method = "psci";
+		};
+		CPU63: cpu@1000f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000f>;
+			enable-method = "psci";
+		};
+		CPU64: cpu@10100 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10100>;
+			enable-method = "psci";
+		};
+		CPU65: cpu@10101 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10101>;
+			enable-method = "psci";
+		};
+		CPU66: cpu@10102 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10102>;
+			enable-method = "psci";
+		};
+		CPU67: cpu@10103 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10103>;
+			enable-method = "psci";
+		};
+		CPU68: cpu@10104 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10104>;
+			enable-method = "psci";
+		};
+		CPU69: cpu@10105 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10105>;
+			enable-method = "psci";
+		};
+		CPU70: cpu@10106 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10106>;
+			enable-method = "psci";
+		};
+		CPU71: cpu@10107 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10107>;
+			enable-method = "psci";
+		};
+		CPU72: cpu@10108 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10108>;
+			enable-method = "psci";
+		};
+		CPU73: cpu@10109 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10109>;
+			enable-method = "psci";
+		};
+		CPU74: cpu@1010a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010a>;
+			enable-method = "psci";
+		};
+		CPU75: cpu@1010b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010b>;
+			enable-method = "psci";
+		};
+		CPU76: cpu@1010c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010c>;
+			enable-method = "psci";
+		};
+		CPU77: cpu@1010d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010d>;
+			enable-method = "psci";
+		};
+		CPU78: cpu@1010e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010e>;
+			enable-method = "psci";
+		};
+		CPU79: cpu@1010f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010f>;
+			enable-method = "psci";
+		};
+		CPU80: cpu@10200 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10200>;
+			enable-method = "psci";
+		};
+		CPU81: cpu@10201 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10201>;
+			enable-method = "psci";
+		};
+		CPU82: cpu@10202 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10202>;
+			enable-method = "psci";
+		};
+		CPU83: cpu@10203 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10203>;
+			enable-method = "psci";
+		};
+		CPU84: cpu@10204 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10204>;
+			enable-method = "psci";
+		};
+		CPU85: cpu@10205 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10205>;
+			enable-method = "psci";
+		};
+		CPU86: cpu@10206 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10206>;
+			enable-method = "psci";
+		};
+		CPU87: cpu@10207 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10207>;
+			enable-method = "psci";
+		};
+		CPU88: cpu@10208 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10208>;
+			enable-method = "psci";
+		};
+		CPU89: cpu@10209 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10209>;
+			enable-method = "psci";
+		};
+		CPU90: cpu@1020a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020a>;
+			enable-method = "psci";
+		};
+		CPU91: cpu@1020b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020b>;
+			enable-method = "psci";
+		};
+		CPU92: cpu@1020c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020c>;
+			enable-method = "psci";
+		};
+		CPU93: cpu@1020d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020d>;
+			enable-method = "psci";
+		};
+		CPU94: cpu@1020e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020e>;
+			enable-method = "psci";
+		};
+		CPU95: cpu@1020f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020f>;
+			enable-method = "psci";
+		};
+	};
+
+	timer {
+		compatible = "arm,armv8-timer";
+		interrupts = <1 13 0xff01>,
+		             <1 14 0xff01>,
+		             <1 11 0xff01>,
+		             <1 10 0xff01>;
+	};
+
+	soc {
+		compatible = "simple-bus";
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+
+		refclk50mhz: refclk50mhz {
+			compatible = "fixed-clock";
+			#clock-cells = <0>;
+			clock-frequency = <50000000>;
+			clock-output-names = "refclk50mhz";
+		};
+
+		gic0: interrupt-controller@8010,00000000 {
+			compatible = "arm,gic-v3";
+			#interrupt-cells = <3>;
+			#address-cells = <2>;
+			#size-cells = <2>;
+			#redistributor-regions = <2>;
+			ranges;
+			interrupt-controller;
+			reg = <0x8010 0x00000000 0x0 0x010000>, /* GICD */
+			      <0x8010 0x80000000 0x0 0x600000>, /* GICR Node 0 */
+			      <0x9010 0x80000000 0x0 0x600000>; /* GICR Node 1 */
+			interrupts = <1 9 0xf04>;
+		};
+
+		uaa0: serial@87e0,24000000 {
+			compatible = "arm,pl011", "arm,primecell";
+			reg = <0x87e0 0x24000000 0x0 0x1000>;
+			interrupts = <1 21 4>;
+			clocks = <&refclk50mhz>;
+			clock-names = "apb_pclk";
+		};
+
+		uaa1: serial@87e0,25000000 {
+			compatible = "arm,pl011", "arm,primecell";
+			reg = <0x87e0 0x25000000 0x0 0x1000>;
+			interrupts = <1 22 4>;
+			clocks = <&refclk50mhz>;
+			clock-names = "apb_pclk";
+		};
+	};
+};
-- 
1.8.1.4



* [RFC PATCH 3/4] arm64:thunder: Add initial dts for Cavium Thunder SoC in 2 Node topology.
@ 2014-09-25  9:03     ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-09-25  9:03 UTC (permalink / raw)
  To: linux-arm-kernel

Add a devicetree definition for Thunder's 2-node topology.
Define a cpu-map covering all 96 cores of the 2-node system.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com>
---
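Note on the cpu node numbering: the reg values in the .dtsi follow a regular
pattern that can be reproduced mechanically. The sketch below is only an
illustration inferred from the values in this patch (2 nodes x 3 clusters x
16 cores, with the node in bits 16 and up, the cluster in bits 8-15 and the
core in bits 0-7). The helper names cpu_reg() and cpu_node() are hypothetical
and are not part of the patch or of any kernel tooling.

```python
# Assumed layout, inferred only from the reg values in this patch:
# 2 nodes x 3 clusters x 16 cores, reg = node << 16 | cluster << 8 | core.
CORES_PER_CLUSTER = 16
CLUSTERS_PER_NODE = 3
CORES_PER_NODE = CORES_PER_CLUSTER * CLUSTERS_PER_NODE  # 48


def cpu_reg(index: int) -> int:
    """Return the low reg cell for the cpu labelled CPU<index>."""
    node, within_node = divmod(index, CORES_PER_NODE)
    cluster, core = divmod(within_node, CORES_PER_CLUSTER)
    return (node << 16) | (cluster << 8) | core


def cpu_node(index: int) -> str:
    """Render one cpu node in the same shape as thunder-88xx-2n.dtsi."""
    reg = cpu_reg(index)
    return (
        f"\t\tCPU{index}: cpu@{reg:03x} {{\n"
        f'\t\t\tdevice_type = "cpu";\n'
        f'\t\t\tcompatible = "cavium,thunder", "arm,armv8";\n'
        f"\t\t\treg = <0x0 0x{reg:03x}>;\n"
        f'\t\t\tenable-method = "psci";\n'
        f"\t\t}};\n"
    )


if __name__ == "__main__":
    # Print all 96 cpu nodes for the 2-node system.
    print("".join(cpu_node(i) for i in range(2 * CORES_PER_NODE)), end="")
```

Under these assumptions CPU47 maps to 0x20f and CPU48 to 0x10000, which matches
the nodes listed in the .dtsi.
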
 arch/arm64/boot/dts/thunder-88xx-2n.dts  |  76 +++
 arch/arm64/boot/dts/thunder-88xx-2n.dtsi | 990 +++++++++++++++++++++++++++++++
 2 files changed, 1066 insertions(+)
 create mode 100644 arch/arm64/boot/dts/thunder-88xx-2n.dts
 create mode 100644 arch/arm64/boot/dts/thunder-88xx-2n.dtsi

diff --git a/arch/arm64/boot/dts/thunder-88xx-2n.dts b/arch/arm64/boot/dts/thunder-88xx-2n.dts
new file mode 100644
index 0000000..b2e70e1
--- /dev/null
+++ b/arch/arm64/boot/dts/thunder-88xx-2n.dts
@@ -0,0 +1,76 @@
+/*
+ * Cavium Thunder DTS file - Thunder board description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This library is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *     You should have received a copy of the GNU General Public
+ *     License along with this library; if not, write to the Free
+ *     Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ *     MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/dts-v1/;
+
+/include/ "thunder-88xx-2n.dtsi"
+
+/ {
+	model = "Cavium ThunderX CN88XX board";
+	compatible = "cavium,thunder-88xx";
+
+	aliases {
+		serial0 = &uaa0;
+		serial1 = &uaa1;
+	};
+
+	memory@00000000 {
+		device_type = "memory";
+		reg = <0x0 0x00000000 0x0 0x80000000>,
+		      <0x1 0x00000000 0x0 0x80000000>;
+		nid = <0x0>;
+	};
+
+	memory@10000000000 {
+		device_type = "memory";
+		reg = <0x100 0x00000000 0x0 0x80000000>;
+		nid = <0x1>;
+	};
+
+};
diff --git a/arch/arm64/boot/dts/thunder-88xx-2n.dtsi b/arch/arm64/boot/dts/thunder-88xx-2n.dtsi
new file mode 100644
index 0000000..511c932
--- /dev/null
+++ b/arch/arm64/boot/dts/thunder-88xx-2n.dtsi
@@ -0,0 +1,990 @@
+/*
+ * Cavium Thunder DTS file - Thunder SoC description
+ *
+ * Copyright (C) 2014, Cavium Inc.
+ *
+ * This file is dual-licensed: you can use it either under the terms
+ * of the GPL or the X11 license, at your option. Note that this dual
+ * licensing only applies to this file, and not this project as a
+ * whole.
+ *
+ *  a) This library is free software; you can redistribute it and/or
+ *     modify it under the terms of the GNU General Public License as
+ *     published by the Free Software Foundation; either version 2 of the
+ *     License, or (at your option) any later version.
+ *
+ *     This library is distributed in the hope that it will be useful,
+ *     but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *     GNU General Public License for more details.
+ *
+ *     You should have received a copy of the GNU General Public
+ *     License along with this library; if not, write to the Free
+ *     Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston,
+ *     MA 02110-1301 USA
+ *
+ * Or, alternatively,
+ *
+ *  b) Permission is hereby granted, free of charge, to any person
+ *     obtaining a copy of this software and associated documentation
+ *     files (the "Software"), to deal in the Software without
+ *     restriction, including without limitation the rights to use,
+ *     copy, modify, merge, publish, distribute, sublicense, and/or
+ *     sell copies of the Software, and to permit persons to whom the
+ *     Software is furnished to do so, subject to the following
+ *     conditions:
+ *
+ *     The above copyright notice and this permission notice shall be
+ *     included in all copies or substantial portions of the Software.
+ *
+ *     THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ *     EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+ *     OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ *     NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+ *     HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+ *     WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ *     FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ *     OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+/ {
+	compatible = "cavium,thunder-88xx";
+	interrupt-parent = <&gic0>;
+	#address-cells = <2>;
+	#size-cells = <2>;
+
+	psci {
+		compatible = "arm,psci-0.2";
+		method = "smc";
+	};
+
+	cpus {
+		#address-cells = <2>;
+		#size-cells = <0>;
+
+		cpu-map {
+			cluster0 {
+				core0 {
+					cpu = <&CPU0>;
+				};
+				core1 {
+					cpu = <&CPU1>;
+				};
+				core2 {
+					cpu = <&CPU2>;
+				};
+				core3 {
+					cpu = <&CPU3>;
+				};
+				core4 {
+					cpu = <&CPU4>;
+				};
+				core5 {
+					cpu = <&CPU5>;
+				};
+				core6 {
+					cpu = <&CPU6>;
+				};
+				core7 {
+					cpu = <&CPU7>;
+				};
+				core8 {
+					cpu = <&CPU8>;
+				};
+				core9 {
+					cpu = <&CPU9>;
+				};
+				core10 {
+					cpu = <&CPU10>;
+				};
+				core11 {
+					cpu = <&CPU11>;
+				};
+				core12 {
+					cpu = <&CPU12>;
+				};
+				core13 {
+					cpu = <&CPU13>;
+				};
+				core14 {
+					cpu = <&CPU14>;
+				};
+				core15 {
+					cpu = <&CPU15>;
+				};
+				core16 {
+					cpu = <&CPU16>;
+				};
+				core17 {
+					cpu = <&CPU17>;
+				};
+				core18 {
+					cpu = <&CPU18>;
+				};
+				core19 {
+					cpu = <&CPU19>;
+				};
+				core20 {
+					cpu = <&CPU20>;
+				};
+				core21 {
+					cpu = <&CPU21>;
+				};
+				core22 {
+					cpu = <&CPU22>;
+				};
+				core23 {
+					cpu = <&CPU23>;
+				};
+				core24 {
+					cpu = <&CPU24>;
+				};
+				core25 {
+					cpu = <&CPU25>;
+				};
+				core26 {
+					cpu = <&CPU26>;
+				};
+				core27 {
+					cpu = <&CPU27>;
+				};
+				core28 {
+					cpu = <&CPU28>;
+				};
+				core29 {
+					cpu = <&CPU29>;
+				};
+				core30 {
+					cpu = <&CPU30>;
+				};
+				core31 {
+					cpu = <&CPU31>;
+				};
+				core32 {
+					cpu = <&CPU32>;
+				};
+				core33 {
+					cpu = <&CPU33>;
+				};
+				core34 {
+					cpu = <&CPU34>;
+				};
+				core35 {
+					cpu = <&CPU35>;
+				};
+				core36 {
+					cpu = <&CPU36>;
+				};
+				core37 {
+					cpu = <&CPU37>;
+				};
+				core38 {
+					cpu = <&CPU38>;
+				};
+				core39 {
+					cpu = <&CPU39>;
+				};
+				core40 {
+					cpu = <&CPU40>;
+				};
+				core41 {
+					cpu = <&CPU41>;
+				};
+				core42 {
+					cpu = <&CPU42>;
+				};
+				core43 {
+					cpu = <&CPU43>;
+				};
+				core44 {
+					cpu = <&CPU44>;
+				};
+				core45 {
+					cpu = <&CPU45>;
+				};
+				core46 {
+					cpu = <&CPU46>;
+				};
+				core47 {
+					cpu = <&CPU47>;
+				};
+			};
+
+			cluster1 {
+				core0 {
+					cpu = <&CPU48>;
+				};
+				core1 {
+					cpu = <&CPU49>;
+				};
+				core2 {
+					cpu = <&CPU50>;
+				};
+				core3 {
+					cpu = <&CPU51>;
+				};
+				core4 {
+					cpu = <&CPU52>;
+				};
+				core5 {
+					cpu = <&CPU53>;
+				};
+				core6 {
+					cpu = <&CPU54>;
+				};
+				core7 {
+					cpu = <&CPU55>;
+				};
+				core8 {
+					cpu = <&CPU56>;
+				};
+				core9 {
+					cpu = <&CPU57>;
+				};
+				core10 {
+					cpu = <&CPU58>;
+				};
+				core11 {
+					cpu = <&CPU59>;
+				};
+				core12 {
+					cpu = <&CPU60>;
+				};
+				core13 {
+					cpu = <&CPU61>;
+				};
+				core14 {
+					cpu = <&CPU62>;
+				};
+				core15 {
+					cpu = <&CPU63>;
+				};
+				core16 {
+					cpu = <&CPU64>;
+				};
+				core17 {
+					cpu = <&CPU65>;
+				};
+				core18 {
+					cpu = <&CPU66>;
+				};
+				core19 {
+					cpu = <&CPU67>;
+				};
+				core20 {
+					cpu = <&CPU68>;
+				};
+				core21 {
+					cpu = <&CPU69>;
+				};
+				core22 {
+					cpu = <&CPU70>;
+				};
+				core23 {
+					cpu = <&CPU71>;
+				};
+				core24 {
+					cpu = <&CPU72>;
+				};
+				core25 {
+					cpu = <&CPU73>;
+				};
+				core26 {
+					cpu = <&CPU74>;
+				};
+				core27 {
+					cpu = <&CPU75>;
+				};
+				core28 {
+					cpu = <&CPU76>;
+				};
+				core29 {
+					cpu = <&CPU77>;
+				};
+				core30 {
+					cpu = <&CPU78>;
+				};
+				core31 {
+					cpu = <&CPU79>;
+				};
+				core32 {
+					cpu = <&CPU80>;
+				};
+				core33 {
+					cpu = <&CPU81>;
+				};
+				core34 {
+					cpu = <&CPU82>;
+				};
+				core35 {
+					cpu = <&CPU83>;
+				};
+				core36 {
+					cpu = <&CPU84>;
+				};
+				core37 {
+					cpu = <&CPU85>;
+				};
+				core38 {
+					cpu = <&CPU86>;
+				};
+				core39 {
+					cpu = <&CPU87>;
+				};
+				core40 {
+					cpu = <&CPU88>;
+				};
+				core41 {
+					cpu = <&CPU89>;
+				};
+				core42 {
+					cpu = <&CPU90>;
+				};
+				core43 {
+					cpu = <&CPU91>;
+				};
+				core44 {
+					cpu = <&CPU92>;
+				};
+				core45 {
+					cpu = <&CPU93>;
+				};
+				core46 {
+					cpu = <&CPU94>;
+				};
+				core47 {
+					cpu = <&CPU95>;
+				};
+			};
+		};
+
+		CPU0: cpu@000 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x000>;
+			enable-method = "psci";
+		};
+		CPU1: cpu@001 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x001>;
+			enable-method = "psci";
+		};
+		CPU2: cpu@002 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x002>;
+			enable-method = "psci";
+		};
+		CPU3: cpu@003 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x003>;
+			enable-method = "psci";
+		};
+		CPU4: cpu@004 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x004>;
+			enable-method = "psci";
+		};
+		CPU5: cpu@005 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x005>;
+			enable-method = "psci";
+		};
+		CPU6: cpu@006 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x006>;
+			enable-method = "psci";
+		};
+		CPU7: cpu@007 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x007>;
+			enable-method = "psci";
+		};
+		CPU8: cpu@008 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x008>;
+			enable-method = "psci";
+		};
+		CPU9: cpu@009 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x009>;
+			enable-method = "psci";
+		};
+		CPU10: cpu@00a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00a>;
+			enable-method = "psci";
+		};
+		CPU11: cpu@00b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00b>;
+			enable-method = "psci";
+		};
+		CPU12: cpu@00c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00c>;
+			enable-method = "psci";
+		};
+		CPU13: cpu@00d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00d>;
+			enable-method = "psci";
+		};
+		CPU14: cpu@00e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00e>;
+			enable-method = "psci";
+		};
+		CPU15: cpu@00f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x00f>;
+			enable-method = "psci";
+		};
+		CPU16: cpu@100 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x100>;
+			enable-method = "psci";
+		};
+		CPU17: cpu@101 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x101>;
+			enable-method = "psci";
+		};
+		CPU18: cpu@102 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x102>;
+			enable-method = "psci";
+		};
+		CPU19: cpu@103 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x103>;
+			enable-method = "psci";
+		};
+		CPU20: cpu@104 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x104>;
+			enable-method = "psci";
+		};
+		CPU21: cpu@105 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x105>;
+			enable-method = "psci";
+		};
+		CPU22: cpu@106 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x106>;
+			enable-method = "psci";
+		};
+		CPU23: cpu@107 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x107>;
+			enable-method = "psci";
+		};
+		CPU24: cpu@108 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x108>;
+			enable-method = "psci";
+		};
+		CPU25: cpu@109 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x109>;
+			enable-method = "psci";
+		};
+		CPU26: cpu@10a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10a>;
+			enable-method = "psci";
+		};
+		CPU27: cpu@10b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10b>;
+			enable-method = "psci";
+		};
+		CPU28: cpu@10c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10c>;
+			enable-method = "psci";
+		};
+		CPU29: cpu@10d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10d>;
+			enable-method = "psci";
+		};
+		CPU30: cpu@10e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10e>;
+			enable-method = "psci";
+		};
+		CPU31: cpu@10f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10f>;
+			enable-method = "psci";
+		};
+		CPU32: cpu@200 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x200>;
+			enable-method = "psci";
+		};
+		CPU33: cpu@201 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x201>;
+			enable-method = "psci";
+		};
+		CPU34: cpu@202 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x202>;
+			enable-method = "psci";
+		};
+		CPU35: cpu@203 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x203>;
+			enable-method = "psci";
+		};
+		CPU36: cpu@204 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x204>;
+			enable-method = "psci";
+		};
+		CPU37: cpu@205 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x205>;
+			enable-method = "psci";
+		};
+		CPU38: cpu@206 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x206>;
+			enable-method = "psci";
+		};
+		CPU39: cpu@207 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x207>;
+			enable-method = "psci";
+		};
+		CPU40: cpu@208 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x208>;
+			enable-method = "psci";
+		};
+		CPU41: cpu@209 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x209>;
+			enable-method = "psci";
+		};
+		CPU42: cpu@20a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20a>;
+			enable-method = "psci";
+		};
+		CPU43: cpu@20b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20b>;
+			enable-method = "psci";
+		};
+		CPU44: cpu@20c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20c>;
+			enable-method = "psci";
+		};
+		CPU45: cpu@20d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20d>;
+			enable-method = "psci";
+		};
+		CPU46: cpu@20e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20e>;
+			enable-method = "psci";
+		};
+		CPU47: cpu@20f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x20f>;
+			enable-method = "psci";
+		};
+		CPU48: cpu@10000 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10000>;
+			enable-method = "psci";
+		};
+		CPU49: cpu@10001 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10001>;
+			enable-method = "psci";
+		};
+		CPU50: cpu@10002 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10002>;
+			enable-method = "psci";
+		};
+		CPU51: cpu@10003 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10003>;
+			enable-method = "psci";
+		};
+		CPU52: cpu@10004 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10004>;
+			enable-method = "psci";
+		};
+		CPU53: cpu@10005 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10005>;
+			enable-method = "psci";
+		};
+		CPU54: cpu@10006 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10006>;
+			enable-method = "psci";
+		};
+		CPU55: cpu@10007 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10007>;
+			enable-method = "psci";
+		};
+		CPU56: cpu@10008 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10008>;
+			enable-method = "psci";
+		};
+		CPU57: cpu@10009 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10009>;
+			enable-method = "psci";
+		};
+		CPU58: cpu@1000a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000a>;
+			enable-method = "psci";
+		};
+		CPU59: cpu@1000b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000b>;
+			enable-method = "psci";
+		};
+		CPU60: cpu@1000c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000c>;
+			enable-method = "psci";
+		};
+		CPU61: cpu@1000d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000d>;
+			enable-method = "psci";
+		};
+		CPU62: cpu@1000e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000e>;
+			enable-method = "psci";
+		};
+		CPU63: cpu@1000f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1000f>;
+			enable-method = "psci";
+		};
+		CPU64: cpu@10100 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10100>;
+			enable-method = "psci";
+		};
+		CPU65: cpu@10101 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10101>;
+			enable-method = "psci";
+		};
+		CPU66: cpu@10102 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10102>;
+			enable-method = "psci";
+		};
+		CPU67: cpu@10103 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10103>;
+			enable-method = "psci";
+		};
+		CPU68: cpu@10104 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10104>;
+			enable-method = "psci";
+		};
+		CPU69: cpu@10105 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10105>;
+			enable-method = "psci";
+		};
+		CPU70: cpu@10106 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10106>;
+			enable-method = "psci";
+		};
+		CPU71: cpu@10107 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10107>;
+			enable-method = "psci";
+		};
+		CPU72: cpu@10108 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10108>;
+			enable-method = "psci";
+		};
+		CPU73: cpu@10109 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10109>;
+			enable-method = "psci";
+		};
+		CPU74: cpu@1010a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010a>;
+			enable-method = "psci";
+		};
+		CPU75: cpu@1010b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010b>;
+			enable-method = "psci";
+		};
+		CPU76: cpu@1010c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010c>;
+			enable-method = "psci";
+		};
+		CPU77: cpu@1010d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010d>;
+			enable-method = "psci";
+		};
+		CPU78: cpu@1010e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010e>;
+			enable-method = "psci";
+		};
+		CPU79: cpu@1010f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1010f>;
+			enable-method = "psci";
+		};
+		CPU80: cpu@10200 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10200>;
+			enable-method = "psci";
+		};
+		CPU81: cpu@10201 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10201>;
+			enable-method = "psci";
+		};
+		CPU82: cpu@10202 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10202>;
+			enable-method = "psci";
+		};
+		CPU83: cpu@10203 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10203>;
+			enable-method = "psci";
+		};
+		CPU84: cpu@10204 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10204>;
+			enable-method = "psci";
+		};
+		CPU85: cpu@10205 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10205>;
+			enable-method = "psci";
+		};
+		CPU86: cpu@10206 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10206>;
+			enable-method = "psci";
+		};
+		CPU87: cpu@10207 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10207>;
+			enable-method = "psci";
+		};
+		CPU88: cpu@10208 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10208>;
+			enable-method = "psci";
+		};
+		CPU89: cpu@10209 {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x10209>;
+			enable-method = "psci";
+		};
+		CPU90: cpu@1020a {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020a>;
+			enable-method = "psci";
+		};
+		CPU91: cpu@1020b {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020b>;
+			enable-method = "psci";
+		};
+		CPU92: cpu@1020c {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020c>;
+			enable-method = "psci";
+		};
+		CPU93: cpu@1020d {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020d>;
+			enable-method = "psci";
+		};
+		CPU94: cpu@1020e {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020e>;
+			enable-method = "psci";
+		};
+		CPU95: cpu@1020f {
+			device_type = "cpu";
+			compatible = "cavium,thunder", "arm,armv8";
+			reg = <0x0 0x1020f>;
+			enable-method = "psci";
+		};
+	};
+
+	timer {
+		compatible = "arm,armv8-timer";
+		interrupts = <1 13 0xff01>,
+		             <1 14 0xff01>,
+		             <1 11 0xff01>,
+		             <1 10 0xff01>;
+	};
+
+	soc {
+		compatible = "simple-bus";
+		#address-cells = <2>;
+		#size-cells = <2>;
+		ranges;
+
+		refclk50mhz: refclk50mhz {
+			compatible = "fixed-clock";
+			#clock-cells = <0>;
+			clock-frequency = <50000000>;
+			clock-output-names = "refclk50mhz";
+		};
+
+		gic0: interrupt-controller@8010,00000000 {
+			compatible = "arm,gic-v3";
+			#interrupt-cells = <3>;
+			#address-cells = <2>;
+			#size-cells = <2>;
+			#redistributor-regions = <2>;
+			ranges;
+			interrupt-controller;
+			reg = <0x8010 0x00000000 0x0 0x010000>, /* GICD */
+			      <0x8010 0x80000000 0x0 0x600000>, /* GICR Node 0 */
+			      <0x9010 0x80000000 0x0 0x600000>; /* GICR Node 1 */
+			interrupts = <1 9 0xf04>;
+		};
+
+		uaa0: serial@87e0,24000000 {
+			compatible = "arm,pl011", "arm,primecell";
+			reg = <0x87e0 0x24000000 0x0 0x1000>;
+			interrupts = <1 21 4>;
+			clocks = <&refclk50mhz>;
+			clock-names = "apb_pclk";
+		};
+
+		uaa1: serial@87e0,25000000 {
+			compatible = "arm,pl011", "arm,primecell";
+			reg = <0x87e0 0x25000000 0x0 0x1000>;
+			interrupts = <1 22 4>;
+			clocks = <&refclk50mhz>;
+			clock-names = "apb_pclk";
+		};
+	};
+};
-- 
1.8.1.4

* [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-09-25  9:03 ` Ganapatrao Kulkarni
@ 2014-09-25  9:03     ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-09-25  9:03 UTC (permalink / raw)
  To: catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

Add NUMA support for arm64 based platforms.
This version creates the NUMA mapping by parsing the device tree.
The cpu to node id mapping is derived from cluster_id as defined in cpu-map.
The memory to node id mapping is derived from the nid property of each
memory node.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
---
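Note: the memory to node id mapping described in the commit message can be
modelled outside the kernel. The following is only a minimal userspace
sketch under stated assumptions, not the code in this patch: MAX_NODES,
add_memblk() and node_span() are hypothetical names chosen for
illustration. It records each (base, size) range together with its nid,
roughly as numa_add_memblk_to() does, and then takes the lowest start and
the highest end per node, roughly as numa_register_memblks() does before it
calls setup_node_data().

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 4               /* hypothetical limit for this sketch */
#define MAX_BLKS  (MAX_NODES * 2) /* mirrors NR_NODE_MEMBLKS in the patch */

struct memblk {
	uint64_t start;
	uint64_t end;   /* exclusive */
	int nid;
};

struct meminfo {
	int nr_blks;
	struct memblk blk[MAX_BLKS];
};

/*
 * Record [base, base + size) for node nid.
 * Zero-length and invalid blocks are ignored (return 0), a full table
 * is an error (return -1).
 */
static int add_memblk(struct meminfo *mi, int nid, uint64_t base, uint64_t size)
{
	if (size == 0)
		return 0;
	if (nid < 0 || nid >= MAX_NODES)
		return 0;
	if (mi->nr_blks >= MAX_BLKS)
		return -1;

	mi->blk[mi->nr_blks].start = base;
	mi->blk[mi->nr_blks].end = base + size;
	mi->blk[mi->nr_blks].nid = nid;
	mi->nr_blks++;
	return 0;
}

/*
 * Compute the span covered by all blocks of node nid.
 * Returns 1 and fills *start and *end if the node has memory, else 0.
 */
static int node_span(const struct meminfo *mi, int nid,
		     uint64_t *start, uint64_t *end)
{
	uint64_t s = UINT64_MAX, e = 0;
	int i;

	for (i = 0; i < mi->nr_blks; i++) {
		if (mi->blk[i].nid != nid)
			continue;
		if (mi->blk[i].start < s)
			s = mi->blk[i].start;
		if (mi->blk[i].end > e)
			e = mi->blk[i].end;
	}
	if (s >= e)
		return 0;

	*start = s;
	*end = e;
	return 1;
}
```

With two blocks registered on node 1, node_span() reports one span from the
lower base to the higher end, which also covers any hole between the two
blocks. A node id that never appears in the table reports no memory.
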
 arch/arm64/Kconfig              |  33 +++
 arch/arm64/include/asm/mmzone.h |  32 +++
 arch/arm64/include/asm/numa.h   |  41 ++++
 arch/arm64/kernel/setup.c       |   8 +
 arch/arm64/kernel/smp.c         |   2 +
 arch/arm64/mm/Makefile          |   1 +
 arch/arm64/mm/init.c            |  33 ++-
 arch/arm64/mm/numa.c            | 471 ++++++++++++++++++++++++++++++++++++++++
 8 files changed, 617 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/mm/numa.c
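
Note: numa_meminfo_cover_memory() in this patch rejects a NUMA
configuration when the nodes leave 1MB or more of RAM unaccounted for. The
arithmetic can be sketched in userspace as below. This is an illustration
only: SKETCH_PAGE_SHIFT and nodes_cover_memory() are hypothetical names,
and a 4K page size is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4K pages */

/*
 * Return 1 when the pages described by the NUMA nodes account for all
 * but less than 1MB of the total RAM pages, 0 otherwise.
 */
static int nodes_cover_memory(uint64_t numa_pages, uint64_t total_pages)
{
	/* 1MB expressed in pages: 256 pages for a 4K page size */
	const int64_t slack_pages = (int64_t)1 << (20 - SKETCH_PAGE_SHIFT);

	/* The signed compare also accepts numa_pages > total_pages */
	return (int64_t)(total_pages - numa_pages) < slack_pages;
}
```

With 64K pages the same expression would yield a slack of 16 pages, so the
tolerance stays at 1MB regardless of the page size.
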

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a409105..415ee53 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -142,6 +142,7 @@ config ARCH_THUNDER
 	select NET_VENDOR_CAVIUM
 	select SATA_AHCI
 	select SATA_AHCI_PLATFORM
+	select HAVE_MEMBLOCK_NODE_MAP if NUMA
 
 config ARCH_VEXPRESS
 	bool "ARMv8 software model (Versatile Express)"
@@ -309,6 +310,38 @@ config HOTPLUG_CPU
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
 
+# Common NUMA Features
+config NUMA
+	bool "NUMA Memory Allocation and Scheduler Support"
+	depends on SMP
+	---help---
+	  Enable NUMA (Non Uniform Memory Access) support.
+
+	  The kernel will try to allocate memory used by a CPU on the
+	  local memory controller of the CPU and add some more
+	  NUMA awareness to the kernel.
+
+config ARM64_DT_NUMA
+	def_bool y
+	prompt "DT NUMA detection"
+	depends on ARM64 && NUMA && DTC
+	---help---
+	  Enable NUMA detection based on the device tree (DT).
+
+config NODES_SHIFT
+	int "Maximum NUMA Nodes (as a power of 2)"
+	range 1 10
+	default "2"
+	depends on NEED_MULTIPLE_NODES
+	---help---
+	  Specify the maximum number of NUMA Nodes available on the target
+	  system.  Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
+
 source kernel/Kconfig.preempt
 
 config HZ
diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
new file mode 100644
index 0000000..d27ee66
--- /dev/null
+++ b/arch/arm64/include/asm/mmzone.h
@@ -0,0 +1,32 @@
+#ifndef __ASM_ARM64_MMZONE_H_
+#define __ASM_ARM64_MMZONE_H_
+
+#ifdef CONFIG_NUMA
+
+#include <linux/mmdebug.h>
+#include <asm/smp.h>
+#include <linux/types.h>
+#include <asm/numa.h>
+
+extern struct pglist_data *node_data[];
+
+#define NODE_DATA(nid)		(node_data[nid])
+
+
+struct numa_memblk {
+	u64			start;
+	u64			end;
+	int			nid;
+};
+
+struct numa_meminfo {
+	int			nr_blks;
+	struct numa_memblk	blk[NR_NODE_MEMBLKS];
+};
+
+void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi);
+int __init numa_cleanup_meminfo(struct numa_meminfo *mi);
+void __init numa_reset_distance(void);
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_ARM64_MMZONE_H_ */
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
new file mode 100644
index 0000000..46d53fd
--- /dev/null
+++ b/arch/arm64/include/asm/numa.h
@@ -0,0 +1,41 @@
+#ifndef _ASM_ARM64_NUMA_H
+#define _ASM_ARM64_NUMA_H
+
+#include <linux/nodemask.h>
+#include <asm/topology.h>
+
+#ifdef CONFIG_NUMA
+
+#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)
+#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
+
+/*
+ * Too small node sizes may confuse the VM badly. Usually they
+ * result from BIOS bugs. So don't recognize nodes as standalone
+ * NUMA entities that have less than this amount of RAM listed:
+ */
+#define NODE_MIN_SIZE (4*1024*1024)
+
+#define parent_node(node)	(node)
+
+/* dummy definitions for pci functions */
+#define pcibus_to_node(node)	0
+#define cpumask_of_pcibus(bus)	0
+
+const struct cpumask *cpumask_of_node(int node);
+/* Mappings between node number and cpus on that node. */
+extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+
+void __init arm64_numa_init(void);
+int __init numa_add_memblk(u32 nid, u64 base, u64 size);
+void numa_store_cpu_info(int cpu);
+void numa_set_node(int cpu, int node);
+void numa_clear_node(int cpu);
+void numa_add_cpu(int cpu);
+void numa_remove_cpu(int cpu);
+#else	/* CONFIG_NUMA */
+/* Empty stubs so callers need no #ifdef when NUMA is disabled. */
+static inline void numa_store_cpu_info(int cpu)	{ }
+static inline void arm64_numa_init(void)		{ }
+#endif	/* CONFIG_NUMA */
+#endif	/* _ASM_ARM64_NUMA_H */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index edb146d..436b78d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -424,6 +424,9 @@ static int __init topology_init(void)
 {
 	int i;
 
+	for_each_online_node(i)
+		register_one_node(i);
+
 	for_each_possible_cpu(i) {
 		struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
 		cpu->hotpluggable = 1;
@@ -460,7 +463,12 @@ static int c_show(struct seq_file *m, void *v)
 		 * "processor".  Give glibc what it expects.
 		 */
 #ifdef CONFIG_SMP
+	if (IS_ENABLED(CONFIG_NUMA)) {
+		seq_printf(m, "processor\t: %d", i);
+		seq_printf(m, " [nid: %d]\n", cpu_to_node(i));
+	} else {
 		seq_printf(m, "processor\t: %d\n", i);
+	}
 #endif
 	}
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 4743397..60120db 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -50,6 +50,7 @@
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
 #include <asm/ptrace.h>
+#include <asm/numa.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
@@ -123,6 +124,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 static void smp_store_cpu_info(unsigned int cpuid)
 {
 	store_cpu_topology(cpuid);
+	numa_store_cpu_info(cpuid);
 }
 
 /*
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 3ecb56c..4dda3d0 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -3,3 +3,4 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   ioremap.o mmap.o pgd.o mmu.o \
 				   context.o proc.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
+obj-$(CONFIG_NUMA)		+= numa.o
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 271e654..4b2bbb4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -39,6 +39,7 @@
 #include <asm/setup.h>
 #include <asm/sizes.h>
 #include <asm/tlb.h>
+#include <asm/numa.h>
 
 #include "mm.h"
 
@@ -73,6 +74,20 @@ static phys_addr_t max_zone_dma_phys(void)
 	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
 }
 
+#ifdef CONFIG_NUMA
+static void __init zone_sizes_init(unsigned long min, unsigned long max)
+{
+	unsigned long max_zone_pfns[MAX_NR_ZONES];
+
+	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
+	if (IS_ENABLED(CONFIG_ZONE_DMA))
+		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
+	max_zone_pfns[ZONE_NORMAL] = max;
+
+	free_area_init_nodes(max_zone_pfns);
+}
+
+#else
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
 	struct memblock_region *reg;
@@ -111,6 +126,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
 
 	free_area_init_node(0, zone_size, min, zhole_size);
 }
+#endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
 int pfn_valid(unsigned long pfn)
@@ -129,9 +145,16 @@ static void arm64_memory_present(void)
 {
 	struct memblock_region *reg;
 
-	for_each_memblock(memory, reg)
+	for_each_memblock(memory, reg) {
+#ifdef CONFIG_NUMA
+		memory_present(reg->nid,
+				memblock_region_memory_base_pfn(reg),
+				memblock_region_memory_end_pfn(reg));
+#else
 		memory_present(0, memblock_region_memory_base_pfn(reg),
 			       memblock_region_memory_end_pfn(reg));
+#endif
+	}
 }
 #endif
 
@@ -168,6 +191,11 @@ void __init bootmem_init(void)
 	min = PFN_UP(memblock_start_of_DRAM());
 	max = PFN_DOWN(memblock_end_of_DRAM());
 
+	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
+	max_pfn = max_low_pfn = max;
+
+	if (IS_ENABLED(CONFIG_NUMA))
+		arm64_numa_init();
 	/*
 	 * Sparsemem tries to allocate bootmem in memory_present(), so must be
 	 * done after the fixed reservations.
@@ -176,9 +204,6 @@ void __init bootmem_init(void)
 
 	sparse_init();
 	zone_sizes_init(min, max);
-
-	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
-	max_pfn = max_low_pfn = max;
 }
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
new file mode 100644
index 0000000..a5f4555
--- /dev/null
+++ b/arch/arm64/mm/numa.c
@@ -0,0 +1,469 @@
+/*
+ * NUMA support, based on the x86 implementation.
+ *
+ * Copyright (C) 2014 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <gkulkarni-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/memblock.h>
+#include <linux/mmzone.h>
+#include <linux/ctype.h>
+#include <linux/module.h>
+#include <linux/nodemask.h>
+#include <linux/sched.h>
+#include <linux/topology.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+
+int __initdata numa_off;
+nodemask_t numa_nodes_parsed __initdata;
+
+struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_data);
+
+static struct numa_meminfo numa_meminfo;
+
+static __init int numa_setup(char *opt)
+{
+	if (!opt)
+		return -EINVAL;
+	if (!strncmp(opt, "off", 3))
+		numa_off = 1;
+	return 0;
+}
+early_param("numa", numa_setup);
+
+cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+EXPORT_SYMBOL(node_to_cpumask_map);
+
+/*
+ * Returns a pointer to the bitmask of CPUs on Node 'node'.
+ */
+const struct cpumask *cpumask_of_node(int node)
+{
+	if (node >= nr_node_ids) {
+		pr_warn("cpumask_of_node(%d): node >= nr_node_ids(%d)\n",
+			node, nr_node_ids);
+		dump_stack();
+		return cpu_none_mask;
+	}
+	if (node_to_cpumask_map[node] == NULL) {
+		pr_warn("cpumask_of_node(%d): no node_to_cpumask_map!\n",
+			node);
+		dump_stack();
+		return cpu_online_mask;
+	}
+	return node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(cpumask_of_node);
+
+int cpu_to_node_map[NR_CPUS];
+EXPORT_SYMBOL(cpu_to_node_map);
+
+void numa_clear_node(int cpu)
+{
+	cpu_to_node_map[cpu] = NUMA_NO_NODE;
+}
+
+/*
+ * Allocate node_to_cpumask_map based on number of available nodes
+ * Requires node_possible_map to be valid.
+ *
+ * Note: cpumask_of_node() is not valid until after this is done.
+ * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
+ */
+void __init setup_node_to_cpumask_map(void)
+{
+	unsigned int node;
+
+	/* setup nr_node_ids if not done yet */
+	if (nr_node_ids == MAX_NUMNODES)
+		setup_nr_node_ids();
+
+	/* allocate the map */
+	for (node = 0; node < nr_node_ids; node++)
+		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
+
+	/* cpumask_of_node() will now work */
+	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
+}
+
+/*
+ *  Set the cpu to node and mem mapping
+ */
+void numa_store_cpu_info(int cpu)
+{
+	cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
+	cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
+	set_numa_node(cpu_to_node_map[cpu]);
+	set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
+}
+
+/**
+ * numa_add_memblk_to - Add one numa_memblk to a numa_meminfo
+ */
+
+static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
+				     struct numa_meminfo *mi)
+{
+	/* ignore zero length blks */
+	if (start == end)
+		return 0;
+
+	/* whine about and ignore invalid blks */
+	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
+		pr_warn("numa: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
+				nid, start, end - 1);
+		return 0;
+	}
+
+	if (mi->nr_blks >= NR_NODE_MEMBLKS) {
+		pr_err("numa: too many memblk ranges\n");
+		return -EINVAL;
+	}
+
+	pr_info("numa: Adding memblock %d [0x%llx - 0x%llx] on node %d\n",
+			mi->nr_blks, start, end, nid);
+	mi->blk[mi->nr_blks].start = start;
+	mi->blk[mi->nr_blks].end = end;
+	mi->blk[mi->nr_blks].nid = nid;
+	mi->nr_blks++;
+	return 0;
+}
+
+/**
+ * numa_add_memblk - Add one numa_memblk to numa_meminfo
+ * @nid: NUMA node ID of the new memblk
+ * @base: Base physical address of the new memblk
+ * @size: Size of the new memblk in bytes
+ *
+ * Add a new memblk to the default numa_meminfo.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+#define MAX_PHYS_ADDR	((phys_addr_t)~0)
+
+int __init numa_add_memblk(u32 nid, u64 base, u64 size)
+{
+	const u64 phys_offset = __pa(PAGE_OFFSET);
+
+	base &= PAGE_MASK;
+	size &= PAGE_MASK;
+
+	if (base > MAX_PHYS_ADDR) {
+		pr_warn("numa: Ignoring memory block 0x%llx - 0x%llx\n",
+				base, base + size);
+		return -ENOMEM;
+	}
+
+	if (base + size > MAX_PHYS_ADDR) {
+		pr_warn("numa: Ignoring memory range 0x%lx - 0x%llx\n",
+				ULONG_MAX, base + size);
+		size = MAX_PHYS_ADDR - base;
+	}
+
+	if (base + size < phys_offset) {
+		pr_warn("numa: Ignoring memory block 0x%llx - 0x%llx\n",
+			   base, base + size);
+		return -ENOMEM;
+	}
+	if (base < phys_offset) {
+		pr_warn("numa: Ignoring memory range 0x%llx - 0x%llx\n",
+			   base, phys_offset);
+		size -= phys_offset - base;
+		base = phys_offset;
+	}
+
+	node_set(nid, numa_nodes_parsed);
+	return numa_add_memblk_to(nid, base, base+size, &numa_meminfo);
+}
+EXPORT_SYMBOL(numa_add_memblk);
+
+/* Initialize NODE_DATA for a node on the local memory */
+static void __init setup_node_data(int nid, u64 start, u64 end)
+{
+	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+	u64 nd_pa;
+	void *nd;
+	int tnid;
+
+	/*
+	 * Don't confuse VM with a node that doesn't have the
+	 * minimum amount of memory:
+	 */
+	if (end && (end - start) < NODE_MIN_SIZE)
+		return;
+
+	start = roundup(start, ZONE_ALIGN);
+
+	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
+	       nid, start, end - 1);
+
+	/*
+	 * Allocate node data.  Try node-local memory and then any node.
+	 */
+	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+	if (!nd_pa) {
+		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
+					      MEMBLOCK_ALLOC_ACCESSIBLE);
+		if (!nd_pa) {
+			pr_err("Cannot find %zu bytes in node %d\n",
+			       nd_size, nid);
+			return;
+		}
+	}
+	nd = __va(nd_pa);
+
+	/* report and initialize */
+	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
+	       nd_pa, nd_pa + nd_size - 1);
+	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
+	if (tnid != nid)
+		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
+
+	node_data[nid] = nd;
+	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
+	NODE_DATA(nid)->node_id = nid;
+	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
+	NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
+
+	node_set_online(nid);
+}
+
+/*
+ * Set nodes, which have memory in @mi, in *@nodemask.
+ */
+static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
+					      const struct numa_meminfo *mi)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
+		if (mi->blk[i].start != mi->blk[i].end &&
+		    mi->blk[i].nid != NUMA_NO_NODE)
+			node_set(mi->blk[i].nid, *nodemask);
+}
+
+/*
+ * Sanity check to catch more bad NUMA configurations (they are amazingly
+ * common).  Make sure the nodes cover all memory.
+ */
+static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
+{
+	u64 numaram, totalram;
+	int i;
+
+	numaram = 0;
+	for (i = 0; i < mi->nr_blks; i++) {
+		u64 s = mi->blk[i].start >> PAGE_SHIFT;
+		u64 e = mi->blk[i].end >> PAGE_SHIFT;
+
+		numaram += e - s;
+		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+		if ((s64)numaram < 0)
+			numaram = 0;
+	}
+
+	totalram = max_pfn - absent_pages_in_range(0, max_pfn);
+
+	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
+	if ((s64)(totalram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
+		pr_err("numa: nodes only cover %lluMB of your %lluMB Total RAM. Not used.\n",
+		       (numaram << PAGE_SHIFT) >> 20,
+		       (totalram << PAGE_SHIFT) >> 20);
+		return false;
+	}
+	return true;
+}
+
+static int __init numa_register_memblks(struct numa_meminfo *mi)
+{
+	unsigned long uninitialized_var(pfn_align);
+	int i, nid;
+
+	/* Account for nodes with cpus and no memory */
+	node_possible_map = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&node_possible_map, mi);
+	if (WARN_ON(nodes_empty(node_possible_map)))
+		return -EINVAL;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+
+		memblock_set_node(mb->start, mb->end - mb->start,
+				  &memblock.memory, mb->nid);
+	}
+
+	/*
+	 * If sections array is gonna be used for pfn -> nid mapping, check
+	 * whether its granularity is fine enough.
+	 */
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+	pfn_align = node_map_pfn_alignment();
+	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
+		pr_warn("Node alignment %lluMB < min %lluMB, rejecting NUMA config\n",
+		       PFN_PHYS(pfn_align) >> 20,
+		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
+		return -EINVAL;
+	}
+#endif
+	if (!numa_meminfo_cover_memory(mi))
+		return -EINVAL;
+
+	/* Finally register nodes. */
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = PFN_PHYS(max_pfn);
+		u64 end = 0;
+
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
+				continue;
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
+		}
+
+		if (start < end)
+			setup_node_data(nid, start, end);
+	}
+
+	/* Dump memblock with node info and return. */
+	memblock_dump_all();
+	return 0;
+}
+
+static int __init numa_init(int (*init_func)(void))
+{
+	int ret, i;
+
+	nodes_clear(node_possible_map);
+	nodes_clear(node_online_map);
+
+	ret = init_func();
+	if (ret < 0)
+		return ret;
+
+	ret = numa_register_memblks(&numa_meminfo);
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < nr_cpu_ids; i++)
+		numa_clear_node(i);
+
+	setup_node_to_cpumask_map();
+	return 0;
+}
+
+/**
+ * dummy_numa_init - Fallback dummy NUMA init
+ *
+ * Used if there's no underlying NUMA architecture, NUMA initialization
+ * fails, or NUMA is disabled on the command line.
+ *
+ * Must online at least one node and add memory blocks that cover all
+ * allowed memory.  This function must not fail.
+ */
+static int __init dummy_numa_init(void)
+{
+	pr_info("%s\n",
+	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
+	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
+	       0LLU, PFN_PHYS(max_pfn) - 1);
+
+	node_set(0, numa_nodes_parsed);
+	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
+
+	return 0;
+}
+
+/**
+ * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
+ */
+int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
+				     int depth, void *data)
+{
+	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+	const __be32 *reg, *endp, *nid_prop;
+	int l, nid;
+
+	/* We are scanning "memory" nodes only */
+	if (type == NULL) {
+		/*
+		 * The longtrail doesn't have a device_type on the
+		 * /memory node, so look for the node called /memory@0.
+		 */
+		if (depth != 1 || strcmp(uname, "memory@0") != 0)
+			return 0;
+	} else if (strcmp(type, "memory") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "reg", &l);
+	if (reg == NULL)
+		return 0;
+
+	endp = reg + (l / sizeof(__be32));
+	nid_prop = of_get_flat_dt_prop(node, "nid", &l);
+
+	if (nid_prop == NULL)
+		return -1;
+
+	nid = dt_mem_next_cell(OF_ROOT_NODE_ADDR_CELLS_DEFAULT, &nid_prop);
+
+	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		u64 base, size;
+
+		base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+		size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+		if (size == 0)
+			continue;
+		pr_debug("numa: nid %d , base %llx , size %llx\n", nid,
+				(unsigned long long)base,
+				(unsigned long long)size);
+		numa_add_memblk(nid, base, size);
+	}
+	return 0;
+}
+
+/* Memory was already added by early_init_dt_scan_memory; map nids here. */
+static inline int __init arm64_dt_numa_init(void)
+{
+	of_scan_flat_dt(early_init_dt_scan_numa_map, NULL);
+	return 0;
+}
+
+/**
+ * arm64_numa_init - Initialize NUMA
+ *
+ * Try each configured NUMA initialization method until one succeeds.  The
+ * last fallback is dummy single node config encompassing whole memory and
+ * never fails.
+ */
+void __init arm64_numa_init(void)
+{
+	if (!numa_off) {
+#ifdef CONFIG_ARM64_DT_NUMA
+		if (!numa_init(arm64_dt_numa_init))
+			return;
+#endif
+	}
+
+	numa_init(dummy_numa_init);
+}
-- 
1.8.1.4

* [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
@ 2014-09-25  9:03     ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-09-25  9:03 UTC (permalink / raw)
  To: linux-arm-kernel

Add NUMA support for arm64 based platforms.
This version creates the NUMA mapping by parsing the device tree.
The cpu to node id mapping is derived from cluster_id as defined in cpu-map.
The memory to node id mapping is derived from the nid property of each
memory node.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com>
---
 arch/arm64/Kconfig              |  33 +++
 arch/arm64/include/asm/mmzone.h |  32 +++
 arch/arm64/include/asm/numa.h   |  41 ++++
 arch/arm64/kernel/setup.c       |   8 +
 arch/arm64/kernel/smp.c         |   2 +
 arch/arm64/mm/Makefile          |   1 +
 arch/arm64/mm/init.c            |  33 ++-
 arch/arm64/mm/numa.c            | 471 ++++++++++++++++++++++++++++++++++++++++
 8 files changed, 617 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/mm/numa.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a409105..415ee53 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -142,6 +142,7 @@ config ARCH_THUNDER
 	select NET_VENDOR_CAVIUM
 	select SATA_AHCI
 	select SATA_AHCI_PLATFORM
+	select HAVE_MEMBLOCK_NODE_MAP if NUMA
 
 config ARCH_VEXPRESS
 	bool "ARMv8 software model (Versatile Express)"
@@ -309,6 +310,38 @@ config HOTPLUG_CPU
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
 
+# Common NUMA Features
+config NUMA
+	bool "NUMA Memory Allocation and Scheduler Support"
+	depends on SMP
+	---help---
+	  Enable NUMA (Non Uniform Memory Access) support.
+
+	  The kernel will try to allocate memory used by a CPU on the
+	  local memory controller of the CPU and add some more
+	  NUMA awareness to the kernel.
+
+config ARM64_DT_NUMA
+	def_bool y
+	prompt "DT NUMA detection"
+	depends on ARM64 && NUMA && DTC
+	---help---
+	  Enable NUMA detection based on the device tree (DT).
+
+config NODES_SHIFT
+	int "Maximum NUMA Nodes (as a power of 2)"
+	range 1 10
+	default "2"
+	depends on NEED_MULTIPLE_NODES
+	---help---
+	  Specify the maximum number of NUMA Nodes available on the target
+	  system.  Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
+
 source kernel/Kconfig.preempt
 
 config HZ
diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
new file mode 100644
index 0000000..d27ee66
--- /dev/null
+++ b/arch/arm64/include/asm/mmzone.h
@@ -0,0 +1,32 @@
+#ifndef __ASM_ARM64_MMZONE_H_
+#define __ASM_ARM64_MMZONE_H_
+
+#ifdef CONFIG_NUMA
+
+#include <linux/mmdebug.h>
+#include <asm/smp.h>
+#include <linux/types.h>
+#include <asm/numa.h>
+
+extern struct pglist_data *node_data[];
+
+#define NODE_DATA(nid)		(node_data[nid])
+
+
+struct numa_memblk {
+	u64			start;
+	u64			end;
+	int			nid;
+};
+
+struct numa_meminfo {
+	int			nr_blks;
+	struct numa_memblk	blk[NR_NODE_MEMBLKS];
+};
+
+void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi);
+int __init numa_cleanup_meminfo(struct numa_meminfo *mi);
+void __init numa_reset_distance(void);
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_ARM64_MMZONE_H_ */
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
new file mode 100644
index 0000000..46d53fd
--- /dev/null
+++ b/arch/arm64/include/asm/numa.h
@@ -0,0 +1,41 @@
+#ifndef _ASM_ARM64_NUMA_H
+#define _ASM_ARM64_NUMA_H
+
+#include <linux/nodemask.h>
+#include <asm/topology.h>
+
+#ifdef CONFIG_NUMA
+
+#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)
+#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
+
+/*
+ * Too small node sizes may confuse the VM badly. Usually they
+ * result from BIOS bugs. So don't recognize nodes as standalone
+ * NUMA entities that have less than this amount of RAM listed:
+ */
+#define NODE_MIN_SIZE (4*1024*1024)
+
+#define parent_node(node)	(node)
+
+/* dummy definitions for pci functions */
+#define pcibus_to_node(node)	0
+#define cpumask_of_pcibus(bus)	0
+
+const struct cpumask *cpumask_of_node(int node);
+/* Mappings between node number and cpus on that node. */
+extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+
+void __init arm64_numa_init(void);
+int __init numa_add_memblk(u32 nid, u64 base, u64 size);
+void numa_store_cpu_info(int cpu);
+void numa_set_node(int cpu, int node);
+void numa_clear_node(int cpu);
+void numa_add_cpu(int cpu);
+void numa_remove_cpu(int cpu);
+#else	/* CONFIG_NUMA */
+/* Empty stubs so callers need no #ifdef when NUMA is disabled. */
+static inline void numa_store_cpu_info(int cpu)	{ }
+static inline void arm64_numa_init(void)		{ }
+#endif	/* CONFIG_NUMA */
+#endif	/* _ASM_ARM64_NUMA_H */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index edb146d..436b78d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -424,6 +424,9 @@ static int __init topology_init(void)
 {
 	int i;
 
+	for_each_online_node(i)
+		register_one_node(i);
+
 	for_each_possible_cpu(i) {
 		struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
 		cpu->hotpluggable = 1;
@@ -460,7 +463,12 @@ static int c_show(struct seq_file *m, void *v)
 		 * "processor".  Give glibc what it expects.
 		 */
 #ifdef CONFIG_SMP
+	if (IS_ENABLED(CONFIG_NUMA)) {
+		seq_printf(m, "processor\t: %d", i);
+		seq_printf(m, " [nid: %d]\n", cpu_to_node(i));
+	} else {
 		seq_printf(m, "processor\t: %d\n", i);
+	}
 #endif
 	}
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 4743397..60120db 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -50,6 +50,7 @@
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
 #include <asm/ptrace.h>
+#include <asm/numa.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
@@ -123,6 +124,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 static void smp_store_cpu_info(unsigned int cpuid)
 {
 	store_cpu_topology(cpuid);
+	numa_store_cpu_info(cpuid);
 }
 
 /*
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 3ecb56c..4dda3d0 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -3,3 +3,4 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   ioremap.o mmap.o pgd.o mmu.o \
 				   context.o proc.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
+obj-$(CONFIG_NUMA)		+= numa.o
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 271e654..4b2bbb4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -39,6 +39,7 @@
 #include <asm/setup.h>
 #include <asm/sizes.h>
 #include <asm/tlb.h>
+#include <asm/numa.h>
 
 #include "mm.h"
 
@@ -73,6 +74,20 @@ static phys_addr_t max_zone_dma_phys(void)
 	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
 }
 
+#ifdef CONFIG_NUMA
+static void __init zone_sizes_init(unsigned long min, unsigned long max)
+{
+	unsigned long max_zone_pfns[MAX_NR_ZONES];
+
+	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
+	if (IS_ENABLED(CONFIG_ZONE_DMA))
+		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
+	max_zone_pfns[ZONE_NORMAL] = max;
+
+	free_area_init_nodes(max_zone_pfns);
+}
+
+#else
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
 	struct memblock_region *reg;
@@ -111,6 +126,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
 
 	free_area_init_node(0, zone_size, min, zhole_size);
 }
+#endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
 int pfn_valid(unsigned long pfn)
@@ -129,9 +145,16 @@ static void arm64_memory_present(void)
 {
 	struct memblock_region *reg;
 
-	for_each_memblock(memory, reg)
+	for_each_memblock(memory, reg) {
+#ifdef CONFIG_NUMA
+		memory_present(reg->nid,
+				memblock_region_memory_base_pfn(reg),
+				memblock_region_memory_end_pfn(reg));
+#else
 		memory_present(0, memblock_region_memory_base_pfn(reg),
 			       memblock_region_memory_end_pfn(reg));
+#endif
+	}
 }
 #endif
 
@@ -168,6 +191,11 @@ void __init bootmem_init(void)
 	min = PFN_UP(memblock_start_of_DRAM());
 	max = PFN_DOWN(memblock_end_of_DRAM());
 
+	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
+	max_pfn = max_low_pfn = max;
+
+	if (IS_ENABLED(CONFIG_NUMA))
+		arm64_numa_init();
 	/*
 	 * Sparsemem tries to allocate bootmem in memory_present(), so must be
 	 * done after the fixed reservations.
@@ -176,9 +204,6 @@ void __init bootmem_init(void)
 
 	sparse_init();
 	zone_sizes_init(min, max);
-
-	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
-	max_pfn = max_low_pfn = max;
 }
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
new file mode 100644
index 0000000..a5f4555
--- /dev/null
+++ b/arch/arm64/mm/numa.c
@@ -0,0 +1,469 @@
+/*
+ * NUMA support, based on the x86 implementation.
+ *
+ * Copyright (C) 2014 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/memblock.h>
+#include <linux/mmzone.h>
+#include <linux/ctype.h>
+#include <linux/module.h>
+#include <linux/nodemask.h>
+#include <linux/sched.h>
+#include <linux/topology.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+
+int __initdata numa_off;
+nodemask_t numa_nodes_parsed __initdata;
+
+struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_data);
+
+static struct numa_meminfo numa_meminfo;
+
+static __init int numa_setup(char *opt)
+{
+	if (!opt)
+		return -EINVAL;
+	if (!strncmp(opt, "off", 3))
+		numa_off = 1;
+	return 0;
+}
+early_param("numa", numa_setup);
+
+cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+EXPORT_SYMBOL(node_to_cpumask_map);
+
+/*
+ * Returns a pointer to the bitmask of CPUs on Node 'node'.
+ */
+const struct cpumask *cpumask_of_node(int node)
+{
+	if (node >= nr_node_ids) {
+		pr_warn("cpumask_of_node(%d): node >= nr_node_ids(%d)\n",
+			node, nr_node_ids);
+		dump_stack();
+		return cpu_none_mask;
+	}
+	if (node_to_cpumask_map[node] == NULL) {
+		pr_warn("cpumask_of_node(%d): no node_to_cpumask_map!\n",
+			node);
+		dump_stack();
+		return cpu_online_mask;
+	}
+	return node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(cpumask_of_node);
+
+int cpu_to_node_map[NR_CPUS];
+EXPORT_SYMBOL(cpu_to_node_map);
+
+void numa_clear_node(int cpu)
+{
+	cpu_to_node_map[cpu] = NUMA_NO_NODE;
+}
+
+/*
+ * Allocate node_to_cpumask_map based on number of available nodes
+ * Requires node_possible_map to be valid.
+ *
+ * Note: cpumask_of_node() is not valid until after this is done.
+ * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
+ */
+void __init setup_node_to_cpumask_map(void)
+{
+	unsigned int node;
+
+	/* setup nr_node_ids if not done yet */
+	if (nr_node_ids == MAX_NUMNODES)
+		setup_nr_node_ids();
+
+	/* allocate the map */
+	for (node = 0; node < nr_node_ids; node++)
+		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
+
+	/* cpumask_of_node() will now work */
+	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
+}
+
+/*
+ *  Set the cpu to node and mem mapping
+ */
+void numa_store_cpu_info(int cpu)
+{
+	cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
+	cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
+	set_numa_node(cpu_to_node_map[cpu]);
+	set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
+}
+
+/**
+ * numa_add_memblk_to - Add one numa_memblk to a numa_meminfo
+ */
+
+static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
+				     struct numa_meminfo *mi)
+{
+	/* ignore zero length blks */
+	if (start == end)
+		return 0;
+
+	/* whine about and ignore invalid blks */
+	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
+		pr_warn("numa: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
+				nid, start, end - 1);
+		return 0;
+	}
+
+	if (mi->nr_blks >= NR_NODE_MEMBLKS) {
+		pr_err("numa: too many memblk ranges\n");
+		return -EINVAL;
+	}
+
+	pr_info("numa: Adding memblock %d [0x%llx - 0x%llx] on node %d\n",
+			mi->nr_blks, start, end, nid);
+	mi->blk[mi->nr_blks].start = start;
+	mi->blk[mi->nr_blks].end = end;
+	mi->blk[mi->nr_blks].nid = nid;
+	mi->nr_blks++;
+	return 0;
+}
+
+/**
+ * numa_add_memblk - Add one numa_memblk to numa_meminfo
+ * @nid: NUMA node ID of the new memblk
+ * @base: Base address of the new memblk
+ * @size: Size of the new memblk, in bytes
+ *
+ * Add a new memblk to the default numa_meminfo.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+#define MAX_PHYS_ADDR	((phys_addr_t)~0)
+
+int __init numa_add_memblk(u32 nid, u64 base, u64 size)
+{
+	const u64 phys_offset = __pa(PAGE_OFFSET);
+
+	base &= PAGE_MASK;
+	size &= PAGE_MASK;
+
+	if (base > MAX_PHYS_ADDR) {
+		pr_warn("numa: Ignoring memory block 0x%llx - 0x%llx\n",
+				base, base + size);
+		return -ENOMEM;
+	}
+
+	if (base + size > MAX_PHYS_ADDR) {
+		pr_warn("numa: Ignoring memory range 0x%lx - 0x%llx\n",
+				ULONG_MAX, base + size);
+		size = MAX_PHYS_ADDR - base;
+	}
+
+	if (base + size < phys_offset) {
+		pr_warn("numa: Ignoring memory block 0x%llx - 0x%llx\n",
+			   base, base + size);
+		return -ENOMEM;
+	}
+	if (base < phys_offset) {
+		pr_warn("numa: Ignoring memory range 0x%llx - 0x%llx\n",
+			   base, phys_offset);
+		size -= phys_offset - base;
+		base = phys_offset;
+	}
+
+	node_set(nid, numa_nodes_parsed);
+	return numa_add_memblk_to(nid, base, base+size, &numa_meminfo);
+}
+EXPORT_SYMBOL(numa_add_memblk);
+
+/* Initialize NODE_DATA for a node on the local memory */
+static void __init setup_node_data(int nid, u64 start, u64 end)
+{
+	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+	u64 nd_pa;
+	void *nd;
+	int tnid;
+
+	/*
+	 * Don't confuse VM with a node that doesn't have the
+	 * minimum amount of memory:
+	 */
+	if (end && (end - start) < NODE_MIN_SIZE)
+		return;
+
+	start = roundup(start, ZONE_ALIGN);
+
+	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
+	       nid, start, end - 1);
+
+	/*
+	 * Allocate node data.  Try node-local memory and then any node.
+	 */
+	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+	if (!nd_pa) {
+		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
+					      MEMBLOCK_ALLOC_ACCESSIBLE);
+		if (!nd_pa) {
+			pr_err("Cannot find %zu bytes in node %d\n",
+			       nd_size, nid);
+			return;
+		}
+	}
+	nd = __va(nd_pa);
+
+	/* report and initialize */
+	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
+	       nd_pa, nd_pa + nd_size - 1);
+	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
+	if (tnid != nid)
+		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
+
+	node_data[nid] = nd;
+	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
+	NODE_DATA(nid)->node_id = nid;
+	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
+	NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
+
+	node_set_online(nid);
+}
+
+/*
+ * Set nodes, which have memory in @mi, in *@nodemask.
+ */
+static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
+					      const struct numa_meminfo *mi)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
+		if (mi->blk[i].start != mi->blk[i].end &&
+		    mi->blk[i].nid != NUMA_NO_NODE)
+			node_set(mi->blk[i].nid, *nodemask);
+}
+
+/*
+ * Sanity check to catch more bad NUMA configurations (they are amazingly
+ * common).  Make sure the nodes cover all memory.
+ */
+static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
+{
+	u64 numaram, totalram;
+	int i;
+
+	numaram = 0;
+	for (i = 0; i < mi->nr_blks; i++) {
+		u64 s = mi->blk[i].start >> PAGE_SHIFT;
+		u64 e = mi->blk[i].end >> PAGE_SHIFT;
+
+		numaram += e - s;
+		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+		if ((s64)numaram < 0)
+			numaram = 0;
+	}
+
+	totalram = max_pfn - absent_pages_in_range(0, max_pfn);
+
+	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
+	if ((s64)(totalram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
+		pr_err("numa: nodes only cover %lluMB of your %lluMB Total RAM. Not used.\n",
+		       (numaram << PAGE_SHIFT) >> 20,
+		       (totalram << PAGE_SHIFT) >> 20);
+		return false;
+	}
+	return true;
+}
+
+static int __init numa_register_memblks(struct numa_meminfo *mi)
+{
+	unsigned long uninitialized_var(pfn_align);
+	int i, nid;
+
+	/* Account for nodes with cpus and no memory */
+	node_possible_map = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&node_possible_map, mi);
+	if (WARN_ON(nodes_empty(node_possible_map)))
+		return -EINVAL;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+
+		memblock_set_node(mb->start, mb->end - mb->start,
+				  &memblock.memory, mb->nid);
+	}
+
+	/*
+	 * If sections array is gonna be used for pfn -> nid mapping, check
+	 * whether its granularity is fine enough.
+	 */
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+	pfn_align = node_map_pfn_alignment();
+	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
+		pr_warn("Node alignment %lluMB < min %lluMB, rejecting NUMA config\n",
+		       PFN_PHYS(pfn_align) >> 20,
+		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
+		return -EINVAL;
+	}
+#endif
+	if (!numa_meminfo_cover_memory(mi))
+		return -EINVAL;
+
+	/* Finally register nodes. */
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = PFN_PHYS(max_pfn);
+		u64 end = 0;
+
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
+				continue;
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
+		}
+
+		if (start < end)
+			setup_node_data(nid, start, end);
+	}
+
+	/* Dump memblock with node info and return. */
+	memblock_dump_all();
+	return 0;
+}
+
+static int __init numa_init(int (*init_func)(void))
+{
+	int ret, i;
+
+	nodes_clear(node_possible_map);
+	nodes_clear(node_online_map);
+
+	ret = init_func();
+	if (ret < 0)
+		return ret;
+
+	ret = numa_register_memblks(&numa_meminfo);
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < nr_cpu_ids; i++)
+		numa_clear_node(i);
+
+	setup_node_to_cpumask_map();
+	return 0;
+}
+
+/**
+ * dummy_numa_init - Fallback dummy NUMA init
+ *
+ * Used if there's no underlying NUMA architecture, NUMA initialization
+ * fails, or NUMA is disabled on the command line.
+ *
+ * Must online at least one node and add memory blocks that cover all
+ * allowed memory.  This function must not fail.
+ */
+static int __init dummy_numa_init(void)
+{
+	pr_info("%s\n",
+	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
+	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
+	       0LLU, PFN_PHYS(max_pfn) - 1);
+
+	node_set(0, numa_nodes_parsed);
+	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
+
+	return 0;
+}
+
+/**
+ * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
+ */
+int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
+				     int depth, void *data)
+{
+	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+	const __be32 *reg, *endp, *nid_prop;
+	int l, nid;
+
+	/* We are scanning "memory" nodes only */
+	if (type == NULL) {
+		/*
+		 * The longtrail doesn't have a device_type on the
+		 * /memory node, so look for the node called /memory at 0.
+		 */
+		if (depth != 1 || strcmp(uname, "memory@0") != 0)
+			return 0;
+	} else if (strcmp(type, "memory") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "reg", &l);
+	if (reg == NULL)
+		return 0;
+
+	endp = reg + (l / sizeof(__be32));
+	nid_prop = of_get_flat_dt_prop(node, "nid", &l);
+
+	if (nid_prop == NULL)
+		return -1;
+
+	nid = dt_mem_next_cell(OF_ROOT_NODE_ADDR_CELLS_DEFAULT, &nid_prop);
+
+	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		u64 base, size;
+
+		base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+		size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+		if (size == 0)
+			continue;
+		pr_debug("numa: nid %d , base %llx , size %llx\n", nid,
+				(unsigned long long)base,
+				(unsigned long long)size);
+		numa_add_memblk(nid, base, size);
+	}
+	return 0;
+}
+
+/* DT node mapping is already done in early_init_dt_scan_memory() */
+static inline int __init arm64_dt_numa_init(void)
+{
+	of_scan_flat_dt(early_init_dt_scan_numa_map, NULL);
+	return 0;
+}
+
+/**
+ * arm64_numa_init - Initialize NUMA
+ *
+ * Try each configured NUMA initialization method until one succeeds.  The
+ * last fallback is a dummy single-node config encompassing all of memory,
+ * which never fails.
+ */
+void __init arm64_numa_init(void)
+{
+	if (!numa_off) {
+#ifdef CONFIG_ARM64_DT_NUMA
+		if (!numa_init(arm64_dt_numa_init))
+			return;
+#endif
+	}
+
+	numa_init(dummy_numa_init);
+}
-- 
1.8.1.4


* [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-09-25  9:03 ` Ganapatrao Kulkarni
@ 2014-09-25  9:04     ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-09-25  9:04 UTC (permalink / raw)
  To: catalin.marinas-5wv7dgnIgG8, will.deacon-5wv7dgnIgG8,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

Add NUMA support for arm64 based platforms.
This version creates the NUMA mapping by parsing the device tree.
The CPU to node ID mapping is derived from cluster_id as defined in cpu-map.
The memory to node ID mapping is derived from the nid property of the
memory node.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
---
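As a rough illustration of the memory side of this mapping, the userspace
sketch below mimics how blocks tagged with a node ID are recorded and how a
per-node span is then derived from them, in the spirit of
numa_add_memblk_to() and the registration loop in numa_register_memblks().
It is not the kernel code: the sk_ prefixed names and SK_MAX_NODES are
hypothetical stand-ins, and page alignment, PHYS_OFFSET clamping and the
coverage check are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_NODES	4			/* hypothetical stand-in for MAX_NUMNODES */
#define SK_MAX_BLKS	(SK_MAX_NODES * 2)	/* same ratio as NR_NODE_MEMBLKS */

struct sk_memblk {
	uint64_t start;
	uint64_t end;	/* exclusive */
	int nid;
};

struct sk_meminfo {
	int nr_blks;
	struct sk_memblk blk[SK_MAX_BLKS];
};

/*
 * Record [base, base + size) for node nid.
 * Returns 0 on success or when the block is ignored, -1 when the table is full.
 */
static int sk_add_memblk(struct sk_meminfo *mi, int nid,
			 uint64_t base, uint64_t size)
{
	if (size == 0)
		return 0;	/* zero-length blocks are ignored */
	if (nid < 0 || nid >= SK_MAX_NODES || base + size < base)
		return 0;	/* invalid node or wrapping range is ignored */
	if (mi->nr_blks >= SK_MAX_BLKS)
		return -1;	/* too many ranges */

	mi->blk[mi->nr_blks].start = base;
	mi->blk[mi->nr_blks].end = base + size;
	mi->blk[mi->nr_blks].nid = nid;
	mi->nr_blks++;
	return 0;
}

/*
 * Compute the span [*start, *end) covering every block of node nid.
 * Returns 1 if the node has memory, 0 otherwise.
 */
static int sk_node_span(const struct sk_meminfo *mi, int nid,
			uint64_t *start, uint64_t *end)
{
	uint64_t s = UINT64_MAX, e = 0;

	assert(mi != NULL && start != NULL && end != NULL);

	for (int i = 0; i < mi->nr_blks; i++) {
		if (mi->blk[i].nid != nid)
			continue;
		if (mi->blk[i].start < s)
			s = mi->blk[i].start;
		if (mi->blk[i].end > e)
			e = mi->blk[i].end;
	}
	if (s >= e)
		return 0;	/* node has no memory */

	*start = s;
	*end = e;
	return 1;
}
```

For instance, adding (base 0x80000000, size 0x40000000) and
(base 0xC0000000, size 0x10000000) to node 0 gives node 0 the span
[0x80000000, 0xD0000000), which is the range that would be handed to the
per-node setup step.
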
 arch/arm64/Kconfig              |  33 +++
 arch/arm64/include/asm/mmzone.h |  32 +++
 arch/arm64/include/asm/numa.h   |  41 ++++
 arch/arm64/kernel/setup.c       |   8 +
 arch/arm64/kernel/smp.c         |   2 +
 arch/arm64/mm/Makefile          |   1 +
 arch/arm64/mm/init.c            |  33 ++-
 arch/arm64/mm/numa.c            | 471 ++++++++++++++++++++++++++++++++++++++++
 8 files changed, 617 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/mm/numa.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a409105..415ee53 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -142,6 +142,7 @@ config ARCH_THUNDER
 	select NET_VENDOR_CAVIUM
 	select SATA_AHCI
 	select SATA_AHCI_PLATFORM
+	select HAVE_MEMBLOCK_NODE_MAP if NUMA
 
 config ARCH_VEXPRESS
 	bool "ARMv8 software model (Versatile Express)"
@@ -309,6 +310,38 @@ config HOTPLUG_CPU
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
 
+# Common NUMA Features
+config NUMA
+	bool "NUMA Memory Allocation and Scheduler Support"
+	depends on SMP
+	---help---
+	  Enable NUMA (Non Uniform Memory Access) support.
+
+	  The kernel will try to allocate memory used by a CPU on the
+	  local memory controller of the CPU and add some more
+	  NUMA awareness to the kernel.
+
+config ARM64_DT_NUMA
+	def_bool y
+	prompt "DT NUMA detection"
+	depends on ARM64 && NUMA && DTC
+	---help---
+	  Enable DT based NUMA.
+
+config NODES_SHIFT
+	int "Maximum NUMA Nodes (as a power of 2)"
+	range 1 10
+	default "2"
+	depends on NEED_MULTIPLE_NODES
+	---help---
+	  Specify the maximum number of NUMA Nodes available on the target
+	  system.  Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
+
 source kernel/Kconfig.preempt
 
 config HZ
diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
new file mode 100644
index 0000000..d27ee66
--- /dev/null
+++ b/arch/arm64/include/asm/mmzone.h
@@ -0,0 +1,32 @@
+#ifndef __ASM_ARM64_MMZONE_H_
+#define __ASM_ARM64_MMZONE_H_
+
+#ifdef CONFIG_NUMA
+
+#include <linux/mmdebug.h>
+#include <asm/smp.h>
+#include <linux/types.h>
+#include <asm/numa.h>
+
+extern struct pglist_data *node_data[];
+
+#define NODE_DATA(nid)		(node_data[nid])
+
+
+struct numa_memblk {
+	u64			start;
+	u64			end;
+	int			nid;
+};
+
+struct numa_meminfo {
+	int			nr_blks;
+	struct numa_memblk	blk[NR_NODE_MEMBLKS];
+};
+
+void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi);
+int __init numa_cleanup_meminfo(struct numa_meminfo *mi);
+void __init numa_reset_distance(void);
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_ARM64_MMZONE_H_ */
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
new file mode 100644
index 0000000..46d53fd
--- /dev/null
+++ b/arch/arm64/include/asm/numa.h
@@ -0,0 +1,41 @@
+#ifndef _ASM_ARM64_NUMA_H
+#define _ASM_ARM64_NUMA_H
+
+#include <linux/nodemask.h>
+#include <asm/topology.h>
+
+#ifdef CONFIG_NUMA
+
+#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)
+#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
+
+/*
+ * Too small node sizes may confuse the VM badly. Usually they
+ * result from BIOS bugs. So dont recognize nodes as standalone
+ * NUMA entities that have less than this amount of RAM listed:
+ */
+#define NODE_MIN_SIZE (4*1024*1024)
+
+#define parent_node(node)	(node)
+
+/* dummy definitions for pci functions */
+#define pcibus_to_node(node)	0
+#define cpumask_of_pcibus(bus)	0
+
+const struct cpumask *cpumask_of_node(int node);
+/* Mappings between node number and cpus on that node. */
+extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+
+void __init arm64_numa_init(void);
+int __init numa_add_memblk(u32 nid, u64 base, u64 size);
+void numa_store_cpu_info(int cpu);
+void numa_set_node(int cpu, int node);
+void numa_clear_node(int cpu);
+void numa_add_cpu(int cpu);
+void numa_remove_cpu(int cpu);
+#else	/* CONFIG_NUMA */
+static inline void arm64_numa_init(void);
+static inline void numa_store_cpu_info(int cpu)		{ }
+static inline void arm64_numa_init(void)		{ }
+#endif	/* CONFIG_NUMA */
+#endif	/* _ASM_ARM64_NUMA_H */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index edb146d..436b78d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -424,6 +424,9 @@ static int __init topology_init(void)
 {
 	int i;
 
+	for_each_online_node(i)
+		register_one_node(i);
+
 	for_each_possible_cpu(i) {
 		struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
 		cpu->hotpluggable = 1;
@@ -460,7 +463,12 @@ static int c_show(struct seq_file *m, void *v)
 		 * "processor".  Give glibc what it expects.
 		 */
 #ifdef CONFIG_SMP
+	if (IS_ENABLED(CONFIG_NUMA)) {
+		seq_printf(m, "processor\t: %d", i);
+		seq_printf(m, " [nid: %d]\n", cpu_to_node(i));
+	} else {
 		seq_printf(m, "processor\t: %d\n", i);
+	}
 #endif
 	}
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 4743397..60120db 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -50,6 +50,7 @@
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
 #include <asm/ptrace.h>
+#include <asm/numa.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
@@ -123,6 +124,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 static void smp_store_cpu_info(unsigned int cpuid)
 {
 	store_cpu_topology(cpuid);
+	numa_store_cpu_info(cpuid);
 }
 
 /*
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 3ecb56c..4dda3d0 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -3,3 +3,4 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   ioremap.o mmap.o pgd.o mmu.o \
 				   context.o proc.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
+obj-$(CONFIG_NUMA)		+= numa.o
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 271e654..4b2bbb4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -39,6 +39,7 @@
 #include <asm/setup.h>
 #include <asm/sizes.h>
 #include <asm/tlb.h>
+#include <asm/numa.h>
 
 #include "mm.h"
 
@@ -73,6 +74,20 @@ static phys_addr_t max_zone_dma_phys(void)
 	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
 }
 
+#ifdef CONFIG_NUMA
+static void __init zone_sizes_init(unsigned long min, unsigned long max)
+{
+	unsigned long max_zone_pfns[MAX_NR_ZONES];
+
+	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
+	if (IS_ENABLED(CONFIG_ZONE_DMA))
+		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
+	max_zone_pfns[ZONE_NORMAL] = max;
+
+	free_area_init_nodes(max_zone_pfns);
+}
+
+#else
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
 	struct memblock_region *reg;
@@ -111,6 +126,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
 
 	free_area_init_node(0, zone_size, min, zhole_size);
 }
+#endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
 int pfn_valid(unsigned long pfn)
@@ -129,9 +145,16 @@ static void arm64_memory_present(void)
 {
 	struct memblock_region *reg;
 
-	for_each_memblock(memory, reg)
+	for_each_memblock(memory, reg) {
+#ifdef CONFIG_NUMA
+		memory_present(reg->nid,
+				memblock_region_memory_base_pfn(reg),
+				memblock_region_memory_end_pfn(reg));
+#else
 		memory_present(0, memblock_region_memory_base_pfn(reg),
 			       memblock_region_memory_end_pfn(reg));
+#endif
+	}
 }
 #endif
 
@@ -168,6 +191,11 @@ void __init bootmem_init(void)
 	min = PFN_UP(memblock_start_of_DRAM());
 	max = PFN_DOWN(memblock_end_of_DRAM());
 
+	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
+	max_pfn = max_low_pfn = max;
+
+	if (IS_ENABLED(CONFIG_NUMA))
+		arm64_numa_init();
 	/*
 	 * Sparsemem tries to allocate bootmem in memory_present(), so must be
 	 * done after the fixed reservations.
@@ -176,9 +204,6 @@ void __init bootmem_init(void)
 
 	sparse_init();
 	zone_sizes_init(min, max);
-
-	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
-	max_pfn = max_low_pfn = max;
 }
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
new file mode 100644
index 0000000..a5f4555
--- /dev/null
+++ b/arch/arm64/mm/numa.c
@@ -0,0 +1,471 @@
+/*
+ * NUMA support, based on the x86 implementation.
+ *
+ * Copyright (C) 2014 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <gkulkarni-YGCgFSpz5w/QT0dZR+AlfA@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/memblock.h>
+#include <linux/mmzone.h>
+#include <linux/ctype.h>
+#include <linux/module.h>
+#include <linux/nodemask.h>
+#include <linux/sched.h>
+#include <linux/topology.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+
+int __initdata numa_off;
+nodemask_t numa_nodes_parsed __initdata;
+
+struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_data);
+
+static struct numa_meminfo numa_meminfo;
+
+static __init int numa_setup(char *opt)
+{
+	if (!opt)
+		return -EINVAL;
+	if (!strncmp(opt, "off", 3))
+		numa_off = 1;
+	return 0;
+}
+early_param("numa", numa_setup);
+
+cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+EXPORT_SYMBOL(node_to_cpumask_map);
+
+/*
+ * Returns a pointer to the bitmask of CPUs on Node 'node'.
+ */
+const struct cpumask *cpumask_of_node(int node)
+{
+	if (node >= nr_node_ids) {
+		printk(KERN_WARNING
+			"cpumask_of_node(%d): node >= nr_node_ids(%d)\n",
+			node, nr_node_ids);
+		dump_stack();
+		return cpu_none_mask;
+	}
+	if (node_to_cpumask_map[node] == NULL) {
+		printk(KERN_WARNING
+			"cpumask_of_node(%d): no node_to_cpumask_map!\n",
+			node);
+		dump_stack();
+		return cpu_online_mask;
+	}
+	return node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(cpumask_of_node);
+
+int cpu_to_node_map[NR_CPUS];
+EXPORT_SYMBOL(cpu_to_node_map);
+
+void numa_clear_node(int cpu)
+{
+	cpu_to_node_map[cpu] = NUMA_NO_NODE;
+}
+
+/*
+ * Allocate node_to_cpumask_map based on number of available nodes
+ * Requires node_possible_map to be valid.
+ *
+ * Note: cpumask_of_node() is not valid until after this is done.
+ * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
+ */
+void __init setup_node_to_cpumask_map(void)
+{
+	unsigned int node;
+
+	/* setup nr_node_ids if not done yet */
+	if (nr_node_ids == MAX_NUMNODES)
+		setup_nr_node_ids();
+
+	/* allocate the map */
+	for (node = 0; node < nr_node_ids; node++)
+		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
+
+	/* cpumask_of_node() will now work */
+	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
+}
+
+/*
+ *  Set the cpu to node and mem mapping
+ */
+void numa_store_cpu_info(int cpu)
+{
+	cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
+	cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
+	set_numa_node(cpu_to_node_map[cpu]);
+	set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
+}
+
+/**
+ * numa_add_memblk_to - Add one numa_memblk to a numa_meminfo
+ */
+
+static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
+				     struct numa_meminfo *mi)
+{
+	/* ignore zero length blks */
+	if (start == end)
+		return 0;
+
+	/* whine about and ignore invalid blks */
+	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
+		pr_warn("numa: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
+				nid, start, end - 1);
+		return 0;
+	}
+
+	if (mi->nr_blks >= NR_NODE_MEMBLKS) {
+		pr_err("numa: too many memblk ranges\n");
+		return -EINVAL;
+	}
+
+	pr_info("numa: Adding memblock %d [0x%llx - 0x%llx] on node %d\n",
+			mi->nr_blks, start, end, nid);
+	mi->blk[mi->nr_blks].start = start;
+	mi->blk[mi->nr_blks].end = end;
+	mi->blk[mi->nr_blks].nid = nid;
+	mi->nr_blks++;
+	return 0;
+}
+
+/**
+ * numa_add_memblk - Add one numa_memblk to numa_meminfo
+ * @nid: NUMA node ID of the new memblk
+ * @base: Base address of the new memblk
+ * @size: Size of the new memblk, in bytes
+ *
+ * Add a new memblk to the default numa_meminfo.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+#define MAX_PHYS_ADDR	((phys_addr_t)~0)
+
+int __init numa_add_memblk(u32 nid, u64 base, u64 size)
+{
+	const u64 phys_offset = __pa(PAGE_OFFSET);
+
+	base &= PAGE_MASK;
+	size &= PAGE_MASK;
+
+	if (base > MAX_PHYS_ADDR) {
+		pr_warn("numa: Ignoring memory block 0x%llx - 0x%llx\n",
+				base, base + size);
+		return -ENOMEM;
+	}
+
+	if (base + size > MAX_PHYS_ADDR) {
+		pr_warn("numa: Ignoring memory range 0x%lx - 0x%llx\n",
+				ULONG_MAX, base + size);
+		size = MAX_PHYS_ADDR - base;
+	}
+
+	if (base + size < phys_offset) {
+		pr_warn("numa: Ignoring memory block 0x%llx - 0x%llx\n",
+			   base, base + size);
+		return -ENOMEM;
+	}
+	if (base < phys_offset) {
+		pr_warn("numa: Ignoring memory range 0x%llx - 0x%llx\n",
+			   base, phys_offset);
+		size -= phys_offset - base;
+		base = phys_offset;
+	}
+
+	node_set(nid, numa_nodes_parsed);
+	return numa_add_memblk_to(nid, base, base+size, &numa_meminfo);
+}
+EXPORT_SYMBOL(numa_add_memblk);
+
+/* Initialize NODE_DATA for a node on the local memory */
+static void __init setup_node_data(int nid, u64 start, u64 end)
+{
+	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+	u64 nd_pa;
+	void *nd;
+	int tnid;
+
+	/*
+	 * Don't confuse VM with a node that doesn't have the
+	 * minimum amount of memory:
+	 */
+	if (end && (end - start) < NODE_MIN_SIZE)
+		return;
+
+	start = roundup(start, ZONE_ALIGN);
+
+	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
+	       nid, start, end - 1);
+
+	/*
+	 * Allocate node data.  Try node-local memory and then any node.
+	 */
+	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+	if (!nd_pa) {
+		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
+					      MEMBLOCK_ALLOC_ACCESSIBLE);
+		if (!nd_pa) {
+			pr_err("Cannot find %zu bytes in node %d\n",
+			       nd_size, nid);
+			return;
+		}
+	}
+	nd = __va(nd_pa);
+
+	/* report and initialize */
+	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
+	       nd_pa, nd_pa + nd_size - 1);
+	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
+	if (tnid != nid)
+		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
+
+	node_data[nid] = nd;
+	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
+	NODE_DATA(nid)->node_id = nid;
+	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
+	NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
+
+	node_set_online(nid);
+}
+
+/*
+ * Set nodes, which have memory in @mi, in *@nodemask.
+ */
+static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
+					      const struct numa_meminfo *mi)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
+		if (mi->blk[i].start != mi->blk[i].end &&
+		    mi->blk[i].nid != NUMA_NO_NODE)
+			node_set(mi->blk[i].nid, *nodemask);
+}
+
+/*
+ * Sanity check to catch more bad NUMA configurations (they are amazingly
+ * common).  Make sure the nodes cover all memory.
+ */
+static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
+{
+	u64 numaram, totalram;
+	int i;
+
+	numaram = 0;
+	for (i = 0; i < mi->nr_blks; i++) {
+		u64 s = mi->blk[i].start >> PAGE_SHIFT;
+		u64 e = mi->blk[i].end >> PAGE_SHIFT;
+
+		numaram += e - s;
+		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+		if ((s64)numaram < 0)
+			numaram = 0;
+	}
+
+	totalram = max_pfn - absent_pages_in_range(0, max_pfn);
+
+	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
+	if ((s64)(totalram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
+		pr_err("numa: nodes only cover %lluMB of your %lluMB Total RAM. Not used.\n",
+		       (numaram << PAGE_SHIFT) >> 20,
+		       (totalram << PAGE_SHIFT) >> 20);
+		return false;
+	}
+	return true;
+}
+
+static int __init numa_register_memblks(struct numa_meminfo *mi)
+{
+	unsigned long uninitialized_var(pfn_align);
+	int i, nid;
+
+	/* Account for nodes with cpus and no memory */
+	node_possible_map = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&node_possible_map, mi);
+	if (WARN_ON(nodes_empty(node_possible_map)))
+		return -EINVAL;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+
+		memblock_set_node(mb->start, mb->end - mb->start,
+				  &memblock.memory, mb->nid);
+	}
+
+	/*
+	 * If sections array is gonna be used for pfn -> nid mapping, check
+	 * whether its granularity is fine enough.
+	 */
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+	pfn_align = node_map_pfn_alignment();
+	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
+		pr_warn("Node alignment %lluMB < min %lluMB, rejecting NUMA config\n",
+		       PFN_PHYS(pfn_align) >> 20,
+		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
+		return -EINVAL;
+	}
+#endif
+	if (!numa_meminfo_cover_memory(mi))
+		return -EINVAL;
+
+	/* Finally register nodes. */
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = PFN_PHYS(max_pfn);
+		u64 end = 0;
+
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
+				continue;
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
+		}
+
+		if (start < end)
+			setup_node_data(nid, start, end);
+	}
+
+	/* Dump memblock with node info and return. */
+	memblock_dump_all();
+	return 0;
+}
+
+static int __init numa_init(int (*init_func)(void))
+{
+	int ret, i;
+
+	nodes_clear(node_possible_map);
+	nodes_clear(node_online_map);
+
+	ret = init_func();
+	if (ret < 0)
+		return ret;
+
+	ret = numa_register_memblks(&numa_meminfo);
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < nr_cpu_ids; i++)
+		numa_clear_node(i);
+
+	setup_node_to_cpumask_map();
+	return 0;
+}
+
+/**
+ * dummy_numa_init - Fallback dummy NUMA init
+ *
+ * Used if there's no underlying NUMA architecture, NUMA initialization
+ * fails, or NUMA is disabled on the command line.
+ *
+ * Must online at least one node and add memory blocks that cover all
+ * allowed memory.  This function must not fail.
+ */
+static int __init dummy_numa_init(void)
+{
+	pr_info("%s\n",
+	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
+	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
+	       0LLU, PFN_PHYS(max_pfn) - 1);
+
+	node_set(0, numa_nodes_parsed);
+	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
+
+	return 0;
+}
+
+/**
+ * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
+ */
+int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
+				     int depth, void *data)
+{
+	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+	const __be32 *reg, *endp, *nid_prop;
+	int l, nid;
+
+	/* We are scanning "memory" nodes only */
+	if (type == NULL) {
+		/*
+		 * The longtrail doesn't have a device_type on the
+		 * /memory node, so look for the node called /memory@0.
+		 */
+		if (depth != 1 || strcmp(uname, "memory@0") != 0)
+			return 0;
+	} else if (strcmp(type, "memory") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "reg", &l);
+	if (reg == NULL)
+		return 0;
+
+	endp = reg + (l / sizeof(__be32));
+	nid_prop = of_get_flat_dt_prop(node, "nid", &l);
+
+	if (nid_prop == NULL)
+		return -1;
+
+	nid = dt_mem_next_cell(OF_ROOT_NODE_ADDR_CELLS_DEFAULT, &nid_prop);
+
+	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		u64 base, size;
+
+		base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+		size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+		if (size == 0)
+			continue;
+		pr_debug("numa: nid %d , base %llx , size %llx\n", nid,
+				(unsigned long long)base,
+				(unsigned long long)size);
+		numa_add_memblk(nid, base, size);
+	}
+	return 0;
+}
+
+/* DT node mapping is already done in early_init_dt_scan_memory */
+static inline int __init arm64_dt_numa_init(void)
+{
+	of_scan_flat_dt(early_init_dt_scan_numa_map, NULL);
+	return 0;
+}
+
+/**
+ * arm64_numa_init - Initialize NUMA
+ *
+ * Try each configured NUMA initialization method until one succeeds.  The
+ * last fallback is a dummy single-node config encompassing all memory and
+ * never fails.
+ */
+void __init arm64_numa_init(void)
+{
+	if (!numa_off) {
+#ifdef CONFIG_ARM64_DT_NUMA
+		if (!numa_init(arm64_dt_numa_init))
+			return;
+#endif
+	}
+
+	numa_init(dummy_numa_init);
+}
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 44+ messages in thread
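
The coverage check in numa_meminfo_cover_memory() can be tried outside the
kernel. The following is only a minimal userspace sketch, assuming 4 KiB pages
and no holes inside any block. The names struct blk, PAGE_SHIFT_ASSUMED and
nodes_cover_memory() are illustrative and are not kernel symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; arm64 kernels may also use 64 KiB pages. */
#define PAGE_SHIFT_ASSUMED 12

/* Hypothetical stand-in for struct numa_memblk: [start, end) in bytes. */
struct blk {
	uint64_t start;
	uint64_t end;
	int nid;
};

/*
 * Return true if the blocks account for all of total_pages, allowing
 * less than 1 MiB of slack as the patch does.  Holes inside a block
 * are ignored here; the kernel subtracts them with
 * __absent_pages_in_range().
 */
static bool nodes_cover_memory(const struct blk *blks, size_t n,
			       uint64_t total_pages)
{
	const int64_t slack_pages = 1LL << (20 - PAGE_SHIFT_ASSUMED);
	uint64_t numa_pages = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		assert(blks[i].start <= blks[i].end);
		numa_pages += (blks[i].end >> PAGE_SHIFT_ASSUMED) -
			      (blks[i].start >> PAGE_SHIFT_ASSUMED);
	}

	/* Signed compare, so over-coverage (negative difference) passes. */
	return (int64_t)(total_pages - numa_pages) < slack_pages;
}
```

With two 2 GiB blocks and 4 GiB of RAM the check passes. If only one of the
blocks is described, 2 GiB is unaccounted for and the configuration would be
rejected, which is what sends arm64_numa_init() to the dummy single-node
fallback.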

* [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
@ 2014-09-25  9:04     ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-09-25  9:04 UTC (permalink / raw)
  To: linux-arm-kernel

Add NUMA support for arm64-based platforms.
This version builds the NUMA mapping by parsing the device tree.
The CPU-to-node mapping is derived from the cluster_id defined in cpu-map.
The memory-to-node mapping is derived from the nid property of each memory node.

Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com>
---
 arch/arm64/Kconfig              |  33 +++
 arch/arm64/include/asm/mmzone.h |  32 +++
 arch/arm64/include/asm/numa.h   |  41 ++++
 arch/arm64/kernel/setup.c       |   8 +
 arch/arm64/kernel/smp.c         |   2 +
 arch/arm64/mm/Makefile          |   1 +
 arch/arm64/mm/init.c            |  33 ++-
 arch/arm64/mm/numa.c            | 471 ++++++++++++++++++++++++++++++++++++++++
 8 files changed, 617 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm64/include/asm/mmzone.h
 create mode 100644 arch/arm64/include/asm/numa.h
 create mode 100644 arch/arm64/mm/numa.c

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a409105..415ee53 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -142,6 +142,7 @@ config ARCH_THUNDER
 	select NET_VENDOR_CAVIUM
 	select SATA_AHCI
 	select SATA_AHCI_PLATFORM
+	select HAVE_MEMBLOCK_NODE_MAP if NUMA
 
 config ARCH_VEXPRESS
 	bool "ARMv8 software model (Versatile Express)"
@@ -309,6 +310,38 @@ config HOTPLUG_CPU
 	  Say Y here to experiment with turning CPUs off and on.  CPUs
 	  can be controlled through /sys/devices/system/cpu.
 
+# Common NUMA Features
+config NUMA
+	bool "NUMA Memory Allocation and Scheduler Support"
+	depends on SMP
+	---help---
+	  Enable NUMA (Non Uniform Memory Access) support.
+
+	  The kernel will try to allocate memory used by a CPU on the
+	  local memory controller of the CPU and add some more
+	  NUMA awareness to the kernel.
+
+config ARM64_DT_NUMA
+	def_bool y
+	prompt "DT NUMA detection"
+	depends on ARM64 && NUMA && DTC
+	---help---
+	  Enable device tree (DT) based NUMA detection.
+
+config NODES_SHIFT
+	int "Maximum NUMA Nodes (as a power of 2)"
+	range 1 10
+	default "2"
+	depends on NEED_MULTIPLE_NODES
+	---help---
+	  Specify the maximum number of NUMA Nodes available on the target
+	  system.  Increases memory reserved to accommodate various tables.
+
+config USE_PERCPU_NUMA_NODE_ID
+	def_bool y
+	depends on NUMA
+
+
 source kernel/Kconfig.preempt
 
 config HZ
diff --git a/arch/arm64/include/asm/mmzone.h b/arch/arm64/include/asm/mmzone.h
new file mode 100644
index 0000000..d27ee66
--- /dev/null
+++ b/arch/arm64/include/asm/mmzone.h
@@ -0,0 +1,32 @@
+#ifndef __ASM_ARM64_MMZONE_H_
+#define __ASM_ARM64_MMZONE_H_
+
+#ifdef CONFIG_NUMA
+
+#include <linux/mmdebug.h>
+#include <asm/smp.h>
+#include <linux/types.h>
+#include <asm/numa.h>
+
+extern struct pglist_data *node_data[];
+
+#define NODE_DATA(nid)		(node_data[nid])
+
+
+struct numa_memblk {
+	u64			start;
+	u64			end;
+	int			nid;
+};
+
+struct numa_meminfo {
+	int			nr_blks;
+	struct numa_memblk	blk[NR_NODE_MEMBLKS];
+};
+
+void __init numa_remove_memblk_from(int idx, struct numa_meminfo *mi);
+int __init numa_cleanup_meminfo(struct numa_meminfo *mi);
+void __init numa_reset_distance(void);
+
+#endif /* CONFIG_NUMA */
+#endif /* __ASM_ARM64_MMZONE_H_ */
diff --git a/arch/arm64/include/asm/numa.h b/arch/arm64/include/asm/numa.h
new file mode 100644
index 0000000..46d53fd
--- /dev/null
+++ b/arch/arm64/include/asm/numa.h
@@ -0,0 +1,41 @@
+#ifndef _ASM_ARM64_NUMA_H
+#define _ASM_ARM64_NUMA_H
+
+#include <linux/nodemask.h>
+#include <asm/topology.h>
+
+#ifdef CONFIG_NUMA
+
+#define NR_NODE_MEMBLKS		(MAX_NUMNODES * 2)
+#define ZONE_ALIGN (1UL << (MAX_ORDER + PAGE_SHIFT))
+
+/*
+ * Too small node sizes may confuse the VM badly. Usually they
+ * result from BIOS bugs. So don't recognize nodes as standalone
+ * NUMA entities that have less than this amount of RAM listed:
+ */
+#define NODE_MIN_SIZE (4*1024*1024)
+
+#define parent_node(node)	(node)
+
+/* dummy definitions for pci functions */
+#define pcibus_to_node(node)	0
+#define cpumask_of_pcibus(bus)	0
+
+const struct cpumask *cpumask_of_node(int node);
+/* Mappings between node number and cpus on that node. */
+extern cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+
+void __init arm64_numa_init(void);
+int __init numa_add_memblk(u32 nid, u64 base, u64 size);
+void numa_store_cpu_info(int cpu);
+void numa_set_node(int cpu, int node);
+void numa_clear_node(int cpu);
+void numa_add_cpu(int cpu);
+void numa_remove_cpu(int cpu);
+#else	/* CONFIG_NUMA */
+static inline void arm64_numa_init(void);
+static inline void numa_store_cpu_info(int cpu)		{ }
+static inline void arm64_numa_init(void)		{ }
+#endif	/* CONFIG_NUMA */
+#endif	/* _ASM_ARM64_NUMA_H */
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index edb146d..436b78d 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -424,6 +424,9 @@ static int __init topology_init(void)
 {
 	int i;
 
+	for_each_online_node(i)
+		register_one_node(i);
+
 	for_each_possible_cpu(i) {
 		struct cpu *cpu = &per_cpu(cpu_data.cpu, i);
 		cpu->hotpluggable = 1;
@@ -460,7 +463,12 @@ static int c_show(struct seq_file *m, void *v)
 		 * "processor".  Give glibc what it expects.
 		 */
 #ifdef CONFIG_SMP
+	if (IS_ENABLED(CONFIG_NUMA)) {
+		seq_printf(m, "processor\t: %d", i);
+		seq_printf(m, " [nid: %d]\n", cpu_to_node(i));
+	} else {
 		seq_printf(m, "processor\t: %d\n", i);
+	}
 #endif
 	}
 
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index 4743397..60120db 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -50,6 +50,7 @@
 #include <asm/sections.h>
 #include <asm/tlbflush.h>
 #include <asm/ptrace.h>
+#include <asm/numa.h>
 
 #define CREATE_TRACE_POINTS
 #include <trace/events/ipi.h>
@@ -123,6 +124,7 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
 static void smp_store_cpu_info(unsigned int cpuid)
 {
 	store_cpu_topology(cpuid);
+	numa_store_cpu_info(cpuid);
 }
 
 /*
diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
index 3ecb56c..4dda3d0 100644
--- a/arch/arm64/mm/Makefile
+++ b/arch/arm64/mm/Makefile
@@ -3,3 +3,4 @@ obj-y				:= dma-mapping.o extable.o fault.o init.o \
 				   ioremap.o mmap.o pgd.o mmu.o \
 				   context.o proc.o
 obj-$(CONFIG_HUGETLB_PAGE)	+= hugetlbpage.o
+obj-$(CONFIG_NUMA)		+= numa.o
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 271e654..4b2bbb4 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -39,6 +39,7 @@
 #include <asm/setup.h>
 #include <asm/sizes.h>
 #include <asm/tlb.h>
+#include <asm/numa.h>
 
 #include "mm.h"
 
@@ -73,6 +74,20 @@ static phys_addr_t max_zone_dma_phys(void)
 	return min(offset + (1ULL << 32), memblock_end_of_DRAM());
 }
 
+#ifdef CONFIG_NUMA
+static void __init zone_sizes_init(unsigned long min, unsigned long max)
+{
+	unsigned long max_zone_pfns[MAX_NR_ZONES];
+
+	memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
+	if (IS_ENABLED(CONFIG_ZONE_DMA))
+		max_zone_pfns[ZONE_DMA] = PFN_DOWN(max_zone_dma_phys());
+	max_zone_pfns[ZONE_NORMAL] = max;
+
+	free_area_init_nodes(max_zone_pfns);
+}
+
+#else
 static void __init zone_sizes_init(unsigned long min, unsigned long max)
 {
 	struct memblock_region *reg;
@@ -111,6 +126,7 @@ static void __init zone_sizes_init(unsigned long min, unsigned long max)
 
 	free_area_init_node(0, zone_size, min, zhole_size);
 }
+#endif /* CONFIG_NUMA */
 
 #ifdef CONFIG_HAVE_ARCH_PFN_VALID
 int pfn_valid(unsigned long pfn)
@@ -129,9 +145,16 @@ static void arm64_memory_present(void)
 {
 	struct memblock_region *reg;
 
-	for_each_memblock(memory, reg)
+	for_each_memblock(memory, reg) {
+#ifdef CONFIG_NUMA
+		memory_present(reg->nid,
+				memblock_region_memory_base_pfn(reg),
+				memblock_region_memory_end_pfn(reg));
+#else
 		memory_present(0, memblock_region_memory_base_pfn(reg),
 			       memblock_region_memory_end_pfn(reg));
+#endif
+	}
 }
 #endif
 
@@ -168,6 +191,11 @@ void __init bootmem_init(void)
 	min = PFN_UP(memblock_start_of_DRAM());
 	max = PFN_DOWN(memblock_end_of_DRAM());
 
+	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
+	max_pfn = max_low_pfn = max;
+
+	if (IS_ENABLED(CONFIG_NUMA))
+		arm64_numa_init();
 	/*
 	 * Sparsemem tries to allocate bootmem in memory_present(), so must be
 	 * done after the fixed reservations.
@@ -176,9 +204,6 @@ void __init bootmem_init(void)
 
 	sparse_init();
 	zone_sizes_init(min, max);
-
-	high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
-	max_pfn = max_low_pfn = max;
 }
 
 #ifndef CONFIG_SPARSEMEM_VMEMMAP
diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c
new file mode 100644
index 0000000..a5f4555
--- /dev/null
+++ b/arch/arm64/mm/numa.c
@@ -0,0 +1,471 @@
+/*
+ * NUMA support, based on the x86 implementation.
+ *
+ * Copyright (C) 2014 Cavium Inc.
+ * Author: Ganapatrao Kulkarni <gkulkarni@cavium.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/string.h>
+#include <linux/init.h>
+#include <linux/bootmem.h>
+#include <linux/memblock.h>
+#include <linux/mmzone.h>
+#include <linux/ctype.h>
+#include <linux/module.h>
+#include <linux/nodemask.h>
+#include <linux/sched.h>
+#include <linux/topology.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+
+int __initdata numa_off;
+nodemask_t numa_nodes_parsed __initdata;
+
+struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
+EXPORT_SYMBOL(node_data);
+
+static struct numa_meminfo numa_meminfo;
+
+static __init int numa_setup(char *opt)
+{
+	if (!opt)
+		return -EINVAL;
+	if (!strncmp(opt, "off", 3))
+		numa_off = 1;
+	return 0;
+}
+early_param("numa", numa_setup);
+
+cpumask_var_t node_to_cpumask_map[MAX_NUMNODES];
+EXPORT_SYMBOL(node_to_cpumask_map);
+
+/*
+ * Returns a pointer to the bitmask of CPUs on Node 'node'.
+ */
+const struct cpumask *cpumask_of_node(int node)
+{
+	if (node >= nr_node_ids) {
+		printk(KERN_WARNING
+			"cpumask_of_node(%d): node > nr_node_ids(%d)\n",
+			node, nr_node_ids);
+		dump_stack();
+		return cpu_none_mask;
+	}
+	if (node_to_cpumask_map[node] == NULL) {
+		printk(KERN_WARNING
+			"cpumask_of_node(%d): no node_to_cpumask_map!\n",
+			node);
+		dump_stack();
+		return cpu_online_mask;
+	}
+	return node_to_cpumask_map[node];
+}
+EXPORT_SYMBOL(cpumask_of_node);
+
+int cpu_to_node_map[NR_CPUS];
+EXPORT_SYMBOL(cpu_to_node_map);
+
+void numa_clear_node(int cpu)
+{
+	cpu_to_node_map[cpu] = NUMA_NO_NODE;
+}
+
+/*
+ * Allocate node_to_cpumask_map based on number of available nodes
+ * Requires node_possible_map to be valid.
+ *
+ * Note: cpumask_of_node() is not valid until after this is done.
+ * (Use CONFIG_DEBUG_PER_CPU_MAPS to check this.)
+ */
+void __init setup_node_to_cpumask_map(void)
+{
+	unsigned int node;
+
+	/* setup nr_node_ids if not done yet */
+	if (nr_node_ids == MAX_NUMNODES)
+		setup_nr_node_ids();
+
+	/* allocate the map */
+	for (node = 0; node < nr_node_ids; node++)
+		alloc_bootmem_cpumask_var(&node_to_cpumask_map[node]);
+
+	/* cpumask_of_node() will now work */
+	pr_debug("Node to cpumask map for %d nodes\n", nr_node_ids);
+}
+
+/*
+ *  Set the cpu to node and mem mapping
+ */
+void numa_store_cpu_info(int cpu)
+{
+	cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
+	cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
+	set_numa_node(cpu_to_node_map[cpu]);
+	set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
+}
+
+/**
+ * numa_add_memblk_to - Add one numa_memblk to a numa_meminfo
+ */
+
+static int __init numa_add_memblk_to(int nid, u64 start, u64 end,
+				     struct numa_meminfo *mi)
+{
+	/* ignore zero length blks */
+	if (start == end)
+		return 0;
+
+	/* whine about and ignore invalid blks */
+	if (start > end || nid < 0 || nid >= MAX_NUMNODES) {
+		pr_warn("numa: Warning: invalid memblk node %d [mem %#010Lx-%#010Lx]\n",
+				nid, start, end - 1);
+		return 0;
+	}
+
+	if (mi->nr_blks >= NR_NODE_MEMBLKS) {
+		pr_err("numa: too many memblk ranges\n");
+		return -EINVAL;
+	}
+
+	pr_info("numa: Adding memblock %d [0x%llx - 0x%llx] on node %d\n",
+			mi->nr_blks, start, end, nid);
+	mi->blk[mi->nr_blks].start = start;
+	mi->blk[mi->nr_blks].end = end;
+	mi->blk[mi->nr_blks].nid = nid;
+	mi->nr_blks++;
+	return 0;
+}
+
+/**
+ * numa_add_memblk - Add one numa_memblk to numa_meminfo
+ * @nid: NUMA node ID of the new memblk
+ * @base: Base address of the new memblk
+ * @size: Size of the new memblk in bytes
+ *
+ * Add a new memblk to the default numa_meminfo.
+ *
+ * RETURNS:
+ * 0 on success, -errno on failure.
+ */
+#define MAX_PHYS_ADDR	((phys_addr_t)~0)
+
+int __init numa_add_memblk(u32 nid, u64 base, u64 size)
+{
+	const u64 phys_offset = __pa(PAGE_OFFSET);
+
+	base &= PAGE_MASK;
+	size &= PAGE_MASK;
+
+	if (base > MAX_PHYS_ADDR) {
+		pr_warn("numa: Ignoring memory block 0x%llx - 0x%llx\n",
+				base, base + size);
+		return -ENOMEM;
+	}
+
+	if (base + size > MAX_PHYS_ADDR) {
+		pr_warn("numa: Ignoring memory range 0x%lx - 0x%llx\n",
+				ULONG_MAX, base + size);
+		size = MAX_PHYS_ADDR - base;
+	}
+
+	if (base + size < phys_offset) {
+		pr_warn("numa: Ignoring memory block 0x%llx - 0x%llx\n",
+			   base, base + size);
+		return -ENOMEM;
+	}
+	if (base < phys_offset) {
+		pr_warn("numa: Ignoring memory range 0x%llx - 0x%llx\n",
+			   base, phys_offset);
+		size -= phys_offset - base;
+		base = phys_offset;
+	}
+
+	node_set(nid, numa_nodes_parsed);
+	return numa_add_memblk_to(nid, base, base+size, &numa_meminfo);
+}
+EXPORT_SYMBOL(numa_add_memblk);
+
+/* Initialize NODE_DATA for a node on the local memory */
+static void __init setup_node_data(int nid, u64 start, u64 end)
+{
+	const size_t nd_size = roundup(sizeof(pg_data_t), PAGE_SIZE);
+	u64 nd_pa;
+	void *nd;
+	int tnid;
+
+	/*
+	 * Don't confuse VM with a node that doesn't have the
+	 * minimum amount of memory:
+	 */
+	if (end && (end - start) < NODE_MIN_SIZE)
+		return;
+
+	start = roundup(start, ZONE_ALIGN);
+
+	pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n",
+	       nid, start, end - 1);
+
+	/*
+	 * Allocate node data.  Try node-local memory and then any node.
+	 */
+	nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
+	if (!nd_pa) {
+		nd_pa = __memblock_alloc_base(nd_size, SMP_CACHE_BYTES,
+					      MEMBLOCK_ALLOC_ACCESSIBLE);
+		if (!nd_pa) {
+			pr_err("Cannot find %zu bytes in node %d\n",
+			       nd_size, nid);
+			return;
+		}
+	}
+	nd = __va(nd_pa);
+
+	/* report and initialize */
+	pr_info("  NODE_DATA [mem %#010Lx-%#010Lx]\n",
+	       nd_pa, nd_pa + nd_size - 1);
+	tnid = early_pfn_to_nid(nd_pa >> PAGE_SHIFT);
+	if (tnid != nid)
+		pr_info("    NODE_DATA(%d) on node %d\n", nid, tnid);
+
+	node_data[nid] = nd;
+	memset(NODE_DATA(nid), 0, sizeof(pg_data_t));
+	NODE_DATA(nid)->node_id = nid;
+	NODE_DATA(nid)->node_start_pfn = start >> PAGE_SHIFT;
+	NODE_DATA(nid)->node_spanned_pages = (end - start) >> PAGE_SHIFT;
+
+	node_set_online(nid);
+}
+
+/*
+ * Set nodes, which have memory in @mi, in *@nodemask.
+ */
+static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
+					      const struct numa_meminfo *mi)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mi->blk); i++)
+		if (mi->blk[i].start != mi->blk[i].end &&
+		    mi->blk[i].nid != NUMA_NO_NODE)
+			node_set(mi->blk[i].nid, *nodemask);
+}
+
+/*
+ * Sanity check to catch more bad NUMA configurations (they are amazingly
+ * common).  Make sure the nodes cover all memory.
+ */
+static bool __init numa_meminfo_cover_memory(const struct numa_meminfo *mi)
+{
+	u64 numaram, totalram;
+	int i;
+
+	numaram = 0;
+	for (i = 0; i < mi->nr_blks; i++) {
+		u64 s = mi->blk[i].start >> PAGE_SHIFT;
+		u64 e = mi->blk[i].end >> PAGE_SHIFT;
+
+		numaram += e - s;
+		numaram -= __absent_pages_in_range(mi->blk[i].nid, s, e);
+		if ((s64)numaram < 0)
+			numaram = 0;
+	}
+
+	totalram = max_pfn - absent_pages_in_range(0, max_pfn);
+
+	/* We seem to lose 3 pages somewhere. Allow 1M of slack. */
+	if ((s64)(totalram - numaram) >= (1 << (20 - PAGE_SHIFT))) {
+		pr_err("numa: nodes only cover %lluMB of your %lluMB Total RAM. Not used.\n",
+		       (numaram << PAGE_SHIFT) >> 20,
+		       (totalram << PAGE_SHIFT) >> 20);
+		return false;
+	}
+	return true;
+}
+
+static int __init numa_register_memblks(struct numa_meminfo *mi)
+{
+	unsigned long uninitialized_var(pfn_align);
+	int i, nid;
+
+	/* Account for nodes with cpus and no memory */
+	node_possible_map = numa_nodes_parsed;
+	numa_nodemask_from_meminfo(&node_possible_map, mi);
+	if (WARN_ON(nodes_empty(node_possible_map)))
+		return -EINVAL;
+
+	for (i = 0; i < mi->nr_blks; i++) {
+		struct numa_memblk *mb = &mi->blk[i];
+
+		memblock_set_node(mb->start, mb->end - mb->start,
+				  &memblock.memory, mb->nid);
+	}
+
+	/*
+	 * If sections array is gonna be used for pfn -> nid mapping, check
+	 * whether its granularity is fine enough.
+	 */
+#ifdef NODE_NOT_IN_PAGE_FLAGS
+	pfn_align = node_map_pfn_alignment();
+	if (pfn_align && pfn_align < PAGES_PER_SECTION) {
+		pr_warn("Node alignment %lluMB < min %lluMB, rejecting NUMA config\n",
+		       PFN_PHYS(pfn_align) >> 20,
+		       PFN_PHYS(PAGES_PER_SECTION) >> 20);
+		return -EINVAL;
+	}
+#endif
+	if (!numa_meminfo_cover_memory(mi))
+		return -EINVAL;
+
+	/* Finally register nodes. */
+	for_each_node_mask(nid, node_possible_map) {
+		u64 start = PFN_PHYS(max_pfn);
+		u64 end = 0;
+
+		for (i = 0; i < mi->nr_blks; i++) {
+			if (nid != mi->blk[i].nid)
+				continue;
+			start = min(mi->blk[i].start, start);
+			end = max(mi->blk[i].end, end);
+		}
+
+		if (start < end)
+			setup_node_data(nid, start, end);
+	}
+
+	/* Dump memblock with node info and return. */
+	memblock_dump_all();
+	return 0;
+}
+
+static int __init numa_init(int (*init_func)(void))
+{
+	int ret, i;
+
+	nodes_clear(node_possible_map);
+	nodes_clear(node_online_map);
+
+	ret = init_func();
+	if (ret < 0)
+		return ret;
+
+	ret = numa_register_memblks(&numa_meminfo);
+	if (ret < 0)
+		return ret;
+
+	for (i = 0; i < nr_cpu_ids; i++)
+		numa_clear_node(i);
+
+	setup_node_to_cpumask_map();
+	return 0;
+}
+
+/**
+ * dummy_numa_init - Fallback dummy NUMA init
+ *
+ * Used if there's no underlying NUMA architecture, NUMA initialization
+ * fails, or NUMA is disabled on the command line.
+ *
+ * Must online at least one node and add memory blocks that cover all
+ * allowed memory.  This function must not fail.
+ */
+static int __init dummy_numa_init(void)
+{
+	pr_info("%s\n",
+	       numa_off ? "NUMA turned off" : "No NUMA configuration found");
+	pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
+	       0LLU, PFN_PHYS(max_pfn) - 1);
+
+	node_set(0, numa_nodes_parsed);
+	numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
+
+	return 0;
+}
+
+/**
+ * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
+ */
+int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
+				     int depth, void *data)
+{
+	const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
+	const __be32 *reg, *endp, *nid_prop;
+	int l, nid;
+
+	/* We are scanning "memory" nodes only */
+	if (type == NULL) {
+		/*
+		 * The longtrail doesn't have a device_type on the
+		 * /memory node, so look for the node called /memory@0.
+		 */
+		if (depth != 1 || strcmp(uname, "memory@0") != 0)
+			return 0;
+	} else if (strcmp(type, "memory") != 0)
+		return 0;
+
+	reg = of_get_flat_dt_prop(node, "reg", &l);
+	if (reg == NULL)
+		return 0;
+
+	endp = reg + (l / sizeof(__be32));
+	nid_prop = of_get_flat_dt_prop(node, "nid", &l);
+
+	if (nid_prop == NULL)
+		return -1;
+
+	nid = dt_mem_next_cell(OF_ROOT_NODE_ADDR_CELLS_DEFAULT, &nid_prop);
+
+	while ((endp - reg) >= (dt_root_addr_cells + dt_root_size_cells)) {
+		u64 base, size;
+
+		base = dt_mem_next_cell(dt_root_addr_cells, &reg);
+		size = dt_mem_next_cell(dt_root_size_cells, &reg);
+
+		if (size == 0)
+			continue;
+		pr_debug("numa: nid %d , base %llx , size %llx\n", nid,
+				(unsigned long long)base,
+				(unsigned long long)size);
+		numa_add_memblk(nid, base, size);
+	}
+	return 0;
+}
+
+/* DT node mapping is already done in early_init_dt_scan_memory */
+static inline int __init arm64_dt_numa_init(void)
+{
+	of_scan_flat_dt(early_init_dt_scan_numa_map, NULL);
+	return 0;
+}
+
+/**
+ * arm64_numa_init - Initialize NUMA
+ *
+ * Try each configured NUMA initialization method until one succeeds.  The
+ * last fallback is a dummy single-node config encompassing all memory and
+ * never fails.
+ */
+void __init arm64_numa_init(void)
+{
+	if (!numa_off) {
+#ifdef CONFIG_ARM64_DT_NUMA
+		if (!numa_init(arm64_dt_numa_init))
+			return;
+#endif
+	}
+
+	numa_init(dummy_numa_init);
+}
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 44+ messages in thread
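
The loop in early_init_dt_scan_numa_map() walks the "reg" property as a
sequence of big-endian 32-bit cells, consuming one (base, size) pair per
iteration. A minimal userspace sketch of that walk follows. It assumes the raw
property bytes are available in a buffer, and it uses hypothetical helper
names (next_cell, parse_reg, struct mem_range) rather than the kernel's
dt_mem_next_cell().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type: one (base, size) pair from a "reg" property. */
struct mem_range {
	uint64_t base;
	uint64_t size;
};

/* Read `cells` big-endian 32-bit cells (1 or 2) and advance the cursor. */
static uint64_t next_cell(int cells, const uint8_t **p)
{
	uint64_t v = 0;

	assert(cells == 1 || cells == 2);
	for (int i = 0; i < cells * 4; i++)
		v = (v << 8) | *(*p)++;
	return v;
}

/*
 * Decode up to `max` non-empty ranges from a raw "reg" property of
 * `len` bytes.  Returns the number of ranges stored in `out`.
 */
static size_t parse_reg(const uint8_t *reg, size_t len,
			int addr_cells, int size_cells,
			struct mem_range *out, size_t max)
{
	const uint8_t *end = reg + len;
	const size_t entry = (size_t)(addr_cells + size_cells) * 4;
	size_t n = 0;

	while ((size_t)(end - reg) >= entry && n < max) {
		uint64_t base = next_cell(addr_cells, &reg);
		uint64_t size = next_cell(size_cells, &reg);

		if (size == 0)
			continue;	/* skip empty ranges, as the patch does */
		out[n].base = base;
		out[n].size = size;
		n++;
	}
	return n;
}
```

For the binding's example entry reg = <0x100 0x00000000 0x0 0x80000000> with
two address cells and two size cells, this yields a single range with base
0x10000000000 and size 0x80000000. In the patch, each such pair is then passed
to numa_add_memblk() together with the node's nid.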

* Re: [RFC PATCH 1/4] arm64: defconfig: increase NR_CPUS range to 2-128
  2014-09-25  9:03     ` Ganapatrao Kulkarni
@ 2014-10-03 10:58         ` Mark Rutland
  -1 siblings, 0 replies; 44+ messages in thread
From: Mark Rutland @ 2014-10-03 10:58 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

On Thu, Sep 25, 2014 at 10:03:56AM +0100, Ganapatrao Kulkarni wrote:
> Raising the maximum limit to 128. This is needed for Cavium's
> Thunder systems that will have 96 cores on Multi-node system.

Has this been tested on any such systems?

Mark.

> 
> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
> ---
>  arch/arm64/Kconfig | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 4d42453..a409105 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -296,8 +296,8 @@ config SCHED_SMT
>  	  places. If unsure say N here.
>  
>  config NR_CPUS
> -	int "Maximum number of CPUs (2-64)"
> -	range 2 64
> +	int "Maximum number of CPUs (2-128)"
> +	range 2 128
>  	depends on SMP
>  	# These have to remain sorted largest to smallest
>  	default "64"
> -- 
> 1.8.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [RFC PATCH 1/4] arm64: defconfig: increase NR_CPUS range to 2-128
@ 2014-10-03 10:58         ` Mark Rutland
  0 siblings, 0 replies; 44+ messages in thread
From: Mark Rutland @ 2014-10-03 10:58 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, Sep 25, 2014 at 10:03:56AM +0100, Ganapatrao Kulkarni wrote:
> Raising the maximum limit to 128. This is needed for Cavium's
> Thunder systems that will have 96 cores on Multi-node system.

Has this been tested on any such systems?

Mark.

> 
> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com>
> ---
>  arch/arm64/Kconfig | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index 4d42453..a409105 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -296,8 +296,8 @@ config SCHED_SMT
>  	  places. If unsure say N here.
>  
>  config NR_CPUS
> -	int "Maximum number of CPUs (2-64)"
> -	range 2 64
> +	int "Maximum number of CPUs (2-128)"
> +	range 2 128
>  	depends on SMP
>  	# These have to remain sorted largest to smallest
>  	default "64"
> -- 
> 1.8.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe devicetree" in
> the body of a message to majordomo at vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 2/4] arm/arm64:dt:numa: adding numa node mapping for memory nodes.
  2014-09-25  9:03     ` Ganapatrao Kulkarni
@ 2014-10-03 11:05         ` Mark Rutland
  -1 siblings, 0 replies; 44+ messages in thread
From: Mark Rutland @ 2014-10-03 11:05 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

On Thu, Sep 25, 2014 at 10:03:57AM +0100, Ganapatrao Kulkarni wrote:
> Adding Documentation for dt binding for memory to numa node mapping.

As I previously commented [1], this binding doesn't specify what a nid
maps to in terms of the CPU hierarchy, and is thus unusable. The binding
absolutely must be explicit about this, and NAK until it is.

Given we're seeing systems with increasing numbers of CPUs and
increasingly complex interconnect hierarchies, I would expect at minimum
that we would refer to elements in the cpu-map to define the
relationship between memory banks and CPUs.
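
The coupling that the series currently relies on can be sketched as follows.
This is only an illustration, assuming the cluster_id-based CPU mapping from
patch 4/4; struct cpu_info, struct mem_bank, cpu_node() and bank_is_local()
are hypothetical names and do not appear in the patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of what the series derives from the DT. */
struct cpu_info {
	int cluster_id;	/* from cpu-map, reused as the CPU's node id */
};

struct mem_bank {
	int nid;	/* from the proposed "nid" property */
};

/* The CPU's node id is taken directly from its cluster id. */
static int cpu_node(const struct cpu_info *cpu)
{
	assert(cpu->cluster_id >= 0);
	return cpu->cluster_id;
}

/*
 * A bank is treated as local only when the two integers happen to match;
 * nothing in the binding text states that they share a numbering space.
 */
static bool bank_is_local(const struct cpu_info *cpu,
			  const struct mem_bank *bank)
{
	return cpu_node(cpu) == bank->nid;
}
```

In this sketch, locality depends entirely on an unwritten convention that
cluster ids and nid values are numbered identically. That convention is the
part the binding would need to state explicitly.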

What does the interconnect/memory hierarchy look like in your system?

Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/288263.html

> 
> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni-M3mlKVOIwJVv6pq1l3V1OdBPR1lH4CV8@public.gmane.org>
> ---
>  Documentation/devicetree/bindings/arm/numa.txt | 60 ++++++++++++++++++++++++++
>  1 file changed, 60 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
> 
> diff --git a/Documentation/devicetree/bindings/arm/numa.txt b/Documentation/devicetree/bindings/arm/numa.txt
> new file mode 100644
> index 0000000..1cdc6d3
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/arm/numa.txt
> @@ -0,0 +1,60 @@
> +========================================================
> +ARM numa id binding description
> +========================================================
> +
> +========================================================
> +1 - Introduction
> +========================================================
> +
> +The device node  property nid (numa node id) can be added
> +to memory device node to map the range of memory addresses
> +as defined in property reg. The property nid maps the memory
> +range to the numa node id, which is used to find the local
> +and remote pages on numa aware systems.
> +
> +========================================================
> +2 - nid property
> +========================================================
> +nid is required property of memory device node for
> +numa enabled platforms.
> +
> +|------------------------------------------------------|
> +|Property Type  | Usage | Value Type | Definition      |
> +|------------------------------------------------------|
> +|  nid          |  R    |    <u32>   | Numa Node id    |
> +|               |       |            | for this memory |
> +|------------------------------------------------------|
> +
> +========================================================
> +4 - Example memory nodes with numa information
> +========================================================
> +
> +Example 1 (2 memory nodes, each mapped to a numa node.):
> +
> +	memory@00000000 {
> +		device_type = "memory";
> +		reg = <0x0 0x00000000 0x0 0x80000000>;
> +		nid = <0x0>;
> +	};
> +
> +	memory@10000000000 {
> +		device_type = "memory";
> +		reg = <0x100 0x00000000 0x0 0x80000000>;
> +		nid = <0x1>;
> +	};
> +
> +Example 2 (multiple memory ranges in each memory node and mapped to numa node):
> +
> +	memory@00000000 {
> +		device_type = "memory";
> +		reg = <0x0 0x00000000 0x0 0x80000000>,
> +		      <0x1 0x00000000 0x0 0x80000000>;
> +		nid = <0x0>;
> +	};
> +
> +	memory@10000000000 {
> +		device_type = "memory";
> +		reg = <0x100 0x00000000 0x0 0x80000000>,
> +		      <0x100 0x80000000 0x0 0x80000000>;
> +		nid = <0x1>;
> +	};
> -- 
> 1.8.1.4
> 
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 3/4] arm64:thunder: Add initial dts for Cavium Thunder SoC in 2 Node topology.
  2014-09-25  9:03     ` Ganapatrao Kulkarni
@ 2014-10-03 11:19         ` Mark Rutland
  -1 siblings, 0 replies; 44+ messages in thread
From: Mark Rutland @ 2014-10-03 11:19 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

On Thu, Sep 25, 2014 at 10:03:58AM +0100, Ganapatrao Kulkarni wrote:
> adding devicetree definition for thunder's 2 node topology.
> Defined cpu-map for all 96 cores of 2 node system.

[...]

> +               CPU0: cpu@000 {
> +                       device_type = "cpu";
> +                       compatible = "cavium,thunder", "arm,armv8";
> +                       reg = <0x0 0x000>;
> +                       enable-method = "psci";
> +               };
> +               CPU1: cpu@001 {
> +                       device_type = "cpu";
> +                       compatible = "cavium,thunder", "arm,armv8";
> +                       reg = <0x0 0x001>;
> +                       enable-method = "psci";
> +               };

This is going to take up an awful lot of space. Perhaps we should follow ePAPR
and allow for common properties to go in /cpus (as I believe we do for arm). We
might not be able to do that for device_type given how we detect CPU nodes at
present, but we could certainly do it for the enable-method:

---->8----
diff --git a/arch/arm64/kernel/cpu_ops.c b/arch/arm64/kernel/cpu_ops.c
index cce9524..733b3ed 100644
--- a/arch/arm64/kernel/cpu_ops.c
+++ b/arch/arm64/kernel/cpu_ops.c
@@ -54,7 +54,18 @@ static const struct cpu_operations * __init cpu_get_ops(const char *name)
  */
 int __init cpu_read_ops(struct device_node *dn, int cpu)
 {
-       const char *enable_method = of_get_property(dn, "enable-method", NULL);
+       const char *enable_method = NULL;
+       of_property_read_string(dn, "enable-method", &enable_method);
+
+       if (!enable_method) {
+               /*
+                * Perhaps we have a common /cpus/enable-method
+                */
+               struct device_node *cpus = of_get_parent(dn);
+               of_property_read_string(cpus, "enable-method", &enable_method);
+               of_node_put(cpus);
+       }
+
        if (!enable_method) {
                /*
                 * The boot CPU may not have an enable method (e.g. when
---->8----

I don't believe we check the compatible string at the moment, so that should be
safe to make common too.
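
The fallback in the diff above can be modelled outside the kernel as a small, self-contained sketch. The `fake_node` structure and `lookup_enable_method()` are hypothetical stand-ins invented for illustration; they are not the kernel's `struct device_node` or any OF API, and only show the intended lookup order (the CPU node's own property first, then the parent /cpus node):

```c
#include <stddef.h>

/* Hypothetical stand-in for a device tree node; not struct device_node. */
struct fake_node {
	const struct fake_node *parent;	/* e.g. the /cpus node, or NULL */
	const char *enable_method;	/* NULL if the property is absent */
};

/*
 * Return the node's own enable-method if it has one, otherwise the
 * value inherited from its parent, otherwise NULL.
 */
static const char *lookup_enable_method(const struct fake_node *cpu)
{
	if (cpu->enable_method)
		return cpu->enable_method;
	if (cpu->parent)
		return cpu->parent->enable_method;
	return NULL;
}
```

With this order a per-CPU property still overrides the common one, so existing device trees that list enable-method in every CPU node would behave as before.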

Mark.

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-09-25  9:03     ` Ganapatrao Kulkarni
@ 2014-10-03 12:13         ` Mark Rutland
  -1 siblings, 0 replies; 44+ messages in thread
From: Mark Rutland @ 2014-10-03 12:13 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w

On Thu, Sep 25, 2014 at 10:03:59AM +0100, Ganapatrao Kulkarni wrote:
> Adding numa support for arm64 based platforms.
> This version creates numa mapping by parsing the dt table.
> cpu to node id mapping is derived from cluster_id as defined in cpu-map.
> memory to node id mapping is derived from nid property of memory node.

[...]

> +/*
> + * Too small node sizes may confuse the VM badly. Usually they
> + * result from BIOS bugs. So dont recognize nodes as standalone
> + * NUMA entities that have less than this amount of RAM listed:
> + */
> +#define NODE_MIN_SIZE (4*1024*1024)

Why do these confuse the VM? What does BIOS have to do with arm64?

> +
> +#define parent_node(node)      (node)

Huh?

[...]

> @@ -168,6 +191,11 @@ void __init bootmem_init(void)
>         min = PFN_UP(memblock_start_of_DRAM());
>         max = PFN_DOWN(memblock_end_of_DRAM());
> 
> +       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
> +       max_pfn = max_low_pfn = max;
> +
> +       if (IS_ENABLED(CONFIG_NUMA))
> +               arm64_numa_init();

Is this function defined if !CONFIG_NUMA? Surely it must do nothing in
that case anyway?
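
For reference, the usual way to make such a call site unconditional is a no-op stub for the disabled configuration. Note that with IS_ENABLED() the call is still parsed by the compiler even when the option is off, so a declaration has to be visible either way. The following is only a userspace sketch of that pattern: `numa_nodes_initialised` and `bootmem_init_sketch()` are hypothetical names for illustration, and the body of the CONFIG_NUMA variant is a placeholder, not the code from this patch:

```c
/* Build with -DCONFIG_NUMA to select the non-stub variant. */
static int numa_nodes_initialised;	/* hypothetical state, illustration only */

#ifdef CONFIG_NUMA
static void arm64_numa_init(void)
{
	numa_nodes_initialised = 1;	/* placeholder for the real setup */
}
#else
/* No-op stub: callers need neither #ifdef nor an IS_ENABLED() guard. */
static inline void arm64_numa_init(void)
{
}
#endif

/* Hypothetical caller standing in for bootmem_init(). */
static void bootmem_init_sketch(void)
{
	arm64_numa_init();
}
```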

[...]

> +/*
> + *  Set the cpu to node and mem mapping
> + */
> +void numa_store_cpu_info(cpu)
> +{
> +       cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
> +       cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
> +       set_numa_node(cpu_to_node_map[cpu]);
> +       set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
> +}

I don't like this. I think we need to be more explicit in the DT w.r.t.
the relationship between memory and the CPU hierarchy.

I can imagine that we might end up with systems with multiple levels of
NUMA hierarchy (using MPIDR_EL1.Aff{3,2}), and I'd rather that we were
as explicit as possible from the start w.r.t. the relationship between
memory and groups of CPUs such that we don't end up with multiple ways
of specifying said relationship, and horrible edge cases around implicit
definitions (e.g. the nid to cluster mapping).

> +/**
> + * dummy_numa_init - Fallback dummy NUMA init
> + *
> + * Used if there's no underlying NUMA architecture, NUMA initialization
> + * fails, or NUMA is disabled on the command line.
> + *
> + * Must online at least one node and add memory blocks that cover all
> + * allowed memory.  This function must not fail.
> + */
> +static int __init dummy_numa_init(void)
> +{
> +       pr_info("%s\n",
> +              numa_off ? "NUMA turned off" : "No NUMA configuration found");

Why not print "NUMA turned off" in numa_setup?

> +       pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> +              0LLU, PFN_PHYS(max_pfn) - 1);
> +
> +       node_set(0, numa_nodes_parsed);
> +       numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> +
> +       return 0;
> +}
> +
> +/**
> + * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
> + */
> +int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
> +                                    int depth, void *data)
> +{
> +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> +       const __be32 *reg, *endp, *nid_prop;
> +       int l, nid;
> +
> +       /* We are scanning "memory" nodes only */
> +       if (type == NULL) {
> +               /*
> +                * The longtrail doesn't have a device_type on the
> +                * /memory node, so look for the node called /memory@0.
> +                */
> +               if (depth != 1 || strcmp(uname, "memory@0") != 0)
> +                       return 0;

This has no place on arm64.

We limited the longtrail workaround in the core memory parsing to PPC32
only in commit b44aa25d20e2ef6b (of: Handle memory@0 node on PPC32
only). This code doesn't need it enabled ever.

Are you booting using UEFI? This isn't going to work when the memory map
comes from UEFI and we have no memory nodes in the DTB.

Mark.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 2/4] arm/arm64:dt:numa: adding numa node mapping for memory nodes.
  2014-10-03 11:05         ` Mark Rutland
@ 2014-10-06  4:20           ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-10-06  4:20 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Mark,

On Fri, Oct 3, 2014 at 4:35 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Sep 25, 2014 at 10:03:57AM +0100, Ganapatrao Kulkarni wrote:
>> Adding Documentation for dt binding for memory to numa node mapping.
>
> As I previously commented [1], this binding doesn't specify what a nid
> maps to in terms of the CPU hierarchy, and is thus unusable. The binding
> absolutely must be explicit about this, and NAK until it is.
The nid (NUMA node id) maps each memory range/bank to a NUMA node.
IIUC, NUMA manages resources based on which node they are tied to.
With nid, I am trying to map the memory range to a node.
The same applies to all IO peripherals and to CPUs.
For CPUs, I am using the cluster-id as the node id to map all CPUs to a node.
Thunder has 2 nodes; in this patch, I have grouped all the CPUs that
belong to each node under a cluster-id (cluster0, cluster1).
>
> Given we're seeing systems with increasing numbers of CPUs and
> increasingly complex interconnect hierarchies, I would expect at minimum
> that we would refer to elements in the cpu-map to define the
> relationship between memory banks and CPUs.
>
> What does the interconnect/memory hierarchy look like in your system?

In Thunder, 2 SoCs (each with 48 cores, RAM controllers and IOs) can
be connected to form a 2-node NUMA system.
Within an SoC (within a node) there is no hierarchy with respect to memory or
IO access. However, w.r.t. GICv3, the
48 cores in each SoC/node are split into 3 clusters of 16 cores each.

The MPIDR mapping for this topology is:
Aff0 is mapped to the 16 cores within a cluster. Valid range is 0 to 0xf.
Aff1 is mapped to the cluster number. Valid values are 0, 1 and 2.
Aff2 is mapped to the socket-id/node id/SoC number. Valid values are 0 and 1.
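
The affinity layout described above can be sketched in C as follows. The field positions (Aff0 in bits [7:0], Aff1 in bits [15:8], Aff2 in bits [23:16]) are those of the ARMv8 MPIDR_EL1 register. The helper names, and the idea of taking the NUMA node directly from Aff2, are assumptions made for illustration only; this is not code from the patch set:

```c
#include <stdint.h>

/* MPIDR_EL1 affinity fields: Aff0 [7:0], Aff1 [15:8], Aff2 [23:16]. */
#define MPIDR_AFF_MASK		0xffULL
#define MPIDR_AFF0_SHIFT	0
#define MPIDR_AFF1_SHIFT	8
#define MPIDR_AFF2_SHIFT	16

/* Hypothetical helper: extract one affinity level from an MPIDR value. */
static inline unsigned int mpidr_aff(uint64_t mpidr, unsigned int shift)
{
	return (unsigned int)((mpidr >> shift) & MPIDR_AFF_MASK);
}

/* Hypothetical: in the layout above, Aff2 is the socket/node/SoC number. */
static inline unsigned int thunder_mpidr_to_node(uint64_t mpidr)
{
	return mpidr_aff(mpidr, MPIDR_AFF2_SHIFT);
}

/* Hypothetical: linear core index within one SoC (3 clusters x 16 cores). */
static inline unsigned int thunder_mpidr_to_core(uint64_t mpidr)
{
	return mpidr_aff(mpidr, MPIDR_AFF1_SHIFT) * 16 +
	       mpidr_aff(mpidr, MPIDR_AFF0_SHIFT);
}
```

For example, an MPIDR value with Aff2 = 1, Aff1 = 2 and Aff0 = 5 would be core 37 on node 1 under these assumptions.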
>
> Mark.
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/288263.html
>
>>
>> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com>
>> ---
>>  Documentation/devicetree/bindings/arm/numa.txt | 60 ++++++++++++++++++++++++++
>>  1 file changed, 60 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/arm/numa.txt
>>
>> diff --git a/Documentation/devicetree/bindings/arm/numa.txt b/Documentation/devicetree/bindings/arm/numa.txt
>> new file mode 100644
>> index 0000000..1cdc6d3
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/arm/numa.txt
>> @@ -0,0 +1,60 @@
>> +========================================================
>> +ARM numa id binding description
>> +========================================================
>> +
>> +========================================================
>> +1 - Introduction
>> +========================================================
>> +
>> +The device node  property nid (numa node id) can be added
>> +to memory device node to map the range of memory addresses
>> +as defined in property reg. The property nid maps the memory
>> +range to the numa node id, which is used to find the local
>> +and remote pages on numa aware systems.
>> +
>> +========================================================
>> +2 - nid property
>> +========================================================
>> +nid is required property of memory device node for
>> +numa enabled platforms.
>> +
>> +|------------------------------------------------------|
>> +|Property Type  | Usage | Value Type | Definition      |
>> +|------------------------------------------------------|
>> +|  nid          |  R    |    <u32>   | Numa Node id    |
>> +|               |       |            | for this memory |
>> +|------------------------------------------------------|
>> +
>> +========================================================
>> +4 - Example memory nodes with numa information
>> +========================================================
>> +
>> +Example 1 (2 memory nodes, each mapped to a numa node.):
>> +
>> +     memory@00000000 {
>> +             device_type = "memory";
>> +             reg = <0x0 0x00000000 0x0 0x80000000>;
>> +             nid = <0x0>;
>> +     };
>> +
>> +     memory@10000000000 {
>> +             device_type = "memory";
>> +             reg = <0x100 0x00000000 0x0 0x80000000>;
>> +             nid = <0x1>;
>> +     };
>> +
>> +Example 2 (multiple memory ranges in each memory node and mapped to numa node):
>> +
>> +     memory@00000000 {
>> +             device_type = "memory";
>> +             reg = <0x0 0x00000000 0x0 0x80000000>,
>> +                   <0x1 0x00000000 0x0 0x80000000>;
>> +             nid = <0x0>;
>> +     };
>> +
>> +     memory@10000000000 {
>> +             device_type = "memory";
>> +             reg = <0x100 0x00000000 0x0 0x80000000>,
>> +                   <0x100 0x80000000 0x0 0x80000000>;
>> +             nid = <0x1>;
>> +     };
>> --
>> 1.8.1.4
>>
thanks
ganapat

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 1/4] arm64: defconfig: increase NR_CPUS range to 2-128
  2014-10-03 10:58         ` Mark Rutland
@ 2014-10-06  4:29           ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-10-06  4:29 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Fri, Oct 3, 2014 at 4:28 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Sep 25, 2014 at 10:03:56AM +0100, Ganapatrao Kulkarni wrote:
>> Raising the maximum limit to 128. This is needed for Cavium's
>> Thunder systems that will have 96 cores on Multi-node system.
>
> Has this been tested on any such systems?
This was tested on Cavium's Thunder 2-node simulator with kernel 3.17-rc5.
Please let us know if any specific configuration needs to be tested
with this change.
>
> Mark.
>
>>
>> Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com>
>> ---
>>  arch/arm64/Kconfig | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>> index 4d42453..a409105 100644
>> --- a/arch/arm64/Kconfig
>> +++ b/arch/arm64/Kconfig
>> @@ -296,8 +296,8 @@ config SCHED_SMT
>>         places. If unsure say N here.
>>
>>  config NR_CPUS
>> -     int "Maximum number of CPUs (2-64)"
>> -     range 2 64
>> +     int "Maximum number of CPUs (2-128)"
>> +     range 2 128
>>       depends on SMP
>>       # These have to remain sorted largest to smallest
>>       default "64"
>> --
>> 1.8.1.4
>>
thanks
ganapat

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-10-03 12:13         ` Mark Rutland
@ 2014-10-06  5:14           ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-10-06  5:14 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

Hi Mark,

On Fri, Oct 3, 2014 at 5:43 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> On Thu, Sep 25, 2014 at 10:03:59AM +0100, Ganapatrao Kulkarni wrote:
>> Adding numa support for arm64 based platforms.
>> This version creates numa mapping by parsing the dt table.
>> cpu to node id mapping is derived from cluster_id as defined in cpu-map.
>> memory to node id mapping is derived from nid property of memory node.
>
> [...]
>
>> +/*
>> + * Too small node sizes may confuse the VM badly. Usually they
>> + * result from BIOS bugs. So dont recognize nodes as standalone
>> + * NUMA entities that have less than this amount of RAM listed:
>> + */
>> +#define NODE_MIN_SIZE (4*1024*1024)
>
> Why do these confuse the VM? what does BIOS have to do with arm64?
This sneaked in from x86; I will remove it.
>
>> +
>> +#define parent_node(node)      (node)
>
> Huh?
For Thunder, there is no hierarchy among NUMA nodes. Shall I put this
under an ifdef or in a separate header file?
>
> [...]
>
>> @@ -168,6 +191,11 @@ void __init bootmem_init(void)
>>         min = PFN_UP(memblock_start_of_DRAM());
>>         max = PFN_DOWN(memblock_end_of_DRAM());
>>
>> +       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
>> +       max_pfn = max_low_pfn = max;
>> +
>> +       if (IS_ENABLED(CONFIG_NUMA))
>> +               arm64_numa_init();
>
> Is this function defined if !CONFIG_NUMA? Surely it must do nothing in
> that case anyway?
Yes, the if is not required; I will remove it.
>
> [...]
>
>> +/*
>> + *  Set the cpu to node and mem mapping
>> + */
>> +void numa_store_cpu_info(cpu)
>> +{
>> +       cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
>> +       cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
>> +       set_numa_node(cpu_to_node_map[cpu]);
>> +       set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
>> +}
>
> I don't like this. I think we need to be more explicit in the DT w.r.t.
> the relationship between memory and the CPU hierarchy.
>
> I can imagine that we might end up with systems with multiple levels of
> NUMA hierarchy (using MPIDR_EL1.Aff{3,2}), and I'd rather that we were
> explicit as possible from the start w.r.t. the relationship between
> memory and groups of CPUs such that we don't end up with multiple ways
> of specifying said relationship, and horrible edge cases around implicit
> definitions (e.g. the nid to cluster mapping).
Are you recommending an explicit nid attribute on each cpu device node?
>
>> +/**
>> + * dummy_numa_init - Fallback dummy NUMA init
>> + *
>> + * Used if there's no underlying NUMA architecture, NUMA initialization
>> + * fails, or NUMA is disabled on the command line.
>> + *
>> + * Must online at least one node and add memory blocks that cover all
>> + * allowed memory.  This function must not fail.
>> + */
>> +static int __init dummy_numa_init(void)
>> +{
>> +       pr_info("%s\n",
>> +              numa_off ? "NUMA turned off" : "No NUMA configuration found");
>
> Why not print "NUMA turned off" in numa_setup?
This function is entered only when NUMA is turned off or the DT/ACPI
based NUMA init fails.
>
>> +       pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
>> +              0LLU, PFN_PHYS(max_pfn) - 1);
>> +
>> +       node_set(0, numa_nodes_parsed);
>> +       numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
>> +
>> +       return 0;
>> +}
>> +
>> +/**
>> + * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
>> + */
>> +int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
>> +                                    int depth, void *data)
>> +{
>> +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
>> +       const __be32 *reg, *endp, *nid_prop;
>> +       int l, nid;
>> +
>> +       /* We are scanning "memory" nodes only */
>> +       if (type == NULL) {
>> +               /*
>> +                * The longtrail doesn't have a device_type on the
>> +                * /memory node, so look for the node called /memory@0.
>> +                */
>> +               if (depth != 1 || strcmp(uname, "memory@0") != 0)
>> +                       return 0;
>
> This has no place on arm64.
I am not sure that we can move this to drivers/of; at the moment this is
an arm64-specific binding.
>
> We limited the longtrail workaround in the core memory parsing to PPC32
> only in commit b44aa25d20e2ef6b (of: Handle memory@0 node on PPC32
> only). This code doesn't need it enabled ever.
>
> Are you booting using UEFI? This isn't going to work when the memory map
> comes from UEFI and we have no memory nodes in the DTB.
I tried with the bootwrapper and am working on booting from UEFI.
Yes, there is an issue with UEFI boot, since the memory node is removed.
I request the UEFI stub developers to suggest possible ways to address this.
>
> Mark.
thanks
ganapat

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 2/4] arm/arm64:dt:numa: adding numa node mapping for memory nodes.
  2014-10-06  4:20           ` Ganapatrao Kulkarni
@ 2014-10-06 11:08               ` Mark Rutland
  -1 siblings, 0 replies; 44+ messages in thread
From: Mark Rutland @ 2014-10-06 11:08 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, Oct 06, 2014 at 05:20:14AM +0100, Ganapatrao Kulkarni wrote:
> Hi Mark,
> 
> On Fri, Oct 3, 2014 at 4:35 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Thu, Sep 25, 2014 at 10:03:57AM +0100, Ganapatrao Kulkarni wrote:
> >> Adding Documentation for dt binding for memory to numa node mapping.
> >
> > As I previously commented [1], this binding doesn't specify what a nid
> > maps to in terms of the CPU hierarchy, and is thus unusable. The binding
> > absolutely must be explicit about this, and NAK until it is.
> The nid (numa node id) maps each memory range/bank to a numa node.

The issue is that what constitutes a "numa node" is not defined. Hence
mapping a memory bank to a "nid" is just a mapping to an arbitrary
number -- the mapping of this number to CPUs isn't defined.

> IIUC, NUMA manages resources based on which node they are tied to.
> With nid, I am trying to map the memory range to a node.
> The same applies to all IO peripherals and to CPUs.
> For CPUs, I am using cluster-id as a node id to map all CPUs to a node.

I strongly suspect that this is not going to work for very long. I don't
think relying on a mapping of nid to a top-level cluster-id is a good
idea, especially given we have the facility to be more explicit through
use of the cpu-map.

We don't need to handle all the possible cases from the start, but I'd
rather we consistently used the cpu-map to explicitly define the
relationship between CPUs and memory.

> Thunder has 2 nodes; in this patch, I have grouped all CPUs that
> belong to each node under a cluster-id (cluster0, cluster1).
>
> > Given we're seeing systems with increasing numbers of CPUs and
> > increasingly complex interconnect hierarchies, I would expect at minimum
> > that we would refer to elements in the cpu-map to define the
> > relationship between memory banks and CPUs.
> >
> > What does the interconnect/memory hierarchy look like in your system?
> 
> In Thunder, 2 SoCs (each with 48 cores, RAM controllers and IOs) can
> be connected to form a 2-node NUMA system.
> Within a SoC (within a node) there is no hierarchy with respect to memory or
> IO access. However, w.r.t. GICv3, the 48 cores in each SoC/node are
> split into 3 clusters of 16 cores each.
>
> the MPIDR mapping for this topology is,
> Aff0 is mapped to 16 cores within a cluster. Valid range is 0 to 0xf
> Aff1 is mapped to cluster number, valid values are 0,1 and 2.
> Aff2 is mapped to Socket-id/node id/SoC number. Valid values are 0 and 1.

Thanks for the information.

Mark.
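
As an illustration of the MPIDR layout described in the quoted text (Aff0 = core within a cluster, Aff1 = cluster, Aff2 = socket/node), here is a minimal C sketch that decodes those fields and derives a flat CPU index. The field positions follow the architectural MPIDR_EL1 layout (Aff0 in bits 7:0, Aff1 in bits 15:8, Aff2 in bits 23:16). The names mpidr_aff, cpu_location, thunder_locate_cpu and thunder_linear_cpu are hypothetical and are not part of the patch; the constants 16 and 3 come from the topology described above.

```c
#include <stdint.h>

#define MPIDR_LEVEL_BITS	8
#define MPIDR_LEVEL_MASK	0xffu

#define CORES_PER_CLUSTER	16	/* Aff0 range 0..0xf, per the thread */
#define CLUSTERS_PER_NODE	3	/* Aff1 values 0, 1 and 2 */

/* Where a CPU sits in the 2-node topology described in the thread. */
struct cpu_location {
	unsigned int node;	/* Aff2: socket / NUMA node */
	unsigned int cluster;	/* Aff1: cluster within the node */
	unsigned int core;	/* Aff0: core within the cluster */
};

/*
 * Extract affinity level 0..2 from an MPIDR_EL1 value.
 * Aff3 (bits 39:32) is deliberately not handled in this sketch.
 */
unsigned int mpidr_aff(uint64_t mpidr, unsigned int level)
{
	return (unsigned int)(mpidr >> (level * MPIDR_LEVEL_BITS)) &
	       MPIDR_LEVEL_MASK;
}

/* Split an MPIDR_EL1 value into node, cluster and core. */
struct cpu_location thunder_locate_cpu(uint64_t mpidr)
{
	struct cpu_location loc;

	loc.core = mpidr_aff(mpidr, 0);
	loc.cluster = mpidr_aff(mpidr, 1);
	loc.node = mpidr_aff(mpidr, 2);
	return loc;
}

/* Flat CPU index: 48 cores per node, 16 cores per cluster. */
unsigned int thunder_linear_cpu(struct cpu_location loc)
{
	return (loc.node * CLUSTERS_PER_NODE + loc.cluster) *
	       CORES_PER_CLUSTER + loc.core;
}
```

With this layout the node id comes from Aff2 rather than from a top-level cluster-id, and the highest flat index in a 2-node system is 95, consistent with the 96 cores mentioned for patch 1/4.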

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-10-06  5:14           ` Ganapatrao Kulkarni
@ 2014-10-06 11:26               ` Mark Rutland
  -1 siblings, 0 replies; 44+ messages in thread
From: Mark Rutland @ 2014-10-06 11:26 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	leif.lindholm-QSEj5FYQhm4dnm+yROfE0A,
	roy.franz-QSEj5FYQhm4dnm+yROfE0A,
	ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A,
	msalter-H+wXaHxf7aLQT0dZR+AlfA

On Mon, Oct 06, 2014 at 06:14:36AM +0100, Ganapatrao Kulkarni wrote:
> Hi Mark,
> 
> On Fri, Oct 3, 2014 at 5:43 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Thu, Sep 25, 2014 at 10:03:59AM +0100, Ganapatrao Kulkarni wrote:
> >> Adding numa support for arm64 based platforms.
> >> This version creates numa mapping by parsing the dt table.
> >> cpu to node id mapping is derived from cluster_id as defined in cpu-map.
> >> memory to node id mapping is derived from nid property of memory node.
> >
> > [...]
> >
> >> +/*
> >> + * Too small node sizes may confuse the VM badly. Usually they
> >> + * result from BIOS bugs. So dont recognize nodes as standalone
> >> + * NUMA entities that have less than this amount of RAM listed:
> >> + */
> >> +#define NODE_MIN_SIZE (4*1024*1024)
> >
> > Why do these confuse the VM? what does BIOS have to do with arm64?
> This sneaked in from x86; I will remove it.
> >
> >> +
> >> +#define parent_node(node)      (node)
> >
> > Huh?
> For Thunder, there is no hierarchy among NUMA nodes. Shall I put this
> under an ifdef or in a separate header file?

I was confused by a node being its own parent, but that seems to be the
case elsewhere for parent_node() implementations. Please at least have a
comment that we're assuming a flat hierarchy (for now).

> >
> > [...]
> >
> >> @@ -168,6 +191,11 @@ void __init bootmem_init(void)
> >>         min = PFN_UP(memblock_start_of_DRAM());
> >>         max = PFN_DOWN(memblock_end_of_DRAM());
> >>
> >> +       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
> >> +       max_pfn = max_low_pfn = max;
> >> +
> >> +       if (IS_ENABLED(CONFIG_NUMA))
> >> +               arm64_numa_init();
> >
> > Is this function defined if !CONFIG_NUMA? Surely it must do nothing in
> > that case anyway?
> Yes, the if is not required; I will remove it.
> >
> > [...]
> >
> >> +/*
> >> + *  Set the cpu to node and mem mapping
> >> + */
> >> +void numa_store_cpu_info(cpu)
> >> +{
> >> +       cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
> >> +       cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
> >> +       set_numa_node(cpu_to_node_map[cpu]);
> >> +       set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
> >> +}
> >
> > I don't like this. I think we need to be more explicit in the DT w.r.t.
> > the relationship between memory and the CPU hierarchy.
> >
> > I can imagine that we might end up with systems with multiple levels of
> > NUMA hierarchy (using MPIDR_EL1.Aff{3,2}), and I'd rather that we were
> > explicit as possible from the start w.r.t. the relationship between
> > memory and groups of CPUs such that we don't end up with multiple ways
> > of specifying said relationship, and horrible edge cases around implicit
> > definitions (e.g. the nid to cluster mapping).
> Are you recommending an explicit nid attribute on each cpu device node?

I am recommending that we make the relationship explicit. If anything,
using the cpu-map (with phandles) seems like a better approach.

> >> +/**
> >> + * dummy_numa_init - Fallback dummy NUMA init
> >> + *
> >> + * Used if there's no underlying NUMA architecture, NUMA initialization
> >> + * fails, or NUMA is disabled on the command line.
> >> + *
> >> + * Must online at least one node and add memory blocks that cover all
> >> + * allowed memory.  This function must not fail.
> >> + */
> >> +static int __init dummy_numa_init(void)
> >> +{
> >> +       pr_info("%s\n",
> >> +              numa_off ? "NUMA turned off" : "No NUMA configuration found");
> >
> > Why not print "NUMA turned off" in numa_setup?
> This function is entered only when NUMA is turned off or the DT/ACPI
> based NUMA init fails.

Sure. Moving the "NUMA turned off" print into numa_setup would mean you
could just print "Using dummy NUMA layout" or something to that effect
here -- the function has no need to care about the value of numa_off.
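
A minimal userspace sketch of the split suggested above, assuming the "numa=" option is handled by a function named numa_setup as in the discussion. The bodies below are illustrative stand-ins rather than the patch's code: puts() replaces pr_info(), numa_is_off() is a hypothetical accessor, and the real function would also have to register the fake node.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

static bool numa_off;

/*
 * Hypothetical handler for the "numa=" command-line option. The
 * "NUMA turned off" message is printed here, at the point where the
 * decision is made.
 */
int numa_setup(const char *opt)
{
	if (opt == NULL)
		return -1;
	if (strncmp(opt, "off", 3) == 0) {
		puts("NUMA turned off");
		numa_off = true;
	}
	return 0;
}

/* Hypothetical accessor so callers need not touch the flag directly. */
bool numa_is_off(void)
{
	return numa_off;
}

/*
 * Fallback init: it no longer looks at numa_off, it only reports
 * that the dummy layout is in use. Registering the single fake node
 * is omitted from this sketch.
 */
int dummy_numa_init(void)
{
	puts("Using dummy NUMA layout");
	return 0;
}
```

The point of the split is that each message is printed by the code that knows the reason, so the fallback path stays the same whether NUMA was disabled on the command line or the DT/ACPI parsing failed.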

> >
> >> +       pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> >> +              0LLU, PFN_PHYS(max_pfn) - 1);
> >> +
> >> +       node_set(0, numa_nodes_parsed);
> >> +       numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +/**
> >> + * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
> >> + */
> >> +int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
> >> +                                    int depth, void *data)
> >> +{
> >> +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> >> +       const __be32 *reg, *endp, *nid_prop;
> >> +       int l, nid;
> >> +
> >> +       /* We are scanning "memory" nodes only */
> >> +       if (type == NULL) {
> >> +               /*
> >> +                * The longtrail doesn't have a device_type on the
> >> +                * /memory node, so look for the node called /memory@0.
> >> +                */
> >> +               if (depth != 1 || strcmp(uname, "memory@0") != 0)
> >> +                       return 0;
> >
> > This has no place on arm64.
> I am not sure that we can move this to drivers/of; at the moment this is
> an arm64-specific binding.

I meant the longtrail workaround, hence my comments on that below.

> > We limited the longtrail workaround in the core memory parsing to PPC32
> > only in commit b44aa25d20e2ef6b (of: Handle memory@0 node on PPC32
> > only). This code doesn't need it enabled ever.
> >
> > Are you booting using UEFI? This isn't going to work when the memory map
> > comes from UEFI and we have no memory nodes in the DTB.
> I tried with the bootwrapper and am working on booting from UEFI.
> Yes, there is an issue with UEFI boot, since the memory node is removed.
> I request the UEFI stub developers to suggest possible ways to address this.

I've Cc'd a few people who have worked on the stub and/or EFI memory map
stuff. It would be worth keeping them on Cc so as to keep them informed.

I believe that the EFI stub is doing the right thing by ensuring that
the EFI memory map is used, so this is just another configuration that
your binding has to consider.

Mark.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
@ 2014-10-06 11:26               ` Mark Rutland
  0 siblings, 0 replies; 44+ messages in thread
From: Mark Rutland @ 2014-10-06 11:26 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Oct 06, 2014 at 06:14:36AM +0100, Ganapatrao Kulkarni wrote:
> Hi Mark,
> 
> On Fri, Oct 3, 2014 at 5:43 PM, Mark Rutland <mark.rutland@arm.com> wrote:
> > On Thu, Sep 25, 2014 at 10:03:59AM +0100, Ganapatrao Kulkarni wrote:
> >> Adding numa support for arm64 based platforms.
> >> This version creates numa mapping by parsing the dt table.
> >> cpu to node id mapping is derived from cluster_id as defined in cpu-map.
> >> memory to node id mapping is derived from nid property of memory node.
> >
> > [...]
> >
> >> +/*
> >> + * Too small node sizes may confuse the VM badly. Usually they
> >> + * result from BIOS bugs. So dont recognize nodes as standalone
> >> + * NUMA entities that have less than this amount of RAM listed:
> >> + */
> >> +#define NODE_MIN_SIZE (4*1024*1024)
> >
> > Why do these confuse the VM? what does BIOS have to do with arm64?
> sneaked in from x86, will remove this.
> >
> >> +
> >> +#define parent_node(node)      (node)
> >
> > Huh?
> for thunder, no hierarchy at numa nodes. shall i put under ifdef or
> separate header file?

I was confused by a node being its own parent, but that seems to be the
case elsewhere for parent_node() implementations. Please at least have a
comment that we're assuming a flat hierarchy (for now).

> >
> > [...]
> >
> >> @@ -168,6 +191,11 @@ void __init bootmem_init(void)
> >>         min = PFN_UP(memblock_start_of_DRAM());
> >>         max = PFN_DOWN(memblock_end_of_DRAM());
> >>
> >> +       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
> >> +       max_pfn = max_low_pfn = max;
> >> +
> >> +       if (IS_ENABLED(CONFIG_NUMA))
> >> +               arm64_numa_init();
> >
> > Is this function defined if !CONFIG_NUMA? Surely it must do nothing in
> > that case anyway?
> yes, if is not required, will remove it.
> >
> > [...]
> >
> >> +/*
> >> + *  Set the cpu to node and mem mapping
> >> + */
> >> +void numa_store_cpu_info(cpu)
> >> +{
> >> +       cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
> >> +       cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
> >> +       set_numa_node(cpu_to_node_map[cpu]);
> >> +       set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
> >> +}
> >
> > I don't like this. I think we need to be more explicit in the DT w.r.t.
> > the relationship between memory and the CPU hierarchy.
> >
> > I can imagine that we might end up with systems with multiple levels of
> > NUMA hierarchy (using MPIDR_EL1.Aff{3,2}), and I'd rather that we were
> > explcit as possible from the start w.r.t. the relationship between
> > memory and groups of CPUs such that we don't end up with multiple ways
> > of specifying said relationship, and horrible edge cases around implicit
> > definitions (e.g. the nid to cluster mapping).
> are you recomending to have explicit nid attribute to each cpu device node?

I am recommending that we make the relationship explicit. If anything,
using the cpu-map (with phandles) seems like a better approach.

> >> +/**
> >> + * dummy_numa_init - Fallback dummy NUMA init
> >> + *
> >> + * Used if there's no underlying NUMA architecture, NUMA initialization
> >> + * fails, or NUMA is disabled on the command line.
> >> + *
> >> + * Must online at least one node and add memory blocks that cover all
> >> + * allowed memory.  This function must not fail.
> >> + */
> >> +static int __init dummy_numa_init(void)
> >> +{
> >> +       pr_info("%s\n",
> >> +              numa_off ? "NUMA turned off" : "No NUMA configuration found");
> >
> > Why not print "NUMA turned off" in numa_setup?
> enters this function only when, numa is turned off or the DT/ACPI
> based numa init fails.

Sure. Moving the "NUMA turned off" print into numa_setup would mean you
could just print "Using dummy NUMA layout" or something to that effect
here -- the function has no need to care about the value of numa_off.

> >
> >> +       pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> >> +              0LLU, PFN_PHYS(max_pfn) - 1);
> >> +
> >> +       node_set(0, numa_nodes_parsed);
> >> +       numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> >> +
> >> +       return 0;
> >> +}
> >> +
> >> +/**
> >> + * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
> >> + */
> >> +int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
> >> +                                    int depth, void *data)
> >> +{
> >> +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> >> +       const __be32 *reg, *endp, *nid_prop;
> >> +       int l, nid;
> >> +
> >> +       /* We are scanning "memory" nodes only */
> >> +       if (type == NULL) {
> >> +               /*
> >> +                * The longtrail doesn't have a device_type on the
> >> +                * /memory node, so look for the node called /memory@0.
> >> +                */
> >> +               if (depth != 1 || strcmp(uname, "memory@0") != 0)
> >> +                       return 0;
> >
> > This has no place on arm64.
> i am not sure that we can move to drivers/of, at this moment this is
> arm64 specific binding.

I meant the longtrail workaround, hence my comments on that below.

> > We limited the longtrail workaround in the core memory parsing to PPC32
> > only in commit b44aa25d20e2ef6b (of: Handle memory@0 node on PPC32
> > only). This code doesn't need it enabled ever.
> >
> > Are you booting using UEFI? This isn't going to work when the memory map
> tried with bootwrapper, working on booting from UEFI.
> > comes from UEFI and we have no memory nodes in the DTB.
> yes, there is an issue with UEFI boot, since the memory node is removed.
> i request the UEFI stub dev-team to suggest possible ways to address this.

I've Cc'd a few people who have worked on the stub and/or EFI memory map
stuff. It would be worth keeping them on Cc so as to keep them informed.

I believe that the EFI stub is doing the right thing by ensuring that
the EFI memory map is used, so this is just another configuration that
your binding has to consider.

Mark.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 2/4] arm/arm64:dt:numa: adding numa node mapping for memory nodes.
  2014-10-06 11:08               ` Mark Rutland
@ 2014-10-06 17:26                 ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-10-06 17:26 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r

On Mon, Oct 6, 2014 at 4:38 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
> On Mon, Oct 06, 2014 at 05:20:14AM +0100, Ganapatrao Kulkarni wrote:
>> Hi Mark,
>>
>> On Fri, Oct 3, 2014 at 4:35 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
>> > On Thu, Sep 25, 2014 at 10:03:57AM +0100, Ganapatrao Kulkarni wrote:
>> >> Adding Documentation for dt binding for memory to numa node mapping.
>> >
>> > As I previously commented [1], this binding doesn't specify what a nid
>> > maps to in terms of the CPU hierarchy, and is thus unusable. The binding
>> > absolutely must be explicit about this, and NAK until it is.
>> The nid/numa node id is to map each memory range/bank to a numa node.
>
> The issue is that what constitutes a "numa node" is not defined. Hence the
> mapping of memory banks to a "nid" is just a mapping to an arbitrary
> number -- the mapping of this number to CPUs isn't defined.
>
>> IIUC, the numa manages the resources based on which node they are tied to.
>> with nid, i am trying to map the memory range to a node.
>> Same follows for the all IO peripherals and for CPUs.
>> for cpus, i am using cluster-id as a node id to map all cpus to node.
>
> I strongly suspect that this is not going to work for very long. I don't
> think relying on a mapping of nid to a top-level cluster-id is a good
> idea, especially given we have the facility to be more explicit through
> use of the cpu-map.
>
> We don't need to handle all the possible cases from the start, but I'd
> rather we consistently used the cpu-map to explicitly define the
> relationship between CPUs and memory.
agreed, will implement nid mapping in cpu-map in v2 patchset.
>
>> thunder has 2 nodes; in this patch, i have grouped all cpus which
>> belong to each node under a cluster-id (cluster0, cluster1).
>>
>> > Given we're seeing systems with increasing numbers of CPUs and
>> > increasingly complex interconnect hierarchies, I would expect at minimum
>> > that we would refer to elements in the cpu-map to define the
>> > relationship between memory banks and CPUs.
>> >
>> > What does the interconnect/memory hierarchy look like in your system?
>>
>> In thunder, 2 SoCs (each has 48 cores and ram controllers and IOs) can
>> be connected to form a 2 node NUMA system.
>> Within a SoC (within a node) there is no hierarchy with respect to memory or
>> IO access. However, w.r.t. GICv3,
>> the 48 cores in each SoC/node are split into 3 clusters of 16 cores each.
>>
>> the MPIDR mapping for this topology is,
>> Aff0 is mapped to 16 cores within a cluster. Valid range is 0 to 0xf
>> Aff1 is mapped to cluster number, valid values are 0,1 and 2.
>> Aff2 is mapped to Socket-id/node id/SoC number. Valid values are 0 and 1.
>
> Thanks for the information.
>
> Mark.
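
For reference, the MPIDR layout described above could be decoded roughly
as follows. This is only a sketch: the field positions follow the
architectural MPIDR_EL1 affinity layout, while the thunder_mpidr_to_*()
helpers are hypothetical names that are not part of the patch set.

```c
#include <stdint.h>

/* MPIDR_EL1 affinity fields: Aff0 = bits [7:0], Aff1 = [15:8], Aff2 = [23:16]. */
#define MPIDR_AFF0(m)	((unsigned int)((m) & 0xff))
#define MPIDR_AFF1(m)	((unsigned int)(((m) >> 8) & 0xff))
#define MPIDR_AFF2(m)	((unsigned int)(((m) >> 16) & 0xff))

/* Hypothetical helper: with the layout above, Aff2 is the socket/node id. */
static inline unsigned int thunder_mpidr_to_node(uint64_t mpidr)
{
	return MPIDR_AFF2(mpidr);
}

/*
 * Hypothetical helper: linear core index within a node, assuming
 * 3 clusters (Aff1) of 16 cores (Aff0) each.
 */
static inline unsigned int thunder_mpidr_to_core(uint64_t mpidr)
{
	return MPIDR_AFF1(mpidr) * 16 + MPIDR_AFF0(mpidr);
}
```

With this layout, an MPIDR value of 0x010205 decodes to node 1, cluster 2,
core 5, i.e. core index 37 within the node.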
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-10-06 11:26               ` Mark Rutland
@ 2014-10-06 17:52                 ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-10-06 17:52 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Leif Lindholm,
	roy.franz-QSEj5FYQhm4dnm+yROfE0A,
	ard.biesheuvel-QSEj5FYQhm4dnm+yROfE0A,
	msalter-H+wXaHxf7aLQT0dZR+AlfA

On Mon, Oct 6, 2014 at 4:56 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
> On Mon, Oct 06, 2014 at 06:14:36AM +0100, Ganapatrao Kulkarni wrote:
>> Hi Mark,
>>
>> On Fri, Oct 3, 2014 at 5:43 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
>> > On Thu, Sep 25, 2014 at 10:03:59AM +0100, Ganapatrao Kulkarni wrote:
>> >> Adding numa support for arm64 based platforms.
>> >> This version creates numa mapping by parsing the dt table.
>> >> cpu to node id mapping is derived from cluster_id as defined in cpu-map.
>> >> memory to node id mapping is derived from nid property of memory node.
>> >
>> > [...]
>> >
>> >> +/*
>> >> + * Too small node sizes may confuse the VM badly. Usually they
>> >> + * result from BIOS bugs. So dont recognize nodes as standalone
>> >> + * NUMA entities that have less than this amount of RAM listed:
>> >> + */
>> >> +#define NODE_MIN_SIZE (4*1024*1024)
>> >
>> > Why do these confuse the VM? what does BIOS have to do with arm64?
>> sneaked in from x86, will remove this.
>> >
>> >> +
>> >> +#define parent_node(node)      (node)
>> >
>> > Huh?
>> for thunder, no hierarchy at numa nodes. shall i put under ifdef or
>> separate header file?
>
> I was confused by a node being its own parent, but that seems to be the
> case elsewhere for parent_node() implementations. Please at least have a
> comment that we're assuming a flat hierarchy (for now).
sure, will add the required comments.
>
>> >
>> > [...]
>> >
>> >> @@ -168,6 +191,11 @@ void __init bootmem_init(void)
>> >>         min = PFN_UP(memblock_start_of_DRAM());
>> >>         max = PFN_DOWN(memblock_end_of_DRAM());
>> >>
>> >> +       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
>> >> +       max_pfn = max_low_pfn = max;
>> >> +
>> >> +       if (IS_ENABLED(CONFIG_NUMA))
>> >> +               arm64_numa_init();
>> >
>> > Is this function defined if !CONFIG_NUMA? Surely it must do nothing in
>> > that case anyway?
>> yes, if is not required, will remove it.
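
One common way to drop the check at the call site is a no-op stub for
the !CONFIG_NUMA case. The sketch below is a userspace model only:
CONFIG_NUMA is treated as a plain preprocessor macro, and
numa_init_calls and bootmem_init_model() are hypothetical names added to
make the effect observable.

```c
/* Hypothetical counter, for illustration only. */
static int numa_init_calls;

#ifdef CONFIG_NUMA
static void arm64_numa_init(void)
{
	numa_init_calls++;	/* the real function would parse the DT here */
}
#else
/* Stub: always declared, so callers need no IS_ENABLED() or #ifdef. */
static inline void arm64_numa_init(void) { }
#endif

/* Model of the call site in bootmem_init(). */
static void bootmem_init_model(void)
{
	arm64_numa_init();
}
```

Built without CONFIG_NUMA defined, the call compiles to nothing and the
counter stays at zero.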
>> >
>> > [...]
>> >
>> >> +/*
>> >> + *  Set the cpu to node and mem mapping
>> >> + */
>> >> +void numa_store_cpu_info(cpu)
>> >> +{
>> >> +       cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
>> >> +       cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
>> >> +       set_numa_node(cpu_to_node_map[cpu]);
>> >> +       set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
>> >> +}
>> >
>> > I don't like this. I think we need to be more explicit in the DT w.r.t.
>> > the relationship between memory and the CPU hierarchy.
>> >
>> > I can imagine that we might end up with systems with multiple levels of
>> > NUMA hierarchy (using MPIDR_EL1.Aff{3,2}), and I'd rather that we were
>> > as explicit as possible from the start w.r.t. the relationship between
>> > memory and groups of CPUs such that we don't end up with multiple ways
>> > of specifying said relationship, and horrible edge cases around implicit
>> > definitions (e.g. the nid to cluster mapping).
>> are you recommending to have an explicit nid attribute on each cpu device node?
>
> I am recommending that we make the relationship explicit. If anything,
> using the cpu-map (with phandles) seems like a better approach.
yes, will add the mapping.
>
>> >> +/**
>> >> + * dummy_numa_init - Fallback dummy NUMA init
>> >> + *
>> >> + * Used if there's no underlying NUMA architecture, NUMA initialization
>> >> + * fails, or NUMA is disabled on the command line.
>> >> + *
>> >> + * Must online at least one node and add memory blocks that cover all
>> >> + * allowed memory.  This function must not fail.
>> >> + */
>> >> +static int __init dummy_numa_init(void)
>> >> +{
>> >> +       pr_info("%s\n",
>> >> +              numa_off ? "NUMA turned off" : "No NUMA configuration found");
>> >
>> > Why not print "NUMA turned off" in numa_setup?
>> this function is entered only when numa is turned off or the DT/ACPI
>> based numa init fails.
>
> Sure. Moving the "NUMA turned off" print into numa_setup would mean you
> could just print "Using dummy NUMA layout" or something to that effect
> here -- the function has no need to care about the value of numa_off.
agreed.
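
A rough userspace sketch of that split is below. It assumes a
"numa=off" style early parameter handled by numa_setup(); the *_model()
names are made up for illustration and return the message instead of
calling pr_info().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static bool numa_off;

/*
 * Model of the early parameter handler: it owns numa_off and is the
 * only place that reports "NUMA turned off". Returns the message it
 * would log, or NULL when nothing is logged.
 */
static const char *numa_setup_model(const char *opt)
{
	if (opt && strncmp(opt, "off", 3) == 0) {
		numa_off = true;
		return "NUMA turned off";
	}
	return NULL;
}

/* The fallback path no longer needs to look at numa_off. */
static const char *dummy_numa_init_model(void)
{
	return "Using dummy NUMA layout";
}
```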
>
>> >
>> >> +       pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
>> >> +              0LLU, PFN_PHYS(max_pfn) - 1);
>> >> +
>> >> +       node_set(0, numa_nodes_parsed);
>> >> +       numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
>> >> +
>> >> +       return 0;
>> >> +}
>> >> +
>> >> +/**
>> >> + * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
>> >> + */
>> >> +int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
>> >> +                                    int depth, void *data)
>> >> +{
>> >> +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
>> >> +       const __be32 *reg, *endp, *nid_prop;
>> >> +       int l, nid;
>> >> +
>> >> +       /* We are scanning "memory" nodes only */
>> >> +       if (type == NULL) {
>> >> +               /*
>> >> +                * The longtrail doesn't have a device_type on the
>> >> +                * /memory node, so look for the node called /memory@0.
>> >> +                */
>> >> +               if (depth != 1 || strcmp(uname, "memory@0") != 0)
>> >> +                       return 0;
>> >
>> > This has no place on arm64.
>> i am not sure that we can move to drivers/of, at this moment this is
>> arm64 specific binding.
>
> I meant the longtrail workaround, hence my comments on that below.
oh! thanks, will remove this.
>
>> > We limited the longtrail workaround in the core memory parsing to PPC32
>> > only in commit b44aa25d20e2ef6b (of: Handle memory@0 node on PPC32
>> > only). This code doesn't need it enabled ever.
>> >
>> > Are you booting using UEFI? This isn't going to work when the memory map
>> tried with bootwrapper, working on booting from UEFI.
>> > comes from UEFI and we have no memory nodes in the DTB.
>> yes, there is an issue with UEFI boot, since the memory node is removed.
>> i request the UEFI stub dev-team to suggest possible ways to address this.
>
> I've Cc'd a few people who have worked on the stub and/or EFI memory map
> stuff. It would be worth keeping them on Cc so as to keep them informed.
thanks.
>
> I believe that the EFI stub is doing the right thing by ensuring that
> the EFI memory map is used, so this is just another configuration that
> your binding has to consider.
going through EFI stub, next is to boot numa kernel using UEFI, will
include UEFI boot support in v2 patch.
>
> Mark.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-10-06 17:52                 ` Ganapatrao Kulkarni
@ 2014-10-17 17:19                     ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-10-17 17:19 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Leif Lindholm,
	Roy Franz, Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA,
	steve.capper-QSEj5FYQhm4dnm+yROfE0A

Hi All,

On Mon, Oct 6, 2014 at 11:22 PM, Ganapatrao Kulkarni
<gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On Mon, Oct 6, 2014 at 4:56 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
>> On Mon, Oct 06, 2014 at 06:14:36AM +0100, Ganapatrao Kulkarni wrote:
>>> Hi Mark,
>>>
>>> On Fri, Oct 3, 2014 at 5:43 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
>>> > On Thu, Sep 25, 2014 at 10:03:59AM +0100, Ganapatrao Kulkarni wrote:
>>> >> Adding numa support for arm64 based platforms.
>>> >> This version creates numa mapping by parsing the dt table.
>>> >> cpu to node id mapping is derived from cluster_id as defined in cpu-map.
>>> >> memory to node id mapping is derived from nid property of memory node.
>>> >
>>> > [...]
>>> >
>>> >> +/*
>>> >> + * Too small node sizes may confuse the VM badly. Usually they
>>> >> + * result from BIOS bugs. So dont recognize nodes as standalone
>>> >> + * NUMA entities that have less than this amount of RAM listed:
>>> >> + */
>>> >> +#define NODE_MIN_SIZE (4*1024*1024)
>>> >
>>> > Why do these confuse the VM? what does BIOS have to do with arm64?
>>> sneaked in from x86, will remove this.
>>> >
>>> >> +
>>> >> +#define parent_node(node)      (node)
>>> >
>>> > Huh?
>>> for thunder, no hierarchy at numa nodes. shall i put under ifdef or
>>> separate header file?
>>
>> I was confused by a node being its own parent, but that seems to be the
>> case elsewhere for parent_node() implementations. Please at least have a
>> comment that we're assuming a flat hierarchy (for now).
> sure, will add the required comments.
>>
>>> >
>>> > [...]
>>> >
>>> >> @@ -168,6 +191,11 @@ void __init bootmem_init(void)
>>> >>         min = PFN_UP(memblock_start_of_DRAM());
>>> >>         max = PFN_DOWN(memblock_end_of_DRAM());
>>> >>
>>> >> +       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
>>> >> +       max_pfn = max_low_pfn = max;
>>> >> +
>>> >> +       if (IS_ENABLED(CONFIG_NUMA))
>>> >> +               arm64_numa_init();
>>> >
>>> > Is this function defined if !CONFIG_NUMA? Surely it must do nothing in
>>> > that case anyway?
>>> yes, if is not required, will remove it.
>>> >
>>> > [...]
>>> >
>>> >> +/*
>>> >> + *  Set the cpu to node and mem mapping
>>> >> + */
>>> >> +void numa_store_cpu_info(cpu)
>>> >> +{
>>> >> +       cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
>>> >> +       cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
>>> >> +       set_numa_node(cpu_to_node_map[cpu]);
>>> >> +       set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
>>> >> +}
>>> >
>>> > I don't like this. I think we need to be more explicit in the DT w.r.t.
>>> > the relationship between memory and the CPU hierarchy.
>>> >
>>> > I can imagine that we might end up with systems with multiple levels of
>>> > NUMA hierarchy (using MPIDR_EL1.Aff{3,2}), and I'd rather that we were
>>> > as explicit as possible from the start w.r.t. the relationship between
>>> > memory and groups of CPUs such that we don't end up with multiple ways
>>> > of specifying said relationship, and horrible edge cases around implicit
>>> > definitions (e.g. the nid to cluster mapping).
>>> are you recommending to have an explicit nid attribute on each cpu device node?
>>
>> I am recommending that we make the relationship explicit. If anything,
>> using the cpu-map (with phandles) seems like a better approach.
> yes, will add the mapping.
>>
>>> >> +/**
>>> >> + * dummy_numa_init - Fallback dummy NUMA init
>>> >> + *
>>> >> + * Used if there's no underlying NUMA architecture, NUMA initialization
>>> >> + * fails, or NUMA is disabled on the command line.
>>> >> + *
>>> >> + * Must online at least one node and add memory blocks that cover all
>>> >> + * allowed memory.  This function must not fail.
>>> >> + */
>>> >> +static int __init dummy_numa_init(void)
>>> >> +{
>>> >> +       pr_info("%s\n",
>>> >> +              numa_off ? "NUMA turned off" : "No NUMA configuration found");
>>> >
>>> > Why not print "NUMA turned off" in numa_setup?
>>> this function is entered only when numa is turned off or the DT/ACPI
>>> based numa init fails.
>>
>> Sure. Moving the "NUMA turned off" print into numa_setup would mean you
>> could just print "Using dummy NUMA layout" or something to that effect
>> here -- the function has no need to care about the value of numa_off.
> agreed.
>>
>>> >
>>> >> +       pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
>>> >> +              0LLU, PFN_PHYS(max_pfn) - 1);
>>> >> +
>>> >> +       node_set(0, numa_nodes_parsed);
>>> >> +       numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
>>> >> +
>>> >> +       return 0;
>>> >> +}
>>> >> +
>>> >> +/**
>>> >> + * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
>>> >> + */
>>> >> +int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
>>> >> +                                    int depth, void *data)
>>> >> +{
>>> >> +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
>>> >> +       const __be32 *reg, *endp, *nid_prop;
>>> >> +       int l, nid;
>>> >> +
>>> >> +       /* We are scanning "memory" nodes only */
>>> >> +       if (type == NULL) {
>>> >> +               /*
>>> >> +                * The longtrail doesn't have a device_type on the
>>> >> +                * /memory node, so look for the node called /memory@0.
>>> >> +                */
>>> >> +               if (depth != 1 || strcmp(uname, "memory@0") != 0)
>>> >> +                       return 0;
>>> >
>>> > This has no place on arm64.
>>> i am not sure that we can move to drivers/of, at this moment this is
>>> arm64 specific binding.
>>
>> I meant the longtrail workaround, hence my comments on that below.
> oh! thanks, will remove this.
>>
>>> > We limited the longtrail workaround in the core memory parsing to PPC32
>>> > only in commit b44aa25d20e2ef6b (of: Handle memory@0 node on PPC32
>>> > only). This code doesn't need it enabled ever.
>>> >
>>> > Are you booting using UEFI? This isn't going to work when the memory map
>>> tried with bootwrapper, working on booting from UEFI.
>>> > comes from UEFI and we have no memory nodes in the DTB.
>>> yes, there is an issue with UEFI boot, since the memory node is removed.
>>> i request the UEFI stub dev-team to suggest possible ways to address this.
>>
>> I've Cc'd a few people who have worked on the stub and/or EFI memory map
>> stuff. It would be worth keeping them on Cc so as to keep them informed.
> thanks.
>>
>> I believe that the EFI stub is doing the right thing by ensuring that
>> the EFI memory map is used, so this is just another configuration that
>> your binding has to consider.
> going through EFI stub, next is to boot numa kernel using UEFI, will
> include UEFI boot support in v2 patch.
>>
>> Mark.
Below is an example of the proposed numa bindings in DT.
It covers the cpu to node mapping and the memory range to node mapping,
and also defines the proximity distance matrix between nodes.
Please let me know your comments so that I can go ahead with the implementation.

numa-map {
        /*
         * #address-cells is used for the memory range base address in
         * mem-map. For all others, #size-cells is used.
         * #node-count tells the number of numa nodes in the system.
         */
        #address-cells = <2>;
        #size-cells = <1>;
        #node-count = <4>;

        /* Memmap for memory ranges on each node */
        mem-map = <0x0   0x00c00000 0>,
                  <0x1   0x00000000 1>,
                  <0x100 0x00000000 2>,
                  <0x200 0x00000000 3>;

        /*
         * CPU to node map for a 4 node, 16 CPU system:
         * <first-cpu last-cpu node>
         */
        cpu-map = <0 3 0>,
                  <4 7 1>,
                  <8 11 2>,
                  <12 15 3>;

        /*
         * Proximity distance matrix for a 4 node system:
         * <from-node to-node distance>
         */
        node-matrix = <0 0 10>,
                      <0 1 20>,
                      <0 2 30>,
                      <0 3 10>,
                      <1 0 20>,
                      <1 1 10>,
                      <1 2 30>,
                      <1 3 10>,
                      <2 0 30>,
                      <2 1 20>,
                      <2 2 10>,
                      <2 3 10>,
                      <3 0 10>,
                      <3 1 20>,
                      <3 2 30>,
                      <3 3 10>;
};
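
To illustrate how the kernel side could consume these tables, here is a
small userspace sketch. It assumes the cpu ranges are inclusive (so the
last range of a 16 CPU system ends at CPU 15) and copies the distance
values from the example as they stand; the struct and function names are
hypothetical and not from the patch.

```c
#include <stddef.h>

struct cpu_range { unsigned int first, last, nid; };
struct node_dist { unsigned int from, to, distance; };

/* <first-cpu last-cpu node>, ranges taken as inclusive (assumption). */
static const struct cpu_range cpu_map[] = {
	{ 0, 3, 0 }, { 4, 7, 1 }, { 8, 11, 2 }, { 12, 15, 3 },
};

/* <from-node to-node distance>, values as given in the example. */
static const struct node_dist node_matrix[] = {
	{ 0, 0, 10 }, { 0, 1, 20 }, { 0, 2, 30 }, { 0, 3, 10 },
	{ 1, 0, 20 }, { 1, 1, 10 }, { 1, 2, 30 }, { 1, 3, 10 },
	{ 2, 0, 30 }, { 2, 1, 20 }, { 2, 2, 10 }, { 2, 3, 10 },
	{ 3, 0, 10 }, { 3, 1, 20 }, { 3, 2, 30 }, { 3, 3, 10 },
};

/* Return the node owning @cpu, or -1 if no range covers it. */
static int cpu_to_nid(unsigned int cpu)
{
	size_t i;

	for (i = 0; i < sizeof(cpu_map) / sizeof(cpu_map[0]); i++)
		if (cpu >= cpu_map[i].first && cpu <= cpu_map[i].last)
			return (int)cpu_map[i].nid;
	return -1;
}

/* Return the distance from @from to @to, or -1 if the pair is not listed. */
static int nid_distance(unsigned int from, unsigned int to)
{
	size_t i;

	for (i = 0; i < sizeof(node_matrix) / sizeof(node_matrix[0]); i++)
		if (node_matrix[i].from == from && node_matrix[i].to == to)
			return (int)node_matrix[i].distance;
	return -1;
}
```

A linear scan is enough here since both tables are tiny and are only
walked at boot; a real implementation would more likely fill a
cpu-indexed array and a node-by-node distance table once.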

thanks
Ganapat

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-10-17 17:19                     ` Ganapatrao Kulkarni
@ 2014-10-20 14:25                         ` Steve Capper
  -1 siblings, 0 replies; 44+ messages in thread
From: Steve Capper @ 2014-10-20 14:25 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Mark Rutland, Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Leif Lindholm,
	Roy Franz, Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA

On Fri, Oct 17, 2014 at 10:49:56PM +0530, Ganapatrao Kulkarni wrote:
> Hi All,
> 
> On Mon, Oct 6, 2014 at 11:22 PM, Ganapatrao Kulkarni
> <gpkulkarni-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> > On Mon, Oct 6, 2014 at 4:56 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
> >> On Mon, Oct 06, 2014 at 06:14:36AM +0100, Ganapatrao Kulkarni wrote:
> >>> Hi Mark,
> >>>
> >>> On Fri, Oct 3, 2014 at 5:43 PM, Mark Rutland <mark.rutland-5wv7dgnIgG8@public.gmane.org> wrote:
> >>> > On Thu, Sep 25, 2014 at 10:03:59AM +0100, Ganapatrao Kulkarni wrote:
> >>> >> Adding numa support for arm64 based platforms.
> >>> >> This version creates numa mapping by parsing the dt table.
> >>> >> cpu to node id mapping is derived from cluster_id as defined in cpu-map.
> >>> >> memory to node id mapping is derived from nid property of memory node.
> >>> >
> >>> > [...]
> >>> >
> >>> >> +/*
> >>> >> + * Too small node sizes may confuse the VM badly. Usually they
> >>> >> + * result from BIOS bugs. So dont recognize nodes as standalone
> >>> >> + * NUMA entities that have less than this amount of RAM listed:
> >>> >> + */
> >>> >> +#define NODE_MIN_SIZE (4*1024*1024)
> >>> >
> >>> > Why do these confuse the VM? what does BIOS have to do with arm64?
> >>> sneaked in from x86, will remove this.
> >>> >
> >>> >> +
> >>> >> +#define parent_node(node)      (node)
> >>> >
> >>> > Huh?
> >>> for thunder, no hierarchy at numa nodes. shall i put under ifdef or
> >>> separate header file?
> >>
> >> I was confused by a node being its own parent, but that seems to be the
> >> case elsewhere for parent_node() implementations. Please at least have a
> >> comment that we're assuming a flat hierarchy (for now).
> > sure, will add the required comments.
> >>
> >>> >
> >>> > [...]
> >>> >
> >>> >> @@ -168,6 +191,11 @@ void __init bootmem_init(void)
> >>> >>         min = PFN_UP(memblock_start_of_DRAM());
> >>> >>         max = PFN_DOWN(memblock_end_of_DRAM());
> >>> >>
> >>> >> +       high_memory = __va((max << PAGE_SHIFT) - 1) + 1;
> >>> >> +       max_pfn = max_low_pfn = max;
> >>> >> +
> >>> >> +       if (IS_ENABLED(CONFIG_NUMA))
> >>> >> +               arm64_numa_init();
> >>> >
> >>> > Is this function defined if !CONFIG_NUMA? Surely it must do nothing in
> >>> > that case anyway?
> >>> yes, if is not required, will remove it.
> >>> >
> >>> > [...]
> >>> >
> >>> >> +/*
> >>> >> + *  Set the cpu to node and mem mapping
> >>> >> + */
> >>> >> +void numa_store_cpu_info(cpu)
> >>> >> +{
> >>> >> +       cpu_to_node_map[cpu] = cpu_topology[cpu].cluster_id;
> >>> >> +       cpumask_set_cpu(cpu, node_to_cpumask_map[cpu_to_node_map[cpu]]);
> >>> >> +       set_numa_node(cpu_to_node_map[cpu]);
> >>> >> +       set_numa_mem(local_memory_node(cpu_to_node_map[cpu]));
> >>> >> +}
> >>> >
> >>> > I don't like this. I think we need to be more explicit in the DT w.r.t.
> >>> > the relationship between memory and the CPU hierarchy.
> >>> >
> >>> > I can imagine that we might end up with systems with multiple levels of
> >>> > NUMA hierarchy (using MPIDR_EL1.Aff{3,2}), and I'd rather that we were
> >>> > explcit as possible from the start w.r.t. the relationship between
> >>> > memory and groups of CPUs such that we don't end up with multiple ways
> >>> > of specifying said relationship, and horrible edge cases around implicit
> >>> > definitions (e.g. the nid to cluster mapping).
> >>> are you recomending to have explicit nid attribute to each cpu device node?
> >>
> >> I am recommending that we make the relationship explicit. If anything,
> >> using the cpu-map (with phandles) seems like a better approach.
> > yes, will add the mapping.
> >>
> >>> >> +/**
> >>> >> + * dummy_numa_init - Fallback dummy NUMA init
> >>> >> + *
> >>> >> + * Used if there's no underlying NUMA architecture, NUMA initialization
> >>> >> + * fails, or NUMA is disabled on the command line.
> >>> >> + *
> >>> >> + * Must online at least one node and add memory blocks that cover all
> >>> >> + * allowed memory.  This function must not fail.
> >>> >> + */
> >>> >> +static int __init dummy_numa_init(void)
> >>> >> +{
> >>> >> +       pr_info("%s\n",
> >>> >> +              numa_off ? "NUMA turned off" : "No NUMA configuration found");
> >>> >
> >>> > Why not print "NUMA turned off" in numa_setup?
> >>> enters this function only when, numa is turned off or the DT/ACPI
> >>> based numa init fails.
> >>
> >> Sure. Moving the "NUMA turned off" print into numa_setup would mean you
> >> could just print "Using dummy NUMA layout" or something to that effect
> >> here -- the function has no need to care about the value of numa_off.
> > agreed.
> >>
> >>> >
> >>> >> +       pr_info("Faking a node at [mem %#018Lx-%#018Lx]\n",
> >>> >> +              0LLU, PFN_PHYS(max_pfn) - 1);
> >>> >> +
> >>> >> +       node_set(0, numa_nodes_parsed);
> >>> >> +       numa_add_memblk(0, 0, PFN_PHYS(max_pfn));
> >>> >> +
> >>> >> +       return 0;
> >>> >> +}
> >>> >> +
> >>> >> +/**
> >>> >> + * early_init_dt_scan_numa_map - parse memory node and map nid to memory range.
> >>> >> + */
> >>> >> +int __init early_init_dt_scan_numa_map(unsigned long node, const char *uname,
> >>> >> +                                    int depth, void *data)
> >>> >> +{
> >>> >> +       const char *type = of_get_flat_dt_prop(node, "device_type", NULL);
> >>> >> +       const __be32 *reg, *endp, *nid_prop;
> >>> >> +       int l, nid;
> >>> >> +
> >>> >> +       /* We are scanning "memory" nodes only */
> >>> >> +       if (type == NULL) {
> >>> >> +               /*
> >>> >> +                * The longtrail doesn't have a device_type on the
> >>> >> +                * /memory node, so look for the node called /memory@0.
> >>> >> +                */
> >>> >> +               if (depth != 1 || strcmp(uname, "memory@0") != 0)
> >>> >> +                       return 0;
> >>> >
> >>> > This has no place on arm64.
> >>> i am not sure that we can move to driver/of, at this moment this is
> >>> arm64 specific binding.
> >>
> >> I meant the longtrail workaround, hence my comments on that below.
> > oh! thanks, will remove this.
> >>
> >>> > We limited to longtrail workaround in the core memory parsing to PPC32
> >>> > only in commit b44aa25d20e2ef6b (of: Handle memory@0 node on PPC32
> >>> > only). This code doesn't need it enabled ever.
> >>> >
> >>> > Are you booting using UEFI? This isn't going to work when the memory map
> >>> tried with bootwrapper, working on to boot from UEFI.
> >>> > comes from UEFI and we have no memory nodes in the DTB.
> >>> yes, there is issue with UEFI boot, since memory node is removed.
> >>> i request UEFI stub dev-team to suggest the possible ways to address this.
> >>
> >> I've Cc'd a few people who have worked on the stub and/or EFI memory map
> >> stuff. It would be worth keeping them on Cc so as to keep them informed.
> > thanks.
> >>
> >> I believe that the EFI stub is doing the right thing by ensuring that
> >> the EFI memory map is used, so this is just another configuration that
> >> your binding has to consider.
> > going through EFI stub, next is to boot numa kernel using UEFI, will
> > include UEFI boot support in v2 patch.
> >>
> >> Mark.
> Below is an example of the proposed NUMA bindings in DT.
> It covers the CPU-to-node mapping and the memory-range-to-node mapping.
> It also defines the proximity distance matrix between nodes.
> Please let me know your comments so that I can go ahead with the implementation.
>
> numa-map {
>          /*
>           * Address cells are used for the memory range base address in
>           * mem-map. For all other values, size-cells is used.
>           * node-count gives the number of NUMA nodes in the system.
>           */
>          #address-cells = <2>;
>          #size-cells = <1>;
>          #node-count = <4>;
>
>          /* Memory map for the memory ranges on each node */
>          mem-map = <0x0   0x00c00000 0>,
>                    <0x1   0x00000000 1>,
>                    <0x100 0x00000000 2>,
>                    <0x200 0x00000000 3>;
>
>          /*
>           * CPU to node map for a 4-node, 16-CPU system:
>           * <first-cpu last-cpu node>
>           */
>          cpu-map = <0 3 0>,
>                    <4 7 1>,
>                    <8 11 2>,
>                    <12 15 3>;
>
>          /*
>           * Proximity distance matrix for a 4-node system:
>           * <from-node to-node distance>
>           */
>          node-matrix = <0 0 10>,
>                        <0 1 20>,
>                        <0 2 30>,
>                        <0 3 10>,
>                        <1 0 20>,
>                        <1 1 10>,
>                        <1 2 30>,
>                        <1 3 10>,
>                        <2 0 30>,
>                        <2 1 20>,
>                        <2 2 10>,
>                        <2 3 10>,
>                        <3 0 10>,
>                        <3 1 20>,
>                        <3 2 30>,
>                        <3 3 10>;
> };

Hi Ganapat,
The above caught my attention.

For a 4-node system do we not need 16 distances; the implication of that
would be that the distance between node A-B could be different from the
distance between B-A? Also the distance from a node to itself could be
safely assumed to be zero?

I think we should have a symmetric matrix with zero-diagonals so strictly
only seven values would need specifying for a 4-node system.

Cheers,
-- 
Steve

* Re: [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-10-20 14:25                         ` Steve Capper
@ 2014-10-20 14:30                             ` Steve Capper
  -1 siblings, 0 replies; 44+ messages in thread
From: Steve Capper @ 2014-10-20 14:30 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Mark Rutland, Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Leif Lindholm,
	Roy Franz, Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA

On Mon, Oct 20, 2014 at 03:25:56PM +0100, Steve Capper wrote:
> On Fri, Oct 17, 2014 at 10:49:56PM +0530, Ganapatrao Kulkarni wrote:

[...]

> >          /*
> >           * Proximity distance matrix for a 4-node system:
> >           * <from-node to-node distance>
> >           */
> >          node-matrix = <0 0 10>,
> >                        <0 1 20>,
> >                        <0 2 30>,
> >                        <0 3 10>,
> >                        <1 0 20>,
> >                        <1 1 10>,
> >                        <1 2 30>,
> >                        <1 3 10>,
> >                        <2 0 30>,
> >                        <2 1 20>,
> >                        <2 2 10>,
> >                        <2 3 10>,
> >                        <3 0 10>,
> >                        <3 1 20>,
> >                        <3 2 30>,
> >                        <3 3 10>;
> > };
> 
> Hi Ganapat,
> The above caught my attention.
> 
> For a 4-node system do we not need 16 distances; the implication of that
> would be that the distance between node A-B could be different from the
> distance between B-A? Also the distance from a node to itself could be
> safely assumed to be zero?
> 
> I think we should have a symmetric matrix with zero-diagonals so strictly
> only seven values would need specifying for a 4-node system.

s/seven/six/

I really need to learn how to count.... :-/

> 
> Cheers,
> -- 
> Steve
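
To make the corrected count concrete: a symmetric matrix with a fixed diagonal over n nodes has n(n-1)/2 independent entries, which is six for n = 4. The following is a minimal C sketch, not taken from the patch set, of expanding such a list into a full matrix. The names are hypothetical, the zero self-distance is the assumption suggested above, and the entries are assumed to be listed row by row as (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).

```c
#include <stddef.h>

#define MAX_NODES	4
#define LOCAL_DISTANCE	0	/* assumption: distance from a node to itself */

/* Number of unordered node pairs: n * (n - 1) / 2. */
static size_t pair_count(size_t n)
{
	return n * (n - 1) / 2;
}

/*
 * Fill the n x n matrix dist from the strict upper triangle in upper,
 * mirroring each value so that dist[i][j] == dist[j][i].
 */
static void expand_symmetric(const unsigned int *upper, size_t n,
			     unsigned int dist[MAX_NODES][MAX_NODES])
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++) {
		dist[i][i] = LOCAL_DISTANCE;
		for (size_t j = i + 1; j < n; j++) {
			dist[i][j] = upper[k];
			dist[j][i] = upper[k];
			k++;
		}
	}
}
```

A binding along these lines would trade the ability to describe asymmetric distances for a shorter table, so it only fits if A-to-B and B-to-A distances are always equal on the target platforms.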

* Re: [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-10-20 14:30                             ` Steve Capper
@ 2014-10-22 11:27                                 ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-10-22 11:27 UTC (permalink / raw)
  To: Steve Capper
  Cc: Mark Rutland, Ganapatrao Kulkarni, Catalin Marinas, Will Deacon,
	grant.likely-QSEj5FYQhm4dnm+yROfE0A,
	robh+dt-DgEjT+Ai2ygdnm+yROfE0A,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Leif Lindholm,
	Roy Franz, Ard Biesheuvel, msalter-H+wXaHxf7aLQT0dZR+AlfA

On Mon, Oct 20, 2014 at 8:00 PM, Steve Capper <steve.capper-QSEj5FYQhm4dnm+yROfE0A@public.gmane.org> wrote:
> On Mon, Oct 20, 2014 at 03:25:56PM +0100, Steve Capper wrote:
>> On Fri, Oct 17, 2014 at 10:49:56PM +0530, Ganapatrao Kulkarni wrote:
>
> [...]
>
>> >          /* Proximity distance matrix for 4-node system
>> >                        <from-node to-node distance>
>> >         */
>> >          node-matrix = <0 0 10>,
>> >                        <0 1 20>,
>> >                        <0 2 30>,
>> >                        <0 3 10>,
>> >                        <1 0 20>,
>> >                        <1 1 10>,
>> >                        <1 2 30>,
>> >                        <1 3 10>,
>> >                        <2 0 30>,
>> >                        <2 1 20>,
>> >                        <2 2 10>,
>> >                        <2 3 10>,
>> >                        <3 0 10>,
>> >                        <3 1 20>,
>> >                        <3 2 30>,
>> >                        <3 3 10>;
>> >  };
>>
>> Hi Ganapat,
>> The above caught my attention.
>>
>> For a 4-node system do we not need 16 distances; the implication of that
>> would be that the distance between node A-B could be different from the
>> distance between B-A? Also the distance from a node to itself could be
>> safely assumed to be zero?
>>
>> I think we should have a symmetric matrix with zero-diagonals so strictly
>> only seven values would need specifying for a 4-node system.
Thanks Steve for the comments.
I too initially thought of assuming that distance B-A is the same as
A-B and that A-A is always defined as LOCAL.
However, this example is the ideal case, which defines all 16
distances for a 4-node system.
In an actual DT, we can skip providing A-A and B-A; the kernel
implementation will use these common/generic assumptions
to derive the missing distances.
>
> s/seven/six/
>
> I really need to learn how to count.... :-/
>
>>
>> Cheers,
>> --
>> Steve
thanks
Ganapat

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-10-17 17:19                     ` Ganapatrao Kulkarni
  (?)
  (?)
@ 2014-10-28  8:48                     ` Hanjun Guo
  2014-10-29  7:20                       ` Ganapatrao Kulkarni
  -1 siblings, 1 reply; 44+ messages in thread
From: Hanjun Guo @ 2014-10-28  8:48 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Ganapatrao,

On 2014-10-18 1:19, Ganapatrao Kulkarni wrote:
[...]
> Below is the example for the proposal of numa bindings in DT.
> This covers cpu to node mapping, memory ranges to node mapping.
> Also defines proximity distance matrix of nodes to each other.
> please let me know your comments to go ahead with the implementation.
> 
> numa-map{
>          /*  Address cells used for memory range base address in mem-map.
>              For all others, size-cells is used.
>              Node-count tells the number of numa nodes in the system.
>          */
>          #address-cells = <2>;
>          #size-cells = <1>;
>          #node-count = <4>;
> 
>          /* Memmap for memory ranges on each node */
> 
>          mem-map = <0x0   0x00c00000 0>,
>                    <0x1   0x00000000 1>,
>                    <0x100 0x00000000 2>,
>                    <0x200 0x00000000 3>;
> 
>          /* CPU to node map for 4 NODE and 16 CPUs system
>         < first-cpu last-cpu  node belongs>

What's the property for the cpu? MPIDR of this CPU?

>          */
>          cpu-map = <0 3 0>,
>                    <4 7 1>,
>                    <8 11 2>,
>                    <12 15 3>;
> 

Thanks
Hanjun

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [RFC PATCH 4/4] arm64:numa: adding numa support for arm64 platforms.
  2014-10-28  8:48                     ` Hanjun Guo
@ 2014-10-29  7:20                       ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 44+ messages in thread
From: Ganapatrao Kulkarni @ 2014-10-29  7:20 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Hanjun,

On Tue, Oct 28, 2014 at 2:18 PM, Hanjun Guo <hanjun.guo@linaro.org> wrote:
> Hi Ganapatrao,
>
> On 2014-10-18 1:19, Ganapatrao Kulkarni wrote:
> [...]
>> Below is the example for the proposal of numa bindings in DT.
>> This covers cpu to node mapping, memory ranges to node mapping.
>> Also defines proximity distance matrix of nodes to each other.
>> please let me know your comments to go ahead with the implementation.
>>
>> numa-map{
>>          /*  Address cells used for memory range base address in mem-map.
>>              For all others, size-cells is used.
>>              Node-count tells the number of numa nodes in the system.
>>          */
>>          #address-cells = <2>;
>>          #size-cells = <1>;
>>          #node-count = <4>;
>>
>>          /* Memmap for memory ranges on each node */
>>
>>          mem-map = <0x0   0x00c00000 0>,
>>                    <0x1   0x00000000 1>,
>>                    <0x100 0x00000000 2>,
>>                    <0x200 0x00000000 3>;
>>
>>          /* CPU to node map for 4 NODE and 16 CPUs system
>>         < first-cpu last-cpu  node belongs>
>
> What's the property for the cpu? MPIDR of this CPU?
I see that in the ACPI spec there is a mapping between the logical CPU
number and the physical ID (the MPIDR) using the GICC Affinity Structure.
Here I am defining the mapping between the logical CPUs and the node
they belong to.
SMP initialization uses the CPU node property of the DT to bring them up.
We can get the physical ID using cpu_logical_map(cpu), which is the MPIDR.
Do you see any need to expand cpu-map here to include the mapping between
physical CPU ID and logical CPU ID?
>
>>          */
>>          cpu-map = <0 3 0>,
>>                    <4 7 1>,
>>                    <8 11 2>,
>>                    <12 15 3>;
>>
>
> Thanks
> Hanjun
>
>
thanks
ganapat

^ permalink raw reply	[flat|nested] 44+ messages in thread
